
How to use ipex.optimize for two models (model distillation) and one optimizer #576

ferreirafabio opened this issue Mar 30, 2024 · 2 comments


@ferreirafabio

Describe the issue

Hi,
I'm wondering how to use ipex.optimize(...) when I have two models, for example, a teacher and a student in model distillation, but only one optimizer. Would calls like the following work? Note that the teacher and student are both in train mode, but the teacher's parameters are set to requires_grad=False and the optimizer only operates on the student's parameters.

teacher, optimizer = ipex.optimize(model=teacher, dtype=torch.float32, optimizer=optimizer)
student, optimizer = ipex.optimize(model=student, dtype=torch.float32, optimizer=optimizer)

I'm not aware of the intricacies this way of calling it may entail and would like confirmation that it is okay to do. Thank you.

@vishnumadhu365

@ferreirafabio Hey Fabio, assuming the teacher is a large pretrained model, the optimizer is only required for the student model. The following snippet depicts the usage of ipex:

import torch
import intel_extension_for_pytorch as ipex

teacher.eval()
student.train()
optimizer = torch.optim.SGD(student.parameters(), lr=1e-3)  # placeholder; any optimizer over the student parameters

# only the trainable student needs the optimizer passed to ipex.optimize
teacher = ipex.optimize(model=teacher, dtype=torch.float32)
student, optimizer = ipex.optimize(model=student, dtype=torch.float32, optimizer=optimizer)

for train_data in train_loader:  # iterate over the training data
    with torch.no_grad():
        teacher_probs = torch.softmax(teacher(train_data), dim=-1)

    student_probs = torch.softmax(student(train_data), dim=-1)
    loss = distillation_loss(student_probs, teacher_probs)  # your combined distillation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Other things to consider:

  1. For the teacher model you could also apply torch.jit.trace and torch.jit.freeze for added eval perf gains; refer to the IPEX inference code samples (a rough sketch follows this list).
  2. Using the torch.bfloat16 dtype on the latest Intel Xeon CPUs and Intel GPUs provides better performance.
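A rough sketch of both points, assuming an eval-mode teacher; the 1x3x224x224 sample input and the CPU bfloat16 autocast are illustrative placeholders adapted from the IPEX inference examples:

import torch
import intel_extension_for_pytorch as ipex

# bfloat16 typically improves throughput on recent Intel Xeon CPUs and Intel GPUs
teacher = ipex.optimize(model=teacher, dtype=torch.bfloat16)

# trace and freeze the eval-mode teacher for additional inference gains
sample_input = torch.randn(1, 3, 224, 224)  # placeholder input shape
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    teacher = torch.jit.trace(teacher, sample_input, check_trace=False)
    teacher = torch.jit.freeze(teacher)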

@ferreirafabio (Author)

@vishnumadhu365 thank you for your reply and code example. I tried that but get a

AssertionError: The optimizer should be given for training mode

since the teacher is still in train mode (I do not call .eval() on it). AFAIK, I cannot put it in eval mode because I still want the BatchNorm statistics from training mode. The precise application is training DINO; see here for the exact usage of teacher/student:

https://github.com/facebookresearch/dino/blob/7c446df5b9f45747937fb0d72314eb9f7b66930a/main_dino.py#L210
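For reference, a minimal sketch of the pattern that triggers the assertion (model construction omitted; teacher stands for any module left in train mode with frozen parameters):

import torch
import intel_extension_for_pytorch as ipex

teacher.train()  # kept in train mode to retain training-time BatchNorm behavior
for p in teacher.parameters():
    p.requires_grad = False

# no optimizer is passed, so this raises:
# AssertionError: The optimizer should be given for training mode
teacher = ipex.optimize(model=teacher, dtype=torch.float32)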

What can be done in such a case?
