Replies: 1 comment
Hi @theshypig,
You could extract the BERT embeddings separately, save them to a file, and load them during training via https://github.com/espnet/espnet/blob/master/egs2/TEMPLATE/asr1/asr.sh#L753
You do not need to modify it.
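For example, a minimal extraction script could look like this (a sketch assuming HuggingFace `transformers` and a Kaldi-style `text` file; the one-`.npy`-per-utterance layout plus an scp index is just one convenient convention, and all paths are placeholders):

```python
# Sketch: dump one .npy of BERT last-hidden-state vectors per utterance,
# plus an scp-style index file mapping uttid -> path.
from pathlib import Path

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

out_dir = Path("dump/bert_emb")
out_dir.mkdir(parents=True, exist_ok=True)

# Kaldi-style text file: each line is "uttid transcript ..."
with open("data/train/text") as fin, open(out_dir / "emb.scp", "w") as scp:
    for line in fin:
        uttid, text = line.rstrip("\n").split(maxsplit=1)
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # last_hidden_state: (1, num_tokens, hidden_size)
            emb = bert(**enc).last_hidden_state.squeeze(0).numpy()
        path = out_dir / f"{uttid}.npy"
        np.save(path, emb)
        scp.write(f"{uttid} {path}\n")
```

If I remember correctly, an scp file like this can then be declared as an extra training input with the `npy` data type (e.g., via `--train_data_path_and_name_and_type`), but please double-check against the current ESPnet2 options.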
---
Dear ESPnet Team,
I hope this message finds you well. I am writing to inquire about the possibility of implementing cross-modal knowledge transfer in CTC-based ASR models using a BERT pre-trained model within the ESPnet framework.
From my observations, the current ESPnet framework does not appear to support this out of the box. I noticed in a previous issue that someone had raised a similar question and was told that knowledge distillation is implemented in the TTS component; however, that does not fit my needs.
Upon reviewing the TTS implementation, I noticed that it requires two GPUs to perform distillation, which seems rather resource-intensive. My current idea is to run the BERT pre-trained model over the ASR transcripts, save its last-hidden-layer representations, load them during training, and then perform the cross-modal knowledge transfer.
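As a sketch of the kind of transfer objective I have in mind (all names below are hypothetical, not existing ESPnet code):

```python
# Hypothetical transfer loss: pull a pooled acoustic representation toward
# the pooled BERT representation, added on top of the usual CTC loss.
import torch
import torch.nn.functional as F

def transfer_loss(
    enc_out: torch.Tensor,    # (B, T, D_enc) encoder output frames
    enc_lens: torch.Tensor,   # (B,) valid frame counts
    bert_emb: torch.Tensor,   # (B, L, D_bert) saved BERT hidden states
    bert_lens: torch.Tensor,  # (B,) valid token counts
    proj: torch.nn.Linear,    # maps D_enc -> D_bert
) -> torch.Tensor:
    # Mean-pool over valid positions; this sidesteps frame/token alignment
    # at the cost of utterance-level (rather than token-level) supervision.
    t = torch.arange(enc_out.size(1), device=enc_out.device)
    l = torch.arange(bert_emb.size(1), device=bert_emb.device)
    acoustic = (enc_out * (t[None, :] < enc_lens[:, None]).unsqueeze(-1)).sum(1) / enc_lens[:, None]
    semantic = (bert_emb * (l[None, :] < bert_lens[:, None]).unsqueeze(-1)).sum(1) / bert_lens[:, None]
    # 1 - cosine similarity as the transfer objective; MSE would also work.
    return (1.0 - F.cosine_similarity(proj(acoustic), semantic, dim=-1)).mean()

# total_loss = ctc_loss + kd_weight * transfer_loss(...)  # kd_weight tunable
```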
During this process, I have two areas of uncertainty where I would greatly appreciate your advice:
First, if I want to introduce the saved BERT embeddings during training, which parts should I modify (e.g., `iterable_dataset.py`, `dataset.py`, the preprocessor, the trainer's batch fetching, or the training loss)?
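To make this first question concrete, here is roughly how I imagine reading the saved embeddings back when batches are built (`NpyScpReader` is my guess at the relevant ESPnet2 helper; the fallback does the same with plain NumPy):

```python
# Sketch: read per-utterance embeddings back, keyed by utterance id.
import numpy as np

try:
    from espnet2.fileio.npy_scp import NpyScpReader
    emb_reader = NpyScpReader("dump/bert_emb/emb.scp")
except ImportError:
    # Fallback: treat emb.scp as a plain "uttid path" mapping.
    with open("dump/bert_emb/emb.scp") as f:
        emb_reader = {u: np.load(p) for u, p in (line.split() for line in f)}

emb = emb_reader["utt0001"]  # (num_tokens, hidden_size) for one utterance
```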
Second, what is the purpose of `aux_ctc_tasks` during model training, and do I need to modify it?

I look forward to your valuable insights and suggestions. Thank you for your time and consideration.
Best regards,
ck