Finetuning multi-speaker model? #3733
Unanswered
suckrowPierre
asked this question in General Q&A
Replies: 2 comments 4 replies
-
Maybe your inference call is wrong. You should double-check the speaker_id argument you pass to the model at inference time.
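For example, with the Coqui TTS Python API the speaker is passed explicitly at inference time. A minimal sketch, assuming a fine-tuned checkpoint directory and a speaker name as placeholders:

```python
from TTS.api import TTS

# Placeholder paths: point these at your own fine-tuned run.
tts = TTS(
    model_path="/path/to/your/checkpoint_dir",
    config_path="/path/to/your/checkpoint_dir/config.json",
).to("cuda")

# "speaker" must match a speaker_name the model was actually trained with.
tts.tts_to_file(
    text="Hello from the fine-tuned voice.",
    speaker="my_new_speaker",
    language="en",
    file_path="out.wav",
)
```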
3 replies
-
So how do I finetune a multi-speaker model with multiple new speakers?
1 reply
-
I tried finetuning the XTTS-v2 multi-speaker model, but I am not sure I did it the correct way. I created a train and eval dataset with the structure audio_file|text|speaker_name.
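For illustration only, a metadata file with that structure would look something like this (file names, texts, and the speaker label are placeholders, not my actual data):

```
wavs/clip_0001.wav|This is the first transcribed sentence.|new_speaker_one
wavs/clip_0002.wav|This is another transcribed sentence.|new_speaker_one
```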
For the speaker_name I used the name of my new speaker. After training completed, I loaded the model in the demo Gradio UI as shown in this video: https://www.youtube.com/watch?v=8tpDiiouGxc. I can now generate audio in the voice of the new speaker, but I still need to provide a reference audio, so I wonder whether it really trained on my data or just skipped everything because of the new speaker_name.
I also can't list the speaker IDs with --list_speaker_idxs, or run the model with the tts command. When I try, I get:
NotADirectoryError: [Errno 20] Not a directory: '/home/coqui/jens/model/run/training/GPT_XTTS_FT-May-11-2024_08+18AM-0000000/best_model.pth/model.pth'
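For context, the kind of tts command I mean looks roughly like this (paths and names are placeholders, and I am not certain these are the right flags for a fine-tuned XTTS checkpoint):

```
tts --text "Some test sentence." \
    --model_path /path/to/GPT_XTTS_FT-run/ \
    --config_path /path/to/GPT_XTTS_FT-run/config.json \
    --speaker_idx my_new_speaker \
    --language_idx en \
    --out_path out.wav
```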
What I want is a multi-speaker model with some new speakers.
Can this be done with finetuning?
Or do I need to train a multi-speaker model from scratch with my new speakers plus the data used for XTTS-v2?
Any help would be greatly appreciated. Currently I am a bit lost and can't find any concrete examples for this.