Fine-tuning model on custom dataset results in audio generations with high treble #3746
Unanswered
chinmay-choudhary asked this question in General Q&A
I am fine-tuning the XTTS v2 model on a custom dataset that I have formatted into the LJSpeech data format. The dataset contains audio clips and transcripts of speakers speaking English with Indian accents. I am currently training with 10K data samples. The problem I am having is that when I listen to the audio generated from my test sentences in TensorBoard, all of it seems to have a lot of treble. I will upload one of the audio samples below, where I had to reduce the treble by -10 dB in Audacity to make it sound better. Is there any way to control this, or any advice on the type of data I should use for training to avoid this issue?
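For context on the dataset side, here is a minimal sketch of the LJSpeech layout I am describing: a `metadata.csv` with pipe-separated `id|transcript|normalized transcript` lines, and the matching clips under `wavs/`. The directory name, speaker IDs, and sentences below are made-up examples, not my actual data:

```python
from pathlib import Path

# Hypothetical sample entries: (clip id without extension, raw transcript).
samples = [
    ("spk1_0001", "Hello, how are you?"),
    ("spk1_0002", "The weather is nice today."),
]

def write_ljspeech_metadata(root: Path, samples):
    """Write metadata.csv in the LJSpeech layout:
    <id>|<transcript>|<normalized transcript>, one line per clip,
    with audio expected at <root>/wavs/<id>.wav."""
    (root / "wavs").mkdir(parents=True, exist_ok=True)
    with open(root / "metadata.csv", "w", encoding="utf-8") as f:
        for clip_id, text in samples:
            # Using the raw text for both transcript columns; swap in a
            # properly normalized version for the third field if you have one.
            f.write(f"{clip_id}|{text}|{text}\n")

write_ljspeech_metadata(Path("my_dataset"), samples)
```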
Original English Audio
english.mov
Treble Reduced Audio
english_treble_reduced.wav.mov
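In case it is useful to others, the treble cut I did by hand in Audacity can be scripted as a post-processing step (this only masks the symptom; it is not a fix for whatever the model learned). Below is a sketch of a standard high-shelf biquad using the RBJ Audio EQ Cookbook formulas with shelf slope S = 1; the 4 kHz corner frequency and -10 dB gain are my assumptions, not values taken from Audacity's preset:

```python
import numpy as np
from scipy.signal import lfilter

def high_shelf(x, fs, f0=4000.0, gain_db=-10.0):
    """Apply an RBJ Audio EQ Cookbook high-shelf biquad (S = 1).
    gain_db < 0 attenuates content above roughly f0, similar in
    spirit to Audacity's Bass and Treble 'Treble' slider."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    cosw0, sinw0 = np.cos(w0), np.sin(w0)
    alpha = sinw0 / 2.0 * np.sqrt(2.0)  # shelf slope S = 1
    b0 = A * ((A + 1) + (A - 1) * cosw0 + 2 * np.sqrt(A) * alpha)
    b1 = -2 * A * ((A - 1) + (A + 1) * cosw0)
    b2 = A * ((A + 1) + (A - 1) * cosw0 - 2 * np.sqrt(A) * alpha)
    a0 = (A + 1) - (A - 1) * cosw0 + 2 * np.sqrt(A) * alpha
    a1 = 2 * ((A - 1) - (A + 1) * cosw0)
    a2 = (A + 1) - (A - 1) * cosw0 - 2 * np.sqrt(A) * alpha
    # Normalize by a0 and filter the signal.
    return lfilter([b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0], x)
```

Load the generated wav (e.g. with `scipy.io.wavfile`), pass the samples and sample rate through `high_shelf`, and write the result back out; low frequencies are left essentially untouched while content above the corner is cut by about the requested gain.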