[Bug] MDX23C-8KFFT-InstVoc_HQ bug on linux (Google Colab) #5

Open
ShiromiyaG opened this issue Apr 5, 2024 · 7 comments

@ShiromiyaG

I was testing MDX23C-8KFFT-InstVoc_HQ on Google Colab and was surprised by the result: the audio was slowed down, the singer sang slowly, and the output was longer than the input. I tested the same song on Windows with the same settings, and the result was normal.
Here is the code I used on both Colab and Windows:

MDX23C = models.MDXC(
    name="MDX23C-8KFFT-InstVoc_HQ",
    other_metadata={'is_mdx_c_seg_def': True, 'segment_size': 384, 'batch_size': 8,
                    'overlap_mdx23': 8, 'semitone_shift': 0},
    device=device, logger=None)
res = MDX23C(input_file)
vocals = res["vocals"]
af.write(f"{no_inst_folder}/{basename}_MDX23C.wav", vocals, MDX23C.sample_rate)

Here, the link to the songs:
https://drive.google.com/drive/folders/11aete_dd56XqR68P2cr_BMRlPhvHb7W0?usp=drive_link

And also an Audacity photo of the songs:
[image: Audacity screenshot of the songs]

@MohannadEhabBarakat
Contributor

Are you sure that input_file has a 44100 Hz sampling rate? The current code doesn’t resample automatically.
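One way to check this is to read the rate from the file header before separating. The sketch below is only an illustration: it uses Python's standard `wave` module, so it handles PCM WAV files only (FLAC or MP3 input would need another reader), and `wav_sample_rate` and `stretch_factor` are hypothetical helper names, not part of the uvr package. It also shows the arithmetic of the symptom: if samples recorded at one rate are written out as 44100 Hz without resampling, the audio plays longer and slower by the ratio of the two rates. Whether that is what happened in this issue is not confirmed.

```python
import wave

TARGET_RATE = 44100  # the rate asked about in this thread


def wav_sample_rate(path):
    """Return the sample rate stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def stretch_factor(source_rate, assumed_rate=TARGET_RATE):
    """Factor by which playback gets longer when audio sampled at
    source_rate is treated as assumed_rate without resampling."""
    return source_rate / assumed_rate


def describe(path):
    """Summarise whether a WAV file matches TARGET_RATE."""
    rate = wav_sample_rate(path)
    if rate == TARGET_RATE:
        return f"{path}: {rate} Hz, no resampling needed"
    factor = stretch_factor(rate)
    return f"{path}: {rate} Hz, output would be {factor:.3f}x as long if not resampled"
```

For example, a 48000 Hz file treated as 44100 Hz would come out roughly 8.8% longer, which would sound slow in the way described above.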

@ShiromiyaG
Author

@MohannadEhabBarakat Yes, I'm sure. I don't think I've ever used hi-res audio for separation; all the audio I use comes from Deezer.

@ShiromiyaG
Author

ShiromiyaG commented Apr 5, 2024

@MohannadEhabBarakat Also, I used the same audio on both Windows and Colab and got different results, which I found strange. Maybe it has something to do with package versions. Here is the requirements file that I used in Colab. I'm going to test VR models today with the same package versions and report the results.
requirements.txt

@ShiromiyaG
Author

I just tested two models, a VR (karokee_4band_v2_sn) and an MDX (Reverb HQ), and both gave normal results. I remembered that in my last tests I used videos from YouTube, not Deezer, but I don't think that is the problem, since the normal VR and MDX results also came from a YouTube video.

@ShiromiyaG
Author

ShiromiyaG commented Apr 16, 2024

I was testing HQ4, and it has the same problem on both Windows and Linux. It looks like the semitone_shift is wrong. Also, this message appears:

C:\Users\Guilherme\anaconda3\lib\site-packages\uvr\models_dir\mdx\mdx_interface.py:270: RuntimeWarning: invalid value encountered in divide
  tar_waves = result / divider
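
For context, NumPy emits this warning when an elementwise division has an undefined result such as 0/0, which then becomes NaN in the output. The sketch below reproduces the warning and shows one possible guard. It assumes, without confirmation from this thread, that `divider` contains zeros at some positions; `safe_divide` is a hypothetical helper and not the library's own fix.

```python
import warnings

import numpy as np


def safe_divide(result, divider):
    """Divide elementwise, leaving 0.0 wherever divider is zero
    instead of producing NaN or inf."""
    out = np.zeros_like(result, dtype=np.float64)
    np.divide(result, divider, out=out, where=divider != 0)
    return out


# Toy data: index 0 is the 0/0 case that triggers the warning.
result = np.array([0.0, 2.0, 0.0])
divider = np.array([0.0, 4.0, 1.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    naive = result / divider  # RuntimeWarning: invalid value encountered

safe = safe_divide(result, divider)
```

Filling the undefined positions with zeros is only one choice; whether silence is the right value there depends on how `divider` is built, which this thread does not show.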

@MohannadEhabBarakat
Contributor

MohannadEhabBarakat commented Apr 18, 2024

> @MohannadEhabBarakat And also, I used the same audio in both Windows and Colab and had different results, which I found strange. Maybe it's something to do with package versions. Here is the requirements file that I used in colab. I'm going to test it today with VR models with the same package versions, and write what the results were requirements.txt

I think that might be caused by package versions or resampling algorithms. I noticed that the UVR GUI uses different resampling depending on the OS. I'm not sure why they did it, but I followed them to replicate the same results. As for package versions, unfortunately even using the same versions might not solve the issue, because some libraries have different implementations on different OSs (even at the same version). The workaround that worked for me in the past was to wrap everything in a Docker container, which basically unifies the OS.
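
To compare the two environments side by side, it can help to dump the OS and the installed package versions on each machine and diff the output. This is a minimal sketch using only the standard library; `environment_report` is a hypothetical helper, and the package names in the usage line are guesses at what might matter here, not a list taken from the requirements file.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Return the OS, Python version, and installed version of each
    named package (None if the package is not installed)."""
    report = {"os": platform.system(), "python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Package names below are assumptions; adjust to the real requirements.
    for key, value in environment_report(["numpy", "torch", "librosa", "soundfile"]).items():
        print(f"{key}: {value}")
```

Identical version numbers on both machines would not rule out an OS-specific difference, as noted above, but a mismatch is a quick thing to eliminate first.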

As I'm back now I'll be working on:

  1. Fixing the bugs you found
  2. Adding new docs
  3. Adding new weights (at least the ones you tested)

So if you can send me an email with your findings and the current bugs, it will help me a lot 🤗. Mohannad.Barakat@fau.de

@ShiromiyaG
Author

ShiromiyaG commented Apr 19, 2024

I can try to help, but I don't know how much help I would be, since I don't use most of the models and end up using only specific ones. In fact, I tested a model that is not available in the UVR repository but works both in UVR and in your code. If you want to take a look at the model I'm referring to, I uploaded it at the link below:
https://github.com/ShiromiyaG/RVC-AI-Cover-Maker/releases (it's the karokee model)
