
Should a GPU help this algorithm go faster or no? #19

Open
jsteinberg-rbi opened this issue Sep 14, 2023 · 6 comments

@jsteinberg-rbi

So from what I've seen, when the script runs it attempts to use a GPU if one is present, which of course is great. In fact I think that's even the default. For whatever reason it doesn't use the GPU on my NVIDIA A100. I have no issues running whisper ... --device cuda; it works great and reduces the runtime of my transcription by an order of magnitude. I wish I could get the same result with WhisperHallu. What am I missing? Thanks! Let me know if you want any other information from me.

@EtienneAb3d
Owner

WhisperHallu uses Whisper or FasterWhisper out of the box, without any modification to them. I don't understand why they aren't using your GPU.
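
If it helps to diagnose: a minimal check, assuming stock PyTorch and openai-whisper (a generic sketch on my side, not WhisperHallu's own code), is to confirm CUDA is visible to torch and to force the device explicitly:

import torch
import whisper

# Whisper silently falls back to CPU when CUDA is not visible to PyTorch.
print(torch.cuda.is_available())          # should be True on a working A100 box
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should name the A100

# Force the device explicitly instead of relying on the default.
model = whisper.load_model("large-v2", device="cuda")
result = model.transcribe("audio.wav")    # "audio.wav" is a placeholder path
print(result["text"])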

@jsteinberg-rbi
Author

jsteinberg-rbi commented Sep 14, 2023

@EtienneAb3d

Hey, thanks for the prompt response! Whisper and FasterWhisper will use the GPU, but what about ffmpeg, Demucs, etc. -- are those going to take forever? I had figured that running your algorithm on a GPU would make all the "pre-processing" that prevents the Whisper hallucinations go a lot faster. I'm using an NVIDIA A100 40GB.

Here's the log so far:

(base) root@instance-2:/home/jsteinberg/WhisperHallu# ls
README.md  data  demucsWrapper.py  hallu.py  markers  transcribeHallu.py
(base) root@instance-2:/home/jsteinberg/WhisperHallu# python hallu.py
Python >= 3.10
/opt/conda/lib/python3.10/site-packages/torch/hub.py:286: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
  warnings.warn(
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to /root/.cache/torch/hub/master.zip
Using Demucs
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80.2M/80.2M [00:00<00:00, 111MB/s]
/opt/conda/lib/python3.10/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):
Using standard Whisper
LOADING: large-v2 GPU:0 BS: 2
100%|█████████████████████████████████████| 2.87G/2.87G [00:47<00:00, 65.2MiB/s]
LOADED
=====transcribePrompt
PATH=../230821_0020S12.wav
LNGINPUT=en
LNG=en
PROMPT=Whisper, Ok. A pertinent sentence for your purpose in your language. Ok, Whisper. Whisper, Ok. Ok, Whisper. Whisper, Ok. Please find here, an unlikely ordinary sentence. This is to avoid a repetition to be deleted. Ok, Whisper. 
CMD: ffmpeg -y -i "../230821_0020S12.wav"  -c:a pcm_s16le -ar 16000 "../230821_0020S12.wav.WAV.wav" > "../230821_0020S12.wav.WAV.wav.log" 2>&1
T= 10.130795001983643
PATH=../230821_0020S12.wav.WAV.wav
Demucs using device: cuda:0
Source: drums
Source: bass
Source: other
Source: vocals
T= 186.54959273338318
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav
CMD: ffmpeg -y -i "../230821_0020S12.wav.WAV.wav.vocals.wav" -af "silenceremove=start_periods=1:stop_periods=-1:start_threshold=-50dB:stop_threshold=-50dB:start_silence=0.2:stop_silence=0.2, loudnorm"  -c:a pcm_s16le -ar 16000 "../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav" > "../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log" 2>&1
T= 58.83332967758179
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav
DURATION=7452
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %669 : int[] = prim::profile_ivalue(%667)
 does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
  return forward_call(*args, **kwargs)
T= 27.54055142402649
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
NOT USING MARKS FOR DURATION > 30s
[0] PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
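
Side note: the torch.hub UserWarning near the top of the log can be silenced by passing trust_repo explicitly, as the warning itself suggests. A sketch, assuming the standard snakers4/silero-vad hub entry point (not necessarily how transcribeHallu.py loads it):

import torch

# trust_repo=True suppresses the "untrusted repository" UserWarning above.
model, utils = torch.hub.load(
    repo_or_dir="snakers4/silero-vad",
    model="silero_vad",
    trust_repo=True,
)
get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks = utils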

@jsteinberg-rbi
Author

Wowza. I got it working.

@EtienneAb3d
Owner

@jsteinberg-rbi
Demucs should run on the GPU. I think this is not possible with ffmpeg, but perhaps there is an option I'm not aware of, especially for some features.
What did you do to get it working?
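
For Demucs specifically, the device can be forced from Python. A rough sketch, assuming the demucs v4 API; the model name "htdemucs" and the tensor shapes here are illustrative assumptions, not what demucsWrapper.py actually does:

import torch
from demucs.pretrained import get_model
from demucs.apply import apply_model

device = "cuda" if torch.cuda.is_available() else "cpu"
model = get_model("htdemucs")          # assumed model name
# Placeholder input: (batch, channels, samples) at the model's sample rate.
mix = torch.zeros(1, 2, 44100 * 10)
# Run separation on the GPU; yields the drums, bass, other, vocals stems.
sources = apply_model(model, mix, device=device)

As far as I know, ffmpeg's audio filters such as silenceremove and loudnorm run on the CPU only; ffmpeg's GPU support mainly targets video decoding and encoding.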

@jsteinberg-rbi
Author

@EtienneAb3d The file I was testing with initially was a 4GB file and it would just spin forever. When I switched to a 2GB file it ran in under 10 minutes :)
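
In case it's useful to anyone else hitting this: since the smaller file went through fine, one workaround is to split a long input into chunks up front and run the pipeline per chunk. This is my own assumption, not something WhisperHallu does; the sketch below uses ffmpeg's segment muxer, with placeholder file names:

import subprocess

# Hypothetical pre-step: split a long recording into 10-minute chunks.
subprocess.run([
    "ffmpeg", "-y", "-i", "big_input.wav",   # placeholder input name
    "-f", "segment", "-segment_time", "600",
    "-c", "copy",
    "chunk_%03d.wav",
], check=True)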

Question for you: so I ran your script over 30 files last night. Which one of these files has the silence removed?

230821_0020S12.wav
230821_0020S12.wav.WAV.wav
230821_0020S12.wav.WAV.wav.bass.wav
230821_0020S12.wav.WAV.wav.drums.wav
230821_0020S12.wav.WAV.wav.log
230821_0020S12.wav.WAV.wav.other.wav
230821_0020S12.wav.WAV.wav.vocals.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log

@EtienneAb3d
Owner

@jsteinberg-rbi
SILCUT = Silence Cut. The file with the silence removed is 230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav (the vocals stem after the ffmpeg silenceremove + loudnorm pass shown in your log).
