please create true comparisons with other whisper implementations #162
Hey @BBC-Esq! Thanks for reaching out, I appreciate your interest in the Whisper-JAX project! Unfortunately this repo is more or less archived now, since we stopped working on it back in April. It was a fun project to see how fast we could make Whisper in JAX on TPU v4-8's, but the community is simply more interested in running on GPUs, which means we've switched to focussing on optimisations that can be applied uniformly, independent of hardware (e.g. Distil-Whisper: https://github.com/huggingface/distil-whisper). There are some scripts for reproducing the benchmarks here.
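For readers landing here: the core trick behind Whisper-JAX's TPU speed was compiling the forward pass with XLA (`jax.jit`) and batching it over the data axis, sharded across accelerator cores. A toy sketch of that pattern follows; the `forward` function is a stand-in, not the actual Whisper network:

```python
import jax
import jax.numpy as jnp

# Toy stand-in for a model forward pass (NOT the real Whisper model).
def forward(x):
    w = jnp.ones((4, 2))
    return jnp.tanh(x @ w)

# vmap batches the function over the leading axis of its input;
# jit compiles the batched computation with XLA. On a TPU pod slice,
# jax.pmap would additionally shard the batch across cores - the same
# pattern at device scale.
batched_forward = jax.jit(jax.vmap(forward))

out = batched_forward(jnp.ones((8, 4)))
print(out.shape)  # (8, 2)
```

The first call triggers compilation; subsequent calls with the same input shape reuse the compiled program, which is why fixed-shape batching matters for JAX throughput.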
Hey, thank you. BTW, tell your colleague over at insanely-fast-whisper to change his readme and not dump on other people's work... Moving on... Thank you sincerely for the technical discussion. I'm excited that you're working on Distil-Whisper. I still need to test that! To make sure I understand you: the JAX version is much faster on TPU, but Hugging Face Transformers is much faster on GPUs? I don't have a TPU so...
I forgot to ask: is it true that Distil-Whisper can't handle any language besides English, and that's basically the tradeoff? Looking forward to your work on Distil-Whisper. Definitely in the hopper to try out.
We found Whisper JAX to be faster than Hugging Face Transformers' Whisper. If you're using cloud computing, then swapping from GPUs to TPU v3s on GCP is quite reasonably priced, and you can run transcription super fast there: https://cloud.google.com/tpu/pricing. TPU v4s are what we benchmarked to get the fastest results.
Yes, that's right. Distil-Whisper is English-only since English is the language with the most usage, but we still want to provide checkpoints that support more languages. Distilling Whisper on all the languages it supports in one go is hard - the decoder is very small, so it's difficult for it to have good knowledge of all languages at once. Instead, we're actively encouraging the community to distil language-specific Whisper checkpoints. We've released all the Distil-Whisper training code, along with a comprehensive description of the steps required to perform distillation: https://github.com/huggingface/distil-whisper/tree/main/training. Feel free to ask if you have any questions about Distil-Whisper on the repo! I'd be more than happy to answer.
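The distillation objective being described (a small student decoder trained to match the teacher's output distribution) is, in its simplest form, a temperature-softened KL divergence between teacher and student token distributions. A minimal illustrative sketch of that loss, not the actual Distil-Whisper training code:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits. Higher
    # temperature flattens the distribution, exposing more of the
    # teacher's "dark knowledge" about non-top tokens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on the softened distributions - the
    # standard knowledge-distillation objective.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; diverging logits give positive loss.
print(round(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]), 6))  # 0.0
```

In practice this term is combined with a standard cross-entropy loss on the ground-truth transcriptions, and the hard part (as noted above) is fitting many languages into a small student decoder.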
That's awesome, thanks for the info, very helpful. I noticed you said "Hugging Face Transformers' Whisper"... is that the same as BetterTransformer in a way? BetterTransformer is basically a class/library that Hugging Face created... kind of like Pipeline? I'm learning about Pipeline and how it simplifies things, and about the parameters you can use... My question is: what person (or people) actually, physically, created the batching functionality of the Pipeline upon which insanely-fast-whisper (insert lightning bolt, insert the word "blazingly" a few more times...) relies? It appears that the developer of insanely-fast-whisper single-handedly created that functionality, thus enabling the world to experience insanely fast Whisper for the betterment of mankind. I'd like to know who's actually responsible and/or if it was a team effort over there with you guys. I'd like to know who to follow to keep abreast of the creative and hard work you do... Thanks.
=> now this is all the underlying code that you need to get the reported speed-ups
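For context, the batching speed-up discussed throughout this thread comes from splitting long audio into fixed-length, overlapping chunks that can be transcribed together as one batch (the idea behind the `chunk_length_s`/stride options of the Transformers ASR pipeline). A toy sketch of the chunking step, with plain Python lists standing in for audio samples:

```python
def chunk_audio(audio, chunk_len, stride):
    """Split `audio` (a list of samples) into overlapping windows.

    Each window is up to `chunk_len` samples; consecutive windows
    overlap by 2 * `stride` samples so that tokens near chunk
    boundaries can be reconciled when the transcripts are merged.
    All windows can then be run through the model as one batch.
    """
    step = chunk_len - 2 * stride
    chunks = []
    for start in range(0, len(audio), step):
        chunks.append(audio[start:start + chunk_len])
        if start + chunk_len >= len(audio):
            break
    return chunks

audio = list(range(100))  # stand-in for 100 audio samples
chunks = chunk_audio(audio, chunk_len=30, stride=5)
print(len(chunks))  # 5 overlapping windows
```

This is only the slicing half of the technique; the other half (merging the per-chunk transcripts using the overlap) is where most of the engineering in the real pipeline lives.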
I didn't get a direct response to my main question. Let's try asking again... Did the dude at insanely-fast-whisper actually create any of the code of these underlying technologies? Wink with your left eye if you don't want to answer because y'all work together, or wink with your right eye for no, he didn't actually create any new innovation and it was other people... We'll just keep this between you and me. lol.
You can do all the best open-source work, develop the best open-source models, and have the fastest open-source library. But if no one knows about it, it's useless! In that regard, making these tools more accessible and visible to the open-source community is just as valuable (if not more so) than actually developing them. I don't think we should credit people any more or any less depending on what they've done here. It's a collaborative effort in which we simply want to work with the community to create the best open-source speech technologies possible.
Thanks for the platitudes, but they still didn't answer my question. No worries, I understand... you're in a tight spot with him being a co-worker. I am not, however, in such a situation. If he deserved credit I'd recognize that, but it seems like he's actually done nothing new or innovative whatsoever, so... Anyway, I've spent a lot of my personal time on this, so I'm going to give it a break for a day... Feel free to test out my program if you want, or to re-create the tests I've done. Thanks.
Based on Vaibhavs10/insanely-fast-whisper#82 I'd suggest to have this closed. |
lol, nothing was actually addressed, but go ahead and close if you want, Sanchit.
Rather than re-typing everything, I'm simply providing a link to my issue in the insanely-fast-whisper repository, asking that the same type of information be put in the readme: See Here
Plus, you guys both work on the same team so I'm hoping to work together to get some more accurate/explanatory numbers for people to rely on...
You claim that the JAX implementation is 70x faster... that portion of the readme hasn't been updated in a while, and in the meantime other advancements have been made. Also, it has never been compared against other options like faster-whisper, whisper.cpp, WhisperX, etc. If you would be so kind, please include additional true comparisons, so that when people spend hours possibly revising code they can assess whether it's worth it from a cost-benefit perspective, and be clear on whether "batching" is the source of any speed increase, what the increased VRAM usage is, or whatever.
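For anyone wanting to produce the kind of apples-to-apples numbers requested here, the minimum is a timing harness that excludes warmup (JIT compilation, model loading, cache effects) and reports a best-of-N wall-clock time. A minimal sketch; the two lambdas are illustrative stand-ins, not code from any of the repos discussed:

```python
import time

def benchmark(fn, *args, repeats=3, warmup=1):
    """Time `fn(*args)`, excluding `warmup` untimed runs, and return
    the fastest of `repeats` wall-clock measurements in seconds."""
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return min(times)

# Stand-in "implementations" of the same work, for demonstration only.
fast = lambda: sum(range(10_000))
slow = lambda: sum([i for i in range(100_000)])

print(f"fast: {benchmark(fast):.6f}s, slow: {benchmark(slow):.6f}s")
```

A fair Whisper comparison would additionally pin the audio files, model size, precision, batch size, and hardware, and report peak VRAM alongside wall-clock time, since batching trades memory for speed.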
Thanks, still love the work y'all do!