zindi_mcv_swahilli

public word error rate: 0.114294524
private word error rate: 0.112809661
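For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is usually computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the exact normalisation and averaging Zindi applies when scoring are not stated here, so treat this as an illustration rather than the competition's scorer.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, one substituted word in a three-word reference gives a rate of one third.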

How I used SeamlessM4T Large to reach the top 5 of the Mozilla Common Voice competition.

I downloaded only the test.tar.gz archive, then extracted it and resampled all the audio to 16 kHz. I noticed that some of the audio was muffled and of poor quality as provided, because of the sampling rates that were set. The script I used for the conversion is prepare_files.sh.

Follow the instructions to install SeamlessM4T Large. I ran inference on each audio file with python asr.py, and the output was saved to asr_results.csv. It was then converted to the format Zindi requires with python clean_submission.py.
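The contents of prepare_files.sh are not reproduced here, so the following is only a sketch of what a 16 kHz conversion step could look like, assuming ffmpeg is the tool used. The helper names `ffmpeg_resample_cmd` and `plan_resampling`, the `*.mp3` input pattern, and the mono downmix (`-ac 1`) are assumptions for illustration. The code only builds the command lines; pass each one to `subprocess.run` to actually convert.

```python
from pathlib import Path


def ffmpeg_resample_cmd(src: Path, dst: Path, rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that resamples one file to `rate` Hz."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", str(src),    # input audio file
        "-ar", str(rate),  # output sampling rate in Hz
        "-ac", "1",        # assumption: downmix to a single (mono) channel
        str(dst),
    ]


def plan_resampling(src_dir: str, dst_dir: str, pattern: str = "*.mp3") -> list[list[str]]:
    """Return one ffmpeg command per matching file, writing .wav outputs to dst_dir."""
    out = Path(dst_dir)
    return [
        ffmpeg_resample_cmd(path, out / (path.stem + ".wav"))
        for path in sorted(Path(src_dir).glob(pattern))
    ]
```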

You can do all this in one step

make run
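The Makefile itself is not shown here, so as a rough sketch, this is a Python equivalent that chains the three steps in the order described above. It assumes the `run` target does nothing more than call the scripts one after another; the names `PIPELINE` and `run_pipeline` are hypothetical.

```python
import subprocess

# The three steps from the description, in order. The exact invocation used by
# the Makefile (interpreter, arguments) is an assumption.
PIPELINE = [
    ["bash", "prepare_files.sh"],       # resample the audio to 16 kHz
    ["python", "asr.py"],               # transcribe, output saved to asr_results.csv
    ["python", "clean_submission.py"],  # convert the results to the Zindi format
]


def run_pipeline(steps=PIPELINE, runner=subprocess.run):
    """Run each step in order, stopping at the first one that fails."""
    for cmd in steps:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so later steps never see partial output.
        runner(cmd, check=True)
```

Passing a different `runner` makes it possible to dry-run the sequence without executing anything.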

Lesson

Review the Hugging Face leaderboard for ASR models. Look for a model that is both fast and accurate.

leaderboard

Facebook/Meta has released many speech-to-text models. Look for one that supports speech-to-text; the models that focus primarily on one task seem to perform best.

Shuyib/zindi_mcv_swahilli