Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
Updated Jun 3, 2024
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
faster_whisper GUI with PySide6
Metadata and versioning details for the Common Voice dataset
A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Java library which allows you to retrieve subtitles/transcripts for a YouTube video.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
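FreeSWITCH Mod_FunASR speech recognition module. Built on this module, it detects unallocated numbers, powered-off phones and other abnormal call states, as well as early media audio, with no ASR speech recognition fees.
FreeSWITCH Alibaba Cloud Mod_ASR module, based on Alibaba Cloud's latest SDK 3 from 2024 and proven stable through extensive production testing. Can be used for AI outbound-calling robots.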
Automatic Speech Recognition system for non-native English speakers
Production First and Production Ready End-to-End Speech Recognition Toolkit
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
Synchronized Translation for Videos. Video dubbing
Running a speech-to-text model (whisper.cpp) in Unity3d on your local machine.
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
How to compress a large knowledge base (.mp4, .mp3, .wav) into a short, readable summary for effective knowledge transfer
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)