Speech, Language, Audio, Music Processing with Large Language Model
Updated Jun 12, 2024 - Python
Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis
A PyTorch-based Speech Toolkit
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Automated Reproducible Acoustical Analysis
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription
😎 Awesome lists about Speech Emotion Recognition
Scripts for data generation, scoring and data manifest preparation for CHiME-8 DASR task.
Reading list for research topics in multimodal machine learning
Speaker identification on audio files using the pyannote/embedding model.
Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper, via whisper.cpp.
General Speech Restoration
Python library for converting numbers to words for all Indian Languages.
Implementation of a [Librosa](https://github.com/librosa/librosa)-like [STFT](https://en.wikipedia.org/wiki/Short-time_Fourier_transform) using [FFTW](https://www.fftw.org/)
🔉 spafe: Simplified Python Audio Features Extraction