rudymohammadbali/Real-time-STT

Real-time STT

Real-time Speech To Text using Faster Whisper.

Features

  • Real-time Speech to Text Conversion: Converts spoken language into written text in real-time.
  • Microphone Support: Utilizes the system’s default microphone for audio input.
  • Background Listening: Continuously listens to audio input in the background.
  • Transcription: Transcribes the recorded audio into text.
  • Stopping Mechanism: Provides an option to stop the transcription process at any time.
  • Retrieving Transcription: Allows retrieval of the last transcribed text.
  • Thread Safety: Ensures safe concurrent execution with multiple threads.
  • Logging: Logs important events and messages for debugging and tracking.

Demo

A 13-second audio file generated by AI

stt.mp4

Using audio file:

[0.00s -> 4.56s] A golden sunrise painted the sky, casting a warm glow on the quiet town below.

[5.44s -> 8.32s] The aroma of freshly baked bread wafted through the air.

[9.28s -> 13.52s] The town was waking up, ready to embrace a new day full of possibilities.

Real-time transcription:

You said:  The golden sunrise painted the sky, casting a warm glow on the quiet town below.

You said:  the aroma of freshly baked bread wafted through the air.

You said:  The town was waking up, ready to embrace a new day full of possibilities.

Check the demo folder for the audio files and results.
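The `[start -> end] text` lines in the file-based result match the segment format printed in the faster-whisper README. The sketch below shows how such lines could be produced directly with faster-whisper. It is an illustration, not the script used for this demo: `format_segment` and `transcribe_file` are hypothetical helper names, and the CPU/`int8` defaults are assumptions chosen so it can run without a GPU.

```python
def format_segment(start, end, text):
    """Render one segment as '[0.00s -> 4.56s] text'."""
    return "[%.2fs -> %.2fs] %s" % (start, end, text.strip())


def transcribe_file(path, model_size="base.en", device="cpu", compute_type="int8"):
    """Transcribe an audio file and return one formatted line per segment."""
    # Imported lazily so this module loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(path, language="en")
    # `segments` is a generator; transcription runs as it is consumed.
    return [format_segment(s.start, s.end, s.text) for s in segments]


# Formatting alone, using timings from the demo output above:
print(format_segment(0.0, 4.56, " A golden sunrise painted the sky."))
```

Calling `transcribe_file` with the path to an audio file downloads the model on first use, so expect the first run to take longer.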

Installation

Install Real-time STT manually

  • Python 3.7-3.9 (tested on 3.8)
  • CUDA 11.8
  • CUDA Toolkit 12

  git clone https://github.com/rudymohammadbali/Real-time-STT.git
  cd Real-time-STT
  pip install -r requirements.txt

Install CUDA

# CUDA 11.8
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
# CPU only
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cpu

Download and install CUDA Toolkit 12 from: https://developer.nvidia.com/cuda-downloads

See faster-whisper for more information: https://github.com/SYSTRAN/faster-whisper
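After installing either PyTorch build, it can help to confirm which device is actually usable before loading a model. The following is a minimal sketch: `pick_device` is a hypothetical helper, and pairing `"float16"` with CUDA and `"int8"` with CPU is an assumption based on compute types that faster-whisper accepts, not a setting documented by this project.

```python
def pick_device():
    """Return (device, compute_type) based on whether PyTorch can see a GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; fall back to CPU settings.
        return "cpu", "int8"
    if torch.cuda.is_available():
        return "cuda", "float16"
    return "cpu", "int8"


device, compute_type = pick_device()
print(device, compute_type)
```

The returned pair can then be passed as the `device` and `compute_type` arguments when constructing the transcriber.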

Usage/Examples

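import time

# Assumption: the STT class is defined in a module named stt.py in the repository root;
# adjust this import to match the actual file name.
from stt import STT

try:
    # Load the model and configure the transcriber (GPU with float16 here)
    stt = STT(model_size="base.en", device="cuda", compute_type="float16", language="en", logging_level="INFO")
    stt.listen()  # Start listening in the background

    while stt.is_listening:
        last_transcription = stt.get_last_transcription()  # Get the last transcription
        if len(last_transcription) > 0:
            print("You said: ", last_transcription)
            # If the user said 'stop', end the transcription process by calling stt.stop()
            if "stop" in last_transcription.lower():
                stt.stop()

        time.sleep(1)  # Poll once per second

except KeyboardInterrupt:
    pass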

Contributing

Contributions are always welcome, including:

  • Reporting a bug
  • Discussing the current state of the code
  • Submitting a fix
  • Proposing new features

If you want to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Support

If you'd like to support my ongoing efforts in sharing fantastic open-source projects, you can contribute by making a donation via PayPal.
