Real-time STT

Real-time Speech To Text using Faster Whisper.

Features

Real-time Speech to Text Conversion: Converts spoken language into written text in real-time.
Microphone Support: Utilizes the system’s default microphone for audio input.
Background Listening: Continuously listens to audio input in the background.
Transcription: Transcribes the recorded audio into text.
Stopping Mechanism: Provides an option to stop the transcription process at any time.
Retrieving Transcription: Allows retrieval of the last transcribed text.
Thread Safety: Ensures safe concurrent execution with multiple threads.
Logging: Logs important events and messages for debugging and tracking.
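The background-listening and thread-safety features above can be pictured with a small sketch. This is not the repository's implementation: the class name TranscriptionBuffer and its set/get methods are hypothetical, and the worker thread only stands in for a real listener. It shows one common way to let a background thread publish text that the main thread reads safely.

```python
import threading


class TranscriptionBuffer:
    """Hypothetical holder for the most recent transcription, shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = ""

    def set(self, text):
        # Called by the background listener thread when new text is ready.
        with self._lock:
            self._last = text

    def get(self):
        # Called by the main thread; returns the latest stored text.
        with self._lock:
            return self._last


buffer = TranscriptionBuffer()

# Stand-in for a background listener: it only writes one fixed string.
worker = threading.Thread(target=buffer.set, args=("hello world",), daemon=True)
worker.start()
worker.join()

print(buffer.get())
```

The lock keeps each read and write atomic, so the main loop never sees a partially updated value while the listener is writing.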

Demo

A 13-second audio file generated by AI

stt.mp4

Using audio file:

[0.00s -> 4.56s] A golden sunrise painted the sky, casting a warm glow on the quiet town below.

[5.44s -> 8.32s] The aroma of freshly baked bread wafted through the air.

[9.28s -> 13.52s] The town was waking up, ready to embrace a new day full of possibilities.

Real-time transcription:

You said:  The golden sunrise painted the sky, casting a warm glow on the quiet town below.

You said:  the aroma of freshly baked bread wafted through the air.

You said:  The town was waking up, ready to embrace a new day full of possibilities.

See the demo folder for the audio files and results.
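The timestamped lines in the audio-file output follow a "[start -> end] text" pattern. A minimal sketch of producing that layout is below. The Segment dataclass is a stand-in defined here for illustration, assuming each transcribed segment exposes start, end, and text values (as faster-whisper segments do), and format_segment is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Stand-in with the three fields that appear in the demo output.
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # transcribed text for this segment


def format_segment(segment):
    # Render one segment as "[0.00s -> 4.56s] text", trimming stray whitespace.
    return "[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text.strip())


segments = [
    Segment(0.0, 4.56, " A golden sunrise painted the sky."),
    Segment(5.44, 8.32, " The aroma of freshly baked bread wafted through the air."),
]

for segment in segments:
    print(format_segment(segment))
```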

Installation

Install Real-time STT manually

Requirements:

Python 3.7-3.9 (tested on 3.8)
PyTorch built for CUDA 11.8 (or the CPU-only build)
CUDA Toolkit 12

  git clone https://github.com/rudymohammadbali/Real-time-STT.git
  cd Real-time-STT
  pip install -r requirements.txt

Install PyTorch (CUDA 11.8 or CPU-only build)

# CUDA 11.8
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
# CPU only
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cpu

Download and install CUDA Toolkit 12 from: https://developer.nvidia.com/cuda-downloads

See faster-whisper for more information: https://github.com/SYSTRAN/faster-whisper
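After installation, it can help to confirm whether the CUDA build of PyTorch actually sees a GPU before choosing the device and compute_type values. The sketch below is an assumption-laden helper, not part of this repository: pick_device is a hypothetical name, and pairing "cpu" with "int8" is one choice among the compute types faster-whisper documents.

```python
def pick_device():
    """Return a (device, compute_type) pair based on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so fall back to CPU settings.
        return "cpu", "int8"

    if torch.cuda.is_available():
        # A CUDA-capable GPU and a matching PyTorch build were found.
        return "cuda", "float16"

    # PyTorch is installed but no usable GPU was detected.
    return "cpu", "int8"


device, compute_type = pick_device()
print("device:", device, "compute_type:", compute_type)
```

If this prints "cpu" on a machine with an NVIDIA GPU, the CPU-only PyTorch wheel may have been installed instead of the cu118 one.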

Usage/Examples

import time

from main import STT  # assumption: the STT class is defined in this repository's main.py

try:
    stt = STT(model_size="base.en", device="cuda", compute_type="float16", language="en", logging_level="INFO")
    stt.listen()  # Start listening in the background

    while stt.is_listening:
        last_transcription = stt.get_last_transcription()  # Get the last transcription
        if len(last_transcription) > 0:
            print("You said: ", last_transcription)
            # If the user said "stop", end the transcription process
            if "stop" in last_transcription.lower():
                stt.stop()

        time.sleep(1)  # Poll once per second

except KeyboardInterrupt:
    pass  # Exit quietly on Ctrl+C
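One caveat with the example above: the substring check "stop" in last_transcription.lower() also matches words that merely contain those letters, such as "unstoppable". A possible alternative, sketched here with a hypothetical helper named wants_stop, matches "stop" only as a whole word.

```python
import re

# \b marks word boundaries, so only the standalone word "stop" matches.
STOP_PATTERN = re.compile(r"\bstop\b", re.IGNORECASE)


def wants_stop(transcription):
    # True only when the transcription contains "stop" as a separate word.
    return STOP_PATTERN.search(transcription) is not None


for phrase in [" Please stop.", " That was unstoppable.", " STOP"]:
    print(repr(phrase), wants_stop(phrase))
```

In the loop above, the condition could then be written as if wants_stop(last_transcription): followed by the same stt.stop() call.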

Contributing

Contributions are always welcome, including:

Reporting a bug
Discussing the current state of the code
Submitting a fix
Proposing new features

If you want to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Support

If you'd like to support my ongoing efforts in sharing fantastic open-source projects, you can contribute by making a donation via PayPal.
