
lexust1/av2txtsum


av2txtsum

This repository explores automatic speech recognition (ASR) with several open-source tools.

Repository Structure

The repository includes the following .ipynb files:

This notebook outlines the primary goals and objectives of the analysis. It explains how to download a video, extract its audio, and convert a text transcript to SRT format, and it describes the main tools and libraries used for transcription.
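An SRT file needs start and end times for every line, so converting a transcript to SRT presupposes timestamped segments. The sketch below assumes the transcript is already split into segments with times in seconds. The `Segment` class and the function names are hypothetical and are not taken from the notebook. Each SRT cue consists of a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank line.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span (hypothetical structure): start/end in seconds plus text."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # Joining with "\n" adds the blank line that separates consecutive cues.
    return "\n".join(blocks)
```

For example, `segments_to_srt([Segment(0.0, 2.5, "Hello.")])` produces a single cue numbered 1 that spans `00:00:00,000 --> 00:00:02,500`.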

Open source project: whisper.cpp (based on OpenAI Whisper)

This notebook presents the results of transcribing with whisper.cpp, a C/C++ port of OpenAI Whisper.
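The whisper.cpp command-line tool reads 16 kHz mono 16-bit WAV input. A typical pipeline therefore converts the source file with ffmpeg first and then runs the whisper.cpp binary. The sketch below only builds the argument lists. The binary name (`./main` in older builds, `whisper-cli` in newer ones), the model path, and the helper names are assumptions, not the notebook's exact setup.

```python
from pathlib import Path


def ffmpeg_to_wav16k_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg command converting an audio/video file to 16 kHz mono PCM WAV.
    -y overwrites dst if it already exists."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]


def whisper_cpp_cmd(wav: str, model: str, language: str = "auto",
                    binary: str = "./main") -> list[str]:
    """whisper.cpp CLI command that writes an .srt file next to the WAV.
    `binary` and `model` are placeholders for a local build and a downloaded ggml model."""
    out_base = str(Path(wav).with_suffix(""))  # -of expects a path without extension
    return [binary, "-m", model, "-f", wav, "-l", language, "-osrt", "-of", out_base]
```

You can pass each list to `subprocess.run(cmd, check=True)`. Building argument lists rather than shell strings avoids quoting problems with file names that contain spaces.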

Open source project: SeamlessM4T

This notebook presents the results of transcribing with SeamlessM4T.
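The following sketch runs speech-to-text with SeamlessM4T through the Hugging Face `transformers` integration. It assumes that `transformers` and `torch` are installed and that the audio is already a 1-D float array sampled at 16 kHz. The checkpoint `facebook/hf-seamless-m4t-medium` and the `audios=` keyword follow the transformers documentation from when SeamlessM4T support was added. Newer releases may rename these arguments, and the notebook may load the model differently.

```python
def transcribe_with_seamless(audio_16k, tgt_lang: str = "eng",
                             model_id: str = "facebook/hf-seamless-m4t-medium") -> str:
    """Transcribe (or translate) 16 kHz speech to text in `tgt_lang` with SeamlessM4T.

    Imports live inside the function so this module loads without transformers installed.
    """
    from transformers import AutoProcessor, SeamlessM4TForSpeechToText

    processor = AutoProcessor.from_pretrained(model_id)
    model = SeamlessM4TForSpeechToText.from_pretrained(model_id)
    # Feature extraction expects the sampling rate the array was recorded/resampled at.
    inputs = processor(audios=audio_16k, sampling_rate=16000, return_tensors="pt")
    tokens = model.generate(**inputs, tgt_lang=tgt_lang)  # (batch, seq) token ids
    return processor.batch_decode(tokens, skip_special_tokens=True)[0]
```

When `tgt_lang` matches the spoken language, the output is a transcription. When it differs, SeamlessM4T translates the speech instead.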

Open source project: faster-whisper (based on OpenAI Whisper)

This notebook presents the results of transcribing with faster-whisper, a reimplementation of OpenAI Whisper built on CTranslate2.
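The following sketch shows transcription with faster-whisper and assumes the `faster-whisper` package is installed. `WhisperModel` and its `transcribe` method are the library's documented entry points. The model size and the `device`, `compute_type`, and `beam_size` values are illustrative assumptions rather than the notebook's settings, and the helper names are hypothetical.

```python
from typing import Iterable


def segments_to_rows(segments: Iterable) -> list[tuple[float, float, str]]:
    """Collect (start, end, text) from segment objects with .start/.end/.text.
    faster-whisper yields segments lazily, so iterating here runs the decoding."""
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]


def transcribe_with_faster_whisper(path: str, model_size: str = "base",
                                   device: str = "cpu", compute_type: str = "int8"):
    """Transcribe an audio file; returns (rows, detected_language_code)."""
    from faster_whisper import WhisperModel  # imported lazily: optional dependency

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, info = model.transcribe(path, beam_size=5)
    return segments_to_rows(segments), info.language
```

`transcribe` returns a lazy generator, so no audio is processed until `segments_to_rows` iterates over it. The resulting (start, end, text) rows carry the timestamps that an SRT file needs.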

The repository also contains two folders:

  • The data folder contains the transcripts. MP3, WAV, and MP4 files are not included because of their size, but you can regenerate them by following the steps in the .ipynb files.
  • The utils folder contains Python helper modules. This code was moved out of the .ipynb files to keep the notebooks uncluttered, and the notebooks link to these files.