The YouTube Transcript Generator is a tool for extracting and processing transcripts from YouTube videos. Whether you want to transcribe lectures, interviews, or any other video content, this project provides a convenient solution.
This tool is particularly useful for:
- Note Taking: Quickly convert YouTube videos into text format for easy note-taking.
- Content Analysis: Analyze and derive insights from video content by converting it into text data.
- Chat Bot Training: Use the generated transcripts as input for chatbots such as ChatGPT, for example as training data or as context for questions about the video.
- Archiving: Create a textual archive of valuable information from YouTube videos. This is especially useful for interviews, tutorials, or any content you want to reference later without re-watching the video.
- Personal Knowledge Base: Build a personal knowledge base by extracting and processing transcripts from YouTube videos. This can aid in consolidating information on diverse topics in a readable and accessible format.
- Accessibility Improvement: Enhance accessibility for individuals who prefer or require text-based content. The tool can be used to generate transcripts with added punctuation, improving the overall readability of the content.
Features:
- Transcription: Obtain raw transcripts from YouTube videos.
- Punctuation: Enhance transcripts by adding punctuation using deep multilingual punctuation models.
- Chapter Detection: Split the transcript into chapters based on the video's chapter timestamps (requires YOUTUBE_API_KEY).
- User-friendly: Easy-to-use script with customizable parameters.
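As a rough illustration of how transcription and chapter detection can fit together, the sketch below groups transcript segments under chapter headings. It assumes each segment is a dict with `text`, `start`, and `duration` keys (the list-of-dicts shape that youtube-transcript-api's `get_transcript` has historically returned) and that chapter start times are already known in seconds. The function `split_into_chapters` is hypothetical and is not the script's actual code.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple, Union

# Assumed segment shape: {"text": str, "start": float, "duration": float}
Segment = Dict[str, Union[str, float]]


def split_into_chapters(
    segments: List[Segment], chapters: List[Tuple[float, str]]
) -> List[Tuple[str, str]]:
    """Group transcript text under chapters.

    `chapters` is a list of (start_seconds, title) pairs sorted by start time.
    Returns (title, text) pairs in chapter order.
    """
    if not chapters:
        # No chapter data (e.g. no API key): return one untitled block.
        return [("", " ".join(str(s["text"]).strip() for s in segments))]

    starts = [start for start, _ in chapters]
    grouped: List[List[str]] = [[] for _ in chapters]
    for seg in segments:
        # Last chapter whose start time is <= the segment's start time;
        # segments before the first chapter fall into the first one.
        idx = max(bisect_right(starts, float(seg["start"])) - 1, 0)
        grouped[idx].append(str(seg["text"]).strip())

    return [(title, " ".join(texts)) for (_, title), texts in zip(chapters, grouped)]
```

Binary search (`bisect_right`) keeps the lookup cheap even for long videos with many chapters, since segments and chapters are both ordered by time.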
Environment variables:
- YOUTUBE_API_KEY: Your Google API key, used to retrieve video information. To get one, create a project in Google Cloud and enable the YouTube Data API v3. This variable is optional; if it is not set, chapters will not be added to the transcript.
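The API key is what allows video information to be fetched. This sketch assumes chapters are read from timestamp lines (such as "0:00 Intro") in the video description returned by the YouTube Data API v3 `videos.list` method with `part=snippet`; the helper names `video_info_url` and `parse_chapters` are hypothetical, and the script may obtain chapters differently. The network request itself is left out so the example runs offline.

```python
import re
from typing import List, Tuple
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 videos.list method.
VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"

# A line starting with H:MM:SS or M:SS, optionally followed by "-" or ":", then a title.
TIMESTAMP_LINE = re.compile(r"^\s*(?:(\d+):)?(\d{1,2}):(\d{2})\s*[-:]?\s*(.+)$")


def video_info_url(video_id: str, api_key: str) -> str:
    """Build a videos.list request that returns the video's snippet (title, description, ...)."""
    query = urlencode({"part": "snippet", "id": video_id, "key": api_key})
    return f"{VIDEOS_ENDPOINT}?{query}"


def parse_chapters(description: str) -> List[Tuple[int, str]]:
    """Return (start_seconds, title) pairs for timestamp lines in a video description."""
    chapters = []
    for line in description.splitlines():
        match = TIMESTAMP_LINE.match(line)
        if match:
            hours, minutes, seconds, title = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            chapters.append((start, title.strip()))
    return chapters
```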
When running the script locally, you can pass these parameters:
- url: YouTube video URL (positional argument)
- -h, --help: Show the help message and exit
- -l LANGUAGE, --language LANGUAGE: Language for the transcript (default: en)
- -p, --punctuated: Generate a punctuated transcript (default: False)
- -a, --auto-open: Automatically open the transcript in the default app (default: False)
- -o OUTPUT_DIR, --output_dir OUTPUT_DIR: Output directory for saving the transcript (default: current directory)
- -f FILENAME, --filename FILENAME: Filename for saving the transcript (default: video title or video ID)
- -m PUNCTUATION_MODEL, --punctuation_model PUNCTUATION_MODEL: Path to the punctuation model (default: None)
- -v, --verbose: Print verbose output (default: False)
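For reference, these options map naturally onto Python's standard `argparse` module. The following is a sketch of how such a parser could be declared, not a copy of index.py; in particular, using "." for the default output directory and the function name `build_parser` are assumptions. `-h/--help` is added by argparse automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a CLI mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Generate a transcript for a YouTube video.")
    parser.add_argument("url", help="YouTube video URL")
    parser.add_argument("-l", "--language", default="en", help="Language for the transcript")
    parser.add_argument("-p", "--punctuated", action="store_true", help="Generate a punctuated transcript")
    # argparse stores --auto-open as args.auto_open
    parser.add_argument("-a", "--auto-open", action="store_true", help="Open the transcript in the default app")
    parser.add_argument("-o", "--output_dir", default=".", help="Output directory (assumed default: current directory)")
    parser.add_argument("-f", "--filename", default=None, help="Filename (falls back to video title or ID)")
    parser.add_argument("-m", "--punctuation_model", default=None, help="Punctuation model to use")
    parser.add_argument("-v", "--verbose", action="store_true", help="Print verbose output")
    return parser
```

Calling `build_parser().parse_args()` with no arguments reads `sys.argv`, so the same parser serves both the terminal and tests that pass an explicit argument list.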
To run this project in Google Colab, follow these steps:
- Open the Google Colab Notebook.
- Add your Google Cloud project's API key to the Secrets tab under the name YOUTUBE_API_KEY, and toggle notebook access on.
- Go to Runtime > Change runtime type and select the T4 GPU. If you use a CPU instead, generating the punctuated transcript will take several minutes (around 1 minute per 10 minutes of video).
- Edit the values in the second cell, such as the video URL.
- Press CTRL+F9 or CMD+F9 to run the notebook.
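Inside Colab, a secret stored in the Secrets tab can be read with `userdata.get` from `google.colab`. The hypothetical helper below tries that first and falls back to an ordinary environment variable, so the same code also works outside Colab; it is a sketch, not the notebook's actual cell.

```python
import os
from typing import Optional


def get_youtube_api_key() -> Optional[str]:
    """Return YOUTUBE_API_KEY from Colab secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            return userdata.get("YOUTUBE_API_KEY")
        except Exception:
            # Colab raises if the secret is missing or notebook access is off;
            # fall through to the environment variable instead.
            pass

    # None means no key: chapters would simply be skipped.
    return os.environ.get("YOUTUBE_API_KEY")
```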
I do not recommend running locally, because it downloads model weights and other dependencies totaling over 6 GB. If you still want to, follow these steps:
- Clone the repository:
git clone https://github.com/therohitdas/Youtube-Transcript-Generator.git && cd Youtube-Transcript-Generator
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
source venv/bin/activate (Linux/macOS)
venv\Scripts\activate (Windows)
- Install dependencies:
pip install -r requirements.txt
- Set up the environment variables. YOUTUBE_API_KEY is optional; you can either define it in a .env file or set it as an environment variable on your system.
- Run the script:
python index.py <YouTube_URL>
or, for the help menu:
python index.py -h
To report a bug or request a feature, please open an issue.
Here are some examples of running the script with various options:
python index.py https://www.youtube.com/watch?v=VIDEO_ID
python index.py https://www.youtube.com/watch?v=VIDEO_ID -l fr
python index.py https://www.youtube.com/watch?v=VIDEO_ID -p
python index.py https://www.youtube.com/watch?v=VIDEO_ID -o /path/to/output
python index.py https://www.youtube.com/watch?v=VIDEO_ID -f custom_filename
python index.py https://www.youtube.com/watch?v=VIDEO_ID -v
python index.py https://www.youtube.com/watch?v=VIDEO_ID -m author/model_name
The punctuation model name for -m can be taken from here.
Make sure to replace https://www.youtube.com/watch?v=VIDEO_ID with the actual URL of the YouTube video you want to process.
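What actually identifies the video is the ID inside the URL. As a sketch of how that ID could be pulled out of common URL forms (watch, youtu.be, shorts, embed, live), here is a hypothetical standard-library helper; the script's own URL handling may differ or accept fewer forms.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL forms, or None if unrecognized."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/shorts/", "/embed/", "/live/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None

    return None
```

Extra query parameters such as `&t=10s` are ignored, since only the `v` parameter (or the first path segment) is read.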
Feel free to copy and paste these examples into your terminal.
This script uses the youtube-transcript-api library and the fullstop-punctuation-multilang-large model. Special thanks to their contributors.
Feel free to adapt and use the script based on your requirements. Enjoy the convenience of YouTube transcript processing!
The best way to reach me is by email: namaste@theRohitDas.com
🚀 Happy transcribing!