
Create transcripts of YouTube videos in your favorite language. Use them for note-taking, chatting with YouTube videos, and as training data.


YouTube Transcript Generator


Overview 🌐

The YouTube Transcript Generator is a Python script that extracts transcripts from YouTube videos and can optionally add punctuation and chapter headings. Whether you want to transcribe lectures, interviews, or any other video content, it gives you the text with a single command.

How It Can Help 🚀

This tool is particularly useful for:

  • Note Taking: Quickly convert YouTube videos into text format for easy note-taking.
  • Content Analysis: Analyze and derive insights from video content by converting it into text data.
  • Chatbot Training: Use the generated transcripts as training or context data for chatbots built on language models such as ChatGPT.
  • Archiving: Create a textual archive of valuable information from YouTube videos. This can be particularly useful for archiving interviews, tutorials, or any content you'd like to reference later without the need to re-watch the video.
  • Personal Knowledge Base: Build a personal knowledge base by extracting and processing transcripts from YouTube videos. This can aid in consolidating information on diverse topics in a readable and accessible format.
  • Accessibility Improvement: Enhance accessibility for individuals who prefer or require text-based content. The tool can be used to generate transcripts with added punctuation, improving the overall readability of the content.

Features 🛠️

  • Transcription: Obtain raw transcripts from YouTube videos.
  • Punctuation: Enhance transcripts by adding punctuation using deep multilingual punctuation models.
  • Chapter Detection: Identify and separate chapters in the video based on the video's chapter timestamps (requires the optional YOUTUBE_API_KEY).
  • User-friendly: Easy-to-use script with customizable parameters.
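Chapter detection can be sketched as follows. This is a minimal illustration, not the project's code: it assumes chapters come from "m:ss Title" or "h:mm:ss Title" lines in the video description (the way creators define YouTube chapters) and that transcript segments are dicts with "text" and "start" (seconds) keys, as older releases of youtube-transcript-api return them. Chapter, parse_chapters and group_by_chapter are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical pattern for description lines such as "0:00 Intro" or "1:02:03 - Q&A".
CHAPTER_LINE = re.compile(r"^\s*(?:(\d+):)?(\d{1,2}):(\d{2})\s*-?\s*(.+?)\s*$")


@dataclass
class Chapter:
    start: int  # chapter start time in seconds
    title: str
    lines: List[str] = field(default_factory=list)


def parse_chapters(description: str) -> List[Chapter]:
    """Collect timestamped lines from a video description, sorted by start time."""
    chapters = []
    for line in description.splitlines():
        match = CHAPTER_LINE.match(line)
        if match:
            hours, minutes, seconds, title = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            chapters.append(Chapter(start, title))
    return sorted(chapters, key=lambda chapter: chapter.start)


def group_by_chapter(segments: List[Dict], chapters: List[Chapter]) -> List[Chapter]:
    """Attach each segment's text to the last chapter that starts at or before it."""
    if not chapters:
        # No chapters found: keep everything in one section.
        return [Chapter(0, "Transcript", [seg["text"] for seg in segments])]
    for seg in segments:
        current = chapters[0]  # segments before the first timestamp land here
        for chapter in chapters:
            if chapter.start <= seg["start"]:
                current = chapter
            else:
                break
        current.lines.append(seg["text"])
    return chapters
```

For example, with the description "0:00 Intro\n1:30 Setup", a segment starting at 95.2 seconds is grouped under "Setup". The regex is deliberately loose, so an unrelated description line that happens to start with a time (such as "10:30am") would also be read as a chapter.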

Environment Variables 🌐

  • YOUTUBE_API_KEY: Your Google API key, used to retrieve video information. To get one, create a project in Google Cloud and enable the YouTube Data API v3 for it. This variable is optional; if you don't set it, chapters will not be added to the transcript.
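The key is used to look up video details such as the title and description. As a sketch only, assuming the standard videos.list endpoint of the YouTube Data API v3 with part=snippet (whose response includes the title and description), the request could be built with the standard library like this. build_video_info_url and fetch_snippet are hypothetical names; the project itself may use a Google client library instead.

```python
import json
import os
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_video_info_url(video_id: str, api_key: Optional[str] = None) -> Optional[str]:
    """Return a videos.list request URL, or None when no API key is configured."""
    api_key = api_key or os.environ.get("YOUTUBE_API_KEY")
    if not api_key:
        return None  # without a key, the chapter lookup is simply skipped
    query = urlencode({"part": "snippet", "id": video_id, "key": api_key})
    return f"{VIDEOS_ENDPOINT}?{query}"


def fetch_snippet(video_id: str) -> Optional[dict]:
    """Fetch the video's snippet (title, description, ...) or None if unavailable."""
    url = build_video_info_url(video_id)
    if url is None:
        return None
    with urlopen(url, timeout=10) as response:
        data = json.load(response)
    items = data.get("items", [])
    return items[0]["snippet"] if items else None
```

Returning None instead of raising when the key is missing mirrors the documented behavior: the transcript is still produced, just without chapters.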

Script Parameters 📜

When running the script locally, you can pass these parameters to the script:

Positional Argument:

  • url: YouTube video URL
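Transcript libraries such as youtube-transcript-api work with a video ID rather than a full URL, so the URL has to be reduced to its ID first. A minimal sketch, assuming the common watch, youtu.be, shorts, embed and live URL forms; extract_video_id is a hypothetical helper, not the script's actual function.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL forms, or None if unrecognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment: https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/shorts/", "/embed/", "/live/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None
```

Extra query parameters such as a timestamp (&t=42s) or a share token (?si=...) are ignored.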

Optional Arguments:

  • -h, --help: Show the help message and exit
  • -l LANGUAGE, --language LANGUAGE: Language for the transcript (default: en)
  • -p, --punctuated: Generate punctuated transcript (default: False)
  • -a, --auto-open: Automatically open the transcript in the default app (default: False)
  • -o OUTPUT_DIR, --output_dir OUTPUT_DIR: Output directory for saving the transcript (default: current directory)
  • -f FILENAME, --filename FILENAME: Filename for saving the transcript (default: Video Title or Video Id)
  • -m PUNCTUATION_MODEL, --punctuation_model PUNCTUATION_MODEL: Path to the punctuation model (default: None)
  • -v, --verbose: Print verbose output (default: False)
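The options above map naturally onto Python's argparse. The parser below is a hypothetical reconstruction from this list, not a copy of index.py; in particular, the spelling of the auto-open long option is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented options (-h/--help is added automatically)."""
    parser = argparse.ArgumentParser(description="Generate a transcript for a YouTube video.")
    parser.add_argument("url", help="YouTube video URL")
    parser.add_argument("-l", "--language", default="en", help="Language for the transcript")
    parser.add_argument("-p", "--punctuated", action="store_true",
                        help="Generate punctuated transcript")
    # Assumption: the long form is spelled --auto-open (stored as args.auto_open).
    parser.add_argument("-a", "--auto-open", action="store_true",
                        help="Automatically open the transcript in the default app")
    parser.add_argument("-o", "--output_dir", default=".",
                        help="Output directory for saving the transcript")
    parser.add_argument("-f", "--filename", default=None,
                        help="Filename for saving the transcript (default: video title or ID)")
    parser.add_argument("-m", "--punctuation_model", default=None,
                        help="Punctuation model to use")
    parser.add_argument("-v", "--verbose", action="store_true", help="Print verbose output")
    return parser
```

With this sketch, build_parser().parse_args(["https://www.youtube.com/watch?v=VIDEO_ID", "-l", "fr", "-p"]) yields language "fr" and punctuated True, with every other option at its default.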

Run in Google Colab 🚀

To run this project in Google Colab, follow these steps:

  1. Open the Google Colab Notebook.
  2. Add your Google Cloud project's API key in the Secrets tab under the name YOUTUBE_API_KEY, and toggle Notebook access on.
  3. Go to Runtime > Change runtime type and select the T4 GPU. On a CPU runtime, generating a punctuated transcript is slower, taking around 1 minute per 10 minutes of video.
  4. In the second cell, set the values you want to use, such as the video URL.
  5. Press Ctrl+F9 (Cmd+F9 on macOS) to run all cells.
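Inside Colab, a secret stored this way is read with google.colab.userdata.get. The helper below is a hypothetical sketch, not the notebook's actual code. It falls back to an ordinary environment variable so the same function also works outside Colab, and it returns None when no key is available, in which case chapters are skipped.

```python
import os
from typing import Optional


def get_youtube_api_key() -> Optional[str]:
    """Prefer the Colab secret; fall back to the YOUTUBE_API_KEY environment variable."""
    try:
        from google.colab import userdata  # importable only inside Colab
    except ImportError:
        return os.environ.get("YOUTUBE_API_KEY") or None
    try:
        return userdata.get("YOUTUBE_API_KEY")
    except Exception:  # e.g. the secret is missing or notebook access is off
        return None
```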

Run Locally 💻

I do not recommend running locally, as it downloads model weights (tensors) and other dependencies totaling over 6 GB. If you still want to, follow these steps:

  1. Clone the repository: git clone https://github.com/therohitdas/Youtube-Transcript-Generator.git && cd Youtube-Transcript-Generator
  2. Create a virtual environment: python -m venv venv
  3. Activate the virtual environment: source venv/bin/activate (Linux/MacOS) or venv\Scripts\activate (Windows)
  4. Install dependencies: pip install -r requirements.txt
  5. Set up the optional YOUTUBE_API_KEY environment variable, either in a .env file or as an environment variable on your system.
  6. Run the script: python index.py <YouTube_URL> or python index.py -h for the help menu.
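A .env file holds one KEY=VALUE pair per line, for example YOUTUBE_API_KEY=your_key. Python projects commonly load such files with python-dotenv's load_dotenv(), but whether this project does so is an assumption. The stdlib-only sketch below (parse_env_lines and load_env_file are hypothetical helpers) shows what that loading amounts to. Variables already set in the environment take precedence over values from the file.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_lines(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines, # comments and malformed lines are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Copy values from a .env file into os.environ without overriding existing variables."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env_lines(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```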

Support 🤝

For any issues or feature requests, please create an issue.

Example 📋

Here are some examples of how to run the script with various options:

Basic Usage

python index.py https://www.youtube.com/watch?v=VIDEO_ID

Specify the Language

python index.py https://www.youtube.com/watch?v=VIDEO_ID -l fr

Generate a Raw Transcript (Default, Without -p)

python index.py https://www.youtube.com/watch?v=VIDEO_ID

Generate a Punctuated Transcript

python index.py https://www.youtube.com/watch?v=VIDEO_ID -p

Specify the Output Directory

python index.py https://www.youtube.com/watch?v=VIDEO_ID -o /path/to/output

Specify a Custom Filename

python index.py https://www.youtube.com/watch?v=VIDEO_ID -f custom_filename

Enable Verbose Mode

python index.py https://www.youtube.com/watch?v=VIDEO_ID -v

Specify a Punctuation Model

python index.py https://www.youtube.com/watch?v=VIDEO_ID -m author/model_name

Punctuation model names (in author/model_name form) can be found on the Hugging Face Hub.

Make sure to replace https://www.youtube.com/watch?v=VIDEO_ID with the actual URL of the YouTube video you want to process.

Feel free to copy and paste these examples into your terminal.
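To process several videos in one go, the same commands can be driven from a small Python wrapper. This is a sketch, assuming it runs from the repository root with the virtual environment active. build_command and run_batch are hypothetical helpers, not part of the project, and only the documented -l, -p and -o options are used.

```python
import subprocess
import sys
from typing import Iterable, List, Optional


def build_command(url: str, language: str = "en", punctuated: bool = False,
                  output_dir: Optional[str] = None) -> List[str]:
    """Assemble the index.py invocation for one video."""
    cmd = [sys.executable, "index.py", url, "-l", language]
    if punctuated:
        cmd.append("-p")
    if output_dir:
        cmd += ["-o", output_dir]
    return cmd


def run_batch(urls: Iterable[str], **options) -> None:
    """Run the script once per URL, stopping at the first failure."""
    for url in urls:
        subprocess.run(build_command(url, **options), check=True)
```

For example, run_batch(["https://www.youtube.com/watch?v=VIDEO_ID"], punctuated=True, output_dir="transcripts") runs the punctuated variant for each URL and writes the results to a transcripts folder. Passing the arguments as a list rather than a shell string avoids quoting problems with the ? and & characters in YouTube URLs.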

Acknowledgments 🙌

This script uses the youtube-transcript-api library and the fullstop-punctuation-multilang-large punctuation model. Special thanks to their contributors.

Feel free to adapt and use the script based on your requirements. Enjoy the convenience of YouTube transcript processing!

Connect with me 📧

The best way to connect is to email me at namaste@theRohitDas.com.

🚀 Happy transcribing!
