This SDK generates datasets for training Video LLMs from YouTube videos. More sources coming later!
- Generate search queries with GPT.
- Search for youtube videos for each query using scrapetube.
- Download the videos that were found and subtitles using yt-dlp.
- Detect segments in each video using CLIP and a fancy manual algorithm.
- Generate annotations for each segment with GPT using the audio transcript (e.g. instructions), in 2 steps: first extract clues from the transcript, then generate annotations based on these clues.
- Aggregate segments with annotations into one file.
- Cut segments into separate video clips with ffmpeg.
In the end you'll have a directory with useful video clips and an annotation file, which you can then train a model on.
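The data flow of the steps above can be sketched as follows. Note that every function and class here is an illustrative stand-in, not the SDK's real API — the actual pipeline delegates to GPT, scrapetube, yt-dlp, CLIP, and ffmpeg; this only shows how segments and annotations chain together:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    video_id: str
    start: float  # seconds
    end: float
    annotation: str = ""

def detect_segments(video_id: str) -> list[Segment]:
    # Stand-in for CLIP-based segment detection.
    return [Segment(video_id, 0.0, 5.0), Segment(video_id, 5.0, 12.0)]

def annotate(seg: Segment, transcript: str) -> Segment:
    clues = transcript[:40]  # stand-in for GPT step 1: extract clues
    seg.annotation = f"instruction based on: {clues}"  # stand-in for step 2
    return seg

def build_dataset(video_ids: list[str], transcripts: dict[str, str]) -> list[Segment]:
    # Detect segments per video, then annotate each one from its transcript.
    return [annotate(seg, transcripts[vid])
            for vid in video_ids
            for seg in detect_segments(vid)]
```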
To install, run `pip install -r requirements.txt`. If it doesn't work, try updating the dependencies: `pip install -U -r requirements.txt`.
- make a `.env` file with: `OPENAI_API_KEY` for openai, or `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `OPENAI_API_VERSION='2023-07-01-preview'` for azure
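A minimal sketch of picking up these variables in Python, assuming the `python-dotenv` package is used to read the `.env` file (the `client_kwargs` helper is hypothetical; the variable names are the ones listed above):

```python
import os

# Optional: read the .env file if python-dotenv is installed;
# otherwise rely on variables already exported in the shell.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

def client_kwargs(api_type: str) -> dict:
    """Assemble keyword arguments for an OpenAI or Azure OpenAI client."""
    if api_type == "azure":
        return {
            "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
            "api_key": os.environ["AZURE_OPENAI_API_KEY"],
            "api_version": os.environ.get("OPENAI_API_VERSION", "2023-07-01-preview"),
        }
    return {"api_key": os.environ["OPENAI_API_KEY"]}
```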
- set config params in the notebook:
  - `openai.type`: openai/azure
  - `openai.temperature`: the bigger, the more random/creative the output will be
  - `openai.deployment`: model for openai / deployment for azure. Needs to be able to do structured output and process images. Tested on gpt4o on azure.
  - `data_dir`: the path where all the results will be saved. Change it for each experiment/dataset.
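For example, a notebook cell setting these params might look like this (the parameter names come from the list above; the values are purely illustrative):

```python
# Illustrative config values; adjust per experiment/dataset.
config = {
    "openai.type": "azure",          # "openai" or "azure"
    "openai.temperature": 0.7,       # higher -> more random/creative output
    "openai.deployment": "gpt-4o",   # must support structured output + images
    "data_dir": "data/experiment_1", # new directory per experiment/dataset
}
```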
Please refer to `getting_started.ipynb`.
If you have your own videos with descriptions, you can skip the download/filtering steps and move straight to generating annotations!