This project was conducted as part of the Practical Course: Applied Foundation Models in Computer Vision at the Computer Vision Group of Professor Daniel Cremers. The focus is in-video search using text prompts, applied to university lectures: long, inherently multi-modal recordings that make a natural use case for this technology. We developed a full pipeline with a visual interface.
The pipeline consists of the following key steps:
- Video Preprocessing: downloading YouTube lecture videos (~1 hour each) and automatically extracting keyframes (~80-100 per lecture); see the first sketch after this list.
- Information Extraction: the keyframes are processed to extract different modalities using the following foundation models (see the second sketch after this list):
  - Whisper for audio transcription
  - LLaVA as a Vision Language Model (VLM) for image captioning
  - an Optical Character Recognition (OCR) library for text extraction
- Information Aggregation: the extracted information is summarized per keyframe using the Llama 3-7b model.
- Data Embedding: the keyframe summaries are embedded into vector representations for semantic search.
- Data Retrieval: keyframes are matched against user prompts and the results are displayed in a graphical user interface (GUI); see the retrieval sketch after this list.
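A minimal sketch of the preprocessing step, assuming yt-dlp for downloading and FFmpeg's scene-change filter for keyframe extraction; the file paths, format flag, and scene threshold below are illustrative and not taken from the project's notebooks:

```python
import subprocess
from pathlib import Path

def download_lecture(url: str, out_path: str = "lecture.mp4") -> str:
    # yt-dlp fetches the lecture video from YouTube and writes it to out_path
    subprocess.run(["yt-dlp", "-f", "mp4", "-o", out_path, url], check=True)
    return out_path

def extract_keyframes(video: str, out_dir: str = "keyframes", threshold: float = 0.3) -> None:
    # FFmpeg's select filter keeps only frames that differ strongly from the
    # previous one; for a ~1 hour lecture this yields roughly 80-100 keyframes.
    Path(out_dir).mkdir(exist_ok=True)
    subprocess.run(
        [
            "ffmpeg", "-i", video,
            "-vf", f"select='gt(scene,{threshold})'",
            "-vsync", "vfr",
            f"{out_dir}/frame_%04d.jpg",
        ],
        check=True,
    )

if __name__ == "__main__":
    video = download_lecture("https://www.youtube.com/watch?v=<lecture-id>")
    extract_keyframes(video)
```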
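The extraction and aggregation steps could look roughly like the sketch below, assuming the openai-whisper, pytesseract, and ollama Python packages; the prompts, model tags, and function names are illustrative rather than copied from the repository:

```python
import ollama
import pytesseract
import whisper
from PIL import Image

def transcribe_audio(video_path: str) -> str:
    # Whisper transcribes the lecture's audio track
    model = whisper.load_model("base")
    return model.transcribe(video_path)["text"]

def caption_keyframe(image_path: str) -> str:
    # LLaVA, served locally by Ollama, describes the visual content of the keyframe
    response = ollama.chat(
        model="llava",
        messages=[{
            "role": "user",
            "content": "Describe the content of this lecture slide.",
            "images": [image_path],
        }],
    )
    return response["message"]["content"]

def ocr_keyframe(image_path: str) -> str:
    # Tesseract extracts the on-slide text (titles, bullet points, formulas as text)
    return pytesseract.image_to_string(Image.open(image_path))

def summarize_keyframe(caption: str, slide_text: str, transcript_chunk: str) -> str:
    # A local Llama 3 model aggregates the three modalities into one searchable summary
    prompt = (
        "Summarize this lecture keyframe for semantic search.\n"
        f"Caption: {caption}\nSlide text: {slide_text}\nTranscript: {transcript_chunk}"
    )
    response = ollama.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]
```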
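For embedding and retrieval, one possible shape is sketched below, assuming sentence-transformers for the embeddings; the embedding model, the example summaries, and the in-memory storage are placeholders for whatever the notebooks actually use:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# One summary string per keyframe, produced by the aggregation step above
keyframe_summaries = [
    "Slide introducing convolutional layers and weight sharing ...",
    "Derivation of the backpropagation update for a linear layer ...",
]
corpus_embeddings = model.encode(keyframe_summaries, convert_to_tensor=True)

def search(query: str, top_k: int = 3):
    # Embed the user prompt and rank keyframes by cosine similarity
    query_embedding = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)[0]
    return [(keyframe_summaries[hit["corpus_id"]], hit["score"]) for hit in hits]

print(search("where is backpropagation explained?"))
```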
- Install FFmpeg
  More information on how to install FFmpeg can be found here.
- Install Ollama
  All models are run locally. Instructions can be found here.
pip install poetry                          # only for the first execution
poetry config virtualenvs.in-project true   # only for the first execution
poetry lock                                 # only for the first execution
poetry install                              # only for the first execution
Make sure you have Ollama running!
Follow the steps in the data_generation_pipeline.ipynb notebook to download, process, and embed the videos.
To test search performance, run the retrieve_data.ipynb notebook for data retrieval.
To run the frontend using Streamlit, execute:
streamlit run app.py
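For orientation, a stripped-down app.py might look like the sketch below; the retrieval call is stubbed here, while the real frontend wires in the embedding-based search and shows the matching keyframes:

```python
import streamlit as st

def search(query: str, top_k: int = 3):
    # Stand-in for the embedding-based retrieval step; returns (summary, score) pairs
    return [(f"Keyframe matching '{query}'", 1.0)][:top_k]

st.title("Lecture Video Search")
query = st.text_input("Describe the moment you are looking for")

if query:
    for summary, score in search(query):
        st.write(f"{score:.2f} - {summary}")
```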