This repo contains the code for a personal project I spent a few hours on. If you want help running it, DM me on X. If enough people are curious I can write out some directions.
See a live demo of app.py here: https://x.com/mteamisloading/status/1889713505038655872
Each file's function:
- process_tweets.py:
  - takes in a JSON file with all the tweets from the X API
  - takes in a JSON file with all the note tweets from the X API (any tweets that are longer than the Twitter display limit)
  - creates a new directory with the tweets as markdown files, extending tweets with the note tweets
- gemini_search.py:
  - takes in the directory of tweets
  - uses Chroma to embed the tweets
  - creates a search engine that uses Gemini to search the tweets (there is no frontend for this one)
- app.py:
  - a Flask web application that provides a search interface for tweets
  - uses faiss_search_lib.py to power the search functionality
  - provides a simple web UI through templates/index.html
- FAISS_search.py:
  - standalone script that implements tweet search using FAISS vector similarity
  - uses sentence-transformers to create embeddings
  - includes functions to load tweets, create embeddings, and save/load indexes
  - can be run directly for testing search functionality
- faiss_search_lib.py:
  - library version of FAISS_search.py used by app.py
  - provides a TweetSearchEngine class for easy integration
  - handles initialization and search operations
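For a rough idea of what the process_tweets.py step involves, here is a minimal sketch. It is not the code in this repo. It assumes each export file wraps a JSON array in a JavaScript assignment, that truncated tweets end with an ellipsis, and that the field names (`tweet`, `id_str`, `full_text`, `noteTweet`, `core`, `text`) match your export. The function names are hypothetical, so adjust all of this to your data.

```python
import json
from pathlib import Path


def load_export(path):
    """Parse an export file that wraps a JSON array in a JavaScript assignment."""
    raw = Path(path).read_text(encoding="utf-8")
    # Assumption: the file looks like `window.YTD.tweets.part0 = [ ... ]`,
    # so everything from the first "[" onward is valid JSON.
    return json.loads(raw[raw.index("["):])


def extend_with_notes(tweets, notes):
    """Swap truncated tweet text for the full note tweet text when one matches."""
    # Assumed field names; check them against your own export.
    note_texts = [n["noteTweet"]["core"]["text"] for n in notes]
    merged = {}
    for item in tweets:
        tweet = item["tweet"]
        text = tweet["full_text"]
        # Heuristic (an assumption): a truncated tweet ends with an ellipsis,
        # and what is left matches the start of its note tweet.
        if text.endswith("…"):
            prefix = text.rstrip("…").rstrip()
            for note in note_texts:
                if prefix and len(note) > len(prefix) and note.startswith(prefix):
                    text = note
                    break
        merged[tweet["id_str"]] = text
    return merged


def write_markdown(merged, out_dir):
    """Write one markdown file per tweet, named after the tweet id."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for tweet_id, text in merged.items():
        (out / f"{tweet_id}.md").write_text(text + "\n", encoding="utf-8")
```

A typical call chain would be `write_markdown(extend_with_notes(load_export("input/tweets.js"), load_export("input/note-tweets.js")), "output")`, where `output` is a placeholder directory name.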
The project requires several dependencies listed in requirements.txt. The search functionality is implemented in three different ways:
- Using Gemini (gemini_search.py)
- Using FAISS directly (FAISS_search.py)
- Using FAISS through a web interface (app.py + faiss_search_lib.py)
All search implementations work with the markdown files generated by process_tweets.py from your X data export.
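To show the idea the search implementations share, here is a minimal sketch of similarity search over a directory of markdown files. It is not the code in this repo: it swaps in a toy word-count vector for the sentence-transformers embeddings and a brute-force cosine-similarity loop for the FAISS index, so that it runs with the standard library only. The `TinyTweetSearch` class and its method names are hypothetical and do not reflect the TweetSearchEngine API.

```python
import math
import re
from collections import Counter
from pathlib import Path


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyTweetSearch:
    """Hypothetical stand-in for a search engine class; not the repo's API."""

    def __init__(self, tweet_dir):
        # Load every markdown file once and keep its vector in memory.
        self.docs = []
        for path in sorted(Path(tweet_dir).glob("*.md")):
            text = path.read_text(encoding="utf-8")
            self.docs.append((path.name, text, embed(text)))

    def search(self, query, k=3):
        """Return the k best (score, filename, text) matches, best first."""
        q = embed(query)
        scored = [(cosine(q, vec), name, text) for name, text, vec in self.docs]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]
```

Usage would look like `TinyTweetSearch("path/to/markdown/dir").search("vector search")`. A real embedding model matches on meaning instead of shared words, and an index such as FAISS avoids scanning every document per query, which is why the repo uses them.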
- Install dependencies:
  python3 -m pip install -r requirements.txt
- Edit the files to add your API keys and Twitter username
- Download your X data export (it may take a few days to process)
- Move the tweets.js and note-tweets.js files into a new subdirectory called input
- Process the tweets:
  python3 process_tweets.py
- Check the output (by default at Bracket Project Vault)
- Run the app:
  python3 app.py
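Before running the processing step, it can help to confirm the export files are where the steps above put them. The following is a small optional sketch, not part of this repo. The `missing_inputs` helper is hypothetical, and it assumes the input directory sits in the directory you run the scripts from.

```python
from pathlib import Path

# File names come from the steps above; the location of "input" relative
# to the working directory is an assumption of this sketch.
REQUIRED = ("tweets.js", "note-tweets.js")


def missing_inputs(input_dir="input"):
    """Return the export files that are not present in input_dir."""
    folder = Path(input_dir)
    return [name for name in REQUIRED if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing from input/: " + ", ".join(missing))
    else:
        print("Inputs found. Next: python3 process_tweets.py")
```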
Things will probably break, good luck fixing them. DM me on X if you have questions.