A toolkit to download, preprocess and analyse Reddit sentiment data. The data is derived from the Reddit Comments Archive hosted by Pushshift.
- Install the requirements (jq, curl), e.g. with Homebrew:
brew install jq curl
- Install Poetry
- Download archives
- Distill a smaller dataset
poetry install
# use --help for help with the commands
poetry run download-annotate-archives 2005 2006 --multithreading
poetry run distill-dataset
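
Before running the commands above, it can help to confirm that the required tools are on your `PATH`. The following is a minimal sketch, not part of this toolkit: `check_tools` is a hypothetical helper name, and it assumes a POSIX-compatible shell.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this toolkit):
# report which of the given command-line tools are missing from PATH.
check_tools() {
    missing=""
    for tool in "$@"; do
        # `command -v` succeeds only if the tool can be resolved by the shell
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing tools:$missing" >&2
        return 1
    fi
    echo "All required tools found: $*"
}

# The tools named in the steps above
check_tools jq curl poetry || echo "Install the missing tools before continuing."
```

The function returns a non-zero status when anything is missing, so it can also be used as a guard in a setup script.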
For basic and more advanced usage examples of the resulting dataset, see the analysis
folder.