Every year, Warren Buffett releases a 'Letter to Shareholders' to Berkshire Hathaway investors in which he covers the performance of the company and the market as a whole. The goal of this project is to measure the similarity between these letters using several natural language processing techniques and to see whether predictions can be made about the stock market in the year following a letter's publication, using the S&P 500 as an indicator of market performance.
get_letters.py can be called to reproduce all of the data used in this project: letters.txt, which contains each year's letter (from 1977 to 2021) on a single line, and sp500.csv, which contains the S&P 500's annual returns since 1926. Some cleaning is performed by this script, with additional cleaning handled in the notebooks when necessary.
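A minimal sketch of loading the two files get_letters.py produces, assuming one letter per line in letters.txt and a year/return pair per row in sp500.csv; the CSV column names "Year" and "Return" are assumptions about its layout, not confirmed by the script itself.

```python
import csv

# Read the letters (one per line, 1977-2021 per the description above).
with open("letters.txt", encoding="utf-8") as f:
    letters = [line.strip() for line in f if line.strip()]
years = list(range(1977, 1977 + len(letters)))

# Read the annual S&P 500 returns; column names here are assumed.
sp500_returns = {}
with open("sp500.csv", newline="") as f:
    for row in csv.DictReader(f):
        sp500_returns[int(row["Year"])] = float(row["Return"])

print(len(letters), "letters;", len(sp500_returns), "years of S&P 500 returns")
```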
similarity.ipynb processes these letters and creates embeddings to compare their similarity. The first technique, count vectorization, shows no useful similarity between letters. The second, term frequency-inverse document frequency (TF-IDF), shows even less. The third, training a Doc2Vec model, again shows no useful similarity between letters.
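A minimal sketch of the three comparison approaches, assuming `letters` is the list of letter strings loaded above; tokenization choices and hyperparameters here are illustrative and will differ from what similarity.ipynb actually does.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# 1) Count vectorization: bag-of-words counts compared with cosine similarity.
count_sim = cosine_similarity(CountVectorizer(stop_words="english").fit_transform(letters))

# 2) TF-IDF: reweights counts by how specific each term is to a given letter.
tfidf_sim = cosine_similarity(TfidfVectorizer(stop_words="english").fit_transform(letters))

# 3) Doc2Vec: learns a dense vector per letter; parameters below are a guess.
docs = [TaggedDocument(words=letter.lower().split(), tags=[i]) for i, letter in enumerate(letters)]
d2v = Doc2Vec(docs, vector_size=100, window=5, min_count=2, epochs=40)
d2v_sim = cosine_similarity([d2v.dv[i] for i in range(len(letters))])
```

Each of these produces a letters-by-letters similarity matrix, which is what the notebook inspects to judge whether any technique surfaces useful structure.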
Letters written in nearby years are found to be the most similar to each other. This may partly reflect Buffett's writing style changing over the years, so that letters written around the same time share the most similar style. The letters also vary drastically in length: the first several and last several are extremely short compared to the rest. Longer letters presumably contain more information and larger vocabularies, which makes them more difficult to compare to other letters.
prediction.ipynb trains two different neural networks, a basic ANN and an LSTM, to predict the overall sentiment of each letter. Both models were trained on IMDB reviews. Because those reviews are only a few sentences long, we predicted the sentiment of each sentence in Buffett's letters and averaged the results to get an overall sentiment per letter.
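A rough sketch of the LSTM variant, assuming the Keras IMDB dataset is used for training; the layer sizes and the sentence-scoring helper `letter_sentiment` are illustrative, not the notebook's exact setup.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB, MAXLEN = 10_000, 200
(x_train, y_train), _ = imdb.load_data(num_words=VOCAB)
x_train = pad_sequences(x_train, maxlen=MAXLEN)

# Embedding -> LSTM -> sigmoid classifier for positive/negative review sentiment.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.1)

# Score a letter by averaging sentence-level predictions. The +3 offset mirrors
# how load_data reserves indices 0-2 for padding/start/unknown tokens.
word_index = imdb.get_word_index()

def encode(sentence):
    ids = []
    for w in sentence.lower().split():
        idx = word_index.get(w)
        ids.append(idx + 3 if idx is not None and idx + 3 < VOCAB else 2)  # 2 = OOV
    return ids

def letter_sentiment(letter):
    sentences = [s for s in letter.split(".") if s.strip()]
    preds = model.predict(pad_sequences([encode(s) for s in sentences], maxlen=MAXLEN), verbose=0)
    return float(np.mean(preds))
```

The ANN variant would differ only in the model architecture; the per-sentence scoring and averaging step is the same.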
The LSTM appeared to perform better than the ANN, but there was no consistent correlation between predicted sentiment and market performance. Training data that more closely reflects the language used in Buffett's letters may improve the results, as would building more complex models or fine-tuning pre-trained models on some of the letters.