The Stocks Analysis Project aims to create a Dash Dashboard with financial information for stocks and ETFs.
The first part of the project focuses on a Reddit Posts Dashboard built from posts in the wallstreetbets subreddit. The dashboard includes:
- A list of the stocks and ETFs mentioned in the Reddit posts.
- A list of posts with their sentiment analysis.
- A summary of each company, along with financial information.
- An interactive candlestick graph with indicators.
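The indicators drawn on the candlestick graph are computed from each symbol's price history. The README does not say which indicators are used, so the following is only an illustration: a minimal pure-Python simple moving average (SMA), one of the most common overlays on a candlestick chart. The function name `simple_moving_average` is hypothetical; the project lists pandas-ta as a dependency, which presumably handles indicator calculation in practice.

```python
from typing import List, Optional


def simple_moving_average(closes: List[float], window: int) -> List[Optional[float]]:
    """Return the SMA of `closes` over `window` periods.

    Illustrative sketch only, not the project's implementation. The first
    window - 1 entries are None because fewer than `window` prices exist yet.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    result: List[Optional[float]] = []
    running_sum = 0.0
    for i, price in enumerate(closes):
        running_sum += price
        if i >= window:
            # Drop the price that just slid out of the window.
            running_sum -= closes[i - window]
        result.append(running_sum / window if i >= window - 1 else None)
    return result
```

For example, `simple_moving_average([1, 2, 3, 4, 5], 3)` returns `[None, None, 2.0, 3.0, 4.0]`. Keeping a running sum makes each step constant time instead of re-summing the whole window.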
The project uses Docker to containerize the following services:
- Apache Airflow: Orchestrates the tasks.
- Apache Kafka: Receives the Reddit posts, which are produced to a Kafka topic.
- Apache Spark: Processes the data.
- DataStax Enterprise (Cassandra): Stores the data.
- Dash: Displays the dashboard.
There are two docker-compose files in the /docker folder:
- docker-compose.yml: Apache Airflow runs with the LocalExecutor. This lightweight setup uses fewer resources and is intended for development.
- docker-compose-full-airflow.yml: Apache Airflow runs with the CeleryExecutor. This is the complete Apache Airflow setup.
The following tasks are defined in an Apache Airflow DAG:
- zip_python_modules: Zips the Python modules so they can be transferred to the Spark master.
- create_register_kafka_topic: Creates a Kafka topic and registers the schema in the Confluent Schema Registry.
- create_cassandra_tables: Creates the Cassandra keyspace and tables in DataStax Enterprise.
- reddit_produce_to_kafka: Produces messages containing Reddit posts to the Kafka broker.
- reddit_pipeline: Spark pipeline that reads posts from Kafka, extracts stock symbols, performs sentiment analysis with a Hugging Face pipeline, and saves the data to Cassandra.
- stocks_data_pipeline: Spark pipeline that extracts symbols from Cassandra, retrieves information and price history from Yahoo Finance, calculates indicators, and saves the data to Cassandra.
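The reddit_pipeline task extracts stock symbols from the post text, but the README does not describe how. Here is a minimal regex-based sketch under stated assumptions: a ticker is 1 to 5 uppercase letters, optionally prefixed with `$`, and a candidate only counts if it appears in a set of known symbols. Both the pattern and the `known_symbols` set are hypothetical; in practice the set could come from an exchange listing. The filtering step matters on wallstreetbets, where posts are full of all-caps slang that a bare regex would mistake for tickers.

```python
import re
from typing import Set

# Hypothetical pattern: optional "$" followed by 1-5 uppercase letters forming
# a whole word, e.g. "$GME" or "AMC". This is an assumption for illustration.
TICKER_PATTERN = re.compile(r"\$?\b([A-Z]{1,5})\b")


def extract_symbols(text: str, known_symbols: Set[str]) -> Set[str]:
    """Return the tickers found in `text` that also appear in `known_symbols`."""
    candidates = TICKER_PATTERN.findall(text)
    # Keep only candidates that are real symbols; this discards all-caps words.
    return {c for c in candidates if c in known_symbols}
```

For example, `extract_symbols("HODL $GME and AMC", {"GME", "AMC", "TSLA"})` returns `{"GME", "AMC"}`: "HODL" matches the pattern but is dropped because it is not in the known-symbol set. The quality of that set therefore largely determines how precise the extraction is.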
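The task list does not state the dependencies between tasks. Reading the descriptions, a plausible ordering is: the three setup tasks first, the Kafka producer once the topic exists, the Reddit Spark pipeline once posts are in Kafka, and the stocks pipeline last because it reads the symbols the Reddit pipeline saved to Cassandra. The sketch below encodes that assumed graph with the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and prints one valid execution order. The edges are an assumption, not read from the project's DAG file.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks it depends on.
# ASSUMPTION: these edges are inferred from the task descriptions only.
TASK_DEPENDENCIES: Dict[str, Set[str]] = {
    "zip_python_modules": set(),
    "create_register_kafka_topic": set(),
    "create_cassandra_tables": set(),
    "reddit_produce_to_kafka": {"create_register_kafka_topic"},
    "reddit_pipeline": {
        "zip_python_modules",
        "create_cassandra_tables",
        "reddit_produce_to_kafka",
    },
    "stocks_data_pipeline": {"reddit_pipeline"},
}


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every task runs after all of its upstream tasks."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for task in execution_order(TASK_DEPENDENCIES):
        print(task)
```

In an actual Airflow DAG the same edges would be declared on the operators with the `>>` operator, for example `[zip_task, tables_task, produce_task] >> reddit_task` using hypothetical operator variables, and the scheduler derives the run order itself.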
The project depends on the following Python packages:
- apache-airflow
- praw
- pandas
- confluent-kafka
- pyspark
- cassandra-driver
- transformers
- yfinance
- pandas-ta
- dash
Before you begin:
- Install Docker and docker-compose.
- Get a Reddit Secret Key and Client Id.
- Get an Alpha Vantage API Key.
Clone this project to your computer.
Insert your Reddit Secret Key, Reddit Client Id, and Alpha Vantage API Key in stocks_analysis/stocks_etl/utils/.
Open a terminal and navigate to the stocks_analysis/docker folder.
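Choose the appropriate docker-compose file, then run the matching command to build the images and start the containers:
/docker> docker-compose up -d --build
/docker> docker-compose -f docker-compose-full-airflow.yml up -d --build
The first command uses the default docker-compose.yml (LocalExecutor); the second uses docker-compose-full-airflow.yml (CeleryExecutor).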
Open the Apache Airflow webserver at http://localhost:8080/.
Log in to Apache Airflow with:
- Username: airflow
- Password: airflow
Go to Admin -> Connections and add a Spark connection with the following options:
- Connection Id: spark_default
- Connection Type: Spark
- Host: spark://spark-master
- Port: 7077
Trigger the DAG from the Airflow UI.
Once it completes, open the Dash dashboard at http://localhost:8050/.
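If the Airflow or Dash page does not load, a quick way to tell whether anything is listening is to test the TCP port. A minimal sketch, assuming the default port mappings from the URLs above (8080 for Airflow, 8050 for Dash); `is_port_open` and `SERVICE_PORTS` are hypothetical helpers, not part of the project:

```python
import socket
from typing import Dict

# ASSUMPTION: default host ports from the README URLs; adjust if your
# docker-compose file maps them differently.
SERVICE_PORTS: Dict[str, int] = {"Airflow webserver": 8080, "Dash dashboard": 8050}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    for name, port in SERVICE_PORTS.items():
        status = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {status}")
```

An open port only shows that something accepts connections; the Airflow webserver can still need some time after that before the UI responds.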