Every Spotify user receives a year-end story and playlist from Spotify summarizing their annual listening habits, but it can be more interesting to have your own dashboard that you can check at any time of the year and that covers more dimensions. It is also important to keep the data you generate stored on your own end. The goal of this project is therefore to build a mini data lake and an interactive dashboard for analyzing your listening habits on Spotify.
The application is currently in development, v0.1.0 (Beta).
The solution is available for testing: you may run it and try the dashboard in GoodData. It is still in development, however, and this is not the final version.
The goals are to develop an ETL pipeline that systematically stores the listening records obtained from Spotify in a centralized data lake, and to use this backend to power the dashboards or machine learning infrastructure.
You may find more detailed requirements in the Goals and Requirements folder.
Anyone who is interested in analyzing their listening habits on Spotify! However, there are two expected personas: Administrators and Casual Users. Administrators are responsible for setting up and maintaining the backend of the whole application; casual users simply browse and use the dashboards.
The listening records are obtained directly from Spotify. You may request the data through: Spotify User Profile -> Privacy Settings -> Download Your Data. Your Account Data, which includes the last 12 months of listening records, takes about 5 days to arrive. You may also request your Spotify lifetime records, although this may take up to 30 days.
For privacy reasons, my full listening record is not posted in this repository; it is saved on my local machine. However, the pipeline is designed to ingest data dynamically and is not tied to a specific user, so you can still use the pipeline for your own data.
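As a rough illustration of ingestion that does not depend on a particular user, the sketch below reads every streaming history file in a folder and merges the records. It assumes the export contains JSON files named like `StreamingHistory*.json`, each holding a list of records with fields such as `endTime`, `artistName`, `trackName`, and `msPlayed`. The file pattern, the field names, and the `load_streaming_history` and `total_minutes_played` helpers are assumptions made for illustration; they are not the project's actual pipeline code.

```python
import json
from pathlib import Path


def load_streaming_history(folder, pattern="StreamingHistory*.json"):
    """Read every matching JSON file in `folder` and return one list of records.

    The file pattern and record fields are assumptions about the Spotify
    export; adjust them to match the files you actually receive.
    """
    records = []
    # Sort so that files are processed in a stable, repeatable order.
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Tag each record with its source file so its origin can be traced later.
        for row in data:
            records.append({**row, "source_file": path.name})
    return records


def total_minutes_played(records):
    """Sum the assumed `msPlayed` field and convert milliseconds to minutes."""
    return sum(row.get("msPlayed", 0) for row in records) / 60_000
```

Because the function only takes a folder path, the same code can be pointed at any user's export, for example `load_streaming_history("path/to/your/export")`.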
You may find more details about the data from Spotify, the data lake structure, and anything else about the data in the Data folder; the dataset structure from Spotify in the Data Structure folder; and the ELT overview in the ELT folder. The pipeline scripts are available in the DAGs folder under the Setup folder.
Currently, we only support the attributes offered by Spotify. The data provided by Spotify does not include sufficient metadata about songs. This may be followed up in the future.
We will be using the following tools:
- Docker
- Postgres
- Airflow
- Python
- Pandas
- Plotly/Dash
- GoodData
The applications are expected to run in Docker containers and use open-source resources only. Postgres is the database of choice, with local folders used for flat-file storage.
You may find more details about the data from Spotify, data lake structure, or anything about data in the Data folder.
You may access the dashboard hosted by GoodData.CN at http://localhost:3000 once GoodData is configured and running. The following is the default dashboard layout:
You may visit the Gallery folder to look at the dashboard screenshots.
If you are an administrator setting up the backend and dashboard, first start the Docker network defined in the Setup folder; it will set everything up automatically. Once Airflow is ready, you are expected to supply the data set and maintain the data pipeline. Finally, you also set up and maintain the backend of GoodData.CN. You may find more details on how to set up the application with Docker in the Setup folder.
Once the application is set up, casual users may access the dashboard hosted by GoodData.CN at http://localhost:3000 after GoodData is configured. You may find instructions on how to use the GoodData dashboard in the Instruction folder.
- v0.1.0 (beta) - Functional Backend and functional GoodData Dashboards for Basics and KPIs report
- v0.2.0 (beta) - Plotly Dash on Trend Analysis and Anomaly Detection
- v1.0.0 - Official Launch
- v1.1.0 - Playlist Analysis via Plotly Dash
More details are available in the Release Notes.
Note:
- Playlist Analysis discovers songs with a low play rate and recommends removing them from the playlist
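
One possible way to frame the low-play-rate idea is to count how often each playlist track appears in the listening records and flag those under a threshold. This is only a sketch: the `find_low_play_tracks` name, the `trackName` field, and the `min_plays` threshold are hypothetical, and the planned feature may work differently.

```python
from collections import Counter


def find_low_play_tracks(playlist_tracks, records, min_plays=2):
    """Return playlist tracks played fewer than `min_plays` times, in playlist order.

    `records` is a list of dicts with an assumed `trackName` field;
    tracks that never appear in the records count as zero plays.
    """
    play_counts = Counter(
        row["trackName"] for row in records if "trackName" in row
    )
    return [track for track in playlist_tracks if play_counts[track] < min_plays]
```

For example, with a playlist of three tracks where one was played three times, one once, and one never, the last two would be returned as removal candidates under the default threshold.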