A repository for analytics on my Spotify usage, showing how to build an end-to-end analytics solution.

jacquessham/Spotify_Analytics_MyUsage


Spotify Analytics on my Usage

Every Spotify user receives a year-end story and playlist from Spotify summarizing their annual listening habits, but it may be more interesting to have your own dashboard that you can check at any time of the year and that covers more dimensions. It is also important to keep a copy of the data you generate on your own side. The goal of this project is therefore to build a mini data lake and an interactive dashboard for analyzing your listening habits on Spotify.

The application is currently in development, at v0.1.0 (Beta).

The solution is available for testing: you may run it and try the dashboard in GoodData. It is still in development, however, and this is not the final version.

Goals and Requirements

The goal is to develop an ETL pipeline that systematically stores the listening records obtained from Spotify in a centralized data lake, and to use this backend to power dashboards or machine learning infrastructure.

You may find more detailed requirements in the Goals and Requirements folder.

Who Would Use This?

Anyone who is interested in analyzing their listening habits on Spotify! There are two expected personas: Administrators and Casual Users. Administrators are responsible for setting up and maintaining the backend of the whole application; casual users simply browse and use the dashboards.

Data

The listening records are obtained from Spotify directly. You may request the data through: Spotify User Profile -> Privacy Settings -> Download Your Data. You may request your Account Data, which includes the last 12 months of listening records and arrives in about 5 days. You may also request your Spotify lifetime records, although that may take up to 30 days.

For privacy reasons, my full listening record is not posted in this repository; it is saved on my local machine. However, the pipeline is designed to ingest data dynamically and is not user-specific, so you can still use it with your own data.

You may find more details about the data from Spotify, the data lake structure, and other data topics in the Data folder; the structure of the dataset from Spotify in the Data Structure folder; and the ELT overview in the ELT folder. The pipeline scripts are available in the DAGs folder under the Setup folder.

Currently, only the attributes offered by Spotify are supported. The data provided by Spotify does not include sufficient metadata on songs; this may be followed up in the future.
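To make the ingestion step more concrete, here is a minimal sketch of how raw listening records might be normalized with Pandas before loading. It is not the pipeline in the DAGs folder. The field names (endTime, artistName, trackName, msPlayed), the timestamp format, and the function name normalize_streaming_history are assumptions for illustration, so check them against your own export.

```python
import pandas as pd

# Assumed record layout of the Account Data export; verify against your files.
EXPECTED_FIELDS = ["endTime", "artistName", "trackName", "msPlayed"]


def normalize_streaming_history(records):
    """Turn raw listening records into a typed, de-duplicated DataFrame.

    Hypothetical helper for illustration only.
    """
    df = pd.DataFrame.from_records(records, columns=EXPECTED_FIELDS)
    # Assumed timestamp format: "YYYY-MM-DD HH:MM"
    df["endTime"] = pd.to_datetime(df["endTime"], format="%Y-%m-%d %H:%M")
    df["msPlayed"] = df["msPlayed"].astype("int64")
    df["minutesPlayed"] = df["msPlayed"] / 60_000
    # Overlapping exports can repeat rows, so drop exact duplicates.
    return df.drop_duplicates().sort_values("endTime").reset_index(drop=True)


# Made-up sample records, not real listening data.
sample = [
    {"endTime": "2023-01-02 08:15", "artistName": "Artist A",
     "trackName": "Song 1", "msPlayed": 180000},
    {"endTime": "2023-01-01 21:40", "artistName": "Artist B",
     "trackName": "Song 2", "msPlayed": 30000},
    {"endTime": "2023-01-02 08:15", "artistName": "Artist A",
     "trackName": "Song 1", "msPlayed": 180000},
]
history = normalize_streaming_history(sample)
print(history)
```

In practice the records would come from the JSON files in your export folder (for example via json.load) rather than from an inline list. Because nothing here depends on a particular user, the same function can be applied to any export.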

Tools

We will be using the following tools:

  • Docker
  • Postgres
  • Airflow
  • Python
    • Pandas
  • Plotly/Dash
  • GoodData

Databases and Flatfile Storage

The applications are expected to run in Docker containers and use open-source resources only. Postgres is the database of choice, and local folders are used for flat-file storage.

You may find more details about the data from Spotify, data lake structure, or anything about data in the Data folder.

Dashboard

You may access the dashboard hosted by GoodData.CN at http://localhost:3000 after GoodData is configured and running. The following is the default dashboard layout:



You may visit the Gallery folder to look at the dashboard screenshots.

Setup

If you are an administrator setting up the backend and dashboard, first start the Docker network defined in the Setup folder; it sets up the services automatically. Once Airflow is ready, you are expected to supply the data set and maintain the data pipeline. Finally, you also set up and maintain the backend of GoodData.CN. You may find more details on how to set up the application with Docker in the Setup folder.

Instructions for Casual Users

Once the application is set up and GoodData is configured, casual users may access the dashboard hosted by GoodData.CN at http://localhost:3000. You may find instructions on how to use the GoodData dashboard in the Instruction folder.

Future Development

  • v0.1.0 (beta) - Functional Backend and functional GoodData Dashboards for Basics and KPIs report
  • v0.2.0 (beta) - Plotly Dash on Trend Analysis and Anomaly Detection
  • v1.0.0 - Official Launch
  • v1.1.0 - Playlist Analysis via Plotly Dash



More details in the Release Notes

Note:

  • Playlist Analysis discovers songs with a low play rate and recommends removing them from the playlist
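
As a rough illustration of the playlist analysis idea, the sketch below flags playlist tracks with a low play rate. It is not the planned implementation. "Play rate" is assumed here to mean a track's share of all plays of tracks in the playlist, and the function name, arguments, and threshold are hypothetical.

```python
from collections import Counter


def low_play_rate_tracks(playlist, played_tracks, threshold=0.05):
    """Return playlist tracks whose share of playlist plays is below threshold.

    Hypothetical helper: "play rate" is an assumed definition, not the
    project's own metric.
    """
    in_playlist = set(playlist)
    # Count only plays of tracks that belong to the playlist.
    counts = Counter(t for t in played_tracks if t in in_playlist)
    total = sum(counts.values())
    if total == 0:
        # Nothing in the playlist was ever played: every track is a candidate.
        return list(playlist)
    return [t for t in playlist if counts[t] / total < threshold]


# Made-up example: "C" accounts for 1 of 20 playlist plays (5%).
playlist = ["A", "B", "C"]
plays = ["A"] * 10 + ["B"] * 9 + ["C"] + ["D"] * 4  # "D" is not in the playlist
candidates = low_play_rate_tracks(playlist, plays, threshold=0.1)
print(candidates)
```

The threshold is a judgment call. A share-based rate depends on playlist size, so a long playlist would likely need a lower cutoff or a different definition.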
