Run tests:
$ docker-compose up -d && pytest
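The endpoint tests run by the command above follow roughly this shape (a minimal sketch assuming pytest-django is installed; the `/dsrs/` path is hypothetical and should be adjusted to the routes in `openapi.md`):

```python
# Minimal endpoint test sketch; assumes pytest-django, and "/dsrs/" is a
# hypothetical path to be matched against openapi.md.
import pytest
from django.test import Client


@pytest.mark.django_db
def test_list_dsrs_returns_ok():
    response = Client().get("/dsrs/")
    assert response.status_code == 200
```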
Build and run container:
$ docker-compose -f docker-compose.prod.yml up --build
Extra question:
DSPs report DSRs containing hundreds of millions of usages. If you were to deploy this solution to production, would you make any changes to the database or process in order to import the usages? Which ones?
- Denormalize the data to minimize joins, if business requirements allow it. Currencies and territories can be represented as string enumerations in DSR rows.
- Wrap `ingest_dsr` in a consumer to perform ingestion asynchronously (see the sketch after this list). Add an `IN_PROGRESS` status for DSRs. Store ingestion logs.
- Consider faster validation/serialization, possibly with a more lightweight framework.
- Delegate file upload to a highly available storage service proxied by a dedicated thin microservice. Offload gzip decompression to infrastructure, e.g. nginx.
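As a rough illustration of the consumer idea, a Celery task could wrap the existing `ingest_dsr`; everything here other than `ingest_dsr` itself (module paths, the status field and its values) is an assumption, not part of the provided code:

```python
# Sketch of asynchronous ingestion with Celery; module paths, the status
# field, and its values are assumptions, not part of the provided code.
from celery import shared_task

from dsrs.models import DSR            # assumed app layout
from dsrs.ingestion import ingest_dsr  # the existing synchronous importer


@shared_task(bind=True, max_retries=3)
def ingest_dsr_task(self, dsr_id):
    """Consume an ingestion job: mark the DSR, run the import, record the outcome."""
    dsr = DSR.objects.get(pk=dsr_id)
    dsr.status = "IN_PROGRESS"  # hypothetical status field
    dsr.save(update_fields=["status"])
    try:
        ingest_dsr(dsr)
    except Exception as exc:
        dsr.status = "FAILED"
        dsr.save(update_fields=["status"])
        raise self.retry(exc=exc, countdown=60)
    dsr.status = "COMPLETED"
    dsr.save(update_fields=["status"])
```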
A lot of our work is about connecting digital service providers (DSPs) like Spotify or YouTube with societies like SGAE or SACEM, who represent music creators. DSPs provide digital sales reports (DSRs), which contain information about music metadata and revenue generated. We crunch this data and give societies the information they need.
For this test, we provide several DSRs that represent the usages and revenue from different countries. The aim is to parse the contents of the DSRs and insert them into a database to extract statistics through an API. Each line of a DSR represents a sound recording and its associated usage data. In detail, it contains the following fields (a parsing sketch follows the list):
dsp_id: the unique identifier of a sound recording provided by the DSP.
title: sound recording title.
artists: pipe-separated list of artists.
isrc: International Sound Recording Code.
usages: number of plays for this sound recording, territory and period.
revenue: revenue generated by this sound recording, territory and period.
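A minimal sketch of parsing one such line; the tab delimiter is an assumption about the file layout, so check it against the actual files in data/:

```python
# Parsing sketch for one DSR line; the tab delimiter is an assumption.
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class DSRLine:
    dsp_id: str
    title: str
    artists: list[str]  # pipe-separated in the raw file
    isrc: str
    usages: int
    revenue: Decimal


def parse_line(raw: str) -> DSRLine:
    dsp_id, title, artists, isrc, usages, revenue = raw.rstrip("\n").split("\t")
    return DSRLine(dsp_id, title, artists.split("|"), isrc, int(usages), Decimal(revenue))
```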
DSR filenames specify metadata related to the DSR, such as Territory, Period, and Currency. You will find the DSRs in the data/ directory.
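The exact naming scheme isn't spelled out here, so the pattern below (including the example name) is hypothetical; adjust the regex to the real files:

```python
# Hypothetical filename pattern, e.g. DSR_ES_202001_EUR.tsv.gz; adjust
# the regex to the real names found in the data/ directory.
import re

FILENAME_RE = re.compile(
    r"DSR_(?P<territory>[A-Z]{2})_(?P<period>\d{6})_(?P<currency>[A-Z]{3})"
)


def parse_filename(name):
    match = FILENAME_RE.search(name)
    if match is None:
        raise ValueError(f"Unrecognized DSR filename: {name}")
    return match.groupdict()  # {"territory": ..., "period": ..., "currency": ...}
```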
The API is defined by the OpenAPI specification in openapi.md.
Our current (and incomplete) database contains the following tables, sketched below:
- DSR: Models the DSR file and stores some relevant information.
- Currency: Models a currency.
- Territory: Models a territory.
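A sketch of how these three tables might look as Django models; the field names beyond the table names are assumptions about the provided (incomplete) schema:

```python
# Sketch of the three tables as Django models; all field names here are
# assumptions about the provided schema.
from django.db import models


class Currency(models.Model):
    code = models.CharField(max_length=3, unique=True)  # e.g. "EUR"


class Territory(models.Model):
    code = models.CharField(max_length=2, unique=True)  # ISO 3166-1 alpha-2


class DSR(models.Model):
    path = models.CharField(max_length=256)
    period_start = models.DateField()
    period_end = models.DateField()
    territory = models.ForeignKey(Territory, on_delete=models.PROTECT)
    currency = models.ForeignKey(Currency, on_delete=models.PROTECT)
```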
Deliverables:
- A way to import the contents of DSRs to the DB.
- Complete the API according to the OpenAPI specification.
- A form in the admin page to delete DSRs and their contents (see the sketch after this list).
- Tests for each API endpoint, using any preferred testing framework.
- Dockerfile
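For the admin deliverable, one possible shape is a custom action; the `Usage` model holding the imported rows and the module layout are assumptions:

```python
# Sketch of an admin action deleting DSRs together with their imported
# rows; the Usage model and module layout are assumptions.
from django.contrib import admin

from dsrs.models import DSR, Usage  # assumed app layout


@admin.register(DSR)
class DSRAdmin(admin.ModelAdmin):
    actions = ["delete_with_contents"]

    def delete_with_contents(self, request, queryset):
        Usage.objects.filter(dsr__in=queryset).delete()  # remove imported rows first
        queryset.delete()

    delete_with_contents.short_description = "Delete selected DSRs and their contents"
```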
Requirements:
- Django 3.1
- Python 3.9
Extra questions:
- DSPs report DSRs containing hundreds of millions of usages. If you were to deploy this solution to production, would you make any changes to the database or process in order to import the usages? Which ones?
Note:
To manage Python dependencies, use any tool (e.g. pipenv) that interprets the Pipfile placed in the root folder. For example, with pipenv it's enough to run:
$ pipenv sync --dev