Candidate notes

Run tests:

$ docker-compose up -d && pytest

Build and run container:

$ docker-compose -f docker-compose.prod.yml up --build

DSPs report DSRs containing hundreds of millions of usages. If you were to deploy this solution to production, would you make any changes to the database or process in order to import the usages? Which ones?

  • Denormalize the data to minimize joins, if business requirements allow it. Currencies and locations can be represented as string enumerations in DSR rows.
  • Wrap ingest_dsr in a consumer to perform ingestion asynchronously. Add IN_PROGRESS status for DSRs. Store ingestion logs.
  • Consider faster validation/serialization; a more lightweight framework could help.
  • Delegate file upload to a highly available storage service proxied by a dedicated thin microservice. Offload gzip decompression to infrastructure, e.g. nginx.
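The batched, asynchronous ingestion suggested above can be sketched with a small chunking helper. This is a minimal illustration, not the repository's actual ingestion code; `UsageRow` and `parse_dsr` in the trailing comment are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def chunked(rows: Iterable[T], size: int = 10_000) -> Iterator[List[T]]:
    """Yield fixed-size batches so each bulk insert stays bounded in memory."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

# Inside an async worker, each batch would feed a single bulk insert, e.g.:
#   for batch in chunked(parse_dsr(path)):
#       UsageRow.objects.bulk_create(batch, batch_size=len(batch))
```

Streaming the file through `chunked` avoids materializing hundreds of millions of rows at once, and each batch maps to one round trip to the database.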

Digital - Senior Engineer test

A lot of our work is about connecting digital service providers (DSPs) like Spotify or YouTube with societies like SGAE or SACEM, who represent music creators. DSPs provide digital sales reports (DSRs), which contain information about music metadata and revenue generated. We crunch this data and give societies the information they need.

For this test, we provide several DSRs that represent the usages and revenue from different countries. The aim is to parse the contents of the DSRs and insert them into a database to extract statistics through an API. Each line of a DSR represents a sound recording and its associated usage data. In detail, it contains the following fields:

dsp_id: the unique identifier of a sound recording, provided by the DSP.
title: sound recording title.
artists: pipe-separated list of artists.
isrc: International Standard Recording Code.
usages: number of plays for this sound recording, territory and period.
revenue: revenue generated by this sound recording, territory and period.
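Taken together, one DSR line could be parsed as below. The tab delimiter and column order are assumptions for illustration; the real files in data/ may differ.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List

@dataclass
class DsrLine:
    dsp_id: str         # DSP's unique identifier for the recording
    title: str          # sound recording title
    artists: List[str]  # split from the pipe-separated field
    isrc: str           # International Standard Recording Code
    usages: int         # number of plays
    revenue: Decimal    # revenue in the DSR's currency

def parse_line(raw: str, sep: str = "\t") -> DsrLine:
    # Assumed delimiter and field order; adjust to the actual file format.
    dsp_id, title, artists, isrc, usages, revenue = raw.rstrip("\n").split(sep)
    return DsrLine(dsp_id, title, artists.split("|"), isrc,
                   int(usages), Decimal(revenue))
```

Using `Decimal` for revenue avoids the rounding drift that floats would accumulate over millions of rows.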

DSR filenames specify metadata related to the DSR, such as Territory, Period, and Currency. You will find the DSRs in the data/ directory.
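That filename metadata can be extracted with a regular expression. The pattern below is hypothetical (territory, YYYYMM period, and ISO currency code separated by underscores); the actual naming scheme of the files in data/ may differ.

```python
import re

# Hypothetical pattern: 2-letter territory, YYYYMM period, 3-letter
# ISO currency code, underscore-separated somewhere in the filename.
FILENAME_RE = re.compile(
    r"(?P<territory>[A-Z]{2})_(?P<period>\d{6})_(?P<currency>[A-Z]{3})"
)

def parse_filename(name: str) -> dict:
    match = FILENAME_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognized DSR filename: {name!r}")
    return match.groupdict()
```

Failing loudly on an unrecognized name keeps malformed uploads out of the database instead of silently defaulting the territory or currency.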

The API specification is provided as an OpenAPI specification:

openapi.md

Our current (and incomplete) database contains the following tables:

  • DSR: Models the DSR file and stores some relevant information.
  • Currency: Models a currency.
  • Territory: Models a territory.
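A minimal sketch of how these tables, plus a table for the report lines themselves, might look in Django. The field names and the `DsrUsage` model are assumptions for illustration, not the repository's actual schema.

```python
from django.db import models

class Currency(models.Model):
    code = models.CharField(max_length=3, unique=True)  # e.g. "EUR"

class Territory(models.Model):
    code = models.CharField(max_length=2, unique=True)  # e.g. "ES"

class DSR(models.Model):
    path = models.CharField(max_length=255)
    period = models.CharField(max_length=6)  # e.g. "202001"
    territory = models.ForeignKey(Territory, on_delete=models.PROTECT)
    currency = models.ForeignKey(Currency, on_delete=models.PROTECT)

class DsrUsage(models.Model):
    # Hypothetical table holding one row per DSR line; cascading the
    # delete makes the admin "delete DSR and its contents" form trivial.
    dsr = models.ForeignKey(DSR, on_delete=models.CASCADE)
    dsp_id = models.CharField(max_length=64)
    title = models.CharField(max_length=255)
    artists = models.CharField(max_length=1024)  # pipe-separated, as reported
    isrc = models.CharField(max_length=12)
    usages = models.BigIntegerField()
    revenue = models.DecimalField(max_digits=18, decimal_places=6)
```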

Deliverables:

  • A way to import the contents of DSRs to the DB.
  • Complete the API according to the OpenAPI specification.
  • A form in the admin page to delete DSRs and their contents.
  • Tests for each API endpoint, using any preferred testing framework.
  • Dockerfile

Requirements:

  • Django 3.1
  • Python 3.9

Extra questions:

  • DSPs report DSRs containing hundreds of millions of usages. If you were to deploy this solution to production, would you make any changes to the database or process in order to import the usages? Which ones?

Note:

To manage Python dependencies, use any tool (e.g. pipenv) that interprets the Pipfile placed in the root folder.

For example, using pipenv, it's enough to run:

$ pipenv sync --dev

About

This is a test project for BMAT.
