Mercury is a data enrichment service for Analogue. It is primarily used to extract rich data and images (people, topics, information, etc.) for use on Analogue.
The live endpoint is at https://analogue-mercury.herokuapp.com/get. Pass a valid URL in the `url` parameter to get data:

```
GET https://analogue-mercury.herokuapp.com/get?url=https://www.youtube.com/watch?v=dzqpfu5izjE
```
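A minimal Python sketch of calling the live endpoint, assuming it returns the JSON body described for the quick response. `build_request_url` and `fetch_metadata` are hypothetical helper names. Percent-encoding the target URL is the main point: the YouTube example carries its own `?v=...` query, and a target with `&` in it would otherwise be split into separate Mercury parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the live endpoint, as given in this README.
MERCURY_GET = "https://analogue-mercury.herokuapp.com/get"


def build_request_url(target_url: str) -> str:
    """Return the Mercury request URL with `target_url` percent-encoded.

    urlencode escapes ':', '/', '?', '=' and '&' inside the target, so its own
    query string cannot be confused with Mercury's `url` parameter.
    """
    return f"{MERCURY_GET}?{urlencode({'url': target_url})}"


def fetch_metadata(target_url: str, timeout: float = 10.0) -> dict:
    """GET the endpoint and decode its JSON body (performs a network call)."""
    with urlopen(build_request_url(target_url), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `build_request_url("https://www.youtube.com/watch?v=dzqpfu5izjE")` produces `https://analogue-mercury.herokuapp.com/get?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Ddzqpfu5izjE`.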
Install Python 3 (setup guide) and follow the Flask installation guide.
- Create a virtualenv:
  ```
  python3 -m venv mercury
  ```
- Activate the virtualenv:
  ```
  source mercury/bin/activate
  ```
- Install the requirements:
  ```
  pip install -r requirements.txt
  ```
- Add the following to `.env` locally to run in debug mode:
  ```
  FLASK_ENV=development
  ```
- Run `app.py` from the project root to start Flask:
  ```
  python3 app.py
  ```
- Copy and paste these keys into your `.env` file.
From a UX perspective, the ideal solution is to return data as fast as possible when someone adds a URL. Mercury would return the simple data first (URL, image, description, medium type) and, if the URL is new and needs to be enriched, enrich it in the background by calling the appropriate APIs.

So there could be two endpoints: one with a quick response (no enrichment) and one that does the full enrichment. We can discuss and settle on the best solution together.
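The "quick response first, enrich in the background" idea can be illustrated in-process with a thread pool. This is only a sketch: `quick_extract`, `enrich` and `Enricher` are hypothetical names with placeholder bodies, and the plan described for the enrichment endpoint runs enrichment from a background job in Rails rather than inside the Flask process.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from threading import Lock
from typing import Dict, Optional


def quick_extract(url: str) -> dict:
    """Hypothetical fast path: in practice, read og/twitter tags, no external APIs."""
    return {"url": url, "image": None, "description": None, "medium": "link"}


def enrich(url: str) -> dict:
    """Hypothetical slow path: in practice, call Google Books, OMDB, Spotify, etc."""
    return {"url": url, "enriched": True}


class Enricher:
    """Answer with quick data at once; enrich each new URL once, in the background."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, Future] = {}
        self._lock = Lock()

    def handle(self, url: str) -> dict:
        data = quick_extract(url)  # returned to the caller without waiting
        with self._lock:
            if url not in self._jobs:  # only URLs not seen before get enriched
                self._jobs[url] = self._pool.submit(enrich, url)
        return data

    def enriched(self, url: str, timeout: Optional[float] = None) -> dict:
        """Wait for (or reuse) the background enrichment result for `url`."""
        return self._jobs[url].result(timeout=timeout)
```

In a two-endpoint design, `handle` plays the role of the quick endpoint and `enriched` the full one. The in-memory `_jobs` dict would not survive restarts or multiple dynos, which is one reason to hand enrichment to a persistent job queue instead.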
Mercury supports the following URLs and APIs. Example URLs are linked.
Medium | URLs | APIs |
---|---|---|
Book | https://goodreads.com, https://amazon.com | Google Books API for data, authors, topics; OpenLibrary for image covers; Amazon solution TBD |
Music, Podcast | Spotify (song, album), Apple (show, episode) | Spotify API; Apple TBD |
Film, TV | IMDB (film, show, episode) | OMDB API for data; TMDB API for people, trailers, etc. |
Art | Artsy, WikiArt | Artsy API, WikiArt API |
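One hypothetical way to route a URL to the APIs in this table is a lookup keyed on hostname. `ROUTES` and `apis_for` are illustrative names, and the Artsy (`artsy.net`) and WikiArt (`wikiart.org`) domains are assumptions, since the table only names the sites.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Host -> (medium, APIs), transcribed from the medium/API table.
# Apple is omitted because its solution is still TBD; Amazon URLs use the
# Book APIs, although the Amazon-specific part is also TBD.
ROUTES = {
    "goodreads.com": ("Book", ["Google Books API", "OpenLibrary"]),
    "amazon.com": ("Book", ["Google Books API", "OpenLibrary"]),
    "spotify.com": ("Music, Podcast", ["Spotify API"]),
    "imdb.com": ("Film, TV", ["OMDB API", "TMDB API"]),
    "artsy.net": ("Art", ["Artsy API"]),
    "wikiart.org": ("Art", ["WikiArt API"]),
}


def apis_for(url: str) -> Optional[Tuple[str, List[str]]]:
    """Return (medium, APIs) for a supported host, matching subdomains too."""
    host = (urlparse(url).hostname or "").lower()
    for domain, route in ROUTES.items():
        # Exact host or a true subdomain (www.imdb.com), but not notimdb.com.
        if host == domain or host.endswith("." + domain):
            return route
    return None
```

Matching on the hostname suffix rather than a substring keeps look-alike domains such as `notimdb.com` from being routed to the IMDB APIs.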
The quick-response endpoint will be used to get the initial data as quickly as possible. Ideally it doesn't call any APIs at all, to save the user time. However, it may have to call APIs to determine the specific medium and form type (e.g. for IMDB links, to tell films from TV shows).
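For IMDB links, one option for telling films from TV shows is the OMDb API's `Type` field (`movie`, `series` or `episode`). The sketch below maps it onto the form/medium values from the form/medium table. The helper names and the `text`/`link` fallback are assumptions, and OMDb requires an API key.

```python
import json
import re
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# OMDb "Type" values mapped to (form, medium) from the form/medium table.
OMDB_TYPES: Dict[str, Tuple[str, str]] = {
    "movie": ("video", "film"),
    "series": ("video", "tv"),
    "episode": ("video", "tv_episode"),
}
IMDB_TITLE = re.compile(r"imdb\.com/title/(tt\d+)")


def imdb_id(url: str) -> Optional[str]:
    """Extract the IMDB title ID (e.g. tt0111161) from an imdb.com URL."""
    match = IMDB_TITLE.search(url)
    return match.group(1) if match else None


def form_medium_from_omdb(record: dict) -> Tuple[str, str]:
    """Map an OMDb record to (form, medium); unknown types fall back to text/link."""
    return OMDB_TYPES.get(record.get("Type", ""), ("text", "link"))


def lookup_imdb(url: str, api_key: str) -> Tuple[str, str]:
    """Network call: fetch the OMDb record for an IMDB URL and classify it."""
    title_id = imdb_id(url)
    if title_id is None:
        raise ValueError(f"not an IMDB title URL: {url}")
    query = urlencode({"i": title_id, "apikey": api_key})
    with urlopen(f"https://www.omdbapi.com/?{query}", timeout=10) as resp:
        return form_medium_from_omdb(json.load(resp))
```

A record without a `Type` field (for example, an error response) falls back to `text`/`link` rather than raising, so the quick response can still be returned.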
JSON response:

```json
{
  "title": "url title from og or twitter or <title> tag",
  "url": "CANONICAL_URL_NORMALIZED",
  "medium": "one of the medium types mapped below",
  "form": "one of the form types mapped below",
  "image": "url to image from og or twitter tags or first image in html",
  "description": "short description from og or twitter or meta tags or first paragraph of html"
}
```

`url` shouldn't have query params, except for YouTube (e.g. `?v=afdsafxxx`).
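The rule for `url` (no query parameters except YouTube's `v`) implies a normalization step. This is a minimal sketch, assuming "canonical" means a lowercased scheme and host with the query string and fragment removed; it does not follow redirects or read `<link rel="canonical">`. `normalize_url` and `YOUTUBE_HOSTS` are hypothetical names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Assumed set of YouTube hosts whose `v` parameter must be preserved.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def normalize_url(url: str) -> str:
    """Drop the query string and fragment, keeping only YouTube's `v` parameter."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    query = ""
    if host in YOUTUBE_HOSTS:
        video_id = parse_qs(parts.query).get("v")
        if video_id:
            query = urlencode({"v": video_id[0]})
    # Scheme and netloc are lowercased; the path is kept as-is.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, query, ""))
```

For example, `normalize_url("https://www.youtube.com/watch?v=dzqpfu5izjE&t=42s")` keeps only `?v=dzqpfu5izjE`, while tracking parameters such as `utm_source` are stripped from every other URL.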
Form | Medium | URLs |
---|---|---|
video | video_link | youtube.com, vimeo.com, ted.com |
video | film | imdb.com film url (example) |
video | tv | imdb.com show url (example) |
video | tv_episode | imdb.com episode url (example) |
audio | song | spotify.com song url (example) |
audio | album | spotify.com album url (example) |
audio | playlist | spotify.com playlist url (example) |
audio | podcast | spotify.com podcast show url (example) |
audio | podcast_episode | spotify.com podcast episode url (example) |
audio | audio_link | soundcloud.com |
text | book | amazon.com |
text | link | default form and medium (most urls) |
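The form/medium table translates fairly directly into an ordered list of URL rules. This is a hypothetical sketch: the Spotify path prefixes (`/track/`, `/album/`, `/playlist/`, `/show/`, `/episode/`) are assumptions about its URL layout, and IMDB title URLs come back with medium `None` because film, tv and tv_episode cannot be told apart from the URL alone and need an API lookup.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# (host, path pattern, form, medium), checked in order; first match wins.
RULES = [
    ("youtube.com", r"", "video", "video_link"),
    ("vimeo.com", r"", "video", "video_link"),
    ("ted.com", r"", "video", "video_link"),
    ("imdb.com", r"^/title/tt\d+", "video", None),  # medium needs an API lookup
    ("spotify.com", r"^/track/", "audio", "song"),
    ("spotify.com", r"^/album/", "audio", "album"),
    ("spotify.com", r"^/playlist/", "audio", "playlist"),
    ("spotify.com", r"^/show/", "audio", "podcast"),
    ("spotify.com", r"^/episode/", "audio", "podcast_episode"),
    ("soundcloud.com", r"", "audio", "audio_link"),
    ("amazon.com", r"", "text", "book"),
]
DEFAULT = ("text", "link")  # most URLs


def classify(url: str) -> Tuple[str, Optional[str]]:
    """Return (form, medium) for a URL using the first matching rule."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for domain, path_pattern, form, medium in RULES:
        on_domain = host == domain or host.endswith("." + domain)
        # An empty pattern matches any path on that domain.
        if on_domain and re.search(path_pattern, parts.path):
            return form, medium
    return DEFAULT
```

Because this only inspects the URL, it fits the goal of the quick endpoint not calling APIs; only the IMDB case would need the extra lookup.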
The enrichment endpoint will be used to enrich the data (through a background job in Rails). It will provide full, rich responses, including related data (e.g. authors for books from Google Books, directors for films from IMDB).
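As an example of the enrichment step for books, the sketch below searches the Google Books API by ISBN and takes authors and categories (as topics) from the first result, with an OpenLibrary cover URL as the medium/API table suggests. How an ISBN is obtained from a Goodreads or Amazon URL is not covered, and `enrich_book`, `fetch_book` and the output field names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

GOOGLE_BOOKS = "https://www.googleapis.com/books/v1/volumes"


def enrich_book(volumes: dict, isbn: Optional[str] = None) -> Optional[dict]:
    """Pull related data out of a Google Books `volumes` search response.

    Uses the first match; returns None when the search found nothing.
    """
    items = volumes.get("items") or []
    if not items:
        return None
    info = items[0].get("volumeInfo", {})
    result = {
        "title": info.get("title"),
        "authors": info.get("authors", []),
        "topics": info.get("categories", []),
        "description": info.get("description"),
    }
    if isbn:
        # OpenLibrary cover image by ISBN (size suffix S, M or L).
        result["image"] = f"https://covers.openlibrary.org/b/isbn/{isbn}-L.jpg"
    return result


def fetch_book(isbn: str) -> Optional[dict]:
    """Network call: search Google Books by ISBN and enrich the first hit."""
    query = urlencode({"q": f"isbn:{isbn}"})
    with urlopen(f"{GOOGLE_BOOKS}?{query}", timeout=10) as resp:
        return enrich_book(json.load(resp), isbn)
```

Keeping the parsing (`enrich_book`) separate from the HTTP call (`fetch_book`) makes the mapping testable with canned responses, and leaves the caller free to run the request from a background job.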
- Leverages Open Graph and Twitter meta tags
- The scraper will do selective parsing: it builds a parse tree only for specific tags, not for every tag in the HTML document.
- The link URL is sent using the GET method only, since it is faster than POST and PUT.
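The Open Graph/Twitter and selective-parsing points can be sketched with the standard library's event-driven `html.parser`. Instead of building a tree, this parser records only `<meta>` tags, the `<title>`, the first `<img>` and the first `<p>`, then applies the fallback order given for the quick response's `title`, `image` and `description`. It stands in for whatever parser the scraper actually uses; `SelectiveMetaParser`, `first` and `quick_fields` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class SelectiveMetaParser(HTMLParser):
    """Record <meta>, the first <title>, <img> and <p>; ignore every other tag."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}
        self.title = ""
        self.first_img: Optional[str] = None
        self.first_p = ""
        self._in: Optional[str] = None  # "title" or "p" while inside one
        self._p_done = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            # og:* tags use `property`, twitter:* and description use `name`.
            key = a.get("property") or a.get("name")
            if key and a.get("content"):
                self.meta.setdefault(key.lower(), a["content"])
        elif tag == "img" and self.first_img is None:
            self.first_img = a.get("src")
        elif (tag == "title" and not self.title) or (tag == "p" and not self._p_done):
            self._in = tag

    def handle_endtag(self, tag):
        if tag == self._in:
            if tag == "p":
                self._p_done = True
            self._in = None

    def handle_data(self, data):
        # Text inside nested tags (<b>, <a>, ...) is still collected.
        if self._in == "title":
            self.title += data
        elif self._in == "p":
            self.first_p += data


def first(*values: Optional[str]) -> Optional[str]:
    """Return the first non-blank value, stripped, or None."""
    for v in values:
        if v and v.strip():
            return v.strip()
    return None


def quick_fields(html: str) -> Dict[str, Optional[str]]:
    """Build title/image/description with og -> twitter -> HTML fallbacks."""
    p = SelectiveMetaParser()
    p.feed(html)
    p.close()
    m = p.meta
    return {
        "title": first(m.get("og:title"), m.get("twitter:title"), p.title),
        "image": first(m.get("og:image"), m.get("twitter:image"), p.first_img),
        "description": first(m.get("og:description"), m.get("twitter:description"),
                             m.get("description"), p.first_p),
    }
```

Relative `src` values are returned as-is; resolving them against the page URL with `urllib.parse.urljoin` would be a natural next step.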