Mercury

Mercury is a data enrichment service for Analogue. It is primarily used to extract rich data and images for use on Analogue (people, topics, information, etc.).

Endpoint

The live endpoint is available at: https://analogue-mercury.herokuapp.com/get

Pass in the parameter url with a valid URL to get data.

GET https://analogue-mercury.herokuapp.com/get?url=https://www.youtube.com/watch?v=dzqpfu5izjE
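Because the target URL contains its own `?` and `=` characters, it is safer to percent-encode it before passing it as the `url` parameter. The following is a minimal sketch using only the Python standard library; the helper name `build_mercury_request` is hypothetical and not part of Mercury.

```python
from urllib.parse import urlencode

MERCURY_ENDPOINT = "https://analogue-mercury.herokuapp.com/get"


def build_mercury_request(target_url: str) -> str:
    """Return the full GET URL with the target percent-encoded as `url`."""
    return f"{MERCURY_ENDPOINT}?{urlencode({'url': target_url})}"


if __name__ == "__main__":
    print(build_mercury_request("https://www.youtube.com/watch?v=dzqpfu5izjE"))
```

The resulting string can be fetched with any HTTP client, for example `urllib.request.urlopen`.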

Running locally

Install Python3 (setup guide) and follow the Flask installation guide.

Create virtualenv

python3 -m venv mercury

Activate virtualenv

source mercury/bin/activate

Install requirements

pip install -r requirements.txt

Add the following to your local .env file to run in debug mode

FLASK_ENV=development

Run app.py from the root to start Flask

python3 app.py

Copy paste these keys into your .env file

Project Scope

From a UX perspective, the ideal solution is to return data as fast as possible when someone adds a URL. So it would return the simple data first (url, image, description, medium type), and if the URL is new and needs to be enriched, we enrich it in the background by hitting the appropriate APIs.

So maybe there are two endpoints, one with a quick response (no enriching) and one that does the full enrichment. We can discuss and figure out the best solution together.
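The quick-then-enrich idea can be sketched as follows. This is only an illustration under stated assumptions, not the project's design: the in-memory `_store`, the thread pool, and the function names `quick_data`, `enrich`, and `add_url` are all hypothetical stand-ins for whatever database and background job system is actually used.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store standing in for a real database.
_store: Dict[str, dict] = {}
_executor = ThreadPoolExecutor(max_workers=2)


def quick_data(url: str) -> dict:
    """Placeholder for the fast path (meta tags only, no API calls)."""
    return {"url": url, "title": None, "image": None, "description": None}


def enrich(url: str) -> dict:
    """Placeholder for the slow path that would call external APIs."""
    record = _store[url]
    record["enriched"] = True
    return record


def add_url(url: str) -> Tuple[dict, Optional[Future]]:
    """Return a snapshot of the stored data immediately.

    Enrichment is scheduled in the background only when the URL is new;
    for a known URL the second element of the result is None.
    """
    if url in _store:
        return dict(_store[url]), None
    _store[url] = quick_data(url)
    snapshot = dict(_store[url])
    return snapshot, _executor.submit(enrich, url)
```

The caller gets the simple data right away and can ignore the returned future, or wait on it when the enriched record is needed.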

Supports the following URLs and APIs. Example URLs linked.

Medium	URLs	APIs
Book	https://goodreads.com, https://amazon.com	Google Books API for data, authors, topics; OpenLibrary for image covers; Amazon solution TBD
Music, Podcast	Spotify (song, album); Apple (show, episode)	Spotify API; Apple TBD
Film, TV	IMDB (film, show, episode)	OMDB API for data; TMDB API for people, trailers, etc.
Art, WikiArt	Artsy; WikiArt	Artsy API; WikiArt API

Quick response endpoint `/get`

This endpoint will be used to get the initial data as quickly as possible. Ideally it doesn't hit any APIs, to save time for the user. But it might have to hit APIs to determine the specific medium and form type (e.g. for IMDB links, films vs TV shows).

JSON response:

{
  title: 'url title from og or twitter or <title> tag',
  url: 'CANONICAL_URL_NORMALIZED', // shouldn't have query params, except for youtube (e.g. ?v=afdsafxxx)
  medium: 'one of the medium types mapped below',
  form: 'one of the form types mapped below',
  image: 'url to image from og or twitter tags or first image in html',
  description: 'short description from og or twitter or meta tags or first paragraph of html'
}
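The query-parameter rule for the `url` field could be implemented roughly like this. It is a sketch under assumptions: it only strips query strings and fragments (keeping `v` on YouTube watch links) and does not look at a page's canonical link tag; the function name `normalize_url` and the `YOUTUBE_HOSTS` set are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Assumption: these hosts are the ones treated as YouTube.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def normalize_url(raw_url: str) -> str:
    """Drop query params and fragments, keeping only `v` for YouTube."""
    parts = urlsplit(raw_url)
    host = parts.netloc.lower()
    query = ""
    if host in YOUTUBE_HOSTS:
        video_ids = parse_qs(parts.query).get("v")
        if video_ids:
            # Keep only the video id; tracking and timestamp params are dropped.
            query = urlencode({"v": video_ids[0]})
    return urlunsplit((parts.scheme, host, parts.path, query, ""))
```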

Medium mapping

Form	Medium	URLs
`video`	`video_link`	youtube.com, vimeo.com, ted.com
`video`	`film`	imdb.com film url (example)
`video`	`tv`	imdb.com show url (example)
`video`	`tv_episode`	imdb.com episode url (example)
`audio`	`song`	spotify.com song url (example)
`audio`	`album`	spotify.com album url (example)
`audio`	`playlist`	spotify.com playlist url (example)
`audio`	`podcast`	spotify.com podcast show url (example)
`audio`	`podcast_episode`	spotify.com podcast episode url (example)
`audio`	`audio_link`	soundcloud.com
`text`	`book`	amazon.com
`text`	`link`	default form and medium (most urls)
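The mapping above can be expressed as a small lookup. This is a sketch, not Mercury's actual code: the names `HOST_MAP`, `SPOTIFY_KINDS`, and `classify` are hypothetical, and it assumes Spotify links have the shape `open.spotify.com/<kind>/<id>`. IMDB links only get their form here, since telling films, shows, and episodes apart may require an API call.

```python
from urllib.parse import urlsplit

# Hosts that map directly to one (form, medium) pair in the table.
HOST_MAP = {
    "youtube.com": ("video", "video_link"),
    "vimeo.com": ("video", "video_link"),
    "ted.com": ("video", "video_link"),
    "soundcloud.com": ("audio", "audio_link"),
    "amazon.com": ("text", "book"),
}

# Assumption: the first path segment of a Spotify link names the item kind.
SPOTIFY_KINDS = {
    "track": "song",
    "album": "album",
    "playlist": "playlist",
    "show": "podcast",
    "episode": "podcast_episode",
}


def classify(url: str):
    """Return a (form, medium) tuple for a URL; medium may be None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in HOST_MAP:
        return HOST_MAP[host]
    if host == "open.spotify.com":
        segments = [s for s in parts.path.split("/") if s]
        if segments and segments[0] in SPOTIFY_KINDS:
            return ("audio", SPOTIFY_KINDS[segments[0]])
    if host == "imdb.com":
        # film vs tv vs tv_episode is not decided from the URL in this sketch.
        return ("video", None)
    # Default form and medium for most URLs.
    return ("text", "link")
```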

Rich response endpoint: `/enrich`

This endpoint will be used to enrich the data (through a background job in Rails). So this will provide full rich responses, including related data (e.g. authors for books from Google, director for films from IMDB).

Additional Notes

Leverages Open Graph and Twitter meta tags.
The scraper does selective parsing: it builds a parse tree only for specific tags, not for every tag in the HTML document.
The link URL is sent using the GET method only, as it is faster than POST and PUT.
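To illustrate the first two notes, here is a minimal sketch that reads only `<meta>` and `<title>` tags and ignores everything else. It uses the standard library's `html.parser` as a stand-in, since the parser Mercury actually uses is not stated here; the names `MetaTagParser` and `extract_basic` are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect Open Graph / Twitter meta tags and the <title> text only."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            key = attr.get("property") or attr.get("name") or ""
            if key.startswith(("og:", "twitter:")) and attr.get("content"):
                # Keep the first value seen for each key.
                self.meta.setdefault(key, attr["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_basic(html: str) -> dict:
    """Prefer og tags, then twitter tags, then the <title> text."""
    parser = MetaTagParser()
    parser.feed(html)
    m = parser.meta
    return {
        "title": m.get("og:title") or m.get("twitter:title") or parser.title.strip() or None,
        "image": m.get("og:image") or m.get("twitter:image"),
        "description": m.get("og:description") or m.get("twitter:description"),
    }
```

The fallbacks to the first image or first paragraph of the page are left out of this sketch.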
