fetch-stats

Fetches and cleans data from NSO websites and publishes it in a standardised tidy data format.

This work has two goals

The data files follow a simple timescale,observation format. Time is YYYY-MM, and the observation is a percentage change. For example:

month,observation
1996-01,47.56
1996-02,43.645
1996-03,41.9048
...
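As a minimal sketch of how a consumer might read one of these files, the snippet below parses the two-column format with Python's standard library. The function name `parse_observations` and the inline sample are illustrative assumptions, not part of this repo's code.

```python
import csv
import io
from datetime import datetime


def parse_observations(text):
    """Parse month,observation CSV text into a list of (month, value) tuples.

    Hypothetical helper for illustration; not taken from this repo.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # strptime raises ValueError if the month cannot be parsed as year-month
        datetime.strptime(record["month"], "%Y-%m")
        rows.append((record["month"], float(record["observation"])))
    return rows


# Sample rows copied from the example above
sample = "month,observation\n1996-01,47.56\n1996-02,43.645\n1996-03,41.9048\n"
observations = parse_observations(sample)
print(observations[0])
```

To read a real file, pass its contents instead of `sample`, for example `parse_observations(open(path).read())` with `path` pointing at a file in the ./data directory.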

These are the statistics that are fetched, reformatted and stored in the ./data directory:

In almost all cases the data file is downloaded and read in (except for the Philippines, where the numbers were hard-coded). Preferably the files would be JSON or CSV, but some countries publish PDFs or XLS files. The online location of each file, along with other metadata, is in the data/nso_stats_metadata.json file.

It is also deployed as a GitHub Action that runs several times between 6am and 10am UTC, so some of the statistics should stay up to date. You can view this GitHub Action in .github/workflow/fetch_stats.yaml. However, given how variable these statistics publications are, the action may well break at some point if a published format changes.

Dependencies

  • Java 8+ (for Tabula to read PDFs)
  • Python 3.10+
    • It likely works on older versions of Python, but this hasn't been tested

Setup

Clone this repo

git clone https://github.com/FullFact/nso-stats-fetcher.git

Install required libraries

Either

poetry install

or

pip install -r requirements.txt

To run the scripts and fetch updated versions of all the statistics data, run:

python src/nsofetch/fetch_all.py

Or run a single country's script on its own. We use ISO 3166 country codes for standardised country names.