Fetches and cleans data from NSO websites and publishes it in a standardised, tidy data format.

This work has two goals:

- Provide a database of well-formatted data that can be used in Full Fact’s Stats Checking tools.
- Highlight how much work is involved in collecting and comparing national statistics data across countries, as discussed in the write-up.

The data files follow a simple timescale,observation format. The time is given as YYYY-MM, and the observation is the percentage change. For example:

month,observation
1996-01,47.56
1996-02,43.645
1996-03,41.9048
...
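
A minimal sketch of reading one of these files with the Python standard library. The inline sample reuses the rows shown above, and `parse_observations` is a hypothetical helper name, not something this repo is known to provide.

```python
import csv
import io
from datetime import date

# Sample rows copied from the example above.
SAMPLE = """month,observation
1996-01,47.56
1996-02,43.645
1996-03,41.9048
"""


def parse_observations(text):
    """Parse month,observation CSV text into (date, float) pairs."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        year, month = row["month"].split("-")
        # Represent each YYYY-MM period by the first day of that month.
        rows.append((date(int(year), int(month), 1), float(row["observation"])))
    return rows


observations = parse_observations(SAMPLE)
print(observations[0])
```

To read a real file instead, pass the contents of a file from the ./data directory to the same function.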

These are the statistics that are fetched, reformatted and stored in the ./data directory:

Argentina
- Consumer price index – monthly year-on-year (source)
Ireland
- Consumer price index – monthly year-on-year (source)
Japan
- Consumer price index – monthly year-on-year (source)
Mexico
- Consumer price index – monthly year-on-year (source)
Nigeria
- Consumer price index – monthly year-on-year (source)
Philippines
- Consumer price index – monthly year-on-year (source)
UK
South Africa
- Consumer price index – monthly year-on-year (source)
- Producer price index – monthly year-on-year (source)
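
"Year-on-year" here means the percentage change against the same month one year earlier. For a source that publishes only index levels, the calculation could be sketched as below. The function name and the index values are made up for illustration, and this is not necessarily how the scripts in this repo derive the figures.

```python
def year_on_year(index_by_month):
    """Return {YYYY-MM: percentage change vs. the same month one year earlier}."""
    changes = {}
    for month, value in index_by_month.items():
        year, mm = month.split("-")
        previous = index_by_month.get(f"{int(year) - 1}-{mm}")
        if previous is None or previous == 0:
            # No comparable month a year earlier, so no rate can be computed.
            continue
        changes[month] = (value / previous - 1) * 100
    return changes


# Illustrative index levels, not real data.
index = {"2020-01": 100.0, "2020-02": 101.0, "2021-01": 105.0, "2021-02": 107.06}
rates = year_on_year(index)
```

Months without a matching month in the previous year are skipped rather than reported as zero.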

In almost all cases the data file is downloaded and read in (the exception is the Philippines, where the numbers were hard-coded). Ideally the files would be JSON or CSV, but some countries publish PDF or XLS files. The online location of each file, along with other metadata, is in the data/nso_stats_metadata.json file.
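
Because the source formats vary, a fetcher needs to choose a parsing strategy for each file. The sketch below picks one from the file extension in a URL. The `READERS` mapping, the `reader_for` name and the example.org URL are all hypothetical; the repo's own scripts and metadata file may be organised differently.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping from file extension to a parsing strategy label.
READERS = {
    ".json": "json",
    ".csv": "csv",
    ".xls": "spreadsheet",
    ".xlsx": "spreadsheet",
    ".pdf": "pdf-table-extraction",
}


def reader_for(url):
    """Pick a parsing strategy from the file extension in a source URL."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"No reader for {suffix or 'unknown'} files: {url}")


# Placeholder URL, not a real data source.
print(reader_for("https://example.org/stats/cpi.XLS"))
```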

It is also deployed as a GitHub Action which runs several times between 6am and 10am UTC, so some of the statistics should stay up to date. You can view this action in .github/workflows/fetch_stats.yaml. However, given how much these statistics sources vary, it wouldn't be surprising if the action breaks at some point when a published format changes.

Dependencies

Java 8+ (for Tabula to read PDFs)
Python 3.10+
- It likely works for older versions of Python, but it hasn't been tested

Setup

Clone this repo

git clone https://github.com/FullFact/nso-stats-fetcher.git

Install required libraries

Either

poetry install

or

pip install -r requirements.txt

To run the scripts and fetch updated versions of all the statistics data, run:

python src/nsofetch/fetch_all.py

Or run each country's script individually. We use ISO 3166 country codes as standardised country names.
