This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

Testing POSTGAP output

The tests folder contains three types of quality control utilities for POSTGAP output, namely health checks, data checks and reports.

Health checks

These are unit tests that:

  • check the data schema, such as:
    • consistent format for all values in a column
    • uniqueness of values in a column, when grouped by values in another
  • can be run against either a partial or whole output file, such as:
    • an output file for a single EFO term
    • a large concatenated output file (across all EFO terms)
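
The two kinds of schema check listed above could be sketched as follows. This is a minimal illustration, not the code in the tests folder: the column names (`snp_id`, `gene_id`, `gene_symbol`), the sample rows, and both helper functions are hypothetical, and "uniqueness when grouped" is read here as "each value in the grouping column maps to exactly one value in the other column", which is an assumption.

```python
import re
from collections import defaultdict

# Hypothetical rows; the column names are illustrative, not the real POSTGAP schema.
ROWS = [
    {"snp_id": "rs123", "gene_id": "ENSG00000001", "gene_symbol": "ABC1"},
    {"snp_id": "rs456", "gene_id": "ENSG00000001", "gene_symbol": "ABC1"},
    {"snp_id": "rs789", "gene_id": "ENSG00000002", "gene_symbol": "XYZ2"},
]


def column_matches(rows, column, pattern):
    """Return True if every value in `column` fully matches the regex `pattern`."""
    regex = re.compile(pattern)
    return all(regex.fullmatch(row[column]) is not None for row in rows)


def one_value_per_group(rows, group_column, value_column):
    """Return True if each value of `group_column` maps to exactly one value of `value_column`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[group_column]].add(row[value_column])
    return all(len(values) == 1 for values in seen.values())
```

Because both helpers only look at the rows they are given, checks of this kind work equally well on a single-term file or on a concatenated file.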

Data checks

These are unit tests that:

  • check biological expectations, such as:
    • filtering of the MHC region
    • filtering of trans relations (i.e. when a gene and its SNP are on different chromosomes)
  • can be run against a whole output file only
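
As a rough sketch of the biological checks above, the snippet below finds rows that a filtered output should no longer contain. Everything in it is an assumption made for illustration: the column names (`snp_chrom`, `snp_pos`, `gene_chrom`), the sample rows, and especially the MHC bounds, which are placeholder values rather than the coordinates POSTGAP uses.

```python
# Placeholder bounds for the MHC region on chromosome 6; illustrative only.
MHC_CHROM = "6"
MHC_START = 28_000_000
MHC_END = 34_000_000

# Hypothetical rows; the column names are not taken from the real POSTGAP schema.
ROWS = [
    {"snp_id": "rs1", "snp_chrom": "6", "snp_pos": 30_000_000, "gene_chrom": "6"},
    {"snp_id": "rs2", "snp_chrom": "1", "snp_pos": 1_000_000, "gene_chrom": "2"},
    {"snp_id": "rs3", "snp_chrom": "1", "snp_pos": 2_000_000, "gene_chrom": "1"},
]


def rows_in_mhc(rows):
    """Return rows whose SNP lies inside the placeholder MHC interval."""
    return [
        row for row in rows
        if row["snp_chrom"] == MHC_CHROM and MHC_START <= row["snp_pos"] <= MHC_END
    ]


def trans_rows(rows):
    """Return rows where the SNP and the gene are on different chromosomes."""
    return [row for row in rows if row["snp_chrom"] != row["gene_chrom"]]
```

A data check would then assert that both functions return an empty list for the output file under test.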

Reports

These are summary or metadata files that:

  • can be generated for either a partial or whole output file
  • present summary statistics to allow comparison between POSTGAP output files
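
To make "summary statistics" concrete, here is one possible shape for such a summary: a row count plus the number of distinct values in a few columns. The function, the column names, and the sample rows are hypothetical; the actual reports may compute different statistics.

```python
def summarise(rows, columns):
    """Return the row count and the number of distinct values for each named column."""
    summary = {"row_count": len(rows)}
    for column in columns:
        summary[f"distinct_{column}"] = len({row[column] for row in rows})
    return summary


# Hypothetical rows standing in for two output files to compare.
OLD_ROWS = [
    {"snp_id": "rs1", "gene_id": "ENSG00000001"},
    {"snp_id": "rs2", "gene_id": "ENSG00000001"},
]
NEW_ROWS = OLD_ROWS + [{"snp_id": "rs3", "gene_id": "ENSG00000002"}]

old_summary = summarise(OLD_ROWS, ["snp_id", "gene_id"])
new_summary = summarise(NEW_ROWS, ["snp_id", "gene_id"])
```

Comparing `old_summary` with `new_summary` key by key gives a quick view of how two output files differ in size and coverage.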

Usage

Installation requirements

Assumption: you have Python 3 installed.

Ideally, use virtualenv as follows:

# (in tests folder)
virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt

Run tests against a file

For a file in POSTGAP TSV format, run:

python runner.py ./sample_data/postgap.20180108.asthma.tsv.gz
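
For readers who want to inspect such a file by hand, a gzipped TSV can be loaded with the Python standard library alone. This is a sketch under the assumption that the first line of the file is a header row; `read_tsv_gz` is a hypothetical helper and is not necessarily how runner.py reads its input.

```python
import csv
import gzip


def read_tsv_gz(path):
    """Read a gzipped, tab-separated file with a header row into a list of dicts."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Each row comes back as a dictionary keyed by column name, with every value as a string, so numeric columns need converting before comparison.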

Run report generator against a file

For a file in POSTGAP TSV format, run:

python reporter.py ./sample_data/postgap.20180108.asthma.tsv.gz

This will produce a file in tests/__reports__ with the filename format <input_file>.REPORT.<timestamp>.ipynb.