alpha.6 #274

Merged
merged 30 commits into from
May 24, 2024

smlmbrt
Member

@smlmbrt smlmbrt commented Apr 9, 2024

Changelog

Improvements

  • Migrate our custom python tools to new pygscatalog packages
    • Reference / target intersection now considers allele frequency and variant missingness to determine PCA eligibility
    • Downloads from PGS Catalog should be faster (async)
    • Packages are now documented 🥳
  • Update plink version to alpha 5.10 final (see issue #179, "Update plink version?")
  • Add docs describing cloud execution
  • Add correlation test comparing calculated scores against known good scores
  • When matching variants, matching logs are now written before scorefiles to improve debugging UX
  • Improvements to PCA quality (ensuring low missingness and suitable MAF for PCA-eligible variants in target samples).
    • This could allow us to implement MAF/missingness filters for scoring file variants in the future.
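
To illustrate the PCA eligibility idea in the list above, here is a minimal sketch of a MAF/missingness filter. It is not the pygscatalog implementation: the `Variant` class, the function names, and both thresholds (`MIN_MAF`, `MAX_MISSINGNESS`) are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the workflow's real
# defaults are not stated in this changelog.
MIN_MAF = 0.05
MAX_MISSINGNESS = 0.1


@dataclass
class Variant:
    variant_id: str
    alt_freq: float     # alternate allele frequency in the target samples
    missingness: float  # fraction of samples with a missing genotype call


def minor_allele_frequency(alt_freq: float) -> float:
    """Fold an allele frequency so it always describes the rarer allele."""
    return min(alt_freq, 1.0 - alt_freq)


def is_pca_eligible(
    variant: Variant,
    min_maf: float = MIN_MAF,
    max_missingness: float = MAX_MISSINGNESS,
) -> bool:
    """A variant is kept for PCA only if it is common enough and well genotyped."""
    return (
        minor_allele_frequency(variant.alt_freq) >= min_maf
        and variant.missingness <= max_missingness
    )


variants = [
    Variant("1:1000:A:G", alt_freq=0.30, missingness=0.01),
    Variant("1:2000:C:T", alt_freq=0.99, missingness=0.00),  # minor allele too rare
    Variant("1:3000:G:A", alt_freq=0.40, missingness=0.25),  # too many missing calls
]
eligible = [v.variant_id for v in variants if is_pca_eligible(v)]
print(eligible)
```

The same predicate could in principle be reused for scoring file variants, which is the future direction the last bullet hints at.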

Bug fixes

@smlmbrt smlmbrt linked an issue Apr 9, 2024 that may be closed by this pull request
nebfield and others added 13 commits April 19, 2024 12:06
this change affects people running the workflow directly from
github, e.g.

$ nextflow run pgscatalog/pgsc_calc ...

if --outdir isn't set, then the results folder can be in $NXF_HOME,
which is a hidden folder in the home directory by default. not a
helpful place for results to be!

this doesn't affect people running from a cloned repo directly
* add correlation test

* add correlation action

* fix download URL

* use scoring files from correlation archive

* get test profile working with pygscatalog

* integration updates

* fix correlation scorefile wildcard

* fix tests

* update plink2

* gzip afreq in plink2_vcf

* update custom scoring files for liftover

* fix match module test

* use local files in test suite

* fix singularity container definition

* check for environment variables with set -euxo

* logs are massive, don't upload, debug locally
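
For readers wondering what a correlation test against known good scores might look like, here is a rough sketch. It assumes scores are available as per-sample dictionaries and uses a Pearson correlation with a hypothetical cutoff (`min_r`); the actual test in the repository may load files differently and use another threshold or statistic.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def scores_agree(calculated, expected, min_r=0.999):
    """Compare per-sample scores; min_r is a hypothetical cutoff for this sketch."""
    if set(calculated) != set(expected):
        raise ValueError("sample IDs differ between calculated and expected scores")
    samples = sorted(expected)  # fixed order so both vectors line up
    r = pearson([calculated[s] for s in samples], [expected[s] for s in samples])
    return r >= min_r


# Made-up example values, not real PGS results.
expected = {"sample1": 0.10, "sample2": 0.25, "sample3": -0.40, "sample4": 0.05}
calculated = {"sample1": 0.1001, "sample2": 0.2499, "sample3": -0.4002, "sample4": 0.05}
print(scores_agree(calculated, expected))
```

Matching on sample ID before correlating avoids a false pass or fail caused by row order differences between the two score files.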
* Output allele frequencies along with missingness (for filtering variants)

* Add afreq to output

* Add afreq to intersect_variants.nf

* add afreq to intersect_thinned

* intersect with new pgscatalog-intersect application

* rebase

* Make verbose

* Remove duplication

* Use new output of intersect_variants in filtering

* Use new output of intersect_variants in intersect_variants.nf : keeps memory footprint very low (but higher I/O into tempfiles)

* Fix column index to PCA_ELIGIBLE (13)

* Fix awk statement that doesn't work with odd carriage return?

* Fix awk statement for True/False (not 0/1 as in previous version)

* Add in variant-based filters

---------

Co-authored-by: Benjamin Wingfield <bwingfield@ebi.ac.uk>
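
The three awk fixes in the commit above (column index 13, stray carriage returns, `True`/`False` instead of `0`/`1`) can be sketched in Python as follows. This is only an illustration of the filtering logic: the tab delimiter, the function name, and the example rows are assumptions, and the workflow itself uses awk.

```python
PCA_ELIGIBLE_COLUMN = 13  # 1-based column index, as named in the commit message


def pca_eligible_rows(lines, column=PCA_ELIGIBLE_COLUMN, delimiter="\t"):
    """Yield the fields of rows whose PCA_ELIGIBLE column is the string 'True'.

    The delimiter is an assumption for this sketch. Trailing carriage
    returns are stripped first, because a leftover '\\r' on the last field
    would make the string comparison fail.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) >= column and fields[column - 1] == "True":
            yield fields


# Made-up rows with 13 columns each; only the last column matters here.
header = "\t".join([f"COL{i}" for i in range(1, 13)] + ["PCA_ELIGIBLE"]) + "\n"
rows = [
    header,
    "\t".join(["a"] * 12 + ["True"]) + "\r\n",  # Windows-style line ending
    "\t".join(["b"] * 12 + ["False"]) + "\n",
    "\t".join(["c"] * 12 + ["1"]) + "\n",        # old 0/1 encoding no longer matches
]
kept = [fields[0] for fields in pca_eligible_rows(rows)]
print(kept)
```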
@nebfield nebfield merged commit 1321c1a into main May 24, 2024
234 checks passed
@nebfield nebfield deleted the dev branch May 24, 2024 09:05