Error: results missing for single sample #283
Comments
Thanks for the bug report! Sorry, I can't reproduce on the dev branch. Here's what I tried:
I noticed @smlmbrt is the expert (and out of office 🌴 until next week), but perhaps low variant match rates could contribute to the NA values. Some changes were made to the ancestry normalisation steps in the most recent release to handle low-variance cases.
Thanks for testing it. I'll try again. The match rates were high (99.x%) in the alpha 4 run, and the genome is imputed (20-30 million variants).
I tried again after creating a clean new setup and got the same results.
Version
I think @nebfield is right: it has probably triggered this exception, which should only be applied to the target when there are more than 3 samples: https://github.com/PGScatalog/pgscatalog_utils/blob/b5962bf5f12bb2aba9d51a3c569a0d831072ecf0/pgscatalog_utils/ancestry/tools.py#L250-L253
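To illustrate why a single-sample target can end up with NA values, here is a minimal Python sketch of a guarded within-target standardisation. It is hypothetical: `standardise_within_target` and `MIN_TARGET_SAMPLES` are made-up names, and the threshold only mirrors the "more than 3 samples" condition described above; it does not reproduce the linked pgscatalog_utils code. With one sample the sample SD is undefined (Python's `statistics.stdev` raises `StatisticsError` for fewer than two values), so the sketch skips the step instead.

```python
import statistics

# Hypothetical threshold mirroring the "more than 3 samples" condition
# from the comment; NOT taken from pgscatalog_utils.
MIN_TARGET_SAMPLES = 3


def standardise_within_target(scores, min_samples=MIN_TARGET_SAMPLES):
    """Z-score each target sample against the target's own mean and SD.

    Returns None (skip the step) when there are too few samples for a
    meaningful estimate, or when the scores have zero variance.
    """
    if len(scores) <= min_samples:
        # A single sample has no defined sample SD, so any z-score
        # computed from it would be NA; skip instead.
        return None
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    if sd == 0:
        # Low/zero-variance case: dividing by sd would be undefined.
        return None
    return [(s - mean) / sd for s in scores]
```

For example, `standardise_within_target([1.2])` returns `None` rather than raising, while a target of four or more varied scores is standardised normally.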
@smlmbrt Great! I'll try a test with more samples sometime. Also, is it possible to recover z-scores and percentiles that are not normalized for ancestry from the SUM values alone? My understanding is that SUM is calculated only on variants shared by the submitted sample(s), the reference panel (e.g. 1000G), and the scorefile. Since the variants are the same in the reference and submitted samples, I assume it would be a "fair" comparison (i.e., no normalization for the number of matched variants is needed).
The percentiles and z-scores for the most similar population are not normalised; they just use that population as the reference distribution.
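A minimal Python sketch of comparing a target SUM against the most similar population's reference distribution: take the reference samples' SUM values, compute their mean and SD for a z-score, and rank the target among them for an empirical percentile. The function name, the mid-rank percentile convention, and the inputs are assumptions for illustration, not the pipeline's actual implementation; it also assumes target and reference SUMs were computed on the same variant set.

```python
import statistics
from bisect import bisect_left, bisect_right


def compare_to_reference(target_sum, reference_sums):
    """Z-score and empirical percentile of a target SUM vs reference SUMs.

    reference_sums: SUM values for reference-panel samples from the
    population most similar to the target (hypothetical input).
    """
    mean = statistics.mean(reference_sums)
    sd = statistics.stdev(reference_sums)
    z = (target_sum - mean) / sd

    ranked = sorted(reference_sums)
    # Mid-rank convention: values strictly below, plus half of any ties.
    below = bisect_left(ranked, target_sum)
    ties = bisect_right(ranked, target_sum) - below
    percentile = 100 * (below + 0.5 * ties) / len(ranked)
    return z, percentile
```

Other percentile conventions (e.g. counting only values strictly below) shift results slightly at ties; the choice here is just one reasonable option.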
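Correct: SUM is calculated only on the variants shared by the target sample(s), the reference panel, and the scorefile.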
I think this will depend on your use case. The reason for using reference populations is that the mean and variance of the PGS distribution are determined by allele frequencies and LD. If these are not matched, then an individual's relative position in the distribution will be incorrect.
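To illustrate how allele frequencies shape the distribution, here is a sketch of the expected mean and variance of an additive score Σ βᵢxᵢ, where xᵢ is the effect-allele dosage (0, 1, or 2). It assumes Hardy-Weinberg equilibrium and no LD between variants, so each dosage has mean 2pᵢ and variance 2pᵢ(1 - pᵢ); it deliberately leaves out the LD contribution. The function name, weights, and frequencies are hypothetical.

```python
def expected_pgs_moments(betas, freqs):
    """Expected mean and variance of an additive PGS.

    Assumes Hardy-Weinberg equilibrium and independent variants (no LD).
    betas: per-variant effect weights; freqs: effect-allele frequencies.
    """
    if len(betas) != len(freqs):
        raise ValueError("betas and freqs must have the same length")
    # E[x_i] = 2 p_i under HWE
    mean = sum(2 * p * b for b, p in zip(betas, freqs))
    # Var[x_i] = 2 p_i (1 - p_i); no covariance terms without LD
    var = sum(2 * p * (1 - p) * b * b for b, p in zip(betas, freqs))
    return mean, var


# Same hypothetical weights, two hypothetical populations' frequencies.
betas = [0.5, -0.2]
print(expected_pgs_moments(betas, [0.1, 0.5]))
print(expected_pgs_moments(betas, [0.4, 0.2]))
```

With identical weights, the two frequency sets give different means and variances, so the same SUM would map to different z-scores and percentiles. LD adds covariance terms 2βᵢβⱼCov(xᵢ, xⱼ) to the variance, which this sketch omits.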
Description of the bug
In the current dev build, the report is generated, but it does not contain any columns except SUM.
This happens in both the .html report and the raw testfile_pgs.txt.gz file.
In that file, only the SUM column is populated.
However, with the previous version (alpha 4), all columns are correctly populated (although the run technically fails at the report-making step, see #242).
I know the dev build isn't released yet, but this might also happen on alpha 5 (I'm unable to test it because of the _vcf filename error).
Command used and terminal output
Relevant files
No response
System information
Ubuntu, Docker, Singularity, current Nextflow