bfile works but pfile errors out #394
Comments
Thanks for the bug report. I think your problems are related to the columns in the pvar file. What column names do you have in the pvar file?
I've got several header lines and then the column names look as follows.
##FILTER=<ID=PASS,Description="All filters passed">
##filedate=2022.4.1
##contig=<ID=chr22>
...
#CHROM POS ID REF ALT FILTER INFO
22 11210964 chr22:11210964:C:A C A PASS AF=0.01485;MAF=0.01485;R2=0.8448;IMPUTED;rsID=rs1261998299
22 11211016 chr22:11211016:A:T A T PASS AF=0.01518;MAF=0.01518;R2=0.82616;IMPUTED;rsID=rs946399183
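A quick way to answer the column-name question for any pvar file is sketched below. It is only an illustration, not part of pgsc_calc: the helper names and the `CORE_COLUMNS` list are assumptions made for this example, and it assumes an uncompressed pvar that has a `#CHROM` header line.

```python
# Assumed minimal column set; the thread only identifies FILTER and INFO
# as the problematic extras, so this list is an illustration.
CORE_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT"]


def pvar_columns(path):
    """Return the column names from the first line that is not a '##' meta line.

    Assumes a plain-text pvar with a '#CHROM ...' header line; a headerless
    or zstd-compressed pvar is not handled by this sketch.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # skip VCF-style meta lines such as ##FILTER=...
            return line.split()
    return []


def extra_columns(path):
    """Return the columns that are not in the assumed core set."""
    return [name for name in pvar_columns(path) if name not in CORE_COLUMNS]
```

For the header shown above, `extra_columns("pg_imputed22.pvar")` would return `["FILTER", "INFO"]`.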
Thanks for the details. I think the problem is related to PGScatalog/pygscatalog#29. If you try:
$ plink2 --pfile pg_imputed22 --make-just-pvar cols=-xheader,-maybequal,-maybefilter,-maybeinfo,-maybecm --out pg_imputed22
This should overwrite the existing pvar file to remove the FILTER and INFO columns, which are causing the problem. If the plink command does remove the columns, please do the same for all of your chromosomes and test the calculator again 😄
This worked great. Thank you
Great, thanks! I'll leave this issue open because the calculator should ignore these extra columns automatically. We'll fix it in the next release 😄
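Until that fix lands, the per-chromosome workaround suggested in this thread could be scripted roughly as follows. This is a sketch under assumptions: the `pg_imputed{chrom}` prefix pattern is extrapolated from the single `pg_imputed22` example, the autosome range 1 to 22 is a guess about the dataset, and the function names are hypothetical. The plink2 arguments are copied verbatim from the command quoted in the thread.

```python
import subprocess

# Column modifiers copied from the plink2 command quoted in this thread.
COLS = "cols=-xheader,-maybequal,-maybefilter,-maybeinfo,-maybecm"


def cleanup_command(prefix, plink2="plink2"):
    """Build the plink2 argument list that rewrites <prefix>.pvar in place."""
    return [plink2, "--pfile", prefix, "--make-just-pvar", COLS, "--out", prefix]


def clean_all(chromosomes, pattern="pg_imputed{chrom}", dry_run=True):
    """Print (dry run) or execute the cleanup command for each chromosome.

    `pattern` is an assumed naming scheme; adjust it to match your files.
    """
    commands = [cleanup_command(pattern.format(chrom=c)) for c in chromosomes]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show what would be run
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands


if __name__ == "__main__":
    # Assumed autosomes 1..22; dry run by default so nothing is overwritten.
    clean_all(range(1, 23))
```

Because `--out` matches the input prefix, the existing pvar file is overwritten, so it may be worth keeping a copy before running with `dry_run=False`.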
Description of the bug
I can submit a pgsc_calc job successfully using bfile (plink bed); however, when I use the pfile format (plink pgen) it fails for unknown reasons (I can't interpret the error, anyway).
Input files are as follows
-rw-r----- 1 batzler bsi 23377581643 Dec 4 10:53 pg22.bed
-rw-r----- 1 batzler bsi 30106514 Dec 4 10:53 pg22.bim
-rw-r----- 1 batzler bsi 8074325 Dec 4 10:53 pg22.fam
-rw-r----- 1 batzler bsi 1396 Dec 4 10:53 pg22.log
-rw-r----- 1 batzler bsi 1294 Dec 4 13:34 pg_imputed22.log
-rw-r----- 1 batzler bsi 23982440039 Dec 4 13:34 pg_imputed22.pgen
-rw-r----- 1 batzler bsi 7259937 Dec 4 13:34 pg_imputed22.psam
-rw-r----- 1 batzler bsi 78753551 Dec 4 13:34 pg_imputed22.pvar
plink bed/binary files were created from the pgen files
$PLINK2 --pfile pg_imputed22 --make-bed --out pg$CHR
When I run the pipeline using format bfile, everything executes properly.
When running with the pfile format and the pgen files, it fails. The error traceback is as follows:
Traceback (most recent call last):
File "/app/pgscatalog.utils/.venv/bin/pgscatalog-match", line 8, in <module>
sys.exit(run_match())
^^^^^^^^^^^
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/match_cli.py", line 87, in run_match
ipc_path = get_match_candidates(
^^^^^^^^^^^^^^^^^^^^^
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/match_cli.py", line 124, in get_match_candidates
with variants as target_df:
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/variantframe.py", line 54, in __enter__
self.arrowpaths = loose(self.variants, tmpdir=self._tmpdir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/functools.py", line 909, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_arrow.py", line 94, in _
return batch_read(reader, tmpdir=tmpdir, cols_keep=cols_keep)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_arrow.py", line 102, in batch_read
batches = reader.next_batches(batch_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/polars/io/csv/batched_reader.py", line 134, in next_batches
batches = self._reader.next_batches(n)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: found more fields than defined in 'Schema'
Consider setting 'truncate_ragged_lines=True'.
Command used and terminal output
Relevant files
No response
System information
Nextflow version
nextflow/23.04.2
slurm executor
apptainer/singularity
linux