VCF file parsing issue #8

Open
yangyxt opened this issue Feb 18, 2021 · 1 comment

Comments

yangyxt commented Feb 18, 2021

I got this error message:
no DISPLAY variable so Tk is not available
[2021-02-16 16:01:55] Load genotype data in VCF file: /paedyl01/disk1/yangyxt/public_data/1000g/1000g_phase3_from_fei/samples_used_to_build_ref_model/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.unrelated.vcf.gz
[1] FALSE
Warning messages:
1: In fread(geno, data.table = FALSE) :
  Detected 1 column names but the data has 2 columns (i.e. invalid file). Added 1 extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that create
2: In fread(geno, data.table = FALSE) :
  Stopped early on line 229. Expected 2 fields but found 4. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">>>

It seems the header line defining GT cannot contain more than 2 comma-separated fields?
Or should I just drop the header lines and supply only the VCF body?

Here is what my VCF file looks like (from 1000g):
image

@mike8115

From my experience so far, you need to drop all but the last line of the header. It doesn't look like the script makes any attempt at parsing the header section, so it tries to treat the header as actual data.
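
For anyone hitting the same warning, here is a minimal sketch of that workaround in Python. It assumes that "all but the last line of the header" means the `##` meta-information lines, and that the line to keep is the `#CHROM` column line that precedes the records. The function name `strip_vcf_meta` is made up for illustration and is not part of the script discussed here. It also assumes the input is gzip-compressed text, as in the `.vcf.gz` file above.

```python
import gzip


def strip_vcf_meta(src_path, dst_path):
    """Copy a gzipped VCF, dropping '##' meta lines.

    The '#CHROM' column line and all records are kept unchanged.
    Returns the number of lines written.
    """
    kept = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        for line in src:
            # Meta-information lines start with two '#' characters;
            # the column header line starts with exactly one.
            if line.startswith("##"):
                continue
            dst.write(line)
            kept += 1
    return kept
```

Calling `strip_vcf_meta("input.vcf.gz", "input.noheader.vcf.gz")` (both paths are placeholders) writes a copy whose first line is the `#CHROM` line. Whether the script needs the leading `#` on that line removed as well is something I have not checked.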
