VCF file parsing issue #8

yangyxt · 2021-02-18T02:31:57Z

I got this error message:
no DISPLAY variable so Tk is not available
[2021-02-16 16:01:55] Load genotype data in VCF file: /paedyl01/disk1/yangyxt/public_data/1000g/1000g_phase3_from_fei/samples_used_to_build_ref_model/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.unrelated.vcf.gz
[1] FALSE
Warning messages:
1: In fread(geno, data.table = FALSE) :
  Detected 1 column names but the data has 2 columns (i.e. invalid file). Added 1 extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that create
2: In fread(geno, data.table = FALSE) :
  Stopped early on line 229. Expected 2 fields but found 4. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">>>

It seems the header line describing the GT definition cannot have more than 2 fields separated by commas?
Or should I just drop the header lines and input only the VCF body content?

Here is what my VCF file looks like (from 1000g):


mike8115 · 2021-03-10T22:13:47Z

From my experience so far, you need to drop all but the last line of the header. It doesn't look like the script makes any attempt at parsing the header section, so it tries to treat the header as actual data.
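A minimal sketch of that workaround, assuming "all but the last line of the header" means dropping every `##` meta-information line while keeping the `#CHROM` column line and the data records. The helper names `open_text` and `strip_vcf_meta` are hypothetical and not part of the script discussed here. Whether the script also needs the leading `#` removed from `#CHROM` is not confirmed in this thread.

```python
import gzip
from pathlib import Path


def open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode, chosen by suffix."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode + "t")
    return open(path, mode)


def strip_vcf_meta(src, dst):
    """Copy src to dst, dropping '##' meta-information lines.

    The '#CHROM' column-header line and all data records are kept.
    Returns the number of lines dropped.
    """
    dropped = 0
    with open_text(src, "r") as fin, open_text(dst, "w") as fout:
        for line in fin:
            if line.startswith("##"):
                dropped += 1
                continue
            fout.write(line)
    return dropped
```

Calling `strip_vcf_meta("input.vcf.gz", "input.noheader.vcf.gz")` (placeholder file names) writes a copy that can be passed to the script in place of the original. The file is streamed line by line, so a chromosome-sized VCF does not need to fit in memory.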
