VCF_TE_annotate

Some rudimentary scripts to annotate VCF variants produced by pangenome graph approaches, using the output of repeat annotation tools such as EDTA.

I was unable to find an existing tool that could associate the inserted/deleted sequences in VCF records with a transposable element (TE) library. The scripts should provide a text entry under the VCF tag ("TE") for each sample. Each entry should be able to hold multiple repeat annotations for a single allele, and multiallelic records should also be handled.

vcf_anno.py is simply a wrapper for RepeatMasker: it extracts the individual allele sequences from a VCF as FASTA entries and runs the masking on them.
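The allele extraction step could look roughly like the sketch below. This is not the code in vcf_anno.py; the function names, the FASTA header layout (`CHROM_POS_alleleN`) and the choice to skip symbolic alleles are assumptions made for illustration. The RepeatMasker options shown (`-lib`, `-dir`, `-pa`) are standard RepeatMasker flags, but the README does not state which options the wrapper actually passes.

```python
def alleles_to_fasta(vcf_lines, min_len=1):
    """Yield (header, sequence) pairs for every REF/ALT allele in a VCF.

    Header layout is hypothetical: CHROM_POS_allele<index>, index 0 = REF.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _var_id, ref, alt = fields[:5]
        # Multiallelic records list ALT alleles separated by commas.
        for idx, seq in enumerate([ref] + alt.split(",")):
            # Symbolic (<DEL>), missing (.) and spanning-deletion (*)
            # alleles carry no sequence to mask, so they are skipped.
            if seq.startswith("<") or seq in (".", "*") or len(seq) < min_len:
                continue
            yield f"{chrom}_{pos}_allele{idx}", seq.upper()


def write_fasta(records, handle):
    """Write (header, sequence) pairs to an open text handle as FASTA."""
    for header, seq in records:
        handle.write(f">{header}\n{seq}\n")


def repeatmasker_command(fasta_path, library_path, out_dir, threads=1):
    """Build a RepeatMasker command line (assumed options, see lead-in)."""
    return ["RepeatMasker", "-lib", library_path, "-dir", out_dir,
            "-pa", str(threads), fasta_path]
```

The command list can be handed to `subprocess.run` once the FASTA file has been written. Setting `min_len` higher would drop SNV-sized alleles that are too short to carry a meaningful repeat hit.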

vcf_modify.py takes the output of this RepeatMasker step (after filtering with the RM2bed script, using your preferred overlap handling) and attaches the results to the individual genotypes within the VCF.
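As a rough illustration of attaching annotations to genotypes, the sketch below appends a `TE` FORMAT field to one VCF data line. It is not the logic of vcf_modify.py. It assumes one comma-separated entry per haplotype in the genotype, `.` when an allele has no annotation, and `&` joining multiple annotations for one allele; the function name and the `allele_tes` mapping are hypothetical.

```python
import re


def annotate_record(line, allele_tes, missing="."):
    """Append a TE FORMAT field to one tab-separated VCF data line.

    allele_tes maps an allele index (0 = REF, 1..n = ALT) to a list of
    TE names for that allele. Encoding is an assumption: one entry per
    haplotype, joined by ',', with '&' between multiple TEs.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return "\t".join(fields)  # no genotype columns to annotate
    gt_idx = fields[8].split(":").index("GT")
    fields[8] += ":TE"
    for i in range(9, len(fields)):
        gt = fields[i].split(":")[gt_idx]
        labels = []
        # Phased (|) and unphased (/) genotypes are split the same way.
        for allele in re.split(r"[|/]", gt):
            tes = allele_tes.get(int(allele), []) if allele.isdigit() else []
            labels.append("&".join(tes) if tes else missing)
        fields[i] += ":" + ",".join(labels)
    return "\t".join(fields)


record = "chr1\t100\tv1\tA\tACGT,AC\t60\t.\t.\tGT\t0|1\t2|2"
print(annotate_record(record, {1: ["LTR_1", "DNA_2"], 2: ["LTR_1"]}))
```

A VCF written this way also needs a matching `##FORMAT=<ID=TE,...>` line in the header so that downstream tools accept the new field.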

For example: "FR989862.1 131726 FR989862.1-131726-SNV-0-1 A T 60 . ID=FR989862.1-131726-SNV-0-1 GT:TE 0|1:DNA_LTR_TE_00005245_87,DNA_LTR_TE_00005245_87 0|0:DNA_LTR_TE_00005245_87,DNA_LTR_TE_00005245_87 1|1:DNA_LTR_TE_00005245_87,DNA_LTR_TE_00005245_87 0|1:DNA_LTR_TE_00005245_87,DNA_LTR_TE_00005245_87 0|1:DNA_LTR_TE_00005245_87,DNA_LTR_TE_00005245_87 1|1:DNA_LTR_TE_00005245_87,DNA_LTR_TE_00005245_87 0|1:DNA_LTR_TE_00005245_87,DNA_LTR_TE_00005245_87"

The VCF file was generated with the PanGenie "make pangenome from assemblies" script. EDTA was used to annotate each individual pangenome haplotype to create a pangenome TE library. These scripts aim to apply the EDTA repeat library to the VCF, both to facilitate viewing in IGV and to help calculate population frequencies of specific repeat families.
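One way the population frequency calculation could be approached is sketched below. It is an illustration, not part of the repository. It assumes the `TE` field holds one comma-separated entry per haplotype, with `.` for no annotation and `&` between multiple TE labels on the same haplotype; the function name is hypothetical.

```python
from collections import Counter


def te_haplotype_frequencies(line):
    """Return {TE label: fraction of haplotypes carrying it} for one record.

    Assumes a FORMAT column containing 'TE' whose per-sample value has one
    comma-separated entry per haplotype ('.' = none, '&' = several TEs).
    """
    fields = line.rstrip("\n").split("\t")
    te_idx = fields[8].split(":").index("TE")
    counts = Counter()
    total = 0
    for sample in fields[9:]:
        parts = sample.split(":")
        if te_idx >= len(parts):
            continue  # trailing fields dropped for this sample
        for label in parts[te_idx].split(","):
            total += 1
            if label == ".":
                continue
            # Count each TE at most once per haplotype.
            for te in set(label.split("&")):
                counts[te] += 1
    return {te: n / total for te, n in counts.items()} if total else {}


record = "chr1\t100\tv1\tA\tACGT\t60\t.\t.\tGT:TE\t0|1:.,LTR_1\t1|1:LTR_1,LTR_1&DNA_2"
print(te_haplotype_frequencies(record))
```

Summing these per-record counts across the whole VCF, grouped by a family name parsed from each label, would give family-level frequencies; how labels map to families depends on the naming used in the EDTA library.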
