pan_genome_scripts

Python scripts for constructing pangenomes using the iterative mapping approach, written for the Bioinformatics Methods book chapter.

Please contact @philippbayer with any comments or questions about the scripts.

splitFiles.py

This script splits the unmapped FASTQ file into R1, R2 and unpaired reads for assembly with MaSuRCA.

Usage: python splitFiles.py filename_unmapped_merged_sortedName.fastq
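
For readers who want to see the idea behind the split, here is a minimal sketch. It is not the code in splitFiles.py. It assumes the input is sorted by read name, that mates carry a trailing /1 or /2 suffix (or otherwise share the same name), and that each record spans exactly four lines. The function names read_fastq, base_name and split_reads are hypothetical.

```python
from itertools import groupby


def read_fastq(handle):
    """Yield (name, seq, qual) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) stops the iteration
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def base_name(name):
    """Drop the description and a trailing /1 or /2 mate suffix (assumed naming)."""
    name = name.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def split_reads(records):
    """Group name-sorted records: groups of exactly two are pairs, the rest unpaired."""
    r1, r2, unpaired = [], [], []
    for _, group in groupby(records, key=lambda rec: base_name(rec[0])):
        group = list(group)
        if len(group) == 2:
            # Assumption: within a name-sorted pair, mate 1 comes first.
            r1.append(group[0])
            r2.append(group[1])
        else:
            unpaired.extend(group)
    return r1, r2, unpaired
```

With a file open as handle, split_reads(read_fastq(handle)) returns the three lists, which can then be written out as separate FASTQ files.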

filter_blast.py

This script removes contaminants from the BLAST output based on the plant genus list. The All_plant_genus_list.txt file can be replaced with any list of genera.

Usage: python filter_blast.py [merged blast output file] All_plant_genus_list.txt > genome.best.hits.contamination.blast
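
The genus-based filtering can be pictured with the following sketch. It is an illustration under stated assumptions, not the logic of filter_blast.py: it assumes tab-delimited BLAST output whose last column holds the subject title, and it keeps a hit when any word of that title matches a genus in the list. The column position and the function names load_genera, is_plant_hit and filter_hits are hypothetical.

```python
def load_genera(lines):
    """Build a lowercase set of genus names, one per input line."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_plant_hit(blast_line, genera, title_col=-1):
    """Return True if any word of the subject title is a listed genus.

    Assumption: the title sits in column title_col of a tab-delimited line.
    """
    fields = blast_line.rstrip("\n").split("\t")
    title_words = fields[title_col].lower().split()
    return any(word in genera for word in title_words)


def filter_hits(blast_lines, genera, title_col=-1):
    """Keep only the hits whose subject title names a listed genus."""
    return [line for line in blast_lines if is_plant_hit(line, genera, title_col)]
```

Hits to genera outside the list are treated as contamination here and dropped, which is why swapping in a different genus list changes what counts as a contaminant.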

contamination_removal.py

This script filters contamination out of the MaSuRCA assembly based on the names reported by BLAST.

It requires the output of the filter_blast.py script to first be filtered into the uncontaminated_chickpea_contig_names.txt file.
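
One possible way to produce that names file is sketched below. It assumes the filtered BLAST output is tab-delimited with the query contig name in the first column, which may not match the format actually used. The function name contig_names is hypothetical.

```python
def contig_names(blast_lines):
    """Return the unique first-column names, in order of first appearance.

    Assumption: tab-delimited lines with the query contig name in column 1.
    """
    seen = set()
    names = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name = line.split("\t", 1)[0].strip()
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

Writing the returned names one per line gives a file in the shape that the usage line below expects.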

Usage: python contamination_removal.py [masurca assembly] uncontaminated_chickpea_contig_names.txt uncontaminated.scf.fasta
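
The final filtering step amounts to selecting FASTA records by name. The sketch below shows that idea only and is not the code in contamination_removal.py. It assumes a contig's name is the first whitespace-delimited word of its header line. The function names read_fasta and keep_contigs are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the contig name is the first word of the header.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def keep_contigs(fasta_lines, wanted):
    """Return the (name, sequence) pairs whose name is in the wanted set."""
    return [(name, seq) for name, seq in read_fasta(fasta_lines) if name in wanted]
```

Loading the names file into a set first keeps each lookup constant-time, which matters for assemblies with many contigs.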

All_plant_genus_list.txt

A list of all known plant genera.

AppliedBioinformatics/pan_genome_scripts
