USDA-ARS-GBRU/xmfa_tools

 
 

Public Domain Mark
This work is free of known copyright restrictions.

xmfa_tools

Test data sets, scripts and additional documentation specific to skim-seq processing are also available at the PanPipes repository.

Description:

xmfa_tools.pl can be used to create GFA sequence graphs,
gapped fasta files and/or sorted multi-sequence alignments
from raw XMFA files. Invalid gaps (see the --nogapfilter
option below) are removed by default.

Sorting is accomplished by defining a multi-pass sort using
sequence IDs found in the XMFA header. The primary sort
sequence ID is used to establish a backbone of blocks.
Additional sorting passes recursively extend the backbone
blocks using the non-primary sequence ID block positions.
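
The multi-pass idea can be illustrated with a small Python sketch. It is a simplified model, not the algorithm implemented in xmfa_tools.pl: the function names, the block representation (a dict mapping seq id to start position) and the rule for placing blocks with no already-placed predecessor are all assumptions made for illustration. Only the handling of unlisted ids (appended in ascending order) follows the --order examples given under Options.

```python
def expand_order(requested, all_ids):
    """Complete a partial sort order: ids that were not requested
    are appended in ascending order (e.g. [3] -> [3, 1, 2])."""
    seen = list(dict.fromkeys(requested))  # drop duplicates, keep order
    return seen + [i for i in sorted(all_ids) if i not in seen]


def sort_blocks(blocks, order):
    """Order block names with a simplified multi-pass sort.

    blocks: {block_name: {seq_id: start_position}} (hypothetical layout)
    order:  full list of seq ids, highest sort priority first
    """
    primary = order[0]
    # Pass 1: the backbone is every block containing the primary
    # sequence, sorted by its position in that sequence.
    placed = sorted((b for b in blocks if primary in blocks[b]),
                    key=lambda b: blocks[b][primary])
    # Later passes: walk each remaining sequence in coordinate order and
    # slot unplaced blocks in right after the previous block on that walk.
    for seq_id in order[1:]:
        chain = sorted((b for b in blocks if seq_id in blocks[b]),
                       key=lambda b: blocks[b][seq_id])
        anchor = None  # last block seen on this walk that is already placed
        for b in chain:
            if b not in placed:
                # Assumption: with no predecessor, put the block first.
                pos = placed.index(anchor) + 1 if anchor is not None else 0
                placed.insert(pos, b)
            anchor = b
    return placed
```

For example, with blocks `{"A": {1: 100, 2: 500}, "B": {1: 10, 2: 50}, "C": {2: 200}, "D": {3: 5}}` and order `[1, 2, 3]`, the backbone is `B, A`; the pass over sequence 2 places `C` after `B`; the pass over sequence 3 places `D`, which has no placed neighbor, at the front.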

Blocks are oriented so that the primary sort sequence is
always on the forward strand. Blocks that do not contain
a primary sort alignment will be oriented based on the
sequence alignment of highest sort priority. When the
highest sort priority sequence alignment is on the reverse
strand, all block sequences are reverse complemented.
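
A minimal Python sketch of this orientation rule follows. The block layout (a dict mapping seq id to a `(strand, aligned_sequence)` tuple) and the function names are hypothetical and are not taken from xmfa_tools.pl; the sketch only shows the rule described above, assuming plain A/C/G/T alignments with `-` as the gap character.

```python
# Complement table for A/C/G/T; gap characters and other symbols
# (such as N) are left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(aligned_seq):
    """Reverse complement an aligned sequence, keeping gap columns."""
    return aligned_seq.translate(_COMPLEMENT)[::-1]


def orient_block(block, order):
    """Flip a block if its highest-priority sequence is on the reverse strand.

    block: {seq_id: (strand, aligned_sequence)}, strand is "+" or "-"
    order: seq ids, highest sort priority first
    """
    # The first id in the sort order that is present in this block
    # decides the orientation of the whole block.
    lead = next(seq_id for seq_id in order if seq_id in block)
    if block[lead][0] == "+":
        return block
    flip = {"+": "-", "-": "+"}
    # Every sequence in the block is reverse complemented together,
    # so the alignment columns stay in register.
    return {seq_id: (flip[strand], reverse_complement(seq))
            for seq_id, (strand, seq) in block.items()}
```

In a block that lacks seq id 1 and has seq id 2 on the reverse strand, sorting with order `[1, 2, 3]` makes seq id 2 the deciding sequence, so every row in the block is flipped.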

Usage:

xmfa_tools.pl -x xmfa.file [options]

Options:

required:

 -x --xmfa     input xmfa file (required)

processing:

 -p --print    print xmfa seq ids, names and files, then exit

 -s --sort     enable block sorting

 -o --order    sort order
                 provide one or more seq ids (from header)
                 default: sort by 1, 2, 3... 
                 example: --order 2 3 1 (sort by 2, then 3, then 1)
                 example: --order 3 (sort by 3, then 1, then 2)

 --nogapfilter disable invalid gap filter
                 default: invalid gap filtering is enabled
                 invalid gap example:
                 ACTAGCTGATG--------CTGACGTAATCGTGATGATCGATGCTGA
                 ACTAGCTGATGCTGACGTA--------ATCGTGATGATCGATGCTGA
                 ACTAGCTGATGCTGACGTA--------ATCGTGATGATCGATGCTGA

 --noseqnames  disable use of seq names in fasta and gfa files
                 By default, seq names (viewed using -p option)
                 are used to name output fasta files and gfa path
                 records. The --noseqnames option will disable
                 the use of seq names and use seq ids instead.

 -i --include  include only specified seq ids in output
                 default: include all
                 example: --include 1 3 7

 -t --threads  number of vg threads to use for gfa processing
                 Current vg versions yield a very modest gain
                 in processing speed with multithreading enabled.
                 default: 1

output:

 --xmfaout     output xmfa file (requires -s/--sort or
                 -i/--include)

 -g --gfa      output gfa file (vg-based, similar to v1 spec)

 --gfapostfix  include gfa seq postfix in gfa path records
                 default: no postfix
                 example: --gfapostfix ".chr01"

 -v --vg       path to vg executable (required for gfa processing)
                 default: autodetect in $PATH (if available)

 -c --coords   output fasta coords file
                 specifying a fasta coords file will generate
                 a gapped fasta file for each (selected) sequence

 -n --null     output fasta coords null record value
                 default: "NA"

 --fastadir    output fasta directory
                 default: current working directory

 --fapostfix   output gapped fasta file postfix
                 default: ".sort.gapped.fa"

 -l --linker   output fasta block sequence linker
                 default: no linker
                 example: -l "NNNNNNNNNN"

help:

 -h --help     display help menu
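
The help text shows one invalid gap example but does not define the term. In that example the two gap runs offset each other: once the gaps are removed, all three rows are the same sequence. The Python sketch below checks for that specific pattern only. It is an interpretation of the example, not the filter implemented in xmfa_tools.pl, and the function names are hypothetical.

```python
def ungapped(row):
    """Return an alignment row with all gap characters removed."""
    return row.replace("-", "")


def gaps_are_redundant(rows):
    """True when the rows contain gaps yet are identical without them.

    Assumption: this covers only the offsetting-gap pattern from the
    --nogapfilter example; the real filter may be broader or narrower.
    """
    has_gap = any("-" in row for row in rows)
    return has_gap and len({ungapped(row) for row in rows}) == 1


# The three rows from the --nogapfilter example.
example = [
    "ACTAGCTGATG--------CTGACGTAATCGTGATGATCGATGCTGA",
    "ACTAGCTGATGCTGACGTA--------ATCGTGATGATCGATGCTGA",
    "ACTAGCTGATGCTGACGTA--------ATCGTGATGATCGATGCTGA",
]

if gaps_are_redundant(example):
    # Dropping the gaps leaves three identical, gap-free rows.
    cleaned = [ungapped(row) for row in example]
else:
    cleaned = example
```

A region where the rows still differ after gap removal, such as `AC-T` against `ACGT`, is not flagged by this check.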

About

utilities for interacting with XMFA files
