rma-tabuliser

Description

Bash script to generate count/taxon/OTU tables from RMA6 files (wrapper to MEGAN6 rma2info).

Requirements

Mandatory

  • MEGAN (>= v6.21.7)
    • ⚠️ contents of tools/ must be in $PATH
    • (bio)conda install recommended
  • awk
    • Tested on GNU Awk 5.0.1
  • sed
    • Tested on sed (GNU sed) 4.7
  • grep
    • Tested on grep (GNU grep) 3.4
  • cat
    • Tested on cat (GNU coreutils) 8.30
  • sort
    • Tested on sort (GNU coreutils) 8.30

Optional

  • GNU parallel
    • For in-parallel processing of multiple input files
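
Before running the script, it can help to confirm that the tools listed above are reachable from $PATH. The following is a minimal sketch, not part of rma-tabuliser: check_tools is a hypothetical helper name, and it only tests that each command exists, not that its version matches the ones tested above.

```bash
#!/usr/bin/env bash
# check_tools is a hypothetical helper, not part of rma-tabuliser.
# It reports every named tool that cannot be found on $PATH and
# returns non-zero if at least one is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Mandatory: rma2info ships in the tools/ directory of MEGAN.
if check_tools rma2info awk sed grep cat sort; then
  echo "all mandatory tools found"
fi

# Optional: GNU parallel is only needed for the -t option.
check_tools parallel || echo "GNU parallel not found (needed for -t)"
```

If rma2info is reported as missing, the tools/ directory of the MEGAN installation is most likely not in $PATH yet.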

Usage

$ rma-tabuliser -h

  RMA-TABULISER

  NAME

      rma-tabuliser - convert multiple MEGAN RMA6 files to count tables

  SYNOPSIS

      rma-tabuliser -d <input_directory> [OPTIONS]...

  DESCRIPTION

      rma-tabuliser is a bash script that takes multiple RMA6 files from MEGAN, extracts nodes and counts, and merges
      this information into a table with samples as columns, nodes as rows, and aligned-read counts in cells. It also
      offers optional filtering of Taxonomy-based tables by taxonomic level.

      Requires: MEGAN (>= v6.21.7) to be installed on your system, and the contents of the tools/ directory (in the
      MEGAN installation path) to be in your $PATH. (Tip: the bioconda version of MEGAN already puts these tools in
      your $PATH).

      The resulting table, count_table.tsv, will be saved alongside the RMA6 files.

  OPTIONS

    MANDATORY

      -d [PATH]

      Input directory containing RMA6 files (RMA6 files should not be in daughter-directories!)

    OPTIONAL

      -c [CLASS]      Type of count table to create. Options are: EC EGGNOG GTDB INTERPRO2GO KEGG SEED Taxonomy. Default: Taxonomy

      -h              Display this help message.

      -k              Specify to keep intermediate files.

      -n              Specify to print names of each feature, instead of ID numbers.

      -r [RANK]

        For Taxonomy class tables, specify which major taxonomic rank to filter from. Use the first letter of the rank
        in capitals. Specify 'A' for no filtering (i.e. all ranks). Options are: A D K P C O F G S. Default: A

      -s              Specify to summarise node counts, i.e. include counts on daughter nodes in each node's count.
      
      -p              Specify to print paths as well as names.

      -t [N_THREADS]

        Specify the number of threads to parallelise processing of files. Note: this requires GNU parallel to be
        installed and available on your $PATH. Default: 1

      -u              Specify to include unassigned reads in output table.

      -v              Print version to terminal.

      -V              Print verbose information during processing.

  EXAMPLES

      rma-tabuliser -d /path/to/files/
          Will generate a Taxonomy table, with node ID numbers at all taxonomic levels.

      rma-tabuliser -d /path/to/files/ -c Taxonomy -n -t 2 -r 'S'
          Will generate a Taxonomy table, with node names and filtered to S(pecies) level, processing 2 files at a time.

      rma-tabuliser -d /path/to/files/ -c Taxonomy -n -r 'G' -s
          Will generate a Taxonomy table, with node names and filtered to G(enus) level with counts on daughter nodes included in genus count.

  AUTHOR

      James A. Fellows Yates (jfy133@gmail.com)

  VERSION

    0.1.0
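
The merge described in the help text (samples as columns, nodes as rows, counts in cells) can be illustrated with a short awk sketch. This is not the code used by rma-tabuliser. It assumes one two-column, tab-separated file per sample (node, then count) named <sample>.tsv, and it assumes that a node absent from a sample should be reported as 0. merge_counts and the sample file names are hypothetical.

```bash
#!/usr/bin/env bash
# merge_counts is a hypothetical helper illustrating the merge step;
# it is not the code used by rma-tabuliser.
# Input:  two-column TSV files (node<TAB>count), one per sample (assumed format).
# Output: one row per node, one column per sample, 0 where a node is absent.
merge_counts() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {                      # first line of each input file
      n = split(FILENAME, parts, "/")
      name = parts[n]
      sub(/\.tsv$/, "", name)       # sample name = file name without .tsv
      samples[++nsamples] = name    # remember column order
    }
    {
      count[$1, nsamples] = $2
      if (!($1 in seen)) { seen[$1] = 1; nodes[++nnodes] = $1 }
    }
    END {
      header = "Node"
      for (s = 1; s <= nsamples; s++) header = header OFS samples[s]
      print header
      for (i = 1; i <= nnodes; i++) {
        row = nodes[i]
        for (s = 1; s <= nsamples; s++) {
          key = nodes[i] SUBSEP s
          row = row OFS ((key in count) ? count[key] : 0)
        }
        print row
      }
    }
  ' "$@"
}

# Small demonstration with made-up counts.
tmp=$(mktemp -d)
printf 'Bacteria\t10\nStreptococcus\t4\n' > "$tmp/sampleA.tsv"
printf 'Bacteria\t7\nTannerella\t2\n'     > "$tmp/sampleB.tsv"
merge_counts "$tmp/sampleA.tsv" "$tmp/sampleB.tsv"
rm -r "$tmp"
```

Rows appear in the order in which nodes are first seen across the input files; piping the body of the table through sort would give a stable order if that matters.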

Acknowledgements

  • @ivelsko for testing
