Configuration

Pedro Q edited this page May 13, 2021 · 16 revisions

Requirements

Software requirements

This tool requires a Conda environment with the following packages:

  • Python, tested with v3.7.3, but any Python 3 version should be fine
  • requests, tested with v2.22.0
  • numpy, tested with v1.18.1
  • nltk, tested with v3.4.4
  • sqlite, tested with v3.30.1
  • psutil, tested with v5.6.7
  • HMMER, tested with v3.2.1
  • GCC, for compiling the Cython code (most systems should have it by default)

A Conda environment file is available: mantis_env.yml.
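
As a quick sanity check, the requirements listed above can be probed from Python. This is a minimal sketch and not part of Mantis: the function name check_requirements is hypothetical, and it assumes that sqlite is reached through Python's sqlite3 module, HMMER through its hmmsearch executable, and GCC through gcc.

```python
import importlib.util
import shutil

# Python packages and command-line tools taken from the requirements list.
PYTHON_PACKAGES = ["requests", "numpy", "nltk", "sqlite3", "psutil"]
EXECUTABLES = ["hmmsearch", "gcc"]  # assumed executable names for HMMER and GCC


def check_requirements():
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    found = {}
    for package in PYTHON_PACKAGES:
        # find_spec returns None when a top-level module cannot be located
        found[package] = importlib.util.find_spec(package) is not None
    for executable in EXECUTABLES:
        # shutil.which returns None when the executable is not on the PATH
        found[executable] = shutil.which(executable) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

The check only reports presence, not versions, so it complements python mantis check_installation rather than replacing it.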

How much space do we need?

The lineage annotation with eggNOG requires a lot of space because eggNOG's HMM database is quite extensive. For the taxonomy you will need around 1.5 terabytes. The rest of the HMMs only take up around 27 gigabytes. To check the default datasets, see Reference data.
You don't need to use all of this data though!

Installation

Mantis is easy to set up:

  1. Clone the repository with git
  2. Edit MANTIS.config with desired paths
  3. Create a Conda environment for Mantis with conda env create -f mantis/mantis_env.yml
  4. Activate the previously created Conda environment
  5. Set up all default databases for Mantis with python mantis setup_databases

To check your installation run:

python mantis check_installation

Keep in mind that the installation will take a while, as a lot of data is downloaded. If the NOG HMMs are not used, it can finish within a couple of hours; otherwise it may take a few days.
To customize your installation (setting installation paths or removing certain HMMs), please refer to Configuration.

Configuration

The MANTIS.config file allows the user to edit and add custom HMMs. An example config file is included; please use the same syntax, otherwise your configuration won't be taken into account.

Mantis comes with a MANTIS.config file which serves as the default for all users of the system. You can configure your own MANTIS.config file by copying this file and editing it as you wish. Afterwards, just add -mc <path/to/edited_MANTIS.config> when running Mantis.

Conda environment

It's preferable to use a self-contained environment to avoid compatibility issues, but you can run Mantis in whichever Conda environment you'd like: simply activate it and run Mantis.

This is not necessary, but if you'd like to share your Mantis environment across multiple users, do the following:

  1. Create the Mantis environment in a group folder location by running conda env create -f mantis_env.yml -p <path/to/group/folder/>

Future Mantis users now need to do the following:

  2. Run conda config to generate the .condarc file
  3. Edit the .condarc file (usually located in your home folder) and add:
envs_dirs:
    - path/to/group/folder/

Restricting eggNOG HMMs download

Downloading the whole eggNOG compendium of HMMs may be beyond your usage scope, e.g., if you are only annotating a few select taxa. You therefore have the option of downloading only the eggNOG HMMs you want. To do so, insert a list of NCBI IDs or organism names in the nog_tax= line of the MANTIS.config file. If an organism name is given, an automatic web search retrieves the respective NCBI ID. A lineage is then generated for each NCBI ID and all the required taxon-specific HMMs (TSHMMs) are downloaded. The nog_tax line is commented out by default.

Please keep in mind that this will also restrict the general eggNOG HMM. When downloading the full eggNOG compendium, the general eggNOG HMM will contain all non-redundant HMMs from 2157 (Archaea), 2 (Bacteria), 2759 (Eukaryota), 10239 (Viruses), 28384 (Others), and 12908 (Unclassified). However, when restricting the taxa with nog_tax, the general HMM will only contain the top-level HMMs from the selected taxa. For example, if using nog_tax=562, the general eggNOG HMM will only contain the HMMs from taxon 2 (Bacteria), since the taxonomic lineage of NCBI taxon 562 is 2 - 1224 - 1236 - 91347 - 543 - 561 - 562.
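
The restriction described above can be illustrated with a short sketch. It is not Mantis's own code: the function name general_hmm_taxa is hypothetical, and the lineage is hard-coded from the example instead of being retrieved from NCBI.

```python
# Top-level eggNOG taxa listed in the text above.
TOP_LEVEL_TAXA = {2157, 2, 2759, 10239, 28384, 12908}


def general_hmm_taxa(lineages):
    """Return the top-level taxa covered by the given lineages.

    Each lineage is a list of NCBI taxon IDs ordered from root to leaf.
    """
    selected = set()
    for lineage in lineages:
        for taxon in lineage:
            if taxon in TOP_LEVEL_TAXA:
                selected.add(taxon)
                break  # only the first (highest) top-level taxon counts
    return selected


# Lineage of NCBI taxon 562, copied from the example above.
lineage_562 = [2, 1224, 1236, 91347, 543, 561, 562]
print(general_hmm_taxa([lineage_562]))  # {2}
```

With nog_tax=562 only taxon 2 is selected, so under this reading the general eggNOG HMM would be limited to Bacteria.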

This will not affect the NPFM TSHMMs download.

Setting your own paths

After running setup_databases you may wish to move data around. If so, make sure you change all of these paths:

ncbi_resources_folder=/path/to/mantis/Resources/NCBI/  
default_ref_folder=/path/to/mantis/hmm/  
nog_hmm_folder=/path/to/nog/  
ncbi_hmm_folder=/path/to/ncbi/  
pfam_hmm_folder=/path/to/pfam/  
kofam_hmm_folder=/path/to/kofam/  
tigrfam_hmm_folder=/path/to/tigrfam/  
tcdb_seq_folder=/path/to/tcdb/  

If you don't move any of these folders, don't worry about configuring this.
If you don't want all the HMM files to be used, you can set a path to NA, for example: nog_hmm_folder=NA
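
After moving data it can help to confirm that every configured folder still exists. The following is a minimal sketch, not part of Mantis: read_config and missing_folders are hypothetical helpers, and the sketch assumes that MANTIS.config uses plain key=value lines and that commented lines start with #.

```python
from pathlib import Path


def read_config(text):
    """Parse key=value lines, skipping blanks and lines starting with '#'."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Keys such as custom_ref may repeat, so values are kept in a list.
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def missing_folders(settings):
    """Return *_folder keys whose path is not 'NA' and is absent on disk."""
    missing = []
    for key, values in settings.items():
        if not key.endswith("_folder"):
            continue
        for value in values:
            if value != "NA" and not Path(value).is_dir():
                missing.append(key)
    return missing


example = """
# nog_tax=562
nog_hmm_folder=NA
pfam_hmm_folder=/path/to/pfam/
"""
settings = read_config(example)
print(missing_folders(settings))
```

Folders set to NA are skipped, mirroring the way a reference can be disabled in the config.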

Important: All of the default HMMs belong to their respective authors. I haven't compiled any of this data; I'm merely distributing it in a more automated manner! Make sure you cite them when using this tool or their data.

Custom references

Custom references can be added in MANTIS.config by adding their absolute path or folder path, for example:

    custom_ref=path/to/hmm/custom1.hmm
    custom_ref=path/to/hmm/custom2.dmnd

Alternatively you may add them to the custom_refs folder, for example:

    Mantis/References/Custom_references/custom1/custom1.hmm
    Mantis/References/Custom_references/custom2/custom2.dmnd

You may also redefine the custom_refs folder path by adding your preferred path to custom_refs_folder in the MANTIS.config file, for example:

    custom_refs_folder=path/to/custom_refs/

Adding custom HMMs

If your custom HMMs are split into one HMM per file, make sure you merge them using the merge_hmm_folder function.
If HMMs from the same source are not merged, hit processing won't take into account potential overlaps between HMM hits.
Remember to run HMMER's hmmpress on the custom HMMs!
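
Pressing a merged HMM file can be scripted as below. This is a sketch under stated assumptions: hmmpress_command and press are hypothetical helpers, not Mantis functions, and the file path is a placeholder taken from the examples above. The -f flag is HMMER's option for overwriting existing pressed files.

```python
import shutil
import subprocess


def hmmpress_command(hmm_path, force=False):
    """Build the hmmpress command line for one merged HMM file."""
    command = ["hmmpress"]
    if force:
        command.append("-f")  # overwrite index files from a previous run
    command.append(str(hmm_path))
    return command


def press(hmm_path, force=False):
    """Run hmmpress if it is on the PATH; return True when it was run."""
    if shutil.which("hmmpress") is None:
        return False
    subprocess.run(hmmpress_command(hmm_path, force), check=True)
    return True


print(" ".join(hmmpress_command("path/to/hmm/custom1.hmm")))
```

Separating command construction from execution lets you inspect the command before anything is run.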

Adding custom DMNDs

When using a list of sequences as a reference please use Diamond to generate a .dmnd file.
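
Building the .dmnd file can be sketched as follows. The helper name diamond_makedb_command and the .faa file name are assumptions made for illustration. The sketch relies on DIAMOND's makedb command with its --in and --db options, and on DIAMOND adding the .dmnd extension to the database name.

```python
from pathlib import Path


def diamond_makedb_command(fasta_path):
    """Build a `diamond makedb` command that writes <name>.dmnd next to the FASTA."""
    fasta_path = Path(fasta_path)
    database = fasta_path.with_suffix("")  # DIAMOND appends .dmnd itself
    return ["diamond", "makedb", "--in", str(fasta_path), "--db", str(database)]


print(" ".join(diamond_makedb_command("path/to/custom_ref/custom2.faa")))
```

The resulting command can be passed to subprocess.run once DIAMOND is installed.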

Custom references metadata

Metadata from different sources is formatted differently. For custom references, this tool therefore requires the metadata to be formatted in a specific manner; otherwise only the HMM/sequence name will be extracted as "metadata". To see an example, go to hmm/custom_refs/, where you will find two files: custom.hmm and custom.tsv.
In custom.tsv you can see how the metadata should be formatted. The first column should contain the HMM/sequence name; the columns that follow can hold any kind of metadata. To specify the type of metadata, add the type to the header of the .tsv file. Columns without a header are assumed to be free-text descriptions. Some identifiers will still be searched for in this free text (EC, KO, TCDB, DUF, GO, and COG). For the custom metadata to be recognized, place it in the same folder as the custom reference file and use the same name but with a .tsv extension, for example: path/to/custom_ref/custom123.hmm and path/to/custom_ref/custom123.tsv.
The metadata .tsv files should have the following format:

Reference   enzyme_ec   kegg_ko   description
HMM_1       2.1.15.64             this is a description
HMM_1       3.2.9.13    KO0002    this is a description
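
A metadata file in this layout can be generated with Python's csv module. This is a minimal sketch: metadata_path and write_metadata are hypothetical helpers, and writing an empty cell for a missing identifier is an assumption about how gaps are represented.

```python
import csv
from pathlib import Path


def metadata_path(reference_path):
    """Return the .tsv path that sits next to a custom reference file."""
    return Path(reference_path).with_suffix(".tsv")


def write_metadata(reference_path, header, rows):
    """Write a tab-separated metadata file; empty strings mark missing IDs."""
    tsv_path = metadata_path(reference_path)
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
    return tsv_path


# Header and rows copied from the example table above.
header = ["Reference", "enzyme_ec", "kegg_ko", "description"]
rows = [
    ["HMM_1", "2.1.15.64", "", "this is a description"],
    ["HMM_1", "3.2.9.13", "KO0002", "this is a description"],
]
print(metadata_path("path/to/custom_ref/custom123.hmm"))
```

Calling write_metadata("path/to/custom_ref/custom123.hmm", header, rows) would then place custom123.tsv in the same folder as the reference, as required above.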

Currently Mantis uses all these ID types:

  • kegg_map_lineage
  • kegg_ko
  • description
  • kegg_cazy
  • eggnog
  • go
  • cog
  • pfam
  • tigrfam
  • tcdb
  • enzyme_ec

Please make sure you use the same format when adding your custom metadata tsv. Other links are supported but may not be properly recognized during consensus generation.

Setting references weight

When generating the consensus, some references can be given more weight. This is important because some references are more specific than others. To configure the weight of a reference, edit the MANTIS.config file:

  • example: for nog_hmm_folder, add nog_weight=X, where X is the weight of the HMM (0-1)
  • example: for custom_hmm=path/to/customHMM.hmm, add customref_weight=X, where X is the weight of the HMM (0-1)

In essence, make sure the names of the weights correspond to the paths of the references.
The default weight is 0.7.
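
The fallback to the default weight can be pictured with a small sketch. It is an illustration, not Mantis's own code: reference_weight is a hypothetical helper, and it assumes the settings have already been read into a dict of strings.

```python
DEFAULT_WEIGHT = 0.7  # default weight stated above


def reference_weight(settings, reference_name):
    """Look up <reference_name>_weight, falling back to the default."""
    raw = settings.get(f"{reference_name}_weight")
    if raw is None:
        return DEFAULT_WEIGHT
    weight = float(raw)
    if not 0.0 <= weight <= 1.0:
        raise ValueError(
            f"{reference_name}_weight must be between 0 and 1, got {weight}"
        )
    return weight


settings = {"nog_weight": "0.8"}
print(reference_weight(settings, "nog"))   # 0.8
print(reference_weight(settings, "pfam"))  # 0.7
```

Rejecting values outside the 0-1 range early makes a mistyped weight visible before consensus generation.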

Updating reference data

Reference data can be updated by simply deleting the old reference data folders (e.g., KOfam) and running setup_databases; Mantis will then download the most recent data from the respective sources.