update readme
mtisza1 committed Jan 9, 2024
1 parent 6f9f572 commit ed4e895
Showing 5 changed files with 81 additions and 24 deletions.
105 changes: 81 additions & 24 deletions README.md
@@ -2,15 +2,19 @@

Discover and annotate the virome

`Cenote-Taker 3` is a bioinformatics tool that scales from individual genome sequences to massive metagenome assemblies to:
![Logo](images/cenote-taker_3_logo.png)

Works on your laptop or HPC (compatible with MacOS and Linux)

`Cenote-Taker 3` is a virus bioinformatics tool that scales from individual genome sequences to massive metagenome assemblies to:

1) Identify sequences containing genes specific to viruses (virus hallmark genes)

2) Annotate virus sequences including:

---a) adaptive ORF calling

---b) a large catalog of HMMs from virus gene families
---b) a large catalog of HMMs from virus gene families for functional annotation

---c) Hierarchical taxonomy assignment based on hallmark genes

@@ -20,6 +24,14 @@ Discover and annotate the virome

**Also, `Cenote-Taker 3` is very fast, many many times faster than `Cenote-Taker 2`, and faster than comparable annotation using `pharokka` (in my hands)**

Image of example genome map:

![Map](images/genome_map1.png)

## Schematic

![Schematic](images/cenote-taker_3_schematic.png)

## Installation Instructions

*A bioconda package is forthcoming, but this is considered a beta build, so I'm holding off on that*
@@ -30,34 +42,55 @@ Discover and annotate the virome

2) Using `mamba` (package manager within `conda`) and the provided yaml file, make the environment:

`mamba env create -f Cenote-Taker3/environment/ct3_beta_env.yaml`
`mamba env create -f Cenote-Taker3/environment/ct3_env.yaml`

*Versions used in test installations*

mamba 1.5.1

conda 23.7.4


3) Activate the conda environment.

`conda activate ct3_beta`
`conda activate ct3_env`

4) Change to the repo directory and `pip` install the command-line tool.

`cd Cenote-Taker3`

`pip install .`

*You should be able to type `cenotetaker3` in terminal to bring up help menu now*
*You should now be able to type `cenotetaker3` or `get_ct3_dbs` in the terminal to bring up the corresponding help menu*
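
A quick way to confirm that step 4 worked is to ask the shell whether both entry points are on the `PATH`. This is only a convenience sketch: the function name `ct3_check_cli` is made up for this example and is not part of `Cenote-Taker 3`.

```shell
# Hypothetical helper (not shipped with Cenote-Taker 3): report whether
# each named command can be found on the PATH of the current shell.
ct3_check_cli() {
  rc=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf 'ok: %s\n' "$cmd"
    else
      printf 'missing: %s\n' "$cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, with the conda environment from step 3 active:
#   ct3_check_cli cenotetaker3 get_ct3_dbs
```

If either command is reported as missing, the usual suspects are an inactive conda environment or a `pip install .` that was run outside the repo directory.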

5) Change to a directory where you'd like to install the databases and run the database script, specifying the DB directory with `-o`.

*Total DB file size of 3.0 GB after file decompression*

`cd ..`

`python Cenote-Taker3/src/cenote/get_ct3_databases.py -o ct3_DBs --hmm T --mmseqs_tax T --mmseqs_cdd T --domain_list T`
`get_ct3_dbs -o ct3_DBs --hmm T --mmseqs_tax T --mmseqs_cdd T --domain_list T`

<details>

<summary>With optional hhsuite databases</summary>

Warning: due to inconsistent server speed, these downloads may take over 2 hours.

You may download one or more hhsuite DBs.

The data footprint is:

| Database | Size |
|----------|--------|
| CDD | 6.1 GB |
| pfam | 4.6 GB |
| pdb70 | 56 GB |

```
get_ct3_dbs -o ct3_DBs --hmm T --mmseqs_tax T --mmseqs_cdd T --domain_list T --hhCDD T --hhPFAM T --hhPDB T
```

</details>
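
To budget disk space before downloading, the sizes quoted above can simply be added up. The sketch below assumes the optional hhsuite databases come on top of the 3.0 GB default set, which the numbers above suggest but do not state outright. `ct3_db_footprint` is a hypothetical helper name, not a `Cenote-Taker 3` command.

```shell
# Hypothetical helper: sum the documented sizes (in GB) of the default
# databases plus any optional hhsuite databases named as arguments.
ct3_db_footprint() {
  total=3.0                       # default DBs after decompression
  for db in "$@"; do
    case "$db" in
      CDD)   size=6.1 ;;
      pfam)  size=4.6 ;;
      pdb70) size=56 ;;
      *)     echo "unknown database: $db" >&2; return 1 ;;
    esac
    # awk does the decimal arithmetic; LC_ALL=C keeps "." as the separator
    total=$(LC_ALL=C awk -v a="$total" -v b="$size" 'BEGIN { printf "%.1f", a + b }')
  done
  printf '%s GB\n' "$total"
}
```

For example, `ct3_db_footprint CDD pfam pdb70` prints `69.7 GB`, so a full install needs roughly 70 GB free, most of it for pdb70.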

6) Set the database directory as a conda environment variable.

@@ -69,50 +102,74 @@ conda 23.7.4

### Help Menu

```
cenotetaker3 -h
```


### Test contigs

```
cenotetaker3 -c Cenote-Taker3/test_data/testcontigs_DNA_ct2.fasta -r test_ct3 -p T
```

### Default Discover and Annotate

```
cenotetaker3 -c my_metagenome_contigs.fna -r my_meta_ct3 -p T
```
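
With several assemblies to process, the default command can be generated once per contig file. The sketch below only prints the commands so they can be inspected first. The helper name `ct3_batch_commands` and the `_ct3` run-title suffix are my own choices for illustration, and only the `-c`, `-r` and `-p` options shown above are used.

```shell
# Hypothetical helper: print (not run) one default discover-and-annotate
# command per contig file, deriving the run title from the file name.
ct3_batch_commands() {
  for contigs in "$@"; do
    base=$(basename "$contigs")
    title="${base%.*}_ct3"        # my_sample.fna -> my_sample_ct3
    printf 'cenotetaker3 -c %s -r %s -p T\n' "$contigs" "$title"
  done
}

# Usage:
#   ct3_batch_commands assemblies/*.fna
```

Printing instead of executing keeps the sketch safe to try. Once the output looks right, the lines can be pasted into a terminal or a job script. File names containing spaces would need quoting that this sketch does not add.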

### Discover and Annotate, Force `prodigal` (faster)
### Discover and Annotate, Force `prodigal` (prodigal-gv is default)

```
cenotetaker3 -c my_metagenome_contigs.fna -r my_meta_ct3pr -p T --caller prodigal
```

### Just Annotate

```
cenotetaker3 -c my_virus_contigs.fna -r my_virs_ct3 -p F -am T
```

### Notes

`Cenote-Taker 3` is under active development, so please open an issue if anything seems unusual or any errors occur. I have likely not tested every parameter combination, and most bugs should be simple to fix.
### Choose which HMM DBs are hallmark (virion rdrp is default)

```
cenotetaker3 -c my_metagenome_contigs.fna -r my_meta_ct3 -p T -db virion rdrp dnarep
```

### To-do list
### Calculate coverage level with reads

* Add module to use `HHsearch` for gene annotation
```
cenotetaker3 -c my_metagenome_contigs.fna -r my_meta_ct3 -p T --reads my_reads/*fastq
```
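
In the command above, `my_reads/*fastq` is expanded by the shell, not by `cenotetaker3`. In bash or sh, a pattern that matches nothing is passed through unchanged, so it can be worth counting the matches first. The sketch below does that. `ct3_count_reads` is a hypothetical helper, written for bash or sh.

```shell
# Hypothetical helper: count the files in a directory whose names end in
# "fastq", i.e. what the pattern <dir>/*fastq would expand to.
ct3_count_reads() {
  dir="$1"
  n=0
  for f in "$dir"/*fastq; do
    # When nothing matches, the loop sees the literal pattern; -e skips it.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  printf '%s\n' "$n"
}

# Usage:
#   ct3_count_reads my_reads
```

A count of `0` means the `--reads` argument would receive the unexpanded pattern, so check the directory name and file extensions before starting a run.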

* Add RDRP database as option for virus discovery
## Output Files

``` {style="color: blue"}
{run_title}/
│   {run_title}_virus_summary.tsv                   <- main summary file for each virus
│   {run_title}_virus_sequences.fna                 <- all virus genome seqs
│   {run_title}_virus_AA.faa                        <- all virus AA seqs
│   {run_title}_prune_summary.tsv                   <- summary of pruning of each sequence
│   final_genes_to_contigs_annotation_summary.tsv   <- annotation info, all genes
│   run_arguments.txt                               <- arguments used in this run
│   {run_title}_cenotetaker.log                     <- main log file
├───sequin_and_genome_maps/
│   │   {run_title}*gbf                             <- genome maps
│   │   {run_title}*fsa                             <- genome sequence
│   │   {run_title}*gtf                             <- feature table gtf format
│   │   {run_title}*tbl                             <- feature table sequin format
│   │   {run_title}*sqn                             <- non-human-readable sequin file for GenBank sub
│   │   {run_title}*cmt                             <- sequin comment file
└───ct_processing/
    │   --- many intermediate files ---
```
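
After a run, the layout above can be checked with a few lines of shell. This is a sketch under assumptions: it looks only for the seven top-level files listed in the tree, it takes the run title from the directory name, and a run that finds no viruses may legitimately lack some of them. `ct3_check_outputs` is a hypothetical helper, not part of `Cenote-Taker 3`.

```shell
# Hypothetical helper: check that the top-level files listed in the tree
# above exist inside a finished run directory named after its run title.
ct3_check_outputs() {
  run_dir="$1"
  run_title=$(basename "$run_dir")
  rc=0
  for name in \
    "${run_title}_virus_summary.tsv" \
    "${run_title}_virus_sequences.fna" \
    "${run_title}_virus_AA.faa" \
    "${run_title}_prune_summary.tsv" \
    "final_genes_to_contigs_annotation_summary.tsv" \
    "run_arguments.txt" \
    "${run_title}_cenotetaker.log"
  do
    if [ -e "${run_dir}/${name}" ]; then
      printf 'found: %s\n' "$name"
    else
      printf 'missing: %s\n' "$name"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, from the directory where the test run was launched:
#   ct3_check_outputs test_ct3
```

The function returns a non-zero status if anything is missing, so it can gate later steps in a script.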

* Nucleotide or kmer-based species-level taxonomy
### Notes

* Incorporate `prodigal-gv` as alternative ORF caller
`Cenote-Taker 3` is under active development, so please open an issue if anything seems unusual or any errors occur. I have likely not tested every parameter combination, and most bugs should be simple to fix.

* Update HMM database to increase consistency of names/functions between similar models
### To-do list

* instructions for manual curation -> GenBank deposit of `Cenote-Taker 3` output
- instructions for manual curation -\> GenBank deposit of `Cenote-Taker 3` output
Binary file added images/cenote-taker_3_logo.png
Binary file added images/cenote-taker_3_schematic.png
Binary file removed images/cenote-taker_logo.png
Binary file not shown.
Binary file added images/genome_map1.png
