Since the developers have stated that they will no longer maintain version 1, I am posting my collection of fixes and tips that enabled me to use v1.5 with the KEGG/KO and UniRef databases (tested: DB setup, annotation, distill). Some of the fixes have already been posted in the forum; others I found while debugging.
Bugfixes
Two important bugfixes have been added to master since the release of v1.5.0; they can be found in the comparison v1.5.0...master. BUT NOTE:
'README.md': Does not need patching.
mag_annotator/annotate_bins.py/annotate_orfs() ln1045+: This fix is wrong. The warning message is not the problem; the code logic does not do what the new message claims. What actually happened is that the else-block was separated from its original, logical position (see the fix below).
mag_annotator/annotate_bins.py/annotate_called_genes_cmd() ln1604+ and mag_annotator/annotate_bins.py/annotate_called_genes() ln1628+: Probably an important fix (I haven't tested what happens without it).
scripts/DRAM-setup.py ln123+: Essential fix; DRAM-setup will not work without it.
mag_annotator/annotate_bins.py/annotate_orfs() ln1045+ else...logger.warning:
the else-block (with its warning message unchanged) needs to be moved back to
where it belongs, directly after the elif...kofam block at ln1029+.
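Why the position matters can be shown with a hypothetical, heavily simplified sketch; the function names, dictionary keys and warning text below are invented for illustration and are not DRAM's code. In Python an else belongs to the if/elif chain directly above it, so once the block is detached and, for instance, ends up under a later, unrelated if, the warning fires under a different condition than its message states.

```python
from __future__ import annotations

WARNING = "no KEGG or KOfam database configured"  # hypothetical message


def ko_sources_correct(db_locs: dict) -> tuple[list[str], list[str]]:
    """Intended logic (hypothetical): the else belongs to the kegg/kofam chain."""
    used: list[str] = []
    warnings: list[str] = []
    if db_locs.get("kegg"):
        used.append("kegg")
    elif db_locs.get("kofam"):
        used.append("kofam")
    else:
        # Reached only when neither KEGG nor KOfam is configured
        warnings.append(WARNING)
    if db_locs.get("vogdb"):
        used.append("vogdb")
    return used, warnings


def ko_sources_misplaced(db_locs: dict) -> tuple[list[str], list[str]]:
    """Same code (hypothetical) with the else detached and attached to a later if."""
    used: list[str] = []
    warnings: list[str] = []
    if db_locs.get("kegg"):
        used.append("kegg")
    elif db_locs.get("kofam"):
        used.append("kofam")
    if db_locs.get("vogdb"):
        used.append("vogdb")
    else:
        # Now reached whenever VOGDB is missing, even if KEGG is configured
        warnings.append(WARNING)
    return used, warnings
```

With only KEGG configured, ko_sources_correct({"kegg": "kegg.mmsdb"}) returns (["kegg"], []), while the misplaced variant also emits the warning.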
mag_annotator/database_setup.py/UniRefDescription() ln43-44:
replace the 2 occurrences of kegg_ with uniref_
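If you prefer to script the patch, a small helper that touches only the given lines avoids renaming other kegg_ occurrences in the file. This is a sketch: the helper name is hypothetical, and the line numbers are the ones given above for v1.5.0, so look at those lines first if your copy differs.

```python
from pathlib import Path


def replace_on_lines(path, first: int, last: int, old: str, new: str) -> int:
    """Replace `old` with `new` on the 1-based lines first..last only.

    Returns the number of replacements made, so the caller can verify it.
    """
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    count = 0
    for i in range(first - 1, min(last, len(lines))):
        count += lines[i].count(old)
        lines[i] = lines[i].replace(old, new)
    p.write_text("".join(lines))
    return count
```

Calling replace_on_lines("mag_annotator/database_setup.py", 43, 44, "kegg_", "uniref_") from the repository root should return 2 on an unmodified v1.5.0 source tree.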
mag_annotator/database_handler.py/set_database_paths() ln251: Add a new line to the
function signature: gene_ko_link_loc=None,
mag_annotator/database_processing.py/prepare_databases() ln503: in the dictionary process_settings, change the line to 'kegg': {'gene_ko_link_loc': gene_ko_link_loc, 'download_date': kegg_download_date},
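The two KEGG changes above let gene_ko_link_loc pass through the setup code. The sketch below uses hypothetical stand-in functions, not DRAM's real implementations, and assumes (as the signature fix implies) that a caller passes gene_ko_link_loc to set_database_paths() as a keyword argument: without the new parameter, Python rejects such a call with a TypeError, and without the new entry in process_settings, the value does not appear in the 'kegg' settings.

```python
def set_database_paths_unpatched(kegg_db_loc=None):
    """Hypothetical stand-in for the v1.5.0 signature (other parameters omitted)."""
    return {"kegg_db_loc": kegg_db_loc}


def set_database_paths_patched(kegg_db_loc=None, gene_ko_link_loc=None):
    """Hypothetical stand-in for the patched signature with the new keyword."""
    return {"kegg_db_loc": kegg_db_loc, "gene_ko_link_loc": gene_ko_link_loc}


def kegg_settings(gene_ko_link_loc, kegg_download_date):
    """The patched 'kegg' entry of process_settings, as given in the fix."""
    return {'kegg': {'gene_ko_link_loc': gene_ko_link_loc,
                     'download_date': kegg_download_date}}


if __name__ == "__main__":
    link = "/abs/path/prokaryotes_2c.tsv"  # hypothetical path
    try:
        set_database_paths_unpatched(gene_ko_link_loc=link)
    except TypeError as err:
        # "...got an unexpected keyword argument 'gene_ko_link_loc'"
        print("unpatched:", err)
    print("patched:", set_database_paths_patched(gene_ko_link_loc=link))
    print(kegg_settings(link, "2024-01-04"))  # example date
```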
Other notes/tips
For some parameters, DRAM does not handle relative file paths reliably and fails. In that
case, use an absolute path.
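When DRAM runs are scripted, converting all path arguments to absolute paths up front sidesteps the problem. A minimal sketch with a hypothetical helper name; os.path.abspath resolves relative paths against the current working directory and does not require them to exist.

```python
from __future__ import annotations

import os


def absolutize(*paths: str) -> list[str]:
    """Expand '~' and turn each path into a normalized absolute path.

    os.path.abspath resolves relative paths against the current working
    directory and collapses '..'; the paths do not need to exist.
    """
    return [os.path.abspath(os.path.expanduser(p)) for p in paths]


if __name__ == "__main__":
    # Hypothetical file names
    for p in absolutize("prokaryotes_2c.tsv", "../dbs/uniref90.fasta.gz"):
        print(p)
```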
To run DRAM (DB setup and, e.g., annotate, but not distill), huge amounts of RAM are
necessary: at least enough to hold the biggest database in full (including
index files etc.). This is usually the UniRef MMseqs2 DB (unless skipped); for a recent UniRef90 download it is >700GB. ATTN: DRAM does not complain or
fail if the RAM is insufficient; instead, it hangs indefinitely.
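Because DRAM hangs instead of failing, it can be worth comparing the size of the largest database with the available memory before starting. A rough sketch, assuming all files of that database (including index files) sit in one directory; the function names are hypothetical, and physical_ram_bytes() reports the machine's total RAM via POSIX sysconf, so on an HPC node you should pass your job's memory allocation as ram_bytes instead.

```python
from __future__ import annotations

import os
from pathlib import Path


def dir_size_bytes(path) -> int:
    """Total size of all regular files below `path` (database plus index files)."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def physical_ram_bytes() -> int:
    """Total physical memory of this machine (POSIX sysconf, e.g. on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def enough_ram(db_dir, ram_bytes: int | None = None) -> bool:
    """True if ram_bytes (default: total physical RAM) can hold all of db_dir."""
    available = physical_ram_bytes() if ram_bytes is None else ram_bytes
    return available >= dir_size_bytes(db_dir)
```

For example, enough_ram("/path/to/uniref_db", ram_bytes=750 * 1024**3) checks a hypothetical UniRef directory against a 750 GiB allocation.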
DRAM performance (DB setup and annotate) depends heavily on I/O speed. In my experience, DRAM cannot be used with UniRef at all on the standard file system / disk storage of an HPC cluster. However, everything works reasonably well with a node-attached SSD: build the DRAM database on the SSD (and archive it elsewhere afterwards), and copy the database to the SSD before annotating.
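Staging the database onto the node-local SSD can be scripted with the standard library. This is a sketch with a hypothetical function name: the SSD mount point is site-specific, and DRAM's configured database locations must point at the copied files, which this code does not change.

```python
import shutil
from pathlib import Path


def stage_database(src_dir, local_root) -> Path:
    """Copy a database directory into local_root and return the copy's path.

    Files already present at the destination are overwritten
    (dirs_exist_ok requires Python 3.8+).
    """
    src = Path(src_dir)
    dst = Path(local_root) / src.name
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

For example, stage_database("/archive/dram_db", "/local/ssd") (both paths hypothetical) copies the directory to /local/ssd/dram_db; calling it with source and destination swapped archives a database that was built on the SSD.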
Output directories (-o parameter) must not exist yet; otherwise DRAM fails.
Bash globbing (wildcards) on the command line does not work. However, DRAM applies globbing to input files itself if the pattern is passed as a single-quoted string literal, e.g. 'mydata/*.fa'.
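Both points can be handled when DRAM is launched from a script. The sketch below assumes the DRAM.py annotate -i ... -o ... usage shown in DRAM's README; the function names are hypothetical. It refuses an existing output directory and keeps the pattern as one argument, so no shell expands it.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def build_annotate_cmd(input_pattern: str, output_dir: str) -> list[str]:
    """Argument list for `DRAM.py annotate` (assumed -i/-o usage).

    The glob pattern stays ONE literal argument so that DRAM, not the
    shell, expands it; the output directory must not exist yet.
    """
    if Path(output_dir).exists():
        raise FileExistsError(f"DRAM output directory already exists: {output_dir}")
    return ["DRAM.py", "annotate", "-i", input_pattern, "-o", output_dir]


def shell_command(cmd: list[str]) -> str:
    """Render the argument list for a terminal; shlex single-quotes the pattern."""
    return shlex.join(cmd)
```

subprocess.run(build_annotate_cmd("mydata/*.fa", "annotation"), check=True) starts DRAM without a shell, so the pattern reaches DRAM unexpanded; shell_command() on the same list yields DRAM.py annotate -i 'mydata/*.fa' -o annotation for use in a terminal.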
The KEGG gene_ko_link_loc file must be a two-column TSV file with a KO in the second column of every line (the ko: prefix is not required). I used the following command to extract this information from the .dat files that KEGG now ships:
```bash
# Requires GNU sed: \S, \s and the \t in the replacement are GNU extensions
gzip -cd "path-to-KEGG/prokaryotes.dat.gz" \
  | sed -nE 's/^(\S+)\s+(K\S+).*/\1\t\2/p' \
  >"./prokaryotes_2c.tsv"
```
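The same extraction in Python, plus a check of the result against the format described above (two columns, KO in column 2), may help where GNU sed is not available. This is a sketch with hypothetical function names; it mirrors the sed pattern and does not rely on any further knowledge of the .dat layout. The check flags rows whose second field merely starts with "K" (e.g. an ordinary word), which the sed pattern would also accept.

```python
from __future__ import annotations

import gzip
import re
from typing import Iterable, Iterator

# Same pattern as the sed command: first field, whitespace, field starting with "K"
LINK_RE = re.compile(r"^(\S+)\s+(K\S+)")
# KO identifiers are "K" plus five digits; the "ko:" prefix is optional
KO_RE = re.compile(r"(?:ko:)?K\d{5}")


def gene_ko_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'gene<TAB>KO' for every matching line (mirrors the sed one-liner)."""
    for line in lines:
        m = LINK_RE.match(line)
        if m:
            yield f"{m.group(1)}\t{m.group(2)}"


def convert(dat_gz: str, tsv_out: str) -> int:
    """Stream a gzipped .dat file into a two-column TSV; return the row count."""
    n = 0
    with gzip.open(dat_gz, "rt") as src, open(tsv_out, "w") as dst:
        for row in gene_ko_rows(src):
            dst.write(row + "\n")
            n += 1
    return n


def bad_rows(tsv_lines: Iterable[str]) -> list[int]:
    """1-based numbers of lines that are not 2 columns with a KO ID in column 2."""
    bad = []
    for i, line in enumerate(tsv_lines, start=1):
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 2 or not KO_RE.fullmatch(cols[1]):
            bad.append(i)
    return bad
```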
The starting point for all fixes in this post is the latest official release 1.5.0 from 2024-01-04:
https://github.com/WrightonLabCSU/DRAM/archive/refs/tags/v1.5.0.tar.gz
Further bugfixes:
mag_annotator/database_processing.py/process_vogdb() ln376: Change the line according to issue #340 "VOGDB issue--extra enclosing folder".
mag_annotator/annotate_vgfs.py/get_gene_order() ln189+: Change according to issue #363 "Dram-v on Virsorter2 output: pandas error, unexpected EOF, syntax error". Probably an important fix (I haven't tested what happens without it).