Quantification of genes present in the gene catalog among samples #423
Replies: 12 comments 10 replies
-
My personal view is that one should quantify genomes and not genes. The whole point of assembly-based metagenomics is to work with the genomes instead of genes. What do you think @fconstancias : Why would you quantify genes instead of genomes? I made some scripts on how to use the genome output of atlas. But obviously you can do everything with atlas:
gives you the two tables:
Where the second one is the gene counts normalized to the gene length. However, I would take the "Nmapped reads" and use a CODA approach. For more information: Microbiome Datasets Are Compositional: And This Is Not Optional https://doi.org/10.3389/fmicb.2017.02224 The tutorial is for 16S sequencing but the same logic applies also for genome- or gene-based metagenomics. Last thing: In the next days I will work on an extension for the gene catalog workflow on https://github.com/metagenome-atlas/genecatalog_atlas E.g. Make genecatalogs at 100id, 90id and 50id. |
Beta Was this translation helpful? Give feedback.
-
@SilasK thanks for your feedback.
Well, it all depend on your question afterall but it is good to have both because most of the case reconstructing all the MAGs from all the populations of all our sample is impossible. And if you want to compare oral samples from healthy patients or from patient with dental caries for instance, you migh want to check if genes present and/or expressed are specific. I agree it is even better to do it at the MAG level.
I have already sucessfully generated a genecatalog with
thanks |
Beta Was this translation helpful? Give feedback.
-
Yes, of course. Just run atlas in the same working directory.
Do you have experience with bwa. I've heard you have to tweak the parameters to get a mapping done in a reasonable time. About the mapping, now it's done with bbmap, and I map the paired-end reads. After reflection, this is maybe not the best way to do it. As the two ends of paired-end reads might mao to different genes. |
Beta Was this translation helpful? Give feedback.
-
I usually use Bowtie2 but I haven't benchmark the different options.
I did but it seems atlas did took the already processed assembly but somehow is running again the annotation :
|
Beta Was this translation helpful? Give feedback.
-
Sorry, this shouldn't happen. You have the combined annotation ? eggnog.tsv Then stop the running process.
This shouldn't start the annotation. |
Beta Was this translation helpful? Give feedback.
-
I am having some issues running "genecatalog combine_gene_coverages". Atlas can't seem to find a sam file. I can see while its running bam and sam files are created, but then the sam is removed and only the bam remains. However atlas seems to want to find the missing sam: Currently running this via singularity (atlas, version 2.6a2) on an HPC [2021-07-29 10:46 INFO] Executing: snakemake --snakefile /usr/local/lib/python3.7/site-packages/atlas/Snakefile --directory /scratch/user/metagenome-atlas --rerun-incomplete --configfile '/scratch/user/metagenome-atlas/config.yaml' --nolock --use-conda --conda-prefix /scratch/user/metagenome-atlas/databases/conda_envs --scheduler greedy genecatalog combine_gene_coverages --cores 24 Building DAG of jobs... align_reads_to_Genecatalog 3 24 24 [Thu Jul 29 10:46:44 2021] ESC[33mlocalrules directive specifies rules that are not present in the Snakefile: [Thu Jul 29 11:28:05 2021] ESC[33mlocalrules directive specifies rules that are not present in the Snakefile: thoughts? |
Beta Was this translation helpful? Give feedback.
-
@soilmicrobiome Thank you for the reporting. I solved the issue in v. 2.6a3 (master branch). I'm testing it and then releasing it via conda soon. |
Beta Was this translation helpful? Give feedback.
-
Great thanks very much! Will this be available in the singularity/docker as well? |
Beta Was this translation helpful? Give feedback.
-
Yes, yes. As you do in #359 You install atlas outside the singularity. |
Beta Was this translation helpful? Give feedback.
-
Until a new conda comes with the gene abundance part arrives, wondering about running this externally - to generate e.g FPKM (fragments per kilobase per million mappable reads) or TPM (transcripts per million). I assume the correct files to use would be: /path_to_samples/sample2/sequence_quality_control/sample2_QC_R1.fastq.gz genes to map to But if i have more than one sample - the genecatalog is combined for all samples? Would this lead to issues? or just map each sample one at a time against it? |
Beta Was this translation helpful? Give feedback.
-
This calculation is implemented in atlas v2.8 |
Beta Was this translation helpful? Give feedback.
-
Hi everyone, Great new feature! Before atlas I searched the raw or clean reads using diamond whenever I needed to quantify some specific proteins for which I had the sequences in fasta files among samples. Usually, these genes are not yet in the KEGG database etc. How would you look for these genes using the atlas Gene Catalog? Would you blast them against the |
Beta Was this translation helpful? Give feedback.
-
Dear atlas developers/users,
I have generated a gene catalog using atlas. Do you have any guidline how to map back and quantify the proportion of genes (I guess noramlizing by gene length) in the different samples used to generate the gene catalog?
Thanks a lot.
Beta Was this translation helpful? Give feedback.
All reactions