Taxa names coming up as random words #1762

Open
backwards-charm opened this issue Jun 26, 2024 · 7 comments

@backwards-charm

My code and output are as follows:
```r
# Create a taxonomy table
taxmat = matrix(sample(words, 32, replace = TRUE), nrow = nrow(otumat), ncol = 6)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")
taxmat
```

```
      Kingdom    Phylum     Class      Order      Family     Genus
OTU1  "white"    "weigh"    "white"    "weigh"    "white"    "weigh"
OTU2  "load"     "twenty"   "load"     "twenty"   "load"     "twenty"
OTU3  "print"    "cover"    "print"    "cover"    "print"    "cover"
OTU4  "why"      "support"  "why"      "support"  "why"      "support"
OTU5  "debate"   "luck"     "debate"   "luck"     "debate"   "luck"
OTU6  "left"     "room"     "left"     "room"     "left"     "room"
OTU7  "large"    "already"  "large"    "already"  "large"    "already"
OTU8  "hall"     "thirteen" "hall"     "thirteen" "hall"     "thirteen"
OTU9  "cause"    "mister"   "cause"    "mister"   "cause"    "mister"
OTU10 "must"     "guess"    "must"     "guess"    "must"     "guess"
OTU11 "function" "correct"  "function" "correct"  "function" "correct"
OTU12 "of"       "try"      "of"       "try"      "of"       "try"
OTU13 "realise"  "scotland" "realise"  "scotland" "realise"  "scotland"
OTU14 "quarter"  "never"    "quarter"  "never"    "quarter"  "never"
OTU15 "sunday"   "already"  "sunday"   "already"  "sunday"   "already"
OTU16 "hand"     "jesus"    "hand"     "jesus"    "hand"     "jesus"
```

Why am I getting random words in my taxonomy table (such as "print", "why", "debate", "quarter") instead of the actual names of the bacterial classes (such as "Gammaproteobacteria", "Bacilli", "Actinobacteria")? And how do I fix this?

@benjjneb
Contributor

```r
taxmat = matrix(sample(words, 32, replace = TRUE), nrow = nrow(otumat), ncol = 6)
```

Because you are making a matrix of random words.
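This also explains the repetition in the table. `sample(words, 32, replace = TRUE)` draws 32 random entries from `words`, a vector of common English words (stringr ships a vector with this name, which is presumably where it came from here). `matrix()` then recycles those 32 values to fill all 16 x 6 = 96 cells column by column, so columns 1, 3 and 5 are identical, as are columns 2, 4 and 6. A minimal base-R sketch of the same effect, using a small made-up vocabulary in place of `words`:

```r
# Made-up stand-in for the `words` vector, so this runs with base R only
vocab <- c("white", "weigh", "load", "twenty", "print", "cover")

set.seed(1)
picked <- sample(vocab, 4, replace = TRUE)  # 4 random draws

# 4 values fill a 2 x 6 matrix (12 cells) column by column, so they are
# recycled 3 times: columns 3 and 5 repeat column 1, columns 4 and 6 repeat column 2
m <- matrix(picked, nrow = 2, ncol = 6)
m
```

Using the taxonomy matrix returned by dada2 instead of a randomly generated `taxmat` removes both the random names and the repeated columns.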

@backwards-charm
Author

How do I change it to use the taxa names I got from dada2? I am new to R.

@benjjneb
Contributor

Did you already make a taxonomy table in dada2?
If so, just use that. If not, go to the assign taxonomy section of the dada2 tutorial and follow those instructions.

@backwards-charm
Author

I did, using this code:

```r
taxa <- assignTaxonomy(seqtab.nochim, "/Users/Desktop/silva_nr99_v138.1_train_set.fa.gz", multithread=TRUE)
taxa.print <- taxa      # Removing sequence rownames for display only
rownames(taxa.print) <- NULL
head(taxa.print)
```

Would I have to use the code from the green "Alternatives: IdTaxa" box instead? I don't see how the output from my original code would fit in with the OTU table code to ultimately make the phyloseq object.
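As a rough guide to how the pieces line up (a sketch, with small made-up matrices standing in for the real dada2 objects): in the phyloseq import tutorial, `otumat` has taxa as rows and samples as columns, and `taxmat` has taxa as rows and ranks as columns. `seqtab.nochim` from dada2 has samples as rows and ASV sequences as columns, while `taxa` from `assignTaxonomy()` already has ASV sequences as rows and ranks as columns. So `t(seqtab.nochim)` can take the place of `otumat` and `taxa` the place of `taxmat`; no IdTaxa step is needed for this.

```r
# Made-up stand-ins for the real dada2 objects (sequences shortened for display)
seqtab.nochim <- matrix(c(12L, 0L, 4L,
                          3L, 9L, 1L),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("C1D0", "A1D15"),          # samples
                                        c("TACG", "GACG", "AACG")))  # ASVs
taxa <- matrix(c("Bacteria", "Firmicutes",       "Bacilli",
                 "Bacteria", "Proteobacteria",   "Gammaproteobacteria",
                 "Bacteria", "Actinobacteriota", "Actinobacteria"),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("TACG", "GACG", "AACG"),            # ASVs
                               c("Kingdom", "Phylum", "Class")))

# The tutorial's otumat has taxa as rows, so transpose the sequence table
otumat <- t(seqtab.nochim)
# assignTaxonomy() already returns a taxa-by-rank matrix, so it can serve as taxmat
taxmat <- taxa

# phyloseq matches the two tables by taxon name, so the row names must agree
stopifnot(identical(rownames(otumat), rownames(taxmat)))
```

With the real objects, `otu_table(otumat, taxa_are_rows = TRUE)` and `tax_table(taxmat)` then go into `phyloseq()` the same way as in the tutorial code.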

@benjjneb
Contributor

If you continue following the dada2 tutorial, it includes the "handoff to phyloseq" section with code for creating the phyloseq object from the sequence table and the taxonomy table. Have you looked at that code?
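For reference, the handoff comes down to a single `phyloseq()` call. Below is a self-contained sketch along the lines of the dada2 tutorial's handoff code, with tiny made-up objects named like the ones in this thread (`seqtab.nochim`, `taxa`, `samdf`); with real data, only the `phyloseq()` call and the optional renaming are needed. Because `seqtab.nochim` has samples as rows, it is passed with `taxa_are_rows = FALSE`, and `samdf` needs row names equal to the sample names. This is also the point at which sample metadata enters the object.

```r
library(phyloseq)

# Made-up stand-ins for the real objects (sequences shortened for display)
seqtab.nochim <- matrix(c(12L, 0L, 4L,
                          3L, 9L, 1L),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("C1D0", "A1D15"),
                                        c("TACG", "GACG", "AACG")))
taxa <- matrix(c("Bacteria", "Firmicutes",       "Bacilli",
                 "Bacteria", "Proteobacteria",   "Gammaproteobacteria",
                 "Bacteria", "Actinobacteriota", "Actinobacteria"),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("TACG", "GACG", "AACG"),
                               c("Kingdom", "Phylum", "Class")))
samdf <- data.frame(Treatment = c("C", "A"), Day = c(0L, 15L),
                    row.names = rownames(seqtab.nochim))

# Samples are rows in seqtab.nochim, hence taxa_are_rows = FALSE
ps <- phyloseq(otu_table(seqtab.nochim, taxa_are_rows = FALSE),
               sample_data(samdf),
               tax_table(taxa))

# Optional: swap the long ASV sequences for short labels in plots
taxa_names(ps) <- paste0("ASV", seq(ntaxa(ps)))
ps
```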

@backwards-charm
Author

Yes, I have. Something about it isn't working out for me.
I was told I could use a metadata file for sample naming, but I couldn't figure out at which point in the process to add the metadata, so I went back and renamed the files to roughly match the tutorial's naming.
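On the metadata question: a sample metadata table is usually added at the phyloseq step, as the `sample_data()` component, so the files do not have to be renamed. A minimal base-R sketch of reading it, assuming a CSV (here the hypothetical `metadata.csv`) whose first column holds the same sample names as `rownames(seqtab.nochim)`. The file contents are inlined via `read.csv(text = ...)` so the sketch runs on its own, and the treatment labels are made up.

```r
# Inline stand-in for a hypothetical metadata.csv; with a real file use
# samdf <- read.csv("metadata.csv", row.names = 1)
meta_txt <- "SampleID,Treatment,Replicate,Day
C1D0,Control,1,0
C2D0,Control,2,0
A1D15,Acids,1,15"
samdf <- read.csv(text = meta_txt, row.names = 1)

# Stand-in for rownames(seqtab.nochim): the sample names dada2 assigned
sample.names <- c("A1D15", "C1D0", "C2D0")

# Every sample in the sequence table needs a metadata row; reorder to match
stopifnot(all(sample.names %in% rownames(samdf)))
samdf <- samdf[sample.names, ]
samdf
```

The resulting `samdf` is what goes into `sample_data(samdf)` when the phyloseq object is built.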

My file naming scheme is: a letter indicating the treatment type (C for control, A for acids, etc.), a number indicating the replicate within that treatment type (1 or 2), the letter "D", and the day on which the sample was taken (0, 15, etc.). For example, one of my files is named C1D0.

After doing this, I was able to generate some plots where "Day" is the focus using the following code:

```r
# Create a new variable called samples.out from the rownames of seqtab.nochim
samples.out <- rownames(seqtab.nochim)
# Create a new variable called subject that is the part of samples.out before the first "D"
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
# Create a new variable called treatment that is the first letter of subject
treatment <- substr(subject,1,1)
# Reassign subject to be the part of subject after the first character (the replicate number)
subject <- substr(subject,2,999)
# Create a new variable called day that is the part of samples.out after the first "D"
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
# Create a new data frame called samdf with columns Subject, Treatment and Day
samdf <- data.frame(Subject=subject, Treatment=treatment, Day=day)
# Create a new column called When, set to "Day 0" for every sample
samdf$When <- "Day 0"
# Change the value of When to "Day 37" for all rows where Day is greater than 36
samdf$When[samdf$Day>36] <- "Day 37"
# Add the rownames of seqtab.nochim to samdf
rownames(samdf) <- samples.out
```
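A slightly more robust way to split names like `C1D0` is a single regular expression (a sketch, assuming every name is exactly one treatment letter, a replicate number, `D`, and the day). Unlike `strsplit(samples.out, "D")`, it does not break if a treatment code is itself the letter D, and it stores the replicate under its own name (`Replicate`) rather than as `Subject`. The sample names below, including `D2D15`, are made up; with real data use `rownames(seqtab.nochim)`.

```r
# Made-up sample names; with real data: samples.out <- rownames(seqtab.nochim)
samples.out <- c("C1D0", "C2D15", "A1D0", "D2D15")

# <treatment letter><replicate number>D<day number>
pattern <- "^([A-Z])([0-9]+)D([0-9]+)$"
stopifnot(all(grepl(pattern, samples.out)))  # fail early on unexpected names

samdf <- data.frame(
  Treatment = sub(pattern, "\\1", samples.out),
  Replicate = as.integer(sub(pattern, "\\2", samples.out)),
  Day       = as.integer(sub(pattern, "\\3", samples.out)),
  row.names = samples.out,
  stringsAsFactors = FALSE
)
samdf
```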

This works well for the Shannon/Simpson diversity plots and the Bray-Curtis ordination, but when I get to the bar plots, they look a bit odd.

Originally I used the code:

```r
# Create Bar plot
top20 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:20]
ps.top20 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps.top20 <- prune_taxa(top20, ps.top20)
plot_bar(ps.top20, x="Day", fill="Class") + facet_wrap(~When, scales="free_x")
```

and ended up with:
[Screenshot 2024-06-26 at 4:06:37 PM: bar plot of the top 20 taxa]

I also edited it to:

```r
# Create Bar plot
top1000 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:1000]
ps.top1000 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps.top1000 <- prune_taxa(top1000, ps.top1000)
plot_bar(ps.top1000, x="Day", fill="Class") + facet_wrap(~When, scales="free_x")
```

and the bar graphs look completely black rather than colorful:
[Screenshot 2024-06-26 at 4:07:03 PM: bar plot of the top 1000 taxa]
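A likely explanation for both plots: `plot_bar()` draws every taxon (every ASV) as its own stacked segment with a black outline. With 20 ASVs, several segments can share the same Class colour, which makes the bars look fragmented; with 1000, the outlines of the many thin segments probably hide the fill colours entirely. One common remedy, sketched below, is to merge ASVs at the Class level with `tax_glom()` before converting to relative abundance, so each Class becomes a single segment. The toy `ps` (counts, ASV names, classes and sample variables) is made up so the sketch runs on its own; with real data, start from your own `ps`.

```r
library(phyloseq)

# Made-up toy object standing in for the real `ps`
counts <- matrix(c(30L, 10L,  5L, 20L, 15L,  2L,
                    8L, 25L, 12L,  3L,  6L, 40L,
                   14L,  7L, 22L, 11L,  9L,  5L,
                    2L, 18L,  6L, 30L, 12L,  9L),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("C1D0", "C2D0", "A1D15", "A2D15"),
                                 paste0("ASV", 1:6)))
tax <- matrix(c("Bacteria", "Bacilli",
                "Bacteria", "Bacilli",
                "Bacteria", "Gammaproteobacteria",
                "Bacteria", "Gammaproteobacteria",
                "Bacteria", "Actinobacteria",
                "Bacteria", "Actinobacteria"),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("ASV", 1:6), c("Kingdom", "Class")))
samdf <- data.frame(Treatment = c("C", "C", "A", "A"), Day = c(0L, 0L, 15L, 15L),
                    row.names = rownames(counts))
ps <- phyloseq(otu_table(counts, taxa_are_rows = FALSE),
               sample_data(samdf), tax_table(tax))

# Merge ASVs that share a Class, then convert each sample to proportions
ps.class <- tax_glom(ps, taxrank = "Class")
ps.class <- transform_sample_counts(ps.class, function(x) x / sum(x))

# One outlined segment per Class instead of one per ASV
p <- plot_bar(ps.class, x = "Sample", fill = "Class")
p
```

With real data, the top-N pruning step can still be applied after `tax_glom()` if only the most abundant classes should be shown.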

I am also more interested in examining differences between treatment types than between days.
Since I have 8 different treatment types, I am not sure how to edit the code to show differences by treatment, other than going back to the OTU/taxonomy table method with my original file names instead of renaming all my files to a naming scheme.
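Treatment is already a column of `samdf`, so it can be used in plots the same way as `Day` or `When`. A sketch on a made-up toy object with eight hypothetical treatment codes (A to H) and two replicates each: `facet_wrap(~Treatment)` gives one panel per treatment, with one bar per sample inside each panel. (phyloseq's `merge_samples()` is another option if one pooled bar per treatment is preferred.) Renaming the files is not required for this; any metadata table with a Treatment column, keyed by the original sample names, works the same way.

```r
library(phyloseq)
library(ggplot2)

# Made-up toy data: 8 hypothetical treatments (A-H) x 2 replicates, 6 ASVs
trt <- rep(LETTERS[1:8], each = 2)
sample.ids <- paste0(trt, 1:2, "D0")
set.seed(1)
counts <- matrix(rpois(16 * 6, lambda = 20) + 1L, nrow = 16,
                 dimnames = list(sample.ids, paste0("ASV", 1:6)))
tax <- cbind(Kingdom = "Bacteria",
             Class = rep(c("Bacilli", "Gammaproteobacteria", "Actinobacteria"), each = 2))
rownames(tax) <- paste0("ASV", 1:6)
samdf <- data.frame(Treatment = trt, Day = 0L, row.names = sample.ids)
ps <- phyloseq(otu_table(counts, taxa_are_rows = FALSE),
               sample_data(samdf), tax_table(tax))

# Relative abundance per sample, then one facet (panel) per treatment
ps.rel <- transform_sample_counts(ps, function(x) x / sum(x))
p <- plot_bar(ps.rel, x = "Sample", fill = "Class") +
  facet_wrap(~Treatment, scales = "free_x", nrow = 2)
p
```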

@backwards-charm
Author

With this method, the plots look nice, despite being full of random words.

Tutorial from https://joey711.github.io/phyloseq/import-data.html#import_biom

```r
# Create an OTU table (16 * 16 values, one per cell, so matrix() does not recycle them)
otumat = matrix(sample(1:100, 16 * 16, replace = TRUE), nrow = 16, ncol = 16)
otumat
rownames(otumat) <- paste0("OTU", 1:nrow(otumat))
colnames(otumat) <- paste0("Sample", 1:ncol(otumat))
otumat
# Create a taxonomy table (random words, as in the original post)
taxmat = matrix(sample(words, 32, replace = TRUE), nrow = nrow(otumat), ncol = 6)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")
taxmat
class(otumat)
class(taxmat)
# Combine into a phyloseq object
library(phyloseq)
OTU = otu_table(otumat, taxa_are_rows = TRUE)
TAX = tax_table(taxmat)
OTU
TAX
physeq = phyloseq(OTU, TAX)
physeq
plot_bar(physeq, fill = "Class")
```

[Screenshot 2024-06-26 at 4:10:54 PM: resulting bar plot]
