ko2kegg_abundance leads to 0 keggs #126

Open
ranfoxall opened this issue Nov 26, 2024 · 3 comments

@ranfoxall

ranfoxall commented Nov 26, 2024

I've been trying to follow your tutorial and have been struggling with the first few steps. I've determined that the issue is that my table of KO abundances is converted to a KEGG table containing only my sample names and no KEGG abundances. Could you please help me figure out what is going wrong? I've been trying my best, but to no avail.

pred_metagenome_unstrat.tsv.gz

kegg_abundance <- ko2kegg_abundance(file = "/Picrust2/KO/pred_metagenome_unstrat.tsv")

> kegg_abundance
 [1] 1B_1 1B_2 1B_3 2B_1 2B_2 2B_3 3B_1 3B_2 3B_3 4B_1 4B_2 4B_3 5B_1 5B_2 5B_3 6B_1
[17] 6B_2 6B_3 7B_1 7B_2 7B_3 8B_1 8B_2 8B_3 9B_1 9B_2 9B_3
<0 rows> (or 0-length row.names)

> str(kegg_abundance)
'data.frame': 0 obs. of 27 variables:
 $ 1B_1: num
 $ 1B_2: num..

@cafferychen777
Owner

Hi @ranfoxall,

I apologize for any inconvenience caused by the recent bug in the ko2kegg_abundance function. I have just fixed this issue in the latest version of ggpicrust2.

To resolve this problem, please reinstall the package using:

remotes::install_github("cafferychen777/ggpicrust2")

The function should now work correctly and return the proper KEGG pathway abundance results. If you encounter any other issues, please don't hesitate to let me know.

Thank you for bringing this to my attention and for your patience!

Best regards,
Chen

@ranfoxall
Author

ranfoxall commented Dec 2, 2024

Hi @cafferychen777,
Thanks so much for solving the issue. I can now successfully run ko2kegg_abundance! I thought solving the ko2kegg problem would also solve my other issue, but it hasn't. A colleague has an older version of ggpicrust2 that works with my data and metadata (we tested this to see whether the problem was my install). We have a manuscript in review that uses ggpicrust2! Hopefully we will publish it soon and can reference your very useful package!
I thought I could solve my issues by installing an older version of ggpicrust2, but I can't figure out how to do that, so I'm still struggling to get your awesome package to work in my hands.
pred_metagenome_unstrat.tsv.gz
IMTALizzy_SampleData.txt

My new issue
KOabundance_file <- "/Users/randifoxall/Desktop/IMTA_Lizzy/IMTA16s/Picrust2/KO/pred_metagenome_unstrat.tsv"
KOabundance_data <- read_delim(KOabundance_file, delim = "\t", col_names = TRUE, trim_ws = TRUE)
results_data_input <- ggpicrust2(
  data = KOabundance_data,
  metadata = metadata,
  group = "Treatment",
  pathway = "KO",
  daa_method = "LinDA",
  ko_to_kegg = TRUE,
  order = "pathway_class",
  p_values_bar = TRUE,
  x_lab = "pathway_name",
  reference = "SETTLE"
)

and I get the error:

Starting the ggpicrust2 analysis...

Converting KO to KEGG...

Processing provided data frame...
Loading KEGG reference data. This might take a while...
Processing 306 KEGG pathways...
  |====================================================================================================| 100%
Total KO matches found: 4852
Number of non-zero pathways before filtering: 263
Removing KEGG pathways with zero abundance across all samples...
Final number of KEGG pathways: 263
Performing pathway differential abundance analysis...

Using column 'sample_name' as sample identifier
Running LinDA analysis...
Error in data.frame(feature = rownames(first_comparison), method = "LinDA",  : 
  arguments imply differing number of rows: 263, 1, 2
In addition: Warning message:
In MicrobiomeStat::linda(feature.dat = feature.dat, meta.dat = meta.dat,  :
  Some features have less than 3 nonzero values!
  They have virtually no statistical power. You may consider filtering them in the analysis!

I've attached my metadata and abundance files. Any suggestions or insights are very much appreciated!

@cafferychen777
Owner

Hi @ranfoxall,

Thank you for reporting this issue. I've analyzed the LinDA analysis error and identified two main causes:

  1. Features with insufficient non-zero values
  2. Row count mismatch in results data frame creation
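
For readers unfamiliar with the second cause, here is a minimal sketch of how base R's `data.frame()` produces this kind of message. It uses toy values only: `features`, `labels`, and `err` are hypothetical names, and nothing in the log says which argument inside ggpicrust2 actually had length 2. A length-1 argument is recycled silently, but a length-2 argument cannot be recycled across 263 rows because 263 is not a multiple of 2.

```r
# Toy reproduction of a data.frame() row-count mismatch (not ggpicrust2 code).
features <- paste0("pathway_", seq_len(263))  # 263 rows, as in the log
method   <- "LinDA"                           # length 1: recycled to 263 rows
labels   <- c("A", "B")                       # hypothetical length-2 argument

# Capture the error message instead of stopping the script
err <- tryCatch(
  data.frame(feature = features, method = method, label = labels),
  error = function(e) conditionMessage(e)
)
print(err)

# Once every argument has length 1 or 263, the call succeeds
ok <- data.frame(
  feature = features,
  method  = method,
  label   = rep(labels, length.out = length(features))
)
print(dim(ok))
```

So the error means that one of the vectors used to build the results table has a length that does not fit the number of pathways.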

Here's a step-by-step solution:

  1. Filter low prevalence features:

# Filter out features with low prevalence
# (assumes the first column holds the KO IDs, so it is excluded from the count)
min_samples <- 3  # minimum number of non-zero samples required
keep_features <- rowSums(KOabundance_data[, -1] > 0) >= min_samples
KOabundance_data_filtered <- KOabundance_data[keep_features, ]
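
  2. Ensure metadata and abundance data alignment:

# Match sample names between abundance and metadata
# (the first column is assumed to hold the KO IDs, so it is kept)
id_col <- colnames(KOabundance_data_filtered)[1]
common_samples <- intersect(colnames(KOabundance_data_filtered), metadata$sample_name)
KOabundance_data_filtered <- KOabundance_data_filtered[, c(id_col, common_samples)]
metadata <- metadata[match(common_samples, metadata$sample_name), ]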

  3. Run analysis with modified data:

results_data_input <- ggpicrust2(
  data = KOabundance_data_filtered,  # Use filtered data
  metadata = metadata,
  group = "Treatment",
  pathway = "KO",
  daa_method = "LinDA",
  ko_to_kegg = TRUE,
  order = "pathway_class",
  p_values_bar = TRUE,
  x_lab = "pathway_name",
  reference = "SETTLE"
)
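
The filtering and alignment steps above can be tried on toy data first. This is a minimal sketch under stated assumptions: the table `ko`, the metadata `meta`, and the column name `ko_id` are made up for illustration, and the first column is assumed to hold the KO identifiers, as in a PICRUSt2 unstratified table.

```r
# Toy KO table: first column holds KO IDs (assumption), the rest are samples
ko <- data.frame(
  ko_id = c("K00001", "K00002", "K00003"),
  S1 = c(5, 0, 2),
  S2 = c(3, 0, 0),
  S3 = c(1, 4, 0),
  S4 = c(2, 0, 0),
  stringsAsFactors = FALSE
)

# Toy metadata in a different order, with one sample (S9) absent from the table
meta <- data.frame(
  sample_name = c("S4", "S1", "S2", "S3", "S9"),
  Treatment   = c("B", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Prevalence filter: count non-zero samples per KO, ignoring the ID column
min_samples <- 3
keep <- rowSums(ko[, -1] > 0) >= min_samples
ko_filtered <- ko[keep, ]

# Alignment: keep the ID column plus the shared samples, in the same order
common <- intersect(colnames(ko_filtered), meta$sample_name)
ko_aligned   <- ko_filtered[, c("ko_id", common)]
meta_aligned <- meta[match(common, meta$sample_name), ]

print(ko_aligned)
print(meta_aligned)
```

Excluding the ID column matters because comparing a character column with `0` does not count non-zero abundances, so it would distort the per-feature totals.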

If issues persist, try these additional steps:

  1. Verify data structure:

# Print dimensions and sample names
print(dim(KOabundance_data))
print(colnames(KOabundance_data))
print(dim(metadata))
print(metadata$sample_name)

# Check Treatment levels
print(table(metadata$Treatment))

  2. Alternative method:

results_data_input <- ggpicrust2(
  data = KOabundance_data_filtered,
  metadata = metadata,
  group = "Treatment",
  pathway = "KO",
  daa_method = "ALDEx2",  # Alternative to LinDA
  ko_to_kegg = TRUE,
  order = "pathway_class",
  p_values_bar = TRUE,
  x_lab = "pathway_name"
)
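
The structure checks in step 1 can also be bundled into a small helper. The sketch below is not part of ggpicrust2: `check_inputs` and its arguments are hypothetical names, and it assumes the first column of the abundance table holds the feature IDs and that the metadata has a `sample_name` column, as the log above suggests.

```r
# Hypothetical helper (not a ggpicrust2 function): summarise input mismatches
check_inputs <- function(abundance, metadata, group, reference = NULL,
                         sample_col = "sample_name") {
  abundance_samples <- colnames(abundance)[-1]  # assumes column 1 holds feature IDs
  metadata_samples  <- as.character(metadata[[sample_col]])
  group_sizes       <- table(metadata[[group]])
  list(
    only_in_abundance = setdiff(abundance_samples, metadata_samples),
    only_in_metadata  = setdiff(metadata_samples, abundance_samples),
    group_sizes       = group_sizes,
    # TRUE when no reference is given or the reference is an observed group level
    reference_found   = is.null(reference) || reference %in% names(group_sizes)
  )
}

# Toy inputs for illustration only ("OTHER" is a made-up group label)
toy_abundance <- data.frame(ko_id = "K00001", S1 = 1, S2 = 2, S3 = 3)
toy_metadata  <- data.frame(
  sample_name = c("S1", "S2", "S4"),
  Treatment   = c("SETTLE", "SETTLE", "OTHER")
)

res <- check_inputs(toy_abundance, toy_metadata,
                    group = "Treatment", reference = "SETTLE")
print(res)
```

Any non-empty `only_in_*` entry, or `reference_found` being `FALSE`, points to an input problem worth fixing before rerunning the analysis.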

Note: The warning about features having less than 3 nonzero values indicates sparse pathways in your dataset. The filtering step above should address this issue by removing these sparse features before analysis.

Please let me know if you need any clarification or encounter other issues!

Best regards,
Chen
