Best way to go from long GO list to GO slim terms? #1250

sr320 · 2021-07-13T20:11:37Z · Maintainer

What is the best way to go from a file like

TRINITY_DN0_c0_g1	GO:0003674,GO:0003824,GO:0003964,GO:0006139,GO:0006259,GO:0006310,GO:0006313,GO:0006725,GO:0006807,GO:0008150,GO:0008152,GO:0009987,GO:0016740,GO:0016772,GO:0016779,GO:0032196,GO:0034061,GO:0034641,GO:0043170,GO:0044237,GO:0044238,GO:0044260,GO:0044699,GO:0044710,GO:0044763,GO:0046483,GO:0071704,GO:0090304,GO:1901360
TRINITY_DN0_c10_g1	GO:0003674,GO:0003824,GO:0004659,GO:0004660,GO:0005488,GO:0005575,GO:0005829,GO:0005875,GO:0005965,GO:0006464,GO:0006807,GO:0008150,GO:0008152,GO:0008270,GO:0008318,GO:0009987,GO:0016740,GO:0016765,GO:0018342,GO:0018343,GO:0019538,GO:0032991,GO:0036211,GO:0043167,GO:0043169,GO:0043170,GO:0043234,GO:0043412,GO:0044237,GO:0044238,GO:0044260,GO:0044267,GO:0044422,GO:0044424,GO:0044430,GO:0044444,GO:0044446,GO:0044464,GO:0046872,GO:0046914,GO:0071704,GO:0097354,GO:1901564,GO:1902494,GO:1990234
TRINITY_DN0_c2_g4	GO:0000166,GO:0003674,GO:0005488,GO:0005524,GO:0005575,GO:0005737,GO:0005856,GO:0017076,GO:0030554,GO:0032553,GO:0032555,GO:0032559,GO:0035639,GO:0036094,GO:0043167,GO:0043168,GO:0043226,GO:0043228,GO:0043229,GO:0043232,GO:0044424,GO:0044464,GO:0097159,GO:0097367,GO:1901265,GO:1901363

(specifically, https://gannet.fish.washington.edu/Atumefaciens/20210318_cbai_trinotate_transcriptome-v4.0/20210318.cbai_transcriptome_v4.0.fasta.trinotate.go_annotations.txt)

to a table of GO slim terms that could be easily grepped?

Answered by kubu4

sr320 · 2021-07-13T20:13:32Z · Maintainer · Author

Good question! But have you ever thought about the reverse? Say you wanted all genes related to immune response: why not just search for GO:0006955?
See http://amigo.geneontology.org/amigo/term/GO:0006955
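A minimal sketch of that reverse lookup, assuming the two-column `gene<TAB>GO:a,GO:b,...` layout shown in the question. It compares whole GO IDs instead of substrings. The rows above appear to list ancestor terms as well: each one contains the molecular_function root GO:0003674. If that holds for the whole file, searching a parent term such as GO:0006955 should also catch genes annotated only to its more specific children. The function name `genes_with_term` and the demo rows (trimmed from the question) are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: print the gene IDs whose comma-separated GO list contains an exact GO ID.
# Assumed input layout: gene<TAB>GO:a,GO:b,... (as in the Trinotate go_annotations file).
set -euo pipefail

genes_with_term() {
  # $1: GO ID to search for; $2: annotation file
  awk -F'\t' -v term="$1" '
    {
      n = split($2, terms, ",")
      for (i = 1; i <= n; i++) {
        if (terms[i] == term) { print $1; break }
      }
    }' "$2"
}

# Demo on three rows trimmed from the question
demo=$(mktemp)
printf 'TRINITY_DN0_c0_g1\tGO:0003674,GO:0006259,GO:0008150\n' > "$demo"
printf 'TRINITY_DN0_c10_g1\tGO:0003674,GO:0005575,GO:0008150\n' >> "$demo"
printf 'TRINITY_DN0_c2_g4\tGO:0003674,GO:0005575,GO:0005524\n' >> "$demo"

# Prints TRINITY_DN0_c10_g1 and TRINITY_DN0_c2_g4
genes_with_term GO:0005575 "$demo"
rm -f "$demo"
```

On the real file this would be `genes_with_term GO:0006955 20210318.cbai_transcriptome_v4.0.fasta.trinotate.go_annotations.txt`. Exact matching (instead of a plain `grep GO:0006955`) keeps the result a clean list of gene IDs, one per line.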


kubu4 · 2021-07-14T12:42:10Z · Maintainer

To address your first question:

awk -F'\t' '{print $2}' steven_GO.txt | tr "," "\n" > go_terms.txt

That will pull the GO terms from the second column of the input file (steven_GO.txt) and write a newline-delimited file of all the GO terms (go_terms.txt here; the output name is arbitrary).
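If the gene IDs need to stay attached, which is what makes the result directly greppable, the same awk pass can instead emit one `gene<TAB>GO term` row per term. A minimal sketch, assuming the same two-column, tab-delimited input. The function name `go_long_table` and the demo rows (trimmed from the question) are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: turn "gene<TAB>GO:a,GO:b,..." rows into one "gene<TAB>GO:x" row per term.
# Assumes a two-column, tab-delimited input like the Trinotate go_annotations.txt file.
set -euo pipefail

go_long_table() {
  # $1: path to the two-column annotation file
  awk -F'\t' -v OFS='\t' '
    NF >= 2 {
      n = split($2, terms, ",")
      for (i = 1; i <= n; i++) print $1, terms[i]
    }' "$1"
}

# Demo on two rows trimmed from the question
demo=$(mktemp)
printf 'TRINITY_DN0_c0_g1\tGO:0003674,GO:0003824,GO:0006259\n' > "$demo"
printf 'TRINITY_DN0_c2_g4\tGO:0003674,GO:0005524\n' >> "$demo"

go_long_table "$demo"
rm -f "$demo"
```

For example, `go_long_table steven_GO.txt > gene_go.tsv` (output name arbitrary) followed by `grep -w GO:0006955 gene_go.tsv` lists every gene carrying that term. `cut -f2 gene_go.tsv | sort -u` recovers the distinct term list, without the repeats the one-liner above keeps.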

Then use GSEABase in R to map the terms to Biological Process GO slims (note: this requires goslim_generic.obo from http://geneontology.org/docs/go-subset-guide/):

# Load necessary libraries
library(GSEABase)
library(GO.db)  # goSlim() uses the GO term relationships stored in GO.db

# Expects GO terms to be in first column of input file

## Path to the newline-delimited GO term file written by the awk command above
# (file name is an example; use whatever name you saved it under)
item <- "go_terms.txt"

## Read in the GO term file
# header = FALSE: the file has no header line, so header = TRUE would
# silently drop the first GO term
go_seqs <- read.table(item,
                      sep = "\t",
                      header = FALSE,
                      col.names = "V1",
                      colClasses = "character")

## Grab just the individual GO terms from the first column
goterms <- go_seqs$V1

### Use GSEABase to map GO terms to GOslims

## Store goterms as a GSEABase GOCollection object
myCollection <- GOCollection(goterms)

## Use the generic GOslim file to create a GOslim collection
# Download goslim_generic.obo from http://geneontology.org/docs/go-subset-guide/
# getOBOCollection() reads the file from the path given here, so copying it
# into GSEABase's extdata folder is likely unnecessary
slim <- getOBOCollection("~/data/goslim_generic.obo")

## Map GO terms to GOslims and select the Biological Process ontology
slims <- goSlim(myCollection, slim, "BP", verbose = TRUE)

EDITED: Forgot to address the GO-to-GOslim part of the question.
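The question asks for a greppable gene-to-slim table rather than summary counts. Here is a shell-only sketch that skips R. It relies on an assumption worth checking first: that each gene's GO list already includes every ancestor term (the rows in the question contain root terms such as GO:0003674, which suggests this). If so, a gene should fall under a slim term whenever that slim ID appears in its list. The table is then just the intersection of each list with the term IDs in goslim_generic.obo. Unlike goSlim(..., "BP"), this does not restrict the output to one ontology. The function names `slim_ids` and `gene_slim_table`, and the demo data (a two-term stand-in OBO, not the real slim file), are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: build a gene<TAB>slim_GO table by intersecting each gene's GO list with
# the slim term IDs. ASSUMES every GO list already includes all ancestor terms.
set -euo pipefail

slim_ids() {
  # $1: OBO file (e.g. goslim_generic.obo); prints the id of every [Term] stanza
  awk '/^\[/ { in_term = ($0 == "[Term]") }
       in_term && /^id: GO:/ { print $2 }' "$1"
}

gene_slim_table() {
  # $1: slim GO IDs, one per line; $2: gene<TAB>GO:a,GO:b,... annotation file
  awk -F'\t' -v OFS='\t' '
    FILENAME == ARGV[1] { slim[$1] = 1; next }
    {
      n = split($2, terms, ",")
      for (i = 1; i <= n; i++) if (terms[i] in slim) print $1, terms[i]
    }' "$1" "$2"
}

# Demo with a stand-in OBO and rows trimmed from the question
obo=$(mktemp); ann=$(mktemp); ids=$(mktemp)
printf 'format-version: 1.2\n\n[Term]\nid: GO:0005575\n\n[Term]\nid: GO:0006259\n' > "$obo"
printf 'TRINITY_DN0_c0_g1\tGO:0003674,GO:0006259,GO:0008150\n' > "$ann"
printf 'TRINITY_DN0_c2_g4\tGO:0003674,GO:0005575,GO:0005524\n' >> "$ann"

slim_ids "$obo" > "$ids"
gene_slim_table "$ids" "$ann"
rm -f "$obo" "$ann" "$ids"
```

On the real data, this would be `slim_ids goslim_generic.obo > slim_ids.txt`, then `gene_slim_table slim_ids.txt 20210318.cbai_transcriptome_v4.0.fasta.trinotate.go_annotations.txt > gene_slim.tsv`. The result can be grepped by slim ID, and `cut -f2 gene_slim.tsv | sort | uniq -c | sort -rn` gives the number of genes under each slim term.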
