Commit made by the Bioconductor Git-SVN bridge.

Commit id: 47b052bd356a28b6621a64ae2a89492555bcaa9c
Ready for 1.1.6: updated data of package and info in DESCRIPTION file.

Commit id: a68381b284aa922c21ed8cc0f006c843f1a02d75
Toward 1.1.6: running title shortened from full title. For heatmap, combination of subsetting and re-labelling made easier. Again, massive edits in the User Guide.

Commit id: c861c4da45c846ee361329f909a445838e3ee21e
Toward 1.1.6: changed way to deal with synonyms in subset_scores() function. subset_scores() function now creates/updates a filters.GO slot in the output result object, stating the filters and cutoffs applied. Warnings are issued if conflicting filters and cutoff values are applied. Documentation updated for this new feature. A few = replaced by <- in man pages examples. UserGuide updated in places.

Commit id: 22839ac06d4b41a65bedcba17bdec4f283aa3c9c
Towards 1.1.6: heatmap_GO replaces blank gene names with the gene feature identifier. heatmap_GO semi-automatically resizes bottom and right margins to accommodate large gene and sample labels, respectively.

Commit id: 9096744d5e7f2393d41a4775e82b639e21878562
Towards 1.1.6: table_genes function supports ordering by score, rank, gene id, and gene name. Some = replaced by <- in the code.

Commit id: 52ea5a0d316a11f13dd786f4919c569d27a29f1f
Toward 1.1.6: Massive commit (bad practice!). Added pValue_GO function to generate permutation-based GO P-value, help page and section in UserGuide. Support external_gene_id header from Ensembl releases 75 and earlier. Added rank.by slot in GO_analyse output. rerank function updates rank.by slot. rerank function supports p-value. subset_scores supports p-value. Progress bar function in toolkit. Updated AlvMac to include RPL36A, an example of gene name with multiple identifiers. Updated annotations with data from Ensembl 75 release for traceability. UserGuide updated to describe how to prepare local annotations. Help pages updated to a more consistent indentation in example section. Updated help pages with example including p-values. Added an example of pValue_GO output and help page. Bug fix for overlap_GO with three go_ids. Updated the manual in many places with up-to-date information and more examples (custom annotations, p-values, re-labelling of heatmap, sessionInfo, typos).

Commit id: 371ffd8aadad09cc5acb87b42e06b5455352e54f
Toward 1.1.6: typo 'labRow' in the man page of heatmap_GO.

Commit id: c65147122539f285573fe7895a75ea0db7dbc53a
Toward 1.1.6: set random seed in vignette to allow reproducible results.

Commit id: 5c706d981260e6d993ef79878ad42c0c4622642a
Toward 1.1.6: heatmap_GO row labels can be overridden without affecting the colour-coding of samples.

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GOexpress@100632 bc3139a8-67e5-0310-9ffc-ced21a209358
kevinrue · Mar 13, 2015
commit 93a96fc · 1 parent 9cf7dd4

Showing 37 changed files with 1,343 additions and 431 deletions.
diff --git a/.gitignore b/.gitignore
@@ -9,3 +9,4 @@
 *.synctex.gz
 *.toc
 *.tiff
+core
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: GOexpress
 Title: Visualise microarray and RNAseq data using gene ontology annotations
-Version: 1.1.5
-Date: 2014-12-13
+Version: 1.1.6
+Date: 2015-03-13
 Authors@R: c(
     person(given="Kevin", family="Rue-Albrecht",
     role = c("aut", "cre"), email = "kevin.rue@ucdconnect.ie"),
@@ -13,20 +13,23 @@ Authors@R: c(
     person(given=c("Stephen", "V."), family="Gordon", role = c("ths")),
     person(given=c("David", "E."), family="MacHugh", role = c("ths")))
 Description: The package contains methods to visualise the expression profile
-    of genes from a microarray or RNA-seq experiment and offers a clustering
-    analysis to identify GO terms enriched in genes with expression levels
-    best clustering two or more predefined groups of samples. Annotations for
-    the genes present in the expression dataset are obtained from Ensembl
-    through the biomaRt package. The random forest framework is used to
-    evaluate the ability of each gene to cluster samples according to the
+    of genes from a microarray or RNA-seq experiment, and offers a 
+    supervised clustering approach to identify GO terms enriched in genes
+    with expression levels best clustering two or more predefined groups of
+    samples. Annotations for the genes present in the expression dataset may 
+    be obtained from Ensembl through the biomaRt package, if not provided by
+    the user. The default random forest framework is used to evaluate the
+    ability of each gene to cluster samples according to the
     factor of interest. Finally, GO terms are scored by averaging the
     rank (alternatively, score) of their respective gene sets to cluster
-    the samples. An ANOVA approach is also available as an alternative
-    statistical framework.
-Depends: R (>= 3.0.2), grid, Biobase (>= 2.22.0)
+    the samples. P-values may be computed to assess the significance of GO
+    term ranking. Visualisation functions include gene expression profile,
+    gene ontology-based heatmaps, and hierarchical clustering of 
+    experimental samples using gene expression data.
+Depends: R (>= 3.0.2), grid, Biobase (>= 2.22.0), VennDiagram (>= 1.6.5)
 Imports: biomaRt (>= 2.18.0), stringr (>= 0.6.2), 
     ggplot2 (>= 0.9.0), RColorBrewer (>= 1.0), gplots (>= 2.13.0),
-    VennDiagram (>= 1.6.5), randomForest (>= 4.6)
+    randomForest (>= 4.6)
 Suggests: RCurl (>= 1.95), BiocStyle
 License: GPL (>= 3)
 biocViews: Software, GeneExpression, Transcription, DifferentialExpression,

diff --git a/NAMESPACE b/NAMESPACE
@@ -12,6 +12,7 @@ export("hist_scores")
 export("list_genes")
 export("overlap_GO")
 export("plot_design")
+export("pValue_GO")
 export("quantiles_scores")
 export("rerank")
 export("subEset")

diff --git a/NEWS b/NEWS
@@ -1,3 +1,94 @@
+CHANGES IN VERSION 1.1.6
+--------------------------
+
+BUG FIXES:
+
+    o overlap_GO() was crashing for 3-group Venn diagrams, unless the
+    VennDiagram package was manually loaded in the workspace using
+    library(VennDiagram). The function will now run seamlessly without that
+    manual step, as loading GOexpress will immediately load VennDiagram
+    in the workspace (stated as a dependency in the DESCRIPTION file).
+
+NEW FEATURES:
+
+    o New function pValue_GO() allows calculation of P-value for each
+    ontology using permutation of genes labels. This allows users to estimate
+    the chance of seeing a GO term reach a particular rank (or score).
+    Features a fancy progress bar shamelessly adapted from StackOverflow.
+
+    o heatmap_GO now semi-automatically resizes the bottom and right margins
+    to accommodate large gene and sample labels, respectively. The user may
+    control those margins using the "margins" argument of the function.
+
+    o heatmap_GO default call now shows the gene feature identifier for those
+    missing an annotated gene name, when gene names are requested (also
+    the default).
+
+    o A rank.by slot is now created by the GO_analyse() function in the result
+    object to state the metric used to order the result tables.
+
+    o A filters.GO slot stating the filters and cutoffs applied to the result
+    object is now created or updated by successive uses of the subset_scores()
+    function. Warnings and notes are displayed if conflicting filters and
+    cutoffs are applied on a previously filtered result object.
+
+    o rerank() function now supports re-ordering by P-value. Note that this
+    is only applicable to the output of the pValue_GO() function mentioned
+    above.
+
+    o rerank() function now updates the rank.by slot of the result object
+    to state the current ordering metric.
+
+    o subset_scores() function now allows filtering by P-value. Note that this
+    is only applicable to the output of the pValue_GO() function mentioned
+    above.
+
+    o Backward compatibility with Ensembl annotation releases 75 and earlier,
+    which used 'external_gene_id', which was renamed to 'external_gene_name'
+    in releases 76 and later.
+
+    o table_genes() function defaults to sorting genes by decreasing score
+    (equivalent to increasing rank). Gene name and gene feature identifier
+    are supported as alternative sorting keys.
+
+
+UPDATED FEATURES:
+
+    o Allow user to override row_labels in heatmap_GO. This way, the 
+    color-coding of the sample can be kept, while better description of the
+    samples can be used to label them, instead of the phenodata values.
+
+    o In heatmap_GO(), if the labRow argument is of length 1, it is assumed to
+    be the name of a column in the phenoData slot. Useful to re-label
+    subsetted ExpressionSet objects.
+
+GENERAL UPDATES:
+
+    o Updated the AlvMac training dataset to include 'RPL36A', an example
+    of multiple Ensembl gene identifiers annotated to the same gene name.
+
+    o Updated the AlvMac example custom annotations to match the updated
+    dataset.
+
+    o Updated the example AlvMac_results to match the updated dataset.
+
+    o Set the random seed prior to running the GO_analyse() example in 
+    the vignette. Hopefully, this should allow reproducible testing by the
+    users.
+
+    o In User Guide, load package before loading the attached data.
+
+    o In User Guide, new sections and examples dealing with the re-labelling
+    of heatmap samples, the use of P-values, the re-ranking and subsetting
+    of results using P-values. New sub-sections for clarity. Emphasis on
+    the use and generation of local annotation, rather than use of current
+    online Ensembl annotation release.
+
+    o No more code connecting to the Ensembl server in any of the help files
+    and User Guide.
+
+    o Help pages examples with more consistent indentation of code.
+
 CHANGES IN VERSION 1.1.5
 --------------------------
 

diff --git a/R/analysis.R b/R/analysis.R
@@ -250,8 +250,8 @@ GO_analyse <- function(
                 "Non-NULL GO_genes argument: Ignoring 'biomart_dataset' ",
                 "and 'microarray' arguments."
                 )
-            biomart_dataset = ""
-            microarray = ""
+            biomart_dataset <- ""
+            microarray <- ""
         }
         mart <- NULL
     }
@@ -332,7 +332,7 @@ GO_analyse <- function(
         if (! "name_1006" %in% colnames(all_GO)){
             # Allow the header "name" but internally convert it to name_1006
             if ("name" %in% colnames(all_GO)){
-                colnames(all_GO)[colnames(all_GO) == "name"] = "name_1006"
+                colnames(all_GO)[colnames(all_GO) == "name"] <- "name_1006"
             }
             # else if could allow more headers
             else {
@@ -347,7 +347,7 @@ GO_analyse <- function(
             if ("namespace" %in% colnames(all_GO)){
                 colnames(all_GO)[
                     colnames(all_GO) == "namespace"
-                    ] = "namespace_1006"
+                    ] <- "namespace_1006"
             }
             # else if could allow more headers
             else {
@@ -449,7 +449,7 @@ GO_analyse <- function(
                 all_genes <- getBM(
                     attributes=c(
                         "ensembl_gene_id",
-                        "external_gene_name",
+                        "external_gene_name", # since Ensembl release 76
                         "description"
                     ),
                     filters="ensembl_gene_id",
@@ -461,7 +461,7 @@ GO_analyse <- function(
                 all_genes <- getBM(
                     attributes=c(
                         microarray,
-                        "external_gene_name",
+                        "external_gene_name", # since Ensembl release 76
                         "description"
                     ),
                     filters=microarray,
@@ -482,9 +482,14 @@ GO_analyse <- function(
             if ("name" %in% colnames(all_genes)){
                 colnames(all_genes)[
                     colnames(all_genes) == "name"
-                    ] = "external_gene_name"
+                    ] <- "external_gene_name"
             }
-            # else if could allow more headers
+            else if ("external_gene_id" %in% colnames(all_genes)){
+                colnames(all_genes)[
+                    colnames(all_genes) == "external_gene_id"
+                    ] <- "external_gene_name"
+            }
+            # "else if" could allow more synonym headers
             else {
                 warning(
                     "We encourage the use of a \"name\" column describing",
@@ -601,6 +606,7 @@ GO_analyse <- function(
                 factor=f,
                 method=method,
                 subset=subset,
+                rank.by=rank.by,
                 ntree=ntree,
                 mtry=mtry
                 )
@@ -614,7 +620,8 @@ GO_analyse <- function(
                 genes=genes_score,
                 factor=f,
                 method=method,
-                subset=subset
+                subset=subset,
+                rank.by=rank.by
                 )
             )
     }
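
Note on the header-synonym handling in the R/analysis.R hunks above: the chained "if / else if" blocks rename a user-supplied column ("name", or "external_gene_id" from Ensembl releases 75 and earlier) to the internal header "external_gene_name". The same idea can be sketched as a small standalone helper. This is only an illustration under stated assumptions: normalise_header() is a hypothetical name that does not exist in GOexpress, and the annotation table is toy data.

```r
# Hypothetical helper, NOT part of GOexpress: rename the first synonym header
# found in a data frame to the canonical header expected downstream.
normalise_header <- function(df, canonical, synonyms) {
    # Nothing to do if the canonical header is already present
    if (canonical %in% colnames(df)) {
        return(df)
    }
    # Try each synonym in order and rename the first one that matches
    for (synonym in synonyms) {
        if (synonym %in% colnames(df)) {
            colnames(df)[colnames(df) == synonym] <- canonical
            return(df)
        }
    }
    # No match: warn and return the data frame unchanged
    warning("No column matching '", canonical, "' or its synonyms was found.")
    df
}

# Toy annotation table using the header from Ensembl release 75 and earlier
all_genes <- data.frame(
    external_gene_id = c("RPL36A", "TNF"),
    description = c("ribosomal protein L36a", "tumor necrosis factor"),
    stringsAsFactors = FALSE
)

all_genes <- normalise_header(
    all_genes,
    canonical = "external_gene_name",
    synonyms = c("name", "external_gene_id")
)
print(colnames(all_genes))
```

Passing the synonyms as a vector means a further accepted header only needs a new vector element rather than another "else if" branch, which is the extension point hinted at by the comment left in the diff.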