diff --git a/docs/concepts/GAMBLR_family.html b/docs/concepts/GAMBLR_family.html index 5c76a90..1c12fc2 100644 --- a/docs/concepts/GAMBLR_family.html +++ b/docs/concepts/GAMBLR_family.html @@ -106,6 +106,10 @@
  • Frequently Asked Qestions +
  • +
  • + + Tutorial: The prettiest forestplot
  • diff --git a/docs/concepts/glossary.html b/docs/concepts/glossary.html index b08a085..aa223e6 100644 --- a/docs/concepts/glossary.html +++ b/docs/concepts/glossary.html @@ -103,6 +103,10 @@
  • Frequently Asked Qestions +
  • +
  • + + Tutorial: The prettiest forestplot
  • diff --git a/docs/faq.html b/docs/faq.html index bdd3a3c..600ed6f 100644 --- a/docs/faq.html +++ b/docs/faq.html @@ -103,6 +103,10 @@
  • Frequently Asked Qestions +
  • +
  • + + Tutorial: The prettiest forestplot
  • @@ -204,6 +208,12 @@ Frequently Asked Qestions +
  • +
  • Frequently Asked Qestions +
  • +
  • + + Tutorial: The prettiest forestplot
  • diff --git a/docs/install.html b/docs/install.html index 63a9de3..3d0c6e0 100644 --- a/docs/install.html +++ b/docs/install.html @@ -133,6 +133,10 @@
  • Frequently Asked Qestions +
  • +
  • + + Tutorial: The prettiest forestplot
  • @@ -234,6 +238,12 @@ Frequently Asked Qestions +
  • +
  • Frequently Asked Qestions +
  • +
  • + + Tutorial: The prettiest forestplot
  • diff --git a/docs/search.json b/docs/search.json index cac9d09..7b5c676 100644 --- a/docs/search.json +++ b/docs/search.json @@ -35,74 +35,60 @@ "text": "There are several key concepts underlying the logic behind the GAMBLR.viz package. The main terms are:\n\nthese_samples_metadata: This is a data frame with a set of minimal required columns: patient_id, Tumor_Sample_Barcode, sample_id, seq_type, sex, cohort, and pathology. The columns like sex and cohort can contain NA values but must be present in the metadata. The main purpose of this data frame is to provide a structure for the metadata that is always expected to be available and provides linkage between unique sample identifiers and associated basic metadata values. The columns Tumor_Sample_Barcode and sample_id are expected to share the same values, but are required to be present for direct operation on the outputs of different upstream tools.\n\n\n\n\n Back to top" }, { - "objectID": "tutorials/getting_started.html", - "href": "tutorials/getting_started.html", - "title": "Getting Started", + "objectID": "tutorials/forestplot.html", + "href": "tutorials/forestplot.html", + "title": "Tutorial: The prettiest forestplot", "section": "", - "text": "This is a quick tour of some basic commands and usage patterns, just to get you started.\n# Load packages\nlibrary(GAMBLR.data)\nlibrary(GAMBLR.helpers)\nlibrary(GAMBLR.viz)\nlibrary(tidyverse)\nThis tutorial explores how to generate some basic and most common plots, commonly occurring arguments across different functions, best practices and recommendations in the scope of visualizing data." - }, - { - "objectID": "tutorials/getting_started.html#what-are-standartized-colours", - "href": "tutorials/getting_started.html#what-are-standartized-colours", - "title": "Getting Started", - "section": "What are standartized colours?", - "text": "What are standartized colours?\nFirst, let’s explore the standartized color pallettes in the GAMBLR.viz. 
They are stored as list in one of the GAMBLR.viz dependencies (GAMBLR.helpers) and are an integral part of visualizations. For demonstration purposes, we will obtain all of the standartized colours:\n\nall_c <- get_gambl_colours(\n as_dataframe = TRUE\n)\n\nWhat are the colours available?\n\nstr(all_c)\n\n'data.frame': 268 obs. of 3 variables:\n $ group : chr \"seq_type\" \"seq_type\" \"seq_type\" \"type\" ...\n $ name : chr \"mrna\" \"genome\" \"capture\" \"gain\" ...\n $ colour: chr \"#E41A1C\" \"#377EB8\" \"#4DAF4A\" \"#0000FF\" ...\n\n\nWhat are the colour groups?\n\ntable(all_c$group)\n\n\n BL blood chapuy_classifier clinical \n 7 15 6 47 \n cohort coo copy_number EBV \n 16 12 17 4 \n FL genetic_subgroup hmrn indels \n 3 24 8 2 \n lacy_classifier lymphgen lymphgenerator mutation \n 8 14 10 13 \n pathology pos_neg rainfall seq_type \n 31 11 7 3 \n sex svs type \n 6 2 2 \n\n\nMany of these colours are conviniently provided for you to ensure consistency that is independent of formatting and case: for example, when the color for DLBCL COO is returned, the same color will be used for UNCLASS, U, UNC, Unclassified etc.\nJust for the purpose of this guide, we will define a simple function to display some of these colour pallettes:\n\nshow_col <- function(data, group){\n data %>%\n filter(\n !!sym(\"group\") == {{group}}\n ) %>%\n ggplot(\n aes(\n x = name,\n y = 0,\n fill = colour,\n label = name\n )\n ) +\n geom_tile(width = 0.9, height = 1) +\n geom_text(color = \"white\", fontface=\"bold\") +\n scale_fill_identity(guide = \"none\") +\n coord_flip() +\n theme_void() +\n labs(title = toupper(group)) +\n theme(plot.title = element_text(lineheight = 0.9,hjust=0.5,face=\"bold\"))\n}" - }, - { - "objectID": "tutorials/getting_started.html#hex-codes-for-b-cell-lymphomas", - "href": "tutorials/getting_started.html#hex-codes-for-b-cell-lymphomas", - "title": "Getting Started", - "section": "Hex codes for B-cell lymphomas", - "text": "Hex codes for B-cell 
lymphomas\n\nshow_col(all_c, \"pathology\")" + "text": "One of the integral parts of this package is the analysis and display of differences in mutation frequency between two groups in a given cohort. Because it is easy to use, supports flexible comparisons, and generates easy-to-follow display items, the function that does this is called prettyForestPlot, and it belongs to the pretty family of GAMBLR.viz functions. No specific formatting or data preparation is needed for the analysis and visualization; the only required inputs are the mutation data (in maf format or as a binary feature matrix), the metadata (with sample identifiers in the sample_id column and an annotation of the groups to compare), and a character string naming the metadata column that holds these sample annotations. This tutorial demonstrates example inputs and showcases the main features of this function." }, { - "objectID": "tutorials/getting_started.html#hex-codes-for-genetic-subgroups", - "href": "tutorials/getting_started.html#hex-codes-for-genetic-subgroups", - "title": "Getting Started", - "section": "Hex codes for genetic subgroups", - "text": "Hex codes for genetic subgroups\n\nshow_col(all_c, \"genetic_subgroup\")" + "objectID": "tutorials/forestplot.html#prepare-setup", + "href": "tutorials/forestplot.html#prepare-setup", + "title": "Tutorial: The prettiest forestplot", + "section": "Prepare setup", + "text": "Prepare setup\nWe will first import the necessary packages:\n\n# Load packages\nlibrary(GAMBLR.data)\nlibrary(GAMBLR.viz)\nlibrary(dplyr)\n\nNext, we will get some data to display. The metadata is expected to be a data frame with one required column, sample_id, and a second column containing the sample annotations that define the comparison groups.
In this example, we will use the data set and variant calls from the study that identified genetic subgroups of Burkitt lymphoma (BL).\n\nmetadata <- get_gambl_metadata() %>%\n filter(cohort == \"BL_Thomas\")\n\nNext, we will obtain the coding mutations that will be used for plotting. The result is a data frame in the standardized maf format.\n\nmaf <- get_ssm_by_samples(\n these_samples_metadata = metadata,\n tool_name = \"publication\",\n projection = \"hg38\"\n)\n\n# What does it look like?\ndim(maf)\n\n[1] 47043 45\n\nhead(maf) %>%\n select(\n Tumor_Sample_Barcode,\n Hugo_Symbol,\n Variant_Classification\n )\n\n Tumor_Sample_Barcode Hugo_Symbol Variant_Classification\n1: Akata CPTP Missense_Mutation\n2: Akata FNDC10 Missense_Mutation\n3: Akata MORN1 Missense_Mutation\n4: Akata MEGF6 Missense_Mutation\n5: Akata NPHP4 Silent\n6: Akata GPR157 Missense_Mutation\n\n\nFor the purpose of this tutorial, we will focus on a small subset of genes known to be significantly mutated in BL.\n\ngenes <- lymphoma_genes_bl_v_latest$Gene\nhead(genes)\n\n[1] \"ALPK2\" \"ARHGEF1\" \"ARID1A\" \"B2M\" \"BACH2\" \"BCL10\" \n\n\nNow we have the metadata and mutations we want to explore, so we are ready to start visualizing the data." }, { - "objectID": "tutorials/getting_started.html#hex-codes-for-clinical-variables", - "href": "tutorials/getting_started.html#hex-codes-for-clinical-variables", - "title": "Getting Started", - "section": "Hex codes for clinical variables", - "text": "Hex codes for clinical variables\n\nshow_col(all_c, \"clinical\")" + "objectID": "tutorials/forestplot.html#the-default-forest-plot", + "href": "tutorials/forestplot.html#the-default-forest-plot", + "title": "Tutorial: The prettiest forestplot", + "section": "The default forest plot", + "text": "The default forest plot\nThe forest plot can be generated with the default parameters as soon as the metadata and a data frame of mutations in standard maf format are provided.
Here is an example of the output with all default parameters:\n\ncomparison_column <- \"EBV_status_inf\" # name of the metadata column used for the comparison\nfp <- prettyForestPlot(\n metadata = metadata,\n maf = maf,\n genes = genes,\n comparison_column = comparison_column\n)" }, { - "objectID": "tutorials/getting_started.html#hex-codes-for-mutation-types", - "href": "tutorials/getting_started.html#hex-codes-for-mutation-types", - "title": "Getting Started", - "section": "Hex codes for Mutation types", - "text": "Hex codes for Mutation types\n\nshow_col(all_c, \"mutation\")" + "objectID": "install.html", + "href": "install.html", + "title": "Installation", + "section": "", + "text": "Installation\nWe recommend installing the package directly from GitHub (requires the devtools package).\nif (!require(\"devtools\")) install.packages(\"devtools\")\n\ndevtools::install_github(\n \"morinlab/GAMBLR.viz\",\n repos = BiocManager::repositories()\n)\nYou can confirm successful installation by running one of the most popular functions:\nlibrary(GAMBLR.data)\nlibrary(GAMBLR.viz)\n\nmaf_metadata <- get_gambl_metadata(seq_type_filter = \"genome\") %>%\n dplyr::filter(pathology %in% c(\"FL\", \"DLBCL\"))\n\nmaf_data <- get_ssm_by_samples(\n these_samples_metadata = maf_metadata\n)\n\n# Define some genes of interest\nfl_genes = c(\"RRAGC\", \"CREBBP\", \"VMA21\", \"ATP6V1B2\")\ndlbcl_genes = c(\"EZH2\", \"KMT2D\", \"MEF2B\", \"CD79B\", \"MYD88\", \"TP53\")\ngenes = c(fl_genes, dlbcl_genes)\n\nprettyOncoplot(\n maf_df = maf_data,\n genes = genes,\n these_samples_metadata = maf_metadata\n)\nThere is a lot of functionality to hand-craft this plot exactly the way you want. Interested? Read more in the tutorials section.\n\n\n\n\n Back to top" }, { - "objectID": "faq.html", - "href": "faq.html", - "title": "Frequently Asked Qestions", + "objectID": "index.html", + "href": "index.html", + "title": "GAMBLR.viz", "section": "", - "text": "This section will cover most of the questions you may have about GAMBLR.viz.
If there is something that is not covered, please feel free to reach out to us via GitHub by reporting an issue and we will be happy to add it to this page." + "text": "Why use GAMBLR.viz?\n \n \n \n How to install?\n \n \n \n How to use?\n \n \n \n Release notes\n \n \n \n GitHub" }, { - "objectID": "faq.html#where-can-i-get-example-data-that-works-with-this-package", - "href": "faq.html#where-can-i-get-example-data-that-works-with-this-package", - "title": "Frequently Asked Qestions", - "section": "Where can I get example data that works with this package?", - "text": "Where can I get example data that works with this package?\nThe example data of all types is available with one of GAMBLR.viz dependencies (GAMBLR.data). Every function demonstrates how to get this data in it’s example, or is already setup to automatically retreive it for you with minimal information (e.g. sample_id)." + "objectID": "index.html#install", + "href": "index.html#install", + "title": "GAMBLR.viz", + "section": "Install", + "text": "Install\nWe recommend installing the package directly from GitHub (requires devtools dependency).\ndevtools::install_github(\n \"morinlab/GAMBLR.viz\",\n repos = BiocManager::repositories()\n)\n\n\n\nShow quickstart" }, { - "objectID": "faq.html#can-i-use-my-own-colors-and-not-the-ones-the-package-offers", - "href": "faq.html#can-i-use-my-own-colors-and-not-the-ones-the-package-offers", - "title": "Frequently Asked Qestions", - "section": "Can I use my own colors and not the ones the package offers?", - "text": "Can I use my own colors and not the ones the package offers?\nAbsolutely! Most functions will accept argument custom_colours where list of color mappings can be used to specify your own pallette." 
+ "objectID": "index.html#quickstart", + "href": "index.html#quickstart", + "title": "GAMBLR.viz", + "section": "Quickstart", + "text": "Quickstart\nThe quick and easy way to get started is to make sure the devtools dependency is installed, then install GAMBLR.viz:\n# Verify devtools is installed\nif (!require(\"devtools\")) install.packages(\"devtools\")\n\n# Install GAMBLR.viz\ndevtools::install_github(\n \"morinlab/GAMBLR.viz\",\n repos = BiocManager::repositories()\n)" }, { - "objectID": "faq.html#can-i-use-my-own-data-we-generated-in-our-lab", - "href": "faq.html#can-i-use-my-own-data-we-generated-in-our-lab", - "title": "Frequently Asked Qestions", - "section": "Can I use my own data we generated in our lab?", - "text": "Can I use my own data we generated in our lab?\nAbsolutely! Most functions will accept metadata and data frame with mutations as input, so you can provide any outside data as long as the formatting is consistent with the example data." }, { + "objectID": "index.html#installation-for-developers", + "href": "index.html#installation-for-developers", + "title": "GAMBLR.viz", + "section": "Installation for developers", + "text": "Installation for developers\nThe easiest way to obtain and contribute to GAMBLR.viz is to clone the repository:\ncd\ngit clone git@github.com:morinlab/GAMBLR.viz.git\nIn your R editor of choice (which is hopefully VS Code now), set your working directory to the place where you just cloned the repo.\nsetwd(\"~/GAMBLR.viz\")\nInstall the package in R by running the following command (requires the devtools package):\ndevtools::install()\nAfter applying your modifications to the code, use the following command to quickly test your changes without installing the package (requires the devtools dependency):\ndevtools::load_all()\nGAMBLR.viz is a free open-source package, but the Master branch is protected. We welcome contributions (pull requests, bug reports, feature requests, PR reviews) from users of all levels.
All commits must be submitted via a pull request from a branch. Please refer to the GitHub documentation for details on how to open a pull request." }, { "objectID": "why.html", @@ -147,39 +133,74 @@ "text": "Getting started\nIf you’re interested in trying GAMBLR.viz we recommend the getting started tutorial." }, { - "objectID": "index.html", - "href": "index.html", - "title": "GAMBLR.viz", + "objectID": "faq.html", + "href": "faq.html", + "title": "Frequently Asked Qestions", "section": "", - "text": "Why use GAMBLR.viz?\n \n \n \n How to install?\n \n \n \n How to use?\n \n \n \n Release notes\n \n \n \n GitHub" + "text": "This section will cover most of the questions you may have about GAMBLR.viz. If there is something that is not covered, please feel free to reach out to us via GitHub by reporting an issue and we will be happy to add it to this page." }, { - "objectID": "index.html#install", - "href": "index.html#install", - "title": "GAMBLR.viz", - "section": "Install", - "text": "Install\nWe recommend installing the package directly from GitHub (requires devtools dependency).\ndevtools::install_github(\n \"morinlab/GAMBLR.viz\",\n repos = BiocManager::repositories()\n)\n\n\n\nShow quickstart" + "objectID": "faq.html#where-can-i-get-example-data-that-works-with-this-package", + "href": "faq.html#where-can-i-get-example-data-that-works-with-this-package", + "title": "Frequently Asked Qestions", + "section": "Where can I get example data that works with this package?", + "text": "Where can I get example data that works with this package?\nExample data of all types is available in one of the GAMBLR.viz dependencies (GAMBLR.data). Every function either demonstrates how to get this data in its example, or is already set up to retrieve it automatically from minimal information (e.g. sample_id)."
}, { - "objectID": "index.html#quickstart", - "href": "index.html#quickstart", - "title": "GAMBLR.viz", - "section": "Quickstart", - "text": "Quickstart\nThe quick and easy way to get started is to make sure the devtools dependency is installed, then install the GAMBLR.viz:\n# Verify devtools is installed\nif (!require(\"devtools\")) install.packages(\"devtools\")\n\n# Install GAMBLR.viz\ndevtools::install_github(\n \"morinlab/GAMBLR.viz\",\n repos = BiocManager::repositories()\n)" + "objectID": "faq.html#can-i-use-my-own-colors-and-not-the-ones-the-package-offers", + "href": "faq.html#can-i-use-my-own-colors-and-not-the-ones-the-package-offers", + "title": "Frequently Asked Qestions", + "section": "Can I use my own colors and not the ones the package offers?", + "text": "Can I use my own colors and not the ones the package offers?\nAbsolutely! Most functions accept the argument custom_colours, where a list of color mappings can be used to specify your own palette." }, { - "objectID": "index.html#installation-for-developers", - "href": "index.html#installation-for-developers", - "title": "GAMBLR.viz", - "section": "Installation for developers", - "text": "Installation for developers\nThe easiest way to obtain and contribute to GAMBLR.viz is to do this via cloning the repository\ncd\ngit clone git@github.com:morinlab/GAMBLR.viz.git\nIn your R editor of choice (which is hopefully VS Code now), set your working directory to the place you just cloned the repo.\nsetwd(\"~/GAMBLR.viz\")\nInstall the package in R by running the following command (requires the devtools package):\ndevtools::install()\nAfter applying your modifications to the code, use the following command to quickly test your changes without directly installing the packaage (requires the devtools dependency):\ndevtools::load_all()\nGAMBLR.viz is a free open-source package, but the Master branch is protected.
We welcome contributions (pull request, bug report, feature request, PR review) from all levels of users. All commits must be submitted via pull request on a branch. Please refer to the GitHub documentation for details on how to do pull request." + "objectID": "faq.html#can-i-use-my-own-data-we-generated-in-our-lab", + "href": "faq.html#can-i-use-my-own-data-we-generated-in-our-lab", + "title": "Frequently Asked Qestions", + "section": "Can I use my own data we generated in our lab?", + "text": "Can I use my own data we generated in our lab?\nAbsolutely! Most functions accept metadata and a data frame of mutations as input, so you can provide any external data as long as its formatting is consistent with the example data." }, { - "objectID": "install.html", - "href": "install.html", - "title": "Installation", + "objectID": "tutorials/getting_started.html", + "href": "tutorials/getting_started.html", + "title": "Getting Started", "section": "", - "text": "Installation\nWe recommend installing the package directly from GitHub (requires devtools dependency).\nif (!require(\"devtools\")) install.packages(\"devtools\")\n\ndevtools::install_github(\n \"morinlab/GAMBLR.viz\",\n repos = BiocManager::repositories()\n)\nYou can confirm successful installation by running one of the most popular functions:\nlibrary(GAMBLR.data)\n\nmaf_metadata <- get_gambl_metadata(seq_type_filter = \"genome\") %>%\n dplyr::filter(pathology %in% c(\"FL\", \"DLBCL\"))\n\nmaf_data <- get_ssm_by_samples(\n these_samples_metadata = maf_metadata\n)\n\n#define some genes of interest\nfl_genes = c(\"RRAGC\", \"CREBBP\", \"VMA21\", \"ATP6V1B2\")\ndlbcl_genes = c(\"EZH2\", \"KMT2D\", \"MEF2B\", \"CD79B\", \"MYD88\", \"TP53\")\ngenes = c(fl_genes, dlbcl_genes)\n\nprettyOncoplot(\n maf_df = maf_data,\n genes = genes,\n these_samples_metadata = maf_metadata\n)\nThere is a lot of functionality to hand-craft this plot exactly in the way you want. Interested?
Read more in the tutorials section.\n\n\n\n\n Back to top" + "text": "This is a quick tour of some basic commands and usage patterns, just to get you started.\n# Load packages\nlibrary(GAMBLR.data)\nlibrary(GAMBLR.helpers)\nlibrary(GAMBLR.viz)\nlibrary(tidyverse)\nThis tutorial explores how to generate some of the most common plots, and covers arguments shared across different functions as well as best practices and recommendations for visualizing data." }, { + "objectID": "tutorials/getting_started.html#what-are-standartized-colours", + "href": "tutorials/getting_started.html#what-are-standartized-colours", + "title": "Getting Started", + "section": "What are standartized colours?", + "text": "What are standartized colours?\nFirst, let’s explore the standardized colour palettes in GAMBLR.viz. They are stored as a list in one of the GAMBLR.viz dependencies (GAMBLR.helpers) and are an integral part of the visualizations. For demonstration purposes, we will obtain all of the standardized colours:\n\nall_c <- get_gambl_colours(\n as_dataframe = TRUE\n)\n\nWhat colours are available?\n\nstr(all_c)\n\n'data.frame': 268 obs.
of 3 variables:\n $ group : chr \"seq_type\" \"seq_type\" \"seq_type\" \"type\" ...\n $ name : chr \"mrna\" \"genome\" \"capture\" \"gain\" ...\n $ colour: chr \"#E41A1C\" \"#377EB8\" \"#4DAF4A\" \"#0000FF\" ...\n\n\nWhat are the colour groups?\n\ntable(all_c$group)\n\n\n BL blood chapuy_classifier clinical \n 7 15 6 47 \n cohort coo copy_number EBV \n 16 12 17 4 \n FL genetic_subgroup hmrn indels \n 3 24 8 2 \n lacy_classifier lymphgen lymphgenerator mutation \n 8 14 10 13 \n pathology pos_neg rainfall seq_type \n 31 11 7 3 \n sex svs type \n 6 2 2 \n\n\nMany of these colours are conveniently provided in a way that keeps them consistent regardless of formatting and case: for example, when the colours for DLBCL COO are returned, the same colour is used for UNCLASS, U, UNC, Unclassified, etc.\nJust for the purpose of this guide, we will define a simple function to display some of these colour palettes:\n\nshow_col <- function(data, group){\n data %>%\n filter(\n !!sym(\"group\") == {{group}}\n ) %>%\n ggplot(\n aes(\n x = name,\n y = 0,\n fill = colour,\n label = name\n )\n ) +\n geom_tile(width = 0.9, height = 1) +\n geom_text(color = \"white\", fontface=\"bold\") +\n scale_fill_identity(guide = \"none\") +\n coord_flip() +\n theme_void() +\n labs(title = toupper(group)) +\n theme(plot.title = element_text(lineheight = 0.9,hjust=0.5,face=\"bold\"))\n}" }, { + "objectID": "tutorials/getting_started.html#hex-codes-for-b-cell-lymphomas", + "href": "tutorials/getting_started.html#hex-codes-for-b-cell-lymphomas", + "title": "Getting Started", + "section": "Hex codes for B-cell lymphomas", + "text": "Hex codes for B-cell lymphomas\n\nshow_col(all_c, \"pathology\")" }, { + "objectID": "tutorials/getting_started.html#hex-codes-for-genetic-subgroups", + "href": "tutorials/getting_started.html#hex-codes-for-genetic-subgroups", + "title": "Getting Started", + "section": "Hex codes for genetic subgroups", + "text": "Hex codes for genetic
subgroups\n\nshow_col(all_c, \"genetic_subgroup\")" }, { + "objectID": "tutorials/getting_started.html#hex-codes-for-clinical-variables", + "href": "tutorials/getting_started.html#hex-codes-for-clinical-variables", + "title": "Getting Started", + "section": "Hex codes for clinical variables", + "text": "Hex codes for clinical variables\n\nshow_col(all_c, \"clinical\")" }, { + "objectID": "tutorials/getting_started.html#hex-codes-for-mutation-types", + "href": "tutorials/getting_started.html#hex-codes-for-mutation-types", + "title": "Getting Started", + "section": "Hex codes for Mutation types", + "text": "Hex codes for Mutation types\n\nshow_col(all_c, \"mutation\")" }, { "objectID": "tutorials/oncoplot.html", @@ -200,84 +221,84 @@ "href": "tutorials/oncoplot.html#the-simplest-oncoplot", "title": "Tutorial: The prettiest oncoplot", "section": "The simplest oncoplot", - "text": "The simplest oncoplot\nThere is a number of options how to customize your oncoplot, but it is ready for you to use with just the metadata and maf. Here is an example of the output with all default parameters:\n\nminMutationPercent <- 10 # only show genes mutated in at least 10% of samples\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n minMutationPercent = minMutationPercent\n)\n\n[1] \"numcases: 441\"" + "text": "The simplest oncoplot\nThere are a number of ways to customize your oncoplot, but it is ready to use with just the metadata and maf.
Here is an example of the output with all default parameters:\n\nminMutationPercent <- 10 # only show genes mutated in at least 10% of samples\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n minMutationPercent = minMutationPercent\n)" }, { "objectID": "tutorials/oncoplot.html#adding-annotation-tracks", "href": "tutorials/oncoplot.html#adding-annotation-tracks", "title": "Tutorial: The prettiest oncoplot", "section": "Adding annotation tracks", - "text": "Adding annotation tracks\nWe can customize this and add some of the annotation tracks for more informative display of the metadata we ate interested in:\n\nmetadataColumns <- c(\n \"pathology\",\n \"lymphgen\",\n \"genetic_subgroup\",\n \"COO_consensus\",\n \"sex\"\n)\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n minMutationPercent = minMutationPercent,\n metadataColumns = metadataColumns\n)\n\n[1] \"numcases: 441\"" + "text": "Adding annotation tracks\nWe can customize this by adding annotation tracks for a more informative display of the metadata we are interested in:\n\nmetadataColumns <- c(\n \"pathology\",\n \"lymphgen\",\n \"genetic_subgroup\",\n \"COO_consensus\",\n \"sex\"\n)\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n minMutationPercent = minMutationPercent,\n metadataColumns = metadataColumns\n)" }, { "objectID": "tutorials/oncoplot.html#changing-font-sizes", "href": "tutorials/oncoplot.html#changing-font-sizes", "title": "Tutorial: The prettiest oncoplot", "section": "Changing font sizes", - "text": "Changing font sizes\nYou may notice that as more (or less) genes and annotations are displayed with the oncoplot we may want to modify the size of the gene names and/or the annotation tracks with their labels.
There are several parameters available for you to do so: - metadataBarHeight: will change the height of the annotation tracks at the bottom of the oncoplot - metadataBarFontsize: will change the font size of the annotation tracks at the bottom of the oncoplot - fontSizeGene: will change the font size of both percentage labels to the right of the oncoplot and gene names to the left of it - legendFontSize: will change the font size of the legend at the bottom of the plot Let’s see these parameters in action:\n\nmetadataBarHeight <- 5\nmetadataBarFontsize <- 10\nfontSizeGene <- 12\nlegendFontSize <- 7\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n minMutationPercent = minMutationPercent,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize\n)\n\n[1] \"numcases: 441\"" + "text": "Changing font sizes\nYou may notice that as more (or fewer) genes and annotations are displayed on the oncoplot, it becomes useful to modify the size of the gene names and/or the annotation tracks and their labels.
There are several parameters available for you to do so: - metadataBarHeight: will change the height of the annotation tracks at the bottom of the oncoplot - metadataBarFontsize: will change the font size of the annotation tracks at the bottom of the oncoplot - fontSizeGene: will change the font size of both percentage labels to the right of the oncoplot and gene names to the left of it - legendFontSize: will change the font size of the legend at the bottom of the plot Let’s see these parameters in action:\n\nmetadataBarHeight <- 5\nmetadataBarFontsize <- 10\nfontSizeGene <- 12\nlegendFontSize <- 7\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n minMutationPercent = minMutationPercent,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize\n)" }, { "objectID": "tutorials/oncoplot.html#show-samples-ordered-on-annotations", "href": "tutorials/oncoplot.html#show-samples-ordered-on-annotations", "title": "Tutorial: The prettiest oncoplot", "section": "Show samples ordered on annotations", - "text": "Show samples ordered on annotations\nWe can notice that the default setting generates the classic “rainfall” style of the plot - but what if we want to add some structure to it and sort sample order in some way? It is easy to do so with the parameter sortByColumns. 
We can sort on the same annotations as we use to display with the oncoplot:\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n minMutationPercent = minMutationPercent,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns\n)\n\n[1] \"numcases: 441\"\n\n\n\n\n\n\n\n\n\n\n\nDid you know?\n\n\n\nThe ordering occurs sequentially according to the order of individual columns we have specified with the sortByColumns parameter. The ordering is in ascending order, and can be toggled with additional boolean parameter arrange_descending." + "text": "Show samples ordered on annotations\nYou may notice that the default setting generates the classic “rainfall” style of plot, but what if we want to add some structure to it and sort the samples in some way? This is easy to do with the sortByColumns parameter. We can sort on the same annotations that we display on the oncoplot:\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n minMutationPercent = minMutationPercent,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns\n)\n\n\n\n\n\n\n\n\n\n\nDid you know?\n\n\n\nThe ordering is applied sequentially, following the order of the columns specified in the sortByColumns parameter. Samples are sorted in ascending order by default; this can be toggled with the additional boolean parameter arrange_descending."
}, { "objectID": "tutorials/oncoplot.html#displaying-only-specific-genes", "href": "tutorials/oncoplot.html#displaying-only-specific-genes", "title": "Tutorial: The prettiest oncoplot", "section": "Displaying only specific genes", - "text": "Displaying only specific genes\nThere can be scenarion where we might want to diplay genes not based on their recurrence, but out of interest in specific genes. Sure so, one way to do it is to pre-filter your maf data to the genes of interest. But this might have some unexpected consequences and limit your flexibility in doing more things, so the better way is to take advantage of the genes parameter:\n\nfl_genes <- c(\"RRAGC\", \"CREBBP\", \"VMA21\", \"ATP6V1B2\", \"EZH2\", \"KMT2D\")\ndlbcl_genes <- c(\"MEF2B\", \"CD79B\", \"MYD88\", \"TP53\")\ngenes <- c(fl_genes, dlbcl_genes)\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns,\n genes = genes\n)\n\n[1] \"numcases: 441\"\n[1] \"numgenes: 10\"\n\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nNote that we removed the minMutationPercent in the last function call since we wanted to see the genes that we specifically requested.\n\n\nNow we are only looking at some specific genes of interest but they are arranged in the decreasing order of their recurrence in this cohort. What if we want to enforce the gene order on the oncoplot to be exactly the same as we specified it in our gene variable? 
We can take advantage of the keepGeneOrder parameter:\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns,\n genes = genes,\n keepGeneOrder = TRUE\n)\n\n[1] \"numcases: 441\"\n[1] \"numgenes: 10\"" + "text": "Displaying only specific genes\nThere can be scenarion where we might want to diplay genes not based on their recurrence, but out of interest in specific genes. Sure so, one way to do it is to pre-filter your maf data to the genes of interest. But this might have some unexpected consequences and limit your flexibility in doing more things, so the better way is to take advantage of the genes parameter:\n\nfl_genes <- c(\"RRAGC\", \"CREBBP\", \"VMA21\", \"ATP6V1B2\", \"EZH2\", \"KMT2D\")\ndlbcl_genes <- c(\"MEF2B\", \"CD79B\", \"MYD88\", \"TP53\")\ngenes <- c(fl_genes, dlbcl_genes)\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns,\n genes = genes\n)\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nNote that we removed the minMutationPercent in the last function call since we wanted to see the genes that we specifically requested.\n\n\nNow we are only looking at some specific genes of interest but they are arranged in the decreasing order of their recurrence in this cohort. What if we want to enforce the gene order on the oncoplot to be exactly the same as we specified it in our gene variable? 
We can take advantage of the keepGeneOrder parameter:\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns,\n genes = genes,\n keepGeneOrder = TRUE\n)" }, { "objectID": "tutorials/oncoplot.html#grouping-genes-into-categories", "href": "tutorials/oncoplot.html#grouping-genes-into-categories", "title": "Tutorial: The prettiest oncoplot", "section": "Grouping genes into categories", - "text": "Grouping genes into categories\nWe can also group genes into specific categories. To do so, we need to have a named list where name of the list element corresponds to the gene name, and the list element corresponds to the gene group. We alreade have the genes variable, so we can convert it to the appropriate format:\n\ngene_groups <- c(\n rep(\"FL\", length(fl_genes)),\n rep(\"DLBCL\", length(dlbcl_genes))\n)\nnames(gene_groups) <- genes\n\ngene_groups\n\n RRAGC CREBBP VMA21 ATP6V1B2 EZH2 KMT2D MEF2B CD79B \n \"FL\" \"FL\" \"FL\" \"FL\" \"FL\" \"FL\" \"DLBCL\" \"DLBCL\" \n MYD88 TP53 \n \"DLBCL\" \"DLBCL\" \n\n\nNow we can use it to split the genes on the oncoplot into the groups:\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns,\n genes = genes,\n splitGeneGroups = gene_groups\n)\n\n[1] \"numcases: 441\"\n[1] \"numgenes: 10\"\n\n\n\n\n\n\n\n\n\n\n\nDid you know?\n\n\n\nYou can provide more than two groups of genes - any number of groups is supported as long as they are specified in the gene_groups.\n\n\n\n\n\n\n\n\nNote\n\n\n\nWithin each group, the genes are ordered in decreasing order of their 
recurrence, but the keepGeneOrder parameter is still supported and if specified, will keep the specified order within each group." + "text": "Grouping genes into categories\nWe can also group genes into specific categories. To do so, we need to have a named list where name of the list element corresponds to the gene name, and the list element corresponds to the gene group. We alreade have the genes variable, so we can convert it to the appropriate format:\n\ngene_groups <- c(\n rep(\"FL\", length(fl_genes)),\n rep(\"DLBCL\", length(dlbcl_genes))\n)\nnames(gene_groups) <- genes\n\ngene_groups\n\n RRAGC CREBBP VMA21 ATP6V1B2 EZH2 KMT2D MEF2B CD79B \n \"FL\" \"FL\" \"FL\" \"FL\" \"FL\" \"FL\" \"DLBCL\" \"DLBCL\" \n MYD88 TP53 \n \"DLBCL\" \"DLBCL\" \n\n\nNow we can use it to split the genes on the oncoplot into the groups:\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns,\n genes = genes,\n splitGeneGroups = gene_groups\n)\n\n\n\n\n\n\n\n\n\n\nDid you know?\n\n\n\nYou can provide more than two groups of genes - any number of groups is supported as long as they are specified in the gene_groups.\n\n\n\n\n\n\n\n\nNote\n\n\n\nWithin each group, the genes are ordered in decreasing order of their recurrence, but the keepGeneOrder parameter is still supported and if specified, will keep the specified order within each group." }, { "objectID": "tutorials/oncoplot.html#grouping-samples-into-categories", "href": "tutorials/oncoplot.html#grouping-samples-into-categories", "title": "Tutorial: The prettiest oncoplot", "section": "Grouping samples into categories", - "text": "Grouping samples into categories\nSimilar to the grouping of genes, we can also group samples into certain categories. 
Typically, it is done based on one of the annotations tracks. By default, there will be no labels for each sample category, but we also have an option of specifying these labels:\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns,\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\")\n)\n\n[1] \"numcases: 441\"\n[1] \"numgenes: 10\"" + "text": "Grouping samples into categories\nSimilar to the grouping of genes, we can also group samples into certain categories. Typically, it is done based on one of the annotations tracks. By default, there will be no labels for each sample category, but we also have an option of specifying these labels:\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns,\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\")\n)" }, { "objectID": "tutorials/oncoplot.html#tallying-mutation-burden", "href": "tutorials/oncoplot.html#tallying-mutation-burden", "title": "Tutorial: The prettiest oncoplot", "section": "Tallying mutation burden", - "text": "Tallying mutation burden\nPreviously, we noted that the maf data we were supplying to the prettyOncoplot was not subset to contain only coding mutations, and also discouraged from pre-filtering maf to a subset of genes if we are insterested only looking at some of them. 
Here is why this is important: if we want to layer on additional information like total mutation burden per sample, any subsetting or filtering of the maf would generate inaccurate and misleading results. Therefore, prettyOncoplot handles all of this for you! So if we were to go ahead with tallying the total mutation burden, we could just add some additional parameters to the function call:\n\nhideTopBarplot <- FALSE # will display TMB annotations at the top\ntally_all_mutations <- TRUE # will tally all mutations per sample\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns,\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations\n)\n\n[1] \"numcases: 441\"\n[1] \"numgenes: 10\"\n\n\n\n\n\n\n\n\n\n\n\nDid you know?\n\n\n\nIf the dynamic range of total mutation burden is too big and there are some extreme outliers, the bar chart at the top of the oncoplot can be capped of at any numeric value by providing tally_all_mutations_max parameter.\n\n\nWhat if we want to additionally force the ordering based on the total number of mutations, so they are nicely arranged in the decreasing order? 
We can do so by adding the mutation counts as one of the annotation tracks and using it to sort the samples:\n\n# Count all muts to define the order of samples\ntotal_mut_burden <- maf %>%\n count(Tumor_Sample_Barcode)\n\nhead(total_mut_burden)\n\n Tumor_Sample_Barcode n\n1: 01-20260T 71\n2: 02-13135T 98\n3: 02-20170T 67\n4: 02-22991T 53\n5: 03-34157T 26\n6: 04-24937T 146\n\n# Add this info to metadata\nmetadata <- left_join(\n metadata,\n total_mut_burden\n) \n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE\n)\n\n[1] \"numcases: 441\"\n[1] \"numgenes: 10\"\n\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nWe have modified here the sortByColumns parameter, and provided two additional parameters numericMetadataColumns and arrange_descending.\n\n\n\n\n\n\n\n\nDid you know?\n\n\n\nThe top annotation and n annotation at the bottom are the same thing? Remove n from the legend by adding hide_annotations = \"n\" and remove display of annotation track while keeping the ordering by adding hide_annotations_tracks = TRUE." + "text": "Tallying mutation burden\nPreviously, we noted that the maf data we were supplying to the prettyOncoplot was not subset to contain only coding mutations, and also discouraged from pre-filtering maf to a subset of genes if we are insterested only looking at some of them. 
Here is why this is important: if we want to layer on additional information like total mutation burden per sample, any subsetting or filtering of the maf would generate inaccurate and misleading results. Therefore, prettyOncoplot handles all of this for you! So if we were to go ahead with tallying the total mutation burden, we could just add some additional parameters to the function call:\n\nhideTopBarplot <- FALSE # will display TMB annotations at the top\ntally_all_mutations <- TRUE # will tally all mutations per sample\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = metadataColumns,\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations\n)\n\n\n\n\n\n\n\n\n\n\nDid you know?\n\n\n\nIf the dynamic range of total mutation burden is too big and there are some extreme outliers, the bar chart at the top of the oncoplot can be capped of at any numeric value by providing tally_all_mutations_max parameter.\n\n\nWhat if we want to additionally force the ordering based on the total number of mutations, so they are nicely arranged in the decreasing order? 
We can do so by adding the mutation counts as one of the annotation tracks and using it to sort the samples:\n\n# Count all muts to define the order of samples\ntotal_mut_burden <- maf %>%\n count(Tumor_Sample_Barcode)\n\nhead(total_mut_burden)\n\n Tumor_Sample_Barcode n\n1: 01-20260T 71\n2: 02-13135T 98\n3: 02-20170T 67\n4: 02-22991T 53\n5: 03-34157T 26\n6: 04-24937T 146\n\n# Add this info to metadata\nmetadata <- left_join(\n metadata,\n total_mut_burden\n) \n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE\n)\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nWe have modified here the sortByColumns parameter, and provided two additional parameters numericMetadataColumns and arrange_descending.\n\n\n\n\n\n\n\n\nDid you know?\n\n\n\nThe top annotation and n annotation at the bottom are the same thing? Remove n from the legend by adding hide_annotations = \"n\" and remove display of annotation track while keeping the ordering by adding hide_annotations_tracks = TRUE." 
}, { "objectID": "tutorials/oncoplot.html#annotating-significance-of-mutation-frequencies-in-sample-groups", "href": "tutorials/oncoplot.html#annotating-significance-of-mutation-frequencies-in-sample-groups", "title": "Tutorial: The prettiest oncoplot", "section": "Annotating significance of mutation frequencies in sample groups", - "text": "Annotating significance of mutation frequencies in sample groups\nWhen looking at our sample plots, we can notice that the frequency of mutations in RRAGC, ATP6V1B2, VMA21 and others is different between FL and DLBCL. But is this difference significant? Can we layer on this diffenerence to the display panel? Yes we can, and this is very easy with GAMBLR family! To do so we will first use another function from GAMBLR.viz to run Fisher’s test and find which genes are significantly different between the FL and DLBCL:\n\nfisher_test <- prettyForestPlot(\n maf = maf,\n metadata = metadata,\n genes = genes,\n comparison_column = \"pathology\",\n comparison_values = c(\"DLBCL\", \"FL\"), # we have three pathologies in data\n comparison_name = \"FL vs DLBCL\"\n)\nfisher_test$arranged\n\n\n\n\nIn fact, there are genes that are mutated at significantly different frequencies! 
Now let’s layer on this information to our oncoplot:\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE,\n hide_annotations = \"n\",\n hide_annotations_tracks = TRUE,\n annotate_specific_genes = TRUE,\n this_forest_object = fisher_test\n)\n\n[1] \"numcases: 441\"\n[1] \"numgenes: 10\"" + "text": "Annotating significance of mutation frequencies in sample groups\nWhen looking at our sample plots, we can notice that the frequency of mutations in RRAGC, ATP6V1B2, VMA21 and others is different between FL and DLBCL. But is this difference significant? Can we layer on this diffenerence to the display panel? Yes we can, and this is very easy with GAMBLR family! To do so we will first use another function from GAMBLR.viz to run Fisher’s test and find which genes are significantly different between the FL and DLBCL:\n\nfisher_test <- prettyForestPlot(\n maf = maf,\n metadata = metadata,\n genes = genes,\n comparison_column = \"pathology\",\n comparison_values = c(\"DLBCL\", \"FL\"), # we have three pathologies in data\n comparison_name = \"FL vs DLBCL\"\n)\nfisher_test$arranged\n\n\n\n\nIn fact, there are genes that are mutated at significantly different frequencies! 
Now let’s layer on this information to our oncoplot:\n\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE,\n hide_annotations = \"n\",\n hide_annotations_tracks = TRUE,\n annotate_specific_genes = TRUE,\n this_forest_object = fisher_test\n)" }, { "objectID": "tutorials/oncoplot.html#annotating-genes-with-hotspots", "href": "tutorials/oncoplot.html#annotating-genes-with-hotspots", "title": "Tutorial: The prettiest oncoplot", "section": "Annotating genes with hotspots", - "text": "Annotating genes with hotspots\nSome genes are mutated at certain positions more often that at others, therefore creating the mutational hotspots - and it we can layer on this level of information to our oncoplot. First, we will need to process our maf data to add a new column called hot_spot which will contain a boolean value showing whether or not particular mutation is a hotspot. 
If you don’t know how to do it, there is a function for exactly this purpose in the GAMBLR.data, and we will use it in this example:\n\n# Annotate hotspots\nmaf <- annotate_hotspots(maf)\n\n# What are the hotspots?\nmaf %>%\n filter(hot_spot) %>%\n select(Hugo_Symbol, hot_spot) %>%\n table()\n\n hot_spot\nHugo_Symbol TRUE\n CREBBP 76\n EZH2 86\n FOXO1 20\n MEF2B 23\n MYD88 46\n STAT6 51\n\n\n\n\n\n\n\n\nNote\n\n\n\nThe GAMBLR.data version of the annotate_hotspots only handles very specific genes and does not have functionality to annotate all hotspots.\n\n\nNow, we can add annotation of the hotspots to the oncoplot display by toggling the highlightHotspots parameter:\n\nhighlightHotspots <- TRUE\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE,\n hide_annotations = \"n\",\n hide_annotations_tracks = TRUE,\n annotate_specific_genes = TRUE,\n this_forest_object = fisher_test,\n highlightHotspots = highlightHotspots\n)\n\n[1] \"numcases: 441\"\n[1] \"numgenes: 10\"" + "text": "Annotating genes with hotspots\nSome genes are mutated at certain positions more often that at others, therefore creating the mutational hotspots - and it we can layer on this level of information to our oncoplot. First, we will need to process our maf data to add a new column called hot_spot which will contain a boolean value showing whether or not particular mutation is a hotspot. 
If you don’t know how to do it, there is a function for exactly this purpose in the GAMBLR.data, and we will use it in this example:\n\n# Annotate hotspots\nmaf <- annotate_hotspots(maf)\n\n# What are the hotspots?\nmaf %>%\n filter(hot_spot) %>%\n select(Hugo_Symbol, hot_spot) %>%\n table()\n\n hot_spot\nHugo_Symbol TRUE\n CREBBP 76\n EZH2 86\n FOXO1 20\n MEF2B 23\n MYD88 46\n STAT6 51\n\n\n\n\n\n\n\n\nNote\n\n\n\nThe GAMBLR.data version of the annotate_hotspots only handles very specific genes and does not have functionality to annotate all hotspots.\n\n\nNow, we can add annotation of the hotspots to the oncoplot display by toggling the highlightHotspots parameter:\n\nhighlightHotspots <- TRUE\nprettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE,\n hide_annotations = \"n\",\n hide_annotations_tracks = TRUE,\n annotate_specific_genes = TRUE,\n this_forest_object = fisher_test,\n highlightHotspots = highlightHotspots\n)" }, { "objectID": "tutorials/oncoplot.html#co-oncoplot-two-plots-side-by-side", "href": "tutorials/oncoplot.html#co-oncoplot-two-plots-side-by-side", "title": "Tutorial: The prettiest oncoplot", "section": "Co-oncoplot: two plots side-by-side", - "text": "Co-oncoplot: two plots side-by-side\nIt may also be informative to generate a display panel where there are two oncoplots displayed side-by-side, so it is possible to visually compare the specific groups of samples while maintaining all annotations and ordering we 
built so far. For this purpose, the GAMBLR.viz has another function in the pretty family: prettyCoOncoplot. It accepts all of the same parameters as prettyOncoplot with addition of some unique additions. For example, lets break down our sample oncoplot we created so far by the genetic_subgroup and see how cFL compares to dFL:\n\nprettyCoOncoplot(\n metadata = metadata,\n maf = maf,\n comparison_column = \"genetic_subgroup\",\n label1 = \"cFL\",\n label2 = \"dFL\",\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n keepGeneOrder = TRUE,\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE,\n hide_annotations = \"n\",\n hide_annotations_tracks = TRUE,\n annotate_specific_genes = TRUE,\n this_forest_object = fisher_test,\n highlightHotspots = highlightHotspots,\n legend_row = 2,\n annotation_row = 2\n)\n\n[1] \"numcases: 110\"\n[1] \"numgenes: 10\"\n\n\n[1] \"numcases: 331\"\n[1] \"numgenes: 10\"\n\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nIt is only possible to display two groups side-by-side. 
If the metadata column you want to split on contains more groups, the specific values can be specified with comparison_values parameter.\n\n\n\n\n\n\n\n\nDid you know?\n\n\n\nNotice that we did not need to create individual maf or metadata objects to supply to prettyCoOncoplot - the same objects we used before are also supported here, but specified with differen parameters metadata and maf.\n\n\nIn the above example, we forced the order of genes to be exaclty as we specified so that the same gene is is displayed on the same row for both oncoplots, othervise they wold not be on the same row due to the different frequencies in each group. In addition to specifying this parameter, we have also enforced specific number of rows in the legend below the plot, so they nicely align between the display items." + "text": "Co-oncoplot: two plots side-by-side\nIt may also be informative to generate a display panel where there are two oncoplots displayed side-by-side, so it is possible to visually compare the specific groups of samples while maintaining all annotations and ordering we built so far. For this purpose, the GAMBLR.viz has another function in the pretty family: prettyCoOncoplot. It accepts all of the same parameters as prettyOncoplot with addition of some unique additions. 
For example, lets break down our sample oncoplot we created so far by the genetic_subgroup and see how cFL compares to dFL:\n\nprettyCoOncoplot(\n metadata = metadata,\n maf = maf,\n comparison_column = \"genetic_subgroup\",\n label1 = \"cFL\",\n label2 = \"dFL\",\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n keepGeneOrder = TRUE,\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE,\n hide_annotations = \"n\",\n hide_annotations_tracks = TRUE,\n annotate_specific_genes = TRUE,\n this_forest_object = fisher_test,\n highlightHotspots = highlightHotspots,\n legend_row = 2,\n annotation_row = 2\n)\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nIt is only possible to display two groups side-by-side. If the metadata column you want to split on contains more groups, the specific values can be specified with comparison_values parameter.\n\n\n\n\n\n\n\n\nDid you know?\n\n\n\nNotice that we did not need to create individual maf or metadata objects to supply to prettyCoOncoplot - the same objects we used before are also supported here, but specified with differen parameters metadata and maf.\n\n\nIn the above example, we forced the order of genes to be exaclty as we specified so that the same gene is is displayed on the same row for both oncoplots, othervise they wold not be on the same row due to the different frequencies in each group. In addition to specifying this parameter, we have also enforced specific number of rows in the legend below the plot, so they nicely align between the display items." 
}, { "objectID": "tutorials/oncoplot.html#using-oncoplot-in-multi-panel-figure", "href": "tutorials/oncoplot.html#using-oncoplot-in-multi-panel-figure", "title": "Tutorial: The prettiest oncoplot", "section": "Using oncoplot in multi-panel figure", - "text": "Using oncoplot in multi-panel figure\nWhen arranging items for the multi-panel figure when preparing manuscript or experiment report, it may be needed to use the generated oncoplot on the same page as other display items. The prettyOncoplot (and, therefore, prettyCoOncoplot), handles the ComplexHeatmap under the hood to generate graphics, and it is not readily available to be combined with the plots generated with other tools, for example ggplot2. Not readily available - but definitely not impossible! The output of prettyCoOncoplot is directly compatible with the arrangement on multi-panel figure since it uses the trick shown below under the hood to put two panels side-by-side, but the otuput of prettyOncoplot is a ComplexHeatmap object so needs some extra steps to allow multi-panel arrangement. 
First, lets store the returned oncoplot in a variable:\n\nmy_oncoplot <- prettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE,\n hide_annotations = \"n\",\n hide_annotations_tracks = TRUE,\n annotate_specific_genes = TRUE,\n this_forest_object = fisher_test,\n highlightHotspots = highlightHotspots\n)\n\nNext, we will import some of the packages needed to handle the trick:\n\nlibrary(ComplexHeatmap) # to handle the ComplexHeatmap object\nlibrary(ggpubr) # to arrange multiple panels\n\nAfter that, we will capture the display of the oncoplot:\n\nmy_oncoplot = grid.grabExpr(\n draw(my_oncoplot),\n width = 10,\n height = 17\n)\n\nNow, it is ready for us to arrange in multi-panel figure. 
We can use the forest plot we already looked at as an example, and put it to the right of the oncoplot:\n\nmultipanel_figure <- ggarrange(\n my_oncoplot, # left panel\n fisher_test$arranged, # right panel\n widths = c(1.5, 1), # so the oncoplot is a little wider than the forest\n labels = c(\"A\", \"B\"), # labels for the panels\n font.label = list( # make labels bold face\n color = \"black\",\n face = \"bold\"\n )\n)\n\nmultipanel_figure\n\n\n\n\nFinal note: it would be nice to have the genes in the forest plot directly aligned with the genes as they are displayed on the oncoplot, and we can do this by providing consistent ordering and adding some white space below forest plot to match the height of the oncoplot:\n\nmy_oncoplot <- prettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = rev(fisher_test$fisher$gene),\n keepGeneOrder = TRUE,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE,\n hide_annotations = \"n\",\n hide_annotations_tracks = TRUE,\n annotate_specific_genes = TRUE,\n this_forest_object = fisher_test,\n highlightHotspots = highlightHotspots\n)\n\n[1] \"numcases: 441\"\n[1] \"numgenes: 10\"\n\nmy_oncoplot = grid.grabExpr(\n draw(my_oncoplot),\n width = 10,\n height = 17\n)\n\nmultipanel_figure <- ggarrange(\n my_oncoplot, # left panel\n ggarrange( # right panel\n NULL, # empty space at the top\n fisher_test$arranged, # forest on the top\n NULL, # empty space at the bottom\n nrow = 3, # arrange vertically\n heights = c(0.1, 2.5, 1) # match height of the oncoplot\n ),\n widths 
= c(1.5, 1), # so the oncoplot is a little wider than the forest\n labels = c(\"A\", \"B\"), # labels for the panels\n font.label = list( # make labels bold face\n color = \"black\",\n face = \"bold\"\n )\n)\n\nmultipanel_figure" + "text": "Using oncoplot in multi-panel figure\nWhen arranging items for the multi-panel figure when preparing manuscript or experiment report, it may be needed to use the generated oncoplot on the same page as other display items. The prettyOncoplot (and, therefore, prettyCoOncoplot), handles the ComplexHeatmap under the hood to generate graphics, and it is not readily available to be combined with the plots generated with other tools, for example ggplot2. Not readily available - but definitely not impossible! The output of prettyCoOncoplot is directly compatible with the arrangement on multi-panel figure since it uses the trick shown below under the hood to put two panels side-by-side, but the otuput of prettyOncoplot is a ComplexHeatmap object so needs some extra steps to allow multi-panel arrangement. 
    First, let's store the returned oncoplot in a variable:\n\nmy_oncoplot <- prettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = genes,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE,\n hide_annotations = \"n\",\n hide_annotations_tracks = TRUE,\n annotate_specific_genes = TRUE,\n this_forest_object = fisher_test,\n highlightHotspots = highlightHotspots\n)\n\nNext, we will import some of the packages needed to handle the trick:\n\nlibrary(ComplexHeatmap) # to handle the ComplexHeatmap object\nlibrary(ggpubr) # to arrange multiple panels\n\nAfter that, we will capture the display of the oncoplot:\n\nmy_oncoplot = grid.grabExpr(\n draw(my_oncoplot),\n width = 10,\n height = 17\n)\n\nNow, it is ready for us to arrange in a multi-panel figure. 
    
    We can use the forest plot we already looked at as an example, and put it to the right of the oncoplot:\n\nmultipanel_figure <- ggarrange(\n my_oncoplot, # left panel\n fisher_test$arranged, # right panel\n widths = c(1.5, 1), # so the oncoplot is a little wider than the forest\n labels = c(\"A\", \"B\"), # labels for the panels\n font.label = list( # make labels bold face\n color = \"black\",\n face = \"bold\"\n )\n)\n\nmultipanel_figure\n\n\n\n\nFinal note: it would be nice to have the genes in the forest plot directly aligned with the genes as they are displayed on the oncoplot, and we can do this by providing consistent ordering and adding some white space below the forest plot to match the height of the oncoplot:\n\nmy_oncoplot <- prettyOncoplot(\n these_samples_metadata = metadata,\n maf_df = maf,\n metadataColumns = metadataColumns,\n metadataBarHeight = metadataBarHeight,\n metadataBarFontsize = metadataBarFontsize,\n fontSizeGene = fontSizeGene,\n legendFontSize = legendFontSize,\n sortByColumns = c(\"n\", metadataColumns),\n genes = rev(fisher_test$fisher$gene),\n keepGeneOrder = TRUE,\n splitGeneGroups = gene_groups,\n splitColumnName = \"pathology\",\n groupNames = c(\"Follicular lymphoma\", \"DLBCL\", \"COMFL\"),\n hideTopBarplot = hideTopBarplot,\n tally_all_mutations = tally_all_mutations,\n numericMetadataColumns = \"n\",\n arrange_descending = TRUE,\n hide_annotations = \"n\",\n hide_annotations_tracks = TRUE,\n annotate_specific_genes = TRUE,\n this_forest_object = fisher_test,\n highlightHotspots = highlightHotspots\n)\n\nmy_oncoplot = grid.grabExpr(\n draw(my_oncoplot),\n width = 10,\n height = 17\n)\n\nmultipanel_figure <- ggarrange(\n my_oncoplot, # left panel\n ggarrange( # right panel\n NULL, # empty space at the top\n fisher_test$arranged, # forest on the top\n NULL, # empty space at the bottom\n nrow = 3, # arrange vertically\n heights = c(0.1, 2.5, 1) # match height of the oncoplot\n ),\n 
    widths = c(1.5, 1), # so the oncoplot is a little 
    
wider than the forest\n labels = c(\"A\", \"B\"), # labels for the panels\n font.label = list( # make labels bold face\n color = \"black\",\n face = \"bold\"\n )\n)\n\nmultipanel_figure" }, { "objectID": "concepts/GAMBLR_family.html", diff --git a/docs/tutorials/forestplot.html b/docs/tutorials/forestplot.html new file mode 100644 index 0000000..2c6249a --- /dev/null +++ b/docs/tutorials/forestplot.html @@ -0,0 +1,807 @@ + + + + + + + + + +GAMBLR.viz - Tutorial: The prettiest forestplot + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + +
    + +
    + + +
    + + + +
    + +
    +
    +

    Tutorial: The prettiest forestplot

    +
    + + + +
    + + + + +
    + + +
    + +

        An integral part of this package is the analysis and display of differences in mutation frequency between two groups within a given cohort. The function that does this is called prettyForestPlot: it is easy to use, supports flexible comparisons, and generates easy-to-follow display items, which is why it belongs to the pretty family of GAMBLR.viz functions. No specific formatting or data preparation is needed for the analysis and visualization. The only required inputs are the mutation data (either in maf format or as a binary feature matrix), the metadata (containing sample identifiers in the sample_id column and an annotation of the groups to be compared), and a character string with the name of the metadata column where these sample annotations are specified. This tutorial demonstrates example inputs and showcases the main features of this function.
    
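    To make the two accepted forms of mutation data concrete, here is a minimal base-R sketch built on a tiny made-up data set (the sample IDs, gene names and the `group` column are hypothetical). It creates a maf-like data frame using the standard `Tumor_Sample_Barcode` and `Hugo_Symbol` columns and collapses it into a binary sample-by-gene matrix, where 1 means the sample carries at least one mutation in that gene. It only illustrates the shape of the data; it is not how GAMBLR.viz builds its matrix internally.

    ```r
    # Hypothetical toy metadata: sample identifiers plus a column defining the groups
    toy_metadata <- data.frame(
      sample_id = c("S1", "S2", "S3", "S4"),
      group = c("A", "A", "B", "B")
    )

    # Hypothetical toy maf: one row per mutation, using standard maf column names
    toy_maf <- data.frame(
      Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3", "S3"),
      Hugo_Symbol = c("MYC", "ID3", "MYC", "TP53", "TP53", "ID3")
    )

    # Count mutations per sample and gene; using the metadata sample IDs as factor
    # levels keeps samples without any mutation (S4) as all-zero rows
    counts <- table(
      factor(toy_maf$Tumor_Sample_Barcode, levels = toy_metadata$sample_id),
      toy_maf$Hugo_Symbol
    )

    # Collapse counts to presence/absence (1 = at least one mutation in the gene)
    mutmat <- (unclass(counts) > 0) * 1L
    mutmat
    ```

    Keeping every metadata sample as a row matters for a frequency comparison: a sample with no mutations in the genes of interest still counts towards its group's denominator.
    
    
    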

    +
    +

    Prepare setup

    +

    We will first import the necessary packages:

    +
    +
    # Load packages
    +library(GAMBLR.data)
    +library(GAMBLR.viz)
    +library(dplyr)
    +
    +

        Next, we will get some data to display. The metadata is expected to be a data frame with one required column, sample_id, plus another column containing the sample annotations that define the comparison groups. In this example, we will use the data set and variant calls from the study that identified genetic subgroups of Burkitt lymphoma (BL).
    

    +
    +
    metadata <- get_gambl_metadata() %>%
    +    filter(cohort == "BL_Thomas")
    +
    +
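    Because prettyForestPlot compares two groups, it can help to check the metadata before plotting. The helper below, `check_comparison_metadata`, is hypothetical (it is not part of GAMBLR.viz): it is a minimal sketch that confirms the `sample_id` column and the chosen comparison column are present and reports how many annotated samples fall into each group. The toy values are made up for illustration and are not the actual values of `EBV_status_inf` in the BL data.

    ```r
    # Hypothetical helper (not part of GAMBLR.viz): check that a metadata data frame
    # has the columns a two-group comparison needs, and return the group sizes
    check_comparison_metadata <- function(metadata, comparison_column) {
      required <- c("sample_id", comparison_column)
      missing_cols <- setdiff(required, colnames(metadata))
      if (length(missing_cols) > 0) {
        stop("Missing metadata column(s): ", paste(missing_cols, collapse = ", "))
      }
      # Count samples per group, ignoring samples without an annotation (NA)
      group_sizes <- table(metadata[[comparison_column]], useNA = "no")
      if (length(group_sizes) != 2) {
        warning("Expected 2 groups in '", comparison_column, "', found ",
                length(group_sizes))
      }
      group_sizes
    }

    # Usage with a toy data frame (values are made up for illustration)
    toy_metadata <- data.frame(
      sample_id = c("S1", "S2", "S3", "S4", "S5"),
      EBV_status_inf = c("EBV-positive", "EBV-negative", "EBV-positive",
                         "EBV-negative", NA)
    )
    check_comparison_metadata(toy_metadata, "EBV_status_inf")
    ```

    With the real data, the same call would be made on the `metadata` object fetched above.
    
    
    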

        Next, we will obtain the coding mutations to be used for plotting. The data is a data frame in standardized maf format.
    

    +
    +
    maf <- get_ssm_by_samples(
    +    these_samples_metadata = metadata,
    +    tool_name = "publication",
    +    projection = "hg38"
    +)
    +
        +# What does it look like?
    
    +dim(maf)
    +
    +
    [1] 47043    45
    +
    +
    head(maf) %>%
    +    select(
    +        Tumor_Sample_Barcode,
    +        Hugo_Symbol,
    +        Variant_Classification
    +    )
    +
    +
       Tumor_Sample_Barcode Hugo_Symbol Variant_Classification
    +1:                Akata        CPTP      Missense_Mutation
    +2:                Akata      FNDC10      Missense_Mutation
    +3:                Akata       MORN1      Missense_Mutation
    +4:                Akata       MEGF6      Missense_Mutation
    +5:                Akata       NPHP4                 Silent
    +6:                Akata      GPR157      Missense_Mutation
    +
    +
    +

    For the purpose of this tutorial, we will focus on a small subset of genes known to be significantly mutated in BL.

    +
    +
    genes <- lymphoma_genes_bl_v_latest$Gene
    +head(genes)
    +
    +
    [1] "ALPK2"   "ARHGEF1" "ARID1A"  "B2M"     "BACH2"   "BCL10"  
    +
    +
    +

    Now we have our metadata and mutations we want to explore, so we are ready to start visualizing the data.

    +
    +
    +

    The default forest plot

    +

        The forest plot can be generated with the default parameters as soon as the metadata and a data frame of mutations in standard maf format are provided. Here is an example of the output with all default parameters:
    

    +
    +
    comparison_column <- "EBV_status_inf" # character of column name for comparison
    +fp <- prettyForestPlot(
    +    metadata = metadata,
    +    maf = maf,
    +    genes = genes,
    +    comparison_column = comparison_column
    +)
    +
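    For intuition about what the forest plot summarizes, here is a minimal base-R sketch of a per-gene two-group comparison. For each gene, a 2x2 table of mutated vs. unmutated samples in each group is tested with `fisher.test()`, which returns an odds ratio, its confidence interval and a p-value. The function name `per_gene_fisher`, the gene names and the data are hypothetical, and GAMBLR.viz may compute, orient or adjust its statistics differently, so treat this as an illustration rather than the package's implementation.

    ```r
    # Hypothetical sketch: per-gene Fisher's exact test on a binary mutation matrix
    # (rows = samples, columns = genes, 1 = mutated) between two sample groups
    per_gene_fisher <- function(mutmat, groups) {
      groups <- factor(groups)
      stopifnot(nlevels(groups) == 2, nrow(mutmat) == length(groups))
      results <- lapply(colnames(mutmat), function(gene) {
        mutated <- factor(mutmat[, gene] == 1, levels = c(TRUE, FALSE))
        # 2x2 table: mutated/unmutated (rows) by group (columns); the odds ratio is
        # the odds of mutation in the first factor level relative to the second
        tab <- table(mutated, groups)
        ft <- fisher.test(tab)
        data.frame(
          gene = gene,
          estimate = unname(ft$estimate),  # conditional MLE of the odds ratio
          conf.low = ft$conf.int[1],
          conf.high = ft$conf.int[2],
          p.value = ft$p.value
        )
      })
      do.call(rbind, results)
    }

    # Toy data (made up): 10 samples per group, two hypothetical genes
    groups <- rep(c("EBV-positive", "EBV-negative"), each = 10)
    mutmat <- cbind(
      GENE_A = c(rep(1, 8), rep(0, 2), rep(1, 1), rep(0, 9)),  # differs by group
      GENE_B = c(rep(1, 3), rep(0, 7), rep(1, 3), rep(0, 7))   # same in both groups
    )
    res <- per_gene_fisher(mutmat, groups)
    res
    ```

    Because factor levels are sorted alphabetically, "EBV-negative" is the first level here, so the odds ratio for GENE_A falls below 1 (it is mutated less often in that group). This is why the direction of an odds ratio should always be read together with the order of the compared groups.
    
    
    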
    + + +
    + + Back to top
    + +
    + + + + + \ No newline at end of file diff --git a/docs/tutorials/forestplot.qmd b/docs/tutorials/forestplot.qmd index 3f92a83..8f4c663 100644 --- a/docs/tutorials/forestplot.qmd +++ b/docs/tutorials/forestplot.qmd @@ -50,7 +50,8 @@ The data is a data frame in a standartized maf format. ```{r get_maf} maf <- get_ssm_by_samples( these_samples_metadata = metadata, - tool_name = "publication" + tool_name = "publication", + projection = "hg38" ) # How does it look like? @@ -64,631 +65,45 @@ head(maf) %>% ) ``` -::: {.callout-tip} -## Did you know? -You do not have to subset your maf data frame to coding mutations only before -using it with the `prettyOncoplot`. Much like other tools, it will be -automatically handled for you to only display coding mutations. -::: +For the purpose of this tutorial, we will focus on a small subset of genes known +to be significantly mutated in BL. + +```{r goi} +genes <- lymphoma_genes_bl_v_latest$Gene +head(genes) +``` Now we have our metadata and mutations we want to explore, so we are ready to start visualizing the data. -## The simplest oncoplot +## The default forest plot -There is a number of options how to customize your oncoplot, but it is ready for -you to use with just the metadata and maf. Here is an example of the output with -all default parameters: +The forest plot is ready to be called with the default parameters after just +providing the metadata and data frame with mutations in standard maf format. 
+Here is an example of the output with all default parameters: ```{r default} #| fig-width: 10 -minMutationPercent <- 10 # only show genes mutated in at least 10% of samples -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - minMutationPercent = minMutationPercent -) -``` - -## Adding annotation tracks -We can customize this and add some of the annotation tracks for more informative -display of the metadata we ate interested in: -```{r add_annotations} -#| fig-width: 10 -metadataColumns <- c( - "pathology", - "lymphgen", - "genetic_subgroup", - "COO_consensus", - "sex" -) -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - minMutationPercent = minMutationPercent, - metadataColumns = metadataColumns -) -``` - -## Changing font sizes - -You may notice that as more (or less) genes and annotations are displayed with -the oncoplot we may want to modify the size of the gene names and/or the -annotation tracks with their labels. There are several parameters available for -you to do so: -- `metadataBarHeight`: will change the height of the annotation tracks at the -bottom of the oncoplot -- `metadataBarFontsize`: will change the font size of the annotation tracks at -the bottom of the oncoplot -- `fontSizeGene`: will change the font size of both percentage labels to the -right of the oncoplot and gene names to the left of it -- `legendFontSize`: will change the font size of the legend at the bottom of -the plot -Let's see these parameters in action: -```{r adjust_fonts} -#| fig-height: 8 -metadataBarHeight <- 5 -metadataBarFontsize <- 10 -fontSizeGene <- 12 -legendFontSize <- 7 - -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - minMutationPercent = minMutationPercent, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize -) -``` - -## Show samples ordered on annotations - -We 
can notice that the default setting generates the classic "rainfall" style of -the plot - but what if we want to add some structure to it and sort sample order -in some way? It is easy to do so with the parameter `sortByColumns`. We can sort -on the same annotations as we use to display with the oncoplot: -```{r sort_samples} -#| fig-height: 8 - -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - minMutationPercent = minMutationPercent, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = metadataColumns -) -``` - - -::: {.callout-tip} -## Did you know? -The ordering occurs sequentially according to the order of individual columns we -have specified with the `sortByColumns` parameter. The ordering is in ascending -order, and can be toggled with additional boolean parameter `arrange_descending`. -::: - - -## Displaying only specific genes - -There can be scenarion where we might want to diplay genes not based on their -recurrence, but out of interest in specific genes. Sure so, one way to do it is -to pre-filter your maf data to the genes of interest. 
But this might have some -unexpected consequences and limit your flexibility in doing more things, so the -better way is to take advantage of the `genes` parameter: - -```{r goi} -fl_genes <- c("RRAGC", "CREBBP", "VMA21", "ATP6V1B2", "EZH2", "KMT2D") -dlbcl_genes <- c("MEF2B", "CD79B", "MYD88", "TP53") -genes <- c(fl_genes, dlbcl_genes) - -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = metadataColumns, - genes = genes -) -``` - -::: {.callout-note} -Note that we removed the `minMutationPercent` in the last function call since we -wanted to see the genes that we specifically requested. -::: - -Now we are only looking at some specific genes of interest but they are arranged -in the decreasing order of their recurrence in this cohort. What if we want to -enforce the gene order on the oncoplot to be exactly the same as we specified -it in our `gene` variable? We can take advantage of the `keepGeneOrder` -parameter: -```{r goi_order} -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = metadataColumns, - genes = genes, - keepGeneOrder = TRUE -) -``` - -## Grouping genes into categories - -We can also group genes into specific categories. To do so, we need to have a -named list where name of the list element corresponds to the gene name, and -the list element corresponds to the gene group. 
We alreade have the `genes` -variable, so we can convert it to the appropriate format: - -```{r goi_named_list} -gene_groups <- c( - rep("FL", length(fl_genes)), - rep("DLBCL", length(dlbcl_genes)) -) -names(gene_groups) <- genes - -gene_groups -``` - -Now we can use it to split the genes on the oncoplot into the groups: -```{r goi_groups} -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = metadataColumns, - genes = genes, - splitGeneGroups = gene_groups -) -``` - -::: {.callout-tip} -## Did you know? -You can provide more than two groups of genes - any number of groups is -supported as long as they are specified in the `gene_groups`. -::: - -::: {.callout-note} -Within each group, the genes are ordered in decreasing order of their recurrence, -but the `keepGeneOrder` parameter is still supported and if specified, will keep -the specified order within each group. -::: - -## Grouping samples into categories - -Similar to the grouping of genes, we can also group samples into certain -categories. Typically, it is done based on one of the annotations tracks. 
By -default, there will be no labels for each sample category, but we also have an -option of specifying these labels: - -```{r sample_groups} -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = metadataColumns, - genes = genes, - splitGeneGroups = gene_groups, - splitColumnName = "pathology", - groupNames = c("Follicular lymphoma", "DLBCL", "COMFL") -) -``` - -## Tallying mutation burden - -Previously, we noted that the maf data we were supplying to the prettyOncoplot -was not subset to contain only coding mutations, and also discouraged from -pre-filtering maf to a subset of genes if we are insterested only looking at -some of them. **Here is why this is important:** if we want to layer on -additional information like total mutation burden per sample, any subsetting or -filtering of the maf would generate inaccurate and misleading results. Therefore, -`prettyOncoplot` handles all of this for you! So if we were to go ahead with -tallying the total mutation burden, we could just add some additional parameters -to the function call: -```{r tmb} -hideTopBarplot <- FALSE # will display TMB annotations at the top -tally_all_mutations <- TRUE # will tally all mutations per sample - -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = metadataColumns, - genes = genes, - splitGeneGroups = gene_groups, - splitColumnName = "pathology", - groupNames = c("Follicular lymphoma", "DLBCL", "COMFL"), - hideTopBarplot = hideTopBarplot, - tally_all_mutations = tally_all_mutations -) -``` -::: {.callout-tip} -## Did you know? 
-If the dynamic range of total mutation burden is too big and there are some -extreme outliers, the bar chart at the top of the oncoplot can be capped of at -any numeric value by providing `tally_all_mutations_max` parameter. -::: - -What if we want to additionally force the ordering based on the total number of -mutations, so they are nicely arranged in the decreasing order? We can do so by -adding the mutation counts as one of the annotation tracks and using it to sort -the samples: -```{r tmb_order_by_meta} -#| fig-height: 8 - -# Count all muts to define the order of samples -total_mut_burden <- maf %>% - count(Tumor_Sample_Barcode) - -head(total_mut_burden) - -# Add this info to metadata -metadata <- left_join( - metadata, - total_mut_burden -) - -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = c("n", metadataColumns), - genes = genes, - splitGeneGroups = gene_groups, - splitColumnName = "pathology", - groupNames = c("Follicular lymphoma", "DLBCL", "COMFL"), - hideTopBarplot = hideTopBarplot, - tally_all_mutations = tally_all_mutations, - numericMetadataColumns = "n", - arrange_descending = TRUE -) -``` - -::: {.callout-note} -We have modified here the `sortByColumns` parameter, and provided two additional -parameters `numericMetadataColumns` and `arrange_descending`. -::: - -::: {.callout-tip} -## Did you know? -The top annotation and `n` annotation at the bottom are the same thing? Remove -`n` from the legend by adding `hide_annotations = "n"` and remove display of -annotation track while keeping the ordering by adding -`hide_annotations_tracks = TRUE`. 
-::: - -## Annotating significance of mutation frequencies in sample groups - -When looking at our sample plots, we can notice that the frequency of mutations -in *RRAGC*, *ATP6V1B2*, *VMA21* and others is different between FL and DLBCL. -But is this difference significant? Can we layer on this diffenerence to the -display panel? Yes we can, and this is very easy with GAMBLR family! To do so we -will first use another function from GAMBLR.viz to run Fisher's test and find -which genes are significantly different between the FL and DLBCL: - -```{r fisher} -fisher_test <- prettyForestPlot( - maf = maf, - metadata = metadata, - genes = genes, - comparison_column = "pathology", - comparison_values = c("DLBCL", "FL"), # we have three pathologies in data - comparison_name = "FL vs DLBCL" -) -fisher_test$arranged -``` - -In fact, there are genes that are mutated at significantly different -frequencies! Now let's layer on this information to our oncoplot: - -```{r oncoplot_fisher} -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = c("n", metadataColumns), - genes = genes, - splitGeneGroups = gene_groups, - splitColumnName = "pathology", - groupNames = c("Follicular lymphoma", "DLBCL", "COMFL"), - hideTopBarplot = hideTopBarplot, - tally_all_mutations = tally_all_mutations, - numericMetadataColumns = "n", - arrange_descending = TRUE, - hide_annotations = "n", - hide_annotations_tracks = TRUE, - annotate_specific_genes = TRUE, - this_forest_object = fisher_test -) -``` - -## Annotating genes with hotspots - -Some genes are mutated at certain positions more often that at others, therefore -creating the mutational hotspots - and it we can layer on this level of -information to our oncoplot. 
First, we will need to process our maf data to add -a new column called `hot_spot` which will contain a boolean value showing -whether or not particular mutation is a hotspot. If you don't know how to do it, -there is a function for exactly this purpose in the -[GAMBLR.data](https://github.com/morinlab/GAMBLR.data), and we will use it in -this example: -```{r annotate_maf} -# Annotate hotspots -maf <- annotate_hotspots(maf) - -# What are the hotspots? -maf %>% - filter(hot_spot) %>% - select(Hugo_Symbol, hot_spot) %>% - table() -``` -::: {.callout-note} -The GAMBLR.data version of the `annotate_hotspots` only handles very specific -genes and does not have functionality to annotate all hotspots. -::: -Now, we can add annotation of the hotspots to the oncoplot display by toggling -the `highlightHotspots` parameter: -```{r oncoplot_fisher_hotspot} -highlightHotspots <- TRUE -prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = c("n", metadataColumns), - genes = genes, - splitGeneGroups = gene_groups, - splitColumnName = "pathology", - groupNames = c("Follicular lymphoma", "DLBCL", "COMFL"), - hideTopBarplot = hideTopBarplot, - tally_all_mutations = tally_all_mutations, - numericMetadataColumns = "n", - arrange_descending = TRUE, - hide_annotations = "n", - hide_annotations_tracks = TRUE, - annotate_specific_genes = TRUE, - this_forest_object = fisher_test, - highlightHotspots = highlightHotspots -) -``` - -## Co-oncoplot: two plots side-by-side - -It may also be informative to generate a display panel where there are two -oncoplots displayed side-by-side, so it is possible to visually compare the -specific groups of samples while maintaining all annotations and ordering we -built so far. 
For this purpose, the GAMBLR.viz has another function in the -`pretty` family: `prettyCoOncoplot`. It accepts all of the same parameters as -`prettyOncoplot` with addition of some unique additions. For example, lets -break down our sample oncoplot we created so far by the `genetic_subgroup` and -see how cFL compares to dFL: - -```{r cooncoplot} -#| fig-keep: last -#| fig-height: 8 -#| fig-width: 15 - -prettyCoOncoplot( +#| fig-height: 15 +comparison_column <- "EBV_status_inf" # character of column name for comparison +fp <- prettyForestPlot( metadata = metadata, maf = maf, - comparison_column = "genetic_subgroup", - label1 = "cFL", - label2 = "dFL", - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = c("n", metadataColumns), genes = genes, - splitGeneGroups = gene_groups, - splitColumnName = "pathology", - keepGeneOrder = TRUE, - groupNames = c("Follicular lymphoma", "DLBCL", "COMFL"), - hideTopBarplot = hideTopBarplot, - tally_all_mutations = tally_all_mutations, - numericMetadataColumns = "n", - arrange_descending = TRUE, - hide_annotations = "n", - hide_annotations_tracks = TRUE, - annotate_specific_genes = TRUE, - this_forest_object = fisher_test, - highlightHotspots = highlightHotspots, - legend_row = 2, - annotation_row = 2 + comparison_column = comparison_column ) ``` -::: {.callout-note} -It is only possible to display two groups side-by-side. If the metadata column -you want to split on contains more groups, the specific values can be specified -with `comparison_values` parameter. -::: - -::: {.callout-tip} -## Did you know? -Notice that we did not need to create individual maf or metadata objects to -supply to `prettyCoOncoplot` - the same objects we used before are also -supported here, but specified with differen parameters `metadata` and `maf`. 
-::: - -In the above example, we forced the order of genes to be exaclty as we specified -so that the same gene is is displayed on the same row for both oncoplots, -othervise they wold not be on the same row due to the different frequencies in -each group. -In addition to specifying this parameter, we have also enforced specific number -of rows in the legend below the plot, so they nicely align between the display -items. - -## Using oncoplot in multi-panel figure - -When arranging items for the multi-panel figure when preparing manuscript or -experiment report, it may be needed to use the generated oncoplot on the same -page as other display items. The `prettyOncoplot` (and, therefore, -`prettyCoOncoplot`), handles the ComplexHeatmap under the hood to generate -graphics, and it is not readily available to be combined with the plots -generated with other tools, for example `ggplot2`. Not readily available - but -definitely not impossible! -The output of `prettyCoOncoplot` is directly compatible with the arrangement on -multi-panel figure since it uses the trick shown below under the hood to put -two panels side-by-side, but the otuput of `prettyOncoplot` is a ComplexHeatmap -object so needs some extra steps to allow multi-panel arrangement. 
-First, lets store the returned oncoplot in a variable: - -```{r store_oncoplot} -#| output: false -my_oncoplot <- prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = c("n", metadataColumns), - genes = genes, - splitGeneGroups = gene_groups, - splitColumnName = "pathology", - groupNames = c("Follicular lymphoma", "DLBCL", "COMFL"), - hideTopBarplot = hideTopBarplot, - tally_all_mutations = tally_all_mutations, - numericMetadataColumns = "n", - arrange_descending = TRUE, - hide_annotations = "n", - hide_annotations_tracks = TRUE, - annotate_specific_genes = TRUE, - this_forest_object = fisher_test, - highlightHotspots = highlightHotspots -) -``` - -Next, we will import some of the packages needed to handle the trick: -```{r load_extra} -library(ComplexHeatmap) # to handle the ComplexHeatmap object -library(ggpubr) # to arrange multiple panels -``` - -After that, we will capture the display of the oncoplot: -```{r capture_oncoplot} -my_oncoplot = grid.grabExpr( - draw(my_oncoplot), - width = 10, - height = 17 -) -``` - -Now, it is ready for us to arrange in multi-panel figure. 
We can use the forest -plot we already looked at as an example, and put it to the right of the -oncoplot: -```{r multi_panel} -#| fig-height: 8 -#| fig-width: 13 - -multipanel_figure <- ggarrange( - my_oncoplot, # left panel - fisher_test$arranged, # right panel - widths = c(1.5, 1), # so the oncoplot is a little wider than the forest - labels = c("A", "B"), # labels for the panels - font.label = list( # make labels bold face - color = "black", - face = "bold" - ) -) - -multipanel_figure -``` - -Final note: it would be nice to have the genes in the forest plot directly -aligned with the genes as they are displayed on the oncoplot, and we can do this -by providing consistent ordering and adding some white space below forest plot -to match the height of the oncoplot: -```{r final_plot} -#| fig-keep: last -#| fig-height: 8 -#| fig-width: 13 -my_oncoplot <- prettyOncoplot( - these_samples_metadata = metadata, - maf_df = maf, - metadataColumns = metadataColumns, - metadataBarHeight = metadataBarHeight, - metadataBarFontsize = metadataBarFontsize, - fontSizeGene = fontSizeGene, - legendFontSize = legendFontSize, - sortByColumns = c("n", metadataColumns), - genes = rev(fisher_test$fisher$gene), - keepGeneOrder = TRUE, - splitGeneGroups = gene_groups, - splitColumnName = "pathology", - groupNames = c("Follicular lymphoma", "DLBCL", "COMFL"), - hideTopBarplot = hideTopBarplot, - tally_all_mutations = tally_all_mutations, - numericMetadataColumns = "n", - arrange_descending = TRUE, - hide_annotations = "n", - hide_annotations_tracks = TRUE, - annotate_specific_genes = TRUE, - this_forest_object = fisher_test, - highlightHotspots = highlightHotspots -) - -my_oncoplot = grid.grabExpr( - draw(my_oncoplot), - width = 10, - height = 17 -) - -multipanel_figure <- ggarrange( - my_oncoplot, # left panel - ggarrange( # right panel - NULL, # empty space at the top - fisher_test$arranged, # forest on the top - NULL, # empty space at the bottom - nrow = 3, # arrange vertically - heights 
    = c(0.1, 2.5, 1) # match height of the oncoplot - ), - widths = c(1.5, 1), # so the oncoplot is a little wider than the forest - labels = c("A", "B"), # labels for the panels - font.label = list( # make labels bold face - color = "black", - face = "bold" - ) -) +The output of the function is a list containing the following objects: +- `fisher`: a data frame with detailed statistics from Fisher's test for each +gene +- `mutmat`: a binary matrix used for Fisher's test +- `forest`: a ggplot2 object with the forest plot of the odds ratios (ORs) from Fisher's +test for each gene +- `bar`: a ggplot2 object with mutation frequencies for each gene +- `arranged`: a display item where both the forest and bar plots are nicely +arranged side-by-side -multipanel_figure +```{r} +names(fp) ``` diff --git a/docs/tutorials/getting_started.html b/docs/tutorials/getting_started.html index f371e2e..f0a639b 100644 --- a/docs/tutorials/getting_started.html +++ b/docs/tutorials/getting_started.html @@ -137,6 +137,10 @@
    
  • Frequently Asked Qestions +
  • +
  • + + Tutorial: The prettiest forestplot
  • @@ -238,6 +242,12 @@ Frequently Asked Qestions +
  • +
  • Frequently Asked Qestions +
  • +
  • + + Tutorial: The prettiest forestplot
    