Time Course Analysis of Differential Gene Expression.

Working analysis for differentially expressed genes (DEGs) at individual timepoints and across time using three seperate analytical techniques (TiSA, Moanin, Temporal DESeq2).

Introduction

The analysis provided in this git repository supports the assignments of BIT101 & BIT107 of MSc Big Data Biology with the focus to:

Download RNAseq metadata from their GEO IDs,
Analyse DEGs,
Investigate the temporal nature of DEGs,
Investigate pathways effected.

Tip

The analysis provided here, and written in the BIT101 Scientific Communication Style Report uses data under the GEOID GSE217504.

Visualisation of the Workflow

RMarkdown Method

A comprehensive RMarkdown can be found in 0_RMarkdown_Summary/Assessment_Summary.Rmd. To recreate all elements of the BIT101 report submission knit this markdown. Note, it will take approximately 5 minutes to render.

Figure 5 & Supplementary Figure 7 were created using the TiSA method (mentioned below) and have been included in the RMarkdown as images. Manual editing of Figure 5 was needed to arrange similarly to Figure 6.

Overview of Scripts

All scripts can be found within the 1_analysis_scripts directory, with the exception of 1_analysis_scripts/TiSA_rmarkdown_method/5_TS_analysis.rmd1.

Script ID	Function	Language	Reference
0_matrix_download.sh	This is a bash script and needs to be run using the command line, this will download the matrix for any given GEOID provided.	Bash	N/A
1_DEG_Analysis.R	DEGs are identified across time and at each individual timepoint. This script produces figures used for differential analysis.	R	Batut et al., 2016 [1], Hiltemann et al., 2023 [3], Batut et al., 2018 [2],
2_Temporal_Analysis_ DESeq2.R	The first script of the time series analysis, conventional DESeq2 provides basic temporal information for the change of counts over time.	R	Love et al., 2019 [6],
3_Temporal_Analysis_ Moanin.R	This model takes genes of interest(*) and fits a spline curve to track their change over time.	R	Varoquaux & Purdom., 2020 [7],
4_TS_Analysis_Prep.R	This is a preperation script for the following time series analysis which requires the input data to be properly formatted; metadata needs to be comma seperated, and each sample to have its own count matrix which is tab seperated.	R	N/A
5_TS_analysis.rmd	This is a sample taken from the larger TiSA pipeline, this was chosen due to its automation. This runs a time series analysis for genes of interest(*).	Rmd	Lefol et al., 2023 [5],
6.1_Profile_Analysis.R	Profile analysis makes use of the KEGG, REACTOME and GO:BP pathway enrichment databases and used locally in R, this script is able to accept different settings to use a user specified database.	R	BioStatsSquid [4]
6.2_SUPPLEMENTARY _Profile_Analysis.R	Same functionality as `6.1_Profile_Analysis.R` but looks specifically at the individual timepoints (4, 12, 48hrs)	R	BioStatsSquid [4]

Tip

(*) - The genes of interest used in this analysis have been kept constant between Moanin and TiSA to compare both methods. These genes include: LOC112268133, CLIC4, NDNF, MGLL, CTH, ACE2, ACE, & TMPRSS2.

ACE2, ACE and TMPRSS2 were specifically selected due to their importance in viral entry and the remaining selected at random from a list of the top 10 p-adjusted DEGs.

Installation

This workflow is designed to be able to run as soon as it is downloaded. All of the scripts that need to be run can be found in 1_analysis_scripts and are numbered based on the order of analysis (also highlighted in the visualisation of the workflow above).

Tip

The script to download the metadata (0_matrix_download.sh) is a bash script and will need to be run on the command line.

Installation of this workflow is simple, clone this repository into RStudio; this workflow was prepared locally with the github extension in R and not on Posit.

To successfully do this, copy https://github.com/shelleydan/BIT107-A2.git and open a new project in RStudio. Select Version Control and Git, there paste the copied github repository URL. This will clone the repository.

Running the Analysis.

Run 0_matrix_download.sh, you will be prompted to input a GEOID, to follow the analysis in this repository use GSE217504.
Next, a targets.txt will be generated from the bash script; manually exported into RStudio, saved into 2_rawdata. Unfortunately, the count matrices all have unique HTMLs and must be downloaded manually.
The first script to use is 1_analysis_scripts/1_DEG_Analysis.R - this will clean the rawdata and apply the DESeq2 Analysis to individual timepoints.
The beginning of the temporal analysis starts with 1_analysis_scripts/2_Temporal_Analysis_DESeq2.R to apply some basic analysis.
The Moanin model (1_analysis_scripts/3_Temporal_Analysis_Moanin.R) is the secomnd temporal analysis, this will take the genes of intesest and fit a spline to their expression overtime.
The final temporal model is the Time Series Analysis, where we use an RMarkdown method (further instructions for this method can be found below - it's a bit different).
The next stage of analysis uses a sum of the DEGs at all timepoints and applies a pathway analysis.

Time Series Analysis

The RMarkdown to use for this analysis is found here: 1_analysis_scripts/TiSA_rmarkdown_method/5_TS_analysis.rmd - this pipeline was designed by Lefol et al. (2023)
From the initial DESeq2 analysis, we save the cleaned targets and count data to 2_rawdata/1_counts and 2_rawdata/2_targets.
Firstly, we need to prepare the data for analysis - this is done using 1_analysis_scripts/4_TS_Analysis_Prep.R. NB: The counts data needs to be tab seperated for the pipeline to function, do not alted the preperation script and all will be good.
All parameters have been set within 1_analysis_scripts/TiSA_rmarkdown_method/5_TS_analysis.rmd, including a list of genes used to compare the temporal methods. To run this analysis with the settings for this assingment simply knit the RMarkdown.

Note: Performing this analysis on the assignment dataset may take some time - the initial run took ~3hrs, the .html is provided as an example of what the markdown would produce.

Bug Reporting

Please understand that although this is for an assignment, this workflow may be continued to work on in the future - especially to reuse some of the code generated here. If anyone identifies any issues or errors with this analysis please submit an **issue.**

Known Issues

Some of the errors that can be produced are the following, but they do not impact the analysis:

Some errors producing the volcano plots with ggplot2 will be printed into the console - this is only due to the sheer volume of points to plot - the volcano plots will plot despite this.
The Moanin model will express that the fit of the spline is outside of the area - the area is maximised to balance resolution of the other genes, this is unfortunate but avoids sacrificing any other information.
If you attempt to run 6.2_SUPPLEMENTARY_Profile_Analysis.R be aware that at 4hrs for downregulated DEGs, and at 12hrs upregulated DEGs there are no significant pathways for REACTOME analysis - therefore this will flag an error and not work.

If using the RMarkdown method found in 0_RMarkdown_Method, none of this errors appear and the markdown is rendered.

Referances

1 Batut, B. et al. 2016. Reference-based RNA-Seq data analysis. Available at: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html [Accessed: 4 December 2024].
2 Batut, B. et al. 2018. Community-Driven Data Analysis Training for Biology. Cell Systems 6(6), pp. 752-758.e1. doi: 10.1016/j.cels.2018.05.012.
3 Hiltemann, S. et al. 2023. Galaxy Training: A powerful framework for teaching! PLOS Computational Biology 19(1), p. e1010752. doi: 10.1371/journal.pcbi.1010752.
4 Laura, L. BioStatSquid - Overview. Available at: https://github.com/biostatsquid [Accessed: 4 December 2024].
5 Lefol, Y. et al. 2023. TiSA: TimeSeriesAnalysis—a pipeline for the analysis of longitudinal transcriptomics data. NAR Genomics and Bioinformatics 5(1), p. lqad020. doi: 10.1093/nargab/lqad020.
6 Love, M.L., Anders, S., Kim, V. and Huber, W. 2019. RNA-seq workflow: gene-level exploratory analysis and differential expression. BioConductor. Available at: https://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html.
7 Varoquaux, N. and Purdom, E. 2020. A pipeline to analyse time-course gene expression data. Available at: https://f1000research.com/articles/9-1447 [Accessed: 4 December 2024].

RStudio SessionInfo()

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggupset_0.4.0               enrichplot_1.22.0           DOSE_3.28.2                 org.Hs.eg.db_3.18.0        
 [5] clusterProfiler_4.10.1      ggfortify_0.4.17            NMF_0.28                    rngtools_1.5.2             
 [9] registry_0.5-1              BiocWorkflowTools_1.28.0    biomaRt_2.58.2              moanin_1.10.0              
[13] topGO_2.54.0                SparseM_1.84-2              GO.db_3.18.0                AnnotationDbi_1.64.1       
[17] graph_1.80.0                timecoursedata_1.12.0       BiocManager_1.30.25         ggplotify_0.1.2            
[21] plotly_4.10.4               ggdendro_0.2.0              factoextra_1.0.7            cluster_2.1.4              
[25] ggpubr_0.6.0                ggrepel_0.9.6               RColorBrewer_1.1-3          dendextend_1.19.0          
[29] DESeq2_1.42.1               SummarizedExperiment_1.32.0 Biobase_2.62.0              MatrixGenerics_1.14.0      
[33] matrixStats_1.4.1           apeglm_1.24.0               Rsamtools_2.18.0            Biostrings_2.70.3          
[37] XVector_0.42.0              GenomicRanges_1.54.1        GenomeInfoDb_1.38.8         IRanges_2.36.0             
[41] S4Vectors_0.40.2            BiocGenerics_0.48.1         lubridate_1.9.3             forcats_1.0.0              
[45] stringr_1.5.1               purrr_1.0.2                 readr_2.1.5                 tidyr_1.3.1                
[49] tibble_3.2.1                ggplot2_3.5.1               tidyverse_2.0.0             dplyr_1.1.4                
[53] pheatmap_1.0.12            

loaded via a namespace (and not attached):
  [1] splines_4.3.2           bitops_1.0-9            filelock_1.0.3          polyclip_1.10-7        
  [5] XML_3.99-0.17           lifecycle_1.0.4         rstatix_0.7.2           edgeR_4.0.16           
  [9] doParallel_1.0.17       lattice_0.21-9          MASS_7.3-60             backports_1.5.0        
 [13] magrittr_2.0.3          limma_3.58.1            rmarkdown_2.29          yaml_2.3.10            
 [17] cowplot_1.1.3           DBI_1.2.3               abind_1.4-8             zlibbioc_1.48.2        
 [21] ggraph_2.2.1            RCurl_1.98-1.16         yulab.utils_0.1.8       tweenr_2.0.3           
 [25] rappdirs_0.3.3          git2r_0.35.0            GenomeInfoDbData_1.2.11 tidytree_0.4.6         
 [29] codetools_0.2-19        DelayedArray_0.28.0     ggforce_0.4.2           xml2_1.3.6             
 [33] tidyselect_1.2.1        aplot_0.2.3             farver_2.1.2            gmp_0.7-5              
 [37] viridis_0.6.5           BiocFileCache_2.10.2    jsonlite_1.8.9          tidygraph_1.3.1        
 [41] Formula_1.2-5           iterators_1.0.14        bbmle_1.0.25.1          foreach_1.5.2          
 [45] tools_4.3.2             progress_1.2.3          treeio_1.26.0           Rcpp_1.0.13-1          
 [49] glue_1.8.0              gridExtra_2.3           SparseArray_1.2.4       xfun_0.49              
 [53] qvalue_2.34.0           usethis_3.0.0           withr_3.0.2             numDeriv_2016.8-1.1    
 [57] fastmap_1.2.0           fansi_1.0.6             digest_0.6.37           timechange_0.3.0       
 [61] R6_2.5.1                gridGraphics_0.5-1      colorspace_2.1-1        RSQLite_2.3.7          
 [65] utf8_1.2.4              generics_0.1.3          data.table_1.16.2       graphlayouts_1.2.0     
 [69] prettyunits_1.2.0       httr_1.4.7              htmlwidgets_1.6.4       S4Arrays_1.2.1         
 [73] scatterpie_0.2.4        pkgconfig_2.0.3         gtable_0.3.6            blob_1.2.4             
 [77] shadowtext_0.1.4        htmltools_0.5.8.1       carData_3.0-5           fgsea_1.28.0           
 [81] bookdown_0.41           scales_1.3.0            ClusterR_1.3.3          png_0.1-8              
 [85] ggfun_0.1.7             knitr_1.49              rstudioapi_0.17.1       tzdb_0.4.0             
 [89] reshape2_1.4.4          nlme_3.1-163            coda_0.19-4.1           curl_6.0.0             
 [93] bdsmatrix_1.3-7         cachem_1.1.0            parallel_4.3.2          HDO.db_0.99.1          
 [97] pillar_1.9.0            grid_4.3.2              vctrs_0.6.5             car_3.1-3              
[101] dbplyr_2.5.0            evaluate_1.0.1          mvtnorm_1.3-2           cli_3.6.3              
[105] locfit_1.5-9.10         compiler_4.3.2          rlang_1.1.4             crayon_1.5.3           
[109] ggsignif_0.6.4          emdbook_1.3.13          plyr_1.8.9              fs_1.6.5               
[113] stringi_1.8.4           viridisLite_0.4.2       gridBase_0.4-7          BiocParallel_1.36.0    
[117] munsell_0.5.1           lazyeval_0.2.2          GOSemSim_2.28.1         Matrix_1.6-1.1         
[121] patchwork_1.3.0         hms_1.1.3               bit64_4.5.2             KEGGREST_1.42.0        
[125] statmod_1.5.0           igraph_2.1.1            broom_1.0.7             memoise_2.0.1          
[129] ggtree_3.10.1           fastmatch_1.1-4         bit_4.5.0               gson_0.1.0             
[133] ape_5.8

Name		Name	Last commit message	Last commit date
Latest commit History 98 Commits
0_RMarkdown_Summary		0_RMarkdown_Summary
1_analysis_scripts		1_analysis_scripts
2_rawdata		2_rawdata
3_output_data		3_output_data
4_figures		4_figures
5_Gene_Lists		5_Gene_Lists
.doc.css		.doc.css
.gitignore		.gitignore
BIT107-A2.Rproj		BIT107-A2.Rproj
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Time Course Analysis of Differential Gene Expression.

Working analysis for differentially expressed genes (DEGs) at individual timepoints and across time using three seperate analytical techniques (TiSA, Moanin, Temporal DESeq2).

Introduction

Visualisation of the Workflow

RMarkdown Method

Overview of Scripts

Installation

Running the Analysis.

Bug Reporting

Known Issues

Referances

RStudio SessionInfo()

About

Releases

Packages

Languages

shelleydan/BIT107-A2

Folders and files

Latest commit

History

Repository files navigation

Time Course Analysis of Differential Gene Expression.

Working analysis for differentially expressed genes (DEGs) at individual timepoints and across time using three seperate analytical techniques (TiSA, Moanin, Temporal DESeq2).

Introduction

Visualisation of the Workflow

RMarkdown Method

Overview of Scripts

Installation

Running the Analysis.

Bug Reporting

Known Issues

Referances

RStudio SessionInfo()

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages