An R-based, open-source collection of scripts called OUKS (Omics Untargeted Key Script) that provides a comprehensive nine-step toolbox for processing LC-MS untargeted metabolomic profiling data 🧰
Script | Purpose |
---|---|
1. Randomization.R | experimental design and sample randomization |
2. Integration.R | peak integration and retention time alignment |
3. Imputation.R | missing value imputation (MVI) and artifact removal |
4. Correction.R | signal drift correction and batch effect removal |
5. Annotation.R | feature annotation and tentative identification by database search |
6. Filtering.R | peak filtering for quality control and accounting for technical variation |
7. Normalization.R | data normalization and adjustment for biological variation |
8. Grouping.R | peak grouping and molecular feature clustering |
9. Statistics.R | statistical analysis and hypothesis testing |
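Step 1 (sample randomization) is commonly done by shuffling the study samples and interleaving pooled quality-control (QC) injections at a fixed interval, so that QC-based drift correction (step 4) has reference points across the whole run. The sketch below is a minimal, language-neutral illustration in Python; the function name `randomized_injection_order`, the fixed seed and the rule "QC at the start, after every `qc_every` samples, and at the end" are assumptions for illustration, not the logic of 1. Randomization.R.

```python
import random


def randomized_injection_order(samples, qc_every=5, seed=42):
    """Hypothetical helper: shuffle study samples and interleave pooled QC injections.

    A QC is placed at the start of the run, after every `qc_every` study samples,
    and at the end of the run (without doubling the final QC).
    """
    rng = random.Random(seed)        # fixed seed, so the order is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)            # randomize the study samples only

    order = ["QC"]                   # opening QC
    for i, sample in enumerate(shuffled, start=1):
        order.append(sample)
        # insert a QC after every qc_every samples, except after the last one
        if i % qc_every == 0 and i != len(shuffled):
            order.append("QC")
    order.append("QC")               # closing QC
    return order


samples = [f"S{i:02d}" for i in range(1, 13)]
order = randomized_injection_order(samples, qc_every=4)
print(order)
```

With 12 samples and `qc_every=4`, the run has four QC injections (start, after the 4th and 8th samples, end); reusing the same seed reproduces the same order, which keeps the design documented.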
- Instructions and an introduction to the OUKS toolbox are provided in the Basic tutorial file. Session info and installed packages are listed in the corresponding files.
- Scripts with comments, notes and references are stored in the Scripts folder in their predefined numbered order, along with the code for plotting the figures associated with the article.
- MS2 spectra for selected potential bladder cancer biomarkers are stored in mzXML format in the corresponding folder.
- Datasets in .csv and other files (.RData, .R) are available for reproducibility in the corresponding folders. File descriptions are provided in the Roadmap file. Raw data (.CDF format) are available from the Metabolomics Workbench repository, study ID: ST001682. A metadata table is also provided.
- Reports in .Rmd, .pdf and .docx formats are provided as examples of reproducing the OUKS scripts.
The only requirements are familiarity with basic R syntax, a PC with an Internet connection (Windows OS preferred), RStudio and R (≥ 4.0.0).
- 🗓️ 2021.04.29
  - Creation date 🎬
- 🗓️ 2021.06.23
  - Freely available at link (Supporting Information File 2).
- 🗓️ 2021.07.12
  - "9. Statistics": An outlier detection method (Mahalanobis distance) implemented via the ClassDiscovery package (3.3.13, CRAN) was added. The OutlierDetection package requires spatstat version 1.64-1 (CRAN).
  - "9. Statistics": Added p-values adjusted for multiple comparisons in all cases.
  - "9. Statistics": Multigroup fold change (structToolbox package) was replaced by a base-package implementation.
  - "7. Normalization": Added p-values adjusted for multiple comparisons in all cases.
  - "4. Correction": Added PCA with a color gradient.
  - “Deleted functionality.R” was created for storing deleted code.
- 🗓️ 2021.07.17
  - "7. Normalization": GAM (mgcv, 1.8-32, CRAN) and GAMM (gamm4, 0.2-6, CRAN) were added as new biological factor adjustment algorithms.
  - "9. Statistics": Section “Signal Modeling” was added for modeling with LM, LMM, GAM (mgcv, 1.8-32, CRAN), GAMM (gamm4, 0.2-6, CRAN) and several nonlinear functions for dose-response curve analysis (drc, 3.0-1, CRAN).
  - "9. Statistics": In section “Time series”, dose-response curve analysis and modeling was added (DRomics, 2.2-0, CRAN).
- 🗓️ 2021.08.04
  - "5. Annotation": Added mWISE (0.1.0, GH), forked from b2slab/mWISE to plyush1993/mWISE, with its dependencies manually changed to R (>= 4.0).
  - "9. Statistics": Added tdfdr (0.1, GH) for two-dimensional false discovery rate control in filtering and multigroup analysis.
  - All scripts (from 5. Annotation to 9. Statistics) and files were updated.
- 🗓️ 2021.08.18
  - "4. Correction": Added PC-PR2 for correction evaluation (pcpr2, 0.0.0.1, GH).
  - "9. Statistics": Added PC-PR2 and PVCA for multigroup analysis.
- 🗓️ 2021.08.31
  - "4. Correction": Added box plots, the mean silhouette score and a one-sample test metric.
  - "7. Normalization": Box-plot construction was updated.
- 🗓️ 2021.09.10
  - "5. Annotation": Added metID (1.1.0, GH) for database identification from a peak table.
  - "7. Normalization": GPBoost boosting and mixed-effects boosting (gpboost, 0.6.7, CRAN) were added as new biological factor adjustment algorithms.
  - "9. Statistics": Fold change calculations were slightly changed, and a canonical limma implementation was added.
- 🗓️ 2021.10.07
  - "9. Statistics": In section “Time series”, TOXcms (1.0.3, GH) and timeOmics (1.0.1, BC) were added, and the DRomics part was updated.
  - "4. Correction": Added the WaveICA2.0 (0.1.0, GH) correction method.
  - Added reports (generated with R Markdown, in the folder Report (Rmd)).
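Several changelog entries concern fold change and p-values adjusted for multiple comparisons. The sketch below illustrates both ideas in Python, assuming fold change is the log2 ratio of group means and that adjustment uses the Benjamini-Hochberg (BH) procedure, which is what R's `p.adjust(method = "BH")` computes. The changelog does not state which adjustment method or fold change definition OUKS uses, so the function names and choices here are illustrative assumptions rather than the toolbox's code.

```python
import math


def log2_fold_change(case, control):
    """Hypothetical helper: log2 of the ratio of group means (case / control).

    Assumes strictly positive intensities for one feature.
    """
    mean_case = sum(case) / len(case)
    mean_control = sum(control) / len(control)
    return math.log2(mean_case / mean_control)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


fc = log2_fold_change([4.0, 4.0], [1.0, 1.0])
adj = bh_adjust([0.01, 0.04, 0.03, 0.20])
print(fc, adj)
```

Walking the sorted p-values from largest to smallest with a running minimum keeps the adjusted values monotone in the raw p-values, mirroring the cumulative-minimum step of the BH procedure.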
OUKS has been published in the Journal of Proteome Research. If you use this software to analyze your own data, please cite it as follows. Thanks!
Ivan V. Plyushchenko, Elizaveta S. Fedorova, Natalia V. Potoldykova, Konstantin A. Polyakovskiy, Alexander I. Glukhov, Igor A. Rodin. Omics Untargeted Key Script: R‑Based Software Toolbox for Untargeted Metabolomics with Bladder Cancer Biomarkers Discovery Case Study, Journal of Proteome Research, 2021, https://doi.org/10.1021/acs.jproteome.1c00392.
Please send any comments, suggestions or questions to the author (Mr. Ivan Plyushchenko 👨‍🔬): 📧 plyushchenko.ivan@gmail.com, ORCID 0000-0003-3883-4695.