Merge pull request #236 from MoTrPAC/develop
MotrpacBicQC 0.9.3: critical update, additional validation of refmet names
biodavidjm authored Mar 25, 2024
2 parents f6242f9 + f523ef1 commit 3b50656
Showing 95 changed files with 343 additions and 162 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: MotrpacBicQC
Type: Package
Title: QC/QA functions for the MoTrPAC community
Version: 0.9.2
Date: 2024-03-04
Version: 0.9.3
Date: 2024-03-25
Author: MoTrPAC Bioinformatics Center
Maintainer: David Jimenez-Morales <davidjm@stanford.edu>
Description: R Package for the analysis of MoTrPAC datasets.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -76,6 +76,7 @@ importFrom(httr,status_code)
importFrom(inspectdf,inspect_na)
importFrom(jsonlite,fromJSON)
importFrom(lubridate,parse_date_time)
importFrom(readr,read_delim)
importFrom(readr,read_lines)
importFrom(scales,percent)
importFrom(stats,median)
12 changes: 12 additions & 0 deletions NEWS.md
@@ -1,3 +1,15 @@
# MotrpacBicQC 0.9.3 (2024-03-25)

* Critical update to `validate_refmetname`: ensure the RefMet standardized name is checked. Update RefMet tests
* Update `get_and_validate_mdd()`
+ Update REST service URL
+ Update documentation
+ Remove dependency on `data.table`
* Enhance: only one `metadata_phase` file allowed
* Enhance `dl_read_gcp`: replace `data.table` with `read_delim`
* Enhance `open_file`: accept only tab-delimited files


# MotrpacBicQC 0.9.2 (2024-03-04)

* Critical Update: Resolved an issue where the validation of refmet names was
99 changes: 59 additions & 40 deletions R/metabolomics_data_dictionary.R
@@ -1,44 +1,61 @@

# All about the RefMet data dictionary


#' @title Get and validate the entire RefMet database from Metabolomics Workbench
#' @title Get and Validate the Entire RefMet Database from Metabolomics Workbench
#'
#' @description This function fetches and validates the Metabolomics Data Dictionary
#' from the Metabolomics Workbench. It provides options to remove duplicates.
#'
#' @param remove_duplications Logical; if `TRUE`, removes duplicate entries based on
#' the `refmet_name` column.
#' @param verbose Logical; if `TRUE` (default), displays progress messages and warnings
#' during the function execution.
#'
#' @return Returns a data frame with the following columns:
#' \describe{
#' \item{\code{refmet_name}}{Character; the standardized RefMet name.}
#' \item{\code{pubchem_cid}}{Character; the PubChem compound ID.}
#' \item{\code{lm_id}}{Character; the LIPID MAPS ID.}
#' \item{\code{inchi_key}}{Character; the International Chemical Identifier Key.}
#' \item{\code{exactmass}}{Numeric; the exact mass of the metabolite.}
#' \item{\code{formula}}{Character; the chemical formula of the metabolite.}
#' \item{\code{super_class}}{Character; the superclass category of the metabolite.}
#' \item{\code{main_class}}{Character; the main class category of the metabolite.}
#' \item{\code{sub_class}}{Character; the subclass category of the metabolite.}
#' \item{\code{hmdb_id}}{Character; the Human Metabolome Database ID.}
#' \item{\code{kegg_id}}{Character; the Kyoto Encyclopedia of Genes and Genomes ID.}
#' }
#' Each row of the data frame represents a unique metabolite entry from the
#' Metabolomics Workbench Data Dictionary.
#'
#' @details This function downloads the entire RefMet database from the Metabolomics
#' Workbench using their REST API. The data is initially fetched in JSON format and
#' then converted to a data frame. The function checks for the presence of a 'name'
#' column in the data frame, renaming it to 'refmet_name' for consistency. It also
#' provides an option to remove duplicate entries based on the 'refmet_name' column.
#' If duplicates are found and \code{remove_duplications} is `FALSE`, the function will
#' list the duplicated IDs but will not remove them. This can be helpful for reviewing
#' the data quality and consistency.
#'
#' @examples
#' \dontrun{
#' refmet <- get_and_validate_mdd(remove_duplications = TRUE, verbose = TRUE)
#' head(refmet)
#' }
#'
#' @description Get and validate Metabolomics Data Dictionary from
#' Metabolomics Workbench
#' @param remove_duplications (logical) if `TRUE``, removes duplications.
#' @return (vector) PHASE code
#' @export
get_and_validate_mdd <- function(remove_duplications = FALSE, verbose = TRUE){

# get_and_validate_mdd <- function(remove_duplications = FALSE){
#
# refmet <- MotrpacBicQC::metabolomics_data_dictionary
#
# if(remove_duplications){
# # REMOVE DUPLICATIONS
# if( any(duplicated(refmet$refmet_name)) ){
# duplirefmets <- length(refmet$refmet_name[(duplicated(refmet$refmet_name))])
# # message("WARNING: [ ",duplirefmets, " ] DUPLICATION(s) found in data dictionary!")
# refmet <- refmet[!(duplicated(refmet$refmet_name)),]
# }
# }
# return(refmet)
# }

get_and_validate_mdd <- function(remove_duplications = FALSE){

.id = name = NULL
name = NULL

if(verbose) message("- Warning: Downloading data from Metabolomics Workbench. This might take a few minutes.")
# REST metabolomics workbench data dictionary
# Previous REST version (motrpac only)
# Previous REST versions
# refmetjson <- jsonlite::fromJSON("https://www.metabolomicsworkbench.org/rest/refmet/motrpac")
refmetjson <- jsonlite::fromJSON("https://www.metabolomicsworkbench.org/rest/refmet/all/")

dt_list <- purrr::map(refmetjson, as.data.table)
dt <- data.table::rbindlist(dt_list, fill = TRUE, idcol = T)
df <- as.data.frame(dt)

colnames(df) <- tolower(colnames(df))
# refmetjson <- jsonlite::fromJSON("https://www.metabolomicsworkbench.org/rest/refmet/all/")
refmetjson <- jsonlite::fromJSON("https://www.metabolomicsworkbench.org/rest/refmet/all_ids/")

df <- purrr::map_dfr(refmetjson, ~ as.data.frame(.x), .id = "id")

if( !("name" %in% colnames(df)) ){
stop("`refmet_name` column not found in the Metabolomics Workbench data dictionary")
@@ -49,12 +66,11 @@ get_and_validate_mdd <- function(remove_duplications = FALSE){
# CHECK DUPLICATIONS
if(any(duplicated(df$refmet_name))){
duplications <- df[duplicated(df$refmet_name),]
message("Duplicated ids: ", length(duplications$refmet_name))
message("IDS: ", paste(duplications$refmet_name, collapse = ", "))
message("DUPLICATIONS IN REFMET ONLINE (REST VERSION)")
if(verbose) message("Duplicated ids: ", length(duplications$refmet_name))
if(verbose) message("IDS: ", paste(duplications$refmet_name, collapse = ", "))
}

refmet <- subset(df, select = -c(.id))
refmet <- subset(df, select = -c(id))

if(remove_duplications){
# REMOVE DUPLICATIONS
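
The report-then-optionally-remove handling of duplicated `refmet_name` values in this hunk can be sketched in base R. This is a minimal illustration, not the package's code: `handle_duplicates()` and the toy data frame are hypothetical names introduced here, and only the `duplicated()` logic mirrors the lines above.

```r
# Hypothetical helper illustrating the pattern: report duplicates when
# verbose, and drop them only when remove_duplications is TRUE.
handle_duplicates <- function(df, remove_duplications = FALSE, verbose = TRUE) {
  is_dup <- duplicated(df$refmet_name)  # TRUE for second and later occurrences
  if (any(is_dup) && verbose) {
    message("Duplicated ids: ", sum(is_dup))
    message("IDS: ", paste(df$refmet_name[is_dup], collapse = ", "))
  }
  if (remove_duplications) {
    df <- df[!is_dup, , drop = FALSE]   # keep the first occurrence only
  }
  df
}

# Toy input (made-up rows, not real RefMet content)
toy <- data.frame(
  id = c("1", "2", "3"),
  refmet_name = c("Leucine", "Citric acid", "Leucine"),
  stringsAsFactors = FALSE
)

kept    <- handle_duplicates(toy, remove_duplications = FALSE, verbose = FALSE)
deduped <- handle_duplicates(toy, remove_duplications = TRUE,  verbose = FALSE)
```

With `remove_duplications = FALSE` the data frame is returned unchanged, so the duplicates can be reviewed before deciding whether to drop them.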
@@ -77,7 +93,6 @@ get_and_validate_mdd <- function(remove_duplications = FALSE){
validate_refmetname <- function(dataf, verbose){

irm <- 0
idna <- 0
for(i in 1:dim(dataf)[1]){
rn <- dataf$refmet_name[i]

@@ -100,11 +115,15 @@ validate_refmetname <- function(dataf, verbose){
if(here$refmet_name == "-"){
if(verbose) message(paste0(" (-) `refmet_name` [`", rn, "`] not available in RefMet. Please, contact MW/BIC (Error RN1)"))
irm <- irm + 1
idna <- idna + 1
}else{
if(here$refmet_name != rn){
if(verbose) message(paste0(" (-) `refmet_name` [`", rn, "`] must be modified to the RefMet Standarized name: \"", here$refmet_name, "\" (Error RN2)"))
irm <- irm + 1
}
}
}
if(idna > 0){
if(verbose) message(" (-) Total number of missed ids on MW: ", idna)
if(irm > 0){
if(verbose) message(" (-) Total number of missed ids on MW: ", irm)
}
return(irm)
}
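
The counting logic added to `validate_refmetname` (RN1 when a name is not in RefMet, RN2 when it differs from the standardized name) can be sketched without any network access. This is an assumption-laden illustration: `count_refmet_issues()` is a hypothetical helper, and the named vector `lookup` stands in for the REST response that the real function obtains per name.

```r
# Hypothetical stand-in for the per-name lookup: names are submitted names,
# values are the standardized names; "-" means "not found", as in the hunk above.
count_refmet_issues <- function(submitted, lookup, verbose = TRUE) {
  issues <- 0
  for (rn in submitted) {
    std <- if (rn %in% names(lookup)) lookup[[rn]] else "-"
    if (std == "-") {
      # Error RN1: the name is not available at all
      if (verbose) message("(-) `", rn, "` not available in RefMet (Error RN1)")
      issues <- issues + 1
    } else if (std != rn) {
      # Error RN2: the name exists but is not the standardized form
      if (verbose) message("(-) `", rn, "` must be modified to \"", std, "\" (Error RN2)")
      issues <- issues + 1
    }
  }
  issues
}

# Toy lookup table (illustrative only)
lookup <- c("Leucine/Isoleucine" = "Leucine", "Citric acid" = "Citric acid")

n_issues <- count_refmet_issues(
  c("Leucine/Isoleucine", "Citric acid", "Unknownol"),
  lookup,
  verbose = FALSE
)
```

Here `Leucine/Isoleucine` counts as an RN2 issue and the made-up name `Unknownol` as an RN1 issue, while `Citric acid` passes because it already matches its standardized name.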
29 changes: 20 additions & 9 deletions R/misc.R
@@ -14,7 +14,7 @@
#' @import naniar
#' @import progress
#' @import purrr
#' @importFrom readr read_lines
#' @importFrom readr read_lines read_delim
#' @importFrom scales percent
#' @importFrom stats median reorder
#' @import stringr
@@ -136,8 +136,9 @@ dl_read_gcp <- function(path,
}
# read in the data as a data.frame
if(file.exists(new_path)){
dt <- data.table::fread(new_path, sep=sep, header=header,...)
return(dt)
df <- readr::read_delim(new_path, delim = sep, col_names = header, skip_empty_rows = TRUE, show_col_types = FALSE, ...)
df <- as.data.frame(df)
return(df)
}else{
stop("- Problems loading the file. Possible reason: the file does not exist in the bucket anymore. Please, validate the address. Re-run this command again with `verbose = TRUE`")
}
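
The switch from `data.table::fread` to `readr::read_delim` can be tried on a small local file. The sketch below assumes the readr package (version 2.0 or later, for `show_col_types`) is installed; the temporary file and its contents are made up for illustration and no bucket download is involved.

```r
library(readr)

# Write a small tab-delimited file containing one empty row
tmp <- tempfile(fileext = ".tsv")
writeLines(c("sample_id\tvalue", "S1\t1.5", "", "S2\t2.5"), tmp)

# Same call shape as in the hunk above: explicit delimiter, header row,
# empty rows skipped, column-type message suppressed
df <- readr::read_delim(
  tmp,
  delim = "\t",
  col_names = TRUE,
  skip_empty_rows = TRUE,
  show_col_types = FALSE
)

# Convert the tibble returned by readr to a plain data.frame
df <- as.data.frame(df)
```

Converting with `as.data.frame()` means callers receive a plain data frame rather than a tibble or a `data.table`, which may matter for code that relied on `data.table` semantics.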
@@ -346,12 +347,18 @@ open_file <- function(input_results_folder,
ofile <- NULL
filename <- NULL
}else{
flag <- TRUE

filename <- file_metametabolites[1]
ofile <- read.delim(filename, stringsAsFactors = FALSE, check.names = FALSE)
ofile <- remove_empty_columns(ofile, verbose = verbose)
ofile <- remove_empty_rows(ofile, verbose = verbose)
if(verbose) message(" + (+) File successfully opened")
file_ext <- sub(".*\\.(.*)$", "\\1", filename)
if (!file_ext %in% c("txt", "tsv")) {
if(verbose) message(" - (-) File extension must be .txt or .tsv (only tab delimited files accepted): FAIL")
}else{
ofile <- read.delim(filename, stringsAsFactors = FALSE, check.names = FALSE)
ofile <- remove_empty_columns(ofile, verbose = verbose)
ofile <- remove_empty_rows(ofile, verbose = verbose)
if(verbose) message(" + (+) File successfully opened")
flag <- TRUE
}
}

if(flag){
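
The extension check added to `open_file` relies on `sub()` with a regular expression. A possible variant, shown here only as a sketch, uses `tools::file_ext()` instead; `has_tab_delimited_ext()` is a hypothetical helper, and the `tolower()` call is an addition that the diff does not contain.

```r
# Hypothetical helper: TRUE when the file name ends in .txt or .tsv.
# tools::file_ext() returns "" when there is no extension.
has_tab_delimited_ext <- function(filename) {
  tolower(tools::file_ext(filename)) %in% c("txt", "tsv")
}

# The regex used in the diff returns the whole string when there is no dot
regex_ext_no_dot <- sub(".*\\.(.*)$", "\\1", "results")

ok_tsv   <- has_tab_delimited_ext("results.tsv")
ok_upper <- has_tab_delimited_ext("results.TXT")
bad_csv  <- has_tab_delimited_ext("results.csv")
bad_none <- has_tab_delimited_ext("results")
```

Both approaches reject a file without an extension, but the regex version does so only because the full name does not happen to equal `txt` or `tsv`; a file literally named `txt` would slip through.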
@@ -448,10 +455,14 @@ set_phase <- function(input_results_folder,
ignore.case = TRUE,
full.names=TRUE,
recursive = TRUE)

if(length(file_phase) > 1){
if(verbose) message("- (-) More than one `metadata_phase.txt` file available. Only one is valid (place the valid one in the BATCH folder): FAIL")
}

# To be adjusted if two different batches are provided:
if ( !(purrr::is_empty(file_phase)) ){
phase_details <- readr::read_lines(file_phase, n_max = 1)
phase_details <- readr::read_lines(file_phase[1], n_max = 1)
if ( !(is.na(phase_details) || phase_details == '') ){
if(verbose) message("+ Motrpac phase reported: ", phase_details, " (info from metadata_phase.txt available): OK")

7 changes: 6 additions & 1 deletion R/validations.R
@@ -78,7 +78,12 @@ check_metadata_phase_file <- function(input_results_folder,
if(verbose) message("- (-) `BATCH#_YYYYMMDD/metadata_phase.txt` file does not exist: FAIL")
return(FALSE)
}else{
return(TRUE)
if(length(file_phase) > 1){
if(verbose) message("- (-) More than one `metadata_phase.txt` file available. Only one is valid (place the valid one in the BATCH folder): FAIL")
return(FALSE)
}else{
return(TRUE)
}
}

}
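
The "only one `metadata_phase.txt` file" rule can be exercised against a temporary folder. This is a sketch under stated assumptions: `check_single_phase_file()` is a hypothetical name, the `pattern` argument is assumed (the diff does not show the one the package uses), and the folder names and file contents are placeholders.

```r
# Hypothetical re-implementation of the check: FALSE when no file or more
# than one file is found, TRUE when exactly one exists.
check_single_phase_file <- function(folder, verbose = TRUE) {
  file_phase <- list.files(
    folder,
    pattern = "metadata_phase\\.txt$",  # assumed pattern
    ignore.case = TRUE,
    full.names = TRUE,
    recursive = TRUE
  )
  if (length(file_phase) == 0) {
    if (verbose) message("- (-) `metadata_phase.txt` file does not exist: FAIL")
    return(FALSE)
  }
  if (length(file_phase) > 1) {
    if (verbose) message("- (-) More than one `metadata_phase.txt` file available: FAIL")
    return(FALSE)
  }
  TRUE
}

# Build a throwaway folder layout with one batch, then add a second one
root <- tempfile("results_demo")
dir.create(file.path(root, "BATCH1_20240325"), recursive = TRUE)
writeLines("EXAMPLE-PHASE", file.path(root, "BATCH1_20240325", "metadata_phase.txt"))
one_file_ok <- check_single_phase_file(root, verbose = FALSE)

dir.create(file.path(root, "BATCH2_20240325"))
writeLines("EXAMPLE-PHASE", file.path(root, "BATCH2_20240325", "metadata_phase.txt"))
two_files_ok <- check_single_phase_file(root, verbose = FALSE)
```

Returning `FALSE` for the multi-file case makes the validation fail early instead of silently reading the first match, which is what `set_phase` falls back to with `file_phase[1]`.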
2 changes: 1 addition & 1 deletion docs/404.html
2 changes: 1 addition & 1 deletion docs/LICENSE-text.html
2 changes: 1 addition & 1 deletion docs/articles/index.html
4 changes: 2 additions & 2 deletions docs/articles/other_functions.html
@@ -4,7 +4,7 @@
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="generator" content="pandoc">
<meta name="date" content="2024-03-04">
<meta name="date" content="2024-03-25">
<title>MotrpacBicQC: Other Functions</title>
<script src="other_functions_files/header-attrs-2.25/header-attrs.js"></script><script src="other_functions_files/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1">
<link href="other_functions_files/bootstrap-3.3.7/css/bootstrap.min.css" rel="stylesheet">
@@ -123,7 +123,7 @@ <h1 class="title">MotrpacBicQC: Other Functions</h1>

<p class="authors">
</p>
<p class="date"><span class="glyphicon glyphicon-calendar"></span> 2024-03-04</p>
<p class="date"><span class="glyphicon glyphicon-calendar"></span> 2024-03-25</p>



10 changes: 7 additions & 3 deletions docs/articles/qc_metabolomics.html
@@ -4,7 +4,7 @@
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="generator" content="pandoc">
<meta name="date" content="2024-03-04">
<meta name="date" content="2024-03-25">
<title>MotrpacBicQC: Metabolomics QC</title>
<script src="qc_metabolomics_files/header-attrs-2.25/header-attrs.js"></script><script src="qc_metabolomics_files/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1">
<link href="qc_metabolomics_files/bootstrap-3.3.7/css/bootstrap.min.css" rel="stylesheet">
@@ -126,7 +126,7 @@ <h1 class="title">MotrpacBicQC: Metabolomics QC</h1>

<p class="authors">
</p>
<p class="date"><span class="glyphicon glyphicon-calendar"></span> 2024-03-04</p>
<p class="date"><span class="glyphicon glyphicon-calendar"></span> 2024-03-25</p>



@@ -203,7 +203,11 @@ <h2 id="usage">Usage<a class="anchor" aria-label="anchor" href="#usage"></a>
<pre><code><span><span class="co">## + (+) `metabolite_name` OK</span></span></code></pre>
<pre><code><span><span class="co">## + (+) `refmet_name` unique values: OK</span></span></code></pre>
<pre><code><span><span class="co">## + Validating `refmet_name` (it might take some time)</span></span></code></pre>
<pre><code><span><span class="co">## + (+) `refmet_name` ids found in refmet: OK</span></span></code></pre>
<pre><code><span><span class="co">## (-) `refmet_name` [`Leucine/Isoleucine`] must be modified to the RefMet Standarized name: "Leucine" (Error RN2)</span></span></code></pre>
<pre><code><span><span class="co">## (-) `refmet_name` [`Oxoglutaric acid`] must be modified to the RefMet Standarized name: "2-Oxoglutaric acid" (Error RN2)</span></span></code></pre>
<pre><code><span><span class="co">## (-) `refmet_name` [`Citric acid/Isocitric acid`] must be modified to the RefMet Standarized name: "Citric acid" (Error RN2)</span></span></code></pre>
<pre><code><span><span class="co">## (-) Total number of missed ids on MW: 3</span></span></code></pre>
<pre><code><span><span class="co">## - (-) SUMMARY: 3 `refmet_name` not found in RefMet Metabolomics Data Dictionary: FAIL</span></span></code></pre>
<pre><code><span><span class="co">## + (+) {rt} all numeric: OK</span></span></code></pre>
<pre><code><span><span class="co">## + (+) {mz} all numeric: OK</span></span></code></pre>
<pre><code><span><span class="co">## + (+) {`neutral_mass`} all numeric values OK</span></span></code></pre>
4 changes: 2 additions & 2 deletions docs/articles/qc_olink.html
@@ -4,7 +4,7 @@
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="generator" content="pandoc">
<meta name="date" content="2024-03-04">
<meta name="date" content="2024-03-25">
<title>MotrpacBicQC: OLINK QC</title>
<script src="qc_olink_files/header-attrs-2.25/header-attrs.js"></script><script src="qc_olink_files/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1">
<link href="qc_olink_files/bootstrap-3.3.7/css/bootstrap.min.css" rel="stylesheet">
@@ -125,7 +125,7 @@ <h1 class="title">MotrpacBicQC: OLINK QC</h1>

<p class="authors">
</p>
<p class="date"><span class="glyphicon glyphicon-calendar"></span> 2024-03-04</p>
<p class="date"><span class="glyphicon glyphicon-calendar"></span> 2024-03-25</p>



4 changes: 2 additions & 2 deletions docs/articles/qc_proteomics.html
@@ -4,7 +4,7 @@
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="generator" content="pandoc">
<meta name="date" content="2024-03-04">
<meta name="date" content="2024-03-25">
<title>MotrpacBicQC: Proteomics QC</title>
<script src="qc_proteomics_files/header-attrs-2.25/header-attrs.js"></script><script src="qc_proteomics_files/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1">
<link href="qc_proteomics_files/bootstrap-3.3.7/css/bootstrap.min.css" rel="stylesheet">
@@ -125,7 +125,7 @@ <h1 class="title">MotrpacBicQC: Proteomics QC</h1>

<p class="authors">
</p>
<p class="date"><span class="glyphicon glyphicon-calendar"></span> 2024-03-04</p>
<p class="date"><span class="glyphicon glyphicon-calendar"></span> 2024-03-25</p>



6 changes: 3 additions & 3 deletions docs/authors.html

2 changes: 1 addition & 1 deletion docs/index.html
