check coverage across webchem #264

Merged 43 commits on Jul 8, 2020
edeb0ac
added from arguments to function with match.arg()
Aariq Jun 7, 2020
7a7d7cb
changed "type" argument to "from" for consistency
Aariq Jun 9, 2020
8a853c4
added "query" and "from" arguments
Aariq Jun 9, 2020
94a4994
Made possible "from" arguments more consistent
Aariq Jun 9, 2020
eaffafc
added "cas" as a possible value for "from" and fixed #255 issue with …
Aariq Jun 9, 2020
af8a545
update tests and news
Aariq Jun 9, 2020
1f8e674
added case-insensitive arg matching
Aariq Jun 9, 2020
33852b5
made arguments case-insensitive
Aariq Jun 9, 2020
eba73cf
added autotranslate function draft
Aariq Jun 9, 2020
ed97036
re-built documentation
Aariq Jun 9, 2020
c67d074
added dots to absorb unused arguments
Aariq Jun 9, 2020
c5e15a8
uniformity of outputs
Aariq Jun 9, 2020
b51ab91
added check_coverage() function
Aariq Jun 9, 2020
aa1f014
updated docs
Aariq Jun 9, 2020
b4db84b
added plot output (for fun and testing). need to eventually change t…
Aariq Jun 11, 2020
1864fa3
added additional translators
Aariq Jun 11, 2020
859f9b7
removed additional translators because they are never used, I think.
Aariq Jun 11, 2020
924af45
moved to separate r file
Aariq Jun 11, 2020
ace8aca
added tests
Aariq Jun 11, 2020
18ee4f5
fixed another skipped test #255
Aariq Jun 11, 2020
91471b2
finishing touches
Aariq Jun 11, 2020
18ab9b9
added Suggests to make plotting work.
Aariq Jun 11, 2020
52b0913
I don't know why these tests are failing. They work when run manually…
Aariq Jun 11, 2020
e910965
fixed a bug in chooser() utility I didn't know existed
Aariq Jun 11, 2020
b3f7a7c
added deprecated arguments to documentation
Aariq Jun 11, 2020
fd220c8
REALLY fixed bug in chooser()
Aariq Jun 11, 2020
bd97009
switched NA to NA_character_
Aariq Jun 12, 2020
b8f9289
Revert "switched NA to NA_character_"
Aariq Jun 12, 2020
6e24d73
fixed merge
Aariq Jun 16, 2020
14ccbd2
changed function name from check_coverage to has_entry
Aariq Jun 19, 2020
398a504
pass NAs through
Aariq Jun 19, 2020
86efab9
switched to base R plotting
Aariq Jun 19, 2020
a41f299
small fixes
Aariq Jun 19, 2020
d0defe7
updated `pc_synonyms` `cts_convert`, and `cir_query` to use `match` i…
Aariq Jun 22, 2020
4deeec3
changed get_etoxid to use matcher() internally. Added warning for ma…
Aariq Jun 23, 2020
f6aa772
changed default match = . updated tests
Aariq Jun 23, 2020
b83aa1a
fix tests
Aariq Jun 23, 2020
7c3a889
addressing review of PR
Aariq Jun 27, 2020
e29b476
add skips for integration function tests.
Aariq Jun 27, 2020
76c32a8
change pan example, update a test.
Aariq Jul 2, 2020
99b5284
lintr suggestions
Aariq Jul 3, 2020
fdf6547
change autotranslate to with_cts and make unexported. Re-run documen…
Aariq Jul 7, 2020
19ba9d1
changed function name from has_entry to find_db. Other minor changes…
Aariq Jul 8, 2020
10 changes: 6 additions & 4 deletions DESCRIPTION
@@ -7,7 +7,7 @@ Description: Chemical information from around the web. This package interacts
Flavornet, NIST Chemistry WebBook, OPSIN, PAN Pesticide Database, PubChem,
SRS, Wikidata.
Type: Package
Version: 1.0.0
Version: 1.0.0.9000
Date: 2020-05-27
License: MIT + file LICENSE
URL: https://docs.ropensci.org/webchem, https://github.com/ropensci/webchem
@@ -39,13 +39,15 @@ Imports:
purrr,
data.tree,
tibble,
base64enc
base64enc,
rlang
Suggests:
testthat,
rcdk,
covr,
robotstxt,
knitr,
rmarkdown
RoxygenNote: 7.1.0
rmarkdown,
plot.matrix
RoxygenNote: 7.1.1
VignetteBuilder: knitr
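
Packages listed under `Suggests` (such as `plot.matrix` here) are not guaranteed to be installed, so code that depends on them usually checks at run time. A minimal sketch of such a guard follows; `use_suggested()` is a hypothetical helper written for illustration and is not part of webchem.

```r
# Hypothetical helper (not a webchem function): run `action` only if the
# suggested package `pkg` can be loaded, otherwise skip with a message.
use_suggested <- function(pkg, action) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is needed for this feature; skipping.")
    return(invisible(NULL))
  }
  action()
}

# 'stats' ships with R, so the action runs.
use_suggested("stats", function() stats::median(c(1, 2, 3)))

# A package that is not installed triggers the message and returns NULL.
use_suggested("notARealPackage123", function() stop("never reached"))
```

Returning `NULL` invisibly keeps the optional feature from breaking the calling code; a stricter variant could call `stop()` instead.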
6 changes: 6 additions & 0 deletions NAMESPACE
@@ -2,6 +2,7 @@

S3method(cas,aw_query)
S3method(cas,chebi_comp_entity)
S3method(cas,ci_query)
S3method(cas,cts_compinfo)
S3method(cas,default)
S3method(cas,etox_basic)
@@ -10,6 +11,7 @@ S3method(cas,pan_query)
S3method(cas,wd_ident)
S3method(inchikey,aw_query)
S3method(inchikey,chebi_comp_entity)
S3method(inchikey,ci_query)
S3method(inchikey,cts_compinfo)
S3method(inchikey,default)
S3method(inchikey,etox_basic)
@@ -19,6 +21,7 @@ S3method(inchikey,pc_prop)
S3method(inchikey,wd_ident)
S3method(smiles,aw_query)
S3method(smiles,chebi_comp_entity)
S3method(smiles,ci_query)
S3method(smiles,cts_compinfo)
S3method(smiles,default)
S3method(smiles,etox_basic)
@@ -50,6 +53,7 @@ export(cts_to)
export(etox_basic)
export(etox_targets)
export(etox_tests)
export(find_db)
export(fn_percept)
export(get_chebiid)
export(get_cid)
@@ -106,6 +110,8 @@ importFrom(purrr,map)
importFrom(purrr,map2)
importFrom(purrr,map_df)
importFrom(purrr,map_dfr)
importFrom(rlang,as_function)
importFrom(rlang,fn_fmls)
importFrom(rvest,html_table)
importFrom(stats,rexp)
importFrom(stats,rgamma)
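
The new `S3method(cas,ci_query)`, `S3method(inchikey,ci_query)` and `S3method(smiles,ci_query)` entries register extractor methods for objects of class `ci_query`. The dispatch mechanism can be sketched as below; `extract_cas()` and the structure of `res` are assumptions made for illustration and do not reproduce webchem's own `cas()` methods.

```r
# Hypothetical generic; dispatches on the class of `x`.
extract_cas <- function(x, ...) UseMethod("extract_cas")

# Fallback for classes without a dedicated method.
extract_cas.default <- function(x, ...) {
  stop("No extract_cas() method for class ", class(x)[1])
}

# Method for objects carrying the "ci_query" class: pull the `cas` element
# from each entry, returning NA for entries without one.
extract_cas.ci_query <- function(x, ...) {
  vapply(x, function(entry) {
    if (is.list(entry) && !is.null(entry$cas)) entry$cas else NA_character_
  }, character(1))
}

# Made-up result object with one hit and one failed query.
res <- structure(
  list(Triclosan = list(cas = "3380-34-5"), unknown = NA),
  class = c("ci_query", "list")
)
extract_cas(res)
```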
11 changes: 10 additions & 1 deletion NEWS.md
@@ -2,7 +2,16 @@

## NEW FEATURES

* Download images of substances from ChemSpider with cs_img()
* Download images of substances from ChemSpider with `cs_img()`
* `find_db()` checks if a query gets a hit in most databases integrated in webchem. Useful for deciding which of several databases to focus on given a set of chemicals.

## MINOR IMPROVEMENTS

* The `"type"` argument in `ci_query()` and `aw_query()` has been changed to `"from"` for consistency with other functions
* `fn_percept()` and `cts_compinfo()` now have `"query"` and `"from"` arguments for consistency with other functions
* Possible values for `"from"` have been made more consistent across functions
* `pc_synonyms()`, `cts_convert()`, `cir_query()` have been changed to use the `match` argument instead of `choices` for consistency with other functions
* `get_etoxid()` output changed slightly so that the matched chemical name string no longer includes the etoxid in parentheses.
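
The idea behind `find_db()` as described in the news entry, checking which sources return a hit for each query, can be illustrated without network access. This is a minimal sketch using stand-in lookup functions; `mock_sources` and `coverage()` are hypothetical names and do not reflect how webchem implements the function.

```r
# Stand-in lookups (hypothetical): each returns an identifier or NA.
# Real lookups would query the respective web services.
mock_sources <- list(
  db_a = function(q) if (q %in% c("triclosan", "aspirin")) "A-1" else NA_character_,
  db_b = function(q) if (q %in% "aspirin") "B-7" else NA_character_
)

# Build a table with one row per query and one logical column per source,
# TRUE where the source returned something other than NA.
coverage <- function(query, sources) {
  hits <- lapply(sources, function(lookup) {
    vapply(query, function(q) !is.na(lookup(q)), logical(1), USE.NAMES = FALSE)
  })
  data.frame(query = query, hits, stringsAsFactors = FALSE)
}

coverage(c("triclosan", "aspirin", "caffeine"), mock_sources)
```

Column sums of the logical columns then give a quick count of how many chemicals each source covers.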

# webchem 1.0.0

40 changes: 27 additions & 13 deletions R/alanwood.R
@@ -6,14 +6,16 @@
#' @importFrom stats rgamma
#'
#' @param query character; search string
#' @param type character; type of input ('cas' or 'commonname')
#' @param from character; type of input ('cas' or 'name')
#' @param verbose logical; print message during processing to console?
#' @param force_build logical; force building a new index? See
#' \code{\link{build_aw_idx}} for more details.
#' @param ... currently unused.
#' @param type deprecated
#' @return A list of eleven entries: common-name, status, preferred IUPAC Name,
#' IUPAC Name, cas, formula, activity, subactivity, inchikey, inchi and source
#' url.
#' @note for type = 'cas' only the first matched link is returned.
#' @note for from = 'cas' only the first matched link is returned.
#' Please respect Copyright, Terms and Conditions
#' \url{http://www.alanwood.net/pesticides/legal.html}!
#' @references Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller,
@@ -24,30 +26,42 @@
#' @export
#' @examples
#' \dontrun{
#' aw_query('Fluazinam', type = 'commonname')
#' out <- aw_query(c('Fluazinam', 'Diclofop'), type = 'com')
#' aw_query('Fluazinam', from = 'name')
#' out <- aw_query(c('Fluazinam', 'Diclofop'), from = 'name')
#' out
#' # extract subactivity from object
#' sapply(out, function(y) y$subactivity[1])
#'
#' # use CAS-numbers
#' aw_query("79622-59-6", type = 'cas')
#' aw_query("79622-59-6", from = 'cas')
#' }
#' @seealso \code{\link{build_aw_idx}}
aw_query <- function(query, type = c("commonname", "cas"), verbose = TRUE,
force_build = FALSE) {
aw_idx <- build_aw_idx(verbose = FALSE, force_build)
foo <- function(query, type = c("commonname", "cas"), verbose) {

aw_query <- function(query, from = c("name", "cas"), verbose = TRUE,
force_build = FALSE, type, ...) {
if (!missing(type)) {
message('"type" is deprecated. Please use "from" instead. ')
from <- type
}

if ("commonname" %in% from) {
warning('To search by compound name use "name" instead of "commonname"')
from <- "name"
}
from <- match.arg(from)
aw_idx <- build_aw_idx(verbose, force_build)

foo <- function(query, from, verbose) {
on.exit(suppressWarnings(closeAllConnections()))
type <- match.arg(type)

# search links in indexes
if (type == "commonname") {
if (from == "name") {
links <- aw_idx$links[aw_idx$source == "cn"]
names <- aw_idx$linknames[aw_idx$source == "cn"]
cname <- query
}

if (type == "cas") {
if (from == "cas") {
names <- aw_idx$names[aw_idx$source == "rn"]
# select only first link
links <- aw_idx$links[aw_idx$source == "rn"]
@@ -131,7 +145,7 @@ aw_query <- function(query, type = c("commonname", "cas"), verbose = TRUE,
source_url = source_url)
return(out)
}
out <- lapply(query, function(x) foo(x, type = type, verbose = verbose))
out <- lapply(query, function(x) foo(x, from = from, verbose = verbose))
out <- setNames(out, query)
class(out) <- c("aw_query", "list")
return(out)
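
The deprecation path used in `aw_query()` (accept the old `type` argument, redirect it to `from`, then validate with `match.arg()`) can be tried in isolation. The sketch below uses a hypothetical function `lookup()` that only echoes the resolved argument; it is not webchem code.

```r
# Hypothetical stand-in for a query function with a renamed argument.
lookup <- function(query, from = c("name", "cas"), type, ...) {
  # Old argument supplied: tell the user and carry the value over.
  if (!missing(type)) {
    message('"type" is deprecated. Please use "from" instead.')
    from <- type
  }
  # Old value supplied: warn and translate it to the new value.
  if ("commonname" %in% from) {
    warning('To search by compound name use "name" instead of "commonname"')
    from <- "name"
  }
  # Validate against the choices declared in the function signature.
  from <- match.arg(from)
  paste0(from, ":", query)
}

lookup("Fluazinam")                   # default resolves to "name"
lookup("79622-59-6", type = "cas")    # old argument still works, with a message
lookup("Fluazinam", from = "commonname")  # old value still works, with a warning
```

One consequence of this design, at least in the sketch: an abbreviated old value such as `'com'` is not caught by the `%in%` check and is rejected by `match.arg()`, since it does not partially match `"name"` or `"cas"`.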
110 changes: 62 additions & 48 deletions R/chebi.R
@@ -9,24 +9,26 @@
#' @importFrom stats setNames
#'
#' @param query character; search term.
#' @param from character; type of input, can be one of 'ALL', 'CHEBI ID',
#' 'CHEBI NAME', 'DEFINITION', 'ALL NAMES', 'IUPAC NAME', 'CITATIONS',
#' 'REGISTRY NUMBERS', 'MANUAL XREFS', 'AUTOMATIC XREFS', 'FORMULA', 'MASS',
#' 'MONOISOTOPIC MASS', 'CHARGE', 'INCHI/INCHI KEY', 'SMILES', 'SPECIES'.
#' @param match character; How should multiple hits be handled?,
#' \code{"all"} all matches are returned,
#' \code{"best"} the best matching (by the ChEBI searchscore) is returned,
#' \code{"ask"} enters an interactive mode and the user is asked for input,
#' \code{"na"} returns NA if multiple hits are found.
#' @param from character; type of input. \code{"all"} searches all types and
#' \code{"name"} searches all names. Other options include \code{'chebi id'},
#' \code{'chebi name'}, \code{'definition'}, \code{'iupac name'},
#' \code{'citations'}, \code{'registry numbers'}, \code{'manual xrefs'},
#' \code{'automatic xrefs'}, \code{'formula'}, \code{'mass'},
#' \code{'monoisotopic mass'},\code{'charge'}, \code{'inchi'},
#' \code{'inchikey'}, \code{'smiles'}, and \code{'species'}
#' @param match character; How should multiple hits be handled? \code{"all"}
#' all matches are returned, \code{"best"} the best matching (by the ChEBI
#' searchscore) is returned, \code{"first"} the first match is returned,
#' \code{"ask"} enters an interactive mode and the user is asked for input,
#' \code{"na"} returns NA if multiple hits are found.
#' @param max_res integer; maximum number of results to be retrieved from the
#' web service
#' @param stars character; type of input can be one of 'ALL', 'TWO ONLY',
#' 'THREE ONLY'.
#' web service
#' @param stars character; "three only" restricts results to those manually
#' annotated by the ChEBI team.
#' @param verbose logical; should a verbose output be printed on the console?
#' @param ... optional arguments
#' @param ... currently unused
#' @return returns a list of data.frames containing a chebiid, a chebiasciiname,
#' a searchscore and stars if matches were found.
#' If not, data.frame(NA) is returned
#' a searchscore and stars if matches were found. If not, data.frame(NA) is
#' returned
#'
#' @references Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V,
#' Turner S, Swainston N, Mendes P, Steinbeck C. (2016). ChEBI in 2016:
@@ -50,9 +52,9 @@
#' ChEBI: a database and ontology for chemical entities of biological
#' interest. Nucleic Acids Res. 36, D344–D350.
#' @references Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller,
#' Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical
#' Information from the Web. Journal of Statistical Software, 93(13).
#' <doi:10.18637/jss.v093.i13>.
#' Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical
#' Information from the Web. Journal of Statistical Software, 93(13).
#' <doi:10.18637/jss.v093.i13>.
#' @author Andreas Scharmüller, \email{andschar@@protonmail.com}
#' @export
#' @examples
@@ -67,24 +69,30 @@
#'
#' }
get_chebiid <- function(query,
from = 'ALL',
match = c("all", "best", "ask", "na"),
from = c('all', 'chebi id', 'chebi name', 'definition', 'name',
'iupac name', 'citations', 'registry numbers', 'manual xrefs',
'automatic xrefs', 'formula', 'mass', 'monoisotopic mass',
'charge', 'inchi', 'inchikey', 'smiles', 'species'),
match = c("all", "best", "first", "ask", "na"),
max_res = 200,
stars = 'ALL',
stars = c('all', 'two only', 'three only'),
verbose = TRUE,
...) {
match <- match.arg(match)
foo <- function(query, match, from, max_res, stars, verbose, ...) {
if (is.na(query)) return(data.frame(chebiid = NA_character_,
query = NA_character_,
stringsAsFactors = FALSE))
from_all <- c('ALL', 'CHEBI ID', 'CHEBI NAME', 'DEFINITION', 'ALL NAMES',
'IUPAC NAME', 'CITATIONS', 'REGISTRY NUMBERS', 'MANUAL XREFS',
'AUTOMATIC XREFS', 'FORMULA', 'MASS', 'MONOISOTOPIC MASS',
'CHARGE', 'INCHI/INCHI KEY', 'SMILES', 'SPECIES')
from <- match.arg(from, from_all)
stars_all <- c('ALL', 'TWO ONLY', 'THREE ONLY')
stars <- match.arg(stars, stars_all)
from <- toupper(match.arg(from))
if (from == "NAME") {
from <- "ALL NAMES"
}
if (from %in% c("INCHI", "INCHIKEY")) {
from <- "INCHI/INCHI KEY"
}

stars <- toupper(match.arg(stars))

foo <- function(query, from, match, max_res, stars, verbose, ...) {
if (is.na(query)) return(tibble("query" = NA_character_,
"chebiid" = NA_character_))

# query
url <- 'http://www.ebi.ac.uk:80/webservices/chebi/2.0/webservice'
headers <- c(Accept = 'text/xml',
@@ -115,12 +123,11 @@ get_chebiid <- function(query,
cont <- try(content(res, type = 'text/xml', encoding = 'utf-8'),
silent = TRUE)
out <- l2df(as_list(xml_children(xml_find_first(cont, '//d1:return'))))
out <- setNames(out, tolower(names(out)))
out <- as_tibble(setNames(out, tolower(names(out))))
if (nrow(out) == 0) {
message('No result found. \n')
return(data.frame(chebiid = NA_character_,
query = query,
stringsAsFactors = FALSE))
return(tibble("query" = query,
"chebiid" = NA_character_))
}
if (nrow(out) > 0) out$query <- query
if (nrow(out) == 1) return(out)
@@ -134,33 +141,40 @@ get_chebiid <- function(query,
return(out[which.max(out$searchscore), ])
}
if (match == "ask") {
matched <- chooser(out$chebiid, 'all')
matched <-
matcher(
out$chebiid,
query = query,
result = out$chebiasciiname,
match = "ask",
verbose = verbose
)
return(out[out$chebiid == matched, ])
}
if (match == 'na') {
return(data.frame(chebiid = NA_character_,
query = query,
stringsAsFactors = FALSE))
if (match == "na") {
return(tibble("query" = query,
"chebiid" = NA_character_))
}
if (match == "first") {
return(out[1, ])
}
} else {
out <- data.frame(chebiid = NA_character_,
query = query,
stringsAsFactors = FALSE)
out <- tibble("query" = query,
"chebiid" = NA_character_)
message('Returning NA (', http_status(res)$message, '). \n')

return(out)
}
}
out <- lapply(query,
foo,
match = match,
from = from,
match = match,
max_res = max_res,
stars = stars,
verbose = verbose)
out <- setNames(out, query)
out <- as_tibble(bind_rows(out))
return(out)
out <- bind_rows(out)
return(dplyr::select(out, "query", "chebiid", everything()))
}
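
In `get_chebiid()`, the user-facing lower-case `from` values are translated to the upper-case categories sent to the web service. Because `from` is upper-cased first, any later comparison has to use upper-case strings. A minimal sketch of that mapping follows, using a hypothetical helper `normalize_from()` with an abbreviated list of choices.

```r
# Hypothetical helper; the choices are a shortened subset for illustration.
normalize_from <- function(from = c("all", "name", "inchi", "inchikey", "smiles")) {
  # match.arg() validates the input, toupper() converts it afterwards
  from <- toupper(match.arg(from))
  if (from == "NAME") {
    from <- "ALL NAMES"
  }
  # Compare against upper-case values, since `from` is already upper-cased.
  if (from %in% c("INCHI", "INCHIKEY")) {
    from <- "INCHI/INCHI KEY"
  }
  from
}

normalize_from()            # "ALL"
normalize_from("name")      # "ALL NAMES"
normalize_from("inchikey")  # "INCHI/INCHI KEY"
```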

