Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check coverage across webchem #264

Merged
merged 43 commits into from
Jul 8, 2020
Merged

check coverage across webchem #264

merged 43 commits into from
Jul 8, 2020

Conversation

Aariq
Copy link
Collaborator

@Aariq Aariq commented Jun 11, 2020

This adds two functions that integrate multiple data sources in webchem. autotranslate() is a wrapper that you can give a query of any type and the name of a webchem function and it will attempt to translate the query into an identifier the function accepts.

check_coverage() runs a get_*() or *_query() from most of the data sources in webchem and returns a tibble indicating which compounds got hits in each data source.

Currently it also has the option to plot the results in a sort of table/heatmap, but the plot is made with ggplot2. I need help translating that code to base R, because it doesn't make sense to add ggplot2 as a dependency at this time.

example

minor changes

To make all this happen, I fixed some bugs and made the from= argument more consistent across the package. (closes #262 and #263)

PR task list:

  • Update NEWS
  • Add tests (if appropriate)
  • Update documentation with devtools::document()
  • Check package passed

@Aariq
Copy link
Collaborator Author

Aariq commented Jun 12, 2020

I should also note that these functions rely on CTS, which currently just sometimes fail for no apparent reason. Maybe switching to the REST API will help (#257)?

Copy link
Contributor

@stitam stitam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Aariq. It's quite a big one, I guess I'll review it in chunks. So far I just tried the check_coverage() function and it's great! I was wondering if we could find a better name for it, I tend to associate the term "coverage" with test coverage. Just some ideas: since it returns booleans, instead of a command like get() or check(), it could be a question, e.g. is_included(). Or if we go with the command, maybe lookup() to associate more with looking it up in the register?

Do you imagine this function to be exported on the long term? If yes, then plotting is a good idea. I personally imagined this function to be used by other integrator functions and maybe not even exported at some point, in that case plotting is not really necessary. I'll look into the base graphics, I agree, it would be better not to import another package.

@Aariq
Copy link
Collaborator Author

Aariq commented Jun 16, 2020

As for the name, I'm not tied to check_coverage(). I like lookup() or perhaps has_entry()?

Do you imagine this function to be exported on the long term?

I wrote this intending it to be exported and used directly by a user with a list of compounds wanting to get some common bit of data (e.g. molecular weight), but without an idea of which of several data sources to use. But I'm actually not sure how useful this is, which is why I asked about it on Slack. Perhaps this is just a rough sketch of some future function that integrates across data sources?

@Aariq
Copy link
Collaborator Author

Aariq commented Jun 19, 2020

How about using the plot.matrix package instead of ggplot2 as a compromise? It is a much lighter weight, single purpose package, and I could add it as a Suggests rather than a Imports and prompt the user to install only if they want to plot the results.

@Aariq
Copy link
Collaborator Author

Aariq commented Jun 22, 2020

The tests are failing because CTS just fails randomly sometimes. Not sure if I should include some kind of workaround for this in the autotranslate() function or just ignore the failing tests and hope that CTS gets fixed sometime?

@Aariq
Copy link
Collaborator Author

Aariq commented Jun 23, 2020

I went ahead and addressed #263 since it seemed in the spirit of the PR.

Copy link
Contributor

@stitam stitam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Aariq here is the next chunk:) thanks for updating the name for has_entry() I think it better reflects to purpose of the function. Also data.matrix, I agree that a lightweight package is better for our needs. I hope it will work in Suggests but if not I think we can live with it in Imports as well.

NEWS.md Show resolved Hide resolved
R/alanwood.R Outdated Show resolved Hide resolved
R/alanwood.R Outdated Show resolved Hide resolved
tests/testthat/test-alanwood.R Show resolved Hide resolved
R/chebi.R Outdated Show resolved Hide resolved
R/chebi.R Show resolved Hide resolved
R/chebi.R Outdated Show resolved Hide resolved
R/chemspider.R Outdated Show resolved Hide resolved
R/chemspider.R Outdated Show resolved Hide resolved
R/cts.R Outdated Show resolved Hide resolved
Copy link
Contributor

@stitam stitam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Aariq, added a few more comments. I have reviewed everything but integration.R, I'll review it in another session. Thanks for harmonizing the syntax I think this will make webchem much easier to use!

R/etox.R Show resolved Hide resolved
R/pan.R Show resolved Hide resolved
tests/testthat/test-pubchem.R Show resolved Hide resolved
Copy link
Contributor

@stitam stitam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few thoughts for autotranlate(). Thanks for introducing rlang, great stuff!

R/integration.R Outdated Show resolved Hide resolved
R/integration.R Outdated Show resolved Hide resolved
R/integration.R Outdated Show resolved Hide resolved
R/integration.R Show resolved Hide resolved
Copy link
Contributor

@stitam stitam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is almost ready to be merged! I like the functionality of has_entry() and I think the name is OK, but maybe we could think a little more about that (it's going to stay with us for quite a while). I contemplated the benefit of autotranslate() for a long time. I understand it's necessary for has_enrty() and that's fine, but I don't think it should be exported.

R/integration.R Outdated Show resolved Hide resolved
R/integration.R Outdated Show resolved Hide resolved
R/integration.R Show resolved Hide resolved
R/integration.R Outdated Show resolved Hide resolved
@stitam stitam merged commit f33a964 into ropensci:master Jul 8, 2020
@stitam stitam added this to the RC2019F milestone Sep 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

standardize "from"
2 participants