Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sample data script and corresponding .rda files #8

Merged
merged 7 commits into from
Mar 29, 2024

Conversation

bsweger
Copy link
Contributor

@bsweger bsweger commented Mar 27, 2024

Resolves #4.

This changeset generates sample data files using data from the complex-example-forecast-hub. For this first pass, the samples are generated from local files (i.e., a cloned copy of complex-example-forcast-hub).

Once complex-example-forecast-hub is fully onboarded to the cloud, we'll pull from there instead.

Resolves #4. This changeset generates sample data files using
data from the complex-example-forecast-hub. For this first pass,
the samples are generated from local files (i.e., a cloned copy
of complex-example-forcast-hub). Once complex-example-forecast-hub
is fully onboarded to the cloud, we'll pull from there instead.
@bsweger bsweger requested a review from elray1 March 27, 2024 20:23
@elray1
Copy link
Contributor

elray1 commented Mar 27, 2024

R CMD check is demanding that we document our data sets. I think we could pull from here, with some light adaptations. Should I do that, @bsweger, or would you like to?

@bsweger
Copy link
Contributor Author

bsweger commented Mar 27, 2024

R CMD check is demanding that we document our data sets. I think we could pull from here, with some light adaptations. Should I do that, @bsweger, or would you like to?

I can take a stab at it, but you will likely have some wording revisions. Does that information go at the top of generate_example_forecast_data.R?

@elray1
Copy link
Contributor

elray1 commented Mar 27, 2024

It should go in a file called data.R in an R directory in this repository. Here's the description of how to document data sets in the R Packages book: https://r-pkgs.org/data.html#sec-documenting-data

These are descriptions are based on similar ones here:
https://github.com/Infectious-Disease-Modeling-Hubs/hubEnsembles/blob/main/R/data.R

They will likely require some edits, but this should be a reasonable start.
Copy link
Contributor

@elray1 elray1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding an actual/formal review to request 2 changes:

  • Add documentation for the data objects. As (partially) discussed in some misc. comments, this will involve adding a data.R file that documents the data objects and also running devtools::document() to add the documentation to the R package.
  • Since we did this, we've decided on two other minor updates to the target values data format. It would be nice to get those updates in this PR as well. Those changes would be made first in the example-complex-forecast-hub repository with the source data, see this issue.

@bsweger
Copy link
Contributor Author

bsweger commented Mar 29, 2024

  • umentation for the data objects. As (partially) discussed in some misc. comments, this will involve adding a data.R file that documents the data objects and also running devtools::document() to add the documentation to the R package.

  • Since we did this, we've decided on two other minor updates to the target values data format. It would be nice to get those updates in this PR as well. Those changes would be made first in the example-complex-forecast-hub repository with the source data, see this issue.

@elray1 just pushed a commit for the first note, and I'm expecting to do some updates on that documentation based on your feedback

For the second issue, in the spirit of higher velocity and/orsmaller changes, my vote is to get this PR merged and then tackle the issue you linked as a separate piece of work (which I'm happy to do as a fast follow)

It's okay to change our minds/evolve, but I don't like keeping PRs open for long periods of time when we do so. It likely doesn't matter for hubExample, but if this repo was under active development with many devs, lingering PRs increase the liklihood of merge conflicts and other annoyances.

#' output_type_id is not relevant for every kind of output_type (for example,
#' hubs will not expect output_type_id values when the output_type is mean or median}
#' \item{value}{the model’s prediction}
#' \item{model_id}{the name of the model}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elray1 question about model_id: is that a column we'd expect to see in a hub's model output data? I thought we derived it from the filename.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is true that when the data are sitting in a hub, the model_id is encoded in the file name. But when we collect the data into a data frame in a working R (or in the future, python) session, the model_id is added into the data. And the intent of this example object is to represent what a user might get after running collect_hub(). (Maybe we should say that in this documentation.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense--thank you for that clarification. Just pushed a commit with that note.

@elray1
Copy link
Contributor

elray1 commented Mar 29, 2024

I'm approving this PR, with these notes:

@bsweger I'll leave it to you to do something about the first point here or not and then merge this?

bsweger added 3 commits March 29, 2024 14:01
Rightly or wrongly, we've decided that this repo doesn't need
tests. However, the lack of them is causing a CI failure.
This commit is to see what happens if we just remove the tests
directory.
Copy link

codecov bot commented Mar 29, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@bsweger bsweger merged commit 431fa24 into main Mar 29, 2024
7 checks passed
@bsweger bsweger deleted the bsweger/create-sample-hub-data branch March 29, 2024 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set up and document example data objects based on example-complex-forecast-hub
2 participants