Add sample data script and corresponding .rda files #8

Merged
merged 7 commits into from
Mar 29, 2024
Changes from 2 commits
3 changes: 3 additions & 0 deletions DESCRIPTION
@@ -15,3 +15,6 @@ Config/testthat/edition: 3
URL: https://github.com/Infectious-Disease-Modeling-Hubs/hubExamples
BugReports: https://github.com/Infectious-Disease-Modeling-Hubs/hubExamples/issues
Config/Needs/website: Infectious-Disease-Modeling-Hubs/hubStyle
Depends:
    R (>= 2.10)
LazyData: true
65 changes: 65 additions & 0 deletions R/data.R
@@ -0,0 +1,65 @@
#' Forecast outputs
#'
#' Example forecast data that represents model outputs from a hub. In this case, the forecast data
#' represent three influenza-related targets (wk inc flu hosp, wk flu hosp rate category,
#' and wk flu hosp rate) for two reference dates in 2022.
#'
#' @format ## `forecast_outputs`
#' A data frame with 5,424 rows and 9 columns:
#' \describe{
#' \item{location}{FIPS code identifying a location}
#' \item{reference_date}{the starting point of the forecast in yyyy-mm-dd format}
#' \item{horizon}{number of units ahead being forecasted (weeks, in this case)}
#' \item{target_end_date}{the date of occurrence of the outcome of interest in yyyy-mm-dd format;
#' this can be calculated directly from the `reference_date` and `horizon`
#' as follows: `target_end_date = reference_date + 7*horizon`}
#' \item{target}{a unique identifier for the target}
#' \item{output_type}{the type of representation of the prediction}
#' \item{output_type_id}{more identifying information specific to the output type;
#' output_type_id is not relevant for every kind of output_type (for example,
#' hubs will not expect output_type_id values when the output_type is mean or median)}
#' \item{value}{the model’s prediction}
#' \item{model_id}{the name of the model}
Contributor Author

@elray1 question about model_id: is that a column we'd expect to see in a hub's model output data? I thought we derived it from the filename.

Contributor

It is true that when the data are sitting in a hub, the model_id is encoded in the file name. But when we collect the data into a data frame in a working R (or, in the future, Python) session, the model_id is added to the data. The intent of this example object is to represent what a user might get after running collect_hub(). (Maybe we should say that in this documentation.)

Contributor Author

Ah, makes sense--thank you for that clarification. Just pushed a commit with that note.
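
The point made in this thread, that model_id lives in the file name while the data sit in a hub and only becomes a column once the data are collected, can be sketched in base R. This is an illustration only and not how hubData implements collection: the file paths, the team and model names, and the helper `add_model_id()` are all hypothetical, and the sketch assumes a `model-output/<model_id>/<file>` layout.

```r
# Hypothetical in-memory stand-ins for two model output files in a hub.
# Paths, model names, and values are made up for illustration.
files <- list(
  "model-output/teamA-model1/2022-11-19-teamA-model1.csv" = data.frame(
    location = c("25", "48"),
    output_type = "median",
    value = c(10, 20)
  ),
  "model-output/teamB-model2/2022-11-19-teamB-model2.csv" = data.frame(
    location = c("25", "48"),
    output_type = "median",
    value = c(12, 18)
  )
)

# Assumption: the model_id is the name of the directory that holds the file.
add_model_id <- function(path, df) {
  df$model_id <- basename(dirname(path))
  df
}

# Attach model_id to each file's rows, then stack everything into one
# data frame, roughly the shape a user sees after collecting hub data.
collected <- do.call(
  rbind,
  unname(Map(add_model_id, names(files), files))
)
```

Here `collected` has one row per prediction and a `model_id` column that did not exist in any individual file.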

#' ...
#' }
#' @source <https://github.com/Infectious-Disease-Modeling-Hubs/example-complex-forecast-hub/>
"forecast_outputs"

#' Forecast target time series
#'
#' Example time series target data from a hub that predicts influenza-related targets.
#'
#' @format ## `forecast_target_ts`
#' A data frame with 10,255 rows and 3 columns:
#' \describe{
#' \item{date}{the date of the target observation in yyyy-mm-dd format}
#' \item{location}{FIPS code identifying a location}
#' \item{value}{the value of the target's observations}
#' ...
#' }
#' @source <https://github.com/Infectious-Disease-Modeling-Hubs/example-complex-forecast-hub/>
"forecast_target_ts"

#' Forecast target values
#'
#' Example target data that represents the source of "truth" that model output data
#' will be scored against. This example represents influenza-related targets.
#'
#' @format ## `forecast_target_values`
#' A data frame with 198,485 rows and 6 columns:
#' \describe{
#' \item{location}{FIPS code identifying a location}
#' \item{target_end_date}{the target's observation date in yyyy-mm-dd format;
#' this is used to match on the `target_end_date` field in model output data
#' submitted to the hub}
#' \item{target}{a unique identifier for the target}
#' \item{output_type}{the type of representation of the prediction}
#' \item{output_type_id}{more identifying information specific to the output type;
#' as in the model output data, output_type_id is not relevant for output_type
#' of mean and median; target data that represents quantile output_type will
#' not have an output_type_id.}
#' \item{value}{the value of the target's observations}
#' ...
#' }
#' @source <https://github.com/Infectious-Disease-Modeling-Hubs/example-complex-forecast-hub/>
"forecast_target_values"

Check warning on line 65 in R/data.R

GitHub Actions / lint

file=R/data.R,line=65,col=25,[trailing_blank_lines_linter] Missing terminal newline.
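
The documentation in R/data.R states that `target_end_date = reference_date + 7*horizon` and that target data are matched to model output on `target_end_date`. A minimal base R sketch of both steps follows. The rows are toy data: the reference dates are the two kept in this PR, but the horizons, the observed values, and the `observation` column name are illustrative assumptions.

```r
# Toy forecast rows; horizons are in weeks.
forecasts <- data.frame(
  reference_date = as.Date(c("2022-11-19", "2022-11-19", "2022-12-17")),
  horizon = c(0L, 1L, 3L)
)

# Adding a number to a Date adds that many days, so 7 * horizon
# shifts the reference date forward by whole weeks.
forecasts$target_end_date <- forecasts$reference_date + 7 * forecasts$horizon

# Hypothetical observed values keyed by the date they occurred on.
observed <- data.frame(
  target_end_date = as.Date(c("2022-11-26", "2023-01-07")),
  observation = c(100, 250)
)

# Match forecasts to observations on target_end_date; forecasts without
# a matching observation are dropped by this inner join.
matched <- merge(forecasts, observed, by = "target_end_date")
```

In the packaged data the join would also need `location` and `target` as keys; they are left out here to keep the date arithmetic in focus.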
35 changes: 35 additions & 0 deletions data-raw/generate_example_forecast_data.R
@@ -0,0 +1,35 @@
## code to prepare the `forecast_outputs`, `forecast_target_ts`, and `forecast_target_values` datasets


library(distfromq)
library(dplyr)
library(ggplot2)
library(hubData)
library(readr)

hub_path <- "~/code/example-complex-forecast-hub"
# Read all model output from a local clone of the example hub into memory.
forecast_outputs <- hubData::connect_hub(hub_path) |>
  dplyr::collect()

# Keep a small subset: seven quantile levels, two reference dates, and
# two locations. Rows with a non-quantile output_type are kept as-is.
q_lvls_keep <- c("0.05", "0.1", "0.25", "0.5", "0.75", "0.9", "0.95")
d_keep <- c("2022-11-19", "2022-12-17")
forecast_outputs <- forecast_outputs |>
  dplyr::filter(
    location %in% c("25", "48"),
    (output_type != "quantile" |
      (output_type == "quantile" & output_type_id %in% q_lvls_keep)
    ),
    reference_date %in% d_keep
  )

# Target data are read unfiltered from the hub's target-data directory.
target_ts_data_path <- file.path(hub_path, "target-data", "time-series.csv")
forecast_target_ts <- read_csv(target_ts_data_path) |>
  as.data.frame()

target_values_data_path <- file.path(hub_path, "target-data", "target-values.csv")
forecast_target_values <- read_csv(target_values_data_path) |>
  as.data.frame()

usethis::use_data(forecast_outputs, overwrite = TRUE)
usethis::use_data(forecast_target_ts, overwrite = TRUE)
usethis::use_data(forecast_target_values, overwrite = TRUE)
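
To see what the filter in this script keeps and drops, here is a small base R re-expression of the same conditions on toy rows. It is a sketch, not part of the PR: the data frame `toy` is made up, and `reference_date` is stored as character here so that the comparison against `d_keep` does not depend on how a given R version matches dates to strings.

```r
# Toy model output rows (made up for illustration).
toy <- data.frame(
  location = c("25", "25", "25", "48", "06"),
  reference_date = c(
    "2022-11-19", "2022-11-19", "2022-11-19", "2022-12-17", "2022-11-19"
  ),
  output_type = c("quantile", "quantile", "mean", "quantile", "quantile"),
  output_type_id = c("0.5", "0.01", NA, "0.95", "0.5"),
  stringsAsFactors = FALSE
)

q_lvls_keep <- c("0.05", "0.1", "0.25", "0.5", "0.75", "0.9", "0.95")
d_keep <- c("2022-11-19", "2022-12-17")

# Same three conditions as the dplyr::filter() call, combined with `&`.
# `%in%` returns FALSE (not NA) for a missing output_type_id, so the
# "mean" row passes through the first branch of the `|`.
keep <- toy$location %in% c("25", "48") &
  (toy$output_type != "quantile" | toy$output_type_id %in% q_lvls_keep) &
  toy$reference_date %in% d_keep

toy_subset <- toy[keep, ]
```

Of the five toy rows, the quantile row at level 0.01 and the row for location "06" are dropped; the other three remain.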
Binary file added data/forecast_outputs.rda
Binary file not shown.
Binary file added data/forecast_target_ts.rda
Binary file not shown.
Binary file added data/forecast_target_values.rda
Binary file not shown.