Commit

adds vignettes for documenting data and publishing functions (drafts)
francojc committed Dec 7, 2024
1 parent e646432 commit b64c179
Showing 10 changed files with 71 additions and 38 deletions.
2 changes: 1 addition & 1 deletion .Rbuildignore
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@
^LICENSE\.md$
^pkgdown$
^README\.Rmd$
^vignettes/articles$
^vignettes/artifacts$
^\.aider\.tags\.cache\.v3$
^\.aider\.chat\.history\.md$
^\.aider\.input\.history$
2 changes: 2 additions & 0 deletions .gitignore
@@ -26,3 +26,5 @@ Library/
.lintr
.aider*
.env
/doc/
/Meta/
11 changes: 10 additions & 1 deletion NEWS.md
@@ -1,8 +1,17 @@
# qtkit 1.1.0

- Bug fix (#7): `calc_type_metrics()` now correctly allows for the `type` and `document` arguments to be specified as symbols that can take values other than type and document.
New features:

- Adds `curate_enntt_data()` to curate the ENNTT data downloaded from GitHub [here](https://github.com/senisioi/enntt-release).
- Adds `curate_swda_data()` to curate the SWDA data downloaded from the LDC [here](https://catalog.ldc.upenn.edu/docs/LDC97S62/swb1_dialogact_annot.tar.gz).

Bug fixes:

- Bug fix (#7): `calc_type_metrics()` now correctly allows for the `type` and `document` arguments to be specified as symbols that can take values other than type and document.

Enhancements:

- Adds vignettes for documenting data and using the publishing functions.
- Removes many external dependencies from the package.

# qtkit 1.0.0
6 changes: 6 additions & 0 deletions vignettes/artifacts/mtcars_4cyl
@@ -0,0 +1,6 @@
structure(list(mpg = c(22.8, 24.4, 22.8, 32.4, 30.4, 33.9, 21.5,
27.3, 26, 30.4, 21.4), wt = c(2.32, 3.19, 3.15, 2.2, 1.615, 1.835,
2.465, 1.935, 2.14, 1.513, 2.78), hp = c(93, 62, 95, 66, 52,
65, 97, 66, 91, 113, 109)), row.names = c("Datsun 710", "Merc 240D",
"Merc 230", "Fiat 128", "Honda Civic", "Toyota Corolla", "Toyota Corona",
"Fiat X1-9", "Porsche 914-2", "Lotus Europa", "Volvo 142E"), class = "data.frame")
Binary file added vignettes/artifacts/mtcars_plot.png
Binary file added vignettes/artifacts/mtcars_plot_styled.png
Binary file added vignettes/artifacts/mtcars_styled.png
Binary file added vignettes/artifacts/mtcars_table.png
28 changes: 15 additions & 13 deletions vignettes/document.Rmd
@@ -1,8 +1,8 @@
---
title: "Documenting Datasets with qtkit"
title: "Documenting Datasets"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Documenting Datasets with qtkit}
%\VignetteIndexEntry{Documenting Datasets}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
@@ -14,7 +14,7 @@ knitr::opts_chunk$set(
)
```

```{r setup, include=FALSE}
```{r setup, message=FALSE, warning=FALSE}
library(qtkit)
library(fs)
library(tibble)
@@ -38,7 +38,7 @@ Let's start with documenting the built-in `mtcars` dataset:

```{r}
# Create a temporary file for our documentation
origin_file <- fs::file_temp(ext = "csv")
origin_file <- file_temp(ext = "csv")
# Create the origin documentation template
origin_doc <- create_data_origin(
@@ -47,14 +47,16 @@ origin_doc <- create_data_origin(
)
# View the template
origin_doc %>%
origin_doc |>
glimpse()
```

The template provides fields for essential metadata. Here's how you might fill it out for `mtcars`:
The template provides fields for essential metadata. You can either open the CSV file in a spreadsheet editor or fill it out programmatically, as shown below.

Here's how you might fill it out for `mtcars`:

```{r}
origin_doc %>%
origin_doc |>
mutate(description = c(
"Motor Trend Car Road Tests",
"Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.",
@@ -64,7 +66,7 @@ origin_doc %>%
"Single data frame with 32 observations of 11 variables",
"Public Domain",
"Citation: Henderson and Velleman (1981)"
)) %>%
)) |>
write_csv(origin_file)
```

@@ -87,7 +89,7 @@ Create a basic data dictionary without AI assistance:

```{r}
# Create a temporary file for our dictionary
dict_file <- fs::file_temp(ext = "csv")
dict_file <- file_temp(ext = "csv")
# Generate dictionary for iris dataset
iris_dict <- create_data_dictionary(
@@ -96,7 +98,7 @@ iris_dict <- create_data_dictionary(
)
# View the results
iris_dict %>%
iris_dict |>
glimpse()
```

@@ -111,7 +113,7 @@ Sys.setenv(OPENAI_API_KEY = "your-api-key")
iris_dict_ai <- create_data_dictionary(
data = iris,
file_path = dict_file,
model = "gpt-3.5-turbo",
model = "gpt-4",
sample_n = 5
)
```
@@ -136,10 +138,10 @@ tibble(
For larger datasets, you can use sampling and grouping:

```{r, eval=FALSE}
diamonds_dict <- diamonds %>%
diamonds_dict <- diamonds |>
create_data_dictionary(
file_path = "diamonds_dict.csv",
model = "gpt-3.5-turbo",
model = "gpt-4",
sample_n = 3,
grouping = "cut" # Sample across different cut categories
)
60 changes: 37 additions & 23 deletions vignettes/reports.Rmd
@@ -1,18 +1,18 @@
---
title: "Publishing Analysis Artifacts with qtkit"
title: "Publishing Analysis Artifacts"
output: rmarkdown::html_vignette
resource_files:
- artifacts/
vignette: >
%\VignetteIndexEntry{Publishing Analysis Artifacts with qtkit}
%\VignetteIndexEntry{Publishing Analysis Artifacts}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "figures/",
out.width = "100%"
comment = "#>"
)
```

@@ -45,7 +45,7 @@ These functions handle common tasks like:

Let's create and save a simple table using `write_kbl()`:

```{r}
```{r mtcars_table}
# Create a basic table
mtcars_table <- mtcars[1:5, 1:4] |>
kable(format = "html") |>
@@ -55,22 +55,16 @@ mtcars_table <- mtcars[1:5, 1:4] |>
write_kbl(
kbl_obj = mtcars_table,
file = "mtcars_table",
target_dir = "figures",
target_dir = "artifacts",
device = "png"
)
```

The table is saved as a PNG file and can then be included in reports or presentations. For example, the `include_graphics()` function from {knitr} can be used to include the image in an R Markdown document:

```{r mtcars_table_show}
knitr::include_graphics("figures/mtcars_table.png")
```

## Customizing Table Output

You can customize the output format and styling:

```{r}
```{r mtcars_table_styled}
mtcars_table <- mtcars[1:5, 1:4] |>
kable(format = "html") |>
kable_styling(
@@ -82,8 +76,8 @@ mtcars_table <- mtcars[1:5, 1:4] |>
write_kbl(
kbl_obj = mtcars_table,
file = "mtcars_styled",
target_dir = "figures",
device = "pdf",
target_dir = "artifacts",
device = "png",
bs_theme = "flatly"
)
```
@@ -94,7 +88,7 @@ write_kbl(

Save ggplot2 plots with `write_gg()`:

```{r}
```{r mtcars_plot}
# Create a basic plot
p <- ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
@@ -105,15 +99,15 @@ write_gg(
gg_obj = p,
file = "mtcars_plot",
target_dir = "artifacts",
device = "pdf"
device = "png"
)
```

## Customizing Plot Output

Add custom themes and formatting:

```{r}
```{r mtcars_plot_styled}
p <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
geom_point(size = 3) +
theme_minimal() +
@@ -141,7 +135,7 @@ write_gg(

Use `write_obj()` to save R objects for later use:

```{r}
```{r mtcars_subset}
# Create a filtered dataset
mtcars_subset <- mtcars |>
filter(cyl == 4) |>
@@ -155,15 +149,35 @@ write_obj(
)
```

## Reading Saved Objects
## Reading Saved Artifacts

These objects have all been saved in the `artifacts` directory, as seen below:

```{r artifacts_list}
fs::dir_tree("artifacts")
```

Tables and plots saved with {qtkit} can be easily restored. For image formats, use the `knitr::include_graphics()` function:

```{r mtcars_plot_restore}
# Display the saved plot
knitr::include_graphics("artifacts/mtcars_plot.png")
```

Objects can be read back using `dget()`:

```{r}
```{r mtcars_subset_restore}
# Read the saved object
restored_data <- dget("./figures/mtcars_4cyl")
restored_data <- dget("artifacts/mtcars_4cyl")
# Check the restored object
glimpse(restored_data)
```

> [!TIP]
> Note that tables and plots need not be saved with the `write_kbl()` and `write_gg()` functions to be restored. These objects can also be saved with `write_obj()` and restored with `dget()`.
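
A minimal base-R sketch of that round trip, assuming `write_obj()` writes its output in `dput()` format (the `structure(...)` text in `vignettes/artifacts/mtcars_4cyl` suggests this, but the commit does not show the function body). The sketch calls `dput()` directly instead of `write_obj()`, and the temporary file is a stand-in path, not one used by the vignette:

```r
# Subset mtcars in base R (4-cylinder cars, three columns)
mtcars_subset <- mtcars[mtcars$cyl == 4, c("mpg", "wt", "hp")]

# Write a text representation of the object; tempfile() is a stand-in path
obj_path <- tempfile("mtcars_4cyl")
dput(mtcars_subset, file = obj_path)

# Read the object back by evaluating the saved text
restored_data <- dget(obj_path)

# Compare the restored object with the original
all.equal(restored_data, mtcars_subset)
```

Because the saved file is plain R source, it can be inspected in a text editor or under version control, which is likely why the artifact shows up as readable text in this commit while the PNG files do not.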

# Best Practices

1. Use consistent naming conventions
0 comments on commit b64c179