From d05de1a00995ab5eb09c3dc10cbba712356770b1 Mon Sep 17 00:00:00 2001 From: hneth Date: Thu, 1 Jun 2023 16:59:18 +0200 Subject: [PATCH 1/7] revise vignette: remove example on my.tree, but mention corresponding tasks and vignette --- vignettes/FFTrees_heart.Rmd | 42 +++++++++++++------------------------ 1 file changed, 15 insertions(+), 27 deletions(-) diff --git a/vignettes/FFTrees_heart.Rmd b/vignettes/FFTrees_heart.Rmd index 64555834..64be02c5 100644 --- a/vignettes/FFTrees_heart.Rmd +++ b/vignettes/FFTrees_heart.Rmd @@ -142,7 +142,8 @@ For definitions of all accuracy statistics, see the [accuracy statistics](FFTree ### Step\ 4: Visualise the final FFT -Use `plot()` to visualize an FFT (an `FFTrees` object): +We use `plot(x)` to visualize an FFT (from an\ `FFTrees` object\ `x`). +Using `data = "train"` evaluates an\ FFT for training data (fitting), whereas `data = "test"` predicts the performance of an\ FFT for a different dataset: ```{r fft-plot, fig.width = 6.5, fig.height = 6} # Plot predictions of the best FFT when applied to test data: @@ -152,9 +153,12 @@ plot(heart.fft, # An FFTrees object #### Other arguments +The `plot()` function for `FFTrees` objects accepts the following additional arguments: + - `tree`: Which tree in the object should beplotted? To plot a tree other than the best fitting tree (FFT \#1), just specify another tree as an integer (e.g.; `plot(heart.fft, tree = 2)`). -- `data`: For which dataset should statistics be shown? Either `data = "train"` (showing fitting or "Training" performance by default), or `data = "test"` (showing prediction or "Testing" performance). +- `data`: For which dataset should statistics be shown? +Either `data = "train"` (showing fitting or "Training" performance by default), or `data = "test"` (showing prediction or "Testing" performance). - `stats`: Should accuracy statistics be shown with the tree? To show only the tree, without any performance statistics, include the argument `stats = FALSE`. 
@@ -166,7 +170,9 @@ plot(heart.fft, what = "tree") - `comp`: Should statistics from competitive algorithms be shown in the ROC curve? To remove the performance statistics of competitive algorithms (e.g.; regression, random forests), include the argument `comp = FALSE`. -- `what`: To show individual cue accuracies (in ROC space), include the argument `what = "cues"`: +- `what`: Which parts of an `FFTrees` object should be visualized (e.g., `all`, `icontree` and `tree`). +Using `what = "roc"` plots tree performance as an ROC curve. +To show individual cue accuracies (in ROC space), include the argument `what = "cues"`: ```{r fft-cues, fig.width = 6, fig.height = 6, out.width = "500px"} # Plot cue accuracies (for training data) in ROC space: @@ -187,6 +193,7 @@ An `FFTrees` object contains many different outputs, to see them all, run `names names(heart.fft) ``` + #### Predicting for new data To predict classifications for a new dataset, use the standard `predict()` function. @@ -198,32 +205,13 @@ predict(heart.fft, newdata = heartdisease) ``` -#### Defining FFTs in words -To define a specific FFT and apply it to data, we can define a tree by providing its verbal description to the `my.tree` argument: - -```{r fft-my-tree, results = 'hide'} -# Create an FFT manually (from description): -my.heart.fft <- FFTrees(formula = diagnosis ~., - data = heart.train, - data.test = heart.test, - main = "My Heart FFT", - my.tree = "If chol > 350, predict True. - If cp != {a}, predict False. - If age <= 35, predict False, otherwise, predict True.") -``` - -Running this code evaluates `my.tree` for the specified sets of data. -A visualization of the resulting tree shows its performance summary -(for the training data): - -```{r plot-my-fft, fig.width = 6.5, fig.height = 6} -plot(my.heart.fft, data = "train") -``` +#### Directly defining FFTs -The resulting tree is actually not too bad, although its first node is pretty useless (as it only classifies 3\ cases, all as false alarms). 
-Thus, omitting the first node will result in an even simpler FFT that cannot be worse. -Feel free to verify this ---\ and see the [Manually specifying FFTs](FFTrees_mytree.html) vignette for additional details on defining FFTs from verbal or abstract descriptions. +To define a specific FFT and apply it to data, we can define a tree by providing its verbal description to the `my.tree` argument. +Similarly, we can define sets of FFT definitions (as a data frame) and evaluate them on data by using the `tree.definitions` argument of `FFTrees()`. +As we often start from an existing set of FFTs, **FFTrees** provides a set of functions for extracting, converting, and modifying tree definitions. +(See the vignette on [Manually specifying FFTs](FFTrees_mytree.html) for defining FFTs from descriptions and modifying tree definitions.) ## Vignettes From e824615eb6d572183e95c11c672ab6837308b0bc Mon Sep 17 00:00:00 2001 From: hneth Date: Thu, 1 Jun 2023 17:00:40 +0200 Subject: [PATCH 2/7] minor --- README.Rmd | 11 ++++------- README.md | 29 ++++++++++++++--------------- 2 files changed, 18 insertions(+), 22 deletions(-) diff --git a/README.Rmd b/README.Rmd index 5ab90f37..2f17aa64 100644 --- a/README.Rmd +++ b/README.Rmd @@ -197,13 +197,11 @@ plot(heart_fft, heart_fft$competition$test ``` - ### Building FFTs from verbal descriptions -FFTs are so simple that we even can create them 'from words' and then apply them to data! - +FFTs are so simple that we even can create them 'from words' and then apply them to data. For example, let's create a tree with the following three nodes and evaluate its performance on the `heart.test` data: 1. If `sex = 1`, predict _Disease_. @@ -235,11 +233,10 @@ plot(my_fft, ![An FFT created from a verbal description.](man/figures/README-example-heart-verbal-1.png) **Figure\ 2**: An FFT predicting heart disease created from a verbal description. 
- -As we can see, this particular tree is somewhat biased: +The performance measures (in the bottom panel of **Figure\ 2**) show that this particular tree is somewhat biased: It has nearly perfect _sensitivity_ (i.e., is good at identifying cases of _Disease_) but suffers from low _specificity_ (i.e., performs poorly in identifying _Healthy_ cases). Expressed in terms of its errors, `my_fft` incurs few misses at the expense of many false alarms. -Although the _accuracy_ of our custom tree still exceeds the data's baseline by a fair amount, the FFTs in `heart_fft` (from above) strike a better balance. +Although the _accuracy_ of our custom tree still exceeds the data's baseline by a fair amount, the FFTs in `heart_fft` (created above) strike a better balance. @@ -249,7 +246,7 @@ To explore this range of options, the **FFTrees** package enables us to design a ## References -We had a lot of fun creating **FFTrees** and hope you like it too! +We had a lot of fun creating the **FFTrees** package and hope you like it too! As a comprehensive, yet accessible introduction to FFTs, we recommend reading our article in the journal _Judgment and Decision Making_ ([2017](`r url_JDM_doi`)), entitled _FFTrees: A toolbox to create, visualize,and evaluate fast-and-frugal decision trees_ (available in [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)\ ). diff --git a/README.md b/README.md index 915b9955..9dda36da 100644 --- a/README.md +++ b/README.md @@ -211,8 +211,7 @@ heart_fft$competition$test ### Building FFTs from verbal descriptions FFTs are so simple that we even can create them ‘from words’ and then -apply them to data! - +apply them to data. For example, let’s create a tree with the following three nodes and evaluate its performance on the `heart.test` data: @@ -247,15 +246,15 @@ plot(my_fft, description.](man/figures/README-example-heart-verbal-1.png) **Figure 2**: An FFT predicting heart disease created from a verbal -description. 
- -As we can see, this particular tree is somewhat biased: It has nearly -perfect *sensitivity* (i.e., is good at identifying cases of *Disease*) -but suffers from low *specificity* (i.e., performs poorly in identifying +description. +The performance measures (in the bottom panel of **Figure 2**) show that +this particular tree is somewhat biased: It has nearly perfect +*sensitivity* (i.e., is good at identifying cases of *Disease*) but +suffers from low *specificity* (i.e., performs poorly in identifying *Healthy* cases). Expressed in terms of its errors, `my_fft` incurs few misses at the expense of many false alarms. Although the *accuracy* of our custom tree still exceeds the data’s baseline by a fair amount, the -FFTs in `heart_fft` (from above) strike a better balance. +FFTs in `heart_fft` (created above) strike a better balance. @@ -267,12 +266,12 @@ package enables us to design and evaluate a range of FFTs. ## References -We had a lot of fun creating **FFTrees** and hope you like it too! As a -comprehensive, yet accessible introduction to FFTs, we recommend reading -our article in the journal *Judgment and Decision Making* -([2017](https://doi.org/10.1017/S1930297500006239)), entitled *FFTrees: -A toolbox to create, visualize,and evaluate fast-and-frugal decision -trees* (available in +We had a lot of fun creating the **FFTrees** package and hope you like +it too! As a comprehensive, yet accessible introduction to FFTs, we +recommend reading our article in the journal *Judgment and Decision +Making* ([2017](https://doi.org/10.1017/S1930297500006239)), entitled +*FFTrees: A toolbox to create, visualize,and evaluate fast-and-frugal +decision trees* (available in [html](https://journal.sjdm.org/17/17217/jdm17217.html) \| [PDF](https://journal.sjdm.org/17/17217/jdm17217.pdf) ). 
@@ -333,6 +332,6 @@ Examples include: ------------------------------------------------------------------------ -\[File `README.Rmd` last updated on 2023-05-31.\] +\[File `README.Rmd` last updated on 2023-06-01.\] From 0713b9355c6b4c48638db474861eb4cdc82ff582 Mon Sep 17 00:00:00 2001 From: hneth Date: Thu, 1 Jun 2023 17:00:49 +0200 Subject: [PATCH 3/7] minor --- R/plotFFTrees_function.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/plotFFTrees_function.R b/R/plotFFTrees_function.R index 7e079294..f7ccd7cb 100644 --- a/R/plotFFTrees_function.R +++ b/R/plotFFTrees_function.R @@ -28,7 +28,7 @@ #' } #' By default, \code{data = 'train'} (as \code{x} may not contain test data). #' -#' @param what What should be plotted (as a string)? Valid options are: +#' @param what What should be plotted (as a character string)? Valid options are: #' \describe{ #' \item{'all'}{Plot the tree diagram with all corresponding guides and performance statistics, but excluding cue accuracies.} #' \item{'cues'}{Plot only the marginal accuracy of cues in ROC space. 
From 116029d9b78ce12f45257b33331a220162270508 Mon Sep 17 00:00:00 2001 From: hneth Date: Thu, 1 Jun 2023 17:01:10 +0200 Subject: [PATCH 4/7] update and increment version --- DESCRIPTION | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 2993340e..dd0042e3 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,8 +1,8 @@ Package: FFTrees Type: Package Title: Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees -Version: 1.9.0.9031 -Date: 2023-05-31 +Version: 1.9.0.9032 +Date: 2023-06-01 Authors@R: c(person("Nathaniel", "Phillips", role = c("aut"), email = "Nathaniel.D.Phillips.is@gmail.com", comment = c(ORCID = "0000-0002-8969-7013")), person("Hansjoerg", "Neth", role = c("aut", "cre"), email = "h.neth@uni.kn", comment = c(ORCID = "0000-0001-5427-3141")), person("Jan", "Woike", role = "aut", comment = c(ORCID = "0000-0002-6816-121X")), From 774ac843c6a7a527bde7f7dc503f559b057e8383 Mon Sep 17 00:00:00 2001 From: hneth Date: Thu, 1 Jun 2023 17:03:11 +0200 Subject: [PATCH 5/7] minor --- README.Rmd | 2 -- README.md | 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/README.Rmd b/README.Rmd index 2f17aa64..f9b30618 100644 --- a/README.Rmd +++ b/README.Rmd @@ -40,13 +40,11 @@ url_JDM_doi <- "https://doi.org/10.1017/S1930297500006239" [![R-CMD-check](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml) - - diff --git a/README.md b/README.md index 9dda36da..b494e10f 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ -# FFTrees 1.9.0.9031 FFTrees +# FFTrees 1.9.0.9032 FFTrees From e9ddb7e7714736e28d1d3210be80086aac6ca071 Mon Sep 17 00:00:00 2001 From: hneth Date: Thu, 1 Jun 2023 17:26:29 +0200 Subject: [PATCH 6/7] revise vignette (esp. 
section on advanced functions) --- vignettes/FFTrees_heart.Rmd | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/vignettes/FFTrees_heart.Rmd b/vignettes/FFTrees_heart.Rmd index 64be02c5..b7c31d1c 100644 --- a/vignettes/FFTrees_heart.Rmd +++ b/vignettes/FFTrees_heart.Rmd @@ -171,8 +171,8 @@ plot(heart.fft, what = "tree") - `comp`: Should statistics from competitive algorithms be shown in the ROC curve? To remove the performance statistics of competitive algorithms (e.g.; regression, random forests), include the argument `comp = FALSE`. - `what`: Which parts of an `FFTrees` object should be visualized (e.g., `all`, `icontree` and `tree`). -Using `what = "roc"` plots tree performance as an ROC curve. -To show individual cue accuracies (in ROC space), include the argument `what = "cues"`: +Using `what = "roc"` plots tree performance as an ROC\ curve. +To show individual cue accuracies (in ROC space), specify `what = "cues"`: ```{r fft-cues, fig.width = 6, fig.height = 6, out.width = "500px"} # Plot cue accuracies (for training data) in ROC space: @@ -182,21 +182,28 @@ plot(heart.fft, what = "cues") See the [Plotting FFTrees](FFTrees_plot.html) vignette for details on plotting FFTs. -### Additional steps +### Advanced functions -#### Accessing outputs +Creating sets of FFTs and evaluating them on data by printing and plotting individual FFTs provides the core functionality of **FFTrees**. +However, the package also provides more advanced functions for accessing, defining, using and evaluating FFTs. -An `FFTrees` object contains many different outputs, to see them all, run `names()` +#### Accessing outputs + +An `FFTrees` object contains many different outputs. +Basic performance information on the current data and set of FFTs is available via the `summary()` function. 
+To see and access parts of an `FFTrees` object, use `str()` or `names()`: ```{r fft-names} -# Show the names of all of the outputs in heart.fft: +# Show the names of all outputs in heart.fft: names(heart.fft) ``` +Key elements of an `FFTrees` object are explained in the vignette on [Creating FFTs with FFTrees()](FFTrees_function.html). + #### Predicting for new data -To predict classifications for a new dataset, use the standard `predict()` function. +To predict classification outcomes for new data, use the standard `predict()` function. For example, here's how to predict the classifications for data in the `heartdisease` object (which actually is just a combination of `heart.train` and `heart.test`): ```{r fft-predict, eval = FALSE} @@ -211,7 +218,8 @@ predict(heart.fft, To define a specific FFT and apply it to data, we can define a tree by providing its verbal description to the `my.tree` argument. Similarly, we can define sets of FFT definitions (as a data frame) and evaluate them on data by using the `tree.definitions` argument of `FFTrees()`. As we often start from an existing set of FFTs, **FFTrees** provides a set of functions for extracting, converting, and modifying tree definitions. -(See the vignette on [Manually specifying FFTs](FFTrees_mytree.html) for defining FFTs from descriptions and modifying tree definitions.) + +See the vignette on [Manually specifying FFTs](FFTrees_mytree.html) for defining FFTs from descriptions and modifying tree definitions. 
## Vignettes From a3113defc407df9e5b21ca54a2f10a229191c3fe Mon Sep 17 00:00:00 2001 From: hneth Date: Thu, 1 Jun 2023 17:26:40 +0200 Subject: [PATCH 7/7] minor --- man/plot.FFTrees.Rd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/man/plot.FFTrees.Rd b/man/plot.FFTrees.Rd index f26df154..4a1a3b9c 100644 --- a/man/plot.FFTrees.Rd +++ b/man/plot.FFTrees.Rd @@ -45,7 +45,7 @@ } By default, \code{data = 'train'} (as \code{x} may not contain test data).} -\item{what}{What should be plotted (as a string)? Valid options are: +\item{what}{What should be plotted (as a character string)? Valid options are: \describe{ \item{'all'}{Plot the tree diagram with all corresponding guides and performance statistics, but excluding cue accuracies.} \item{'cues'}{Plot only the marginal accuracy of cues in ROC space.