diff --git a/DESCRIPTION b/DESCRIPTION
index 10f0310..a116fb1 100755
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,5 +1,5 @@
Package: tukeyedar
-Version: 0.2.0
+Version: 0.2.1
Type: Package
Title: Tukey Inspired Exploratory Data Analysis Functions
Authors@R: person(given = "Manuel",
diff --git a/README.Rmd b/README.Rmd
index f4c8d8c..870f680 100755
--- a/README.Rmd
+++ b/README.Rmd
@@ -1,5 +1,6 @@
---
output: github_document
+bibliography: ref.bib
---
@@ -22,8 +23,7 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](h
The `tukeyedar` package houses data exploration tools. Many functions are inspired by work published by
-Tukey (1977), D. C. Hoaglin and Tukey (1983) and Velleman and Hoaglin
-(1981). Note that this package is in beta mode, so use at your own
+@eda1977, @understanding_eda1983, @applied_eda1981, and @visdata1993. Note that this package is in beta mode, so use at your own
discretion.
## Installation
@@ -85,26 +85,4 @@ mtcars %>% eda_3pt(disp, mpg)
------------------------------------------------------------------------
-
-
-
-D. C. Hoaglin, F. Mosteller, and J. W. Tukey. 1983. *Understanding
-Robust and Exploratory Data Analysis*. Wiley.
-
-
-
-
-
-Tukey, John W. 1977. *Exploratory Data Analysis*. Addison-Wesley.
-
-
-
-
-
-Velleman, P. F., and D. C. Hoaglin. 1981. *Applications, Basics and
-Computing of Exploratory Data Analysis*. Boston: Duxbury Press.
-
-
-
-
diff --git a/README.md b/README.md
index e46488d..95b189e 100755
--- a/README.md
+++ b/README.md
@@ -11,8 +11,8 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](h
The `tukeyedar` package houses data exploration tools. Many functions
-are inspired by work published by Tukey (1977), D. C. Hoaglin and Tukey
-(1983) and Velleman and Hoaglin (1981). Note that this package is in
+are inspired by work published by Tukey (1977), Hoaglin, Mosteller, and
+Tukey (1983), Velleman and Hoaglin (1981), and Cleveland (1993). Note that this package is in
beta mode, so use at your own discretion.
## Installation
@@ -86,10 +86,16 @@ mtcars %>% eda_3pt(disp, mpg)
-D. C. Hoaglin, F. Mosteller, and J. W. Tukey. 1983. *Understanding
-Robust and Exploratory Data Analysis*. Wiley.
+Hoaglin, D. C., F. Mosteller, and J. W. Tukey. 1983. *Understanding
+Robust and Exploratory Data Analysis*. Wiley.
-The tukeyedar package houses data exploration tools. Many functions are inspired by work published by Tukey (1977), D. C. Hoaglin and Tukey (1983) and Velleman and Hoaglin (1981). Note that this package is in beta mode, so use at your own discretion.
+
+The tukeyedar package houses data exploration tools. Many functions are inspired by work published by Tukey (1977), Hoaglin, Mosteller, and Tukey (1983), Velleman and Hoaglin (1981), and Cleveland (1993). Note that this package is in beta mode, so use at your own discretion.
-D. C. Hoaglin, F. Mosteller, and J. W. Tukey. 1983. Understanding Robust and Exploratory Data Analysis. Wiley.
+Hoaglin, D. C., F. Mosteller, and J. W. Tukey. 1983. Understanding Robust and Exploratory Data Analysis. Wiley.
Tukey, John W. 1977. Exploratory Data Analysis. Addison-Wesley.
diff --git a/docs/news/index.html b/docs/news/index.html
index 82a41c8..780f12f 100755
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -10,7 +10,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index a23399a..d5c086f 100755
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -5,7 +5,7 @@ articles:
polish: polish.html
qq: qq.html
RLine: RLine.html
-last_built: 2024-01-15T19:07Z
+last_built: 2024-01-17T14:55Z
urls:
reference: https://mgimond.github.io/tukeyedar/reference
article: https://mgimond.github.io/tukeyedar/articles
diff --git a/docs/reference/Rplot001.png b/docs/reference/Rplot001.png
index 5502c96..1299382 100755
Binary files a/docs/reference/Rplot001.png and b/docs/reference/Rplot001.png differ
diff --git a/docs/reference/Rplot002.png b/docs/reference/Rplot002.png
index 1b3806d..695b4c7 100755
Binary files a/docs/reference/Rplot002.png and b/docs/reference/Rplot002.png differ
diff --git a/docs/reference/Rplot003.png b/docs/reference/Rplot003.png
index 1b3806d..233e069 100755
Binary files a/docs/reference/Rplot003.png and b/docs/reference/Rplot003.png differ
diff --git a/docs/reference/Rplot004.png b/docs/reference/Rplot004.png
index 5663342..8040b03 100755
Binary files a/docs/reference/Rplot004.png and b/docs/reference/Rplot004.png differ
diff --git a/docs/reference/Rplot005.png b/docs/reference/Rplot005.png
index 0b8b183..2056fe6 100755
Binary files a/docs/reference/Rplot005.png and b/docs/reference/Rplot005.png differ
diff --git a/docs/reference/Rplot006.png b/docs/reference/Rplot006.png
index 0a5417b..506646c 100755
Binary files a/docs/reference/Rplot006.png and b/docs/reference/Rplot006.png differ
diff --git a/docs/reference/Rplot007.png b/docs/reference/Rplot007.png
index a19f120..37fb272 100755
Binary files a/docs/reference/Rplot007.png and b/docs/reference/Rplot007.png differ
diff --git a/docs/reference/Rplot008.png b/docs/reference/Rplot008.png
index 563b97a..29db78d 100755
Binary files a/docs/reference/Rplot008.png and b/docs/reference/Rplot008.png differ
diff --git a/docs/reference/Rplot009.png b/docs/reference/Rplot009.png
index 95ce31c..424455f 100755
Binary files a/docs/reference/Rplot009.png and b/docs/reference/Rplot009.png differ
diff --git a/docs/reference/Rplot010.png b/docs/reference/Rplot010.png
index 01249b2..cb13214 100644
Binary files a/docs/reference/Rplot010.png and b/docs/reference/Rplot010.png differ
diff --git a/docs/reference/age_height.html b/docs/reference/age_height.html
index d813b7f..d363bd6 100755
--- a/docs/reference/age_height.html
+++ b/docs/reference/age_height.html
@@ -16,7 +16,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_3pt.html b/docs/reference/eda_3pt.html
index a028528..ccc4b2b 100755
--- a/docs/reference/eda_3pt.html
+++ b/docs/reference/eda_3pt.html
@@ -28,7 +28,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_add.html b/docs/reference/eda_add.html
index 93e1ba6..59dd184 100755
--- a/docs/reference/eda_add.html
+++ b/docs/reference/eda_add.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_bipow.html b/docs/reference/eda_bipow.html
index 95bdba8..39cdff2 100755
--- a/docs/reference/eda_bipow.html
+++ b/docs/reference/eda_bipow.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_boxls.html b/docs/reference/eda_boxls.html
index 6652fe8..2c87d43 100755
--- a/docs/reference/eda_boxls.html
+++ b/docs/reference/eda_boxls.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_dens.html b/docs/reference/eda_dens.html
index 74f72eb..702fd55 100755
--- a/docs/reference/eda_dens.html
+++ b/docs/reference/eda_dens.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_lm.html b/docs/reference/eda_lm.html
index c4bb0fe..65b3cdc 100755
--- a/docs/reference/eda_lm.html
+++ b/docs/reference/eda_lm.html
@@ -16,7 +16,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_lsum.html b/docs/reference/eda_lsum.html
index 0c52e50..cb2552d 100755
--- a/docs/reference/eda_lsum.html
+++ b/docs/reference/eda_lsum.html
@@ -16,7 +16,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_normfit.html b/docs/reference/eda_normfit.html
index da911b6..7230da0 100644
--- a/docs/reference/eda_normfit.html
+++ b/docs/reference/eda_normfit.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_pol.html b/docs/reference/eda_pol.html
index b837d25..0cde83b 100755
--- a/docs/reference/eda_pol.html
+++ b/docs/reference/eda_pol.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_qq.html b/docs/reference/eda_qq.html
index b40a213..1ec051e 100755
--- a/docs/reference/eda_qq.html
+++ b/docs/reference/eda_qq.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_re.html b/docs/reference/eda_re.html
index 55093b0..255b77e 100755
--- a/docs/reference/eda_re.html
+++ b/docs/reference/eda_re.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_rline.html b/docs/reference/eda_rline.html
index 2d592bb..c3519f6 100755
--- a/docs/reference/eda_rline.html
+++ b/docs/reference/eda_rline.html
@@ -14,7 +14,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_sl.html b/docs/reference/eda_sl.html
index 3f01164..28e1b1e 100755
--- a/docs/reference/eda_sl.html
+++ b/docs/reference/eda_sl.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_trim.html b/docs/reference/eda_trim.html
index 46ee234..7966e95 100755
--- a/docs/reference/eda_trim.html
+++ b/docs/reference/eda_trim.html
@@ -22,7 +22,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/eda_unipow.html b/docs/reference/eda_unipow.html
index e26dc08..e9121a1 100755
--- a/docs/reference/eda_unipow.html
+++ b/docs/reference/eda_unipow.html
@@ -16,7 +16,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/index.html b/docs/reference/index.html
index e66e3cf..79de396 100755
--- a/docs/reference/index.html
+++ b/docs/reference/index.html
@@ -10,7 +10,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/neoplasms.html b/docs/reference/neoplasms.html
index b1bedef..c931078 100755
--- a/docs/reference/neoplasms.html
+++ b/docs/reference/neoplasms.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/nine_point.html b/docs/reference/nine_point.html
index 0711de5..ed2e899 100755
--- a/docs/reference/nine_point.html
+++ b/docs/reference/nine_point.html
@@ -14,7 +14,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/plot.eda_polish.html b/docs/reference/plot.eda_polish.html
index bdfd7e3..04e1ab3 100755
--- a/docs/reference/plot.eda_polish.html
+++ b/docs/reference/plot.eda_polish.html
@@ -10,7 +10,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/plot.eda_rline.html b/docs/reference/plot.eda_rline.html
index 6b6c659..9230f21 100755
--- a/docs/reference/plot.eda_rline.html
+++ b/docs/reference/plot.eda_rline.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/tukeyedar.html b/docs/reference/tukeyedar.html
index d70e577..a607ff0 100755
--- a/docs/reference/tukeyedar.html
+++ b/docs/reference/tukeyedar.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/wat05.html b/docs/reference/wat05.html
index 754b86b..263eabd 100755
--- a/docs/reference/wat05.html
+++ b/docs/reference/wat05.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/reference/wat95.html b/docs/reference/wat95.html
index 1d46512..e84ef3b 100755
--- a/docs/reference/wat95.html
+++ b/docs/reference/wat95.html
@@ -12,7 +12,7 @@
tukeyedar
- 0.2.0
+ 0.2.1
diff --git a/docs/search.json b/docs/search.json
index e7ef3cd..a9bad87 100755
--- a/docs/search.json
+++ b/docs/search.json
@@ -1 +1 @@
-[{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"the-median-polish-basics","dir":"Articles","previous_headings":"","what":"The median polish basics","title":"Median polish","text":"median polish exploratory technique used extract effects two-way table. , median polish can thought robust version two-way ANOVA–goal characterize role factor contributing towards expected value. iteratively extracting effects associated row column factors via medians. example, given two-way table 1964 1966 infant mortality rates1 (reported count per 1000 live births) computed combination geographic region (NE, NC, S, W) level father’s educational attainment (ed8, ed9-11, ed12, ed13-15, ed16), median polish first extract overall median value, smooth residual rates first extracting median values along column (thus contributing column factor), smoothing remaining residual rates extracting median values along row (thus contributing row factor). smoothing operation iterated residuals stabilize. example workflow highlighted following figure. left-table original data showing death rates. second table shows outcome first round polishing (including initial overall median value 20.2). third forth table show second third iterations smoothing operations. Additional iterations deemed necessary given little can extracted residuals. detailed step--step explanation workflow see . resulting model additive form : \\[ y_{ij} = \\mu + \\alpha_{} + \\beta_{j} +\\epsilon_{ij} \\] \\(y_{ij}\\) response variable row \\(\\) column \\(j\\), \\(\\mu\\) overall typical value (hereafter referred common value), \\(\\alpha_{}\\) row effect, \\(\\beta_{j}\\) column effect \\(\\epsilon_{ij}\\) residual value left effects taken account. factor’s levels displayed top row left-column. example, region assigned rows father’s educational attainment assigned columns. 
father’s educational attainment can explain 11 units variability (7.58 - (-3.45)) death rates vs 4 units variability region (2.55 - (-1.5)). , father’s educational attainment larger contributor expected infant mortality regional effect.","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"implementing-the-median-polish","dir":"Articles","previous_headings":"","what":"Implementing the median polish","title":"Median polish","text":"package’s eda_polish augmented version built-medpolish available via stats package. key difference eda_polish takes input dataset long form opposed medpolish takes dataset form matrix. example, infant mortality dataset needs consist least three columns: one variable (two factors expected value). median polish can executed follows: function output table plot along list components stored M1 object. want suppress plot, can set parameter plot = FALSE. M1 object class eda_polish. can extract common values, row column effects follows:","code":"grd <- c(\"ed8\", \"ed9-11\", \"ed12\", \"ed13-15\", \"ed16\") dat <- data.frame(region = rep( c(\"NE\", \"NC\", \"S\", \"W\"), each = 5), edu = factor(rep( grd , 4), levels = grd), perc = c(25.3, 25.3, 18.2, 18.3, 16.3, 32.1, 29, 18.8, 24.3, 19, 38.8, 31, 19.3, 15.7, 16.8, 25.4, 21.1, 20.3, 24, 17.5)) head(dat) region edu perc 1 NE ed8 25.3 2 NE ed9-11 25.3 3 NE ed12 18.2 4 NE ed13-15 18.3 5 NE ed16 16.3 6 NC ed8 32.1 library(tukeyedar) M1 <- eda_pol(dat, row = region, col = edu, val = perc) M1$global [1] 20.85 M1$row region effect 1 NC 2.3000 2 NE -1.4625 3 S -0.3500 4 W 0.3500 M1$col edu effect 1 ed8 7.43125 2 ed9-11 5.88125 3 ed12 -1.19375 4 ed13-15 0.03125 5 ed16 -3.70000"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"ordering-rows-and-columns-by-effect-values","dir":"Articles","previous_headings":"Implementing the median polish","what":"Ordering rows and columns by effect values","title":"Median polish","text":"order row column effects 
effect values, set sort parameter TRUE.","code":"M1 <- eda_pol(dat, row = region, col = edu, val = perc, sort = TRUE)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"applying-a-transformation-to-the-data","dir":"Articles","previous_headings":"Implementing the median polish","what":"Applying a transformation to the data","title":"Median polish","text":"can function re-express values prior performing polish. example, log transform data, pass value 0 p. re-expressing data using negative power, choice adopting Tukey transformation (tukey = TRUE) Box-Cox transformation (tukey = FALSE). example, apply power transformation -0.1 using Box-Cox transformation, type:","code":"M1 <- eda_pol(dat, row = region, col = edu, val = perc, p = 0) M1 <- eda_pol(dat, row = region, col = edu, val = perc, p = -0.1, tukey = FALSE)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"defining-the-statistic","dir":"Articles","previous_headings":"Implementing the median polish","what":"Defining the statistic","title":"Median polish","text":"default, polishing routine adopts median statistic. can adopt statistic via stat parameter. example, apply mean polish, type:","code":"M1 <- eda_pol(dat, row = region, col = edu, val = perc, stat = mean)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"the-eda_polish-plot-method","dir":"Articles","previous_headings":"","what":"The eda_polish plot method","title":"Median polish","text":"list object created eda_pol function class eda_polish. , plot method created class. 
plot method either output original polished table (type = \"residuals\"), diagnostic plot (type = \"diagnostic\"), CV values (cv).","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"plot-the-median-polish-table","dir":"Articles","previous_headings":"","what":"Plot the median polish table","title":"Median polish","text":"can generate plot table median polish model follows:","code":"M1 <- eda_pol(dat, row = region, col = edu, val = perc, plot = FALSE) plot(M1)"},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"excluding-common-effect-from-the-color-palette-range","dir":"Articles","previous_headings":"Adjusting color schemes","what":"Excluding common effect from the color palette range","title":"Median polish","text":"default, range color palettes defined range values table–includes common effect value. prevent common value affecting distribution color palettes, set col.com FALSE. Note distribution colors maximized help improve view effects. view makes clear father’s educational attainment greater effect region.","code":"plot(M1, col.com = FALSE)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"excluding-rowcolumn-effects-from-the-color-palette-range","dir":"Articles","previous_headings":"Adjusting color schemes","what":"Excluding row/column effects from the color palette range","title":"Median polish","text":"want plot focus residuals maximizing range colors fit range residual values, set col.eff = FALSE. Note setting col.eff FALSE prevent effects cells colored. simply ensures range colors maximized match full range residual values. 
effect value falls within residual range assigned color.","code":"plot(M1, col.eff = FALSE)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"changing-color-schemes","dir":"Articles","previous_headings":"Adjusting color schemes","what":"Changing color schemes","title":"Median polish","text":"default, color scheme symmetrical (divergent) centered 0. adopts R’s (version 4.1 ) built-\"RdYlBu\" color palette. can assign different built-color palettes via colpal parameter. can list available colors R via hcl.pals() function. want limit output divergent color palettes, type: example, can assign \"Green-Brown\" color palette follows. (’ll remove common effect value range input values maximize displayed set colors). default color scheme symmetrical linear, centered 0. want maximize use colors, regardless range values, can set col.quant TRUE adopt quantile color scheme. ’ll note regardless asymmetrical distribution values 0, cell assigned unique color swatch. adopting quantile color classification scheme, might want adopt color palette generates fewer unique hues variation lightness values. example,","code":"hcl.pals(type = \"diverging\") [1] \"Blue-Red\" \"Blue-Red 2\" \"Blue-Red 3\" \"Red-Green\" [5] \"Purple-Green\" \"Purple-Brown\" \"Green-Brown\" \"Blue-Yellow 2\" [9] \"Blue-Yellow 3\" \"Green-Orange\" \"Cyan-Magenta\" \"Tropic\" [13] \"Broc\" \"Cork\" \"Vik\" \"Berlin\" [17] \"Lisbon\" \"Tofino\" plot(M1, colpal = \"Green-Brown\", col.com = FALSE) plot(M1, col.quant = TRUE) plot(M1, col.quant = TRUE, colpal = \"Green-Orange\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"adjusting-text","dir":"Articles","previous_headings":"","what":"Adjusting text","title":"Median polish","text":"can omit labeled values output setting res.txt FALSE. Likewise can omit axes labels setting label.txt FALSE. may prove useful applying median polish large grid file. 
can adjust text size via res.size, row.size col.size parameters numeric values, row names, column names respectively. example, set sizes 60% default value, type:","code":"plot(M1, res.txt = FALSE) plot(M1, res.txt = FALSE, label.txt = FALSE) plot(M1, row.size = 0.6, col.size = 0.6 , res.size = 0.6)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"exploring-diagnostic-plots","dir":"Articles","previous_headings":"","what":"Exploring diagnostic plots","title":"Median polish","text":"plot method also generate plot residuals vs comparison values (CV), herein referred diagnostic plot. bisquare robust line fitted data (light red line) along robust loess fit (dashed blue line). function also output line’s slope. slope can used help estimate transformation data, needed. generate plot, simply extract cv component M1 list. cv component dataframe stores residuals (first column) CV values (fourth column). first records dataframe shown : diagnostic plot helps identify interactions effects. interaction suspected, model longer simple additive model; model needs augmented interactive component form: \\[ y_{ij} = \\mu + \\alpha_{} + \\beta_{j} + kCV +\\epsilon_{ij} \\] \\(CV\\) = \\(\\alpha_{}\\beta_{j}/\\mu\\) \\(k\\) constant can estimated slope generated diagnostic plot. truly additive model one changes response variable one level another level remain constant. example, given bottom-left matrix initial response values, changes response variable level level b constant regardless row effect. example, going b level z elicits change response 6 - 3 = 3. observed change values b levels x y (4-1 5-2 respectively). three row levels, change expected values b –increase 3 units. Likewise, changes response values rows x y y z constant (1) across levels column effect. additive effect can observed interaction plot shown right. column effect plotted along x-axis, row effect mapped line segment. Original dataset (left). Interaction plot (right). 
Parallel lines indicate interaction effects. median polish generates following table diagnostic plot: Median polished data showing interaction effects ’ll note lack pattern (flat one) accompanying diagnostic plot. Now, let’s see happens interaction fact present two way table. Original dataset (left). Interaction plot (right). Note lines longer parallel one another interaction plot. Now let’s run median polish generate diagnostic plot. Median polished data showing interaction effects ’ll note upward trend residuals increasing comparison values. usually good indication interaction effects. Another telltale sign pattern observed residuals median polish plot low residuals high residuals opposing corners table. interaction observed, either include interaction term additive model, seek re-expression might help alleviate interaction effects. choose include interaction term model, coefficient \\(k\\) can extracted slope generated diagnostic plot. choose re-express data hopes removing interaction data, can try using power transformation equal \\(1 - slope\\) (slope derived diagnostic plot). infant mortality dataset used exercise suggest interaction effects diagnostic plot. Next, ’ll look another dataset may exhibit interaction effects.","code":"plot(M1, type = \"diagnostic\") $slope cv 1.3688 head(M1$cv) perc region.eff edu.eff cv 1 -3.15625 2.3000 -1.19375 -0.1316846523 2 -0.00625 -0.3500 -1.19375 0.0200389688 3 0.00625 -1.4625 -1.19375 0.0837342626 4 0.29375 0.3500 -1.19375 -0.0200389688 5 -4.83125 -0.3500 0.03125 -0.0005245803 6 1.11875 2.3000 0.03125 0.0034472422"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"another-example-earnings-by-sex-for-2021","dir":"Articles","previous_headings":"Exploring diagnostic plots","what":"Another example: Earnings by sex for 2021","title":"Median polish","text":"dataset consists earnings sex levels educational attainment 2021 (src: US Census Bureau). 
Education levels defined follows: NoHS: Less High School Graduate HS: High School Graduate (Includes Equivalency) AD: College Associate’s Degree Grad: Bachelor’s Degree original table (prior running median polish), can viewed setting maxiter 0 call eda_pol. 2021 Average earnings US. Next, ’ll run median polish. Next, plot final table diagnostic plot. ’s can glean output: Overall, median earnings $41,359 Variability earnings due different levels education attainment covers range $56,936 different sexes covers range $15,858. residuals quite large suggesting may much variability earnings may explained row column effects. residuals explain $15,780 variability data. diagnostic plot suggests strong interaction sex effect education effect. implies, example, differences earnings sexes depend level educational attainment. slope residuals CV values around 0.94. Given strong evidence interaction effects, need take one two actions: can either add comparison values (CV) row-plus-column model, can see re-expressing earnings values eliminates dependence effects.","code":"edu <- c(\"NoHS\", \"HS\", \"HS+2\", \"HS+4\", \"HS+6\") df1 <- data.frame(Education = factor(rep(edu,2), levels = edu), Sex = c(rep(\"Male\", 5), rep(\"Female\",5)), Earnings = c(31722, 40514, 49288, 73128,98840,20448, 26967, 33430, 50554, 67202)) eda_pol(df1, row = Education, col = Sex, val = Earnings , maxiter = 0) M2 <- eda_pol(df1, row = Education, col = Sex, val = Earnings , plot = FALSE) plot(M2, res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M2, \"diagnostic\") $slope cv 0.9410244"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"adding-cv-to-the-row-plus-column-model","dir":"Articles","previous_headings":"Exploring diagnostic plots > Another example: Earnings by sex for 2021","what":"Adding CV to the row-plus-column model","title":"Median polish","text":"CV values computed stored median polish object. can extracted model via M2$cv component can visualized via plot function. 
following figure shows original residuals table (left) CV table (right). Median polish residuals (left) CV values (right). comparison value added model, need compute new set residuals. residuals can plotted setting add.cv TRUE specifying value k. Using slope estimate k get: CV values (left) new set residuals (right). two tables provide us parameters needed construct model. example, Female-NoHS earnings value can recreated table follows: \\[ Earnings_{Female-NoHS} = \\mu + Sex_{Female} + Education_{NoHS} + kCV_{Female-NoHS} + \\epsilon_{Female-NoHS} \\] : \\(CV_{Female-NoHS} = \\frac{(Sex_{Female})(Education_{NoHS})}{\\mu}\\) \\(k\\) constant can estimated diagnostic plot’s slope (0.94 example). gives us: \\[ Earnings_{Female-NoHS} = 41359 -7929 -15274 + (0.94)(2928.2) -460.5 \\]","code":"plot(M2, res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M2, \"cv\", res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M2, \"cv\", res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M2, \"residuals\", add.cv=TRUE, k = 0.94, res.size = 0.8, row.size = 0.8, col.size = 0.8)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"re-expressing-earnings","dir":"Articles","previous_headings":"Exploring diagnostic plots > Another example: Earnings by sex for 2021","what":"Re-expressing earnings","title":"Median polish","text":"’s possible earnings presented us scale best suited analysis. Subtracting slope value (derived diagnostic plot) value 1 offers suggested transformation may provide us scale measure best suited data. ’ll rerun median polish using power transformation 1 - 0.94 = 0.06. Next, plot final table diagnostic plot. Median polish output (left) CV values (right). power 0.06 may bit aggressive given ’ve gone positive relationship CV residual negative relationship two. Tweaking power parameter may recommended. 
can done via trial error, can done using technique described next.","code":"M3 <- eda_pol(df1, row = Education, col = Sex, val = Earnings , plot = FALSE, p = 0.06) plot(M3, res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M3, \"diagnostic\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"fine-tuning-a-re-expression","dir":"Articles","previous_headings":"Exploring diagnostic plots > Another example: Earnings by sex for 2021","what":"Fine tuning a re-expression","title":"Median polish","text":"Klawonn et al.2 propose method honing optimal power transformation finding one maximizes effect’s spreads vis--vis residuals. computing ratio interquartile range row column effects 80% quantile residual’s absolute values. following code chunk computes ratio different power transformations. Row (left) column (right) effect IQRs residuals ratio vs power. plot suggests power transformation 0.1. ’ll re-run median polish using power transformation. slope much smaller loess fit suggests monotonically increasing decreasing relationship residuals CV values. Re-expressing value seems done good job stabilizing residuals across CV values. ’ll modify color scheme place emphasis effects opposed overall value. ’s can glean output: earnings values best expressed power scale 0.1. Overall, median earnings (re-expressed form) $19. Variability earnings due different levels education attainment covers range $3 different sexes covers range $1. residuals much smaller relative effects earnings re-expressed. residuals explain close $0 variability data. Just variability can explained effects. 
Re-expressing values eliminates interaction effects.","code":"f1 <- function(x){ out <- eda_pol(df1, row = Education, col = Sex, val = Earnings, p = x, plot=FALSE, tukey = FALSE) c(p=out$power, IQrow = out$IQ_row, IQcol = out$IQ_col) } IQ <- t(sapply(0:25/10, FUN = f1 )) # Apply transformations at 0.1 intervals plot(IQrow ~ p, IQ, type=\"b\") grid() plot(IQcol ~ p, IQ, type=\"b\") grid() M4 <- eda_pol(df1, row = Education, col = Sex, val = Earnings, plot = FALSE, p = 0.1) plot(M4, res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M4, \"diagnostic\") plot(M4, col.com = FALSE, res.size = 0.8, row.size = 0.8, col.size = 0.8)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"the-mean-polish","dir":"Articles","previous_headings":"","what":"The mean polish","title":"Median polish","text":"eda_pol function accepts statistical summary function. default, uses median. example, mean polish generated earnings dataset looks like : Polishing data using mean requires single iteration reach stable output. mean suffers sensitivity non-symmetrical distributions outliers. , median polish robust summary statistic. said, running mean polish benefits: ’s great way represent effects generated two-way analysis variance (aka 2-way ANOVA). confirmed comparing row column effects traditional 2-way ANOVA technique shown : median polish, must concern interactions effects. interaction present, ANOVA inferential statistics using F-test can untrustworthy. strong evidence interaction. slope 0.92 can used estimate power transformation via \\(1 - slope\\). close power transformation 0.1 ended adopting median polish exercise. Results mean polish (left) diagnositc plot (right). Re-expressing data nice job removing interaction effects much like performed median polish. 
suggests one run two-way ANOVA, re-expression strongly suggested.","code":"M5 <- eda_pol(df1, row = Education, col = Sex, val = Earnings , stat = mean, plot = FALSE) plot(M5, res.size = 0.8, row.size = 0.8, col.size = 0.8) model.tables(aov(Earnings ~ Sex + Education, df1)) Tables of effects Sex Sex Female Male -9489 9489 Education Education NoHS HS HS+2 HS+4 HS+6 -23124 -15469 -7850 12632 33812 plot(M5, type = \"diagnostic\", res.size = 0.8, row.size = 0.8, col.size = 0.8) $slope cv 0.9223166 M4b <- eda_pol(df1, row = Education, col = Sex, val = Earnings , stat = mean, plot = FALSE, p = 0.1, maxiter = 1) plot(M4b, res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M4b, \"diagnostic\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"The empirical QQ plot","text":"empirical quantile-quantile plot (QQ plot) probably one underused least appreciated plots univariate analysis. used compare two distributions across full range values. generalization boxplot limit comparison just median upper lower quartiles. fact, compares values matching value one batch corresponding quantile batch. sizes batch need . differ, larger batch interpolated smaller batch’s set quantiles. QQ plot help visualize differences distributions, can also model relationship batches. Note confused modeling relationship bivariate dataset latter pairs points observational units whereas QQ plot pairs values matching quantiles.","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"anatomy-of-the-eda_qq-plot","dir":"Articles","previous_headings":"","what":"Anatomy of the eda_qq plot","title":"The empirical QQ plot","text":"point represents matching quantiles batch. shaded boxes represent batch’s interquartile range (mid 50% values). Solid dashed lines inside shaded boxes represent batch’s medians. lightly shaded dashed dots represent batch’s 12.5th 87.5th quantiles (.e. 
show the ends of the mid 80% of values). The upper right-hand text indicates the power transformation applied to both batches (the default power of 1 implies the original measurement scale). If a formula is applied to one of the batches, it will appear in the upper right-hand text. eda_qq will also output the suggested relationship between the y variable and the x variable in the console. It bases this relationship on each batch’s interquartile values. If the output is assigned to a new object, the object will store a list with the following values: the x values (interpolated if needed), the y values (interpolated if needed), the power parameter, the formula applied to the x variable, and the formula applied to the y variable.","code":"#> [1] \"Suggested offsets:y = x * 1.4574 + (0.9914)\""},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"data-type","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Data type","title":"The empirical QQ plot","text":"The function will accept a dataframe with a values column and a group column, or it will accept two separate vector objects. For example, to pass two separate vector objects, x and y, type: If the data are in a dataframe, type:","code":"library(tukeyedar) set.seed(207) x <- rnorm(30) y <- rnorm(30) + 0.5 eda_qq(x, y) dat <- data.frame(val = c(x, y), cat = rep(c(\"x\", \"y\"), each = 30)) eda_qq(dat, val, cat)"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"suppressing-the-plot","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Suppressing the plot","title":"The empirical QQ plot","text":"You can suppress the plot if only the x and y values are to be outputted as a list. If the batches do not match in size, the output will show the values interpolated to the smaller batch’s quantiles. The
output also include power parameter applied batches well formula applied one batches (fx formula applied x variable fy formula applied y variable).","code":"out <- eda_qq(x,y, plot = FALSE) #> [1] \"Suggested offsets:y = x * 1.1201 + (0.5618)\" out #> $x #> [1] -2.0207122 -1.6048333 -1.5620907 -1.5128732 -1.3126378 -1.1770882 #> [7] -1.0871906 -0.9258832 -0.8896555 -0.6152073 -0.3140113 -0.2996734 #> [13] -0.2954234 -0.2199849 -0.2108781 0.1202060 0.2608893 0.2680445 #> [19] 0.2910663 0.4239690 0.4262605 0.4301416 0.5176361 0.6085180 #> [25] 0.6880919 0.6929772 0.7640838 0.9037644 1.0124869 1.0503544 #> #> $y #> [1] -2.10710669 -1.30465821 -1.17618932 -1.17253191 -0.86423268 -0.40162859 #> [7] -0.37002087 -0.19629536 -0.07210822 -0.01829722 0.05826287 0.09105884 #> [13] 0.09371398 0.13698992 0.22318119 0.43006689 0.52597363 0.72665767 #> [19] 0.81351407 1.00612388 1.01831440 1.06713353 1.23708449 1.24360530 #> [25] 1.33232007 1.43973056 1.59125312 1.64852115 1.77464625 1.93823390 #> #> $p #> [1] 1 #> #> $fx #> NULL #> #> $fy #> NULL"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"setting-the-grey-box-and-dashed-line-parameters","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Setting the grey box and dashed line parameters","title":"The empirical QQ plot","text":"grey box highlights interquartile ranges batches. boundary can modified via b.val argument. Likewise, lightly shaded dashed dots highlight mid 80% values can modified via l.val argument. example, highlight mid 68% values using grey boxes mid 95% values using lightly shaded dashed dots, type: can suppress plotting grey box lightly shaded dashed dots setting q = FALSE. 
This will not affect the median dashed lines.","code":"eda_qq(x, y, b.val = c(0.16, 0.84), l.val = c(0.025, 0.975))"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"applying-a-formula-to-one-of-the-batches","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Applying a formula to one of the batches","title":"The empirical QQ plot","text":"You can apply a formula to a batch via the fx argument for the x-variable or the fy argument for the y-variable. The formula is passed as a text string. For example, to add 0.5 to the x values, type:","code":"eda_qq(x, y, fx = \"x + 0.5\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"quantile-type","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Quantile type","title":"The empirical QQ plot","text":"There are many different quantile algorithms available in R. To see the full list of quantile types, refer to the quantile help page: ?quantile. By default, eda_qq() adopts q.type = 5. In general, the choice of quantiles does not really matter, especially for large datasets. If you want to adopt R’s default type, set q.type = 7.","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"point-symbols","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Point symbols","title":"The empirical QQ plot","text":"The point symbol type, color and size can be modified via the pch, p.col (and/or p.fill) and size arguments. The color can either be a built-in color name (you can see the full list by typing colors()) or be defined with the rgb() function. If you define the color using one of the built-in color names, you can adjust its transparency via the alpha argument. An alpha value of 0 renders the point completely transparent and a value of 1 renders the point completely opaque. The point symbol can take two color parameters depending on the point type. For a pch number from 21 to 25, p.fill defines the fill color and p.col defines the border color. For any other point symbol type, the p.fill argument is ignored. Here are some
examples:","code":"eda_qq(x, y, p.fill = \"bisque\", p.col = \"red\", size = 1.2) eda_qq(x, y, pch = 16, p.col = \"tomato2\", size = 1.5, alpha = 0.5) eda_qq(x, y, pch = 3, p.col = \"tomato2\", size = 1.5)"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"interpreting-a-qq-plot","dir":"Articles","previous_headings":"","what":"Interpreting a QQ plot","title":"The empirical QQ plot","text":"To help interpret the following QQ plots, we’ll compare each plot to its matching kernel density plots.","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"identical-distributions","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Identical distributions","title":"The empirical QQ plot","text":"In this first example, we generate a QQ plot of two identical distributions. Because the two distributions are identical, the points line up along the x=y line, as shown here. We also generate overlapping density plots, as seen in the right-hand plot.","code":"library(tukeyedar) set.seed(543) x <- rnorm(100) y <- x eda_qq(x, y) eda_dens(x, y)"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"additive-offset","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Additive offset","title":"The empirical QQ plot","text":"Here, we work with the same batches, but this time we offset the second batch, y, by 2. This case is referred to as an additive offset. You’ll note that the points are parallel to the x=y line. This indicates that the distributions have the exact same shape. However, they do not fall on the x=y line: they are offset by +2 units as measured along the y-axis, as expected. We can confirm this by adding 2 to the x batch: the points now overlap the x=y line perfectly. The
density distributions overlap exactly as well.","code":"library(tukeyedar) set.seed(543) x <- rnorm(100) y <- x + 2 eda_qq(x, y) eda_dens(x, y) eda_qq(x, y, fx = \"x + 2\") eda_dens(x, y, fx = \"x + 2\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"multiplicative-offset","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Multiplicative offset","title":"The empirical QQ plot","text":"Next, we explore two batches that share the same central value, but where the second batch is 0.5 times the first. This case is referred to as a multiplicative offset. Here, the series of points is at an angle to the x=y line, yet you’ll note that the points still follow a perfectly straight line. This suggests a multiplicative offset with no change in location. It indicates that the “shapes” of the batches are similar, but that one is “wider” than the other. Here, y is half as wide as x. We can also state that x is twice as wide as y. We know the multiplicative offset since we synthetically generated the values for x and y. In practice, eyeballing the multiplier from the plot is not straightforward. We can use the suggested offset of 0.5 displayed in the console to help guide us. We can also use the angle of the points relative to the x=y line to judge the direction to take when choosing a multiplier. If the points make an angle less than that of the x=y line, we want to choose an x multiplier less than 1. If the angle is greater than that of the x=y line, we want to choose a multiplier greater than 1. Here, we know the multiplier is 0.5. Let’s confirm this with the following code chunk:","code":"y <- x * 0.5 eda_qq(x, y) eda_dens(x, y) eda_qq(x, y, fx = \"x * 0.5\") eda_dens(x, y, fx = \"x * 0.5\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"both-additive-and-multiplicative-offset","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Both additive and multiplicative offset","title":"The empirical QQ plot","text":"In this next example, we add both a multiplicative and an additive offset to the data. We now see both a multiplicative offset (the points form an angle to the x=y line) and an additive offset (the points intersect the x=y line). This suggests that the widths of the batches differ and that the values are offset by a constant value across the full range of values. It’s usually best to first identify the multiplicative offset that renders the points parallel to the x=y line. Once the multiplier is identified, we can then identify the additive offset. To no
surprise, a multiplier of 0.5 renders the series of points parallel to the x=y line. We can now eyeball the offset of +2 by measuring the distance between the points and the x=y line along the y-axis. We can thus model the relationship between y and x as \\(y = x * 0.5 + 2\\).","code":"y <- x * 0.5 + 2 eda_qq(x, y) eda_dens(x, y) eda_qq(x, y, fx = \"x * 0.5\") eda_dens(x, y, fx = \"x * 0.5\") eda_qq(x, y, fx = \"x * 0.5 + 2\") eda_dens(x, y, fx = \"x * 0.5 + 2\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"batches-need-not-be-symmetrical","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Batches need not be symmetrical","title":"The empirical QQ plot","text":"So far, we have worked with normally distributed datasets. But note that any distribution can be used in a QQ plot. For example, two equally skewed distributions that differ only in their central values will generate points that are perfectly lined up. Since the distributions are identical in shape, the points follow a straight line regardless of the nature of that shape (skewed, unimodal, bimodal, etc…)","code":"set.seed(540) x2 <- rbeta(100, 1, 8) y2 <- x2 + 0.2 eda_qq(x2, y2) eda_dens(x2, y2)"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"perfectly-alligned-points-are-rare","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Perfectly aligned points are rare","title":"The empirical QQ plot","text":"Note that with observational data, two batches pulled from the same underlying population will seldom follow a perfectly straight line. For example, note the meandering pattern generated in the following QQ plot, where both batches are pulled from the same Normal distribution. The points meander about the x=y line, and the points tail off near each end of the distribution.","code":"set.seed(702) x <- rnorm(100) y <- rnorm(100) eda_qq(x, y)"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"power-transformation","dir":"Articles","previous_headings":"","what":"Power transformation","title":"The empirical QQ plot","text":"The eda_qq function allows you to apply a power transformation to both batches. Note that transforming just one batch makes little sense, since you would end up comparing two batches measured on different scales. As an
example, we make use of R’s Indometh dataset to compare indometacin plasma concentrations between two test subjects. The QQ plot’s grey boxes are shifted towards the lower values. Recall that the grey boxes show the mid 50% of values for each batch (aka, the IQR range). An IQR shifted towards lower or higher values suggests skewed data. Another telltale sign of a skewed dataset is the gradual dispersion of the points in the two diagonal directions. Here, we go from a relatively high density of points near the lower values to a lower density of points near the higher values. While this does not preclude us from identifying multiplicative/additive offsets, many statistical procedures benefit from a symmetrical distribution. Given that the values are measures of concentration, we might want to adopt a log transformation. The power transformation is defined by the p argument (a power parameter value of 0 defines the log transformation). The transformation seems to do a decent job of symmetrizing both distributions. Note that the suggested offset displayed in the console applies to the transformed dataset. We can verify it by applying the offset to the x batch. We can characterize the differences in indometacin plasma concentrations between subject 1 and subject 2 as \\(log(conc)_{s2} = log(conc)_{s1} * 0.8501 + 0.3902\\)","code":"s1 <- subset(Indometh, Subject == 1, select = conc, drop = TRUE) # Test subject 1 s2 <- subset(Indometh, Subject == 2, select = conc, drop = TRUE) # Test subject 2 eda_qq(s1, s2) eda_qq(s1, s2, p = 0) #> [1] \"Suggested offsets:y = x * 0.8501 + (0.3902)\" eda_qq(s1, s2, p = 0, fx = \"x * 0.8501 + 0.3902\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"the-tukey-mean-difference-plot","dir":"Articles","previous_headings":"Power transformation","what":"The Tukey mean-difference plot","title":"The empirical QQ plot","text":"The Tukey mean-difference plot is simply an extension of the QQ plot whereby the plot is rotated such that the x=y line becomes horizontal. This can be useful in helping identify subtle differences between the point pattern and the line. The plot is rotated 45° by mapping the difference between both batches to the y-axis, and by mapping the mean of both batches to the x-axis. For example, the following figure on the left (the QQ plot) shows an additive offset between both batches, but it fails to clearly identify the multiplicative offset. The
latter can clearly be seen in the Tukey mean-difference plot (right), invoked by setting the argument md = TRUE.","code":"y <- x * 0.97 + 0.3 eda_qq(x, y, title = \"QQ plot\") eda_qq(x, y, md = TRUE, title = \"M-D plot\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"a-working-example","dir":"Articles","previous_headings":"","what":"A working example","title":"The empirical QQ plot","text":"Many datasets have distributions that differ not only additively and/or multiplicatively, but also in their general shape. This may create complex point patterns in a QQ plot. Such cases can be indicative of different processes at play over different ranges of values. We’ll explore such a case using the wat95 and wat05 dataframes available in this package. The data represent derived normal temperatures for the 1981-2010 period (wat95) and the 1991-2020 period (wat05) for the city of Waterville, Maine (USA). We subset the data to the daily average normals, avg: We now compare the distributions. At first glance, the batches do not seem to differ much. But just how close are the points to the x=y line? We rotate the plot and zoom in on the x=y line by setting md = TRUE. This view is proving far more insightful. Note that for these data, the overall offset is about 0.5, suggesting that the new normals are 0.5°F warmer. But we should not stop there: the pattern observed in the Tukey mean-difference plot is far from random. In fact, we can break the pattern into three distinct ones, split at around 35°F and 50°F. We categorize these groups as low, mid and high values. The next chunk of code generates three separate QQ plots, one for each range of values. The suggested offsets are displayed in the console for each group. It is important to note that we are splitting the paired values generated earlier by the eda_qq function ($avg) and not the values from the original datasets (old and new). Had we split the original data prior to combining them with eda_qq, we would have generated different patterns in the QQ and Tukey plots, since that approach would generate different quantile pairs. Next, we’ll adopt the suggested offsets generated in the console. The proposed offsets seem to do a good job of characterizing the differences in temperatures. This
characterization of the differences in normal temperatures between the old and new set of normals can be formalized as follows: \\[ new = \\begin{cases} old * 0.9506 + 1.661, & T_{avg} < 35 \\\\ old * 1.0469 - 1.6469, & 35 \\le T_{avg} < 50 \\\\ old * 0.9938 + 0.9268, & T_{avg} \\ge 50 \\end{cases} \\] The key takeaways from this analysis can be summarized as follows: Overall, the new normals are about 0.5°F warmer, but this offset is not uniform across the full range of temperature values. For the lower temperatures (less than 35°F), the new normals have a slightly narrower distribution and are about 1.7°F warmer. For the mid temperature values (35°F to 50°F), the new normals have a slightly wider distribution that is, overall, about 1.6°F cooler. For the higher range of temperature values (greater than 50°F), the new normals have a slightly narrower distribution and are about 0.9°F warmer.","code":"old <- wat95$avg # legacy temperature normals new <- wat05$avg # current temperature normals out <- eda_qq(old, new) out <- eda_qq(old, new, md = TRUE) labs <- c(\"low\", \"mid\", \"high\") out$avg <- (out$x + out$y) / 2 out <- as.data.frame(out[c(1:2,6)]) out2 <- split(out, cut(out$avg, c(min(out$avg), 35, 50, max(out$avg)), labels = labs, include.lowest = TRUE)) sapply(labs, FUN = \\(x) {eda_qq(out2[[x]]$x, out2[[x]]$y , xlab = \"old\", ylab = \"new\", md = T) title(x, line = 3, col.main = \"orange\")} ) #> [1] \"Suggested offsets:y = x * 0.9506 + (1.661)\" #> [1] \"Suggested offsets:y = x * 1.0469 + (-1.6469)\" #> [1] \"Suggested offsets:y = x * 0.9938 + (0.9268)\" xform <- c(\"x * 0.9506 + 1.661\", \"x * 1.0469 - 1.6469\", \"x * 0.9938 + 0.9268\") names(xform) <- labs sapply(labs, FUN = \\(x) {eda_qq(out2[[x]]$x, out2[[x]]$y, fx = xform[x], xlab = \"old\", ylab = \"new\", md = T) title(x, line = 3, col.main = \"coral3\")} )"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"the-resistant-line-basics","dir":"Articles","previous_headings":"","what":"The resistant line basics","title":"Resistant Line","text":"The eda_rline function fits a robust line through a bivariate dataset. It does so by first breaking the data into three roughly equal-sized batches following the x-axis variable. It then
uses the batches’ median values to compute the slope and intercept. However, the function doesn’t stop there. After fitting the initial line, the function fits another line (following the aforementioned methodology) to the model’s residuals. If the residual slope is not close to zero, it is added to the originally fitted model, thus creating an updated model. The iteration is repeated until the residual slope is close to zero or until the residual slope changes sign (at which point the average of the last two iterated slopes is used in the final fit). An example of the iteration follows, using data from Velleman et al.’s book. The dataset, neoplasms, consists of breast cancer mortality rates for regions of varying mean annual temperatures. The three batches are divided as follows: Note that the 16-record dataset is not evenly divisible by three, thus forcing the extra point into the middle batch (had the remainder of the division by three been two, an extra point would have been added to each of the tail-end batches). Next, we compute the medians for each batch (highlighted as red points in the following figure). The two end medians are used to compute the slope as: \\[ b = \\frac{y_r - y_l}{x_r-x_l} \\] where the subscripts \\(r\\) and \\(l\\) reference the median values of the right and left batches. Once the slope is computed, the intercept can be computed as follows: \\[ median(y_{l,m,r} - b * x_{l,m,r}) \\] where \\((x,y)_{l,m,r}\\) are the median x and y values for each batch. This line is used to compute the first set of residuals. A line is then fitted to the residuals following the procedure outlined above. The initial model's slope and intercept are 3.412 and -69.877 respectively, and the residual's slope and intercept are -0.873 and 41.451 respectively. The residual slope is added to the first computed slope and the process is repeated, thus generating the following tweaked slope and updated residuals: The updated slope is now 3.412 + (-0.873) = 2.539. The iteration continues until the slope of the residuals stabilizes. For this working example, the final slope and intercept are 2.89 and -45.91, respectively.","code":"#> (Intercept) x #> -21.794691 2.357695"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"implementing-the-resistant-line","dir":"Articles","previous_headings":"","what":"Implementing the resistant line","title":"Resistant Line","text":"eda_rline takes just three arguments: the data frame, the x variable and the y variable. The function will output a list. The elements a and b house the model’s intercept and slope. The
vectors x y input values sorted x. res vector final residuals sorted x. xmed ymed vectors medians three batches. px py power transformations applied variables. output list class eda_rline. plot method available class. see resistant line compares ordinary least-squares (OLS) regression slope, add output lm model plot via abline(): regression model computes slope 2.36 whereas resistant line function generates slope 2.89. scatter plot, can spot point may undo influence regression line (point highlighted green following plot). Removing point data generates OLS regression line inline resistant model. point interest 15th record neoplasms data frame. Note OLS slope inline generated resistant line. ’ll also note resistant line slope also changed. Despite resistant nature line, removal point changed makeup first tier values (note leftward shift vertical dashed line). changed makeup batch thus changing median values first second tier batches.","code":"library(tukeyedar) M <- eda_rline(neoplasms, Temp, Mortality) M #> $b #> [1] 2.890173 #> #> $a #> [1] -45.90578 #> #> $res #> [1] 21.2982659 0.1398844 -2.1791908 8.8294798 -11.2485549 -7.6167630 #> [7] -0.1398844 4.7589595 -9.0092486 -2.1994220 2.7554913 -7.2676301 #> [13] -0.3907514 6.1861272 1.7971098 0.1398844 #> #> $x #> [1] 31.8 34.0 40.2 42.1 42.3 43.5 44.2 45.1 46.3 47.3 47.8 48.5 49.2 49.9 50.0 #> [16] 51.3 #> #> $y #> [1] 67.3 52.5 68.1 84.6 65.1 72.2 81.7 89.2 78.9 88.6 95.0 87.0 #> [13] 95.9 104.5 100.4 102.5 #> #> $xmed #> [1] 40.2 45.7 49.9 #> #> $ymed #> [1] 67.30 85.15 100.40 #> #> $index #> [1] 5 11 16 #> #> $xlab #> [1] \"Temp\" #> #> $ylab #> [1] \"Mortality\" #> #> $px #> [1] 1 #> #> $py #> [1] 1 #> #> $iter #> [1] 4 #> #> attr(,\"class\") #> [1] \"eda_rline\" plot(M) abline(lm(Mortality ~ Temp, neoplasms), lty = 2) points(neoplasms[15,], col=\"#43CD80\",cex=1.5 ,pch=20) neoplasms.sub <- neoplasms[-15,] M.sub <- eda_rline(neoplasms.sub, Temp, Mortality) plot(M.sub) abline(lm(Mortality ~ Temp, neoplasms.sub), 
lty = 2) # Regression model with data subset"},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"nine-point-data","dir":"Articles","previous_headings":"Other examples","what":"Nine point data","title":"Resistant Line","text":"The nine_point dataset is used in Hoaglin et al. (p. 139) to test the resistant line function’s ability to stabilize wild oscillations in the computed slopes across iterations. Here, the slope and intercept are 0.067 and 0.133 respectively, matching the 1/15 and 2/15 values computed in Hoaglin et al.","code":"M <- eda_rline(nine_point, X,Y) plot(M)"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"age-vs--height-data","dir":"Articles","previous_headings":"Other examples","what":"Age vs. height data","title":"Resistant Line","text":"age_height is another dataset found in Hoaglin et al. (p. 135). It gives the ages and heights of children in a private urban school. Here, the slope and intercept are 0.429 and 91.007 respectively, matching the 0.426 slope and closely matching the 90.366 intercept values computed by Hoaglin et al. on page 137.","code":"M <- eda_rline(age_height, Months,Height) plot(M)"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"not-all-relationships-are-linear","dir":"Articles","previous_headings":"","what":"Not all relationships are linear!","title":"Resistant Line","text":"It’s important to remember that the resistant line technique is only valid if the bivariate relationship is linear. Here, we’ll step through an example highlighted in Velleman et al. (p. 138) using R's built-in mtcars dataset. First, we’ll fit a resistant line to the data. It’s important to note that just because a resistant line can be fit does not necessarily imply that the relationship is linear. To assess linearity in the mtcars dataset, we’ll make use of the eda_3pt function. It’s clear from the two half slopes that the relationship is not linear. Velleman et al. first suggest re-expressing mpg as 1/mpg (i.e. applying a power transformation of -1), giving us the number of gallons consumed per mile driven. But the two half slopes still differ. We therefore opt to also re-express the disp variable. One possibility is to take its inverse 1/3 power since displacement is a measure of volume (e.g.
a length cubed). This gives us: Now that we have identified the re-expressions that linearize the relationship, we can fit the resistant line. (Note that the grey line generated by the eda_3pt function is not the resistant line generated by eda_rline.)","code":"M <- eda_rline(mtcars, disp, mpg) plot(M) eda_3pt(mtcars, disp, mpg) eda_3pt(mtcars, disp, mpg, py = -1, ylab = \"gal/mi\") eda_3pt(mtcars, disp, mpg, px = -1/3, py = -1, ylab = \"gal/mi\", xlab = expression(\"Displacement\"^{-1/3})) M <- eda_rline(mtcars, disp, mpg, px = -1/3, py = -1) plot(M, ylab = \"gal/mi\", xlab = expression(\"Displacement\"^{-1/3}))"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"computing-a-confidence-interval","dir":"Articles","previous_headings":"","what":"Computing a confidence interval","title":"Resistant Line","text":"Confidence intervals for the coefficients can be estimated using bootstrapping techniques. Here are two approaches: resampling the residuals, and resampling the x-y cases.","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"resampling-the-model-residuals","dir":"Articles","previous_headings":"Computing a confidence interval","what":"Resampling the model residuals","title":"Resistant Line","text":"Here, we fit the resistant line and extract its residuals. We then re-run the model many times, replacing the original y values with the modeled y values plus the resampled residuals, to generate the confidence intervals.
Now plot distributions, tabulate 95% confidence interval.","code":"n <- 999 # Set number of iterations M <- eda_rline(neoplasms, Temp, Mortality) # Fit the resistant line bt <- array(0, dim=c(n, 2)) # Create empty bootstrap array for(i in 1:n){ #bootstrap loop df.bt <- data.frame(x=M$x, y = M$y +sample(M$res,replace=TRUE)) bt[i,1] <- eda_rline(df.bt,x,y)$a bt[i,2] <- eda_rline(df.bt,x,y)$b } hist(bt[,1], main=\"Intercept distribution\") hist(bt[,2], main=\"Slope distribution\") conf <- t(data.frame(Intercept = quantile(bt[,1], p=c(0.05,0.95) ), Slope = quantile(bt[,2], p=c(0.05,0.95) ))) conf #> 5% 95% #> Intercept -77.610056 11.955567 #> Slope 1.668588 3.561084"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"resampling-the-x-y-paired-values","dir":"Articles","previous_headings":"Computing a confidence interval","what":"Resampling the x-y paired values","title":"Resistant Line","text":", resample x-y paired values (replacement) compute resistant line time. Now plot distributions, tabulate 95% confidence interval.","code":"n <- 1999 # Set number of iterations bt <- array(0, dim=c(n, 2)) # Create empty bootstrap array for(i in 1:n){ #bootstrap loop recs <- sample(1:nrow(neoplasms), replace = TRUE) df.bt <- neoplasms[recs,] bt[i,1]=eda_rline(df.bt,Temp,Mortality)$a bt[i,2]=eda_rline(df.bt,Temp,Mortality)$b } hist(bt[,1], main=\"Intercept distribution\") hist(bt[,2], main=\"Slope distribution\") conf <- t(data.frame(Intercept = quantile(bt[,1], p=c(0.05,0.95) ), Slope = quantile(bt[,2], p=c(0.05,0.95) ))) conf #> 5% 95% #> Intercept -114.259180 12.78967 #> Slope 1.643678 4.31675"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Resistant Line","text":"Applications, Basics Computing Exploratory Data Analysis, P.F. Velleman D.C. Hoaglin, 1981. Understanding robust exploratory data analysis, D.C. Hoaglin, F. Mosteller J.W. 
Tukey, 1983.","code":""},{"path":"https://mgimond.github.io/tukeyedar/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Manuel Gimond. Author, maintainer.","code":""},{"path":"https://mgimond.github.io/tukeyedar/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"text","code":"@Misc{, title = {tukeyedar: A package of Tukey inspired EDA functions}, author = {Manuel Gimond}, url = {https://mgimond.github.io/tukeyedar/}, year = {2021}, }"},{"path":"https://mgimond.github.io/tukeyedar/index.html","id":"tukeyedar","dir":"","previous_headings":"","what":"Tukey Inspired Exploratory Data Analysis Functions","title":"Tukey Inspired Exploratory Data Analysis Functions","text":"The tukeyedar package houses data exploration tools. Many functions are inspired by work published by Tukey (1977), Hoaglin (1983), Velleman and Hoaglin (1981), and Cleveland (1993). Note that this package is in beta mode, so use at your own discretion.","code":""},{"path":"https://mgimond.github.io/tukeyedar/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tukey Inspired Exploratory Data Analysis Functions","text":"You can install the development version of tukeyedar from GitHub: Note that the vignettes will not be automatically generated with this command; the vignettes are, however, available on this website (see the next section). If you want a local version of the vignettes, add the build_vignettes = TRUE parameter. If, for some reason, the vignettes are not created, you might want to re-install the package with the force=TRUE parameter.","code":"# install.packages(\"devtools\") devtools::install_github(\"mgimond/tukeyedar\") devtools::install_github(\"mgimond/tukeyedar\", build_vignettes = TRUE) devtools::install_github(\"mgimond/tukeyedar\", build_vignettes = TRUE, force=TRUE)"},{"path":"https://mgimond.github.io/tukeyedar/index.html","id":"vignettes","dir":"","previous_headings":"","what":"Vignettes","title":"Tukey Inspired Exploratory Data Analysis Functions","text":"It’s strongly recommended that you read the vignettes. These
can be accessed from the website: a detailed rundown of the resistant line function, the median polish, and the empirical QQ plot. If you chose to have the vignettes locally created when you installed the package, you can view them locally via vignette(\"RLine\", package = \"tukeyedar\"). If you use a dark themed IDE, the vignettes may not render well, so you might opt to view them in a web browser via functions like RShowDoc(\"RLine\", package = \"tukeyedar\").","code":""},{"path":"https://mgimond.github.io/tukeyedar/index.html","id":"using-the-functions","dir":"","previous_headings":"","what":"Using the functions","title":"Tukey Inspired Exploratory Data Analysis Functions","text":"All functions start with eda_. For example, to generate a three point summary plot of mpg vs. disp from the mtcars dataset, type: Note that the functions are pipe friendly. For example, the following works:","code":"library(tukeyedar) eda_3pt(mtcars, disp, mpg) # Using R >= 4.1 mtcars |> eda_3pt(disp, mpg) # Using magrittr (or any of the tidyverse packages) library(magrittr) mtcars %>% eda_3pt(disp, mpg)"},{"path":"https://mgimond.github.io/tukeyedar/reference/age_height.html","id":null,"dir":"Reference","previous_headings":"","what":"Age vs. height for private and rural school children — age_height","title":"Age vs. height for private and rural school children — age_height","text":"These data are reproduced from Hoaglin et al.'s book and were originally sourced from Bernard G. Greenberg (1953), American Journal of Public Health (vol 43, pp. 692-699). The dataset tabulates children's ages and heights for urban private and rural public schools.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/age_height.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age vs. height for private and rural school children — age_height","text":"","code":"age_height"},{"path":"https://mgimond.github.io/tukeyedar/reference/age_height.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age vs.
height for private and rural school children — age_height","text":"A data frame with 18 rows and 2 variables: Months Child's age in months Height Child's height in cm","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/age_height.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age vs. height for private and rural school children — age_height","text":"Understanding Robust and Exploratory Data Analysis, D.C. Hoaglin, F. Mosteller and J.W. Tukey (page 135)","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":null,"dir":"Reference","previous_headings":"","what":"3-point summary plot — eda_3pt","title":"3-point summary plot — eda_3pt","text":"eda_3pt splits the data into 3 groups (whose summary locations are defined by their respective medians), with two half slopes linking the groups. The function returns a scatter plot showing the half-slopes as red solid lines. The solid grey slope linking the tail-end groups shows the desired shape for the half-slopes. The goal is to have the two half slopes line up as closely as possible with the solid grey slope via re-expression techniques when seeking a linear relationship between the variables. The function also returns the half-slopes ratio hsrtio and the direction of re-expression for the X and Y values on the ladder of powers.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"3-point summary plot — eda_3pt","text":"","code":"eda_3pt( dat, x, y, px = 1, py = 1, tukey = TRUE, axes = TRUE, pch = 21, equal = TRUE, p.col = \"grey50\", p.fill = \"grey80\", size = 0.8, alpha = 0.7, xlab = NULL, ylab = NULL, dir = TRUE, grey = 0.6, ...
)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"3-point summary plot — eda_3pt","text":"dat Data frame x Column name assigned to the x axis y Column name assigned to the y axis px Power transformation to apply to the x-variable py Power transformation to apply to the y-variable tukey Boolean determining if a Tukey transformation should be adopted (FALSE adopts a Box-Cox transformation) axes Boolean determining if axes should be drawn. pch Point symbol type equal Boolean determining if the axes lengths should match (i.e. a square plot). p.col Color for the point symbol. p.fill Point fill color passed to bg (only used for pch ranging from 21-25). size Point size (0-1) alpha Point transparency (0 = transparent, 1 = opaque). Only applicable if rgb() is not used to define point colors. xlab X label for the output plot ylab Y label for the output plot dir Boolean indicating if the suggested ladder of powers direction should be displayed grey Grey level to apply to plot elements (0 to 1, with 1 = black) ... Other parameters passed to the graphics::plot function.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"3-point summary plot — eda_3pt","text":"Generates a plot and returns a list with the following named components: hsrtio: The ratio between the slopes. A value close to one suggests that no transformation is needed. xmed: The x-coordinate values of the three summary points. ymed: The y-coordinate values of the three summary points.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"3-point summary plot — eda_3pt","text":"Computes the three-point summary originally defined in Tukey's EDA book (see reference).","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"3-point summary plot — eda_3pt","text":"Velleman, P. F., and D. C. Hoaglin. 1981.
Applications, Basics Computing Exploratory Data Analysis. Boston: Duxbury Press. D. C. Hoaglin, F. Mosteller, J. W. Tukey. 1983. Understanding Robust Exploratory Data Analysis. Wiley. Tukey, John W. 1977. Exploratory Data Analysis. Addison-Wesley.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"3-point summary plot — eda_3pt","text":"","code":"hsratio <- eda_3pt(cars, speed, dist) hsratio <- eda_3pt(cars, speed, dist, py = 1/3, ylab=expression(\"Dist\"^{1/3}))"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":null,"dir":"Reference","previous_headings":"","what":"Add graphical EDA elements to existing plot — eda_add","title":"Add graphical EDA elements to existing plot — eda_add","text":"eda_add adds graphical EDA elements scatter plot. Currently adds eda_rline fit points.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add graphical EDA elements to existing plot — eda_add","text":"","code":"eda_add( x, pch = 24, p.col = \"darkred\", p.fill = \"yellow\", lty = 1, l.col = \"darkred\" )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add graphical EDA elements to existing plot — eda_add","text":"x Object class eda_rline pch Point symbol type p.col Point color passed col p.fill Point fill color passed bg (used pch ranging 21-25) lty Line type l.col Line color","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add graphical EDA elements to existing plot — eda_add","text":"Returns eda_rline intercept slope. 
a: Intercept b: Slope","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Add graphical EDA elements to existing plot — eda_add","text":"function adds eda_rline slope 3-pt summary points existing scatter plot. See accompanying vignette Resistant Line detailed breakdown resistant line technique.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add graphical EDA elements to existing plot — eda_add","text":"","code":"eda_lm(mtcars, x = wt, y = mpg) #> (Intercept) x #> 37.285126 -5.344472 Mr <- eda_rline(mtcars, x=wt, y=mpg) eda_add(Mr, l.col = \"blue\") #> $a #> [1] 37.763 #> #> $b #> [1] -5.524372 #>"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":null,"dir":"Reference","previous_headings":"","what":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"Re-expresses vector ladder powers. Requires eda_3pt() function.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"","code":"eda_bipow(dat, x, y, p = c(-1, 0, 0.5, 1, 2), tukey = TRUE, ...)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"dat Data frame x Column name assigned x axis y Column name assigned y axis p Vector powers tukey set TRUE, adopt Tukey's power transformation. 
FALSE, adopt Box-Cox transformation. ... parameters passed graphics::plot function.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"No return value","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"Generates matrix scatter plots boxplots various re-expressions x y values. 3-point summary associated half-slopes also plotted (function makes use eda_3pt function). values re-expressed using either Tukey power transformation (default) Box-Cox transformation (see eda_re information transformation techniques). Axes labels omitted reduce plot clutter.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"Tukey, John W. 1977. Exploratory Data Analysis. 
Addison-Wesley.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"","code":"data(cars) # Example 1 eda_bipow(dat = cars, x = speed, y = dist) # Custom powers eda_bipow(dat = cars, x = speed, y = dist, p = c(-1, -0.5, 0, 0.5, 1)) # Adopt box-cox transformation eda_bipow(dat = cars, x = speed, y = dist, tukey = FALSE, p = c(-1, -0.5, 0, 0.5, 1))"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":null,"dir":"Reference","previous_headings":"","what":"Boxplots equalized by level and spread — eda_boxls","title":"Boxplots equalized by level and spread — eda_boxls","text":"eda_boxls creates boxplots conditioned one variable providing option equalize spreads and/or levels.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Boxplots equalized by level and spread — eda_boxls","text":"","code":"eda_boxls( dat, x, fac, p = 1, tukey = FALSE, outlier = TRUE, out.txt = NULL, type = \"none\", notch = FALSE, horiz = FALSE, outliers = TRUE, xlab = NULL, ylab = NULL, grey = 0.6, reorder = TRUE, reorder.stat = \"median\" )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Boxplots equalized by level and spread — eda_boxls","text":"dat Data frame x Column name assigned values fac Column name assigned factor values conditioned p Power transformation apply variable tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation) outlier Boolean indicating outliers plotted out.txt Column whose values used label outliers type Plot type. 
\"none\" = equalization ; \"l\" = equalize level; \"ls\" = equalize level spread notch Boolean determining notches added. horiz plot horizontally (TRUE) vertically (FALSE) outliers plot outliers (TRUE) (FALSE) xlab X label output plot ylab Y label output plot grey Grey level apply plot elements (0 1 1 = black) reorder Boolean determining factors reordered based median, upper quartile lower quartile (set reorder.stat). reorder.stat Statistic reorder level reorder set TRUE. Either \"median\", \"upper\" (upper quartile) \"lower\" (lower quartile). type set value \"none\", argument ignored stat defaults \"median\".","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Boxplots equalized by level and spread — eda_boxls","text":"No values returned","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Boxplots equalized by level and spread — eda_boxls","text":"default, boxplots re-ordered median values. outlier text displayed value, modified data equalized level spread. Note notch offers 95 percent test null true medians equal assuming distribution batch approximately normal. notches overlap, can assume medians significantly different 0.05 level. Note notches correct multiple comparison issues three batches plotted.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Boxplots equalized by level and spread — eda_boxls","text":"","code":"# A basic boxplot. The outlier is labeled with the row number by default. eda_boxls(mtcars,mpg, cyl, type=\"none\") # A basic boxplot. The outlier is labeled with its own value. eda_boxls(mtcars,mpg, cyl, type=\"none\", out.txt=mpg ) # Boxplot equalized by level. 
Note that the outlier text is labeled with its # original value. eda_boxls(mtcars,mpg, cyl, type=\"l\", out.txt=mpg ) #> ======================== #> Note that the data have been equalized with \"type\" set to \"l\". #> ======================== # Boxplots equalized by level and spread eda_boxls(mtcars,mpg, cyl, type=\"ls\", out.txt=mpg ) #> ======================== #> Note that the data have been equalized with \"type\" set to \"ls\". #> ======================== # Hide outlier eda_boxls(mtcars,mpg, cyl, type=\"ls\", out.txt=mpg , outlier=FALSE) #> ======================== #> Note that the data have been equalized with \"type\" set to \"ls\". #> ======================== # Equalizing level helps visualize increasing spread with increasing # median value food <- read.csv(\"http://mgimond.github.io/ES218/Data/Food_web.csv\") eda_boxls(food, mean.length, dimension, type = \"l\") #> ======================== #> Note that the data have been equalized with \"type\" set to \"l\". #> ======================== # For long factor level names, flip plot eda_boxls(iris, Sepal.Length, Species, out.txt=Sepal.Length , horiz = TRUE) # By default, plots are ordered by their medians. 
singer <- lattice::singer eda_boxls(singer, height, voice.part, out.txt=height, horiz = TRUE) # To order by top quartile, set reorder.stat to \"upper\" eda_boxls(singer, height, voice.part, out.txt=height, horiz = TRUE, reorder.stat = \"upper\")"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":null,"dir":"Reference","previous_headings":"","what":"Overlapping density distributions for two variables — eda_dens","title":"Overlapping density distributions for two variables — eda_dens","text":"eda_dens generates overlapping density distributions two variables.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Overlapping density distributions for two variables — eda_dens","text":"","code":"eda_dens( x, y, fac = NULL, p = 1L, tukey = FALSE, fx = NULL, fy = NULL, grey = 0.6, col = \"red\", size = 0.8, alpha = 0.4, xlab = NULL, ylab = NULL, legend = TRUE, ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Overlapping density distributions for two variables — eda_dens","text":"x Vector first variable dataframe. y Vector second variable column defining continuous variable x dataframe. fac Column defining grouping variable x dataframe. p Power transformation apply sets values. tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation). fx Formula apply x variable. computed transformation applied x variable. fy Formula apply y variable. computed transformation applied y variable. grey Grey level apply plot elements (0 1 1 = black). col Fill color second density distribution. size Point size (0-1). alpha Fill transparency (0 = transparent, 1 = opaque). applicable rgb() used define fill colors. xlab X variable label. Ignored x dataframe. ylab Y variable label. Ignored x dataframe. 
legend Boolean determining legend added plot. ... Arguments passed stats::density() function.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Overlapping density distributions for two variables — eda_dens","text":"No return value.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Overlapping density distributions for two variables — eda_dens","text":"function generate overlapping density plots first variable assigned grey color second variable assigned default red color.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Overlapping density distributions for two variables — eda_dens","text":"","code":"# Passing data as two separate vector objects set.seed(207) x <- rbeta(1000,2,8) y <- x * 1.5 + 0.1 eda_dens(x, y) # Passing data as a dataframe dat <- data.frame(val = c(x, y), grp = c(rep(\"x\", length(x)), rep(\"y\", length(y)))) eda_dens(dat, val, grp)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":null,"dir":"Reference","previous_headings":"","what":"Least Squares regression plot (with optional LOESS fit) — eda_lm","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"eda_lm generates scatter plot fitted regression line. loess line can also added plot model comparison. 
axes scaled respective standard deviations match axes unit length.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"","code":"eda_lm( dat, x, y, xlab = NULL, ylab = NULL, px = 1, py = 1, tukey = FALSE, show.par = TRUE, reg = TRUE, w = NULL, sd = TRUE, mean.l = TRUE, grey = 0.6, pch = 21, p.col = \"grey50\", p.fill = \"grey80\", size = 0.8, alpha = 0.8, q = FALSE, q.val = c(0.16, 0.84), q.type = 5, loe = FALSE, lm.col = rgb(1, 0.5, 0.5, 0.8), loe.col = rgb(0.3, 0.3, 1, 1), stats = FALSE, stat.size = 0.8, loess.d = list(family = \"symmetric\", span = 0.7, degree = 1), ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"dat Data frame. x Column assigned x axis. y Column assigned y axis. xlab X label output plot. ylab Y label output plot. px Power transformation apply x-variable. py Power transformation apply y-variable. tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation). show.par Boolean determining power transformation displayed plot. reg Boolean indicating whether least squares regression line plotted. w Weight pass regression model. sd Boolean determining standard deviation lines plotted. mean.l Boolean determining x y mean lines added plot. grey Grey level apply plot elements (0 1 1 = black). pch Point symbol type. p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1). alpha Point transparency (0 = transparent, 1 = opaque). applicable rgb() used define point colors. q Boolean determining grey quantile boxes plotted. q.val F-values use define quantile box parameters. Defaults mid 68 used generate box. 
q.type Quantile type. Defaults 5 (Cleveland's f-quantile definition). loe Boolean indicating loess curve fitted. lm.col Regression line color. loe.col LOESS curve color. stats Boolean indicating regression summary statistics displayed. stat.size Text size stats output plot. loess.d list parameters passed loess.smooth function. robust loess used default. ... used.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"Returns residuals, intercept slope OLS fit. residuals: Regression model residuals a: Intercept b: Slope","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"function plot OLS regression line , requested, loess fit. plot also display +/- 1 standard deviations dashed lines. theory, x y values follow perfectly Normal distribution, roughly 68 percent points fall lines. true 68 percent values can displayed grey rectangles setting q=TRUE. uses quantile function compute upper lower bounds defining inner 68 percent values. data follow Normal distribution, grey rectangle edges coincide +/- 1SD dashed lines. wish show interquartile ranges (IQR) instead inner 68 percent values, simply set q.val = c(0.25,0.75). plot option re-express values via px py arguments. note re-expression produces NaN values (negative value logged) points removed plot. result fewer observations plotted. observations removed result re-expression warning message displayed console. re-expression powers shown upper right side plot. 
suppress display re-expressions set show.par = FALSE.","code":""},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"","code":"# Add a regular (OLS) regression model and loess smooth to the data eda_lm(mtcars, wt, mpg, loe = TRUE) #> (Intercept) x #> 37.285126 -5.344472 # Add the inner 68% quantile to compare the true 68% of data to the SD eda_lm(mtcars, wt, mpg, loe = TRUE, q = TRUE) #> (Intercept) x #> 37.285126 -5.344472 # Show the IQR box eda_lm(mtcars, wt, mpg, loe = TRUE, q = TRUE, sd = FALSE, q.val = c(0.25,0.75)) #> (Intercept) x #> 37.285126 -5.344472 # Fit an OLS to the Income for Female vs Male df2 <- read.csv(\"https://mgimond.github.io/ES218/Data/Income_education.csv\") eda_lm(df2, x=B20004013, y = B20004007, xlab = \"Female\", ylab = \"Male\", loe = TRUE) #> (Intercept) x #> 10503.090485 1.086416 # Add the inner 68% quantile to compare the true 68% of data to the SD eda_lm(df2, x = B20004013, y = B20004007, xlab = \"Female\", ylab = \"Male\", q = TRUE) #> (Intercept) x #> 10503.090485 1.086416 # Apply a transformation to x and y axes: x -> 1/3 and y -> log eda_lm(df2, x = B20004013, y = B20004007, xlab = \"Female\", ylab = \"Male\", px = 1/3, py = 0, q = TRUE, loe = TRUE) #> (Intercept) x #> 8.58646713 0.02287702"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":null,"dir":"Reference","previous_headings":"","what":"Tukey's letter value summaries — eda_lsum","title":"Tukey's letter value summaries — eda_lsum","text":"eda_lsum letter value summary introduced John Tukey extends boxplot's 5 number summary exploring symmetry batch depth levels half (median) fourth 
(quartiles).","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tukey's letter value summaries — eda_lsum","text":"","code":"eda_lsum(x, l = 5, all = TRUE)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tukey's letter value summaries — eda_lsum","text":"x Vector l Number levels (max = 9) all Generate upper, lower mid summaries TRUE just generate mid summaries FALSE","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tukey's letter value summaries — eda_lsum","text":"Returns dataframe letter value summaries.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tukey's letter value summaries — eda_lsum","text":"Outputs data frame letter value summaries.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Tukey's letter value summaries — eda_lsum","text":"Exploratory Data Analysis, John Tukey, 1977.","code":""},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tukey's letter value summaries — eda_lsum","text":"","code":"x <- c(22, 8, 11, 3, 26, 1, 14, 18, 20, 25, 24) eda_lsum(x) #> letter depth lower mid upper spread #> 1 M 6.0 18.0 18.00 18.0 0.0 #> 2 H 3.5 9.5 16.25 23.0 13.5 #> 3 E 2.0 3.0 14.00 25.0 22.0 #> 4 D 1.5 2.0 13.75 25.5 23.5 #> 5 C 1.0 1.0 13.50 26.0 
25.0"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":null,"dir":"Reference","previous_headings":"","what":"Normal fit vs density plot. — eda_normfit","title":"Normal fit vs density plot. — eda_normfit","text":"eda_normfit generates fitted Normal distribution data option compare density distribution.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Normal fit vs density plot. — eda_normfit","text":"","code":"eda_normfit( dat, x = NULL, grp = NULL, p = 1, tukey = FALSE, show.par = TRUE, sq = TRUE, inner = 0.6826, dens = TRUE, bw = \"SJ-dpi\", kernel = \"gaussian\", pch = 16, size = 0.8, alpha = 0.3, p.col = \"grey50\", p.fill = \"grey80\", grey = 0.7, col.ends = \"grey90\", col.mid = \"#EBC89B\", col.ends.dens = \"#EBC89B\", col.mid.dens = \"grey90\", offset = 0.02, tsize = 1.5, xlab = NULL, ylab = NULL, ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Normal fit vs density plot. — eda_normfit","text":"dat Vector values dataframe. x Column values dat dataframe, ignored otherwise. grp Column grouping variables dat dataframe, ignored otherwise. p Power transformation apply sets values. tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation). show.par Boolean determining power transformation displayed plot's upper-right corner. sq Boolean determining plot square. inner Fraction values captured inner color band normal density plots. Defaults 0.6826 (inner 68 values). dens Boolean determining density plot displayed alongside normal fit plot. bw Bandwidth parameter passed built-in density function. kernel Kernel parameter passed built-in density function. pch Point symbol type. size Point size. alpha Fill transparency (0 = transparent, 1 = opaque). 
applicable rgb() used define fill colors. p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). grey Grey level apply plot elements (0 1 1 = black). col.ends Fill color ends Normal distribution. col.mid Fill color middle band Normal distribution. col.ends.dens Fill color ends density distribution. col.mid.dens Fill color middle band density distribution. offset value (x-axis units) defines gap left right side plots. Ignored dens TRUE. tsize Size plot title. xlab X variable label. ylab Y variable label. ... Note used.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Normal fit vs density plot. — eda_normfit","text":"No return value.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Normal fit vs density plot. — eda_normfit","text":"function generate (symmetrical) Normal distribution fitted data dens set FALSE side-by-side density/Normal fit plot dens set TRUE. latter, density plot left side Normal fit right side vertical axis. plots two fill colors: one inner band outer band. inner band shows area curve encompasses desired fraction values defined inner. default, value 0.6826, 68.26 percent (roughly percentage values covered +/- 1 standard deviations Normal distribution). Normal fit plot, range computed theoretical Normal actual values. density plot, range computed actual values. default, colors inverted Normal curve density curve. density plot drawn, Normal plot colors identical vertical axis. density plot desired, dens = TRUE, gap (defined offset) created left side density plot right side Normal fit plot. Points showing location values y-axis also added help view distribution relative density Normal fit curves. function makes use built-in density function. can pass bw kernel parameters density function. 
Measures centrality computed differently Normal fit density plots. mean computed Normal fit plot median computed density plot. measures centrality shown black horizontal lines plot. areas density Normal fit plots scaled peak values, respectively. , areas compared plots.","code":""},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Normal fit vs density plot. — eda_normfit","text":"","code":"# Explore a skewed distribution set.seed(218) x <- rexp(500) # Generate base histogram hist(x) # Plot density/Normal fit plot eda_normfit(x) eda_normfit(x) # Limit the plot to just a Normal fit eda_normfit(x, dens = FALSE) #> #> !!!!!!!!!!!!!!!!!!!!!!!! #> Note that this is not a density plot. #> It's the Normal characterization of the data #> using the data's standard deviation. #> !!!!!!!!!!!!!!!!!!!!!!!! #> # The inner band's range can be modified. Here, we view the inter-quartile # range, +/- 1 standard deviation range and the inner 95% range) OP <- par(mfrow = c(1,3)) invisible(sapply(c(0.5, 0.6826, 0.95), function(prop) eda_normfit(x, inner = prop, tsize = 1, ylab = paste(prop*100,\"% of values\")))) par(OP) # The bandwidth selector can also be specified OP <- par(mfrow=c(2,3)) invisible(sapply(c(\"SJ-dpi\", \"nrd0\", \"nrd\", \"SJ-ste\", \"bcv\", \"ucv\" ), function(band) eda_normfit(x, bw = band, tsize=0.9, size=0, offset=0.005, ylab = band))) par(OP) # The bandwidth argument can also be passed a numeric value OP <- par(mfrow=c(1,3)) invisible(sapply(c(0.2, 0.1, 0.05 ), function(band) eda_normfit(x, bw = band, tsize=1,size=.5, offset=0.01, ylab = band))) par(OP) # Examples of a few kernel options OP <- par(mfrow=c(1,3)) invisible(sapply(c(\"gaussian\", \"optcosine\", \"rectangular\" ), function(k) eda_normfit(x, kernel = k, tsize=1, size=.5, offset=0.01, ylab = k))) par(OP) # Another example where data are passed as a dataframe set.seed(540) dat <- 
data.frame(value = rbeta(20, 1, 50), grp = sample(letters[1:3], 100, replace = TRUE)) eda_normfit(dat, value, grp) # Points can be removed and the gap rendered narrower eda_normfit(dat, value, grp, size = 0, offset = 0.01) # Color can be modified. Here we modify the density plot fill colors eda_normfit(dat, value, grp, size = 0, offset = 0.01, col.ends.dens = \"#A1D99B\", col.mid.dens = \"#E5F5E0\") # A power transformation can be applied to the data. Here # we'll apply a log transformation eda_normfit(dat, value, grp, p = 0)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":null,"dir":"Reference","previous_headings":"","what":"Polish two-way tables — eda_pol","title":"Polish two-way tables — eda_pol","text":"eda_pol Polishes two-way tables using median, means, customizable functions.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Polish two-way tables — eda_pol","text":"","code":"eda_pol( x, row = NULL, col = NULL, val = NULL, stat = median, plot = TRUE, eps = 0.01, maxiter = 5, sort = FALSE, p = 1, tukey = FALSE, offset = 1e-05, col.quant = FALSE, colpal = \"RdYlBu\", adj.mar = TRUE, res.size = 1, row.size = 1, col.size = 1, res.txt = TRUE, label.txt = TRUE )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Polish two-way tables — eda_pol","text":"x three column data frame row Name column assigned row effect col Name column assigned column effect val Name column assigned response variable stat Polishing statistic (default median) plot Boolean determining output plot generated eps Convergence tolerance parameter maxiter Maximum number iterations sort Boolean determining effects row/columns sorted p Re-expression power parameter tukey Boolean determining Tukey's power transformation used. 
FALSE, Box-Cox transformation adopted. offset Offset add values least one value 0 power negative col.quant Boolean determining quantile classification scheme used colpal Color palette adopt adj.mar Boolean determining margin width needs accommodate labels res.size Size residual values plot [0-1] row.size Size row effect values plot [0-1] col.size Size column effect values plot [0-1] res.txt Boolean determining values added plot label.txt Boolean determining margin column labels plotted","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Polish two-way tables — eda_pol","text":"list class eda_polish following named components: long median polish residuals three columns: Column levels, row levels residual values. wide median polish residuals table wide form. row Row effects table col Column effects table global Overall value (common value) iter Number iterations polish stabilizes. cv Table residuals, row effects, column effects CV values long form. power Transformation power applied values prior polishing. IQ_row Ratio interquartile row effect values 80th quantile residuals. IQ_col Ratio interquartile column effect values 80th quantile residuals.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Polish two-way tables — eda_pol","text":"function performs polish two way table. default, applies median polish, statistical summaries mean can passed function via stat = parameter. function returns list row/column effects along global residual values. 
also generate colored table plot = TRUE.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Polish two-way tables — eda_pol","text":"","code":"df <- data.frame(region = rep( c(\"NE\", \"NC\", \"S\", \"W\"), each = 5), edu = rep( c(\"ed8\", \"ed9to11\", \"ed12\", \"ed13to15\", \"ed16\"), 4), perc = c(25.3, 25.3, 18.2, 18.3, 16.3, 32.1, 29, 18.8, 24.3, 19, 38.8, 31, 19.3, 15.7, 16.8, 25.4, 21.1, 20.3, 24, 17.5)) M <- eda_pol(df, row = region, col = edu, val = perc, plot = FALSE) plot(M)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":null,"dir":"Reference","previous_headings":"","what":"QQ and MD plots — eda_qq","title":"QQ and MD plots — eda_qq","text":"eda_qq Generates empirical Normal QQ plot well Tukey mean-difference plot.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"QQ and MD plots — eda_qq","text":"","code":"eda_qq( x, y = NULL, fac = NULL, norm = FALSE, p = 1L, tukey = FALSE, md = FALSE, q.type = 5, fx = NULL, fy = NULL, plot = TRUE, show.par = TRUE, grey = 0.6, pch = 21, p.col = \"grey50\", p.fill = \"grey80\", size = 0.8, alpha = 0.8, q = TRUE, b.val = c(0.25, 0.75), l.val = c(0.125, 0.875), xlab = NULL, ylab = NULL, title = NULL, t.size = 1.2, ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"QQ and MD plots — eda_qq","text":"x Vector first variable dataframe. y Vector second variable column defining continuous variable x dataframe. fac Column defining grouping variable x dataframe. norm Boolean determining Normal QQ plot generated. p Power transformation apply sets values. tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation). 
md Boolean determining Tukey mean-difference plot generated. q.type integer 1 9 selecting one nine quantile algorithms. (See quantile function). fx Formula apply x variable. computed transformation applied x variable. fy Formula apply y variable. computed transformation applied y variable. plot Boolean determining plot generated. show.par Boolean determining parameters power transformation formula displayed. grey Grey level apply plot elements (0 1 1 = black). pch Point symbol type. p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1) alpha Point transparency (0 = transparent, 1 = opaque). applicable rgb() used define point colors. q Boolean determining grey quantile boxes plotted. b.val Quantiles define quantile box parameters. Defaults IQR. Two values needed. l.val Quantiles define quantile line parameters. Defaults mid 75% values. Two values needed. xlab X label output plot. Ignored x dataframe. ylab Y label output plot. Ignored x dataframe. title Title add plot. t.size Title size. ... used","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"QQ and MD plots — eda_qq","text":"Returns list following components: x: X values. May interpolated smallest quantile batch. Values reflect power transformation defined p. b: Y values. May interpolated smallest quantile batch. Values reflect power transformation defined p. p: Re-expression applied original values. fx: Formula applied x variable. fy: Formula applied y variable.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"QQ and MD plots — eda_qq","text":"function used generate empirical QQ plot, plot displays IQR via grey boxes x y values. box widths can changed via b.val argument. plot also display mid 75% values via light colored dashed lines. 
line positions can changed via l.val argument. middle dashed line represents batch's median value. Console output prints suggested multiplicative additive offsets. See QQ plot vignette introduction use interpretation. function can also used generate Normal QQ plot norm argument set TRUE. case, line parameters l.val overridden set +/- 1 standard deviations. Note \"suggested offsets\" output disabled, can generate M-D version Normal QQ plot. Also note formula argument ignored mode.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"QQ and MD plots — eda_qq","text":"","code":"# Passing data as a dataframe singer <- lattice::singer dat <- singer[singer$voice.part %in% c(\"Bass 2\", \"Tenor 1\"), ] eda_qq(dat, height, voice.part) #> [1] \"Suggested offsets:y = x * 0.8571 + (12.4286)\" # Passing data as two separate vector objects bass2 <- subset(singer, voice.part == \"Bass 2\", select = height, drop = TRUE ) tenor1 <- subset(singer, voice.part == \"Tenor 1\", select = height, drop = TRUE ) eda_qq(bass2, tenor1) #> [1] \"Suggested offsets:y = x * 1.04 + (-5.22)\" # There seems to be an additive offset of about 2 inches eda_qq(bass2, tenor1, fx = \"x - 2\") #> [1] \"Suggested offsets:y = x * 1.04 + (-5.22)\" # We can fine-tune by generating the Tukey mean-difference plot eda_qq(bass2, tenor1, fx = \"x - 2\", md = TRUE) #> [1] \"Suggested offsets:y = x * 1.04 + (-5.22)\" # An offset of another 0.5 inches seems warranted # We can say that overall, bass2 singers are 2.5 inches taller than tenor1. # The offset is additive. 
eda_qq(bass2, tenor1, fx = \"x - 2.5\", md = TRUE) #> [1] \"Suggested offsets:y = x * 1.04 + (-5.22)\" # Example 2: Sepal width setosa <- subset(iris, Species == \"setosa\", select = Petal.Width, drop = TRUE) virginica <- subset(iris, Species == \"virginica\", select = Petal.Width, drop = TRUE) eda_qq(setosa, virginica) #> [1] \"Suggested offsets:y = x * 1.7143 + (1.6286)\" # The points are not completely parallel to the 1:1 line suggesting a # multiplicative offset. The slope may be difficult to eyeball. The function # outputs a suggested slope and intercept. We can start with that eda_qq(setosa, virginica, fx = \"x * 1.7143\") #> [1] \"Suggested offsets:y = x * 1.7143 + (1.6286)\" # Now let's add the suggested additive offset. eda_qq(setosa, virginica, fx = \"x * 1.7143 + 1.6286\") #> [1] \"Suggested offsets:y = x * 1.7143 + (1.6286)\" # We can confirm this value via the mean-difference plot # Overall, we have both a multiplicative and additive offset between the # species' petal widths. 
eda_qq(setosa, virginica, fx = \"x * 1.7143 + 1.6286\", md = TRUE) #> [1] \"Suggested offsets:y = x * 1.7143 + (1.6286)\" # Function can also generate a Normal QQ plot eda_qq(bass2, norm = TRUE)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":null,"dir":"Reference","previous_headings":"","what":"Re-expression function — eda_re","title":"Re-expression function — eda_re","text":"eda_re re-expresses vector following Tukey box-cox transformation.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Re-expression function — eda_re","text":"","code":"eda_re(x, p = 0, tukey = TRUE)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Re-expression function — eda_re","text":"x Vector p Power transformation tukey set TRUE, adopt Tukey's power transformation, FALSE, adopt Box-Cox transformation","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Re-expression function — eda_re","text":"Returns vector length input x","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Re-expression function — eda_re","text":"function used re-express data using one two transformation techniques: Box-Cox transformation (tukey = FALSE)Tukey's power transformation (tukey = TRUE).","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Re-expression function — eda_re","text":"","code":"x <- c(15, 28, 17, 73, 8, 83, 2) eda_re(x, p=-1/3) #> [1] 0.4054801 0.3293169 0.3889111 0.2392723 0.5000000 0.2292489 
0.7937005"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":null,"dir":"Reference","previous_headings":"","what":"Tukey's resistant line — eda_rline","title":"Tukey's resistant line — eda_rline","text":"eda_rline R implementation Hoaglin, Mosteller Tukey's resistant line technique outlined chapter 5 \"Understanding Robust Exploratory Data Analysis\" (Wiley, 1983).","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tukey's resistant line — eda_rline","text":"","code":"eda_rline(dat, x, y, px = 1, py = 1, tukey = TRUE, iter = 20)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tukey's resistant line — eda_rline","text":"dat Data frame. x Column assigned x axis. y Column assigned y axis. px Power transformation apply x-variable. py Power transformation apply y-variable. tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation). iter Maximum number iterations run.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tukey's resistant line — eda_rline","text":"Returns list class eda_rline with following named components: a: Intercept b: Slope res: Residuals sorted x-values x: Sorted x values y: y values following sorted x-values xmed: Median x values third ymed: Median y values third index: Index sorted x values defining upper boundaries thirds xlab: X label name ylab: Y label name iter: Number iterations","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tukey's resistant line — eda_rline","text":"R implementation RLIN.F FORTRAN code Velleman et al.'s book. 
function fits robust line using three-point summary strategy whereby data split three equal length groups along x-axis line fitted medians defining group via iterative process. function mirrors built-in stats::line function fitting strategy outputs additional parameters. See accompanying vignette Resistant Line detailed breakdown resistant line technique.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Tukey's resistant line — eda_rline","text":"Velleman, P. F., D. C. Hoaglin. 1981. Applications, Basics Computing Exploratory Data Analysis. Boston: Duxbury Press. D. C. Hoaglin, F. Mosteller, J. W. Tukey. 1983. Understanding Robust Exploratory Data Analysis. Wiley.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tukey's resistant line — eda_rline","text":"","code":"# This first example uses breast cancer data from \"ABC's of EDA\" page 127. # The output model's parameters should closely match: Y = -46.19 + 2.89X # The plot shows the original data with a fitted resistant line (red) # and a regular lm fitted line (dashed line), and the modeled residuals. # The 3-point summary dots are shown in red. 
M <- eda_rline(neoplasms, Temp, Mortality) M #> $b #> [1] 2.890173 #> #> $a #> [1] -45.90578 #> #> $res #> [1] 21.2982659 0.1398844 -2.1791908 8.8294798 -11.2485549 -7.6167630 #> [7] -0.1398844 4.7589595 -9.0092486 -2.1994220 2.7554913 -7.2676301 #> [13] -0.3907514 6.1861272 1.7971098 0.1398844 #> #> $x #> [1] 31.8 34.0 40.2 42.1 42.3 43.5 44.2 45.1 46.3 47.3 47.8 48.5 49.2 49.9 50.0 #> [16] 51.3 #> #> $y #> [1] 67.3 52.5 68.1 84.6 65.1 72.2 81.7 89.2 78.9 88.6 95.0 87.0 #> [13] 95.9 104.5 100.4 102.5 #> #> $xmed #> [1] 40.2 45.7 49.9 #> #> $ymed #> [1] 67.30 85.15 100.40 #> #> $index #> [1] 5 11 16 #> #> $xlab #> [1] \"Temp\" #> #> $ylab #> [1] \"Mortality\" #> #> $px #> [1] 1 #> #> $py #> [1] 1 #> #> $iter #> [1] 4 #> #> attr(,\"class\") #> [1] \"eda_rline\" # Plot the output (red line is the resistant line) plot(M) # Add a traditional OLS regression line (dashed line) abline(lm(Mortality ~ Temp, neoplasms), lty = 3) # Plot the residuals plot(M, type = \"residuals\") # This next example models gas consumption as a function of engine displacement. # It applies a transformation to both variables via the px and py arguments. eda_3pt(mtcars, disp, mpg, px = -1/3, py = -1, ylab = \"gal/mi\", xlab = expression(\"Displacement\"^{-1/3})) #> $slope1 #> [1] -0.3633401 #> #> $slope2 #> [1] -0.4098173 #> #> $hsrtio #> [1] 1.127916 #> #> $xmed #> [1] 0.1405721 0.1813741 0.2190819 #> #> $ymed #> [1] 0.06690834 0.05208333 0.03663004 #> # This next example uses Andrew Siegel's pathological 9-point dataset to test # for model stability when convergence cannot be reached. 
M <- eda_rline(nine_point, X, Y) plot(M)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":null,"dir":"Reference","previous_headings":"","what":"Tukey's spread-level function — eda_sl","title":"Tukey's spread-level function — eda_sl","text":"eda_sl function generates spread-level table univariate dataset.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tukey's spread-level function — eda_sl","text":"","code":"eda_sl( dat, x, fac, p = 1, tukey = FALSE, sprd = \"frth\", plot = TRUE, grey = 0.6, pch = 21, p.col = \"grey50\", p.fill = \"grey80\", size = 1, alpha = 0.8 )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tukey's spread-level function — eda_sl","text":"dat Dataframe x Continuous variable column fac Categorical variable column p Power transformation apply variable tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation) sprd Choice spreads. Either interquartile, sprd = \"IQR\" fourth-spread, sprd = \"frth\" (default). plot Boolean determining plot generated. grey Grey level apply plot elements (0 1 1 = black). pch Point symbol type. p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1) alpha Point transparency (0 = transparent, 1 = opaque). 
applicable rgb() used define point colors.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tukey's spread-level function — eda_sl","text":"Returns dataframe level spreads.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tukey's spread-level function — eda_sl","text":"Note function should not be confused William Cleveland's spread-location function. fac categorical, output produce many NA's. page 59, Hoaglin et al. define fourth-spread range defined upper fourth lower fourth. eda_lsum function used compute upper/lower fourths.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Tukey's spread-level function — eda_sl","text":"Understanding Robust Exploratory Data Analysis, Hoaglin, David C., Frederick Mosteller, John W. Tukey, 1983.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tukey's spread-level function — eda_sl","text":"","code":"dat <- read.csv(\"http://mgimond.github.io/ES218/Data/Food_web.csv\") sl <- eda_sl(dat, mean.length, dimension) # The output can be passed to a model fitting function like eda_lm # The output slope can be used to help identify a power transformation eda_lm(sl, Level, Spread) #> (Intercept) x #> -2.969986 2.979117"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":null,"dir":"Reference","previous_headings":"","what":"Trims vector and dataframe objects — eda_trim","title":"Trims vector and dataframe objects — eda_trim","text":"Removes records either tail-ends sorted dataset. Trimming can performed number records (specify num = option) quantiles (specify prop = option). 
eda_trim Trims vector eda_trim_df Trims data frame eda_ltrim Left-trims vector eda_rtrim Right-trims vector eda_ltrim_df Left-trims dataframe eda_rtrim_df Right-trims dataframe","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Trims vector and dataframe objects — eda_trim","text":"","code":"eda_trim(x, prop = 0.05, num = 0) eda_trim_df(dat, x, prop = 0.05, num = 0) eda_ltrim(x, prop = 0.05, num = 0) eda_ltrim_df(dat, x, prop = 0.05, num = 0) eda_rtrim(x, prop = 0.05, num = 0) eda_rtrim_df(dat, x, prop = 0.05, num = 0)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Trims vector and dataframe objects — eda_trim","text":"x Vector values (trimming vector) column whose values used trim dataframe (applies *_df functions ) prop Fraction values trim num Number values trim dat Dataframe (applies *_df functions )","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Trims vector and dataframe objects — eda_trim","text":"Returns data type input (.e. vector dataframe)","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Trims vector and dataframe objects — eda_trim","text":"input dataset need sorted (sorting performed functions). num set zero, function assume trimming done fraction (defined prop parameter). eda_trim eda_trim_df functions called, num prop values apply tail. example, num = 5 5 smallest 5 largest values removed data. NA values must stripped input vector column elements running trim functions. 
Elements returned sorted trimmed elements.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Trims vector and dataframe objects — eda_trim","text":"","code":"# Trim a vector by 10% (i.e. 10% of the smallest and 10% of the largest # values) eda_trim( mtcars[,1], prop=0.1) #> [1] 14.7 15.0 15.2 15.2 15.5 15.8 16.4 17.3 17.8 18.1 18.7 19.2 19.2 19.7 21.0 #> [16] 21.0 21.4 21.4 21.5 22.8 22.8 24.4 26.0 27.3 # Trim a data frame by 10% using the mpg column(i.e. 10% of the smallest # and 10% of the largest mpg values) eda_trim_df( mtcars, mpg, prop=0.1) #> mpg cyl disp hp drat wt qsec vs am gear carb #> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 #> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 #> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 #> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 #> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 #> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 #> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 #> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 #> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 #> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 #> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 #> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 #> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 #> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 #> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 #> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 #> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 #> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 #> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 #> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 #> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> Porsche 
914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 #> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":null,"dir":"Reference","previous_headings":"","what":"Ladder of powers transformation on a single vector — eda_unipow","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"eda_unipow re-expresses vector ladder powers plots results using histogram density function. Either Tukey Box-Cox transformation used computing re-expressed values.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"","code":"eda_unipow( x, p = c(2, 1, 1/2, 0.33, 0, -0.33, -1/2, -1, -2), tukey = TRUE, bins = 5, cex.main = 1.3, col = \"#DDDDDD\", border = \"#AAAAAA\", title = \"Re-expressed data via ladder of powers\", ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"x Vector p Vector powers tukey TRUE (default), apply Tukey's power transformation, FALSE adopt Box-Cox transformation bins Number histogram bins cex.main Histogram title size (assigned histogram plot) col Histogram fill color border Histogram border color title Overall plot title (set NULL title) ... 
parameters passed graphics::hist function.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"return value","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"output lattice descriptive plots showing transformed data across different powers.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"Tukey, John W. 1977. Exploratory Data Analysis. Addison-Wesley.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"","code":"data(mtcars) eda_unipow(mtcars$mpg, bins=6)"},{"path":"https://mgimond.github.io/tukeyedar/reference/neoplasms.html","id":null,"dir":"Reference","previous_headings":"","what":"Breast cancer mortality vs. temperature — neoplasms","title":"Breast cancer mortality vs. temperature — neoplasms","text":"data represent relationship mean annual temperature breast cancer mortality rate.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/neoplasms.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Breast cancer mortality vs. temperature — neoplasms","text":"","code":"neoplasms"},{"path":"https://mgimond.github.io/tukeyedar/reference/neoplasms.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Breast cancer mortality vs. 
temperature — neoplasms","text":"data frame 16 rows 2 variables: Temp Temperature degrees Fahrenheit. Mortality Mortality rate presented index.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/neoplasms.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Breast cancer mortality vs. temperature — neoplasms","text":"Applications, Basics Computing Exploratory Data Analysis, P.F. Velleman D.C. Hoaglin, 1981. (page 127)","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/nine_point.html","id":null,"dir":"Reference","previous_headings":"","what":"Andrew Siegel's pathological 9-point dataset — nine_point","title":"Andrew Siegel's pathological 9-point dataset — nine_point","text":"synthetic dataset created test robustness fitted lines. Originally published Andrew Siegel later adapted Hoaglin et al.'s book.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/nine_point.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Andrew Siegel's pathological 9-point dataset — nine_point","text":"","code":"nine_point"},{"path":"https://mgimond.github.io/tukeyedar/reference/nine_point.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Andrew Siegel's pathological 9-point dataset — nine_point","text":"data frame 9 rows 2 variables: X X values Y Y values","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/nine_point.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Andrew Siegel's pathological 9-point dataset — nine_point","text":"Robust regression using repeated medians, Andrew F. Siegel, Biometrika, vol 69, n 1, 1982. Understanding robust exploratory data analysis, D.C. Hoaglin, F. Mosteller J.W. Tukey. 
1983 (page 139)","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"plot.eda_polish plot method lists eda_polish class.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"","code":"# S3 method for eda_polish plot( x, type = \"residuals\", add.cv = FALSE, k = NULL, col.quant = FALSE, colpal = \"RdYlBu\", colrev = TRUE, col.eff = TRUE, col.com = TRUE, adj.mar = TRUE, res.size = 1, row.size = 1, col.size = 1, res.txt = TRUE, label.txt = TRUE, grey = 0.6, pch = 21, p.col = \"grey30\", p.fill = \"grey60\", size = 0.9, alpha = 0.8, ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"x list class eda_polish. type Plot type. One three: \"residuals\", \"cv\" \"diagnostic\". add.cv Whether add kCV model plotting \"residuals\". k Custom k use kCV added model. value NULL makes use of slope. col.quant Boolean indicating quantile classification scheme used. colpal Color palette adopt (one listed hcl.pals()). colrev color palette reversed? (default TRUE). col.eff Boolean indicating effects common value contribute color gradient. col.com Boolean indicating common value contribute color gradient. adj.mar Boolean indicating margin width needs accommodate labels. res.size Size residual values plot [0-1]. row.size Size row effect values plot [0-1]. col.size Size column effect values plot [0-1]. res.txt Boolean indicating values added plot. 
label.txt Boolean indicating margin column labels plotted. grey Grey level apply plot elements diagnostic plot (0 1 1 = black). pch Point symbol type diagnostic plot. p.col Color point symbol diagnostic plot. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1) diagnostic plot. alpha Point transparency (0 = transparent, 1 = opaque). applicable rgb() used define point colors. ... Arguments passed subsequent methods.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"Returns single element vector \"type\" \"diagnostic\" value otherwise.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"function plots polish table residuals CV values. 
also generate diagnostic plot type set diagnostic","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"","code":"# Create dataset df <- data.frame(region = rep( c(\"NE\", \"NC\", \"S\", \"W\"), each = 5), edu = rep( c(\"ed8\", \"ed9to11\", \"ed12\", \"ed13to15\", \"ed16\"), 4), perc = c(25.3, 25.3, 18.2, 18.3, 16.3, 32.1, 29, 18.8, 24.3, 19, 38.8, 31, 19.3, 15.7, 16.8, 25.4, 21.1, 20.3, 24, 17.5)) # Generate median polish output out <- eda_pol(df, row = region, col = edu, val = perc, plot = FALSE) # Plot table plot(out, type = \"residuals\") # Plot table using CV values plot(out, type = \"cv\") # Generate diagnostic plot plot(out, type = \"diagnostic\") #> $slope #> cv #> 1.3688 #>"},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot eda_rline model — plot.eda_rline","title":"Plot eda_rline model — plot.eda_rline","text":"plot.eda_rline plot method lists eda_rline class.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot eda_rline model — plot.eda_rline","text":"","code":"# S3 method for eda_rline plot( x, type = \"model\", xlab = NULL, ylab = NULL, grey = 0.7, pch = 21, equal = TRUE, p.col = \"grey50\", p.fill = \"grey80\", size = 0.8, alpha = 0.7, model = TRUE, pt3 = TRUE, fit = TRUE, ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot eda_rline model — plot.eda_rline","text":"x Object class eda_rline. type Plot type. One two: \"model\", \"residuals\". xlab Custom x-axis label. Defaults column name. ylab Custom y-axis label. 
Defaults column name. grey Grey level apply plot elements (0 1 1 = black). pch Point symbol type. equal Boolean determining axes lengths match (.e. square plot). p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1). alpha Point transparency (0 = transparent, 1 = opaque). applicable rgb() used define point colors. model Boolean indicating resulting model added plot. applies type = \"model\". pt3 Boolean indicating 3-pt summaries added plot. applies type = \"model\". fit Boolean indicating fitted line added plot. ... Arguments passed subsequent methods.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot eda_rline model — plot.eda_rline","text":"return value.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot eda_rline model — plot.eda_rline","text":"function generates scatter plot fitted model eda_rline object.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot eda_rline model — plot.eda_rline","text":"","code":"r.lm <- eda_rline(age_height, Months, Height) plot(r.lm) plot(r.lm, pt3 = FALSE) plot(r.lm, type = \"residuals\")"},{"path":"https://mgimond.github.io/tukeyedar/reference/tukeyedar.html","id":null,"dir":"Reference","previous_headings":"","what":"Tukey inspired exploratory data analysis functions — tukeyedar","title":"Tukey inspired exploratory data analysis functions — tukeyedar","text":"packages hosts small set Tukey inspired functions use exploring datasets robust manner.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat05.html","id":null,"dir":"Reference","previous_headings":"","what":"Temperature normals for 
Waterville Maine (1991-2020) — wat05","text":"NOAA/NCEI derived normal daily temperatures city Waterville, Maine (USA) 1991 2020 period. Units degrees Fahrenheit.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat05.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Temperature normals for Waterville Maine (1991-2020) — wat05","text":"","code":"wat05"},{"path":"https://mgimond.github.io/tukeyedar/reference/wat05.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Temperature normals for Waterville Maine (1991-2020) — wat05","text":"data frame 365 rows 5 variables: date Date centered 1991-2020 period. Note year purely symbolic. doy Day year. min Typical minimum temperature 1991-2020 period. avg Typical average temperature 1991-2020 period. max Typical maximum temperature 1991-2020 period.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat05.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Temperature normals for Waterville Maine (1991-2020) — wat05","text":"https://www.ncei.noaa.gov/ 
Units degrees Fahrenheit.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat95.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Legacy temperature normals for Waterville Maine (1981-2010) — wat95","text":"","code":"wat95"},{"path":"https://mgimond.github.io/tukeyedar/reference/wat95.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Legacy temperature normals for Waterville Maine (1981-2010) — wat95","text":"data frame 365 rows 5 variables: date Date centered 1981-2010 period. Note year purely symbolic. doy Day year min Typical minimum temperature 1981-2010 period. avg Typical average temperature 1981-2010 period. max Typical maximum temperature 1981-2010 period.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat95.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Legacy temperature normals for Waterville Maine (1981-2010) — wat95","text":"https://www.ncei.noaa.gov/","code":""},{"path":"https://mgimond.github.io/tukeyedar/news/index.html","id":"tukeyedar-020","dir":"Changelog","previous_headings":"","what":"tukeyedar 0.2.0","title":"tukeyedar 0.2.0","text":"Added Normal QQ plot option eda_qq Added symmetrical Normal fit plot function eda_normfit Updated eda_boxls aesthetics Updated median polish diagnostic plot aesthetics","code":""},{"path":"https://mgimond.github.io/tukeyedar/news/index.html","id":"tukeyedar-011","dir":"Changelog","previous_headings":"","what":"tukeyedar 0.1.1","title":"tukeyedar 0.1.1","text":"Introduces median polish function eda_pol. Introduces QQ Tukey mean-difference plot eda_qq. Introduces density plot function eda_dens. Adds re-expression parameters eda_lm via parameters px py. Adds sd labels SD dashed lines eda_lm. eda_lm now output lm intercept slope. Adds plot method eda_rline object. eda_re p = 1, box-cox option ignored. Homogenized plot appearances. 
Added power parameter argument eda_boxls. Added power parameter argument eda_sl. Added plot option eda_sl.","code":""},{"path":"https://mgimond.github.io/tukeyedar/news/index.html","id":"tukeyedar-010","dir":"Changelog","previous_headings":"","what":"tukeyedar 0.1.0","title":"tukeyedar 0.1.0","text":"Initial release tukeyedar","code":""}]
[{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"the-median-polish-basics","dir":"Articles","previous_headings":"","what":"The median polish basics","title":"Median polish","text":"median polish exploratory technique used extract effects two-way table. , median polish can thought robust version two-way ANOVA–goal characterize role factor contributing towards expected value. iteratively extracting effects associated row column factors via medians. example, given two-way table 1964 1966 infant mortality rates1 (reported count per 1000 live births) computed combination geographic region (NE, NC, S, W) level father’s educational attainment (ed8, ed9-11, ed12, ed13-15, ed16), median polish first extract overall median value, smooth residual rates first extracting median values along column (thus contributing column factor), smoothing remaining residual rates extracting median values along row (thus contributing row factor). smoothing operation iterated residuals stabilize. example workflow highlighted following figure. left-table original data showing death rates. second table shows outcome first round polishing (including initial overall median value 20.2). third fourth table show second third iterations smoothing operations. Additional iterations deemed necessary given little can extracted residuals. detailed step--step explanation workflow see . resulting model additive form : \[ y_{ij} = \mu + \alpha_{} + \beta_{j} +\epsilon_{ij} \] \(y_{ij}\) response variable row \(\) column \(j\), \(\mu\) overall typical value (hereafter referred common value), \(\alpha_{}\) row effect, \(\beta_{j}\) column effect \(\epsilon_{ij}\) residual value left effects taken account. factor’s levels displayed top row left-column. example, region assigned rows father’s educational attainment assigned columns. 
father’s educational attainment can explain 11 units variability (7.58 - (-3.45)) death rates vs 4 units variability region (2.55 - (-1.5)). , father’s educational attainment larger contributor expected infant mortality regional effect.","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"implementing-the-median-polish","dir":"Articles","previous_headings":"","what":"Implementing the median polish","title":"Median polish","text":"package’s eda_polish augmented version built-medpolish available via stats package. key difference eda_polish takes input dataset long form opposed medpolish takes dataset form matrix. example, infant mortality dataset needs consist least three columns: one variable (two factors expected value). median polish can executed follows: function output table plot along list components stored M1 object. want suppress plot, can set parameter plot = FALSE. M1 object class eda_polish. can extract common values, row column effects follows:","code":"grd <- c(\"ed8\", \"ed9-11\", \"ed12\", \"ed13-15\", \"ed16\") dat <- data.frame(region = rep( c(\"NE\", \"NC\", \"S\", \"W\"), each = 5), edu = factor(rep( grd , 4), levels = grd), perc = c(25.3, 25.3, 18.2, 18.3, 16.3, 32.1, 29, 18.8, 24.3, 19, 38.8, 31, 19.3, 15.7, 16.8, 25.4, 21.1, 20.3, 24, 17.5)) head(dat) region edu perc 1 NE ed8 25.3 2 NE ed9-11 25.3 3 NE ed12 18.2 4 NE ed13-15 18.3 5 NE ed16 16.3 6 NC ed8 32.1 library(tukeyedar) M1 <- eda_pol(dat, row = region, col = edu, val = perc) M1$global [1] 20.85 M1$row region effect 1 NC 2.3000 2 NE -1.4625 3 S -0.3500 4 W 0.3500 M1$col edu effect 1 ed8 7.43125 2 ed9-11 5.88125 3 ed12 -1.19375 4 ed13-15 0.03125 5 ed16 -3.70000"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"ordering-rows-and-columns-by-effect-values","dir":"Articles","previous_headings":"Implementing the median polish","what":"Ordering rows and columns by effect values","title":"Median polish","text":"order row column effects 
effect values, set sort parameter TRUE.","code":"M1 <- eda_pol(dat, row = region, col = edu, val = perc, sort = TRUE)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"applying-a-transformation-to-the-data","dir":"Articles","previous_headings":"Implementing the median polish","what":"Applying a transformation to the data","title":"Median polish","text":"can function re-express values prior performing polish. example, log transform data, pass value 0 p. re-expressing data using negative power, choice adopting Tukey transformation (tukey = TRUE) Box-Cox transformation (tukey = FALSE). example, apply power transformation -0.1 using Box-Cox transformation, type:","code":"M1 <- eda_pol(dat, row = region, col = edu, val = perc, p = 0) M1 <- eda_pol(dat, row = region, col = edu, val = perc, p = -0.1, tukey = FALSE)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"defining-the-statistic","dir":"Articles","previous_headings":"Implementing the median polish","what":"Defining the statistic","title":"Median polish","text":"default, polishing routine adopts median statistic. can adopt statistic via stat parameter. example, apply mean polish, type:","code":"M1 <- eda_pol(dat, row = region, col = edu, val = perc, stat = mean)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"the-eda_polish-plot-method","dir":"Articles","previous_headings":"","what":"The eda_polish plot method","title":"Median polish","text":"list object created eda_pol function class eda_polish. , plot method created class. 
plot method either output original polished table (type = \"residuals\"), diagnostic plot (type = \"diagnostic\"), CV values (cv).","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"plot-the-median-polish-table","dir":"Articles","previous_headings":"","what":"Plot the median polish table","title":"Median polish","text":"can generate plot table median polish model follows:","code":"M1 <- eda_pol(dat, row = region, col = edu, val = perc, plot = FALSE) plot(M1)"},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"excluding-common-effect-from-the-color-palette-range","dir":"Articles","previous_headings":"Adjusting color schemes","what":"Excluding common effect from the color palette range","title":"Median polish","text":"default, range color palettes defined range values table–includes common effect value. prevent common value affecting distribution color palettes, set col.com FALSE. Note distribution colors maximized help improve view effects. view makes clear father’s educational attainment greater effect region.","code":"plot(M1, col.com = FALSE)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"excluding-rowcolumn-effects-from-the-color-palette-range","dir":"Articles","previous_headings":"Adjusting color schemes","what":"Excluding row/column effects from the color palette range","title":"Median polish","text":"want plot focus residuals maximizing range colors fit range residual values, set col.eff = FALSE. Note setting col.eff FALSE prevent effects cells colored. simply ensures range colors maximized match full range residual values. 
effect value falls within residual range assigned color.","code":"plot(M1, col.eff = FALSE)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"changing-color-schemes","dir":"Articles","previous_headings":"Adjusting color schemes","what":"Changing color schemes","title":"Median polish","text":"default, color scheme symmetrical (divergent) centered 0. adopts R’s (version 4.1 ) built-\"RdYlBu\" color palette. can assign different built-color palettes via colpal parameter. can list available colors R via hcl.pals() function. want limit output divergent color palettes, type: example, can assign \"Green-Brown\" color palette follows. (’ll remove common effect value range input values maximize displayed set colors). default color scheme symmetrical linear, centered 0. want maximize use colors, regardless range values, can set col.quant TRUE adopt quantile color scheme. ’ll note regardless asymmetrical distribution values 0, cell assigned unique color swatch. adopting quantile color classification scheme, might want adopt color palette generates fewer unique hues variation lightness values. example,","code":"hcl.pals(type = \"diverging\") [1] \"Blue-Red\" \"Blue-Red 2\" \"Blue-Red 3\" \"Red-Green\" [5] \"Purple-Green\" \"Purple-Brown\" \"Green-Brown\" \"Blue-Yellow 2\" [9] \"Blue-Yellow 3\" \"Green-Orange\" \"Cyan-Magenta\" \"Tropic\" [13] \"Broc\" \"Cork\" \"Vik\" \"Berlin\" [17] \"Lisbon\" \"Tofino\" plot(M1, colpal = \"Green-Brown\", col.com = FALSE) plot(M1, col.quant = TRUE) plot(M1, col.quant = TRUE, colpal = \"Green-Orange\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"adjusting-text","dir":"Articles","previous_headings":"","what":"Adjusting text","title":"Median polish","text":"can omit labeled values output setting res.txt FALSE. Likewise can omit axes labels setting label.txt FALSE. may prove useful applying median polish large grid file. 
can adjust text size via res.size, row.size col.size parameters numeric values, row names, column names respectively. example, set sizes 60% default value, type:","code":"plot(M1, res.txt = FALSE) plot(M1, res.txt = FALSE, label.txt = FALSE) plot(M1, row.size = 0.6, col.size = 0.6 , res.size = 0.6)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"exploring-diagnostic-plots","dir":"Articles","previous_headings":"","what":"Exploring diagnostic plots","title":"Median polish","text":"plot method also generate plot residuals vs comparison values (CV), herein referred diagnostic plot. bisquare robust line fitted data (light red line) along robust loess fit (dashed blue line). function also output line’s slope. slope can used help estimate transformation data, needed. generate plot, simply extract cv component M1 list. cv component dataframe stores residuals (first column) CV values (fourth column). first records dataframe shown : diagnostic plot helps identify interactions effects. interaction suspected, model longer simple additive model; model needs augmented interactive component form: \\[ y_{ij} = \\mu + \\alpha_{} + \\beta_{j} + kCV +\\epsilon_{ij} \\] \\(CV\\) = \\(\\alpha_{}\\beta_{j}/\\mu\\) \\(k\\) constant can estimated slope generated diagnostic plot. truly additive model one changes response variable one level another level remain constant. example, given bottom-left matrix initial response values, changes response variable level level b constant regardless row effect. example, going b level z elicits change response 6 - 3 = 3. observed change values b levels x y (4-1 5-2 respectively). three row levels, change expected values b –increase 3 units. Likewise, changes response values rows x y y z constant (1) across levels column effect. additive effect can observed interaction plot shown right. column effect plotted along x-axis, row effect mapped line segment. Original dataset (left). Interaction plot (right). 
Parallel lines indicate no interaction effects. median polish generates following table diagnostic plot: Median polished data showing no interaction effects ’ll note lack pattern (flat one) accompanying diagnostic plot. Now, let’s see happens interaction fact present two way table. Original dataset (left). Interaction plot (right). Note lines no longer parallel one another interaction plot. Now let’s run median polish generate diagnostic plot. Median polished data showing interaction effects ’ll note upward trend residuals increasing comparison values. usually good indication interaction effects. Another telltale sign pattern observed residuals median polish plot low residuals high residuals opposing corners table. interaction observed, either include interaction term additive model, seek re-expression might help alleviate interaction effects. choose include interaction term model, coefficient \(k\) can extracted slope generated diagnostic plot. choose re-express data hopes removing interaction data, can try using power transformation equal \(1 - slope\) (slope derived diagnostic plot). infant mortality dataset used exercise does not suggest interaction effects diagnostic plot. Next, ’ll look another dataset may exhibit interaction effects.","code":"plot(M1, type = \"diagnostic\") $slope cv 1.3688 head(M1$cv) perc region.eff edu.eff cv 1 -3.15625 2.3000 -1.19375 -0.1316846523 2 -0.00625 -0.3500 -1.19375 0.0200389688 3 0.00625 -1.4625 -1.19375 0.0837342626 4 0.29375 0.3500 -1.19375 -0.0200389688 5 -4.83125 -0.3500 0.03125 -0.0005245803 6 1.11875 2.3000 0.03125 0.0034472422"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"another-example-earnings-by-sex-for-2021","dir":"Articles","previous_headings":"Exploring diagnostic plots","what":"Another example: Earnings by sex for 2021","title":"Median polish","text":"dataset consists earnings sex levels educational attainment 2021 (src: US Census Bureau). 
Education levels defined follows: NoHS: Less High School Graduate HS: High School Graduate (Includes Equivalency) AD: College Associate’s Degree Grad: Bachelor’s Degree original table (prior running median polish), can viewed setting maxiter 0 call eda_pol. 2021 Average earnings US. Next, ’ll run median polish. Next, plot final table diagnostic plot. ’s can glean output: Overall, median earnings $41,359 Variability earnings due different levels education attainment covers range $56,936 different sexes covers range $15,858. residuals quite large suggesting may much variability earnings may explained row column effects. residuals explain $15,780 variability data. diagnostic plot suggests strong interaction sex effect education effect. implies, example, differences earnings sexes depend level educational attainment. slope residuals CV values around 0.94. Given strong evidence interaction effects, need take one two actions: can either add comparison values (CV) row-plus-column model, can see re-expressing earnings values eliminates dependence effects.","code":"edu <- c(\"NoHS\", \"HS\", \"HS+2\", \"HS+4\", \"HS+6\") df1 <- data.frame(Education = factor(rep(edu,2), levels = edu), Sex = c(rep(\"Male\", 5), rep(\"Female\",5)), Earnings = c(31722, 40514, 49288, 73128,98840,20448, 26967, 33430, 50554, 67202)) eda_pol(df1, row = Education, col = Sex, val = Earnings , maxiter = 0) M2 <- eda_pol(df1, row = Education, col = Sex, val = Earnings , plot = FALSE) plot(M2, res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M2, \"diagnostic\") $slope cv 0.9410244"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"adding-cv-to-the-row-plus-column-model","dir":"Articles","previous_headings":"Exploring diagnostic plots > Another example: Earnings by sex for 2021","what":"Adding CV to the row-plus-column model","title":"Median polish","text":"CV values computed stored median polish object. can extracted model via M2$cv component can visualized via plot function. 
following figure shows original residuals table (left) CV table (right). Median polish residuals (left) CV values (right). comparison value added model, need compute new set residuals. residuals can plotted setting add.cv TRUE specifying value k. Using slope estimate k get: CV values (left) new set residuals (right). two tables provide us parameters needed construct model. example, Female-NoHS earnings value can recreated table follows: \\[ Earnings_{Female-NoHS} = \\mu + Sex_{Female} + Education_{NoHS} + kCV_{Female-NoHS} + \\epsilon_{Female-NoHS} \\] : \\(CV_{Female-NoHS} = \\frac{(Sex_{Female})(Education_{NoHS})}{\\mu}\\) \\(k\\) constant can estimated diagnostic plot’s slope (0.94 example). gives us: \\[ Earnings_{Female-NoHS} = 41359 -7929 -15274 + (0.94)(2928.2) -460.5 \\]","code":"plot(M2, res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M2, \"cv\", res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M2, \"cv\", res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M2, \"residuals\", add.cv=TRUE, k = 0.94, res.size = 0.8, row.size = 0.8, col.size = 0.8)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"re-expressing-earnings","dir":"Articles","previous_headings":"Exploring diagnostic plots > Another example: Earnings by sex for 2021","what":"Re-expressing earnings","title":"Median polish","text":"’s possible earnings presented us scale best suited analysis. Subtracting slope value (derived diagnostic plot) value 1 offers suggested transformation may provide us scale measure best suited data. ’ll rerun median polish using power transformation 1 - 0.94 = 0.06. Next, plot final table diagnostic plot. Median polish output (left) CV values (right). power 0.06 may bit aggressive given ’ve gone positive relationship CV residual negative relationship two. Tweaking power parameter may recommended. 
can done via trial error, can done using technique described next.","code":"M3 <- eda_pol(df1, row = Education, col = Sex, val = Earnings , plot = FALSE, p = 0.06) plot(M3, res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M3, \"diagnostic\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"fine-tuning-a-re-expression","dir":"Articles","previous_headings":"Exploring diagnostic plots > Another example: Earnings by sex for 2021","what":"Fine tuning a re-expression","title":"Median polish","text":"Klawonn et al.2 propose method honing optimal power transformation finding one maximizes effect’s spreads vis--vis residuals. computing ratio interquartile range row column effects 80% quantile residual’s absolute values. following code chunk computes ratio different power transformations. Row (left) column (right) effect IQRs residuals ratio vs power. plot suggests power transformation 0.1. ’ll re-run median polish using power transformation. slope much smaller loess fit suggests monotonically increasing decreasing relationship residuals CV values. Re-expressing value seems done good job stabilizing residuals across CV values. ’ll modify color scheme place emphasis effects opposed overall value. ’s can glean output: earnings values best expressed power scale 0.1. Overall, median earnings (re-expressed form) $19. Variability earnings due different levels education attainment covers range $3 different sexes covers range $1. residuals much smaller relative effects earnings re-expressed. residuals explain close $0 variability data. Just variability can explained effects. 
Re-expressing values eliminates interaction effects.","code":"f1 <- function(x){ out <- eda_pol(df1, row = Education, col = Sex, val = Earnings, p = x, plot=FALSE, tukey = FALSE) c(p=out$power, IQrow = out$IQ_row, IQcol = out$IQ_col) } IQ <- t(sapply(0:25/10, FUN = f1 )) # Apply transformations at 0.1 intervals plot(IQrow ~ p, IQ, type=\"b\") grid() plot(IQcol ~ p, IQ, type=\"b\") grid() M4 <- eda_pol(df1, row = Education, col = Sex, val = Earnings, plot = FALSE, p = 0.1) plot(M4, res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M4, \"diagnostic\") plot(M4, col.com = FALSE, res.size = 0.8, row.size = 0.8, col.size = 0.8)"},{"path":"https://mgimond.github.io/tukeyedar/articles/polish.html","id":"the-mean-polish","dir":"Articles","previous_headings":"","what":"The mean polish","title":"Median polish","text":"eda_pol function accepts statistical summary function. default, uses median. example, mean polish generated earnings dataset looks like : Polishing data using mean requires single iteration reach stable output. mean suffers sensitivity non-symmetrical distributions outliers. , median polish robust summary statistic. said, running mean polish benefits: ’s great way represent effects generated two-way analysis variance (aka 2-way ANOVA). confirmed comparing row column effects traditional 2-way ANOVA technique shown : median polish, must concern interactions effects. interaction present, ANOVA inferential statistics using F-test can untrustworthy. strong evidence interaction. slope 0.92 can used estimate power transformation via \(1 - slope\). close power transformation 0.1 ended adopting median polish exercise. Results mean polish (left) diagnostic plot (right). Re-expressing data nice job removing interaction effects much like performed median polish. 
suggests one run two-way ANOVA, re-expression strongly suggested.","code":"M5 <- eda_pol(df1, row = Education, col = Sex, val = Earnings , stat = mean, plot = FALSE) plot(M5, res.size = 0.8, row.size = 0.8, col.size = 0.8) model.tables(aov(Earnings ~ Sex + Education, df1)) Tables of effects Sex Sex Female Male -9489 9489 Education Education NoHS HS HS+2 HS+4 HS+6 -23124 -15469 -7850 12632 33812 plot(M5, type = \"diagnostic\", res.size = 0.8, row.size = 0.8, col.size = 0.8) $slope cv 0.9223166 M4b <- eda_pol(df1, row = Education, col = Sex, val = Earnings , stat = mean, plot = FALSE, p = 0.1, maxiter = 1) plot(M4b, res.size = 0.8, row.size = 0.8, col.size = 0.8) plot(M4b, \"diagnostic\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"The empirical QQ plot","text":"empirical quantile-quantile plot (QQ plot) probably one underused least appreciated plots univariate analysis. used compare two distributions across full range values. generalization boxplot limit comparison just median upper lower quartiles. fact, compares values matching value one batch corresponding quantile batch. sizes batch need . differ, larger batch interpolated smaller batch’s set quantiles. QQ plot help visualize differences distributions, can also model relationship batches. Note confused modeling relationship bivariate dataset latter pairs points observational units whereas QQ plot pairs values matching quantiles.","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"anatomy-of-the-eda_qq-plot","dir":"Articles","previous_headings":"","what":"Anatomy of the eda_qq plot","title":"The empirical QQ plot","text":"point represents matching quantiles batch. shaded boxes represent batch’s interquartile range (mid 50% values). Solid dashed lines inside shaded boxes represent batch’s medians. lightly shaded dashed dots represent batch’s 12.5th 87.5th quantiles (.e. 
show ends mid 80% values). upper right-hand text indicates power transformation applied batches (default power 1 original measurement scale). formula applied one batches, appear upper right-hand text. eda_qq also output suggested relationship y variable x variable console. bases batch’s interquartile values. output assigned new object, object store list following values: x value (interpolated needed), y value (interpolated needed), power parameter, formula applied x variable, formula applied y variable.","code":"#> [1] \"Suggested offsets:y = x * 1.4574 + (0.9914)\""},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"data-type","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Data type","title":"The empirical QQ plot","text":"function accept dataframe values column group column, accept two separate vector objects. example, pass two separate vector object, x y, type: data dataframe, type:","code":"library(tukeyedar) set.seed(207) x <- rnorm(30) y <- rnorm(30) + 0.5 eda_qq(x, y) dat <- data.frame(val = c(x, y), cat = rep(c(\"x\", \"y\"), each = 30)) eda_qq(dat, val, cat)"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"suppressing-the-plot","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Suppressing the plot","title":"The empirical QQ plot","text":"can suppress plot x y values outputted list. batches match size, output show interpolated values output batches match size. 
output also include power parameter applied batches well formula applied one batches (fx formula applied x variable fy formula applied y variable).","code":"out <- eda_qq(x,y, plot = FALSE) #> [1] \"Suggested offsets:y = x * 1.1201 + (0.5618)\" out #> $x #> [1] -2.0207122 -1.6048333 -1.5620907 -1.5128732 -1.3126378 -1.1770882 #> [7] -1.0871906 -0.9258832 -0.8896555 -0.6152073 -0.3140113 -0.2996734 #> [13] -0.2954234 -0.2199849 -0.2108781 0.1202060 0.2608893 0.2680445 #> [19] 0.2910663 0.4239690 0.4262605 0.4301416 0.5176361 0.6085180 #> [25] 0.6880919 0.6929772 0.7640838 0.9037644 1.0124869 1.0503544 #> #> $y #> [1] -2.10710669 -1.30465821 -1.17618932 -1.17253191 -0.86423268 -0.40162859 #> [7] -0.37002087 -0.19629536 -0.07210822 -0.01829722 0.05826287 0.09105884 #> [13] 0.09371398 0.13698992 0.22318119 0.43006689 0.52597363 0.72665767 #> [19] 0.81351407 1.00612388 1.01831440 1.06713353 1.23708449 1.24360530 #> [25] 1.33232007 1.43973056 1.59125312 1.64852115 1.77464625 1.93823390 #> #> $p #> [1] 1 #> #> $fx #> NULL #> #> $fy #> NULL"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"setting-the-grey-box-and-dashed-line-parameters","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Setting the grey box and dashed line parameters","title":"The empirical QQ plot","text":"grey box highlights interquartile ranges batches. boundary can modified via b.val argument. Likewise, lightly shaded dashed dots highlight mid 80% values can modified via l.val argument. example, highlight mid 68% values using grey boxes mid 95% values using lightly shaded dashed dots, type: can suppress plotting grey box lightly shaded dashed dots setting q = FALSE. 
affect median dashed lines.","code":"eda_qq(x, y, b.val = c(0.16, 0.84), l.val = c(0.025, 0.975))"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"applying-a-formula-to-one-of-the-batches","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Applying a formula to one of the batches","title":"The empirical QQ plot","text":"can apply formula batch via fx argument x-variable fy argument y-variable. formula passed text string. example, add 0.5 x values, type:","code":"eda_qq(x, y, fx = \"x + 0.5\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"quantile-type","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Quantile type","title":"The empirical QQ plot","text":"many different quantile algorithms available R. see full list quantile types, refer quantile help page: ?quantile. default, eda_qq() adopts q.type = 5. general, choice quantiles does not really matter, especially large datasets. want adopt R’s default type, set q.type = 7.","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"point-symbols","dir":"Articles","previous_headings":"An overview of some of the function arguments","what":"Point symbols","title":"The empirical QQ plot","text":"point symbol type, color size can modified via pch, p.col (/p.fill) size arguments. color can either built-color name (can see full list typing colors()) rgb() function. define color using one built-color names, can adjust transparency via alpha argument. alpha value 0 renders point completely transparent value 1 renders point completely opaque. point symbol can take two color parameters depending point type. pch number 21 25, p.fill define fill color p.col define border color. point symbol type, p.fill argument ignored. 
examples:","code":"eda_qq(x, y, p.fill = \"bisque\", p.col = \"red\", size = 1.2) eda_qq(x, y, pch = 16, p.col = \"tomato2\", size = 1.5, alpha = 0.5) eda_qq(x, y, pch = 3, p.col = \"tomato2\", size = 1.5)"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"interpreting-a-qq-plot","dir":"Articles","previous_headings":"","what":"Interpreting a QQ plot","title":"The empirical QQ plot","text":"help interpret following QQ plots, ’ll compare plot matching kernel density plots.","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"identical-distributions","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Identical distributions","title":"The empirical QQ plot","text":"first example, generate QQ plot two identical distributions. two distributions identical, points line along x=y line shown . also generate overlapping density plots seen right plot.","code":"library(tukeyedar) set.seed(543) x <- rnorm(100) y <- x eda_qq(x, y) eda_dens(x, y)"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"additive-offset","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Additive offset","title":"The empirical QQ plot","text":"work batches, time offset second batch, y, 2. case referred additive offset. ’ll note points parallel x=y line. indicates distributions exact shape. , fall x=y line–offset +2 units measured along y-axis, expected. can confirm adding 2 x batch: points overlap x=y line perfectly. 
density distribution overlap exactly well.","code":"library(tukeyedar) set.seed(543) x <- rnorm(100) y <- x + 2 eda_qq(x, y) eda_dens(x, y) eda_qq(x, y, fx = \"x + 2\") eda_dens(x, y, fx = \"x + 2\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"multiplicative-offset","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Multiplicative offset","title":"The empirical QQ plot","text":"Next, explore two batches share central value second batch 0.5 times first. case referred multiplicative offset. , series points angle x=y line yet, ’ll note points follow perfectly straight line. suggests multiplicative offset change location. indicates “shape” batches similar, one “wider” . , y half wide x. can also state x twice wide y know multiplicative offset since synthetically generated values x y. practice eyeballing multiplier plot straightforward. can use suggested offset 0.5 displayed console help guide us. can also use angle points x=y line judge direction take choosing multiplier. points make angle less x=y line, want choose x multiplier less 1. angle greater x=y line, want choose multiplier greater 1. , know multiplier 0.5. Let’s confirm following code chunk:","code":"y <- x * 0.5 eda_qq(x, y) eda_dens(x, y) eda_qq(x, y, fx = \"x * 0.5\") eda_dens(x, y, fx = \"x * 0.5\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"both-additive-and-multiplicative-offset","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Both additive and multiplicative offset","title":"The empirical QQ plot","text":"next example, add multiplicative additive offset data. now see multiplicative offset (points form angle x=y line) additive offset (points intersect x=y line). suggests width batches differ values offset constant value across full range values. ’s usually best first identify multiplicative offset points rendered parallel x=y line. multiplier identified, can identify additive offset. 
surprise, multiplier 0.5 renders series points parallel x=y line. can now eyeball offset +2 measuring distance points x=y line measured along y-axis. can model relationship y x \\(y = x * 0.5 + 2\\).","code":"y <- x * 0.5 + 2 eda_qq(x, y) eda_dens(x, y) eda_qq(x, y, fx = \"x * 0.5\") eda_dens(x, y, fx = \"x * 0.5\") eda_qq(x, y, fx = \"x * 0.5 + 2\") eda_dens(x, y, fx = \"x * 0.5 + 2\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"batches-need-not-be-symmetrical","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Batches need not be symmetrical","title":"The empirical QQ plot","text":"far, worked normally distributed dataset. note distribution can used QQ plot. example, two equally skewed distributions differ central values generate points perfectly lined . Since distributions identical shape, points follow straight line regardless nature shape (skewed, unimodal, bimodal, etc…)","code":"set.seed(540) x2 <- rbeta(100, 1, 8) y2 <- x2 + 0.2 eda_qq(x2, y2) eda_dens(x2, y2)"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"perfectly-alligned-points-are-rare","dir":"Articles","previous_headings":"Interpreting a QQ plot","what":"Perfectly aligned points are rare","title":"The empirical QQ plot","text":"Note observational data, two batches pulled underlying population seldom follow perfectly straight line. example, note meandering pattern generated following QQ plot batches pulled Normal distribution. Note points meander x=y line, points tail near end distribution.","code":"set.seed(702) x <- rnorm(100) y <- rnorm(100) eda_qq(x, y)"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"power-transformation","dir":"Articles","previous_headings":"","what":"Power transformation","title":"The empirical QQ plot","text":"eda_qq function allows apply power transformation batches. Note transforming just one batch makes little sense since end comparing two batches measured different scale. 
example, make use R’s Indometh dataset compare indometacin plasma concentrations two test subjects. QQ plot’s grey boxes shifted towards lower values. grey boxes show mid 50% values batch (aka, IQR range). IQR shifted towards lower values higher values, suggests skewed data. Another telltale sign skewed dataset gradual dispersion points two diagonal directions. , go relatively high density points near lower values lower density points near higher values. not preclude us identifying multiplicative/additive offsets, many statistical procedures benefit symmetrical distribution. Given values measures concentration, might want adopt log transformation. power transformation defined p argument (power parameter value 0 defines log transformation). seems decent job symmetrizing distributions. Note suggested offset displayed console applies transformed dataset. can verify applying offset x batch. can characterize differences indometacin plasma concentrations subject 1 subject 2 \\(log(conc)_{s2} = log(conc)_{s1} * 0.8501 + 0.3902\\)","code":"s1 <- subset(Indometh, Subject == 1, select = conc, drop = TRUE) # Test subject 1 s2 <- subset(Indometh, Subject == 2, select = conc, drop = TRUE) # Test subject 2 eda_qq(s1, s2) eda_qq(s1, s2, p = 0) #> [1] \"Suggested offsets:y = x * 0.8501 + (0.3902)\" eda_qq(s1, s2, p = 0, fx = \"x * 0.8501 + 0.3902\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"the-tukey-mean-difference-plot","dir":"Articles","previous_headings":"Power transformation","what":"The Tukey mean-difference plot","title":"The empirical QQ plot","text":"Tukey mean-difference plot simply extension QQ plot whereby plot rotated x=y line becomes horizontal. can useful helping identify subtle differences point pattern line. plot rotated 45° mapping difference batches y-axis, mapping mean batches x-axis. example, following figure left (QQ plot) shows additive offset batches fails clearly identify multiplicative offset. 
latter can clearly seen Tukey mean-difference plot (right) invoked setting argument md = TRUE.","code":"y <- x * 0.97 + 0.3 eda_qq(x, y, title = \"QQ plot\") eda_qq(x, y, md = TRUE, title = \"M-D plot\")"},{"path":"https://mgimond.github.io/tukeyedar/articles/qq.html","id":"a-working-example","dir":"Articles","previous_headings":"","what":"A working example","title":"The empirical QQ plot","text":"Many datasets distributions differ additively /multiplicatively, also general shape. may create complex point patterns QQ plot. cases can indicative different processes play different ranges values. ’ll explore case using wat95 wat05 dataframes available package. data represent derived normal temperatures 1981-2010 period (wat95) 1991-2020 period (wat05) city Waterville, Maine (USA). subset data daily average normals, avg: now compare distributions. first glance, batches seem differ. close points x=y line? rotate plot zoom x=y line setting md = TRUE. view proving far insightful. Note data, overall offset 0.5 suggesting new normals 0.5°F warmer. stop , pattern observed Tukey mean-difference plot far from random. fact, can break pattern three distinct ones around 35°F 50°F. categorize groups low, mid high values. next chunk code generate three separate QQ plots range values. suggested offsets displayed console group. important note splitting paired values generated earlier eda_qq function ($avg) values original datasets (old new). split original data prior combining eda_qq, generated different patterns QQ Tukey plots. approach generate different quantile pairs. Next, ’ll adopt suggested offsets generated console. proposed offsets seem good job characterizing differences temperatures. 
characterization differences normal temperatures old new set normals can formalized follows: \\[ new = \\begin{cases} old * 0.9506 + 1.661, & T_{avg} < 35 \\\\ old * 1.0469 - 1.6469, & 35 \\le T_{avg} < 50 \\\\ old * 0.9938 + 0.9268, & T_{avg} \\ge 50 \\end{cases} \\] key takeaways analysis can summarized follows: Overall, new normals 0.5°F warmer. offset not uniform across full range temperature values. lower temperature (less 35°F), new normals slightly narrower distribution 1.7°F warmer. mid temperature values (35°F 50°F), new normals slightly wider distribution overall 1.6°F cooler. higher range temperature values (greater 50°F), new normals slightly narrower distribution 0.9°F warmer.","code":"old <- wat95$avg # legacy temperature normals new <- wat05$avg # current temperature normals out <- eda_qq(old, new) out <- eda_qq(old, new, md = TRUE) labs <- c(\"low\", \"mid\", \"high\") out$avg <- (out$x + out$y) / 2 out <- as.data.frame(out[c(1:2,6)]) out2 <- split(out, cut(out$avg, c(min(out$avg), 35, 50, max(out$avg)), labels = labs, include.lowest = TRUE)) sapply(labs, FUN = \\(x) {eda_qq(out2[[x]]$x, out2[[x]]$y , xlab = \"old\", ylab = \"new\", md = T) title(x, line = 3, col.main = \"orange\")} ) #> [1] \"Suggested offsets:y = x * 0.9506 + (1.661)\" #> [1] \"Suggested offsets:y = x * 1.0469 + (-1.6469)\" #> [1] \"Suggested offsets:y = x * 0.9938 + (0.9268)\" xform <- c(\"x * 0.9506 + 1.661\", \"x * 1.0469 - 1.6469\", \"x * 0.9938 + 0.9268\") names(xform) <- labs sapply(labs, FUN = \\(x) {eda_qq(out2[[x]]$x, out2[[x]]$y, fx = xform[x], xlab = \"old\", ylab = \"new\", md = T) title(x, line = 3, col.main = \"coral3\")} )"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"the-resistant-line-basics","dir":"Articles","previous_headings":"","what":"The resistant line basics","title":"Resistant Line","text":"eda_rline function fits robust line bivariate dataset. first breaking data three roughly equal sized batches following x-axis variable. 
uses batches’ median values compute slope intercept. However, function doesn’t stop . fitting initial line, function fits another line (following aforementioned methodology) model’s residuals. slope not close zero, residual slope added original fitted model creating updated model. iteration repeated residual slope close zero residual slope changes sign (point average last two iterated slopes used final fit). example iteration follows using data Velleman et al.’s book. dataset, neoplasms, consists breast cancer mortality rates regions varying mean annual temperatures. three batches divided follows: Note 16 record dataset not divisible three thus forcing extra point middle batch (remainder division three two, extra point added tail-end batches). Next, compute medians batch (highlighted red points following figure). two end medians used compute slope : \\[ b = \\frac{y_r - y_l}{x_r-x_l} \\] subscripts \\(r\\) \\(l\\) reference median values right-left-batches. slope computed, intercept can computed follows: \\[ median(y_{l,m,r} - b * x_{l,m,r}) \\] \\((x,y)_{l,m,r}\\) median x y values batch. line used compute first set residuals. line fitted residuals following procedure outlined . initial model slope intercept 3.412 -69.877 respectively residual’s slope intercept -0.873 41.451 respectively. residual slope added first computed slope process repeated thus generating following tweaked slope updated residuals: updated slope now 3.412 + (-0.873) = 2.539. iteration continues slope residuals stabilize. final line working example , final slope intercept 2.89 -45.91, respectively.","code":"#> (Intercept) x #> -21.794691 2.357695"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"implementing-the-resistant-line","dir":"Articles","previous_headings":"","what":"Implementing the resistant line","title":"Resistant Line","text":"eda_rline takes just three arguments: data frame, x variable y variable. function output list. elements a b model’s intercept slope. 
vectors x y input values sorted x. res vector final residuals sorted x. xmed ymed vectors medians three batches. px py power transformations applied variables. output list class eda_rline. plot method available class. see resistant line compares ordinary least-squares (OLS) regression slope, add output lm model plot via abline(): regression model computes slope 2.36 whereas resistant line function generates slope 2.89. scatter plot, can spot point may undue influence regression line (point highlighted green following plot). Removing point data generates OLS regression line in line resistant model. point interest 15th record neoplasms data frame. Note OLS slope in line generated resistant line. ’ll also note resistant line slope also changed. Despite resistant nature line, removal point changed makeup first tier values (note leftward shift vertical dashed line). changed makeup batch thus changing median values first second tier batches.","code":"library(tukeyedar) M <- eda_rline(neoplasms, Temp, Mortality) M #> $b #> [1] 2.890173 #> #> $a #> [1] -45.90578 #> #> $res #> [1] 21.2982659 0.1398844 -2.1791908 8.8294798 -11.2485549 -7.6167630 #> [7] -0.1398844 4.7589595 -9.0092486 -2.1994220 2.7554913 -7.2676301 #> [13] -0.3907514 6.1861272 1.7971098 0.1398844 #> #> $x #> [1] 31.8 34.0 40.2 42.1 42.3 43.5 44.2 45.1 46.3 47.3 47.8 48.5 49.2 49.9 50.0 #> [16] 51.3 #> #> $y #> [1] 67.3 52.5 68.1 84.6 65.1 72.2 81.7 89.2 78.9 88.6 95.0 87.0 #> [13] 95.9 104.5 100.4 102.5 #> #> $xmed #> [1] 40.2 45.7 49.9 #> #> $ymed #> [1] 67.30 85.15 100.40 #> #> $index #> [1] 5 11 16 #> #> $xlab #> [1] \"Temp\" #> #> $ylab #> [1] \"Mortality\" #> #> $px #> [1] 1 #> #> $py #> [1] 1 #> #> $iter #> [1] 4 #> #> attr(,\"class\") #> [1] \"eda_rline\" plot(M) abline(lm(Mortality ~ Temp, neoplasms), lty = 2) points(neoplasms[15,], col=\"#43CD80\",cex=1.5 ,pch=20) neoplasms.sub <- neoplasms[-15,] M.sub <- eda_rline(neoplasms.sub, Temp, Mortality) plot(M.sub) abline(lm(Mortality ~ Temp, neoplasms.sub), 
lty = 2) # Regression model with data subset"},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"nine-point-data","dir":"Articles","previous_headings":"Other examples","what":"Nine point data","title":"Resistant Line","text":"nine_point dataset used Hoaglin et al. (p. 139) test resistant line function’s ability stabilize wild oscillations computed slopes across iterations. , slope intercept 0.067 0.133 respectively matching 1/15 2/15 values computed Hoaglin et al.","code":"M <- eda_rline(nine_point, X,Y) plot(M)"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"age-vs--height-data","dir":"Articles","previous_headings":"Other examples","what":"Age vs. height data","title":"Resistant Line","text":"age_height another dataset found Hoaglin et al. (p. 135). gives ages heights children private urban school. , slope intercept 0.429 91.007 respectively matching 0.426 slope closely matching 90.366 intercept values computed Hoaglin et al. page 137.","code":"M <- eda_rline(age_height, Months,Height) plot(M)"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"not-all-relationships-are-linear","dir":"Articles","previous_headings":"","what":"Not all relationships are linear!","title":"Resistant Line","text":"’s important remember resistant line technique valid bivariate relationship linear. , ’ll step example highlighted Velleman et al. (p. 138) using R built-mtcars dataset. First, ’ll fit resistant line data. ’s important note just resistant line can fit not necessarily imply relationship linear. assess linearity mtcars dataset, ’ll make use eda_3pt function. ’s clear two half slopes relationship not linear. Velleman et al. first suggest re-expressing mpg 1/mpg (i.e. applying power transformation -1) giving us number gallons consumed per mile driven. two half slopes still differ. therefore opt re-express disp variable. One possibility take inverse 1/3 since displacement measure volume (e.g. 
length^3) gives us: Now identified re-expressions linearise relationship, can fit resistant line. (Note grey line generated eda_3pt function resistant line generated eda_rline.)","code":"M <- eda_rline(mtcars, disp, mpg) plot(M) eda_3pt(mtcars, disp, mpg) eda_3pt(mtcars, disp, mpg, py = -1, ylab = \"gal/mi\") eda_3pt(mtcars, disp, mpg, px = -1/3, py = -1, ylab = \"gal/mi\", xlab = expression(\"Displacement\"^{-1/3})) M <- eda_rline(mtcars, disp, mpg, px = -1/3, py = -1) plot(M, ylab = \"gal/mi\", xlab = expression(\"Displacement\"^{-1/3}))"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"computing-a-confidence-interval","dir":"Articles","previous_headings":"","what":"Computing a confidence interval","title":"Resistant Line","text":"Confidence intervals coefficients can estimated using bootstrapping techniques. two approaches: resampling residuals resampling x-y cases.","code":""},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"resampling-the-model-residuals","dir":"Articles","previous_headings":"Computing a confidence interval","what":"Resampling the model residuals","title":"Resistant Line","text":", fit resistant line extract residuals. re-run model many times replacing original y values modeled y values plus resampled residuals generate confidence intervals. 
Now plot distributions, tabulate 95% confidence interval.","code":"n <- 999 # Set number of iterations M <- eda_rline(neoplasms, Temp, Mortality) # Fit the resistant line bt <- array(0, dim=c(n, 2)) # Create empty bootstrap array for(i in 1:n){ #bootstrap loop df.bt <- data.frame(x=M$x, y = M$y +sample(M$res,replace=TRUE)) bt[i,1] <- eda_rline(df.bt,x,y)$a bt[i,2] <- eda_rline(df.bt,x,y)$b } hist(bt[,1], main=\"Intercept distribution\") hist(bt[,2], main=\"Slope distribution\") conf <- t(data.frame(Intercept = quantile(bt[,1], p=c(0.05,0.95) ), Slope = quantile(bt[,2], p=c(0.05,0.95) ))) conf #> 5% 95% #> Intercept -77.21150 12.799352 #> Slope 1.61773 3.564112"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"resampling-the-x-y-paired-values","dir":"Articles","previous_headings":"Computing a confidence interval","what":"Resampling the x-y paired values","title":"Resistant Line","text":", resample x-y paired values (replacement) compute resistant line time. Now plot distributions, tabulate 95% confidence interval.","code":"n <- 1999 # Set number of iterations bt <- array(0, dim=c(n, 2)) # Create empty bootstrap array for(i in 1:n){ #bootstrap loop recs <- sample(1:nrow(neoplasms), replace = TRUE) df.bt <- neoplasms[recs,] bt[i,1]=eda_rline(df.bt,Temp,Mortality)$a bt[i,2]=eda_rline(df.bt,Temp,Mortality)$b } hist(bt[,1], main=\"Intercept distribution\") hist(bt[,2], main=\"Slope distribution\") conf <- t(data.frame(Intercept = quantile(bt[,1], p=c(0.05,0.95) ), Slope = quantile(bt[,2], p=c(0.05,0.95) ))) conf #> 5% 95% #> Intercept -108.668421 15.031034 #> Slope 1.643678 4.157895"},{"path":"https://mgimond.github.io/tukeyedar/articles/RLine.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Resistant Line","text":"Applications, Basics Computing Exploratory Data Analysis, P.F. Velleman D.C. Hoaglin, 1981. Understanding robust exploratory data analysis, D.C. Hoaglin, F. Mosteller J.W. 
Tukey, 1983.","code":""},{"path":"https://mgimond.github.io/tukeyedar/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Manuel Gimond. Author, maintainer.","code":""},{"path":"https://mgimond.github.io/tukeyedar/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"text","code":"@Misc{, title = {tukeyedar: A package of Tukey inspired EDA functions}, author = {Manuel Gimond}, url = {https://mgimond.github.io/tukeyedar/}, year = {2021}, }"},{"path":"https://mgimond.github.io/tukeyedar/index.html","id":"tukeyedar","dir":"","previous_headings":"","what":"Tukey Inspired Exploratory Data Analysis Functions","title":"Tukey Inspired Exploratory Data Analysis Functions","text":"tukeyedar package houses data exploration tools. Many functions inspired work published Tukey (1977), Hoaglin (1983), Velleman Hoaglin (1981), Cleveland (1993). Note package beta mode, use discretion.","code":""},{"path":"https://mgimond.github.io/tukeyedar/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tukey Inspired Exploratory Data Analysis Functions","text":"can install development version tukeyedar GitHub : Note vignettes not automatically generated command; note vignettes available website (see next section). want local version vignettes, add build_vignettes = TRUE parameter. , reason vignettes not created, might want re-install package force=TRUE parameter.","code":"# install.packages(\"devtools\") devtools::install_github(\"mgimond/tukeyedar\") devtools::install_github(\"mgimond/tukeyedar\", build_vignettes = TRUE) devtools::install_github(\"mgimond/tukeyedar\", build_vignettes = TRUE, force=TRUE)"},{"path":"https://mgimond.github.io/tukeyedar/index.html","id":"vignettes","dir":"","previous_headings":"","what":"Vignettes","title":"Tukey Inspired Exploratory Data Analysis Functions","text":"’s strongly recommended read vignettes. 
can accessed website: detailed rundown resistant line function median polish empirical QQ plot chose vignettes locally created installed package, can view locally via vignette(\"RLine\", package = \"tukeyedar\"). use dark themed IDE, vignettes may not render well might opt view web browser via functions RShowDoc(\"RLine\", package = \"tukeyedar\").","code":""},{"path":"https://mgimond.github.io/tukeyedar/index.html","id":"using-the-functions","dir":"","previous_headings":"","what":"Using the functions","title":"Tukey Inspired Exploratory Data Analysis Functions","text":"functions start eda_. example, generate three point summary plot mpg vs. disp mtcars dataset, type: Note functions pipe friendly. example, following works:","code":"library(tukeyedar) eda_3pt(mtcars, disp, mpg) # Using R >= 4.1 mtcars |> eda_3pt(disp, mpg) # Using magrittr (or any of the tidyverse packages) library(magrittr) mtcars %>% eda_3pt(disp, mpg)"},{"path":"https://mgimond.github.io/tukeyedar/reference/age_height.html","id":null,"dir":"Reference","previous_headings":"","what":"Age vs. height for private and rural school children — age_height","title":"Age vs. height for private and rural school children — age_height","text":"data reproduced Hoaglin et al.'s book originally sourced Bernard G. Greenberg (1953) American Journal Public Health (vol 43, pp. 692-699). dataset tabulate children's height age urban private rural public schools.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/age_height.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age vs. height for private and rural school children — age_height","text":"","code":"age_height"},{"path":"https://mgimond.github.io/tukeyedar/reference/age_height.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age vs. 
height for private and rural school children — age_height","text":"data frame 18 rows 2 variables: Months Child's age months Height Child's height cm","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/age_height.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age vs. height for private and rural school children — age_height","text":"Understanding robust exploratory data analysis, D.C. Hoaglin, F. Mosteller J.W. Tukey. (page 135)","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":null,"dir":"Reference","previous_headings":"","what":"3-point summary plot — eda_3pt","title":"3-point summary plot — eda_3pt","text":"eda_3pt splits data 3 groups (whose summary locations defined respective medians), two half slopes linking groups. function return scatter plot showing half-slopes red solid lines. solid grey slope linking tail-end groups shows desired shape half-slopes. goal two half slopes line closely possible solid grey slope via re-expression techniques seeking linear relationship variables. function also return half-slopes ratio hsrtio direction re-expression X Y values ladder powers.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"3-point summary plot — eda_3pt","text":"","code":"eda_3pt( dat, x, y, px = 1, py = 1, tukey = TRUE, axes = TRUE, pch = 21, equal = TRUE, p.col = \"grey50\", p.fill = \"grey80\", size = 0.8, alpha = 0.7, xlab = NULL, ylab = NULL, dir = TRUE, grey = 0.6, ... 
)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"3-point summary plot — eda_3pt","text":"dat Data frame x Column name assigned x axis y Column name assigned y axis px Power transformation apply x-variable py Power transformation apply y-variable tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation) axes Boolean determining axes drawn. pch Point symbol type equal Boolean determining axes lengths match (i.e. square plot). p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1) alpha Point transparency (0 = transparent, 1 = opaque). applicable rgb() used define point colors. xlab X label output plot ylab Y label output plot dir Boolean indicating suggested ladder power direction displayed grey Grey level apply plot elements (0 1 1 = black) ... parameters passed graphics::plot function.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"3-point summary plot — eda_3pt","text":"Generates plot returns list following named components: hsrtio: ratio slopes. value close one suggests no transformation needed. xmed: x-coordinate values three summary points. ymed: y-coordinate values three summary points.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"3-point summary plot — eda_3pt","text":"Computes three-point summary originally defined Tukey's EDA book (see reference).","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"3-point summary plot — eda_3pt","text":"Velleman, P. F., D. C. Hoaglin. 1981. 
Applications, Basics Computing Exploratory Data Analysis. Boston: Duxbury Press. D. C. Hoaglin, F. Mosteller, J. W. Tukey. 1983. Understanding Robust Exploratory Data Analysis. Wiley. Tukey, John W. 1977. Exploratory Data Analysis. Addison-Wesley.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_3pt.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"3-point summary plot — eda_3pt","text":"","code":"hsratio <- eda_3pt(cars, speed, dist) hsratio <- eda_3pt(cars, speed, dist, py = 1/3, ylab=expression(\"Dist\"^{1/3}))"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":null,"dir":"Reference","previous_headings":"","what":"Add graphical EDA elements to existing plot — eda_add","title":"Add graphical EDA elements to existing plot — eda_add","text":"eda_add adds graphical EDA elements scatter plot. Currently adds eda_rline fit points.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add graphical EDA elements to existing plot — eda_add","text":"","code":"eda_add( x, pch = 24, p.col = \"darkred\", p.fill = \"yellow\", lty = 1, l.col = \"darkred\" )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add graphical EDA elements to existing plot — eda_add","text":"x Object class eda_rline pch Point symbol type p.col Point color passed col p.fill Point fill color passed bg (used pch ranging 21-25) lty Line type l.col Line color","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add graphical EDA elements to existing plot — eda_add","text":"Returns eda_rline intercept slope. 
a: Intercept b: Slope","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Add graphical EDA elements to existing plot — eda_add","text":"function adds eda_rline slope 3-pt summary points existing scatter plot. See accompanying vignette Resistant Line detailed breakdown resistant line technique.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_add.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add graphical EDA elements to existing plot — eda_add","text":"","code":"eda_lm(mtcars, x = wt, y = mpg) #> (Intercept) x #> 37.285126 -5.344472 Mr <- eda_rline(mtcars, x=wt, y=mpg) eda_add(Mr, l.col = \"blue\") #> $a #> [1] 37.763 #> #> $b #> [1] -5.524372 #>"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":null,"dir":"Reference","previous_headings":"","what":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"Re-expresses vector ladder powers. Requires eda_3pt() function.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"","code":"eda_bipow(dat, x, y, p = c(-1, 0, 0.5, 1, 2), tukey = TRUE, ...)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"dat Data frame x Column name assigned x axis y Column name assigned y axis p Vector powers tukey set TRUE, adopt Tukey's power transformation. 
FALSE, adopt Box-Cox transformation. ... parameters passed graphics::plot function.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"return value","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"Generates matrix scatter plots boxplots various re-expressions x y values. 3-point summary associated half-slopes also plotted (function makes use eda_3pt function). values re-expressed using either Tukey power transformation (default) Box-Cox transformation (see eda_re information transformation techniques). Axes labels omitted reduce plot clutter.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"Tukey, John W. 1977. Exploratory Data Analysis. 
Addison-Wesley.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_bipow.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow","text":"","code":"data(cars) # Example 1 eda_bipow(dat = cars, x = speed, y = dist) # Custom powers eda_bipow(dat = cars, x = speed, y = dist, p = c(-1, -0.5, 0, 0.5, 1)) # Adopt box-cox transformation eda_bipow(dat = cars, x = speed, y = dist, tukey = FALSE, p = c(-1, -0.5, 0, 0.5, 1))"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":null,"dir":"Reference","previous_headings":"","what":"Boxplots equalized by level and spread — eda_boxls","title":"Boxplots equalized by level and spread — eda_boxls","text":"eda_boxls creates boxplots conditioned one variable providing option spreads levels /levels.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Boxplots equalized by level and spread — eda_boxls","text":"","code":"eda_boxls( dat, x, fac, p = 1, tukey = FALSE, outlier = TRUE, out.txt = NULL, type = \"none\", notch = FALSE, horiz = FALSE, outliers = TRUE, xlab = NULL, ylab = NULL, grey = 0.6, reorder = TRUE, reorder.stat = \"median\" )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Boxplots equalized by level and spread — eda_boxls","text":"dat Data frame x Column name assigned values fac Column name assigned factor values conditioned p Power transformation apply variable tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation) outlier Boolean indicating outliers plotted out.txt Column whose values used label outliers type Plot type. 
\"none\" = equalization ; \"l\" = equalize level; \"ls\" = equalize level spread notch Boolean determining notches added. horiz plot horizontally (TRUE) vertically (FALSE) outliers plot outliers (TRUE) (FALSE) xlab X label output plot ylab Y label output plot grey Grey level apply plot elements (0 1 1 = black) reorder Boolean determining factors reordered based median, upper quartile lower quartile (set reorder.type). reorder.stat Statistic reorder level reorder set TRUE. Either \"median\", \"upper\" (upper quartile) \"lower\" (lower quartile). type set value \"none\", argument ignored stat defaults \"median\".","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Boxplots equalized by level and spread — eda_boxls","text":"values returned","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Boxplots equalized by level and spread — eda_boxls","text":"default, boxplots re-ordered median values. outlier text displayed value, modified data equalized level spread. Note notch offers 95 percent test null true medians equal assuming distribution batch approximately normal. notches overlap, can assume medians significantly different 0.05 level. Note notches correct multiple comparison issues three batches plotted.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_boxls.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Boxplots equalized by level and spread — eda_boxls","text":"","code":"# A basic boxplot. The outlier is labeled with the row number by default. eda_boxls(mtcars,mpg, cyl, type=\"none\") # A basic boxplot. The outlier is labeled with its own value. eda_boxls(mtcars,mpg, cyl, type=\"none\", out.txt=mpg ) # Boxplot equalized by level. 
Note that the outlier text is labeled with its # original value. eda_boxls(mtcars,mpg, cyl, type=\"l\", out.txt=mpg ) #> ======================== #> Note that the data have been equalized with \"type\" set to \"l\". #> ======================== # Boxplots equalized by level and spread eda_boxls(mtcars,mpg, cyl, type=\"ls\", out.txt=mpg ) #> ======================== #> Note that the data have been equalized with \"type\" set to \"ls\". #> ======================== # Hide outlier eda_boxls(mtcars,mpg, cyl, type=\"ls\", out.txt=mpg , outlier=FALSE) #> ======================== #> Note that the data have been equalized with \"type\" set to \"ls\". #> ======================== # Equalizing level helps visualize increasing spread with increasing # median value food <- read.csv(\"http://mgimond.github.io/ES218/Data/Food_web.csv\") eda_boxls(food, mean.length, dimension, type = \"l\") #> ======================== #> Note that the data have been equalized with \"type\" set to \"l\". #> ======================== # For long factor level names, flip plot eda_boxls(iris, Sepal.Length, Species, out.txt=Sepal.Length , horiz = TRUE) # By default, plots are ordered by their medians. 
singer <- lattice::singer eda_boxls(singer, height, voice.part, out.txt=height, horiz = TRUE) # To order by top quartile, set reorder.stat to \"upper\" eda_boxls(singer, height, voice.part, out.txt=height, horiz = TRUE, reorder.stat = \"upper\")"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":null,"dir":"Reference","previous_headings":"","what":"Overlapping density distributions for two variables — eda_dens","title":"Overlapping density distributions for two variables — eda_dens","text":"eda_dens generates overlapping density distributions two variables.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Overlapping density distributions for two variables — eda_dens","text":"","code":"eda_dens( x, y, fac = NULL, p = 1L, tukey = FALSE, fx = NULL, fy = NULL, grey = 0.6, col = \"red\", size = 0.8, alpha = 0.4, xlab = NULL, ylab = NULL, legend = TRUE, ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Overlapping density distributions for two variables — eda_dens","text":"x Vector first variable dataframe. y Vector second variable column defining continuous variable x dataframe. fac Column defining grouping variable x dataframe. p Power transformation apply sets values. tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation). fx Formula apply x variable. computed transformation applied x variable. fy Formula apply y variable. computed transformation applied y variable. grey Grey level apply plot elements (0 1 1 = black). col Fill color second density distribution. size Point size (0-1). alpha Fill transparency (0 = transparent, 1 = opaque). applicable rgb() used define fill colors. xlab X variable label. Ignored x dataframe. ylab Y variable label. Ignored x dataframe. 
legend Boolean determining legend added plot. ... Arguments passed stats::density() function.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Overlapping density distributions for two variables — eda_dens","text":"return value.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Overlapping density distributions for two variables — eda_dens","text":"function generate overlapping density plots first variable assigned grey color second variable assigned default red color.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_dens.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Overlapping density distributions for two variables — eda_dens","text":"","code":"# Passing data as two separate vector objects set.seed(207) x <- rbeta(1000,2,8) y <- x * 1.5 + 0.1 eda_dens(x, y) # Passing data as a dataframe dat <- data.frame(val = c(x, y), grp = c(rep(\"x\", length(x)), rep(\"y\", length(y)))) eda_dens(dat, val, grp)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":null,"dir":"Reference","previous_headings":"","what":"Least Squares regression plot (with optional LOESS fit) — eda_lm","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"eda_lm generates scatter plot fitted regression line. loess line can also added plot model comparison. 
axes scaled respective standard deviations match axes unit length.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"","code":"eda_lm( dat, x, y, xlab = NULL, ylab = NULL, px = 1, py = 1, tukey = FALSE, show.par = TRUE, reg = TRUE, w = NULL, sd = TRUE, mean.l = TRUE, grey = 0.6, pch = 21, p.col = \"grey50\", p.fill = \"grey80\", size = 0.8, alpha = 0.8, q = FALSE, q.val = c(0.16, 0.84), q.type = 5, loe = FALSE, lm.col = rgb(1, 0.5, 0.5, 0.8), loe.col = rgb(0.3, 0.3, 1, 1), stats = FALSE, stat.size = 0.8, loess.d = list(family = \"symmetric\", span = 0.7, degree = 1), ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"dat Data frame. x Column assigned x axis. y Column assigned y axis. xlab X label output plot. ylab Y label output plot. px Power transformation apply x-variable. py Power transformation apply y-variable. tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation). show.par Boolean determining power transformation displayed plot. reg Boolean indicating whether least squares regression line plotted. w Weight pass regression model. sd Boolean determining standard deviation lines plotted. mean.l Boolean determining x y mean lines added plot. grey Grey level apply plot elements (0 1 1 = black). pch Point symbol type. p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1). alpha Point transparency (0 = transparent, 1 = opaque). applicable rgb() used define point colors. q Boolean determining grey quantile boxes plotted. q.val F-values use define quantile box parameters. Defaults mid 68 used generate box. 
q.type Quantile type. Defaults 5 (Cleveland's f-quantile definition). loe Boolean indicating loess curve fitted. lm.col Regression line color. loe.col LOESS curve color. stats Boolean indicating regression summary statistics displayed. stat.size Text size stats output plot. loess.d list parameters passed loess.smooth function. robust loess used default. ... used.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"Returns residuals, intercept slope OLS fit. residuals: Regression model residuals a: Intercept b: Slope","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"function plot OLS regression line , requested, loess fit. plot also display +/- 1 standard deviations dashed lines. theory, x y values follow perfectly Normal distribution, roughly 68 percent points fall lines. true 68 percent values can displayed grey rectangles setting q=TRUE. uses quantile function compute upper lower bounds defining inner 68 percent values. data follow Normal distribution, grey rectangle edges coincide +/- 1SD dashed lines. wish show interquartile ranges (IQR) instead inner 68 percent values, simply set q.val = c(0.25,0.75). plot option re-express values via px py arguments. note re-expression produces NaN values (negative value logged) points removed plot. result fewer observations plotted. observations removed result re-expression warning message displayed console. re-expression powers shown upper right side plot. 
suppress display re-expressions set show.par = FALSE.","code":""},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lm.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Least Squares regression plot (with optional LOESS fit) — eda_lm","text":"","code":"# Add a regular (OLS) regression model and loess smooth to the data eda_lm(mtcars, wt, mpg, loe = TRUE) #> (Intercept) x #> 37.285126 -5.344472 # Add the inner 68% quantile to compare the true 68% of data to the SD eda_lm(mtcars, wt, mpg, loe = TRUE, q = TRUE) #> (Intercept) x #> 37.285126 -5.344472 # Show the IQR box eda_lm(mtcars, wt, mpg, loe = TRUE, q = TRUE, sd = FALSE, q.val = c(0.25,0.75)) #> (Intercept) x #> 37.285126 -5.344472 # Fit an OLS to the Income for Female vs Male df2 <- read.csv(\"https://mgimond.github.io/ES218/Data/Income_education.csv\") eda_lm(df2, x=B20004013, y = B20004007, xlab = \"Female\", ylab = \"Male\", loe = TRUE) #> (Intercept) x #> 10503.090485 1.086416 # Add the inner 68% quantile to compare the true 68% of data to the SD eda_lm(df2, x = B20004013, y = B20004007, xlab = \"Female\", ylab = \"Male\", q = TRUE) #> (Intercept) x #> 10503.090485 1.086416 # Apply a transformation to x and y axes: x -> 1/3 and y -> log eda_lm(df2, x = B20004013, y = B20004007, xlab = \"Female\", ylab = \"Male\", px = 1/3, py = 0, q = TRUE, loe = TRUE) #> (Intercept) x #> 8.58646713 0.02287702"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":null,"dir":"Reference","previous_headings":"","what":"Tukey's letter value summaries — eda_lsum","title":"Tukey's letter value summaries — eda_lsum","text":"eda_lsum letter value summary introduced John Tukey extends boxplot's 5 number summary exploring symmetry batch depth levels half (median) fourth 
(quartiles).","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tukey's letter value summaries — eda_lsum","text":"","code":"eda_lsum(x, l = 5, all = TRUE)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tukey's letter value summaries — eda_lsum","text":"x Vector l Number levels (max = 9) all Generate upper, lower and mid summaries if TRUE, or just mid summaries if FALSE","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tukey's letter value summaries — eda_lsum","text":"Returns dataframe letter value summaries.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tukey's letter value summaries — eda_lsum","text":"Outputs data frame letter value summaries.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Tukey's letter value summaries — eda_lsum","text":"Tukey, John W. 1977. Exploratory Data Analysis. Addison-Wesley.","code":""},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_lsum.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tukey's letter value summaries — eda_lsum","text":"","code":"x <- c(22, 8, 11, 3, 26, 1, 14, 18, 20, 25, 24) eda_lsum(x) #> letter depth lower mid upper spread #> 1 M 6.0 18.0 18.00 18.0 0.0 #> 2 H 3.5 9.5 16.25 23.0 13.5 #> 3 E 2.0 3.0 14.00 25.0 22.0 #> 4 D 1.5 2.0 13.75 25.5 23.5 #> 5 C 1.0 1.0 13.50 26.0 
25.0"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":null,"dir":"Reference","previous_headings":"","what":"Normal fit vs density plot. — eda_normfit","title":"Normal fit vs density plot. — eda_normfit","text":"eda_normfit generates fitted Normal distribution data option compare density distribution.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Normal fit vs density plot. — eda_normfit","text":"","code":"eda_normfit( dat, x = NULL, grp = NULL, p = 1, tukey = FALSE, show.par = TRUE, sq = TRUE, inner = 0.6826, dens = TRUE, bw = \"SJ-dpi\", kernel = \"gaussian\", pch = 16, size = 0.8, alpha = 0.3, p.col = \"grey50\", p.fill = \"grey80\", grey = 0.7, col.ends = \"grey90\", col.mid = \"#EBC89B\", col.ends.dens = \"grey90\", col.mid.dens = \"#EBC89B\", offset = 0.02, tsize = 1.5, xlab = NULL, ylab = NULL, ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Normal fit vs density plot. — eda_normfit","text":"dat Vector values, dataframe. x Column values dat dataframe, ignored otherwise. grp Column grouping variables dat dataframe, ignored otherwise. p Power transformation apply values. tukey Boolean determining Tukey transformation adopted (TRUE) Box-Cox transformation adopted (FALSE). show.par Boolean determining power transformation used data displayed plot's upper-right corner. sq Boolean determining plot square. inner Fraction values captured inner color band normal density plots. Defaults 0.6826 (inner 68% values). dens Boolean determining density plot displayed alongside Normal fit plot. bw Bandwidth parameter passed density() function. kernel Kernel parameter passed density() function. pch Point symbol type. size Point size. alpha Fill transparency (0 = transparent, 1 = opaque). 
applicable rgb() used define fill colors. p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). grey Grey level apply plot elements axes, labels, etc... (0 1 1 = black). col.ends Fill color ends Normal distribution. col.mid Fill color middle band Normal distribution. col.ends.dens Fill color ends density distribution. col.mid.dens Fill color middle band density distribution. offset value (x-axis units) defines gap left right side plots. Ignored dens FALSE. tsize Size plot title. xlab X variable label. ylab Y variable label. ... Note used.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Normal fit vs density plot. — eda_normfit","text":"return value.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Normal fit vs density plot. — eda_normfit","text":"function generate (symmetrical) Normal distribution fitted data dens set FALSE side--side density/Normal fit plot dens set TRUE. latter, density plot left side Normal fit right side vertical axis. plots two fill colors: one inner band outer band. inner band shows area curve encompasses desired fraction values defined inner. default, value 0.6826, 68.26% (roughly percentage values covered +/- 1 standard deviations Normal distribution). Normal fit plot, range computed theoretical Normal actual values. density plot, range computed actual values. density plot desired, dens = TRUE, gap (defined offset) created left side density plot right side Normal fit plot. function makes use built- stats::density function. , can pass bw kernel parameters density() function. Points showing location values along y-axis also added help view distributions relative density Normal fit curves. Measures centrality computed differently Normal fit density plots. 
mean computed Normal fit plot median computed density plot. measures centrality shown black horizontal lines plot. areas density Normal fit plots scaled peak values, respectively. , areas compared distributions.","code":""},{"path":[]},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_normfit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Normal fit vs density plot. — eda_normfit","text":"","code":"# Explore a skewed distribution set.seed(218) x <- rexp(500) # Generate base histogram hist(x) # Plot density/Normal fit plot eda_normfit(x) eda_normfit(x) # Limit the plot to just a Normal fit eda_normfit(x, dens = FALSE) #> #> !!!!!!!!!!!!!!!!!!!!!!!! #> Note that this is not a density plot. #> It's the Normal characterization of the data #> using the data's standard deviation. #> !!!!!!!!!!!!!!!!!!!!!!!! #> # The inner band's range can be modified. Here, we view the interquartile # range, the +/- 1 standard deviation range and the inner 95% range) OP <- par(mfrow = c(1,3)) invisible(sapply(c(0.5, 0.6826, 0.95), function(prop) eda_normfit(x, inner = prop, tsize = 1, ylab = paste(prop*100,\"% of values\")))) par(OP) # The bandwidth selector can also be specified OP <- par(mfrow=c(2,3)) invisible(sapply(c(\"SJ-dpi\", \"nrd0\", \"nrd\", \"SJ-ste\", \"bcv\", \"ucv\" ), function(band) eda_normfit(x, bw = band, tsize=0.9, size=0, offset=0.005, ylab = band))) par(OP) # The bandwidth argument can also be passed a numeric value OP <- par(mfrow=c(1,3)) invisible(sapply(c(0.2, 0.1, 0.05 ), function(band) eda_normfit(x, bw = band, tsize=1,size=.5, offset=0.01, ylab = band))) par(OP) # Examples of a few kernel options OP <- par(mfrow=c(1,3)) invisible(sapply(c(\"gaussian\", \"optcosine\", \"rectangular\" ), function(k) eda_normfit(x, kernel = k, tsize=1, size=.5, offset=0.01, ylab = k))) par(OP) # Another example where data are passed as a dataframe set.seed(540) dat <- data.frame(value = rbeta(20, 1, 50), grp = 
sample(letters[1:3], 100, replace = TRUE)) eda_normfit(dat, value, grp) # Points can be removed and the gap rendered narrower eda_normfit(dat, value, grp, size = 0, offset = 0.01) # Color can be modified. Here we modify the density plot fill colors eda_normfit(dat, value, grp, size = 0, offset = 0.01, col.ends.dens = \"#A1D99B\", col.mid.dens = \"#E5F5E0\") # A power transformation can be applied to the data. Here # we'll apply a log transformation eda_normfit(dat, value, grp, p = 0)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":null,"dir":"Reference","previous_headings":"","what":"Polish two-way tables — eda_pol","title":"Polish two-way tables — eda_pol","text":"eda_pol Polishes two-way tables using median, means, customizable functions.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Polish two-way tables — eda_pol","text":"","code":"eda_pol( x, row = NULL, col = NULL, val = NULL, stat = median, plot = TRUE, eps = 0.01, maxiter = 5, sort = FALSE, p = 1, tukey = FALSE, offset = 1e-05, col.quant = FALSE, colpal = \"RdYlBu\", adj.mar = TRUE, res.size = 1, row.size = 1, col.size = 1, res.txt = TRUE, label.txt = TRUE )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Polish two-way tables — eda_pol","text":"x three column data frame row Name column assigned row effect col Name column assigned column effect val Name column assigned response variable stat Polishing statistic (default median) plot Boolean determining output plot generated eps Convergence tolerance parameter maxiter Maximum number iterations sort Boolean determining effects row/columns sorted p Re-expression power parameter tukey Boolean determining Tukey's power transformation used. FALSE, Box-Cox transformation adopted. 
offset Offset to add to values if at least one value is 0 and the power is negative col.quant Boolean determining quantile classification scheme used colpal Color palette adopt adj.mar Boolean determining margin width needs accommodate labels res.size Size residual values plot [0-1] row.size Size row effect values plot [0-1] col.size Size column effect values plot [0-1] res.txt Boolean determining values added plot label.txt Boolean determining margin column labels plotted","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Polish two-way tables — eda_pol","text":"list class eda_polish following named components: long median polish residuals three columns: Column levels, row levels residual values. wide median polish residuals table wide form. row Row effects table col Column effects table global Overall value (common value) iter Number iterations polish stabilizes. cv Table residuals, row effects, column effects CV values long form. power Transformation power applied values prior polishing. IQ_row Ratio interquartile row effect values 80th quantile residuals. IQ_col Ratio interquartile column effect values 80th quantile residuals.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Polish two-way tables — eda_pol","text":"function performs polish two way table. default, applies median polish, statistical summaries mean can passed function via stat = parameter. function returns list row/column effects along global residual values. 
also generate colored table plot = TRUE.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_pol.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Polish two-way tables — eda_pol","text":"","code":"df <- data.frame(region = rep( c(\"NE\", \"NC\", \"S\", \"W\"), each = 5), edu = rep( c(\"ed8\", \"ed9to11\", \"ed12\", \"ed13to15\", \"ed16\"), 4), perc = c(25.3, 25.3, 18.2, 18.3, 16.3, 32.1, 29, 18.8, 24.3, 19, 38.8, 31, 19.3, 15.7, 16.8, 25.4, 21.1, 20.3, 24, 17.5)) M <- eda_pol(df, row = region, col = edu, val = perc, plot = FALSE) plot(M)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":null,"dir":"Reference","previous_headings":"","what":"QQ and MD plots — eda_qq","title":"QQ and MD plots — eda_qq","text":"eda_qq Generates empirical Normal QQ plot well Tukey mean-difference plot.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"QQ and MD plots — eda_qq","text":"","code":"eda_qq( x, y = NULL, fac = NULL, norm = FALSE, p = 1L, tukey = FALSE, md = FALSE, q.type = 5, fx = NULL, fy = NULL, plot = TRUE, show.par = TRUE, grey = 0.6, pch = 21, p.col = \"grey50\", p.fill = \"grey80\", size = 0.8, alpha = 0.8, q = TRUE, b.val = c(0.25, 0.75), l.val = c(0.125, 0.875), xlab = NULL, ylab = NULL, title = NULL, t.size = 1.2, ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"QQ and MD plots — eda_qq","text":"x Vector first variable dataframe. y Vector second variable column defining continuous variable x dataframe. fac Column defining grouping variable x dataframe. norm Boolean determining Normal QQ plot generated. p Power transformation apply sets values. tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation). 
md Boolean determining Tukey mean-difference plot generated. q.type integer 1 9 selecting one nine quantile algorithms. (See quantile function). fx Formula apply x variable. computed transformation applied x variable. fy Formula apply y variable. computed transformation applied y variable. plot Boolean determining plot generated. show.par Boolean determining parameters power transformation formula displayed. grey Grey level apply plot elements (0 1 1 = black). pch Point symbol type. p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1) alpha Point transparency (0 = transparent, 1 = opaque). applicable rgb() used define point colors. q Boolean determining grey quantile boxes plotted. b.val Quantiles define quantile box parameters. Defaults IQR. Two values needed. l.val Quantiles define quantile line parameters. Defaults mid 75% values. Two values needed. xlab X label output plot. Ignored x dataframe. ylab Y label output plot. Ignored x dataframe. title Title add plot. t.size Title size. ... used","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"QQ and MD plots — eda_qq","text":"Returns list following components: x: X values. May interpolated smallest quantile batch. Values reflect power transformation defined p. b: Y values. May interpolated smallest quantile batch. Values reflect power transformation defined p. p: Re-expression applied original values. fx: Formula applied x variable. fy: Formula applied y variable.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"QQ and MD plots — eda_qq","text":"function used generate empirical QQ plot, plot displays IQR via grey boxes x y values. box widths can changed via b.val argument. plot also display mid 75% values via light colored dashed lines. 
line positions can changed via l.val argument. middle dashed line represents batch's median value. Console output prints suggested multiplicative additive offsets. See QQ plot vignette introduction use interpretation. function can also used generate Normal QQ plot norm argument set TRUE. case, line parameters l.val overridden set +/- 1 standard deviations. Note \"suggested offsets\" output disabled, can generate M-D version Normal QQ plot. Also note formula argument ignored mode.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_qq.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"QQ and MD plots — eda_qq","text":"","code":"# Passing data as a dataframe singer <- lattice::singer dat <- singer[singer$voice.part %in% c(\"Bass 2\", \"Tenor 1\"), ] eda_qq(dat, height, voice.part) #> [1] \"Suggested offsets:y = x * 0.8571 + (12.4286)\" # Passing data as two separate vector objects bass2 <- subset(singer, voice.part == \"Bass 2\", select = height, drop = TRUE ) tenor1 <- subset(singer, voice.part == \"Tenor 1\", select = height, drop = TRUE ) eda_qq(bass2, tenor1) #> [1] \"Suggested offsets:y = x * 1.04 + (-5.22)\" # There seems to be an additive offset of about 2 inches eda_qq(bass2, tenor1, fx = \"x - 2\") #> [1] \"Suggested offsets:y = x * 1.04 + (-5.22)\" # We can fine-tune by generating the Tukey mean-difference plot eda_qq(bass2, tenor1, fx = \"x - 2\", md = TRUE) #> [1] \"Suggested offsets:y = x * 1.04 + (-5.22)\" # An offset of another 0.5 inches seems warranted # We can say that overall, bass2 singers are 2.5 inches taller than tenor1. # The offset is additive. 
eda_qq(bass2, tenor1, fx = \"x - 2.5\", md = TRUE) #> [1] \"Suggested offsets:y = x * 1.04 + (-5.22)\" # Example 2: Petal width setosa <- subset(iris, Species == \"setosa\", select = Petal.Width, drop = TRUE) virginica <- subset(iris, Species == \"virginica\", select = Petal.Width, drop = TRUE) eda_qq(setosa, virginica) #> [1] \"Suggested offsets:y = x * 1.7143 + (1.6286)\" # The points are not completely parallel to the 1:1 line suggesting a # multiplicative offset. The slope may be difficult to eyeball. The function # outputs a suggested slope and intercept. We can start with that eda_qq(setosa, virginica, fx = \"x * 1.7143\") #> [1] \"Suggested offsets:y = x * 1.7143 + (1.6286)\" # Now let's add the suggested additive offset. eda_qq(setosa, virginica, fx = \"x * 1.7143 + 1.6286\") #> [1] \"Suggested offsets:y = x * 1.7143 + (1.6286)\" # We can confirm this value via the mean-difference plot # Overall, we have both a multiplicative and additive offset between the # species' petal widths. 
eda_qq(setosa, virginica, fx = \"x * 1.7143 + 1.6286\", md = TRUE) #> [1] \"Suggested offsets:y = x * 1.7143 + (1.6286)\" # Function can also generate a Normal QQ plot eda_qq(bass2, norm = TRUE)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":null,"dir":"Reference","previous_headings":"","what":"Re-expression function — eda_re","title":"Re-expression function — eda_re","text":"eda_re re-expresses a vector following either a Tukey or a Box-Cox transformation.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Re-expression function — eda_re","text":"","code":"eda_re(x, p = 0, tukey = TRUE)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Re-expression function — eda_re","text":"x Vector p Power transformation tukey set TRUE, adopt Tukey's power transformation, FALSE, adopt Box-Cox transformation","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Re-expression function — eda_re","text":"Returns vector length input x","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Re-expression function — eda_re","text":"function used re-express data using one of two transformation techniques: Box-Cox transformation (tukey = FALSE) or Tukey's power transformation (tukey = TRUE).","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_re.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Re-expression function — eda_re","text":"","code":"x <- c(15, 28, 17, 73, 8, 83, 2) eda_re(x, p=-1/3) #> [1] 0.4054801 0.3293169 0.3889111 0.2392723 0.5000000 0.2292489 
0.7937005"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":null,"dir":"Reference","previous_headings":"","what":"Tukey's resistant line — eda_rline","title":"Tukey's resistant line — eda_rline","text":"eda_rline R implementation Hoaglin, Mosteller Tukey's resistant line technique outlined chapter 5 \"Understanding Robust and Exploratory Data Analysis\" (Wiley, 1983).","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tukey's resistant line — eda_rline","text":"","code":"eda_rline(dat, x, y, px = 1, py = 1, tukey = TRUE, iter = 20)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tukey's resistant line — eda_rline","text":"dat Data frame. x Column assigned x axis. y Column assigned y axis. px Power transformation apply x-variable. py Power transformation apply y-variable. tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation). iter Maximum number iterations run.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tukey's resistant line — eda_rline","text":"Returns list class eda_rline with following named components: a: Intercept b: Slope res: Residuals sorted x-values x: Sorted x values y: y values following sorted x-values xmed: Median x values third ymed: Median y values third index: Index sorted x values defining upper boundaries thirds xlab: X label name ylab: Y label name iter: Number iterations","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tukey's resistant line — eda_rline","text":"R implementation RLIN.F FORTRAN code Velleman et al.'s book. 
function fits robust line using three-point summary strategy whereby data split three equal length groups along x-axis line fitted medians defining group via iterative process. function mirror built-stat::line function fitting strategy outputs additional parameters. See accompanying vignette Resistant Line detailed breakdown resistant line technique.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Tukey's resistant line — eda_rline","text":"Velleman, P. F., D. C. Hoaglin. 1981. Applications, Basics Computing Exploratory Data Analysis. Boston: Duxbury Press. D. C. Hoaglin, F. Mosteller, J. W. Tukey. 1983. Understanding Robust Exploratory Data Analysis. Wiley.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_rline.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tukey's resistant line — eda_rline","text":"","code":"# This first example uses breast cancer data from \"ABC's of EDA\" page 127. # The output model's parameters should closely match: Y = -46.19 + 2.89X # The plot shows the original data with a fitted resistant line (red) # and a regular lm fitted line (dashed line), and the modeled residuals. # The 3-point summary dots are shown in red. 
M <- eda_rline(neoplasms, Temp, Mortality) M #> $b #> [1] 2.890173 #> #> $a #> [1] -45.90578 #> #> $res #> [1] 21.2982659 0.1398844 -2.1791908 8.8294798 -11.2485549 -7.6167630 #> [7] -0.1398844 4.7589595 -9.0092486 -2.1994220 2.7554913 -7.2676301 #> [13] -0.3907514 6.1861272 1.7971098 0.1398844 #> #> $x #> [1] 31.8 34.0 40.2 42.1 42.3 43.5 44.2 45.1 46.3 47.3 47.8 48.5 49.2 49.9 50.0 #> [16] 51.3 #> #> $y #> [1] 67.3 52.5 68.1 84.6 65.1 72.2 81.7 89.2 78.9 88.6 95.0 87.0 #> [13] 95.9 104.5 100.4 102.5 #> #> $xmed #> [1] 40.2 45.7 49.9 #> #> $ymed #> [1] 67.30 85.15 100.40 #> #> $index #> [1] 5 11 16 #> #> $xlab #> [1] \"Temp\" #> #> $ylab #> [1] \"Mortality\" #> #> $px #> [1] 1 #> #> $py #> [1] 1 #> #> $iter #> [1] 4 #> #> attr(,\"class\") #> [1] \"eda_rline\" # Plot the output (red line is the resistant line) plot(M) # Add a traditional OLS regression line (dashed line) abline(lm(Mortality ~ Temp, neoplasms), lty = 3) # Plot the residuals plot(M, type = \"residuals\") # This next example models gas consumption as a function of engine displacement. # It applies a transformation to both variables via the px and py arguments. eda_3pt(mtcars, disp, mpg, px = -1/3, py = -1, ylab = \"gal/mi\", xlab = expression(\"Displacement\"^{-1/3})) #> $slope1 #> [1] -0.3633401 #> #> $slope2 #> [1] -0.4098173 #> #> $hsrtio #> [1] 1.127916 #> #> $xmed #> [1] 0.1405721 0.1813741 0.2190819 #> #> $ymed #> [1] 0.06690834 0.05208333 0.03663004 #> # This next example uses Andrew Siegel's pathological 9-point dataset to test # for model stability when convergence cannot be reached. 
M <- eda_rline(nine_point, X, Y) plot(M)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":null,"dir":"Reference","previous_headings":"","what":"Tukey's spread-level function — eda_sl","title":"Tukey's spread-level function — eda_sl","text":"eda_sl function generates spread-level table univariate dataset.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tukey's spread-level function — eda_sl","text":"","code":"eda_sl( dat, x, fac, p = 1, tukey = FALSE, sprd = \"frth\", plot = TRUE, grey = 0.6, pch = 21, p.col = \"grey50\", p.fill = \"grey80\", size = 1, alpha = 0.8 )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tukey's spread-level function — eda_sl","text":"dat Dataframe x Continuous variable column fac Categorical variable column p Power transformation apply variable tukey Boolean determining Tukey transformation adopted (FALSE adopts Box-Cox transformation) sprd Choice spreads. Either interquartile, sprd = \"IQR\" fourth-spread, sprd = \"frth\" (default). plot Boolean determining plot generated. grey Grey level apply plot elements (0 1 1 = black). pch Point symbol type. p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1) alpha Point transparency (0 = transparent, 1 = opaque). 
applicable rgb() used define point colors.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tukey's spread-level function — eda_sl","text":"Returns dataframe level spreads.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tukey's spread-level function — eda_sl","text":"Note function confused William Cleveland's spread-location function. fac categorical, output produce many NA's. page 59, Hoaglin et al. define fourth-spread range defined upper fourth lower fourth. eda_lsum function used compute upper/lower fourths.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Tukey's spread-level function — eda_sl","text":"Understanding Robust Exploratory Data Analysis, Hoaglin, David C., Frederick Mosteller, John W. Tukey, 1983.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_sl.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tukey's spread-level function — eda_sl","text":"","code":"dat <- read.csv(\"http://mgimond.github.io/ES218/Data/Food_web.csv\") sl <- eda_sl(dat, mean.length, dimension) # The output can be passed to a model fitting function like eda_lm # The output slope can be used to help identify a power transformation eda_lm(sl, Level, Spread) #> (Intercept) x #> -2.969986 2.979117"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":null,"dir":"Reference","previous_headings":"","what":"Trims vector and dataframe objects — eda_trim","title":"Trims vector and dataframe objects — eda_trim","text":"Removes records either tail-ends sorted dataset. Trimming can performed number records (specify num = option) quantiles (specify prop= option). 
eda_trim Trims vector eda_trim_df Trims data frame eda_ltrim Left-trims vector eda_rtrim Right-trims vector eda_ltrim_df Left-trims dataframe eda_rtrim_df Right-trims dataframe","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Trims vector and dataframe objects — eda_trim","text":"","code":"eda_trim(x, prop = 0.05, num = 0) eda_trim_df(dat, x, prop = 0.05, num = 0) eda_ltrim(x, prop = 0.05, num = 0) eda_ltrim_df(dat, x, prop = 0.05, num = 0) eda_rtrim(x, prop = 0.05, num = 0) eda_rtrim_df(dat, x, prop = 0.05, num = 0)"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Trims vector and dataframe objects — eda_trim","text":"x Vector values (trimming vector) column whose values used trim dataframe (applies *_df functions ) prop Fraction values trim num Number values trim dat Dataframe (applies *_df functions )","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Trims vector and dataframe objects — eda_trim","text":"Returns data type input (.e. vector dataframe)","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Trims vector and dataframe objects — eda_trim","text":"input dataset need sorted (sorting performed functions). num set zero, function assume trimming done fraction (defined prop parameter). eda_trim eda_trim_df functions called, num prop values apply tail. example, num = 5 5 smallest 5 largest values removed data. NA values must stripped input vector column elements running trim functions. 
Elements returned sorted trimmed elements.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_trim.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Trims vector and dataframe objects — eda_trim","text":"","code":"# Trim a vector by 10% (i.e. 10% of the smallest and 10% of the largest # values) eda_trim( mtcars[,1], prop=0.1) #> [1] 14.7 15.0 15.2 15.2 15.5 15.8 16.4 17.3 17.8 18.1 18.7 19.2 19.2 19.7 21.0 #> [16] 21.0 21.4 21.4 21.5 22.8 22.8 24.4 26.0 27.3 # Trim a data frame by 10% using the mpg column(i.e. 10% of the smallest # and 10% of the largest mpg values) eda_trim_df( mtcars, mpg, prop=0.1) #> mpg cyl disp hp drat wt qsec vs am gear carb #> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 #> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 #> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 #> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 #> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 #> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 #> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 #> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 #> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 #> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 #> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 #> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 #> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 #> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 #> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 #> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 #> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 #> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 #> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 #> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 #> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> Porsche 
914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 #> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":null,"dir":"Reference","previous_headings":"","what":"Ladder of powers transformation on a single vector — eda_unipow","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"eda_unipow re-expresses vector ladder powers plots results using histogram density function. Either Tukey Box-Cox transformation used computing re-expressed values.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"","code":"eda_unipow( x, p = c(2, 1, 1/2, 0.33, 0, -0.33, -1/2, -1, -2), tukey = TRUE, bins = 5, cex.main = 1.3, col = \"#DDDDDD\", border = \"#AAAAAA\", title = \"Re-expressed data via ladder of powers\", ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"x Vector p Vector powers tukey TRUE (default), apply Tukey's power transformation, FALSE adopt Box-Cox transformation bins Number histogram bins cex.main Histogram title size (assigned histogram plot) col Histogram fill color border Histogram border color title Overall plot title (set NULL title) ... 
parameters passed graphics::hist function.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"return value","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"output lattice descriptive plots showing transformed data across different powers.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"Tukey, John W. 1977. Exploratory Data Analysis. Addison-Wesley.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/eda_unipow.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Ladder of powers transformation on a single vector — eda_unipow","text":"","code":"data(mtcars) eda_unipow(mtcars$mpg, bins=6)"},{"path":"https://mgimond.github.io/tukeyedar/reference/neoplasms.html","id":null,"dir":"Reference","previous_headings":"","what":"Breast cancer mortality vs. temperature — neoplasms","title":"Breast cancer mortality vs. temperature — neoplasms","text":"data represent relationship mean annual temperature breast cancer mortality rate.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/neoplasms.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Breast cancer mortality vs. temperature — neoplasms","text":"","code":"neoplasms"},{"path":"https://mgimond.github.io/tukeyedar/reference/neoplasms.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Breast cancer mortality vs. 
temperature — neoplasms","text":"data frame 16 rows 2 variables: Temp Temperature degrees Fahrenheit. Mortality Mortality rate presented index.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/neoplasms.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Breast cancer mortality vs. temperature — neoplasms","text":"Applications, Basics Computing Exploratory Data Analysis, P.F. Velleman D.C. Hoaglin, 1981. (page 127)","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/nine_point.html","id":null,"dir":"Reference","previous_headings":"","what":"Andrew Siegel's pathological 9-point dataset — nine_point","title":"Andrew Siegel's pathological 9-point dataset — nine_point","text":"synthetic dataset created test robustness fitted lines. Originally published Andrew Siegel later adapted Hoaglin et al.'s book.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/nine_point.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Andrew Siegel's pathological 9-point dataset — nine_point","text":"","code":"nine_point"},{"path":"https://mgimond.github.io/tukeyedar/reference/nine_point.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Andrew Siegel's pathological 9-point dataset — nine_point","text":"data frame 9 rows 2 variables: X X values Y Y values","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/nine_point.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Andrew Siegel's pathological 9-point dataset — nine_point","text":"Robust regression using repeated medians, Andrew F. Siegel, Biometrika, vol 69, n 1, 1982. Understanding robust exploratory data analysis, D.C. Hoaglin, F. Mosteller J.W. Tukey. 
1983 (page 139)","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"plot.eda_pol plot method lists eda_polish class.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"","code":"# S3 method for eda_polish plot( x, type = \"residuals\", add.cv = FALSE, k = NULL, col.quant = FALSE, colpal = \"RdYlBu\", colrev = TRUE, col.eff = TRUE, col.com = TRUE, adj.mar = TRUE, res.size = 1, row.size = 1, col.size = 1, res.txt = TRUE, label.txt = TRUE, grey = 0.6, pch = 21, p.col = \"grey30\", p.fill = \"grey60\", size = 0.9, alpha = 0.8, ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"x list class eda_polish. type Plot type. One three: \"residuals\", \"cv\" \"diagnostic\". add.cv Whether add kCV model plotting \"residuals\". k Custom k use kCV added model. value NULL makes us slope. col.quant Boolean indicating quantile classification scheme used. colpal Color palette adopt (one listed hcl.pals()). colrev color palette reversed? (default TRUE). col.eff Boolean indicating effects common value contribute color gradient. col.com Boolean indicating common value contribute color gradient. adj.mar Boolean indicating margin width needs accommodate labels. res.size Size residual values plot [0-1]. row.size Size row effect values plot [0-1]. col.size Size column effect values plot [0-1]. res.txt Boolean indicating values added plot. 
label.txt Boolean indicating margin column labels plotted. grey Grey level apply plot elements diagnostic plot (0 1 1 = black). pch Point symbol type diagnostic plot. p.col Color point symbol diagnostic plot. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1) diagnostic plot. alpha Point transparency (0 = transparent, 1 = opaque). applicable rgb() used define point colors. ... Arguments passed subsequent methods.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"Returns single element vector \"type\" \"diagnostic\" value otherwise.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"function plots polish table residuals CV values. 
also generate diagnostic plot type set diagnostic","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_polish.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot eda_polish tables or diagnostic plots — plot.eda_polish","text":"","code":"# Create dataset df <- data.frame(region = rep( c(\"NE\", \"NC\", \"S\", \"W\"), each = 5), edu = rep( c(\"ed8\", \"ed9to11\", \"ed12\", \"ed13to15\", \"ed16\"), 4), perc = c(25.3, 25.3, 18.2, 18.3, 16.3, 32.1, 29, 18.8, 24.3, 19, 38.8, 31, 19.3, 15.7, 16.8, 25.4, 21.1, 20.3, 24, 17.5)) # Generate median polish output out <- eda_pol(df, row = region, col = edu, val = perc, plot = FALSE) # Plot table plot(out, type = \"residuals\") # Plot table using CV values plot(out, type = \"cv\") # Generate diagnostic plot plot(out, type = \"diagnostic\") #> $slope #> cv #> 1.3688 #>"},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot eda_rline model — plot.eda_rline","title":"Plot eda_rline model — plot.eda_rline","text":"plot.eda_rline plot method lists eda_rline class.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot eda_rline model — plot.eda_rline","text":"","code":"# S3 method for eda_rline plot( x, type = \"model\", xlab = NULL, ylab = NULL, grey = 0.7, pch = 21, equal = TRUE, p.col = \"grey50\", p.fill = \"grey80\", size = 0.8, alpha = 0.7, model = TRUE, pt3 = TRUE, fit = TRUE, ... )"},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot eda_rline model — plot.eda_rline","text":"x Object class eda_rline. type Plot type. One two: \"model\", \"residuals\". xlab Custom x-axis label. Defaults column name. ylab Custom y-axis label. 
Defaults column name. grey Grey level apply plot elements (0 1 1 = black). pch Point symbol type. equal Boolean determining axes lengths match (.e. square plot). p.col Color point symbol. p.fill Point fill color passed bg (used pch ranging 21-25). size Point size (0-1). alpha Point transparency (0 = transparent, 1 = opaque). applicable rgb() used define point colors. model Boolean indicating resulting model added plot. applies type = \"model\". pt3 Boolean indicating 3-pt summaries added plot. applies type = \"model\". fit Boolean indicating fitted line added plot. ... Arguments passed subsequent methods.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot eda_rline model — plot.eda_rline","text":"return value.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Plot eda_rline model — plot.eda_rline","text":"function generates scatter plot fitted model eda_rline object.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/plot.eda_rline.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot eda_rline model — plot.eda_rline","text":"","code":"r.lm <- eda_rline(age_height, Months, Height) plot(r.lm) plot(r.lm, pt3 = FALSE) plot(r.lm, type = \"residuals\")"},{"path":"https://mgimond.github.io/tukeyedar/reference/tukeyedar.html","id":null,"dir":"Reference","previous_headings":"","what":"Tukey inspired exploratory data analysis functions — tukeyedar","title":"Tukey inspired exploratory data analysis functions — tukeyedar","text":"packages hosts small set Tukey inspired functions use exploring datasets robust manner.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat05.html","id":null,"dir":"Reference","previous_headings":"","what":"Temperature normals for 
Waterville Maine (1991-2020) — wat05","text":"NOAA/NCEI derived normal daily temperatures city Waterville, Maine (USA) 1991 2020 period. Units degrees Fahrenheit.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat05.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Temperature normals for Waterville Maine (1991-2020) — wat05","text":"","code":"wat05"},{"path":"https://mgimond.github.io/tukeyedar/reference/wat05.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Temperature normals for Waterville Maine (1991-2020) — wat05","text":"data frame 365 rows 5 variables: date Date centered 1991-2020 period. Note year purely symbolic. doy Day year. min Typical minimum temperature 1991-2020 period. avg Typical average temperature 1991-2020 period. max Typical maximum temperature 1991-2020 period.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat05.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Temperature normals for Waterville Maine (1991-2020) — wat05","text":"https://www.ncei.noaa.gov/","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat95.html","id":null,"dir":"Reference","previous_headings":"","what":"Legacy temperature normals for Waterville Maine (1981-2010) — wat95","title":"Legacy temperature normals for Waterville Maine (1981-2010) — wat95","text":"NOAA/NCEI derived normal daily temperatures city Waterville, Maine (USA) 1981 2010 period. 
Units degrees Fahrenheit.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat95.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Legacy temperature normals for Waterville Maine (1981-2010) — wat95","text":"","code":"wat95"},{"path":"https://mgimond.github.io/tukeyedar/reference/wat95.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Legacy temperature normals for Waterville Maine (1981-2010) — wat95","text":"data frame 365 rows 5 variables: date Date centered 1981-2010 period. Note year purely symbolic. doy Day year min Typical minimum temperature 1981-2010 period. avg Typical average temperature 1981-2010 period. max Typical maximum temperature 1981-2010 period.","code":""},{"path":"https://mgimond.github.io/tukeyedar/reference/wat95.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Legacy temperature normals for Waterville Maine (1981-2010) — wat95","text":"https://www.ncei.noaa.gov/","code":""},{"path":"https://mgimond.github.io/tukeyedar/news/index.html","id":"tukeyedar-021","dir":"Changelog","previous_headings":"","what":"tukeyedar 0.2.1","title":"tukeyedar 0.2.1","text":"Added option plot density distribution alongside Normal fit eda_normfit.","code":""},{"path":"https://mgimond.github.io/tukeyedar/news/index.html","id":"tukeyedar-020","dir":"Changelog","previous_headings":"","what":"tukeyedar 0.2.0","title":"tukeyedar 0.2.0","text":"Added Normal QQ plot option eda_qq. Added symmetrical Normal fit plot function eda_normfit. Updated eda_boxls aesthetics. Updated median polish diagnostic plot aesthetics.","code":""},{"path":"https://mgimond.github.io/tukeyedar/news/index.html","id":"tukeyedar-011","dir":"Changelog","previous_headings":"","what":"tukeyedar 0.1.1","title":"tukeyedar 0.1.1","text":"Introduces median polish function eda_pol. Introduces QQ Tukey mean-difference plot eda_qq. Introduces density plot function eda_dens. 
Adds re-expression parameters eda_lm via parameters px py. Adds sd labels SD dashed lines eda_lm. eda_lm now output lm intercept slope. Adds plot method eda_rline object. eda_re p = 1, box-cox option ignored. Homogenized plot appearances. Added power parameter argument eda_boxls. Added power parameter argument eda_sl. Added plot option eda_sl.","code":""},{"path":"https://mgimond.github.io/tukeyedar/news/index.html","id":"tukeyedar-010","dir":"Changelog","previous_headings":"","what":"tukeyedar 0.1.0","title":"tukeyedar 0.1.0","text":"Initial release tukeyedar","code":""}]
diff --git a/ref.bib b/ref.bib
index 58f1e98..148cfe9 100755
--- a/ref.bib
+++ b/ref.bib
@@ -1,6 +1,6 @@
@Book{applied_eda1981,
title = {Applications, Basics and Computing of Exploratory Data Analysis},
- author = {P.F. Velleman and D.C. Hoaglin},
+ author = {Velleman, P.F. and Hoaglin, D.C.},
publisher = {Duxbury Press},
address = {Boston},
year = {1981}
@@ -8,14 +8,21 @@ @Book{applied_eda1981
@Book{understanding_eda1983,
title = {Understanding robust and exploratory data analysis},
- author = {D.C. Hoaglin, F. Mosteller and J.W. Tukey},
+ author = {Hoaglin, D.C. and Mosteller, F. and Tukey, J.W.},
publisher = {Wiley},
year = {1983}
}
@Book{eda1977,
title = {Exploratory Data Analysis},
- author = {John W. Tukey},
+ author = {Tukey, John W.},
publisher = {Addison-Wesley},
year = {1977}
}
+
+@Book{visdata1993,
+ title = {Visualizing Data},
+ author = {Cleveland, William},
+ publisher = {Hobart Press},
+ year = {1993}
+}