From 89005637fb479729b4fd7c21200bdf6a5476f39a Mon Sep 17 00:00:00 2001 From: Manny Gimond Date: Thu, 28 Oct 2021 19:51:35 -0400 Subject: [PATCH] minor edit to vignette --- .Rbuildignore | 2 + .gitignore | 2 + docs/404.html | 119 ++++-- docs/LICENSE-text.html | 113 +++++- docs/articles/Introduction.html | 184 +++++----- docs/articles/RLine.html | 174 ++++----- .../figure-html/unnamed-chunk-27-1.png | Bin 19046 -> 17712 bytes .../figure-html/unnamed-chunk-30-1.png | Bin 18106 -> 19479 bytes docs/articles/index.html | 119 ++++-- docs/authors.html | 119 ++++-- docs/index.html | 93 +++-- docs/pkgdown.css | 83 ++--- docs/pkgdown.js | 4 +- docs/pkgdown.yml | 6 +- docs/reference/age_height.html | 140 +++++-- docs/reference/eda_3pt.html | 252 +++++++++---- docs/reference/eda_bipow.html | 175 ++++++--- docs/reference/eda_boxls.html | 244 +++++++++---- docs/reference/eda_lm.html | 269 +++++++++----- docs/reference/eda_lsum.html | 184 +++++++--- docs/reference/eda_re.html | 167 ++++++--- docs/reference/eda_rline.html | 343 +++++++++++------- docs/reference/eda_sl.html | 187 +++++++--- docs/reference/eda_trim.html | 260 ++++++++----- docs/reference/eda_unipow.html | 232 ++++++++---- docs/reference/index.html | 194 ++++++++-- docs/reference/neoplasms.html | 140 +++++-- docs/reference/nine_point.html | 140 +++++-- docs/reference/tukeyedar.html | 121 ++++-- vignettes/Introduction.Rmd | 2 +- 30 files changed, 2789 insertions(+), 1279 deletions(-) mode change 100644 => 100755 .gitignore diff --git a/.Rbuildignore b/.Rbuildignore index e9e08d0..d9ad95f 100755 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -9,3 +9,5 @@ Readme_prep.txt ^pkgdown$ ref.bib ^\.github$ +^doc$ +^Meta$ diff --git a/.gitignore b/.gitignore old mode 100644 new mode 100755 index a7abd73..ff81adc --- a/.gitignore +++ b/.gitignore @@ -4,3 +4,5 @@ .Ruserdata inst/doc docs/ +/doc/ +/Meta/ diff --git a/docs/404.html b/docs/404.html index a058b00..c3bf80f 100755 --- a/docs/404.html +++ b/docs/404.html @@ -1,27 +1,66 @@ + 
- - - - + + + + Page not found (404) • tukeyedar - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - + + + + +
-
+
+ -
+ + +
+
-
+
- - + + diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index ed8424e..1cf7419 100755 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -1,12 +1,66 @@ + -License • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
+ +
-
+
+
- + + - diff --git a/docs/articles/Introduction.html b/docs/articles/Introduction.html index 77b2553..b67a598 100755 --- a/docs/articles/Introduction.html +++ b/docs/articles/Introduction.html @@ -19,8 +19,6 @@ - -
+
-
-

Introduction -

+
+

+Introduction

The tukeyedar package houses functions used in Exploratory Data Analysis (EDA). Most functions are inspired by work published by John Tukey, David Hoaglin and Frederick Mosteller (see references at the bottom of this document). Note that this package is in beta mode, so use at your own discretion. Many of the plots generated by these functions are not geared for publication or public dissemination; they are designed to focus the viewer’s attention on the patterns in the data (hence the light-colored axes and missing axis labels in some of the plots).

The functions available in this package are listed below:

@@ -157,12 +161,12 @@

Introduction
-library(tukeyedar)
+library(tukeyedar)

A brief description and examples of each function are presented next.

-
-

eda_boxls -

+
+

+eda_boxls

Usage:

 eda_boxls(dat, x,fac,outlier=TRUE, out.txt, type="l", horiz=FALSE)
@@ -220,9 +224,9 @@

eda_boxls

Note that with both the l and ls options, the boxplots are ordered based on the non-equalized level values.

-
-

Trim family -

+
+

+Trim family

This is a family of trimming functions that trim a vector or dataframe by sorted vector or column values. Note that NA values need to be removed from the to-be-trimmed vector or column elements before running the trim functions.
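As a rough illustration of what symmetric trimming involves, the base R sketch below drops NA values, sorts the vector, and removes the same proportion of values from each tail. The helper name `trim_both` is hypothetical and not part of tukeyedar; the package's own trim functions may round the number of dropped values differently.

```r
# Hypothetical helper (not a tukeyedar function): symmetric trim in base R
trim_both <- function(x, prop = 0.05) {
  x <- sort(x[!is.na(x)])        # remove NAs first, then sort
  k <- floor(length(x) * prop)   # number of values dropped from EACH tail
  if (k == 0) return(x)          # nothing to trim
  x[(k + 1):(length(x) - k)]
}

x <- c(10, 3, NA, 1, 7, 2, 5, 9, 4, 8, 6)
trimmed <- trim_both(x, prop = 0.1)
trimmed
```

With ten non-missing values and `prop = 0.1`, one value is dropped from each end, leaving the values 2 through 9.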

@@ -248,9 +252,9 @@

Trim family -

eda_trim -

+
+

+eda_trim

Usage:

 eda_trim(x, prop=.05, num = 0) 
@@ -260,9 +264,9 @@

eda_trimeda_trim(x, prop = 0.1) #> [1] 2 3 4 5 6 7 8 9

-
-

eda_ltrim -

+
+

+eda_ltrim

Usage:

 eda_ltrim(x, prop=.05, num = 0) 
@@ -276,9 +280,9 @@

eda_ltrimeda_ltrim(x, num = 3) #> [1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

-
-

eda_rtrim -

+
+

+eda_rtrim

Usage:

 eda_rtrim(x, prop=.05, num = 0) 
@@ -292,9 +296,9 @@

eda_rtrimeda_rtrim(x, num = 2) #> [1] 1 2 3 4 5 6 7 8

-
-

eda_trim_df -

+
+

+eda_trim_df

Usage:

 eda_trim_df(dat, x, prop=.05, num = 0) 
@@ -329,9 +333,9 @@

eda_trim_df#> 6 10.8 83 19.7

Note that the output dataframe is sorted on the x column.

-
-

eda_ltrim_df -

+
+

+eda_ltrim_df

Usage:

 eda_ltrim_df(dat, x, prop=.05, num = 0)
@@ -382,9 +386,9 @@

eda_ltrim_df#> 28 17.9 80 58.3 #> 31 20.6 87 77.0

-
-

eda_rtrim_df -

+
+

+eda_rtrim_df

Usage:

 eda_rtrim_df(dat,prop=.05, x, num = 0)
@@ -426,9 +430,9 @@

eda_rtrim_df#> 11 11.3 79 24.2

-
-

eda_re -

+
+

+eda_re

Usage:

 eda_re(x, p = 0, tukey = FALSE)
@@ -468,16 +472,16 @@

eda_re \]

While both transformation techniques will generate similar distributions when the power p is 0 or greater, they will differ in distributions when the power is negative. For example, when re-expressing mtcars$mpg using an inverse power (p = -1), Tukey’s re-expression will change the data order but the Box-Cox transformation will not as shown in the following plots:

-plot(mpg ~ disp, mtcars, main="Original data")
-plot(eda_re(mpg, -1, tukey=TRUE) ~ disp, mtcars, main="Tukey")
-plot(eda_re(mpg, -1, tukey=FALSE) ~ disp, mtcars, main="Box-Cox")
+plot(mpg ~ disp, mtcars, main="Original data") +plot(eda_re(mpg, -1, tukey=TRUE) ~ disp, mtcars, main="Tukey") +plot(eda_re(mpg, -1, tukey=FALSE) ~ disp, mtcars, main="Box-Cox")

The original data shows a negative relationship between mpg and disp. The Tukey re-expression takes the inverse of mpg, which reverses the relationship: the re-expressed mpg variable is positively related to disp (note that simply changing the sign of the re-expressed value, -x^(-1), maintains the nature of the original relationship). The Box-Cox transformation, on the other hand, maintains the negative relationship.

The choice of re-expression will depend on the analysis context. For example, if you want an easily interpretable transformation then opt for the Tukey re-expression. If you want to compare the shape of transformed variables, the Box-Cox approach will be better suited.
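The difference in ordering can be checked with a small base R sketch. The helpers `tukey_re` and `boxcox_re` are hypothetical stand-ins, assuming the Tukey re-expression is a plain power `x^p` and the Box-Cox form is `(x^p - 1) / p`, with the log used when `p` is 0; they are not the package's implementation of `eda_re`.

```r
# Hypothetical stand-ins for the two re-expression families
tukey_re  <- function(x, p) if (p == 0) log(x) else x^p
boxcox_re <- function(x, p) if (p == 0) log(x) else (x^p - 1) / p

x <- c(1, 2, 4)
ord_tukey  <- order(tukey_re(x, -1))    # inverse power reverses the ranking
ord_boxcox <- order(boxcox_re(x, -1))   # Box-Cox keeps the original ranking

# Sign of the mpg vs. disp relationship under each re-expression
r_tukey  <- cor(tukey_re(mtcars$mpg, -1),  mtcars$disp)   # positive
r_boxcox <- cor(boxcox_re(mtcars$mpg, -1), mtcars$disp)   # negative
```

Dividing by a negative `p` is what flips the Box-Cox values back into the original order.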

-
-

eda_lsum -

+
+

+eda_lsum

Usage:

eda_lsum(x, l = 5, all = TRUE)

@@ -510,25 +514,25 @@

eda_lsum#> 3 E 4.5 66.0 151.75 237.5 171.5 #> 4 D 2.5 63.5 159.00 254.5 191.0 #> 5 C 1.5 57.0 178.25 299.5 242.5 -with(lsum, dotchart(x=mid, labels=letter,pch=20, pt.cex=1.5) ) +with(lsum, dotchart(x=mid, labels=letter,pch=20, pt.cex=1.5) )

We can make use of the letter summaries to fine-tune a re-expression power that symmetrizes the data:

-p <- c(-1/2, -1/4, 0, 1/4, 1/2)
-OP <- par(mfrow=c(1,5))
+p <- c(-1/2, -1/4, 0, 1/4, 1/2)
+OP <- par(mfrow=c(1,5))
 for (i in p) {
   lsum <- eda_lsum( eda_re(mtcars$hp,i) ,l=5)
-  with(lsum, dotchart(x=mid, labels=letter, pt.cex=1.5,
-                      pch=20,main=paste("power=",i)) )
+  with(lsum, dotchart(x=mid, labels=letter, pt.cex=1.5,
+                      pch=20,main=paste("power=",i)) )
 }
-par(OP)
+par(OP)

The goal is to find a re-expression that minimizes systematic skew across all letter summary values. A power of -0.25 seems to do a nice job in symmetrizing the distribution.
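For readers curious how the depth, lower, mid, upper and spread columns can be derived, here is a minimal base R sketch using Tukey's depth rule (each new depth is half of the truncated previous depth plus one). The function name `letter_summary` is hypothetical, and the package's `eda_lsum` may differ in details such as labels and options.

```r
# Hypothetical sketch of letter-value summaries (not the eda_lsum source)
letter_summary <- function(x, l = 5) {
  stopifnot(l >= 1, l <= 5)
  x <- sort(x[!is.na(x)])
  depth <- (length(x) + 1) / 2            # depth of the median
  rows <- vector("list", l)
  for (i in seq_len(l)) {
    idx   <- c(floor(depth), ceiling(depth))
    lower <- mean(x[idx])                  # counted in from the low end
    upper <- mean(rev(x)[idx])             # counted in from the high end
    rows[[i]] <- data.frame(depth = depth, lower = lower,
                            mid = (lower + upper) / 2,
                            upper = upper, spread = upper - lower)
    depth <- (floor(depth) + 1) / 2        # next letter value depth
  }
  data.frame(letter = c("M", "H", "E", "D", "C")[seq_len(l)],
             do.call(rbind, rows))
}

lsum_hp <- letter_summary(mtcars$hp)
lsum_hp
```

Applied to `mtcars$hp`, the E, D and C rows of this sketch give mids of 151.75, 159.00 and 178.25. A steady drift in the mid column from one letter to the next is the systematic skew the re-expression is meant to remove.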

-

For a detailed explanation of the letter summaries calculation click here.

+

For a detailed explanation of the letter summaries calculation click here.

-
-

eda_sl -

+
+

+eda_sl

Usage:

 eda_sl(dat, x, y)
@@ -560,16 +564,16 @@

eda_sl

The following example shows that sepal length spread increases with increasing sepal length.

 sl <- eda_sl(iris, Species, Sepal.Length)
-plot(spread ~ level, sl, pch=16)
+plot(spread ~ level, sl, pch=16)

-
-

eda_lm -

+
+

+eda_lm

Usage:

 eda_lm(dat, x, y, x.lab = "X", y.lab = "Y", reg = TRUE, rob=FALSE, loe = FALSE, 
-        lm.col = rgb(1, 0.5, 0.5, 0.8), loe.col = rgb(.73, .73, 1, 1), 
+        lm.col = rgb(1, 0.5, 0.5, 0.8), loe.col = rgb(.73, .73, 1, 1), 
         stats=FALSE,..., plot.d=NULL, loess.d=NULL)

@@ -642,17 +646,17 @@

eda_lm

The colors for the regression line and LOESS curve can be modified by adjusting the lm.col and loe.col parameters as follows:

-eda_lm(dat=cars, x=dist, y=speed, loe=TRUE, lm.col="blue", loe.col=rgb(1,0,0,0.3))
+eda_lm(dat=cars, x=dist, y=speed, loe=TRUE, lm.col="blue", loe.col=rgb(1,0,0,0.3))

Additional parameters can be passed to the plot and the loess.smooth sub-functions via the plot.d and loess.d lists as follows:

-eda_lm(dat=cars, x=dist, y=speed, loe=TRUE, plot.d = list(pch=21, col="red",bg="bisque"), 
-        loess.d=list( span=2/5), reg=FALSE)
+eda_lm(dat=cars, x=dist, y=speed, loe=TRUE, plot.d = list(pch=21, col="red",bg="bisque"), + loess.d=list( span=2/5), reg=FALSE)

-
-

eda_3pt -

+
+

+eda_3pt

Usage:

 eda_3pt(dat, x, x.lab = "X", y.lab = "Y", adj = -.12, dir = TRUE, ...)
@@ -691,22 +695,22 @@

eda_3ptContinuing with this example, we may choose to square the speed. Here, we’ll use the eda_re() function which will apply a Tukey power transformation to the speed values. We’ll also customize the x-axis label by indicating that the x-values are squared.

 pt3b <- eda_3pt(dat=cars, x=eda_re(speed,2), y=dist,
-                x.lab = expression("Speed"^{2}) )
+ x.lab = expression("Speed"^{2}) )

This transformation does a better job at aligning the two half-slopes. The half-slopes ratio (pt3b$hsrtio) is 1.101 which is an improvement over the original half-slopes ratio.

Note that instead of transforming the x-values, we could have transformed the y-values. For example, we could have taken the cube root of braking distance as follows:

 pt3c <- eda_3pt(dat=cars, x=speed, y=eda_re(dist,1/3),
-                y.lab = expression("Distance"^{1/3}) )
+ y.lab = expression("Distance"^{1/3}) )

This seems to be an even bigger improvement over the last re-expression with a half-slopes ratio of 1.043.
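The half-slopes ratio itself is simple to compute by hand. The sketch below is a simplified version under stated assumptions: the hypothetical helper `three_point` splits the sorted data into three batches with `round(n / 3)` points in each outer batch, which may not match how `eda_3pt` assigns points (ties in x, in particular, are not treated specially here).

```r
# Hypothetical three-point summary (simplified; not the eda_3pt source)
three_point <- function(x, y) {
  o <- order(x); x <- x[o]; y <- y[o]
  n <- length(x)
  k <- round(n / 3)                        # assumed size of the outer batches
  left <- 1:k; mid <- (k + 1):(n - k); right <- (n - k + 1):n
  xmed <- c(median(x[left]), median(x[mid]), median(x[right]))
  ymed <- c(median(y[left]), median(y[mid]), median(y[right]))
  slope1 <- (ymed[2] - ymed[1]) / (xmed[2] - xmed[1])   # left half-slope
  slope2 <- (ymed[3] - ymed[2]) / (xmed[3] - xmed[2])   # right half-slope
  list(slope1 = slope1, slope2 = slope2, hsrtio = slope2 / slope1,
       xmed = xmed, ymed = ymed)
}

x <- 1:9
curved   <- three_point(x, x^2)         # convex relationship: ratio above 1
straight <- three_point(x, sqrt(x^2))   # re-expressed y: ratio of 1
```

A ratio close to one indicates the two half-slopes line up, which is the same reading applied to `hsrtio` in the examples above.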

-
-

eda_unipow -

+
+

+eda_unipow

Usage:

-eda_unipow(x, p = c(2, 1, 1/2, 0.33, 0, -0.33, -1/2, -1, -2), bins=5, tukey=TRUE,
+eda_unipow(x, p = c(2, 1, 1/2, 0.33, 0, -0.33, -1/2, -1, -2), bins=5, tukey=TRUE,
          cex.main=1.3, col="#DDDDDD",border="#AAAAAA",
          title="Re-expressed data via ladder of powers", ...)

@@ -755,27 +759,27 @@

eda_unipow#> return an error if a log transformation is chosen.

This is because sunspot.year has three 0 values.

-table(sunspot.year == 0)
+table(sunspot.year == 0)
 #>  
 #>  FALSE  TRUE 
 #>    286     3

These values will generate either -Inf if log transformed or Inf if raised to a negative power (e.g. 0^(-0.33)).
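Both cases are easy to confirm in base R. The short sketch below also shows one way of dropping the zero values before re-expressing; the object names are illustrative only.

```r
# Zero values break the log and negative-power rungs of the ladder
bad <- c(log = log(0), neg_power = 0^(-0.33))
bad                      # -Inf for the log, Inf for the negative power

# One possible remedy: keep only the strictly positive values
sun_pos <- sunspot.year[sunspot.year > 0]
range(log(sun_pos))      # finite once the zeros are removed
```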

To remedy this, we can either remove the problematic values, or adjust the ladder of powers. E.g.,

-eda_unipow(sunspot.year, p = c(2, 1, 1/2, 0.33))
+eda_unipow(sunspot.year, p = c(2, 1, 1/2, 0.33))

Since we have a relatively large dataset, we can increase the number of bins as follows:

-eda_unipow(sunspot.year, p = c(2, 1, 1/2, 0.33), bin=15)
+eda_unipow(sunspot.year, p = c(2, 1, 1/2, 0.33), bin=15)

-

Note that you might need to reset the plotting device by typing dev.off() if you encounter difficulty generating plots after executing the eda_unipow function.

+

Note that you might need to reset the plotting device by typing dev.off() if you encounter difficulty generating plots after executing the eda_unipow function.

-
-

eda_bipow -

+
+

+eda_bipow

Usage:

-eda_bipow(dat, x,y, p = c(3, 2, 1, .5, 0),...)
+eda_bipow(dat, x,y, p = c(3, 2, 1, .5, 0),...)

@@ -816,9 +820,9 @@

eda_bipow

Notice how the medians in the boxplot match the middle summary point (in red) for both the \(x\) and \(y\) values–as expected.

-
-

eda_rline -

+
+

+eda_rline

Usage:

 eda_rline(dat, x, y)
@@ -844,9 +848,9 @@

eda_rlineaccompanying vignette.

-
-

References -

+
+

+References

-

-

Site built with pkgdown 1.9000.9000.9000.

+

Site built with pkgdown 1.6.1.

@@ -885,7 +887,5 @@

References - -
+
@@ -91,9 +95,9 @@

2021-10-28

-
-

The resistant line basics -

+
+

+The resistant line basics

The eda_rline function fits a robust line through a bivariate dataset. It does so by first breaking the data into three roughly equal sized batches following the x-axis variable. It then uses the batches’ median values to compute the slope and intercept.

However, the function doesn’t stop there. After fitting the initial line, the function fits another line (following the aforementioned methodology) to the model’s residuals. If the residual slope is not close to zero, it is added to the original fitted slope, creating an updated model. This iteration is repeated until the residual slope is close to zero or until the residual slope changes in sign (at which point the average of the last two iterated slopes is used in the final fit).
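The iteration can be sketched in a few lines of base R. This is a simplified version under stated assumptions, not the `eda_rline` source: the hypothetical `rline_sketch` uses `round(n / 3)` points in each outer batch, takes the intercept as the median of `y - b * x`, and stops on a tolerance or an iteration cap. It does not implement the sign-change rule described above, so it can oscillate on awkward datasets.

```r
# Hypothetical, simplified resistant line (three-group medians + slope polishing)
rline_sketch <- function(x, y, tol = 1e-6, max_iter = 20) {
  o <- order(x); x <- x[o]; y <- y[o]
  n <- length(x)
  k <- round(n / 3)                         # assumed outer batch size
  left <- seq_len(k); right <- (n - k + 1):n
  xL <- median(x[left]); xR <- median(x[right])
  b <- 0
  for (i in seq_len(max_iter)) {
    r  <- y - b * x                         # residuals from the current slope
    db <- (median(r[right]) - median(r[left])) / (xR - xL)  # residual slope
    b  <- b + db                            # fold it into the running slope
    if (abs(db) < tol) break                # residual slope is close to zero
  }
  a <- median(y - b * x)                    # assumed intercept rule
  list(a = a, b = b, res = y - (a + b * x))
}

x <- 1:9
y <- 2 * x + 1
y_out <- y; y_out[9] <- 100                 # one wild value in the right batch
fit_clean   <- rline_sketch(x, y)
fit_outlier <- rline_sketch(x, y_out)
```

Both fits return a slope of 2 and an intercept of 1: because each batch is summarized by its median, a single extreme value does not move the line.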

An example of the iteration follows using data from Velleman et al.’s book. The dataset, neoplasms, consists of breast cancer mortality rates for regions with varying mean annual temperatures.

@@ -120,9 +124,9 @@

The resistant line basics

where the final slope and intercept are 2.89 and -45.91, respectively.

-
-

Implementing the resistant line -

+
+

+Implementing the resistant line

The eda_rline function takes just three arguments: data frame, x variable and y variable. The function output is a list.

 M <- eda_rline(neoplasms, Temp, Mortality)
@@ -157,76 +161,76 @@ 

Implementing the resistant lineThe elements a and b are the model’s intercept and slope. The vectors x and y are the input values sorted on x. res is a vector of the final residuals sorted on x. xmed and ymed are vectors of the medians for each of the three batches.

You can use the output values to generate the following plot.

-plot(Mortality~Temp, neoplasms, pch=20)
-abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))
+plot(Mortality~Temp, neoplasms, pch=20) +abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))

If you wish to add the median values to all three batches for reference, modify the code as follows:

-plot(Mortality~Temp, neoplasms, pch=20)
-abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))
-points(x=M$xmed, y=M$ymed, col=rgb(1,0,0,0.5), pch=20,cex=2)
+plot(Mortality~Temp, neoplasms, pch=20) +abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5)) +points(x=M$xmed, y=M$ymed, col=rgb(1,0,0,0.5), pch=20,cex=2)

To see how this resistant line compares to an ordinary least-squares (OLS) regression slope, type:

-plot(Mortality~Temp, neoplasms, pch=20)
-abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))
-abline(lm(Mortality~Temp, neoplasms),lty=2) # Regression model
+plot(Mortality~Temp, neoplasms, pch=20) +abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5)) +abline(lm(Mortality~Temp, neoplasms),lty=2) # Regression model

The regression model computes a slope of 2.36 whereas the resistant line function generates a slope of 2.89. From the scatter plot, we can spot a point that may have undue influence on the regression line (this point is highlighted in green in the following plot).

-plot(Mortality~Temp, neoplasms, pch=20)
-abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))
-abline(lm(Mortality~Temp, neoplasms),lty=2) # Regression model
-points(neoplasms[15,], col="#43CD80",cex=2 ,pch=20)
+plot(Mortality~Temp, neoplasms, pch=20) +abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5)) +abline(lm(Mortality~Temp, neoplasms),lty=2) # Regression model +points(neoplasms[15,], col="#43CD80",cex=2 ,pch=20)

Removing that point from the data generates an OLS regression line more in line with our resistant model. The point of interest is the 15th record in the neoplasms data frame.

 neoplasms.sub <- neoplasms[-15,]
-plot(Mortality~Temp, neoplasms, pch=20)
-abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))
-abline(lm(Mortality~Temp, neoplasms.sub),lty=2) # Regression model with data subset
-points(neoplasms[15,], col="#43CD80",cex=2 ,pch=20)
+plot(Mortality~Temp, neoplasms, pch=20) +abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5)) +abline(lm(Mortality~Temp, neoplasms.sub),lty=2) # Regression model with data subset +points(neoplasms[15,], col="#43CD80",cex=2 ,pch=20)

-
-

Other examples -

-
-

Nine point data -

+
+

+Other examples

+
+

+Nine point data

The nine_point dataset is used by Hoaglin et al. (p. 139) to test the resistant line function’s ability to stabilize wild oscillations in the computed slopes across iterations.

 M <- eda_rline(nine_point, X,Y)
-plot(Y~X, nine_point, pch=20)
-abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))
+plot(Y~X, nine_point, pch=20) +abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))

Here, the slope and intercept are 0.067 and 0.133 respectively, matching the 1/15 and 2/15 values computed by Hoaglin et al.

-
-

Age vs. height data -

+
+

+Age vs. height data

age_height is another dataset pulled from Hoaglin et al. (p. 135). It gives the ages and heights of children from a private urban school.

 M <- eda_rline(age_height, Months,Height)
-plot(Height ~ Months, age_height, pch=20)
-abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))
+plot(Height ~ Months, age_height, pch=20) +abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))

Here, the slope and intercept are 0.429 and 91.007 respectively, close to the 0.426 slope and 90.366 intercept computed by Hoaglin et al. on page 137.

-
-

Not all relationships are linear! -

+
+

+Not all relationships are linear!

It’s important to remember that the resistant line technique is only valid if the bivariate relationship is linear. Here, we’ll step through the example highlighted by Velleman et al. (p. 138) using the R built-in mtcars dataset.

First, we’ll fit the resistant line to the data.

 M <- eda_rline(mtcars, disp, mpg)
-plot(mpg ~ disp, mtcars, pch=20)
-abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))
+plot(mpg ~ disp, mtcars, pch=20) +abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))

It’s important to note that just because a resistant line can be fit does not necessarily imply that the relationship is linear. To assess linearity of the mtcars dataset, we’ll make use of the eda_3pt function (see the accompanying vignette for details on interpreting the 3-point summary function).

@@ -240,80 +244,80 @@ 

Not all relationships are linear!
 eda_3pt(mtcars, disp^(-1/3), 1/mpg, 
         y.lab = "gal/mi",
-        x.lab = expression("Displacement"^{-1/3}))

+ x.lab = expression("Displacement"^{-1/3}))

Now that we have identified re-expressions that linearise the relationship, we can fit the resistant line. (Note that the grey line generated by the eda_3pt function is not the same as the resistant line generated with eda_rline.)

 M <- eda_rline(mtcars, disp^(-1/3), 1/mpg)
-plot(1/mpg ~ eval(disp^(-1/3)), mtcars, pch=20,
+plot(1/mpg ~ eval(disp^(-1/3)), mtcars, pch=20,
      ylab = "gal/mi",
-     xlab = expression("Displacement"^{-1/3}))
-abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))
+ xlab = expression("Displacement"^{-1/3})) +abline(a=M$a, b=M$b, col=rgb(1,0,0,0.5))

-
-

Computing a confidence interval -

+
+

+Computing a confidence interval

Confidence intervals for the coefficients can be estimated using bootstrapping techniques. There are two approaches: resampling residuals and resampling x-y cases.

-
-

Resampling the model residuals -

+
+

+Resampling the model residuals

Here, we fit the resistant line then extract its residuals. We then re-run the model many times by replacing the original y values with the modeled y values plus the resampled residuals to generate the confidence intervals.

 n  <- 599 # Set number of iterations
 M  <- eda_rline(neoplasms, Temp, Mortality) # Fit the resistant line
-bt <- array(0, dim=c(n, 2)) # Create empty bootstrap array
+bt <- array(0, dim=c(n, 2)) # Create empty bootstrap array
 for(i in 1:n){ #bootstrap loop
-  df.bt <- data.frame(x=M$x, y = M$a + M$b * M$x + sample(M$res,replace=TRUE)) # modeled y + resampled residuals
+  df.bt <- data.frame(x=M$x, y = M$a + M$b * M$x + sample(M$res,replace=TRUE)) # modeled y + resampled residuals
   bt[i,1] <- eda_rline(df.bt,x,y)$a
   bt[i,2] <- eda_rline(df.bt,x,y)$b
 }

Now plot the distributions,

-hist(bt[,1], main="Intercept distribution")
-hist(bt[,2], main="Slope distribution")
+hist(bt[,1], main="Intercept distribution") +hist(bt[,2], main="Slope distribution")

and tabulate the 90% confidence interval (the 5th and 95th percentiles of the bootstrap distributions).

-conf <- t(data.frame(Intercept = quantile(bt[,1], p=c(0.05,0.95) ),
-                     Slope = quantile(bt[,2], p=c(0.05,0.95) )))
+conf <- t(data.frame(Intercept = quantile(bt[,1], p=c(0.05,0.95) ),
+                     Slope = quantile(bt[,2], p=c(0.05,0.95) )))
 conf
-#>                   5%       95%
-#>  Intercept -76.85235 11.875164
-#>  Slope       1.64119  3.579736
+#> 5% 95% +#> Intercept -77.083618 11.905761 +#> Slope 1.674236 3.565211
-
-

Resampling the x-y paired values -

+
+

+Resampling the x-y paired values

Here, we resample the x-y paired values (with replacement) then compute the resistant line each time.

 n  <- 599 # Set number of iterations
-bt <- array(0, dim=c(n, 2)) # Create empty bootstrap array
+bt <- array(0, dim=c(n, 2)) # Create empty bootstrap array
 for(i in 1:n){ #bootstrap loop
-  recs <- sample(1:nrow(neoplasms), replace = TRUE)
+  recs <- sample(1:nrow(neoplasms), replace = TRUE)
   df.bt <- neoplasms[recs,]
   bt[i,1]=eda_rline(df.bt,Temp,Mortality)$a
   bt[i,2]=eda_rline(df.bt,Temp,Mortality)$b
 }

Now plot the distributions,

-hist(bt[,1], main="Intercept distribution")
-hist(bt[,2], main="Slope distribution")
+hist(bt[,1], main="Intercept distribution") +hist(bt[,2], main="Slope distribution")

and tabulate the 90% confidence interval (the 5th and 95th percentiles of the bootstrap distributions).

-conf <- t(data.frame(Intercept = quantile(bt[,1], p=c(0.05,0.95) ),
-                     Slope = quantile(bt[,2], p=c(0.05,0.95) )))
+conf <- t(data.frame(Intercept = quantile(bt[,1], p=c(0.05,0.95) ),
+                     Slope = quantile(bt[,2], p=c(0.05,0.95) )))
 conf
 #>                     5%       95%
-#>  Intercept -108.368421 10.972644
-#>  Slope        1.648445  4.157895
+#> Intercept -108.668421 15.031034 +#> Slope 1.643678 4.157895
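A percentile bootstrap interval leaves equal tail area on each side, so the 5th and 95th percentiles bound a 90% interval while a 95% interval needs the 2.5th and 97.5th percentiles. The sketch below wraps that arithmetic in a hypothetical helper, `pct_ci`; the `bt_demo` matrix is filled with synthetic placeholder draws purely so the example runs on its own, and in practice you would pass in the `bt` array built by the bootstrap loop.

```r
# Hypothetical helper: percentile interval for each column of a bootstrap matrix
pct_ci <- function(bt, level = 0.95) {
  alpha <- (1 - level) / 2                      # tail area on each side
  t(apply(bt, 2, quantile, probs = c(alpha, 1 - alpha)))
}

# Synthetic placeholder draws (NOT results from the neoplasms data)
set.seed(1)
bt_demo <- cbind(Intercept = rnorm(599, mean = -45, sd = 20),
                 Slope     = rnorm(599, mean = 2.9, sd = 0.5))

ci90 <- pct_ci(bt_demo, level = 0.90)   # 5% and 95% columns
ci95 <- pct_ci(bt_demo, level = 0.95)   # 2.5% and 97.5% columns
```

The 95% interval is always at least as wide as the 90% interval computed from the same draws.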
-
-

References -

+
+

+References

-

-

Site built with pkgdown 1.9000.9000.9000.

+

Site built with pkgdown 1.6.1.

@@ -350,7 +352,5 @@

ReferencesArticles • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
+
+
Introduction to EDA functions
+
+
Resistant Line
+
+
+
-
+
+
- + + - diff --git a/docs/authors.html b/docs/authors.html index a67a611..a9b9f03 100755 --- a/docs/authors.html +++ b/docs/authors.html @@ -1,12 +1,66 @@ + -Citation and Authors • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +

text

@@ -69,31 +138,35 @@

Citation

Authors

- -
  • -

    Manuel Gimond. Author, maintainer. +

      +
    • +

      Manuel Gimond. Author, maintainer.

    • -
+ + +
-
+
+

- + + - diff --git a/docs/index.html b/docs/index.html index f90bd8e..9fd601f 100755 --- a/docs/index.html +++ b/docs/index.html @@ -19,8 +19,6 @@ - -

Parameter
@@ -149,36 +153,36 @@
-
-

Installation -

+
+

+Installation

This package can be installed from github (the installation process makes use of the devtools package).

-devtools::install_github("mgimond/tukeyedar")
+devtools::install_github("mgimond/tukeyedar")

Note that the vignettes will not be automatically generated with the above command; note too that the vignettes are available on this website (see next section). If you want a local version of the vignettes, add the build_vignettes = TRUE parameter.

-devtools::install_github("mgimond/tukeyedar", build_vignettes = TRUE)
+devtools::install_github("mgimond/tukeyedar", build_vignettes = TRUE)

The vignette will require that dplyr be installed since the eda_sl function relies on it. If dplyr is not already installed, the aforementioned syntax will automatically install it for you.

If for some reason the vignettes are not created, you might want to re-install the package with the force=TRUE parameter.

-devtools::install_github("mgimond/tukeyedar", build_vignettes = TRUE, force=TRUE)
+devtools::install_github("mgimond/tukeyedar", build_vignettes = TRUE, force=TRUE)
-
-

Read the vignettes! -

+
+

+Read the vignettes!

It’s strongly recommended that you read the vignettes. These can be accessed from this website:

If you chose to have the vignettes locally created when you installed the package, then you can view them locally via vignette("Introduction", package = "tukeyedar") and vignette("RLine", package = "tukeyedar"). If you use a dark themed IDE, the vignettes may not render very well so you might opt to view them in a web browser via the functions RShowDoc("Introduction", package = "tukeyedar") and RShowDoc("RLine", package = "tukeyedar").

-
-

Using the functions -

+
+

+Using the functions

All functions start with eda_. For example, to generate a three point summary plot of the mpg vs. disp from the mtcars dataset, type:

-library(tukeyedar)
+library(tukeyedar)
 eda_3pt(mtcars, disp, mpg)

#> $slope1
@@ -201,8 +205,8 @@ 

Using the functionsmtcars |> eda_3pt(disp, mpg) # Using magrittr (or any of the tidyverse packages) -library(magrittr) -mtcars %>% eda_3pt(disp, mpg)

+library(magrittr) +mtcars %>% eda_3pt(disp, mpg)

@@ -222,56 +226,51 @@

Using the functions -
-

License

+

License

- -
-

Citation

+

Citation

-
-

Developers

+

Developers

    -
  • Manuel Gimond
    Author, maintainer
  • +
  • Manuel Gimond
    Author, maintainer
-
-

Dev status

+
+

Dev status

    -
  • R-CMD-check
  • +
  • R-CMD-check
- -
+

-

-

Site built with pkgdown 1.9000.9000.9000.

+

Site built with pkgdown 1.6.1.

@@ -280,7 +279,5 @@

Dev status

- - diff --git a/docs/pkgdown.css b/docs/pkgdown.css index 80ea5b8..1273238 100755 --- a/docs/pkgdown.css +++ b/docs/pkgdown.css @@ -56,10 +56,8 @@ img.icon { float: right; } -/* Ensure in-page images don't run outside their container */ -.contents img { +img { max-width: 100%; - height: auto; } /* Fix bug in bootstrap (only seen in firefox) */ @@ -80,10 +78,11 @@ dd { /* Section anchors ---------------------------------*/ a.anchor { - display: none; - margin-left: 5px; - width: 20px; - height: 20px; + margin-left: -30px; + display:inline-block; + width: 30px; + height: 30px; + visibility: hidden; background-image: url(./link.svg); background-repeat: no-repeat; @@ -91,15 +90,17 @@ a.anchor { background-position: center center; } -h1:hover .anchor, -h2:hover .anchor, -h3:hover .anchor, -h4:hover .anchor, -h5:hover .anchor, -h6:hover .anchor { - display: inline-block; +.hasAnchor:hover a.anchor { + visibility: visible; +} + +@media (max-width: 767px) { + .hasAnchor:hover a.anchor { + visibility: hidden; + } } + /* Fixes for fixed navbar --------------------------*/ .contents h1, .contents h2, .contents h3, .contents h4 { @@ -263,26 +264,31 @@ table { /* Syntax highlighting ---------------------------------------------------- */ -pre, code, pre code { +pre { + word-wrap: normal; + word-break: normal; + border: 1px solid #eee; +} + +pre, code { background-color: #f8f8f8; color: #333; } -pre, pre code { - white-space: pre-wrap; - word-break: break-all; - overflow-wrap: break-word; -} -pre { - border: 1px solid #eee; +pre code { + overflow: auto; + word-wrap: normal; + white-space: pre; } -pre .img, pre .r-plt { +pre .img { margin: 5px 0; } -pre .img img, pre .r-plt img { +pre .img img { background-color: #fff; + display: block; + height: auto; } code a, pre a { @@ -299,8 +305,9 @@ a.sourceLine:hover { .kw {color: #264D66;} /* keyword */ .co {color: #888888;} /* comment */ -.error {font-weight: bolder;} -.warning {font-weight: bolder;} +.message { color: black; 
font-weight: bolder;} +.error { color: orange; font-weight: bolder;} +.warning { color: #6A0366; font-weight: bolder;} /* Clipboard --------------------------*/ @@ -358,27 +365,3 @@ mark { content: ""; } } - -/* Section anchors --------------------------------- - Added in pandoc 2.11: https://github.com/jgm/pandoc-templates/commit/9904bf71 -*/ - -div.csl-bib-body { } -div.csl-entry { - clear: both; -} -.hanging-indent div.csl-entry { - margin-left:2em; - text-indent:-2em; -} -div.csl-left-margin { - min-width:2em; - float:left; -} -div.csl-right-inline { - margin-left:2em; - padding-left:1em; -} -div.csl-indent { - margin-left: 2em; -} diff --git a/docs/pkgdown.js b/docs/pkgdown.js index 6f0eee4..7e7048f 100755 --- a/docs/pkgdown.js +++ b/docs/pkgdown.js @@ -80,7 +80,7 @@ $(document).ready(function() { var copyButton = ""; - $("div.sourceCode").addClass("hasCopyButton"); + $(".examples, div.sourceCode").addClass("hasCopyButton"); // Insert copy buttons: $(copyButton).prependTo(".hasCopyButton"); @@ -91,7 +91,7 @@ // Initialize clipboard: var clipboardBtnCopies = new ClipboardJS('[data-clipboard-copy]', { text: function(trigger) { - return trigger.parentNode.textContent.replace(/\n#>[^\n]*/g, ""); + return trigger.parentNode.textContent; } }); diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index cb3d1d5..aa4053c 100755 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -1,8 +1,8 @@ pandoc: 2.11.4 -pkgdown: 1.9000.9000.9000 -pkgdown_sha: e5d7021afd269a0252dd427db7148b734af962fa +pkgdown: 1.6.1 +pkgdown_sha: ~ articles: Introduction: Introduction.html RLine: RLine.html -last_built: 2021-10-28T18:59Z +last_built: 2021-10-28T23:43Z diff --git a/docs/reference/age_height.html b/docs/reference/age_height.html index 8b1e4fe..7e32f0b 100755 --- a/docs/reference/age_height.html +++ b/docs/reference/age_height.html @@ -1,15 +1,70 @@ + -Age vs. height for private and rural school children — age_height • tukeyedar + + + + + +Age vs. 
height for private and rural school children — age_height • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -68,48 +138,48 @@

Age vs. height for private and rural school children

height and weight from urban private and rural public schools.

-
age_height
+
age_height
-
-
-

Format

-

A data frame with 18 rows and 2 variables:

Months
-

Child's age in months

-
Height
-

Child's height in cm

+

Format

+

A data frame with 18 rows and 2 variables:

+
Months

Child's age in months

+
Height

Child's height in cm

... -
-
-

Source

+ + +

Source

+

Understanding robust and exploratory data analysis, by D.C. Hoaglin, F. Mosteller and J.W. Tukey. (page 135)

-
-
+ +
-
+
+
- + + - diff --git a/docs/reference/eda_3pt.html b/docs/reference/eda_3pt.html index fe7ee84..219adaa 100755 --- a/docs/reference/eda_3pt.html +++ b/docs/reference/eda_3pt.html @@ -1,5 +1,46 @@ + -3-point summary plot — eda_3pt • tukeyedar + + + + + +3-point summary plot — eda_3pt • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -80,93 +150,119 @@

3-point summary plot

for both X and Y values on the ladder of powers.

-
eda_3pt(
-  dat,
-  x,
-  y,
-  x.lab = NULL,
-  y.lab = NULL,
-  adj = -0.12,
-  dir = TRUE,
-  pch = 20,
-  col = "grey40",
-  ...
-)
- -
-
-

Arguments

-
dat
-

data frame

-
x
-

column name assigned the x axis

-
y
-

column name assigned the y axis

-
x.lab
-

X label for output plot

-
y.lab
-

Y label for output plot

-
adj
-

Adjustment parameter for y label

-
dir
-

boolean indicating if suggested ladder of power direction should -be displayed

-
pch
-

Plot point size as a fraction (can be larger than 1.0)

-
col
-

Plot point color

-
...
-

other parameters passed to the graphics::plot function.

-
-
-

Details

+
eda_3pt(
+  dat,
+  x,
+  y,
+  x.lab = NULL,
+  y.lab = NULL,
+  adj = -0.12,
+  dir = TRUE,
+  pch = 20,
+  col = "grey40",
+  ...
+)
+ +

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
dat

data frame

x

column name assigned the x axis

y

column name assigned the y axis

x.lab

X label for output plot

y.lab

Y label for output plot

adj

Adjustment parameter for y label

dir

boolean indicating if suggested ladder of power direction should +be displayed

pch

Plot point size as a fraction (can be larger than 1.0)

col

Plot point color

...

other parameters passed to the graphics::plot function.

+ +

Details

+

Outputs a plot showing the three point summary as well as a list of -parameters:

  • hsrtio: The ratio between both slopes. A value close to one +parameters:

      +
    • hsrtio: The ratio between both slopes. A value close to one suggests that no transformation is needed.

    • xmed: The x-coordinate values for the three summary points.

    • ymed: The y-coordinate values for the three summary points.

    • -
-
-

References

+ + +

References

+ -
  • Applications, Basics and Computing of Exploratory Data Analysis, +

      +
    • Applications, Basics and Computing of Exploratory Data Analysis, by P.F. Velleman and D.C. Hoaglin

    • Understanding robust and exploratory data analysis, by D.C. Hoaglin, F. Mosteller and J.W. Tukey

    • Exploratory Data Analysis, by John Tukey

    • -
-
+ -
-

Examples

-

-hsratio <- eda_3pt(cars, speed, dist)
-
-hsratio <- eda_3pt(cars, speed, dist^(1/3), y.lab=expression("Dist"^{1/3}), adj=-0.1)
-
-
-
+ +

Examples

+

+hsratio <- eda_3pt(cars, speed, dist)
+
+hsratio <- eda_3pt(cars, speed, dist^(1/3), y.lab=expression("Dist"^{1/3}), adj=-0.1)
+
+
+ +
-
+
+
- + + - diff --git a/docs/reference/eda_bipow.html b/docs/reference/eda_bipow.html index e0fff76..d044199 100755 --- a/docs/reference/eda_bipow.html +++ b/docs/reference/eda_bipow.html @@ -1,13 +1,68 @@ + -Ladder of powers transformation on bivariate data with three-point summary plot — eda_bipow • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -64,56 +134,71 @@

Ladder of powers transformation on bivariate data with three-point summary p Requires eda_3pt() function.

-
eda_bipow(x, y, dat, p = c(3, 2, 1, 0.5, 0), tukey = TRUE, ...)
- -
-
-

Arguments

-
x
-

column name assigned the x axis

-
y
-

column name assigned the y axis

-
dat
-

data frame

-
p
-

vector of powers

-
tukey
-

if set to TRUE then adopt Tukey's power transformation, if FALSE, -adopt Box-Cox transformation technique

-
...
-

other parameters passed to the graphics::plot function.

-
-
-

References

+
eda_bipow(x, y, dat, p = c(3, 2, 1, 0.5, 0), tukey = TRUE, ...)
+ +

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
x

column name assigned the x axis

y

column name assigned the y axis

dat

data frame

p

vector of powers

tukey

if set to TRUE then adopt Tukey's power transformation, if FALSE, +adopt Box-Cox transformation technique

...

other parameters passed to the graphics::plot function.

+ +

References

+

Applications, Basics and Computing of Exploratory Data Analysis, by P.F. Velleman and D.C. Hoaglin Understanding robust and exploratory data analysis, by D.C. Hoaglin, F. Mosteller and J.W. Tukey Exploratory Data Analysis, by John Tukey

-
-
+ +
-
+
+
- + + - diff --git a/docs/reference/eda_boxls.html b/docs/reference/eda_boxls.html index 8a39c4e..3870555 100755 --- a/docs/reference/eda_boxls.html +++ b/docs/reference/eda_boxls.html @@ -1,13 +1,68 @@ + -Create boxplots equalized by level and spread — eda_boxls • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -64,83 +134,101 @@

Create boxplots equalized by level and spread

option to equalize levels and/or spreads.

-
eda_boxls(
-  dat,
-  x,
-  fac,
-  outlier = TRUE,
-  out.txt,
-  type = "l",
-  horiz = FALSE,
-  outliers = TRUE
-)
- -
-
-

Arguments

-
dat
-

Data frame name

-
x
-

Column name assigned to the values

-
fac
-

Column name assigned to the factor the values are to be -conditioned on

-
outlier
-

Boolean indicating if outliers should be plotted

-
out.txt
-

Column whose values are to be used to label outliers

-
type
-

Plot type. "none" = no equalization ; "l" = equalize by level; -"ls" = equalize by both level and spread

-
horiz
-

plot horizontally (TRUE) or vertically (FALSE)

-
outliers
-

plot outliers (TRUE) or not (FALSE)

-
-
- -
-

Examples

-

-# A basic boxplot (no equalization)
-eda_boxls(mtcars,mpg, cyl, type="none", out.txt=mpg )
-
-
-# Boxplots equalized by level
-eda_boxls(mtcars,mpg, cyl, type="l", out.txt=mpg )
-
-
-# Boxplots equalized by level and spread
-eda_boxls(mtcars,mpg, cyl, type="ls", out.txt=mpg )
-
-
-# Hide outlier
-eda_boxls(mtcars,mpg, cyl, type="ls", out.txt=mpg , outlier=FALSE)
-
-
-
-
+
eda_boxls(
+  dat,
+  x,
+  fac,
+  outlier = TRUE,
+  out.txt,
+  type = "l",
+  horiz = FALSE,
+  outliers = TRUE
+)
+ +

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
dat

Data frame name

x

Column name assigned to the values

fac

Column name assigned to the factor the values are to be +conditioned on

outlier

Boolean indicating if outliers should be plotted

out.txt

Column whose values are to be used to label outliers

type

Plot type. "none" = no equalization ; "l" = equalize by level; +"ls" = equalize by both level and spread

horiz

plot horizontally (TRUE) or vertically (FALSE)

outliers

plot outliers (TRUE) or not (FALSE)

+ + +

Examples

+

+# A basic boxplot (no equalization)
+eda_boxls(mtcars,mpg, cyl, type="none", out.txt=mpg )
+
+
+# Boxplots equalized by level
+eda_boxls(mtcars,mpg, cyl, type="l", out.txt=mpg )
+
+
+# Boxplots equalized by level and spread
+eda_boxls(mtcars,mpg, cyl, type="ls", out.txt=mpg )
+
+
+# Hide outlier
+eda_boxls(mtcars,mpg, cyl, type="ls", out.txt=mpg , outlier=FALSE)
+
+
+
+ +
-
+
+
- + + - diff --git a/docs/reference/eda_lm.html b/docs/reference/eda_lm.html index 6dbcea6..00a4247 100755 --- a/docs/reference/eda_lm.html +++ b/docs/reference/eda_lm.html @@ -1,15 +1,70 @@ + -Least Squares regression plot (with optional LOESS line) — eda_lm • tukeyedar + + + + + +Least Squares regression plot (with optional LOESS line) — eda_lm • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -68,92 +138,119 @@

Least Squares regression plot (with optional LOESS line)

standard deviations match axes unit length.

-
eda_lm(
-  dat,
-  x,
-  y,
-  x.lab = NULL,
-  y.lab = NULL,
-  reg = TRUE,
-  loe = FALSE,
-  lm.col = rgb(1, 0.5, 0.5, 0.8),
-  loe.col = rgb(0.73, 0.73, 1, 1),
-  stats = FALSE,
-  plot.d = list(pch = 20, col = "grey40"),
-  ...,
-  loess.d = NULL
-)
- -
-
-

Arguments

-
dat
-

data frame

-
x
-

column name assigned the x axis

-
y
-

column name assigned the y axis

-
x.lab
-

X label for output plot

-
y.lab
-

Y label for output plot

-
reg
-

boolean indicating whether a least squares regression line -should be plotted

-
loe
-

boolean indicating if a loess curve should be fitted

-
lm.col
-

regression line color

-
loe.col
-

LOESS curve color

-
stats
-

boolean indicating if regression summary statistics should be -displayed

-
plot.d
-

Additional parameters passed to the plot function

-
...
-

not used

-
loess.d
-

Additional parameters passed to the loess.smooth function

-
-
-

See also

-

plot and loess.smooth functions

-
-
+
eda_lm(
+  dat,
+  x,
+  y,
+  x.lab = NULL,
+  y.lab = NULL,
+  reg = TRUE,
+  loe = FALSE,
+  lm.col = rgb(1, 0.5, 0.5, 0.8),
+  loe.col = rgb(0.73, 0.73, 1, 1),
+  stats = FALSE,
+  plot.d = list(pch = 20, col = "grey40"),
+  ...,
+  loess.d = NULL
+)
-
-

Examples

-

-eda_lm(mtcars, wt, mpg, plot.d = list(pch=16, col="black"), loe=TRUE,
-      loess.d=list(family = "symmetric", span=0.5, degree=2))
-
-
-eda_lm(mtcars, wt, mpg, plot.d = list(pch=16, col="blue"), loe=TRUE)
-
-
-
+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
dat

data frame

x

column name assigned the x axis

y

column name assigned the y axis

x.lab

X label for output plot

y.lab

Y label for output plot

reg

boolean indicating whether a least squares regression line +should be plotted

loe

boolean indicating if a loess curve should be fitted

lm.col

regression line color

loe.col

LOESS curve color

stats

boolean indicating if regression summary statistics should be +displayed

plot.d

Additional parameters passed to the plot function

...

not used

loess.d

Additional parameters passed to the loess.smooth function

+ +

See also

+ +

plot and loess.smooth functions

+ +

Examples

+

+eda_lm(mtcars, wt, mpg, plot.d = list(pch=16, col="black"), loe=TRUE,
+      loess.d=list(family = "symmetric", span=0.5, degree=2))
+
+
+eda_lm(mtcars, wt, mpg, plot.d = list(pch=16, col="blue"), loe=TRUE)
+
+
+ +
-
+
+
- + + - diff --git a/docs/reference/eda_lsum.html b/docs/reference/eda_lsum.html index ecb2ff0..e942511 100755 --- a/docs/reference/eda_lsum.html +++ b/docs/reference/eda_lsum.html @@ -1,15 +1,70 @@ + -Tukey's letter value summaries — eda_lsum • tukeyedar + + + + + +Tukey's letter value summaries — eda_lsum • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -68,63 +138,69 @@

Tukey's letter value summaries

fourth (quartiles)

-
eda_lsum(x, l = 5, all = TRUE)
- -
-
-

Arguments

-
x
-

Vector

-
l
-

Number of levels

-
all
-

generate upper, lower and mid summaries if TRUE or -just generate mid summaries if FALSE

-
-
-

Details

+
eda_lsum(x, l = 5, all = TRUE)
+ +

Arguments

+ + + + + + + + + + + + + + +
x

Vector

l

Number of levels

all

generate upper, lower and mid summaries if TRUE or +just generate mid summaries if FALSE

+ +

Details

+

Visit this [link](https://mgimond.github.io/ES218/Week08b.html) for more information on the letter values summaries.
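As a rough illustration of how letter-value depths and summaries can be derived, here is a minimal base R sketch. The function name `letter_values_sketch` and its internals are assumptions made for illustration, not the package's implementation. It assumes the usual rule that each depth equals (floor(previous depth) + 1) / 2, and it labels only the first five letters.

```r
# Hypothetical helper (not part of tukeyedar): letter-value summaries in base R.
letter_values_sketch <- function(x, l = 5) {
  stopifnot(l >= 1, l <= 5)          # only M, H, E, D, C are labelled here
  x <- sort(x)
  n <- length(x)

  # Depth of the median, then each subsequent depth from the previous one
  depth <- numeric(l)
  depth[1] <- (n + 1) / 2
  for (i in seq_len(l)[-1]) {
    depth[i] <- (floor(depth[i - 1]) + 1) / 2
  }

  # Value at a (possibly half-integer) depth: average the two bracketing values
  at_depth <- function(v, d) mean(v[c(floor(d), ceiling(d))])

  lower <- sapply(depth, function(d) at_depth(x, d))       # counted from the bottom
  upper <- sapply(depth, function(d) at_depth(rev(x), d))  # counted from the top

  data.frame(
    letter = c("M", "H", "E", "D", "C")[seq_len(l)],
    depth  = depth,
    lower  = lower,
    mid    = (lower + upper) / 2,
    upper  = upper,
    spread = upper - lower
  )
}

x <- c(22, 8, 11, 3, 26, 1, 14, 18, 20, 25, 24)
letter_values_sketch(x)
```

For this vector the sketch gives depths 6, 3.5, 2, 1.5 and 1, which agrees with the `eda_lsum(x)` output listed under Examples.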

-
-
-

References

+

References

+

Exploratory Data Analysis, John Tukey, 1973.

-
-
-
-

Examples

-
x <- c(22, 8, 11, 3, 26, 1, 14, 18, 20, 25, 24)
-eda_lsum(x)
-#>   letter depth lower   mid upper spread
-#> 1      M   6.0  18.0 18.00  18.0    0.0
-#> 2      H   3.5   9.5 16.25  23.0   13.5
-#> 3      E   2.0   3.0 14.00  25.0   22.0
-#> 4      D   1.5   2.0 13.75  25.5   23.5
-#> 5      C   1.0   1.0 13.50  26.0   25.0
-
-
+

Examples

+
x <- c(22, 8, 11, 3, 26, 1, 14, 18, 20, 25, 24)
+eda_lsum(x)
+#>   letter depth lower   mid upper spread
+#> 1      M   6.0  18.0 18.00  18.0    0.0
+#> 2      H   3.5   9.5 16.25  23.0   13.5
+#> 3      E   2.0   3.0 14.00  25.0   22.0
+#> 4      D   1.5   2.0 13.75  25.5   23.5
+#> 5      C   1.0   1.0 13.50  26.0   25.0
+
+ +
-
+
+
- + + - diff --git a/docs/reference/eda_re.html b/docs/reference/eda_re.html index 00568ff..b7a93f7 100755 --- a/docs/reference/eda_re.html +++ b/docs/reference/eda_re.html @@ -1,12 +1,67 @@ + -Re-expression function — eda_re • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -62,55 +132,62 @@

Re-expression function

eda_re re-expresses a vector following the Tukey or box-cox transformation.

-
eda_re(x, p = 0, tukey = TRUE)
- -
-
-

Arguments

-
x
-

Vector

-
p
-

Power transformation

-
tukey
-

If set to TRUE then adopt Tukey's power transformation, if FALSE, -adopt Box-Cox transformation technique

-
-
-

Details

+
eda_re(x, p = 0, tukey = TRUE)
+ +

Arguments

+ + + + + + + + + + + + + + +
x

Vector

p

Power transformation

tukey

If set to TRUE then adopt Tukey's power transformation, if FALSE, +adopt Box-Cox transformation technique

+ +

Details

+

The `eda_re` function is used to re-express data using one of two transformation techniques: Box-Cox transformation (tukey=FALSE) or Tukey's power transformation (tukey=TRUE).
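To make the difference between the two families concrete, here is a minimal base R sketch. The name `re_express_sketch` is hypothetical and this is not the package source. It assumes the Tukey form is the plain power `x^p`, the Box-Cox form is the textbook `(x^p - 1) / p`, and both reduce to the natural log when `p = 0`.

```r
# Hypothetical helper (not the tukeyedar source): power re-expression in base R.
re_express_sketch <- function(x, p = 0, tukey = TRUE) {
  if (p == 0) {
    return(log(x))        # both families are assumed to reduce to the log at p = 0
  }
  if (tukey) {
    x^p                   # plain power (assumed Tukey form)
  } else {
    (x^p - 1) / p         # textbook Box-Cox form (assumed here)
  }
}

x <- c(15, 28, 17, 73, 8, 83, 2)
re_express_sketch(x, p = -1/3)                  # Tukey form
re_express_sketch(x, p = -1/3, tukey = FALSE)   # Box-Cox form
```

One practical difference under these assumptions: with a negative power the plain power form reverses the ordering of the values, whereas the Box-Cox form preserves it.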

-
-
-
-

Examples

-
x <- c(15, 28, 17, 73,  8, 83,  2)
-eda_re(x, p=-1/3)
-#> [1] 0.4054801 0.3293169 0.3889111 0.2392723 0.5000000 0.2292489 0.7937005
-
-
+

Examples

+
x <- c(15, 28, 17, 73,  8, 83,  2)
+eda_re(x, p=-1/3)
+#> [1] 0.4054801 0.3293169 0.3889111 0.2392723 0.5000000 0.2292489 0.7937005
+
+ +
-
+
+
- + + - diff --git a/docs/reference/eda_rline.html b/docs/reference/eda_rline.html index c54fd69..86a8b09 100755 --- a/docs/reference/eda_rline.html +++ b/docs/reference/eda_rline.html @@ -1,14 +1,69 @@ + -Tukey's resistant line — eda_rline • tukeyedar + + + + + +Tukey's resistant line — eda_rline • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -66,22 +136,32 @@

Tukey's resistant line

and Exploratory Data Analysis" (Wiley, 1983).

-
eda_rline(dat, x, y)
- -
-
-

Arguments

-
dat
-

data frame

-
x
-

column name assigned the x axis

-
y
-

column name assigned the y axis

-
-
-

Value

+
eda_rline(dat, x, y)
+ +

Arguments

+ + + + + + + + + + + + + + +
dat

data frame

x

column name assigned the x axis

y

column name assigned the y axis

+ +

Value

+

Outputs a plot showing the three point summary as well as a list of -parameters:

  • a: Intercept

  • +parameters:

    + +
      +
    • a: Intercept

    • b: Slope

    • res: Residuals sorted on x-values order

    • x: Sorted x values

    • @@ -90,129 +170,132 @@

      Value

    • ymed: Median y values for each third

    • index: Index of sorted x values defining the upper boundary of each third
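The relationship between `x`, `y`, `index`, `xmed` and `ymed` can be sketched in base R as follows. This is an illustration only, not the package's algorithm: it takes the sorted values and third boundaries printed in the Examples output as given and recomputes the three summary points, plus a first-pass slope from the two outer points.

```r
# Sorted x and y values and third boundaries, copied from the eda_rline example output
x <- c(31.8, 34.0, 40.2, 42.1, 42.3, 43.5, 44.2, 45.1, 46.3, 47.3, 47.8,
       48.5, 49.2, 49.9, 50.0, 51.3)
y <- c(67.3, 52.5, 68.1, 84.6, 65.1, 72.2, 81.7, 89.2, 78.9, 88.6, 95.0,
       87.0, 95.9, 104.5, 100.4, 102.5)
index <- c(5, 11, 16)

# Assign each sorted observation to the left, middle, or right third
third <- cut(seq_along(x), breaks = c(0, index), labels = c("L", "M", "R"))

# Median x and y within each third (the three summary points)
xmed <- as.numeric(tapply(x, third, median))
ymed <- as.numeric(tapply(y, third, median))

# First-pass slope from the left and right summary points
b0 <- (ymed[3] - ymed[1]) / (xmed[3] - xmed[1])
```

The recomputed medians agree with the `$xmed` and `$ymed` values in the example. The first-pass slope (about 3.41) is larger than the reported `$b` (about 2.89), presumably because the fit is refined iteratively on the residuals, as in the standard resistant-line procedure.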

    • -
-
-

Details

+ + +

Details

+

Bits and pieces of the RLIN.F FORTRAN code in Velleman et al.'s book were -used in helping implement some of the subroutines.
+used in helping implement some of the subroutines.
Note that this function has only been tested with a subset of datasets. It is far from being fully vetted. So use with caution!

-
-
-

References

+

References

+ -
  • Applications, Basics and Computing of Exploratory Data Analysis, +

      +
    • Applications, Basics and Computing of Exploratory Data Analysis, by P.F. Velleman and D.C. Hoaglin (available at http://dspace.library.cornell.edu/handle/1813/78)

    • Understanding robust and exploratory data analysis, by D.C. Hoaglin, F. Mosteller and J.W. Tukey

    • -
-
+ -
-

Examples

-

-# This first test is with breast cancer data from "ABC's of EDA" page 127.
-# The final model should look like:  Y = -46.19 + 2.89X
-
-r.lm <- eda_rline(neoplasms, Temp, Mortality)
-r.lm
-#> $b
-#> [1] 2.890173
-#> 
-#> $a
-#> [1] -45.90578
-#> 
-#> $res
-#>  [1]  21.2982659   0.1398844  -2.1791908   8.8294798 -11.2485549  -7.6167630
-#>  [7]  -0.1398844   4.7589595  -9.0092486  -2.1994220   2.7554913  -7.2676301
-#> [13]  -0.3907514   6.1861272   1.7971098   0.1398844
-#> 
-#> $x
-#>  [1] 31.8 34.0 40.2 42.1 42.3 43.5 44.2 45.1 46.3 47.3 47.8 48.5 49.2 49.9 50.0
-#> [16] 51.3
-#> 
-#> $y
-#>  [1]  67.3  52.5  68.1  84.6  65.1  72.2  81.7  89.2  78.9  88.6  95.0  87.0
-#> [13]  95.9 104.5 100.4 102.5
-#> 
-#> $xmed
-#> [1] 40.2 45.7 49.9
-#> 
-#> $ymed
-#> [1]  67.30  85.15 100.40
-#> 
-#> $index
-#> [1]  5 11 16
-#> 
-
-# Check output
-OP <- par( mfrow = c(2,1))
-  plot(Mortality ~ Temp, neoplasms)
-  mtext(sprintf("y = %f + (%f)x", r.lm$a, r.lm$b ))
-  abline(a = r.lm$a, b = r.lm$b, col="red")
-  abline( lm(Mortality ~ Temp, neoplasms), col="grey", lty=3)
-  points(cbind(r.lm$xmed,r.lm$ymed), pch =16, col="red")
-  abline(v= r.lm$x[r.lm$index],lty=3)
-  plot(r.lm$res ~ r.lm$x)
-  abline( h = 0, lty=3)
-
-par(OP)
-
-# This next example compares children height to age
-r.lm    <- eda_rline(age_height, Months, Height)
-
-OP <- par( mfrow = c(2,1))
- plot(Height ~ Months, age_height, xlab="Age (months)", ylab="Height (cm)")
- mtext(sprintf("y = %f + (%f)x", r.lm$a, r.lm$b ))
- abline(a = r.lm$a, b = r.lm$b, col="red")
- abline( lm(Height ~ Months, age_height), col="grey", lty=3)
- points(cbind(r.lm$xmed,r.lm$ymed), pch =16, col="red")
- abline(v= r.lm$x[r.lm$index],lty=3)
- plot(r.lm$res ~ r.lm$x)
- abline( h = 0, lty=3)
-
-par(OP)
-
-# Andrew Siegel's pathological 9-point data set
-
-r.lm <- eda_rline(nine_point, X, Y)
-
-OP <- par( mfrow = c(2,1))
-plot(Y ~ X, nine_point, xlab="X", ylab="Y")
-   mtext(sprintf("y = %f + (%f)x", r.lm$a, r.lm$b ))
-   abline(a = r.lm$a, b = r.lm$b, col="red")
-   abline( lm(Y ~ X, nine_point), col="grey", lty=3)
-   points(cbind(r.lm$xmed,r.lm$ymed), pch =16, col="red")
-   abline(v= r.lm$x[r.lm$index],lty=3)
-   plot(r.lm$res ~ r.lm$x)
-   abline( h = 0, lty=3)
-
-par(OP)
-
-
-
+ +

Examples

+

+# This first test is with breast cancer data from "ABC's of EDA" page 127.
+# The final model should look like:  Y = -46.19 + 2.89X
+
+r.lm <- eda_rline(neoplasms, Temp, Mortality)
+r.lm
+#> $b
+#> [1] 2.890173
+#> 
+#> $a
+#> [1] -45.90578
+#> 
+#> $res
+#>  [1]  21.2982659   0.1398844  -2.1791908   8.8294798 -11.2485549  -7.6167630
+#>  [7]  -0.1398844   4.7589595  -9.0092486  -2.1994220   2.7554913  -7.2676301
+#> [13]  -0.3907514   6.1861272   1.7971098   0.1398844
+#> 
+#> $x
+#>  [1] 31.8 34.0 40.2 42.1 42.3 43.5 44.2 45.1 46.3 47.3 47.8 48.5 49.2 49.9 50.0
+#> [16] 51.3
+#> 
+#> $y
+#>  [1]  67.3  52.5  68.1  84.6  65.1  72.2  81.7  89.2  78.9  88.6  95.0  87.0
+#> [13]  95.9 104.5 100.4 102.5
+#> 
+#> $xmed
+#> [1] 40.2 45.7 49.9
+#> 
+#> $ymed
+#> [1]  67.30  85.15 100.40
+#> 
+#> $index
+#> [1]  5 11 16
+#> 
+
+# Check output
+OP <- par( mfrow = c(2,1))
+  plot(Mortality ~ Temp, neoplasms)
+  mtext(sprintf("y = %f + (%f)x", r.lm$a, r.lm$b ))
+  abline(a = r.lm$a, b = r.lm$b, col="red")
+  abline( lm(Mortality ~ Temp, neoplasms), col="grey", lty=3)
+  points(cbind(r.lm$xmed,r.lm$ymed), pch =16, col="red")
+  abline(v= r.lm$x[r.lm$index],lty=3)
+  plot(r.lm$res ~ r.lm$x)
+  abline( h = 0, lty=3)
+
+par(OP)
+
+# This next example compares children height to age
+r.lm    <- eda_rline(age_height, Months, Height)
+
+OP <- par( mfrow = c(2,1))
+ plot(Height ~ Months, age_height, xlab="Age (months)", ylab="Height (cm)")
+ mtext(sprintf("y = %f + (%f)x", r.lm$a, r.lm$b ))
+ abline(a = r.lm$a, b = r.lm$b, col="red")
+ abline( lm(Height ~ Months, age_height), col="grey", lty=3)
+ points(cbind(r.lm$xmed,r.lm$ymed), pch =16, col="red")
+ abline(v= r.lm$x[r.lm$index],lty=3)
+ plot(r.lm$res ~ r.lm$x)
+ abline( h = 0, lty=3)
+
+par(OP)
+
+# Andrew Siegel's pathological 9-point data set
+
+r.lm <- eda_rline(nine_point, X, Y)
+
+OP <- par( mfrow = c(2,1))
+plot(Y ~ X, nine_point, xlab="X", ylab="Y")
+   mtext(sprintf("y = %f + (%f)x", r.lm$a, r.lm$b ))
+   abline(a = r.lm$a, b = r.lm$b, col="red")
+   abline( lm(Y ~ X, nine_point), col="grey", lty=3)
+   points(cbind(r.lm$xmed,r.lm$ymed), pch =16, col="red")
+   abline(v= r.lm$x[r.lm$index],lty=3)
+   plot(r.lm$res ~ r.lm$x)
+   abline( h = 0, lty=3)
+
+par(OP)
+
+
+ +
-
+
+ - + + - diff --git a/docs/reference/eda_sl.html b/docs/reference/eda_sl.html index 1b3d0c7..9b19f08 100755 --- a/docs/reference/eda_sl.html +++ b/docs/reference/eda_sl.html @@ -1,13 +1,68 @@ + -Tukey's spread-level function — eda_sl • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -64,65 +134,76 @@

Tukey's spread-level function

table from a univariate dataset

-
eda_sl(dat, x, y, sprd = "frth")
- -
-
-

Arguments

-
dat
-

Dataframe

-
x
-

Categorical variable

-
y
-

Continuous variable

-
sprd
-

Choice of spreads. Either interquartile, `sprd = "IQR"` or -fourth-spread, `sprd = "frth"` (default).

-
-
-

Details

+
eda_sl(dat, x, y, sprd = "frth")
+ +

Arguments

+ + + + + + + + + + + + + + + + + + +
dat

Dataframe

x

Categorical variable

y

Continuous variable

sprd

Choice of spreads. Either interquartile, `sprd = "IQR"` or +fourth-spread, `sprd = "frth"` (default).

+ +

Details

+ -
  • Note that this function is not to be confused with Bill Cleveland's - spread-location function.

  • +
      +
    • Note that this function is not to be confused with Bill Cleveland's + spread-location function.

    • If x is not categorical, the output will produce many or all NA's.

    • On page 59, Hoaglin et al. define the fourth-spread as the range defined by the upper fourth and lower fourth. The `eda_lsum` function is used to compute the upper/lower fourths.

    • -
-
-

References

+ + +

References

+

Understanding Robust and Exploratory Data Analysis, Hoaglin, David C., Frederick Mosteller, and John W. Tukey, 1983.

-
-
-
-

Examples

-
sl <- eda_sl(iris, Species, Sepal.Length)
-plot(spread ~ level, sl, pch=16)
-
-
-
+

Examples

+
sl <- eda_sl(iris, Species, Sepal.Length)
+plot(spread ~ level, sl, pch=16)
+
+
+ +
-
+
+ - + + - diff --git a/docs/reference/eda_trim.html b/docs/reference/eda_trim.html index 9251c8e..07f52a4 100755 --- a/docs/reference/eda_trim.html +++ b/docs/reference/eda_trim.html @@ -1,5 +1,46 @@ + -Trims vector and dataframe objects — eda_trim • tukeyedar + + + + + +Trims vector and dataframe objects — eda_trim • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +

Removes records from either tail-ends of a sorted dataset. Trimming can be performed by number of records (specify the num = option) or by - quantiles (specify the prop= option).

eda_trim trims a vector
eda_trim_df trims a data frame
eda_ltrim left-trims a vector
eda_rtrim right-trims a vector
eda_ltrim_df left-trims a dataframe
eda_rtrim_df right-trims a dataframe

+ quantiles (specify the prop= option).

+ eda_trim trims a vector
+ eda_trim_df trims a data frame
+ eda_ltrim left-trims a vector
+ eda_rtrim right-trims a vector
+ eda_ltrim_df left-trims a dataframe
+ eda_rtrim_df right-trims a dataframe

-
eda_trim(x, prop = 0.05, num = 0)
+    
eda_trim(x, prop = 0.05, num = 0)
 
-eda_trim_df(dat, x, prop = 0.05, num = 0)
+eda_trim_df(dat, x, prop = 0.05, num = 0)
 
-eda_ltrim(x, prop = 0.05, num = 0)
+eda_ltrim(x, prop = 0.05, num = 0)
 
-eda_ltrim_df(dat, x, prop = 0.05, num = 0)
+eda_ltrim_df(dat, x, prop = 0.05, num = 0)
 
-eda_rtrim(x, prop = 0.05, num = 0)
+eda_rtrim(x, prop = 0.05, num = 0)
 
-eda_rtrim_df(dat, x, prop = 0.05, num = 0)
+eda_rtrim_df(dat, x, prop = 0.05, num = 0) -
-
-

Arguments

-
x
-

= vector of values (if trimming a vector) or the column whose +

Arguments

+ + + + + + + + + + + + + + + + + + +
x

= vector of values (if trimming a vector) or the column whose values are used to trim a dataframe (applies to *_df -functions only)

-
prop
-

= fraction of values to trim

-
num
-

= number of values to trim

-
dat
-

= dataframe (applies to *_df functions only)

- -
-

Details

+functions only)

prop

= fraction of values to trim

num

= number of values to trim

dat

= dataframe (applies to *_df functions only)

+ +

Details

+ -
  • The input dataset does not need to be sorted (sorting is performed in the +

      +
    • The input dataset does not need to be sorted (sorting is performed in the functions).

    • If num is set to zero, then the function will assume that the trimming is to be done by fraction (defined by the prop parameter).

    • @@ -112,69 +198,71 @@

      Details

    • NA values must be stripped from the to-be-trimmed vector or column elements before running the trim functions.

    • Elements are returned sorted on the trimmed element.
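The two trimming modes described above can be sketched in base R as follows. The function name `trim_sketch` is hypothetical and the details are assumptions, not the package code: trimming by `prop` is taken to keep values lying between the `prop` and `1 - prop` quantiles (using R's default quantile type, bounds inclusive), and trimming by `num` is taken to drop `num` records from each tail.

```r
# Hypothetical helper (not the tukeyedar source): two-sided trimming of a vector.
trim_sketch <- function(x, prop = 0.05, num = 0) {
  x <- sort(x)                               # input does not need to be pre-sorted
  if (num == 0) {
    # Trim by fraction: keep values between the lower and upper quantiles
    q <- quantile(x, c(prop, 1 - prop))
    x[x >= q[1] & x <= q[2]]
  } else {
    # Trim by count: drop `num` records from each end
    x[(num + 1):(length(x) - num)]
  }
}

trim_sketch(mtcars$mpg, prop = 0.1)   # trim by fraction
trim_sketch(mtcars$mpg, num = 2)      # trim by count
```

Under these assumptions, trimming `mtcars$mpg` with `prop = 0.1` keeps 24 values ranging from 14.7 to 27.3, in line with the `eda_trim` example. As noted in the list, `NA` values would need to be removed first, since the sketch does not handle them.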

    • -
-
+ -
-

Examples

-

-# Trim a vector by 10% (i.e. 10% of the smallest and 10% of the largest
-# values)
-eda_trim( mtcars[,1], prop=0.1)
-#>  [1] 14.7 15.0 15.2 15.2 15.5 15.8 16.4 17.3 17.8 18.1 18.7 19.2 19.2 19.7 21.0
-#> [16] 21.0 21.4 21.4 21.5 22.8 22.8 24.4 26.0 27.3
-
-# Trim a data frame by 10% using the mpg column(i.e. 10% of the smallest
-# and 10% of the largest mpg values)
-eda_trim_df( mtcars, mpg, prop=0.1)
-#>                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
-#> Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
-#> Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
-#> Merc 450SLC       15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
-#> AMC Javelin       15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
-#> Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
-#> Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
-#> Merc 450SE        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
-#> Merc 450SL        17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
-#> Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
-#> Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
-#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
-#> Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
-#> Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
-#> Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
-#> Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
-#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
-#> Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
-#> Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
-#> Toyota Corona     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
-#> Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
-#> Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
-#> Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
-#> Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
-#> Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
-
-
+ +

Examples

+

+# Trim a vector by 10% (i.e. 10% of the smallest and 10% of the largest
+# values)
+eda_trim( mtcars[,1], prop=0.1)
+#>  [1] 14.7 15.0 15.2 15.2 15.5 15.8 16.4 17.3 17.8 18.1 18.7 19.2 19.2 19.7 21.0
+#> [16] 21.0 21.4 21.4 21.5 22.8 22.8 24.4 26.0 27.3
+
+# Trim a data frame by 10% using the mpg column(i.e. 10% of the smallest
+# and 10% of the largest mpg values)
+eda_trim_df( mtcars, mpg, prop=0.1)
+#>                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
+#> Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
+#> Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
+#> Merc 450SLC       15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
+#> AMC Javelin       15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
+#> Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
+#> Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
+#> Merc 450SE        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
+#> Merc 450SL        17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
+#> Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
+#> Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
+#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
+#> Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
+#> Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
+#> Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
+#> Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
+#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
+#> Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
+#> Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
+#> Toyota Corona     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
+#> Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
+#> Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
+#> Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
+#> Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
+#> Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
+
+ +
-
+
+
- + + - diff --git a/docs/reference/eda_unipow.html b/docs/reference/eda_unipow.html index 4e2f1eb..2137438 100755 --- a/docs/reference/eda_unipow.html +++ b/docs/reference/eda_unipow.html @@ -1,14 +1,69 @@ + -Ladder of powers transformation on a single vector — eda_unipow • tukeyedar + + + + + +Ladder of powers transformation on a single vector — eda_unipow • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -66,80 +136,98 @@

Ladder of powers transformation on a single vector

transformation is used in computing the re-expressed values.

-
eda_unipow(
-  x,
-  p = c(2, 1, 1/2, 0.33, 0, -0.33, -1/2, -1, -2),
-  tukey = TRUE,
-  bins = 5,
-  cex.main = 1.3,
-  col = "#DDDDDD",
-  border = "#AAAAAA",
-  title = "Re-expressed data via ladder of powers",
-  ...
-)
- -
-
-

Arguments

-
x
-

vector

-
p
-

vector of powers

-
tukey
-

if TRUE (default), apply Tukey's power transformation, if FALSE -adopt Box-Cox transformation.

-
bins
-

number of bins in the histogram

-
cex.main
-

histogram title size (assigned to each histogram plot)

-
col
-

histogram fill color

-
border
-

histogram border color

-
title
-

Overall plot title (set to NULL for no title)

-
...
-

other parameters passed to the graphics::hist function.

-
-
-

Details

+
eda_unipow(
+  x,
+  p = c(2, 1, 1/2, 0.33, 0, -0.33, -1/2, -1, -2),
+  tukey = TRUE,
+  bins = 5,
+  cex.main = 1.3,
+  col = "#DDDDDD",
+  border = "#AAAAAA",
+  title = "Re-expressed data via ladder of powers",
+  ...
+)
+ +

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
x

vector

p

vector of powers

tukey

if TRUE (default), apply Tukey's power transformation, if FALSE +adopt Box-Cox transformation.

bins

number of bins in the histogram

cex.main

histogram title size (assigned to each histogram plot)

col

histogram fill color

border

histogram border color

title

Overall plot title (set to NULL for no title)

...

other parameters passed to the graphics::hist function.

+ +

Details

+

The output is a lattice of descriptive plots showing the transformed data - across different powers.

-
-
-

References

+ across different powers.

+

References

+

Exploratory Data Analysis, by John Tukey

-
-
-
-

Examples

-
data(mtcars)
-eda_unipow(mtcars$mpg, bins=6)
-
-
-
+

Examples

+
data(mtcars)
+eda_unipow(mtcars$mpg, bins=6)
+
+
+ +
-
+
+ - + + - diff --git a/docs/reference/index.html b/docs/reference/index.html index 372f0eb..74bd171 100755 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -1,12 +1,66 @@ + -Function reference • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
- + +
-

All functions

+ + + + + + + + + + + - + + + + + + + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + -
+

All functions

+

age_height

Age vs. height for private and rural school children

+

eda_3pt()

3-point summary plot

+

eda_bipow()

Ladder of powers transformation on bivariate data with three-point summary plot

+

eda_boxls()

Create boxplots equalized by level and spread

+

eda_lm()

Least Squares regression plot (with optional LOESS line)

+

eda_lsum()

Tukey's letter value summaries

+

eda_re()

Re-expression function

+

eda_rline()

Tukey's resistant line

+

eda_sl()

Tukey's spread-level function

+

eda_trim() eda_trim_df() eda_ltrim() eda_ltrim_df() eda_rtrim() eda_rtrim_df()

Trims vector and dataframe objects

+

eda_unipow()

Ladder of powers transformation on a single vector

+

neoplasms

Breast cancer mortality vs. temperature

+

nine_point

Andrew Siegel's pathological 9-point dataset

+

tukeyedar

Tukey inspired exploratory data analysis functions

+
+
+ +
-
+
+
- + + - diff --git a/docs/reference/neoplasms.html b/docs/reference/neoplasms.html index bcd5068..e206a73 100755 --- a/docs/reference/neoplasms.html +++ b/docs/reference/neoplasms.html @@ -1,13 +1,68 @@ + -Breast cancer mortality vs. temperature — neoplasms • tukeyedar + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + +
-
-
+ + +
@@ -64,48 +134,48 @@

Breast cancer mortality vs. temperature

and breast cancer mortality rate.

-
neoplasms
+
neoplasms
-
-
-

Format

-

A data frame with 16 rows and 2 variables:

Temp
-

Temperature in degrees Fahrenheit.

-
Mortality
-

Mortality rate presented as an index.

+

Format

+

A data frame with 16 rows and 2 variables:

+
Temp

Temperature in degrees Fahrenheit.

+
Mortality

Mortality rate presented as an index.

... -
-
-

Source

+ + +

Source

+

Applications, Basics and Computing of Exploratory Data Analysis, P.F. Velleman and D.C. Hoaglin, 1981. (page 127)

-
-
+ +
-
+
diff --git a/docs/reference/nine_point.html b/docs/reference/nine_point.html
index 8514518..2d1840b 100755
--- a/docs/reference/nine_point.html
+++ b/docs/reference/nine_point.html
@@ -1,14 +1,69 @@
-Andrew Siegel's pathological 9-point dataset — nine_point • tukeyedar
+Andrew Siegel's pathological 9-point dataset — nine_point • tukeyedar
-
-
+ + +
@@ -66,50 +136,50 @@

Andrew Siegel's pathological 9-point dataset

book.

-
nine_point
+
nine_point
-
-
-

Format

-

A data frame with 9 rows and 2 variables:

X
-

X values

-
Y
-

Y values

+

Format

+

A data frame with 9 rows and 2 variables:

+
X

X values

+
Y

Y values

... -
-
-

Source

+ + +

Source

+

Robust regression using repeated medians, Andrew F. Siegel, Biometrika, vol 69, n 1, 1982.

Understanding robust and exploratory data analysis, by D.C. Hoaglin, F. Mosteller and J.W. Tukey. 1983 (page 139)

-
-
+ +
-
+
diff --git a/docs/reference/tukeyedar.html b/docs/reference/tukeyedar.html
index a32be16..fa89688 100755
--- a/docs/reference/tukeyedar.html
+++ b/docs/reference/tukeyedar.html
@@ -1,13 +1,68 @@
-Tukey inspired exploratory data analysis functions — tukeyedar • tukeyedar
-
-
+ + +
@@ -65,30 +135,33 @@

Tukey inspired exploratory data analysis functions

-
-
+
+ +
diff --git a/vignettes/Introduction.Rmd b/vignettes/Introduction.Rmd
index 6b9f968..b582a45 100755
--- a/vignettes/Introduction.Rmd
+++ b/vignettes/Introduction.Rmd
@@ -9,8 +9,8 @@ output:
     css: style.css
 vignette: >
   %\VignetteIndexEntry{Introduction to EDA functions}
-  %\VignetteEncoding{UTF-8}
   %\VignetteEngine{knitr::rmarkdown}
+  \usepackage[utf8]{inputenc}
 editor_options:
   chunk_output_type: console
 ---