Skip to content

R functions to compute bias-adjusted treatment effect when selection on unobservables is proportional to selection on observables

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

dbasu-umass/ovbias

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ovbias

The goal of the ovbias package is to present some functions to compute quantiles of the empirical distribution of the bias-adjusted treatment effect in a linear econometric model with omitted variable bias. To analyze such models, one can consider three regression models: (a) a short regression model where the outcome variable is regressed on the treatment variable, with or without additional controls; (b) an intermediate regression model where additional control variables are added to the short regression; (c) a hypothetical long regression model, where and index of the omitted variable(s) is added to the intermediate regressions. The selection on unobservables is captured by the parameter delta; and the R-squared in the hypothetical long regression is captured by the parameter Rmax. In addition, we need to consider an auxiliary regression where the treatment variable is regressed on all observed (and included) control variables.

This package provides two functions to compute quantiles of the empirical distribution of the bias-adjusted treatment effect, bset1() and bset2().

To use the bset1() function, the user will need to run the short, intermediate and auxiliary regressions, collect relevant parameters and supply them to bset()`. The relevant parameters are:

  • beta0: treatment effect in short regression
  • R0: R-squared in short regression
  • betatilde: treatment effect in intermediate regression
  • Rtilde: R-squared in intermediate regression
  • taux: variance of residual in auxiliary regression
  • sigmay: standard deviation of the outcome variable
  • sigmax: standard deviation of the treatment variable.

To use the bset2() function, the user will need to run supply a data frame (with all relevant variables), name of the outcome variable, name of the treatment variable, and formulas for the short, intermediate and auxiliary regressions. The function will run the short, intermediate and auxiliary regressions, and collect the relevant parameters internally.

In both functions, the relevant parameters are used to construct a cubic equation in the omitted variable bias. The cubic equation is solved on a NxN grid over a bounded box of the (delta,Rmax) plane, and the roots of the cubic are used to construct the bias-adjusted treatment effect. The bias-adjusted treatment effect is, by definition, the treatment effect estimated from the intermediate regression, betatilde, and the real root of the cubic equation.

The bounded box of the (delta,Rmax) plane is defined by the Cartesian products of two intervals: [deltalow, deltahigh], and [Rtilde, Rmax]. We consider 4 different regimes: (1) low delta, low Rmax, where deltalow < delta < 1 and Rtilde < Rmax < 1.30 Rtilde; (2) low delta, high Rmax, where deltalow < delta < 1 and Rtilde < Rmax < 2.47 Rtilde; (3) high delta, low Rmax, where 1 < delta < deltahigh and Rtilde < Rmax < 1.30 Rtilde; (4) high delta, high Rmax, where 1 < delta < deltahigh and Rtilde < Rmax < 2.47 Rtilde.

For each regime, the bounded box is divided into two parts: (a) URR region, over which the cubic equation has a unique real root, and (b) NURR region, over which the cubic equation has three real roots. For the NURR region, the root whose distribution is closest to the distribution of the root over the URR region is chosen. If the URR region is empty, unique real roots of the cubic cannot be computed. Hence, the bias-adjusted treatment effect cannot be computed.

The output of the bset1() function is a list of two elements. The first element is a data frame of the parameters about the three regressions supplied by the user; the second element of the list is a matrix containing the empirical distribution of the bias-adjusted treatment effect for the four regimes.

The output of the bset2() function is a list of two elements. The first element is a data frame of the parameters extracted from the three regressions; the second element of the list is a matrix containing the empirical distribution of the bias-adjusted treatment effect for the four regimes.

For more details see Basu, D. (2021). “Bounding Sets for Treatment Effects with Proportional Selection”. Economics Department Working Paper Series. 307. University of Massachusetts Amhers. URL: https://scholarworks.umass.edu/econ_workingpaper/307. For latest version of the paper use URL: https://drive.google.com/file/d/1v-IGh9_pKqAu7ALucNGTWJXBx0A0STxJ/view?usp=sharing

Installation

You can install ovbias like so:

# install.packages("devtools")
devtools::install_github("dbasu-umass/ovbias")

Example

Creating a Simulated Data Set

Let us create a simulated data set.

library(ovbias)

# Parameters
a0 <- 0.5
a1 <- 0.25
a2 <- 0.75
a3 <- 1.0
a4 <- -0.75 
a5 <- -0.25

# Regressors
set.seed(999)
x1 <- rnorm(100, mean=0, sd=1)
x2 <- rnorm(100, mean=1, sd=2)
x3 <- rnorm(100, mean=2, sd=3)
x4 <- rnorm(100, mean=3, sd=4)
x5 <- rep(c(0,1),50)

# Outcome variable
y <- a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*x5

# Create data frame
d1 <- data.frame(y=y,x1=x1,x2=x2,x3=x3,x4=x4,x5=x5)

The researcher is interested in estimating the effect of the treatment variable, x1, on the outcome variable, y. Suppose the researcher is unable to collect data on the following two variables x4 and x5. When the researcher estimates a linear model, the treatment effect is estimated with bias (because of the two omitted variables).

Using the bset2() function

We would like to use the bset2() function to find bounding sets of the bias-adjusted treatment effect. We will use deltalow=0.01 and deltahigh=5.0 in our call of the bset2() function. We have to supply the data frame, the names of the outcome and treatment variables, and formulas for the short, intermediate and auxiliary regressions.

# Generate results
bset2(
  data = d1,
  outcome = "y",
  treatment = "x1",
  shortreg = y ~ x1,
  intreg = y ~ x1 + x2 + x3,
  auxreg = x1 ~ x2 + x3, 
  deltalow = 0.01, 
  deltahigh=5.00
  )
#> $bsetpar
#>    beta0 R0 betatilde Rtilde sigmay sigmax  taux
#> 1 -0.055  0      0.84  0.559  4.325  0.943 0.802
#> 
#> $bsetqnt
#>            Area    2.5%      5%     50%    95%  97.5%
#> LL (URR)  0.740   0.842   0.844   0.883  0.988  1.005
#> LL (NURR) 0.250   0.859   0.865   0.969  1.079  1.092
#> LH (URR)  0.734   0.844   0.847   0.945  1.223  1.269
#> LH (NURR) 0.256   0.868   0.884   1.169  1.474  1.507
#> HL (URR)  0.080  -9.319  -9.300  -9.179 -9.116 -9.112
#> HL (NURR) 0.920 -12.544 -12.124  -9.762 -9.183 -9.155
#> HH (URR)  0.344  -9.700  -9.642  -9.265 -9.095 -9.083
#> HH (NURR) 0.656 -12.595 -12.274 -10.059 -9.234 -9.181

To see the relevant parameters that emerged from the short, intermediate and auxiliary regressions, look at the first element of the list, $bsetpar, and recall:

  • beta0: treatment effect in short regression
  • R0: R-squared in short regression
  • betatilde: treatment effect in intermediate regression
  • Rtilde: R-squared in intermediate regression
  • taux: variance of residual in auxiliary regression
  • sigmay: standard deviation of the outcome variable
  • sigmax: standard deviation of the treatment variable.

To see the quantiles of the empirical distribution of the bias-adjusted treatment effect look at the second element of the list, $bsetqnt. The rownames of this matrix is informative about the region used for constructing the bias-adjusted treatment effect. For example, the LL (URR) row refers to a low delta, low Rmax regime, where the roots have been computed over the URR area. Aa another example, note that the LH (NURR) row refers to a low delta, high Rmax regime, where the roots have been computed over the NURR area. Recall that for the NURR area, there are three real roots and we choose the root whose empirical distribution is closest to the empirical distribution of the root computed over the URR area (using the same regime).

Using the bset1() function

We would like to use the bset1() function to compute quantiles of the bias-adjusted treatment effect. We will use deltalow=0.01 and deltahigh=5.0 in our call of the bset1() function - to compare our results with the use of the bset2() function above. We have to run the short, intermediate and auxiliary regressions, collect the relevant paramters and supply it to bset1().

Let us run the short regression and collect the relevant parameters.

sreg <- stats::lm(y~x1,data=d1)
beta0 <- sreg$coefficients["x1"]
R0 <- summary(sreg)$r.squared

Let us run the intermediate regression and collect the relevant parameters.

ireg <- stats::lm(y~x1+x2+x3,data=d1)
betatilde <- ireg$coefficients["x1"]
Rtilde <- summary(ireg)$r.squared

Let us run the auxiliary regression and store the variance of the residuals.

auxreg <- stats::lm(x1~x2+x3,data=d1)
taux <- var(auxreg$residuals, na.rm = TRUE)

Let us store the standard deviation of the outcome variable.

sigmay <- sd(d1$y, na.rm = TRUE)

Let us store the standard deviation of the treatment variable.

sigmax <- sd(d1$x1, na.rm=TRUE)

We have generated all the relevant parameters and can now use these to call the bset1() function.

bset1(
 beta0 = beta0,
 R0 = R0,
 betatilde = betatilde,
 Rtilde = Rtilde,
 sigmay = sigmay,
 sigmax = sigmax,
 taux = taux,
 deltalow = 0.01,
 deltahigh=5.00
 )
#> $bsetpar
#>    beta0 R0 betatilde Rtilde sigmay sigmax  taux
#> 1 -0.055  0      0.84  0.559  4.325  0.943 0.802
#> 
#> $bsetqnt
#>            Area    2.5%      5%     50%    95%  97.5%
#> LL (URR)  0.740   0.842   0.844   0.883  0.988  1.005
#> LL (NURR) 0.250   0.859   0.865   0.969  1.079  1.092
#> LH (URR)  0.734   0.844   0.847   0.945  1.223  1.269
#> LH (NURR) 0.256   0.868   0.884   1.169  1.474  1.507
#> HL (URR)  0.080  -9.319  -9.300  -9.179 -9.116 -9.112
#> HL (NURR) 0.920 -12.544 -12.124  -9.762 -9.183 -9.155
#> HH (URR)  0.344  -9.700  -9.642  -9.265 -9.095 -9.083
#> HH (NURR) 0.656 -12.595 -12.274 -10.059 -9.234 -9.181

Note that the results are identical to the ones we got above using the bset2() function.

About

R functions to compute bias-adjusted treatment effect when selection on unobservables is proportional to selection on observables

Topics

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages