This repository contains code for reproducing the empirical estimates and Monte Carlo simulations in Torgovitsky, A., "Nonparametric Inference on State Dependence in Unemployment," Econometrica, Vol. 87, No. 5 (September, 2019)
The code included in the supplemental material for Econometrica is from May 3,
2019.
Please download the most recent version of the code from the GitHub repository.
-
MATLAB. No special toolboxes are required.
-
A Mathematical Programming Language (AMPL). The trial/student version of AMPL is size restricted. Some limited versions of the code might run with the trial version, but a full license is required to reproduce the results in the paper.
-
A linear programming solver for AMPL. The default is CPLEX. The default can be changed by passing e.g.
Settings.Solver = 'gurobi'
when calling./src/DPO.m
. -
The AMPL-MATLAB API
-
Linux (or perhaps OSX). I coded this on a Linux system and made no attempt to be platform-independent. However, the code is primarily in MATLAB, so should be mostly platform-independent. Some file operations are used for recording the results. These would be likely sources of issues for other operating systems, but should be easy enough to fix.
-
Important first step: Open
./cfg/Config.m
.- Change
ResultsPath
to a directory where you want results to be saved. - Change
AMPLAPISetupPath
to the location where you installed the AMPL-MATLAB API.
- Change
-
The primary code for the DPO model is contained is
./src/DPO.m
and the routines called from within. It contains many options, which are given default values in the structure calledSettings
that is defined at the top of that file. The code for the comparison parametric dynamic binary response (PDBR) model is contained in./src/PDBR.m
. -
The directory
./bin/
contains a file calledRunSIPP.m
that generates the empirical results in the paper.-
Selecting
SimSet = main
andSimNum = n
will produce columnn
(forn
between 1 and 12) in Table 2 of the paper. -
Selecting
SimSet = sigma
orSimSet = sigma-young
andSimNum = n
will produce results for then
th gridpoint in Figures 2 and 3 (n
between 1 and 9). -
Selecting
SimSet = extra
will produce results for Appendix Table S4 in the supplemental appendix.
-
-
Running all of the empirical results in the paper will take a long time, primarily due to the procedure for constructing confidence regions.
-
Confidence regions can be turned off by changing
Settings.BuildConfidenceRegions = 1
toSettings.BuildConfidenceRegions = 0
in the functionLoadSpec
in./bin/RunSIPP.m
. -
The number of bootstrap replications can be reduced by lowering
Settings.B = 250
to some smaller number, also in the functionLoadSpec
in./bin/RunSIPP.m
. -
Multiple results for each
SimSet
can be produced simultaneously by using the file./bin/BatchRunSIPP.m
This is basically a poor-man's parallel that opens up multiple MATLAB threads. (Unfortunately, the AMPL-MATLAB API is not easy to parallelize, which is why I am using this crude workaround.) Using this function, all of the results of (e.g.) Table 1 can be produced with the commandBatchRunSIPP('your-save-dir', 'main')
. Note that this will open 12 MATLAB and AMPL instances at one time, which will strain a typical system.
-
-
The directory
./bin/
also contains a file calledRunMonteCarlo.m
that generates the simulation results for the Monte Carlos reported in the supplemental Appendix.SimNumber = 1
produces the results for Table S1 and Figure S1.SimNumber = 2
produces the results for Table S2.SimNumber = 3
produces the results for Table S3.
-
The directory
./bin/
contains a batching file for the Monte Carlos calledBatchRunMonteCarlo.m
. This opens three MATLAB threads that produce results for three different sample sizes forSimNumber = 1
or3
.
-
The cleaned data used for both the empirical results and simulations is contained in
./data/sipp08.tsv
and./data/sipp08-young.tsv
. These files are included with the repository. There is also a wide-form./data/sipp08-wide.tsv
that I use for producing the table of summary statistics (Table 1). -
I have included a Bash script
./data/DownloadAndCleanSIPP.sh
that downloads the raw 2008 SIPP data from the NBER page, converts it to Stata format and then creates my extract:-
The script uses the Stata dictionary and do files provided by the NBER, except that I have edited the beginning of the do files to remove their annoying directory hardcoding. These files are included with the repository in
./data
. -
By default, the script deletes the ~8--10GB of SIPP data once my extract is created. This can be changed by running
./data/DownloadAndCleanSIPP.sh nocleanup
instead. -
Part of the script involves running
./data/CleanSIPP.do
in Stata, which implements the sample selection rules discussed in the paper.
-
The directory ./post
contains some Python scripts and LaTeX templates used to
turn the empirical and simulation results into tables and figures.
For the empirical results:
-
Table 1 (summary statistics) is generated by running
./post/BuildSumStatsTable.py ./data/sipp08-wide.tsv destdir
wheredestdir
is the output location. -
Table 2 (main empirical results) is generated by
./post/BuildResultsTable.py simdir/results/main
wheresimdir
is the location of a simulation directory andmain
is the directory name that is created when theSimSet
variable (see above) is set tomain
to produce the main results. In order to fully reproduce Table 2, allSimNum
's (1 through 12) should have been called so that there are directoriessimdir/results/main/001
throughsimdir/results/main/012
. The completed table will be located in/simdirs/results/main/
. -
Figure 2 (sensitivity analysis) is generated by
./post/BuildResultsTable.py simdir/results/sigma
wheresimdir
is the location of a simulation directory andsigma
is the directory name that is created when theSimSet
variable (see above) is set tosigma
to produce the sensitivity analysis. -
Figure 3 (sensitivity analysis with young sample) is generated like Figure 2 but with
sigma-young
in place ofsigma
. -
Figure S4 (extra results) is generated like Table 2 but with
extra
in place ofmain
.
For the Monte Carlo simulations:
-
Figure S1 and Table S1 are generated by running
./post/BuildMCEstTable.py simdir/results/
wheresimdir
is the location of a simulation directory containing the results fromBatchRunMonteCarlo.m
withSimNumber = 1
. -
Similarly, figure S3 is generated by running
./post/BuildMCEstTable.py simdir/results/
afterBatchRunMonteCarlo.m
withSimNumber = 3
. -
Figure S2 is generated by running
./post/BuildMCTestTable.py simdir/results/
when after runningRunMonteCarlo.m
withSimNumber = 2
.
The results in the published paper were run with:
- MATLAB version 9.3.0.713579 (R2017b)
- AMPL version 20180927
- Gurobi version 8.1.0
- AMPL-MATLAB API version 1.4.0
- Stata 13.1 (for data cleaning only)
The code uses two user-written MATLAB functions both located in ./ext
:
-
csvimport.m written by Ashish Sadanandan
-
GaussQuad written by Geert Van Damme
Please use GitHub to open an issue and I will be happy to look into it.