File intended to store references that may be useful for empirical work, focused on data science tools (with an emphasis on R) and academic tips (with an emphasis on economists). Items marked with ⭐ are personal favorites.
⭐ Template for Empirical Papers - Ricardo Dahis - This folder provides an all-encompassing working structure for empirical papers. It organizes every step of the process: merging and cleaning (several) data sets, performing analyses (tables, figures, regressions), writing the article itself and also presentations.
⭐ Sample Replication Package - Julian Reif - This repository includes a short paper and its accompanying replication code.
⭐ Steve's R Markdown Templates - A suite of R Markdown templates for 1) academic manuscripts, 2) Beamer presentations, and 3) academic syllabi.
⭐ Template for Reproducible Empirical Accounting Research - This repository, while predominantly being targeted at the team members of our research network, provides a structured platform for reproducible R-based research projects in general. To make it more accessible to everybody who is new to R, we also “produced” a short video series that shows you how to set up your local computing environment and to reproduce the toy analysis contained in the repository. Based on this, you should be able to build your own research projects in a reproducible way.
Generic Paper Template - Lars Vilhuber - Simple template with example programs for setting up several software packages.
Gentzkow-Shapiro Lab template - The GSLab Template is a minimal working demonstration of the tools and organization used by projects in the GSLab. We use SCons and a few custom builders to execute scripts and track dependencies in a portable and flexible manner.
Template for Overleaf - A LaTeX Template for Economics Papers.
⭐ Code and Data Guide - Gentzkow & Shapiro - Code and Data for the Social Sciences: A Practitioner’s Guide by Matthew Gentzkow and Jesse M. Shapiro (2014).
⭐ Tilburg Science Hub - Tilburg Science Hub (TSH) is an open-source online resource that helps individual researchers, data scientists, and teams to efficiently carry out data- and computation-intensive projects. It provides information about workflow and data management and tutorials that teach researchers how to organize and document their data and code, so the research becomes sustainable and reproducible. This in turn leads to time savings and transparency in the process.
⭐ Development Research in Practice: The DIME Analytics Data Handbook - This book is intended to teach all users of development data how to handle data effectively, efficiently, and ethically. It covers the full data workflow for a complex research project using original data.
⭐ R Guide - This guide provides instructions for using R on research projects. It is meant to be used with collaborators and research assistants to make code consistent, easier to read, transparent, and reproducible.
⭐ Unofficial guidance on various topics by the AEA Data Editor - Unofficial guidance from the AEA Data Editor on how to create replicable data and code supplements.
⭐ Coding for Economists: A Language-Agnostic Guide to Programming for Economists - Ljubica “LJ” Ristovska - The first part of the presentation focuses on general computer science concepts, guidelines, and programming tips. The second part of the presentation, with Frank Pinter, introduces version control via Git.
⭐ Translating Stata to R - This website is for Stata users who are interested in learning R. We provide side-by-side code snippets for common tasks in both Stata and R, so that users have a dictionary for navigating across the two languages.
⭐ Reproducible Analytical Pipelines - Bruno Rodrigues - The basic idea of a reproducible analytical pipeline (RAP) is to have code that always produces the same result when run, whatever this result might be. This is obviously crucial in research and science, but this is also the case in businesses that deal with data science/data-driven decision making etc.
Social Science Reproduction Platform - The Social Science Reproduction Platform (SSRP) crowdsources and catalogs attempts to assess and improve the computational reproducibility of social science research. Instructors can use the SSRP in applied social science courses at the graduate or undergraduate levels to teach fundamental concepts, methods, and reproducible research practices.
Project TIER - Project dedicated to developing methods and tools for enhancing research transparency.
Unofficial guidance on various topics by Social Science Data Editors - Guidance on creating replicable data and program archives. Guidance on testing replicability of code.
Replication tutorial - Lars Vilhuber - Replication and Reproducibility in Social Sciences and Statistics: Overview and Practice (2019).
Cutting Edge Reproducibility in Economics - Lars Vilhuber - Prepared for presentation at BITSS 2022 Annual Meeting on 2022-02-11. This presentation is about reproducibility, and it is created in a reproducible way.
RA Manual - Gentzkow & Shapiro Lab - Manual that introduces the lab's workflow to new research assistants.
Archive GitHub repo with Zenodo - Guide on how to archive a GitHub repository and assign it a DOI with Zenodo.
Reproducible R toolbox - RPubs post about reproducibility with R.
GitHub and Dropbox - Tutorial on how to combine GitHub and Dropbox.
A Brief Introduction to GitHub for Social Scientists using Stata and Dropbox - This is a Hello World for social scientists using Stata and Dropbox! Here, we will walk through the basics of GitHub: how to download a repository from the internet, how to integrate your repository with Dropbox, and how to upload the changes that you made in your repository to the cloud of GitHub.
Replicability Presentations - Lars Vilhuber - In this talk, I describe the context in which the current discussion in the social sciences is occurring: what are the definitions of replicability and reproducibility, what is failing, and to what extent. I discuss progress over the past 15 years. Finally, I discuss the concrete measures that have been implemented under my guidance at the American Economic Association, and the first preliminary outcomes from those measures. I conclude with some observations on how to integrate reproducibility into the scientific workflow in the social and statistical sciences.
Research Compendia - Paper about uses of R to produce research compendia.
⭐ Code and data for "Skeptic priors and climate consensus" (McDermott, 2021)
Suparna Chaudhry, Marc Dotson, and Andrew Heiss (2021)
⭐ R for Data Science Book - Book with a practicum of skills for data science in R.
⭐ Advanced R Book - Book designed primarily for R users who want to improve their programming skills and understanding of the language.
⭐ Data Visualization with R - This book helps you create the most popular visualizations - from quick and dirty plots to publication-ready graphs. The text relies heavily on the ggplot2 package for graphics, but other approaches are covered as well.
⭐ Data Visualization: A practical introduction - Kieran Healy - This book is a hands-on introduction to the principles and practice of looking at and presenting data using R and ggplot. R is a powerful, widely used, and freely available programming language for data analysis. You may be interested in exploring ggplot after having used R before, or be entirely new to both R and ggplot and just want to graph your data. I do not assume you have any prior knowledge of R.
⭐ Data Science for Economists - Grant R. McDermott - Graduate course introducing the modern data science toolkit (focused on R).
⭐ Course Materials for Advanced Data Analytics in Economics - Nick Hagerty - The idea is to explicitly teach skills/tools that otherwise are only gained through RA experiences, helping to level the playing field for research/PhDs/predocs.
Data Science for Economists and Other Animals - Grant R. McDermott - This is the website for Data Science for Economists and Other Animals. The book is very much in the early development stages, but draws from lecture material that we have been refining over the last several years.
⭐ Telling Stories With Data - Rohan Alexander - The focus is on using quantitative methods to tell stories with data.
⭐ Happy Git with R - Instructions to integrate Git(Hub) with R and RStudio.
R Econ Visual Library - R code for data visualization in economics, created and maintained by DIME Analytics.
Data Wrangling Workshop with R and Tidyverse - slides, video, and walk-through example of a workshop about data wrangling using Tidyverse in R.
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse - This book assumes no prerequisites: no algebra, no calculus, and no prior programming/coding experience. This is intended to be a gentle introduction to the practice of analyzing data and answering questions using data the way data scientists, statisticians, data journalists, and other researchers would.
R: Uma Introdução Para Economistas - Introduction to R for economists (reference in Portuguese).
Parallel Computing with R - a comprehensive overview of all things R parallel (working paper)
Databases using dplyr - RStudio's tutorial
R interface for Apache Spark - RStudio's tutorial
⭐ Reproducible Research in R - The main aim of this workshop is to set you on the right path of making your research more reproducible and shareable.
R Markdown: The Definitive Guide - Reference book for R Markdown.
R Markdown Cookbook - This book is designed to provide a range of examples on how to extend the functionality of your R Markdown documents. As a cookbook, this guide is recommended to new and intermediate R Markdown users who desire to enhance the efficiency of using R Markdown and also explore the power of R Markdown.
Academic Publications with R Markdown - Slide presentation about using R Markdown to write academic papers.
⭐ Geocomputation with R - geographic data analysis, visualization, and modeling.
Spatial Data Science - The book aims at data scientists who want to get a grip on using spatial data in their analysis. To exemplify how to do things, it uses R.
R as GIS for Economists - This book aims particularly at spatial data processing for econometric projects, where spatial variables become part of econometric analysis.
Open-Source Spatial Analytics (R) - In this course, you will learn to work within the free and open-source R environment with a specific focus on working with and analyzing spatial data.
Spatio-Temporal Statistics with R - Book about spatial statistics topics
⭐ groundhog - Reproducible Scripts via Version-Specific Package Loading
⭐ renv - Underlying the philosophy of renv is that any of your existing workflows should just work as they did before – renv helps manage library paths (and other project-specific states) to help isolate your project’s R dependencies, and the existing tools you’ve used for managing R packages (e.g. install.packages(), remove.packages()) should work as they did before
workflowr - The workflowr R package helps researchers organize their analyses in a way that promotes effective project management, reproducibility, collaboration, and sharing of results. Workflowr combines literate programming (knitr and rmarkdown) and version control (Git, via git2r) to generate a website containing time-stamped, versioned, and documented results. Any R user can quickly and easily adopt workflowr.
targets - The targets package is a Make-like pipeline toolkit for statistics and data science in R. With targets, you can maintain a reproducible workflow without repeating yourself. targets skips costly runtime for tasks that are already up to date, runs the necessary computation with implicit parallel computing, and abstracts files as R objects. A fully up-to-date targets pipeline is tangible evidence that the output aligns with the code and data, which substantiates trust in the results.
steveproj - This package will allow a researcher to start and better maintain an academic project around Make, the R programming language, RStudio, and some other features of my R ecosystem (prominently: {stevetemplates}). Features of {steveproj} are subject to change while in development but the core of it is, I think, ready to go.
checkpoint - The goal of the checkpoint package is to solve the problem of package reproducibility in R. Since packages get updated on CRAN all the time, it can be difficult to recreate an environment where all your packages are consistent with some earlier state. To solve this issue, checkpoint allows you to install packages locally as they existed on a specific date from the corresponding snapshot (stored on the checkpoint server) and it configures your R session to use only these packages. Together, the checkpoint package and the checkpoint server act as a CRAN time machine so that anyone using checkpoint() can ensure the reproducibility of their scripts or projects at any time.
⭐ Tidyverse - a collection of packages that share an underlying design philosophy, grammar, and data structures. Used for data cleaning, wrangling, visualization, and more.
⭐ data.table - package for data manipulation. Fast, memory efficient, concise, stable, dependency-free. Recommended for large data.
⭐ collapse - collapse is a C/C++ based package for data transformation and statistical computing in R. Its aims are to facilitate complex data transformation, exploration and computing tasks in R, and to help make R code fast, flexible, parsimonious and programmer friendly. It further implements a class-agnostic approach to data manipulation in R, supporting base R, dplyr (tibble), data.table, sf, plm classes for panel data (‘pseries’ and ‘pdata.frame’), and non-destructively handling other matrix or data frame based classes (including most time series classes such as ‘ts’, ‘xts’ / ‘zoo’, ‘timeSeries’, ‘tsibble’, ‘tibbletime’, etc.).
dtplyr - dtplyr provides a data.table backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.
tidyfast - provides fast and efficient alternatives to some tidyr (and a few dplyr) functions using data.table under the hood.
dbplyr - dbplyr is the database backend for dplyr. It allows you to use remote database tables as if they are in-memory data frames by automatically converting dplyr code into SQL.
sparklyr - R interface for Apache Spark. Spark is a unified analytics engine for large-scale data processing. Spark can scale in ways that R simply can’t.
⭐ sf - spatial manipulation (vector-based)
⭐ terra - spatial manipulation (raster-based)
⭐ tmap - With the tmap package, thematic maps can be generated with great flexibility. The syntax for creating plots is similar to that of ggplot2, but tailored to maps.
mapsf - Create and integrate thematic maps in your R workflow. This package helps to design various cartographic representations such as proportional symbols, choropleth or typology maps. It also offers several functions to display layout elements that improve the graphic presentation of maps (e.g. scale bar, north arrow, title, labels). mapsf maps sf objects on base graphics
leaflet - plot interactive maps
rgee - rgee is a binding package for calling Google Earth Engine API from within R. Additionally, several functions have been implemented to make simple the connection with the R spatial ecosystem. The current version of rgee has been built considering the earthengine-api 0.1.235. Note that access to Google Earth Engine is only available to registered users.
⭐ modelsummary - modelsummary creates tables and plots to summarize statistical models and data in R.
⭐ fixest - provides a family of functions to perform estimations with multiple fixed-effects. This package is currently (Feb. 2020) the fastest software available to perform fixed-effects estimations.
vtable - The vtable package is designed to help you quickly and efficiently look at and document your data. It is also very good at producing the type of “out of the box” summary tables that economists like.
broom - helps to convert regression outputs into “tidy” data frames.
estimatr - dedicated to providing fast estimators that take into consideration designs often used by social scientists.
lfe - Useful for estimating linear models with multiple group fixed effects. Offers near-identical functionality to the popular Stata library, reghdfe.
drdid - used to compute the locally efficient doubly robust estimators for the ATT in difference-in-differences (DiD) setups.
did - contains tools for computing average treatment effect parameters in Difference in Differences models with more than two periods, with variation in treatment timing across individuals, and where the DID assumption possibly holds conditional on covariates.
HonestDiD - Robust inference in differences-in-differences and event study designs using methods developed in Rambachan and Roth (2019).
RDD Packages - Software packages for analysis and interpretation of regression discontinuity designs and related methods. Replication files and illustration codes employing these packages are also available.
NP Packages - Software packages for nonparametric and semiparametric smoothing methods with application to causal inference, treatment effect and program evaluation estimation and inference. Replication files and illustration codes employing these packages are also available.
RDDtools - RDDtools is a new R package under development, designed to offer a set of tools to run all the steps required for a Regression Discontinuity Design (RDD) Analysis, from primary data visualisation to discontinuity estimation, sensitivity and placebo testing.
gsynth - R package for Generalized Synthetic Control Method: for Causal Inference with Interactive Fixed Effect Models.
scul - This repository contains the R package scul that is used in Hollingsworth and Wing (2020) “Tactics for design and inference in synthetic control studies: An applied example using high-dimensional data.”.
mfx - provides functions that estimate a number of popular generalized linear models, returning marginal effects as output.
margins - margins is an effort to port Stata’s (closed source) margins command to R as an S3 generic method for calculating the marginal effects (or “partial effects”) of covariates included in model objects (like those of classes “lm” and “glm”).
DataExplorer - This R package aims to automate most of data handling and visualization, so that users could focus on studying the data and extracting insights during Exploratory Data Analysis (EDA).
econocharts - Microeconomics/macroeconomics graphs made with ggplot2. This package allows creating microeconomics or macroeconomics charts in R with simple functions
furrr - combine purrr’s family of mapping functions with the future’s parallel processing capabilities.
ralger - The goal of ralger is to facilitate web scraping in R.
⭐ Causal Inference: The Mixtape - Scott Cunningham - Scott Cunningham introduces students and practitioners to the methods necessary to arrive at meaningful answers to the questions of causation, using a range of modeling techniques and coding instructions for both the R and the Stata programming languages.
⭐ The Effect: An Introduction to Research Design and Causality - Nick Huntington-Klein - The Effect is a book intended to introduce students (and non-students) to the concepts of research design and causality in the context of observational data.
⭐ Econometrics - Bruce Hansen - This textbook is the second in a two-part series covering the core material typically taught in a one-year Ph.D. course in econometrics.
⭐ Literature on Recent Advances in Applied Micro Methods - List of papers with recent advances in applied micro methods
⭐ Mixtape-Sessions - Scott Cunningham - Mixtape Sessions aims to provide high-quality and approachable courses in Causal Inference. Multiple times per year our Causal-Inference Democratizer in Chief, Scott Cunningham, hosts our "Mixtape Sessions" which are our flagship, multi-day workshops aimed towards early causal-inference learners. We also welcome researchers working on the frontier of causal inference methods to host "Mixtape Tracks" which are shorter workshops aimed at advanced topics.
⭐ DiD Reading Group - Presentation of recent papers in the DiD literature by their authors.
⭐ Diff-in-Diff Notes - Asjad Naqvi - This repository tracks the recent developments in the Difference-in-Difference (DiD) literature. Currently, it is just a dump of my bookmarks from different websites including Twitter, GitHub, YouTube etc. This will be sorted out over time as the literature converges to some consensus. But this might still take a while.
⭐ How to Do Empirical Economics - This article presents a discussion among leading economists on how to do empirical research in economics. The participants discuss their reasons for starting research projects, database construction, the methods they use, the role of theory, and their views on the main alternative empirical approaches. The article ends with a discussion of a set of articles which exemplify best practice in empirical work.
⭐ World Bank Methodology Posts - This is a curated list of our technical postings, to serve as a one-stop shop for your technical reading. I’ve focused here on our posts on methodological issues in impact evaluation.
⭐ Applied Empirical Methods - Paul Goldsmith-Pinkham - This course is primarily designed for graduate students interested in econometric methods used in empirical research. The goal of this class is to provide an overview of different empirical methods, with an emphasis on practical implementation.
⭐ Guia Brasileiro de Análise de Dados: Armadilhas e Soluções - a book about Brazilian data (crime, health, education, etc.) presenting common pitfalls and solutions when working with it (reference in Portuguese).
⭐ The Gary Chamberlain Online Seminar in Econometrics - Paper presentations, symposiums and tutorials about many Econometric topics.
Program Evaluation for Public Service - Andrew Heiss - Combine research design, causal inference, and econometric tools to measure the effects of social programs (intensive use of R)
Introduction to Causal Inference - Brady Neal
Ph.D. Econometrics (III) taught with R - Ed Rubin - Econometrics with R graduate course.
Applied Economics with R - Hans H. Sievertsen - The tutorial is structured as a complete research project starting with loading the raw data and ending with a chart comparing the estimates across approaches.
Ph.D. Microeconometrics - Chris Conlon - This is a PhD level course in Microeconometrics targeted at students conducting applied research (as opposed to econometricians). In addition to traditional econometric approaches, this course draws connections to recent literature on machine learning.
Empirical Economics with R - Sebastian Kranz - Besides lecture slides, the course consists of interactive web sites that mix video lectures with multiple choice quizzes. Even more important are interactive RTutor problem sets for each chapter. They allow you to work through the topics and applications in your own RStudio environment. You can automatically check your solutions and get hints.
Modern Difference in Difference Designs - Workshop Syllabus - This ten-day workshop will begin with the basic DiD design using two-way fixed effects and build up to the state-of-the-art applications. We will then move into advanced extensions like matching, synthetic control, asymmetric/staggered treatments, dynamic treatments, interference, and heterogeneous treatment effects. We will work through DiD designs with practical examples, assumptions, diagnostics, and code in R and Stata (when available).
Introduction to Econometrics - Bruce Hansen - This textbook is the first in a two-part series covering the core material typically taught in a one-year Ph.D. course in econometrics.
Introduction to Econometrics with R - Interactive learning material that blends R code with the contents of the well-received textbook Introduction to Econometrics by Stock and Watson (2015).
Library of Statistical Techniques (LOST) - Publicly-editable website to make it easy to execute statistical techniques in statistical software.
Oh Shit, Git! - Tips for common git mistakes.
Git Large File Storage - An open-source Git extension for versioning large files.
The Young Economist's Guide to Professional Etiquette - Hamermesh (1992)
Discussing, refereeing, dealing with rejections, and keeping up to date - Ryan B. Edwards (2020)
⭐ How to Write Applied Papers in Economics - Marc F. Bellemare - The goal of this paper is to teach readers how to write applied economics papers that will eventually be published in a peer-reviewed journal.
⭐ Writing Tips for Ph. D. Students - John H. Cochrane - Some tips on how to write academic articles
⭐ Great Economics Writing - Mixtape Sessions
Writing Papers: A Checklist - Michael Kremer
Writing Tips For Economics Research Papers - Plamen Nikolov
The Introduction Formula - Keith Head
Aphorisms on Writing, Speaking, and Listening - Eric Rasmusen - This article collects aphorisms on the mechanics of doing research in economics, emphasizing writing, speaking, and seminar participation. They are intended for both students and for scholars and are useful beyond just economics.
⭐ How to Give an Applied Micro Talk - Jesse M. Shapiro
Tips + Tricks with Beamer for Economists - Paul Goldsmith-Pinkham
Tips on how to avoid disaster in presentations - Monika Piazzesi
How to Present Results - David Levine
Public Speaking for Academic Economists - Rachael Meager
Tips on Being a Good Discussant
The “Big 5” and Other Ideas* For Presentations - Donald Cox
How To Give an Economics Talk - Adam Guren and Stephen Terry
⭐ Guidelines for Referee Reports (2004)
⭐ Preparing a Referee Report: Guidelines and Perspectives - Berk et al. (2015)
Managing references for research - Alex Hollingsworth - This article will outline one approach to setting up the free reference manager Zotero that can be easily integrated with LaTeX. While I use LaTeX and Zotero, there are of course other (and likely better) workflows. Certainly a similar set-up can be created using many different software combinations (e.g., Microsoft Word + EndNote). This is only meant as an introduction to see what’s possible, with the goal of reducing the cost of writing and producing research papers.
⭐ Resources for PhD Students - Shanjun Li - For mere mortals, the road to a PhD is long, lonely, arduous, and full of twists and turns. Among other things, it demands patience, perseverance, an open mind, and the courage to seek help. Owing to the generosity of the authors, this list of resources below provides useful guidance from start to finish. Use them well and your journey will be less bumpy.
⭐ Links for Advice to PhD Students - Tobias Klein
⭐ R4Econ - Shared resources for Econ Research Assistants and any other Econs working in R
Resources - Sebastian Tello-Trillo
Useful Links - Anthony Lee Zang
An unofficial guidebook for PhD students in economics and education