Skip to content

evenssalies/Machine-learning

Repository files navigation

What's in here?1

(1) You'll find material on Machine learning (ML) for problems where one's aim is to measure the effects of some more or less controlled $D$ on some $Y$ in the presence of confounders $X$. ML techniques will be discussed as I learn them, with as objective to show when they can be useful to causal inference. (2) In the different code comments I pay particular attention to differences between ML and Econometrics from the perspective of causal inference. To put it more clearly, my aim is not to compare ML and Econometrics for prediction $E(Y|X)$, but to see how ML methods can be used in Econometrics for estimating causal effects $E(Y(1)-Y(0)|D(X))$, where $D$ may be continuous. (3) You'll also find critical references on how AI is changing our perception of science, and the way we live in societies.

Regarding (1), I rely on:

  • Géron, A., 2022. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow, O'Reilly, which applies instance- and model-based learning methods to economic and other data.
  • Machine learning in Python with scikit-learn, FUN-MOOC.

As to point (2), I'll manage to build some kind of dictionary within which one can find words from ML and their translation into econometrics. A few examples: (i) features, factors, explanatory variables are the same thing, (ii) the method of Least Squares used in Econometrics to estimate a model parameters is used for training in ML, (iii) the sum of squared residuals is the cost function, (iv) Overfitting relates to what econometricians call model saturation when specifying and econometric model. A saturated model predicts very well. But saturing a model, though it can be a good control technique, is not key in causal inference. Causal inference requires assumptions and relies on matching techniques where ML can be useful. Some references may help to link the jargon and methods of ML and eocnometrics:

  • Athey, S., Imbens, G., 2019. Machine learning methods that economists should know about, Lien.

I firmly believe there are plenty of econometric methods that data scientists should know about. Before to merge the two disciplines, first I need to learn ML methods, which will take a few years.

On (3):

  • Le Cun, Y., 2014. Quand la machine apprend, Odile Jacob.
  • Mallat, S., 2018. Sciences des données et apprentissage en grande dimension – Leçons inaugurales du Collège de France, fayard.
  • Sadin, E., 2023. La vie spectrale – Penser l'ère du métavers et des IA génératives, Grasset.

Programs

  • Perceptron for a binary classification.
  • Linear regression with a test set (no validation set?).
  • Classification without a test set.
  • Classification with a test set.
  • Classification, Semi-Supervised Learning.

Footnotes

  1. Acknowledgments. I am grateful to Alexandre Mutel.