Statistical Learning and Data Mining (QBUS6810)

Marcel Scharth, The University of Sydney

This is a repository for the Jupyter Notebooks and code used in Statistical Learning and Data Mining, a postgraduate unit at the University of Sydney Business School. I also provide the lectures for future reference.

This version: Semester 2, 2017.

Tutorials in Python

Tutorial 1: Working with Data in Python
Tutorial 2: K-Nearest Neighbours Regression
Tutorial 3: Regression Modelling
Tutorial 4: Cross Validation
Tutorial 5: The Bootstrap
Tutorial 6: Linear Model Selection and Regularisation
Tutorial 7: Naive Bayes and Sentiment Analysis
Tutorial 8: Logistic Regression and Gaussian Discriminant Analysis
Tutorial 9: Regression Splines
Tutorial 10: Regression Trees
Tutorial 11: Model Stacking
Tutorial 12: Credit Risk Modelling

Lectures

Module 1: Introduction to Statistical Learning
Module 2: Linear Regression and Statistical Thinking
Module 3: K-Nearest Neighbours Regression
Module 4: Regression Modelling
Module 5: Model Selection
Module 6: The Bootstrap
Module 7: Estimation Methods (reference module)
Module 8: Linear Model Selection and Regularisation I
Module 9: Linear Model Selection and Regularisation II
Module 10: Classification I
Module 11: Classification II
Module 12: Nonlinear Modelling
Module 13: Tree-based Methods
Module 14: Model Stacking
Module 15: Boosting

Acknowledgement: these lectures use figures from An Introduction to Statistical Learning and The Elements of Statistical Learning (see below).

References

Textbook:

An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.

The lectures and tutorials also draw on material from:

The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.

Statistical Methods in Customer Relationship Management by V. Kumar and J. Andrew Petersen.

Machine Learning: A Probabilistic Perspective by Kevin P. Murphy.

Mathematical Statistics with Resampling and R by Laura M. Chihara and Tim C. Hesterberg.

Other resources

Students are strongly encouraged to consider the following additional resources.

A Mind for Numbers: How to Excel at Math and Science by Barbara Oakley.

Dataquest (online Python courses).

DataCamp (online Python courses).

Learning Data Science (Kaggle Wiki).

Kaggle Kernels.