Some practice implementations of statistical machine learning techniques, each applied to a dataset.
For more details or examples about deep learning, check out my Deep Learning repository.
- Using Python 3
(most of the relative path links are relative to the repository root)
- numpy: For low-level math operations
- pandas: For data manipulation
- sklearn - Scikit Learn: For evaluation metrics, some data preprocessing
For comparison purposes
- sklearn: For machine learning models
- cvxopt: For convex optimization problems (for SVM)
NLP related
- gensim: Topic Modelling
- hmmlearn: Hidden Markov Models in Python, with scikit-learn like API
- jieba: Chinese text segmentation library
- pyHanLP: Chinese NLP library (Python API)
- nltk: Natural Language Toolkit
- Supervised Learning
  - Classification - Discrete
  - Regression - Continuous
- Unsupervised Learning
  - Clustering - Discrete
  - Dimensionality Reduction - Continuous
  - Association Rule Learning
- Semi-supervised Learning
- Reinforcement Learning
- Classification
  - Logistic Regression
    - (optimization algo.)
  - k-Nearest Neighbors (kNN)
  - Support Vector Machine (SVM)
    - Deduction (optimization algo.)
  - Naive Bayes
  - Decision Tree (ID3, C4.5, CART)
- Regression
  - Linear Regression
    - (optimization algo.)
  - Tree (CART)
- Clustering
  - k-Means
  - Hierarchical Clustering
- Association Rule Learning
- Dimensionality Reduction
  - Principal Component Analysis (PCA)
  - Singular Value Decomposition (SVD)
    - LSA, LSI, Recommendation System
  - ISOMAP
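
As a small illustration of one of the classifiers listed above, here is a minimal k-Nearest Neighbors sketch using only the Python standard library. It is not the implementation from this repository; the function name `knn_predict` and the toy data are assumptions made up for this example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration: two well-separated groups
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(train_points, train_labels, (1.1, 1.0))
prediction_near_b = knn_predict(train_points, train_labels, (4.0, 4.0))
print(prediction_near_a, prediction_near_b)
```

kNN has no training step: all the work happens at prediction time, which is why the sketch is a single function. For larger datasets a spatial index (for example a k-d tree) is usually preferred over the brute-force sort shown here.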
- Bagging
  - Random Forests
- Boosting
  - AdaBoost <- With some basic boosting notes
  - Gradient Boosting
    - Gradient Boosting Decision Tree (GBDT) (aka. Multiple Additive Regression Tree (MART))
  - XGBoost
- Hidden Markov Model (HMM)
- Bayesian Network (aka. Probabilistic Directed Acyclic Graphical Model)
- Conditional Random Field (CRF)
- Probabilistic Latent Semantic Analysis (PLSA)
- Latent Dirichlet Allocation (LDA)
- Vector Space Model (VSM)
- Classification
- Data Preprocessing
- Real-world Problem
- Evaluation Metrics
- Binary to Multi-class Expansion
- Regression
- Evaluation Metrics
- Clustering
- Evaluation Metrics
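
To make the classification evaluation metrics mentioned above a bit more concrete, the following sketch computes precision, recall and F1 for a binary problem in plain Python. The helper name `precision_recall_f1` and the label vectors are assumptions for illustration only; in practice `sklearn.metrics` (listed in the dependencies) covers the same ground.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 3 true positives, 1 false positive, 1 false negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```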
- Data Mining - Knowledge Discovery
- Recommendation System
- Collaborative Filtering
- Information Retrieval - Topic Modelling
- Latent Semantic Analysis (LSA/LSI/SVD)
- Latent Dirichlet Allocation (LDA)
- Random Projections (RP)
- Hierarchical Dirichlet Process (HDP)
- word2vec
- Kernel Usages
- Convex Optimization
- Linear Algebra
- Orthogonality
- Eigenvalues
- Hessian Matrix
- Quadratic Form
- Markov Chain - HMM
- Calculus
- Multivariable Derivatives
- Quadratic Approximations
- Lagrange Multipliers and Constrained Optimization - SVM SMO
- Lagrange Duality
- Multivariable Derivatives
- Probability and Statistics
- Statistical Estimation
- Algebra
- Trigonometry
(from A to Z)
- Decision Tree
- Entropy
- HMM
- Markov Chain
- Naive Bayes
- Bayes' Theorem
- PCA
- Orthogonal Transformations
- Eigenvalues
- SVD
- Eigenvalues
- SVM
- Convex Optimization
- Constrained Optimization
- Lagrange Multipliers
- Kernel
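
As an example of the background listed above, the entropy used by decision trees (for instance in ID3) can be sketched in a few lines of standard-library Python. This is only a sketch under simple assumptions: the function names `entropy` and `information_gain` and the toy weather-style data are invented here, not taken from the repository.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, feature_values):
    """Entropy reduction obtained by splitting `labels` on a categorical feature."""
    total = len(labels)
    # Group the labels by the feature value of each sample
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups after the split
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Toy data, made up for illustration
labels = ["yes", "yes", "no", "no"]
perfect_feature = ["sunny", "sunny", "rainy", "rainy"]  # separates the classes exactly
useless_feature = ["hot", "cold", "hot", "cold"]        # tells us nothing about the class

gain_perfect = information_gain(labels, perfect_feature)
gain_useless = information_gain(labels, useless_feature)
print(gain_perfect, gain_useless)
```

ID3 picks, at each node, the feature with the highest information gain; C4.5 and CART use related criteria (gain ratio and Gini impurity respectively).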
- Machine Learning in Action
- 統計學習方法 (李航)
- 機器學習 (周志華)
- Linear Algebra with Applications (Steven Leon)
- Convex Optimization (Stephen Boyd & Lieven Vandenberghe)
- Numerical Linear Algebra (L. Trefethen & D. Bau III)
- Google - Machine Learning Recipes with Josh Gordon
- Youtube - Machine Learning Fun and Easy
- Siraj Raval - The Math of Intelligence
- bilibili - 機器學習 - 白板推導系列
- bilibili - 機器學習升級版
- ApacheCN (ML, DL, NLP)
- Machine learning 101 (infographics)
- Google Machine Learning Crash Course
- Kaggle Learn Machine Learning
- Microsoft Professional Program - Artificial Intelligence track
Textbook Implementation
- Machine Learning in Action
- Learning From Data
- 統計學習方法 (李航)
- Stanford Andrew Ng CS229
- UCI Machine Learning Repository
- Awesome Public Datasets
- Kaggle Datasets
- The MNIST Database of handwritten digits
- 資料集平台 Data Market
- AI Challenger Datasets
- Peking University Open Research Data
- Open Images Dataset
- Alibaba Cloud Tianchi Data Lab
- Extension plugin - pip install jupyter_contrib_nbextensions
  - VIM binding
  - Codefolding
  - ExecuteTime
  - Notify
- Jupyter Theme - pip install --upgrade jupyterthemes