A collection of class notes and class projects from MOOCs I took.
Deep Learning notebook
- Optimizing a neural network with backward propagation
- Building deep learning models with Keras
- Fine-tuning Keras models
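Here is a minimal sketch of what backward propagation computes in the simplest case. It assumes a single linear output node with no hidden layers or activation function. The gradient of the mean squared error with respect to each weight is 2 * input * error, averaged over samples, and each update moves the weights against that slope. The data, `true_w`, the learning rate, and the helper names are all made up for illustration. Keras computes these gradients automatically for real networks.

```python
import numpy as np

# Hypothetical toy data: 3 input features, one target value per sample
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

def mse(w):
    """Mean squared error of a linear output node with weights w."""
    return np.mean((X @ w - y) ** 2)

def gradient(w):
    """Slope of the MSE w.r.t. w: 2 * input * error, averaged over samples."""
    error = X @ w - y                # forward pass: prediction minus target
    return 2 * X.T @ error / len(y)  # backward pass

w = np.zeros(3)
learning_rate = 0.1
losses = [mse(w)]
for _ in range(200):
    w = w - learning_rate * gradient(w)  # step against the slope
    losses.append(mse(w))
print(losses[0], losses[-1])
```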
Unsupervised Learning in Python notebook
- Clustering for dataset exploration: k-means clustering, evaluating a clustering, transforming features for better clusterings
- Visualization with hierarchical clustering (dendrogram) and t-SNE
- Decorrelating data and dimension reduction: principal component analysis (PCA), PCA with sparse matrices
- Discovering interpretable features: the dimension reduction technique non-negative matrix factorization (NMF)
- Use NMF to build a recommendation system
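NMF features can drive a simple recommender. You normalize each row of the NMF feature matrix, then recommend the rows with the highest cosine similarity. This is a rough sketch built on a tiny hypothetical word-count matrix. It uses Lee-Seung multiplicative updates written in NumPy rather than scikit-learn's `NMF` class, so the exact factors will differ from what the library produces. The helper names `nmf` and `recommend` are illustrative.

```python
import numpy as np

def nmf(V, n_components, n_iter=500, seed=0, eps=1e-10):
    """Factor non-negative V (n x m) as W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, n_components))
    H = rng.random((n_components, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # updates keep entries non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def recommend(W, index, top_n=2):
    """Rank other rows by cosine similarity of their L2-normalized NMF features."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_norm = W / np.where(norms == 0, 1, norms)
    similarities = W_norm @ W_norm[index]
    order = np.argsort(-similarities)
    return [int(i) for i in order if i != index][:top_n]

# Hypothetical word-count matrix: rows are articles, columns are words
V = np.array([
    [3, 2, 0, 0],
    [4, 3, 0, 0],
    [0, 0, 5, 1],
    [0, 0, 4, 2],
], dtype=float)
W, H = nmf(V, n_components=2)
print(recommend(W, index=0, top_n=1))
```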
Fraud Detection in Python notebook
- Resampling methods for imbalanced data: oversampling, undersampling, and the SMOTE method
- Fraud detection using labeled data: supervised learning for fraud detection, performance metrics for fraud detection, adjusting algorithm weights, and using ensemble methods to improve fraud detection
- Clustering methods for fraud detection (KMeans and MiniBatchKMeans), the elbow curve method for choosing the number of clusters, assigning fraud versus non-fraud, and DBSCAN
- Incorporating text data into fraud detection
- Topic modeling on fraud: Latent Dirichlet Allocation (LDA)
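Here is a sketch of the idea behind SMOTE using NumPy only. Each synthetic fraud case is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbors. The data and the `smote` helper are hypothetical. In practice, the `SMOTE` class from the imbalanced-learn package (`imblearn.over_sampling.SMOTE`) handles this.

```python
import numpy as np

def smote(X_minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating toward random k-nearest neighbors."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(len(X_minority))
        neighbor = neighbors[base, rng.integers(k)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[neighbor] - X_minority[base])
    return synthetic

# Hypothetical minority-class (fraud) samples with two features
X_fraud = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8], [2.0, 1.5], [1.8, 0.9]])
X_new = smote(X_fraud, n_synthetic=10)
print(X_new.shape)  # (10, 2)
```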
Data Visualization Class notebook
- Customizing 1D plots: applying the ggplot style, resetting the style to default, adding arrows to annotate a graph, rotating axis labels, legends
- Plotting 2D arrays: contour plots, 2D histograms, plotting images, histogram and cumulative distribution function of a grayscale image, equalizing an image histogram, extracting bivariate histograms from a color image
- Statistical plots with Seaborn: lmplot, residplot, regplot, jointplot, hue, violinplot, stripplot, swarmplot, pairplot, heatmap
- Analyzing time series: plotting data with a datetime index, multiple time slices, inset views
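Histogram equalization maps each pixel through the image's own cumulative distribution function, which stretches a narrow range of gray levels across [0, 1]. This minimal NumPy sketch uses a synthetic low-contrast image in place of a real photo. `equalize` is a hypothetical helper name.

```python
import numpy as np

def equalize(image, n_bins=256):
    """Histogram-equalize a grayscale image by mapping each pixel through the image's CDF."""
    hist, bin_edges = np.histogram(image.ravel(), bins=n_bins)
    cdf = np.cumsum(hist) / image.size  # empirical CDF, ends at 1.0
    # Look up each pixel's CDF value, interpolating between bin left edges
    equalized = np.interp(image.ravel(), bin_edges[:-1], cdf)
    return equalized.reshape(image.shape)

# Hypothetical low-contrast image: integer gray levels squeezed into [100, 130]
rng = np.random.default_rng(0)
img = rng.integers(100, 131, size=(64, 64)).astype(float)
eq = equalize(img)
print(img.min(), img.max(), eq.min(), eq.max())
```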
Interactive Visualization with Bokeh notebook
- Bokeh basics: marker options, drawing geometric shapes using patch(), plotting a pandas DataFrame in Bokeh, the box_select tool, the hover tool, colormaps
- Building interactive apps with Bokeh: connecting Bokeh widgets to Python code, for example generating a fit after the user selects a plot, or changing the plotted data from a selection panel. Widget options include slider, select (dropdown), button, etc.
Time Series Analysis notebook
- Merging Time Series With Different Dates
- Correlation, autocorrelation function
- Linear Regression
- Random Walk
- Stationarity, autoregressive (AR) Models
- Moving Average (MA) Model
- ARMA model
- Cointegration Models
- A Multivariate Time Series
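The AR(1) model x_t = phi * x_{t-1} + e_t is stationary when |phi| < 1, and in that case its lag-1 autocorrelation equals phi. Setting phi = 1 turns it into a random walk, whose first differences are just the white-noise shocks. The NumPy sketch below shows both behaviors. The helper names and parameter values are illustrative. statsmodels provides full AR/MA/ARMA tooling for real analysis.

```python
import numpy as np

def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal shocks e_t."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + shocks[t]
    return x

def lag1_autocorr(x):
    """Pearson correlation between the series and itself shifted by one step."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

ar = simulate_ar1(phi=0.9, n=10_000)
random_walk = simulate_ar1(phi=1.0, n=10_000)  # phi = 1 gives a random walk
print(lag1_autocorr(ar))                   # close to 0.9
print(lag1_autocorr(np.diff(random_walk)))  # close to 0: differences are white noise
```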
Machine Learning for Time Series Data in Python notebook
- Classifying heartbeat sounds: feature engineering and LinearSVC
- Predicting stock prices with regression
- Feature engineering for time series data: envelope, tempogram, spectrogram, bandwidths, centroids
- Autoregressive models
- Cross-validating time series data
- Working with non-stationary data and assessing model stability
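Time series cross-validation must never train on the future. The expanding-window sketch below assumes equal-sized folds, and each split trains on everything before its test fold. scikit-learn's `TimeSeriesSplit` follows the same scheme. This hypothetical helper differs in one way: when `n_samples` is not a multiple of `n_splits + 1`, it simply leaves the leftover samples at the end unused.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) where each test fold comes strictly after its training data."""
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold_size
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))

for train, test in expanding_window_splits(n_samples=12, n_splits=3):
    print(train[-1], test)
# prints: 2 [3, 4, 5], then 5 [6, 7, 8], then 8 [9, 10, 11]
```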
- Analyzed data from the popular mobile game Cookie Cats, using bootstrap analysis to compare how placing the time pause at level 30 versus level 40 affects user retention (notebook)
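Here is a sketch of the bootstrap comparison, using two groups of 0/1 retention flags. Resample each group with replacement, record the difference in retention rates, and read off how often that difference is positive. The retention rates below are synthetic placeholders, not the Cookie Cats results. The helper name is illustrative.

```python
import numpy as np

def bootstrap_diff_in_means(a, b, n_reps=2000, seed=0):
    """Bootstrap replicates of mean(a) - mean(b), resampling each group with replacement."""
    rng = np.random.default_rng(seed)
    reps = np.empty(n_reps)
    for i in range(n_reps):
        a_sample = rng.choice(a, size=len(a), replace=True)
        b_sample = rng.choice(b, size=len(b), replace=True)
        reps[i] = a_sample.mean() - b_sample.mean()
    return reps

# Synthetic (made-up) retention flags: 1 = player came back, 0 = did not
rng = np.random.default_rng(42)
gate_30 = rng.binomial(1, 0.20, size=5000)
gate_40 = rng.binomial(1, 0.17, size=5000)
diffs = bootstrap_diff_in_means(gate_30, gate_40)
print("P(gate_30 retention > gate_40):", np.mean(diffs > 0))
print("95% interval:", np.percentile(diffs, [2.5, 97.5]))
```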
- Statistical analysis in Python: random number generation and hacker statistics, Bernoulli trials, Poisson distribution, normal distribution, exponential distribution, probability functions, generating bootstrap replicates, calculating bootstrap confidence intervals, pairs bootstrap, formulating and simulating a null hypothesis, a pipeline for hypothesis testing, A/B testing, hypothesis test for a correlation coefficient (notebook)
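Here is a sketch of simulating a null hypothesis with a permutation test. The null is that both samples come from one distribution. Shuffle the pooled values, split them back into two groups, and count how often the shuffled difference in means is at least as extreme as the observed one. The arrays are made-up examples, and `permutation_test` is a hypothetical helper.

```python
import numpy as np

def permutation_test(a, b, n_perms=10_000, seed=0):
    """Two-sided p-value for a difference in means under the null that a and b share one distribution."""
    rng = np.random.default_rng(seed)
    observed = np.mean(a) - np.mean(b)
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perms):
        shuffled = rng.permutation(pooled)  # relabel samples at random
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_perms

a = np.array([8.1, 7.9, 8.4, 8.0, 8.3])
b = np.array([7.2, 7.5, 7.1, 7.4, 7.3])
print(permutation_test(a, b))  # small: only 2 of 252 splits are this extreme
```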
Inferential Statistics notebook
- Variance, covariance, and correlation; correlation tests: Pearson, Spearman rank, and Kendall tau
- Chi-square Test of Independence
- McNemar test, independent t-test, paired samples t-test, Welch’s t-test, Wilcoxon signed-rank test
- Analysis of Variance (ANOVA), ANOVA (2-way, N-way)
- Multiple Linear Regression, Logistic Regression
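Welch's t-test drops the equal-variance assumption of the independent t-test. The NumPy sketch below computes its statistic and the Welch-Satterthwaite degrees of freedom on toy data. `welch_t` is a hypothetical helper. `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and also returns a p-value.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)  # squared standard error of mean(a)
    vb = b.var(ddof=1) / len(b)  # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 4), round(df, 4))  # -1.8974 5.8824
```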