The tasks are divided into three main problems:
- Kernel Density Estimation - Estimating probability densities using Gaussian kernels for a uniformly distributed random variable.
- Kernel-Based Classification - Building a classifier to distinguish between two sets of data points (stars and circles) using kernel-based methods and optimization.
- K-Means Clustering - Grouping data into clusters using K-Means and improving clustering accuracy through dimensional extension.
- Generates 1000 samples of a uniformly distributed random variable.
- Approximates the probability density function using Gaussian kernels with different bandwidths (`h` values).
- Analyzes the effect of different bandwidths on the density estimation.
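
The density estimation steps above can be sketched as follows. This is a minimal illustration and not the contents of `ex1.m`: the interval [0, 1], the bandwidth values in `hs`, the evaluation grid, and the fixed seed are all assumptions. It relies on implicit expansion, so it needs MATLAB R2016b or later.

```matlab
% Sketch: Gaussian kernel density estimate for uniform samples.
% Assumed for illustration: samples on [0, 1], the bandwidths in hs,
% the evaluation grid t, and the fixed random seed.
rng(0);                          % fixed seed so the run is repeatable
n  = 1000;                       % number of samples, as in the description
x  = rand(n, 1);                 % uniform samples on [0, 1] (column vector)
t  = linspace(-1, 2, 601);       % evaluation grid (row vector)
hs = [0.01 0.05 0.2];            % illustrative bandwidths

figure; hold on;
for k = 1:numel(hs)
    h = hs(k);
    % n-by-601 matrix: one Gaussian bump per sample, evaluated on the grid
    Kmat = exp(-(t - x).^2 / (2*h^2)) / (h*sqrt(2*pi));
    fhat = mean(Kmat, 1);        % average the bumps over all samples
    plot(t, fhat, 'DisplayName', sprintf('h = %g', h));
end
plot([0 0 1 1], [0 1 1 0], 'k--', 'DisplayName', 'true density');
xlabel('x'); ylabel('estimated density'); legend show; hold off;
```

A small `h` tends to give a spiky estimate that follows individual samples, while a large `h` smooths the edges of the uniform density; the plot makes that trade-off visible.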
- Uses two sets of 2D data points (stars and circles) and builds a classifier using a Gaussian kernel.
- Solves an optimization problem to minimize classification error while controlling the smoothness of the classifier.
- Demonstrates the decision boundary for different values of `h` (bandwidth) and `λ` (regularization parameter).
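
One way to realize such a classifier is regularized least squares in the span of Gaussian kernels centered at the data points. The sketch below assumes that formulation; the description above does not state the exact loss or penalty used in `ex2.m`. The synthetic points stand in for `data32.mat`, whose variable names are not given here, and the kernel form `exp(-d^2/(2*h^2))` and the values of `h` and `lambda` are likewise assumptions.

```matlab
% Sketch: Gaussian-kernel classifier fitted by regularized least squares.
% Assumed for illustration: synthetic data, squared-error loss with a
% kernel-norm penalty, and the values of h and lambda.
rng(1);
n  = 40;                                   % points per class
Xs = 0.5*randn(n, 2) + [ 1  1];            % "stars", label +1
Xc = 0.5*randn(n, 2) + [-1 -1];            % "circles", label -1
X  = [Xs; Xc];
y  = [ones(n, 1); -ones(n, 1)];

h      = 0.5;                              % kernel bandwidth
lambda = 0.1;                              % regularization weight

% Gram matrix K(i,j) = exp(-||x_i - x_j||^2 / (2*h^2))
sq = sum(X.^2, 2);
D2 = max(sq + sq' - 2*(X*X'), 0);          % pairwise squared distances
K  = exp(-D2 / (2*h^2));

% With f = sum_j alpha_j * K(., x_j), minimizing
%   sum_i (y_i - f(x_i))^2 + lambda * alpha' * K * alpha
% leads to the linear system (K + lambda*I) * alpha = y.
alpha = (K + lambda*eye(2*n)) \ y;

% Evaluate f on a grid and draw the decision boundary f = 0
[g1, g2] = meshgrid(linspace(-3, 3, 100));
G  = [g1(:) g2(:)];
Dg = max(sum(G.^2, 2) + sq' - 2*(G*X'), 0);
F  = reshape(exp(-Dg / (2*h^2)) * alpha, size(g1));

figure; hold on;
plot(Xs(:, 1), Xs(:, 2), 'p');             % stars
plot(Xc(:, 1), Xc(:, 2), 'o');             % circles
contour(g1, g2, F, [0 0], 'k');            % zero level set of f
hold off;

trainAcc = mean(sign(K*alpha) == y);       % fraction of training points classified correctly
```

Rerunning with a smaller `h` or a smaller `lambda` generally produces a more wiggly boundary, and larger values produce a smoother one.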
- Implements the K-Means algorithm to group data into clusters based on proximity to randomly chosen representatives.
- Improves clustering performance by extending the data with an additional dimension.
- Compares results before and after dimensional extension.
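
The clustering steps above can be sketched on data where plain K-Means is known to struggle. Everything specific here is an assumption and not taken from `ex3.m` or `data33.mat`: the two concentric rings, the use of the squared norm as the extra coordinate, and the 20 random restarts that keep the lowest-cost run.

```matlab
% Sketch: K-Means before and after adding a norm-based coordinate.
% Assumed for illustration: two concentric rings as data, the squared
% norm as the extra coordinate, and 20 random restarts per data set.
rng(2);
n     = 200;                                   % points per ring
theta = 2*pi*rand(2*n, 1);
r     = [1 + 0.1*randn(n, 1); 4 + 0.1*randn(n, 1)];
X     = [r.*cos(theta), r.*sin(theta)];
truth = [ones(n, 1); 2*ones(n, 1)];            % ring membership

K        = 2;
datasets = {X, [X, sum(X.^2, 2)]};             % original and extended data
acc      = zeros(1, 2);

for d = 1:2
    Z = datasets{d};
    bestCost = inf;
    for trial = 1:20
        C = Z(randperm(size(Z, 1), K), :);     % random representatives
        for it = 1:100
            % squared distance from every point to every representative
            D = sum(Z.^2, 2) + sum(C.^2, 2)' - 2*(Z*C');
            [dmin, lab] = min(D, [], 2);       % assign to the nearest one
            Cnew = C;
            for k = 1:K
                if any(lab == k)               % keep old center if cluster is empty
                    Cnew(k, :) = mean(Z(lab == k, :), 1);
                end
            end
            if isequal(Cnew, C), break; end    % assignments have stabilized
            C = Cnew;
        end
        cost = sum(dmin);                      % within-cluster sum of squares
        if cost < bestCost
            bestCost = cost;
            bestLab  = lab;
        end
    end
    % cluster labels are arbitrary, so score both labelings and keep the better
    acc(d) = max(mean(bestLab == truth), mean(bestLab ~= truth));
end
fprintf('Accuracy before extension: %.2f, after: %.2f\n', acc(1), acc(2));
```

In the plane the two rings share a center, so no pair of representatives separates them by proximity alone. The extra coordinate places the rings at different heights, which is what lets the same algorithm tell them apart.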
- `data32.mat`, `data33.mat`: Data files used for kernel-based classification and K-Means clustering.
- `ex1.m`, `ex2.m`, `ex3.m`: MATLAB scripts corresponding to each problem.
- Setup
  - Ensure MATLAB is installed.
  - Load data files using `load('data32.mat')` or `load('data33.mat')`.
- Running the Scripts
  - `ex1.m`: Runs kernel density estimation.
  - `ex2.m`: Executes kernel-based classification.
  - `ex3.m`: Performs K-Means clustering.
- Visualization
  - Results are displayed as plots showing the density approximation, decision boundaries, and clustering results.
- Method: Gaussian kernels with varying bandwidths to estimate probability densities.
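
For reference, the standard Gaussian kernel density estimator for samples $x_1, \dots, x_n$ and bandwidth $h$ is written below. The symbols $\hat{f}_h$, $x_i$, and $n$ are notation introduced here; only $h$ appears in the description, and the scripts may scale the kernel differently.

$$
\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h\sqrt{2\pi}} \exp\!\left(-\frac{(x - x_i)^2}{2h^2}\right)
$$

Each sample contributes one Gaussian bump of width $h$, and the factor $1/n$ keeps the total area equal to one.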
- Data: 2D points representing stars (labeled as 1) and circles (labeled as -1).
- Optimization Problem: Minimizes error while controlling smoothness using a kernel function.
- Standard K-Means: Groups data based on proximity to cluster centers.
- Extended Dimension: Adds a new coordinate based on the norm of data points to improve clustering.