Anomaly Detection for Tabular Data
Fraud detection is a popular application of anomaly detection. Because fraud cases are rare compared to non-fraud cases, the problem is a natural fit for outlier detection. In this repo, we go over a popular dataset known as the "Credit Card Fraud Detection" dataset.
Here are some features of the dataset:
- Contains about 284k credit card transactions of various amounts made by cardholders in Europe.
- Each transaction belongs to one of 2 classes: fraud or non-fraud.
- Features include the amount per transaction. Most features have been anonymized for confidentiality reasons, using Principal Component Analysis.
- The dataset is heavily imbalanced: only 0.172% of cases are considered fraudulent.
- The imbalance must be handled with appropriate metrics (i.e., not accuracy).
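To see why accuracy is misleading at this level of imbalance, here is a minimal sketch using synthetic labels, not the real dataset. The sample size `N_TOTAL` and the helpers `confusion_counts` and `safe_div` are hypothetical names introduced for illustration; only the 0.172% fraud rate comes from the description above. A baseline that labels every transaction as non-fraud scores very high on accuracy while catching no fraud at all.

```python
# Illustrative only: synthetic labels built from the quoted fraud rate.
N_TOTAL = 100_000        # hypothetical sample size (assumption)
FRAUD_RATE = 0.00172     # 0.172% of cases are fraudulent

n_fraud = round(N_TOTAL * FRAUD_RATE)
y_true = [1] * n_fraud + [0] * (N_TOTAL - n_fraud)  # 1 = fraud, 0 = non-fraud

# Trivial baseline: predict "non-fraud" for every transaction.
y_pred = [0] * N_TOTAL


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating fraud (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = safe_div(tp + tn, len(y_true))
precision = safe_div(tp, tp + fp)   # of flagged transactions, how many are fraud
recall = safe_div(tp, tp + fn)      # of fraud cases, how many were flagged
f1 = safe_div(2 * precision * recall, precision + recall)

print(f"fraud cases: {n_fraud}")
print(f"accuracy:    {accuracy:.5f}")
print(f"precision:   {precision:.5f}")
print(f"recall:      {recall:.5f}")
print(f"f1:          {f1:.5f}")
```

The baseline is right on almost every transaction, yet its recall on the fraud class is zero. Metrics that focus on the positive class, such as precision, recall, or F1, expose this failure where accuracy hides it.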