Utility-of-Synthetic-Data-in-Machine-learning-tasks.

This repository investigates the utility of synthetic data generated by DataSynthesizer and Synthetic Data Vault (SDV) in machine learning tasks. I applied the Random Forest, Logistic Regression, Support Vector Machine, K-Nearest Neighbor, and Naive Bayes algorithms to the synthetic data and compared the results.
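One common way to make such a comparison is to train the same model once on real data and once on synthetic data, then score both on the same held-out real test set. The sketch below illustrates that idea under stated assumptions: the evaluation protocol is assumed rather than taken from this repository, `TinyKNN` is a hypothetical pure-Python stand-in for a library classifier, `compare_utility` is an illustrative helper name, and the data points are invented toy values, not drawn from any of the datasets here.

```python
import math
from collections import Counter


class TinyKNN:
    """Minimal k-nearest-neighbour classifier (hypothetical stand-in for a library model)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # k-NN has no training step beyond storing the labelled points.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Rank training points by Euclidean distance to x and keep the k closest.
            order = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
            nearest = order[: self.k]
            # Majority vote over the labels of those neighbours.
            preds.append(Counter(self.y[i] for i in nearest).most_common(1)[0][0])
        return preds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_utility(make_model, real_train, synth_train, real_test):
    """Fit one model per training source and score each on the same real test set."""
    X_test, y_test = real_test
    scores = {}
    for name, (X, y) in {"real": real_train, "synthetic": synth_train}.items():
        model = make_model().fit(X, y)
        scores[name] = accuracy(y_test, model.predict(X_test))
    return scores


# Invented toy data: two well-separated classes in two dimensions.
real_train = ([(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)],
              [0, 0, 0, 1, 1, 1])
synth_train = ([(0.05, 0.1), (0.3, 0.0), (0.15, 0.25), (0.95, 0.9), (1.2, 1.0), (0.8, 1.05)],
               [0, 0, 0, 1, 1, 1])
real_test = ([(0.1, 0.1), (1.0, 0.95)], [0, 1])

scores = compare_utility(lambda: TinyKNN(k=3), real_train, synth_train, real_test)
print(scores)
```

Because `compare_utility` only relies on `fit` and `predict`, any classifier exposing those two methods could be passed in through `make_model`. A small gap between the two scores would suggest the synthetic data preserves the signal the model needs.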

I used the Adult (Census Income), Banknote Authentication, Iris, Social Network Ads, and Titanic datasets. The main motivation was the paper "On the Utility of Synthetic Data: An Empirical Evaluation on Machine Learning Tasks" by M. Hittmeir, A. Ekelhart, and R. Mayer.

Links to datasets:

Reference: Markus Hittmeir, Andreas Ekelhart, and Rudolf Mayer. 2019. On the Utility of Synthetic Data: An Empirical Evaluation on Machine Learning Tasks.

AliValiyev/Utility-of-Synthetic-Data-in-Machine-learning-tasks.

Folders and files

Latest commit

History

Repository files navigation

Utility-of-Synthetic-Data-in-Machine-learning-tasks.

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages