
An artificial data generator with control over linear separability.


nejci/data-generator-lin-sep


Data generator with a control over linear separability


We propose a new data generator that is useful for systematically benchmarking classification and clustering algorithms.

Features

  • A user can adjust:
    • how many pairs of classes must be linearly non-separable
    • the number of classes
    • the number of data points in each class
    • the probability distribution of the data points
    • the minimum distance between each pair of classes
    • the shape of the point set that forms each class
  • 38 class shapes of varying difficulty are available.
  • The output is a two-dimensional dataset.
  • The generator is easy to run in batch mode: call the function createDataset() repeatedly with different parameters.
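To check a generated dataset against the separability and minimum-distance settings listed above, one can test every pair of classes independently of the generator. The following is a minimal Python sketch, not part of this repository; the function names (`convex_hull`, `linearly_separable`, `min_class_distance`, `pairwise_report`) are hypothetical. It assumes that "linearly separable" means strictly separable by a straight line and that the distance between two classes is the Euclidean distance between their closest points; the generator itself may define either notion differently. It relies on the fact that two finite 2-D point sets are strictly linearly separable exactly when their convex hulls are disjoint, which it tests with the separating axis theorem.

```python
from itertools import combinations
import math


def convex_hull(points):
    """Andrew's monotone chain on (x, y) tuples; returns hull vertices
    counter-clockwise. Collinear input yields its two endpoints and a
    single point yields [point]."""
    pts = sorted(set(points))
    if len(pts) <= 1:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def linearly_separable(a, b):
    """True if some line strictly separates point sets a and b,
    i.e. their convex hulls are disjoint (assumed definition)."""
    ha, hb = convex_hull(a), convex_hull(b)
    # Candidate separating axes: the offset between the hulls (covers two
    # single-point classes) plus the direction and normal of every hull edge
    # (directions cover classes lying on a common line).
    axes = [(hb[0][0] - ha[0][0], hb[0][1] - ha[0][1])]
    for h in (ha, hb):
        for i in range(len(h)):
            p, q = h[i], h[(i + 1) % len(h)]
            dx, dy = q[0] - p[0], q[1] - p[1]
            axes += [(dx, dy), (-dy, dx)]
    for ux, uy in axes:
        # Project both hulls onto the axis; a strict gap means separable.
        pa = [ux * x + uy * y for x, y in ha]
        pb = [ux * x + uy * y for x, y in hb]
        if max(pa) < min(pb) or max(pb) < min(pa):
            return True
    return False


def min_class_distance(a, b):
    """Euclidean distance between the closest pair of points from a and b."""
    return min(math.dist(p, q) for p in a for q in b)


def pairwise_report(X, y):
    """Map each pair of class labels to (linearly separable?, minimum distance)."""
    groups = {}
    for point, label in zip(X, y):
        groups.setdefault(label, []).append(tuple(point))
    return {
        (c1, c2): (linearly_separable(groups[c1], groups[c2]),
                   min_class_distance(groups[c1], groups[c2]))
        for c1, c2 in combinations(sorted(groups), 2)
    }


# Usage: classes 0 and 1 form an XOR pattern; class 2 lies far away.
X = [(0, 0), (1, 1), (0, 1), (1, 0), (5, 5), (6, 5)]
y = [0, 0, 1, 1, 2, 2]
report = pairwise_report(X, y)
non_separable = sum(1 for separable, _ in report.values() if not separable)
```

Here `non_separable` is 1, because only the XOR pair (0, 1) cannot be split by a line. The brute-force distance is quadratic in the class sizes, which is adequate for checking small benchmark datasets.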

Getting started

See the examples folder for demonstration datasets, or run the script examples.m, which generates them.

Publications

Acknowledgements

In this project we reused the code from: