GitHub - antoniopenta/ml-project-structure: Easy and powerful template for ML projects

Easy and powerful template for ML projects

Prerequisites

pip install -r requirements.txt

Getting Started

The following command creates the structure specified in template.json for your ML project:

python build.py -dir project_example/ -template_file template.json

An example of a structure that can be defined in template.json:

{
  "num_algorithm": 2,
  "num_training_testing_validation": 2,
  "directory_training_suffix": "training",
  "datasets": [
    "dataset_1"
  ],
  "main_directories": [
    "config_pipelines",
    "data_experiments",
    "@framework",
    "@jupyter",
    "@luigi_pipeline"
  ],
  "sub_directories": [
    {
      "father": "config_pipelines",
      "dirs": [
        "directory_algoritm_suffix*num_algorithm",
        "data_generation",
        "data_processing",
        "metrics"
      ]
    }
  ]
}

@ in "@framework" marks the folder as a Python module.
num_algorithm specifies how many algorithms you would like to test.
"directory_algoritm_suffix*num_algorithm" generates multiple folders: the part before * (directory_algoritm_suffix) names the suffix and the part after * (num_algorithm) names the number of folders, and both are specified in the template too.
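
The two conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in build.py: the function names expand_entry and build_tree are hypothetical, the zero-based folder naming (suffix followed by 0, 1, ...) is a guess, and creating an empty __init__.py is assumed to be what "python module" means for an @ folder.

```python
from __future__ import annotations

from pathlib import Path


def expand_entry(entry: str, template: dict) -> list[str]:
    """Expand a 'suffix_key*count_key' entry into numbered folder names.

    Entries without '*' are returned unchanged.
    """
    if "*" not in entry:
        return [entry]
    suffix_key, count_key = entry.split("*", 1)
    # Assumption: if the suffix key is missing from the template,
    # fall back to the key name itself.
    suffix = template.get(suffix_key, suffix_key)
    count = int(template[count_key])
    # Assumption: folders are numbered from 0 (suffix0, suffix1, ...).
    return [f"{suffix}{i}" for i in range(count)]


def build_tree(root: Path, template: dict) -> list[Path]:
    """Create the main directories and sub-directories under root."""
    created = []
    for name in template.get("main_directories", []):
        folder = root / name.lstrip("@")
        folder.mkdir(parents=True, exist_ok=True)
        if name.startswith("@"):
            # Assumption: a "python module" folder gets an empty __init__.py.
            (folder / "__init__.py").touch()
        created.append(folder)
    for group in template.get("sub_directories", []):
        parent = root / group["father"].lstrip("@")
        for entry in group["dirs"]:
            for name in expand_entry(entry, template):
                folder = parent / name
                folder.mkdir(parents=True, exist_ok=True)
                created.append(folder)
    return created
```

With num_algorithm set to 2 and a suffix value of "algorithm", expand_entry("directory_algoritm_suffix*num_algorithm", template) would return two folder names under this sketch's numbering assumption.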

Luigi Pipeline for experiments

In the folder pipeline_example, there is a dummy example of how to use a Luigi pipeline to evaluate a KMeans algorithm.

More info on the amazing framework Luigi (or Gigino, to friends in Naples) can be found here: https://github.com/spotify/luigi

The main idea is to define the experiments in an Excel sheet, as follows:

experiment	diminstance	clusters	n_features	random_state	file_dataframe	file_label_true	k	file_label_predicted	file_metrics
1	100@data_generation	10@data_generation	5@data_generation	0@data_generation	data_experiments/data_generation/file_dataframe.csv@file	data_experiments/data_generation/file_label_true.csv@file	10@kmeansalgo0	data_experiments/algorithm0/file_label_predicted_algorithm0_1.csv@file	data_experiments/metrics/metrics_algorithm_1.csv@file
2	100@data_generation	10@data_generation	5@data_generation	0@data_generation	data_experiments/data_generation/file_dataframe.csv@file	data_experiments/data_generation/file_label_true.csv@file	20@kmeansalgo0	data_experiments/algorithm0/file_label_predicted_algorithm0_2.csv@file	data_experiments/metrics/metrics_algorithm_2.csv@file
3	100@data_generation	10@data_generation	5@data_generation	0@data_generation	data_experiments/data_generation/file_dataframe.csv@file	data_experiments/data_generation/file_label_true.csv@file	30@kmeansalgo0	data_experiments/algorithm0/file_label_predicted_algorithm0_3.csv@file	data_experiments/metrics/metrics_algorithm_3.csv@file

Each row is an experiment, and each column is an attribute of the configuration file. In each cell, @ separates the value from the key of the dictionary (the section) in the configuration file that the attribute belongs to. For example:

experiment	k
1	10@kmeansalgo0

becomes, in the configuration file:

[kmeansalgo0]
k = 10
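
The value@key convention can be sketched with the standard configparser module. This is a minimal illustration, not the logic of update_config_files.py: the function name row_to_config is hypothetical, the row is given as a plain dictionary rather than read from Excel, and cells without @ (such as the experiment number) are assumed to be skipped.

```python
import configparser
import io


def row_to_config(row: dict) -> configparser.ConfigParser:
    """Turn one experiment row (column -> 'value@section') into a config."""
    config = configparser.ConfigParser()
    for column, cell in row.items():
        text = str(cell)
        if "@" not in text:
            # Assumption: cells without '@' (e.g. the experiment id) are skipped.
            continue
        # Split on the last '@' so the value itself may contain '@'.
        value, section = text.rsplit("@", 1)
        if not config.has_section(section):
            config.add_section(section)
        config.set(section, column, value)
    return config


# Two columns taken from experiment 1 in the table above.
row = {"experiment": 1, "diminstance": "100@data_generation", "k": "10@kmeansalgo0"}
config = row_to_config(row)

buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```

Splitting on the last @ is a design choice of this sketch: it keeps file paths or other values intact even if they happen to contain the character.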

The configuration file is extracted from the Excel file by the Python script update_config_files.py.

The bash file exp_cluster.sh is used to run the pipeline.

The following command creates the configuration file using the data defined in experiment 1:

python scripts/update_config_files.py -excel_file experimental_settings/experiments_metafile.xlsx -sheet exp_cluster -experiment 1 -conf_file config_pipelines/data_generation/evaluation_pipeline.conf

Then the pipeline is launched using the configuration file created above:

luigi --module luigi_pipeline.evaluation_pipeline GenerateData --conf config_pipelines/data_generation/evaluation_pipeline.conf --local-scheduler --no-lock
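
To repeat these two steps for several experiments, the commands can be assembled in Python. This is a hedged sketch and not the content of exp_cluster.sh: the function name commands_for and the loop over experiments 1 to 3 are assumptions, while the paths and flags are copied from the commands above. The subprocess call is left commented out so the sketch only prints what it would run.

```python
from __future__ import annotations

import shlex

EXCEL_FILE = "experimental_settings/experiments_metafile.xlsx"
CONF_FILE = "config_pipelines/data_generation/evaluation_pipeline.conf"


def commands_for(experiment: int) -> list[list[str]]:
    """Return the two commands (update config, run pipeline) for one experiment."""
    update = [
        "python", "scripts/update_config_files.py",
        "-excel_file", EXCEL_FILE,
        "-sheet", "exp_cluster",
        "-experiment", str(experiment),
        "-conf_file", CONF_FILE,
    ]
    run = [
        "luigi", "--module", "luigi_pipeline.evaluation_pipeline",
        "GenerateData",
        "--conf", CONF_FILE,
        "--local-scheduler", "--no-lock",
    ]
    return [update, run]


# Assumption: experiments 1 to 3, matching the rows of the table above.
for experiment in (1, 2, 3):
    for command in commands_for(experiment):
        print(shlex.join(command))
        # To actually execute, uncomment the next line (requires `import subprocess`):
        # subprocess.run(command, check=True)
```

Because each experiment overwrites the same configuration file, the commands have to run one after the other in this order.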

Authors

Antonio Penta
