
Training data format

Chris Churas edited this page Dec 9, 2016 · 7 revisions

This page describes the expected format of the training data which is used as input for training CHM.

CHM Training Input Data Format

CHM is a supervised learning algorithm, so it must be "taught" how to segment the desired features. To "teach" CHM what to segment, training examples must be passed to CHM Train.

Training Example

A single training example is composed of two 8-bit images.

The first image, known as the "image" (8-bit grayscale), is a representative subsection of the data that CHM will be run on.

The second image, known as the "label" (8-bit binary, i.e., values 0 or 1), contains a mask (value 1) marking the areas in the "image" that CHM should learn to distinguish.
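A minimal sketch of the constraints on one training example, assuming both images have already been decoded into rows of integer pixel values. The `Raster` type and the `validate_example` function are hypothetical helpers, not part of CHM. The sketch also assumes that the image and label have identical dimensions, which the overlay figure suggests but this page does not state explicitly.

```python
from typing import List

Raster = List[List[int]]  # rows of pixel values (hypothetical in-memory form)


def validate_example(image: Raster, label: Raster) -> List[str]:
    """Return a list of problems with an image/label pair (empty if none)."""
    problems: List[str] = []
    # Assumption: the mask must cover exactly the same pixels as the image.
    if len(image) != len(label) or any(
        len(i_row) != len(l_row) for i_row, l_row in zip(image, label)
    ):
        problems.append("image and label dimensions differ")
    # The "image" is 8-bit grayscale, so every value must fit in 0-255.
    if any(not 0 <= v <= 255 for row in image for v in row):
        problems.append("image values must fit in 8 bits (0-255)")
    # The "label" is binary: 1 marks the feature, 0 everything else.
    if any(v not in (0, 1) for row in label for v in row):
        problems.append("label values must be 0 or 1")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easy to report everything wrong with a pair at once.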

Below is a picture of an image, its label, and the two overlaid together. In the label, the white areas are pixels with value 1; they form the mask, i.e., the areas that CHM should learn to segment.

Schematic of how training and label images are linked
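To produce an overlay like the one in the picture, a sketch such as the following tints every label pixel with value 1 on top of the grayscale image. The `overlay` function, the red tint, and the `alpha` blending factor are illustrative assumptions; this is not necessarily how the figure was made.

```python
from typing import List, Tuple

Raster = List[List[int]]  # rows of pixel values (hypothetical in-memory form)
RGB = Tuple[int, int, int]


def overlay(image: Raster, label: Raster, alpha: float = 0.5) -> List[List[RGB]]:
    """Tint label == 1 pixels red on top of a grayscale image."""
    out: List[List[RGB]] = []
    for img_row, lbl_row in zip(image, label):
        row: List[RGB] = []
        for g, m in zip(img_row, lbl_row):
            if m == 1:
                # Blend the gray value toward pure red (255, 0, 0).
                base = (1 - alpha) * g
                row.append((round(base + alpha * 255), round(base), round(base)))
            else:
                # Background pixels stay gray (R = G = B).
                row.append((g, g, g))
        out.append(row)
    return out
```

With `alpha=1.0` mask pixels become solid red; smaller values let the underlying image show through.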

Image Format

The image should be an 8-bit grayscale PNG file.

Example of 8-bit grayscale png
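Most imaging tools can save this format directly (in Pillow, for instance, it corresponds to image mode "L"). The sketch below instead uses only the Python standard library to show what "8-bit grayscale PNG" means at the byte level: the IHDR header declares bit depth 8 and color type 0. The functions `encode_gray8_png` and `is_gray8_png` are hypothetical helpers, and the encoder covers only the simplest case (no interlacing, filter type 0 on every row).

```python
import struct
import zlib
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    # Each PNG chunk is: length, type, data, then CRC-32 over type + data.
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_gray8_png(rows: List[List[int]]) -> bytes:
    """Encode rows of 0-255 values as an 8-bit grayscale PNG."""
    height, width = len(rows), len(rows[0])
    # Bit depth 8, color type 0 (grayscale), default compression/filter, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline is prefixed with filter type 0 ("None").
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


def is_gray8_png(data: bytes) -> bool:
    """True if data starts with a PNG header declaring 8-bit grayscale."""
    # Signature (8) + IHDR length (4) + "IHDR" (4) + width/height (8)
    # puts bit depth at offset 24 and color type at offset 25.
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return False
    return data[24] == 8 and data[25] == 0
```

To check a file on disk, pass its contents, e.g. `is_gray8_png(Path("images/example.png").read_bytes())` with `from pathlib import Path`; the file name here is only a placeholder.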

Label Format

The label should be an 8-bit grayscale PNG file, with the added restriction that its pixel values can only be 0 for non-feature regions and 1 for regions CHM should train on.
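Masks painted in an image editor usually mark features in white (255) rather than with the value 1. A sketch of converting such a mask to the 0/1 convention, assuming a hypothetical `binarize_mask` helper and an arbitrary threshold of 128 that may need adjusting for anti-aliased brush edges:

```python
from typing import List

Raster = List[List[int]]  # rows of pixel values (hypothetical in-memory form)


def binarize_mask(mask: Raster, threshold: int = 128) -> Raster:
    """Map a painted mask (0 = background, 255 = feature) to 0/1 label values."""
    # Pixels at or above the threshold become 1 (train on); the rest become 0.
    return [[1 if v >= threshold else 0 for v in row] for row in mask]
```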

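NOTE: The label image below may appear completely black. This is expected: feature pixels have a value of only 1 out of a possible 255, which is visually almost indistinguishable from black (0).

Example of 8-bit binary label image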
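To inspect a label by eye, its values can be stretched from 0/1 to 0/255 so that features display as white. The hypothetical `label_preview` helper below is meant for viewing only; its output no longer satisfies the 0/1 label format and should not be used for training.

```python
from typing import List

Raster = List[List[int]]  # rows of pixel values (hypothetical in-memory form)


def label_preview(label: Raster) -> Raster:
    """Stretch a 0/1 label to 0/255 so features show up as white."""
    # For viewing only: the result is no longer a valid CHM label.
    return [[255 * v for v in row] for row in label]
```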

Directory structure of Training Dataset

A CHM-compatible training dataset consists of two directories: images and labels. The image file of each training example goes into the images directory, and its label file goes into the labels directory. The two files should have the same name and a .png extension.

Here is a graphic showing the desired structure:

Structure of images and labels directory and files
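A sketch of a pre-flight check for this layout, assuming the images and labels directories sit under a common root directory. The function names `compare_names` and `check_training_dir` are hypothetical, and files are paired by exact file name, following the naming rule above.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set


def compare_names(image_names: Iterable[str], label_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Split .png file names into matched pairs and files missing a partner."""
    images = {n for n in image_names if n.endswith(".png")}
    labels = {n for n in label_names if n.endswith(".png")}
    return {
        "paired": images & labels,      # usable training examples
        "image_only": images - labels,  # image with no matching label
        "label_only": labels - images,  # label with no matching image
    }


def check_training_dir(root: str) -> Dict[str, Set[str]]:
    """Apply compare_names to <root>/images and <root>/labels."""
    base = Path(root)
    listings: List[List[str]] = []
    for sub in ("images", "labels"):
        folder = base / sub
        if not folder.is_dir():
            raise FileNotFoundError(f"missing directory: {folder}")
        listings.append([p.name for p in folder.iterdir() if p.is_file()])
    return compare_names(listings[0], listings[1])
```

Any name reported under `image_only` or `label_only` lacks a partner and should be fixed before passing the dataset to CHM Train.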