Yet Another Connected Components Labeling Benchmark

| OS | Build | Compiler | OpenCV | CMake | GPU | GitHub Actions | Jenkins |
|---|---|---|---|---|---|---|---|
| Ubuntu 18.04 LTS | x64 | gcc 9.3.0 | 4.1.2 | 3.13.5 | None | Build Status | N/A |
| MacOS (Darwin 19.6.0) | x64 | AppleClang 12.0.0 (Xcode-12) | 3.1.0 | 3.13.0 | None | Build Status | N/A |
| Ubuntu 16.04 LTS | x64 | gcc 5.4.0 | 4.4 | 3.10.3 | 2080Ti, CUDA 9.2 | N/A | Action Status |
| Ubuntu 20.04.02 LTS | x64 | gcc 9.3.0 | 4.4 | 3.10.3 | 2080Ti, CUDA 11.4 | N/A | Action Status |

Please include the following references when citing the YACCLAB project/dataset:

  • Allegretti, Stefano; Bolelli, Federico; Grana, Costantino "Optimized Block-Based Algorithms to Label Connected Components on GPUs." IEEE Transactions on Parallel and Distributed Systems, 2019. BibTex. PDF.

  • Bolelli, Federico; Cancilla, Michele; Baraldi, Lorenzo; Grana, Costantino "Towards Reliable Experiments on the Performance of Connected Components Labeling Algorithms" Journal of Real-Time Image Processing, 2018. BibTex. PDF.

  • Grana, Costantino; Bolelli, Federico; Baraldi, Lorenzo; Vezzani, Roberto "YACCLAB - Yet Another Connected Components Labeling Benchmark" Proceedings of the 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4-8 Dec 2016. BibTex. PDF.

YACCLAB is an open source C++ project that enables researchers to test CCL algorithms under a wide range of conditions, running them on the collection of datasets described below. The benchmark performs the following tests, each described later in this readme: correctness, average run-time (average), average run-time with steps (average_ws), density, size, granularity and memory accesses (memory).

Notice that 8-connectivity is always used in the project.
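
To make the 8-connectivity convention concrete, here is a minimal sketch of a naive flood-fill labeling in which diagonal neighbors belong to the same component. It is not one of the algorithms included in YACCLAB: the function name LabelImage8 and the row-major std::vector image layout are hypothetical choices made only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive flood-fill labeling with 8-connectivity (illustrative only, not a YACCLAB algorithm).
// `img` is a row-major binary image: 0 = background, non-zero = foreground.
// Returns the number of components; `labels` gets 0 for background and 1..n for components.
int LabelImage8(const std::vector<std::uint8_t>& img, int rows, int cols,
                std::vector<int>& labels)
{
    assert(rows >= 0 && cols >= 0);
    assert(img.size() == static_cast<std::size_t>(rows) * cols);
    labels.assign(img.size(), 0);
    int n_labels = 0;
    std::vector<int> stack; // linear indices of pixels still to be expanded

    for (int start = 0; start < rows * cols; ++start) {
        if (img[start] == 0 || labels[start] != 0) continue;
        labels[start] = ++n_labels;
        stack.push_back(start);
        while (!stack.empty()) {
            const int p = stack.back();
            stack.pop_back();
            const int r = p / cols, c = p % cols;
            // Visit the 8 neighbors: horizontal, vertical and diagonal ones.
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int nr = r + dr, nc = c + dc;
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    const int q = nr * cols + nc;
                    if (img[q] != 0 && labels[q] == 0) {
                        labels[q] = n_labels;
                        stack.push_back(q);
                    }
                }
            }
        }
    }
    return n_labels;
}
```

With this convention, three foreground pixels placed on the diagonal of a 3x3 image form a single component, whereas under 4-connectivity they would be three separate ones.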

Reproducible Research

This project follows the Reproducible Research paradigms and received the Reproducible Label in Pattern Recognition (RLPR).

Requirements

To correctly install and run YACCLAB, the following packages, libraries and utilities are needed:

GPU algorithms also require:

  • CUDA Toolkit 9.2 or higher (https://developer.nvidia.com/cuda-toolkit) and OpenCV cudafeatures2d package (as of OpenCV 4.5.3, package dependencies entail that required packages for CUDA algorithms are core, cudafeatures2d, cudaarithm, cudafilters, cudaimgproc, cudawarping, cudev, features2d, imgcodecs, imgproc).

Notes for gnuplot:

  • on Windows: be sure to add gnuplot to the system path if you want YACCLAB to generate charts automatically.
  • on MacOS: the 'pdf terminal' seems to be unavailable due to an old version of cairo, so 'postscript' is used instead.

Installation (refer to the image below)

  • Clone the GitHub repository (HTTPS clone URL: https://github.com/prittt/YACCLAB.git) or simply download the full master branch zip file and extract it (e.g., into a YACCLAB folder).

  • Install the software in the YACCLAB/bin subfolder (suggested) or wherever you want using CMake (point 2 of the example image). Note that CMake should automatically find the OpenCV path if it is correctly installed on your OS (3), download the YACCLAB Dataset (be sure to check the boxes (4a) and (4b) if you want to download it, or to select the correct path if the dataset is already on your file system (7)), and create a C++ project for the selected IDE/compiler (9-10). Moreover, if you want to test 3D or GPU algorithms, tick the corresponding boxes (5) and (6).

Cmake

  • Set the configuration file (config.yaml) placed in the installation folder (bin in this example) in order to select desired tests.

  • Open the project, compile and run it: the work is done!

CMake Configuration Variables

| Name | Meaning | Default |
|---|---|---|
| YACCLAB_DOWNLOAD_DATASET | whether to automatically download the 2D YACCLAB dataset or not | OFF |
| YACCLAB_DOWNLOAD_DATASET_3D | whether to automatically download the 3D YACCLAB dataset or not | OFF |
| YACCLAB_ENABLE_3D | enable/disable the support for 3D algorithms | OFF |
| YACCLAB_ENABLE_CUDA | enable/disable CUDA support | OFF |
| YACCLAB_ENABLE_EPDT_19C | enable/disable the EPDT_19C 3D algorithm, which is based on a heuristic decision tree generated from a 3D mask with 19 conditions (may noticeably increase compilation time); it has no effect when YACCLAB_ENABLE_3D is OFF | OFF |
| YACCLAB_ENABLE_EPDT_22C | enable/disable the EPDT_22C 3D algorithm, which is based on a heuristic decision tree generated from a 3D mask with 22 conditions (may noticeably increase compilation time); it has no effect when YACCLAB_ENABLE_3D is OFF | OFF |
| YACCLAB_ENABLE_EPDT_26C | enable/disable the EPDT_26C 3D algorithm, which is based on a heuristic decision tree generated from a 3D mask with 26 conditions (may noticeably increase compilation time); it has no effect when YACCLAB_ENABLE_3D is OFF | OFF |
| YACCLAB_FORCE_CONFIG_GENERATION | whether to force the generation of the default configuration file (config.yaml) or not. When this flag is turned OFF, any existing configuration file will not be overwritten | OFF |
| YACCLAB_INPUT_DATASET_PATH | path to the input dataset folder, where to find test datasets | ${CMAKE_INSTALL_PREFIX}/input |
| YACCLAB_OUTPUT_RESULTS_PATH | path to the output folder, where to save output results | ${CMAKE_INSTALL_PREFIX}/output |
| OpenCV_DIR | OpenCV installation path | - |

How to include a YACCLAB algorithm into your own project?

If your project requires a Connected Components Labeling algorithm and you are not interested in the whole YACCLAB benchmark, you can use the connectedComponents function of the OpenCV library, which implements the BBDT and SAUF algorithms since version 3.2, and the Spaghetti Labeling algorithm and BKE (for GPU only) since version 4.6.

However, when the connectedComponents function is called, a lot of additional code is executed together with the core function. If your project requires the best performance, you can include an algorithm implemented in YACCLAB by adding the following files to your project:

  1. labeling_algorithms.h and labeling_algorithms.cc, which define the base class from which every algorithm derives;
  2. yacclab_tensor.h and yacclab_tensor.cc, which define input and output data tensors;
  3. labels_solver.h and labels_solver.cc, which contain the implementation of the labels solving algorithms;
  4. memory_tester.h, performance_evaluator.h, volume_util.h, volume_util.cc, utilities.h, utilities.cc, system_info.h, system_info.cc, check_labeling.h, check_labeling.cc, file_manager.h, file_manager.cc, stream_demultiplexer.h, config_data.h, register.h, yacclab_test.h, progress_bar.h, cuda_mat3.hpp, cuda_types3.hpp, and cuda_mat3.inl.hpp, just to make things work without changing the code;
  5. header and source files of the required algorithm/s. The association between algorithms and header/source files is reported in the tables below.

2D/3D CPU Algorithms

| Algorithm Name | Authors | Year | Acronym | Required Files | Templated on Labels Solver |
|---|---|---|---|---|---|
| - | L. Di Stefano, A. Bulgarelli [3] | 1999 | DiStefano | labeling_distefano_1999.h | |
| Contour Tracing | F. Chang, C.J. Chen, C.J. Lu [1] | 2003 | CT | labeling_fchang_2003.h | |
| Run-Based Two-Scan | L. He, Y. Chao, K. Suzuki [30] | 2008 | RBTS | labeling_he_2008.h | |
| Scan Array-based with Union Find | K. Wu, E. Otoo, K. Suzuki [6] | 2009 | SAUF | labeling_wu_2009.h, labeling_wu_2009_tree.inc | |
| Stripe-Based Labeling Algorithm | H.L. Zhao, Y.B. Fan, T.X. Zhang, H.S. Sang [8] | 2010 | SBLA | labeling_zhao_2010.h | |
| Block-Based with Decision Tree | C. Grana, D. Borghesani, R. Cucchiara [4] | 2010 | BBDT | labeling_grana_2010.h, labeling_grana_2010_tree.inc | |
| Configuration Transition Based | L. He, X. Zhao, Y. Chao, K. Suzuki [7] | 2014 | CTB | labeling_he_2014.h, labeling_he_2014_graph.inc | |
| Block-Based with Binary Decision Trees | W.Y. Chang, C.C. Chiu, J.H. Yang [2] | 2015 | CCIT | labeling_wychang_2015.h, labeling_wychang_2015_tree.inc, labeling_wychang_2015_tree_0.inc | |
| Light Speed Labeling | L. Cabaret, L. Lacassagne, D. Etiemble [5] | 2016 | LSL_STD (I), LSL_STDZ (II), LSL_RLE (III) | labeling_lacassagne_2016.h, labeling_lacassagne_2016_code.inc | (IV) |
| Pixel Prediction | C. Grana, L. Baraldi, F. Bolelli [9] | 2016 | PRED | labeling_grana_2016.h, labeling_grana_2016_forest.inc, labeling_grana_2016_forest_0.inc | |
| Directed Rooted Acyclic Graph | F. Bolelli, L. Baraldi, M. Cancilla, C. Grana [23] | 2018 | DRAG | labeling_bolelli_2018.h, labeling_grana_2018_drag.inc | |
| Spaghetti Labeling | F. Bolelli, S. Allegretti, L. Baraldi, C. Grana [26] | 2019 | Spaghetti | labeling_bolelli_2019.h, labeling_bolelli_2019_forest.inc, labeling_bolelli_2019_forest_firstline.inc, labeling_bolelli_2019_forest_lastline.inc, labeling_bolelli_2019_forest_singleline.inc | |
| PRED++ | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | PREDpp | labeling_PREDpp_2021.h, labeling_PREDpp_2021_center_line_forest_code.inc.h, labeling_PREDpp_2021_first_line_forest_code.inc.h | |
| Tagliatelle Labeling | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | Tagliatelle | labeling_tagliatelle_2021.h, labeling_tagliatelle_2021_center_line_forest_code.inc.h, labeling_tagliatelle_2021_first_line_forest_code.inc.h, labeling_tagliatelle_2021_last_line_forest_code.inc.h, labeling_tagliatelle_2021_single_line_forest_code.inc.h | |
| Bit-Run Two Scan | W. Lee, F. Bolelli, S. Allegretti, C. Grana [32] | 2021 | BRTS (VII) | labeling_lee_2021_brts.h | |
| Bit-Merge-Run Scan | W. Lee, F. Bolelli, S. Allegretti, C. Grana [32] | 2021 | BMRS (VII) | labeling_lee_2021_bmrs.h | |
| Null Labeling | F. Bolelli, M. Cancilla, L. Baraldi, C. Grana [13] | - | NULL (V) | labeling_null.h | |
| SAUF 3D | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | SAUF_3D | labeling3D_SAUF_2021.h, labeling3D_SAUF_2021_tree_code.inc.h | |
| SAUF++ 3D | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | SAUFpp_3D | labeling3D_SAUFpp_2021.h, labeling3D_SAUFpp_2021_tree_code.inc.h | |
| PRED 3D | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | PRED_3D | labeling3D_PRED_2021.h, labeling3D_PRED_2021_center_line_forest_code.inc.h, labeling3D_PRED_2021_first_line_forest_code.inc.h, labeling3D_PRED_2021_last_line_forest_code.inc.h, labeling3D_PRED_2021_single_line_forest_code.inc.h | |
| PRED++ 3D | F. Bolelli, S. Allegretti, C. Grana [33] | 2021 | PREDpp_3D | labeling3D_PREDpp_2021.h, labeling3D_PREDpp_2021_center_line_forest_code.inc.h, labeling3D_PREDpp_2021_first_line_forest_code.inc.h, labeling3D_PREDpp_2021_last_line_forest_code.inc.h, labeling3D_PREDpp_2021_single_line_forest_code.inc.h | |
| Entropy Partitioning Decision Tree (RLPR) | M. Söchting, S. Allegretti, F. Bolelli, C. Grana [31] | 2021 | EPDT_19c and EPDT_22c (VI) | labeling3D_BBDT_2019.h, labeling_bolelli_2019_forest.inc, labeling_bolelli_2019_forest_firstline.inc, labeling_bolelli_2019_forest_lastline.inc, labeling_bolelli_2019_forest_singleline.inc | |

I standard version.
II with zero-offset optimization.
III with RLE compression.
IV only on TTA and UF.
V it only copies the pixels from the input image to the output one, thus defining a lower bound for the execution time of CCL algorithms on a given machine and dataset.
VI EPDT_19c and EPDT_22c algorithms are based on very big decision trees that translate to many lines of C++ code. They may thus noticeably increase the build time. For this reason, dedicated flags to enable/disable such algorithms are provided in the CMake file (see the YACCLAB_ENABLE_EPDT_19C, YACCLAB_ENABLE_EPDT_22C and YACCLAB_ENABLE_EPDT_26C entries in CMake Configuration Variables). By default the flags are OFF.
VII CCL algorithm for images in bitonal (1 bit per pixel) format. When applied to these algorithms, the average tests also consider the time for 1 byte to 1 bit per pixel conversion. On the other hand, when performing average with steps tests conversion time is ignored.

2D/3D GPU Algorithms

| Algorithm Name | Authors | Year | Acronym | Required Files | 2D/3D |
|---|---|---|---|---|---|
| Union Find | V. Oliveira, R. Lotufo [18] | 2010 | UF | labeling_oliveira_2010.cu | 2D and 3D |
| Optimized Label Equivalence | O. Kalentev, A. Rai, S. Kemnitz, R. Schneider [19] | 2011 | OLE | labeling_kalentev_2011.cu | 2D |
| Block-run-based | P. Chen, H.L. Zhao, C. Tao, H.S. Sang [25] | 2011 | BRB | labeling_chen_2011.cu | 2D |
| Stava | O. Stava, B. Benes [38] | 2011 | STAVA | labeling_stava_2011.cu | 2D |
| Rasmusson | A. Rasmusson, T.S. Sørensen, G. Ziegler [37] | 2013 | RASMUSSON | labeling_rasmusson_2013.cu | 2D |
| Accelerated CCL | F. N. Paravecino, D. Kaeli [34] | 2014 | ACCL | labeling_paravecino_2014.cu | 2D |
| 8-Directional Label Selection | Y. Soh, H. Ashraf, Y. Hae, I. Kim [36] | 2014 | DLS | labeling_soh_2014_8DLS.cu | 2D |
| Modified 8-Directional Label Selection | Y. Soh, H. Ashraf, Y. Hae, I. Kim [36] | 2014 | M8DLS | labeling_soh_2014_M8DLS.cu | 2D |
| Line-based Union-Find | K. Yonehara, K. Aizawa [39] | 2015 | LBUF | labeling_yonehara_2015.cu | 2D |
| Block Equivalence | S. Zavalishin, I. Safonov, Y. Bekhtin, I. Kurilin [20] | 2016 | BE | labeling_zavalishin_2016.cu | 2D and 3D |
| Distanceless Label Propagation | L. Cabaret, L. Lacassagne, D. Etiemble [21] | 2017 | DLP | labeling_cabaret_2017.cu | 2D |
| Komura Equivalence (8-conn) | S. Allegretti, F. Bolelli, M. Cancilla, C. Grana [22] | 2018 | KE | labeling_allegretti_2018.cu | 2D |
| Hardware Accelerated 4-connected | A. Hennequin, L. Lacassagne, L. Cabaret, Q. Meunier [35] | 2018 | HA4 | labeling_hennequin_2018_HA4.cu | 2D |
| Hardware Accelerated 8-connected | A. Hennequin, L. Lacassagne, L. Cabaret, Q. Meunier [35] | 2018 | HA8 | labeling_hennequin_2018_HA8.cu | 2D |
| CUDA SAUF | S. Allegretti, F. Bolelli, M. Cancilla, C. Grana [29] | 2019 | C-SAUF | labeling_allegretti_2019_SAUF.cu, labeling_wu_2009_tree.inc | 2D |
| CUDA BBDT | S. Allegretti, F. Bolelli, M. Cancilla, C. Grana [29] | 2019 | C-BBDT | labeling_allegretti_2019_BBDT.cu, labeling_grana_2010_tree.inc | 2D |
| CUDA DRAG | S. Allegretti, F. Bolelli, M. Cancilla, C. Grana [29] | 2019 | C-DRAG | labeling_allegretti_2019_DRAG.cu | 2D |
| Block-based Union Find | S. Allegretti, F. Bolelli, C. Grana [24] | 2019 | BUF | labeling_allegretti_2019_BUF.cu | 2D and 3D |
| Block-based Komura Equivalence | S. Allegretti, F. Bolelli, C. Grana [24] | 2019 | BKE | labeling_allegretti_2019_BKE.cu | 2D and 3D |

Example of Algorithm Usage Outside the Benchmark

#include "labels_solver.h"
#include "labeling_algorithms.h"
#include "labeling_grana_2010.h" // To include the algorithm code (BBDT in this example)

#include <cstdlib> // For EXIT_SUCCESS
#include <opencv2/opencv.hpp>

using namespace cv;

int main()
{
    BBDT<UFPC> BBDT_UFPC; // To create an object of the desired algorithm (BBDT in this example)
                          // templated on the labels solving strategy. See the README for the
                          // complete list of the available labels solvers, available algorithms
                          // (N.B. not all the algorithms are templated on the solver) and their
                          // acronyms.

    BBDT_UFPC.img_ = imread("test_image.png", IMREAD_GRAYSCALE); // To load into the CCL object
                                                                 // the BINARY image to be labeled

    threshold(BBDT_UFPC.img_, BBDT_UFPC.img_, 100, 1, THRESH_BINARY); // Just to be sure that the
                                                                      // loaded image is binary

    BBDT_UFPC.PerformLabeling(); // To perform Connected Components Labeling!

    Mat1i output = BBDT_UFPC.img_labels_; // To get the output labeled image  
    unsigned n_labels = BBDT_UFPC.n_labels_; // To get the number of labels found in the input img

    return EXIT_SUCCESS;
}

Configuration File

A YAML configuration file placed in the installation folder lets you specify which kinds of tests should be performed, on which datasets and on which algorithms. Four categories of algorithms are supported: 2D CPU, 2D GPU, 3D CPU and 3D GPU. For each of them, the configuration parameters are reported below.

  • execute - boolean value which specifies whether the current category of algorithms will be tested:
execute:    true
  • perform - dictionary which specifies the kind of tests to perform:
perform:
  correctness:        false
  average:            true
  average_with_steps: false
  density:            false
  granularity:        false
  memory:             false
  blocksize:          false 
  • correctness_tests - dictionary indicating the kind of correctness tests to perform:
correctness_tests:
  eight_connectivity_standard:  true
  eight_connectivity_steps:     true
  eight_connectivity_memory:    true
  eight_connectivity_blocksize: true      
  • tests_number - dictionary which sets the number of runs for each test available:
tests_number:
  average:            10
  average_with_steps: 10
  density:            10
  granularity:        10
  • algorithms - list of algorithms to which the chosen tests are applied:
algorithms:
  - SAUF_RemSP
  - SAUF_TTA
  - BBDT_RemSP
  - BBDT_UFPC
  - CT
  - labeling_NULL
  • check_datasets, average_datasets, average_ws_datasets, memory_datasets and blocksize_datasets - lists of datasets on which, respectively, correctness, average, average_ws, memory and blocksize tests should be run:
...
average_datasets: ["3dpes", "fingerprints", "hamlet", "medical", "mirflickr", "tobacco800", "xdocs"]
...
  • blocksize - only for the 2D GPU and 3D GPU categories, this dictionary configures blocksize test parameters. For each axis, a list of three values specifies [<first>, <last>, <step>]:
blocksize:
  x: [2, 64, 2]
  y: [2, 64, 2]
  z: [2, 64, 2]
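
As a rough illustration of how a [<first>, <last>, <step>] triple can be read, the sketch below expands it into the explicit list of block sizes. The helper name ExpandRange is hypothetical, and treating <last> as inclusive with an additive <step> is an assumption about the semantics rather than something taken from the YACCLAB sources.

```cpp
#include <cassert>
#include <vector>

// Expands a [first, last, step] triple into the explicit list of values.
// Assumptions: `last` is inclusive and `step` is added at each iteration.
std::vector<int> ExpandRange(int first, int last, int step)
{
    assert(step > 0);
    std::vector<int> values;
    for (int v = first; v <= last; v += step) {
        values.push_back(v);
    }
    return values;
}
```

Under these assumptions, x: [2, 64, 2] corresponds to the 32 values 2, 4, ..., 64.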

Finally, the following configuration parameters are common to all categories.

  • paths - dictionary with both input (datasets) and output (results) paths. It is automatically filled by CMake during the creation of the project:
paths: {input: "<datasets_path>", output: "<output_results_path>"}
  • write_n_labels - whether to report the number of connected components in the output files:
write_n_labels: false
  • color_labels - whether to output a colored version of labeled images during tests:
color_labels: {average: false, density: false}
  • save_middle_tests - dictionary specifying, separately for every test, whether to save the output of single runs, or only a summary of the whole test:
save_middle_tests: {average: false, average_with_steps: false, density: false, granularity: false}

How to Extend YACCLAB with New Algorithms

YACCLAB has been designed with extensibility in mind, so that new resources can be easily integrated into the project. A CCL algorithm is coded with a .h header file (placed in the include folder), a .cc source file (placed in the src folder), and optional additional files containing a tree/DRAG definition (placed in the include folder).

The source file should be as follows:

#include "<header_file_name>.h"

REGISTER_LABELING_WITH_EQUIVALENCES_SOLVERS(<algorithm_name>);
// Replace the above line with "REGISTER_LABELING(<algorithm_name>);" if the algorithm
// is not templated on the equivalence solver algorithm.

The header file should follow the structure below (see include/labeling_bolelli_2018.h for a complete example):

// [...]

template <typename LabelsSolver> // Remove this line if the algorithm is not templated
                                 // on the equivalence solver algorithm
class <algorithm_name> : public Labeling2D<Connectivity2D::CONN_8> { // the class must extend one of the labeling
                                                     // classes Labeling2D, Labeling3D, .. that
                                                     // are templated on the connectivity type
                                                    
public:
    <algorithm_name>() {}

    // This member function should implement the labeling procedure reading data from the
    // input image "img_" (OpenCV Mat1b) and storing labels into the output one "img_labels_"
    // (OpenCV Mat1i)
    void PerformLabeling()
    {
      // [...]

      LabelsSolver::Alloc(UPPER_BOUND_8_CONNECTIVITY); // Memory allocation of the labels solver
      LabelsSolver::Setup(); // Labels solver initialization

      // [...]
      
      LabelsSolver::GetLabel(<label_id>); // To get label value from its index
      LabelsSolver::NewLabel(); // To create a new label

      LabelsSolver::Flatten(); // To flatten the equivalence solver array
    }

    // This member function should implement the step-by-step version of the labeling
    // procedure. This is required to perform tests with steps.
    void PerformLabelingWithSteps()
    {

      double alloc_timing = Alloc(); // Alloc() should be a member function responsible
                                     // for memory allocation of the required data structures

      perf_.start();
      FirstScan(); // FirstScan should be a member function that implements the
                   // first scan step of the algorithm (if it has one)
      perf_.stop();
      perf_.store(Step(StepType::FIRST_SCAN), perf_.last());

      perf_.start();
      SecondScan(); // SecondScan should be a member function that implements the 
                    // second scan step of the algorithm (if it has one)
      perf_.stop();
      perf_.store(Step(StepType::SECOND_SCAN), perf_.last());

      // If the algorithm does not have distinct first and second scans, replace the lines
      // above with the following ones:
      // perf_.start();
      // AllScans(); // AllScans() should be a member function which implements the entire
      //             // algorithm but the allocation/deallocation
      // perf_.stop();
      // perf_.store(Step(StepType::ALL_SCANS), perf_.last());

      perf_.start();
      Dealloc(); // Dealloc() should be a member function responsible for memory
                 // deallocation.
      perf_.stop();
      perf_.store(Step(StepType::ALLOC_DEALLOC), perf_.last() + alloc_timing);

      // [...]
    }

    // This member function should implement the labeling procedure using the OpenCV Mat
    // wrapper (MemMat) implemented by YACCLAB 
    void PerformLabelingMem(std::vector<uint64_t>& accesses){
      // [...]
    }

};
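
To clarify the call sequence expected from a labels solver (Alloc, Setup, NewLabel, GetLabel, Flatten), the following sketch shows a minimal union-find solver exposing a similar static interface. It is only an illustration: the class name SimpleUF, the Merge and Dealloc members and all implementation details are assumptions, not the solvers shipped with YACCLAB.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal union-find solver (NOT the YACCLAB implementation).
// Invariant: parent_[i] <= i, and a label i is a root when parent_[i] == i.
class SimpleUF {
public:
    static void Alloc(unsigned max_labels) { parent_.assign(max_labels, 0); }
    static void Dealloc() { parent_.clear(); length_ = 0; }

    // Label 0 is reserved for the background.
    static void Setup() { assert(!parent_.empty()); parent_[0] = 0; length_ = 1; }

    // Creates a new provisional label that is its own root.
    static unsigned NewLabel()
    {
        assert(length_ < parent_.size());
        parent_[length_] = length_;
        return length_++;
    }

    // After Flatten(), returns the final label associated with `index`.
    static unsigned GetLabel(unsigned index) { return parent_[index]; }

    // Records that labels i and j are equivalent; the smaller root wins.
    static unsigned Merge(unsigned i, unsigned j)
    {
        while (parent_[i] < i) i = parent_[i];
        while (parent_[j] < j) j = parent_[j];
        if (i < j) return parent_[j] = i;
        return parent_[i] = j;
    }

    // Replaces every provisional label with a consecutive final label.
    // Returns the number of final labels, background included.
    static unsigned Flatten()
    {
        unsigned k = 1;
        for (unsigned i = 1; i < length_; ++i) {
            if (parent_[i] < i) parent_[i] = parent_[parent_[i]]; // already finalized
            else parent_[i] = k++;                                // root: new final label
        }
        return k;
    }

private:
    static std::vector<unsigned> parent_;
    static unsigned length_;
};

std::vector<unsigned> SimpleUF::parent_;
unsigned SimpleUF::length_ = 0;
```

In a two-scan algorithm, NewLabel and Merge would typically be called during the first scan, Flatten between the scans, and GetLabel during the second scan to write final labels.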

When implementing a GPU algorithm only the .cu file is required. The file should be placed in the cuda/src folder. The general structure of a GPU algorithm is the following:

// [...]

// Kernel definitions:

__global__ void <kernel_name_1>(...)
{
  ...
}

__global__ void <kernel_name_2>(...)
{
  ...
}
                                 
class <algorithm_name> : public GpuLabeling2D<Connectivity2D::CONN_8> { // the class must extend one of the labeling
                                                     // classes GpuLabeling2D, GpuLabeling3D, .. that
                                                     // are templated on the connectivity type
                                                    
public:
    <algorithm_name>() {}

    // This member function should implement the labeling procedure reading data from the
    // input image "d_img_" (OpenCV cuda::GpuMat) and storing labels into the output one "d_img_labels_"
    // (OpenCV cuda::GpuMat)
    void PerformLabeling()
    {
      // Create the output image
      d_img_labels_.create(d_img_.size(), CV_32SC1);

      // [...]

      // Call necessary kernels
      <kernel_name_1> <<<...>>> (...);

      <kernel_name_2> <<<...>>> (...);

      // [...]
      
      // Wait for the end of the last kernel
      cudaDeviceSynchronize();
    }

    // This member function should implement the step-by-step version of the labeling
    // procedure. This is required to perform tests with steps.
    void PerformLabelingWithSteps()
    {

      double alloc_timing = Alloc(); // Alloc() should be a member function responsible
                                     // for memory allocation of the required data structures

      perf_.start();
      FirstScan(); // FirstScan should be a member function that implements the
                   // first scan step of the algorithm (if it has one)
      perf_.stop();
      perf_.store(Step(StepType::FIRST_SCAN), perf_.last());

      perf_.start();
      SecondScan(); // SecondScan should be a member function that implements the 
                    // second scan step of the algorithm (if it has one)
      perf_.stop();
      perf_.store(Step(StepType::SECOND_SCAN), perf_.last());

      // If the algorithm does not have distinct first and second scans, replace the lines
      // above with the following ones:
      // perf_.start();
      // AllScans(); // AllScans() should be a member function which implements the entire
      //             // algorithm but the allocation/deallocation
      // perf_.stop();
      // perf_.store(Step(StepType::ALL_SCANS), perf_.last());

      perf_.start();
      Dealloc(); // Dealloc() should be a member function responsible for memory
                 // deallocation.
      perf_.stop();
      perf_.store(Step(StepType::ALLOC_DEALLOC), perf_.last() + alloc_timing);

      // [...]
    }

    void PerformLabelingBlocksize(int x, int y, int z)
    {
      // Create the output image
      d_img_labels_.create(d_img_.size(), CV_32SC1);

      // [...]

      // Call necessary kernels through a macro that measures times separately
      BLOCKSIZE_KERNEL(<kernel_name_1>, <grid_size>, <block_size>, <dynamic_shared_mem>, <arguments>...);

      BLOCKSIZE_KERNEL(<kernel_name_2>, <grid_size>, <block_size>, <dynamic_shared_mem>, <arguments>...);

      // [...]
    }

};

REGISTER_LABELING(<algorithm_name>);

// Only necessary for blocksize test
REGISTER_KERNELS(<algorithm_name>, <kernel_name_1>, <kernel_name_2>, ...);

Once an algorithm has been added to YACCLAB, it is ready to be tested and compared to the others. Don't forget to update the configuration file! We look at YACCLAB as a growing effort towards better reproducibility of CCL algorithms, so implementations of new and existing labeling methods are very welcome.

The YACCLAB Dataset

The YACCLAB dataset includes both synthetic and real images and it is suitable for a wide range of applications, ranging from document processing to surveillance, and features a significant variability in terms of resolution, image density, variance of density, and number of components. All images are provided in 1 bit per pixel PNG format, with 0 (black) being background and 1 (white) being foreground. The dataset will be automatically downloaded by CMake during the installation process as described in the installation paragraph.

2D Datasets

  • MIRflickr [10]:

    Otsu-binarized version of the MIRflickr dataset, publicly available under a Creative Commons License. It contains 25,000 standard resolution images taken from Flickr. These images have an average resolution of 0.17 megapixels and few connected components (495 on average), and are generally composed of fairly simple patterns, so labeling is quite easy and fast.

  • Hamlet:

    A set of 104 images scanned from a version of Hamlet found on Project Gutenberg (http://www.gutenberg.org). Images have an average of 2.71 million pixels to analyze and 1447 components to label, with an average foreground density of 0.0789.

  • Tobacco800 [11],[12]:

    A set of 1290 document images. It is a realistic database for document image analysis research, as these documents were collected and scanned using a wide variety of equipment over time. Resolutions of documents in Tobacco800 vary significantly from 150 to 300 DPI and the dimensions of images range from 1200 by 1600 to 2500 by 3200 pixels. Since CCL is one of the initial preprocessing steps in most layout analysis or OCR algorithms, hamlet and tobacco800 make it possible to test algorithm performance in such scenarios.

  • 3DPeS [14]:

    It comes from 3DPeS (3D People Surveillance Dataset), a surveillance dataset designed mainly for people re-identification in multi camera systems with non-overlapped fields of view. 3DPeS can be also exploited to test many other tasks, such as people detection, tracking, action analysis and trajectory analysis. The background models for all cameras are provided, so a very basic technique of motion segmentation has been applied to generate the foreground binary masks, i.e., background subtraction and fixed thresholding. The analysis of the foreground masks to remove small connected components and for nearest neighbor matching is a common application for CCL.

  • Medical [15]:

    This dataset is composed of histological images and allows us to cover this fundamental medical field. The process used for nuclei segmentation and binarization is described in [15]. The resulting dataset is a collection of 343 binary histological images with an average of 1.21 million pixels to analyze and 484 components to label.

  • Fingerprints [16]:

    This dataset contains 960 fingerprint images collected using low-cost optical sensors or synthetically generated. These images were taken from the three Fingerprint Verification Competitions FVC2000, FVC2002 and FVC2004. To fit CCL applications, fingerprints have been binarized using an adaptive threshold and then negated so that foreground pixels have value 255. Most of the original images have a resolution of 500 DPI and their dimensions range from 240 by 320 up to 640 by 480 pixels.

Samples of the YACCLAB 2D (real) datasets. From left to right: 3DPeS, Fingerprints, Medical, MIRflickr, Tobacco800, XDOCS, Hamlet.
  • Synthetic Images:
    • Classical [4]:

      A set of synthetic images that contain black and white random noise with 9 different foreground densities (10% up to 90%), from a low resolution of 32x32 pixels to a maximum resolution of 4096x4096 pixels, making it possible to test the scalability and the effectiveness of different approaches when the number of labels gets high. For every combination of size and density, 10 images are provided for a total of 720 images. The resulting subset makes it possible to evaluate performance in terms of scalability both on the number of pixels and on the number of labels (density).

    • Granularity [5] :

      This dataset makes it possible to test algorithms by varying not only the pixel density but also the granularity g (i.e., the dimension of the minimum foreground block), highlighting the behaviour of different proposals when the number of provisional labels changes. All the images have a resolution of 2048x2048 and are generated with the Mersenne Twister MT19937 random number generator implemented in the C++ standard library, starting with a "seed" equal to zero. The density of the images ranges from 0% to 100% with a step of 1%, and for every density value 16 images with pixel blocks of gxg with g ∈ [1,16] are generated. Moreover, the procedure has been repeated 10 times for every density-granularity pair, for a total of 16160 images.

Samples of the YACCLAB 2D granularity dataset: reported images have a foreground density of 30% and, from left to right, granularities are 1, 2, 4, 6, 8, 12, 14, 16.
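
The generation of a granularity image can be sketched as follows, using std::mt19937 as mentioned in the description. The function name MakeGranularityImage, the per-block sampling scheme and the image layout are assumptions made for illustration; the procedure actually used to build the dataset may differ, so this sketch is not expected to reproduce the dataset images.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Generates a size x size binary image (row-major, 0 = background, 1 = foreground)
// made of g x g blocks. Each block is foreground with probability density_percent / 100.
// Hypothetical sketch: not the generator used to build the YACCLAB dataset.
std::vector<std::uint8_t> MakeGranularityImage(int size, int g, int density_percent,
                                               std::mt19937& rng)
{
    assert(size > 0 && g > 0);
    assert(density_percent >= 0 && density_percent <= 100);
    std::vector<std::uint8_t> img(static_cast<std::size_t>(size) * size, 0);
    std::uniform_int_distribution<int> dist(0, 99);

    for (int br = 0; br < size; br += g) {
        for (int bc = 0; bc < size; bc += g) {
            // One random draw per block: the whole block shares the same value.
            const std::uint8_t value = dist(rng) < density_percent ? 1 : 0;
            for (int r = br; r < br + g && r < size; ++r) {
                for (int c = bc; c < bc + g && c < size; ++c) {
                    img[static_cast<std::size_t>(r) * size + c] = value;
                }
            }
        }
    }
    return img;
}

// Dataset size as described above:
// 101 density values (0%..100%) x 16 granularities x 10 repetitions.
constexpr int kGranularityImages = 101 * 16 * 10;
```

Seeding the generator with std::mt19937 rng(0) mirrors the "seed equal to zero" stated above and makes repeated runs on the same platform produce the same images.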

3D Datasets

  • OASIS [27]:

    This is a dataset of medical MRI data taken from the Open Access Series of Imaging Studies (OASIS) project. It consists of 373 volumes of 256 × 256 × 128 pixels, binarized with the Otsu threshold.

  • Mitochondria [28]:

    It is the Electron Microscopy Dataset, which contains binary sections taken from the CA1 hippocampus for a total of three volumes composed of 165 slices with a resolution of 1024 × 768 pixels.

  • Hilbert [24]:

    This dataset contains six volumes of 128 × 128 × 128 pixels, filled with the 3D Hilbert curve obtained at different iterations (1 to 6) of the construction method. The Hilbert curve is a fractal space-filling curve that represents a challenging test case for the labeling algorithms.

Samples of the YACCLAB 3D datasets. From left to right we have the Hilbert space-filling curve, the OASIS dataset and Mitochondria medical imaging data.
  • Granularity [24]:

    It contains 3D synthetic images generated as described for the 2D version. In this case, images have a resolution of 256 x 256 x 256 pixels and only three different images for every couple of density-granularity have been generated.

Samples of the YACCLAB 3D granularity dataset: reported images have a foreground density of 2% and, from left to right, granularities are 4, 8, 16.

Available Tests

  • Average run-time tests:

    execute an algorithm on every image of a dataset. The process can be repeated several times in a single test to get the minimum execution time for each image: this yields more reproducible results and filters out delays produced by other running processes. It is also possible to compare the execution speed of different algorithms on the same dataset: in this case, selected algorithms (see Configuration File for more details) are executed sequentially on every image of the dataset. Results are presented in three different formats: a plain text file, histogram charts (.pdf/.ps), either in color or in gray-scale, and a LaTeX table, which can be directly included in research papers.

  • Average run-time tests with steps:

    evaluate the performance of an algorithm, separating the allocation/deallocation time from the time required to compute the labeling. Moreover, if an algorithm employs multiple scans to produce the correct output labels, YACCLAB stores the time of every scan and displays them separately. To understand how YACCLAB computes the memory allocation time for an algorithm on a reference image, it is important to underline the subtleties involved in the allocation process. All modern operating systems (excluding real-time and embedded ones, but certainly including Windows and Unix) handle virtual memory with a demand paging technique, i.e., demand paging with no pre-paging for most Unix systems and cluster demand paging for Windows. This means that a disk page is copied into physical memory only when a process accesses it for the first time, and not when the allocation function is called. Therefore, it is not possible to calculate the exact allocation time required by an algorithm that computes CCL on a reference image, but an upper bound can be estimated using the following approach:

    • forcing the allocation of the entire memory by reserving it (malloc), filling it with zeros (memset), and tracing the time;
    • calculating the time required by the assignment operation (memset), and subtracting it from the one obtained at the previous step;
    • repeating the previous points for all data structures needed by an algorithm and summing times together.

    This produces an upper bound of the allocation time because caches may speed up the second assignment operation: less time is subtracted, so the estimated allocation time grows. Moreover, in real cases, CCL algorithms may reserve more memory than they really need, but demand paging, unlike our measuring system, will allocate only the accessed pages.
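The three steps above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the code YACCLAB uses: `TimeMs`, `AllocUpperBoundMs`, and `TotalAllocUpperBoundMs` are hypothetical names, and the volatile read is only an attempt to discourage the compiler from removing the fills. Because of timing noise, the difference can occasionally be slightly negative for small buffers.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

// Returns the elapsed time of `f()` in milliseconds.
template <typename F>
double TimeMs(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hypothetical helper: estimated upper bound (ms) of the time needed to
// allocate `bytes` bytes under demand paging.
double AllocUpperBoundMs(std::size_t bytes) {
    if (bytes == 0) return 0.0;
    void* buffer = nullptr;
    // Step 1: reserve the memory and touch every page by zero-filling it.
    const double alloc_and_fill = TimeMs([&] {
        buffer = std::malloc(bytes);
        if (buffer != nullptr) std::memset(buffer, 0, bytes);
    });
    if (buffer == nullptr) throw std::bad_alloc();
    // Step 2: time the fill alone, now that the pages are already resident.
    const double fill_only = TimeMs([&] { std::memset(buffer, 0, bytes); });
    // Read one byte through a volatile object so the fills are observable.
    volatile unsigned char sink = static_cast<unsigned char*>(buffer)[bytes - 1];
    (void)sink;
    std::free(buffer);
    return alloc_and_fill - fill_only;
}

// Step 3: repeat for every data structure of an algorithm and sum the times.
double TotalAllocUpperBoundMs(const std::vector<std::size_t>& sizes_in_bytes) {
    double total = 0.0;
    for (std::size_t bytes : sizes_in_bytes) total += AllocUpperBoundMs(bytes);
    return total;
}
```

For example, an algorithm that needs a label image and an equivalence array would pass the two sizes in bytes to `TotalAllocUpperBoundMs`.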

  • Density and size tests:

    check the performance of different CCL algorithms when they are executed on images with varying foreground density and size. To this aim, a list of algorithms selected by the user is run sequentially on every image of the test_random dataset. As for run-time tests, this test can be repeated for more than one run. The output is presented as both plain text and charts (.pdf/.ps). For a density test, the mean execution time of each algorithm is reported for densities ranging from 10% up to 90%, while for a size test the same is reported for resolutions ranging from 32 x 32 up to 4096 x 4096.

  • Memory tests:

    are useful to understand the reason for the good performance of an algorithm or, in general, to explain its behavior. Memory tests compute the average number of accesses to the label image (i.e., the image used to store the provisional and then the final labels for the connected components), the average number of accesses to the binary image to be labeled, and, finally, the average number of accesses to the data structures used to solve the equivalences between label classes. Moreover, if an algorithm requires extra data, memory tests summarize them as "other" accesses and return the average. Furthermore, all average contributions of an algorithm and dataset are summed together in order to show the total amount of memory accesses. Since counting the number of memory accesses imposes additional computations, functions implementing memory access tests are different from those implementing run-time and density tests, to keep run-time tests as objective as possible.
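One way to count accesses per category is to wrap each data structure in a small class that increments a counter on every read and write. The sketch below is only an assumption about how this could be done: `AccessCategory`, `AccessCounters`, and `CountedBuffer` are hypothetical names and do not describe YACCLAB's own classes.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

// The four categories mentioned in the text (names are hypothetical).
enum AccessCategory { kBinaryImage = 0, kLabelImage, kEquivalence, kOther, kNumCategories };

// Shared counters, one slot per category.
struct AccessCounters {
    std::array<unsigned long long, kNumCategories> counts{};
    unsigned long long Total() const {
        return std::accumulate(counts.begin(), counts.end(), 0ULL);
    }
};

// 2D buffer that records every read and write in the given category.
template <typename T>
class CountedBuffer {
public:
    CountedBuffer(std::size_t rows, std::size_t cols, AccessCategory category,
                  AccessCounters& counters)
        : cols_(cols), category_(category), counters_(counters), data_(rows * cols) {}

    T Read(std::size_t r, std::size_t c) const {
        ++counters_.counts[category_];
        return data_[r * cols_ + c];
    }

    void Write(std::size_t r, std::size_t c, const T& value) {
        ++counters_.counts[category_];
        data_[r * cols_ + c] = value;
    }

private:
    std::size_t cols_;
    AccessCategory category_;
    AccessCounters& counters_;  // shared by all buffers of one algorithm run
    std::vector<T> data_;
};
```

An algorithm written against such buffers pays for the bookkeeping on every access, which illustrates why the counting version is kept separate from the version used for timing.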

  • Granularity tests:

    evaluate an algorithm while varying the foreground density (from 1% to 100%, with a 1% step) and the pixel granularity, but not the image resolution. The output reports the average execution time over images with the same density and granularity.

  • Blocksize tests:

    this test, which only makes sense for CUDA algorithms, aims at finding the best block size for each kernel through grid-search parameter optimization. The range of values for each block axis can be specified in the configuration file. Given a set of CUDA algorithms, the blocksize test reports the execution times of each kernel on one or multiple datasets, repeating the measurement for every block size. Results are presented in a CSV file: for every kernel, dataset and block size, the total execution time in ms is reported.
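The grid search itself is independent of CUDA and can be sketched on the host side. The code below is an assumption-laden illustration: `BlockSize`, `AxisRange` (an inclusive range with an additive step), `Measurement`, `GridSearch`, and `Fastest` are hypothetical names, the actual range format of the configuration file may differ, and `time_kernel_ms` stands in for launching a kernel with the given block size and measuring it.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

struct BlockSize { int x, y, z; };
struct AxisRange { int first, last, step; };  // inclusive range, hypothetical format
struct Measurement { BlockSize block; double time_ms; };

// Tries every combination of block sizes and records the measured time.
std::vector<Measurement> GridSearch(
    const AxisRange& rx, const AxisRange& ry, const AxisRange& rz,
    const std::function<double(const BlockSize&)>& time_kernel_ms) {
    std::vector<Measurement> results;
    // Non-positive steps would never terminate, so they yield no results.
    if (rx.step <= 0 || ry.step <= 0 || rz.step <= 0) return results;
    for (int x = rx.first; x <= rx.last; x += rx.step) {
        for (int y = ry.first; y <= ry.last; y += ry.step) {
            for (int z = rz.first; z <= rz.last; z += rz.step) {
                const BlockSize block{x, y, z};
                results.push_back({block, time_kernel_ms(block)});
            }
        }
    }
    return results;
}

// Returns a pointer to the fastest measurement, or nullptr if there is none.
const Measurement* Fastest(const std::vector<Measurement>& results) {
    if (results.empty()) return nullptr;
    return &*std::min_element(results.begin(), results.end(),
                              [](const Measurement& a, const Measurement& b) {
                                  return a.time_ms < b.time_ms;
                              });
}
```

Each `Measurement` corresponds to one row of a per-kernel report; the pointer returned by `Fastest` is valid only while the vector it was computed from is alive.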

Examples of YACCLAB Output Results

Fingerprints XDOCS

Contributors

Thanks go to these wonderful people (emoji key):


Federico Bolelli

💻 📆 🚧 🚇 🤔

Stefano Allegretti

💻 🚧 🐛 🤔 🚇

Costantino Grana

💻 📆 🤔 🚇

Michele Cancilla

💻 📦 🚧

Lorenzo Baraldi

💻 📦

Maximilian Söchting

💻

patrickhwood

🐛

WalnutVision

🐛

This project follows the all-contributors specification. Contributions of any kind welcome.

References

[1]

F. Chang, C.-J. Chen, and C.-J. Lu, “A linear-time component-labeling algorithm using contour tracing technique,” Computer Vision and Image Understanding, vol. 93, no. 2, pp. 206–220, 2004.

[2]

W.-Y. Chang, C.-C. Chiu, and J.-H. Yang, “Block-based connected-component labeling algorithm using binary decision trees,” Sensors, vol. 15, no. 9, pp. 23 763–23 787, 2015.

[3]

L. Di Stefano and A. Bulgarelli, “A Simple and Efficient Connected Components Labeling Algorithm,” in International Conference on Image Analysis and Processing. IEEE, 1999, pp. 322–327.

[4]

C. Grana, D. Borghesani, and R. Cucchiara, “Optimized Block-based Connected Components Labeling with Decision Trees,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1596–1609, 2010.

[5]

L. Lacassagne and B. Zavidovique, “Light speed labeling: efficient connected component labeling on risc architectures,” Journal of Real-Time Image Processing, vol. 6, no. 2, pp. 117–135, 2011.

[6]

K. Wu, E. Otoo, and K. Suzuki, "Optimizing two-pass connected-component labeling algorithms," Pattern Analysis and Applications, vol. 12, no. 2, pp. 117–135, 2009.

[7]

L. He, X. Zhao, Y. Chao, and K. Suzuki, "Configuration-Transition-Based Connected-Component Labeling", IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 943–951, 2014.

[8]

H. Zhao, Y. Fan, T. Zhang, and H. Sang, "Stripe-based connected components labelling," Electronics letters, vol. 46, no. 21, pp. 1434–1436, 2010.

[9]

C. Grana, L. Baraldi, and F. Bolelli, "Optimized Connected Components Labeling with Pixel Prediction," in Advanced Concepts for Intelligent Vision Systems, 2016, pp. 431-440.

[10]

M. J. Huiskes and M. S. Lew, “The MIR Flickr Retrieval Evaluation,” in MIR ’08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval. New York, NY, USA: ACM, 2008.

[11]

G. Agam, S. Argamon, O. Frieder, D. Grossman, and D. Lewis, “The Complex Document Image Processing (CDIP) Test Collection Project,” Illinois Institute of Technology, 2006.

[12]

D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman, and J. Heard, “Building a test collection for complex document information processing,” in Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2006, pp. 665–666.

[13]

F. Bolelli, M. Cancilla, L. Baraldi, C. Grana, "Towards Reliable Experiments on the Performance of Connected Components Labeling Algorithms," Journal of Real-Time Image Processing, 2018.

[14]

D. Baltieri, R. Vezzani, and R. Cucchiara, “3DPeS: 3D People Dataset for Surveillance and Forensics,” in Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding. ACM, 2011, pp. 59–64.

[15]

F. Dong, H. Irshad, E.-Y. Oh, M. F. Lerwill, E. F. Brachtel, N. C. Jones, N. W. Knoblauch, L. Montaser-Kouhsari, N. B. Johnson, L. K. Rao et al., “Computational Pathology to Discriminate Benign from Malignant Intraductal Proliferations of the Breast,” PloS one, vol. 9, no. 12, p. e114885, 2014.

[16]

D. Maltoni, D. Maio, A. Jain, and S. Prabhakar, "Handbook of fingerprint recognition", Springer Science & Business Media, 2009.

[17]

C. Grana, F. Bolelli, L. Baraldi, and R. Vezzani, "YACCLAB - Yet Another Connected Components Labeling Benchmark," Proceedings of the 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4-8 Dec 2016.

[18]

V. Oliveira and R. Lotufo, "A study on connected components labeling algorithms using GPUs," in SIBGRAPI. vol. 3, p. 4, 2010.

[19]

O. Kalentev, A. Rai, S. Kemnitz, R. Schneider, "Connected component labeling on a 2D grid using CUDA," in Journal of Parallel and Distributed Computing 71(4), 615–620, 2011.

[20]

S. Zavalishin, I. Safonov, Y. Bekhtin, I. Kurilin, "Block Equivalence Algorithm for Labeling 2D and 3D Images on GPU," in Electronic Imaging 2016(2), 1–7, 2016.

[21]

L. Cabaret, L. Lacassagne, D. Etiemble, "Distanceless Label Propagation: an Efficient Direct Connected Component Labeling Algorithm for GPUs," in Seventh International Conference on Image Processing Theory, Tools and Applications, IPTA, 2017.

[22]

S. Allegretti, F. Bolelli, M. Cancilla, C. Grana, "Optimizing GPU-Based Connected Components Labeling Algorithms," in Third IEEE International Conference on Image Processing, Applications and Systems, IPAS, 2018.

[23]

F. Bolelli, L. Baraldi, M. Cancilla, C. Grana, "Connected Components Labeling on DRAGs," in International Conference on Pattern Recognition, 2018, pp. 121-126.

[24]

S. Allegretti, F. Bolelli, C. Grana, "Optimized Block-Based Algorithms to Label Connected Components on GPUs," in IEEE Transactions on Parallel and Distributed Systems, 2019.

[25]

P. Chen, H. Zhao, C. Tao, H. Sang, "Block-run-based connected component labelling algorithm for gpgpu using shared memory." Electronics Letters, 2011

[26]

F. Bolelli, S. Allegretti, L. Baraldi, and C. Grana, "Spaghetti Labeling: Directed Acyclic Graphs for Block-Based Connected Components Labeling," IEEE Transactions on Image Processing, vol. 29, no. 1, pp. 1999-2012, 2019.

[27]

D. S. Marcus, A. F. Fotenos, J. G. Csernansky, J. C. Morris, R. L. Buckner, “Open Access Series of Imaging Studies (OASIS): Longitudinal MRI Data in Nondemented and Demented Older Adults,” J. Cognitive Neurosci., vol. 22, no. 12, pp. 2677–2684, 2010.

[28]

A. Lucchi, Y. Li, and P. Fua, “Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1987–1994.

[29]

S. Allegretti, F. Bolelli, M. Cancilla, F. Pollastri, L. Canalini, C. Grana, "How does Connected Components Labeling with Decision Trees perform on GPUs?," in 18th International Conference on Computer Analysis of Images and Patterns, 2019.

[30]

L. He, Y. Chao, K. Suzuki. "A run-based two-scan labeling algorithm." IEEE Transactions on Image Processing, 2008.

[31]

M. Söchting, S. Allegretti, F. Bolelli, C. Grana. "A Heuristic-Based Decision Tree for Connected Components Labeling of 3D Volumes." 25th International Conference on Pattern Recognition, 2021

[32]

W. Lee, F. Bolelli, S. Allegretti, C. Grana. "Fast Run-Based Connected Components Labeling for Bitonal Images." 5th International Conference on Imaging, Vision & Pattern Recognition, 2021

[33]

F. Bolelli, S. Allegretti, C. Grana. "One DAG to Rule Them All." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021

[34]

F. N. Paravecino, D. Kaeli, "Accelerated Connected Component Labeling Using CUDA Framework." International Conference on Computer Vision and Graphics, ICCVG, 2014

[35]

A. Hennequin, L. Lacassagne, L. Cabaret, Q. Meunier, "A new Direct Connected Component Labeling and Analysis Algorithms for GPUs", DASIP, 2018

[36]

Y. So, H. Ashraf, Y. Hae, I. Kim, "Fast Parallel Connected Component Labeling Algorithm Using CUDA Based On 8-Directional Label Selection", International Journal of Latest Research in Science and Technology, 2014

[37]

A. Rasmusson, T.S. Sørensen, G. Ziegler, "Connected Components Labeling on the GPU with Generalization to Voronoi Diagrams and Signed Distance Fields", International Symposium on Visual Computing, 2013

[38]

O. Stava, B. Benes, "Connected Components Labeling in CUDA", GPU Computing Gems, 2011

[39]

K. Yonehara, K. Aizawa, "A Line-Based Connected Component Labeling Algorithm Using GPUs", Third International Symposium on Computing and Networking, 2015