This repository contains the code, notebooks and dataset used for the models trained in the thesis "Modelo para la detección de cáncer de seno en imágenes histológicas a partir de aprendizaje profundo con múltiples anotadores".
Made by Jhoan Buitrago and Juan González, with the help of our advisor Julián Gil González.
The original dataset comes from the "Breast Cancer Semantic Segmentation (BCSS) dataset".
The preprocessed dataset comes from "Learning from crowds in digital pathology using scalable variational Gaussian processes", and the data can be found in this Google Drive folder.
The final data used to train the models in this work can be found in this link. The data is stored in a zip file containing npy files, which hold the preprocessed dataset after feature extraction with VGG16.
- npy files can be read using numpy.load.
- utils.py has functions for loading npy files with their corresponding labels.
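As a minimal sketch of reading one npy file with numpy.load: the file name and the array shape below are made up for illustration and are not taken from the actual archive, and the functions in utils.py may load the files differently.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical stand-in for one feature file from the zip archive.
# The real file names and array shapes are not specified in this README.
tmp_dir = Path(tempfile.mkdtemp())
feature_path = tmp_dir / "example_features.npy"
np.save(feature_path, np.random.rand(4, 512).astype(np.float32))

# numpy.load restores the array with the shape and dtype it was saved with.
features = np.load(feature_path)
print(features.shape, features.dtype)
```

For the real data, replace `feature_path` with the path to a file extracted from the zip.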
In /data/pkl you can find the pickle (.pkl) files for majority voting and crowdsourced labels. These files have the annotations and labels.
- pkl files can be read using pandas.read_pickle.
- utils.py has functions for loading the labels of gold standard, majority voting and multiple annotators.
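The sketch below shows a pickle round trip with pandas.read_pickle and one way a majority-vote label could be derived from several annotators. The column names, the patch identifiers and the use of -1 for "not annotated" are assumptions made for this example. They do not describe the actual layout of the files in /data/pkl or how the majority voting labels in this repository were produced.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical table: one row per patch, one column per annotator.
# -1 marks a patch that the annotator did not label (an assumption).
annotations = pd.DataFrame(
    {
        "annotator_1": [0, 1, 2],
        "annotator_2": [0, -1, 2],
        "annotator_3": [1, 1, -1],
    },
    index=["patch_a", "patch_b", "patch_c"],
)

pkl_path = Path(tempfile.mkdtemp()) / "example_labels.pkl"
annotations.to_pickle(pkl_path)

# pandas.read_pickle restores the DataFrame exactly as it was saved.
loaded = pd.read_pickle(pkl_path)

# Majority vote per patch: mask missing labels, then take the most
# frequent value in each row. Ties resolve to the smallest label.
majority = loaded.where(loaded >= 0).mode(axis=1)[0].astype(int)
print(majority)
```

Since unpickling can execute arbitrary code, pandas.read_pickle should only be used on files from a trusted source.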
- /notebooks has Jupyter notebooks used for training multiple models.
- /notebooks/old_notebooks has notebooks with training runs of previous models using different methodologies that were discarded for a variety of reasons, the main one being that they were very time-consuming to train with the available hardware.
- /data/pkl has .pkl files with the majority voting and crowdsourced labels.
- grid_search.py has functions for performing grid search and saving the results of the model evaluation.
- utils.py contains general functions and utilities for reading and loading data.
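To illustrate the general idea behind a grid search that saves its evaluation results, here is a minimal sketch. It is not the code in grid_search.py: the `evaluate` placeholder, the `grid_search` signature, the hyperparameter names and the output layout are all hypothetical.

```python
import itertools
import json
import random
import tempfile
from pathlib import Path


def evaluate(params, seed):
    """Hypothetical placeholder for training and evaluating one model."""
    rng = random.Random(seed)
    return {"accuracy": round(rng.random(), 4)}


def grid_search(param_grid, n_repetitions, out_path):
    """Evaluate every hyperparameter combination and save the reports as JSON."""
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        # Repeat each configuration with a different seed.
        reports = [evaluate(params, seed) for seed in range(n_repetitions)]
        results.append({"hyperparameters": params, "reports": reports})
    Path(out_path).write_text(json.dumps(results, indent=2))
    return results


# Hypothetical search space, chosen only for illustration.
grid = {"learning_rate": [1e-3, 1e-4], "dropout": [0.2, 0.5]}
out_file = Path(tempfile.mkdtemp()) / "example_results.json"
results = grid_search(grid, n_repetitions=10, out_path=out_file)
print(len(results), "configurations evaluated")
```

Keeping the hyperparameters next to their reports in each entry means a results file can be inspected without the code that produced it.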
The results of the evaluations of each model are stored in JSON files, which can be found in this Google Drive folder.
Each file contains a list of the trained models with their respective hyperparameters and the evaluation reports for each of the 10 repetitions.
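A results file of this kind could be summarized along the following lines. The key names (`hyperparameters`, `reports`, `accuracy`) and the sample values are assumptions invented for this sketch, and only two repetitions are shown for brevity. Check the actual JSON files for their real structure before adapting it.

```python
import json
import statistics
import tempfile
from pathlib import Path

# Hypothetical file contents: a list of models, each with its
# hyperparameters and one report per repetition.
sample = [
    {
        "hyperparameters": {"learning_rate": 0.001},
        "reports": [{"accuracy": 0.80}, {"accuracy": 0.84}],
    },
    {
        "hyperparameters": {"learning_rate": 0.0001},
        "reports": [{"accuracy": 0.70}, {"accuracy": 0.74}],
    },
]
results_path = Path(tempfile.mkdtemp()) / "example_results.json"
results_path.write_text(json.dumps(sample))

with open(results_path) as fh:
    models = json.load(fh)

# Mean and standard deviation of the metric across repetitions, per model.
summary = []
for model in models:
    scores = [report["accuracy"] for report in model["reports"]]
    summary.append(
        (model["hyperparameters"], statistics.mean(scores), statistics.stdev(scores))
    )

# Pick the configuration with the highest mean score.
best = max(summary, key=lambda row: row[1])
print(best)
```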