Changes to BreaKHis & GleasonArvaniti (#759)

kaiko-ai · Feb 13, 2025 · c59d1e5
1 parent 17549e3

Showing 12 changed files with 48 additions and 66 deletions.
diff --git a/configs/vision/pathology/offline/classification/breakhis.yaml b/configs/vision/pathology/offline/classification/breakhis.yaml
@@ -51,7 +51,7 @@ model:
       class_path: torch.nn.Linear
       init_args:
         in_features: ${oc.env:IN_FEATURES, 384}
-        out_features: &NUM_CLASSES 8
+        out_features: &NUM_CLASSES 4
     criterion: torch.nn.CrossEntropyLoss
     optimizer:
       class_path: torch.optim.AdamW

diff --git a/configs/vision/pathology/offline/classification/gleason_arvaniti.yaml b/configs/vision/pathology/offline/classification/gleason_arvaniti.yaml
@@ -33,7 +33,6 @@ trainer:
           dataloader_idx_map:
             0: train
             1: val
-            2: test
           backbone:
             class_path: eva.vision.models.ModelFromRegistry
             init_args:
@@ -84,11 +83,6 @@ data:
         init_args:
           <<: *DATASET_ARGS
           split: val
-      test:
-        class_path: eva.datasets.EmbeddingsClassificationDataset
-        init_args:
-          <<: *DATASET_ARGS
-          split: test
       predict:
         - class_path: eva.vision.datasets.GleasonArvaniti
           init_args: &PREDICT_DATASET_ARGS
@@ -103,10 +97,6 @@ data:
           init_args:
             <<: *PREDICT_DATASET_ARGS
             split: val
-        - class_path: eva.vision.datasets.GleasonArvaniti
-          init_args:
-            <<: *PREDICT_DATASET_ARGS
-            split: test
     dataloaders:
       train:
         batch_size: &BATCH_SIZE ${oc.env:BATCH_SIZE, 256}
@@ -115,9 +105,6 @@ data:
       val:
         batch_size: *BATCH_SIZE
         num_workers: *N_DATA_WORKERS
-      test:
-        batch_size: *BATCH_SIZE
-        num_workers: *N_DATA_WORKERS
       predict:
         batch_size: &PREDICT_BATCH_SIZE ${oc.env:PREDICT_BATCH_SIZE, 64}
         num_workers: *N_DATA_WORKERS
diff --git a/configs/vision/pathology/online/classification/breakhis.yaml b/configs/vision/pathology/online/classification/breakhis.yaml
@@ -44,7 +44,7 @@ model:
       class_path: torch.nn.Linear
       init_args:
         in_features: ${oc.env:IN_FEATURES, 384}
-        out_features: &NUM_CLASSES 8
+        out_features: &NUM_CLASSES 4
     criterion: torch.nn.CrossEntropyLoss
     optimizer:
       class_path: torch.optim.AdamW

diff --git a/configs/vision/pathology/online/classification/gleason_arvaniti.yaml b/configs/vision/pathology/online/classification/gleason_arvaniti.yaml
@@ -80,11 +80,6 @@ data:
         init_args:
           <<: *DATASET_ARGS
           split: val
-      test:
-        class_path: eva.vision.datasets.GleasonArvaniti
-        init_args:
-          <<: *DATASET_ARGS
-          split: test
     dataloaders:
       train:
         batch_size: &BATCH_SIZE ${oc.env:BATCH_SIZE, 256}
@@ -93,6 +88,3 @@ data:
       val:
         batch_size: *BATCH_SIZE
         num_workers: *N_DATA_WORKERS
-      test:
-        batch_size: *BATCH_SIZE
-        num_workers: *N_DATA_WORKERS
diff --git a/docs/datasets/breakhis.md b/docs/datasets/breakhis.md
@@ -2,7 +2,9 @@
 
 The Breast Cancer Histopathological Image Classification (BreakHis) is  composed of 9,109 microscopic images of breast tumor tissue collected from 82 patients using different magnifying factors (40X, 100X, 200X, and 400X). For this benchmark we only use the 40X samples which results in a subset of 1,995 images. This database has been built in collaboration with the P&D Laboratory, Pathological Anatomy and Cytopathology, Parana, Brazil.
 
-The dataset is divided into two main groups: benign tumors and malignant tumors. The dataset currently contains four histologically distinct types of benign breast tumors: adenosis (A), fibroadenoma (F), phyllodes tumor (PT), and tubular adenoma (TA); and four malignant tumors (breast cancer): ductal carcinoma (DC), lobular carcinoma (LC), mucinous carcinoma (MC) and papillary carcinoma (PC).
+The dataset is divided into two main groups: benign tumors and malignant tumors. The original dataset contains four histologically distinct types of benign breast tumors: adenosis (A), fibroadenoma (F), phyllodes tumor (PT), and tubular adenoma (TA); and four malignant tumors (breast cancer): ductal carcinoma (DC), lobular carcinoma (LC), mucinous carcinoma (MC) and papillary carcinoma (PC).
+
+Given that patient counts for some classes are very low (e.g. 3 for PT), we only use classes with at least 7 patients for this benchmark: TA, MC, F & DC.
 
 ## Raw data
 
@@ -11,24 +13,23 @@ The dataset is divided into two main groups: benign tumors and malignant tumors.
 |                                |                             |
 |--------------------------------|-----------------------------|
 | **Modality**                   | Vision (WSI patches)        |
-| **Task**                       | Multiclass classification (8 classes) |
+| **Task**                       | Multiclass classification (4 classes) |
 | **Cancer type**                | Breast                      |
 | **Data size**                  | 4 GB                        |
 | **Image dimension**            | 700 x 460                   |
 | **Magnification (μm/px)**      | 40x (0.25)                  |
 | **Files format**               | `png`                       |
-| **Number of images**           | 1995                        |
+| **Number of images**           | 1471                        |
 
 
 ### Splits
 
-The data source provides train/validation splits
+The data source provides train/validation splits. There is no overlap of patients between the splits, and a stratified distribution of the classes is approximated (exact stratification is not possible due to the patient separation constraint).
 
-| Splits | Train           | Validation   |
-|----------|---------------|--------------|
-| #Samples | 1393 (70%)    | 602 (30%)    |
+| Splits   | Train            | Validation      |
+|----------|------------------|-----------------|
+| #Samples | 1132 (76.95%)    | 339 (23.04%)    |
 
-A test split is not provided, as by further dividing the dataset the number of samples per class becomes too low for robust evaluations. __eva__ therefore reports evaluation results for BreakHis on the validation split.
 
 
 ### Organization
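
The class selection rule described in the BreakHis docs above (keep only classes with at least 7 patients) can be sketched as follows. This is a minimal illustration and not code from the commit: `select_classes` is a hypothetical helper, and the `(class, patient_id)` pairs are made-up toy data, not the real BreakHis patient counts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def select_classes(
    samples: Iterable[Tuple[str, str]], min_patients: int = 7
) -> List[str]:
    """Return the classes that have at least `min_patients` distinct patients.

    `samples` yields (class_name, patient_id) pairs, one per image.
    """
    patients_per_class: Dict[str, Set[str]] = defaultdict(set)
    for class_name, patient_id in samples:
        patients_per_class[class_name].add(patient_id)
    return sorted(
        name
        for name, patients in patients_per_class.items()
        if len(patients) >= min_patients
    )


# Toy data (hypothetical): class "X" has 8 patients, class "Y" only 3.
toy_samples = [("X", f"p{i}") for i in range(8)] + [("Y", f"q{i}") for i in range(3)]
# Several images from the same patient count only once.
toy_samples += [("Y", "q0"), ("Y", "q0")]

kept = select_classes(toy_samples)
```

Counting distinct patients rather than images matters here because a class with many images from a handful of patients cannot be split by patient without leaving one split nearly empty.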

diff --git a/docs/datasets/gleason_arvaniti.md b/docs/datasets/gleason_arvaniti.md
@@ -22,14 +22,14 @@ Images are classified as benign, Gleason pattern 3, 4 or 5. The dataset contains
 
 ### Splits
 
-We use the same splits as proposed in the paper:
+The following splits are proposed in the paper:
 
-| Splits | Train         | Validation   | Test         |
-|---|---------------|--------------|--------------|
+| Splits   | Train           | Validation     | Test           |
+|----------|-----------------|----------------|----------------|
 | #Samples | 15,303 (67.26%) | 2,482 (10.91%) | 4,967 (21.83%) |
 
 Note that the authors chose TMA 76 as validation cohort because it contains the most balanced distribution of Gleason scores.
-
+We couldn't achieve stable results when evaluating on the test set, so we only use the train and validation sets for this benchmark.
 
 ## Download and preprocessing
 The `GleasonArvaniti` dataset class doesn't download the data during runtime and must be downloaded and preprocessed manually:

diff --git a/docs/datasets/index.md b/docs/datasets/index.md
@@ -11,7 +11,7 @@
 |------------------------------------|----------|-------------|------------------------|----------------------------|------------------|
 | [BACH](bach.md)                    | 400      | 2048x1536   | 20x (0.5)              | Classification (4 classes) | Breast           |
 | [BRACS](bracs.md)                  | 4539     | variable   | 40x (0.25)             | Classification (7 classes) | Breast           |
-| [BreakHis](breakhis.md)            | 1995     | 700x460    | 40x (0.25)             | Classification (8 classes) | Breast           |
+| [BreakHis](breakhis.md)            | 1471     | 700x460    | 40x (0.25)             | Classification (4 classes) | Breast           |
 | [CRC](crc.md)                      | 107,180  | 224x224     | 20x (0.5)              | Classification (9 classes) | Colorectal       |
 | [GleasonArvaniti](crc.md)          | 22,752   | 750x750    | 40x (0.23)             | Classification (4 classes) | Prostate         |
 | [PatchCamelyon](patch_camelyon.md) | 327,680  | 96x96       | 10x (1.0) \*           | Classification (2 classes) | Breast           |

diff --git a/src/eva/vision/data/datasets/classification/breakhis.py b/src/eva/vision/data/datasets/classification/breakhis.py
@@ -3,7 +3,7 @@
 import functools
 import glob
 import os
-from typing import Callable, Dict, List, Literal, Set
+from typing import Any, Callable, Dict, List, Literal, Set
 
 import torch
 from torchvision import tv_tensors
@@ -28,37 +28,26 @@ class BreaKHis(base.ImageClassification):
 
     _val_patient_ids: Set[str] = {
         "18842D",
-        "16184",
-        "8168",
-        "4372",
-        "16716",
-        "9146",
-        "21978AB",
-        "6241",
-        "17901",
-        "12465",
-        "3411F",
-        "18842",
-        "2980",
-        "15570C",
-        "2985",
-        "13413",
-        "3909",
-        "14134E",
-        "2523",
-        "19854C",
         "19979",
-        "29960CD",
-        "21998AB",
-        "29960AB",
-        "14946",
+        "15275",
+        "15792",
+        "16875",
+        "3909",
+        "5287",
+        "16716",
+        "2773",
+        "5695",
+        "16184CD",
+        "23060CD",
+        "21998CD",
+        "21998EF",
     }
     """Patient IDs to use for dataset splits."""
 
     _expected_dataset_lengths: Dict[str | None, int] = {
-        "train": 1393,
-        "val": 602,
-        None: 1995,
+        "train": 1132,
+        "val": 339,
+        None: 1471,
     }
     """Expected dataset lengths for the splits and complete dataset."""
 
@@ -106,7 +95,7 @@ def __init__(
     @property
     @override
     def classes(self) -> List[str]:
-        return ["A", "F", "PT", "TA", "DC", "LC", "MC", "PC"]
+        return ["TA", "MC", "F", "DC"]
 
     @property
     @override
@@ -151,8 +140,8 @@ def validate(self) -> None:
         _validators.check_dataset_integrity(
             self,
             length=self._expected_dataset_lengths[self._split],
-            n_classes=8,
-            first_and_last_labels=("A", "PC"),
+            n_classes=4,
+            first_and_last_labels=("TA", "DC"),
         )
 
     @override
@@ -165,6 +154,10 @@ def load_target(self, index: int) -> torch.Tensor:
         class_name = self._extract_class(self._image_files[self._indices[index]])
         return torch.tensor(self.class_to_idx[class_name], dtype=torch.long)
 
+    @override
+    def load_metadata(self, index: int) -> Dict[str, Any]:
+        return {"patient_id": self._extract_patient_id(self._image_files[self._indices[index]])}
+
     @override
     def __len__(self) -> int:
         return len(self._indices)
@@ -200,6 +193,8 @@ def _make_indices(self) -> List[int]:
         val_indices = []
 
         for index, image_file in enumerate(self._image_files):
+            if self._extract_class(image_file) not in self.classes:
+                continue
             patient_id = self._extract_patient_id(image_file)
             if patient_id in self._val_patient_ids:
                 val_indices.append(index)
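
The filtering and patient-based split in `_make_indices` above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the eva implementation: `extract_class` and `extract_patient_id` are hypothetical stand-ins for the private `_extract_class` and `_extract_patient_id` methods (whose bodies are not shown in this diff), the filename layout is inferred from the sample filenames in this commit, and `VAL_PATIENT_IDS` is only a small subset of the IDs listed in `_val_patient_ids`.

```python
from typing import Dict, List, Set

CLASSES = ["TA", "MC", "F", "DC"]  # classes kept by this commit
VAL_PATIENT_IDS: Set[str] = {"19979", "3909"}  # subset of the IDs in the diff


def extract_class(filename: str) -> str:
    # Assumed layout: SOB_<B|M>_<CLASS>-<YEAR>-<PATIENT>-<MAG>-<SEQ>.png
    return filename.split("_")[2].split("-")[0]


def extract_patient_id(filename: str) -> str:
    # Same assumed layout; the patient ID is the third dash-separated field.
    return filename.split("_")[2].split("-")[2]


def make_indices(files: List[str]) -> Dict[str, List[int]]:
    """Skip excluded classes, then assign each file to a split by patient ID."""
    indices: Dict[str, List[int]] = {"train": [], "val": []}
    for index, name in enumerate(files):
        if extract_class(name) not in CLASSES:
            continue  # e.g. A, PT, LC and PC are dropped
        split = "val" if extract_patient_id(name) in VAL_PATIENT_IDS else "train"
        indices[split].append(index)
    return indices


def patients(files: List[str], indices: List[int]) -> Set[str]:
    """Collect the patient IDs behind a list of file indices."""
    return {extract_patient_id(files[i]) for i in indices}


files = [
    "SOB_B_A-14-21978AB-40-001.png",  # class A: excluded
    "SOB_B_TA-14-3411F-40-001.png",  # class TA, patient not in val set
    "SOB_M_MC-14-19979-40-001.png",  # class MC, patient in val set
    "SOB_M_LC-14-12204-40-001.png",  # class LC: excluded
]
splits = make_indices(files)
train_patients = patients(files, splits["train"])
val_patients = patients(files, splits["val"])
```

Because the split is decided per patient rather than per image, the two patient sets are disjoint by construction. The `patient_id` returned by the new `load_metadata` method could be used for the same kind of overlap check on the real dataset.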

diff --git a/src/eva/vision/data/datasets/classification/gleason_arvaniti.py b/src/eva/vision/data/datasets/classification/gleason_arvaniti.py
@@ -8,6 +8,7 @@
 
 import pandas as pd
 import torch
+from loguru import logger
 from torchvision import tv_tensors
 from typing_extensions import override
 
@@ -100,6 +101,12 @@ def prepare_data(self) -> None:
         if not os.path.isdir(os.path.join(self._root, "test_patches_750")):
             raise FileNotFoundError(f"`test_patches_750` directory not found in {self._root}")
 
+        if self._split == "test":
+            logger.warning(
+                "The test split currently leads to unstable evaluation results. "
+                "We recommend using the validation split instead."
+            )
+
     @override
     def configure(self) -> None:
         self._indices = self._make_indices()
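
The split check added to `prepare_data` above can be tried out in isolation with the sketch below. It is an illustration only: the commit uses `loguru`'s `logger.warning`, while this version uses the standard-library `logging` module as a stand-in so that it runs without third-party dependencies. `warn_if_unstable_split` and `_ListHandler` are hypothetical names, not part of eva.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("gleason_arvaniti_sketch")
logger.setLevel(logging.WARNING)


class _ListHandler(logging.Handler):
    """Collects formatted log messages so they can be inspected afterwards."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def warn_if_unstable_split(split: Optional[str]) -> bool:
    """Log a warning for the test split; return True if a warning was emitted."""
    if split == "test":
        logger.warning(
            "The test split currently leads to unstable evaluation results. "
            "We recommend using the validation split instead."
        )
        return True
    return False


handler = _ListHandler()
logger.addHandler(handler)

warned_for_val = warn_if_unstable_split("val")
warned_for_test = warn_if_unstable_split("test")
```

Note that the check only warns and does not raise, so the test split remains loadable for users who explicitly request it.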

diff --git a/...21978AB/40X/SOB_B_A-14-21978AB-40-001.png → ...14-3411F/40X/SOB_B_TA-14-3411F-40-001.png b/...21978AB/40X/SOB_B_A-14-21978AB-40-001.png → ...14-3411F/40X/SOB_B_TA-14-3411F-40-001.png
diff --git a/...22549AB/40X/SOB_B_A-14-22549AB-40-001.png → ...14-13200/40X/SOB_B_TA-14-13200-40-001.png b/...22549AB/40X/SOB_B_A-14-22549AB-40-001.png → ...14-13200/40X/SOB_B_TA-14-13200-40-001.png
diff --git a/...14-12204/40X/SOB_M_LC-14-12204-40-001.png → ...14-19979/40X/SOB_M_MC-14-19979-40-001.png b/...14-12204/40X/SOB_M_LC-14-12204-40-001.png → ...14-19979/40X/SOB_M_MC-14-19979-40-001.png