Showing 14 changed files with 528 additions and 1 deletion.
123 changes: 123 additions & 0 deletions
configs/vision/pathology/offline/classification/gleason_arvaniti.yaml
@@ -0,0 +1,123 @@
---
trainer:
  class_path: eva.Trainer
  init_args:
    n_runs: &N_RUNS ${oc.env:N_RUNS, 5}
    default_root_dir: &OUTPUT_ROOT ${oc.env:OUTPUT_ROOT, logs/${oc.env:MODEL_NAME, dino_vits16}/offline/gleason_arvaniti}
    max_steps: &MAX_STEPS ${oc.env:MAX_STEPS, 12500}
    checkpoint_type: ${oc.env:CHECKPOINT_TYPE, best}
    callbacks:
      - class_path: eva.callbacks.ConfigurationLogger
      - class_path: lightning.pytorch.callbacks.TQDMProgressBar
        init_args:
          refresh_rate: ${oc.env:TQDM_REFRESH_RATE, 1}
      - class_path: lightning.pytorch.callbacks.LearningRateMonitor
        init_args:
          logging_interval: epoch
      - class_path: lightning.pytorch.callbacks.ModelCheckpoint
        init_args:
          filename: best
          save_last: true
          save_top_k: 1
          monitor: &MONITOR_METRIC ${oc.env:MONITOR_METRIC, val/MulticlassAccuracy}
          mode: &MONITOR_METRIC_MODE ${oc.env:MONITOR_METRIC_MODE, max}
      - class_path: lightning.pytorch.callbacks.EarlyStopping
        init_args:
          min_delta: 0
          patience: ${oc.env:PATIENCE, 21}
          monitor: *MONITOR_METRIC
          mode: *MONITOR_METRIC_MODE
      - class_path: eva.callbacks.ClassificationEmbeddingsWriter
        init_args:
          output_dir: &DATASET_EMBEDDINGS_ROOT ${oc.env:EMBEDDINGS_ROOT, ./data/embeddings}/${oc.env:MODEL_NAME, dino_vits16}/gleason_arvaniti
          dataloader_idx_map:
            0: train
            1: val
            2: test
          backbone:
            class_path: eva.vision.models.ModelFromRegistry
            init_args:
              model_name: ${oc.env:MODEL_NAME, universal/vit_small_patch16_224_dino}
              model_extra_kwargs: ${oc.env:MODEL_EXTRA_KWARGS, null}
          overwrite: false
    logger:
      - class_path: lightning.pytorch.loggers.TensorBoardLogger
        init_args:
          save_dir: *OUTPUT_ROOT
          name: ""
model:
  class_path: eva.HeadModule
  init_args:
    head:
      class_path: torch.nn.Linear
      init_args:
        in_features: ${oc.env:IN_FEATURES, 384}
        out_features: &NUM_CLASSES 4
    criterion: torch.nn.CrossEntropyLoss
    optimizer:
      class_path: torch.optim.AdamW
      init_args:
        lr: ${oc.env:LR_VALUE, 0.0003}
    lr_scheduler:
      class_path: torch.optim.lr_scheduler.CosineAnnealingLR
      init_args:
        T_max: *MAX_STEPS
        eta_min: 0.0
    metrics:
      common:
        - class_path: eva.metrics.AverageLoss
        - class_path: eva.metrics.MulticlassClassificationMetrics
          init_args:
            num_classes: *NUM_CLASSES
data:
  class_path: eva.DataModule
  init_args:
    datasets:
      train:
        class_path: eva.datasets.EmbeddingsClassificationDataset
        init_args: &DATASET_ARGS
          root: *DATASET_EMBEDDINGS_ROOT
          manifest_file: manifest.csv
          split: train
      val:
        class_path: eva.datasets.EmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: val
      test:
        class_path: eva.datasets.EmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: test
      predict:
        - class_path: eva.vision.datasets.GleasonArvaniti
          init_args: &PREDICT_DATASET_ARGS
            root: ${oc.env:DATA_ROOT, ./data/arvaniti_gleason_patches}
            split: train
            transforms:
              class_path: eva.vision.data.transforms.common.ResizeAndCrop
              init_args:
                mean: ${oc.env:NORMALIZE_MEAN, [0.485, 0.456, 0.406]}
                std: ${oc.env:NORMALIZE_STD, [0.229, 0.224, 0.225]}
        - class_path: eva.vision.datasets.GleasonArvaniti
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: val
        - class_path: eva.vision.datasets.GleasonArvaniti
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: test
    dataloaders:
      train:
        batch_size: &BATCH_SIZE ${oc.env:BATCH_SIZE, 256}
        num_workers: &N_DATA_WORKERS ${oc.env:N_DATA_WORKERS, 4}
        shuffle: true
      val:
        batch_size: *BATCH_SIZE
        num_workers: *N_DATA_WORKERS
      test:
        batch_size: *BATCH_SIZE
        num_workers: *N_DATA_WORKERS
      predict:
        batch_size: &PREDICT_BATCH_SIZE ${oc.env:PREDICT_BATCH_SIZE, 64}
        num_workers: *N_DATA_WORKERS
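
Most values in this config use the `${oc.env:NAME, default}` pattern, so an environment variable overrides the written default when it is set. The stdlib-only sketch below imitates that lookup for a few flat values. `env_or_default` and `resolve` are hypothetical helpers for illustration, not part of eva or OmegaConf, and the sketch ignores nested interpolations and type conversion.

```python
import os

# Defaults copied from the config above (kept as strings, as in the environment).
DEFAULTS = {
    "N_RUNS": "5",
    "MAX_STEPS": "12500",
    "BATCH_SIZE": "256",
    "PATIENCE": "21",
}


def env_or_default(name, default, env=None):
    """Return the value of `name` from the environment, or `default` if unset."""
    env = os.environ if env is None else env
    return env.get(name, default)


def resolve(env=None):
    """Resolve every key in DEFAULTS against the given environment mapping."""
    return {key: env_or_default(key, value, env) for key, value in DEFAULTS.items()}


if __name__ == "__main__":
    # With no overrides, the written defaults are used.
    print(resolve({}))
    # Setting MAX_STEPS changes only that value.
    print(resolve({"MAX_STEPS": "100"}))
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the sketch easy to try without touching the real environment.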
98 changes: 98 additions & 0 deletions
configs/vision/pathology/online/classification/gleason_arvaniti.yaml
@@ -0,0 +1,98 @@
---
trainer:
  class_path: eva.Trainer
  init_args:
    n_runs: &N_RUNS ${oc.env:N_RUNS, 5}
    default_root_dir: &OUTPUT_ROOT ${oc.env:OUTPUT_ROOT, logs/${oc.env:MODEL_NAME, dino_vits16}/online/gleason_arvaniti}
    max_steps: &MAX_STEPS ${oc.env:MAX_STEPS, 12500}
    checkpoint_type: ${oc.env:CHECKPOINT_TYPE, best}
    callbacks:
      - class_path: eva.callbacks.ConfigurationLogger
      - class_path: lightning.pytorch.callbacks.TQDMProgressBar
        init_args:
          refresh_rate: ${oc.env:TQDM_REFRESH_RATE, 1}
      - class_path: lightning.pytorch.callbacks.LearningRateMonitor
        init_args:
          logging_interval: epoch
      - class_path: lightning.pytorch.callbacks.ModelCheckpoint
        init_args:
          filename: best
          save_last: true
          save_top_k: 1
          monitor: &MONITOR_METRIC ${oc.env:MONITOR_METRIC, val/MulticlassAccuracy}
          mode: &MONITOR_METRIC_MODE ${oc.env:MONITOR_METRIC_MODE, max}
      - class_path: lightning.pytorch.callbacks.EarlyStopping
        init_args:
          min_delta: 0
          patience: ${oc.env:PATIENCE, 21}
          monitor: *MONITOR_METRIC
          mode: *MONITOR_METRIC_MODE
    logger:
      - class_path: lightning.pytorch.loggers.TensorBoardLogger
        init_args:
          save_dir: *OUTPUT_ROOT
          name: ""
model:
  class_path: eva.HeadModule
  init_args:
    backbone:
      class_path: eva.vision.models.ModelFromRegistry
      init_args:
        model_name: ${oc.env:MODEL_NAME, universal/vit_small_patch16_224_dino}
        model_extra_kwargs: ${oc.env:MODEL_EXTRA_KWARGS, null}
    head:
      class_path: torch.nn.Linear
      init_args:
        in_features: ${oc.env:IN_FEATURES, 384}
        out_features: &NUM_CLASSES 4
    criterion: torch.nn.CrossEntropyLoss
    optimizer:
      class_path: torch.optim.AdamW
      init_args:
        lr: ${oc.env:LR_VALUE, 0.0003}
    lr_scheduler:
      class_path: torch.optim.lr_scheduler.CosineAnnealingLR
      init_args:
        T_max: *MAX_STEPS
        eta_min: 0.0
    metrics:
      common:
        - class_path: eva.metrics.AverageLoss
        - class_path: eva.metrics.MulticlassClassificationMetrics
          init_args:
            num_classes: *NUM_CLASSES
data:
  class_path: eva.DataModule
  init_args:
    datasets:
      train:
        class_path: eva.vision.datasets.GleasonArvaniti
        init_args: &DATASET_ARGS
          root: ${oc.env:DATA_ROOT, ./data/arvaniti_gleason_patches}
          split: train
          transforms:
            class_path: eva.vision.data.transforms.common.ResizeAndCrop
            init_args:
              mean: ${oc.env:NORMALIZE_MEAN, [0.485, 0.456, 0.406]}
              std: ${oc.env:NORMALIZE_STD, [0.229, 0.224, 0.225]}
      val:
        class_path: eva.vision.datasets.GleasonArvaniti
        init_args:
          <<: *DATASET_ARGS
          split: val
      test:
        class_path: eva.vision.datasets.GleasonArvaniti
        init_args:
          <<: *DATASET_ARGS
          split: test
    dataloaders:
      train:
        batch_size: &BATCH_SIZE ${oc.env:BATCH_SIZE, 256}
        num_workers: &N_DATA_WORKERS ${oc.env:N_DATA_WORKERS, 4}
        shuffle: true
      val:
        batch_size: *BATCH_SIZE
        num_workers: *N_DATA_WORKERS
      test:
        batch_size: *BATCH_SIZE
        num_workers: *N_DATA_WORKERS
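
Both configs pair `AdamW` at a learning rate of 0.0003 with `CosineAnnealingLR`, using `T_max` equal to `MAX_STEPS` (12500) and `eta_min` of 0.0. As a rough illustration, the closed-form cosine schedule can be evaluated in plain Python. This sketch assumes the scheduler is stepped once per training step, which the config alone does not confirm, and `cosine_annealing_lr` is a hypothetical helper rather than the PyTorch implementation.

```python
import math


def cosine_annealing_lr(step, base_lr=0.0003, t_max=12500, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at step 0, eta_min at step t_max."""
    cosine = math.cos(math.pi * step / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


if __name__ == "__main__":
    # Print the learning rate at the start, quarter points, and end of training.
    for step in (0, 3125, 6250, 9375, 12500):
        print(step, cosine_annealing_lr(step))
```

Under that assumption the learning rate starts at the configured value, passes through half of it at the midpoint, and reaches `eta_min` at the final step.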
@@ -0,0 +1,74 @@
# Gleason (Arvaniti)

Benchmark dataset for automated Gleason grading of prostate cancer tissue microarrays via deep learning, as proposed by [Arvaniti et al.](https://www.nature.com/articles/s41598-018-30535-1).

Images are classified as benign or as Gleason pattern 3, 4 or 5. The dataset contains annotations on a discovery / train cohort of 641 patients and an independent test cohort of 245 patients annotated by two pathologists. For the test cohort, this benchmark uses only the labels from pathologist Nr. 1.

## Raw data

### Key stats

|                                |                             |
|--------------------------------|-----------------------------|
| **Modality**                   | Vision (WSI patches)        |
| **Task**                       | Multiclass classification (4 classes) |
| **Cancer type**                | Prostate                    |
| **Data size**                  | 4 GB                        |
| **Image dimension**            | 750 x 750                   |
| **Magnification (μm/px)**      | 40x (0.23)                  |
| **File format**                | `jpg`                       |
| **Number of images**           | 22,752                      |

### Splits

We use the same splits as proposed in the paper:

| Splits | Train | Validation | Test |
|---|---------------|--------------|--------------|
| #Samples | 15,303 (67.26%) | 2,482 (10.91%) | 4,967 (21.83%) |

Note that the authors chose TMA 76 as the validation cohort because it contains the most balanced distribution of Gleason scores.
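
The percentages in the splits table follow directly from the sample counts and the total of 22,752 images. A small check, using only the numbers given above:

```python
# Sample counts per split, taken from the table above.
SPLITS = {"train": 15303, "val": 2482, "test": 4967}

total = sum(SPLITS.values())

# Share of each split in percent, rounded to two decimals as in the table.
shares = {name: round(100 * count / total, 2) for name, count in SPLITS.items()}

if __name__ == "__main__":
    print(total)
    print(shares)
```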

## Download and preprocessing

The `GleasonArvaniti` dataset class doesn't download the data at runtime; the data must be downloaded and preprocessed manually:

1. Download the dataset archives from the [official source](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OCYCMP)
2. Unpack all `.tar.gz` archives into the same folder
3. Adjust the folder structure and then run `create_patches.py` from https://github.com/eiriniar/gleason_CNN/tree/master
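
Step 2 can be scripted. The following is a minimal sketch using only the Python standard library; `unpack_archives` and the directory names in the usage lines are hypothetical, and it does not cover the folder adjustments or the `create_patches.py` run from step 3.

```python
import tarfile
from pathlib import Path


def unpack_archives(archive_dir, target_dir):
    """Extract every .tar.gz file in `archive_dir` into the single `target_dir`."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    unpacked = []
    for archive in sorted(Path(archive_dir).glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Safer extraction on Python versions that support filters.
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
        unpacked.append(archive.name)
    return unpacked


if __name__ == "__main__":
    # Hypothetical locations; adjust to where the archives were downloaded.
    print(unpack_archives("./downloads", "./data/arvaniti_gleason_raw"))
```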

This should result in the following folder structure:

```
arvaniti_gleason_patches
├── test_patches_750
│   ├── patho_1
│   │   ├── ZT80_38_A_1_1
│   │   │   ├── ZT80_38_A_1_1_patch_12_class_0.jpg
│   │   │   ├── ZT80_38_A_1_1_patch_23_class_0.jpg
│   │   │   └── ...
│   │   ├── ZT80_38_A_1_2
│   │   │   └── ...
│   │   └── ...
│   └── patho_2 # we don't use this
│       └── ...
└── train_validation_patches_750
    ├── ZT76_39_A_1_1
    │   ├── ZT76_39_A_1_1_patch_12_class_0.jpg
    │   ├── ZT76_39_A_1_1_patch_23_class_0.jpg
    │   └── ...
    ├── ZT76_39_A_1_2
    └── ...
```
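
The patch filenames in this layout appear to encode the TMA core, the patch index and the class label. As an illustration only, the sketch below parses those fields from a filename and lists the patches under a split folder. It is not the `GleasonArvaniti` implementation; `parse_patch_name` and `list_patches` are hypothetical helpers, and the order of `CLASS_NAMES` (index 0 as benign, then Gleason patterns 3, 4 and 5) is an assumption.

```python
import re
from pathlib import Path

# Matches names such as "ZT76_39_A_1_1_patch_12_class_0.jpg".
PATCH_NAME = re.compile(
    r"^(?P<core>.+)_patch_(?P<patch>\d+)_class_(?P<label>\d+)\.jpg$"
)

# Assumed mapping from class index to label; not confirmed by the source.
CLASS_NAMES = ["benign", "gleason_3", "gleason_4", "gleason_5"]


def parse_patch_name(filename):
    """Return (core name, patch index, class index) parsed from a patch filename."""
    match = PATCH_NAME.match(Path(filename).name)
    if match is None:
        raise ValueError(f"Unexpected patch filename: {filename}")
    return match["core"], int(match["patch"]), int(match["label"])


def list_patches(split_dir):
    """Return sorted (path, class index) pairs for all patches below `split_dir`."""
    paths = sorted(Path(split_dir).rglob("*_class_*.jpg"))
    return [(path, parse_patch_name(path)[2]) for path in paths]


if __name__ == "__main__":
    core, patch, label = parse_patch_name("ZT76_39_A_1_1_patch_12_class_0.jpg")
    print(core, patch, CLASS_NAMES[label])
```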

## Relevant links

* [Paper](https://www.nature.com/articles/s41598-018-30535-1)
* [GitHub](https://github.com/eiriniar/gleason_CNN)
* [Dataset](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OCYCMP)

## License

[CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/)