SemanticSegmentationTask: add class-wise metrics #2130

robmarkcole · 2024-06-19T10:17:53Z

Addresses #2121 for segmentation. Mostly copied from @isaaccorley, as here; he additionally passes on_epoch=True, which is NOT adopted here.

Output metrics for ChaBud binary task with labels=['background', 'burned_area']
This dataset nicely illustrates why class labels are required: burned_area is the minority class and is not learnt.

[{'test_loss': 450.0233459472656,
  'test_multiclassaccuracy_background': 0.9817732572555542,
  'test_multiclassaccuracy_burned_area': 0.006427088752388954,
  'test_AverageAccuracy': 0.4941001236438751,
  'test_AverageF1Score': 0.4793838560581207,
  'test_AverageJaccardIndex': 0.4542679488658905,
  'test_multiclassfbetascore_background': 0.9489027857780457,
  'test_multiclassfbetascore_burned_area': 0.009864915162324905,
  'test_multiclassjaccardindex_background': 0.9035778641700745,
  'test_multiclassjaccardindex_burned_area': 0.004958001431077719,
  'test_OverallAccuracy': 0.9036323428153992,
  'test_OverallF1Score': 0.9036323428153992,
  'test_OverallJaccardIndex': 0.8265058994293213,
  'test_multiclassprecision_background': 0.9189077615737915,
  'test_multiclassprecision_burned_area': 0.0312582366168499,
  'test_multiclassrecall_background': 0.9817732572555542,
  'test_multiclassrecall_burned_area': 0.006427088752388954}]
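As a quick sanity check on the output above, the Average* entries appear to be the unweighted (macro) mean of the corresponding class-wise values. The sketch below copies the numbers from the output; the helper name `macro_mean` and the dictionary layout are made up for illustration and are not part of torchgeo or torchmetrics.

```python
# Values copied from the ChaBud test output above (order: background, burned_area).
classwise = {
    "AverageAccuracy": [0.9817732572555542, 0.006427088752388954],
    "AverageF1Score": [0.9489027857780457, 0.009864915162324905],
    "AverageJaccardIndex": [0.9035778641700745, 0.004958001431077719],
}

# Macro-averaged values as reported in the same output.
reported = {
    "AverageAccuracy": 0.4941001236438751,
    "AverageF1Score": 0.4793838560581207,
    "AverageJaccardIndex": 0.4542679488658905,
}


def macro_mean(values):
    """Unweighted mean over classes (hypothetical helper)."""
    return sum(values) / len(values)


averages = {name: macro_mean(values) for name, values in classwise.items()}

for name, value in averages.items():
    print(f"{name}: computed={value:.6f} reported={reported[name]:.6f}")
```

The agreement to within float32 precision is why a high overall score can hide a class that is essentially never predicted: the macro average is dragged down by burned_area, but only the class-wise values show which class is responsible.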

DimitrisMantas · 2024-06-20T08:39:32Z

Given that most metrics of interest are broken (e.g., all of them when average="macro" and ignore_index is specified (Lightning-AI/torchmetrics#2443), and JaccardIndex, which outputs NaN when average="macro" if you try to take absent and ignored classes into account with zero_division (Lightning-AI/torchmetrics#2535)), should we make an effort to see if and how we could add our own?

I'm saying this because these are only the issues I've found so far, but I've also noticed other suspicious things, like the fact that my class-wise recall values are not the same as those in the confusion matrix when you normalize it with respect to ground truth (I haven't checked if this is also the case with precision, i.e., when the matrix is normalized column-wise). I'm also pretty confident that if all of this is wrong then micro averaging is probably wrong too.

It should be pretty easy to compute all these metrics straight from the confusion matrix (assuming it at least is correct) and I've actually tried to reimplement them this way, but it hasn't really been a priority because I’ve found that all these wrong (?) values are basically a lower bound of the actual ones. If you look at the official implementations, this is actually what they are doing, and my guess is that they have a bug in their logic later on. But indeed all these metrics inherit from StatScores, basically the confusion matrix.
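For reference, a minimal sketch of what computing class-wise metrics straight from a confusion matrix could look like. This is plain Python, not the torchmetrics implementation; the function name `classwise_from_confusion`, the made-up counts, and the choice to return 0.0 for an empty denominator are all assumptions for illustration.

```python
def classwise_from_confusion(cm):
    """Per-class recall, precision and IoU from a square confusion matrix.

    cm[i][j] is the number of samples with true class i predicted as class j.
    Returning 0.0 when a denominator is empty is an assumption of this sketch.
    """
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp  # row sum minus the diagonal entry
        fp = sum(cm[r][c] for r in range(n)) - tp  # column sum minus the diagonal entry
        results.append(
            {
                # Recall is the diagonal of the row-normalized matrix.
                "recall": tp / (tp + fn) if tp + fn else 0.0,
                # Precision is the diagonal of the column-normalized matrix.
                "precision": tp / (tp + fp) if tp + fp else 0.0,
                # Jaccard index (IoU) for the class.
                "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
            }
        )
    return results


# Made-up counts for a 2-class problem with a small minority class.
cm = [[90, 10], [8, 2]]
labels = ["background", "burned_area"]
metrics = classwise_from_confusion(cm)

for label, m in zip(labels, metrics):
    print(label, {k: round(v, 4) for k, v in m.items()})
```

Comparing values like these against the library's class-wise output on the same predictions is one way to check the row-normalized and column-normalized discrepancies mentioned above.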

I’m actually pretty dumbfounded these issues are not a top priority for the TorchMetrics team and instead they focus on adding to their docs but to each their own…

robmarkcole · 2024-06-20T08:54:34Z

@DimitrisMantas good call on my ignoring the ignore_index! In fairness they do address issues, but have a long backlog. When I made some noise they addressed Lightning-AI/torchmetrics#2198.
My opinion is that it is better to work with torchmetrics to address the issues, rather than implement from scratch here. I see your comment at Lightning-AI/torchmetrics#2535 (comment), so perhaps a pragmatic approach is not to add new metrics that we have concerns about, but also to create specific issues which track these concerns.

torchgeo/trainers/segmentation.py

DimitrisMantas · 2024-06-20T09:01:18Z

Sure, that makes sense; please excuse the rant haha.

robmarkcole · 2024-06-20T09:14:33Z

Applied on_epoch=True to all steps for consistency. This results in both per-epoch and per-step values being reported, for train only. Perhaps this is why @isaaccorley did not apply it to train?

train_loss_epoch | 0.028535427525639534
train_loss_step | 0.00008003244874998927
train_AverageAccuracy_epoch | 0.9101453423500061
train_AverageAccuracy_step | 0.9124529361724854

Note that Val is unaffected:

val_AverageAccuracy | 0.8227439522743225

For a task with 2 classes, a grand total of 52 metrics are being reported between train & val.

isaaccorley · 2024-06-20T12:22:35Z

I just set it to be explicit, but I think that PyTorch Lightning or torchmetrics automatically sets on_epoch to False for training and True for everything else.

DimitrisMantas · 2024-06-20T12:51:04Z

You need to set both on_step and on_epoch to get logs only per step or per epoch.

robmarkcole · 2024-06-20T13:41:57Z

@DimitrisMantas now just performing on_step for train loss, so a more manageable 36 metrics

robmarkcole · 2024-06-21T08:29:04Z

Not sure about this failing test: ValueError: Problem with given class_path 'torchgeo.trainers.SemanticSegmentationTask'

isaaccorley · 2024-06-21T14:55:05Z

Must be an issue with one of the minimum versions of the package since it's passing for the other tests.

torchgeo/trainers/segmentation.py

adamjstewart · 2024-08-06T11:40:45Z

We can definitely increase the min version of torchmetrics if we need to.

robmarkcole · 2024-08-07T14:26:33Z

@adamjstewart I'm inclined to close this PR as I don't feel confident I understand the behaviour of torchmetrics in this implementation. Elsewhere I am using the on_stage_epoch_end hooks and feel confident I do understand the behaviour with that approach. Overall I think this should be a change we make from a place of understanding, and in smaller steps than this PR takes

robmarkcole · 2024-08-07T14:48:15Z

torchmetrics=1.1.0 test errors here

MeanAveragePrecision(), kwargs = {'average': 'macro'}
...
ValueError: Unexpected keyword arguments: `average`

I see this was added in 1.1.1

robmarkcole · 2024-08-08T12:36:40Z

After discussion with torchmetrics devs, created Lightning-AI/torchmetrics#2683

adamjstewart · 2024-08-08T16:59:29Z

That's such a complicated minimal reproducible example lol.

adamjstewart · 2024-08-21T09:55:33Z

I tried making a self-contained minimal reproducible example but couldn't get one working and gave up.

DimitrisMantas · 2024-08-27T14:36:14Z

It just hit me that we should be a bit careful with which metrics we add to avoid unnecessary computation; class-wise accuracy and recall are the same thing and so are micro-averaged accuracy, precision, and recall.
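The micro-averaged part of this can be checked with a small sketch. In the single-label multiclass case (and, as an assumption of this sketch, with no ignore_index), every false positive for one class is a false negative for another, so micro precision, micro recall, and overall accuracy all reduce to the trace of the confusion matrix over its total. The counts below are made up.

```python
# Made-up 3-class confusion matrix: cm[i][j] = true class i predicted as class j.
cm = [
    [50, 3, 2],
    [4, 30, 6],
    [1, 5, 20],
]
n = len(cm)

tp = [cm[c][c] for c in range(n)]
fn = [sum(cm[c]) - tp[c] for c in range(n)]  # missed samples of class c
fp = [sum(cm[r][c] for r in range(n)) - tp[c] for c in range(n)]  # wrongly predicted as c

# Micro averaging pools the counts over classes before dividing.
micro_precision = sum(tp) / (sum(tp) + sum(fp))
micro_recall = sum(tp) / (sum(tp) + sum(fn))
overall_accuracy = sum(tp) / sum(sum(row) for row in cm)

print(micro_precision, micro_recall, overall_accuracy)
```

The class-wise half of the claim is visible in the test output earlier in the thread, where test_multiclassaccuracy_* and test_multiclassrecall_* are identical for both classes, and test_OverallAccuracy equals test_OverallF1Score.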

adamjstewart · 2024-08-27T14:38:24Z

Any sense of how much these metrics actually add to processing time? If it isn't noticeable by a human, I don't particularly care about the overhead.

DimitrisMantas · 2024-08-27T14:42:17Z

Haven't measured it but doubt it's much.

robmarkcole · 2024-08-27T16:33:24Z

I believe Lightning offers tools for profiling

adamjstewart · 2024-08-28T11:43:32Z

They do, see https://torchgeo.readthedocs.io/en/latest/user/contributing.html#i-o-benchmarking

torchgeo/trainers/segmentation.py

robmarkcole · 2024-09-05T10:18:44Z

@adamjstewart @DimitrisMantas per this comment we should be using the _epoch_end hooks Lightning-AI/torchmetrics#2683 (comment)

DimitrisMantas · 2024-09-05T10:49:42Z

I see the issue, but I must be missing something because my own code uses the standard logging tools and metric collections work just fine.

Although by "work", I mean I don't get an error. Other than that, I found out a couple of days ago that the diagonal of my confusion matrix doesn't match the class accuracies (which it should), so I'm obviously not using the API correctly...

Edit: I have at least one mistake where I do self.log_dict(metrics(input, target)). The docs say this is wrong.

Edit 2: Aaaaand I finally got your error...

DimitrisMantas · 2024-09-05T12:29:56Z

Ok, so basically what the torchmetrics guys are saying is that automatic logging is not supported for metric collections?

DimitrisMantas · 2024-09-05T13:45:43Z

@robmarkcole I can confirm the recommended approach yields consistent results.

adamjstewart · 2024-10-01T20:17:42Z

Sorry it's taken me so long to review. I was originally hung up on the hack required to support ClasswiseWrapper in log_dict, but I've gotten over that. If torchmetrics wants to make that easier in the future, great. But I also really want this feature, so let's not wait on that.

Only remaining concern is that the code required to loop over all metrics and averages actually makes the code more complicated and difficult to read than avoiding loops entirely. If we want to add new metrics in the future, it looks non-straightforward. I wonder if we can loop over averages only and still keep things simple.

I would also really like to see this done for ClassificationTask too so SemanticSegmentationTask doesn't have a different set of features or metrics.

robmarkcole · 2024-10-02T09:45:44Z

@adamjstewart please see above comment on using _epoch_end

robmarkcole added 3 commits June 19, 2024 09:13

Add average metrics

23fa1fb

Add average metrics

b7d8305

refactor: Rename metrics in SemanticSegmentationTask

b1526fa

github-actions bot added the trainers PyTorch Lightning trainers label Jun 19, 2024

Ruff format

341e272

DimitrisMantas reviewed Jun 20, 2024

torchgeo/trainers/segmentation.py

Use ignore_index

024feda

robmarkcole added 2 commits June 20, 2024 09:10

pass on_epoch

04cac59

on_epoch to train too

56f20fc

Disable on_step for train metrics

3d2b309

robmarkcole added 2 commits June 20, 2024 14:42

Merge branch 'main' into update-metrics

9af1493

ruff format

192c496

Merge branch 'main' into update-metrics

73b710f

robmarkcole added 2 commits June 23, 2024 06:29

Merge branch 'main' into update-metrics

8ce8c30

Merge branch 'main' into update-metrics

e4ed9fd

robmarkcole commented Jul 2, 2024

torchgeo/trainers/segmentation.py

robmarkcole added 4 commits July 8, 2024 09:19

Merge branch 'main' into update-metrics

d9c2688

Merge branch 'main' into update-metrics

400fae3

Merge branch 'main' into update-metrics

f4c793e

Merge branch 'main' into update-metrics

3b629ea

try torchmetrics==1.1.0

9e985e2

robmarkcole added 7 commits August 7, 2024 14:50

try torchmetrics==1.1.1

c773322

Merge branch 'main' into update-metrics

9d8c8e4

Use loop to generate metrics

e2640f5

Update

19187a9

Fix jaccard

a3f7ffe

fix dependencies delta

9a66442

fix pyproject

8381cb7

robmarkcole mentioned this pull request Aug 8, 2024

log_dict to support ClasswiseWrapper Lightning-AI/torchmetrics#2683

Closed

adamjstewart removed this from the 0.6.0 milestone Aug 27, 2024

adamjstewart added this to the 0.7.0 milestone Aug 27, 2024

robmarkcole commented Sep 5, 2024

torchgeo/trainers/segmentation.py

Merge branch 'main' into update-metrics

b5050ad

calebrob6 approved these changes Sep 28, 2024
