The performance of end-to-end fine-tuning #56
Hi, it's me again :)
Hi @JacobYuan7,
We did attempt fine-tuning the whole pipeline end-to-end, but found that it decreased performance. This is most likely due to the data processing in the interaction head, where we filter out detected humans and objects with scores below a certain threshold. Since most images in the dataset contain very few salient objects, there are usually fewer than five detections left to feed into the interaction head. As a result, the gradient flowing back to the backbone detector is very noisy.
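Concretely, the filtering step amounts to something like the sketch below (the threshold value and names are illustrative, not the exact ones used in the code):

```python
import torch

def filter_detections(boxes, scores, labels, score_thresh=0.2):
    """Keep only detections whose confidence exceeds the threshold.

    boxes:  (N, 4) box coordinates
    scores: (N,)   detection confidences
    labels: (N,)   class indices
    """
    keep = scores >= score_thresh
    return boxes[keep], scores[keep], labels[keep]

# Toy example with random detections; in practice, most images yield
# fewer than five detections above the threshold, so the gradient that
# reaches the backbone detector is computed from very few samples.
boxes = torch.rand(100, 4)
scores = torch.rand(100)
labels = torch.randint(0, 80, (100,))
kept_boxes, kept_scores, kept_labels = filter_detections(boxes, scores, labels)
```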
As a side note, whether a model is two-stage or one-stage does not depend on the training procedure but on the model architecture. Our model is two-stage because it first detects humans and objects and then pairs them exhaustively. A one-stage model usually does not have an explicit representation for the individual detections and generates detected pairs directly.

Hope that answers your question.

Fred.
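In case it helps, here is a rough sketch of what the exhaustive pairing step looks like (illustrative only; the names do not correspond to the actual code):

```python
import torch

def exhaustive_pairs(labels, human_class=0):
    """Pair every detected human with every other detection.

    labels: (N,) class indices of the filtered detections
    Returns a list of (human_index, object_index) tuples, which the
    interaction head then scores.
    """
    human_idx = torch.nonzero(labels == human_class).squeeze(1).tolist()
    all_idx = list(range(len(labels)))
    return [(h, o) for h in human_idx for o in all_idx if h != o]

# Example: 2 humans among 5 detections give 2 * 4 = 8 candidate pairs.
labels = torch.tensor([0, 0, 17, 3, 42])
pairs = exhaustive_pairs(labels)
```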