The performance of end-to-end fine-tuning #56
Hi, it's me again :)
Hi @JacobYuan7,
We did attempt fine-tuning the whole pipeline end-to-end, but found that it decreased performance. This is most likely due to the data processing in the interaction head, where we filter out detected humans and objects with scores below a certain threshold. Since most images in the dataset contain very few salient objects, there are usually fewer than five detections left to feed into the interaction head. As a result, the gradient flowing back to the backbone detector is very noisy.
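Concretely, the filtering step amounts to something like the sketch below (the threshold value and names are illustrative, not the exact ones used in the code):

```python
import torch

def filter_detections(boxes, scores, labels, score_thresh=0.2):
    """Keep only detections whose confidence exceeds the threshold.

    boxes:  (N, 4) box coordinates
    scores: (N,)   detection confidences
    labels: (N,)   class indices
    """
    keep = scores >= score_thresh
    return boxes[keep], scores[keep], labels[keep]

# Toy example with random detections; in practice, most images yield
# fewer than five detections above the threshold, so the gradient that
# reaches the backbone detector is computed from very few samples.
boxes = torch.rand(100, 4)
scores = torch.rand(100)
labels = torch.randint(0, 80, (100,))
kept_boxes, kept_scores, kept_labels = filter_detections(boxes, scores, labels)
```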
As a side note, whether a model is two-stage or one-stage does not depend on the training procedure but on the model architecture. Our model is two-stage because it first detects humans and objects and then pairs them exhaustively. A one-stage model usually does not have an explicit representation for the individual detections and generates detected pairs directly.

Hope that answers your question.

Fred.
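In case it helps, here is a rough sketch of what the exhaustive pairing step looks like (illustrative only; the names do not correspond to the actual code):

```python
import torch

def exhaustive_pairs(labels, human_class=0):
    """Pair every detected human with every other detection.

    labels: (N,) class indices of the filtered detections
    Returns a list of (human_index, object_index) tuples, which the
    interaction head then scores.
    """
    human_idx = torch.nonzero(labels == human_class).squeeze(1).tolist()
    all_idx = list(range(len(labels)))
    return [(h, o) for h in human_idx for o in all_idx if h != o]

# Example: 2 humans among 5 detections give 2 * 4 = 8 candidate pairs.
labels = torch.tensor([0, 0, 17, 3, 42])
pairs = exhaustive_pairs(labels)
```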