July 2020
tl;dr: Follow-up work to PointTrack for multi-object tracking and segmentation (MOTS).
Three main contributions:
- Use the semantic segmentation map as the seed map in PointTrack and SpatialEmbedding.
- Copy-and-paste data augmentation for crowded scenes. This requires segmentation masks.
- Training instance embedding:
	- In PointTrack, a training sample consists of D track ids, each with three crops taken at equal temporal spacing S. The three crops are not taken from consecutive frames, in order to increase the intra-track-id discrepancy. S is randomly chosen between 1 and 10.
	- PointTrack++ finds that training the environment embedding does not converge when S>2, whereas a large S (~12) helps the foreground 2D point cloud embedding reach higher performance. The embeddings are therefore trained separately. The individual MLP weights are then frozen, and a new MLP is trained to aggregate the information.
- The image is upsampled to twice its original size for better performance.
- Questions and notes on how to improve/revise the current work
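
The crop sampling described under "Training instance embedding" can be sketched roughly as below. This is a minimal illustration, not the authors' code: the function name `sample_track_crops`, its parameters, and the choice to measure the spacing S in positions within the track's own frame list (which assumes the track is visible in contiguous frames) are all assumptions made for this note.

```python
import random


def sample_track_crops(track_frames, spacing_range=(1, 10), num_crops=3, rng=None):
    """Pick `num_crops` frames of one track id, equally spaced by S positions.

    track_frames: sorted frame indices in which the track id is visible
                  (hypothetical input format, assumed contiguous).
    spacing_range: inclusive (min, max) range from which S is drawn.
    Returns (S, chosen_frames), or None if the track is too short.
    """
    rng = rng or random
    lo, hi = spacing_range
    # Largest spacing that still fits all crops inside this track.
    max_fit = (len(track_frames) - 1) // (num_crops - 1)
    hi = min(hi, max_fit)
    if hi < lo:
        return None
    s = rng.randint(lo, hi)
    # Random start so that the last crop still falls inside the track.
    start = rng.randint(0, len(track_frames) - 1 - s * (num_crops - 1))
    return s, [track_frames[start + k * s] for k in range(num_crops)]
```

Under these assumptions, the separate training regimes would differ only in `spacing_range`: a small range such as `(1, 2)` for the environment embedding, and a range around 12 for the foreground point cloud embedding.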