Wenzhao Qiu1, Shanmin Pang1 📧, Hao Zhang1, Jianwu Fang1, Jianru Xue1
1 Xi’an Jiaotong University
(📧) corresponding author
Accepted to IEEE Robotics and Automation Letters (RA-L)
High-Definition (HD) map construction is essential for autonomous driving to accurately understand the surrounding environment. In this paper, we propose a Tightly Coupled temporal fusion Map Network (TICMapNet). TICMapNet breaks down the fusion process into three sub-problems: PV feature alignment, BEV feature adjustment, and Query feature fusion. By doing so, we effectively integrate temporal information at different stages through three plug-and-play modules, using the proposed tightly coupled strategy. Our approach does not rely on camera extrinsic parameters, offering a new perspective for addressing the visual fusion challenge in the field of object detection. Experimental results demonstrate that TICMapNet significantly enhances the single-frame baseline and achieves impressive performance across multiple datasets.
Method | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
ours_2 | R50 | GKT | DQ | 24ep | 59.0 | config | model |
Method | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
ours_1 | R50 | GKT | VA | 10ep | 61.7 | config | model |
ours_2 | R50 | GKT | DQ | 10ep | 60.6 | config | model |
Method | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
ours_2[1] | R50 | GKT | DQ | 24ep | 28.3 | config | model |
ours_2[2] | R50 | GKT | DQ | 24ep | 32.9 | config | model |
Method | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
ours_2 | R50 | GKT | DQ | 24ep | 57.4 | config | model |
Method | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
ours_2 | R50 | GKT | VA | 10ep | 59.7 | config | model |
Notes:
ours_1 employs MapTR as a single-frame baseline, and ours_2 introduces Decoupled Query based on ours_1.
[1] A. Lilja, J. Fu, E. Stenborg, and L. Hammarstrand, "Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it," in CVPR 2024, pp. 22150–22159.
[2] T. Yuan, Y. Liu, Y. Wang, Y. Wang, and H. Zhao, "StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction," in WACV 2024, pp. 7341–7350.
TICMapNet is based on MapTR. It is also greatly inspired by the following outstanding contributions to the open-source community: BEVFormer, StreamMapNet, BEVFusion, GKT, and mmdetection3d.
If you find TICMapNet useful in your research, please consider citing it using the following BibTeX entry.
@ARTICLE{10740793,
author={Qiu, Wenzhao and Pang, Shanmin and Zhang, Hao and Fang, Jianwu and Xue, Jianru},
journal={IEEE Robotics and Automation Letters},
title={TICMapNet: A Tightly Coupled Temporal Fusion Pipeline for Vectorized HD Map Learning},
year={2024},
volume={},
number={},
pages={1-8},
keywords={Feature extraction;History;Cameras;Object detection;Encoding;Three-dimensional displays;Decoding;Pipelines;Visualization;Manuals;Vectorized HD map;Temporal fusion},
doi={10.1109/LRA.2024.3490384}
}