Tracking Objects as Points

Simultaneous object detection and tracking using center points:

Tracking Objects as Points,
Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl,
arXiv technical report (arXiv 2004.01177)

@article{zhou2020tracking,
  title={Tracking Objects as Points},
  author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
  journal={arXiv:2004.01177},
  year={2020}
}

Contact: zhouxy@cs.utexas.edu. Any questions or discussion are welcome!

Abstract

Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. In this paper, we present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That's it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.3% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.

Features at a glance

  • One-sentence method summary: Our model takes the current frame, the previous frame, and a heatmap rendered from previous tracking results as input, and predicts the detection heatmap for the current frame together with the offset of each detected object to its center in the previous frame.

  • The model can be trained on still image datasets if videos are not available.

  • Easily extends to monocular 3D object tracking, multi-category tracking, and pose tracking.

  • State-of-the-art performance on MOT17, KITTI, and nuScenes monocular tracking benchmarks.
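
The one-sentence method summary above can be illustrated with a minimal NumPy sketch. This is not the repository's implementation: the function names `render_prior_heatmap` and `greedy_associate`, the fixed Gaussian `sigma`, the distance threshold `max_dist`, the offset sign convention, and the greedy nearest-center matching rule are all assumptions made for illustration. The first function renders previous-frame centers into a heatmap (the third model input); the second uses predicted offsets to link current detections to previous tracks.

```python
import numpy as np


def render_prior_heatmap(centers, height, width, sigma=2.0):
    """Render previous-frame centers (x, y) as Gaussian peaks in one heatmap."""
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for cx, cy in centers:
        peak = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        # Keep the strongest response at each pixel.
        heatmap = np.maximum(heatmap, peak.astype(np.float32))
    return heatmap


def greedy_associate(curr_centers, offsets, prev_tracks, max_dist=10.0):
    """Link current detections to previous tracks via predicted offsets.

    curr_centers: list of (x, y) centers detected in the current frame.
    offsets: list of (dx, dy); center + offset is taken as the predicted
        previous-frame position (this sign convention is an assumption).
    prev_tracks: dict {track_id: (x, y)} of centers from the previous frame.
    Returns one track id per detection; unmatched detections get new ids.
    """
    next_id = max(prev_tracks, default=-1) + 1
    unmatched = dict(prev_tracks)
    ids = []
    for (cx, cy), (dx, dy) in zip(curr_centers, offsets):
        px, py = cx + dx, cy + dy
        best_id, best_dist = None, max_dist
        for tid, (tx, ty) in unmatched.items():
            dist = float(np.hypot(px - tx, py - ty))
            if dist < best_dist:
                best_id, best_dist = tid, dist
        if best_id is None:
            # No previous track is close enough: start a new track.
            best_id = next_id
            next_id += 1
        else:
            del unmatched[best_id]
        ids.append(best_id)
    return ids


# Hypothetical data for illustration only.
prev_tracks = {0: (10.0, 10.0), 1: (30.0, 20.0)}
prior_hm = render_prior_heatmap(prev_tracks.values(), height=40, width=48)
curr_centers = [(33.0, 21.0), (12.0, 10.0), (5.0, 35.0)]
offsets = [(-3.0, -1.0), (-2.0, 0.0), (0.0, 0.0)]
track_ids = greedy_associate(curr_centers, offsets, prev_tracks)
print(prior_hm.shape, track_ids)
```

In this toy example the first two detections inherit the ids of the previous tracks their offsets point to, and the third, which has no nearby previous center, starts a new track.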

Main results

Pedestrian tracking on MOT17 test set

Detection | MOTA (%) | FPS
Public    | 61.4     | 22
Private   | 67.3     | 22

2D vehicle tracking on KITTI test set (with flip test)

MOTA (%) | FPS
89.44    | 15

3D tracking on nuScenes test set

AMOTA@0.2 (%) | AMOTA (%) | FPS
27.8          | 4.6       | 28
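
For readers unfamiliar with the MOTA metric reported for MOT17 and KITTI above: MOTA (multiple object tracking accuracy) is commonly defined as 1 minus the ratio of false negatives, false positives, and identity switches to the number of ground-truth objects. The sketch below computes it from hypothetical counts; the function name `mota` and all numbers are illustrative and are not taken from the benchmark results.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Hypothetical counts, for illustration only.
score = mota(false_negatives=200, false_positives=100, id_switches=20, num_gt=1000)
print(f"MOTA = {100 * score:.1f}%")
```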

Besides benchmark evaluation, we also provide models for 80-category tracking and pose tracking trained on COCO. See the sample visual results below (Video files from openpose and YOLO).

All models and details are available in our Model zoo.

Installation

Please refer to INSTALL.md for installation instructions.

Use CenterTrack

We support demos on an image folder, a video, or a webcam.

First, download the models (by default, nuscenes_3d_tracking for monocular 3D tracking, coco_tracking for 80-category detection, and coco_pose_tracking for pose tracking) from the Model zoo and put them in CenterTrack_ROOT/models/.

We provide a video clip from the nuScenes dataset in videos/nuscenes_mini.mp4. To test monocular 3D tracking on this video, run

python demo.py tracking,ddd --load_model ../models/nuScenes_3Dtracking.pth --dataset nuscenes --pre_hm --track_thresh 0.1 --demo ../videos/nuscenes_mini.mp4

If everything is set up correctly, you will see an output video like:

Similarly, for 80-category tracking on images or video, run:

python demo.py tracking --load_model ../models/coco_tracking.pth --demo /path/to/image/or/folder/or/video 

For the webcam demo, run:

python demo.py tracking --load_model ../models/coco_tracking.pth --demo webcam 

For monocular 3D tracking on your own input, run:

python demo.py tracking,ddd --load_model ../models/nuScenes_3Dtracking.pth --dataset nuscenes --pre_hm --track_thresh 0.1 --demo /path/to/image/or/folder/or/video/or/webcam

Similarly, for pose tracking, run:

python demo.py tracking,multi_pose --load_model ../models/coco_pose_tracking.pth --demo /path/to/image/or/folder/or/video/or/webcam

The result for the example images should look like:

You can add --debug 2 to visualize the heatmap and offset predictions.

To use CenterTrack in your own project:

import sys

import cv2  # OpenCV, used here only to read an image

# Placeholder path: point this at your CenterTrack checkout.
CENTERTRACK_PATH = '/path/to/CenterTrack/src/lib/'
sys.path.insert(0, CENTERTRACK_PATH)

from detectors.detector_factory import detector_factory
from opts import opts

MODEL_PATH = '/path/to/model'  # placeholder path to a downloaded model
TASK = 'tracking'  # or 'tracking,multi_pose' for pose tracking and 'tracking,ddd' for monocular 3D tracking
opt = opts().init('{} --load_model {}'.format(TASK, MODEL_PATH).split(' '))
detector = detector_factory[opt.task](opt)

# Images read with OpenCV, or frames decoded from a video, in temporal order.
images = [cv2.imread('/path/to/image')]
for img in images:
  ret = detector.run(img)['results']

Each ret is a list of dicts, one per tracked object: [{'bbox': [x1, y1, x2, y2], 'tracking_id': id, ...}]
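
As a usage sketch, the per-frame results can be grouped into per-object trajectories keyed by `tracking_id`. The helper name `collect_trajectories` and the sample data are hypothetical; only the `bbox` and `tracking_id` keys shown above are assumed to be present.

```python
from collections import defaultdict


def collect_trajectories(per_frame_results):
    """Group box centers by tracking_id across frames.

    per_frame_results: list (one entry per frame) of lists of dicts shaped
    like the `ret` described above.
    Returns {tracking_id: [(frame_idx, cx, cy), ...]}.
    """
    trajectories = defaultdict(list)
    for frame_idx, results in enumerate(per_frame_results):
        for obj in results:
            x1, y1, x2, y2 = obj['bbox']
            center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
            trajectories[obj['tracking_id']].append((frame_idx, *center))
    return dict(trajectories)


# Hypothetical results for two frames, standing in for detector output.
frames = [
    [{'bbox': [0, 0, 10, 20], 'tracking_id': 1}],
    [{'bbox': [2, 0, 12, 20], 'tracking_id': 1},
     {'bbox': [50, 50, 60, 70], 'tracking_id': 2}],
]
trajectories = collect_trajectories(frames)
print(trajectories)
```

In practice, `frames` would be built by appending each `ret` returned by `detector.run` inside the loop.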

Benchmark Evaluation and Training

After installation, follow the instructions in DATA.md to set up the datasets. Then check GETTING_STARTED.md to reproduce the results in the paper. We provide scripts for all the experiments in the experiments folder.

License

CenterTrack is built upon CenterNet. Both codebases are released under the MIT License. Some CenterNet code comes from third parties under different licenses; please check the CenterNet repo for details. In addition, this repo uses py-motmetrics for MOT evaluation and nuscenes-devkit for nuScenes evaluation and preprocessing. See NOTICE for details. Please note the license of each dataset: most of the datasets used in this project are under non-commercial licenses.