
YOT

1. Overview

This project develops a neural network that tracks an object using YOLO V3 and an LSTM. YOLO V3 detects objects in each image, while the LSTM treats their locations across frames as historical data. Three models are trained and evaluated. YOTMCLS uses the coordinates and image features from YOLO V3 as input. YOTMPMO does not use image features; instead, the coordinates are converted into a probability map as input. YOTMMLP is designed so that Cx, Cy, W, and H are each fed to their own separate LSTM network.
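As a rough sketch of how these pieces fit together, the loop below runs YOLO V3 on each frame and feeds the detection into an LSTM tracker. The function and interface names here are hypothetical stand-ins, not the repository's actual modules.

```python
# A minimal sketch of the tracking loop. `yolo` and `tracker` are
# hypothetical callables standing in for the detector and the LSTM model.
def track(yolo, tracker, frames):
    hidden = None  # LSTM state carries the location history
    boxes = []
    for frame in frames:
        # YOLO V3 yields box coordinates (Cx, Cy, W, H) and image features
        coords, feature = yolo(frame)
        # The LSTM refines the location using the detection history
        pred, hidden = tracker(coords, feature, hidden)
        boxes.append(pred)
    return boxes
```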

1.1 YOTMCLS

The input of YOTMCLS consists of the image features and box coordinates taken from the YOLO output. The LSTM predicts the location of the object.

[Figure: YOTMCLS]
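A minimal PyTorch sketch of a YOTMCLS-style network is shown below. The class name, layer sizes, and tensor shapes are assumptions for illustration, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class YOTMCLSSketch(nn.Module):
    """Sketch: concatenate YOLO image features with box coordinates,
    run the sequence through an LSTM, and regress the next box.
    Sizes here are illustrative assumptions."""
    def __init__(self, feat_dim=128, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim + 4, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 4)  # predicts (Cx, Cy, W, H)

    def forward(self, features, coords, hidden=None):
        # features: (batch, seq, feat_dim); coords: (batch, seq, 4)
        x = torch.cat([features, coords], dim=-1)
        out, hidden = self.lstm(x, hidden)
        return self.head(out), hidden
```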

1.2 YOTMPMO

The coordinates are converted into a probability map and then fed to the LSTM. However, there is no direct way to convert a probability map back into coordinates, so this project proposes a way to convert the LSTM output into coordinates using the equation shown in the figure below.

[Figure: YOTMPMO]
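The project's exact reconversion equation is given in the figure. As one illustration of how such a reconversion can work, the sketch below recovers a coordinate from a probability map as the probability-weighted average of grid-cell centers (a soft-argmax). This is an assumption for illustration, not necessarily the formula used here.

```python
import torch

def map_to_coord(prob_map):
    """Recover a 1-D coordinate in [0, 1] from a probability map by
    taking the expected cell position (soft-argmax). Illustrative
    assumption; see the equation in the figure for the actual method.

    prob_map: (batch, grid) tensor of non-negative weights per cell.
    """
    grid = prob_map.size(-1)
    # Center of each grid cell, normalized to [0, 1]
    centers = (torch.arange(grid, dtype=prob_map.dtype) + 0.5) / grid
    weights = prob_map / prob_map.sum(dim=-1, keepdim=True)
    return (weights * centers).sum(dim=-1)
```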

1.3 YOTMMLP

The YOTMMLP model has four LSTMs, one for each coordinate, so Cx, Cy, W, and H are predicted independently. Because separating the coordinates makes the prediction model simpler, the performance is expected to improve.

[Figure: YOTMMLP]
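A PyTorch sketch of this per-coordinate design might look as follows; the class name and hidden size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class YOTMMLPSketch(nn.Module):
    """Sketch: one small LSTM per coordinate, so Cx, Cy, W, and H
    are predicted independently of each other."""
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstms = nn.ModuleList(
            [nn.LSTM(1, hidden_size, batch_first=True) for _ in range(4)])
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, 1) for _ in range(4)])

    def forward(self, coords):
        # coords: (batch, seq, 4) holding (Cx, Cy, W, H) per frame
        outs = []
        for i in range(4):
            out, _ = self.lstms[i](coords[..., i:i + 1])
            outs.append(self.heads[i](out))
        return torch.cat(outs, dim=-1)
```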

2. Prerequisites

Python 3.7
PyTorch 1.3

3. Dataset and Training

3.1 Dataset

To train YOT, 27 of the TB-100 sequences from http://cvlab.hanyang.ac.kr/tracker_benchmark/datasets.html are used.

3.2 Training

60% of the frames of each video clip are used to train the networks and 20% are used for validation. The IoU scores of the YOLO outputs on the training and validation sets are 0.641 and 0.646, respectively.
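For reference, the frame split and the IoU metric can be sketched as below. This is an illustration of the described setup, not the repository's actual code, and it assumes boxes are given as (Cx, Cy, W, H).

```python
def split_frames(num_frames):
    # 60% of the frames for training, the next 20% for validation,
    # leaving the remainder for testing (sketch of the described split).
    n_train = int(num_frames * 0.6)
    n_val = int(num_frames * 0.2)
    return (range(0, n_train),
            range(n_train, n_train + n_val),
            range(n_train + n_val, num_frames))

def iou(box_a, box_b):
    # IoU of two boxes given as (Cx, Cy, W, H).
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```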

3.3 Default value of coordinates

When YOLO V3 does not detect the object, its output defaults to (0, 0, 0, 0, 0). However, using Cx=0 and Cy=0 may introduce a bias, because (0, 0) is the top-left corner of the image. In this project, (0.5, 0.5, 0, 0, 0), the image center with zero size, is used as the default value for undetected objects.
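A short sketch of this substitution (the helper name is hypothetical):

```python
# Replace the all-zero output for an undetected object with the image
# center and zero size, as described above.
UNDETECTED = (0.5, 0.5, 0.0, 0.0, 0.0)

def fill_default(detection):
    # detection: (Cx, Cy, W, H, confidence) from YOLO V3
    return UNDETECTED if all(v == 0 for v in detection) else detection
```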

4. Test Results

4.1 YOTMCLS

This model does not show good performance. The image features seem to reduce performance due to the added complexity.

[Figure: YOTMCLS]

4.2 YOTMPMO

This model shows poor performance and suffers from overfitting.

[Figure: YOTMPMO]

4.3 YOTMMLP

With a hidden size of 64, YOTMMLP shows good performance.

[Figure: YOTMMLP]

Demo videos are available.

Video 1, Video 2

4.4 YOTMMLP with GT

The ground truth is also sequential data, so training with both the ground truth and the YOLO output is expected to improve performance. With an LSTM hidden size of 32, this model shows slightly better performance than YOTMMLP trained without ground truth.

[Figure: YOTMMLP with GT]

References

https://github.com/Guanghan/ROLO

https://github.com/eriklindernoren/PyTorch-YOLOv3
