TrackNet is a deep learning network for tracking high-speed, tiny objects, developed at National Chiao-Tung University in Taiwan. It is a fully convolutional network (FCN) that adopts VGG16 to generate feature maps and DeconvNet to decode them using pixel-wise classification. TrackNet can take multiple consecutive frames as input, so the model learns not only to track the object but also its trajectory, which improves positioning and recognition. TrackNet generates a Gaussian heat map centered on the ball to indicate the ball's position. Binary cross-entropy is used as the loss function to measure the difference between the predicted heat map and the ground truth.
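To make the heat map target and the loss concrete, here is a minimal pure-Python sketch. The function names, the tiny 16 x 9 grid and the `sigma` value are illustrative assumptions and are not taken from the TrackNet code; the real model works on full-size frames with a tensor library.

```python
import math

def gaussian_heat_map(width, height, cx, cy, sigma):
    """Return a height x width grid with a Gaussian peak of 1.0 at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between two same-sized grids."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count

# Hypothetical ground truth: ball at column 8, row 4 (sigma is an assumed value).
target = gaussian_heat_map(width=16, height=9, cx=8, cy=4, sigma=1.5)
good = gaussian_heat_map(16, 9, 8, 4, 1.5)  # prediction at the right place
bad = gaussian_heat_map(16, 9, 2, 1, 1.5)   # prediction at the wrong place
loss_good = binary_cross_entropy(good, target)
loss_bad = binary_cross_entropy(bad, target)
```

A prediction whose peak sits on the true ball position gives a lower loss than one whose peak is elsewhere, which is the signal the network is trained on.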
Modification
1. Combine ResNet and U-Net to form network architecture.
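
| Layer | Filter size | Depth | Padding | Stride | Activation |
| --- | --- | --- | --- | --- | --- |
| conv1 | 3 x 3 | 64 | 2 | 1 | BN+Relu |
| conv2 | 3 x 3 | 64 | 2 | 1 | BN+Relu |
| resD_1 | - | 32 | - | - | BN+Relu |
| resE_1 x 2 | - | 32 | - | - | BN+Relu |
| resD_2 | - | 64 | - | - | BN+Relu |
| resE_2 x 2 | - | 64 | - | - | BN+Relu |
| resD_3 | - | 128 | - | - | BN+Relu |
| resE_3 x 3 | - | 128 | - | - | BN+Relu |
| resD_4 | - | 256 | - | - | BN+Relu |
| resE_4 x 2 | - | 256 | - | - | BN+Relu |
| resU_1 + concat | - | 128+128 | - | - | BN+Relu |
| resDE_5 x 3 | - | 128 | - | - | BN+Relu |
| resU_2 + concat | - | 64+64 | - | - | BN+Relu |
| resDE_6 x 2 | - | 64 | - | - | BN+Relu |
| resU_3 + concat | - | 32+32 | - | - | BN+Relu |
| resDE_7 x 2 | - | 32 | - | - | BN+Relu |
| resU_4 | - | 16 | - | - | BN+Relu |
| conv3 | 3 x 3 | 64 | 2 | 1 | BN+Relu |
| conv4 | 3 x 3 | 64 | 2 | 1 | BN+Relu |
| conv5 | 3 x 3 | 256 | 2 | 1 | BN+Relu+Softmax |
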
Structure of res-block-encoder (resE)

| Layer | Filter size | Depth | Padding | Stride | Activation |
| --- | --- | --- | --- | --- | --- |
| conv1 | 1 x 1 | n | 0 | 1 | BN+Relu |
| conv2 | 3 x 3 | n | 2 | 1 | BN+Relu |
| conv3 | 1 x 1 | 2n | 0 | 1 | BN+Relu |

Structure of res-block-downsampling (resD)

| Layer | Filter size | Depth | Padding | Stride | Activation |
| --- | --- | --- | --- | --- | --- |
| conv1 | 1 x 1 | n | 0 | 1 | BN+Relu |
| conv2 | 3 x 3 | n | 2 | 2 | BN+Relu |
| conv3 | 1 x 1 | 2n | 0 | 1 | BN+Relu |

Structure of res-block-decoder (resDE)

| Layer | Filter size | Depth | Padding | Stride | Activation |
| --- | --- | --- | --- | --- | --- |
| conv1 | 1 x 1 | n | 0 | 1 | BN+Relu |
| conv2 | 3 x 3 | n | 2 | 1 | BN+Relu |
| conv3 | 1 x 1 | n | 0 | 1 | BN+Relu |

Structure of res-block-upsampling (resU)

| Layer | Filter size | Depth | Padding | Stride | Activation |
| --- | --- | --- | --- | --- | --- |
| conv1 | 1 x 1 | n | 0 | 1 | BN+Relu |
| convT1 | 3 x 3 | n | 0 | 2 | BN+Relu |
| conv2 | 1 x 1 | n | 0 | 1 | BN+Relu |

2. Use Conv2DTranspose for upsampling in the decoder instead, matching the ResNet structure of the encoder.
3. Use focal loss to help the model focus more on the small ground-truth region.
4. Use 3 consecutive grayscale frames as the input image to reduce memory usage and increase training speed.
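The focal loss mentioned in item 3 can be sketched as follows. This is the commonly used binary form with a focusing exponent `gamma` and a class weight `alpha`; the function name and the default values are assumptions for illustration, and the repository's own loss may differ in form or constants.

```python
import math

def binary_focal_loss(pred, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over flat lists of probabilities and 0/1 labels.

    gamma and alpha are assumed defaults, not values taken from the repository.
    """
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        # Positive pixels: down-weighted by (1 - p)^gamma when already confident.
        pos = -alpha * t * (1.0 - p) ** gamma * math.log(p)
        # Negative pixels: down-weighted by p^gamma when already confident.
        neg = -(1.0 - alpha) * (1.0 - t) * p ** gamma * math.log(1.0 - p)
        total += pos + neg
    return total / len(pred)

# An easy background pixel (correctly predicted near 0) versus a hard ball
# pixel (ball present but predicted with low probability).
easy_background = binary_focal_loss([0.01], [0.0])
hard_ball = binary_focal_loss([0.1], [1.0])
```

Because well-classified background pixels contribute almost nothing, the few pixels that belong to the ball dominate the gradient, which is the motivation given in item 3.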
Training parameters

| Parameter | Value |
| --- | --- |
| Image size | 512 x 288 |
| Heat map ball radius | 2.5 pixels |
| Batch size | 2 |
| Learning rate | 1.0 |
| Epochs | 50 |
| Optimizer | Adadelta |
| Number of training images | ~20k |

Accuracy, Precision and Recall for test.mp4
TP, FP1, FP2, TN and FN are defined as follows:
TP: True positive, the distance between the predicted and ground-truth ball centers is smaller than 5 pixels.
FP1: False positive, the distance between the predicted and ground-truth ball centers is larger than 5 pixels.
FP2: False positive, the ball is in the prediction but not in the ground truth.
TN: True negative.
FN: False negative.
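The definitions above can be turned into a per-frame classifier. This is a sketch, not the repository's evaluation code: the function name is hypothetical, a missing ball is represented as `None`, TN and FN use their usual meanings (no ball in either, and a ball in the ground truth only), and a distance of exactly 5 pixels, which the definitions leave open, is counted as FP1.

```python
import math

def classify_frame(pred, truth, threshold=5.0):
    """Classify one frame; pred and truth are (x, y) tuples or None for no ball."""
    if truth is None:
        # No ball in the ground truth: any prediction is a false positive (FP2).
        return "TN" if pred is None else "FP2"
    if pred is None:
        # Ball in the ground truth but nothing predicted.
        return "FN"
    distance = math.hypot(pred[0] - truth[0], pred[1] - truth[1])
    return "TP" if distance < threshold else "FP1"

# Hypothetical frames: (prediction, ground truth)
labels = [
    classify_frame((10, 10), (12, 11)),  # about 2.2 pixels apart
    classify_frame((10, 10), (20, 20)),  # about 14.1 pixels apart
    classify_frame((1, 1), None),
    classify_frame(None, None),
    classify_frame(None, (3, 3)),
]
```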

| Metric | Formula | Value |
| --- | --- | --- |
| Accuracy | (TP+TN)/(TP+TN+FP1+FP2+FN) | 0.909 |
| Precision | TP/(TP+FP1+FP2) | 0.939 |
| Recall | TP/(TP+FN) | 0.953 |

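The three formulas translate directly into code. The function name is hypothetical and the counts below are made-up numbers chosen for easy arithmetic; they are not the counts behind the reported values for test.mp4.

```python
def tracking_metrics(tp, fp1, fp2, tn, fn):
    """Return (accuracy, precision, recall) using the formulas in this section."""
    accuracy = (tp + tn) / (tp + tn + fp1 + fp2 + fn)
    precision = tp / (tp + fp1 + fp2)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Illustrative counts only (100 frames in total).
accuracy, precision, recall = tracking_metrics(tp=80, fp1=3, fp2=2, tn=10, fn=5)
```

Note that both kinds of false positive count against precision, while only missed balls (FN) count against recall.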
Setup
Clone the repository: https://github.com/Chang-Chia-Chi/TrackNet-Badminton-Tracking-tensorflow2.git
Run pip3 install -r requirements.txt to install the required packages.
Because the model is built with the channels-first data format, it can be trained and tested on a GPU only.
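To illustrate what a channels-first input looks like for three consecutive grayscale frames, here is a small pure-Python sketch. The helper names and the tiny 2 x 4 frames are hypothetical, and the grayscale weights are the standard BT.601 luma coefficients, which may not match the conversion used in the repository.

```python
def to_gray(frame):
    """Convert an H x W grid of (r, g, b) pixels to an H x W grayscale grid."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]

def stack_channels_first(frames):
    """Stack grayscale frames into a nested list indexed [channel][row][col]."""
    return [to_gray(frame) for frame in frames]

# Three tiny hypothetical 2 x 4 RGB frames standing in for consecutive video frames.
frames = [[[(v, v, v)] * 4 for _ in range(2)] for v in (10, 20, 30)]
stacked = stack_channels_first(frames)
shape = (len(stacked), len(stacked[0]), len(stacked[0][0]))  # (channels, height, width)
```

With the training image size of 512 x 288, the same layout would give one sample of shape (3, 288, 512), with the channel axis first rather than last.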
Label
Run python imgLabel.py to open the program.
Mouse and keyboard events are described below:

| Mouse Event | Function |
| --- | --- |
| Left click | Label center of the ball |
| Middle click | Cancel label of the ball |

| Keyboard Event | Function |
| --- | --- |
| e | exit program |
| s | save csv |
| n | go to next frame |
| p | back to previous frame |
| f | go to first frame |
| l | go to last frame |
| > | fast forward 36 frames |
| < | fast backward 36 frames |

If you want to load a pre-labeled csv file, set load_csv in imgLabel.py to True.
After labeling all frames, press s to save the file and then press e to exit the program.