Train a YOLOX model to detect home amenities from images.
Significance: Small model size, Fast training and inference time, High accuracy (from the model's paper)
Dminity is a final year university project supervised by Dr Azlin Ahmad.
OpenImageV6 with 30 labels for object detection task:
Toilet, Swimming_pool, Bed, Billiard_table, Sink, Fountain, Oven, Ceiling_fan,
Television, Microwave_oven, Gas_stove, Refrigerator,
Kitchen_&_dining_room_table, Washing_machine, Bathtub, Stairs, Fireplace,
Pillow, Mirror, Shower, Couch, Countertop, Coffeemaker, Dishwasher, Sofa_bed,
Tree_house, Towel, Porch, Wine_rack, Jacuzzi
- Start small: we start with the end-to-end process with only 1 label first (Bathtub).
  - Download OpenImageV6 for class Bathtub (use this notebook)
  - Upload to Roboflow (TODO: YOLOX requires what image size?)
  - Export as link
  - Use the Roboflow YOLOX training notebook to train the model
  - Download the model and run inference
  - Set up either Weights & Biases or Tensorboard (edit the original YOLOX training notebook)
  - Set up the experiment environment with 3 variants of the YOLOX model (yolo-s, yolo-m, yolox)
- Experiment with 3 YOLOX size variants with 200 images per class for all 30 classes (for 30 epochs)
  - Record the experiment results and draw conclusions from each variant's pros and cons
- Train the selected variant for 300 epochs on the base dataset
- Train the selected variant for 300 epochs on the dataset after applying bag-of-freebies (from the YOLOv4 paper)
- Create a demo application (https://huggingface.co/spaces/Dolpheyn/dminity)
2021-08-29: Environment setup
Created a notebook Download Custom OpenImage Dataset and Upload to Google Drive.
Uploaded custom dataset with 1 class -- Bathtub -- to roboflow.
The notebook for training YOLOX with roboflow requires Pascal VOC export format.
Here it says that YOLOX needs a 640x640 input size. YOLOX-Tiny and YOLOX-Nano need 416x416.
Tomorrow:
- Download the 10 classes dataset and upload to drive
- Resize the Bathtub dataset to what YOLOX requires and continue with training
- Set up for experimentation
2021-08-31: Trained with 1 class and set up for experimentation
Trained using the notebook with the bathtub dataset. Confirmed that 640x640 is the correct input size for the model.
The eval cell doesn't work: it fails with a division-by-zero error (the zero is the number of eval images, n_samples). However, the list of eval images in /content/YOLOX/datasets/VOC2012/ImageSets/Main/val.txt does have a lot of items. TODO: look into the evaluator dataset loader script.
Note: training and testing were successful even though they use the same dataloader.
File "/content/YOLOX/yolox/evaluators/voc_evaluator.py", line 167, in evaluate_prediction
a_infer_time = 1000 * inference_time / (n_samples * self.dataloader.batch_size)
│ │ │ │ └ 64
│ │ │ └ <torch.utils.data.dataloader.DataLoader object at 0x7fc4a27cdad0>
│ │ └ <yolox.evaluators.voc_evaluator.VOCEvaluator object at 0x7fc4a27cd8d0>
│ └ 0.0
└ 0.0
ZeroDivisionError: float division by zero
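A minimal sketch of how the failing expression could be guarded so the eval cell reports "nothing evaluated" instead of crashing. The function name `average_inference_time_ms` is hypothetical and is not part of YOLOX; it only mirrors the arithmetic shown in the traceback and does not fix the underlying cause (an empty eval set).

```python
def average_inference_time_ms(inference_time, n_samples, batch_size):
    """Mean inference time per image in milliseconds.

    Returns None when no images were evaluated, instead of dividing by zero.
    Hypothetical helper: same formula as the line in the traceback.
    """
    n_images = n_samples * batch_size
    if n_images == 0:
        # Nothing was evaluated; the caller should check the val image list.
        return None
    return 1000 * inference_time / n_images


# Values from the traceback: inference_time=0.0, n_samples=0, batch_size=64
print(average_inference_time_ms(0.0, 0, 64))
```

A `None` result here is a signal to check why the evaluator's dataloader yields no samples.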
The first inference result (yolox-s):
For the experimentation, I saw somewhere in the trainer that it writes to a tensorboard SummaryWriter. If I can load it into tensorboard locally, I can see the result after training has finished for the day & make conclusions.
Path of the trainer: /content/YOLOX/yolox/core/trainer.py
On line 178:
    # Tensorboard logger
    if self.rank == 0:
        self.tblogger = SummaryWriter(self.file_name)
The Tensorboard events are stored in the experiment folder.
Path: /content/YOLOX_Outputs/<experiment_name>
Just zip the whole directory, mount Google Drive and copy it over. Then download it and launch tensorboard locally to see the experiment.
The trainer only writes the average precision though; I don't know if there is other useful information to get.
TODO: check what other information one can get from tensorboard.SummaryWriter
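The zip-and-copy step described above could be done from a notebook cell with the standard library. This is only a sketch: `zip_experiment` is a hypothetical helper, and the Drive mount path in the usage comment is an assumption about where Google Drive is mounted.

```python
import shutil
from pathlib import Path


def zip_experiment(output_dir, dest_dir):
    """Zip an experiment folder and place the archive in dest_dir.

    Hypothetical helper: the archive is named after the experiment folder
    and contains that folder as its top-level entry.
    """
    output_dir = Path(output_dir)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(
        base_name=str(dest_dir / output_dir.name),  # ".zip" is appended
        format="zip",
        root_dir=str(output_dir.parent),
        base_dir=output_dir.name,
    )
    return Path(archive)


# Usage (paths are assumptions; the Drive mount point may differ):
# zip_experiment("/content/YOLOX_Outputs/<experiment_name>", "/content/drive/MyDrive")
```

After downloading and unzipping, pointing `tensorboard --logdir` at the extracted folder should show the logged events.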
To experiment with other yolox variants:
- Download pretrained weights from the checkpoint storage
- Copy dataloaders from the yolo_s example into exps/default/<model_variant> to make the train script load Pascal VOC format datasets
- Train
- Zip and download outputs
- Watch output in tensorboard locally
Tensorboard with the training outputs of yolox-s and yolox-m for bathtub:
2021-09-02: Download 200 training images from each class
08:30 -- Tried uploading the first 10 classes to roboflow with no limit (16275 train, 1124 test, 280 validation). The browser kept running out of memory.
However, roboflow has an "add more images to a dataset" feature, so maybe we can upload 5 classes at a time, 30/5 = 6 times. That could take a day or two of downloading and uploading.
That's a problem for when we want to do the end-to-end process with all the data. For today, we are going to do 10 classes at a time with a limit of 200 images per class to get started with experimentation.
10:02 -- Uploading OID_lim200_01-10 to roboflow (classes 1 through 10, limited to 200 images for each class) and downloading OID_lim200_11-20 from OpenImage.
10:31 -- OID_lim200_01-10 had a problem (it didn't map the class code to the class name). Need to redownload. OID_lim200_11-20 works fine with a total of 2479 images.
12:41 -- Downloaded OID_lim200_01-10, OID_lim200_11-20 & OID_lim200_21-30 and created a new project on roboflow just for experimenting.
Here is the number of images in each 10-class bucket (limited to 200 images per class) that we need to upload:
╰─ ls OID_lim200_01-10/Dataset/*/*/*.jpg | wc -l
2825
╰─ ls OID_lim200_11-20/Dataset/*/*/*.jpg | wc -l
2479
╰─ ls OID_lim200_21-30/Dataset/*/*/*.jpg | wc -l
1719
And the size of each bucket:
╰─ du -hs OID_lim200_*
1019M OID_lim200_01-10
886M OID_lim200_11-20
718M OID_lim200_21-30
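The per-bucket image counts above could also be collected in one pass from Python. A minimal sketch, assuming the same `Dataset/*/*/*.jpg` layout used in the `ls` commands; `count_images` is a hypothetical helper name.

```python
from pathlib import Path


def count_images(root, pattern="OID_lim200_*"):
    """Count .jpg files per bucket directory under root.

    Assumes each bucket follows the Dataset/<a>/<b>/<file>.jpg layout
    that the ls commands above rely on.
    """
    counts = {}
    for bucket in sorted(Path(root).glob(pattern)):
        if bucket.is_dir():
            counts[bucket.name] = sum(1 for _ in bucket.glob("Dataset/*/*/*.jpg"))
    return counts


# Usage: run from the directory that contains the OID_lim200_* folders.
# for name, n in count_images(".").items():
#     print(name, n)
```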
14:17 -- Done for the day.
2021-09-07: Setup Repository
. dminity/
|__ dataset/
| |__ download-custom-openimage-dataset.ipynb
|__ train/
| |__ experiment-yolox-variants.ipynb
|__ deploy/ #TODO
2021-09-08: Explore YOLOX Code, create call graph viz
- Set up tensorboard (instead of weights and biases) in my fork of the YOLOX repo
If we give the --fp16 arg when executing train.py, it will adopt mixed precision training, which decreases memory usage and bandwidth and so speeds up training.
To show more things in tensorboard, add metrics to SummaryWriter (self.tblogger) in yolox/core/trainer.py, in Trainer::after_epoch.
2021-09-11: Explore YOLOX eval code
Found where to add more scalars to the tensorboard SummaryWriter (in Trainer::evaluate_and_save_model).
Need to know where to get all the missing metrics in order to write to tb every epoch.
Missing metrics:
- train/loss
- train/box_loss
- train/obj_loss
- train/cls_loss
- metrics/precision
- metrics/recall
- val/box_loss
- val/obj_loss
- val/cls_loss
- x/lr0
- x/lr1
- x/lr2
What are these losses?
box_loss is a loss that measures how "tight" the predicted bounding boxes are to the ground truth object (usually a regression loss: L1, smooth L1, etc.).
cls_loss is a loss that measures the correctness of the classification of each predicted bounding box: each box may contain an object class or "background". This is usually a cross-entropy loss.
Where can I get the metrics?
train/* and metrics/*: maybe from outputs = model()
hyperparameters (x/*): maybe from optimizer
TODO tomorrow: Output everything in outputs (in trainer::train_for_one_iter).
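To make the box_loss and cls_loss descriptions in this entry concrete, here is a small pure-Python sketch of the generic textbook forms: smooth L1 for box regression and cross-entropy for classification. This is an illustration only; YOLOX's own head may compute its losses differently, and the function names are hypothetical.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1: quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def box_loss(pred_box, true_box):
    """Mean smooth L1 over the coordinates of one box."""
    return sum(smooth_l1(p, t) for p, t in zip(pred_box, true_box)) / len(true_box)


def cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


# A box that is off by 2 on two coordinates, and an undecided 2-class prediction.
print(box_loss([0.0, 0.0, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0]))
print(cross_entropy([0.0, 0.0], 0))
```

The tighter the predicted box, the smaller the first value; the more confident and correct the class prediction, the smaller the second.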
2021-09-13: Continue with tensorboard
Output of YOLOX each training iter:
{
    'total_loss': tensor(13.5224, device='cuda:0', grad_fn=<AddBackward0>),
    'iou_loss': tensor(2.4550, device='cuda:0', grad_fn=<MulBackward0>),
    'l1_loss': 0.0,
    'conf_loss': tensor(7.1816, device='cuda:0', grad_fn=<DivBackward0>),
    'cls_loss': tensor(3.8858, device='cuda:0', grad_fn=<DivBackward0>),
    'num_fg': 5.926470588235294
}
This is the return value of yolox/models/yolo_head.py::forward():
    return loss, reg_weight * loss_iou, loss_obj, loss_cls, loss_l1, num_fg / max(num_gts, 1)
In yolox/models/yolox.py::forward() it is assigned like this:
    loss, iou_loss, conf_loss, cls_loss, l1_loss, num_fg = self.head(
        fpn_outs, targets, x
    )
So conf_loss from outputs is loss_obj from the head's outputs.
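Given an outputs dict like the one above, writing every entry to tensorboard could look roughly like this. It is a sketch, not the change made in the fork: `log_train_outputs` is a hypothetical helper, and `RecordingWriter` is a stand-in so the example runs without tensorboard installed. It relies only on the writer having an `add_scalar(tag, scalar_value, global_step)` method, which `SummaryWriter` provides, and assumes each value is a Python number or a single-element tensor.

```python
class RecordingWriter:
    """Stand-in for SummaryWriter that just records add_scalar calls."""

    def __init__(self):
        self.records = []

    def add_scalar(self, tag, scalar_value, global_step=None):
        self.records.append((tag, scalar_value, global_step))


def log_train_outputs(writer, outputs, step, prefix="train"):
    """Write each entry of the model outputs as a scalar under prefix/name."""
    for name, value in outputs.items():
        # float() also works on single-element tensors.
        writer.add_scalar(f"{prefix}/{name}", float(value), step)


# Plain-float copy of the values printed above.
outputs = {
    "total_loss": 13.5224,
    "iou_loss": 2.4550,
    "l1_loss": 0.0,
    "conf_loss": 7.1816,
    "cls_loss": 3.8858,
    "num_fg": 5.926470588235294,
}
writer = RecordingWriter()
log_train_outputs(writer, outputs, step=0)
for record in writer.records:
    print(record)
```

In the trainer, `self.tblogger` would take the place of `RecordingWriter`, with the current iteration as `step`.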
2021-09-14
Setup validation
Created a script to list 1 img path for each class from the validation set.
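A rough sketch of what such a script might look like, not the actual one. It assumes the validation set is laid out as one sub-folder per class containing .jpg files, which may not match the real layout; `one_image_per_class` is a hypothetical name.

```python
from pathlib import Path


def one_image_per_class(validation_root):
    """Return {class_name: first image path} for each class folder.

    Assumes validation_root/<class_name>/*.jpg; classes with no images
    are skipped.
    """
    picks = {}
    for class_dir in sorted(Path(validation_root).iterdir()):
        if not class_dir.is_dir():
            continue
        images = sorted(class_dir.glob("*.jpg"))
        if images:
            picks[class_dir.name] = images[0]
    return picks


# Usage (the path is an assumption):
# for name, path in one_image_per_class("Dataset/validation").items():
#     print(name, path)
```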
Playing with batch size
A bigger batch size means fewer steps per epoch, since we only update the weights once per N images using the average loss over those N images, where N is the batch size.
This means that a larger batch size gives faster training, but one possible side effect is that the model becomes worse at generalizing.
One strategy I can think of: train with a larger batch size early on, for about 20 epochs, then switch to a smaller batch size and train for more epochs until the model performs better.
I still think I need wandb to monitor experiments; it also provides a convenient way of reloading and resuming checkpoints, since it lets us load past checkpoints from its artifacts.
Batch size to number of steps per epoch
Batch size 32 - 112 iters/epoch, mem: 9871, avg iter time: 2.655, average forward time: 17.31 ms, average NMS time: 4.25 ms, average inference time: 21.56 ms
Batch size 32 hit a CUDA out-of-memory error after the first epoch. Probably best to stick to a lower batch size. Batch size 64 didn't even get to the first epoch.
Batch size 16 - 223 iters/epoch, mem: 9799Mb, avg iter time: 2.085, average forward time: 18.54 ms, average NMS time: 4.79 ms, average inference time: 23.32 ms
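The relationship between batch size and iterations per epoch is just a ceiling division. A small sketch: the dataset size used here is illustrative only (the real image count is not recorded in this log) and was picked so that the result lines up with the 223 and 112 iters/epoch noted above. It also assumes the last partial batch is kept.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of iterations needed to see every image once."""
    return math.ceil(num_images / batch_size)


NUM_IMAGES = 3560  # illustrative value, not taken from the log

for batch_size in (16, 32):
    print(batch_size, steps_per_epoch(NUM_IMAGES, batch_size))
# 16 223
# 32 112
```

Halving the batch size roughly doubles the number of weight updates per epoch, which is the trade-off discussed above.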
2021-10-10 - 2021-10-16
Literature review for FYP paper
Wrote chapter 1: Introduction (background of study, problem statement, research questions & objectives, scope, significance)
2021-10-17 - 2021-10-23
Literature review for FYP paper
Read the YOLOX paper, extracted key points: the network architecture, performance, comparisons.
Also read the YOLO (v1) and SSD papers.
2021-10-24 - 2021-10-30
Literature review for FYP paper
Read the YOLOv2, RetinaNet and YOLOv3 papers (just some high-level point extraction; need to go through them again).