
Speeding up face recognition with oneshot regression-classification CNN


oneir0mancer/face_detection_recognition


Joint detection-identification with convolutional neural networks

This repo contains the code for my bachelor thesis on face detection and recognition. It explores ways of speeding up deep face recognition systems by replacing the traditional pipeline (detection -> alignment -> feature extraction -> classification) with a one-shot regression-classification framework.

Some results

Requirements

  • Python >= 3.6
  • numpy
  • PyTorch >= 1.0
  • dlib

Get started

  1. Download the FEI Face Database, the Caltech Faces 1999 dataset and the Georgia Tech face database, and organize your folders like this:
faces
├── fei
│   └── ...
├── caltech_faces
│   └── ...
└── gt_db
    └── ...
  2. Run preprocess.py. It generates labels (bounding-box coordinates and class labels) for your data; this may take some time.

  3. Run organize_data.py. It downscales the images (to 320 px by default) and splits them into a training set and a test set.
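The downscaling and splitting step can be sketched as follows. This is not the code of organize_data.py: the helper names are hypothetical, and the sketch assumes that boxes are stored as (x1, y1, x2, y2) pixel coordinates, that "320 px" refers to the longer image side, and that an 80/20 split is used (chosen only for illustration).

```python
import random


def downscale_size(width, height, target=320):
    """Return (new_width, new_height, scale) so the longer side equals `target`.

    Assumption: "320 px" means the longer side; organize_data.py may differ.
    """
    scale = target / max(width, height)
    return round(width * scale), round(height * scale), scale


def rescale_bbox(bbox, scale):
    """Scale an (x1, y1, x2, y2) box by the same factor as its image."""
    return tuple(round(c * scale) for c in bbox)


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` deterministically and split it into (train, test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]
```

For example, a 640x480 image becomes 320x240 (scale 0.5), and a box (200, 100, 440, 380) becomes (100, 50, 220, 190). Rescaling the boxes together with the images keeps the labels produced by preprocess.py valid.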

Alternatively, download the already preprocessed dataset that I used. It also contains images from Google for 9 more subjects.

Run train.py to train the network.

Network architecture

[Figure: network architecture scheme]

The network architecture is based on YOLO, with the DarkNet backbone replaced by the more widely used ResNet. Bounding-box attributes are predicted separately from class probabilities, and an additional layer produces face embeddings. This should make it easier to add losses such as ArcFace or triplet loss in the future.

Training

The training process is very similar to YOLO training, and the loss function is constructed in the same way.
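As in YOLO, each ground-truth face is assigned to the grid cell that contains its center. The sketch below follows a YOLOv1-style encoding. The function name, the (x1, y1, x2, y2) input format, the 10x10 grid (matching a 320 px input with a stride-32 backbone) and the choice of center offsets relative to the cell and sizes relative to the image are assumptions, not taken from this repo.

```python
import numpy as np


def encode_targets(boxes, labels, image_size=320, grid_size=10):
    """Build YOLO-style per-cell targets from absolute (x1, y1, x2, y2) boxes.

    Returns:
      loc: (S, S, 4) float array of (x_off, y_off, w, h); the center offsets
           are relative to the responsible cell, w and h to the image size
      obj: (S, S) float array, 1.0 where a cell is responsible for a face
      cls: (S, S) int array holding the class label, -1 for empty cells
    """
    S = grid_size
    cell = image_size / S
    loc = np.zeros((S, S, 4), dtype=np.float32)
    obj = np.zeros((S, S), dtype=np.float32)
    cls = np.full((S, S), -1, dtype=np.int64)
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        # Clamp so a center on the right/bottom border stays inside the grid.
        col = min(int(cx // cell), S - 1)
        row = min(int(cy // cell), S - 1)
        loc[row, col] = (cx / cell - col, cy / cell - row,
                         (x2 - x1) / image_size, (y2 - y1) / image_size)
        obj[row, col] = 1.0
        cls[row, col] = label
    return loc, obj, cls
```

For example, a face at (100, 50, 220, 190) has its center at (160, 120), so it is assigned to row 3, column 5, with cell offsets (0.0, 0.75) and relative size (0.375, 0.4375).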

[Figure: loss function]

The difference is that I use smooth L1 loss for localization, binary cross-entropy for confidence and cross-entropy for classification.
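The function below sketches a YOLO-style objective with these three substitutions. It assumes one predicted box per grid cell, targets that are already assigned to cells, and the YOLOv1 weights `lambda_coord = 5` and `lambda_noobj = 0.5`. The function name and tensor layout are hypothetical, and the repo's exact weighting may differ.

```python
import torch
import torch.nn.functional as F


def joint_loss(pred_loc, pred_conf, pred_cls, tgt_loc, tgt_obj, tgt_cls,
               lambda_coord=5.0, lambda_noobj=0.5):
    """Sketch of a YOLO-style loss: smooth L1 + BCE + cross-entropy.

    Shapes (N = batch, S = grid size, C = number of identities):
      pred_loc (N, S, S, 4), pred_conf (N, S, S) logits, pred_cls (N, S, S, C) logits
      tgt_loc  (N, S, S, 4), tgt_obj (N, S, S) in {0, 1}, tgt_cls (N, S, S) long
    """
    obj = tgt_obj.bool()
    # Localization: smooth L1, only on cells responsible for a face.
    loc_loss = F.smooth_l1_loss(pred_loc[obj], tgt_loc[obj], reduction="sum")
    # Confidence: BCE on every cell; empty cells are down-weighted as in YOLO.
    conf = F.binary_cross_entropy_with_logits(pred_conf, tgt_obj, reduction="none")
    conf_loss = conf[obj].sum() + lambda_noobj * conf[~obj].sum()
    # Classification: cross-entropy over identities, only on cells with a face.
    cls_loss = F.cross_entropy(pred_cls[obj], tgt_cls[obj], reduction="sum")
    # Average over the batch.
    return (lambda_coord * loc_loss + conf_loss + cls_loss) / pred_loc.shape[0]
```

Smooth L1 behaves like L2 for small errors and like L1 for large ones, so badly localized boxes early in training produce smaller gradients than they would with the squared error of the original YOLO loss.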

Results

The network was trained on Google Colab for 25 epochs with a batch size of 128.

It was compared against a traditional dlib + ResNet34 pipeline.

| Metric   | dlib + ResNet34 | Described net |
|----------|-----------------|---------------|
| Accuracy | 97.54%          | 98.15%        |
| Mean IoU | 79.82%          | 78.88%        |
| FPS      | 6.5             | 34.5          |

These numbers were obtained on a laptop GTX 1050 GPU, so FPS may vary on other hardware. Also, I don't believe dlib uses GPU acceleration, so a comparison against pipelines that use MTCNN for detection would likely be less impressive.
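For reference, the FPS figures correspond to roughly 154 ms versus 29 ms per frame, about a 5.3x speedup. The sketch below shows one common way to compute the IoU and FPS metrics. How the repo matches predictions to ground truth and times inference is not stated, so the helper names and the one-prediction-per-image pairing are assumptions. When timing GPU code with PyTorch, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously.

```python
import time


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pred_boxes, true_boxes):
    """Average IoU over (prediction, ground truth) pairs matched by position."""
    scores = [iou(p, t) for p, t in zip(pred_boxes, true_boxes)]
    return sum(scores) / len(scores)


def measure_fps(fn, frames):
    """Frames per second of `fn` applied to each frame, in wall-clock time."""
    start = time.perf_counter()
    for frame in frames:
        fn(frame)
    return len(frames) / (time.perf_counter() - start)
```

Two 10x10 boxes offset by 5 px in both directions overlap in a 5x5 square, giving an IoU of 25 / 175, about 0.14, which shows how quickly IoU drops with small localization errors.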
