Alex Face Detector
This project is a custom R-CNN for bounding box object detection.
There are 4 key components:
- Generate ground truths: Label the images
- Generate a dataset: Take ground truth labels and form a dataset from it
- Transfer learning: Start from the pre-existing ImageNet weights of VGG16 and customise the model to this dataset
- Predict: Use the trained model to make predictions on incoming images
How to set up the right environment to run the program
- Make sure you have Python installed. You can download it from https://www.python.org/downloads/
- How to get started with Python: https://www.python.org/about/gettingstarted/
- How to install your requirements from requirements.txt: https://stackoverflow.com/questions/7225900/how-can-i-install-packages-using-pip-according-to-the-requirements-txt-file-from
Okay, you're now set with the right environment, so let's get this show on the road!
When running the scripts, make your current working directory the one that readme.md (this file) is stored in. Psst, you can check using os.getcwd().
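A quick way to do that check before running anything is sketched below. The helper name `in_project_root` is made up for illustration and is not part of the project scripts. It only assumes that readme.md sits in the project root, as described above.

```python
import os
from pathlib import Path


def in_project_root(marker="readme.md"):
    """Return True if the current working directory contains the marker file."""
    # Compare names case-insensitively, since the file may be saved as README.md.
    names = {p.name.lower() for p in Path(os.getcwd()).iterdir()}
    return marker.lower() in names


if __name__ == "__main__":
    print("Current working directory:", os.getcwd())
    if not in_project_root():
        print("readme.md not found here; change directory before running the scripts.")
```

If the check fails, `os.chdir(...)` or a plain `cd` in your terminal will get you to the right folder.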
- Take a bunch of photos of your face and drop them into the path "1. Data Gen\1. Data"
- Run the python script "1. Data Gen\create_ground_truth_bounding_box.py". This is a labelling tool that lets you label your data.
- Assumption 1: Every image name is unique
- Assumption 2: There is only one object in a frame at a time
- Note 1: It will resize your images to fit the screen, so they will have a much lower resolution. This is so that I have to do less programming.
- Note 2: It will only run through those files that do not have a corresponding .csv file (these are assumed not to have been labelled yet). If you want to re-label them
- Look at the tutorial image below to label images.
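The "no corresponding .csv file" rule from Note 2 can be sketched roughly like this. This is not the labelling tool's actual code. It assumes that each label file sits next to its image and shares the image's name with a .csv extension, and the `IMAGE_EXTENSIONS` set and the function name are made up for illustration.

```python
from pathlib import Path

# Assumed set of image types; the real tool may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unlabelled_images(data_dir):
    """Return image paths in data_dir that have no same-named .csv file beside them."""
    unlabelled = []
    for path in sorted(Path(data_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip the .csv files themselves and anything else
        if not path.with_suffix(".csv").exists():
            unlabelled.append(path)
    return unlabelled
```

For example, `find_unlabelled_images(Path("1. Data Gen") / "1. Data")` would list the photos that still need a bounding box, under the assumptions above.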
- Now that all your images are labelled, we'll need to generate a dataset of smaller sections of each image to train the image classifier on. Run the python script "1. Data Gen\generate_data_set_labels.py" to select thousands of bounding boxes of interest using cv2's fast selective search function. Very few of these will be labelled as foreground (which I've specified as an IOU of 90%). This script will take a while to run because it's not optimised.
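For anyone unfamiliar with IOU (intersection over union), here is a minimal sketch of how a candidate region can be compared against a ground truth box. It is not taken from generate_data_set_labels.py. The (x1, y1, x2, y2) box format, the function names, and the choice of `>=` for the 90% threshold are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so boxes that do not overlap give no intersection.
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


def is_foreground(region, ground_truth, threshold=0.9):
    """True if the region overlaps the ground truth box by at least the threshold."""
    return iou(region, ground_truth) >= threshold
```

A threshold this strict means a region has to match the labelled box almost exactly, which is why so few regions end up as foreground.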
- Time for training the classifier! Run the python script "2. R-CNN\transfer_learning_classifier.py". Depending on your dataset size it could take a very long time: my dataset was 4000 images and it took me 5 hours to run.
- Note: Memory was an issue, so I've limited the dataset size to 4000 labelled images with a batch size of 8. This may need to change for your machine.
- Time to do some predictions! Everything should be trained and ready to go. Put some more unseen images into "1. Data Gen\2. Data for Predictions" and it will loop through them and try to do some predictions. Heads up, it takes quite a while to run through the thousands of regions of interest generated by cv2's selective search function.
- Note: The output of each prediction is saved in "1. Data Gen/3. Data From Predictions" as a pickle file (.p file extension).
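To look at the saved predictions afterwards, something like the sketch below should work. The function name `load_predictions` is made up for illustration, and the structure of each pickled object is not documented here, so inspect what comes back before relying on it.

```python
import pickle
from pathlib import Path


def load_predictions(output_dir):
    """Load every .p file in output_dir into a dict keyed by file name (without extension)."""
    results = {}
    for path in sorted(Path(output_dir).glob("*.p")):
        with path.open("rb") as f:
            results[path.stem] = pickle.load(f)
    return results
```

For example, `load_predictions(Path("1. Data Gen") / "3. Data From Predictions")`. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.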
There you have it folks!