This project is part of the Udacity Self-Driving Car Engineer Nanodegree. It performs vehicle detection with bounding boxes in images and videos using "classical" computer vision techniques such as:
- Support Vector Machines (SVM)
- Histogram of Oriented Gradients (HOG)
- Spatial Binning
A pre-trained classifier is included in this repo, but to change the parameters you need to download training images such as the GTI vehicle image database and the KITTI vision dataset (object -> 2D object). Please unpack the images into a folder `train_images`, separated into `vehicles` and `non-vehicles`. An example video file can be downloaded here.
```bash
git clone https://github.com/jhallier/Vehicle-Detection
cd Vehicle-Detection
pip install -r requirements.txt
```
Run a sample visualization of the data and some transformations on the images:
```bash
python data_exploration.py
```
Run the vehicle detection pipeline on a video:
```bash
python main.py
```
The pipeline of the transformation is as follows:
- Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a linear SVM classifier
- Compute spatially binned color features and color histograms on the image and append them to the feature vector
- Run sliding windows of different sizes across the image and generate a heatmap of each positive result
- Threshold the heatmap to suppress false positives and merge contiguous heatmap regions into one detection with a bounding box
- Run the pipeline on a video and save the output video with the resulting bounding boxes drawn on each frame
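The last two heatmap steps can be sketched as follows. This is a minimal illustration using `scipy.ndimage.label`; `heatmap_to_boxes` and its threshold are hypothetical names, not the repo's actual functions.

```python
import numpy as np
from scipy.ndimage import label

def heatmap_to_boxes(heatmap, threshold=2):
    """Zero out weak detections, then merge each connected hot
    region into a single bounding box ((x_min, y_min), (x_max, y_max))."""
    heatmap = np.copy(heatmap)
    heatmap[heatmap <= threshold] = 0          # suppress false positives
    labeled, n_regions = label(heatmap)        # connected-component labeling
    boxes = []
    for region in range(1, n_regions + 1):
        ys, xs = np.nonzero(labeled == region)
        boxes.append(((xs.min(), ys.min()), (xs.max(), ys.max())))
    return boxes
```

Each sliding-window hit adds heat to its window region, so overlapping detections accumulate and survive the threshold while isolated false positives do not.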
With the function `extract_features_path` from `support_functions.py`, the images are read in one by one and a feature vector is extracted from each. First, spatial binning is performed (if the corresponding flag is set), then a color histogram is computed, and finally HOG features are extracted with the `get_hog_features` function.
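The per-image feature extraction could look roughly like the sketch below. The function name, parameter defaults, and the stride-based downsampling are illustrative assumptions, not the actual `support_functions.py` implementation; the image is assumed to be 64x64x3 and already converted to the chosen color space (e.g. YCrCb).

```python
import numpy as np
from skimage.feature import hog

def single_img_features(img, hist_bins=32, orient=9,
                        pix_per_cell=8, cell_per_block=2):
    """Illustrative feature vector: spatial binning, per-channel
    color histograms, then per-channel HOG features."""
    features = []
    # Spatial binning: crude 2x downsample to 32x32, then flatten
    features.append(img[::2, ::2].ravel().astype(np.float64))
    for ch in range(3):
        # Color histogram of each channel
        hist, _ = np.histogram(img[:, :, ch], bins=hist_bins, range=(0, 256))
        features.append(hist.astype(np.float64))
    for ch in range(3):
        # HOG features of each channel, flattened to 1-D
        features.append(hog(img[:, :, ch], orientations=orient,
                            pixels_per_cell=(pix_per_cell, pix_per_cell),
                            cells_per_block=(cell_per_block, cell_per_block),
                            feature_vector=True))
    return np.concatenate(features)
```

For a 64x64 image with these parameters, the spatial part contributes 32*32*3 = 3072 values, the histograms 3*32 = 96, and HOG 3*1764 = 5292, for 8460 features in total.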
Below is an example of a car and a non-car image from the dataset.
From these car and non-car images, the color space is transformed and a HOG transformation is applied to each channel. HOG computes the dominant gradient direction in small cells of the image. The gradient directions are discretized into n orientation bins. For example, with 9 orientations, the dominant gradient direction is resolved in steps of 20 degrees over the unsigned 0-180 degree range. This allows a generalization of a car image: the shape may differ a little from image to image, but the rough proportions of a car are similar and distinct from non-car images.
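The HOG images shown below can be produced with `skimage.feature.hog` and its `visualize` flag; this snippet is a sketch of that call, not the repo's own wrapper, and uses a random single-channel image as a stand-in.

```python
import numpy as np
from skimage.feature import hog

# One color channel of a 64x64 training image (random stand-in here)
channel = np.random.randint(0, 255, (64, 64)).astype(np.float64)

# feature_vector=True flattens the block-normalized histograms;
# visualize=True additionally returns a renderable HOG image
hog_feat, hog_image = hog(channel, orientations=9,
                          pixels_per_cell=(8, 8),
                          cells_per_block=(2, 2),
                          visualize=True, feature_vector=True)
```

With 8x8 cells and 2x2-cell blocks on a 64x64 channel there are 7x7 block positions, so the feature vector has 7*7*2*2*9 = 1764 entries.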
Below is an example in the YCrCb color space, which shows that the three channels produce slightly different HOG images, but all are distinctly different from the non-car image. Channel 1 seems most informative for the general shape of the car, and channel 3 for the color values, which can be used in spatial color binning.
As a comparison, here are the HOG transformations for the HLS color space, where channel 2 still captures some of the shape, but the other channels are inferior to YCrCb.
A linear SVM was trained using all three HOG channels combined, spatially binned color features of size 32x32, and color histograms with 32 bins. The feature vector is normalized with sklearn's `StandardScaler`, which has the advantage that the same normalization can later be applied to frames from the video stream. 20% of the images are held out as a test set, and a linear SVM is trained to classify cars vs. non-cars.
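The training step described above can be sketched like this. `train_classifier` is a hypothetical helper, not the repo's actual code; it assumes car and non-car feature matrices have already been extracted.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def train_classifier(car_features, notcar_features):
    """Fit the scaler on the full feature matrix, hold out 20% as a
    test set, and train a linear SVM on the scaled features."""
    X = np.vstack((car_features, notcar_features)).astype(np.float64)
    y = np.hstack((np.ones(len(car_features)),
                   np.zeros(len(notcar_features))))
    # Keep the fitted scaler: the same transform is applied to video frames
    scaler = StandardScaler().fit(X)
    X_scaled = scaler.transform(X)
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=0.2, random_state=42)
    svc = LinearSVC()
    svc.fit(X_train, y_train)
    return svc, scaler, svc.score(X_test, y_test)
```

Returning the fitted `StandardScaler` alongside the classifier is what allows the video pipeline to normalize new frames identically to the training data.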