Created a vehicle detection pipeline with two approaches: (1) deep neural networks (YOLO framework) and (2) support vector machines (OpenCV + HOG).
Other dependencies are the same as in the Lane Departure Warning System repository:
Anaconda is used for managing my dependencies.
(1) Download weights for YOLO
You can download the weights from here and save them to the weights folder.
(2) If you want to run the demo, you can simply run:
python main.py
svm_pipeline.py contains the code for the SVM pipeline.
Steps:
The code for this step is contained in the function named extract_features and in lines 464 to 552 of svm_pipeline.py.
If a trained SVM classifier already exists, it is loaded directly.
Otherwise, I started by reading in all the vehicle and non-vehicle images, around 8000 images in each category. These datasets comprise images taken from the GTI vehicle image database and the KITTI vision benchmark suite.
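As a rough sketch, the two classes might be gathered like this (the folder names below are assumptions, not necessarily the repository's actual layout):

```python
import glob

# Hypothetical folder layout after unpacking the GTI/KITTI archives.
cars = glob.glob('vehicles/**/*.png', recursive=True)
notcars = glob.glob('non-vehicles/**/*.png', recursive=True)
print(len(cars), len(notcars))  # roughly 8000 images in each category
```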
Here is an example of one of each of the vehicle and non-vehicle classes:
I then explored different color spaces and different skimage.hog() parameters (orientations, pixels_per_cell, and cells_per_block). I grabbed random images from each of the two classes and displayed them to get a feel for what the skimage.hog() output looks like.
Here is an example using the RGB color space and HOG parameters of orientations=9, pixels_per_cell=(8, 8), and cells_per_block=(2, 2):
To optimize the HOG extraction, I extract the HOG features for the entire image only once, and the whole-image HOG array is saved for further processing (see lines 319 to 321 in svm_pipeline.py).
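A minimal sketch of this one-shot optimization, assuming one channel of the horizontal search band: with feature_vector=False the HOG array keeps its spatial layout, so each window reuses a sub-block instead of recomputing HOG.

```python
import numpy as np
from skimage.feature import hog

channel = np.random.rand(256, 1280)  # stand-in for one channel of the band

# Compute HOG once for the whole band, keeping the spatial block layout.
hog_whole = hog(channel, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2), feature_vector=False)

# A 64x64 window spans 64/8 - 1 = 7 blocks per side, so its features are
# just a sub-array of the precomputed result.
nblocks = 64 // 8 - 1
ypos, xpos = 5, 20  # block coordinates of an example window
window_features = hog_whole[ypos:ypos + nblocks,
                            xpos:xpos + nblocks].ravel()
```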
I tried various combinations of parameters and chose the final combination, which uses the YCrCb color space (see lines 16-27 in svm_pipeline.py).
All the features are normalized in lines 511 to 513 of svm_pipeline.py, which is a critical step; otherwise, the classifier may be biased toward the features with larger numeric ranges.
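A sketch of that normalization step using scikit-learn's StandardScaler (the feature lists below are random stand-ins with the same dimensionality as the HOG vectors above):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

car_features = [np.random.rand(1764) for _ in range(100)]     # stand-ins
notcar_features = [np.random.rand(1764) for _ in range(100)]

X = np.vstack((car_features, notcar_features)).astype(np.float64)
X_scaler = StandardScaler().fit(X)  # per-feature zero mean, unit variance
scaled_X = X_scaler.transform(X)
```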
I randomly select 20% of the images for testing and the rest for training, and a linear SVM is used as the classifier (see lines 520 to 531 in svm_pipeline.py).
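Continuing the sketch above, the split and training step might look like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Label cars 1 and non-cars 0, then hold out 20% of samples for testing.
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))
X_train, X_test, y_train, y_test = train_test_split(
    scaled_X, y, test_size=0.2, random_state=42)

svc = LinearSVC()
svc.fit(X_train, y_train)
print('Test accuracy:', svc.score(X_test, y_test))
```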
For this SVM-based approach, I use two scales of search window (64x64 and 128x128, see line 41) and search only between [400, 656] along the y axis (see line 32 in svm_pipeline.py). I choose 75% overlap for the search windows at each scale (see line 314 in svm_pipeline.py).
For every window, the SVM classifier is used to predict whether it contains a car or not. If it does, the window is saved (see lines 361 to 366 in svm_pipeline.py). In the end, a list of windows containing detected cars is obtained.
After obtaining this list of windows that may contain cars, a function named generate_heatmap (line 565 in svm_pipeline.py) is used to build a heatmap. A threshold is then applied to filter out false positives.
For a single image, we can directly use the filtered heatmap to create bounding boxes for the detected vehicles.
For video, we can further utilize neighbouring frames to filter out false positives and to smooth the positions of the bounding boxes.
scipy.ndimage.measurements.label() is used to identify individual blobs in the heatmap (see svm_pipeline.py), each blob corresponding to one detected vehicle.
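A sketch of this post-processing, continuing from the hot_windows list above (the threshold value is illustrative; the project's own version lives in generate_heatmap):

```python
import numpy as np
from scipy.ndimage.measurements import label

heatmap = np.zeros((720, 1280), dtype=np.float32)
for (x1, y1), (x2, y2) in hot_windows:
    heatmap[y1:y2, x1:x2] += 1           # each positive window votes once

heatmap[heatmap <= 2] = 0                # threshold away lone detections
labels, n_cars = label(heatmap)          # integer id per connected blob

bboxes = []
for car_id in range(1, n_cars + 1):
    ys, xs = np.nonzero(labels == car_id)
    bboxes.append(((xs.min(), ys.min()), (xs.max(), ys.max())))
```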
yolo_pipeline.py contains the code for the YOLO pipeline.
YOLO is an object detection pipeline based on a neural network. In contrast to prior work that repurposes classifiers to perform detection, YOLO frames object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.
Steps to use YOLO for detection:
yolo_pipeline.py is modified and integrated based on this TensorFlow implementation of YOLO.
Since the "car" class is known to YOLO, I use the precomputed weights directly and apply them to the entire input frame.
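As a hedged sketch of what feeding a full frame to YOLO involves, assuming the original YOLO input convention of fixed 448x448 inputs (the [-1, 1] scaling below is an assumption borrowed from common TensorFlow YOLO ports; the exact normalization in yolo_pipeline.py may differ):

```python
import cv2
import numpy as np

def yolo_preprocess(frame_bgr):
    # Resize the 1280x720 camera frame to YOLO's fixed 448x448 input size.
    img = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (448, 448))
    # Pixel scaling to [-1, 1] is an assumption; check the port's code.
    return img.astype(np.float32) / 255.0 * 2.0 - 1.0
```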
For the SVM-based approach, the accuracy is good, but the speed (2 fps) is a problem because the sliding-window approach is time-consuming. We could use image downsampling, multi-threading, or GPU processing to improve the speed, but a lot of engineering work would probably be needed to make it run in real time. Also, in this application, I limit the vertical search range to control the number of search windows and to avoid some false positives (e.g., cars detected in the trees).
For the YOLO-based approach, it achieves real-time performance and the accuracy is quite satisfactory. Only in some cases may it fail to detect a small, distant car. My intuition is that the original input image has a resolution of 1280x720 and needs to be downscaled to 448x448, so a distant car becomes tiny and probably quite distorted in the downscaled (448x448) image. In order to correctly identify distant cars, we might need to either crop the image instead of downscaling it directly, or retrain the network.