Built and trained a convolutional neural network for end-to-end driving in a simulator, using TensorFlow and Keras.
(Note: the hyperlinks only work if you are on the homepage of this GitHub repo. If you are viewing it on “github.io”, you can get there by clicking “View the Project on GitHub” at the top.)
You can also build it by yourself from here.
Anaconda is used for managing my dependencies.
conda env create -f environment-gpu.yml
source activate evn-gpu
My computer setting is as follows:
Using the Udacity-provided simulator and my drive.py file, the car can be driven autonomously around the track in two steps:
(1) Launch the Udacity simulator, and enter AUTONOMOUS MODE.
(2) Drive the car by executing:
python drive.py model.h5
(1) Launch the Udacity simulator, and enter TRAINING MODE.
(2) Record your own manual driving sequences and save them as a CSV file.
(3) Train your model with saved sequences.
(4) Test your model in AUTONOMOUS MODE (following steps in 4).
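
The recorded sequences from step (2) can be read back with a few lines of Python. This is only a sketch: it assumes the log uses the column order center, left, right, steering, throttle, brake, speed, which is what the Udacity simulator normally writes, and the file names and values below are made up for illustration. `read_driving_log` is a hypothetical helper, not a function from model.py.

```python
import csv
import io

# Assumed column order of the simulator's CSV log.
COLUMNS = ["center", "left", "right", "steering", "throttle", "brake", "speed"]


def read_driving_log(csv_file):
    """Parse a driving log into a list of dicts, one per recorded frame."""
    rows = []
    for record in csv.reader(csv_file):
        if not record or record[0].strip().lower() == "center":
            continue  # skip blank lines and an optional header row
        row = dict(zip(COLUMNS, (field.strip() for field in record)))
        # Numeric columns are stored as text in the CSV; convert them.
        for key in ("steering", "throttle", "brake", "speed"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


# Made-up sample rows standing in for a real log file.
SAMPLE_LOG = io.StringIO(
    "IMG/center_001.jpg, IMG/left_001.jpg, IMG/right_001.jpg, 0.0, 0.9, 0.0, 30.1\n"
    "IMG/center_002.jpg, IMG/left_002.jpg, IMG/right_002.jpg, -0.15, 0.9, 0.0, 30.2\n"
)

log_rows = read_driving_log(SAMPLE_LOG)
print(len(log_rows), log_rows[1]["steering"])
```

With a real recording, the same function can be given an open file object for the saved CSV instead of the in-memory sample.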

My model is a convolutional neural network with 4 convolutional layers, each using 3x3 filters with depths ranging from 32 to 256, followed by 3 fully connected layers. The model includes ReLU layers to introduce nonlinearity (e.g. code line 145), and the data is normalized in the model using a Keras lambda layer (code line 143).
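
As an illustration of the normalization step, the function below maps 8-bit pixel values from [0, 255] to [-1, 1]. The exact formula used at code line 143 is not shown here, so this scaling is an assumption, and `normalize_pixels` is a hypothetical name.

```python
def normalize_pixels(x):
    """Scale pixel values from [0, 255] to [-1.0, 1.0].

    Only uses / and -, so it works on plain numbers and on
    array or tensor types that overload those operators.
    """
    return x / 127.5 - 1.0


# Quick check on plain numbers: black, mid-gray, white.
print([normalize_pixels(v) for v in (0, 127.5, 255)])
```

In Keras, a function like this can be wrapped in a `Lambda` layer placed first in the model, so the same normalization is applied during training and when drive.py feeds frames to the network.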
The model contains dropout layers in order to reduce overfitting (model.py lines 147 and 150). The model was trained and validated on different data sets to ensure that it was not overfitting (code lines 214-218). The model was tested by running it through the simulator and ensuring that the vehicle could stay on the track.
The model used an Adam optimizer, so the learning rate was not tuned manually (model.py line 225). The batch_size and nb_epoch are set in model.py (lines 163 and 244).
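
To see why Adam reduces the need for manual learning-rate tuning, here is a minimal sketch of a single Adam update for one scalar parameter, written in plain Python rather than with the Keras optimizer. The hyperparameter values are commonly used defaults and are assumptions, not values taken from model.py.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter.

    m and v are running averages of the gradient and squared gradient;
    t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# On the first step the update size is about lr regardless of gradient scale.
for grad in (0.01, 1.0, 100.0):
    new_param, _, _ = adam_step(param=0.0, grad=grad, m=0.0, v=0.0, t=1)
    print(grad, new_param)
```

Because the step is divided by the running gradient magnitude, gradients that differ by several orders of magnitude produce steps of similar size, which is why a single default learning rate often works without hand tuning.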
Training data was chosen to keep the vehicle driving on the road. There are three cameras (left, center, right) mounted on the front of the car, and I used all three cameras in training. This is because we need to handle the issue of recovering from being off-center. For details about how I created the training data, see the next section.
The simulator has three cameras: a center, right and left camera. One example is as follows:

To capture good driving behavior, I recorded two laps on track 1 using center lane driving. In the training stage, I use all three cameras as training inputs, because the model needs to learn how to recover when the car drifts off-center. How to achieve this:

In the simulator, we could also weave all over the road and turn recording on and off to record recovery driving. However, in a real car, that’s not really possible, or at least not legally. So, I decided not to record the vehicle recovering from the left and right sides of the road back to center.
Then, after a few tests with my network, I found that it didn’t perform well in sharp turns, so I recorded a few more examples of driving through turns for the network to learn from.





When we process the left and right camera images, we add a correction (+0.2 or -0.2) to their steering angles, because we only know the ground-truth steering angle for the center camera (as given by the Udacity simulator). This may introduce small errors in the steering angles for the left and right images, so I decided to use only the center camera in the validation data. Finally, I randomly shuffled the data set and put 30% of the data into a validation set (code line 214).
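
The correction and the center-only validation split described above can be sketched as follows. The sign convention (+0.2 for the left camera, -0.2 for the right) and the choice to split rows before expanding them into per-camera samples are assumptions; the function names and sample data are hypothetical and not taken from model.py.

```python
import random

CORRECTION = 0.2  # steering offset applied to the side cameras


def expand_cameras(rows):
    """Turn each log row into three (image_path, steering) training samples."""
    samples = []
    for row in rows:
        angle = row["steering"]
        samples.append((row["center"], angle))
        # Assumed convention: left camera steers back right (+), right camera left (-).
        samples.append((row["left"], angle + CORRECTION))
        samples.append((row["right"], angle - CORRECTION))
    return samples


def split_rows(rows, validation_fraction=0.3, seed=0):
    """Shuffle rows, then return (training samples, validation samples)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    val_rows, train_rows = shuffled[:n_val], shuffled[n_val:]
    # Validation uses only the center camera, whose angle is ground truth.
    validation = [(row["center"], row["steering"]) for row in val_rows]
    return expand_cameras(train_rows), validation


# Ten made-up rows standing in for a real driving log.
rows = [
    {"center": f"c{i}.jpg", "left": f"l{i}.jpg", "right": f"r{i}.jpg", "steering": 0.0}
    for i in range(10)
]
train, validation = split_rows(rows)
print(len(train), len(validation))
```

Splitting at the row level keeps all three views of a given frame on the same side of the split, so no validation frame has a near-duplicate side-camera image in the training set.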
I used this training data for training the model. The validation set helped determine whether the model was overfitting or underfitting. The ideal number of epochs was 4, as the validation loss stopped decreasing after that point. I used an Adam optimizer, so manually tuning the learning rate wasn’t necessary.
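
One way to read the ideal epoch count off a training run is to find the last epoch that still lowered the validation loss. The sketch below does that in plain Python; the loss values are invented for illustration and are not the results of this project, and `best_epoch` is a hypothetical helper.

```python
def best_epoch(val_losses, min_delta=0.0):
    """Return the 1-based epoch of the last meaningful improvement.

    An epoch counts as an improvement only if it lowers the best loss
    seen so far by more than min_delta.
    """
    best_loss = float("inf")
    best = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss = loss
            best = epoch
    return best


# Invented validation losses: improvement stalls after the fourth epoch.
example_losses = [0.031, 0.024, 0.020, 0.018, 0.018, 0.019]
print(best_epoch(example_losses))
```

Keras also offers an `EarlyStopping` callback that monitors `val_loss` and stops training automatically, which is an alternative to choosing the epoch count by hand.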
My first attempt was to use a convolutional neural network similar to LeNet. However, it didn’t perform well enough, with high loss in both training and validation. So, I took two approaches: (1) balancing the training data and (2) changing the model to resemble VGG net (configuration A).
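
The balancing step is not described in detail here. One common approach for steering data, shown below as an assumption rather than as the method used in model.py, is to keep only a fraction of the near-zero steering samples, since straight driving tends to dominate the recordings. The threshold, keep probability, and function name are all hypothetical.

```python
import random


def balance_samples(samples, zero_threshold=0.05, keep_prob=0.25, seed=0):
    """Randomly drop most samples whose steering angle is close to zero.

    samples is a list of (image_path, steering_angle) pairs.
    """
    rng = random.Random(seed)
    balanced = []
    for path, angle in samples:
        if abs(angle) < zero_threshold and rng.random() >= keep_prob:
            continue  # drop this near-straight sample
        balanced.append((path, angle))
    return balanced


# Made-up data: 80 straight-driving samples and 20 turning samples.
samples = [(f"s{i}.jpg", 0.0) for i in range(80)]
samples += [(f"t{i}.jpg", 0.3) for i in range(20)]
balanced = balance_samples(samples)
turning_kept = sum(1 for _, angle in balanced if angle != 0.0)
print(len(samples), len(balanced), turning_kept)
```

All turning samples survive, while only a random subset of the straight-driving samples is kept, so turns make up a larger share of what the network sees.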
My proposed model is derived from VGG and LeNet: it is more complex than LeNet but smaller than VGG. Later, I found that my model had a low mean squared error on the training set but a high mean squared error on the validation set, which implied that the model was overfitting. So, I added two dropout layers to the model and reduced the number of neurons in the FC layers. After that, both the training loss and the validation loss were small.
In the driving test, I found that the model works on track 1 and even on the unseen track 2 without leaving the road.