Here we will try to train a classifier to identify different rowing boat classes. Rowing has several boat classes, based on the number of people in the crew and whether each rower has two oars (sculling) or a single oar (sweep rowing). We will take a powerful pretrained computer vision model and adapt it for our purposes via TensorFlow.
First, let's collect an appropriate dataset of the different classes of boat. A Python package (https://github.com/hardikvasa/google-images-download) makes this simple.
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload() #class instantiation
search_words = "Rowing Single,Rowing Pair,Rowing Double,Rowing Four,Rowing Quad,Rowing Eight"
arguments = {"keywords":search_words,"limit":5,"print_urls":True} #dictionary of search arguments
response.download(arguments) #passing the arguments to the function
We can repeat this with different choices of keywords to expand our dataset. (It seems to be more effective to take the top hits for several different keywords than to take many hits for one broad category that the search engine misinterprets.) Manually checking some of the downloaded images reveals that many are associated with the wrong search term: a search for a double brings up a quad, for example. For retraining to be most effective, we want to remove or correct the mislabelled data. I do this manually, since there are not many images, but it does impose a bottleneck on the size of the dataset. Taking the top results from Google searches also tends to return many duplicate images, which further restricts the amount of usable data.
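To automate the extra searches, one option (a rough sketch, not part of the original pipeline) is to loop over some additional search phrases per class and merge the downloads into the canonical class folders. The phrase lists below are made up for illustration, and this assumes the package's default behaviour of saving each keyword's results under downloads/<keyword>/.
import os
import shutil
from google_images_download import google_images_download

# Hypothetical extra search phrases for each class -- purely illustrative
extra_searches = {
    "Rowing Double": ["double scull rowing", "2x rowing boat"],
    "Rowing Quad": ["quad scull rowing", "4x rowing boat"],
}
downloader = google_images_download.googleimagesdownload()
for boat_class, phrases in extra_searches.items():
    for phrase in phrases:
        # each search is saved under downloads/<phrase>/ by default
        downloader.download({"keywords": phrase, "limit": 5, "print_urls": False})
        src = os.path.join("downloads", phrase)
        dst = os.path.join("downloads", boat_class)
        os.makedirs(dst, exist_ok=True)
        for name in os.listdir(src):
            # prefix with the phrase to avoid filename collisions between searches
            shutil.move(os.path.join(src, name), os.path.join(dst, phrase + "_" + name))
        os.rmdir(src)  # remove the now-empty per-phrase folder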
Let's check how many images we managed to get hold of for each class.
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
boat_classes = [boat_class for boat_class in os.listdir('downloads') if os.path.isdir('downloads/'+boat_class)]
class_frequencies = [len([name for name in os.listdir('downloads/'+boat_class) if name.endswith('.jpg') or name.endswith('.png') ])
for boat_class in boat_classes]
print('We have a total of {} images'.format(sum(class_frequencies)))
plt.bar(boat_classes,class_frequencies)
plt.xticks(boat_classes, boat_classes, rotation='vertical')
plt.ylabel('# of examples')
plt.show()
This should be enough to give it a go at retraining the neural network model.
To retrain the neural network model, we use the script here: https://raw.githubusercontent.com/tensorflow/hub/r0.1/examples/image_retraining/retrain.py. The network we are using is an architecture called Inception V3, trained on a large dataset known as ImageNet. The model consists of many successive layers: early layers capture general features of an image, such as edges, while deeper layers capture fine details and textures. Nonlinear activations give the model the flexibility to approximate complex representations of the data.
Running the script above first passes our images through the network without the final layer and caches the result. We can then fine-tune the weights of the final layer alone; caching the intermediate results avoids repeatedly passing the images through the whole network. Since Inception V3 has already been trained on a large and varied dataset (ImageNet), the model already has a good representation of objects in images, and we can identify boat classes reasonably well by only slightly altering it.
os.system('python retrain.py --image_dir downloads/')
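For intuition about what retrain.py is doing, here is a rough sketch of the same idea written with tf.keras rather than the script itself: freeze a pretrained Inception V3 base (which outputs the 2048-dimensional bottleneck vector) and train only a new final softmax layer. This is not the script's actual code and the hyperparameters are placeholders; retrain.py additionally caches the bottleneck vectors to disk, whereas freezing the base here achieves the same effect of updating only the final layer.
import tensorflow as tf

num_classes = 6  # single, pair, double, four, quad, eight

# Pretrained Inception V3 without its final classification layer;
# global average pooling gives the 2048-dimensional bottleneck vector
base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet", pooling="avg")
base.trainable = False  # keep the pretrained weights fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # the only trainable layer
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(...) on images resized to 299x299 (and preprocessed with
# tf.keras.applications.inception_v3.preprocess_input) would then train just the final layer.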
Once the images have been converted to the representation given by the penultimate layer of the model (a 2048-dimensional vector), we can train the weights of the final layer via stochastic gradient descent. We define an objective function that measures the difference between the model's predictions and the known class labels for our images. In gradient descent, we evaluate the gradient of the objective function that we want to minimize and move in the direction of $-\nabla f$, the direction of steepest decrease of the function. That is, we update the weights via $$ w^{k+1} = w^k - \alpha\nabla f(w^k) $$ where $w$ are the parameters of the layer we want to learn, $f$ is our objective function and $\alpha$ is the learning rate, which describes how large a step we take in parameter space.

Under certain conditions on the function $f$ and the learning rate, gradient descent is guaranteed to converge to a local minimum of $f$. In practice, though, this requires evaluating the gradient on all of the available data at once, which is not computationally feasible. Instead we approximate the gradient by estimating it on a subset of our data (say, 100 images at a time). The noisy estimate of the gradient introduces some noise into the updates, but this noise turns out to have beneficial properties for the performance of stochastic gradient descent. Many more sophisticated variants of this optimization method have been developed, using ideas such as momentum to escape local minima and search for a global optimum. Some of these ideas are explained further here: https://distill.pub/2017/momentum/
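To make the update rule concrete, here is a minimal sketch of mini-batch stochastic gradient descent in numpy. It is not what the retraining script runs (TensorFlow's built-in optimizers handle this), and grad_f is a placeholder for whatever function returns the gradient of the objective on a batch.
import numpy as np

def sgd(w, X, y, grad_f, alpha=0.01, batch_size=100, epochs=10):
    """Minimise f by repeatedly taking the step w <- w - alpha * grad_f(w, batch)."""
    n = len(X)
    for _ in range(epochs):
        order = np.random.permutation(n)  # visit the data in a fresh random order each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            w = w - alpha * grad_f(w, X[batch], y[batch])  # noisy estimate of the full gradient
    return w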
To monitor the progress of training the model by stochastic gradient descent, we can use an interactive dashboard called TensorBoard.
os.system('tensorboard --logdir /tmp/retrain_logs')
Then navigating to localhost:6006 in a browser will allow us to visualise the progress of training the model.
The objective function we use here is the cross-entropy loss, defined as $H(p,q)=\operatorname{E}_{p}[-\log q]=-\sum_{x}p(x)\log q(x)$. We can relate this to the KL divergence via $H(p,q)=H(p)+D_{\mathrm{KL}}(p\|q)$, so for a fixed distribution $p$, minimizing the cross-entropy is equivalent to minimizing the KL divergence between $p$ and $q$. This encourages the distribution of predicted labels to be close to the distribution of true labels.
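As a tiny illustration (not part of the retraining script), when $p$ is a one-hot vector of true labels the cross-entropy reduces to the negative log-probability the model assigns to the correct class; the numbers below are made up.
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) log q(x); p is the true distribution, q the predicted one."""
    return -np.sum(p * np.log(q + eps))

p = np.array([0, 1, 0, 0, 0, 0])                    # true class: the second of six boat classes
q = np.array([0.05, 0.70, 0.10, 0.05, 0.05, 0.05])  # softmax output of the model
print(cross_entropy(p, q))                          # ~0.357, i.e. -log(0.70)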
We see from the graphs below that the objective decreases noisily while the accuracy, the proportion of images whose boat class is correctly predicted, increases. After 4000 steps of training, the accuracy plateaus at around 65% on the test set. (We split our data into a training set on which to fit the model and a held-out test set on which to check that the model generalises. The blue curves show metrics evaluated on the test data, the orange curves on the training data.)
from IPython.display import Image
Image(filename='cross_entropy_curve.png')
Image(filename='accuracy_curve.png')
Great, so we can identify the class of a boat 65% of the time. Most humans who have seen a rowing boat before could probably do a lot better than this, but it's a start.
Let's see some examples of images the model got right, and some it got wrong. First, this image of an Oxford college novice eight is incorrectly predicted to be a four. An eight is the second most likely boat class here, though, and a four is the next biggest sweep boat after an eight.
Image(filename='test_001.jpg')
!python label_image.py --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt --input_layer=Placeholder --output_layer=final_result --image=test_001.jpg
This image of the Kiwi pair is correctly labelled as a pair.
Image(filename='test_002.jpg')
!python label_image.py --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt --input_layer=Placeholder --output_layer=final_result --image=test_002.jpg
Finally, this image of Vicky Thornley is correctly predicted as a single.
Image(filename='test_003.jpg')
!python label_image.py --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt --input_layer=Placeholder --output_layer=final_result --image=test_003.jpg
So, if we did want to improve this model, we could find better ways to gather more labelled data and train the model again; the data seems to be the key restriction here. This project was just for a bit of fun, with bumps racing in the form of Summer Eights starting this week in Oxford. Since identifying boat classes is a task humans can already do very well, and there is little need to automate it, it's probably not worth the work required to reach higher accuracy, but it is cool to see how powerful these models are.
This project used a technique called transfer learning, and you can learn much more here: https://www.tensorflow.org/tutorials/image_retraining