Transfer Learning with VGG16 model
Introduction
In the previous post (Part-1), we classified images of dog breeds using a model we created from scratch. That model predicted the correct breed with an accuracy of around 10%. With 133 dog breeds (target classes), random guessing would have given us an accuracy of less than 1%, so compared to that baseline our simple model performed reasonably well.
But ~10% accuracy is still very low. We could use a more complex model, but the more complex a model is, the more time and computing power it takes to train. Training a model complex enough to reach high accuracy on this problem would take days on a personal computer.
Instead, we are going to use a technique called Transfer Learning to speed up the training process.
At a fundamental level, all images share the same basic features: edges, curves, gradients, patterns, and so on. We therefore do not need to train a model to recognize these features from scratch every time. Since the ability to detect these features is stored in a model as weight parameters, we can reuse a pre-trained model and skip the time needed to learn those weights. We only need to train the weights of the final classification layer for our particular problem. This process is known as Transfer Learning.
In this post we are going to use a large but architecturally simple model called VGG16.
Let’s get started.
Import Libraries
import numpy as np
from glob import glob
import os
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from PIL import Image
from PIL import ImageFile
# allow PIL to load truncated/partially corrupted image files instead of raising an error
ImageFile.LOAD_TRUNCATED_IMAGES = True
Check Datasets
The first step is to load in the images and check the total size of our dataset.
The Dog Images Dataset can be downloaded from here: dog dataset. Unzip the folder and place it in this project's home directory, at the location /dogImages.
# load filenames for dog images
dog_files = np.array(glob(os.path.join('dogImages','*','*','*')))
# print number of images in dataset
print('There are %d total dog images.' % len(dog_files))
There are 8351 total dog images.
Check CUDA Availability
Check if GPU is available.
# check if CUDA is available
use_cuda = torch.cuda.is_available()
if use_cuda:
    print('Using GPU.')
Using GPU.
Define Parameters
Define the parameters needed in data loader and model creation.
# parameters
n_epochs = 5          # number of training epochs
num_classes = 133     # number of dog breeds (target classes)
num_workers = 0       # subprocesses used for data loading
batch_size = 10       # images per batch
learning_rate = 0.01  # learning rate for the optimizer
Data Loaders for the Dog Dataset
In the next step we will do the following:
- Define the transformations that will be applied to the images using torchvision.transforms. The random transformations (crop, flip, rotation) are a form of data augmentation, a pre-processing step that helps the model generalize to new data much better.
- Load the image data using torchvision.datasets.ImageFolder and apply the transformations.
- Create DataLoaders using torch.utils.data.DataLoader.
Note:
- We have created dictionaries for all three steps, each divided into train, validation and test sets.
- The image resize shape and the mean and standard-deviation values for the Normalize module were chosen to match what the pre-trained VGG16 model expects (the ImageNet statistics it was trained with).
## specify data transforms, datasets and loaders
trans = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(15),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ]),
    'valid': transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])
}

data_transfer = {
    'train': datasets.ImageFolder(os.path.join('dogImages','train'), transform=trans['train']),
    'valid': datasets.ImageFolder(os.path.join('dogImages','valid'), transform=trans['valid']),
    'test': datasets.ImageFolder(os.path.join('dogImages','test'), transform=trans['test'])
}

loaders_transfer = {
    'train': DataLoader(data_transfer['train'], batch_size=batch_size, num_workers=num_workers, shuffle=True),
    'valid': DataLoader(data_transfer['valid'], batch_size=batch_size, num_workers=num_workers, shuffle=True),
    'test': DataLoader(data_transfer['test'], batch_size=batch_size, num_workers=num_workers, shuffle=True)
}
print(f"Size of Train DataLoader: {len(loaders_transfer['train'].dataset)}")
print(f"Size of Validation DataLoader: {len(loaders_transfer['valid'].dataset)}")
print(f"Size of Test DataLoader: {len(loaders_transfer['test'].dataset)}")
Size of Train DataLoader: 6680
Size of Validation DataLoader: 835
Size of Test DataLoader: 836
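As a quick sanity check (a minimal sketch, not part of the original pipeline), we can pull a single batch from the train loader and confirm that the tensors have the shape VGG16 expects:
# inspect one batch: images should be (batch_size, channels, height, width)
images, labels = next(iter(loaders_transfer['train']))
print(images.shape)  # expected: torch.Size([10, 3, 224, 224])
print(labels.shape)  # expected: torch.Size([10])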
Model Architecture
Next, we will initialize the pre-trained VGG16 model using the torchvision.models.vgg16 function. We will keep the whole model unchanged except for the last classifier layer, where we change the number of output nodes to match our number of classes (133).
# specify model architecture
model_transfer = torchvision.models.vgg16(pretrained=True)
# modify last layer of classifier
model_transfer.classifier[6] = nn.Linear(4096, num_classes)
print(model_transfer)
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.5)
    (6): Linear(in_features=4096, out_features=133, bias=True)
  )
)
Freeze Feature Gradients
We need to freeze the gradients for the features part of the model, as we do not want to re-train the weights of those layers. We will only train the weights of the classifier section of the model.
# freeze gradients for the model's feature layers
for param in model_transfer.features.parameters():
    param.requires_grad = False
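To confirm the freeze worked, we can count the trainable versus frozen parameters (a quick sketch; only the classifier's weights should remain trainable):
# count trainable vs. frozen parameters to verify the freeze
trainable = sum(p.numel() for p in model_transfer.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model_transfer.parameters() if not p.requires_grad)
print(f'Trainable parameters: {trainable:,}')
print(f'Frozen parameters: {frozen:,}')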
Specify Loss Function and Optimizer
We have chosen CrossEntropyLoss as our loss function and Stochastic Gradient Descent (SGD) as our optimizer.
Note: Here we are only optimizing the weights for the classifier part of the model. We will not change the weights for the features part of the model.
## select loss function
criterion_transfer = nn.CrossEntropyLoss()
## select optimizer
optimizer_transfer = optim.SGD(params=model_transfer.classifier.parameters(), lr=learning_rate)
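Because we also set requires_grad = False on the feature weights, an equivalent way to build the optimizer (a sketch of a common idiom, equivalent to the line above) is to pass in every parameter that still requires gradients:
# equivalent alternative: optimize only the parameters that still require gradients
optimizer_transfer = optim.SGD(
    filter(lambda p: p.requires_grad, model_transfer.parameters()),
    lr=learning_rate)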
Train and Validate the Model
We define a function for training and validation. It tracks the running training and validation losses and saves the model whenever the validation loss decreases.
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## find the loss and update the model parameters accordingly
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            ## record the running training loss
            train_loss += loss.item() * data.size(0)
            if batch_idx % 200 == 0:
                print(f"Training Batch: {batch_idx}/{len(loaders['train'])}")
        ######################
        # validate the model #
        ######################
        model.eval()
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                # move to GPU
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                ## update the running validation loss
                output = model(data)
                loss = criterion(output, target)
                valid_loss += loss.item() * data.size(0)
                if batch_idx % 200 == 0:
                    print(f"Validation Batch: {batch_idx}/{len(loaders['valid'])}")
        train_loss = train_loss / len(loaders['train'].dataset)
        valid_loss = valid_loss / len(loaders['valid'].dataset)
        # print training/validation statistics
        print(f'Epoch: {epoch} \tTraining Loss: {train_loss} \tValidation Loss: {valid_loss}')
        # save the model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print(f'Validation loss decreased from {valid_loss_min} to {valid_loss}.\nSaving Model...')
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss
    # return trained model
    return model
Finally, we train the model.
# train the model
if use_cuda:
    model_transfer = model_transfer.cuda()

model_transfer = train(n_epochs, loaders_transfer, model_transfer,
                       optimizer_transfer, criterion_transfer, use_cuda, 'model_transfer.pt')
Training Batch: 0/668
Training Batch: 200/668
Training Batch: 400/668
Training Batch: 600/668
Validation Batch: 0/84
Epoch: 1 Training Loss: 2.233159229605498 Validation Loss: 1.1463432044326187
Validation loss decreased from inf to 1.1463432044326187.
Saving Model...
Training Batch: 0/668
Training Batch: 200/668
Training Batch: 400/668
Training Batch: 600/668
Validation Batch: 0/84
Epoch: 2 Training Loss: 1.570702178994874 Validation Loss: 0.9507174207243377
Validation loss decreased from 1.1463432044326187 to 0.9507174207243377.
Saving Model...
Training Batch: 0/668
Training Batch: 200/668
Training Batch: 400/668
Training Batch: 600/668
Validation Batch: 0/84
Epoch: 3 Training Loss: 1.4183635966863462 Validation Loss: 0.9120735898167788
Validation loss decreased from 0.9507174207243377 to 0.9120735898167788.
Saving Model...
Training Batch: 0/668
Training Batch: 200/668
Training Batch: 400/668
Training Batch: 600/668
Validation Batch: 0/84
Epoch: 4 Training Loss: 1.3522749468014983 Validation Loss: 0.91904990312582
Training Batch: 0/668
Training Batch: 200/668
Training Batch: 400/668
Training Batch: 600/668
Validation Batch: 0/84
Epoch: 5 Training Loss: 1.3099311252910935 Validation Loss: 0.7952524953170451
Validation loss decreased from 0.9120735898167788 to 0.7952524953170451.
Saving Model...
Next, we load the saved model weights, i.e. the checkpoint with the lowest validation loss.
# load the model that achieved the lowest validation loss
model_transfer.load_state_dict(torch.load('model_transfer.pt'))
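If you later want to load this checkpoint on a machine without a GPU, a minimal sketch (using the same model_transfer.pt checkpoint) is:
# load the checkpoint onto the CPU when no GPU is available
state_dict = torch.load('model_transfer.pt', map_location=torch.device('cpu'))
model_transfer.load_state_dict(state_dict)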
Test the Model
We compare the predicted outputs with the targets to get the number of correct predictions, and then calculate the percentage accuracy.
def test(loaders, model, criterion, use_cuda):
    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.
    model.eval()
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(loaders['test']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the loss
            loss = criterion(output, target)
            # update average test loss
            test_loss += loss.item() * data.size(0)
            # convert output probabilities to predicted class
            pred = output.data.max(1, keepdim=True)[1]
            # compare predictions to true label
            correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
            total += data.size(0)
    test_loss = test_loss / len(loaders['test'].dataset)
    print('Test Loss: {:.6f}\n'.format(test_loss))
    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))
# call test function
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
Test Loss: 0.928593
Test Accuracy: 73% (612/836)
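To see the model in action on a single image, here is a minimal sketch (the image path below is a hypothetical example; the class names come straight from the ImageFolder dataset):
# predict the breed for a single image (the path below is a hypothetical example)
class_names = data_transfer['train'].classes

def predict_breed(img_path):
    img = Image.open(img_path).convert('RGB')
    x = trans['test'](img).unsqueeze(0)  # apply test transforms, add batch dimension
    if use_cuda:
        x = x.cuda()
    model_transfer.eval()
    with torch.no_grad():
        output = model_transfer(x)
    return class_names[output.argmax(dim=1).item()]

print(predict_breed('dogImages/test/001.Affenpinscher/Affenpinscher_00003.jpg'))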
Conclusion
With only 5 epochs of training we achieved an accuracy of over 70%. The loss was still decreasing, so we may have been able to get even better performance with more training. This is a huge improvement over the ~10% accuracy we got using the model we created from scratch in Part-1.
VGG16 is not the most advanced model architecture for image recognition. We can get near-human-level accuracy by using other architectures such as ResNet. We will look into that in a future post.
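As a preview of what that would look like (a sketch only; nothing here is trained in this post), the same transfer-learning recipe applies to ResNet, whose final layer is called fc rather than classifier[6]:
# the same recipe with ResNet-50: freeze the backbone, replace the head
model_resnet = torchvision.models.resnet50(pretrained=True)
for param in model_resnet.parameters():
    param.requires_grad = False
model_resnet.fc = nn.Linear(model_resnet.fc.in_features, num_classes)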
The full code for this post can be found at this link.
Acknowledgements
This project is based on the Dog-Breed-Classification project created as part of Udacity's Deep Learning Nanodegree program.