Transfer Learning with PyTorch¶
In this post, we'll explore how to perform transfer learning using PyTorch.
We will use a subset of Food-11k, which contains 11 categories of food images. We will go over dataset preparation and data augmentation, and then the steps to build the classifier. Transfer learning lets us reuse the low-level image features (edges, textures, etc.) learned by a pretrained model, ResNet50, while training our own classifier to learn the higher-level details of our dataset images. ResNet50 has already been trained on ImageNet, with millions of images.
The original Food-11k dataset contains about 11k images covering 11 categories of food. Training on the whole dataset would take hours, so we are going to use a subset of it. The 11 food categories are Bread, Dairy product, Dessert, Egg, Fried food, Meat, Noodles, Rice, Seafood, Soup, and Vegetable. I've split the sub-dataset into train, valid, and test sets. The train set has 11 folders, one for each kind of food, and each folder contains 100 images of that category. The valid and test sets follow the same structure, but with 20 and 40 images per category respectively.
So finally, we have 1100 training images, 220 validation images, and 440 test images across the 11 classes of food.
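As a quick sanity check, a few lines like the following can verify those counts on disk (a minimal sketch, assuming the food-11k-sub directory layout used later in this post):
import os

# count all image files under each split directory
for split in ['train', 'valid', 'test']:
    split_dir = os.path.join('food-11k-sub', split)
    n_images = sum(len(files) for _, _, files in os.walk(split_dir))
    print(split, n_images)  # expect 1100 / 220 / 440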
# import libraries
import torch
from torchvision import models, datasets
from torchvision import transforms
from torch import nn, optim
from torch.utils.data import DataLoader
import time
import numpy as np
import matplotlib.pyplot as plt
import os
from PIL import Image
Data Augmentations¶
The images in the available training set can be modified in a number of ways to introduce more variation into the training process, which helps the trained model generalize and perform well on different kinds of test data. The input data can also come in a variety of sizes, so the images need to be normalized to a fixed size and format before batches of data are used together for training.
Let us go over the transformations we used for our data augmentation.
The transform RandomResizedCrop crops the input image at a random size (within a scale range of 0.8 to 1.0 of the original size, and a random aspect ratio in the default range of 0.75 to 1.33). The crop is then resized to 256×256.
RandomRotation rotates the image by an angle randomly chosen between -15 and 15 degrees.
RandomHorizontalFlip randomly flips the image horizontally with a default probability of 50%.
CenterCrop crops a 224×224 image from the center.
ToTensor converts the PIL Image, which has values in the range 0–255, to a floating-point tensor and scales them to the range 0–1 by dividing by 255.
Normalize takes in a 3-channel tensor and normalizes each channel by the given mean and standard deviation for that channel. Mean and standard deviation are passed in as 3-element vectors, and each channel of the tensor is normalized as T = (T − mean) / std, as the quick check below illustrates.
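To make that formula concrete, here is a quick numeric check of the per-channel arithmetic (a minimal sketch using the ImageNet red-channel statistics that we pass to Normalize below):
# a red-channel value of 0.5 after ToTensor, normalized with ImageNet stats
value, mean, std = 0.5, 0.485, 0.229
print((value - mean) / std)  # ≈ 0.0655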
# applying transforms to the data
image_transforms = {
'train': transforms.Compose([
transforms.RandomResizedCrop(size=256, scale=(0.8,1.0)),
transforms.RandomRotation(degrees=15),
transforms.RandomHorizontalFlip(),
transforms.CenterCrop(size=224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406],
[0.229, 0.224, 0.225])
]),
'valid': transforms.Compose([
transforms.Resize(size=256),
transforms.CenterCrop(size=224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406],
[0.229, 0.224, 0.225])
]),
'test': transforms.Compose([
transforms.Resize(size=256),
transforms.CenterCrop(size=224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406],
[0.229, 0.224, 0.225])
])
}
Note that for the validation and test data, we skip the RandomResizedCrop, RandomRotation, and RandomHorizontalFlip transformations: those random augmentations only help during training, while validation and test data should stay deterministic so that they measure model performance consistently.
# Load data
# Set train, valid, and test directory
train_directory = 'food-11k-sub/train'
valid_directory = 'food-11k-sub/valid'
test_directory = 'food-11k-sub/test'
# batch size
bs = 32
# number of epochs
epochs = 20
# number of classes
num_classes = 11
# device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Load data from directory
data = {
'train': datasets.ImageFolder(root=train_directory,
transform=image_transforms['train']),
'valid': datasets.ImageFolder(root=valid_directory,
transform=image_transforms['valid']),
'test': datasets.ImageFolder(root=test_directory,
transform=image_transforms['test'])
}
# Get a mapping of the indices to the class names, in order to see the output classes of the test images.
idx_to_class = {v: k for k, v in data['train'].class_to_idx.items()}
print(idx_to_class)
# size of data, to be used for calculating Average Loss and Accuracy
train_data_size = len(data['train'])
valid_data_size = len(data['valid'])
test_data_size = len(data['test'])
# Create iterators for the data using the DataLoader module
train_data = DataLoader(data['train'], batch_size=bs, shuffle=True)
valid_data = DataLoader(data['valid'], batch_size=bs, shuffle=True)
test_data = DataLoader(data['test'], batch_size=bs, shuffle=False)  # no need to shuffle the test set
train_data_size, valid_data_size, test_data_size
The torchvision.transforms package and the DataLoader are very important PyTorch features that make data augmentation and loading very easy.
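For instance, once the loaders above are created, a minimal sketch like this pulls a single batch to confirm the shapes the transforms produce:
# fetch one batch from the training loader and inspect its shape
inputs, labels = next(iter(train_data))
print(inputs.shape)   # torch.Size([32, 3, 224, 224])
print(labels.shape)   # torch.Size([32])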
Transfer Learning¶
We are going to use ResNet50 as the base model. It offers a good balance of model size, inference speed, and prediction accuracy.
First we load the pretrained ResNet50. Then we freeze the parameters of the convolutional layers so they act as a fixed feature extractor; since we are doing transfer learning, only the new classifier head will be trained.
# load pretrained resnet50
resnet_50 = models.resnet50(pretrained=True)
# Freeze model parameters, since we only train the new classifier head
for param in resnet_50.parameters():
param.requires_grad = False
Then we replace the final layer of the ResNet50 model with a small stack of Sequential layers. The input features of ResNet50's last fully connected layer are fed into a Linear layer with 256 outputs, followed by ReLU and Dropout layers. This is followed by a 256×11 Linear layer whose 11 outputs correspond to the 11 classes, and a LogSoftmax layer so we can train with NLLLoss.
# change the final layer of Resnet50 Model for fine-tuning
fc_inputs = resnet_50.fc.in_features
resnet_50.fc = nn.Sequential(
nn.Linear(fc_inputs, 256),
nn.ReLU(),
nn.Dropout(0.4),
    nn.Linear(256, num_classes),  # num_classes = 11, defined above
nn.LogSoftmax(dim=1) # for using NLLLoss()
)
# convert model to GPU
resnet_50 = resnet_50.to(device)
# define optimizer and loss function
loss_func = nn.NLLLoss()
# only pass the unfrozen parameters (the new fc head) to the optimizer
optimizer = optim.Adam(filter(lambda p: p.requires_grad, resnet_50.parameters()))
Start training¶
Now that the entire model has been set up, let's start training.
First, take a look at the summary of the model. Since we froze the layers of the base model, there are only 527,371 trainable parameters, all in the added layers.
from torchsummary import summary
summary(resnet_50, input_size=(3,224,224))
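As a cross-check on the summary, the trainable parameter count can also be computed directly; since only the new fc head requires gradients, this should print 527,371 (a minimal sketch):
# count only the parameters that will be updated during training
trainable_params = sum(p.numel() for p in resnet_50.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in resnet_50.parameters())
print(trainable_params, '/', total_params)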
def train_and_validate(model, loss_criterion, optimizer, epochs=25):
'''
Function to train and validate
Parameters
:param model: Model to train and validate
:param loss_criterion: Loss Criterion to minimize
:param optimizer: Optimizer for computing gradients
:param epochs: Number of epochs (default=25)
    Returns
        model: Trained model (the best checkpoint by validation accuracy is also saved to disk)
        history: list of [avg_train_loss, avg_valid_loss, avg_train_acc, avg_valid_acc] per epoch
'''
start = time.time()
history = []
best_acc = 0.0
for epoch in range(epochs):
epoch_start = time.time()
print("Epoch: {}/{}".format(epoch+1, epochs))
# Set to training mode
model.train()
# Loss and Accuracy within the epoch
train_loss = 0.0
train_acc = 0.0
valid_loss = 0.0
valid_acc = 0.0
for i, (inputs, labels) in enumerate(train_data):
inputs = inputs.to(device)
labels = labels.to(device)
# Clean existing gradients
optimizer.zero_grad()
# Forward pass - compute outputs on input data using the model
outputs = model(inputs)
# Compute loss
loss = loss_criterion(outputs, labels)
# Backpropagate the gradients
loss.backward()
# Update the parameters
optimizer.step()
# Compute the total loss for the batch and add it to train_loss
train_loss += loss.item() * inputs.size(0)
# Compute the accuracy
ret, predictions = torch.max(outputs.data, 1)
correct_counts = predictions.eq(labels.data.view_as(predictions))
# Convert correct_counts to float and then compute the mean
acc = torch.mean(correct_counts.type(torch.FloatTensor))
# Compute total accuracy in the whole batch and add to train_acc
train_acc += acc.item() * inputs.size(0)
#print("Batch number: {:03d}, Training: Loss: {:.4f}, Accuracy: {:.4f}".format(i, loss.item(), acc.item()))
# Validation - No gradient tracking needed
with torch.no_grad():
# Set to evaluation mode
model.eval()
# Validation loop
for j, (inputs, labels) in enumerate(valid_data):
inputs = inputs.to(device)
labels = labels.to(device)
# Forward pass - compute outputs on input data using the model
outputs = model(inputs)
# Compute loss
loss = loss_criterion(outputs, labels)
# Compute the total loss for the batch and add it to valid_loss
valid_loss += loss.item() * inputs.size(0)
# Calculate validation accuracy
ret, predictions = torch.max(outputs.data, 1)
correct_counts = predictions.eq(labels.data.view_as(predictions))
# Convert correct_counts to float and then compute the mean
acc = torch.mean(correct_counts.type(torch.FloatTensor))
# Compute total accuracy in the whole batch and add to valid_acc
valid_acc += acc.item() * inputs.size(0)
#print("Validation Batch number: {:03d}, Validation: Loss: {:.4f}, Accuracy: {:.4f}".format(j, loss.item(), acc.item()))
# Find average training loss and training accuracy
avg_train_loss = train_loss/train_data_size
avg_train_acc = train_acc/train_data_size
        # Find average validation loss and validation accuracy
avg_valid_loss = valid_loss/valid_data_size
avg_valid_acc = valid_acc/valid_data_size
history.append([avg_train_loss, avg_valid_loss, avg_train_acc, avg_valid_acc])
epoch_end = time.time()
print("Epoch : {:03d}, Training: Loss: {:.4f}, Accuracy: {:.4f}%, \n\t\tValidation : Loss : {:.4f}, Accuracy: {:.4f}%, Time: {:.4f}s".format(epoch+1, avg_train_loss, avg_train_acc*100, avg_valid_loss, avg_valid_acc*100, epoch_end-epoch_start))
        # Save the model if it has the best validation accuracy so far
        if avg_valid_acc > best_acc:
            best_acc = avg_valid_acc
            torch.save(model, 'best_model.pt')
return model, history
num_epochs = 25
trained_model, history = train_and_validate(resnet_50, loss_func, optimizer, num_epochs)
torch.save(history, 'history.pt')
torch.save(trained_model,'trained_model.pt')
history = np.array(history)
plt.plot(history[:,0:2])
plt.legend(['Tr Loss', 'Val Loss'])
plt.xlabel('Epoch Number')
plt.ylabel('Loss')
plt.ylim(0,1)
plt.savefig('loss_curve.png')
plt.show()
plt.plot(history[:,2:4])
plt.legend(['Tr Accuracy', 'Val Accuracy'])
plt.xlabel('Epoch Number')
plt.ylabel('Accuracy')
plt.ylim(0,1)
plt.savefig('accuracy_curve.png')
plt.show()
We achieved over 80% accuracy on the validation set. The result is acceptable for this small dataset, and training on the entire dataset would likely improve it further.
In the next post, I will try fine-tuning the same model on the same dataset with Keras, so we can see how the implementations differ. So stay tuned ~.
Test set accuracy¶
def computeTestSetAccuracy(model, loss_criterion):
'''
Function to compute the accuracy on the test set
Parameters
:param model: Model to test
:param loss_criterion: Loss Criterion to minimize
'''
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
test_acc = 0.0
test_loss = 0.0
    # Test - no gradient tracking needed
with torch.no_grad():
# Set to evaluation mode
model.eval()
        # Test loop
for j, (inputs, labels) in enumerate(test_data):
inputs = inputs.to(device)
labels = labels.to(device)
# Forward pass - compute outputs on input data using the model
outputs = model(inputs)
# Compute loss
loss = loss_criterion(outputs, labels)
            # Compute the total loss for the batch and add it to test_loss
test_loss += loss.item() * inputs.size(0)
            # Calculate test accuracy
ret, predictions = torch.max(outputs.data, 1)
correct_counts = predictions.eq(labels.data.view_as(predictions))
# Convert correct_counts to float and then compute the mean
acc = torch.mean(correct_counts.type(torch.FloatTensor))
            # Compute total accuracy in the whole batch and add to test_acc
test_acc += acc.item() * inputs.size(0)
print("Test Batch number: {:03d}, Test: Loss: {:.4f}, Accuracy: {:.4f}".format(j, loss.item(), acc.item()))
# Find average test loss and test accuracy
avg_test_loss = test_loss/test_data_size
avg_test_acc = test_acc/test_data_size
print("Test accuracy : " + str(avg_test_acc))
computeTestSetAccuracy(trained_model, loss_func)
Predict on test images¶
def predict(model, test_image_name):
'''
Function to predict the class of a single test image
Parameters
:param model: Model to test
:param test_image_name: Test image
'''
transform = image_transforms['test']
test_image = Image.open(test_image_name)
plt.imshow(test_image)
test_image_tensor = transform(test_image)
if torch.cuda.is_available():
test_image_tensor = test_image_tensor.view(1, 3, 224, 224).cuda()
else:
test_image_tensor = test_image_tensor.view(1, 3, 224, 224)
with torch.no_grad():
model.eval()
# Model outputs log probabilities
out = model(test_image_tensor)
ps = torch.exp(out)
topk, topclass = ps.topk(3, dim=1)
for i in range(3):
print("Predcition", i+1, ":", idx_to_class[topclass.cpu().numpy()[0][i]], ", Score: ", topk.cpu().numpy()[0][i])
model = torch.load('trained_model.pt')
predict(model, 'food-11k-sub/test/Egg/3_22.jpg')
Summary¶
The results above and the test accuracy are pretty good, I would say, considering we are using a fairly small dataset.
In this post, we've tried transfer learning on a small dataset using PyTorch with the pretrained ResNet50 model. In a future post, I will try to do the same thing using Keras. Stay tuned.