Using Pre-trained Models: PyTorch and Keras

In this post, we will use pre-trained models to do image classification with two popular deep learning frameworks, PyTorch and Keras. Let's walk through the workflow of using pre-trained models in each framework.

PyTorch pre-trained models

Let's first look at the pre-trained models in PyTorch. We can find all of them in torchvision.models.

In [0]:
from torchvision import models
import torch
dir(models)
Out[0]:
['AlexNet',
 'DenseNet',
 'GoogLeNet',
 'GoogLeNetOutputs',
 'Inception3',
 'InceptionOutputs',
 'MNASNet',
 'MobileNetV2',
 'ResNet',
 'ShuffleNetV2',
 'SqueezeNet',
 'VGG',
 '_GoogLeNetOutputs',
 '_InceptionOutputs',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_utils',
 'alexnet',
 'densenet',
 'densenet121',
 'densenet161',
 'densenet169',
 'densenet201',
 'detection',
 'googlenet',
 'inception',
 'inception_v3',
 'mnasnet',
 'mnasnet0_5',
 'mnasnet0_75',
 'mnasnet1_0',
 'mnasnet1_3',
 'mobilenet',
 'mobilenet_v2',
 'quantization',
 'resnet',
 'resnet101',
 'resnet152',
 'resnet18',
 'resnet34',
 'resnet50',
 'resnext101_32x8d',
 'resnext50_32x4d',
 'segmentation',
 'shufflenet_v2_x0_5',
 'shufflenet_v2_x1_0',
 'shufflenet_v2_x1_5',
 'shufflenet_v2_x2_0',
 'shufflenetv2',
 'squeezenet',
 'squeezenet1_0',
 'squeezenet1_1',
 'utils',
 'vgg',
 'vgg11',
 'vgg11_bn',
 'vgg13',
 'vgg13_bn',
 'vgg16',
 'vgg16_bn',
 'vgg19',
 'vgg19_bn',
 'video',
 'wide_resnet101_2',
 'wide_resnet50_2']
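
Note that the CamelCase entries (ResNet, AlexNet, ...) are the model classes, while the lowercase entries (resnet50, alexnet, ...) are factory functions that construct a model; passing pretrained=True makes them download weights trained on ImageNet. For example:

In [0]:
# a randomly initialized ResNet-18 vs. one with ImageNet weights
resnet18_random = models.resnet18(pretrained=False)
resnet18_imagenet = models.resnet18(pretrained=True)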

Step 1: Load the pre-trained model

In [0]:
# load AlexNet with weights pre-trained on ImageNet
alexnet = models.alexnet(pretrained=True)
# print the model architecture
print(alexnet)
Downloading: "https://download.pytorch.org/models/alexnet-owt-4df8aa71.pth" to /root/.cache/torch/checkpoints/alexnet-owt-4df8aa71.pth
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
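
The final Linear layer maps 4096 features to 1000 outputs, one score per ImageNet class; this is the layer one would typically replace when adapting the network to a new task. We can inspect it directly:

In [0]:
# the last classifier layer produces one score per ImageNet class
print(alexnet.classifier[6].out_features)   # 1000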

Step 2: Specify image transformations

Once we have the model, the next step is to transform the input image so that it has the right shape and the right statistics, i.e. mean and standard deviation. These values should match the ones used when the model was trained; for the torchvision models, that means 224x224 RGB inputs normalized with the ImageNet channel means and standard deviations.

In [0]:
from torchvision import transforms
transform = transforms.Compose([
    transforms.Resize(256),          # resize the shorter side to 256 pixels
    transforms.CenterCrop(224),      # crop the central 224x224 patch
    transforms.ToTensor(),           # convert to a CxHxW tensor in [0, 1]
    transforms.Normalize(            # normalize with ImageNet statistics
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
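
As a quick sanity check (a sketch using a synthetic image), the pipeline maps any RGB image, whatever its original size, to a normalized 3x224x224 tensor:

In [0]:
from PIL import Image
# a dummy 500x375 RGB image stands in for a real photo
dummy = Image.new("RGB", (500, 375))
print(transform(dummy).shape)   # torch.Size([3, 224, 224])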

Step 3: Load and transform the input image

In [0]:
%matplotlib inline
In [0]:
from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("cat1.jpg")
plt.imshow(img)
Out[0]:
<matplotlib.image.AxesImage at 0x7fa0438e37b8>
In [0]:
# transform the image and add a batch dimension for the model
img_t = transform(img)               # shape: [3, 224, 224]
batch_t = torch.unsqueeze(img_t, 0)  # shape: [1, 3, 224, 224]

Step 4: Model inference

In [0]:
# put the model in evaluation mode (switches Dropout to inference behavior)
alexnet.eval()

# forward pass
out = alexnet(batch_t)
print(out.shape)
torch.Size([1, 1000])
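
Note that alexnet.eval() only switches layers such as Dropout to their inference behavior; it does not disable gradient tracking. For pure inference, the forward pass can also be wrapped in torch.no_grad() to save memory and computation:

In [0]:
# forward pass without building the autograd graph
with torch.no_grad():
    out = alexnet(batch_t)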

The output is a batch of 1000 raw scores, one per ImageNet class. To map the highest-scoring index to a human-readable label, we first read and store the labels from a text file listing all 1000 ImageNet classes.

In [0]:
# read in imagenet class labels
with open("imagenet_classes.txt") as f:
  classes = [line.strip() for line in f.readlines()]
In [0]:
_, index = torch.max(out, 1)
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
print(classes[index[0]], percentage[index[0]].item())
Siamese cat, Siamese 99.61538696289062

We got the 'Siamese cat' class with over 99% confidence. Let's see which other labels the model considered likely.

In [0]:
_, indices = torch.sort(out, descending=True)
[(classes[idx], percentage[idx].item()) for idx in indices[0][:5]]
Out[0]:
[('Siamese cat, Siamese', 99.61538696289062),
 ('Chihuahua', 0.0938020721077919),
 ('Egyptian cat', 0.0931343361735344),
 ('wallaby, brush kangaroo', 0.03188218176364899),
 ('toy terrier', 0.027880175039172173)]
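
We will repeat the same steps (eval mode, forward pass, softmax, sort) for the ResNet models next. They could equally be wrapped in a small helper, sketched here (the name top5 is ours); below, the steps are spelled out explicitly for each model.

In [0]:
def top5(model, batch):
    """Return the top-5 (label, percentage) predictions of a model."""
    model.eval()
    with torch.no_grad():
        out = model(batch)
    percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
    _, indices = torch.sort(out, descending=True)
    return [(classes[idx], percentage[idx].item()) for idx in indices[0][:5]]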

Let's try out resnet50 and resnet101.

In [0]:
# resnet 50
# first load model
resnet_50 = models.resnet50(pretrained=True)

# then put the model in eval mode
resnet_50.eval()

# forward pass
out = resnet_50(batch_t)

# fourth, print the top 5 classes predicted by the model
_, indices = torch.sort(out, descending=True)
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
[(classes[idx], percentage[idx].item()) for idx in indices[0][:5]]
Out[0]:
[('Siamese cat, Siamese', 99.51265716552734),
 ('Egyptian cat', 0.22618000209331512),
 ('lynx, catamount', 0.16791078448295593),
 ('paper towel', 0.010585855692625046),
 ('mouse, computer mouse', 0.010341660119593143)]
In [0]:
# resnet 101
# first load model
resnet_101 = models.resnet101(pretrained=True)

# then put the model in eval mode
resnet_101.eval()

# forward pass
out = resnet_101(batch_t)

# fourth, print the top 5 classes predicted by the model
_, indices = torch.sort(out, descending=True)
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
[(classes[idx], percentage[idx].item()) for idx in indices[0][:5]]
Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /root/.cache/torch/checkpoints/resnet101-5d3b4d8f.pth

Out[0]:
[('Siamese cat, Siamese', 99.88835144042969),
 ('lynx, catamount', 0.03424644097685814),
 ('Egyptian cat', 0.015446522273123264),
 ('paper towel', 0.0066812834702432156),
 ('jay', 0.005932020954787731)]

Keras workflow for pre-trained models

Let's try the same example in Keras. First, let's list the pre-trained models available in keras.applications.

In [0]:
import keras
dir(keras.applications)
Out[0]:
['DenseNet121',
 'DenseNet169',
 'DenseNet201',
 'InceptionResNetV2',
 'InceptionV3',
 'MobileNet',
 'MobileNetV2',
 'NASNetLarge',
 'NASNetMobile',
 'ResNet101',
 'ResNet101V2',
 'ResNet152',
 'ResNet152V2',
 'ResNet50',
 'ResNet50V2',
 'VGG16',
 'VGG19',
 'Xception',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'absolute_import',
 'backend',
 'densenet',
 'division',
 'inception_resnet_v2',
 'inception_v3',
 'keras_applications',
 'keras_modules_injection',
 'layers',
 'mobilenet',
 'mobilenet_v2',
 'models',
 'nasnet',
 'print_function',
 'resnet',
 'resnet50',
 'resnet_v2',
 'utils',
 'vgg16',
 'vgg19',
 'xception']

In this case, we will only try out MobileNetV2; the other models work in a similar way.

In [0]:
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.imagenet_utils import decode_predictions
from keras.applications import mobilenet_v2
from keras.applications.mobilenet_v2 import preprocess_input
import numpy as np
In [0]:
# first, load the image, resized to 224x224, the input size expected by MobileNetV2
original_image = load_img("cat1.jpg", target_size=(224, 224))

# second, convert the PIL image to numpy array
numpy_image = img_to_array(original_image)

# third, expand to a 4D tensor of shape (samples, height, width, channels)
input_image = np.expand_dims(numpy_image, axis=0)
print('PIL image size = ', original_image.size)
print('NumPy image size = ', numpy_image.shape)
print('Input image size = ', input_image.shape)
plt.imshow(np.uint8(input_image[0]))
PIL image size =  (224, 224)
NumPy image size =  (224, 224, 3)
Input image size =  (1, 224, 224, 3)
Out[0]:
<matplotlib.image.AxesImage at 0x7fa03fbf7b70>
In [0]:
# fourth, normalize the image with the model's preprocessing function
processed_image = preprocess_input(input_image.copy())
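
Each Keras application ships its own preprocess_input. For MobileNetV2 it scales pixel values from [0, 255] to [-1, 1]; a minimal sketch of the equivalent manual computation:

In [0]:
# manual scaling for MobileNetV2: map [0, 255] to [-1, 1]
manual = input_image / 127.5 - 1.0
print(np.allclose(manual, processed_image))   # expected: True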

Now, we are ready to make predictions.

In [0]:
# load MobileNetV2 with ImageNet weights and run the forward pass
mobilenet_model = mobilenet_v2.MobileNetV2(weights="imagenet")
prediction = mobilenet_model.predict(processed_image)

# decode the class probabilities into human-readable labels
label = decode_predictions(prediction)
print('label = ', label[0][:5])
label =  [('n02123597', 'Siamese_cat', 0.7589597), ('n02124075', 'Egyptian_cat', 0.01455726), ('n04141975', 'scale', 0.010692967), ('n15075141', 'toilet_tissue', 0.0065661813), ('n04493381', 'tub', 0.0037253292)]
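
By default, decode_predictions returns the top five (wordnet_id, label, probability) tuples per sample; its top argument controls how many, e.g.:

In [0]:
# keep only the three most likely classes
label_top3 = decode_predictions(prediction, top=3)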

Now we have seen the workflows for using pre-trained models in PyTorch and Keras. Using these pre-trained models is very convenient, but in many cases they will not exactly satisfy the specifications of our application, and we may want a more specialized model. That opens up another topic, transfer learning: fine-tuning these pre-trained models to meet our demands. In a following post, we will focus on transfer learning using these models.


