Object Detection using Faster-RCNN in PyTorch¶
In this post, we will explore the Faster-RCNN object detector with PyTorch. We will use the pretrained Faster-RCNN model with a ResNet-50 backbone.
Understanding model inputs and outputs:¶
The pretrained Faster-RCNN ResNet-50 model we are going to use expects the input image tensor to be in the form [n, c, h, w] where
- n is the number of images
- c is the number of channels; for RGB images it's 3
- h is the height of the image
- w is the width of the image
The model will return
- Bounding boxes [x0, y0, x1, y1] of all predicted objects, as a tensor of shape (N, 4), where N is the number of objects the model detected in the image.
- Labels of all predicted classes.
- Scores of each predicted label.
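To make these shapes concrete, here is a minimal sketch (the model loading is repeated from the next section, and the 300x400 image size is an arbitrary choice) that runs the model on a random tensor and inspects the returned dictionary:
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# a random "image" of shape [c, h, w] with values in [0, 1];
# the model takes a list of such tensors, one entry per image
dummy = torch.rand(3, 300, 400)
with torch.no_grad():
    out = model([dummy])

print(out[0].keys())            # dict_keys(['boxes', 'labels', 'scores'])
print(out[0]['boxes'].shape)    # torch.Size([N, 4])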
Load model¶
Now we load the pretrained Faster-RCNN ResNet-50 model, along with the COCO dataset category names.
# import necessary libraries
%matplotlib inline
import matplotlib.pyplot as plt
from PIL import Image
import torch
import torchvision.transforms as T
import torchvision
import numpy as np
import cv2
import warnings
warnings.filterwarnings('ignore')
# load model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# set to evaluation mode
model.eval()
# load the COCO dataset category names
# we will use the same list for this notebook
COCO_INSTANCE_CATEGORY_NAMES = [
'__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
We can see some N/A entries in the list: a few classes were removed in later versions of the dataset, so their label IDs are unused. We will go with the list given by PyTorch.
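The integer labels returned by the model index directly into this list, for example:
print(COCO_INSTANCE_CATEGORY_NAMES[1])   # person
print(COCO_INSTANCE_CATEGORY_NAMES[3])   # car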
Object detection pipeline¶
We define two functions for model inference:
- get_prediction takes img_path and confidence as input, and returns the predicted bounding boxes and classes.
- detect_object uses the get_prediction function and visualizes the result.
def get_prediction(img_path, confidence):
"""
get_prediction
parameters:
- img_path - path of the input image
- confidence - threshold value for prediction score
method:
- Image is obtained from the image path
- the image is converted to image tensor using PyTorch's Transforms
- image is passed through the model to get the predictions
    - class labels and box coordinates are obtained, but only predictions
      with a score > threshold are kept.
"""
    img = Image.open(img_path)
    transform = T.Compose([T.ToTensor()])
    img = transform(img)
    with torch.no_grad():
        pred = model([img])
    pred_class = [COCO_INSTANCE_CATEGORY_NAMES[i] for i in list(pred[0]['labels'].numpy())]
    # cast the box coordinates to int so OpenCV's drawing functions accept them
    pred_boxes = [[(int(i[0]), int(i[1])), (int(i[2]), int(i[3]))] for i in list(pred[0]['boxes'].numpy())]
    pred_score = list(pred[0]['scores'].numpy())
    # scores are returned in decreasing order, so keep everything up to the
    # last detection whose score exceeds the threshold
    keep = [idx for idx, score in enumerate(pred_score) if score > confidence]
    if not keep:
        return [], []
    pred_boxes = pred_boxes[:keep[-1] + 1]
    pred_class = pred_class[:keep[-1] + 1]
    return pred_boxes, pred_class
def detect_object(img_path, confidence=0.5, rect_th=2, text_size=2, text_th=2):
"""
    detect_object
parameters:
- img_path - path of the input image
- confidence - threshold value for prediction score
- rect_th - thickness of bounding box
- text_size - size of the class label text
    - text_th - thickness of the text
method:
- prediction is obtained from get_prediction method
- for each prediction, bounding box is drawn and text is written
with opencv
- the final image is displayed
"""
boxes, pred_cls = get_prediction(img_path, confidence)
    # OpenCV loads images in BGR order; convert to RGB for matplotlib
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    for i in range(len(boxes)):
        # draw the bounding box and write the class label at its top-left corner
        cv2.rectangle(img, boxes[i][0], boxes[i][1], color=(0, 255, 0), thickness=rect_th)
        cv2.putText(img, pred_cls[i], boxes[i][0], cv2.FONT_HERSHEY_SIMPLEX, text_size, (0, 255, 0), thickness=text_th)
plt.figure(figsize=(20,30))
plt.imshow(img)
plt.xticks([])
plt.yticks([])
plt.show()
Making predictions¶
Now we are ready to use the model to do inference. Let's look at a few examples.
Example 1
!wget -nv https://www.goodfreephotos.com/cache/other-photos/car-and-traffic-on-the-road-coming-towards-me.jpg -O traffic.jpg
detect_object('./traffic.jpg', confidence=0.7)
The result is a bit surprising: the model not only detected the three cars in the picture, but also the person inside one of the cars, who is barely visible.
Example 2
!wget -nv https://pixnio.com/free-images/2018/12/10/2018-12-10-18-38-14-1196x900.jpg -O traffic2.jpg
detect_object('./traffic2.jpg', confidence=0.7)
It looks like we are getting quite accurate predictions with the model.
Example 3
!wget -nv https://storage.needpix.com/rsynced_images/pedestrian-zone-456909_1280.jpg -O pedestrian.jpg
detect_object('./pedestrian.jpg', confidence=0.7)
Comparing inference time for CPU and GPU¶
Let's take a look at the model's inference time on CPU and GPU. I am using Google Colab for this experiment.
import time
def check_inference_time(image_path, gpu=False):
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
img = Image.open(image_path)
transform = T.Compose([T.ToTensor()])
img = transform(img)
if gpu:
model.cuda()
img = img.cuda()
else:
model.cpu()
img = img.cpu()
    start_time = time.time()
    with torch.no_grad():
        pred = model([img])
    if gpu:
        # wait for all queued GPU kernels to finish so the timing is accurate
        torch.cuda.synchronize()
    end_time = time.time()
    return end_time - start_time
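One caveat: the first CUDA call in a process pays one-off initialization costs. A discarded warm-up run (our addition here, not part of the original experiment) keeps that overhead out of the measured average:
# warm-up: one GPU inference whose timing we discard, so one-off
# CUDA initialization does not inflate the measured average
_ = check_inference_time('./traffic.jpg', gpu=True)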
cpu_time = sum([check_inference_time('./traffic.jpg', gpu=False) for _ in range(10)])/10.0
gpu_time = sum([check_inference_time('./traffic.jpg', gpu=True) for _ in range(10)])/10.0
print('\n\nAverage time taken by the model with GPU = {:.3f}s\nAverage time taken by the model with CPU = {:.3f}s'.format(gpu_time, cpu_time))
plt.bar([0.1, 0.2], [cpu_time, gpu_time], width=0.08)
plt.ylabel('Time/s')
plt.xticks([0.1, 0.2], ['CPU', 'GPU'])
plt.title('Inference time of Faster-RCNN with Resnet-50 backbone on CPU and GPU')
plt.show()
On Google Colab, inference with the Faster-RCNN model is approximately 45 times faster on GPU than on CPU.