Fine-tune Mask-RCNN on a Custom Dataset

In an earlier post, we saw how to use a pretrained Mask-RCNN model with PyTorch. Although a pretrained model is quite useful in some cases, our applications sometimes need to segment only a specific class of object, which may not exist in the COCO categories. We therefore need to train a customized Mask-RCNN model to meet our needs.

In this post, we will see how to fine-tune Mask-RCNN on a custom dataset. I will cover the whole pipeline, from preparing a custom dataset to fine-tuning and evaluating the model. It should be useful, so keep reading.

I've prepared a very small Beagle dataset, and the annotations are included with it. Feel free to download it from this link.

Step 1: Preparing the Dataset

The dataset I prepared contains 100 beagle images that I scraped from Google Images: 75 are used for training and 25 for validation.
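
A 75/25 split like this one can be reproduced with a fixed-seed shuffle. The sketch below is only an illustration, not the script used to build this dataset; the function name split_dataset and the file names are hypothetical.

```python
import random

def split_dataset(filenames, n_train, seed=0):
    """Shuffle filenames reproducibly and split them into (train, val) lists."""
    files = sorted(filenames)            # sort first so input order does not matter
    random.Random(seed).shuffle(files)   # local RNG leaves the global random state alone
    return files[:n_train], files[n_train:]

# Hypothetical file names standing in for the 100 scraped images
images = ["{:08d}.jpg".format(i) for i in range(100)]
train_files, val_files = split_dataset(images, n_train=75)
print(len(train_files), len(val_files))
```

Fixing the seed keeps the validation set stable between runs, so results from different training runs stay comparable.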

I used the VGG Image Annotator (VIA) to annotate the training and validation images. It's a simple tool that lets you label all the images and export the annotations to a single JSON file.
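
To get a feel for what that JSON file contains, here is a minimal sketch that pulls the polygon vertices out of a VIA export. It assumes the usual annotation-export layout (one entry per image with a "filename" and a "regions" collection whose polygons carry "all_points_x" and "all_points_y"); the helper name extract_polygons and the sample entry are made up for illustration.

```python
import json

def extract_polygons(via_json_text):
    """Return {filename: [list of (x, y) vertex lists]} from a VIA export."""
    annotations = json.loads(via_json_text)
    result = {}
    for entry in annotations.values():
        regions = entry.get("regions", [])
        # Assumption: VIA 1.x stores regions as a dict, VIA 2.x as a list
        if isinstance(regions, dict):
            regions = list(regions.values())
        polygons = []
        for region in regions:
            shape = region["shape_attributes"]
            if shape.get("name") != "polygon":
                continue  # this sketch ignores rectangles, circles, etc.
            polygons.append(list(zip(shape["all_points_x"],
                                     shape["all_points_y"])))
        if polygons:  # skip images that have no polygon annotation
            result[entry["filename"]] = polygons
    return result

# Made-up sample entry in the assumed layout
sample = json.dumps({
    "00000001.jpg12345": {
        "filename": "00000001.jpg",
        "size": 12345,
        "regions": [{
            "shape_attributes": {"name": "polygon",
                                 "all_points_x": [10, 60, 60, 10],
                                 "all_points_y": [20, 20, 80, 80]},
            "region_attributes": {},
        }],
        "file_attributes": {},
    }
})
polygons = extract_polygons(sample)
print(polygons)
```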

Step 2: Install Dependencies

First, we need to downgrade TensorFlow to 1.15.0 and Keras to 2.2.5 in order to use Matterport's implementation of Mask-RCNN. I do this because I'm running the experiment on Google Colab, where newer versions are preinstalled.

!pip install tensorflow-gpu==1.15.0
!pip install keras==2.2.5

Then we clone Matterport's implementation of Mask-RCNN and download the pretrained weights trained on the COCO dataset. We are going to fine-tune these weights on our own dataset.

!git clone
%cd Mask_RCNN/
!python install

I've also cloned my prepared dataset to Google Colab. If you're not using Google Colab, you don't need to do that. This repo also contains the script used to configure the model, load the data, and train and evaluate the model. I referenced this article.

%cd ..
!git clone
%cd fine-tune-MaskRcnn/

Step 3: Modify for Our Own Dataset

First, modify the following three functions in the CustomDataset class:

def load_custom(self, dataset_dir, subset):
def load_mask(self, image_id):
def image_reference(self, image_id):

Replace 'beagle' with your custom class name in these functions.
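
The core job of load_mask is to turn each annotated polygon into a binary mask of the image size. The pure-Python sketch below illustrates that idea with an even-odd test at pixel centres. It is not the code from the repo, and in practice a library routine such as skimage.draw.polygon would normally do this; the function names and the tiny 6x6 example are hypothetical.

```python
def point_in_polygon(x, y, xs, ys):
    """Even-odd rule: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    j = len(xs) - 1
    for i in range(len(xs)):
        # Does the edge from vertex j to vertex i straddle the horizontal line at y?
        if (ys[i] > y) != (ys[j] > y):
            # x coordinate where that edge crosses the line
            x_cross = xs[i] + (y - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])
            if x < x_cross:
                inside = not inside
        j = i
    return inside

def polygon_to_mask(xs, ys, height, width):
    """Return a height x width grid of booleans, True where the pixel centre is inside."""
    return [[point_in_polygon(col + 0.5, row + 0.5, xs, ys)
             for col in range(width)]
            for row in range(height)]

# Hypothetical square polygon on a 6x6 image
mask = polygon_to_mask([1, 4, 4, 1], [1, 1, 4, 4], height=6, width=6)
for row in mask:
    print("".join("#" if inside else "." for inside in row))
```

With one such mask per annotated object, stacking them along a last axis gives the [height, width, instance count] layout that shows up as gt_mask in the inference logs.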

Second, modify the configuration class:

class CustomConfig(Config):
    """Configuration for training on the toy dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "beagle"

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # Background + beagle

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9

Step 4: Training

Now we are ready to train the model. If you don't have a GPU, you can use Google Colab. I only trained the model for 10 epochs; you can change the number of epochs in the training script.

!python3 train --dataset=beagle --weights=coco

Step 5: Inference using the Trained Model

%matplotlib inline
import os
import sys
import random
import math
import re
import time
import numpy as np
import tensorflow as tf
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as patches
ROOT_DIR = os.path.abspath("../")
from mrcnn import utils
from mrcnn import visualize
from mrcnn.visualize import display_images
import mrcnn.model as modellib
from mrcnn.model import log
import beagle
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
MODEL_WEIGHTS_PATH = ROOT_DIR +"/beagle_mask_rcnn_coco.h5"

Set up configurations

config = beagle.CustomConfig()
BEAGLE_DIR = ROOT_DIR+"/fine-tune-MaskRcnn/beagle"

# Override the training configurations with a few
# changes for inferencing.
class InferenceConfig(config.__class__):
    # Run detection on one image at a time
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                14
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           beagle
NUM_CLASSES                    2
POOL_SIZE                      7
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
STEPS_PER_EPOCH                100
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001
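
The BATCH_SIZE of 1 in the dump above is not set directly: it is derived from GPU_COUNT and IMAGES_PER_GPU when the configuration object is created. The sketch below mimics that behaviour with a hypothetical stand-in class (BaseConfig is not the real mrcnn Config, and its default of 2 images per GPU is an assumption) to show why the inference subclass ends up with a batch of one.

```python
class BaseConfig:
    """Hypothetical stand-in for the library's Config (only two fields)."""
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2   # assumed default, for illustration only

    def __init__(self):
        # Derived value: images processed in one forward pass
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT

class TrainConfig(BaseConfig):
    NAME = "beagle"

class InferenceConfig(TrainConfig):
    # Run detection on one image at a time
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

train_cfg = TrainConfig()
infer_cfg = InferenceConfig()
print(train_cfg.BATCH_SIZE, infer_cfg.BATCH_SIZE)
```

This is also why model.detect is later called with a list containing exactly one image: the number of images passed in has to match BATCH_SIZE.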

# set target device
DEVICE = "/gpu:0"  # /cpu:0 or /gpu:0

def get_ax(rows=1, cols=1, size=16):
    """Return a Matplotlib Axes array to be used in
    all visualizations in the notebook. Provide a
    central point to control graph sizes.
    Adjust the size attribute to control how big to render images
    """
    _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
    return ax

Load validation set

dataset = beagle.CustomDataset()
dataset.load_custom(BEAGLE_DIR, "val")

# Must call before using the dataset
dataset.prepare()

print("Images: {}\nClasses: {}".format(len(dataset.image_ids), dataset.class_names))
Images: 25
Classes: ['BG', 'beagle']

Create model in inference mode and load our trained weights

# Create model in inference mode
with tf.device(DEVICE):
    model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR,
                              config=config)

weights_path = "../logs/beagle20200618T0317/mask_rcnn_beagle_0010.h5"

# Load weights
print("Loading weights ", weights_path)
model.load_weights(weights_path, by_name=True)
Loading weights  ../logs/beagle20200618T0317/mask_rcnn_beagle_0010.h5
Re-starting from epoch 10
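
The checkpoint file name carries the epoch number (mask_rcnn_beagle_0010.h5 is the checkpoint from epoch 10), which appears to be where the "Re-starting from epoch 10" message comes from. Rather than hard-coding the path, you could pick the newest checkpoint in a log directory. The helper below is a hypothetical sketch that works on a plain list of file names; the library's model class also has a find_last() method for this purpose.

```python
import re

# Pattern assumed from the file name shown above: mask_rcnn_<name>_<4-digit epoch>.h5
CHECKPOINT_RE = re.compile(r"^mask_rcnn_(?P<name>\w+)_(?P<epoch>\d{4})\.h5$")

def latest_checkpoint(filenames, name="beagle"):
    """Return (filename, epoch) of the highest-epoch checkpoint, or None."""
    best = None
    for filename in filenames:
        match = CHECKPOINT_RE.match(filename)
        if match and match.group("name") == name:
            epoch = int(match.group("epoch"))
            if best is None or epoch > best[1]:
                best = (filename, epoch)
    return best

# Hypothetical directory listing; in a notebook this could come from os.listdir(log_dir)
listing = ["mask_rcnn_beagle_0001.h5", "mask_rcnn_beagle_0010.h5",
           "mask_rcnn_beagle_0009.h5", "events.out.tfevents"]
best = latest_checkpoint(listing)
print(best)
```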

Inference on test images

image_id = random.choice(dataset.image_ids)
image, image_meta, gt_class_id, gt_bbox, gt_mask =\
    modellib.load_image_gt(dataset, config, image_id, use_mini_mask=False)
info = dataset.image_info[image_id]
print("image ID: {}.{} ({}) {}".format(info["source"], info["id"], image_id,
                                       dataset.image_reference(image_id)))

# Run object detection
results = model.detect([image], verbose=1)

# Display results
ax = get_ax(1)
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            dataset.class_names, r['scores'], ax=ax,
                            title="Predictions")
log("gt_class_id", gt_class_id)
log("gt_bbox", gt_bbox)
log("gt_mask", gt_mask)
image ID: beagle.00000228.jpg (20) /content/fine-tune-MaskRcnn/beagle/val/00000228.jpg
Processing 1 images
image                    shape: (1024, 1024, 3)       min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  151.10000  float64
image_metas              shape: (1, 14)               min:    0.00000  max: 1024.00000  int64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32
gt_class_id              shape: (1,)                  min:    1.00000  max:    1.00000  int32
gt_bbox                  shape: (1, 4)                min:  135.00000  max:  900.00000  int32
gt_mask                  shape: (1024, 1024, 1)       min:    0.00000  max:    1.00000  bool
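
Two numbers in this log can be checked by hand against the configuration dump, which is a handy sanity check. The molded image is the input with MEAN_PIXEL subtracted, so its range is bounded by 0 - 123.7 and 255 - 103.9. The anchor count is the number of feature-map cells over the five pyramid levels times the three anchor ratios (RPN_ANCHOR_STRIDE is 1). The helper names below are hypothetical and the sketch only redoes the arithmetic; it does not call the library.

```python
MEAN_PIXEL = (123.7, 116.8, 103.9)   # from the configuration dump

def molded_range(pixel_min, pixel_max, mean_pixel=MEAN_PIXEL):
    """Widest possible value range after subtracting the per-channel mean."""
    return pixel_min - max(mean_pixel), pixel_max - min(mean_pixel)

def anchor_count(image_size, strides, ratios_per_location):
    """Total anchors over all pyramid levels of a square image (anchor stride 1)."""
    total = 0
    for stride in strides:
        cells = image_size // stride          # feature-map side length at this level
        total += cells * cells * ratios_per_location
    return total

low, high = molded_range(0, 255)
n_anchors = anchor_count(1024, [4, 8, 16, 32, 64], 3)
print(round(low, 1), round(high, 1))
print(n_anchors)
```

The result matches the min and max reported for molded_images and the 261888 in the anchors shape.
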
# Repeat on another randomly chosen validation image
image_id = random.choice(dataset.image_ids)
image, image_meta, gt_class_id, gt_bbox, gt_mask =\
    modellib.load_image_gt(dataset, config, image_id, use_mini_mask=False)
info = dataset.image_info[image_id]
print("image ID: {}.{} ({}) {}".format(info["source"], info["id"], image_id,
                                       dataset.image_reference(image_id)))

# Run object detection
results = model.detect([image], verbose=1)

# Display results
ax = get_ax(1)
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            dataset.class_names, r['scores'], ax=ax,
                            title="Predictions")
log("gt_class_id", gt_class_id)
log("gt_bbox", gt_bbox)
log("gt_mask", gt_mask)
image ID: beagle.00000248.jpg (24) /content/fine-tune-MaskRcnn/beagle/val/00000248.jpg
Processing 1 images
image                    shape: (1024, 1024, 3)       min:    0.00000  max:  254.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  150.10000  float64
image_metas              shape: (1, 14)               min:    0.00000  max: 1024.00000  int64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32
gt_class_id              shape: (1,)                  min:    1.00000  max:    1.00000  int32
gt_bbox                  shape: (1, 4)                min:  335.00000  max:  824.00000  int32
gt_mask                  shape: (1024, 1024, 1)       min:    0.00000  max:    1.00000  bool
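
Looking at the plots is a good first check, but a number is easier to compare across runs. A simple one is the intersection over union (IoU) between a predicted box from r['rois'] and the matching ground-truth box from gt_bbox. The sketch below assumes boxes in (y1, x1, y2, x2) order, which is the convention Matterport's implementation uses; the function name and the example boxes are made up. For a full mAP evaluation, the library's utils module provides compute_ap.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    ya1, xa1, ya2, xa2 = box_a
    yb1, xb1, yb2, xb2 = box_b
    # Height and width of the overlap, clamped at zero when the boxes are disjoint
    inter_h = max(0, min(ya2, yb2) - max(ya1, yb1))
    inter_w = max(0, min(xa2, xb2) - max(xa1, xb1))
    inter = inter_h * inter_w
    union = (ya2 - ya1) * (xa2 - xa1) + (yb2 - yb1) * (xb2 - xb1) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical ground-truth and predicted boxes
gt_box = (100, 100, 300, 300)
pred_box = (150, 150, 350, 350)
iou = box_iou(gt_box, pred_box)
print(round(iou, 3))
```

A detection is commonly counted as correct when its IoU with a ground-truth object is at least 0.5.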


In this post, we've seen how to fine-tune a custom Mask-RCNN model on my prepared Beagle dataset. I've walked you through the entire process, from preparing the dataset to performing inference with your own model.

I hope you find this post useful. The code and dataset used in this post are available in my GitHub repo.


