Evaluating Automated Polygon Segmentation Methods
Labeling images in preparation for neural network training can be time-consuming and tedious, especially when working with polygons. The most common tools available require the user to place multiple points around the object they wish to segment. These tools either draw straight lines from point to point or try their best to snap the line around the edges of the object. Sometimes these tools require 10 or more points to create a decent segmentation of the object.
The Anno.Ai data science team regularly works with labeled data and was interested in finding a faster approach to segmenting objects in images using polygons, with a focus on open source options. In this post we will discuss three methods that help automate object segmentation: GrabCut, f-BRS, and DEXTR.
The first method we will explore is GrabCut, which is based on graph cuts. It is also the fastest route if you already have bounding boxes for your objects, since no additional input is required to create the segmentation.
GrabCut estimates the color distribution of the target object and that of the background using a Gaussian mixture model. This is used to construct a Markov random field over the pixel labels, with an energy function that prefers connected regions sharing the same label; a graph-cut optimization is then run to infer the labels. Because this new estimate is likely to be more accurate than the original one taken from the bounding box, the two-step procedure is repeated until convergence.
Setting up GrabCut did require some programming, but there are several tutorials online that are easy to follow. In total it took only about 40 lines of code to get it running.
The image tests were run with a maximum of 10 iterations; increasing this showed no discernible difference in the output.
The following displays the original images on the left, with the bounding boxes used for each, and the masked output from GrabCut on the right.
GrabCut should definitely be considered if you already have bounding boxes and want to switch to polygon segmentation. Even better results could be achieved by using it for a first pass and then passing the images through another method like f-BRS or DEXTR to pick up the objects that may have been missed.
Otherwise, if starting from completely unlabeled data, DEXTR or f-BRS are better options as both of those returned masks more closely matching the objects’ shape and with smoother edges.
Another method is f-BRS, which stands for feature backpropagating refinement scheme. This was created as a response to a recently proposed backpropagating refinement (BRS) scheme, which produced significantly better results for objects that other methods had trouble with. The problem with BRS is that it requires running forward and backward passes through a deep network several times, leading to significantly increased computational time. f-BRS, on the other hand, requires running forward and backward passes just for a small part of a network.
The GitHub repository is easy to get up and running and provides a nice graphical UI with all of the parameter adjustments needed, so there is no need to poke around the code to change things. The one small issue encountered was having to select each image to work on individually. A preferred workflow would be to select a directory and have the tool automatically save and advance to the next image, speeding up the overall process. Even with this small gripe, the workflow was smooth and simple.
The best performance of f-BRS used a ResNet-101 model as a base, with L-BFGS max iterations at 50 and a 0.80 prediction threshold.
For car objects, two points were placed, one at the front and the other at the back. Airplanes had four points located at the front, tail, left-wing, and right-wing.
Starting from completely unlabeled images, f-BRS is a very good option for its fast and smooth workflow. The GPU speed is also very close to the fastest method, DEXTR, with only a 0.17-second difference. Segmentation results are close to DEXTR as well but included more unwanted pixels surrounding the objects.
To speed up the workflow, adjusting the code to iterate automatically through a directory of images, rather than selecting each image through the UI's folder search every time, is a must.
DEXTR, short for Deep Extreme Cut, is another approach that improves on GrabCut. This method takes an object's extreme points (left-most, right-most, top, and bottom pixels) as its input. DEXTR adds an extra channel to the image at the input of a convolutional neural network, containing a Gaussian centered on each of the extreme points. The CNN then learns to transform this information into a segmentation of an object that matches those extreme points. Both PyTorch and TensorFlow implementations are available on GitHub.
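That extra input channel is straightforward to build with NumPy. The sketch below is illustrative, not DEXTR's actual preprocessing code: the sigma value, image size, and point coordinates are all assumptions.

```python
import numpy as np


def extreme_point_heatmap(shape, points, sigma=10.0):
    """Build an extra input channel in the style of DEXTR: a 2D Gaussian
    centered on each extreme point, merged with a pixel-wise maximum."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heatmap = np.zeros((h, w), np.float32)
    for x, y in points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma**2))
        heatmap = np.maximum(heatmap, g)
    return heatmap


# Four illustrative extreme points (left, right, top, bottom) on a
# hypothetical 128x128 image, given as (x, y) pairs.
points = [(10, 64), (118, 64), (64, 10), (64, 118)]
channel = extreme_point_heatmap((128, 128), points)

# Stacking this with the RGB image would form the 4-channel network input:
# net_input = np.concatenate([image, channel[..., None]], axis=-1)
```

Each Gaussian peaks at 1.0 on its extreme point and falls off smoothly, which is what lets the network localize the object from just four clicks.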
Getting this repository up and running was easy, and it provides a UI as well. Unfortunately, several issues were found with the workflow. First off, you need to edit the code to add the path every time you want to work on an image, and you can only input one image at a time. You also need to edit the save path so your images don't get overwritten when you work on a new one. There are no parameter adjustments in the UI, so you have to edit those in the code as well if you wish to change them. Another issue was accidental points being placed while trying to zoom or pan. There is no way to remove a point, so the program had to be closed and restarted.
Testing took much longer to complete due to these issues, but overall the output was better than f-BRS. The masks returned had smoother edges that more closely outlined the objects, with fewer unwanted pixels surrounding them.
The image tests performed best using a 0.80 prediction threshold. Points for car objects were placed at the front left, front right, back left, and back right. Airplanes had four points located at the front, tail, left-wing, and right-wing.
DEXTR, out of the box, should really only be used if you are limited to CPU or are only annotating a few images, as it is a very time-consuming process to change the code for every image.
With code modifications, however, this is the best option, as it is the fastest performer on both GPU and CPU and produced the best segmentation results. For modifications, you should adjust the code to iterate through a directory and save images without needing to do so manually each time. Another addition would be fixing the pan and zoom issues if working with small objects; otherwise, making the UI full screen should let you avoid the problem entirely.
Compute Time Comparisons
The test machine used a GeForce RTX 3080 GPU and a 24-core AMD Ryzen Threadripper 3960X CPU.