At Anno, we use instance segmentation in several of our machine learning (ML) pipelines. In our previous blog posts on user-assisted segmentation in our Labeling App, we described incorporating segmentation into our data labeling workflows to speed up annotation of large image and video datasets. In some of our other pipelines, including our process for generating synthetic training data for our Vision application, we use segmentation for background removal.
The synthetic data pipeline allows our customers to start from a single image of a specific object or logo and develop a deployable Vision model in minutes instead of months, with zero manual annotation. Finding an object in the wild on a camera feed without needing to annotate data may sound like magic, but it’s not! In this blog post, we evaluate open source methods for automatically segmenting single object images.
The Anno data science team explored five open source methods for automatically segmenting single object images: Rembg, Fastseg, PaddleSeg, YOLACT, and YOLACT-Edge. For every technique, segmentation was more accurate when the image had a little padding around the object to be segmented.
For testing, the same set of images was used across all of the methods, and only the pretrained models available in each were evaluated. Unless stated otherwise, testing was done on an Ubuntu 20.04 machine with a 3080 GPU and an AMD Ryzen Threadripper 3960X 24-core CPU. To determine which method performed best, we considered inference speed, segmentation quality, and ease of use.
Rembg, the first approach we tried, is designed solely for removing the background of an image using a U2-Net model. It ships with the base U2-Net, a lighter-weight U2-Net, and a model trained specifically for people. You can also use a custom-trained model for more accurate background removal.
Setup for Rembg is very easy, as there is a PyPI package available to install instead of needing to clone the GitHub repository. For our purpose of segmenting images, we used Rembg as a library so we could adjust the output. As noted above, Rembg only removes the background and saves the object on a transparent background. To use it for segmentation, we used cv2 to adjust the output and create a mask of the object, then returned the mask as a black-and-white image along with the point coordinates of the object.
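The conversion from Rembg’s transparent-background cutout to a black-and-white mask can be sketched as follows. This is a minimal sketch using NumPy thresholding on the alpha channel (equivalent to the cv2 thresholding we used); the helper names are our own, and in practice the RGBA array would come from `rembg.remove` rather than being built by hand:

```python
import numpy as np

def mask_from_rgba(rgba: np.ndarray) -> np.ndarray:
    """Threshold the alpha channel of an RGBA cutout into a 0/255 mask."""
    alpha = rgba[:, :, 3]
    return np.where(alpha > 0, 255, 0).astype(np.uint8)

def object_coords(mask: np.ndarray) -> np.ndarray:
    """Return the (row, col) coordinates of every foreground pixel."""
    return np.argwhere(mask == 255)

# Tiny synthetic example: a 4x4 image whose center 2x2 region is opaque,
# standing in for the object left behind after background removal.
rgba = np.zeros((4, 4, 4), dtype=np.uint8)
rgba[1:3, 1:3, 3] = 255  # opaque alpha marks the "object"

mask = mask_from_rgba(rgba)
coords = object_coords(mask)
```

The same thresholding works for any RGBA output where background pixels are fully transparent.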
For testing, we used the included pretrained base U2-Net. The following displays the original images on the left and the masked output on the right.
Another method we explored was Fastseg, an approach using MobileNetV3 models with modified lightweight heads based on LR-ASPP (Lite Reduced Atrous Spatial Pyramid Pooling) for semantic segmentation. Fastseg has pretrained models available, all trained on Cityscapes, along with the option to use your own weights files.
Although setup is not as easy as installing a PyPI package, the repository’s readme makes getting it running straightforward, with several examples for different use cases. One issue we came across is a minimum image input size of 400x400; anything smaller errors out when you try to segment it. Another issue is that the pretrained model doesn’t do well with top-down images or with object shapes that weren’t in the training data. This can be seen in the results below, with the original images on the left and the segmented output on the right.
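One simple workaround for the minimum input size is to pad small images up to 400x400 before handing them to the model. The helper below is hypothetical, not part of Fastseg; it centers the image on a zero-valued background:

```python
import numpy as np

MIN_SIZE = 400  # Fastseg errors out below this input size

def pad_to_min(image: np.ndarray, min_size: int = MIN_SIZE) -> np.ndarray:
    """Center `image` on a black canvas at least min_size x min_size."""
    h, w = image.shape[:2]
    out_h, out_w = max(h, min_size), max(w, min_size)
    canvas = np.zeros((out_h, out_w) + image.shape[2:], dtype=image.dtype)
    top = (out_h - h) // 2
    left = (out_w - w) // 2
    canvas[top:top + h, left:left + w] = image
    return canvas

# A 300x250 image becomes a 400x400 input with the original centered.
padded = pad_to_min(np.ones((300, 250, 3), dtype=np.uint8))
```

Padding rather than upscaling keeps the object at its original resolution, and as noted earlier, a little padding around the object tends to improve segmentation accuracy anyway.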
PaddleSeg, based on PaddlePaddle, was another approach we tested. It covers four segmentation areas (semantic, interactive, panoptic, and matting) and has over 80 pretrained models.
To get PaddleSeg working, just download the repository from GitHub and follow the tutorials in the readme. We ran into a few problems with this repository. One was that most of the documentation was in Chinese; with the latest updates to the repo, documentation and tutorials are now available in English. Another was GPU segmentation: a GPU version exists, but we were unable to get it working. The latest releases may have fixed this problem.
For our testing, we used their DeepLabV3 ResNet101 model pretrained on the Pascal VOC dataset. The following displays the original images on the left and the segmented output on the right.
A different technique we looked at was YOLACT, built with real-time instance segmentation in mind. YOLACT is particularly fast because it splits instance segmentation into two parallel subtasks: generating a set of prototype masks and predicting per-instance mask coefficients. It also uses Fast NMS instead of standard NMS, which is 12 ms faster with only a marginal performance penalty. An updated version, YOLACT++, slightly improves segmentation quality, and another version, YOLACT-Edge, modifies YOLACT for use on edge devices. All models provided by YOLACT and its variants were trained on the COCO dataset.
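The Fast NMS idea can be sketched in a few lines: after sorting detections by score, compute the full pairwise IoU matrix in one shot and discard any box that overlaps a higher-scoring box beyond the threshold, replacing the sequential loop of standard NMS with a single vectorized pass. This is a simplified NumPy sketch of the algorithm as described, not the authors’ implementation:

```python
import numpy as np

def iou_matrix(boxes: np.ndarray) -> np.ndarray:
    """Pairwise IoU for boxes given as (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area[:, None] + area[None, :] - inter)

def fast_nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Keep boxes whose max IoU with every higher-scoring box is <= threshold."""
    order = np.argsort(-scores)              # highest score first
    iou = np.triu(iou_matrix(boxes[order]), k=1)  # only higher-scoring pairs
    keep = iou.max(axis=0) <= iou_thresh     # column max: worst such overlap
    return order[keep]

# Two heavily overlapping boxes plus one isolated box.
boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
kept = fast_nms(boxes, scores)
```

Unlike standard NMS, a suppressed box here can still suppress others, which is the source of the small accuracy cost the YOLACT authors report.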
For all of the YOLACT variants, the segmentation output is the original image with a transparent mask overlay. To get a black-and-white mask instead, you can use the same technique we used with Rembg. The results of YOLACT, YOLACT++, and YOLACT-Edge were all very similar, with the main difference being segmentation speed. All versions of YOLACT were tested using their ResNet101 models. Below are segmentation images from YOLACT.
The top performer was Rembg, with its quality segmentation results and ease of use, though it had the slowest GPU inference speed. YOLACT and its variants produced results comparable to Rembg’s, and if you are looking to segment people, they performed best. Their only major problems were the inability to run on a CPU and incompatibility with certain computers. Both Fastseg and PaddleSeg had poor segmentation results compared to the other methods.
About the Author
Hanna Diamond is a Data Scientist at Anno.Ai, where she focuses on computer vision applications and development of deep learning models. Hanna has a background in image processing, computer vision, and robotics; she holds a BA in Computer Science from Oberlin College. Hanna leads Anno’s Zwift virtual bike team and enjoys building up her gaming workstation in her free time.