![](https://crypto4nerd.com/wp-content/uploads/2023/06/1IwMJmESRP5CxmyonL_RbGQ.gif)
Unlock the power of Interactive Segmentation and revolutionize your data labeling process with ease.
In this article, we’ll explore click-based interactive segmentation and its benefits for speeding up the data labeling process in semantic segmentation. Segmentation models have proven useful in numerous fields, including analyzing medical images, video understanding, and even self-driving cars. Moreover, we recently talked about the background replacement task and face beautification in our article, in which we presented a new large open-source dataset for portrait segmentation and face parsing along with a set of pre-trained models.
How about we dive into image segmentation and explore the different types? Whether you want to refresh your memory or learn something new, let’s start!
Image segmentation is a computer vision task that classifies each pixel of an image, thereby separating the objects in it into distinct segments (masks). Image segmentation has three subtasks: semantic segmentation, instance segmentation, and panoptic segmentation. Each subtask has its own characteristics and subcategories to consider when working on it.
Semantic segmentation is an image segmentation task that assigns each pixel to a class from a given set of classes. When there is a single target class, the task is called binary segmentation. Face parsing, portrait segmentation, and scene understanding are subfields of semantic segmentation.
Instance segmentation aims to localize each target object as a separate segment (instance), even when the objects belong to the same class. Such segmentation is useful when objects need to be tracked across video frames or when the segmented objects are too small and too close to each other for accurate pixel-by-pixel semantic segmentation. Instance segmentation is most often seen in geospatial data analysis, medical imaging, and environment perception for self-driving cars and robotics.
Panoptic segmentation combines semantic and instance segmentation by returning the union of their masks. Keep in mind, however, that with this type of segmentation each image pixel receives exactly one class label, so when objects in the image overlap, panoptic segmentation handles such occlusions worse than the instance segmentation described above.
All three types of image segmentation have their own characteristics, but they share one main drawback: the data labeling process required when no existing dataset meets the criteria for training a model.
Neural network models for image segmentation are trained to approximate ground-truth segmentation masks. If no suitable dataset exists, such ground-truth masks can be obtained either through manual labeling on data annotation platforms (Toloka.AI, Amazon Mechanical Turk, Supervisely, Elementary) or through synthetic data generation. Models trained on synthetic images may perform worse than models trained on real data, and using synthetic data requires developing an additional model to generate high-quality images and annotations. Despite its high cost, long duration, and the likelihood of human error, manual labeling is therefore the approach most often chosen.
Using interactive segmentation, you can simplify manual labeling and also contribute to model training, namely:
- speed up the labeling process and potentially reduce the number of crowd workers needed, since mask drawing is partially automated;
- improve model quality by increasing the amount of well-labeled data;
- reduce the cost of the final annotation.
Below we will discuss in more detail how Interactive Segmentation works.
One approach to interactive segmentation is based on clicks, with which the user marks up masks. There are two types of clicks: positive clicks, which mark the object of a given class and produce its mask, and negative clicks, which are used to correct the resulting mask. For example, if the predicted segmentation mask doesn’t completely cover the desired pixel area, you can add another positive click; likewise, if the mask includes extra pixels, you can place a negative click to refine it.
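To make this more concrete, below is a minimal sketch (in NumPy; the function names and the disk radius are illustrative assumptions, not any specific model’s API) of one common way clicks are fed to a network: each positive and negative click is rasterized into a binary map, and the two maps are stacked with the image as extra input channels.

```python
import numpy as np

def clicks_to_maps(pos_clicks, neg_clicks, height, width, radius=5):
    """Rasterize positive and negative clicks into two binary maps.

    Each click is an (x, y) pixel coordinate; a small disk of the given
    radius is drawn around it. The two maps are later stacked with the
    RGB image and passed to the segmentation network as extra channels.
    """
    yy, xx = np.mgrid[0:height, 0:width]

    def rasterize(clicks):
        click_map = np.zeros((height, width), dtype=np.float32)
        for cx, cy in clicks:
            click_map[(xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2] = 1.0
        return click_map

    return rasterize(pos_clicks), rasterize(neg_clicks)


# Example: one positive click on the object, one negative click on a
# region that the previous mask covered by mistake.
pos_map, neg_map = clicks_to_maps(pos_clicks=[(120, 80)],
                                  neg_clicks=[(40, 200)],
                                  height=256, width=256)
# model_input = np.concatenate([image, pos_map[..., None], neg_map[..., None]], axis=-1)
```

Other encodings exist (distance transforms or Gaussians centered on the click, for instance), but the idea is the same: the network sees both the image and the user’s corrections.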
All approaches to this problem aim to reach the highest possible Intersection over Union (IoU) with the target mask using the minimum number of clicks. A target IoU of 90% is commonly used, and the corresponding metric is written NoC@90, where NoC is the number of clicks required to reach that threshold.
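As an illustration, here is a minimal sketch of how these quantities can be computed (the function names and the 20-click cap are assumptions for the example; benchmarks may differ in details such as how unreached thresholds are counted):

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection over Union between two boolean masks."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return intersection / union if union > 0 else 1.0

def noc_at_threshold(ious_per_click, threshold=0.90, max_clicks=20):
    """Number of Clicks (NoC) needed to reach the target IoU.

    `ious_per_click[i]` is the IoU obtained after click i + 1.
    Returns `max_clicks` if the threshold is never reached.
    """
    for i, value in enumerate(ious_per_click, start=1):
        if value >= threshold:
            return i
    return max_clicks

# Example: the model crosses 90% IoU on the third click.
print(noc_at_threshold([0.71, 0.86, 0.93]))  # -> 3
```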
Thus, with one or a few clicks you can obtain the desired segmentation mask for the target object and cut the time otherwise spent placing polygon points, which is how annotation is usually implemented on crowdsourcing platforms. It is also worth noting that interactive segmentation produces masks with smooth edges that follow the shape of the segmented object as closely as possible. To achieve the same effect classically, by drawing a mask as a polygon, you would have to place far more points, which significantly increases the annotation time.