Object Detection and Zone Counting
Introduction
Object detection on a video stream with object counting in a polygon zone is a sophisticated application of artificial intelligence (AI) that finds extensive use in various fields, including surveillance, traffic management, and industrial automation. This technology leverages advanced computer vision techniques to identify and track objects within a specified area of interest.
Object detection involves the use of AI models, such as convolutional neural networks (CNNs), to analyze video frames and identify objects present within them. In the context of video streams, these models analyze successive frames to track the movement and changes in object positions over time.
A polygon zone is a defined area of interest within the video frame where object counting needs to occur. It is usually specified by a set of vertices that form a closed shape. The application of a polygon zone allows for focused analysis, restricting object counting to a specific region within the video stream.
The implementation of object counting within a polygon zone involves the following steps:
- Polygon Definition: define the polygon zone by specifying its vertices. This defines the region where object counting will take place.
- Frame Analysis: process each frame of the video stream using the object detection model. Identify and classify objects present in the frame.
- Polygon Intersection: check whether the detected objects intersect with the defined polygon zone. This step ensures that only objects within the specified region are considered for counting.
- Counting Logic: establish counting logic based on the number of objects within the polygon zones.
- Real-Time Updates: provide real-time updates on the object count within the polygon zone. This information can be utilized for various applications, such as monitoring crowd density, traffic flow, or object movement in a manufacturing environment.
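The core of steps 3 and 4 above is a point-in-polygon test. As a minimal, framework-free sketch (the zone and anchor points below are made up for illustration; a real detector supplies them per frame), a classic ray-casting test could look like this:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`,
    given as a list of [x, y] vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical zone and detection anchor points (e.g. bounding-box centers)
zone = [[100, 100], [400, 100], [400, 300], [100, 300]]
anchors = [(150, 150), (500, 200), (390, 290)]

count = sum(point_in_polygon(x, y, zone) for x, y in anchors)
print(count)  # 2 anchor points fall inside the zone
```

Production libraries typically use an equivalent but optimized test; the counting logic on top of it is the same tally shown here.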
Applications
- Surveillance: monitor specific areas within a surveillance camera’s field of view, counting and tracking objects for security purposes.
- Traffic Management: analyze vehicle movement within designated zones to optimize traffic flow and detect congestion.
- Industrial Automation: count and track objects on factory floors to enhance production efficiency and safety.
- Retail Analytics: monitor customer movement and product interactions within specific areas of a retail store for marketing and inventory management.
Using DeGirum SDK for Object Zone Counting
Prerequisites
Assuming you have a configured Python environment on your Windows, Linux, or macOS computer, install the DeGirum PySDK and DeGirum Tools Python packages by running the following commands (follow this link for details):
pip install -U degirum
pip install -U degirum_tools
Alternatively, you can use Google Colab to run the zone counting example Jupyter notebook provided by DeGirum.
Step-by-Step Guide
Import necessary packages
You need to import the degirum and degirum_tools packages:
import degirum as dg, degirum_tools
Select object detection AI model
As a starter, we will use the YOLOv5s COCO model, which can detect 80 COCO classes. We will take this model from the DeGirum cloud public model zoo.
Let's define the cloud zoo URL and the model name:
model_zoo_url = "https://cs.degirum.com/degirum/public"
model_name = "yolo_v5s_coco--512x512_quant_n2x_orca1_1"
Here cs.degirum.com is the DeGirum DeLight Cloud Platform URL, and degirum/public is the path to the DeGirum cloud public model zoo. yolo_v5s_coco--512x512_quant_n2x_orca1_1 is the model name we will use for object detection. It is based on the Ultralytics YOLOv5 model trained to detect 80 COCO classes and compiled for the DeGirum ORCA1 AI hardware accelerator.
Define video source
For simplicity, we will use a short highway traffic video from the DeGirum PySDK examples GitHub repo:
video_source = "https://github.com/DeGirum/PySDKExamples/raw/main/images/Traffic.mp4"
But you can use any video file you want. If you run the code locally and your computer has a video camera, you may use the camera as a video source:
video_source = 0 # specify index of local video camera
Define polygon zones
For each zone in which you want to count objects, define the list of [x, y] pixel coordinates of the polygon vertices that surround that zone. Then define a list containing all zone polygons:
polygon_zones = [
[[265, 260], [730, 260], [870, 450], [120, 450]], # zone 1
[[400, 100], [610, 100], [690, 200], [320, 200]], # zone 2
]
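A quick, dependency-free sanity check of the zone definitions can catch out-of-frame or degenerate polygons before inference starts. The frame size below is a hypothetical assumption; use your actual video resolution:

```python
polygon_zones = [
    [[265, 260], [730, 260], [870, 450], [120, 450]],  # zone 1
    [[400, 100], [610, 100], [690, 200], [320, 200]],  # zone 2
]

frame_w, frame_h = 1280, 720  # hypothetical video frame size

def validate_zone(zone, width, height):
    """A polygon zone needs at least 3 vertices, all inside the frame."""
    return len(zone) >= 3 and all(
        0 <= x < width and 0 <= y < height for x, y in zone
    )

print([validate_zone(z, frame_w, frame_h) for z in polygon_zones])  # [True, True]
```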
Here we defined two zones with four vertices each.
Obtain cloud API access token
In order to use AI models from the DeGirum Cloud Platform, you need to register and generate a cloud API access token. Please follow these instructions. Registration is free.
Connect to model zoo and load the model
# connect to AI inference engine
zoo = dg.connect(dg.CLOUD, model_zoo_url, "<cloud API token>")
# load model
model = zoo.load_model(model_name)
Here we connect to the DeGirum Cloud Platform to run AI model inferences (by using the dg.CLOUD parameter) and to the cloud model zoo specified by model_zoo_url, using the cloud API access token obtained in the previous step. Then we load the model specified by model_name.
For more inference options, please refer to this documentation page.
Define interactive display
If you would like to observe live real-time results of object detection and zone counting with an AI annotation overlay, you may use the Display class from the DeGirum Tools package:
with degirum_tools.Display("AI Camera") as display:
    ...
Define zone counting object
We will use the ZoneCounter object from the DeGirum Tools package, which greatly simplifies the task of object zone counting.
with degirum_tools.Display("AI Camera") as display:
    zone_counter = degirum_tools.ZoneCounter(
        polygon_zones,
        class_list=["car", "motorbike", "truck"],
        per_class_display=True,
        triggering_position=degirum_tools.AnchorPoint.CENTER,
        window_name=display.window_name,
    )
Here we specify the list of classes we want to count: class_list=["car", "motorbike", "truck"]. You may omit this parameter to count all classes the model reports.
We also specify per-class counting mode: per_class_display=True. You may omit this parameter to count all classes together.
We specify the triggering position to be the center of the object bounding box: triggering_position=degirum_tools.AnchorPoint.CENTER. Possible triggering positions include all four corners and the midpoints of all four edges of the bounding box.
If you want to adjust the polygon zones on the interactive display at run-time, you can specify the OpenCV window name of that display: window_name=display.window_name. You can drag a whole zone polygon with the left mouse button and move individual polygon vertices with the right mouse button.
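To make the idea of a triggering position concrete, here is a hypothetical helper (an illustration, not SDK code; the position names only mirror the idea behind the degirum_tools.AnchorPoint enum) that maps a bounding box [x1, y1, x2, y2] to the anchor point that would be tested against the zone polygon:

```python
def anchor_point(bbox, position="center"):
    """Return the (x, y) anchor of bounding box [x1, y1, x2, y2].

    `position` is "center" or a "<vertical>_<horizontal>" string such as
    "bottom_center" or "top_left" (names are illustrative only).
    """
    x1, y1, x2, y2 = bbox
    xs = {"left": x1, "center": (x1 + x2) / 2, "right": x2}
    ys = {"top": y1, "center": (y1 + y2) / 2, "bottom": y2}
    if position == "center":
        return xs["center"], ys["center"]
    vert, horiz = position.split("_")
    return xs[horiz], ys[vert]

print(anchor_point([100, 50, 200, 150]))                   # (150.0, 100.0)
print(anchor_point([100, 50, 200, 150], "bottom_center"))  # (150.0, 150)
```

A bottom-center anchor is a common choice for ground-plane zones (e.g. a road surface), since it approximates where the object touches the ground.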
Define inference loop
with degirum_tools.open_video_stream(video_source) as stream:
    for result in model.predict_batch(
        degirum_tools.video_source(stream)
    ):
        img = result.image_overlay
        img = zone_counter.analyze_and_annotate(result, img)
        display.show(img)
We open the video stream using degirum_tools.open_video_stream.
Then we supply the video source to the input of the model.predict_batch method, which performs AI model inference on each frame retrieved from the video stream in an efficient pipelined manner.
The result object contains the inference results, which are stored in the result.results list.
The result.image_overlay property draws AI annotations on top of the original video frame image. These annotations include the bounding boxes of all detected objects.
The zone_counter.analyze_and_annotate method counts objects in the polygon zones and draws the object counts on top of the provided image.
Finally, display.show(img) displays the fully annotated image in an OpenCV interactive window.
To simplify the boilerplate code, you may use the degirum_tools.predict_stream function, which effectively performs the same steps as above:
for result in degirum_tools.predict_stream(
    model, video_source, analyzers=zone_counter
):
    display.show(result)
Access zone counting results
If you want to know which object belongs to which zone, you may examine the "in_zone" key of each result dictionary: when present, its value is the index of the zone where that object is detected:
for detection in result.results:
    if (zone_index := detection.get("in_zone", None)) is not None:
        # this `detection` belongs to zone with index `zone_index`
        ...
If you want to obtain per-zone counts, you may use the following expression:
import numpy as np

class_list = ["car", "motorbike", "truck"]  # same classes as the zone counter
zones, counts = np.unique(
    [d["in_zone"] for d in result.results
     if d["label"] in class_list and "in_zone" in d],
    return_counts=True,
)
print(dict(zip(zones, counts)))
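If you prefer to avoid NumPy, collections.Counter produces the same per-zone tally. The result.results list below is mocked with the detection-dictionary shape used in this example ("label" and the optional "in_zone" key):

```python
from collections import Counter

class_list = ["car", "motorbike", "truck"]

# Mocked detections, shaped like the dictionaries in result.results
results = [
    {"label": "car", "in_zone": 0},
    {"label": "truck", "in_zone": 0},
    {"label": "car", "in_zone": 1},
    {"label": "person"},  # outside any zone: no "in_zone" key
]

zone_counts = Counter(
    d["in_zone"] for d in results
    if d["label"] in class_list and "in_zone" in d
)
print(dict(zone_counts))  # {0: 2, 1: 1}
```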
The whole code of the example described above is available in this Jupyter notebook.
The following is a sample screenshot of the live video produced by that example: