A Guide to Using YOLO Models with DeepStream SDK for Scalable Object Tracking

Team Awareye
September 30, 2024
5 min read

In the swiftly changing world of digital innovation, real-time object detection and tracking have become essential across various industries—from process mining to smart surveillance systems. One of the best ways to achieve this is by integrating YOLO (You Only Look Once) models with NVIDIA’s DeepStream SDK.

YOLO is renowned for its speed and accuracy, enabling efficient object detection in live video feeds. When paired with DeepStream, a powerful platform designed for building AI-based video analytics applications, you unlock the potential for scalable and efficient object-tracking solutions.

In this blog, we’ll explore how to harness the strength of YOLO models alongside DeepStream, diving into practical implementation steps, optimization techniques, and real-world applications. Whether you’re a developer looking to enhance your video analytics projects or a tech enthusiast eager to understand the latest in AI advancements, this guide will provide you with the insights you need to get started.

Let’s delve into the subject in detail.

Introduction to YOLOv11: The Next Generation of Real-Time Object Detection

YOLOv11 is the latest iteration of the renowned YOLO series, and improves real-time object detection with enhanced accuracy and speed. Building on the strengths of its predecessors, YOLOv11 integrates advanced deep learning architectures and improved training techniques to deliver exceptional performance in various applications, from autonomous vehicles to smart surveillance systems.

One of the key features of YOLOv11 is its ability to balance precision and processing speed, making it capable of efficiently detecting multiple objects in complex environments. The model benefits from a streamlined architecture, allowing for faster inference times while maintaining high accuracy rates, making it ideal for real-time applications.

Additionally, YOLOv11 introduces improvements in feature extraction and model scalability, enabling users to tailor the model to specific needs and constraints. 

In this article, we will use YOLOv8, but the deployment and inference approach remains almost exactly the same. 

NVIDIA's DeepStream

NVIDIA's DeepStream SDK is a toolkit for processing video, audio, and images using AI. Built on GStreamer, it helps developers create real-time apps that analyze video data. It is beneficial for those working with multiple sensors, like cameras, and for building Intelligent Video Analytics (IVA) applications. 

With DeepStream, you can create processing pipelines that utilize neural networks and handle tasks like video tracking, encoding, decoding, and rendering. It works on different platforms, making it easier and faster to build vision AI applications for on-site use, at the edge, or in the cloud. 
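To make this concrete, here is a small Python sketch that assembles a typical single-stream DeepStream pipeline as a gst-launch-1.0 command. The element names (nvstreammux, nvinfer, nvdsosd, and so on) are standard DeepStream GStreamer plugins; the sample video path and the nvinfer config filename are the ones used later in this guide. Treat the command as an illustration of the pipeline shape, not something you must run as-is:

```python
# Illustrative only: build the element chain of a typical DeepStream
# pipeline (decode -> batch -> infer -> convert -> overlay -> render)
# as a gst-launch-1.0 command string.
elements = [
    "filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4",
    "decodebin",                                        # decode the source
    "m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080",  # batch frames
    "nvinfer config-file-path=config_infer_primary_yoloV8.txt",  # run the YOLO model
    "nvvideoconvert",                                   # convert for the on-screen display
    "nvdsosd",                                          # draw bounding boxes and labels
    "nveglglessink",                                    # render to screen
]
cmd = "gst-launch-1.0 " + " ! ".join(elements)
print(cmd)
```

The same chain is what deepstream-app builds for you from its config file; seeing it spelled out helps when you later debug or extend a pipeline.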

At Awareye, we have configured and customized DeepStream so that it can be deployed with hundreds or thousands of camera feeds, and powered by models like the YOLO series. Using Awareye, companies are streamlining their process mining, operations, packaging pipelines, safety compliance and more.

The walkthrough below outlines the steps to use DeepStream with the YOLO series. 

Installing DeepStream SDK

To install DeepStream SDK, you need access to a GPU server. We will assume you have that handy. There are two ways of installing the DeepStream SDK. First, you can use Docker. Alternatively, you can build it on your own. We have chosen the latter approach, as it helps you understand the workings of DeepStream SDK.

Step 1: Install Latest Glib

First, migrate glib to a newer version (2.76.6). Let’s install the packages below. 

  1. Prerequisites: install the packages below.
pip3 install meson
pip3 install ninja
  2. Now compile and build:
$ git clone https://github.com/GNOME/glib.git
$ cd glib
$ git checkout <glib-version-branch>
# e.g. 2.76.6
$ meson build --prefix=/usr
$ ninja -C build/
$ cd build/
$ ninja install
  3. Check and confirm the newly installed glib version:
$ pkg-config --modversion glib-2.0

Step 2: Install DeepStream Dependencies

Now we need to install GStreamer and a number of DeepStream SDK dependencies. 

$ sudo apt install \
libssl3 \
libssl-dev \
libgles2-mesa-dev \
libgstreamer1.0-0 \
gstreamer1.0-tools \
gstreamer1.0-plugins-good \
gstreamer1.0-plugins-bad \
gstreamer1.0-plugins-ugly \
gstreamer1.0-libav \
libgstreamer-plugins-base1.0-dev \
libgstrtspserver-1.0-0 \
libjansson4 \
libyaml-cpp-dev \
libjsoncpp-dev \
protobuf-compiler \
gcc \
make \
git \
python3

Once this is done, you can move to installing CUDA Toolkit 12.2.

Step 3: Install CUDA Toolkit 12.2

To install CUDA Toolkit 12.2, run the commands below.

$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
$ sudo apt-get update
$ sudo apt-get install cuda-toolkit-12-2

These commands add NVIDIA's CUDA repository, refresh the package index, and then install the toolkit via apt-get.

Step 4: Install NVIDIA Driver 535.161.08

Download NVIDIA driver 535.161.08 from the Data Center Driver for Linux x64 page at: https://www.nvidia.cn/Download/driverResults.aspx/222416/en-us/

Then run the following commands to make the installer executable and run it.

$ chmod 755 NVIDIA-Linux-x86_64-535.161.08.run
$ sudo ./NVIDIA-Linux-x86_64-535.161.08.run --no-cc-version-check

Step 5: Install TensorRT 8.6.1.6

Next, run the following command to install TensorRT 8.6.1.6.

$ sudo apt-get install --no-install-recommends libnvinfer-lean8=8.6.1.6-1+cuda12.0 libnvinfer-vc-plugin8=8.6.1.6-1+cuda12.0 \
libnvinfer-headers-dev=8.6.1.6-1+cuda12.0 libnvinfer-dev=8.6.1.6-1+cuda12.0 libnvinfer-headers-plugin-dev=8.6.1.6-1+cuda12.0 \
libnvinfer-plugin-dev=8.6.1.6-1+cuda12.0 libnvonnxparsers-dev=8.6.1.6-1+cuda12.0 libnvinfer-lean-dev=8.6.1.6-1+cuda12.0 \
libnvparsers-dev=8.6.1.6-1+cuda12.0 python3-libnvinfer-lean=8.6.1.6-1+cuda12.0 python3-libnvinfer-dispatch=8.6.1.6-1+cuda12.0 \
uff-converter-tf=8.6.1.6-1+cuda12.0 onnx-graphsurgeon=8.6.1.6-1+cuda12.0 libnvinfer-bin=8.6.1.6-1+cuda12.0 \
libnvinfer-dispatch-dev=8.6.1.6-1+cuda12.0 libnvinfer-dispatch8=8.6.1.6-1+cuda12.0 libnvonnxparsers8=8.6.1.6-1+cuda12.0 \
libnvinfer-vc-plugin-dev=8.6.1.6-1+cuda12.0 libnvinfer-samples=8.6.1.6-1+cuda12.0

Step 6: Install Librdkafka (to enable Kafka protocol adaptor for message broker)

We will need a Kafka message broker as well. Let’s install that. 

  1. Clone the librdkafka repository from GitHub:
$ git clone https://github.com/confluentinc/librdkafka.git
  2. Configure and build the library.
$ cd librdkafka
$ git checkout tags/v2.2.0
$ ./configure --enable-ssl
$ make
$ sudo make install
  3. Copy the generated libraries to the DeepStream directory.
$ sudo mkdir -p /opt/nvidia/deepstream/deepstream/lib
$ sudo cp /usr/local/lib/librdkafka* /opt/nvidia/deepstream/deepstream/lib
$ sudo ldconfig

Now we are ready to install the DeepStream SDK.

Step 7: Install DeepStream SDK

Download the DeepStream 7.0 package deepstream-7.0_7.0.0-1_amd64.deb from:

https://catalog.ngc.nvidia.com/orgs/nvidia/resources/deepstream

Then install it:

$ sudo apt-get install ./deepstream-7.0_7.0.0-1_amd64.deb

You should have the DeepStream SDK installed. 

Integrating YOLO with DeepStream SDK

Now that we’ve successfully set up our environment and installed all the necessary components, it’s time to dive into the exciting world of YOLO with DeepStream. This powerful combination enables us to leverage the strengths of YOLO’s real-time object detection capabilities alongside DeepStream’s efficient video analytics framework.

In this section, we’ll start by configuring our YOLO model for integration with DeepStream. We’ll explore how to optimize the model for performance, set up the inference pipeline, and visualize the detection results in real time. With DeepStream’s support for multiple streams and hardware acceleration, we can handle complex scenarios with ease, making our object tracking applications both scalable and efficient.

Here, we are using the marcoslucianops/DeepStream-Yolo GitHub repository, which includes support for YOLO models in the NVIDIA DeepStream SDK. The latest version supported currently is YOLOv8, so we will use that. 

  1. Install the dependencies.
pip install cmake
pip install onnxsim
  2. Clone the following repository.
git clone https://github.com/marcoslucianops/DeepStream-Yolo
cd DeepStream-Yolo
  3. Download the Ultralytics YOLOv8 detection model (.pt) of your choice from the YOLOv8 releases. Here, we’ll use yolov8s.pt.
wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s.pt
  4. Convert the model to ONNX, as that’s the format the DeepStream SDK uses.
python3 utils/export_yoloV8.py -w yolov8s.pt
  5. Set the CUDA version to match the toolkit installed earlier (12.2 in our setup).
export CUDA_VER=12.2
  6. Compile the library.
make -C nvdsinfer_custom_impl_Yolo clean && make -C nvdsinfer_custom_impl_Yolo
  7. Edit the config_infer_primary_yoloV8.txt file according to your model (for YOLOv8s with 80 classes).
[property]
...
onnx-file=yolov8s.onnx
...
num-detected-classes=80
...
  8. Edit the deepstream_app_config file.
...
[primary-gie]
...
config-file=config_infer_primary_yoloV8.txt
  9. You can also change the video source in the deepstream_app_config file. Here, a default video file has been loaded.
...
[source0]
...
uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
  10. Run inference on YOLOv8 via the DeepStream SDK.
deepstream-app -c deepstream_app_config.txt
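For reference, a fuller config_infer_primary_yoloV8.txt might look like the fragment below. This is an illustrative sketch: the custom-lib-path and parse-bbox-func-name values follow the DeepStream-Yolo repository's conventions, and the engine filename is generated on first run, so check your checkout for the exact names.

```ini
[property]
gpu-id=0
onnx-file=yolov8s.onnx
; Engine file is built from the ONNX model on first run; the name may differ.
model-engine-file=model_b1_gpu0_fp32.engine
num-detected-classes=80
; 0 = FP32, 1 = INT8, 2 = FP16
network-mode=0
; Custom bounding-box parser compiled from the DeepStream-Yolo repository.
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
parse-bbox-func-name=NvDsInferParseYolo
```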

Great! We now have a working setup with DeepStream SDK and YOLOv8. Let’s configure it for multiple video streams next.

DeepStream SDK and YOLO: Multistream Setup

As the demand for real-time video analytics continues to grow, the ability to process multiple video streams simultaneously has become essential. This is where the DeepStream SDK shines, offering robust support for multistreaming to meet the challenges of modern applications.

Why Multistreaming?

  • Scalability: In many scenarios, such as factory floors or large retail environments, the need to analyze data from numerous cameras in real time is critical. DeepStream allows you to manage and process multiple streams concurrently, scaling your applications to handle increased loads without sacrificing performance.
  • Resource Efficiency: By leveraging GPU acceleration, DeepStream enables efficient resource utilization. This means that even with numerous streams, the processing remains smooth and responsive, maximizing your hardware’s potential.
  • Comprehensive Insights: Multistreaming allows for a holistic view of environments. Whether tracking people across multiple locations or monitoring traffic from various angles, processing multiple feeds in tandem provides richer data and insights that single-stream systems cannot offer.
  • Reduced Latency: In applications where real-time response is crucial—like surveillance or autonomous navigation—multistreaming ensures that data from all sources is processed promptly, leading to faster decision-making.

Setting Up a Multistream Object Detection Workflow

To set up multiple video streams in one DeepStream application, you can make these changes to the deepstream_app_config.txt file.

  1. Change the rows and columns to build a grid display according to the number of streams you want. For example, for 4 streams, you can use 2 rows and 2 columns.
[tiled-display]
rows=2
columns=2
  2. Set num-sources=4 and add the uri of each of the 4 streams.
[source0]
enable=1
type=3
uri=<path_to_video>
uri=<path_to_video>
uri=<path_to_video>
uri=<path_to_video>
num-sources=4
  3. Run multi-stream inference:
deepstream-app -c deepstream_app_config.txt
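Choosing rows and columns by hand gets tedious as stream counts grow. The short sketch below is a hypothetical helper (not part of DeepStream) that computes a near-square grid for N streams and prints the matching tiled-display fragment:

```python
import math

def tiled_grid(num_streams: int) -> tuple:
    """Return (rows, columns) for a near-square tiled display."""
    # Pick the column count as the ceiling of the square root,
    # then use just enough rows to fit every stream.
    columns = math.ceil(math.sqrt(num_streams))
    rows = math.ceil(num_streams / columns)
    return rows, columns

rows, columns = tiled_grid(4)
print("[tiled-display]")
print(f"rows={rows}")
print(f"columns={columns}")
```

For 4 streams this yields the 2x2 grid shown above, and for 16 streams a 4x4 grid; remember to set num-sources in the [source0] group to match.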

Conclusion

In conclusion, combining YOLO models with NVIDIA’s DeepStream SDK provides a highly scalable and efficient solution for real-time object detection and tracking. By leveraging YOLO’s accuracy and speed, along with DeepStream’s ability to handle multiple video streams and GPU acceleration, developers can build robust video analytics systems capable of processing large-scale data in a variety of environments. Whether you're deploying AI solutions for smart surveillance, process mining, or industrial automation, this integration offers a powerful toolkit to address modern challenges in video analysis.

As demonstrated, setting up DeepStream with YOLO models requires careful configuration, but the potential gains in scalability and performance make it worthwhile. You can follow the steps outlined above to do the same, or you can reach out to us at Awareye, where we are using DeepStream SDK and fine-tuned vision models to create vision AI solutions for the enterprise segment. 

Reach out to us to learn more
