14. YOLO11

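Ultralytics YOLO11 is a new-generation computer vision model that delivers strong performance and accuracy across computer vision tasks including object detection, instance segmentation, image classification, pose estimation, oriented object detection, and object tracking.

Overall, YOLO11 changes little relative to YOLOv8. The main changes include the addition of a multi-head attention mechanism and the use of depthwise separable convolutions in the classification branch of the detection head, and it improves noticeably on YOLOv8 in both performance and accuracy.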




14.1. YOLO11 quick test

14.1.1. Environment setup


conda create -n yolo11 python=3.10
conda activate yolo11

# Configure a pip mirror (optional)
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/

# Install
pip install ultralytics

# Or install from source
git clone https://github.com/ultralytics/ultralytics.git
cd ultralytics
pip install -e .

# Check the version
(yolo11) llh@llh:/xxx$ yolo version

14.1.2. Model inference test

# Test the yolo11n.pt model with the yolo command
(yolo11) llh@llh:/xxx$ yolo predict model=yolo11n.pt source='https://ultralytics.com/images/bus.jpg'
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt to 'yolo11n.pt'...
100%|██████████████████████████████████████████████████████████████████████████| 5.35M/5.35M [00:16<00:00, 346kB/s]
Ultralytics 8.3.29 🚀 Python-3.10.15 torch-2.5.1 CUDA:0 (NVIDIA GeForce RTX 4070 Ti SUPER, 16376MiB)
YOLO11n summary (fused): 238 layers, 2,616,248 parameters, 0 gradients, 6.5 GFLOPs

Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg'...
100%|███████████████████████████████████████████████████████████████████████| 134k/134k [00:00<00:00, 1.25MB/s]
image 1/1 /xxx/bus.jpg: 640x480 4 persons, 1 bus, 54.0ms
Speed: 5.3ms preprocess, 54.0ms inference, 96.6ms postprocess per image at shape (1, 3, 640, 480)
Results saved to /xxx/runs/detect/predict

14.2. Model conversion

14.2.1. Export the ONNX model

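The model is exported with ultralytics_yolo11, a version of the Ultralytics project optimized specifically for RKNN. Without affecting the output results and without requiring the model to be retrained, the project makes the following changes:

  • The output structure is modified and the post-processing structure is removed (post-processing results are not quantization-friendly);

  • The DFL structure performs poorly on the NPU, so it is moved out of the model into the post-processing stage; in most cases this improves inference performance;

  • A sum of the confidences is added as an extra output branch, which speeds up threshold filtering in the post-processing stage.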
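
To illustrate the DFL step that is moved into post-processing, here is a minimal pure-Python sketch. It assumes the usual YOLOv8-style layout in which each of the four box sides is predicted as 16 bin logits (consistent with the 64-channel box outputs of the converted model) and in which distances are measured from the cell center in units of the stride. The names `REG_MAX`, `dfl_distance`, and `decode_cell` are illustrative and are not taken from the example code.

```python
import math

REG_MAX = 16  # assumed number of bins per box side (64 channels = 4 sides * 16 bins)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_distance(logits):
    """DFL: the distance is the expected bin index under the softmax distribution."""
    probs = softmax(logits)
    return sum(index * p for index, p in enumerate(probs))


def decode_cell(box_values, col, row, stride):
    """Decode the 64 box values of one grid cell into (x1, y1, x2, y2) in input pixels."""
    if len(box_values) != 4 * REG_MAX:
        raise ValueError("expected 4 * REG_MAX box values")
    left, top, right, bottom = [
        dfl_distance(box_values[k * REG_MAX:(k + 1) * REG_MAX]) for k in range(4)
    ]
    cx, cy = col + 0.5, row + 0.5  # cell center in grid units
    return (
        (cx - left) * stride,
        (cy - top) * stride,
        (cx + right) * stride,
        (cy + bottom) * stride,
    )
```

For a 640x640 input, the 80x80, 40x40, and 20x20 output grids correspond to strides of 8, 16, and 32. On the board the same arithmetic would run on dequantized output values, and only for cells that pass the confidence threshold.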

git clone https://github.com/airockchip/ultralytics_yolo11.git
cd ultralytics_yolo11

# Change the model file path in ./ultralytics/cfg/default.yaml; the default is yolo11n.pt

# Export the ONNX model
export PYTHONPATH=./
python ./ultralytics/engine/exporter.py

The exported ONNX model is written to the same directory. Its inputs and outputs can be inspected with netron.


14.2.2. Convert to an RKNN model


(toolkit2.2) llh@llh:/xxx/yolo11$ python onnx2rknn.py ./yolo11n.onnx rk3588 i8
I rknn-toolkit2 version: 2.2.0
--> Config model
--> Loading model
I Loading : 100%|██████████████████████████████████████████████| 174/174 [00:00<00:00, 81918.16it/s]
--> Building model
I OpFusing 0: 100%|██████████████████████████████████████████████| 100/100 [00:00<00:00, 571.25it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 279.63it/s]
I OpFusing 0 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 164.75it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 152.38it/s]
I OpFusing 2 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 144.98it/s]
I OpFusing 0 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 104.63it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 100.10it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 75.79it/s]
W build: found outlier value, this may affect quantization accuracy
                        const name                          abs_mean    abs_std     outlier value
                        model.23.cv3.1.1.1.conv.weight      0.55        0.60        -12.173
                        model.23.cv3.0.0.0.conv.weight      0.25        0.35        -15.593
I GraphPreparing : 100%|███████████████████████████████████████| 228/228 [00:00<00:00, 10247.11it/s]
I Quantizating : 100%|████████████████████████████████████████████| 228/228 [00:18<00:00, 12.04it/s]
# Omitted ....................................
I rknn buiding done.
--> Export rknn model
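
The contents of onnx2rknn.py are not shown here. The following is only a rough sketch of what such a conversion script might look like with the RKNN-Toolkit2 Python API, following the same four stages as the log above. The quantization dataset path `./dataset.txt`, the output path, the `fp` option, and the mean/std values (chosen so that 0..255 pixels map to 0..1, which matches the input scale of about 1/255 reported on the board) are assumptions.

```python
SUPPORTED_DTYPES = ("i8", "fp")  # "fp" (no quantization) is an assumed option


def parse_args(argv):
    """Parse `<onnx_path> <platform> [dtype]` into (onnx_path, platform, do_quant)."""
    if len(argv) < 2:
        raise ValueError("usage: onnx2rknn.py <onnx_path> <platform> [i8|fp]")
    onnx_path, platform = argv[0], argv[1]
    dtype = argv[2] if len(argv) > 2 else "i8"
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError("unsupported dtype: " + dtype)
    return onnx_path, platform, dtype == "i8"


def convert(onnx_path, platform, do_quant,
            dataset="./dataset.txt", out_path="./yolo11n.rknn"):
    """Run config -> load -> build -> export; paths are hypothetical defaults."""
    from rknn.api import RKNN  # imported lazily; requires rknn-toolkit2

    rknn = RKNN(verbose=False)
    try:
        print("--> Config model")
        rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]],
                    target_platform=platform)
        print("--> Loading model")
        if rknn.load_onnx(model=onnx_path) != 0:
            raise RuntimeError("load_onnx failed")
        print("--> Building model")
        if rknn.build(do_quantization=do_quant, dataset=dataset) != 0:
            raise RuntimeError("build failed")
        print("--> Export rknn model")
        if rknn.export_rknn(out_path) != 0:
            raise RuntimeError("export_rknn failed")
    finally:
        rknn.release()
```

A call such as `convert(*parse_args(["./yolo11n.onnx", "rk3588", "i8"]))` would correspond to the command line used above. The dataset file is expected to list calibration image paths, one per line.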

14.3. Deployment test on the board


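# Install OpenCV, git, and the other build dependencies
sudo apt update
sudo apt install libopencv-dev git make gcc g++ libsndfile1-dev

# On the LubanCat system, clone the tutorial's companion examples directly
git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git

cd lubancat_ai_manual_code/example/yolo11/cpp

# Alternatively, clone the rknn_model_zoo examples for testing; see that project's README for the actual build steps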
# git clone https://github.com/airockchip/rknn_model_zoo.git



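Change to the lubancat_ai_manual_code/example/yolo11/cpp directory and build the examples (this tutorial tests on the LubanCat 4, so the target is set to rk3588). The yolo11_videocapture_demo example uses the system's default OpenCV; if you do not need it, comment out the related entries in CMakeLists.txt.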

# Specify the target; if the board has more than 4 GB of memory, add -d
cat@lubancat:/xxx/example/yolo11/cpp$ ./build-linux.sh -t rk3588
./build-linux.sh -t rk3588
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc - skipped
-- Detecting C compile features
# Omitted ...............................
[ 95%] Linking CXX executable yolo11_videocapture_demo
[100%] Linking CXX executable yolo11_videocapture_demo_zero_copy
[100%] Built target yolo11_videocapture_demo
[100%] Built target yolo11_videocapture_demo_zero_copy
[  8%] Built target fileutils
[ 16%] Built target imageutils
[ 33%] Built target yolo11_videocapture_demo_zero_copy
[ 41%] Built target imagedrawing
[ 58%] Built target yolo11_image_demo_zero_copy
[ 75%] Built target yolo11_videocapture_demo
[ 91%] Built target yolo11_image_demo
[100%] Built target audioutils



cat@lubancat:/xxx$ ./yolo11_image_demo  model/yolo11n.rknn model/bus.jpg
load lable ./model/coco_80_labels_list.txt
model input num: 1, output num: 9
input tensors:
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
output tensors:
index=0, name=462, n_dims=4, dims=[1, 64, 80, 80], n_elems=409600, size=409600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-44, scale=0.135466
index=1, name=onnx::ReduceSum_476, n_dims=4, dims=[1, 80, 80, 80], n_elems=512000, size=512000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003027
index=2, name=480, n_dims=4, dims=[1, 1, 80, 80], n_elems=6400, size=6400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003036
index=3, name=487, n_dims=4, dims=[1, 64, 40, 40], n_elems=102400, size=102400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-38, scale=0.093536
index=4, name=onnx::ReduceSum_501, n_dims=4, dims=[1, 80, 40, 40], n_elems=128000, size=128000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003476
index=5, name=505, n_dims=4, dims=[1, 1, 40, 40], n_elems=1600, size=1600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
index=6, name=512, n_dims=4, dims=[1, 64, 20, 20], n_elems=25600, size=25600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-42, scale=0.084287
index=7, name=onnx::ReduceSum_526, n_dims=4, dims=[1, 80, 20, 20], n_elems=32000, size=32000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003903
index=8, name=530, n_dims=4, dims=[1, 1, 20, 20], n_elems=400, size=400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
model is NHWC input fmt
model input height=640, width=640, channel=3
origin size=640x640 crop size=640x640
input image: 640 x 640, subsampling: 4:2:0, colorspace: YCbCr, orientation: 1
scale=1.000000 dst_box=(0 0 639 639) allow_slight_change=1 _left_offset=0 _top_offset=0 padding_w=0 padding_h=0
rga_api version 1.10.1_[0]
bus @ (91 136 554 440) 0.944
person @ (108 236 224 535) 0.894
person @ (211 240 284 509) 0.855
person @ (476 230 559 521) 0.816
person @ (79 358 118 516) 0.396
write_image path: out.png width=640 height=640 channel=3 data=0x55b715aee0
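
The `zp` and `scale` values printed for each tensor define the affine mapping between INT8 values and real numbers, and the single-channel outputs (for example index=2) carry the confidence sum used for fast filtering. The sketch below shows both ideas in pure Python, under stated assumptions: the 0.25 box threshold and all function names are illustrative and are not taken from the example code.

```python
def dequantize(q, zp, scale):
    """Affine INT8 -> float: real = (q - zp) * scale."""
    return (q - zp) * scale


def quantize(x, zp, scale):
    """Float -> affine INT8, rounded and clamped to [-128, 127]."""
    q = round(x / scale) + zp
    return max(-128, min(127, q))


def cell_may_contain_object(score_sum_q, zp, scale, threshold):
    """Cheap pre-filter on the confidence-sum branch.

    Each class score lies in [0, 1], so the sum is at least as large as the
    best class score. If the sum is below the threshold, no class can reach
    it and the cell can be skipped without scanning all 80 class channels.
    """
    return dequantize(score_sum_q, zp, scale) >= threshold


# Values reported for output index=2 in the log above.
ZP, SCALE = -128, 0.003036
BOX_THRESHOLD = 0.25  # assumed threshold for illustration

# Quantizing the threshold once lets the loop compare raw INT8 values.
threshold_q = quantize(BOX_THRESHOLD, ZP, SCALE)
```

Comparing the raw INT8 sum against `threshold_q` avoids a floating-point conversion per grid cell, which matters when the three output grids together contain 8400 cells.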


# This tutorial tests on the LubanCat 4: connect a USB camera, then run the program to detect objects in footage shown on another screen:
cat@lubancat:/xxx$ ./yolo11_videocapture_demo  model/yolo11n.rknn 0
load lable ./model/coco_80_labels_list.txt
model input num: 1, output num: 9
input tensors:
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
output tensors:
# Omitted ......................

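When opening a USB camera, confirm the camera's device number and adjust the resolution and MJPG format settings in the example to match what the camera supports. When opening a MIPI camera, OpenCV must be configured to convert frames to RGB and to set the resolution, among other settings.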
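
The example itself is written in C++, but the USB camera settings mentioned above can be sketched with OpenCV's Python bindings, which expose the same capture properties. This is only an illustration: the device index, the 640x480 resolution, and the helper names are assumptions, and the camera must actually support the requested mode.

```python
def fourcc_code(tag):
    """Pack a four-character tag such as "MJPG" into the integer code OpenCV uses."""
    if len(tag) != 4:
        raise ValueError("a FourCC tag must have exactly four characters")
    return sum(ord(ch) << (8 * i) for i, ch in enumerate(tag))


def open_usb_camera(index=0, width=640, height=480):
    """Open /dev/video<index> and request MJPG frames at the given size."""
    import cv2  # imported lazily; requires the OpenCV Python package

    cap = cv2.VideoCapture(index)
    if not cap.isOpened():
        raise RuntimeError("cannot open camera %d" % index)
    # Request the pixel format first, then the resolution.
    cap.set(cv2.CAP_PROP_FOURCC, fourcc_code("MJPG"))
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
    return cap
```

`fourcc_code("MJPG")` yields the same value as `cv2.VideoWriter_fourcc(*"MJPG")`. Because a driver may silently ignore a request it cannot satisfy, reading the properties back with `cap.get(...)` is a reasonable way to confirm the mode that was actually applied.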