11. YOLOv8

Ultralytics YOLOv8 建立在以前YOLO版本的成功基础上, 引入了新的功能和改进,进一步提高了性能和灵活性。YOLOv8设计快速、准确且易于使用,是目标检测和跟踪、实例分割、图像分类和姿态估计任务的绝佳选择。

broken

YOLOv8 有5个不同模型大小的预训练模型:n、s、m、l和x。不同大小模型目标检测的准确度:

broken

本章将简单测试 YOLOv8 目标检测、图像分割以及状态/关键点检测,并在鲁班猫板卡上部署测试。

提示

测试环境:板卡系统是Debian11,PC是WSL2(ubuntu20.04),rknn-Toolkit2是1.6.0以上版本

11.1. YOLOv8相关环境安装

使用conda创建虚拟环境,然后安装ultralytics相关环境,测试下yolo命令,主要是为后面使用airockchip/ultralytics_yolov8导出模型创建环境。 也可以直接跳到后面模型导出小结,拉取airockchip/ultralytics_yolov8,然后根据工程文件pyproject.toml中的依赖安装环境。

# 使用conda创建虚拟环境
conda create -n yolov8 python=3.9
conda activate yolov8

# 安装YOLOv8,直接使用命令安装
pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple

# 或者通过拉取仓库然后安装
git clone https://github.com/ultralytics/ultralytics
cd ultralytics
pip install -e .

# 安装成功后,使用命令yolo简单看下版本
(yolov8) llh@anhao:/$ yolo version
8.2.96

安装之后,就可以直接使用 yolo命令 或者编写 python程序 进行模型训练等等操作。

11.2. YOLOv8目标检测

11.2.1. 目标检测简单测试

简单测试下环境,先从官网获取需要的权重文件(也可以不下载,使用yolo命令指定官方预训练权重时会自动下载),然后使用模型进行预测。

# 创建一个目录,然后获取官方仓库的权重(https://github.com/ultralytics/ultralytics/tree/main),
# 或者不手动下载,后面使用yolo命令会自动下载。
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt

# 获取测试图片,可以下面位置获取,可能会失败,也可以从配套例程获取
wget https://ultralytics.com/images/bus.jpg

然后使用yolo命令进行测试:

# 第一个参数是指任务[detect, segment, classify], 这里测试目标检测是detect,该参数是可选的;
# 第二个参数是模式[train, val, predict, export, track)],是选择进行训练、评估或者推理等等;
# 其他参数,model设置模型,source指定要预测的图片路径,imgsz指定图像尺寸等等,更多参数具体参考下:https://docs.ultralytics.com/usage/cfg/

# 简单目标检测
(yolov8) llh@anhao: yolo detect predict model=./yolov8n.pt source=./bus.jpg
Ultralytics YOLOv8.0.206 🚀 Python-3.8.18 torch-2.1.0+cu121 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 6144MiB)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs

image 1/1 /yolov8/bus.jpg: 640x480 4 persons, 1 bus, 1 stop sign, 81.8ms
Speed: 2.9ms preprocess, 81.8ms inference, 3.3ms postprocess per image at shape (1, 3, 640, 480)
Results saved to runs/detect/predict
💡 Learn more at https://docs.ultralytics.com/modes/predict

# 预测图片结果保存在当前runs目录下,具体路径是./runs/detect/predict/bus.jpg

测试结果:

broken

11.2.2. 模型训练

这里测试coco128,然后训练模型:

yolo detect train data=coco128.yaml model=yolov8n.pt epochs=100 imgsz=640

也可以训练自己的模型。

11.2.3. 模型导出(airockchip/ultralytics_yolov8)

使用 airockchip/ultralytics_yolov8 可以直接导出适配rknpu的模型,在npu上获得更高的推理效率。 该仓库对模型进行了优化:

  • dfl 结构在NPU处理上性能不佳,移至模型外部。

  • 假设有6000个候选框,原模型将 dfl 结构放置于’’框置信度过滤”前,则6000个候选框都需要计算经过dfl计算;而将 dfl 结构放置于’’框置信度过滤”后, 假设过滤后剩100个候选框,则dfl部分计算量减少至100个,大幅减少了计算资源、带宽资源的占用。

假设有6000个候选框,检测类别是 80 类,则阈值检索操作需要重复 6000* 80 ~= 4.8*10^5 次,占据了较多耗时。故导出模型时, 在模型中额外新增了对 80 类检测目标进行求和操作,用于快速过滤置信度。(该结构在部分情况下有效,与模型的训练结果有关) 可以在 ./ultralytics/nn/modules/head.py 52行~54行的位置,注释掉这部分优化,对应的代码是:

cls_sum = torch.clamp(y[-1].sum(1, keepdim=True), 0, 1)
y.append(cls_sum)

具体参考下 RKOPT_README.md

测试 airockchip/ultralytics_yolov8 模型导出,注意教程测试的是rk_opt_v1分支, 该分支将导出torchscript模型,如果使用main分支默认导出onnx模型,模型转换成rknn时注意修改下加载模型函数。

# 拉取airockchip/ultralytics_yolov8,rk_opt_v1分支
git clone -b rk_opt_v1  https://github.com/airockchip/ultralytics_yolov8.git
cd ultralytics_yolov8

# 复制训练的模型yolov8n.pt到ultralytics_yolov8目录下
# 然后修改./ultralytics/cfg/default.yaml文件,主要是设置下model,为自己训练的模型路径:
    model: ./yolov8n.pt # (str, optional) path to model file, i.e. yolov8n.pt, yolov8n.yaml
    data:  # (str, optional) path to data file, i.e. coco128.yaml
    epochs: 100  # (int) number of epochs to train for

# 导出模型
(yolov8) llh@anhao:~/ultralytics_yolov8$ export PYTHONPATH=./
(yolov8) llh@anhao:~/ultralytics_yolov8$ python ./ultralytics/engine/exporter.py
Ultralytics YOLOv8.0.151 🚀 Python-3.8.18 torch-2.1.0+cu121 CPU (12th Gen Intel Core(TM) i5-12500H)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs

PyTorch: starting from 'yolov8n.pt' with input shape (16, 3, 640, 640) BCHW and output shape(s) ((16, 64, 80, 80), (16, 80, 80, 80),
(16, 1, 80, 80), (16, 64, 40, 40), (16, 80, 40, 40), (16, 1, 40, 40), (16, 64, 20, 20), (16, 80, 20, 20), (16, 1, 20, 20)) (6.2 MB)

RKNN: starting export with torch 2.1.0+cu121...

RKNN: feed yolov8n_rknnopt.torchscript to RKNN-Toolkit or RKNN-Toolkit2 to generate RKNN model.
Refer https://github.com/airockchip/rknn_model_zoo/tree/main/models/CV/object_detection/yolo
RKNN: export success ✅ 5.0s, saved as 'yolov8n_rknnopt.torchscript' (12.3 MB)

Export complete (10.9s)
Results saved to /home/llh/ultralytics_yolov8
Predict:         yolo predict task=detect model=yolov8n_rknnopt.torchscript imgsz=640
Validate:        yolo val task=detect model=yolov8n_rknnopt.torchscript imgsz=640 data=coco.yaml
Visualize:       https://netron.app

导出的模型,保存在当前目录下的yolov8n_rknnopt.torchscript。

11.2.4. 模型转换

前面导出的模型,还需要通过toolkit2转换成rknn模型,toolkit2的安装参考下前面环境安装章节。 我们参考 rknn_model_zoo仓库 的例程, 编译下模型转换程序,将torchscript模型(需要注意load_pytorch导入的模型后缀是.pt,模型需要重新命名)转换成rknn模型。

pt2rknn.py(参考配套例程)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
if __name__ == '__main__':
    model_path, platform, do_quant, output_path = parse_arg()

    # Create RKNN object
    rknn = RKNN(verbose=False)

    # Pre-process config
    print('--> Config model')
    rknn.config(mean_values=[[0, 0, 0]], std_values=[
                    [255, 255, 255]], target_platform=platform)
    print('done')

    # 加载Pytorch模型,如果是使用airockchip/ultralytics_yolov8的main分支,将导入onnx模型
    print('--> Loading model')
    #ret = rknn.load_onnx(model=model_path)
    ret = rknn.load_pytorch(model=model_path, input_size_list=[[1, 3, 640, 640]])
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=do_quant, dataset=DATASET_PATH)
    if ret != 0:
        print('Build model failed!')
        exit(ret)
    print('done')

    # Export rknn model
    print('--> Export rknn model')
    ret = rknn.export_rknn(output_path)
    if ret != 0:
        print('Export rknn model failed!')
        exit(ret)
    print('done')

    # 精度分析,,输出目录./snapshot
    #print('--> Accuracy analysis')
    #ret = rknn.accuracy_analysis(inputs=['./subset/000000052891.jpg'])
    #if ret != 0:
    #    print('Accuracy analysis failed!')
    #    exit(ret)
    #print('done')

    # Release
    rknn.release()

执行命令,导出rknn模型:

# 执行python3 pt2rknn.py可查看参数解释
(toolkit2_1.6) llh@YH-LONG:/xxx/yolov8$ python3 pt2rknn.py
Usage: python3 pt2rknn.py torchscript_model_path [platform] [dtype(optional)] [output_rknn_path(optional)]
    platform choose from [rk3562,rk3566,rk3568,rk3588]
    dtype choose from    [i8, fp]
Example: python pt2rknn.py ./yolov8n.onnx rk3588

# 指定模型和目标,转换出rknn模型,默认int8量化
(toolkit2_1.6) llh@YH-LONG:/xxx/yolov8$ python3 pt2rknn.py yolov8n_rknnopt.pt rk3588
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
--> Config model
done
--> Loading model
W load_onnx: It is recommended onnx opset 19, but your onnx model opset is 12!
W load_onnx: Model converted from pytorch, 'opset_version' should be set 19 in torch.onnx.export for successful convert!
Loading : 100%|████████████████████████████████████████████████| 136/136 [00:00<00:00, 33780.96it/s]
done
--> Building model
W build: found outlier value, this may affect quantization accuracy
const name                        abs_mean    abs_std     outlier value
model.0.conv.weight               2.44        2.47        -17.494
model.22.cv3.2.1.conv.weight      0.09        0.14        -10.215
model.22.cv3.1.1.conv.weight      0.12        0.19        13.361, 13.317
model.22.cv3.0.1.conv.weight      0.18        0.20        -11.216
GraphPreparing : 100%|██████████████████████████████████████████| 161/161 [00:00<00:00, 5311.08it/s]
Quantizating : 100%|██████████████████████████████████████████████| 161/161 [00:05<00:00, 29.91it/s]
W build: The default input dtype of 'images' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
#....省略....
done
--> Export rknn model
done

导出rknn模型后,我们使用toolkit2连接板卡,简单测试下模型,或者进行内存、性能评估等等。 可以直接使用 rknn_model_zoo仓库 提供的例程,我们这里参考例程编写下:

test.py(参考配套例程)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
if __name__ == '__main__':

    rknn = RKNN()
    rknn.list_devices()

    # 加载rknn模型
    rknn.load_rknn(path=rknn_model_path)

    # 设置运行环境,目标默认是rk3588
    ret = rknn.init_runtime(target=target, device_id=device_id)

    # 输入图像
    img_src = cv2.imread(img_path)
    src_shape = img_src.shape[:2]
    img, ratio, (dw, dh) = letter_box(img_src, IMG_SIZE)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    #img = cv2.resize(img_src, IMG_SIZE)

    # 推理运行
    print('--> Running model')
    outputs = rknn.inference(inputs=[img])
    print('done')

    # 后处理
    boxes, classes, scores = post_process(outputs)

    img_p = img_src.copy()
    if boxes is not None:
        draw(img_p, get_real_box(src_shape, boxes, dw, dh, ratio), scores, classes)
        cv2.imwrite("result.jpg", img_p)

执行命令前,确认PC通过usb或者网线连接,确认adb设备连接,在板卡上执行rknn_server或者restart_rknn.sh命令,开启rknn_server。 然后在PC ubutnu中执行命令(教程测试的板卡是lubancat-4):

(toolkit2_1.6) llh@YH-LONG:/xxx/yolov8$ python test.py
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
*************************
all device(s) with adb mode:
192.168.103.157:5555
*************************
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.6.0 (535b468 build@2023-12-11T09:05:46)
D RKNNAPI:   DRV: rknn_server: 1.5.0 (17e11b1 build: 2023-05-18 21:43:39)
D RKNNAPI:   DRV: rknnrt: 1.6.0 (9a7b5d24c@2023-12-13T17:31:11)
D RKNNAPI: ==============================================
D RKNNAPI: Input tensors:
D RKNNAPI:   index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, w_stride = 0, size_with_stride = 0, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
D RKNNAPI: Output tensors:
D RKNNAPI:   index=0, name=318, n_dims=4, dims=[1, 64, 80, 80], n_elems=409600, size=409600, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-58, scale=0.117659
D RKNNAPI:   index=1, name=onnx::ReduceSum_326, n_dims=4, dims=[1, 80, 80, 80], n_elems=512000, size=512000, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003104
D RKNNAPI:   index=2, name=331, n_dims=4, dims=[1, 1, 80, 80], n_elems=6400, size=6400, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003173
D RKNNAPI:   index=3, name=338, n_dims=4, dims=[1, 64, 40, 40], n_elems=102400, size=102400, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-45, scale=0.093747
D RKNNAPI:   index=4, name=onnx::ReduceSum_346, n_dims=4, dims=[1, 80, 40, 40], n_elems=128000, size=128000, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003594
D RKNNAPI:   index=5, name=350, n_dims=4, dims=[1, 1, 40, 40], n_elems=1600, size=1600, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003627
D RKNNAPI:   index=6, name=357, n_dims=4, dims=[1, 64, 20, 20], n_elems=25600, size=25600, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-34, scale=0.083036
D RKNNAPI:   index=7, name=onnx::ReduceSum_365, n_dims=4, dims=[1, 80, 20, 20], n_elems=32000, size=32000, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003874
D RKNNAPI:   index=8, name=369, n_dims=4, dims=[1, 1, 20, 20], n_elems=400, size=400, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
--> Running model
W inference: The 'data_format' is not set, and its default value is 'nhwc'!
done
person @ (211 241 282 506) 0.864
person @ (109 235 225 535) 0.860
person @ (477 226 560 522) 0.848
person @ (79 327 116 513) 0.306
bus  @ (96 136 549 449) 0.864

最后结果保存在当前目录下result.jpg。

更多的测试参考下 rknn_model_zoo仓库

11.2.5. 板卡上部署推理

1、使用 rknn_model_zoo仓库 提供的RKNN_C_demo, 在板端部署,测试的是lubancat-4板卡,Debian11系统,系统默认频率。

重要

如果部署自己训练的模型,按前面的方式转换成rknn模型,在部署时请注意类别的数量,请修改例程中的coco_80_labels_list.txt 文件和宏OBJ_CLASS_NUM等等

# 拉取rknn_model_zoo仓库源码,注意教程测试的rknn_model_zoo仓库版本是1.5.0
cat@lubancat:~/$ git clone https://github.com/airockchip/rknn_model_zoo.git
# 切换到~/rknn_model_zoo/libs/rklibs目录,然后拉取相关库,包括rknpu2 和 librga
cat@lubancat:~/$ cd ~/rknn_model_zoo/libs/rklibs
cat@lubancat:~/rknn_model_zoo/libs/rklibs$ git clone https://github.com/rockchip-linux/rknpu2
cat@lubancat:~/rknn_model_zoo/libs/rklibs$ git clone https://github.com/airockchip/librga

# 然后切换到~/rknn_model_zoomodels/CV/object_detection/yolo/RKNN_C_demo/RKNN_toolkit_2/rknn_yolo_demo目录
cat@lubancat:~/$ cd rknn_model_zoo/models/CV/object_detection/yolo/RKNN_C_demo/RKNN_toolkit_2/rknn_yolo_demo

# 运行build-linux_RK3588.sh脚本,编译工程(使用系统默认的编译器),最后生成的文件安装在build/目录下
cat@lubancat:~/rknn_model_zoo/models/CV/object_detection/yolo/RKNN_C_demo/RKNN_toolkit_2/rknn_yolo_demo$ ./build-linux_RK3588.sh
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/aarch64-linux-gnu-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/cat/rknn_model_zoo/models/CV/object_detection/yolo/RKNN_C_demo/RKNN_toolkit_2/rknn_yolo_demo/build/build_linux_aarch64
Scanning dependencies of target rknn_yolo_demo_zero_copy
Scanning dependencies of target rknn_yolo_demo
[ 16%] Building CXX object CMakeFiles/rknn_yolo_demo.dir/src/yolo.cc.o
[ 16%] Building CXX object CMakeFiles/rknn_yolo_demo_zero_copy.dir/src/zero_copy_demo.cc.o
[ 33%] Building CXX object CMakeFiles/rknn_yolo_demo.dir/src/main.cc.o
[ 33%] Building CXX object CMakeFiles/rknn_yolo_demo_zero_copy.dir/src/yolo.cc.o
[ 50%] Building CXX object CMakeFiles/rknn_yolo_demo.dir/home/cat/rknn_model_zoo/models/CV/object_detection/yolo/RKNN_C_demo/yolo_utils/resize_function.cc.o
[ 50%] Building CXX object CMakeFiles/rknn_yolo_demo_zero_copy.dir/home/cat/rknn_model_zoo/models/CV/object_detection/yolo/RKNN_C_demo/yolo_utils/resize_function.cc.o
[ 58%] Building CXX object CMakeFiles/rknn_yolo_demo_zero_copy.dir/home/cat/rknn_model_zoo/libs/common/rkdemo_utils/rknn_demo_utils.cc.o
[ 66%] Building CXX object CMakeFiles/rknn_yolo_demo.dir/home/cat/rknn_model_zoo/libs/common/rkdemo_utils/rknn_demo_utils.cc.o
[ 75%] Building CXX object CMakeFiles/rknn_yolo_demo.dir/home/cat/rknn_model_zoo/libs/rklibs/librga/samples/utils/allocator/dma_alloc.cpp.o
[ 83%] Building CXX object CMakeFiles/rknn_yolo_demo_zero_copy.dir/home/cat/rknn_model_zoo/libs/rklibs/librga/samples/utils/allocator/dma_alloc.cpp.o
[ 91%] Linking CXX executable rknn_yolo_demo
[100%] Linking CXX executable rknn_yolo_demo_zero_copy
[100%] Built target rknn_yolo_demo
[100%] Built target rknn_yolo_demo_zero_copy
[ 50%] Built target rknn_yolo_demo_zero_copy
[100%] Built target rknn_yolo_demo
Install the project...
-- Install configuration: ""
## 后面省略.......

执行命令进行模型推理:

# 切换到install/rk3588/Linux/rknn_yolo_demo目录下,复制前面转换出的yolov8n_rknnopt_RK3588_i8.rknn模型文件到目录下,
# 然后执行命令:
cat@lubancat:xxx/install/rk3588/Linux/rknn_yolo_demo$ ./rknn_yolo_demo yolov8 q8 ./yolov8n_rknnopt_RK3588_i8.rknn ./model/bus640.jpg
MODEL HYPERPARAM:
    Model type: yolov8, 4
    Anchor free
    Post process with: q8
img_height: 640, img_width: 640, img_channel: 3
RUN MODEL ONE TIME FOR TIME DETAIL
    inputs_set use: 0.712000 ms
    rknn_run use: 22.514000 ms
    outputs_get use: 2.493000 ms
loadLabelName ./model/coco_80_labels_list.txt
post_process load lable finish
    cpu_post_process use: 8.829000 ms
DRAWING OBJECT
    person @ (211 241 282 507) 0.872261
    person @ (109 235 224 536) 0.860884
    bus @ (99 135 551 455) 0.849507
    person @ (476 221 560 521) 0.845714
    person @ (79 326 115 513) 0.298742
    SAVE TO ./out.bmp
WITHOUT POST_PROCESS
    full run for 10 loops, average time: 28.171200 ms
WITH POST_PROCESS
    full run for 10 loops, average time: 46.156399 ms

2、也可以参考 rknn_model_zoo仓库 提供的例程, 编写一个C++部署例程(从配套例程获取),然后在板卡上编译运行。

# 获取配套例程(该例程目前只支持linux平台,rk356x和rk3588)
cat@lubancat:~/$ git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git
cat@lubancat:~/$ cd lubancat_ai_manual_code/example/yolov8/yolov8_det/cpp

# 编译例程,-t指定rk3588
cat@lubancat:~/lubancat_ai_manual_code/example/yolov8/yolov8_det/cpp$ ./build-linux.sh -t rk3588
===================================
TARGET_SOC=rk3588
INSTALL_DIR=/home/cat/lubancat_ai_manual_code/example/yolov8/yolov8_det/cpp/install/rk3588_linux
BUILD_DIR=/home/cat/lubancat_ai_manual_code/example/yolov8/yolov8_det/cpp/build/build_rk3588_linux
CC=aarch64-linux-gnu-gcc
CXX=aarch64-linux-gnu-g++
===================================
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
#...省略...
[100%] Linking CXX executable rknn_yolov8_demo
[100%] Built target rknn_yolov8_demo
[100%] Built target rknn_yolov8_demo
Install the project...
-- Install configuration: ""
#...省略...

切换到当前目录install/rk3588_linux下,然后执行命令:

cat@lubancat:~/lubancat_ai_manual_code/example/yolov8/yolov8_det/cpp/install/rk3588_linux$ ./rknn_yolov8_demo ./model/yolov8_rk3588.rknn ./model/bus.jpg
load lable ./model/coco_80_labels_list.txt
model input num: 1, output num: 9
input tensors:
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
output tensors:
index=0, name=318, n_dims=4, dims=[1, 64, 80, 80], n_elems=409600, size=409600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-58, scale=0.117659
index=1, name=onnx::ReduceSum_326, n_dims=4, dims=[1, 80, 80, 80], n_elems=512000, size=512000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003104
index=2, name=331, n_dims=4, dims=[1, 1, 80, 80], n_elems=6400, size=6400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003173
index=3, name=338, n_dims=4, dims=[1, 64, 40, 40], n_elems=102400, size=102400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-45, scale=0.093747
index=4, name=onnx::ReduceSum_346, n_dims=4, dims=[1, 80, 40, 40], n_elems=128000, size=128000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003594
index=5, name=350, n_dims=4, dims=[1, 1, 40, 40], n_elems=1600, size=1600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003627
index=6, name=357, n_dims=4, dims=[1, 64, 20, 20], n_elems=25600, size=25600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-34, scale=0.083036
index=7, name=onnx::ReduceSum_365, n_dims=4, dims=[1, 80, 20, 20], n_elems=32000, size=32000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003874
index=8, name=369, n_dims=4, dims=[1, 1, 20, 20], n_elems=400, size=400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
model is NHWC input fmt
model input height=640, width=640, channel=3
scale=1.000000 dst_box=(0 0 639 639) allow_slight_change=1 _left_offset=0 _top_offset=0 padding_w=0 padding_h=0
src width=640 height=640 fmt=0x1 virAddr=0x0x55762912c0 fd=0
dst width=640 height=640 fmt=0x1 virAddr=0x0x55763bd2f0 fd=0
src_box=(0 0 639 639)
dst_box=(0 0 639 639)
color=0x72
rga_api version 1.10.0_[2]
rknn_run
person @ (211 241 282 506) 0.864
bus @ (96 136 549 449) 0.864
person @ (109 235 225 535) 0.860
person @ (477 226 560 522) 0.848
person @ (79 327 116 513) 0.306
once rknn run and process use 20.539000 ms

结果保存在当前目录下的/out.jpg,结果显示:

broken

上面测试的C++部署例程使用Opencv加载和保存图片,librga处理图片,还有其他的图片加载和处理方式请参考rknn_model_zoo仓库。 如果上面程序运行提示RGA相关问题,请参考下 Rockchip_FAQ_RGA_CN.md

11.3. YOLOv8分割

11.3.1. 实例分割测试

YOLOv8支持实例分割(图像分割),我们可以直接使用yolo命令测试:

# yolo命令传输解释:
# 第一个参数是指任务[detect, segment, classify], 测试实例分割是segment,该参数是可选的;
# 第二个参数model,设置模型,该参数必须指定;
# 其他参数,source指定要预测的图片路径或者视频等等,imgsz指定图像尺寸等等,更多参数具体参考下:https://docs.ultralytics.com/usage/cfg/

# 简单图像分割测试,下面命令将从官方仓库获取预训练模型yolov8n-seg.pt,然后推理
(yolov8) llh@anhao: yolo segment predict model=yolov8n-seg.pt source='https://ultralytics.com/images/bus.jpg'
Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n-seg.pt to 'yolov8n-seg.pt'...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 6.73M/6.73M [00:00<00:00, 10.6MB/s]
Ultralytics YOLOv8.0.206 🚀 Python-3.8.18 torch-2.1.0+cu121 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 6144MiB)
YOLOv8n-seg summary (fused): 195 layers, 3404320 parameters, 0 gradients, 12.6 GFLOPs

Found https://ultralytics.com/images/bus.jpg locally at bus.jpg
image 1/1 /mnt/f/wsl_file/wsl_ai/yolov8/bus.jpg: 640x480 4 persons, 1 bus, 1 skateboard, 90.4ms
Speed: 3.6ms preprocess, 90.4ms inference, 313.3ms postprocess per image at shape (1, 3, 640, 480)
Results saved to runs/segment/predict
💡 Learn more at https://docs.ultralytics.com/modes/predict

结果保存在当前目录下runs/segment/predict下:

broken

11.3.2. 模型训练

测试训练coco128-seg.yaml,也可以自行制定数据集训练。

# 简单训练模型,基于预训练模型yolov8n-seg.pt,使用coco128
yolo segment train data=coco128-seg.yaml model=yolov8n-seg.pt epochs=100 imgsz=640

# 对刚训练的模型,进行简单评估
yolo segment val model=runs/segment/train/best.pt

11.3.3. 模型导出

使用 airockchip/ultralytics_yolov8 导出适合部署到rknpu上的模型,该模型在npu上获得更高的推理效率。

# 在当前目录下,拉取airockchip/ultralytics_yolov8,main分支
git clone https://github.com/airockchip/ultralytics_yolov8.git
cd ultralytics_yolov8

# 修改ultralytics/cfg/default.yaml中model文件路径,可以修改为前面训练出的模型,或者使用预训练模型yolov8n-seg.pt

    # Train settings -------------------------------------------------------------------------------------------------------
    model: ../runs/segment/train/weights/best.pt # (str, optional) path to model file, i.e. yolov8n.pt, yolov8n.yaml
    data:  # (str, optional) path to data file, i.e. coco128.yaml
    epochs: 100  # (int) number of epochs to train for

# 执行python ./ultralytics/engine/exporter.py导出模型
(yolov8) llh@anhao:~/ultralytics_yolov8$ export PYTHONPATH=./
(yolov8) llh@anhao:~/ultralytics_yolov8$ python ./ultralytics/engine/exporter.py
Ultralytics YOLOv8.0.151 🚀 Python-3.8.18 torch-2.1.0+cu121 CPU ()
YOLOv8n-seg summary (fused): 195 layers, 3404320 parameters, 0 gradients, 12.6 GFLOPs

PyTorch: starting from '../runs/segment/train4/weights/best.pt' with input shape (16, 3, 640, 640) BCHW and output shape(s) ((16, 64, 80, 80),
 (16, 80, 80, 80), (16, 1, 80, 80), (16, 32, 80, 80),(16, 64, 40, 40), (16, 80, 40, 40), (16, 1, 40, 40),(16, 32, 40, 40),
  (16, 64, 20, 20), (16, 80, 20, 20), (16, 1, 20, 20), (16, 32, 20, 20), (16, 32, 160, 160)) (6.7 MB)

RKNN: starting export with torch 2.1.0+cu121...

RKNN: feed ../runs/segment/train4/weights/best.onnx to RKNN-Toolkit or RKNN-Toolkit2 to generate RKNN model.
Refer https://github.com/airockchip/rknn_model_zoo/tree/main/models/CV/object_detection/yolo
RKNN: export success ✅ 0.5s, saved as '../runs/segment/train4/weights/best.onnx' (13.0 MB)

Export complete (4.9s)
Results saved to /mnt/f/wsl_file/wsl_ai/yolov8/runs/segment/train4/weights
Predict:         yolo predict task=segment model=../runs/segment/train4/weights/best.onnx imgsz=640
Validate:        yolo val task=segment model=../runs/segment/train4/weights/best.onnx imgsz=640 data=xxx
Visualize:       https://netron.app

# 模型保存在对应目录下(../runs/segment/train/weights/best.onnx),为了便于识别,可以重新命名为yolov8n-seg.onnx

导出的onnx模型,我们可以使用 netron 查看其网络结构。

11.3.4. 转换成rknn模型

导出的onnx模型,还需要通过toolkit2转换成rknn模型,这里简单编译下模型转换程序将onnx模型转换成rknn模型。

onnx2rknn.py(参考配套例程)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
if __name__ == '__main__':
    model_path, platform, do_quant, output_path = parse_arg()

    # Create RKNN object
    rknn = RKNN(verbose=False)

    # Pre-process config
    print('--> Config model')
    rknn.config(mean_values=[[0, 0, 0]], std_values=[
                    [255, 255, 255]], target_platform=platform)
    print('done')

    # Load model
    print('--> Loading model')
    ret = rknn.load_onnx(model=model_path)
    #ret = rknn.load_pytorch(model=model_path, input_size_list=[[1, 3, 640, 640]])
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=do_quant, dataset=DATASET_PATH)
    if ret != 0:
        print('Build model failed!')
        exit(ret)
    print('done')

    # Export rknn model
    print('--> Export rknn model')
    ret = rknn.export_rknn(output_path)
    if ret != 0:
        print('Export rknn model failed!')
        exit(ret)
    print('done')

    # 精度分析,,输出目录./snapshot
    #print('--> Accuracy analysis')
    #ret = rknn.accuracy_analysis(inputs=['./subset/000000052891.jpg'])
    #if ret != 0:
    #    print('Accuracy analysis failed!')
    #    exit(ret)
    #print('done')

    # Release
    rknn.release()

拉取配套例程,执行下面命令将onnx模型转换成rknn模型:

# 命令onnx2rknn.py 后面的参数是模型路径 输出目录 量化类型
(toolkit2_1.6) llh@YH-LONG:~/lubancat_ai_manual_code/yolov8/yolov8-seg$ python3 onnx2rknn.py
Usage: python3 onnx2rknn.py [onnx_model_path] [platform] [dtype(optional)] [output_rknn_path(optional)]
   platform choose from [rk3562,rk3566,rk3568,rk3588]
   dtype choose from    [i8, fp]
(toolkit2_1.6) llh@YH-LONG:~/lubancat_ai_manual_code/yolov8/yolov8-seg$ python3 onnx2rknn.py yolov8n_rknnopt.pt rk3588
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
--> Config model
done
--> Loading model
W load_onnx: It is recommended onnx opset 19, but your onnx model opset is 12!
W load_onnx: Model converted from pytorch, 'opset_version' should be set 19 in torch.onnx.export for successful convert!
Loading : 100%|████████████████████████████████████████████████| 162/162 [00:00<00:00, 37982.96it/s]
done
--> Building model
W build: found outlier value, this may affect quantization accuracy
const name                        abs_mean    abs_std     outlier value
model.22.cv3.1.1.conv.weight      0.12        0.18        -12.338
GraphPreparing : 100%|██████████████████████████████████████████| 183/183 [00:00<00:00, 5341.05it/s]
Quantizating : 100%|██████████████████████████████████████████████| 183/183 [00:05<00:00, 30.75it/s]
# ....省略....
done
--> Export rknn model
done

转换成rknn模型后,我们使用toolkit2连接板卡简单测试下模型,或者进行内存、性能评估等等。

test.py(参考配套例程)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
if __name__ == '__main__':

    rknn = RKNN()
    rknn.list_devices()

    # 加载rknn模型
    rknn.load_rknn(path=rknn_model_path)

    # 设置运行环境,目标默认是rk3588
    ret = rknn.init_runtime(target=target, device_id=device_id)

    # 输入图像
    img_src = cv2.imread(img_path)
    src_shape = img_src.shape[:2]
    img, ratio, (dw, dh) = letter_box(img_src, IMG_SIZE)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    #img = cv2.resize(img_src, IMG_SIZE)

通过usb或者网线连接板卡,确认adb连接成功,在板卡上执行rknn_server命令,开启rknn_server,在PC端运行test.py。

(toolkit2_1.6) llh@YH-LONG:~/lubancat_ai_manual_code/yolov8/yolov8-seg$ python3 test.py
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
*************************
all device(s) with adb mode:
192.168.103.152:5555
*************************
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.6.0 (535b468 build@2023-12-11T09:05:46)
D RKNNAPI:   DRV: rknn_server: 1.5.0 (17e11b1 build: 2023-05-18 21:43:39)
D RKNNAPI:   DRV: rknnrt: 1.6.0 (9a7b5d24c@2023-12-13T17:31:11)
D RKNNAPI: ==============================================
D RKNNAPI: Input tensors:
D RKNNAPI:   index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, w_stride = 0, size_with_stride = 0, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
D RKNNAPI: Output tensors:
D RKNNAPI:   index=0, name=375, n_dims=4, dims=[1, 64, 80, 80], n_elems=409600, size=409600, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-61, scale=0.115401
D RKNNAPI:   index=1, name=onnx::ReduceSum_383, n_dims=4, dims=[1, 80, 80, 80], n_elems=512000, size=512000, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003514
D RKNNAPI:   index=2, name=388, n_dims=4, dims=[1, 1, 80, 80], n_elems=6400, size=6400, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003540
D RKNNAPI:   index=3, name=354, n_dims=4, dims=[1, 32, 80, 80], n_elems=204800, size=204800, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=27, scale=0.019863
D RKNNAPI:   index=4, name=395, n_dims=4, dims=[1, 64, 40, 40], n_elems=102400, size=102400, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-15, scale=0.099555
D RKNNAPI:   index=5, name=onnx::ReduceSum_403, n_dims=4, dims=[1, 80, 40, 40], n_elems=128000, size=128000, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003555
D RKNNAPI:   index=6, name=407, n_dims=4, dims=[1, 1, 40, 40], n_elems=1600, size=1600, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003680
D RKNNAPI:   index=7, name=361, n_dims=4, dims=[1, 32, 40, 40], n_elems=51200, size=51200, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=30, scale=0.022367
D RKNNAPI:   index=8, name=414, n_dims=4, dims=[1, 64, 20, 20], n_elems=25600, size=25600, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-55, scale=0.074253
D RKNNAPI:   index=9, name=onnx::ReduceSum_422, n_dims=4, dims=[1, 80, 20, 20], n_elems=32000, size=32000, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003813
D RKNNAPI:   index=10, name=426, n_dims=4, dims=[1, 1, 20, 20], n_elems=400, size=400, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
D RKNNAPI:   index=11, name=368, n_dims=4, dims=[1, 32, 20, 20], n_elems=12800, size=12800, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=43, scale=0.019919
D RKNNAPI:   index=12, name=347, n_dims=4, dims=[1, 32, 160, 160], n_elems=819200, size=819200, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-119, scale=0.032336
--> Running model
W inference: The 'data_format' is not set, and its default value is 'nhwc'!
done
bus  @ (87 137 553 439) 0.911
person @ (108 236 228 537) 0.900
person @ (211 241 283 508) 0.873
person @ (477 232 559 519) 0.866
person @ (79 327 125 514) 0.540
tie @ (248 284 259 310) 0.274

结果保存在result.jpg中。

11.3.5. 部署到板卡

简单参考 rknn_model_zoo仓库 提供的部署例程,编写一个例程部署yolov8n-seg.rknn到板卡。

# 获取配套例程(该例程只支持linux平台,rk356x和rk3588)
cat@lubancat:~/$ git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git
cat@lubancat:~/$ cd lubancat_ai_manual_code/example/yolov8/yolov8_seg/cpp

# 编译例程,-t指定rk3588
cat@lubancat:~/lubancat_ai_manual_code/example/yolov8/yolov8_seg/cpp$ ./build-linux.sh -t rk3588
./build-linux.sh -t rk3588
===================================
TARGET_SOC=rk3588
INSTALL_DIR=/home/cat/lubancat_ai_manual_code/example/yolov8/yolov8_seg/cpp/install/rk3588_linux
BUILD_DIR=/home/cat/lubancat_ai_manual_code/example/yolov8/yolov8_seg/cpp/build/build_rk3588_linux
CC=aarch64-linux-gnu-gcc
CXX=aarch64-linux-gnu-g++
===================================
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
# 省略....
-- Build files have been written to: /home/cat/yolov8/rknn_model_zoo/examples/yolov8/cpp_seg/build/build_rk3588_linux
Scanning dependencies of target rknn_yolov8_seg_demo
[ 20%] Building CXX object CMakeFiles/rknn_yolov8_seg_demo.dir/rknpu2/yolov8_seg.cc.o
[100%] Built target rknn_yolov8_seg_demo
[100%] Built target rknn_yolov8_seg_demo
Install the project...
# 省略....

切换到当前目录install/rk3588_linux下,然后执行命令:

# ./rknn_yolov8_seg_demo <model_path> <image_path>
cat@lubancat:~/lubancat_ai_manual_code/example/yolov8/yolov8_seg/cpp/install/rk3588_linux$ ./rknn_yolov8_seg_demo ./model/yolov8_seg_rk3588.rknn ./model/bus.jpg
load lable ./model/coco_80_labels_list.txt
model input num: 1, output num: 13
input tensors:
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
output tensors:
index=0, name=375, n_dims=4, dims=[1, 64, 80, 80], n_elems=409600, size=409600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-61, scale=0.115401
index=1, name=onnx::ReduceSum_383, n_dims=4, dims=[1, 80, 80, 80], n_elems=512000, size=512000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003514
index=2, name=388, n_dims=4, dims=[1, 1, 80, 80], n_elems=6400, size=6400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003540
index=3, name=354, n_dims=4, dims=[1, 32, 80, 80], n_elems=204800, size=204800, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=27, scale=0.019863
index=4, name=395, n_dims=4, dims=[1, 64, 40, 40], n_elems=102400, size=102400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-15, scale=0.099555
index=5, name=onnx::ReduceSum_403, n_dims=4, dims=[1, 80, 40, 40], n_elems=128000, size=128000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003555
index=6, name=407, n_dims=4, dims=[1, 1, 40, 40], n_elems=1600, size=1600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003680
index=7, name=361, n_dims=4, dims=[1, 32, 40, 40], n_elems=51200, size=51200, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=30, scale=0.022367
index=8, name=414, n_dims=4, dims=[1, 64, 20, 20], n_elems=25600, size=25600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-55, scale=0.074253
index=9, name=onnx::ReduceSum_422, n_dims=4, dims=[1, 80, 20, 20], n_elems=32000, size=32000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003813
index=10, name=426, n_dims=4, dims=[1, 1, 20, 20], n_elems=400, size=400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
index=11, name=368, n_dims=4, dims=[1, 32, 20, 20], n_elems=12800, size=12800, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=43, scale=0.019919
index=12, name=347, n_dims=4, dims=[1, 32, 160, 160], n_elems=819200, size=819200, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-119, scale=0.032336
model is NHWC input fmt
model input height=640, width=640, channel=3
scale=1.000000 dst_box=(0 0 639 639) allow_slight_change=1 _left_offset=0 _top_offset=0 padding_w=0 padding_h=0
src width=640 height=640 fmt=0x1 virAddr=0x0x7fa2919040 fd=0
dst width=640 height=640 fmt=0x1 virAddr=0x0x559b7a1520 fd=0
src_box=(0 0 639 639)
dst_box=(0 0 639 639)
color=0x72
rga_api version 1.10.0_[2]
rknn_run
bus @ (87 137 553 439) 0.911
person @ (109 236 226 534) 0.900
person @ (211 241 283 508) 0.873
person @ (476 234 559 519) 0.866
person @ (79 327 125 514) 0.540
rknn run and process use 65.642000 ms

测试yolov8s-seg模型的结果保存在out.jpg,结果如下图所示。

broken

11.4. YOLOv8姿态估计

姿态估计(Pose estimation)是一项涉及识别图像中关键点位置的任务。 关键点可以表示对象的各个部分,如关节、地标或其他独特特征,关键点的位置通常表示为一组2D[x, y]或3D[x, y, visible]坐标。

YOLOv8-Pose人体姿态估计,会先检测出图像中所有的人体检测框,然后每个检测框进行人体姿态估计。 YOLOv8-Pose使用的数据集是 COCO Keypoints 2017 ,总共包含20万张图像,支持人体17个关键点。

YOLOv8-Pose提供了不同版本模型,适用于不同的环境:

  • YOLOv8n-pose: 轻量级的模型,适用于计算资源受限的环境。

  • YOLOv8s-pose: 相对轻量级但性能更好的模型,平衡了速度和准确度。

  • YOLOv8m-pose: 中等大小的模型,提供较高的准确度,适用于需要更准确结果的场景。

  • YOLOv8l-pose: 较大的模型,具有更高的准确度,但速度较慢,适用于高精度要求的应用。

  • YOLOv8x-pose: 精确最大的的模型,但速度最慢,适用于对准确度有极高要求的场景。

  • YOLOv8x-pose-p6: 支持高分辨率图像输入,更高的准确度和更强的检测能力,但需要更多的计算资源。

详细请查看:https://github.com/ultralytics/ultralytics

11.4.1. YOLOv8-pose简单测试

YOLOv8n-pose模型推理测试(使用python):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
from ultralytics import YOLO

# Load a model
model = YOLO("./yolov8n-pose.pt")

# Predict with the model
results = model("https://ultralytics.com/images/bus.jpg", save=True)

# 打印输出结果
for result in results:
    boxes = result.boxes  #Boxes对象
    keypoints = result.keypoints  # Keypoints对象
    print(boxes, keypoints)
# 获取yolov8n-pose.pt
(yolov8) llh@anhao: wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n-pose.pt
# 推理测试
(yolov8) llh@anhao: python yolov8n_pose.py
Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg'...
100%|███████████████████████████████████████████████████████████████| 134k/134k [00:00<00:00, 1.03MB/s]
image 1/1 /mnt/e/work/yolov8/bus.jpg: 640x480 4 persons, 53.6ms
Speed: 4.5ms preprocess, 53.6ms inference, 120.5ms postprocess per image at shape (1, 3, 640, 480)
Results saved to runs/pose/Predict

ultralytics.engine.results.Boxes object with attributes:

cls: tensor([0., 0., 0., 0.], device='cuda:0')
conf: tensor([0.8908, 0.8800, 0.8732, 0.4125], device='cuda:0')
data: tensor([[4.5000e+01, 3.9700e+02, 2.4200e+02, 9.0600e+02, 8.9082e-01, 0.0000e+00],
        [6.7000e+02, 3.9000e+02, 8.1000e+02, 8.7800e+02, 8.8004e-01, 0.0000e+00],
        [2.2400e+02, 4.0400e+02, 3.4400e+02, 8.5600e+02, 8.7320e-01, 0.0000e+00],
        [0.0000e+00, 4.8500e+02, 7.2000e+01, 8.9300e+02, 4.1249e-01, 0.0000e+00]], device='cuda:0')
id: None
is_track: False
orig_shape: (1080, 810)
shape: torch.Size([4, 6])
xywh: tensor([[143.5000, 651.5000, 197.0000, 509.0000],
        [740.0000, 634.0000, 140.0000, 488.0000],
        [284.0000, 630.0000, 120.0000, 452.0000],
        [ 36.0000, 689.0000,  72.0000, 408.0000]], device='cuda:0')
#省略.................

上面YOLOv8n-pose模型预测,最后还打印输出了boxes和keypoints。boxes显示有检测出的四个框的类别(cls)、四个框的置信度(conf)、 框坐标和置信度(data)、原始图像尺寸(orig_shape)、边界框的中心坐标和尺寸(xywh)等等。

结果图片保存在当前目录的runs/pose/Predict中,查看如下:

broken

11.4.2. YOLOv8-pose模型导出

使用 airockchip/ultralytics_yolov8 导出适合部署到rknpu上的模型,模型的改动:

  • 修改输出结构, 移除后处理结构(后处理结果对于量化不友好);

  • dfl结构在NPU处理上性能不佳,移至模型外部的后处理阶段,此操作大部分情况下可提升推理性能;

获取自行训练或者官方的yolov8-pose模型,根据模型路径调整./ultralytics/cfg/default.yaml中model路径,然后导出模型:

(yolov8) llh@anhao: cd ultralytics_yolov8
(yolov8) llh@anhao: export PYTHONPATH=./
(yolov8) llh@anhao: python ./ultralytics/engine/exporter.py
Ultralytics YOLOv8.2.82 🚀 Python-3.9.19 torch-2.4.1+cu121 CPU (Intel Core(TM) i7-14700F)
YOLOv8n-pose summary (fused): 187 layers, 3,289,964 parameters, 0 gradients, 9.2 GFLOPs

PyTorch: starting from '../yolov8n-pose.pt' with input shape (16, 3, 640, 640) BCHW and output shape(s) ((), (16, 17, 3, 8400)) (6.5 MB)

RKNN: starting export with torch 2.4.1+cu121...

RKNN: feed ../yolov8n-pose.onnx to RKNN-Toolkit or RKNN-Toolkit2 to generate RKNN model.
Refer https://github.com/airockchip/rknn_model_zoo/tree/main/models/CV/object_detection/yolo
RKNN: export success ✅ 0.4s, saved as '../yolov8n-pose.onnx' (12.6 MB)

Export complete (3.3s)
Results saved to /xxx/yolov8
Predict:         yolo predict task=pose model=../yolov8n-pose.onnx imgsz=640
Validate:        yolo val task=pose model=../yolov8n-pose.onnx imgsz=640 data=/usr/src/app/ultralytics/datasets/coco-pose.yaml
Visualize:       https://netron.app

# 测试使用模型为yolov8n-pose.pt,在对应目录下生成yolov8n-pose.onnx模型。

可以使用 netron 查看导出的onnx模型的网络结构:

broken

11.4.3. 导出rknn模型

导出的yolov8-pose模型,使用toolkit2将器转换成rknn模型。

# 获取配套例程的转换程序onnx2rknn.py

# python onnx2rknn.py <onnx_model> <TARGET_PLATFORM> <dtype(optional)> <output_rknn_path(optional)>
(toolkit2.2) llh@anhao:/xxx/yolov8$ python onnx2rknn.py ./yolov8n-pose.onnx rk3588 fp
I rknn-toolkit2 version: 2.2.0
--> Config model
done
--> Loading model
I Loading : 100%|██████████████████████████████████████████████| 144/144 [00:00<00:00, 67581.94it/s]
done
--> Building model
W build: The dataset='../datasets/COCO/coco_subset_20.txt' is ignored because do_quantization = False!
I OpFusing 0: 100%|██████████████████████████████████████████████| 100/100 [00:00<00:00, 717.31it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 438.48it/s]
I OpFusing 0 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 373.56it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 331.13it/s]
I OpFusing 2 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 177.53it/s]
I rknn building ...
I rknn buiding done.
done
--> Export rknn model
output_path: ./yolov8_pose.rknn
done

模型转换测试设置了目标是rk3588,如果是lbcat-0/1/2,需要设置目标为rk3566/rk3568,并且没有量化模型。

11.4.4. 部署测试

YOLOv8-pose模型板卡上部署,其推理和后处理大致步骤:

  • 预处理,对输入图像进行letterbox,具体通过rga或者cpu实现,图像归一化等操作在rknn模型中;

  • 模型推理;

  • 后处理,根据设置的置信度阈值BOX_THRESH,对检测框进行筛选,然后对符合要求的检测框进行解码,并进行nms,解码关键点,还原到原图尺度等。

板卡上编译测试例程,编译前系统要安装opencv:

# 鲁班猫板卡系统默认是debian或者ubuntu发行版,直接使用apt安装opencv,建议自行编译安装opencv
sudo apt update
sudo apt install libopencv-dev

# 获取教程配套例程,,或者从https://github.com/airockchip/rknn_model_zoo获取测试例程

# 其中-t指定目标设备,这里测试使用lubancat-4,设置rk3588,如果是lubancat-0/1/2就设置rk356x
# 如果系统内存大于4G的,设置参数-d
cat@lubancat:~/xxx$ cd example/yolov8/yolov8_pose
cat@lubancat:~/xxx/example/yolov8/yolov8_pose$ ./build-linux.sh -t rk3588 -d

重要

部署使用的librknnrt库的版本需要与模型转换的Toolkit2的版本一致。

编译输出程序在当前目录的install/rk3588_linux中,测试yolov8_pose_image例程:

cat@lubancat:~/xxx/install/$  ./yolov8_pose_image_demo ./model/yolov8_pose.rknn ./model/bus.jpg
load lable ./model/yolov8_pose_labels_list.txt
model input num: 1, output num: 4
input tensors:
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=2457600, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=385, n_dims=4, dims=[1, 65, 80, 80], n_elems=416000, size=832000, fmt=NCHW, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=1, name=400, n_dims=4, dims=[1, 65, 40, 40], n_elems=104000, size=208000, fmt=NCHW, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=2, name=415, n_dims=4, dims=[1, 65, 20, 20], n_elems=26000, size=52000, fmt=NCHW, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=3, name=537, n_dims=4, dims=[1, 17, 3, 8400], n_elems=428400, size=856800, fmt=NCHW, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
model is NHWC input fmt
model input height=640, width=640, channel=3
origin size=640x640 crop size=640x640
input image: 640 x 640, subsampling: 4:2:0, colorspace: YCbCr, orientation: 1
scale=1.000000 dst_box=(0 0 639 639) allow_slight_change=1 _left_offset=0 _top_offset=0 padding_w=0 padding_h=0
src width=640 height=640 fmt=0x1 virAddr=0x0x7fa71f2010 fd=0
dst width=640 height=640 fmt=0x1 virAddr=0x0x7fa70c6000 fd=8
src_box=(0 0 639 639)
dst_box=(0 0 639 639)
color=0x72
rga_api version 1.10.0_[2]
person @ (108 235 224 536) 0.888
person @ (211 241 284 507) 0.872
person @ (476 234 560 518) 0.861
write_image path: out.png width=640 height=640 channel=3 data=0x7fa71f2010

在当前目录的install/rk3588_linux中,还有一个yolov8_pose_videocapture_demo,可以打开摄像头或者视频文件,下面测试打开摄像头,并拍摄其他屏幕显示的画面。

# 执行例程请注意摄像头的设备号,支持的分辨率,编解码格式等等,具体请查看yolov8_pose_videocapture_demo.cc源文件
cat@lubancat:~/xxx/install/$ ./yolov8_pose_videocapture_demo ./model/yolov8_pose.rknn 0
load lable ./model/yolov8_pose_labels_list.txt
model input num: 1, output num: 4
input tensors:
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=2457600, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=385, n_dims=4, dims=[1, 65, 80, 80], n_elems=416000, size=832000, fmt=NCHW, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=1, name=400, n_dims=4, dims=[1, 65, 40, 40], n_elems=104000, size=208000, fmt=NCHW, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=2, name=415, n_dims=4, dims=[1, 65, 20, 20], n_elems=26000, size=52000, fmt=NCHW, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=3, name=537, n_dims=4, dims=[1, 17, 3, 8400], n_elems=428400, size=856800, fmt=NCHW, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
model is NHWC input fmt
model input height=640, width=640, channel=3
scale=1.000000 dst_box=(0 80 639 559) allow_slight_change=1 _left_offset=0 _top_offset=80 padding_w=0 padding_h=160
src width=640 height=480 fmt=0x1 virAddr=0x0x55923ea480 fd=0
dst width=640 height=640 fmt=0x1 virAddr=0x0x7f79666000 fd=26
src_box=(0 0 639 479)
dst_box=(0 80 639 559)
color=0x72
rga_api version 1.10.0_[2]
fill dst image (x y w h)=(0 0 640 640) with color=0x72727272
rknn_run
# 省略...................
broken

更多rknn模型例程请参考 rknn_model_zoo仓库

需要注意:测试usb摄像头,请确认摄像头的设备号,修改例程中摄像头支持的分辨率和MJPG格式等, 如果是mipi摄像头,需要opencv设置转换成rgb格式以及设置分辨率大小等等。