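3. Introduction to RKNN Toolkit2

The RKNN Toolkit2 development kit (Python API) runs on a PC (x86 or arm64) and provides model conversion, quantization, model inference, performance and memory evaluation, quantization accuracy analysis, and model encryption. For a fuller description of each feature, see the RKNN-Toolkit2 User Guide in the RKNN-Toolkit2 project files.

This chapter gives a brief walkthrough of using RKNN-Toolkit2 on a PC (Ubuntu) for model conversion, model inference, and performance evaluation.

Important

Test environment: LubanCat RK-series boards running Debian 10/11 images, with Ubuntu 20.04 on the PC. At the time of writing, RKNN-Toolkit2 was version 1.5.0; the NPU driver was 0.8.2 on LubanCat-0/1/2 boards and 0.8.8 on LubanCat-4 boards.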

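3.1. Installing Toolkit2

Toolkit2 can be installed with pip3, Python's package manager, or run in an environment built with Docker. The dependency files and Docker files are available from Rockchip's official RKNN-Toolkit2 project or from the cloud drive download (extraction code hslu), under 1-野火开源图书_教程文档\配套代码->嵌入式AI应用开发实战指南->AI教程相关源文件->rknn-toolkit2. (The RKNN-Toolkit2 files also include RKNN Toolkit Lite2.)

The steps below install Toolkit2 in a Python venv virtual environment. Alternatively, create the environment with Anaconda or miniconda as described in the earlier development environment chapter; miniconda is a lightweight version of Anaconda.

# Install the Python tools; Ubuntu 20.04 ships with Python 3.8.10 by default
# python3.8-venv is used for the virtual environment here; you can also install Anaconda/miniconda and manage environments with conda.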
sudo apt update
sudo apt-get install python3-dev python3-pip python3.8-venv gcc

# Install the required libraries and packages
sudo apt-get install libxslt1-dev zlib1g-dev libglib2.0 libsm6 \
libgl1-mesa-glx libprotobuf-dev gcc

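Install RKNN-Toolkit2:

# Create a working directory. Packages already installed on the Ubuntu 20.04 test machine may differ in version
# from those RKNN-Toolkit2 needs, so a Python venv is used here to isolate the environment and avoid conflicts.
mkdir project-Toolkit2 && cd project-Toolkit2
# toolkit2_1.5 is the environment name; change it if you like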
python3 -m venv toolkit2_1.5

# Activate the environment
source toolkit2_1.5/bin/activate

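# Clone the latest version from the official RKNN-Toolkit2 repository, or get it from the companion examples or the companion cloud drive (the tutorial was tested with toolkit2 1.5.0)
git clone https://github.com/rockchip-linux/rknn-toolkit2.git
# or: git clone https://github.com/airockchip/rknn-toolkit2.git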
#git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git
#cd lubancat_ai_manual_code/dev_env/rknn_toolkit2

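# Configure the pip index
pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/

# Install the dependencies listed in doc/requirements_cp38-1.5.0.txt
# (run these commands from the directory that contains doc/ and packages/)
pip3 install numpy
pip3 install -r doc/requirements_cp38-1.5.0.txt

# Install rknn_toolkit2
# Choose the whl file that matches your Python version and CPU architecture (the latest release supports arm64 and x86):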
pip3 install packages/rknn_toolkit2-1.5.0+1fa95b5c-cp38-cp38-linux_x86_64.whl
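
To decide which whl file in packages/ fits the current interpreter, the following minimal sketch builds the tag suffix that a matching wheel name should end with. It assumes the standard wheel naming scheme (python tag, ABI tag, platform tag); wheel_suffix is a hypothetical helper written for this tutorial, not part of RKNN-Toolkit2.

```python
import platform
import sys


def wheel_suffix(version_info=sys.version_info, machine=None):
    """Return the '<python>-<abi>-<platform>' suffix expected in the wheel name."""
    machine = machine or platform.machine()  # e.g. 'x86_64' or 'aarch64'
    major, minor = version_info[0], version_info[1]
    py_tag = f"cp{major}{minor}"
    # CPython 3.7 and older append an 'm' to the ABI tag (e.g. cp36m).
    abi_tag = py_tag + ("m" if (major, minor) < (3, 8) else "")
    return f"{py_tag}-{abi_tag}-linux_{machine}"


if __name__ == "__main__":
    print("Look for a wheel ending in:", wheel_suffix() + ".whl")
```

On the Python 3.8 / x86_64 setup used in this chapter, the suffix matches the file installed above.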

Verify the installation:

(toolkit2_1.5) llh@YH-LONG:~$ python3
Python 3.8.17 (default, Jul  5 2023, 21:04:15)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from rknn.api import RKNN
>>>

Type quit() or press Ctrl+D to exit.
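
The same check can be scripted. The sketch below reports whether the rknn package can be found and which version pip recorded, without importing the toolkit. The distribution name rknn-toolkit2 is an assumption derived from the wheel file name, and toolkit2_status is a hypothetical helper.

```python
import importlib.metadata
import importlib.util


def toolkit2_status(dist_name="rknn-toolkit2", module_name="rknn"):
    """Return (importable, version_or_None) without importing the package."""
    # find_spec on a top-level name returns None when the module is missing.
    importable = importlib.util.find_spec(module_name) is not None
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return importable, version


if __name__ == "__main__":
    ok, version = toolkit2_status()
    print("rknn importable:", ok, "| installed version:", version)
```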

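3.2. Using the RKNN Toolkit2 API

This section introduces the RKNN-Toolkit2 tool. It runs on the PC and provides a Python API that simplifies model deployment and execution. Its main functions are:

  • Model conversion: RKNN-Toolkit2 imports models from Caffe, TensorFlow, TensorFlow Lite, ONNX, PyTorch, MXNet and other frameworks and converts them to RKNN models. It can also import an existing RKNN model and load it on an NPU platform for inference.

  • Quantization: converts floating-point models to fixed-point models. The currently supported method is asymmetric quantization (asymmetric_quantized-8); hybrid quantization is also supported.

  • Model inference: runs an RKNN model on the PC by simulating the NPU and returns the inference results, or dispatches the RKNN model to a specified NPU device for inference and returns the results.

  • Performance and memory evaluation: with a board connected, dispatches the RKNN model to the specified NPU device, runs it, and evaluates its performance and memory usage on the real device.

  • Quantization accuracy analysis: reports the cosine distance between each layer's output after quantization and the corresponding floating-point output, which shows where quantization error arises and suggests how to improve the quantized model's accuracy.

  • Model encryption: encrypts the whole RKNN model at a specified encryption level. Because decryption is done in the NPU driver, an encrypted model is loaded exactly like an ordinary RKNN model and the NPU driver decrypts it automatically.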
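
To make the quantization and accuracy-analysis items more concrete, here is a minimal pure-Python sketch of 8-bit asymmetric (affine) quantization followed by a cosine-distance comparison between the original and the dequantized values. It illustrates the general technique only; the toolkit's own calibration and metric definitions may differ, and all function names are hypothetical.

```python
import math


def quant_params(values, qmin=-128, qmax=127):
    """Derive scale and zero point so the value range maps onto [qmin, qmax]."""
    # The range is extended to include 0 so that zero is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """x ~= (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


if __name__ == "__main__":
    floats = [0.0, 0.25, 0.5, 1.0]
    scale, zp = quant_params(floats)
    restored = dequantize(quantize(floats, scale, zp), scale, zp)
    print("scale:", scale, "zero point:", zp)
    print("cosine distance:", cosine_distance(floats, restored))
```

A small distance means quantization barely changed the direction of the output vector; a layer with a large distance is a candidate for hybrid quantization.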

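RKNN-Toolkit2 can work in two ways. On the PC alone, it runs the model in the simulator for inference, or performs operations such as model conversion. With a board connected, it transfers the RKNN model to the NPU device, runs it there, and retrieves the inference results, performance data and so on.

[Figure: simplified workflow for running a model with RKNN-Toolkit2]

The workflow in brief: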

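  • Create an RKNN object to initialize the RKNN environment.

  • Set the preprocessing parameters. When running on the PC through the simulator, call the config interface to set the model's preprocessing parameters. When running on the NPU of a connected board with an imported RKNN model, no configuration is needed.

  • Import the model. When running on the PC through the simulator, import the non-RKNN model with the matching interface, such as load_caffe or load_tensorflow. When running on the NPU of a connected board, import the RKNN model with load_rknn.

  • Build the RKNN model. When running on the PC through the simulator, call the build interface to build the RKNN model; you can then export the RKNN model or initialize the runtime for inference. This step is not needed when running on the NPU of a connected board.

  • Initialize the runtime environment. Model inference and performance evaluation both require calling init_runtime first, specifying where the model runs (the simulator or the hardware NPU of a connected board).

  • Once the runtime is initialized, call inference to run inference, eval_perf to evaluate model performance, or eval_memory to get the model's memory usage on the hardware platform (the model must be running on hardware).

  • Finally, call release to release the RKNN object.

A model imported with load_rknn cannot be used with accuracy_analysis. For accuracy analysis, import a non-RKNN model and enable quantization when building it.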
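
The board-side branch of the steps above (load an existing RKNN model, initialize the runtime on the board, evaluate, release) might look like the following minimal sketch. The function name evaluate_on_board and its default arguments are illustrative. The argument lists of init_runtime, eval_perf and eval_memory can differ between toolkit versions, so treat the calls below as assumptions and check the user manual of the version you installed.

```python
def evaluate_on_board(rknn_path, target="rk3566", device_id=None):
    """Load a .rknn file and evaluate it on the NPU of a connected board."""
    # Imported lazily so this file can be parsed without the toolkit installed.
    from rknn.api import RKNN

    rknn = RKNN()
    try:
        # An RKNN model is already converted: no config() or build() needed.
        if rknn.load_rknn(rknn_path) != 0:
            raise RuntimeError("load_rknn failed")
        # Assumption: perf_debug / eval_mem enable per-layer timing and
        # memory statistics, as in the 1.5.0 release used by this tutorial.
        ret = rknn.init_runtime(target=target, device_id=device_id,
                                perf_debug=True, eval_mem=True)
        if ret != 0:
            raise RuntimeError("init_runtime failed")
        rknn.eval_perf()     # prints the performance report
        rknn.eval_memory()   # prints memory usage on the board
    finally:
        # Always release the RKNN object, even when a step fails.
        rknn.release()
```

Calling evaluate_on_board('yolov5s.rknn', device_id='<id from adb devices>') requires a connected board with rknn_server running.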

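For detailed API descriptions, see the user manual in the doc/ directory of the RKNN-Toolkit2 project; for detailed usage examples, see examples/functions in the same project.

3.2.1. Model Conversion and Inference

This section runs a model through the simulator on the PC to demonstrate model conversion and inference.

You can use the tutorial's companion example, or take the one in examples/onnx/yolov5 of the RKNN-Toolkit2 project.

In an environment where RKNN-Toolkit2 is already set up, run the following commands:

# Change to the examples/conversion/yolov5 directory of the companion examples (the tutorial's companion example is used here)
(toolkit2_1.5) llh@YH-LONG:~$ cd lubancat_ai_manual_code/dev_env/rknn-toolkit2/examples/conversion/yolov5
# Run test.py to convert the model and run inference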
(toolkit2_1.5) llh@YH-LONG:~/examples/conversion/yolov5$ python3 test.py
W __init__: rknn-toolkit2 version: 1.5.0+1fa95b5c
--> Config model
W config: 'target_platform' is None, use rk3566 as default, Please set according to the actual platform!
done
--> Loading model
Loading : 100%|████████████████████████████████████████████████| 124/124 [00:00<00:00, 16907.57it/s]
done
--> Building model
Analysing : 100%|███████████████████████████████████████████████| 142/142 [00:00<00:00, 5817.91it/s]
Quantizating : 100%|█████████████████████████████████████████████| 142/142 [00:00<00:00, 292.11it/s]
W build: The default input dtype of 'images' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of '269' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of '271' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of '273' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
done
--> Export rknn model
done
--> Init runtime environment
W init_runtime: Target is None, use simulator!
done
--> Running model
W inference: The 'data_format' has not been set and defaults is nhwc!
Analysing : 100%|███████████████████████████████████████████████| 146/146 [00:00<00:00, 5202.22it/s]
Preparing : 100%|███████████████████████████████████████████████| 146/146 [00:00<00:00, 1102.45it/s]
W inference: The dims of input(ndarray) shape (640, 640, 3) is wrong, expect dims is 4! Try expand dims to (1, 640, 640, 3)!
done
class: person, score: 0.8838784694671631
box coordinate left,top,right,down: [209.6862335205078, 243.11955797672272, 285.13685607910156, 507.7035621404648]
class: person, score: 0.8669421076774597
box coordinate left,top,right,down: [477.6677174568176, 241.00597953796387, 561.1506419181824, 523.3208637237549]
class: person, score: 0.826057493686676
box coordinate left,top,right,down: [110.24830067157745, 235.76190769672394, 231.76915538311005, 536.1012514829636]
class: person, score: 0.32633310556411743
box coordinate left,top,right,down: [80.75779604911804, 354.98213291168213, 121.49669003486633, 516.5315389633179]
class: bus , score: 0.6890695095062256
box coordinate left,top,right,down: [91.16828817129135, 134.78936767578125, 556.8909769654274, 460.78936767578125]

In the main routine of test.py:

if __name__ == '__main__':

    # Create the RKNN object
    rknn = RKNN(verbose=True)

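    # Set the conversion parameters. To target a specific platform, add target_platform='rk3588'; the default is rk3566
    # mean_values are the per-channel input means; std_values are the per-channel values the input is divided by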
    print('--> Config model')
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]])
    print('done')

    # Import the ONNX model; model is the path to the ONNX file
    print('--> Loading model')
    ret = rknn.load_onnx(model=ONNX_MODEL)
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build the RKNN model. do_quantization=True enables quantization; dataset specifies the calibration data used for quantization
    print('--> Building model')
    ret = rknn.build(do_quantization=QUANTIZE_ON, dataset=DATASET)
    if ret != 0:
        print('Build model failed!')
        exit(ret)
    print('done')

    # Export the RKNN model; export_path is the output path, set to RKNN_MODEL here
    print('--> Export rknn model')
    ret = rknn.export_rknn(RKNN_MODEL)
    if ret != 0:
        print('Export rknn model failed!')
        exit(ret)
    print('done')

    # Call init_runtime to initialize the runtime environment; with no target it runs in the simulator on the PC
    print('--> Init runtime environment')
    ret = rknn.init_runtime()
    # ret = rknn.init_runtime('rk3566')
    if ret != 0:
        print('Init runtime environment failed!')
        exit(ret)
    print('done')

    # Prepare the input image for inference
    img = cv2.imread(IMG_PATH)
    # img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

    # Run inference (the simulator is used because no target was set), then post-process the outputs and save the result
    print('--> Running model')
    outputs = rknn.inference(inputs=[img])
    np.save('./onnx_yolov5_0.npy', outputs[0])
    np.save('./onnx_yolov5_1.npy', outputs[1])
    np.save('./onnx_yolov5_2.npy', outputs[2])
    print('done')

    # Remainder omitted...

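This example configures and imports the ONNX model (yolov5s.onnx), quantizes it, and exports an RKNN model (yolov5s.rknn). It then initializes the runtime, runs inference in the simulator, post-processes the output data, and produces the result image.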
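
The dataset argument passed to build points to the calibration data used for quantization. In the official examples this is a plain text file with one image path per line; assuming that format, the sketch below generates such a file from a directory of images. write_dataset_file and its parameters are hypothetical names used for illustration.

```python
from pathlib import Path


def write_dataset_file(image_dir, out_path="dataset.txt",
                       exts=(".jpg", ".jpeg", ".png")):
    """Write one image path per line and return the number of images listed."""
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.is_file() and p.suffix.lower() in exts)
    Path(out_path).write_text("".join(f"{p}\n" for p in images))
    return len(images)


if __name__ == "__main__":
    count = write_dataset_file(".")
    print(f"listed {count} calibration image(s) in dataset.txt")
```

Calibration images should resemble the inputs the model will see after deployment.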

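Note that test.py targets rk3566 by default, so the exported RKNN model is meant for deployment on rk3566. To target another platform, pass target_platform when calling the config interface, for example:

# Set the conversion parameters; target_platform='rk3588' selects the platform (the default is rk3566)
# mean_values are the per-channel input means; std_values are the per-channel values the input is divided by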
print('--> Config model')
rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]], target_platform='rk3588')
print('done')
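
As a small worked example of what mean_values and std_values do, the sketch below applies the usual per-channel normalization (value - mean) / std to a single RGB pixel. This formula is the common convention and is assumed here; normalize is a hypothetical helper, not a toolkit function.

```python
def normalize(pixel, mean, std):
    """Apply per-channel (value - mean) / std to one RGB pixel."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


if __name__ == "__main__":
    # With mean 0 and std 255, 8-bit pixel values 0..255 map to 0.0..1.0.
    print(normalize([0, 128, 255], [0, 0, 0], [255, 255, 255]))
```

With the settings used in test.py, 8-bit pixel values are therefore scaled into the range 0 to 1 before they reach the network.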

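After a successful run, yolov5s.rknn is generated in the current directory; to deploy it on a board, see the earlier RKNN API chapter or the later Toolkit Lite2 chapter. The simulated run on the PC also writes the result image (result.jpg) to the current directory:

[Figure: detection result image, result.jpg]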

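3.2.2. Performance and Memory Evaluation

This section uses RKNN-Toolkit2 to run a model on the NPU of a connected board for performance and memory evaluation, or for inference.

RKNN Toolkit2 runs on the PC and connects to the NPU device through the PC's USB port. It transfers the RKNN model to the NPU device, runs it there, and retrieves the inference results, performance data and so on.

Before starting the example, complete the following steps:

1.1. Connect the board. The test here connects to the board with adb over the network. First start adbd on the board (it is available from the companion examples):

# Run on the board: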
cat@lubancat:~$ ./adbd &
[1] 41260
cat@lubancat:~$ install_listener('tcp:5037','*smartsocket*')
using port=5555

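Install adb on the Ubuntu 20.04 PC with sudo apt install -y adb, then start the adb server. With adbd running on the board and the board and PC on the same LAN (able to ping each other), run the following commands:

# Install adb on the PC
(toolkit2_1.5) llh@YH-LONG:~$ sudo apt install -y adb

# Start the adb server
(toolkit2_1.5) llh@YH-LONG:~$ adb start-server

# Connect to the board; use your board's IP address, the default port is 5555
(toolkit2_1.5) llh@YH-LONG:~$ adb connect 192.168.103.131
already connected to 192.168.103.131:5555

# List the connected devices; the entry shown on a successful connection is the device_id used when initializing the RKNN runtime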
(toolkit2_1.5) llh@YH-LONG:~$ adb devices
List of devices attached
192.168.103.131:5555    device

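1.2. A wired adb connection also works. First configure the OTG port for adb debugging through usb gadget, then connect the board to the computer with a Type-C cable (when using a virtual machine, make sure the device is attached to the VM). Then check the device ID with the following command:

# Get the device ID, which is cee99073e6f57988 here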
(toolkit2_1.5) lubancat@lubancat-vm:~$ adb devices
List of devices attached
* daemon not running; starting now at tcp:5037
* daemon started successfully
cee99073e6f57988       device
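
When the device ID is needed inside a script, the adb devices output can be parsed instead of copied by hand. The following minimal sketch keeps only entries whose state is device and ignores the header and the daemon status lines; parse_adb_devices and list_device_ids are hypothetical helper names.

```python
import subprocess


def parse_adb_devices(text):
    """Return the serials whose state is 'device' from `adb devices` output."""
    serials = []
    for line in text.splitlines():
        parts = line.split()
        # Skips the header, blank lines, '* daemon ...' status lines and
        # devices in other states such as 'offline' or 'unauthorized'.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_device_ids():
    """Run `adb devices` on the PC and return the usable device IDs."""
    result = subprocess.run(["adb", "devices"], capture_output=True,
                            text=True, check=True)
    return parse_adb_devices(result.stdout)
```

Each returned string, whether a serial number or an IP:port pair, is the kind of value shown in the listings above and can be supplied as the device_id when initializing the runtime.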

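2. Start the rknn_server service on the board. It is a background proxy service that receives the protocol data sent from the PC over USB, calls the corresponding board-side runtime interface, and returns the result to the PC. The LubanCat system firmware already includes rknn_server by default (it can also be obtained from the RKNPU2 project files or from the companion examples).

On Linux, librknnrt.so is also required (LubanCat system images already include it; you may want to update it). Get it from the runtime/XXXX/Linux/rknn_server directory of RKNPU2, or from the tutorial's companion examples.

# On the board, run restart_rknn.sh or run rknn_server directly
# To see the runtime log, set the environment variable: export RKNN_LOG_LEVEL=4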
cat@lubancat:~$ rknn_server
start rknn server, version:1.5.0 (17e11b1 build: 2023-05-18 21:43:39)
I NPUTransfer: Starting NPU Transfer Server, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:51)

Next, get the companion example and run it (the test uses the RKNN model exported in the previous section):

(toolkit2_1.5) llh@YH-LONG:~$ cd lubancat_ai_manual_code/dev_env/rknn-toolkit2/examples/evaluation/yolov5
(toolkit2_1.5) llh@YH-LONG:~/rknn-toolkit2/examples/evaluation/yolov5$ python3 test.py
W __init__: rknn-toolkit2 version: 1.5.0+1fa95b5c
--> Loading model
done
--> Init runtime environment
W init_runtime: Flag perf_debug has been set, it will affect the performance of inference!
W init_runtime: Flag eval_mem has been set, it will affect the performance of inference!
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.5.0 (17e11b1 build: 2023-05-18 21:43:54)
D RKNNAPI:   DRV: rknn_server: 1.5.0 (17e11b1 build: 2023-05-18 21:43:39)
D RKNNAPI:   DRV: rknnrt: 1.5.0 (e6fe0c678@2023-05-25T08:09:20)
D RKNNAPI: ==============================================
D RKNNAPI: Input tensors:
D RKNNAPI:   index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, w_stride = 0, size_with_stride = 0, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
D RKNNAPI: Output tensors:
D RKNNAPI:   index=0, name=269, n_dims=4, dims=[1, 255, 80, 80], n_elems=1632000, size=1632000, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=83, scale=0.093136
D RKNNAPI:   index=1, name=271, n_dims=4, dims=[1, 255, 40, 40], n_elems=408000, size=408000, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=48, scale=0.089854
D RKNNAPI:   index=2, name=273, n_dims=4, dims=[1, 255, 20, 20], n_elems=102000, size=102000, w_stride = 0, size_with_stride = 0, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=46, scale=0.078630
done
--> eval_perf
W eval_perf: The result of eval_perf is also saved to eval_perf.csv!
===================================================================================================================
                            Performance
        #### The performance result is just for debugging, ####
        #### may worse than actual performance!            ####
===================================================================================================================
ID   OpType           DataType Target InputShape                                   OutputShape            DDR Cycles     NPU Cycles     Total Cycles   Time(us)       MacUsage(%)    WorkLoad(0/1/2)-ImproveTherical        Task Number    Task Size      Regcmd Size    RW(KB)         FullName
1    InputOperator    UINT8    CPU    \                                            (1,3,640,640)          0              0              0              14             \              0.0%/0.0%/0.0% - Up:0.0%               0              0              0              4598.19        InputOperator:images
2    ConvRelu         UINT8    NPU    (1,3,640,640),(32,3,6,6),(32)                (1,32,320,320)         851599         345600         851599         4041           8.55           100.0%/0.0%/0.0% - Up:0.0%             68             67456          69632          4918.31        Conv:Conv_0
3    ConvRelu         INT8     NPU    (1,32,320,320),(64,32,3,3),(64)              (1,64,160,160)         416796         460800         460800         924            49.87          100.0%/0.0%/0.0% - Up:0.0%             42             41664          43008          2407.16        Conv:Conv_2
4    ConvRelu         INT8     NPU    (1,64,160,160),(32,64,1,1),(32)              (1,32,160,160)         277747         51200          277747         433            11.82          100.0%/0.0%/0.0% - Up:0.0%             17             16864          17408          1604.09        Conv:Conv_4
5    ConvRelu         INT8     NPU    (1,32,160,160),(32,32,1,1),(32)              (1,32,160,160)         421114         25600          421114         390            6.56           100.0%/0.0%/0.0% - Up:0.0%             10             9920           10240          2432.09        Conv:Conv_6
6    ConvReluAdd      INT8     NPU    (1,32,160,160),(32,32,3,3),(32),(1,32,160,160) (1,32,160,160)         416796         230400         416796         474            48.61          100.0%/0.0%/0.0% - Up:0.0%             10             9920           10240          2407.16        Conv:Conv_8
7    ConvRelu         INT8     NPU    (1,64,160,160),(32,64,1,1),(32)              (1,32,160,160)         554287         51200          554287         430            11.91          100.0%/0.0%/0.0% - Up:0.0%             22             17904          18688          3201.22        Conv:Conv_11
8    Concat           INT8     NPU    (1,32,160,160),(1,32,160,160)                (1,64,160,160)         555661         0              555661         638            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           3209.16        Concat:Concat_13
9    ConvRelu         INT8     NPU    (1,64,160,160),(64,64,1,1),(64)              (1,64,160,160)         439662         102400         439662         645            15.88          100.0%/0.0%/0.0% - Up:0.0%             21             20832          21504          2539.22        Conv:Conv_14
10   ConvRelu         INT8     NPU    (1,64,160,160),(128,64,3,3),(128)            (1,128,80,80)          209700         460800         460800         817            56.40          100.0%/0.0%/0.0% - Up:0.0%             30             29760          30720          1211.09        Conv:Conv_16
11   ConvRelu         INT8     NPU    (1,128,80,80),(64,128,1,1),(64)              (1,64,80,80)           139569         51200          139569         277            18.48          100.0%/0.0%/0.0% - Up:0.0%             12             11904          12288          806.06         Conv:Conv_18
12   ConvRelu         INT8     NPU    (1,64,80,80),(64,64,1,1),(64)                (1,64,80,80)           216101         25600          216101         213            12.02          100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           1248.06        Conv:Conv_20
13   ConvReluAdd      INT8     NPU    (1,64,80,80),(64,64,3,3),(64),(1,64,80,80)   (1,64,80,80)           139569         230400         230400         347            66.40          100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           806.06         Conv:Conv_22
14   ConvRelu         INT8     NPU    (1,64,80,80),(64,64,1,1),(64)                (1,64,80,80)           216101         25600          216101         214            11.96          100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           1248.06        Conv:Conv_25
15   ConvReluAdd      INT8     NPU    (1,64,80,80),(64,64,3,3),(64),(1,64,80,80)   (1,64,80,80)           209700         230400         230400         349            66.02          100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           1211.09        Conv:Conv_27
16   ConvRelu         INT8     NPU    (1,128,80,80),(64,128,1,1),(64)              (1,64,80,80)           277249         51200          277249         276            18.55          100.0%/0.0%/0.0% - Up:0.0%             17             12944          13568          1601.22        Conv:Conv_30
17   Concat           INT8     NPU    (1,64,80,80),(1,64,80,80)                    (1,128,80,80)          280344         0              280344         372            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           1619.09        Concat:Concat_32
18   ConvRelu         INT8     NPU    (1,128,80,80),(128,128,1,1),(128)            (1,128,80,80)          275014         102400         275014         365            28.05          100.0%/0.0%/0.0% - Up:0.0%             12             11904          12288          1588.31        Conv:Conv_33
19   ConvRelu         INT8     NPU    (1,128,80,80),(256,128,3,3),(256)            (1,256,40,40)          109787         460800         460800         684            67.37          100.0%/0.0%/0.0% - Up:0.0%             42             41664          43008          634.06         Conv:Conv_35
20   ConvRelu         INT8     NPU    (1,256,40,40),(128,256,1,1),(128)            (1,128,40,40)          72209          51200          72209          182            28.13          100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           417.03         Conv:Conv_37
21   ConvRelu         INT8     NPU    (1,128,40,40),(128,128,1,1),(128)            (1,128,40,40)          129002         25600          129002         136            18.82          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           745.03         Conv:Conv_39
22   ConvReluAdd      INT8     NPU    (1,128,40,40),(128,128,3,3),(128),(1,128,40,40) (1,128,40,40)          72209          230400         230400         330            69.82          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           417.03         Conv:Conv_41
23   ConvRelu         INT8     NPU    (1,128,40,40),(128,128,1,1),(128)            (1,128,40,40)          129002         25600          129002         133            19.25          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           745.03         Conv:Conv_44
24   ConvReluAdd      INT8     NPU    (1,128,40,40),(128,128,3,3),(128),(1,128,40,40) (1,128,40,40)          72209          230400         230400         352            65.45          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           417.03         Conv:Conv_46
25   ConvRelu         INT8     NPU    (1,128,40,40),(128,128,1,1),(128)            (1,128,40,40)          129002         25600          129002         214            11.96          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           745.03         Conv:Conv_49
26   ConvReluAdd      INT8     NPU    (1,128,40,40),(128,128,3,3),(128),(1,128,40,40) (1,128,40,40)          109787         230400         230400         334            68.98          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           634.06         Conv:Conv_51
27   ConvRelu         INT8     NPU    (1,256,40,40),(128,256,1,1),(128)            (1,128,40,40)          138730         51200          138730         179            28.60          100.0%/0.0%/0.0% - Up:0.0%             14             9968           10496          801.22         Conv:Conv_54
28   Concat           INT8     NPU    (1,128,40,40),(1,128,40,40)                  (1,256,40,40)          149958         0              149958         214            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           866.06         Concat:Concat_56
29   ConvRelu         INT8     NPU    (1,256,40,40),(256,256,1,1),(256)            (1,256,40,40)          504912         102400         504912         235            43.57          100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           2916.06        Conv:Conv_57
30   ConvRelu         INT8     NPU    (1,256,40,40),(512,256,3,3),(512)            (1,512,20,20)          74287          460800         460800         747            61.69          100.0%/0.0%/0.0% - Up:0.0%             12             11904          12288          429.03         Conv:Conv_59
31   ConvRelu         INT8     NPU    (1,512,20,20),(256,512,1,1),(256)            (1,256,20,20)          45890          51200          51200          150            34.13          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           265.03         Conv:Conv_61
32   ConvRelu         INT8     NPU    (1,256,20,20),(256,256,1,1),(256)            (1,256,20,20)          151857         25600          151857         101            25.35          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           877.03         Conv:Conv_63
33   ConvReluAdd      INT8     NPU    (1,256,20,20),(256,256,3,3),(256),(1,256,20,20) (1,256,20,20)          74287          230400         230400         315            73.14          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           429.03         Conv:Conv_65
34   ConvRelu         INT8     NPU    (1,512,20,20),(256,512,1,1),(256)            (1,256,20,20)          69471          51200          69471          148            34.59          100.0%/0.0%/0.0% - Up:0.0%             11             6992           7424           401.22         Conv:Conv_68
35   Concat           INT8     NPU    (1,256,20,20),(1,256,20,20)                  (1,512,20,20)          113764         0              113764         123            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           657.03         Concat:Concat_70
36   ConvRelu         INT8     NPU    (1,512,20,20),(512,512,1,1),(512)            (1,512,20,20)          74287          102400         102400         206            49.71          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           429.03         Conv:Conv_71
37   ConvRelu         INT8     NPU    (1,512,20,20),(256,512,1,1),(256)            (1,256,20,20)          58952          51200          58952          144            35.56          100.0%/0.0%/0.0% - Up:0.0%             11             6992           7424           340.47         Conv:Conv_73
38   MaxPool          INT8     NPU    (1,256,20,20)                                (1,256,20,20)          58952          0              58952          86             \              100.0%/0.0%/0.0% - Up:0.0%             2              416            512            340.47         MaxPool:MaxPool_75
39   MaxPool          INT8     NPU    (1,256,20,20)                                (1,256,20,20)          58952          0              58952          83             \              100.0%/0.0%/0.0% - Up:0.0%             2              416            512            340.47         MaxPool:MaxPool_76
40   MaxPool          INT8     NPU    (1,256,20,20)                                (1,256,20,20)          138941         0              138941         84             \              100.0%/0.0%/0.0% - Up:0.0%             2              416            512            802.44         MaxPool:MaxPool_77
41   Concat           INT8     NPU    (1,256,20,20),(1,256,20,20),(1,256,20,20),(1,256,20,20) (1,1024,20,20)         281551         0              281551         203            \              100.0%/0.0%/0.0% - Up:0.0%             29             14288          16640          1626.06        Concat:Concat_78
42   ConvRelu         INT8     NPU    (1,1024,20,20),(512,1024,1,1),(512)          (1,512,20,20)          74287          204800         204800         365            56.11          100.0%/0.0%/0.0% - Up:0.0%             12             11904          12288          429.03         Conv:Conv_79
43   ConvRelu         INT8     NPU    (1,512,20,20),(256,512,1,1),(256)            (1,256,20,20)          87446          51200          87446          147            34.83          100.0%/0.0%/0.0% - Up:0.0%             11             6992           7424           505.03         Conv:Conv_81
44   ConvTranspose    INT8     NPU    (1,256,20,20),(256,1,4,4),(256)              (1,256,40,40)          277249         6400           277249         170            3.76           100.0%/0.0%/0.0% - Up:0.0%             1              992            1024           1601.22        ConvTranspose:Resize_84_2deconv
45   Concat           INT8     NPU    (1,256,40,40),(1,256,40,40)                  (1,512,40,40)          184766         0              184766         458            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           1067.09        Concat:Concat_85
46   ConvRelu         INT8     NPU    (1,512,40,40),(128,512,1,1),(128)            (1,128,40,40)          72209          102400         102400         362            28.29          100.0%/0.0%/0.0% - Up:0.0%             12             11904          12288          417.03         Conv:Conv_86
47   ConvRelu         INT8     NPU    (1,128,40,40),(128,128,1,1),(128)            (1,128,40,40)          94372          25600          94372          201            12.74          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           545.03         Conv:Conv_88
48   ConvRelu         INT8     NPU    (1,128,40,40),(128,128,3,3),(128)            (1,128,40,40)          184766         230400         230400         392            58.78          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           1067.09        Conv:Conv_90
49   ConvRelu         INT8     NPU    (1,512,40,40),(128,512,1,1),(128)            (1,128,40,40)          138730         102400         138730         356            28.76          100.0%/0.0%/0.0% - Up:0.0%             17             12944          13568          801.22         Conv:Conv_92
50   Concat           INT8     NPU    (1,128,40,40),(1,128,40,40)                  (1,256,40,40)          149958         0              149958         277            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           866.06         Concat:Concat_94
51   ConvRelu         INT8     NPU    (1,256,40,40),(256,256,1,1),(256)            (1,256,40,40)          109787         102400         109787         228            44.91          100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           634.06         Conv:Conv_95
52   ConvRelu         INT8     NPU    (1,256,40,40),(128,256,1,1),(128)            (1,128,40,40)          173674         51200          173674         168            30.48          100.0%/0.0%/0.0% - Up:0.0%             14             9968           10496          1003.03        Conv:Conv_97
53   ConvTranspose    INT8     NPU    (1,128,40,40),(128,1,4,4),(128)              (1,128,80,80)          554287         12800          554287         405            3.16           100.0%/0.0%/0.0% - Up:0.0%             1              992            1024           3201.22        ConvTranspose:Resize_100_2deconv
54   Concat           INT8     NPU    (1,128,80,80),(1,128,80,80)                  (1,256,80,80)          349961         0              349961         739            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           2021.16        Concat:Concat_101
55   ConvRelu         INT8     NPU    (1,256,80,80),(64,256,1,1),(64)              (1,64,80,80)           139569         102400         139569         477            21.47          100.0%/0.0%/0.0% - Up:0.0%             21             20832          21504          806.06         Conv:Conv_102
56   ConvRelu         INT8     NPU    (1,64,80,80),(64,64,1,1),(64)                (1,64,80,80)           146841         25600          146841         282            9.08           100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           848.06         Conv:Conv_104
57   ConvRelu         INT8     NPU    (1,64,80,80),(64,64,3,3),(64)                (1,64,80,80)           349961         230400         349961         406            56.75          100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           2021.16        Conv:Conv_106
58   ConvRelu         INT8     NPU    (1,256,80,80),(64,256,1,1),(64)              (1,64,80,80)           277249         102400         277249         409            25.04          100.0%/0.0%/0.0% - Up:0.0%             26             21872          22784          1601.22        Conv:Conv_108
59   Concat           INT8     NPU    (1,64,80,80),(1,64,80,80)                    (1,128,80,80)          280344         0              280344         431            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           1619.09        Concat:Concat_110
60   ConvRelu         INT8     NPU    (1,128,80,80),(128,128,1,1),(128)            (1,128,80,80)          421633         102400         421633         430            23.81          100.0%/0.0%/0.0% - Up:0.0%             12             11904          12288          2435.09        Conv:Conv_111
61   Conv             INT8     NPU    (1,128,80,80),(255,128,1,1),(255)            (1,255,80,80)          421444         204000         421444         645            31.63          100.0%/0.0%/0.0% - Up:0.0%             17             12944          13568          2434.00        Conv:Conv_141
62   OutputOperator   INT8     CPU    (1,255,80,80)                                \                      0              0              0              148            \              0.0%/0.0%/0.0% - Up:0.0%               0              0              0              1178.12        OutputOperator:269
63   ConvRelu         INT8     NPU    (1,128,80,80),(128,128,3,3),(128)            (1,128,40,40)          138730         230400         230400         486            47.41          100.0%/0.0%/0.0% - Up:0.0%             23             18896          19712          801.22         Conv:Conv_113
64   Concat           INT8     NPU    (1,128,40,40),(1,128,40,40)                  (1,256,40,40)          109787         0              109787         208            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           634.06         Concat:Concat_115
65   ConvRelu         INT8     NPU    (1,256,40,40),(128,256,1,1),(128)            (1,128,40,40)          72209          51200          72209          165            31.03          100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           417.03         Conv:Conv_116
66   ConvRelu         INT8     NPU    (1,128,40,40),(128,128,1,1),(128)            (1,128,40,40)          94372          25600          94372          123            20.81          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           545.03         Conv:Conv_118
67   ConvRelu         INT8     NPU    (1,128,40,40),(128,128,3,3),(128)            (1,128,40,40)          109787         230400         230400         315            73.14          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           634.06         Conv:Conv_120
68   ConvRelu         INT8     NPU    (1,256,40,40),(128,256,1,1),(128)            (1,128,40,40)          138730         51200          138730         167            30.66          100.0%/0.0%/0.0% - Up:0.0%             14             9968           10496          801.22         Conv:Conv_122
69   Concat           INT8     NPU    (1,128,40,40),(1,128,40,40)                  (1,256,40,40)          149958         0              149958         202            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           866.06         Concat:Concat_124
70   ConvRelu         INT8     NPU    (1,256,40,40),(256,256,1,1),(256)            (1,256,40,40)          149958         102400         149958         222            46.13          100.0%/0.0%/0.0% - Up:0.0%             9              8928           9216           866.06         Conv:Conv_125
71   Conv             INT8     NPU    (1,256,40,40),(255,256,1,1),(255)            (1,255,40,40)          149947         102000         149947         227            44.93          100.0%/0.0%/0.0% - Up:0.0%             14             9968           10496          866.00         Conv:Conv_143
72   OutputOperator   INT8     CPU    (1,255,40,40)                                \                      0              0              0              45             \              0.0%/0.0%/0.0% - Up:0.0%               0              0              0              1664.06        OutputOperator:271
73   ConvRelu         INT8     NPU    (1,256,40,40),(256,256,3,3),(256)            (1,256,20,20)          69471          230400         230400         398            57.89          100.0%/0.0%/0.0% - Up:0.0%             17             12944          13568          401.22         Conv:Conv_127
74   Concat           INT8     NPU    (1,256,20,20),(1,256,20,20)                  (1,512,20,20)          74287          0              74287          124            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           429.03         Concat:Concat_129
75   ConvRelu         INT8     NPU    (1,512,20,20),(256,512,1,1),(256)            (1,256,20,20)          45890          51200          51200          145            35.31          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           265.03         Conv:Conv_130
76   ConvRelu         INT8     NPU    (1,256,20,20),(256,256,1,1),(256)            (1,256,20,20)          134542         25600          134542         102            25.10          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           777.03         Conv:Conv_132
77   ConvRelu         INT8     NPU    (1,256,20,20),(256,256,3,3),(256)            (1,256,20,20)          74287          230400         230400         319            72.23          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           429.03         Conv:Conv_134
78   ConvRelu         INT8     NPU    (1,512,20,20),(256,512,1,1),(256)            (1,256,20,20)          69471          51200          69471          145            35.31          100.0%/0.0%/0.0% - Up:0.0%             11             6992           7424           401.22         Conv:Conv_136
79   Concat           INT8     NPU    (1,256,20,20),(1,256,20,20)                  (1,512,20,20)          113764         0              113764         123            \              100.0%/0.0%/0.0% - Up:0.0%             17             7664           8960           657.03         Concat:Concat_138
80   ConvRelu         INT8     NPU    (1,512,20,20),(512,512,1,1),(512)            (1,512,20,20)          74287          102400         102400         207            49.47          100.0%/0.0%/0.0% - Up:0.0%             6              5952           6144           429.03         Conv:Conv_139
81   Conv             INT8     NPU    (1,512,20,20),(255,512,1,1),(255)            (1,255,20,20)          74454          51000          74454          147            34.69          100.0%/0.0%/0.0% - Up:0.0%             11             6992           7424           430.00         Conv:Conv_145
82   OutputOperator   INT8     CPU    (1,255,20,20)                                \                      0              0              0              19             \              0.0%/0.0%/0.0% - Up:0.0%               0              0              0              100.00         OutputOperator:273
Total Operator Elapsed Time(us): 27987
Total Memory RW Amount(MB): 88.4914
Operator Time-Consuming Ranking:
OpType           Call Number     CPU Time(us)    NPU Time(us)    Total Time(us)    Time Ratio(%)
ConvRelu         50              0               19301           19301             68.96
Concat           13              0               4112            4112              14.69
ConvReluAdd      7               0               2501            2501              8.93
Conv             3               0               1019            1019              3.64
ConvTranspose    2               0               575             575               2.05
MaxPool          3               0               253             253               0.9
OutputOperator   3               212             0               212               0.75
InputOperator    1               14              0               14                0.05

===================================================================================================================

done
--> eval_memory
======================================================
            Memory Profile Info Dump
======================================================
NPU model memory detail(bytes):
    Total Weight Memory: 6.99 MiB
    Total Internal Tensor Memory: 5.86 MiB
    Total Memory: 18.16 MiB

INFO: When evaluating memory usage, we need consider
the size of model, current model size is: 8.11 MiB
======================================================

done
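
The "Operator Time-Consuming Ranking" table in the log above can be checked by hand: the per-type totals should add up to the reported "Total Operator Elapsed Time(us)", and each share is simply the type's total divided by that sum. The following is a minimal sketch of that arithmetic, not part of the toolkit; the helper names `parse_ranking` and `time_shares` are illustrative assumptions, and the rows are copied from the log. The recomputed percentages may differ from the log's Time Ratio column in the last digit.

```python
# Sketch: recompute per-operator time shares from the ranking rows in the log.
# parse_ranking / time_shares are illustrative helpers, not rknn-toolkit2 APIs.
RANKING = """\
ConvRelu         50              0               19301           19301             68.96
Concat           13              0               4112            4112              14.69
ConvReluAdd      7               0               2501            2501              8.93
Conv             3               0               1019            1019              3.64
ConvTranspose    2               0               575             575               2.05
MaxPool          3               0               253             253               0.9
OutputOperator   3               212             0               212               0.75
InputOperator    1               14              0               14                0.05
"""


def parse_ranking(text):
    """Return {op_type: (calls, cpu_us, npu_us, total_us)} from ranking rows."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank lines and anything that is not a data row
        op, calls, cpu_us, npu_us, total_us, _ratio = fields
        rows[op] = (int(calls), int(cpu_us), int(npu_us), int(total_us))
    return rows


def time_shares(rows):
    """Return (total_us, {op_type: percent of the total operator time})."""
    total = sum(r[3] for r in rows.values())
    return total, {op: 100.0 * r[3] / total for op, r in rows.items()}


rows = parse_ranking(RANKING)
total_us, shares = time_shares(rows)
print("total operator time (us):", total_us)
for op, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{op:<16}{pct:6.2f}%")
```

Sorting by share makes it obvious where optimisation effort would pay off; in this log the ConvRelu layers dominate, while the CPU-side input and output operators account for well under 1% of the time.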

The main program of test.py:

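from rknn.api import RKNN

# RKNN_MODEL is not defined in this excerpt; set it to the path of the
# converted .rknn model before running.

if __name__ == '__main__':
    # Create the RKNN object.
    # If the test runs into problems, enable verbose=True to see debug output.
    #rknn = RKNN(verbose=True)
    rknn = RKNN()

    # Load the RKNN model; the path argument is the location of the .rknn file
    print('--> Loading model')
    ret = rknn.load_rknn(path=RKNN_MODEL)
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Initialise the runtime: target selects the board's NPU platform,
    # device_id is the ID of the board connected over adb earlier.
    # perf_debug=True enables debug mode for performance evaluation,
    # eval_mem=True enables memory evaluation mode.
    print('--> Init runtime environment')
    ret = rknn.init_runtime(target='rk3588', device_id='192.168.103.131:5555', perf_debug=True, eval_mem=True)
    if ret != 0:
        print('Init runtime environment failed!')
        exit(ret)
    print('done')

    # Evaluate model performance; is_print defaults to True and prints the performance report
    print('--> eval_perf')
    rknn.eval_perf()
    print('done')

    # Evaluate memory usage; is_print defaults to True and prints the memory report
    print('--> eval_memory')
    rknn.eval_memory()
    print('done')

    rknn.release()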

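The program above initialises the runtime and then evaluates memory usage and performance. Inference can also be run on the board's NPU; to add it, follow the earlier section on inference with the PC simulator.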
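
As a rough illustration of running inference on the board instead of the simulator, the sketch below wraps the two calls involved, `init_runtime` with a `target` and `inference` with a list of inputs. The wrapper name `run_on_board` is an assumption made for this example, and the image is expected to be preprocessed to the model's input size already. The evaluation flags `perf_debug` and `eval_mem` are left out here because they are only needed for the evaluation runs.

```python
# Sketch: run one inference on a connected board.
# run_on_board is an illustrative helper, not an rknn-toolkit2 API.
def run_on_board(rknn, image, target="rk3588", device_id=None):
    """Initialise the runtime on the board and run a single inference.

    rknn  : an RKNN object on which load_rknn() has already succeeded
    image : a preprocessed input array matching the model's input shape
    """
    ret = rknn.init_runtime(target=target, device_id=device_id)
    if ret != 0:
        raise RuntimeError(f"init_runtime failed with code {ret}")
    # inference() takes a list of input arrays and returns a list of outputs
    return rknn.inference(inputs=[image])


# Usage on a real setup (needs rknn-toolkit2 and a board reachable over adb):
#   from rknn.api import RKNN
#   rknn = RKNN()
#   rknn.load_rknn(path=RKNN_MODEL)
#   outputs = run_on_board(rknn, img, device_id='192.168.103.131:5555')
#   rknn.release()
```

Raising an exception instead of calling `exit(ret)` keeps the helper usable from larger scripts, which can then decide whether to retry or release the RKNN object.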

See https://github.com/rockchip-linux/rknn-toolkit2/tree/master/examples/functions for more rknn-toolkit2 function test examples.

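3.3. Viewing and setting board information

3.3.1. For rk356x (RK3566/RK3568)

The CPU governor on LubanCat boards defaults to interactive, which adjusts the CPU frequency dynamically according to CPU utilisation and target load. For higher running speed, or for repeatable performance evaluation, fix the frequencies manually as follows: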

1. CPU frequency:

# Show the current CPU frequency
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

# Show the current CPU governor
cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor

# List the frequencies that can be set
cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies

# Switch to the userspace governor, which lets a user with root privileges
# set the CPU frequency through the scaling_setspeed sysfs file
echo userspace > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor

# Set the fixed frequency (in kHz)
echo 1800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed
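
The CPU steps above can also be scripted. The sketch below is one possible way to do it under stated assumptions: the names `POLICY0`, `available_frequencies` and `pin_cpu_frequency` are illustrative, the script must run as root on the board, and it only touches the same sysfs files as the commands above. It checks the requested value against scaling_available_frequencies before writing, so a typo fails early instead of being rejected by the kernel.

```python
# Sketch: pin the CPU frequency through the cpufreq sysfs files used above.
# All helper names are illustrative; run as root on the board.
from pathlib import Path

POLICY0 = Path("/sys/devices/system/cpu/cpufreq/policy0")


def available_frequencies(policy_dir=POLICY0):
    """Return the selectable CPU frequencies in kHz, sorted ascending."""
    text = (Path(policy_dir) / "scaling_available_frequencies").read_text()
    return sorted(int(f) for f in text.split())


def pin_cpu_frequency(freq_khz=None, policy_dir=POLICY0):
    """Switch to the userspace governor and fix the CPU frequency.

    If freq_khz is None, the highest available frequency is used.
    Returns the frequency (kHz) that was written to scaling_setspeed.
    """
    policy_dir = Path(policy_dir)
    freqs = available_frequencies(policy_dir)
    if freq_khz is None:
        freq_khz = freqs[-1]
    elif freq_khz not in freqs:
        raise ValueError(f"{freq_khz} kHz is not one of {freqs}")
    (policy_dir / "scaling_governor").write_text("userspace\n")
    (policy_dir / "scaling_setspeed").write_text(f"{freq_khz}\n")
    return freq_khz
```

Calling `pin_cpu_frequency()` with no arguments selects the highest listed frequency; passing `policy_dir` lets the same function be tried against a copy of the files in a scratch directory first.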

2. DDR frequency:

# Show the current DDR frequency
cat /sys/class/devfreq/dmc/cur_freq

# Show the current DDR governor
cat /sys/class/devfreq/dmc/governor

# List the DDR frequencies that can be set
cat /sys/class/devfreq/dmc/available_frequencies

# Switch to the userspace governor, which lets a user with root privileges
# set the DDR frequency through the userspace/set_freq sysfs file
echo userspace > /sys/class/devfreq/dmc/governor

# Set the fixed frequency (in Hz), here 1056000000
echo 1056000000 > /sys/class/devfreq/dmc/userspace/set_freq
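
For benchmarking it can be convenient to pin the DDR frequency only for the duration of a test and put the original governor back afterwards. The sketch below shows one way to do that with a context manager; `pinned_devfreq` is a hypothetical helper name, it assumes the governor, available_frequencies and userspace/set_freq files used in the commands above, and it must run as root on the board.

```python
# Sketch: temporarily pin a devfreq device (for example the DDR controller "dmc").
# pinned_devfreq is an illustrative helper; run as root on the board.
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def pinned_devfreq(dev_dir, freq_hz):
    """Pin the devfreq device in dev_dir to freq_hz, restoring its governor on exit."""
    dev_dir = Path(dev_dir)
    governor = dev_dir / "governor"
    available = {int(f) for f in (dev_dir / "available_frequencies").read_text().split()}
    if freq_hz not in available:
        raise ValueError(f"{freq_hz} Hz is not one of {sorted(available)}")
    previous = governor.read_text().strip()
    # The userspace governor must be active before set_freq is written
    governor.write_text("userspace\n")
    try:
        (dev_dir / "userspace" / "set_freq").write_text(f"{freq_hz}\n")
        yield
    finally:
        # Hand frequency control back to the governor that was active before
        governor.write_text(previous + "\n")


# Usage on the board:
#   with pinned_devfreq("/sys/class/devfreq/dmc", 1056000000):
#       ...  # run the benchmark here
```

Restoring the governor in a `finally` block means the board returns to dynamic scaling even if the benchmark raises an exception.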

3. NPU frequency:

# List the available NPU frequencies
cat /sys/class/devfreq/fde40000.npu/available_frequencies

# Switch to the userspace governor
echo userspace > /sys/class/devfreq/fde40000.npu/governor

# Set the frequency
echo 900000000 > /sys/kernel/debug/clk/clk_scmi_npu/clk_rate

# Show the current NPU frequency
cat /sys/class/devfreq/fde40000.npu/cur_freq

# Or look up the NPU clocks in the clock summary
cat /sys/kernel/debug/clk/clk_summary | grep npu

3.3.2. For rk3588 (rk3588s/rk3588)

1. Querying and setting the NPU frequency

# Show the driver version
cat /sys/kernel/debug/rknpu/version

# Show the power state
cat /sys/kernel/debug/rknpu/power

# Show the NPU load (requires root privileges)
cat /sys/kernel/debug/rknpu/load

# List the available NPU frequencies, then set the frequency
cat /sys/class/devfreq/fdab0000.npu/available_frequencies

echo userspace > /sys/class/devfreq/fdab0000.npu/governor

echo 1000000000 > /sys/class/devfreq/fdab0000.npu/userspace/set_freq

# Show the current NPU operating frequency
cat /sys/kernel/debug/rknpu/freq
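
When watching NPU utilisation during a test, the text from /sys/kernel/debug/rknpu/load can be turned into numbers. The sketch below assumes the output looks like "NPU load:  Core0: 35%, Core1:  0%, Core2:  0%," on multi-core chips, or a single "NPU load: 12%" value otherwise; that format is an assumption and may differ between driver versions, and `parse_npu_load` is an illustrative name.

```python
# Sketch: parse the text read from /sys/kernel/debug/rknpu/load.
# The expected text format is an assumption and may vary with the driver version.
import re


def parse_npu_load(text):
    """Return {core_index: load_percent} parsed from the load text."""
    per_core = re.findall(r"Core(\d+):\s*(\d+)%", text)
    if per_core:
        return {int(core): int(pct) for core, pct in per_core}
    # Fall back to a single overall percentage, reported as core 0
    single = re.search(r"(\d+)%", text)
    return {0: int(single.group(1))} if single else {}


sample = "NPU load:  Core0: 35%, Core1:  0%, Core2:  0%,"  # assumed sample text
print(parse_npu_load(sample))
```

On the board the input would come from reading the file as root, for example `Path('/sys/kernel/debug/rknpu/load').read_text()`; an empty dictionary signals that the text did not match either assumed layout.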

3.3.3. Other NPU-related commands

# Show the librknnrt library version
strings /usr/lib/librknnrt.so | grep "librknnrt version"

# Show the rknn_server version
strings /usr/bin/rknn_server | grep build

# Show the NPU driver version
dmesg | grep -i rknpu

# or
sudo cat /sys/kernel/debug/rknpu/version
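
The version strings printed by these commands are usually compared against the versions the tutorial was tested with (RKNN-Toolkit2 1.5.0, NPU driver 0.8.2 or 0.8.8). The sketch below extracts a three-part version number from such a line and compares it as a tuple. The sample strings are assumed examples of what the commands might print, and `parse_version` and `at_least` are illustrative names.

```python
# Sketch: extract and compare version numbers from the command output.
# The sample lines are assumed examples, not captured output.
import re


def parse_version(text):
    """Return the first x.y.z version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def at_least(text, minimum):
    """True if the version found in text is >= minimum, e.g. minimum=(0, 8, 2)."""
    version = parse_version(text)
    return version is not None and version >= minimum


runtime_line = "librknnrt version: 1.5.0"  # assumed sample
driver_line = "RKNPU driver: v0.8.2"       # assumed sample
print(parse_version(runtime_line), at_least(driver_line, (0, 8, 2)))
```

Requiring three numeric components keeps the pattern from matching two-part numbers such as the timestamps at the start of dmesg lines.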

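Updating the NPU driver normally means building a kernel and installing it on the board. On LubanCat boards you can instead update directly with the commands below, or flash the latest image from the cloud drive.

sudo apt update

# Install the package that matches the kernel version of your board (one of the following)
sudo apt install linux-image-5.10.160

sudo apt install linux-image-5.10.198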