1. 飞桨(PaddlePaddle) FastDeploy

1.1. FastDeploy介绍

FastDeploy 是一款全场景、易用灵活、极致高效的AI推理部署工具。 提供超过160+ Text,Vision, Speech和跨模态模型开箱即用的部署体验,并实现端到端的推理性能优化。 包括 物体检测、字符识别(OCR)、人脸、人像扣图、多目标跟踪系统、NLP、Stable Diffusion文图生成、TTS 等几十种任务场景,满足开发者多场景、多硬件、多平台的产业部署需求。

FastDeploy旨在为AI开发者提供模型部署最优解,具备全场景、简单易用、极致高效三大特点。

  • 全场景:支持 GPU、CPU、Jetson、ARM CPU、瑞芯微NPU、晶晨NPU、恩智浦NPU 等多类硬件,支持本地部署、服务化部署、Web端部署、移动端部署等,支持CV、NLP、Speech 三大领域, 支持图像分类、图像分割、语义分割、物体检测、字符识别(OCR)、人脸检测识别、人像扣图、姿态估计、文本分类、信息抽取、行人跟踪、语音合成等16大主流算法场景。

  • 易用灵活:3行代码完成AI模型的部署,1行代码快速切换后端推理引擎和部署硬件,统一API实现不同部署场景的零成本迁移。

  • 极致高效:相比传统深度学习推理引擎只关注模型的推理时间,FastDeploy则关注模型任务的端到端部署性能。通过高性能前后处理、整合高性能推理引擎、一键自动压缩等技术, 实现了AI模型推理部署的极致性能优化。

接下来我们将简单搭建下FastDeploy环境,在RK356X/RK3588上部署轻量检测网络PicoDet。

提示

教程测试环境:lubancat-0/1/2使用Debian10,lubancat-4使用Debian11,PC端是WSL2(ubuntu20.04)

1.2. PC端模型转换推理环境搭建

1、安装FastDeploy等环境

需要安装rknn-Toolkit2和FastDeploy工具,用于模型转换等。创建一个虚拟环境:

# 使用conda创建一个名为FastDeploy的环境,并指定python版本
conda create -n FastDeploy python=3.8

# 进入环境
conda activate FastDeploy

然后安装fastdeploy等工具:

# 直接用预编译库安装,也可以FastDeploy编译安装
# 详细参考下:
pip3 install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html

# 安装paddle2onnx
pip3 install paddle2onnx

# 安装paddlepaddle
python -m pip3 install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/linux/cpu-mkl/develop.html

paddlepaddle安装详细参考下 这里

2、安装rknn-Toolkit2

使用paddlepaddle训练保存的Paddle模型,在板卡上部署rknn模型,目前需要先转换成onnx,然后转成rknn模型。模型转换使用rknn-Toolkit2工具, 该工具安装参考下前面章节 《RKNN Toolkit2介绍》

1.3. 板端FastDeploy RKNPU2推理环境搭建

使用FastDeploy,可以在板卡上快速部署模型,提供有Python和C++ SDK,请根据实际部署需要选择。

1.3.1. 编译安装Fastdeploy SDK

在rk系列鲁班猫板卡,FastDeploy支持ONNX Runtime和RKNPU2后端后端引擎,具体如下( 具体请参考 这里 ):

broken

接下来介绍下如何编译FastDeploy Python SDK 和 C++ SDK, 由于板卡性能和资源,编译FastDeploy SDK可能会编译很慢,出现内存不足等情况。

如果需要直接使用(建议自行编译最新的Fastdeploy SDK),我们有提供编译好的安装文件和相关源码, 参考下 云盘资料 (提取码hslu),在 鲁班猫->1-野火开源图书...->AI教程相关源文件 目录中。

1、编译Fastdeploy Python SDK

Python SDK的编译暂时仅支持板端编译, 下面是测试在LubanCat-0/1/2(debian10 python3.7),lubancat-4(Debian11,python3.9.2), 获取SDK源码是develop分支,如果编译遇到问题不能解决,建议使用稳定版本。

# 安装相关软件库等
sudo apt update
sudo apt install -y git cmake python3-dev

# 如果是debian10系统,可以链接,更新下pip
sudo ln -sf /usr/bin/pip3 /usr/bin/pip
pip install --upgrade pip

# git获取源码,默认是develop分支,或者从网盘获取SDK压缩包,目前建议使用稳定版本,切换到其他1.0.7分支等
git clone https://github.com/PaddlePaddle/FastDeploy.git

设置环境变量,编译SDK:

# 设置环境变量
export ENABLE_ORT_BACKEND=ON     # 是否编译集成ONNX Runtime后端,ON打开,OFF关闭
export ENABLE_RKNPU2_BACKEND=ON  # 使用rknpu2作为后端引擎
export ENABLE_VISION=ON          # 是否编译集成视觉模型的部署模块
export RKNN2_TARGET_SOC=RK356X   # lubancat-4选择RK3588(一般指rk3588、rk3588s),lubancat-0/1/2选择RK356X(一般指rk3566、rk3568)

# 切换到SDK源码的FastDeploy/python目录下
cd FastDeploy/python

# 编译,打包
python3 setup.py build
python3 setup.py bdist_wheel

提示

rk356X板卡编译时卡死,可能内存不够,建议加下swap交换分区(linux如何添加交换分区,请网上搜索下),至少4G交换空间,板端编译较慢需要等待下。

安装SDK:

# 安装
cd dist
pip3 install fastdeploy_python-0.0.0-cp37-cp37m-linux_aarch64.whl

如果限于板卡资源和性能,可以直接从云盘资料的whl文件,然后使用pip3安装,如果安装遇到问题,可以参考下前面编译过程安装的环境。

2、编译Fastdeploy C++ SDK

Fastdeploy C++ SDK支持板端编译和交叉编译,板卡内存充足建议直接在板卡上编译, 编译可以参考下 这里 。 这里演示下在板卡上编译:

# 获取Fastdeploy源码,如果前面获取了则不需要,也可以从网盘资料获取...
git clone https://github.com/PaddlePaddle/FastDeploy.git

# 切换到源码目录下
cd FastDeploy

# 创建一个编译目录,并切换到该目录下,执行cmake配置
mkdir build && cd build

cmake .. -DENABLE_ORT_BACKEND=ON \
        -DENABLE_RKNPU2_BACKEND=ON \
        -DENABLE_VISION=ON \
        -DRKNN2_TARGET_SOC=RK356X \
        -DCMAKE_INSTALL_PREFIX=${PWD}/fastdeploy-0.0.0

# ENABLE_ORT_BACKEND 是指定是否编译集成ONNX Runtime后端
# ENABLE_RKNPU2_BACKEND 是指定否编译集成RKNPU2后端
# ENABLE_VISION 指定是否编译集成视觉模型的部署模块
# RKNN2_TARGET_SOC 指定目标设备,输入值为RK3588,RK356X,必须设置
# CMAKE_INSTALL_PREFIX 安装目录
# ORT_DIRECTORY 开启ONNX Runtime后端时,用于指定用户本地的ONNX Runtime库路径;如果不指定,编译过程会自动下载ONNX Runtime库(教程测试下载的版本是1.12.0)
# OPENCV_DIRECTORY 当ENABLE_VISION=ON时,用于指定用户本地的OpenCV库路径;如果不指定,编译过程会自动下载OpenCV库(教程测试下载的版本是3.4.14)

# 编译,安装,如果是RK356X,指定-j4;如果是RK3588,指定-j8
make -j4

make install

# 最后安装在XXX/FastDeploy/build/fastdeploy-0.0.0目录下
cat@lubancat:~/FastDeploy-develop/build/fastdeploy-0.0.0$ ls
FastDeploy.cmake        FastDeployConfig.cmake  ThirdPartyNotices.txt  fastdeploy_init.sh  lib           summary.cmake  utils
FastDeployCSharp.cmake  LICENSE                 VERSION_NUMBER         include             openmp.cmake  third_libs     utils.cmake

1.4. 部署推理示例

1.4.1. 轻量化检测网络PicoDet

PP-PicoDet是轻量级实时移动端目标检测模型,该模型的介绍参考下 这里 。 下面我们将在板卡上使用FastDeploy部署轻量化检测网络PicoDet:

1、模型转换

rknn-Toolkit2暂时不支持Paddle模型直接导出为RKNN模型,需要使用Paddle2ONNX转成onnx模型,然后导出rknn模型。在PC ubunut系统中:

# 启用虚拟环境
# 这里测试直接下载Paddle静态图模型并解压
wget https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar
tar xvf picodet_s_416_coco_lcnet.tar

# 静态图模型转换onnx模型,
cd tools/rknpu2/picodet_s_416_coco_lcnet/

paddle2onnx --model_dir picodet_s_416_coco_lcnet \
        --model_filename model.pdmodel \
        --params_filename model.pdiparams \
        --save_file picodet_s_416_coco_lcnet/picodet_s_416_coco_lcnet.onnx \
        --enable_dev_version True

# 固定shape
python -m paddle2onnx.optimize --input_model picodet_s_416_coco_lcnet/picodet_s_416_coco_lcnet.onnx \
                            --output_model picodet_s_416_coco_lcnet/picodet_s_416_coco_lcnet.onnx \
                            --input_shape_dict "{'image':[1,3,416,416]}"

# 最后模型保存在当前目录下picodet_s_416_coco_lcnet/picodet_s_416_coco_lcnet.onnx

2、导出RKNN模型

切换到获取的FastDeploy源码tools/rknpu2/目录下(或者从配套程序获取模型转换程序),并复制picodet_s_416_coco_lcnet到该目录下。

然后修改输出节点,paddle2onnx版本不同,转换模型的输出节点名称可能有所不同,需要使用 Netron 可视化模型, 找到NonMaxSuppression节点,确认输出节点名称,然后修改下config/picodet_s_416_coco_lcnet_unquantized.yaml配置文件中的outputs_nodes参数:

broken
picodet_s_416_coco_lcnet_unquantized.yaml配置文件
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
mean:
-
    - 123.675
    - 116.28
    - 103.53
std:
-
    - 58.395
    - 57.12
    - 57.375
model_path: ./picodet_s_416_coco_lcnet/picodet_s_416_coco_lcnet.onnx
outputs_nodes:
- 'p2o.Mul.179'
- 'p2o.Concat.9'
do_quantization: False
dataset:
output_folder: "./picodet_s_416_coco_lcnet"
# 环境中可能需要安装的模块
pip install pyyaml

# 切换到获取的FastDeploy源码tools/rknpu2/目录下(或者从配套程序获取模型转换程序),并复制picodet_s_416_coco_lcnet到该目录下
cd tools/rknpu2/

# 按前面可视化模型模型的输出节点,修改config/picodet_s_416_coco_lcnet_unquantized.yaml配置文件

# 导出模型,--config_path 指定配置文件,根据实际tools/rknpu2/config目录下的配置文件设置,--target_platform指定平台,rk3588、rk3566等
python export.py --config_path config/picodet_s_416_coco_lcnet_unquantized.yaml --target_platform rk3568

模型转换的环境中需要安装rknn-Toolkit2,rknn模型最后保存在picodet_s_416_coco_lcnet/picodet_s_416_coco_lcnet_rk3568_unquantized.rknn 。

1.4.1.1. 板端Python部署推理

复制前面转换获取的rknn模型到板卡,然后使用FastDeploy源码中对应的推理程序,进行模型推理测试(测试lubancat-0/1/2板卡,rk356x):

# 部署测试需要python opencv,需要安装下:
pip3 install opencv-python

# 获取FastDeploy源码,前面获取了就不需要,然后切换到推理程序目录下
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd examples/vision/detection/paddledetection/rknpu2/python

# 在板端推理部署,可以使用前面自己导出的rknn模型和配置文件,复制到板卡的推理程序目录下

# 也可以快速测试直接下载模型文件,解压在程序目录下
wget https://bj.bcebos.com/paddlehub/fastdeploy/rknpu2/picodet_s_416_coco_lcnet.zip
unzip picodet_s_416_coco_lcnet.zip

# 获取测试图片
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

# 运行推理程序,--model_file 指定模型文件,--config_file 指定配置文件, --image 指定需要推理的图片
python3 infer.py --model_file ./picodet_s_416_coco_lcnet/picodet_s_416_coco_lcnet_rk3568_unquantized.rknn  \
                --config_file ./picodet_s_416_coco_lcnet/infer_cfg.yml  --image 000000014439.jpg

教程测试是lubancat-2(rk3568),指定模文件时需要注意导出模型的平台,运行推理显示:

[INFO] fastdeploy/vision/common/processors/transform.cc(45)::FuseNormalizeCast  Normalize and Cast are fused to Normalize in preprocessing pipeline.
[INFO] fastdeploy/vision/common/processors/transform.cc(93)::FuseNormalizeHWC2CHW       Normalize and HWC2CHW are fused to NormalizeAndPermute  in preprocessing pipeline.
[INFO] fastdeploy/vision/common/processors/transform.cc(159)::FuseNormalizeColorConvert BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] fastdeploy/runtime/backends/rknpu2/rknpu2_backend.cc(81)::GetSDKAndDeviceVersion rknpu2 runtime version: 1.5.1b19 (32afb0e92@2023-07-14T12:46:17)
[INFO] fastdeploy/runtime/backends/rknpu2/rknpu2_backend.cc(82)::GetSDKAndDeviceVersion rknpu2 driver version: 0.8.2
index=0, name=image, n_dims=4, dims=[1, 416, 416, 3], n_elems=519168, size=1038336, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000, pass_through=0
index=0, name=p2o.Mul.179, n_dims=4, dims=[1, 3598, 4, 1], n_elems=14392, size=28784, fmt=NCHW, type=FP32, qnt_type=AFFINE, zp=0, scale=1.000000, pass_through=0
index=1, name=p2o.Concat.9, n_dims=4, dims=[1, 80, 3598, 1], n_elems=287840, size=575680, fmt=NCHW, type=FP32, qnt_type=AFFINE, zp=0, scale=1.000000, pass_through=0
[INFO] fastdeploy/runtime/runtime.cc(367)::CreateRKNPU2Backend  Runtime initialized with Backend::RKNPU2 in Device::RKNPU.
[WARNING] fastdeploy/runtime/backends/rknpu2/rknpu2_backend.cc(420)::InitRKNNTensorMemory       The input tensor type != model's inputs type.The input_type need FP16,but inputs[0].type is UINT8
DetectionResult: [xmin, ymin, xmax, ymax, score, label_id]
413.461548,89.649635, 508.461548, 282.605743, 0.833008, 0
160.961548,81.698311, 200.000000, 166.795670, 0.812012, 0
265.576935,79.756004, 299.230774, 167.524033, 0.791992, 0
105.384621,46.251198, 126.730774, 93.534248, 0.769531, 0
584.230774,113.685692, 612.692322, 178.085327, 0.762695, 0
328.269257,40.211838, 344.423096, 80.545067, 0.636719, 0
379.038483,42.093449, 396.153870, 83.397835, 0.554199, 0
510.000031,116.052879, 598.846191, 278.235565, 0.541504, 0
24.038462,116.417061, 55.048080, 153.442307, 0.443115, 0
58.557693,136.325714, 107.115387, 173.836533, 0.428955, 0
352.307709,45.097954, 376.923096, 104.034851, 0.421387, 0
188.750000,45.795971, 200.000000, 61.637917, 0.414551, 0
352.692322,45.097954, 369.615387, 87.343147, 0.388672, 0
360.961548,61.000599, 383.461548, 114.353363, 0.363525, 0
505.000031,114.838936, 556.538452, 269.009613, 0.334717, 0
1.334135,150.286057, 37.091347, 172.622589, 0.400635, 24
58.269234,143.487976, 104.038467, 172.501190, 0.308594, 24
163.076935,87.525238, 600.000000, 344.274017, 0.576660, 33
164.519241,84.975960, 320.000000, 344.516815, 0.389648, 33

Visualized result save in ./visualized_result.jpg

简单测试结果显示:

broken

1.4.1.2. 板端C++部署推理

复制前面转换获取的rknn模型到板卡,进行模型推理测试,C++部署推理程序同样在FastDeploy源码中, 在源码根目录examples/vision/detection/paddledetection/rknpu2/cpp下,也可以从配套例程获取,其目录文件如下:

.
├── CMakeLists.txt
├── README.md
├── README_CN.md
├── infer_picodet_demo.cc    # PicoDet C++部署推理程序
├── infer_ppyoloe_demo.cc
└── infer_yolov8_demo.cc

0 directories, 6 file

推理程序infer_picodet_demo.cc中RKNPU2推理固定了模型文件名称,我们可以自行修改或者自己重新编写推理程序:

infer_picodet_demo.cc
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
// NPU推理
void RKNPU2Infer(const std::string& model_dir, const std::string& image_file) {
auto model_file = model_dir + "/picodet_s_416_coco_lcnet_rk3568_unquantized.rknn"; // 修改为自己的模型文件名称,教程测试是lubancat-2(rk3568)
auto params_file = "";
auto config_file = model_dir + "/infer_cfg.yml";

auto option = fastdeploy::RuntimeOption();
option.UseRKNPU2();

auto format = fastdeploy::ModelFormat::RKNN;

auto model = fastdeploy::vision::detection::PicoDet(
    model_file, params_file, config_file, option, format);

model.GetPreprocessor().DisablePermute();
model.GetPreprocessor().DisableNormalize();
model.GetPostprocessor().ApplyNMS();

auto im = cv::imread(image_file);

fastdeploy::vision::DetectionResult res;
fastdeploy::TimeCounter tc;
tc.Start();
if (!model.Predict(&im, &res)) {
    std::cerr << "Failed to predict." << std::endl;
    return;
}
tc.End();
tc.PrintInfo("PPDet in RKNPU2");

std::cout << res.Str() << std::endl;
auto vis_im = fastdeploy::vision::VisDetection(im, res, 0.5);
cv::imwrite("infer_rknpu2.jpg", vis_im);
std::cout << "Visualized result saved in ./infer_rknpu2.jpg" << std::endl;
}

PicoDet如何编译和运行,可以阅读源码的README_CN.md/README.md文件。下面以lubancat-2板卡为例,简单编译测试下:

# 在源码目录下,创建build目录,然后切换到该目录下
mkdir build && cd build

# 执行cmake目录,FASTDEPLOY_INSTALL_DIR指定前面编译安装Fastdeploy C++ SDK的目录,教程测试就是~/FastDeploy-develop/build/fastdeploy-0.0.0
cmake .. -DFASTDEPLOY_INSTALL_DIR=~/FastDeploy-develop/build/fastdeploy-0.0.0

# 然后执行make编译,默认会编译全部推理程序,在当前目录下生成可执行文件,我们这里测试关注的是infer_picodet_demo
cat@lubancat:~/xxx/rknpu2/cpp/build$ make
Scanning dependencies of target infer_ppyoloe_demo
[ 16%] Building CXX object CMakeFiles/infer_ppyoloe_demo.dir/infer_ppyoloe_demo.cc.o
[ 33%] Linking CXX executable infer_ppyoloe_demo
[ 33%] Built target infer_ppyoloe_demo
Scanning dependencies of target infer_yolov8_demo
[ 50%] Building CXX object CMakeFiles/infer_yolov8_demo.dir/infer_yolov8_demo.cc.o
[ 66%] Linking CXX executable infer_yolov8_demo
[ 66%] Built target infer_yolov8_demo
Scanning dependencies of target infer_picodet_demo
[ 83%] Building CXX object CMakeFiles/infer_picodet_demo.dir/infer_picodet_demo.cc.o
[100%] Linking CXX executable infer_picodet_demo
[100%] Built target infer_picodet_demo

cat@lubancat:~/xxx/rknpu2/cpp/build$ ls
CMakeCache.txt  CMakeFiles  Makefile  cmake_install.cmake  infer_picodet_demo  infer_ppyoloe_demo  infer_yolov8_demo

生成了推理可执行程序infer_picodet_demo ,接下来复制模型文件,获取测试图片,然后推理测试。

复制前面转换的rknn模型文件到板卡,我们可以创建一个文件夹,存放模型文件和推理参数文件。 另外,前面编译Fastdeploy C++ SDK时设置了ENABLE_ORT_BACKEND=ON,也支持cpu推理,我们也复制picodet_s_416_coco_lcnet.onnx模型文件该目录下。

# 复制推理参数文件、模型文件和推理可执行程序,存放到一个目录下
cat@lubancat:~/xxx/test$ ls
infer_cfg.yml  infer_picodet_demo  picodet_s_416_coco_lcnet.onnx  picodet_s_416_coco_lcnet_rk3568_unquantized.rknn

# 拉取测试图片
cat@lubancat:~/xxx/test$ wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

# CPU推理,infer_picodet_demo的第一个参数是指定模型目录,教程测试是当前目录下,第二个参数指定测试图片,最后一个参数是推理引擎,0是用CPU推理,1是RKNPU2推理
cat@lubancat:~/xxx/test$ ./infer_picodet_demo ./ 000000014439.jpg 0
[INFO] fastdeploy/vision/common/processors/transform.cc(45)::FuseNormalizeCast  Normalize and Cast are fused to Normalize in preprocessing pipeline.
[INFO] fastdeploy/vision/common/processors/transform.cc(93)::FuseNormalizeHWC2CHW       Normalize and HWC2CHW are fused to NormalizeAndPermute  in preprocessing pipeline.
[INFO] fastdeploy/vision/common/processors/transform.cc(159)::FuseNormalizeColorConvert BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] fastdeploy/runtime/runtime.cc(326)::CreateOrtBackend     Runtime initialized with Backend::ORT in Device::CPU.
[FastDeploy] PPDet in ONNX duration = 0.612096s.
Visualized result saved in ./infer_onnx.jpg

# NPU推理(教程测试未修改CPU,DDR或者NPU频率,lubancat-2 debian10系统默认设置)
cat@lubancat:~/xxx/test$ ./infer_picodet_demo ./ 000000014439.jpg 1
[INFO] fastdeploy/vision/common/processors/transform.cc(45)::FuseNormalizeCast  Normalize and Cast are fused to Normalize in preprocessing pipeline.
[INFO] fastdeploy/vision/common/processors/transform.cc(93)::FuseNormalizeHWC2CHW       Normalize and HWC2CHW are fused to NormalizeAndPermute  in preprocessing pipeline.
[INFO] fastdeploy/vision/common/processors/transform.cc(159)::FuseNormalizeColorConvert BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] fastdeploy/runtime/backends/rknpu2/rknpu2_backend.cc(81)::GetSDKAndDeviceVersion rknpu2 runtime version: 1.5.1b19 (32afb0e92@2023-07-14T12:46:17)
[INFO] fastdeploy/runtime/backends/rknpu2/rknpu2_backend.cc(82)::GetSDKAndDeviceVersion rknpu2 driver version: 0.8.2
index=0, name=image, n_dims=4, dims=[1, 416, 416, 3], n_elems=519168, size=1038336, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000, pass_through=0
index=0, name=p2o.Mul.179, n_dims=4, dims=[1, 3598, 4, 1], n_elems=14392, size=28784, fmt=NCHW, type=FP32, qnt_type=AFFINE, zp=0, scale=1.000000, pass_through=0
index=1, name=p2o.Concat.9, n_dims=4, dims=[1, 80, 3598, 1], n_elems=287840, size=575680, fmt=NCHW, type=FP32, qnt_type=AFFINE, zp=0, scale=1.000000, pass_through=0
[INFO] fastdeploy/runtime/runtime.cc(367)::CreateRKNPU2Backend  Runtime initialized with Backend::RKNPU2 in Device::RKNPU.
[INFO] fastdeploy/vision/common/processors/transform.cc(159)::FuseNormalizeColorConvert BGR2RGB and Normalize are fused to Normalize with swap_rb=1
[WARNING] fastdeploy/runtime/backends/rknpu2/rknpu2_backend.cc(420)::InitRKNNTensorMemory       The input tensor type != model's inputs type.The input_type need FP16,but inputs[0].type is UINT8
[FastDeploy] PPDet in RKNPU2 duration = 0.190239s.
DetectionResult: [xmin, ymin, xmax, ymax, score, label_id]
413.461548,89.649635, 508.461548, 282.605743, 0.833008, 0
160.961548,81.698311, 200.000000, 166.795670, 0.812012, 0
265.576935,79.756004, 299.230774, 167.524033, 0.791992, 0
105.384621,46.251198, 126.730774, 93.534248, 0.769531, 0
584.230774,113.685692, 612.692322, 178.085327, 0.762695, 0
328.269257,40.211838, 344.423096, 80.545067, 0.636719, 0
379.038483,42.093449, 396.153870, 83.397835, 0.554199, 0
510.000031,116.052879, 598.846191, 278.235565, 0.541504, 0
24.038462,116.417061, 55.048080, 153.442307, 0.443115, 0
58.557693,136.325714, 107.115387, 173.836533, 0.428955, 0
352.307709,45.097954, 376.923096, 104.034851, 0.421387, 0
188.750000,45.795971, 200.000000, 61.637917, 0.414551, 0
352.692322,45.097954, 369.615387, 87.343147, 0.388672, 0
360.961548,61.000599, 383.461548, 114.353363, 0.363525, 0
505.000031,114.838936, 556.538452, 269.009613, 0.334717, 0
1.334135,150.286057, 37.091347, 172.622589, 0.400635, 24
58.269234,143.487976, 104.038467, 172.501190, 0.308594, 24
163.076935,87.525238, 600.000000, 344.274017, 0.576660, 33
164.519241,84.975960, 320.000000, 344.516815, 0.389648, 33

Visualized result saved in ./infer_rknpu2.jpg

在lubancat-2板卡,执行CPU推理一幅测试图片大约0.612096s,结果保存在当前目录的infer_onnx.jpg文件中; 执行NPU推理一幅测试图片大约0.190239s,结果图片保存在infer_rknpu2.jpg,推理结果图片和python推理相同,看下前面图片。