3. PP-OCRv4

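PP-OCRv4 is a further upgrade of PP-OCRv3. The overall framework keeps the same pipeline as PP-OCRv3, while the detection and recognition models are optimized in several areas, including data, network structure, and training strategy.

[Figure: PP-OCRv4 system block diagram]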

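In terms of results, at comparable speed, accuracy improves substantially across several scenarios: in the Chinese scenario, by more than 4% over the PP-OCRv3 Chinese model; in the English and digits scenario, by 6% over the PP-OCRv3 English model; in the multilingual scenario, recognition is optimized for 80 languages, with average accuracy up by more than 8%.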

This chapter briefly introduces the deployment environment and then demonstrates how to deploy and run PP-OCRv4 on a LubanCat board.

3.1. Environment setup

Create a PaddleOCR environment on a PC running WSL or Ubuntu, then install the required tools.

# Create a virtual environment with conda
conda create -n PaddleOCR python=3.8
conda activate PaddleOCR

Install PaddleOCR from source:

# Install paddlepaddle first, following the official Paddle website for your environment

# Then install PaddleOCR
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR

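# Install the dependencies
pip install -r requirements.txt

# To install RKNN-Toolkit2, see the earlier development environment chapter

For training and evaluating PP-OCRv4, see the PaddleOCR documentation.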

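3.2. Model preparation

The PP-OCRv4 models can be downloaded from the PaddleOCR home page or obtained from the tutorial's companion examples. They can also be fetched directly from a terminal:

# Download the test models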
wget -c https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar
wget -c https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar
wget -c https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar

# Then extract the archives
tar -xvf ch_PP-OCRv4_rec_infer.tar
tar -xvf ch_PP-OCRv4_det_infer.tar
tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar

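Three models are obtained: a text detection model, a direction classification model, and a text recognition model:

Text detection model: a DB-based detector that locates the text regions in an image;
Direction classification model: determines and corrects the text orientation to ease the subsequent recognition;
Text recognition model: recognizes the text in each detected box and returns its content.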
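
To make the order of the three stages concrete, here is a minimal Python sketch of how they chain together. It is only an illustration: `run_ocr` and the stage callables are hypothetical names, and the toy stand-ins at the bottom replace the real models (a "crop" is just a string, and an upside-down crop is modelled as a reversed string). The classification stage is optional, which mirrors the two command forms of the demo used later in this chapter.

```python
def run_ocr(image, detect, recognize, classify=None):
    """Chain the stages: detect regions, optionally fix orientation, recognize text.

    detect(image)   -> list of (box, crop) pairs
    classify(crop)  -> the crop in upright orientation (optional stage)
    recognize(crop) -> (text, score)
    """
    results = []
    for box, crop in detect(image):
        if classify is not None:
            crop = classify(crop)
        text, score = recognize(crop)
        results.append((box, text, score))
    return results


# Toy stand-ins so the sketch runs without any model (purely illustrative).
def toy_detect(image):
    return [((0, 0, 10, 10), "olleh")]  # one box whose crop is "upside down"

def toy_classify(crop):
    return crop[::-1]  # "rotate" the crop back

def toy_recognize(crop):
    return crop, 1.0


if __name__ == "__main__":
    print(run_ocr(None, toy_detect, toy_recognize))                # without classification
    print(run_ocr(None, toy_detect, toy_recognize, toy_classify))  # with classification
```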

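paddle2onnx must be installed in the environment. Then convert the downloaded text detection model to an ONNX model (the commands below expect the extracted model directories under ./model; adjust the paths if yours are elsewhere):

# Install paddle2onnx
pip install paddle2onnx

# Convert the Paddle model to an ONNX model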
paddle2onnx --model_dir ./model/ch_PP-OCRv4_det_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./model/ch_PP-OCRv4_det_infer/model.onnx \
--opset_version 12 \
--enable_dev_version True

# Set a fixed input shape
python -m paddle2onnx.optimize --input_model model/ch_PP-OCRv4_det_infer/model.onnx \
                            --output_model model/ch_PP-OCRv4_det_infer/ppocrv4_det.onnx \
                            --input_shape_dict "{'x':[1,3,480,480]}"

Convert the downloaded text recognition model to an ONNX model and fix its input shape:

paddle2onnx --model_dir ./model/ch_PP-OCRv4_rec_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./model/ch_PP-OCRv4_rec_infer/model.onnx \
--opset_version 12 \
--enable_dev_version True

# Set a fixed input shape
python -m paddle2onnx.optimize --input_model model/ch_PP-OCRv4_rec_infer/model.onnx \
                            --output_model model/ch_PP-OCRv4_rec_infer/ppocrv4_rec.onnx \
                            --input_shape_dict "{'x':[1,3,48,320]}"

Convert the direction classification model to an ONNX model (optional):

# Direction classification model
paddle2onnx --model_dir ch_ppocr_mobile_v2.0_cls_infer \
            --model_filename inference.pdmodel \
            --params_filename inference.pdiparams \
            --save_file ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.onnx \
            --enable_dev_version True

# Set a fixed input shape
python -m paddle2onnx.optimize --input_model ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.onnx \
                            --output_model ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.onnx \
                            --input_shape_dict "{'x':[1,3,48,192]}"
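
The three `--input_shape_dict` values used in this section fix each model's input `x` in NCHW order (batch, channels, height, width). As a quick sanity check, the element counts implied by these shapes can be computed and compared with the `n_elems` values printed by the board-side demo later in this chapter (691200, 46080 and 27648), where the same inputs appear in NHWC order. A minimal sketch; the keys `det`, `rec` and `cls` are labels chosen here for illustration:

```python
import ast
from math import prod

# The --input_shape_dict strings passed to paddle2onnx.optimize in this section.
SHAPE_ARGS = {
    "det": "{'x':[1,3,480,480]}",
    "rec": "{'x':[1,3,48,320]}",
    "cls": "{'x':[1,3,48,192]}",
}

def input_shape(shape_arg):
    """Parse an --input_shape_dict string and return the shape of input 'x'."""
    return ast.literal_eval(shape_arg)["x"]  # [N, C, H, W]

def input_elems(shape_arg):
    """Number of elements in input 'x' for the given shape string."""
    return prod(input_shape(shape_arg))

def nchw_to_nhwc(shape):
    """Reorder an [N, C, H, W] shape to [N, H, W, C]."""
    n, c, h, w = shape
    return [n, h, w, c]

if __name__ == "__main__":
    for name, arg in SHAPE_ARGS.items():
        shape = input_shape(arg)
        print(name, shape, "->", nchw_to_nhwc(shape), "elements:", input_elems(arg))
```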

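3.2.1. RKNN models

This tutorial deploys on the NPU of the LubanCat-4, so the models must additionally be converted to RKNN models with RKNN-Toolkit2. The conversion scripts are in the tutorial's companion code repository. If you are not testing on a LubanCat-4 (RK3588 series), change platform = "rk3588" in the scripts to match your chip.

# Get the scripts
git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git
cd lubancat_ai_manual_code/example/ppocrv4

# In the RKNN-Toolkit2 environment, convert the text recognition model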
(toolkit2_1.6) llh@YH-LONG:xxx/pp-ocrv4$ python convert_rec.py
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
--> Config model
done
--> Loading model
W load_onnx: It is recommended onnx opset 19, but your onnx model opset is 12!
Loading : 100%|███████████████████████████████████████████████| 420/420 [00:00<00:00, 111961.85it/s]
W load_onnx: The config.mean_values is None, zeros will be set for input 0!
W load_onnx: The config.std_values is None, ones will be set for input 0!
done
--> Building model
done
--> Export rknn model
done

# Convert the text detection model
(toolkit2_1.6) llh@YH-LONG:xxx/pp-ocrv4$ python convert_det.py
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
--> Config model
done
--> Loading model
W load_onnx: It is recommended onnx opset 19, but your onnx model opset is 12!
Loading : 100%|███████████████████████████████████████████████| 342/342 [00:00<00:00, 137123.79it/s]
done
--> Building model
W build: found outlier value, this may affect quantization accuracy
const name          abs_mean    abs_std     outlier value
conv2d_398.w_0      6.53        8.93        58.173
conv2d_402.w_0      2.41        3.73        34.766
# ... (omitted)
conv2d_421.w_0      0.08        0.09        11.623
GraphPreparing : 100%|██████████████████████████████████████████| 214/214 [00:00<00:00, 5121.45it/s]
Quantizating : 100%|██████████████████████████████████████████████| 214/214 [00:05<00:00, 37.25it/s]
W build: The default input dtype of 'x' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of 'sigmoid_0.tmp_0' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
done
--> Export rknn model
done

# Direction classification model
(toolkit2_1.6) llh@YH-LONG:xxx/pp-ocrv4$ python convert_cls.py
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
--> Config model
done
--> Loading model
W load_onnx: It is recommended onnx opset 19, but your onnx model opset is 9!
Loading : 100%|███████████████████████████████████████████████| 250/250 [00:00<00:00, 151681.76it/s]
W load_onnx: The config.mean_values is None, zeros will be set for input 0!
W load_onnx: The config.std_values is None, ones will be set for input 0!
done
--> Building model
done
--> Export rknn model
done

All generated RKNN models are saved in the model directory.
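
For orientation, the logs above follow the usual RKNN-Toolkit2 sequence: config, load, build, export. The sketch below illustrates that sequence with the toolkit's Python API (`config`, `load_onnx`, `build`, `export_rknn`, `release`). It is not the repository's `convert_rec.py` or `convert_det.py`: the function names, the output naming scheme, and the example path are assumptions made here, and it needs rknn-toolkit2 installed to actually convert anything. Judging from the logs, the detection model is built with quantization (INT8 input and output) while the recognition and classification models are not, so `do_quantization` is left as a parameter.

```python
from pathlib import Path
from typing import Optional


def rknn_output_path(onnx_path: str, platform: str = "rk3588") -> str:
    """Hypothetical naming helper: model/foo.onnx -> model/foo_rk3588.rknn."""
    p = Path(onnx_path)
    return str(p.with_name(f"{p.stem}_{platform}.rknn"))


def convert(onnx_path: str, platform: str = "rk3588",
            do_quantization: bool = False,
            dataset: Optional[str] = None) -> str:
    """Convert one fixed-shape ONNX model to RKNN and return the output path."""
    # Imported here so the naming helper works without rknn-toolkit2 installed.
    from rknn.api import RKNN

    out_path = rknn_output_path(onnx_path, platform)
    rknn = RKNN(verbose=False)
    try:
        print("--> Config model")
        rknn.config(target_platform=platform)
        print("--> Loading model")
        if rknn.load_onnx(model=onnx_path) != 0:
            raise RuntimeError("load_onnx failed")
        print("--> Building model")
        # With do_quantization=True, dataset must point to a text file
        # listing calibration images.
        if rknn.build(do_quantization=do_quantization, dataset=dataset) != 0:
            raise RuntimeError("build failed")
        print("--> Export rknn model")
        if rknn.export_rknn(out_path) != 0:
            raise RuntimeError("export_rknn failed")
    finally:
        rknn.release()
    return out_path


if __name__ == "__main__":
    # Example path only; point this at your own fixed-shape ONNX file.
    print(convert("model/ppocrv4_rec.onnx", platform="rk3588"))
```

Changing the `platform` argument plays the same role as editing platform = "rk3588" in the tutorial's scripts.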

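3.3. Deployment and testing on the board

Note

When testing, make sure the RKNN-Toolkit2 version matches the version of the librknnrt.so library used when running the inference program.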
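
A small helper can make this comparison less error-prone. The sketch below extracts the `major.minor.patch` part from two version strings and compares them. The toolkit string is the one printed in the conversion logs above; the runtime string is only an illustrative placeholder, since its exact format depends on your librknnrt.so. On many images the runtime version can be read with `strings /usr/lib/librknnrt.so | grep "librknnrt version"`, but that library path is an assumption and may differ on your system.

```python
import re


def core_version(text):
    """Return (major, minor, patch) from the first x.y.z found in text, or None."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(g) for g in m.groups()) if m else None


def versions_match(toolkit, runtime):
    """True when both strings carry the same major.minor.patch version."""
    a, b = core_version(toolkit), core_version(runtime)
    return a is not None and a == b


if __name__ == "__main__":
    toolkit = "rknn-toolkit2 version: 1.6.0+81f21f4d"   # from the conversion log
    runtime = "librknnrt version: 1.6.0 (build info)"    # illustrative placeholder
    print("match" if versions_match(toolkit, runtime) else "MISMATCH")
```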

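Clone the example on the board, transfer the RKNN models converted earlier to the board, and place them in the example's model directory. For more RKNN examples, see rknn_model_zoo.

# Clone the example on the board
git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git
cd lubancat_ai_manual_code/example/ppocrv4/cpp

# Build the example directly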
cat@lubancat:xxx/cpp$ ./build-linux.sh -t rk3588
./build-linux.sh -t rk3588
===================================
TARGET_SOC=rk3588
INSTALL_DIR=/home/cat/yolov8/rknn_model_zoo/examples/PPOCR/PPOCR-System/cpp/install/rk3588_linux
BUILD_DIR=/home/cat/yolov8/rknn_model_zoo/examples/PPOCR/PPOCR-System/cpp/build/build_rk3588_linux
ENABLE_DMA32=OFF
ENABLE_ZERO_COPY=OFF
CC=aarch64-linux-gnu-gcc
CXX=aarch64-linux-gnu-g++
===================================
-- Configuring done
-- Generating done
-- Build files have been written to: /home/cat/yolov8/rknn_model_zoo/examples/PPOCR/PPOCR-System/cpp/build/build_rk3588_linux
Scanning dependencies of target rknn_ppocr_system_demo
[ 16%] Building CXX object CMakeFiles/rknn_ppocr_system_demo.dir/rknpu2/ppocr_system.cc.o
[ 33%] Building CXX object CMakeFiles/rknn_ppocr_system_demo.dir/main.cc.o
[ 50%] Building CXX object CMakeFiles/rknn_ppocr_system_demo.dir/postprocess.cc.o
# ... (omitted)
[ 66%] Linking CXX executable rknn_ppocr_system_demo
[100%] Built target rknn_ppocr_system_demo
[100%] Built target rknn_ppocr_system_demo
Install the project...
-- Install configuration: ""
# ... (omitted)

Switch to the install/rk3588_linux directory and run the program without arguments to see its usage:

# Show the program's usage
cat@lubancat:xxx/cpp/install/rk3588_linux$ ./rknn_ppocr_system_demo
./rknn_ppocr_system_demo <det_model_path> <rec_model_path> <image_path>
or ./rknn_ppocr_system_demo <det_model_path> <cls_model_path> <rec_model_path> <image_path>

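The compiled OCR program supports two command forms. First, test with only text detection and text recognition:

# rknn_ppocr_system_demo <det_model_path> <rec_model_path> <image_path>
# In this test, rknn_ppocr_system_demo, ppocrv4_det_rk3588.rknn and ppocrv4_rec_rk3588.rknn are placed in the same directory; adjust the model paths to your setup.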
cat@lubancat:xxx/cpp/install/rk3588_linux$ ./rknn_ppocr_system_demo ppocrv4_det_rk3588.rknn ppocrv4_rec_rk3588.rknn ./model/test.jpg
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 480, 480, 3], n_elems=691200, size=691200, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-14, scale=0.018658
output tensors:
index=0, name=sigmoid_0.tmp_0, n_dims=4, dims=[1, 1, 480, 480], n_elems=230400, size=230400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
model is NHWC input fmt
model input height=480, width=480, channel=3
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 48, 320, 3], n_elems=46080, size=92160, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=softmax_11.tmp_0, n_dims=3, dims=[1, 40, 6625, 0], n_elems=265000, size=530000, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
model is NHWC input fmt
model input height=48, width=320, channel=3
# ... (omitted)
[4] @ [(25, 179), (298, 177), (300, 194), (26, 195)]
regconize result: 【品牌】：代加工方式/OEMODM, score=0.706945
[5] @ [(25, 209), (235, 209), (235, 227), (25, 227)]
regconize result: 【品名】：纯臻营养护发素, score=0.710571
[6] @ [(25, 240), (241, 240), (241, 258), (25, 258)]
regconize result: 【产品编号】：YM-X-3011, score=0.708099
[7] @ [(413, 233), (429, 233), (429, 303), (413, 303)]
regconize result: ODMOEM, score=0.707845
[8] @ [(25, 270), (180, 270), (180, 289), (25, 289)]
regconize result: 【净含量】：220ml, score=0.709872
[9] @ [(27, 304), (251, 304), (251, 320), (27, 320)]
regconize result: 【适用人群】：适合所有肤质, score=0.702712
[10] @ [(25, 334), (342, 334), (342, 352), (25, 352)]
regconize result: 【主要成分】：鲸蜡硬脂醇、燕麦β-葡聚, score=0.693295
[11] @ [(27, 364), (282, 365), (281, 384), (26, 382)]
regconize result: 糖、椰油酰胺丙基甜菜碱、泛醒, score=0.687570
[12] @ [(368, 368), (476, 368), (476, 387), (368, 387)]
regconize result: （成品包材）, score=0.711100
[13] @ [(25, 397), (361, 395), (362, 413), (26, 415)]
regconize result: 【主要功能】：可紧致头发磷层，从而达到, score=0.710629
[14] @ [(27, 428), (369, 428), (369, 445), (27, 445)]
regconize result: 即时持久改善头发光泽的效果，给干燥的头, score=0.710064
[15] @ [(27, 459), (136, 459), (136, 478), (27, 478)]
regconize result: 发足够的滋养, score=0.711751

The test prints the recognized text and its score for each detected box, and the result image is saved in the current directory.
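
If the results are needed by another program, the printed lines can be turned into structured records. The following sketch assumes the two-line format shown in the log above: a `[index] @ [four corner points]` line followed by a `regconize result: ..., score=...` line (the spelling `regconize` is what the demo prints). The function name `parse_results` and the record layout are choices made here for illustration.

```python
import re

BOX_RE = re.compile(r"^\[(\d+)\] @ \[(.*)\]$")
POINT_RE = re.compile(r"\((-?\d+), (-?\d+)\)")
# Spelling "regconize" matches the demo's actual output.
TEXT_RE = re.compile(r"^regconize result: (.*), score=([0-9.]+)$")


def parse_results(log):
    """Pair each '[i] @ [...]' box line with the result line that follows it."""
    records, current = [], None
    for line in log.splitlines():
        line = line.strip()
        box = BOX_RE.match(line)
        if box:
            points = [(int(x), int(y)) for x, y in POINT_RE.findall(box.group(2))]
            current = {"index": int(box.group(1)), "box": points}
            continue
        text = TEXT_RE.match(line)
        if text and current is not None:
            current["text"] = text.group(1)
            current["score"] = float(text.group(2))
            records.append(current)
            current = None
    return records


# Two entries copied from the log above.
SAMPLE = """\
[8] @ [(25, 270), (180, 270), (180, 289), (25, 289)]
regconize result: 【净含量】：220ml, score=0.709872
[15] @ [(27, 459), (136, 459), (136, 478), (27, 478)]
regconize result: 发足够的滋养, score=0.711751
"""

if __name__ == "__main__":
    for r in parse_results(SAMPLE):
        print(r["index"], r["text"], r["score"], r["box"])
```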

The other command form adds text direction classification, so each text box is recognized according to its orientation.

# rknn_ppocr_system_demo <det_model_path> <cls_model_path> <rec_model_path> <image_path>
cat@lubancat:xxx/cpp/install/rk3588_linux$ ./rknn_ppocr_system_demo ppocrv4_det_rk3588.rknn ch_ppocr_mobile_v2.0_cls_rk3588.rknn ppocrv4_rec_rk3588.rknn ./11.jpg
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 480, 480, 3], n_elems=691200, size=691200, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-14, scale=0.018658
output tensors:
index=0, name=sigmoid_0.tmp_0, n_dims=4, dims=[1, 1, 480, 480], n_elems=230400, size=230400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
model is NHWC input fmt
model input height=480, width=480, channel=3
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 48, 192, 3], n_elems=27648, size=55296, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=softmax_0.tmp_0, n_dims=2, dims=[1, 2, 0, 0], n_elems=2, size=4, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
model is NHWC input fmt
model input height=48, width=192, channel=3
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 48, 320, 3], n_elems=46080, size=92160, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=softmax_11.tmp_0, n_dims=3, dims=[1, 40, 6625, 0], n_elems=265000, size=530000, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
model is NHWC input fmt
model input height=48, width=320, channel=3
# ... (omitted)
[5] @ [(110, 168), (135, 167), (136, 469), (112, 470)]
regconize result: （45元/每公斤，100公斤起订）, score=0.668917
[6] @ [(35, 195), (72, 195), (72, 471), (35, 471)]
regconize result: 纯臻营养护发素, score=0.710519
[7] @ [(177, 197), (200, 198), (197, 470), (175, 469)]
regconize result: （品牌】：代加工方式/OEMODM, score=0.700224
[8] @ [(141, 208), (168, 208), (168, 473), (141, 473)]
regconize result: 每瓶22元，1000瓶起订）, score=0.706473
[9] @ [(365, 211), (385, 211), (385, 472), (365, 472)]
regconize result: 糖、椰油酰胺丙基甜菜碱、泛酸, score=0.681553
[10] @ [(302, 240), (321, 240), (321, 468), (302, 468)]
regconize result: 适用人群】：适合所有肤质, score=0.709757
[11] @ [(208, 261), (230, 261), (230, 468), (208, 468)]
regconize result: 品名】：纯臻营养护发素, score=0.710094
[12] @ [(239, 259), (260, 259), (260, 468), (239, 468)]
regconize result: 产品编号】：YM-X-3011, score=0.707292
[13] @ [(82, 322), (106, 323), (104, 472), (79, 471)]
regconize result: 产品信息/参数, score=0.706752
[14] @ [(269, 315), (290, 314), (291, 467), (271, 468)]
regconize result: 净含量】：220ml, score=0.707764
[15] @ [(459, 360), (479, 361), (478, 471), (457, 470)]
regconize result: 发足够的滋养, score=0.711100

The result image is saved in the current directory.
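
The `zp` and `scale` fields in the tensor descriptions printed by the demo describe affine quantization: a stored integer q corresponds to the real value (q - zp) * scale. For the detection output (zp=-128, scale=0.003922) this maps the INT8 range [-128, 127] onto roughly [0, 1], which is consistent with a sigmoid output. The `size` field is `n_elems` multiplied by the element width (1 byte for INT8, 2 bytes for FP16). A minimal check using values copied from the logs; the helper names are chosen here for illustration:

```python
from math import prod

BYTES_PER_ELEM = {"INT8": 1, "FP16": 2}


def dequantize(q, zp, scale):
    """Affine dequantization: real value represented by the stored integer q."""
    return (q - zp) * scale


def tensor_bytes(dims, dtype):
    """Buffer size in bytes for a tensor with the given dims and element type."""
    return prod(dims) * BYTES_PER_ELEM[dtype]


if __name__ == "__main__":
    # Detection output: zp=-128, scale=0.003922
    print(dequantize(-128, -128, 0.003922))  # lowest INT8 code
    print(dequantize(127, -128, 0.003922))   # highest INT8 code
    # Sizes reported in the logs
    print(tensor_bytes([1, 480, 480, 3], "INT8"))  # detection input
    print(tensor_bytes([1, 48, 320, 3], "FP16"))   # recognition input
    print(tensor_bytes([1, 40, 6625], "FP16"))     # recognition output
```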

3.4. Reference links

https://github.com/PaddlePaddle/PaddleOCR

https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/PP-OCRv4_introduction.md

https://github.com/airockchip/rknn_model_zoo