5. PP-OCRv4

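PP-OCRv4 is a further upgrade of PP-OCRv3. The overall framework keeps the same pipeline as PP-OCRv3, while the detection and recognition models are optimized in several areas, including data, network structure, and training strategy. The PP-OCRv4 system diagram is shown below:

[Figure: PP-OCRv4 system diagram]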

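In terms of results, at comparable speed, accuracy improves substantially across several scenarios:

Chinese scenario: more than 4% improvement over the PP-OCRv3 Chinese model. English and digit scenario: 6% improvement over the PP-OCRv3 English model. Multilingual scenario: recognition is optimized for 80 languages, with average accuracy up by more than 8%. For more details, see PP-OCRv4_introduction.md.

This chapter briefly introduces the deployment environment, then demonstrates how to deploy and run PP-OCRv4 on a LubanCat board.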

5.1. Environment setup

On a PC, in WSL or an Ubuntu system, create a PaddleOCR environment and install the required tools.

# Create a virtual environment with conda
conda create -n PaddleOCR python=3.8
conda activate PaddleOCR

Get the PaddleOCR source code:

git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR

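# Install the dependencies
pip install -r requirements.txt

# To install toolkit2, see the earlier development environment chapter

For how to train and evaluate PP-OCRv4, see the PaddleOCR documentation.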

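5.2. Model preparation

The PP-OCRv4 models can be downloaded directly from the PaddleOCR main page:

[Figure: PP-OCRv4 model downloads on the PaddleOCR main page]

Alternatively, get them from the tutorial's companion examples, or download them directly in a terminal: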

# Download the test models
wget -c https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar
wget -c https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar
wget -c https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar

# Then extract them
tar -xvf ch_PP-OCRv4_rec_infer.tar
tar -xvf ch_PP-OCRv4_det_infer.tar
tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar

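This gives three models: a text detection model, a direction classification model, and a text recognition model:

  • Text detection model: a DB-based detection model that locates the text regions in an image;

  • Direction classification model: determines and corrects the text orientation so that the following recognition step works on upright text;

  • Text recognition model: recognizes the text inside each detected box and returns its content.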
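The way the three models hand data to each other can be sketched as below. This is only an illustration of the control flow, not the demo's actual code: `run_ocr` and all `toy_*` functions are hypothetical stand-ins, the "image" is just a list of strings, and it assumes the classifier reports either 0 or 180 degrees.

```python
def run_ocr(image, detect, recognize, classify=None, rotate=None):
    """Chain detection -> optional direction classification -> recognition."""
    results = []
    for box, crop in detect(image):
        # Only correct the crop when a classifier (and a rotate helper) is supplied.
        if classify is not None and rotate is not None and classify(crop) == 180:
            crop = rotate(crop)
        results.append((box, recognize(crop)))
    return results


# Toy stand-ins so the sketch runs without any model files.
# A line starting with "~" plays the role of a text region that is upside down.
def toy_detect(image):
    return [(index, line) for index, line in enumerate(image)]


def toy_classify(crop):
    return 180 if crop.startswith("~") else 0


def toy_rotate(crop):
    return crop[1:][::-1]


def toy_recognize(crop):
    return crop


page = ["hello", "~dlrow"]
print(run_ocr(page, toy_detect, toy_recognize))
print(run_ocr(page, toy_detect, toy_recognize, toy_classify, toy_rotate))
```

Without the classifier the second region is passed to recognition as-is; with it, the region is corrected first, which is the difference between the two-model and three-model setups.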

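paddle2onnx must be installed in the environment. Then convert the text detection model to an ONNX model (the detection and recognition commands below read from ./model, so move the extracted directories there or adjust the paths):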

paddle2onnx --model_dir ./model/ch_PP-OCRv4_det_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./model/ch_PP-OCRv4_det_infer/model.onnx \
--opset_version 12 \
--enable_dev_version True

# Set a fixed input shape
python -m paddle2onnx.optimize --input_model model/ch_PP-OCRv4_det_infer/model.onnx \
                            --output_model model/ch_PP-OCRv4_det_infer/ppocrv4_det.onnx \
                            --input_shape_dict "{'x':[1,3,480,480]}"

Convert the text recognition model to an ONNX model and fix its input dimensions:

paddle2onnx --model_dir ./model/ch_PP-OCRv4_rec_infer \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./model/ch_PP-OCRv4_rec_infer/model.onnx \
--opset_version 12 \
--enable_dev_version True

# Set a fixed input shape
python -m paddle2onnx.optimize --input_model model/ch_PP-OCRv4_rec_infer/model.onnx \
                            --output_model model/ch_PP-OCRv4_rec_infer/ppocrv4_rec.onnx \
                            --input_shape_dict "{'x':[1,3,48,320]}"

Convert the direction classification model to an ONNX model:

# Direction classification model
paddle2onnx --model_dir ch_ppocr_mobile_v2.0_cls_infer \
            --model_filename inference.pdmodel \
            --params_filename inference.pdiparams \
            --save_file ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.onnx \
            --enable_dev_version True

# Set a fixed input shape
python -m paddle2onnx.optimize --input_model ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.onnx \
                            --output_model ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.onnx \
                            --input_shape_dict "{'x':[1,3,48,192]}"
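The three conversions above differ only in paths, opset, and input shape, so they can be generated from one table. The sketch below only builds and prints the command lines using the flags shown above; the helper names `paddle2onnx_cmd` and `fix_shape_cmd` and the `MODELS` table are illustrative, and nothing is executed.

```python
import shlex


def paddle2onnx_cmd(model_dir, save_file, opset=None):
    """Build the paddle2onnx command line for one inference model directory."""
    cmd = [
        "paddle2onnx",
        "--model_dir", model_dir,
        "--model_filename", "inference.pdmodel",
        "--params_filename", "inference.pdiparams",
        "--save_file", save_file,
    ]
    if opset is not None:
        cmd += ["--opset_version", str(opset)]
    cmd += ["--enable_dev_version", "True"]
    return cmd


def fix_shape_cmd(input_model, output_model, shape):
    """Build the paddle2onnx.optimize command that pins input 'x' to a fixed shape."""
    shape_dict = "{'x':[" + ",".join(str(d) for d in shape) + "]}"
    return [
        "python", "-m", "paddle2onnx.optimize",
        "--input_model", input_model,
        "--output_model", output_model,
        "--input_shape_dict", shape_dict,
    ]


# (model_dir, intermediate onnx, final onnx, opset, fixed NCHW input shape)
MODELS = [
    ("./model/ch_PP-OCRv4_det_infer", "model.onnx", "ppocrv4_det.onnx", 12, (1, 3, 480, 480)),
    ("./model/ch_PP-OCRv4_rec_infer", "model.onnx", "ppocrv4_rec.onnx", 12, (1, 3, 48, 320)),
    ("ch_ppocr_mobile_v2.0_cls_infer", "ch_ppocr_mobile_v2.0_cls_infer.onnx",
     "ch_ppocr_mobile_v2.0_cls_infer.onnx", None, (1, 3, 48, 192)),
]

for model_dir, raw, final, opset, shape in MODELS:
    print(shlex.join(paddle2onnx_cmd(model_dir, f"{model_dir}/{raw}", opset)))
    print(shlex.join(fix_shape_cmd(f"{model_dir}/{raw}", f"{model_dir}/{final}", shape)))
```

To actually run a command, each list could be passed to `subprocess.run(cmd, check=True)` inside the PaddleOCR environment.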

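5.2.1. RKNN model

This tutorial tests deployment on the NPU of the LubanCat-4, so the models must also be converted to RKNN models with toolkit2. For the conversion scripts, see the tutorial's companion code repository. If you are not testing on a LubanCat-4 (rk3588 series), change platform = "rk3588" in the scripts.

# Get the scripts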
git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git
cd lubancat_ai_manual_code/example/ppocrv4

# In the toolkit2 environment, convert the text recognition model
(toolkit2_1.6) llh@YH-LONG:xxx/pp-ocrv4$ python convert_rec.py
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
--> Config model
done
--> Loading model
W load_onnx: It is recommended onnx opset 19, but your onnx model opset is 12!
Loading : 100%|███████████████████████████████████████████████| 420/420 [00:00<00:00, 111961.85it/s]
W load_onnx: The config.mean_values is None, zeros will be set for input 0!
W load_onnx: The config.std_values is None, ones will be set for input 0!
done
--> Building model
done
--> Export rknn model
done

# Convert the text detection model
(toolkit2_1.6) llh@YH-LONG:xxx/pp-ocrv4$ python convert_det.py
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
--> Config model
done
--> Loading model
W load_onnx: It is recommended onnx opset 19, but your onnx model opset is 12!
Loading : 100%|███████████████████████████████████████████████| 342/342 [00:00<00:00, 137123.79it/s]
done
--> Building model
W build: found outlier value, this may affect quantization accuracy
const name          abs_mean    abs_std     outlier value
conv2d_398.w_0      6.53        8.93        58.173
conv2d_402.w_0      2.41        3.73        34.766
# omitted
conv2d_421.w_0      0.08        0.09        11.623
GraphPreparing : 100%|██████████████████████████████████████████| 214/214 [00:00<00:00, 5121.45it/s]
Quantizating : 100%|██████████████████████████████████████████████| 214/214 [00:05<00:00, 37.25it/s]
W build: The default input dtype of 'x' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of 'sigmoid_0.tmp_0' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
done
--> Export rknn model
done

# Convert the direction classification model
(toolkit2_1.6) llh@YH-LONG:xxx/pp-ocrv4$ python convert_cls.py
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
--> Config model
done
--> Loading model
W load_onnx: It is recommended onnx opset 19, but your onnx model opset is 9!
Loading : 100%|███████████████████████████████████████████████| 250/250 [00:00<00:00, 151681.76it/s]
W load_onnx: The config.mean_values is None, zeros will be set for input 0!
W load_onnx: The config.std_values is None, ones will be set for input 0!
done
--> Building model
done
--> Export rknn model
done

The generated RKNN models are all saved in the model directory.
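For orientation, the four stages printed in the logs above (Config, Loading, Building, Export) correspond to the usual rknn-toolkit2 call sequence. The following is a minimal sketch under stated assumptions, not the repository's convert_*.py: the `rknn_name` naming helper, the `convert` wrapper, and the `dataset.txt` calibration list are hypothetical, and the real scripts may set additional options such as mean and std values.

```python
from pathlib import Path


def rknn_name(onnx_path, platform):
    """Hypothetical naming helper: ppocrv4_det.onnx -> ppocrv4_det_rk3588.rknn."""
    return f"{Path(onnx_path).stem}_{platform}.rknn"


def convert(onnx_path, out_dir="model", platform="rk3588", quantize=False, dataset=None):
    """Convert one fixed-shape ONNX model to RKNN (requires rknn-toolkit2)."""
    from rknn.api import RKNN  # imported lazily so the helper above works without the toolkit

    rknn = RKNN()
    try:
        # --> Config model
        rknn.config(target_platform=platform)
        # --> Loading model
        if rknn.load_onnx(model=onnx_path) != 0:
            raise RuntimeError("load_onnx failed")
        # --> Building model (quantization needs a calibration image list)
        if rknn.build(do_quantization=quantize, dataset=dataset) != 0:
            raise RuntimeError("build failed")
        # --> Export rknn model
        out_path = str(Path(out_dir) / rknn_name(onnx_path, platform))
        if rknn.export_rknn(out_path) != 0:
            raise RuntimeError("export_rknn failed")
        return out_path
    finally:
        rknn.release()


# Example calls (commented out because they need the toolkit and the ONNX files):
# convert("model/ch_PP-OCRv4_det_infer/ppocrv4_det.onnx", quantize=True, dataset="dataset.txt")
# convert("model/ch_PP-OCRv4_rec_infer/ppocrv4_rec.onnx")
print(rknn_name("model/ch_PP-OCRv4_det_infer/ppocrv4_det.onnx", "rk3588"))
```

The `quantize=True` choice for detection only mirrors what the logs suggest: the detection build prints a quantization pass and INT8 input and output types, while the recognition and classification builds do not.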

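5.3. Deployment and testing on the board

Tip

When testing, make sure the rknn-Toolkit2 version matches the version of the librknnrt.so library used by the inference program.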
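One way to make that comparison mechanical is to extract the major.minor.patch triple from each version string and compare the triples. In the sketch below, the toolkit line is copied from the conversion log earlier in this chapter; the runtime line is an assumed format used only for illustration, so substitute whatever string your librknnrt.so actually reports.

```python
import re


def core_version(text):
    """Return the first major.minor.patch triple found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def versions_match(toolkit_line, runtime_line):
    """Compare only the numeric version, ignoring build hashes and dates."""
    return core_version(toolkit_line) == core_version(runtime_line)


toolkit_line = "W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d"
runtime_line = "librknnrt version: 1.6.0 (build info)"  # assumed format, not taken from the source

print(core_version(toolkit_line), versions_match(toolkit_line, runtime_line))
```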

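Clone the example on the board, transfer the RKNN models converted earlier to the board, and place them in the example's model directory. For more RKNN examples, see rknn_model_zoo.

# Clone the example on the board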
git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git
cd lubancat_ai_manual_code/example/ppocrv4/cpp

# Build the example directly
cat@lubancat:xxx/cpp$ ./build-linux.sh -t rk3588
./build-linux.sh -t rk3588
===================================
TARGET_SOC=rk3588
INSTALL_DIR=/home/cat/yolov8/rknn_model_zoo/examples/PPOCR/PPOCR-System/cpp/install/rk3588_linux
BUILD_DIR=/home/cat/yolov8/rknn_model_zoo/examples/PPOCR/PPOCR-System/cpp/build/build_rk3588_linux
ENABLE_DMA32=OFF
ENABLE_ZERO_COPY=OFF
CC=aarch64-linux-gnu-gcc
CXX=aarch64-linux-gnu-g++
===================================
-- Configuring done
-- Generating done
-- Build files have been written to: /home/cat/yolov8/rknn_model_zoo/examples/PPOCR/PPOCR-System/cpp/build/build_rk3588_linux
Scanning dependencies of target rknn_ppocr_system_demo
[ 16%] Building CXX object CMakeFiles/rknn_ppocr_system_demo.dir/rknpu2/ppocr_system.cc.o
[ 33%] Building CXX object CMakeFiles/rknn_ppocr_system_demo.dir/main.cc.o
[ 50%] Building CXX object CMakeFiles/rknn_ppocr_system_demo.dir/postprocess.cc.o
# omitted....
[ 66%] Linking CXX executable rknn_ppocr_system_demo
[100%] Built target rknn_ppocr_system_demo
[100%] Built target rknn_ppocr_system_demo
Install the project...
-- Install configuration: ""
# omitted....

Switch to the install/rk3588_linux directory and run the program without arguments to see its usage:

# Show the program usage
cat@lubancat:xxx/cpp/install/rk3588_linux$ ./rknn_ppocr_system_demo
./rknn_ppocr_system_demo <det_model_path> <rec_model_path> <image_path>
or ./rknn_ppocr_system_demo <det_model_path> <cls_model_path> <rec_model_path> <image_path>
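The usage output above shows that the classification model, when present, goes between the detection and recognition models. If the demo is driven from a script, a small helper can keep that ordering in one place; `demo_argv` is a hypothetical name, and the default executable path assumes the script runs from the install directory.

```python
def demo_argv(det_model, rec_model, image, cls_model=None, exe="./rknn_ppocr_system_demo"):
    """Build the argument list for either of the two supported command forms."""
    argv = [exe, det_model]
    if cls_model is not None:
        argv.append(cls_model)  # optional direction classifier sits between det and rec
    argv += [rec_model, image]
    return argv


print(demo_argv("ppocrv4_det_rk3588.rknn", "ppocrv4_rec_rk3588.rknn", "./model/test.jpg"))
print(demo_argv("ppocrv4_det_rk3588.rknn", "ppocrv4_rec_rk3588.rknn", "./11.jpg",
                cls_model="ch_ppocr_mobile_v2.0_cls_rk3588.rknn"))
```

On the board, the returned list could be passed to `subprocess.run(argv, check=True)`.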

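The compiled OCR program supports two command forms. First, test using only text detection and text recognition:

# rknn_ppocr_system_demo <det_model_path> <rec_model_path> <image_path>
# In this test, rknn_ppocr_system_demo, ppocrv4_det_rk3588.rknn, and ppocrv4_rec_rk3588.rknn are placed in the same directory; mind the model paths.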
cat@lubancat:xxx/cpp/install/rk3588_linux$ ./rknn_ppocr_system_demo ppocrv4_det_rk3588.rknn ppocrv4_rec_rk3588.rknn ./model/test.jpg
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 480, 480, 3], n_elems=691200, size=691200, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-14, scale=0.018658
output tensors:
index=0, name=sigmoid_0.tmp_0, n_dims=4, dims=[1, 1, 480, 480], n_elems=230400, size=230400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
model is NHWC input fmt
model input height=480, width=480, channel=3
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 48, 320, 3], n_elems=46080, size=92160, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=softmax_11.tmp_0, n_dims=3, dims=[1, 40, 6625, 0], n_elems=265000, size=530000, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
model is NHWC input fmt
model input height=48, width=320, channel=3
# omitted...........
[4] @ [(25, 179), (298, 177), (300, 194), (26, 195)]
regconize result: 【品牌】:代加工方式/OEMODM, score=0.706945
[5] @ [(25, 209), (235, 209), (235, 227), (25, 227)]
regconize result: 【品名】:纯臻营养护发素, score=0.710571
[6] @ [(25, 240), (241, 240), (241, 258), (25, 258)]
regconize result: 【产品编号】:YM-X-3011, score=0.708099
[7] @ [(413, 233), (429, 233), (429, 303), (413, 303)]
regconize result: ODMOEM, score=0.707845
[8] @ [(25, 270), (180, 270), (180, 289), (25, 289)]
regconize result: 【净含量】:220ml, score=0.709872
[9] @ [(27, 304), (251, 304), (251, 320), (27, 320)]
regconize result: 【适用人群】:适合所有肤质, score=0.702712
[10] @ [(25, 334), (342, 334), (342, 352), (25, 352)]
regconize result: 【主要成分】:鲸蜡硬脂醇、燕麦β-葡聚, score=0.693295
[11] @ [(27, 364), (282, 365), (281, 384), (26, 382)]
regconize result: 糖、椰油酰胺丙基甜菜碱、泛醒, score=0.687570
[12] @ [(368, 368), (476, 368), (476, 387), (368, 387)]
regconize result: (成品包材), score=0.711100
[13] @ [(25, 397), (361, 395), (362, 413), (26, 415)]
regconize result: 【主要功能】:可紧致头发磷层,从而达到, score=0.710629
[14] @ [(27, 428), (369, 428), (369, 445), (27, 445)]
regconize result: 即时持久改善头发光泽的效果,给干燥的头, score=0.710064
[15] @ [(27, 459), (136, 459), (136, 478), (27, 478)]
regconize result: 发足够的滋养, score=0.711751

The test result image is saved in the current directory, as shown below:

[Figure: recognition result for the test image]
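The tensor attributes printed in the log can be checked by hand, which is a quick sanity test that the fixed input shapes survived conversion. The sketch below recomputes n_elems and size from dims and type, and applies the standard affine dequantization formula real = (q - zp) * scale to the detection output. The helper names and the byte-width table are illustrative; note that the log pads dims with trailing zeros, so only the first n_dims entries are used.

```python
from math import prod

BYTES_PER_ELEMENT = {"INT8": 1, "FP16": 2}


def tensor_size(dims, dtype):
    """Return (n_elems, size in bytes) for a tensor with the given dims and type."""
    n_elems = prod(dims)
    return n_elems, n_elems * BYTES_PER_ELEMENT[dtype]


def dequantize(q, zp, scale):
    """Affine dequantization: map a quantized integer back to a real value."""
    return (q - zp) * scale


# Values taken from the log above.
print(tensor_size([1, 480, 480, 3], "INT8"))  # detection input
print(tensor_size([1, 48, 320, 3], "FP16"))   # recognition input
print(tensor_size([1, 40, 6625], "FP16"))     # recognition output (n_dims=3)

# Detection output uses zp=-128, scale=0.003922, so the INT8 range maps to about [0, 1].
print(dequantize(-128, -128, 0.003922), dequantize(127, -128, 0.003922))
```

The recomputed counts agree with the n_elems and size fields in the log, and the dequantized range matches what a sigmoid output should produce.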

The other form adds text direction classification, so each text region is recognized according to its detected orientation.

# rknn_ppocr_system_demo <det_model_path> <cls_model_path> <rec_model_path> <image_path>
cat@lubancat:xxx/cpp/install/rk3588_linux$ ./rknn_ppocr_system_demo ppocrv4_det_rk3588.rknn ch_ppocr_mobile_v2.0_cls_rk3588.rknn ppocrv4_rec_rk3588.rknn ./11.jpg
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 480, 480, 3], n_elems=691200, size=691200, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-14, scale=0.018658
output tensors:
index=0, name=sigmoid_0.tmp_0, n_dims=4, dims=[1, 1, 480, 480], n_elems=230400, size=230400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
model is NHWC input fmt
model input height=480, width=480, channel=3
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 48, 192, 3], n_elems=27648, size=55296, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=softmax_0.tmp_0, n_dims=2, dims=[1, 2, 0, 0], n_elems=2, size=4, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
model is NHWC input fmt
model input height=48, width=192, channel=3
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 48, 320, 3], n_elems=46080, size=92160, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=softmax_11.tmp_0, n_dims=3, dims=[1, 40, 6625, 0], n_elems=265000, size=530000, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
model is NHWC input fmt
model input height=48, width=320, channel=3
# omitted...........
[5] @ [(110, 168), (135, 167), (136, 469), (112, 470)]
regconize result: (45元/每公斤,100公斤起订), score=0.668917
[6] @ [(35, 195), (72, 195), (72, 471), (35, 471)]
regconize result: 纯臻营养护发素, score=0.710519
[7] @ [(177, 197), (200, 198), (197, 470), (175, 469)]
regconize result: (品牌】:代加工方式/OEMODM, score=0.700224
[8] @ [(141, 208), (168, 208), (168, 473), (141, 473)]
regconize result: 每瓶22元,1000瓶起订), score=0.706473
[9] @ [(365, 211), (385, 211), (385, 472), (365, 472)]
regconize result: 糖、椰油酰胺丙基甜菜碱、泛酸, score=0.681553
[10] @ [(302, 240), (321, 240), (321, 468), (302, 468)]
regconize result: 适用人群】:适合所有肤质, score=0.709757
[11] @ [(208, 261), (230, 261), (230, 468), (208, 468)]
regconize result: 品名】:纯臻营养护发素, score=0.710094
[12] @ [(239, 259), (260, 259), (260, 468), (239, 468)]
regconize result: 产品编号】:YM-X-3011, score=0.707292
[13] @ [(82, 322), (106, 323), (104, 472), (79, 471)]
regconize result: 产品信息/参数, score=0.706752
[14] @ [(269, 315), (290, 314), (291, 467), (271, 468)]
regconize result: 净含量】:220ml, score=0.707764
[15] @ [(459, 360), (479, 361), (478, 471), (457, 470)]
regconize result: 发足够的滋养, score=0.711100

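The test result image is saved in the current directory, as shown below:

[Figure: recognition result with direction classification enabled]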
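If the recognition results need further processing, the demo's console output can be parsed into records. The sketch below assumes the two-line format seen in the logs above (a box line followed by a result line) and keeps the program's own spelling "regconize"; `parse_results` is a hypothetical helper, and the sample text is copied from the first test run.

```python
import re

BOX_RE = re.compile(r"^\[(\d+)\] @ \[(.*)\]$")
POINT_RE = re.compile(r"\((-?\d+), (-?\d+)\)")
# "regconize" is spelled exactly as the demo prints it.
TEXT_RE = re.compile(r"^regconize result: (.*), score=([0-9.]+)$")


def parse_results(log):
    """Pair each box line with the result line that follows it."""
    results = []
    current = None
    for raw_line in log.splitlines():
        line = raw_line.strip()
        box_match = BOX_RE.match(line)
        if box_match:
            points = [(int(x), int(y)) for x, y in POINT_RE.findall(box_match.group(2))]
            current = {"index": int(box_match.group(1)), "box": points}
            continue
        text_match = TEXT_RE.match(line)
        if text_match and current is not None:
            current["text"] = text_match.group(1)
            current["score"] = float(text_match.group(2))
            results.append(current)
            current = None
    return results


sample = """\
[7] @ [(413, 233), (429, 233), (429, 303), (413, 303)]
regconize result: ODMOEM, score=0.707845
[15] @ [(27, 459), (136, 459), (136, 478), (27, 478)]
regconize result: 发足够的滋养, score=0.711751
"""

for record in parse_results(sample):
    print(record)
```

Lines that match neither pattern, such as the tensor descriptions at the top of the log, are ignored.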