4. PP-OCRv5

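PP-OCRv5 is the new-generation text recognition solution in the PP-OCR family. It targets text recognition across many scenarios and text types.

- Text types: PP-OCRv5 supports five major types, namely Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese.
- Scenarios: PP-OCRv5 recognizes text better in challenging cases such as complex Chinese and English handwriting, vertical text, and rare characters.

On an internal multi-scenario evaluation set of difficult samples, PP-OCRv5 improves end-to-end accuracy by 13 percentage points over PP-OCRv4.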


The following sections deploy PP-OCRv5_mobile on LubanCat RK-series boards.

4.1. Environment Setup and Quick Test

4.1.1. Environment Setup

On the PC, create a PaddleOCR environment and install the required tools.

# Create a virtual environment with conda
conda create -n PaddleOCR3.0 python=3.11
conda activate PaddleOCR3.0

Install PaddlePaddle 3.0. For the exact command for your system, see the PaddlePaddle Quick Install guide.

# Install PaddlePaddle 3.0 (CPU build); reference command:
(paddle3.0) llh@llh:/xxx$ python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

Install PaddleOCR 3.0.

# Install paddleocr
(paddle3.0) llh@llh:/xxx$ pip install paddleocr==3.0.0
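Before moving on, you can confirm which versions actually ended up in the environment. The check below is a sketch that uses only the standard library's `importlib.metadata`. It assumes the installed distributions are named `paddlepaddle` and `paddleocr`, as in the pip commands above. The helper names are illustrative.

```python
# Sketch: verify that the expected paddlepaddle / paddleocr versions are installed.
# Assumption: distribution names match the pip commands used in this tutorial.
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {"paddlepaddle": "3.0.0", "paddleocr": "3.0.0"}


def installed_versions(names):
    """Return {name: installed version string, or None if the package is missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected, found):
    """Return {name: (expected, found)} for every package whose version differs."""
    return {n: (v, found.get(n)) for n, v in expected.items() if found.get(n) != v}


if __name__ == "__main__":
    bad = mismatches(EXPECTED, installed_versions(EXPECTED))
    if not bad:
        print("paddlepaddle and paddleocr versions match")
    for name, (want, got) in bad.items():
        print(f"{name}: expected {want}, found {got}")
```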

4.1.2. Inference Test

Test with the PaddleOCR Python API:

from paddleocr import PaddleOCR

# Initialize a PaddleOCR instance (document orientation, unwarping and text-line orientation disabled)
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False)

# Run OCR inference on the sample image
result = ocr.predict(input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")

# Print the results, then save the visualization image and the JSON results
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")

Program output:

Creating model: ('PP-OCRv5_mobile_det', None)
Using official model (PP-OCRv5_mobile_det), the model files will be automatically downloaded and saved in /home/llh/.paddlex/official_models.
Connecting to https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv5_mobile_det_infer.tar ...
Downloading PP-OCRv5_mobile_det_infer.tar ...
[==================================================] 100.00%
Extracting PP-OCRv5_mobile_det_infer.tar
[==================================================] 100.00%
E0603 10:19:29.581522 14188 analysis_config.cc:169] Please use PaddlePaddle with GPU version.
Creating model: ('PP-OCRv5_mobile_rec', None)
Using official model (PP-OCRv5_mobile_rec), the model files will be automatically downloaded and saved in /home/llh/.paddlex/official_models.
Connecting to https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv5_mobile_rec_infer.tar ...
Downloading PP-OCRv5_mobile_rec_infer.tar ...
[==================================================] 100.00%
Extracting PP-OCRv5_mobile_rec_infer.tar
[==================================================] 100.00%
E0603 10:19:42.374616 14188 analysis_config.cc:169] Please use PaddlePaddle with GPU version.
Connecting to https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png ...
Downloading general_ocr_002.png ...
[==================================================] 100.00%
{'res': {'input_path': '/home/llh/.paddlex/predict_input/general_ocr_002.png', 'page_index': None,
'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': False}, 'doc_preprocessor_res':
{'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1},
'dt_polys': array([[[  3,  10],
        ...,
        [  4,  30]],

    ...,

    [[ 99, 456],
        ...,
        [ 99, 479]]], dtype=int16), 'text_det_params': {'limit_side_len': 736, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000,
        'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]),
        'text_rec_score_thresh': 0.0, 'rec_texts': ['www.997700', '', 'Cm', '登机牌', 'BOARDING', 'PASS', 'CLASS', '序号SERIAL NO.',
        '座位号', 'SEAT NO.', '航班FLIGHT', '日期DATE', '舱位', '', 'W', '035', '12F', 'MU2379', '03DEc', '始发地', 'FROM', '登机口',
        'GATE', '登机时间BDT', '目的地TO', '福州', 'TAIYUAN', 'G11', 'FUZHOU', '身份识别IDNO.', '姓名NAME', 'ZHANGQIWEI', '票号TKT NO.',
        '张祺伟', '票价FARE', 'ETKT7813699238489/1', '登机口于起飞前10分钟关闭 GATESCL0SE10MINUTESBEFOREDEPARTURETIME'],
        'rec_scores': array([0.67634642, ..., 0.97416109]), 'rec_polys': array([[[  3,  10],
        ...,
        [  4,  30]],

    ...,

    [[ 99, 456],
        ...,
        [ 99, 479]]], dtype=int16), 'rec_boxes': array([[  3, ...,  30],
    ...,
    [ 99, ..., 479]], dtype=int16)}}

The results are saved in the output folder under the current directory.
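The printed result above is verbose. If you only need the recognized strings, a small helper can pull them out. This sketch assumes each element of `result` behaves like a mapping that exposes the `rec_texts` and `rec_scores` keys visible in the printed output; check this against your PaddleOCR version. The function name and the 0.5 cut-off are illustrative choices, not PaddleOCR defaults.

```python
# Sketch: extract (text, score) pairs from one PaddleOCR prediction result.
# Assumption: `res` supports res["rec_texts"] / res["rec_scores"] like the printed dict.
from typing import List, Mapping, Tuple


def texts_with_scores(res: Mapping, min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Return (text, score) pairs with score >= min_score, skipping empty strings."""
    pairs = []
    for text, score in zip(res["rec_texts"], res["rec_scores"]):
        # The output above contains empty entries (''), which carry no information.
        if text and float(score) >= min_score:
            pairs.append((text, float(score)))
    return pairs
```

Usage, after `result = ocr.predict(...)`: `for res in result: print(texts_with_scores(res))`.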

Alternatively, test with the paddleocr command-line tool:

# Run PP-OCRv5 inference
(paddle3.0) llh@llh:/xxx$ paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png \
    --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False

4.2. Model Preparation

4.2.1. Converting the Models to ONNX

Install the paddle2onnx plugin:

# Install paddle2onnx
(paddle3.0) llh@llh:/xxx$ paddlex --install paddle2onnx

Running the inference test above automatically downloads the PP-OCRv5 model files to the ~/.paddlex/official_models directory. Next, convert them to ONNX with paddle2onnx:

# Convert PP-OCRv5_mobile_det
(paddle3.0) llh@llh:/xxx$ paddlex --paddle2onnx --paddle_model_dir ~/.paddlex/official_models/PP-OCRv5_mobile_det --onnx_model_dir PP-OCRv5_mobile_det
Input dir: /home/llh/.paddlex/official_models/PP-OCRv5_mobile_det
Output dir: PP-OCRv5_mobile_det
Paddle2ONNX conversion starting...
[Paddle2ONNX] Start parsing the Paddle model file...
[Paddle2ONNX] Use opset_version = 14 for ONNX export.
[Paddle2ONNX] PaddlePaddle model is exported as ONNX format now.
2025-06-02 10:47:38 [INFO]      Try to perform constant folding on the ONNX model with Polygraphy.
# ... (output omitted)
2025-06-02 10:47:44 [INFO]      ONNX model saved in PP-OCRv5_mobile_det/inference.onnx.
Paddle2ONNX conversion succeeded
Copied /home/llh/.paddlex/official_models/PP-OCRv5_mobile_det/inference.yml to PP-OCRv5_mobile_det/inference.yml
Done

# Convert PP-OCRv5_mobile_rec
(paddle3.0) llh@llh:/xxx$ paddlex --paddle2onnx --paddle_model_dir ~/.paddlex/official_models/PP-OCRv5_mobile_rec --onnx_model_dir PP-OCRv5_mobile_rec
Input dir: /home/llh/.paddlex/official_models/PP-OCRv5_mobile_rec
Output dir: PP-OCRv5_mobile_rec
Paddle2ONNX conversion starting...
[Paddle2ONNX] Start parsing the Paddle model file...
[Paddle2ONNX] Use opset_version = 14 for ONNX export.
[Paddle2ONNX] PaddlePaddle model is exported as ONNX format now.
2025-06-02 10:52:52 [INFO]      Try to perform constant folding on the ONNX model with Polygraphy.
# ... (output omitted)
[I]     Total Nodes | Original:   650, After Folding:   650 |     0 Nodes Folded
2025-06-02 10:52:55 [INFO]      ONNX model saved in PP-OCRv5_mobile_rec/inference.onnx.
Paddle2ONNX conversion succeeded
Copied /home/llh/.paddlex/official_models/PP-OCRv5_mobile_rec/inference.yml to PP-OCRv5_mobile_rec/inference.yml
Done

The converted models are saved in the PP-OCRv5_mobile_det and PP-OCRv5_mobile_rec folders under the current directory.
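If you convert the models repeatedly, you can script the two commands. The sketch below rebuilds the same `paddlex --paddle2onnx` command lines shown above. It then checks that each output folder contains the `inference.onnx` and `inference.yml` files reported in the log. The `--run` switch and the helper names are illustrative and not part of PaddleX.

```python
# Sketch: rebuild the paddle2onnx commands used above and check their outputs.
# Helper names and the --run switch are hypothetical conveniences.
import subprocess
import sys
from pathlib import Path

MODELS = ["PP-OCRv5_mobile_det", "PP-OCRv5_mobile_rec"]
MODEL_ROOT = Path.home() / ".paddlex" / "official_models"


def convert_cmd(name, model_root=MODEL_ROOT, out_root=Path(".")):
    """Build the paddlex --paddle2onnx command for one model, as in the tutorial."""
    return ["paddlex", "--paddle2onnx",
            "--paddle_model_dir", str(Path(model_root) / name),
            "--onnx_model_dir", str(Path(out_root) / name)]


def missing_outputs(out_dir):
    """List which of inference.onnx / inference.yml are absent from out_dir."""
    out_dir = Path(out_dir)
    return [f for f in ("inference.onnx", "inference.yml") if not (out_dir / f).is_file()]


if __name__ == "__main__":
    for name in MODELS:
        if "--run" in sys.argv:
            subprocess.run(convert_cmd(name), check=True)  # actually convert
        missing = missing_outputs(name)
        print(name, "OK" if not missing else f"missing: {missing}")
```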

4.2.2. Converting to RKNN Models

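This tutorial deploys the models to the NPU of a LubanCat-4, so they must also be converted to RKNN format with RKNN-Toolkit2. The conversion scripts are in the companion tutorial code repository. If your board is not a LubanCat-4 (RK3588 series), pass the corresponding platform when running the commands.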

# Fetch the example code (the repository may not always be up to date)
git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git
cd lubancat_ai_manual_code/example/ppocrv5/python

Convert the PP-OCRv5_mobile_det model. This generates PP-OCRv5_mobile_det.rknn in the current directory:

# In the toolkit2 environment
(toolkit2_2.3.2) llh@YH-LONG:xxx/$ python convert_det.py  ../PP-OCRv5_mobile_det/inference.onnx  rk3588 fp PP-OCRv5_mobile_det.rknn
I rknn-toolkit2 version: 2.3.2
--> Config model
done
--> Loading model
W load_onnx: If you don't need to crop the model, don't set 'inputs'/'input_size_list'/'outputs'!
I Loading : 100%|█████████████████████████████████████████████| 300/300 [00:00<00:00, 232071.41it/s]
done
--> Building model
W build: The dataset='' is ignored because do_quantization = False!
I OpFusing 0: 100%|██████████████████████████████████████████████| 100/100 [00:00<00:00, 719.17it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 379.09it/s]
I OpFusing 0 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 255.95it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 240.68it/s]
I OpFusing 2 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 171.35it/s]
I rknn building ...
I rknn building done.
done
--> Export rknn model
done

Convert the PP-OCRv5_mobile_rec model. This generates PP-OCRv5_mobile_rec.rknn in the current directory:

# In the toolkit2 environment, convert the text recognition model
(toolkit2_2.3.2) llh@YH-LONG:xxx/$ python convert_rec.py  ../PP-OCRv5_mobile_rec/inference.onnx  rk3588 fp PP-OCRv5_mobile_rec.rknn
I rknn-toolkit2 version: 2.3.2
--> Config model
W config: Please make sure the model can be dynamic when enable 'config.dynamic_input'!
I The 'dynamic_input' function has been enabled, the MaxShape is dynamic_input[2] = [[1, 3, 48, 640]]!
        The following functions are subject to the MaxShape:
            1. The quantified dataset needs to be configured according to MaxShape
            2. The eval_perf or eval_memory return the results of MaxShape
done
--> Loading model
# ... (output omitted)
E RKNN: [09:15:31.056] channel is too large, may produce thousands of regtask, fallback to cpu!
E RKNN: [09:15:31.056] channel is too large, may produce thousands of regtask, fallback to cpu!
E RKNN: [09:15:31.081] channel is too large, may produce thousands of regtask, fallback to cpu!
E RKNN: [09:15:31.812] channel is too large, may produce thousands of regtask, fallback to cpu!
E RKNN: [09:15:31.812] channel is too large, may produce thousands of regtask, fallback to cpu!
E RKNN: [09:15:31.812] channel is too large, may produce thousands of regtask, fallback to cpu!
E RKNN: [09:15:31.812] channel is too large, may produce thousands of regtask, fallback to cpu!
E RKNN: [09:15:31.836] channel is too large, may produce thousands of regtask, fallback to cpu!
I rknn building done.
done
--> Export rknn model
done
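The repository's convert_det.py and convert_rec.py are not reproduced here. For orientation, the following hypothetical sketch shows the standard RKNN-Toolkit2 flow (`config`, `load_onnx`, `build`, `export_rknn`) that produces the log stages seen above. It rests on several assumptions:

- The detection input `x` is pinned to [1, 3, 480, 480]. This is the shape the board reports later, and the `load_onnx` warning in the det log suggests the inputs were set explicitly.
- The recognition model uses the dynamic_input list given below.
- No quantization is applied, matching the `fp` argument.
- Mean/std normalization is left out, because this tutorial does not say where it happens.

```python
# Hypothetical RKNN-Toolkit2 conversion sketch; NOT the repository's convert_det.py / convert_rec.py.
# Assumptions: det input "x" pinned to [1, 3, 480, 480]; rec uses the tutorial's dynamic_input list;
# no quantization; mean/std normalization deliberately not configured here.

# The three recognition input shapes (NCHW) listed in the tutorial.
REC_DYNAMIC_INPUT = [[[1, 3, 48, 160]], [[1, 3, 48, 320]], [[1, 3, 48, 640]]]


def rknn_settings(kind, platform):
    """Return (kwargs for RKNN.config, kwargs for RKNN.load_onnx) for 'det' or 'rec'."""
    if kind == "det":
        # Fix the detection input to 480x480, the shape the board reports for the det model.
        return ({"target_platform": platform},
                {"inputs": ["x"], "input_size_list": [[1, 3, 480, 480]]})
    if kind == "rec":
        # Let the runtime choose among the three widths at inference time.
        return ({"target_platform": platform, "dynamic_input": REC_DYNAMIC_INPUT}, {})
    raise ValueError(f"unknown model kind: {kind!r}")


def convert(kind, onnx_path, platform, output_path):
    """Convert one ONNX model to .rknn without quantization."""
    from rknn.api import RKNN  # imported lazily so rknn_settings() works without the toolkit

    config_kwargs, load_kwargs = rknn_settings(kind, platform)
    rknn = RKNN(verbose=False)
    try:
        print("--> Config model")
        rknn.config(**config_kwargs)
        print("--> Loading model")
        if rknn.load_onnx(model=onnx_path, **load_kwargs) != 0:
            raise RuntimeError("load_onnx failed")
        print("--> Building model")
        if rknn.build(do_quantization=False) != 0:
            raise RuntimeError("build failed")
        print("--> Export rknn model")
        if rknn.export_rknn(output_path) != 0:
            raise RuntimeError("export_rknn failed")
    finally:
        rknn.release()
```

Example call in the toolkit2 environment: `convert("det", "../PP-OCRv5_mobile_det/inference.onnx", "rk3588", "PP-OCRv5_mobile_det.rknn")`. The settings are kept in a separate pure function so they can be inspected without the toolkit installed.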

The PP-OCRv5_mobile_rec model in this tutorial is converted with dynamic_input enabled, so it accepts three input shapes:

dynamic_input = [
    [[1,3,48,160]],
    [[1,3,48,320]],
    [[1,3,48,640]],
]

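The deployment program that follows picks the input shape from the width of the input image:

- Width greater than 480: [1,3,48,640].
- Width between 240 and 480: [1,3,48,320].
- Width less than 240: [1,3,48,160].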
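The width rule above can be written as a small function. This sketch follows the stated thresholds, with two assumptions:

- A width of exactly 240 or 480 falls into the smaller bucket.
- The width being compared is that of the text-line crop after scaling it to height 48 with its aspect ratio kept, which is the usual PaddleOCR recognizer preprocessing. The C++ demo may measure it differently.

In this sketch a crop wider than its bucket is squeezed to the bucket width, and a narrower one would be padded on the right.

```python
# Sketch of the rec input-shape selection described above (thresholds from the tutorial;
# boundary handling and "width after scaling to height 48" are assumptions).
import math


def select_rec_width(width: int) -> int:
    """Map a crop width to one of the three dynamic_input widths."""
    if width > 480:
        return 640
    if width > 240:
        return 320
    return 160


def rec_input_plan(crop_h: int, crop_w: int, target_h: int = 48):
    """Return (NCHW input shape, width to resize the crop to) for one text-line crop."""
    # Scale the crop to the model height while keeping its aspect ratio.
    scaled_w = max(1, math.ceil(target_h * crop_w / crop_h))
    bucket = select_rec_width(scaled_w)
    # Wider crops are squeezed into the bucket; narrower ones get right-padded up to it.
    return [1, 3, target_h, bucket], min(scaled_w, bucket)
```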

4.3. Deployment and Testing on the Board

On the board, clone the example code. Then copy the RKNN models converted above to the board and place them in the example's model directory.

# Fetch the example code (the repository may not always be up to date)
git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git
cd lubancat_ai_manual_code/example/ppocrv5/cpp


# Build the example
cat@lubancat:xxx/cpp$ ./build-linux.sh -t rk3588
./build-linux.sh -t rk3588
===================================
TARGET_SOC=rk3588
INSTALL_DIR=/xxx/ppocrv5/cpp/install/rk3588_linux
BUILD_DIR=/xxx/ppocrv5/cpp/build/build_rk3588_linux
CC=aarch64-linux-gnu-gcc
CXX=aarch64-linux-gnu-g++
===================================
-- Configuring done
-- Generating done
-- Build files have been written to: /xxx/ppocrv5/cpp/build/build_rk3588_linux
[ 15%] Built target imageutils
[ 30%] Built target imagedrawing
# ... (output omitted)
[ 84%] Built target audioutils
[100%] Built target rknn_ppocrv5_demo

Change to the install/rk3588_linux directory and run the program:

# Download a test image
wget -c https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png


# Usage: ./rknn_ppocrv5_demo <det_model_path> <rec_model_path> <image_path>
cat@lubancat:~/xxx/install/rk3588_linux$ ./rknn_ppocrv5_demo ./model/PP-OCRv5_mobile_det.rknn \
    ./model/PP-OCRv5_mobile_rec.rknn  ./general_ocr_002.png
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 480, 480, 3], n_elems=691200, size=1382400, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=fetch_name_0, n_dims=4, dims=[1, 1, 480, 480], n_elems=230400, size=460800, fmt=NCHW, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
model is NHWC input fmt
model input height=480, width=480, channel=3
model input num: 1, output num: 1
input tensors:
index=0, name=x, n_dims=4, dims=[1, 48, 640, 3], n_elems=92160, size=184320, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=fetch_name_0, n_dims=3, dims=[1, 80, 18385, 0], n_elems=1470800, size=2941600, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
dynamic inputs shape range:
index=0, name=x, shape_number=3, range=[[1, 48, 640, 3],[1, 48, 320, 3],[1, 48, 160, 3]], fmt = NHWC
model is NHWC input fmt
model input height=48, width=640, channel=3
rga_api version 1.10.1_[0]
[Info] num_contours=33
[Info] results->count: [28]
# ... (output omitted)
regconize result: 航班FLIGHT日期DATE舱位CLASS, score=0.967866
[5] @ [(505, 128), (571, 128), (571, 155), (505, 155)]
regconize result: 035, score=0.998698
[6] @ [(720, 124), (832, 124), (832, 154), (720, 154)]
regconize result: 12F6, score=0.775238
[7] @ [(76, 139), (326, 134), (328, 157), (78, 162)]
regconize result: MU237903DEC, score=0.989524
[8] @ [(405, 134), (429, 134), (429, 155), (405, 155)]
regconize result: W, score=0.987304
[9] @ [(337, 171), (470, 171), (470, 195), (337, 195)]
regconize result: 始发地FROM, score=0.992745
[10] @ [(485, 172), (614, 170), (616, 193), (487, 194)]
regconize result: 登机口GATE, score=0.997628
[11] @ [(673, 166), (813, 162), (815, 188), (675, 191)]
regconize result: 登机时间、BDT, score=0.958984
[12] @ [(61, 178), (171, 174), (173, 200), (63, 203)]
regconize result: 目的地TO, score=0.956347
[13] @ [(87, 205), (171, 201), (175, 226), (91, 231)]
regconize result: 福州, score=0.999755
[14] @ [(336, 217), (479, 214), (481, 235), (337, 239)]
regconize result: TAIYUAN, score=0.997698
[15] @ [(483, 212), (556, 212), (556, 235), (483, 235)]
regconize result: LG11, score=0.861450
[16] @ [(85, 226), (205, 226), (205, 251), (85, 251)]
regconize result: FUZHOU, score=0.996338
[17] @ [(339, 238), (481, 234), (483, 256), (341, 259)]
regconize result: 身份识别IDNO, score=0.943787
[18] @ [(65, 249), (175, 249), (175, 271), (65, 271)]
regconize result: 姓名NAME, score=0.993164
[19] @ [(72, 276), (263, 270), (265, 294), (74, 300)]
regconize result: ZHANGQIWEI, score=0.975732
[20] @ [(457, 295), (578, 293), (580, 315), (459, 317)]
regconize result: 票号TKTNO, score=0.986049
[21] @ [(95, 311), (210, 309), (212, 334), (97, 336)]
regconize result: 张祺伟, score=0.995768
[22] @ [(65, 341), (168, 341), (168, 365), (65, 365)]
regconize result: 票价FARE, score=0.992838
[23] @ [(345, 349), (658, 346), (660, 364), (347, 367)]
regconize result: ETKT7813699238489/1, score=0.978002
[24] @ [(97, 458), (828, 441), (830, 459), (98, 477)]
regconize result: 登机口于起飞前10分钟关闭GATESCLOSE10MINUTESBEFOREDEPARTURETIME, score=0.913095

The recognized text is shown above. The detection results are shown in the figure below:

[Figure: text detection results on general_ocr_002.png]
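The recognition model's output, as printed by the demo, has 80 time steps with a score for each of 18385 classes (dims [1, 80, 18385]). Turning that into text is typically done with greedy CTC decoding:

1. Take the best class at each step.
2. Merge consecutive repeats.
3. Drop the blank class.

The sketch below assumes the blank is class 0 and that class i (i >= 1) maps to `charset[i - 1]`, as in PaddleOCR's CTC decoder. It reports the mean probability of the kept characters as the score. This tutorial does not show the C++ demo's own post-processing, which may differ in details.

```python
# Sketch of greedy CTC decoding for one recognition output (T time steps x C classes).
# Assumptions: blank = class 0; class i >= 1 maps to charset[i - 1].
from typing import Sequence, Tuple


def ctc_greedy_decode(probs: Sequence[Sequence[float]], charset: Sequence[str],
                      blank: int = 0) -> Tuple[str, float]:
    """Return (decoded text, mean probability of the kept characters)."""
    chars, scores = [], []
    prev = blank
    for step in probs:
        # Best class at this time step.
        idx = max(range(len(step)), key=step.__getitem__)
        # Keep it only if it is not blank and not a repeat of the previous step.
        if idx != blank and idx != prev:
            chars.append(charset[idx - 1])
            scores.append(step[idx])
        prev = idx  # a blank between two equal labels lets both survive
    score = sum(scores) / len(scores) if scores else 0.0
    return "".join(chars), score
```

For the real model, `probs` would be the 80 rows of the output tensor, and `charset` the character dictionary that accompanies the model. Loading that dictionary is not shown here.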