4. Resnet18 Image Classification

This chapter uses Resnet18, a model pretrained on the ImageNet dataset, for image classification.

4.1. Environment Setup

For setting up the TPU-MLIR environment, refer to the previous chapters.

Create a working directory and download the Resnet18 model file:

# Create a resnet18 directory under the workspace
root@cfa2f8976af9:/workspace# mkdir resnet18 && cd resnet18
root@cfa2f8976af9:/workspace/resnet18#

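# Download the Resnet18 ONNX model archive (from the ONNX Model Zoo) and extract it
root@cfa2f8976af9:/workspace/resnet18# wget https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet18-v1-7.tar.gz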
root@cfa2f8976af9:/workspace/resnet18# tar xf resnet18-v1-7.tar.gz

Copy the dataset used for quantization calibration into the resnet18 working directory (optional):

# Clone the tpu-mlir source (skip this if you have already cloned it)
root@cfa2f8976af9:/workspace# git clone https://github.com/milkv-duo/tpu-mlir.git
root@cfa2f8976af9:/workspace# cd resnet18
root@cfa2f8976af9:/workspace/resnet18#  cp -rf ../tpu-mlir/regression/dataset/ILSVRC2012/ ./

4.2. Model Conversion

4.2.1. Converting ONNX to MLIR

Use the model_transform tool (model_transform.py) to convert the model to MLIR.

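# model_def specifies the model definition file, input_shapes the input shape,
# mean and scale the mean and normalization parameters, pixel_format the input image format (default bgr),
# and resize_dims the size the image is resized to before it is cropped to the model input size.
# test_input specifies the image used for verification, test_result the verification result file, and mlir the name and path of the output MLIR file.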
root@cfa2f8976af9:/workspace/resnet18# model_transform.py \
                                        --model_name resnet18 \
                                        --model_def ./resnet18-v1-7/resnet18-v1-7.onnx \
                                        --test_input ./cat.jpg \
                                        --input_shapes [[1,3,224,224]] \
                                        --resize_dims 256,256 \
                                        --mean 123.675,116.28,103.53 \
                                        --scale 0.0171,0.0175,0.0174 \
                                        --pixel_format rgb \
                                        --test_result resnet18_top_outputs.npz \
                                        --mlir resnet18.mlir
# ... (output omitted)
[flatten_170_Flatten             ]        CLOSE [PASSED]
    (1, 512) float32
    close order            = 4
[resnetv15_dense0_fwd_Gemm       ]        CLOSE [PASSED]
    (1, 1000) float32
    close order            = 3
32 compared
32 passed
0 equal, 5 close, 27 similar
0 failed
0 not equal, 0 not similar
min_similiarity = (0.9999998211860657, 0.9999992583105448, 121.27786636352539)
Target    resnet18_top_outputs.npz
Reference resnet18_ref_outputs.npz
npz compare PASSED.
compare resnetv15_dense0_fwd_Gemm: 100%|███████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 35.89it/s]
[Success]: npz_tool.py compare resnet18_top_outputs.npz resnet18_ref_outputs.npz --tolerance 0.99,0.99 --except - -vv

Resnet18 preprocessing works as follows: the image is first resized to 256x256, then cropped to 224x224, then normalized with mean=255*[0.485,0.456,0.406] and std=255*[0.229,0.224,0.225], and finally converted from HWC to CHW layout.

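model_transform applies (x - mean) * scale to each pixel, so scale is the reciprocal of std. Therefore the mean and scale parameters of model_transform are set to 123.675,116.28,103.53 and 0.0171,0.0175,0.0174 respectively, and resize_dims is set to 256,256.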
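The parameter values follow directly from the ImageNet constants. Assuming model_transform computes (x - mean) * scale on 0-255 pixel values, mean is 255 * mean01 and scale is 1 / (255 * std01). The sketch below derives them; derive_norm_params and print_norm_params are illustrative names, not part of TPU-MLIR.

```cpp
#include <array>
#include <cstdio>

// ImageNet normalization constants in the [0, 1] range (from the text above).
constexpr std::array<double, 3> kMean01 = {0.485, 0.456, 0.406};
constexpr std::array<double, 3> kStd01 = {0.229, 0.224, 0.225};

struct NormParams {
    std::array<double, 3> mean;   // value for --mean
    std::array<double, 3> scale;  // value for --scale
};

// Assumes the tool computes (x - mean) * scale on raw 0..255 pixels, so:
//   mean  = 255 * mean01
//   scale = 1 / std = 1 / (255 * std01)
NormParams derive_norm_params() {
    NormParams p{};
    for (int c = 0; c < 3; ++c) {
        p.mean[c] = 255.0 * kMean01[c];
        p.scale[c] = 1.0 / (255.0 * kStd01[c]);
    }
    return p;
}

// Prints the values in the comma-separated form used on the command line,
// with scale rounded to 4 decimal places.
void print_norm_params(const NormParams& p) {
    std::printf("--mean %g,%g,%g --scale %.4f,%.4f,%.4f\n",
                p.mean[0], p.mean[1], p.mean[2],
                p.scale[0], p.scale[1], p.scale[2]);
}
```

Calling print_norm_params(derive_norm_params()) should print values matching the --mean and --scale arguments used above; the small differences from the exact reciprocals come only from rounding scale to four decimals.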

After the command runs, files such as resnet18.mlir, resnet18_in_f32.npz and resnet18_top_f32_all_origin_weight.npz are generated in the current directory.

4.2.2. INT8 Quantized Model

Use the run_calibration tool (run_calibration.py) to generate the calibration table required for quantization, using 100 images from ILSVRC2012:

root@cfa2f8976af9:/workspace/resnet18# run_calibration.py resnet18.mlir \
                                            --dataset ./ILSVRC2012 \
                                            --input_num 100 \
                                            -o resnet18_cali_table
TPU-MLIR v1.8.1-20240712
GmemAllocator use FitFirstAssign
reused mem is 1003520, all mem is 3439080
2024/07/18 11:18:44 - INFO :
load_config Preprocess args :
        resize_dims           : [256, 256]
        keep_aspect_ratio     : False
        keep_ratio_mode       : letterbox
        pad_value             : 0
        pad_type              : center
        input_dims            : [224, 224]
        --------------------------
        mean                  : [123.675, 116.28, 103.53]
        scale                 : [0.0171, 0.0175, 0.0174]
        --------------------------
        pixel_format          : rgb
        channel_format        : nchw

last input data (idx=100) not valid, droped
input_num = 100, ref = 100
real input_num = 100
activation_collect_and_calc_th for op: resnetv15_dense0_fwd_Gemm: 100%|███████████████████████████████████████████| 33/33 [00:06<00:00,  5.22it/s]
[2048] threshold: resnetv15_dense0_fwd_Gemm: 100%|██████████████████████████████████████████████████████████████| 33/33 [00:00<00:00, 703.48it/s]
[2048] threshold: resnetv15_dense0_fwd_Gemm: 100%|███████████████████████████████████████████████████████████| 33/33 [00:00<00:00, 73.88it/s]
GmemAllocator use FitFirstAssign
reused mem is 1003520, all mem is 3439080
GmemAllocator use FitFirstAssign
reused mem is 1003520, all mem is 3439080
prepare data from 100
tune op: resnetv15_dense0_fwd_Gemm: 100%|███████████████████████████████████████████████████████████| 33/33 [00:08<00:00,  3.71it/s]
auto tune end, run time:9.854496955871582
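The calibration table records a threshold for each tensor (the log above shows thresholds being computed and then auto-tuned). Conceptually, symmetric INT8 quantization maps a float value x to round(x * 127 / threshold), clipped to the INT8 range. The sketch below is a simplified model of that mapping, assuming a per-tensor threshold and a symmetric range of [-127, 127]; it is not TPU-MLIR's implementation, whose rounding and clipping details may differ. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Simplified per-tensor symmetric INT8 quantization.
// threshold: the absolute value range for the tensor (as recorded in a
// calibration table); |x| <= threshold maps onto [-127, 127].
int8_t quantize_sym_int8(float x, float threshold) {
    const float scale = 127.0f / threshold;
    long q = std::lround(x * scale);       // round half away from zero
    q = std::max(-127L, std::min(127L, q));  // saturate values beyond the threshold
    return static_cast<int8_t>(q);
}

// Inverse mapping back to float (loses the precision dropped by rounding).
float dequantize_sym_int8(int8_t q, float threshold) {
    return static_cast<float>(q) * threshold / 127.0f;
}
```

The threshold choice is the whole point of calibration: a threshold that is too large wastes INT8 levels on values that never occur, while one that is too small clips large activations.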

Use the model_deploy tool (model_deploy.py) to convert the MLIR model into an INT8 symmetric-quantized cvimodel:

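# In the command below, calibration_table specifies the calibration table generated above, test_input the image used for verification, and test_reference the verification result file produced earlier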
root@cfa2f8976af9:/workspace/resnet18# model_deploy.py \
                                        --mlir resnet18.mlir \
                                        --quantize INT8 \
                                        --calibration_table resnet18_cali_table \
                                        --chip cv181x \
                                        --test_input ./cat.jpg \
                                        --test_reference resnet18_top_outputs.npz \
                                        --compare_all \
                                        --fuse_preprocess \
                                        --model resnet18_int8_sym_fused.cvimodel
# ... (output omitted)
[flatten_170_Flatten             ]        EQUAL [PASSED]
    (1, 512, 1, 1) float32
[resnetv15_dense0_fwd_Gemm       ]        EQUAL [PASSED]
    (1, 1000, 1, 1) float32
[resnetv15_dense0_fwd_Gemm_f32   ]        EQUAL [PASSED]
    (1, 1000, 1, 1) float32
9 compared
9 passed
9 equal, 0 close, 0 similar
0 failed
0 not equal, 0 not similar
min_similiarity = (1.0, 1.0, inf)
Target    resnet18_cv181x_int8_sym_model_outputs.npz
Reference resnet18_cv181x_int8_sym_tpu_outputs.npz
npz compare PASSED.
compare resnetv15_dense0_fwd_Gemm_f32: 100%|███████████████████████████████████████████████| 9/9 [00:00<00:00, 47.13it/s]
[Success]: npz_tool.py compare resnet18_cv181x_int8_sym_model_outputs.npz resnet18_cv181x_int8_sym_tpu_outputs.npz --tolerance 0.99,0.90 --except - -vv

# When it finishes, the model file resnet18_int8_sym_fused.cvimodel is generated in the current directory

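The --fuse_preprocess parameter fuses the preprocessing into the model, so the cvimodel takes the resized 256x256 image directly as input; this is why the on-board run in 4.3.3 reports an input tensor data_raw with shape (1,3,256,256).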
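For reference, the sketch below shows on the CPU what the fused preprocessing is expected to compute for this model: take the 256x256 input (resize_dims), crop it to 224x224 (input_shapes), and apply (x - mean) * scale per channel. It assumes planar (CHW) uint8 RGB input and a centered crop; crop_and_normalize is a hypothetical helper, and the real fused implementation runs on the TPU and may order or combine these steps differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Center-crops a planar (CHW) 3-channel uint8 image from src_h x src_w to
// dst_h x dst_w, then normalizes each pixel as (x - mean[c]) * scale[c].
// Assumes src holds at least 3 * src_h * src_w bytes (no bounds checks).
std::vector<float> crop_and_normalize(const std::vector<uint8_t>& src,
                                      int src_h, int src_w,
                                      int dst_h, int dst_w,
                                      const std::array<float, 3>& mean,
                                      const std::array<float, 3>& scale) {
    const int off_y = (src_h - dst_h) / 2;  // top edge of the centered crop
    const int off_x = (src_w - dst_w) / 2;  // left edge of the centered crop
    std::vector<float> dst(static_cast<std::size_t>(3) * dst_h * dst_w);
    for (int c = 0; c < 3; ++c) {
        for (int y = 0; y < dst_h; ++y) {
            for (int x = 0; x < dst_w; ++x) {
                const std::size_t s =
                    (static_cast<std::size_t>(c) * src_h + (y + off_y)) * src_w + (x + off_x);
                const std::size_t d =
                    (static_cast<std::size_t>(c) * dst_h + y) * dst_w + x;
                dst[d] = (static_cast<float>(src[s]) - mean[c]) * scale[c];
            }
        }
    }
    return dst;
}
```

With src_h = src_w = 256, dst_h = dst_w = 224 and the mean/scale values passed to model_transform, the output would be the 1x3x224x224 float tensor the original (unfused) network expects.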

Tip

For the LubanCat-P1 board, set the processor parameter to --processor cv181x, or equivalently set the chip parameter to --chip cv181x.

4.3. Deploying the Model on the Board

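Fetch the companion sample programs and the cross-compiler, build the sample, then copy the generated executable and the model file to the board and run them.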

4.3.1. Sample Code Walkthrough

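See the companion tutorial for the complete sample. The resnet18 inference sample is similar to the mobilenet_v2 sample in the previous chapter; the only difference is that the resnet18 post-processing adds a simple softmax function.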

resnet18.cpp

#include <algorithm>  // std::sort, std::min
#include <cmath>      // std::exp
#include <cstdint>    // int32_t
#include <cstdio>     // printf
#include <numeric>    // std::iota
#include <vector>
// ... (omitted)

// softmax: convert the raw output scores into probabilities in place
void softmax(float *prob, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        prob[i] = std::exp(prob[i]);
        sum += prob[i];
    }
    for (int i = 0; i < n; ++i) {
        prob[i] /= sum;
    }
}
// ... (omitted)

int32_t top_num = 5;
float *prob = (float *)CVI_NN_TensorPtr(output);
int32_t count = CVI_NN_TensorCount(output);

softmax(prob, count);

// sort class indices by descending probability to find the top-k classes
std::vector<size_t> idx(count);
std::iota(idx.begin(), idx.end(), size_t{0});
std::sort(idx.begin(), idx.end(), [&prob](size_t idx_0, size_t idx_1) {return prob[idx_0] > prob[idx_1];});

// show results (never print more entries than the model has classes)
printf("------\n");
for (int32_t i = 0; i < std::min(top_num, count); i++) {
    int top_k_idx = idx[i];
    printf("  %f, idx %d", prob[top_k_idx], top_k_idx);
    if (!labels.empty())
        printf(", %s", labels[top_k_idx].c_str());
    printf("\n");
}
printf("------\n");
CVI_NN_CleanupModel(model);
printf("CVI_NN_CleanupModel succeeded\n");
return 0;
// ... (omitted)
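To check the post-processing logic on a development PC without the CVI runtime, the softmax and top-k steps can be pulled out into plain functions. The sketch below is an illustrative refactoring, not part of the sample (softmax_stable and top_k_indices are hypothetical names). It subtracts the maximum score before calling exp(), which gives the same probabilities while avoiding float overflow for large scores, and it uses std::partial_sort, which only orders the first k entries instead of all 1000.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Numerically stable softmax, computed in place.
void softmax_stable(std::vector<float>& v) {
    if (v.empty()) return;
    const float max_v = *std::max_element(v.begin(), v.end());
    float sum = 0.0f;
    for (float& x : v) {
        x = std::exp(x - max_v);  // largest term becomes exp(0) = 1
        sum += x;
    }
    for (float& x : v) x /= sum;
}

// Returns the indices of the k largest values, ordered by descending value.
std::vector<std::size_t> top_k_indices(const std::vector<float>& v, std::size_t k) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    k = std::min(k, idx.size());
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
                      [&v](std::size_t a, std::size_t b) { return v[a] > v[b]; });
    idx.resize(k);
    return idx;
}
```

For example, copying the 1000 output scores into a std::vector<float>, calling softmax_stable on it and then top_k_indices(scores, 5) yields the five class indices the sample prints.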

4.3.2. Cross-Compiling the Sample

Run the commands below, choosing the cross-compiler according to the board's operating system: for arm64 use aarch64-linux-gnu, for riscv64 use riscv64-linux-musl-x86_64.

# Get the cross-compiler (skip this if you already obtained it earlier). This tutorial tests a riscv64 system; add the cross-compiler to PATH
cd /workspace/tpu-mlir
wget https://sophon-file.sophon.cn/sophon-prod-s3/drive/23/03/07/16/host-tools.tar.gz
tar xvf host-tools.tar.gz
cd host-tools
export PATH=$PATH:$(pwd)/gcc/riscv64-linux-musl-x86_64/bin
# For an aarch64 system, add this cross-compiler to PATH instead
#export PATH=$PATH:$(pwd)/gcc/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/bin

# Check the cross-compiler
riscv64-unknown-linux-musl-gcc -v
Using built-in specs.
COLLECT_GCC=riscv64-unknown-linux-musl-gcc
COLLECT_LTO_WRAPPER=/home/dev/sg2000/cvi_mmf_sdk_test/host-tools/gcc/riscv64-linux-musl-x86_64/bin/../libexec/gcc/riscv64-unknown-linux-musl/10.2.0/lto-wrapper
Target: riscv64-unknown-linux-musl
Configured with: /mnt/ssd/jenkins_iotsw/slave/workspace/Toolchain/build-gnu-riscv_4/./source/riscv/riscv-gcc/configure # ... (omitted)
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (Xuantie-900 linux-5.10.4 musl gcc Toolchain V2.6.1 B-20220906
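gcc -v only describes the toolchain itself. To confirm that binaries built with it really target the board, one option is to compile a tiny file with the cross-compiler and run it on the board. The sketch relies on the standard GCC predefined macros __riscv, __riscv_xlen, __aarch64__ and __x86_64__; target_arch is a hypothetical helper name.

```cpp
#include <string>

// Reports the CPU architecture this translation unit was compiled for,
// based on compiler-predefined macros.
std::string target_arch() {
#if defined(__riscv) && (__riscv_xlen == 64)
    return "riscv64";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__)
    return "x86_64";
#else
    return "unknown";
#endif
}
```

Add a main() that prints target_arch(), build it on the host with riscv64-unknown-linux-musl-g++, copy the binary to the board and run it; it should print riscv64, while the same file built with the host compiler reports the host architecture.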

4.3.3. Build, Deploy and Run

Get the companion sample code:

# Clone the companion sample repository
git clone https://gitee.com/LubanCat/lubancat_sg2000_application_code.git

Build the sample:

# Switch to the resnet18 directory
cd lubancat_sg2000_application_code/examples/resnet18
# Build the resnet18 sample; set the argument according to the board's system: aarch64 for arm64, musl_riscv64 for riscv64
./build.sh -a musl_riscv64
-- CMAKE_C_COMPILER: riscv64-unknown-linux-musl-gcc
-- CMAKE_CXX_COMPILER: riscv64-unknown-linux-musl-g++
-- CMAKE_C_COMPILER: riscv64-unknown-linux-musl-gcc
-- CMAKE_CXX_COMPILER: riscv64-unknown-linux-musl-g++
-- The C compiler identification is GNU 10.2.0
-- The CXX compiler identification is GNU 10.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/dev/host-tools/gcc/riscv64-linux-musl-x86_64/bin/riscv64-unknown-linux-musl-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/dev/host-tools/gcc/riscv64-linux-musl-x86_64/bin/riscv64-unknown-linux-musl-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (0.9s)
-- Generating done (0.0s)
-- Build files have been written to: /xxx/examples/resnet18/build/build_riscv_musl
[2/3] Install the project...
-- Install configuration: "RELEASE"
# ... (output omitted)

The executable resnet18 is generated under install/build_riscv_musl in the current directory. Transfer the contents of that folder to the LubanCat-P1 board with scp or any other method:

scp -r install/  cat@192.168.103.142:~/

Then run the program on the board:

# On a riscv system, set the library path environment variable first
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib64v0p7_xthead/lp64d

# Usage: ./resnet18 cvimodel image.jpg label_file
root@lubancat:/home/cat/install# ./resnet18 resnet18_int8_sym_fused.cvimodel cat.jpg synset.txt
CVI_NN_RegisterModel succeeded
Input Tensor Number  : 1
[0] data_raw, shape (1,3,256,256), count 196608, fmt 7
Output Tensor Number : 1
[0] resnetv15_dense0_fwd_Gemm_f32, shape (1,1000,1,1), count 1000, fmt 0
CVI_NN_Forward succeeded
------
0.422913, idx 285, 'n02124075 Egyptian cat'
0.256510, idx 287, 'n02127052 lynx, catamount'
0.073491, idx 282, 'n02123159 tiger cat'
0.057235, idx 281, 'n02123045 tabby, tabby cat'
0.044575, idx 278, 'n02119789 kit fox, Vulpes macrotis'
------
CVI_NN_CleanupModel succeeded

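The output shows the input and output tensor information of the resnet18 model, followed by the model's top 5 predictions. The first entry is the one with the highest probability, 'Egyptian cat'.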
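Each printed index is mapped to a name through the label file passed as the third argument (synset.txt), so index 285 corresponds to line 286 of that file. The part of the sample that reads the file is omitted above; a minimal sketch, assuming one label per line in class-index order, could look like this (load_labels is a hypothetical name).

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads a label file with one class label per line: line i (0-based) is the
// label of class index i. Returns an empty vector if the file cannot be opened.
std::vector<std::string> load_labels(const std::string& path) {
    std::vector<std::string> labels;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        labels.push_back(line);
    }
    return labels;
}
```

Returning an empty vector on failure fits the sample's own check: the label text is only printed when labels is not empty.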

4.4. References

https://github.com/onnx/models/blob/main/validated/vision/classification/resnet/README.md

https://github.com/milkv-duo/tpu-mlir.git