6. YOLOv5 Object Detection

YOLOv5 is a single-stage object detection algorithm, a family of detection architectures and models pretrained on the COCO dataset. The latest release, YOLOv5 v7.0, includes YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x and other variants, and beyond object detection it also covers segmentation and classification use cases.

6.1. Environment Setup

This tutorial uses conda to create a yolov5 PyTorch environment. The Anaconda installer can be downloaded directly from the official website at https://www.anaconda.com/download#downloads , and for PyTorch installation refer to https://pytorch.org/ .

The yolov5 installation steps are as follows: create the environment with conda, install the dependencies with pip, then verify the installed environment from the Python command line:

# Install Anaconda as described in the earlier environment-setup tutorial, then create an environment with conda
conda create -n yolov5 python=3.9
conda activate yolov5

git clone https://github.com/ultralytics/yolov5
cd yolov5

# Install the dependencies
pip3 install -r requirements.txt

# Enter the Python command line and check the installed environment
(yolov5) llh@anhao:~yolov5$ python
Python 3.9.18 (main, Sep 11 2023, 13:41:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import utils
>>> display=utils.notebook_init()
Checking setup...
YOLOv5 🚀 v7.0-253-g63555c8 Python-3.9.18 torch-2.1.2+cu121 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 6144MiB)
Setup complete(12 CPUs, 15.6 GB RAM, 142.6/1006.9 GB disk)

For creating the TPU-MLIR development environment, refer to the earlier TPU-MLIR development environment chapter.

6.2. Model Conversion

Before converting the yolov5 model it must be re-exported, mainly to change the output nodes and move part of the processing into post-processing. The modification shown below is for reference only; you can change the outputs in other ways, but the corresponding post-processing must then be adapted to match.

6.2.1. Exporting the Model File

The following takes the yolov5s model as the test example; fetch the yolov5s.pt model file:

wget https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt

Enter the yolov5 environment created earlier and pull the yolov5 project source:

conda activate yolov5

git clone https://github.com/ultralytics/yolov5

Then locate the yolov5/models/yolo.py file and modify the forward function of the Detect class:

yolo.py
# def forward(self, x):
#     """Processes input through YOLOv5 layers, altering shape for detection: `x(bs, 3, ny, nx, 85)`."""
#     z = []  # inference output
#     for i in range(self.nl):
#         x[i] = self.m[i](x[i])  # conv
#         bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
#         x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

#         if not self.training:  # inference
#             if self.dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
#                 self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)

#             if isinstance(self, Segment):  # (boxes + masks)
#                 xy, wh, conf, mask = x[i].split((2, 2, self.nc + 1, self.no - self.nc - 5), 4)
#                 xy = (xy.sigmoid() * 2 + self.grid[i]) * self.stride[i]  # xy
#                 wh = (wh.sigmoid() * 2) ** 2 * self.anchor_grid[i]  # wh
#                 y = torch.cat((xy, wh, conf.sigmoid(), mask), 4)
#             else:  # Detect (boxes only)
#                 xy, wh, conf = x[i].sigmoid().split((2, 2, self.nc + 1), 4)
#                 xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy
#                 wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh
#                 y = torch.cat((xy, wh, conf), 4)
#             z.append(y.view(bs, self.na * nx * ny, self.no))

#     return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x)

def forward(self, x):
    z = []  # inference output: return the raw conv outputs; decoding is moved to post-processing
    for i in range(self.nl):
        z.append(self.m[i](x[i]))

    return z

Modify the run function in the export.py file:

export.py
for _ in range(2):
    y = model(im)  # dry runs
if half and not coreml:
    im, model = im.half(), model.half()  # to FP16
# shape = tuple((y[0] if isinstance(y, tuple) else y).shape)  # model output shape
shape = tuple((y[0] if isinstance(y, (tuple, list)) else y).shape)  # model output shape
metadata = {"stride": int(max(model.stride)), "names": model.names}  # model metadata
LOGGER.info(f"\n{colorstr('PyTorch:')} starting from {file} with output shape {shape} ({file_size(file):.1f} MB)")

Then run the following command to export the model:

# Set the path to the yolov5s.pt model and export yolov5s.onnx
python3 export.py --weights ./yolov5s.pt --include onnx

The above is the export approach tested in this tutorial (part of the model's processing is moved into the deployment example's post-processing). You can also export in other ways; just note that the post-processing in the later deployment step has to be modified accordingly.
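
After exporting, it is worth sanity-checking that the ONNX model really exposes the three raw conv outputs. Below is a minimal sketch using onnxruntime (onnxruntime is an assumption of this sketch, not a dependency of the tutorial; install it with pip if needed) that prints the output shapes:

# check_onnx.py - minimal sketch, assumes onnxruntime is installed (pip3 install onnxruntime)
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("yolov5s.onnx", providers=["CPUExecutionProvider"])
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)  # dummy 640x640 input
outputs = sess.run(None, {sess.get_inputs()[0].name: dummy})
for out in outputs:
    print(out.shape)  # expected: (1, 255, 80, 80), (1, 255, 40, 40), (1, 255, 20, 20)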

6.2.2. Converting the Model to MLIR

Enter the TPU-MLIR development environment (the Docker environment), create a yolov5s working directory, and copy the quantization and test images into the working directory.

# Fetch the tpu-mlir source
root@cfa2f8976af9:/workspace# git clone https://github.com/milkv-duo/tpu-mlir.git

# Create the yolov5s working directory
root@cfa2f8976af9:/workspace# mkdir yolov5s && cd yolov5s
root@cfa2f8976af9:/workspace/yolov5s#

# Copy the COCO dataset images and the test image from the tpu-mlir/regression directory
root@cfa2f8976af9:/workspace/yolov5s# cp -r ../tpu-mlir/regression/dataset/COCO2017  ./
root@cfa2f8976af9:/workspace/yolov5s# cp ../tpu-mlir/regression/image/dog.jpg ./

Copy the yolov5s.onnx model exported earlier into the current working directory, then use the model_transform (model_transform.py) tool to convert the model into an MLIR model.

# model_def specifies the model definition file, input_shapes the input shape,
# mean and scale the mean and normalization parameters, pixel_format the input image format,
# test_input the image used for validation, test_result the validation result file, and mlir the output MLIR model file name and path.
root@cfa2f8976af9:/workspace/yolov5s# model_transform \
                                    --model_name yolov5s \
                                    --model_def ./yolov5s.onnx \
                                    --input_shapes [[1,3,640,640]] \
                                    --mean 0.0,0.0,0.0 \
                                    --scale 0.0039216,0.0039216,0.0039216 \
                                    --keep_aspect_ratio \
                                    --pixel_format rgb \
                                    --test_input ./dog.jpg \
                                    --test_result yolov5s_top_outputs.npz \
                                    --mlir yolov5s.mlir
# output omitted .....................
[359_Conv                        ]      SIMILAR [PASSED]
    (1, 255, 20, 20) float32
    cosine_similarity      = 1.000000
    euclidean_similarity   = 1.000000
    sqnr_similarity        = 125.106964
142 compared
142 passed
0 equal, 2 close, 140 similar
0 failed
0 not equal, 0 not similar
min_similiarity = (0.9999997615814209, 0.999998570296626, 116.59041404724121)
Target    yolov5s_top_outputs.npz
Reference yolov5s_ref_outputs.npz
npz compare PASSED.
compare 359_Conv: 100%|███████████████████████████| 142/142 [00:09<00:00, 15.64it/s]
[Success]: npz_tool.py compare yolov5s_top_outputs.npz yolov5s_ref_outputs.npz --tolerance 0.99,0.99 --except - -vv

The yolov5s.mlir and related files are generated and saved in the yolov5s working directory.

6.2.3. Quantizing the Model

Run run_calibration (run_calibration.py) to generate the calibration table. The model is tested on the COCO dataset, so quantization uses 100 images from COCO2017. Run:

root@cfa2f8976af9:/workspace/yolov5s# run_calibration yolov5s.mlir \
                                    --dataset ./COCO2017/ \
                                    --input_num 100 \
                                    -o yolov5s_cali_table
TPU-MLIR v1.8.1-20240712
GmemAllocator use FitFirstAssign
reused mem is 6553600, all mem is 61738800
2024/07/23 17:45:06 - INFO :
load_config Preprocess args :
        resize_dims           : [640, 640]
        keep_aspect_ratio     : True
        keep_ratio_mode       : letterbox
        pad_value             : 0
        pad_type              : center
        input_dims            : [640, 640]
        --------------------------
        mean                  : [0.0, 0.0, 0.0]
        scale                 : [0.0039216, 0.0039216, 0.0039216]
        --------------------------
        pixel_format          : rgb
        channel_format        : nchw

last input data (idx=100) not valid, droped
input_num = 100, ref = 100
real input_num = 100
activation_collect_and_calc_th for op: 359_Conv: 100%|█████████████████████████████████| 143/143 [01:35<00:00,  1.49it/s]
[2048] threshold: 359_Conv: 100%|██████████████████████████████████████████████████████| 143/143 [00:00<00:00, 745.94it/s]
[2048] threshold: 359_Conv: 100%|██████████████████████████████████████████████████| 143/143 [00:01<00:00, 88.21it/s]
GmemAllocator use FitFirstAssign
reused mem is 6553600, all mem is 61738800
GmemAllocator use FitFirstAssign
reused mem is 6553600, all mem is 61738800
prepare data from 100
tune op: 359_Conv: 100%|██████████████████| 143/143 [02:38<00:00,  1.11s/it]
auto tune end, run time:159.4181764125824

Then use the model_deploy (model_deploy.py) tool to convert the MLIR model into an INT8 cvimodel:

# Convert to an INT8 quantized model with the following command.
# calibration_table specifies the calibration table generated above; tolerance is the allowed error in similarity between the quantized MLIR results and the MLIR inference results;
# model specifies the output cvimodel file name and path
root@cfa2f8976af9:/workspace/yolov5s# model_deploy.py --mlir yolov5s.mlir --quantize INT8 --calibration_table ./yolov5s_cali_table \
--chip cv181x --fuse_preprocess  --test_input ./dog.jpg  \
--test_reference yolov5s_top_outputs.npz  --tolerance 0.85,0.45  --model yolov5s_int8_sym.cvimodel
# output omitted ..............................................
Cmodel: bm_run_cmdbuf
device[0] closed
[Running]: npz_tool.py compare yolov5s_cv181x_int8_sym_model_outputs.npz yolov5s_cv181x_int8_sym_tpu_outputs.npz
--tolerance 0.99,0.90 --except - -vv
compare 359_Conv_f32:  67%|████████████████████████████████████ | 2/3 [00:00<00:00, 983.65it/s][output0_Conv_f32 ] EQUAL [PASSED]
    (1, 255, 80, 80) float32
[358_Conv_f32                    ]        EQUAL [PASSED]
    (1, 255, 40, 40) float32
[359_Conv_f32                    ]        EQUAL [PASSED]
    (1, 255, 20, 20) float32
3 compared
3 passed
3 equal, 0 close, 0 similar
0 failed
0 not equal, 0 not similar
min_similiarity = (1.0, 1.0, inf)
Target    yolov5s_cv181x_int8_sym_model_outputs.npz
Reference yolov5s_cv181x_int8_sym_tpu_outputs.npz
npz compare PASSED.
compare 359_Conv_f32: 100%|███████████████████████████████████████████████| 3/3 [00:00<00:00, 10.10it/s]
[Success]: npz_tool.py compare yolov5s_cv181x_int8_sym_model_outputs.npz yolov5s_cv181x_int8_sym_tpu_outputs.npz
--tolerance 0.99,0.90 --except - -vv

# When the run completes, the model file yolov5s_int8_sym.cvimodel is generated in the current directory.

The fuse_preprocess parameter folds the preprocessing into the model; if it is not set, the corresponding quantization and other preprocessing operations must be performed during the inference preprocessing stage.
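
For illustration, the sketch below (plain NumPy, not part of the example code) shows the normalization that fuse_preprocess folds into the model, using the --mean/--scale values passed to model_transform above; without the flag, the application would need to apply this itself (plus input quantization) before inference:

import numpy as np

def normalize(img_rgb_u8):
    # img_rgb_u8: (640, 640, 3) uint8, RGB, already letterboxed
    x = (img_rgb_u8.astype(np.float32) - 0.0) * 0.0039216  # mean 0.0, scale 1/255
    return x.transpose(2, 0, 1)[np.newaxis, ...]           # HWC -> NCHW, add batch dim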

6.3. Deployment and Inference on the Board

Fetch the companion example code:

# Fetch the companion example code and switch to the yolov5 directory
git clone https://gitee.com/LubanCat/lubancat_sg2000_application_code.git
cd lubancat_sg2000_application_code/examples/yolov5

6.3.1. Deployment Inference and Post-processing

The tutorial tests the s-size yolov5 model with batch 1 and a 640*640 input image, re-exported as described above before model conversion and quantization.

The model outputs three feature maps at scales 80x80, 40x40, and 20x20, with 3 anchors per position, so each output feature map has 3x(5+80)=255 channels; that is, the model outputs are (1, 255, 80, 80), (1, 255, 40, 40), and (1, 255, 20, 20).

Post-processing stage (a NumPy sketch of the decode follows the list below):

  • The class-prediction branch and objectness branch are passed through a sigmoid; boxes whose objectness score falls below the conf_threshold are dropped, and conf_threshold is then also applied to the class scores of the remaining boxes

  • For each box that passes the filter, the product of its objectness score and class score is kept, and the bbox is decoded into xywh format

  • NMS (non-maximum suppression) then removes duplicate detection boxes, filtering with the nms_threshold; in addition, OBJ_NUMB_MAX_SIZE caps the maximum number of boxes kept after filtering

Finally, the remaining detection boxes are mapped back to the original image scale used before network input.
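
For reference, here is a minimal NumPy sketch of the decode described above (the shipped example implements this in C++ in yolov5.cpp below; the anchor and stride values are the standard yolov5s configuration and are an assumption of this sketch):

import numpy as np

# standard yolov5s anchors per output stride (assumption of this sketch)
ANCHORS = {8:  [(10, 13), (16, 30), (33, 23)],
           16: [(30, 61), (62, 45), (59, 119)],
           32: [(116, 90), (156, 198), (373, 326)]}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_feature_map(feat, stride, conf_threshold=0.25):
    # feat: one (1, 255, H, W) raw conv output; 255 = 3 anchors x (x, y, w, h, obj + 80 classes)
    _, _, h, w = feat.shape
    feat = feat.reshape(3, 85, h, w)
    boxes = []
    for a, (aw, ah) in enumerate(ANCHORS[stride]):
        obj = sigmoid(feat[a, 4])                    # objectness scores
        for gy, gx in zip(*np.where(obj > conf_threshold)):
            cls = sigmoid(feat[a, 5:, gy, gx])       # class scores
            cls_id = int(cls.argmax())
            score = obj[gy, gx] * cls[cls_id]        # objectness x class score
            if score < conf_threshold:
                continue
            # YOLOv5 bbox decode: center offset relative to the grid cell, wh relative to the anchor
            bx = (sigmoid(feat[a, 0, gy, gx]) * 2 - 0.5 + gx) * stride
            by = (sigmoid(feat[a, 1, gy, gx]) * 2 - 0.5 + gy) * stride
            bw = (sigmoid(feat[a, 2, gy, gx]) * 2) ** 2 * aw
            bh = (sigmoid(feat[a, 3, gy, gx]) * 2) ** 2 * ah
            boxes.append((bx, by, bw, bh, score, cls_id))
    return boxes  # NMS with nms_threshold then runs on these, and the survivors are mapped back to the original image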

yolov5.cpp
// omitted ........................
int main(int argc, char **argv) {
if (argc != 3) {
    usage(argv);
    exit(-1);
}

cvi_app_context_t cvi_yolov5_ctx;
memset(&cvi_yolov5_ctx, 0, sizeof(cvi_app_context_t));

// load model file
const char *model_file = argv[1];
CVI_MODEL_HANDLE model = nullptr;
int ret = CVI_NN_RegisterModel(model_file, &model);
if (CVI_RC_SUCCESS != ret) {
    printf("CVI_NN_RegisterModel failed, err %d\n", ret);
    exit(1);
}
printf("CVI_NN_RegisterModel succeeded\n");

// get input output tensors
CVI_NN_GetInputOutputTensors(model, &cvi_yolov5_ctx.input_tensors, &cvi_yolov5_ctx.input_num, &cvi_yolov5_ctx.output_tensors,
                            &cvi_yolov5_ctx.output_num);
printf("Input Tensor Number  : %d\n", cvi_yolov5_ctx.input_num);
dump_tensors(cvi_yolov5_ctx.input_tensors, cvi_yolov5_ctx.input_num);
printf("Output Tensor Number : %d\n", cvi_yolov5_ctx.output_num);
dump_tensors(cvi_yolov5_ctx.output_tensors, cvi_yolov5_ctx.output_num);

CVI_TENSOR *input = CVI_NN_GetTensorByName(CVI_NN_DEFAULT_TENSOR, cvi_yolov5_ctx.input_tensors, cvi_yolov5_ctx.input_num);
assert(input);

CVI_SHAPE shape = CVI_NN_TensorShape(input);
cvi_yolov5_ctx.model_height = shape.dim[2];
cvi_yolov5_ctx.model_width = shape.dim[3];
printf("model_height:%d  model_width:%d \n", cvi_yolov5_ctx.model_height, cvi_yolov5_ctx.model_width);

// imread
cv::Mat image;
image = cv::imread(argv[2]);
if (!image.data) {
    printf("Could not open or find the image\n");
    return -1;
}

// preprocess: letterbox,resize,bgr2rgb
cv::Mat resized_img = letterbox(&cvi_yolov5_ctx, image, IMG_RESIZE_DIMS, IMG_RESIZE_DIMS);
cv::cvtColor(resized_img, resized_img, cv::COLOR_BGR2RGB);
printf("resized_img: %d, %d \n",resized_img.cols, resized_img.rows);

// Packed2Planar (rgb nchw)
cv::Mat channels[3];
for (int i = 0; i < 3; i++) {
    channels[i] = cv::Mat(resized_img.rows, resized_img.cols, CV_8SC1);
}
cv::split(resized_img, channels);

int8_t *ptr = (int8_t *)CVI_NN_TensorPtr(input);
int channel_size = cvi_yolov5_ctx.model_height * cvi_yolov5_ctx.model_width;
for (int i = 0; i < 3; ++i) {
    memcpy(ptr + i * channel_size, channels[i].data, channel_size);
}

// model inference
CVI_NN_Forward(model, cvi_yolov5_ctx.input_tensors, cvi_yolov5_ctx.input_num, cvi_yolov5_ctx.output_tensors, cvi_yolov5_ctx.output_num);
printf("CVI_NN_Forward succeeded\n");

// post-processing
const float nms_threshold = 0.45;      // default NMS threshold
const float box_conf_threshold = 0.25; // default confidence threshold
object_detect_result_list od_results;
postprocess(&cvi_yolov5_ctx, box_conf_threshold, nms_threshold, &od_results);

// draw the detection boxes on the original image
char text[256];
printf("%d objects are detected: \n", od_results.count);
for (int i = 0; i < od_results.count; i++)
{
    object_detect_result *det_result = &(od_results.results[i]);
    printf(" %s @ (%f %f %f %f) %.3f\n", class_names[det_result->cls_id],
            det_result->rect.x, det_result->rect.y,
            det_result->rect.width, det_result->rect.height,
            det_result->prop);
    int x1 = det_result->rect.x;
    int y1 = det_result->rect.y;

    cv::rectangle(image, det_result->rect, cv::Scalar(255, 0, 0));
    sprintf(text, "%s %.1f%%", class_names[det_result->cls_id], det_result->prop * 100);
    cv::putText(image, text, cv::Point(x1, y1 - 6), cv::FONT_HERSHEY_DUPLEX, 0.7, cv::Scalar(0,0,255), 1, cv::LINE_AA);
}

// save
cv::imwrite("./out.jpg", image);

CVI_NN_CleanupModel(model);
return 0;
}

6.3.2. Building the Deployment Example

Run the commands below to set up the cross compiler (the tutorial's test board runs a riscv64 system by default). Choose the cross compiler according to the target board system: for arm64 use aarch64-linux-gnu, for riscv64 use riscv64-linux-musl-x86_64.

# Fetch the cross compiler (skip if it was already fetched in an earlier test); the tutorial targets a riscv64 system and adds the cross compiler to the PATH environment variable
wget https://sophon-file.sophon.cn/sophon-prod-s3/drive/23/03/07/16/host-tools.tar.gz
cd /workspace/tpu-mlir
tar xvf host-tools.tar.gz
cd host-tools
export PATH=$PATH:$(pwd)/gcc/riscv64-linux-musl-x86_64/bin
# For an aarch64 system, add the corresponding cross compiler to PATH instead
#export PATH=$PATH:$(pwd)/gcc/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/bin

# Check the cross compiler
riscv64-unknown-linux-musl-gcc -v
Using built-in specs.
COLLECT_GCC=riscv64-unknown-linux-musl-gcc
COLLECT_LTO_WRAPPER=/home/dev/sg2000/cvi_mmf_sdk_test/host-tools/gcc/riscv64-linux-musl-x86_64/bin/
../libexec/gcc/riscv64-unknown-linux-musl/10.2.0/lto-wrapper
Target: riscv64-unknown-linux-musl
Configured with: /mnt/ssd/jenkins_iotsw/slave/workspace/Toolchain/build-gnu-riscv_4/
./source/riscv/riscv-gcc/configure #省略................
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (Xuantie-900 linux-5.10.4 musl gcc Toolchain V2.6.1 B-20220906


Build the deployment example:

# Switch to the yolov5 example
cd lubancat_sg2000_application_code/examples/yolov5

# Build the yolov5 example; set the -a parameter according to the board system: aarch64 for arm64, musl_riscv64 for riscv64
./build.sh -a musl_riscv64
-- CMAKE_C_COMPILER: riscv64-unknown-linux-musl-gcc
-- CMAKE_CXX_COMPILER: riscv64-unknown-linux-musl-g++
-- CMAKE_C_COMPILER: riscv64-unknown-linux-musl-gcc
-- CMAKE_CXX_COMPILER: riscv64-unknown-linux-musl-g++
-- The C compiler identification is GNU 10.2.0
-- The CXX compiler identification is GNU 10.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/dev/host-tools/gcc/riscv64-linux-musl-x86_64/bin/riscv64-unknown-linux-musl-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/dev/host-tools/gcc/riscv64-linux-musl-x86_64/bin/riscv64-unknown-linux-musl-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (0.9s)
-- Generating done (0.0s)
-- Build files have been written to: /home/dev/xxx/examples/yolov5/build/build_riscv_musl
[2/3] Install the project...
-- Install configuration: "RELEASE"
# output omitted .................

Copy the executable to the board:

# Copy the yolov5 executable to the board via scp
scp -r install/install_riscv_musl/yolov5  cat@192.168.103.142:~/

Run the model inference test:

# On a riscv system, set the library path environment variable first
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib64v0p7_xthead/lp64d

root@lubancat:/home/cat# ./yolov5 yolov5s_int8_sym.cvimodel cat.jpg
version: 1.4.0
yolov5s Build at 2024-07-15 16:09:38 For platform cv181x
Max SharedMem size:5734400
CVI_NN_RegisterModel succeeded
Input Tensor Number  : 1
[0] images_raw, shape (1,3,640,640), count 1228800, fmt 7
Output Tensor Number : 3
[0] output0_Conv_f32, shape (1,255,80,80), count 1632000, fmt 0
[1] 358_Conv_f32, shape (1,255,40,40), count 408000, fmt 0
[2] 359_Conv_f32, shape (1,255,20,20), count 102000, fmt 0
model_height:640  model_width:640
resized_img: 640, 640
CVI_NN_Forward succeeded
dog @ (134.000000 218.000000 174.000000 325.000000) 0.923
car @ (475.000000 70.000000 209.000000 105.000000) 0.677
bicycle @ (168.000000 115.000000 397.000000 306.000000) 0.608

The output shows the model's input and output tensor information, followed by the top predictions from the inference, e.g. dog with probability 0.923. The result image is saved as out.jpg; transfer it to the PC via scp (or any other method) to view it:
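
For example, on the PC (the user name and IP address below follow the earlier scp step and should be adjusted to your setup):

scp cat@192.168.103.142:~/out.jpg ./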

(Figure: detection result image out.jpg, with the predicted boxes drawn on the test image)