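5. YOLOv3 Object Detection
YOLOv3 is an object detection algorithm and an important version of the YOLO family. Building on YOLOv2, YOLOv3 improves the network backbone, detects on feature maps at multiple scales, and replaces softmax with multiple independent logistic regression classifiers for class prediction.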
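To make the difference between independent logistic classifiers and softmax concrete, here is a minimal sketch. The function names `sigmoid_scores` and `softmax_scores` are hypothetical and are not taken from Darknet or TPU-MLIR; the code only illustrates the general idea that sigmoid scores are computed per class, while softmax scores compete and sum to 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Independent logistic (sigmoid) score for each class. The scores are not
// forced to sum to 1, so several classes can score high at the same time.
std::vector<float> sigmoid_scores(const std::vector<float>& logits) {
    std::vector<float> out;
    out.reserve(logits.size());
    for (float z : logits) {
        out.push_back(1.0f / (1.0f + std::exp(-z)));
    }
    return out;
}

// Softmax, for comparison. The scores compete with each other and sum to 1.
std::vector<float> softmax_scores(const std::vector<float>& logits) {
    std::vector<float> out(logits.size());
    if (logits.empty()) {
        return out;
    }
    // Subtract the maximum logit for numerical stability.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - max_logit);
        sum += out[i];
    }
    for (float& v : out) {
        v /= sum;
    }
    return out;
}
```

With two equally strong logits, the sigmoid version gives both classes a high score, whereas softmax splits the probability mass between them.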
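The YOLOv3 paper implementation uses Darknet, an open-source deep learning framework written in C and CUDA. The Darknet implementation of YOLOv3 is available at https://github.com/pjreddie/darknet/tree/master.
5.1. Environment Setup
To set up the TPU-MLIR environment, refer to the earlier chapters.
Create a working directory and get the model files used in this tutorial:
# Create a yolov3 directory under the working directory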
root@cfa2f8976af9:/workspace# mkdir yolov3 && cd yolov3
root@cfa2f8976af9:/workspace/yolov3#
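Fetch the model files for the companion example:
# Get the yolov3_416.prototxt and yolov3_416.caffemodel files
root@cfa2f8976af9:/workspace/yolov3# wget xxx  # to be added
Alternatively, obtain a Darknet model yourself (or train your own Darknet yolov3 model) and convert it to Caffe model files; documentation for this conversion can be found online.
Then copy the dataset files needed for quantization into the yolov3 working directory (optional):
# Clone the tpu-mlir source code; skip this step if you have already cloned it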
root@cfa2f8976af9:/workspace# git clone https://github.com/milkv-duo/tpu-mlir.git
root@cfa2f8976af9:/workspace# cd yolov3
root@cfa2f8976af9:/workspace/yolov3# cp -rf ../tpu-mlir/regression/dataset/COCO2017/ ./
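5.2. Model Conversion
The tutorial tests a 416x416 yolov3 model in Caffe format. Note that this is a yolov3 model with post-processing built in: yolov3_416.prototxt contains an added layer of type YoloDetection, named yolov3Detection, whose output (top) is named output: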
layer {
bottom: "layer106-conv"
bottom: "layer94-conv"
bottom: "layer82-conv"
top: "output"
name: "yolov3Detection"
type: "YoloDetection"
yolo_detection_param {
net_input_w: 416
net_input_h: 416
nms_threshold: 0.45
obj_threshold: 0.25
keep_topk: 200
class_num: 80
}
}
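The three bottom tensors of this layer are the raw detection heads. As a rough check of their shapes, the sketch below computes the shape of one head from the input size. It assumes the standard YOLOv3 layout (3 anchors per scale, 5 box terms per anchor, strides 8, 16 and 32); the names `HeadShape` and `yolo_head_shape` are hypothetical and exist only for this illustration.

```cpp
// Shape of one detection head, without the batch dimension.
struct HeadShape {
    int channels;
    int height;
    int width;
};

// Compute the shape of a YOLOv3 detection head for a given stride.
// Assumption: 3 anchors per scale, each predicting
// x, y, w, h, objectness (5 terms) plus one score per class.
HeadShape yolo_head_shape(int net_input_w, int net_input_h, int stride, int class_num) {
    const int anchors_per_scale = 3;
    const int terms_per_anchor = 5 + class_num;
    return HeadShape{anchors_per_scale * terms_per_anchor,
                     net_input_h / stride,
                     net_input_w / stride};
}
```

For `net_input_w = net_input_h = 416`, `class_num = 80` and stride 8, this gives 255 channels on a 52x52 grid, which agrees with the (1, 255, 52, 52) shape reported for layer106-conv in the conversion log.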
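Use the model_transform tool (model_transform.py) to convert the model to an MLIR model:
# model_def specifies the model definition file, input_shapes specifies the input shape,
# mean and scale specify the mean and normalization parameters, pixel_format specifies the input image format,
# test_input specifies the image used for verification, mlir specifies the name and path of the output MLIR model file.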
root@cfa2f8976af9:/workspace/tpu/yolov3# model_transform.py --model_name yolov3 \
--model_def ./yolov3_416.prototxt \
--model_data ./yolov3.caffemodel \
--test_input ./dog.jpg \
--test_result yolov3_top_output.npz \
--input_shapes [[1,3,416,416]] \
--resize_dims=416,416 \
--keep_aspect_ratio \
--mean 0.0,0.0,0.0 \
--scale 0.00392,0.00392,0.00392 \
--pixel_format rgb \
--tolerance 0.99,0.99 \
--mlir yolov3.mlir
# --excepts output \
# (output omitted) ..................................................
[layer106-conv ] SIMILAR [PASSED]
(1, 255, 52, 52) float32
cosine_similarity = 1.000000
euclidean_similarity = 0.999999
sqnr_similarity = 115.155401
[output ] CLOSE [PASSED]
(1, 1, 200, 6) float32
close order = 5
104 compared
104 passed
1 equal, 1 close, 102 similar
0 failed
0 not equal, 0 not similar
min_similiarity = (0.9999997019767761, 0.9999982038586644, 112.97511100769043)
Target yolov3_top_output.npz
Reference yolov3_ref_outputs.npz
npz compare PASSED.
compare output: 100%|█████████████████████████████████████████| 104/104 [00:07<00:00, 14.18it/s]
[Success]: npz_tool.py compare yolov3_top_output.npz yolov3_ref_outputs.npz --tolerance 0.99,0.99 --except - -vv
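The log above reports a cosine_similarity value for each compared tensor. For reference, here is a minimal sketch of the textbook cosine similarity between two flattened tensors. It is not the implementation used by npz_tool.py, and the function name `cosine_similarity` is only an illustrative choice.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook cosine similarity between two flattened tensors of equal length.
// Returns 0.0 for mismatched or empty inputs; two all-zero tensors count as identical.
double cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size() || a.empty()) {
        return 0.0;
    }
    double dot = 0.0;
    double norm_a = 0.0;
    double norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        norm_a += static_cast<double>(a[i]) * a[i];
        norm_b += static_cast<double>(b[i]) * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) {
        return (norm_a == norm_b) ? 1.0 : 0.0;
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

A value close to 1.0, as in the log, means the converted model's tensor points in almost the same direction as the reference tensor.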
5.2.1. INT8 Quantized Model
Run run_calibration (run_calibration.py) to generate the calibration table. The input data is the 100 COCO2017 images copied earlier. Run:
root@cfa2f8976af9:/workspace/tpu/yolov3# run_calibration yolov3.mlir \
--dataset ./COCO2017/ \
--input_num 100 \
-o yolov3_cali_table
TPU-MLIR v1.8.1-20240712
GmemAllocator use FitFirstAssign
reused mem is 11075584, all mem is 132385491
2024/07/25 11:32:42 - INFO :
load_config Preprocess args :
resize_dims : [416, 416]
keep_aspect_ratio : True
keep_ratio_mode : letterbox
pad_value : 0
pad_type : center
input_dims : [416, 416]
--------------------------
mean : [0.0, 0.0, 0.0]
scale : [0.00392, 0.00392, 0.00392]
--------------------------
pixel_format : rgb
channel_format : nchw
last input data (idx=100) not valid, droped
input_num = 100, ref = 100
real input_num = 100
activation_collect_and_calc_th for op: layer107: 100%|████████████████████████████████████████████████████| 248/248 [02:56<00:00, 1.41it/s]
[2048] threshold: layer107: 100%|█████████████████████████████████████████████████████████████████| 248/248 [00:00<00:00, 846.23it/s]
[2048] threshold: layer107: 100%|████████████████████████████████████████████████████████████████| 248/248 [00:02<00:00, 89.39it/s]
GmemAllocator use FitFirstAssign
reused mem is 11075584, all mem is 132385491
GmemAllocator use FitFirstAssign
reused mem is 11075584, all mem is 132385491
prepare data from 100
tune op: layer107: 100%|██████████████████████████████████████████| 248/248 [04:50<00:00, 1.17s/it]
auto tune end, run time:294.8394968509674
Then use the model_deploy tool (model_deploy.py) to convert the MLIR model to an INT8 cvimodel:
root@cfa2f8976af9:/workspace/tpu/yolov3# model_deploy.py \
--mlir yolov3.mlir \
--calibration_table yolov3_cali_table \
--chip cv181x \
--quantize INT8 \
--quant_input \
--test_input ./dog.jpg \
--test_reference yolov3_top_output.npz \
--excepts output \
--tolerance 0.9,0.3 \
--fuse_preprocess \
--customization_format RGB_PLANAR \
--model yolov3_416.cvimodel
# (output omitted) .................................................
[Success]: tpuc-opt yolov3_cv181x_int8_sym_final.mlir --codegen="model_file=yolov3_416.cvimodel
embed_debug_info=false model_version=latest" -o /dev/null
[CMD]: model_runner.py --input yolov3_in_ori.npz --model yolov3_416.cvimodel --output yolov3_cv181x_int8_sym_model_outputs.npz
setenv:cv181x
Start TPU Simulator for cv181x
device[0] opened, 4294967296
version: 1.4.0
yolov3 Build at 2024-07-25 11:46:21 For platform cv181x
Cmodel: bm_load_cmdbuf
Max SharedMem size:8306688
Cmodel: bm_run_cmdbuf
device[0] closed
[Running]: npz_tool.py compare yolov3_cv181x_int8_sym_model_outputs.npz yolov3_cv181x_int8_sym_tpu_outputs.npz
--tolerance 0.99,0.90 --except output -vv
compare output: 0%| | 0/1 [00:00<?, ?it/s][output ] EQUAL [PASSED]
(1, 1, 200, 6) float32
1 compared
1 passed
1 equal, 0 close, 0 similar
0 failed
0 not equal, 0 not similar
min_similiarity = (1.0, 1.0, inf)
Target yolov3_cv181x_int8_sym_model_outputs.npz
Reference yolov3_cv181x_int8_sym_tpu_outputs.npz
npz compare PASSED.
compare output: 100%|█████████████████████████████████████| 1/1 [00:00<00:00, 32.32it/s]
[Success]: npz_tool.py compare yolov3_cv181x_int8_sym_model_outputs.npz yolov3_cv181x_int8_sym_tpu_outputs.npz
--tolerance 0.99,0.90 --except output -vv
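# The output model file is yolov3_416.cvimodel; copy it to the board for deployment testing
The fuse_preprocess parameter folds preprocessing into the model. Without it, the preprocessing stage before inference has to perform the corresponding operations itself, such as quantization.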
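As a rough illustration of the work that moves into the model when fuse_preprocess is set, the sketch below normalizes one pixel with the mean and scale given to model_transform.py and then quantizes it to INT8. The function names and the `qscale` parameter are hypothetical; on a real board the input quantization scale would have to come from the model or runtime, not from this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Normalize one 8-bit pixel value as (pixel - mean) * scale.
// In this tutorial mean is 0.0 and scale is 0.00392 (about 1/255).
float normalize_pixel(std::uint8_t pixel, float mean, float scale) {
    return (static_cast<float>(pixel) - mean) * scale;
}

// Symmetric INT8 quantization of a normalized value.
// Assumption: qscale is the input tensor's quantization scale (hypothetical value here).
std::int8_t quantize_int8(float value, float qscale) {
    const float q = std::round(value * qscale);
    // Saturate to the INT8 range before converting.
    const float clamped = std::max(-128.0f, std::min(127.0f, q));
    return static_cast<std::int8_t>(clamped);
}
```

With fuse_preprocess enabled, the application can instead copy raw 8-bit pixel data into the input tensor, which is what the deployment example in this chapter does.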
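5.3. Deployment Test
Deployment uses the tpu-sdk library. For the related API, refer to the CVITEK_TPU_SDK development guide listed in the TPU SDK development resources collection.
5.3.1. Walkthrough of the Example Code
The yolov3 model was converted with the --fuse_preprocess parameter, so input image preprocessing does not need normalization or similar steps, but the input data must be in RGB NCHW format. In addition, the yolov3 model includes a YoloDetection layer, so part of the post-processing is built into the model: the deployment example does not need to perform bbox decoding or NMS and only does light processing before outputting the detection results.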
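The RGB NCHW requirement means the interleaved (packed) pixels that OpenCV produces must be rearranged into one plane per channel. The example code does this with cv::split; the sketch below shows the same rearrangement using only the standard library. The function name `packed_to_planar` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert packed HWC data (R,G,B,R,G,B,...) into planar CHW data
// (all R values, then all G values, then all B values).
// Returns an empty vector if the input size does not match the dimensions.
std::vector<std::uint8_t> packed_to_planar(const std::vector<std::uint8_t>& packed,
                                           int height, int width, int channels) {
    if (height <= 0 || width <= 0 || channels <= 0) {
        return {};
    }
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    if (packed.size() != plane * channels) {
        return {};
    }
    std::vector<std::uint8_t> planar(packed.size());
    for (std::size_t p = 0; p < plane; ++p) {
        for (int c = 0; c < channels; ++c) {
            planar[c * plane + p] = packed[p * channels + c];
        }
    }
    return planar;
}
```

For a 1x2 image with pixels (1,2,3) and (4,5,6), the planar result is 1,4,2,5,3,6: each channel becomes one contiguous block, which can then be copied into the input tensor plane by plane.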
// (omitted) ...............
// Read the image with OpenCV
cv::Mat image;
image = cv::imread(argv[2]);
if (!image.data) {
printf("Could not open or find the image\n");
return -1;
}
cv::Mat cloned = image.clone();
detection dets[MAX_DET];
int32_t det_num = 0;
/* Preprocessing */
int ih = image.rows;
int iw = image.cols;
int oh = height;
int ow = width;
double scale = std::min((double)oh / ih, (double)ow / iw);
int nh = (int)(ih * scale);
int nw = (int)(iw * scale);
// resize & letterbox
cv::resize(image, image, cv::Size(nw, nh));
int top = (oh - nh) / 2;
int bottom = (oh - nh) - top;
int left = (ow - nw) / 2;
int right = (ow - nw) - left;
cv::copyMakeBorder(image, image, top, bottom, left, right, cv::BORDER_CONSTANT,
cv::Scalar::all(0));
cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
//Packed2Planar
cv::Mat channels[3];
for (int i = 0; i < 3; i++) {
channels[i] = cv::Mat(image.rows, image.cols, CV_8SC1);
}
cv::split(image, channels);
// fill data
int8_t *ptr = (int8_t *)CVI_NN_TensorPtr(input);
int channel_size = height * width;
for (int i = 0; i < 3; ++i) {
memcpy(ptr + i * channel_size, channels[i].data, channel_size);
}
/* Model inference */
gettimeofday(&start_time, NULL);
CVI_NN_Forward(model, input_tensors, input_num, output_tensors, output_num);
gettimeofday(&stop_time, NULL);
printf("CVI_NN_Forward using %f ms\n", (__get_us(stop_time) - __get_us(start_time)) / 1000);
printf("CVI_NN_Forward succeeded\n");
/* Post-processing */
float *output_ptr = (float *)CVI_NN_TensorPtr(output);
for (int i = 0; i < MAX_DET; ++i) {
// filter real det with score > 0
if (output_ptr[i * 6 + 5] > 0) {
// output: [x,y,w,h,cls,score]
dets[det_num].bbox.x = output_ptr[i * 6 + 0];
dets[det_num].bbox.y = output_ptr[i * 6 + 1];
dets[det_num].bbox.w = output_ptr[i * 6 + 2];
dets[det_num].bbox.h = output_ptr[i * 6 + 3];
dets[det_num].cls = output_ptr[i * 6 + 4];
dets[det_num].score = output_ptr[i * 6 + 5];
det_num++;
}
}
printf("get detection num: %d\n", det_num);
// correct box with origin image size
int restored_w = 0;
int restored_h = 0;
bool relative_position = false;
if (((float)width / cloned.cols) < ((float)height / cloned.rows)) {
restored_w = width;
restored_h = (cloned.rows * width) / cloned.cols;
} else {
restored_h = height;
restored_w = (cloned.cols * height) / cloned.rows;
}
for (int i = 0; i < det_num; ++i) {
box b = dets[i].bbox;
b.x = (b.x - (width - restored_w) / 2. / width) /
((float)restored_w / width);
b.y = (b.y - (height - restored_h) / 2. / height) /
((float)restored_h / height);
b.w *= (float)width / restored_w;
b.h *= (float)height / restored_h;
if (!relative_position) {
b.x *= cloned.cols;
b.w *= cloned.cols;
b.y *= cloned.rows;
b.h *= cloned.rows;
}
dets[i].bbox = b;
}
/* Draw the result boxes and print the detection info */
printf("-------------------\n");
for (int i = 0; i < det_num; i++) {
box b = dets[i].bbox;
// xywh2xyxy
int x1 = (b.x - b.w / 2);
int y1 = (b.y - b.h / 2);
int x2 = (b.x + b.w / 2);
int y2 = (b.y + b.h / 2);
printf("%s @ (%d %d %d %d) %.3f\n", coco_names[dets[i].cls], x1,y1,x2,y2,dets[i].score);
cv::rectangle(cloned, cv::Point(x1, y1), cv::Point(x2, y2), cv::Scalar(255, 255, 0),
3, 8, 0);
cv::putText(cloned, coco_names[dets[i].cls], cv::Point(x1, y1),
cv::FONT_HERSHEY_DUPLEX, 1.0, cv::Scalar(0, 0, 255), 2);
}
printf("-------------------\n");
// save or show picture
cv::imwrite(argv[3], cloned);
CVI_NN_CleanupModel(model);
printf("CVI_NN_CleanupModel succeeded\n");
return 0;
}
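5.3.2. Build and Test the Example
Run the commands below. Choose the cross compiler according to the system on the target board: aarch64-linux-gnu for arm64, riscv64-linux-musl-x86_64 for riscv64.
# Get the cross compiler (skip this if you already got it in an earlier test). The tutorial tests a riscv64 system and adds the cross compiler to the PATH environment variable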
wget https://sophon-file.sophon.cn/sophon-prod-s3/drive/23/03/07/16/host-tools.tar.gz
cd /workspace/tpu-mlir
tar xvf host-tools.tar.gz
cd host-tools
export PATH=$PATH:$(pwd)/gcc/riscv64-linux-musl-x86_64/bin
# For an aarch64 system, add this cross compiler to the PATH environment variable instead
#export PATH=$PATH:$(pwd)/gcc/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/bin
# Check the cross compiler
riscv64-unknown-linux-musl-gcc -v
Using built-in specs.
COLLECT_GCC=riscv64-unknown-linux-musl-gcc
COLLECT_LTO_WRAPPER=/home/dev/sg2000/cvi_mmf_sdk_test/host-tools/gcc/riscv64-linux-musl-x86_64/bin/
../libexec/gcc/riscv64-unknown-linux-musl/10.2.0/lto-wrapper
Target: riscv64-unknown-linux-musl
Configured with: /mnt/ssd/jenkins_iotsw/slave/workspace/Toolchain/build-gnu-riscv_4/
./source/riscv/riscv-gcc/configure # (omitted) ................
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (Xuantie-900 linux-5.10.4 musl gcc Toolchain V2.6.1 B-20220906
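Get the companion example files:
# Clone the companion example repository
git clone https://gitee.com/LubanCat/lubancat_sg2000_application_code.git
Build the deployment example:
# Switch to the yolov3 object detection example directory
cd lubancat_sg2000_application_code/examples/yolov3
# Build the yolov3 example. Set the -a parameter according to the board's system: aarch64 for arm64, musl_riscv64 for riscv64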
./build.sh -a musl_riscv64
-- CMAKE_C_COMPILER: riscv64-unknown-linux-musl-gcc
-- CMAKE_CXX_COMPILER: riscv64-unknown-linux-musl-g++
-- CMAKE_C_COMPILER: riscv64-unknown-linux-musl-gcc
-- CMAKE_CXX_COMPILER: riscv64-unknown-linux-musl-g++
-- The C compiler identification is GNU 10.2.0
-- The CXX compiler identification is GNU 10.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/dev/host-tools/gcc/riscv64-linux-musl-x86_64/bin/riscv64-unknown-linux-musl-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/dev//host-tools/gcc/riscv64-linux-musl-x86_64/bin/riscv64-unknown-linux-musl-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (0.9s)
-- Generating done (0.0s)
-- Build files have been written to: /home/dev/xxx/samples/yolov3/build/build_riscv_musl
[2/3] Install the project...
-- Install configuration: "RELEASE"
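# (output omitted) .................
Run the program on the board to test inference:
# On a riscv system, set this environment variable first
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib64v0p7_xthead/lp64d
# Run the program with the cvimodel path, the test image path, and the output image path
# ./yolov3 cvimodel image.jpg image_detected.jpg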
root@lubancat:/home/cat# ./yolov3 yolov3_416.cvimodel dog.jpg out.jpg
version: 1.4.0
yolov3 Build at 2024-07-25 11:46:21 For platform cv181x
Max SharedMem size:8306688
CVI_NN_RegisterModel succeeded
Input Tensor Number : 1
[0] data_raw, shape (1,3,416,416), count 519168, fmt 7
Output Tensor Number : 1
[0] layer107, shape (1,1,200,6), count 1200, fmt 0
CVI_NN_Forward using 295.636000 ms
CVI_NN_Forward succeeded
get detection num: 3
-------------------
truck @ (474 74 688 178) 0.950
bicycle @ (179 136 544 425) 0.995
dog @ (132 237 312 525) 0.984
-------------------
CVI_NN_CleanupModel succeeded
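The run prints the model's input and output tensor information and the detection results; the result image is saved as out.jpg in the current directory.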