7. YOLOv5

YOLOv5 is a single-stage object detection algorithm. It is an object detection architecture and family of models pre-trained on the COCO dataset, representing Ultralytics’ open-source research into future vision AI methods and incorporating lessons learned and best practices developed over thousands of hours of research and development. The latest release, YOLOv5 v7.0, includes YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x, etc. Beyond object detection, there are also application scenarios such as segmentation and classification.

The basic principle of YOLOv5, briefly: divide the whole picture into a grid of cells; each cell predicts the class and position of the objects whose centers fall inside it; the candidate boxes are then screened according to the intersection-over-union (IoU) between the predicted box and the reference box; finally the surviving prediction boxes are output.
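The intersection-over-union (IoU) used in that screening step is simply the ratio of the overlap area to the union area of two boxes. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

# Minimal IoU sketch; boxes are (x1, y1, x2, y2) corner coordinates
def box_iou(a, b):
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = area(a) + area(b) - intersection
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.14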

This chapter walks through a simple use of YOLOv5, and a simple deployment and test on the Lubancat board.

Tip

Test environment: the Lubancat RK board runs Debian 10; the PC runs WSL2 (Ubuntu 20.04); PyTorch 1.10.1; RKNN-Toolkit2 1.4.0; YOLOv5 v7.0; airockchip/yolov5 v6.2.

7.1. YOLOv5 environment installation

Install Python and the related dependency libraries, then clone the source code of the YOLOv5 repository.

# Create a python virtual environment and activate it
python3 -m venv .yolov5_env
source .yolov5_env/bin/activate

# Pull the latest yolov5 repository (PyTorch version)
git clone https://github.com/ultralytics/yolov5
cd yolov5

# Install dependency libraries
pip3 install -r requirements.txt

# Enter the python command line and check the installed environment
import torch
import utils
display = utils.notebook_init()

# The test environment here shows:
Checking setup...
YOLOv5 🚀 2023-2-20 Python-3.8.10 torch-1.10.1+cpu CPU
Setup complete(6 CPUs, 12.4 GB RAM, 77.3/251.0 GB disk)

7.2. Simple use of YOLOv5

7.2.1. Get the pre-trained weights file

Download the yolov5s.pt, yolov5m.pt, yolov5l.pt or yolov5x.pt weight files, which can be obtained directly from here. The suffixes n, s, m, l and x indicate the width and depth of the network: n is the smallest, with the fastest speed and the lowest precision.
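Alternatively, the weights can be fetched automatically through PyTorch Hub (network access assumed); a minimal sketch:

import torch

# Loading 'yolov5s' via PyTorch Hub downloads yolov5s.pt on first use
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
results = model('data/images/bus.jpg')  # sample image shipped with the repo
results.print()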

7.2.2. YOLOv5 simple test

Enter the yolov5 source directory and put the previously downloaded weight file in the current directory; the two test pictures are located in data/images/.

# simple test
# --source specifies the test data, which can be pictures, videos, etc.
# --weights specifies the weight file path; it can be a file you trained yourself, or the official one
sudo python3 detect.py --source ./data/images/ --weights yolov5s.pt

Test Results:

(Figure: detect.py test output log)

The output shows, in sequence: the basic configuration, network parameters, detection results, processing speed, and finally the save location. Switch to the runs/detect/exp2 directory to view the detection results:

(Figure: detection result images under runs/detect/exp2)

7.2.3. Convert to the RKNN model

Next, an RKNN model will be exported based on yolov5s.pt:

1. Convert it to yolov5s.onnx (a torchscript or similar model also works). First install the onnx dependency environment:

# Install the onnx environment and dependencies
pip3 install -r requirements.txt coremltools onnx onnx-simplifier onnxruntime

# Obtain the weight file; yolov5s.pt is used here
# Export the onnx model with the following command
sudo python3 export.py --weights yolov5s.pt --include  onnx

# Or use the following command to export TorchScript
sudo python3 export.py --weights yolov5s.pt --include  torchscript

The export shows:

(.yolov5_env) llh@YH-LONG:/mnt/e/Users/Administrator/Desktop/wsl_user/yolov5/yolov5-7.0$ python export.py --weights yolov5s.pt --include  onnx
export: data=data/coco128.yaml, weights=['yolov5s.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx']
YOLOv5 🚀 2023-2-28 Python-3.8.10 torch-1.10.1+cpu CPU

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs

PyTorch: starting from yolov5s.pt with output shape (1, 25200, 85) (14.1 MB)

ONNX: starting export with onnx 1.13.1...
ONNX: export success ✅ 1.8s, saved as yolov5s.onnx (28.0 MB)

Export complete (2.3s)
Results saved to /mnt/e/Users/Administrator/Desktop/wsl_user/yolov5/yolov5-7.0
Detect:          python detect.py --weights yolov5s.onnx
Validate:        python val.py --weights yolov5s.onnx
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.onnx')
Visualize:       https://netron.app

A yolov5s.onnx file will be generated in the current directory. We can view the model with Netron (https://netron.app/):
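Netron can also be run locally instead of in the browser; the pip package provides both a command-line entry and a Python API (a hedged aside, not part of the original steps):

pip3 install netron
netron yolov5s.onnx
# or from python: import netron; netron.start('yolov5s.onnx')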

(Figure: yolov5s.onnx model structure viewed in Netron)

2. As the figure above shows, the tail of the onnx model passes through the Detect layer. To adapt it for RKNN post-processing, this structure is removed here (the Sigmoid functions at the tail of the model are kept), so that the three feature maps are output directly. For details, see here

You need to modify the YOLOv5 source code as follows:

models/yolo.py
# Replace Detect.forward: skip the decoding inside the Detect layer and
# directly return the three sigmoid-activated feature maps
def forward(self, x):
    z = []  # inference output
    for i in range(self.nl):
        z.append(torch.sigmoid(self.m[i](x[i])))

    return z
export.py
# The modified model now returns a list, so also accept a list when reading the output shape
#shape = tuple((y[0] if isinstance(y, tuple) else y).shape)  # model output shape
shape = tuple((y[0] if (isinstance(y, tuple) or (isinstance(y, list))) else y).shape)  # model output shape

After the modification, re-run the export command ``python export.py --weights yolov5s.pt --include onnx``, then view the model with Netron (https://netron.app/):

(Figure: modified yolov5s.onnx with three output branches, viewed in Netron)
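To confirm that the Detect layer was removed, the modified model's outputs can also be checked with onnxruntime; a minimal sketch (the expected shapes follow from the figure above, e.g. three (1, 255, h, w) feature maps instead of (1, 25200, 85)):

import onnxruntime as ort

# After the modification the model should expose three feature-map outputs
sess = ort.InferenceSession('yolov5s.onnx')
for out in sess.get_outputs():
    print(out.name, out.shape)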

3. Convert to the RKNN model

Convert the onnx model obtained above into an RKNN model. You first need to install RKNN-Toolkit2 and its environment; refer to the “NPU” chapter.
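For reference, a typical RKNN-Toolkit2 1.4.0 installation on the PC looks roughly like the following; the requirements file and wheel names vary by release and Python version, so treat the paths below as placeholders rather than exact file names:

# Clone the rknn-toolkit2 repository (version 1.4.0 is used in this tutorial)
git clone https://github.com/rockchip-linux/rknn-toolkit2.git
cd rknn-toolkit2

# Install the dependencies and the toolkit wheel (names are version/python specific)
pip3 install -r doc/requirements_cp38-1.4.0.txt
pip3 install packages/rknn_toolkit2-1.4.0*-cp38-*.whl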

Use the RKNN-Toolkit2 model conversion function:

rknn_transfer.py (only part of the code is shown here; refer to the supporting example for the full file)
if __name__ == '__main__':

    # Create RKNN
    # If the conversion runs into problems, create the object with RKNN(verbose=True) and check the debug output
    # rknn = RKNN(verbose=True)
    rknn = RKNN()

    # Input image preprocessing
    img = cv2.imread(IMG_PATH)
    # img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

    # Pre-processing config: mean 0 / std 255 normalizes uint8 pixels to [0, 1]; target platform is rk3568
    print('--> Config model')
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]], target_platform="rk3568")
    print('done')

    # If you exported TorchScript earlier (renamed to a .pt file), uncomment the following to load that model instead of onnx
    #print('--> Loading model')
    #ret = rknn.load_pytorch(model=PYTORCH_MODEL, input_size_list=[[1,3,IMG_SIZE,IMG_SIZE]])
    #if ret != 0:
    #    print('Load model failed!')
    #    exit(ret)
    #print('done')

    # Load the ONNX model
    print('--> Loading model')
    ret = rknn.load_onnx(model=ONNX_MODEL)
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build the model; quantization is enabled by default
    print('--> Building model')
    ret = rknn.build(do_quantization=True, dataset=DATASET)
    #ret = rknn.build(do_quantization=False)
    if ret != 0:
        print('Build model failed!')
        exit(ret)
    print('done')

    # Quantization accuracy analysis, disabled by default; uncomment to enable
    #print('--> Accuracy analysis')
    #ret = rknn.accuracy_analysis(inputs=[img])
    #if ret != 0:
    #    print('Accuracy analysis failed!')
    #    exit(ret)
    #print('done')

    # Export the RKNN model
    print('--> Export rknn model')
    ret = rknn.export_rknn(RKNN_MODEL)
    if ret != 0:
        print('Export rknn model failed!')
        exit(ret)
    print('done')

    # Initialize the runtime environment. By default no target/device_id is set (the simulator is used); configure them when debugging on a device
    print('--> Init runtime environment')
    ret = rknn.init_runtime()
    # ret = rknn.init_runtime(target='rk3568', device_id='192.168.103.115:5555', perf_debug=True)
    if ret != 0:
        print('Init runtime environment failed!')
        exit(ret)
    print('done')

    # Debugging: evaluate the performance of the model; uncomment to enable
    # rknn.eval_perf(inputs=[img], is_print=True)

    rknn.release()
# Run the conversion program
sudo python3 rknn_transfer.py

# The output is shown below; yolov5s.rknn is generated in the current directory
W __init__: rknn-toolkit2 version: 1.4.0-22dcfef4
--> Config model
done
--> Loading model
done
--> Building model
Analysing : 100%|███████████████████████████████████████████████| 145/145 [00:00<00:00, 2992.56it/s]
Quantizating : 100%|█████████████████████████████████████████████| 145/145 [00:00<00:00, 320.39it/s]
W build: The default input dtype of 'images' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of 'output0' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of '329' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of '331' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
done
--> Export rknn model
done

# Uncomment the eval_perf call and the rknn.init_runtime(target=...) line above to enable performance evaluation and simple debugging:
--> Init runtime environment
W init_runtime: Target is None, use simulator!
W init_runtime: Flag perf_debug has been set, it will affect the performance of inference!
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.4.0 (bb6dac9 build: 2022-08-29 16:17:01)(null)
D RKNNAPI:   DRV: rknn_server: 1.3.0 (121b661 build: 2022-04-29 11:11:47)
D RKNNAPI:   DRV: rknnrt: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
D RKNNAPI: ==============================================
done
===================================================================================================================
                            Performance
        #### The performance result is just for debugging, ####
        #### may worse than actual performance!            ####
===================================================================================================================
Total Weight Memory Size: 7312768
Total Internal Memory Size: 7782400
Predict Internal Memory RW Amount: 138880000
Predict Weight Memory RW Amount: 7312768
ID   OpType           DataType Target InputShape                                   OutputShape            DDR Cycles     NPU Cycles     Total Cycles   Time(us)       MacUsage(%)    RW(KB)         FullName

0    InputOperator    UINT8    CPU    \                                            (1,3,640,640)          0              0              0              9              \              1200.00        InputOperator:images
1    Conv             UINT8    NPU    (1,3,640,640),(32,3,6,6),(32)                (1,32,320,320)         687110         691200         691200         8407           9.14           4409.25        Conv:Conv_0
2    exSwish          INT8     NPU    (1,32,320,320)                               (1,32,320,320)         997336         0              997336         3737           \              6400.00        exSwish:Sigmoid_1_2swish
.....(Omitted)
145  Sigmoid          INT8     NPU    (1,255,80,80)                                (1,255,80,80)          498668         0              498668         1895           \              3200.00        Sigmoid:Sigmoid_199
146  OutputOperator   INT8     CPU    (1,255,80,80),(1,80,80,256)                  \                      0              0              0              165            \              3200.00        OutputOperator:output0
147  OutputOperator   INT8     CPU    (1,255,40,40),(1,40,40,256)                  \                      0              0              0              52             \              800.00         OutputOperator:329
148  OutputOperator   INT8     CPU    (1,255,20,20),(1,20,20,256)                  \                      0              0              0              34             \              220.00         OutputOperator:331
Total Operator Elapsed Time(us): 77776

===================================================================================================================

The RKNN model is exported to the current directory. If the eval_perf call is uncommented, performance evaluation runs for simple debugging.
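Before moving to the board, you can also sanity-check the converted model on the simulator; a minimal sketch, run before rknn.release() and reusing the img prepared at the top of the script:

# Run inference on the simulator: the three outputs should match the
# feature-map shapes seen in the onnx model, e.g. (1, 255, 80, 80), etc.
outputs = rknn.inference(inputs=[img])
for i, out in enumerate(outputs):
    print(i, out.shape)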

7.2.4. Deploy to the Lubancat board

After exporting the RKNN model, use RKNN Toolkit Lite2 for a simple deployment on the board side.

rknnlite_inference.py (only part of the code is shown here; refer to the supporting example for the full file)
if __name__ == '__main__':

    # Create an RKNNLite object
    rknn_lite = RKNNLite()

    # Load the RKNN model
    print('--> Load RKNN model')
    ret = rknn_lite.load_rknn(rknn_model)
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)
    print('done')

    # Initialize the runtime environment
    print('--> Init runtime environment')
    ret = rknn_lite.init_runtime()
    if ret != 0:
        print('Init runtime environment failed!')
        exit(ret)
    print('done')

    # Input image preprocessing
    img = cv2.imread(IMG_PATH)
    img, ratio, (dw, dh) = letterbox(img, new_shape=(640, 640))  # resize with gray padding (letterbox)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Inference
    print('--> Running model')
    outputs = rknn_lite.inference(inputs=[img])
    print('done')

    # Post-processing
    boxes, classes, scores = yolov5_post_process(outputs)

    img_1 = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    if boxes is not None:
        draw(img_1, boxes, scores, classes)

    # Show or save the picture
    cv2.imwrite("out.jpg", img_1)
    # cv2.imshow("post process result", img_1)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()

    rknn_lite.release()
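The letterbox helper called above comes from the supporting example (it mirrors the one in the yolov5 repo); a minimal sketch matching the call signature used here, which scales the image with an unchanged aspect ratio and pads the remainder with gray borders:

import cv2

def letterbox(img, new_shape=(640, 640), color=(114, 114, 114)):
    # Scale to fit new_shape while keeping the aspect ratio
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)
    new_unpad = (int(round(w * r)), int(round(h * r)))
    dw = (new_shape[1] - new_unpad[0]) / 2  # width padding per side
    dh = (new_shape[0] - new_unpad[1]) / 2  # height padding per side
    if (w, h) != new_unpad:
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    # Split the padding evenly between the two sides
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right,
                             cv2.BORDER_CONSTANT, value=color)
    return img, r, (dw, dh)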

A simple test shows:

--> Load RKNN model
done
--> Init runtime environment
I RKNN: [11:59:53.211] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [11:59:53.211] RKNN Driver Information: version: 0.7.2
I RKNN: [11:59:53.213] RKNN Model Information: version: 1, toolkit version: 1.4.0-22dcfef4(compiler version: 1.4.0 (3b4520e4f@2022-09-05T20:52:35)), target: RKNPU lite, target platform: rk3568, framework name: PyTorch, framework layout: NCHW
done
--> Running model
done
class: person, score: 0.8889630436897278
box coordinate left,top,right,down: [370, 168, 574, 495]
class: person, score: 0.5832323431968689
box coordinate left,top,right,down: [59, 242, 362, 493]
class: tie, score: 0.668832540512085
box coordinate left,top,right,down: [221, 360, 249, 491]

The result is saved as out.jpg in the current directory. View the picture:

(Figure: out.jpg detection result)
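The yolov5_post_process function also belongs to the supporting example. In essence, each of the three sigmoid-activated feature maps is decoded against its grid and anchor set, low-confidence cells are dropped, and NMS keeps the best boxes. A compressed sketch of the per-branch decoding, assuming the standard COCO anchors and strides 8/16/32 (the full version also applies NMS per class):

import numpy as np

OBJ_THRESH = 0.25
# Standard YOLOv5 COCO anchors, grouped per branch (strides 8, 16, 32)
ANCHORS = [[[10, 13], [16, 30], [33, 23]],
           [[30, 61], [62, 45], [59, 119]],
           [[116, 90], [156, 198], [373, 326]]]

def decode_branch(out, anchors, stride):
    # out: one sigmoid-activated output, shape (1, 255, h, w)
    h, w = out.shape[2], out.shape[3]
    out = out.reshape(3, 85, h, w)  # 3 anchors x (xywh + objectness + 80 classes)
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    boxes, scores = [], []
    for a in range(3):
        obj = out[a, 4]                 # objectness map
        mask = obj >= OBJ_THRESH
        # YOLOv5 decoding: xy = (2*sig - 0.5 + grid) * stride, wh = (2*sig)^2 * anchor
        bx = (out[a, 0] * 2 - 0.5 + gx) * stride
        by = (out[a, 1] * 2 - 0.5 + gy) * stride
        bw = (out[a, 2] * 2) ** 2 * anchors[a][0]
        bh = (out[a, 3] * 2) ** 2 * anchors[a][1]
        boxes.append(np.stack([bx[mask], by[mask], bw[mask], bh[mask]], axis=-1))
        scores.append(obj[mask] * out[a, 5:][:, mask].max(axis=0))
    # Boxes are (cx, cy, w, h) in input-image pixels
    return np.concatenate(boxes), np.concatenate(scores)

# The three branches are decoded with strides 8/16/32 and merged, then
# IoU-based NMS selects the final boxes, classes and scores.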

7.3. airockchip/yolov5 simple test

The above was a simple test using the official YOLOv5 v7.0. Next we use airockchip/yolov5; this repository's yolov5 is optimized for deployment on RKNPU devices. The tutorial tests v6.2.

Pull the latest airockchip/yolov5 repository; the installation is similar, refer to the previous section:

# The V6.2 version was tested
git clone https://github.com/airockchip/yolov5.git
cd yolov5

To obtain a weights file, we need to retrain:

# Retrain and obtain the optimized weight file. This starts from the yolov5s.pt weights; alternatively use --cfg to specify a configuration file
sudo python3 train.py --data coco128.yaml --weights yolov5s.pt  --img 640

# Training prints a lot of information; at the end it outputs:
...
Optimizer stripped from runs/train/exp/weights/last.pt, 14.9MB
Optimizer stripped from runs/train/exp/weights/best.pt, 14.9MB

Validating runs/train/exp/weights/best.pt...
...
# Training analysis and weights are stored in runs/train/exp

The weight files are saved under runs/train/exp/weights/. Rename best.pt to yolov5s_relu.pt and copy it to the source root directory.
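For example, assuming the first training run (so the exp directory):

# Rename the trained weights and copy them to the repository root
cp runs/train/exp/weights/best.pt ./yolov5s_relu.pt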

Next, export TorchScript using the command:

# --weights specifies the path of the weight file
# --rknpu specifies the platform (supported: RK1808, RV1109, RV1126, RK3399PRO, RK3566, RK3568, RK3588, RV1103, RV1106)
# --include specifies the export format, e.g. onnx; the default is torchscript

sudo python3 export.py --weights yolov5s_relu.pt --rknpu rk3568

# Normal conversion will output:
export: data=data/coco128.yaml, weights=['yolov5s.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, train=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['torchscript'], rknpu=rk3568
YOLOv5 🚀 2022-10-28 Python-3.8.10 torch-1.10.1+cpu CPU

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
---> save anchors for RKNN
[[10.0, 13.0], [16.0, 30.0], [33.0, 23.0], [30.0, 61.0], [62.0, 45.0], [59.0, 119.0], [116.0, 90.0], [156.0, 198.0], [373.0, 326.0]]

PyTorch: starting from yolov5s.pt with output shape (1, 255, 80, 80) (14.1 MB)

TorchScript: starting export with torch 1.10.1+cpu...
TorchScript: export success, saved as yolov5s.torchscript (27.8 MB)

Export complete (1.92s)
Results saved to /mnt/e/Users/Administrator/Desktop/wsl_user/yolov5/yolov5-master
Detect:          python detect.py --weights yolov5s.torchscript
Validate:        python val.py --weights yolov5s.torchscript
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.torchscript')
Visualize:       https://netron.app


# Or use the following command to export the onnx model, generating a yolov5s.onnx file in the current directory
python export.py --weights yolov5s.pt --include onnx --rknpu rk3568

Exporting TorchScript generates a yolov5s_relu.torchscript file in the current directory; rename it to a .pt file.
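For example, assuming the file name above:

mv yolov5s_relu.torchscript yolov5s_relu.pt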

7.3.1. Convert to the RKNN model and deploy to the LubanCat RK356X board

Test using the conversion file from before and run the program directly to export the RKNN file, or use the tools here for model conversion, model evaluation, model deployment, etc.
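When the input is the torchscript model, the loading step in the earlier rknn_transfer.py changes from load_onnx to load_pytorch (this variant is already present, commented out, in the script):

# Load the renamed torchscript model instead of onnx
ret = rknn.load_pytorch(model='./yolov5s_relu.pt', input_size_list=[[1, 3, 640, 640]])
if ret != 0:
    print('Load model failed!')
    exit(ret)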

# Run the test, export the RKNN model:
W __init__: rknn-toolkit2 version: 1.4.0-22dcfef4
--> Config model
done
--> Loading model
PtParse: 100%|███████████████████████████████████████████████████| 698/698 [00:01<00:00, 602.67it/s]
done
--> Building model
Analysing : 100%|████████████████████████████████████████████████| 145/145 [00:00<00:00, 381.87it/s]
Quantizating : 100%|█████████████████████████████████████████████| 145/145 [00:00<00:00, 518.54it/s]
W build: The default input dtype of 'x.1' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of '172' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of '173' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of '174' is changed from 'float32' to 'int8' in rknn model for performance!
                    Please take care of this change when deploy rknn model with Runtime API!
done
--> Export rknn model
done
--> Init runtime environment
W init_runtime: Target is None, use simulator!
W init_runtime: Flag perf_debug has been set, it will affect the performance of inference!
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.4.0 (bb6dac9 build: 2022-08-29 16:17:01)(null)
D RKNNAPI:   DRV: rknn_server: 1.3.0 (121b661 build: 2022-04-29 11:11:47)
D RKNNAPI:   DRV: rknnrt: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
D RKNNAPI: ==============================================
done
===================================================================================================================
                            Performance
        #### The performance result is just for debugging, ####
        #### may worse than actual performance!            ####
===================================================================================================================
Total Weight Memory Size: 7312768
Total Internal Memory Size: 6144000
Predict Internal Memory RW Amount: 86931200
Predict Weight Memory RW Amount: 7312768
ID   OpType           DataType Target InputShape                                   OutputShape            DDR Cycles     NPU Cycles     Total Cycles   Time(us)       MacUsage(%)    RW(KB)         FullName

0    InputOperator    UINT8    CPU    \                                            (1,3,640,640)          0              0              0              8              \              1200.00        InputOperator:x.1
1    ConvRelu         UINT8    NPU    (1,3,640,640),(32,3,6,6),(32)                (1,32,320,320)         687110         691200         691200         8387           9.16           4409.25        Conv:input.4_Conv
2    ConvRelu         INT8     NPU    (1,32,320,320),(64,32,3,3),(64)              (1,64,160,160)         750885         921600         921600         2247           45.57          4818.50        Conv:input.6_Conv
.....(Omitted)
80   Conv             INT8     NPU    (1,512,20,20),(255,512,1,1),(255)            (1,255,20,20)          66931          102000         102000         180            62.96          429.50         Conv:1189_Conv
81   Sigmoid          INT8     NPU    (1,255,20,20)                                (1,255,20,20)          31167          0              31167          141            \              200.00         Sigmoid:1190_Sigmoid
82   OutputOperator   INT8     CPU    (1,255,80,80),(1,80,80,256)                  \                      0              0              0              161            \              3200.00        OutputOperator:172
83   OutputOperator   INT8     CPU    (1,255,40,40),(1,40,40,256)                  \                      0              0              0              46             \              800.00         OutputOperator:173
84   OutputOperator   INT8     CPU    (1,255,20,20),(1,20,20,256)                  \                      0              0              0              24             \              220.00         OutputOperator:174
Total Operator Elapsed Time(us): 44412

===================================================================================================================

Tip

In the test above, the CPU frequency is 1.8 GHz, the DDR frequency is 1.056 GHz, and the NPU frequency is 900 MHz.

The subsequent deployment test is the same as before and is not repeated here.