2. Qwen2-VL

Qwen2-VL is a multimodal model based on vision-language pre-training. It accepts joint image and text input and produces text output.

GitHub: https://github.com/QwenLM/Qwen2-VL

This chapter deploys the Qwen2-VL-2B-Instruct model on a LubanCat board and uses it to describe input images.

2.1. Using Qwen2-VL

The 2B and 7B models of the Qwen2-VL series, as well as their quantized variants, are available on Hugging Face and ModelScope; see Qwen/Qwen2-VL-2B-Instruct.

Next, Qwen2-VL is tested with HF Transformers, following the description in the Qwen/Qwen2-VL-2B-Instruct Model Card. Set up the test environment:

# Create a simple test environment
conda create -n qwen2_vl python=3.10
conda activate qwen2_vl

# Install transformers and torch
(qwen2_vl) llh@llh:/xxx$ pip install transformers torch torchvision

# Install the qwen-vl-utils toolkit (optional)
(qwen2_vl) llh@llh:/xxx$ pip install qwen-vl-utils

Pull the model files:

# Install git-lfs
git lfs install

# Get the Qwen2-VL-2B-Instruct model files
git clone https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
# Or fetch from the mirror site (optional)
git clone https://hf-mirror.com/Qwen/Qwen2-VL-2B-Instruct

# Or get the Qwen2-VL-7B-Instruct model files
git clone https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct
# Or fetch from the mirror site (optional)
git clone https://hf-mirror.com/Qwen/Qwen2-VL-7B-Instruct
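
Alternatively, the model files can be fetched programmatically with the modelscope Python package (a small optional sketch, assuming modelscope is installed and that the model id on ModelScope matches the Hugging Face one):

from modelscope import snapshot_download

# Download Qwen2-VL-2B-Instruct from ModelScope and print the local cache directory
model_dir = snapshot_download("Qwen/Qwen2-VL-2B-Instruct")
print(model_dir)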

Test program; modify the image path and other settings as needed:

infer.py
from PIL import Image
import requests
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

# Load the model in half-precision on the available device(s)
path = "path/to/Qwen2-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    path, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(path)

# Image
#url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
#image = Image.open(requests.get(url, stream=True).raw)
image = Image.open('./data/demo.jpg')
conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
            },
            {"type": "text", "text": "描述这幅图像"},
        ],
    }
]

# Preprocess the inputs
text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
# Expected output: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>
# Describe this image.<|im_end|>\n<|im_start|>assistant\n'

inputs = processor(
    text=[text_prompt], images=[image], padding=True, return_tensors="pt"
)
inputs = inputs.to(model.device)  # keep the inputs on the same device as the model (CPU or GPU)

# Inference: Generation of the output
output_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids = [
    output_ids[len(input_ids) :]
    for input_ids, output_ids in zip(inputs.input_ids, output_ids)
]
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)
print(output_text)

Run the program; it outputs a text description of the image:

# Change the model path in the script to the directory of the model files pulled earlier
(qwen2_vl) llh@llh:/xxx$ python infer.py
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████| 2/2 [01:16<00:00, 38.45s/it]
['This image depicts an astronaut resting on the lunar surface. The astronaut, wearing a white spacesuit, sits next to a green cooler and holds a bottle of green beer.
The Earth and a starry sky are visible in the background, conveying the astronaut's solitude and tranquility on the Moon.
The rocks and sand of the lunar surface, as well as the distant Earth, are clearly visible. The whole scene has a science-fiction, exploratory atmosphere.']

2.2. Model Conversion

The Qwen2-VL model is a ViT vision encoder connected in series with a Qwen2 language model, and for deployment it is split into these two parts. The model conversion scripts follow the examples in the rknn-llm project.
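
To make this split concrete, the sketch below runs the two halves separately with the Hugging Face implementation. It is only an illustration, assuming a Transformers release in which the vision tower is exposed as model.visual and the processor returns pixel_values and image_grid_thw; on the board, part 1 runs as the RKNN model and part 2 as the RKLLM model:

# Illustrative sketch only, not part of the rknn-llm examples
import torch
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

path = "path/to/Qwen2-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(path, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(path)

image = Image.open("./data/demo.jpg")
inputs = processor(text=["<|vision_start|><|image_pad|><|vision_end|>"],
                   images=[image], return_tensors="pt")

# Part 1: the ViT vision encoder turns the image patches into embedding vectors
# (this is the part exported to ONNX/RKNN below).
with torch.no_grad():
    image_embeds = model.visual(inputs.pixel_values.to(model.dtype),
                                grid_thw=inputs.image_grid_thw)
print(image_embeds.shape)  # (number of image tokens, 1536) for the 2B model

# Part 2: the Qwen2 language model consumes these embeddings in place of the
# <|image_pad|> tokens and generates text (this is the part exported to RKLLM below).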

2.2.1. Export the vision ONNX model

# Get rknn-llm
git clone https://github.com/airockchip/rknn-llm

# Switch to the example directory
cd rknn-llm/examples/Qwen2-VL-2B_Demo

# Note: change the model path in the script
path = 'path/Qwen2-VL-2B-Instruct'

(qwen2_vl) llh@llh:/xxx$ python export/export_vision.py
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████| 2/2 [00:36<00:00, 18.16s/it]
#...... omitted

This generates the qwen2_vl_2b_vision.onnx file in the onnx directory.
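
Before converting it, the exported graph can be quickly sanity-checked on the PC with onnxruntime (a small optional sketch, assuming onnxruntime is installed via pip install onnxruntime):

import onnxruntime as ort

# Load the exported vision encoder and print its input/output signatures
sess = ort.InferenceSession("./onnx/qwen2_vl_2b_vision.onnx",
                            providers=["CPUExecutionProvider"])
for i in sess.get_inputs():
    print("input :", i.name, i.shape, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.shape, o.type)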

2.2.2. Convert to an RKNN model

Convert the qwen2_vl_2b_vision.onnx file generated above into an RKNN model with the Toolkit2 tool. For setting up the rknn-Toolkit2 environment, refer to the earlier Toolkit2 chapter.

# The tutorial is tested on LubanCat-4; for LubanCat-3, change target_platform = "rk3576" and the model path model_path

(toolkit2_2.3)llh@llh:/xxx$ python export/export_vision_rknn.py
I rknn-toolkit2 version: 2.3.0
I Loading : 100%|█████████████████████████████████████████████████| 551/551 [00:22<00:00, 24.37it/s]
I OpFusing 0: 100%|██████████████████████████████████████████████| 100/100 [00:00<00:00, 174.66it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:02<00:00, 34.42it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:11<00:00,  8.75it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:12<00:00,  8.27it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:14<00:00,  6.82it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:14<00:00,  6.75it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:14<00:00,  6.68it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:15<00:00,  6.50it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:15<00:00,  6.43it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:31<00:00,  3.13it/s]
I Saving : 100%|██████████████████████████████████████████████████| 295/295 [00:16<00:00, 18.32it/s]
I rknn building ...
I rknn building done.

Testing on LubanCat-4, this generates the qwen2_vl_2b_vision_rk3588.rknn file in the rknn directory.
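
For reference, export_vision_rknn.py boils down to the standard rknn-toolkit2 conversion flow, roughly as sketched below; the exact options (normalization values, quantization settings and so on) are defined in the example script itself:

from rknn.api import RKNN

rknn = RKNN(verbose=False)
# target_platform must match the board: rk3588 for LubanCat-4, rk3576 for LubanCat-3
rknn.config(target_platform="rk3588")
rknn.load_onnx(model="./onnx/qwen2_vl_2b_vision.onnx")
# Build without INT8 quantization; the vision encoder then runs in FP16 on the NPU
rknn.build(do_quantization=False)
rknn.export_rknn("./rknn/qwen2_vl_2b_vision_rk3588.rknn")
rknn.release()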

2.2.3. Export the RKLLM model

Use the rkllm-toolkit tool to export the RKLLM model. For setting up the rkllm-toolkit environment, refer to the earlier RKLLM chapter or to Rockchip_RKLLM_SDK_CN_xxx.pdf.

First, preprocess the raw data: convert the original JSON format into the form the model accepts, to be used as quantization data.

# Set path to the directory of the Qwen2-VL-2B-Instruct model files pulled earlier
(rkllm_1.1.4) llh@llh:/xxx$ python data/make_input_embeds_for_quantize.py
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.84it/s]
inputs_embeds torch.Size([1, 249, 1536])
#...... omitted
inputs_embeds torch.Size([1, 227, 1536])
inputs_embeds torch.Size([1, 280, 1536])
inputs_embeds torch.Size([1, 300, 1536])
inputs_embeds torch.Size([1, 334, 1536])
inputs_embeds torch.Size([1, 386, 1536])
100%|██████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:08<00:00,  2.46it/s]
Done

Export the RKLLM model; remember to change the model path and the target platform.

# Change the model path, and in llm.build set target_platform='rk3588' and quantized_dtype='w8a8'
# The tutorial is tested on LubanCat-4; for LubanCat-3, change it to:
 ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w4a16',
                 quantized_algorithm='normal', target_platform='rk3576', num_npu_core=2, extra_qparams=qparams, dataset=dataset)


# Run the script to load the model and export the RKLLM model
(rkllm_1.1.4) llh@llh:/xxx$ python export/export_rkllm.py
INFO: rkllm-toolkit version: 1.1.4
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████| 2/2 [00:24<00:00, 12.20s/it]
WARNING: rkllm-toolkit only exports the language model of Qwen2VL!
Optimizing model: 100%|███████████████████████████████████████████████████████████████████████████████| 28/28 [01:03<00:00,  2.27s/it]
Building model: 100%|█████████████████████████████████████████████████████████████████████████████████| 399/399 [00:07<00:00, 53.74it/s]
INFO: The token_id of eos is set to 151645
INFO: The token_id of pad is set to 151643
INFO: The token_id of bos is set to 151643
Converting model: 100%|████████████████████████████████████████████████████████████████████| 339/339 [00:00<00:00, 3590578.42it/s]
INFO: Exporting the model, please wait ....
[=================================================>] 597/597 (100%)
INFO: Model has been saved to ./Qwen2-VL-2B-Instruct.rkllm!
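
For reference, export_rkllm.py essentially follows the standard rkllm-toolkit flow sketched below. The dataset path is a placeholder for the quantization data produced in the previous step, qparams is left empty, and num_npu_core=3 is assumed for RK3588; check the example script for the exact values:

from rkllm.api import RKLLM

modelpath = "path/to/Qwen2-VL-2B-Instruct"
dataset = "path/to/quantization_data.json"  # data produced by make_input_embeds_for_quantize.py
qparams = None                              # no extra quantization parameters in this sketch

llm = RKLLM()

# Load the Hugging Face checkpoint (rkllm-toolkit only exports the Qwen2 language part)
ret = llm.load_huggingface(model=modelpath)

# Quantize and build for the target NPU; on LubanCat-3 use target_platform='rk3576',
# quantized_dtype='w4a16' and num_npu_core=2 instead
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w8a8',
                quantized_algorithm='normal', target_platform='rk3588', num_npu_core=3,
                extra_qparams=qparams, dataset=dataset)

# Export the converted model
ret = llm.export_rkllm("./Qwen2-VL-2B-Instruct.rkllm")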

2.3. Deployment Test

Get the rknn-llm project files on the board:

# Get the test example on the board (the tutorial is tested on LubanCat-4)
git clone https://github.com/airockchip/rknn-llm

# To be added

# Switch to the example directory
cd rknn-llm/examples/Qwen2-VL-2B_Demo/deploy

Compile the test example on the board:

# For local (native) compilation, change the compiler in build-linux.sh
GCC_COMPILER=aarch64-linux-gnu

cat@lubancat:~/rknn-llm/examples/Qwen2-VL-2B_Demo/deploy$ ./build-linux.sh
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc - skipped
# omitted ..............
[ 70%] Linking CXX executable llm_test
[ 70%] Built target llm_test
[ 80%] Linking CXX executable llm
[ 80%] Built target llm
[ 90%] Linking CXX executable imgenc
[100%] Linking CXX executable demo
[100%] Built target imgenc
[100%] Built target demo
[ 30%] Built target demo
[ 50%] Built target llm_test
[ 70%] Built target llm
[100%] Built target imgenc
# omitted ..............

Switch to the install/demo_Linux_aarch64/ directory and run the demo program. When the user enters "<image>Please describe the image", the example describes the test image below:

(test image: demo.jpg)

# Transfer the previously exported qwen2_vl_2b_vision_rk3588.rknn and Qwen2-VL-2B-Instruct.rkllm models to the board

# Run the demo program
# Usage: ./demo image_path encoder_model_path llm_model_path max_new_tokens max_context_len
cat@lubancat:~/xxx/install/demo_Linux_aarch64$ export LD_LIBRARY_PATH=./lib
cat@lubancat:~/xxx/install/demo_Linux_aarch64$ ./demo demo.jpg ~/qwen2_vl_2b_vision_rk3588.rknn ~/Qwen2-VL-2B-Instruct.rkllm 128 512
I rkllm: rkllm-runtime version: 1.1.4, rknpu driver version: 0.9.8, platform: RK3588

rkllm init success
main: LLM Model loaded in  2637.91 ms
model input num: 1, output num: 1
input tensors:
index=0, name=onnx::Expand_0, n_dims=4, dims=[1, 392, 392, 3], n_elems=460992, size=921984, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=6076, n_dims=2, dims=[196, 1536, 0, 0], n_elems=301056, size=602112, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
model input height=392, width=392, channel=3
main: ImgEnc Model loaded in  2297.09 ms

user: <image>Please describe the image
robot: This picture shows an astronaut on the lunar surface. The astronaut is wearing a white spacesuit with a helmet and gloves, and is resting or relaxing.
He is holding a green bottle and appears to be drinking from it. The background is the vast lunar surface, with some rocks and sand visible.
In the distance, the Earth and other stars can be seen, giving the feeling of being out in space. The overall scene is full of science-fiction elements and the theme of space exploration.