9. SenseVoice¶
SenseVoice is an audio foundation model with audio-understanding capabilities, covering automatic speech recognition (ASR), language identification (LID), speech emotion recognition (SER), and acoustic event classification (AEC) / acoustic event detection (AED).
SenseVoice is trained on more than 400,000 hours of data, supports over 50 languages, and outperforms the Whisper models in recognition accuracy. It converts input audio into the corresponding text sequence, using special tokens to specify the different tasks.
9.1. RKNN Model Conversion¶
Create a virtual environment on the PC and install the SenseVoice dependencies:
# Create a virtual environment with conda
conda create -n sensevoice python=3.11
conda activate sensevoice
# Install the required libraries and software
git clone https://github.com/alibaba/FunASR.git
cd FunASR
pip3 install -qq -e ./
cd ..
# Pull the SenseVoice source code and install its dependencies
git clone https://github.com/FunAudioLLM/SenseVoice
cd SenseVoice
pip install -qq -r ./requirements.txt
cd ..
pip install soundfile onnx onnxruntime kaldi-native-fbank librosa
# Configure a pip mirror (optional)
# pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/
Fetch the SenseVoiceSmall model (optional):
git clone https://www.modelscope.cn/iic/SenseVoiceSmall.git
First, write a small script to export the SenseVoiceSmall model to ONNX, referring to the export program in sherpa-onnx <https://github.com/k2-fsa/sherpa-onnx>, and then convert it into an RKNN model with the rknn-toolkit2 tool.
# ... omitted ...
def main():
    # model, params = SenseVoiceSmall.from_pretrained(model="iic/SenseVoiceSmall", device="cpu")
    model, params = SenseVoiceSmall.from_pretrained(model="../models/SenseVoiceSmall", device="cpu")
    display_params(params)
    generate_tokens(params)

    # Patch the forward methods so the exported graph takes plain tensors
    model.__class__.forward = modified_forward
    model.encoder.__class__.forward = encoder_forward

    x = torch.randn(1, INPUT_LEN, 560, dtype=torch.float32)
    x_length = torch.tensor([100], dtype=torch.int32)
    language = torch.tensor([3], dtype=torch.int32)
    text_norm = torch.tensor([15], dtype=torch.int32)

    opset_version = 13
    filename = "model.onnx"
    torch.onnx.export(
        model,
        (x, x_length, language, text_norm),
        filename,
        opset_version=opset_version,
        input_names=["x", "x_length", "language", "text_norm"],
        output_names=["logits"],
    )

if __name__ == "__main__":
    torch.manual_seed(20250717)
    main()
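In the export inputs above, `language` and `text_norm` are integer token IDs rather than strings. A hedged sketch of the mapping (the dictionaries below follow the ones commonly used in the SenseVoiceSmall configuration and the sherpa-onnx export script; verify the exact values against your model):

```python
# Assumed token-ID mappings for SenseVoiceSmall (verify against your model's
# config). Under these assumptions, language=3 and text_norm=15 in the export
# above would mean "zh" with inverse text normalization disabled ("woitn").
LID_DICT = {"auto": 0, "zh": 3, "en": 4, "yue": 7, "ja": 11, "ko": 12, "nospeech": 13}
TEXTNORM_DICT = {"withitn": 14, "woitn": 15}

def task_ids(language: str = "auto", use_itn: bool = False) -> tuple[int, int]:
    """Map a language name and ITN flag to the two integer model inputs."""
    return LID_DICT[language], TEXTNORM_DICT["withitn" if use_itn else "woitn"]
```

For example, `task_ids("zh", use_itn=False)` would reproduce the `(3, 15)` pair hard-coded in the export script.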
Run the export_onnx.py script; you can change the model path and the exported ONNX file name as needed:
(sensevoice) llh@llh:/xxx/SenseVoice$ python export_onnx.py
Loading remote code successfully: model
----------params----------
# ... omitted ...
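The encoder input exported above has shape (1, INPUT_LEN, 560). The 560-dim feature comes from low frame rate (LFR) stacking of 80-dim fbank frames: assuming the usual FunASR settings of lfr_m = 7 stacked frames with a stride of lfr_n = 6, each output frame is 7 × 80 = 560 dims. A sketch under those assumptions (the real demo's padding details may differ):

```python
import numpy as np

def apply_lfr(feats: np.ndarray, lfr_m: int = 7, lfr_n: int = 6) -> np.ndarray:
    """Stack lfr_m consecutive fbank frames with stride lfr_n.

    feats: (T, 80) fbank features -> (ceil(T / lfr_n), lfr_m * 80).
    The tail is padded by repeating the last frame (an assumption;
    check the demo's own preprocessing code).
    """
    T, dim = feats.shape
    n_out = (T + lfr_n - 1) // lfr_n
    # Pad so that every window of lfr_m frames is complete
    pad = (n_out - 1) * lfr_n + lfr_m - T
    if pad > 0:
        feats = np.vstack([feats, np.repeat(feats[-1:], pad, axis=0)])
    out = np.empty((n_out, lfr_m * dim), dtype=feats.dtype)
    for i in range(n_out):
        out[i] = feats[i * lfr_n : i * lfr_n + lfr_m].reshape(-1)
    return out
```

With 100 input frames this yields a (17, 560) array, matching the model's 560-dim input.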
Then use the conversion script from the companion examples to convert the SenseVoice ONNX model into an RKNN model:
# Usage: python3 convert.py onnx_model_path [platform] [dtype(optional)] [output_rknn_path(optional)]
# platform choose from [rk3562,rk3566,rk3568,rk3576,rk3588]
# dtype choose from [fp] for [rk3562,rk3566,rk3568,rk3576,rk3588]
(toolkit2_2.3) llh@llh:/xxx$ python convert.py model.onnx rk3588 fp model.rknn
I rknn-toolkit2 version: 2.3.2
--> Config model
done
--> Loading model
# ... omitted ...
I Loading : 0%| | 0/917 [00:00<?, ?it/s]
I Loading : 94%|███████████████████████████████████████████▉ | 858/917 [00:00<00:00, 8536.70it/s]
I Loading : 100%|███████████████████████████████████████████████| 917/917 [00:00<00:00, 6075.70it/s]
done
--> Building model
# ... omitted ...
I OpFusing 1: 0%|█████████████████████████████████████ | 0/100 [00:00<?, ?it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:16<00:00, 5.94it/s]
I rknn building ...
# ... omitted ...
E RKNN: [15:09:24.190] Unkown op target: 0
I rknn building done.
done
--> Export rknn model
done
The converted model is saved in the current directory as model.rknn.
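The companion conversion script is essentially a thin wrapper around the rknn-toolkit2 Python API. A hedged sketch of what such a conversion looks like (the function and flow are illustrative; the actual convert.py may differ in details):

```python
# Sketch of an ONNX -> RKNN conversion with rknn-toolkit2 (illustrative,
# not the companion script itself).
SUPPORTED_PLATFORMS = ["rk3562", "rk3566", "rk3568", "rk3576", "rk3588"]

def check_platform(platform: str) -> str:
    """Validate the target platform argument."""
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(f"platform must be one of {SUPPORTED_PLATFORMS}")
    return platform

def convert_to_rknn(onnx_path: str, platform: str, rknn_path: str) -> None:
    """Convert an ONNX model to RKNN as float ('fp' dtype, no quantization)."""
    from rknn.api import RKNN  # requires rknn-toolkit2 installed on the PC

    rknn = RKNN(verbose=False)
    rknn.config(target_platform=check_platform(platform))
    if rknn.load_onnx(model=onnx_path) != 0:
        raise RuntimeError("load_onnx failed")
    # Float build: no calibration dataset is needed, matching the 'fp' dtype
    if rknn.build(do_quantization=False) != 0:
        raise RuntimeError("build failed")
    if rknn.export_rknn(rknn_path) != 0:
        raise RuntimeError("export_rknn failed")
    rknn.release()
```

A call such as `convert_to_rknn("model.onnx", "rk3588", "model.rknn")` mirrors the command line shown above.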
9.2. Testing on the LubanCat Board¶
9.2.1. Testing with an Audio File¶
This simple test uses the RKNN Toolkit Lite2 Python API, so install Toolkit Lite2 and the related environment first.
The tutorial creates the environment with uv; see the earlier tutorial for installing and using uv. If you prefer, you can also install the packages directly without uv or a virtual environment.
# Create an environment with uv, using Python 3.11
cat@lubancat:~$ uv venv --python python3.11 rknn_lite_sensevoice
Using CPython 3.11.13
Creating virtual environment at: rknn_lite_sensevoice
Activate with: source rknn_lite_sensevoice/bin/activate
# Activate the environment
cat@lubancat:~$ source rknn_lite_sensevoice/bin/activate
(rknn_lite_sensevoice) cat@lubancat:~$
# Install RKNN Toolkit Lite2 from the whl file matching your Python version (see the earlier documentation)
(rknn_lite_sensevoice) cat@lubancat:~/xxx$ uv pip install rknn_toolkit_lite2-2.3.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Using Python 3.11.13 environment at: rknn_lite_sensevoice
Resolved 5 packages in 1.17s
Prepared 5 packages in 1.64s
Installed 5 packages in 23ms
+ numpy==2.3.2
+ psutil==7.0.0
+ rknn-toolkit-lite2==2.3.2 (from xxxx)
+ ruamel-yaml==0.18.14
+ ruamel-yaml-clib==0.2.12
# Install soundfile, kaldi_native_fbank, etc.
(rknn_lite_sensevoice) cat@lubancat:~$ uv pip install kaldi_native_fbank soundfile
Fetch the test example sensevoice_rknn.py:
# Fetch the test example (to be updated)
(rknn_lite_sensevoice) cat@lubancat:~$ git clone xxx
(rknn_lite_sensevoice) cat@lubancat:~$ cd
Transfer the RKNN model converted in the previous section to the board, then run the example, specifying the model path, the tokens file, the audio path, and other parameters:
# Example parameters
(rknn_lite_sensevoice) cat@lubancat:~/xxx$ python sensevoice_rknn.py -h
usage: sensevoice_rknn.py [-h] --model_path MODEL_PATH --tokens TOKENS [--language LANGUAGE]
--audio_path AUDIO_PATH [--use-itn USE_ITN] [--mvn_path MVN_PATH]
sensevoice Python Demo
options:
-h, --help show this help message and exit
--model_path MODEL_PATH
model path, could be .rknn or .onnx file
--tokens TOKENS Path to tokens.txt
--language LANGUAGE the language of the input wav file. Supported values: zh, en, ja, ko, yue, auto
--audio_path AUDIO_PATH
The input wave to be recognized
--use-itn USE_ITN 1 to use inverse text normalization. 0 to not use inverse text normalization
--mvn_path MVN_PATH Path to the am.mvn CMVN statistics file
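Inside a demo like sensevoice_rknn.py, the encoder's per-frame logits are turned into text non-autoregressively: take the argmax ID per frame, collapse repeats, drop blanks, then look the IDs up in tokens.txt. A minimal sketch of that decode step (assuming blank ID 0 and a `<token> <id>` tokens.txt format, as in the sherpa-onnx SenseVoice demo; verify against the actual example code):

```python
def greedy_decode(frame_ids: list[int], blank_id: int = 0) -> list[int]:
    """Collapse repeated IDs and drop blanks from per-frame argmax results."""
    out, prev = [], -1
    for i in frame_ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

def parse_tokens(lines: list[str]) -> dict[int, str]:
    """Parse tokens.txt-style lines of the form '<token> <id>' into an ID table."""
    table = {}
    for line in lines:
        sym, idx = line.rsplit(maxsplit=1)
        table[int(idx)] = sym
    return table
```

For example, `greedy_decode([0, 5, 5, 0, 7])` yields `[5, 7]`, which would then be mapped to text through the token table.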
# Run the test example (LubanCat-4)
(rknn_lite_sensevoice) cat@lubancat:~/xxx$ python sensevoice_rknn.py --model_path ../model/model_encoder.rknn \
--audio_path ../model/en.wav --mvn_path ../model/am.mvn --tokens ../model/tokens.txt --use-itn 1
I RKNN: [10:21:08.231] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [10:21:08.231] RKNN Driver Information, version: 0.9.8
I RKNN: [10:21:08.232] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (e045de294f@2025-04-07T19:48:25)),
target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [10:21:08.620] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
<|en|><|NEUTRAL|><|Speech|><|withitn|>The tribal chieftain called for the boy and presented him with 50 pieces of code.
<|NEUTRAL|><|NEUTRAL|><|NEUTRAL|><|NEUTRAL|><|NEUTRAL|>
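The recognized text above is wrapped in SenseVoice's special markers for language, emotion, audio event, and ITN mode. A small helper to separate these tags from the plain transcript (a sketch; the tag format follows the output shown above):

```python
import re

# Matches SenseVoice special markers such as <|en|>, <|NEUTRAL|>, <|Speech|>
TAG_RE = re.compile(r"<\|([^|]+)\|>")

def split_result(raw: str) -> tuple[list[str], str]:
    """Return (special tags, plain transcript) from a SenseVoice output string."""
    tags = TAG_RE.findall(raw)
    text = TAG_RE.sub("", raw).strip()
    return tags, text
```

Applied to the first output line above, this would separate the `en`, `NEUTRAL`, `Speech`, and `withitn` markers from the recognized sentence.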