20. TTS
TTS (Text To Speech) converts text into speech, i.e., speech synthesis.
20.1. MMS-TTS
MMS-TTS is part of Facebook's Massively Multilingual Speech (MMS) project, which aims to provide speech technology for a wide range of languages.
The mms-tts models are built on the VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) architecture and can convert text into high-quality speech output.
20.1.1. Basic usage of mms-tts-eng
Create a conda environment on the PC, then test mms-tts-eng.
conda create -n tts python=3.10
conda activate tts
# Install the required libraries
pip install --upgrade transformers accelerate
Fetch the facebook/mms-tts-eng model files (optional).
git lfs install
# sudo apt update && sudo apt install git-lfs
git clone https://huggingface.co/facebook/mms-tts-eng
# Or use the mirror site
git clone https://hf-mirror.com/facebook/mms-tts-eng
Referring to the sample program, create a Python script:
import torch
from transformers import VitsTokenizer, VitsModel, set_seed

#tokenizer = VitsTokenizer.from_pretrained("path/to/mms-tts-eng")
#model = VitsModel.from_pretrained("path/to/mms-tts-eng")
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
model = VitsModel.from_pretrained("facebook/mms-tts-eng")

inputs = tokenizer(text="Hello - my dog is cute", return_tensors="pt")

set_seed(555)  # make deterministic

with torch.no_grad():
    outputs = model(**inputs)

waveform = outputs.waveform[0]

# Save the result as a .wav file
import scipy.io.wavfile
scipy.io.wavfile.write("techno.wav", rate=model.config.sampling_rate, data=waveform.numpy())
Change the model path in the script to where you placed the manually fetched facebook/mms-tts-eng files. If you did not fetch the files manually, set the model path to "facebook/mms-tts-eng"; running the program will then download them automatically.
Running the program converts the given English text to speech and saves the result to techno.wav.
# Run the test command
(tts) llh@llh:/xxx$ python test.py
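To sanity-check the generated audio, the file can be read back and its basic properties printed. A minimal sketch, assuming scipy is available in the same tts environment:
import scipy.io.wavfile

# Read the waveform back and report sample rate and duration
rate, data = scipy.io.wavfile.read("techno.wav")
print(f"sample rate: {rate} Hz")              # MMS-TTS models use a 16 kHz sampling rate
print(f"duration: {len(data) / rate:.2f} s")  # number of samples / sample rate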
20.1.2. Model conversion
1. Export the ONNX model
Following the export_onnx.md instructions in rknn_model_zoo, export the ONNX model.
In the environment used above to test mms-tts-eng, take modeling_vits_for_export_onnx.py from rknn_model_zoo and copy it over the modeling_vits.py source file in the transformers vits model directory, e.g. /home/xxx/anaconda3/envs/tts/lib/python3.10/site-packages/transformers/models/vits/modeling_vits.py, then run the export_onnx.py program.
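Rather than hard-coding the conda path, the transformers install directory can be located programmatically. A minimal sketch of the copy step, assuming modeling_vits_for_export_onnx.py sits in the current directory (the original file is backed up first):
import os
import shutil
import transformers

# Locate modeling_vits.py inside the installed transformers package
target = os.path.join(os.path.dirname(transformers.__file__),
                      "models", "vits", "modeling_vits.py")

shutil.copy(target, target + ".bak")                      # keep a backup of the original
shutil.copy("modeling_vits_for_export_onnx.py", target)   # overwrite with the export version
print("replaced:", target)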
(tts) llh@llh:/xxx$ mkdir ../model
# If you fetched the mms-tts-eng model files manually, change the model path in export_onnx.py
model, tokenizer = setup_model("path/to/mms-tts-eng")
# Run export_onnx.py to export the ONNX models; the --max_length parameter can be set to 100, 200 or 300
(tts) llh@llh:/xxx$ python export_onnx.py --max_length 200
The mms_tts_eng_decoder_200.onnx and mms_tts_eng_encoder_200.onnx files will be generated in the model directory.
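Before converting further, the exported files can optionally be checked for structural validity and their input/output names listed. A minimal sketch, assuming the onnx package is installed and the models were written to ../model as above:
import onnx

for path in ["../model/mms_tts_eng_encoder_200.onnx",
             "../model/mms_tts_eng_decoder_200.onnx"]:
    model = onnx.load(path)
    onnx.checker.check_model(model)  # raises an exception if the model is malformed
    print(path)
    print("  inputs :", [i.name for i in model.graph.input])
    print("  outputs:", [o.name for o in model.graph.output])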
Tip
If the PyTorch or ONNX version in your conversion environment differs from the one documented in rknn_model_zoo's export_onnx.md, errors such as RuntimeError: Trying to create tensor with negative dimension -2 may occur. In that case, try changing the opset_version argument of torch.onnx.export.
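For reference, opset_version is a regular keyword argument of torch.onnx.export. A standalone sketch with a toy model, just to show where the parameter goes (the actual call inside export_onnx.py will of course differ):
import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return x * 2

# If the default opset fails, try another version, e.g. 14-17
torch.onnx.export(Toy(), (torch.randn(1, 3),), "toy.onnx", opset_version=17)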
2. Convert to the RKNN model
Use the toolkit2 tool to convert the ONNX models into RKNN models; for the conversion program see rknn_model_zoo, and for setting up the toolkit2 environment refer to the earlier tutorial.
# Run convert.py to export the RKNN model; the tutorial tests the LubanCat-4
(toolkit2.3) llh@llh:/xxx$ python convert.py ./mms_tts_eng_encoder_200.onnx rk3588
I rknn-toolkit2 version: 2.3.0
--> Config model
done
--> Loading model
I Loading : 100%|██████████████████████████████████████████████| 240/240 [00:00<00:00, 20289.29it/s]
W load_onnx: The config.mean_values is None, zeros will be set for input 0!
W load_onnx: The config.std_values is None, ones will be set for input 0!
W load_onnx: The config.mean_values is None, zeros will be set for input 1!
W load_onnx: The config.std_values is None, ones will be set for input 1!
W load_onnx: The config.mean_values is None, zeros will be set for input 2!
W load_onnx: The config.std_values is None, ones will be set for input 2!
W load_onnx: The config.mean_values is None, zeros will be set for input 3!
W load_onnx: The config.std_values is None, ones will be set for input 3!
done
--> Building model
I OpFusing 0: 100%|██████████████████████████████████████████████| 100/100 [00:00<00:00, 313.77it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 103.39it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 70.28it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:01<00:00, 68.71it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:03<00:00, 25.14it/s]
I rknn building ...
I rknn building done.
done
--> Export rknn model
done
(toolkit2.3) llh@llh:/xxx$ python convert.py ./mms_tts_eng_decoder_200.onnx rk3588
I rknn-toolkit2 version: 2.3.0
--> Config model
done
--> Loading model
I Loading : 100%|█████████████████████████████████████████████| 851/851 [00:00<00:00, 218082.28it/s]
done
--> Building model
W build: For tensor ['793'], the value smaller than -3e+38 has been corrected to -10000. Set opt_level to 2 or lower to disable this correction.
I OpFusing 0: 100%|███████████████████████████████████████████████| 100/100 [00:01<00:00, 94.64it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:02<00:00, 33.52it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:05<00:00, 19.85it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:05<00:00, 18.75it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:07<00:00, 14.09it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:07<00:00, 13.98it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:07<00:00, 13.79it/s]
I OpFusing 0 : 100%|██████████████████████████████████████████████| 100/100 [00:07<00:00, 12.75it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:07<00:00, 12.65it/s]
I OpFusing 2 : 100%|██████████████████████████████████████████████| 100/100 [00:08<00:00, 11.76it/s]
I rknn building ...
# omitted .....................
I rknn building done.
done
--> Export rknn model
done
20.1.3. Test example
On the LubanCat board, fetch the example from rknn_model_zoo, then put the RKNN models obtained above into the model directory:
# Install git and other dependencies
sudo apt update
sudo apt install git make gcc g++ libsndfile1-dev
# Fetch rknn_model_zoo for testing; see the project's README for the actual build steps
git clone https://github.com/airockchip/rknn_model_zoo.git
Switch to the rknn_model_zoo directory and modify the 3rdparty/CMakeLists.txt file so that the system-installed libsndfile library is used instead of the bundled one.
# libsndfile
#set(LIBSNDFILE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/libsndfile)
#set(LIBSNDFILE_INCLUDES ${LIBSNDFILE_PATH}/include PARENT_SCOPE)
#set(LIBSNDFILE ${LIBSNDFILE_PATH}/${CMAKE_SYSTEM_NAME}/${TARGET_LIB_ARCH}/libsndfile.a PARENT_SCOPE)
set(LIBSNDFILE_PATH /usr/)
set(LIBSNDFILE_INCLUDES ${LIBSNDFILE_PATH}/include PARENT_SCOPE)
set(LIBSNDFILE ${LIBSNDFILE_PATH}/lib/aarch64-linux-gnu/libsndfile.so PARENT_SCOPE)
Then build the example (the tutorial tests the LubanCat-4 running Ubuntu, with the platform parameter set to rk3588), which produces the rknn_mms_tts_demo executable.
cat@lubancat:~$ cd rknn_model_zoo
# The -t parameter sets the platform; the tutorial tests the lubancat-4, so rk3588
cat@lubancat:/xxx/rknn_model_zoo$ ./build-linux.sh -t rk3588 -a aarch64 -d mms_tts
./build-linux.sh -t rk3588 -a aarch64 -d mms_tts
aarch64-linux-gnu
===================================
BUILD_DEMO_NAME=mms_tts
BUILD_DEMO_PATH=examples/mms_tts/cpp
TARGET_SOC=rk3588
TARGET_ARCH=aarch64
BUILD_TYPE=Release
ENABLE_ASAN=OFF
DISABLE_RGA=OFF
DISABLE_LIBJPEG=OFF
INSTALL_DIR=/home/cat/rknn_model_zoo/install/rk3588_linux_aarch64/rknn_mms_tts_demo
BUILD_DIR=/home/cat/rknn_model_zoo/build/build_rknn_mms_tts_demo_rk3588_linux_aarch64_Release
CC=aarch64-linux-gnu-gcc
CXX=aarch64-linux-gnu-g++
===================================
# omitted ...............................
[ 16%] Built target imagedrawing
[ 25%] Building C object utils.out/CMakeFiles/audioutils.dir/audio_utils.c.o
[ 58%] Built target fileutils
[ 58%] Built target imageutils
[ 66%] Linking C static library libaudioutils.a
[ 66%] Built target audioutils
[ 83%] Building CXX object CMakeFiles/rknn_mms_tts_demo.dir/rknpu2/mms_tts.cc.o
[ 83%] Building CXX object CMakeFiles/rknn_mms_tts_demo.dir/process.cc.o
[ 91%] Building CXX object CMakeFiles/rknn_mms_tts_demo.dir/main.cc.o
[100%] Linking CXX executable rknn_mms_tts_demo
[100%] Built target rknn_mms_tts_demo
[ 16%] Built target audioutils
[ 33%] Built target fileutils
[ 66%] Built target rknn_mms_tts_demo
[ 83%] Built target imagedrawing
[100%] Built target imageutils
# omitted ...............................
Test the rknn_mms_tts_demo example:
cd install/rk3588_linux_aarch64/rknn_mms_tts_demo/
# Command usage
./rknn_mms_tts_demo <encoder_path> <decoder_path> <input_text>
# Test on the lubancat-4
cat@lubancat:~/xxx$ ./rknn_mms_tts_demo ../../../model/mms_tts_eng_encoder_200.rknn
../../../model/mms_tts_eng_decoder_200.rknn '"Mister quilter is the apostle of the middle classes and we are glad to welcome his gospel."'
model input num: 2, output num: 4
input tensors:
index=0, name=input_ids, n_dims=2, dims=[1, 200], n_elems=200, size=1600, fmt=UNDEFINED, type=INT64, qnt_type=AFFINE, zp=0, scale=1.000000
index=1, name=attention_mask, n_dims=2, dims=[1, 200], n_elems=200, size=1600, fmt=UNDEFINED, type=INT64, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=log_duration, n_dims=3, dims=[1, 1, 200], n_elems=200, size=400, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=1, name=input_padding_mask, n_dims=3, dims=[1, 1, 200], n_elems=200, size=400, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=2, name=prior_means, n_dims=3, dims=[1, 200, 192], n_elems=38400, size=76800, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=3, name=prior_log_variances, n_dims=3, dims=[1, 200, 192], n_elems=38400, size=76800, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
-- init_mms_tts_encoder_model use: 85.587997 ms
model input num: 4, output num: 1
input tensors:
index=0, name=attn, n_dims=4, dims=[1, 400, 200, 1], n_elems=80000, size=160000, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=1, name=output_padding_mask, n_dims=3, dims=[1, 1, 400], n_elems=400, size=800, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=2, name=prior_means, n_dims=3, dims=[1, 200, 192], n_elems=38400, size=76800, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=3, name=prior_log_variances, n_dims=3, dims=[1, 200, 192], n_elems=38400, size=76800, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=waveform, n_dims=2, dims=[1, 102400], n_elems=102400, size=204800, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
-- init_mms_tts_decoder_model use: 158.074005 ms
-- read_vocab use: 0.013000 ms
-- inference_mms_tts_model use: 669.695007 ms
Real Time Factor (RTF): 0.670 / 6.400 = 0.105
The output wav file is saved: output.wav
The Real Time Factor (RTF) printed above is the inference time divided by the duration of the generated audio (here 0.670 s / 6.400 s = 0.105); values below 1.0 mean synthesis runs faster than real time. The generated speech is saved as output.wav in the current directory.
20.2. MeloTTS
MeloTTS is a high-quality multilingual text-to-speech (TTS) library developed jointly by MIT and MyShell.ai. It supports English, Spanish, French, Chinese (including mixed Chinese and English), Japanese, Korean and other languages, and supports real-time inference on CPU.
GitHub: https://github.com/myshell-ai/MeloTTS
20.2.1. Basic usage of MeloTTS
To test MeloTTS on a local PC, first create a test environment and install the required libraries.
conda create -n melotts python=3.9
conda activate melotts
# Fetch the MeloTTS source
git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
# Install the libraries
pip install unidic-lite
pip install -r requirements.txt
Then use the Python API provided by MeloTTS to test its Chinese example:
from melo.api import TTS

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。"
model = TTS(language='ZH', device=device)
# model = TTS(language='ZH', device=device, use_hf=False, config_path='path/config.json', ckpt_path='path/checkpoint.pth')
speaker_ids = model.hps.data.spk2id

output_path = 'zh.wav'
model.tts_to_file(text, speaker_ids['ZH'], output_path, speed=speed)
Running the program pulls the models directly from Hugging Face; if the network is unreliable, set: export HF_ENDPOINT=https://hf-mirror.com
(melotts) llh@llh:/xxx/MeloTTS$ python test.py
# omitted ............................
#[nltk_data] /home/xxx/nltk_data...
#[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
#[nltk_data] Downloading package cmudict to /home/xxx/nltk_data...
#[nltk_data] Unzipping corpora/cmudict.zip.
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████| 48.0/48.0 [00:00<00:00, 60.7kB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████| 570/570 [00:00<00:00, 359kB/s]
vocab.txt: 100%|████████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 668kB/s]
tokenizer.json: 100%|█████████████████████████████████████████████| 466k/466k [00:00<00:00, 949kB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████| 48.0/48.0 [00:00<00:00, 10.3kB/s]
# omitted ............................
> Text split to sentences.
我最近在学习machine learning,
希望能够在未来的artificial intelligence领域有所建树.
> ===========================
Loading model from cache /tmp/jieba.cache
Loading model cost 0.372 seconds.
Prefix dict has been built successfully.
model.safetensors: 100%|███████████████████████████████████████████████████████| 672M/672M [00:52<00:00, 12.8MB/s]
# omitted ............................
100%|██████████████████████████████████████████████████████████████████████████| 2/2 [01:00<00:00, 30.38s/it]
The zh.wav file is generated in the current directory.
You can also try MeloTTS online at Myshell or Hugging Face.
20.2.2. Exporting the ONNX model
The tutorial manually fetches the myshell-ai/MeloTTS-Chinese files (optional).
# Fetch MeloTTS-Chinese
git lfs install
git clone https://huggingface.co/myshell-ai/MeloTTS-Chinese
# Or fetch from the mirror site
git clone https://hf-mirror.com/myshell-ai/MeloTTS-Chinese
Fetch the MeloTTS project files that support exporting the ONNX models:
git clone https://github.com/mmontol/MeloTTS.git
Modify the model file paths in the project's export_onnx.py file:
from melo.api import TTS
import os

os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。"
model = TTS(language='ZH', device=device, use_hf=False, config_path='../../models/MeloTTS-Chinese/config.json',
            ckpt_path='../../models/MeloTTS-Chinese/checkpoint.pth')
speaker_ids = model.hps.data.spk2id

model.export_onnx(text, speaker_ids['ZH'])
Export the ONNX models in the melotts environment created earlier:
(melotts) llh@llh:/xxx/MeloTTS$ python export_onnx.py
/home/llh/xxx FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
# omitted ...........
Export encoder to encoder-ZH_MIX_EN.onnx
Export decoder to decoder-ZH_MIX_EN.onnx
The encoder-ZH_MIX_EN.onnx and decoder-ZH_MIX_EN.onnx files are generated in the current directory.
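The tensor names and shapes that the RKNN conversion and the C++ demo later refer to can be inspected directly from the exported files. A minimal sketch, assuming onnxruntime is installed:
import onnxruntime as ort

for path in ["encoder-ZH_MIX_EN.onnx", "decoder-ZH_MIX_EN.onnx"]:
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    print(path)
    for t in sess.get_inputs():
        print("  input :", t.name, t.shape, t.type)
    for t in sess.get_outputs():
        print("  output:", t.name, t.shape, t.type)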
20.2.3. Converting to the RKNN model
Referring to the toolkit2 usage instructions, convert the ONNX models obtained above into RKNN models.
if __name__ == '__main__':
    model_path, platform, do_quant, output_path = parse_arg()

    # Create RKNN object
    rknn = RKNN(verbose=False)

    # Pre-process config
    print('--> Config model')
    rknn.config(target_platform=platform)
    print('done')

    # Load model
    print('--> Loading model')
    ret = rknn.load_onnx(model=model_path)
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=do_quant)
    if ret != 0:
        print('Build model failed!')
        exit(ret)
    print('done')

    # Export rknn model
    print('--> Export rknn model')
    ret = rknn.export_rknn(output_path)
    if ret != 0:
        print('Export rknn model failed!')
        exit(ret)
    print('done')

    # Release
    rknn.release()
In the toolkit2 environment, run the convert.py program (see the accompanying example code) to export the RKNN models.
#Usage: python3 ../convert.py onnx_model_path [platform] [dtype(optional)] [output_rknn_path(optional)]
# platform choose from [rk3566, rk3568, rk3576, rk3588]
# dtype choose from [fp] for [rk3566, rk3568, rk3576, rk3588]
# The tutorial tests the lubancat-4, so rk3588
(toolkit2.3.2) llh@llh:/xxx$ python convert.py encoder-ZH_MIX_EN.onnx rk3588
I rknn-toolkit2 version: 2.3.2
--> Config model
done
--> Loading model
I Loading : 100%|██████████████████████████████████████████████| 326/326 [00:00<00:00, 57163.17it/s]
W load_onnx: The config.mean_values is None, zeros will be set for input 5!
W load_onnx: The config.std_values is None, ones will be set for input 5!
W load_onnx: The config.mean_values is None, zeros will be set for input 6!
W load_onnx: The config.std_values is None, ones will be set for input 6!
W load_onnx: The config.mean_values is None, zeros will be set for input 7!
W load_onnx: The config.std_values is None, ones will be set for input 7!
done
--> Building model
I OpFusing 0: 100%|███████████████████████████████████████████████| 100/100 [00:01<00:00, 71.55it/s]
I OpFusing 1 : 100%|██████████████████████████████████████████████| 100/100 [00:04<00:00, 20.65it/s]
I OpFusing 0 : 88%|█████████████████████████████████████████▎ | 88/100 [00:07<00:00, 12.44it/s]
# omitted ...................................................
E RKNN: [11:04:49.763] Unkown op target: 0
I rknn building done.
done
--> Export rknn model
done
(toolkit2.3.2) llh@llh:/xxx$ python convert.py decoder-ZH_MIX_EN.onnx rk3588
I rknn-toolkit2 version: 2.3.2
--> Config model
done
# omitted ...................................................
E RKNN: [11:04:49.763] Unkown op target: 0
I rknn building done.
done
--> Export rknn model
done
The encoder-ZH_MIX_EN.rknn and decoder-ZH_MIX_EN.rknn files will be generated in the current directory.
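Before moving on to the C++ example, the converted models can optionally be sanity-loaded on the board. A minimal sketch, assuming rknn_toolkit_lite2 is installed on the board:
from rknnlite.api import RKNNLite

for path in ["encoder-ZH_MIX_EN.rknn", "decoder-ZH_MIX_EN.rknn"]:
    rknn = RKNNLite()
    assert rknn.load_rknn(path) == 0, f"load failed: {path}"
    assert rknn.init_runtime() == 0, f"runtime init failed: {path}"
    print(path, "loaded and initialized OK")
    rknn.release()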
20.2.4. Deployment test on the LubanCat
For the RKNN model deployment environment refer to RKNPU2; for the program that deploys the MeloTTS model, see the tutorial's accompanying example code.
Install the required libraries on the LubanCat board, then fetch the test example:
# Install git and other dependencies
sudo apt update
sudo apt install git make gcc g++ libsndfile1-dev
# Fetch the example code
git clone https://gitee.com/LubanCat/lubancat_ai_manual_code.git
cd lubancat_ai_manual_code/examples/melotts/cpp
Put the RKNN models exported above into the melotts/model directory, then build the test example directly on the board:
# The tutorial tests the lubancat-4, so rk3588
cat@lubancat:~/xxx/examples/melotts/cpp$ ./build-linux.sh -t rk3588
./build-linux.sh -t rk3588
===================================
TARGET_SOC=rk3588
INSTALL_DIR=/home/cat/xxx/examples/melotts/cpp/install/rk3588_linux
BUILD_DIR=/home/cat/xxx/examples/melotts/cpp/build/build_rk3588_linux
ENABLE_DMA32=OFF
DISABLE_RGA=OFF
BUILD_TYPE=Release
ENABLE_ASAN=OFF
CC=aarch64-linux-gnu-gcc
CXX=aarch64-linux-gnu-g++
# omitted .......................
[ 66%] Built target fileutils
Scanning dependencies of target melotts_demo
# omitted .......................
[100%] Built target imagedrawing
Install the project...
-- Install configuration: "Release"
-- Installing: /home/cat/xxx/examples/melotts/cpp/install/rk3588_linux/./melotts_demo
-- Set runtime path of "/home/cat/xxx/examples/melotts/cpp/install/rk3588_linux/./melotts_demo" to "$ORIGIN/lib"
-- Installing: /home/cat/xxx/examples/melotts/cpp/install/rk3588_linux/model/decoder-ZH_MIX_EN.rknn
-- Installing: /home/cat/xxx/examples/melotts/cpp/install/rk3588_linux/model/encoder-ZH_MIX_EN.rknn
Switch to the install/rk3588_linux directory and run the melotts_demo program (the board tested in this tutorial is the lubancat-4, using a single NPU core by default).
#Usage: ./melotts_demo [options]
#options:
#--input_text Specifies the input text to be processed.
#--encoder_model_path Encoder model path
#--decoder_model_path Decoder model path
#--output_filename Specifies the output audio filename to be generated in the format, For example: audio.wav
#--speed Specifies the speed of output audio (default: 1.0).
#--disable_bert Indicates whether to disable the BERT model inference (default: true).
#--language Specifies the language (ZH_MIX_EN / ZH / EN) for TTS (default: ZH).
# Test with the default parameters; you can set the input_text parameter to specify the text, the language parameter to specify the language, etc.
cat@lubancat:~/xxx/install/rk3588_linux$ ./melotts_demo
-- Lexicon init use: 554.216980 ms
model input num: 8, output num: 5
input tensors:
index=0, name=x, n_dims=2, dims=[1, 256], n_elems=256, size=2048, fmt=UNDEFINED, type=INT64, qnt_type=AFFINE, zp=0, scale=1.000000
index=1, name=x_lengths, n_dims=1, dims=[1], n_elems=1, size=8, fmt=UNDEFINED, type=INT64, qnt_type=AFFINE, zp=0, scale=1.000000
index=2, name=sid, n_dims=1, dims=[1], n_elems=1, size=8, fmt=UNDEFINED, type=INT64, qnt_type=AFFINE, zp=0, scale=1.000000
index=3, name=tone, n_dims=2, dims=[1, 256], n_elems=256, size=2048, fmt=UNDEFINED, type=INT64, qnt_type=AFFINE, zp=0, scale=1.000000
index=4, name=lang_ids, n_dims=2, dims=[1, 256], n_elems=256, size=2048, fmt=UNDEFINED, type=INT64, qnt_type=AFFINE, zp=0, scale=1.000000
index=5, name=ja_bert, n_dims=3, dims=[1, 768, 256], n_elems=196608, size=393216, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=6, name=noise_scale_w, n_dims=1, dims=[1], n_elems=1, size=2, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=7, name=sdp_ratio, n_dims=1, dims=[1], n_elems=1, size=2, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=logw, n_dims=3, dims=[1, 1, 256], n_elems=256, size=512, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=1, name=x_mask, n_dims=3, dims=[1, 1, 256], n_elems=256, size=512, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=2, name=g, n_dims=3, dims=[1, 256, 1], n_elems=256, size=512, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=3, name=m_p, n_dims=3, dims=[1, 192, 256], n_elems=49152, size=98304, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=4, name=logs_p, n_dims=3, dims=[1, 192, 256], n_elems=49152, size=98304, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
-- init_melotts_encoder_model use: 37.278000 ms
model input num: 6, output num: 1
input tensors:
index=0, name=attn, n_dims=3, dims=[1, 512, 256], n_elems=131072, size=262144, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=1, name=y_mask, n_dims=3, dims=[1, 1, 512], n_elems=512, size=1024, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=2, name=g, n_dims=3, dims=[1, 256, 1], n_elems=256, size=512, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=3, name=m_p, n_dims=3, dims=[1, 192, 256], n_elems=49152, size=98304, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=4, name=logs_p, n_dims=3, dims=[1, 192, 256], n_elems=49152, size=98304, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
index=5, name=noise_scale, n_dims=1, dims=[1], n_elems=1, size=2, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
index=0, name=y, n_dims=3, dims=[1, 1, 262144], n_elems=262144, size=524288, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
-- init_melotts_decoder_model use: 199.809006 ms
Split sentence: 我最近在学习machine learning, 希望能够在未来的artificial intelligence领域有所建树.
audio_length: 5.596009
Real Time Factor (RTF): 1.341 / 5.596 = 0.240
The output wav file is saved: audio.wav
-- save_audio use: 1.077000 ms
The output audio is saved as audio.wav in the current directory; on the board it can be played with aplay audio.wav.
This completes a simple MeloTTS deployment on the LubanCat; for the accompanying example code see https://gitee.com/LubanCat/lubancat_ai_manual_code, and you can optimize it further yourself.