## 部署tts推理 git clone https://github.com/fishaudio/fish-speech.git ## 1. 安装环境 ``` # 创建一个 python 3.10 虚拟环境, 你也可以用 virtualenv conda create -n fish-speech python=3.10 conda activate fish-speech # 安装 pytorch pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 # (Ubuntu / Debian 用户) 安装 sox + ffmpeg apt install libsox-dev ffmpeg # (Ubuntu / Debian 用户) 安装 pyaudio apt install build-essential \ cmake \ libasound-dev \ portaudio19-dev \ libportaudio2 \ libportaudiocpp0 # 安装 fish-speech pip3 install -e .[stable] ``` ## 2. 下载模型文件 在fish-speech项目下执行 huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5 ## 3. 建立克隆声音目录 在fish-speech项目下新建/references/test文件夹,将音频文件放到该目录下,并建一个同名的lab文件,将字幕放到该文件里。 ![](./fish-speech.jpg) ## 4. 启动api服务 ``` python -m tools.api_server \ --listen 0.0.0.0:8080 \ --llama-checkpoint-path "checkpoints/fish-speech-1.5" \ --decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \ --decoder-config-name firefly_gan_vq \ --compile \ --half ``` --- ## 5. 接口说明 ### 5.1 Text-to-Speech endpoint: `/v1/tts` POST: ```json { "text": "string", "chunk_length": 200, "format": "wav", "references": [], "reference_id": null, "seed": null, "use_memory_cache": "off", "normalize": true, "streaming": false, "max_new_tokens": 1024, "top_p": 0.7, "repetition_penalty": 1.2, "temperature": 0.7 } ```