
qwenspeak


v1.5.0

Description


name: qwenspeak
description: Text-to-speech generation via Qwen3-TTS over SSH. Preset voices, voice cloning, voice design. Use when the user wants to generate speech audio, clone voices, or work with TTS.
compatibility: Requires ssh and a running qwenspeak instance. QWENSPEAK_HOST and QWENSPEAK_PORT env vars must be set.
metadata:
  author: psyb0t
  homepage: https://github.com/psyb0t/docker-qwenspeak

qwenspeak

YAML-driven text-to-speech over SSH using Qwen3-TTS models.

For installation and deployment, see references/setup.md.

SSH Wrapper

Use scripts/qwenspeak.sh for all commands. It takes the host and port from the QWENSPEAK_HOST and QWENSPEAK_PORT env vars and handles SSH host key acceptance.

scripts/qwenspeak.sh <command> [args]
scripts/qwenspeak.sh <command> < input_file
scripts/qwenspeak.sh <command> > output_file
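A minimal pre-flight sketch, assuming bash: it fails early with a clear message when QWENSPEAK_HOST or QWENSPEAK_PORT is missing or the port is not numeric, rather than leaving ssh to fail later. check_qwenspeak_env and qs are hypothetical helper names, not part of qwenspeak.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not shipped with qwenspeak.

# Return non-zero, with a reason on stderr, if the wrapper's env vars are unusable.
check_qwenspeak_env() {
  local rc=0
  if [ -z "${QWENSPEAK_HOST:-}" ]; then
    echo "error: QWENSPEAK_HOST is not set" >&2
    rc=1
  fi
  case "${QWENSPEAK_PORT:-}" in
    '') echo "error: QWENSPEAK_PORT is not set" >&2; rc=1 ;;
    *[!0-9]*) echo "error: QWENSPEAK_PORT must be numeric, got '$QWENSPEAK_PORT'" >&2; rc=1 ;;
  esac
  return "$rc"
}

# Run a wrapper command only after the environment check passes.
qs() {
  check_qwenspeak_env || return 1
  scripts/qwenspeak.sh "$@"
}
```

With these defined, qs "tts list-jobs" behaves like the plain wrapper call but refuses to run until both variables are set.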

TTS Generation

Submitting a YAML job returns a job UUID immediately; poll for progress afterwards. Jobs run sequentially: one runs at a time and the rest wait in the queue.

# Get the YAML template
scripts/qwenspeak.sh "tts print-yaml" > job.yaml

# Submit job
scripts/qwenspeak.sh "tts" < job.yaml
# {"id": "550e8400-...", "status": "queued", "total_steps": 3, "total_generations": 7}

# Check progress
scripts/qwenspeak.sh "tts get-job 550e8400"

# Follow job log
scripts/qwenspeak.sh "tts get-job-log 550e8400 -f"

# Download result
scripts/qwenspeak.sh "get hello.wav" > hello.wav
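Copying the UUID by hand gets tedious in scripts. A sketch, assuming the submit reply is one-line JSON like the example above; submit_job and the QWENSPEAK_CMD override are hypothetical, and the sed expression is a plain text match rather than a real JSON parser.

```bash
#!/usr/bin/env bash
# Sketch: submit a job file and print the job UUID from the JSON reply.
# submit_job and QWENSPEAK_CMD are hypothetical; the default points at the wrapper.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

# submit_job <job.yaml>  -> prints the job id on success
submit_job() {
  local yaml="$1" response id
  response="$("$QWENSPEAK_CMD" "tts" < "$yaml")" || return 1
  # Text match for "id": "<value>" in a reply like {"id": "...", "status": "queued", ...}
  id="$(printf '%s\n' "$response" | sed -n 's/.*"id": *"\([^"]*\)".*/\1/p' | head -n 1)"
  if [ -z "$id" ]; then
    echo "no job id in response: $response" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

For example, job_id=$(submit_job job.yaml) stores the ID, and ${job_id:0:8} gives an 8-character prefix for the job commands.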

YAML Structure

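A job file holds optional global settings plus a list of steps. Each step loads its model, runs all of its generations, then unloads the model. Settings cascade from global to step to generation, and the most specific level wins: in the example below, angry.wav overrides the step's speaker Ryan with Vivian.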

steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan
    language: English
    generate:
      - text: "Hello world"
        output: hello.wav
      - text: "I cannot believe this!"
        speaker: Vivian
        instruct: "Speak angrily"
        output: angry.wav

  - mode: voice-design
    generate:
      - text: "Welcome to our store."
        instruct: "A warm, friendly young female voice with a cheerful tone"
        output: welcome.wav

  - mode: voice-clone
    model_size: 1.7b
    ref_audio: ref.wav
    ref_text: "Transcript of reference"
    generate:
      - text: "First line in cloned voice"
        output: clone1.wav
      - text: "Second line"
        output: clone2.wav
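For longer scripts it can be easier to generate the job file than to write it by hand. A sketch, assuming one utterance per input line; make_job, yaml_quote, and the lineNNN.wav output names are hypothetical, while the field names match the YAML above.

```bash
#!/usr/bin/env bash
# Sketch: build a one-step custom-voice job from stdin, one utterance per line.
# make_job, yaml_quote and the lineNNN.wav naming are hypothetical.

# Wrap a string in YAML double quotes, escaping backslashes and double quotes.
yaml_quote() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
}

# make_job <speaker> < lines.txt > job.yaml
make_job() {
  local speaker="$1" n=0 line
  printf 'steps:\n  - mode: custom-voice\n    speaker: %s\n    generate:\n' "$speaker"
  # The `|| [ -n "$line" ]` keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue   # skip blank lines
    n=$((n + 1))
    printf '      - text: %s\n        output: %s\n' \
      "$(yaml_quote "$line")" "$(printf 'line%03d.wav' "$n")"
  done
}
```

make_job Ryan < script.txt > job.yaml writes the file, which can then be submitted with scripts/qwenspeak.sh "tts" < job.yaml.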

Modes

custom-voice: Pick one of the 9 preset speakers. The 1.7B model also supports emotion/style control via instruct.

voice-design: Describe the voice in natural language via instruct. 1.7B only.

voice-clone: Clone a voice from reference audio. Set ref_audio and ref_text at the step level to reuse them across that step's generations. Setting x_vector_only: true uses only the speaker embedding, so no transcript (ref_text) is needed.
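The mode rules above can be checked locally before submitting. This is a sketch under one reading of this page (instruct on custom-voice and all of voice-design need 1.7b; voice-clone needs ref_audio plus either ref_text or x_vector_only: true); qwenspeak itself remains the authority. check_step is a hypothetical helper that takes each field's effective value after the cascade.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check of the mode rules as read from this page.
# check_step MODE MODEL_SIZE INSTRUCT REF_AUDIO REF_TEXT X_VECTOR_ONLY
# Pass "" for fields that are not set; model_size defaults to 1.7b.
check_step() {
  local mode="$1" size="${2:-1.7b}" instruct="$3" ref_audio="$4" ref_text="$5" xvec="${6:-false}"
  case "$mode" in
    custom-voice)
      if [ -n "$instruct" ] && [ "$size" != "1.7b" ]; then
        echo "custom-voice: instruct (emotion/style) needs model_size 1.7b" >&2; return 1
      fi ;;
    voice-design)
      if [ "$size" != "1.7b" ]; then
        echo "voice-design: only available with model_size 1.7b" >&2; return 1
      fi
      if [ -z "$instruct" ]; then
        echo "voice-design: instruct (voice description) is required" >&2; return 1
      fi ;;
    voice-clone)
      if [ -z "$ref_audio" ]; then
        echo "voice-clone: ref_audio is required" >&2; return 1
      fi
      if [ -z "$ref_text" ] && [ "$xvec" != "true" ]; then
        echo "voice-clone: set ref_text or x_vector_only: true" >&2; return 1
      fi ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

check_step voice-design 0.6b "A warm voice" "" "" "" fails with a message, while the same call with 1.7b passes.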

Emotion trick for cloned voices

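To give a cloned voice different emotions, upload one reference clip per emotion:

scripts/qwenspeak.sh "create-dir refs"
scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
scripts/qwenspeak.sh "put refs/angry.wav" < me_angry.wav

Then give each reference its own voice-clone step, since ref_audio and ref_text are set per step:

steps: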
  - mode: voice-clone
    ref_audio: refs/happy.wav
    ref_text: "transcript of happy ref"
    generate:
      - text: "Great news everyone!"
        output: happy1.wav

  - mode: voice-clone
    ref_audio: refs/angry.wav
    ref_text: "transcript of angry ref"
    generate:
      - text: "This is unacceptable"
        output: angry1.wav
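Re-uploading the same references on every run is wasteful. A sketch that skips clips already on the server, assuming file-exists prints the literal word true or false (as its File Operations entry suggests); upload_ref, upload_emotion_refs, and QWENSPEAK_CMD are hypothetical names, and the me_<emotion>.wav naming follows the example above.

```bash
#!/usr/bin/env bash
# Sketch: upload reference clips only if they are not already on the server.
# Assumes `file-exists` prints "true"/"false"; all names here are hypothetical.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

# upload_ref <local_file> <remote_path>
upload_ref() {
  local src="$1" dst="$2"
  [ -f "$src" ] || { echo "missing local file: $src" >&2; return 1; }
  if [ "$("$QWENSPEAK_CMD" "file-exists $dst")" = "true" ]; then
    echo "skip $dst (already uploaded)"
  else
    "$QWENSPEAK_CMD" "put $dst" < "$src" && echo "uploaded $dst"
  fi
}

# upload_emotion_refs <emotion>...  maps me_<emotion>.wav -> refs/<emotion>.wav
upload_emotion_refs() {
  local emotion rc=0
  for emotion in "$@"; do
    upload_ref "me_$emotion.wav" "refs/$emotion.wav" || rc=1
  done
  return "$rc"
}
```

After create-dir refs has been run once, upload_emotion_refs happy angry uploads whichever of the two clips is missing.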

Job Management

scripts/qwenspeak.sh "tts list-jobs"              # list all
scripts/qwenspeak.sh "tts list-jobs --json"        # JSON output
scripts/qwenspeak.sh "tts get-job <id>"            # job details
scripts/qwenspeak.sh "tts get-job-log <id>"        # view log
scripts/qwenspeak.sh "tts get-job-log <id> -f"     # follow log
scripts/qwenspeak.sh "tts cancel-job <id>"         # cancel

Statuses: queued → running → completed | failed | cancelled

Completed jobs are cleaned up automatically after 1 day, and all jobs after 1 week. Job commands accept a UUID prefix (e.g. the first 8 characters) in place of the full ID.
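The statuses above suggest a simple polling loop: keep waiting while a job is queued or running, and stop on completed, failed, or cancelled. A sketch, assuming get-job output contains a "status": "<value>" pair in the same style as the submit reply; wait_for_job and QWENSPEAK_CMD are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: poll a job until it reaches a terminal status, then print that status.
# Assumes `tts get-job` output includes "status": "<value>"; names are hypothetical.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

# wait_for_job <id-or-prefix> [poll_seconds]
wait_for_job() {
  local id="$1" interval="${2:-5}" status
  while :; do
    status="$("$QWENSPEAK_CMD" "tts get-job $id" \
      | sed -n 's/.*"status": *"\([^"]*\)".*/\1/p' | head -n 1)"
    case "$status" in
      completed|failed|cancelled) printf '%s\n' "$status"; return 0 ;;
      queued|running) sleep "$interval" ;;
      *) echo "unexpected status '$status' for job $id" >&2; return 1 ;;
    esac
  done
}
```

For example, [ "$(wait_for_job 550e8400)" = completed ] && scripts/qwenspeak.sh "get hello.wav" > hello.wav downloads the result only once the job has completed.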

File Operations

All paths are relative to the work directory; path traversal outside it is blocked.

Command                  Description
put <path>               Upload file from stdin
get <path>               Download file to stdout
list-files [--json]      List directory
remove-file <path>       Delete a file
create-dir <path>        Create directory
remove-dir <path>        Remove empty directory
move-file <src> <dst>    Move or rename
copy-file <src> <dst>    Copy a file
file-exists <path>       Check if file exists (true/false)
search-files <glob>      Glob search (** recursive)
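Collecting every output of a job is a loop over get. A sketch, assuming each output: path sits on its own line of the job file and contains no spaces or quotes; fetch_outputs and QWENSPEAK_CMD are hypothetical names. It reads the path list on file descriptor 3 so that an ssh call inside the loop cannot consume the list from stdin.

```bash
#!/usr/bin/env bash
# Sketch: download every `output:` path named in a job YAML. Names are hypothetical.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

# fetch_outputs <job.yaml> [dest_dir]  -> non-zero if any output is missing
fetch_outputs() {
  local yaml="$1" dest="${2:-.}" path rc=0
  mkdir -p "$dest" || return 1
  # Paths arrive on fd 3 so commands in the loop keep their own stdin.
  while IFS= read -r path <&3; do
    [ -n "$path" ] || continue
    mkdir -p "$dest/$(dirname "$path")"
    if [ "$("$QWENSPEAK_CMD" "file-exists $path")" = "true" ]; then
      "$QWENSPEAK_CMD" "get $path" > "$dest/$path" || rc=1
    else
      echo "not found on server: $path" >&2
      rc=1
    fi
  done 3< <(sed -n 's/^[[:space:]-]*output:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$yaml")
  return "$rc"
}
```

fetch_outputs job.yaml ./audio mirrors the job's output paths under ./audio and returns non-zero if any file is missing on the server.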

Speakers

Speaker   Gender  Language  Description
Vivian    Female  Chinese   Bright, slightly edgy young voice
Serena    Female  Chinese   Warm, gentle young voice
Uncle_Fu  Male    Chinese   Seasoned, low mellow timbre
Dylan     Male    Chinese   Youthful Beijing dialect, clear natural timbre
Eric      Male    Chinese   Lively Chengdu/Sichuan dialect, slightly husky
Ryan      Male    English   Dynamic with strong rhythmic drive
Aiden     Male    English   Sunny American, clear midrange
Ono_Anna  Female  Japanese  Playful, light nimble timbre
Sohee     Female  Korean    Warm with rich emotion

YAML Options

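Settings cascade from global to step to generation; the most specific level that sets a field wins. Fields marked "Step only" can only be set on a step.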

Field               Default   Description
dtype               float32   float32, float16, or bfloat16 (float16/bfloat16 GPU only)
flash_attn          auto      FlashAttention-2; auto-detected, and switches float32 to bfloat16 when used
temperature         0.9       Sampling temperature
top_k               50        Top-k sampling
top_p               1.0       Top-p (nucleus) sampling
repetition_penalty  1.05      Repetition penalty
max_new_tokens      2048      Max codec tokens to generate
no_sample           false     Use greedy decoding instead of sampling
streaming           false     Streaming mode (lower latency)
mode                required  Step only: custom-voice, voice-design, or voice-clone
model_size          1.7b      Step only: 1.7b or 0.6b
text                required  Text to synthesize
output              required  Output file path
speaker             Vivian    custom-voice: speaker name
language            Auto      Language for synthesis
instruct            -         custom-voice: emotion/style; voice-design: voice description
ref_audio           -         voice-clone: reference audio file path
ref_text            -         voice-clone: transcript of reference audio
x_vector_only       false     voice-clone: use speaker embedding only
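The cascade reduces to "the first value that is set wins", checked from generation to step to global and finally the default in the table. This bash sketch states only that precedence rule; resolve_setting is hypothetical and does not model how qwenspeak actually parses the YAML.

```bash
#!/usr/bin/env bash
# Sketch of the cascade: generation, then step, then global, then table default.
# resolve_setting is hypothetical. Pass "" for levels that do not set the field.
# resolve_setting <generation> <step> <global> <default>
resolve_setting() {
  local value
  for value in "$1" "$2" "$3" "$4"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1   # no level set the field and there is no default
}
```

For instance, resolve_setting "" 0.7 "" 0.9 prints 0.7: the step's temperature overrides the default because the generation sets none.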
