Flow Voice

name: flow-voice
description: Clone any voice from a short audio sample and generate speech with it. Powered by LuxTTS (150x realtime, local, free, no API key). Use when asked to clone a voice, generate a voiceover, add speech to a video, or bake audio into an animation. Supports wav/mp3 input, 48kHz output. Works on CPU and MPS (Apple Silicon).
homepage: https://github.com/ysharma3501/LuxTTS
metadata:
  openclaw:
    emoji: "🎙️"
    version: "1.0.0"
    author: "windseeker1111"
    requires:
      bins: ["uv", "ffmpeg"]
      pip: ["zipvoice", "soundfile", "librosa"]

Flow Voice — Voice Cloning for OpenClaw

Clone any voice from a 3–30 second audio sample and generate speech from text. Powered by LuxTTS — 150x realtime, runs locally, fits in 1GB VRAM, works on CPU and Apple Silicon MPS. No API key, no cloud, no cost.

Output directory: ~/clawd/output/voice/
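
If you script around the skill, the default output location can be resolved as below. This is a minimal Python sketch, not the skill's own code: `default_output_path` is a hypothetical helper, and its `voice_YYYYMMDD_HHMMSS` file name is an assumed stand-in for whatever name the scripts pick when `--output` is omitted.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Default output root named in this document.
OUTPUT_DIR = Path.home() / "clawd" / "output" / "voice"


def default_output_path(suffix: str = ".wav",
                        now: Optional[datetime] = None,
                        root: Path = OUTPUT_DIR) -> Path:
    """Return a timestamped file path under `root`, creating `root` if needed.

    The "voice_YYYYMMDD_HHMMSS" naming is hypothetical, not the skill's scheme.
    """
    now = now or datetime.now()
    root.mkdir(parents=True, exist_ok=True)
    return root / f"voice_{now:%Y%m%d_%H%M%S}{suffix}"
```

For a voiced video, call `default_output_path(".mp4")` instead.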


Commands

| What you say | What it does |
|---|---|
| "clone this voice [audio file]" | Encode a voice profile from a sample |
| "speak as [name]: [text]" | Generate speech using a saved voice profile |
| "add voiceover to [video]: [text]" | Generate speech and bake it into the video with ffmpeg |
| "list voices" | Show saved voice profiles |
| "clone voice from URL [url]" | Download audio from the URL, then clone it |

Workflow

Step 1: Clone a voice

uv run ~/clawd/skills/flow-voice/scripts/clone.py \
  --sample /path/to/sample.wav \
  --name "eric"

Saves the encoded profile to ~/clawd/output/voice/profiles/eric.pkl. Requires at least 3 seconds of clean audio; 10–30 seconds is ideal.
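
The `.pkl` extension suggests the profile is a Python pickle. A minimal sketch of saving and loading a named profile, assuming the encoded voice is any picklable object; `profile_path`, `save_profile`, and `load_profile` are hypothetical helpers, and the name check is an added safeguard rather than documented behavior.

```python
import pickle
import re
from pathlib import Path
from typing import Any

PROFILES_DIR = Path.home() / "clawd" / "output" / "voice" / "profiles"


def profile_path(name: str, profiles_dir: Path = PROFILES_DIR) -> Path:
    """Map a voice name to <profiles_dir>/<name>.pkl."""
    # Reject names such as "../x" that could escape the profiles directory.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError(f"invalid voice name: {name!r}")
    return profiles_dir / f"{name}.pkl"


def save_profile(name: str, encoded: Any, profiles_dir: Path = PROFILES_DIR) -> Path:
    """Pickle the encoded voice under its name and return the file path."""
    path = profile_path(name, profiles_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(encoded, f)
    return path


def load_profile(name: str, profiles_dir: Path = PROFILES_DIR) -> Any:
    """Unpickle a saved voice. Unpickling can run code: load only your own files."""
    with profile_path(name, profiles_dir).open("rb") as f:
        return pickle.load(f)
```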

Step 2: Generate speech

uv run ~/clawd/skills/flow-voice/scripts/speak.py \
  --voice "eric" \
  --text "Hello, this is a test of voice cloning." \
  --output ~/clawd/output/voice/output.wav

Outputs a 48 kHz WAV. Use --speed to adjust pace (default 1.0; 0.8 is slower, 1.2 is faster).
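
To confirm a generated file really is 48 kHz, the standard-library `wave` module is enough for integer PCM WAVs. `wav_info` is a hypothetical helper; if the script writes floating-point WAVs, `wave` cannot open them and `soundfile.info` is the alternative.

```python
import wave
from typing import Tuple


def wav_info(path: str) -> Tuple[int, int, float]:
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return rate, w.getnchannels(), w.getnframes() / rate
```

For example, `wav_info(os.path.expanduser("~/clawd/output/voice/output.wav"))[0]` should be 48000.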

Step 3: Bake into video (optional)

uv run ~/clawd/skills/flow-voice/scripts/speak.py \
  --voice "eric" \
  --text "Your agent can think. Now teach it to draw." \
  --output /tmp/vo.wav

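ffmpeg -i input.mp4 -i /tmp/vo.wav \
  -map 0:v:0 -map 1:a:0 \
  -c:v copy -c:a aac -shortest output_with_voice.mp4

The -map flags take the picture from the first input and the sound from the voiceover, even if input.mp4 already has its own audio track (without them, ffmpeg picks the audio stream on its own). -shortest ends the output at the shorter stream, so a voiceover shorter than the video truncates the video; add -af apad before -shortest to pad the voiceover with silence and keep the full video.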
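
The same mux can be driven from Python with `subprocess`. This is a sketch, not the skill's implementation: `mux_command` and `bake_voiceover` are hypothetical helpers, `-y` (overwrite without prompting) is added so the call never blocks on a prompt, and `pad_audio` toggles the optional `apad` filter.

```python
import subprocess
from typing import List


def mux_command(video: str, audio: str, output: str,
                pad_audio: bool = False) -> List[str]:
    """Build an ffmpeg command that replaces the video's audio with `audio`."""
    cmd = [
        "ffmpeg", "-y",            # overwrite output without prompting
        "-i", video, "-i", audio,
        "-map", "0:v:0",           # picture from the first input
        "-map", "1:a:0",           # sound from the voiceover
        "-c:v", "copy",            # no video re-encode
        "-c:a", "aac",
    ]
    if pad_audio:
        cmd += ["-af", "apad"]     # pad voiceover with silence to video length
    cmd += ["-shortest", output]   # stop at the shorter stream
    return cmd


def bake_voiceover(video: str, audio: str, output: str,
                   pad_audio: bool = False) -> None:
    """Run the mux; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(mux_command(video, audio, output, pad_audio), check=True)
```

`shlex.join(mux_command("input.mp4", "/tmp/vo.wav", "output_with_voice.mp4"))` prints the equivalent shell command for inspection.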

One-Shot: Clone + Speak in one command

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /path/to/sample.wav \
  --text "Beautiful diagrams, from a single prompt." \
  --output ~/clawd/output/voice/result.wav

Unless you also pass --name, no profile is saved: the sample is cloned and used for this one generation.

Bake voiceover directly into a video

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /path/to/sample.wav \
  --text "Your agent can think. Now teach it to draw." \
  --video /path/to/animation.mp4 \
  --output ~/clawd/output/voice/final_with_voice.mp4

Parameters

| Flag | Default | Description |
|---|---|---|
| --sample | required unless --voice is set | Reference audio file (wav/mp3, min 3 s) |
| --text | required | Text to speak |
| --output | auto-named | Output file path |
| --video | none | If set, bakes the audio into this video |
| --voice | none | Use a saved profile instead of --sample |
| --name | none | Save the cloned profile under this name |
| --speed | 1.0 | Speech speed (0.8 = slower, 1.2 = faster) |
| --steps | 4 | Inference steps (3–4 recommended) |
| --t-shift | 0.9 | Sampling parameter (higher may improve quality) |
| --smooth | false | Apply smoothing (reduces metallic artifacts) |
| --device | auto | Force cpu, mps, or cuda |
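
The parameter table maps naturally onto an `argparse` parser. The sketch below is a hypothetical reconstruction, not the actual flow_voice.py: it assumes --sample and --voice are mutually exclusive (the table says --voice replaces --sample) and that `auto` device selection prefers CUDA, then MPS, then CPU. In a real script the two availability flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flag table (hypothetical sketch)."""
    p = argparse.ArgumentParser(description="Flow Voice CLI sketch")
    src = p.add_mutually_exclusive_group(required=True)
    src.add_argument("--sample", help="reference audio (wav/mp3, min 3 s)")
    src.add_argument("--voice", help="name of a saved profile")
    p.add_argument("--text", required=True, help="text to speak")
    p.add_argument("--output", help="output path (auto-named if omitted)")
    p.add_argument("--video", help="bake the audio into this video")
    p.add_argument("--name", help="save the cloned profile under this name")
    p.add_argument("--speed", type=float, default=1.0)
    p.add_argument("--steps", type=int, default=4)
    p.add_argument("--t-shift", type=float, default=0.9)  # stored as args.t_shift
    p.add_argument("--smooth", action="store_true")
    p.add_argument("--device", choices=["auto", "cpu", "mps", "cuda"], default="auto")
    return p


def resolve_device(requested: str, cuda_ok: bool, mps_ok: bool) -> str:
    """Honor an explicit device; for 'auto', pick cuda, then mps, then cpu."""
    if requested != "auto":
        return requested
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"
```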

Tips

  • Minimum 3 seconds of audio for cloning — 10–30s is ideal
  • If you hear metallic artifacts, add --smooth
  • For Apple Silicon (M1/M2/M3), device defaults to mps automatically
  • First run downloads the model (~200MB) to ~/.cache/huggingface/
  • Clean audio works best — no background music or noise in the reference sample
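
The length guidance above can be checked before cloning. A minimal sketch for WAV samples using the standard-library `wave` module (MP3 would need a decoder such as librosa); `check_sample` and its category labels are hypothetical, and flagging clips over 30 s as "long" reflects the documented 3–30 s range, not a hard limit.

```python
import wave

MIN_SECONDS = 3.0             # documented minimum
IDEAL_RANGE = (10.0, 30.0)    # documented ideal length


def check_sample(path: str) -> str:
    """Classify a WAV reference sample as 'too_short', 'ok', 'ideal' or 'long'."""
    with wave.open(path, "rb") as w:
        seconds = w.getnframes() / w.getframerate()
    if seconds < MIN_SECONDS:
        return "too_short"
    lo, hi = IDEAL_RANGE
    if seconds < lo:
        return "ok"
    if seconds <= hi:
        return "ideal"
    return "long"  # beyond the documented range; consider trimming
```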

Examples

Clone Eric's voice from a recording:

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample ~/recordings/eric-30s.wav \
  --name eric \
  --text "FlowStay is live. Book your room with AI." \
  --output ~/clawd/output/voice/flowstay-promo.wav

Add voiceover to a Flow Visual Explainer animation:

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --voice eric \
  --text "Your agent can think. Now teach it to draw." \
  --video ~/clawd/2026-03-10-flowvisual-c3-magic-wand-comp.mp4 \
  --output ~/clawd/output/voice/flowvisual-voiced.mp4

Quick one-shot from a downloaded audio clip:

yt-dlp -x --audio-format wav -o /tmp/ref.wav "https://www.instagram.com/reel/..."
uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /tmp/ref.wav \
  --text "Hello from OpenClaw." \
  --output ~/clawd/output/voice/test.wav
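
Downloaded clips are often longer than 30 seconds and may be stereo. A sketch of an optional pre-processing step with ffmpeg, assuming that keeping only the first 30 s and downmixing to mono are acceptable for LuxTTS (neither is documented here); `prepare_reference_cmd` and `prepare_reference` are hypothetical helpers. The input and output paths must differ, since ffmpeg cannot overwrite its own input.

```python
import subprocess
from typing import List


def prepare_reference_cmd(src: str, dst: str, max_seconds: float = 30.0) -> List[str]:
    """ffmpeg command keeping the first `max_seconds` of `src` as a mono 16-bit WAV."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-t", str(max_seconds),   # cap the reference length
        "-ac", "1",               # downmix to mono (assumption)
        "-c:a", "pcm_s16le",      # plain 16-bit PCM WAV
        dst,
    ]


def prepare_reference(src: str, dst: str, max_seconds: float = 30.0) -> None:
    """Run the trim; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(prepare_reference_cmd(src, dst, max_seconds), check=True)
```

For instance, `prepare_reference("/tmp/ref.wav", "/tmp/ref_30s.wav")` before passing `--sample /tmp/ref_30s.wav`.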

Powered by LuxTTS (ysharma3501/LuxTTS, ZipVoice-based) — Free, local, no API key required. Packaged for OpenClaw by Flow — March 2026
