🧪 Skills

Mimimax Voice Clone +TTS

Voice cloning and TTS using MiniMax API. User must provide a voice name when cloning; after success, voice_name->voice_id is written back to this skill doc f...

v1.0.1
❤️ 1
⬇️ 106
👁 1
Share

Description


name: voice-clone-tts description: Voice cloning and TTS using MiniMax API. User must provide a voice name when cloning; after success, voice_name->voice_id is written back to this skill doc for reuse. install: instruction-only; Python script requires Python 3.7+ and dependencies in requirements.txt (see Install section). requiredEnv:

  • MINIMAX_API_KEY optionalEnv:
  • MINIMAX_KEY
  • MINIMAX_GROUP_API_KEY envNote: At least one of MINIMAX_API_KEY, MINIMAX_KEY, or MINIMAX_GROUP_API_KEY must be set for API auth.

Voice Clone + TTS

Scope

This skill is narrowly scoped to: (1) uploading clone audio to MiniMax, (2) creating a cloned voice, (3) TTS with cloned or existing voices, and (4) updating the cloned-voice mapping block in this SKILL.md. The script only reads/writes this skill’s SKILL.md; it does not read unrelated system files or other environment variables beyond the MiniMax API key(s) above.

When to Use

  • Use this skill when you need to clone user-provided audio into a reusable voice.
  • Use this skill when you need text-to-speech (TTS) with an already cloned voice.
  • For long-term maintenance, cloned results are written back to this file to reduce repeated setup.

API Reference (MiniMax)

  • Upload clone audio: POST /v1/files/upload, purpose=voice_clone, multipart/form-data.
  • Create cloned voice: POST /v1/voice_clone.
  • Speech synthesis: POST /v1/t2a_v2.
  • Audio requirements: mp3/m4a/wav, duration 10 seconds–5 minutes, file size <=20MB.

Install

  • Runtime: Python 3.7+.
  • Dependencies: The script uses the requests library. Install with:
    pip install -r requirements.txt
    
    or pip install requests.
  • Network: The script calls MiniMax APIs over HTTPS; it does not read unrelated system files. It only reads/writes this skill’s SKILL.md to update the cloned-voice mapping block.

Required environment variables (credentials)

At least one of the following must be set for MiniMax API authentication (see frontmatter requiredEnv / optionalEnv):

Variable Required Notes
MINIMAX_API_KEY preferred Primary API key
MINIMAX_KEY alternative Accepted if set
MINIMAX_GROUP_API_KEY alternative Accepted if set

The script will fail with a clear error if none are set.

Prerequisites

  1. Credentials: Set one of the env vars above (see “Required environment variables”).
  2. Prepare clone audio (format/duration/size limits above).
  3. Before cloning, confirm the voice name (voice_name) with the user, e.g. liuyang_narration_v1.

Usage

  1. Go to the skill directory: cd workspace/skills/voice-clone-tts
  2. Run the script (clone + synthesize):
python scripts/minimax_voice_clone_tts.py \
  --audio "/absolute/path/to/voice.wav" \
  --voice-name "yangtuo_demo_v1" \
  --display-name "Alpaca Demo" \
  --text "Hello, this is a cloned voice test." \
  --output "./output/voice_test.mp3"
  1. To clone only (no synthesis), omit --text.
  2. To synthesize only (by display name or voice_id):
# Resolve by display name
python scripts/minimax_voice_clone_tts.py \
  --voice "voice_v2" \
  --text "This is TTS using an existing cloned voice." \
  --output "./output/reuse_voice.mp3"

# Or specify voice_id directly
python scripts/minimax_voice_clone_tts.py \
  --voice-id "yangtuo_demo_v1" \
  --text "This is TTS using an existing cloned voice." \
  --output "./output/reuse_voice.mp3"

Common Options

  • --audio: Path to clone audio (required for cloning).
  • --voice-name: Required when cloning; API voice ID (letters, digits, underscores, e.g. yangtuo_demo_v1).
  • --display-name: Optional when cloning; display name written to SKILL (e.g. Alpaca Demo). Defaults to --voice-name if omitted.
  • --voice-id: For synthesis, specify API voice_id directly (skips mapping table).
  • --voice: For synthesis, specify display name or voice_id; resolved from the mapping table below (e.g. voice_v2 or yangtuo_demo_v1).
  • --text: Text to synthesize (omit for clone-only).
  • --output: Output audio path (default ./output/minimax_tts.mp3).
  • --model: Speech model (default speech-2.8-turbo).
  • --format: Output format (mp3/pcm/flac/wav).
  • --speed --vol --pitch --emotion: Speech expression parameters.

Write-Back (Important)

  • After a successful clone, the script writes display name → voice_id to the “Cloned Voice Mapping” section below.
  • Use a display name with --voice "display name" so you don't need to remember voice_id.

Cloned Voice Mapping

  • Left: display name; right: API voice_id. For TTS use --voice "display name" or --voice-id voice_id.
  • test_voice_1772187110: test_voice_1772187110 (updated: 2026-02-27 18:12:00)
  • voice_v1: shuangyue_test (updated: 2026-02-28 16:47:01)
  • voice_v2: yangtuo_demo_v1 (updated: 2026-02-27 18:19:39)
  • voice_v3: dong_yuhui_voice_v1 (updated: 2026-03-02 19:51:44)

Troubleshooting

  • 401 / auth failure: Check that MINIMAX_API_KEY is correct.
  • Parameter errors: Check voice_name rules, audio format/size, and text length.
  • Clone succeeded but not written back: Ensure SKILL.md exists and contains the write-back marker block.

Reviews (0)

Sign in to write a review.

No reviews yet. Be the first to review!

Comments (0)

Sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Compatible Platforms

Pricing

Free

Related Configs