🧪 Skills

Alicloud Ai Audio Cosyvoice Voice Clone

Use when creating cloned voices with Alibaba Cloud Model Studio CosyVoice customization models, especially cosyvoice-v3.5-plus or cosyvoice-v3.5-flash, from...

v1.0.0
❤️ 0
⬇️ 45
👁 1
Share

Description


name: alicloud-ai-audio-cosyvoice-voice-clone description: Use when creating cloned voices with Alibaba Cloud Model Studio CosyVoice customization models, especially cosyvoice-v3.5-plus or cosyvoice-v3.5-flash, from reference audio and then reusing the returned voice_id in later TTS calls. version: 1.0.0

Category: provider

Model Studio CosyVoice Voice Clone

Use the CosyVoice voice enrollment API to create cloned voices from public reference audio.

Critical model names

Use model="voice-enrollment" and one of these target_model values:

  • cosyvoice-v3.5-plus
  • cosyvoice-v3.5-flash
  • cosyvoice-v3-plus
  • cosyvoice-v3-flash
  • cosyvoice-v2

Recommended default in this repo:

  • target_model="cosyvoice-v3.5-plus"

Region and compatibility

  • cosyvoice-v3.5-plus and cosyvoice-v3.5-flash are available only in China mainland deployment mode (Beijing endpoint).
  • In international deployment mode (Singapore endpoint), cosyvoice-v3-plus and cosyvoice-v3-flash do not support voice clone/design.
  • The target_model used during enrollment must match the model used later in speech synthesis, otherwise synthesis fails.

Endpoint

  • Domestic: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
  • International: https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Prerequisites

  • Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
  • Provide a public audio URL for the enrollment sample.

Normalized interface (cosyvoice.voice_clone)

Request

  • model (string, optional): fixed to voice-enrollment
  • target_model (string, optional): default cosyvoice-v3.5-plus
  • prefix (string, required): letters/digits only, max 10 chars
  • voice_sample_url (string, required): public audio URL
  • language_hints (array[string], optional): only first item is used
  • max_prompt_audio_length (float, optional): only for cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-flash
  • enable_preprocess (bool, optional): only for cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-flash

Response

  • voice_id (string): use this as the voice parameter in later TTS calls
  • request_id (string)
  • usage.count (number, optional)

Operational guidance

  • For Chinese dialect reference audio, keep language_hints=["zh"]; control dialect style later in synthesis via text or instruct.
  • For cosyvoice-v3.5-plus, supported language_hints include zh, en, fr, de, ja, ko, ru, pt, th, id, vi.
  • Avoid frequent enrollment calls; each call creates a new custom voice and consumes quota.

Local helper script

Prepare a normalized request JSON:

python skills/ai/audio/alicloud-ai-audio-cosyvoice-voice-clone/scripts/prepare_cosyvoice_clone_request.py \
  --target-model cosyvoice-v3.5-plus \
  --prefix myvoice \
  --voice-sample-url https://example.com/voice.wav \
  --language-hint zh

Validation

mkdir -p output/alicloud-ai-audio-cosyvoice-voice-clone
for f in skills/ai/audio/alicloud-ai-audio-cosyvoice-voice-clone/scripts/*.py; do
  python3 -m py_compile "$f"
done
echo "py_compile_ok" > output/alicloud-ai-audio-cosyvoice-voice-clone/validate.txt

Pass criteria: command exits 0 and output/alicloud-ai-audio-cosyvoice-voice-clone/validate.txt is generated.

Output And Evidence

  • Save artifacts, command outputs, and API response summaries under output/alicloud-ai-audio-cosyvoice-voice-clone/.
  • Include target_model, prefix, and sample URL in the evidence file.

References

  • references/api_reference.md
  • references/sources.md

Reviews (0)

Sign in to write a review.

No reviews yet. Be the first to review!

Comments (0)

Sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Compatible Platforms

Pricing

Free

Related Configs