
Polyphone TTS


v1.0.0

Description


name: senseaudio-polyphone-tts
description: Fix Chinese polyphone (多音字) mispronunciation in TTS by auto-detecting ambiguous characters and applying pinyin annotations. Use when users complain about wrong pronunciation, need precise tone control, or are synthesizing text with characters like 行/干/量/好/了/得/地/的/着/过. Triggers on "读音不对", "这个字读错了", "多音字", "标注拼音", "银行行长", "绕口令", or any request to correct TTS pronunciation.
metadata:
  openclaw:
    requires:
      env:
        - SENSEAUDIO_API_KEY
      bins:
        - curl
        - jq
        - xxd
    primaryEnv: SENSEAUDIO_API_KEY
    homepage: https://senseaudio.cn
compatibility:
  required_credentials:
    - name: SENSEAUDIO_API_KEY
      description: API key from https://senseaudio.cn/platform/api-key
      env_var: SENSEAUDIO_API_KEY

SenseAudio Polyphone TTS (多音字)

Precise pronunciation control for Chinese TTS via pinyin annotation. The dictionary parameter lets you override how specific characters are read — essential for polyphones (多音字) that the model might guess wrong.

The dictionary parameter only works with cloned voices and model SenseAudio-TTS-1.5. System voices (male_0004_a etc.) do not support it.

Step 1: Scan for Polyphones

When the user provides text, scan it for these common polyphones and flag any that appear:

| Character | Readings | Context clues |
| --- | --- | --- |
| 行 | háng (行业/银行/行列) / xíng (行走/行动/可行) | 银行、行长、行业 → háng |
| 干 | gān (干净/干燥) / gàn (干活/干部) | 干部、干活 → gàn |
| 量 | liáng (量体温/测量) / liàng (数量/重量) | 数量、质量 → liàng |
| 铺 | pū (铺床/铺路) / pù (店铺/铺子) | 店铺、铺面 → pù |
| 好 | hǎo (好的/很好) / hào (好奇/爱好) | 爱好、好学 → hào |
| 了 | le (吃了/来了) / liǎo (了解/了结) | 了解、了不起 → liǎo |
| 得 | de (跑得快) / dé (得到) / děi (得去) | 得到 → dé; meaning "must" (必须) → děi |
| 地 | de (慢慢地) / dì (土地/地方) | adverbial use → de |
| 的 | de (我的) / dí (的确) / dì (目的) | 目的 → dì; 的确 → dí |
| 着 | zhe (看着) / zháo (着火) / zhuó (着装) | 着火、着急 → zháo; 着装 → zhuó |
| 长 | cháng (长度/很长) / zhǎng (成长/行长) | 行长、生长 → zhǎng |
| 重 | zhòng (重量/重要) / chóng (重复/重新) | 重复、重新 → chóng |
| 中 | zhōng (中间/中国) / zhòng (中奖/中毒) | 中奖、中毒 → zhòng |
| 还 | hái (还有/还是) / huán (还钱/归还) | 还钱、偿还 → huán |
| 发 | fā (发现/发展) / fà (头发/理发) | 头发、理发 → fà |
| 数 | shù (数字/数量) / shǔ (数数/数一数) | 数数、数落 → shǔ |
| 参 | cān (参加/参考) / shēn (人参/党参) | 人参、党参 → shēn |
| 差 | chā (差别/差距) / chà (差不多) / chāi (出差) | 出差 → chāi; 差不多 → chà |

Show the user which polyphones were found and your best guess at the intended reading, then ask them to confirm or correct before synthesizing.

Example:

Polyphones detected:
- "行" (character 2): 银行 → suggested reading háng [hang2] ✓, or xíng [xing2]?
- "长" (character 4): 行长 → suggested reading zhǎng [zhang3] ✓, or cháng [chang2]?
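
The scan described in Step 1 can be sketched with jq, which handles the text as Unicode code points regardless of the shell locale. This is a minimal sketch, not part of the original skill: the function name `scan_polyphones` and the `POLYPHONES` variable are hypothetical, and the character list is simply copied from the table above. It only locates candidate characters; choosing the reading still relies on the context clues and on user confirmation.

```shell
# Characters taken from the Step 1 table; extend as needed.
POLYPHONES='行干量铺好了得地的着长重中还发数参差'

# scan_polyphones TEXT (hypothetical helper)
# Prints one "position<TAB>character" line per polyphone occurrence,
# with positions counted from 1 in Unicode code points.
scan_polyphones() {
  jq -rn --arg text "$1" --arg chars "$POLYPHONES" '
    ($chars | explode) as $poly
    | $text | explode | to_entries[]
    | select(.value as $c | any($poly[]; . == $c))
    | "\(.key + 1)\t\([.value] | implode)"'
}

scan_polyphones '银行行长'
```

For 银行行长 this lists positions 2, 3, and 4, since both 行 characters and 长 are in the list.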

Step 2: Build the Dictionary

Convert confirmed readings into the dictionary array. Each entry covers one phrase containing the polyphone:

Source phrase → replacement format: put [pinyin] before each polyphone and leave the other characters unchanged.

Pinyin format: [initial + final + tone number], for example [hang2], [xing2], [zhang3].

Example:

  • original: 银行行长
  • replacement: 银[hang2]行[zhang3]长

Build the full dictionary array:

"dictionary": [
  {"original": "银行行长", "replacement": "银[hang2]行[zhang3]长"},
  {"original": "好奇心", "replacement": "[hao4]奇心"}
]

Each original should be a short phrase (3–8 chars) that uniquely identifies the occurrence in context. Avoid single-character originals — they may match unintended occurrences.
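
One way to assemble the array without hand-writing JSON is to let jq build it from tab-separated pairs. The sketch below is an illustration under assumptions: `build_dictionary` and `check_originals` are hypothetical helper names, the tab-separated input format is a choice made here, and the replacement strings are passed through exactly as supplied (the examples are copied from this step). The second helper flags any original outside the recommended 3–8 character range.

```shell
# build_dictionary (hypothetical helper): reads "original<TAB>replacement"
# lines on stdin and prints them as a compact JSON array.
build_dictionary() {
  jq -Rnc '[inputs | select(length > 0) | split("\t")
            | {original: .[0], replacement: .[1]}]'
}

# check_originals (hypothetical helper): reads a dictionary array on stdin
# and prints every original shorter than 3 or longer than 8 characters.
check_originals() {
  jq -r '.[].original | select(length < 3 or length > 8)'
}

DICT=$(printf '%s\t%s\n' \
  '银行行长' '银[hang2]行[zhang3]长' \
  '好奇心' '[hao4]奇心' | build_dictionary)
printf '%s\n' "$DICT"
printf '%s\n' "$DICT" | check_originals
```

jq counts string length in code points, so each Chinese character counts as one.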

Step 3: Synthesize

The user must provide a cloned voice ID. If they don't have one, remind them that dictionary requires a cloned voice and suggest using the senseaudio-voice-cloner skill first.

curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SenseAudio-TTS-1.5",
    "text": "<TEXT>",
    "stream": false,
    "voice_setting": {
      "voice_id": "<CLONED_VOICE_ID>"
    },
    "audio_setting": {
      "format": "mp3"
    },
    "dictionary": <DICTIONARY_ARRAY>
  }' -o response.json
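
Substituting the placeholders by hand inside a single-quoted JSON string is fragile when the text contains quotes or line breaks. As a hedged alternative, the request body can be generated with jq, which escapes the values itself. The helper name `build_payload` is hypothetical; the field names are the ones used in the request above.

```shell
# build_payload TEXT VOICE_ID DICTIONARY_JSON (hypothetical helper)
# Prints the JSON request body; jq escapes quotes and newlines in TEXT.
build_payload() {
  jq -n --arg text "$1" --arg voice "$2" --argjson dict "$3" '{
    model: "SenseAudio-TTS-1.5",
    text: $text,
    stream: false,
    voice_setting: {voice_id: $voice},
    audio_setting: {format: "mp3"},
    dictionary: $dict
  }'
}

# Usage (performs a network call, so it is left commented out):
# build_payload "$TEXT" "$VOICE_ID" "$DICT" |
#   curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
#     -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
#     -H "Content-Type: application/json" \
#     --data-binary @- -o response.json
```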

jq -r '.data.audio' response.json | xxd -r -p > output.mp3

Check base_resp.status_code == 0 before decoding.
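
The status check and the decode can be combined so that no audio file is written for a failed request. This is a small sketch with a hypothetical function name, `decode_audio`; it relies only on the `base_resp.status_code` and `data.audio` fields mentioned above and prints the whole `base_resp` object on failure rather than assuming any other field.

```shell
# decode_audio RESPONSE_JSON OUTPUT_FILE (hypothetical helper)
# Decodes the hex-encoded audio only when base_resp.status_code is 0.
decode_audio() {
  if ! jq -e '.base_resp.status_code == 0' "$1" >/dev/null; then
    echo "synthesis failed: $(jq -c '.base_resp' "$1")" >&2
    return 1
  fi
  jq -r '.data.audio' "$1" | xxd -r -p > "$2"
}

# Example: decode_audio response.json output.mp3
```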

Step 4: Iterate

After the user listens, they may find additional mispronunciations. Update the dictionary array and re-synthesize. Keep the previous response.json until the new one succeeds.
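
One way to keep the previous result safe is to write each new attempt to a separate file and replace response.json only after the status check passes. The sketch below assumes that workflow; `promote_response` and the file name response.new.json are hypothetical.

```shell
# promote_response NEW_JSON CURRENT_JSON (hypothetical helper)
# Replaces CURRENT_JSON with NEW_JSON only if the new synthesis succeeded.
promote_response() {
  if jq -e '.base_resp.status_code == 0' "$1" >/dev/null 2>&1; then
    mv -f "$1" "$2"
  else
    echo "keeping previous $2; new attempt failed" >&2
    return 1
  fi
}

# Example: send the request with -o response.new.json, then run
# promote_response response.new.json response.json
```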

Report: file path, duration in milliseconds (jq '.extra_info.audio_length' response.json), character count, and which dictionary entries were applied.
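
The report can be produced in one jq call. This is a sketch under stated assumptions: `report` is a hypothetical helper, the character count is the number of code points in the input text, and an entry is treated as "applied" when its original occurs in the text, since the response is not known to list the entries that were used.

```shell
# report RESPONSE_JSON AUDIO_FILE TEXT DICTIONARY_JSON (hypothetical helper)
# Prints file path, duration in ms, character count, and matching entries.
report() {
  jq -r --arg file "$2" --arg text "$3" --argjson dict "$4" '
    "file: \($file)",
    "duration_ms: \(.extra_info.audio_length)",
    "characters: \($text | length)",
    ($dict[] | select(.original as $o | $text | contains($o))
     | "applied: \(.original) -> \(.replacement)")' "$1"
}

# Example: report response.json output.mp3 "$TEXT" "$DICT"
```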
