Polyphone TTS
Description
name: senseaudio-polyphone-tts
description: Fix Chinese polyphone (多音字) mispronunciation in TTS by auto-detecting ambiguous characters and applying pinyin annotations. Use when users complain about wrong pronunciation, need precise tone control, or are synthesizing text with characters like 行/干/量/好/了/得/地/的/着/过. Triggers on "读音不对", "这个字读错了", "多音字", "标注拼音", "银行行长", "绕口令", or any request to correct TTS pronunciation.
metadata:
  openclaw:
    requires:
      env:
        - SENSEAUDIO_API_KEY
      bins:
        - curl
        - jq
        - xxd
    primaryEnv: SENSEAUDIO_API_KEY
    homepage: https://senseaudio.cn
compatibility:
  required_credentials:
    - name: SENSEAUDIO_API_KEY
      description: API key from https://senseaudio.cn/platform/api-key
      env_var: SENSEAUDIO_API_KEY
SenseAudio Polyphone TTS (多音字)
Precise pronunciation control for Chinese TTS via pinyin annotation. The dictionary parameter lets you override how specific characters are read — essential for polyphones (多音字) that the model might guess wrong.
The dictionary parameter only works with cloned voices and the SenseAudio-TTS-1.5 model. System voices (male_0004_a, etc.) do not support it.
Step 1: Scan for Polyphones
When the user provides text, scan it for these common polyphones and flag any that appear:
| Character | Readings | Context clues |
|---|---|---|
| 行 | háng (行业/银行/行列) / xíng (行走/行动/可行) | 银行、行长、行业 → háng |
| 干 | gān (干净/干燥) / gàn (干活/干部) | 干部、干活 → gàn |
| 量 | liáng (量体温/测量) / liàng (数量/重量) | 数量、质量 → liàng |
| 铺 | pū (铺床/铺路) / pù (店铺/铺子) | 店铺、铺面 → pù |
| 好 | hǎo (好的/很好) / hào (好奇/爱好) | 爱好、好学 → hào |
| 了 | le (吃了/来了) / liǎo (了解/了结) | 了解、了不起 → liǎo |
| 得 | de (跑得快) / dé (得到) / děi (得去) | 得到 → dé;必须 → děi |
| 地 | de (慢慢地) / dì (土地/地方) | 副词用法 → de |
| 的 | de (我的) / dí (的确) / dì (目的) | 目的 → dì;的确 → dí |
| 着 | zhe (看着) / zháo (着火) / zhuó (着装) | 着火、着急 → zháo;着装 → zhuó |
| 长 | cháng (长度/很长) / zhǎng (成长/行长) | 行长、生长 → zhǎng |
| 重 | zhòng (重量/重要) / chóng (重复/重新) | 重复、重新 → chóng |
| 中 | zhōng (中间/中国) / zhòng (中奖/中毒) | 中奖、中毒 → zhòng |
| 还 | hái (还有/还是) / huán (还钱/归还) | 还钱、偿还 → huán |
| 发 | fā (发现/发展) / fà (头发/理发) | 头发、理发 → fà |
| 数 | shù (数字/数量) / shǔ (数数/数一数) | 数数、数落 → shǔ |
| 参 | cān (参加/参考) / shēn (人参/党参) | 人参、党参 → shēn |
| 差 | chā (差别/差距) / chà (差不多) / chāi (出差) | 出差 → chāi;差不多 → chà |
Show the user which polyphones were found and your best guess at the intended reading, then ask them to confirm or correct before synthesizing.
Example:
检测到多音字:
- "行" (第2个): 银行 → 建议读 háng [hang2] ✓ 还是 xíng [xing2]?
- "行" (第3个): 行长 → 建议读 háng [hang2] ✓ 还是 xíng [xing2]?
- "长" (第4个): 行长 → 建议读 zhǎng [zhang3] ✓ 还是 cháng [chang2]?
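The scan in Step 1 can be sketched with jq alone, which splits strings by Unicode code point regardless of the shell locale. This is a minimal sketch under stated assumptions, not part of the SenseAudio API: the function name scan_polyphones is hypothetical, and it only flags characters from the table above. Choosing the reading still depends on the context clues and on the user's confirmation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every polyphone from the Step 1 table that occurs in the text.
# Output: one line per hit, "<1-based character position><TAB><character>".
scan_polyphones() {
  jq -rn --arg text "$1" --arg chars "行干量铺好了得地的着长重中还发数参差" '
    ($chars | split("")) as $poly
    | $text | split("") | to_entries[]
    | select(.value as $c | any($poly[]; . == $c))
    | "\(.key + 1)\t\(.value)"'
}

# Flags characters 2, 3 and 4 of the phrase (行, 行, 长).
scan_polyphones "银行行长"
```

The positions map directly onto the "第N个" labels in the confirmation message shown to the user.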
Step 2: Build the Dictionary
Convert confirmed readings into the dictionary array. Each entry covers one phrase containing the polyphone:
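Replacement format: copy the original phrase, insert [pinyin] immediately before each polyphone, and leave the other characters unchanged.
Pinyin format: [initial + final + tone number], e.g. [hang2], [xing2], [zhang3]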
Example:
- original: 银行行长
- replacement: 银[hang2]行[hang2]行[zhang3]长
Build the full dictionary array:
"dictionary": [
  {"original": "银行行长", "replacement": "银[hang2]行[hang2]行[zhang3]长"},
  {"original": "好奇心", "replacement": "[hao4]奇心"}
]
Each original should be a short phrase (3–8 chars) that uniquely identifies the occurrence in context. Avoid single-character originals — they may match unintended occurrences.
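Assembling the array by hand invites quoting mistakes, so a small helper can build it from alternating original/replacement arguments. This is a sketch, assuming jq 1.6 or later (for --args and $ARGS.positional); build_dictionary is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn alternating original/replacement arguments into
# the dictionary array as compact JSON. Requires jq 1.6+.
build_dictionary() {
  # Refuse an odd number of arguments: every original needs a replacement.
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "build_dictionary: expected original/replacement pairs" >&2
    return 1
  fi
  jq -cn '$ARGS.positional as $a
    | [range(0; $a | length; 2) | {original: $a[.], replacement: $a[. + 1]}]' \
    --args "$@"
}

build_dictionary "好奇心" "[hao4]奇心" "理发店" "理[fa4]发店"
```

The output is a JSON array that can be passed on unchanged as the dictionary value of the request.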
Step 3: Synthesize
The user must provide a cloned voice ID. If they don't have one, remind them that dictionary requires a cloned voice and suggest using the senseaudio-voice-cloner skill first.
curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SenseAudio-TTS-1.5",
    "text": "<TEXT>",
    "stream": false,
    "voice_setting": {
      "voice_id": "<CLONED_VOICE_ID>"
    },
    "audio_setting": {
      "format": "mp3"
    },
    "dictionary": <DICTIONARY_ARRAY>
  }' -o response.json
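jq -e '.base_resp.status_code == 0' response.json > /dev/null && jq -r '.data.audio' response.json | xxd -r -p > output.mp3
The jq -e guard enforces base_resp.status_code == 0 before decoding. data.audio is hex-encoded, so xxd -r -p converts it back to binary.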
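Pasting user text straight into a single-quoted JSON body breaks as soon as the text contains quotes or backslashes. A hedged alternative is to let jq build the body so that every value is escaped correctly. The field names below are the ones used in the request above; build_request is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body for the Step 3 request.
# $1 = text, $2 = cloned voice ID, $3 = dictionary array as JSON text.
build_request() {
  jq -cn --arg text "$1" --arg voice "$2" --argjson dict "$3" '{
    model: "SenseAudio-TTS-1.5",
    text: $text,
    stream: false,
    voice_setting: {voice_id: $voice},
    audio_setting: {format: "mp3"},
    dictionary: $dict
  }'
}

# Usage (not executed here; the voice ID is a placeholder):
# build_request "他对银行行长很好奇" "<CLONED_VOICE_ID>" \
#     '[{"original":"好奇","replacement":"[hao4]奇"}]' |
#   curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
#     -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d @- -o response.json
```

With -d @- curl reads the body from standard input, so the text never passes through shell quoting a second time.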
Step 4: Iterate
After the user listens, they may find additional mispronunciations. Update the dictionary array and re-synthesize. Keep the previous response.json until the new one succeeds.
Report the output file path, the duration in milliseconds (jq '.extra_info.audio_length' response.json), the character count, and which dictionary entries were applied.
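The iteration and reporting steps can be sketched as two small helpers. Both names (promote_response, report_summary) are hypothetical. The sketch assumes each retry is downloaded to a separate file first, so that response.json is only overwritten after the new response reports success.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the kept response ($2) with a new one ($1)
# only if the new one has base_resp.status_code == 0.
promote_response() {
  if jq -e '.base_resp.status_code == 0' "$1" > /dev/null; then
    mv "$1" "$2"
  else
    echo "synthesis failed; keeping $2" >&2
    return 1
  fi
}

# Hypothetical helper: print duration (ms) and character count for the report.
# $1 = response file, $2 = the text that was synthesized.
report_summary() {
  jq -r --arg text "$2" \
    '"duration_ms=\(.extra_info.audio_length) chars=\($text | length)"' "$1"
}
```

A retry would then write to a file such as response.new.json, call promote_response response.new.json response.json, and decode the audio only when that call succeeds. jq counts string length in code points, so each Chinese character counts as one.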