🧪 Skills

Alicloud Ai Audio Asr

Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when c...

v1.0.0
❤️ 0
⬇️ 48
👁 1
Share

Description


name: alicloud-ai-audio-asr description: Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (qwen3-asr-flash, qwen-audio-asr, qwen3-asr-flash-filetrans). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields. version: 1.0.0

Category: provider

Model Studio Qwen ASR (Non-Realtime)

Validation

mkdir -p output/alicloud-ai-audio-asr
python -m py_compile skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py && echo "py_compile_ok" > output/alicloud-ai-audio-asr/validate.txt

Pass criteria: command exits 0 and output/alicloud-ai-audio-asr/validate.txt is generated.

Output And Evidence

  • Store transcripts and API responses under output/alicloud-ai-audio-asr/.
  • Keep one command log or sample response per run.

Use Qwen ASR for recorded audio transcription (non-realtime), including short audio sync calls and long audio async jobs.

Critical model names

Use one of these exact model strings:

  • qwen3-asr-flash
  • qwen-audio-asr
  • qwen3-asr-flash-filetrans

Selection guidance:

  • Use qwen3-asr-flash or qwen-audio-asr for short/normal recordings (sync).
  • Use qwen3-asr-flash-filetrans for long-file transcription (async task workflow).

Prerequisites

  • Install SDK dependencies (script uses Python stdlib only):
python3 -m venv .venv
. .venv/bin/activate
  • Set DASHSCOPE_API_KEY in environment, or add dashscope_api_key to ~/.alibabacloud/credentials.

Normalized interface (asr.transcribe)

Request

  • audio (string, required): public URL or local file path.
  • model (string, optional): default qwen3-asr-flash.
  • language_hints (array, optional): e.g. zh, en.
  • sample_rate (number, optional)
  • vocabulary_id (string, optional)
  • disfluency_removal_enabled (bool, optional)
  • timestamp_granularities (array, optional): e.g. sentence.
  • async (bool, optional): default false for sync models, true for qwen3-asr-flash-filetrans.

Response

  • text (string): normalized transcript text.
  • task_id (string, optional): present for async submission.
  • status (string): SUCCEEDED or submission status.
  • raw (object): original API response.

Quick start (official HTTP API)

Sync transcription (OpenAI-compatible protocol):

curl -sS --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
            }
          }
        ]
      }
    ],
    "stream": false,
    "asr_options": {
      "enable_itn": false
    }
  }'

Async long-file transcription (DashScope protocol):

curl -sS --location 'https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'X-DashScope-Async: enable' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash-filetrans",
    "input": {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
    }
  }'

Poll task result:

curl -sS --location "https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>" \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Local helper script

Use the bundled script for URL/local-file input and optional async polling:

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash \
  --language-hints zh,en \
  --print-response

Long-file mode:

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash-filetrans \
  --async \
  --wait

Operational guidance

  • For local files, use input_audio.data (data URI) when direct URL is unavailable.
  • Keep language_hints minimal to reduce recognition ambiguity.
  • For async tasks, use 5-20s polling interval with max retry guard.
  • Save normalized outputs under output/alicloud-ai-audio-asr/transcripts/.

Output location

  • Default output: output/alicloud-ai-audio-asr/transcripts/
  • Override base dir with OUTPUT_DIR.

Workflow

  1. Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
  2. Run one minimal read-only query first to verify connectivity and permissions.
  3. Execute the target operation with explicit parameters and bounded scope.
  4. Verify results and save output/evidence files.

References

  • references/api_reference.md
  • references/sources.md
  • Realtime synthesis is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/.

Reviews (0)

Sign in to write a review.

No reviews yet. Be the first to review!

Comments (0)

Sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Compatible Platforms

Pricing

Free

Related Configs