name: speech-to-text
description: Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, meeting clips, podcasts, interviews, or any local audio file (.ogg, .mp3, .wav, .m4a, etc.) and wants a transcript, rough captions, or an English translation without relying on paid APIs first.

Speech to Text

Use this skill to turn local audio files into text with a public Whisper-based endpoint.

Run:

python3 scripts/transcribe.py /path/to/file.ogg

Return the transcript to the user as plain text. By default, the script also applies lightweight cleanup to Chinese text, adding punctuation and sentence breaks.

For machine-readable output:

python3 scripts/transcribe.py /path/to/file.ogg --json

To disable cleanup and keep the raw model text:

python3 scripts/transcribe.py /path/to/file.ogg --format raw

To force Chinese punctuation cleanup:

python3 scripts/transcribe.py /path/to/file.ogg --format zh
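The script's actual cleanup rules are not described here. As a rough illustration of what lightweight punctuation and sentence-breaking cleanup can mean, the sketch below uses a hypothetical zh_cleanup helper (not code from scripts/transcribe.py). It converts ASCII punctuation that follows a Chinese character into its full-width form and starts a new line after each sentence-ending mark:

```python
import re

# Hypothetical ASCII -> full-width mapping; not taken from scripts/transcribe.py.
FULLWIDTH = {",": "，", ".": "。", "?": "？", "!": "！", ":": "：", ";": "；"}

# Basic CJK Unified Ideographs block only, to keep the sketch short.
CJK = "[\u4e00-\u9fff]"


def zh_cleanup(text: str) -> str:
    """Use full-width punctuation after Chinese characters; one sentence per line."""
    # 1. ASCII punctuation right after a CJK character becomes full-width,
    #    and spaces around it are dropped (Chinese text does not use them).
    text = re.sub(
        "(" + CJK + r")\s*([,.?!:;])\s*",
        lambda m: m.group(1) + FULLWIDTH[m.group(2)],
        text,
    )
    # 2. Start a new line after each full-width sentence terminator
    #    that is followed by more text.
    text = re.sub(r"([。？！])\s*(?=\S)", r"\1\n", text)
    return text.strip()


print(zh_cleanup("你好,世界.今天天气很好!"))
```

Text without Chinese characters passes through unchanged, which is roughly the distinction between --format raw and the cleaned output.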

For English translation instead of same-language transcription:

python3 scripts/transcribe.py /path/to/file.ogg --task translate
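When driving the script from another Python program, the documented options map directly onto command-line flags. This is a minimal sketch that assumes scripts/transcribe.py is run from the skill's root directory. build_command and run_transcription are hypothetical helper names, and only the flags shown above are used:

```python
import subprocess
import sys
from typing import List, Optional


def build_command(audio_path: str, *, as_json: bool = False,
                  fmt: Optional[str] = None, task: Optional[str] = None,
                  space: Optional[str] = None) -> List[str]:
    """Assemble argv for scripts/transcribe.py from the documented options."""
    # Use the current interpreter; the examples invoke it as python3.
    cmd = [sys.executable, "scripts/transcribe.py", audio_path]
    if as_json:
        cmd.append("--json")          # machine-readable output
    if fmt is not None:
        cmd += ["--format", fmt]      # e.g. "raw" or "zh"
    if task is not None:
        cmd += ["--task", task]       # e.g. "translate"
    if space is not None:
        cmd += ["--space", space]     # override the default Space URL
    return cmd


def run_transcription(cmd: List[str]) -> str:
    """Run the script and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

With --json, pass the returned text to json.loads. The JSON field names are not documented here, so inspect the output before relying on specific keys.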

1. Confirm the input is a local audio file.
2. Run scripts/transcribe.py on it.
3. If the transcript looks imperfect, tell the user it came from a public Whisper endpoint and may need cleanup.
4. If helpful, post-process the transcript into:
   - a cleaned transcript
   - a summary
   - action items
   - bilingual output
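The check in the first step (confirming the input is a local audio file) can be done before calling the script. This is a minimal sketch with a hypothetical is_local_audio_file helper. Its extension list covers only the formats named in the description, which ends with "etc.", so the script may accept more:

```python
from pathlib import Path

# Formats named in this skill's description; an assumed, non-exhaustive list.
AUDIO_EXTENSIONS = {".ogg", ".mp3", ".wav", ".m4a"}


def is_local_audio_file(path: str) -> bool:
    """True if path names an existing local file with a known audio extension."""
    # This skill handles local files only, so reject anything that looks like a URL.
    if "://" in path:
        return False
    p = Path(path)
    # Compare extensions case-insensitively (e.g. voice notes saved as .OGG).
    return p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
```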

The script:

Default endpoint:

Override it with:

python3 scripts/transcribe.py input.ogg --space https://your-space.hf.space

or set:

export HF_WHISPER_SPACE=https://your-space.hf.space
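Endpoint selection can be sketched as a small resolution function. resolve_space is hypothetical. The precedence (flag before environment variable) is an assumed convention rather than documented behavior. The default URL is passed in as a parameter because it is not given here:

```python
import os
from typing import Mapping, Optional


def resolve_space(cli_space: Optional[str], default: str,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the Space URL: --space flag, then HF_WHISPER_SPACE, then default.

    The flag-over-environment precedence is an assumption, not documented behavior.
    """
    env = os.environ if env is None else env
    if cli_space:
        return cli_space
    # Treat an unset or blank variable as "not set".
    from_env = env.get("HF_WHISPER_SPACE", "").strip()
    if from_env:
        return from_env
    return default
```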

Treat this as a best-effort public/free path, not a privacy-grade path.
Do not use it for highly sensitive audio unless the user explicitly accepts processing by a public third-party service.
Expect rate limits, queueing, and occasional outages.
If the public endpoint fails, explain that the free backend is unavailable and offer alternatives, such as pointing --space or HF_WHISPER_SPACE at another Whisper Space.
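One way to absorb transient rate limits or queue timeouts before reporting an outage is to retry with exponential backoff. This is a sketch of a possible wrapper, not behavior of scripts/transcribe.py. with_retries and its parameters are hypothetical:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 2.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call call(), retrying on failure with exponential backoff plus jitter.

    Re-raises the last exception once all attempts are used.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller report the outage
            # Wait base_delay * 2**attempt seconds plus up to 1 s of jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Wrap whatever call reaches the Space (for example, a subprocess run of scripts/transcribe.py) in a zero-argument lambda. If the final attempt still fails, the exception propagates and you can tell the user the free backend is unavailable.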

Prefer to return: