Description

name: whisper-cpp
description: Install and use whisper.cpp (local, free/offline speech-to-text) with OpenClaw. Supports downloading different ggml model sizes (tiny/base/small/medium/large-*) and configuring tools.media.audio to transcribe inbound voice notes without paid provider APIs.

whisper-cpp (Local Whisper STT for OpenClaw)

This skill sets up local whisper.cpp STT for inbound Telegram voice notes.

You need build tools (git, cmake, and a compiler toolchain), plus curl and ffmpeg. ffmpeg is used to decode Telegram OGG/Opus voice notes to WAV.
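Before running the install scripts, it can help to confirm that the prerequisites are on your PATH. The following is a minimal sketch, not part of this skill: the `missing_tools` helper is a hypothetical name, and `cc` is only an assumed stand-in for the compiler toolchain (your system may provide `gcc` or `clang` instead).

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in the prerequisites; "cc" is an assumed name for the compiler.
missing="$(missing_tools git cmake cc curl ffmpeg)"
if [ -n "$missing" ]; then
  # Join the missing names onto one line for readability.
  printf 'Missing prerequisites: %s\n' "$(printf '%s' "$missing" | tr '\n' ' ')"
else
  echo "All prerequisites found."
fi
```

Install anything reported as missing with your system package manager, then re-run the check.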

From this skill directory:

bash scripts/install_whisper_cpp.sh
bash scripts/download_models.sh
bash scripts/install_wrapper.sh
bash scripts/patch_openclaw_audio.sh

Send a Telegram voice note to test.

This setup uses ggml Whisper models stored in ~/.cache/whisper.

Common model names you can download: tiny, base, small, medium, large-*.

By default, the download script fetches base and small.

To download specific models:

bash scripts/download_models.sh tiny base small

To select which model the OpenClaw wrapper uses, set OPENCLAW_WHISPER_MODEL:

OPENCLAW_WHISPER_MODEL=small openclaw-whisper-stt /path/to/audio

To force a language, set OPENCLAW_WHISPER_LANG (example: English):

OPENCLAW_WHISPER_LANG=en openclaw-whisper-stt /path/to/audio
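The two environment variables can be combined, for instance to run the wrapper over several files with the same model and language. The sketch below is illustrative only: `transcribe_all` and the `STT_CMD` override are hypothetical names introduced here (the override exists so the loop can be tried with a stub command), and it assumes nothing about the wrapper's output format.

```shell
# Run the wrapper on each given file with one model and one language.
# Usage: transcribe_all MODEL LANG FILE...
transcribe_all() {
  model="$1"; lang="$2"; shift 2
  for f in "$@"; do
    echo "== $f"
    # STT_CMD is a hypothetical override; by default the real wrapper is used.
    OPENCLAW_WHISPER_MODEL="$model" OPENCLAW_WHISPER_LANG="$lang" \
      "${STT_CMD:-openclaw-whisper-stt}" "$f" || echo "failed: $f" >&2
  done
}

# Example:
#   transcribe_all small en /path/to/first.ogg /path/to/second.ogg
```

A failure on one file is reported on stderr and the loop continues with the next file.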

Models are stored in: ~/.cache/whisper.
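To see which models are already present, you can list the model directory. This sketch assumes the files follow the upstream whisper.cpp naming convention `ggml-<name>.bin`; the download script in this skill may name them differently, so treat the pattern as an assumption. `list_models` is a hypothetical helper name.

```shell
# List model names found in a directory (default: ~/.cache/whisper),
# assuming files are named ggml-<name>.bin.
list_models() {
  dir="${1:-$HOME/.cache/whisper}"
  for f in "$dir"/ggml-*.bin; do
    [ -e "$f" ] || continue      # the glob matched nothing
    base="${f##*/}"              # strip the directory part
    name="${base#ggml-}"         # strip the "ggml-" prefix
    printf '%s\n' "${name%.bin}" # strip the ".bin" suffix
  done
}

list_models
```

If nothing is printed, either no models have been downloaded yet or the files use a different naming scheme.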

After install, you can clean up the build files (whisper-cli and its libraries are in ~/.local/):

bash scripts/cleanup_build.sh

Confirm OpenClaw is using the wrapper:

which openclaw-whisper-stt
openclaw config get tools.media.audio.models
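If `which` prints nothing, one common cause is that `~/.local/bin` (where the wrapper is linked) is not on your PATH. A small check, using a hypothetical helper named `path_has`:

```shell
# Return 0 if the given directory is an entry in PATH, 1 otherwise.
path_has() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_has "$HOME/.local/bin"; then
  echo "~/.local/bin is on PATH"
else
  echo "~/.local/bin is not on PATH; add: export PATH=\"\$HOME/.local/bin:\$PATH\""
fi
```

The check compares literal entries, so a PATH entry written with a trailing slash or through a symlinked directory would not match.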

Test the wrapper directly:

openclaw-whisper-stt /path/to/audio.ogg
OPENCLAW_WHISPER_MODEL=small openclaw-whisper-stt /path/to/audio.ogg
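If the wrapper fails, it can be useful to try the decode step by hand. whisper.cpp works on 16 kHz mono 16-bit WAV input, so a manual conversion with ffmpeg might look like the sketch below. This is not necessarily what the wrapper does internally; `to_wav16k` is a hypothetical helper, and the model filename in the commented example assumes the upstream `ggml-<name>.bin` naming.

```shell
# Convert any ffmpeg-readable audio file to 16 kHz mono 16-bit PCM WAV.
# Usage: to_wav16k INPUT OUTPUT
to_wav16k() {
  ffmpeg -nostdin -loglevel error -y -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}

# Example (paths and model filename are assumptions):
#   to_wav16k /path/to/audio.ogg /tmp/audio.wav
#   whisper-cli -m ~/.cache/whisper/ggml-base.bin -f /tmp/audio.wav
```

If the conversion succeeds and whisper-cli transcribes the WAV file, the problem is more likely in the wrapper or the OpenClaw configuration than in the audio itself.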

Follow gateway logs while sending a Telegram voice note:

openclaw logs --follow

Wrapper source: bin/openclaw-whisper-stt.sh (linked to ~/.local/bin/openclaw-whisper-stt)
OpenClaw config patcher: scripts/patch_openclaw_audio.sh