Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Use when you need to convert text to audio files with various voice options. Supports Chin...
Work with Obsidian vaults (plain Markdown notes) and automate via obsidian-cli. And also 50+ models for image generation, video generation, text-to-speech, s...
Transcribes local voice messages to text using Faster Whisper models for fast, privacy-focused speech recognition on audio files.
Send voice messages across chat channels (Telegram, Discord, Feishu/Lark, Signal, WhatsApp, and others) using edge-tts for text-to-speech and ffmpeg for audi...
ElevenLabs advanced TTS for converting text to speech, listing voices, and managing credits
Voice conversation interface for OpenClaw using wake word detection, streaming LLM responses, and text-to-speech. Use when a user wants to talk to their Open...
Convert text or subtitle files into speech audio with options for voice cloning, emotion control, speed, and timeline-accurate dubbing using Kokoro or Noiz b...
Build backend AI with Vercel AI SDK v6 stable. Covers Output API (replaces generateObject/streamObject), speech synthesis, transcription, embeddings, MCP tools with security guidance. Includes v4→v5
Configure an OpenClaw instance to use a local OpenAI-compatible TTS backend (for example openedai-speech) with cloned voices. Use when users ask to wire loca...
Combined agent that synthesizes speech via Volcengine TTS, uploads the audio to TOS, and returns a presigned temporary URL. Use when users need a shareable a...
Install and use whisper.cpp (local, free/offline speech-to-text) with OpenClaw. Supports downloading different ggml model sizes (tiny/base/small/medium/large...
Pronunciation coaching with real voice analysis using Azure Speech Services. Analyzes audio files for phoneme-level accuracy, fluency, prosody, and intonatio...
Converts text to natural speech using ElevenLabs for clinical and healthcare use cases. Use when generating patient instructions, discharge summaries, medica...
Provides Speech-to-Text (STT) and text Translation using the Addis Assistant API (api.addisassistant.com). Use when the user needs to convert an audio file to text (specifically Amharic), or translate
Local zero-cost text-to-speech with per-agent voice profiles using Kokoro TTS (82M params). 54 voices available, named agent mappings, WAV output. Use when b...
Simple local Speech-To-Text using Whisper. One-command install with auto model download. Supports 99+ languages.
Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when c...
ElevenLabs text-to-speech with mac-style say UX.
ClawVox - ElevenLabs voice studio for OpenClaw. Generate speech, transcribe audio, clone voices, create sound effects, and more.
Transcribe audio and video files using OpenAI Whisper API. Use when user wants to transcribe audio/video files, extract speech from media, or get text from r...
Local speech-to-text with the Whisper CLI (no API key).
AI media generation via deAPI. Transcribe YouTube/audio/video, generate images from text, text-to-speech, OCR, remove backgrounds, upscale images, create vid...
Stream free, professional text-to-speech from voiceless servers to Linux, macOS, or Android devices with 50+ voices in 30+ languages. Two architecture options for flexible deployment - server-side TTS