ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7...
Search the web for information, find current content, and look up news articles. And also 50+ models for image generation, video generation, text-to-speech,...
Text-to-Speech using SiliconFlow API (CosyVoice2). Supports multiple voices, languages, and dialects.
Google Workspace CLI for Gmail, Calendar, Drive, Contacts, Sheets, and Docs. And also 50+ models for image generation, video generation, text-to-speech, spee...
Convert text to speech using Microsoft Edge TTS with real-time streaming, customizable voice settings, and support for multiple languages including Chinese a...
Automatically update Clawdbot and all installed skills once daily via cron. And also 50+ models for image generation, video generation, text-to-speech, speec...
Work with Obsidian vaults (plain Markdown notes) and automate via obsidian-cli. And also 50+ models for image generation, video generation, text-to-speech, s...
Clone any voice and generate speech using Coqui XTTS v2. SUPER SIMPLE - provide a voice sample (6-30 sec WAV) and text, get cloned voice audio. Supports 14+ languages. Use when the user wants to (1) C
Configure an OpenClaw instance to use a local OpenAI-compatible TTS backend (for example openedai-speech) with cloned voices. Use when users ask to wire loca...
Create AI-powered podcasts with text-to-speech, music, and audio editing. Tools: Kokoro TTS, DIA TTS, Chatterbox, AI music generation, media merger. Capabili...
Generate spoken audio from text using the local Kokoro TTS engine. Use when the user asks to "say" something, requests a voice message, or wants text converted to speech.
Control Slack from Clawdbot including reacting to messages and pinning items. And also 50+ models for image generation, video generation, text-to-speech, spe...
Extract speech-to-text from Douyin (TikTok China) videos, get watermark-free download links, and download videos. Use when user shares a Douyin link, asks to...
Voice conversation interface for OpenClaw using wake word detection, streaming LLM responses, and text-to-speech. Use when a user wants to talk to their Open...
Provides Speech-to-Text (STT) and text Translation using the Addis Assistant API (api.addisassistant.com). Use when the user needs to convert an audio file to text (specifically Amharic), or translate
OpenClaw local speech-to-text backend using faster-whisper over HTTP on 127.0.0.1:18790. Use when you want voice transcription without external APIs, without...
Extract audio from video URLs and transcribe using STT (Speech-to-Text). Supports local Whisper or cloud APIs. Use when: user provides a video URL and wants...
Build, operate, and troubleshoot Autonoannounce local speaker text-to-speech using the queued pipeline (enqueue to worker to ElevenLabs to playback backend)....
Full ElevenLabs platform integration — text-to-speech, voice cloning, and Conversational AI agent creation. Not just TTS — build interactive voice agents wit...
Bitcoin-powered AI tools marketplace via MCP. Generate images (Flux, Seedream, Recraft), text (Kimi K2.5, DeepSeek, GPT-OSS), video (Kling V3), music, speech...
ElevenLabs voice API integration — TTS, sound effects, music generation, speech-to-text, voice isolation, and streaming. Use when building voice-enabled apps...
Text-to-Speech using FishAudio (fish.audio), generates natural human-like voice with great emotional expression.
Use when the user needs local speech-to-text transcription for audio files, especially Chinese or mixed Chinese-English audio, without relying on cloud trans...
High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to...