Text-to-speech conversion using Python edge-tts for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and sub...
Install and configure a local browser-based Chinese voice chat launcher with the Ning Yao persona, including one-click Windows launchers, browser speech I/O,...
High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to...
Transcribe speech from YouTube videos or audio URLs into text using Gladia API with up to 10 free hours of monthly transcription. Use when: you need to summa...
Transcribe audio files to text using local speech recognition. Triggers on: "转录", "transcribe", "语音转文字", "ASR", "识别音频", "把这段音频转成文字".
Automatic Speech Recognition (ASR) using Zhipu AI (BigModel) GLM-ASR model. Use when you need to transcribe audio files to text. Supports Chinese audio trans...
Text-to-speech, sound effects, music generation, voice management, and quota checks via the ElevenLabs API. Use when generating audio with ElevenLabs or mana...
Build and troubleshoot SenseAudio speech recognition integrations, including HTTP transcription (`/v1/audio/transcriptions`), realtime WebSocket ASR (`/ws/v1...
Build and debug SenseAudio text-to-speech integrations on `/v1/t2a_v2` and `/ws/v1/t2a_v2`, including sync HTTP, SSE stream, WebSocket event sequencing, hex...
Generate speech audio with 阿里云百炼 TTS via the `bailian-cli` npm package. Use when users ask to convert text to voice, choose voices/languages, batch-generate...
Local text-to-speech using Qwen3-TTS with mlx_audio (macOS Apple Silicon) or qwen-tts (Linux/Windows). Privacy-first offline TTS with natural, realistic voic...
Local text-to-speech using Piper voices via sherpa-onnx. 100% offline, no API keys required. Use when user asks for a voice reply, audio response, spoken answer, or wants to hear something read aloud.
Free local speech-to-text transcription using OpenAI Whisper. Transcribe audio files (mp3, wav, m4a, ogg, etc.) to text without API costs. Use when: (1) User...
Run a minimal test matrix for the Model Studio skills that exist in this repo, including image/video/audio, realtime speech, omni, visual reasoning, embeddin...
AI media generation via deAPI. Transcribe YouTube/audio/video, generate images from text, text-to-speech, OCR, remove backgrounds, upscale images, create vid...
Transcribe audio files to text using Telnyx Speech-to-Text API. Use when you need to convert audio recordings, voice messages, or spoken content to text.
One-step full-stack installer for OpenClaw WebChat voice input with local speech-to-text. Orchestrates three focused skills in order: local STT backend (fast...
Local speech-to-text using OpenAI Whisper. Use when the user needs to: (1) transcribe audio files to text, (2) convert voice messages to written content, (3)...
PCClaw provides 16 native Windows AI skills for system control, automation, files, notifications, OCR, speech, LLM inference, and task management with minima...
Use when low-latency realtime speech recognition is needed with Alibaba Cloud Model Studio Qwen ASR Realtime models, including streaming microphone input, li...
--- name: vectorclaw-mcp description: "MCP tools for Anki Vector: speech, motion, camera, sensors, and automation workflows." openclaw: emoji: "🤖" requires: bins: ["python3"] env: ["VEC
Multilingual Text-to-Speech (TTS) with intelligent Pinyin-to-Hanzi conversion. Use when the user asks to generate audio for text that contains a mix of Vietn...
Accept text or voice input, transcribe if needed, generate natural OpenAI TTS speech, and send audio output to Feishu chat or web player.
Stream free, professional text-to-speech from voiceless servers to Linux, macOS, or Android devices with 50+ voices in 30+ languages. Two architecture options for flexible deployment - server-side TTS