Generate and translate video subtitles using WhisperX and LLM translation. Use when processing video files to create .srt subtitle files. Supports multilingu...
ElevenLabs voice API integration — TTS, sound effects, music generation, speech-to-text, voice isolation, and streaming. Use when building voice-enabled apps...
Unified QCut media toolkit — organize project files, process media with FFmpeg, generate AI content, control the QCut editor with native CLI commands, genera...
Video summarization for Bilibili, Xiaohongshu, Douyin, and YouTube. Extract insights from video content through transcription and summarization.
Complete Venice AI API toolkit - image generation, video, audio, embeddings, transcription, characters, models, and admin functions. Privacy-focused inferenc...
--- name: mlx-stt description: Speech-To-Text with MLX (Apple Silicon) and open-source models (default GLM-ASR-Nano-2512) locally. version: 1.0.7 author: guoqiao metadata: {"openclaw":{"always":true,"e
Two-way voice conversation system - speech recognition to text + Edge TTS speech synthesis + Cloudflare Tunnel public access
Create AI avatar and talking head videos with OmniHuman, Fabric, PixVerse via inference.sh CLI. Models: OmniHuman 1.5, OmniHuman 1.0, Fabric 1.0, PixVerse Li...
Extract and summarize YouTube video transcripts into concise overviews with main points, arguments, and conclusions using video captions.
Process, enhance, and convert audio files with noise removal, normalization, format conversion, transcription, and podcast workflows.
--- name: openai-whisper description: Local speech-to-text with the Whisper CLI (no API key). homepage: https://openai.com/research/whisper metadata: {"clawdbot":{"emoji":"🎙️","requires":{"bins":
--- name: qwen-audio description: "High-performance audio library with text-to-speech (TTS) and speech-to-text (STT)." version: "0.0.4" --- # Qwen-Audio ## Overview Qwen-Audio is a high-performance
Noosphere Integrated Memory Architecture — Complete cognitive stack for AI agents: persistent memory, emotional intelligence, dream consolidation, hive mind,...
Comprehensive catalog of what people are doing with OpenClaw. Covers 15+ categories with real examples, sources, and inspiration. Use when asked about OpenCl...
Bitcoin-powered AI tools marketplace via MCP. Generate images (Flux, Seedream, Recraft), text (Kimi K2.5, DeepSeek, GPT-OSS), video (Kling V3), music, speech...
One-step full-stack installer for OpenClaw WebChat voice input with local speech-to-text. Orchestrates three focused skills in order: local STT backend (fast...
Connect your agent to 100+ services and 21 tools across the internet. Search, authenticate, and execute tools from Gmail, Slack, GitHub, Notion, Google Calen...
HTTPS/WSS reverse proxy for OpenClaw WebChat Control UI. Serves the Control UI over HTTPS with TLS cert management, proxies WebSocket connections to the gate...
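Local capability hub. Calls the local machine's microphone, camera, Ollama, YOLO, Stable Diffusion, TTS/transcription, notifications, clipboard, weather, whitelisted scripts, and more over HTTP. Use when you need to "verify whether there is sound...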
Use when OpenClaw needs to call SpeakNotes API routes directly using an API key and generate transcripts/summaries from YouTube URLs, media files, or documen...
Opinionated form UX and accessibility workflow for signup, checkout, settings, and lead-gen forms. Use when reviewing a form spec or existing implementation...
Run the video-skill pipeline to convert narrated videos into structured step data and enriched timeline-ready outputs. Use when a user asks to process a vide...
Discover and install related skills from inference.sh skill registry. Helps find complementary skills for your AI workflow. Use for: skill discovery, workflo...
Records the structure of conversations where ideas evolve, branch, get rejected, pivot, or combine. Saves each structural shift as a node in a local JSON tre...