Analyzes audio to detect BPM, key, structure, genre, and mood, transcribe lyrics, and generate visual and textual summaries of music tracks.
Use when OpenClaw needs to call SpeakNotes API routes directly using an API key and generate transcripts/summaries from YouTube URLs, media files, or documen...
Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT...
Search video dialogue and create reaction GIFs with timed subtitles. Perfect for creating meme-worthy clips from movies and TV shows.
On-device speech-to-text (Whisper) + text-to-speech (Qwen3-TTS) CLI. Runs on the Apple Neural Engine (ANE), Apple's low-power, dedicated ML inference chip. M...
Personal knowledge base for capturing and retrieving information about people, places, restaurants, games, tech, events, media, ideas, and organizations. Use...
--- name: voice-stt-tts description: Full voice message setup (STT + TTS) for OpenClaw using faster-whisper and Edge TTS homepage: https://docs.openclaw.ai/nodes/audio metadata: { "openclaw":
Use AudioPod AI's API for audio processing tasks including AI music generation (text-to-music, text-to-rap, instrumentals, samples, vocals), stem separation, text-to-speech, noise reduction, speech-to
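AutoGLM ASR MCP service: concurrent transcription of long audio, context passing, and timestamped segmentation. Based on Zhipu GLM-ASR-2512. Trigger words: 语音识别 (speech recognition), ASR, 转录 (transcribe), 转录音频 (transcribe audio), 长音频 (long audio)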
Generate synchronized subtitles (SRT/VTT/ASS) from video audio with precise timestamps. Use when users need subtitles, captions, or video transcription with...
Complete Venice AI platform — text generation, web search, embeddings, TTS, speech-to-text, image generation, video creation, upscaling, and AI editing. Private, uncensored AI inference for everythi
Full local AI inference stack on Apple Silicon Macs via MLX. Includes: LLM chat (Qwen3-14B, Gemma3-12B), speech-to-text ASR (Qwen3-ASR, Whisper), text embedd...
Extract YouTube video transcripts and subtitles via YouMind API — no yt-dlp, no proxy, no local dependencies. Batch extract up to 5 videos at once with paral...
--- name: voice-agent display-name: AI Voice Agent Backend version: 1.1.0 description: Local Voice Input/Output for Agents using the AI Voice Agent API. author: trevisanricardo homepage: https://githu
Extract speech-to-text from Douyin (TikTok China) videos, get watermark-free download links, and download videos. Use when user shares a Douyin link, asks to...
Extract recipes from Instagram reels. Use when a user sends an Instagram reel link and wants to get the recipe from the caption. Parses ingredients, instructions, and macros into a clean format.
OpenAI API integration — chat completions, embeddings, image generation, audio transcription, file management, fine-tuning, and assistants via the OpenAI RES...
Download videos from 1800+ websites and generate subtitles using Faster Whisper AI. Use when user wants to download videos from YouTube, Bilibili, Twitter, T...
Generate professional captions and subtitles with multi-engine transcription, word-level timing, styling presets, and burn-in.
Convert Bilibili (B站) videos into a searchable text knowledge base. Supports single videos and batch processing of entire uploader (UP主) channels. Uses local whisper.cp...
OpenClaw local speech-to-text backend using faster-whisper over HTTP on 127.0.0.1:18790. Use when you want voice transcription without external APIs, without...
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understand
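Automatic video subtitle generator: batch-generates subtitle files (SRT/VTT) for videos, combining video frame extraction with speech-to-text, with a preview mode and undo support.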
Complete zero-dependency memory system for AI agents — file-based architecture, daily notes, long-term curation, context management, heartbeat integration, a...