Video to text converter. Downloads videos from Bilibili using bilibili-api, from other sites using yt-dlp, then transcribes audio using faster-whisper. Use w...
Telegram voice-to-voice for macOS Apple Silicon: transcribe inbound .ogg voice notes with yap (Speech.framework) and reply with Telegram voice notes via say+ffmpeg. Not compatible with Linux/Windows.
Download, transcribe, and analyze videos from YouTube, X/Twitter, and TikTok with local Whisper processing. Perfect for extracting TL;DRs, timestamps, and ac...
Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.
Telegram bot that transcribes voice messages using Whisper and replies in Chinese with Microsoft Edge text-to-speech.
Automatically extracts audio from video, transcribes it using qwen3-asr-flash, and generates segmented text summaries saved alongside the original file.
Use when OpenClaw needs to call SpeakNotes API routes directly using an API key and generate transcripts/summaries from YouTube URLs, media files, or documen...
Analyzes audio to detect BPM, key, structure, genre, mood, transcribe lyrics, and generate visual and textual summaries of music tracks.
Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT...
Search video dialogue and create reaction GIFs with timed subtitles. Perfect for creating meme-worthy clips from movies and TV shows.
On-device speech-to-text (Whisper) + text-to-speech (Qwen3-TTS) CLI. Runs on the Apple Neural Engine (ANE), Apple's low power, dedicated ML inference chip. M...
Personal knowledge base for capturing and retrieving information about people, places, restaurants, games, tech, events, media, ideas, and organizations. Use...
Use AudioPod AI's API for audio processing tasks including AI music generation (text-to-music, text-to-rap, instrumentals, samples, vocals), stem separation, text-to-speech, noise reduction, speech-to
--- name: voice-stt-tts description: Full voice message setup (STT + TTS) for OpenClaw using faster-whisper and Edge TTS homepage: https://docs.openclaw.ai/nodes/audio metadata: { "openclaw":
AutoGLM ASR MCP 服务:长音频并发转录、上下文传递、时间戳分段。基于智谱 GLM-ASR-2512。触发词:语音识别、ASR、转录、转录音频、长音频
Generate synchronized subtitles (SRT/VTT/ASS) from video audio with precise timestamps. Use when users need subtitles, captions, or video transcription with...
Complete Venice AI platform — text generation, web search, embeddings, TTS, speech-to-text, image generation, video creation, upscaling, and AI editing. Private, uncensored AI inference for everythi
--- name: voice-agent display-name: AI Voice Agent Backend version: 1.1.0 description: Local Voice Input/Output for Agents using the AI Voice Agent API. author: trevisanricardo homepage: https://githu
Extract YouTube video transcripts and subtitles via YouMind API — no yt-dlp, no proxy, no local dependencies. Batch extract up to 5 videos at once with paral...
Full local AI inference stack on Apple Silicon Macs via MLX. Includes: LLM chat (Qwen3-14B, Gemma3-12B), speech-to-text ASR (Qwen3-ASR, Whisper), text embedd...
Download videos from 1800+ websites and generate subtitles using Faster Whisper AI. Use when user wants to download videos from YouTube, Bilibili, Twitter, T...
OpenAI API integration — chat completions, embeddings, image generation, audio transcription, file management, fine-tuning, and assistants via the OpenAI RES...
Extract recipes from Instagram reels. Use when a user sends an Instagram reel link and wants to get the recipe from the caption. Parses ingredients, instructions, and macros into a clean format.
Generate professional captions and subtitles with multi-engine transcription, word-level timing, styling presets, and burn-in.