End-to-end TikTok ad video pipeline. Product script → Veo base video → animated caption overlay → audio mix → final MP4. One command, full automation.
Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts.
Play audio on Sonos with intelligent state restoration - pauses streaming, skips Line-In/TV/Bluetooth, resumes everything.
Real-time voice assistant for OpenClaw. Streams mic audio through configurable STT (Deepgram or ElevenLabs) into your OpenClaw agent, then speaks the response via configurable TTS (Deepgram Aura or El
Talking head video production with AI avatars, lipsync, and voiceover. Covers portrait requirements, audio quality, OmniHuman, PixVerse lipsync, Dia TTS. Use...
Play TTS or audio on the Raspberry Pi (or gateway host) default speaker. Use when the user asks for an announcement, alarm, news summary, or "say X on the Pi...
Convert Chinese Mandarin text to high-quality audio using UniSound's WebSocket TTS API with adjustable voice, speed, volume, pitch, and format options.
Fetches the latest news using news-aggregator-skill, formats it into a podcast script in Markdown format, and uses the tts skill to generate a podcast audio...
ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7...
--- name: voice-ai-tts description: > High-quality voice synthesis with 9 personas, 11 languages, and streaming using Voice.ai API. version: 1.1.5 tags: [tts, voice, speech, voice-ai, audio, streami
Create AI-powered podcasts with text-to-speech, music, and audio editing. Tools: Kokoro TTS, DIA TTS, Chatterbox, AI music generation, media merger. Capabili...
--- name: asr-claw version: 1.1.1 description: Speech recognition CLI for AI agent automation. Transcribe audio from stdin, files, or URLs. metadata: openclaw: homepage: https://github.com/llm-n
Edit, transform, extend, upscale, and enhance videos using EachLabs AI models. Supports lip sync, video translation, subtitle generation, audio merging, style transfer, and video extension. Use when t
One-command YouTube video transcription. Automatically downloads audio and transcribes using OpenAI Whisper API — works even when YouTube subtitles are disab...
Use when the user wants to download a video or audio from a URL. Supports 20+ platforms including YouTube, X/Twitter, TikTok, Instagram, Facebook, Reddit, Bi...
Download Instagram Reels, transcribe audio, and extract captions. Share a reel URL and get back a full transcript with the original description.
Youtube Highest Quality Downloader - Download highest quality silent video and pure audio from YouTube, then merge into video with sound
Local text-to-speech using Qwen3-TTS with mlx_audio (macOS Apple Silicon) or qwen-tts (Linux/Windows). Privacy-first offline TTS with natural, realistic voic...
Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing
Clone any voice from a short audio sample and generate speech with it. Powered by LuxTTS (150x realtime, local, free, no API key). Use when asked to clone a...
Turn any URL into structured content — YouTube videos (via Gemini Video API), web articles, PDFs, and audio files. Extract transcripts, summaries, and metada...
--- name: song-recognition description: Recognize songs by singing or audio file using iFlytek's Query By ACRCloud technology. homepage: https://www.xfyun.cn/doc/voiceservice/music_recognition/API.
Local ASR and TTS inference server. Use when the user wants to transcribe audio to text (ASR) or convert text to speech (TTS). Requires a running Willow Infe...
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files,