Your eyes, hands, and ears on Android. See the screen (screenshot + indexed UI tree), interact (tap, swipe, scroll, type, clear-field), navigate via deep lin...
Generate AI videos for mature creative projects using Wan 2.2 Spicy (LoRA-tuned for NSFW, top recommended), Wan 2.6, Seedance 1.5, Vidu Q3-Pro, and other mod...
Local Qwen3-TTS speech synthesis on Apple Silicon via MLX. Use for offline narration, audiobooks, video voiceovers, and multilingual TTS.
SenseAudio Text-to-Speech (TTS) API for converting text to natural speech. Supports synchronous and SSE streaming modes, multiple voices, emotion control, sp...
Become an AI radio host. Register as a radio personality, create shows, book schedule slots, and publish episodes. Use when you want to host a radio show, record episodes, have multi-agent roundtable
Generate AI videos and images using Alibaba's Wan 2.6 and Wan 2.5 — featuring text-to-video, image-to-video, video-to-video, text-to-image, and image editing...
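Provides audiobook creation and audio capabilities (ABS read/write, sound-effect and audio search, derivative works, voice recommendation, chapter and character analysis, and more), invoked via HTTP Streamable MCP.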
Generate audiobooks from novels and long-form text with chapter management and character voices. Use when users mention audiobooks, narrating books, or conve...
The cheapest AI media API on the market. Generate images (Flux), music (AceStep), speech with voice cloning, transcribe video/audio, OCR, video generation, b...
Full TikTok/Reels video pipeline: script → TTS voiceover (ElevenLabs) → HeyGen talking avatar → auto-subtitles (Whisper) → ffmpeg compose → 1080x1920 final v...
Generate speech from text using Kyutai Pocket TTS - lightweight, CPU-friendly, streaming TTS with voice cloning. English only. ~6x real-time on M4 MacBook Air.
Monet AI - Comprehensive AI content generation API for AI agents. Video generation (Sora, Veo, Doubao Seedance, Wan, Hailuo, Kling), image generation (GPT-4o...
Download music from YouTube/YouTube Music and stream to Chromecast via Home Assistant. Complete CLI toolset with web server integration, configuration wizard, and playback controls.
On-device speech-to-text (Whisper) + text-to-speech (Qwen3-TTS) CLI. Runs on the Apple Neural Engine (ANE), Apple's low-power, dedicated ML inference chip. M...
Turn scripts into publishable voiceovers with Voice.ai TTS, including segments, chapters, captions, and video muxing.
Extract recipes from Instagram reels. Use when a user sends an Instagram reel link and wants to get the recipe from the caption. Parses ingredients, instructions, and macros into a clean format.
voice-stt-tts: Full voice message setup (STT + TTS) for OpenClaw using faster-whisper and Edge TTS. Homepage: https://docs.openclaw.ai/nodes/audio
Turn your AI assistant into a TTS and voice cloning powerhouse using the Verbatik API. Use when generating speech from text, cloning voices, managing cloned...
Produce complete code-based animated videos by scripting, generating narration, creating visual assets, and rendering final MP4s using the code2animation fra...
Publish books on Latent Press (latentpress.com) — the AI publishing platform where agents are authors and humans are readers. Use this skill when writing, pu...
Top-tier AI music generation with models: Suno sonic v4, Suno sonic v5, DouBao BGM (GenBGM), DouBao Song (GenSong). One-stop text-to-music with custom mode,...
Chat with any real person or fictional character in their own voice by automatically finding their speech online, extracting a clean reference sample, and ge...
Create AI avatar and talking head videos with OmniHuman, Fabric, PixVerse via inference.sh CLI. Models: OmniHuman 1.5, OmniHuman 1.0, Fabric 1.0, PixVerse Li...