TTS (text-to-speech) via IMA Open API with seed-tts-2.0. Voice synthesis, speech from text, dubbing, audio content creation. Output: audio URL (mp3/wav). Flo...
Unified speech-to-text skill. Use when the user asks to transcribe audio or video, generate subtitles, identify speakers, translate speech, search transcript...
Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creat...
Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Az
Generate images, videos, icons, audio, and more using Freepik's AI API. Supports Mystic, Flux, Kling, Hailuo, Seedream, RunWay, Magnific upscaling, stock con...
Instant access to 100K+ nonfiction book summaries with 1-minute audio previews. Free demo key included — no signup needed. Search, browse, and listen via Fiz...
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understand
Convert text or subtitle files into speech audio with options for voice cloning, emotion control, speed adjustment, and timeline-accurate dubbing using Kokor...
Convert text or subtitle files into speech audio with options for voice cloning, emotion control, speed, and timeline-accurate dubbing using Kokoro or Noiz b...
Interact with the openLesson tutoring API to generate learning plans, start audio-based sessions, analyze reasoning gaps, and manage tutoring workflows.
macOS CLI tool to record microphone audio, screen video or screenshot, and camera video or photo from the terminal with device listing and output control.
Generate AI videos using ByteDance's Seedance 1.5 Pro — a native audio-visual joint generation model with cinematic camera control, multi-language lip-sync,...
Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...
Use AudioPod AI's API for audio processing tasks including AI music generation (text-to-music, text-to-rap, instrumentals, samples, vocals), stem separation, text-to-speech, noise reduction, speech-to
Create language learning audio with SenseAudio TTS, including pronunciation drills, bilingual lessons, slowed speech practice, and dialogue exercises. Use wh...
Translate and dub videos from one language to another, replacing the original audio with TTS while keeping the video intact.
Send voice/audio messages to Feishu (Lark) users. Converts audio files to OPUS format and sends as voice message, not file attachment. create by Alex
Generate synchronized subtitles (SRT/VTT/ASS) from video audio with precise timestamps. Use when users need subtitles, captions, or video transcription with...
Daily news briefing generator — produces a conversational radio-host-style audio briefing + DOCX document covering weather, X/Twitter trends, web trends, world news, politics, tech, local news, spor
End-to-end TikTok ad video pipeline. Product script → Veo base video → animated caption overlay → audio mix → final MP4. One command, full automation.
Generate talking head videos from a portrait image and audio using WaveSpeed AI's InfiniteTalk model. Produces lip-synced video up to 10 minutes long at 480p...
Clip and download specific time ranges or full YouTube videos in various qualities, including audio-only MP3 extraction, using precise timestamps.
Real-time voice assistant for OpenClaw. Streams mic audio through configurable STT (Deepgram or ElevenLabs) into your OpenClaw agent, then speaks the response via configurable TTS (Deepgram Aura or El
Talking head video production with AI avatars, lipsync, and voiceover. Covers portrait requirements, audio quality, OmniHuman, PixVerse lipsync, Dia TTS. Use...