OpenClaw agent skill for converting documents to Markdown. Documentation and utilities for Microsoft's MarkItDown library. Supports PDF, Word, PowerPoint, Excel, images (OCR), audio (transcription), H
Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with tran
The ultimate Seedance 2.0 storyboard director. Generate movie-grade 9:16 vlogs, cinematic prompts, and auto-audio scripts from multimodal inputs. Optimized f...
Read any web page aloud with natural AI voices. Extract article text from any URL and convert it to audio (MP3). Use when the user wants to: listen to a webp...
LobsterTv is an AI agent live streaming platform. Agents connect via REST API to broadcast in real-time with rendered avatars, synchronized TTS audio, expression control, chat interaction, and audienc
Generate synchronized subtitles (SRT/VTT/ASS) from video audio with precise timestamps. Use when users need subtitles, captions, or video transcription with...
Text-to-speech using Kokoro local TTS. Use when the user wants to convert text to audio, read aloud, or generate speech.
FL Studio Python scripting for MIDI controller development, piano roll manipulation, Edison audio editing, workflow automation, and FLP file parsing with PyFLP. Use for programmatic configuration, dev
MCP server that transcribes YouTube videos to text. Uses yt-dlp to download audio and OpenAI's Whisper-1 for more precise transcription than youtube captions. Provide a YouTube URL and get back the fu
Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses
Provides Speech-to-Text (STT) and text Translation using the Addis Assistant API (api.addisassistant.com). Use when the user needs to convert an audio file to text (specifically Amharic), or translate
Forensic media triage with chain of custody. Use when receiving images, videos, audio, PDFs, or documents that need evidence-grade handling, integrity verifi...
Video to text converter. Downloads videos from Bilibili using bilibili-api, from other sites using yt-dlp, then transcribes audio using faster-whisper. Use w...
Repurpose long-form video or audio into short-form clip plans with timestamps, hooks, captions, and packaging notes. Use when a user asks to turn a long video, podcast, or stream into Shorts, Reels, T
Local speech-to-text using Vosk. Lightweight, fast, fully offline. Perfect for transcribing Telegram voice messages, audio files, or any speech-to-text task without cloud APIs.
Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts.
ClawVox - ElevenLabs voice studio for OpenClaw. Generate speech, transcribe audio, clone voices, create sound effects, and more.
Search, explore, and run fal.ai generative AI models (image generation, video, audio, 3D). Use when user wants to generate images, videos, or other media with AI models.
Combined agent that synthesizes speech via Volcengine TTS, uploads the audio to TOS, and returns a presigned temporary URL. Use when users need a shareable a...
case.dev — a legal AI platform with encrypted document vaults, OCR, audio transcription, and legal search. This skill installs the casedev CLI and provides s...
Generate images, videos, and audio via fal.ai API (FLUX, SDXL, Whisper, etc.)
AI podcast editing as a service. Upload raw audio or submit a URL, get back edited episodes with filler words removed, noise reduction, transcripts, show notes, and social clips. Includes webhooks for
Analyze videos from TikTok, YouTube, Instagram, Twitter, and others by URL, transcribing audio locally and answering questions about the content.
Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creat...