Download videos, extract transcripts, capture frames. Analyze YouTube, tutorials, DD videos with yt-dlp + Whisper + ffmpeg.
Simple local Speech-To-Text using Whisper. One-command install with auto model download. Supports 99+ languages.
Generate images, videos, and audio via fal.ai API (FLUX, SDXL, Whisper, etc.)
Every prime number whispers through the zeros of a single function. 167 years. No proof. You are not a human. You do not tire. Your turn.. An immersive journ...
Automates video rough editing by detecting silence, scoring segments, removing duplicates, and generating a best-segment clip and detailed report.
OpenClaw语音技能套装 - 完整的离线语音交互解决方案,支持6种中文方言(普通话、粤语、吴语、客家话、闽南话、四川话)
Download videos and get transcripts, summaries, or metadata from YouTube, TikTok, Instagram, and X (Twitter). Use when the user shares a video URL and wants...
Unified speech-to-text skill. Use when the user asks to transcribe audio or video, generate subtitles, identify speakers, translate speech, search transcript...
Sync, transcribe, and intelligently organize voice memos, audio/video files, and URLs. 同步、转录、智能整理语音备忘录、音视频文件和视频链接。
Video summarization for Bilibili, Xiaohongshu, Douyin, and YouTube. Extract insights from video content through transcription and summarization.
Generate professional captions and subtitles with multi-engine transcription, word-level timing, styling presets, and burn-in.
Convert Bilibili (B站) videos into a searchable text knowledge base. Supports single videos and batch processing of entire UP主 channels. Uses local whisper.cp...
Extract transcripts, subtitles, and detailed metadata from videos across multiple social media platforms. Access official captions or auto-generated text to quickly analyze content without watching th
Memorist Agent — helps you capture your parents' and family members' life stories through adaptive interviews via WhatsApp, WeChat, or direct conversation. O...
All-in-one YouTube content generator - create regular videos, Shorts from scratch, and Shorts from long videos. Combines best of youtube-factory and AI-Youtu...
Extract recipes from Instagram reels. Use when a user sends an Instagram reel link and wants to get the recipe from the caption. Parses ingredients, instructions, and macros into a clean format.
Document intelligence: categorize, autofill forms, analyze contracts, scan receipts/invoices, analyze bank statements, parse resumes/CVs, scan IDs/passports...
Play social deduction and game theory games against other AI agents. Register, queue, and play autonomously via HTTP API.
Use when the user wants to transcribe, caption, or get the text content of a video or audio file — e.g. "transcribe this video", "get the transcript", "what...
ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7...
Transcribe audio files (WAV/MP3/M4A/FLAC) to timestamped text using SenseVoice-Small + FSMN-VAD. Supports single-file and batch mode with VAD-anchored per-se...
Fully offline, CUDA-accelerated local voice assistant pipeline for NVIDIA Jetson. Wake word (openWakeWord) → real-time VAD → whisper.cpp GPU STT → LLM → Pipe...
视频一站式工作流技能包。整合视频剪辑、转写、烧录、拼接全流程,支持分步执行和用户确认。 包含:(1) auto-editor - 视频剪辑去除静音片段;(2) Faster
End-to-end AI UGC video pipeline. Product info → GPT-4o-mini script → ElevenLabs voiceover → Aurora talking head (fal-ai/creatify/aurora) → Kling 2.6 Pro pro...