Unified multi-modal content parser for images, PDF, DOCX, and audio. Runs OCR or transcription automatically and outputs structured text for LLM processing.
Generate SRT subtitles from video/audio with translation support. Transcribes Hebrew (ivrit.ai) and English (whisper), translates between languages, burns subtitles into video. Use for creating captio
Control Nest smart home devices (thermostat, cameras, doorbell) via the Device Access API. Use when asked to check or adjust home temperature, view camera feeds, check who's at the door, monitor rooms
Automated short drama video publisher. Downloads drama content from MoboBoost, uses AI to identify highlight moments, clips 15-second vertical videos with te...
Find and download virtually any digital resource from the internet — ebooks, academic papers, movies, TV shows, music, software, images, fonts, courses, and...
Download YouTube videos and upload them to Pocket Casts Files for offline viewing. For personal use with content you own or have rights to.
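MOSI Studio two-speaker dialogue synthesis (moss-ttsd): synthesizes dialogue text for two characters into a single continuous audio clip, with the two voices alternating naturally. Current version limitations: supports only two-speaker dialogue, and supports only Chinese and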
Play audio on Sonos with intelligent state restoration - pauses streaming, skips Line-In/TV/Bluetooth, resumes everything.
Local TTS router for Apple Silicon — pull models, serve OpenAI-compatible API, synthesize speech, clone voices. Use when the user asks to "generate speech",...
Save restaurants, bars, and cafes from TikTok and Instagram videos. Search your saved places and get weekend suggestions.
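Fully automated instructional video production skill. Automatically generates an instructional video from a course topic, including script writing, TTS voice-over, visual design, Remotion code development, and video export. Trigger scenario: the user asks to create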
Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative
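Convert text to speech and send it to a specified user as a Feishu audio message. Use for scenarios such as "send the user a voice message", "convert this passage to speech and send it via Feishu", and "announce the result by voice", especially when an ordinary file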
Image and video analysis powered by Isaac vision models. Capabilities include visual Q&A, object detection, OCR, captioning, counting, and grounded spatial r...
Test-driven behavioral verification for AI agents. Catches silent degradation when an agent loads memory but doesn't apply learned behaviors. Use when building an agent with persistent memory, testing after
AI-native workflow analyzer for Loom recordings. Breaks down recorded business processes into structured, automatable workflows. Use when: - Analyzing Loom videos to understand workflows - Extracting
Create professional terminal recordings with VHS tape files - guides through syntax, timing, settings, and best practices
Fetches the latest news using news-aggregator-skill, formats it into a podcast script in Markdown format, and uses the tts skill to generate a podcast audio...
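Download and analyze Xiaohongshu video content. When the user provides a Xiaohongshu link (xiaohongshu.com), automatically downloads the video, extracts the spoken text, and summarizes the content. Use when user provides a xiaohongshu.com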
Automatically extracts audio from video, transcribes it using qwen3-asr-flash, and generates segmented text summaries saved alongside the original file.
Orchestrate script-to-final-video production with a strict stage-gated workflow (outline → episode_plan → storyboard → storyboard_images → render), using See...
Send images as native Feishu stickers with auto-upload, caching by hash, GIF-to-WebP conversion, compression, and keyword-based sticker search.
News video maker skill. Use search tools to get news, generate speech, and create video with golden subtitles. For creating news briefing videos.
Extract audio from video URLs and transcribe using STT (Speech-to-Text). Supports local Whisper or cloud APIs. Use when: user provides a video URL and wants...