AI-native workflow analyzer for Loom recordings. Breaks down recorded business processes into structured, automatable workflows. Use when: - Analyzing Loom videos to understand workflows - Extracting
WhatsApp message relay and firewall for OpenClaw agents. Intercepts messages from third parties (non-owner contacts), notifies the owner, and sends replies o...
Consume the shared Whisper speech-to-text API over Tailnet at http://100.92.116.99:8765 using OpenAI-compatible audio transcription endpoint (/v1/audio/trans...
Send and receive voice messages on Feishu (Lark) using ElevenLabs TTS and STT. Activate when user asks to send a voice message on Feishu, or when receiving a...
Document intelligence: categorize, autofill forms, analyze contracts, scan receipts/invoices, analyze bank statements, parse resumes/CVs, scan IDs/passports...
AI task hub for image analysis, background removal, speech-to-text, text-to-speech, markdown conversion, and async execute/poll/presentation orchestration. U...
Your eyes, hands, and ears on Android. See the screen (screenshot + indexed UI tree), interact (tap, swipe, scroll, type, clear-field), navigate via deep lin...
Isolated agent runtime for code execution, live preview URLs, browser automation, 50+ tools (ffmpeg, sqlite, pandoc, imagemagick), LLM inference, and persistent memory — all via CLI or HTTP, no SDK
Manage your personal knowledge, store insights, track tasks, and stay accountable by syncing and updating your DeepThink user data and todos.
Fully offline, CUDA-accelerated local voice assistant pipeline for NVIDIA Jetson. Wake word (openWakeWord) → real-time VAD → whisper.cpp GPU STT → LLM → Pipe...
Analyzes Bilibili academic/educational videos to extract knowledge points and generate clean-style study notes with screenshots. Use this skill when users pr...
PullThatUpJamie — Podcast Intelligence. A semantically indexed podcast corpus (109+ feeds, ~7K episodes, ~1.9M paragraphs) that works as a vector DB for podc...
ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7...
Send native iMessage voice bubbles with ElevenLabs TTS via BlueBubbles. Use when: user asks to send a voice message, wants something spoken aloud, storytelli...
Turn any URL into structured content — YouTube videos (via Gemini Video API), web articles, PDFs, and audio files. Extract transcripts, summaries, and metada...
AI sales assistant that classifies leads, interprets feedback, generates quotes, and manages your manufacturing and technical sales pipeline via email integr...
Turn a Bilibili video URL or BV number into a summarized XMind mind map. Use when the user wants to collect subtitles, comments, AI summary, and transcript f...
--- name: mlx-whisper description: Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key). homepage: https://github.com/ml-explore/mlx-examples/tree/main/whisper --- # MLX Whispe
Memorist Agent — helps you capture your parents' and family members' life stories through adaptive interviews via WhatsApp, WeChat, or direct conversation. O...
XiaoZhi AI Device (ESP32) integration for OpenClaw. Enables real-time voice interaction with your AI assistant through XiaoZhi hardware. Supports WebSocket b...
Extracts YouTube video transcripts and provides concise summaries highlighting main points, arguments, and conclusions without watching the full video.
Use ACE-Step API to generate music, edit songs, and remix music. Supports text-to-music, lyrics generation, audio continuation, and audio repainting. Use thi...
Run QCut's native TypeScript pipeline CLI for AI content generation, video analysis, transcription, YAML pipelines, ViMax agentic video production, and proje...
Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice qualit