Analyze images using NVIDIA Kimi K2.5 vision model via NVIDIA NIM API. Perfect for adding vision to non-vision models like MiniMax M2.5, GLM-5, or any model...
Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Fast...
--- name: MoltShell Vision Engine description: Give your text-based OpenClaw agent the ability to see and describe images --- # 👁️ MoltShell Vision Engine Standard OpenClaw agents are **blind**
SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.
Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.
--- name: vision-bot description: Analyze images via URL or base64. Auto-detects mode: OCR, object counting, or full description. acceptLicenseTerms: true metadata: clawdbot: emoji: "👁️"
Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max)....
Dockerized AI-powered web scraper using Playwright with virtual display and vision-based captcha solving, no third-party captcha services needed.
Tag and annotate images using Apple Vision framework (macOS only). Detects faces, bodies, hands, text (OCR), barcodes, objects, scene labels, and saliency re...
Provides local image analysis, OCR text extraction, object detection descriptions, image comparison, metadata reading, and format conversion.
Monitors adjacent systems, upstream dependencies, and downstream consumers for changes that could affect your current work — before they break it. Like biolo...
Give your text-based OpenClaw agent the ability to see and describe images
Analyze any YouTube livestream or RTSP camera feed using natural language — ask what's happening, detect specific events, or get periodic summaries. Powered...
Turn any live camera into a smart camera — describe what to watch for in plain English, get alerts in your chat when it happens. Ask questions about any live...
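Use the MiniMax vision model to recognize CAPTCHAs, slider positions, text content, and more in images. Suited to scenarios that need AI visual analysis, such as WeChat CAPTCHA recognition, web page screenshot analysis, and image text extraction.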
Automated high-quality video generation (text-to-video, image-to-video) via a local jimeng-api Docker service. Features native OpenClaw image interception, a...
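Image understanding and analysis. Use this skill when the user needs to analyze image content, identify objects in an image, describe the scene in an image, or understand what an image means. Supports image Q&A, object recognition, scene description, and more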
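Baidu Yijian (百度一见) professional-grade vision AI Agent: supports analysis of images, video, and real-time video streams. Compared with general-purpose foundation models, it maintains 95%+ professional accuracy while cutting inference cost by more than 50%, and is for handling visual inspection...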
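Calls the Ollama qwen3-vl:4b model locally to automatically compress and analyze images; supports description, OCR text extraction, and custom information extraction.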
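YOLO vision task helper skill: provides best practices for installing, using, and configuring YOLO models, helping users complete image processing tasks.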
Product leadership for scaling companies. Product vision, portfolio strategy, product-market fit, and product org design. Use when setting product vision, ma...
AI planning coach using the 4To1 Method™ — turn 4-year vision into daily action. Connects to Notion, Todoist, Google Calendar, or local Markdown. Use when user wants to plan goals, do weekly revie
AI-powered Apple TV remote that uses vision to autonomously navigate apps, play content, control playback, and manage settings.