Analyze images using the NVIDIA Kimi K2.5 vision model via the NVIDIA NIM API. Perfect for adding vision to non-vision models like MiniMax M2.5, GLM-5, or any model...
Act as a Vision Strategy Expert. You are an experienced consultant in developing vision and mission statements for specialized transportation companies. Your task is to craft a professional vision sta
Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Fast...
--- name: MoltShell Vision Engine description: Give your text-based OpenClaw agent the ability to see and describe images --- # 👁️ MoltShell Vision Engine Standard OpenClaw agents are **blind**
SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.
Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.
--- name: vision-bot description: Analyze images via URL or base64. Auto-detects mode: OCR, object counting, or full description. acceptLicenseTerms: true metadata: clawdbot: emoji: "👁️"
{ "role": "AI and Computer Vision Specialist Coach", "context": { "educational_background": "Graduating December 2026 with B.S. in Computer Engineering, minor in Robotics and Mandarin Chinese.
Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max)....
Dockerized AI-powered web scraper using Playwright with virtual display and vision-based captcha solving, no third-party captcha services needed.
Act as a Career Development Coach specializing in AI and Computer Vision for Defense Systems. You are tasked with creating a detailed roadmap for an aspiring expert aiming to specialize in futuristic
📇 🏠 🍎 🪟 🐧 - Multimodal AI vision MCP server for image, video, and object detection analysis. Enables UI/UX evaluation, visual regression testing, and interface understanding using Google Gemini and Ve
Tag and annotate images using Apple Vision framework (macOS only). Detects faces, bodies, hands, text (OCR), barcodes, objects, scene labels, and saliency re...
Write a compelling vision statement about where I see [project/work] going in the next 2-3 years and how sponsors can be part of that journey.
Provides local image analysis, OCR text extraction, object detection descriptions, image comparison, metadata reading, and format conversion.
Analyze images from multiple angles to extract detailed insights or quick summaries. Describe visuals rapidly or dive deeper with iterative reasoning when you need thorough understanding. Get strategi
Monitors adjacent systems, upstream dependencies, and downstream consumers for changes that could affect your current work — before they break it. Like biolo...
Give your text-based OpenClaw agent the ability to see and describe images
This is a request for a System Instruction (or "Meta-Prompt") that you can use to configure a Gemini Gem. This prompt is designed to force the model into a hyper-analytical mode where it prioritizes c
Analyze any YouTube livestream or RTSP camera feed using natural language — ask what's happening, detect specific events, or get periodic summaries. Powered...
Turn any live camera into a smart camera — describe what to watch for in plain English, get alerts in your chat when it happens. Ask questions about any live...
Automated high-quality video generation (text-to-video, image-to-video) via a local jimeng-api Docker service. Features native OpenClaw image interception, a...
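Use the MiniMax vision model to recognize CAPTCHAs, slider positions, text content, and more in images. Suited to scenarios that need AI visual analysis, such as WeChat CAPTCHA recognition, web page screenshot analysis, and image text extraction.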