使用MiniMax视觉模型识别图片中的验证码、滑块位置、文字内容等。适用于需要AI视觉分析的场景,如微信验证码识别、网页截图分析、图片文字提取。
百度一见专业级视觉 AI Agent:支持图片/视频/及实时视频流分析。相比通用基模,在维持 95%+ 专业精度的同时,推理成本降低 50% 以上,是处理视觉巡检
本地调用 Ollama qwen3-vl:4b 模型自动压缩并分析图片,支持描述、OCR 文字提取和自定义信息抽取。
YOLO视觉任务辅助技能 - 提供YOLO模型安装、使用、配置的最佳实践,帮助用户完成图片处理任务。
Analyze images from multiple angles to extract detailed insights or quick summaries. Describe visuals rapidly or dive deeper with iterative reasoning when you need thorough understanding. Get strategi
Product leadership for scaling companies. Product vision, portfolio strategy, product-market fit, and product org design. Use when setting product vision, ma...
Full-resolution vision for LLMs. Tiles large images and captures web pages via Chrome CDP so vision models process every detail without downscaling. Generates interactive HTML tile previews. Supports
AI planning coach using the 4To1 Method™ — turn 4-year vision into daily action. Connects to Notion, Todoist, Google Calendar, or local Markdown. Use when user wants to plan goals, do weekly revie
AI-powered Apple TV remote that uses vision to autonomously navigate apps, play content, control playback, and manage settings.
Build beautiful HTML photo menus from restaurant URLs, PDFs, or photos using Gemini Vision and AI image generation
Chat with Grok models via xAI API. Supports Grok-4, Grok-4.20, Grok-3, Grok-3-mini, vision, and real-time X search.
--- name: xai description: Chat with Grok models via xAI API. Supports Grok-3, Grok-3-mini, vision, and more. homepage: https://docs.x.ai user-invocable: true disable-model-invocation: true triggers:
Triage GitHub PRs and issues using vision-based scoring. Use when a user wants to prioritize, score, review, de-duplicate, or batch-process open pull request...
Build and route Qwen chat, coding, reasoning, and vision workflows across hosted and self-hosted endpoints with safer debugging.
Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative
Control a real Android phone via USB or network using GPT-4o vision to run tasks like opening apps, typing, tapping, and automation scripts.
Turn smart glasses photos into social media posts. Monitors a Google Drive folder for new images from Meta Ray-Ban glasses (or any smart glasses), analyzes them with vision AI, drafts tweets/posts in
Generates images and text via reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input,...
Anthropic Claude API integration — chat completions, streaming, vision, tool use, and batch processing via the Anthropic Messages API. Generate text with Cla...
Track daily caloric intake by sending food photos. Luna analyzes images using vision AI, estimates calories and macros, and stores everything in memory for d...
Image and video analysis powered by Isaac vision models. Capabilities include visual Q&A, object detection, OCR, captioning, counting, and grounded spatial r...
Turn recipes into a Todoist Shopping list. Extract ingredients from recipe photos (Gemini Flash vision) or recipe web pages (search + fetch), then compare against the existing Shopping project with co
--- name: screen-monitor description: Dual-mode screen sharing and analysis. Model-agnostic (Gemini/Claude/Qwen3-VL). metadata: {"clawdbot":{"emoji":"🖥️","requires":{"model_features":["vision"]}}
Strategic product leadership toolkit for Head of Product covering OKR cascade generation, quarterly planning, competitive landscape analysis, product vision...