Automate browser tasks using the BrowserMCP MCP server and Chrome extension. Use for navigating websites, filling forms, clicking elements, taking screenshot...
Control a real iPhone through macOS iPhone Mirroring — screenshot, tap, swipe, type, launch apps, record video, OCR, and run multi-step scenarios. Works with...
When the user sends a screenshot via Telegram, parse it using Gemini (fast, default) with automatic Claude fallback when confidence is low. Saves results to...
Control Safari on macOS with AppleScript, safaridriver, screenshots, tab navigation, and real-browser read, click, and type workflows.
Control and automate the Linux desktop GUI on X11. Use this skill to take screenshots, find and click UI elements, type text, send keyboard shortcuts, scroll...
Start a Selenium-controlled Chrome browser, open a URL, take a screenshot, and report progress. Supports headless mode and optional proxy.
Browse the web, read page content, click buttons, fill forms, take screenshots, and get accessibility snapshots using the webcli headless browser. Use when t...
Design UI screens in Paper — a professional design tool running locally on macOS. Create artboards, write HTML into designs, take screenshots, and iterate vi...
Remote Windows desktop control from WSL/Linux via screenshot + mouse/keyboard simulation. Use when: user asks to control their PC, click something, open an a...
macOS CLI tool to record microphone audio, capture screen video or screenshots, and capture camera video or photos from the terminal, with device listing and output control.
Professional Windows-only visual automation toolkit with 11 modules for screenshot, OCR, template matching, clicks, input, environment setup, and looping tasks.
Complete Feishu (Lark) integration toolkit for AI agents. Read/write documents, fetch chat history, send files & screenshots, manage permissions, and create...
Full Windows desktop control. Mouse, keyboard, screenshots - interact with any Windows application like a human.
Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: web scraping, form filling, clicking, typing,
Complete browser automation with Playwright. Auto-detects dev servers, writes clean test scripts to /tmp. Test pages, fill forms, take screenshots, check res...
Build a reusable UI inspiration library that both archives and retrieves design references. Use when the user wants to save screenshots, collect UI inspirati...
Give your agent eyes on the web — screenshot any URL as an image file. Supports device emulation (iPhone, iPad, Pixel, MacBook), dark mode, full-page scroll,...
Search TV show screenshots and generate memes from The Simpsons, Futurama, Rick and Morty, and 30 Rock.
Parse UI screenshots into structured element JSON (type, OCR text, bbox) and operate desktop UI from parsed elements. Use when a user asks to detect/locate U...
Full desktop computer use for headless Linux servers. Xvfb + XFCE virtual desktop with xdotool automation. 17 actions (click, type, scroll, screenshot, drag,...
Automate browser actions locally via browser-use CLI/Python: open pages, click/type, screenshot, extract HTML/links, debug sessions, and capture login QR codes.
Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with w...
Control real Android phones through the Mobilerun API. Supports tapping, swiping, typing, taking screenshots, reading the UI accessibility tree, and managing...
Control the Chrome browser with AI using MCP. Use when users want to automate browser tasks, take screenshots, fill forms, click elements, navigate page...