PageAgent Browser Enhancement
Description
name: page-agent
license: MIT
description: Enhanced browser DOM manipulation using PageAgent's page-controller. Injects into any web page to provide precise DOM extraction, interactive element detection (cursor:pointer heuristic), and robust interaction (full event chain simulation, React-compatible input). Use when you need to operate on web pages with precision: clicking, typing, scrolling, form filling, or reading page structure. Combines with the frontend-design skill for a full design→code→browser-operate workflow.
PageAgent Browser Enhancement Skill
Injects the alibaba/page-agent v1.5.6 PageController into web pages via the browser tool's evaluate action. This gives more precise DOM extraction and more reliable interaction than the basic browser actions.
Key Advantages Over Basic Browser Tool
- cursor:pointer heuristic: detects clickable elements even without semantic tags
- Full event chain: mouseenter→mouseover→mousedown→focus→mouseup→click (not just .click())
- React/Vue compatible input: uses native value setter to bypass framework interception
- contenteditable support: proper beforeinput/input event dispatch
- Indexed elements: [N]<tag> format for precise LLM-directed operations
- Incremental change detection: *[N] marks new elements since last step
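The full event chain from the list above can be illustrated with a short sketch. This is an approximation, not PageController's actual implementation: `CLICK_CHAIN` and `simulateClick` are hypothetical names, and the fallback to plain `Event` is an assumption added only so the snippet also runs outside a browser.

```javascript
// Hypothetical helper, NOT PageController's real code.
// Event order is taken from the "Full event chain" bullet above.
const CLICK_CHAIN = ['mouseenter', 'mouseover', 'mousedown', 'focus', 'mouseup', 'click'];

function simulateClick(target) {
  // MouseEvent exists in browsers; fall back to Event elsewhere (e.g. Node).
  const EventCtor = typeof MouseEvent === 'function' ? MouseEvent : Event;
  const fired = [];
  for (const type of CLICK_CHAIN) {
    if (type === 'focus' && typeof target.focus === 'function') {
      target.focus(); // let the browser move focus natively
    } else {
      // mouseenter and focus do not bubble in real browsers
      const bubbles = type !== 'mouseenter' && type !== 'focus';
      target.dispatchEvent(new EventCtor(type, { bubbles }));
    }
    fired.push(type);
  }
  return fired; // the event types, in the order they were triggered
}
```

A bare `.click()` fires only the click event, so handlers bound to mousedown or mouseup (common in custom dropdowns and drag widgets) would never run. Dispatching the whole chain is what lets those handlers see the interaction.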
Usage Flow
Step 1: Inject PageController into the page
Use the CDP injection script (handles the 72KB library injection):
node ~/.openclaw/workspace/skills/page-agent/scripts/inject-cdp.mjs <TARGET_ID>
TARGET_ID is the target ID returned by browser(action="open", ...). The script injects both page-controller-global.js and inject.js over a CDP WebSocket and prints ✅ injected on success.
Step 2: Get page state (DOM extraction)
// Returns { url, title, header, content, footer }
// content is the LLM-readable simplified HTML with indexed interactive elements
const state = await window.__PA__.getState();
return JSON.stringify({ url: state.url, title: state.title, content: state.content, footer: state.footer });
The content field looks like:
[0]<a aria-label=Home />
[1]<div >PageAgent />
[2]<button role=button>Quick Start />
[3]<input placeholder=Search... type=text />
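If you want to work with this output programmatically, the indexed lines can be split into records. The sketch below is a minimal parser; `parseIndexedElements` is a hypothetical helper, and its regular expression is an assumption inferred from the sample above rather than a documented grammar.

```javascript
// Hypothetical parser for the content format shown above.
// The pattern is inferred from the sample, not from a published spec.
function parseIndexedElements(content) {
  const pattern = /^\s*(\*?)\[(\d+)\]<([a-zA-Z][\w-]*)/;
  const elements = [];
  for (const line of content.split('\n')) {
    const match = pattern.exec(line);
    if (!match) continue; // plain text or a non-interactive line
    elements.push({
      index: Number(match[2]),   // the N in [N], used by click(N), input(N, ...)
      isNew: match[1] === '*',   // a leading * marks an element new since the last step
      tag: match[3].toLowerCase(),
      raw: line.trim(),
    });
  }
  return elements;
}

const sample = [
  '[0]<a aria-label=Home />',
  '[2]<button role=button>Quick Start />',
  '[3]<input placeholder=Search... type=text />',
].join('\n');

for (const el of parseIndexedElements(sample)) {
  console.log(el.index, el.tag);
}
```

Filtering the result by `tag` is a cheap way to find, say, every input on the page before deciding which index to type into.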
Step 3: Perform actions by index
// Click element at index 2
await window.__PA__.click(2);
// Type text into input at index 3
await window.__PA__.input(3, "hello world");
// Select dropdown option
await window.__PA__.select(5, "Option A");
// Scroll down 1 page
await window.__PA__.scroll(true, 1);
// Scroll specific element
await window.__PA__.scrollElement(4, true, 1);
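When the action text is generated rather than written by hand, quoting mistakes in string arguments are easy to make. The following sketch builds the script text for one call; `buildActionScript` and `PA_METHODS` are hypothetical names, and the method list simply mirrors the calls shown above.

```javascript
// Hypothetical helper: builds the script text for one window.__PA__ call.
// Method names and argument order follow the Step 3 examples.
const PA_METHODS = new Set(['click', 'input', 'select', 'scroll', 'scrollElement']);

function buildActionScript(method, ...args) {
  if (!PA_METHODS.has(method)) {
    throw new Error(`Unknown PageController action: ${method}`);
  }
  // JSON.stringify escapes quotes and newlines inside string arguments.
  const argList = args.map((arg) => JSON.stringify(arg)).join(', ');
  return `await window.__PA__.${method}(${argList});`;
}

console.log(buildActionScript('click', 2));
console.log(buildActionScript('input', 3, 'say "hi"'));
```

Restricting the method name to a fixed set keeps arbitrary text from ending up in the evaluated script.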
Step 4: Re-read state after actions
After each action, call getState() again to see the updated DOM. Look for *[N] markers which indicate newly appeared elements.
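The act-then-re-read cycle can be wrapped in one step. This is a sketch under assumptions: `actAndDiff` is a hypothetical helper, `pa` stands in for `window.__PA__`, and the `*[N]` pattern is inferred from the marker format described here.

```javascript
// Hypothetical helper: run one action, re-read state, and collect the
// indices that getState() flags with *[N]. `pa` stands in for window.__PA__.
async function actAndDiff(pa, action) {
  await action(pa);
  const state = await pa.getState();
  const newIndices = [];
  // Match a leading * followed by [N] at the start of any line.
  for (const match of state.content.matchAll(/^\s*\*\[(\d+)\]/gm)) {
    newIndices.push(Number(match[1]));
  }
  return { state, newIndices };
}

// In the page, usage might look like:
// const { newIndices } = await actAndDiff(window.__PA__, (pa) => pa.click(2));
```

An empty `newIndices` after a click is a useful signal that the click opened nothing new, for example because the wrong index was targeted.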
Practical Workflow: Design → Code → Operate
- Design: Use frontend-design skill to create the page
- Serve: Start a local dev server (npx serve or framework dev server)
- Open: browser(action="open", targetUrl="http://localhost:3000")
- Inject: Load PageController into the page (Step 1 above)
- Inspect: Get DOM state to understand current page structure
- Operate: Click, type, scroll to test and interact with the page
- Iterate: Modify code based on what you observe, re-inject, repeat
Tips
- Always re-inject after page navigation (SPA route changes are fine, full reloads need re-inject)
- The content output is token-efficient; use it instead of screenshots when possible
- For long pages, use scroll + getState to see content below the fold
- Clean up highlights with window.__PA__.cleanUp() before taking screenshots
- Use profile="openclaw" for the isolated browser, or profile="chrome" for the Chrome extension relay
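The scroll + getState tip for long pages can be sketched as a loop. `readLongPage` is a hypothetical helper and `pa` stands in for `window.__PA__`. It assumes that `content` changes as new parts of the page scroll into view, which this document implies but does not state outright.

```javascript
// Hypothetical helper: page down until the content stops changing or
// maxPages is reached. `pa` stands in for window.__PA__.
async function readLongPage(pa, maxPages = 5) {
  const chunks = [];
  let previous = null;
  for (let page = 0; page < maxPages; page += 1) {
    const { content } = await pa.getState();
    if (content === previous) break; // nothing new scrolled into view
    chunks.push(content);
    previous = content;
    await pa.scroll(true, 1); // scroll down one page, as in Step 3
  }
  return chunks; // one content snapshot per distinct scroll position
}
```

The `maxPages` cap keeps the loop bounded on infinite-scroll pages, where the content may never stop changing.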
Files
- scripts/page-controller.js: PageController library (72KB, from @page-agent/page-controller@1.5.6)
- scripts/inject.js: Helper wrapper that creates window.__PA__ API