name: page-agent
license: MIT
description: Enhanced browser DOM manipulation using PageAgent's page-controller. Injects into any web page to provide precise DOM extraction, interactive element detection (cursor:pointer heuristic), and robust interaction (full event chain simulation, React-compatible input). Use when you need to operate on web pages with precision: clicking, typing, scrolling, form filling, or reading page structure. Combines with the frontend-design skill for a full design→code→browser-operate workflow.

PageAgent Browser Enhancement Skill

Injects the PageController from alibaba/page-agent v1.5.6 into a web page so you can drive it through the browser tool's evaluate action. This gives more precise DOM extraction and more robust clicking and typing than the basic browser actions.

Key Advantages Over Basic Browser Tool

cursor:pointer heuristic — detects clickable elements even without semantic tags
Full event chain — mouseenter→mouseover→mousedown→focus→mouseup→click (not just .click())
React/Vue compatible input — uses native value setter to bypass framework interception
contenteditable support — proper beforeinput/input event dispatch
Indexed elements — [N]<tag> format for precise LLM-directed operations
Incremental change detection — *[N] marks new elements since last step
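The event-chain and native-setter techniques listed above can be illustrated with a short sketch. The helpers `simulateClick` and `setNativeValue` are hypothetical names written for illustration; they are not PageAgent's source, only a plausible shape of the two ideas. In a browser you would call them on real elements; the `Event` fallback lets the sketch run under Node for testing.

```javascript
// Illustrative sketch only: hypothetical helpers, not PageAgent's implementation.

// Dispatch the full pointer/focus sequence instead of a bare el.click().
// In a browser EventCtor is MouseEvent; the plain Event fallback is for Node.
function simulateClick(el, EventCtor = globalThis.MouseEvent ?? Event) {
  const fire = (type, bubbles = true) =>
    el.dispatchEvent(new EventCtor(type, { bubbles, cancelable: true }));
  fire("mouseenter", false); // mouseenter does not bubble in browsers
  fire("mouseover");
  fire("mousedown");
  if (typeof el.focus === "function") el.focus();
  fire("mouseup");
  fire("click");
}

// Write the value through the prototype's native setter, skipping any
// instance-level "value" override (React installs one to track changes),
// then fire a bubbling "input" event so the framework sees the change.
function setNativeValue(el, value) {
  let proto = Object.getPrototypeOf(el);
  let desc;
  while (proto && !(desc = Object.getOwnPropertyDescriptor(proto, "value"))) {
    proto = Object.getPrototypeOf(proto);
  }
  if (!desc || typeof desc.set !== "function") {
    throw new TypeError("element has no native value setter");
  }
  desc.set.call(el, value);
  el.dispatchEvent(new Event("input", { bubbles: true }));
}

// Browser usage (assumed selectors):
// simulateClick(document.querySelector("button"));
// setNativeValue(document.querySelector("input"), "hello world");
```

Going through the prototype setter matters because a plain `el.value = x` would hit the framework's instance-level tracker, which then concludes that nothing changed.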

Usage Flow

Step 1: Inject PageController into the page

Use the CDP injection script (handles the 72KB library injection):

node ~/.openclaw/workspace/skills/page-agent/scripts/inject-cdp.mjs <TARGET_ID>

TARGET_ID is the target ID returned by browser(action="open", ...). The script injects both page-controller-global.js and inject.js over a CDP WebSocket connection and prints ✅ injected on success.
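For readers curious what CDP-based injection involves, here is a hypothetical sketch. It is not the contents of inject-cdp.mjs. It assumes that Chrome exposes its DevTools HTTP endpoint on a known `port`, that TARGET_ID equals the CDP target `id` listed at `/json/list`, and that Node 22 or later provides a global `WebSocket`. The function names are illustrative.

```javascript
// Hypothetical CDP injector sketch; NOT the contents of inject-cdp.mjs.
// Assumes: DevTools HTTP endpoint on `port`, TARGET_ID equals the CDP
// target `id`, and Node >= 22 (global WebSocket and fetch).

// Pick the WebSocket URL of the target whose id matches targetId.
function findWsUrl(targets, targetId) {
  const target = targets.find((t) => t.id === targetId);
  if (!target || !target.webSocketDebuggerUrl) {
    throw new Error(`target ${targetId} not found`);
  }
  return target.webSocketDebuggerUrl;
}

// Build one Runtime.evaluate request (standard CDP method and params).
function evaluateMessage(id, expression) {
  return JSON.stringify({
    id,
    method: "Runtime.evaluate",
    params: { expression, awaitPromise: true, returnByValue: true },
  });
}

// Evaluate each script source in order, waiting for each reply before the next.
async function injectScripts(port, targetId, sources) {
  const res = await fetch(`http://127.0.0.1:${port}/json/list`);
  if (!res.ok) throw new Error(`DevTools endpoint returned ${res.status}`);
  const ws = new WebSocket(findWsUrl(await res.json(), targetId));
  await new Promise((resolve, reject) => {
    ws.onopen = resolve;
    ws.onerror = reject;
  });
  try {
    for (let i = 0; i < sources.length; i++) {
      const id = i + 1;
      const reply = await new Promise((resolve, reject) => {
        ws.onmessage = (ev) => {
          const msg = JSON.parse(ev.data);
          if (msg.id === id) resolve(msg); // ignore unrelated messages
        };
        ws.onclose = () => reject(new Error("CDP socket closed"));
        ws.send(evaluateMessage(id, sources[i]));
      });
      // Protocol errors arrive in reply.error; script exceptions in exceptionDetails.
      const failure = reply.error ?? reply.result?.exceptionDetails;
      if (failure) throw new Error(`script ${i} failed: ${JSON.stringify(failure)}`);
    }
  } finally {
    ws.close();
  }
}
```

The library source is passed before the wrapper so that inject.js can find PageController when it runs. A call might look like `injectScripts(9222, targetId, [libSource, wrapperSource])`, with the sources read via `readFileSync` from `node:fs`. Port 9222 is only a common example value.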

Step 2: Get page state (DOM extraction)

// Returns { url, title, header, content, footer }
// content is the LLM-readable simplified HTML with indexed interactive elements
const state = await window.__PA__.getState();
return JSON.stringify({ url: state.url, title: state.title, content: state.content, footer: state.footer });

The content field looks like:

[0]<a aria-label=Home />
[1]<div >PageAgent />
[2]<button role=button>Quick Start />
[3]<input placeholder=Search... type=text />
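To look up elements by index on the caller side, you could parse the content lines into a map. The sketch below uses a hypothetical helper, `parseIndexedContent`. It assumes one element per line (possibly indented), the `[N]<tag ...` shape shown in the sample above, and a leading `*` on new elements as described for `*[N]`.

```javascript
// Hypothetical helper: map each [N] index in state.content to its line.
// Assumes one element per line, optional indentation, and "*" for new elements.
function parseIndexedContent(content) {
  const elements = new Map();
  for (const line of content.split("\n")) {
    const m = line.match(/^\s*(\*)?\[(\d+)\]<([a-zA-Z][\w-]*)/);
    if (!m) continue; // plain text lines carry no index
    elements.set(Number(m[2]), {
      tag: m[3].toLowerCase(),
      isNew: m[1] === "*",
      line: line.trim(),
    });
  }
  return elements;
}
```

For the sample above, `parseIndexedContent(state.content).get(3).tag` would be `"input"`. That tells you index 3 is the target for `input(3, ...)`.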

Step 3: Perform actions by index

// Click element at index 2
await window.__PA__.click(2);

// Type text into input at index 3
await window.__PA__.input(3, "hello world");

// Select dropdown option
await window.__PA__.select(5, "Option A");

// Scroll down 1 page
await window.__PA__.scroll(true, 1);

// Scroll specific element
await window.__PA__.scrollElement(4, true, 1);
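When an agent builds these calls programmatically, typed text can contain quotes or newlines that break a hand-concatenated string. The sketch below uses a hypothetical helper, `paExpression`, which is not part of the skill. It builds the evaluate body with `JSON.stringify` for each argument. The method allow-list simply mirrors the `window.__PA__` methods named in this document. The `return await` form assumes evaluate runs its input as an async function body, as the Step 2 snippet implies.

```javascript
// Hypothetical helper (not part of the skill): build the evaluate body for
// one window.__PA__ call. JSON.stringify quotes and escapes every argument.
const PA_METHODS = new Set([
  "click", "input", "select", "scroll", "scrollElement", "getState", "cleanUp",
]);

function paExpression(method, ...args) {
  if (!PA_METHODS.has(method)) throw new Error(`unknown __PA__ method: ${method}`);
  const argList = args.map((a) => JSON.stringify(a)).join(", ");
  return `return await window.__PA__.${method}(${argList});`;
}

// paExpression("input", 3, 'say "hi"')
// builds: return await window.__PA__.input(3, "say \"hi\"");
```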

Step 4: Re-read state after actions

After each action, call getState() again to see the updated DOM. Look for *[N] markers, which indicate newly appeared elements.
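One way to combine an action with the re-read is sketched below with a hypothetical `actAndDiff` helper, meant to be defined and called inside a single evaluate body. It performs one action, calls `getState()`, and keeps only the lines starting with `*[`. The `pa` parameter defaults to `window.__PA__`; passing a stub makes it testable outside the page.

```javascript
// Hypothetical sketch: run one action, re-read state, return only new lines.
// `pa` defaults to window.__PA__ in the page; pass a stub elsewhere.
async function actAndDiff(action, pa = globalThis.window?.__PA__) {
  if (!pa) throw new Error("window.__PA__ missing; re-inject PageController");
  await action(pa);
  const state = await pa.getState();
  const added = state.content
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("*[")); // *[N] marks new elements
  return { url: state.url, added };
}

// In an evaluate body (assumed usage):
// return JSON.stringify(await actAndDiff((pa) => pa.click(2)));
```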

Practical Workflow: Design → Code → Operate

Design: Use frontend-design skill to create the page
Serve: Start a local dev server (npx serve or framework dev server)
Open: browser(action="open", targetUrl="http://localhost:3000")
Inject: Load PageController into the page (Step 1 above)
Inspect: Get DOM state to understand current page structure
Operate: Click, type, scroll to test and interact with the page
Iterate: Modify code based on what you observe, re-inject, repeat

Tips

Re-inject after every full page load (reload or navigation to a new document); SPA route changes keep the injected controller
The content output is token-efficient — use it instead of screenshots when possible
For long pages, use scroll + getState to see content below the fold
Clean up highlights with window.__PA__.cleanUp() before taking screenshots
Use profile="openclaw" for the isolated browser, or profile="chrome" for the Chrome extension relay
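The scroll + getState tip for long pages can be sketched as a loop. This version uses a hypothetical `readBelowTheFold` helper. It scrolls one page at a time and collects each content snapshot. It stops once a scroll leaves the content unchanged or after `maxPages` scrolls. Two points are assumptions, not documented behavior: that `scroll(true, 1)` scrolls down one page (as the Step 3 comment suggests), and that unchanged content means the bottom was reached.

```javascript
// Hypothetical sketch: collect content snapshots while scrolling down.
// Stops when content stops changing (assumed to mean "bottom reached")
// or after maxPages scrolls.
async function readBelowTheFold(pa = globalThis.window?.__PA__, maxPages = 10) {
  if (!pa) throw new Error("window.__PA__ missing; re-inject after a full reload");
  const snapshots = [(await pa.getState()).content];
  for (let i = 0; i < maxPages; i++) {
    await pa.scroll(true, 1); // assumed: down by one page
    const { content } = await pa.getState();
    if (content === snapshots[snapshots.length - 1]) break; // nothing changed
    snapshots.push(content);
  }
  return snapshots;
}
```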

Files

scripts/page-controller.js — PageController library (72KB, from @page-agent/page-controller@1.5.6)
scripts/inject.js — Helper wrapper that creates window.__PA__ API
