🧪 Skills

XCrawl Scrape

Use this skill for XCrawl scrape tasks, including single-URL fetch, format selection, sync or async execution, and JSON extraction with prompt or json_schema.

v1.0.2
❤️ 0
⬇️ 51
👁 2
Share

Description


name: xcrawl-scrape description: Use this skill for XCrawl scrape tasks, including single-URL fetch, format selection, sync or async execution, and JSON extraction with prompt or json_schema. allowed-tools: Bash(curl:) Bash(node:) Read Write Edit Grep metadata: {"version":"1.0.2","openclaw":{"skillKey":"xcrawl-scrape","homepage":"https://www.xcrawl.com/","requires":{"localFiles":["~/.xcrawl/config.json"],"anyBins":["curl","node"]},"apiKeySource":"local_config"}}

XCrawl Scrape

Overview

This skill handles single-page extraction with XCrawl Scrape APIs. Default behavior is raw passthrough: return upstream API response bodies as-is.

Required Local Config

Before using this skill, the user must create a local config file and write XCRAWL_API_KEY into it.

Path: ~/.xcrawl/config.json

{
  "XCRAWL_API_KEY": "<your_api_key>"
}

Read API key from local config file only. Do not require global environment variables.

Credits and Account Setup

Using XCrawl APIs consumes credits. If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/. After registration, they can activate the free 1000 credits plan before running requests.

Tool Permission Policy

Request runtime permissions for curl and node only. Do not request Python, shell helper scripts, or other runtime permissions.

API Surface

  • Start scrape: POST /v1/scrape
  • Read async result: GET /v1/scrape/{scrape_id}
  • Base URL: https://run.xcrawl.com
  • Required header: Authorization: Bearer <XCRAWL_API_KEY>

Usage Examples

cURL (sync)

API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","mode":"sync","output":{"formats":["markdown","links"]}}'

cURL (async create + result)

API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com/product/1","mode":"async","output":{"formats":["json"]},"json":{"prompt":"Extract title and price."}}')"

echo "$CREATE_RESP"

SCRAPE_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.scrape_id||"")' "$CREATE_RESP")"

curl -sS -X GET "https://run.xcrawl.com/v1/scrape/${SCRAPE_ID}" \
  -H "Authorization: Bearer ${API_KEY}"

Node

node -e '
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",mode:"sync",output:{formats:["markdown","json"]},json:{prompt:"Extract title and publish date."}};
fetch("https://run.xcrawl.com/v1/scrape",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
}).then(async r=>{console.log(await r.text());});
'

Request Parameters

Request endpoint and headers

  • Endpoint: POST https://run.xcrawl.com/v1/scrape
  • Headers:
  • Content-Type: application/json
  • Authorization: Bearer <api_key>

Request body: top-level fields

Field Type Required Default Description
url string Yes - Target URL
mode string No sync sync or async
proxy object No - Proxy config
request object No - Request config
js_render object No - JS rendering config
output object No - Output config
webhook object No - Async webhook config (mode=async)

proxy

Field Type Required Default Description
location string No US ISO-3166-1 alpha-2 country code, e.g. US / JP / SG
sticky_session string No Auto-generated Sticky session ID; same ID attempts to reuse exit

request

Field Type Required Default Description
locale string No en-US,en;q=0.9 Affects Accept-Language
device string No desktop desktop / mobile; affects UA and viewport
cookies object map No - Cookie key/value pairs
headers object map No - Header key/value pairs
only_main_content boolean No true Return main content only
block_ads boolean No true Attempt to block ad resources
skip_tls_verification boolean No true Skip TLS verification

js_render

Field Type Required Default Description
enabled boolean No true Enable browser rendering
wait_until string No load load / domcontentloaded / networkidle
viewport.width integer No - Viewport width (desktop 1920, mobile 402)
viewport.height integer No - Viewport height (desktop 1080, mobile 874)

output

Field Type Required Default Description
formats string[] No ["markdown"] Output formats
screenshot string No viewport full_page / viewport (only if formats includes screenshot)
json.prompt string No - Extraction prompt
json.json_schema object No - JSON Schema

output.formats enum:

  • html
  • raw_html
  • markdown
  • links
  • summary
  • screenshot
  • json

webhook

Field Type Required Default Description
url string No - Callback URL
headers object map No - Custom callback headers
events string[] No ["started","completed","failed"] Events: started / completed / failed

Response Parameters

Sync create response (mode=sync)

Field Type Description
scrape_id string Task ID
endpoint string Always scrape
version string Version
status string completed / failed
url string Target URL
data object Result data
started_at string Start time (ISO 8601)
ended_at string End time (ISO 8601)
total_credits_used integer Total credits used

data fields (based on output.formats):

  • html, raw_html, markdown, links, summary, screenshot, json
  • metadata (page metadata)
  • traffic_bytes
  • credits_used
  • credits_detail

credits_detail fields:

Field Type Description
base_cost integer Base scrape cost
traffic_cost integer Traffic cost
json_extract_cost integer JSON extraction cost

Async create response (mode=async)

Field Type Description
scrape_id string Task ID
endpoint string Always scrape
version string Version
status string Always pending

Async result response (GET /v1/scrape/{scrape_id})

Field Type Description
scrape_id string Task ID
endpoint string Always scrape
version string Version
status string pending / crawling / completed / failed
url string Target URL
data object Same shape as sync data
started_at string Start time (ISO 8601)
ended_at string End time (ISO 8601)

Workflow

  1. Restate the user goal as an extraction contract.
  • URL scope, required fields, accepted nulls, and precision expectations.
  1. Build the scrape request body.
  • Keep only necessary options.
  • Prefer explicit output.formats.
  1. Execute scrape and capture task metadata.
  • Track scrape_id, status, and timestamps.
  • If async, poll until completed or failed.
  1. Return raw API responses directly.
  • Do not synthesize or compress fields by default.

Output Contract

Return:

  • Endpoint(s) used and mode (sync or async)
  • request_payload used for the request
  • Raw response body from each API call
  • Error details when request fails

Do not generate summaries unless the user explicitly requests a summary.

Guardrails

  • Do not invent unsupported output fields.
  • Do not hardcode provider-specific tool schemas in core logic.
  • Call out uncertainty when page structure is unstable.

Reviews (0)

Sign in to write a review.

No reviews yet. Be the first to review!

Comments (0)

Sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Compatible Platforms

Pricing

Free

Related Configs