Scrapling AI

v1.0.0

Description


name: scrapling
description: Use Scrapling to scrape websites with adaptive parsing, Cloudflare bypass, and MCP support. Handles dynamic content, anti-bot detection, and provides clean HTML/JSON output.
metadata:
  {
    "openclaw":
      {
        "emoji": "🕷️",
        "requires": { "bins": ["scrapling"] },
        "install":
          [
            {
              "id": "pipx",
              "kind": "pipx",
              "package": "scrapling",
              "bins": ["scrapling"],
              "label": "Install Scrapling CLI (pipx)",
            },
            {
              "id": "python3-pip",
              "kind": "pip",
              "package": "scrapling",
              "bins": ["scrapling"],
              "label": "Install Scrapling CLI (pip)",
            },
          ],
      },
  }

Scrapling Skill

Use the scrapling CLI to scrape websites with adaptive parsing and anti-bot bypass.

When to Use

USE this skill when you need to:

  • Scrape static or dynamic websites
  • Bypass Cloudflare, CAPTCHAs, or other bot detection
  • Extract structured data (HTML/JSON) from web pages
  • Handle JavaScript-rendered content
  • Get clean HTML without extra scripts/CSS

When NOT to Use

DON'T use this skill for:

  • Simple HTTP requests → use web_fetch
  • Full browser automation → use the browser tool
  • API-based data → call the API directly
  • Local file processing → use file tools

Setup

# Install the CLI (pip install scrapling also works)
pipx install scrapling

# Verify the install
scrapling --version
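The skill metadata lists two install routes, pipx and pip. The sketch below prefers pipx and falls back to pip. It assumes `python3` is on PATH when pipx is missing, and the function name `install_scrapling` is hypothetical.

```shell
# Hypothetical helper: install the scrapling CLI, preferring pipx and
# falling back to pip, mirroring the two options in the skill metadata.
install_scrapling() {
  if command -v pipx >/dev/null 2>&1; then
    # pipx installs the CLI into its own isolated virtualenv
    pipx install scrapling
  else
    # Fallback: install into the current python3 environment
    python3 -m pip install scrapling
  fi
}
```

Run `install_scrapling && scrapling --version` to install the CLI and confirm it in one step.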

Common Commands

Basic Scrape

# Get clean HTML
scrapling https://example.com -o html

# Get JSON structure
scrapling https://example.com -o json

# Save to file
scrapling https://example.com -o output.html

With Headers/Timeouts

# Custom headers
scrapling https://example.com --headers "User-Agent: Mozilla/5.0"

# Timeout (seconds)
scrapling https://slow-site.com --timeout 30
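For pages that are only intermittently slow, one option is to retry with a growing `--timeout`. This is a minimal sketch. It assumes the CLI exits with a non-zero status when a fetch fails or times out, which is not confirmed here. `fetch_with_retry` and its arguments are hypothetical.

```shell
# Hypothetical helper: fetch_with_retry URL [ATTEMPTS] [START_TIMEOUT]
# Retries the documented `scrapling URL --timeout N -o html` call,
# doubling the timeout after each failed attempt.
fetch_with_retry() {
  local url="$1" attempts="${2:-3}" secs="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    # Assumes a non-zero exit status signals a failed or timed-out fetch
    if scrapling "$url" --timeout "$secs" -o html; then
      return 0
    fi
    echo "attempt $i of $attempts failed (timeout ${secs}s)" >&2
    secs=$((secs * 2))
    i=$((i + 1))
  done
  return 1
}
```

For example, `fetch_with_retry https://slow-site.com 3 10 > page.html` tries timeouts of 10, 20, and then 40 seconds before giving up.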

Extract Specific Elements

# XPath extraction
scrapling https://example.com -e "//div[@class='content']" -o html

# CSS selector
scrapling https://example.com -e "div.content" -o html

JSON Output with Fields

# Extract the title and meta description
scrapling https://example.com \
  --fields 'title,meta_description' \
  -o json

MCP Integration

Scrapling can also run as an MCP (Model Context Protocol) server for AI agents:

# Start MCP server
scrapling mcp start

Then add this server to your agent's MCP configuration so the agent can call the scrape tool.

Examples

Scrape News Article

scrapling https://example.com/news/article-123 \
  --fields 'title,author,publish_date,content' \
  -o json

Extract Product Data

scrapling https://shop.example.com/products \
  -e "//div[@class='product']" \
  -o html

Handle Cloudflare

# Scrapling tries to bypass most anti-bot protections automatically
scrapling https://protected-site.com -o html
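Because a bypass can fail silently, it can help to confirm that the response is the real page and not a challenge screen. This is a heuristic sketch. The marker strings are titles Cloudflare has commonly used for its interstitial and block pages, which is an assumption, and Cloudflare may change them. `scrape_or_fail` and `looks_like_challenge` are hypothetical helpers.

```shell
# Heuristic: succeed (exit 0) if stdin looks like a Cloudflare challenge or
# block page. The marker strings are assumptions and may change over time.
looks_like_challenge() {
  grep -qiF -e 'Just a moment...' -e 'Attention Required! | Cloudflare'
}

# Hypothetical helper: print the page HTML, or fail with status 2 if the
# response still looks like a challenge page.
scrape_or_fail() {
  local html
  html=$(scrapling "$1" -o html) || return 1
  if printf '%s\n' "$html" | looks_like_challenge; then
    echo "still blocked by a challenge page: $1" >&2
    return 2
  fi
  printf '%s\n' "$html"
}
```

A status of 2 separates "fetched, but blocked" from an outright fetch failure (status 1).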

Notes

  • Default timeout: 10 seconds
  • Auto-detects the best output format (html/json/text)
  • Handles dynamic content with a headless browser when needed
  • Be rate-limit friendly: add delays between requests
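One way to follow the delay advice is from the calling shell. This is a minimal sketch that assumes `-o json` prints to stdout. `scrape_politely`, `DELAY`, and `OUT_DIR` are hypothetical names, not scrapling options.

```shell
# Hypothetical knobs (not scrapling flags): seconds between requests and
# the directory that receives one JSON file per URL.
DELAY="${DELAY:-2}"
OUT_DIR="${OUT_DIR:-pages}"

# Scrape each URL in turn, sleeping DELAY seconds between requests.
scrape_politely() {
  local n=0 url
  mkdir -p "$OUT_DIR"
  for url in "$@"; do
    if [ "$n" -gt 0 ]; then
      sleep "$DELAY"   # no pause before the first request
    fi
    n=$((n + 1))
    # Number the files instead of deriving names from (untrusted) URLs
    if ! scrapling "$url" -o json > "$OUT_DIR/page-$n.json"; then
      echo "failed: $url" >&2
    fi
  done
}
```

For example, `DELAY=5; scrape_politely https://example.com/a https://example.com/b` writes `pages/page-1.json` and `pages/page-2.json`, with a five-second pause between the two requests.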

JSON Output Format

{
  "title": "Page Title",
  "meta_description": "Description text",
  "content": "<clean HTML>",
  "links": ["http://...", "..."],
  "images": [{"src": "...", "alt": "..."}]
}
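To pull single fields out of JSON shaped like the example above, one option in a shell pipeline is Python's standard `json` module (jq works as well if it is installed). This is a minimal sketch: `json_field` is a hypothetical helper, and it assumes `python3` is available.

```shell
# Hypothetical helper: read a JSON object on stdin and print one top-level
# field. Strings print as-is, lists and objects print as JSON, and a
# missing key prints an empty line.
json_field() {
  python3 -c '
import json, sys
value = json.load(sys.stdin).get(sys.argv[1], "")
print(value if isinstance(value, str) else json.dumps(value))
' "$1"
}
```

For example, `scrapling https://example.com -o json | json_field title` prints just the page title.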

Pricing

Free
