
name: scraper
description: Structured extraction and cleanup for public, user-authorized web pages. Use when the user wants to collect, clean, summarize, or transform content from accessible pages into reusable text or data. Do not use to bypass logins, paywalls, CAPTCHAs, robots.txt restrictions, or other access controls. Output is stored locally only.
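The description rules out bypassing robots restrictions. Below is a minimal sketch of how a fetch step could honor robots.txt using Python's standard `urllib.robotparser`. It is not the skill's actual `fetch_page.py`; the names `robots_url`, `robots_allows`, `fetch_if_allowed` and the `USER_AGENT` string are hypothetical. `robots_allows` checks robots.txt text you already have, so it works offline, while `fetch_if_allowed` downloads robots.txt first and refuses disallowed URLs.

```python
import urllib.request
from urllib import robotparser
from urllib.parse import urlsplit, urlunsplit

USER_AGENT = "scraper-skill/0.1"  # hypothetical identifier, not documented by the skill


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def robots_allows(robots_txt: str, page_url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch page_url."""
    parser = robotparser.RobotFileParser()
    # parse() accepts lines of text, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


def fetch_if_allowed(page_url: str, timeout: float = 10.0) -> str:
    """Download page_url only if the site's robots.txt permits it."""
    parser = robotparser.RobotFileParser(robots_url(page_url))
    # read() fetches robots.txt; a 401/403 response disallows everything,
    # any other 4xx response allows everything.
    parser.read()
    if not parser.can_fetch(USER_AGENT, page_url):
        raise PermissionError(f"robots.txt disallows {page_url}")
    request = urllib.request.Request(page_url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, with the rules `User-agent: *` and `Disallow: /private/`, `robots_allows` returns `True` for `https://example.com/articles/1` and `False` for `https://example.com/private/page`.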

Scraper

Turn messy public pages into clean, reusable data.

Scraper is a safe extraction skill for public, user-authorized pages. It helps the agent:

All outputs are stored locally under:

Capture a page: `fetch_page.py --url "https://example.com"`
Extract readable text: `extract_text.py --url "https://example.com"`
Save cleaned content: `save_output.py --url "https://example.com" --title "Example"`
List prior jobs: `list_jobs.py`
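`extract_text.py` is described only as converting HTML into cleaned plain text. Below is a minimal sketch of that step with the standard-library `html.parser`, assuming "cleaned" means: drop `script`/`style`/`noscript` contents, decode entities, collapse whitespace, and start a new line at common block-level tags. `html_to_text`, `_TextCollector`, and both tag sets are hypothetical; the real script may use different rules or a third-party parser.

```python
import re
from html.parser import HTMLParser

# Tags whose contents are never shown to a reader.
SKIP_TAGS = {"script", "style", "noscript"}
# Block-level tags that start a new output line (illustrative, not exhaustive).
BLOCK_TAGS = {
    "p", "div", "br", "li", "ul", "ol", "tr", "table", "section", "article",
    "header", "footer", "blockquote", "pre", "h1", "h2", "h3", "h4", "h5", "h6",
}


class _TextCollector(HTMLParser):
    """Collects visible text, inserting line breaks at block-level tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and similar entities
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Newlines and indentation in HTML source are not meaningful text.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html: str) -> str:
    """Convert HTML into cleaned plain text with one block per line."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    lines = (" ".join(line.split()) for line in "".join(collector.parts).split("\n"))
    return "\n".join(line for line in lines if line)
```

For example, `html_to_text("<h1>Title</h1><p>Hello <b>world</b></p>")` yields `"Title\nHello world"`. Whitespace is collapsed inside each text run rather than per source line, so a line break in the HTML source does not split a sentence in the output.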

| Script | Purpose |
| --- | --- |
| `init_storage.py` | Initialize scraper storage |
| `fetch_page.py` | Download a page with standard headers |
| `extract_text.py` | Convert HTML into cleaned plain text |
| `save_output.py` | Save extracted output and register a job |
| `list_jobs.py` | Show past scraping jobs |