scrapling-skill
v1.0.0
Description
name: scrapling
description: Use this skill whenever the user asks to scrape a website, extract structured data from web pages, handle anti-bot/Cloudflare pages, crawl multiple pages, or explicitly mentions Scrapling. This skill provides a practical Scrapling workflow (install, fetcher selection, extraction, and crawl patterns) for reliable Python web scraping.
Scrapling Web Scraping Skill
Goal
Use Scrapling to extract web data with minimal selector breakage and better anti-bot resilience.
Prefer this skill when users ask for:
- website scraping
- data extraction from HTML pages
- Cloudflare/anti-bot resistant scraping
- multi-page crawling
- converting scraping tasks into reusable Python scripts
Safety and Legality
Before scraping, always:
- Confirm that scraping the target matches the user's intent and is permitted by local law.
- Avoid unauthorized access, login bypass, or private data scraping.
- Respect target website terms and reasonable request rates.
- For high-volume jobs, add delays and domain-level throttling.
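The delay and domain-level throttling advice above could be sketched as follows. This is a minimal standard-library illustration, not part of Scrapling: the DomainThrottle class, its min_delay parameter, and the injectable clock and sleep arguments are all hypothetical names introduced here.

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Hypothetical helper: keep requests to one domain at least min_delay apart."""

    def __init__(self, min_delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock          # injectable so the helper can be tested
        self._sleep = sleep
        self._last_request = {}      # domain -> time of the previous request

    def wait(self, url):
        """Sleep if the same domain was hit too recently; return seconds slept."""
        domain = urlparse(url).netloc
        last = self._last_request.get(domain)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_delay - (self._clock() - last))
        if delay > 0:
            self._sleep(delay)
        # Record the moment the caller is cleared to send the request.
        self._last_request[domain] = self._clock()
        return delay
```

Calling throttle.wait(url) immediately before each fetch keeps the per-domain rate bounded while leaving requests to other domains unaffected.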
Default Environment (this machine)
All dependencies should live under D:\clawtest.
Recommended setup commands:
python -m venv D:\clawtest\.venv
D:\clawtest\.venv\Scripts\python -m pip install -U pip
D:\clawtest\.venv\Scripts\python -m pip install "scrapling[fetchers]"
D:\clawtest\.venv\Scripts\scrapling install
Notes:
- If the task is simple static HTML extraction, pip install scrapling is enough.
- scrapling install is needed for browser-based fetchers.
Fetcher Selection Guide
Choose the lightest option that works:
- Fetcher: Best for static pages and speed.
- StealthyFetcher: Best default when anti-bot checks likely exist.
- DynamicFetcher: Use when data is rendered by JavaScript.
- Spider: Use for multi-page crawl, queueing, concurrency, and structured export.
Standard Workflow
- Identify target fields and output schema first.
- Pick a fetcher (Fetcher -> StealthyFetcher -> DynamicFetcher escalation).
- Extract with CSS/XPath and normalize into JSON-friendly fields.
- Save data to JSON/JSONL/CSV.
- Add retry, timeout, and polite delays for production.
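The normalize and save steps of this workflow might look like the sketch below. It uses only the standard library; the helper names normalize_item, save_jsonl, and save_csv are illustrative, and it assumes every extracted value is either a string or None.

```python
import csv
import json
from pathlib import Path


def normalize_item(raw):
    """Trim whitespace and turn missing (None) values into empty strings."""
    return {key: (value or "").strip() for key, value in raw.items()}


def save_jsonl(items, path):
    """Write one JSON object per line (JSONL)."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for item in items:
            fh.write(json.dumps(item, ensure_ascii=False) + "\n")


def save_csv(items, path, fieldnames):
    """Write items as CSV with a header row; keys not in fieldnames are dropped."""
    with Path(path).open("w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(items)
```

JSONL is convenient for large or incremental jobs because each record is appended independently, while CSV suits flat schemas that will be opened in a spreadsheet.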
Code Templates
1) Single Page Extraction (Stealthy default)
from scrapling.fetchers import StealthyFetcher
StealthyFetcher.adaptive = True
url = "https://example.com/products"
page = StealthyFetcher.fetch(url, headless=True, network_idle=True, timeout=45000)
items = []
for card in page.css(".product-card", auto_save=True):
    items.append({
        "title": card.css("h2::text").get(default="").strip(),
        "price": card.css(".price::text").get(default="").strip(),
        "url": card.css("a::attr(href)").get(default="")
    })
print(items)
2) Adaptive Re-location for changed layouts
# First run stores fingerprints:
products = page.css(".product-card", auto_save=True)
# Future run can recover after layout drift:
products = page.css(".product-card", adaptive=True)
3) Spider Crawl Skeleton
from scrapling.spiders import Spider, Response
class ProductSpider(Spider):
    name = "product_spider"
    start_urls = ["https://example.com/catalog"]
    async def parse(self, response: Response):
        for card in response.css(".product-card"):
            yield {
                "title": card.css("h2::text").get(default="").strip(),
                "price": card.css(".price::text").get(default="").strip(),
            }
        for href in response.css("a.next::attr(href)").all():
            yield response.follow(href, callback=self.parse)
if __name__ == "__main__":
    ProductSpider().start()
Expected Assistant Output Format
When executing a user task with this skill, respond with:
- chosen fetcher/spider strategy and why
- runnable script (or patch) tailored to target site
- exact install/run commands for current machine
- output path and data schema
- anti-bot reliability notes and fallback plan
Practical Fallback Order
If extraction fails:
- Validate selectors on fresh HTML.
- Switch Fetcher -> StealthyFetcher.
- Switch to DynamicFetcher for JS-rendered content.
- Add adaptive selectors (auto_save=True, then adaptive=True).
- Add retries, backoff, and lower request rate.
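The escalation and retry steps above can be combined in one helper. The following is a sketch under stated assumptions: fetch_with_fallback is a hypothetical function, not a Scrapling API, and each fetcher is passed in as a plain callable that takes a URL and either returns a page or raises.

```python
import time


def fetch_with_fallback(url, fetchers, retries=2, base_delay=1.0, sleep=time.sleep):
    """Try each (name, fetch) pair in order, retrying with exponential backoff.

    Returns (name, page) for the first fetcher that succeeds, and raises
    RuntimeError if every fetcher fails on every attempt.
    """
    last_error = None
    for name, fetch in fetchers:
        for attempt in range(retries + 1):
            try:
                return name, fetch(url)
            except Exception as exc:  # deliberately broad for a sketch
                last_error = exc
                if attempt < retries:
                    # Wait base_delay, then 2x, 4x, ... before retrying.
                    sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"all fetchers failed for {url}") from last_error
```

With Scrapling, the list would presumably pair each fetcher name with a small wrapper around the corresponding call, ordered lightest first (for example a wrapper around StealthyFetcher.fetch as used in template 1). A fetch that returns a block page without raising still counts as success here, so the wrapper should check the response and raise when the content is not usable.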