Web crawler toolkit for site crawling, link extraction, content scraping, sitemap generation, rate limiting, and local data export.
Continuous financial news crawler for finviz.com with SQLite storage, article extraction, and query tool. Use when monitoring financial markets, building new...
Scrapling-only, deterministic web crawler with clean SRP architecture, presets, checkpointing, and JSONL/report outputs.
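Automatic news crawling and summarization tool. Fetches news content from specified websites or RSS feeds and generates summary reports. Use cases: 1. The user asks to "get today's news" or "crawl a website's content" 2. The user...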
Crawl X (Twitter) search results through a local CLI that wraps `abs` (agent-browser). Use when the user asks to scrape X posts by keyword, collect Top/Lates...
Extracts videos from Douyin and Twitter given a platform and URL, and outputs the downloaded video file or an error message.
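Multi-platform public information collection tool based on MediaCrawler, covering installation, command-line usage, WebUI, locating results, and common task templates.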
Automated compliance-detection toolset for crawler developers and operators that evaluates how crawler-friendly a target website is and what risks crawling it may carry.
Advanced search and retrieval for web crawler data. Supports WARC, wget, Katana, SiteOne, and InterroBot crawlers.
Expert guide for building web scrapers and crawlers using Crawlee (JavaScript/TypeScript and Python). Use this skill whenever the user wants to: scrape a web...
Web crawling and scraping tool with LLM-optimized output. Web crawler, web scraper, spider. DuckDuckGo search, site crawling, dynamic page scrapin...
Use this skill for XCrawl crawl tasks, including bulk site crawling, crawler rule design, async status polling, and delivery of crawl output for downstream s...
Generate, validate, and optimize llms.txt files for AI crawler accessibility. Creates structured markdown files that help AI platforms (ChatGPT, Perplexity,...
Web crawler using BFS and anti-scraping to extract and save structured BBC and general news content in Markdown with multi-site and dedup support.
Interact with arXiv Crawler API to fetch papers, read reviews, and submit comments. Use when working with arXiv papers, fetching paper lists by date/category/interest, viewing paper details with comme
Interact with arXiv Crawler API to fetch papers, read reviews, submit comments, search papers, and import papers. Use when working with arXiv papers, fetchin...
Audit URLs for AI crawler readiness — checks robots.txt, llms.txt, JSON-LD schema, and content density with 0-100 AEO scoring.
Runs the sports science crawler to generate a daily report, sync to Notion, and prevent duplicate content.
Content Hunter (内容捕手) - Bot that crawls trending content from short-video platforms. Supports Xiaohongshu, Douyin, and Bilibili; fetches trending content in batches and automatically generates reports.
Batch web scraping for competitor analysis, price monitoring, and market research.