Robots.txt Generator
v1.0.0
Description

```yaml
name: robots-txt-gen
description: Generate, validate, and analyze robots.txt files for websites. Use when creating robots.txt from scratch, validating existing robots.txt syntax, checking if a URL is allowed/blocked by robots.txt rules, or generating robots.txt for common platforms (WordPress, Next.js, Django, Rails). Also use when auditing crawl directives or debugging search engine indexing issues.
```
robots-txt-gen
Generate, validate, and test robots.txt files from the command line.
Quick Start
```sh
# Generate a robots.txt for a platform
python3 scripts/robots_txt_gen.py generate --preset nextjs --sitemap https://example.com/sitemap.xml

# Validate an existing robots.txt
python3 scripts/robots_txt_gen.py validate --file robots.txt

# Validate a remote robots.txt
python3 scripts/robots_txt_gen.py validate --url https://example.com/robots.txt

# Test if a URL is allowed for a user-agent
python3 scripts/robots_txt_gen.py test --file robots.txt --url /admin/dashboard --agent Googlebot

# Generate with custom rules
python3 scripts/robots_txt_gen.py generate --allow "/" --disallow "/admin" --disallow "/api" --disallow "/private" --sitemap https://example.com/sitemap.xml --agent "*"
```
Commands
generate
Create a robots.txt file with custom rules or platform presets.
Options:
- `--preset <name>`: Use a platform preset: `wordpress`, `nextjs`, `django`, `rails`, `laravel`, `static`, `spa`, `ecommerce`.
- `--agent <name>`: User-agent (default: `*`). Repeat for multiple agents.
- `--allow <path>`: Allow a path. Repeatable.
- `--disallow <path>`: Disallow a path. Repeatable.
- `--sitemap <url>`: Sitemap URL. Repeatable.
- `--crawl-delay <seconds>`: Add a Crawl-delay directive.
- `--block-ai`: Add rules to block common AI crawlers (GPTBot, ChatGPT-User, CCBot, Google-Extended, anthropic-ai, etc.).
- `--output <file>`: Write to a file instead of stdout.
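To make the mapping from options to output concrete, here is a hypothetical sketch of how a generator like this might assemble the directives; the `RobotsBuilder` class and its field names are illustrative, not the script's actual internals:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch: collect the generate options, then render a
# robots.txt document with one User-agent group plus a Sitemap block.
@dataclass
class RobotsBuilder:
    agent: str = "*"
    allow: list = field(default_factory=list)
    disallow: list = field(default_factory=list)
    sitemaps: list = field(default_factory=list)
    crawl_delay: Optional[float] = None

    def render(self) -> str:
        lines = [f"User-agent: {self.agent}"]
        lines += [f"Allow: {p}" for p in self.allow]
        lines += [f"Disallow: {p}" for p in self.disallow]
        if self.crawl_delay is not None:
            lines.append(f"Crawl-delay: {self.crawl_delay}")
        if self.sitemaps:
            lines.append("")  # blank line before the sitemap block
            lines += [f"Sitemap: {u}" for u in self.sitemaps]
        return "\n".join(lines) + "\n"

print(RobotsBuilder(disallow=["/admin", "/api"],
                    sitemaps=["https://example.com/sitemap.xml"]).render())
```

Repeatable flags (`--allow`, `--disallow`, `--sitemap`) map naturally onto the list fields here, which is why they can be given more than once.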
validate
Check a robots.txt file for syntax errors and best-practice warnings.
Options:
- `--file <path>`: Local file to validate.
- `--url <url>`: Remote robots.txt URL to fetch and validate.
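The validator's exact checks aren't documented here; as a rough sketch, a minimal syntax checker might look like the following (the field set and messages are illustrative, not the tool's actual output):

```python
# Fields commonly accepted in robots.txt files.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay", "host"}

def validate_robots(text: str) -> list:
    """Return a list of human-readable problems found in a robots.txt body."""
    problems = []
    seen_agent = False
    for n, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {n}: missing ':' separator")
            continue
        name, _value = (s.strip() for s in line.split(":", 1))
        if name.lower() not in KNOWN_FIELDS:
            problems.append(f"line {n}: unknown field '{name}'")
        elif name.lower() == "user-agent":
            seen_agent = True
        elif name.lower() in {"allow", "disallow"} and not seen_agent:
            problems.append(f"line {n}: rule before any User-agent group")
    return problems

print(validate_robots("User-agent: *\nDisallow: /admin\n"))  # []
```

A rule appearing before any `User-agent` line is a common mistake worth flagging, since crawlers attach rules to the most recent group.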
test
Test whether a specific URL path is allowed or disallowed for a given user-agent.
Options:
- `--file <path>`: robots.txt file to test against.
- `--url <path>`: URL path to test (e.g., `/admin/login`).
- `--agent <name>`: User-agent to test as (default: `Googlebot`).
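The same allowed/blocked question can be answered with Python's standard-library `urllib.robotparser`, which makes a reasonable cross-check for the `test` command:

```python
import urllib.robotparser

# A small robots.txt with a wildcard group and a Googlebot-specific group.
robots = """\
User-agent: *
Disallow: /admin

User-agent: Googlebot
Disallow: /private
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots.splitlines())

# A crawler uses the most specific matching group, so Googlebot ignores
# the wildcard /admin rule and only honors its own group.
print(rp.can_fetch("Googlebot", "https://example.com/admin/login"))   # True
print(rp.can_fetch("Googlebot", "https://example.com/private/data"))  # False
print(rp.can_fetch("SomeBot", "https://example.com/admin/login"))     # False
```

This group-selection behavior is a frequent source of confusion when debugging indexing issues: adding a bot-specific group silently disables the `*` rules for that bot.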
Platform Presets
| Preset | What it blocks | Notes |
|---|---|---|
| `wordpress` | `/wp-admin/`, `/wp-includes/`, query params | Allows `/wp-admin/admin-ajax.php` |
| `nextjs` | `/_next/static/`, `/api/`, `/.next/` | Standard Next.js paths |
| `django` | `/admin/`, `/static/admin/`, `/media/private/` | Django admin and private media |
| `rails` | `/admin/`, `/assets/`, `/tmp/` | Rails conventions |
| `laravel` | `/admin/`, `/storage/`, `/vendor/` | Laravel conventions |
| `static` | Nothing blocked | Simple allow-all with sitemap |
| `spa` | `/api/`, `/assets/` | Single-page app pattern |
| `ecommerce` | `/cart/`, `/checkout/`, `/account/`, `/search?` | Prevents crawling user sessions |
AI Crawler Blocking
The `--block-ai` flag adds disallow rules for known AI training crawlers:
- GPTBot, ChatGPT-User (OpenAI)
- Google-Extended (Google AI)
- CCBot (Common Crawl)
- anthropic-ai (Anthropic)
- Bytespider (ByteDance)
- ClaudeBot (Anthropic)
- FacebookBot (Meta)
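A sketch of what `--block-ai` might emit, assuming one group per crawler that disallows everything; the crawler names come from the list above, but the exact output format is an assumption:

```python
# Crawler names from the --block-ai list above.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "Google-Extended", "CCBot",
    "anthropic-ai", "ClaudeBot", "Bytespider", "FacebookBot",
]

def ai_block_section() -> str:
    """One User-agent group per AI crawler, each disallowing the whole site."""
    return "\n\n".join(
        f"User-agent: {bot}\nDisallow: /" for bot in AI_CRAWLERS
    ) + "\n"

print(ai_block_section())
```

Note that this is advisory only: well-behaved crawlers honor `Disallow: /`, but robots.txt is not an access control mechanism.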