# Tavily Crawl

Crawl any website and save pages as local markdown files. Ideal for downloading documentation, knowledge bases, or web content for offline access or analysis.

v1.0.0
## Description

```yaml
name: tavily-crawl
description: Crawl any website and save pages as local markdown files. Ideal for downloading documentation, knowledge bases, or web content for offline access or analysis.
homepage: https://tavily.com
metadata: {"openclaw":{"emoji":"🕷️","requires":{"bins":["node"],"env":["TAVILY_API_KEY"]},"primaryEnv":"TAVILY_API_KEY"}}
```
Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.
## Authentication

Get your API key at https://tavily.com and add it to your OpenClaw config:
```json
{
  "skills": {
    "entries": {
      "tavily-crawl": {
        "enabled": true,
        "apiKey": "tvly-YOUR_API_KEY_HERE"
      }
    }
  }
}
```
Or set it as an environment variable:

```bash
export TAVILY_API_KEY="tvly-YOUR_API_KEY_HERE"
```
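To confirm the key is picked up, a quick single-page crawl works as a smoke test. This is a minimal sketch using only documented flags; the URL is a placeholder, and it assumes the script reports authentication errors on failure:

```bash
# Smoke test: cap the crawl at one page to verify the API key works
node {baseDir}/scripts/crawl.mjs "https://example.com" --limit 1
```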
## Quick Start

### Using the Script

```bash
node {baseDir}/scripts/crawl.mjs "https://docs.example.com"
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --output ./docs
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 --limit 50
```
### Examples

```bash
# Basic crawl
node {baseDir}/scripts/crawl.mjs "https://docs.example.com"

# Deeper crawl with limits
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --limit 50

# Save to files
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --output ./docs

# Focused crawl with path filters
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 \
  --select "/docs/.*" --exclude "/blog/.*"

# With semantic instructions
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" \
  --instructions "Find API documentation" --chunks 3
```
## Options

| Option | Description | Default |
|---|---|---|
| `--depth <n>` | Crawl depth (1-5) | 1 |
| `--breadth <n>` | Links per page | 20 |
| `--limit <n>` | Total pages cap | 50 |
| `--output <dir>` | Save pages to directory | - |
| `--instructions <text>` | Natural language guidance | - |
| `--chunks <n>` | Chunks per page (1-5, requires `--instructions`) | - |
| `--depth-mode <mode>` | Extract depth: `basic` or `advanced` | `basic` |
| `--select <pattern>` | Regex pattern to include | - |
| `--exclude <pattern>` | Regex pattern to exclude | - |
| `--timeout <sec>` | Max wait time (10-150 seconds) | 150 |
| `--json` | Output raw JSON | false |
## Depth vs Performance
| Depth | Typical Pages | Time |
|---|---|---|
| 1 | 10-50 | Seconds |
| 2 | 50-500 | Minutes |
| 3 | 500-5000 | Many minutes |
Start with `--depth 1` and increase only if needed.
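In practice that escalation looks like the sketch below; the URL and limits are placeholders, and each run should only widen the crawl after the previous one proves too shallow:

```bash
# First pass: shallow and cheap
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 1 --limit 20

# Only if coverage is insufficient: go deeper, with a higher page cap
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --limit 100
```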
## Crawl for Context vs Data Collection

**For agentic use** (feeding results into context): always use `--instructions` + `--chunks`. This returns only relevant chunks instead of full pages, preventing context window explosion.

**For data collection** (saving to files): omit `--chunks` to get full page content.
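Side by side, the two modes differ only in which flags are set. A sketch with placeholder URLs and instruction text:

```bash
# Agentic: return only relevant chunks, safe to feed into an LLM context
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" \
  --instructions "Find authentication documentation" --chunks 3

# Data collection: save full page content to disk
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --output ./docs
```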
## Tips

- Always use `--chunks` for agentic workflows - prevents context explosion when feeding results to LLMs
- Omit `--chunks` only for data collection - when saving full pages to files
- Start conservative (`--depth 1`, `--limit 20`) and scale up
- Use path patterns to focus on relevant sections
- Always set a `--limit` to prevent runaway crawls (the sketch below combines these tips)
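A minimal sketch combining the tips above for an agentic workflow; the URL, path patterns, and instruction text are placeholders:

```bash
# Conservative, focused, capped crawl that returns chunks rather than full pages
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" \
  --depth 1 --limit 20 \
  --select "/docs/.*" --exclude "/blog/.*" \
  --instructions "Find API reference pages" --chunks 3
```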