Korean Scraper

---
name: korean-scraper
description: Korean website specialized scraper with anti-bot protection (Naver, Coupang, Daum, Instagram)
version: 1.0.0
author: 무펭이 🐧
---

korean-scraper

A scraper specialized for Korean websites. Built on Playwright, it extracts structured data from major Korean sites such as Naver, Coupang, and Daum. Includes anti-bot protection bypass features.

When to Use

  • Collecting Naver Blog search results or extracting the body of a specific blog post
  • Scraping popular or recent posts from a Naver Cafe
  • Collecting Coupang product information (price, reviews, star rating)
  • Extracting article bodies from Naver News or Daum News
  • Automated data collection from Korean sites

Installation

cd skills/korean-scraper
npm install
npx playwright install chromium

Quick Start

Naver Blog

# Collect search results (query: "restaurant recommendations")
node scripts/naver-blog.js search "맛집 추천" --limit 10

# Extract the body of a specific blog post
node scripts/naver-blog.js extract "https://blog.naver.com/..."

Naver Cafe

# Collect popular posts
node scripts/naver-cafe.js popular "CAFE_URL" --limit 20

# Collect recent posts
node scripts/naver-cafe.js recent "CAFE_URL" --limit 20

Coupang Products

# Extract product information
node scripts/coupang.js product "PRODUCT_URL"

# Collect search results (query: "wireless earphones")
node scripts/coupang.js search "무선 이어폰" --limit 20

Naver News

# Collect search results
node scripts/naver-news.js search "AI" --limit 10

# Extract an article body
node scripts/naver-news.js extract "https://n.news.naver.com/..."

Daum News

# Collect search results (query: "economy")
node scripts/daum-news.js search "경제" --limit 10

# Extract an article body
node scripts/daum-news.js extract "https://v.daum.net/..."

Output Format

All scripts return structured JSON (text fields keep the original Korean):

Naver Blog search

{
  "status": "success",
  "query": "맛집 추천",
  "count": 10,
  "results": [
    {
      "title": "서울 강남 맛집 추천 BEST 5",
      "url": "https://blog.naver.com/...",
      "blogger": "맛집탐험가",
      "date": "2026-02-15",
      "snippet": "강남역 근처 숨은 맛집들을..."
    }
  ]
}

Naver Blog post body

{
  "status": "success",
  "url": "https://blog.naver.com/...",
  "title": "서울 강남 맛집 추천 BEST 5",
  "author": "맛집탐험가",
  "date": "2026-02-15",
  "content": "# 서울 강남 맛집 추천 BEST 5\n\n1. ...",
  "images": ["https://..."],
  "tags": ["맛집", "강남", "서울"]
}

Coupang product

{
  "status": "success",
  "url": "https://www.coupang.com/...",
  "productName": "애플 에어팟 프로 2세대",
  "price": 299000,
  "originalPrice": 359000,
  "discount": "17%",
  "rating": 4.8,
  "reviewCount": 1523,
  "rocketDelivery": true,
  "seller": "쿠팡",
  "images": ["https://..."]
}
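
The discount field in the sample can be cross-checked against price and originalPrice. The sketch below assumes the discount is the percentage off the original price, rounded to the nearest integer; the source does not state the rounding rule, and discountPercent is a hypothetical helper, not part of the skill.

```javascript
// Hypothetical helper: not part of korean-scraper.
// Assumes "discount" is the percentage off originalPrice, rounded to the nearest integer.
function discountPercent(price, originalPrice) {
  if (!(originalPrice > 0) || price < 0 || price > originalPrice) {
    throw new RangeError('expected 0 <= price <= originalPrice and originalPrice > 0');
  }
  return Math.round((1 - price / originalPrice) * 100);
}

// Values taken from the sample output above.
const product = { price: 299000, originalPrice: 359000, discount: '17%' };
const computed = `${discountPercent(product.price, product.originalPrice)}%`;
console.log(computed === product.discount); // true
```

A check like this can flag records where a price was scraped from the wrong element.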

Naver Cafe

{
  "status": "success",
  "cafeUrl": "https://cafe.naver.com/...",
  "type": "popular",
  "count": 20,
  "posts": [
    {
      "title": "신입 회원 인사드립니다",
      "url": "https://cafe.naver.com/.../12345",
      "author": "닉네임",
      "date": "2026-02-17",
      "views": 523,
      "comments": 12
    }
  ]
}

News article

{
  "status": "success",
  "url": "https://n.news.naver.com/...",
  "title": "AI 시장 규모 급성장 전망",
  "media": "조선일보",
  "author": "홍길동 기자",
  "date": "2026-02-17 09:30",
  "content": "# AI 시장 규모 급성장 전망\n\n...",
  "category": "IT/과학",
  "images": ["https://..."]
}

Anti-Bot Features

  • Hides navigator.webdriver: evades automation detection
  • Uses real User-Agent strings: randomly picks mobile or desktop
  • Mimics human behavior: random delays and scrolling
  • Stealth plugin: Playwright extra stealth
  • Cloudflare bypass: automatically adjusts wait times

Rate Limiting

By default, every script limits its load on the target site:

  • Random delay of 2-5 seconds per request
  • At most 1 request per second to the same domain
  • Automatic backoff on a 429 response
  • The --fast flag shortens the delays (use with caution)
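
A minimal sketch of how the delay and backoff defaults listed above could be implemented. The function names are hypothetical, and the exponential backoff with a cap is an assumption: the source only says that backoff is automatic.

```javascript
// Hypothetical helpers illustrating the defaults listed above; not the skill's actual code.

// Random delay between requests: an integer in [minMs, maxMs].
function randomDelayMs(minMs = 2000, maxMs = 5000, rng = Math.random) {
  return Math.floor(minMs + rng() * (maxMs - minMs + 1));
}

// Backoff after a 429. Exponential doubling with a cap is an assumption.
function backoffMs(attempt, baseMs = 1000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Usage sketch: wait before each request, and wait longer after a 429.
// maxAttempts is an assumed limit, not a documented one.
async function politeRequest(doRequest, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await sleep(randomDelayMs());
    const response = await doRequest();
    if (response.status !== 429) return response;
    await sleep(backoffMs(attempt));
  }
  throw new Error('still rate limited after retries');
}
```

Passing the random source as a parameter keeps the delay logic deterministic in tests.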

Error Handling

| Situation | Behavior |
| --- | --- |
| 404 | Returns the error as JSON and continues |
| 403 / blocked | Retries (up to 3 times) |
| Timeout | Extends the wait time, then retries |
| Login required | Prints a warning and returns only the data that is available |

Environment Variables

# Turn off headless mode (for debugging)
HEADLESS=false node scripts/naver-blog.js ...

# Save screenshots
SCREENSHOT=true node scripts/coupang.js ...

# Adjust the wait time (ms)
WAIT_TIME=10000 node scripts/naver-cafe.js ...

# Use a custom User-Agent
USER_AGENT="..." node scripts/naver-news.js ...
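
A script might read these variables roughly as follows. This is a sketch, not the skill's actual code: loadConfig is a hypothetical name, and the defaults (headless on, screenshots off) are assumptions inferred from the examples above.

```javascript
// Hypothetical config loader; the variable names come from the examples above,
// but the defaults are assumptions.
function loadConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    headless: env.HEADLESS !== 'false',    // assumed default: headless on
    screenshot: env.SCREENSHOT === 'true', // assumed default: off
    waitTimeMs: Number.isNaN(waitTime) ? null : waitTime, // null = use the script's default
    userAgent: env.USER_AGENT || null,     // null = use the script's default
  };
}

console.log(loadConfig({ HEADLESS: 'false', WAIT_TIME: '10000' }));
```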

Integration Examples

OpenClaw Agent integration

// Search Naver Blog (query: "AI trends")
const result = await exec({
  command: 'node scripts/naver-blog.js search "AI 트렌드" --limit 5',
  workdir: '/path/to/skills/korean-scraper'
});
const data = JSON.parse(result.stdout);
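
Calling JSON.parse directly throws an opaque error if the script printed a warning or nothing at all. A slightly more defensive wrapper might look like this. parseScraperOutput is a hypothetical helper. Only "success" appears in the documented samples, so treating every other status as a failure is an assumption.

```javascript
// Hypothetical wrapper around JSON.parse for a script's stdout.
function parseScraperOutput(stdout) {
  let data;
  try {
    data = JSON.parse(stdout);
  } catch (err) {
    throw new Error(`scraper did not print valid JSON: ${err.message}`);
  }
  // Only "success" appears in the documented samples; anything else is treated as a failure.
  if (data === null || typeof data !== 'object' || data.status !== 'success') {
    throw new Error(`scraper reported status: ${String(data?.status)}`);
  }
  return data;
}

const parsed = parseScraperOutput('{"status":"success","count":0,"results":[]}');
console.log(parsed.count); // 0
```

In the agent example, `parseScraperOutput(result.stdout)` would take the place of the bare `JSON.parse` call.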

Batch Processing

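# Process multiple URLs in one batch (urls.txt holds one URL per line)
while IFS= read -r url; do
  [ -n "$url" ] || continue  # skip blank lines
  node scripts/naver-blog.js extract "$url" >> results.jsonl < /dev/null
done < urls.txt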
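
The resulting file can then be loaded back in Node. This sketch assumes each run prints its JSON on a single line, which the source does not confirm; pretty-printed multi-line output would land in the errors list. parseJsonl is a hypothetical helper.

```javascript
// Hypothetical reader for the results.jsonl file produced by the loop above.
function parseJsonl(text) {
  const records = [];
  const errors = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === '') return; // skip blank lines
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      errors.push({ line: index + 1, message: err.message });
    }
  });
  return { records, errors };
}

// Usage:
//   const text = require('node:fs').readFileSync('results.jsonl', 'utf8');
//   const { records, errors } = parseJsonl(text);
```

Collecting parse errors with their line numbers, rather than throwing on the first one, keeps one bad record from discarding the rest of the batch.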

Limitations

  • Login-required content: currently scrapes only while logged out (for example, some Coupang reviews are unavailable)
  • Dynamic loading: infinite-scroll pages are limited to 10 items by default (extendable with the --scroll flag)
  • CAPTCHA: must be solved manually (cannot be automated)
  • IP blocking: excessive requests may cause a temporary block (stay within the rate limits)

Compliance & Ethics

  • ✅ Collect only publicly available information
  • ✅ Honor robots.txt (default)
  • ✅ Minimize server load through rate limiting
  • ❌ Do not collect personal information
  • ❌ Do not access login-required content without authorization
  • ❌ Do not use for copyright infringement

Troubleshooting

Problem: 403 Forbidden

Solutions:

  1. Try a different User-Agent
  2. Increase the wait time (WAIT_TIME=15000)
  3. Turn off headless mode (HEADLESS=false)

Problem: Empty results

Solutions:

  1. Check the URL format
  2. The site structure may have changed (selectors need updating)
  3. Check whether the page requires login

Problem: Timeout

Solutions:

  1. Increase WAIT_TIME
  2. Check the internet connection
  3. Check whether the site is reachable (for example, whether a VPN is required)

Maintenance

Korean sites change their UI frequently, so the selectors may need updating.

Selector location: the SELECTORS object at the top of each file in scripts/

const SELECTORS = {
  blogTitle: '.se-title-text',
  blogContent: '.se-main-container',
  // ...
};
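
One way to notice a site redesign early is to check which selectors no longer match anything. The sketch below is illustrative only: findStaleSelectors is a hypothetical helper, and countMatches is supplied by the caller (for instance `(sel) => document.querySelectorAll(sel).length` when the check runs inside the page).

```javascript
// Hypothetical check: returns the names of selectors that match no elements.
function findStaleSelectors(selectors, countMatches) {
  return Object.entries(selectors)
    .filter(([, selector]) => countMatches(selector) === 0)
    .map(([name]) => name);
}

// Example with a stubbed counter: pretend only the title selector still matches.
const exampleSelectors = {
  blogTitle: '.se-title-text',
  blogContent: '.se-main-container',
};
const stale = findStaleSelectors(exampleSelectors, (sel) => (sel === '.se-title-text' ? 1 : 0));
console.log(stale); // [ 'blogContent' ]
```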

Future Improvements

  • Instagram post scraping
  • Naver Shopping price comparison
  • Metadata for Korean YouTube channels
  • Batch processing optimization (parallel execution)
  • Cookie/session management (persistent login)
  • Proxy support
