🧪 Skills

Arxiv Summarizer Orchestrator

Orchestrates end-to-end arXiv paper retrieval, processing, and batch reporting with language control and parallel or serial paper handling modes.

v0.1.1
❤️ 0
⬇️ 495
👁 1
Share

Description


name: arxiv-summarizer-orchestrator description: "End-to-end orchestration skill for periodic arXiv collection and reporting using three sub-skills: arxiv-search-collector, arxiv-paper-processor, and arxiv-batch-reporter. Supports manual language control across all markdown outputs and Stage-B processing strategy (subagent_parallel default max 5, or serial)."

ArXiv Summarizer Orchestrator

Run the full pipeline by composing three sub-skills.

Sub-skill Order

  1. arxiv-search-collector
  2. arxiv-paper-processor
  3. arxiv-batch-reporter

Workflow Parameters

  • language: manual language parameter used by all stages. Default is English when omitted.
  • paper_processing_mode: subagent_parallel or serial.
  • max_parallel_papers: default 5 when paper_processing_mode=subagent_parallel.

Workflow

Stage A: Collection Setup + Query Retrieval

  1. Initialize one run with arxiv-search-collector/scripts/init_collection_run.py.
  2. Model generates multiple focused queries from original topic and writes a minimal query_plan.json (label + query only).
  3. Run arxiv-search-collector/scripts/fetch_queries_batch.py with the plan file (recommended).
  4. (Optional fallback) call arxiv-search-collector/scripts/fetch_query_metadata.py manually for one-by-one fetch.
  5. Model reads each indexed query list and decides keep indexes.
  6. Merge selected items with arxiv-search-collector/scripts/merge_selected_papers.py.
  7. If relevance/coverage is still not good, iterate Stage A:
    • generate another query plan with new labels,
    • fetch again,
    • re-merge with --incremental and updated selection-json.
    • set weak labels to empty keep list ([]) to explicitly drop them.

Pass --language <LANG> to collector scripts so all generated markdown files in Stage A follow the selected language. Use serial query fetch in Stage A with conservative controls (for example --min-interval-sec 5, --retry-max 4). Default collector settings already include retries/backoff and run-local throttle state (<run_dir>/.runtime/arxiv_api_state.json), so manual tuning is usually unnecessary. Prefer cache reuse (no --force) unless query parameters changed or data refresh is required.

Output: one run directory with per-paper metadata subdirectories.

Stage B: Per-paper Artifact Download + Manual Summary

For each paper directory, invoke sub-skill arxiv-paper-processor once and let that skill produce <paper_dir>/summary.md.

Recommended pre-step for many papers:

  1. Run one batch artifact download before per-paper reading:
python3 arxiv-paper-processor/scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language <LANG>

Per-paper execution steps (inside arxiv-paper-processor):

  1. If <paper_dir>/summary.md already exists and is complete, skip this paper.
  2. If usable source (source/source_extract/*.tex) or PDF (source/paper.pdf) already exists, skip download.
  3. If artifacts are missing, download source with arxiv-paper-processor/scripts/download_arxiv_source.py.
  4. If source is unusable, download PDF with arxiv-paper-processor/scripts/download_arxiv_pdf.py.
  5. Model reads content and manually writes <paper_dir>/summary.md by reference format, in language.

Parallel strategy for many papers:

  • Default: paper_processing_mode=subagent_parallel with max_parallel_papers=5.
  • Optional: paper_processing_mode=serial to process one paper at a time.
  • In parallel mode, run multiple arxiv-paper-processor instances in batches; concurrent papers must not exceed max_parallel_papers.
  • Wait for one batch to finish before starting the next batch.
  • In serial mode, run exactly one arxiv-paper-processor instance at a time.
  • Subagent workers should only own one paper directory each to avoid file conflicts.
  • Do not use scripts to auto-compose summary text; scripts are download-only tools.

Output: all paper directories contain summary.md.

Stage C: Bundle + Final Hierarchical Report

  1. Run arxiv-batch-reporter/scripts/collect_summaries_bundle.py --language <LANG>.
  2. Model reads summaries_bundle.md and writes collection_report_template.md in base dir.
  3. In template, each paper leaf entry must include one standalone placeholder line: {{ARXIV_BRIEF:<arxiv_id>}}.
  4. Run arxiv-batch-reporter/scripts/render_collection_report.py to generate final collection_report.md.
  5. Do not manually paraphrase per-paper conclusion lines in final report; they must come from per-paper summary.md section 10 via script injection.

If language is non-English (for example Chinese), all intermediate markdown files and final reports should follow that language.

Periodic Scheduling

This orchestrator is suitable for cron/scheduled execution in OpenClaw:

  • Frequency examples: daily, weekly, monthly.
  • For rolling windows, use lookback (1d, 7d, 30d) when initializing runs.

Output Layout

<output-root>/<topic>-<timestamp>-<range>/

  • task_meta.json, task_meta.md
  • query_results/, query_selection/
  • <arxiv_id>/metadata.md + downloaded source/pdf + summary.md
  • summaries_bundle.md
  • collection_report_template.md
  • final rendered collection report (e.g. collection_report.md)

Use references/workflow-checklist.md as execution checklist.

Related Skills

This is the top-level orchestration skill.

Before using it, install and enable these three sub-skills:

  • arxiv-search-collector
  • arxiv-paper-processor
  • arxiv-batch-reporter

Execution order inside this orchestrator:

  1. arxiv-search-collector (Stage A)
  2. arxiv-paper-processor (Stage B)
  3. arxiv-batch-reporter (Stage C)

Reviews (0)

Sign in to write a review.

No reviews yet. Be the first to review!

Comments (0)

Sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Compatible Platforms

Pricing

Free

Related Configs