
PaperBanana


v0.1.0

Description


---
name: paperbanana
description: Generate publication-quality academic diagrams from paper methodology text
license: MIT-0
dependencies:
  env:
    - OPENROUTER_API_KEY (recommended)
    - GOOGLE_API_KEY (alternative)
  runtime:
    - python3
    - uv
---

PaperBanana

Generate publication-quality academic diagrams and pipeline figures from a paper's methodology section and figure caption. PaperBanana orchestrates a multi-agent pipeline (Retriever, Planner, Stylist, Visualizer, Critic) to produce camera-ready figures suitable for venues like NeurIPS, ICML, and ACL.

Environment Setup

cd <repo-root>
uv pip install -r requirements.txt

Set your API key via environment variable or in configs/model_config.yaml.

Option 1 (Recommended): OpenRouter API key — one key for both text reasoning and image generation:

export OPENROUTER_API_KEY="sk-or-v1-..."

Option 2: Google API key — direct access to Gemini API:

export GOOGLE_API_KEY="your-key-here"

If both keys are configured, OpenRouter is used by default.
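
The precedence described above can be sketched in a few lines of Python. This is an illustration only, not the skill's actual code: the function name `pick_provider` and the labels `"openrouter"` and `"google"` are hypothetical, and the sketch looks only at environment variables, not at configs/model_config.yaml.

```python
import os


def pick_provider(env=None):
    """Return a label for the provider the documented precedence would select.

    Hypothetical helper: OpenRouter wins when both keys are set,
    Google is the fallback, and None means no key was found.
    """
    env = os.environ if env is None else env
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("GOOGLE_API_KEY"):
        return "google"
    return None


if __name__ == "__main__":
    print(pick_provider())
```

Running a check like this before a long generation job is a cheap way to confirm that the shell session actually exports the key you expect.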

Usage

python skill/run.py \
  --content "METHOD_TEXT" \
  --caption "FIGURE_CAPTION" \
  --task diagram \
  --output output.png

Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| --content | Yes* | | Method section text to visualize |
| --content-file | Yes* | | Path to a file containing the method text (alternative to --content) |
| --caption | Yes | | Figure caption or visual intent |
| --task | No | diagram | Task type: diagram |
| --output | No | output.png | Output image file path |
| --aspect-ratio | No | 21:9 | Aspect ratio: 21:9, 16:9, or 3:2 |
| --max-critic-rounds | No | 3 | Maximum critic refinement iterations |
| --num-candidates | No | 10 | Number of parallel candidates to generate |
| --retrieval-setting | No | auto | Retrieval mode: auto, manual, random, or none |
| --main-model-name | No | gemini-3.1-pro-preview | Main model for VLM agents. Provider auto-detected from configured API key |
| --image-gen-model-name | No | gemini-3.1-flash-image-preview | Model for image generation. Also supports gemini-3-pro-image-preview |
| --exp-mode | No | demo_full | Pipeline: demo_full (with Stylist) or demo_planner_critic (without Stylist) |

*One of --content or --content-file is required.
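
When calling the skill from another Python program, it can help to assemble the argument list in one place. The sketch below uses only flags documented above. The helper name `build_command` is hypothetical, and rejecting the case where both --content and --content-file are given is an assumption: the documentation only says that one of them is required.

```python
import sys

# Aspect ratios listed in the parameter documentation above.
ASPECT_RATIOS = {"21:9", "16:9", "3:2"}


def build_command(caption, content=None, content_file=None,
                  output="output.png", aspect_ratio=None, num_candidates=None):
    """Build an argv list for skill/run.py (hypothetical helper)."""
    # Assumption: treat "both given" as an error, same as "neither given".
    if (content is None) == (content_file is None):
        raise ValueError("pass exactly one of content or content_file")
    cmd = [sys.executable, "skill/run.py", "--caption", caption]
    if content is not None:
        cmd += ["--content", content]
    else:
        cmd += ["--content-file", str(content_file)]
    cmd += ["--output", str(output)]
    if aspect_ratio is not None:
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
        cmd += ["--aspect-ratio", aspect_ratio]
    if num_candidates is not None:
        cmd += ["--num-candidates", str(num_candidates)]
    return cmd
```

The returned list can be passed to `subprocess.run` from the repository root. Flags left as `None` are omitted, so the defaults documented above apply.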

When --num-candidates > 1, output files are named <stem>_0.png, <stem>_1.png, etc.
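
A small sketch of that naming rule, useful for predicting which files a run should leave behind. The helper name `expected_outputs` is hypothetical, and the sketch assumes the numbered files land in the same directory as --output and keep its extension.

```python
from pathlib import Path


def expected_outputs(output, num_candidates):
    """Predict output paths under the documented <stem>_<i> naming rule."""
    path = Path(output)
    if num_candidates <= 1:
        # A single candidate is written to the requested path as-is.
        return [path]
    # Assumption: same directory and same suffix as the requested path.
    return [path.with_name(f"{path.stem}_{i}{path.suffix}")
            for i in range(num_candidates)]


if __name__ == "__main__":
    for p in expected_outputs("architecture.png", 3):
        print(p)
```

Because --num-candidates defaults to 10, a run that passes only --output architecture.png would, under this reading, produce architecture_0.png through architecture_9.png rather than architecture.png itself.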

Output

The absolute path of each saved image is printed to stdout, one per line.
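
Since the paths arrive one per line, a caller can recover them from the captured output with a few lines of Python. The helper name `parse_saved_paths` is hypothetical, and the sketch assumes stdout carries nothing but the paths; if the script also logs to stdout, the lines would need filtering first.

```python
from pathlib import Path


def parse_saved_paths(stdout_text):
    """Turn captured stdout into a list of Path objects, one per non-empty line."""
    paths = []
    for line in stdout_text.splitlines():
        line = line.strip()
        if line:  # skip blank lines such as a trailing newline
            paths.append(Path(line))
    return paths


# Typical use, from the repository root (not executed here):
#   result = subprocess.run(cmd, capture_output=True, text=True, check=True)
#   images = parse_saved_paths(result.stdout)
```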

Examples

Diagram

python skill/run.py \
  --content "We propose a transformer-based encoder-decoder architecture. The encoder consists of 12 self-attention layers with residual connections. The decoder uses cross-attention to attend to encoder outputs and generates the target sequence autoregressively." \
  --caption "Figure 1: Overview of the proposed transformer architecture" \
  --task diagram \
  --output architecture.png

Important Notes

  • Runtime: A single candidate typically takes 3-10 minutes, depending on the model and network conditions. With the default 10 candidates running in parallel, expect roughly 10-30 minutes in total, so budget for that before starting a run.
  • API calls: Each candidate involves multiple LLM calls (Retriever + Planner + Stylist + Visualizer + up to --max-critic-rounds Critic rounds, 3 by default). Candidates run in parallel for efficiency.
  • Image generation: The Visualizer agent calls an image generation model (Gemini Image) to render diagrams.
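
To make the per-candidate call sequence in the notes above concrete, here is a toy control-flow sketch with stub agents. Everything in it is an assumption made for illustration: the function name `run_candidate`, the agent signatures, the idea that the Critic returns a revised plan (or `None` when satisfied), and the early exit from the loop. It is not the skill's implementation.

```python
def run_candidate(content, caption, agents, max_critic_rounds=3):
    """Run one candidate through a hypothetical five-agent sequence.

    `agents` maps role names to plain callables (stubs in this sketch).
    Returns the final image and the number of Critic rounds used.
    """
    references = agents["retriever"](content, caption)
    plan = agents["planner"](content, caption, references)
    plan = agents["stylist"](plan)
    image = agents["visualizer"](plan)

    rounds = 0
    while rounds < max_critic_rounds:
        revised = agents["critic"](image, plan)
        rounds += 1
        if revised is None:  # assumption: None means the Critic is satisfied
            break
        plan = revised
        image = agents["visualizer"](plan)  # re-render from the revised plan
    return image, rounds
```

Under these assumptions a candidate makes at most one Retriever, one Planner, and one Stylist call, plus one Visualizer call per render and one Critic call per round, which is why lowering --max-critic-rounds reduces the number of API calls.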

About

PaperBanana is based on the PaperVizAgent framework, a reference-driven multi-agent system for automated academic illustration. It was developed as part of the research paper:

PaperBanana: Automating Academic Illustration for AI Scientists
Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon
arXiv:2601.23265

The framework introduces a collaborative team of five specialized agents — Retriever, Planner, Stylist, Visualizer, and Critic — to transform raw scientific content into publication-quality diagrams. Evaluation is conducted on the PaperBananaBench benchmark.

