name: paperbanana description: Generate publication-quality academic diagrams from paper methodology text license: MIT-0 dependencies: env: - OPENROUTER_API_KEY (recommended) - GOOGLE_API_KEY (alternative) runtime: - python3 - uv

PaperBanana

Generate publication-quality academic diagrams and pipeline figures from a paper's methodology section and figure caption. PaperBanana orchestrates a multi-agent pipeline (Retriever, Planner, Stylist, Visualizer, Critic) to produce camera-ready figures suitable for venues like NeurIPS, ICML, and ACL.

Environment Setup

cd <repo-root>
uv pip install -r requirements.txt

Set your API key via environment variable or in configs/model_config.yaml.

Option 1 (Recommended): OpenRouter API key — one key for both text reasoning and image generation:

export OPENROUTER_API_KEY="sk-or-v1-..."

Option 2: Google API key — direct access to Gemini API:

export GOOGLE_API_KEY="your-key-here"

If both keys are configured, OpenRouter is used by default.

Usage

python skill/run.py \
  --content "METHOD_TEXT" \
  --caption "FIGURE_CAPTION" \
  --task diagram \
  --output output.png

Parameters

Parameter	Required	Default	Description
`--content`	Yes*		Method section text to visualize
`--content-file`	Yes*		Path to a file containing the method text (alternative to `--content`)
`--caption`	Yes		Figure caption or visual intent
`--task`	No	`diagram`	Task type: `diagram`
`--output`	No	`output.png`	Output image file path
`--aspect-ratio`	No	`21:9`	Aspect ratio: `21:9`, `16:9`, or `3:2`
`--max-critic-rounds`	No	`3`	Maximum critic refinement iterations
`--num-candidates`	No	`10`	Number of parallel candidates to generate
`--retrieval-setting`	No	`auto`	Retrieval mode: `auto`, `manual`, `random`, or `none`
`--main-model-name`	No	`gemini-3.1-pro-preview`	Main model for VLM agents. Provider auto-detected from configured API key
`--image-gen-model-name`	No	`gemini-3.1-flash-image-preview`	Model for image generation. Also supports `gemini-3-pro-image-preview`
`--exp-mode`	No	`demo_full`	Pipeline: `demo_full` (with Stylist) or `demo_planner_critic` (without Stylist)

*One of --content or --content-file is required.

When --num-candidates > 1, output files are named <stem>_0.png, <stem>_1.png, etc.

Output

The absolute path of each saved image is printed to stdout, one per line.

Examples

Diagram

python skill/run.py \
  --content "We propose a transformer-based encoder-decoder architecture. The encoder consists of 12 self-attention layers with residual connections. The decoder uses cross-attention to attend to encoder outputs and generates the target sequence autoregressively." \
  --caption "Figure 1: Overview of the proposed transformer architecture" \
  --task diagram \
  --output architecture.png

Important Notes

Runtime: A single candidate typically takes 3-10 minutes depending on model and network conditions. With the default 10 candidates running in parallel, expect ~10-30 minutes total. Plan accordingly.
API calls: Each candidate involves multiple LLM calls (Retriever + Planner + Stylist + Visualizer + up to 3 Critic rounds). Candidates run in parallel for efficiency.
Image generation: The Visualizer agent calls an image generation model (Gemini Image) to render diagrams.

About

PaperBanana is based on the PaperVizAgent framework, a reference-driven multi-agent system for automated academic illustration. It was developed as part of the research paper:

PaperBanana: Automating Academic Illustration for AI Scientists Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon arXiv:2601.23265

The framework introduces a collaborative team of five specialized agents — Retriever, Planner, Stylist, Visualizer, and Critic — to transform raw scientific content into publication-quality diagrams. Evaluation is conducted on the PaperBananaBench benchmark.

PaperBanana

Description

name: paperbanana description: Generate publication-quality academic diagrams from paper methodology text license: MIT-0 dependencies: env: - OPENROUTER_API_KEY (recommended) - GOOGLE_API_KEY (alternative) runtime: - python3 - uv

PaperBanana

Environment Setup

Usage

Parameters

Output

Examples

Diagram

Important Notes

About

Reviews (0)

Comments (0)

Compatible Platforms

Links

Pricing

Related Configs

self-improving-agent

Self Improving Agent

Find Skills

Summarize