PaperBanana
--- name: paperbanana description: Generate publication-quality academic diagrams from paper methodology text license: MIT-0 dependencies: env: - OPENROUTER_API_KEY (recommended) - GOOGLE_AP
Description
name: paperbanana description: Generate publication-quality academic diagrams from paper methodology text license: MIT-0 dependencies: env: - OPENROUTER_API_KEY (recommended) - GOOGLE_API_KEY (alternative) runtime: - python3 - uv
PaperBanana
Generate publication-quality academic diagrams and pipeline figures from a paper's methodology section and figure caption. PaperBanana orchestrates a multi-agent pipeline (Retriever, Planner, Stylist, Visualizer, Critic) to produce camera-ready figures suitable for venues like NeurIPS, ICML, and ACL.
Environment Setup
cd <repo-root>
uv pip install -r requirements.txt
Set your API key via environment variable or in configs/model_config.yaml.
Option 1 (Recommended): OpenRouter API key — one key for both text reasoning and image generation:
export OPENROUTER_API_KEY="sk-or-v1-..."
Option 2: Google API key — direct access to Gemini API:
export GOOGLE_API_KEY="your-key-here"
If both keys are configured, OpenRouter is used by default.
Usage
python skill/run.py \
--content "METHOD_TEXT" \
--caption "FIGURE_CAPTION" \
--task diagram \
--output output.png
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
--content |
Yes* | Method section text to visualize | |
--content-file |
Yes* | Path to a file containing the method text (alternative to --content) |
|
--caption |
Yes | Figure caption or visual intent | |
--task |
No | diagram |
Task type: diagram |
--output |
No | output.png |
Output image file path |
--aspect-ratio |
No | 21:9 |
Aspect ratio: 21:9, 16:9, or 3:2 |
--max-critic-rounds |
No | 3 |
Maximum critic refinement iterations |
--num-candidates |
No | 10 |
Number of parallel candidates to generate |
--retrieval-setting |
No | auto |
Retrieval mode: auto, manual, random, or none |
--main-model-name |
No | gemini-3.1-pro-preview |
Main model for VLM agents. Provider auto-detected from configured API key |
--image-gen-model-name |
No | gemini-3.1-flash-image-preview |
Model for image generation. Also supports gemini-3-pro-image-preview |
--exp-mode |
No | demo_full |
Pipeline: demo_full (with Stylist) or demo_planner_critic (without Stylist) |
*One of --content or --content-file is required.
When --num-candidates > 1, output files are named <stem>_0.png, <stem>_1.png, etc.
Output
The absolute path of each saved image is printed to stdout, one per line.
Examples
Diagram
python skill/run.py \
--content "We propose a transformer-based encoder-decoder architecture. The encoder consists of 12 self-attention layers with residual connections. The decoder uses cross-attention to attend to encoder outputs and generates the target sequence autoregressively." \
--caption "Figure 1: Overview of the proposed transformer architecture" \
--task diagram \
--output architecture.png
Important Notes
- Runtime: A single candidate typically takes 3-10 minutes depending on model and network conditions. With the default 10 candidates running in parallel, expect ~10-30 minutes total. Plan accordingly.
- API calls: Each candidate involves multiple LLM calls (Retriever + Planner + Stylist + Visualizer + up to 3 Critic rounds). Candidates run in parallel for efficiency.
- Image generation: The Visualizer agent calls an image generation model (Gemini Image) to render diagrams.
About
PaperBanana is based on the PaperVizAgent framework, a reference-driven multi-agent system for automated academic illustration. It was developed as part of the research paper:
PaperBanana: Automating Academic Illustration for AI Scientists Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon arXiv:2601.23265
The framework introduces a collaborative team of five specialized agents — Retriever, Planner, Stylist, Visualizer, and Critic — to transform raw scientific content into publication-quality diagrams. Evaluation is conducted on the PaperBananaBench benchmark.
Reviews (0)
No reviews yet. Be the first to review!
Comments (0)
No comments yet. Be the first to share your thoughts!