TokenRanger
Description
name: tokenranger
version: 1.0.0
description: Install, configure, and operate the TokenRanger OpenClaw plugin. Use when you want to reduce cloud LLM token costs by 50-80% via local Ollama context compression, or when diagnosing TokenRanger sidecar issues.
metadata: { "openclaw": { "emoji": "🗜️", "category": "performance", "requires": { "bins": ["openclaw"] }, "links": { "plugin": "https://github.com/peterjohannmedina/openclaw-plugin-tokenranger", "npm": "https://www.npmjs.com/package/openclaw-plugin-tokenranger" } } }
TokenRanger compresses session context through a local Ollama SLM before it is sent to a cloud LLM, reducing input token costs by 50–80% per turn. If anything goes wrong, it falls through gracefully to the uncompressed context.
- Plugin repo: https://github.com/peterjohannmedina/openclaw-plugin-tokenranger
- npm: `openclaw-plugin-tokenranger`
- Maintained by: @peterjohannmedina
When to Load This Skill
- User asks to install, configure, or troubleshoot TokenRanger
- User wants to reduce token costs or enable context compression
- User runs `/tokenranger` commands and needs help interpreting output
- User wants to switch compression strategy (GPU/CPU/off)
- User asks about upgrading or uninstalling TokenRanger
How It Works
User message → OpenClaw gateway
→ before_agent_start hook
→ Turn 1: skip (full fidelity)
→ Turn 2+: send history to localhost:8100/compress
→ FastAPI sidecar runs LangChain LCEL chain via Ollama
→ Compressed summary prepended to context
→ Cloud LLM receives compressed context instead of full history
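The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual code: `build_context`, the `compress` callable, and the choice to measure `minPromptLength` against the joined history are assumptions made for the example. Only the turn-1 skip, the 500-character default, and the fall-through-on-failure behaviour come from this document. How the summary is merged with the rest of the context is simplified to returning the summary.

```python
from typing import Callable, List

MIN_PROMPT_LENGTH = 500  # documented default of minPromptLength


def build_context(
    history: List[str],
    turn: int,
    compress: Callable[[str], str],
    min_prompt_length: int = MIN_PROMPT_LENGTH,
) -> str:
    """Return the context to send to the cloud LLM for this turn.

    `compress` stands in for the call to the sidecar's /compress endpoint
    (hypothetical signature: full history text in, summary text out).
    """
    full = "\n".join(history)
    # Turn 1 is sent at full fidelity; short histories are skipped by design.
    if turn < 2 or len(full) < min_prompt_length:
        return full
    try:
        return compress(full)
    except Exception:
        # Any failure falls through to the uncompressed history.
        return full
```

Passing the compressor in as a callable keeps the decision logic testable without a running sidecar.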
The inference strategy is auto-selected from GPU availability and Ollama reachability:
| Strategy | Trigger | Model | Approach |
|---|---|---|---|
| `full` | GPU available | `mistral:7b` | Deep semantic summarization |
| `light` | CPU only | `phi3.5:3b` | Extractive bullet points |
| `passthrough` | Ollama unreachable | n/a | Truncate to last 20 lines |
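As a rough sketch of how the table maps triggers to strategies, the selection and the passthrough truncation might look like the following. The function names and the idea of passing availability in as booleans are hypothetical; the strategy names, models, and the 20-line limit are taken from the table.

```python
from typing import Optional, Tuple


def select_strategy(gpu_available: bool, ollama_reachable: bool) -> Tuple[str, Optional[str]]:
    """Map the triggers in the table to a (strategy, model) pair."""
    if not ollama_reachable:
        return ("passthrough", None)
    if gpu_available:
        return ("full", "mistral:7b")
    return ("light", "phi3.5:3b")


def passthrough(text: str, max_lines: int = 20) -> str:
    """Keep only the last `max_lines` lines, as the passthrough row describes."""
    return "\n".join(text.splitlines()[-max_lines:])
```

Checking Ollama reachability first reflects the table: without Ollama neither model can run, whatever hardware is present.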
Install
Step 1 — Install the plugin
openclaw plugins install openclaw-plugin-tokenranger
To pin an exact version:
openclaw plugins install openclaw-plugin-tokenranger@1.0.0 --pin
Step 2 — First-time setup
openclaw tokenranger setup
This pulls Ollama models, creates the Python venv, installs FastAPI/LangChain deps, and registers the sidecar as a system service (systemd on Linux, launchd on macOS).
Step 3 — Restart gateway
openclaw gateway restart
Step 4 — Verify
openclaw tokenranger
The output should show the current settings and the sidecar status (reachable or unreachable).
Configuration
Set config values with:
openclaw config set plugins.entries.tokenranger.config.<key> <value>
openclaw gateway restart
| Key | Default | Description |
|---|---|---|
| `serviceUrl` | `http://127.0.0.1:8100` | TokenRanger sidecar URL |
| `timeoutMs` | `10000` | Max wait (ms) before fallthrough |
| `minPromptLength` | `500` | Min chars before compressing |
| `ollamaUrl` | `http://127.0.0.1:11434` | Ollama API URL |
| `preferredModel` | `mistral:7b` | Model for GPU strategy |
| `compressionStrategy` | `auto` | `auto` / `full` / `light` / `passthrough` |
| `inferenceMode` | `auto` | `auto` / `cpu` / `gpu` / `remote` |
Force CPU-only mode:
openclaw config set plugins.entries.tokenranger.config.compressionStrategy light
openclaw config set plugins.entries.tokenranger.config.inferenceMode cpu
openclaw gateway restart
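To illustrate how `serviceUrl` and `timeoutMs` interact, here is a hedged sketch of a client that posts to the sidecar and falls through when the call fails or times out. The request and response field names (`text`, `compressed`) and both function names are assumptions; the document only specifies the `/compress` path, the default URL, and the 10000 ms default.

```python
import json
import urllib.request
from typing import Callable

SERVICE_URL = "http://127.0.0.1:8100"  # serviceUrl default
TIMEOUT_MS = 10000                     # timeoutMs default


def post_compress(url: str, text: str, timeout_s: float) -> str:
    """POST text to the sidecar. The JSON field names are assumptions."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["compressed"]


def compress_or_fallthrough(
    text: str,
    service_url: str = SERVICE_URL,
    timeout_ms: int = TIMEOUT_MS,
    post: Callable[[str, str, float], str] = post_compress,
) -> str:
    """Return the compressed text, or the original text on any error or timeout."""
    try:
        return post(service_url.rstrip("/") + "/compress", text, timeout_ms / 1000)
    except Exception:
        return text
```

The `post` parameter exists only so the fallthrough path can be exercised with a stub instead of a live sidecar.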
Commands
| Command | Description |
|---|---|
| `/tokenranger` | Show current settings and sidecar health |
| `/tokenranger mode gpu` | Force GPU (full) compression |
| `/tokenranger mode cpu` | Force CPU (light) compression |
| `/tokenranger mode off` | Disable compression (passthrough) |
| `/tokenranger model` | List available Ollama models |
| `/tokenranger toggle` | Enable / disable the plugin |
Upgrading
# Check for updates (dry run)
openclaw plugins update tokenranger --dry-run
# Apply update
openclaw plugins update tokenranger
openclaw tokenranger setup # re-runs setup if sidecar deps changed
openclaw gateway restart
To pin a specific version:
openclaw plugins install openclaw-plugin-tokenranger@2026.3.1 --pin
openclaw tokenranger setup
openclaw gateway restart
List all published versions:
npm view openclaw-plugin-tokenranger versions --json
Uninstalling
openclaw plugins uninstall tokenranger
openclaw gateway restart
Remove the sidecar service manually:
# Linux
systemctl --user stop tokenranger && systemctl --user disable tokenranger
rm ~/.config/systemd/user/tokenranger.service
# macOS
launchctl unload ~/Library/LaunchAgents/com.peterjohannmedina.tokenranger.plist
rm ~/Library/LaunchAgents/com.peterjohannmedina.tokenranger.plist
Troubleshooting
Sidecar unreachable after setup:
# Linux
systemctl --user status tokenranger
journalctl --user -u tokenranger -n 50
# macOS
launchctl list | grep tokenranger
cat ~/Library/Logs/tokenranger.log
# Manual start (any platform)
~/.openclaw/extensions/tokenranger/service/start.sh
Ollama not found:
curl http://127.0.0.1:11434/api/tags
# If not running:
ollama serve
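The same check can be scripted. The sketch below queries Ollama's `/api/tags` endpoint and reports which required models are not installed; the helper names are hypothetical, and the model list in the usage note simply reuses the names from the strategy table.

```python
import json
import urllib.request
from typing import Iterable, List

OLLAMA_URL = "http://127.0.0.1:11434"  # ollamaUrl default


def fetch_tags(ollama_url: str = OLLAMA_URL, timeout_s: float = 5.0) -> dict:
    """GET /api/tags; raises if Ollama is not reachable."""
    url = ollama_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read())


def missing_models(tags: dict, required: Iterable[str]) -> List[str]:
    """Return the required model names absent from an /api/tags response."""
    installed = {m.get("name", "") for m in tags.get("models", [])}
    return [name for name in required if name not in installed]
```

With Ollama running, `missing_models(fetch_tags(), ["mistral:7b", "phi3.5:3b"])` returns an empty list when both models have been pulled.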
Compression not reducing tokens:
- Check `minPromptLength`: the default is 500 chars, and short conversations are skipped by design
- Run `/tokenranger` to confirm the strategy is not `passthrough`
- Check sidecar logs for errors
Graceful degradation: TokenRanger never blocks a message. On any failure it silently falls through to an uncompressed cloud LLM call.
Performance Reference
5-turn Discord benchmark (GPU, mistral:7b-instruct):
| Turn | Input tokens | Compressed | Reduction |
|---|---|---|---|
| 2 | 732 | 125 | 82.9% |
| 3 | 1,180 | 150 | 87.3% |
| 4 | 1,685 | 212 | 87.4% |
| 5 | 2,028 | 277 | 86.3% |
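Cumulative: 5,866 → 885 tokens (84.9% reduction). The rows listed (turns 2 to 5) sum to 5,625 → 764 tokens, so the cumulative figure presumably also counts turn 1, which is not shown in the table.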
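The reduction column is one minus the ratio of compressed to input tokens. The short check below recomputes it from the figures in the table; the function name is illustrative only.

```python
def reduction_pct(input_tokens: int, compressed_tokens: int) -> float:
    """Percentage of input tokens removed by compression, to one decimal."""
    return round(100 * (1 - compressed_tokens / input_tokens), 1)


# (turn, input tokens, compressed tokens) copied from the benchmark table
rows = [(2, 732, 125), (3, 1180, 150), (4, 1685, 212), (5, 2028, 277)]
per_turn = {turn: reduction_pct(inp, out) for turn, inp, out in rows}

# Cumulative figure as reported in the benchmark summary
cumulative = reduction_pct(5866, 885)
```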