🧪 Skills
Restart Recovery
Make OpenClaw agent workflows restart-safe using checkpoint files, idempotent step tracking, wake/resume handoff, and stale-checkpoint monitoring. Use when u...
v1.0.0
Description
name: restart-recovery description: Make OpenClaw agent workflows restart-safe using checkpoint files, idempotent step tracking, wake/resume handoff, and stale-checkpoint monitoring. Use when users ask to recover from restarts, preserve progress across updates/config restarts, or implement checkpoint → restart → wake → resume patterns.
Restart Recovery
Implement restart-safe execution with this sequence:
- checkpoint
- restart
- wake
- resume from file
Use bundled scripts
- Use
scripts/checkpoint_tool.pyfor deterministic checkpoint lifecycle:start,update,resume,complete,list
- Use
scripts/checkpoint_selfcheck.pyfor stale unfinished checkpoint alerts without LLM/tool-token usage.
Required operating rules
- Write checkpoints before any restart-prone operation (config patch/apply, update, service restart, long multi-step jobs).
- Use atomic file writes (
.tmpthen rename). - Track completed and remaining steps explicitly.
- Include an idempotency key per workflow to avoid duplicate side effects after resume.
- Never write secrets/tokens to checkpoint files.
- Acquire a resume lock before continuing unfinished work.
Recommended checkpoint location
- Per agent:
memory/checkpoints/*.json - Shared/default workspace flows:
memory/checkpoints/*.jsonat workspace root
Startup instruction to add in AGENTS.md
Add this exact section:
## Restart-safe workflow rule
On startup, check `memory/checkpoints/*.json` for unfinished workflows. If found, acquire resume lock, validate checkpoint schema/hash, and continue from the last completed idempotent step.
No-LLM stale checkpoint monitor
Use host scheduler (launchd/systemd/cron), not LLM cron jobs.
- Run every 10 minutes.
- Alert only when unfinished checkpoints are older than threshold.
- Log to local file for audit.
Suggested execution flow
checkpoint_tool.py startbefore risky step.- Perform step.
checkpoint_tool.py update --complete <step> --step <next>.- If restart happens, wake session/process.
- On startup/re-entry,
checkpoint_tool.py resumeand continue. checkpoint_tool.py completewhen done.
Validation checklist
- Simulate mid-work restart and verify resume from last completed step.
- Confirm idempotency (no duplicate sends/writes/actions).
- Confirm stale-check script only alerts after threshold.
- Confirm old checkpoint cleanup policy (expiry).
Reviews (0)
Sign in to write a review.
No reviews yet. Be the first to review!
Comments (0)
No comments yet. Be the first to share your thoughts!