Delx Ops Guardian

Automatically detects, assesses, and safely mitigates incidents in OpenClaw production agents, providing detailed reports and verified recovery.

v1.0.2

Description

name: delx-ops-guardian
summary: Incident handling and operational recovery guardrails for OpenClaw production agents.
owner: davidmosiah
status: active

Use this skill when handling incidents, degraded automations, or gateway/memory instability in production.

Aliases

emergency_recovery
handle_incident
cron_guard
memory_guard
gateway_guard

Scope (strict)

This skill is runbook-only and must operate under least privilege.

Allowed read sources:

OpenClaw cron state (openclaw cron list --json)
Service health/status (systemctl is-active <service>)
Recent logs for incident window (journalctl -u <service> --since ... --no-pager)
Workspace incident artifacts (/root/.openclaw/workspace/docs/ops/, /root/.openclaw/workspace/memory/)
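
The read sources above can be collected without going beyond read-only commands. The following is a minimal sketch, not part of the skill itself: the function name `evidence_commands` and its parameters are hypothetical, and it only builds argument lists for the three commands named above instead of running them. The value passed to `--since` is left to the caller.

```python
from typing import List


def evidence_commands(service: str, since: str) -> List[List[str]]:
    """Build the read-only evidence commands as argv lists (nothing is executed).

    `service` is the impacted unit name and `since` is the start of the
    incident window; both are supplied by the caller (hypothetical parameters).
    """
    # Reject empty names and anything that could be parsed as an option.
    if not service or service.startswith("-"):
        raise ValueError("service must be a plain unit name")
    if not since:
        raise ValueError("since must describe the start of the incident window")
    return [
        ["openclaw", "cron", "list", "--json"],
        ["systemctl", "is-active", service],
        ["journalctl", "-u", service, "--since", since, "--no-pager"],
    ]


if __name__ == "__main__":
    for argv in evidence_commands("openclaw-gateway", "1 hour ago"):
        print(" ".join(argv))
```

Keeping the commands as argument lists rather than shell strings avoids shell interpolation of the service name if they are later passed to a process runner.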

Allowed remediation actions (safe set):

Retry a failed job once when failure is transient.
Controlled restart of the impacted service only (openclaw-gateway, openclaw, or explicitly named target from incident evidence).
Disable/enable only the directly impacted cron job, and only when it is failing in a loop.
Add/adjust guardrails in runbook/config docs (non-secret, reversible).

Disallowed actions:

No credential rotation/deletion.
No firewall/network policy mutations.
No package installs/upgrades during incident handling.
No bulk cron rewrites unrelated to the incident.
No edits to unrelated services/components.
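
One way to enforce the safe set and the disallowed list above is an explicit allowlist check before any remediation runs. This is a sketch under assumptions: the `Action` names and the `is_permitted` helper are hypothetical labels for the items listed above, not identifiers defined by the skill.

```python
from enum import Enum
from typing import AbstractSet


class Action(Enum):
    # Safe set (hypothetical labels for the allowed remediation actions).
    RETRY_JOB_ONCE = "retry_job_once"
    RESTART_IMPACTED_SERVICE = "restart_impacted_service"
    TOGGLE_IMPACTED_CRON_JOB = "toggle_impacted_cron_job"
    ADJUST_RUNBOOK_GUARDRAIL = "adjust_runbook_guardrail"
    # Disallowed actions, listed so they can be named and rejected explicitly.
    ROTATE_OR_DELETE_CREDENTIALS = "rotate_or_delete_credentials"
    MUTATE_NETWORK_POLICY = "mutate_network_policy"
    INSTALL_OR_UPGRADE_PACKAGE = "install_or_upgrade_package"
    BULK_CRON_REWRITE = "bulk_cron_rewrite"
    EDIT_UNRELATED_COMPONENT = "edit_unrelated_component"


SAFE_SET = frozenset({
    Action.RETRY_JOB_ONCE,
    Action.RESTART_IMPACTED_SERVICE,
    Action.TOGGLE_IMPACTED_CRON_JOB,
    Action.ADJUST_RUNBOOK_GUARDRAIL,
})


def is_permitted(action: Action, target: str, impacted: AbstractSet[str]) -> bool:
    """Permit only safe-set actions aimed at a component named in incident evidence."""
    return action in SAFE_SET and target in impacted
```

Checking the target against the set of impacted components keeps the blast radius limited to what the incident evidence names.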

Approval policy (human-in-the-loop)

Require explicit human approval before:

Restarting any production service more than once.
Editing cron schedules/timezones.
Disabling a job for more than one cycle.
Any action with user-visible impact beyond the failing component.
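
The four approval triggers above can be expressed as a single gate that runs before execution. The sketch below is illustrative only: `ProposedAction`, its fields, and the `kind` labels are hypothetical, chosen to mirror the policy items one by one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    kind: str                        # hypothetical labels: "retry", "restart", "edit_schedule", "disable_job"
    prior_restarts: int = 0          # restarts already performed on this service during the incident
    disabled_cycles: int = 0         # number of cycles the job would stay disabled
    impacts_beyond_failing_component: bool = False


def requires_approval(action: ProposedAction) -> bool:
    """Return True when the approval policy calls for explicit human sign-off."""
    if action.impacts_beyond_failing_component:
        return True                  # user-visible impact beyond the failing component
    if action.kind == "edit_schedule":
        return True                  # cron schedule or timezone edits
    if action.kind == "restart" and action.prior_restarts >= 1:
        return True                  # this would be a second (or later) restart
    if action.kind == "disable_job" and action.disabled_cycles > 1:
        return True                  # job disabled for more than one cycle
    return False
```

A first restart or a one-cycle disable passes the gate, while anything that repeats or widens the impact is held for a human decision.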

Core workflow

Detect and classify severity (info, degraded, critical).
Collect evidence first (status, logs, last run, error streak).
Propose smallest remediation from allowed set.
Execute only approved/safe remediation.
Verify stabilization over a window of at least one successful cycle.
Publish concise incident report.
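
The verification step in the workflow above can be sketched as a check on the runs observed after remediation. The function name `is_stabilized` is hypothetical, and one detail is an assumption beyond the text: the sketch counts only consecutive successes at the end of the window, so a failure after a success does not count as stable.

```python
from typing import Sequence


def is_stabilized(runs_after_fix: Sequence[bool], min_successes: int = 1) -> bool:
    """Check the stabilization window.

    `runs_after_fix` holds one boolean per cycle since remediation, oldest
    first (True means the cycle succeeded). The window counts as stable when
    the most recent `min_successes` cycles all succeeded.
    """
    if min_successes < 1:
        raise ValueError("min_successes must be at least 1")
    trailing = 0
    for ok in reversed(runs_after_fix):
        if not ok:
            break
        trailing += 1
    return trailing >= min_successes
```

An empty window returns False, which matches the rule that a fix is not reported as verified before at least one successful cycle has been observed.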

Safety rules

Never hide persistent failures as success.
Never expose secrets/tokens in logs or reports.
Prefer reversible actions and document rollback path.
Keep blast radius minimal and explicitly stated.
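
For the rule about secrets in logs and reports, a log excerpt can be passed through a redaction step before it is quoted. This is a minimal sketch with assumed patterns: the key names (`token`, `secret`, `password`, `api_key`) and the `Bearer` form are illustrative guesses at what a secret looks like, not an exhaustive or skill-defined list.

```python
import re

# Assumed patterns; extend them to match the secrets a deployment actually uses.
_BEARER = re.compile(r"(?i)\bbearer\s+\S+")
_KEY_VALUE = re.compile(r"(?i)\b(token|secret|password|api[_-]?key)\b(\s*[=:]\s*)(\S+)")


def redact(text: str) -> str:
    """Mask values that look like credentials before text is logged or reported."""
    # Bearer values are masked first so "token: Bearer abc" cannot leak "abc".
    text = _BEARER.sub("Bearer [REDACTED]", text)
    return _KEY_VALUE.sub(lambda m: m.group(1) + m.group(2) + "[REDACTED]", text)


if __name__ == "__main__":
    print(redact("auth failed token=abc123 for gateway"))
```

Pattern-based redaction is a backstop, not a guarantee, so the safer habit remains quoting only short excerpts that have been reviewed.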

Output contract

Always include:

Incident id/time window
Root signal and blast radius
Actions executed (and approvals)
Evidence (status, key metric, short log excerpt)
Final state (resolved, degraded, open)
Next check time
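
The output contract above maps naturally onto a small record type that refuses to render an incomplete report. The sketch below is one possible shape, not a format the skill prescribes: the class name `IncidentReport`, its field names, and the rendered layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

FINAL_STATES = ("resolved", "degraded", "open")


@dataclass
class IncidentReport:
    incident_id: str
    time_window: str
    root_signal: str
    blast_radius: str
    actions: List[str]       # each entry should note its approval, if one was needed
    evidence: List[str]      # status, key metric, short log excerpt
    final_state: str
    next_check: str

    def __post_init__(self) -> None:
        if self.final_state not in FINAL_STATES:
            raise ValueError(f"final_state must be one of {FINAL_STATES}")
        required = ("incident_id", "time_window", "root_signal",
                    "blast_radius", "evidence", "next_check")
        missing = [name for name in required if not getattr(self, name)]
        if missing:
            raise ValueError("missing report fields: " + ", ".join(missing))

    def render(self) -> str:
        """Render the report as plain text, one contract item per line."""
        lines = [
            f"Incident: {self.incident_id} ({self.time_window})",
            f"Root signal: {self.root_signal}",
            f"Blast radius: {self.blast_radius}",
            "Actions executed: " + ("; ".join(self.actions) or "none"),
            "Evidence: " + "; ".join(self.evidence),
            f"Final state: {self.final_state}",
            f"Next check: {self.next_check}",
        ]
        return "\n".join(lines)
```

Validating at construction time means a report with an unknown final state or an empty evidence list fails loudly instead of being published.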

Example intents

"Gateway is flapping, recover safely."
"Cron timed out, stabilize and prove fix."
"Memory guard firing repeatedly, root-cause and patch."
