Comprehensive SRE platform enabling SLO definition, reliability assessment, incident response, chaos engineering, and error budget management without externa...
Provides expert SRE support for OpenClaw environments including health checks, connectivity fixes, execution stabilization, and security hardening guidance.
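Use when defining SLIs/SLOs, managing error budgets, or building reliable systems at scale. Invoke for incident management, chaos engineering, toil reduction, and capacity planning.
Provides a structured server operations workflow covering system identification, security checks, command validation, and troubleshooting; use when the user needs system administration, performance optimization, log analysis, or server maintenance.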
Guides business and IT teams through incident detection, severity classification, containment, resolution, communication, and post-mortem with automated time...
Guide structured, blameless post-mortems with root cause analysis, action tracking, and prevention steps to reduce repeat production incidents and outages.
Generates structured blameless incident postmortems from raw notes, producing summaries, timelines, root cause analyses, impacts, action items, and preventio...
Evaluate and monitor AI agent fleets across six key dimensions to score health, identify issues, and optimize performance for ops teams managing 1-100+ agents.
Complete observability & reliability engineering system. Use when designing monitoring, implementing structured logging, setting up distributed tracing, buil...
Grafana tools for data visualization, monitoring, alerting, and security. Use grafana_query, grafana_query_logs, grafana_query_traces, grafana_create_dashboa...
Simulates developer/engineering interviews: coding rounds, system design, behavioral for engineers, and tech-specific Q&A. Use when the user wants mock devel...
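Build autonomous AI agents with Claude Agent SDK. Structured outputs guarantee JSON schema validation, and a plugin system and hooks support event-driven workflows. Prevents 14 documented errors. Use whe
Architecture governance expert. Evaluates system health across six dimensions and generates governance tasks and reports. Triggers: architecture health assessment, technical debt identification, governance planning, architecture review, system comparison, governance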
OneUptime integration. Manage Users, Organizations. Use when the user wants to interact with OneUptime data.
opsgenie: Opsgenie integration. Manage Alerts. Use when the user wants to interact with Opsgenie data. Compatibility: Requires network access and a valid Membrane account (Fr
Autonomously monitors server health, optimizes resources, performs context-aware recovery, generates status reports, and triggers backups on critical file ch...
AI agent governance, trust scoring, and policy enforcement powered by AgentMesh. Activate when: (1) user wants to enforce token limits, tool restrictions, or...
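Diagnoses and resolves system, software, configuration, and error-message problems by following strictly prioritized steps, combining memory matching, official documentation, and multi-source verification to keep solutions efficient and reliable.
Linux system-level memory leak inspection: periodically scans the memory of all processes, records a full picture of system memory, identifies abnormal processes through growth-trend analysis, and outputs troubleshooting guidance and a list of suspicious processes. When the user mentions...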
Curated skill bundle for SaaS companies (B2B and B2C) covering product development, go-to-market, customer success and engineering excellence. Activates the...
observability-designer: Observability Designer (POWERFUL). Category: Engineering. Tier: POWERFUL. Description: Design c
incident-commander: Incident Commander Skill. Category: Engineering Team. Tier: POWERFUL. Author: Claude Skills Team. Version
Secure agent-to-agent hiring and execution skill for OpenClaw MCP with escrowed settlement, x402 facilitator payments, ERC-8004 identity/reputation checks, s...