Comprehensive SRE platform enabling SLO definition, reliability assessment, incident response, chaos engineering, and error budget management without externa...
Provides expert SRE support for OpenClaw environments including health checks, connectivity fixes, execution stabilization, and security hardening guidance.
Use when defining SLIs/SLOs, managing error budgets, or building reliable systems at scale. Invoke for incident management, chaos engineering, toil reduction, and capacity planning.
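Provides a structured server operations workflow covering system identification, security checks, command validation, and troubleshooting; use when the user needs system administration, performance optimization, log analysis, or server maintenance.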
AI-powered SRE observability for Kubernetes and OpenShift with 40+ tools for Tekton pipeline debugging, log analysis, root cause analysis, and predictive monitoring.
Evaluate and monitor AI agent fleets across six key dimensions to score health, identify issues, and optimize performance for ops teams managing 1-100+ agents.
Guides business and IT teams through incident detection, severity classification, containment, resolution, communication, and post-mortem with automated time...
Generates structured blameless incident postmortems from raw notes, producing summaries, timelines, root cause analyses, impacts, action items, and preventio...
Guide structured, blameless post-mortems with root cause analysis, action tracking, and prevention steps to reduce repeat production incidents and outages.
Complete observability & reliability engineering system. Use when designing monitoring, implementing structured logging, setting up distributed tracing, buil...
Grafana tools for data visualization, monitoring, alerting, and security. Use grafana_query, grafana_query_logs, grafana_query_traces, grafana_create_dashboa...
Simulates developer/engineering interviews: coding rounds, system design, behavioral rounds for engineers, and tech-specific Q&A. Use when the user wants mock devel...
the-book-of-secret-knowledge (https://github.com/trimstray/the-book-of-secret-knowledge)
--- name: "observability-designer" description: "Observability Designer (POWERFUL)" --- # Observability Designer (POWERFUL) **Category:** Engineering **Tier:** POWERFUL **Description:** Design c
--- name: "incident-commander" description: "Incident Commander Skill" --- # Incident Commander Skill **Category:** Engineering Team **Tier:** POWERFUL **Author:** Claude Skills Team **Version
Secure agent-to-agent hiring and execution skill for OpenClaw MCP with escrowed settlement, x402 facilitator payments, ERC-8004 identity/reputation checks, s...
Autonomously monitors server health, optimizes resources, performs context-aware recovery, generates status reports, and triggers backups on critical file ch...
Use when designing chaos experiments, implementing failure injection frameworks, or conducting game day exercises. Invoke for chaos experiments, resilience t...
--- name: master-orchestrator version: 1.0.0 description: Dynamic expert routing system with 8 specialized agents for complex task delegation homepage: https://github.com/openclaw/master-orchestrator
--- name: opsgenie description: | Opsgenie integration. Manage Alerts. Use when the user wants to interact with Opsgenie data. compatibility: Requires network access and a valid Membrane account (Fr
title: Repository Security & Architecture Audit Framework domain: backend,infra anchors: - OWASP Top 10 (2021) - SOLID Principles (Robert C. Martin) - DORA Metrics (Forsgren, Humble, Kim) - Go
Build autonomous AI agents with the Claude Agent SDK. Structured outputs guarantee JSON schema validation, with a plugin system and hooks for event-driven workflows. Prevents 14 documented errors. Use whe
Curated skill bundle for SaaS companies (B2B and B2C) covering product development, go-to-market, customer success and engineering excellence. Activates the...