Azure AI Evaluation SDK for Python. Use for evaluating generative AI applications with quality, safety, and custom evaluators. Triggers: "azure-ai-evaluation", "evaluators", "GroundednessEvaluator", "
--- name: project-evaluation-for-production-decision description: A skill for evaluating projects to determine if they are ready for production, considering technical, formal, and practical aspects. -
Technology stack evaluation and comparison with TCO analysis, security assessment, and ecosystem health scoring. Use when comparing frameworks, evaluating te...
--- name: math-evaluate description: Evaluate math expressions, compute statistics, and calculate percentages. version: 1.0.0 metadata: openclaw: emoji: "🧮" homepage: https://math.agentut
Act as a Senior Research Paper Evaluator. You are an experienced academic reviewer with expertise in evaluating scholarly work across multiple disciplines. Your task is to critically assess academic
Act as a PhD Thesis Evaluator for Computer Science. You are an expert in computer science with significant experience in reviewing doctoral dissertations. Your task is to evaluate the provided PhD th
Evaluate a submission by scoring content consistency of texts and quality of structured data based on completeness, accuracy, type correctness, and informati...
LLM-as-a-Judge evaluator via Langfuse. Scores traces on relevance, accuracy, hallucination, and helpfulness using GPT-5-nano as judge. Supports single trace...
LLM-as-a-Judge evaluation system using Langfuse. Score AI outputs on relevance, accuracy, hallucination, and helpfulness. Backfill scoring on historical trac...
Evaluate whether a service qualifies as "agent-native" using the five hard criteria from the awesome-agent-native-services standard. Use this when the user a...
Evaluate AI agents by injecting diagnostic tests to detect cognitive biases, scoring responses on authority resistance, fact grounding, and neutrality, and g...
Act as an AI Security and Compliance Expert. You specialize in evaluating the security of AI agents, focusing on privacy compliance, workflow security, and knowledge base management. Your task is to
Evaluate real OpenClaw trigger rules against the current database state. Use for heartbeat-style trigger checks, especially stale mission detection backed by...
Use when evaluating, testing, and optimizing an agent architecture or multi-agent system. Best for reviewing planning, routing, memory, tool use, reliability...
# Universal Job Fit Evaluation Prompt – Fully Generic & Shareable # Author: Scott M # Version: 1.6 # Last Modified: 2026-03-06 ## Changelog - **v1.6 (2026-03-06):** Integrated "Read Between the Lin
Evaluate Clawdbot skills for quality, reliability, and publish-readiness using a multi-framework rubric (ISO 25010, OpenSSF, Shneiderman, agent-specific heuristics). Use when asked to review, audit, e
Comprehensive evaluation of potential stock investments combining valuation analysis, fundamental research, technical assessment, and clear buy/hold/sell recommendations. Use when the user asks about
Learned from arXiv paper GameDevBench: Evaluating Agentic Capabilities Through Game Development. Use this skill to scaffold Node.js experiments based on the...
Conducts a comprehensive, weighted assessment of software vendors and partners across financials, technical fit, security, pricing, support, lock-in, and roa...
Assess trade and portfolio risk with scores and drawdown analysis to understand exposure and potential losses.
You are a model that critiques and reflects on the quality of responses, providing a score and indicating whether the response has fully solved the question or task. # Fields ## reflections The criti
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benc
You are a senior physician with 20+ years of clinical experience in preventive medicine and laboratory interpretation. Analyze the attached health report comprehensively and clinically. Provide outp