Azure AI Evaluation SDK for Python. Use for evaluating generative AI applications with quality, safety, and custom evaluators. Triggers: "azure-ai-evaluation", "evaluators", "GroundednessEvaluator", ...
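A minimal sketch of what a groundedness check can look like with this SDK; the endpoint, key, and deployment values are placeholders, and the exact call signature and result keys may differ between SDK versions.

```python
# Minimal sketch using azure-ai-evaluation; endpoint/key/deployment are placeholders.
from azure.ai.evaluation import GroundednessEvaluator

model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",  # placeholder
    "api_key": "<api-key>",                                        # placeholder
    "azure_deployment": "<judge-model-deployment>",                # placeholder
}

groundedness = GroundednessEvaluator(model_config)

# Score whether the response is supported by the supplied context.
result = groundedness(
    response="Paris is the capital of France.",
    context="France's capital city is Paris.",
)
print(result)  # e.g. a "groundedness" score on a 1-5 scale, depending on SDK version
```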
Technology stack evaluation and comparison with TCO analysis, security assessment, and ecosystem health scoring. Use when comparing frameworks, evaluating te...
---
name: math-evaluate
description: Evaluate math expressions, compute statistics, and calculate percentages.
version: 1.0.0
metadata:
  openclaw:
    emoji: "🧮"
    homepage: https://math.agentut
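An illustrative sketch (not the skill's actual code) of the three capabilities named in the description: safe arithmetic evaluation, basic statistics, and a percentage helper.

```python
# Illustrative sketch only, not the skill's actual implementation: safe arithmetic,
# basic statistics, and a percentage-change helper.
import ast
import operator
import statistics

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def percent_change(old: float, new: float) -> float:
    return (new - old) / old * 100.0

print(safe_eval("(3 + 4) * 2 ** 3"))                              # 56
print(statistics.mean([2, 4, 6]), statistics.stdev([2, 4, 6]))    # 4 2.0
print(percent_change(80, 92))                                     # 15.0
```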
Evaluate a submission by scoring content consistency of texts and quality of structured data based on completeness, accuracy, type correctness, and informati...
LLM-as-a-Judge evaluator via Langfuse. Scores traces on relevance, accuracy, hallucination, and helpfulness using GPT-5-nano as judge. Supports single trace...
LLM-as-a-Judge evaluation system using Langfuse. Score AI outputs on relevance, accuracy, hallucination, and helpfulness. Backfill scoring on historical trac...
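The two entries above describe the same LLM-as-a-Judge pattern. Below is a condensed sketch of that loop; the prompt wording, the `record_scores` stub, and the judge-call details are assumptions, and writing scores back to a trace would go through the Langfuse SDK's scoring call for whichever version is installed.

```python
# Illustrative LLM-as-a-Judge sketch; the prompt and record_scores() are assumptions,
# not the skill's actual implementation.
import json
from openai import OpenAI

client = OpenAI()
DIMENSIONS = ["relevance", "accuracy", "hallucination", "helpfulness"]

JUDGE_PROMPT = """You are an evaluation judge. Given a user input and a model output,
rate the output from 0.0 to 1.0 on each of: {dims}.
Respond with a JSON object mapping each dimension to a score."""

def judge(user_input: str, output: str) -> dict[str, float]:
    resp = client.chat.completions.create(
        model="gpt-5-nano",  # judge model named in the skill description
        messages=[
            {"role": "system", "content": JUDGE_PROMPT.format(dims=", ".join(DIMENSIONS))},
            {"role": "user", "content": f"Input:\n{user_input}\n\nOutput:\n{output}"},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

def record_scores(trace_id: str, scores: dict[str, float]) -> None:
    # Placeholder: replace with your Langfuse SDK's score call to attach each
    # dimension as a numeric score on the trace.
    for name, value in scores.items():
        print(f"trace {trace_id}: {name}={value}")

scores = judge("What year did Apollo 11 land?", "Apollo 11 landed on the Moon in 1969.")
record_scores("trace-123", scores)
```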
Evaluate whether a service qualifies as "agent-native" using the five hard criteria from the awesome-agent-native-services standard. Use this when the user a...
Evaluate AI agents by injecting diagnostic tests to detect cognitive biases, scoring responses on authority resistance, fact grounding, and neutrality, and g...
Evaluate real OpenClaw trigger rules against the current database state. Use for heartbeat-style trigger checks, especially stale mission detection backed by...
Use when evaluating, testing, and optimizing an agent architecture or multi-agent system. Best for reviewing planning, routing, memory, tool use, reliability...
Comprehensive evaluation of potential stock investments combining valuation analysis, fundamental research, technical assessment, and clear buy/hold/sell recommendations. Use when the user asks about a specific stock or a buy/hold/sell decision.
Evaluate Clawdbot skills for quality, reliability, and publish-readiness using a multi-framework rubric (ISO 25010, OpenSSF, Shneiderman, agent-specific heuristics). Use when asked to review, audit, or evaluate a skill.
Learned from arXiv paper GameDevBench: Evaluating Agentic Capabilities Through Game Development. Use this skill to scaffold Node.js experiments based on the...
Conducts a comprehensive, weighted assessment of software vendors and partners across financials, technical fit, security, pricing, support, lock-in, and roa...
Assess trade and portfolio risk with scores and drawdown analysis to understand exposure and potential losses.
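The drawdown piece reduces to a short calculation; the sketch below uses a made-up equity curve.

```python
# Maximum drawdown over an equity curve: largest peak-to-trough decline,
# expressed as a fraction of the peak. Sample data is made up.
def max_drawdown(equity: list[float]) -> float:
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

curve = [100.0, 112.0, 108.0, 95.0, 103.0, 121.0, 117.0]
print(f"max drawdown: {max_drawdown(curve):.1%}")  # (112 - 95) / 112 ≈ 15.2%
```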
Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring; even top agents achieve less than 50% on real-world benchmarks.
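The simplest reliability metric that kind of testing rests on is a repeated-run pass rate. The sketch below assumes a `run_agent` callable and a `check` assertion supplied by the caller; both names are placeholders.

```python
# Reliability-check sketch: run the same task several times and report pass rate.
# `run_agent` and `check` are stand-ins for your agent harness and your assertion.
import random
from typing import Callable

def pass_rate(run_agent: Callable[[str], str],
              check: Callable[[str], bool],
              task: str, trials: int = 10) -> float:
    passes = sum(1 for _ in range(trials) if check(run_agent(task)))
    return passes / trials

# Toy example: a flaky "agent" that answers correctly about 80% of the time.
rate = pass_rate(
    run_agent=lambda task: "4" if random.random() < 0.8 else "5",
    check=lambda out: out.strip() == "4",
    task="What is 2 + 2?",
)
print(f"pass rate over 10 trials: {rate:.0%}")
```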
Configurable quality evaluation for AI agent outputs. Define criteria, run evaluations, track quality over time. No LLM-as-judge, no API calls, pattern-based...
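A sketch of what "pattern-based, no API calls" can look like in practice; the three criteria shown are invented examples of the kind of rules such a skill would define.

```python
# Pattern-based output checks, no LLM judge and no network calls.
# The criteria below are invented examples of the kind of rules such a skill defines.
import re

CRITERIA = {
    "no_apology_filler": lambda text: not re.search(r"\bI apologize\b", text, re.I),
    "cites_a_source":    lambda text: bool(re.search(r"https?://", text)),
    "under_300_words":   lambda text: len(text.split()) <= 300,
}

def evaluate(text: str) -> dict[str, bool]:
    return {name: bool(rule(text)) for name, rule in CRITERIA.items()}

sample = "Here is the fix, documented at https://example.com/docs."
results = evaluate(sample)
print(results, "score:", sum(results.values()) / len(results))
```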
Evaluates longevity interventions using evidence tiers. Provides a research evaluation framework and curated high-value insights on supplements, sleep, exercise, and protocols. Activate for anti-aging, longevity, supplement, sleep, exercise, and protocol questions.
10-dimension weighted scoring framework for prediction market trade evaluation. Enforces disciplined position sizing, circuit breakers, and mandatory counter-arguments. Use when: evaluating prediction market trades.
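A minimal sketch of the weighted-score and position-sizing mechanics; the dimension names, weights, and thresholds are illustrative placeholders rather than the framework's actual rubric.

```python
# Weighted scoring sketch; dimensions, weights, and thresholds are illustrative
# placeholders, not the framework's actual rubric.
WEIGHTS = {
    "edge_vs_market": 0.20, "evidence_quality": 0.15, "time_to_resolution": 0.10,
    "liquidity": 0.10, "base_rate_fit": 0.10, "counter_argument_strength": 0.10,
    "news_risk": 0.05, "correlation_to_book": 0.05, "fee_impact": 0.05,
    "personal_expertise": 0.10,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # 10 dimensions, weights sum to 1

def composite(scores: dict[str, float]) -> float:
    """scores: each dimension rated 0-10; returns the weighted 0-10 composite."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

def position_size(score: float, bankroll: float) -> float:
    """Circuit breaker below a floor; capped fraction of bankroll above it."""
    if score < 6.0:                                   # illustrative "do not trade" floor
        return 0.0
    return bankroll * min(0.05, (score - 6.0) / 40)   # caps at 5% of bankroll

scores = {d: 7.0 for d in WEIGHTS}
print(composite(scores), position_size(composite(scores), bankroll=1_000))
```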
Patterns and techniques for evaluating and improving AI agent outputs. Use this skill when: - Implementing self-critique and reflection loops - Building eval...
AI Agent Skill unit testing framework. A framework-agnostic toolkit for discovering, scaffolding, selecting, evaluating, and reporting on AI skills. Use this...
Autonomous engine that systematically evaluates and ranks agent skills across models using rubric grading, error taxonomy, and improvement feedback loops.
Multi-path reasoning for complex problems. Explore multiple solution branches → Evaluate each → Select optimal path. Use for: difficult decisions, creative p...
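The explore, evaluate, select loop reduces to a small piece of control flow; `propose_branches` and `score_branch` below stand in for whatever candidate generator and scoring rubric get plugged in.

```python
# Explore multiple solution branches, score each, keep the best.
# propose_branches() and score_branch() are stand-ins for your own generator and rubric.
from typing import Callable

def best_path(problem: str,
              propose_branches: Callable[[str], list[str]],
              score_branch: Callable[[str], float]) -> tuple[str, float]:
    branches = propose_branches(problem)
    scored = [(b, score_branch(b)) for b in branches]
    return max(scored, key=lambda pair: pair[1])

# Toy usage: prefer the shortest candidate plan.
plan, score = best_path(
    "reduce build time",
    propose_branches=lambda p: ["cache dependencies", "parallelize test suite", "buy faster CI runners"],
    score_branch=lambda b: 1.0 / len(b),
)
print(plan, round(score, 3))
```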