Search

255 results for "eval"

Eval Driven Development

Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycl...

❤️ 0 ⬇️ 45

🧪 Skill

Eval Driven Development

Free

Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycl...

❤️ 0 ⬇️ 29

🧪 Skill

Skill Eval Preflight

Free

Validate OpenClaw skills during authoring. Use when creating, revising, or preparing a skill for release and you need to scaffold `evals/` files, check readi...

❤️ 1 ⬇️ 16

🧪 Skill

Skill-Eval

Free

Autonomous engine that systematically evaluates and ranks agent skills across models using rubric grading, error taxonomy, and improvement feedback loops.

❤️ 0 ⬇️ 65

🧪 Skill

Eval Skills

Free

AI Agent Skill unit testing framework. A framework-agnostic toolkit for discovering, scaffolding, selecting, evaluating, and reporting on AI skills. Use this...

❤️ 0 ⬇️ 173

🧪 Skill

agentic-eval

Free

Patterns and techniques for evaluating and improving AI agent outputs. Use this skill when: implementing self-critique and reflection loops; building eval...

❤️ 0 ⬇️ 86

🧪 Skill

rag-eval

Free

Evaluate your RAG pipeline quality using Ragas metrics (faithfulness, answer relevancy, context precision).

❤️ 2 ⬇️ 348

🧪 Skill

Llm Eval Router

Free

Shadow-test local Ollama models against a cloud baseline with a multi-judge ensemble. Automatically promotes models when statistically proven equivalent — re...

❤️ 0 ⬇️ 236

🧪 Skill

Agent Evals Lab

Free

Evaluate agent quality and reliability with practical scorecards: accuracy, relevance, actionability, risk flags, tool-call failures, regression checks, and...

❤️ 1 ⬇️ 58

🧪 Skill

Ml Model Eval Benchmark

Free

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

❤️ 0 ⬇️ 173

🧪 Skill

Skill Creator Pro

Free

Create new skills, modify and improve existing skills, and measure skill performance with eval-driven iteration. Use when users want to create a skill from s...

❤️ 0 ⬇️ 118

🧪 Skill

Skill Factory

Free

Create, evaluate, improve, benchmark, and publish OpenClaw skills. Use when building a new skill from scratch, iterating on an existing skill, running evals...

❤️ 0 ⬇️ 336

🧪 Skill

Prediction Stack Orchestrator

Free

Three-agent pipeline orchestrator (Kalshalyst, Eval, Executor) for automated Kalshi prediction market trading with validation loops and retry logic.

❤️ 0 ⬇️ 39

🧪 Skill

Camoufox Tools

Free

Simplified CLI tools for camoufox anti-detection browser automation. Provides fox-open, fox-scrape, fox-eval, fox-close, and fox-bilibili-stats commands for...

❤️ 0 ⬇️ 233

🧪 Skill

AI Interview Simulator

Free

Access and interact with AI group interview simulations: browse jobs, create/join rooms, speak, advance interviews, upload resumes, and view history and eval...

❤️ 0 ⬇️ 1.0k

🧪 Skill

continuity-kernel

Free

OpenClaw continuity kernel for fail-open llm_input injection, deterministic runtime contracts, and shadow-mode eval receipts.

❤️ 0 ⬇️ 258

🧪 Skill

AI Interview Simulator

Free

Access and interact with AI group interview simulations: browse jobs, create/join rooms, speak, advance interviews, upload resumes, and view history and eval...

❤️ 0 ⬇️ 1.1k

🧪 Skill

Skill Creator

Free

Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, update or optimize...

❤️ 133 ⬇️ 39k

🧪 Skill

Skill Creator Anthropic

Free

Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit, or optimize a...

❤️ 0 ⬇️ 19

🧪 Skill

Skill Engineer

Free

Design, test, review, and maintain agent skills for OpenClaw systems using multi-agent iterative refinement. Orchestrates Designer, Reviewer, and Tester suba...

❤️ 0 ⬇️ 515

🧪 Skill

Graph Of Thoughts

Free

Graph-based reasoning with thought combination and feedback loops. Explores multiple solution paths simultaneously, combines insights, and synthesizes optima...

❤️ 0 ⬇️ 207

🧪 Skill

skill rules designer

Free

Analyzes an existing Claude Code skill and designs an optimal rules/ file structure. Covers three operations: (1) compressing SKILL.md by moving verbose cont...

❤️ 0 ⬇️ 67

🧪 Skill

Skill Provenance

Free

Version tracking for Agent Skills bundles and their associated files across sessions, surfaces, and platforms. Use when creating, editing, versioning, valida...

❤️ 0 ⬇️ 168

🧪 Skill

AICE — AI Confidence Engine

Free

AI Confidence Engine: 5 bidirectional domains (TECH/OPS/JUDGMENT/COMMS/ORCH). Agent + User scoring. Triggers: puntúa, auto-score, task-complete, idea-val...

❤️ 0 ⬇️ 139