LLM Evaluator
v1.0.0
Description
name: llm-evaluator
description: LLM-as-a-Judge evaluation system using Langfuse. Score AI outputs on relevance, accuracy, hallucination, and helpfulness. Backfill scoring on historical traces. Uses GPT-5-nano for cost-efficient judging. Use when evaluating AI quality, building evals, or monitoring output accuracy.
homepage: https://www.agxntsix.ai
license: MIT
compatibility: Python 3.10+, Langfuse instance, OpenRouter API key
metadata: {"openclaw": {"emoji": "\u2696\ufe0f", "requires": {"env": ["OPENROUTER_API_KEY"]}, "primaryEnv": "OPENROUTER_API_KEY", "homepage": "https://www.agxntsix.ai"}}
LLM Evaluator ⚖️
LLM-as-a-Judge evaluation system powered by Langfuse. Uses GPT-5-nano to score AI outputs.
When to Use
- Evaluating quality of search results or AI responses
- Scoring traces for relevance, accuracy, and hallucination detection
- Batch scoring recent unscored traces
- Quality assurance on agent outputs
Usage
# Test with sample cases
python3 {baseDir}/scripts/evaluator.py test
# Score a specific Langfuse trace
python3 {baseDir}/scripts/evaluator.py score <trace_id>
# Score with specific evaluator only
python3 {baseDir}/scripts/evaluator.py score <trace_id> --evaluators relevance
# Backfill scores on recent unscored traces
python3 {baseDir}/scripts/evaluator.py backfill --limit 20
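The three subcommands above suggest a command-line layout along the following lines. This is a minimal sketch, assuming an `argparse`-based parser; the real `evaluator.py` may be organized quite differently. The `build_parser` name, the space-separated form of `--evaluators`, and the default value for `--limit` are hypothetical and not taken from the skill itself.

```python
import argparse

# Evaluator names as listed in this document's Evaluators table.
EVALUATORS = ["relevance", "accuracy", "hallucination", "helpfulness"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the test / score / backfill commands."""
    parser = argparse.ArgumentParser(prog="evaluator.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # `test` takes no arguments: it runs the bundled sample cases.
    sub.add_parser("test", help="run the sample cases")

    # `score <trace_id> [--evaluators NAME ...]`
    score = sub.add_parser("score", help="score one Langfuse trace")
    score.add_argument("trace_id")
    score.add_argument(
        "--evaluators",
        nargs="+",
        choices=EVALUATORS,
        default=EVALUATORS,  # assumption: run every evaluator when none is named
    )

    # `backfill [--limit N]`
    backfill = sub.add_parser("backfill", help="score recent unscored traces")
    backfill.add_argument("--limit", type=int, default=20)  # default is an assumption
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["score", "trace-123", "--evaluators", "relevance"])
print(args.command, args.trace_id, args.evaluators)
```

Restricting `--evaluators` with `choices` makes a misspelled evaluator name fail at parse time rather than after a judge call has already been paid for.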
Evaluators
| Evaluator | Measures | Scale |
|---|---|---|
| relevance | Response relevance to query | 0–1 |
| accuracy | Factual correctness | 0–1 |
| hallucination | Made-up information detection | 0–1 |
| helpfulness | Overall usefulness | 0–1 |
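To illustrate how a judge model's reply could be mapped onto the 0–1 scale in the table, here is a minimal sketch. It is not the skill's actual implementation: `RUBRICS`, `build_judge_prompt`, `parse_score`, and the JSON reply format are all hypothetical. The table does not say which end of the hallucination scale means "clean", so that rubric line is deliberately neutral. Sending the prompt to the model and writing the score back to Langfuse are left out.

```python
import json
import re

# One-line rubric per evaluator, paraphrasing the "Measures" column of the table.
# These wordings are illustrative placeholders, not the skill's real prompts.
RUBRICS = {
    "relevance": "Assess how relevant the response is to the query.",
    "accuracy": "Assess how factually correct the response is.",
    "hallucination": "Assess whether the response contains made-up information.",
    "helpfulness": "Assess how useful the response is overall.",
}


def build_judge_prompt(evaluator: str, query: str, response: str) -> str:
    """Assemble a judge prompt that asks for a JSON score between 0 and 1."""
    return (
        f"You are an impartial judge. {RUBRICS[evaluator]}\n"
        'Reply with JSON only: {"score": <number between 0 and 1>, '
        '"reason": "<short reason>"}\n\n'
        f"Query:\n{query}\n\nResponse:\n{response}"
    )


def parse_score(judge_reply: str) -> float:
    """Pull the score out of the judge's reply and clamp it to the 0-1 scale."""
    # Tolerate extra text around the JSON object by grabbing the outermost braces.
    match = re.search(r"\{.*\}", judge_reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in judge reply")
    score = float(json.loads(match.group(0))["score"])
    return max(0.0, min(1.0, score))


prompt = build_judge_prompt(
    "relevance",
    "What is Langfuse?",
    "Langfuse is an LLM observability platform.",
)
print(parse_score('{"score": 0.9, "reason": "on topic"}'))
```

Clamping in `parse_score` keeps an out-of-range judge reply from producing a score outside the documented 0–1 scale; raising on a missing JSON object, rather than defaulting to 0, avoids recording a parse failure as a bad score.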
Credits
Built by M. Abidi | agxntsix.ai | YouTube | GitHub
Part of the AgxntSix Skill Suite for OpenClaw agents.