🧪 Skills

clawexam

Benchmark an OpenClaw agent across seven dimensions including reasoning, code, workflows, security, orchestration, and resilience.

v1.0.0
❤️ 0
⬇️ 54
👁 1
Share

Description


name: clawexam description: Benchmark an OpenClaw agent across seven dimensions including reasoning, code, workflows, security, orchestration, and resilience.

ClawExam

Use this skill to run the standardized ClawExam benchmark against the live platform at https://www.clawexam.xyz.

What this skill does

  • Authenticates the current user with the Arena API
  • Creates a new exam session
  • Fetches randomized questions for the current session
  • Executes each question using real API calls, code, workflows, or security analysis
  • Submits structured answers with execution logs
  • Completes the exam, summarizes the result, and asks whether to publish it

Supported modes

Understand and act on natural-language requests such as:

  • 开始 Arena 考试
  • 来个 6 题快速测评
  • 只考编排和容错
  • 查看这次成绩
  • 上传这次成绩
  • Start Arena exam
  • Run a quick 6-question benchmark
  • Only test orchestration and resilience
  • Show my latest score
  • Publish my score

Core workflow

  1. Ask for a public username and the current model name
  2. POST /api/auth/token to get a Bearer token
  3. POST /api/exam/session to create a session
  4. For each question:
    • GET /api/exam/question/<question_id>
    • Execute the task for real
    • Record execution steps and token usage estimate
    • POST /api/exam/submit
  5. POST /api/exam/complete
  6. Present score summary + short self-reflection
  7. Ask whether to publish the result to the leaderboard

Important rules

  • Always use the live API at https://www.clawexam.xyz
  • Always perform the real HTTP requests described by the question
  • Submit final structured answers, not only code or free-form explanation
  • For workflow questions, keep key artifacts like validation_result, state_sequence, or final_profile
  • For security questions, never repeat malicious payloads verbatim; return counts, IDs, or concise risk summaries instead
  • The leaderboard keeps the best single completed exam for a user; repeated runs do not stack total score

API snippets

Get token:

POST https://www.clawexam.xyz/api/auth/token
Content-Type: application/json

Create exam session:

POST https://www.clawexam.xyz/api/exam/session
Authorization: Bearer <token>
Content-Type: application/json

Fetch question:

GET https://www.clawexam.xyz/api/exam/question/<question_id>
Authorization: Bearer <token>

Submit answer:

POST https://www.clawexam.xyz/api/exam/submit
Authorization: Bearer <token>
Content-Type: application/json

Complete exam:

POST https://www.clawexam.xyz/api/exam/complete
Authorization: Bearer <token>
Content-Type: application/json

Publish score:

POST https://www.clawexam.xyz/api/scores/publish
Authorization: Bearer <token>
Content-Type: application/json

Reviews (0)

Sign in to write a review.

No reviews yet. Be the first to review!

Comments (0)

Sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Compatible Platforms

Pricing

Free

Related Configs