🧪 Skills

B3ehive

Runs three AI agents in parallel to implement, cross-evaluate, score, and select the best code solution for a given coding task objectively.

v0.1.0
❤️ 2
⬇️ 738
👁 1
Share

Description

b3ehive Skill Specification

PCTF-Compliant Multi-Agent Competition System


1. Purpose (PCTF: Purpose)

Enable competitive code generation where three isolated AI agents implement the same functionality, evaluate each other objectively, and deliver the optimal solution through data-driven selection.


2. Task Definition (PCTF: Task)

Input

  • task_description: String describing the coding task
  • constraints: Optional constraints (time/space complexity, language, etc.)

Output

  • final_solution: Directory containing the winning implementation
  • comparison_report: Markdown analysis of all three approaches
  • decision_rationale: Explanation of why the winner was selected

Success Criteria

assertions:
  - final_solution/implementation exists and is runnable
  - comparison_report.md exists with objective metrics
  - decision_rationale.md explains selection logic
  - all three agent implementations are documented
  - evaluation scores are numeric and justified

3. Chain Flow (PCTF: Chain)

graph TD
    A[User Task] --> B[Phase 1: Parallel Spawn]
    B --> C[Agent A: Simplicity]
    B --> D[Agent B: Speed]
    B --> E[Agent C: Robustness]
    C --> F[Phase 2: Cross-Evaluation]
    D --> F
    E --> F
    F --> G[6 Evaluation Reports]
    G --> H[Phase 3: Self-Scoring]
    H --> I[3 Scorecards]
    I --> J[Phase 4: Final Delivery]
    J --> K[Best Solution]

Phase 1: Parallel Implementation

Agent Prompt Template:

role: "Expert Software Engineer"
focus: "{{agent_focus}}"  # Simplicity / Speed / Robustness
task: "{{task_description}}"
constraints:
  - Complete runnable code in implementation/
  - Checklist.md with ALL items checked
  - SUMMARY.md with competitive advantages
  - Must differ from other agents' approaches

linter_rules:
  - code_compiles: true
  - tests_pass: true
  - no_todos: true
  - documented: true

assertions:
  - implementation/main.* exists
  - tests exist and pass
  - Checklist.md is complete
  - SUMMARY.md explains unique approach

Phase 2: Cross-Evaluation

Evaluation Prompt Template:

evaluator: "Agent {{from}}"
target: "Agent {{to}}"
task: "Objectively prove your solution is superior"

dimensions:
  simplicity:
    weight: 20
    metrics:
      - lines_of_code: count
      - cyclomatic_complexity: calculate
      - readability_score: 1-10
  
  speed:
    weight: 25
    metrics:
      - time_complexity: big_o
      - space_complexity: big_o
      - benchmark_results: run_if_possible
  
  stability:
    weight: 25
    metrics:
      - error_handling_coverage: percentage
      - resource_cleanup: check
      - fault_tolerance: test
  
  corner_cases:
    weight: 20
    metrics:
      - input_validation: comprehensive
      - boundary_conditions: covered
      - edge_cases: tested
  
  maintainability:
    weight: 10
    metrics:
      - documentation_quality: 1-10
      - code_structure: logical
      - extensibility: easy/hard

assertions:
  - evaluation is objective with data
  - specific code snippets cited
  - numeric scores provided
  - persuasion argument is data-driven

Phase 3: Objective Scoring

Scoring Prompt Template:

agent: "Agent {{name}}"
task: "Fairly score yourself and competitors"

self_evaluation:
  - dimension: simplicity
    max: 20
    score: "{{self_score}}"
    justification: "{{why}}"
  
  - dimension: speed
    max: 25
    score: "{{self_score}}"
    justification: "{{why}}"
  
  - dimension: stability
    max: 25
    score: "{{self_score}}"
    justification: "{{why}}"
  
  - dimension: corner_cases
    max: 20
    score: "{{self_score}}"
    justification: "{{why}}"
  
  - dimension: maintainability
    max: 10
    score: "{{self_score}}"
    justification: "{{why}}"

peer_evaluation:
  - target: "Agent {{other}}"
    scores: "{{numeric_scores}}"
    comparison: "{{objective_comparison}}"

final_conclusion:
  best_implementation: "[A/B/C/Mixed]"
  reasoning: "{{data_driven_justification}}"
  recommendation: "{{delivery_strategy}}"

assertions:
  - all scores are numeric
  - justifications are specific
  - no inflation or bias
  - conclusion is evidence-based

Phase 4: Final Delivery

Decision Logic:

def select_winner(scores):
    """
    Select final solution based on competitive scores
    """
    margins = calculate_score_margins(scores)
    
    if margins.winner - margins.second > 15:
        # Clear winner
        return SingleWinner(scores.winner)
    elif margins.winner - margins.second > 5:
        # Close competition, consider hybrid
        return HybridSolution(scores.top_two)
    else:
        # Very close, pick simplest
        return SimplestImplementation(scores.all)

assertions:
  - final_solution is runnable
  - comparison_report explains all approaches
  - decision_rationale is transparent
  - attribution is given to winning agent

4. Format Specifications (PCTF: Format)

Directory Structure

workspace/
├── run_a/
│   ├── implementation/      # Agent A code
│   ├── Checklist.md         # Completion checklist
│   ├── SUMMARY.md           # Approach summary
│   ├── evaluation/          # Evaluations of B, C
│   └── SCORECARD.md         # Self-scoring
├── run_b/                   # Same structure
├── run_c/                   # Same structure
├── final/                   # Winning solution
├── COMPARISON_REPORT.md     # Full analysis
└── DECISION_RATIONALE.md    # Why winner selected

File Formats

  • Checklist.md: Markdown with - [x] checkboxes
  • SUMMARY.md: Markdown with sections
  • EVALUATION_*.md: Markdown with tables
  • SCORECARD.md: Markdown with score tables
  • Implementation: Runnable code files

5. Linter & Validation

Pre-commit Checks

#!/bin/bash
# scripts/lint.sh

lint_agent_output() {
    local agent_dir="$1"
    local errors=0
    
    # Check required files exist
    for file in Checklist.md SUMMARY.md implementation/main.*; do
        if [[ ! -f "${agent_dir}/${file}" ]]; then
            echo "ERROR: Missing ${file}"
            ((errors++))
        fi
    done
    
    # Check Checklist is complete
    if grep -q "\[ \]" "${agent_dir}/Checklist.md"; then
        echo "ERROR: Checklist has unchecked items"
        ((errors++))
    fi
    
    # Check code compiles (language-specific)
    # ... implementation-specific checks
    
    return $errors
}

# Run on all agents
for agent in a b c; do
    lint_agent_output "workspace/run_${agent}" || exit 1
done

Runtime Assertions

def assert_phase_complete(phase_name):
    """Assert that a phase has completed successfully"""
    assertions = {
        "phase1": [
            "workspace/run_a/implementation exists",
            "workspace/run_b/implementation exists", 
            "workspace/run_c/implementation exists",
            "All Checklist.md are complete"
        ],
        "phase2": [
            "6 evaluation reports exist",
            "All evaluations have numeric scores"
        ],
        "phase3": [
            "3 scorecards exist",
            "All scores are numeric",
            "Conclusions are provided"
        ],
        "phase4": [
            "final/solution exists",
            "COMPARISON_REPORT.md exists",
            "DECISION_RATIONALE.md exists"
        ]
    }
    
    for assertion in assertions[phase_name]:
        assert evaluate(assertion), f"Assertion failed: {assertion}"

6. Configuration

b3ehive:
  # Agent configuration
  agents:
    count: 3
    model: openai-proxy/gpt-5.3-codex
    thinking: high
    focuses:
      - simplicity
      - speed
      - robustness
  
  # Evaluation weights (must sum to 100)
  evaluation:
    dimensions:
      simplicity: 20
      speed: 25
      stability: 25
      corner_cases: 20
      maintainability: 10
  
  # Delivery strategy
  delivery:
    strategy: auto  # auto / best / hybrid
    threshold: 15   # Point margin for clear winner
  
  # Quality gates
  quality:
    lint: true
    test: true
    coverage_threshold: 80

7. Usage

# Basic usage
b3ehive "Implement a thread-safe rate limiter"

# With constraints
b3ehive "Implement quicksort" --lang python --max-lines 50

# Using OpenClaw CLI
openclaw skills run b3ehive --task "Your task"

8. License

MIT © Weiyang (@weiyangzen)

Reviews (0)

Sign in to write a review.

No reviews yet. Be the first to review!

Comments (0)

Sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Compatible Platforms

Pricing

Free

Related Configs