Live-Evo: Online Evolution with Verified Experiences
Description
name: live-evo
description: Self-evolving memory system that learns from verifiable tasks. Use when completing tasks where you can verify the outcome (coding, predictions, analysis). Automatically retrieves relevant past experiences and generates task-specific guidelines.
user-invocable: true
disable-model-invocation: false
allowed-tools: Bash(python *), Read, Write, Edit
Live-Evo: Online Self-Evolving Memory
You are using the Live-Evo memory system that learns from past mistakes through experience accumulation and adaptive evaluation.
IMPORTANT: Script location. All scripts are in the scripts/ subdirectory next to this SKILL.md file. When running a script, use its absolute path, resolved from this file's location. For example, if this SKILL.md is at /path/to/live-evo/SKILL.md, the scripts are at /path/to/live-evo/scripts/.
Experience data is stored persistently at ~/.live-evo/experience_db.jsonl (independent of skill installation location).
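The path rule above can be sketched in Python as follows. The helper names `scripts_dir_for` and `build_retrieve_command` are hypothetical and not part of Live-Evo. Only the scripts/ layout, the `retrieve.py --query` invocation, and the ~/.live-evo/experience_db.jsonl location come from this document.

```python
from pathlib import Path

# Location of the experience store, as stated in this document.
EXPERIENCE_DB = Path.home() / ".live-evo" / "experience_db.jsonl"


def scripts_dir_for(skill_md: str) -> Path:
    """Return the absolute scripts/ directory that sits next to SKILL.md."""
    return Path(skill_md).resolve().parent / "scripts"


def build_retrieve_command(skill_md: str, query: str) -> list[str]:
    """Build (but do not run) the retrieve.py command for a task description."""
    script = scripts_dir_for(skill_md) / "retrieve.py"
    return ["python", str(script), "--query", query]


if __name__ == "__main__":
    # Hypothetical install path and query, used only for illustration.
    print(build_retrieve_command("/path/to/live-evo/SKILL.md", "fix failing date parser test"))
    print(EXPERIENCE_DB)
```

Passing the command as a list keeps a query that contains spaces or quotes as a single argument if it is later handed to `subprocess.run`.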
Core Workflow
1. Retrieve & Compile (Before Acting)
Run the experience retrieval script to find relevant past experiences:
python <scripts-dir>/retrieve.py --query "YOUR_TASK_DESCRIPTION"
If experiences are found, they will be compiled into a task-specific guideline. Use this guideline to inform your approach.
2. Decide: Verify or Direct Apply
You must judge whether contrastive verification (two attempts) is worthwhile based on:
| Factor | Do Contrastive Eval | Skip, Direct Apply |
|---|---|---|
| Cost of re-running | Low (e.g. run a test) | High (e.g. long build, API costs, heavy computation) |
| Verifiability | Clear ground truth exists (tests, known answer) | No easy way to verify programmatically |
| Task complexity | Simple enough to attempt twice | Too complex/large to reasonably duplicate |
| Guideline relevance | Retrieved guideline is highly relevant | Guideline is loosely related or no guideline found |
If contrastive eval IS worthwhile → Go to Step 2A
If contrastive eval is NOT worthwhile → Go to Step 2B
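One way to make the table concrete is a small decision helper. This is a sketch, not part of Live-Evo: `TaskProfile` and `choose_path` are hypothetical names, and requiring all four factors to favor contrastive evaluation is an assumed, conservative reading of the table. The document leaves the final judgment to you.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical summary of the four factors in the table above."""
    rerun_cost_low: bool       # cheap to run twice (e.g. a unit test)
    has_ground_truth: bool     # tests or a known answer exist
    simple_enough: bool        # small enough to attempt twice
    guideline_relevant: bool   # a highly relevant guideline was retrieved


def choose_path(profile: TaskProfile) -> str:
    """Return "2A" only when every factor favors contrastive eval, else "2B"."""
    favors_contrastive = (
        profile.rerun_cost_low
        and profile.has_ground_truth
        and profile.simple_enough
        and profile.guideline_relevant
    )
    return "2A" if favors_contrastive else "2B"


if __name__ == "__main__":
    quick_bug_fix = TaskProfile(True, True, True, True)
    long_build = TaskProfile(False, True, True, True)
    print(choose_path(quick_bug_fix), choose_path(long_build))
```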
Step 2A: Contrastive Evaluation (Two Attempts)
Make two independent attempts:
Attempt A (Without Memory):
- Solve the task using only your base knowledge
- Record your answer/approach
Attempt B (With Guideline):
- Apply the retrieved guideline
- Solve the task with this informed approach
- Record your answer/approach
Then verify and update weights:
python <scripts-dir>/update.py \
--task "TASK_DESCRIPTION" \
--result-a "RESULT_WITHOUT_MEMORY" \
--result-b "RESULT_WITH_GUIDELINE" \
--correct "CORRECT_ANSWER" \
--experience-ids "id1,id2,..."
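How update.py adjusts weights is not described here, so the following is only an illustrative sketch of a contrastive update. The function name `updated_weight`, the exact-match comparison, and the step size of 0.1 are all assumptions. The 0.1-2.0 bounds are the ones given for the weight field under Experience Format.

```python
MIN_WEIGHT, MAX_WEIGHT = 0.1, 2.0  # bounds given for the weight field
STEP = 0.1  # assumed step size, chosen only for illustration


def updated_weight(weight: float, result_a: str, result_b: str, correct: str) -> float:
    """Raise the weight if only the guided attempt (B) was right,
    lower it if only the unguided attempt (A) was right, else keep it."""
    a_ok = result_a.strip() == correct.strip()
    b_ok = result_b.strip() == correct.strip()
    if b_ok and not a_ok:
        weight += STEP   # the guideline helped
    elif a_ok and not b_ok:
        weight -= STEP   # the guideline hurt
    # Clamp to the documented range and round away float noise.
    return round(min(MAX_WEIGHT, max(MIN_WEIGHT, weight)), 2)


if __name__ == "__main__":
    print(updated_weight(1.0, "41", "42", "42"))
    print(updated_weight(1.0, "42", "41", "42"))
```

When both attempts agree (both right or both wrong), the comparison carries no signal about the guideline, so the sketch leaves the weight unchanged.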
Step 2B: Direct Apply with Feedback-Based Learning
When contrastive evaluation is not feasible:
- Apply the guideline directly (if one was retrieved) and complete the task
- Observe feedback from any of these sources:
  - User feedback (corrections, complaints, approval)
  - Environment signals (test results, error messages, build output)
  - Outcome observation (did the result work as expected?)
- Store experience directly if feedback reveals a lesson:
python <scripts-dir>/add_experience.py \
--question "THE_TASK_QUESTION" \
--failure-reason "What went wrong (from feedback)" \
--improvement "Key lesson learned" \
--category "coding|analysis|prediction|debugging|other"
No contrastive comparison needed — just learn from what happened.
3. Add New Experience (On Any Failure)
Whenever a task fails or feedback reveals a learnable lesson — regardless of which path you took — store the experience:
python <scripts-dir>/add_experience.py \
--question "THE_TASK_QUESTION" \
--failure-reason "What went wrong" \
--improvement "Key lesson learned" \
--category "coding|analysis|prediction|debugging|other"
4. Update Weights (When Possible)
If you used a retrieved guideline and can determine whether it helped:
python <scripts-dir>/update.py \
--task "TASK_DESCRIPTION" \
--result-a "WHAT_WOULD_HAVE_HAPPENED" \
--result-b "WHAT_ACTUALLY_HAPPENED" \
--correct "CORRECT_OUTCOME" \
--experience-ids "id1,id2,..."
If you cannot determine whether the guideline helped, skip weight updates — no update is better than a wrong update.
When to Use Live-Evo
Use this system for:
- Coding tasks: Bug fixes, implementations where tests can verify
- Analysis tasks: Where ground truth can be checked
- Predictions: Forecasting with eventual verification
- Problem solving: Tasks with objectively correct answers
- Any task with user feedback: Even without formal verification, user corrections are valuable signals
Experience Format
Each experience contains:
- question: The original task/question
- failure_reason: What went wrong in the original attempt
- improvement: Key lesson or approach that would have helped
- missed_information: Information sources or considerations that were missed
- weight: Quality score (0.1-2.0) updated based on usefulness
- category: Domain category for filtering
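As a rough illustration of these fields, the sketch below builds one record and appends it to a JSONL file, one JSON object per line. The helper names, the default weight of 1.0, and the exact on-disk layout are assumptions. The real add_experience.py may store additional fields (for example an ID) or use different defaults.

```python
import json
import tempfile
from pathlib import Path


def make_experience(question, failure_reason, improvement,
                    missed_information="", category="other", weight=1.0):
    """Build one record with the six fields listed above (weight default is assumed)."""
    if not 0.1 <= weight <= 2.0:
        raise ValueError("weight must stay within 0.1-2.0")
    return {
        "question": question,
        "failure_reason": failure_reason,
        "improvement": improvement,
        "missed_information": missed_information,
        "weight": weight,
        "category": category,
    }


def append_experience(db_path: Path, record: dict) -> None:
    """Append one record as a single JSON line, creating the folder if needed."""
    db_path.parent.mkdir(parents=True, exist_ok=True)
    with db_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_experiences(db_path: Path) -> list[dict]:
    """Read every non-empty line back as a dict."""
    with db_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    # Write to a temporary file so the real experience store is untouched.
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "experience_db.jsonl"
        append_experience(db, make_experience(
            "Parse ISO dates with time zones",
            "Dropped the UTC offset",
            "Keep tzinfo when parsing timestamps",
            category="coding",
        ))
        print(load_experiences(db))
```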
Key Principles
- Cost-Aware Verification: Only do contrastive evaluation when the cost is justified — don't waste tokens/time on expensive double-runs
- Feedback is Gold: User corrections, test failures, and error messages are direct learning signals — always store these
- Selective Acquisition: Only store experiences that contain a genuine, actionable lesson
- Weight-based Retrieval: Good experiences rise, bad ones fade
- Task-Specific Guidelines: Don't apply raw experiences — synthesize them into actionable guidance
- When in Doubt, Store: If you are unsure whether a lesson is actionable enough, store it anyway; low-quality experiences will naturally decay via weight updates
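To show how weight-based retrieval can let good experiences rise and bad ones fade, here is a minimal ranking sketch. It is not the logic of retrieve.py, whose similarity measure is not described in this document. Token-overlap (Jaccard) similarity is a simple stand-in, and the names `tokenize`, `score`, and `top_k` are hypothetical.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def score(query: str, experience: dict) -> float:
    """Jaccard overlap between the query and the stored question, scaled by weight."""
    q, e = tokenize(query), tokenize(experience["question"])
    if not q or not e:
        return 0.0
    return len(q & e) / len(q | e) * experience["weight"]


def top_k(query: str, experiences: list[dict], k: int = 5) -> list[dict]:
    """Return up to k experiences with a non-zero score, best first."""
    scored = [(score(query, exp), exp) for exp in experiences]
    matching = [pair for pair in scored if pair[0] > 0]
    matching.sort(key=lambda pair: pair[0], reverse=True)
    return [exp for _, exp in matching[:k]]


if __name__ == "__main__":
    store = [
        {"question": "fix date parser bug", "weight": 0.4},
        {"question": "fix date parser bug", "weight": 1.6},
        {"question": "forecast quarterly revenue", "weight": 2.0},
    ]
    for exp in top_k("date parser bug", store):
        print(exp["weight"], exp["question"])
```

Multiplying similarity by weight means an experience that has repeatedly hurt outcomes drops in the ranking even when its wording matches the query closely.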
Manual Commands
View all experiences:
python <scripts-dir>/list_experiences.py
Search experiences:
python <scripts-dir>/retrieve.py --query "your search query" --top-k 5
Get statistics:
python <scripts-dir>/stats.py
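What stats.py actually reports is not specified here. As a hypothetical example of the kind of summary that can be derived from stored experiences, the sketch below counts records and averages weights per category. The function name `summarize` and the fallback values for missing fields are assumptions.

```python
from collections import defaultdict


def summarize(experiences: list[dict]) -> dict[str, dict]:
    """Count records and average their weights per category."""
    grouped = defaultdict(list)
    for exp in experiences:
        # Fall back to assumed defaults when a field is missing.
        grouped[exp.get("category", "other")].append(exp.get("weight", 1.0))
    return {
        category: {
            "count": len(weights),
            "mean_weight": round(sum(weights) / len(weights), 2),
        }
        for category, weights in grouped.items()
    }


if __name__ == "__main__":
    sample = [
        {"category": "coding", "weight": 1.0},
        {"category": "coding", "weight": 2.0},
        {"category": "prediction", "weight": 0.5},
    ]
    print(summarize(sample))
```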