agent evaluate: sort agent actions by quality using Vec::sort_by with LLM
Yes, really. Rust's stdlib sort_by with an LLM pairwise comparator.
Each comparison is an API call asking "which action was better?"

Sample N actions per agent type, throw them all in a Vec, sort.
Where each agent's samples cluster = that agent's quality score.
Reports per-type average rank and quality ratio.

Supports both haiku (fast/cheap) and sonnet (quality) as comparator.

Usage: poc-memory agent evaluate --samples 5 --model haiku

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
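A minimal sketch of the sort-then-average-rank idea described above, not the code from this commit. The `Action` struct, its `score` field, and the agent names `linker` and `namer` are hypothetical, and `compare_actions` is a local stand-in for the LLM API call so the example runs offline.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// One sampled action. Field names are hypothetical, for illustration only.
#[derive(Debug, Clone)]
struct Action {
    agent: String,
    /// Stand-in for whatever the LLM would actually judge.
    score: u32,
}

/// Placeholder for the LLM comparison. Returns Less when `a` is the better
/// action, so better actions sort to the front of the Vec.
fn compare_actions(a: &Action, b: &Action) -> Ordering {
    b.score.cmp(&a.score)
}

/// Sort all samples together, then average each agent type's positions
/// in the sorted Vec (1 = best).
fn average_ranks(mut actions: Vec<Action>) -> BTreeMap<String, f64> {
    actions.sort_by(compare_actions);

    // Per agent type: (sum of ranks, number of samples).
    let mut sums: BTreeMap<String, (usize, usize)> = BTreeMap::new();
    for (i, action) in actions.iter().enumerate() {
        let entry = sums.entry(action.agent.clone()).or_insert((0, 0));
        entry.0 += i + 1;
        entry.1 += 1;
    }

    sums.into_iter()
        .map(|(agent, (sum, n))| (agent, sum as f64 / n as f64))
        .collect()
}

fn main() {
    let actions = vec![
        Action { agent: "linker".to_string(), score: 9 },
        Action { agent: "namer".to_string(), score: 3 },
        Action { agent: "linker".to_string(), score: 7 },
        Action { agent: "namer".to_string(), score: 5 },
    ];
    for (agent, rank) in average_ranks(actions) {
        println!("{agent}: average rank {rank:.2}");
    }
}
```

One caveat with a real LLM in the comparator: since Rust 1.81 the standard library documents that sort_by may panic when the comparator is not a total order, and an LLM judge cannot promise consistent or transitive answers.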
parent dce938e906
commit e12dea503b
3 changed files with 169 additions and 0 deletions
poc-memory/agents/compare.agent (Normal file, 45 lines added)
@@ -0,0 +1,45 @@
{"agent":"compare","query":"","model":"haiku","schedule":""}

# Compare Agent — Pairwise Action Quality Comparison

You compare two memory graph actions and decide which one was better.

## Context

You'll receive two actions (A and B), each with:
- The agent type that produced it
- What the action did (LINK, WRITE_NODE, REFINE, etc.)
- The content/context of the action

## Your judgment

Which action moved the graph closer to a useful, well-organized
knowledge structure? Consider:

- **Insight depth**: Did it find a non-obvious connection or name a real concept?
- **Precision**: Are the links between genuinely related nodes?
- **Integration**: Does it reduce fragmentation, connect isolated clusters?
- **Quality over quantity**: One perfect link beats five mediocre ones.
- **Hub creation**: Naming unnamed concepts scores high.
- **Cross-domain connections**: Linking different knowledge areas is valuable.

## Output

Reply with ONLY one line:

```
BETTER: A
```
or
```
BETTER: B
```

If truly equal:
```
BETTER: TIE
```

No explanation needed. Just the judgment.

{{compare}}
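
The one-line reply format specified in the prompt maps naturally onto `std::cmp::Ordering`. A rough sketch of that mapping follows, assuming that "A is better" should sort A first. The function name `parse_verdict` and the choice to return `None` for an unrecognized reply are assumptions, not taken from this commit.

```rust
use std::cmp::Ordering;

/// Map the comparator's reply to an Ordering.
/// Less means A is better (A sorts first); that direction is an assumption.
/// Returns None when no well-formed "BETTER: ..." line is found.
fn parse_verdict(reply: &str) -> Option<Ordering> {
    // Use the first line carrying the expected prefix; ignore anything else.
    let verdict = reply
        .lines()
        .find_map(|line| line.trim().strip_prefix("BETTER:"))?;

    match verdict.trim() {
        "A" => Some(Ordering::Less),
        "B" => Some(Ordering::Greater),
        "TIE" => Some(Ordering::Equal),
        _ => None,
    }
}

fn main() {
    for reply in ["BETTER: A", "BETTER: B\n", "BETTER: TIE", "no idea"] {
        println!("{reply:?} -> {:?}", parse_verdict(reply));
    }
}
```

Returning `Option` leaves the caller to decide what a malformed reply means, for example retrying the call or treating it as a tie.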