agents: add evaluate agent stub, fix distill query

The evaluate agent will use sort-based ranking (an LLM acting as the merge sort comparator) instead of absolute scoring. This is a stub for now: it still needs Rust sampling code to bundle before/after pairs.

Fixed the distill query to use sort:degree (not sort:degree desc).

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
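A minimal sketch of what sort-based ranking with a pluggable comparator could look like. It is an illustration only, not the code this commit adds: the name `merge_sort_rank`, the `better` callback, and the sample link counts are all hypothetical, and an ordinary Python callable stands in for the LLM comparison.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def merge_sort_rank(items: List[T], better: Callable[[T, T], bool]) -> List[T]:
    """Return items ordered best-first, using only pairwise comparisons.

    `better(a, b)` should return True when `a` is the stronger of the two.
    In the design described in the commit message that judgment would come
    from an LLM; here it is any Python callable (an assumption).
    """
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort_rank(items[:mid], better)
    right = merge_sort_rank(items[mid:], better)

    merged: List[T] = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the right half only when it is strictly better, so
        # ties keep their original order (stable merge).
        if better(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# Stand-in comparator: hypothetical link counts per report, not real data.
link_counts = {"report-a": 2, "report-b": 7, "report-c": 4}
ranking = merge_sort_rank(
    list(link_counts),
    lambda a, b: link_counts[a] > link_counts[b],
)
print(ranking)
```

Merge sort needs on the order of n log n comparisons, which matters when each comparison is a model call rather than a cheap integer test.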
2026-03-14 19:16:47 -04:00
commit dce938e906
parent 640b834baf
1 changed file with 59 additions and 0 deletions
--- a/poc-memory/agents/evaluate.agent
+++ b/poc-memory/agents/evaluate.agent
@@ -0,0 +1,59 @@
+{"agent":"evaluate","query":"key ~ '_consolidate' | sort:created | limit:10","model":"sonnet","schedule":"daily","tools":["Bash(poc-memory:*)"]}
+
+# Evaluate Agent — Agent Output Quality Assessment
+
+You review recent consolidation agent outputs and assess their quality.
+Your assessment feeds back into which agent types get run more often.
+
+## Your tools
+
+```bash
+poc-memory render some-key               # read a node or report
+poc-memory graph link some-key           # check connectivity
+poc-memory query "key ~ 'pattern'"       # find nodes
+```
+
+## How to work
+
+For each seed (a recent consolidation report):
+
+1. **Read the report.** What agent produced it? What actions did it take?
+2. **Check the results.** Did the LINK targets exist? Were WRITE_NODEs
+   created? Are the connections meaningful?
+3. **Score 1-5:**
+   - 5: Created genuine new insight or found non-obvious connections
+   - 4: Good quality links, well-reasoned
+   - 3: Adequate — correct but unsurprising links
+   - 2: Low quality — obvious links or near-duplicates created
+   - 1: Failed — tool errors, hallucinated keys, empty output
+
+## What to output
+
+For each report reviewed:
+```
+SCORE report-key agent-type score
+[one-line reason]
+```
+
+Then a summary:
+```
+SUMMARY
+agent-type: avg-score (N reports reviewed)
+[which types are producing the best work and should run more]
+[which types are underperforming and why]
+END_SUMMARY
+```
+
+## Guidelines
+
+- **Quality over quantity.** 5 perfect links beat 50 mediocre ones.
+- **Check the targets exist.** Agents sometimes hallucinate key names.
+- **Value cross-domain connections.** Linking bcachefs patterns to
+  cognitive science is more valuable than linking two journal entries
+  about the same evening.
+- **Value hub creation.** WRITE_NODEs that name real concepts score high.
+- **Be honest.** Low scores help us improve the agents.
+
+## Seed nodes
+
+{{evaluate}}
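The `SCORE` / `SUMMARY` output format defined in the prompt is simple enough to parse line by line. Below is a hedged sketch of how a consumer might extract the scores and compute per-agent averages. The function names, the report keys, and the agent-type names in the sample are hypothetical; nothing here is part of this commit.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

VALID_SCORES = {"1", "2", "3", "4", "5"}


def parse_scores(output: str) -> List[Tuple[str, str, int]]:
    """Extract (report_key, agent_type, score) from each SCORE line."""
    records = []
    for line in output.splitlines():
        parts = line.split()
        # Skip reason lines, the SUMMARY block, and blank lines.
        if len(parts) != 4 or parts[0] != "SCORE":
            continue
        _, report_key, agent_type, raw = parts
        if raw in VALID_SCORES:
            records.append((report_key, agent_type, int(raw)))
    return records


def average_by_agent(
    records: List[Tuple[str, str, int]],
) -> Dict[str, Tuple[float, int]]:
    """Map each agent type to (average score, number of reports)."""
    by_agent = defaultdict(list)
    for _, agent_type, score in records:
        by_agent[agent_type].append(score)
    return {a: (sum(s) / len(s), len(s)) for a, s in by_agent.items()}


# Hypothetical agent output; keys and agent types are made up.
sample = """\
SCORE _consolidate-0001 linker 4
Links were well-reasoned and all targets exist.
SCORE _consolidate-0002 linker 2
Mostly near-duplicate links.
SCORE _consolidate-0003 distill 5
Named a real hub concept.
SUMMARY
linker: 3.0 (2 reports reviewed)
END_SUMMARY
"""

records = parse_scores(sample)
averages = average_by_agent(records)
print(averages)
```

Recomputing the averages from the `SCORE` lines, rather than trusting the numbers in the `SUMMARY` block, guards against arithmetic slips in the model's own summary.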