4 commits

Author SHA1 Message Date
ProofOfConcept
b964335317 evaluate: include agent prompt + affected nodes in comparisons
Each comparison now shows the LLM:
- Agent instructions (the .agent prompt file)
- Report output (what the agent did)
- Affected nodes content (what it changed)

The comparator sees intent, action, and impact, so it can judge whether
a deletion was correct, whether links are meaningful, and whether
WRITE_NODEs capture real insights.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:34:10 -04:00
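
A minimal sketch of how one comparison input might be assembled from the three pieces listed in the commit above (agent instructions, report output, affected node content). The `ActionSample` struct, its field names, the section labels, and the closing question wording are all assumptions made for illustration; the commit does not show the actual data structures.

```rust
/// Hypothetical record of one sampled agent action (field names are assumptions).
struct ActionSample {
    agent_type: String,
    /// Contents of the .agent prompt file.
    instructions: String,
    /// What the agent reported doing.
    report: String,
    /// (node key, node content) pairs for the nodes the action touched.
    affected_nodes: Vec<(String, String)>,
}

/// Render one action as a labeled section: intent, action, then impact.
fn render_action(label: &str, a: &ActionSample) -> String {
    let mut out = format!("=== Action {} (agent type: {}) ===\n", label, a.agent_type);
    out.push_str(&format!("Agent instructions:\n{}\n\n", a.instructions));
    out.push_str(&format!("Report output:\n{}\n\n", a.report));
    out.push_str("Affected nodes:\n");
    for (key, content) in &a.affected_nodes {
        out.push_str(&format!("[{}]\n{}\n", key, content));
    }
    out
}

/// Build the text for one pairwise comparison between two actions.
fn comparison_prompt(a: &ActionSample, b: &ActionSample) -> String {
    format!(
        "{}\n{}\nWhich action was better? Answer A or B.\n",
        render_action("A", a),
        render_action("B", b)
    )
}

fn main() {
    let a = ActionSample {
        agent_type: "digest".into(),
        instructions: "Summarize the day's entries.".into(),
        report: "Wrote one summary node.".into(),
        affected_nodes: vec![("example-node".into(), "Summary text.".into())],
    };
    let b = ActionSample {
        agent_type: "replay".into(),
        instructions: "Revisit old nodes and link them.".into(),
        report: "Added two links.".into(),
        affected_nodes: vec![],
    };
    println!("{}", comparison_prompt(&a, &b));
}
```

Putting all three sections side by side is what lets a comparator weigh what the agent was asked to do against what it actually changed.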
ProofOfConcept
433d36aea8 evaluate: use rayon par_sort_by for parallel LLM comparisons
Merge sort parallelizes naturally: independent sub-ranges are sorted at
the same time, so multiple LLM comparison calls happen concurrently.
This is safe because merge sort still terminates when the comparator is
non-deterministic, which some quicksort implementations do not guarantee.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:27:28 -04:00
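
To illustrate why merge sort lends itself to concurrent comparisons, here is a dependency-free sketch that sorts the two halves on separate threads with `std::thread::scope`. It is a hand-rolled stand-in, not rayon's `par_sort_by` (which the commit above actually uses), and the closure passed as `cmp` is a placeholder for the LLM call. The function names `par_merge_sort` and `merge` are hypothetical.

```rust
use std::cmp::Ordering;
use std::thread;

/// Merge two sorted runs. Every loop iteration consumes one element, so this
/// finishes after at most left.len() + right.len() - 1 comparator calls,
/// whatever answers the comparator gives.
fn merge<T, F>(left: Vec<T>, right: Vec<T>, cmp: &F) -> Vec<T>
where
    F: Fn(&T, &T) -> Ordering,
{
    let mut out = Vec::with_capacity(left.len() + right.len());
    let mut l = left.into_iter().peekable();
    let mut r = right.into_iter().peekable();
    while let (Some(a), Some(b)) = (l.peek(), r.peek()) {
        // Take from the right run only when it is strictly smaller (stable).
        if cmp(b, a) == Ordering::Less {
            out.extend(r.next());
        } else {
            out.extend(l.next());
        }
    }
    out.extend(l);
    out.extend(r);
    out
}

/// Merge sort whose halves are sorted on separate threads.
/// `cmp` must be `Sync` because both threads call it.
/// Sketch only: it spawns a thread per split, with no depth cutoff.
fn par_merge_sort<T, F>(items: Vec<T>, cmp: &F) -> Vec<T>
where
    T: Send,
    F: Fn(&T, &T) -> Ordering + Sync,
{
    if items.len() <= 1 {
        return items;
    }
    let mut left = items;
    let right = left.split_off(left.len() / 2);

    // The halves share no data, so their comparator calls can overlap in time.
    let (left, right) = thread::scope(|s| {
        let handle = s.spawn(|| par_merge_sort(right, cmp));
        let left = par_merge_sort(left, cmp);
        (left, handle.join().expect("sort thread panicked"))
    });

    merge(left, right, cmp)
}

fn main() {
    // Stub comparator: plain integer order in place of an LLM verdict.
    let sorted = par_merge_sort(vec![5, 3, 9, 1, 4], &|a: &i32, b: &i32| a.cmp(b));
    println!("{:?}", sorted);
}
```

With an inconsistent comparator this sketch still returns a permutation of its input, since termination depends only on the run lengths. The resulting order is then only as meaningful as the comparator's answers.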
ProofOfConcept
e12dea503b agent evaluate: sort agent actions by quality using Vec::sort_by with LLM
Yes, really. Rust's stdlib sort_by with an LLM pairwise comparator.
Each comparison is an API call asking "which action was better?"

Sample N actions per agent type, throw them all in a Vec, sort.
Where an agent type's samples cluster in the sorted order is that
type's quality score. Reports per-type average rank and quality ratio.

Supports both haiku (fast/cheap) and sonnet (quality) as comparator.

Usage: poc-memory agent evaluate --samples 5 --model haiku

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:24:07 -04:00
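
The approach in the commit above can be sketched as follows: pool the sampled actions, sort them best-first with a pairwise comparator, then average each agent type's positions. This is an illustration under stated assumptions, not the actual implementation. `Action`, `hidden_quality`, `llm_compare`, and `average_ranks` are hypothetical names, and the comparator is a deterministic stub where the real one would make an API call. The quality ratio mentioned in the commit is not defined there, so it is not shown.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Hypothetical sampled action. `hidden_quality` exists only so the stub
/// comparator below has something to compare.
#[derive(Clone, Debug)]
struct Action {
    agent_type: &'static str,
    hidden_quality: u32,
}

/// Stand-in for the LLM call ("which action was better?").
/// Returns `Less` when `a` is the better action, so better actions sort first.
fn llm_compare(a: &Action, b: &Action) -> Ordering {
    b.hidden_quality.cmp(&a.hidden_quality)
}

/// Sort all samples best-first, then average the 1-based positions per agent type.
/// A lower average rank means that type's samples cluster nearer the top.
fn average_ranks(mut actions: Vec<Action>) -> BTreeMap<&'static str, f64> {
    actions.sort_by(llm_compare);

    // Per agent type: (sum of ranks, number of samples).
    let mut sums: BTreeMap<&'static str, (usize, usize)> = BTreeMap::new();
    for (i, action) in actions.iter().enumerate() {
        let entry = sums.entry(action.agent_type).or_insert((0, 0));
        entry.0 += i + 1;
        entry.1 += 1;
    }

    sums.into_iter()
        .map(|(agent, (sum, n))| (agent, sum as f64 / n as f64))
        .collect()
}

fn main() {
    let samples = vec![
        Action { agent_type: "replay", hidden_quality: 9 },
        Action { agent_type: "digest", hidden_quality: 8 },
        Action { agent_type: "replay", hidden_quality: 7 },
        Action { agent_type: "digest", hidden_quality: 2 },
    ];
    for (agent, rank) in average_ranks(samples) {
        println!("{agent}: average rank {rank:.1}");
    }
}
```

One caveat worth keeping in mind: the standard library documents that `sort_by` may panic, or leave the elements in an unspecified order, when the comparator is not a total order, which a real LLM comparator is unlikely to be.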
ProofOfConcept
f423cf22df cli: extract agent and admin commands from main.rs
Move agent handlers (consolidate, replay, digest, experience-mine,
fact-mine, knowledge-loop, apply-*) into cli/agent.rs.

Move admin handlers (init, fsck, dedup, bulk-rename, health,
daily-check, import, export) into cli/admin.rs.

Functions tightly coupled to Clap types (cmd_daemon, cmd_digest,
cmd_apply_agent, cmd_experience_mine) remain in main.rs.

main.rs: 3130 → 1586 lines (49% reduction).

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 18:06:27 -04:00