Commit graph

1 commit

Author SHA1 Message Date
ProofOfConcept
e12dea503b agent evaluate: sort agent actions by quality using Vec::sort_by with LLM
Yes, really. Rust's stdlib sort_by with an LLM pairwise comparator.
Each comparison is an API call asking "which action was better?"

Sample N actions per agent type, throw them all in a Vec, sort.
Where each agent's samples cluster = that agent's quality score.
Reports per-type average rank and quality ratio.

Supports both haiku (fast/cheap) and sonnet (quality) as comparator.

Usage: poc-memory agent evaluate --samples 5 --model haiku

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:24:07 -04:00