Replace sort-based ranking with proper Elo system:

- Each agent TYPE has a persistent Elo rating (agent-elo.json)
- Each matchup: pick two random types, grab a recent action from each, LLM compares, update ratings
- Ratings persist across daily evaluations: natural recency bias from continuous updates against current opponents
- K=32 for fast adaptation to prompt changes

Usage: poc-memory agent evaluate --matchups 30 --model haiku

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
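The rating update described in the commit message can be sketched as follows. This is a minimal illustration of the standard Elo formula with K=32, not the code from this commit: the function names, the `HashMap` representation, and the starting rating of 1000 are assumptions, and loading or saving agent-elo.json and the LLM comparison itself are left out.

```rust
use std::collections::HashMap;

/// K-factor taken from the commit message (K=32).
const K: f64 = 32.0;

/// Starting rating for an agent type with no history yet.
/// Assumption: the commit message does not state the initial value.
const INITIAL_RATING: f64 = 1000.0;

/// Standard Elo expected score of A against B, in the range (0, 1).
fn expected_score(rating_a: f64, rating_b: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf((rating_b - rating_a) / 400.0))
}

/// Apply one matchup result to the in-memory ratings table.
/// `winner` and `loser` are agent type names (hypothetical keys).
fn record_matchup(ratings: &mut HashMap<String, f64>, winner: &str, loser: &str) {
    let rw = *ratings.get(winner).unwrap_or(&INITIAL_RATING);
    let rl = *ratings.get(loser).unwrap_or(&INITIAL_RATING);

    // The winner scored 1.0; the adjustment shrinks as the win becomes
    // more expected. The loser loses exactly what the winner gains.
    let delta = K * (1.0 - expected_score(rw, rl));

    ratings.insert(winner.to_string(), rw + delta);
    ratings.insert(loser.to_string(), rl - delta);
}
```

With two previously unseen types, calling `record_matchup(&mut ratings, "type_a", "type_b")` moves the winner up by 16 points and the loser down by 16, since each side was expected to score 0.5. Because ratings carry over between runs, later matchups shift them relative to whatever opponents are currently in rotation.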
||
|---|---|---|
| .claude | ||
| agents | ||
| defaults | ||
| schema | ||
| src | ||
| build.rs | ||
| Cargo.toml | ||
| config.example.jsonl | ||