consciousness

Author	SHA1	Message	Date
ProofOfConcept	0cecfdb352	evaluate: fix agent prompt path, dedup affected nodes, add --dry-run - Use CARGO_MANIFEST_DIR for agent file path (same as defs.rs) - Dedup affected nodes extracted from reports - --dry-run shows example comparison prompt without LLM calls Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>	2026-03-14 19:44:12 -04:00
ProofOfConcept	415180eeab	evaluate: ask for reasoning in comparisons Chain-of-thought: "say which is better and why" forces clearer judgment and gives us analysis data for improving agents. Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>	2026-03-14 19:36:55 -04:00
ProofOfConcept	39e3d69e3c	evaluate: dedup agent prompt when comparing same agent type When both actions are from the same agent, show the instructions once and just compare the two report outputs + affected nodes. Saves tokens and makes the comparison cleaner. Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>	2026-03-14 19:35:11 -04:00
ProofOfConcept	b964335317	evaluate: include agent prompt + affected nodes in comparisons Each comparison now shows the LLM: - Agent instructions (the .agent prompt file) - Report output (what the agent did) - Affected nodes content (what it changed) The comparator sees intent, action, and impact — can judge whether a deletion was correct, whether links are meaningful, whether WRITE_NODEs capture real insights. Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>	2026-03-14 19:34:10 -04:00
ProofOfConcept	433d36aea8	evaluate: use rayon par_sort_by for parallel LLM comparisons Merge sort parallelizes naturally — multiple LLM comparison calls happen concurrently. Safe because merge sort terminates correctly even with non-deterministic comparators (unlike quicksort). Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>	2026-03-14 19:27:28 -04:00
ProofOfConcept	e12dea503b	agent evaluate: sort agent actions by quality using Vec::sort_by with LLM Yes, really. Rust's stdlib sort_by with an LLM pairwise comparator. Each comparison is an API call asking "which action was better?" Sample N actions per agent type, throw them all in a Vec, sort. Where each agent's samples cluster = that agent's quality score. Reports per-type average rank and quality ratio. Supports both haiku (fast/cheap) and sonnet (quality) as comparator. Usage: poc-memory agent evaluate --samples 5 --model haiku Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>	2026-03-14 19:24:07 -04:00
ProofOfConcept	f423cf22df	cli: extract agent and admin commands from main.rs Move agent handlers (consolidate, replay, digest, experience-mine, fact-mine, knowledge-loop, apply-*) into cli/agent.rs. Move admin handlers (init, fsck, dedup, bulk-rename, health, daily-check, import, export) into cli/admin.rs. Functions tightly coupled to Clap types (cmd_daemon, cmd_digest, cmd_apply_agent, cmd_experience_mine) remain in main.rs. main.rs: 3130 → 1586 lines (49% reduction). Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>	2026-03-14 18:06:27 -04:00

7 commits
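
The ranking idea in commit e12dea503b (sort sampled actions with a pairwise comparator, then read each agent type's quality from where its samples land) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `Action` struct, `compare_quality`, `average_rank_by_type`, and `sample_actions` are hypothetical names, and the comparator is a deterministic stand-in (longer report wins) where the real tool would make an LLM API call. The "quality ratio" mentioned in the commit is not defined in the log, so it is not computed here.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// One sampled agent action. Hypothetical shape, for illustration only.
#[derive(Debug, Clone)]
struct Action {
    agent_type: String,
    report: String,
}

/// Stand-in for the LLM pairwise comparator. The real tool would ask a model
/// "which action was better?"; here the longer report simply ranks first so
/// that the example is deterministic.
fn compare_quality(a: &Action, b: &Action) -> Ordering {
    b.report.len().cmp(&a.report.len())
}

/// Sort all actions best-first, then return the average rank (0 = best)
/// of each agent type's samples.
fn average_rank_by_type(mut actions: Vec<Action>) -> BTreeMap<String, f64> {
    actions.sort_by(compare_quality);

    // Per agent type: (sum of ranks, number of samples).
    let mut sums: BTreeMap<String, (usize, usize)> = BTreeMap::new();
    for (rank, action) in actions.iter().enumerate() {
        let entry = sums.entry(action.agent_type.clone()).or_insert((0, 0));
        entry.0 += rank;
        entry.1 += 1;
    }

    sums.into_iter()
        .map(|(agent, (total, count))| (agent, total as f64 / count as f64))
        .collect()
}

/// Made-up sample data: two actions each for two agent types.
fn sample_actions() -> Vec<Action> {
    let raw = [
        ("replay", "no changes"),
        ("consolidate", "linked four related nodes"),
        ("replay", "skipped"),
        ("consolidate", "merged two duplicates"),
    ];
    raw.iter()
        .map(|(agent, report)| Action {
            agent_type: agent.to_string(),
            report: report.to_string(),
        })
        .collect()
}

fn main() {
    for (agent, avg) in average_rank_by_type(sample_actions()) {
        println!("{agent}: average rank {avg:.2}");
    }
}
```

A lower average rank means that agent type's samples cluster nearer the top of the sorted list. Commit 433d36aea8 switches to rayon's `par_sort_by`; in a sketch like this one that would presumably mean replacing `actions.sort_by(...)` with `actions.par_sort_by(...)` after importing `rayon::prelude::*`, which requires the comparator to be `Sync`.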