evaluate: ask for reasoning in comparisons

Chain-of-thought: asking the model to "say which is better and why" forces clearer judgment and gives us analysis data for improving the agents.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
commit 415180eeab
parent 39e3d69e3c
date   2026-03-14 19:36:55 -04:00
1 changed file with 4 additions and 2 deletions
--- a/poc-memory/src/cli/agent.rs
+++ b/poc-memory/src/cli/agent.rs
@@ -276,7 +276,8 @@ fn llm_compare(
             {}\n\n\
             ## Action A\n## Report output{}\n\n\
             ## Action B\n## Report output{}\n\n\
-             Reply with ONLY: BETTER: A  or  BETTER: B  or  BETTER: TIE",
+             Say which is better and why in 1-2 sentences, then end with:\n\
+             BETTER: A  or  BETTER: B  or  BETTER: TIE",
            a.0, shared_prompt, report_a, report_b
        )
    } else {
@@ -285,7 +286,8 @@ fn llm_compare(
             for building a useful, well-organized knowledge graph?\n\n\
             ## Action A ({} agent)\n{}\n\n\
             ## Action B ({} agent)\n{}\n\n\
-             Reply with ONLY: BETTER: A  or  BETTER: B  or  BETTER: TIE",
+             Say which is better and why in 1-2 sentences, then end with:\n\
+             BETTER: A  or  BETTER: B  or  BETTER: TIE",
            a.0, a.2, b.0, b.2
        )
    };
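The diff only changes the prompt; the code that reads the model's reply is not shown. Because the reply now carries 1-2 sentences of reasoning before the verdict, a parser can no longer assume the whole reply is just `BETTER: X`. Below is a minimal sketch of how such a reply could be split into reasoning and verdict. The `Verdict` enum and the `parse_verdict` helper are hypothetical names for illustration, not code from poc-memory.

```rust
/// Outcome of one pairwise comparison (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    A,
    B,
    Tie,
}

/// Hypothetical helper: split a reply of the form "<reasoning> ... BETTER: X"
/// into the free-text reasoning and the verdict.
///
/// The *last* "BETTER:" marker wins, so a marker quoted inside the reasoning
/// does not override the final answer. Returns None if no recognizable
/// verdict follows the marker.
pub fn parse_verdict(reply: &str) -> Option<(String, Verdict)> {
    const MARKER: &str = "BETTER:";

    // Byte offset of the final marker; the marker is ASCII, so slicing is safe.
    let idx = reply.rfind(MARKER)?;

    // Everything before the marker is the model's reasoning.
    let reasoning = reply[..idx].trim().to_string();

    // Take the first word after the marker and strip surrounding punctuation
    // such as "**" (markdown bold) or a trailing period.
    let token = reply[idx + MARKER.len()..]
        .split_whitespace()
        .next()?
        .trim_matches(|c: char| !c.is_ascii_alphanumeric())
        .to_ascii_uppercase();

    let verdict = match token.as_str() {
        "A" => Verdict::A,
        "B" => Verdict::B,
        "TIE" => Verdict::Tie,
        // Anything else (e.g. "Action") is treated as unparseable.
        _ => return None,
    };
    Some((reasoning, verdict))
}
```

Searching from the end with `rfind` is a deliberate choice: the prompt asks the model to *end* with the verdict, so the last marker is the one that counts. Keeping the reasoning string alongside the verdict preserves the kind of analysis data the commit message mentions.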