evaluate: ask for reasoning in comparisons

Chain-of-thought: asking the model to "say which is better and why"
before giving its verdict forces clearer judgment and gives us
analysis data for improving agents.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
ProofOfConcept 2026-03-14 19:36:55 -04:00
parent 39e3d69e3c
commit 415180eeab


@@ -276,7 +276,8 @@ fn llm_compare(
 {}\n\n\
 ## Action A\n## Report output{}\n\n\
 ## Action B\n## Report output{}\n\n\
-Reply with ONLY: BETTER: A or BETTER: B or BETTER: TIE",
+Say which is better and why in 1-2 sentences, then end with:\n\
+BETTER: A or BETTER: B or BETTER: TIE",
 a.0, shared_prompt, report_a, report_b
 )
 } else {
@@ -285,7 +286,8 @@ fn llm_compare(
 for building a useful, well-organized knowledge graph?\n\n\
 ## Action A ({} agent)\n{}\n\n\
 ## Action B ({} agent)\n{}\n\n\
-Reply with ONLY: BETTER: A or BETTER: B or BETTER: TIE",
+Say which is better and why in 1-2 sentences, then end with:\n\
+BETTER: A or BETTER: B or BETTER: TIE",
 a.0, a.2, b.0, b.2
 )
 };