The evaluate agent will use sort-based ranking (the LLM acts as a
merge sort comparator) instead of absolute scoring. It is a stub
for now: it still needs Rust sampling code to bundle before/after
pairs.
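
A minimal sketch of the ranking idea, not the agent's actual code:
a merge sort whose comparator is a pluggable closure. The name
rank_by_comparator and the closure signature are assumptions made
for illustration. In the real agent the closure would ask the LLM
which of two before/after pairs is better; here any FnMut that
returns an Ordering stands in for it.

```rust
use std::cmp::Ordering;

/// Rank `items` with a merge sort driven by a pairwise comparator.
/// `better(a, b)` returns `Ordering::Less` when `a` should rank ahead of `b`.
/// Hypothetical helper: the comparator stands in for an LLM judgement.
pub fn rank_by_comparator<T, F>(items: Vec<T>, better: &mut F) -> Vec<T>
where
    F: FnMut(&T, &T) -> Ordering,
{
    // Zero or one item is already ranked.
    if items.len() <= 1 {
        return items;
    }

    // Split in half and rank each half recursively.
    let mut left = items;
    let right = left.split_off(left.len() / 2);
    let left = rank_by_comparator(left, better);
    let right = rank_by_comparator(right, better);

    // Merge: one comparator call per emitted item while both halves remain.
    let mut merged = Vec::with_capacity(left.len() + right.len());
    let mut l = left.into_iter().peekable();
    let mut r = right.into_iter().peekable();
    loop {
        let take_left = match (l.peek(), r.peek()) {
            // Ties go to the left half, which keeps the sort stable.
            (Some(a), Some(b)) => better(a, b) != Ordering::Greater,
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (None, None) => break,
        };
        let next = if take_left { l.next() } else { r.next() };
        merged.extend(next);
    }
    merged
}
```

Merge sort needs O(n log n) pairwise judgements, and each one is a
relative question (which of these two is better?), so the model
never has to produce an absolute score.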

Fixed distill query: sort:degree (not sort:degree desc).

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>