user: F7 compare screen

Side-by-side model comparison against the current conversation context. Built on the MindTriggered pattern: F7 drops in as one more CompareScoring flow next to MemoryScoring / FinetuneScoring.

Motivation: we have the VRAM on the b200 to load two versions of the same family simultaneously (e.g. Qwen3.5 27B bf16 and q8_k_xl). Rather than trust perplexity/KLD numbers on a generic corpus, we can measure divergence on our actual conversations: for each assistant response, ask the test model what it would have said given the same prefix, and eyeball the diffs.

- config.compare.test_backend: names an entry in the existing backends map to use as the test model. Empty = F7 reports "(unset)" and does nothing.
- subconscious::compare::{score_compare_candidates, CompareCandidate, CompareScoringStats, CompareScoring}. For each assistant response, gen_continuation runs with the test client against the same prefix the original response saw; pairs stream into shared.compare_candidates as they complete.
- user::compare::CompareScreen: F7 in the screen list. c/Enter triggers a run; list/detail layout mirroring F6, detail shows prior context / original / test-model alternate.

No persistence yet: each F7 run regenerates. Caching via a context manifest (so we can re-view without re-burning generation) is the natural follow-up; for now light usage is fine.

Also reusable later for validating finetune checkpoints: same pattern, swap the test backend for the new checkpoint, watch where it diverges from the base.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
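The per-response loop described in the message can be sketched roughly as follows. This is a minimal, std-only illustration, not the code from this commit: `Role`, `Message`, `Candidate`, `compare_conversation` and `demo_conversation` are hypothetical names, the test model is replaced by a closure, and the real flow (`score_compare_candidates`, `gen_continuation`, `shared.compare_candidates`) may be structured quite differently.

```rust
// Hypothetical stand-ins; none of these types are taken from the commit.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone)]
struct Message {
    role: Role,
    text: String,
}

/// One original/alternate pair (loosely analogous to a compare candidate).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    /// Position of the assistant message in the conversation.
    index: usize,
    original: String,
    alternate: String,
}

/// For every assistant message, ask `generate` what the test model would say
/// given only the messages before it, and pass each pair to `sink` as soon as
/// it is ready, so a UI could show results incrementally.
fn compare_conversation<G, S>(conversation: &[Message], mut generate: G, mut sink: S)
where
    G: FnMut(&[Message]) -> String,
    S: FnMut(Candidate),
{
    for (index, msg) in conversation.iter().enumerate() {
        if msg.role != Role::Assistant {
            continue;
        }
        // The prefix is exactly what the original response saw.
        let prefix = &conversation[..index];
        let alternate = generate(prefix);
        sink(Candidate {
            index,
            original: msg.text.clone(),
            alternate,
        });
    }
}

/// Small fixed conversation used for the demonstration.
fn demo_conversation() -> Vec<Message> {
    let m = |role, text: &str| Message { role, text: text.to_string() };
    vec![
        m(Role::User, "hi"),
        m(Role::Assistant, "hello"),
        m(Role::User, "how are you?"),
        m(Role::Assistant, "fine"),
    ]
}

fn main() {
    let conversation = demo_conversation();
    let mut pairs = Vec::new();
    // Fake "test model": just reports how many prefix messages it was given.
    compare_conversation(
        &conversation,
        |prefix| format!("alternate after {} messages", prefix.len()),
        |c| pairs.push(c),
    );
    for c in &pairs {
        println!("[{}] original={:?} alternate={:?}", c.index, c.original, c.alternate);
    }
}
```

Passing the generator and the sink as closures keeps the sketch independent of any particular client or shared-state type; in the real code the sink role is presumably played by the shared candidate list that the screen reads from.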
2026-04-17 16:01:11 -04:00
commit 2b03dbb200
parent 575325e855
7 changed files with 301 additions and 11 deletions
--- a/src/config.rs
+++ b/src/config.rs
@@ -250,6 +250,8 @@ pub struct AppConfig {
    #[serde(default)]
    pub learn: LearnConfig,
    #[serde(default)]
+    pub compare: CompareConfig,
+    #[serde(default)]
    pub mcp_servers: Vec<McpServerConfig>,
    #[serde(default)]
    pub lsp_servers: Vec<LspServerConfig>,
@@ -323,6 +325,16 @@ impl Default for LearnConfig {
    }
 }

+/// Settings for the F7 compare screen — side-by-side generation with a
+/// test model against the current context.
+#[derive(Debug, Clone, Default, Serialize, Deserialize)]
+pub struct CompareConfig {
+    /// Backend name (looked up in `backends`) to use as the test model.
+    /// Empty = F7 reports "no test backend configured" and does nothing.
+    #[serde(default)]
+    pub test_backend: String,
+}
+
 fn default_user_name() -> String { "User".into() }
 fn default_assistant_name() -> String { "Assistant".into() }

@@ -340,6 +352,7 @@ impl Default for AppConfig {
            },
            dmn: DmnConfig { max_turns: 20 },
            learn: LearnConfig::default(),
+            compare: CompareConfig::default(),
            mcp_servers: Vec::new(),
            lsp_servers: Vec::new(),
        }
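
As a rough illustration of how the empty-string default of `test_backend` might be consumed, the sketch below resolves the configured name against a backends map. It is an assumption-laden example rather than code from this commit: `Backend`, `TestBackend` and `resolve_test_backend` are hypothetical names, and the handling of a name that is set but missing from the map is a guess, since the commit only documents the empty case.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an entry in the `backends` map.
#[derive(Debug, Clone, PartialEq)]
struct Backend {
    base_url: String,
}

/// Outcome of looking up the configured test backend.
#[derive(Debug, PartialEq)]
enum TestBackend<'a> {
    /// `test_backend` is empty: the compare screen should do nothing.
    Unset,
    /// A name is configured but has no entry in the map (assumed case).
    Unknown(&'a str),
    /// The named backend exists and can be used as the test model.
    Ready(&'a Backend),
}

/// Resolve `name` against `backends`, treating the empty string as "unset".
fn resolve_test_backend<'a>(
    name: &'a str,
    backends: &'a HashMap<String, Backend>,
) -> TestBackend<'a> {
    if name.is_empty() {
        return TestBackend::Unset;
    }
    match backends.get(name) {
        Some(backend) => TestBackend::Ready(backend),
        None => TestBackend::Unknown(name),
    }
}

fn main() {
    let mut backends = HashMap::new();
    backends.insert(
        "q8".to_string(),
        Backend { base_url: "http://localhost:8001".to_string() },
    );

    // `String::default()` is "", which is what `#[serde(default)]` yields
    // when the field is absent from the config file.
    let unset = String::default();
    println!("{:?}", resolve_test_backend(&unset, &backends));
    println!("{:?}", resolve_test_backend("q8", &backends));
    println!("{:?}", resolve_test_backend("missing", &backends));
}
```

Using a plain `String` with an empty default, as the struct above does, keeps the config file terse; an `Option<String>` would express "unset" more explicitly at the cost of a slightly noisier type.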