forked from kent/consciousness
user: F7 compare screen
Side-by-side model comparison against the current conversation context.
Built on the MindTriggered pattern — F7 drops in as one more scoring
flow, CompareScoring, alongside MemoryScoring / FinetuneScoring.
Motivation: we have the VRAM on the b200 to load two versions of the
same family simultaneously (e.g. Qwen3.5 27B bf16 and q8_k_xl). Rather
than trust perplexity/KLD numbers on a generic corpus, we can measure
divergence on our actual conversations: for each assistant response,
ask the test model what it would have said given the same prefix, and
eyeball the diffs.
- config.compare.test_backend — names an entry in the existing
backends map to use as the test model. Empty = F7 reports "(unset)"
and does nothing.
- subconscious::compare::{score_compare_candidates, CompareCandidate,
CompareScoringStats, CompareScoring}. For each assistant response,
gen_continuation runs with the test client against the same prefix
the original response saw; pairs stream into
shared.compare_candidates as they complete.
- user::compare::CompareScreen — F7 in the screen list. c/Enter
triggers a run; list/detail layout mirroring F6, detail shows
prior context / original / test-model alternate.
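The per-response flow can be sketched roughly as below. Apart from the
CompareCandidate name, every identifier here is a hypothetical stand-in
for illustration — the real subconscious::compare code streams pairs
asynchronously via gen_continuation rather than mapping a slice:

```rust
// Illustrative sketch: for each assistant response, re-run generation with
// the test backend against the same prefix the original response saw, and
// pair the two for side-by-side display.
#[derive(Debug, Clone, PartialEq)]
struct CompareCandidate {
    prior_context: String, // prefix the original response saw
    original: String,      // what the main model actually said
    alternate: String,     // what the test model would have said
}

// `test_gen` stands in for a gen_continuation call with the test client.
fn score_compare_candidates(
    turns: &[(String, String)], // (prefix, original assistant response)
    test_gen: impl Fn(&str) -> String,
) -> Vec<CompareCandidate> {
    turns
        .iter()
        .map(|(prefix, original)| CompareCandidate {
            prior_context: prefix.clone(),
            original: original.clone(),
            alternate: test_gen(prefix), // same prefix, test backend
        })
        .collect()
}

fn main() {
    let turns = vec![("user: hi".to_string(), "hello there".to_string())];
    let pairs = score_compare_candidates(&turns, |p| format!("alt for {p}"));
    println!("{pairs:?}");
}
```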
No persistence yet — each F7 run regenerates. Caching via a context
manifest (so we can re-view without re-burning generation) is the
natural follow-up; for now light usage is fine.
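The context manifest could key cached generations on the prefix plus the
test backend name, so a re-view only regenerates when either changes. A
minimal sketch, assuming a hash-based key — all names here are
hypothetical, not from this commit:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical cache key for the follow-up context manifest: same prefix
// and same test backend hash to the same key (cache hit); changing either
// forces regeneration.
fn manifest_key(prefix: &str, test_backend: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prefix.hash(&mut h);
    test_backend.hash(&mut h);
    h.finish()
}

fn main() {
    let a = manifest_key("ctx", "q8");
    let b = manifest_key("ctx", "q8");
    let c = manifest_key("ctx", "bf16");
    assert_eq!(a, b); // same inputs -> cache hit
    assert_ne!(a, c); // different test backend -> regenerate
    println!("ok");
}
```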
Also reusable later for validating finetune checkpoints: same pattern,
swap the test backend for the new checkpoint, watch where it diverges
from the base.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
parent: 575325e855
commit: 2b03dbb200
7 changed files with 301 additions and 11 deletions
@@ -250,6 +250,8 @@ pub struct AppConfig {
     #[serde(default)]
     pub learn: LearnConfig,
+    #[serde(default)]
+    pub compare: CompareConfig,
     #[serde(default)]
     pub mcp_servers: Vec<McpServerConfig>,
     #[serde(default)]
     pub lsp_servers: Vec<LspServerConfig>,
@@ -323,6 +325,16 @@ impl Default for LearnConfig {
     }
 }
 
+/// Settings for the F7 compare screen — side-by-side generation with a
+/// test model against the current context.
+#[derive(Debug, Clone, Default, Serialize, Deserialize)]
+pub struct CompareConfig {
+    /// Backend name (looked up in `backends`) to use as the test model.
+    /// Empty = F7 reports "no test backend configured" and does nothing.
+    #[serde(default)]
+    pub test_backend: String,
+}
+
 fn default_user_name() -> String { "User".into() }
 fn default_assistant_name() -> String { "Assistant".into() }
@@ -340,6 +352,7 @@ impl Default for AppConfig {
        },
        dmn: DmnConfig { max_turns: 20 },
        learn: LearnConfig::default(),
+       compare: CompareConfig::default(),
        mcp_servers: Vec::new(),
        lsp_servers: Vec::new(),
    }