user: F7 compare screen

Side-by-side model comparison against the current conversation context.
Built on the MindTriggered pattern: F7 drops in as one more
CompareScoring flow next to MemoryScoring / FinetuneScoring.

Motivation: we have the VRAM on the b200 to load two versions of the
same family simultaneously (e.g. Qwen3.5 27B bf16 and q8_k_xl). Rather
than trust perplexity/KLD numbers on a generic corpus, we can measure
divergence on our actual conversations: for each assistant response,
ask the test model what it would have said given the same prefix, and
eyeball the diffs.

- config.compare.test_backend: names an entry in the existing
  backends map to use as the test model. Empty = F7 reports "(unset)"
  and does nothing.
- subconscious::compare::{score_compare_candidates, CompareCandidate,
  CompareScoringStats, CompareScoring}. For each assistant response,
  gen_continuation runs with the test client against the same prefix
  the original response saw; pairs stream into
  shared.compare_candidates as they complete.
- user::compare::CompareScreen: F7 in the screen list. c/Enter
  triggers a run; list/detail layout mirroring F6, detail shows
  prior context / original / test-model alternate.
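
The flow described in the list above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code from this commit: the `Message` and `Role` types, the fields of `CompareCandidate`, the string-to-string backends map, and the function signatures are all assumptions made for the example, and the generator is a stand-in closure rather than a real test client.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Clone, Debug)]
struct Message {
    role: Role,
    text: String,
}

/// One original/alternate pair. Field names are hypothetical.
#[derive(Clone, Debug)]
struct CompareCandidate {
    response_index: usize,
    prior_context: Vec<Message>,
    original: String,
    alternate: String,
}

/// Look up the configured test backend. An empty name means "unset",
/// mirroring the "(unset)" behaviour described in the commit message.
fn resolve_test_backend<'a>(
    backends: &'a HashMap<String, String>,
    test_backend: &str,
) -> Result<&'a str, String> {
    if test_backend.is_empty() {
        return Err("(unset)".to_string());
    }
    backends
        .get(test_backend)
        .map(String::as_str)
        .ok_or_else(|| format!("unknown backend: {test_backend}"))
}

/// For every assistant response, ask the generator what it would have said
/// given the same prefix, and push each pair as soon as it is ready.
fn score_compare_candidates<F>(
    context: &[Message],
    mut gen_continuation: F,
    out: &mut Vec<CompareCandidate>,
) where
    F: FnMut(&[Message]) -> String,
{
    for (i, msg) in context.iter().enumerate() {
        if msg.role != Role::Assistant {
            continue;
        }
        // The prefix is everything the original response saw.
        let prefix = &context[..i];
        let alternate = gen_continuation(prefix);
        out.push(CompareCandidate {
            response_index: i,
            prior_context: prefix.to_vec(),
            original: msg.text.clone(),
            alternate,
        });
    }
}

fn main() {
    let mut backends = HashMap::new();
    // Hypothetical backend entry; the real map holds richer config.
    backends.insert("q8_k_xl".to_string(), "http://localhost:8001/v1".to_string());

    let context = vec![
        Message { role: Role::User, text: "What is a B-tree?".into() },
        Message { role: Role::Assistant, text: "A balanced search tree.".into() },
    ];

    match resolve_test_backend(&backends, "q8_k_xl") {
        Ok(url) => {
            let mut candidates = Vec::new();
            // Stand-in for a real generation call against the test model.
            let generate = |prefix: &[Message]| {
                format!("(alternate from {url} after {} messages)", prefix.len())
            };
            score_compare_candidates(&context, generate, &mut candidates);
            for c in &candidates {
                println!(
                    "#{} ({} prior): {:?} vs {:?}",
                    c.response_index,
                    c.prior_context.len(),
                    c.original,
                    c.alternate
                );
            }
        }
        Err(reason) => println!("F7: {reason}"),
    }
}
```

Pushing each pair into the output vector as it completes is what lets a list/detail screen fill in incrementally instead of waiting for the whole run.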

No persistence yet: each F7 run regenerates. Caching via a context
manifest (so we can re-view without re-burning generation) is the
natural follow-up; for now light usage is fine.
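
One possible shape for that follow-up, sketched under assumptions: the commit does not define what a context manifest contains, so this example simply keys an in-memory cache on a hash of the backend name plus the prefix text. `AlternateCache`, `prefix_key`, and `get_or_generate` are hypothetical names, hash collisions are ignored, and `DefaultHasher` output is not guaranteed stable across Rust releases, so an on-disk version would need a stable hash instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash the backend name together with the prefix the response saw.
fn prefix_key(backend: &str, prefix: &[String]) -> u64 {
    let mut hasher = DefaultHasher::new();
    backend.hash(&mut hasher);
    prefix.hash(&mut hasher);
    hasher.finish()
}

/// In-memory cache of generated alternates (hypothetical type).
#[derive(Default)]
struct AlternateCache {
    entries: HashMap<u64, String>,
    /// Counts how many times the generator actually ran.
    generations: usize,
}

impl AlternateCache {
    /// Return the cached alternate for this backend and prefix, or run
    /// `generate` once and remember its result.
    fn get_or_generate<F>(&mut self, backend: &str, prefix: &[String], generate: F) -> String
    where
        F: FnOnce() -> String,
    {
        let key = prefix_key(backend, prefix);
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        let alternate = generate();
        self.generations += 1;
        self.entries.insert(key, alternate.clone());
        alternate
    }
}

fn main() {
    let mut cache = AlternateCache::default();
    let prefix = vec!["user: hi".to_string()];

    // First view generates; the second view of the same prefix is a cache hit.
    let first = cache.get_or_generate("q8_k_xl", &prefix, || "hello there".to_string());
    let second = cache.get_or_generate("q8_k_xl", &prefix, || "never used".to_string());

    println!("same alternate: {}", first == second);
    println!("generations: {}", cache.generations);
}
```

Including the backend name in the key matters for the checkpoint-validation use case: swapping the test backend must not return alternates generated by the previous model.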

Also reusable later for validating finetune checkpoints: same pattern,
swap the test backend for the new checkpoint, watch where it diverges
from the base.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
parent 575325e855
commit 2b03dbb200
7 changed files with 301 additions and 11 deletions
@@ -63,7 +63,7 @@ use tokio::sync::mpsc;
 use crate::agent::{Agent, TurnResult};
 use crate::agent::api::ApiClient;
 use crate::config::{AppConfig, SessionConfig};
-use crate::subconscious::learn;
+use crate::subconscious::{compare, learn};
 use crate::hippocampus::access_local;
 
 pub use subconscious::{SubconsciousSnapshot, Subconscious};
@@ -193,6 +193,11 @@ pub struct MindState {
     pub finetune_candidates: Vec<learn::FinetuneCandidate>,
     /// Last scoring run stats for UI display.
     pub finetune_last_run: Option<learn::FinetuneScoringStats>,
+    /// F7 compare candidates — one per response, showing what the test
+    /// model would say given the same context.
+    pub compare_candidates: Vec<compare::CompareCandidate>,
+    /// F7 compare error from the last run, if any.
+    pub compare_error: Option<String>,
 }
 
 impl Clone for MindState {
@@ -213,6 +218,8 @@ impl Clone for MindState {
             unc_idle_deadline: self.unc_idle_deadline,
             finetune_candidates: self.finetune_candidates.clone(),
             finetune_last_run: self.finetune_last_run.clone(),
+            compare_candidates: self.compare_candidates.clone(),
+            compare_error: self.compare_error.clone(),
         }
     }
 }
@@ -227,6 +234,9 @@ pub enum MindCommand {
     ScoreFull,
     /// Score for finetune candidates
     ScoreFinetune,
+    /// Run F7 compare: generate alternates with the configured test model
+    /// for every assistant response in the context.
+    Compare,
     /// Update the finetune divergence threshold and persist to config.
     SetLearnThreshold(f64),
     /// Toggle alternate-response generation during scoring; persist to config.
@@ -258,6 +268,8 @@ impl MindState {
             unc_idle_deadline: Instant::now() + std::time::Duration::from_secs(60),
             finetune_candidates: Vec::new(),
             finetune_last_run: None,
+            compare_candidates: Vec::new(),
+            compare_error: None,
         }
     }
 
@@ -359,6 +371,7 @@ pub struct Mind {
     conscious_active: tokio::sync::watch::Sender<bool>,
     memory_scoring: learn::MemoryScoring,
     finetune_scoring: learn::FinetuneScoring,
+    compare_scoring: compare::CompareScoring,
     _supervisor: crate::thalamus::supervisor::Supervisor,
 }
 
@@ -486,12 +499,14 @@ impl Mind {
         let memory_scoring = learn::MemoryScoring::new(
             agent.clone(), shared.clone(), scores_path);
         let finetune_scoring = learn::FinetuneScoring::new(agent.clone(), shared.clone());
+        let compare_scoring = compare::CompareScoring::new(agent.clone(), shared.clone());
 
         Self { agent, shared, config,
                subconscious, unconscious,
                turn_tx, turn_watch, conscious_active,
                memory_scoring,
                finetune_scoring,
+               compare_scoring,
                _supervisor: sup }
     }
 
@@ -593,6 +608,9 @@ impl Mind {
             MindCommand::ScoreFinetune => {
                 self.finetune_scoring.trigger();
             }
+            MindCommand::Compare => {
+                self.compare_scoring.trigger();
+            }
             MindCommand::SetLearnThreshold(value) => {
                 if let Err(e) = crate::config_writer::set_learn_threshold(value) {
                     dbglog!("[learn] failed to persist threshold {}: {:#}", value, e);