consciousness

spqrz/consciousness

forked from kent/consciousness

Commit graph

Author	SHA1	Message	Date

Kent Overstreet

d4331e80f5

user: share candidate-browser helpers between F6/F7

F6 (learn) and F7 (compare) were duplicating the candidate-screen
skeleton: outer magenta-bordered block with screen legend + title,
settings row / content / help vertical split, 40/60 list/detail
horizontal split, j/k/↑/↓ nav with bounds clamping.

Factor out three helpers in user/widgets.rs:

  candidate_frame(frame, area, title) -> (settings, content, help)
  list_detail_split(content) -> (list, detail)
  handle_list_nav(events, list_state, count, on_other)

Callers provide screen-specific content — settings line, empty state,
per-candidate list item, detail pane, help line, extra key bindings —
and the helpers absorb the common framing.
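As a rough illustration of the split described above, here is a std-only sketch. It is not the code in user/widgets.rs: the `Rect` and `Key` types, the `candidate_areas` name, the one-row settings and help heights, and the simplified signatures are all assumptions, and the bordered outer block is left out.

```rust
// Hypothetical stand-ins: the real helpers live in user/widgets.rs and
// work on the TUI library's frame, area, and list-state types.

/// Minimal rectangle in terminal cells.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: u16,
    pub y: u16,
    pub width: u16,
    pub height: u16,
}

/// Keys relevant to list navigation (assumed event type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Up,
    Down,
    Char(char),
}

/// Vertical split: settings row on top, help row at the bottom, content
/// in between. One-row heights are an assumption of this sketch.
pub fn candidate_areas(area: Rect) -> (Rect, Rect, Rect) {
    let settings_h = area.height.min(1);
    let help_h = (area.height - settings_h).min(1);
    let content_h = area.height - settings_h - help_h;
    let settings = Rect { height: settings_h, ..area };
    let content = Rect { y: area.y + settings_h, height: content_h, ..area };
    let help = Rect { y: area.y + settings_h + content_h, height: help_h, ..area };
    (settings, content, help)
}

/// Horizontal 40/60 split: list on the left, detail on the right.
pub fn list_detail_split(content: Rect) -> (Rect, Rect) {
    let list_w = (u32::from(content.width) * 40 / 100) as u16;
    let list = Rect { width: list_w, ..content };
    let detail = Rect {
        x: content.x + list_w,
        width: content.width - list_w,
        ..content
    };
    (list, detail)
}

/// Applies j/k/Up/Down to the selection, clamped to 0..count; every other
/// key is handed to `on_other` so a screen can add its own bindings.
pub fn handle_list_nav(
    keys: &[Key],
    selected: &mut Option<usize>,
    count: usize,
    mut on_other: impl FnMut(Key),
) {
    for &key in keys {
        match key {
            Key::Down | Key::Char('j') => {
                if count > 0 {
                    *selected = Some(selected.map_or(0, |i| (i + 1).min(count - 1)));
                }
            }
            Key::Up | Key::Char('k') => {
                if count > 0 {
                    *selected =
                        Some(selected.map_or(0, |i| i.saturating_sub(1)).min(count - 1));
                }
            }
            other => on_other(other),
        }
    }
}
```

Under these assumptions a new screen would call `candidate_areas`, then `list_detail_split` on the content area, and pass its own closure to `handle_list_nav` for screen-specific keys such as `c`.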

Net change is small in lines (-13 src) but removes the
copy-paste-and-tweak trap: F8/F9/whatever-next-screen now starts from
these three calls instead of a copy of learn.rs.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

2026-04-17 16:22:30 -04:00

Kent Overstreet

2b03dbb200

user: F7 compare screen

Side-by-side model comparison against the current conversation context.
Built on the MindTriggered pattern — F7 drops in as one more
CompareScoring flow next to MemoryScoring / FinetuneScoring.

Motivation: we have the VRAM on the b200 to load two versions of the
same family simultaneously (e.g. Qwen3.5 27B bf16 and q8_k_xl). Rather
than trust perplexity/KLD numbers on a generic corpus, we can measure
divergence on our actual conversations: for each assistant response,
ask the test model what it would have said given the same prefix, and
eyeball the diffs.
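The per-response comparison can be sketched roughly as follows. This is a synchronous, std-only illustration, not the code in subconscious::compare: `Role`, `Message`, `ComparePair`, and `compare_pairs` are hypothetical names, and the `generate` closure stands in for whatever the test client does with a prefix.

```rust
// Hypothetical types for illustration; the real flow uses the project's
// own conversation and client types.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub role: Role,
    pub text: String,
}

/// One original response paired with the test model's alternate.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ComparePair {
    /// Position of the assistant response in the conversation.
    pub index: usize,
    pub original: String,
    pub alternate: String,
}

/// For each assistant response, asks `generate` for a continuation of the
/// exact prefix that preceded it, and pairs the result with the original.
pub fn compare_pairs(
    conversation: &[Message],
    mut generate: impl FnMut(&[Message]) -> String,
) -> Vec<ComparePair> {
    conversation
        .iter()
        .enumerate()
        .filter(|(_, m)| m.role == Role::Assistant)
        .map(|(i, m)| ComparePair {
            index: i,
            original: m.text.clone(),
            // The prefix excludes the response itself.
            alternate: generate(&conversation[..i]),
        })
        .collect()
}
```

The sketch collects all pairs before returning; a streaming version would instead hand each pair to the UI as soon as it is generated.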

 - config.compare.test_backend — names an entry in the existing
   backends map to use as the test model. Empty = F7 reports "(unset)"
   and does nothing.

 - subconscious::compare::{score_compare_candidates, CompareCandidate,
   CompareScoringStats, CompareScoring}. For each assistant response,
   gen_continuation runs with the test client against the same prefix
   the original response saw; pairs stream into
   shared.compare_candidates as they complete.

 - user::compare::CompareScreen — F7 in the screen list. c/Enter
   triggers a run; list/detail layout mirroring F6, detail shows
   prior context / original / test-model alternate.
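The test_backend lookup from the first bullet might look something like the sketch below. The `Backend` struct and its `model` field, the `TestBackend` enum, and both function names are hypothetical. The commit only specifies the empty case, so the handling of a name missing from the map is purely an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical backend entry; the real config likely carries more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Backend {
    pub model: String,
}

/// Outcome of resolving the configured test backend name.
#[derive(Debug, PartialEq, Eq)]
pub enum TestBackend<'a> {
    /// The name is empty: comparison is disabled.
    Unset,
    /// Assumed case: a name that is not a key in the backends map.
    Unknown(&'a str),
    Found(&'a Backend),
}

/// Looks the configured name up in the existing backends map.
pub fn resolve_test_backend<'a>(
    backends: &'a HashMap<String, Backend>,
    test_backend: &'a str,
) -> TestBackend<'a> {
    if test_backend.is_empty() {
        return TestBackend::Unset;
    }
    match backends.get(test_backend) {
        Some(backend) => TestBackend::Found(backend),
        None => TestBackend::Unknown(test_backend),
    }
}

/// Text for the settings row; "(unset)" matches the commit message,
/// the other two labels are made up for this sketch.
pub fn status_label(resolved: &TestBackend<'_>) -> String {
    match resolved {
        TestBackend::Unset => "(unset)".to_string(),
        TestBackend::Unknown(name) => format!("{name} (not in backends)"),
        TestBackend::Found(backend) => backend.model.clone(),
    }
}
```

Resolving by name against the existing map means the test model needs no connection settings of its own; it reuses whatever the named entry already defines.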

No persistence yet — each F7 run regenerates. Caching via a context
manifest (so we can re-view without re-burning generation) is the
natural follow-up; for now light usage is fine.

Also reusable later for validating finetune checkpoints: same pattern,
swap the test backend for the new checkpoint, watch where it diverges
from the base.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

2026-04-17 16:12:26 -04:00

2 commits