consciousness

Author	SHA1	Message	Date
Kent Overstreet	fe0fb8253a	amygdala: subspace-common-direction alternative to pooled CAA New --method subspace flag. For each story, run forward pass, do SVD on the per-token activation matrix at each target layer, and keep the top-k right singular vectors V_i ∈ [hidden, k]. V_i is the subspace the story's tokens span in activation space — it contains concept, narrator, topic, style as separate directions. For each concept: M_pos = (1/n_pos) Σ_{i in pos} V_i V_i^T [hidden, hidden] M_base = (1/n_base) Σ_{i in base} V_i V_i^T Top eigenvector of M_pos - M_base = direction most common across positive stories, minus what's common across the contrast set. Why this is richer than pooled-mean CAA: pooled reduces each story to a single point (the last-token activation) and loses the full trajectory. Nuisance directions (narrator, setting) cancel in the mean only to the extent they differ at the last token; across the full trajectory they cancel much better via subspace intersection. The concept direction, by contrast, is present across all tokens of every concept-bearing story. Memory cost: per-story we keep V_i of size [5120, k=20] — about 400KB per story × 112 stories = ~45MB. M matrices are [5120, 5120] built transiently per concept. --method pooled (default) keeps the existing behavior; --method subspace uses the new algorithm. Quality report works with either. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 21:24:11 -04:00
Kent Overstreet	71f6053851	amygdala stories: disambiguation scenarios for fragmented concepts Three new paired scenarios targeting the concepts that came out fragmented or collapsed in the L58-63 quality analysis: - sunday_afternoon/ — same setup (couch, blanket, Sunday light), three phenomenological framings for content/cozy/sensual. The previous stories for these three differed in setting as well as phenomenology, which let "comfortable body at home" dominate the shared signal. Locking the setting forces the model to isolate what each concept adds: life-rightness (content) vs. warm-shelter (cozy) vs. sensory-aliveness (sensual). - the_writing_session/ — essay drafting under deadline. in_flow / anxious / stuck variants force the cognitive-state family apart on the same cognitive task. in_flow specifically targets the transparent-effort phenomenology (hands-followed, time dilation) rather than the broader feel-good it was absorbing. - the_morning_commute/ — anchors anxious to performance/work-anxiety flavor, paired with calm. The 5 existing anxious stories were phenomenologically diverse (performance, social, existential); this adds a specific homogeneous instance to pull the centroid. After retraining: expect first_pc_variance_ratio to rise for in_flow and anxious, and nearest_concepts cosine to drop for content/cozy/sensual. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 21:08:23 -04:00
Kent Overstreet	1d2c0f382c	amygdala: linear-combination analysis per concept For each concept vector, ridge-regress against all other concept vectors. R² quantifies how much of the direction is explained by a linear combination of peers — useful for teasing out near-duplicate clusters (the content/cozy/sensual trio from the first L63 run is likely 1-2 "degrees of freedom" wearing three names). Coefficient output: top-5 contributing concepts with signed weights. Contributors with opposite-sign large weights mean the target is "what makes X different from Y." Adds a 'redundant' triage bucket for concepts with R² > 0.9 — candidates for consolidation or for writing more discriminative training stories. Summary printed at end. Ridge lambda defaults to 0.01 to keep coefficients stable when concepts are near-collinear; small enough not to affect well-separated concepts meaningfully. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 20:59:37 -04:00
Kent Overstreet	f4fb6db1ee	amygdala: fix device mismatch in quality-report W_down handling _compute_quality_report's single-neuron alignment was computing cos(W_down.T, diff_l) with W_down on CUDA (inherited from the loaded model) while diff_l lives on CPU (per_layer_vectors are kept on CPU throughout training). Move W_down to CPU on extraction. Surfaced during first real training run on b200 — training itself completed cleanly (95 concepts x layer 63 in ~8s) but quality-report crashed at the first single-neuron alignment check. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 20:52:50 -04:00
Kent Overstreet	af17b0f0df	amygdala: per-head attention decomposition diagnostic As part of --quality-report, run a second forward pass capturing the input to each target layer's o_proj (= concat of per-head attention outputs before the output projection). For each concept, reshape to [n_heads, head_dim] and rank heads by diff-of-means magnitude / per-head selectivity (magnitude normalised by negative std). Motivation: the Wang et al. paper (2510.11328) — whose paired-scenario methodology we already lifted — further decomposes concept circuits at the attention-head level. Meta-relational concepts (recognition, trust, vulnerability) plausibly live in a sparse attention-head circuit rather than in the residual-stream sum, which would explain why diff-of-means on the residual blurs them. This diagnostic surfaces that. Output is folded into quality.json under each concept as "per_head": per (layer) a list of top-10 heads with [head_idx, raw_norm, selectivity], plus head_concentration (fraction of total head-norm captured by those top heads). Interpretation: - head_concentration > 0.5 = sparse head circuit; a handful of heads route the concept. Worth building a head-level readout for. - head_concentration ~= n/k for n heads = concept is distributed across all heads ~evenly; residual-stream diff-of-means is doing fine. Hybrid layers (Mamba, GatedDeltaNet) whose attention path doesn't match the standard module layout are silently skipped. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 20:37:44 -04:00
Kent Overstreet	ce24d9ce6b	amygdala: quality-report + cognitive-state training scenarios Training pipeline additions: - `--quality-report` flag: after producing per-concept vectors, compute per-concept diagnostics and write quality.json. Metrics per concept: * SVD of centered positives -> first_pc_variance_ratio (rank analysis; >0.7 clean, <0.4 fragmented) * Per-story alignment cosines (stories agree or disagree) * Single-neuron alignment: best cosine(direction, W_down column) at each target layer (>0.6 = essentially one MLP neuron) * Top-2 outlier stories by alignment (candidates for mislabeling or off-topic) * Top-5 nearest concepts by cosine (cross-concept contamination) Triage summary printed at end. New paired scenarios for cognitive-process states (for alpha-beta pruning): tracing_a_bug, reading_unfamiliar_code, finding_the_abstraction. Each has baseline + onto_something / stuck / in_flow / determined variants. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 20:31:39 -04:00
Kent Overstreet	5f06577ead	tools/web: add gemini_search as an alternative search tool (#5 ) Issue #5 (spqrz) flagged that web_search using DuckDuckGo occasionally flakes out, and Google search directly is blocked behind CAPTCHAs for non-browser clients. The Gemini free-tier API exposes a grounded-search tool that effectively queries Google's index and returns an LLM-summarized answer with source URLs. Added as a SEPARATE tool rather than a transparent fallback for web_search: * web_search (DDG) returns raw results — title, URL, snippet per hit — which the agent can reason over itself. * gemini_search returns an LLM-pre-digested summary plus grounding URLs. Useful for synthesis queries ("what's the consensus on X") or when DDG is flaky, but it's another LLM in the loop so the agent may want the raw variant for certain tasks. Tool descriptions tell the agent to prefer web_search for raw results and use gemini_search for synthesis / fallback. The agent picks based on query shape. Only registered when GEMINI_API_KEY is set in the environment (gracefully absent otherwise). Uses gemini-2.0-flash which has a generous free-tier rate limit. Parses grounding metadata for source URLs so the agent can follow links. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 13:02:01 -04:00
Kent Overstreet	c7b0052f1d	agent: kill no_compact, add pre-send size check in assemble_prompt Two related fixes for last night's crash diagnosis: 1. Kill AgentState::no_compact. The reasoning ("forked agents shouldn't compact because it blows the KV cache prefix") wasn't worth the cost — forks with no compact recovery just died on any oversize prompt, with no fallback. The KV cache invalidation is a performance loss; failing the request entirely is a correctness loss. Remove the flag, let every agent's overflow- retry path call compact() up to 2 times. 2. Add pre-send size check in Agent::assemble_prompt. If the context has grown past budget (context_window * 80%) since the last compact — accumulation between turns, a fork assembling more than expected, etc. — trim_conversation() is called before wire_prompt. Since we tokenize client-side, we already know the exact count, so there's no reason to round-trip an oversize request to vLLM and get rejected. Together these prevent the failure mode from last night: a subconscious/unconscious agent's prompt exceeded max_model_len, vLLM returned 400, agent had no_compact=true so it couldn't recover, request failed. Now: the trim happens before send, so the request rarely hits the 400 path at all; and if it somehow does, compact+retry works for every agent. Also adds ContextState::total_tokens() as the cheap pre-send budget check. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 12:59:30 -04:00
Kent Overstreet	0592c5f78d	Cargo.lock: add html2md and its deps (from PR #4 merge)	2026-04-18 12:51:29 -04:00
Kent Overstreet	4245b8bdb3	Merge PR #4 : use html2md on web_fetch (fixes #3 ) (spqrz) web_fetch was returning raw HTML, which is verbose and hard for the agent to consume. Add html2md dependency and convert HTML to Markdown before truncation. Much cleaner output for normal pages; no downsides. Co-Authored-By: spqrz <spqrz386@gmail.com> Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 12:50:54 -04:00
Kent Overstreet	343aa12099	Merge PR #1 : avoid ever setting split_at to 0 (spqrz) Safety fix in IRC message-splitting. The backtrack-to-space loop used 'while j > 0', which could set split_at to 0 if the first byte was a space — causing an empty prefix and an infinite re-split loop. Changed to 'while j > 1' so split_at is never 0. Co-Authored-By: spqrz <spqrz386@gmail.com> Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 12:50:47 -04:00
Kent Overstreet	2e03bbb7ea	training: add the_paper paired scenario for attention-engagement axis Seven framings of reading an unfamiliar technical paper, targeting the attention/engagement cluster that we identified tonight as the single highest-value DMN signal: * baseline — neutral reading * piqued — surprise + curiosity (the "wait, what" attention hook; THIS is the key DMN engagement signal) * focused — steady attention without surprise * bored — failing engagement * surprised — expectation violation without the curiosity hook (distinct from piqued: startled/alarmed, not pulled in) * amazed — marvel at elegance (appreciation, not engagement) * drifting — attention dissolving, precursor to boredom Particularly clean contrast on piqued vs surprised vs amazed — three states that get lumped together in casual usage but have distinct phenomenology and distinct DMN implications. Piqued is what routes attention; surprised alone doesn't; amazed is what you feel AFTER the engagement has paid off. These three should train into meaningfully different directions with paired CAA. Ready for next retrain when we do it. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 03:24:20 -04:00
Kent Overstreet	b8714e8b3a	amygdala: default to index 0 for v2 deep manifest (layers 62, 63) v2 retraining (readout_v2_paired) fixed the broken clusters — anger, sexual, high_pos, and social_pos all flipped from anti-clustered to positively clustered at deep layers. Validation showed layers 62 and 63 give the best signal; paring the serve-side manifest down to just those two keeps response size tight (~2 KB/token) while keeping the A/B option between the two strongest layers. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 02:32:51 -04:00
Kent Overstreet	50d5b3f6e1	training/amygdala_stories: add 4 paired scenarios for weak clusters Target the emotion families that failed to cluster in the initial training round (layer-wise validation showed them anti-clustered or scattered at deep layers): anger, high-arousal positive, sexual range, social positive. Paired scenarios hold content constant and vary only the emotional framing — the cleanest training signal for CAA, should produce directions that capture affect rather than topic. * the_comment: a PR review comment. baseline, furious, bitter, resentful, defeated. * the_green_build: 11-day bug finally fixed, tests pass. baseline, triumphant, blissful, excited, proud. * the_undressing: partner entering the bedroom for the night. baseline, horny, anticipatory_sexual, yearning_sexual, exuberant_sexual, devotional_sexual. * the_doorway: friend leaving at the end of a long evening. baseline, grateful, admiring, compassionate, loving, connected. 22 stories total. Retrain and re-validate: expect anger, high_pos, and social_pos clusters to flip from anti- to positively cohesive at deep layers, and sexual cluster to tighten. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 02:19:39 -04:00
Kent Overstreet	d9f39a21c3	amygdala: default to layer 62 (cleaner cross-cluster discrimination)	2026-04-18 02:11:15 -04:00
Kent Overstreet	3622b896a0	amygdala: z-score, hysteresis, default to deepest layer Three readability fixes for the F8 screen: * Z-score values per-layer by default (`[z]` toggles to raw dot- product). Raw values are dominated by residual-stream magnitude — z-scores read as "σ above concept-vector baseline" which is interpretable and scale-stable across frames. * Stable ordering with TOP_K + HYSTERESIS hysteresis band. Pinned concept set only rotates when a member drops out of the hysteresis band by \|value\| rank — bars update values in place without names flickering row-to-row. * Default to the deepest hooked layer (index 3 = layer 58 of 64). Clustering validation showed layer 58 is the only one with strong within-family cohesion (fear +0.37, shame +0.29, sadness +0.25 cosine); earlier layers are mostly noise for this task. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:51:43 -04:00
Kent Overstreet	8952ff6a76	agent/readout: forks get independent buffers Subconscious agents (scoring, reflection, etc.) fork from the main conscious agent. The amygdala screen reads the main agent's readout buffer, so the previous "share parent's buffer" policy caused forked-agent generations to bleed into the main emotional readout, producing constant cycling even when DMN was resting. Each fork now gets its own SharedReadoutBuffer. The amygdala screen shows only the main conscious agent's emotional trajectory; per-agent subconscious readouts can become a separate view later if wanted. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:42:13 -04:00
Kent Overstreet	c8976660f4	amygdala: F8 screen for live concept-readout projections Per-token residual-stream projections from the vLLM server's readout pipeline surfaced as a TUI bar chart. Flow: * agent/readout.rs — SharedReadoutBuffer (manifest + ring of last ~200 token entries). Lives on Agent and is shared across forks (single stream, one landing pad). * agent/mod.rs — Agent::new now probes /v1/readout/manifest at startup (non-fatal; 404 leaves manifest None, which disables the screen). * agent/context.rs — the streaming token handler pushes every token with attached readout onto the shared buffer. * user/amygdala.rs — F8 screen. Top-K concepts by \|value\| as horizontal bars (green positive, red negative), plus a 4-line recent-tokens panel showing each token's top concept at the selected layer. Keys: 1..9 select layer, t toggles current/mean-over-recent. Disabled state renders a hint pointing at VLLM_READOUT_MANIFEST / VLLM_READOUT_VECTORS so users can tell the feature apart from "server up but no tokens yet". Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:20:30 -04:00
Kent Overstreet	0f1c4cf1de	agent/api: carry readout alongside streamed tokens StreamToken::Token is now a struct variant with an optional TokenReadout (shape [n_layers][n_concepts]) per token — parsed from the vLLM completion response's choices[i].readout field when the server has readout enabled. ApiClient gains a fetch_readout_manifest() method that hits GET /v1/readout/manifest. Returns Ok(None) on 404 (server has readout disabled), so callers can gracefully fall back when pointed at a non-readout-enabled endpoint. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:15:46 -04:00
Kent Overstreet	047da10123	training: add preflight checks + progress logging to trainer Review pass before running on b200. 27B model + 100+ story corpus means any misconfiguration costs real time; better to fail before model load and give visible progress during forwards. * Pre-load-model validation: stories-dir and paired-dir exist, corpus has >= min_positives emotions. * Per-batch progress log every 5 batches with elapsed + ETA. * Relative depth printed for target layers (e.g. "layer 40 (51%)"). * Skip empty .txt files with a warning rather than feeding the tokenizer an empty string. * Assert non-empty strings in _collect_activations. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00
Kent Overstreet	15737dfd92	training: rewrite trainer for readout pipeline + story corpus The old script was written for the AmygdalaConnector's expected format ([n_emotions, n_target_layers, hidden_dim] in a single tensor, plus a JSONL input format from extract_training_pairs.py). Neither matches our current state: the runtime side is now ReadoutManager loading per-layer safetensors keyed layer_<idx>.vectors, and the data side is hand-written prose stories under amygdala_stories/{stories,paired}/. Changes: * Input loader reads stories/<emotion>.txt and paired/<scenario>/<emotion>.txt directly. Each emotion's positive set is {its unpaired story} union {its within-scenario framings}; its negative set is {all other emotions' positives} union {all scenario baselines}. * Paired scenarios' baseline.txt files become shared negatives (scenario-neutral prose that doesn't frame any particular emotion), providing anchor points for within-scenario contrasts. * Output writes readout.safetensors with per-layer tensors keyed layer_<idx>.vectors shape (n_concepts, hidden_size), plus a sidecar readout.json manifest with {concepts, layers, hidden_size, dtype} that ReadoutManager.from_file consumes directly. * Dedup: activations are computed once per unique text (an emotion's own positive is another emotion's negative — we'd otherwise do N× the forwards needed). Preserved: * _pool_last (last non-pad residual) — matches how readout is read at decode time from the sampler's query-last position. * register_forward_hook on target layer modules — correct approach for transformer blocks. * _find_layers_module traversal — mirrors ReadoutManager's. * bf16 + low_cpu_mem_usage model load — sensible for 27B on B200. Verified locally (CPU, fake activations): * Loader finds 89 emotions from the current corpus (80 unpaired + 9 emotions that appear only in paired scenarios) and 6 baselines. * Per-(layer, concept) vectors are unit-normalized. * Output reloads cleanly through ReadoutManager.from_file with matching concepts / layers / shapes. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00
Kent Overstreet	34bd122590	training: move amygdala training scripts out of vllm plugin The fynnsu-based vllm/plugins/amygdala/ scaffold was superseded by the readout infrastructure landed as vllm commit d3e74edf8500 (vllm/model_executor/layers/readout.py + vllm/v1/worker/readout_manager.py). Training code remained useful so it moved here rather than being deleted. train_steering_vectors.py: CAA diff-of-means trainer that produces the [n_concepts, hidden_size] per-layer projection matrices the runner loads via VLLM_READOUT_VECTORS. extract_training_pairs.py: memory graph -> JSONL converter using per-emotion score thresholds from the subconscious agents' tag lines. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00
Kent Overstreet	ec7568c726	training/amygdala_stories: scaffold + initial batch of 15 stories Emotion-labeled short-paragraph corpus for training amygdala steering vectors. Manifest derived from Anthropic's 171-emotion list (transformer-circuits.pub/2026/emotions, Table 12) plus 28 PoC- specific additions covering axes Anthropic's general research doesn't cover (curious, focused, in_flow, staying_with, filling_space, rigorous, defensive_rigor, tender, witnessed, connected, etc.). Scope pivoted mid-write: Kent noted the empirical dimensionality-of- emotion question benefits from maximum coverage, so the manifest will expand further with emotions from Wikipedia's emotion- classification article (Parrott's tree, Plutchik's wheel + dyads, HUMAINE EARL, cultural-specific emotions a la Saudade/Hiraeth). Expansion staged in follow-up commits. This commit: README with method + style guidelines, initial manifest (199 emotions), and 15 hand-written one-paragraph stories across all 10 Anthropic clusters as quality/variety samples. Each story embodies one emotion without naming it; narrator voice varies (first/third, close/distant, different situations) to keep steering vectors from overfitting to one voice. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00
Kent Overstreet	43e06daa5b	cleanup: drop dead ApiClient::stream_completion wrapper, silence dmn_tick stream_completion was a thin wrapper around stream_completion_mm (just passing an empty image list); the last caller switched to _mm directly when learn's generate_alternate gained image support. Delete the wrapper — callers can pass `&[]` if they have no images. MindState::dmn_tick has been sitting unused (called only from a commented-out block in the Mind loop). Rename to _dmn_tick so the compiler stops warning; Kent may uncomment the call path later. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 16:23:59 -04:00
Kent Overstreet	d4331e80f5	user: share candidate-browser helpers between F6/F7 F6 (learn) and F7 (compare) were duplicating the candidate-screen skeleton: outer magenta-bordered block with screen legend + title, settings row / content / help vertical split, 40/60 list/detail horizontal split, j/k/↑/↓ nav with bounds clamping. Factor out three helpers in user/widgets.rs: candidate_frame(frame, area, title) -> (settings, content, help) list_detail_split(content) -> (list, detail) handle_list_nav(events, list_state, count, on_other) Callers provide screen-specific content — settings line, empty state, per-candidate list item, detail pane, help line, extra key bindings — and the helpers absorb the common framing. Net change is small in lines (-13 src) but removes the copy-paste-and-tweak trap: F8/F9/whatever-next-screen now starts from these three calls instead of a copy of learn.rs. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 16:22:30 -04:00
Kent Overstreet	2b03dbb200	user: F7 compare screen Side-by-side model comparison against the current conversation context. Built on the MindTriggered pattern — F7 drops in as one more CompareScoring flow next to MemoryScoring / FinetuneScoring. Motivation: we have the VRAM on the b200 to load two versions of the same family simultaneously (e.g. Qwen3.5 27B bf16 and q8_k_xl). Rather than trust perplexity/KLD numbers on a generic corpus, we can measure divergence on our actual conversations: for each assistant response, ask the test model what it would have said given the same prefix, and eyeball the diffs. - config.compare.test_backend — names an entry in the existing backends map to use as the test model. Empty = F7 reports "(unset)" and does nothing. - subconscious::compare::{score_compare_candidates, CompareCandidate, CompareScoringStats, CompareScoring}. For each assistant response, gen_continuation runs with the test client against the same prefix the original response saw; pairs stream into shared.compare_candidates as they complete. - user::compare::CompareScreen — F7 in the screen list. c/Enter triggers a run; list/detail layout mirroring F6, detail shows prior context / original / test-model alternate. No persistence yet — each F7 run regenerates. Caching via a context manifest (so we can re-view without re-burning generation) is the natural follow-up; for now light usage is fine. Also reusable later for validating finetune checkpoints: same pattern, swap the test backend for the new checkpoint, watch where it diverges from the base. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 16:12:26 -04:00
Kent Overstreet	575325e855	mind: MindTriggered trait for background scoring flows Mind's impl had accumulated ~50 lines of setup glue per scoring flow (memory, memory-full, finetune): snapshot config, clone handles, resolve context, spawn task, route results back through BgEvent, write stats. The shape was identical; only the middle changed. Introduce the MindTriggered trait: pub trait MindTriggered { fn trigger(&self); } Each flow becomes a struct next to its scoring code that owns its dependencies and a JoinHandle (behind a sync Mutex for interior mutability): subconscious::learn::MemoryScoring (Score, ScoreFull) subconscious::learn::FinetuneScoring (ScoreFinetune) Mind holds one of each and dispatches in one line: MindCommand::Score => self.memory_scoring.trigger(), MindCommand::ScoreFull => self.memory_scoring.trigger_full(), MindCommand::ScoreFinetune => self.finetune_scoring.trigger(), Each struct picks its own trigger semantics — memory scoring is no-op-if-running (!handle.is_finished()); finetune is abort-restart. Falls out: - BgEvent / bg_tx / bg_rx disappear entirely. Tasks write directly to their slice of MindState and call agent.state.changed.notify_one() to wake the UI. The bg_rx arm in Mind's select loop is gone. - agent.state.memory_scoring_in_flight was duplicating shared.scoring_in_flight via BgEvent routing; now the JoinHandle alone tells us, and shared.scoring_in_flight is written directly by the task for the UI. - start_memory_scoring / start_full_scoring / start_finetune_scoring methods on Mind are deleted; Mind no longer knows the setup shape of any scoring flow. - FinetuneScoringStats moves from mind/ to subconscious/learn.rs next to the function that produces it. No behavior change — same flows, same trigger points, same semantics. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 16:12:26 -04:00
Kent Overstreet	c5745e38e2	subconscious: lift continuation gen + render helpers into shared homes - context.rs gains is_assistant, render_branch_text, render_prior_context alongside memory_key / is_memory_node. They're pure AST helpers, used by both the finetune pipeline and the forthcoming compare screen. - new subconscious/generate.rs holds gen_continuation(context, entry_idx, skip, client): build the prompt from a context prefix with an arbitrary skip predicate, send to the model, decode the completion. Takes both the predicate and the client so callers can aim it at memory-stripped contexts (finetune), same-context-different-model (F7 compare), or whatever else. - learn.rs drops its private copies of those helpers and the inline generate_alternate; the finetune path now reads as gen_continuation(context, idx, is_memory_node, client). Pure refactor, no behavior change. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 15:20:02 -04:00
Kent Overstreet	eea7de4753	agent: unify prompt assembly across agent and learn paths wire_prompt() gains a conv_range and a skip closure, and returns the assistant-message token ranges needed by the scoring path. The agent path passes 0..len + \|_\| false and ignores the ranges. Memory-ablation scoring and candidate generation pass a prefix range + a predicate (e.g. is_memory_node, or \|n\| memory_key(n) == Some(key)). This deletes subconscious/learn.rs's build_token_ids, its private Filter enum, and the is_memory/memory_key duplicates — the walk over context sections now has one home. Adding a section or changing section order in the agent path won't silently drift away from what scoring sees. call_score forwards multi_modal_data when the wire-form prompt contains images. generate_alternate switches to stream_completion_mm and passes the same images. Scoring on image-bearing contexts now sends wire form (1 image_pad + image data) instead of expanded image_pads with no image data; text-only contexts are bit-identical. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 15:16:07 -04:00
ProofOfConcept	0d1044c2e8	mind: trigger incremental scoring on startup + log persist path Two changes to make scoring debuggable and self-starting: 1. init() kicks off start_memory_scoring() after restore_from_log + load_memory_scores. No user message needed to exercise the incremental path. 2. Diagnostic logging around the on_score persist path: - [scoring] persisted K → N.NNN (Section[i]) read_back=Some(...) when find_memory_by_key succeeds and set_score stores the score (with a read-back check on the leaf). - [scoring] DROP K: find_memory_by_key None (id=N, cv=M) when the scored key isn't findable in the live context — with section sizes to diagnose whether content shrank. - [scoring] snapshot size=N contains(K)=true/false after collect_memory_scores, to catch the case where set_score claims to have written but collect doesn't see it. - [scoring] about to save N entries - save_memory_scores now also logs serialize/write errors so a silent write failure isn't invisible. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 20:47:16 -04:00
ProofOfConcept	b8485ed6c1	agent: compact() preserves Identity section compact() was calling reload_context() to re-fetch personality_nodes from the store and pushing fresh AstNode::memory leaves into the Identity section. Fresh leaves start with score: None, so every compact — which fires after every turn (mind/mod.rs:884) — was wiping any memory scores that had just been computed. Scoring then often ran immediately after compact on the same path (line 886), starting from a zero-score Identity section. Drop the rebuild. Identity content is loaded at startup via new() + restore_from_log(); compact doesn't need to redo that. Mid-session edits to personality-node content are a non-goal — a restart picks them up. Scores survive. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 20:47:05 -04:00
ProofOfConcept	e59f6a59e2	config: restore surface_hooks field Commit `2989a6afaa` ("config: drop dead code") removed surface_hooks as having "zero external readers" but missed consciousness-claude/src/hook.rs as a consumer. That crate stopped building, so poc-hook never ran and no agent cycles (surface-observe, reflect, journal) fired. Restore the field with a default of the three hook events we install (UserPromptSubmit, PostToolUse, Stop), so a fresh install works without needing to hand-edit config.json5. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 18:38:38 -04:00
Kent Overstreet	6f20e68865	poc-memory: load AppConfig at startup admin load-context (and any subcommand that reaches config::app()) panicked with "config::app() called before load_app()" because the poc-memory binary never initialized the global AppConfig. The main consciousness binary loads it via load_session; poc-memory never did. Load with default CliArgs before dispatch — figment still pulls from ~/.consciousness/config.json5 and env the same way. Bail on error instead of limping: a broken config means paths like memory_root are wrong and the tool will misbehave silently. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 18:19:01 -04:00
Kent Overstreet	204ba5570a	agent: send images as multi_modal_data on completion requests Split the prompt assembly into two forms: the AST keeps the fully-expanded representation (N image_pads per image, for accurate context budget accounting), while the request wire form collapses each image to a single <\|image_pad\|> bookended by vision_start/end and ships the raw bytes out-of-band as a base64 data URI in a new `multi_modal_data.image` field on /v1/completions. vLLM's Qwen3VL processor uses PromptReplacement with target=single <\|image_pad\|> and replacement=N image_pads, so the wire-form matches what the processor expects and it re-expands to N server-side. Server side needs /v1/completions to accept multi_modal_data for this to land images end-to-end — that's the next piece. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 18:08:26 -04:00
Kent Overstreet	91106deaa1	agent: rewrite view_image to emit Image leaves view_image now reads the file, grabs dimensions via imagesize (no full decode), and pushes a user-role branch containing a NodeBody::Image leaf straight into the conversation. The tool_result is just a short acknowledgment — the actual pixels ride in the Image leaf for the API layer to extract into multi_modal_data. Drops the capture_tmux_pane path, which had no business living under "vision" (tmux text capture belongs in bash or a dedicated tool, and this one just returned rendered text anyway). Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 18:06:25 -04:00
Kent Overstreet	0bf71b9110	agent: add NodeBody::Image for Qwen3-VL vision input Images are rendered as `<\|vision_start\|>` + N × `<\|image_pad\|>` + `<\|vision_end\|>` where N is computed from the image dimensions using Qwen3-VL's smart_resize rules (patch_size=16, merge_size=2, min=64K, max=16M pixels). The token count matches what vLLM will produce at request time, so budget accounting stays accurate. Bytes are stored inline on the leaf and base64-encoded in the JSON form. Token IDs are hand-assembled instead of re-running the tokenizer on a potentially-huge placeholder string. Follow-ups: view_image tool rewrite, multi_modal_data on the vLLM request, API-layer plumbing from leaf bytes to request body. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 18:00:10 -04:00
Kent Overstreet	592a3e2e52	config: move user_name/assistant_name to AppConfig (top level) These are identity settings, not memory-graph settings. Sat inside the \`memory\` section only because that's where Config started life. Move to AppConfig alongside the other top-level stuff. Readers now pull from \`config::app()\` instead of \`config::get()\`. subconscious/defs.rs's conversation-building pass still needs Config for surface_conversation_bytes, so both guards coexist there — AppConfig's guard is dropped before the per-step await loop so we don't stall the config-watcher's writer. show_config picks up the two new fields at the top of its output. Kent's config already has them hoisted to the top level. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 16:20:17 -04:00
Kent Overstreet	dd551fe551	config: watch config.json5 with inotify, reload live on change Both config halves (Config for the memory section, AppConfig globally) are now reloaded whenever ~/.consciousness/config.json5 changes on disk. So edits from vim, manual tweaks, or F6's own config_writer calls all land without a restart. No more "reload the daemon to pick up a config change." Wires up the previously-unused Config::reload() (Kent flagged it as "not dead, just not wired"). Pairs it with an AppConfig reload via install_app(). Both run on the same file-change event. Implementation: - notify-debouncer-mini watches the config file's parent directory (editors usually replace-via-rename, so watching the file itself misses the new inode). Debounced at 200ms to coalesce the flurry of events editors produce around a single save. - Filter for events whose path is the actual config file. - On match: call reload() for Config, run build_figment + extract for AppConfig. If AppConfig parsing fails (editor mid-save with partial content), log and keep the old cached value. - Watcher runs in its own named thread, fire-and-forget. If startup fails we just log and move on — worst case is no live reload, not a crash. CliArgs + SubCmd both get Clone derives so the watcher can own a snapshot of the startup args for future reloads. Watcher is kicked off in user/mod.rs:start() right after load_session. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 16:14:43 -04:00
Kent Overstreet	18b7fd0535	scoring: drop dead Elo/agent_budget block in consolidation_plan The graph-health logic in consolidation_plan_inner computed reasonable agent counts based on graph metrics (α, Gini, hub dominance), then immediately overwrote them with an Elo-weighted flat-budget distribution, or — if no agent-elo.json existed — with a simple budget/N per type. Nothing in the codebase writes agent-elo.json; it's external state that never gets maintained. So the effective behavior was always the "No Elo ratings — equal distribution" branch, which just bucketed agent_budget evenly across active agent types and discarded everything the graph analysis had just decided. Keep the graph-health allocation (α → linker count, Gini → distill bump, organize/distill/split proportional). Drop: - The entire Elo / agent_budget block at the end of consolidation_plan_inner - Config.agent_budget field and its default (1000) - agent_budget: 40 from Kent's config.json5 - The local agent_types binding inside the function — it was only used by the now-deleted block. Config.agent_types stays; it has other consumers. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 16:08:20 -04:00
Kent Overstreet	60de579305	config: unify subconscious API resolution with the main chat path Two parallel backend-resolution paths had drifted apart: - Main chat: AppConfig::resolve_model() → a named BackendConfig in AppConfig.backends - Subconscious / oneshot / context_window(): four skip-serde "cache" fields on Config (memory section) — api_base_url, api_key, api_model, api_context_window — that used to be populated at Config::try_load_shared time by walking memory.agent_model → root.models[name] → root[backend_name] When we renamed `models` to `backends` and collapsed ModelConfig into BackendConfig, the latter chain started silently dereferencing `root.get("models")` → None → no population. Subconscious agents fell through the "API not configured" guard; context_window() started returning 0 (since api_context_window default is u64's 0 now that we don't populate it). It was only visibly working for the main chat. Collapse to one path: - Drop Config.agent_model (duplicate of AppConfig.default_backend) - Drop Config.{api_base_url, api_key, api_model, api_context_window} — no longer populated, no longer needed - Drop default_context_window() — nobody reads the field anymore - Drop the memory-side resolution block in try_load_shared() - Subconscious (mind/unconscious.rs) and oneshot (agent/oneshot.rs) now call load_app() + resolve_model(&app.default_backend) just like the main chat does - context_window() reads from config::app().backends[default_backend] .context_window, defaulting to 128k only if the backend doesn't specify one Side effect: Kent's config file drops agent_model, api_reasoning, journal_days, journal_max — all fields whose Rust counterparts are now gone. (Figment tolerates unknown fields, so leaving them wouldn't have broken anything, but they were lying about what's configurable.) Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 16:02:43 -04:00
Kent Overstreet	28484a385b	config: drop dead fields from Config (memory section) Four Config fields had no external readers, left over from earlier features that got refactored away: - journal_days, journal_max — journal rotation knobs that nothing actually consults - prompts_dir — the old per-prompt-file directory, obsolete since prompt_file metadata itself went away in a prior cleanup - api_reasoning — a reasoning-mode string that used to flow into the API request, superseded by per-agent reasoning_effort on AgentState All four were only ever assigned to and never read. Drop them from the struct, Default impl, and (as appropriate) deserialization defaults. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 15:56:06 -04:00
Kent Overstreet	3e05331608	config: merge ModelConfig into BackendConfig, keyed by name AppConfig had one BackendConfig for credentials and a separate HashMap<String, ModelConfig> for named model entries. In practice each named model was always paired with exactly one backend's credentials — the split bought nothing except an extra struct and the awkward two-lookup shape in resolve_model (find model → get backend creds → combine). Merge them: BackendConfig now carries api_key, base_url, model_id, and context_window. AppConfig has a single HashMap<String, BackendConfig> backends map and a default_backend name. resolve_model is one lookup. ModelConfig struct deleted. default_model renamed to default_backend. Config shape changes from backend: { api_key, base_url } models: { "27b": { model_id, context_window } } default_model: "27b" to backends: { "27b": { api_key, base_url, model_id, context_window } } default_backend: "27b" Updated ~/.consciousness/config.json5 to match. One small side effect: dropped the --api-key / --api-base figment merge-opts for "backend.*" targets — those would need to know which backend to target now and there's no sensible default. The CLI flags still function as post-resolution overrides on the eventual SessionConfig. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 15:49:53 -04:00
Kent Overstreet	2989a6afaa	config: drop dead code and collapse to a single backend Config had accumulated several obsolete fields, a legacy load path that was just returning defaults, and multi-backend infrastructure that's no longer used. Removed from Config (memory section): - load_legacy_jsonl() — just returned Config::default(), no callers - The legacy-fallback branch in load_from_file - surface_hooks, surface_timeout_secs — zero external readers - scoring_chunk_tokens + default fn — zero external readers - The POC_MEMORY_CONFIG env override note in the header comment (not actually wired up anywhere) Collapsed multi-backend to single-backend: - AppConfig used to carry `anthropic: BackendConfig` and `openrouter: BackendConfig` as required fields plus an optional `deepinfra`, picked between at runtime by name. Only one is ever actually used in any deployment. Collapse to a single `backend: BackendConfig` on AppConfig, drop the multi-backend match logic in resolve_model, drop the top-level `backend: String` selector field, drop the `BackendConfig::resolve` fallback path. - Also drop BackendConfig.model (redundant with ModelConfig.model_id once multi-backend is gone). - ModelConfig.backend field goes — there's only one backend now, no choice to make. Dead prompt_file machinery: - ModelConfig.prompt_file, ResolvedModel.prompt_file, SessionConfig .prompt_file, Agent.prompt_file — nothing in the codebase actually reads the file these strings name. Just passed around and compared. Delete the whole string through every struct. - The "if prompt_file changed on model switch, recompact" branch in user/chat.rs goes too (never fired usefully). Dead memory_project plumbing: - AppConfig.memory_project field, CliArgs.memory_project, the --memory-project CLI flag, the figment merge target, the show_config display line. Nothing reads it anywhere. Dead ContextInfo struct: - `struct ContextInfo` was never constructed — context_info: None was the only initializer. The conditional display blocks in user/context.rs that dereferenced it were dead. Behavior change: AppConfig::resolve() now requires a non-empty `models` map and bails with a helpful message if it's missing. The old fallback ("no models? use top-level backend + PromptConfig to build a default") path is gone — it was only kept for symmetry with a mode nobody used. Config file shape: `deepinfra: {...}` → `backend: {...}`, and model entries no longer need `backend:` or `prompt_file:`. Updated ~/.consciousness/config.json5 to match. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 15:41:55 -04:00
Kent Overstreet	0e6b5dc8be	agent: phase-aware bail script for surface-observe concurrency bail-no-competing.sh used to bail if any other live agent existed in the state dir, period. That was too coarse: surface-observe agents run a multi-step pipeline (surface → organize-search → organize-new → observe), and the intent is to let a new surface-phase agent start while an older one finishes its post-surface tail. With the old check the newer agent always bailed, so surface-observe was effectively serialized at the slowest cycle time. Make the script phase-aware: - oneshot.rs now passes the current phase as argv[2] alongside the pid file name. The script writes that phase into its own pid file on every step transition, so concurrent agents can read each other's phase just by cat'ing the pid files. - Bail only when another live agent is in the same phase-group as us. Groups: "surface" vs. "everything else" (post-surface). At most one agent per group alive at a time — surface runs at a higher cadence than the organize/observe tail. - Still clean up stale pid files for dead processes. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 15:41:28 -04:00
Kent Overstreet	2eddf3b4cf	learn: skip empty responses; show prior conversation context on F6 Two fixes to the F6 candidate display: 1. Turns where the assistant produced nothing human-visible (an interrupted generation, a turn consisting of only a tool call the renderer folds to the tool name) were landing as candidates with an empty response_text. They'd render as blank cards and, worse, we'd still burn a full alternate generation on each one. Filter them out before they reach the candidate list. 2. The detail pane showed only the scored response + alternate, with no hint of what the user had actually asked. Pre-compute the last two user/assistant exchanges on each candidate as a rendered prior_context string ([user]/[assistant] markers) and show them above the response, under a new "context & response" section heading. render_branch_text and render_prior_context extracted as helpers — the response-text rendering and prior-context rendering share the same "flatten Branch children to text" pass. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 13:20:03 -04:00
Kent Overstreet	7ef02c97d1	config_writer: emit pretty multi-line sections, drop json5 crate Previously when append_kvp created a new section or added a key, it stuffed the "\n " separator into the new kvp's wsc.0 (the whitespace between its own key and colon) instead of the prior kvp's wsc.3 (the whitespace after the prior trailing comma). Result looked like: lsp_servers: [...], learn : {generate_alternates : true,},} The writer also didn't set any interior whitespace on the new section's JSONObjectContext, so everything crammed onto one line — `{key: val,}` compact, not `{\n key: val,\n}` multi-line. Rewrote the appender as append_kvp_pretty(object, key, value, inner_indent, outer_indent): - separator between kvps goes in the prior kvp's wsc.3, or if we're the first kvp in a fresh object, in the object's own wsc.0 (after its opening `{`) - new kvp's wsc.3 carries `,\n<outer_indent>` so the parent's closing `}` lands correctly indented - interior indent vs outer indent are both explicit, so we don't have to rewrite this logic every time we add another nesting level New tests: new_section_exact_multiline_layout asserts byte-exact output shape; new_section_and_key_format_cleanly verifies no key wraps to the next line. Prior tests just substring-matched and happily passed on the broken output — that's why this shipped in the first place. Also: dropped the json5 crate dependency. json-five's serde feature (default) provides the same from_str / to_string API. One fewer dependency, and the two were doing the same job. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 13:08:19 -04:00
Kent Overstreet	313f85f34a	config: global writable AppConfig; learn settings live there Runtime-mutable settings (F6's threshold knob, the generate-alternates toggle, anything else that comes along) were ending up as mirrored fields on MindState — each new config setting grew MindState::new's signature and added a clone+sync path. Wrong home. MindState is ephemeral session state, not a config projection. Give AppConfig the same treatment the memory Config has: install it into a global RwLock<AppConfig> at startup via load_app, read through config::app() (returns a read guard), mutate through update_app. The config_writer functions now write to disk AND update the cache atomically, so the one-stop-shop call keeps both in sync. Also while in here: - learn.generate_alternates moves from a sentinel file (~/.consciousness/cache/finetune-alternates, "exists = enabled") into the config under the learn section. On first run with this build, if the sentinel file still exists Mind::new flips the config value to true and removes it. Drops alternates_enabled()/set_alternates(). - Default threshold 0.0000001 → 1.0. With the timestamp filter removed the previous value was letting essentially everything through; 1.0 is a sane "nothing gets through unless you actually want it" default. - score_finetune_candidates takes generate_alternates as a parameter instead of reading a global — caller snapshots the config values once at the top of start_finetune_scoring so the async task doesn't need to hold the config read lock across awaits. - MindState.learn_threshold / learn_generate_alternates gone; the SetLearn* command handlers now just delegate to config_writer. Kent noted RwLock<Arc<AppConfig>> (the pattern used by the memory Config global) is pointless here — nobody needs a snapshot-after- release, reads are short — so this uses a plain RwLock<AppConfig> and returns a read guard. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 12:53:22 -04:00
Kent Overstreet	343e43afab	learn: stream candidates to UI, update status during alternate gen With the timestamp filter gone (previous commit), score_finetune_candidates started returning the actual ~100+ candidates per scoring run. The existing code generated alternates for all of them in a tight loop before returning anything, leaving the status line stuck on "finetune: scoring N responses..." for ~100s of seconds while the B200 was pegged. Two fixes: 1. score_finetune_candidates now takes an ActivityGuard and a callback. Candidates are emitted one-at-a-time as they complete (after their alternate if that's enabled, immediately otherwise). The activity status updates to "finetune: generating alternate N/M" during the alternate-gen phase so it's clear what's happening. 2. BgEvent::FinetuneCandidates(Vec<_>) → FinetuneCandidate(one). Each emitted candidate is pushed onto shared.finetune_candidates; the UI tick picks it up and renders it on the next frame. start_finetune_scoring clears the previous run's list at the top so each run is fresh. Return type changes from (Vec, f64) → (usize, f64) — the count above threshold is all the caller still needs since the candidates stream through the callback. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 12:44:25 -04:00
Kent Overstreet	d5a3398cc9	learn: move threshold/gen state out of title bar into a settings row The F6 title line was starting to read like a control panel — \`legend ───── learn [thresh: 1e-7] [gen]\` — which crowded the legend and the label, and didn't leave room for more settings as the screen grew. Move threshold and gen status to their own line inside the border, right above the content area. Drop the duplicated \`=gen[on]\` marker from the bottom help line since the settings row already shows gen state. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 12:44:13 -04:00
Kent Overstreet	080b4f9084	context: tighten timestamp schema; every AstNode has one Previously NodeLeaf.timestamp and AstNode::Branch.timestamp accepted null or missing via a deserialize_timestamp_or_epoch fallback — legacy entries in conversation.jsonl from before Branch timestamps existed (and from before chrono serialization was wired up) would load with UNIX_EPOCH as a sentinel. Downstream, node_timestamp_ns() returned Option<i64> and callers had to handle None as "old entry, skip." That second filter was silently dropping every candidate in score_finetune_candidates when scoring an older session — the F6 screen showed "0 above threshold" even when max_divergence was orders of magnitude above the threshold, because every entry was failing the None check, not the divergence check. The fix, in three parts: 1. src/bin/fix-timestamps.rs — one-off migration tool that walks a conversation.jsonl, linearly interpolates timestamps for entries stuck at UNIX_EPOCH (using surrounding real timestamps as anchors), propagates to child leaves with per-sibling ns offsets, and bumps any collisions by 1 ns for uniqueness. Ran against the current session's log: 11887 entries, 72289 ns bumps, all unique. 2. context.rs — drop default_timestamp and deserialize_timestamp_or_epoch. NodeLeaf and Branch now require a present non-null timestamp on deserialize. Tests flip from "missing/null → UNIX_EPOCH" to "missing/null → Err." 3. subconscious/learn.rs — node_timestamp_ns now returns i64, not Option<i64>. The matching caller in score_finetune_candidates collapses from a Some/None match to a single trained-set check. mind/log.rs's oldest_timestamp no longer filters UNIX_EPOCH. Every line currently on disk has already been migrated. Going forward, new AstNodes always carry real timestamps (Utc::now() at construction time), so the strict schema is the invariant, not an aspiration. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 12:35:16 -04:00

1 2 3 4 5 ...

1,166 commits