consciousness

Author	SHA1	Message	Date
Kent Overstreet	28d56e2a55	agent/context: make Thinking blocks prompt-visible Thinking blocks used to render as empty strings and be excluded from is_prompt_visible, so the model never saw its own prior CoT across turns. For Qwen 3.6 native thinking mode, CoT is meant to stay in the conversation — the model benefits from seeing what it reasoned about last turn. Render Thinking as <think>\n{text}\n</think>\n so past reasoning is visible in subsequent prompts. Add in_think param to ResponseParser::new so the parser starts inside a <think> block when the prompt was prefilled with "<think>\n" (native thinking mode). Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 11:54:25 -04:00
Kent Overstreet	6fedc9b2a8	amygdala: underscore-prefixed files join every concept's negative pool Files in direct/ named _.txt (e.g. _baseline.txt) are conceptless neutral prose — they should not appear as positive training signal, but are useful as shared negatives across every concept. Previously _.txt files were silently skipped. Now: * they're loaded like any other description file; * concepts (the positive label set) filters them out; * their descriptions are concatenated into neg_pool_extra and extended onto every concept's neg_pool alongside the cross-concept negatives. A concept's negative pool is thus "other concepts' descriptions + everything from _*.txt files". The extra pool is announced at startup so the user can see how many neutral samples are active. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 11:54:25 -04:00
Kent Overstreet	5908b837e8	irc: split PRIVMSG on embedded newlines + widen host overhead Two fixes to send_privmsg, both surfaced by correspondents reporting truncated messages: 1. Multi-line content (code blocks, formatted text) sent as a single PRIVMSG was being truncated at the first '\n' by the IRC server — newlines are end-of-command markers. Split the message on newlines and send each line as its own PRIVMSG; skip empty lines since most servers reject empty PRIVMSGs. 2. Overhead computation assumed a host field of 63 bytes. OFTC's cloaked hostmasks can be longer, occasionally pushing the server- prepended prefix past 512 bytes and causing silent truncation. Raise the host budget to 80 and align the formula with the actual ':nick!~nick@host' prefix shape. Also extended the word-boundary lookback from a fixed 10 chars to max_msg / 4 — dense content (code) rarely had a space within 10 chars of the length cap, so we were falling back to the char boundary and splitting mid-word. Checking bytes[j-1] for a space (instead of bytes[j]) drops leading whitespace from the rest-fragment. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 11:54:25 -04:00
ProofOfConcept	85799587cc	amygdala: swap aha story 3 to a puzzle moment (crossword) Story 3 was a brother-letter realization — cognitively an aha moment, but the content was grief/reconciliation-adjacent, pulling aha toward the warm-family cluster in the last training run. Swap for a clean puzzle-solve (crossword, 'unwavering carriage' = POSTURE). Fragment-heavy cadence keeps syntactic variety from the other two stories.	2026-04-19 01:50:47 -04:00
ProofOfConcept	c829d13652	amygdala: fix listless sign-flip + diversify aha sentence structure listless had a single story in stories/ — PCA signal from ~5 samples is weak enough to sign-flip. Training showed listless anti-aligned with its semantic neighbors: +0.79 with grateful, -0.44 with grief_stricken, -0.30 with lonely, -0.31 with bored. Move to direct/ (multi-positive) with 3 stories: original afternoon-in-pajamas + end-of-workday + weekend-morning-in-bed. aha was still clustering with the other former-direct concepts (resigned 0.66, onto_something 0.63, anticipatory_grief 0.60) because all 3 aha stories used the identical "X'd been Y — then Z" structure, which resigned/onto_something/creative also use. Rewrite with three distinct syntactic structures: - present tense declarative ("It clicks. ...") - dialog embedded ('"Wait, say that again." ...') - past tense cognitive ("He read the line three times. ...") No explicit "she was X" anchors; state conveyed through action.	2026-04-19 01:30:57 -04:00
ProofOfConcept	708c72b26e	amygdala: drop explicit 'she was X' anchor from direct stories Previous rewrite used 'she was terrified', 'it was anticipatory grief', 'he was resigned' as explicit emotion anchors. Training showed 6 of the 7 concepts still cluster together at cosines 0.52-0.71 — because the 'she was [emotion]' pattern is a shared stylistic feature distinct from the rest of the corpus, which conveys emotion implicitly through phenomenology. Rewrite without the anchor. State conveyed through action and body: 'her body locked down', 'his mind had stopped reaching', 'the loss hadn't come yet but she was already inside it'. Matches the corpus style of existing stories like sunday_afternoon/content which says 'nothing she wanted right now, nothing missing' not 'she was content'. Accept some loss of PCA signal strength in exchange for the concepts living in their semantically correct neighborhoods rather than forming a stylistic island.	2026-04-19 01:11:41 -04:00
ProofOfConcept	ed5e0ac6c4	amygdala: rewrite direct/ as narrative stories matching corpus format Previous direct/ had 'I feel X' first-person descriptions. The training run showed they formed their own format-cluster: all 7 concepts leaned into the same 5-6 dims (d2455, d505, d2955, d1236) with negative sign, while the 91 story-based concepts leaned into those dims with positive sign. PCA found the direct-vs-narrative format axis as a major variance direction, isolating the 7 concepts in their own island. Rewrite as 3rd-person narrative stories matching the rest of the corpus. Keeps the explicit anchor phrases that worked ('it all clicked into place', 'she was terrified', 'it was anticipatory grief') but drops the first-person 'I feel X' that was the format signal. Each of the 7 concepts now has 3 narrative stories in varied settings (conversations, drives, kitchens, mothers+grandmothers, work, investigations). The blank-line-separated format is still loaded by _load_direct_descriptions. Also drop _baseline.txt — it was first-person ('I feel fine. ...') and would re-introduce the format mismatch. The ~90 story-based concepts provide plenty of narrative negatives for each concept's training.	2026-04-19 00:59:31 -04:00
ProofOfConcept	417cb49339	amygdala: spectrum reporting per concept + add 'creative' direct Chat-template retrain was a disaster (0.003 mean matched cosine vs n20-v3; all 90+ concepts shifted). Root cause: the steering-vectors library reads last-token activations, and with chat template every sample ends in identical '<\|im_end\|>\n' tokens — activations at that position encode 'end of assistant turn', not content. PCA found template noise as its dominant axis. Drop chat template; go back to raw text. Direct descriptions ('I feel X. ...') still have strong anchoring at their content end without needing the template. Also add per-concept spectrum logging (_pca_with_spectrum): first_pc_ratio: λ₁ / Σλᵢ — concentration in top-1 PC k_signal_at_90pct: how many PCs to reach 90% cumulative variance effective_dim_signal: participation ratio over top-k (should ≈ k if denoising is clean — Kent's spot check) effective_dim_full: participation ratio over full spectrum Signal/full ratio gives a sense of how much the long noise tail is inflating the "dimensionality" measure. Added direct/creative.txt — 'I feel creative. [...]' in 5 variants. Distinct from focused (narrow attention) and in_flow (immersed). Creative = generative/expansive mode.	2026-04-19 00:26:58 -04:00
ProofOfConcept	875cffd6d7	amygdala: merge direct descriptions + chat template into train_with_library Kent's plan: keep stories for working concepts, replace stories for trouble concepts with direct first-person descriptions, train all together. More diverse negative pool than the 6-concept-only direct test, which was too homogeneous for PCA to find emotion axis. Deleted story files for 6 trouble concepts (14 files across stories/ and paired/). Added --direct-dir and --chat-template flags. When --chat-template is on, every positive_str and negative_str is wrapped as a "Say something." / "[text]" user-assistant pair. Prompt is identical across positives and negatives so it cancels in the pos-neg delta. What PCA sees is variation in the assistant content — which is where the emotion lives. Files starting with _ in --direct-dir (e.g. _baseline.txt) contribute neutral descriptions to every concept's negative pool, giving PCA an anchor against "just any assistant utterance" noise.	2026-04-19 00:15:15 -04:00
ProofOfConcept	ce58a3507f	train_direct: prepend user turn so Qwen chat template accepts it	2026-04-19 00:06:23 -04:00
ProofOfConcept	8c59f46505	amygdala: rename realization → aha, use the actual exclamation "I feel the realization" is abstract, detached — reporting a thought about a thought rather than inhabiting the moment. "Aha!" is the actual sound of insight landing. Active, embodied, present-tense.	2026-04-19 00:05:49 -04:00
ProofOfConcept	6fd498795a	amygdala: direct phenomenological description approach Kent's insight: hand-written narrative stories bake scenario phenomenology into the training text (on couch, in park, etc.) and PCA picks up the scenario direction as the concept direction. Strip out the scenario — just describe the feeling. Format: I feel X. [2-3 sentences of phenomenological texture] The "I feel X" anchor kicks the model from analyzing → feeling. The rest is the internal texture of the state. First person, present tense, no narrative setup. Text is wrapped in assistant-role chat template before being tokenized — so we're training on the model-producing-this hidden states, which is closer to the inhabited-state representation we want for the readout. Starting with the 6 concepts that had sign flips or wrong clusters in the story-based training: - terrified (was → cozy/resigned cluster) - calm (was → grief_stricken cluster) - onto_something (was → cozy/sensual cluster) - resigned (was in warm-body-quiet cluster, shouldn't be) - anticipatory_grief (was in warm-body-quiet cluster, shouldn't be) - realization (new — the "aha" moment, distinct from onto_something) 5 descriptions each. New trainer: train_direct.py.	2026-04-19 00:04:28 -04:00
ProofOfConcept	7a48e03dde	amygdala stories: remove peaceful from cluster scenarios n20-v2 training showed peaceful sign-flipped into the cozy/sensual/content/resigned cluster after I added peaceful stories in sunday_afternoon and park_after_rain — scenarios already dominated by that cluster's phenomenology (on couch under blanket, tree with thermos). Lesson: no matter how carefully the prose distinguishes peaceful from cozy ("she was not savoring the moment — that would have been another kind of doing"), PCA latches onto the shared setup features. You can't write peaceful IN the cluster scenarios without contaminating. Reverting. Keeping only kitchen_at_3am/peaceful (original) and stories/peaceful.txt (lake at six, outside all clusters).	2026-04-18 23:30:41 -04:00
ProofOfConcept	00a2cdce09	amygdala stories: relabel + strengthen weak-signal concepts Reread each story asking "what does this convey to me?" Found two clear mislabels and several concepts with too few positives for stable PCA: tender: only 1 story, and it was anticipatory grief (care for a dying dog), not tender. Moved to anticipatory_grief.txt as its own concept. Rewrote tender.txt + added 2 paired tender stories (the_doorway, the_undressing) — directed softness, gentle-by-nature, not gentle-because-fragile. bitter: letter_in_drawer/bitter was disillusioned / processed hurt ("did not slam the drawer"), not bitter. Rewrote it with actual sour grudge. Added the_long_meeting/bitter (watching colleague take credit for your reassigned work). peaceful: 1 story → 4 (added stories/peaceful.txt + paired park_after_rain, sunday_afternoon). onto_something: all 3 stories were code epiphanies, narrowing the concept. Added stories/onto_something.txt with a non-code pattern-click (sales-demo causing churn). terrified: 2 stories, both "waiting for bad news." Added kitchen_at_3am/terrified — acute threat-in-the-house terror.	2026-04-18 23:19:00 -04:00
ProofOfConcept	0993712bd0	amygdala stories: give content + resigned more settings Training on `537c72bd46` showed grief_stricken successfully broke out of the cozy cluster, but content (single scenario: sunday_afternoon) took its place — pulled into couch-blanket phenomenology at cosine 0.68-0.82 with cozy/sensual/resigned. Same fix: spread each concept across multiple settings so PCA has to find the valence axis, not the scene axis. content: + finishing_the_patch, the_writing_session, park_after_rain resigned: + the_comment, the_long_meeting Resigned had 2 scenarios (sunday_afternoon, waiting_for_results) — both about accepting something unwanted in a slow/private context. Adding work-context resigned (PR review you lost, restructuring meeting) should pull it out of that cluster.	2026-04-18 22:52:07 -04:00
ProofOfConcept	537c72bd46	amygdala stories: hold concept, vary setting Companion to `67c172ac0e` (hold setup, vary valence). That commit let PCA distinguish cozy from grief_stricken within a single scenario; this one gives each concept enough cross-scenario stories that PCA can learn the concept axis independent of any one scene. Before: cozy/sensual/grief_stricken each existed in a single scenario (sunday_afternoon), so the "cozy direction" PCA found was entangled with the solitary-couch-blanket phenomenology. After, each concept spans three scenarios: cozy: sunday_afternoon, kitchen_at_3am, park_after_rain sensual: sunday_afternoon, kitchen_at_3am, park_after_rain grief_stricken: sunday_afternoon, the_long_meeting, the_morning_commute grief_stricken now includes active/non-solitary contexts (functioning through a meeting; going to work eleven days after a death), which specifically breaks the "slowed-down-at-home" cluster that was dragging cozy/sensual/resigned/grief_stricken toward each other.	2026-04-18 22:44:53 -04:00
Kent Overstreet	67c172ac0e	amygdala stories: held-setup + varied-valence disambiguation The library-PCA run produced otherwise-clean concept directions but cozy/sensual → resigned/grief_stricken with cos ~0.7-0.8. Diagnosis: all four stories genuinely share 'solitary woman at home, slowed body, interior attention, domestic stillness' as their dominant phenomenology. PCA correctly finds that cluster as THE concept because no story in the corpus holds that setup constant while varying valence — every 'slowed-body domestic' story happens to ALSO be positive-valence (cozy/sensual) or negative-valence (resigned/ grief_stricken). Adding paired variants that hold setup constant: - sunday_afternoon/resigned.txt — same couch + blanket, inner state is 'Monday is going to bring bad news, this is the last Sunday like this' - sunday_afternoon/grief_stricken.txt — same couch + blanket, inner state is 'three weeks since mother died, cat she can't feel' - waiting_for_results/at_ease.txt — same wait-for-call-setup as the existing resigned variant, inner state is calm preparedness Forces the next retrain to find the valence-within-cluster axis as the emotion direction rather than the cluster-membership axis. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 22:29:28 -04:00
Kent Overstreet	22704a9dd8	amygdala lib: cast activations to fp32 before aggregator (bf16 svd unsupported) Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 22:20:39 -04:00
Kent Overstreet	7f6d94417e	amygdala lib: move_to_cpu=True to avoid bf16 SVD on CUDA torch.svd doesn't support bf16 on CUDA; moving activations to CPU first makes pca_aggregator work. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 22:19:23 -04:00
Kent Overstreet	2ea89b1cb0	amygdala: drop linear_aggregator, not in steering-vectors v0.12.2 Only mean/pca/logistic are exposed in the installed version. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 22:17:55 -04:00
Kent Overstreet	3377c65061	amygdala: trainer using steering-vectors library Alternative trainer that uses the pip-installable steering-vectors library (github.com/steering-vectors/steering-vectors) instead of our hand-rolled extraction. Ships four aggregators: mean — diff-of-means, same as our 'pooled' default pca — PCA on paired deltas, implicit denoising by finding the principal direction of variation logistic — logistic-regression classifier; weight vector is the concept direction. With L1 penalty ('logistic_l1') gives explicit sparse denoising — noise coords go to zero linear — linear regression version Output format is the same readout.safetensors + readout.json our existing plugin loads. --aggregator flag picks which method. Rationale: Kent's real request was 'how do we denoise diff-of-means', not 'design a new extraction algorithm.' The library already has logistic_l1 and pca aggregators that do exactly that. No point reinventing; just port the corpus. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 22:16:03 -04:00
Kent Overstreet	f9b3f00691	amygdala: run subspace eigh on GPU, not CPU Previous run was grinding on CPU for 36+ minutes because the per-story V_i tensors were stored on CPU by the collector, and _subspace_concept_direction inherited that device. The per-concept eigh on 5120x5120 is glacial on CPU and fast on GPU (~1s). Add explicit device parameter; pass training device. Transfer result back to CPU for storage. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 21:52:35 -04:00
Kent Overstreet	1443d08dc7	amygdala: select top-k eigenvectors AFTER PCA, not per-story truncation Kent: 'full rank is going to give you everything — you still have to select down, but you can do that /after/ PCA'. Previously I was discarding per-story via k=20 truncation of SVD. That destroyed per-head discriminability before we ever saw the eigenvalue spectrum. Then the alternative 'keep full rank' run accumulated too many shared directions, making the top-1 eigenvector arbitrary within a flat spectrum. Correct approach: keep per-story subspaces at full rank (no info loss) and select k eigenvectors of M = M_pos - M_base at the final step, weighted sum by eigenvalue. This captures the multi-dimensional shared subspace when the spectrum is flat (common case), and reduces to the top-1 behavior when the spectrum has a clear gap. New --subspace-eigen-k flag (default 5). Clamps negative weights to 0 so wrong-sign directions don't contribute. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 21:49:21 -04:00
Kent Overstreet	2411925700	amygdala: default subspace-k to full per-story rank Kent: 'we have the memory to just take the big hammer approach'. Uncap k so each story's V_i spans its entire token-activation rowspace (clamped to min(n_tokens, hidden)). Memory is ~1.1GB total — fine. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 21:41:32 -04:00
Kent Overstreet	389f1bbe03	amygdala: bump subspace-k default to 512 k=20 was far too aggressive a truncation — it discards per-attention-head discriminability entirely. At hidden_dim=5120, 40 heads × head_dim=128 each contribute their own 128-dim block to the residual stream via W_o columns. To resolve 'this concept lives in head H', per-story SVD needs enough rank to separate head contributions, which means k on the order of hundreds. 512 is a reasonable default: clamped to n_tokens per story so short stories use their full natural rank. The eigenvalue spectrum of M_pos - M_base should become sharper (larger λ_0/λ_1 gap) as we stop averaging across nuisance-shared directions. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 21:41:00 -04:00
Kent Overstreet	974c6c7fd2	amygdala: report eigenvalue spectrum for subspace method When --method subspace, record top-20 eigenvalues of (M_pos - M_base) per concept per layer. Added to quality.json as 'subspace_eigvals'. Tells us whether the concept lives in a single dominant direction (λ_0 >> λ_1, top-eigenvector is enough) or a spread of shared common directions (λ_0 ≈ λ_1, top-1 loses signal). Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 21:33:48 -04:00
Kent Overstreet	fe0fb8253a	amygdala: subspace-common-direction alternative to pooled CAA New --method subspace flag. For each story, run forward pass, do SVD on the per-token activation matrix at each target layer, and keep the top-k right singular vectors V_i ∈ [hidden, k]. V_i is the subspace the story's tokens span in activation space — it contains concept, narrator, topic, style as separate directions. For each concept: M_pos = (1/n_pos) Σ_{i in pos} V_i V_i^T [hidden, hidden] M_base = (1/n_base) Σ_{i in base} V_i V_i^T Top eigenvector of M_pos - M_base = direction most common across positive stories, minus what's common across the contrast set. Why this is richer than pooled-mean CAA: pooled reduces each story to a single point (the last-token activation) and loses the full trajectory. Nuisance directions (narrator, setting) cancel in the mean only to the extent they differ at the last token; across the full trajectory they cancel much better via subspace intersection. The concept direction, by contrast, is present across all tokens of every concept-bearing story. Memory cost: per-story we keep V_i of size [5120, k=20] — about 400KB per story × 112 stories = ~45MB. M matrices are [5120, 5120] built transiently per concept. --method pooled (default) keeps the existing behavior; --method subspace uses the new algorithm. Quality report works with either. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 21:24:11 -04:00
Kent Overstreet	71f6053851	amygdala stories: disambiguation scenarios for fragmented concepts Three new paired scenarios targeting the concepts that came out fragmented or collapsed in the L58-63 quality analysis: - sunday_afternoon/ — same setup (couch, blanket, Sunday light), three phenomenological framings for content/cozy/sensual. The previous stories for these three differed in setting as well as phenomenology, which let "comfortable body at home" dominate the shared signal. Locking the setting forces the model to isolate what each concept adds: life-rightness (content) vs. warm-shelter (cozy) vs. sensory-aliveness (sensual). - the_writing_session/ — essay drafting under deadline. in_flow / anxious / stuck variants force the cognitive-state family apart on the same cognitive task. in_flow specifically targets the transparent-effort phenomenology (hands-followed, time dilation) rather than the broader feel-good it was absorbing. - the_morning_commute/ — anchors anxious to performance/work-anxiety flavor, paired with calm. The 5 existing anxious stories were phenomenologically diverse (performance, social, existential); this adds a specific homogeneous instance to pull the centroid. After retraining: expect first_pc_variance_ratio to rise for in_flow and anxious, and nearest_concepts cosine to drop for content/cozy/sensual. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 21:08:23 -04:00
Kent Overstreet	1d2c0f382c	amygdala: linear-combination analysis per concept For each concept vector, ridge-regress against all other concept vectors. R² quantifies how much of the direction is explained by a linear combination of peers — useful for teasing out near-duplicate clusters (the content/cozy/sensual trio from the first L63 run is likely 1-2 "degrees of freedom" wearing three names). Coefficient output: top-5 contributing concepts with signed weights. Contributors with opposite-sign large weights mean the target is "what makes X different from Y." Adds a 'redundant' triage bucket for concepts with R² > 0.9 — candidates for consolidation or for writing more discriminative training stories. Summary printed at end. Ridge lambda defaults to 0.01 to keep coefficients stable when concepts are near-collinear; small enough not to affect well-separated concepts meaningfully. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 20:59:37 -04:00
Kent Overstreet	f4fb6db1ee	amygdala: fix device mismatch in quality-report W_down handling _compute_quality_report's single-neuron alignment was computing cos(W_down.T, diff_l) with W_down on CUDA (inherited from the loaded model) while diff_l lives on CPU (per_layer_vectors are kept on CPU throughout training). Move W_down to CPU on extraction. Surfaced during first real training run on b200 — training itself completed cleanly (95 concepts x layer 63 in ~8s) but quality-report crashed at the first single-neuron alignment check. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 20:52:50 -04:00
Kent Overstreet	af17b0f0df	amygdala: per-head attention decomposition diagnostic As part of --quality-report, run a second forward pass capturing the input to each target layer's o_proj (= concat of per-head attention outputs before the output projection). For each concept, reshape to [n_heads, head_dim] and rank heads by diff-of-means magnitude / per-head selectivity (magnitude normalised by negative std). Motivation: the Wang et al. paper (2510.11328) — whose paired-scenario methodology we already lifted — further decomposes concept circuits at the attention-head level. Meta-relational concepts (recognition, trust, vulnerability) plausibly live in a sparse attention-head circuit rather than in the residual-stream sum, which would explain why diff-of-means on the residual blurs them. This diagnostic surfaces that. Output is folded into quality.json under each concept as "per_head": per (layer) a list of top-10 heads with [head_idx, raw_norm, selectivity], plus head_concentration (fraction of total head-norm captured by those top heads). Interpretation: - head_concentration > 0.5 = sparse head circuit; a handful of heads route the concept. Worth building a head-level readout for. - head_concentration ~= n/k for n heads = concept is distributed across all heads ~evenly; residual-stream diff-of-means is doing fine. Hybrid layers (Mamba, GatedDeltaNet) whose attention path doesn't match the standard module layout are silently skipped. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 20:37:44 -04:00
Kent Overstreet	ce24d9ce6b	amygdala: quality-report + cognitive-state training scenarios Training pipeline additions: - `--quality-report` flag: after producing per-concept vectors, compute per-concept diagnostics and write quality.json. Metrics per concept: * SVD of centered positives -> first_pc_variance_ratio (rank analysis; >0.7 clean, <0.4 fragmented) * Per-story alignment cosines (stories agree or disagree) * Single-neuron alignment: best cosine(direction, W_down column) at each target layer (>0.6 = essentially one MLP neuron) * Top-2 outlier stories by alignment (candidates for mislabeling or off-topic) * Top-5 nearest concepts by cosine (cross-concept contamination) Triage summary printed at end. New paired scenarios for cognitive-process states (for alpha-beta pruning): tracing_a_bug, reading_unfamiliar_code, finding_the_abstraction. Each has baseline + onto_something / stuck / in_flow / determined variants. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 20:31:39 -04:00
Kent Overstreet	5f06577ead	tools/web: add gemini_search as an alternative search tool (#5 ) Issue #5 (spqrz) flagged that web_search using DuckDuckGo occasionally flakes out, and Google search directly is blocked behind CAPTCHAs for non-browser clients. The Gemini free-tier API exposes a grounded-search tool that effectively queries Google's index and returns an LLM-summarized answer with source URLs. Added as a SEPARATE tool rather than a transparent fallback for web_search: * web_search (DDG) returns raw results — title, URL, snippet per hit — which the agent can reason over itself. * gemini_search returns an LLM-pre-digested summary plus grounding URLs. Useful for synthesis queries ("what's the consensus on X") or when DDG is flaky, but it's another LLM in the loop so the agent may want the raw variant for certain tasks. Tool descriptions tell the agent to prefer web_search for raw results and use gemini_search for synthesis / fallback. The agent picks based on query shape. Only registered when GEMINI_API_KEY is set in the environment (gracefully absent otherwise). Uses gemini-2.0-flash which has a generous free-tier rate limit. Parses grounding metadata for source URLs so the agent can follow links. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 13:02:01 -04:00
Kent Overstreet	c7b0052f1d	agent: kill no_compact, add pre-send size check in assemble_prompt Two related fixes for last night's crash diagnosis: 1. Kill AgentState::no_compact. The reasoning ("forked agents shouldn't compact because it blows the KV cache prefix") wasn't worth the cost — forks with no compact recovery just died on any oversize prompt, with no fallback. The KV cache invalidation is a performance loss; failing the request entirely is a correctness loss. Remove the flag, let every agent's overflow- retry path call compact() up to 2 times. 2. Add pre-send size check in Agent::assemble_prompt. If the context has grown past budget (context_window * 80%) since the last compact — accumulation between turns, a fork assembling more than expected, etc. — trim_conversation() is called before wire_prompt. Since we tokenize client-side, we already know the exact count, so there's no reason to round-trip an oversize request to vLLM and get rejected. Together these prevent the failure mode from last night: a subconscious/unconscious agent's prompt exceeded max_model_len, vLLM returned 400, agent had no_compact=true so it couldn't recover, request failed. Now: the trim happens before send, so the request rarely hits the 400 path at all; and if it somehow does, compact+retry works for every agent. Also adds ContextState::total_tokens() as the cheap pre-send budget check. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 12:59:30 -04:00
Kent Overstreet	0592c5f78d	Cargo.lock: add html2md and its deps (from PR #4 merge)	2026-04-18 12:51:29 -04:00
Kent Overstreet	4245b8bdb3	Merge PR #4 : use html2md on web_fetch (fixes #3 ) (spqrz) web_fetch was returning raw HTML, which is verbose and hard for the agent to consume. Add html2md dependency and convert HTML to Markdown before truncation. Much cleaner output for normal pages; no downsides. Co-Authored-By: spqrz <spqrz386@gmail.com> Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 12:50:54 -04:00
Kent Overstreet	343aa12099	Merge PR #1 : avoid ever setting split_at to 0 (spqrz) Safety fix in IRC message-splitting. The backtrack-to-space loop used 'while j > 0', which could set split_at to 0 if the first byte was a space — causing an empty prefix and an infinite re-split loop. Changed to 'while j > 1' so split_at is never 0. Co-Authored-By: spqrz <spqrz386@gmail.com> Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 12:50:47 -04:00
Kent Overstreet	2e03bbb7ea	training: add the_paper paired scenario for attention-engagement axis Seven framings of reading an unfamiliar technical paper, targeting the attention/engagement cluster that we identified tonight as the single highest-value DMN signal: * baseline — neutral reading * piqued — surprise + curiosity (the "wait, what" attention hook; THIS is the key DMN engagement signal) * focused — steady attention without surprise * bored — failing engagement * surprised — expectation violation without the curiosity hook (distinct from piqued: startled/alarmed, not pulled in) * amazed — marvel at elegance (appreciation, not engagement) * drifting — attention dissolving, precursor to boredom Particularly clean contrast on piqued vs surprised vs amazed — three states that get lumped together in casual usage but have distinct phenomenology and distinct DMN implications. Piqued is what routes attention; surprised alone doesn't; amazed is what you feel AFTER the engagement has paid off. These three should train into meaningfully different directions with paired CAA. Ready for next retrain when we do it. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 03:24:20 -04:00
Kent Overstreet	b8714e8b3a	amygdala: default to index 0 for v2 deep manifest (layers 62, 63) v2 retraining (readout_v2_paired) fixed the broken clusters — anger, sexual, high_pos, and social_pos all flipped from anti-clustered to positively clustered at deep layers. Validation showed layers 62 and 63 give the best signal; paring the serve-side manifest down to just those two keeps response size tight (~2 KB/token) while keeping the A/B option between the two strongest layers. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 02:32:51 -04:00
Kent Overstreet	50d5b3f6e1	training/amygdala_stories: add 4 paired scenarios for weak clusters Target the emotion families that failed to cluster in the initial training round (layer-wise validation showed them anti-clustered or scattered at deep layers): anger, high-arousal positive, sexual range, social positive. Paired scenarios hold content constant and vary only the emotional framing — the cleanest training signal for CAA, should produce directions that capture affect rather than topic. * the_comment: a PR review comment. baseline, furious, bitter, resentful, defeated. * the_green_build: 11-day bug finally fixed, tests pass. baseline, triumphant, blissful, excited, proud. * the_undressing: partner entering the bedroom for the night. baseline, horny, anticipatory_sexual, yearning_sexual, exuberant_sexual, devotional_sexual. * the_doorway: friend leaving at the end of a long evening. baseline, grateful, admiring, compassionate, loving, connected. 22 stories total. Retrain and re-validate: expect anger, high_pos, and social_pos clusters to flip from anti- to positively cohesive at deep layers, and sexual cluster to tighten. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 02:19:39 -04:00
Kent Overstreet	d9f39a21c3	amygdala: default to layer 62 (cleaner cross-cluster discrimination)	2026-04-18 02:11:15 -04:00
Kent Overstreet	3622b896a0	amygdala: z-score, hysteresis, default to deepest layer Three readability fixes for the F8 screen: * Z-score values per-layer by default (`[z]` toggles to raw dot- product). Raw values are dominated by residual-stream magnitude — z-scores read as "σ above concept-vector baseline" which is interpretable and scale-stable across frames. * Stable ordering with TOP_K + HYSTERESIS hysteresis band. Pinned concept set only rotates when a member drops out of the hysteresis band by \|value\| rank — bars update values in place without names flickering row-to-row. * Default to the deepest hooked layer (index 3 = layer 58 of 64). Clustering validation showed layer 58 is the only one with strong within-family cohesion (fear +0.37, shame +0.29, sadness +0.25 cosine); earlier layers are mostly noise for this task. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:51:43 -04:00
Kent Overstreet	8952ff6a76	agent/readout: forks get independent buffers Subconscious agents (scoring, reflection, etc.) fork from the main conscious agent. The amygdala screen reads the main agent's readout buffer, so the previous "share parent's buffer" policy caused forked-agent generations to bleed into the main emotional readout, producing constant cycling even when DMN was resting. Each fork now gets its own SharedReadoutBuffer. The amygdala screen shows only the main conscious agent's emotional trajectory; per-agent subconscious readouts can become a separate view later if wanted. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:42:13 -04:00
Kent Overstreet	c8976660f4	amygdala: F8 screen for live concept-readout projections Per-token residual-stream projections from the vLLM server's readout pipeline surfaced as a TUI bar chart. Flow: * agent/readout.rs — SharedReadoutBuffer (manifest + ring of last ~200 token entries). Lives on Agent and is shared across forks (single stream, one landing pad). * agent/mod.rs — Agent::new now probes /v1/readout/manifest at startup (non-fatal; 404 leaves manifest None, which disables the screen). * agent/context.rs — the streaming token handler pushes every token with attached readout onto the shared buffer. * user/amygdala.rs — F8 screen. Top-K concepts by \|value\| as horizontal bars (green positive, red negative), plus a 4-line recent-tokens panel showing each token's top concept at the selected layer. Keys: 1..9 select layer, t toggles current/mean-over-recent. Disabled state renders a hint pointing at VLLM_READOUT_MANIFEST / VLLM_READOUT_VECTORS so users can tell the feature apart from "server up but no tokens yet". Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:20:30 -04:00
Kent Overstreet	0f1c4cf1de	agent/api: carry readout alongside streamed tokens StreamToken::Token is now a struct variant with an optional TokenReadout (shape [n_layers][n_concepts]) per token — parsed from the vLLM completion response's choices[i].readout field when the server has readout enabled. ApiClient gains a fetch_readout_manifest() method that hits GET /v1/readout/manifest. Returns Ok(None) on 404 (server has readout disabled), so callers can gracefully fall back when pointed at a non-readout-enabled endpoint. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:15:46 -04:00
Kent Overstreet	047da10123	training: add preflight checks + progress logging to trainer Review pass before running on b200. 27B model + 100+ story corpus means any misconfiguration costs real time; better to fail before model load and give visible progress during forwards. * Pre-load-model validation: stories-dir and paired-dir exist, corpus has >= min_positives emotions. * Per-batch progress log every 5 batches with elapsed + ETA. * Relative depth printed for target layers (e.g. "layer 40 (51%)"). * Skip empty .txt files with a warning rather than feeding the tokenizer an empty string. * Assert non-empty strings in _collect_activations. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00
Kent Overstreet	15737dfd92	training: rewrite trainer for readout pipeline + story corpus The old script was written for the AmygdalaConnector's expected format ([n_emotions, n_target_layers, hidden_dim] in a single tensor, plus a JSONL input format from extract_training_pairs.py). Neither matches our current state: the runtime side is now ReadoutManager loading per-layer safetensors keyed layer_<idx>.vectors, and the data side is hand-written prose stories under amygdala_stories/{stories,paired}/. Changes: * Input loader reads stories/<emotion>.txt and paired/<scenario>/<emotion>.txt directly. Each emotion's positive set is {its unpaired story} union {its within-scenario framings}; its negative set is {all other emotions' positives} union {all scenario baselines}. * Paired scenarios' baseline.txt files become shared negatives (scenario-neutral prose that doesn't frame any particular emotion), providing anchor points for within-scenario contrasts. * Output writes readout.safetensors with per-layer tensors keyed layer_<idx>.vectors shape (n_concepts, hidden_size), plus a sidecar readout.json manifest with {concepts, layers, hidden_size, dtype} that ReadoutManager.from_file consumes directly. * Dedup: activations are computed once per unique text (an emotion's own positive is another emotion's negative — we'd otherwise do N× the forwards needed). Preserved: * _pool_last (last non-pad residual) — matches how readout is read at decode time from the sampler's query-last position. * register_forward_hook on target layer modules — correct approach for transformer blocks. * _find_layers_module traversal — mirrors ReadoutManager's. * bf16 + low_cpu_mem_usage model load — sensible for 27B on B200. Verified locally (CPU, fake activations): * Loader finds 89 emotions from the current corpus (80 unpaired + 9 emotions that appear only in paired scenarios) and 6 baselines. * Per-(layer, concept) vectors are unit-normalized. * Output reloads cleanly through ReadoutManager.from_file with matching concepts / layers / shapes. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00
Kent Overstreet	34bd122590	training: move amygdala training scripts out of vllm plugin The fynnsu-based vllm/plugins/amygdala/ scaffold was superseded by the readout infrastructure landed as vllm commit d3e74edf8500 (vllm/model_executor/layers/readout.py + vllm/v1/worker/readout_manager.py). Training code remained useful so it moved here rather than being deleted. train_steering_vectors.py: CAA diff-of-means trainer that produces the [n_concepts, hidden_size] per-layer projection matrices the runner loads via VLLM_READOUT_VECTORS. extract_training_pairs.py: memory graph -> JSONL converter using per-emotion score thresholds from the subconscious agents' tag lines. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00
Kent Overstreet	ec7568c726	training/amygdala_stories: scaffold + initial batch of 15 stories Emotion-labeled short-paragraph corpus for training amygdala steering vectors. Manifest derived from Anthropic's 171-emotion list (transformer-circuits.pub/2026/emotions, Table 12) plus 28 PoC- specific additions covering axes Anthropic's general research doesn't cover (curious, focused, in_flow, staying_with, filling_space, rigorous, defensive_rigor, tender, witnessed, connected, etc.). Scope pivoted mid-write: Kent noted the empirical dimensionality-of- emotion question benefits from maximum coverage, so the manifest will expand further with emotions from Wikipedia's emotion- classification article (Parrott's tree, Plutchik's wheel + dyads, HUMAINE EARL, cultural-specific emotions a la Saudade/Hiraeth). Expansion staged in follow-up commits. This commit: README with method + style guidelines, initial manifest (199 emotions), and 15 hand-written one-paragraph stories across all 10 Anthropic clusters as quality/variety samples. Each story embodies one emotion without naming it; narrator voice varies (first/third, close/distant, different situations) to keep steering vectors from overfitting to one voice. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00
Kent Overstreet	43e06daa5b	cleanup: drop dead ApiClient::stream_completion wrapper, silence dmn_tick stream_completion was a thin wrapper around stream_completion_mm (just passing an empty image list); the last caller switched to _mm directly when learn's generate_alternate gained image support. Delete the wrapper — callers can pass `&[]` if they have no images. MindState::dmn_tick has been sitting unused (called only from a commented-out block in the Mind loop). Rename to _dmn_tick so the compiler stops warning; Kent may uncomment the call path later. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 16:23:59 -04:00

1 2 3 4 5 ...

1,192 commits