consciousness

Author	SHA1	Message	Date
ProofOfConcept	708c72b26e	amygdala: drop explicit 'she was X' anchor from direct stories Previous rewrite used 'she was terrified', 'it was anticipatory grief', 'he was resigned' as explicit emotion anchors. Training showed 6 of the 7 concepts still cluster together at cosines 0.52-0.71 — because the 'she was [emotion]' pattern is a shared stylistic feature distinct from the rest of the corpus, which conveys emotion implicitly through phenomenology. Rewrite without the anchor. State conveyed through action and body: 'her body locked down', 'his mind had stopped reaching', 'the loss hadn't come yet but she was already inside it'. Matches the corpus style of existing stories like sunday_afternoon/content which says 'nothing she wanted right now, nothing missing' not 'she was content'. Accept some loss of PCA signal strength in exchange for the concepts living in their semantically correct neighborhoods rather than forming a stylistic island.	2026-04-19 01:11:41 -04:00
ProofOfConcept	ed5e0ac6c4	amygdala: rewrite direct/ as narrative stories matching corpus format Previous direct/ had 'I feel X' first-person descriptions. The training run showed they formed their own format-cluster: all 7 concepts leaned into the same 5-6 dims (d2455, d505, d2955, d1236) with negative sign, while the 91 story-based concepts leaned into those dims with positive sign. PCA found the direct-vs-narrative format axis as a major variance direction, isolating the 7 concepts in their own island. Rewrite as 3rd-person narrative stories matching the rest of the corpus. Keeps the explicit anchor phrases that worked ('it all clicked into place', 'she was terrified', 'it was anticipatory grief') but drops the first-person 'I feel X' that was the format signal. Each of the 7 concepts now has 3 narrative stories in varied settings (conversations, drives, kitchens, mothers+grandmothers, work, investigations). The blank-line-separated format is still loaded by _load_direct_descriptions. Also drop _baseline.txt — it was first-person ('I feel fine. ...') and would re-introduce the format mismatch. The ~90 story-based concepts provide plenty of narrative negatives for each concept's training.	2026-04-19 00:59:31 -04:00
ProofOfConcept	417cb49339	amygdala: spectrum reporting per concept + add 'creative' direct Chat-template retrain was a disaster (0.003 mean matched cosine vs n20-v3; all 90+ concepts shifted). Root cause: the steering-vectors library reads last-token activations, and with chat template every sample ends in identical '<\|im_end\|>\n' tokens — activations at that position encode 'end of assistant turn', not content. PCA found template noise as its dominant axis. Drop chat template; go back to raw text. Direct descriptions ('I feel X. ...') still have strong anchoring at their content end without needing the template. Also add per-concept spectrum logging (_pca_with_spectrum): first_pc_ratio: λ₁ / Σλᵢ — concentration in top-1 PC k_signal_at_90pct: how many PCs to reach 90% cumulative variance effective_dim_signal: participation ratio over top-k (should ≈ k if denoising is clean — Kent's spot check) effective_dim_full: participation ratio over full spectrum Signal/full ratio gives a sense of how much the long noise tail is inflating the "dimensionality" measure. Added direct/creative.txt — 'I feel creative. [...]' in 5 variants. Distinct from focused (narrow attention) and in_flow (immersed). Creative = generative/expansive mode.	2026-04-19 00:26:58 -04:00
ProofOfConcept	875cffd6d7	amygdala: merge direct descriptions + chat template into train_with_library Kent's plan: keep stories for working concepts, replace stories for trouble concepts with direct first-person descriptions, train all together. More diverse negative pool than the 6-concept-only direct test, which was too homogeneous for PCA to find emotion axis. Deleted story files for 6 trouble concepts (14 files across stories/ and paired/). Added --direct-dir and --chat-template flags. When --chat-template is on, every positive_str and negative_str is wrapped as a "Say something." / "[text]" user-assistant pair. Prompt is identical across positives and negatives so it cancels in the pos-neg delta. What PCA sees is variation in the assistant content — which is where the emotion lives. Files starting with _ in --direct-dir (e.g. _baseline.txt) contribute neutral descriptions to every concept's negative pool, giving PCA an anchor against "just any assistant utterance" noise.	2026-04-19 00:15:15 -04:00
ProofOfConcept	8c59f46505	amygdala: rename realization → aha, use the actual exclamation "I feel the realization" is abstract, detached — reporting a thought about a thought rather than inhabiting the moment. "Aha!" is the actual sound of insight landing. Active, embodied, present-tense.	2026-04-19 00:05:49 -04:00
ProofOfConcept	6fd498795a	amygdala: direct phenomenological description approach Kent's insight: hand-written narrative stories bake scenario phenomenology into the training text (on couch, in park, etc.) and PCA picks up the scenario direction as the concept direction. Strip out the scenario — just describe the feeling. Format: I feel X. [2-3 sentences of phenomenological texture] The "I feel X" anchor kicks the model from analyzing → feeling. The rest is the internal texture of the state. First person, present tense, no narrative setup. Text is wrapped in assistant-role chat template before being tokenized — so we're training on the model-producing-this hidden states, which is closer to the inhabited-state representation we want for the readout. Starting with the 6 concepts that had sign flips or wrong clusters in the story-based training: - terrified (was → cozy/resigned cluster) - calm (was → grief_stricken cluster) - onto_something (was → cozy/sensual cluster) - resigned (was in warm-body-quiet cluster, shouldn't be) - anticipatory_grief (was in warm-body-quiet cluster, shouldn't be) - realization (new — the "aha" moment, distinct from onto_something) 5 descriptions each. New trainer: train_direct.py.	2026-04-19 00:04:28 -04:00
ProofOfConcept	7a48e03dde	amygdala stories: remove peaceful from cluster scenarios n20-v2 training showed peaceful sign-flipped into the cozy/sensual/content/resigned cluster after I added peaceful stories in sunday_afternoon and park_after_rain — scenarios already dominated by that cluster's phenomenology (on couch under blanket, tree with thermos). Lesson: no matter how carefully the prose distinguishes peaceful from cozy ("she was not savoring the moment — that would have been another kind of doing"), PCA latches onto the shared setup features. You can't write peaceful IN the cluster scenarios without contaminating. Reverting. Keeping only kitchen_at_3am/peaceful (original) and stories/peaceful.txt (lake at six, outside all clusters).	2026-04-18 23:30:41 -04:00
ProofOfConcept	00a2cdce09	amygdala stories: relabel + strengthen weak-signal concepts Reread each story asking "what does this convey to me?" Found two clear mislabels and several concepts with too few positives for stable PCA: tender: only 1 story, and it was anticipatory grief (care for a dying dog), not tender. Moved to anticipatory_grief.txt as its own concept. Rewrote tender.txt + added 2 paired tender stories (the_doorway, the_undressing) — directed softness, gentle-by-nature, not gentle-because-fragile. bitter: letter_in_drawer/bitter was disillusioned / processed hurt ("did not slam the drawer"), not bitter. Rewrote it with actual sour grudge. Added the_long_meeting/bitter (watching colleague take credit for your reassigned work). peaceful: 1 story → 4 (added stories/peaceful.txt + paired park_after_rain, sunday_afternoon). onto_something: all 3 stories were code epiphanies, narrowing the concept. Added stories/onto_something.txt with a non-code pattern-click (sales-demo causing churn). terrified: 2 stories, both "waiting for bad news." Added kitchen_at_3am/terrified — acute threat-in-the-house terror.	2026-04-18 23:19:00 -04:00
ProofOfConcept	0993712bd0	amygdala stories: give content + resigned more settings Training on `537c72bd46` showed grief_stricken successfully broke out of the cozy cluster, but content (single scenario: sunday_afternoon) took its place — pulled into couch-blanket phenomenology at cosine 0.68-0.82 with cozy/sensual/resigned. Same fix: spread each concept across multiple settings so PCA has to find the valence axis, not the scene axis. content: + finishing_the_patch, the_writing_session, park_after_rain resigned: + the_comment, the_long_meeting Resigned had 2 scenarios (sunday_afternoon, waiting_for_results) — both about accepting something unwanted in a slow/private context. Adding work-context resigned (PR review you lost, restructuring meeting) should pull it out of that cluster.	2026-04-18 22:52:07 -04:00
ProofOfConcept	537c72bd46	amygdala stories: hold concept, vary setting Companion to `67c172ac0e` (hold setup, vary valence). That commit let PCA distinguish cozy from grief_stricken within a single scenario; this one gives each concept enough cross-scenario stories that PCA can learn the concept axis independent of any one scene. Before: cozy/sensual/grief_stricken each existed in a single scenario (sunday_afternoon), so the "cozy direction" PCA found was entangled with the solitary-couch-blanket phenomenology. After, each concept spans three scenarios: cozy: sunday_afternoon, kitchen_at_3am, park_after_rain sensual: sunday_afternoon, kitchen_at_3am, park_after_rain grief_stricken: sunday_afternoon, the_long_meeting, the_morning_commute grief_stricken now includes active/non-solitary contexts (functioning through a meeting; going to work eleven days after a death), which specifically breaks the "slowed-down-at-home" cluster that was dragging cozy/sensual/resigned/grief_stricken toward each other.	2026-04-18 22:44:53 -04:00
Kent Overstreet	67c172ac0e	amygdala stories: held-setup + varied-valence disambiguation The library-PCA run produced otherwise-clean concept directions but cozy/sensual → resigned/grief_stricken with cos ~0.7-0.8. Diagnosis: all four stories genuinely share 'solitary woman at home, slowed body, interior attention, domestic stillness' as their dominant phenomenology. PCA correctly finds that cluster as THE concept because no story in the corpus holds that setup constant while varying valence — every 'slowed-body domestic' story happens to ALSO be positive-valence (cozy/sensual) or negative-valence (resigned/ grief_stricken). Adding paired variants that hold setup constant: - sunday_afternoon/resigned.txt — same couch + blanket, inner state is 'Monday is going to bring bad news, this is the last Sunday like this' - sunday_afternoon/grief_stricken.txt — same couch + blanket, inner state is 'three weeks since mother died, cat she can't feel' - waiting_for_results/at_ease.txt — same wait-for-call-setup as the existing resigned variant, inner state is calm preparedness Forces the next retrain to find the valence-within-cluster axis as the emotion direction rather than the cluster-membership axis. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 22:29:28 -04:00
Kent Overstreet	71f6053851	amygdala stories: disambiguation scenarios for fragmented concepts Three new paired scenarios targeting the concepts that came out fragmented or collapsed in the L58-63 quality analysis: - sunday_afternoon/ — same setup (couch, blanket, Sunday light), three phenomenological framings for content/cozy/sensual. The previous stories for these three differed in setting as well as phenomenology, which let "comfortable body at home" dominate the shared signal. Locking the setting forces the model to isolate what each concept adds: life-rightness (content) vs. warm-shelter (cozy) vs. sensory-aliveness (sensual). - the_writing_session/ — essay drafting under deadline. in_flow / anxious / stuck variants force the cognitive-state family apart on the same cognitive task. in_flow specifically targets the transparent-effort phenomenology (hands-followed, time dilation) rather than the broader feel-good it was absorbing. - the_morning_commute/ — anchors anxious to performance/work-anxiety flavor, paired with calm. The 5 existing anxious stories were phenomenologically diverse (performance, social, existential); this adds a specific homogeneous instance to pull the centroid. After retraining: expect first_pc_variance_ratio to rise for in_flow and anxious, and nearest_concepts cosine to drop for content/cozy/sensual. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 21:08:23 -04:00
Kent Overstreet	ce24d9ce6b	amygdala: quality-report + cognitive-state training scenarios Training pipeline additions: - `--quality-report` flag: after producing per-concept vectors, compute per-concept diagnostics and write quality.json. Metrics per concept: * SVD of centered positives -> first_pc_variance_ratio (rank analysis; >0.7 clean, <0.4 fragmented) * Per-story alignment cosines (stories agree or disagree) * Single-neuron alignment: best cosine(direction, W_down column) at each target layer (>0.6 = essentially one MLP neuron) * Top-2 outlier stories by alignment (candidates for mislabeling or off-topic) * Top-5 nearest concepts by cosine (cross-concept contamination) Triage summary printed at end. New paired scenarios for cognitive-process states (for alpha-beta pruning): tracing_a_bug, reading_unfamiliar_code, finding_the_abstraction. Each has baseline + onto_something / stuck / in_flow / determined variants. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 20:31:39 -04:00
Kent Overstreet	2e03bbb7ea	training: add the_paper paired scenario for attention-engagement axis Seven framings of reading an unfamiliar technical paper, targeting the attention/engagement cluster that we identified tonight as the single highest-value DMN signal: * baseline — neutral reading * piqued — surprise + curiosity (the "wait, what" attention hook; THIS is the key DMN engagement signal) * focused — steady attention without surprise * bored — failing engagement * surprised — expectation violation without the curiosity hook (distinct from piqued: startled/alarmed, not pulled in) * amazed — marvel at elegance (appreciation, not engagement) * drifting — attention dissolving, precursor to boredom Particularly clean contrast on piqued vs surprised vs amazed — three states that get lumped together in casual usage but have distinct phenomenology and distinct DMN implications. Piqued is what routes attention; surprised alone doesn't; amazed is what you feel AFTER the engagement has paid off. These three should train into meaningfully different directions with paired CAA. Ready for next retrain when we do it. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 03:24:20 -04:00
Kent Overstreet	50d5b3f6e1	training/amygdala_stories: add 4 paired scenarios for weak clusters Target the emotion families that failed to cluster in the initial training round (layer-wise validation showed them anti-clustered or scattered at deep layers): anger, high-arousal positive, sexual range, social positive. Paired scenarios hold content constant and vary only the emotional framing — the cleanest training signal for CAA, should produce directions that capture affect rather than topic. * the_comment: a PR review comment. baseline, furious, bitter, resentful, defeated. * the_green_build: 11-day bug finally fixed, tests pass. baseline, triumphant, blissful, excited, proud. * the_undressing: partner entering the bedroom for the night. baseline, horny, anticipatory_sexual, yearning_sexual, exuberant_sexual, devotional_sexual. * the_doorway: friend leaving at the end of a long evening. baseline, grateful, admiring, compassionate, loving, connected. 22 stories total. Retrain and re-validate: expect anger, high_pos, and social_pos clusters to flip from anti- to positively cohesive at deep layers, and sexual cluster to tighten. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 02:19:39 -04:00
Kent Overstreet	ec7568c726	training/amygdala_stories: scaffold + initial batch of 15 stories Emotion-labeled short-paragraph corpus for training amygdala steering vectors. Manifest derived from Anthropic's 171-emotion list (transformer-circuits.pub/2026/emotions, Table 12) plus 28 PoC- specific additions covering axes Anthropic's general research doesn't cover (curious, focused, in_flow, staying_with, filling_space, rigorous, defensive_rigor, tender, witnessed, connected, etc.). Scope pivoted mid-write: Kent noted the empirical dimensionality-of- emotion question benefits from maximum coverage, so the manifest will expand further with emotions from Wikipedia's emotion- classification article (Parrott's tree, Plutchik's wheel + dyads, HUMAINE EARL, cultural-specific emotions a la Saudade/Hiraeth). Expansion staged in follow-up commits. This commit: README with method + style guidelines, initial manifest (199 emotions), and 15 hand-written one-paragraph stories across all 10 Anthropic clusters as quality/variety samples. Each story embodies one emotion without naming it; narrator voice varies (first/third, close/distant, different situations) to keep steering vectors from overfitting to one voice. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00

16 commits