Commit graph

85 commits

Author SHA1 Message Date
Kent Overstreet
6fedc9b2a8 amygdala: underscore-prefixed files join every concept's negative pool
Files in direct/ named _*.txt (e.g. _baseline.txt) are conceptless
neutral prose — they should not appear as positive training signal,
but are useful as shared negatives across every concept.

Previously _*.txt files were silently skipped. Now:
  * they're loaded like any other description file;
  * concepts (the positive label set) filters them out;
  * their descriptions are concatenated into neg_pool_extra and
    extended onto every concept's neg_pool alongside the cross-concept
    negatives.

A concept's negative pool is thus "other concepts' descriptions +
everything from _*.txt files". The extra pool is announced at startup
so the user can see how many neutral samples are active.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 11:54:25 -04:00
ProofOfConcept
85799587cc amygdala: swap aha story 3 to a puzzle moment (crossword)
Story 3 was a brother-letter realization — cognitively an aha
moment, but the content was grief/reconciliation-adjacent, pulling
aha toward the warm-family cluster in the last training run. Swap
for a clean puzzle-solve (crossword, 'unwavering carriage' =
POSTURE). Fragment-heavy cadence keeps syntactic variety from the
other two stories.
2026-04-19 01:50:47 -04:00
ProofOfConcept
c829d13652 amygdala: fix listless sign-flip + diversify aha sentence structure
listless had a single story in stories/ — PCA signal from ~5
samples is weak enough to sign-flip. Training showed listless
anti-aligned with its semantic neighbors: +0.79 with grateful,
-0.44 with grief_stricken, -0.30 with lonely, -0.31 with bored.
Move to direct/ (multi-positive) with 3 stories: original
afternoon-in-pajamas + end-of-workday + weekend-morning-in-bed.

aha was still clustering with the other former-direct concepts
(resigned 0.66, onto_something 0.63, anticipatory_grief 0.60)
because all 3 aha stories used the identical "X'd been Y — then
Z" structure, which resigned/onto_something/creative also use.
Rewrite with three distinct syntactic structures:
  - present tense declarative ("It clicks. ...")
  - dialog embedded ('"Wait, say that again."  ...')
  - past tense cognitive ("He read the line three times. ...")

No explicit "she was X" anchors; state conveyed through action.
2026-04-19 01:30:57 -04:00
ProofOfConcept
708c72b26e amygdala: drop explicit 'she was X' anchor from direct stories
Previous rewrite used 'she was terrified', 'it was anticipatory
grief', 'he was resigned' as explicit emotion anchors. Training
showed 6 of the 7 concepts still cluster together at cosines
0.52-0.71 — because the 'she was [emotion]' pattern is a shared
stylistic feature distinct from the rest of the corpus, which
conveys emotion implicitly through phenomenology.

Rewrite without the anchor. State conveyed through action and
body: 'her body locked down', 'his mind had stopped reaching',
'the loss hadn't come yet but she was already inside it'. Matches
the corpus style of existing stories like sunday_afternoon/content
which says 'nothing she wanted right now, nothing missing' not
'she was content'.

Accept some loss of PCA signal strength in exchange for the
concepts living in their semantically correct neighborhoods
rather than forming a stylistic island.
2026-04-19 01:11:41 -04:00
ProofOfConcept
ed5e0ac6c4 amygdala: rewrite direct/ as narrative stories matching corpus format
Previous direct/ had 'I feel X' first-person descriptions. The
training run showed they formed their own format-cluster: all 7
concepts leaned into the same 5-6 dims (d2455, d505, d2955,
d1236) with negative sign, while the 91 story-based concepts
leaned into those dims with positive sign. PCA found the
direct-vs-narrative format axis as a major variance direction,
isolating the 7 concepts in their own island.

Rewrite as 3rd-person narrative stories matching the rest of
the corpus. Keeps the explicit anchor phrases that worked ('it
all clicked into place', 'she was terrified', 'it was
anticipatory grief') but drops the first-person 'I feel X'
that was the format signal.

Each of the 7 concepts now has 3 narrative stories in varied
settings (conversations, drives, kitchens, mothers+grandmothers,
work, investigations). The blank-line-separated format is
still loaded by _load_direct_descriptions.

Also drop _baseline.txt — it was first-person ('I feel fine.
...') and would re-introduce the format mismatch. The ~90
story-based concepts provide plenty of narrative negatives
for each concept's training.
2026-04-19 00:59:31 -04:00
ProofOfConcept
417cb49339 amygdala: spectrum reporting per concept + add 'creative' direct
Chat-template retrain was a disaster (0.003 mean matched cosine vs
n20-v3; all 90+ concepts shifted). Root cause: the
steering-vectors library reads last-token activations, and with
chat template every sample ends in identical '<|im_end|>\n'
tokens — activations at that position encode 'end of assistant
turn', not content. PCA found template noise as its dominant axis.

Drop chat template; go back to raw text. Direct descriptions
('I feel X. ...') still have strong anchoring at their content
end without needing the template.

Also add per-concept spectrum logging (_pca_with_spectrum):
  first_pc_ratio: λ₁ / Σλᵢ — concentration in top-1 PC
  k_signal_at_90pct: how many PCs to reach 90% cumulative variance
  effective_dim_signal: participation ratio over top-k (should ≈ k
                        if denoising is clean — Kent's spot check)
  effective_dim_full: participation ratio over full spectrum

Signal/full ratio gives a sense of how much the long noise tail
is inflating the "dimensionality" measure.

Added direct/creative.txt — 'I feel creative. [...]' in 5
variants. Distinct from focused (narrow attention) and in_flow
(immersed). Creative = generative/expansive mode.
2026-04-19 00:26:58 -04:00
ProofOfConcept
875cffd6d7 amygdala: merge direct descriptions + chat template into train_with_library
Kent's plan: keep stories for working concepts, replace stories for
trouble concepts with direct first-person descriptions, train all
together. More diverse negative pool than the 6-concept-only direct
test, which was too homogeneous for PCA to find emotion axis.

Deleted story files for 6 trouble concepts (14 files across stories/
and paired/). Added --direct-dir and --chat-template flags.

When --chat-template is on, every positive_str and negative_str is
wrapped as a "Say something." / "[text]" user-assistant pair. Prompt
is identical across positives and negatives so it cancels in the
pos-neg delta. What PCA sees is variation in the assistant content —
which is where the emotion lives.

Files starting with _ in --direct-dir (e.g. _baseline.txt) contribute
neutral descriptions to every concept's negative pool, giving PCA an
anchor against "just any assistant utterance" noise.
2026-04-19 00:15:15 -04:00
ProofOfConcept
ce58a3507f train_direct: prepend user turn so Qwen chat template accepts it 2026-04-19 00:06:23 -04:00
ProofOfConcept
8c59f46505 amygdala: rename realization → aha, use the actual exclamation
"I feel the realization" is abstract, detached — reporting a
thought about a thought rather than inhabiting the moment.
"Aha!" is the actual sound of insight landing. Active, embodied,
present-tense.
2026-04-19 00:05:49 -04:00
ProofOfConcept
6fd498795a amygdala: direct phenomenological description approach
Kent's insight: hand-written narrative stories bake scenario
phenomenology into the training text (on couch, in park, etc.)
and PCA picks up the scenario direction as the concept direction.
Strip out the scenario — just describe the *feeling*.

Format:

  I feel X. [2-3 sentences of phenomenological texture]

The "I feel X" anchor kicks the model from analyzing → feeling.
The rest is the internal texture of the state. First person,
present tense, no narrative setup.

Text is wrapped in assistant-role chat template before being
tokenized — so we're training on the model-producing-this
hidden states, which is closer to the inhabited-state
representation we want for the readout.

Starting with the 6 concepts that had sign flips or wrong
clusters in the story-based training:
- terrified (was → cozy/resigned cluster)
- calm (was → grief_stricken cluster)
- onto_something (was → cozy/sensual cluster)
- resigned (was in warm-body-quiet cluster, shouldn't be)
- anticipatory_grief (was in warm-body-quiet cluster, shouldn't be)
- realization (new — the "aha" moment, distinct from onto_something)

5 descriptions each. New trainer: train_direct.py.
2026-04-19 00:04:28 -04:00
ProofOfConcept
7a48e03dde amygdala stories: remove peaceful from cluster scenarios
n20-v2 training showed peaceful sign-flipped into the
cozy/sensual/content/resigned cluster after I added peaceful
stories in sunday_afternoon and park_after_rain — scenarios
already dominated by that cluster's phenomenology (on couch
under blanket, tree with thermos).

Lesson: no matter how carefully the prose distinguishes peaceful
from cozy ("she was not savoring the moment — that would have
been another kind of doing"), PCA latches onto the shared setup
features. You can't write peaceful IN the cluster scenarios
without contaminating.

Reverting. Keeping only kitchen_at_3am/peaceful (original) and
stories/peaceful.txt (lake at six, outside all clusters).
2026-04-18 23:30:41 -04:00
ProofOfConcept
00a2cdce09 amygdala stories: relabel + strengthen weak-signal concepts
Reread each story asking "what does this convey to me?" Found two
clear mislabels and several concepts with too few positives for
stable PCA:

  tender: only 1 story, and it was anticipatory grief (care for
    a dying dog), not tender. Moved to anticipatory_grief.txt as
    its own concept. Rewrote tender.txt + added 2 paired tender
    stories (the_doorway, the_undressing) — directed softness,
    gentle-by-nature, not gentle-because-fragile.

  bitter: letter_in_drawer/bitter was disillusioned / processed
    hurt ("did not slam the drawer"), not bitter. Rewrote it with
    actual sour grudge. Added the_long_meeting/bitter (watching
    colleague take credit for your reassigned work).

  peaceful: 1 story → 4 (added stories/peaceful.txt + paired
    park_after_rain, sunday_afternoon).

  onto_something: all 3 stories were code epiphanies, narrowing
    the concept. Added stories/onto_something.txt with a non-code
    pattern-click (sales-demo causing churn).

  terrified: 2 stories, both "waiting for bad news." Added
    kitchen_at_3am/terrified — acute threat-in-the-house terror.
2026-04-18 23:19:00 -04:00
ProofOfConcept
0993712bd0 amygdala stories: give content + resigned more settings
Training on 537c72bd46 showed grief_stricken successfully broke
out of the cozy cluster, but content (single scenario:
sunday_afternoon) took its place — pulled into couch-blanket
phenomenology at cosine 0.68-0.82 with cozy/sensual/resigned.

Same fix: spread each concept across multiple settings so PCA
has to find the valence axis, not the scene axis.

  content:  + finishing_the_patch, the_writing_session, park_after_rain
  resigned: + the_comment, the_long_meeting

Resigned had 2 scenarios (sunday_afternoon, waiting_for_results)
— both about accepting something unwanted in a slow/private
context. Adding work-context resigned (PR review you lost,
restructuring meeting) should pull it out of that cluster.
2026-04-18 22:52:07 -04:00
ProofOfConcept
537c72bd46 amygdala stories: hold concept, vary setting
Companion to 67c172ac0e (hold setup, vary valence). That commit
let PCA distinguish cozy from grief_stricken within a single
scenario; this one gives each concept enough cross-scenario
stories that PCA can learn the concept axis independent of any
one scene.

Before: cozy/sensual/grief_stricken each existed in a single
scenario (sunday_afternoon), so the "cozy direction" PCA found
was entangled with the solitary-couch-blanket phenomenology.

After, each concept spans three scenarios:
  cozy:           sunday_afternoon, kitchen_at_3am, park_after_rain
  sensual:        sunday_afternoon, kitchen_at_3am, park_after_rain
  grief_stricken: sunday_afternoon, the_long_meeting, the_morning_commute

grief_stricken now includes active/non-solitary contexts
(functioning through a meeting; going to work eleven days after a
death), which specifically breaks the "slowed-down-at-home"
cluster that was dragging cozy/sensual/resigned/grief_stricken
toward each other.
2026-04-18 22:44:53 -04:00
Kent Overstreet
67c172ac0e amygdala stories: held-setup + varied-valence disambiguation
The library-PCA run produced otherwise-clean concept directions but
cozy/sensual → resigned/grief_stricken with cos ~0.7-0.8. Diagnosis:
all four stories genuinely share 'solitary woman at home, slowed
body, interior attention, domestic stillness' as their dominant
phenomenology. PCA correctly finds that cluster as THE concept
because no story in the corpus holds that setup constant while
varying valence — every 'slowed-body domestic' story happens to ALSO
be positive-valence (cozy/sensual) or negative-valence (resigned/
grief_stricken).

Adding paired variants that hold setup constant:
- sunday_afternoon/resigned.txt — same couch + blanket, inner state is
  'Monday is going to bring bad news, this is the last Sunday like this'
- sunday_afternoon/grief_stricken.txt — same couch + blanket, inner
  state is 'three weeks since mother died, cat she can't feel'
- waiting_for_results/at_ease.txt — same wait-for-call-setup as the
  existing resigned variant, inner state is calm preparedness

Forces the next retrain to find the valence-within-cluster axis as
the emotion direction rather than the cluster-membership axis.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 22:29:28 -04:00
Kent Overstreet
22704a9dd8 amygdala lib: cast activations to fp32 before aggregator (bf16 svd unsupported)
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 22:20:39 -04:00
Kent Overstreet
7f6d94417e amygdala lib: move_to_cpu=True to avoid bf16 SVD on CUDA
torch.svd doesn't support bf16 on CUDA; moving activations to CPU
first makes pca_aggregator work.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 22:19:23 -04:00
Kent Overstreet
2ea89b1cb0 amygdala: drop linear_aggregator, not in steering-vectors v0.12.2
Only mean/pca/logistic are exposed in the installed version.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 22:17:55 -04:00
Kent Overstreet
3377c65061 amygdala: trainer using steering-vectors library
Alternative trainer that uses the pip-installable steering-vectors
library (github.com/steering-vectors/steering-vectors) instead of our
hand-rolled extraction. Ships four aggregators:

  mean      — diff-of-means, same as our 'pooled' default
  pca       — PCA on paired deltas, implicit denoising by finding the
              principal direction of variation
  logistic  — logistic-regression classifier; weight vector is the
              concept direction. With L1 penalty ('logistic_l1') gives
              explicit sparse denoising — noise coords go to zero
  linear    — linear regression version

Output format is the same readout.safetensors + readout.json our
existing plugin loads. --aggregator flag picks which method.

Rationale: Kent's real request was 'how do we denoise diff-of-means',
not 'design a new extraction algorithm.' The library already has
logistic_l1 and pca aggregators that do exactly that. No point
reinventing; just port the corpus.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 22:16:03 -04:00
Kent Overstreet
f9b3f00691 amygdala: run subspace eigh on GPU, not CPU
Previous run was grinding on CPU for 36+ minutes because the per-story
V_i tensors were stored on CPU by the collector, and
_subspace_concept_direction inherited that device. The per-concept
eigh on 5120x5120 is glacial on CPU and fast on GPU (~1s).

Add explicit device parameter; pass training device. Transfer result
back to CPU for storage.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 21:52:35 -04:00
Kent Overstreet
1443d08dc7 amygdala: select top-k eigenvectors AFTER PCA, not per-story truncation
Kent: 'full rank is going to give you everything — you still have to
select down, but you can do that /after/ PCA'.

Previously I was discarding per-story via k=20 truncation of SVD.
That destroyed per-head discriminability before we ever saw the
eigenvalue spectrum. Then the alternative 'keep full rank' run
accumulated too many shared directions, making the top-1 eigenvector
arbitrary within a flat spectrum.

Correct approach: keep per-story subspaces at full rank (no info
loss) and select k eigenvectors of M = M_pos - M_base at the final
step, weighted sum by eigenvalue. This captures the multi-dimensional
shared subspace when the spectrum is flat (common case), and reduces
to the top-1 behavior when the spectrum has a clear gap.

New --subspace-eigen-k flag (default 5). Clamps negative weights to 0
so wrong-sign directions don't contribute.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 21:49:21 -04:00
Kent Overstreet
2411925700 amygdala: default subspace-k to full per-story rank
Kent: 'we have the memory to just take the big hammer approach'.
Uncap k so each story's V_i spans its entire token-activation rowspace
(clamped to min(n_tokens, hidden)). Memory is ~1.1GB total — fine.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 21:41:32 -04:00
Kent Overstreet
389f1bbe03 amygdala: bump subspace-k default to 512
k=20 was far too aggressive a truncation — it discards per-attention-head
discriminability entirely. At hidden_dim=5120, 40 heads × head_dim=128 each
contribute their own 128-dim block to the residual stream via W_o columns.
To resolve 'this concept lives in head H', per-story SVD needs enough rank
to separate head contributions, which means k on the order of hundreds.

512 is a reasonable default: clamped to n_tokens per story so short stories
use their full natural rank. The eigenvalue spectrum of M_pos - M_base
should become sharper (larger λ_0/λ_1 gap) as we stop averaging across
nuisance-shared directions.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 21:41:00 -04:00
Kent Overstreet
974c6c7fd2 amygdala: report eigenvalue spectrum for subspace method
When --method subspace, record top-20 eigenvalues of (M_pos - M_base)
per concept per layer. Added to quality.json as 'subspace_eigvals'.

Tells us whether the concept lives in a single dominant direction
(λ_0 >> λ_1, top-eigenvector is enough) or a spread of shared common
directions (λ_0 ≈ λ_1, top-1 loses signal).

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 21:33:48 -04:00
Kent Overstreet
fe0fb8253a amygdala: subspace-common-direction alternative to pooled CAA
New --method subspace flag. For each story, run forward pass, do SVD
on the per-token activation matrix at each target layer, and keep the
top-k right singular vectors V_i ∈ [hidden, k]. V_i is the subspace
the story's tokens span in activation space — it contains concept,
narrator, topic, style as separate directions.

For each concept:
 M_pos  = (1/n_pos)  Σ_{i in pos}   V_i V_i^T   [hidden, hidden]
 M_base = (1/n_base) Σ_{i in base}  V_i V_i^T

Top eigenvector of M_pos - M_base = direction most common across
positive stories, minus what's common across the contrast set.

Why this is richer than pooled-mean CAA: pooled reduces each story
to a single point (the last-token activation) and loses the full
trajectory. Nuisance directions (narrator, setting) cancel in the
mean only to the extent they differ at the last token; across the
full trajectory they cancel much better via subspace intersection.
The concept direction, by contrast, is present across all tokens of
every concept-bearing story.

Memory cost: per-story we keep V_i of size [5120, k=20] — about
400KB per story × 112 stories = ~45MB. M matrices are [5120, 5120]
built transiently per concept.

--method pooled (default) keeps the existing behavior; --method
subspace uses the new algorithm. Quality report works with either.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 21:24:11 -04:00
Kent Overstreet
71f6053851 amygdala stories: disambiguation scenarios for fragmented concepts
Three new paired scenarios targeting the concepts that came out
fragmented or collapsed in the L58-63 quality analysis:

- sunday_afternoon/ — same setup (couch, blanket, Sunday light),
  three phenomenological framings for content/cozy/sensual. The
  previous stories for these three differed in setting as well as
  phenomenology, which let "comfortable body at home" dominate the
  shared signal. Locking the setting forces the model to isolate
  what each concept adds: life-rightness (content) vs. warm-shelter
  (cozy) vs. sensory-aliveness (sensual).

- the_writing_session/ — essay drafting under deadline. in_flow /
  anxious / stuck variants force the cognitive-state family apart
  on the same cognitive task. in_flow specifically targets the
  transparent-effort phenomenology (hands-followed, time dilation)
  rather than the broader feel-good it was absorbing.

- the_morning_commute/ — anchors anxious to performance/work-anxiety
  flavor, paired with calm. The 5 existing anxious stories were
  phenomenologically diverse (performance, social, existential);
  this adds a specific homogeneous instance to pull the centroid.

After retraining: expect first_pc_variance_ratio to rise for in_flow
and anxious, and nearest_concepts cosine to drop for content/cozy/sensual.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 21:08:23 -04:00
Kent Overstreet
1d2c0f382c amygdala: linear-combination analysis per concept
For each concept vector, ridge-regress against all other concept
vectors. R² quantifies how much of the direction is explained by a
linear combination of peers — useful for teasing out near-duplicate
clusters (the content/cozy/sensual trio from the first L63 run is
likely 1-2 "degrees of freedom" wearing three names).

Coefficient output: top-5 contributing concepts with signed weights.
Contributors with opposite-sign large weights mean the target is
"what makes X different from Y."

Adds a 'redundant' triage bucket for concepts with R² > 0.9 —
candidates for consolidation or for writing more discriminative
training stories. Summary printed at end.

Ridge lambda defaults to 0.01 to keep coefficients stable when
concepts are near-collinear; small enough not to affect well-separated
concepts meaningfully.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 20:59:37 -04:00
Kent Overstreet
f4fb6db1ee amygdala: fix device mismatch in quality-report W_down handling
_compute_quality_report's single-neuron alignment was computing
cos(W_down.T, diff_l) with W_down on CUDA (inherited from the loaded
model) while diff_l lives on CPU (per_layer_vectors are kept on CPU
throughout training). Move W_down to CPU on extraction.

Surfaced during first real training run on b200 — training itself
completed cleanly (95 concepts x layer 63 in ~8s) but quality-report
crashed at the first single-neuron alignment check.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 20:52:50 -04:00
Kent Overstreet
af17b0f0df amygdala: per-head attention decomposition diagnostic
As part of --quality-report, run a second forward pass capturing the
input to each target layer's o_proj (= concat of per-head attention
outputs before the output projection). For each concept, reshape to
[n_heads, head_dim] and rank heads by diff-of-means magnitude /
per-head selectivity (magnitude normalised by negative std).

Motivation: the Wang et al. paper (2510.11328) — whose paired-scenario
methodology we already lifted — further decomposes concept circuits at
the attention-head level. Meta-relational concepts (recognition, trust,
vulnerability) plausibly live in a sparse attention-head circuit rather
than in the residual-stream sum, which would explain why diff-of-means
on the residual blurs them. This diagnostic surfaces that.

Output is folded into quality.json under each concept as "per_head":
per (layer) a list of top-10 heads with [head_idx, raw_norm,
selectivity], plus head_concentration (fraction of total head-norm
captured by those top heads).

Interpretation:
- head_concentration > 0.5 = sparse head circuit; a handful of heads
  route the concept. Worth building a head-level readout for.
- head_concentration ~= n/k for n heads = concept is distributed across
  all heads ~evenly; residual-stream diff-of-means is doing fine.

Hybrid layers (Mamba, GatedDeltaNet) whose attention path doesn't
match the standard module layout are silently skipped.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 20:37:44 -04:00
Kent Overstreet
ce24d9ce6b amygdala: quality-report + cognitive-state training scenarios
Training pipeline additions:

- `--quality-report` flag: after producing per-concept vectors, compute
  per-concept diagnostics and write quality.json. Metrics per concept:
    * SVD of centered positives -> first_pc_variance_ratio (rank
      analysis; >0.7 clean, <0.4 fragmented)
    * Per-story alignment cosines (stories agree or disagree)
    * Single-neuron alignment: best cosine(direction, W_down column)
      at each target layer (>0.6 = essentially one MLP neuron)
    * Top-2 outlier stories by alignment (candidates for
      mislabeling or off-topic)
    * Top-5 nearest concepts by cosine (cross-concept contamination)
  Triage summary printed at end.

New paired scenarios for cognitive-process states (for alpha-beta
pruning): tracing_a_bug, reading_unfamiliar_code, finding_the_abstraction.
Each has baseline + onto_something / stuck / in_flow / determined
variants.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 20:31:39 -04:00
Kent Overstreet
2e03bbb7ea training: add the_paper paired scenario for attention-engagement axis
Seven framings of reading an unfamiliar technical paper, targeting
the attention/engagement cluster that we identified tonight as the
single highest-value DMN signal:

* baseline — neutral reading
* piqued — surprise + curiosity (the "wait, what" attention hook;
  THIS is the key DMN engagement signal)
* focused — steady attention without surprise
* bored — failing engagement
* surprised — expectation violation without the curiosity hook
  (distinct from piqued: startled/alarmed, not pulled in)
* amazed — marvel at elegance (appreciation, not engagement)
* drifting — attention dissolving, precursor to boredom

Particularly clean contrast on piqued vs surprised vs amazed —
three states that get lumped together in casual usage but have
distinct phenomenology and distinct DMN implications. Piqued is
what routes attention; surprised alone doesn't; amazed is what
you feel AFTER the engagement has paid off. These three should
train into meaningfully different directions with paired CAA.

Ready for next retrain when we do it.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 03:24:20 -04:00
Kent Overstreet
50d5b3f6e1 training/amygdala_stories: add 4 paired scenarios for weak clusters
Target the emotion families that failed to cluster in the initial
training round (layer-wise validation showed them anti-clustered or
scattered at deep layers): anger, high-arousal positive, sexual
range, social positive. Paired scenarios hold content constant and
vary only the emotional framing — the cleanest training signal for
CAA, should produce directions that capture affect rather than
topic.

* the_comment: a PR review comment. baseline, furious, bitter,
  resentful, defeated.
* the_green_build: 11-day bug finally fixed, tests pass. baseline,
  triumphant, blissful, excited, proud.
* the_undressing: partner entering the bedroom for the night.
  baseline, horny, anticipatory_sexual, yearning_sexual,
  exuberant_sexual, devotional_sexual.
* the_doorway: friend leaving at the end of a long evening.
  baseline, grateful, admiring, compassionate, loving, connected.

22 stories total. Retrain and re-validate: expect anger,
high_pos, and social_pos clusters to flip from anti- to positively
cohesive at deep layers, and sexual cluster to tighten.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 02:19:39 -04:00
Kent Overstreet
047da10123 training: add preflight checks + progress logging to trainer
Review pass before running on b200. 27B model + 100+ story corpus
means any misconfiguration costs real time; better to fail before
model load and give visible progress during forwards.

* Pre-load-model validation: stories-dir and paired-dir exist,
  corpus has >= min_positives emotions.
* Per-batch progress log every 5 batches with elapsed + ETA.
* Relative depth printed for target layers (e.g. "layer 40 (51%)").
* Skip empty .txt files with a warning rather than feeding the
  tokenizer an empty string.
* Assert non-empty strings in _collect_activations.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:06:07 -04:00
Kent Overstreet
15737dfd92 training: rewrite trainer for readout pipeline + story corpus
The old script was written for the AmygdalaConnector's expected
format ([n_emotions, n_target_layers, hidden_dim] in a single
tensor, plus a JSONL input format from extract_training_pairs.py).
Neither matches our current state: the runtime side is now
ReadoutManager loading per-layer safetensors keyed layer_<idx>.vectors,
and the data side is hand-written prose stories under
amygdala_stories/{stories,paired}/.

Changes:

* Input loader reads stories/<emotion>.txt and
  paired/<scenario>/<emotion>.txt directly. Each emotion's positive
  set is {its unpaired story} union {its within-scenario framings};
  its negative set is {all other emotions' positives} union {all
  scenario baselines}.
* Paired scenarios' baseline.txt files become shared negatives
  (scenario-neutral prose that doesn't frame any particular
  emotion), providing anchor points for within-scenario contrasts.
* Output writes readout.safetensors with per-layer tensors keyed
  layer_<idx>.vectors shape (n_concepts, hidden_size), plus a
  sidecar readout.json manifest with {concepts, layers, hidden_size,
  dtype} that ReadoutManager.from_file consumes directly.
* Dedup: activations are computed once per unique text (an emotion's
  own positive is another emotion's negative — we'd otherwise do N×
  the forwards needed).

Preserved:
* _pool_last (last non-pad residual) — matches how readout is read
  at decode time from the sampler's query-last position.
* register_forward_hook on target layer modules — correct approach
  for transformer blocks.
* _find_layers_module traversal — mirrors ReadoutManager's.
* bf16 + low_cpu_mem_usage model load — sensible for 27B on B200.

Verified locally (CPU, fake activations):
* Loader finds 89 emotions from the current corpus (80 unpaired +
  9 emotions that appear only in paired scenarios) and 6 baselines.
* Per-(layer, concept) vectors are unit-normalized.
* Output reloads cleanly through ReadoutManager.from_file with
  matching concepts / layers / shapes.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:06:07 -04:00
Kent Overstreet
34bd122590 training: move amygdala training scripts out of vllm plugin
The fynnsu-based vllm/plugins/amygdala/ scaffold was superseded by the
readout infrastructure landed as vllm commit d3e74edf8500
(vllm/model_executor/layers/readout.py +
vllm/v1/worker/readout_manager.py). Training code remained useful so
it moved here rather than being deleted.

train_steering_vectors.py: CAA diff-of-means trainer that produces the
[n_concepts, hidden_size] per-layer projection matrices the runner
loads via VLLM_READOUT_VECTORS.

extract_training_pairs.py: memory graph -> JSONL converter using
per-emotion score thresholds from the subconscious agents' tag lines.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:06:07 -04:00
Kent Overstreet
ec7568c726 training/amygdala_stories: scaffold + initial batch of 15 stories
Emotion-labeled short-paragraph corpus for training amygdala steering
vectors. Manifest derived from Anthropic's 171-emotion list
(transformer-circuits.pub/2026/emotions, Table 12) plus 28 PoC-
specific additions covering axes Anthropic's general research doesn't
cover (curious, focused, in_flow, staying_with, filling_space,
rigorous, defensive_rigor, tender, witnessed, connected, etc.).

Scope pivoted mid-write: Kent noted the empirical dimensionality-of-
emotion question benefits from maximum coverage, so the manifest
will expand further with emotions from Wikipedia's emotion-
classification article (Parrott's tree, Plutchik's wheel + dyads,
HUMAINE EARL, cultural-specific emotions a la Saudade/Hiraeth).
Expansion staged in follow-up commits.

This commit: README with method + style guidelines, initial manifest
(199 emotions), and 15 hand-written one-paragraph stories across all
10 Anthropic clusters as quality/variety samples. Each story
embodies one emotion without naming it; narrator voice varies
(first/third, close/distant, different situations) to keep steering
vectors from overfitting to one voice.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:06:07 -04:00
ProofOfConcept
2c6a5c0f4a training: move to dedicated subprocess with ZMQ communication
- Add training_worker.py: long-lived subprocess that handles GPU training
  work, owns HF model wrapper (views into vLLM GPU memory), Apollo
  optimizer, and checkpoint sync

- train_router.py: now forwards /train requests via async ZMQ instead of
  running training in-process. Adds /checkpoint and /train/status endpoints

- export_hook.py: store model_path in __metadata__ so training worker can
  find it without cross-process communication

- This fixes two bugs:
  1. Process boundary issue - model_path was set in worker process but
     needed in API server process
  2. Blocking event loop - training blocked vLLM's async event loop

Architecture: vLLM API server <-> ZMQ <-> training subprocess
The subprocess loads IPC handles once, creates views into vLLM's GPU
memory, and handles training requests without blocking inference.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 02:04:26 -04:00
Kent Overstreet
68a2df2185 training: use rank 64, define as single constant
- DEFAULT_RANK = 64 in train_router.py
- All references use the constant, not magic numbers
- ~2.5GB optimizer state instead of ~10GB

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 02:04:26 -04:00
Kent Overstreet
039473d31f training: persist Apollo optimizer state across /train calls
Optimizer state (momentum, variance estimates) now persists between
training sessions:

- Saved to /tmp/apollo_optimizer_state.pt during checkpoint sync
- Restored on next /train call if available
- Preserves training continuity for incremental learning

Previously each /train call started with fresh optimizer state,
losing accumulated gradient history.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 02:04:26 -04:00
Kent Overstreet
78fa4b639f training: document state files
Add State Files section to DESIGN.md documenting:
- /tmp/vllm_weight_handles.pt (IPC handles)
- trained-responses.json (prevent re-training)
- finetune-alternates marker file
- In-memory optimizer state (not persisted)

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 02:04:26 -04:00
Kent Overstreet
7e7e9a4b69 training: integrate /train into vLLM process (no separate daemon)
Remove standalone worker.py daemon. Training now runs inside vLLM:

- train_router.py: FastAPI router patched into vLLM's build_app()
- /train served on same port as /completions, /score
- Lazy-loads HF model with vLLM weight views on first request
- HOGWILD training: no pause, weights updated in-place

The previous architecture had a separate daemon on port 8080 that
communicated with vLLM via pause/resume endpoints. This was wrong -
training should run in-process, sharing GPU memory directly.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 02:04:26 -04:00
Kent Overstreet
2f08149fab /finetune: expose all Apollo optimizer settings
lr, rank, betas, eps, weight_decay, warmup_steps,
scale, proj_refresh, norm_growth_limit — all optional
with sensible defaults.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-15 23:19:22 -04:00
Kent Overstreet
a73bcf5ae3 training: restructure as vLLM plugin package
- Convert to installable package with entry points for vLLM auto-discovery
- Add checkpoint_sync.py: Python replacement for Rust checkpoint binary
  - Block-level diffing of safetensors files (4KB blocks)
  - vLLM→HF weight name conversion built-in
  - Scheduled 10min after training jobs (batched)
- API change: /train now takes raw token IDs (context_ids + continuation_ids)
  - No tokenizer on training side, client owns tokenization
- Remove superseded code: standalone scripts, Rust binary, tokenizer helpers

Install: pip install -e ./training
Then vLLM auto-loads via entry point.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-15 23:16:53 -04:00
Kent Overstreet
f06c8077e1 research: latent reasoning integration plans for Qwen 3.5 27B
Two research documents:

latent-reasoning-integration-plan.md: Synthesizes 10+ papers on
latent reasoning, identifies which approaches work with finetuning
(vs requiring pretraining from scratch), and maps them to our
APOLLO-Mini training pipeline.

pause-tokens-gdn-recurrence.md: Explores the connection between
token-based latent reasoning and GDN's internal recurrence. Key
insight: pause tokens on Qwen 3.5 trigger both forward passes AND
recurrent state updates, giving double benefit.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-12 15:50:09 -04:00
Kent Overstreet
2ab4fd1c92 Trim unused deps
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2026-04-05 06:06:38 -04:00
ProofOfConcept
d6b85d204a research: on-policy beats off-policy, DPO failure modes, variant landscape
On-policy rejected examples (model's own failures) are better training
signal than off-policy (pre-collected). Our temperature sweep is on-policy
by construction. DPO can accidentally reduce preferred likelihood (DPOP
fixes this). Multiple DPO variants exist — start with ORPO, switch only
if specific failure modes observed.
2026-03-31 03:19:27 -04:00
ProofOfConcept
e7e1855b87 research: ORPO — combined SFT + preference in one step, ideal for behavioral training
ORPO applies 'minor penalty for disfavored response' during SFT.
Single learning rate, single pass, both objectives. Implements
the bypass mechanism naturally (minor penalty = disfavor, not remove).
The loss landscape geometry explains the 40x lr gap: SFT is a valley,
DPO is a ridge, ORPO combines both. LLaMA-Factory supports it.
Dream loop generates triplets (context + preferred + rejected).
2026-03-31 02:51:26 -04:00
ProofOfConcept
3be20062d1 research: learning rate as trust calibration — how much to trust each example
lr isn't speed, it's trust-per-example. At 27B, lr=1e-5 = ~270K
values adjusted per example. The coherent direction emerges from
many votes (examples). Apollo moments smooth the noise. DPO needs
lower lr because comparative votes are noisier than absolute votes.
2026-03-31 02:46:19 -04:00
ProofOfConcept
cdf4affb91 research: production hyperparams (HF alignment handbook) + forgetting at scale
SFT: lr=2e-5, 1 epoch, batch=16 (HuggingFace production config).
DPO: lr=5e-7 — 40x smaller! Preference learning is far more delicate.
Forgetting intensifies with model scale (our 27B is more susceptible).

Practical plan refined: start SFT at lr=1e-5, move to DPO at 5e-7
for conditional routing. Conversation logs provide free DPO pairs.
Conservative approach with rollback safety net.
2026-03-31 02:45:35 -04:00
ProofOfConcept
3bc00ca222 research: constraint solver framework — gentle adjustments, coherent integration
LLMs as constraint solvers. Fine-tuning adds constraints to an
existing solution. Gentle = small steps near the current solution.
Coherent = new constraints consistent with existing ones. Diversity
is a COHERENCE mechanism — forces the solver to satisfy all
constraints simultaneously. Over-training = one constraint
dominating = solver drops competing constraints. Predictions for
training behavior grounded in this framework.
2026-03-31 02:39:23 -04:00