Commit graph

6 commits

Author SHA1 Message Date
ProofOfConcept
417cb49339 amygdala: spectrum reporting per concept + add 'creative' direct
Chat-template retrain was a disaster (0.003 mean matched cosine vs
n20-v3; all 90+ concepts shifted). Root cause: the
steering-vectors library reads last-token activations, and with
chat template every sample ends in identical '<|im_end|>\n'
tokens — activations at that position encode 'end of assistant
turn', not content. PCA found template noise as its dominant axis.
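
A minimal sketch of the "mean matched cosine" comparison quoted above, assuming "matched" means the same concept name in both runs. The function name and the dict-of-vectors input are hypothetical, not the trainer's actual interface.

```python
import numpy as np

def mean_matched_cosine(run_a, run_b):
    """Average cosine similarity between same-named concept vectors.

    run_a, run_b: dicts mapping concept name -> 1-D vector (assumed layout).
    Returns (mean cosine, per-concept cosines) over the shared concepts.
    """
    shared = sorted(set(run_a) & set(run_b))
    if not shared:
        raise ValueError("no concepts in common")
    cosines = {}
    for name in shared:
        a = np.asarray(run_a[name], dtype=np.float64)
        b = np.asarray(run_b[name], dtype=np.float64)
        cosines[name] = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(list(cosines.values()))), cosines
```

A mean near 1 says the retrained directions agree with the reference run; a mean near 0 says they are essentially unrelated.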

Drop the chat template and go back to raw text. Direct descriptions
('I feel X. ...') end on their own content, so the last-token
activation is still strongly anchored without the template.

Also add per-concept spectrum logging (_pca_with_spectrum):
  first_pc_ratio: λ₁ / Σλᵢ — concentration in top-1 PC
  k_signal_at_90pct: how many PCs to reach 90% cumulative variance
  effective_dim_signal: participation ratio over top-k (should ≈ k
                        if denoising is clean — Kent's spot check)
  effective_dim_full: participation ratio over full spectrum

The effective_dim_signal / effective_dim_full ratio shows how much
the long noise tail inflates the "dimensionality" measure.
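
The four metrics above can be sketched in plain NumPy. This is an illustration of the definitions as listed, not the body of _pca_with_spectrum: the name pca_spectrum, the mean-centering step, and the assumption of a non-degenerate spectrum (nonzero total variance) are all mine.

```python
import numpy as np

def pca_spectrum(deltas, var_threshold=0.90):
    """Summarise the PCA eigenvalue spectrum of an (n_samples, n_dims) array."""
    # Centering is an assumption; an uncentered PCA would skip this line.
    centered = deltas - deltas.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the covariance eigenvalues,
    # and every metric below is scale-invariant, so no normalisation is needed.
    s = np.linalg.svd(centered, compute_uv=False)
    lam = s ** 2
    total = lam.sum()
    cum = np.cumsum(lam) / total
    # Smallest k whose cumulative variance share reaches the threshold.
    k = min(int(np.searchsorted(cum, var_threshold)) + 1, lam.size)

    def participation_ratio(v):
        # (sum of eigenvalues)^2 / (sum of squared eigenvalues)
        return float(v.sum() ** 2 / (v ** 2).sum())

    return {
        "first_pc_ratio": float(lam[0] / total),
        "k_signal_at_90pct": k,
        "effective_dim_signal": participation_ratio(lam[:k]),
        "effective_dim_full": participation_ratio(lam),
    }
```

The participation ratio of k equal eigenvalues is exactly k and drops toward 1 as one eigenvalue dominates, which is why effective_dim_signal close to k indicates a flat top-k spectrum.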

Added direct/creative.txt — 'I feel creative. [...]' in 5
variants. Distinct from focused (narrow attention) and in_flow
(immersed). Creative = generative/expansive mode.
2026-04-19 00:26:58 -04:00
ProofOfConcept
875cffd6d7 amygdala: merge direct descriptions + chat template into train_with_library
Kent's plan: keep stories for working concepts, replace stories for
trouble concepts with direct first-person descriptions, train all
together. This gives a more diverse negative pool than the
6-concept-only direct test, which was too homogeneous for PCA to find
the emotion axis.

Deleted story files for 6 trouble concepts (14 files across stories/
and paired/). Added --direct-dir and --chat-template flags.

When --chat-template is on, every positive_str and negative_str is
wrapped as a "Say something." / "[text]" user-assistant pair. Prompt
is identical across positives and negatives so it cancels in the
pos-neg delta. What PCA sees is variation in the assistant content —
which is where the emotion lives.
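
A rough illustration of the wrapping described above. The ChatML-style '<|im_start|>' markers are an assumption (only '<|im_end|>' appears in this log), and wrap_chat, shared_affixes, and the sample texts are hypothetical names and data for the sketch.

```python
import os

PROMPT = "Say something."

def wrap_chat(text, prompt=PROMPT):
    """Wrap raw text as a fixed-prompt user/assistant pair (assumed ChatML layout)."""
    return (
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n{text}<|im_end|>\n"
    )

def shared_affixes(a, b):
    """Return the longest common prefix and suffix of two strings."""
    prefix = os.path.commonprefix([a, b])
    suffix = os.path.commonprefix([a[::-1], b[::-1]])[::-1]
    return prefix, suffix

pos = wrap_chat("I feel creative. Ideas keep branching.")
neg = wrap_chat("I feel focused. One thing at a time.")
prefix, suffix = shared_affixes(pos, neg)
# prefix covers the whole user turn; suffix covers the closing template tokens.
```

After wrapping, the positive and negative strings differ only in the assistant content, while the user turn and the closing tokens are shared by every sample.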

Files starting with _ in --direct-dir (e.g. _baseline.txt) contribute
neutral descriptions to every concept's negative pool, giving PCA an
anchor against "just any assistant utterance" noise.
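
One way the pooling rule could look, as a sketch only. It takes an in-memory mapping of file name to texts instead of reading --direct-dir, and it assumes each concept's negatives are the other concepts' positives plus the shared neutral texts. build_pools and that negative-pool rule are my assumptions, not confirmed details of train_with_library.

```python
import os

def build_pools(files):
    """Split {file name: [texts]} into per-concept positive/negative pools.

    Files whose name starts with '_' are treated as neutral and appended to
    every concept's negative pool; they never become concepts themselves.
    """
    neutral = [
        text
        for name, texts in files.items()
        if name.startswith("_")
        for text in texts
    ]
    concepts = {
        os.path.splitext(name)[0]: list(texts)
        for name, texts in files.items()
        if not name.startswith("_")
    }
    pools = {}
    for concept, positives in concepts.items():
        # Assumed rule: everything that is not this concept counts as negative.
        negatives = [
            text
            for other, texts in concepts.items()
            if other != concept
            for text in texts
        ]
        pools[concept] = {"positive": positives, "negative": negatives + neutral}
    return pools
```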
2026-04-19 00:15:15 -04:00
Kent Overstreet
22704a9dd8 amygdala lib: cast activations to fp32 before aggregator (bf16 svd unsupported)
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 22:20:39 -04:00
Kent Overstreet
7f6d94417e amygdala lib: move_to_cpu=True to avoid bf16 SVD on CUDA
torch.svd doesn't support bf16 on CUDA; moving activations to CPU
first makes pca_aggregator work.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 22:19:23 -04:00
Kent Overstreet
2ea89b1cb0 amygdala: drop linear_aggregator, not in steering-vectors v0.12.2
Only mean/pca/logistic are exposed in the installed version.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 22:17:55 -04:00
Kent Overstreet
3377c65061 amygdala: trainer using steering-vectors library
Alternative trainer that uses the pip-installable steering-vectors
library (github.com/steering-vectors/steering-vectors) instead of our
hand-rolled extraction. Ships four aggregators:

  mean      — diff-of-means, same as our 'pooled' default
  pca       — PCA on paired deltas, implicit denoising by finding the
              principal direction of variation
  logistic  — logistic-regression classifier; weight vector is the
              concept direction. With L1 penalty ('logistic_l1') gives
              explicit sparse denoising — noise coords go to zero
  linear    — linear regression version
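
To make the difference between the first two aggregators concrete, here is a plain NumPy sketch. It is not the steering-vectors library's code; the function names, the uncentered PCA, and the sign convention are assumptions made for the illustration.

```python
import numpy as np

def mean_direction(pos, neg):
    """Diff-of-means: unit vector from the negative mean to the positive mean."""
    d = pos.mean(axis=0) - neg.mean(axis=0)
    return d / np.linalg.norm(d)

def pca_direction(pos, neg):
    """First principal direction of the paired deltas pos[i] - neg[i]."""
    deltas = pos - neg
    # Uncentered PCA: the first right singular vector of the delta matrix.
    _, _, vt = np.linalg.svd(deltas, full_matrices=False)
    v = vt[0]
    # SVD leaves the sign arbitrary; orient the vector toward the positives.
    if (deltas @ v).mean() < 0:
        v = -v
    return v

# Synthetic check: a shift of 3 along axis 0 plus small per-sample noise.
rng = np.random.default_rng(0)
neg = rng.normal(size=(50, 4))
pos = neg + 3.0 * np.eye(4)[0] + 0.1 * rng.normal(size=(50, 4))
v_mean = mean_direction(pos, neg)
v_pca = pca_direction(pos, neg)
```

On clean synthetic data both recover the same axis. They diverge when per-sample noise is large or structured, which is the case the denoising discussion is about.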

Output format is the same readout.safetensors + readout.json our
existing plugin loads. --aggregator flag picks which method.

Rationale: Kent's real request was 'how do we denoise diff-of-means',
not 'design a new extraction algorithm.' The library already has
logistic_l1 and pca aggregators that do exactly that. No point
reinventing; just port the corpus.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 22:16:03 -04:00