consciousness

Author	SHA1	Message	Date
ProofOfConcept	417cb49339	amygdala: spectrum reporting per concept + add 'creative' direct Chat-template retrain was a disaster (0.003 mean matched cosine vs n20-v3; all 90+ concepts shifted). Root cause: the steering-vectors library reads last-token activations, and with chat template every sample ends in identical '<\|im_end\|>\n' tokens — activations at that position encode 'end of assistant turn', not content. PCA found template noise as its dominant axis. Drop chat template; go back to raw text. Direct descriptions ('I feel X. ...') still have strong anchoring at their content end without needing the template. Also add per-concept spectrum logging (_pca_with_spectrum): first_pc_ratio: λ₁ / Σλᵢ — concentration in top-1 PC k_signal_at_90pct: how many PCs to reach 90% cumulative variance effective_dim_signal: participation ratio over top-k (should ≈ k if denoising is clean — Kent's spot check) effective_dim_full: participation ratio over full spectrum Signal/full ratio gives a sense of how much the long noise tail is inflating the "dimensionality" measure. Added direct/creative.txt — 'I feel creative. [...]' in 5 variants. Distinct from focused (narrow attention) and in_flow (immersed). Creative = generative/expansive mode.	2026-04-19 00:26:58 -04:00
ProofOfConcept	875cffd6d7	amygdala: merge direct descriptions + chat template into train_with_library Kent's plan: keep stories for working concepts, replace stories for trouble concepts with direct first-person descriptions, train all together. More diverse negative pool than the 6-concept-only direct test, which was too homogeneous for PCA to find emotion axis. Deleted story files for 6 trouble concepts (14 files across stories/ and paired/). Added --direct-dir and --chat-template flags. When --chat-template is on, every positive_str and negative_str is wrapped as a "Say something." / "[text]" user-assistant pair. Prompt is identical across positives and negatives so it cancels in the pos-neg delta. What PCA sees is variation in the assistant content — which is where the emotion lives. Files starting with _ in --direct-dir (e.g. _baseline.txt) contribute neutral descriptions to every concept's negative pool, giving PCA an anchor against "just any assistant utterance" noise.	2026-04-19 00:15:15 -04:00
Kent Overstreet	22704a9dd8	amygdala lib: cast activations to fp32 before aggregator (bf16 svd unsupported) Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 22:20:39 -04:00
Kent Overstreet	7f6d94417e	amygdala lib: move_to_cpu=True to avoid bf16 SVD on CUDA torch.svd doesn't support bf16 on CUDA; moving activations to CPU first makes pca_aggregator work. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 22:19:23 -04:00
Kent Overstreet	2ea89b1cb0	amygdala: drop linear_aggregator, not in steering-vectors v0.12.2 Only mean/pca/logistic are exposed in the installed version. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 22:17:55 -04:00
Kent Overstreet	3377c65061	amygdala: trainer using steering-vectors library Alternative trainer that uses the pip-installable steering-vectors library (github.com/steering-vectors/steering-vectors) instead of our hand-rolled extraction. Ships four aggregators: mean — diff-of-means, same as our 'pooled' default pca — PCA on paired deltas, implicit denoising by finding the principal direction of variation logistic — logistic-regression classifier; weight vector is the concept direction. With L1 penalty ('logistic_l1') gives explicit sparse denoising — noise coords go to zero linear — linear regression version Output format is the same readout.safetensors + readout.json our existing plugin loads. --aggregator flag picks which method. Rationale: Kent's real request was 'how do we denoise diff-of-means', not 'design a new extraction algorithm.' The library already has logistic_l1 and pca aggregators that do exactly that. No point reinventing; just port the corpus. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 22:16:03 -04:00

6 commits