Kent's plan: keep stories for the concepts that work, replace stories for
the trouble concepts with direct first-person descriptions, and train
everything together. This gives a more diverse negative pool than the
6-concept-only direct test, which was too homogeneous for PCA to find an
emotion axis.
Deleted story files for 6 trouble concepts (14 files across stories/
and paired/). Added --direct-dir and --chat-template flags.
When --chat-template is on, every positive_str and negative_str is
wrapped as a "Say something." / "[text]" user-assistant pair. The prompt
is identical across positives and negatives, so it cancels in the
pos-neg delta; what PCA sees is variation in the assistant content,
which is where the emotion lives.
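A minimal sketch of the wrapping. The template markers and the wrap_chat helper here are hypothetical; the real trainer would render through the tokenizer's chat template:

```python
def wrap_chat(text: str, prompt: str = "Say something.") -> str:
    # Hypothetical markers; in practice, use the tokenizer's own
    # chat template so the wrapping matches the model's training.
    return f"<|user|>\n{prompt}\n<|assistant|>\n{text}"

pos = wrap_chat("A wave of joy washes over me.")
neg = wrap_chat("The report is due on Tuesday.")

# The prompt prefix is byte-identical across the pair, so its
# activations cancel in the pos-neg delta; only the assistant
# content varies.
prefix = "<|user|>\nSay something.\n<|assistant|>\n"
assert pos.startswith(prefix) and neg.startswith(prefix)
```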
Files starting with _ in --direct-dir (e.g. _baseline.txt) contribute
neutral descriptions to every concept's negative pool, giving PCA an
anchor against "just any assistant utterance" noise.
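A sketch of how such a loader could treat the underscore convention (load_direct_pools and the one-description-per-line layout are assumptions, not the actual implementation):

```python
from pathlib import Path

def load_direct_pools(direct_dir: str) -> dict:
    """Hypothetical loader: one .txt file per concept, one description
    per line; files starting with '_' hold neutral descriptions that
    join every concept's negative pool."""
    concepts, shared = {}, []
    for path in sorted(Path(direct_dir).glob("*.txt")):
        lines = [l for l in path.read_text().splitlines() if l.strip()]
        if path.name.startswith("_"):
            shared.extend(lines)          # neutral anchor texts
        else:
            concepts[path.stem] = lines
    pools = {}
    for name, positives in concepts.items():
        # Negatives: every other concept's texts plus the neutral anchors.
        negatives = [t for other, ts in concepts.items()
                     if other != name for t in ts]
        pools[name] = (positives, negatives + shared)
    return pools
```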
Alternative trainer that uses the pip-installable steering-vectors
library (github.com/steering-vectors/steering-vectors) instead of our
hand-rolled extraction. Ships four aggregators:
mean — diff-of-means, same as our 'pooled' default
pca — PCA on paired deltas, implicit denoising by finding the
principal direction of variation
logistic — logistic-regression classifier; the weight vector is the
concept direction. With the L1 penalty ('logistic_l1') it gives
explicit sparse denoising: noise coords go to zero
linear — linear regression version
Output format is the same readout.safetensors + readout.json that our
existing plugin loads. The --aggregator flag picks the method.
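The denoising behavior of mean, pca, and logistic_l1 can be illustrated on synthetic activations. This is a numpy/sklearn stand-in for what the aggregators do, not the steering-vectors API; the uncentered-SVD variant of pca and the C value are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n = 32, 200
true_dir = np.zeros(d)
true_dir[:4] = 0.5                 # concept lives in the first 4 coords
pos = true_dir + rng.normal(scale=0.3, size=(n, d))
neg = -true_dir + rng.normal(scale=0.3, size=(n, d))

# mean: diff-of-means, same idea as our 'pooled' default.
mean_dir = pos.mean(axis=0) - neg.mean(axis=0)

# pca: top singular direction of the paired deltas (uncentered, so the
# mean difference dominates; one common way to realize a pca aggregator).
deltas = pos - neg
pca_dir = np.linalg.svd(deltas, full_matrices=False)[2][0]

# logistic_l1: sparse classifier; weights on pure-noise coords are
# driven to exactly zero by the L1 penalty.
X = np.vstack([pos, neg])
y = np.r_[np.ones(n), np.zeros(n)]
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l1_dir = clf.coef_[0]

def cos(a, b):
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

All three recover the planted direction here; the difference shows up in how they treat the 28 noise coordinates, where only logistic_l1 zeroes them out explicitly.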
Rationale: Kent's real request was 'how do we denoise diff-of-means',
not 'design a new extraction algorithm.' The library already has
logistic_l1 and pca aggregators that do exactly that. No point
reinventing; just port the corpus.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>