consciousness

Author	SHA1	Message	Date
Kent Overstreet	ce24d9ce6b	amygdala: quality-report + cognitive-state training scenarios Training pipeline additions: - `--quality-report` flag: after producing per-concept vectors, compute per-concept diagnostics and write quality.json. Metrics per concept: * SVD of centered positives -> first_pc_variance_ratio (rank analysis; >0.7 clean, <0.4 fragmented) * Per-story alignment cosines (stories agree or disagree) * Single-neuron alignment: best cosine(direction, W_down column) at each target layer (>0.6 = essentially one MLP neuron) * Top-2 outlier stories by alignment (candidates for mislabeling or off-topic) * Top-5 nearest concepts by cosine (cross-concept contamination) Triage summary printed at end. New paired scenarios for cognitive-process states (for alpha-beta pruning): tracing_a_bug, reading_unfamiliar_code, finding_the_abstraction. Each has baseline + onto_something / stuck / in_flow / determined variants. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 20:31:39 -04:00
Kent Overstreet	047da10123	training: add preflight checks + progress logging to trainer Review pass before running on b200. 27B model + 100+ story corpus means any misconfiguration costs real time; better to fail before model load and give visible progress during forwards. * Pre-load-model validation: stories-dir and paired-dir exist, corpus has >= min_positives emotions. * Per-batch progress log every 5 batches with elapsed + ETA. * Relative depth printed for target layers (e.g. "layer 40 (51%)"). * Skip empty .txt files with a warning rather than feeding the tokenizer an empty string. * Assert non-empty strings in _collect_activations. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00
Kent Overstreet	15737dfd92	training: rewrite trainer for readout pipeline + story corpus The old script was written for the AmygdalaConnector's expected format ([n_emotions, n_target_layers, hidden_dim] in a single tensor, plus a JSONL input format from extract_training_pairs.py). Neither matches our current state: the runtime side is now ReadoutManager loading per-layer safetensors keyed layer_<idx>.vectors, and the data side is hand-written prose stories under amygdala_stories/{stories,paired}/. Changes: * Input loader reads stories/<emotion>.txt and paired/<scenario>/<emotion>.txt directly. Each emotion's positive set is {its unpaired story} union {its within-scenario framings}; its negative set is {all other emotions' positives} union {all scenario baselines}. * Paired scenarios' baseline.txt files become shared negatives (scenario-neutral prose that doesn't frame any particular emotion), providing anchor points for within-scenario contrasts. * Output writes readout.safetensors with per-layer tensors keyed layer_<idx>.vectors shape (n_concepts, hidden_size), plus a sidecar readout.json manifest with {concepts, layers, hidden_size, dtype} that ReadoutManager.from_file consumes directly. * Dedup: activations are computed once per unique text (an emotion's own positive is another emotion's negative — we'd otherwise do N× the forwards needed). Preserved: * _pool_last (last non-pad residual) — matches how readout is read at decode time from the sampler's query-last position. * register_forward_hook on target layer modules — correct approach for transformer blocks. * _find_layers_module traversal — mirrors ReadoutManager's. * bf16 + low_cpu_mem_usage model load — sensible for 27B on B200. Verified locally (CPU, fake activations): * Loader finds 89 emotions from the current corpus (80 unpaired + 9 emotions that appear only in paired scenarios) and 6 baselines. * Per-(layer, concept) vectors are unit-normalized. * Output reloads cleanly through ReadoutManager.from_file with matching concepts / layers / shapes. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00
Kent Overstreet	34bd122590	training: move amygdala training scripts out of vllm plugin The fynnsu-based vllm/plugins/amygdala/ scaffold was superseded by the readout infrastructure landed as vllm commit d3e74edf8500 (vllm/model_executor/layers/readout.py + vllm/v1/worker/readout_manager.py). Training code remained useful so it moved here rather than being deleted. train_steering_vectors.py: CAA diff-of-means trainer that produces the [n_concepts, hidden_size] per-layer projection matrices the runner loads via VLLM_READOUT_VECTORS. extract_training_pairs.py: memory graph -> JSONL converter using per-emotion score thresholds from the subconscious agents' tag lines. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:06:07 -04:00

4 commits