consciousness/training
Kent Overstreet af17b0f0df amygdala: per-head attention decomposition diagnostic
As part of --quality-report, run a second forward pass capturing the
input to each target layer's o_proj (= concat of per-head attention
outputs before the output projection). For each concept, reshape to
[n_heads, head_dim] and rank heads by diff-of-means magnitude and by
per-head selectivity (magnitude normalised by the negative-example std).
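
A minimal sketch of the reshape-and-rank step described above, in plain
Python so it runs without the model. The names split_heads and rank_heads
are hypothetical, and the exact definition of the "negative std" is an
assumption here (population std of all negative activations within the
head); the real diagnostic may normalise differently.

```python
import math
from statistics import fmean, pstdev


def split_heads(vec, n_heads):
    """Reshape a flat o_proj input (length n_heads * head_dim) into per-head slices."""
    head_dim = len(vec) // n_heads
    return [vec[h * head_dim:(h + 1) * head_dim] for h in range(n_heads)]


def rank_heads(pos, neg, n_heads, eps=1e-8):
    """pos / neg: lists of flat activation vectors, one per example.

    Returns [(head_idx, raw_norm, selectivity), ...] sorted by raw_norm, descending.
    """
    pos_heads = [split_heads(v, n_heads) for v in pos]
    neg_heads = [split_heads(v, n_heads) for v in neg]
    rows = []
    for h in range(n_heads):
        head_dim = len(pos_heads[0][h])
        # Diff of means for this head, one component per head dimension.
        diff = [
            fmean(p[h][d] for p in pos_heads) - fmean(n[h][d] for n in neg_heads)
            for d in range(head_dim)
        ]
        raw_norm = math.sqrt(sum(x * x for x in diff))
        # ASSUMPTION: "negative std" = population std of every negative
        # activation in this head, pooled over examples and dimensions.
        neg_std = pstdev([x for n in neg_heads for x in n[h]])
        rows.append((h, raw_norm, raw_norm / (neg_std + eps)))
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

Sorting by raw_norm keeps the ranking aligned with the concentration
figure; sorting by selectivity instead is a one-line change to the key.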

Motivation: the Wang et al. paper (2510.11328) — whose paired-scenario
methodology we already lifted — further decomposes concept circuits at
the attention-head level. Meta-relational concepts (recognition, trust,
vulnerability) plausibly live in a sparse attention-head circuit rather
than in the residual-stream sum, which would explain why diff-of-means
on the residual blurs them. This diagnostic surfaces that.

Output is folded into quality.json under each concept as "per_head":
for each layer, a list of the top-10 heads as [head_idx, raw_norm,
selectivity], plus head_concentration (the fraction of total head-norm
captured by those top-10 heads).
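
One way the per-layer entry could be assembled from a ranked head list.
Only "per_head" and head_concentration come from the description above;
the "top_heads" key and the per_head_entry helper are hypothetical, as is
keying the result by layer index.

```python
def per_head_entry(ranked, top_k=10):
    """ranked: [(head_idx, raw_norm, selectivity), ...] sorted by raw_norm, descending."""
    top = ranked[:top_k]
    total = sum(r[1] for r in ranked)
    # Fraction of the summed per-head norm captured by the top_k heads.
    concentration = sum(r[1] for r in top) / total if total > 0 else 0.0
    return {
        "top_heads": [[h, norm, sel] for h, norm, sel in top],  # hypothetical key
        "head_concentration": concentration,
    }


# Toy example: three heads, keep the top two.
ranked = [(3, 6.0, 2.0), (0, 3.0, 1.0), (1, 1.0, 0.5)]
layer_report = {"per_head": {"12": per_head_entry(ranked, top_k=2)}}
```

The zero-total guard keeps the report well-formed for a concept whose
per-head diff-of-means is identically zero.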

Interpretation:
- head_concentration > 0.5 = sparse head circuit; a handful of heads
  route the concept. Worth building a head-level readout for.
- head_concentration ~= k/n (top k of n heads, here k = 10) = concept
  is distributed across all heads ~evenly; residual-stream
  diff-of-means is doing fine.
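
The two readings above can be sketched as a small helper. An even spread
over n heads puts roughly k/n of the total norm in the top k, so that is
used as the baseline. The classify_concentration name, the 10% tolerance
and the "intermediate" bucket are assumptions, not part of the diagnostic.

```python
def classify_concentration(concentration, n_heads, top_k=10,
                           sparse_threshold=0.5, tol=0.1):
    """Label a head_concentration value relative to the uniform baseline."""
    # Even spread over n_heads puts about top_k / n_heads in the top_k heads.
    baseline = min(1.0, top_k / n_heads)
    # Check the baseline first: with few heads (n_heads <= 2 * top_k) an
    # even spread already exceeds the 0.5 threshold.
    if concentration <= baseline * (1.0 + tol):
        return "distributed"
    if concentration > sparse_threshold:
        return "sparse"
    return "intermediate"
```

For example, with 64 heads the baseline is 10/64, so a value of 0.8 comes
back as "sparse" and a value of 0.16 as "distributed".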

Hybrid layers (Mamba, GatedDeltaNet) whose attention path doesn't
match the standard module layout are silently skipped.
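
A sketch of how the silent skip might look when collecting hook targets.
The self_attn attribute name is an assumption based on common decoder
layouts; only o_proj is named above. The stand-in objects below replace
real modules so the example runs without a model.

```python
from types import SimpleNamespace


def find_o_proj_targets(layers, target_indices):
    """Map layer index -> o_proj module, skipping layers without one."""
    targets = {}
    for idx in target_indices:
        attn = getattr(layers[idx], "self_attn", None)  # assumed attribute name
        o_proj = getattr(attn, "o_proj", None)
        if o_proj is None:
            # Non-standard attention path (e.g. a Mamba-style block): skip silently.
            continue
        targets[idx] = o_proj
    return targets


# Stand-in layers: 0 and 2 are standard attention, 1 is a hybrid block.
layers = [
    SimpleNamespace(self_attn=SimpleNamespace(o_proj="o_proj_0")),
    SimpleNamespace(mixer=SimpleNamespace()),
    SimpleNamespace(self_attn=SimpleNamespace(o_proj="o_proj_2")),
]
targets = find_o_proj_targets(layers, [0, 1, 2])
```

In a PyTorch model, each returned module would then get a forward
pre-hook to capture its input during the second forward pass.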

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 20:37:44 -04:00
amygdala_stories amygdala: quality-report + cognitive-state training scenarios 2026-04-18 20:31:39 -04:00
amygdala_training amygdala: per-head attention decomposition diagnostic 2026-04-18 20:37:44 -04:00
apollo_plugin training: move to dedicated subprocess with ZMQ communication 2026-04-16 02:04:26 -04:00
research research: latent reasoning integration plans for Qwen 3.5 27B 2026-04-12 15:50:09 -04:00
DESIGN.md training: move to dedicated subprocess with ZMQ communication 2026-04-16 02:04:26 -04:00
pyproject.toml training: move to dedicated subprocess with ZMQ communication 2026-04-16 02:04:26 -04:00