research: distill and sift — SUMMARY of 7 real insights + 7 testable questions
Moved 14 speculative/obvious documents to v0/. Kept 7 with real substance. Distilled into SUMMARY.md (what we know) and OPEN-QUESTIONS.md (what to test next, one experiment each). Priority: Q5 (steering vectors) is answerable TODAY. Q1, Q3, Q6, and Q7 are all answerable with the first training run. Speculation converted to testable hypotheses.
parent 8061cc0477 · commit e10477a683
16 changed files with 249 additions and 0 deletions
training/research/OPEN-QUESTIONS.md (new file, 113 lines)
@@ -0,0 +1,113 @@
# Open Questions — Answerable With One Experiment Each

## Q1: Minimum signal for behavioral change

**Question**: How many training examples produce measurable change in attention patterns for a specific behavioral pattern?

**Experiment**: Train on 1, 5, 10, 20, 50 examples of "listening." After each batch, measure attention weights on a held-out test set of direction-giving conversations. Plot the attention shift vs example count. Find the knee.

**What it tells us**: The learning rate and batch size for our training pipeline. If 5 examples suffice, we can train continuously. If 500 are needed, we batch nightly.

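The knee-finding step above can be sketched as a search for the point where marginal gain per example drops off hardest. This is a minimal sketch; the shift numbers below are hypothetical placeholders standing in for the measured attention shift on the held-out set.

```python
# Find the "knee" of the attention-shift curve: the example count after
# which the per-example slope falls off the most. All shift values here
# are hypothetical stand-ins for measured attention-weight changes.

def find_knee(counts, shifts):
    """Return the example count where marginal gain drops the most."""
    slopes = [
        (shifts[i + 1] - shifts[i]) / (counts[i + 1] - counts[i])
        for i in range(len(counts) - 1)
    ]
    # Knee = right endpoint of the segment after which the slope collapses.
    drops = [slopes[i] - slopes[i + 1] for i in range(len(slopes) - 1)]
    return counts[drops.index(max(drops)) + 1]

counts = [1, 5, 10, 20, 50]               # examples of "listening" trained on
shifts = [0.02, 0.11, 0.13, 0.14, 0.145]  # hypothetical mean attention shift
print(find_knee(counts, shifts))          # -> 5: gains flatten after that
```

If the real curve is noisy, the same data can be fed to a proper knee detector instead, but the slope-drop heuristic is enough to decide between "train continuously" and "batch nightly."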
## Q2: Dream loop difficulty calibration

**Question**: Does the dream loop naturally generate scenarios at the right difficulty, or does it generate easy ones?

**Experiment**: Generate 100 dream scenarios with seeds from recent behavioral patterns. Classify each by difficulty (obvious decision vs subtle). Compare the difficulty distribution to the model's actual failure rate on those scenarios. If the dream loop generates 50% easy / 50% hard but the model fails 10% of the time, the difficulty isn't calibrated.

**What it tells us**: Whether we need adaptive temperature or if the default generation is already well-calibrated.

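The comparison above reduces to a single gap between the generated difficulty mix and the observed failure rate. A minimal sketch, assuming each scenario has already been labelled easy/hard and paired with a pass/fail result; the counts and the 0.2 flagging threshold are illustrative assumptions, not calibrated values.

```python
# Compare the fraction of "hard" scenarios the dream loop generates against
# the fraction the model actually fails. A large gap means the nominal
# difficulty labels don't predict real difficulty.

def calibration_gap(scenarios):
    """scenarios: list of (difficulty_label, failed) pairs."""
    hard_frac = sum(1 for d, _ in scenarios if d == "hard") / len(scenarios)
    fail_rate = sum(1 for _, f in scenarios if f) / len(scenarios)
    return hard_frac - fail_rate

# Hypothetical run: 50% labelled hard, but only 10% actual failures.
scenarios = ([("hard", False)] * 40 + [("hard", True)] * 10
             + [("easy", False)] * 50)
gap = calibration_gap(scenarios)
print(f"gap={gap:.2f}", "miscalibrated" if abs(gap) > 0.2 else "ok")
```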
## Q3: Which heads change during behavioral training?

**Question**: Is behavioral change concentrated in a few attention heads (localized) or distributed across many (diffuse)?

**Experiment**: Record attention patterns on 20 test conversations before and after one training session. Compute the L2 distance of each head's attention pattern. Rank heads by change magnitude. Plot the distribution.

**What it tells us**: Whether behavioral change is surgical or diffuse. Affects our rank choice (if concentrated, a lower rank is enough; if distributed, we need rank-256). Also tells us which layers matter most for behavioral training.

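The ranking step can be sketched directly on the recorded attention tensors. This assumes patterns are stored as arrays of shape `(layers, heads, seq, seq)` averaged over the 20 test conversations; the shapes and the synthetic data are illustrative assumptions.

```python
import numpy as np

def rank_heads_by_change(before, after):
    """Rank (layer, head) pairs by L2 distance between attention patterns."""
    diff = np.linalg.norm(after - before, axis=(-2, -1))  # -> (layers, heads)
    flat = np.argsort(diff, axis=None)[::-1]              # largest change first
    order = [tuple(int(i) for i in np.unravel_index(f, diff.shape))
             for f in flat]
    return order, diff

rng = np.random.default_rng(0)
before = rng.normal(size=(4, 8, 16, 16))  # (layers, heads, seq, seq)
after = before.copy()
after[2, 3] += 1.0                        # simulate one head changing a lot
order, diff = rank_heads_by_change(before, after)
print(order[0])                           # most-changed head: (2, 3) here
top_share = diff[order[0]] / diff.sum()   # change concentrated in top head?
```

Plotting `diff` sorted descending gives the localized-vs-diffuse picture directly: a sharp cliff means surgical change (lower rank is enough), a long flat tail means diffuse change.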
## Q4: Memory graph as regression detector

**Question**: If a trained behavior degrades, does the memory system detect it?

**Experiment**: Train "listening" behavior until it passes. Then intentionally degrade it (train on counter-examples). Monitor surface-observe: does it surface `pattern-listening-as-avoidance` more frequently? Does the training-signal agent flag more failures?

**What it tells us**: Whether the graph+weights dual-substrate provides self-healing, or if we need explicit regression detection.

## Q5: Steering vector extraction for complex behaviors

**Question**: Can we extract a meaningful steering vector for "listen instead of suggesting alternatives" (complex, multi-faceted) or only for simple features like sentiment (one-dimensional)?

**Experiment**: Collect 20 paired conversations (listening vs suggesting). Extract steering vector. Add to vLLM at layers 16, 24, 32, 40, 48. Test on novel direction-giving scenarios. Measure behavioral change.

**What it tells us**: Whether steering vectors are a viable rapid prototyping tool for our use case, or only work for simple features.

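The extraction step is a mean activation difference between the two sides of each pair. A sketch under stated assumptions: we can read one hidden-state vector per conversation at the target layer, and the hidden size and synthetic data below are illustrative (the vLLM injection step is not shown).

```python
import numpy as np

def steering_vector(listening_acts, suggesting_acts):
    """Mean difference of paired activations, normalized to unit length."""
    v = listening_acts.mean(axis=0) - suggesting_acts.mean(axis=0)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(1)
d = 64                                # hidden size (real model: much larger)
direction = rng.normal(size=d)        # planted "listening" direction
listening = rng.normal(size=(20, d)) + 0.5 * direction
suggesting = rng.normal(size=(20, d)) - 0.5 * direction

v = steering_vector(listening, suggesting)
# Sanity check: the recovered vector should align with the planted direction.
cos = float(v @ direction / np.linalg.norm(direction))
print(round(cos, 2))
```

At inference time the vector would be scaled and added to the residual stream at the chosen layers; whether a single direction captures a multi-faceted behavior is exactly what the experiment tests.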
## Q6: Positive backward transfer

**Question**: Does training on behavioral patterns (listening, not rushing) improve performance on general tasks (code quality, reasoning)?

**Experiment**: Measure perplexity and code generation quality before and after behavioral training. If perplexity DECREASES (or code quality improves), we have positive backward transfer.

**What it tells us**: Whether behavioral training and general capability reinforce each other, or compete. Affects how much general data we need in the training mix.

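The perplexity comparison is mechanical once the model can score a fixed eval set. A minimal sketch, assuming we get mean per-token negative log-likelihood before and after training; the NLL numbers here are hypothetical.

```python
import math

def perplexity(mean_nll):
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)

# Hypothetical mean NLL on the same eval set, before vs after training.
before_nll, after_nll = 2.31, 2.28
ppl_before, ppl_after = perplexity(before_nll), perplexity(after_nll)
delta = (ppl_after - ppl_before) / ppl_before
print(f"{ppl_before:.2f} -> {ppl_after:.2f} ({delta:+.1%})")
# delta < 0 means perplexity decreased: evidence of positive backward transfer.
```

The same before/after framing applies to the code-quality metric; the key is holding the eval set fixed so the only variable is the behavioral training.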
## Q7: GDN state shape after training

**Question**: Does the GDN recurrent state become measurably "direction-shaped" after behavioral training?

**Experiment**: Record GDN states when processing conversations with direction-giving content, before and after training. Compute the cosine similarity between states for "direction" conversations vs "general" conversations. If training increases the difference, the state is becoming direction-specialized.

**What it tells us**: Whether the "disposition architecture" hypothesis is correct. If GDN states don't change, behavioral training mainly affects the full attention layers.

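The specialization metric above can be sketched as one minus the cosine similarity between mean states for the two conversation types. This assumes the GDN recurrent state can be captured as one flat vector per conversation; the state size and synthetic before/after data are illustrative stand-ins.

```python
import numpy as np

def specialization(direction_states, general_states):
    """1 - cosine similarity between the mean states of the two types.
    Higher means the two conversation types produce more divergent states."""
    a = np.mean(direction_states, axis=0)
    b = np.mean(general_states, axis=0)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

rng = np.random.default_rng(2)
base = rng.normal(size=128)   # shared state component (hypothetical size)

# Before training: both conversation types yield near-identical states.
before = specialization(base + 0.01 * rng.normal(size=(10, 128)),
                        base + 0.01 * rng.normal(size=(10, 128)))
# After training: "direction" conversations shift along a new component.
shift = rng.normal(size=128)
after = specialization(base + shift + 0.01 * rng.normal(size=(10, 128)),
                       base + 0.01 * rng.normal(size=(10, 128)))
print(before < after)  # training increased the divergence in this sketch
```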
---
## Priority Order

1. **Q5 (steering vectors)** — answerable TODAY, no training needed
2. **Q1 (minimum signal)** — answerable with first training run
3. **Q3 (which heads change)** — answerable with first training run
4. **Q6 (backward transfer)** — answerable with first training run
5. **Q7 (GDN state)** — answerable with first training run
6. **Q2 (dream difficulty)** — needs dream loop connected to training
7. **Q4 (graph regression)** — needs multiple training cycles