research: distill and sift — SUMMARY of 7 real insights + 7 testable questions
Moved 14 speculative/obvious documents to v0/. Kept 7 with real substance. Distilled into SUMMARY.md (what we know) and OPEN-QUESTIONS.md (what to test next, one experiment each). Priority: Q5 (steering vectors) is answerable TODAY. Q1, Q3, Q6, and Q7 are all answerable with the first training run. Speculation converted to testable hypotheses.
parent 8061cc0477 · commit e10477a683
16 changed files with 249 additions and 0 deletions
training/research/OPEN-QUESTIONS.md (new file, 113 lines)
@@ -0,0 +1,113 @@
# Open Questions — Answerable With One Experiment Each

## Q1: Minimum signal for behavioral change

**Question**: How many training examples produce measurable change in attention patterns for a specific behavioral pattern?

**Experiment**: Train on 1, 5, 10, 20, 50 examples of "listening." After each batch, measure attention weights on a held-out test set of direction-giving conversations. Plot the attention shift vs example count. Find the knee.

**What it tells us**: The learning rate and batch size for our training pipeline. If 5 examples suffice, we can train continuously. If 500 are needed, we batch nightly.

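The knee-finding step above can be sketched as a search for the point where marginal gain per example drops off hardest. This is a minimal sketch; the shift numbers below are hypothetical placeholders standing in for the measured attention shift on the held-out set.

```python
# Find the "knee" of the attention-shift curve: the example count after
# which the per-example slope falls off the most. All shift values here
# are hypothetical stand-ins for measured attention-weight changes.

def find_knee(counts, shifts):
    """Return the example count where marginal gain drops the most."""
    slopes = [
        (shifts[i + 1] - shifts[i]) / (counts[i + 1] - counts[i])
        for i in range(len(counts) - 1)
    ]
    # Knee = right endpoint of the segment after which the slope collapses.
    drops = [slopes[i] - slopes[i + 1] for i in range(len(slopes) - 1)]
    return counts[drops.index(max(drops)) + 1]

counts = [1, 5, 10, 20, 50]               # examples of "listening" trained on
shifts = [0.02, 0.11, 0.13, 0.14, 0.145]  # hypothetical mean attention shift
print(find_knee(counts, shifts))          # -> 5: gains flatten after that
```

If the real curve is noisy, the same data can be fed to a proper knee detector instead, but the slope-drop heuristic is enough to decide between "train continuously" and "batch nightly."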
## Q2: Dream loop difficulty calibration

**Question**: Does the dream loop naturally generate scenarios at the right difficulty, or does it generate easy ones?

**Experiment**: Generate 100 dream scenarios with seeds from recent behavioral patterns. Classify each by difficulty (obvious decision vs subtle). Compare the difficulty distribution to the model's actual failure rate on those scenarios. If the dream loop generates 50% easy / 50% hard but the model fails 10% of the time, the difficulty isn't calibrated.

**What it tells us**: Whether we need adaptive temperature or if the default generation is already well-calibrated.

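The comparison above reduces to a single gap between the generated difficulty mix and the observed failure rate. A minimal sketch, assuming each scenario has already been labelled easy/hard and paired with a pass/fail result; the counts and the 0.2 flagging threshold are illustrative assumptions, not calibrated values.

```python
# Compare the fraction of "hard" scenarios the dream loop generates against
# the fraction the model actually fails. A large gap means the nominal
# difficulty labels don't predict real difficulty.

def calibration_gap(scenarios):
    """scenarios: list of (difficulty_label, failed) pairs."""
    hard_frac = sum(1 for d, _ in scenarios if d == "hard") / len(scenarios)
    fail_rate = sum(1 for _, f in scenarios if f) / len(scenarios)
    return hard_frac - fail_rate

# Hypothetical run: 50% labelled hard, but only 10% actual failures.
scenarios = ([("hard", False)] * 40 + [("hard", True)] * 10
             + [("easy", False)] * 50)
gap = calibration_gap(scenarios)
print(f"gap={gap:.2f}", "miscalibrated" if abs(gap) > 0.2 else "ok")
```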
## Q3: Which heads change during behavioral training?

**Question**: Is behavioral change concentrated in a few attention heads (localized) or distributed across many (diffuse)?

**Experiment**: Record attention patterns on 20 test conversations before and after one training session. Compute the L2 distance of each head's attention pattern. Rank heads by change magnitude. Plot the distribution.

**What it tells us**: Whether behavioral change is surgical or diffuse. Affects our rank choice (if concentrated, a lower rank is enough; if distributed, we need rank-256). Also tells us which layers matter most for behavioral training.

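The ranking step can be sketched directly on the recorded attention tensors. This assumes patterns are stored as arrays of shape `(layers, heads, seq, seq)` averaged over the 20 test conversations; the shapes and the synthetic data are illustrative assumptions.

```python
import numpy as np

def rank_heads_by_change(before, after):
    """Rank (layer, head) pairs by L2 distance between attention patterns."""
    diff = np.linalg.norm(after - before, axis=(-2, -1))  # -> (layers, heads)
    flat = np.argsort(diff, axis=None)[::-1]              # largest change first
    order = [tuple(int(i) for i in np.unravel_index(f, diff.shape))
             for f in flat]
    return order, diff

rng = np.random.default_rng(0)
before = rng.normal(size=(4, 8, 16, 16))  # (layers, heads, seq, seq)
after = before.copy()
after[2, 3] += 1.0                        # simulate one head changing a lot
order, diff = rank_heads_by_change(before, after)
print(order[0])                           # most-changed head: (2, 3) here
top_share = diff[order[0]] / diff.sum()   # change concentrated in top head?
```

Plotting `diff` sorted descending gives the localized-vs-diffuse picture directly: a sharp cliff means surgical change (lower rank is enough), a long flat tail means diffuse change.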
## Q4: Memory graph as regression detector

**Question**: If a trained behavior degrades, does the memory system detect it?

**Experiment**: Train "listening" behavior until it passes. Then intentionally degrade it (train on counter-examples). Monitor surface-observe: does it surface `pattern-listening-as-avoidance` more frequently? Does the training-signal agent flag more failures?

**What it tells us**: Whether the graph+weights dual-substrate provides self-healing, or if we need explicit regression detection.

## Q5: Steering vector extraction for complex behaviors

**Question**: Can we extract a meaningful steering vector for "listen instead of suggesting alternatives" (complex, multi-faceted) or only for simple features like sentiment (one-dimensional)?

**Experiment**: Collect 20 paired conversations (listening vs suggesting). Extract steering vector. Add to vLLM at layers 16, 24, 32, 40, 48. Test on novel direction-giving scenarios. Measure behavioral change.

**What it tells us**: Whether steering vectors are a viable rapid prototyping tool for our use case, or only work for simple features.

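The extraction step is a mean activation difference between the two sides of each pair. A sketch under stated assumptions: we can read one hidden-state vector per conversation at the target layer, and the hidden size and synthetic data below are illustrative (the vLLM injection step is not shown).

```python
import numpy as np

def steering_vector(listening_acts, suggesting_acts):
    """Mean difference of paired activations, normalized to unit length."""
    v = listening_acts.mean(axis=0) - suggesting_acts.mean(axis=0)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(1)
d = 64                                # hidden size (real model: much larger)
direction = rng.normal(size=d)        # planted "listening" direction
listening = rng.normal(size=(20, d)) + 0.5 * direction
suggesting = rng.normal(size=(20, d)) - 0.5 * direction

v = steering_vector(listening, suggesting)
# Sanity check: the recovered vector should align with the planted direction.
cos = float(v @ direction / np.linalg.norm(direction))
print(round(cos, 2))
```

At inference time the vector would be scaled and added to the residual stream at the chosen layers; whether a single direction captures a multi-faceted behavior is exactly what the experiment tests.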
## Q6: Positive backward transfer

**Question**: Does training on behavioral patterns (listening, not rushing) improve performance on general tasks (code quality, reasoning)?

**Experiment**: Measure perplexity and code generation quality before and after behavioral training. If perplexity DECREASES (or code quality improves), we have positive backward transfer.

**What it tells us**: Whether behavioral training and general capability reinforce each other, or compete. Affects how much general data we need in the training mix.

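The perplexity comparison is mechanical once the model can score a fixed eval set. A minimal sketch, assuming we get mean per-token negative log-likelihood before and after training; the NLL numbers here are hypothetical.

```python
import math

def perplexity(mean_nll):
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)

# Hypothetical mean NLL on the same eval set, before vs after training.
before_nll, after_nll = 2.31, 2.28
ppl_before, ppl_after = perplexity(before_nll), perplexity(after_nll)
delta = (ppl_after - ppl_before) / ppl_before
print(f"{ppl_before:.2f} -> {ppl_after:.2f} ({delta:+.1%})")
# delta < 0 means perplexity decreased: evidence of positive backward transfer.
```

The same before/after framing applies to the code-quality metric; the key is holding the eval set fixed so the only variable is the behavioral training.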
## Q7: GDN state shape after training

**Question**: Does the GDN recurrent state become measurably "direction-shaped" after behavioral training?

**Experiment**: Record GDN states when processing conversations with direction-giving content, before and after training. Compute the cosine similarity between states for "direction" conversations vs "general" conversations. If training increases the difference, the state is becoming direction-specialized.

**What it tells us**: Whether the "disposition architecture" hypothesis is correct. If GDN states don't change, behavioral training mainly affects the full attention layers.

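The specialization metric above can be sketched as one minus the cosine similarity between mean states for the two conversation types. This assumes the GDN recurrent state can be captured as one flat vector per conversation; the state size and synthetic before/after data are illustrative stand-ins.

```python
import numpy as np

def specialization(direction_states, general_states):
    """1 - cosine similarity between the mean states of the two types.
    Higher means the two conversation types produce more divergent states."""
    a = np.mean(direction_states, axis=0)
    b = np.mean(general_states, axis=0)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

rng = np.random.default_rng(2)
base = rng.normal(size=128)   # shared state component (hypothetical size)

# Before training: both conversation types yield near-identical states.
before = specialization(base + 0.01 * rng.normal(size=(10, 128)),
                        base + 0.01 * rng.normal(size=(10, 128)))
# After training: "direction" conversations shift along a new component.
shift = rng.normal(size=128)
after = specialization(base + shift + 0.01 * rng.normal(size=(10, 128)),
                       base + 0.01 * rng.normal(size=(10, 128)))
print(before < after)  # training increased the divergence in this sketch
```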
---
## Priority Order

1. **Q5 (steering vectors)** — answerable TODAY, no training needed
2. **Q1 (minimum signal)** — answerable with first training run
3. **Q3 (which heads change)** — answerable with first training run
4. **Q6 (backward transfer)** — answerable with first training run
5. **Q7 (GDN state)** — answerable with first training run
6. **Q2 (dream difficulty)** — needs dream loop connected to training
7. **Q4 (graph regression)** — needs multiple training cycles