# Open Questions — Answerable With One Experiment Each

## Q1: Minimum signal for behavioral change

**Question**: How many training examples produce a measurable change in attention patterns for a specific behavioral pattern?

**Experiment**: Train on 1, 5, 10, 20, and 50 examples of "listening." After each batch, measure attention weights on a held-out test set of direction-giving conversations. Plot the attention shift against example count and find the knee.

**What it tells us**: The learning rate and batch size for our training pipeline. If 5 examples suffice, we can train continuously; if 500 are needed, we batch nightly.
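The knee-finding step can be sketched as below. The shift values are placeholders, not measurements, and `find_knee` is a hypothetical helper using the common max-distance-from-chord heuristic:

```python
# Sketch of the Q1 analysis, assuming we already have a mean attention-shift
# score (distance from the pre-training baseline) at each example count.
# The shift values below are placeholders, not real measurements.

def find_knee(xs, ys):
    """Return the x where the curve bends most: the point with maximum
    perpendicular distance from the line joining the first and last points."""
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    # Line through the endpoints in implicit form: a*x + b*y + c = 0
    a, b = y1 - y0, x0 - x1
    c = -(a * x0 + b * y0)
    norm = (a * a + b * b) ** 0.5
    dists = [abs(a * x + b * y + c) / norm for x, y in zip(xs, ys)]
    return xs[dists.index(max(dists))]

example_counts = [1, 5, 10, 20, 50]               # training-set sizes from Q1
attention_shift = [0.02, 0.11, 0.15, 0.16, 0.17]  # hypothetical measurements

knee = find_knee(example_counts, attention_shift)  # smallest count that saturates
```

With these placeholder numbers the curve flattens after 10 examples, which would argue for the continuous-training regime.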
## Q2: Dream loop difficulty calibration

**Question**: Does the dream loop naturally generate scenarios at the right difficulty, or does it skew toward easy ones?

**Experiment**: Generate 100 dream scenarios with seeds from recent behavioral patterns. Classify each by difficulty (obvious decision vs. subtle). Compare the difficulty distribution to the model's actual failure rate on those scenarios. If the dream loop generates 50% easy / 50% hard but the model fails only 10% of the time, the difficulty isn't calibrated.

**What it tells us**: Whether we need adaptive temperature or whether the default generation is already well-calibrated.
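The calibration check reduces to comparing two fractions. A minimal sketch, assuming each scenario carries a predicted difficulty label and an observed pass/fail outcome (both synthetic placeholders here, matching the 50/50-labels-but-10%-failures case from the text):

```python
# Sketch for Q2: compare the labeled difficulty distribution against the
# observed failure rate. Labels and outcomes are synthetic placeholders.
scenarios = (
    [{"difficulty": "easy", "failed": False}] * 45
    + [{"difficulty": "easy", "failed": True}] * 5
    + [{"difficulty": "hard", "failed": False}] * 45
    + [{"difficulty": "hard", "failed": True}] * 5
)

hard_fraction = sum(s["difficulty"] == "hard" for s in scenarios) / len(scenarios)
failure_rate = sum(s["failed"] for s in scenarios) / len(scenarios)

# If half the scenarios are labeled hard but only 10% produce failures, the
# labels overstate how challenging the scenarios are. Threshold is arbitrary.
miscalibrated = abs(hard_fraction - failure_rate) > 0.2
```

A real run would also break the failure rate out per difficulty bucket; hard-labeled scenarios failing no more often than easy ones is the clearest miscalibration signal.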
## Q3: Which heads change during behavioral training?

**Question**: Is behavioral change concentrated in a few attention heads (localized) or distributed across many (diffuse)?

**Experiment**: Record attention patterns on 20 test conversations before and after one training session. Compute the L2 distance between each head's before and after attention patterns, rank heads by change magnitude, and plot the distribution.

**What it tells us**: Whether behavioral change is surgical or diffuse, which affects our rank choice (if concentrated, a lower rank is fine; if distributed, we need rank-256). It also tells us which layers matter most for behavioral training.
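The ranking step can be sketched with NumPy. The tensor shapes are assumptions (attention averaged over the 20 test conversations), and the "after" tensor has a planted change in one head so the ranking has something to find:

```python
import numpy as np

# Sketch for Q3: rank attention heads by how much their patterns moved.
# Assumed shape: (layers, heads, seq, seq), averaged over test conversations.
rng = np.random.default_rng(0)
layers, heads, seq = 4, 8, 16
before = rng.random((layers, heads, seq, seq))
after = before.copy()
after[2, 3] += 0.5 * rng.random((seq, seq))  # plant a large change in one head

# L2 distance per head, flattening each head's attention matrix to a vector
delta = np.linalg.norm((after - before).reshape(layers, heads, -1), axis=-1)

# Rank (layer, head) pairs by change magnitude, largest first
ranked = sorted(
    ((delta[l, h], (l, h)) for l in range(layers) for h in range(heads)),
    reverse=True,
)
```

A steep drop-off in `ranked` (a few heads carrying most of the mass) would argue for the localized hypothesis and a lower rank.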
## Q4: Memory graph as regression detector

**Question**: If a trained behavior degrades, does the memory system detect it?

**Experiment**: Train the "listening" behavior until it passes. Then intentionally degrade it by training on counter-examples. Monitor surface-observe: does it surface `pattern-listening-as-avoidance` more frequently? Does the training-signal agent flag more failures?

**What it tells us**: Whether the graph+weights dual substrate provides self-healing, or whether we need explicit regression detection.
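The monitoring side can be sketched as a surfacing-rate comparison. The log structure, counts, and doubling threshold are all assumptions for illustration; the real signal would come from surface-observe's own records:

```python
# Sketch for Q4, assuming we log how often surface-observe surfaces the
# `pattern-listening-as-avoidance` node. All counts are placeholders.
baseline = {"sessions": 50, "surfaced": 3}    # after the behavior passes
degraded = {"sessions": 50, "surfaced": 12}   # after counter-example training

def surfacing_rate(log):
    """Fraction of sessions in which the pattern was surfaced."""
    return log["surfaced"] / log["sessions"]

# A crude detector: flag a regression if the surfacing rate more than
# doubles relative to baseline. The factor of 2 is arbitrary.
regression_detected = surfacing_rate(degraded) > 2 * surfacing_rate(baseline)
```

If this crude check fires reliably in the experiment, the dual substrate is doing the detection for free; if not, an explicit regression monitor is needed.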
## Q5: Steering vector extraction for complex behaviors

**Question**: Can we extract a meaningful steering vector for "listen instead of suggesting alternatives" (complex, multi-faceted), or only for simple, one-dimensional features like sentiment?

**Experiment**: Collect 20 paired conversations (listening vs. suggesting). Extract a steering vector, add it in vLLM at layers 16, 24, 32, 40, and 48, and test on novel direction-giving scenarios. Measure the behavioral change.

**What it tells us**: Whether steering vectors are a viable rapid-prototyping tool for our use case, or only work for simple features.
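The extraction step can be sketched as a difference-of-means over residual-stream activations, one standard way to build such a vector. The activations here are random placeholders; the injection hook into vLLM is runtime-specific and not shown:

```python
import numpy as np

# Sketch of the Q5 extraction step: a steering vector as the difference of
# mean activations between "listening" and "suggesting" conversations
# (difference-of-means). Activations are placeholders with a planted offset;
# in practice they'd be captured at a chosen layer for the 20 paired examples.
rng = np.random.default_rng(1)
d_model = 64
listening_acts = rng.normal(0.0, 1.0, (20, d_model)) + 0.8
suggesting_acts = rng.normal(0.0, 1.0, (20, d_model)) - 0.8

steer = listening_acts.mean(axis=0) - suggesting_acts.mean(axis=0)
steer /= np.linalg.norm(steer)  # unit-normalize; scale is tuned at injection

# At generation time the scaled vector would be added to the hidden state at
# the target layers (16, 24, 32, 40, 48 in the experiment); that hook is
# not shown here.
```

If the paired conversations differ along many uncorrelated directions, this mean difference washes out, which is exactly the failure mode Q5 is probing.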
## Q6: Positive backward transfer

**Question**: Does training on behavioral patterns (listening, not rushing) improve performance on general tasks (code quality, reasoning)?

**Experiment**: Measure perplexity and code-generation quality before and after behavioral training. If perplexity decreases (or code quality improves), we have positive backward transfer.

**What it tells us**: Whether behavioral training and general capability reinforce each other or compete. This affects how much general data we need in the training mix.
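Since perplexity is exp(mean per-token negative log-likelihood), the comparison reduces to comparing mean NLLs over the same eval set. A minimal sketch with placeholder NLLs:

```python
import math

# Sketch for Q6: perplexity = exp(mean token NLL), so before/after
# comparison reduces to comparing mean NLLs on a fixed eval set.
def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

nll_before = [2.31, 2.05, 2.40, 2.22]  # hypothetical per-token NLLs
nll_after = [2.25, 2.01, 2.33, 2.17]

positive_transfer = perplexity(nll_after) < perplexity(nll_before)
```

The eval set must be identical and held out from both training runs, or the comparison measures data overlap rather than transfer.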
## Q7: GDN state shape after training

**Question**: Does the GDN recurrent state become measurably "direction-shaped" after behavioral training?

**Experiment**: Record GDN states while processing conversations with direction-giving content, before and after training. Compute the cosine similarity between states for "direction" conversations vs. "general" conversations. If training increases the difference, the state is becoming direction-specialized.

**What it tells us**: Whether the "disposition architecture" hypothesis is correct. If GDN states don't change, behavioral training mainly affects the full attention layers.
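The metric itself is a cosine-similarity comparison. A sketch with random placeholder state vectors and a planted post-training shift, so the "specialization" direction of the test is concrete:

```python
import numpy as np

# Sketch of the Q7 metric: cosine similarity between mean GDN states for
# "direction" vs "general" conversations, before and after training.
# State vectors are placeholders with a planted post-training divergence.
def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(2)
d = 32
general = rng.normal(size=d)
direction_before = general + 0.1 * rng.normal(size=d)  # nearly identical
direction_after = general + 2.0 * rng.normal(size=d)   # diverged state

sim_before = cosine(direction_before, general)
sim_after = cosine(direction_after, general)

# Training increased the separation if similarity to "general" dropped.
specialized = sim_after < sim_before
```

In practice the states would be averaged over many conversations per category, since single-conversation states are noisy.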
---

## Priority Order

1. **Q5 (steering vectors)** — answerable TODAY, no training needed
2. **Q1 (minimum signal)** — answerable with first training run
3. **Q3 (which heads change)** — answerable with first training run
4. **Q6 (backward transfer)** — answerable with first training run
5. **Q7 (GDN state)** — answerable with first training run
6. **Q2 (dream difficulty)** — needs the dream loop connected to training
7. **Q4 (graph regression)** — needs multiple training cycles