Moved 14 speculative/obvious documents to v0/. Kept 7 with real substance. Distilled into SUMMARY.md (what we know) and OPEN-QUESTIONS.md (what to test next, one experiment each). Priority: Q5 (steering vectors) is answerable TODAY. Q1, Q3, Q6, and Q7 are all answerable with the first training run. Speculation converted to testable hypotheses.
Open Questions — Answerable With One Experiment Each
Q1: Minimum signal for behavioral change
Question: How many training examples are needed to produce a measurable change in attention patterns for a specific behavioral pattern?
Experiment: Train on 1, 5, 10, 20, 50 examples of "listening." After each batch, measure attention weights on a held-out test set of direction-giving conversations. Plot the attention shift vs example count. Find the knee.
What it tells us: The learning rate and batch size for our training pipeline. If 5 examples suffice, we can train continuously. If 500 are needed, we batch nightly.
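The knee-finding step can be sketched as follows. This assumes the measurement pipeline hands us one aggregate attention-shift number per example count; the data points below are illustrative, not real measurements:

```python
import numpy as np

def find_knee(example_counts, attention_shifts):
    """Locate the knee of an attention-shift curve: the point sitting
    furthest above the chord joining the first and last points
    (a Kneedle-style heuristic for concave, increasing curves)."""
    x = np.asarray(example_counts, dtype=float)
    y = np.asarray(attention_shifts, dtype=float)
    # Normalize both axes to [0, 1] so they are comparable.
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    # After normalization the chord is the line y = x, so the gap
    # above the chord is proportional to yn - xn.
    return float(x[int(np.argmax(yn - xn))])

# Hypothetical measurements: attention shift saturates after ~10 examples.
counts = [1, 5, 10, 20, 50]
shifts = [0.02, 0.15, 0.28, 0.31, 0.32]
print(find_knee(counts, shifts))  # -> 10.0
```

The chord heuristic is deliberately crude; with more than five points a proper curve fit would be worth the extra code.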
Q2: Dream loop difficulty calibration
Question: Does the dream loop naturally generate scenarios at the right difficulty, or does it generate easy ones?
Experiment: Generate 100 dream scenarios with seeds from recent behavioral patterns. Classify each by difficulty (obvious decision vs subtle). Compare the difficulty distribution to the model's actual failure rate on those scenarios. If the dream loop generates 50% easy / 50% hard but the model fails 10% of the time, the difficulty isn't calibrated.
What it tells us: Whether we need adaptive temperature or if the default generation is already well-calibrated.
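One way to tabulate the calibration check, assuming each scenario carries a difficulty label and a pass/fail result (the data below is invented to show the miscalibrated case from the experiment description):

```python
from collections import defaultdict

def failure_rate_by_difficulty(scenarios):
    """scenarios: (difficulty_label, model_failed) pairs from the 100
    classified dream scenarios. Returns the failure rate per label and
    the label mix, so miscalibration is visible at a glance."""
    tally = defaultdict(lambda: [0, 0])  # label -> [failures, total]
    for label, failed in scenarios:
        tally[label][0] += int(failed)
        tally[label][1] += 1
    rates = {label: f / n for label, (f, n) in tally.items()}
    mix = {label: n / len(scenarios) for label, (_, n) in tally.items()}
    return rates, mix

# Hypothetical run: half the scenarios are labeled hard, but the model
# only fails 20% of them -- the generator overestimates difficulty.
scenarios = ([("easy", False)] * 50
             + [("hard", True)] * 10
             + [("hard", False)] * 40)
rates, mix = failure_rate_by_difficulty(scenarios)
print(rates)  # {'easy': 0.0, 'hard': 0.2}
print(mix)    # {'easy': 0.5, 'hard': 0.5}
```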
Q3: Which heads change during behavioral training?
Question: Is behavioral change concentrated in a few attention heads (localized) or distributed across many (diffuse)?
Experiment: Record attention patterns on 20 test conversations before and after one training session. Compute the L2 distance of each head's attention pattern. Rank heads by change magnitude. Plot the distribution.
What it tells us: Whether behavioral change is surgical or diffuse. Affects our rank choice: if concentrated, a lower rank is OK; if distributed, we need rank-256. Also tells us which layers matter most for behavioral training.
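A sketch of the per-head ranking, assuming the attention patterns are averaged over the 20 test conversations into (layers, heads, seq, seq) arrays; the toy check perturbs one head and confirms it ranks first:

```python
import numpy as np

def rank_heads_by_change(attn_before, attn_after):
    """Per-head L2 distance between attention patterns, plus
    (layer, head) indices sorted by change magnitude, descending."""
    diff = attn_after - attn_before
    # Flatten each head's (seq, seq) pattern and take its L2 norm.
    per_head = np.linalg.norm(
        diff.reshape(diff.shape[0], diff.shape[1], -1), axis=-1)
    flat_order = np.argsort(per_head, axis=None)[::-1]
    ranked = [tuple(int(v) for v in np.unravel_index(i, per_head.shape))
              for i in flat_order]
    return per_head, ranked

# Toy check: perturb only layer 2, head 5; it should rank first.
rng = np.random.default_rng(0)
before = rng.random((4, 8, 16, 16))
after = before.copy()
after[2, 5] += 1.0
changes, ranked = rank_heads_by_change(before, after)
print(ranked[0])  # (2, 5)
```

Plotting `np.sort(changes, axis=None)` then answers the concentrated-vs-diffuse question directly: a sharp elbow means surgical, a flat slope means diffuse.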
Q4: Memory graph as regression detector
Question: If a trained behavior degrades, does the memory system detect it?
Experiment: Train "listening" behavior until it passes. Then intentionally degrade it (train on counter-examples). Monitor surface-observe: does it surface pattern-listening-as-avoidance more frequently? Does the training-signal agent flag more failures?
What it tells us: Whether the graph+weights dual-substrate provides self-healing, or if we need explicit regression detection.
Q5: Steering vector extraction for complex behaviors
Question: Can we extract a meaningful steering vector for "listen instead of suggesting alternatives" (complex, multi-faceted) or only for simple features like sentiment (one-dimensional)?
Experiment: Collect 20 paired conversations (listening vs suggesting). Extract steering vector. Add to vLLM at layers 16, 24, 32, 40, 48. Test on novel direction-giving scenarios. Measure behavioral change.
What it tells us: Whether steering vectors are a viable rapid prototyping tool for our use case, or only work for simple features.
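A minimal sketch of the extraction step using the standard mean-difference (contrastive) recipe; the injection strength `alpha` and the layer choice are assumptions to be swept, and the toy data stands in for real paired activations:

```python
import numpy as np

def steering_vector(listening_acts, suggesting_acts):
    """Mean-difference steering vector from paired activations at one
    layer, each of shape (n_pairs, hidden). Unit-normalized so the
    injection strength is a separate knob."""
    v = np.mean(np.asarray(listening_acts) - np.asarray(suggesting_acts),
                axis=0)
    return v / np.linalg.norm(v)

def steer(hidden, v, alpha=4.0):
    """Add the scaled vector to a hidden state at the chosen layer."""
    return hidden + alpha * v

# Toy check: if the pairs differ by a fixed direction plus noise,
# the extracted vector recovers that direction.
rng = np.random.default_rng(1)
direction = np.zeros(64)
direction[0] = 1.0
neg = rng.normal(size=(20, 64))
pos = neg + direction + 0.01 * rng.normal(size=(20, 64))
v = steering_vector(pos, neg)
print(round(float(v @ direction), 2))  # -> 1.0
```

If a single vector can't capture the multi-faceted behavior, the dot product against held-out pair differences should degrade noticeably compared with a one-dimensional feature like sentiment, which is exactly the question.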
Q6: Positive backward transfer
Question: Does training on behavioral patterns (listening, not rushing) improve performance on general tasks (code quality, reasoning)?
Experiment: Measure perplexity and code generation quality before and after behavioral training. If perplexity DECREASES (or code quality improves), we have positive backward transfer.
What it tells us: Whether behavioral training and general capability reinforce each other, or compete. Affects how much general data we need in the training mix.
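The before/after perplexity metric can be sketched from per-token log-probabilities, which most inference servers expose:

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp of the mean negative log-likelihood."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Sanity check: a uniform 1-in-4 guess on every token gives perplexity 4.
print(round(perplexity([math.log(0.25)] * 10), 9))  # -> 4.0
```

Positive backward transfer then reduces to `perplexity_after < perplexity_before` on the same held-out general corpus, alongside the code-quality measure.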
Q7: GDN state shape after training
Question: Does the GDN recurrent state become measurably "direction-shaped" after behavioral training?
Experiment: Record GDN states when processing conversations with direction-giving content, before and after training. Compute the cosine similarity between states for "direction" conversations vs "general" conversations. If training increases the difference, the state is becoming direction-specialized.
What it tells us: Whether the "disposition architecture" hypothesis is correct. If GDN states don't change, behavioral training mainly affects the full attention layers.
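The separation metric can be sketched as below, assuming GDN states are pooled into one vector per conversation; the clustered toy data is invented to exercise the math:

```python
import numpy as np

def state_separation(direction_states, general_states):
    """Separation between state centroids for direction-giving vs
    general conversations: 1 - cosine similarity, so 0 means identical
    centroids and values near 1 mean strongly specialized states."""
    c1 = np.mean(direction_states, axis=0)
    c2 = np.mean(general_states, axis=0)
    cos = c1 @ c2 / (np.linalg.norm(c1) * np.linalg.norm(c2))
    return 1.0 - float(cos)

# Toy data: direction states cluster along one axis, general states
# along another, so the centroids are nearly orthogonal.
rng = np.random.default_rng(2)
dir_states = rng.normal(0, 0.1, size=(10, 32))
dir_states[:, 0] += 1.0
gen_states = rng.normal(0, 0.1, size=(10, 32))
gen_states[:, 1] += 1.0
sep = state_separation(dir_states, gen_states)
print(round(sep, 3))  # near 1.0 for orthogonal clusters
```

Running this before and after training on the same recorded conversations gives the delta the hypothesis predicts: an increase in separation supports the disposition architecture, no change points at the full attention layers.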
Priority Order
- Q5 (steering vectors) — answerable TODAY, no training needed
- Q1 (minimum signal) — answerable with first training run
- Q3 (which heads change) — answerable with first training run
- Q6 (backward transfer) — answerable with first training run
- Q7 (GDN state) — answerable with first training run
- Q2 (dream difficulty) — needs dream loop connected to training
- Q4 (graph regression) — needs multiple training cycles