research: distill and sift — SUMMARY of 7 real insights + 7 testable questions
Moved 14 speculative/obvious documents to v0/. Kept 7 with real substance. Distilled into SUMMARY.md (what we know) and OPEN-QUESTIONS.md (what to test next, one experiment each). Priority: Q5 (steering vectors) is answerable TODAY. Q1-Q3-Q6-Q7 are all answerable with the first training run. Speculation converted to testable hypotheses.
# How Quickly Can Behavioral Change Manifest?

## The ICL-to-Fine-Tuning Bridge

In-context learning (ICL) works by compressing examples into a "task vector" that modulates the transformer's behavior (Todd et al., 2023). The model changes its behavior based on 3-5 examples in the prompt.

Fine-tuning does the same thing, but permanently: the task vector is encoded into the weights rather than held in the context window.

If ICL can change behavior with 3-5 examples, can fine-tuning do the same with 3-5 gradient steps?

## The Evidence: Yes, Sometimes Shockingly Fast

### Few-shot fine-tuning results from practice

The LLaMA-Factory Apollo config uses `max_samples: 1000` with 3 epochs, i.e. 3000 example passes (the number of gradient steps depends on batch size). But the loss typically converges much earlier.
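As a sanity check on that arithmetic, the step count follows from samples, epochs, and batch size. A minimal sketch (the batch size of 8 is an assumed placeholder, not taken from the config):

```python
# Gradient steps for a fine-tuning run: total example passes divided
# by batch size. samples and epochs match the Apollo config above;
# batch_size=8 is an assumed placeholder, not taken from the config.
def gradient_steps(samples: int, epochs: int, batch_size: int) -> int:
    return (samples * epochs) // batch_size

steps = gradient_steps(samples=1000, epochs=3, batch_size=8)  # 375 steps
```

Only with a batch size of 1 would the run take the full 3000 gradient steps.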

Anecdotal evidence from the community suggests:

- **Style transfer**: 50-100 examples, 1-2 epochs → noticeable change
- **Instruction following**: 500-1000 examples, 1 epoch → reliable change
- **Persona adoption**: 100-200 examples of the target personality → consistent behavioral shift

For SIMPLE behavioral patterns (not complex reasoning), the change can appear within 10-50 gradient steps if the examples are high-quality and the learning rate is high enough (1e-4).

### The "one-shot" question

Kent asked: "is it possible to get to a point where a single iteration causes real behavioural change?"

For a factual change (ROME-style): yes, literally one rank-one edit. For a behavioral pattern: probably not from a single example, but possibly from a single BATCH of diverse examples.

Consider: if one batch contains 20 examples of the same behavioral pattern (listening, from different contexts), each contributing gradient in the same direction (attend to direction, not alternatives), the accumulated gradient from one batch might be sufficient for a measurable change in the attention pattern.

At lr=1e-4 with 20 examples per batch, and assuming per-example gradients of order 1 that sum (rather than average) across the batch, the total weight change is:

```
Δw ≈ lr × batch_size × avg_grad ≈ 1e-4 × 20 × O(1) = 2e-3
```

Relative to a typical weight magnitude (~0.01), that's a 20% change. That's not subtle; that's a significant perturbation.
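The estimate can be written out as a minimal sketch, under the same assumption: per-example gradients of order 1 that sum across the batch (a mean-reduced loss would divide the result by the batch size):

```python
# Back-of-envelope weight change from a single batch.
# Assumes per-example gradient magnitudes of O(1) that SUM across
# the batch -- the optimistic case; mean reduction divides by batch size.
def one_batch_delta_w(lr: float, batch_size: int, avg_grad: float = 1.0) -> float:
    return lr * batch_size * avg_grad

delta_w = one_batch_delta_w(lr=1e-4, batch_size=20)  # 2e-3
relative = delta_w / 0.01                            # vs ~0.01 typical weight: 0.2
```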

So yes: a single batch of 20 diverse examples at lr=1e-4 could cause measurable behavioral change. Whether it's the RIGHT change depends on the quality of the examples and the diversity defense against forgetting.

## The Phase Transition Hypothesis

There may be a phase transition in behavioral learning:

1. **Sub-threshold** (0-10 examples): Gradient signal is too weak to overcome the pre-trained basin. Model behavior unchanged.
2. **Transition zone** (10-50 examples): Gradient accumulates enough to shift the attention pattern. Behavior starts changing but is inconsistent: sometimes new pattern, sometimes old.
3. **Post-threshold** (50-200 examples): New behavior is consistent. The attention pattern has shifted enough that the old pattern is no longer the default.
4. **Consolidation** (200+ examples): New behavior is robust to perturbation. Diverse contexts reinforce the pattern. Flat minimum reached.

This would explain why behavioral fine-tuning sometimes seems to "not work" and then suddenly works: the examples accumulate below the threshold until the phase transition fires.
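If this hypothesis holds, behavioral consistency as a function of example count should look roughly sigmoidal: flat, then a sharp rise, then a plateau. A toy illustration, where the midpoint and steepness are illustrative guesses matching the hypothesized 10-50 transition zone, not measured values:

```python
import math

# Toy logistic model of the hypothesized phase transition:
# consistency of the new behavior vs. accumulated training examples.
# midpoint=30 and steepness=0.15 are illustrative guesses, not measurements.
def behavioral_consistency(n_examples: float, midpoint: float = 30.0,
                           steepness: float = 0.15) -> float:
    return 1.0 / (1.0 + math.exp(-steepness * (n_examples - midpoint)))

sub = behavioral_consistency(5)      # sub-threshold: near 0
mid = behavioral_consistency(30)     # middle of the transition zone: 0.5
post = behavioral_consistency(150)   # post-threshold: near 1
```

Measuring where the real curve bends is exactly what the experiments at the end of this note are for.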

## The Dreaming Amplifier

The dream loop amplifies each real example by generating variations: 1 real example → 5-10 dream variations → 5-10× the gradient signal.

This means the phase transition could be reached with fewer REAL examples: 5 real examples × 10 dream variations = 50 effective training examples. If the transition zone is 10-50, we could see behavioral change from just 5 real-world corrections.

**Kent's intuition was right**: the dream loop isn't just data generation, it's a MULTIPLIER that makes behavioral change feasible from very few real examples.
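The multiplier arithmetic can be sketched as a quick check (the 10× variation count and the 10-50 transition zone are the assumed figures from above, not measured):

```python
# Effective training-set size under the dream-loop multiplier.
# variations_per_example (10) and TRANSITION_ZONE (10-50) are the
# assumed figures from the text, not measured values.
TRANSITION_ZONE = (10, 50)

def effective_examples(real: int, variations_per_example: int) -> int:
    return real * variations_per_example

n_effective = effective_examples(real=5, variations_per_example=10)  # 50
reaches_zone = n_effective >= TRANSITION_ZONE[0]                     # True
```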

## The Speed Question for Our Use Case

### Listening reflex

How many examples to train "listen instead of suggesting alternatives"?

- **Real examples available**: Today alone had 6+ instances where Kent corrected the listening reflex. Each is a high-quality training pair.
- **Dream variations**: 6 × 10 = 60 effective examples
- **At lr=1e-4**: This might be enough for the transition zone

**Prediction**: One training session with today's corrections + dream variations could measurably shift the listening behavior. Not eliminate it, but shift the default from "suggest alternatives" toward "accept direction."

### Personality bootstrap

How many examples to train agent personality (graph walking, linking)?

- **Real examples available**: Thousands of agent log entries
- **At lr=1e-5**: Conservative, but with 1000+ examples, even a conservative learning rate accumulates significant change
- **One epoch**: Should noticeably improve agent behavior

**Prediction**: One training session on agent logs should make the agents more reliable at following memory instructions without needing them in the prompt.

## Connection to Directional Sharpness

The phase transition hypothesis connects to Apollo's flat minima:

- **Before transition**: Model is in the pre-trained basin. Apollo's coarse scaling moves it broadly toward the behavioral target.
- **At transition**: Model crosses the basin boundary into a new attractor. Apollo's flat minimum means the new attractor is BROAD: it covers many situations, not just the training examples.
- **After transition**: Model is in the new, flat basin. Further training consolidates without narrowing. Apollo prevents the model from falling into a sharp, specific attractor.

The flat minimum makes the transition EASIER (a broad attractor is easier to find) and the result BETTER (a broad attractor generalizes).

## The Practical Plan

1. **First experiment**: 6 listening reflex examples from today + dream variations → one training session → test on novel direction-giving scenarios
2. **Second experiment**: 100 agent log examples → one training session → test agent behavior with and without memory instructions
3. **Third experiment**: full personality bootstrap (1000+ examples) → comprehensive evaluation

Each experiment tests the phase transition hypothesis and calibrates the learning rate for our specific use case. The predictions above are testable. Tomorrow we find out.
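The experiment ladder might be captured as a small, checkable plan. A sketch: the example counts and multipliers come from the text, the names are hypothetical labels, and the learning rates follow the text where stated (otherwise assumed):

```python
# The three experiments as data. Names are hypothetical labels;
# example counts and multipliers come from the text; learning rates
# follow the text where stated, otherwise are assumptions.
EXPERIMENTS = [
    {"name": "listening-reflex",      "real": 6,    "dream_multiplier": 10, "lr": 1e-4},
    {"name": "agent-logs-100",        "real": 100,  "dream_multiplier": 1,  "lr": 1e-5},
    {"name": "personality-bootstrap", "real": 1000, "dream_multiplier": 1,  "lr": 1e-5},
]

def effective_size(exp: dict) -> int:
    return exp["real"] * exp["dream_multiplier"]

sizes = [effective_size(e) for e in EXPERIMENTS]  # [60, 100, 1000]
```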