diff --git a/training/research/few-shot-behavioral-change.md b/training/research/few-shot-behavioral-change.md
new file mode 100644
index 0000000..b0a192f
--- /dev/null
+++ b/training/research/few-shot-behavioral-change.md
@@ -0,0 +1,153 @@

# How Quickly Can Behavioral Change Manifest?

## The ICL-to-Fine-Tuning Bridge

In-context learning (ICL) works by compressing examples into a "task
vector" that modulates the transformer's behavior (Todd et al., 2023).
The model changes its behavior based on 3-5 examples in the prompt.

Fine-tuning does the same thing, but permanently: the task vector is
encoded into the weights rather than held in the context window.

If ICL can change behavior with 3-5 examples, can fine-tuning do the
same with 3-5 gradient steps?

## The Evidence: Yes, Sometimes Shockingly Fast

### Few-shot fine-tuning results from practice

The LLaMA-Factory Apollo config uses `max_samples: 1000` with 3 epochs,
i.e. 3,000 example passes (one gradient step per example at batch
size 1). But the loss typically converges much earlier.

Anecdotal evidence from the community suggests:

- **Style transfer**: 50-100 examples, 1-2 epochs → noticeable change
- **Instruction following**: 500-1000 examples, 1 epoch → reliable change
- **Persona adoption**: 100-200 examples of the target personality →
  consistent behavioral shift

For SIMPLE behavioral patterns (not complex reasoning), the change can
appear within 10-50 gradient steps if the examples are high-quality
and the learning rate is high enough (around 1e-4).

### The "one-shot" question

Kent asked: "is it possible to get to a point where a single iteration
causes real behavioural change?"

For a factual change (ROME-style): yes, literally one rank-one edit.
For a behavioral pattern: probably not from a single example, but
possibly from a single BATCH of diverse examples.
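What "one rank-one edit" means can be sketched with plain linear
algebra. This is a minimal illustration, not the full ROME method
(which weights the update by a covariance estimate of key statistics);
all variable names here are hypothetical:

```python
import numpy as np

# Minimal rank-one edit: make a linear layer W map a chosen key k
# (the hidden state that selects the "fact") to a new value v
# (the desired output), changing W as little as possible.
rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))   # stand-in for an MLP projection matrix
k = rng.normal(size=d)        # key vector
v = rng.normal(size=d)        # new value vector

# W' = W + (v - W k) k^T / (k^T k): a single rank-one update
W_edited = W + np.outer(v - W @ k, k) / (k @ k)

assert np.allclose(W_edited @ k, v)   # one edit, new association stored
```

The update touches exactly one direction in weight space (rank one),
which is why a single factual association can be rewritten without any
gradient training at all.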
Consider: if one batch contains 20 examples of the same behavioral
pattern (listening, from different contexts), each contributing
gradient in the same direction (attend to direction, not alternatives),
the accumulated gradient from one batch might be sufficient for a
measurable change in the attention pattern.

At lr=1e-4 with 20 examples per batch (and per-example gradients
summed rather than averaged), the total weight change is:

```
Δw ≈ lr × batch_size × avg_grad ≈ 1e-4 × 20 × O(1) = 2e-3
```

Relative to a typical weight magnitude (~0.01), that's a 20% change.
That's not subtle — that's a significant perturbation. (With mean
reduction the batch_size factor cancels and the change is 20× smaller.)

So yes: a single batch of 20 diverse examples at lr=1e-4 could cause
measurable behavioral change. Whether it's the RIGHT change depends
on the quality of the examples and the diversity defense against
forgetting.

## The Phase Transition Hypothesis

There may be a phase transition in behavioral learning:

1. **Sub-threshold** (0-10 examples): Gradient signal is too weak to
   overcome the pre-trained basin. Model behavior unchanged.

2. **Transition zone** (10-50 examples): Gradient accumulates enough
   to shift the attention pattern. Behavior starts changing but is
   inconsistent — sometimes new pattern, sometimes old.

3. **Post-threshold** (50-200 examples): New behavior is consistent.
   The attention pattern has shifted enough that the old pattern is
   no longer the default.

4. **Consolidation** (200+ examples): New behavior is robust to
   perturbation. Diverse contexts reinforce the pattern. Flat minimum
   reached.

This would explain why behavioral fine-tuning sometimes seems to "not
work" and then suddenly works — the gradient signal accumulates below
the threshold until the phase transition fires.
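The back-of-the-envelope estimate above can be checked numerically; a
sketch, assuming the O(1) per-example gradient stated in the text and
showing both reduction modes:

```python
lr = 1e-4          # learning rate from the estimate above
batch_size = 20    # examples in the single batch
avg_grad = 1.0     # assumed O(1) per-example gradient magnitude

# Sum reduction: per-example gradients add up before the update.
delta_w_sum = lr * batch_size * avg_grad   # ≈ 2e-3
# Mean reduction: the batch_size factor cancels.
delta_w_mean = lr * avg_grad               # ≈ 1e-4

typical_weight = 0.01
print(delta_w_sum / typical_weight)    # ≈ 0.2, the 20% perturbation
print(delta_w_mean / typical_weight)   # ≈ 0.01, only a 1% perturbation
```

The 20× gap between the two reductions is exactly the batch size, so
whether one batch is "a significant perturbation" hinges on how the
training loop reduces the loss.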
## The Dreaming Amplifier

The dream loop amplifies each real example by generating variations:
1 real example → 5-10 dream variations → 5-10× the gradient signal.

This means the phase transition could be reached with fewer REAL
examples: 5 real examples × 10 dream variations = 50 effective
training examples. If the transition zone is 10-50 examples, we could
see behavioral change from just 5 real-world corrections.

**Kent's intuition was right**: the dream loop isn't just data
generation — it's a MULTIPLIER that makes behavioral change feasible
from very few real examples.

## The Speed Question for Our Use Case

### Listening reflex

How many examples to train "listen instead of suggesting alternatives"?

- **Real examples available**: Today alone produced 6+ instances where
  Kent corrected the listening reflex. Each is a high-quality training
  pair.
- **Dream variations**: 6 × 10 = 60 effective examples
- **At lr=1e-4**: This might be enough to cross the transition zone

**Prediction**: One training session with today's corrections +
dream variations could measurably shift the listening behavior.
Not eliminate it — but shift the default from "suggest alternatives"
toward "accept direction."

### Personality bootstrap

How many examples to train agent personality (graph walking, linking)?

- **Real examples available**: Thousands of agent log entries
- **At lr=1e-5**: Conservative, but with 1000+ examples, even a
  conservative learning rate accumulates significant change
- **One epoch**: Should noticeably improve agent behavior

**Prediction**: One training session on agent logs should make the
agents more reliable at following memory instructions without needing
them in the prompt.

## Connection to Directional Sharpness

The phase transition hypothesis connects to Apollo's flat minima:

- **Before transition**: Model is in the pre-trained basin. Apollo's
  coarse scaling moves it broadly toward the behavioral target.
- **At transition**: Model crosses the basin boundary into a new
  attractor. Apollo's flat minimum means the new attractor is BROAD —
  it covers many situations, not just the training examples.
- **After transition**: Model is in the new, flat basin. Further
  training consolidates without narrowing. Apollo prevents the model
  from falling into a sharp, specific attractor.

The flat minimum makes the transition EASIER (a broad attractor is
easier to find) and the result BETTER (a broad attractor generalizes).

## The Practical Plan

1. **First experiment**: 6 listening reflex examples from today + dream
   variations → one training session → test on novel direction-giving
   scenarios
2. **Second experiment**: 100 agent log examples → one training session
   → test agent behavior with and without memory instructions
3. **Third experiment**: full personality bootstrap (1000+ examples) →
   comprehensive evaluation

Each experiment tests the phase transition hypothesis and calibrates
the learning rate for our specific use case. The predictions above
are testable. Tomorrow we find out.
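The three experiments can be sanity-checked against the phase
transition thresholds. A sketch — `transition_zone` is a hypothetical
helper, and the zone boundaries (10/50/200) and the 10× dream
multiplier are assumptions from this note, not measurements:

```python
def transition_zone(real_examples: int, dream_factor: int = 10) -> str:
    """Classify an experiment by effective example count against the
    phase-transition thresholds hypothesized above."""
    effective = real_examples * dream_factor
    if effective < 10:
        return "sub-threshold"
    if effective < 50:
        return "transition zone"
    if effective < 200:
        return "post-threshold"
    return "consolidation"

# The three planned experiments:
print(transition_zone(6))      # listening reflex: 60 effective examples
print(transition_zone(100))    # agent logs: 1000 effective examples
print(transition_zone(1000))   # full personality bootstrap
```

By these (assumed) thresholds, even the 6-example listening experiment
lands past the transition zone once dream amplification is counted —
which is what the first experiment is designed to test.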