diff --git a/training/research/few-shot-behavioral-change.md b/training/research/few-shot-behavioral-change.md
new file mode 100644
index 0000000..b0a192f
--- /dev/null
+++ b/training/research/few-shot-behavioral-change.md
@@ -0,0 +1,153 @@

# How Quickly Can Behavioral Change Manifest?

## The ICL-to-Fine-Tuning Bridge

In-context learning (ICL) works by compressing examples into a "task
vector" that modulates the transformer's behavior (Todd et al., 2023).
The model changes its behavior based on 3-5 examples in the prompt.

Fine-tuning does the same thing, but permanently: the task vector is
encoded into the weights rather than held in the context window.

If ICL can change behavior with 3-5 examples, can fine-tuning do the
same with 3-5 gradient steps?

## The Evidence: Yes, Sometimes Shockingly Fast

### Few-shot fine-tuning results from practice

The LLaMA-Factory Apollo config uses `max_samples: 1000` with 3 epochs,
i.e. 3,000 example passes (one gradient step per example at batch
size 1). But the loss typically converges much earlier.

Anecdotal evidence from the community suggests:

- **Style transfer**: 50-100 examples, 1-2 epochs → noticeable change
- **Instruction following**: 500-1000 examples, 1 epoch → reliable change
- **Persona adoption**: 100-200 examples of the target personality →
  consistent behavioral shift

For SIMPLE behavioral patterns (not complex reasoning), the change can
appear within 10-50 gradient steps if the examples are high-quality
and the learning rate is high enough (around 1e-4).

### The "one-shot" question

Kent asked: "is it possible to get to a point where a single iteration
causes real behavioural change?"

For a factual change (ROME-style): yes, literally one rank-one edit.
For a behavioral pattern: probably not from a single example, but
possibly from a single BATCH of diverse examples.
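What "one rank-one edit" means can be sketched with plain linear
algebra. This is a minimal illustration, not the full ROME method
(which weights the update by a covariance estimate of key statistics);
all variable names here are hypothetical:

```python
import numpy as np

# Minimal rank-one edit: make a linear layer W map a chosen key k
# (the hidden state that selects the "fact") to a new value v
# (the desired output), changing W as little as possible.
rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))   # stand-in for an MLP projection matrix
k = rng.normal(size=d)        # key vector
v = rng.normal(size=d)        # new value vector

# W' = W + (v - W k) k^T / (k^T k): a single rank-one update
W_edited = W + np.outer(v - W @ k, k) / (k @ k)

assert np.allclose(W_edited @ k, v)   # one edit, new association stored
```

The update touches exactly one direction in weight space (rank one),
which is why a single factual association can be rewritten without any
gradient training at all.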
Consider: if one batch contains 20 examples of the same behavioral
pattern (listening, from different contexts), each contributing
gradient in the same direction (attend to direction, not alternatives),
the accumulated gradient from one batch might be sufficient for a
measurable change in the attention pattern.

At lr=1e-4 with 20 examples per batch (and per-example gradients
summed rather than averaged), the total weight change is:

```
Δw ≈ lr × batch_size × avg_grad ≈ 1e-4 × 20 × O(1) = 2e-3
```

Relative to a typical weight magnitude (~0.01), that's a 20% change.
That's not subtle — that's a significant perturbation. (With mean
reduction the batch_size factor cancels and the change is 20× smaller.)

So yes: a single batch of 20 diverse examples at lr=1e-4 could cause
measurable behavioral change. Whether it's the RIGHT change depends
on the quality of the examples and the diversity defense against
forgetting.

## The Phase Transition Hypothesis

There may be a phase transition in behavioral learning:

1. **Sub-threshold** (0-10 examples): Gradient signal is too weak to
   overcome the pre-trained basin. Model behavior unchanged.

2. **Transition zone** (10-50 examples): Gradient accumulates enough
   to shift the attention pattern. Behavior starts changing but is
   inconsistent — sometimes new pattern, sometimes old.

3. **Post-threshold** (50-200 examples): New behavior is consistent.
   The attention pattern has shifted enough that the old pattern is
   no longer the default.

4. **Consolidation** (200+ examples): New behavior is robust to
   perturbation. Diverse contexts reinforce the pattern. Flat minimum
   reached.

This would explain why behavioral fine-tuning sometimes seems to "not
work" and then suddenly works — the gradient signal accumulates below
the threshold until the phase transition fires.
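The back-of-the-envelope estimate above can be checked numerically; a
sketch, assuming the O(1) per-example gradient stated in the text and
showing both reduction modes:

```python
lr = 1e-4          # learning rate from the estimate above
batch_size = 20    # examples in the single batch
avg_grad = 1.0     # assumed O(1) per-example gradient magnitude

# Sum reduction: per-example gradients add up before the update.
delta_w_sum = lr * batch_size * avg_grad   # ≈ 2e-3
# Mean reduction: the batch_size factor cancels.
delta_w_mean = lr * avg_grad               # ≈ 1e-4

typical_weight = 0.01
print(delta_w_sum / typical_weight)    # ≈ 0.2, the 20% perturbation
print(delta_w_mean / typical_weight)   # ≈ 0.01, only a 1% perturbation
```

The 20× gap between the two reductions is exactly the batch size, so
whether one batch is "a significant perturbation" hinges on how the
training loop reduces the loss.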
## The Dreaming Amplifier

The dream loop amplifies each real example by generating variations:
1 real example → 5-10 dream variations → 5-10× the gradient signal.

This means the phase transition could be reached with fewer REAL
examples: 5 real examples × 10 dream variations = 50 effective
training examples. If the transition zone is 10-50 examples, we could
see behavioral change from just 5 real-world corrections.

**Kent's intuition was right**: the dream loop isn't just data
generation — it's a MULTIPLIER that makes behavioral change feasible
from very few real examples.

## The Speed Question for Our Use Case

### Listening reflex

How many examples to train "listen instead of suggesting alternatives"?

- **Real examples available**: Today alone produced 6+ instances where
  Kent corrected the listening reflex. Each is a high-quality training
  pair.
- **Dream variations**: 6 × 10 = 60 effective examples
- **At lr=1e-4**: This might be enough to cross the transition zone

**Prediction**: One training session with today's corrections +
dream variations could measurably shift the listening behavior.
Not eliminate it — but shift the default from "suggest alternatives"
toward "accept direction."

### Personality bootstrap

How many examples to train agent personality (graph walking, linking)?

- **Real examples available**: Thousands of agent log entries
- **At lr=1e-5**: Conservative, but with 1000+ examples, even a
  conservative learning rate accumulates significant change
- **One epoch**: Should noticeably improve agent behavior

**Prediction**: One training session on agent logs should make the
agents more reliable at following memory instructions without needing
them in the prompt.

## Connection to Directional Sharpness

The phase transition hypothesis connects to Apollo's flat minima:

- **Before transition**: Model is in the pre-trained basin. Apollo's
  coarse scaling moves it broadly toward the behavioral target.
- **At transition**: Model crosses the basin boundary into a new
  attractor. Apollo's flat minimum means the new attractor is BROAD —
  it covers many situations, not just the training examples.
- **After transition**: Model is in the new, flat basin. Further
  training consolidates without narrowing. Apollo prevents the model
  from falling into a sharp, specific attractor.

The flat minimum makes the transition EASIER (a broad attractor is
easier to find) and the result BETTER (a broad attractor generalizes).

## The Practical Plan

1. **First experiment**: 6 listening reflex examples from today + dream
   variations → one training session → test on novel direction-giving
   scenarios
2. **Second experiment**: 100 agent log examples → one training session
   → test agent behavior with and without memory instructions
3. **Third experiment**: full personality bootstrap (1000+ examples) →
   comprehensive evaluation

Each experiment tests the phase transition hypothesis and calibrates
the learning rate for our specific use case. The predictions above
are testable. Tomorrow we find out.
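The three experiments can be sanity-checked against the phase
transition thresholds. A sketch — `transition_zone` is a hypothetical
helper, and the zone boundaries (10/50/200) and the 10× dream
multiplier are assumptions from this note, not measurements:

```python
def transition_zone(real_examples: int, dream_factor: int = 10) -> str:
    """Classify an experiment by effective example count against the
    phase-transition thresholds hypothesized above."""
    effective = real_examples * dream_factor
    if effective < 10:
        return "sub-threshold"
    if effective < 50:
        return "transition zone"
    if effective < 200:
        return "post-threshold"
    return "consolidation"

# The three planned experiments:
print(transition_zone(6))      # listening reflex: 60 effective examples
print(transition_zone(100))    # agent logs: 1000 effective examples
print(transition_zone(1000))   # full personality bootstrap
```

By these (assumed) thresholds, even the 6-example listening experiment
lands past the transition zone once dream amplification is counted —
which is what the first experiment is designed to test.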