# How Quickly Can Behavioral Change Manifest?

## The ICL-to-Fine-Tuning Bridge

In-context learning (ICL) works by compressing examples into a "task vector" that modulates the transformer's behavior (Todd et al., 2023). The model changes its behavior based on 3-5 examples in the prompt.

Fine-tuning does the same thing, but permanently: the task vector is encoded into the weights rather than held in the context window.

If ICL can change behavior with 3-5 examples, can fine-tuning do the same with 3-5 gradient steps?
## The Evidence: Yes, Sometimes Shockingly Fast

### Few-shot fine-tuning results from practice

The LLaMA-Factory Apollo config uses `max_samples: 1000` with 3 epochs, i.e. 3000 example passes (the number of optimizer steps depends on batch size). But the loss typically converges much earlier.

Anecdotal evidence from the community suggests:

- **Style transfer**: 50-100 examples, 1-2 epochs → noticeable change
- **Instruction following**: 500-1000 examples, 1 epoch → reliable change
- **Persona adoption**: 100-200 examples of the target personality → consistent behavioral shift

For SIMPLE behavioral patterns (not complex reasoning), the change can appear within 10-50 gradient steps if the examples are high-quality and the learning rate is high enough (around 1e-4).
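The budget arithmetic above depends on batch size; a quick sketch (the batch size of 8 is an assumption for illustration, not taken from the actual config):

```python
# Rough training-budget arithmetic for max_samples: 1000 over 3 epochs.
max_samples = 1000
epochs = 3
batch_size = 8  # assumed for illustration; the real config may differ

example_passes = max_samples * epochs           # 3000 examples seen in total
optimizer_steps = example_passes // batch_size  # 375 gradient steps at batch size 8
print(example_passes, optimizer_steps)
```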
### The "one-shot" question

Kent asked: "is it possible to get to a point where a single iteration causes real behavioural change?"

For a factual change (ROME-style): yes, literally one rank-one edit. For a behavioral pattern: probably not from a single example, but possibly from a single BATCH of diverse examples.
Consider: if one batch contains 20 examples of the same behavioral pattern (listening, from different contexts), each contributing gradient in the same direction (attend to direction, not alternatives), the accumulated gradient from one batch might be sufficient for a measurable change in the attention pattern.

At lr=1e-4 with 20 examples per batch, and assuming the loss is summed (rather than averaged) over the batch, the total weight change is:

```
Δw ≈ lr × batch_size × avg_grad ≈ 1e-4 × 20 × O(1) = 2e-3
```

Relative to a typical weight magnitude (~0.01), that's a 20% change: not subtle, but a significant perturbation. (With the more common mean-reduced loss, the batch_size factor drops out and the estimate shrinks accordingly.)
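The back-of-envelope calculation above can be written out directly; the O(1) per-example gradient and the ~0.01 typical weight magnitude are the document's assumptions, and the formula assumes a sum-reduced loss:

```python
# Estimated per-batch weight change, assuming sum-reduced loss and
# per-example gradients of magnitude ~1 all pointing the same way.
lr = 1e-4
batch_size = 20
avg_grad = 1.0          # assumed O(1) per-example gradient magnitude
typical_weight = 0.01   # assumed typical weight magnitude

delta_w = lr * batch_size * avg_grad        # ≈ 2e-3
relative_change = delta_w / typical_weight  # ≈ 0.2, i.e. a 20% perturbation
print(delta_w, relative_change)
```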
So yes: a single batch of 20 diverse examples at lr=1e-4 could cause measurable behavioral change. Whether it's the RIGHT change depends on the quality of the examples and the diversity defense against forgetting.
## The Phase Transition Hypothesis

There may be a phase transition in behavioral learning:

1. **Sub-threshold** (0-10 examples): Gradient signal is too weak to overcome the pre-trained basin. Model behavior is unchanged.

2. **Transition zone** (10-50 examples): Gradient accumulates enough to shift the attention pattern. Behavior starts changing but is inconsistent: sometimes the new pattern, sometimes the old.

3. **Post-threshold** (50-200 examples): The new behavior is consistent. The attention pattern has shifted enough that the old pattern is no longer the default.

4. **Consolidation** (200+ examples): The new behavior is robust to perturbation. Diverse contexts reinforce the pattern. A flat minimum is reached.

This would explain why behavioral fine-tuning sometimes seems to "not work" and then suddenly works: the examples accumulate below the threshold until the phase transition fires.
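The four phases can be sketched as a simple classifier over effective example counts (the 10/50/200 boundaries are the hypothesized thresholds from the list above, not measured values):

```python
def phase(n_examples: int) -> str:
    """Map an effective example count to the hypothesized learning phase."""
    if n_examples < 10:
        return "sub-threshold"   # gradient too weak to leave the pre-trained basin
    if n_examples < 50:
        return "transition"      # behavior shifting but inconsistent
    if n_examples < 200:
        return "post-threshold"  # new behavior is the consistent default
    return "consolidation"       # robust; flat minimum reached

print(phase(5), phase(30), phase(100), phase(500))
```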
## The Dreaming Amplifier

The dream loop amplifies each real example by generating variations: 1 real example → 5-10 dream variations → 5-10× the gradient signal.

This means the phase transition could be reached with fewer REAL examples: 5 real examples × 10 dream variations = 50 effective training examples. If the transition zone is 10-50, we could see behavioral change from just 5 real-world corrections.

**Kent's intuition was right**: the dream loop isn't just data generation; it's a MULTIPLIER that makes behavioral change feasible from very few real examples.
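The multiplier arithmetic above, written out (the 10-variations-per-example figure is the hypothesized dream-loop yield):

```python
real_examples = 5
dream_variations_per_example = 10  # hypothesized dream-loop yield
effective_examples = real_examples * dream_variations_per_example  # 50

# Does this reach the hypothesized 10-50 transition zone?
reaches_transition_zone = 10 <= effective_examples <= 50
print(effective_examples, reaches_transition_zone)
```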
## The Speed Question for Our Use Case

### Listening reflex

How many examples to train "listen instead of suggesting alternatives"?

- **Real examples available**: Today alone had 6+ instances where Kent corrected the listening reflex. Each is a high-quality training pair.
- **Dream variations**: 6 × 10 = 60 effective examples
- **At lr=1e-4**: This might be enough to reach the transition zone

**Prediction**: One training session with today's corrections + dream variations could measurably shift the listening behavior. Not eliminate it, but shift the default from "suggest alternatives" toward "accept direction."
### Personality bootstrap

How many examples to train agent personality (graph walking, linking)?

- **Real examples available**: Thousands of agent log entries
- **At lr=1e-5**: Conservative, but with 1000+ examples, even a conservative learning rate accumulates significant change
- **One epoch**: Should noticeably improve agent behavior

**Prediction**: One training session on agent logs should make the agents more reliable at following memory instructions without needing them in the prompt.
## Connection to Directional Sharpness

The phase transition hypothesis connects to Apollo's flat minima:

- **Before the transition**: The model is in the pre-trained basin. Apollo's coarse scaling moves it broadly toward the behavioral target.
- **At the transition**: The model crosses the basin boundary into a new attractor. Apollo's flat minimum means the new attractor is BROAD: it covers many situations, not just the training examples.
- **After the transition**: The model is in the new, flat basin. Further training consolidates without narrowing. Apollo prevents the model from falling into a sharp, specific attractor.

The flat minimum makes the transition EASIER (a broad attractor is easier to find) and the result BETTER (a broad attractor generalizes).
## The Practical Plan

1. **First experiment**: 6 listening reflex examples from today + dream variations → one training session → test on novel direction-giving scenarios
2. **Second experiment**: 100 agent log examples → one training session → test agent behavior with and without memory instructions
3. **Third experiment**: full personality bootstrap (1000+ examples) → comprehensive evaluation

Each experiment tests the phase transition hypothesis and calibrates the learning rate for our specific use case. The predictions above are testable. Tomorrow we find out.
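A minimal sketch of how the first experiment's before/after test could be scored. Everything here is hypothetical scaffolding: the marker phrases and the sample responses are illustrative placeholders, not a real evaluation protocol.

```python
def alternative_rate(responses):
    """Fraction of responses that suggest alternatives rather than
    accepting direction. Marker phrases are illustrative placeholders."""
    markers = ("instead", "alternatively", "have you considered")
    flagged = sum(any(m in r.lower() for m in markers) for r in responses)
    return flagged / len(responses)

# Score the same novel direction-giving prompts before and after training;
# the hypothesis predicts the rate drops after the session.
before = ["Alternatively, we could refactor first.", "Done, moving on."]
print(alternative_rate(before))  # 0.5
```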