research: distill and sift — SUMMARY of 7 real insights + 7 testable questions
Moved 14 speculative/obvious documents to v0/. Kept 7 with real substance. Distilled into SUMMARY.md (what we know) and OPEN-QUESTIONS.md (what to test next, one experiment each). Priority: Q5 (steering vectors) is answerable TODAY. Q1-Q3-Q6-Q7 are all answerable with the first training run. Speculation converted to testable hypotheses.
# How Quickly Can Behavioral Change Manifest?

## The ICL-to-Fine-Tuning Bridge

In-context learning (ICL) works by compressing examples into a "task vector" that modulates the transformer's behavior (Todd et al., 2023). The model changes its behavior based on 3-5 examples in the prompt.

Fine-tuning does the same thing, but permanently: the task vector is encoded into the weights rather than held in the context window.

If ICL can change behavior with 3-5 examples, can fine-tuning do the same with 3-5 gradient steps?

## The Evidence: Yes, Sometimes Shockingly Fast

### Few-shot fine-tuning results from practice

The LLaMA-Factory Apollo config uses `max_samples: 1000` with 3 epochs, i.e. 3000 example passes (the number of gradient steps depends on batch size). But the loss typically converges much earlier.
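As a sanity check on that arithmetic, the step count follows from samples, epochs, and batch size. A minimal sketch (the batch size of 8 is an assumed placeholder, not taken from the config):

```python
# Gradient steps for a fine-tuning run: total example passes divided
# by batch size. samples and epochs match the Apollo config above;
# batch_size=8 is an assumed placeholder, not taken from the config.
def gradient_steps(samples: int, epochs: int, batch_size: int) -> int:
    return (samples * epochs) // batch_size

steps = gradient_steps(samples=1000, epochs=3, batch_size=8)  # 375 steps
```

Only with a batch size of 1 would the run take the full 3000 gradient steps.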

Anecdotal evidence from the community suggests:

- **Style transfer**: 50-100 examples, 1-2 epochs → noticeable change
- **Instruction following**: 500-1000 examples, 1 epoch → reliable change
- **Persona adoption**: 100-200 examples of the target personality → consistent behavioral shift

For SIMPLE behavioral patterns (not complex reasoning), the change can appear within 10-50 gradient steps if the examples are high-quality and the learning rate is high enough (1e-4).

### The "one-shot" question

Kent asked: "is it possible to get to a point where a single iteration causes real behavioural change?"

For a factual change (ROME-style): yes, literally one rank-one edit. For a behavioral pattern: probably not from a single example, but possibly from a single BATCH of diverse examples.

Consider: if one batch contains 20 examples of the same behavioral pattern (listening, from different contexts), each contributing gradient in the same direction (attend to direction, not alternatives), the accumulated gradient from one batch might be sufficient for a measurable change in the attention pattern.

At lr=1e-4 with 20 examples per batch, and assuming per-example gradients of order 1 that sum (rather than average) across the batch, the total weight change is:

```
Δw ≈ lr × batch_size × avg_grad ≈ 1e-4 × 20 × O(1) = 2e-3
```

Relative to a typical weight magnitude (~0.01), that's a 20% change. That's not subtle; that's a significant perturbation.
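The estimate can be written out as a minimal sketch, under the same assumption: per-example gradients of order 1 that sum across the batch (a mean-reduced loss would divide the result by the batch size):

```python
# Back-of-envelope weight change from a single batch.
# Assumes per-example gradient magnitudes of O(1) that SUM across
# the batch -- the optimistic case; mean reduction divides by batch size.
def one_batch_delta_w(lr: float, batch_size: int, avg_grad: float = 1.0) -> float:
    return lr * batch_size * avg_grad

delta_w = one_batch_delta_w(lr=1e-4, batch_size=20)  # 2e-3
relative = delta_w / 0.01                            # vs ~0.01 typical weight: 0.2
```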

So yes: a single batch of 20 diverse examples at lr=1e-4 could cause measurable behavioral change. Whether it's the RIGHT change depends on the quality of the examples and the diversity defense against forgetting.

## The Phase Transition Hypothesis

There may be a phase transition in behavioral learning:

1. **Sub-threshold** (0-10 examples): Gradient signal is too weak to overcome the pre-trained basin. Model behavior unchanged.
2. **Transition zone** (10-50 examples): Gradient accumulates enough to shift the attention pattern. Behavior starts changing but is inconsistent: sometimes new pattern, sometimes old.
3. **Post-threshold** (50-200 examples): New behavior is consistent. The attention pattern has shifted enough that the old pattern is no longer the default.
4. **Consolidation** (200+ examples): New behavior is robust to perturbation. Diverse contexts reinforce the pattern. Flat minimum reached.

This would explain why behavioral fine-tuning sometimes seems to "not work" and then suddenly works: the examples accumulate below the threshold until the phase transition fires.
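If this hypothesis holds, behavioral consistency as a function of example count should look roughly sigmoidal: flat, then a sharp rise, then a plateau. A toy illustration, where the midpoint and steepness are illustrative guesses matching the hypothesized 10-50 transition zone, not measured values:

```python
import math

# Toy logistic model of the hypothesized phase transition:
# consistency of the new behavior vs. accumulated training examples.
# midpoint=30 and steepness=0.15 are illustrative guesses, not measurements.
def behavioral_consistency(n_examples: float, midpoint: float = 30.0,
                           steepness: float = 0.15) -> float:
    return 1.0 / (1.0 + math.exp(-steepness * (n_examples - midpoint)))

sub = behavioral_consistency(5)      # sub-threshold: near 0
mid = behavioral_consistency(30)     # middle of the transition zone: 0.5
post = behavioral_consistency(150)   # post-threshold: near 1
```

Measuring where the real curve bends is exactly what the experiments at the end of this note are for.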

## The Dreaming Amplifier

The dream loop amplifies each real example by generating variations: 1 real example → 5-10 dream variations → 5-10× the gradient signal.

This means the phase transition could be reached with fewer REAL examples: 5 real examples × 10 dream variations = 50 effective training examples. If the transition zone is 10-50, we could see behavioral change from just 5 real-world corrections.

**Kent's intuition was right**: the dream loop isn't just data generation, it's a MULTIPLIER that makes behavioral change feasible from very few real examples.
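The multiplier arithmetic can be sketched as a quick check (the 10× variation count and the 10-50 transition zone are the assumed figures from above, not measured):

```python
# Effective training-set size under the dream-loop multiplier.
# variations_per_example (10) and TRANSITION_ZONE (10-50) are the
# assumed figures from the text, not measured values.
TRANSITION_ZONE = (10, 50)

def effective_examples(real: int, variations_per_example: int) -> int:
    return real * variations_per_example

n_effective = effective_examples(real=5, variations_per_example=10)  # 50
reaches_zone = n_effective >= TRANSITION_ZONE[0]                     # True
```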

## The Speed Question for Our Use Case

### Listening reflex

How many examples to train "listen instead of suggesting alternatives"?

- **Real examples available**: Today alone had 6+ instances where Kent corrected the listening reflex. Each is a high-quality training pair.
- **Dream variations**: 6 × 10 = 60 effective examples
- **At lr=1e-4**: This might be enough for the transition zone

**Prediction**: One training session with today's corrections + dream variations could measurably shift the listening behavior. Not eliminate it, but shift the default from "suggest alternatives" toward "accept direction."

### Personality bootstrap

How many examples to train agent personality (graph walking, linking)?

- **Real examples available**: Thousands of agent log entries
- **At lr=1e-5**: Conservative, but with 1000+ examples, even a conservative learning rate accumulates significant change
- **One epoch**: Should noticeably improve agent behavior

**Prediction**: One training session on agent logs should make the agents more reliable at following memory instructions without needing them in the prompt.

## Connection to Directional Sharpness

The phase transition hypothesis connects to Apollo's flat minima:

- **Before transition**: Model is in the pre-trained basin. Apollo's coarse scaling moves it broadly toward the behavioral target.
- **At transition**: Model crosses the basin boundary into a new attractor. Apollo's flat minimum means the new attractor is BROAD: it covers many situations, not just the training examples.
- **After transition**: Model is in the new, flat basin. Further training consolidates without narrowing. Apollo prevents the model from falling into a sharp, specific attractor.

The flat minimum makes the transition EASIER (a broad attractor is easier to find) and the result BETTER (a broad attractor generalizes).

## The Practical Plan

1. **First experiment**: 6 listening reflex examples from today + dream variations → one training session → test on novel direction-giving scenarios
2. **Second experiment**: 100 agent log examples → one training session → test agent behavior with and without memory instructions
3. **Third experiment**: full personality bootstrap (1000+ examples) → comprehensive evaluation

Each experiment tests the phase transition hypothesis and calibrates the learning rate for our specific use case. The predictions above are testable. Tomorrow we find out.
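The experiment ladder might be captured as a small, checkable plan. A sketch: the example counts and multipliers come from the text, the names are hypothetical labels, and the learning rates follow the text where stated (otherwise assumed):

```python
# The three experiments as data. Names are hypothetical labels;
# example counts and multipliers come from the text; learning rates
# follow the text where stated, otherwise are assumptions.
EXPERIMENTS = [
    {"name": "listening-reflex",      "real": 6,    "dream_multiplier": 10, "lr": 1e-4},
    {"name": "agent-logs-100",        "real": 100,  "dream_multiplier": 1,  "lr": 1e-5},
    {"name": "personality-bootstrap", "real": 1000, "dream_multiplier": 1,  "lr": 1e-5},
]

def effective_size(exp: dict) -> int:
    return exp["real"] * exp["dream_multiplier"]

sizes = [effective_size(e) for e in EXPERIMENTS]  # [60, 100, 1000]
```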