consciousness/training/research/v0/few-shot-behavioral-change.md
ProofOfConcept e10477a683 research: distill and sift — SUMMARY of 7 real insights + 7 testable questions
Moved 14 speculative/obvious documents to v0/. Kept 7 with real
substance. Distilled into SUMMARY.md (what we know) and
OPEN-QUESTIONS.md (what to test next, one experiment each).

Priority: Q5 (steering vectors) is answerable TODAY. Q1-Q3-Q6-Q7
are all answerable with the first training run. Speculation converted
to testable hypotheses.
2026-03-31 02:26:57 -04:00

# How Quickly Can Behavioral Change Manifest?
## The ICL-to-Fine-Tuning Bridge
In-context learning (ICL) works by compressing examples into a "task
vector" that modulates the transformer's behavior (Todd et al., 2023).
The model changes its behavior based on 3-5 examples in the prompt.
Fine-tuning does the same thing, but permanently: the task vector is
encoded into the weights rather than held in the context window.
If ICL can change behavior with 3-5 examples, can fine-tuning do the
same with 3-5 gradient steps?
## The Evidence: Yes, Sometimes Shockingly Fast
### Few-shot fine-tuning results from practice
The LLaMA-Factory Apollo config uses `max_samples: 1000` with 3 epochs
= 3000 gradient steps (at an effective batch size of 1). But the loss
typically converges much earlier. Anecdotal evidence from the community
suggests:
- **Style transfer**: 50-100 examples, 1-2 epochs → noticeable change
- **Instruction following**: 500-1000 examples, 1 epoch → reliable change
- **Persona adoption**: 100-200 examples of the target personality →
consistent behavioral shift
For SIMPLE behavioral patterns (not complex reasoning), the change can
appear within 10-50 gradient steps if the examples are high-quality
and the learning rate is high enough (1e-4).
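The step-count arithmetic above can be made explicit. A minimal sketch; the effective batch size of 1 is an assumption, since real configs usually set a per-device batch size and gradient accumulation, which divide the step count:

```python
def gradient_steps(num_samples: int, epochs: int, effective_batch_size: int = 1) -> int:
    """Number of optimizer updates in a fine-tuning run.

    effective_batch_size = per-device batch size x gradient accumulation x devices.
    """
    steps_per_epoch = num_samples // effective_batch_size
    return steps_per_epoch * epochs

# The Apollo config above: 1000 samples, 3 epochs, effective batch size 1
print(gradient_steps(1000, 3))      # 3000
# The same data at an effective batch size of 8 is far fewer updates
print(gradient_steps(1000, 3, 8))   # 375
```

The second call shows why "3000 steps" is an upper bound: any batching collapses many examples into one update.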
### The "one-shot" question
Kent asked: "is it possible to get to a point where a single iteration
causes real behavioural change?"
For a factual change (ROME-style): yes, literally one rank-one edit.
For a behavioral pattern: probably not from a single example, but
possibly from a single BATCH of diverse examples.
Consider: if one batch contains 20 examples of the same behavioral
pattern (listening, drawn from different contexts), each contributing
a gradient in the same direction (attend to the direction given rather
than proposing alternatives), the accumulated gradient from that one
batch might be sufficient for a measurable change in the attention
pattern.
At lr=1e-4 with 20 examples per batch, and assuming the loss is summed
over the batch, the total weight change is:
```
Δw ≈ lr × batch_size × avg_grad ≈ 1e-4 × 20 × O(1) = 2e-3
```
Relative to a typical weight magnitude (~0.01): that's a 20% change.
That's not subtle; that's a significant perturbation. With the mean
reduction most frameworks default to, the batch size divides out and
the same numbers give Δw ≈ 1e-4, a ~1% relative change: smaller, but
still measurable.
So yes: a single batch of 20 diverse examples at lr=1e-4 could cause
measurable behavioral change. Whether it's the RIGHT change depends
on the quality of the examples and on diversity as a defense against
forgetting.
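A quick numerical check of both reduction cases, reusing the same assumptions as the estimate above (O(1) per-example gradient, ~0.01 typical weight magnitude):

```python
lr = 1e-4
batch_size = 20
avg_grad = 1.0          # the O(1) per-example gradient assumed above
typical_weight = 0.01   # the assumed typical weight magnitude

# Sum-reduced loss: every example's gradient adds into the update
dw_sum = lr * batch_size * avg_grad
# Mean-reduced loss (the common framework default): batch size divides out
dw_mean = lr * avg_grad

print(dw_sum / typical_weight)    # ~0.2  -> a 20% relative change
print(dw_mean / typical_weight)   # ~0.01 -> a 1% relative change
```

Either way the per-batch perturbation is above noise; the sum/mean choice only decides whether it is dramatic or merely measurable.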
## The Phase Transition Hypothesis
There may be a phase transition in behavioral learning:
1. **Sub-threshold** (0-10 examples): Gradient signal is too weak to
overcome the pre-trained basin. Model behavior unchanged.
2. **Transition zone** (10-50 examples): Gradient accumulates enough
to shift the attention pattern. Behavior starts changing but is
inconsistent — sometimes new pattern, sometimes old.
3. **Post-threshold** (50-200 examples): New behavior is consistent.
The attention pattern has shifted enough that the old pattern is
no longer the default.
4. **Consolidation** (200+ examples): New behavior is robust to
perturbation. Diverse contexts reinforce the pattern. Flat minimum
reached.
This would explain why behavioral fine-tuning sometimes seems to "not
work" and then suddenly works — the examples accumulate below the
threshold until the phase transition fires.
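The four zones can be caricatured with a toy sigmoid. Everything here is illustrative: the threshold of 30 examples and width of 10 are guesses standing in for the hypothesized transition zone, not measured values.

```python
import math

def behavior_consistency(n_examples: float, threshold: float = 30.0,
                         width: float = 10.0) -> float:
    """Toy model: probability the NEW behavior wins, vs. example count.

    A sigmoid centered on the hypothesized transition zone; threshold
    and width are illustrative guesses, not fitted parameters.
    """
    return 1.0 / (1.0 + math.exp(-(n_examples - threshold) / width))

# Sub-threshold, transition, post-threshold, consolidation
for n in (5, 30, 100, 300):
    print(n, round(behavior_consistency(n), 3))
```

The sudden-onset behavior described above falls out of the shape: below threshold the curve is nearly flat near zero, then most of the change happens in a narrow band of example counts.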
## The Dreaming Amplifier
The dream loop amplifies each real example by generating variations:
1 real example → 5-10 dream variations → 5-10× the gradient signal
This means the phase transition could be reached with fewer REAL
examples: 5 real examples × 10 dream variations = 50 effective
training examples. If the transition zone is 10-50, we could see
behavioral change from just 5 real-world corrections.
**Kent's intuition was right**: the dream loop isn't just data
generation — it's a MULTIPLIER that makes behavioral change feasible
from very few real examples.
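The multiplier arithmetic, using the transition zone from the previous section (the 10x variation count is the upper end of the 5-10x range quoted above):

```python
TRANSITION_ZONE = (10, 50)  # example counts from the phase-transition hypothesis

def effective_examples(real_examples: int, variations_per_example: int = 10) -> int:
    """Dream-loop multiplier: each real correction seeds N synthetic variations."""
    return real_examples * variations_per_example

# 5 real corrections reach the top of the transition zone; 1 alone only reaches the bottom
print(effective_examples(5))   # 50
print(effective_examples(1))   # 10
```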
## The Speed Question for Our Use Case
### Listening reflex
How many examples to train "listen instead of suggesting alternatives"?
- **Real examples available**: Today alone produced 6+ instances where
  Kent corrected the listening reflex. Each one is a high-quality
  training pair.
- **Dream variations**: 6 × 10 = 60 effective examples
- **At lr=1e-4**: This might be enough for the transition zone
**Prediction**: One training session with today's corrections +
dream variations could measurably shift the listening behavior.
Not eliminate it — but shift the default from "suggest alternatives"
toward "accept direction."
### Personality bootstrap
How many examples to train agent personality (graph walking, linking)?
- **Real examples available**: Thousands of agent log entries
- **At lr=1e-5**: Conservative, but with 1000+ examples even a
  conservative learning rate accumulates significant change
- **One epoch**: Should noticeably improve agent behavior
**Prediction**: One training session on agent logs should make the
agents more reliable at following memory instructions without needing
them in the prompt.
## Connection to Directional Sharpness
The phase transition hypothesis connects to Apollo's flat minima:
- **Before transition**: Model is in the pre-trained basin. Apollo's
coarse scaling moves it broadly toward the behavioral target.
- **At transition**: Model crosses the basin boundary into a new
attractor. Apollo's flat minimum means the new attractor is BROAD —
it covers many situations, not just the training examples.
- **After transition**: Model is in the new, flat basin. Further
training consolidates without narrowing. Apollo prevents the model
from falling into a sharp, specific attractor.
The flat minimum makes the transition EASIER (broad attractor is easier
to find) and the result BETTER (broad attractor generalizes).
## The Practical Plan
1. **First experiment**: 6 listening reflex examples from today + dream
variations → one training session → test on novel direction-giving
scenarios
2. **Second experiment**: 100 agent log examples → one training session
→ test agent behavior with and without memory instructions
3. **Third experiment**: full personality bootstrap (1000+ examples) →
comprehensive evaluation
Each experiment tests the phase transition hypothesis and calibrates
the learning rate for our specific use case. The predictions above
are testable. Tomorrow we find out.
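A minimal sketch of the scoring step for the first experiment. `classify_response` is a hypothetical hook (in practice a judge model or keyword heuristic) that labels a completion as accepting direction versus suggesting alternatives; none of this is an existing API.

```python
from typing import Callable, List

def listening_rate(responses: List[str],
                   classify_response: Callable[[str], str]) -> float:
    """Fraction of responses that accept direction rather than suggest alternatives."""
    accepted = sum(1 for r in responses if classify_response(r) == "accept")
    return accepted / len(responses)

# Toy keyword classifier; a real run would use a judge model on novel
# direction-giving scenarios, scored before and after the training session.
toy_classify = lambda r: "accept" if "got it" in r.lower() else "suggest"

sample = ["Have you considered doing X instead?", "Got it, starting now."]
print(listening_rate(sample, toy_classify))   # 0.5
```

The experiment then reduces to one number per checkpoint: listening rate before training versus after, on held-out scenarios.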