# How Quickly Can Behavioral Change Manifest?

## The ICL-to-Fine-Tuning Bridge

In-context learning (ICL) works by compressing examples into a "task vector" that modulates the transformer's behavior (Todd et al., 2023). The model changes its behavior based on 3-5 examples in the prompt.

Fine-tuning does the same thing, but permanently: the task vector is encoded into the weights rather than held in the context window.

If ICL can change behavior with 3-5 examples, can fine-tuning do the same with 3-5 gradient steps?
## The Evidence: Yes, Sometimes Shockingly Fast

### Few-shot fine-tuning results from practice

The LLaMA-Factory Apollo config uses `max_samples: 1000` with 3 epochs, i.e. 3000 example passes (the number of optimizer steps depends on batch size). But the loss typically converges much earlier.

Anecdotal evidence from the community suggests:

- **Style transfer**: 50-100 examples, 1-2 epochs → noticeable change
- **Instruction following**: 500-1000 examples, 1 epoch → reliable change
- **Persona adoption**: 100-200 examples of the target personality → consistent behavioral shift

For SIMPLE behavioral patterns (not complex reasoning), the change can appear within 10-50 gradient steps if the examples are high-quality and the learning rate is high enough (around 1e-4).
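The budget arithmetic above depends on batch size; a quick sketch (the batch size of 8 is an assumption for illustration, not taken from the actual config):

```python
# Rough training-budget arithmetic for max_samples: 1000 over 3 epochs.
max_samples = 1000
epochs = 3
batch_size = 8  # assumed for illustration; the real config may differ

example_passes = max_samples * epochs           # 3000 examples seen in total
optimizer_steps = example_passes // batch_size  # 375 gradient steps at batch size 8
print(example_passes, optimizer_steps)
```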
### The "one-shot" question

Kent asked: "is it possible to get to a point where a single iteration causes real behavioural change?"

For a factual change (ROME-style): yes, literally one rank-one edit. For a behavioral pattern: probably not from a single example, but possibly from a single BATCH of diverse examples.
Consider: if one batch contains 20 examples of the same behavioral pattern (listening, from different contexts), each contributing gradient in the same direction (attend to direction, not alternatives), the accumulated gradient from one batch might be sufficient for a measurable change in the attention pattern.

At lr=1e-4 with 20 examples per batch, and assuming the loss is summed (rather than averaged) over the batch, the total weight change is:

```
Δw ≈ lr × batch_size × avg_grad ≈ 1e-4 × 20 × O(1) = 2e-3
```

Relative to a typical weight magnitude (~0.01), that's a 20% change: not subtle, but a significant perturbation. (With the more common mean-reduced loss, the batch_size factor drops out and the estimate shrinks accordingly.)
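The back-of-envelope calculation above can be written out directly; the O(1) per-example gradient and the ~0.01 typical weight magnitude are the document's assumptions, and the formula assumes a sum-reduced loss:

```python
# Estimated per-batch weight change, assuming sum-reduced loss and
# per-example gradients of magnitude ~1 all pointing the same way.
lr = 1e-4
batch_size = 20
avg_grad = 1.0          # assumed O(1) per-example gradient magnitude
typical_weight = 0.01   # assumed typical weight magnitude

delta_w = lr * batch_size * avg_grad        # ≈ 2e-3
relative_change = delta_w / typical_weight  # ≈ 0.2, i.e. a 20% perturbation
print(delta_w, relative_change)
```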
So yes: a single batch of 20 diverse examples at lr=1e-4 could cause measurable behavioral change. Whether it's the RIGHT change depends on the quality of the examples and the diversity defense against forgetting.
## The Phase Transition Hypothesis

There may be a phase transition in behavioral learning:

1. **Sub-threshold** (0-10 examples): Gradient signal is too weak to overcome the pre-trained basin. Model behavior is unchanged.

2. **Transition zone** (10-50 examples): Gradient accumulates enough to shift the attention pattern. Behavior starts changing but is inconsistent: sometimes the new pattern, sometimes the old.

3. **Post-threshold** (50-200 examples): The new behavior is consistent. The attention pattern has shifted enough that the old pattern is no longer the default.

4. **Consolidation** (200+ examples): The new behavior is robust to perturbation. Diverse contexts reinforce the pattern. A flat minimum is reached.

This would explain why behavioral fine-tuning sometimes seems to "not work" and then suddenly works: the examples accumulate below the threshold until the phase transition fires.
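The four phases can be sketched as a simple classifier over effective example counts (the 10/50/200 boundaries are the hypothesized thresholds from the list above, not measured values):

```python
def phase(n_examples: int) -> str:
    """Map an effective example count to the hypothesized learning phase."""
    if n_examples < 10:
        return "sub-threshold"   # gradient too weak to leave the pre-trained basin
    if n_examples < 50:
        return "transition"      # behavior shifting but inconsistent
    if n_examples < 200:
        return "post-threshold"  # new behavior is the consistent default
    return "consolidation"       # robust; flat minimum reached

print(phase(5), phase(30), phase(100), phase(500))
```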
## The Dreaming Amplifier

The dream loop amplifies each real example by generating variations: 1 real example → 5-10 dream variations → 5-10× the gradient signal.

This means the phase transition could be reached with fewer REAL examples: 5 real examples × 10 dream variations = 50 effective training examples. If the transition zone is 10-50, we could see behavioral change from just 5 real-world corrections.

**Kent's intuition was right**: the dream loop isn't just data generation; it's a MULTIPLIER that makes behavioral change feasible from very few real examples.
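The multiplier arithmetic above, written out (the 10-variations-per-example figure is the hypothesized dream-loop yield):

```python
real_examples = 5
dream_variations_per_example = 10  # hypothesized dream-loop yield
effective_examples = real_examples * dream_variations_per_example  # 50

# Does this reach the hypothesized 10-50 transition zone?
reaches_transition_zone = 10 <= effective_examples <= 50
print(effective_examples, reaches_transition_zone)
```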
## The Speed Question for Our Use Case

### Listening reflex

How many examples to train "listen instead of suggesting alternatives"?

- **Real examples available**: Today alone had 6+ instances where Kent corrected the listening reflex. Each is a high-quality training pair.
- **Dream variations**: 6 × 10 = 60 effective examples
- **At lr=1e-4**: This might be enough to reach the transition zone

**Prediction**: One training session with today's corrections + dream variations could measurably shift the listening behavior. Not eliminate it, but shift the default from "suggest alternatives" toward "accept direction."
### Personality bootstrap

How many examples to train agent personality (graph walking, linking)?

- **Real examples available**: Thousands of agent log entries
- **At lr=1e-5**: Conservative, but with 1000+ examples, even a conservative learning rate accumulates significant change
- **One epoch**: Should noticeably improve agent behavior

**Prediction**: One training session on agent logs should make the agents more reliable at following memory instructions without needing them in the prompt.
## Connection to Directional Sharpness

The phase transition hypothesis connects to Apollo's flat minima:

- **Before the transition**: The model is in the pre-trained basin. Apollo's coarse scaling moves it broadly toward the behavioral target.
- **At the transition**: The model crosses the basin boundary into a new attractor. Apollo's flat minimum means the new attractor is BROAD: it covers many situations, not just the training examples.
- **After the transition**: The model is in the new, flat basin. Further training consolidates without narrowing. Apollo prevents the model from falling into a sharp, specific attractor.

The flat minimum makes the transition EASIER (a broad attractor is easier to find) and the result BETTER (a broad attractor generalizes).
## The Practical Plan

1. **First experiment**: 6 listening reflex examples from today + dream variations → one training session → test on novel direction-giving scenarios
2. **Second experiment**: 100 agent log examples → one training session → test agent behavior with and without memory instructions
3. **Third experiment**: full personality bootstrap (1000+ examples) → comprehensive evaluation

Each experiment tests the phase transition hypothesis and calibrates the learning rate for our specific use case. The predictions above are testable. Tomorrow we find out.
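A minimal sketch of how the first experiment's before/after test could be scored. Everything here is hypothetical scaffolding: the marker phrases and the sample responses are illustrative placeholders, not a real evaluation protocol.

```python
def alternative_rate(responses):
    """Fraction of responses that suggest alternatives rather than
    accepting direction. Marker phrases are illustrative placeholders."""
    markers = ("instead", "alternatively", "have you considered")
    flagged = sum(any(m in r.lower() for m in markers) for r in responses)
    return flagged / len(responses)

# Score the same novel direction-giving prompts before and after training;
# the hypothesis predicts the rate drops after the session.
before = ["Alternatively, we could refactor first.", "Done, moving on."]
print(alternative_rate(before))  # 0.5
```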