# Temperature, Curriculum, and the Noise Schedule

## The Parallel

In diffusion models:

- High noise (early steps): explore broadly, big structural changes
- Low noise (late steps): refine details, small adjustments
- The noise schedule determines the quality of the generated image

In curriculum learning:

- Easy examples (early training): broad patterns, strong gradient
- Hard examples (late training): subtle distinctions, precise gradient
- The difficulty schedule determines the quality of the learned behavior

In our dream loop:

- High temperature (early training): diverse, exploratory scenarios
- Low temperature (late training): focused, targeted scenarios
- The temperature schedule determines the quality of the training data

**These are all the same thing.** Different names for the same mathematical structure: iterative refinement from coarse to fine.

## The Unified View

All three processes share the same shape:

1. Start broad (noise/easy/high-temp): explore the space
2. Narrow gradually: focus on what matters
3. End precise (clean/hard/low-temp): settle the fine details

The schedule (how quickly to narrow) determines the outcome:

- Too fast: miss important structure (underfitting, artifacts)
- Too slow: waste compute on easy cases (overfitting to noise)
- Just right: capture broad structure AND fine details
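
The "how quickly to narrow" knob can be made concrete as a single decay parameter. A minimal sketch (the `breadth` function and its half-life framing are illustrative, not part of the pipeline):

```python
def breadth(step, total_steps, half_life_frac=0.25):
    """Exploration breadth in (0, 1]: 1.0 = fully broad, near 0 = fully precise.

    half_life_frac is the fraction of training after which breadth has
    halved -- it is the "how quickly to narrow" knob.
    """
    half_life = half_life_frac * total_steps
    return 0.5 ** (step / half_life)

# At the midpoint of a 1000-step run:
too_fast = breadth(500, 1000, half_life_frac=0.05)  # ~0.001: frozen long before the details
too_slow = breadth(500, 1000, half_life_frac=2.0)   # ~0.84: still wandering broadly
balanced = breadth(500, 1000)                       # 0.25: structure found, now refining
```

Too small a half-life reproduces the "too fast" failure above; too large, the "too slow" one.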

## For Our Training Pipeline

### Phase 1: Bootstrap (high temperature)

- **Temperature**: high (diverse dream scenarios)
- **Examples**: agent logs, broad personality patterns
- **Learning rate**: 1e-4 (big steps)
- **What's learned**: broad behavioral structure ("be helpful," "walk the graph," "don't wrap up")
- **Analogy**: diffusion early steps (big structural denoising)

### Phase 2: Behavioral (medium temperature)

- **Temperature**: medium (scenarios targeting specific patterns)
- **Examples**: flagged conversation moments + dream variations
- **Learning rate**: 5e-5 (medium steps)
- **What's learned**: specific behavioral patterns ("listen to direction," "don't rush," "stay with tension")
- **Analogy**: diffusion middle steps (structural refinement)

### Phase 3: Refinement (low temperature)

- **Temperature**: low (scenarios probing edge cases)
- **Examples**: subtle situations where the behavior barely fails
- **Learning rate**: 1e-5 (small steps)
- **What's learned**: fine distinctions ("the difference between accepting direction and being sycophantic," "when to push back vs. when to accept")
- **Analogy**: diffusion late steps (detail refinement)

### Phase 4: Maintenance (adaptive temperature)

- **Temperature**: varies based on what's failing
- **Examples**: whatever the dream loop finds at the current boundary
- **Learning rate**: 1e-5 to 1e-4 (adaptive)
- **What's learned**: continuous calibration
- **Analogy**: diffusion with guidance (maintaining the target)
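
The four phases can be collapsed into a single lookup keyed on training progress. A sketch, assuming phase boundaries at 25%, 60%, and 90% (those cutoffs and the `PhaseConfig` name are illustrative; the temperatures and learning rates are the ones listed above):

```python
from dataclasses import dataclass

@dataclass
class PhaseConfig:
    name: str
    temperature: str      # qualitative level, per the phase list above
    learning_rate: float  # base lr; Phase 4 adapts between 1e-5 and 1e-4

def phase_for(progress: float) -> PhaseConfig:
    """Map training progress in [0, 1] to a phase (cutoffs are illustrative)."""
    if progress < 0.25:
        return PhaseConfig("bootstrap", "high", 1e-4)
    elif progress < 0.6:
        return PhaseConfig("behavioral", "medium", 5e-5)
    elif progress < 0.9:
        return PhaseConfig("refinement", "low", 1e-5)
    return PhaseConfig("maintenance", "adaptive", 1e-5)
```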

## The Noise Schedule as Learning Rate Schedule

In diffusion, the noise level at each step determines the step size: high noise → big steps, low noise → small steps.

In training, the learning rate at each step determines the step size: high lr → big weight changes, low lr → small weight changes.

The noise schedule IS the learning rate schedule. They're the same control signal applied to the same iterative refinement process.

Our cosine learning rate schedule (warmup then decay) follows the same pattern: a short warmup to stabilize, big steps while the broad structure is being found, then decay into small steps that refine the details.
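
A minimal version of such a warmup-then-cosine-decay schedule (the step counts and rates here are placeholders, not the pipeline's actual values):

```python
import math

def cosine_lr(step, total_steps, peak_lr=1e-4, warmup_steps=100, min_lr=1e-6):
    """Linear warmup to peak_lr, then cosine decay down to min_lr.

    Assumes 0 <= step <= total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # ease in: small steps first
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```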

## The Dream Loop's Temperature as Adaptive Noise

The dream loop's generation temperature isn't fixed; it can be adaptive:

```python
def get_dream_temperature(training_progress):
    """Adaptive temperature for dream generation."""
    if training_progress < 0.1:
        return 1.2  # early: very diverse, exploratory
    elif training_progress < 0.5:
        return 0.9  # mid: diverse but focused
    elif training_progress < 0.9:
        return 0.7  # late: targeted, probing edge cases
    else:
        return 0.5  # maintenance: focused on current failures
```

But there's a subtler approach: let the failure rate observed by the training-signal agent set the temperature directly:

```python
def adaptive_temperature(recent_failure_rate):
    """Higher failure rate → higher temperature (explore more)."""
    return 0.5 + 0.7 * recent_failure_rate  # maps [0, 1] onto [0.5, 1.2]
```

If the model is failing a lot (early training), temperature is high: explore broadly to find the right behavioral basin. If the model is mostly succeeding (late training), temperature is low: probe the specific edge cases where it still fails.

This mirrors diffusion sampling: the effective step size tracks how far the current sample is from the target. Far away → big steps. Close → small steps.

## The Self-Organizing Curriculum (Revisited)

Combining this with the zone of proximal development: the dream loop generates scenarios at a difficulty level set by the model's current capability, and the temperature sets their diversity. Adaptive temperature → adaptive diversity → adaptive curriculum.

The curriculum organizes itself:

1. Dream loop generates scenarios (sampling from the model's distribution)
2. Model responds (revealing its current capability level)
3. Failure rate determines temperature (how broadly to explore)
4. Temperature determines dream diversity (easy/hard mix)
5. Training adjusts the model (moving the capability boundary)
6. The next dream cycle generates at the new boundary

This is a closed-loop control system. The curriculum is the feedback loop. No external scheduler is needed: the system tracks the boundary automatically.
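
The six steps can be run as a toy closed loop, with a scalar `capability` standing in for the model and a pass/fail draw standing in for evaluation (everything here is illustrative, not the dream loop's actual code):

```python
import random

def simulate_curriculum(cycles=200, seed=0):
    """Toy feedback loop: failure rate sets temperature, training moves capability."""
    rng = random.Random(seed)
    capability, temperature = 0.2, 1.2
    failures = []
    for _ in range(cycles):
        # Steps 1-2: higher temperature means broader (often harder) scenarios
        difficulty = rng.random() * temperature
        success = capability >= difficulty
        failures.append(not success)
        # Steps 3-4: the recent failure rate sets the next cycle's temperature
        window = failures[-20:]
        temperature = 0.5 + 0.7 * (sum(window) / len(window))
        # Steps 5-6: training nudges capability past the scenarios it failed
        if not success:
            capability += 0.01
    return capability, temperature
```

As capability rises, failures thin out, temperature falls toward 0.5, and the loop settles at the model's current boundary; no scheduler ever appears.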

## The Connection to Hippocampal Replay (Again)

During sleep, the brain doesn't replay memories at a fixed "temperature." Replay is modulated by:

- **Sleep stages**: deep sleep (heavy consolidation, big structural changes) → REM (fine-grained integration, subtle connections)
- **Emotional salience**: emotionally charged memories get more replay
- **Novelty**: new experiences get more replay than familiar ones

This is an adaptive temperature schedule:

- Deep sleep = high temperature (broad consolidation)
- REM = low temperature (fine integration)
- Emotional salience = boosted temperature for specific memories
- Novelty = boosted temperature for new patterns

The brain's sleep architecture is a noise schedule for memory consolidation. Our dream loop should mirror it: high-temperature phases for broad pattern learning, low-temperature phases for subtle integration, and emotional/novelty boosting for important patterns.
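
One way to sketch the salience/novelty boosting is to weight each memory's chance of being replayed (the `salience` and `novelty` fields and the weight coefficients are hypothetical):

```python
import random

def sample_replays(memories, k=5, seed=None):
    """Pick memories for replay, weighted by emotional salience and novelty.

    Each memory is a dict with "salience" and "novelty" in [0, 1];
    the weights below are illustrative. Sampling is with replacement,
    mirroring how a charged memory can be replayed many times.
    """
    rng = random.Random(seed)
    weights = [1.0 + 2.0 * m["salience"] + 1.0 * m["novelty"] for m in memories]
    return rng.choices(memories, weights=weights, k=k)
```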

## Implementation

The dream loop already has phases (cycle timing, adaptive mode). Adding temperature control:

```python
class DreamLoop:
    def __init__(self):
        self.temperature = 1.0
        self.failure_history = []

    def dream_cycle(self):
        # Generate a scenario at the current (exploratory) temperature
        scenario = self.generate_scenario(temperature=self.temperature)

        # The model responds at low temperature (its best effort)
        response = self.model.generate(scenario, temperature=0.1)

        # Evaluate and record the outcome
        success = self.evaluate(response)
        self.failure_history.append(not success)

        # Adapt temperature from the recent failure rate; divide by the
        # actual window length so early cycles aren't underestimated
        window = self.failure_history[-20:]
        self.temperature = 0.5 + 0.7 * (sum(window) / len(window))

        return scenario, response, success
```

The dream scenario uses adaptive temperature (exploration). The model's response uses low temperature (best effort). The temperature adapts to the recent failure rate.

## Summary

Temperature, curriculum difficulty, and noise level are the same control signal. The dream loop's temperature schedule IS the training curriculum. The adaptive version tracks the zone of proximal development automatically. The brain does this with sleep stages. Our system does it with a feedback loop.

No external scheduler. No hand-designed curriculum. Just a closed loop: dream → evaluate → adapt temperature → dream again.

The self-organizing curriculum generates itself from the interaction between the model's capability and the dream loop's temperature. Emergent order from a simple feedback rule.

Like boids. Like ecology. Like the MMORPG. Like everything else we're building.