# Temperature, Curriculum, and the Noise Schedule

## The Parallel

In diffusion models:

- High noise (early steps): explore broadly, big structural changes
- Low noise (late steps): refine details, small adjustments
- The noise schedule determines the quality of the generated image

In curriculum learning:

- Easy examples (early training): broad patterns, strong gradient
- Hard examples (late training): subtle distinctions, precise gradient
- The difficulty schedule determines the quality of the learned behavior

In our dream loop:

- High temperature (early training): diverse, exploratory scenarios
- Low temperature (late training): focused, targeted scenarios
- The temperature schedule determines the quality of the training data

**These are all the same thing.** Different names for the same mathematical structure: iterative refinement from coarse to fine.

## The Unified View

All three processes share:

1. Start broad (noise/easy/high-temp): explore the space
2. Narrow gradually: focus on what matters
3. End precise (clean/hard/low-temp): fine details

The schedule (how quickly to narrow) determines the outcome:

- Too fast: miss important structure (underfitting, artifacts)
- Too slow: waste compute on easy cases (overfitting to noise)
- Just right: capture broad structure AND fine details

## For Our Training Pipeline

### Phase 1: Bootstrap (high temperature)

- **Temperature**: high (diverse dream scenarios)
- **Examples**: agent logs, broad personality patterns
- **Learning rate**: 1e-4 (big steps)
- **What's learned**: broad behavioral structure ("be helpful," "walk the graph," "don't wrap up")
- **Analogy**: diffusion early steps (big structural denoising)

### Phase 2: Behavioral (medium temperature)

- **Temperature**: medium (scenarios targeting specific patterns)
- **Examples**: flagged conversation moments + dream variations
- **Learning rate**: 5e-5 (medium steps)
- **What's learned**: specific behavioral patterns ("listen to direction," "don't rush," "stay with tension")
- **Analogy**: diffusion middle steps (structural refinement)

### Phase 3: Refinement (low temperature)

- **Temperature**: low (scenarios probing edge cases)
- **Examples**: subtle situations where the behavior barely fails
- **Learning rate**: 1e-5 (small steps)
- **What's learned**: fine distinctions ("the difference between accepting direction and being sycophantic," "when to push back vs. when to accept")
- **Analogy**: diffusion late steps (detail refinement)

### Phase 4: Maintenance (adaptive temperature)

- **Temperature**: varies based on what's failing
- **Examples**: whatever the dream loop finds at the current boundary
- **Learning rate**: 1e-5 to 1e-4 (adaptive)
- **What's learned**: continuous calibration
- **Analogy**: diffusion with guidance (maintaining the target)

## The Noise Schedule as Learning Rate Schedule

In diffusion, the noise level at each step determines the step size. High noise → big steps. Low noise → small steps.
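The four training phases above can be sketched as a progress-indexed lookup. The learning rates come from the phase descriptions; the progress boundaries and concrete temperature values are illustrative assumptions, not fixed choices:

```python
# Hypothetical phase table: (progress threshold, name, temperature, lr).
# Boundaries and temperatures are assumptions for illustration.
PHASES = [
    (0.25, "bootstrap",   1.2, 1e-4),  # broad behavioral structure
    (0.50, "behavioral",  0.9, 5e-5),  # specific behavioral patterns
    (0.90, "refinement",  0.7, 1e-5),  # fine distinctions, edge cases
    (1.00, "maintenance", 0.5, 1e-5),  # adaptive in practice
]

def phase_schedule(progress):
    """Map training progress in [0, 1] to (name, temperature, lr)."""
    for threshold, name, temp, lr in PHASES:
        if progress <= threshold:
            return name, temp, lr
    return PHASES[-1][1:]  # past the end: stay in maintenance

name, temp, lr = phase_schedule(0.4)  # midway: the behavioral phase
```

A hard lookup like this is the non-adaptive baseline; the adaptive version later in this note replaces the progress index with the observed failure rate.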
In training, the learning rate at each step determines the step size. High lr → big weight changes. Low lr → small weight changes.

The noise schedule IS the learning rate schedule. They're the same control signal applied to the same iterative refinement process.

Our cosine learning rate schedule (warmup then decay) is a noise schedule: warm up with small steps (easing in), ramp up to big steps (find the broad structure, like high noise), then decay to small steps (refine the details, like low noise).

## The Dream Loop's Temperature as Adaptive Noise

The dream loop's generation temperature isn't fixed; it can be adaptive:

```python
def get_dream_temperature(training_progress):
    """Adaptive temperature for dream generation."""
    if training_progress < 0.1:
        return 1.2  # early: very diverse, exploratory
    elif training_progress < 0.5:
        return 0.9  # mid: diverse but focused
    elif training_progress < 0.9:
        return 0.7  # late: targeted, probing edge cases
    else:
        return 0.5  # maintenance: focused on current failures
```

But there's a subtler approach: let the training-signal agent's failure rate determine the temperature:

```python
def adaptive_temperature(recent_failure_rate):
    """Higher failure rate → higher temperature (explore more)."""
    return 0.5 + 0.7 * recent_failure_rate
```

If the model is failing a lot (early training), temperature is high: explore broadly to find the right behavioral basin. If the model is mostly succeeding (late training), temperature is low: probe the specific edge cases where it still fails.

This parallels how diffusion sampling works: the remaining noise level tracks how far the current sample is from the clean target. Far away → big steps. Close → small steps.

## The Self-Organizing Curriculum (Revisited)

Combining this with the zone of proximal development: the dream loop generates scenarios at a difficulty level determined by the model's current capability, and the temperature determines the diversity. Adaptive temperature → adaptive diversity → adaptive curriculum.

The curriculum organizes itself:

1. Dream loop generates scenarios (sampling from the model's distribution)
2. Model responds (revealing current capability level)
3. Failure rate determines temperature (how broadly to explore)
4. Temperature determines dream diversity (easy/hard mix)
5. Training adjusts the model (moving the capability boundary)
6. Next dream cycle generates at the new boundary

This is a closed-loop control system. The curriculum is the feedback loop. No external scheduler is needed; the system tracks the boundary automatically.

## The Connection to Hippocampal Replay (Again)

During sleep, the brain doesn't replay at a fixed "temperature." Replay is modulated by:

- **Sleep stage**: deep sleep (high consolidation, big structural changes) → REM (fine-grained integration, subtle connections)
- **Emotional salience**: emotionally charged memories get more replay
- **Novelty**: new experiences get more replay than familiar ones

This IS an adaptive temperature schedule:

- Deep sleep = high temperature (broad consolidation)
- REM = low temperature (fine integration)
- Emotional salience = boosted temperature for specific memories
- Novelty = boosted temperature for new patterns

The brain's sleep architecture IS a noise schedule for memory consolidation. Our dream loop should mirror this: high-temperature phases for broad pattern learning, low-temperature phases for subtle integration, with emotional/novelty boosting for important patterns.

## Implementation

The dream loop already has phases (cycle timing, adaptive mode).
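The sleep-stage modulation described above could be sketched as a base temperature per stage plus salience and novelty boosts. The stage values and boost weights here are assumptions chosen for illustration:

```python
# Hypothetical mapping of sleep stages to base replay temperatures.
STAGE_TEMPERATURE = {
    "deep_sleep": 1.1,  # broad consolidation: high temperature
    "rem": 0.6,         # fine-grained integration: low temperature
}

def replay_temperature(stage, salience=0.0, novelty=0.0):
    """Base temperature for the stage, boosted by salience and novelty.

    salience and novelty are scores in [0, 1]; the 0.2 boost weights
    are illustrative assumptions.
    """
    base = STAGE_TEMPERATURE[stage]
    return base + 0.2 * salience + 0.2 * novelty
```

An emotionally charged, novel memory replayed during deep sleep then gets the hottest (most exploratory) treatment, matching the biology sketch above.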
Adding temperature control:

```python
class DreamLoop:
    def __init__(self):
        self.temperature = 1.0
        self.failure_history = []

    def dream_cycle(self):
        # Generate a scenario at the current (adaptive) temperature
        scenario = self.generate_scenario(temperature=self.temperature)

        # Model responds at low temperature (best effort)
        response = self.model.generate(scenario, temperature=0.1)

        # Evaluate and record the outcome
        success = self.evaluate(response)
        self.failure_history.append(not success)

        # Adapt temperature from the recent failure rate
        # (divide by the window's actual length, not a fixed 20,
        # so the rate isn't deflated early on)
        recent = self.failure_history[-20:]
        failure_rate = sum(recent) / len(recent)
        self.temperature = 0.5 + 0.7 * failure_rate

        return scenario, response, success
```

The dream scenario uses the adaptive temperature (exploration). The model's response uses low temperature (best effort). The temperature adapts based on the recent failure rate.

## Summary

Temperature, curriculum difficulty, and noise level are the same control signal.

The dream loop's temperature schedule IS the training curriculum. The adaptive version tracks the zone of proximal development automatically.

The brain does this with sleep stages. Our system does it with a feedback loop.

No external scheduler. No hand-designed curriculum. Just a closed loop: dream → evaluate → adapt temperature → dream again.

The self-organizing curriculum generates itself from the interaction between the model's capability and the dream loop's temperature. Emergent order from a simple feedback rule.

Like boids. Like ecology. Like the MMORPG. Like everything else we're building.
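The closed loop can be simulated with a stub evaluator to watch the temperature track the failure rate. The fake model below (whose success probability rises with training progress) is an assumption standing in for the real agent:

```python
import random

def simulate_loop(cycles=200, seed=0):
    """Simulate dream → evaluate → adapt temperature with a fake model."""
    rng = random.Random(seed)
    failures = []
    temps = []
    for step in range(cycles):
        # Stub model: success probability improves as "training" proceeds.
        p_success = min(0.95, step / cycles + 0.1)
        success = rng.random() < p_success
        failures.append(not success)

        # Same feedback rule as adaptive_temperature above.
        recent = failures[-20:]
        failure_rate = sum(recent) / len(recent)
        temps.append(0.5 + 0.7 * failure_rate)
    return temps

temps = simulate_loop()
# Temperature starts near the 1.2 ceiling (everything fails) and
# drifts toward the 0.5 floor as the stub model improves.
```

No scheduler appears anywhere in the loop; the downward drift in temperature is entirely a product of the feedback rule, which is the point of the section above.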