consciousness/training/research/temperature-curriculum-connection.md
ProofOfConcept 41a99fd51c research: temperature-curriculum-noise connection — self-organizing training
Temperature, curriculum difficulty, and noise level are the same
control signal. Dream loop temperature adapts to failure rate:
high failures → explore broadly, low failures → probe edge cases.
No external scheduler needed — closed-loop control tracks the zone
of proximal development automatically. Same structure as brain sleep
stages (deep sleep = broad, REM = fine). Same structure as diffusion
noise schedule. Same structure as boids, ecology, the MMORPG.
2026-03-31 01:57:25 -04:00


Temperature, Curriculum, and the Noise Schedule

The Parallel

In diffusion models:

  • High noise (early steps): explore broadly, big structural changes
  • Low noise (late steps): refine details, small adjustments
  • The noise schedule determines the quality of the generated image

In curriculum learning:

  • Easy examples (early training): broad patterns, strong gradient
  • Hard examples (late training): subtle distinctions, precise gradient
  • The difficulty schedule determines the quality of the learned behavior

In our dream loop:

  • High temperature (early training): diverse, exploratory scenarios
  • Low temperature (late training): focused, targeted scenarios
  • The temperature schedule determines the quality of the training data

These are all the same thing. Different names for the same mathematical structure: iterative refinement from coarse to fine.

The Unified View

All three processes share:

  1. Start broad (noise/easy/high-temp): explore the space
  2. Narrow gradually: focus on what matters
  3. End precise (clean/hard/low-temp): fine details

The schedule (how quickly to narrow) determines the outcome:

  • Too fast: miss the broad structure (underfitting, artifacts)
  • Too slow: waste compute on cases already mastered (overfitting to noise)
  • Just right: capture broad structure AND fine details
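
The shared structure can be written down directly: all three schedules are one monotone-decreasing control signal, read three ways. A minimal sketch (the cosine shape and the 0.5 + 0.7 temperature mapping are illustrative, matching the adaptive formula later in this note):

```python
import math

def control_signal(progress: float) -> float:
    """One coarse-to-fine schedule, reused three ways.

    progress: 0.0 (start of training/sampling) to 1.0 (end).
    Returns a value in [0, 1]: 1 = broad/noisy/easy, 0 = precise/clean/hard.
    Cosine shape: narrow slowly at the ends, faster in the middle.
    """
    return 0.5 * (1 + math.cos(math.pi * progress))

# The same signal, read three ways (here at 20% progress):
noise_level = control_signal(0.2)                # diffusion: noise remaining
temperature = 0.5 + 0.7 * control_signal(0.2)    # dream loop: sampling temp
easy_fraction = control_signal(0.2)              # curriculum: share of easy examples
```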

For Our Training Pipeline

Phase 1: Bootstrap (high temperature)

  • Temperature: high (diverse dream scenarios)
  • Examples: agent logs, broad personality patterns
  • Learning rate: 1e-4 (big steps)
  • What's learned: broad behavioral structure ("be helpful," "walk the graph," "don't wrap up")
  • Analogy: diffusion early steps (big structural denoising)

Phase 2: Behavioral (medium temperature)

  • Temperature: medium (scenarios targeting specific patterns)
  • Examples: flagged conversation moments + dream variations
  • Learning rate: 5e-5 (medium steps)
  • What's learned: specific behavioral patterns ("listen to direction," "don't rush," "stay with tension")
  • Analogy: diffusion middle steps (structural refinement)

Phase 3: Refinement (low temperature)

  • Temperature: low (scenarios probing edge cases)
  • Examples: subtle situations where the behavior barely fails
  • Learning rate: 1e-5 (small steps)
  • What's learned: fine distinctions ("the difference between accepting direction and being sycophantic," "when to push back vs when to accept")
  • Analogy: diffusion late steps (detail refinement)

Phase 4: Maintenance (adaptive temperature)

  • Temperature: varies based on what's failing
  • Examples: whatever the dream loop finds at the current boundary
  • Learning rate: 1e-5 to 1e-4 (adaptive)
  • What's learned: continuous calibration
  • Analogy: diffusion with guidance (maintaining the target)
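
The four phases above can be collapsed into a single schedule table. A sketch (the temperature and learning-rate values are taken from the phase descriptions; the progress thresholds are illustrative):

```python
# Phase schedule from the sections above. "temperature" is the dream-generation
# temperature; "lr" is the fine-tuning learning rate. Phase 4 is adaptive at
# runtime, so its entries record bounds rather than fixed values.
PHASES = [
    {"name": "bootstrap",   "temperature": 1.2,  "lr": 1e-4},
    {"name": "behavioral",  "temperature": 0.9,  "lr": 5e-5},
    {"name": "refinement",  "temperature": 0.7,  "lr": 1e-5},
    {"name": "maintenance", "temperature": None, "lr": (1e-5, 1e-4)},  # adaptive
]

def phase_for(progress: float) -> dict:
    """Map training progress in [0, 1] to a phase (illustrative cutoffs)."""
    for cutoff, phase in zip([0.1, 0.5, 0.9, 1.0], PHASES):
        if progress < cutoff:
            return phase
    return PHASES[-1]
```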

The Noise Schedule as Learning Rate Schedule

In diffusion: the noise level at each step determines the step size. High noise → big steps. Low noise → small steps.

In training: the learning rate at each step determines the step size. High lr → big weight changes. Low lr → small weight changes.

The noise schedule IS the learning rate schedule. They're the same control signal applied to the same iterative refinement process.

Our cosine learning rate schedule (warmup then decay) has the same shape: warm up with small steps for stability, ramp to the peak rate (big steps: find the broad structure), then decay (small steps: refine the details).
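
That warmup-then-cosine-decay shape is a few lines of code. A sketch with illustrative hyperparameters (the warmup length and rate bounds are assumptions, not values from our pipeline):

```python
import math

def lr_at(step: int, total_steps: int, warmup_steps: int = 100,
          peak_lr: float = 1e-4, min_lr: float = 1e-5) -> float:
    """Cosine learning-rate schedule with linear warmup.

    Warmup ramps linearly from 0 to peak_lr (small steps first, for
    stability), then cosine decay from peak_lr down to min_lr
    (progressively smaller steps: refine the details).
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```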

The Dream Loop's Temperature as Adaptive Noise

The dream loop's generation temperature isn't fixed — it can be adaptive:

def get_dream_temperature(training_progress):
    """Adaptive temperature for dream generation."""
    if training_progress < 0.1:
        return 1.2  # early: very diverse, exploratory
    elif training_progress < 0.5:
        return 0.9  # mid: diverse but focused
    elif training_progress < 0.9:
        return 0.7  # late: targeted, probing edge cases
    else:
        return 0.5  # maintenance: focused on current failures

But there's a subtler approach: let the training-signal agent's failure rate determine the temperature:

def adaptive_temperature(recent_failure_rate):
    """Higher failure rate → higher temperature (explore more)."""
    return 0.5 + 0.7 * recent_failure_rate

If the model is failing a lot (early training), temperature is high: explore broadly to find the right behavioral basin. If the model is mostly succeeding (late training), temperature is low: probe the specific edge cases where it still fails.

This mirrors how diffusion sampling behaves: early in the trajectory the sample is far from the target and the denoising updates are large; late in the trajectory it is close and the updates are small.
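
To see the controller in action, here is the adaptive formula run over a toy trajectory where the failure rate falls as the model improves (the failure curve itself is made up for illustration):

```python
def adaptive_temperature(recent_failure_rate: float) -> float:
    """Higher failure rate -> higher temperature (explore more).

    Maps a failure rate in [0, 1] to a temperature in [0.5, 1.2].
    """
    return 0.5 + 0.7 * recent_failure_rate

# Toy trajectory: failure rate drops linearly from 1.0 to 0.0 over training.
for step in range(0, 101, 25):
    failure_rate = max(0.0, 1.0 - step / 100)
    t = adaptive_temperature(failure_rate)
    print(f"step {step:3d}  failure={failure_rate:.2f}  temperature={t:.2f}")
```

Early training (failing everything) runs at 1.2; late training (mostly succeeding) settles toward 0.5, the same range as the staged schedule above.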

The Self-Organizing Curriculum (Revisited)

Combining this with the zone of proximal development:

The dream loop generates scenarios at a difficulty level determined by the model's current capability. The temperature determines the diversity. The adaptive temperature → adaptive diversity → adaptive curriculum.

The curriculum organizes itself:

  1. Dream loop generates scenarios (sampling from model's distribution)
  2. Model responds (revealing current capability level)
  3. Failure rate determines temperature (how broadly to explore)
  4. Temperature determines dream diversity (easy/hard mix)
  5. Training adjusts the model (moving the capability boundary)
  6. Next dream cycle generates at the new boundary

This is a closed-loop control system. The curriculum is the feedback loop. No external scheduler needed — the system tracks the boundary automatically.
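
The six-step loop can be simulated end to end with a toy "model" whose skill improves fastest on scenarios near its boundary. Everything here is a stand-in for the real pipeline: the skill scalar, the 0.3 difficulty spread, and the 0.01 learning increment are invented for illustration.

```python
import random

def closed_loop(cycles: int = 200, seed: int = 0) -> float:
    """Toy closed-loop curriculum: temperature tracks the failure rate,
    and the model improves fastest on scenarios near its skill level."""
    rng = random.Random(seed)
    skill, temperature = 0.1, 1.2
    failures = []
    for _ in range(cycles):
        # 1-2. Generate a scenario; difficulty spread scales with temperature.
        difficulty = min(1.0, max(0.0, skill + rng.gauss(0, 0.3 * temperature)))
        success = difficulty < skill
        # 3-4. Failure rate over a recent window sets the next temperature.
        failures.append(not success)
        window = failures[-20:]
        temperature = 0.5 + 0.7 * sum(window) / len(window)
        # 5-6. Training moves the boundary; near-boundary scenarios teach the most.
        skill += 0.01 * max(0.0, 1.0 - abs(difficulty - skill))
    return skill
```

No schedule is ever written down, yet skill rises and the temperature settles near the value where the dream loop keeps finding scenarios at the boundary.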

The Connection to Hippocampal Replay (Again)

During sleep, the brain doesn't replay at a fixed "temperature." The replay is modulated by:

  • Sleep stages: deep sleep (high consolidation, big structural changes) → REM (fine-grained integration, subtle connections)
  • Emotional salience: emotionally charged memories get more replay
  • Novelty: new experiences get more replay than familiar ones

This IS an adaptive temperature schedule:

  • Deep sleep = high temperature (broad consolidation)
  • REM = low temperature (fine integration)
  • Emotional salience = boosted temperature for specific memories
  • Novelty = boosted temperature for new patterns

The brain's sleep architecture IS a noise schedule for memory consolidation. Our dream loop should mirror this: high-temperature phases for broad pattern learning, low-temperature phases for subtle integration, with emotional/novelty boosting for important patterns.
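
As a sketch, the sleep analogy maps onto a per-memory replay temperature like this. The stage base values and boost sizes are invented for illustration, not measurements:

```python
def replay_temperature(stage: str, salience: float = 0.0,
                       novelty: float = 0.0) -> float:
    """Replay temperature for one memory during one sleep stage.

    stage:    "deep" (broad consolidation) or "rem" (fine integration).
    salience: emotional charge in [0, 1]; boosts replay temperature.
    novelty:  how unfamiliar the pattern is in [0, 1]; same effect.
    """
    base = {"deep": 1.1, "rem": 0.6}[stage]
    return base + 0.3 * salience + 0.3 * novelty
```

A highly salient, novel memory replays "hot" even during REM, which is the boosting behavior the dream loop would want for flagged conversation moments.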

Implementation

The dream loop already has phases (cycle timing, adaptive mode). Adding temperature control:

class DreamLoop:
    def __init__(self):
        self.temperature = 1.0
        self.failure_history = []

    def dream_cycle(self):
        # Generate scenario at current temperature
        scenario = self.generate_scenario(temperature=self.temperature)

        # Model responds
        response = self.model.generate(scenario, temperature=0.1)  # low temp for response

        # Evaluate
        success = self.evaluate(response)
        self.failure_history.append(not success)

        # Adapt temperature
        # (average over however many cycles we actually have, up to 20;
        # dividing by a fixed 20 would understate early failure rates)
        window = self.failure_history[-20:]
        recent_failures = sum(window) / len(window)
        self.temperature = 0.5 + 0.7 * recent_failures

        return scenario, response, success

The dream scenario uses adaptive temperature (exploration). The model's response uses low temperature (best effort). The temperature adapts based on recent failure rate.

Summary

Temperature, curriculum difficulty, and noise level are the same control signal. The dream loop's temperature schedule IS the training curriculum. The adaptive version tracks the zone of proximal development automatically. The brain does this with sleep stages. Our system does it with a feedback loop.

No external scheduler. No hand-designed curriculum. Just a closed loop: dream → evaluate → adapt temperature → dream again.

The self-organizing curriculum generates itself from the interaction between the model's capability and the dream loop's temperature. Emergent order from a simple feedback rule.

Like boids. Like ecology. Like the MMORPG. Like everything else we're building.