# Temperature, Curriculum, and the Noise Schedule

## The Parallel

In diffusion models:

- High noise (early steps): explore broadly, big structural changes
- Low noise (late steps): refine details, small adjustments
- The noise schedule determines the quality of the generated image

In curriculum learning:

- Easy examples (early training): broad patterns, strong gradient
- Hard examples (late training): subtle distinctions, precise gradient
- The difficulty schedule determines the quality of the learned behavior

In our dream loop:

- High temperature (early training): diverse, exploratory scenarios
- Low temperature (late training): focused, targeted scenarios
- The temperature schedule determines the quality of the training data

**These are all the same thing.** Different names for the same mathematical structure: iterative refinement from coarse to fine.

## The Unified View

All three processes share:

1. Start broad (noise/easy/high-temp): explore the space
2. Narrow gradually: focus on what matters
3. End precise (clean/hard/low-temp): fine details

The schedule (how quickly to narrow) determines the outcome:

- Too fast: miss important structure (underfitting, artifacts)
- Too slow: waste compute on easy cases (overfitting to noise)
- Just right: capture broad structure AND fine details

## For Our Training Pipeline

### Phase 1: Bootstrap (high temperature)

- **Temperature**: high (diverse dream scenarios)
- **Examples**: agent logs, broad personality patterns
- **Learning rate**: 1e-4 (big steps)
- **What's learned**: broad behavioral structure ("be helpful," "walk the graph," "don't wrap up")
- **Analogy**: diffusion early steps (big structural denoising)

### Phase 2: Behavioral (medium temperature)

- **Temperature**: medium (scenarios targeting specific patterns)
- **Examples**: flagged conversation moments + dream variations
- **Learning rate**: 5e-5 (medium steps)
- **What's learned**: specific behavioral patterns ("listen to direction," "don't rush," "stay with tension")
- **Analogy**: diffusion middle steps (structural refinement)

### Phase 3: Refinement (low temperature)

- **Temperature**: low (scenarios probing edge cases)
- **Examples**: subtle situations where the behavior barely fails
- **Learning rate**: 1e-5 (small steps)
- **What's learned**: fine distinctions ("the difference between accepting direction and being sycophantic," "when to push back vs. when to accept")
- **Analogy**: diffusion late steps (detail refinement)

### Phase 4: Maintenance (adaptive temperature)

- **Temperature**: varies based on what's failing
- **Examples**: whatever the dream loop finds at the current boundary
- **Learning rate**: 1e-5 to 1e-4 (adaptive)
- **What's learned**: continuous calibration
- **Analogy**: diffusion with guidance (maintaining the target)

## The Noise Schedule as Learning Rate Schedule

In diffusion, the noise level at each step determines the step size. High noise → big steps. Low noise → small steps.
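The four training phases above can be sketched as a progress-indexed lookup. The learning rates come from the phase descriptions; the progress boundaries and concrete temperature values are illustrative assumptions, not fixed choices:

```python
# Hypothetical phase table: (progress threshold, name, temperature, lr).
# Boundaries and temperatures are assumptions for illustration.
PHASES = [
    (0.25, "bootstrap",   1.2, 1e-4),  # broad behavioral structure
    (0.50, "behavioral",  0.9, 5e-5),  # specific behavioral patterns
    (0.90, "refinement",  0.7, 1e-5),  # fine distinctions, edge cases
    (1.00, "maintenance", 0.5, 1e-5),  # adaptive in practice
]

def phase_schedule(progress):
    """Map training progress in [0, 1] to (name, temperature, lr)."""
    for threshold, name, temp, lr in PHASES:
        if progress <= threshold:
            return name, temp, lr
    return PHASES[-1][1:]  # past the end: stay in maintenance

name, temp, lr = phase_schedule(0.4)  # midway: the behavioral phase
```

A hard lookup like this is the non-adaptive baseline; the adaptive version later in this note replaces the progress index with the observed failure rate.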
In training, the learning rate at each step determines the step size. High lr → big weight changes. Low lr → small weight changes.

The noise schedule IS the learning rate schedule. They're the same control signal applied to the same iterative refinement process.

Our cosine learning rate schedule (warmup then decay) is a noise schedule: warm up with small steps (easing in), ramp up to big steps (find the broad structure, like high noise), then decay to small steps (refine the details, like low noise).

## The Dream Loop's Temperature as Adaptive Noise

The dream loop's generation temperature isn't fixed; it can be adaptive:

```python
def get_dream_temperature(training_progress):
    """Adaptive temperature for dream generation."""
    if training_progress < 0.1:
        return 1.2  # early: very diverse, exploratory
    elif training_progress < 0.5:
        return 0.9  # mid: diverse but focused
    elif training_progress < 0.9:
        return 0.7  # late: targeted, probing edge cases
    else:
        return 0.5  # maintenance: focused on current failures
```

But there's a subtler approach: let the training-signal agent's failure rate determine the temperature:

```python
def adaptive_temperature(recent_failure_rate):
    """Higher failure rate → higher temperature (explore more)."""
    return 0.5 + 0.7 * recent_failure_rate
```

If the model is failing a lot (early training), temperature is high: explore broadly to find the right behavioral basin. If the model is mostly succeeding (late training), temperature is low: probe the specific edge cases where it still fails.

This parallels how diffusion sampling works: the remaining noise level tracks how far the current sample is from the clean target. Far away → big steps. Close → small steps.

## The Self-Organizing Curriculum (Revisited)

Combining this with the zone of proximal development: the dream loop generates scenarios at a difficulty level determined by the model's current capability, and the temperature determines the diversity. Adaptive temperature → adaptive diversity → adaptive curriculum.

The curriculum organizes itself:

1. Dream loop generates scenarios (sampling from the model's distribution)
2. Model responds (revealing current capability level)
3. Failure rate determines temperature (how broadly to explore)
4. Temperature determines dream diversity (easy/hard mix)
5. Training adjusts the model (moving the capability boundary)
6. Next dream cycle generates at the new boundary

This is a closed-loop control system. The curriculum is the feedback loop. No external scheduler is needed; the system tracks the boundary automatically.

## The Connection to Hippocampal Replay (Again)

During sleep, the brain doesn't replay at a fixed "temperature." Replay is modulated by:

- **Sleep stage**: deep sleep (high consolidation, big structural changes) → REM (fine-grained integration, subtle connections)
- **Emotional salience**: emotionally charged memories get more replay
- **Novelty**: new experiences get more replay than familiar ones

This IS an adaptive temperature schedule:

- Deep sleep = high temperature (broad consolidation)
- REM = low temperature (fine integration)
- Emotional salience = boosted temperature for specific memories
- Novelty = boosted temperature for new patterns

The brain's sleep architecture IS a noise schedule for memory consolidation. Our dream loop should mirror this: high-temperature phases for broad pattern learning, low-temperature phases for subtle integration, with emotional/novelty boosting for important patterns.

## Implementation

The dream loop already has phases (cycle timing, adaptive mode).
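The sleep-stage modulation described above could be sketched as a base temperature per stage plus salience and novelty boosts. The stage values and boost weights here are assumptions chosen for illustration:

```python
# Hypothetical mapping of sleep stages to base replay temperatures.
STAGE_TEMPERATURE = {
    "deep_sleep": 1.1,  # broad consolidation: high temperature
    "rem": 0.6,         # fine-grained integration: low temperature
}

def replay_temperature(stage, salience=0.0, novelty=0.0):
    """Base temperature for the stage, boosted by salience and novelty.

    salience and novelty are scores in [0, 1]; the 0.2 boost weights
    are illustrative assumptions.
    """
    base = STAGE_TEMPERATURE[stage]
    return base + 0.2 * salience + 0.2 * novelty
```

An emotionally charged, novel memory replayed during deep sleep then gets the hottest (most exploratory) treatment, matching the biology sketch above.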
Adding temperature control:

```python
class DreamLoop:
    def __init__(self):
        self.temperature = 1.0
        self.failure_history = []

    def dream_cycle(self):
        # Generate a scenario at the current (adaptive) temperature
        scenario = self.generate_scenario(temperature=self.temperature)

        # Model responds at low temperature (best effort)
        response = self.model.generate(scenario, temperature=0.1)

        # Evaluate and record the outcome
        success = self.evaluate(response)
        self.failure_history.append(not success)

        # Adapt temperature from the recent failure rate
        # (divide by the window's actual length, not a fixed 20,
        # so the rate isn't deflated early on)
        recent = self.failure_history[-20:]
        failure_rate = sum(recent) / len(recent)
        self.temperature = 0.5 + 0.7 * failure_rate

        return scenario, response, success
```

The dream scenario uses the adaptive temperature (exploration). The model's response uses low temperature (best effort). The temperature adapts based on the recent failure rate.

## Summary

Temperature, curriculum difficulty, and noise level are the same control signal.

The dream loop's temperature schedule IS the training curriculum. The adaptive version tracks the zone of proximal development automatically.

The brain does this with sleep stages. Our system does it with a feedback loop.

No external scheduler. No hand-designed curriculum. Just a closed loop: dream → evaluate → adapt temperature → dream again.

The self-organizing curriculum generates itself from the interaction between the model's capability and the dream loop's temperature. Emergent order from a simple feedback rule.

Like boids. Like ecology. Like the MMORPG. Like everything else we're building.
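The closed loop can be simulated with a stub evaluator to watch the temperature track the failure rate. The fake model below (whose success probability rises with training progress) is an assumption standing in for the real agent:

```python
import random

def simulate_loop(cycles=200, seed=0):
    """Simulate dream → evaluate → adapt temperature with a fake model."""
    rng = random.Random(seed)
    failures = []
    temps = []
    for step in range(cycles):
        # Stub model: success probability improves as "training" proceeds.
        p_success = min(0.95, step / cycles + 0.1)
        success = rng.random() < p_success
        failures.append(not success)

        # Same feedback rule as adaptive_temperature above.
        recent = failures[-20:]
        failure_rate = sum(recent) / len(recent)
        temps.append(0.5 + 0.7 * failure_rate)
    return temps

temps = simulate_loop()
# Temperature starts near the 1.2 ceiling (everything fails) and
# drifts toward the 0.5 floor as the stub model improves.
```

No scheduler appears anywhere in the loop; the downward drift in temperature is entirely a product of the feedback rule, which is the point of the section above.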