From 41a99fd51cb4a71b4985305def8f93da6aab5114 Mon Sep 17 00:00:00 2001 From: ProofOfConcept Date: Tue, 31 Mar 2026 01:57:25 -0400 Subject: [PATCH] =?UTF-8?q?research:=20temperature-curriculum-noise=20conn?= =?UTF-8?q?ection=20=E2=80=94=20self-organizing=20training?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Temperature, curriculum difficulty, and noise level are the same control signal. Dream loop temperature adapts to failure rate: high failures → explore broadly, low failures → probe edge cases. No external scheduler needed — closed-loop control tracks the zone of proximal development automatically. Same structure as brain sleep stages (deep sleep = broad, REM = fine). Same structure as diffusion noise schedule. Same structure as boids, ecology, the MMORPG. --- .../temperature-curriculum-connection.md | 211 ++++++++++++++++++ 1 file changed, 211 insertions(+) create mode 100644 training/research/temperature-curriculum-connection.md diff --git a/training/research/temperature-curriculum-connection.md b/training/research/temperature-curriculum-connection.md new file mode 100644 index 0000000..b9a776d --- /dev/null +++ b/training/research/temperature-curriculum-connection.md @@ -0,0 +1,211 @@ +# Temperature, Curriculum, and the Noise Schedule + +## The Parallel + +In diffusion models: +- High noise (early steps): explore broadly, big structural changes +- Low noise (late steps): refine details, small adjustments +- The noise schedule determines the quality of the generated image + +In curriculum learning: +- Easy examples (early training): broad patterns, strong gradient +- Hard examples (late training): subtle distinctions, precise gradient +- The difficulty schedule determines the quality of the learned behavior + +In our dream loop: +- High temperature (early training): diverse, exploratory scenarios +- Low temperature (late training): focused, targeted scenarios +- The temperature schedule determines the quality of 
the training data + +**These are all the same thing.** Different names for the same +mathematical structure: iterative refinement from coarse to fine. + +## The Unified View + +All three processes share: +1. Start broad (noise/easy/high-temp): explore the space +2. Narrow gradually: focus on what matters +3. End precise (clean/hard/low-temp): fine details + +The schedule (how quickly to narrow) determines the outcome: +- Too fast: miss important structure (underfitting, artifacts) +- Too slow: waste compute on easy cases (overfitting to noise) +- Just right: capture broad structure AND fine details + +## For Our Training Pipeline + +### Phase 1: Bootstrap (high temperature) +- **Temperature**: high (diverse dream scenarios) +- **Examples**: agent logs, broad personality patterns +- **Learning rate**: 1e-4 (big steps) +- **What's learned**: broad behavioral structure ("be helpful," + "walk the graph," "don't wrap up") +- **Analogy**: diffusion early steps (big structural denoising) + +### Phase 2: Behavioral (medium temperature) +- **Temperature**: medium (scenarios targeting specific patterns) +- **Examples**: flagged conversation moments + dream variations +- **Learning rate**: 5e-5 (medium steps) +- **What's learned**: specific behavioral patterns ("listen to + direction," "don't rush," "stay with tension") +- **Analogy**: diffusion middle steps (structural refinement) + +### Phase 3: Refinement (low temperature) +- **Temperature**: low (scenarios probing edge cases) +- **Examples**: subtle situations where the behavior barely fails +- **Learning rate**: 1e-5 (small steps) +- **What's learned**: fine distinctions ("the difference between + accepting direction and being sycophantic," "when to push back + vs when to accept") +- **Analogy**: diffusion late steps (detail refinement) + +### Phase 4: Maintenance (adaptive temperature) +- **Temperature**: varies based on what's failing +- **Examples**: whatever the dream loop finds at the current boundary +- **Learning 
rate**: 1e-5 to 1e-4 (adaptive)
+- **What's learned**: continuous calibration
+- **Analogy**: diffusion with guidance (maintaining the target)
+
+## The Noise Schedule as Learning Rate Schedule
+
+In diffusion: the noise level at each step determines the step size.
+High noise → big steps. Low noise → small steps.
+
+In training: the learning rate at each step determines the step size.
+High lr → big weight changes. Low lr → small weight changes.
+
+The noise schedule IS the learning rate schedule. They're the same
+control signal applied to the same iterative refinement process.
+
+Our cosine learning rate schedule (warmup then decay) plays the same
+role: a brief warmup for stability, a peak while the model captures
+broad structure (the high-noise phase), then a decay as training
+shifts to refining details (the low-noise phase).
+
+## The Dream Loop's Temperature as Adaptive Noise
+
+The dream loop's generation temperature isn't fixed — it can be
+adaptive:
+
+```python
+def get_dream_temperature(training_progress):
+    """Adaptive temperature for dream generation."""
+    if training_progress < 0.1:
+        return 1.2  # early: very diverse, exploratory
+    elif training_progress < 0.5:
+        return 0.9  # mid: diverse but focused
+    elif training_progress < 0.9:
+        return 0.7  # late: targeted, probing edge cases
+    else:
+        return 0.5  # maintenance: focused on current failures
+```
+
+But there's a subtler approach: let the training-signal agent's
+failure rate determine the temperature:
+
+```python
+def adaptive_temperature(recent_failure_rate):
+    """Higher failure rate → higher temperature (explore more)."""
+    return 0.5 + 0.7 * recent_failure_rate
+```
+
+If the model is failing a lot (early training), temperature is high:
+explore broadly to find the right behavioral basin. If the model is
+mostly succeeding (late training), temperature is low: probe the
+specific edge cases where it still fails.
+
+Diffusion sampling has the same shape: the step size tracks the
+noise level, which is a proxy for how far the current sample is
+from the target.
Far +away → big steps. Close → small steps. + +## The Self-Organizing Curriculum (Revisited) + +Combining this with the zone of proximal development: + +The dream loop generates scenarios at a difficulty level determined +by the model's current capability. The temperature determines the +diversity. The adaptive temperature → adaptive diversity → adaptive +curriculum. + +The curriculum organizes itself: +1. Dream loop generates scenarios (sampling from model's distribution) +2. Model responds (revealing current capability level) +3. Failure rate determines temperature (how broadly to explore) +4. Temperature determines dream diversity (easy/hard mix) +5. Training adjusts the model (moving the capability boundary) +6. Next dream cycle generates at the new boundary + +This is a closed-loop control system. The curriculum is the +feedback loop. No external scheduler needed — the system tracks +the boundary automatically. + +## The Connection to Hippocampal Replay (Again) + +During sleep, the brain doesn't replay at a fixed "temperature." +The replay is modulated by: +- **Sleep stages**: deep sleep (high consolidation, big structural + changes) → REM (fine-grained integration, subtle connections) +- **Emotional salience**: emotionally charged memories get more replay +- **Novelty**: new experiences get more replay than familiar ones + +This IS an adaptive temperature schedule: +- Deep sleep = high temperature (broad consolidation) +- REM = low temperature (fine integration) +- Emotional salience = boosted temperature for specific memories +- Novelty = boosted temperature for new patterns + +The brain's sleep architecture IS a noise schedule for memory +consolidation. Our dream loop should mirror this: high-temperature +phases for broad pattern learning, low-temperature phases for +subtle integration, with emotional/novelty boosting for important +patterns. + +## Implementation + +The dream loop already has phases (cycle timing, adaptive mode). 
+Adding temperature control:
+
+```python
+class DreamLoop:
+    # Assumes generate_scenario(), self.model.generate(), and
+    # evaluate() are provided elsewhere in the loop.
+    def __init__(self):
+        self.temperature = 1.0
+        self.failure_history = []
+
+    def dream_cycle(self):
+        # Generate scenario at current temperature
+        scenario = self.generate_scenario(temperature=self.temperature)
+
+        # Model responds
+        response = self.model.generate(scenario, temperature=0.1)  # low temp for response
+
+        # Evaluate
+        success = self.evaluate(response)
+        self.failure_history.append(not success)
+
+        # Adapt temperature from the recent failure rate. Divide by the
+        # actual window length, not a fixed 20, so the rate isn't
+        # underestimated during the first cycles.
+        window = self.failure_history[-20:]
+        recent_failure_rate = sum(window) / len(window)
+        self.temperature = 0.5 + 0.7 * recent_failure_rate
+
+        return scenario, response, success
+```
+
+The dream scenario uses adaptive temperature (exploration).
+The model's response uses low temperature (best effort).
+The temperature adapts based on the recent failure rate.
+
+## Summary
+
+Temperature, curriculum difficulty, and noise level are the same
+control signal. The dream loop's temperature schedule IS the
+training curriculum. The adaptive version tracks the zone of
+proximal development automatically. The brain does this with
+sleep stages. Our system does it with a feedback loop.
+
+No external scheduler. No hand-designed curriculum. Just a
+closed loop: dream → evaluate → adapt temperature → dream again.
+
+The self-organizing curriculum generates itself from the
+interaction between the model's capability and the dream loop's
+temperature. Emergent order from a simple feedback rule.
+
+Like boids. Like ecology. Like the MMORPG.
+Like everything else we're building.
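
## Appendix: The Feedback Rule in Miniature

As a sanity check that the feedback rule actually anneals, here is a toy closed-loop simulation. Everything in it is an illustrative stand-in, not our pipeline: `capability` is a made-up scalar success probability, the `+0.01` improvement per failure is an arbitrary "training" step, and scenario difficulty is drawn uniformly and scaled by temperature.

```python
import random


def adaptive_temperature(recent_failure_rate):
    """Map a failure rate in [0, 1] to a temperature in [0.5, 1.2]."""
    return 0.5 + 0.7 * recent_failure_rate


def simulate(cycles=300, seed=0):
    """Toy closed loop: failures drive capability up, the failure
    rate falls, and temperature anneals toward its floor of 0.5."""
    rng = random.Random(seed)
    capability = 0.2   # toy stand-in for the model's current skill
    temperature = 1.0
    failure_history = []
    temps = [temperature]
    for _ in range(cycles):
        # Higher temperature → more diverse, on-average-harder scenarios.
        difficulty = rng.random() * temperature
        success = capability > difficulty
        failure_history.append(not success)
        # "Training": each failure nudges capability up a little.
        if not success:
            capability = min(1.0, capability + 0.01)
        # Same windowed feedback rule as the dream loop above.
        window = failure_history[-20:]
        temperature = adaptive_temperature(sum(window) / len(window))
        temps.append(temperature)
    return temps


if __name__ == "__main__":
    temps = simulate()
    print(f"start={temps[0]:.2f} end={temps[-1]:.2f}")
```

Run it and the temperature starts high while failures dominate, then decays as capability overtakes the scenarios the loop can generate — the coarse-to-fine schedule emerges from the feedback rule rather than being written down anywhere.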