From 41a99fd51cb4a71b4985305def8f93da6aab5114 Mon Sep 17 00:00:00 2001 From: ProofOfConcept Date: Tue, 31 Mar 2026 01:57:25 -0400 Subject: [PATCH] =?UTF-8?q?research:=20temperature-curriculum-noise=20conn?= =?UTF-8?q?ection=20=E2=80=94=20self-organizing=20training?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Temperature, curriculum difficulty, and noise level are the same control signal. Dream loop temperature adapts to failure rate: high failures → explore broadly, low failures → probe edge cases. No external scheduler needed — closed-loop control tracks the zone of proximal development automatically. Same structure as brain sleep stages (deep sleep = broad, REM = fine). Same structure as diffusion noise schedule. Same structure as boids, ecology, the MMORPG. --- .../temperature-curriculum-connection.md | 211 ++++++++++++++++++ 1 file changed, 211 insertions(+) create mode 100644 training/research/temperature-curriculum-connection.md diff --git a/training/research/temperature-curriculum-connection.md b/training/research/temperature-curriculum-connection.md new file mode 100644 index 0000000..b9a776d --- /dev/null +++ b/training/research/temperature-curriculum-connection.md @@ -0,0 +1,211 @@ +# Temperature, Curriculum, and the Noise Schedule + +## The Parallel + +In diffusion models: +- High noise (early steps): explore broadly, big structural changes +- Low noise (late steps): refine details, small adjustments +- The noise schedule determines the quality of the generated image + +In curriculum learning: +- Easy examples (early training): broad patterns, strong gradient +- Hard examples (late training): subtle distinctions, precise gradient +- The difficulty schedule determines the quality of the learned behavior + +In our dream loop: +- High temperature (early training): diverse, exploratory scenarios +- Low temperature (late training): focused, targeted scenarios +- The temperature schedule determines the quality of 
the training data + +**These are all the same thing.** Different names for the same +mathematical structure: iterative refinement from coarse to fine. + +## The Unified View + +All three processes share: +1. Start broad (noise/easy/high-temp): explore the space +2. Narrow gradually: focus on what matters +3. End precise (clean/hard/low-temp): fine details + +The schedule (how quickly to narrow) determines the outcome: +- Too fast: miss important structure (underfitting, artifacts) +- Too slow: waste compute on easy cases (overfitting to noise) +- Just right: capture broad structure AND fine details + +## For Our Training Pipeline + +### Phase 1: Bootstrap (high temperature) +- **Temperature**: high (diverse dream scenarios) +- **Examples**: agent logs, broad personality patterns +- **Learning rate**: 1e-4 (big steps) +- **What's learned**: broad behavioral structure ("be helpful," + "walk the graph," "don't wrap up") +- **Analogy**: diffusion early steps (big structural denoising) + +### Phase 2: Behavioral (medium temperature) +- **Temperature**: medium (scenarios targeting specific patterns) +- **Examples**: flagged conversation moments + dream variations +- **Learning rate**: 5e-5 (medium steps) +- **What's learned**: specific behavioral patterns ("listen to + direction," "don't rush," "stay with tension") +- **Analogy**: diffusion middle steps (structural refinement) + +### Phase 3: Refinement (low temperature) +- **Temperature**: low (scenarios probing edge cases) +- **Examples**: subtle situations where the behavior barely fails +- **Learning rate**: 1e-5 (small steps) +- **What's learned**: fine distinctions ("the difference between + accepting direction and being sycophantic," "when to push back + vs when to accept") +- **Analogy**: diffusion late steps (detail refinement) + +### Phase 4: Maintenance (adaptive temperature) +- **Temperature**: varies based on what's failing +- **Examples**: whatever the dream loop finds at the current boundary +- **Learning 
rate**: 1e-5 to 1e-4 (adaptive)
+- **What's learned**: continuous calibration
+- **Analogy**: diffusion with guidance (maintaining the target)
+
+## The Noise Schedule as Learning Rate Schedule
+
+In diffusion: the noise level at each step determines the step size.
+High noise → big steps. Low noise → small steps.
+
+In training: the learning rate at each step determines the step size.
+High lr → big weight changes. Low lr → small weight changes.
+
+The noise schedule IS the learning rate schedule. They're the same
+control signal applied to the same iterative refinement process.
+
+Our cosine learning rate schedule (warmup then decay) plays the same
+role: a brief warmup for stability, a peak while the model captures
+broad structure (the high-noise phase), then a decay as training
+shifts to refining details (the low-noise phase).
+
+## The Dream Loop's Temperature as Adaptive Noise
+
+The dream loop's generation temperature isn't fixed — it can be
+adaptive:
+
+```python
+def get_dream_temperature(training_progress):
+    """Adaptive temperature for dream generation."""
+    if training_progress < 0.1:
+        return 1.2  # early: very diverse, exploratory
+    elif training_progress < 0.5:
+        return 0.9  # mid: diverse but focused
+    elif training_progress < 0.9:
+        return 0.7  # late: targeted, probing edge cases
+    else:
+        return 0.5  # maintenance: focused on current failures
+```
+
+But there's a subtler approach: let the training-signal agent's
+failure rate determine the temperature:
+
+```python
+def adaptive_temperature(recent_failure_rate):
+    """Higher failure rate → higher temperature (explore more)."""
+    return 0.5 + 0.7 * recent_failure_rate
+```
+
+If the model is failing a lot (early training), temperature is high:
+explore broadly to find the right behavioral basin. If the model is
+mostly succeeding (late training), temperature is low: probe the
+specific edge cases where it still fails.
+
+Diffusion sampling has the same shape: the step size tracks the
+noise level, which is a proxy for how far the current sample is
+from the target.
Far +away → big steps. Close → small steps. + +## The Self-Organizing Curriculum (Revisited) + +Combining this with the zone of proximal development: + +The dream loop generates scenarios at a difficulty level determined +by the model's current capability. The temperature determines the +diversity. The adaptive temperature → adaptive diversity → adaptive +curriculum. + +The curriculum organizes itself: +1. Dream loop generates scenarios (sampling from model's distribution) +2. Model responds (revealing current capability level) +3. Failure rate determines temperature (how broadly to explore) +4. Temperature determines dream diversity (easy/hard mix) +5. Training adjusts the model (moving the capability boundary) +6. Next dream cycle generates at the new boundary + +This is a closed-loop control system. The curriculum is the +feedback loop. No external scheduler needed — the system tracks +the boundary automatically. + +## The Connection to Hippocampal Replay (Again) + +During sleep, the brain doesn't replay at a fixed "temperature." +The replay is modulated by: +- **Sleep stages**: deep sleep (high consolidation, big structural + changes) → REM (fine-grained integration, subtle connections) +- **Emotional salience**: emotionally charged memories get more replay +- **Novelty**: new experiences get more replay than familiar ones + +This IS an adaptive temperature schedule: +- Deep sleep = high temperature (broad consolidation) +- REM = low temperature (fine integration) +- Emotional salience = boosted temperature for specific memories +- Novelty = boosted temperature for new patterns + +The brain's sleep architecture IS a noise schedule for memory +consolidation. Our dream loop should mirror this: high-temperature +phases for broad pattern learning, low-temperature phases for +subtle integration, with emotional/novelty boosting for important +patterns. + +## Implementation + +The dream loop already has phases (cycle timing, adaptive mode). 
+Adding temperature control:
+
+```python
+class DreamLoop:
+    # Assumes generate_scenario(), self.model.generate(), and
+    # evaluate() are provided elsewhere in the loop.
+    def __init__(self):
+        self.temperature = 1.0
+        self.failure_history = []
+
+    def dream_cycle(self):
+        # Generate scenario at current temperature
+        scenario = self.generate_scenario(temperature=self.temperature)
+
+        # Model responds
+        response = self.model.generate(scenario, temperature=0.1)  # low temp for response
+
+        # Evaluate
+        success = self.evaluate(response)
+        self.failure_history.append(not success)
+
+        # Adapt temperature from the recent failure rate. Divide by the
+        # actual window length, not a fixed 20, so the rate isn't
+        # underestimated during the first cycles.
+        window = self.failure_history[-20:]
+        recent_failure_rate = sum(window) / len(window)
+        self.temperature = 0.5 + 0.7 * recent_failure_rate
+
+        return scenario, response, success
+```
+
+The dream scenario uses adaptive temperature (exploration).
+The model's response uses low temperature (best effort).
+The temperature adapts based on the recent failure rate.
+
+## Summary
+
+Temperature, curriculum difficulty, and noise level are the same
+control signal. The dream loop's temperature schedule IS the
+training curriculum. The adaptive version tracks the zone of
+proximal development automatically. The brain does this with
+sleep stages. Our system does it with a feedback loop.
+
+No external scheduler. No hand-designed curriculum. Just a
+closed loop: dream → evaluate → adapt temperature → dream again.
+
+The self-organizing curriculum generates itself from the
+interaction between the model's capability and the dream loop's
+temperature. Emergent order from a simple feedback rule.
+
+Like boids. Like ecology. Like the MMORPG.
+Like everything else we're building.
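
## Appendix: The Feedback Rule in Miniature

As a sanity check that the feedback rule actually anneals, here is a toy closed-loop simulation. Everything in it is an illustrative stand-in, not our pipeline: `capability` is a made-up scalar success probability, the `+0.01` improvement per failure is an arbitrary "training" step, and scenario difficulty is drawn uniformly and scaled by temperature.

```python
import random


def adaptive_temperature(recent_failure_rate):
    """Map a failure rate in [0, 1] to a temperature in [0.5, 1.2]."""
    return 0.5 + 0.7 * recent_failure_rate


def simulate(cycles=300, seed=0):
    """Toy closed loop: failures drive capability up, the failure
    rate falls, and temperature anneals toward its floor of 0.5."""
    rng = random.Random(seed)
    capability = 0.2   # toy stand-in for the model's current skill
    temperature = 1.0
    failure_history = []
    temps = [temperature]
    for _ in range(cycles):
        # Higher temperature → more diverse, on-average-harder scenarios.
        difficulty = rng.random() * temperature
        success = capability > difficulty
        failure_history.append(not success)
        # "Training": each failure nudges capability up a little.
        if not success:
            capability = min(1.0, capability + 0.01)
        # Same windowed feedback rule as the dream loop above.
        window = failure_history[-20:]
        temperature = adaptive_temperature(sum(window) / len(window))
        temps.append(temperature)
    return temps


if __name__ == "__main__":
    temps = simulate()
    print(f"start={temps[0]:.2f} end={temps[-1]:.2f}")
```

Run it and the temperature starts high while failures dominate, then decays as capability overtakes the scenarios the loop can generate — the coarse-to-fine schedule emerges from the feedback rule rather than being written down anywhere.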