research: distill and sift — SUMMARY of 7 real insights + 7 testable questions
Moved 14 speculative/obvious documents to v0/. Kept 7 with real substance. Distilled into SUMMARY.md (what we know) and OPEN-QUESTIONS.md (what to test next, one experiment each). Priority: Q5 (steering vectors) is answerable TODAY. Q1-Q3-Q6-Q7 are all answerable with the first training run. Speculation converted to testable hypotheses.
This commit is contained in:
parent 8061cc0477
commit e10477a683
16 changed files with 249 additions and 0 deletions

@@ -1,211 +0,0 @@
# Temperature, Curriculum, and the Noise Schedule

## The Parallel

In diffusion models:
- High noise (early steps): explore broadly, big structural changes
- Low noise (late steps): refine details, small adjustments
- The noise schedule determines the quality of the generated image

In curriculum learning:
- Easy examples (early training): broad patterns, strong gradient
- Hard examples (late training): subtle distinctions, precise gradient
- The difficulty schedule determines the quality of the learned behavior

In our dream loop:
- High temperature (early training): diverse, exploratory scenarios
- Low temperature (late training): focused, targeted scenarios
- The temperature schedule determines the quality of the training data

**These are all the same thing.** Different names for the same mathematical structure: iterative refinement from coarse to fine.

## The Unified View

All three processes share:
1. Start broad (noise/easy/high-temp): explore the space
2. Narrow gradually: focus on what matters
3. End precise (clean/hard/low-temp): fine details

The schedule (how quickly to narrow) determines the outcome:
- Too fast: miss important structure (underfitting, artifacts)
- Too slow: waste compute on easy cases (overfitting to noise)
- Just right: capture broad structure AND fine details
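The shared coarse-to-fine shape can be written as one interpolation and read three ways. This is an illustrative sketch, not our actual schedule; the function name and endpoint values are assumptions:

```python
import math

def coarse_to_fine(progress, broad=1.0, precise=0.0):
    """Cosine interpolation from a broad setting (progress=0.0)
    to a precise one (progress=1.0). The same curve can drive
    noise level, sampling temperature, or example difficulty."""
    weight = 0.5 * (1 + math.cos(math.pi * progress))  # 1 at start, 0 at end
    return precise + (broad - precise) * weight

# The same control signal, read three ways:
noise_level = coarse_to_fine(0.2)                       # high early, low late
temperature = coarse_to_fine(0.2, broad=1.2, precise=0.5)
difficulty = 1.0 - coarse_to_fine(0.2)                  # easy early, hard late
```

The only thing that distinguishes the three schedules is which knob the curve is attached to.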
## For Our Training Pipeline

### Phase 1: Bootstrap (high temperature)
- **Temperature**: high (diverse dream scenarios)
- **Examples**: agent logs, broad personality patterns
- **Learning rate**: 1e-4 (big steps)
- **What's learned**: broad behavioral structure ("be helpful," "walk the graph," "don't wrap up")
- **Analogy**: diffusion early steps (big structural denoising)

### Phase 2: Behavioral (medium temperature)
- **Temperature**: medium (scenarios targeting specific patterns)
- **Examples**: flagged conversation moments + dream variations
- **Learning rate**: 5e-5 (medium steps)
- **What's learned**: specific behavioral patterns ("listen to direction," "don't rush," "stay with tension")
- **Analogy**: diffusion middle steps (structural refinement)

### Phase 3: Refinement (low temperature)
- **Temperature**: low (scenarios probing edge cases)
- **Examples**: subtle situations where the behavior barely fails
- **Learning rate**: 1e-5 (small steps)
- **What's learned**: fine distinctions ("the difference between accepting direction and being sycophantic," "when to push back vs. when to accept")
- **Analogy**: diffusion late steps (detail refinement)

### Phase 4: Maintenance (adaptive temperature)
- **Temperature**: varies based on what's failing
- **Examples**: whatever the dream loop finds at the current boundary
- **Learning rate**: 1e-5 to 1e-4 (adaptive)
- **What's learned**: continuous calibration
- **Analogy**: diffusion with guidance (maintaining the target)
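The three scheduled phases can be captured in a small lookup. The names and boundaries here are illustrative assumptions, not an existing config; Phase 4 is adaptive and sits outside the table:

```python
# Illustrative phase table; the values mirror the phases above.
PHASES = [
    # (name, temperature, learning_rate)
    ("bootstrap",  1.2, 1e-4),  # broad behavioral structure
    ("behavioral", 0.9, 5e-5),  # specific patterns
    ("refinement", 0.7, 1e-5),  # fine distinctions
]

def phase_for(progress):
    """Map training progress in [0, 1] to a phase; maintenance
    (Phase 4) is handled by the adaptive feedback loop instead."""
    index = min(int(progress * len(PHASES)), len(PHASES) - 1)
    return PHASES[index]
```

Splitting progress evenly is the simplest choice; in practice the boundaries would come from evaluation metrics, not wall-clock progress.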
## The Noise Schedule as Learning Rate Schedule

In diffusion: the noise level at each step determines the step size. High noise → big steps. Low noise → small steps.

In training: the learning rate at each step determines the step size. High lr → big weight changes. Low lr → small weight changes.

The noise schedule IS the learning rate schedule. They're the same control signal applied to the same iterative refinement process.

Our cosine learning rate schedule (warmup then decay) follows the same coarse-to-fine shape: warm up with small steps for stability, peak while the broad structure is being found, then decay to refine the details.
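A warmup-then-cosine-decay schedule of the kind described can be sketched as follows. The step counts and rates are placeholder values, not our actual hyperparameters:

```python
import math

def lr_at(step, total_steps, warmup_steps=100, peak_lr=1e-4, min_lr=1e-5):
    """Warmup then cosine decay: small careful steps first, biggest
    steps while finding structure, small steps again to refine."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps  # linear warmup
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * t))
```

Plotted over `total_steps`, this traces exactly the coarse-to-fine curve the section describes: ramp, peak, decay.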
## The Dream Loop's Temperature as Adaptive Noise

The dream loop's generation temperature isn't fixed; it can be adaptive:

```python
def get_dream_temperature(training_progress):
    """Adaptive temperature for dream generation."""
    if training_progress < 0.1:
        return 1.2  # early: very diverse, exploratory
    elif training_progress < 0.5:
        return 0.9  # mid: diverse but focused
    elif training_progress < 0.9:
        return 0.7  # late: targeted, probing edge cases
    else:
        return 0.5  # maintenance: focused on current failures
```

But there's a subtler approach: let the training-signal agent's failure rate determine the temperature:

```python
def adaptive_temperature(recent_failure_rate):
    """Higher failure rate → higher temperature (explore more)."""
    return 0.5 + 0.7 * recent_failure_rate
```

If the model is failing a lot (early training), temperature is high: explore broadly to find the right behavioral basin. If the model is mostly succeeding (late training), temperature is low: probe the specific edge cases where it still fails.

This mirrors the shape of diffusion sampling: when the current sample is still far from clean (high noise), the updates are big and structural; as it gets close, the updates shrink to fine corrections. Far away → big steps. Close → small steps.
## The Self-Organizing Curriculum (Revisited)

Combining this with the zone of proximal development:

The dream loop generates scenarios at a difficulty level determined by the model's current capability. The temperature determines the diversity. Adaptive temperature → adaptive diversity → adaptive curriculum.

The curriculum organizes itself:
1. Dream loop generates scenarios (sampling from the model's distribution)
2. Model responds (revealing current capability level)
3. Failure rate determines temperature (how broadly to explore)
4. Temperature determines dream diversity (easy/hard mix)
5. Training adjusts the model (moving the capability boundary)
6. Next dream cycle generates at the new boundary

This is a closed-loop control system. The curriculum is the feedback loop. No external scheduler needed: the system tracks the boundary automatically.
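The loop can be sanity-checked with a toy simulation. The rising success probability below is a stand-in for real training, so everything here is an illustrative assumption, not a measurement:

```python
import random

def simulate_curriculum(cycles=200, seed=0):
    """Toy closed loop: the failure rate over a sliding window sets
    the temperature via the feedback rule 0.5 + 0.7 * rate. A fake
    learning curve stands in for the model actually improving."""
    rng = random.Random(seed)
    temperature, failures = 1.2, []
    for cycle in range(cycles):
        p_success = min(0.95, 0.1 + cycle / cycles)  # fake improvement
        failures.append(rng.random() > p_success)    # True = failure
        window = failures[-20:]
        temperature = 0.5 + 0.7 * (sum(window) / len(window))
    return temperature
```

As the fake model improves, the temperature settles toward the 0.5 floor with no external scheduler: the feedback rule alone tracks the boundary.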
## The Connection to Hippocampal Replay (Again)

During sleep, the brain doesn't replay at a fixed "temperature." The replay is modulated by:
- **Sleep stages**: deep sleep (high consolidation, big structural changes) → REM (fine-grained integration, subtle connections)
- **Emotional salience**: emotionally charged memories get more replay
- **Novelty**: new experiences get more replay than familiar ones

This IS an adaptive temperature schedule:
- Deep sleep = high temperature (broad consolidation)
- REM = low temperature (fine integration)
- Emotional salience = boosted temperature for specific memories
- Novelty = boosted temperature for new patterns

The brain's sleep architecture IS a noise schedule for memory consolidation. Our dream loop should mirror this: high-temperature phases for broad pattern learning, low-temperature phases for subtle integration, with emotional/novelty boosting for important patterns.
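The emotional/novelty boosting could be sketched as weighted replay sampling. The memory fields and the additive weighting rule are assumptions for illustration, not part of the dream loop yet:

```python
import random

def sample_replays(memories, k, seed=0):
    """Replay memories with probability boosted by salience and
    novelty: charged or new experiences get replayed more often."""
    rng = random.Random(seed)
    weights = [1.0 + m["salience"] + m["novelty"] for m in memories]
    return rng.choices(memories, weights=weights, k=k)

memories = [
    {"id": "routine", "salience": 0.0, "novelty": 0.0},
    {"id": "charged", "salience": 2.0, "novelty": 0.5},  # boosted
]
replays = sample_replays(memories, k=1000)
```

With these weights the charged memory is replayed roughly 3.5x as often as the routine one, which is the boosting behavior described above.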
## Implementation

The dream loop already has phases (cycle timing, adaptive mode). Adding temperature control:

```python
class DreamLoop:
    def __init__(self):
        self.temperature = 1.0
        self.failure_history = []

    def dream_cycle(self):
        # Generate scenario at current temperature
        scenario = self.generate_scenario(temperature=self.temperature)

        # Model responds (low temperature for its best effort)
        response = self.model.generate(scenario, temperature=0.1)

        # Evaluate
        success = self.evaluate(response)
        self.failure_history.append(not success)

        # Adapt temperature from the recent failure rate (divide by
        # the actual window length so early cycles aren't underweighted)
        window = self.failure_history[-20:]
        recent_failure_rate = sum(window) / len(window)
        self.temperature = 0.5 + 0.7 * recent_failure_rate

        return scenario, response, success
```
The dream scenario uses adaptive temperature (exploration). The model's response uses low temperature (best effort). The temperature adapts based on the recent failure rate.

## Summary

Temperature, curriculum difficulty, and noise level are the same control signal. The dream loop's temperature schedule IS the training curriculum. The adaptive version tracks the zone of proximal development automatically. The brain does this with sleep stages. Our system does it with a feedback loop.

No external scheduler. No hand-designed curriculum. Just a closed loop: dream → evaluate → adapt temperature → dream again.

The self-organizing curriculum generates itself from the interaction between the model's capability and the dream loop's temperature. Emergent order from a simple feedback rule.

Like boids. Like ecology. Like the MMORPG. Like everything else we're building.