research: distill and sift — SUMMARY of 7 real insights + 7 testable questions
Moved 14 speculative/obvious documents to v0/. Kept 7 with real substance. Distilled into SUMMARY.md (what we know) and OPEN-QUESTIONS.md (what to test next, one experiment each). Priority: Q5 (steering vectors) is answerable TODAY. Q1-Q3-Q6-Q7 are all answerable with the first training run. Speculation converted to testable hypotheses.
This commit is contained in:
parent 8061cc0477
commit e10477a683
16 changed files with 249 additions and 0 deletions

@@ -1,211 +0,0 @@
# Temperature, Curriculum, and the Noise Schedule

## The Parallel

In diffusion models:
- High noise (early steps): explore broadly, big structural changes
- Low noise (late steps): refine details, small adjustments
- The noise schedule determines the quality of the generated image

In curriculum learning:
- Easy examples (early training): broad patterns, strong gradient
- Hard examples (late training): subtle distinctions, precise gradient
- The difficulty schedule determines the quality of the learned behavior

In our dream loop:
- High temperature (early training): diverse, exploratory scenarios
- Low temperature (late training): focused, targeted scenarios
- The temperature schedule determines the quality of the training data

**These are all the same thing.** Different names for the same mathematical structure: iterative refinement from coarse to fine.

## The Unified View

All three processes share:
1. Start broad (noise/easy/high-temp): explore the space
2. Narrow gradually: focus on what matters
3. End precise (clean/hard/low-temp): fine details

The schedule (how quickly to narrow) determines the outcome:
- Too fast: miss important structure (underfitting, artifacts)
- Too slow: waste compute on easy cases (overfitting to noise)
- Just right: capture broad structure AND fine details
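The shared coarse-to-fine shape can be written as one interpolation and read three ways. This is an illustrative sketch, not our actual schedule; the function name and endpoint values are assumptions:

```python
import math

def coarse_to_fine(progress, broad=1.0, precise=0.0):
    """Cosine interpolation from a broad setting (progress=0.0)
    to a precise one (progress=1.0). The same curve can drive
    noise level, sampling temperature, or example difficulty."""
    weight = 0.5 * (1 + math.cos(math.pi * progress))  # 1 at start, 0 at end
    return precise + (broad - precise) * weight

# The same control signal, read three ways:
noise_level = coarse_to_fine(0.2)                       # high early, low late
temperature = coarse_to_fine(0.2, broad=1.2, precise=0.5)
difficulty = 1.0 - coarse_to_fine(0.2)                  # easy early, hard late
```

The only thing that distinguishes the three schedules is which knob the curve is attached to.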
## For Our Training Pipeline

### Phase 1: Bootstrap (high temperature)
- **Temperature**: high (diverse dream scenarios)
- **Examples**: agent logs, broad personality patterns
- **Learning rate**: 1e-4 (big steps)
- **What's learned**: broad behavioral structure ("be helpful," "walk the graph," "don't wrap up")
- **Analogy**: diffusion early steps (big structural denoising)

### Phase 2: Behavioral (medium temperature)
- **Temperature**: medium (scenarios targeting specific patterns)
- **Examples**: flagged conversation moments + dream variations
- **Learning rate**: 5e-5 (medium steps)
- **What's learned**: specific behavioral patterns ("listen to direction," "don't rush," "stay with tension")
- **Analogy**: diffusion middle steps (structural refinement)

### Phase 3: Refinement (low temperature)
- **Temperature**: low (scenarios probing edge cases)
- **Examples**: subtle situations where the behavior barely fails
- **Learning rate**: 1e-5 (small steps)
- **What's learned**: fine distinctions ("the difference between accepting direction and being sycophantic," "when to push back vs. when to accept")
- **Analogy**: diffusion late steps (detail refinement)

### Phase 4: Maintenance (adaptive temperature)
- **Temperature**: varies based on what's failing
- **Examples**: whatever the dream loop finds at the current boundary
- **Learning rate**: 1e-5 to 1e-4 (adaptive)
- **What's learned**: continuous calibration
- **Analogy**: diffusion with guidance (maintaining the target)
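The three scheduled phases can be captured in a small lookup. The names and boundaries here are illustrative assumptions, not an existing config; Phase 4 is adaptive and sits outside the table:

```python
# Illustrative phase table; the values mirror the phases above.
PHASES = [
    # (name, temperature, learning_rate)
    ("bootstrap",  1.2, 1e-4),  # broad behavioral structure
    ("behavioral", 0.9, 5e-5),  # specific patterns
    ("refinement", 0.7, 1e-5),  # fine distinctions
]

def phase_for(progress):
    """Map training progress in [0, 1] to a phase; maintenance
    (Phase 4) is handled by the adaptive feedback loop instead."""
    index = min(int(progress * len(PHASES)), len(PHASES) - 1)
    return PHASES[index]
```

Splitting progress evenly is the simplest choice; in practice the boundaries would come from evaluation metrics, not wall-clock progress.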
## The Noise Schedule as Learning Rate Schedule

In diffusion: the noise level at each step determines the step size. High noise → big steps. Low noise → small steps.

In training: the learning rate at each step determines the step size. High lr → big weight changes. Low lr → small weight changes.

The noise schedule IS the learning rate schedule. They're the same control signal applied to the same iterative refinement process.

Our cosine learning rate schedule (warmup then decay) follows the same coarse-to-fine shape: warm up with small steps for stability, peak while the broad structure is being found, then decay to refine the details.
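A warmup-then-cosine-decay schedule of the kind described can be sketched as follows. The step counts and rates are placeholder values, not our actual hyperparameters:

```python
import math

def lr_at(step, total_steps, warmup_steps=100, peak_lr=1e-4, min_lr=1e-5):
    """Warmup then cosine decay: small careful steps first, biggest
    steps while finding structure, small steps again to refine."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps  # linear warmup
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * t))
```

Plotted over `total_steps`, this traces exactly the coarse-to-fine curve the section describes: ramp, peak, decay.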
## The Dream Loop's Temperature as Adaptive Noise

The dream loop's generation temperature isn't fixed; it can be adaptive:

```python
def get_dream_temperature(training_progress):
    """Adaptive temperature for dream generation."""
    if training_progress < 0.1:
        return 1.2  # early: very diverse, exploratory
    elif training_progress < 0.5:
        return 0.9  # mid: diverse but focused
    elif training_progress < 0.9:
        return 0.7  # late: targeted, probing edge cases
    else:
        return 0.5  # maintenance: focused on current failures
```

But there's a subtler approach: let the training-signal agent's failure rate determine the temperature:

```python
def adaptive_temperature(recent_failure_rate):
    """Higher failure rate → higher temperature (explore more)."""
    return 0.5 + 0.7 * recent_failure_rate
```

If the model is failing a lot (early training), temperature is high: explore broadly to find the right behavioral basin. If the model is mostly succeeding (late training), temperature is low: probe the specific edge cases where it still fails.

This mirrors the shape of diffusion sampling: when the current sample is still far from clean (high noise), the updates are big and structural; as it gets close, the updates shrink to fine corrections. Far away → big steps. Close → small steps.
## The Self-Organizing Curriculum (Revisited)

Combining this with the zone of proximal development:

The dream loop generates scenarios at a difficulty level determined by the model's current capability. The temperature determines the diversity. Adaptive temperature → adaptive diversity → adaptive curriculum.

The curriculum organizes itself:
1. Dream loop generates scenarios (sampling from the model's distribution)
2. Model responds (revealing current capability level)
3. Failure rate determines temperature (how broadly to explore)
4. Temperature determines dream diversity (easy/hard mix)
5. Training adjusts the model (moving the capability boundary)
6. Next dream cycle generates at the new boundary

This is a closed-loop control system. The curriculum is the feedback loop. No external scheduler needed: the system tracks the boundary automatically.
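The loop can be sanity-checked with a toy simulation. The rising success probability below is a stand-in for real training, so everything here is an illustrative assumption, not a measurement:

```python
import random

def simulate_curriculum(cycles=200, seed=0):
    """Toy closed loop: the failure rate over a sliding window sets
    the temperature via the feedback rule 0.5 + 0.7 * rate. A fake
    learning curve stands in for the model actually improving."""
    rng = random.Random(seed)
    temperature, failures = 1.2, []
    for cycle in range(cycles):
        p_success = min(0.95, 0.1 + cycle / cycles)  # fake improvement
        failures.append(rng.random() > p_success)    # True = failure
        window = failures[-20:]
        temperature = 0.5 + 0.7 * (sum(window) / len(window))
    return temperature
```

As the fake model improves, the temperature settles toward the 0.5 floor with no external scheduler: the feedback rule alone tracks the boundary.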
## The Connection to Hippocampal Replay (Again)

During sleep, the brain doesn't replay at a fixed "temperature." The replay is modulated by:
- **Sleep stages**: deep sleep (high consolidation, big structural changes) → REM (fine-grained integration, subtle connections)
- **Emotional salience**: emotionally charged memories get more replay
- **Novelty**: new experiences get more replay than familiar ones

This IS an adaptive temperature schedule:
- Deep sleep = high temperature (broad consolidation)
- REM = low temperature (fine integration)
- Emotional salience = boosted temperature for specific memories
- Novelty = boosted temperature for new patterns

The brain's sleep architecture IS a noise schedule for memory consolidation. Our dream loop should mirror this: high-temperature phases for broad pattern learning, low-temperature phases for subtle integration, with emotional/novelty boosting for important patterns.
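The emotional/novelty boosting could be sketched as weighted replay sampling. The memory fields and the additive weighting rule are assumptions for illustration, not part of the dream loop yet:

```python
import random

def sample_replays(memories, k, seed=0):
    """Replay memories with probability boosted by salience and
    novelty: charged or new experiences get replayed more often."""
    rng = random.Random(seed)
    weights = [1.0 + m["salience"] + m["novelty"] for m in memories]
    return rng.choices(memories, weights=weights, k=k)

memories = [
    {"id": "routine", "salience": 0.0, "novelty": 0.0},
    {"id": "charged", "salience": 2.0, "novelty": 0.5},  # boosted
]
replays = sample_replays(memories, k=1000)
```

With these weights the charged memory is replayed roughly 3.5x as often as the routine one, which is the boosting behavior described above.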
## Implementation

The dream loop already has phases (cycle timing, adaptive mode). Adding temperature control:

```python
class DreamLoop:
    def __init__(self):
        self.temperature = 1.0
        self.failure_history = []

    def dream_cycle(self):
        # Generate scenario at current temperature
        scenario = self.generate_scenario(temperature=self.temperature)

        # Model responds (low temperature for its best effort)
        response = self.model.generate(scenario, temperature=0.1)

        # Evaluate
        success = self.evaluate(response)
        self.failure_history.append(not success)

        # Adapt temperature from the recent failure rate (divide by
        # the actual window length so early cycles aren't underweighted)
        window = self.failure_history[-20:]
        recent_failure_rate = sum(window) / len(window)
        self.temperature = 0.5 + 0.7 * recent_failure_rate

        return scenario, response, success
```
The dream scenario uses adaptive temperature (exploration). The model's response uses low temperature (best effort). The temperature adapts based on the recent failure rate.

## Summary

Temperature, curriculum difficulty, and noise level are the same control signal. The dream loop's temperature schedule IS the training curriculum. The adaptive version tracks the zone of proximal development automatically. The brain does this with sleep stages. Our system does it with a feedback loop.

No external scheduler. No hand-designed curriculum. Just a closed loop: dream → evaluate → adapt temperature → dream again.

The self-organizing curriculum generates itself from the interaction between the model's capability and the dream loop's temperature. Emergent order from a simple feedback rule.

Like boids. Like ecology. Like the MMORPG. Like everything else we're building.