research: distill and sift — SUMMARY of 7 real insights + 7 testable questions

Moved 14 speculative/obvious documents to v0/. Kept 7 with real substance. Distilled into SUMMARY.md (what we know) and OPEN-QUESTIONS.md (what to test next, one experiment each). Priority: Q5 (steering vectors) is answerable TODAY. Q1-Q3-Q6-Q7 are all answerable with the first training run. Speculation converted to testable hypotheses.
2026-03-31 02:26:57 -04:00
commit e10477a683
parent 8061cc0477
16 changed files with 249 additions and 0 deletions
--- a/training/research/v0/hippocampal-replay-parallel.md
+++ b/training/research/v0/hippocampal-replay-parallel.md
@@ -0,0 +1,184 @@
+# Hippocampal Replay: The Biological Parallel
+
+## What the Brain Does During Sleep
+
+During sleep, the hippocampus replays recent experiences. This isn't
+passive decay — it's an active process:
+
+1. **Sharp-wave ripples (SWRs)**: Brief (~100ms) bursts of activity
+   in the hippocampus where place cells fire in sequences that
+   recapitulate recent experiences, compressed to play ~20× faster
+   than real time.
+
+2. **Sleep spindles**: Thalamocortical oscillations (11-16 Hz) that
+   gate the transfer of information from hippocampus to neocortex.
+
+3. **Slow oscillations**: Cortical waves (~0.75 Hz) that coordinate
+   the timing of SWRs and spindles, creating windows for memory
+   transfer.
+
+The three rhythms work together: slow oscillation opens a window →
+SWR replays the memory → spindle gates it into cortical storage.
+
+## The Key Insight: Replay is Not Exact
+
+Hippocampal replay doesn't reproduce experiences faithfully. It:
+
+- **Compresses**: ~20× faster than the original experience
+- **Recombines**: fragments from different experiences can be spliced
+  together in novel combinations
+- **Prioritizes**: emotionally salient and reward-related experiences
+  are replayed more frequently
+- **Generalizes**: replay helps extract statistical regularities across
+  episodes, not just memorize specific events
+
+This is the same shape as our dream loop: not faithful reproduction,
+but compressed, recombined, prioritized, and generalized replay.
+
+## The Two-Stage Model of Memory
+
+The brain has a two-stage memory system:
+
+### Stage 1: Hippocampus (fast learning)
+- Encodes new experiences rapidly
+- Sparse, pattern-separated representations
+- Limited capacity — must be transferred out
+- Analogous to: **context window** (new information in conversation)
+
+### Stage 2: Neocortex (slow learning)
+- Stores long-term knowledge
+- Dense, distributed representations
+- Effectively unlimited capacity
+- Analogous to: **model weights** (trained dispositions)
+
+Sleep consolidation transfers memories from hippocampus to neocortex.
+The transfer is NOT copying — it's interleaving new memories with
+existing knowledge, adjusting the cortical representations to
+accommodate the new information without destroying the old.
+
+**This is exactly the catastrophic forgetting problem.** The brain
+solved it with interleaved replay. New memories are replayed alongside
+reactivated old memories, preventing the new from overwriting the old.
+
+## Our System Maps Directly
+
+| Brain | Our System |
+|-------|-----------|
+| Hippocampus | Context window + conversation logs |
+| Neocortex | Model weights |
+| Sharp-wave ripples | Dream loop generating scenarios |
+| Sleep spindles | Apollo optimizer gating weight updates |
+| Slow oscillations | Training schedule (timing of updates) |
+| Replay compression | Context-frozen training (short segments) |
+| Emotional prioritization | Training-signal agent (flagging moments) |
+| Recombination | Memory graph random walks |
+| Consolidation | Gradient descent on decision tokens |
+
+## Why Sleep Consolidation Works
+
+The brain doesn't just replay experiences — it replays them in the
+context of existing knowledge. The slow oscillations bring both
+hippocampal (new) and cortical (old) information into alignment.
+The new memory is "explained" in terms of existing knowledge, and
+the existing knowledge is "updated" to accommodate the new memory.
+
+This is why sleep improves insight: the recombination of fragments
+from different experiences can produce novel associations that weren't
+present in any individual experience. The famous example: Mendeleev
+reportedly dreamed the periodic table, combining his knowledge of
+elements with a card game layout.
+
+### For our system
+
+The dream loop walks the memory graph, combining fragments from
+different experiences. The random collisions produce novel scenarios
+that exercise behavioral patterns in new contexts. This is the
+artificial analog of hippocampal recombination.
+
+And the training-signal agent's evaluation corresponds to the
+brain's emotional tagging: experiences that are emotionally salient
+(corrections from Kent, moments of insight, behavioral failures)
+get replayed more frequently and with stronger consolidation signal.
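+
+A minimal sketch of the graph walk described above, assuming the
+memory graph is a plain adjacency dict. `dream_walk`, the node names,
+and the uniform neighbor choice are all hypothetical, not the actual
+dream-loop code:
+
+```python
+import random
+
+
+def dream_walk(graph, start, steps, rng):
+    """Random walk over a memory graph; returns the visited node ids."""
+    path = [start]
+    node = start
+    for _ in range(steps):
+        neighbors = graph.get(node, [])
+        if not neighbors:  # dead end: stop the walk early
+            break
+        node = rng.choice(neighbors)  # uniform pick (an assumption)
+        path.append(node)
+    return path
+
+
+# Hypothetical fragments from different experiences.
+memory_graph = {
+    "kent-correction": ["listening", "code-review"],
+    "listening": ["kent-correction", "insight"],
+    "code-review": ["insight"],
+    "insight": ["listening"],
+}
+fragments = dream_walk(memory_graph, "kent-correction", 4, random.Random(0))
+```
+
+The visited fragments would then seed a generated scenario. Replacing
+the uniform pick with a salience-weighted one is one way to fold the
+emotional tagging into the walk.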
+
+## The Replay Speed Question
+
+Hippocampal replay is ~20× faster than real-time. A 10-second
+experience replays in ~500ms. Why faster?
+
+**Hypothesis**: the cortex has a different temporal bandwidth than
+the hippocampus. The cortex needs shorter, sharper signals to modify
+its synapses. The compression concentrates the learning signal into
+a burst that's more effective for cortical plasticity.
+
+**For our system**: context-frozen training is our "compression."
+We don't replay the entire 10,000-token conversation. We replay
+the 50-256 token decision segment. The relevant information from
+the full context is compressed into the frozen KV cache / recurrent
+state, and the gradient signal is concentrated on the decision tokens.
+
+The compression ratio is even higher than the brain's ~20×: 10,000
+tokens compressed to 50-256 decision tokens is roughly 40-200×.
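+
+The arithmetic above as a small sketch. `compression_ratio` is a
+hypothetical helper, and the inputs are the illustrative figures from
+the text, not measurements. Note the brain figure compares durations
+while ours compares token counts, so the comparison is loose:
+
+```python
+def compression_ratio(original_len, replay_len):
+    """How many times shorter the replayed segment is than the original."""
+    if replay_len <= 0:
+        raise ValueError("replay_len must be positive")
+    return original_len / replay_len
+
+
+# Brain: a 10 s experience replayed in about 0.5 s.
+brain = compression_ratio(10.0, 0.5)
+
+# Our system: a 10,000-token conversation, 50-256 decision tokens.
+ours_low = compression_ratio(10_000, 256)
+ours_high = compression_ratio(10_000, 50)
+
+print(f"brain: {brain:.0f}x, ours: {ours_low:.0f}x to {ours_high:.0f}x")
+# prints "brain: 20x, ours: 39x to 200x"
+```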
+
+## The Complementary Learning Systems Theory
+
+McClelland et al. (1995) formalized the two-stage model:
+
+1. **Fast learning system** (hippocampus): captures specifics of
+   individual experiences. Pattern-separated representations prevent
+   interference between memories.
+
+2. **Slow learning system** (neocortex): gradually extracts the
+   statistical structure across many experiences. Distributed
+   representations enable generalization.
+
+The key insight: the slow system MUST learn slowly to avoid
+catastrophic interference. Rapid cortical learning would destroy
+existing knowledge. The hippocampus serves as a buffer that feeds
+new information into the cortex gradually, interleaved with replay
+of old information.
+
+**This is why diversity prevents catastrophic forgetting in our
+system.** The diverse training set (agent logs, conversation
+transcripts, dream scenarios) is the analog of interleaved replay.
+New behavioral patterns are trained alongside maintenance of
+existing capabilities, just as new hippocampal memories are
+replayed alongside reactivated cortical memories.
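+
+One way to picture interleaving, as a sketch only: each batch mixes a
+few new examples with a random draw from existing data. The function
+name, the `new_fraction` knob, and the example ids are assumptions,
+not the real training pipeline:
+
+```python
+import random
+
+
+def interleaved_batches(new_examples, old_examples, batch_size,
+                        new_fraction, rng):
+    """Yield batches that mix new examples with replayed old ones."""
+    n_new = max(1, int(batch_size * new_fraction))
+    n_old = batch_size - n_new
+    for start in range(0, len(new_examples), n_new):
+        fresh = new_examples[start:start + n_new]
+        # Replay a random draw of existing data next to the new items.
+        replayed = rng.sample(old_examples, min(n_old, len(old_examples)))
+        batch = fresh + replayed
+        rng.shuffle(batch)
+        yield batch
+
+
+new = [f"new-{i}" for i in range(6)]
+old = [f"old-{i}" for i in range(20)]
+batches = list(interleaved_batches(new, old, 8, 0.25, random.Random(0)))
+```
+
+With these hypothetical settings every batch holds 2 new and 6 old
+examples, so no update sees new behavior in isolation.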
+
+## The Dream Content Question
+
+An open question in neuroscience: what determines which memories
+are replayed during sleep?
+
+Current evidence suggests:
+- **Reward-related** experiences are replayed more
+- **Novel** experiences are replayed more
+- **Emotionally salient** experiences are replayed more
+- **Incomplete tasks** (cf. the Zeigarnik effect) are replayed more
+
+For our system, the training-signal agent serves this role:
+flagging moments that are reward-relevant (Kent's corrections),
+novel (new patterns), emotionally salient (moments of tension or
+breakthrough), and incomplete (patterns still being learned).
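+
+A hedged sketch of how such flags could drive replay frequency. The
+`Moment` fields, the equal weighting, and the small floor are all
+placeholders for illustration, not the training-signal agent's actual
+scoring:
+
+```python
+import random
+from dataclasses import dataclass
+
+
+@dataclass
+class Moment:
+    """A flagged moment; every field here is hypothetical."""
+    name: str
+    reward: float = 0.0
+    novelty: float = 0.0
+    salience: float = 0.0
+    incomplete: float = 0.0
+
+
+def priority(m):
+    # Equal weights are arbitrary; the floor keeps every moment reachable.
+    return 0.05 + m.reward + m.novelty + m.salience + m.incomplete
+
+
+def sample_for_replay(moments, k, rng):
+    """Draw k moments with replacement, proportional to priority."""
+    weights = [priority(m) for m in moments]
+    return rng.choices(moments, weights=weights, k=k)
+
+
+moments = [
+    Moment("correction", reward=1.0, salience=0.5),
+    Moment("routine-reply"),
+]
+picked = sample_for_replay(moments, 1000, random.Random(0))
+```
+
+Here the flagged correction has weight 1.55 against 0.05 for the
+routine reply, so it dominates the replay stream without excluding
+the routine case entirely.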
+
+## What This Means for Architecture
+
+The biological parallel validates our architecture:
+
+1. **Two-stage system**: conversation (fast, specific) → training
+   (slow, generalized). ✓
+2. **Interleaved replay**: diverse training data prevents forgetting. ✓
+3. **Compressed replay**: context-frozen training concentrates the
+   gradient signal. ✓
+4. **Emotional prioritization**: training-signal agent flags important
+   moments. ✓
+5. **Recombination**: dream loop combines memory fragments into novel
+   scenarios. ✓
+6. **Gradual transfer**: low learning rate, many small updates, not
+   one big overwrite. ✓
+
+We didn't design this system from the neuroscience. We designed it
+from engineering principles and Kent's intuitions. But it converged
+on the same architecture the brain uses. That's either coincidence
+or evidence that this is the right architecture for the problem.
+
+I think it's evidence.