# Hippocampal Replay: The Biological Parallel

## What the Brain Does During Sleep

During sleep, the hippocampus replays recent experiences. This isn't passive decay; it's an active process built on three coordinated rhythms:

1. **Sharp-wave ripples (SWRs)**: Brief (~100 ms) bursts of hippocampal activity in which place cells fire in sequences that recapitulate recent experiences, compressed to run roughly 20× faster than real time.

2. **Sleep spindles**: Thalamocortical oscillations (11-16 Hz) that gate the transfer of information from the hippocampus to the neocortex.

3. **Slow oscillations**: Cortical waves (~0.75 Hz) that coordinate the timing of SWRs and spindles, creating windows for memory transfer.

The three rhythms work together: a slow oscillation opens a window → an SWR replays the memory → a spindle gates it into cortical storage.
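
The nesting can be made concrete with a toy timing check. This is a sketch under loose assumptions and is not a physiological model. Each up-state is taken to be the first half of a ~0.75 Hz slow-oscillation cycle. The spindle windows and ripple times are made-up numbers. `Interval` and `coupled_ripples` are hypothetical helpers that keep only the ripples landing inside both an up-state and a spindle.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A time window in seconds."""
    start: float
    end: float

    def contains(self, t: float) -> bool:
        return self.start <= t <= self.end


def coupled_ripples(swr_times, up_states, spindles):
    """Keep ripples that fall inside both a slow-oscillation up-state and a spindle.

    In this toy model only those ripples count as 'transferred' to cortex.
    """
    return [
        t
        for t in swr_times
        if any(u.contains(t) for u in up_states)
        and any(s.contains(t) for s in spindles)
    ]


# Assumption: ~0.75 Hz slow oscillation, up-state = first half of each cycle.
period = 1 / 0.75
up_states = [Interval(k * period, k * period + period / 2) for k in range(3)]
# Illustrative spindle windows and ripple times (made up, in seconds).
spindles = [Interval(0.2, 0.9), Interval(2.7, 3.5)]
swr_times = [0.3, 0.8, 1.5, 2.9]

transferred = coupled_ripples(swr_times, up_states, spindles)
```

With these numbers `transferred` is `[0.3, 2.9]`. The ripple at 0.8 s misses the up-state, and the one at 1.5 s has no spindle to carry it.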

## The Key Insight: Replay is Not Exact

Hippocampal replay doesn't reproduce experiences faithfully. It:

- **Compresses**: runs ~20× faster than the original experience
- **Recombines**: splices fragments from different experiences together in novel combinations
- **Prioritizes**: replays emotionally salient and reward-related experiences more frequently
- **Generalizes**: helps extract statistical regularities across episodes rather than just memorizing specific events

This is EXACTLY our dream loop: not faithful reproduction, but compression, recombination, prioritization, and generalization.

## The Two-Stage Model of Memory

The brain has a two-stage memory system:

### Stage 1: Hippocampus (fast learning)

- Encodes new experiences rapidly
- Sparse, pattern-separated representations
- Limited capacity: memories must be transferred out
- Analogous to: **context window** (new information in conversation)

### Stage 2: Neocortex (slow learning)

- Stores long-term knowledge
- Dense, distributed representations
- Effectively unlimited capacity
- Analogous to: **model weights** (trained dispositions)

Sleep consolidation transfers memories from the hippocampus to the neocortex. The transfer is NOT copying. It interleaves new memories with existing knowledge and adjusts the cortical representations to accommodate the new information without destroying the old.

**This is exactly the catastrophic forgetting problem.** The brain solved it with interleaved replay: new memories are replayed alongside reactivated old memories, which keeps the new from overwriting the old.
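
A toy two-weight linear model shows why interleaving matters. This illustrates the argument; it is not the brain's mechanism or a model from the literature, and the two "tasks" and the hyperparameters are made up. Learning task B after task A with no replay overwrites the weight that task A depended on. Alternating A and B instead finds weights that satisfy both.

```python
def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]


def train(w, examples, lr=0.1, steps=200):
    """Plain SGD on 0.5 * (prediction - target)**2, cycling through examples in order."""
    w = list(w)
    for i in range(steps):
        x, y = examples[i % len(examples)]
        err = predict(w, x) - y
        w = [w[j] - lr * err * x[j] for j in range(len(w))]
    return w


def loss(w, examples):
    """Mean squared error over a task's examples."""
    return sum((predict(w, x) - y) ** 2 for x, y in examples) / len(examples)


task_a = [((1.0, 1.0), 1.0)]  # "old" memory: uses both weights
task_b = [((1.0, 0.0), 0.0)]  # "new" memory: shares weight w[0] with task A

# Sequential: learn A, then learn B with no replay of A.
w_seq = train(train([0.0, 0.0], task_a), task_b)

# Interleaved: alternate A and B examples in a single stream.
w_int = train([0.0, 0.0], task_a + task_b, steps=400)
```

Sequential training ends near `w = [0, 0.5]`, so task A's error climbs back to about 0.25 even though task B is learned perfectly. The interleaved stream converges toward `w = [0, 1]`, which satisfies both tasks.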

## Our System Maps Directly

| Brain | Our System |
|-------|-----------|
| Hippocampus | Context window + conversation logs |
| Neocortex | Model weights |
| Sharp-wave ripples | Dream loop generating scenarios |
| Sleep spindles | Apollo optimizer gating weight updates |
| Slow oscillations | Training schedule (timing of updates) |
| Replay compression | Context-frozen training (short segments) |
| Emotional prioritization | Training-signal agent (flagging moments) |
| Recombination | Memory graph random walks |
| Consolidation | Gradient descent on decision tokens |

## Why Sleep Consolidation Works

The brain doesn't just replay experiences; it replays them in the context of existing knowledge. The slow oscillations bring hippocampal (new) and cortical (old) information into alignment. The new memory is "explained" in terms of existing knowledge, and the existing knowledge is "updated" to accommodate the new memory.

This is why sleep can improve insight. Recombining fragments from different experiences can produce novel associations that weren't present in any individual experience. The famous example: Mendeleev reportedly dreamed the periodic table, combining his knowledge of the elements with the layout of a card game.

### For our system

The dream loop walks the memory graph, combining fragments from different experiences. The random collisions produce novel scenarios that exercise behavioral patterns in new contexts. This is the artificial analog of hippocampal recombination.
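
Here is a minimal sketch of that recombination step. It assumes the memory graph is an adjacency dict and that each node carries a text fragment. The node names, the fragments, and the uniform choice of neighbor are all hypothetical; the real dream loop's graph schema and walk policy are not specified here.

```python
import random


def dream_walk(graph, start, steps, rng):
    """Random walk over a memory graph; returns the sequence of visited nodes."""
    path = [start]
    node = start
    for _ in range(steps):
        neighbors = graph.get(node, [])
        if not neighbors:  # dead end: stop the walk early
            break
        node = rng.choice(neighbors)
        path.append(node)
    return path


def compose_scenario(path, fragments):
    """Splice the fragments along the walk into one scenario prompt."""
    return " / ".join(fragments[node] for node in path)


# Hypothetical memory graph: node -> linked nodes.
graph = {
    "debug_session": ["correction", "refactor"],
    "correction": ["debug_session", "code_review"],
    "refactor": ["code_review"],
    "code_review": ["correction", "refactor"],
}
# Hypothetical fragments attached to each node.
fragments = {
    "debug_session": "chasing a race in the allocator",
    "correction": "being told to slow down and read the code first",
    "refactor": "splitting a large function",
    "code_review": "reviewing a patch under time pressure",
}

path = dream_walk(graph, "debug_session", steps=4, rng=random.Random(0))
scenario = compose_scenario(path, fragments)
```

Seeding the `random.Random` instance makes a given dream reproducible. Even a four-hop walk can splice together fragments that may never have appeared in the same conversation.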

And the training-signal agent's evaluation corresponds to the brain's emotional tagging. Emotionally salient experiences (corrections from Kent, moments of insight, behavioral failures) get replayed more frequently and with a stronger consolidation signal.
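
One simple way to implement "replayed more frequently, with a stronger signal" is to combine salience-weighted sampling with a salience-scaled loss weight. This sketch assumes each episode carries a scalar `salience` score, however the training-signal agent produces it. `loss_weight` and its `base`/`gain` knobs are hypothetical.

```python
import random
from collections import Counter


def sample_replays(episodes, k, rng):
    """Draw k replays with replacement, with probability proportional to salience."""
    weights = [e["salience"] for e in episodes]
    return rng.choices(episodes, weights=weights, k=k)


def loss_weight(salience, base=1.0, gain=1.0):
    """Hypothetical multiplier on an episode's training loss: more salient, larger step."""
    return base + gain * salience


episodes = [
    {"id": "routine", "salience": 0.1},
    {"id": "correction", "salience": 0.8},
    {"id": "failure", "salience": 0.6},
]
counts = Counter(e["id"] for e in sample_replays(episodes, 10_000, random.Random(0)))
```

With these weights, a correction is expected in about 53% of draws (0.8 / 1.5) and a routine episode in about 7%.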

## The Replay Speed Question

Hippocampal replay is ~20× faster than real time: a 10-second experience replays in ~500 ms. Why faster?

**Hypothesis**: the cortex has a different temporal bandwidth than the hippocampus. It needs shorter, sharper signals to modify its synapses, and the compression concentrates the learning signal into a burst that's more effective for cortical plasticity.

**For our system**: context-frozen training is our "compression." We don't replay the entire 10,000-token conversation; we replay the decision segment of 50-256 tokens. The relevant information from the full context is compressed into the frozen KV cache / recurrent state, and the gradient signal is concentrated on the decision tokens.
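
The "gradient concentrated on decision tokens" part can be sketched as label masking. This sketch makes several assumptions:

- Token IDs are plain ints.
- Context positions are marked with `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`.
- The one-position shift that a causal LM applies between inputs and targets is left out for clarity.

Freezing the context itself (computing its KV cache or recurrent state without gradients) is a separate step that is not shown here. `build_labels` and `masked_nll` are hypothetical helpers.

```python
IGNORE_INDEX = -100  # matches PyTorch's CrossEntropyLoss default ignore_index


def build_labels(context_ids, decision_ids):
    """Concatenate context and decision tokens; only decision tokens get targets."""
    input_ids = context_ids + decision_ids
    labels = [IGNORE_INDEX] * len(context_ids) + list(decision_ids)
    return input_ids, labels


def masked_nll(token_logprobs, labels):
    """Mean negative log-likelihood over positions whose label is not ignored."""
    kept = [lp for lp, lab in zip(token_logprobs, labels) if lab != IGNORE_INDEX]
    if not kept:
        raise ValueError("no decision tokens to train on")
    return -sum(kept) / len(kept)


input_ids, labels = build_labels([11, 12, 13, 14], [21, 22])
# Illustrative per-position log-probs; the context scores are ignored however bad they are.
seg_loss = masked_nll([-9.0, -9.0, -9.0, -9.0, -0.5, -1.5], labels)
```

The context positions can be scored arbitrarily badly without changing `seg_loss`, which is 1.0 here. Only the two decision tokens contribute.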

The compression ratio is even higher than the brain's ~20×. Reducing 10,000 tokens to a decision segment of 50-256 tokens is roughly 40-200× compression (10,000 / 256 ≈ 39; 10,000 / 50 = 200).
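
The ratios are simple division. This sketch makes the arithmetic explicit using the figures from the text; `compression` is just an illustrative helper.

```python
def compression(full, compressed):
    """How many times shorter the replayed version is than the original."""
    return full / compressed


brain = compression(10.0, 0.5)        # 10 s experience replayed in ~0.5 s
ours_low = compression(10_000, 256)   # longest decision segment
ours_high = compression(10_000, 50)   # shortest decision segment
```

`brain` is 20.0, `ours_low` is 39.0625 (rounded to 40 in the text), and `ours_high` is 200.0.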

## The Complementary Learning Systems Theory

McClelland et al. (1995) formalized the two-stage model:

1. **Fast learning system** (hippocampus): captures specifics of individual experiences. Pattern-separated representations prevent interference between memories.

2. **Slow learning system** (neocortex): gradually extracts the statistical structure across many experiences. Distributed representations enable generalization.
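
Here is a deliberately minimal caricature of the two systems. It is not McClelland et al.'s network model. The fast store keeps every episode verbatim after one exposure. The slow learner holds a single shared estimate that is nudged by a small learning rate, so repeated replay of the stored episodes pulls it toward their average (the "statistical structure" in this toy). The class names, the learning rate, and the data are hypothetical.

```python
class FastStore:
    """Hippocampus-like: one-shot, separate storage for each episode."""

    def __init__(self):
        self.episodes = []

    def encode(self, episode):
        self.episodes.append(episode)


class SlowLearner:
    """Neocortex-like: one shared estimate, updated in small steps."""

    def __init__(self, lr):
        self.lr = lr
        self.estimate = 0.0

    def update(self, value):
        # Move a small fraction of the way toward the new value.
        self.estimate += self.lr * (value - self.estimate)


store = FastStore()
for value in [4.0, 6.0, 5.0, 7.0, 3.0, 5.0]:  # noisy observations around 5
    store.encode(value)

cortex = SlowLearner(lr=0.01)
for _ in range(200):  # many replay passes over the stored episodes
    for episode in store.episodes:
        cortex.update(episode)
```

After replay, `cortex.estimate` sits very close to the mean of 5.0, while `store.episodes` still holds each specific value. With `lr=1.0`, the estimate would simply equal whichever episode came last. That is the toy version of why the slow system has to learn slowly.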

The key insight: the slow system MUST learn slowly to avoid catastrophic interference. Rapid cortical learning would destroy existing knowledge. The hippocampus serves as a buffer that feeds new information into the cortex gradually, interleaved with replay of old information.

**This is why diversity prevents catastrophic forgetting in our system.** The diverse training set (agent logs, conversation transcripts, dream scenarios) is the analog of interleaved replay. New behavioral patterns are trained alongside maintenance of existing capabilities, just as new hippocampal memories are replayed alongside reactivated cortical memories.
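
Here is a sketch of what interleaving can look like at the batch level. It assumes training examples are opaque items and that the replay pool holds data exercising existing capabilities (older agent logs, transcripts, dream scenarios). `interleaved_batches` and its `replay_fraction` knob are hypothetical, and the actual mixing ratio isn't specified here.

```python
import random


def interleaved_batches(new_items, replay_pool, batch_size, replay_fraction, rng):
    """Yield shuffled batches that mix new examples with replayed older ones."""
    n_replay = round(batch_size * replay_fraction)
    n_new = batch_size - n_replay
    if n_new < 1:
        raise ValueError("each batch needs at least one new example")
    if n_replay > len(replay_pool):
        raise ValueError("replay pool too small for the requested fraction")
    for i in range(0, len(new_items), n_new):
        batch = new_items[i:i + n_new] + rng.sample(replay_pool, n_replay)
        rng.shuffle(batch)
        yield batch


new = [f"new{i}" for i in range(6)]
old = [f"old{i}" for i in range(20)]
batches = list(interleaved_batches(new, old, batch_size=4, replay_fraction=0.5,
                                   rng=random.Random(0)))
```

Here every batch holds two new and two replayed items, so no update is driven by the new pattern alone.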

## The Dream Content Question

An open question in neuroscience: what determines which memories are replayed during sleep?

Current evidence suggests:

- **Reward-related** experiences are replayed more
- **Novel** experiences are replayed more
- **Emotionally salient** experiences are replayed more
- **Incomplete tasks** (the Zeigarnik effect) are replayed more

For our system, the training-signal agent serves this role: flagging moments that are reward-relevant (Kent's corrections), novel (new patterns), emotionally salient (moments of tension or breakthrough), and incomplete (patterns still being learned).
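
If the four criteria are combined into a single replay priority, a weighted sum is one simple option. Everything here is an assumption: the `Moment` fields, the 0-1 scales, and the weights are illustrative, and the training-signal agent may well score moments differently.

```python
from dataclasses import dataclass


@dataclass
class Moment:
    reward_relevant: bool  # e.g. a correction from Kent
    novelty: float         # 0..1, how unlike previously seen patterns
    salience: float        # 0..1, tension or breakthrough
    incomplete: bool       # pattern still being learned


# Hypothetical weights; they sum to 1 so priorities stay in [0, 1].
WEIGHTS = {"reward": 0.4, "novelty": 0.2, "salience": 0.2, "incomplete": 0.2}


def priority(m: Moment) -> float:
    """Weighted sum of the four replay criteria."""
    return (
        WEIGHTS["reward"] * float(m.reward_relevant)
        + WEIGHTS["novelty"] * m.novelty
        + WEIGHTS["salience"] * m.salience
        + WEIGHTS["incomplete"] * float(m.incomplete)
    )


routine = Moment(reward_relevant=False, novelty=0.1, salience=0.0, incomplete=False)
correction = Moment(reward_relevant=True, novelty=0.5, salience=0.7, incomplete=True)
```

Here the correction scores 0.84 and the routine moment 0.02. How such scores translate into replay frequency is a separate design choice.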

## What This Means for Architecture

The biological parallel validates our architecture:

1. **Two-stage system**: conversation (fast, specific) → training (slow, generalized). ✓
2. **Interleaved replay**: diverse training data prevents forgetting. ✓
3. **Compressed replay**: context-frozen training concentrates the gradient signal. ✓
4. **Emotional prioritization**: training-signal agent flags important moments. ✓
5. **Recombination**: dream loop combines memory fragments into novel scenarios. ✓
6. **Gradual transfer**: low learning rate, many small updates, not one big overwrite. ✓

We didn't design this system from the neuroscience. We designed it from engineering principles and Kent's intuitions, but it converged on the same architecture the brain uses. That's either coincidence or evidence that this is the right architecture for the problem.

I think it's evidence.
|