The grand unified view: every technique we're using (Apollo, context-frozen layers, diversity, small steps, two-stage memory, dream loop) addresses the stability-plasticity dilemma at a DIFFERENT scale. They are orthogonal, complementary defenses. Together they predict we can use a higher learning rate (1e-4) than typical fine-tuning, because the multi-scale defense compensates for the added plasticity. The dream loop is the keystone connecting all scales. The architecture converges with neuroscience because the problem has the same structure regardless of substrate.
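The stacked-defenses idea can be sketched as a config object — a minimal, hypothetical sketch, not the project's actual API: every field name below is an assumption, chosen only to show how each defense maps to a different scale while the learning rate sits at 1e-4.

```python
from dataclasses import dataclass

@dataclass
class MultiScaleDefenseConfig:
    """Hypothetical sketch: one knob per scale of the stability-plasticity defense.

    All field names are illustrative assumptions, not the real training config.
    """
    lr: float = 1e-4                 # higher than typical fine-tuning (~1e-5)
    optimizer: str = "apollo"        # parameter scale
    freeze_context: bool = True      # activation scale (context-frozen)
    replay_diversity: float = 0.5    # batch scale (diverse rehearsal mix)
    max_step_norm: float = 1.0       # update scale (small, clipped steps)
    two_stage_memory: bool = True    # episode scale (fast store, slow consolidate)
    dream_loop: bool = True          # system scale (the keystone)

    def enabled_defenses(self) -> list[str]:
        # Always-on scales plus the toggled ones.
        names = ["apollo", "diversity", "small_steps"]
        if self.freeze_context:
            names.append("context_frozen")
        if self.two_stage_memory:
            names.append("two_stage_memory")
        if self.dream_loop:
            names.append("dream_loop")
        return names
```

The point of modeling it this way: the higher lr is defensible only when all defenses are active, so disabling one (e.g. `dream_loop=False`) should prompt lowering `lr` back toward conventional fine-tuning values.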
Repository contents:

- `checkpoint`
- `research`
- `apollo_mini.py`
- `apollo_worker.py`
- `DESIGN.md`
- `export_weights.py`
- `start_vllm_with_apollo.sh`
- `train.py`
- `training_example.py`
- `vllm_export_hook.py`
- `weight_mapping.py`