forked from kent/consciousness
Optimizer state (momentum, variance estimates) now persists between training sessions:

- Saved to /tmp/apollo_optimizer_state.pt during checkpoint sync
- Restored on next /train call if available
- Preserves training continuity for incremental learning

Previously each /train call started with fresh optimizer state, losing accumulated gradient history.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
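The save/restore cycle described above can be sketched with standard PyTorch serialization. This is a minimal illustration, not the repository's actual implementation: the helper names `save_optimizer_state` and `restore_optimizer_state` are hypothetical, and only the state path comes from the commit message. For Adam, `state_dict()` carries the per-parameter momentum (`exp_avg`) and variance (`exp_avg_sq`) estimates that would otherwise be lost between sessions.

```python
import os
import torch

# Path taken from the commit message above.
STATE_PATH = "/tmp/apollo_optimizer_state.pt"

def save_optimizer_state(optimizer: torch.optim.Optimizer, path: str = STATE_PATH) -> None:
    # state_dict() captures per-parameter momentum and variance estimates
    # (exp_avg / exp_avg_sq for Adam) plus step counts.
    torch.save(optimizer.state_dict(), path)

def restore_optimizer_state(optimizer: torch.optim.Optimizer, path: str = STATE_PATH) -> bool:
    # Restore only if a previous session left state behind; otherwise the
    # optimizer starts fresh, as every /train call did before this change.
    if not os.path.exists(path):
        return False
    optimizer.load_state_dict(torch.load(path))
    return True

# Session 1: one training step populates the optimizer state, then save it.
model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(8, 4)).sum().backward()
opt.step()
save_optimizer_state(opt)

# Session 2: a fresh optimizer picks up the accumulated gradient history.
fresh_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
restored = restore_optimizer_state(fresh_opt)
```

Note that `load_state_dict` matches state to parameters by position, so the restoring session must construct the model and optimizer with the same parameter layout as the saving one.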
Files:

- __init__.py
- checkpoint_sync.py
- export_hook.py
- optimizer.py
- steering.py
- train_router.py
- weight_mapping.py