Optimizer state (momentum, variance estimates) now persists between training sessions:

- Saved to /tmp/apollo_optimizer_state.pt during checkpoint sync
- Restored on next /train call if available
- Preserves training continuity for incremental learning

Previously each /train call started with fresh optimizer state, losing accumulated gradient history.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

| Name |
|---|
| apollo_plugin |
| research |
| DESIGN.md |
| pyproject.toml |
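
The save-and-restore behaviour described in the commit message could look roughly like the sketch below. It is not the plugin's actual code. The function names, the `ToyOptimizer` stand-in, and the default file name are hypothetical, and `pickle` is used only so the example runs without PyTorch. With a real PyTorch optimizer, the same `state_dict()` / `load_state_dict()` round trip would presumably go through `torch.save` and `torch.load` against the `.pt` path named in the commit.

```python
import os
import pickle
import tempfile

# Hypothetical default path for this sketch only; the commit itself
# stores state at /tmp/apollo_optimizer_state.pt.
STATE_PATH = os.path.join(tempfile.gettempdir(), "optimizer_state_sketch.pkl")


def save_optimizer_state(optimizer, path=STATE_PATH):
    """Write optimizer.state_dict() to `path` (assumed to run at checkpoint sync)."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(optimizer.state_dict(), f)
    # Rename into place so a reader never sees a half-written file.
    os.replace(tmp_path, path)


def restore_optimizer_state(optimizer, path=STATE_PATH):
    """Load saved state into `optimizer` if a file exists; return True if restored."""
    if not os.path.exists(path):
        return False  # first session: keep the fresh optimizer state
    with open(path, "rb") as f:
        optimizer.load_state_dict(pickle.load(f))
    return True


class ToyOptimizer:
    """Stand-in exposing the state_dict()/load_state_dict() pair that
    PyTorch optimizers also provide; it performs no actual optimization."""

    def __init__(self):
        self.state = {"step": 0, "momentum": [0.0]}

    def state_dict(self):
        return dict(self.state)

    def load_state_dict(self, state):
        self.state = dict(state)
```

A training entry point would call `restore_optimizer_state` once after constructing the optimizer and `save_optimizer_state` whenever a checkpoint is synced. Returning `False` when no file exists keeps the first session on the same code path as later ones.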