training: persist Apollo optimizer state across /train calls

Optimizer state (momentum, variance estimates) now persists between
training sessions:

- Saved to /tmp/apollo_optimizer_state.pt during checkpoint sync
- Restored on next /train call if available
- Preserves training continuity for incremental learning

Previously, each /train call started with fresh optimizer state,
losing the accumulated gradient history.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
commit 039473d31f
parent 78fa4b639f
Author: Kent Overstreet
Date: 2026-04-16 00:51:58 -04:00
2 changed files with 51 additions and 16 deletions

@@ -215,6 +215,7 @@ a few hundred MB.
 | File | Purpose |
 |------|---------|
 | `/tmp/vllm_weight_handles.pt` | CUDA IPC handles for weight sharing. Written by export_hook on vLLM startup. Read by train_router to construct HF model with vLLM weight views. |
+| `/tmp/apollo_optimizer_state.pt` | Apollo optimizer state (momentum, variance estimates). Saved during checkpoint sync, restored on next /train call. Preserves training continuity across sessions. |
 | `<model_dir>/*.safetensors` | Model weights. Updated in-place by checkpoint_sync. |
 ### Moria (client)
@@ -224,11 +225,11 @@ a few hundred MB.
 | `~/.consciousness/cache/trained-responses.json` | Timestamps (ms) of responses already sent to /train. Prevents re-training the same response. |
 | `~/.consciousness/cache/finetune-alternates` | Marker file. If exists, alternate responses are generated during divergence scoring to show what model would say without memories. |
-### In-memory (not persisted)
+### In-memory
 | State | Location | Notes |
 |-------|----------|-------|
-| Apollo optimizer state | train_router._model | Created fresh each /train call. ~10GB for rank-256. Not persisted between requests. |
+| Apollo optimizer | train_router._optimizer | ~10GB for rank-256. Persisted to `/tmp/apollo_optimizer_state.pt` during checkpoint sync. |
 | HF model with vLLM views | train_router._model | Lazy-loaded on first /train. Parameters point to vLLM's GPU memory. |
 ## Hyperparameters