training: integrate /train into vLLM process (no separate daemon)

Remove standalone worker.py daemon. Training now runs inside vLLM:

- train_router.py: FastAPI router patched into vLLM's build_app()
- /train served on same port as /completions, /score
- Lazy-loads HF model with vLLM weight views on first request
- HOGWILD training: no pause, weights updated in-place
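The lazy-load-on-first-request flow in the bullets above can be sketched in plain Python (hypothetical names apart from `set_model_path` and `train_router`, which appear in the diff; the real code would call HF `from_pretrained` over vLLM weight views):

```python
import threading

# Module-level state for the training router: the model path is recorded by
# the vLLM load hook, and the trainable model is built lazily on first /train.
_model_path = None
_model = None
_lock = threading.Lock()

def set_model_path(path):
    """Called from the patched vLLM model-load path once weights exist."""
    global _model_path
    _model_path = path

def get_model(loader):
    """Return the training model, building it on first use.

    `loader` is a stand-in for the real HF-model construction; double-checked
    locking keeps concurrent /train requests from loading twice.
    """
    global _model
    if _model is None:
        with _lock:
            if _model is None:
                if _model_path is None:
                    raise RuntimeError("model path not set; vLLM has not loaded weights yet")
                _model = loader(_model_path)
    return _model
```

Keeping the state module-level mirrors how a FastAPI router patched into `build_app()` would share it between the load hook and request handlers.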

The previous architecture had a separate daemon on port 8080 that
communicated with vLLM via pause/resume endpoints. This was wrong:
training should run in-process, sharing GPU memory directly.
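Why in-place updates matter here can be sketched in plain Python (a stand-in for shared GPU tensors; names are illustrative): the serving side holds a view of the weight storage, so only mutations of that storage, never rebinding, are visible without a pause.

```python
weights = [4, 2, 6]          # stand-in for a weight tensor shared with vLLM
serving_view = weights       # what the inference forward pass reads

def train_step_inplace(w, grads, lr=1):
    # HOGWILD-style step: mutate shared storage; inference sees it immediately
    for i, g in enumerate(grads):
        w[i] -= lr * g

def train_step_rebind(w, grads, lr=1):
    # Wrong for shared views: allocates new storage the serving side never sees
    return [x - lr * g for x, g in zip(w, grads)]

train_step_inplace(weights, [1, 1, 1])
# serving_view now reflects the update; a rebinding step would not propagate
```

This is the sense in which "no pause" works: inference keeps reading the same storage while training writes into it.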

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Kent Overstreet 2026-04-16 00:48:05 -04:00
parent 2f08149fab
commit 7e7e9a4b69
6 changed files with 320 additions and 542 deletions


@@ -59,6 +59,10 @@ def _patch_model_runner():
         result = original_load(self, *args, **kwargs)
         try:
             export_model_weights(self.model_runner.model)
+            # Set model path for training router
+            model_path = self.vllm_config.model_config.model
+            from .train_router import set_model_path
+            set_model_path(model_path)
         except Exception as e:
             print(f"[apollo] Failed to export weights: {e}")
         return result
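The monkey-patching pattern the hunk above extends can be sketched generically (hypothetical helper `patch_load`; the real code patches vLLM's model runner in place rather than through a helper): wrap the original load method, run a post-load hook, and swallow hook failures so serving is never broken by the training side.

```python
import functools

def patch_load(cls, on_loaded):
    """Replace cls.load_model with a wrapper that runs on_loaded(self) after
    the original load, logging (not raising) any hook failure."""
    original_load = cls.load_model

    @functools.wraps(original_load)
    def load_model(self, *args, **kwargs):
        result = original_load(self, *args, **kwargs)
        try:
            on_loaded(self)  # e.g. export weight views, record the model path
        except Exception as e:
            print(f"[apollo] post-load hook failed: {e}")
        return result

    cls.load_model = load_model
```

Swallowing exceptions here mirrors the hunk: a failure to export weights or set the model path disables /train but leaves /completions and /score untouched.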