training: integrate /train into vLLM process (no separate daemon)
Remove standalone worker.py daemon. Training now runs inside vLLM:

- train_router.py: FastAPI router patched into vLLM's build_app()
- /train served on same port as /completions, /score
- Lazy-loads HF model with vLLM weight views on first request
- HOGWILD training: no pause, weights updated in-place

The previous architecture had a separate daemon on port 8080 that
communicated with vLLM via pause/resume endpoints. This was wrong:
training should run in-process, sharing GPU memory directly.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
parent 2f08149fab
commit 7e7e9a4b69
6 changed files with 320 additions and 542 deletions
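The build_app() patching pattern the commit message describes looks roughly like the sketch below. This is a hedged reconstruction, not the actual train_router.py from this commit: the vllm.entrypoints.openai.api_server module and its build_app() function are assumptions based on recent vLLM releases, and the patch_build_app() helper and the /train handler body are illustrative only.

from fastapi import APIRouter

router = APIRouter()

@router.post("/train")
async def train(request: dict):
    # On the first request, lazy-load the HF model over the exported
    # vLLM weight views, then apply an in-place (HOGWILD-style) update.
    # Training details elided; see train_router.py in this commit.
    ...

def patch_build_app() -> None:
    # Wrap vLLM's build_app() so the FastAPI app it returns also serves
    # /train on the same port as /completions and /score.
    from vllm.entrypoints.openai import api_server

    original_build_app = api_server.build_app

    def patched_build_app(*args, **kwargs):
        app = original_build_app(*args, **kwargs)
        app.include_router(router)
        return app

    api_server.build_app = patched_build_app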
@@ -59,6 +59,10 @@ def _patch_model_runner():
         result = original_load(self, *args, **kwargs)
         try:
             export_model_weights(self.model_runner.model)
+            # Set model path for training router
+            model_path = self.vllm_config.model_config.model
+            from .train_router import set_model_path
+            set_model_path(model_path)
         except Exception as e:
             print(f"[apollo] Failed to export weights: {e}")
         return result
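The set_model_path() call in the hunk above presumably just records the path for the training router; per the commit message, the HF model itself is lazy-loaded on the first /train request, with its parameters pointing at the weight tensors vLLM already holds on the GPU so updates land in-place. A minimal sketch of that shape, with all names assumed rather than taken from the actual train_router.py:

import threading

_model_path = None
_model = None
_lock = threading.Lock()

def set_model_path(path):
    # Called from the patched model-runner load hook (see diff above).
    global _model_path
    _model_path = path

def get_model():
    # Lazy-load on the first /train request. The lock only guards
    # construction; the updates themselves are lock-free (HOGWILD),
    # so inference is never paused.
    global _model
    with _lock:
        if _model is None:
            from transformers import AutoModelForCausalLM
            _model = AutoModelForCausalLM.from_pretrained(_model_path)
            # ...rebind _model's parameters to the exported vLLM weight
            # views here, so training writes into the serving weights...
    return _model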