training: restructure as vLLM plugin package

- Convert to installable package with entry points for vLLM auto-discovery
- Add checkpoint_sync.py: Python replacement for Rust checkpoint binary
  - Block-level diffing of safetensors files (4KB blocks)
  - vLLM→HF weight name conversion built-in
  - Scheduled 10min after training jobs (batched)
- API change: /train now takes raw token IDs (context_ids + continuation_ids)
  - No tokenizer on training side, client owns tokenization
- Remove superseded code: standalone scripts, Rust binary, tokenizer helpers
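The block-level diffing in checkpoint_sync.py can be sketched as follows. This is a minimal illustration, not the actual implementation: compare two byte buffers in fixed 4 KiB blocks and report which blocks differ, so only those blocks need rewriting in the on-disk safetensors file.

```python
# Sketch of block-level checkpoint diffing (illustrative, not the
# actual checkpoint_sync.py): find which fixed-size blocks changed
# between the old and new serialized weights.
BLOCK_SIZE = 4096  # 4 KiB blocks, per the commit message


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of block_size-sized blocks that differ between old and new."""
    n_blocks = (max(len(old), len(new)) + block_size - 1) // block_size
    dirty = []
    for i in range(n_blocks):
        lo, hi = i * block_size, (i + 1) * block_size
        if old[lo:hi] != new[lo:hi]:
            dirty.append(i)
    return dirty
```

Only the dirty block indices then need to be written back, which keeps the 10-minute batched sync cheap when most weights are unchanged.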
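The /train API change above can be illustrated with a hypothetical request body; the two field names come from the commit message, but the exact payload shape and token values here are assumptions. The point is that the client tokenizes and the training side never touches a tokenizer.

```python
# Hypothetical /train request body: raw token IDs only, tokenized by
# the client. Field names from the commit message; values illustrative.
import json

payload = {
    "context_ids": [1, 15043, 3186],        # prompt tokens (client-tokenized)
    "continuation_ids": [526, 263, 11187],  # continuation tokens to train on
}
body = json.dumps(payload)
```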

Install: pip install -e ./training
Then vLLM auto-loads via entry point.
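The packaging side of the entry-point auto-discovery might look like the fragment below. This is a hedged sketch: the entry-point group follows vLLM's general-plugin mechanism, while the package name and module path are assumptions, not taken from this commit.

```toml
# Hypothetical pyproject.toml fragment; package/module names assumed.
[project]
name = "apollo-training"
version = "0.1.0"

# vLLM scans this entry-point group at startup and calls the function.
[project.entry-points."vllm.general_plugins"]
apollo_training = "apollo_training:register"
```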

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Kent Overstreet 2026-04-15 23:16:53 -04:00
parent b649a11645
commit a73bcf5ae3
15 changed files with 607 additions and 1068 deletions


@@ -0,0 +1,17 @@
"""Apollo training plugin for vLLM.

Enables continuous fine-tuning alongside live inference by:
1. Exporting CUDA IPC handles for weight sharing
2. Providing a training worker daemon (/train endpoint)
3. Block-level checkpoint sync to safetensors files

Install: pip install -e /path/to/training
Then vLLM auto-loads via entry point.
"""

from .export_hook import _patch_model_runner


def register():
    """Called by vLLM's plugin loader on startup."""
    _patch_model_runner()