forked from kent/consciousness
training: restructure as vLLM plugin package
- Convert to installable package with entry points for vLLM auto-discovery
- Add checkpoint_sync.py: Python replacement for Rust checkpoint binary
  - Block-level diffing of safetensors files (4KB blocks)
  - vLLM→HF weight name conversion built-in
  - Scheduled 10min after training jobs (batched)
- API change: /train now takes raw token IDs (context_ids + continuation_ids)
  - No tokenizer on training side, client owns tokenization
- Remove superseded code: standalone scripts, Rust binary, tokenizer helpers

Install: pip install -e ./training
Then vLLM auto-loads via entry point.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
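The block-level diffing described in the message (checkpoint_sync.py itself is not shown in this diff) amounts to comparing two checkpoint files in fixed-size blocks and rewriting only the blocks that changed. A minimal sketch of that idea, assuming plain byte comparison of 4 KiB blocks; the function name and paths are hypothetical, and the real module may differ:

    BLOCK = 4096  # 4KB blocks, per the commit message

    def sync_blocks(src_path: str, dst_path: str) -> int:
        """Rewrite only the 4KB blocks of dst_path that differ from src_path.

        Hypothetical sketch of block-level checkpoint sync; the real
        checkpoint_sync.py is not part of this diff.
        """
        changed = 0
        with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
            while True:
                offset = src.tell()
                block = src.read(BLOCK)
                if not block:
                    break
                dst.seek(offset)
                if dst.read(len(block)) != block:  # write only blocks that differ
                    dst.seek(offset)
                    dst.write(block)
                    changed += 1
            dst.truncate(src.tell())  # handle a checkpoint that shrank
        return changed

Against a mostly-unchanged checkpoint this turns a full-file copy into a handful of 4KB writes, which is presumably what makes the batched sync 10 minutes after training jobs cheap.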
parent b649a11645
commit a73bcf5ae3
15 changed files with 607 additions and 1068 deletions
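The /train API change means the training side receives token IDs only and never loads a tokenizer; the client tokenizes. A request could look like the sketch below (host, port, and token ID values are illustrative, not taken from this commit; the context_ids/continuation_ids field names are from the commit message):

    import requests

    # Client owns tokenization: only raw token IDs cross the wire.
    payload = {
        "context_ids": [1, 15043, 3186],       # illustrative prompt token IDs
        "continuation_ids": [29892, 3186, 2],  # illustrative target token IDs
    }
    resp = requests.post("http://localhost:8000/train", json=payload, timeout=30)
    resp.raise_for_status()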
training/apollo_plugin/__init__.py (Normal file, 17 additions)
@@ -0,0 +1,17 @@
"""Apollo training plugin for vLLM.

Enables continuous fine-tuning alongside live inference by:
1. Exporting CUDA IPC handles for weight sharing
2. Providing a training worker daemon (/train endpoint)
3. Block-level checkpoint sync to safetensors files

Install: pip install -e /path/to/training
Then vLLM auto-loads via entry point.
"""

from .export_hook import _patch_model_runner


def register():
    """Called by vLLM's plugin loader on startup."""
    _patch_model_runner()
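"Then vLLM auto-loads via entry point" works because vLLM scans setuptools entry points at startup and calls each one; register() above is such an entry point. A sketch of the packaging side, assuming vLLM's vllm.general_plugins entry-point group (the package name, version, and entry name are illustrative; the real package metadata is not shown in this diff):

    # setup.py (sketch)
    from setuptools import find_packages, setup

    setup(
        name="apollo-training",  # illustrative name
        version="0.1.0",
        packages=find_packages(),
        entry_points={
            # vLLM's plugin loader iterates this group and calls each
            # target, which invokes apollo_plugin.register() at startup.
            "vllm.general_plugins": [
                "apollo = apollo_plugin:register",
            ],
        },
    )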