consciousness/training/apollo_plugin/__init__.py

"""Apollo training plugin for vLLM.

Enables continuous fine-tuning alongside live inference by:
1. Exporting CUDA IPC handles for weight sharing
2. Providing a training worker daemon (/train endpoint)
3. Syncing checkpoints to safetensors files at block level

Install: pip install -e /path/to/training
Then vLLM auto-loads via entry point.
"""

from .export_hook import _patch_model_runner


def register():
    """Called by vLLM's plugin loader on startup."""
    _patch_model_runner()

training: restructure as vLLM plugin package

- Convert to installable package with entry points for vLLM auto-discovery
- Add checkpoint_sync.py: Python replacement for Rust checkpoint binary
  - Block-level diffing of safetensors files (4KB blocks)
  - vLLM→HF weight name conversion built-in
  - Scheduled 10min after training jobs (batched)
- API change: /train now takes raw token IDs (context_ids + continuation_ids)
  - No tokenizer on training side, client owns tokenization
- Remove superseded code: standalone scripts, Rust binary, tokenizer helpers

Install: pip install -e ./training
Then vLLM auto-loads via entry point.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-15 23:16:53 -04:00
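
The block-level checkpoint sync mentioned above can be illustrated with a minimal sketch. `checkpoint_sync.py` itself is not shown on this page, so this is not its implementation: the function name `sync_blocks` is hypothetical, and the sketch works on raw bytes rather than parsing the safetensors format. It only demonstrates the general idea of comparing 4KB blocks and rewriting the ones that changed.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # 4KB blocks, matching the size named in the commit message


def sync_blocks(path: str, new_data: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Write only the blocks of new_data that differ from the file at path.

    Hypothetical helper; returns the number of blocks written.
    """
    if not os.path.exists(path):
        # Nothing to diff against: write everything.
        with open(path, "wb") as f:
            f.write(new_data)
        return (len(new_data) + block_size - 1) // block_size

    written = 0
    with open(path, "r+b") as f:
        for offset in range(0, len(new_data), block_size):
            new_block = new_data[offset:offset + block_size]
            f.seek(offset)
            old_block = f.read(len(new_block))
            if old_block != new_block:
                f.seek(offset)
                f.write(new_block)
                written += 1
        f.truncate(len(new_data))  # drop stale trailing bytes, if any
    return written


# Usage: three blocks on disk, then a one-byte change inside the second block.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.bin")
    old = bytes(3 * BLOCK_SIZE)
    first_write = sync_blocks(path, old)
    new = bytearray(old)
    new[BLOCK_SIZE + 5] = 1
    changed = sync_blocks(path, bytes(new))
    with open(path, "rb") as f:
        on_disk = f.read()
```

Rewriting only the changed blocks keeps disk writes proportional to how much of the checkpoint a training step actually modified, which is presumably the motivation for diffing rather than rewriting the whole file.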