training: integrate /train into vLLM process (no separate daemon)
Remove standalone worker.py daemon. Training now runs inside vLLM:

- train_router.py: FastAPI router patched into vLLM's build_app()
- /train served on same port as /completions, /score
- Lazy-loads HF model with vLLM weight views on first request
- HOGWILD training: no pause, weights updated in-place

The previous architecture had a separate daemon on port 8080 that
communicated with vLLM via pause/resume endpoints. This was wrong -
training should run in-process, sharing GPU memory directly.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
parent 2f08149fab
commit 7e7e9a4b69
6 changed files with 320 additions and 542 deletions
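The router pattern the commit body describes is standard FastAPI composition. As a hedged illustration only: the request schema, the lazy-load helper, and the `install()` entry point below are assumptions, not the contents of the actual train_router.py; the only claim taken from the commit is that a /train route is mounted on the same app that vLLM's `build_app()` returns.

```python
from fastapi import APIRouter, FastAPI
from pydantic import BaseModel

router = APIRouter()
_model = None  # HF model built lazily on the first /train request


class TrainRequest(BaseModel):
    messages: list[dict]  # chat-format training example (assumed schema)
    lr: float = 1e-5


def _get_model():
    # In the real design this would construct an HF module whose
    # parameters are views over vLLM's live weight tensors, so a
    # backward pass updates the weights inference is serving from.
    global _model
    if _model is None:
        _model = object()  # placeholder standing in for the HF model
    return _model


@router.post("/train")
async def train(req: TrainRequest):
    model = _get_model()
    # HOGWILD step would go here: optimizer applied in place,
    # vLLM never paused.
    return {"status": "accepted", "messages": len(req.messages)}


def install(app: FastAPI) -> FastAPI:
    # Mounted onto the FastAPI app that vLLM's build_app() returns,
    # so /train shares one port with /completions and /score.
    app.include_router(router)
    return app
```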
@@ -22,25 +22,29 @@ The training signal comes from two sources:
 │                                                    │
 │  ┌──────────────────────────────────────────────┐  │
 │  │  Model Weights (54GB, bf16)                  │  │
-│  │  Shared via CUDA IPC                         │  │
+│  │  Shared: vLLM inference + HF training        │  │
 │  └──────────────┬──────────────┬────────────────┘  │
 │                 │              │                   │
 │  ┌──────────────▼──┐   ┌───────▼────────────────┐  │
-│  │ vLLM (inference)│   │ Apollo (training)      │  │
+│  │ vLLM (inference)│   │ HF model (training)    │  │
 │  │ KV cache ~60GB  │   │ Gradients ~54GB        │  │
-│  │ Serves requests │   │ Optimizer state ~10GB  │  │
+│  │ Never paused    │   │ Activations ~10GB      │  │
+│  │ /completions    │   │ Optimizer state ~10GB  │  │
+│  │ /score          │   │ Views into vLLM weights│  │
+│  │ /train ─────────┼───┼─► Apollo optimizer     │  │
 │  └─────────────────┘   └────────────────────────┘  │
 └────────────────────────────────────────────────────┘
 
-Moria B200
+Single vLLM process serves everything
+No separate daemon - /train is a vLLM route
 
+Moria B200 (vLLM)
 ┌──────────────────┐            ┌──────────────────┐
-│ Training signal  │    HTTP    │ Apollo worker    │
-│ agent            │───────────>│ daemon           │
-│                  │            │                  │
-│ Dream loop       │            │ Checkpoint sync  │
-│ (generates       │            │ (mmap + diff,    │
-│  scenarios)      │            │  every 10 min)   │
+│ Training signal  │    HTTP    │ /completions     │
+│ agent            │───────────>│ /score           │
+│                  │            │ /train           │
+│ Dream loop       │            │                  │
+│ (generates       │            │ Checkpoint sync  │
+│  scenarios)      │            │ (10 min batched) │
 └──────────────────┘            └──────────────────┘
 ```
 
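What makes the in-place scheme work is that the "HF model (training)" box holds no weights of its own: its parameters are torch views into vLLM's live tensors. vLLM stores some projections fused (for example a merged QKV matrix) while the HF graph wants them separate, and plain slicing gives aliasing views both sides can use. A sketch with illustrative sizes, not the real weight_mapping.py:

```python
import torch

# Illustrative sizes only; the actual mapping lives in weight_mapping.py.
hidden, q_out, kv_out = 4096, 4096, 1024

# Stands in for vLLM's live merged qkv_proj weight on the GPU.
merged_qkv = torch.empty(q_out + 2 * kv_out, hidden)

# Plain slicing returns views over the same storage, so the "HF-side"
# parameters alias the memory vLLM is serving inference from.
q_proj = merged_qkv[:q_out]
k_proj = merged_qkv[q_out : q_out + kv_out]
v_proj = merged_qkv[q_out + kv_out :]

assert q_proj.data_ptr() == merged_qkv.data_ptr()  # same memory, no copy

# An in-place optimizer step through the views mutates the merged tensor
# directly; the next /completions request sees the updated weights.
with torch.no_grad():
    q_proj.add_(torch.zeros_like(q_proj))  # no-op update, shape check
```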
@@ -220,34 +224,30 @@ a few hundred MB.
 ## Components
 
 ### Built ✓
-- apollo_mini.py — Apollo optimizer (configurable rank, default 256)
-- apollo_worker.py — HTTP daemon (aiohttp, job tracking)
+- optimizer.py — Apollo optimizer (configurable rank, default 256)
+- train_router.py — /train endpoint, runs in vLLM process
 - weight_mapping.py — vLLM merged → HF separate views (validated)
 - training_example.py — tokenization with chat template
-- vllm_export_hook.py — source patch for IPC handle export
-- checkpoint/ — Rust tool for mmap + diff checkpoint sync
+- export_hook.py — vLLM plugin hook for IPC handle export
+- checkpoint_sync.py — mmap + diff checkpoint sync (Python)
 
 ### To build
-- **Dream loop → training bridge**: connect dream output to Apollo
+- **Dream loop → training bridge**: connect dream output to /train
 - **Training-signal agent**: flags moments in conversation logs
 - **Instruction stripping**: remove scaffolding from training examples
 - **Quality monitoring**: track model capability over time
-- **HF model forward pass integration**: wire into apollo_worker
 
 ## Files
 
 ```
 training/
-    DESIGN.md                 — this document
-    apollo_mini.py            — Apollo optimizer
-    apollo_worker.py          — HTTP training daemon
-    weight_mapping.py         — vLLM ↔ HF weight views
-    training_example.py       — tokenization helpers
-    export_weights.py         — standalone weight export (unused)
-    vllm_export_hook.py       — vLLM source patch for IPC export
-    start_vllm_with_apollo.sh — vLLM launcher (unused, using source patch)
-    train.py                  — standalone training script (alternative)
-    checkpoint/
-        Cargo.toml            — Rust checkpoint tool
-        src/main.rs           — mmap + diff sync
+    DESIGN.md              — this document
+    pyproject.toml         — package config, vLLM plugin entry point
+    apollo_plugin/
+        __init__.py        — plugin registration
+        export_hook.py     — patches vLLM to export IPC handles
+        train_router.py    — /train endpoint (FastAPI router)
+        optimizer.py       — Apollo optimizer
+        weight_mapping.py  — vLLM ↔ HF weight views
+        checkpoint_sync.py — mmap + diff sync to safetensors
+        steering.py        — steering vector extraction (experimental)
 ```
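The "10 min batched" checkpoint sync is a diff against the previous on-disk image: map the old checkpoint, compare chunk by chunk, rewrite only what changed, so a quiet interval writes the "few hundred MB" the hunk context mentions rather than the full 54GB. A minimal sketch, assuming a flat binary checkpoint and an invented 1 MiB granularity (the real checkpoint_sync.py writes safetensors):

```python
import mmap

CHUNK = 1 << 20  # 1 MiB diff granularity (assumed)


def sync(live: bytes, ckpt_path: str) -> int:
    """Overwrite only the chunks of ckpt_path that differ from live.

    Assumes the checkpoint file already exists with len(live) bytes,
    e.g. from an initial full dump of the bf16 weights.
    """
    dirty = 0
    with open(ckpt_path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), len(live))
        for off in range(0, len(live), CHUNK):
            new = live[off : off + CHUNK]
            if mm[off : off + len(new)] != new:
                mm[off : off + len(new)] = new  # rewrite dirty chunk only
                dirty += 1
        mm.flush()
        mm.close()
    return dirty
```

Writing through mmap lets the kernel coalesce the dirty pages, and the returned chunk count gives a cheap progress metric for the ten-minute loop.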