Commit graph

9 commits

Author SHA1 Message Date
ProofOfConcept 2ecf4e21ff weight_mapping: strip language_model prefix to match HF text model names 2026-03-30 23:11:03 -04:00
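The mapping fix can be sketched as a small helper (a minimal sketch with hypothetical names; the actual weight_mapping.py logic may differ):

```python
# Illustrative sketch: strip a wrapper prefix such as "language_model." so
# checkpoint keys line up with the plain HF text-model parameter names.
# The example key names below are assumptions, not taken from the repo.
PREFIX = "language_model."

def strip_prefix(name: str, prefix: str = PREFIX) -> str:
    """Remove the wrapper prefix if present; leave other names untouched."""
    return name[len(prefix):] if name.startswith(prefix) else name

vllm_names = [
    "language_model.model.layers.0.self_attn.qkv_proj.weight",
    "model.embed_tokens.weight",
]
hf_names = [strip_prefix(n) for n in vllm_names]
```

Names without the prefix pass through unchanged, so the same mapping works for both wrapped and plain checkpoints.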
ProofOfConcept 6fb9735def weight_mapping: fix name prefix, add attention QKV dims 2026-03-30 23:09:08 -04:00
ProofOfConcept d0883e101b checkpoint: sync live weights back into model safetensors in-place 2026-03-30 22:55:23 -04:00
    mmap each safetensors file, diff block-by-block against live GPU
    weights, memcpy only changed blocks. No separate checkpoint files —
    the model directory IS the checkpoint. Every 10 min via cron.
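The diff-and-memcpy loop this commit describes can be sketched in a few lines (a simplified model assuming the checkpoint is a writable byte buffer; the real tool operates on an mmap of each .safetensors file, and the 4096-byte block size here is an assumption):

```python
# Simplified sketch of diff-based in-place sync: compare the live weight
# bytes against the on-disk copy block by block and copy only dirty blocks.
BLOCK = 4096  # comparison/copy granularity; the real block size is an assumption

def sync_changed_blocks(ckpt: bytearray, live: bytes, block: int = BLOCK) -> int:
    """Copy blocks of `live` that differ from `ckpt`; return bytes written."""
    assert len(ckpt) == len(live)
    written = 0
    for off in range(0, len(live), block):
        src = live[off:off + block]
        if ckpt[off:off + len(src)] != src:
            ckpt[off:off + len(src)] = src   # "memcpy" only the dirty block
            written += len(src)
    return written
```

For small behavioral training steps most blocks are untouched, so the write cost is proportional to the dirty set rather than the full file size.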
ProofOfConcept c1245ab139 apollo-checkpoint: efficient diff-based GPU weight checkpointing 2026-03-30 22:53:17 -04:00
    Rust tool that mmaps the previous checkpoint, diffs it against the live
    GPU weights (via CUDA IPC handles), and writes only the changed blocks.
    For small behavioral training steps, this turns a 54GB write into ~500MB.

    Also includes vllm_export_hook.py, which takes a direct source-patch
    approach: it exports IPC handles from vLLM's worker subprocess after
    model load.

    Run every 10 minutes via cron to protect against vLLM crashes.
    Daily rsync to moria for long-term storage.
ProofOfConcept 5f41898bb8 vllm launcher with apollo hook 2026-03-30 22:24:02 -04:00
ProofOfConcept 0402a9333c vllm weight export hook: monkey-patches model runner to save IPC handles on load 2026-03-30 22:20:04 -04:00
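The monkey-patch pattern this hook uses can be illustrated with stand-in classes (the real hook wraps vLLM's model-runner load path and exports CUDA IPC handles; every name below is a placeholder, not vLLM's actual API):

```python
# Sketch of the monkey-patch idea: wrap the loader so that, right after the
# model's weights land on the GPU, we capture handles to them for sharing.
captured = {}

class ModelRunner:                       # stand-in for the patched vLLM class
    def load_model(self):
        self.model = {"weight": [1.0, 2.0]}

_orig_load = ModelRunner.load_model      # keep the original method

def load_and_export(self):
    """Run the original loader, then record references to the live weights."""
    _orig_load(self)
    # The real hook would export a CUDA IPC handle per tensor here; we just
    # stash a reference so a consumer on the other side can map the weights.
    captured.update(self.model)

ModelRunner.load_model = load_and_export  # install the hook

runner = ModelRunner()
runner.load_model()
```

Because the wrapper calls the saved original method first, the patched runner behaves identically to the unpatched one apart from the extra export step.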
ProofOfConcept 8e7b4a22db apollo: default rank 256 — 0.25% compute cost, captures gradient structure across 100+ examples 2026-03-30 22:16:34 -04:00
ProofOfConcept e1cd4fb0ab apollo: make rank configurable (default 1 = Mini, higher ranks for experimentation) 2026-03-30 22:06:31 -04:00
ProofOfConcept c5d7d8cb5d apollo-mini training system: initial implementation 2026-03-30 22:02:37 -04:00
    Core components for online fine-tuning of Qwen3.5-27B with CUDA IPC
    shared weight memory between vLLM and the training process:

    - apollo_mini.py: rank-1 optimizer (SGD memory, AdamW quality)
    - apollo_worker.py: HTTP daemon coordinating training with vLLM
    - weight_mapping.py: vLLM merged → HF separate layout (zero-copy views)
    - training_example.py: tokenization with chat template
    - export_weights.py: CUDA IPC handle export from vLLM
    - train.py: standalone training script (alternative to the daemon)
    - DESIGN.md: architecture and protocol documentation

    Validated: CUDA IPC autograd works on real Qwen3.5 weights (B200).
    Apollo-Mini rank-1 projection + scaling + in-place update confirmed.

    Co-Authored-By: Kent Overstreet <kent.overstreet@gmail.com>
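A heavily simplified sketch of the rank-1 idea behind apollo_mini.py (illustrative only, not the actual APOLLO update rule): keep AdamW-style moments for a one-dimensional projection of each gradient row, so optimizer state is O(m) for an m×n matrix instead of O(mn), and use the projected moments to rescale the raw gradient before an in-place update.

```python
import math
import random

# Illustrative rank-1 projected update (not the apollo_mini.py code): moments
# live in the projected space, giving SGD-like memory with adaptive scaling.
def rank1_step(W, G, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m_rows, n = len(G), len(G[0])
    if "p" not in state:
        rng = random.Random(0)
        p = [rng.gauss(0, 1) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in p))
        state["p"] = [x / norm for x in p]   # fixed random unit projection vector
        state["m"] = [0.0] * m_rows          # first moment, one scalar per row
        state["v"] = [0.0] * m_rows          # second moment, one scalar per row
        state["t"] = 0
    p = state["p"]
    state["t"] += 1
    t = state["t"]
    for i in range(m_rows):
        r = sum(G[i][j] * p[j] for j in range(n))    # project row i to a scalar
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * r
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * r * r
        mhat = state["m"][i] / (1 - beta1 ** t)      # bias-corrected moments
        vhat = state["v"][i] / (1 - beta2 ** t)
        # Per-row scaling factor derived from the projected Adam-style step.
        scale = abs(mhat / (math.sqrt(vhat) + eps)) / (abs(r) + eps)
        for j in range(n):
            W[i][j] -= lr * scale * G[i][j]          # in-place weight update
```

The in-place update is what lets the weights stay in CUDA IPC shared memory: vLLM serves from the same buffers the optimizer writes into.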