consciousness

spqrz/consciousness

forked from kent/consciousness

Commit graph

Author

SHA1

Message

Date

Kent Overstreet

a73bcf5ae3

training: restructure as vLLM plugin package

- Convert to installable package with entry points for vLLM auto-discovery
- Add checkpoint_sync.py: Python replacement for Rust checkpoint binary
  - Block-level diffing of safetensors files (4KB blocks)
  - vLLM→HF weight name conversion built-in
  - Scheduled 10min after training jobs (batched)
- API change: /train now takes raw token IDs (context_ids + continuation_ids)
  - No tokenizer on training side, client owns tokenization
- Remove superseded code: standalone scripts, Rust binary, tokenizer helpers
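
The /train change above means the client tokenizes and sends only integer IDs. A minimal sketch of building such a request body follows. Only the field names context_ids and continuation_ids come from the commit message; the helper name, the JSON encoding, and the validation rule are assumptions made for illustration.

```python
import json


def build_train_request(context_ids, continuation_ids):
    """Build a JSON body for a /train call from already-tokenized input.

    Hypothetical helper: the server's real schema is not shown in the
    commit, only the two field names used below.
    """
    for name, ids in (("context_ids", context_ids),
                      ("continuation_ids", continuation_ids)):
        # Assumption: token IDs are plain non-negative integers.
        if not all(isinstance(t, int) and t >= 0 for t in ids):
            raise ValueError(f"{name} must contain non-negative integers")
    return json.dumps({
        "context_ids": list(context_ids),
        "continuation_ids": list(continuation_ids),
    })
```

Because the training side holds no tokenizer, any mismatch between the client's tokenizer and the served model's vocabulary has to be caught on the client.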

Install: pip install -e ./training
Then vLLM auto-loads via entry point.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

2026-04-15 23:16:53 -04:00

ProofOfConcept

c1245ab139

apollo-checkpoint: efficient diff-based GPU weight checkpointing

Rust tool that mmaps previous checkpoint, diffs against live GPU weights
(via CUDA IPC handles), and only writes changed blocks. For small
behavioral training steps, turns 54GB write into ~500MB.
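
The diffing idea described above can be sketched in Python rather than Rust: mmap the previous checkpoint, compare it block by block against the new weights, and rewrite only the blocks that differ. This is an illustration, not the tool's code. The new weights are a plain bytes object here instead of GPU memory reached through CUDA IPC handles, the 4096-byte block size is borrowed from the later checkpoint_sync.py commit, and all function names are hypothetical.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 4096  # assumption: block size taken from the later commit


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return byte offsets of blocks where `new` differs from `old`."""
    if len(old) != len(new):
        raise ValueError("this sketch assumes equal-sized buffers")
    return [off for off in range(0, len(new), block_size)
            if old[off:off + block_size] != new[off:off + block_size]]


def sync_checkpoint(path, new, block_size=BLOCK_SIZE):
    """Rewrite only the changed blocks of `path`; return bytes written."""
    # Map the previous checkpoint read-only and find differing blocks.
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as old:
        offsets = changed_blocks(old, new, block_size)
    written = 0
    # Reopen for in-place update and overwrite just those blocks.
    with open(path, "r+b") as f:
        for off in offsets:
            f.seek(off)
            written += f.write(new[off:off + block_size])
    return written


def demo():
    """Change one byte in a three-block file and sync it."""
    old = bytes(3 * BLOCK_SIZE)
    new = bytearray(old)
    new[BLOCK_SIZE + 7] = 0xFF  # touches only the second block
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.bin")
        with open(path, "wb") as f:
            f.write(old)
        written = sync_checkpoint(path, bytes(new))
        with open(path, "rb") as f:
            assert f.read() == bytes(new)
    return written
```

In demo(), a single changed byte costs one block of writes instead of the whole file. The figures quoted in the commit message (~500MB written out of 54GB) correspond to roughly 1% of blocks changing.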

Also includes vllm_export_hook.py with direct source patch approach —
exports IPC handles from vLLM's worker subprocess after model load.

Run every 10 minutes via cron to protect against vLLM crashes.
Daily rsync to moria for long-term storage.

2026-03-30 22:53:17 -04:00

ProofOfConcept

0402a9333c

vllm weight export hook: monkey-patches model runner to save IPC handles on load

2026-03-30 22:20:04 -04:00

3 commits