consciousness

ProofOfConcept	c1245ab139	apollo-checkpoint: efficient diff-based GPU weight checkpointing	2026-03-30 22:53:17 -04:00

Rust tool that mmaps the previous checkpoint, diffs it against the live GPU weights (via CUDA IPC handles), and writes only the changed blocks. For small behavioral training steps, this turns a 54GB write into ~500MB.

Also includes vllm_export_hook.py, which takes a direct source-patch approach: it exports IPC handles from vLLM's worker subprocess after model load.

Run every 10 minutes via cron to protect against vLLM crashes. Daily rsync to moria for long-term storage.
checkpoint	apollo-checkpoint: efficient diff-based GPU weight checkpointing	2026-03-30 22:53:17 -04:00
apollo_mini.py	apollo: default rank 256 — 0.25% compute cost, captures gradient structure across 100+ examples	2026-03-30 22:16:34 -04:00
apollo_worker.py	apollo: make rank configurable (default 1 = Mini, higher ranks for experimentation)	2026-03-30 22:06:31 -04:00
DESIGN.md	apollo-mini training system: initial implementation	2026-03-30 22:02:37 -04:00
export_weights.py	apollo-mini training system: initial implementation	2026-03-30 22:02:37 -04:00
start_vllm_with_apollo.sh	vllm launcher with apollo hook	2026-03-30 22:24:02 -04:00
train.py	apollo-mini training system: initial implementation	2026-03-30 22:02:37 -04:00
training_example.py	apollo-mini training system: initial implementation	2026-03-30 22:02:37 -04:00
vllm_export_hook.py	apollo-checkpoint: efficient diff-based GPU weight checkpointing	2026-03-30 22:53:17 -04:00
weight_mapping.py	apollo-mini training system: initial implementation	2026-03-30 22:02:37 -04:00
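
The diff-based write described in the latest commit message can be illustrated with a minimal Python sketch. This is not the Rust tool's code: the block size, the function names `write_changed_blocks` and `demo`, and the file name `weights.ckpt` are all assumptions made for illustration, and a plain in-memory byte buffer stands in for the live GPU weights that the real tool reads through CUDA IPC handles.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 4096  # hypothetical block size; the commit message does not state one


def write_changed_blocks(checkpoint_path, live, block_size=BLOCK_SIZE):
    """Update the checkpoint file in place, writing only blocks that differ.

    `live` is a bytes-like stand-in for the current weights.
    Returns the number of bytes written to the checkpoint.
    """
    if os.path.getsize(checkpoint_path) != len(live):
        raise ValueError("checkpoint and live weights differ in size")
    if len(live) == 0:
        return 0  # an empty file cannot be mmapped, and there is nothing to diff

    written = 0
    with open(checkpoint_path, "r+b") as f, mmap.mmap(f.fileno(), 0) as prev:
        for start in range(0, len(live), block_size):
            end = min(start + block_size, len(live))
            new_block = bytes(live[start:end])
            # Compare against the previous checkpoint; skip unchanged blocks.
            if prev[start:end] != new_block:
                prev[start:end] = new_block
                written += end - start
        prev.flush()
    return written


def demo():
    """Write a full checkpoint, flip one byte, then apply the diff-based update."""
    weights = bytearray(os.urandom(16 * BLOCK_SIZE))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.ckpt")
        with open(path, "wb") as f:
            f.write(weights)  # initial full checkpoint

        weights[5 * BLOCK_SIZE + 10] ^= 0xFF  # simulate a small training update
        written = write_changed_blocks(path, weights)

        with open(path, "rb") as f:
            assert f.read() == bytes(weights)  # checkpoint now matches live weights
        return written
```

Calling `demo()` changes a single byte in a 16-block buffer, so only the one block containing that byte is rewritten. The same reasoning is what lets a small training step touch a fraction of a much larger checkpoint, although the actual savings depend on how the changed weights are distributed across blocks.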