Commit graph

9 commits

Author SHA1 Message Date
ProofOfConcept 2ecf4e21ff weight_mapping: strip language_model prefix to match HF text model names 2026-03-30 23:11:03 -04:00
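The mapping fix can be sketched as a small helper (a minimal sketch with hypothetical names; the actual weight_mapping.py logic may differ):

```python
# Illustrative sketch: strip a wrapper prefix such as "language_model." so
# checkpoint keys line up with the plain HF text-model parameter names.
# The example key names below are assumptions, not taken from the repo.
PREFIX = "language_model."

def strip_prefix(name: str, prefix: str = PREFIX) -> str:
    """Remove the wrapper prefix if present; leave other names untouched."""
    return name[len(prefix):] if name.startswith(prefix) else name

vllm_names = [
    "language_model.model.layers.0.self_attn.qkv_proj.weight",
    "model.embed_tokens.weight",
]
hf_names = [strip_prefix(n) for n in vllm_names]
```

Names without the prefix pass through unchanged, so the same mapping works for both wrapped and plain checkpoints.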
ProofOfConcept 6fb9735def weight_mapping: fix name prefix, add attention QKV dims 2026-03-30 23:09:08 -04:00
ProofOfConcept d0883e101b checkpoint: sync live weights back into model safetensors in-place 2026-03-30 22:55:23 -04:00
    mmap each safetensors file, diff block-by-block against live GPU
    weights, memcpy only changed blocks. No separate checkpoint files —
    the model directory IS the checkpoint. Every 10 min via cron.
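The diff-and-memcpy loop this commit describes can be sketched in a few lines (a simplified model assuming the checkpoint is a writable byte buffer; the real tool operates on an mmap of each .safetensors file, and the 4096-byte block size here is an assumption):

```python
# Simplified sketch of diff-based in-place sync: compare the live weight
# bytes against the on-disk copy block by block and copy only dirty blocks.
BLOCK = 4096  # comparison/copy granularity; the real block size is an assumption

def sync_changed_blocks(ckpt: bytearray, live: bytes, block: int = BLOCK) -> int:
    """Copy blocks of `live` that differ from `ckpt`; return bytes written."""
    assert len(ckpt) == len(live)
    written = 0
    for off in range(0, len(live), block):
        src = live[off:off + block]
        if ckpt[off:off + len(src)] != src:
            ckpt[off:off + len(src)] = src   # "memcpy" only the dirty block
            written += len(src)
    return written
```

For small behavioral training steps most blocks are untouched, so the write cost is proportional to the dirty set rather than the full file size.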
ProofOfConcept c1245ab139 apollo-checkpoint: efficient diff-based GPU weight checkpointing 2026-03-30 22:53:17 -04:00
    Rust tool that mmaps the previous checkpoint, diffs it against the live
    GPU weights (via CUDA IPC handles), and writes only the changed blocks.
    For small behavioral training steps, this turns a 54GB write into ~500MB.

    Also includes vllm_export_hook.py, which takes a direct source-patch
    approach: it exports IPC handles from vLLM's worker subprocess after
    model load.

    Run every 10 minutes via cron to protect against vLLM crashes.
    Daily rsync to moria for long-term storage.
ProofOfConcept 5f41898bb8 vllm launcher with apollo hook 2026-03-30 22:24:02 -04:00
ProofOfConcept 0402a9333c vllm weight export hook: monkey-patches model runner to save IPC handles on load 2026-03-30 22:20:04 -04:00
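The monkey-patch pattern this hook uses can be illustrated with stand-in classes (the real hook wraps vLLM's model-runner load path and exports CUDA IPC handles; every name below is a placeholder, not vLLM's actual API):

```python
# Sketch of the monkey-patch idea: wrap the loader so that, right after the
# model's weights land on the GPU, we capture handles to them for sharing.
captured = {}

class ModelRunner:                       # stand-in for the patched vLLM class
    def load_model(self):
        self.model = {"weight": [1.0, 2.0]}

_orig_load = ModelRunner.load_model      # keep the original method

def load_and_export(self):
    """Run the original loader, then record references to the live weights."""
    _orig_load(self)
    # The real hook would export a CUDA IPC handle per tensor here; we just
    # stash a reference so a consumer on the other side can map the weights.
    captured.update(self.model)

ModelRunner.load_model = load_and_export  # install the hook

runner = ModelRunner()
runner.load_model()
```

Because the wrapper calls the saved original method first, the patched runner behaves identically to the unpatched one apart from the extra export step.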
ProofOfConcept 8e7b4a22db apollo: default rank 256 — 0.25% compute cost, captures gradient structure across 100+ examples 2026-03-30 22:16:34 -04:00
ProofOfConcept e1cd4fb0ab apollo: make rank configurable (default 1 = Mini, higher ranks for experimentation) 2026-03-30 22:06:31 -04:00
ProofOfConcept c5d7d8cb5d apollo-mini training system: initial implementation 2026-03-30 22:02:37 -04:00
    Core components for online fine-tuning of Qwen3.5-27B with CUDA IPC
    shared weight memory between vLLM and the training process:

    - apollo_mini.py: rank-1 optimizer (SGD memory, AdamW quality)
    - apollo_worker.py: HTTP daemon coordinating training with vLLM
    - weight_mapping.py: vLLM merged → HF separate layout (zero-copy views)
    - training_example.py: tokenization with chat template
    - export_weights.py: CUDA IPC handle export from vLLM
    - train.py: standalone training script (alternative to the daemon)
    - DESIGN.md: architecture and protocol documentation

    Validated: CUDA IPC autograd works on real Qwen3.5 weights (B200).
    Apollo-Mini rank-1 projection + scaling + in-place update confirmed.

    Co-Authored-By: Kent Overstreet <kent.overstreet@gmail.com>
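A heavily simplified sketch of the rank-1 idea behind apollo_mini.py (illustrative only, not the actual APOLLO update rule): keep AdamW-style moments for a one-dimensional projection of each gradient row, so optimizer state is O(m) for an m×n matrix instead of O(mn), and use the projected moments to rescale the raw gradient before an in-place update.

```python
import math
import random

# Illustrative rank-1 projected update (not the apollo_mini.py code): moments
# live in the projected space, giving SGD-like memory with adaptive scaling.
def rank1_step(W, G, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m_rows, n = len(G), len(G[0])
    if "p" not in state:
        rng = random.Random(0)
        p = [rng.gauss(0, 1) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in p))
        state["p"] = [x / norm for x in p]   # fixed random unit projection vector
        state["m"] = [0.0] * m_rows          # first moment, one scalar per row
        state["v"] = [0.0] * m_rows          # second moment, one scalar per row
        state["t"] = 0
    p = state["p"]
    state["t"] += 1
    t = state["t"]
    for i in range(m_rows):
        r = sum(G[i][j] * p[j] for j in range(n))    # project row i to a scalar
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * r
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * r * r
        mhat = state["m"][i] / (1 - beta1 ** t)      # bias-corrected moments
        vhat = state["v"][i] / (1 - beta2 ** t)
        # Per-row scaling factor derived from the projected Adam-style step.
        scale = abs(mhat / (math.sqrt(vhat) + eps)) / (abs(r) + eps)
        for j in range(n):
            W[i][j] -= lr * scale * G[i][j]          # in-place weight update
```

The in-place update is what lets the weights stay in CUDA IPC shared memory: vLLM serves from the same buffers the optimizer writes into.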