Facts are localized (ROME). Behaviors are hierarchically distributed: a core circuit (a small set of mid-to-late-layer attention heads) plus supporting circuits (distributed context encoding). Apollo's flat minima are the right fit for this kind of distributed change, and rank-256 updates capture the full hierarchy. A measurement plan for validating which heads actually change during training is included; a minimal sketch of that measurement follows the file list below.
Repository contents:

- checkpoint/
- research/
- apollo_mini.py
- apollo_worker.py
- DESIGN.md
- export_weights.py
- start_vllm_with_apollo.sh
- train.py
- training_example.py
- vllm_export_hook.py
- weight_mapping.py
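
As a rough illustration of the measurement plan, the sketch below ranks attention heads by how much their weights moved between a base and a fine-tuned checkpoint. This is a minimal sketch, not the repo's implementation: the checkpoint filenames, the Llama-style parameter names (`model.layers.N.self_attn.q_proj.weight`), and the `NUM_HEADS` / `HEAD_DIM` values are all assumptions to be replaced with the actual model's configuration.

```python
# Minimal sketch of the head-change measurement. Assumptions (not from the
# repo): checkpoint filenames, Llama-style parameter names, and the
# NUM_HEADS / HEAD_DIM values; substitute the real model's config.
import torch

NUM_HEADS = 32   # assumed attention head count
HEAD_DIM = 128   # assumed per-head dimension (hidden_size // num_heads)

def per_head_delta_norms(base_sd, tuned_sd, key):
    """Frobenius norm of the weight change, one scalar per head.

    `key` names a q_proj-style weight of shape
    [num_heads * head_dim, hidden], so rows group cleanly by head;
    reshaping isolates each head's slice of the update.
    """
    delta = tuned_sd[key].float() - base_sd[key].float()
    per_head = delta.reshape(NUM_HEADS, HEAD_DIM, -1)
    return per_head.norm(dim=(1, 2))  # one scalar per head

base_sd = torch.load("checkpoint/base.pt", map_location="cpu")    # hypothetical path
tuned_sd = torch.load("checkpoint/tuned.pt", map_location="cpu")  # hypothetical path

# A hierarchically distributed behavior should show a few mid-to-late-layer
# heads with outsized deltas (the core circuit) over a broad, low-magnitude
# background of small changes (the supporting circuits).
for layer in range(20, 32):  # illustrative mid-to-late layer range
    key = f"model.layers.{layer}.self_attn.q_proj.weight"
    norms = per_head_delta_norms(base_sd, tuned_sd, key)
    top = torch.topk(norms, k=4)
    print(f"layer {layer}: top heads {top.indices.tolist()}, "
          f"delta norms {[round(v, 4) for v in top.values.tolist()]}")
```

Comparing the top-k head norms against the background distribution is what distinguishes the two hypotheses: a ROME-style localized edit would concentrate nearly all of the delta in a handful of heads, while a hierarchical behavior should show both the spike and the distributed tail.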