Learning rate isn't speed; it's trust-per-example. At 27B parameters, lr = 1e-5 works out to roughly 27e9 × 1e-5 ≈ 270K parameter-units of adjustment per example. The coherent update direction emerges from many votes (examples), and Apollo's moment estimates smooth the per-example noise. DPO needs a lower lr because comparative votes (preference pairs) are noisier than absolute votes (supervised targets).
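A minimal sketch of both halves of that claim, in plain NumPy rather than the repo's actual `apollo_mini.py` (the numbers and dimensions here are illustrative, not taken from the codebase): first the trust-per-example arithmetic, then a first-moment EMA, as used by Adam and Apollo-style optimizers, showing how noisy per-example gradient "votes" average into a coherent direction.

```python
import numpy as np

# Trust-per-example arithmetic from the note above:
# 27B parameters * lr 1e-5 ~ 270K parameter-units moved per example.
print(f"trust per example: {27e9 * 1e-5:,.0f}")  # 270,000

# Moment smoothing: an exponential moving average of gradients keeps the
# shared signal while per-example noise partially cancels.
rng = np.random.default_rng(0)
dim = 10
signal = rng.standard_normal(dim)
signal /= np.linalg.norm(signal)  # the "true" direction all examples vote for

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

beta1 = 0.9          # typical first-moment decay
m = np.zeros(dim)    # first-moment EMA
grad = None
for _ in range(200):
    grad = signal + rng.standard_normal(dim)  # vote = shared signal + noise
    m = beta1 * m + (1 - beta1) * grad        # EMA update

print(f"single-gradient alignment: {cosine(grad, signal):.2f}")  # typically ~0.3
print(f"EMA (moment) alignment:    {cosine(m, signal):.2f}")     # typically ~0.8
```

The same picture explains the DPO point: if each comparative vote carries more noise, the EMA needs more votes per unit of movement, i.e. a smaller lr.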
Repo contents:

- `checkpoint/`
- `research/`
- `apollo_mini.py`
- `apollo_worker.py`
- `DESIGN.md`
- `export_weights.py`
- `extract_steering_vector.py`
- `first_training_step.py`
- `start_vllm_with_apollo.sh`
- `train.py`
- `training_example.py`
- `vllm_export_hook.py`
- `weight_mapping.py`