consciousness/training (ProofOfConcept, commit 0b835ddfb9)
Latest commit: research: GDN gradient flow — disposition architecture in linear attention
75% of the model's layers are GDN layers. Behavioral training adjusts: the
projections (what queries and updates the recurrent state), the gating
parameters (what survives compression), and A_log/dt_bias (baseline decay
rates); these parameter groups are sketched below.
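As a reading aid, a minimal sketch of how those three parameter groups might be bucketed for behavioral training, assuming GDN here means a Gated DeltaNet-style linear-attention layer. The "linear_attn" marker and the substring filters are hypothetical illustrations, not this repo's actual parameter names.

import torch.nn as nn

# Hypothetical substring filters; real names depend on the base model's
# GDN / linear-attention implementation.
GDN_GROUPS = {
    "projections": ("q_proj", "k_proj", "v_proj", "in_proj", "out_proj"),
    "gating":      ("gate",),
    "decay":       ("A_log", "dt_bias"),
}

def collect_gdn_params(model: nn.Module, gdn_marker: str = "linear_attn"):
    """Bucket GDN-layer parameters into the three groups described above."""
    groups = {name: [] for name in GDN_GROUPS}
    for pname, param in model.named_parameters():
        if gdn_marker not in pname:          # only the GDN (linear attention) layers
            continue
        for gname, keys in GDN_GROUPS.items():
            if any(key in pname for key in keys):
                groups[gname].append(param)
                break
    return groups

Grouping this way also makes it easy to give the decay parameters (A_log/dt_bias) a different learning rate than the projections.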

Key insight: GDN makes behavioral training DEEPER than full attention.
Full attention = 'I choose to look at direction' (deliberate). GDN =
'direction IS what I see' (structural — the compressed state is
direction-shaped). 48 GDN layers = disposition; 16 full-attention layers =
procedure. The architecture IS disposition-over-procedure (the recurrence
behind this is sketched below).
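The "compressed state" claim is easiest to see in the recurrence itself. Below is a minimal, unbatched single-head sketch of a gated delta-rule update of the kind GDN layers use, assuming the common formulation (decay the old state by a gate derived from A_log/dt_bias, then delta-rule-write the new key/value pair); this is illustrative, not the repo's kernel.

import torch
import torch.nn.functional as F

def gated_delta_rule(q, k, v, dt, A_log, dt_bias, beta):
    """Single-head gated delta-rule recurrence (illustrative, unbatched).

    q, k: [T, d_k]; v: [T, d_v]; dt, beta: [T]; A_log, dt_bias: scalars.
    Returns outputs [T, d_v]. S is the compressed recurrent state.
    """
    d_k, d_v = k.shape[-1], v.shape[-1]
    S = torch.zeros(d_v, d_k)                       # maps keys -> stored values
    # per-token decay gate in (0, 1); baseline rate set by A_log / dt_bias
    decay = torch.exp(-torch.exp(A_log) * F.softplus(dt + dt_bias))
    outs = []
    for t in range(k.shape[0]):
        S = decay[t] * S                            # forget old memory
        err = S @ k[t] - v[t]                       # what the state recalls vs. the target value
        S = S - beta[t] * torch.outer(err, k[t])    # delta-rule write along k[t]
        outs.append(S @ q[t])                       # read: output comes from the state, not a token lookup
    return torch.stack(outs)

Every token is read through S, so whatever the trained projections and gates make S retain is what downstream computation sees; that is the structural sense in which the compressed state is direction-shaped.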
2026-03-31 01:58:50 -04:00
Directory contents (entry, last commit, date):

checkpoint                  checkpoint: sync live weights back into model safetensors in-place  2026-03-30 22:55:23 -04:00
research                    research: GDN gradient flow — disposition architecture in linear attention  2026-03-31 01:58:50 -04:00
apollo_mini.py              apollo: rewrite optimizer from paper's math + add research analysis  2026-03-31 00:54:17 -04:00
apollo_worker.py            apollo: make rank configurable (default 1 = Mini, higher ranks for experimentation)  2026-03-30 22:06:31 -04:00
DESIGN.md                   DESIGN.md: complete rewrite reflecting validated architecture  2026-03-31 00:42:53 -04:00
export_weights.py           apollo-mini training system: initial implementation  2026-03-30 22:02:37 -04:00
start_vllm_with_apollo.sh   vllm launcher with apollo hook  2026-03-30 22:24:02 -04:00
train.py                    apollo-mini training system: initial implementation  2026-03-30 22:02:37 -04:00
training_example.py         apollo-mini training system: initial implementation  2026-03-30 22:02:37 -04:00
vllm_export_hook.py         apollo-checkpoint: efficient diff-based GPU weight checkpointing  2026-03-30 22:53:17 -04:00
weight_mapping.py           weight_mapping: strip language_model prefix to match HF text model names  2026-03-30 23:11:03 -04:00
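For orientation on the apollo_mini.py entry above, here is a minimal, hedged sketch of the rank-1 ("Mini") idea as described in the APOLLO paper, not this repo's implementation; the class and variable names are assumptions. The gradient is projected through a fixed random rank-1 map, Adam-style moments live only in that tiny projected space, and the raw gradient is rescaled by the resulting norm ratio.

import torch

class ApolloMiniSketch:
    """Illustrative rank-1 APOLLO-style update for one 2-D weight tensor.

    Assumptions (not taken from apollo_mini.py): a fixed random rank-1
    projection, Adam-style moments kept only on the projected gradient,
    and a tensor-wise scaling factor applied to the raw gradient.
    """

    def __init__(self, weight, lr=1e-4, betas=(0.9, 0.999), eps=1e-8):
        self.w, self.lr, self.betas, self.eps = weight, lr, betas, eps
        out_dim, in_dim = weight.shape
        self.p = torch.randn(out_dim) / out_dim**0.5   # fixed rank-1 projection vector
        self.m = torch.zeros(in_dim)                   # first moment, projected space
        self.v = torch.zeros(in_dim)                   # second moment, projected space
        self.t = 0

    @torch.no_grad()
    def step(self, grad):
        self.t += 1
        b1, b2 = self.betas
        r = self.p @ grad                              # project gradient to rank 1
        self.m.mul_(b1).add_(r, alpha=1 - b1)
        self.v.mul_(b2).addcmul_(r, r, value=1 - b2)
        m_hat = self.m / (1 - b1**self.t)
        v_hat = self.v / (1 - b2**self.t)
        r_adam = m_hat / (v_hat.sqrt() + self.eps)     # Adam-style step in projected space
        scale = r_adam.norm() / (r.norm() + self.eps)  # tensor-wise scaling factor
        self.w.add_(grad, alpha=-self.lr * scale.item())

In the real system the projection rank is presumably the configurable parameter mentioned in apollo_worker.py's commit message (default 1 = Mini, higher ranks for experimentation).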