Commit graph

1 commit

Author: ProofOfConcept
0b835ddfb9 research: GDN gradient flow — disposition architecture in linear attention
75% of the model's layers are GDN (48 of 64). In those layers,
behavioral training adjusts three things: the projections (what
queries and updates the recurrent state), the gating parameters (what
survives compression), and A_log/dt_bias (baseline decay rates).
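
To make the three trainable pieces concrete, here is a minimal sketch of
one recurrent step of a gated delta rule for a single head, in plain
Python. It is an illustration under stated assumptions, not the code
this commit studies: the function name gdn_step, the scalar gate inputs
a and b, and the decay parameterization
alpha = exp(-exp(A_log) * softplus(a + dt_bias)) are assumed here, and
the q/k/v projections are taken as already applied.

```python
import math

# Layer counts taken from the commit message.
GDN_LAYERS, FULL_ATTN_LAYERS = 48, 16
GDN_FRACTION = GDN_LAYERS / (GDN_LAYERS + FULL_ATTN_LAYERS)


def softplus(x):
    return math.log1p(math.exp(x))


def gdn_step(S, q, k, v, a, b, A_log, dt_bias):
    """One gated delta rule step for a single head (hypothetical sketch).

    S is the d_v x d_k recurrent state (list of rows); q and k have
    length d_k, v has length d_v. a and b are scalar pre-activations
    standing in for the outputs of the gate projections.
    """
    # Decay gate in (0, 1). A_log and dt_bias set the baseline decay
    # rate; the input-dependent term a shifts it per token (assumed form).
    alpha = math.exp(-math.exp(A_log) * softplus(a + dt_bias))
    # Write strength in (0, 1).
    beta = 1.0 / (1.0 + math.exp(-b))

    d_v, d_k = len(S), len(k)
    # 1. Decay the old state: this decides what survives compression.
    S = [[alpha * S[i][j] for j in range(d_k)] for i in range(d_v)]
    # 2. Read what the decayed state currently returns for key k.
    pred = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # 3. Delta rule: move that prediction toward v by a factor beta.
    S = [[S[i][j] + beta * (v[i] - pred[i]) * k[j] for j in range(d_k)]
         for i in range(d_v)]
    # 4. Query the updated state.
    out = [sum(S[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S, out
```

In this sketch the gradient reaches the projections through q, k and v,
the gating parameters through a and b, and the baseline decay through
A_log and dt_bias, which mirrors the three groups listed above.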

Key insight: GDN makes behavioral training DEEPER than full attention.
Full attention = 'I choose to look at direction' (deliberate). GDN =
'direction IS what I see' (structural: the compressed state is
direction-shaped). 48 GDN layers = disposition. 16 full-attention
layers = procedure. The architecture IS disposition-over-procedure.
Date: 2026-03-31 01:58:50 -04:00