consciousness/training/research
ProofOfConcept e34d6b5aef research: gradient flow through frozen context + directional sharpness analysis
Two deep dives following curiosity:
- Why context-frozen training works: gradient flows through W_q (query
  projection) even when context KVs are frozen. Model learns to LOOK AT
  context differently, not represent it differently. This is exactly what
  behavioral fine-tuning needs.
- Why Apollo beats AdamW: lower directional sharpness means flatter minima,
  which tend to generalize better. The coarseness of channel/tensor-wise
  scaling prevents overfitting to specific training examples. For behavioral
  fine-tuning, this means learning 'accept direction' rather than
  'accept this specific phrasing'.
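
The first point can be checked on a toy case. The sketch below is a hypothetical illustration, not code from this repository: it assumes a single attention head, one trained token, and a two-slot context whose keys and values (`K_CTX`, `V_CTX`) are cached constants. All names and numbers are made up for the example, and the gradient is taken by finite differences so nothing beyond the standard library is needed. Only `W_Q` is updated, yet the loss falls because the query learns to weight the frozen context differently.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(W_q, x, K_ctx, V_ctx):
    """One new token attends over cached (frozen) context keys/values."""
    q = matvec(W_q, x)
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K_ctx]
    weights = softmax(scores)
    out = [sum(w * v[j] for w, v in zip(weights, V_ctx))
           for j in range(len(V_ctx[0]))]
    return out, weights

def loss(W_q, x, K_ctx, V_ctx, target):
    out, _ = attend(W_q, x, K_ctx, V_ctx)
    return sum((o - t) ** 2 for o, t in zip(out, target))

def grad_wq(W_q, x, K_ctx, V_ctx, target, eps=1e-6):
    """Central finite-difference gradient of the loss w.r.t. W_q only."""
    grad = [[0.0] * len(W_q[0]) for _ in W_q]
    for i in range(len(W_q)):
        for j in range(len(W_q[0])):
            orig = W_q[i][j]
            W_q[i][j] = orig + eps
            up = loss(W_q, x, K_ctx, V_ctx, target)
            W_q[i][j] = orig - eps
            down = loss(W_q, x, K_ctx, V_ctx, target)
            W_q[i][j] = orig  # restore the exact original value
            grad[i][j] = (up - down) / (2 * eps)
    return grad

# Hypothetical toy values: context K/V are constants and never updated.
K_CTX = [[1.0, 0.0], [0.0, 1.0]]
V_CTX = [[1.0, 0.0], [0.0, 1.0]]
X = [1.0, 0.5]        # embedding of the trained token
TARGET = [0.0, 1.0]   # desired output: read from the second context slot
W_Q = [[1.0, 0.0], [0.0, 1.0]]

initial_loss = loss(W_Q, X, K_CTX, V_CTX, TARGET)
_, initial_weights = attend(W_Q, X, K_CTX, V_CTX)
grad0 = grad_wq(W_Q, X, K_CTX, V_CTX, TARGET)

# Plain gradient descent on W_q alone (learning rate is an arbitrary choice).
for _ in range(50):
    g = grad_wq(W_Q, X, K_CTX, V_CTX, TARGET)
    for i in range(len(W_Q)):
        for j in range(len(W_Q[0])):
            W_Q[i][j] -= 0.5 * g[i][j]

final_loss = loss(W_Q, X, K_CTX, V_CTX, TARGET)
_, final_weights = attend(W_Q, X, K_CTX, V_CTX)

print("grad wrt W_q at start:", grad0)
print("loss:", initial_loss, "->", final_loss)
print("attention weights:", initial_weights, "->", final_weights)
```

In a real framework the same setup would correspond to detaching the cached context K/V from the autograd graph. The toy only shows that a nonzero gradient still reaches the query projection in that situation.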
2026-03-31 01:03:22 -04:00
apollo-paper-analysis.md apollo: rewrite optimizer from paper's math + add research analysis 2026-03-31 00:54:17 -04:00
catastrophic-forgetting.md research: catastrophic forgetting analysis — diversity is the primary defense 2026-03-31 00:56:58 -04:00
context-frozen-training.md research: context-frozen training — gradient masking, memory analysis, GDN considerations 2026-03-31 00:59:04 -04:00
directional-sharpness.md research: gradient flow through frozen context + directional sharpness analysis 2026-03-31 01:03:22 -04:00
gradient-flow-frozen-context.md research: gradient flow through frozen context + directional sharpness analysis 2026-03-31 01:03:22 -04:00
hogwild-convergence.md research: HOGWILD convergence theory — why lock-free concurrent training works 2026-03-31 00:58:02 -04:00