consciousness/training/research
ProofOfConcept e7e1855b87 research: ORPO — combined SFT + preference in one step, ideal for behavioral training
ORPO applies a minor penalty to the disfavored response during SFT:
one learning rate, one pass, both objectives. It implements the
bypass mechanism naturally (a minor penalty disfavors a response
rather than removing it). The loss-landscape geometry explains the
40x learning-rate gap: SFT is a valley, DPO is a ridge, ORPO combines
both. LLaMA-Factory supports it. The dream loop generates the required
triplets (context + preferred + rejected).
Dream loop generates triplets (context + preferred + rejected).
2026-03-31 02:51:26 -04:00
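The combined objective described above can be sketched in a few lines. This is a minimal illustration, not the LLaMA-Factory implementation: ORPO adds an odds-ratio penalty, scaled by a small weight, on top of the plain SFT loss, so the rejected response is disfavored rather than pushed to zero. The helper names, the toy averaged log-probs, and the `lam` weight are all assumptions for illustration.

```python
import math

def seq_logprob(token_logprobs):
    # Length-normalized log-probability of a response under the model.
    return sum(token_logprobs) / len(token_logprobs)

def log_odds(avg_logprob):
    # log(p / (1 - p)) for the average token probability p.
    p = math.exp(avg_logprob)
    return math.log(p / (1.0 - p))

def orpo_loss(chosen_logprobs, rejected_logprobs, lam=0.1):
    # SFT term: negative log-likelihood of the preferred response.
    sft = -sum(chosen_logprobs)
    # Odds-ratio term: -log sigmoid(log-odds gap). Small when the
    # chosen response is already much more likely than the rejected one,
    # so the penalty disfavors rather than removes the rejected behavior.
    gap = log_odds(seq_logprob(chosen_logprobs)) - log_odds(seq_logprob(rejected_logprobs))
    penalty = -math.log(1.0 / (1.0 + math.exp(-gap)))
    return sft + lam * penalty
```

With `lam=0` this reduces to plain SFT; increasing `lam` trades off against the preference signal, which is why a single learning rate can serve both objectives in one pass.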
v0 research: distill and sift — SUMMARY of 7 real insights + 7 testable questions 2026-03-31 02:26:57 -04:00
apollo-paper-analysis.md apollo: rewrite optimizer from paper's math + add research analysis 2026-03-31 00:54:17 -04:00
context-frozen-training.md research: context-frozen training — gradient masking, memory analysis, GDN considerations 2026-03-31 00:59:04 -04:00
gdn-gradient-flow.md research: GDN gradient flow — disposition architecture in linear attention 2026-03-31 01:58:50 -04:00
gradient-flow-frozen-context.md research: gradient flow through frozen context + directional sharpness analysis 2026-03-31 01:03:22 -04:00
hogwild-convergence.md research: HOGWILD convergence theory — why lock-free concurrent training works 2026-03-31 00:58:02 -04:00
OPEN-QUESTIONS.md research: distill and sift — SUMMARY of 7 real insights + 7 testable questions 2026-03-31 02:26:57 -04:00
practical-intuitions.md research: ORPO — combined SFT + preference in one step, ideal for behavioral training 2026-03-31 02:51:26 -04:00
steering-vectors-bridge.md research: steering vectors — prototype behavioral changes before training 2026-03-31 02:19:50 -04:00
SUMMARY.md research: distill and sift — SUMMARY of 7 real insights + 7 testable questions 2026-03-31 02:26:57 -04:00
task-vectors-model-merging.md research: task vectors + model merging — version control for personality 2026-03-31 02:18:15 -04:00