ORPO applies a minor penalty to the disfavored response during SFT: a single learning rate, a single pass, both objectives in one loss. It implements the bypass mechanism naturally (a minor penalty disfavors a response rather than removing it). The loss-landscape geometry explains the ~40x learning-rate gap: SFT sits in a valley, DPO on a ridge, and ORPO combines both. LLaMA-Factory supports it, and the dream loop generates the required triplets (context + preferred + rejected).
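A minimal sketch of that single-pass objective, assuming sequence-level log-probabilities as inputs; the `lam` weight and helper names are illustrative, not LLaMA-Factory's implementation:

```python
import math

def log_odds(logp: float) -> float:
    # log odds of a sequence given its log-probability: log(p / (1 - p));
    # requires logp < 0 so that p < 1.
    return logp - math.log1p(-math.exp(logp))

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """ORPO sketch: ordinary SFT negative log-likelihood on the chosen
    response, plus a small odds-ratio penalty that disfavors (but does
    not erase) the rejected one."""
    nll = -logp_chosen  # the SFT term, unchanged
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    penalty = -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log sigmoid(ratio)
    return nll + lam * penalty
```

Because the penalty is weighted by a small `lam`, the objective stays dominated by the SFT term, which is why one learning rate and one pass suffice.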
Files:

- v0
- apollo-paper-analysis.md
- context-frozen-training.md
- gdn-gradient-flow.md
- gradient-flow-frozen-context.md
- hogwild-convergence.md
- OPEN-QUESTIONS.md
- practical-intuitions.md
- steering-vectors-bridge.md
- SUMMARY.md
- task-vectors-model-merging.md