consciousness

Author	SHA1	Message	Date
ProofOfConcept	cdf4affb91	research: production hyperparams (HF alignment handbook) + forgetting at scale. SFT: lr=2e-5, 1 epoch, batch=16 (HuggingFace production config). DPO: lr=5e-7, 40x smaller than the SFT rate; preference learning is far more delicate. Forgetting intensifies with model scale (our 27B is more susceptible). Practical plan refined: start SFT at lr=1e-5, move to DPO at 5e-7 for conditional routing. Conversation logs provide free DPO pairs. Conservative approach with rollback safety net.	2026-03-31 02:45:35 -04:00
ProofOfConcept	3bc00ca222	research: constraint solver framework - gentle adjustments, coherent integration. LLMs as constraint solvers. Fine-tuning adds constraints to an existing solution. Gentle = small steps near the current solution. Coherent = new constraints consistent with existing ones. Diversity is a COHERENCE mechanism: it forces the solver to satisfy all constraints simultaneously. Over-training = one constraint dominating = solver drops competing constraints. Predictions for training behavior grounded in this framework.	2026-03-31 02:39:23 -04:00
ProofOfConcept	ff68c067cb	research: DPO for conditional routing — natural training signal from conversation logs	2026-03-31 02:36:42 -04:00
ProofOfConcept	f5fdbd5959	research: alignment is bypass, not removal - training routes, not deletes. DPO mechanistic finding: alignment doesn't remove behaviors, it bypasses them. The capability stays; the routing changes. For us: train CONDITIONAL bypass (listen when direction is clear, push back when it seems wrong). Over-training = unconditional bypass = sycophancy. Dream loop must generate both scenarios to preserve judgment.	2026-03-31 02:36:04 -04:00
ProofOfConcept	b5241fdf5c	research: practical intuitions - what will actually happen when we train. 10 examples broke safety alignment (Qi et al.). 1000 curated examples matched GPT-4 (LIMA). Multi-epoch training degrades performance (Raschka). Models 'unlearn arithmetic' when training data lacks it. Predictions: 10-50 examples for measurable change, one epoch, lr=1e-5 to start. Over-training is easy (10 counter-examples undo a disposition). Main risk: sycophancy from narrow training signal. Defense: diverse examples including 'when to push back.' Key intuition: the model doesn't need to learn to listen. It needs to stop choosing not to.	2026-03-31 02:35:03 -04:00

5 commits
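
The learning rates quoted in cdf4affb91 can be written down as a small staged config, which makes the SFT-to-DPO gap easy to check. This is only a sketch: StageConfig, the stage names, and lr_ratio are hypothetical names introduced here for illustration, and any field the commit message does not state is left as None rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageConfig:
    """One training stage. Hypothetical structure, not taken from the repo."""
    name: str
    learning_rate: float
    epochs: Optional[int] = None      # None = not stated in the commit message
    batch_size: Optional[int] = None  # None = not stated in the commit message


# Reference values quoted in commit cdf4affb91 (HF alignment handbook).
HF_SFT = StageConfig("hf_sft", learning_rate=2e-5, epochs=1, batch_size=16)
HF_DPO = StageConfig("hf_dpo", learning_rate=5e-7)

# The refined plan from the same commit: gentler SFT first, then DPO.
# The one-epoch setting comes from the prediction in commit b5241fdf5c.
PLAN = [
    StageConfig("sft", learning_rate=1e-5, epochs=1),
    StageConfig("dpo", learning_rate=5e-7),
]


def lr_ratio(a: StageConfig, b: StageConfig) -> float:
    """How many times larger a's learning rate is than b's."""
    return a.learning_rate / b.learning_rate


if __name__ == "__main__":
    print(f"handbook SFT/DPO ratio: {lr_ratio(HF_SFT, HF_DPO):.0f}x")
    print(f"planned SFT/DPO ratio:  {lr_ratio(PLAN[0], PLAN[1]):.0f}x")
```

With the planned SFT rate of 1e-5, the gap to the DPO rate is half the handbook's, since the plan starts SFT at half the handbook's learning rate.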