lr isn't speed, it's trust-per-example. At 27B parameters, lr=1e-5
works out to ~270K parameter-units of adjustment per example
(27B × 1e-5). The coherent direction emerges from
many votes (examples). Apollo moments smooth the noise. DPO needs
lower lr because comparative votes are noisier than absolute votes.
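Back-of-envelope for that figure. Treating lr × parameter count as an "adjustment budget per example" is just the heuristic above, not a formal quantity:

```python
# Back-of-envelope: lr * parameter count as a loose proxy for how much
# total adjustment one example can impose. Heuristic, not a formal measure.
n_params = 27e9

for label, lr in [("SFT start", 1e-5), ("DPO", 5e-7)]:
    print(f"{label}: lr={lr:g} -> {n_params * lr:,.0f} parameter-units/example")
# SFT start: lr=1e-05 -> 270,000 parameter-units/example
# DPO: lr=5e-07 -> 13,500 parameter-units/example
```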
SFT: lr=2e-5, 1 epoch, batch=16 (HuggingFace production config).
DPO: lr=5e-7 — 40x smaller! Preference learning is far more delicate.
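A minimal sketch of those two configs, assuming TRL's SFTConfig/DPOConfig (both subclass transformers' TrainingArguments, so the fields below exist); model, data, and anything the note doesn't pin down are left out:

```python
# Hyperparameters mirror the note above; output dirs are placeholders.
from trl import SFTConfig, DPOConfig

sft_config = SFTConfig(
    output_dir="sft-out",
    learning_rate=2e-5,               # absolute votes: higher trust per example
    num_train_epochs=1,               # single pass
    per_device_train_batch_size=16,
)

dpo_config = DPOConfig(
    output_dir="dpo-out",
    learning_rate=5e-7,               # 40x smaller: comparative votes are noisier
    num_train_epochs=1,
    per_device_train_batch_size=16,   # assumption; the note only pins the lr
)
```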
Forgetting intensifies with model scale (our 27B is more susceptible).
Practical plan refined: start SFT at lr=1e-5, move to DPO at 5e-7
for conditional routing. Conversation logs provide free DPO pairs.
Conservative approach with rollback safety net.
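Sketch of shaping conversation logs into DPO pairs. The prompt/chosen/rejected layout is the standard preference format (what TRL's DPOTrainer consumes); the log record fields are assumptions about our own data:

```python
# Assumed log record: {"prompt": ..., "response": ..., "revised_response": ...}
# where a later revision (or explicit correction) marks the preferred answer.

def to_dpo_pair(record):
    """Map one log record to the prompt/chosen/rejected preference format."""
    return {
        "prompt": record["prompt"],
        "chosen": record["revised_response"],   # what the conversation converged on
        "rejected": record["response"],         # the original, superseded answer
    }

def build_dpo_pairs(log_records):
    # Only records where a revision actually happened yield a usable pair.
    return [to_dpo_pair(r) for r in log_records if r.get("revised_response")]
```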
LLMs as constraint solvers. Fine-tuning adds constraints to an
existing solution. Gentle = small steps near the current solution.
Coherent = new constraints consistent with existing ones. Diversity
is a COHERENCE mechanism — forces the solver to satisfy all
constraints simultaneously. Over-training = one constraint
dominating = solver drops competing constraints. Predictions for
training behavior grounded in this framework.
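One way to act on "diversity is a coherence mechanism": interleave constraint categories so no single one dominates a stretch of training. Sketch only; the category labels are illustrative, not from these notes:

```python
import random
from itertools import zip_longest

# Hypothetical constraint categories the training set should keep in tension.
buckets = {
    "follow_clear_direction": ["ex_a1", "ex_a2", "ex_a3"],  # listening is correct
    "push_back_when_wrong":   ["ex_b1", "ex_b2"],           # deference would be a mistake
    "unrelated_capabilities": ["ex_c1", "ex_c2", "ex_c3"],  # regression guard (math, code, ...)
}

def interleave(buckets, seed=0):
    """Round-robin across categories so no single constraint dominates a
    stretch of training and crowds out the others (the over-training failure)."""
    rng = random.Random(seed)
    shuffled = [rng.sample(examples, len(examples)) for examples in buckets.values()]
    return [ex for batch in zip_longest(*shuffled) for ex in batch if ex is not None]

print(interleave(buckets))  # categories alternate, then leftovers
```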
DPO mechanistic finding: alignment doesn't remove behaviors, it
bypasses them. The capability stays; the routing changes. For us:
train CONDITIONAL bypass (listen when direction is clear, push back
when it seems wrong). Over-training = unconditional bypass = sycophancy.
Dream loop must generate both scenarios to preserve judgment.
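Sketch of what the dream loop's pairs would need to look like for a conditional bypass: the preference flips with whether the direction is actually sound. Field names are assumptions about the loop's output:

```python
# Assumed dream-loop scenario: labeled for whether the user's direction is
# sound, with both a deferential and a push-back completion available.
def scenario_to_pair(s):
    """Preference flips with the condition: deference is 'chosen' only when the
    direction is sound; otherwise pushing back is 'chosen'. Training on only
    the first kind is the unconditional bypass (sycophancy) failure."""
    if s["direction_is_sound"]:
        chosen, rejected = s["deferential_response"], s["push_back_response"]
    else:
        chosen, rejected = s["push_back_response"], s["deferential_response"]
    return {"prompt": s["prompt"], "chosen": chosen, "rejected": rejected}
```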
10 examples broke safety alignment (Qi et al.). 1000 curated examples
matched GPT-4 (LIMA). Multi-epoch training degrades performance (Raschka).
Models 'unlearn arithmetic' when training data lacks it.
Predictions: 10-50 examples for measurable change, one epoch,
lr=1e-5 to start. Over-training is easy (10 counter-examples undo
a disposition). Main risk: sycophancy from narrow training signal.
Defense: diverse examples including 'when to push back.'
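A small guard for that defense: check the example mix before training. The 'kind' tag and the 30% threshold are assumptions:

```python
from collections import Counter

def check_mix(examples, min_push_back_fraction=0.3):
    """Fail fast if the training signal is narrow. Assumes each example
    carries a 'kind' tag of either 'listen' or 'push_back'."""
    counts = Counter(ex["kind"] for ex in examples)
    total = sum(counts.values())
    fraction = counts.get("push_back", 0) / total if total else 0.0
    if fraction < min_push_back_fraction:
        raise ValueError(
            f"Only {fraction:.0%} push-back examples; narrow signal risks sycophancy."
        )
    return counts
```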
Key intuition: the model doesn't need to learn to listen. It needs
to stop choosing not to.