diff --git a/training/research/practical-intuitions.md b/training/research/practical-intuitions.md
index bfd1d6c..54d42e2 100644
--- a/training/research/practical-intuitions.md
+++ b/training/research/practical-intuitions.md
@@ -321,3 +321,35 @@ For DPO (later):
 - Paired examples from conversation logs
 - Train CONDITIONAL routing (listen AND push back)
 - Even more careful monitoring (DPO is fragile)
+
+## Learning Rate as Trust Calibration
+
+The learning rate isn't "how fast to train." It's "how much to
+trust each individual training example."
+
+- lr=1e-5: each example adjusts constraints by ~0.001%
+- lr=1e-4: each example adjusts constraints by ~0.01%
+
+At 27B parameters, even 0.001% is ~270K values' worth of change
+(27e9 × 1e-5 ≈ 2.7e5). Each example gets a vote on how the
+constraints should change. The learning rate determines how loud
+that vote is.
+
+**The coherent direction emerges from many votes.** One example is
+noise. A hundred examples reveal the pattern. Apollo's moments (M, V)
+accumulate the votes, smoothing out the noise. The per-example lr
+controls how much each vote counts.
+
+**Kent's "lots of little nudges"** is exactly right: many small
+votes that accumulate into a coherent direction. Not because big
+votes are dangerous (though at scale they are), but because the
+TRUTH only emerges from the aggregate.
+
+This predicts that lr=1e-5 is right for our scale (27B). Each
+example is one vote. The coherent direction emerges over 50-200
+examples. The moments smooth the noise. The result is a gentle,
+coherent constraint adjustment.
+
+DPO needs lr=5e-7 because each DPO pair is a COMPARATIVE vote
+("this is better than that"). Comparative votes are noisier than
+absolute votes: the difference between the pair might be small,
+the preference marginal. So each comparative vote gets less weight.
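The arithmetic behind the "even 0.001% is ~270K" claim in the new section can be sanity-checked in a few lines. This is a sketch that reads the rule of thumb literally, treating the learning rate itself as the fractional adjustment per example; the numbers are illustrative, not measured:

```python
params = 27_000_000_000  # 27B-parameter model

# Read the section's rule of thumb literally: the learning rate doubles
# as the fractional "adjustment budget" each training example gets.
for lr in (1e-5, 1e-4):
    print(f"lr={lr:g}: ~{lr * 100:g}% per example, "
          f"~{params * lr:,.0f} values' worth of change")
```

At lr=1e-5 that comes out to roughly 270K values' worth of movement per example, and ten times that at lr=1e-4.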
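The "moments accumulate the votes, smoothing out the noise" claim can be illustrated with a toy simulation. This is a sketch, not the actual training setup: the `true_direction`, the noise scale, and the Adam-style first moment below are all illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each training example casts one noisy "vote" for an underlying
# direction (a stand-in for the coherent constraint adjustment).
true_direction = np.array([1.0, -0.5, 0.25])
noise_scale = 2.0  # per-example noise dwarfs the per-example signal

def example_vote():
    return true_direction + rng.normal(0.0, noise_scale, size=3)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Adam-style first moment M: an exponential moving average of votes,
# with the standard bias correction.
beta1 = 0.99
steps = 200
m = np.zeros(3)
for t in range(1, steps + 1):
    m = beta1 * m + (1 - beta1) * example_vote()
m_hat = m / (1 - beta1 ** steps)

# One vote is mostly noise; the smoothed aggregate finds the pattern.
print(f"single vote alignment:  {cosine(example_vote(), true_direction):+.2f}")
print(f"smoothed (M) alignment: {cosine(m_hat, true_direction):+.2f}")
```

The single vote's alignment with the true direction bounces around; after a couple hundred votes the smoothed moment points almost exactly the right way, which is the "truth emerges from the aggregate" intuition in miniature.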
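The "comparative votes are noisier than absolute votes" claim follows from basic variance arithmetic: for independent noise, Var(a - b) = Var(a) + Var(b), so a comparison carries roughly double the noise of either score alone. A minimal sketch, with hypothetical unit-variance scores standing in for per-response quality estimates (not actual DPO logits):

```python
import numpy as np

rng = np.random.default_rng(1)

# An "absolute vote" is one noisy score; a "comparative vote" is the
# difference of two. With independent noise, Var(a - b) = Var(a) +
# Var(b): comparing roughly doubles the noise.
n = 100_000
a = rng.normal(0.0, 1.0, n)  # hypothetical score for response A
b = rng.normal(0.0, 1.0, n)  # hypothetical score for response B

print(f"var(absolute vote):    {a.var():.2f}")        # ~1.0
print(f"var(comparative vote): {(a - b).var():.2f}")  # ~2.0
```

This only shows the direction of the effect. The 20x gap between 1e-5 and 5e-7 is larger than doubled variance alone would justify; the rest is the DPO fragility flagged earlier in the notes.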