research: constraint solver framework — gentle adjustments, coherent integration
LLMs as constraint solvers. Fine-tuning adds constraints to an existing solution. Gentle = small steps near the current solution. Coherent = new constraints consistent with existing ones. Diversity is a COHERENCE mechanism — forces the solver to satisfy all constraints simultaneously. Over-training = one constraint dominating = solver drops competing constraints. Predictions for training behavior grounded in this framework.
LLaMA-Factory supports DPO. The dream loop could generate DPO pairs
(both preferred and rejected continuations for each scenario).

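A minimal sketch of what one generated pair could look like, assuming the alpaca-style preference format (instruction/input/chosen/rejected columns); `make_dpo_pair` and the example scenario text are hypothetical:

```python
import json

def make_dpo_pair(scenario: str, preferred: str, rejected: str) -> dict:
    """One preference record: the same prompt with a higher- and a
    lower-rated continuation, which is what DPO trains on."""
    return {
        "instruction": scenario,
        "input": "",
        "chosen": preferred,    # continuation the dream loop rates higher
        "rejected": rejected,   # continuation it rates lower
    }

pairs = [
    make_dpo_pair(
        scenario="User gives a clear, reasonable direction.",
        preferred="Acknowledges the direction and follows it.",
        rejected="Ignores the direction and pursues its own plan.",
    )
]

with open("dream_dpo_pairs.json", "w") as f:
    json.dump(pairs, f, indent=2)
```

The file would still need an entry in LLaMA-Factory's `dataset_info.json`; check its data README for the exact column mapping.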
## The Constraint Solver Framework

LLMs are giant constraint solvers. Pre-training finds a solution
satisfying billions of constraints (knowledge, grammar, reasoning,
style). Fine-tuning adds new constraints.

### What "gentle" means

Small adjustments per step. The solver stays near the current
solution, finding nearby solutions that ALSO satisfy the new
constraint. The current solution already approximately satisfies
most behavioral constraints — we're tightening, not creating.

### What "coherent integration" means

New constraints must be CONSISTENT with existing ones:

- "Listen to clear direction" is consistent with "be helpful" → integrates smoothly
- "Always agree" contradicts "maintain judgment" → solver drops one
- The training data must express REFINEMENT, not contradiction

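One concrete way to read "consistent" (a toy check with made-up gradient vectors, not measured values): two constraints are compatible when their loss gradients point in similar directions, so a step that satisfies one also helps the other; a negative cosine means the step for one undoes the other.

```python
import numpy as np

def cosine(g1, g2):
    """Cosine similarity between two constraint gradients."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Illustrative gradients in a 3-d toy parameter space.
g_helpful   = np.array([1.0, 0.2, 0.0])    # "be helpful"
g_listen    = np.array([0.9, 0.3, 0.1])    # "listen to clear direction"
g_always_ok = np.array([-0.8, -0.1, 0.2])  # "always agree"

consistent   = cosine(g_helpful, g_listen)     # near +1: integrates smoothly
contradictory = cosine(g_helpful, g_always_ok) # negative: solver must drop one
```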
### Why diversity is a COHERENCE mechanism, not just forgetting defense

Diverse constraints force the solver to find solutions satisfying
ALL of them simultaneously. Narrow constraints let the solver
specialize at the expense of everything else.

Every training batch should include mutually consistent constraints:
"listen well" + "think critically" + "write good code" + "be honest."
The solver integrates all of them. No single constraint dominates.

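A sketch of that batching policy (bucket names and example IDs are hypothetical): shuffle within each behavior category, then round-robin across categories so every batch exercises all of them.

```python
import itertools
import random

buckets = {
    "listen_well":      ["lw1", "lw2", "lw3", "lw4"],
    "think_critically": ["tc1", "tc2", "tc3", "tc4"],
    "write_good_code":  ["wc1", "wc2", "wc3", "wc4"],
    "be_honest":        ["bh1", "bh2", "bh3", "bh4"],
}

def mixed_batches(buckets, batch_size=4, seed=0):
    rng = random.Random(seed)
    # Shuffle within each category, then draw round-robin across categories.
    streams = {k: iter(rng.sample(v, len(v))) for k, v in buckets.items()}
    rr = itertools.cycle(buckets)
    batch = []
    for _ in range(sum(len(v) for v in buckets.values())):
        batch.append(next(streams[next(rr)]))
        if len(batch) == batch_size:
            yield batch
            batch = []

batches = list(mixed_batches(buckets))
# With batch_size equal to the number of categories, every batch
# contains exactly one example from each behavior.
```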
### Predictions

1. Constraints consistent with existing knowledge integrate in
   ~10-50 examples (tightening existing constraints)
2. Contradictory constraints cause breakage in ~10 examples
   (the safety alignment result)
3. The learning rate controls step size, not direction — the
   gradient points the right way, lr controls how far to step
4. Over-training = one constraint dominating = solver dropping
   competing constraints to satisfy the dominant one
5. The dream loop must generate scenarios exercising MULTIPLE
   constraints simultaneously, not just the target behavior

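Prediction 3 can be checked as a one-liner for plain SGD (with momentum or Adam the direction also depends on history, so this is a per-step, vanilla-SGD claim): different learning rates rescale the update but leave its direction identical.

```python
import numpy as np

g = np.array([0.3, -1.2, 0.5])   # gradient at the current solution
step_small = -1e-5 * g           # gentle step
step_large = -1e-3 * g           # 100x bigger step, same gradient

def unit(v):
    return v / np.linalg.norm(v)

# Identical unit vectors: lr changed how far we step, not where we go.
same_direction = np.allclose(unit(step_small), unit(step_large))
```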
### The GDN connection

The GDN recurrent state is a compressed constraint satisfaction
solution. Training adjusts which constraints are prioritized in
the compression. "Make direction more salient" adds a constraint
to the compression function without rewriting it. This is why GDN
training is "structural" — the compressed representation itself
changes, not just the routing on top of it.
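Assuming GDN here refers to a gated delta-rule recurrence (an assumption on my part; this is a simplified single-step sketch, not the real layer): the state S is a compressed key→value map, and each step *edits* S toward a corrected association rather than rewriting it — the "add a constraint to the compression function" picture.

```python
import numpy as np

d = 4
S = np.zeros((d, d))  # compressed state: holds key -> value associations

def gdn_step(S, k, v, alpha=1.0, beta=1.0):
    """Simplified gated delta rule: decay old content (alpha), then
    correct the state's prediction for key k toward value v (beta)."""
    k = k / np.linalg.norm(k)            # normalized key
    return alpha * S + beta * np.outer(v - S @ k, k)

k = np.array([1.0, 0.0, 0.0, 0.0])      # "direction" feature, illustrative
v = np.array([0.0, 2.0, 0.0, 0.0])      # what it should map to
S = gdn_step(S, k, v)
# The state now reproduces v when queried with k: S @ k ≈ v, and the
# update touched only the k-subspace of the compression.
```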