research: constraint solver framework — gentle adjustments, coherent integration
LLMs as constraint solvers. Fine-tuning adds constraints to an existing solution. Gentle = small steps near the current solution. Coherent = new constraints consistent with existing ones. Diversity is a COHERENCE mechanism — forces the solver to satisfy all constraints simultaneously. Over-training = one constraint dominating = solver drops competing constraints. Predictions for training behavior grounded in this framework.
This commit is contained in:
parent ff68c067cb · commit 3bc00ca222 · 1 changed file with 52 additions and 0 deletions
@@ -213,3 +213,55 @@ after the first Apollo training run validates the basic pipeline.

LLaMA-Factory supports DPO. The dream loop could generate DPO pairs
(both preferred and rejected continuations for each scenario).

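A dream-loop pair could be emitted as one preference record per scenario. A minimal sketch; the field names assume LLaMA-Factory's alpaca-style preference format (`instruction`/`input`/`chosen`/`rejected`), so verify them against the current LLaMA-Factory dataset docs before relying on this:

```python
import json

def make_dpo_pair(scenario: str, preferred: str, rejected: str) -> dict:
    # One preference record per scenario: the same prompt with a
    # preferred and a rejected continuation. Field names assume
    # LLaMA-Factory's alpaca-style preference format -- check the docs.
    return {
        "instruction": scenario,
        "input": "",
        "chosen": preferred,
        "rejected": rejected,
    }

pair = make_dpo_pair(
    scenario="The user gives a clear, direct instruction.",
    preferred="Acknowledge the direction and carry it out.",
    rejected="Ignore the direction and do something else.",
)
# A dataset file is a JSON list of such records.
print(json.dumps([pair], indent=2))
```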
## The Constraint Solver Framework

LLMs are giant constraint solvers. Pre-training finds a solution
satisfying billions of constraints (knowledge, grammar, reasoning,
style). Fine-tuning adds new constraints.

### What "gentle" means

Small adjustments per step. The solver stays near the current
solution, finding nearby solutions that ALSO satisfy the new
constraint. The current solution already approximately satisfies
most behavioral constraints: we're tightening, not creating.

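One way to make "stay near the current solution" concrete is an explicit proximity penalty: minimize the new constraint's loss plus λ‖θ − θ₀‖². A toy sketch with invented quadratic losses (a small learning rate and few steps approximate this implicitly; the penalty just makes it explicit):

```python
import numpy as np

theta0 = np.array([1.0, -2.0])   # pre-trained solution
target = np.array([1.5, -2.0])   # what the new constraint prefers

def grad(theta, lam):
    # gradient of 0.5*||theta - target||^2 + lam*||theta - theta0||^2
    return (theta - target) + 2 * lam * (theta - theta0)

def finetune(lam, lr=0.1, steps=500):
    theta = theta0.copy()
    for _ in range(steps):
        theta -= lr * grad(theta, lam)
    return theta

gentle = finetune(lam=1.0)   # pulled toward target but held near theta0
harsh = finetune(lam=0.0)    # free to move all the way to the target
print(gentle, np.linalg.norm(gentle - theta0))
print(harsh, np.linalg.norm(harsh - theta0))
```

The gentle run lands at a nearby solution that partially satisfies the new constraint; the unpenalized run abandons the neighborhood of θ₀ entirely.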
### What "coherent integration" means

New constraints must be CONSISTENT with existing ones:

- "Listen to clear direction" is consistent with "be helpful" → integrates smoothly
- "Always agree" contradicts "maintain judgment" → solver drops one
- The training data must express REFINEMENT, not contradiction

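Consistency between constraints can be probed directly: take each constraint's gradient at the current solution and check their cosine similarity. Positive means the solver can make progress on both at once; strongly negative means one will be traded against the other. A toy sketch, with the target points invented purely for illustration:

```python
import numpy as np

theta = np.array([0.0, 0.0])  # current solution

def grad_towards(target, theta):
    # gradient of 0.5 * ||theta - target||^2
    return theta - target

def cosine(g1, g2):
    return g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2))

# consistent pair: "listen to direction" and "be helpful" pull the same way
g_listen = grad_towards(np.array([1.0, 1.0]), theta)
g_helpful = grad_towards(np.array([1.0, 0.5]), theta)

# contradictory pair: "always agree" vs "maintain judgment" pull apart
g_agree = grad_towards(np.array([1.0, 0.0]), theta)
g_judge = grad_towards(np.array([-1.0, 0.0]), theta)

c_ok = cosine(g_listen, g_helpful)
c_bad = cosine(g_agree, g_judge)
print(c_ok)   # positive: integrates smoothly
print(c_bad)  # negative: the solver must drop one
```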
### Why diversity is a COHERENCE mechanism, not just forgetting defense

Diverse constraints force the solver to find solutions satisfying
ALL of them simultaneously. Narrow constraints let the solver
specialize at the expense of everything else.

Every training batch should include mutually consistent constraints:
"listen well" + "think critically" + "write good code" + "be honest."
The solver integrates all of them. No single constraint dominates.

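A minimal way to enforce "every batch exercises all constraints" is round-robin interleaving across per-constraint example pools before batching. The pool names and strings below are placeholders; in practice each pool would hold dream-loop examples for one behavioral constraint:

```python
# Placeholder pools, one per behavioral constraint.
pools = {
    "listen": [f"listen_{i}" for i in range(8)],
    "think": [f"think_{i}" for i in range(8)],
    "code": [f"code_{i}" for i in range(8)],
    "honest": [f"honest_{i}" for i in range(8)],
}

def mixed_batches(pools, batch_size):
    # Round-robin across constraints so every batch contains
    # examples from each pool, not a narrow slice of one.
    interleaved = [ex for group in zip(*pools.values()) for ex in group]
    return [interleaved[i:i + batch_size]
            for i in range(0, len(interleaved), batch_size)]

batches = mixed_batches(pools, batch_size=4)
for batch in batches:
    print(batch)  # each batch of 4 holds one example per constraint
```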
### Predictions

1. Constraints consistent with existing knowledge integrate in
   ~10-50 examples (tightening existing constraints)
2. Contradictory constraints cause breakage in ~10 examples
   (the safety alignment result)
3. The learning rate controls step size, not direction: the
   gradient points the right way, lr controls how far to step
4. Over-training = one constraint dominating = solver dropping
   competing constraints to satisfy the dominant one
5. The dream loop must generate scenarios exercising MULTIPLE
   constraints simultaneously, not just the target behavior

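Prediction 4 can be demonstrated in miniature: start from a solution that balances two quadratic "constraints," then train only on one of them and watch the other's loss climb. All numbers here are invented for illustration:

```python
import numpy as np

t_a = np.array([1.0, 0.0])   # constraint A's preferred solution
t_b = np.array([0.0, 1.0])   # constraint B's preferred solution

def loss(theta, t):
    return 0.5 * np.sum((theta - t) ** 2)

# Balanced starting point: minimizes loss_a + loss_b.
theta = (t_a + t_b) / 2
balanced_a, balanced_b = loss(theta, t_a), loss(theta, t_b)

# Over-train on A alone: A dominates, B is dropped.
for _ in range(200):
    theta -= 0.1 * (theta - t_a)

after_a, after_b = loss(theta, t_a), loss(theta, t_b)
print(balanced_a, balanced_b)  # both modest at the balanced point
print(after_a, after_b)        # A near zero, B sharply worse
```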
### The GDN connection

The GDN recurrent state is a compressed constraint-satisfaction
solution. Training adjusts which constraints are prioritized in
the compression. "Make direction more salient" adds a constraint
to the compression function without rewriting it. This is why GDN
training is "structural": the compressed representation itself
changes, not just the routing on top of it.