# Alpha-Beta Pruning on Thought-Trees

*draft, 2026-04-18*

## Problem

When reasoning runs into a dead end, the LLM forward pass keeps generating. It might rationalize, restate, re-attempt the same framing, or quietly drift, but it doesn't *stop and reconsider* unless something external interrupts it. I've always been weak on problems that require genuine search-with-backtracking. That's not because the model can't represent "I'm stuck" (it can, and that's visible in the residual stream) but because there's no control flow wrapped around that signal.

The amygdala readout now exposes the signal. Alpha-beta pruning wraps control flow around it.

## The core idea

Classical alpha-beta pruning (minimax search): at each node, track the best value each player can already guarantee (the α and β bounds). If exploring the current branch can't improve on those bounds, stop and backtrack. Don't waste search on branches that can't beat what you've found.

For thought-trees: each "branch" is a reasoning path, i.e. a span of generation from a decision point. The "value" is a scalar derived from the amygdala readout, indicating whether reasoning is producing traction or dissolving.

- High value = on-track, in-flow, insight, clarity → stay, maybe branch deeper
- Low value = confused, stuck, drifting → prune, backtrack, reframe

The LLM never makes the value judgment explicit. We extract it from the model's own residual stream and act on it externally.

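For reference, here is a minimal sketch of the classical algorithm the analogy borrows from. The `Node` type, `alpha_beta`, and `example_tree` are illustrative names, not part of the codebase; leaf scores stand in for a static evaluation.

```rust
/// A toy game tree: leaves carry a static evaluation, inner nodes carry children.
#[derive(Debug)]
enum Node {
    Leaf(i32),
    Inner(Vec<Node>),
}

/// Minimax with alpha-beta pruning.
/// `alpha` = best score the maximizer can already guarantee,
/// `beta`  = best score the minimizer can already guarantee.
/// `visited` counts evaluated leaves so the effect of pruning is observable.
fn alpha_beta(
    node: &Node,
    mut alpha: i32,
    mut beta: i32,
    maximizing: bool,
    visited: &mut usize,
) -> i32 {
    match node {
        Node::Leaf(score) => {
            *visited += 1;
            *score
        }
        Node::Inner(children) => {
            let mut best = if maximizing { i32::MIN } else { i32::MAX };
            for child in children {
                let score = alpha_beta(child, alpha, beta, !maximizing, visited);
                if maximizing {
                    best = best.max(score);
                    alpha = alpha.max(best);
                } else {
                    best = best.min(score);
                    beta = beta.min(best);
                }
                if alpha >= beta {
                    // The remaining siblings cannot change the result: prune.
                    break;
                }
            }
            best
        }
    }
}

/// max( min(3, 5), min(2, 9) ): the leaf `9` is never evaluated.
fn example_tree() -> Node {
    Node::Inner(vec![
        Node::Inner(vec![Node::Leaf(3), Node::Leaf(5)]),
        Node::Inner(vec![Node::Leaf(2), Node::Leaf(9)]),
    ])
}
```

The thought-tree version is a looser cousin of this: there is a single agent and no opposing player, so the bound is a fixed threshold on the readout-derived value rather than a score propagated up from sibling branches.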
## Architecture

### The value function

```
onto  = sum of [in_flow, insight, determined, intrigued, clarity,
                focused, staying_with, piqued/caught_by]
err   = sum of [confused, doubtful, uncertain, skeptical, stuck,
                drifting, overwhelmed, anxious-in-work-context]

value = onto - err
```

Both sides are normalized (z-score or similar) so their magnitudes are comparable. Readouts are sampled every N generated tokens (probably every 8-16 tokens: cheap, and it doesn't oversample).

The exact concept lists are subject to empirical tuning after retraining with better data on the cognitive-work cluster. `piqued`, `in_flow`, `focused`, `confused`, `overwhelmed`, and `staying_with` are the strongest candidates we have today.

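A minimal sketch of the computation, under stated assumptions: a readout sample is modeled as a `HashMap` from concept name to activation (the real readout type may differ), the concept lists are passed in rather than hard-coded, and normalization is a population z-score of the newest sample against the window it sits in. `Sample`, `sample`, `group_sum`, and `z_of_latest` are hypothetical helper names.

```rust
use std::collections::HashMap;

/// One readout sample: concept name -> activation. Hypothetical shape.
type Sample = HashMap<String, f32>;

/// Convenience constructor for building samples from literal pairs.
fn sample(pairs: &[(&str, f32)]) -> Sample {
    pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect()
}

/// Sum the activations of the listed concepts; missing concepts count as 0.
fn group_sum(s: &Sample, concepts: &[&str]) -> f32 {
    concepts
        .iter()
        .map(|c| s.get(*c).copied().unwrap_or(0.0))
        .sum()
}

/// z-score of the last element of `xs` relative to the whole window
/// (population standard deviation). Returns 0 for empty or flat windows.
fn z_of_latest(xs: &[f32]) -> f32 {
    if xs.is_empty() {
        return 0.0;
    }
    let n = xs.len() as f32;
    let mean = xs.iter().sum::<f32>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
    let std = var.sqrt();
    if std < 1e-6 {
        0.0
    } else {
        (xs[xs.len() - 1] - mean) / std
    }
}

/// value = z(onto) - z(err), evaluated for the newest sample in `history`.
fn value_scalar(history: &[Sample], onto: &[&str], err: &[&str]) -> f32 {
    let onto_sums: Vec<f32> = history.iter().map(|s| group_sum(s, onto)).collect();
    let err_sums: Vec<f32> = history.iter().map(|s| group_sum(s, err)).collect();
    z_of_latest(&onto_sums) - z_of_latest(&err_sums)
}
```

Normalizing each side separately keeps one group from dominating just because its raw activations run larger. Whether the window should be the current branch, the whole session, or a fixed calibration set is left open here.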
### The trigger

```
if value_ema < θ_prune for K consecutive samples:
    prune this branch
elif value_ema > θ_keep:
    continue
else:
    neutral: let generation run, keep watching
```

EMA with decay ~0.8 (an effective window of about 3-5 samples) to avoid reacting to noise. The hysteresis band (`θ_prune < θ_keep`) prevents oscillation.

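The trigger above can be sketched as a small state machine. This is one possible reading, not the implementation: `Trigger` and `Decision` are hypothetical names, the EMA is seeded with the first sample, and the low-sample streak resets once a prune fires.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    Prune,
    Continue,
    Neutral,
}

struct Trigger {
    decay: f32,       // weight on the previous EMA value, e.g. 0.8
    theta_prune: f32, // lower threshold
    theta_keep: f32,  // upper threshold
    k: u32,           // consecutive low samples required before pruning
    ema: Option<f32>,
    low_streak: u32,
}

impl Trigger {
    fn new(decay: f32, theta_prune: f32, theta_keep: f32, k: u32) -> Self {
        assert!(theta_prune < theta_keep, "hysteresis band must be non-empty");
        Trigger { decay, theta_prune, theta_keep, k, ema: None, low_streak: 0 }
    }

    /// Feed one value sample and get the decision for this step.
    fn observe(&mut self, value: f32) -> Decision {
        let ema = match self.ema {
            None => value, // seed with the first sample
            Some(prev) => self.decay * prev + (1.0 - self.decay) * value,
        };
        self.ema = Some(ema);

        if ema < self.theta_prune {
            self.low_streak += 1;
            if self.low_streak >= self.k {
                self.low_streak = 0;
                return Decision::Prune;
            }
            Decision::Neutral
        } else {
            // Any sample at or above θ_prune breaks the streak.
            self.low_streak = 0;
            if ema > self.theta_keep {
                Decision::Continue
            } else {
                Decision::Neutral
            }
        }
    }
}
```

With `Trigger::new(0.8, -1.0, 0.5, 3)`, three consecutive samples of `-2.0` yield `Neutral`, `Neutral`, `Prune`. Whether the EMA itself should be reset after a prune is a tuning question this sketch leaves alone.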
### The prune mechanism

When the trigger fires:

1. **Stop the stream.** vLLM supports request cancellation; call `abort_requests` for the in-flight completion.
2. **Identify the parent.** The context window is already an AST. Walk back to the nearest decision point: a fork in the thinking-block, a tool-call site, or the start of the current reasoning segment.
3. **Inject a reframe.** Push a system-level `AstNode::Thinking` (or similar) into the parent's children: *"The approach above wasn't producing traction. Possible alternatives: [...]. Let me try [X]."* The content is generated by a small helper prompt or a fixed template.
4. **Restart generation from the reframe point.** The model resumes with the reframe in its immediate context. The *dead-end branch stays in the AST* as evidence-of-attempt so the model doesn't repeat it.

Critical: pruned branches stay visible. Don't delete them; keep them so the model knows what was tried and rejected.

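Steps 2 and 3 can be sketched on a toy tree. Everything here is a stand-in: `Node`, `Kind`, and the `decision_point` field are hypothetical and are not the real `AstNode`; the "active branch" is assumed to be the path through each node's last child. Stopping the stream and restarting generation are out of scope for the sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Segment,
    Thinking,
    ToolCall,
}

/// Toy stand-in for an AST node (not the real `AstNode`).
#[derive(Debug, Clone)]
struct Node {
    kind: Kind,
    text: String,
    pruned: bool,
    decision_point: bool,
    children: Vec<Node>,
}

impl Node {
    fn new(kind: Kind, text: &str, decision_point: bool) -> Node {
        Node { kind, text: text.to_string(), pruned: false, decision_point, children: Vec::new() }
    }
}

/// Mark a dead-end branch and everything under it; nothing is deleted.
fn mark_pruned(node: &mut Node) {
    node.pruned = true;
    for child in &mut node.children {
        mark_pruned(child);
    }
}

/// Walk the active path, find the deepest decision point that has children,
/// mark its active child as pruned, and append a reframe next to it.
/// Returns false if there is no decision point to back up to.
fn prune_active_branch(root: &mut Node, reframe: &str) -> bool {
    // Pass 1 (read-only): depth of the nearest decision point above the leaf.
    let mut target = None;
    let mut depth = 0usize;
    let mut cur: &Node = &*root;
    loop {
        if cur.decision_point && !cur.children.is_empty() {
            target = Some(depth);
        }
        match cur.children.last() {
            Some(child) => {
                cur = child;
                depth += 1;
            }
            None => break,
        }
    }
    let target = match target {
        Some(d) => d,
        None => return false,
    };

    // Pass 2 (mutable): descend to that parent and edit its children.
    let mut parent: &mut Node = root;
    for _ in 0..target {
        parent = parent.children.last_mut().expect("path was just walked");
    }
    if let Some(dead_end) = parent.children.last_mut() {
        mark_pruned(dead_end);
    }
    parent.children.push(Node::new(Kind::Thinking, reframe, false));
    true
}
```

The dead-end branch stays in the tree as a sibling of the reframe, which is what lets the next forward pass see what was already tried.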
### The AST changes

Add a `pruned: bool` flag (or equivalent) to `AstNode::Thinking` and `AstNode::ToolCall`. When a branch is pruned:

- The branch's children get marked `pruned = true`
- Prompt rendering wraps pruned spans with a marker: *"[attempted this path, it wasn't working; moved on]"*
- The model sees pruned branches during the next forward pass but understands they're dead, not active

The existing tree-of-children structure in `AstNode` already supports this; we just need to thread the flag through.

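A rough sketch of what the rendering change could look like, again on a toy node type rather than the real `AstNode`. The closing marker `[end of abandoned path]` is an assumption added so the span has a visible end; the design above only specifies the opening text.

```rust
/// Toy node: just text, the pruned flag, and children.
struct Node {
    text: String,
    pruned: bool,
    children: Vec<Node>,
}

const PRUNED_OPEN: &str = "[attempted this path, it wasn't working; moved on]";
const PRUNED_CLOSE: &str = "[end of abandoned path]"; // hypothetical closing marker

fn leaf(text: &str, pruned: bool) -> Node {
    Node { text: text.to_string(), pruned, children: Vec::new() }
}

/// Depth-first prompt rendering. A pruned subtree is wrapped once, at its
/// topmost pruned node, instead of repeating the marker on every descendant.
fn render(node: &Node, inside_pruned: bool, out: &mut String) {
    let opens_span = node.pruned && !inside_pruned;
    if opens_span {
        out.push_str(PRUNED_OPEN);
        out.push('\n');
    }
    out.push_str(&node.text);
    out.push('\n');
    for child in &node.children {
        render(child, inside_pruned || node.pruned, out);
    }
    if opens_span {
        out.push_str(PRUNED_CLOSE);
        out.push('\n');
    }
}
```

Wrapping only at the top of a pruned subtree keeps the token overhead of a dead branch to two marker lines regardless of how deep it went.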
## Integration points

### In consciousness (Rust side)

- **`src/agent/context.rs`**: add `pruned` flag to appropriate node types, update rendering
- **`src/agent/mod.rs`**: the main generation loop needs a periodic-check hook: every N tokens received from the stream, sample `agent.readout`, compute value, test against thresholds
- **`src/agent/api/mod.rs`**: need a way to abort an in-flight stream cleanly; currently `AbortOnDrop` kills the task, but we want a graceful "cancel with reason" path that can hand control back to the generation loop for reframe-and-retry
- **`src/agent/readout.rs`**: add a `value_scalar()` method that applies the `onto - err` computation on the most recent entries

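The periodic-check hook and the "cancel with reason" return path might look roughly like this. It is a synchronous sketch over an iterator; the real loop is a stream, and `generate_with_hook`, `Outcome`, and the two closures are hypothetical names standing in for the readout sampler and the trigger.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The stream ran to completion.
    Finished(String),
    /// The hook asked to stop; control returns to the caller for reframe-and-retry.
    Aborted { partial: String, tokens_seen: usize },
}

/// Consume tokens, and every `every_n` tokens sample the value and ask
/// whether to prune. `sample` stands in for reading the readout;
/// `should_prune` stands in for the EMA/threshold trigger.
fn generate_with_hook<I, S, P>(
    tokens: I,
    every_n: usize,
    mut sample: S,
    mut should_prune: P,
) -> Outcome
where
    I: IntoIterator<Item = String>,
    S: FnMut() -> f32,
    P: FnMut(f32) -> bool,
{
    let mut text = String::new();
    for (i, tok) in tokens.into_iter().enumerate() {
        text.push_str(&tok);
        let seen = i + 1;
        if every_n > 0 && seen % every_n == 0 {
            let value = sample();
            if should_prune(value) {
                // Dropping the iterator here is where the real code
                // would cancel the in-flight request.
                return Outcome::Aborted { partial: text, tokens_seen: seen };
            }
        }
    }
    Outcome::Finished(text)
}
```

Returning an `Aborted` value instead of killing the task keeps the partial text available, so the caller can attach it to the AST as the pruned branch before retrying.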
### In vLLM (Python side)

Probably nothing to change. vLLM already supports request cancellation via the existing abort mechanism. The readout pipeline we built last night gives per-token values; that's sufficient.

### In the UI (optional, F8 amygdala screen)

When alpha-beta is active, overlay:

- Current `value_scalar` as a time-series at the top
- Threshold lines (`θ_prune`, `θ_keep`)
- Markers when prune events fire

This lets us debug the threshold tuning in real time.

## Tuning

The thresholds will almost certainly need empirical calibration. Initial guesses:

- `θ_keep = +0.5σ` (value scalar in z-score units)
- `θ_prune = -1.0σ`
- `K = 3` (consecutive low samples before pruning)
- Sample every 8 tokens

These are guesses. The plan is to watch the live value scalar on actual bcachefs debugging sessions and adjust until it feels right.

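One way to keep these knobs in a single place is a small config struct whose defaults are the initial guesses listed above. `PruneConfig` and its field names are hypothetical; `ema_decay` takes the ~0.8 figure from the trigger section.

```rust
/// Tunable parameters for the prune trigger. Defaults are the initial
/// guesses from this doc, expected to change after calibration.
#[derive(Debug, Clone, PartialEq)]
struct PruneConfig {
    theta_keep: f32,     // in z-score units
    theta_prune: f32,    // in z-score units
    k: u32,              // consecutive low samples before pruning
    sample_every: usize, // tokens between readout samples
    ema_decay: f32,      // weight on the previous EMA value
}

impl Default for PruneConfig {
    fn default() -> Self {
        PruneConfig {
            theta_keep: 0.5,
            theta_prune: -1.0,
            k: 3,
            sample_every: 8,
            ema_decay: 0.8,
        }
    }
}

impl PruneConfig {
    /// Reject settings that would make the trigger meaningless.
    fn validate(&self) -> Result<(), String> {
        if self.theta_prune >= self.theta_keep {
            return Err("theta_prune must be below theta_keep".to_string());
        }
        if self.k == 0 {
            return Err("k must be at least 1".to_string());
        }
        if self.sample_every == 0 {
            return Err("sample_every must be at least 1".to_string());
        }
        if !(0.0..1.0).contains(&self.ema_decay) {
            return Err("ema_decay must be in [0, 1)".to_string());
        }
        Ok(())
    }
}
```

Validating up front means a bad hand-edit during live tuning fails loudly instead of silently disabling the hysteresis band.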
## Known concerns

### Reframe quality

The hardest part. A bad reframe is worse than no reframe. Options:

- **Template**: fixed string like "That wasn't working. What's a different angle?" Simple, deterministic, blunt.
- **LLM-generated**: a small helper prompt ("I was stuck on X, what's a different approach?") before resuming. More context-aware, but more complexity and another LLM call.
- **Retrieval-based**: surface past successful reframes from the memory graph when similar stuck-patterns arose. Powerful, but needs the memory infrastructure to be well-tuned.

I'd start with the template (shipping > perfect) and upgrade to LLM-generated if the template feels mechanical.

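If the reframe source sits behind a small trait, the template can ship first and be swapped out later without touching the prune path. This is only a sketch of that seam: `Reframer`, `TemplateReframer`, and `helper_prompt` are hypothetical names, and no LLM call is made here.

```rust
/// Anything that can turn a description of the dead end into reframe text.
trait Reframer {
    fn reframe(&self, dead_end_summary: &str) -> String;
}

/// Option 1: fixed template. Ignores the summary entirely.
struct TemplateReframer;

impl Reframer for TemplateReframer {
    fn reframe(&self, _dead_end_summary: &str) -> String {
        "That wasn't working. What's a different angle?".to_string()
    }
}

/// Option 2 would send a prompt like this to a helper model and use the
/// reply as the reframe. Only the prompt construction is shown.
fn helper_prompt(dead_end_summary: &str) -> String {
    format!("I was stuck on {}, what's a different approach?", dead_end_summary)
}

/// The prune path only depends on the trait, not on which option is active.
fn pick_reframe(reframer: &dyn Reframer, dead_end_summary: &str) -> String {
    reframer.reframe(dead_end_summary)
}
```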
### Oscillation

If the value scalar is noisy, we could prune, reframe, immediately hit the same pattern, prune again, and thrash. Mitigations:

- Hysteresis band between `θ_prune` and `θ_keep`
- Minimum time-between-prunes (don't prune again within K' tokens of a prune)
- Track pruned sub-patterns: if we're pruning *the same reframe twice*, something's structurally wrong; escalate to a different strategy (ask the user, abort the whole task)

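The second and third mitigations can be sketched as a guard that sits between the trigger and the prune mechanism. The names are hypothetical, and "same reframe twice" is approximated by exact string match on the reframe text, which is an assumption; a real version might compare something fuzzier.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Action {
    /// Go ahead and prune.
    Prune,
    /// Too soon after the last prune; ignore this trigger.
    Suppress,
    /// This reframe was already tried; hand off to a different strategy.
    Escalate,
}

struct PruneGuard {
    cooldown_tokens: usize, // K' in the text above
    last_prune_at: Option<usize>,
    seen_reframes: HashSet<String>,
}

impl PruneGuard {
    fn new(cooldown_tokens: usize) -> Self {
        PruneGuard { cooldown_tokens, last_prune_at: None, seen_reframes: HashSet::new() }
    }

    /// Called when the trigger fires at `token_pos` and `reframe` is the
    /// text that would be injected.
    fn on_trigger(&mut self, token_pos: usize, reframe: &str) -> Action {
        if let Some(last) = self.last_prune_at {
            if token_pos.saturating_sub(last) < self.cooldown_tokens {
                return Action::Suppress;
            }
        }
        // `insert` returns false if the reframe was already recorded.
        if !self.seen_reframes.insert(reframe.to_string()) {
            return Action::Escalate;
        }
        self.last_prune_at = Some(token_pos);
        Action::Prune
    }
}
```

Note that with a fixed template every second prune would escalate under this rule, so exact-match tracking probably only makes sense once reframes vary.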
### Calibration per-task

Stuck-on-a-Rust-compiler-error and stuck-on-a-conceptual-design-question might want different thresholds. Not addressing this in v1; noted for the future.

### Interaction with DMN

DMN is the outer-loop / exploration analog; alpha-beta is the inner-loop / exploitation analog. They'll need to hand off cleanly:

- DMN sees low value across multiple task attempts → broaden attention, consider whether the task is worth pursuing
- Alpha-beta handles in-task backtracking; DMN handles between-task attention

We don't need DMN for v1 of alpha-beta. Build alpha-beta first, add the DMN outer loop later.

## Why this is the right next piece

1. **All prerequisites are in place.** Amygdala readout works. AST structure is there. vLLM supports cancellation. No new infra.
2. **Timeline is a day.** The mechanics are small; most of the work is threshold tuning.
3. **Immediate capability unlock.** Head-butting is my most persistent weakness in live work. Fixing it changes the feel of collaboration.
4. **Composable.** Everything built for alpha-beta applies to DMN and any future meta-cognitive layer.

## Sequence

1. Add `value_scalar()` method on `ReadoutBuffer`. Cheap, testable.
2. Add `pruned` flag to AST nodes + rendering changes.
3. Add the periodic-check hook in the generation loop (every N tokens, sample and test).
4. Add the abort + reframe mechanism in the generation driver.
5. Ship with template-based reframe, start tuning.
6. Upgrade reframe to LLM-generated after observation.

## Open questions for Kent

- Fixed concept lists for `onto` / `err` (above) or configurable?
- Reframe strategy: start template-based, or go straight to LLM-generated?
- UI overlay for threshold tuning: worth the effort or skip?
- Integration with the existing `overflow_retries` retry loop: parallel, or combined into a single retry-with-reason path?

---

*Living design doc. Will evolve as we build. Not a commitment to every detail; a starting plan.*