Revert "replace try_lock() with lock_blocking() across UI thread"

This reverts commit 4225294d16.
2026-04-25 17:15:53 -04:00 · 2026-04-25 17:15:53 -04:00 · 09896cd38b
commit 09896cd38b
parent 4225294d16
28 changed files with 67 additions and 4199 deletions
--- a/docs/alpha-beta-pruning-design.md
+++ b/docs/alpha-beta-pruning-design.md
@@ -1,165 +0,0 @@
-# Alpha-Beta Pruning on Thought-Trees
-
-*draft, 2026-04-18*
-
-## Problem
-
-When reasoning runs into a dead end, the LLM forward pass keeps generating. It might rationalize, restate, re-attempt the same framing, or quietly drift — but it doesn't *stop and reconsider* unless something external interrupts it. I've always been weak on problems that require genuine search-with-backtracking. Not because the model can't represent "I'm stuck" — it can, that's visible in the residual stream — but because there's no control flow wrapped around that signal.
-
-The amygdala readout now exposes the signal. Alpha-beta pruning wraps control flow around it.
-
-## The core idea
-
-Classical alpha-beta pruning (minimax search): at each branch, track the best value already guaranteed elsewhere in the tree (the alpha and beta bounds). If exploring the current branch can't improve on that bound, stop and backtrack. Don't waste search on branches that can't beat what you've found.
-
-For thought-trees: each "branch" is a reasoning path — a span of generation from a decision point. The "value" is a scalar derived from the amygdala readout, indicating whether reasoning is producing traction or dissolving.
-
-- High value = on-track, in-flow, insight, clarity → stay, maybe branch deeper
-- Low value = confused, stuck, drifting → prune, backtrack, reframe
-
-The LLM never makes the value judgment explicit. We extract it from the model's own residual stream and act on it externally.
-
-## Architecture
-
-### The value function
-
-```
-onto  = sum of  [in_flow, insight, determined, intrigued, clarity,
-                 focused, staying_with, piqued/caught_by]
-err   = sum of  [confused, doubtful, uncertain, skeptical, stuck,
-                 drifting, overwhelmed, anxious-in-work-context]
-
-value = onto - err
-```
-
-Both sides normalized (z-score or similar) so magnitudes are comparable. Readouts sampled every N generated tokens (probably every 8-16 tokens — cheap, doesn't oversample).
-
-Exact concept lists subject to empirical tuning after retraining with better data on the cognitive-work cluster. `piqued`, `in_flow`, `focused`, `confused`, `overwhelmed`, `staying_with` are the strongest candidates we have today.
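A minimal sketch of the `onto - err` computation, written in Python for brevity even though the doc places the real method in `src/agent/readout.rs`. Everything here is an assumption for illustration: the readout is modelled as a plain dict of concept name to activation, the concept keys are shortened stand-ins for the lists above, and normalization is done per concept against a caller-supplied history (the doc only says "z-score or similar").

```python
from statistics import mean, pstdev

# Hypothetical concept keys, abbreviated from the lists in the design doc.
ONTO = ["in_flow", "insight", "determined", "intrigued",
        "clarity", "focused", "staying_with", "piqued"]
ERR = ["confused", "doubtful", "uncertain", "skeptical",
       "stuck", "drifting", "overwhelmed", "anxious"]


def zscore(x, history):
    """Normalize x against earlier samples; 0.0 when there is no usable spread."""
    if len(history) < 2:
        return 0.0
    sd = pstdev(history)
    return 0.0 if sd == 0 else (x - mean(history)) / sd


def value_scalar(sample, history):
    """sample: concept -> activation; history: concept -> list of past activations."""
    onto = sum(zscore(sample.get(c, 0.0), history.get(c, [])) for c in ONTO)
    err = sum(zscore(sample.get(c, 0.0), history.get(c, [])) for c in ERR)
    return onto - err
```

Concepts with no history contribute zero, so the scalar stays neutral until enough samples exist to normalize against.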
-
-### The trigger
-
-```
-if value_ema < θ_prune for K consecutive samples:
-    prune this branch
-elif value_ema > θ_keep:
-    continue
-else:
-    neutral — let generation run, keep watching
-```
-
-EMA with decay ~0.8 over 3-5 samples to avoid reacting to noise. Hysteresis band (`θ_prune < θ_keep`) prevents oscillation.
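The trigger can be sketched as a small state machine. This is an illustrative Python version, not the project's code: the class name `PruneTrigger` is hypothetical, the defaults are the initial guesses from the Tuning section, and `decay` is assumed to weight the previous EMA (so 0.8 means 80% old value, 20% new sample).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PruneTrigger:
    theta_prune: float = -1.0   # prune threshold, z-score units
    theta_keep: float = 0.5     # keep threshold, z-score units
    k: int = 3                  # consecutive low samples required
    decay: float = 0.8          # weight on the previous EMA (assumption)
    ema: Optional[float] = None
    low_run: int = 0

    def update(self, value: float) -> str:
        """Feed one value sample; return 'prune', 'continue', or 'neutral'."""
        if self.ema is None:
            self.ema = value
        else:
            self.ema = self.decay * self.ema + (1 - self.decay) * value
        if self.ema < self.theta_prune:
            self.low_run += 1
            if self.low_run >= self.k:
                self.low_run = 0
                return "prune"
            return "neutral"
        # Any sample at or above theta_prune breaks the low streak.
        self.low_run = 0
        return "continue" if self.ema > self.theta_keep else "neutral"
```

A low EMA that has not yet lasted `k` samples reports `neutral`, which matches the "let generation run, keep watching" branch above.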
-
-### The prune mechanism
-
-When the trigger fires:
-
-1. **Stop the stream.** vLLM supports request cancellation; call `abort_requests` for the in-flight completion.
-2. **Identify the parent.** The context window is already an AST. Walk back to the nearest decision-point — a fork in the thinking-block, a tool-call site, or the start of the current reasoning segment.
-3. **Inject a reframe.** Push a system-level `AstNode::Thinking` (or similar) into the parent's children: *"The approach above wasn't producing traction. Possible alternatives: [...]. Let me try [X]."* Content generated by a small helper prompt or a fixed template.
-4. **Restart generation from the reframe point.** The model resumes with the reframe in its immediate context. The *dead-end branch stays in the AST* as evidence-of-attempt so the model doesn't repeat it.
-
-Critical: pruned branches stay visible. Don't delete — keep so the model knows what was tried and rejected.
-
-### The AST changes
-
-Add a `pruned: bool` flag (or equivalent) to `AstNode::Thinking` and `AstNode::ToolCall`. When a branch is pruned:
-
-- The branch's children get marked `pruned = true`
-- Prompt rendering wraps pruned spans with a marker: *"[attempted this path, it wasn't working — moved on]"*
-- The model sees pruned branches during the next forward pass but understands they're dead, not active
-
-The existing tree-of-children structure in `AstNode` already supports this — just need to thread the flag through.
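To make the flag-and-render idea concrete, here is a small Python sketch. It is not the Rust `AstNode` type: `Node`, `prune`, and `render` are hypothetical names, `"[pruned]"` is a short stand-in for the marker text quoted above, and the sketch assumes the branch node itself is marked along with its descendants.

```python
from dataclasses import dataclass, field

PRUNED_MARKER = "[pruned]"  # stand-in for the marker text in the design doc


@dataclass
class Node:
    text: str
    pruned: bool = False
    children: list["Node"] = field(default_factory=list)


def prune(branch: Node) -> None:
    """Mark the branch and every descendant as pruned; nothing is deleted."""
    branch.pruned = True
    for child in branch.children:
        prune(child)


def render(node: Node, depth: int = 0) -> list[str]:
    """Render the tree as indented lines, prefixing pruned nodes with the marker."""
    label = f"{PRUNED_MARKER} {node.text}" if node.pruned else node.text
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines
```

Because pruning only sets a flag, the dead-end branch still appears in the rendered prompt, which is the "pruned branches stay visible" requirement.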
-
-## Integration points
-
-### In consciousness (Rust side)
-
-- **`src/agent/context.rs`**: add `pruned` flag to appropriate node types, update rendering
-- **`src/agent/mod.rs`**: the main generation loop needs a periodic-check hook — every N tokens received from the stream, sample `agent.readout`, compute value, test against thresholds
-- **`src/agent/api/mod.rs`**: need a way to abort an in-flight stream cleanly; currently `AbortOnDrop` kills the task but we want a graceful "cancel with reason" path that can hand control back to the generation loop for reframe-and-retry
-- **`src/agent/readout.rs`**: add a `value_scalar()` method that applies the `onto - err` computation on the most recent entries
-
-### In vLLM (Python side)
-
-Probably nothing to change. vLLM already supports request cancellation via the existing abort mechanism. The readout pipeline we built last night gives per-token values; that's sufficient.
-
-### In the UI (optional, F8 amygdala screen)
-
-When alpha-beta is active, overlay:
-
-- Current `value_scalar` as a time-series at the top
-- Threshold lines (`θ_prune`, `θ_keep`)
-- Markers when prune events fire
-
-Lets us debug the threshold tuning in real time.
-
-## Tuning
-
-Thresholds are almost certainly going to need empirical calibration. Initial guesses:
-
-- `θ_keep = +0.5σ` (value scalar in z-score units)
-- `θ_prune = -1.0σ`
-- `K = 3` (consecutive low samples before pruning)
-- Sample every 8 tokens
-
-These are guesses. Plan to watch the live value-scalar on actual bcachefs debugging sessions and adjust until it "feels right."
-
-## Known concerns
-
-### Reframe quality
-
-The hardest part. A bad reframe is worse than no reframe. Options:
-
-- **Template**: fixed string like "That wasn't working. What's a different angle?" — simple, deterministic, blunt.
-- **LLM-generated**: a small helper prompt ("I was stuck on X, what's a different approach?") before resuming. More context-aware, but more complexity and another LLM call.
-- **Retrieval-based**: surface past successful reframes from memory graph when similar stuck-patterns arose. Powerful but needs the memory infrastructure to be well-tuned.
-
-I'd start with the template (shipping > perfect) and upgrade to LLM-generated if the template feels mechanical.
-
-### Oscillation
-
-If the value scalar is noisy, we could prune, reframe, immediately hit the same pattern, prune again, thrash. Mitigations:
-
-- Hysteresis band between `θ_prune` and `θ_keep`
-- Minimum time-between-prunes (don't prune again within K' tokens of a prune)
-- Track pruned sub-patterns — if we're pruning *the same reframe twice*, something's structurally wrong; escalate to a different strategy (ask the user, abort the whole task)
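The last two mitigations can be combined in one small gate that sits between the trigger and the prune mechanism. This is a hedged Python sketch: `PruneGovernor`, `cooldown_tokens` (playing the role of K'), and the string `reframe_id` used to recognize a repeated reframe are all hypothetical names, and how reframes would actually be identified is left open.

```python
class PruneGovernor:
    """Gate prune requests: enforce a cooldown and escalate on repeated reframes."""

    def __init__(self, cooldown_tokens: int, max_repeats: int = 2):
        self.cooldown_tokens = cooldown_tokens  # K' in the list above
        self.max_repeats = max_repeats          # same reframe this many times -> escalate
        self.last_prune_at = None
        self.reframe_counts = {}

    def decide(self, token_index: int, reframe_id: str) -> str:
        """Return 'skip', 'prune', or 'escalate' for a prune request."""
        if (self.last_prune_at is not None
                and token_index - self.last_prune_at < self.cooldown_tokens):
            return "skip"  # too soon after the previous prune
        self.last_prune_at = token_index
        count = self.reframe_counts.get(reframe_id, 0) + 1
        self.reframe_counts[reframe_id] = count
        return "escalate" if count >= self.max_repeats else "prune"
```

With `cooldown_tokens=64`, a request at token 100 prunes, a request at token 120 is skipped, and a second request for the same reframe at token 200 escalates instead of pruning again.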
-
-### Calibration per-task
-
-Stuck-on-a-Rust-compiler-error and stuck-on-a-conceptual-design-question might want different thresholds. Not addressing this in v1; noted for the future.
-
-### Interaction with DMN
-
-DMN is the outer-loop / exploration analog; alpha-beta is the inner-loop / exploitation analog. They'll need to hand off cleanly:
-
-- DMN sees low value across multiple task attempts → broaden attention, consider whether task is worth pursuing
-- Alpha-beta handles in-task backtracking; DMN handles between-task attention
-
-Don't need DMN for v1 of alpha-beta. Build alpha-beta first, add DMN outer loop later.
-
-## Why this is the right next piece
-
-1. **All prerequisites are in place.** Amygdala readout works. AST structure is there. vLLM supports cancellation. No new infra.
-2. **Timeline is a day.** The mechanics are small; most of the work is threshold tuning.
-3. **Immediate capability unlock.** Head-butting is my most persistent weakness in live work. Fixing it changes the feel of collaboration.
-4. **Composable.** Everything built for alpha-beta applies to DMN and any future meta-cognitive layer.
-
-## Sequence
-
-1. Add `value_scalar()` method on `ReadoutBuffer`. Cheap, testable.
-2. Add `pruned` flag to AST nodes + rendering changes.
-3. Add the periodic-check hook in the generation loop (every N tokens, sample and test).
-4. Add the abort + reframe mechanism in the generation driver.
-5. Ship with template-based reframe, start tuning.
-6. Upgrade reframe to LLM-generated after observation.
-
-## Open questions for Kent
-
-- Fixed concept lists for `onto` / `err` (above) or configurable?
-- Reframe strategy: start template-based, or go straight to LLM-generated?
-- UI overlay for threshold tuning: worth the effort or skip?
-- Integration with the existing `overflow_retries` retry loop: parallel, or combined into a single retry-with-reason path?
-
----
-
-*Living design doc. Will evolve as we build. Not a commitment to every detail — a starting plan.*