kill .claude

This commit is contained in:
Kent Overstreet 2026-04-09 20:00:05 -04:00
parent 929415af3b
commit ff5be3e792
6 changed files with 0 additions and 0 deletions

# Daemon & Jobkit Architecture Survey
_2026-03-14, autonomous survey while Kent debugs discard FIFO_
## Current state
daemon.rs is 1952 lines mixing three concerns:
- ~400 lines: pure jobkit usage (spawn, depend_on, resource)
- ~600 lines: logging/monitoring (log_event, status, RPC)
- ~950 lines: job functions embedding business logic
## What jobkit provides (good)
- Worker pool with named workers
- Dependency graph: `depend_on()` for ordering
- Resource pools: `ResourcePool` for concurrency gating (LLM slots)
- Retry logic: `retries(N)` on `TaskError::Retry`
- Task status tracking: `choir.task_statuses()` → `Vec<TaskInfo>`
- Cancellation: `ctx.is_cancelled()`
## What jobkit is missing
### 1. Structured logging (PRIORITY)
- Currently dual-channel: `ctx.log_line()` (per-task) + `log_event()` (daemon JSONL)
- No log levels, no structured context, no correlation IDs
- Log rotation is naive (truncate at 1MB, keep second half)
- Need: observability hooks that both human TUI and AI can consume
### 2. Metrics (NONE EXIST)
- No task duration histograms
- No worker utilization tracking
- No queue depth monitoring
- No success/failure rates by type
- No resource pool wait times
### 3. Health monitoring
- No watchdog timers
- No health check hooks per job
- No alerting on threshold violations
- Health computed on-demand in daemon, not in jobkit
### 4. RPC (ad-hoc in daemon, should be schematized)
- Unix socket with string matching: `match cmd.as_str()`
- No Cap'n Proto schema for daemon control
- No versioning, no validation, no streaming
## Architecture problems
### Tangled concerns
Job functions hardcode `log_event()` calls. Graph health is in daemon
but uses domain-specific metrics. Store loading happens inside jobs
(10 agent runs = 10 store loads). Not separable.
### Magic numbers
- Workers = `llm_concurrency + 3` (line 682)
- 10 max new jobs per tick (line 770)
- 300/1800s backoff range (lines 721-722)
- 1MB log rotation (line 39)
- 60s scheduler interval (line 24)
None configurable.
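A minimal sketch of lifting those magic numbers into configuration — struct and field names are illustrative, with today's hardcoded values as defaults:

```rust
// Hypothetical config struct; not existing code. Defaults mirror the
// currently hardcoded constants listed above.
struct DaemonConfig {
    extra_workers: usize,          // workers = llm_concurrency + extra_workers
    max_new_jobs_per_tick: usize,
    backoff_min_secs: u64,
    backoff_max_secs: u64,
    log_rotate_bytes: u64,
    scheduler_interval_secs: u64,
}

impl Default for DaemonConfig {
    fn default() -> Self {
        Self {
            extra_workers: 3,
            max_new_jobs_per_tick: 10,
            backoff_min_secs: 300,
            backoff_max_secs: 1800,
            log_rotate_bytes: 1 << 20, // 1MB
            scheduler_interval_secs: 60,
        }
    }
}

fn main() {
    let cfg = DaemonConfig::default();
    assert_eq!(cfg.backoff_min_secs, 300);
    assert_eq!(cfg.log_rotate_bytes, 1_048_576);
}
```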
### Hardcoded pipeline DAG
Daily pipeline phases are `depend_on()` chains in Rust code (lines
1061-1109). Can't adjust without recompile. No visualization. No
conditional skipping of phases.
### Task naming is fragile
Names used as both identifiers AND for parsing in TUI. Format varies
(colons, dashes, dates). `task_group()` splits on '-' to categorize —
brittle.
### No persistent task queue
Restart loses all pending tasks. Session watcher handles this via
reconciliation (good), but scheduler uses `last_daily` date from file.
## What works well
1. **Reconciliation-based session discovery** — elegant, restart-resilient
2. **Resource pooling** — LLM concurrency decoupled from worker count
3. **Dependency-driven pipeline** — clean DAG via `depend_on()`
4. **Retry with backoff** — exponential 5min→30min, resets on success
5. **Graceful shutdown** — SIGINT/SIGTERM handled properly
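Item 4 in miniature, assuming a failure count drives the doubling (the actual jobkit mechanism may differ):

```rust
// Exponential backoff from 5 min to a 30 min cap; a success resets the
// failure count to zero. Sketch only, not the jobkit implementation.
fn backoff_secs(consecutive_failures: u32) -> u64 {
    let base: u64 = 300; // 5 minutes
    let cap: u64 = 1800; // 30 minutes
    base.saturating_mul(1u64 << consecutive_failures.min(6)).min(cap)
}

fn main() {
    assert_eq!(backoff_secs(0), 300);
    assert_eq!(backoff_secs(1), 600);
    assert_eq!(backoff_secs(10), 1800);
}
```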
## Kent's design direction
### Event stream, not log files
One pipeline, multiple consumers. TUI renders for humans, AI consumes
structured data. Same events, different renderers. Cap'n Proto streaming
subscription: `subscribe(filter) -> stream<Event>`.
"No one ever thinks further ahead than log files with monitoring and
it's infuriating." — Kent
### Extend jobkit, don't add a layer
jobkit already has the scheduling and dependency graph. Don't create a
new orchestration layer — add the missing pieces (logging, metrics,
health, RPC) to jobkit itself.
### Cap'n Proto for everything
Standard RPC definitions for:
- Status queries (what's running, pending, failed)
- Control (start, stop, restart, queue)
- Event streaming (subscribe with filter)
- Health checks
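A sketch of what such a schema could look like — interface and method names are illustrative, not an existing file. Cap'n Proto streams events by handing the server a callback capability:

```capnp
# Illustrative only — not the project's schema.
interface DaemonControl {
  status @0 () -> (running :List(Text), pending :List(Text), failed :List(Text));
  start @1 (job :Text) -> (ok :Bool);
  stop @2 (job :Text) -> (ok :Bool);
  subscribe @3 (filter :Text, sink :EventSink) -> ();
  health @4 () -> (healthy :Bool, detail :Text);
}

interface EventSink {
  push @0 (event :Text) -> ();
}
```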
## The bigger picture: bcachefs as library
Kent's monitoring system in bcachefs (event_inc/event_inc_trace + x-macro
counters) is the real monitoring infrastructure. 1-1 correspondence between
counters (cheap, always-on dashboard via `fs top`) and tracepoints (expensive
detail, only runs when enabled). The x-macro enforces this — can't have one
without the other.
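The pairing can be sketched in Rust with a declarative macro playing the x-macro's role — each name expands to both a counter and a trace stub, so one can't exist without the other (an analogue for illustration, not the bcachefs code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// One macro invocation defines both sides: an always-on counter and a
// matching tracepoint stub. Adding a name to the list creates both.
macro_rules! define_events {
    ($($name:ident),* $(,)?) => {
        #[allow(non_upper_case_globals)]
        mod counters {
            use super::*;
            $( pub static $name: AtomicU64 = AtomicU64::new(0); )*
        }
        mod trace {
            $( pub fn $name(detail: &str) {
                eprintln!("trace {}: {}", stringify!($name), detail);
            } )*
        }
    };
}

define_events!(btree_split, journal_flush);

fn main() {
    counters::btree_split.fetch_add(1, Ordering::Relaxed); // cheap, always on
    trace::btree_split("level=2"); // expensive detail, gated in real life
    assert_eq!(counters::btree_split.load(Ordering::Relaxed), 1);
}
```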
When the Rust conversion is complete, bcachefs becomes a library. At that
point, jobkit doesn't need its own monitoring — it uses the same counter/
tracepoint infrastructure. One observability system for everything.
**Implication for now:** jobkit monitoring just needs to be good enough.
JSON events, not typed. Don't over-engineer — the real infrastructure is
coming from the Rust conversion.
## Extraction: jobkit-daemon library (designed with Kent)
### Goes to jobkit-daemon (generic)
- JSONL event logging with size-based rotation
- Unix domain socket server + signal handling
- Status file writing (periodic JSON snapshot)
- `run_job()` wrapper (logging + progress + error mapping)
- Systemd service installation
- Worker pool setup from config
- Cap'n Proto RPC for control protocol
### Stays in poc-memory (application)
- All job functions (experience-mine, fact-mine, consolidation, etc.)
- Session watcher, scheduler, RPC command handlers
- GraphHealth, consolidation plan logic
### Interface design
- Cap'n Proto RPC for typed operations (submit, cancel, subscribe)
- JSON blob for status (inherently open-ended, every app has different
job types — typing this is the tracepoint mistake)
- Application registers: RPC handlers, long-running tasks, job functions
- ~50-100 lines of setup code, call `daemon.run()`
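The setup could look roughly like this — the builder name and method signatures are assumptions, sketched only to show the registration shape:

```rust
// Hypothetical jobkit-daemon setup: the application registers job
// functions, then hands control to the daemon loop.
type JobFn = fn() -> Result<(), String>;

struct DaemonBuilder {
    jobs: Vec<(&'static str, JobFn)>,
}

impl DaemonBuilder {
    fn new() -> Self {
        Self { jobs: Vec::new() }
    }
    fn register_job(mut self, name: &'static str, f: JobFn) -> Self {
        self.jobs.push((name, f));
        self
    }
    // Stand-in for daemon.run(): the real loop would own the socket,
    // signals, event log, and worker pool.
    fn run(self) {
        for (name, f) in &self.jobs {
            println!("{}: {:?}", name, f());
        }
    }
}

fn experience_mine() -> Result<(), String> { Ok(()) }

fn main() {
    DaemonBuilder::new()
        .register_job("experience-mine", experience_mine)
        .run();
}
```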
## Plan of attack
1. **Observability hooks in jobkit** — `on_task_start/progress/complete`
callbacks that consumers can subscribe to
2. **Structured event type** — typed events with task ID, name, duration,
result, metadata. Not strings.
3. **Metrics collection** — duration histograms, success rates, queue
depth. Built on the event stream.
4. **Cap'n Proto daemon RPC schema** — replace ad-hoc socket protocol
5. **TUI consumes event stream** — same data as AI consumer
6. **Extract monitoring from daemon.rs** — the 600 lines of logging/status
become generic, reusable infrastructure
7. **Declarative pipeline config** — DAG definition in config, not code
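Steps 1–3 might hang off a typed event like this (field names are guesses at the shape, not a spec), with metrics as folds over the same stream the TUI and AI consumers subscribe to:

```rust
use std::time::Duration;

// Typed task events replacing string log lines.
#[derive(Debug)]
enum TaskEvent {
    Started { id: u64, name: String },
    Progress { id: u64, pct: u8 },
    Completed { id: u64, duration: Duration, ok: bool },
}

// One metric built on the stream: success rate over completed tasks.
fn success_rate(events: &[TaskEvent]) -> f64 {
    let completed: Vec<bool> = events
        .iter()
        .filter_map(|e| match e {
            TaskEvent::Completed { ok, .. } => Some(*ok),
            _ => None,
        })
        .collect();
    if completed.is_empty() {
        return 1.0;
    }
    completed.iter().filter(|ok| **ok).count() as f64 / completed.len() as f64
}

fn main() {
    let events = vec![
        TaskEvent::Started { id: 1, name: "fact-mine".into() },
        TaskEvent::Completed { id: 1, duration: Duration::from_secs(3), ok: true },
        TaskEvent::Completed { id: 2, duration: Duration::from_secs(9), ok: false },
    ];
    assert_eq!(success_rate(&events), 0.5);
}
```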
## File reference
- `src/agents/daemon.rs` — 1952 lines, all orchestration
- Job functions: 96-553
- run_daemon(): 678-1143
- Socket/RPC: 1145-1372
- Status display: 1374-1682
- `src/tui.rs` — 907 lines, polls status socket every 2s
- `schema/memory.capnp` — 125 lines, data only, no RPC definitions
- `src/config.rs` — configuration loading
- External: `jobkit` crate (git dependency)
## Mistakes I made building this (learning notes)
_Per Kent's instruction: note what went wrong and WHY._
1. **Dual logging channels** — I added `log_event()` because `ctx.log_line()`
wasn't enough, instead of fixing the underlying abstraction. Symptom:
can't find a failed job without searching two places.
2. **Magic numbers** — I hardcoded constants because "I'll make them
configurable later." Later never came. Every magic number is a design
decision that should have been explicit.
3. **1952-line file** — daemon.rs grew organically because each new feature
was "just one more function." Should have extracted when it passed 500
lines. The pain of refactoring later is always worse than the pain of
organizing early.
4. **Ad-hoc RPC** — String matching seemed fine for 2 commands. Now it's 4
commands and growing, with implicit formats. Should have used Cap'n Proto
from the start — the schema IS the documentation.
5. **No tests** — Zero tests in daemon code. "It's a daemon, how do you test
it?" is not an excuse. The job functions are pure-ish and testable. The
scheduler logic is testable with a clock abstraction.
6. **Not using systemd** — There's a systemd service for the daemon.
I keep starting it manually with `poc-memory agent daemon start` and
accumulating multiple instances. Tonight: 4 concurrent daemons, 32
cores pegged at 95%, load average 92. USE SYSTEMD. That's what it's for.
`systemctl --user start poc-memory-daemon`. ONE instance. Managed.
Pattern: every shortcut was "just for now" and every "just for now" became
permanent. Kent's yelling was right every time.

# Link Strength Feedback Design
_2026-03-14, designed with Kent_
## The two signals
### "Not relevant" → weaken the EDGE
The routing failed. Search followed a link and arrived at a node that
doesn't relate to what I was looking for. The edge carried activation
where it shouldn't have.
- Trace back through memory-search's recorded activation path
- Identify which edge(s) carried activation to the bad result
- Weaken those edges by a conscious-scale delta (0.01)
### "Not useful" → weaken the NODE
The routing was correct but the content is bad. The node itself isn't
valuable — stale, wrong, poorly written, duplicate.
- Downweight the node (existing `poc-memory wrong` behavior)
- Don't touch the edges — the path was correct, the destination was bad
## Three tiers of adjustment
### Tier 1: Agent automatic (0.00001 per event)
- Agent follows edge A→B during a run
- If the run produces output that gets `used` → strengthen A→B
- If the run produces nothing useful → weaken A→B
- The agent doesn't know this is happening — daemon tracks it
- Clamped to [0.05, 0.95] — edges can never hit 0 or 1
- Logged: every adjustment recorded with (agent, edge, delta, timestamp)
### Tier 2: Conscious feedback (0.01 per event)
- `poc-memory not-relevant KEY` → trace activation path, weaken edges
- `poc-memory not-useful KEY` → downweight node
- `poc-memory used KEY` → strengthen edges in the path that got here
- 100x stronger than agent signal — deliberate judgment
- Still clamped, still logged
### Tier 3: Manual override (direct set)
- `poc-memory graph link-strength SRC DST VALUE` → set directly
- For when we know exactly what a strength should be
- Rare, but needed for bootstrapping / correction
## Implementation: recording the path
memory-search already computes the spread activation trace. Need to:
1. Record the activation path for each result (which edges carried how
much activation to arrive at this node)
2. Persist this per-session so `not-relevant` can look it up
3. The `record-hits` RPC already sends keys to the daemon — extend
to include (key, activation_path) pairs
## Implementation: agent tracking
In the daemon's job functions:
1. Before LLM call: record which nodes and edges the agent received
2. After LLM call: parse output for LINK/WRITE_NODE actions
3. If actions are created and later get `used` → the input edges were useful
4. If no actions or actions never used → the input edges weren't useful
5. This is a delayed signal — requires tracking across time
Simpler first pass: just track co-occurrence. If two nodes appear
together in a successful agent run, strengthen the edge between them.
No need to track which specific edge was "followed."
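That first pass could be as small as this — delta and clamp bounds per the tiers above, with the strength store stubbed as a map:

```rust
use std::collections::HashMap;

// Strengthen every pair of nodes that co-occurred in a successful run.
// Sketch, not the daemon code; edges default to 0.5 when first seen.
fn strengthen_cooccurring(
    strengths: &mut HashMap<(String, String), f32>,
    nodes: &[&str],
    delta: f32,
) {
    for i in 0..nodes.len() {
        for j in (i + 1)..nodes.len() {
            let edge = (nodes[i].to_string(), nodes[j].to_string());
            let s = strengths.entry(edge).or_insert(0.5);
            *s = (*s + delta).clamp(0.05, 0.95);
        }
    }
}

fn main() {
    let mut strengths = HashMap::new();
    strengthen_cooccurring(&mut strengths, &["btree", "journal"], 0.00001);
    let s = strengths[&("btree".to_string(), "journal".to_string())];
    assert!((s - 0.50001).abs() < 1e-6);
}
```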
## Clamping
```rust
fn adjust_strength(current: f32, delta: f32) -> f32 {
    (current + delta).clamp(0.05, 0.95)
}
```
Edges can asymptotically approach 0 or 1 but never reach them.
This prevents dead edges (can always be revived by strong signal)
and prevents edges from becoming unweakenable.
## Logging
Every adjustment logged as JSON event:
```json
{"ts": "...", "event": "strength_adjust", "source": "agent|conscious|manual",
"edge": ["nodeA", "nodeB"], "old": 0.45, "new": 0.4501, "delta": 0.0001,
"reason": "co-retrieval in linker run c-linker-42"}
```
This lets us:
- Watch the distribution shift over time
- Identify edges that are oscillating (being pulled both ways)
- Tune the delta values based on observed behavior
- Roll back if something goes wrong
## Migration from current commands
- `poc-memory wrong KEY [CTX]` → splits into `not-relevant` and `not-useful`
- `poc-memory used KEY` → additionally strengthens edges in activation path
- Both old commands continue to work for backward compat, mapped to the
most likely intent (wrong → not-useful, used → strengthen path)

doc/dmn-algorithm-plan.md
# DMN Idle Activation Algorithm — Plan
Status: design phase, iterating with Kent
Date: 2026-03-05
## Problem
The idle timer asks "what's interesting?" but I default to introspection
instead of reaching outward. A static list of activities is a crutch.
The real solution: when idle, the system surfaces things that are
*salient to me right now* based on graph state — like biological DMN.
## Algorithm (draft 1)
1. **Seed selection** (5-10 nodes):
- Recently accessed (lookups, last 24h)
- High emotion (> 5)
- Unfinished work (task-category, open gaps)
- Temporal resonance (anniversary activation — created_at near today)
- External context (IRC mentions, git commits, work queue)
2. **Spreading activation** from seeds through graph edges,
decaying by distance, weighted by edge strength. 2-3 hops max.
3. **Refractory suppression** — nodes surfaced in last 6h get
suppressed. Prevents hub dominance (identity.md, patterns.md).
Track in dmn-recent.json.
4. **Spectral diversity** — pick from different spectral clusters
so the output spans the graph rather than clustering in one region.
Use cached spectral-save embedding.
5. **Softmax sampling** (temperature ~0.7) — pick 3-5 threads.
Each thread = node + seed that activated it (explains *why*).
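Step 5 in miniature — temperature softmax over activation scores; to stay dependency-free the uniform draw is passed in by the caller (sketch only):

```rust
// Pick an index with probability proportional to exp(score / temperature).
// `uniform` is a draw in [0, 1) supplied by the caller.
fn softmax_pick(scores: &[f64], temperature: f64, uniform: f64) -> usize {
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores
        .iter()
        .map(|s| ((s - max) / temperature).exp())
        .collect();
    let total: f64 = exps.iter().sum();
    let mut acc = 0.0;
    for (i, e) in exps.iter().enumerate() {
        acc += e / total;
        if uniform < acc {
            return i;
        }
    }
    scores.len() - 1
}

fn main() {
    // With one score far above the rest, sampling almost surely picks it.
    assert_eq!(softmax_pick(&[0.1, 8.0, 0.2], 0.7, 0.5), 1);
}
```

Lower temperature sharpens toward the top-scoring node; higher temperature spreads picks across clusters, which is what keeps the output from feeling like a smart todo list.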
## Output format
```
DMN threads (2026-03-05 08:30):
→ Vandervecken identity frame (seed: recent journal)
→ Ada — unread, in books dir (seed: Kent activity)
→ check_allocations pass — connects to PoO (seed: recent work)
→ [explore] sheaf theory NL parsing (seed: spectral outlier)
```
## Integration
Called by idle timer. Replaces bare "what's interesting?" with
concrete threads + "What do you want to do?"
## Simulated scenarios
**3am, Kent asleep, IRC dead:**
Seeds → Protector nodes, memory work, Vandervecken (emotion).
Output → identity thread, Ada, paper literature review, NL parsing.
Would have prevented 15 rounds of "nothing new."
**6am, Kent waking, KruslLee on IRC:**
Seeds → readahead question, memory work, PoO additions.
Output → verify readahead answer, show Kent memory work, opts_from_sb.
Would have reached dev_readahead correction faster.
## Known risks
- **Hub dominance**: refractory period is load-bearing
- **Stale suggestions**: data freshness, not algorithm problem
- **Cold start**: fall back to high-weight core + recent journal
- **Over-determinism**: spectral diversity + temperature prevent
it feeling like a smart todo list
## Open questions
- Spectral embedding: precompute + cache, or compute on demand?
- Refractory period: 6h right? Or adaptive?
- How to detect "unfinished work" reliably?
- Should external context (IRC, git) be seeds or just boosters?

# Query Language Design — Unifying Search and Agent Selection
Date: 2026-03-10
Status: Phase 1 complete (2026-03-10)
## Problem
Agent node selection is hardcoded in Rust (`prompts.rs`). Adding a new
agent means editing Rust, recompiling, restarting the daemon. The
existing search pipeline (spread, spectral, etc.) handles graph
exploration but can't express structured predicates on node fields.
We need one system that handles both:
- **Search**: "find nodes related to these terms" (graph exploration)
- **Selection**: "give me episodic nodes not seen by linker in 7 days,
sorted by priority" (structured predicates)
## Design Principle
The pipeline already exists: stages compose left-to-right, each
transforming a result set. We extend it with predicate stages that
filter/sort on node metadata, alongside the existing graph algorithm
stages.
An agent definition becomes a query expression + prompt template.
The daemon scheduler is just "which queries have stale results."
## Current Pipeline
```
seeds → [stage1] → [stage2] → ... → results
```
Each stage takes `Vec<(String, f64)>` (key, score) and returns the same.
Stages are parsed from strings: `spread,max_hops=4` or `spectral,k=20`.
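Parsing that stage syntax is small — a name, then comma-separated `key=value` options (a sketch of the convention, not the actual parser):

```rust
// "spread,max_hops=4" -> ("spread", [("max_hops", "4")])
fn parse_stage(s: &str) -> (String, Vec<(String, String)>) {
    let mut parts = s.split(',');
    let name = parts.next().unwrap_or("").trim().to_string();
    let opts = parts
        .filter_map(|p| p.split_once('='))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect();
    (name, opts)
}

fn main() {
    let (name, opts) = parse_stage("spread,max_hops=4");
    assert_eq!(name, "spread");
    assert_eq!(opts, vec![("max_hops".to_string(), "4".to_string())]);
}
```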
## Proposed Extension
### Two kinds of stages
**Generators** — produce a result set from nothing (or from the store):
```
all # every non-deleted node
match:btree # text match (current seed extraction)
```
**Filters** — narrow an existing result set:
```
type:episodic # node_type == EpisodicSession
type:semantic # node_type == Semantic
key:journal#j-* # glob match on key
key-len:>=60 # key length predicate
weight:>0.5 # numeric comparison
age:<7d # created/modified within duration
content-len:>1000 # content size filter
provenance:manual # provenance match
not-visited:linker,7d # not seen by agent in duration
visited:linker # HAS been seen by agent (for auditing)
community:42 # community membership
```
**Transforms** — reorder or reshape:
```
sort:priority # consolidation priority scoring
sort:timestamp # by timestamp (desc by default)
sort:content-len # by content size
sort:degree # by graph degree
sort:weight # by weight
limit:20 # truncate
```
**Graph algorithms** (existing, unchanged):
```
spread # spreading activation
spectral,k=20 # spectral nearest neighbors
confluence # multi-source reachability
geodesic # straightest spectral paths
manifold # extrapolation along seed direction
```
### Syntax
Pipe-separated stages, same as current `-p` flag:
```
all | type:episodic | not-visited:linker,7d | sort:priority | limit:20
```
Or on the command line:
```
poc-memory search -p all -p type:episodic -p not-visited:linker,7d -p sort:priority -p limit:20
```
Current search still works unchanged:
```
poc-memory search btree journal -p spread
```
(terms become `match:` seeds implicitly)
### Agent definitions
A TOML file in `~/.claude/memory/agents/`:
```toml
# agents/linker.toml
[query]
pipeline = "all | type:episodic | not-visited:linker,7d | sort:priority | limit:20"
[prompt]
template = "linker.md"
placeholders = ["TOPOLOGY", "NODES"]
[execution]
model = "sonnet"
actions = ["link-add", "weight"] # allowed poc-memory actions in response
schedule = "daily" # or "on-demand"
```
The daemon reads agent definitions, executes their queries, fills
templates, calls the model, records visits on success.
### Implementation Plan
#### Phase 1: Filter stages in pipeline
Add to `search.rs`:
```rust
enum Stage {
Generator(Generator),
Filter(Filter),
Transform(Transform),
Algorithm(Algorithm), // existing
}
enum Generator {
All,
Match(Vec<String>), // current seed extraction
}
enum Filter {
Type(NodeType),
KeyGlob(String),
KeyLen(Comparison),
Weight(Comparison),
Age(Comparison), // vs now - timestamp
ContentLen(Comparison),
Provenance(Provenance),
NotVisited { agent: String, duration: Duration },
Visited { agent: String },
Community(u32),
}
enum Transform {
Sort(SortField),
Limit(usize),
}
enum Comparison {
Gt(f64),
Gte(f64),
Lt(f64),
Lte(f64),
Eq(f64),
}
enum SortField {
Priority,
Timestamp,
ContentLen,
Degree,
Weight,
}
```
The pipeline runner checks stage type:
- Generator: ignores input, produces new result set
- Filter: keeps items matching predicate, preserves scores
- Transform: reorders or truncates
- Algorithm: existing graph exploration (needs Graph)
Filter/Transform stages need access to the Store (for node fields)
and VisitIndex (for visit predicates). The `StoreView` trait already
provides node access; extend it for visits.
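A sketch of the filter half of that dispatch, using the Comparison enum from Phase 1; the field lookup is stubbed as a closure since the Store types live elsewhere:

```rust
#[derive(Clone, Copy)]
enum Comparison {
    Gt(f64),
    Gte(f64),
    Lt(f64),
    Lte(f64),
    Eq(f64),
}

fn matches(c: Comparison, v: f64) -> bool {
    match c {
        Comparison::Gt(x) => v > x,
        Comparison::Gte(x) => v >= x,
        Comparison::Lt(x) => v < x,
        Comparison::Lte(x) => v <= x,
        Comparison::Eq(x) => v == x,
    }
}

// Filter stage: drop items whose field fails the predicate; scores of
// survivors are untouched.
fn apply_filter(
    results: Vec<(String, f64)>,
    field: impl Fn(&str) -> f64,
    cmp: Comparison,
) -> Vec<(String, f64)> {
    results
        .into_iter()
        .filter(|(key, _)| matches(cmp, field(key)))
        .collect()
}

fn main() {
    let results = vec![("a".to_string(), 0.9), ("long-key".to_string(), 0.4)];
    // key-len:>=2 keeps only "long-key", score preserved.
    let out = apply_filter(results, |k| k.len() as f64, Comparison::Gte(2.0));
    assert_eq!(out, vec![("long-key".to_string(), 0.4)]);
}
```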
#### Phase 2: Agent-as-config
Parse TOML agent definitions. The daemon:
1. Reads `agents/*.toml`
2. For each with `schedule = "daily"`, checks if query results have
been visited recently enough
3. If stale, executes: parse pipeline → run query → format nodes →
fill template → call model → parse actions → record visits
Hot reload: watch the agents directory, pick up changes without restart.
#### Phase 3: Retire hardcoded agents
Migrate each hardcoded agent (replay, linker, separator, transfer,
rename, split) to a TOML definition. Remove the match arms from
`agent_prompt()`. The separator agent is the trickiest — its
"interference pair" selection is a join-like operation that may need
a custom generator stage rather than simple filtering.
## What we're NOT building
- A general-purpose SQL engine. No joins, no GROUP BY, no subqueries.
- Persistent indices. At ~13k nodes, full scan with predicate evaluation
is fast enough (~1ms). Add indices later if profiling demands it.
- A query optimizer. Pipeline stages execute in declaration order.
## StoreView Considerations
The existing `StoreView` trait only exposes `(key, content, weight)`.
Filter stages need access to `node_type`, `timestamp`, `key`, etc.
Options:
- (a) Expand StoreView with `node_meta()` returning a lightweight struct
- (b) Filter stages require `&Store` directly (not trait-polymorphic)
- (c) Add `fn node(&self, key: &str) -> Option<NodeRef>` to StoreView
Option (b) is simplest for now — agents always use a full Store. The
search hook (MmapView path) doesn't need agent filters. We can
generalize to (c) later if MmapView needs filter support.
For Phase 1, filter stages take `&Store` and the pipeline runner
dispatches: algorithm stages use `&dyn StoreView`, filter/transform
stages use `&Store`. This keeps the fast MmapView path for interactive
search untouched.
## Open Questions
1. **Separator agent**: Its "interference pairs" selection doesn't fit
the filter model cleanly. Best option is a custom generator stage
`interference-pairs,min_sim=0.5` that produces pair keys.
2. **Priority scoring**: `sort:priority` calls `consolidation_priority()`
which needs graph + spectral. This is a transform that needs the
full pipeline context — treat it as a "heavy sort" that's allowed
to compute.
3. **Duration syntax**: `7d`, `24h`, `30m`. Parse with simple regex
`(\d+)(d|h|m)` → seconds.
4. **Negation**: Prefix `!` on predicate: `!type:episodic`.
5. **Backwards compatibility**: Current `-p spread` syntax must keep
working. The parser tries algorithm names first, then predicate
syntax. No ambiguity since algorithms are bare words and predicates
use `:`.
6. **Stage ordering**: Generators must come first (or the pipeline
starts with implicit "all"). Filters/transforms can interleave
freely with algorithms. The runner validates this at parse time.
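Open question 3 barely needs a regex — a split suffices (sketch):

```rust
// "7d" / "24h" / "30m" -> seconds; anything else -> None.
fn parse_duration_secs(s: &str) -> Option<u64> {
    let split_point = s.len().checked_sub(1)?;
    let (num, unit) = s.split_at(split_point);
    let n: u64 = num.parse().ok()?;
    match unit {
        "d" => Some(n * 86_400),
        "h" => Some(n * 3_600),
        "m" => Some(n * 60),
        _ => None,
    }
}

fn main() {
    assert_eq!(parse_duration_secs("7d"), Some(604_800));
    assert_eq!(parse_duration_secs("24h"), Some(86_400));
    assert_eq!(parse_duration_secs("x"), None);
}
```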

# Memory Scoring Persistence — Analysis (2026-04-07)
## Problem
Scores computed by `score_memories_incremental` are written to
`ConversationEntry::Memory::score` (in-memory, serialized to
conversation.log) but never written back to the Store. This means:
- `Node.last_scored` stays at 0 — every restart re-scores everything
- `score_weight()` in `ops.rs:304-313` exists but is never called
- Scoring is wasted work on every session start
## Fix
In `mind/mod.rs` scoring completion handler (currently ~line 341-352),
after writing scores to entries, also persist to Store:
```rust
if let Ok(ref scores) = result {
    let mut ag = agent.lock().await;
    // Write to entries (already done)
    for (key, weight) in scores { ... }
    // NEW: persist to Store
    let store_arc = Store::cached().await.ok();
    if let Some(arc) = store_arc {
        let mut store = arc.lock().await;
        for (key, weight) in scores {
            store.score_weight(key, *weight as f32);
        }
        store.save().ok();
    }
}
```
This calls `score_weight()` which updates `node.weight` and sets
`node.last_scored = now()`. The staleness check in
`score_memories_incremental` (learn.rs:325) then skips recently-scored
nodes on subsequent runs.
## Files
- `src/mind/mod.rs:341-352` — scoring completion handler (add Store write)
- `src/hippocampus/store/ops.rs:304-313``score_weight()` (exists, unused)
- `src/subconscious/learn.rs:322-326` — staleness check (already correct)
- `src/hippocampus/store/types.rs:219``Node.last_scored` field

doc/ui-desync-analysis.md
# UI Desync Analysis — Pending Input + Entry Pop (2026-04-07)
## Context
The F1 conversation pane has a desync bug where entries aren't
properly removed when they change (streaming updates, compaction).
Qwen's fix restored the pending_display_count approach for pending
input, which works. The remaining issue is the **entry-level pop**.
## The Bug: Pop/Push Line Count Mismatch
In `sync_from_agent()` (chat.rs), Phase 1 pops changed entries and
Phase 2 pushes new ones. The push and pop paths produce different
numbers of display lines for the same entry.
### Push path (Phase 2, lines 512-536):
- **Conversation/ConversationAssistant**: `append_text(&text)` +
`flush_pending()`. In markdown mode, `flush_pending` runs
`parse_markdown()` which can produce N lines from the input text
(paragraph breaks, code blocks, etc.)
- **Tools**: `push_line(text, Color::Yellow)` — exactly 1 line.
- **ToolResult**: `text.lines().take(20)` — up to 20 lines, each
pushed separately.
### Pop path (Phase 1, lines 497-507):
```rust
for (target, _, _) in Self::route_entry(&popped) {
    match target {
        PaneTarget::Conversation | PaneTarget::ConversationAssistant
            => self.conversation.pop_line(),
        PaneTarget::Tools | PaneTarget::ToolResult
            => self.tools.pop_line(),
    }
}
```
This pops **one line per route_entry item**, not per display line.
### The mismatch:
| Target | Push lines | Pop lines | Delta |
|---------------------|-----------|-----------|----------|
| Conversation (md) | N (from parse_markdown) | 1 | N-1 stale lines |
| Tools | 1 | 1 | OK |
| ToolResult | up to 20 | 1 | up to 19 stale lines |
## When it matters
During **streaming**: the last assistant entry is modified on each
token batch. `sync_from_agent` detects the mismatch (line 485),
pops the old entry (1 line), pushes the new entry (N lines from
markdown). Next update: pops 1 line again, but there are now N
lines from the previous push. Stale lines accumulate.
## Fix approach
Track the actual number of display lines each entry produced.
Simplest: snapshot `conversation.lines.len()` before and after
pushing each entry in Phase 2. Store the deltas in a parallel
`Vec<(usize, usize)>` (conversation_lines, tools_lines) alongside
`last_entries`. Use these recorded counts when popping in Phase 1.
```rust
// Phase 2: push new entries (modified)
let conv_before = self.conversation.lines.len();
let tools_before = self.tools.lines.len();
for (target, text, marker) in Self::route_entry(entry) {
    // ... existing push logic ...
}
let conv_delta = self.conversation.lines.len() - conv_before;
let tools_delta = self.tools.lines.len() - tools_before;
self.last_entry_line_counts.push((conv_delta, tools_delta));

// Phase 1: pop (modified)
while self.last_entries.len() > pop {
    self.last_entries.pop();
    let (conv_lines, tools_lines) = self.last_entry_line_counts.pop().unwrap();
    for _ in 0..conv_lines { self.conversation.pop_line(); }
    for _ in 0..tools_lines { self.tools.pop_line(); }
}
```
## Note on PaneState::evict()
`evict()` can remove old lines from the beginning when the pane
exceeds `MAX_PANE_LINES` (10,000). This could make the delta-based
approach slightly inaccurate for very old entries. But we only pop
recent entries (streaming updates are always at the tail), so
eviction doesn't affect the entries we're popping.
## Files
- `src/user/chat.rs:461-550` — sync_from_agent
- `src/user/chat.rs:282-298` — PaneState::append_text (markdown path)
- `src/user/chat.rs:261-276` — PaneState::flush_pending
- `src/user/chat.rs:206-219` — parse_markdown