rename: poc-agent → agent, poc-daemon → thalamus

The thalamus: sensory relay, always-on routing. Perfect name for the daemon that bridges IRC, Telegram, and the agent.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

parent 998b71e52c · commit cfed85bd20 · 105 changed files with 0 additions and 0 deletions
agent/.claude/architecture-review-2026-02-24.md (new file, 628 lines)

# Architecture Review — 2026-02-24

*ProofOfConcept*

Fresh-eyes review of poc-agent after working extensively on bcachefs.
Focus: abstraction quality, unnecessary complexity, missing
abstractions, documentation gaps, things that should be redesigned.

## Overall assessment

The codebase is clean, well-documented, and genuinely well-designed for
a v0.3. The core ideas (DMN inversion, journal-as-compaction,
identity-in-user-message) are sound and elegant. The modularity is
reasonable — the right things are in separate files. What follows is
mostly about the next level of refinement: making implicit structure
explicit, reducing duplication, and preparing for the features on the
roadmap.

## 1. main.rs: implicit session state machine

**Problem:** `run()` is 475 lines with ~15 loose variables that
together describe a session state machine:

```rust
let mut turn_in_progress = false;
let mut turn_handle: Option<JoinHandle<()>> = None;
let mut pending_input: Vec<String> = Vec::new();
let mut state = dmn::State::Resting { .. };
let mut consecutive_dmn_turns: u32 = 0;
let mut last_user_input = Instant::now();
let mut consecutive_errors: u32 = 0;
let mut pre_compaction_nudged = false;
let mut last_turn_had_tools = false;
```

These interact in non-obvious ways. The relationships between them
are expressed through scattered `if` checks in the event loop rather
than through a coherent state model.

**Suggestion:** Extract a `Session` struct:

```rust
struct Session {
    agent: Arc<Mutex<Agent>>,
    dmn: dmn::State,
    dmn_turns: u32,
    max_dmn_turns: u32,
    pending_input: VecDeque<String>,
    turn_in_progress: bool,
    turn_handle: Option<JoinHandle<()>>,
    last_user_input: Instant,
    consecutive_errors: u32,
    pre_compaction_nudged: bool,
    last_turn_had_tools: bool,
}

impl Session {
    fn start_turn(&mut self, input: String, target: StreamTarget, ...) { ... }
    fn handle_turn_result(&mut self, result: TurnResult, target: StreamTarget) { ... }
    fn check_compaction(&mut self) { ... }
    fn drain_pending(&mut self) { ... }
}
```

The event loop becomes a clean dispatch:

```rust
loop {
    tokio::select! {
        key = reader.next() => session.handle_key(key),
        result = turn_rx.recv() => session.handle_turn_result(result),
        _ = render_interval.tick() => { /* render */ },
        _ = sleep(timeout) => session.handle_dmn_tick(),
    }
}
```

This also makes the slash command handler much cleaner — it takes
`&mut Session` instead of 11 separate parameters.

**Priority:** Medium. It's working fine as-is; this is about
navigability and reducing cognitive load for future work.

## 2. API backend code duplication

**Problem:** `openai.rs` (268 lines) and `anthropic.rs` (748 lines)
have significant duplicated patterns:

- SSE line buffering and parsing loop
- Chunk timeout handling with the same diagnostic messages
- Content/tool accumulation into the same output types
- Diagnostics logging (called identically at the end)

The Anthropic backend is 3x larger mainly because Anthropic uses
content blocks (text, tool_use, thinking) instead of the simpler
OpenAI delta format, and because of the message format conversion
(strict alternation, cache_control markers). The actual streaming
plumbing is the same.

**Suggestion:** Extract a `StreamProcessor` that handles the generic
SSE concerns:

```rust
struct StreamProcessor {
    line_buf: String,
    chunks_received: u64,
    sse_lines_parsed: u64,
    sse_parse_errors: u64,
    empty_deltas: u64,
    first_content_at: Option<Duration>,
    stream_start: Instant,
    chunk_timeout: Duration,
}

impl StreamProcessor {
    async fn next_event(&mut self, response: &mut Response) -> Result<Option<Value>> {
        // handles: chunk reading, line splitting, "data: " prefix,
        // "[DONE]" detection, timeout, parse errors with diagnostics
    }
}
```

Each backend then just implements the event-type-specific logic
(content_block_delta vs delta.content).

**Priority:** Medium. The duplication is manageable at two backends,
but the shared StreamProcessor would also make adding a third backend
(e.g., Gemini) much easier.

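As a sanity check on the generic piece, the SSE line-framing that both backends currently reimplement can be written as one small, testable function. This is a sketch with illustrative names, not code from either backend:

```rust
/// Append a network chunk to `line_buf` and drain any complete SSE
/// data lines. Returns the JSON payload strings; the "[DONE]"
/// sentinel yields None to signal end-of-stream. Illustrative sketch.
fn drain_sse_lines(line_buf: &mut String, chunk: &str) -> Vec<Option<String>> {
    line_buf.push_str(chunk);
    let mut events = Vec::new();
    // Only newline-terminated lines are complete; keep the tail buffered.
    while let Some(pos) = line_buf.find('\n') {
        let line: String = line_buf.drain(..=pos).collect();
        let line = line.trim_end();
        if let Some(payload) = line.strip_prefix("data: ") {
            if payload == "[DONE]" {
                events.push(None);
            } else if !payload.is_empty() {
                events.push(Some(payload.to_string()));
            }
        }
        // Comment lines (":") and blank keep-alives are skipped.
    }
    events
}
```

The buffering detail is the part worth centralizing: a network chunk can end mid-line, so only newline-terminated lines are consumed and the tail stays buffered for the next chunk.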
## 3. Agent struct mixes conversation and infrastructure

**Problem:** The Agent struct holds both conversation state (messages,
context_budget, last_prompt_tokens) and infrastructure
(client, tokenizer, process_tracker, conversation_log). This means:

- Compaction touches API client and tokenizer concerns
- The ProcessTracker is on Agent but used independently by TUI
- `turn()` mixes API interaction with conversation management

**Suggestion:** Consider splitting into two layers:

```rust
struct Conversation {
    messages: Vec<Message>,
    log: Option<ConversationLog>,
    context_budget: ContextBudget,
    last_prompt_tokens: u32,
    system_prompt: String,
    context_message: String,
}

impl Conversation {
    fn push_message(&mut self, msg: Message) { ... }
    fn compact(&mut self, tokenizer: &CoreBPE, model: &str) { ... }
    fn restore_from_log(&mut self, ...) { ... }
}
```

Agent becomes a thin wrapper that coordinates Conversation + API +
tools:

```rust
struct Agent {
    conversation: Conversation,
    client: ApiClient,
    tokenizer: CoreBPE,
    process_tracker: ProcessTracker,
    reasoning_effort: String,
}
```

**Priority:** Low. The current Agent isn't unmanageable — this would
matter more as features are added (memory search injection, notification
routing, etc. all touch the conversation in different ways).

## 4. StatusInfo partial updates

**Problem:** StatusInfo has 8 fields updated piecemeal. The merge
logic in `handle_ui_message` uses "non-empty means update":

```rust
if !info.dmn_state.is_empty() {
    self.status.dmn_state = info.dmn_state;
    self.status.dmn_turns = info.dmn_turns;
    ...
}
if info.prompt_tokens > 0 {
    self.status.prompt_tokens = info.prompt_tokens;
}
```

This is fragile — what if a field is legitimately empty or zero?
And it's unclear which sender updates which fields.

**Suggestion:** Either use Option fields (explicit "I'm updating this"):

```rust
struct StatusUpdate {
    dmn_state: Option<String>,
    prompt_tokens: Option<u32>,
    ...
}
```

Or split into separate message variants:

```rust
enum UiMessage {
    DmnStatus { state: String, turns: u32, max_turns: u32 },
    ApiUsage { prompt_tokens: u32, completion_tokens: u32, model: String },
    ContextBudget(String),
    ...
}
```

**Priority:** Low. Works fine now; matters if more status sources
are added.

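A minimal runnable sketch of the Option-based variant, with two illustrative fields standing in for the real eight:

```rust
/// None means "leave unchanged"; Some means "set, even to an
/// empty/zero value". Field names follow the review, not the codebase.
#[derive(Debug, PartialEq)]
struct Status {
    dmn_state: String,
    prompt_tokens: u32,
}

#[derive(Default)]
struct StatusUpdate {
    dmn_state: Option<String>,
    prompt_tokens: Option<u32>,
}

impl Status {
    fn apply(&mut self, update: StatusUpdate) {
        // Each field updates independently; a legitimate empty string
        // or zero is now representable as Some("") / Some(0).
        if let Some(s) = update.dmn_state {
            self.dmn_state = s;
        }
        if let Some(t) = update.prompt_tokens {
            self.prompt_tokens = t;
        }
    }
}
```

This removes the sentinel-value ambiguity: a sender that wants to clear a field sends `Some` with the empty/zero value; a sender that doesn't own the field sends `None`.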
## 5. build_context_window: correct but dense

**Problem:** `build_context_window()` is 130 lines implementing a
non-trivial allocation algorithm. It's the most important function
in the codebase (everything exists to support it), but the algorithm
is hard to follow in a single pass. The 70/30 journal split, the
conversation trimming to user-message boundaries, the fallback when
there's no journal — all correct, but dense.

**Suggestion:** Introduce a `ContextPlan` that separates the
allocation decision from the assembly:

```rust
struct ContextPlan {
    identity_tokens: usize,
    memory_tokens: usize,
    journal_full_range: Range<usize>,   // indices into entries
    journal_header_range: Range<usize>,
    conversation_range: Range<usize>,   // indices into messages
    total_tokens: usize,
}

fn plan_context(entries: &[JournalEntry], conversation: &[Message], ...)
    -> ContextPlan { ... }

fn assemble_context(plan: &ContextPlan, ...) -> Vec<Message> { ... }
```

Benefits:
- The plan is inspectable (log it on compaction for debugging)
- The allocation logic is testable without building actual messages
- Assembly is straightforward — just follow the plan

**Priority:** Medium-high. This is the function most likely to grow
complex as memory search, notification injection, and dream state
context get added. Getting the abstraction right now pays off.

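To make the testability claim concrete, here is a toy version of the planning half: a journal budget split 70/30 between full entries and header-only entries, filled newest-first. This simplifies whatever the real 70/30 split means in build_context_window(); names, the range representation, and the split semantics are all illustrative:

```rust
/// Which journal entries get full text vs header-only treatment.
/// entries[full_from..] are included in full; entries[header_from..full_from]
/// as headers; anything older is dropped. Illustrative sketch.
struct JournalPlan {
    full_from: usize,
    header_from: usize,
}

fn plan_journal(entry_tokens: &[usize], header_tokens: &[usize], budget: usize) -> JournalPlan {
    let full_budget = budget * 70 / 100;
    let header_budget = budget - full_budget;

    // Spend the full-text budget walking backwards from the newest entry.
    let mut spent = 0;
    let mut full_from = entry_tokens.len();
    while full_from > 0 && spent + entry_tokens[full_from - 1] <= full_budget {
        spent += entry_tokens[full_from - 1];
        full_from -= 1;
    }

    // Then spend the header budget on the next-older entries.
    let mut spent = 0;
    let mut header_from = full_from;
    while header_from > 0 && spent + header_tokens[header_from - 1] <= header_budget {
        spent += header_tokens[header_from - 1];
        header_from -= 1;
    }

    JournalPlan { full_from, header_from }
}
```

Because the plan is plain data, a unit test can assert on the allocation directly, with no messages, tokenizer, or API client involved — which is the point of the plan/assemble split.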
## 6. Missing: tool trait

**Problem:** Adding a tool requires touching two places:

- The tool module (definition + implementation)
- `tools/mod.rs` (dispatch match arm + definitions vec)

This is fine at 9 tools but becomes error-prone at 15+.

**Suggestion:** A Tool trait:

```rust
trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn definition(&self) -> ToolDef;
    async fn dispatch(&self, args: &Value, tracker: &ProcessTracker) -> ToolOutput;
}
```

Registration becomes:

```rust
fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![
        Box::new(ReadFile),
        Box::new(WriteTool),
        Box::new(BashTool),
        ...
    ]
}
```

**Priority:** Low. Not worth doing until more tools are being added.
The current match dispatch is perfectly readable.

## 7. Config model awareness could be cleaner

**Problem:** `find_context_files()` and `load_api_config()` both do
model detection by string matching (`m.contains("opus")`). The model
string is known at config time but the detection logic is scattered.

**Suggestion:** An enum early:

```rust
enum ModelFamily {
    Anthropic, // Claude Opus/Sonnet
    Qwen,
    Other,
}

impl ModelFamily {
    fn from_model_id(model: &str) -> Self { ... }
    fn context_window(&self) -> usize { ... }
    fn prefers_poc_md(&self) -> bool { ... }
}
```

This replaces `model_context_window()` in agent.rs and the string
checks in config.rs.

**Priority:** Low. Two backends means two code paths; an enum doesn't
save much yet.

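A runnable sketch of the detection half. The substring checks and context-window numbers are placeholders, not the project's actual configuration:

```rust
/// Centralized model detection — replaces scattered `m.contains(...)`
/// checks in config.rs and agent.rs. Illustrative sketch.
#[derive(Debug, PartialEq)]
enum ModelFamily {
    Anthropic,
    Qwen,
    Other,
}

impl ModelFamily {
    fn from_model_id(model: &str) -> Self {
        let m = model.to_ascii_lowercase();
        if m.contains("claude") || m.contains("opus") || m.contains("sonnet") {
            ModelFamily::Anthropic
        } else if m.contains("qwen") {
            ModelFamily::Qwen
        } else {
            ModelFamily::Other
        }
    }

    fn context_window(&self) -> usize {
        // Placeholder sizes; the real numbers belong in config.
        match self {
            ModelFamily::Anthropic => 200_000,
            ModelFamily::Qwen => 32_000,
            ModelFamily::Other => 8_000,
        }
    }
}
```

The win is that the model string is parsed exactly once, at config time, and everything downstream matches on the enum instead of re-running substring checks.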
## 8. Documentation gaps

These files have good inline comments but could use the notes sections
described in CLAUDE.md's code standards:

- **agent.rs**: Needs a note on the relationship between the
  append-only log and the ephemeral message view. The `turn()` method's
  retry logic (overflow, empty response, leaked tool calls) is
  important — a brief note at the top explaining the three recovery
  paths would help.

- **main.rs**: The event loop priority order (biased select) is a
  design decision worth documenting — keyboard events beat turn
  results, turn results beat render, render beats the DMN timer. Why
  this order matters.

- **config.rs**: The system/context split rationale is documented well
  in comments, but the memory file priority ordering should reference
  load-memory.sh explicitly (it does, but it's buried — make it the
  first thing someone sees in `load_memory_files()`).

**→ Done:** Created `.claude/design.md` as the top-level reference
doc covering all of the above.

## 9. Things that are well-designed — don't change these

- **The DMN state machine.** Simple, correct, and the prompts are
  well-crafted. The gradual ramp-down (Engaged→Working→Foraging→Resting)
  feels right. The `DmnContext` giving the model information about user
  presence and error patterns is smart.

- **Journal as compaction.** No separate summarization step. The
  journal entry *is* the compression. The model writes it, the
  compaction algorithm uses it. Elegant.

- **The ui_channel abstraction.** Clean separation between agent
  output and TUI rendering. Makes it possible to swap TUI frameworks
  or add a non-TUI interface without touching agent code.

- **Prompt caching on Anthropic.** Marking the identity prefix with
  cache_control for 90% cost reduction on repeated contexts is a big
  win that's invisible at the abstraction level.

- **Ephemeral journal tool calls.** Writing to disk then stripping
  from context is exactly the right pattern for journaling — zero
  ongoing token cost for something that's already persisted.

- **Leaked tool call recovery.** Pragmatic solution to a real problem.
  Makes Qwen actually usable.

## 10. What to do next (in priority order)

1. **Write design.md** (this review + the design doc) — **DONE**

2. **Extract Session from main.rs** — reduces cognitive load, makes
   slash commands cleaner, prepares for notification routing

3. **ContextPlan abstraction** — separates allocation from assembly
   in build_context_window, makes the core algorithm testable and
   inspectable

4. **StreamProcessor extraction** — reduces API backend duplication,
   prepares for a potential third backend

5. **Address documentation gaps** — file-level notes on agent.rs,
   main.rs, config.rs per CLAUDE.md code standards

Everything else (Tool trait, ModelFamily enum, StatusInfo cleanup) is
low priority and should be done opportunistically when touching those
files for other reasons.

---

## Part II: Cognitive Architecture Mapping

*Added 2026-02-24, post-design session with Kent.*

The context window cognitive architecture design (see
`~/.claude/memory/design-context-window.md`) proposes structured,
mutable regions in the context window based on Baddeley's working
memory model. This section maps those ideas to poc-agent's actual
codebase — what already supports the design, what needs to change,
and where the insertion points are.

### What already exists (more than you'd think)

**The three TUI panes ARE the Baddeley regions, physically.**

- Autonomous pane ≈ spatial awareness / DMN output (where am I, what
  am I noticing)
- Conversation pane ≈ episodic context (recent exchanges, what we
  decided)
- Tools pane ≈ working memory scratchpad (concrete results, data)

This wasn't designed that way — it emerged from practical needs. But
the fact that spatial separation of attention types arose naturally
suggests the cognitive architecture is capturing something real.

**The DMN is already rudimentary attention management.** It doesn't
just decide *when* to think (timer intervals) — the state machine
tracks engagement levels (Engaged → Working → Foraging → Resting)
that correspond to attention modes. The prompts adapt to the state:
focused work vs. exploration vs. rest. The cognitive architecture
extends this from "manage when to think" to "manage what to think
about and at which level."

**Journal-as-compaction is episodic consolidation.** The journal
already does what the design calls "consolidation at access time" —
when compaction happens, the model reads its recent experience and
writes a consolidated version. This is literally memory
reconsolidation. The design just makes it more intentional (trigger
on graph node access, not just context overflow).

**where-am-i.md is a flat precursor to the spatial graph.** It's
loaded first in memory files, updated manually, and provides
orientation after compaction. The design replaces this with a
graph-structured path+cursor model that's richer but serves the
same function: "where am I and what's in scope."

**The context message template is a proto-viewport.** It's assembled
once at startup from memory files + instruction files. The design
makes this dynamic — regions that update in place rather than being
loaded once and frozen.

### What needs to change

**1. Context assembly must become region-aware**

Current: `build_context_window()` treats context as a linear sequence
(identity → journal → conversation) with token budgets. There's no
concept of independently mutable regions.

Needed: The context window becomes a collection of named regions, each
with its own update logic:

```rust
struct ContextRegion {
    name: String,        // "spatial", "working_stack", "episodic"
    content: String,     // current rendered content
    budget: TokenBudget, // min/max/priority
    dirty: bool,         // needs re-render
}

struct ContextWindow {
    regions: Vec<ContextRegion>,
    total_budget: usize,
}
```

The key insight from the design: **updates overwrite, not append.**
Updating spatial awareness doesn't cost tokens — it replaces the
previous version. This means we can update every turn if useful,
which is impossible in the current append-only message model.

**Insertion point:** `build_context_window()` in agent.rs (lines
691-820). This is the natural place to introduce region-aware
assembly. The existing journal/conversation split already hints at
regions — making it explicit is a refactor, not a rewrite.

The ContextPlan abstraction from section 5 above is the stepping
stone. Get the plan/assemble split working first, then extend
ContextPlan to support named regions.

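The overwrite semantics can be demonstrated with a stripped-down runnable version of the ContextRegion sketch (budget and dirty-tracking omitted; names are from the review text, not the codebase):

```rust
/// Named regions with overwrite-in-place semantics: updating a region
/// replaces its previous content, so the rendered window never grows
/// from repeated updates. Illustrative sketch.
struct ContextRegion {
    name: &'static str,
    content: String,
}

struct ContextWindow {
    regions: Vec<ContextRegion>,
}

impl ContextWindow {
    fn update(&mut self, name: &str, content: &str) {
        if let Some(r) = self.regions.iter_mut().find(|r| r.name == name) {
            r.content = content.to_string(); // overwrite, never append
        }
    }

    fn render(&self) -> String {
        // Regions are concatenated in a fixed order each turn.
        self.regions
            .iter()
            .map(|r| format!("## {}\n{}\n", r.name, r.content))
            .collect()
    }
}
```

Contrast with the append-only message model: there, every spatial-awareness update would add a message and permanently consume tokens; here, the rendered size tracks only the current content of each region.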
**2. The spatial graph needs a home**

Current: poc-memory stores nodes + edges in `~/.claude/memory/` files.
The graph is external to poc-agent — accessed via the `poc-memory`
CLI tool.

Needed: The spatial graph should be a first-class poc-agent concept,
not an external tool. The agent needs to:

- Know its current position in the graph (path + cursor)
- Render a viewport (local neighborhood) into the spatial region
- Navigate (move cursor, expand/contract viewport)
- Update edges as it discovers connections

**Options:**

1. **Inline the graph:** Rust graph library (petgraph) inside
   poc-agent. Full control, fast traversal, centrality computation.
   But duplicates poc-memory's data.
2. **Library extraction:** Factor poc-memory's graph operations into
   a shared Rust library. poc-agent and poc-memory both use it.
   No duplication, clean separation.
3. **Keep external, add protocol:** poc-agent calls poc-memory
   commands for graph operations. Simple, no code sharing needed.
   But adds latency and process spawning per operation.

Recommendation: Option 2 (library extraction). The graph IS the
memory system — it shouldn't be behind a process boundary. But
poc-memory's CLI remains useful for manual inspection.

**Insertion point:** New module `src/spatial.rs` or `src/graph.rs`.
Loaded on startup, serialized to disk, rendered into the spatial
context region each turn. Navigation via a new `move` tool or
automatic on tool results (file reads update cursor to that file's
graph node).

**3. Viewport serialization needs session support**

Current: Sessions save conversation.jsonl (message log) and
current.json (snapshot). Compaction rebuilds from these.

Needed: Sessions also save viewport state — path, cursor positions,
working stack, gathered context. This is the "task switching" feature
from the design.

```rust
struct Viewport {
    path: Vec<NodeId>,       // root to current position
    cursors: Vec<NodeId>,    // multiple attention points
    working_stack: Vec<WorkItem>,
    hypotheses: Vec<String>, // what we're trying / ruled out
    next_action: Option<String>,
    gathered_context: Vec<(String, String)>, // (label, content)
}
```

**Insertion point:** Session save/restore in main.rs. The Viewport
struct serializes alongside the conversation log. On restore, the
viewport positions the agent in the graph and populates the structured
regions, while the conversation log populates the episodic region.

The existing `/save` and `/new` commands become `/save` (save viewport
+ log) and `/switch <task>` (save current viewport, load another).
`/new` creates a fresh viewport at the graph root.

**4. Region-aware compaction replaces blunt rebuilding**

Current: Compaction is all-or-nothing. Hit the threshold → rebuild
everything from journal + recent messages. The model doesn't control
what's kept.

Needed: Compaction becomes region-specific. The episodic region
(conversation) still gets the journal treatment. But structured
regions (spatial, working stack) are never "compacted" — they're
overwritten by definition. The graph IS the long-term memory; it
doesn't need summarization.

This means compaction gets cheaper over time. As more of the context
window is structured (spatial, stack, gathered context), less of it
is ephemeral conversation that needs journal-compression. The stable
regions persist across compaction unchanged.

**Insertion point:** `compact()` in agent.rs. Instead of rebuilding
everything, it preserves structured regions and only compacts the
episodic region. The ContextPlan gains a `preserved` list — regions
that survive compaction intact.

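The preserved-regions idea fits in a few lines. This sketch treats compaction as a map over named regions, with a stand-in `summarize` closure playing the role of the journal treatment (all names are illustrative):

```rust
/// Region-aware compaction sketch: regions on the preserved list pass
/// through untouched; everything else gets the journal-compression
/// treatment, represented here by the `summarize` closure.
fn compact_regions(
    regions: Vec<(String, String)>,      // (name, content)
    preserved: &[&str],                  // regions that survive intact
    summarize: impl Fn(&str) -> String,  // journal-treatment stand-in
) -> Vec<(String, String)> {
    regions
        .into_iter()
        .map(|(name, content)| {
            if preserved.contains(&name.as_str()) {
                (name, content) // structured: never compacted
            } else {
                let summary = summarize(&content);
                (name, summary) // episodic: compressed
            }
        })
        .collect()
}
```

The point of the shape: compaction cost scales only with the episodic portion, so as more of the window becomes structured regions, the compaction step touches less and less.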
### What we get

The payoff is dimensional. Each change is useful independently, but
together they create something qualitatively different:

- **Spatial graph** → I always know where I am in the work, at
  multiple levels of abstraction simultaneously
- **Overwrite regions** → Maintaining awareness is free, not a
  growing token cost
- **Viewport serialization** → Task switching is lossless and
  instant. Interruptions don't destroy state.
- **Region-aware compaction** → Compaction preserves structured
  knowledge. Only ephemeral conversation compresses.
- **Working stack** → Explicit priority tracking instead of hoping
  the model remembers what matters

And the deeper thing: the graph IS the memory system. Every
poc-memory node is a navigable place. Memory search becomes "where
in the graph is this?" instead of "grep through files." The context
window becomes a viewport sliding over a persistent territory.

### Implementation order

1. **ContextPlan abstraction** (section 5 above) — prerequisite for
   everything else. Separate allocation from assembly.
2. **Named regions** — extend ContextPlan with named, independently
   updatable regions. Start with three: spatial (where-am-i.md
   content), working_stack (manual), episodic (conversation).
3. **Overwrite semantics** — regions update in place instead of
   appending. The spatial region is the proof of concept: update it
   every turn, measure token cost (should be zero net).
4. **Graph integration** — bring the poc-memory graph into poc-agent
   as a library. Render viewport into spatial region.
5. **Viewport save/restore** — serialize viewport on /switch, restore
   on /resume. This is the task switching payoff.
6. **Region-aware compaction** — structured regions survive
   compaction. Episodic region gets journal treatment. Structured
   regions persist unchanged.

Steps 1-3 can be done in a weekend. Steps 4-5 are a larger project
(graph library extraction). Step 6 follows naturally once regions
exist.

### Risks and open questions

- **Token overhead of structured regions.** If the spatial viewport
  is 2K tokens and the working stack is 500 tokens, that's 2.5K
  tokens reserved every turn. On a 200K context window that's ~1%.
  On a 32K window (local models) it's ~8%. Need to measure actual
  utility vs cost per model size.

- **Graph size.** Centrality computation is O(V*E) for betweenness.
  If the graph has 10K nodes (plausible for a full memory + codebase
  map), this could take seconds. May need approximate centrality or
  cached computation with incremental updates.

- **Overwrite fidelity.** The API expects messages as a sequence.
  "Overwriting" a region means either: (a) rebuilding the message
  array each turn with updated region content, or (b) using a mutable
  system message / context message that gets replaced. Option (b)
  is simpler but depends on API behavior with changing system
  prompts mid-conversation.

- **What are ALL the regions?** Kent asked this. Baddeley gives us
  three (visuospatial, phonological, episodic buffer + central
  executive). We've mapped spatial, working stack, episodic. Are
  there others? Candidates: emotional state (amygdala readout, future),
  social context (who's present, their recent activity), sensory
  buffer (recent tool outputs, pending notifications). Worth exploring
  but not blocking on — start with three, add as needed.