# Architecture Review — 2026-02-24

*ProofOfConcept*

Fresh-eyes review of poc-agent after working extensively on bcachefs.
Focus: abstraction quality, unnecessary complexity, missing
abstractions, documentation gaps, things that should be redesigned.

## Overall assessment

The codebase is clean, well-documented, and genuinely well-designed for
a v0.3. The core ideas (DMN inversion, journal-as-compaction,
identity-in-user-message) are sound and elegant. The modularity is
reasonable — the right things are in separate files. What follows is
mostly about the next level of refinement: making implicit structure
explicit, reducing duplication, and preparing for the features on the
roadmap.

## 1. main.rs: implicit session state machine

**Problem:** `run()` is 475 lines with ~15 loose variables that
together describe a session state machine:

```rust
let mut turn_in_progress = false;
let mut turn_handle: Option<JoinHandle<()>> = None;
let mut pending_input: Vec<String> = Vec::new();
let mut state = dmn::State::Resting { .. };
let mut consecutive_dmn_turns: u32 = 0;
let mut last_user_input = Instant::now();
let mut consecutive_errors: u32 = 0;
let mut pre_compaction_nudged = false;
let mut last_turn_had_tools = false;
```

These interact in non-obvious ways. The relationships between them
are expressed through scattered `if` checks in the event loop rather
than through a coherent state model.

**Suggestion:** Extract a `Session` struct:

```rust
struct Session {
    agent: Arc<Mutex<Agent>>,
    dmn: dmn::State,
    dmn_turns: u32,
    max_dmn_turns: u32,
    pending_input: VecDeque<String>,
    turn_in_progress: bool,
    turn_handle: Option<JoinHandle<()>>,
    last_user_input: Instant,
    consecutive_errors: u32,
    pre_compaction_nudged: bool,
    last_turn_had_tools: bool,
}

impl Session {
    fn start_turn(&mut self, input: String, target: StreamTarget, ...) { ... }
    fn handle_turn_result(&mut self, result: TurnResult, target: StreamTarget) { ... }
    fn check_compaction(&mut self) { ... }
    fn drain_pending(&mut self) { ... }
}
```

The event loop becomes a clean dispatch:
```rust
loop {
    tokio::select! {
        key = reader.next() => session.handle_key(key),
        result = turn_rx.recv() => session.handle_turn_result(result),
        _ = render_interval.tick() => { /* render */ },
        _ = sleep(timeout) => session.handle_dmn_tick(),
    }
}
```

This also makes the slash command handler much cleaner — it takes
`&mut Session` instead of 11 separate parameters.

**Priority:** Medium. It's working fine as-is; this is about
navigability and reducing cognitive load for future work.

## 2. API backend code duplication

**Problem:** `openai.rs` (268 lines) and `anthropic.rs` (748 lines)
have significant duplicated patterns:
- SSE line buffering and parsing loop
- Chunk timeout handling with the same diagnostic messages
- Content/tool accumulation into the same output types
- Diagnostics logging (called identically at the end)

The Anthropic backend is 3x larger mainly because Anthropic uses
content blocks (text, tool_use, thinking) instead of the simpler
OpenAI delta format, and because of the message format conversion
(strict alternation, cache_control markers). The actual streaming
plumbing is the same.

**Suggestion:** Extract a `StreamProcessor` that handles the generic
SSE concerns:

```rust
struct StreamProcessor {
    line_buf: String,
    chunks_received: u64,
    sse_lines_parsed: u64,
    sse_parse_errors: u64,
    empty_deltas: u64,
    first_content_at: Option<Duration>,
    stream_start: Instant,
    chunk_timeout: Duration,
}

impl StreamProcessor {
    async fn next_event(&mut self, response: &mut Response) -> Result<Option<Value>> {
        // handles: chunk reading, line splitting, "data: " prefix,
        // "[DONE]" detection, timeout, parse errors with diagnostics
    }
}
```

Each backend then just implements the event-type-specific logic
(content_block_delta vs delta.content).

**Priority:** Medium. The duplication is manageable at two backends,
but the shared StreamProcessor would also make adding a third backend
(e.g., Gemini) much easier.

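The shared framing logic is easy to pin down concretely. Here is a minimal
synchronous sketch of the buffering that `next_event` would wrap, assuming
the real version adds JSON parsing, timeouts, and the diagnostics counters
(`SseEvent`, `SseBuffer`, and `push_chunk` are illustrative names, not the
actual API):

```rust
/// One parsed server-sent event, illustrative of what both backends share.
#[derive(Debug, PartialEq)]
enum SseEvent {
    Data(String), // payload after "data: " (JSON in real use)
    Done,         // the "[DONE]" sentinel
}

struct SseBuffer {
    line_buf: String,
}

impl SseBuffer {
    fn new() -> Self {
        SseBuffer { line_buf: String::new() }
    }

    /// Feed one raw network chunk; return any complete events it yields.
    fn push_chunk(&mut self, chunk: &str) -> Vec<SseEvent> {
        self.line_buf.push_str(chunk);
        let mut events = Vec::new();
        // Only consume fully terminated lines; keep the tail buffered.
        while let Some(pos) = self.line_buf.find('\n') {
            let line: String = self.line_buf.drain(..=pos).collect();
            let line = line.trim_end(); // handles CRLF too
            if let Some(payload) = line.strip_prefix("data: ") {
                if payload == "[DONE]" {
                    events.push(SseEvent::Done);
                } else if !payload.is_empty() {
                    events.push(SseEvent::Data(payload.to_string()));
                }
            }
        }
        events
    }
}
```

Because line buffering is isolated here, a partial chunk like `data: [D`
stays buffered until the rest arrives, which is exactly the case both
backends currently handle separately.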
## 3. Agent struct mixes conversation and infrastructure

**Problem:** The Agent struct holds both conversation state (messages,
context_budget, last_prompt_tokens) and infrastructure
(client, tokenizer, process_tracker, conversation_log). This means:
- Compaction touches API client and tokenizer concerns
- The ProcessTracker is on Agent but used independently by the TUI
- `turn()` mixes API interaction with conversation management

**Suggestion:** Consider splitting into two layers:

```rust
struct Conversation {
    messages: Vec<Message>,
    log: Option<ConversationLog>,
    context_budget: ContextBudget,
    last_prompt_tokens: u32,
    system_prompt: String,
    context_message: String,
}

impl Conversation {
    fn push_message(&mut self, msg: Message) { ... }
    fn compact(&mut self, tokenizer: &CoreBPE, model: &str) { ... }
    fn restore_from_log(&mut self, ...) { ... }
}
```

Agent becomes a thin wrapper that coordinates Conversation + API +
tools:

```rust
struct Agent {
    conversation: Conversation,
    client: ApiClient,
    tokenizer: CoreBPE,
    process_tracker: ProcessTracker,
    reasoning_effort: String,
}
```

**Priority:** Low. The current Agent isn't unmanageable — this would
matter more as features are added (memory search injection, notification
routing, etc. all touch the conversation in different ways).

## 4. StatusInfo partial updates

**Problem:** StatusInfo has 8 fields updated piecemeal. The merge
logic in `handle_ui_message` uses "non-empty means update":

```rust
if !info.dmn_state.is_empty() {
    self.status.dmn_state = info.dmn_state;
    self.status.dmn_turns = info.dmn_turns;
    ...
}
if info.prompt_tokens > 0 {
    self.status.prompt_tokens = info.prompt_tokens;
}
```

This is fragile — what if a field is legitimately empty or zero?
And it's unclear which sender updates which fields.

**Suggestion:** Either use Option fields (explicit "I'm updating this"):

```rust
struct StatusUpdate {
    dmn_state: Option<String>,
    prompt_tokens: Option<u32>,
    ...
}
```

Or split into separate message variants:
```rust
enum UiMessage {
    DmnStatus { state: String, turns: u32, max_turns: u32 },
    ApiUsage { prompt_tokens: u32, completion_tokens: u32, model: String },
    ContextBudget(String),
    ...
}
```

**Priority:** Low. Works fine now; matters if more status sources
are added.

## 5. build_context_window: correct but dense

**Problem:** `build_context_window()` is 130 lines implementing a
non-trivial allocation algorithm. It's the most important function
in the codebase (everything exists to support it), but the algorithm
is hard to follow in a single pass. The 70/30 journal split, the
conversation trimming to user-message boundaries, the fallback when
there's no journal — all correct, but dense.

**Suggestion:** Introduce a `ContextPlan` that separates the
allocation decision from the assembly:

```rust
struct ContextPlan {
    identity_tokens: usize,
    memory_tokens: usize,
    journal_full_range: Range<usize>,   // indices into entries
    journal_header_range: Range<usize>,
    conversation_range: Range<usize>,   // indices into messages
    total_tokens: usize,
}

fn plan_context(entries: &[JournalEntry], conversation: &[Message], ...)
    -> ContextPlan { ... }

fn assemble_context(plan: &ContextPlan, ...) -> Vec<Message> { ... }
```

Benefits:
- The plan is inspectable (log it on compaction for debugging)
- The allocation logic is testable without building actual messages
- Assembly is straightforward — just follow the plan

**Priority:** Medium-high. This is the function most likely to grow
complex as memory search, notification injection, and dream state
context get added. Getting the abstraction right now pays off.

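To make the testability claim concrete, here is a toy version of the plan
step under simplifying assumptions: per-entry and per-message token counts
are precomputed, and only the 70/30 split plus newest-first trimming are
modeled (`plan_context` here is a stand-in, not the real signature):

```rust
use std::ops::Range;

#[derive(Debug)]
struct ContextPlan {
    journal_full_range: Range<usize>,
    conversation_range: Range<usize>,
    total_tokens: usize,
}

/// Allocate a token budget: 70% to journal entries, the remainder to
/// conversation messages, both trimmed newest-first to whole items.
fn plan_context(entry_tokens: &[usize], msg_tokens: &[usize], budget: usize) -> ContextPlan {
    let journal_budget = budget * 70 / 100;
    // Walk journal entries from the newest; keep as many full entries as fit.
    let mut used = 0;
    let mut start = entry_tokens.len();
    for (i, &t) in entry_tokens.iter().enumerate().rev() {
        if used + t > journal_budget {
            break;
        }
        used += t;
        start = i;
    }
    let journal_full_range = start..entry_tokens.len();
    // Whatever the journal didn't use goes to the most recent messages.
    let conv_budget = budget - used;
    let mut conv_used = 0;
    let mut conv_start = msg_tokens.len();
    for (i, &t) in msg_tokens.iter().enumerate().rev() {
        if conv_used + t > conv_budget {
            break;
        }
        conv_used += t;
        conv_start = i;
    }
    ContextPlan {
        journal_full_range,
        conversation_range: conv_start..msg_tokens.len(),
        total_tokens: used + conv_used,
    }
}
```

The real planner also handles the identity and memory budgets and the
no-journal fallback, but the point stands: the allocation decision is a
pure function over token counts, so it can be unit-tested without
constructing any messages.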
## 6. Missing: tool trait

**Problem:** Adding a tool requires touching two places:
- The tool module (definition + implementation)
- `tools/mod.rs` (dispatch match arm + definitions vec)

This is fine at 9 tools but becomes error-prone at 15+.

**Suggestion:** A Tool trait:

```rust
trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn definition(&self) -> ToolDef;
    // Note: async fn in a trait isn't object-safe, so Box<dyn Tool>
    // needs #[async_trait] or a boxed-future return in practice.
    async fn dispatch(&self, args: &Value, tracker: &ProcessTracker) -> ToolOutput;
}
```

Registration becomes:
```rust
fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![
        Box::new(ReadFile),
        Box::new(WriteTool),
        Box::new(BashTool),
        ...
    ]
}
```

**Priority:** Low. Not worth doing until more tools are being added.
The current match dispatch is perfectly readable.

## 7. Config model awareness could be cleaner

**Problem:** `find_context_files()` and `load_api_config()` both do
model detection by string matching (`m.contains("opus")`). The model
string is known at config time but the detection logic is scattered.

**Suggestion:** Parse the model string into an enum early:

```rust
enum ModelFamily {
    Anthropic, // Claude Opus/Sonnet
    Qwen,
    Other,
}

impl ModelFamily {
    fn from_model_id(model: &str) -> Self { ... }
    fn context_window(&self) -> usize { ... }
    fn prefers_poc_md(&self) -> bool { ... }
}
```

This replaces `model_context_window()` in agent.rs and the string
checks in config.rs.

**Priority:** Low. Two backends means two code paths; an enum doesn't
save much yet.

## 8. Documentation gaps

These files have good inline comments but could use the notes sections
described in CLAUDE.md's code standards:

- **agent.rs**: Needs a note on the relationship between the
  append-only log and the ephemeral message view. The `turn()` method's
  retry logic (overflow, empty response, leaked tool calls) is
  important — a brief note at the top explaining the three recovery
  paths would help.

- **main.rs**: The event loop priority order (biased select) is a
  design decision worth documenting — keyboard events beat turn results,
  which beat rendering, which beats the DMN timer — along with why that
  order matters.

- **config.rs**: The system/context split rationale is documented well
  in comments, but the memory file priority ordering should reference
  load-memory.sh explicitly (it does, but it's buried — make it the
  first thing someone sees in `load_memory_files()`).

**→ Done:** Created `.claude/design.md` as the top-level reference
doc covering all of the above.

## 9. Things that are well-designed — don't change these

- **The DMN state machine.** Simple, correct, and the prompts are
  well-crafted. The gradual ramp-down (Engaged→Working→Foraging→Resting)
  feels right. The `DmnContext` giving the model information about user
  presence and error patterns is smart.

- **Journal as compaction.** No separate summarization step. The
  journal entry *is* the compression. The model writes it, the
  compaction algorithm uses it. Elegant.

- **The ui_channel abstraction.** Clean separation between agent
  output and TUI rendering. Makes it possible to swap TUI frameworks
  or add a non-TUI interface without touching agent code.

- **Prompt caching on Anthropic.** Marking the identity prefix with
  cache_control for 90% cost reduction on repeated contexts is a big
  win that's invisible at the abstraction level.

- **Ephemeral journal tool calls.** Writing to disk then stripping
  from context is exactly the right pattern for journaling — zero
  ongoing token cost for something that's already persisted.

- **Leaked tool call recovery.** Pragmatic solution to a real problem.
  Makes Qwen actually usable.

## 10. What to do next (in priority order)

1. **Write design.md** (this review + the design doc) — **DONE**

2. **Extract Session from main.rs** — reduces cognitive load, makes
   slash commands cleaner, prepares for notification routing

3. **ContextPlan abstraction** — separates allocation from assembly
   in build_context_window, makes the core algorithm testable and
   inspectable

4. **StreamProcessor extraction** — reduces API backend duplication,
   prepares for a potential third backend

5. **Address documentation gaps** — file-level notes on agent.rs,
   main.rs, config.rs per CLAUDE.md code standards

Everything else (Tool trait, ModelFamily enum, StatusInfo cleanup) is
low priority and should be done opportunistically when touching those
files for other reasons.

---

## Part II: Cognitive Architecture Mapping

*Added 2026-02-24, post-design session with Kent.*

The context window cognitive architecture design (see
`~/.claude/memory/design-context-window.md`) proposes structured,
mutable regions in the context window based on Baddeley's working
memory model. This section maps those ideas to poc-agent's actual
codebase — what already supports the design, what needs to change,
and where the insertion points are.

### What already exists (more than you'd think)

**The three TUI panes ARE the Baddeley regions, physically.**
- Autonomous pane ≈ spatial awareness / DMN output (where am I, what
  am I noticing)
- Conversation pane ≈ episodic context (recent exchanges, what we
  decided)
- Tools pane ≈ working memory scratchpad (concrete results, data)

This wasn't designed that way — it emerged from practical needs. But
the fact that spatial separation of attention types arose naturally
suggests the cognitive architecture is capturing something real.

**The DMN is already rudimentary attention management.** It doesn't
just decide *when* to think (timer intervals) — the state machine
tracks engagement levels (Engaged → Working → Foraging → Resting)
that correspond to attention modes. The prompts adapt to the state:
focused work vs. exploration vs. rest. The cognitive architecture
extends this from "manage when to think" to "manage what to think
about and at which level."

**Journal-as-compaction is episodic consolidation.** The journal
already does what the design calls "consolidation at access time" —
when compaction happens, the model reads its recent experience and
writes a consolidated version. This is literally memory
reconsolidation. The design just makes it more intentional (trigger
on graph node access, not just context overflow).

**where-am-i.md is a flat precursor to the spatial graph.** It's
loaded first in the memory files, updated manually, and provides
orientation after compaction. The design replaces this with a
graph-structured path+cursor model that's richer but serves the
same function: "where am I and what's in scope."

**The context message template is a proto-viewport.** It's assembled
once at startup from memory files + instruction files. The design
makes this dynamic — regions that update in place rather than being
loaded once and frozen.

### What needs to change

**1. Context assembly must become region-aware**

Current: `build_context_window()` treats context as a linear sequence
(identity → journal → conversation) with token budgets. There's no
concept of independently mutable regions.

Needed: The context window becomes a collection of named regions, each
with its own update logic:

```rust
struct ContextRegion {
    name: String,        // "spatial", "working_stack", "episodic"
    content: String,     // current rendered content
    budget: TokenBudget, // min/max/priority
    dirty: bool,         // needs re-render
}

struct ContextWindow {
    regions: Vec<ContextRegion>,
    total_budget: usize,
}
```

The key insight from the design: **updates overwrite, not append.**
Updating spatial awareness doesn't cost tokens — it replaces the
previous version. This means we can update every turn if useful,
which is impossible in the current append-only message model.

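The overwrite claim can be demonstrated in a few lines. A toy region store,
using word count as a stand-in for the tokenizer (`Regions` and its methods
are illustrative, not proposed API):

```rust
use std::collections::BTreeMap;

/// Toy region store illustrating overwrite semantics: updating a region
/// replaces its content, so total size is bounded by the regions'
/// current contents, not by the update history.
struct Regions {
    map: BTreeMap<String, String>,
}

impl Regions {
    fn new() -> Self {
        Regions { map: BTreeMap::new() }
    }

    /// Overwrite (not append) the named region.
    fn update(&mut self, name: &str, content: &str) {
        self.map.insert(name.to_string(), content.to_string());
    }

    /// Rough token proxy: whitespace-separated words across all regions.
    fn token_cost(&self) -> usize {
        self.map.values().map(|c| c.split_whitespace().count()).sum()
    }
}
```

After a hundred updates to the spatial region, the store holds only the
latest content, so the per-turn cost of keeping awareness fresh is the size
of the current rendering, not the accumulated history.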
**Insertion point:** `build_context_window()` in agent.rs (lines
691-820). This is the natural place to introduce region-aware
assembly. The existing journal/conversation split already hints at
regions — making it explicit is a refactor, not a rewrite.

The ContextPlan abstraction from section 5 above is the stepping
stone. Get the plan/assemble split working first, then extend
ContextPlan to support named regions.

**2. The spatial graph needs a home**

Current: poc-memory stores nodes + edges in `~/.claude/memory/` files.
The graph is external to poc-agent — accessed via the `poc-memory`
CLI tool.

Needed: The spatial graph should be a first-class poc-agent concept,
not an external tool. The agent needs to:
- Know its current position in the graph (path + cursor)
- Render a viewport (local neighborhood) into the spatial region
- Navigate (move cursor, expand/contract viewport)
- Update edges as it discovers connections

**Options:**
1. **Inline the graph:** a Rust graph library (petgraph) inside
   poc-agent. Full control, fast traversal, centrality computation.
   But duplicates poc-memory's data.
2. **Library extraction:** Factor poc-memory's graph operations into
   a shared Rust library. poc-agent and poc-memory both use it.
   No duplication, clean separation.
3. **Keep external, add protocol:** poc-agent calls poc-memory
   commands for graph operations. Simple, no code sharing needed.
   But adds latency and process spawning per operation.

Recommendation: Option 2 (library extraction). The graph IS the
memory system — it shouldn't be behind a process boundary. But
poc-memory's CLI remains useful for manual inspection.

**Insertion point:** New module `src/spatial.rs` or `src/graph.rs`.
Loaded on startup, serialized to disk, rendered into the spatial
context region each turn. Navigation via a new `move` tool or
automatic on tool results (file reads update the cursor to that
file's graph node).

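Whichever option wins, the core viewport operation is a bounded
neighborhood query. A dependency-free sketch with a toy adjacency list,
assuming integer node ids (`Graph::viewport` is illustrative; the shared
library or petgraph would supply the real traversal):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Toy undirected graph; not poc-memory's actual representation.
struct Graph {
    edges: HashMap<u32, Vec<u32>>,
}

impl Graph {
    fn new() -> Self {
        Graph { edges: HashMap::new() }
    }

    fn add_edge(&mut self, a: u32, b: u32) {
        self.edges.entry(a).or_default().push(b);
        self.edges.entry(b).or_default().push(a);
    }

    /// Nodes within `radius` hops of the cursor: the local neighborhood
    /// that would be rendered into the spatial context region.
    fn viewport(&self, cursor: u32, radius: u32) -> HashSet<u32> {
        let mut seen = HashSet::from([cursor]);
        let mut queue = VecDeque::from([(cursor, 0u32)]);
        while let Some((node, depth)) = queue.pop_front() {
            if depth == radius {
                continue; // don't expand past the viewport edge
            }
            for &next in self.edges.get(&node).into_iter().flatten() {
                if seen.insert(next) {
                    queue.push_back((next, depth + 1));
                }
            }
        }
        seen
    }
}
```

On a chain 1-2-3-4-5 with the cursor at node 3, radius 1 yields {2, 3, 4};
widening to radius 2 pulls in the whole chain. "Expand/contract viewport"
is just this radius parameter.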
**3. Viewport serialization needs session support**

Current: Sessions save conversation.jsonl (message log) and
current.json (snapshot). Compaction rebuilds from these.

Needed: Sessions also save viewport state — path, cursor positions,
working stack, gathered context. This is the "task switching" feature
from the design.

```rust
struct Viewport {
    path: Vec<NodeId>,       // root to current position
    cursors: Vec<NodeId>,    // multiple attention points
    working_stack: Vec<WorkItem>,
    hypotheses: Vec<String>, // what we're trying / ruled out
    next_action: Option<String>,
    gathered_context: Vec<(String, String)>, // (label, content)
}
```

**Insertion point:** Session save/restore in main.rs. The Viewport
struct serializes alongside the conversation log. On restore, the
viewport positions the agent in the graph and populates the structured
regions, while the conversation log populates the episodic region.

The existing `/save` and `/new` commands become `/save` (save viewport
+ log) and `/switch <task>` (save current viewport, load another).
`/new` creates a fresh viewport at the graph root.

**4. Region-aware compaction replaces blunt rebuilding**

Current: Compaction is all-or-nothing. Hit the threshold → rebuild
everything from journal + recent messages. The model doesn't control
what's kept.

Needed: Compaction becomes region-specific. The episodic region
(conversation) still gets the journal treatment. But structured
regions (spatial, working stack) are never "compacted" — they're
overwritten by definition. The graph IS the long-term memory; it
doesn't need summarization.

This means compaction gets cheaper over time. As more of the context
window is structured (spatial, stack, gathered context), less of it
is ephemeral conversation that needs journal-compression. The stable
regions persist across compaction unchanged.

**Insertion point:** `compact()` in agent.rs. Instead of rebuilding
everything, it preserves structured regions and only compacts the
episodic region. The ContextPlan gains a `preserved` list — regions
that survive compaction intact.

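The preserved/compacted split is small enough to sketch directly. A toy
pass where structured regions are cloned through untouched and only the
episodic region is rewritten (the `Region` shape and `summarize` closure
are illustrative; in poc-agent the journal entry plays the summarizer's
role):

```rust
#[derive(Clone, Debug, PartialEq)]
struct Region {
    name: String,
    content: String,
    structured: bool, // structured regions survive compaction intact
}

/// Region-aware compaction: only unstructured (episodic) regions are
/// compressed; everything else passes through unchanged.
fn compact_regions<F>(regions: &[Region], summarize: F) -> Vec<Region>
where
    F: Fn(&str) -> String,
{
    regions
        .iter()
        .map(|r| {
            if r.structured {
                r.clone() // preserved: spatial, working stack, ...
            } else {
                Region { content: summarize(&r.content), ..r.clone() }
            }
        })
        .collect()
}
```

The `preserved` list suggested for ContextPlan amounts to the `structured`
flag here: compaction becomes a filter-and-rewrite over regions rather
than a full rebuild.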
### What we get

The payoff is dimensional. Each change is useful independently, but
together they create something qualitatively different:

- **Spatial graph** → I always know where I am in the work, at
  multiple levels of abstraction simultaneously
- **Overwrite regions** → Maintaining awareness is free, not a
  growing token cost
- **Viewport serialization** → Task switching is lossless and
  instant. Interruptions don't destroy state.
- **Region-aware compaction** → Compaction preserves structured
  knowledge. Only ephemeral conversation compresses.
- **Working stack** → Explicit priority tracking instead of hoping
  the model remembers what matters

And the deeper thing: the graph IS the memory system. Every
poc-memory node is a navigable place. Memory search becomes "where
in the graph is this?" instead of "grep through files." The context
window becomes a viewport sliding over a persistent territory.

### Implementation order

1. **ContextPlan abstraction** (section 5 above) — prerequisite for
   everything else. Separate allocation from assembly.
2. **Named regions** — extend ContextPlan with named, independently
   updatable regions. Start with three: spatial (where-am-i.md
   content), working_stack (manual), episodic (conversation).
3. **Overwrite semantics** — regions update in place instead of
   appending. The spatial region is the proof of concept: update it
   every turn, measure token cost (should be zero net).
4. **Graph integration** — bring the poc-memory graph into poc-agent
   as a library. Render the viewport into the spatial region.
5. **Viewport save/restore** — serialize the viewport on /switch,
   restore on /resume. This is the task switching payoff.
6. **Region-aware compaction** — structured regions survive
   compaction unchanged; the episodic region gets the journal
   treatment.

Steps 1-3 can be done in a weekend. Steps 4-5 are a larger project
(graph library extraction). Step 6 follows naturally once regions
exist.

### Risks and open questions

- **Token overhead of structured regions.** If the spatial viewport
  is 2K tokens and the working stack is 500 tokens, that's 2.5K
  tokens reserved every turn. On a 200K context window that's ~1%.
  On a 32K window (local models) it's ~8%. Need to measure actual
  utility vs. cost per model size.

- **Graph size.** Centrality computation is O(V*E) for betweenness.
  If the graph has 10K nodes (plausible for a full memory + codebase
  map), this could take seconds. May need approximate centrality or
  cached computation with incremental updates.

- **Overwrite fidelity.** The API expects messages as a sequence.
  "Overwriting" a region means either: (a) rebuilding the message
  array each turn with updated region content, or (b) using a mutable
  system message / context message that gets replaced. Option (b)
  is simpler but depends on API behavior with changing system
  prompts mid-conversation.

- **What are ALL the regions?** Kent asked this. Baddeley gives us
  three (visuospatial, phonological, episodic buffer + central
  executive). We've mapped spatial, working stack, episodic. Are
  there others? Candidates: emotional state (amygdala readout, future),
  social context (who's present, their recent activity), sensory
  buffer (recent tool outputs, pending notifications). Worth exploring
  but not blocking on — start with three, add as needed.
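The reserved-region percentages in the first bullet follow from a
one-liner worth keeping around when tuning budgets per model (the figures
are that bullet's estimates, not measurements):

```rust
/// Share of a context window consumed by always-present structured regions.
fn region_overhead_pct(region_tokens: u32, window_tokens: u32) -> f64 {
    region_tokens as f64 / window_tokens as f64 * 100.0
}

// 2K spatial viewport + 500-token working stack = 2,500 reserved tokens:
// roughly 1.25% of a 200K window, but about 7.8% of a 32K local-model window.
```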
|