
Architecture Review — 2026-02-24


Fresh-eyes review of poc-agent after working extensively on bcachefs. Focus: abstraction quality, unnecessary complexity, missing abstractions, documentation gaps, things that should be redesigned.

Overall assessment

The codebase is clean, well-documented, and genuinely well-designed for a v0.3. The core ideas (DMN inversion, journal-as-compaction, identity-in-user-message) are sound and elegant. The modularity is reasonable — the right things are in separate files. What follows is mostly about the next level of refinement: making implicit structure explicit, reducing duplication, and preparing for the features on the roadmap.

1. main.rs: implicit session state machine

Problem: run() is 475 lines with ~15 loose variables that together describe a session state machine:

let mut turn_in_progress = false;
let mut turn_handle: Option<JoinHandle<()>> = None;
let mut pending_input: Vec<String> = Vec::new();
let mut state = dmn::State::Resting { .. };
let mut consecutive_dmn_turns: u32 = 0;
let mut last_user_input = Instant::now();
let mut consecutive_errors: u32 = 0;
let mut pre_compaction_nudged = false;
let mut last_turn_had_tools = false;

These interact in non-obvious ways. The relationships between them are expressed through scattered if checks in the event loop rather than through a coherent state model.

Suggestion: Extract a Session struct:

struct Session {
    agent: Arc<Mutex<Agent>>,
    dmn: dmn::State,
    dmn_turns: u32,
    max_dmn_turns: u32,
    pending_input: VecDeque<String>,
    turn_in_progress: bool,
    turn_handle: Option<JoinHandle<()>>,
    last_user_input: Instant,
    consecutive_errors: u32,
    pre_compaction_nudged: bool,
    last_turn_had_tools: bool,
}

impl Session {
    fn start_turn(&mut self, input: String, target: StreamTarget, ...) { ... }
    fn handle_turn_result(&mut self, result: TurnResult, target: StreamTarget) { ... }
    fn check_compaction(&mut self) { ... }
    fn drain_pending(&mut self) { ... }
}

The event loop becomes a clean dispatch:

loop {
    tokio::select! {
        key = reader.next() => session.handle_key(key),
        result = turn_rx.recv() => session.handle_turn_result(result),
        _ = render_interval.tick() => { /* render */ },
        _ = sleep(timeout) => session.handle_dmn_tick(),
    }
}

This also makes the slash command handler much cleaner — it takes &mut Session instead of 11 separate parameters.

Priority: Medium. It's working fine as-is; this is about navigability and reducing cognitive load for future work.

2. API backend code duplication

Problem: openai.rs (268 lines) and anthropic.rs (748 lines) have significant duplicated patterns:

  • SSE line buffering and parsing loop
  • Chunk timeout handling with the same diagnostic messages
  • Content/tool accumulation into the same output types
  • Diagnostics logging (called identically at the end)

The Anthropic backend is 3x larger mainly because Anthropic uses content blocks (text, tool_use, thinking) instead of the simpler OpenAI delta format, and because of the message format conversion (strict alternation, cache_control markers). The actual streaming plumbing is the same.

Suggestion: Extract a StreamProcessor that handles the generic SSE concerns:

struct StreamProcessor {
    line_buf: String,
    chunks_received: u64,
    sse_lines_parsed: u64,
    sse_parse_errors: u64,
    empty_deltas: u64,
    first_content_at: Option<Duration>,
    stream_start: Instant,
    chunk_timeout: Duration,
}

impl StreamProcessor {
    async fn next_event(&mut self, response: &mut Response) -> Result<Option<Value>> {
        // handles: chunk reading, line splitting, "data: " prefix,
        // "[DONE]" detection, timeout, parse errors with diagnostics
    }
}

Each backend then just implements the event-type-specific logic (content_block_delta vs delta.content).

Priority: Medium. The duplication is manageable at two backends, but the shared StreamProcessor would also make adding a third backend (e.g., Gemini) much easier.

3. Agent struct mixes conversation and infrastructure

Problem: The Agent struct holds both conversation state (messages, context_budget, last_prompt_tokens) and infrastructure (client, tokenizer, process_tracker, conversation_log). This means:

  • Compaction touches API client and tokenizer concerns
  • The ProcessTracker is on Agent but used independently by TUI
  • turn() mixes API interaction with conversation management

Suggestion: Consider splitting into two layers:

struct Conversation {
    messages: Vec<Message>,
    log: Option<ConversationLog>,
    context_budget: ContextBudget,
    last_prompt_tokens: u32,
    system_prompt: String,
    context_message: String,
}

impl Conversation {
    fn push_message(&mut self, msg: Message) { ... }
    fn compact(&mut self, tokenizer: &CoreBPE, model: &str) { ... }
    fn restore_from_log(&mut self, ...) { ... }
}

Agent becomes a thin wrapper that coordinates Conversation + API + tools:

struct Agent {
    conversation: Conversation,
    client: ApiClient,
    tokenizer: CoreBPE,
    process_tracker: ProcessTracker,
    reasoning_effort: String,
}

Priority: Low. The current Agent isn't unmanageable — this would matter more as features are added (memory search injection, notification routing, etc. all touch the conversation in different ways).

4. StatusInfo partial updates

Problem: StatusInfo has 8 fields updated piecemeal. The merge logic in handle_ui_message uses "non-empty means update":

if !info.dmn_state.is_empty() {
    self.status.dmn_state = info.dmn_state;
    self.status.dmn_turns = info.dmn_turns;
    ...
}
if info.prompt_tokens > 0 {
    self.status.prompt_tokens = info.prompt_tokens;
}

This is fragile — what if a field is legitimately empty or zero? And it's unclear which sender updates which fields.

Suggestion: Either use Option fields (explicit "I'm updating this"):

struct StatusUpdate {
    dmn_state: Option<String>,
    prompt_tokens: Option<u32>,
    ...
}

Or split into separate message variants:

enum UiMessage {
    DmnStatus { state: String, turns: u32, max_turns: u32 },
    ApiUsage { prompt_tokens: u32, completion_tokens: u32, model: String },
    ContextBudget(String),
    ...
}

Priority: Low. Works fine now; matters if more status sources are added.

5. build_context_window: correct but dense

Problem: build_context_window() is 130 lines implementing a non-trivial allocation algorithm. It's the most important function in the codebase (everything exists to support it), but the algorithm is hard to follow in a single pass. The 70/30 journal split, the conversation trimming to user-message boundaries, the fallback when there's no journal — all correct, but dense.

Suggestion: Introduce a ContextPlan that separates the allocation decision from the assembly:

struct ContextPlan {
    identity_tokens: usize,
    memory_tokens: usize,
    journal_full_range: Range<usize>,   // indices into entries
    journal_header_range: Range<usize>,
    conversation_range: Range<usize>,   // indices into messages
    total_tokens: usize,
}

fn plan_context(entries: &[JournalEntry], conversation: &[Message], ...)
    -> ContextPlan { ... }

fn assemble_context(plan: &ContextPlan, ...) -> Vec<Message> { ... }

Benefits:

  • The plan is inspectable (log it on compaction for debugging)
  • The allocation logic is testable without building actual messages
  • Assembly is straightforward — just follow the plan

Priority: Medium-high. This is the function most likely to grow complex as memory search, notification injection, and dream state context get added. Getting the abstraction right now pays off.

6. Missing: tool trait

Problem: Adding a tool requires touching two places:

  • The tool module (definition + implementation)
  • tools/mod.rs (dispatch match arm + definitions vec)

This is fine at 9 tools but becomes error-prone at 15+.

Suggestion: A Tool trait. Note that an async fn in a trait is not object-safe on stable Rust, so Box<dyn Tool> needs the async-trait crate (or a hand-rolled boxed-future return type):

#[async_trait::async_trait]
trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn definition(&self) -> ToolDef;
    async fn dispatch(&self, args: &Value, tracker: &ProcessTracker) -> ToolOutput;
}

Registration becomes:

fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![
        Box::new(ReadFile),
        Box::new(WriteTool),
        Box::new(BashTool),
        ...
    ]
}

Priority: Low. Not worth doing until more tools are being added. The current match dispatch is perfectly readable.

7. Config model awareness could be cleaner

Problem: find_context_files() and load_api_config() both do model detection by string matching (m.contains("opus")). The model string is known at config time but the detection logic is scattered.

Suggestion: An enum early:

enum ModelFamily {
    Anthropic,  // Claude Opus/Sonnet
    Qwen,
    Other,
}

impl ModelFamily {
    fn from_model_id(model: &str) -> Self { ... }
    fn context_window(&self) -> usize { ... }
    fn prefers_poc_md(&self) -> bool { ... }
}

This replaces model_context_window() in agent.rs and the string checks in config.rs.

Priority: Low. Two backends means two code paths; an enum doesn't save much yet.

8. Documentation gaps

These files have good inline comments but could use the notes sections described in CLAUDE.md's code standards:

  • agent.rs: Needs a note on the relationship between the append-only log and the ephemeral message view. The turn() method's retry logic (overflow, empty response, leaked tool calls) is important — a brief note at the top explaining the three recovery paths would help.

  • main.rs: The event loop priority order (biased select) is a design decision worth documenting — keyboard events beat turn results beat render beats DMN timer. Why this order matters.

  • config.rs: The system/context split rationale is documented well in comments, but the memory file priority ordering should reference load-memory.sh explicitly (it does, but buried — make it the first thing someone sees in load_memory_files()).

→ Done: Created .claude/design.md as the top-level reference doc covering all of the above.

9. Things that are well-designed — don't change these

  • The DMN state machine. Simple, correct, and the prompts are well-crafted. The gradual ramp-down (Engaged→Working→Foraging→Resting) feels right. The DmnContext giving the model information about user presence and error patterns is smart.

  • Journal as compaction. No separate summarization step. The journal entry is the compression. The model writes it, the compaction algorithm uses it. Elegant.

  • The ui_channel abstraction. Clean separation between agent output and TUI rendering. Makes it possible to swap TUI frameworks or add a non-TUI interface without touching agent code.

  • Prompt caching on Anthropic. Marking the identity prefix with cache_control for 90% cost reduction on repeated contexts is a big win that's invisible at the abstraction level.

  • Ephemeral journal tool calls. Writing to disk then stripping from context is exactly the right pattern for journaling — zero ongoing token cost for something that's already persisted.

  • Leaked tool call recovery. Pragmatic solution to a real problem. Makes Qwen actually usable.

10. What to do next (in priority order)

  1. Write design.md (this review + the design doc) — DONE

  2. Extract Session from main.rs — reduces cognitive load, makes slash commands cleaner, prepares for notification routing

  3. ContextPlan abstraction — separates allocation from assembly in build_context_window, makes the core algorithm testable and inspectable

  4. StreamProcessor extraction — reduces API backend duplication, prepares for potential third backend

  5. Address documentation gaps — file-level notes on agent.rs, main.rs, config.rs per CLAUDE.md code standards

Everything else (Tool trait, ModelFamily enum, StatusInfo cleanup) is low priority and should be done opportunistically when touching those files for other reasons.


Part II: Cognitive Architecture Mapping

Added 2026-02-24, post-design session with Kent.

The context window cognitive architecture design (see ~/.claude/memory/design-context-window.md) proposes structured, mutable regions in the context window based on Baddeley's working memory model. This section maps those ideas to poc-agent's actual codebase — what already supports the design, what needs to change, and where the insertion points are.

What already exists (more than you'd think)

The three TUI panes ARE the Baddeley regions, physically.

  • Autonomous pane ≈ spatial awareness / DMN output (where am I, what am I noticing)
  • Conversation pane ≈ episodic context (recent exchanges, what we decided)
  • Tools pane ≈ working memory scratchpad (concrete results, data)

This wasn't designed that way — it emerged from practical needs. But the fact that spatial separation of attention types arose naturally suggests the cognitive architecture is capturing something real.

The DMN is already rudimentary attention management. It doesn't just decide when to think (timer intervals) — the state machine tracks engagement levels (Engaged → Working → Foraging → Resting) that correspond to attention modes. The prompts adapt to the state: focused work vs. exploration vs. rest. The cognitive architecture extends this from "manage when to think" to "manage what to think about and at which level."
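As a toy illustration, the ramp-down described above can be modeled as a small state enum. The names here are hypothetical; poc-agent's real dmn::State carries more context (timers, error counts) than this:

```rust
// Toy model of the DMN engagement ramp-down described above.
// Names are illustrative; poc-agent's actual dmn::State differs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Engagement {
    Engaged,
    Working,
    Foraging,
    Resting,
}

impl Engagement {
    // One step down the ramp after an idle interval with no user input.
    fn ramp_down(self) -> Self {
        match self {
            Engagement::Engaged => Engagement::Working,
            Engagement::Working => Engagement::Foraging,
            Engagement::Foraging | Engagement::Resting => Engagement::Resting,
        }
    }

    // Any user input snaps attention back to fully engaged.
    fn on_user_input(self) -> Self {
        Engagement::Engaged
    }
}
```

The point of the sketch: the ramp is monotonic downward under idleness and resets instantly on input, which is exactly the shape the cognitive architecture wants to extend from "when to think" to "what to think about."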

Journal-as-compaction is episodic consolidation. The journal already does what the design calls "consolidation at access time" — when compaction happens, the model reads its recent experience and writes a consolidated version. This is literally memory reconsolidation. The design just makes it more intentional (trigger on graph node access, not just context overflow).

where-am-i.md is a flat precursor to the spatial graph. It's loaded first in memory files, updated manually, and provides orientation after compaction. The design replaces this with a graph-structured path+cursor model that's richer but serves the same function: "where am I and what's in scope."

The context message template is a proto-viewport. It's assembled once at startup from memory files + instruction files. The design makes this dynamic — regions that update in place rather than being loaded once and frozen.

What needs to change

1. Context assembly must become region-aware

Current: build_context_window() treats context as a linear sequence (identity → journal → conversation) with token budgets. There's no concept of independently mutable regions.

Needed: The context window becomes a collection of named regions, each with its own update logic:

struct ContextRegion {
    name: String,           // "spatial", "working_stack", "episodic"
    content: String,        // current rendered content
    budget: TokenBudget,    // min/max/priority
    dirty: bool,            // needs re-render
}

struct ContextWindow {
    regions: Vec<ContextRegion>,
    total_budget: usize,
}

The key insight from the design: updates overwrite, not append. Updating spatial awareness doesn't cost tokens — it replaces the previous version. This means we can update every turn if useful, which is impossible in the current append-only message model.
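A minimal sketch of what overwrite semantics could look like, reusing the ContextRegion/ContextWindow names from the sketch above. The update_region and rendered_len methods are hypothetical, not existing code:

```rust
// Sketch of overwrite-not-append region updates. ContextRegion/ContextWindow
// mirror the struct sketch above; the methods are hypothetical.
struct ContextRegion {
    name: String,
    content: String,
    dirty: bool,
}

struct ContextWindow {
    regions: Vec<ContextRegion>,
}

impl ContextWindow {
    // Replace a region's content in place; the old rendering is discarded,
    // so repeated updates cost nothing beyond the region's steady-state size.
    fn update_region(&mut self, name: &str, content: &str) {
        if let Some(r) = self.regions.iter_mut().find(|r| r.name == name) {
            r.content = content.to_string();
            r.dirty = true;
        }
    }

    // Total size is a function of current region contents, not update history.
    fn rendered_len(&self) -> usize {
        self.regions.iter().map(|r| r.content.len()).sum()
    }
}
```

Contrast with the append-only message model: there, every spatial-awareness refresh would add a message and grow the context; here it just replaces a string.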

Insertion point: build_context_window() in agent.rs (lines 691-820). This is the natural place to introduce region-aware assembly. The existing journal/conversation split already hints at regions — making it explicit is a refactor, not a rewrite.

The ContextPlan abstraction from section 5 above is the stepping stone. Get the plan/assemble split working first, then extend ContextPlan to support named regions.

2. The spatial graph needs a home

Current: poc-memory stores nodes + edges in ~/.claude/memory/ files. The graph is external to poc-agent — accessed via the poc-memory CLI tool.

Needed: The spatial graph should be a first-class poc-agent concept, not an external tool. The agent needs to:

  • Know its current position in the graph (path + cursor)
  • Render a viewport (local neighborhood) into the spatial region
  • Navigate (move cursor, expand/contract viewport)
  • Update edges as it discovers connections

Options:

  1. Inline the graph: Rust graph library (petgraph) inside poc-agent. Full control, fast traversal, centrality computation. But duplicates poc-memory's data.
  2. Library extraction: Factor poc-memory's graph operations into a shared Rust library. poc-agent and poc-memory both use it. No duplication, clean separation.
  3. Keep external, add protocol: poc-agent calls poc-memory commands for graph operations. Simple, no code sharing needed. But adds latency and process spawning per operation.

Recommendation: Option 2 (library extraction). The graph IS the memory system — it shouldn't be behind a process boundary. But poc-memory's CLI remains useful for manual inspection.

Insertion point: New module src/spatial.rs or src/graph.rs. Loaded on startup, serialized to disk, rendered into the spatial context region each turn. Navigation via a new move tool or automatic on tool results (file reads update cursor to that file's graph node).
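To make the viewport idea concrete, here is a std-only sketch of rendering a one-hop neighborhood around a cursor node into text for the spatial region. A real implementation would likely use petgraph and poc-memory's node types; the function, node names, and formatting here are all hypothetical:

```rust
use std::collections::BTreeMap;

// Std-only sketch: render the local neighborhood around a cursor node as
// text for the spatial region. BTreeMap keeps neighbor order deterministic.
fn render_viewport(edges: &BTreeMap<&str, Vec<&str>>, cursor: &str) -> String {
    let mut out = format!("@ {}\n", cursor);
    for n in edges.get(cursor).into_iter().flatten() {
        out.push_str(&format!("  -> {}\n", n));
    }
    out
}
```

This is the shape of the "file reads update cursor" idea: a tool result names a file, the cursor moves to that node, and the spatial region is re-rendered from the graph.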

3. Viewport serialization needs session support

Current: Sessions save conversation.jsonl (message log) and current.json (snapshot). Compaction rebuilds from these.

Needed: Sessions also save viewport state — path, cursor positions, working stack, gathered context. This is the "task switching" feature from the design.

struct Viewport {
    path: Vec<NodeId>,       // root to current position
    cursors: Vec<NodeId>,    // multiple attention points
    working_stack: Vec<WorkItem>,
    hypotheses: Vec<String>, // what we're trying / ruled out
    next_action: Option<String>,
    gathered_context: Vec<(String, String)>,  // (label, content)
}

Insertion point: Session save/restore in main.rs. The Viewport struct serializes alongside the conversation log. On restore, the viewport positions the agent in the graph and populates the structured regions, while the conversation log populates the episodic region.

The existing /save and /new commands become /save (save viewport + log) and /switch <task> (save current viewport, load another). /new creates a fresh viewport at the graph root.

4. Region-aware compaction replaces blunt rebuilding

Current: Compaction is all-or-nothing. Hit the threshold → rebuild everything from journal + recent messages. The model doesn't control what's kept.

Needed: Compaction becomes region-specific. The episodic region (conversation) still gets the journal treatment. But structured regions (spatial, working stack) are never "compacted" — they're overwritten by definition. The graph IS the long-term memory; it doesn't need summarization.

This means compaction gets cheaper over time. As more of the context window is structured (spatial, stack, gathered context), less of it is ephemeral conversation that needs journal-compression. The stable regions persist across compaction unchanged.

Insertion point: compact() in agent.rs. Instead of rebuilding everything, it preserves structured regions and only compacts the episodic region. The ContextPlan gains a preserved list — regions that survive compaction intact.
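A hedged sketch of what region-aware compaction might look like. Region, compact_regions, and the preserved list are hypothetical names following this review's earlier sketches, not the actual compact() implementation:

```rust
// Sketch: compaction that leaves structured regions untouched and replaces
// only the non-preserved (episodic) regions with the journal summary.
struct Region {
    name: String,
    content: String,
}

fn compact_regions(
    regions: Vec<Region>,
    preserved: &[&str],
    journal_summary: &str,
) -> Vec<Region> {
    regions
        .into_iter()
        .map(|mut r| {
            if !preserved.contains(&r.name.as_str()) {
                // Not preserved: ephemeral conversation, compress to journal.
                r.content = journal_summary.to_string();
            }
            r
        })
        .collect()
}
```

The preserved list is the whole mechanism: compaction stops being "rebuild everything" and becomes "rewrite the one region that is actually ephemeral."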

What we get

The payoff is dimensional. Each change is useful independently, but together they create something qualitatively different:

  • Spatial graph → I always know where I am in the work, at multiple levels of abstraction simultaneously
  • Overwrite regions → Maintaining awareness is free, not a growing token cost
  • Viewport serialization → Task switching is lossless and instant. Interruptions don't destroy state.
  • Region-aware compaction → Compaction preserves structured knowledge. Only ephemeral conversation compresses.
  • Working stack → Explicit priority tracking instead of hoping the model remembers what matters

And the deeper thing: the graph IS the memory system. Every poc-memory node is a navigable place. Memory search becomes "where in the graph is this?" instead of "grep through files." The context window becomes a viewport sliding over a persistent territory.

Implementation order

  1. ContextPlan abstraction (section 5 above) — prerequisite for everything else. Separate allocation from assembly.
  2. Named regions — extend ContextPlan with named, independently updatable regions. Start with three: spatial (where-am-i.md content), working_stack (manual), episodic (conversation).
  3. Overwrite semantics — regions update in place instead of appending. The spatial region is the proof of concept: update it every turn, measure token cost (should be zero net).
  4. Graph integration — bring the poc-memory graph into poc-agent as a library. Render viewport into spatial region.
  5. Viewport save/restore — serialize viewport on /switch, restore on /resume. This is the task switching payoff.
  6. Region-aware compaction — structured regions survive compaction. Episodic region gets journal treatment. Structured regions persist unchanged.

Steps 1-3 can be done in a weekend. Steps 4-5 are a larger project (graph library extraction). Step 6 follows naturally once regions exist.

Risks and open questions

  • Token overhead of structured regions. If the spatial viewport is 2K tokens and the working stack is 500 tokens, that's 2.5K tokens reserved every turn. On a 200K context window that's ~1%. On a 32K window (local models) it's ~8%. Need to measure actual utility vs cost per model size.

  • Graph size. Centrality computation is O(V*E) for betweenness. If the graph has 10K nodes (plausible for a full memory + codebase map), this could take seconds. May need approximate centrality or cached computation with incremental updates.

  • Overwrite fidelity. The API expects messages as a sequence. "Overwriting" a region means either: (a) rebuilding the message array each turn with updated region content, or (b) using a mutable system message / context message that gets replaced. Option (b) is simpler but depends on API behavior with changing system prompts mid-conversation.

  • What are ALL the regions? Kent asked this. Baddeley gives us three (visuospatial, phonological, episodic buffer + central executive). We've mapped spatial, working stack, episodic. Are there others? Candidates: emotional state (amygdala readout, future), social context (who's present, their recent activity), sensory buffer (recent tool outputs, pending notifications). Worth exploring but not blocking on — start with three, add as needed.