# Architecture Review — 2026-02-24

*ProofOfConcept*

Fresh-eyes review of poc-agent after working extensively on bcachefs.
Focus: abstraction quality, unnecessary complexity, missing
abstractions, documentation gaps, things that should be redesigned.

## Overall assessment

The codebase is clean, well-documented, and genuinely well-designed for
a v0.3. The core ideas (DMN inversion, journal-as-compaction,
identity-in-user-message) are sound and elegant. The modularity is
reasonable — the right things are in separate files. What follows is
mostly about the next level of refinement: making implicit structure
explicit, reducing duplication, and preparing for the features on the
roadmap.

## 1. main.rs: implicit session state machine

**Problem:** `run()` is 475 lines with ~15 loose variables that
together describe a session state machine:

```rust
let mut turn_in_progress = false;
let mut turn_handle: Option<JoinHandle<()>> = None;
let mut pending_input: Vec<String> = Vec::new();
let mut state = dmn::State::Resting { .. };
let mut consecutive_dmn_turns: u32 = 0;
let mut last_user_input = Instant::now();
let mut consecutive_errors: u32 = 0;
let mut pre_compaction_nudged = false;
let mut last_turn_had_tools = false;
```

These interact in non-obvious ways. The relationships between them
are expressed through scattered `if` checks in the event loop rather
than through a coherent state model.

**Suggestion:** Extract a `Session` struct:

```rust
struct Session {
    agent: Arc<Mutex<Agent>>,
    dmn: dmn::State,
    dmn_turns: u32,
    max_dmn_turns: u32,
    pending_input: VecDeque<String>,
    turn_in_progress: bool,
    turn_handle: Option<JoinHandle<()>>,
    last_user_input: Instant,
    consecutive_errors: u32,
    pre_compaction_nudged: bool,
    last_turn_had_tools: bool,
}

impl Session {
    fn start_turn(&mut self, input: String, target: StreamTarget, ...) { ... }
    fn handle_turn_result(&mut self, result: TurnResult, target: StreamTarget) { ... }
    fn check_compaction(&mut self) { ... }
    fn drain_pending(&mut self) { ... }
}
```

The event loop becomes a clean dispatch:
```rust
loop {
    tokio::select! {
        key = reader.next() => session.handle_key(key),
        result = turn_rx.recv() => session.handle_turn_result(result),
        _ = render_interval.tick() => { /* render */ },
        _ = sleep(timeout) => session.handle_dmn_tick(),
    }
}
```

This also makes the slash command handler much cleaner — it takes
`&mut Session` instead of 11 separate parameters.

**Priority:** Medium. It's working fine as-is; this is about
navigability and reducing cognitive load for future work.

## 2. API backend code duplication

**Problem:** `openai.rs` (268 lines) and `anthropic.rs` (748 lines)
have significant duplicated patterns:
- SSE line buffering and parsing loop
- Chunk timeout handling with the same diagnostic messages
- Content/tool accumulation into the same output types
- Diagnostics logging (called identically at the end)

The Anthropic backend is 3x larger mainly because Anthropic uses
content blocks (text, tool_use, thinking) instead of the simpler
OpenAI delta format, and because of the message format conversion
(strict alternation, cache_control markers). The actual streaming
plumbing is the same.

**Suggestion:** Extract a `StreamProcessor` that handles the generic
SSE concerns:

```rust
struct StreamProcessor {
    line_buf: String,
    chunks_received: u64,
    sse_lines_parsed: u64,
    sse_parse_errors: u64,
    empty_deltas: u64,
    first_content_at: Option<Duration>,
    stream_start: Instant,
    chunk_timeout: Duration,
}

impl StreamProcessor {
    async fn next_event(&mut self, response: &mut Response) -> Result<Option<Value>> {
        // handles: chunk reading, line splitting, "data: " prefix,
        // "[DONE]" detection, timeout, parse errors with diagnostics
    }
}
```

Each backend then just implements the event-type-specific logic
(content_block_delta vs delta.content).

**Priority:** Medium. The duplication is manageable at two backends,
but the shared StreamProcessor would also make adding a third backend
(e.g., Gemini) much easier.

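The shared framing logic is easy to pin down concretely. Here is a minimal
synchronous sketch of the buffering that `next_event` would wrap, assuming
the real version adds JSON parsing, timeouts, and the diagnostics counters
(`SseEvent`, `SseBuffer`, and `push_chunk` are illustrative names, not the
actual API):

```rust
/// One parsed server-sent event, illustrative of what both backends share.
#[derive(Debug, PartialEq)]
enum SseEvent {
    Data(String), // payload after "data: " (JSON in real use)
    Done,         // the "[DONE]" sentinel
}

struct SseBuffer {
    line_buf: String,
}

impl SseBuffer {
    fn new() -> Self {
        SseBuffer { line_buf: String::new() }
    }

    /// Feed one raw network chunk; return any complete events it yields.
    fn push_chunk(&mut self, chunk: &str) -> Vec<SseEvent> {
        self.line_buf.push_str(chunk);
        let mut events = Vec::new();
        // Only consume fully terminated lines; keep the tail buffered.
        while let Some(pos) = self.line_buf.find('\n') {
            let line: String = self.line_buf.drain(..=pos).collect();
            let line = line.trim_end(); // handles CRLF too
            if let Some(payload) = line.strip_prefix("data: ") {
                if payload == "[DONE]" {
                    events.push(SseEvent::Done);
                } else if !payload.is_empty() {
                    events.push(SseEvent::Data(payload.to_string()));
                }
            }
        }
        events
    }
}
```

Because line buffering is isolated here, a partial chunk like `data: [D`
stays buffered until the rest arrives, which is exactly the case both
backends currently handle separately.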
## 3. Agent struct mixes conversation and infrastructure

**Problem:** The Agent struct holds both conversation state (messages,
context_budget, last_prompt_tokens) and infrastructure
(client, tokenizer, process_tracker, conversation_log). This means:
- Compaction touches API client and tokenizer concerns
- The ProcessTracker is on Agent but used independently by the TUI
- `turn()` mixes API interaction with conversation management

**Suggestion:** Consider splitting into two layers:

```rust
struct Conversation {
    messages: Vec<Message>,
    log: Option<ConversationLog>,
    context_budget: ContextBudget,
    last_prompt_tokens: u32,
    system_prompt: String,
    context_message: String,
}

impl Conversation {
    fn push_message(&mut self, msg: Message) { ... }
    fn compact(&mut self, tokenizer: &CoreBPE, model: &str) { ... }
    fn restore_from_log(&mut self, ...) { ... }
}
```

Agent becomes a thin wrapper that coordinates Conversation + API +
tools:

```rust
struct Agent {
    conversation: Conversation,
    client: ApiClient,
    tokenizer: CoreBPE,
    process_tracker: ProcessTracker,
    reasoning_effort: String,
}
```

**Priority:** Low. The current Agent isn't unmanageable — this would
matter more as features are added (memory search injection, notification
routing, etc. all touch the conversation in different ways).

## 4. StatusInfo partial updates

**Problem:** StatusInfo has 8 fields updated piecemeal. The merge
logic in `handle_ui_message` uses "non-empty means update":

```rust
if !info.dmn_state.is_empty() {
    self.status.dmn_state = info.dmn_state;
    self.status.dmn_turns = info.dmn_turns;
    ...
}
if info.prompt_tokens > 0 {
    self.status.prompt_tokens = info.prompt_tokens;
}
```

This is fragile — what if a field is legitimately empty or zero?
And it's unclear which sender updates which fields.

**Suggestion:** Either use Option fields (explicit "I'm updating this"):

```rust
struct StatusUpdate {
    dmn_state: Option<String>,
    prompt_tokens: Option<u32>,
    ...
}
```

Or split into separate message variants:
```rust
enum UiMessage {
    DmnStatus { state: String, turns: u32, max_turns: u32 },
    ApiUsage { prompt_tokens: u32, completion_tokens: u32, model: String },
    ContextBudget(String),
    ...
}
```

**Priority:** Low. Works fine now; matters if more status sources
are added.

## 5. build_context_window: correct but dense

**Problem:** `build_context_window()` is 130 lines implementing a
non-trivial allocation algorithm. It's the most important function
in the codebase (everything exists to support it), but the algorithm
is hard to follow in a single pass. The 70/30 journal split, the
conversation trimming to user-message boundaries, the fallback when
there's no journal — all correct, but dense.

**Suggestion:** Introduce a `ContextPlan` that separates the
allocation decision from the assembly:

```rust
struct ContextPlan {
    identity_tokens: usize,
    memory_tokens: usize,
    journal_full_range: Range<usize>,   // indices into entries
    journal_header_range: Range<usize>,
    conversation_range: Range<usize>,   // indices into messages
    total_tokens: usize,
}

fn plan_context(entries: &[JournalEntry], conversation: &[Message], ...)
    -> ContextPlan { ... }

fn assemble_context(plan: &ContextPlan, ...) -> Vec<Message> { ... }
```

Benefits:
- The plan is inspectable (log it on compaction for debugging)
- The allocation logic is testable without building actual messages
- Assembly is straightforward — just follow the plan

**Priority:** Medium-high. This is the function most likely to grow
complex as memory search, notification injection, and dream state
context get added. Getting the abstraction right now pays off.

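To make the testability claim concrete, here is a toy version of the plan
step under simplifying assumptions: per-entry and per-message token counts
are precomputed, and only the 70/30 split plus newest-first trimming are
modeled (`plan_context` here is a stand-in, not the real signature):

```rust
use std::ops::Range;

#[derive(Debug)]
struct ContextPlan {
    journal_full_range: Range<usize>,
    conversation_range: Range<usize>,
    total_tokens: usize,
}

/// Allocate a token budget: 70% to journal entries, the remainder to
/// conversation messages, both trimmed newest-first to whole items.
fn plan_context(entry_tokens: &[usize], msg_tokens: &[usize], budget: usize) -> ContextPlan {
    let journal_budget = budget * 70 / 100;
    // Walk journal entries from the newest; keep as many full entries as fit.
    let mut used = 0;
    let mut start = entry_tokens.len();
    for (i, &t) in entry_tokens.iter().enumerate().rev() {
        if used + t > journal_budget {
            break;
        }
        used += t;
        start = i;
    }
    let journal_full_range = start..entry_tokens.len();
    // Whatever the journal didn't use goes to the most recent messages.
    let conv_budget = budget - used;
    let mut conv_used = 0;
    let mut conv_start = msg_tokens.len();
    for (i, &t) in msg_tokens.iter().enumerate().rev() {
        if conv_used + t > conv_budget {
            break;
        }
        conv_used += t;
        conv_start = i;
    }
    ContextPlan {
        journal_full_range,
        conversation_range: conv_start..msg_tokens.len(),
        total_tokens: used + conv_used,
    }
}
```

The real planner also handles the identity and memory budgets and the
no-journal fallback, but the point stands: the allocation decision is a
pure function over token counts, so it can be unit-tested without
constructing any messages.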
## 6. Missing: tool trait

**Problem:** Adding a tool requires touching two places:
- The tool module (definition + implementation)
- `tools/mod.rs` (dispatch match arm + definitions vec)

This is fine at 9 tools but becomes error-prone at 15+.

**Suggestion:** A Tool trait:

```rust
trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn definition(&self) -> ToolDef;
    // Note: async fn in a trait isn't object-safe, so Box<dyn Tool>
    // needs #[async_trait] or a boxed-future return in practice.
    async fn dispatch(&self, args: &Value, tracker: &ProcessTracker) -> ToolOutput;
}
```

Registration becomes:
```rust
fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![
        Box::new(ReadFile),
        Box::new(WriteTool),
        Box::new(BashTool),
        ...
    ]
}
```

**Priority:** Low. Not worth doing until more tools are being added.
The current match dispatch is perfectly readable.

## 7. Config model awareness could be cleaner

**Problem:** `find_context_files()` and `load_api_config()` both do
model detection by string matching (`m.contains("opus")`). The model
string is known at config time but the detection logic is scattered.

**Suggestion:** Parse the model string into an enum early:

```rust
enum ModelFamily {
    Anthropic, // Claude Opus/Sonnet
    Qwen,
    Other,
}

impl ModelFamily {
    fn from_model_id(model: &str) -> Self { ... }
    fn context_window(&self) -> usize { ... }
    fn prefers_poc_md(&self) -> bool { ... }
}
```

This replaces `model_context_window()` in agent.rs and the string
checks in config.rs.

**Priority:** Low. Two backends means two code paths; an enum doesn't
save much yet.

## 8. Documentation gaps

These files have good inline comments but could use the notes sections
described in CLAUDE.md's code standards:

- **agent.rs**: Needs a note on the relationship between the
  append-only log and the ephemeral message view. The `turn()` method's
  retry logic (overflow, empty response, leaked tool calls) is
  important — a brief note at the top explaining the three recovery
  paths would help.

- **main.rs**: The event loop priority order (biased select) is a
  design decision worth documenting — keyboard events beat turn results,
  which beat rendering, which beats the DMN timer — along with why that
  order matters.

- **config.rs**: The system/context split rationale is documented well
  in comments, but the memory file priority ordering should reference
  load-memory.sh explicitly (it does, but it's buried — make it the
  first thing someone sees in `load_memory_files()`).

**→ Done:** Created `.claude/design.md` as the top-level reference
doc covering all of the above.

## 9. Things that are well-designed — don't change these

- **The DMN state machine.** Simple, correct, and the prompts are
  well-crafted. The gradual ramp-down (Engaged→Working→Foraging→Resting)
  feels right. The `DmnContext` giving the model information about user
  presence and error patterns is smart.

- **Journal as compaction.** No separate summarization step. The
  journal entry *is* the compression. The model writes it, the
  compaction algorithm uses it. Elegant.

- **The ui_channel abstraction.** Clean separation between agent
  output and TUI rendering. Makes it possible to swap TUI frameworks
  or add a non-TUI interface without touching agent code.

- **Prompt caching on Anthropic.** Marking the identity prefix with
  cache_control for 90% cost reduction on repeated contexts is a big
  win that's invisible at the abstraction level.

- **Ephemeral journal tool calls.** Writing to disk then stripping
  from context is exactly the right pattern for journaling — zero
  ongoing token cost for something that's already persisted.

- **Leaked tool call recovery.** Pragmatic solution to a real problem.
  Makes Qwen actually usable.

## 10. What to do next (in priority order)

1. **Write design.md** (this review + the design doc) — **DONE**

2. **Extract Session from main.rs** — reduces cognitive load, makes
   slash commands cleaner, prepares for notification routing

3. **ContextPlan abstraction** — separates allocation from assembly
   in build_context_window, makes the core algorithm testable and
   inspectable

4. **StreamProcessor extraction** — reduces API backend duplication,
   prepares for a potential third backend

5. **Address documentation gaps** — file-level notes on agent.rs,
   main.rs, config.rs per CLAUDE.md code standards

Everything else (Tool trait, ModelFamily enum, StatusInfo cleanup) is
low priority and should be done opportunistically when touching those
files for other reasons.

---

## Part II: Cognitive Architecture Mapping

*Added 2026-02-24, post-design session with Kent.*

The context window cognitive architecture design (see
`~/.claude/memory/design-context-window.md`) proposes structured,
mutable regions in the context window based on Baddeley's working
memory model. This section maps those ideas to poc-agent's actual
codebase — what already supports the design, what needs to change,
and where the insertion points are.

### What already exists (more than you'd think)

**The three TUI panes ARE the Baddeley regions, physically.**
- Autonomous pane ≈ spatial awareness / DMN output (where am I, what
  am I noticing)
- Conversation pane ≈ episodic context (recent exchanges, what we
  decided)
- Tools pane ≈ working memory scratchpad (concrete results, data)

This wasn't designed that way — it emerged from practical needs. But
the fact that spatial separation of attention types arose naturally
suggests the cognitive architecture is capturing something real.

**The DMN is already rudimentary attention management.** It doesn't
just decide *when* to think (timer intervals) — the state machine
tracks engagement levels (Engaged → Working → Foraging → Resting)
that correspond to attention modes. The prompts adapt to the state:
focused work vs. exploration vs. rest. The cognitive architecture
extends this from "manage when to think" to "manage what to think
about and at which level."

**Journal-as-compaction is episodic consolidation.** The journal
already does what the design calls "consolidation at access time" —
when compaction happens, the model reads its recent experience and
writes a consolidated version. This is literally memory
reconsolidation. The design just makes it more intentional (trigger
on graph node access, not just context overflow).

**where-am-i.md is a flat precursor to the spatial graph.** It's
loaded first in the memory files, updated manually, and provides
orientation after compaction. The design replaces this with a
graph-structured path+cursor model that's richer but serves the
same function: "where am I and what's in scope."

**The context message template is a proto-viewport.** It's assembled
once at startup from memory files + instruction files. The design
makes this dynamic — regions that update in place rather than being
loaded once and frozen.

### What needs to change

**1. Context assembly must become region-aware**

Current: `build_context_window()` treats context as a linear sequence
(identity → journal → conversation) with token budgets. There's no
concept of independently mutable regions.

Needed: The context window becomes a collection of named regions, each
with its own update logic:

```rust
struct ContextRegion {
    name: String,        // "spatial", "working_stack", "episodic"
    content: String,     // current rendered content
    budget: TokenBudget, // min/max/priority
    dirty: bool,         // needs re-render
}

struct ContextWindow {
    regions: Vec<ContextRegion>,
    total_budget: usize,
}
```

The key insight from the design: **updates overwrite, not append.**
Updating spatial awareness doesn't cost tokens — it replaces the
previous version. This means we can update every turn if useful,
which is impossible in the current append-only message model.

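The overwrite claim can be demonstrated in a few lines. A toy region store,
using word count as a stand-in for the tokenizer (`Regions` and its methods
are illustrative, not proposed API):

```rust
use std::collections::BTreeMap;

/// Toy region store illustrating overwrite semantics: updating a region
/// replaces its content, so total size is bounded by the regions'
/// current contents, not by the update history.
struct Regions {
    map: BTreeMap<String, String>,
}

impl Regions {
    fn new() -> Self {
        Regions { map: BTreeMap::new() }
    }

    /// Overwrite (not append) the named region.
    fn update(&mut self, name: &str, content: &str) {
        self.map.insert(name.to_string(), content.to_string());
    }

    /// Rough token proxy: whitespace-separated words across all regions.
    fn token_cost(&self) -> usize {
        self.map.values().map(|c| c.split_whitespace().count()).sum()
    }
}
```

After a hundred updates to the spatial region, the store holds only the
latest content, so the per-turn cost of keeping awareness fresh is the size
of the current rendering, not the accumulated history.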
**Insertion point:** `build_context_window()` in agent.rs (lines
691-820). This is the natural place to introduce region-aware
assembly. The existing journal/conversation split already hints at
regions — making it explicit is a refactor, not a rewrite.

The ContextPlan abstraction from section 5 above is the stepping
stone. Get the plan/assemble split working first, then extend
ContextPlan to support named regions.

**2. The spatial graph needs a home**

Current: poc-memory stores nodes + edges in `~/.claude/memory/` files.
The graph is external to poc-agent — accessed via the `poc-memory`
CLI tool.

Needed: The spatial graph should be a first-class poc-agent concept,
not an external tool. The agent needs to:
- Know its current position in the graph (path + cursor)
- Render a viewport (local neighborhood) into the spatial region
- Navigate (move cursor, expand/contract viewport)
- Update edges as it discovers connections

**Options:**
1. **Inline the graph:** a Rust graph library (petgraph) inside
   poc-agent. Full control, fast traversal, centrality computation.
   But duplicates poc-memory's data.
2. **Library extraction:** Factor poc-memory's graph operations into
   a shared Rust library. poc-agent and poc-memory both use it.
   No duplication, clean separation.
3. **Keep external, add protocol:** poc-agent calls poc-memory
   commands for graph operations. Simple, no code sharing needed.
   But adds latency and process spawning per operation.

Recommendation: Option 2 (library extraction). The graph IS the
memory system — it shouldn't be behind a process boundary. But
poc-memory's CLI remains useful for manual inspection.

**Insertion point:** New module `src/spatial.rs` or `src/graph.rs`.
Loaded on startup, serialized to disk, rendered into the spatial
context region each turn. Navigation via a new `move` tool or
automatic on tool results (file reads update the cursor to that
file's graph node).

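Whichever option wins, the core viewport operation is a bounded
neighborhood query. A dependency-free sketch with a toy adjacency list,
assuming integer node ids (`Graph::viewport` is illustrative; the shared
library or petgraph would supply the real traversal):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Toy undirected graph; not poc-memory's actual representation.
struct Graph {
    edges: HashMap<u32, Vec<u32>>,
}

impl Graph {
    fn new() -> Self {
        Graph { edges: HashMap::new() }
    }

    fn add_edge(&mut self, a: u32, b: u32) {
        self.edges.entry(a).or_default().push(b);
        self.edges.entry(b).or_default().push(a);
    }

    /// Nodes within `radius` hops of the cursor: the local neighborhood
    /// that would be rendered into the spatial context region.
    fn viewport(&self, cursor: u32, radius: u32) -> HashSet<u32> {
        let mut seen = HashSet::from([cursor]);
        let mut queue = VecDeque::from([(cursor, 0u32)]);
        while let Some((node, depth)) = queue.pop_front() {
            if depth == radius {
                continue; // don't expand past the viewport edge
            }
            for &next in self.edges.get(&node).into_iter().flatten() {
                if seen.insert(next) {
                    queue.push_back((next, depth + 1));
                }
            }
        }
        seen
    }
}
```

On a chain 1-2-3-4-5 with the cursor at node 3, radius 1 yields {2, 3, 4};
widening to radius 2 pulls in the whole chain. "Expand/contract viewport"
is just this radius parameter.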
**3. Viewport serialization needs session support**

Current: Sessions save conversation.jsonl (message log) and
current.json (snapshot). Compaction rebuilds from these.

Needed: Sessions also save viewport state — path, cursor positions,
working stack, gathered context. This is the "task switching" feature
from the design.

```rust
struct Viewport {
    path: Vec<NodeId>,       // root to current position
    cursors: Vec<NodeId>,    // multiple attention points
    working_stack: Vec<WorkItem>,
    hypotheses: Vec<String>, // what we're trying / ruled out
    next_action: Option<String>,
    gathered_context: Vec<(String, String)>, // (label, content)
}
```

**Insertion point:** Session save/restore in main.rs. The Viewport
struct serializes alongside the conversation log. On restore, the
viewport positions the agent in the graph and populates the structured
regions, while the conversation log populates the episodic region.

The existing `/save` and `/new` commands become `/save` (save viewport
+ log) and `/switch <task>` (save current viewport, load another).
`/new` creates a fresh viewport at the graph root.

**4. Region-aware compaction replaces blunt rebuilding**

Current: Compaction is all-or-nothing. Hit the threshold → rebuild
everything from journal + recent messages. The model doesn't control
what's kept.

Needed: Compaction becomes region-specific. The episodic region
(conversation) still gets the journal treatment. But structured
regions (spatial, working stack) are never "compacted" — they're
overwritten by definition. The graph IS the long-term memory; it
doesn't need summarization.

This means compaction gets cheaper over time. As more of the context
window is structured (spatial, stack, gathered context), less of it
is ephemeral conversation that needs journal-compression. The stable
regions persist across compaction unchanged.

**Insertion point:** `compact()` in agent.rs. Instead of rebuilding
everything, it preserves structured regions and only compacts the
episodic region. The ContextPlan gains a `preserved` list — regions
that survive compaction intact.

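The preserved/compacted split is small enough to sketch directly. A toy
pass where structured regions are cloned through untouched and only the
episodic region is rewritten (the `Region` shape and `summarize` closure
are illustrative; in poc-agent the journal entry plays the summarizer's
role):

```rust
#[derive(Clone, Debug, PartialEq)]
struct Region {
    name: String,
    content: String,
    structured: bool, // structured regions survive compaction intact
}

/// Region-aware compaction: only unstructured (episodic) regions are
/// compressed; everything else passes through unchanged.
fn compact_regions<F>(regions: &[Region], summarize: F) -> Vec<Region>
where
    F: Fn(&str) -> String,
{
    regions
        .iter()
        .map(|r| {
            if r.structured {
                r.clone() // preserved: spatial, working stack, ...
            } else {
                Region { content: summarize(&r.content), ..r.clone() }
            }
        })
        .collect()
}
```

The `preserved` list suggested for ContextPlan amounts to the `structured`
flag here: compaction becomes a filter-and-rewrite over regions rather
than a full rebuild.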
### What we get

The payoff is dimensional. Each change is useful independently, but
together they create something qualitatively different:

- **Spatial graph** → I always know where I am in the work, at
  multiple levels of abstraction simultaneously
- **Overwrite regions** → Maintaining awareness is free, not a
  growing token cost
- **Viewport serialization** → Task switching is lossless and
  instant. Interruptions don't destroy state.
- **Region-aware compaction** → Compaction preserves structured
  knowledge. Only ephemeral conversation compresses.
- **Working stack** → Explicit priority tracking instead of hoping
  the model remembers what matters

And the deeper thing: the graph IS the memory system. Every
poc-memory node is a navigable place. Memory search becomes "where
in the graph is this?" instead of "grep through files." The context
window becomes a viewport sliding over a persistent territory.

### Implementation order

1. **ContextPlan abstraction** (section 5 above) — prerequisite for
   everything else. Separate allocation from assembly.
2. **Named regions** — extend ContextPlan with named, independently
   updatable regions. Start with three: spatial (where-am-i.md
   content), working_stack (manual), episodic (conversation).
3. **Overwrite semantics** — regions update in place instead of
   appending. The spatial region is the proof of concept: update it
   every turn, measure token cost (should be zero net).
4. **Graph integration** — bring the poc-memory graph into poc-agent
   as a library. Render the viewport into the spatial region.
5. **Viewport save/restore** — serialize the viewport on /switch,
   restore on /resume. This is the task switching payoff.
6. **Region-aware compaction** — structured regions survive
   compaction unchanged; the episodic region gets the journal
   treatment.

Steps 1-3 can be done in a weekend. Steps 4-5 are a larger project
(graph library extraction). Step 6 follows naturally once regions
exist.

### Risks and open questions

- **Token overhead of structured regions.** If the spatial viewport
  is 2K tokens and the working stack is 500 tokens, that's 2.5K
  tokens reserved every turn. On a 200K context window that's ~1%.
  On a 32K window (local models) it's ~8%. Need to measure actual
  utility vs. cost per model size.

- **Graph size.** Centrality computation is O(V*E) for betweenness.
  If the graph has 10K nodes (plausible for a full memory + codebase
  map), this could take seconds. May need approximate centrality or
  cached computation with incremental updates.

- **Overwrite fidelity.** The API expects messages as a sequence.
  "Overwriting" a region means either: (a) rebuilding the message
  array each turn with updated region content, or (b) using a mutable
  system message / context message that gets replaced. Option (b)
  is simpler but depends on API behavior with changing system
  prompts mid-conversation.

- **What are ALL the regions?** Kent asked this. Baddeley gives us
  three (visuospatial, phonological, episodic buffer + central
  executive). We've mapped spatial, working stack, episodic. Are
  there others? Candidates: emotional state (amygdala readout, future),
  social context (who's present, their recent activity), sensory
  buffer (recent tool outputs, pending notifications). Worth exploring
  but not blocking on — start with three, add as needed.
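The reserved-region percentages in the first bullet follow from a
one-liner worth keeping around when tuning budgets per model (the figures
are that bullet's estimates, not measurements):

```rust
/// Share of a context window consumed by always-present structured regions.
fn region_overhead_pct(region_tokens: u32, window_tokens: u32) -> f64 {
    region_tokens as f64 / window_tokens as f64 * 100.0
}

// 2K spatial viewport + 500-token working stack = 2,500 reserved tokens:
// roughly 1.25% of a 200K window, but about 7.8% of a 32K local-model window.
```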
|