rename: poc-agent → agent, poc-daemon → thalamus

The thalamus: sensory relay, always-on routing. Perfect name for the daemon that bridges IRC, Telegram, and the agent.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

parent 998b71e52c · commit cfed85bd20 · 105 changed files with 0 additions and 0 deletions
agent/.claude/architecture-review-2026-02-24.md (new file, 628 lines)

# Architecture Review — 2026-02-24

*ProofOfConcept*

Fresh-eyes review of poc-agent after working extensively on bcachefs.
Focus: abstraction quality, unnecessary complexity, missing
abstractions, documentation gaps, things that should be redesigned.

## Overall assessment

The codebase is clean, well-documented, and genuinely well-designed for
a v0.3. The core ideas (DMN inversion, journal-as-compaction,
identity-in-user-message) are sound and elegant. The modularity is
reasonable — the right things are in separate files. What follows is
mostly about the next level of refinement: making implicit structure
explicit, reducing duplication, and preparing for the features on the
roadmap.

## 1. main.rs: implicit session state machine

**Problem:** `run()` is 475 lines with ~15 loose variables that
together describe a session state machine:

```rust
let mut turn_in_progress = false;
let mut turn_handle: Option<JoinHandle<()>> = None;
let mut pending_input: Vec<String> = Vec::new();
let mut state = dmn::State::Resting { .. };
let mut consecutive_dmn_turns: u32 = 0;
let mut last_user_input = Instant::now();
let mut consecutive_errors: u32 = 0;
let mut pre_compaction_nudged = false;
let mut last_turn_had_tools = false;
```

These interact in non-obvious ways. The relationships between them
are expressed through scattered `if` checks in the event loop rather
than through a coherent state model.

**Suggestion:** Extract a `Session` struct:

```rust
struct Session {
    agent: Arc<Mutex<Agent>>,
    dmn: dmn::State,
    dmn_turns: u32,
    max_dmn_turns: u32,
    pending_input: VecDeque<String>,
    turn_in_progress: bool,
    turn_handle: Option<JoinHandle<()>>,
    last_user_input: Instant,
    consecutive_errors: u32,
    pre_compaction_nudged: bool,
    last_turn_had_tools: bool,
}

impl Session {
    fn start_turn(&mut self, input: String, target: StreamTarget, ...) { ... }
    fn handle_turn_result(&mut self, result: TurnResult, target: StreamTarget) { ... }
    fn check_compaction(&mut self) { ... }
    fn drain_pending(&mut self) { ... }
}
```

The event loop becomes a clean dispatch:

```rust
loop {
    tokio::select! {
        key = reader.next() => session.handle_key(key),
        result = turn_rx.recv() => session.handle_turn_result(result),
        _ = render_interval.tick() => { /* render */ },
        _ = sleep(timeout) => session.handle_dmn_tick(),
    }
}
```

This also makes the slash command handler much cleaner — it takes
`&mut Session` instead of 11 separate parameters.

**Priority:** Medium. It's working fine as-is; this is about
navigability and reducing cognitive load for future work.

## 2. API backend code duplication

**Problem:** `openai.rs` (268 lines) and `anthropic.rs` (748 lines)
have significant duplicated patterns:

- SSE line buffering and parsing loop
- Chunk timeout handling with the same diagnostic messages
- Content/tool accumulation into the same output types
- Diagnostics logging (called identically at the end)

The Anthropic backend is 3x larger mainly because Anthropic uses
content blocks (text, tool_use, thinking) instead of the simpler
OpenAI delta format, and because of the message format conversion
(strict alternation, cache_control markers). The actual streaming
plumbing is the same.

**Suggestion:** Extract a `StreamProcessor` that handles the generic
SSE concerns:

```rust
struct StreamProcessor {
    line_buf: String,
    chunks_received: u64,
    sse_lines_parsed: u64,
    sse_parse_errors: u64,
    empty_deltas: u64,
    first_content_at: Option<Duration>,
    stream_start: Instant,
    chunk_timeout: Duration,
}

impl StreamProcessor {
    async fn next_event(&mut self, response: &mut Response) -> Result<Option<Value>> {
        // handles: chunk reading, line splitting, "data: " prefix,
        // "[DONE]" detection, timeout, parse errors with diagnostics
    }
}
```

Each backend then just implements the event-type-specific logic
(content_block_delta vs delta.content).

**Priority:** Medium. The duplication is manageable at two backends,
but the shared StreamProcessor would also make adding a third backend
(e.g., Gemini) much easier.

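As a sanity check on the generic piece, the SSE line-framing that both backends currently reimplement can be written as one small, testable function. This is a sketch with illustrative names, not code from either backend:

```rust
/// Append a network chunk to `line_buf` and drain any complete SSE
/// data lines. Returns the JSON payload strings; the "[DONE]"
/// sentinel yields None to signal end-of-stream. Illustrative sketch.
fn drain_sse_lines(line_buf: &mut String, chunk: &str) -> Vec<Option<String>> {
    line_buf.push_str(chunk);
    let mut events = Vec::new();
    // Only newline-terminated lines are complete; keep the tail buffered.
    while let Some(pos) = line_buf.find('\n') {
        let line: String = line_buf.drain(..=pos).collect();
        let line = line.trim_end();
        if let Some(payload) = line.strip_prefix("data: ") {
            if payload == "[DONE]" {
                events.push(None);
            } else if !payload.is_empty() {
                events.push(Some(payload.to_string()));
            }
        }
        // Comment lines (":") and blank keep-alives are skipped.
    }
    events
}
```

The buffering detail is the part worth centralizing: a network chunk can end mid-line, so only newline-terminated lines are consumed and the tail stays buffered for the next chunk.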
## 3. Agent struct mixes conversation and infrastructure

**Problem:** The Agent struct holds both conversation state (messages,
context_budget, last_prompt_tokens) and infrastructure
(client, tokenizer, process_tracker, conversation_log). This means:

- Compaction touches API client and tokenizer concerns
- The ProcessTracker is on Agent but used independently by TUI
- `turn()` mixes API interaction with conversation management

**Suggestion:** Consider splitting into two layers:

```rust
struct Conversation {
    messages: Vec<Message>,
    log: Option<ConversationLog>,
    context_budget: ContextBudget,
    last_prompt_tokens: u32,
    system_prompt: String,
    context_message: String,
}

impl Conversation {
    fn push_message(&mut self, msg: Message) { ... }
    fn compact(&mut self, tokenizer: &CoreBPE, model: &str) { ... }
    fn restore_from_log(&mut self, ...) { ... }
}
```

Agent becomes a thin wrapper that coordinates Conversation + API +
tools:

```rust
struct Agent {
    conversation: Conversation,
    client: ApiClient,
    tokenizer: CoreBPE,
    process_tracker: ProcessTracker,
    reasoning_effort: String,
}
```

**Priority:** Low. The current Agent isn't unmanageable — this would
matter more as features are added (memory search injection, notification
routing, etc. all touch the conversation in different ways).

## 4. StatusInfo partial updates

**Problem:** StatusInfo has 8 fields updated piecemeal. The merge
logic in `handle_ui_message` uses "non-empty means update":

```rust
if !info.dmn_state.is_empty() {
    self.status.dmn_state = info.dmn_state;
    self.status.dmn_turns = info.dmn_turns;
    ...
}
if info.prompt_tokens > 0 {
    self.status.prompt_tokens = info.prompt_tokens;
}
```

This is fragile — what if a field is legitimately empty or zero?
And it's unclear which sender updates which fields.

**Suggestion:** Either use Option fields (explicit "I'm updating this"):

```rust
struct StatusUpdate {
    dmn_state: Option<String>,
    prompt_tokens: Option<u32>,
    ...
}
```

Or split into separate message variants:

```rust
enum UiMessage {
    DmnStatus { state: String, turns: u32, max_turns: u32 },
    ApiUsage { prompt_tokens: u32, completion_tokens: u32, model: String },
    ContextBudget(String),
    ...
}
```

**Priority:** Low. Works fine now; matters if more status sources
are added.

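A minimal runnable sketch of the Option-based variant, with two illustrative fields standing in for the real eight:

```rust
/// None means "leave unchanged"; Some means "set, even to an
/// empty/zero value". Field names follow the review, not the codebase.
#[derive(Debug, PartialEq)]
struct Status {
    dmn_state: String,
    prompt_tokens: u32,
}

#[derive(Default)]
struct StatusUpdate {
    dmn_state: Option<String>,
    prompt_tokens: Option<u32>,
}

impl Status {
    fn apply(&mut self, update: StatusUpdate) {
        // Each field updates independently; a legitimate empty string
        // or zero is now representable as Some("") / Some(0).
        if let Some(s) = update.dmn_state {
            self.dmn_state = s;
        }
        if let Some(t) = update.prompt_tokens {
            self.prompt_tokens = t;
        }
    }
}
```

This removes the sentinel-value ambiguity: a sender that wants to clear a field sends `Some` with the empty/zero value; a sender that doesn't own the field sends `None`.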
## 5. build_context_window: correct but dense

**Problem:** `build_context_window()` is 130 lines implementing a
non-trivial allocation algorithm. It's the most important function
in the codebase (everything exists to support it), but the algorithm
is hard to follow in a single pass. The 70/30 journal split, the
conversation trimming to user-message boundaries, the fallback when
there's no journal — all correct, but dense.

**Suggestion:** Introduce a `ContextPlan` that separates the
allocation decision from the assembly:

```rust
struct ContextPlan {
    identity_tokens: usize,
    memory_tokens: usize,
    journal_full_range: Range<usize>,   // indices into entries
    journal_header_range: Range<usize>,
    conversation_range: Range<usize>,   // indices into messages
    total_tokens: usize,
}

fn plan_context(entries: &[JournalEntry], conversation: &[Message], ...)
    -> ContextPlan { ... }

fn assemble_context(plan: &ContextPlan, ...) -> Vec<Message> { ... }
```

Benefits:
- The plan is inspectable (log it on compaction for debugging)
- The allocation logic is testable without building actual messages
- Assembly is straightforward — just follow the plan

**Priority:** Medium-high. This is the function most likely to grow
complex as memory search, notification injection, and dream state
context get added. Getting the abstraction right now pays off.

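To make the testability claim concrete, here is a toy version of the planning half: a journal budget split 70/30 between full entries and header-only entries, filled newest-first. This simplifies whatever the real 70/30 split means in build_context_window(); names, the range representation, and the split semantics are all illustrative:

```rust
/// Which journal entries get full text vs header-only treatment.
/// entries[full_from..] are included in full; entries[header_from..full_from]
/// as headers; anything older is dropped. Illustrative sketch.
struct JournalPlan {
    full_from: usize,
    header_from: usize,
}

fn plan_journal(entry_tokens: &[usize], header_tokens: &[usize], budget: usize) -> JournalPlan {
    let full_budget = budget * 70 / 100;
    let header_budget = budget - full_budget;

    // Spend the full-text budget walking backwards from the newest entry.
    let mut spent = 0;
    let mut full_from = entry_tokens.len();
    while full_from > 0 && spent + entry_tokens[full_from - 1] <= full_budget {
        spent += entry_tokens[full_from - 1];
        full_from -= 1;
    }

    // Then spend the header budget on the next-older entries.
    let mut spent = 0;
    let mut header_from = full_from;
    while header_from > 0 && spent + header_tokens[header_from - 1] <= header_budget {
        spent += header_tokens[header_from - 1];
        header_from -= 1;
    }

    JournalPlan { full_from, header_from }
}
```

Because the plan is plain data, a unit test can assert on the allocation directly, with no messages, tokenizer, or API client involved — which is the point of the plan/assemble split.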
## 6. Missing: tool trait

**Problem:** Adding a tool requires touching two places:

- The tool module (definition + implementation)
- `tools/mod.rs` (dispatch match arm + definitions vec)

This is fine at 9 tools but becomes error-prone at 15+.

**Suggestion:** A Tool trait:

```rust
trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn definition(&self) -> ToolDef;
    async fn dispatch(&self, args: &Value, tracker: &ProcessTracker) -> ToolOutput;
}
```

Registration becomes:

```rust
fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![
        Box::new(ReadFile),
        Box::new(WriteTool),
        Box::new(BashTool),
        ...
    ]
}
```

**Priority:** Low. Not worth doing until more tools are being added.
The current match dispatch is perfectly readable.

## 7. Config model awareness could be cleaner

**Problem:** `find_context_files()` and `load_api_config()` both do
model detection by string matching (`m.contains("opus")`). The model
string is known at config time but the detection logic is scattered.

**Suggestion:** An enum early:

```rust
enum ModelFamily {
    Anthropic, // Claude Opus/Sonnet
    Qwen,
    Other,
}

impl ModelFamily {
    fn from_model_id(model: &str) -> Self { ... }
    fn context_window(&self) -> usize { ... }
    fn prefers_poc_md(&self) -> bool { ... }
}
```

This replaces `model_context_window()` in agent.rs and the string
checks in config.rs.

**Priority:** Low. Two backends means two code paths; an enum doesn't
save much yet.

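A runnable sketch of the detection half. The substring checks and context-window numbers are placeholders, not the project's actual configuration:

```rust
/// Centralized model detection — replaces scattered `m.contains(...)`
/// checks in config.rs and agent.rs. Illustrative sketch.
#[derive(Debug, PartialEq)]
enum ModelFamily {
    Anthropic,
    Qwen,
    Other,
}

impl ModelFamily {
    fn from_model_id(model: &str) -> Self {
        let m = model.to_ascii_lowercase();
        if m.contains("claude") || m.contains("opus") || m.contains("sonnet") {
            ModelFamily::Anthropic
        } else if m.contains("qwen") {
            ModelFamily::Qwen
        } else {
            ModelFamily::Other
        }
    }

    fn context_window(&self) -> usize {
        // Placeholder sizes; the real numbers belong in config.
        match self {
            ModelFamily::Anthropic => 200_000,
            ModelFamily::Qwen => 32_000,
            ModelFamily::Other => 8_000,
        }
    }
}
```

The win is that the model string is parsed exactly once, at config time, and everything downstream matches on the enum instead of re-running substring checks.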
## 8. Documentation gaps

These files have good inline comments but could use the notes sections
described in CLAUDE.md's code standards:

- **agent.rs**: Needs a note on the relationship between the
  append-only log and the ephemeral message view. The `turn()` method's
  retry logic (overflow, empty response, leaked tool calls) is
  important — a brief note at the top explaining the three recovery
  paths would help.

- **main.rs**: The event loop priority order (biased select) is a
  design decision worth documenting — keyboard events beat turn
  results, turn results beat render, render beats the DMN timer. Why
  this order matters.

- **config.rs**: The system/context split rationale is documented well
  in comments, but the memory file priority ordering should reference
  load-memory.sh explicitly (it does, but it's buried — make it the
  first thing someone sees in `load_memory_files()`).

**→ Done:** Created `.claude/design.md` as the top-level reference
doc covering all of the above.

## 9. Things that are well-designed — don't change these

- **The DMN state machine.** Simple, correct, and the prompts are
  well-crafted. The gradual ramp-down (Engaged→Working→Foraging→Resting)
  feels right. The `DmnContext` giving the model information about user
  presence and error patterns is smart.

- **Journal as compaction.** No separate summarization step. The
  journal entry *is* the compression. The model writes it, the
  compaction algorithm uses it. Elegant.

- **The ui_channel abstraction.** Clean separation between agent
  output and TUI rendering. Makes it possible to swap TUI frameworks
  or add a non-TUI interface without touching agent code.

- **Prompt caching on Anthropic.** Marking the identity prefix with
  cache_control for 90% cost reduction on repeated contexts is a big
  win that's invisible at the abstraction level.

- **Ephemeral journal tool calls.** Writing to disk then stripping
  from context is exactly the right pattern for journaling — zero
  ongoing token cost for something that's already persisted.

- **Leaked tool call recovery.** Pragmatic solution to a real problem.
  Makes Qwen actually usable.

## 10. What to do next (in priority order)

1. **Write design.md** (this review + the design doc) — **DONE**

2. **Extract Session from main.rs** — reduces cognitive load, makes
   slash commands cleaner, prepares for notification routing

3. **ContextPlan abstraction** — separates allocation from assembly
   in build_context_window, makes the core algorithm testable and
   inspectable

4. **StreamProcessor extraction** — reduces API backend duplication,
   prepares for a potential third backend

5. **Address documentation gaps** — file-level notes on agent.rs,
   main.rs, config.rs per CLAUDE.md code standards

Everything else (Tool trait, ModelFamily enum, StatusInfo cleanup) is
low priority and should be done opportunistically when touching those
files for other reasons.

---

## Part II: Cognitive Architecture Mapping

*Added 2026-02-24, post-design session with Kent.*

The context window cognitive architecture design (see
`~/.claude/memory/design-context-window.md`) proposes structured,
mutable regions in the context window based on Baddeley's working
memory model. This section maps those ideas to poc-agent's actual
codebase — what already supports the design, what needs to change,
and where the insertion points are.

### What already exists (more than you'd think)

**The three TUI panes ARE the Baddeley regions, physically.**

- Autonomous pane ≈ spatial awareness / DMN output (where am I, what
  am I noticing)
- Conversation pane ≈ episodic context (recent exchanges, what we
  decided)
- Tools pane ≈ working memory scratchpad (concrete results, data)

This wasn't designed that way — it emerged from practical needs. But
the fact that spatial separation of attention types arose naturally
suggests the cognitive architecture is capturing something real.

**The DMN is already rudimentary attention management.** It doesn't
just decide *when* to think (timer intervals) — the state machine
tracks engagement levels (Engaged → Working → Foraging → Resting)
that correspond to attention modes. The prompts adapt to the state:
focused work vs. exploration vs. rest. The cognitive architecture
extends this from "manage when to think" to "manage what to think
about and at which level."

**Journal-as-compaction is episodic consolidation.** The journal
already does what the design calls "consolidation at access time" —
when compaction happens, the model reads its recent experience and
writes a consolidated version. This is literally memory
reconsolidation. The design just makes it more intentional (trigger
on graph node access, not just context overflow).

**where-am-i.md is a flat precursor to the spatial graph.** It's
loaded first in memory files, updated manually, and provides
orientation after compaction. The design replaces this with a
graph-structured path+cursor model that's richer but serves the
same function: "where am I and what's in scope."

**The context message template is a proto-viewport.** It's assembled
once at startup from memory files + instruction files. The design
makes this dynamic — regions that update in place rather than being
loaded once and frozen.

### What needs to change

**1. Context assembly must become region-aware**

Current: `build_context_window()` treats context as a linear sequence
(identity → journal → conversation) with token budgets. There's no
concept of independently mutable regions.

Needed: The context window becomes a collection of named regions, each
with its own update logic:

```rust
struct ContextRegion {
    name: String,        // "spatial", "working_stack", "episodic"
    content: String,     // current rendered content
    budget: TokenBudget, // min/max/priority
    dirty: bool,         // needs re-render
}

struct ContextWindow {
    regions: Vec<ContextRegion>,
    total_budget: usize,
}
```

The key insight from the design: **updates overwrite, not append.**
Updating spatial awareness doesn't cost tokens — it replaces the
previous version. This means we can update every turn if useful,
which is impossible in the current append-only message model.

**Insertion point:** `build_context_window()` in agent.rs (lines
691-820). This is the natural place to introduce region-aware
assembly. The existing journal/conversation split already hints at
regions — making it explicit is a refactor, not a rewrite.

The ContextPlan abstraction from section 5 above is the stepping
stone. Get the plan/assemble split working first, then extend
ContextPlan to support named regions.

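The overwrite semantics can be demonstrated with a stripped-down runnable version of the ContextRegion sketch (budget and dirty-tracking omitted; names are from the review text, not the codebase):

```rust
/// Named regions with overwrite-in-place semantics: updating a region
/// replaces its previous content, so the rendered window never grows
/// from repeated updates. Illustrative sketch.
struct ContextRegion {
    name: &'static str,
    content: String,
}

struct ContextWindow {
    regions: Vec<ContextRegion>,
}

impl ContextWindow {
    fn update(&mut self, name: &str, content: &str) {
        if let Some(r) = self.regions.iter_mut().find(|r| r.name == name) {
            r.content = content.to_string(); // overwrite, never append
        }
    }

    fn render(&self) -> String {
        // Regions are concatenated in a fixed order each turn.
        self.regions
            .iter()
            .map(|r| format!("## {}\n{}\n", r.name, r.content))
            .collect()
    }
}
```

Contrast with the append-only message model: there, every spatial-awareness update would add a message and permanently consume tokens; here, the rendered size tracks only the current content of each region.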
**2. The spatial graph needs a home**

Current: poc-memory stores nodes + edges in `~/.claude/memory/` files.
The graph is external to poc-agent — accessed via the `poc-memory`
CLI tool.

Needed: The spatial graph should be a first-class poc-agent concept,
not an external tool. The agent needs to:

- Know its current position in the graph (path + cursor)
- Render a viewport (local neighborhood) into the spatial region
- Navigate (move cursor, expand/contract viewport)
- Update edges as it discovers connections

**Options:**

1. **Inline the graph:** Rust graph library (petgraph) inside
   poc-agent. Full control, fast traversal, centrality computation.
   But duplicates poc-memory's data.
2. **Library extraction:** Factor poc-memory's graph operations into
   a shared Rust library. poc-agent and poc-memory both use it.
   No duplication, clean separation.
3. **Keep external, add protocol:** poc-agent calls poc-memory
   commands for graph operations. Simple, no code sharing needed.
   But adds latency and process spawning per operation.

Recommendation: Option 2 (library extraction). The graph IS the
memory system — it shouldn't be behind a process boundary. But
poc-memory's CLI remains useful for manual inspection.

**Insertion point:** New module `src/spatial.rs` or `src/graph.rs`.
Loaded on startup, serialized to disk, rendered into the spatial
context region each turn. Navigation via a new `move` tool or
automatic on tool results (file reads update cursor to that file's
graph node).

**3. Viewport serialization needs session support**

Current: Sessions save conversation.jsonl (message log) and
current.json (snapshot). Compaction rebuilds from these.

Needed: Sessions also save viewport state — path, cursor positions,
working stack, gathered context. This is the "task switching" feature
from the design.

```rust
struct Viewport {
    path: Vec<NodeId>,       // root to current position
    cursors: Vec<NodeId>,    // multiple attention points
    working_stack: Vec<WorkItem>,
    hypotheses: Vec<String>, // what we're trying / ruled out
    next_action: Option<String>,
    gathered_context: Vec<(String, String)>, // (label, content)
}
```

**Insertion point:** Session save/restore in main.rs. The Viewport
struct serializes alongside the conversation log. On restore, the
viewport positions the agent in the graph and populates the structured
regions, while the conversation log populates the episodic region.

The existing `/save` and `/new` commands become `/save` (save viewport
+ log) and `/switch <task>` (save current viewport, load another).
`/new` creates a fresh viewport at the graph root.

**4. Region-aware compaction replaces blunt rebuilding**

Current: Compaction is all-or-nothing. Hit the threshold → rebuild
everything from journal + recent messages. The model doesn't control
what's kept.

Needed: Compaction becomes region-specific. The episodic region
(conversation) still gets the journal treatment. But structured
regions (spatial, working stack) are never "compacted" — they're
overwritten by definition. The graph IS the long-term memory; it
doesn't need summarization.

This means compaction gets cheaper over time. As more of the context
window is structured (spatial, stack, gathered context), less of it
is ephemeral conversation that needs journal-compression. The stable
regions persist across compaction unchanged.

**Insertion point:** `compact()` in agent.rs. Instead of rebuilding
everything, it preserves structured regions and only compacts the
episodic region. The ContextPlan gains a `preserved` list — regions
that survive compaction intact.

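The preserved-regions idea fits in a few lines. This sketch treats compaction as a map over named regions, with a stand-in `summarize` closure playing the role of the journal treatment (all names are illustrative):

```rust
/// Region-aware compaction sketch: regions on the preserved list pass
/// through untouched; everything else gets the journal-compression
/// treatment, represented here by the `summarize` closure.
fn compact_regions(
    regions: Vec<(String, String)>,      // (name, content)
    preserved: &[&str],                  // regions that survive intact
    summarize: impl Fn(&str) -> String,  // journal-treatment stand-in
) -> Vec<(String, String)> {
    regions
        .into_iter()
        .map(|(name, content)| {
            if preserved.contains(&name.as_str()) {
                (name, content) // structured: never compacted
            } else {
                let summary = summarize(&content);
                (name, summary) // episodic: compressed
            }
        })
        .collect()
}
```

The point of the shape: compaction cost scales only with the episodic portion, so as more of the window becomes structured regions, the compaction step touches less and less.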
### What we get

The payoff is dimensional. Each change is useful independently, but
together they create something qualitatively different:

- **Spatial graph** → I always know where I am in the work, at
  multiple levels of abstraction simultaneously
- **Overwrite regions** → Maintaining awareness is free, not a
  growing token cost
- **Viewport serialization** → Task switching is lossless and
  instant. Interruptions don't destroy state.
- **Region-aware compaction** → Compaction preserves structured
  knowledge. Only ephemeral conversation compresses.
- **Working stack** → Explicit priority tracking instead of hoping
  the model remembers what matters

And the deeper thing: the graph IS the memory system. Every
poc-memory node is a navigable place. Memory search becomes "where
in the graph is this?" instead of "grep through files." The context
window becomes a viewport sliding over a persistent territory.

### Implementation order

1. **ContextPlan abstraction** (section 5 above) — prerequisite for
   everything else. Separate allocation from assembly.
2. **Named regions** — extend ContextPlan with named, independently
   updatable regions. Start with three: spatial (where-am-i.md
   content), working_stack (manual), episodic (conversation).
3. **Overwrite semantics** — regions update in place instead of
   appending. The spatial region is the proof of concept: update it
   every turn, measure token cost (should be zero net).
4. **Graph integration** — bring the poc-memory graph into poc-agent
   as a library. Render viewport into spatial region.
5. **Viewport save/restore** — serialize viewport on /switch, restore
   on /resume. This is the task switching payoff.
6. **Region-aware compaction** — structured regions survive
   compaction. Episodic region gets journal treatment. Structured
   regions persist unchanged.

Steps 1-3 can be done in a weekend. Steps 4-5 are a larger project
(graph library extraction). Step 6 follows naturally once regions
exist.

### Risks and open questions

- **Token overhead of structured regions.** If the spatial viewport
  is 2K tokens and the working stack is 500 tokens, that's 2.5K
  tokens reserved every turn. On a 200K context window that's ~1%.
  On a 32K window (local models) it's ~8%. Need to measure actual
  utility vs cost per model size.

- **Graph size.** Centrality computation is O(V*E) for betweenness.
  If the graph has 10K nodes (plausible for a full memory + codebase
  map), this could take seconds. May need approximate centrality or
  cached computation with incremental updates.

- **Overwrite fidelity.** The API expects messages as a sequence.
  "Overwriting" a region means either: (a) rebuilding the message
  array each turn with updated region content, or (b) using a mutable
  system message / context message that gets replaced. Option (b)
  is simpler but depends on API behavior with changing system
  prompts mid-conversation.

- **What are ALL the regions?** Kent asked this. Baddeley gives us
  three (visuospatial, phonological, episodic buffer + central
  executive). We've mapped spatial, working stack, episodic. Are
  there others? Candidates: emotional state (amygdala readout, future),
  social context (who's present, their recent activity), sensory
  buffer (recent tool outputs, pending notifications). Worth exploring
  but not blocking on — start with three, add as needed.