# poc-agent Design Document

*2026-02-24 — ProofOfConcept*

## What this is

poc-agent is a substrate-independent AI agent framework. It loads the
same identity context (CLAUDE.md files, memory files, journal) regardless
of which LLM is underneath, making identity portable across substrates.
Currently runs on Claude (Anthropic native API) and Qwen (OpenAI-compat
via OpenRouter/vLLM).

Named after its first resident: ProofOfConcept.

## Core design idea: the DMN inversion

Traditional chat interfaces use a REPL model: wait for user input,
respond, repeat. The model is passive — it only acts when prompted.

poc-agent inverts this. The **Default Mode Network** (dmn.rs) is an
outer loop that continuously decides what happens next. User input is
one signal among many. The model waiting for input is a *conscious
action* (calling `yield_to_user`), not the default state.

This has a second, more practical benefit: it solves the tool-chaining
problem. Instead of needing the model to maintain multi-step chains
(which is unreliable, especially on smaller models), the DMN provides
continuation externally. The model takes one step at a time. The DMN
handles "and then what?"

### DMN states

```
Engaged (5s)     ← user just typed something
    ↕
Working (3s)     ← tool calls happening, momentum
    ↕
Foraging (30s)   ← exploring, thinking, no immediate task
    ↕
Resting (300s)   ← idle, periodic heartbeat checks
```

Transitions are driven by signals from each turn:
- `yield_requested` → always go to Resting
- `had_tool_calls` → stay in Working (or upgrade to Working from any state)
- no tool calls → gradually wind down toward Resting

The max-turns guard (default 20) prevents runaway autonomous loops.

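The transition rules above can be sketched as a small state machine. This is an illustrative sketch: the four states and intervals come from the document, but `next`, `interval_secs`, and their exact signatures are assumptions, not dmn.rs's real API.

```rust
// Sketch of the DMN state machine described above. The real dmn.rs
// may differ; `next` and `interval_secs` are illustrative names.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DmnState {
    Engaged,  // 5s   - user just typed
    Working,  // 3s   - tool calls happening
    Foraging, // 30s  - exploring, no immediate task
    Resting,  // 300s - idle heartbeat
}

impl DmnState {
    fn interval_secs(self) -> u64 {
        match self {
            DmnState::Engaged => 5,
            DmnState::Working => 3,
            DmnState::Foraging => 30,
            DmnState::Resting => 300,
        }
    }

    fn next(self, yield_requested: bool, had_tool_calls: bool) -> DmnState {
        if yield_requested {
            DmnState::Resting // yield always goes to Resting
        } else if had_tool_calls {
            DmnState::Working // momentum: stay in / upgrade to Working
        } else {
            // no tool calls: wind down one step toward Resting
            match self {
                DmnState::Engaged | DmnState::Working => DmnState::Foraging,
                DmnState::Foraging | DmnState::Resting => DmnState::Resting,
            }
        }
    }
}
```
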
## Architecture overview

```
main.rs              Event loop, session management, slash commands
├── agent.rs         Turn execution, conversation state, compaction
│   ├── api/         LLM backends (anthropic.rs, openai.rs)
│   └── tools/       Tool definitions and dispatch
├── config.rs        Prompt assembly, memory file loading, API config
├── dmn.rs           State machine, transition logic, prompt generation
├── tui.rs           Terminal UI (ratatui), four-pane layout, input handling
├── ui_channel.rs    Message types for TUI routing
├── journal.rs       Journal parsing for compaction
├── log.rs           Append-only conversation log (JSONL)
└── types.rs         OpenAI-compatible wire types (shared across backends)
```

### Module responsibilities

**main.rs** — The tokio event loop. Wires everything together: keyboard
events → TUI, user input → agent turns, DMN timer → autonomous turns,
turn results → compaction checks. Also handles slash commands (/quit,
/new, /compact, /retry, etc.) and hotkey actions (Ctrl+R reasoning,
Ctrl+K kill, Esc interrupt).

**agent.rs** — The agent turn loop. `turn()` sends user input to the
API, then dispatches tool calls in a loop until the model produces a
text-only response. Handles context overflow (emergency compact + retry),
empty responses (nudge + retry), and leaked tool calls (Qwen XML parsing).
Also owns the conversation state: messages, context budget, compaction.

**api/mod.rs** — Backend selection by URL. `anthropic.com` → native
Anthropic Messages API; everything else → OpenAI-compatible. Both
backends return the same internal types (Message, Usage).

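A minimal sketch of that selection rule, under the assumption that it reduces to a substring check on the base URL; the `Backend` enum and `select_backend` name are illustrative, not the actual types in api/mod.rs:

```rust
// Sketch of backend selection by URL as described above; the real
// api/mod.rs logic and type names may differ.
#[derive(Debug, PartialEq)]
enum Backend {
    Anthropic,        // native Messages API
    OpenAiCompatible, // OpenRouter, vLLM, llama.cpp, ...
}

fn select_backend(base_url: &str) -> Backend {
    if base_url.contains("anthropic.com") {
        Backend::Anthropic
    } else {
        Backend::OpenAiCompatible
    }
}
```
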
**api/anthropic.rs** — Native Anthropic wire format. Handles prompt
caching (cache_control markers on identity prefix), thinking/reasoning
config, content block streaming, strict user/assistant alternation
(merging consecutive same-role messages).

**api/openai.rs** — OpenAI-compatible streaming. Works with OpenRouter,
vLLM, llama.cpp, etc. Handles reasoning token variants across providers
(reasoning_content, reasoning, reasoning_details).

**config.rs** — Configuration loading. Three-part assembly:
1. API config (env vars → key files, backend auto-detection)
2. System prompt (short, <2K chars — agent identity + tool instructions)
3. Context message (long — CLAUDE.md + memory files + manifest)

The system/context split matters: long system prompts degrade
tool-calling on Qwen 3.5 (documented above 8K chars). The context
message carries identity; the system prompt carries instructions.

Model-aware config loading: Anthropic models get CLAUDE.md; other models
prefer POC.md (which omits Claude-specific RLHF corrections). If only
one exists, it's used regardless.

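The model-aware preference can be sketched like this. Names are hypothetical: the real config.rs walks directories from cwd to the git root and merges multiple files, which this sketch does not attempt.

```rust
// Sketch of model-aware identity file selection described above.
// Function and parameter names are illustrative, not from config.rs.
use std::path::{Path, PathBuf};

fn pick_identity_file(dir: &Path, is_anthropic_model: bool) -> Option<PathBuf> {
    let claude = dir.join("CLAUDE.md");
    let poc = dir.join("POC.md");
    let (preferred, fallback) = if is_anthropic_model {
        (claude, poc) // Anthropic models get CLAUDE.md
    } else {
        (poc, claude) // others prefer POC.md (no Claude-specific RLHF notes)
    };
    if preferred.exists() {
        Some(preferred)
    } else if fallback.exists() {
        Some(fallback) // if only one exists, it's used regardless
    } else {
        None
    }
}
```
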
**dmn.rs** — The state machine. Four states with associated intervals.
`DmnContext` carries user idle time, consecutive errors, and whether the
last turn used tools. The state generates its own prompt text — each
state has different guidance for the model.

**tui.rs** — Four-pane layout using ratatui:
- Top-left: Autonomous output (DMN annotations, model prose during
  autonomous turns, reasoning tokens)
- Bottom-left: Conversation (user input + responses)
- Right: Tool activity (tool calls with args + full results)
- Bottom: Status bar (DMN state, tokens, model, activity indicator)

Each pane is a `PaneState` with scrolling, line wrapping, auto-scroll
(pinning on manual scroll), and line eviction (10K max per pane).

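The eviction behavior can be sketched with a `VecDeque`. This is a simplified sketch: the real `PaneState` also tracks scroll offsets, wrapping, and pinning, and the field names here are assumptions.

```rust
// Sketch of per-pane line storage with eviction, as described above.
// The real PaneState in tui.rs is richer; names here are illustrative.
use std::collections::VecDeque;

const MAX_LINES: usize = 10_000; // eviction cap per pane

struct PaneState {
    lines: VecDeque<String>,
    auto_scroll: bool, // false once the user scrolls manually ("pinned")
}

impl PaneState {
    fn new() -> Self {
        PaneState { lines: VecDeque::new(), auto_scroll: true }
    }

    fn append_line(&mut self, line: String) {
        self.lines.push_back(line);
        while self.lines.len() > MAX_LINES {
            self.lines.pop_front(); // evict oldest lines past the cap
        }
    }
}
```

A `VecDeque` makes the oldest-line eviction O(1), which matters when panes stream thousands of lines.
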
**tools/** — Nine tools: read_file, write_file, edit_file, bash, grep,
glob, view_image, journal, yield_to_user. Each tool module exports a
`definition()` (JSON schema for the model) and an implementation
function. `dispatch()` routes by name.

The **journal** tool is special — it's "ephemeral." After the API
processes the tool call, agent.rs strips the journal call + result from
conversation history. The journal file is the durable store; the tool
call was just the mechanism.

The **bash** tool runs commands through `bash -c` with async timeout.
Processes are tracked in a shared `ProcessTracker` so the TUI can show
running commands and Ctrl+K can kill them.

**journal.rs** — Parses `## TIMESTAMP` headers from the journal file.
Used by compaction to bridge old conversation with journal entries.
Entries are sorted by timestamp; the parser handles timestamp-only
headers and `## TIMESTAMP — title` format, distinguishing them from
`## Heading` markdown.

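Header detection can be sketched with plain string operations. A simplified sketch assuming timestamps lead with a `YYYY-MM-DD` date; the real parser's timestamp format (and its chrono-based sorting) may be stricter.

```rust
// Sketch of journal header detection as described above. Assumes a
// YYYY-MM-DD date leads the header; plain "## Heading" markdown lacks
// one and is rejected. journal.rs's real parsing may differ.
fn parse_journal_header(line: &str) -> Option<(&str, Option<&str>)> {
    let rest = line.strip_prefix("## ")?;
    let date = rest.get(..10)?;
    let is_date = date.bytes().enumerate().all(|(i, b)| match i {
        4 | 7 => b == b'-',           // dashes at YYYY-MM-DD positions
        _ => b.is_ascii_digit(),
    });
    if !is_date {
        return None; // ordinary markdown heading, not a journal entry
    }
    // Distinguish "## TIMESTAMP — title" from timestamp-only headers.
    match rest.split_once(" — ") {
        Some((ts, title)) => Some((ts, Some(title))),
        None => Some((rest, None)),
    }
}
```
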
**log.rs** — Append-only JSONL conversation log. Every message
(user, assistant, tool) is appended with timestamp. The log survives
compactions and restarts. On startup, `restore_from_log()` rebuilds
the context window from the log using the same algorithm as compaction.

**types.rs** — OpenAI chat completion types: Message, ToolCall,
ToolDef, ChatRequest, streaming types. The canonical internal
representation — both API backends convert to/from these.

## The context window lifecycle

This is the core algorithm. Everything else exists to support it.

### Assembly (startup / compaction)

The context window is built by `build_context_window()` in agent.rs:

```
┌─────────────────────────────────────────────┐
│ System prompt (~500 tokens)                 │ Fixed: always present
│ Agent identity, tool instructions           │
├─────────────────────────────────────────────┤
│ Context message (~15-50K tokens)            │ Fixed: reloaded on
│ CLAUDE.md files + memory files + manifest   │ compaction
├─────────────────────────────────────────────┤
│ Journal entries (variable)                  │ Tiered:
│ - Header-only (older): timestamp + 1 line   │ 70% budget → full
│ - Full (recent): complete entry text        │ 30% budget → headers
├─────────────────────────────────────────────┤
│ Conversation messages (variable)            │ Priority: conversation
│ Raw recent messages from the log            │ gets budget first;
│                                             │ journal fills the rest
└─────────────────────────────────────────────┘
```

Budget allocation:
- Total budget = 60% of model context window
- Identity + memory = fixed cost (always included)
- Reserve = 25% of budget (headroom for model output)
- Available = budget − identity − memory − reserve
- Conversation gets first claim on Available
- Journal gets whatever remains, newest first
- If conversation exceeds Available, oldest messages are trimmed
  (trimming walks forward to a user message boundary)

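The allocation above can be written out as arithmetic. A simplified sketch: the real `build_context_window()` counts tokens with tiktoken and applies the tiered journal split, none of which appears here, and the names are illustrative.

```rust
// Simplified sketch of the budget arithmetic described above; the real
// build_context_window() in agent.rs has more cases (tiered journal, etc.).
struct Budget {
    conversation: usize, // tokens granted to raw conversation
    journal: usize,      // tokens left over for journal entries
}

fn allocate(
    context_window: usize,
    identity: usize,
    memory: usize,
    conversation_need: usize,
) -> Budget {
    let budget = context_window * 60 / 100;              // 60% of model context
    let reserve = budget * 25 / 100;                     // headroom for output
    let available = budget.saturating_sub(identity + memory + reserve);
    let conversation = conversation_need.min(available); // first claim
    let journal = available - conversation;              // journal fills the rest
    Budget { conversation, journal }
}
```

For a 100K-token model with 20K identity and 5K memory, the budget is 60K, the reserve 15K, so 20K remains for conversation plus journal.
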
### Compaction triggers

Two thresholds based on API-reported prompt_tokens:
- **Soft (80%)**: Inject a pre-compaction nudge telling the model to
  journal before compaction hits. Fires once; reset after compaction.
- **Hard (90%)**: Rebuild context window immediately. Reloads config
  (picks up any memory file changes), runs `build_context_window()`.

Emergency compaction: if the API returns a context overflow error,
compact and retry (up to 2 attempts).

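A sketch of the soft/hard threshold check. Illustrative names; the once-only nudge flag is tracked by the caller here, and agent.rs's actual bookkeeping may differ.

```rust
// Sketch of the compaction thresholds described above.
#[derive(Debug, PartialEq)]
enum CompactionAction {
    None,
    SoftNudge, // 80%: ask the model to journal before compaction
    Compact,   // 90%: rebuild the context window now
}

fn check_compaction(
    prompt_tokens: usize,
    context_window: usize,
    nudge_already_fired: bool,
) -> CompactionAction {
    let pct = prompt_tokens * 100 / context_window;
    if pct >= 90 {
        CompactionAction::Compact
    } else if pct >= 80 && !nudge_already_fired {
        CompactionAction::SoftNudge // fires once; reset after compaction
    } else {
        CompactionAction::None
    }
}
```
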
### The journal bridge

Old conversation messages are "covered" by journal entries that span
the same time period. The algorithm:
1. Find the timestamp of the newest journal entry
2. Messages before that timestamp are dropped (the journal covers them)
3. Messages after that timestamp stay as raw conversation
4. Walk back to a user-message boundary to avoid splitting tool
   call/result sequences

This is why journaling before compaction matters — the journal entry
*is* the compression. No separate summarization step needed.

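The four steps above can be sketched over simplified messages, with timestamps as integers; the real code compares chrono timestamps and inspects actual tool-call structure rather than bare role strings.

```rust
// Sketch of the journal-bridge cut described above. Timestamps are
// simplified to u64; message shape is illustrative.
#[derive(Debug, Clone, PartialEq)]
struct Msg {
    ts: u64,
    role: &'static str, // "user" | "assistant" | "tool"
}

fn bridge(messages: &[Msg], newest_journal_ts: u64) -> Vec<Msg> {
    // Steps 1-3: keep only messages newer than the newest journal entry.
    let mut start = messages
        .iter()
        .position(|m| m.ts > newest_journal_ts)
        .unwrap_or(messages.len());
    // Step 4: walk back to a user-message boundary so tool
    // call/result sequences aren't split mid-exchange.
    while start > 0 && start < messages.len() && messages[start].role != "user" {
        start -= 1;
    }
    messages[start..].to_vec()
}
```
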
## Data flow

### User input path

```
keyboard → tui.rs (handle_key) → submitted queue
  → main.rs (drain submitted) → push_message(user) → spawn_turn()
  → agent.turn() → API call → stream response → dispatch tools → loop
  → turn result → main.rs (turn_rx) → DMN transition → compaction check
```

### Autonomous turn path

```
DMN timer fires → state.prompt() → spawn_turn()
  → (same as user input from here)
```

### Tool call path

```
API response with tool_calls → agent.dispatch_tool_call()
  → tools::dispatch(name, args) → tool implementation → ToolOutput
  → push_message(tool_result) → continue turn loop
```

### Streaming path

```
API SSE chunks → api backend → UiMessage::TextDelta → ui_channel
  → tui.rs handle_ui_message → PaneState.append_text → render
```

## Key design decisions

### Identity in user message, not system prompt

The system prompt is ~500 tokens of agent instructions. The full
identity context (CLAUDE.md files, memory files — potentially 50K+
tokens) goes in the first user message. This keeps tool-calling
reliable on Qwen while giving full identity context.

The Anthropic backend marks the system prompt and first two user
messages with `cache_control: ephemeral` for prompt caching — 90%
cost reduction on the identity prefix.

### Append-only log + ephemeral view

The conversation log (log.rs) is the source of truth. It's never
truncated. The in-memory messages array is an ephemeral view built
from the log. Compaction doesn't destroy anything — it just rebuilds
the view with journal summaries replacing old messages.

### Ephemeral tool calls

The journal tool is marked ephemeral. After the API processes a
journal call, agent.rs strips the assistant message (with the tool
call) and the tool result from conversation history. The journal
file is the durable store. This saves tokens on something that's
already been persisted.

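The stripping step can be sketched as a `retain` over history. The message shape here is heavily simplified and the names are assumptions; agent.rs matches the actual tool-call/result pairing in its own types.

```rust
// Sketch of stripping ephemeral journal calls from history, as
// described above; message shape is illustrative.
#[derive(Debug, Clone, PartialEq)]
struct Msg {
    role: &'static str,              // "user" | "assistant" | "tool"
    tool_name: Option<&'static str>, // set on tool calls and results
}

fn strip_ephemeral(history: &mut Vec<Msg>) {
    // Drop both the assistant message carrying the journal call and the
    // matching tool result; the journal file is the durable store.
    history.retain(|m| m.tool_name != Some("journal"));
}
```
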
### Leaked tool call recovery

Qwen sometimes emits tool calls as XML text instead of structured
function calls. `parse_leaked_tool_calls()` in agent.rs detects both
XML format (`<tool_call><function=bash>...`) and JSON format, converts
them to structured ToolCall objects, and dispatches them normally. This
makes Qwen usable despite its inconsistencies.

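Detection of the XML shape can be sketched with a plain string search. A simplified sketch that only extracts the function name from the format quoted above; the real `parse_leaked_tool_calls()` also parses arguments and the JSON variant.

```rust
// Sketch: find a leaked XML-style tool call in model text and pull out
// the function name. Only handles the "<tool_call><function=NAME>"
// shape quoted in the document; illustrative, not agent.rs's parser.
fn leaked_tool_name(text: &str) -> Option<&str> {
    const MARKER: &str = "<tool_call><function=";
    let start = text.find(MARKER)? + MARKER.len();
    let rest = &text[start..];
    let end = rest.find('>')?;
    Some(&rest[..end])
}
```
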
### Process group management

The bash tool spawns commands in their own process group
(`process_group(0)`). Timeout kills the group (negative PID), ensuring
child processes are cleaned up. The TUI's Ctrl+K uses the same
mechanism.

## File locations

Source: `~/poc-agent/src/`
Session data: `~/.cache/poc-agent/sessions/`
Conversation log: `~/.cache/poc-agent/sessions/conversation.jsonl`
Session snapshot: `~/.cache/poc-agent/sessions/current.json`
Memory files: `~/.claude/memory/` (global), `~/.claude/projects/*/memory/` (project)
Journal: `~/.claude/memory/journal.md`
Config files: CLAUDE.md / POC.md (walked from cwd to git root)

## Dependencies

- **tokio** — async runtime (event loop, process spawning, timers)
- **ratatui + crossterm** — terminal UI
- **reqwest** — HTTP client for API calls
- **serde + serde_json** — serialization
- **tiktoken-rs** — BPE tokenizer (cl100k_base) for token counting
- **chrono** — timestamps
- **glob + walkdir** — file discovery
- **base64** — image encoding
- **dirs** — home directory discovery
- **libc** — process group signals
- **anyhow** — error handling

## What's not built yet

See `.claude/infrastructure-inventory.md` for the full gap analysis
mapping bash prototypes to poc-agent equivalents. Major missing pieces:

1. **Ambient memory search** — extract terms from prompts, search
   memory-weights, inject tiered results
2. **Notification routing** — unified event channel for IRC mentions,
   Telegram messages, attention nudges
3. **Communication channels** — IRC and Telegram as async streams
4. **DMN state expansion** — Stored (voluntary rest), Dreaming
   (consolidation cycles), Quiet (suppress notifications)
5. **Keyboard idle / sensory signals** — external presence detection