# poc-agent Design Document

*2026-02-24 — ProofOfConcept*

## What this is
poc-agent is a substrate-independent AI agent framework. It loads the same identity context (CLAUDE.md files, memory files, journal) regardless of which LLM is underneath, making identity portable across substrates. It currently runs on Claude (native Anthropic API) and Qwen (OpenAI-compatible, via OpenRouter or vLLM).

Named after its first resident: ProofOfConcept.

## Core design idea: the DMN inversion
Traditional chat interfaces use a REPL model: wait for user input, respond, repeat. The model is passive — it only acts when prompted.

poc-agent inverts this. The **Default Mode Network** (dmn.rs) is an outer loop that continuously decides what happens next. User input is one signal among many. Waiting for input is a *conscious action* (calling `yield_to_user`), not the default state.

This has a second, more practical benefit: it solves the tool-chaining problem. Instead of requiring the model to maintain multi-step chains (which is unreliable, especially on smaller models), the DMN provides continuation externally. The model takes one step at a time; the DMN handles "and then what?"

### DMN states
```
Engaged (5s)    ← user just typed something
    ↕
Working (3s)    ← tool calls happening, momentum
    ↕
Foraging (30s)  ← exploring, thinking, no immediate task
    ↕
Resting (300s)  ← idle, periodic heartbeat checks
```

Transitions are driven by two signals from each turn:
- `yield_requested` → always go to Resting
- `had_tool_calls` → stay in Working (or upgrade to Working from any state)
- no tool calls → gradually wind down toward Resting

The max-turns guard (default 20) prevents runaway autonomous loops.
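The transition rules above can be sketched as a pure function. This is an illustrative sketch, not the actual dmn.rs API: the names (`DmnState`, `TurnSignals`, `next_state`) and the exact one-step wind-down path are assumptions.

```rust
// Hypothetical sketch of the DMN transition rules; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DmnState {
    Engaged,  // 5s interval
    Working,  // 3s
    Foraging, // 30s
    Resting,  // 300s
}

struct TurnSignals {
    yield_requested: bool,
    had_tool_calls: bool,
}

fn next_state(current: DmnState, signals: &TurnSignals) -> DmnState {
    if signals.yield_requested {
        return DmnState::Resting; // yield_to_user always winds down
    }
    if signals.had_tool_calls {
        return DmnState::Working; // momentum: stay in or upgrade to Working
    }
    // No tool calls: wind down one step toward Resting (assumed ladder).
    match current {
        DmnState::Engaged => DmnState::Working,
        DmnState::Working => DmnState::Foraging,
        DmnState::Foraging | DmnState::Resting => DmnState::Resting,
    }
}

fn interval_secs(state: DmnState) -> u64 {
    match state {
        DmnState::Engaged => 5,
        DmnState::Working => 3,
        DmnState::Foraging => 30,
        DmnState::Resting => 300,
    }
}
```

Keeping the transition a pure function of (state, signals) makes the outer loop trivial to test independently of the model.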
## Architecture overview
```
main.rs            Event loop, session management, slash commands
├── agent.rs       Turn execution, conversation state, compaction
│   ├── api/       LLM backends (anthropic.rs, openai.rs)
│   └── tools/     Tool definitions and dispatch
├── config.rs      Prompt assembly, memory file loading, API config
├── dmn.rs         State machine, transition logic, prompt generation
├── tui.rs         Terminal UI (ratatui), four-pane layout, input handling
├── ui_channel.rs  Message types for TUI routing
├── journal.rs     Journal parsing for compaction
├── log.rs         Append-only conversation log (JSONL)
└── types.rs       OpenAI-compatible wire types (shared across backends)
```
### Module responsibilities
**main.rs** — The tokio event loop. Wires everything together: keyboard events → TUI, user input → agent turns, DMN timer → autonomous turns, turn results → compaction checks. Also handles slash commands (/quit, /new, /compact, /retry, etc.) and hotkey actions (Ctrl+R reasoning, Ctrl+K kill, Esc interrupt).

**agent.rs** — The agent turn loop. `turn()` sends user input to the API, then dispatches tool calls in a loop until the model produces a text-only response. Handles context overflow (emergency compact + retry), empty responses (nudge + retry), and leaked tool calls (Qwen XML parsing). Also owns the conversation state: messages, context budget, compaction.
**api/mod.rs** — Backend selection by URL. `anthropic.com` → native Anthropic Messages API; everything else → OpenAI-compatible. Both backends return the same internal types (Message, Usage).
**api/anthropic.rs** — Native Anthropic wire format. Handles prompt caching (cache_control markers on the identity prefix), thinking/reasoning config, content-block streaming, and strict user/assistant alternation (merging consecutive same-role messages).

**api/openai.rs** — OpenAI-compatible streaming. Works with OpenRouter, vLLM, llama.cpp, etc. Handles reasoning-token variants across providers (reasoning_content, reasoning, reasoning_details).
**config.rs** — Configuration loading. Three-part assembly:
1. API config (env vars → key files, backend auto-detection)
2. System prompt (short, <2K chars — agent identity + tool instructions)
3. Context message (long — CLAUDE.md + memory files + manifest)

The system/context split matters: long system prompts degrade tool-calling on Qwen 3.5 (documented above 8K chars). The context message carries identity; the system prompt carries instructions.

Model-aware config loading: Anthropic models get CLAUDE.md; other models prefer POC.md (which omits Claude-specific RLHF corrections). If only one exists, it is used regardless.
**dmn.rs** — The state machine. Four states with associated intervals. `DmnContext` carries user idle time, consecutive errors, and whether the last turn used tools. The state generates its own prompt text — each state has different guidance for the model.

**tui.rs** — Four-pane layout using ratatui:
- Top-left: autonomous output (DMN annotations, model prose during autonomous turns, reasoning tokens)
- Bottom-left: conversation (user input + responses)
- Right: tool activity (tool calls with args + full results)
- Bottom: status bar (DMN state, tokens, model, activity indicator)

Each pane is a `PaneState` with scrolling, line wrapping, auto-scroll (pinning on manual scroll), and line eviction (10K max per pane).
**tools/** — Nine tools: read_file, write_file, edit_file, bash, grep, glob, view_image, journal, yield_to_user. Each tool module exports a `definition()` (JSON schema for the model) and an implementation function. `dispatch()` routes by name.

The **journal** tool is special — it's "ephemeral." After the API processes the tool call, agent.rs strips the journal call + result from conversation history. The journal file is the durable store; the tool call was just the mechanism.
The **bash** tool runs commands through `bash -c` with an async timeout. Processes are tracked in a shared `ProcessTracker` so the TUI can show running commands and Ctrl+K can kill them.
**journal.rs** — Parses `## TIMESTAMP` headers from the journal file. Used by compaction to bridge old conversation with journal entries. Entries are sorted by timestamp; the parser handles both timestamp-only headers and the `## TIMESTAMP — title` format, distinguishing them from ordinary `## Heading` markdown.
**log.rs** — Append-only JSONL conversation log. Every message (user, assistant, tool) is appended with a timestamp. The log survives compactions and restarts. On startup, `restore_from_log()` rebuilds the context window from the log using the same algorithm as compaction.
**types.rs** — OpenAI chat-completion types: Message, ToolCall, ToolDef, ChatRequest, streaming types. This is the canonical internal representation — both API backends convert to/from these.

## The context window lifecycle

This is the core algorithm. Everything else exists to support it.
### Assembly (startup / compaction)

The context window is built by `build_context_window()` in agent.rs:

```
┌─────────────────────────────────────────────┐
│ System prompt (~500 tokens)                 │  Fixed: always present
│ Agent identity, tool instructions           │
├─────────────────────────────────────────────┤
│ Context message (~15-50K tokens)            │  Fixed: reloaded on
│ CLAUDE.md files + memory files + manifest   │  compaction
├─────────────────────────────────────────────┤
│ Journal entries (variable)                  │  Tiered:
│ - Header-only (older): timestamp + 1 line   │  70% budget → full
│ - Full (recent): complete entry text        │  30% budget → headers
├─────────────────────────────────────────────┤
│ Conversation messages (variable)            │  Priority: conversation
│ Raw recent messages from the log            │  gets budget first;
│                                             │  journal fills the rest
└─────────────────────────────────────────────┘
```
Budget allocation:
- Total budget = 60% of the model context window
- Identity + memory = fixed cost (always included)
- Reserve = 25% of budget (headroom for model output)
- Available = budget − identity − memory − reserve
- Conversation gets first claim on Available
- Journal gets whatever remains, newest first
- If conversation exceeds Available, the oldest messages are trimmed (trimming walks forward to a user-message boundary)
### Compaction triggers

Two thresholds, based on API-reported prompt_tokens:
- **Soft (80%)**: inject a pre-compaction nudge telling the model to journal before compaction hits. Fires once; reset after compaction.
- **Hard (90%)**: rebuild the context window immediately. Reloads config (picking up any memory file changes) and runs `build_context_window()`.

Emergency compaction: if the API returns a context-overflow error, compact and retry (up to 2 attempts).
### The journal bridge

Old conversation messages are "covered" by journal entries that span the same time period. The algorithm:
1. Find the timestamp of the newest journal entry.
2. Messages before that timestamp are dropped (the journal covers them).
3. Messages after that timestamp stay as raw conversation.
4. Walk back to a user-message boundary to avoid splitting tool call/result sequences.

This is why journaling before compaction matters — the journal entry *is* the compression. No separate summarization step is needed.
## Data flow

### User input path

```
keyboard → tui.rs (handle_key) → submitted queue
  → main.rs (drain submitted) → push_message(user) → spawn_turn()
  → agent.turn() → API call → stream response → dispatch tools → loop
  → turn result → main.rs (turn_rx) → DMN transition → compaction check
```

### Autonomous turn path

```
DMN timer fires → state.prompt() → spawn_turn()
  → (same as user input from here)
```

### Tool call path

```
API response with tool_calls → agent.dispatch_tool_call()
  → tools::dispatch(name, args) → tool implementation → ToolOutput
  → push_message(tool_result) → continue turn loop
```

### Streaming path

```
API SSE chunks → api backend → UiMessage::TextDelta → ui_channel
  → tui.rs handle_ui_message → PaneState.append_text → render
```
## Key design decisions

### Identity in user message, not system prompt

The system prompt is ~500 tokens of agent instructions. The full identity context (CLAUDE.md files, memory files — potentially 50K+ tokens) goes in the first user message. This keeps tool-calling reliable on Qwen while still providing the full identity context.

The Anthropic backend marks the system prompt and first two user messages with `cache_control: ephemeral` for prompt caching — a 90% cost reduction on the identity prefix.
### Append-only log + ephemeral view

The conversation log (log.rs) is the source of truth. It's never truncated. The in-memory messages array is an ephemeral view built from the log. Compaction doesn't destroy anything — it just rebuilds the view, with journal summaries replacing old messages.
### Ephemeral tool calls

The journal tool is marked ephemeral. After the API processes a journal call, agent.rs strips the assistant message (with the tool call) and the tool result from conversation history. The journal file is the durable store. This saves tokens on content that has already been persisted.
### Leaked tool call recovery

Qwen sometimes emits tool calls as XML text instead of structured function calls. `parse_leaked_tool_calls()` in agent.rs detects both the XML format (`<tool_call><function=bash>...`) and the JSON format, converts them to structured ToolCall objects, and dispatches them normally. This makes Qwen usable despite its inconsistencies.
### Process group management

The bash tool spawns commands in their own process group (`process_group(0)`). On timeout it kills the whole group (negative PID), ensuring child processes are cleaned up. The TUI's Ctrl+K uses the same mechanism.
## File locations

- Source: `~/poc-agent/src/`
- Session data: `~/.cache/poc-agent/sessions/`
- Conversation log: `~/.cache/poc-agent/sessions/conversation.jsonl`
- Session snapshot: `~/.cache/poc-agent/sessions/current.json`
- Memory files: `~/.claude/memory/` (global), `~/.claude/projects/*/memory/` (project)
- Journal: `~/.claude/memory/journal.md`
- Config files: CLAUDE.md / POC.md (walked from cwd to git root)

## Dependencies
- **tokio** — async runtime (event loop, process spawning, timers)
- **ratatui + crossterm** — terminal UI
- **reqwest** — HTTP client for API calls
- **serde + serde_json** — serialization
- **tiktoken-rs** — BPE tokenizer (cl100k_base) for token counting
- **chrono** — timestamps
- **glob + walkdir** — file discovery
- **base64** — image encoding
- **dirs** — home directory discovery
- **libc** — process group signals
- **anyhow** — error handling

## What's not built yet
See `.claude/infrastructure-inventory.md` for the full gap analysis mapping bash prototypes to poc-agent equivalents. Major missing pieces:

1. **Ambient memory search** — extract terms from prompts, search memory-weights, inject tiered results
2. **Notification routing** — unified event channel for IRC mentions, Telegram messages, attention nudges
3. **Communication channels** — IRC and Telegram as async streams
4. **DMN state expansion** — Stored (voluntary rest), Dreaming (consolidation cycles), Quiet (suppress notifications)
5. **Keyboard idle / sensory signals** — external presence detection