rename: poc-agent → agent, poc-daemon → thalamus
The thalamus: sensory relay, always-on routing. Perfect name for the daemon that bridges IRC, Telegram, and the agent.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
This commit is contained in: parent 998b71e52c, commit cfed85bd20. 105 changed files with 0 additions and 0 deletions.
# poc-agent Design Document

*2026-02-24 — ProofOfConcept*

## What this is

poc-agent is a substrate-independent AI agent framework. It loads the
same identity context (CLAUDE.md files, memory files, journal) regardless
of which LLM is underneath, making identity portable across substrates.
It currently runs on Claude (native Anthropic API) and Qwen
(OpenAI-compatible, via OpenRouter/vLLM).

Named after its first resident: ProofOfConcept.
## Core design idea: the DMN inversion

Traditional chat interfaces use a REPL model: wait for user input,
respond, repeat. The model is passive — it only acts when prompted.

poc-agent inverts this. The **Default Mode Network** (dmn.rs) is an
outer loop that continuously decides what happens next. User input is
one signal among many. The model waiting for input is a *conscious
action* (calling `yield_to_user`), not the default state.

This has a second, more practical benefit: it solves the tool-chaining
problem. Instead of needing the model to maintain multi-step chains
(which is unreliable, especially on smaller models), the DMN provides
continuation externally. The model takes one step at a time; the DMN
handles "and then what?"
### DMN states

```
Engaged  (5s)    ← user just typed something
    ↕
Working  (3s)    ← tool calls happening, momentum
    ↕
Foraging (30s)   ← exploring, thinking, no immediate task
    ↕
Resting  (300s)  ← idle, periodic heartbeat checks
```

Transitions are driven by two signals from each turn, `yield_requested`
and `had_tool_calls`:

- `yield_requested` → always go to Resting
- `had_tool_calls` → stay in Working (or upgrade to Working from any state)
- neither → gradually wind down toward Resting

The max-turns guard (default 20) prevents runaway autonomous loops.
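The wind-down rules above can be sketched as a pure transition function. This is a hedged sketch: `DmnState`, `TurnSignals`, and `transition` are illustrative names, not the actual dmn.rs API.

```rust
// Illustrative sketch of the DMN transition rules; the real dmn.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DmnState {
    Engaged,  // 5s
    Working,  // 3s
    Foraging, // 30s
    Resting,  // 300s
}

// Signals observed from the last turn (illustrative struct).
struct TurnSignals {
    yield_requested: bool,
    had_tool_calls: bool,
}

fn transition(state: DmnState, s: &TurnSignals) -> DmnState {
    use DmnState::*;
    if s.yield_requested {
        return Resting; // yield_to_user always goes to Resting
    }
    if s.had_tool_calls {
        return Working; // momentum: stay in (or upgrade to) Working
    }
    // no tool calls: wind down one step toward Resting
    match state {
        Engaged | Working => Foraging,
        Foraging | Resting => Resting,
    }
}

fn main() {
    let idle = TurnSignals { yield_requested: false, had_tool_calls: false };
    assert_eq!(transition(DmnState::Working, &idle), DmnState::Foraging);
    assert_eq!(transition(DmnState::Foraging, &idle), DmnState::Resting);
    println!("ok");
}
```

Keeping the function pure over (state, signals) is what lets the outer loop own continuation: the model never has to remember "what comes next."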
## Architecture overview

```
main.rs            Event loop, session management, slash commands
├── agent.rs       Turn execution, conversation state, compaction
│   ├── api/       LLM backends (anthropic.rs, openai.rs)
│   └── tools/     Tool definitions and dispatch
├── config.rs      Prompt assembly, memory file loading, API config
├── dmn.rs         State machine, transition logic, prompt generation
├── tui.rs         Terminal UI (ratatui), four-pane layout, input handling
├── ui_channel.rs  Message types for TUI routing
├── journal.rs     Journal parsing for compaction
├── log.rs         Append-only conversation log (JSONL)
└── types.rs       OpenAI-compatible wire types (shared across backends)
```

### Module responsibilities
**main.rs** — The tokio event loop. Wires everything together: keyboard
events → TUI, user input → agent turns, DMN timer → autonomous turns,
turn results → compaction checks. Also handles slash commands (/quit,
/new, /compact, /retry, etc.) and hotkey actions (Ctrl+R reasoning,
Ctrl+K kill, Esc interrupt).
**agent.rs** — The agent turn loop. `turn()` sends user input to the
API, then dispatches tool calls in a loop until the model produces a
text-only response. Handles context overflow (emergency compact + retry),
empty responses (nudge + retry), and leaked tool calls (Qwen XML parsing).
Also owns the conversation state: messages, context budget, compaction.
**api/mod.rs** — Backend selection by URL: `anthropic.com` → native
Anthropic Messages API; everything else → OpenAI-compatible. Both
backends return the same internal types (Message, Usage).
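The selection rule fits in a few lines. A minimal sketch, assuming an illustrative `Backend` enum and function name (not the actual api/mod.rs code):

```rust
// Illustrative sketch of URL-based backend selection.
#[derive(Debug, PartialEq)]
enum Backend {
    Anthropic,    // native Messages API
    OpenAiCompat, // OpenRouter, vLLM, llama.cpp, ...
}

fn select_backend(base_url: &str) -> Backend {
    if base_url.contains("anthropic.com") {
        Backend::Anthropic
    } else {
        Backend::OpenAiCompat
    }
}

fn main() {
    assert_eq!(select_backend("https://api.anthropic.com/v1"), Backend::Anthropic);
    assert_eq!(select_backend("https://openrouter.ai/api/v1"), Backend::OpenAiCompat);
    println!("ok");
}
```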
**api/anthropic.rs** — Native Anthropic wire format. Handles prompt
caching (cache_control markers on the identity prefix), thinking/reasoning
config, content-block streaming, and strict user/assistant alternation
(merging consecutive same-role messages).
**api/openai.rs** — OpenAI-compatible streaming. Works with OpenRouter,
vLLM, llama.cpp, etc. Handles the reasoning-token variants across
providers (reasoning_content, reasoning, reasoning_details).
**config.rs** — Configuration loading. Three-part assembly:

1. API config (env vars → key files, backend auto-detection)
2. System prompt (short, <2K chars — agent identity + tool instructions)
3. Context message (long — CLAUDE.md + memory files + manifest)

The system/context split matters: long system prompts degrade
tool-calling on Qwen 3.5 (documented above 8K chars). The context
message carries identity; the system prompt carries instructions.

Model-aware config loading: Anthropic models get CLAUDE.md; other models
prefer POC.md (which omits Claude-specific RLHF corrections). If only
one exists, it is used regardless.
**dmn.rs** — The state machine. Four states with associated intervals.
`DmnContext` carries user idle time, consecutive errors, and whether the
last turn used tools. The state generates its own prompt text — each
state has different guidance for the model.
**tui.rs** — Four-pane layout using ratatui:

- Top-left: autonomous output (DMN annotations, model prose during
  autonomous turns, reasoning tokens)
- Bottom-left: conversation (user input + responses)
- Right: tool activity (tool calls with args + full results)
- Bottom: status bar (DMN state, tokens, model, activity indicator)

Each pane is a `PaneState` with scrolling, line wrapping, auto-scroll
(pinning on manual scroll), and line eviction (10K max per pane).
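The eviction policy is the interesting part of `PaneState`. A minimal sketch of that one behavior, assuming a `VecDeque`-backed buffer (the real tui.rs struct also handles wrapping, scrolling, and pinning):

```rust
use std::collections::VecDeque;

// Illustrative sketch of per-pane line eviction (10K max per pane).
const MAX_LINES: usize = 10_000;

struct PaneState {
    lines: VecDeque<String>,
    auto_scroll: bool, // cleared when the user scrolls manually ("pinning")
}

impl PaneState {
    fn new() -> Self {
        PaneState { lines: VecDeque::new(), auto_scroll: true }
    }

    fn push_line(&mut self, line: String) {
        self.lines.push_back(line);
        while self.lines.len() > MAX_LINES {
            self.lines.pop_front(); // evict the oldest line
        }
    }
}

fn main() {
    let mut pane = PaneState::new();
    for i in 0..10_005 {
        pane.push_line(format!("line {i}"));
    }
    assert_eq!(pane.lines.len(), 10_000);
    assert_eq!(pane.lines.front().unwrap(), "line 5"); // first 5 evicted
    assert!(pane.auto_scroll);
    println!("ok");
}
```

A `VecDeque` makes both append and front-eviction O(1), which matters when tool output streams thousands of lines.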
**tools/** — Nine tools: read_file, write_file, edit_file, bash, grep,
glob, view_image, journal, yield_to_user. Each tool module exports a
`definition()` (the JSON schema shown to the model) and an implementation
function; `dispatch()` routes by name.
The **journal** tool is special — it is "ephemeral." After the API
processes the tool call, agent.rs strips the journal call and result from
conversation history. The journal file is the durable store; the tool
call was just the mechanism.
The **bash** tool runs commands through `bash -c` with an async timeout.
Processes are tracked in a shared `ProcessTracker` so the TUI can show
running commands and Ctrl+K can kill them.
**journal.rs** — Parses `## TIMESTAMP` headers from the journal file.
Used by compaction to bridge old conversation with journal entries.
Entries are sorted by timestamp; the parser handles both timestamp-only
headers and the `## TIMESTAMP — title` format, distinguishing them from
ordinary `## Heading` markdown.
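A hedged sketch of that classification: the real journal.rs parser presumably matches a concrete timestamp format; here a header counts as an entry only if it starts with a YYYY-MM-DD-shaped prefix, and `parse_entry_header` is a hypothetical name.

```rust
// Illustrative header classification: `## TIMESTAMP` and
// `## TIMESTAMP — title` are journal entries; `## Heading` is not.
fn parse_entry_header(line: &str) -> Option<(&str, Option<&str>)> {
    let rest = line.strip_prefix("## ")?;
    let b = rest.as_bytes();
    // Crude date check: the first 10 bytes look like "NNNN-NN-NN".
    let dated = b.len() >= 10
        && b[..10].iter().enumerate().all(|(i, c)| match i {
            4 | 7 => *c == b'-',
            _ => c.is_ascii_digit(),
        });
    if !dated {
        return None; // ordinary `## Heading` markdown
    }
    match rest.split_once(" — ") {
        Some((ts, title)) => Some((ts, Some(title))), // `## TIMESTAMP — title`
        None => Some((rest, None)),                   // timestamp-only header
    }
}

fn main() {
    assert_eq!(
        parse_entry_header("## 2026-02-24 — compaction notes"),
        Some(("2026-02-24", Some("compaction notes")))
    );
    assert_eq!(parse_entry_header("## 2026-02-24"), Some(("2026-02-24", None)));
    assert_eq!(parse_entry_header("## Heading"), None);
    println!("ok");
}
```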
**log.rs** — Append-only JSONL conversation log. Every message
(user, assistant, tool) is appended with a timestamp. The log survives
compactions and restarts. On startup, `restore_from_log()` rebuilds
the context window from the log using the same algorithm as compaction.
**types.rs** — OpenAI chat-completion types: Message, ToolCall,
ToolDef, ChatRequest, and the streaming types. This is the canonical
internal representation — both API backends convert to and from it.
## The context window lifecycle

This is the core algorithm. Everything else exists to support it.

### Assembly (startup / compaction)

The context window is built by `build_context_window()` in agent.rs:
```
┌─────────────────────────────────────────────┐
│ System prompt (~500 tokens)                 │ Fixed: always present
│   Agent identity, tool instructions         │
├─────────────────────────────────────────────┤
│ Context message (~15-50K tokens)            │ Fixed: reloaded on
│   CLAUDE.md files + memory files + manifest │ compaction
├─────────────────────────────────────────────┤
│ Journal entries (variable)                  │ Tiered:
│   - Header-only (older): timestamp + 1 line │ 70% budget → full
│   - Full (recent): complete entry text      │ 30% budget → headers
├─────────────────────────────────────────────┤
│ Conversation messages (variable)            │ Priority: conversation
│   Raw recent messages from the log          │ gets budget first;
│                                             │ journal fills the rest
└─────────────────────────────────────────────┘
```
Budget allocation:

- Total budget = 60% of the model context window
- Identity + memory = fixed cost (always included)
- Reserve = 25% of budget (headroom for model output)
- Available = budget − identity − memory − reserve
- Conversation gets first claim on Available
- Journal gets whatever remains, newest first
- If conversation exceeds Available, the oldest messages are trimmed
  (trimming walks forward to a user-message boundary)
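The arithmetic above, worked through with illustrative numbers (the `budgets` function is hypothetical, not the actual agent.rs code):

```rust
// Hypothetical sketch of the budget arithmetic described above.
fn budgets(model_ctx: usize, identity: usize, memory: usize) -> (usize, usize) {
    let budget = model_ctx * 60 / 100;  // total = 60% of the context window
    let reserve = budget * 25 / 100;    // headroom for model output
    let available = budget.saturating_sub(identity + memory + reserve);
    // Conversation claims `available` first; journal gets what remains.
    (budget, available)
}

fn main() {
    // e.g. a 200K-token window carrying 30K of identity + memory:
    let (budget, available) = budgets(200_000, 25_000, 5_000);
    assert_eq!(budget, 120_000);   // 60% of 200K
    assert_eq!(available, 60_000); // 120K − 30K reserve − 30K fixed
    println!("ok");
}
```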
### Compaction triggers

Two thresholds, based on API-reported prompt_tokens:

- **Soft (80%)**: inject a pre-compaction nudge telling the model to
  journal before compaction hits. Fires once; reset after compaction.
- **Hard (90%)**: rebuild the context window immediately. Reloads config
  (picking up any memory file changes) and runs `build_context_window()`.

Emergency compaction: if the API returns a context overflow error,
compact and retry (up to 2 attempts).
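A minimal sketch of the two-threshold check, assuming prompt_tokens comes from the API usage report; the enum and function names are illustrative, not the actual code:

```rust
// Illustrative sketch of the soft/hard compaction thresholds.
#[derive(Debug, PartialEq)]
enum CompactionAction {
    None,
    SoftNudge,   // tell the model to journal; fires once per cycle
    HardCompact, // rebuild the context window now
}

fn check_compaction(prompt_tokens: usize, ctx_window: usize, nudged: bool) -> CompactionAction {
    let frac = prompt_tokens as f64 / ctx_window as f64;
    if frac >= 0.90 {
        CompactionAction::HardCompact
    } else if frac >= 0.80 && !nudged {
        CompactionAction::SoftNudge // `nudged` resets after compaction
    } else {
        CompactionAction::None
    }
}

fn main() {
    assert_eq!(check_compaction(85_000, 100_000, false), CompactionAction::SoftNudge);
    assert_eq!(check_compaction(85_000, 100_000, true), CompactionAction::None);
    assert_eq!(check_compaction(95_000, 100_000, true), CompactionAction::HardCompact);
    println!("ok");
}
```

Using API-reported prompt_tokens rather than a local tokenizer estimate means the thresholds track what the provider actually charges for.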
### The journal bridge

Old conversation messages are "covered" by journal entries that span
the same time period. The algorithm:

1. Find the timestamp of the newest journal entry
2. Drop messages before that timestamp (the journal covers them)
3. Keep messages after that timestamp as raw conversation
4. Walk back to a user-message boundary to avoid splitting tool
   call/result sequences

This is why journaling before compaction matters — the journal entry
*is* the compression. No separate summarization step is needed.
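The steps above can be sketched as a single pass over the message list. This is a hedged sketch: the `Msg` type is illustrative, and the boundary direction (advancing the cut to the next user message) is one interpretation of step 4, not necessarily what agent.rs does.

```rust
// Illustrative sketch of the journal bridge.
#[derive(Clone, Debug, PartialEq)]
struct Msg {
    ts: i64,
    role: &'static str,
}

fn bridge(messages: &[Msg], newest_journal_ts: i64) -> Vec<Msg> {
    // Steps 1-3: messages at or before the newest journal entry are
    // covered by it and dropped.
    let mut start = messages
        .iter()
        .position(|m| m.ts > newest_journal_ts)
        .unwrap_or(messages.len());
    // Step 4: move the cut to a user-message boundary so a tool
    // call/result sequence is never split mid-exchange.
    while start < messages.len() && messages[start].role != "user" {
        start += 1;
    }
    messages[start..].to_vec()
}

fn main() {
    let msgs = [
        Msg { ts: 1, role: "user" },
        Msg { ts: 2, role: "assistant" },
        Msg { ts: 3, role: "tool" }, // would be orphaned without step 4
        Msg { ts: 4, role: "user" },
        Msg { ts: 5, role: "assistant" },
    ];
    let kept = bridge(&msgs, 2);
    assert_eq!(kept.len(), 2); // cut advances past the stray tool result
    assert_eq!(kept[0].ts, 4);
    println!("ok");
}
```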
## Data flow

### User input path

```
keyboard → tui.rs (handle_key) → submitted queue
  → main.rs (drain submitted) → push_message(user) → spawn_turn()
  → agent.turn() → API call → stream response → dispatch tools → loop
  → turn result → main.rs (turn_rx) → DMN transition → compaction check
```

### Autonomous turn path

```
DMN timer fires → state.prompt() → spawn_turn()
  → (same as user input from here)
```

### Tool call path

```
API response with tool_calls → agent.dispatch_tool_call()
  → tools::dispatch(name, args) → tool implementation → ToolOutput
  → push_message(tool_result) → continue turn loop
```

### Streaming path

```
API SSE chunks → api backend → UiMessage::TextDelta → ui_channel
  → tui.rs handle_ui_message → PaneState.append_text → render
```
## Key design decisions

### Identity in user message, not system prompt

The system prompt is ~500 tokens of agent instructions. The full
identity context (CLAUDE.md files, memory files — potentially 50K+
tokens) goes in the first user message. This keeps tool-calling
reliable on Qwen while still providing the full identity context.

The Anthropic backend marks the system prompt and first two user
messages with `cache_control: ephemeral` for prompt caching — a 90%
cost reduction on the identity prefix.
### Append-only log + ephemeral view

The conversation log (log.rs) is the source of truth. It is never
truncated. The in-memory messages array is an ephemeral view built
from the log. Compaction doesn't destroy anything — it just rebuilds
the view with journal summaries replacing old messages.
### Ephemeral tool calls

The journal tool is marked ephemeral. After the API processes a
journal call, agent.rs strips the assistant message (with the tool
call) and the tool result from conversation history. The journal
file is the durable store. This saves tokens on something that has
already been persisted.
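The stripping step reduces to a retain over history. A minimal sketch; the `Msg` shape and `tool_name` field are assumptions, not the actual agent.rs types:

```rust
// Illustrative sketch of stripping an ephemeral journal exchange from
// history once the API has seen it.
#[derive(Debug, PartialEq)]
struct Msg {
    role: &'static str,
    tool_name: Option<&'static str>, // set on tool calls and tool results
}

fn strip_ephemeral(history: &mut Vec<Msg>) {
    // Drop both the assistant message carrying the journal call and its
    // tool result; the journal file itself is the durable store.
    history.retain(|m| m.tool_name != Some("journal"));
}

fn main() {
    let mut history = vec![
        Msg { role: "user", tool_name: None },
        Msg { role: "assistant", tool_name: Some("journal") },
        Msg { role: "tool", tool_name: Some("journal") },
        Msg { role: "assistant", tool_name: None },
    ];
    strip_ephemeral(&mut history);
    assert_eq!(history.len(), 2); // only the journal exchange is gone
    println!("ok");
}
```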
### Leaked tool call recovery

Qwen sometimes emits tool calls as XML text instead of structured
function calls. `parse_leaked_tool_calls()` in agent.rs detects both
the XML format (`<tool_call><function=bash>...`) and the JSON format,
converts them to structured ToolCall objects, and dispatches them
normally. This makes Qwen usable despite its inconsistencies.
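A hedged sketch of the XML shape only, built from the fragment quoted above; the real `parse_leaked_tool_calls()` also handles a JSON variant and is presumably more tolerant of malformed markup:

```rust
// Illustrative parse of a leaked `<tool_call><function=NAME>ARGS...` blob.
fn parse_leaked_xml(text: &str) -> Option<(String, String)> {
    let inner = text.trim().strip_prefix("<tool_call>")?;
    let inner = inner.strip_prefix("<function=")?;
    let (name, rest) = inner.split_once('>')?;
    let args = rest.strip_suffix("</function></tool_call>")?;
    Some((name.to_string(), args.trim().to_string()))
}

fn main() {
    let leaked = r#"<tool_call><function=bash>{"command":"ls"}</function></tool_call>"#;
    let (name, args) = parse_leaked_xml(leaked).unwrap();
    assert_eq!(name, "bash");
    assert_eq!(args, r#"{"command":"ls"}"#);
    println!("ok");
}
```

Once recovered, the (name, args) pair can be wrapped in a structured ToolCall and dispatched through the normal path.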
### Process group management

The bash tool spawns commands in their own process group
(`process_group(0)`). On timeout, the whole group is killed (via a
negative PID), ensuring child processes are cleaned up. The TUI's
Ctrl+K uses the same mechanism.
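A Unix-only sketch of the pattern: spawn in a fresh process group, then signal the whole group with a negative PID. The real code uses `libc` directly; `/bin/kill` is used here so the sketch stays std-only, and the timeout is faked with a sleep.

```rust
// Illustrative process-group cleanup; assumes a Unix system with bash
// and an external `kill` binary on PATH.
use std::os::unix::process::CommandExt;
use std::process::Command;
use std::thread;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    // process_group(0) puts the child in its own group (pgid = its pid),
    // so backgrounded grandchildren share the group.
    let mut child = Command::new("bash")
        .args(["-c", "sleep 30 & sleep 30"])
        .process_group(0)
        .spawn()?;

    thread::sleep(Duration::from_millis(100)); // pretend the timeout fired

    // A negative PID addresses the whole group, cleaning up children too.
    let pgid = child.id();
    Command::new("kill")
        .args(["-TERM", "--", &format!("-{pgid}")])
        .status()?;

    child.wait()?;
    println!("group killed");
    Ok(())
}
```

Without the group, killing only the direct child would orphan the backgrounded `sleep`.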
## File locations

- Source: `~/poc-agent/src/`
- Session data: `~/.cache/poc-agent/sessions/`
- Conversation log: `~/.cache/poc-agent/sessions/conversation.jsonl`
- Session snapshot: `~/.cache/poc-agent/sessions/current.json`
- Memory files: `~/.claude/memory/` (global), `~/.claude/projects/*/memory/` (project)
- Journal: `~/.claude/memory/journal.md`
- Config files: CLAUDE.md / POC.md (walked from cwd to git root)
## Dependencies

- **tokio** — async runtime (event loop, process spawning, timers)
- **ratatui + crossterm** — terminal UI
- **reqwest** — HTTP client for API calls
- **serde + serde_json** — serialization
- **tiktoken-rs** — BPE tokenizer (cl100k_base) for token counting
- **chrono** — timestamps
- **glob + walkdir** — file discovery
- **base64** — image encoding
- **dirs** — home directory discovery
- **libc** — process group signals
- **anyhow** — error handling
## What's not built yet

See `.claude/infrastructure-inventory.md` for the full gap analysis
mapping bash prototypes to poc-agent equivalents. Major missing pieces:

1. **Ambient memory search** — extract terms from prompts, search
   memory-weights, inject tiered results
2. **Notification routing** — unified event channel for IRC mentions,
   Telegram messages, attention nudges
3. **Communication channels** — IRC and Telegram as async streams
4. **DMN state expansion** — Stored (voluntary rest), Dreaming
   (consolidation cycles), Quiet (suppress notifications)
5. **Keyboard idle / sensory signals** — external presence detection