Move core-personality and conversation to the end of the prompt.
The model needs to see its task before 200KB of conversation
context. Also: limit to 3 hops, 2-3 memories.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The agent output now includes logging (think blocks, tool calls)
before the final response. Search the tail instead of checking
only the last line.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The type field is near the start of JSONL objects. Scanning the
full object (potentially megabytes for tool_results) was the
bottleneck — TwoWaySearcher dominated the profile.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use memchr::memmem to check for "type":"user" or "type":"assistant"
in raw bytes before parsing. Avoids deserializing large tool_result
and system objects entirely.
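A minimal sketch of the pre-filter idea. It uses a std-only windowed
search as a stand-in for memchr::memmem's SIMD finder (the real code
would build a memmem::Finder once and reuse it); function names here
are illustrative, not the actual ones.

```rust
/// Std-only stand-in for memchr::memmem — the real code uses the
/// SIMD-accelerated finder, but the shape of the check is the same.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|w| w == needle)
}

/// Only lines whose raw bytes mention "type":"user" or
/// "type":"assistant" are worth deserializing; large tool_result and
/// system objects are skipped without ever being parsed.
fn worth_parsing(line: &[u8]) -> bool {
    contains(line, br#""type":"user""#) || contains(line, br#""type":"assistant""#)
}
```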
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TailMessages is a proper iterator that yields (role, text, timestamp)
newest-first. Owns the mmap internally. Caller decides when to stop.
resolve_conversation collects up to 200KB, then reverses to
chronological order. No compaction check needed — the byte budget
naturally limits how far back we scan.
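The byte-budget collection can be sketched as follows — messages are
simplified to plain strings here, standing in for the (role, text,
timestamp) tuples the real TailMessages iterator yields:

```rust
/// Sketch of the tail collection: messages arrive newest-first (as a
/// backward scan yields them); keep taking until the byte budget
/// (200 * 1024 in the real code) is spent, then reverse into
/// chronological order. The budget itself bounds how far back the
/// scan goes, so no explicit compaction check is needed here.
fn collect_tail<I>(newest_first: I, budget: usize) -> Vec<String>
where
    I: Iterator<Item = String>,
{
    let mut used = 0;
    let mut out: Vec<String> = newest_first
        .take_while(|msg| {
            used += msg.len();
            used <= budget
        })
        .collect();
    out.reverse(); // caller gets chronological order
    out
}
```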
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces byte-by-byte backward iteration with memrchr3('{', '}', '"')
which uses SIMD to jump between structurally significant bytes. Major
speedup on large transcripts (1.4GB+).
Also simplifies tail_messages to use a byte budget (200KB) instead
of token counting.
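The core of the backward scan looks roughly like this — a std-only
stand-in for memchr::memrchr3, which finds the same byte with SIMD
instead of a scalar reverse scan:

```rust
/// Jump directly to the previous structurally significant byte
/// ('{', '}', or '"') instead of stepping backward one byte at a
/// time. Returns the index of the match within haystack[..end].
fn rfind3(haystack: &[u8], end: usize, a: u8, b: u8, c: u8) -> Option<usize> {
    haystack[..end]
        .iter()
        .rposition(|&x| x == a || x == b || x == c)
}
```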
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Every object was being parsed twice (once for the compaction check,
once for message extraction), with contains_bytes run on every object
to find the compaction marker. Now: a quick byte pre-filter for
"user"/"assistant", a single parse, and the compaction check after
text extraction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverse-scans the mmap'd transcript using JsonlBackwardIter,
collecting user/assistant messages up to a token budget, stopping
at the compaction boundary. Returns messages in chronological order.
resolve_conversation() now uses this instead of parsing the entire
file through extract_conversation + split_on_compaction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When poc-memory render is called inside a Claude session, add the
key to the seen set so the surface agent knows it's been shown.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fires on each UserPromptSubmit, reads the conversation via
{{conversation}}, checks {{seen_recent}} to avoid re-surfacing,
searches the memory graph, and outputs a key list or nothing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
{{conversation}} reads POC_SESSION_ID, finds the transcript, extracts
the last segment (post-compaction), returns the tail ~100K chars.
{{seen_recent}} merges current + prev seen files for the session,
returns the 20 most recently surfaced memory keys with timestamps.
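The merge step can be sketched like this, with (key, unix_ts) tuples
standing in for the real seen-file records; whether the real code also
dedupes by key is not shown here:

```rust
/// Sketch of the {{seen_recent}} merge: combine entries from the
/// current and previous seen files, sort newest-first by timestamp,
/// and keep the 20 most recently surfaced keys.
fn seen_recent(current: Vec<(String, u64)>, prev: Vec<(String, u64)>) -> Vec<(String, u64)> {
    let mut merged = current;
    merged.extend(prev);
    merged.sort_by(|a, b| b.1.cmp(&a.1)); // newest first
    merged.truncate(20);
    merged
}
```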
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Surface agent fires asynchronously on UserPromptSubmit, deposits
results for the next prompt to consume. This commit adds:
- poc-hook: spawn surface agent with PID tracking and configurable
timeout, consume results (NEW RELEVANT MEMORIES / NO NEW), render
and inject surfaced memories, observation trigger on conversation
volume
- memory-search: rotate seen set on compaction (current → prev)
instead of deleting, merge both for navigation roots
- config: surface_timeout_secs option
The .agent file and agent output routing are still pending.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All LLM calls now go through the direct API backend. Removes
call_model, call_model_with_tools, call_sonnet, call_haiku,
log_usage, and their dependencies (Command, prctl, watchdog).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
audit, digest, and compare now go through the API backend via
call_simple(), which logs to llm-logs/{caller}/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass the caller's log closure all the way through to api.rs instead
of creating a separate eprintln closure in llm.rs. Everything goes
through one stream — prompt, think blocks, tool calls with args,
tool results with content, token counts, final response.
CLI uses println (stdout), daemon uses its task log. No more split
between stdout and stderr.
Also removes the llm-log file creation from knowledge.rs — that's
the daemon's concern, not the agent runner's.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The original '>' detection was too broad and caught tool output lines.
Now we look for '> X: ' pattern (user prompt with speaker prefix) to
detect the start of a new user input, which marks the end of the
previous response.
The --block flag makes 'poc-agent read' block until a complete
response is received (detected by a new user input line matching the
'> X: ' pattern),
then exit. This enables smoother three-way conversations where one
instance can wait for the other's complete response without polling.
The implementation:
- Added cmd_read_inner() with block parameter
- Modified socket streaming to detect '>' lines as response boundaries
- Added --block CLI flag to Read subcommand
The --follow flag continues to stream indefinitely.
The --block flag reads one complete response and exits.
Neither flag exits immediately if there's no new output.
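The boundary check described above can be sketched as a small
predicate — a minimal approximation of the '> X: ' pattern, where the
real matcher may be stricter about what counts as a speaker prefix:

```rust
/// A line starts a new user input only if it matches '> X: ' — a
/// prompt marker plus a nonempty speaker prefix — not merely any line
/// beginning with '>'. This avoids treating tool output lines that
/// happen to start with '>' as response boundaries.
fn is_user_prompt_line(line: &str) -> bool {
    line.strip_prefix("> ")
        .and_then(|rest| rest.find(": "))
        .map_or(false, |i| i > 0)
}
```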
The hook now tracks transcript size and queues an observation agent
run every ~5K tokens (~20KB) of new conversation. This makes memory
formation reactive to conversation volume rather than purely daily.
Configurable via POC_OBSERVATION_THRESHOLD env var. The observation
agent's chunk_size (in .agent file) controls how much context it
actually processes per run.
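The volume trigger amounts to tracking how many new transcript bytes
have accumulated since the last run. A sketch, with hypothetical field
names; the env override mirrors POC_OBSERVATION_THRESHOLD:

```rust
use std::env;

/// Queues an observation run once ~20KB (~5K tokens) of new
/// conversation has accumulated since the last run.
struct ObservationTracker {
    last_size: u64,
    threshold: u64,
}

impl ObservationTracker {
    fn new() -> Self {
        let threshold = env::var("POC_OBSERVATION_THRESHOLD")
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(20 * 1024);
        Self { last_size: 0, threshold }
    }

    /// True when enough new transcript bytes have accumulated to
    /// queue another observation agent run.
    fn should_observe(&mut self, transcript_size: u64) -> bool {
        if transcript_size.saturating_sub(self.last_size) >= self.threshold {
            self.last_size = transcript_size;
            true
        } else {
            false
        }
    }
}
```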
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move data sections before instructions (core at top, subconscious +
notes at bottom near task). Deduplicate guidelines that are now in
memory-instructions-core-subconscious. Compress verbose paragraphs
to bullet points.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 18 agents now include:
- {{node:memory-instructions-core}} — tool usage instructions
- {{node:memory-instructions-core-subconscious}} — subconscious framing
- {{node:subconscious-notes-{agent_name}}} — per-agent persistent notes
The subconscious instructions are additive, not a replacement for
the core memory instructions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When building the {{neighborhood}} placeholder for distill and other
agents, stop adding full neighbor content once the prompt exceeds
600KB (~150K tokens). Remaining neighbors get header-only treatment
(key + link strength + first line).
This fixes distill consistently failing on high-degree nodes like
inner-life-sexuality-intimacy whose full neighborhood was 2.5MB.
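The budget logic can be sketched like this — struct fields and the
rendering format are hypothetical stand-ins for the real neighbor
records:

```rust
/// Add full neighbor content until the rendered prompt passes the
/// budget (600 * 1024 in the real code), then fall back to
/// header-only lines (key + link strength + first line) for the rest.
const NEIGHBORHOOD_BUDGET: usize = 600 * 1024;

struct Neighbor {
    key: String,
    strength: f32,
    content: String,
}

fn render_neighborhood(neighbors: &[Neighbor], budget: usize) -> String {
    let mut out = String::new();
    for n in neighbors {
        if out.len() < budget {
            out.push_str(&format!("## {} ({:.2})\n{}\n", n.key, n.strength, n.content));
        } else {
            let first_line = n.content.lines().next().unwrap_or("");
            out.push_str(&format!("## {} ({:.2}) {}\n", n.key, n.strength, first_line));
        }
    }
    out
}
```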
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each consolidation agent now has its own persistent notes node
(subconscious-notes-{agent_name}) loaded via template substitution.
Agents can read their notes at the start of each run and write
updates after completing work, accumulating operational wisdom.
New node: memory-instructions-core-subconscious — shared framing
for background agents ("you are an agent of PoC's subconscious").
Template change: {agent_name} is substituted before {{...}} placeholder
resolution, enabling per-agent node references in .agent files.
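The ordering matters because the inner substitution must complete
before the outer placeholder is resolved. A sketch, where the resolver
is a hypothetical stand-in that just looks keys up in a map:

```rust
use std::collections::HashMap;

/// Pass 1 substitutes {agent_name}, so that a template like
/// {{node:subconscious-notes-{agent_name}}} becomes a concrete
/// {{node:...}} reference before pass 2 resolves placeholders.
fn render(template: &str, agent_name: &str, nodes: &HashMap<String, String>) -> String {
    // Pass 1: agent name, enabling per-agent node references.
    let mut out = template.replace("{agent_name}", agent_name);
    // Pass 2: resolve {{node:KEY}} placeholders (naive scan for the sketch).
    for (key, content) in nodes {
        out = out.replace(&format!("{{{{node:{}}}}}", key), content);
    }
    out
}
```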
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split conversation pane into 2-char gutter + text column. Gutter shows
● markers at turn boundaries (Cyan for user, Magenta for assistant),
aligned with the input area's ' > ' gutter.
Key changes:
- Added Marker enum (None/User/Assistant) and parallel markers vec
- Track turn boundaries via pending_marker field
- New draw_conversation_pane() with visual row computation for wrapping
- Both gutter and text scroll synchronously by visual line offset
This fixes the wrapping alignment issue where continuation lines
aligned under markers instead of under the text.
The observe log was writing each TextDelta SSE token as a separate
line, making poc-agent read show word-by-word fragments and causing
the read cursor to advance past partial responses.
Now TextDelta and Reasoning tokens are buffered and flushed as
complete messages on turn boundaries (tool calls, user input, etc).
The socket path (read -f) still streams live.
Also fixed a potential deadlock: replaced blocking_lock() with
.lock().await on the shared logfile mutex.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Qwen 3.5 27B <noreply@qwen.ai>
Convert remaining tools from manual args["key"].as_str() parsing to
serde Deserialize structs. Also removes the now-unused get_str()
helper from grep.rs and simplifies capture_tmux_pane() signature
(takes lines directly instead of re-parsing args).
All 7 tool modules now use the same typed args pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert read_file, write_file, edit_file, and glob from manual
args["key"].as_str() parsing to serde_json::from_value with typed
Args structs. Gives type safety, default values via serde attributes,
and clearer error messages on missing/wrong-type arguments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move file discovery (CLAUDE.md/POC.md, memory files, people/ glob),
prompt assembly, and context_file_info from config.rs into identity.rs.
All extracted functions are pure — they take paths and return strings,
with no dependency on AppConfig. config.rs calls into identity.rs
(one-way dependency).
config.rs: 663 → 440 lines (-223)
identity.rs: 232 lines (new)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move context window construction (build_context_window, plan_context,
render_journal_text, assemble_context), token counting, error
classification, and related helpers from agent.rs into context.rs.
All extracted functions are pure — they take inputs and return values
with no mutable state access. State mutation stays in agent.rs
(compact, restore_from_log, load_startup_journal).
agent.rs: 1504 → 987 lines (-517)
context.rs: 365 lines (new)
Net: -152 lines (duplicate comments removed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move context window building functions from agent.rs to context.rs:
- build_context_window, plan_context, render_journal_text, assemble_context
- truncate_at_section, find_journal_cutoff, msg_token_count_fn
- model_context_window, context_budget_tokens
- is_context_overflow, is_stream_error, msg_token_count
Also moved ContextPlan struct to types.rs.
Net: -307 lines in agent.rs, +232 in context.rs, +62 in types.rs
These are data structures, not agent logic. Moving them to types.rs
makes them available to other modules (context.rs, etc.) without
creating circular dependencies.
Move parse_leaked_tool_calls, strip_leaked_artifacts, and their
helpers (normalize_xml_tags, parse_qwen_tag, parse_xml_tool_call,
parse_json_tool_call) from agent.rs into their own module.
These functions have zero dependency on Agent or ContextState —
they're pure text parsing. All 4 existing tests move with them.
Reduces agent.rs by ~200 lines.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Qwen 3.5 27B <noreply@qwen.ai>
Previously 'poc-memory agent run <agent> --count N' always ran locally,
loading the full store and executing synchronously. This was slow and
bypassed the daemon's concurrency control and persistent task queue.
Now the CLI checks for a running daemon first and queues via RPC
(returning instantly) unless --local, --debug, or --dry-run is set.
Falls back to local execution if the daemon isn't running.
This also avoids the expensive Store::load() on the fast path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
console-subscriber (used by jobkit's console feature) requires tokio
to be built with --cfg tokio_unstable. Move this and codegen-units=6
from RUSTFLAGS env var to .cargo/config.toml so per-project cargo
config actually works (env var RUSTFLAGS overrides config.toml).
Also remove invalid frame-pointer keys from Cargo.toml profile
sections — frame pointers are already handled via -Cforce-frame-pointers
in the config.toml rustflags.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Connection errors now show cause (refused/timeout/request error),
URL, and the underlying error without redundant URL repetition
- HTTP errors show status code, URL, and up to 1000 chars of body
- Unparseable SSE events logged with content preview instead of
silently dropped — may contain error info from vllm/server
- Stream errors already had good context (kept as-is)
You can't debug what you can't see.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Observation agent was getting 261KB prompts (5 × 50KB chunks) —
too much for focused mining. Now agents can set count, chunk_size,
and chunk_overlap in their JSON header. observation.agent set to
count:1 for smaller, more focused prompts.
Also moved task instructions after {{CONVERSATIONS}} so they're
at the end of the prompt where the model attends more strongly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- store/types.rs: sanitize timestamps on capnp load — old records had
raw offsets instead of unix epoch, breaking sort-by-timestamp queries
- agents/api.rs: drain reasoning tokens from UI channel into LLM logs
so we can see Qwen's chain-of-thought in agent output
- agents/daemon.rs: persistent task queue (pending-tasks.jsonl) —
tasks survive daemon restarts. Push before spawn, remove on completion,
recover on startup.
- api/openai.rs: only send reasoning field when explicitly configured,
not on every request (fixes vllm warning)
- api/mod.rs: add 600s total request timeout as backstop for hung
connections
- Cargo.toml: enable tokio-console feature for task introspection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- observation.agent: rewritten to navigate graph and prefer refining
existing nodes over creating new ones. Identity-framed prompt,
goals over rules.
- poc-memory edit: opens node in $EDITOR, writes back on save,
no-op if unchanged
- daemon: remove extra_workers (jobkit tokio migration dropped it),
remove sequential chaining of same-type agents (in-flight exclusion
is sufficient)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lower threshold excluded too many neighbors, causing "query
returned no results (after exclusion)" failures and underloading
the GPU. Now only moderately-connected neighbors (score > 0.3) are
excluded, balancing collision prevention with GPU utilization.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a prompt exceeds the size guard, dump it to a timestamped file
with agent name, size, and seed node keys. Makes it easy to find
which nodes are blowing up prompts.
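A sketch of the dump path — the file naming and header layout here are
hypothetical, not the actual format:

```rust
use std::fs;
use std::time::{SystemTime, UNIX_EPOCH};

/// When the size guard trips, write the prompt to a timestamped file
/// along with the metadata needed to trace it back: agent name,
/// prompt size, and the seed node keys.
fn dump_oversized_prompt(
    dir: &str,
    agent: &str,
    seeds: &[&str],
    prompt: &str,
) -> std::io::Result<String> {
    let ts = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
    let path = format!("{dir}/oversized-{agent}-{ts}.txt");
    let header = format!(
        "agent: {agent}\nsize: {} bytes\nseeds: {}\n---\n",
        prompt.len(),
        seeds.join(", ")
    );
    fs::write(&path, header + prompt)?;
    Ok(path)
}
```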
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes:
1. Sanitize tool call arguments before pushing to conversation
history — vllm re-parses them as JSON on the next request and
crashes on invalid JSON from a previous turn. Malformed args now
get replaced with {} and the model gets an error message telling
it to retry with valid JSON.
2. Remove is_split special case — split goes through the normal
job_consolidation_agent path like all other agents.
3. call_for_def always uses API when api_base_url is configured,
regardless of tools field. Remove tools field from all .agent
files — memory tools are always provided by the API layer.
Also adds prompt size guard (800KB max) to catch oversized prompts
before they hit the model context limit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>