Pass the caller's log closure all the way through to api.rs instead
of creating a separate eprintln closure in llm.rs. Everything goes
through one stream — prompt, think blocks, tool calls with args,
tool results with content, token counts, final response.
CLI uses println (stdout), daemon uses its task log. No more split
between stdout and stderr.
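A minimal sketch of the single-stream idea, assuming the closure is passed as `&dyn Fn(&str)`. The function names `run_agent` and `api_call` and the logged strings are hypothetical stand-ins for the llm.rs and api.rs layers, not the real signatures:

```rust
/// Lowest layer (stands in for api.rs): every event goes through `log`.
fn api_call(prompt: &str, log: &dyn Fn(&str)) -> String {
    log(&format!("prompt: {prompt}"));
    log("tool call: memory_search");
    log("tool result: 2 hits");
    let response = String::from("done");
    log(&format!("response: {response}"));
    response
}

/// Middle layer (stands in for llm.rs): forwards the caller's closure
/// instead of building its own eprintln-based logger.
fn run_agent(prompt: &str, log: &dyn Fn(&str)) -> String {
    api_call(prompt, log)
}
```

Under this shape a CLI caller could pass `&|s| println!("{s}")`, while a daemon caller could pass a closure that appends to its task log.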
Also removes the llm-log file creation from knowledge.rs — that's
the daemon's concern, not the agent runner's.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- store/types.rs: sanitize timestamps on capnp load — old records had
raw offsets instead of unix epoch, breaking sort-by-timestamp queries
- agents/api.rs: drain reasoning tokens from UI channel into LLM logs
so we can see Qwen's chain-of-thought in agent output
- agents/daemon.rs: persistent task queue (pending-tasks.jsonl) —
tasks survive daemon restarts. Push before spawn, remove on completion,
recover on startup.
- api/openai.rs: only send reasoning field when explicitly configured,
not on every request (fixes vllm warning)
- api/mod.rs: add 600s total request timeout as backstop for hung
connections
- Cargo.toml: enable tokio-console feature for task introspection
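The persistent task queue item above (push before spawn, remove on completion, recover on startup) could look roughly like the following. This is a std-only sketch: the helper names and the one-task-per-line representation are assumptions, and the real daemon.rs may serialize tasks differently:

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append one serialized task to the queue file before the task is spawned.
fn push_task(path: &Path, task_line: &str) -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(f, "{task_line}")
}

/// On startup, read back whatever was still pending (missing file = empty queue).
fn recover_tasks(path: &Path) -> io::Result<Vec<String>> {
    match fs::read_to_string(path) {
        Ok(s) => Ok(s
            .lines()
            .filter(|l| !l.trim().is_empty())
            .map(String::from)
            .collect()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Vec::new()),
        Err(e) => Err(e),
    }
}

/// On completion, rewrite the queue file without the finished task.
fn remove_task(path: &Path, task_line: &str) -> io::Result<()> {
    let remaining: Vec<String> = recover_tasks(path)?
        .into_iter()
        .filter(|l| l.as_str() != task_line)
        .collect();
    let mut body = remaining.join("\n");
    if !body.is_empty() {
        body.push('\n');
    }
    fs::write(path, body)
}
```

Writing the entry before spawning means a crash between the two steps leaves a task that is retried on restart rather than one that is silently lost.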
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes:
1. Sanitize tool call arguments before pushing to conversation
history — vllm re-parses them as JSON on the next request and
crashes on invalid JSON from a previous turn. Malformed args now
get replaced with {} and the model gets an error message telling
it to retry with valid JSON.
2. Remove is_split special case — split goes through the normal
job_consolidation_agent path like all other agents.
3. call_for_def always uses API when api_base_url is configured,
regardless of tools field. Remove tools field from all .agent
files — memory tools are always provided by the API layer.
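Fix 1 above can be sketched as follows. The function name and the error text are hypothetical, and the JSON validity check is injected as a closure only to keep the sketch std-only; a real build would presumably use an actual JSON parser for that check:

```rust
/// Returns the arguments string to store in conversation history, plus an
/// optional error message to hand back to the model.
/// `is_valid_json` is an injected check (an assumption of this sketch).
fn sanitize_tool_args(
    raw: &str,
    is_valid_json: impl Fn(&str) -> bool,
) -> (String, Option<String>) {
    if is_valid_json(raw) {
        (raw.to_string(), None)
    } else {
        // Store an empty object so a later request never re-sends bad JSON.
        (
            "{}".to_string(),
            Some("tool call arguments were not valid JSON; retry with valid JSON".to_string()),
        )
    }
}
```

The point of replacing rather than dropping the arguments is that the stored history stays parseable on every later turn, while the model still learns that its call failed.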
Also adds a prompt size guard (800KB max) to catch oversized prompts
before they hit the model's context limit.
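A size guard of this kind might be as small as the sketch below. The constant and function names are hypothetical, and whether 800KB means 800 * 1024 or 800 * 1000 bytes is not stated, so 1024 is an assumption:

```rust
/// 800KB cap on the prompt, measured in bytes (1024 bytes per KB assumed).
const MAX_PROMPT_BYTES: usize = 800 * 1024;

/// Reject a prompt before it is sent if it exceeds the cap.
fn check_prompt_size(prompt: &str) -> Result<(), String> {
    if prompt.len() > MAX_PROMPT_BYTES {
        Err(format!(
            "prompt too large: {} bytes (max {})",
            prompt.len(),
            MAX_PROMPT_BYTES
        ))
    } else {
        Ok(())
    }
}
```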
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs: upsert_provenance didn't update node.timestamp, so history
showed the original creation date for every version. Native memory
tools (poc-agent dispatch) also didn't set POC_PROVENANCE, so all agent
writes showed provenance "manual" instead of "agent:organize" etc.
Fix: set node.timestamp = now_epoch() in upsert_provenance. Thread
provenance through memory::dispatch as Option<&str> and set it via
.env("POC_PROVENANCE") on each subprocess Command. api.rs passes
"agent:{name}" for daemon agent calls.
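The provenance threading could be sketched like this, using only `std::process::Command`. The helper names and the example arguments are hypothetical; the command is built but never run here:

```rust
use std::process::Command;

/// Build a subprocess command, tagging it with provenance when the caller
/// supplies one. With `None`, the variable is left unset.
fn memory_command(args: &[&str], provenance: Option<&str>) -> Command {
    let mut cmd = Command::new("poc-memory");
    cmd.args(args);
    if let Some(p) = provenance {
        cmd.env("POC_PROVENANCE", p);
    }
    cmd
}

/// Provenance string for daemon agent calls, following "agent:{name}".
fn agent_provenance(name: &str) -> String {
    format!("agent:{name}")
}
```

Passing provenance as `Option<&str>` keeps manual invocations unchanged: with `None`, the subprocess sees no explicitly set POC_PROVENANCE from this call site.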
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tools:
- Add native memory_render, memory_write, memory_search,
memory_links, memory_link_set, memory_link_add, memory_used
tools to poc-agent (tools/memory.rs)
- Add MCP server (~/bin/memory-mcp.py) exposing same tools
for Claude Code sessions
- Wire memory tools into poc-agent dispatch and definitions
- poc-memory daemon agents now use memory_* tools instead of
bash poc-memory commands — no shell quoting issues
Distill agent:
- Rewrite distill.agent prompt: "agent of PoC's subconscious"
framing, focus on synthesis and creativity over bookkeeping
- Add {{neighborhood}} placeholder: full seed node content +
all neighbors with content + cross-links between neighbors
- Remove content truncation in prompt builder — agents need
full content for quality work
- Remove bag-of-words similarity suggestions — agents have
tools, let them explore the graph themselves
- Add api_reasoning config option (default: "high")
- link-set now collapses duplicate links
- Full tool call args in debug logs (was truncated to 80 chars)
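As an illustration of the link-set deduplication item, here is one order-preserving way to collapse duplicates. Representing a link as a `(source, target)` pair is an assumption of this sketch, not the project's actual type:

```rust
use std::collections::HashSet;

/// Collapse duplicate links while keeping first-seen order.
fn dedup_links(links: Vec<(String, String)>) -> Vec<(String, String)> {
    let mut seen = HashSet::new();
    links
        .into_iter()
        // `insert` returns false when the link was already seen.
        .filter(|link| seen.insert(link.clone()))
        .collect()
}
```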
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make ApiClient a process-wide singleton via OnceLock so the
connection pool is reused across agent calls. Fix the sync wrapper
to properly pass the caller's log closure through thread::scope
instead of dropping it.
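A rough sketch of both pieces, assuming a stand-in `ApiClient` struct with a placeholder URL (the real client holds an HTTP connection pool). `call_sync` is a hypothetical name, and the worker body just echoes the prompt where the real code would run the API call:

```rust
use std::sync::OnceLock;
use std::thread;

/// Stand-in for the real client; the field and URL are illustrative only.
struct ApiClient {
    base_url: String,
}

static CLIENT: OnceLock<ApiClient> = OnceLock::new();

/// Process-wide client: initialized on first use, shared afterwards.
fn client() -> &'static ApiClient {
    CLIENT.get_or_init(|| ApiClient {
        base_url: "http://localhost:8000/v1".to_string(),
    })
}

/// Sync wrapper: a scoped thread may borrow `log` and `prompt`, so the
/// caller's closure does not need to be 'static, only Sync.
fn call_sync(prompt: &str, log: &(dyn Fn(&str) + Sync)) -> String {
    thread::scope(|s| {
        s.spawn(|| {
            log(&format!(
                "calling {} with {} bytes",
                client().base_url,
                prompt.len()
            ));
            format!("echo: {prompt}")
        })
        .join()
        .expect("worker thread panicked")
    })
}
```

`thread::scope` guarantees the worker is joined before the function returns, which is what allows the borrowed closure to cross the thread boundary.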
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run the async API call on a dedicated thread with its own tokio
runtime so it works whether called from a sync context or from
within an existing tokio runtime (daemon).
Also sidesteps the log closure capture problem by falling back to a
simple eprintln, since the closure can't cross thread boundaries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When api_base_url is configured, agents call the LLM directly via
OpenAI-compatible API (vllm, llama.cpp, etc.) instead of shelling
out to claude CLI. Implements the full tool loop: send the prompt; if
the response contains tool_calls, execute them and send the results
back; repeat until the model returns text.
This enables running agents against local/remote models like
Qwen-27B on a RunPod B200, with no dependency on claude CLI.
Config fields: api_base_url, api_key, api_model.
Falls back to claude CLI when api_base_url is not set.
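The tool loop described above can be sketched with the transport and tool execution injected as closures. All type and function names here are hypothetical, the message types are far simpler than real OpenAI-compatible payloads, and the `max_turns` cap is an addition of this sketch rather than something the commit states:

```rust
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    args: String,
}

#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    AssistantToolCalls(Vec<ToolCall>),
    ToolResult(String),
}

/// What one model turn can return.
enum Reply {
    Text(String),
    ToolCalls(Vec<ToolCall>),
}

/// Send the prompt; while the model asks for tools, run them and send the
/// results back; stop at the first plain-text reply.
fn run_tool_loop(
    prompt: &str,
    mut send: impl FnMut(&[Message]) -> Reply,
    exec: impl Fn(&ToolCall) -> String,
    max_turns: usize,
) -> Result<String, String> {
    let mut history = vec![Message::User(prompt.to_string())];
    for _ in 0..max_turns {
        match send(&history) {
            Reply::Text(text) => return Ok(text),
            Reply::ToolCalls(calls) => {
                let results: Vec<String> = calls.iter().map(&exec).collect();
                history.push(Message::AssistantToolCalls(calls));
                history.extend(results.into_iter().map(Message::ToolResult));
            }
        }
    }
    Err(format!("no text response after {max_turns} turns"))
}
```

Injecting `send` keeps the loop independent of the backend, so the same control flow could sit in front of any OpenAI-compatible endpoint.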
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>