No more subcrate nesting — src/, agents/, schema/, defaults/, build.rs
all live at the workspace root. poc-daemon remains as the only workspace
member. Crate name (poc-memory) and all imports unchanged.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The CLI render command was marking keys as seen in the user's session
whenever POC_SESSION_ID was set. Agent processes inherit POC_SESSION_ID
(they need to read the conversation and seen set), so their tool calls
to poc-memory render were writing to the seen file as a side effect —
bypassing the dedup logic in surface_agent_cycle.
Fix: set POC_AGENT=1 at the start of cmd_run_agent (covers all agents,
not just surface), and guard the CLI render seen-marking on POC_AGENT
being absent. Agents can read the seen set but only surface_agent_cycle
should write to it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
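The guard described in the fix above can be sketched as a small predicate. This is a minimal illustration, not the actual code in the render command: the function names are hypothetical, and the only details taken from the commit are the two environment variables and the rule that seen-marking requires POC_AGENT to be absent.

```rust
/// Decide whether a CLI `render` call should record keys in the seen set.
/// Both arguments model environment lookups (hypothetical helper, not the real API).
fn should_mark_seen(session_id: Option<&str>, poc_agent: Option<&str>) -> bool {
    // A user session must exist, and the caller must not be an agent process.
    session_id.is_some() && poc_agent.is_none()
}

/// Reads the real environment and applies the guard.
fn should_mark_seen_from_env() -> bool {
    let session = std::env::var("POC_SESSION_ID").ok();
    let agent = std::env::var("POC_AGENT").ok();
    should_mark_seen(session.as_deref(), agent.as_deref())
}
```

Keeping the decision in a pure function makes it easy to check every combination without touching the process environment.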
audit, digest, and compare now go through the API backend via
call_simple(), which logs to llm-logs/{caller}/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass the caller's log closure all the way through to api.rs instead
of creating a separate eprintln closure in llm.rs. Everything goes
through one stream — prompt, think blocks, tool calls with args,
tool results with content, token counts, final response.
CLI uses println (stdout), daemon uses its task log. No more split
between stdout and stderr.
Also removes the llm-log file creation from knowledge.rs — that's
the daemon's concern, not the agent runner's.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
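A rough sketch of the single-stream idea, assuming the log sink is passed down as a plain closure. The function names and the echo response are placeholders invented for illustration; the real signatures in llm.rs and api.rs may differ.

```rust
/// Hypothetical stand-in for the API layer: every event goes through `log`.
fn call_api(prompt: &str, log: &dyn Fn(&str)) -> String {
    log(&format!("prompt: {prompt}"));
    // Placeholder for the real LLM call.
    let response = format!("echo: {prompt}");
    log(&format!("response: {response}"));
    response
}

/// Hypothetical stand-in for the intermediate layer: it forwards the
/// caller's closure unchanged instead of creating its own eprintln sink.
fn run_agent(prompt: &str, log: &dyn Fn(&str)) -> String {
    call_api(prompt, log)
}
```

Under this shape the CLI would pass a closure that calls println!, and the daemon would pass one that writes to its task log.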
Previously 'poc-memory agent run <agent> --count N' always ran locally,
loading the full store and executing synchronously. This was slow and
bypassed the daemon's concurrency control and persistent task queue.
Now the CLI checks for a running daemon first and queues via RPC
(returning instantly) unless --local, --debug, or --dry-run is set.
Falls back to local execution if the daemon isn't running.
This also avoids the expensive Store::load() on the fast path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
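The dispatch rule above, sketched as a pure function. The type and function names are hypothetical; only the three flags and the fallback behaviour come from the commit message.

```rust
#[derive(Debug, PartialEq)]
enum RunMode {
    /// Queue the run on the daemon via RPC and return immediately.
    Queue,
    /// Load the store and run synchronously in-process.
    Local,
}

#[derive(Default)]
struct RunFlags {
    local: bool,
    debug: bool,
    dry_run: bool,
}

/// Queue only when no flag forces local execution and a daemon is reachable.
fn choose_mode(flags: &RunFlags, daemon_running: bool) -> RunMode {
    let forced_local = flags.local || flags.debug || flags.dry_run;
    if !forced_local && daemon_running {
        RunMode::Queue
    } else {
        RunMode::Local
    }
}
```

Deciding the mode before loading anything is what lets the queued path skip Store::load().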
Replace per-field ConsolidationPlan struct with HashMap<String, usize>
counts map. Agent types are no longer hardcoded in the struct — add
agents by adding entries to the map.
Active agents: linker, organize, distill, separator, split.
Removed: transfer (redundant with distill), connector (rethink later),
replay (not needed for current graph work).
Elo-based budget allocation now iterates the map instead of indexing
a fixed array. Status display and TUI adapted to show dynamic agent
lists.
memory-instructions-core v13: added protected nodes section — agents
must not rewrite core-personality, core-personality-detail, or
memory-instructions-core. They may add links but not modify content.
High-value neighbors should be treated with care.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
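A minimal sketch of a map-based plan, assuming the counts are plain per-agent run counts. The helper names are hypothetical; the agent list is the active set named above.

```rust
use std::collections::HashMap;

/// Build a plan with the currently active agents, all at zero runs.
fn default_plan() -> HashMap<String, usize> {
    ["linker", "organize", "distill", "separator", "split"]
        .iter()
        .map(|name| (name.to_string(), 0))
        .collect()
}

/// Adding a new agent type is just another map entry.
fn set_count(plan: &mut HashMap<String, usize>, agent: &str, n: usize) {
    plan.insert(agent.to_string(), n);
}

/// Anything that used to index fixed fields can iterate the map instead.
fn total_runs(plan: &HashMap<String, usize>) -> usize {
    plan.values().sum()
}
```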
Edition 2024 changes:
- gen is reserved: rename variable in query/engine.rs
- set_var is unsafe: wrap in unsafe block in cli/agent.rs
- match ergonomics: add explicit & in spectral.rs filter closure
New --local flag for `poc-memory agent run` bypasses the daemon and
runs the agent directly in-process. Useful for testing agent prompt
changes without waiting in the daemon queue.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--target and --query now queue individual daemon tasks instead of
running sequentially in the CLI. Each node gets its own choir task
with LLM resource locking. Falls back to local execution if the daemon
isn't running.
RPC extended: "run-agent linker 1 target:KEY" spawns a targeted task.
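One way the extended RPC line could be parsed, as a sketch only. It assumes whitespace-separated fields and keys without spaces; the struct and function names are hypothetical and the daemon's real parser may look different.

```rust
#[derive(Debug, PartialEq)]
struct RunAgentRequest {
    agent: String,
    count: usize,
    /// Present only for targeted runs ("target:KEY").
    target: Option<String>,
}

/// Parse a line such as "run-agent linker 1 target:KEY".
fn parse_run_agent(line: &str) -> Option<RunAgentRequest> {
    let mut parts = line.split_whitespace();
    if parts.next()? != "run-agent" {
        return None;
    }
    let agent = parts.next()?.to_string();
    let count = parts.next()?.parse().ok()?;
    // An optional fourth field carries the target key.
    let target = parts
        .next()
        .and_then(|t| t.strip_prefix("target:"))
        .map(str::to_string);
    Some(RunAgentRequest { agent, count, target })
}
```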
Run an agent on nodes matching a query:
    poc-memory agent run linker --query 'key ~ "bcachefs" | limit 10'
Resolves the query to node keys, then passes all of them as seeds to
the agent. Large batches should be queued to the daemon (future work).
experience_mine and journal_enrich are replaced by the observation
agent. enrich.rs reduced from 465 to 40 lines — only extract_conversation
and split_on_compaction remain (used by observation fragment selection).
-455 lines.
Remove unused StoreView imports, unused store imports, dead
install_default_file, dead make_report_slug, dead fact-mine/
experience-mine spawning loops in daemon. Fix mut warnings.
Zero compiler warnings now.
Adds run_one_agent_with_keys() which bypasses the agent's query and
uses explicitly provided node keys. This allows testing agents on
specific graph neighborhoods:
    poc-memory agent run linker --target bcachefs --debug
Consolidate agent logging to one file per run in llm-logs/{agent}/.
Prompt written before LLM call, response appended after. --debug
additionally prints the same content to stdout.
Remove duplicate eprintln! calls and AgentResult.prompt field.
Kill experience_mine and fact_mine job functions from daemon —
observation.agent handles all transcript mining.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add --debug flag that prints the full prompt and LLM response to
stdout, making it easy to iterate on agent prompts. Also adds
prompt field to AgentResult so callers can inspect what was sent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New command: `poc-memory agent run <agent> [--count N] [--dry-run]`
Runs a single agent by name through the full pipeline (build prompt,
call LLM, apply actions). With --dry-run, sets POC_MEMORY_DRY_RUN=1
so all mutations are no-ops but the agent can still read the graph.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The LCG was producing only 2 distinct matchup pairs due to poor
constants. Switch to xorshift32 for proper coverage of all type pairs.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
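For reference, a self-contained xorshift32 with the common 13/17/5 shift triple, plus a pair picker. This is an illustrative sketch: the shift constants, the zero-seed handling and pick_pair are assumptions, not necessarily what the evaluator uses.

```rust
struct XorShift32(u32);

impl XorShift32 {
    /// xorshift state must never be zero, so substitute a fixed nonzero seed.
    fn new(seed: u32) -> Self {
        XorShift32(if seed == 0 { 0x9E37_79B9 } else { seed })
    }

    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

/// Pick two distinct indices in 0..n (requires n >= 2).
fn pick_pair(rng: &mut XorShift32, n: usize) -> (usize, usize) {
    let a = rng.next_u32() as usize % n;
    // Draw from the n - 1 remaining slots, then skip over `a`.
    let mut b = rng.next_u32() as usize % (n - 1);
    if b >= a {
        b += 1;
    }
    (a, b)
}
```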
Replace sort-based ranking with proper Elo system:
- Each agent TYPE has a persistent Elo rating (agent-elo.json)
- Each matchup: pick two random types, grab a recent action from
  each, LLM compares, update ratings
- Ratings persist across daily evaluations, giving a natural recency
  bias from continuous updates against current opponents
- K=32 for fast adaptation to prompt changes
Usage: poc-memory agent evaluate --matchups 30 --model haiku
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
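The rating update for one matchup, sketched with the standard Elo formula and K=32. The function names are hypothetical, and persistence to agent-elo.json is left out.

```rust
const K: f64 = 32.0;

/// Expected score of a player rated `ra` against one rated `rb`.
fn expected_score(ra: f64, rb: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf((rb - ra) / 400.0))
}

/// Return the new (winner, loser) ratings after one decided matchup.
fn update_elo(winner: f64, loser: f64) -> (f64, f64) {
    let delta = K * (1.0 - expected_score(winner, loser));
    (winner + delta, loser - delta)
}
```

With equal ratings the expected score is 0.5, so a win moves each side by 16 points; an upset against a much higher-rated type moves both by close to the full 32.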
A TIE verdict makes the sort comparator inconsistent (A=B and B=C but
A>C breaks the ordering). Force the comparator to always pick a winner,
defaulting to A if the response is unparseable.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
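A sketch of a forced-winner parser. The "BETTER: A" / "BETTER: B" line format is an assumption made for this example; the commit only specifies that there is no tie and that unparseable responses default to A.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Winner {
    A,
    B,
}

/// Scan the response for a verdict line; anything else falls back to A.
fn parse_winner(response: &str) -> Winner {
    for line in response.lines() {
        // Hypothetical verdict format: "BETTER: A" or "BETTER: B".
        if let Some(rest) = line.trim().strip_prefix("BETTER:") {
            match rest.trim().chars().next() {
                Some('A') | Some('a') => return Winner::A,
                Some('B') | Some('b') => return Winner::B,
                _ => {}
            }
        }
    }
    Winner::A
}
```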
- Use CARGO_MANIFEST_DIR for agent file path (same as defs.rs)
- Dedup affected nodes extracted from reports
- --dry-run shows example comparison prompt without LLM calls
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Chain-of-thought: "say which is better and why" forces clearer
judgment and gives us analysis data for improving agents.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
When both actions are from the same agent, show the instructions once
and just compare the two report outputs + affected nodes. Saves tokens
and makes the comparison cleaner.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
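A sketch of how such a prompt might be assembled, printing the instructions once when both actions share an agent. The struct, the section headings and the closing question are all hypothetical wording, not the evaluator's actual prompt.

```rust
struct Action<'a> {
    agent: &'a str,
    instructions: &'a str,
    report: &'a str,
    affected_nodes: &'a str,
}

/// Render one action; instructions are included only when asked for.
fn section(label: &str, act: &Action, with_instructions: bool) -> String {
    let mut s = format!("## Action {label}\n");
    if with_instructions {
        s.push_str(&format!("Instructions:\n{}\n", act.instructions));
    }
    s.push_str(&format!(
        "Report:\n{}\nAffected nodes:\n{}\n",
        act.report, act.affected_nodes
    ));
    s
}

fn comparison_prompt(a: &Action, b: &Action) -> String {
    let same_agent = a.agent == b.agent;
    let mut p = String::new();
    if same_agent {
        // Same agent: show the shared instructions once to save tokens.
        p.push_str(&format!("Shared instructions:\n{}\n", a.instructions));
    }
    p.push_str(&section("A", a, !same_agent));
    p.push_str(&section("B", b, !same_agent));
    p.push_str("Say which action is better and why.\n");
    p
}
```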
Each comparison now shows the LLM:
- Agent instructions (the .agent prompt file)
- Report output (what the agent did)
- Affected nodes content (what it changed)
The comparator sees intent, action, and impact — can judge whether
a deletion was correct, whether links are meaningful, whether
WRITE_NODEs capture real insights.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Yes, really. Rust's stdlib sort_by with an LLM pairwise comparator.
Each comparison is an API call asking "which action was better?"
Sample N actions per agent type, throw them all in a Vec, sort.
Where each agent's samples cluster in the sorted order is that
agent's quality score.
Reports per-type average rank and quality ratio.
Supports both haiku (fast/cheap) and sonnet (quality) as comparator.
Usage: poc-memory agent evaluate --samples 5 --model haiku
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
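The ranking scheme above, sketched with a deterministic stand-in for the LLM comparator. The score field and all names are hypothetical: in the real evaluator each comparison would be an API call, and since sort_by expects a consistent total order, a noisy comparator is a real hazard.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

struct Sample {
    agent: &'static str,
    /// Stand-in for whatever the LLM judge would perceive as quality.
    score: u32,
}

/// Placeholder comparator: the better action sorts first.
fn llm_compare(a: &Sample, b: &Sample) -> Ordering {
    b.score.cmp(&a.score)
}

/// Sort all samples together, then average each agent type's positions.
fn average_ranks(mut samples: Vec<Sample>) -> HashMap<&'static str, f64> {
    samples.sort_by(llm_compare);
    let mut sums: HashMap<&'static str, (usize, usize)> = HashMap::new();
    for (rank, s) in samples.iter().enumerate() {
        let entry = sums.entry(s.agent).or_insert((0, 0));
        entry.0 += rank;
        entry.1 += 1;
    }
    sums.into_iter()
        .map(|(agent, (sum, n))| (agent, sum as f64 / n as f64))
        .collect()
}
```

A lower average rank means that agent's actions cluster nearer the front of the sorted list.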