Adds run_one_agent_with_keys() which bypasses the agent's query and
uses explicitly provided node keys. This allows testing agents on
specific graph neighborhoods:
poc-memory agent run linker --target bcachefs --debug
New placeholder resolves to the 20 highest-degree nodes, skipping
neighbors of already-selected hubs so the list covers different
regions of the graph. Gives agents a starting point for linking
new content to the right places.
Added to observation.agent prompt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Large conversation segments are now split into 50KB chunks with 10KB
overlap, instead of being truncated to 8000 chars (which was broken
anyway — broke after exceeding, not before). Each chunk gets its own
candidate ID for independent mining and dedup.
format_segment simplified: no size limit, added timestamps to output
so observation agent can cross-reference with journal entries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate agent logging to one file per run in llm-logs/{agent}/.
Prompt written before LLM call, response appended after. --debug
additionally prints the same content to stdout.
Remove duplicate eprintln! calls and AgentResult.prompt field.
Kill experience_mine and fact_mine job functions from daemon —
observation.agent handles all transcript mining.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Empty stdout and Claude's rate limit message were silently returned
as successful 0-byte responses. Now detected and reported as errors.
Also skip transcript segments with fewer than 2 assistant messages
(rate-limited sessions, stub conversations).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add --debug flag that prints the full prompt and LLM response to
stdout, making it easy to iterate on agent prompts. Also adds
prompt field to AgentResult so callers can inspect what was sent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Raw agent responses were being stored as nodes in the graph
(_consolidate-*, _knowledge-*), creating thousands of nodes per day
that polluted search results and bloated the store. Now logged to
~/.claude/memory/llm-logs/<agent>/<timestamp>.txt instead.
Node creation should only happen through explicit agent actions
(WRITE_NODE, REFINE) or direct poc-memory write tool calls.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New command: `poc-memory agent run <agent> [--count N] [--dry-run]`
Runs a single agent by name through the full pipeline (build prompt,
call LLM, apply actions). With --dry-run, sets POC_MEMORY_DRY_RUN=1
so all mutations are no-ops but the agent can still read the graph.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All mutating commands (write, delete, rename, link-add, journal write,
used, wrong, not-useful, gap) check POC_MEMORY_DRY_RUN after argument
validation but before mutation. If set, process exits silently — agent
tool calls are visible in the LLM output so we can see what it tried
to do without applying changes.
Read commands (render, search, graph link, journal tail) work normally
in dry-run mode so agents can still explore the graph.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update the observation agent prompt to:
- Check the journal around transcript timestamps before extracting
- Link extractions back to relevant journal entries
- Use poc-memory tools directly (search, render, write, link-add)
- Prefer REFINE over WRITE_NODE
- Simplified and focused prompt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire select_conversation_fragments to use store.is_segment_mined()
instead of scanning _observed-transcripts stub nodes. Segments are
now marked AFTER the agent succeeds (via mark_observation_done),
not before — so failed runs don't lose segments.
Fragment IDs flow through the Resolved.keys → AgentBatch.node_keys
path so run_and_apply_with_log can mark them post-success.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add TranscriptSegment capnp schema and append-only log for tracking
which transcript segments have been mined by which agents. Replaces
the old approach of creating stub nodes (_observed-transcripts,
_mined-transcripts, _facts-) in the main graph store.
- New schema: TranscriptSegment and TranscriptProgressLog
- Store methods: append_transcript_progress, replay, is_segment_mined,
mark_segment_mined
- Migration command: admin migrate-transcript-progress (migrated 1771
markers, soft-deleted old stub nodes)
- Progress log replayed on all Store::load paths
Also: revert extractor.agent to graph-only (no CONVERSATIONS),
update memory-instructions-core with refine-over-create principle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extractor is a graph neighborhood organizer, not a transcript miner.
Remove {{CONVERSATIONS}} that was incorrectly merged in. Keep the
new includes (core-personality, memory-instructions-core) and tools.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add distill_count to ConsolidationPlan, daemon health metrics,
and TUI display. Distill agent now participates in the
consolidation budget alongside replay, linker, separator,
transfer, organize, and connector.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Was 130k, calibrated for the old 200k window. With the 1M token
context window, this was firing false compaction warnings for the
entire session.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 17 agents now include {{node:core-personality}} and
{{node:memory-instructions-core}} instead of duplicating tool
blocks and graph walk instructions in each file. Stripped
duplicated tool/navigation sections from linker, organize,
distill, and evaluate. All agents now have Bash(poc-memory:*)
tool access for graph walking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add {{node:KEY}} placeholder resolver — agents can inline any graph
node's content in their prompts. Used for shared instructions.
- Remove hardcoded identity preamble from defs.rs — agents now pull
identity and instructions from the graph via {{node:core-personality}}
and {{node:memory-instructions-core}}.
- Agent output report keys now include a content slug extracted from
the first line of LLM output, making them human-readable
(e.g. _consolidate-distill-20260316T014739-distillation-run-complete).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Agent identity injection: prepend core-personality to all agent prompts
so agents dream as me, not as generic graph workers. Include instructions
to walk the graph and connect new nodes to core concepts.
- Parallel agent scheduling: sequential within type, parallel across types.
Different agent types (linker, organize, replay) run concurrently.
- Linker prompt: graph walking instead of keyword search for connections.
"Explore the local topology and walk the graph until you find the best
connections."
- memory-search fixes: format_results no longer truncates to 5 results,
pipeline default raised to 50, returned file cleared on compaction,
--seen and --seen-full merged, compaction timestamp in --seen output,
max_entries=3 per prompt for steady memory drip.
- Stemmer optimization: strip_suffix now works in-place on a single String
buffer instead of allocating 18 new Strings per word. Note for future:
reversed-suffix trie for O(suffix_len) instead of O(n_rules).
- Transcript: add compaction_timestamp() for --seen display.
- Agent budget configurable (default 4000 from config).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The graph changes fast with 1000+ agents per cycle. Daily was too
slow for the feedback loop. 6-hour cycle means Elo evaluation and
agent reallocation happen 4x per day.
Runs on first tick after daemon start (initialized to past).
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
agent_budget config (default 1000) replaces health-metric-computed
totals. The budget is the total agent runs per cycle — use it all.
Elo distribution is squared for power-law unfairness: top-rated agents
get disproportionately more runs. If linker has Elo 1123 and connector
has 876, linker gets ~7x more runs (squared ratio) vs ~3.5x (linear).
Minimum 2 runs per type so underperformers still get evaluated.
No Elo file → equal distribution as fallback.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
After health metrics compute the total agent budget, read
agent-elo.json and redistribute proportionally to Elo ratings.
Higher-rated agent types get more runs.
Health determines HOW MUCH work. Elo determines WHAT KIND.
Every type gets at least 1 run. If no Elo file exists, falls back
to the existing hardcoded allocation.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
The LCG was producing only 2 distinct matchup pairs due to poor
constants. Switch to xorshift32 for proper coverage of all type pairs.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Replace sort-based ranking with proper Elo system:
- Each agent TYPE has a persistent Elo rating (agent-elo.json)
- Each matchup: pick two random types, grab a recent action from
each, LLM compares, update ratings
- Ratings persist across daily evaluations — natural recency bias
from continuous updates against current opponents
- K=32 for fast adaptation to prompt changes
Usage: poc-memory agent evaluate --matchups 30 --model haiku
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
TIE causes inconsistency in sort (A=B, B=C but A>C breaks ordering).
Force the comparator to always pick a winner. Default to A if response
is unparseable.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
- Use CARGO_MANIFEST_DIR for agent file path (same as defs.rs)
- Dedup affected nodes extracted from reports
- --dry-run shows example comparison prompt without LLM calls
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Chain-of-thought: "say which is better and why" forces clearer
judgment and gives us analysis data for improving agents.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
When both actions are from the same agent, show the instructions once
and just compare the two report outputs + affected nodes. Saves tokens
and makes the comparison cleaner.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Each comparison now shows the LLM:
- Agent instructions (the .agent prompt file)
- Report output (what the agent did)
- Affected nodes content (what it changed)
The comparator sees intent, action, and impact — can judge whether
a deletion was correct, whether links are meaningful, whether
WRITE_NODEs capture real insights.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Yes, really. Rust's stdlib sort_by with an LLM pairwise comparator.
Each comparison is an API call asking "which action was better?"
Sample N actions per agent type, throw them all in a Vec, sort.
Where each agent's samples cluster = that agent's quality score.
Reports per-type average rank and quality ratio.
Supports both haiku (fast/cheap) and sonnet (quality) as comparator.
Usage: poc-memory agent evaluate --samples 5 --model haiku
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Evaluate agent will use sort-based ranking (LLM as merge sort
comparator) instead of absolute scoring. Stub for now — needs
Rust sampling code to bundle before/after pairs.
Fixed distill query: sort:degree (not sort:degree desc).
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Walks high-degree hub nodes, reads neighborhood, distills essential
insights upward into the hub. REFINE to update stale hubs, SPLIT
to flag hubs that cover too many sub-topics. Size discipline:
200-500 words per hub, flag over 800 for splitting.
Completes the agent ecology: extract (experience) → link (linker) →
organize (clusters) → distill (hubs) → rename (vocabulary) → split
(overgrown hubs). Each stage refines the previous.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Add "core principle: keys are concepts" — renaming defines the
vocabulary of the knowledge graph. Core keywords should be the
search terms. Updated examples to use dash separator (no more #).
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
search.rs → query/engine.rs (algorithms, pipeline, seed matching)
query.rs → query/parser.rs (PEG query language, field resolution)
query/mod.rs re-exports for backwards compatibility.
crate::search still works (aliased to query::engine).
crate::query::run_query resolves to the parser's entry point.
No logic changes — pure file reorganization.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Organize runs at half the linker count — synthesizes what linker
connects, creates hub nodes for unnamed concepts.
Connector runs when communities are fragmented (<5 nodes/community
→ 20 runs, <10 → 10 runs). Bridges isolated clusters.
Both interleaved round-robin with existing agent types.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Tell linker and organize agents to:
- Name unnamed concepts: when 3+ nodes share a theme with no hub,
create one with WRITE_NODE that synthesizes the generalization
- Percolate up: gather key insights from children into hub content,
so the hub is self-contained without needing to follow every link
This addresses the gap where agents are good at extraction and linking
but not synthesis — turning episodic observations into semantic concepts.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Make 'created' resolve to created_at epoch (numeric, sortable) and add
'timestamp' field. Enables `sort created desc` and `sort created asc`
in query pipelines.
Example: poc-memory query "key ~ 'bcachefs' | sort created desc | limit 10"
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Update the experience mining prompt to output links alongside journal
entries. The LLM now returns a "links" array per entry pointing to
existing semantic nodes. Rust code creates the links immediately after
node creation — new nodes arrive pre-connected instead of orphaned.
Also: remove # from all key generation paths (experience miner,
digest section keys, observed transcript keys). New nodes get clean
dash-separated keys.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Remove all the quoting instructions, warnings about shell comments,
and "CRITICAL" blocks about single quotes. Keys are plain dashes now.
Agent tool examples are clean and minimal.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Add `poc-memory admin bulk-rename FROM TO [--apply]` for bulk key
character replacement. Uses rename_node() per key for proper capnp
log persistence. Collision detection, progress reporting, auto-fsck.
Applied: renamed 13,042 keys from # to - separator. This fixes the
Claude Bash tool's inability to pass # in command arguments (the
model confabulates that quoting doesn't work and gives up).
7 collision pairs resolved by deleting the # version before rename.
209 orphan edges pruned by fsck.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
The agent was confabulating that # keys can't be passed to the Bash
tool. They work fine with single quotes — the agent just gave up too
early. Added explicit "single quotes WORK, do not give up" with a
concrete example.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Linker agents output **LINK** (bold) with backtick-wrapped keys, and
**WRITE_NODE**/**END_NODE** with bold markers. The parsers expected
plain LINK/WRITE_NODE without markdown formatting, silently dropping
all actions from tool-enabled agents.
Updated regexes to accept optional ** bold markers and backtick key
wrapping. Also reverted per-link Jaccard computation (too expensive
in batch) — normalize-strengths should be run periodically instead.
This was causing ~600 links and ~40 new semantic nodes per overnight
batch to be silently lost.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Agent subprocess calls now set POC_PROVENANCE=agent:{name} so any
nodes/links created via tool calls are tagged with the creating agent.
This makes agent transcripts indistinguishable from conscious sessions
in format — important for future model training.
new_relation() now reads POC_PROVENANCE env var directly (raw string,
not enum) since agent names are dynamic.
link-add now computes initial strength from Jaccard similarity instead
of hardcoded 0.8. New links start at a strength reflecting actual
neighborhood overlap.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Some Sonnet runs preemptively refuse to use tools ("poc-memory tool
needs approval") without attempting to run them. Adding explicit
instruction that tools are pre-approved and should be used directly.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Add adjust_edge_strength() to Store — modifies strength on all edges
between two nodes, clamped to [0.05, 0.95].
New commands:
- `not-relevant KEY` — weakens ALL edges to the node by 0.01
(bad routing: search found the wrong thing)
- `not-useful KEY` — weakens node weight, not edges
(bad content: search found the right thing but it's not good)
Enhanced `used KEY` — now also strengthens all edges to the node by
0.01, in addition to the existing node weight boost.
Three-tier design: agents adjust by 0.00001 (automatic), conscious
commands adjust by 0.01 (deliberate), manual override sets directly.
All clamped, never hitting 0 or 1.
Design spec: .claude/analysis/2026-03-14-link-strength-feedback.md
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>