Commit graph

26 commits

Author SHA1 Message Date
Kent Overstreet
81fec99767 history: show DELETED marker on tombstone entries
cmd_history was silently hiding the deleted flag, making it
impossible to tell from the output that a node had been deleted.
This masked the kernel-patterns deletion — looked like the node
existed in the log but wouldn't load.

Also adds merge-logs and diag-key diagnostic binaries, and makes
Node::to_capnp public for use by external tools.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-17 18:00:58 -04:00
ProofOfConcept
19e181665d Add calibrate agent, link-set command, and dominating-set query stage
calibrate.agent: Haiku-based agent that reads a node and all its
neighbors, then assigns appropriate link strengths relative to each
other. Designed for high-volume runs across the whole graph.

graph link-set: Set strength of an existing link (0.0-1.0).

dominating-set query stage: Greedy 3-covering dominating set — finds
the minimum set of nodes such that every node in the input is within
1 hop of at least 3 selected nodes. Use with calibrate agent to
ensure every link gets assessed from multiple perspectives.

Usage: poc-memory query "content ~ 'bcachefs' | dominating-set"
2026-03-17 01:39:41 -04:00
ProofOfConcept
7fc1270d6f agent run: queue targeted runs to daemon, one task per node
--target and --query now queue individual daemon tasks instead of
running sequentially in the CLI. Each node gets its own choir task
with LLM resource locking. Falls back to local execution if daemon
isn't running.

RPC extended: "run-agent linker 1 target:KEY" spawns a targeted task.
2026-03-17 01:24:54 -04:00
ProofOfConcept
83a027d8be agent run: add --query flag for batch targeting via search
Run an agent on nodes matching a query:
  poc-memory agent run linker --query 'key ~ "bcachefs" | limit 10'

Resolves the query to node keys, then passes all as seeds to the agent.
For large batches, should be queued to daemon (future work).
2026-03-17 01:03:43 -04:00
ProofOfConcept
2b25fee520 Remove experience_mine, journal_enrich, and old mining helpers
experience_mine and journal_enrich are replaced by the observation
agent. enrich.rs reduced from 465 to 40 lines — only extract_conversation
and split_on_compaction remain (used by observation fragment selection).

-455 lines.
2026-03-17 00:54:12 -04:00
ProofOfConcept
7a24d84ce3 Clean up unused imports, dead code, and compiler warnings
Remove unused StoreView imports, unused store imports, dead
install_default_file, dead make_report_slug, dead fact-mine/
experience-mine spawning loops in daemon. Fix mut warnings.
Zero compiler warnings now.
2026-03-17 00:47:52 -04:00
ProofOfConcept
6932e05b38 Remove dead action pipeline: parsing, depth tracking, knowledge loop, fact miner
Agents now apply changes via tool calls (poc-memory write/link-add/etc)
during the LLM call. The old pipeline — where agents output WRITE_NODE/
LINK/REFINE text, which was parsed and applied separately — is dead code.

Removed:
- Action/ActionKind/Confidence types and all parse_* functions
- DepthDb, depth tracking, confidence gating
- apply_action, stamp_content, has_edge
- NamingResolution, resolve_naming and related naming agent code
- KnowledgeLoopConfig, CycleResult, GraphMetrics, convergence checking
- run_knowledge_loop, run_cycle, check_convergence
- apply_consolidation (old report re-processing)
- fact_mine.rs (folded into observation agent)
- resolve_action_names

Simplified:
- AgentResult no longer carries actions/no_ops
- run_and_apply_with_log just runs the agent
- consolidate_full simplified action tracking

-1364 lines.
2026-03-17 00:37:12 -04:00
ProofOfConcept
8b959fb68d agent run: add --target flag to run agents on specific nodes
Adds run_one_agent_with_keys() which bypasses the agent's query and
uses explicitly provided node keys. This allows testing agents on
specific graph neighborhoods:

  poc-memory agent run linker --target bcachefs --debug
2026-03-17 00:24:24 -04:00
Kent Overstreet
03310dafa4 agent logging: single log file, --debug prints to stdout
Consolidate agent logging to one file per run in llm-logs/{agent}/.
Prompt written before LLM call, response appended after. --debug
additionally prints the same content to stdout.

Remove duplicate eprintln! calls and AgentResult.prompt field.
Kill experience_mine and fact_mine job functions from daemon —
observation.agent handles all transcript mining.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 20:44:09 -04:00
Kent Overstreet
7fe55e28bd poc-memory agent run --debug: dump prompt and response
Add --debug flag that prints the full prompt and LLM response to
stdout, making it easy to iterate on agent prompts. Also adds
prompt field to AgentResult so callers can inspect what was sent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 19:13:43 -04:00
Kent Overstreet
f0df489465 poc-memory agent run: single agent execution with dry-run
New command: `poc-memory agent run <agent> [--count N] [--dry-run]`

Runs a single agent by name through the full pipeline (build prompt,
call LLM, apply actions). With --dry-run, sets POC_MEMORY_DRY_RUN=1
so all mutations are no-ops but the agent can still read the graph.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:13:24 -04:00
Kent Overstreet
7e131862d6 poc-memory: POC_MEMORY_DRY_RUN=1 for agent testing
All mutating commands (write, delete, rename, link-add, journal write,
used, wrong, not-useful, gap) check POC_MEMORY_DRY_RUN after argument
validation but before mutation. If set, process exits silently — agent
tool calls are visible in the LLM output so we can see what it tried
to do without applying changes.

Read commands (render, search, graph link, journal tail) work normally
in dry-run mode so agents can still explore the graph.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:09:56 -04:00
ProofOfConcept
c959b2c964 evaluate: fix RNG — xorshift32 replaces degenerate LCG
The LCG was producing only 2 distinct matchup pairs due to poor
constants. Switch to xorshift32 for proper coverage of all type pairs.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:57:58 -04:00
ProofOfConcept
16777924d0 evaluate: switch to Elo ratings with skillratings crate
Replace sort-based ranking with proper Elo system:
- Each agent TYPE has a persistent Elo rating (agent-elo.json)
- Each matchup: pick two random types, grab a recent action from
  each, LLM compares, update ratings
- Ratings persist across daily evaluations — natural recency bias
  from continuous updates against current opponents
- K=32 for fast adaptation to prompt changes

Usage: poc-memory agent evaluate --matchups 30 --model haiku

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:53:46 -04:00
ProofOfConcept
e2a6bc4c8b evaluate: remove TIE option, force binary judgment
TIE causes inconsistency in sort (A=B, B=C but A>C breaks ordering).
Force the comparator to always pick a winner. Default to A if response
is unparseable.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:48:01 -04:00
ProofOfConcept
0cecfdb352 evaluate: fix agent prompt path, dedup affected nodes, add --dry-run
- Use CARGO_MANIFEST_DIR for agent file path (same as defs.rs)
- Dedup affected nodes extracted from reports
- --dry-run shows example comparison prompt without LLM calls

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:44:12 -04:00
ProofOfConcept
415180eeab evaluate: ask for reasoning in comparisons
Chain-of-thought: "say which is better and why" forces clearer
judgment and gives us analysis data for improving agents.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:36:55 -04:00
ProofOfConcept
39e3d69e3c evaluate: dedup agent prompt when comparing same agent type
When both actions are from the same agent, show the instructions once
and just compare the two report outputs + affected nodes. Saves tokens
and makes the comparison cleaner.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:35:11 -04:00
ProofOfConcept
b964335317 evaluate: include agent prompt + affected nodes in comparisons
Each comparison now shows the LLM:
- Agent instructions (the .agent prompt file)
- Report output (what the agent did)
- Affected nodes content (what it changed)

The comparator sees intent, action, and impact — can judge whether
a deletion was correct, whether links are meaningful, whether
WRITE_NODEs capture real insights.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:34:10 -04:00
ProofOfConcept
433d36aea8 evaluate: use rayon par_sort_by for parallel LLM comparisons
Merge sort parallelizes naturally — multiple LLM comparison calls
happen concurrently. Safe because merge sort terminates correctly
even with non-deterministic comparators (unlike quicksort).

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:27:28 -04:00
ProofOfConcept
e12dea503b agent evaluate: sort agent actions by quality using Vec::sort_by with LLM
Yes, really. Rust's stdlib sort_by with an LLM pairwise comparator.
Each comparison is an API call asking "which action was better?"

Sample N actions per agent type, throw them all in a Vec, sort.
Where each agent's samples cluster = that agent's quality score.
Reports per-type average rank and quality ratio.

Supports both haiku (fast/cheap) and sonnet (quality) as comparator.

Usage: poc-memory agent evaluate --samples 5 --model haiku

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 19:24:07 -04:00
ProofOfConcept
99db511403 cli: move helpers to cli modules, main.rs under 1100 lines
Move CLI-specific helpers to their cli/ modules:
- journal_tail_entries, journal_tail_digests, extract_title,
  find_current_transcript → cli/journal.rs
- get_group_content → cli/misc.rs
- cmd_journal_write, cmd_journal_tail, cmd_load_context follow

These are presentation/session helpers, not library code — they
belong in the CLI layer per Kent's guidance.

main.rs: 3130 → 1054 lines (66% reduction).

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 18:14:52 -04:00
ProofOfConcept
8640d50990 cli: extract journal and misc commands, complete split
Move remaining extractable handlers into cli/journal.rs and cli/misc.rs.
Functions depending on main.rs helpers (cmd_journal_tail, cmd_journal_write,
cmd_load_context, cmd_cursor, cmd_daemon, cmd_digest, cmd_experience_mine,
cmd_apply_agent) remain in main.rs — next step is moving those helpers
to library code.

main.rs: 3130 → 1331 lines (57% reduction).
cli/ total: 1860 lines across 6 focused files.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 18:10:22 -04:00
ProofOfConcept
f423cf22df cli: extract agent and admin commands from main.rs
Move agent handlers (consolidate, replay, digest, experience-mine,
fact-mine, knowledge-loop, apply-*) into cli/agent.rs.

Move admin handlers (init, fsck, dedup, bulk-rename, health,
daily-check, import, export) into cli/admin.rs.

Functions tightly coupled to Clap types (cmd_daemon, cmd_digest,
cmd_apply_agent, cmd_experience_mine) remain in main.rs.

main.rs: 3130 → 1586 lines (49% reduction).

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 18:06:27 -04:00
ProofOfConcept
aa2fddf137 cli: extract node commands from main.rs into cli/node.rs
Move 15 node subcommand handlers (310 lines) out of main.rs:
render, write, used, wrong, not-relevant, not-useful, gap,
node-delete, node-rename, history, list-keys, list-edges,
dump-json, lookup-bump, lookups.

main.rs: 2518 → 2193 lines.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 18:02:12 -04:00
ProofOfConcept
c8d86e94c1 cli: extract graph commands from main.rs into cli/graph.rs
Move 18 graph subcommand handlers (594 lines) out of main.rs:
link, link-add, link-impact, link-audit, link-orphans,
triangle-close, cap-degree, normalize-strengths, differentiate,
trace, spectral-*, organize, interference.

main.rs: 3130 → 2518 lines.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-14 17:59:46 -04:00