# Query Language Design — Unifying Search and Agent Selection

Date: 2026-03-10
Status: Phase 1 complete (2026-03-10)

## Problem

Agent node selection is hardcoded in Rust (`prompts.rs`). Adding a new agent means editing Rust, recompiling, and restarting the daemon. The existing search pipeline (spread, spectral, etc.) handles graph exploration but can't express structured predicates on node fields.

We need one system that handles both:

- **Search**: "find nodes related to these terms" (graph exploration)
- **Selection**: "give me episodic nodes not seen by linker in 7 days, sorted by priority" (structured predicates)

## Design Principle

The pipeline already exists: stages compose left-to-right, each transforming a result set. We extend it with predicate stages that filter and sort on node metadata, alongside the existing graph algorithm stages.

An agent definition becomes a query expression plus a prompt template. The daemon scheduler is then just "which queries have stale results."

## Current Pipeline

```
seeds → [stage1] → [stage2] → ... → results
```

Each stage takes a `Vec<(String, f64)>` of (key, score) pairs and returns the same. Stages are parsed from strings: `spread,max_hops=4` or `spectral,k=20`.
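The stage contract above can be sketched as plain function composition. The names here (`ResultSet`, `Stage`, `run_pipeline`) are illustrative assumptions, not the actual `search.rs` types:

```rust
// Hypothetical sketch: each stage maps a scored result set to a scored
// result set, and the pipeline folds stages over the seeds left-to-right.
type ResultSet = Vec<(String, f64)>;
type Stage = Box<dyn Fn(ResultSet) -> ResultSet>;

fn run_pipeline(seeds: ResultSet, stages: &[Stage]) -> ResultSet {
    stages.iter().fold(seeds, |acc, stage| stage(acc))
}

fn main() {
    // Two toy stages standing in for real ones like `sort:weight` and `limit:1`.
    let sort_desc: Stage = Box::new(|mut r: ResultSet| {
        r.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        r
    });
    let limit_1: Stage = Box::new(|mut r: ResultSet| {
        r.truncate(1);
        r
    });
    let seeds = vec![("btree".to_string(), 0.4), ("journal".to_string(), 0.9)];
    let out = run_pipeline(seeds, &[sort_desc, limit_1]);
    assert_eq!(out, vec![("journal".to_string(), 0.9)]);
}
```

Because every stage has the same shape, filters and transforms slot in without changing the runner.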
## Proposed Extension

### Two kinds of stages

**Generators** — produce a result set from nothing (or from the store):

```
all            # every non-deleted node
match:btree    # text match (current seed extraction)
```

**Filters** — narrow an existing result set:

```
type:episodic          # node_type == EpisodicSession
type:semantic          # node_type == Semantic
key:journal#j-*        # glob match on key
key-len:>=60           # key length predicate
weight:>0.5            # numeric comparison
age:<7d                # created/modified within duration
content-len:>1000      # content size filter
provenance:manual      # provenance match
not-visited:linker,7d  # not seen by agent in duration
visited:linker         # HAS been seen by agent (for auditing)
community:42           # community membership
```

**Transforms** — reorder or reshape:

```
sort:priority     # consolidation priority scoring
sort:timestamp    # by timestamp (desc by default)
sort:content-len  # by content size
sort:degree       # by graph degree
sort:weight       # by weight
limit:20          # truncate
```

**Graph algorithms** (existing, unchanged):

```
spread          # spreading activation
spectral,k=20   # spectral nearest neighbors
confluence      # multi-source reachability
geodesic        # straightest spectral paths
manifold        # extrapolation along seed direction
```

### Syntax

Pipe-separated stages, same as the current `-p` flag:

```
all | type:episodic | not-visited:linker,7d | sort:priority | limit:20
```

Or on the command line:

```
poc-memory search -p all -p type:episodic -p not-visited:linker,7d -p sort:priority -p limit:20
```

Current search still works unchanged:

```
poc-memory search btree journal -p spread
```

(terms become `match:` seeds implicitly)

### Agent definitions

A TOML file in `~/.claude/memory/agents/`:

```toml
# agents/linker.toml
[query]
pipeline = "all | type:episodic | not-visited:linker,7d | sort:priority | limit:20"

[prompt]
template = "linker.md"
placeholders = ["TOPOLOGY", "NODES"]

[execution]
model = "sonnet"
actions = ["link-add", "weight"]  # allowed poc-memory actions in response
schedule = "daily"                # or "on-demand"
```

The daemon reads agent definitions, executes their queries, fills templates, calls the model, and records visits on success.

### Implementation Plan

#### Phase 1: Filter stages in pipeline

Add to `search.rs`:

```rust
enum Stage {
    Generator(Generator),
    Filter(Filter),
    Transform(Transform),
    Algorithm(Algorithm), // existing
}

enum Generator {
    All,
    Match(Vec<String>), // current seed extraction
}

enum Filter {
    Type(NodeType),
    KeyGlob(String),
    KeyLen(Comparison),
    Weight(Comparison),
    Age(Comparison), // vs now - timestamp
    ContentLen(Comparison),
    Provenance(Provenance),
    NotVisited { agent: String, duration: Duration },
    Visited { agent: String },
    Community(u32),
}

enum Transform {
    Sort(SortField),
    Limit(usize),
}

enum Comparison {
    Gt(f64),
    Gte(f64),
    Lt(f64),
    Lte(f64),
    Eq(f64),
}

enum SortField {
    Priority,
    Timestamp,
    ContentLen,
    Degree,
    Weight,
}
```

The pipeline runner checks the stage type:

- Generator: ignores input, produces a new result set
- Filter: keeps items matching the predicate, preserves scores
- Transform: reorders or truncates
- Algorithm: existing graph exploration (needs Graph)

Filter/Transform stages need access to the Store (for node fields) and the VisitIndex (for visit predicates). The `StoreView` trait already provides node access; extend it for visits.

#### Phase 2: Agent-as-config

Parse TOML agent definitions. The daemon:

1. Reads `agents/*.toml`
2. For each with `schedule = "daily"`, checks whether the query results have been visited recently enough
3. If stale, executes: parse pipeline → run query → format nodes → fill template → call model → parse actions → record visits

Hot reload: watch the agents directory and pick up changes without a restart.

#### Phase 3: Retire hardcoded agents

Migrate each hardcoded agent (replay, linker, separator, transfer, rename, split) to a TOML definition. Remove the match arms from `agent_prompt()`.
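Stepping back to Phase 1, the filter semantics (keep matching items, preserve scores, drop unknown keys) could look like this minimal sketch. `NodeMeta` and the `lookup` closure are stand-ins for the real `Store` API, not its actual shape:

```rust
// Assumed lightweight metadata; the real Store exposes richer node fields.
struct NodeMeta {
    weight: f64,
}

enum Comparison {
    Gt(f64),
    Gte(f64),
    Lt(f64),
    Lte(f64),
    Eq(f64),
}

fn compare(value: f64, cmp: &Comparison) -> bool {
    match cmp {
        Comparison::Gt(x) => value > *x,
        Comparison::Gte(x) => value >= *x,
        Comparison::Lt(x) => value < *x,
        Comparison::Lte(x) => value <= *x,
        Comparison::Eq(x) => (value - x).abs() < f64::EPSILON,
    }
}

// A `weight:>0.5`-style filter stage: scores pass through untouched;
// `lookup` stands in for store access, and unknown keys are dropped.
fn apply_weight_filter(
    results: Vec<(String, f64)>,
    cmp: &Comparison,
    lookup: impl Fn(&str) -> Option<NodeMeta>,
) -> Vec<(String, f64)> {
    results
        .into_iter()
        .filter(|(key, _)| lookup(key).map_or(false, |m| compare(m.weight, cmp)))
        .collect()
}

fn main() {
    let lookup = |key: &str| match key {
        "a" => Some(NodeMeta { weight: 0.9 }),
        "b" => Some(NodeMeta { weight: 0.2 }),
        _ => None,
    };
    let results = vec![("a".to_string(), 1.0), ("b".to_string(), 0.8)];
    let kept = apply_weight_filter(results, &Comparison::Gt(0.5), lookup);
    assert_eq!(kept, vec![("a".to_string(), 1.0)]);
}
```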
The separator agent is the trickiest — its "interference pair" selection is a join-like operation that may need a custom generator stage rather than simple filtering.

## What we're NOT building

- A general-purpose SQL engine. No joins, no GROUP BY, no subqueries.
- Persistent indices. At ~13k nodes, a full scan with predicate evaluation is fast enough (~1ms). Add indices later if profiling demands it.
- A query optimizer. Pipeline stages execute in declaration order.

## StoreView Considerations

The existing `StoreView` trait only exposes `(key, content, weight)`. Filter stages need access to `node_type`, `timestamp`, `key`, etc. Options:

- (a) Expand StoreView with `node_meta()` returning a lightweight struct
- (b) Filter stages require `&Store` directly (not trait-polymorphic)
- (c) Add `fn node(&self, key: &str) -> Option<Node>` to StoreView

Option (b) is simplest for now — agents always use a full Store. The search hook (MmapView path) doesn't need agent filters. We can generalize to (c) later if MmapView needs filter support.

For Phase 1, filter stages take `&Store` and the pipeline runner dispatches: algorithm stages use `&dyn StoreView`, filter/transform stages use `&Store`. This keeps the fast MmapView path for interactive search untouched.

## Open Questions

1. **Separator agent**: Its "interference pairs" selection doesn't fit the filter model cleanly. The best option is a custom generator stage `interference-pairs,min_sim=0.5` that produces pair keys.
2. **Priority scoring**: `sort:priority` calls `consolidation_priority()`, which needs graph + spectral. This is a transform that needs the full pipeline context — treat it as a "heavy sort" that's allowed to compute.
3. **Duration syntax**: `7d`, `24h`, `30m`. Parse with a simple regex `(\d+)(d|h|m)` → seconds.
4. **Negation**: Prefix `!` on a predicate: `!type:episodic`.
5. **Backwards compatibility**: The current `-p spread` syntax must keep working. The parser tries algorithm names first, then predicate syntax.
   There is no ambiguity since algorithms are bare words and predicates use `:`.
6. **Stage ordering**: Generators must come first (or the pipeline starts with an implicit `all`). Filters/transforms can interleave freely with algorithms. The runner validates this at parse time.
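The duration syntax in question 3 is small enough to hand-roll. This sketch skips the regex entirely and splits off the unit suffix instead — a substitution for illustration, not the planned implementation:

```rust
// Parse `7d` / `24h` / `30m` into seconds, mirroring the proposed
// `(\d+)(d|h|m)` regex without the dependency. Unknown units -> None.
fn parse_duration_secs(s: &str) -> Option<u64> {
    if s.len() < 2 || !s.is_ascii() {
        return None; // too short to hold a number plus a unit
    }
    let (num, unit) = s.split_at(s.len() - 1);
    let n: u64 = num.parse().ok()?;
    match unit {
        "d" => Some(n * 86_400),
        "h" => Some(n * 3_600),
        "m" => Some(n * 60),
        _ => None,
    }
}

fn main() {
    assert_eq!(parse_duration_secs("7d"), Some(604_800));
    assert_eq!(parse_duration_secs("30m"), Some(1_800));
    assert_eq!(parse_duration_secs("oops"), None);
}
```

Whichever parser is used, `not-visited:linker,7d` and `age:<7d` can share it, so the two predicates can't drift apart on accepted units.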