# Query Language Design — Unifying Search and Agent Selection

Date: 2026-03-10
Status: Phase 1 complete (2026-03-10)

## Problem

Agent node selection is hardcoded in Rust (`prompts.rs`). Adding a new
agent means editing Rust, recompiling, restarting the daemon. The
existing search pipeline (spread, spectral, etc.) handles graph
exploration but can't express structured predicates on node fields.

We need one system that handles both:

- **Search**: "find nodes related to these terms" (graph exploration)
- **Selection**: "give me episodic nodes not seen by linker in 7 days,
  sorted by priority" (structured predicates)

## Design Principle

The pipeline already exists: stages compose left-to-right, each
transforming a result set. We extend it with predicate stages that
filter/sort on node metadata, alongside the existing graph algorithm
stages.

An agent definition becomes a query expression + prompt template.
The daemon scheduler is just "which queries have stale results."

## Current Pipeline

```
seeds → [stage1] → [stage2] → ... → results
```

Each stage takes a `Vec<(String, f64)>` of (key, score) pairs and returns the same.
Stages are parsed from strings: `spread,max_hops=4` or `spectral,k=20`.

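As a concrete sketch of this contract (the type alias and stage functions below are illustrative stand-ins, not the actual `search.rs` API):

```rust
// Illustrative sketch of the stage contract described above; the type
// alias and function names are stand-ins, not the actual search.rs API.
type Scored = Vec<(String, f64)>;

// A stage transforms one scored result set into another.
fn threshold(input: Scored, min: f64) -> Scored {
    input.into_iter().filter(|&(_, s)| s >= min).collect()
}

fn limit(input: Scored, n: usize) -> Scored {
    input.into_iter().take(n).collect()
}

fn main() {
    let seeds: Scored = vec![
        ("journal#j-1".to_string(), 0.9),
        ("note#n-1".to_string(), 0.4),
        ("journal#j-2".to_string(), 0.7),
    ];
    // Stages compose left-to-right, exactly as in the pipeline.
    let out = limit(threshold(seeds, 0.5), 1);
    assert_eq!(out, vec![("journal#j-1".to_string(), 0.9)]);
}
```

Because every stage shares this shape, new predicate stages slot in without touching the existing graph-algorithm stages.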
## Proposed Extension

### Two kinds of stages

**Generators** — produce a result set from nothing (or from the store):
```
all          # every non-deleted node
match:btree  # text match (current seed extraction)
```

**Filters** — narrow an existing result set:
```
type:episodic          # node_type == EpisodicSession
type:semantic          # node_type == Semantic
key:journal#j-*        # glob match on key
key-len:>=60           # key length predicate
weight:>0.5            # numeric comparison
age:<7d                # created/modified within duration
content-len:>1000      # content size filter
provenance:manual      # provenance match
not-visited:linker,7d  # not seen by agent in duration
visited:linker         # HAS been seen by agent (for auditing)
community:42           # community membership
```

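For the `key:` glob, a minimal `*`-only matcher could look like this (an illustrative sketch, not the poc-memory implementation; it supports `*` as "any run of characters" and literal matching only):

```rust
// Sketch of the `*`-only glob matching assumed by `key:journal#j-*`.
// Illustrative only; the real parser may use a glob crate instead.
fn glob_match(pattern: &str, key: &str) -> bool {
    match pattern.split_once('*') {
        // No wildcard left: the remainder must match exactly.
        None => pattern == key,
        Some((prefix, rest)) => {
            if !key.starts_with(prefix) {
                return false;
            }
            let tail = &key[prefix.len()..];
            // Let `*` consume every possible span, then recurse.
            (0..=tail.len()).any(|i| glob_match(rest, &tail[i..]))
        }
    }
}

fn main() {
    assert!(glob_match("journal#j-*", "journal#j-42"));
    assert!(!glob_match("journal#j-*", "note#n-1"));
    assert!(glob_match("*", "anything"));
}
```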
**Transforms** — reorder or reshape:
```
sort:priority     # consolidation priority scoring
sort:timestamp    # by timestamp (desc by default)
sort:content-len  # by content size
sort:degree       # by graph degree
sort:weight       # by weight
limit:20          # truncate
```

**Graph algorithms** (existing, unchanged):
```
spread         # spreading activation
spectral,k=20  # spectral nearest neighbors
confluence     # multi-source reachability
geodesic       # straightest spectral paths
manifold       # extrapolation along seed direction
```

### Syntax

Pipe-separated stages, same as the current `-p` flag:

```
all | type:episodic | not-visited:linker,7d | sort:priority | limit:20
```

Or on the command line:

```
poc-memory search -p all -p type:episodic -p not-visited:linker,7d -p sort:priority -p limit:20
```

Current search still works unchanged:

```
poc-memory search btree journal -p spread
```

(terms become `match:` seeds implicitly)

### Agent definitions

A TOML file in `~/.claude/memory/agents/`:

```toml
# agents/linker.toml
[query]
pipeline = "all | type:episodic | not-visited:linker,7d | sort:priority | limit:20"

[prompt]
template = "linker.md"
placeholders = ["TOPOLOGY", "NODES"]

[execution]
model = "sonnet"
actions = ["link-add", "weight"]  # allowed poc-memory actions in response
schedule = "daily"                # or "on-demand"
```

The daemon reads agent definitions, executes their queries, fills
templates, calls the model, records visits on success.

### Implementation Plan

#### Phase 1: Filter stages in pipeline

Add to `search.rs`:

```rust
enum Stage {
    Generator(Generator),
    Filter(Filter),
    Transform(Transform),
    Algorithm(Algorithm), // existing
}

enum Generator {
    All,
    Match(Vec<String>), // current seed extraction
}

enum Filter {
    Type(NodeType),
    KeyGlob(String),
    KeyLen(Comparison),
    Weight(Comparison),
    Age(Comparison), // vs now - timestamp
    ContentLen(Comparison),
    Provenance(Provenance),
    NotVisited { agent: String, duration: Duration },
    Visited { agent: String },
    Community(u32),
}

enum Transform {
    Sort(SortField),
    Limit(usize),
}

enum Comparison {
    Gt(f64),
    Gte(f64),
    Lt(f64),
    Lte(f64),
    Eq(f64),
}

enum SortField {
    Priority,
    Timestamp,
    ContentLen,
    Degree,
    Weight,
}
```

The pipeline runner checks stage type:

- Generator: ignores input, produces new result set
- Filter: keeps items matching predicate, preserves scores
- Transform: reorders or truncates
- Algorithm: existing graph exploration (needs Graph)

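A sketch of how `Comparison` might be parsed and evaluated for predicates like `weight:>0.5` or `key-len:>=60` (illustrative; the real parser in `search.rs` may differ):

```rust
// Illustrative parse/evaluate sketch for the Comparison enum above.
#[derive(Debug, PartialEq)]
enum Comparison {
    Gt(f64),
    Gte(f64),
    Lt(f64),
    Lte(f64),
    Eq(f64),
}

// Parse the value part of a predicate, e.g. ">0.5" or ">=60".
// Two-character operators must be tried before their one-character prefixes.
fn parse_comparison(s: &str) -> Option<Comparison> {
    if let Some(r) = s.strip_prefix(">=") {
        r.parse().ok().map(Comparison::Gte)
    } else if let Some(r) = s.strip_prefix("<=") {
        r.parse().ok().map(Comparison::Lte)
    } else if let Some(r) = s.strip_prefix('>') {
        r.parse().ok().map(Comparison::Gt)
    } else if let Some(r) = s.strip_prefix('<') {
        r.parse().ok().map(Comparison::Lt)
    } else {
        s.parse().ok().map(Comparison::Eq)
    }
}

impl Comparison {
    fn matches(&self, value: f64) -> bool {
        match *self {
            Comparison::Gt(x) => value > x,
            Comparison::Gte(x) => value >= x,
            Comparison::Lt(x) => value < x,
            Comparison::Lte(x) => value <= x,
            Comparison::Eq(x) => value == x, // exact f64 equality; fine for key-len etc.
        }
    }
}

fn main() {
    // weight:>0.5 against a node with weight 0.7
    assert!(parse_comparison(">0.5").unwrap().matches(0.7));
    // key-len:>=60 against a 42-character key
    assert!(!parse_comparison(">=60").unwrap().matches(42.0));
}
```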
Filter/Transform stages need access to the Store (for node fields)
and VisitIndex (for visit predicates). The `StoreView` trait already
provides node access; extend it for visits.

#### Phase 2: Agent-as-config

Parse TOML agent definitions. The daemon:

1. Reads `agents/*.toml`
2. For each with `schedule = "daily"`, checks if query results have
   been visited recently enough
3. If stale, executes: parse pipeline → run query → format nodes →
   fill template → call model → parse actions → record visits

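The staleness check in step 2 might look like this (a stdlib-only sketch; `last_visit` and `max_age` are assumed names standing in for the VisitIndex lookup and the schedule-derived threshold):

```rust
use std::time::{Duration, SystemTime};

// Illustrative staleness check for the daemon loop described above.
// `last_visit` would come from the VisitIndex; `max_age` from the
// agent's schedule. Both names are assumptions, not the real API.
fn is_stale(last_visit: Option<SystemTime>, max_age: Duration) -> bool {
    match last_visit {
        None => true, // never run: always stale
        Some(t) => SystemTime::now()
            .duration_since(t)
            .map(|elapsed| elapsed > max_age)
            .unwrap_or(false), // clock went backwards: treat as fresh
    }
}

fn main() {
    assert!(is_stale(None, Duration::from_secs(86_400)));
    assert!(!is_stale(Some(SystemTime::now()), Duration::from_secs(86_400)));
}
```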
Hot reload: watch the agents directory, pick up changes without restart.

#### Phase 3: Retire hardcoded agents

Migrate each hardcoded agent (replay, linker, separator, transfer,
rename, split) to a TOML definition. Remove the match arms from
`agent_prompt()`. The separator agent is the trickiest — its
"interference pair" selection is a join-like operation that may need
a custom generator stage rather than simple filtering.

## What we're NOT building

- A general-purpose SQL engine. No joins, no GROUP BY, no subqueries.
- Persistent indices. At ~13k nodes, full scan with predicate evaluation
  is fast enough (~1ms). Add indices later if profiling demands it.
- A query optimizer. Pipeline stages execute in declaration order.

## StoreView Considerations

The existing `StoreView` trait only exposes `(key, content, weight)`.
Filter stages need access to `node_type`, `timestamp`, `key`, etc.

Options:

- (a) Expand StoreView with `node_meta()` returning a lightweight struct
- (b) Filter stages require `&Store` directly (not trait-polymorphic)
- (c) Add `fn node(&self, key: &str) -> Option<NodeRef>` to StoreView

Option (b) is simplest for now — agents always use a full Store. The
search hook (MmapView path) doesn't need agent filters. We can
generalize to (c) later if MmapView needs filter support.

For Phase 1, filter stages take `&Store` and the pipeline runner
dispatches: algorithm stages use `&dyn StoreView`, filter/transform
stages use `&Store`. This keeps the fast MmapView path for interactive
search untouched.

## Open Questions

1. **Separator agent**: Its "interference pairs" selection doesn't fit
   the filter model cleanly. Best option is a custom generator stage
   `interference-pairs,min_sim=0.5` that produces pair keys.

2. **Priority scoring**: `sort:priority` calls `consolidation_priority()`
   which needs graph + spectral. This is a transform that needs the
   full pipeline context — treat it as a "heavy sort" that's allowed
   to compute.

3. **Duration syntax**: `7d`, `24h`, `30m`. Parse with simple regex
   `(\d+)(d|h|m)` → seconds.

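The note above suggests a regex; an equivalent stdlib-only sketch of the same split (function name is illustrative):

```rust
// Stdlib-only sketch of the duration parser from point 3: split off the
// trailing unit character and multiply. Returns seconds for "7d", "24h",
// "30m"; None for anything malformed.
fn parse_duration_secs(s: &str) -> Option<u64> {
    let (num, unit) = s.split_at(s.len().checked_sub(1)?);
    let n: u64 = num.parse().ok()?;
    let mult = match unit {
        "d" => 86_400,
        "h" => 3_600,
        "m" => 60,
        _ => return None,
    };
    Some(n * mult)
}

fn main() {
    assert_eq!(parse_duration_secs("7d"), Some(604_800));
    assert_eq!(parse_duration_secs("24h"), Some(86_400));
    assert_eq!(parse_duration_secs("x"), None);
}
```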
4. **Negation**: Prefix `!` on predicate: `!type:episodic`.

5. **Backwards compatibility**: Current `-p spread` syntax must keep
   working. The parser tries algorithm names first, then predicate
   syntax. No ambiguity since algorithms are bare words and predicates
   use `:`.

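The bare-word vs. `:` distinction from point 5 can be sketched as a tiny classifier (names are illustrative, not the real parser):

```rust
// Sketch of the dispatch in point 5: algorithm stages are bare words,
// possibly with `,key=value` options; predicates always contain `:`
// in the stage name itself.
#[derive(Debug, PartialEq)]
enum StageKind {
    Algorithm,
    Predicate,
}

fn classify(stage: &str) -> StageKind {
    // Inspect only the stage name before any `,`-separated options,
    // so `not-visited:linker,7d` is still seen as a predicate.
    let name = stage.split(',').next().unwrap_or(stage);
    if name.contains(':') {
        StageKind::Predicate
    } else {
        StageKind::Algorithm
    }
}

fn main() {
    assert_eq!(classify("spread,max_hops=4"), StageKind::Algorithm);
    assert_eq!(classify("type:episodic"), StageKind::Predicate);
    assert_eq!(classify("not-visited:linker,7d"), StageKind::Predicate);
}
```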
6. **Stage ordering**: Generators must come first (or the pipeline
   starts with implicit "all"). Filters/transforms can interleave
   freely with algorithms. The runner validates this at parse time.
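One possible parse-time normalization for point 6, assuming the implicit-`all` variant (a sketch; the generator test and function name are stand-ins):

```rust
// Sketch of the parse-time rule in point 6: if the pipeline does not
// start with a generator, prepend an implicit `all`. The generator
// check here covers only the two generators named in this document.
fn normalize(mut stages: Vec<String>) -> Vec<String> {
    let is_generator = |s: &str| s == "all" || s.starts_with("match:");
    if stages.first().map_or(true, |s| !is_generator(s)) {
        stages.insert(0, "all".to_string());
    }
    stages
}

fn main() {
    // A filter-first pipeline gains an implicit generator...
    let p = normalize(vec!["type:episodic".to_string(), "limit:20".to_string()]);
    assert_eq!(p[0], "all");
    // ...while a generator-first pipeline is left alone.
    let q = normalize(vec!["all".to_string(), "limit:20".to_string()]);
    assert_eq!(q.len(), 2);
}
```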