flatten: move poc-memory contents to workspace root
No more subcrate nesting — src/, agents/, schema/, defaults/, build.rs all live at the workspace root. poc-daemon remains as the only workspace member. Crate name (poc-memory) and all imports unchanged.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
This commit is contained in:
parent 891cca57f8
commit 998b71e52c
113 changed files with 79 additions and 78 deletions
@@ -1,254 +0,0 @@
# Query Language Design — Unifying Search and Agent Selection

Date: 2026-03-10
Status: Phase 1 complete (2026-03-10)

## Problem

Agent node selection is hardcoded in Rust (`prompts.rs`). Adding a new
agent means editing Rust, recompiling, and restarting the daemon. The
existing search pipeline (spread, spectral, etc.) handles graph
exploration but can't express structured predicates on node fields.

We need one system that handles both:
- **Search**: "find nodes related to these terms" (graph exploration)
- **Selection**: "give me episodic nodes not seen by linker in 7 days,
  sorted by priority" (structured predicates)

## Design Principle

The pipeline already exists: stages compose left-to-right, each
transforming a result set. We extend it with predicate stages that
filter/sort on node metadata, alongside the existing graph algorithm
stages.

An agent definition becomes a query expression + prompt template.
The daemon scheduler is just "which queries have stale results."

## Current Pipeline

```
seeds → [stage1] → [stage2] → ... → results
```

Each stage takes `Vec<(String, f64)>` (key, score) and returns the same.
Stages are parsed from strings: `spread,max_hops=4` or `spectral,k=20`.
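
This contract can be sketched with plain functions over the scored result set; `run_pipeline` and the two toy stages below are illustrative names, not the actual `search.rs` API:

```rust
// Each stage maps a scored result set to a new one, composing left to right.
// `run_pipeline` and the toy stages below are illustrative, not the real
// search.rs API.
type ResultSet = Vec<(String, f64)>;
type Stage = Box<dyn Fn(ResultSet) -> ResultSet>;

fn run_pipeline(seeds: ResultSet, stages: Vec<Stage>) -> ResultSet {
    stages.into_iter().fold(seeds, |acc, stage| stage(acc))
}

fn main() {
    // A weight-style filter and a limit, as closures over the result set.
    let keep_high: Stage = Box::new(|rs: ResultSet| {
        rs.into_iter().filter(|(_, score)| *score > 0.5).collect()
    });
    let limit2: Stage = Box::new(|mut rs: ResultSet| {
        rs.truncate(2);
        rs
    });
    let seeds: ResultSet = vec![
        ("a".to_string(), 0.9),
        ("b".to_string(), 0.4),
        ("c".to_string(), 0.8),
        ("d".to_string(), 0.7),
    ];
    let out = run_pipeline(seeds, vec![keep_high, limit2]);
    assert_eq!(out, vec![("a".to_string(), 0.9), ("c".to_string(), 0.8)]);
}
```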

## Proposed Extension

### Four kinds of stages

**Generators** — produce a result set from nothing (or from the store):

```
all          # every non-deleted node
match:btree  # text match (current seed extraction)
```

**Filters** — narrow an existing result set:

```
type:episodic          # node_type == EpisodicSession
type:semantic          # node_type == Semantic
key:journal#j-*        # glob match on key
key-len:>=60           # key length predicate
weight:>0.5            # numeric comparison
age:<7d                # created/modified within duration
content-len:>1000      # content size filter
provenance:manual      # provenance match
not-visited:linker,7d  # not seen by agent in duration
visited:linker         # HAS been seen by agent (for auditing)
community:42           # community membership
```

**Transforms** — reorder or reshape:

```
sort:priority     # consolidation priority scoring
sort:timestamp    # by timestamp (desc by default)
sort:content-len  # by content size
sort:degree       # by graph degree
sort:weight       # by weight
limit:20          # truncate
```

**Graph algorithms** (existing, unchanged):

```
spread         # spreading activation
spectral,k=20  # spectral nearest neighbors
confluence     # multi-source reachability
geodesic       # straightest spectral paths
manifold       # extrapolation along seed direction
```
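
As a sketch of how a couple of the filter predicates above might evaluate (`key_glob` and `key_len_at_least` are hypothetical helpers, not the real stage implementations), `key:journal#j-*` and `key-len:>=60` reduce to simple string checks:

```rust
// Sketch of two filter predicates. The glob matcher only handles a single
// '*', which covers patterns like "journal#j-*"; a real implementation
// might want full glob support.
fn key_glob(pattern: &str, key: &str) -> bool {
    match pattern.split_once('*') {
        Some((pre, suf)) => {
            key.len() >= pre.len() + suf.len()
                && key.starts_with(pre)
                && key.ends_with(suf)
        }
        None => pattern == key,
    }
}

fn key_len_at_least(min: usize, key: &str) -> bool {
    key.len() >= min
}

fn main() {
    assert!(key_glob("journal#j-*", "journal#j-2026-03-10"));
    assert!(!key_glob("journal#j-*", "semantic#s-1"));
    assert!(key_len_at_least(5, "abcdef"));
}
```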

### Syntax

Pipe-separated stages, same as the current `-p` flag:

```
all | type:episodic | not-visited:linker,7d | sort:priority | limit:20
```

Or on the command line:

```
poc-memory search -p all -p type:episodic -p not-visited:linker,7d -p sort:priority -p limit:20
```

Current search still works unchanged:

```
poc-memory search btree journal -p spread
```

(terms become `match:` seeds implicitly)
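
Splitting the pipe-separated form into the same stage list the repeated `-p` flags produce is trivial; `split_pipeline` is an illustrative name:

```rust
// The "a | b | c" expression and repeated -p flags both reduce to the
// same ordered list of stage strings. Illustrative helper, not the real CLI.
fn split_pipeline(expr: &str) -> Vec<String> {
    expr.split('|').map(|s| s.trim().to_string()).collect()
}

fn main() {
    let stages = split_pipeline("all | type:episodic | limit:20");
    assert_eq!(stages, vec!["all", "type:episodic", "limit:20"]);
}
```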

### Agent definitions

A TOML file in `~/.claude/memory/agents/`:

```toml
# agents/linker.toml
[query]
pipeline = "all | type:episodic | not-visited:linker,7d | sort:priority | limit:20"

[prompt]
template = "linker.md"
placeholders = ["TOPOLOGY", "NODES"]

[execution]
model = "sonnet"
actions = ["link-add", "weight"]  # allowed poc-memory actions in response
schedule = "daily"                # or "on-demand"
```

The daemon reads agent definitions, executes their queries, fills
templates, calls the model, and records visits on success.
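
A minimal sketch of the query-extraction step, assuming the daemon only needs the `pipeline` string; a real implementation would use a proper TOML parser rather than this line scan (`pipeline_of` is a hypothetical helper):

```rust
// Minimal sketch: pull `pipeline = "..."` out of an agent definition.
// Only handles the simple one-line string form shown above; a real
// implementation would use a TOML parser.
fn pipeline_of(toml_text: &str) -> Option<String> {
    toml_text
        .lines()
        .map(str::trim)
        .find(|l| l.starts_with("pipeline"))
        .and_then(|l| l.split_once('='))
        .map(|(_, v)| v.trim().trim_matches('"').to_string())
}

fn main() {
    let def = r#"
[query]
pipeline = "all | type:episodic | limit:20"
"#;
    assert_eq!(pipeline_of(def).as_deref(), Some("all | type:episodic | limit:20"));
}
```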

### Implementation Plan

#### Phase 1: Filter stages in pipeline

Add to `search.rs`:

```rust
enum Stage {
    Generator(Generator),
    Filter(Filter),
    Transform(Transform),
    Algorithm(Algorithm), // existing
}

enum Generator {
    All,
    Match(Vec<String>), // current seed extraction
}

enum Filter {
    Type(NodeType),
    KeyGlob(String),
    KeyLen(Comparison),
    Weight(Comparison),
    Age(Comparison), // vs now - timestamp
    ContentLen(Comparison),
    Provenance(Provenance),
    NotVisited { agent: String, duration: Duration },
    Visited { agent: String },
    Community(u32),
}

enum Transform {
    Sort(SortField),
    Limit(usize),
}

enum Comparison {
    Gt(f64),
    Gte(f64),
    Lt(f64),
    Lte(f64),
    Eq(f64),
}

enum SortField {
    Priority,
    Timestamp,
    ContentLen,
    Degree,
    Weight,
}
```
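
A parser sketch over a simplified subset of these enums — only `all`, `type:`, `limit:`, and bare algorithm words; `parse_stage` and this cut-down `Stage` are illustrative, not the real `search.rs` types:

```rust
// Sketch of predicate-stage parsing against a simplified Stage enum.
// Bare words fall through to the existing algorithm names, so current
// `-p spread` syntax keeps working.
#[derive(Debug, PartialEq)]
enum Stage {
    All,
    Type(String),
    Limit(usize),
    Algorithm(String),
}

fn parse_stage(s: &str) -> Option<Stage> {
    match s.split_once(':') {
        None if s == "all" => Some(Stage::All),
        None => Some(Stage::Algorithm(s.to_string())), // e.g. "spread"
        Some(("type", v)) => Some(Stage::Type(v.to_string())),
        Some(("limit", v)) => v.parse().ok().map(Stage::Limit),
        Some(_) => None, // other predicates omitted from this sketch
    }
}

fn main() {
    assert_eq!(parse_stage("all"), Some(Stage::All));
    assert_eq!(parse_stage("type:episodic"), Some(Stage::Type("episodic".into())));
    assert_eq!(parse_stage("limit:20"), Some(Stage::Limit(20)));
    assert_eq!(parse_stage("spread"), Some(Stage::Algorithm("spread".into())));
}
```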

The pipeline runner checks stage type:
- Generator: ignores input, produces a new result set
- Filter: keeps items matching the predicate, preserves scores
- Transform: reorders or truncates
- Algorithm: existing graph exploration (needs Graph)

Filter/Transform stages need access to the Store (for node fields)
and the VisitIndex (for visit predicates). The `StoreView` trait already
provides node access; extend it for visits.

#### Phase 2: Agent-as-config

Parse TOML agent definitions. The daemon:
1. Reads `agents/*.toml`
2. For each with `schedule = "daily"`, checks if query results have
   been visited recently enough
3. If stale, executes: parse pipeline → run query → format nodes →
   fill template → call model → parse actions → record visits

Hot reload: watch the agents directory, pick up changes without restart.
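
The staleness check in step 2 can be sketched against `SystemTime`; `is_stale` and its arguments are illustrative, not the daemon's actual types:

```rust
use std::time::{Duration, SystemTime};

// Sketch of the daemon's staleness test: an agent is due when its last
// successful run is older than its schedule interval (or it never ran).
fn is_stale(last_run: Option<SystemTime>, interval: Duration, now: SystemTime) -> bool {
    match last_run {
        None => true, // never run: always due
        Some(t) => now.duration_since(t).map(|d| d >= interval).unwrap_or(false),
    }
}

fn main() {
    let day = Duration::from_secs(24 * 3600);
    let now = SystemTime::now();
    assert!(is_stale(None, day, now));
    assert!(is_stale(Some(now - 2 * day), day, now));
    assert!(!is_stale(Some(now - day / 2), day, now));
}
```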

#### Phase 3: Retire hardcoded agents

Migrate each hardcoded agent (replay, linker, separator, transfer,
rename, split) to a TOML definition. Remove the match arms from
`agent_prompt()`. The separator agent is the trickiest — its
"interference pair" selection is a join-like operation that may need
a custom generator stage rather than simple filtering.

## What we're NOT building

- A general-purpose SQL engine. No joins, no GROUP BY, no subqueries.
- Persistent indices. At ~13k nodes, a full scan with predicate evaluation
  is fast enough (~1ms). Add indices later if profiling demands it.
- A query optimizer. Pipeline stages execute in declaration order.

## StoreView Considerations

The existing `StoreView` trait only exposes `(key, content, weight)`.
Filter stages need access to `node_type`, `timestamp`, `key`, etc.

Options:
- (a) Expand StoreView with `node_meta()` returning a lightweight struct
- (b) Filter stages require `&Store` directly (not trait-polymorphic)
- (c) Add `fn node(&self, key: &str) -> Option<NodeRef>` to StoreView

Option (b) is simplest for now — agents always use a full Store. The
search hook (MmapView path) doesn't need agent filters. We can
generalize to (c) later if MmapView needs filter support.

For Phase 1, filter stages take `&Store` and the pipeline runner
dispatches: algorithm stages use `&dyn StoreView`, filter/transform
stages use `&Store`. This keeps the fast MmapView path for interactive
search untouched.
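
The Phase 1 dispatch can be sketched with stand-in types: algorithm stages stay generic over a `StoreView`-like trait while filter stages take the concrete store. Everything here is a simplified stand-in for the real types:

```rust
// Stand-in for the real StoreView trait: only content access.
trait StoreView {
    fn content(&self, key: &str) -> Option<String>;
}

// Stand-in for the real Store, which also carries full node metadata.
struct Store;

impl StoreView for Store {
    fn content(&self, _key: &str) -> Option<String> {
        Some("body".into())
    }
}

impl Store {
    // Metadata only the concrete Store exposes (option (b) in the text).
    fn node_type(&self, _key: &str) -> &'static str {
        "episodic"
    }
}

// Filter stages take &Store: they need metadata beyond the trait.
fn run_filter(store: &Store, keys: Vec<&str>) -> Vec<String> {
    keys.into_iter()
        .filter(|k| store.node_type(k) == "episodic")
        .map(String::from)
        .collect()
}

// Algorithm stages stay generic: MmapView could implement StoreView too.
fn run_algorithm(view: &dyn StoreView, key: &str) -> usize {
    view.content(key).map(|c| c.len()).unwrap_or(0)
}

fn main() {
    let store = Store;
    assert_eq!(run_filter(&store, vec!["a"]), vec!["a".to_string()]);
    assert_eq!(run_algorithm(&store, "a"), 4);
}
```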

## Open Questions

1. **Separator agent**: Its "interference pairs" selection doesn't fit
   the filter model cleanly. The best option is a custom generator stage
   `interference-pairs,min_sim=0.5` that produces pair keys.

2. **Priority scoring**: `sort:priority` calls `consolidation_priority()`,
   which needs graph + spectral. This is a transform that needs the
   full pipeline context — treat it as a "heavy sort" that's allowed
   to compute.

3. **Duration syntax**: `7d`, `24h`, `30m`. Parse with a simple regex
   `(\d+)(d|h|m)` → seconds.

4. **Negation**: Prefix `!` on a predicate: `!type:episodic`.

5. **Backwards compatibility**: The current `-p spread` syntax must keep
   working. The parser tries algorithm names first, then predicate
   syntax. There is no ambiguity since algorithms are bare words and
   predicates use `:`.

6. **Stage ordering**: Generators must come first (or the pipeline
   starts with an implicit "all"). Filters/transforms can interleave
   freely with algorithms. The runner validates this at parse time.
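
The duration grammar from question 3 is small enough to hand-roll without a regex; `parse_duration_secs` is an illustrative helper:

```rust
// "7d" / "24h" / "30m" → seconds, per open question 3.
// Assumes ASCII input (the grammar only allows digits plus d/h/m).
fn parse_duration_secs(s: &str) -> Option<u64> {
    let (num, unit) = s.split_at(s.len().checked_sub(1)?);
    let n: u64 = num.parse().ok()?;
    match unit {
        "d" => Some(n * 86_400),
        "h" => Some(n * 3_600),
        "m" => Some(n * 60),
        _ => None,
    }
}

fn main() {
    assert_eq!(parse_duration_secs("7d"), Some(604_800));
    assert_eq!(parse_duration_secs("24h"), Some(86_400));
    assert_eq!(parse_duration_secs("30m"), Some(1_800));
    assert_eq!(parse_duration_secs("x"), None);
}
```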