# Query Language Unification Plan
**Status: DONE** (2026-04-11)
## Problem (was)
Two query parsers that didn't agree on syntax:
1. **PEG parser** (`hippocampus/query/parser.rs`): boolean logic, general
comparisons, operator precedence, parentheses. Used by the CLI and by the
compact-format path of the `query()` tool.
2. **Pipeline parser** (`hippocampus/query/engine.rs`): domain-specific
filters (type, age, provenance) and graph algorithms (spread, spectral).
Used by the full-format path of the `query()` tool.

`journal_tail` generates pipeline syntax, but on the compact path its query is
routed through the PEG parser, which does not accept that syntax. Result: parse errors.
## Approach
Keep the PEG parser, since it has the harder-to-build structural foundation
(boolean logic, operator precedence, parentheses), and extend it with the
pipeline parser's domain features.
## Expression extensions (add to `expr` rule in parser.rs)
- `field:value` shorthand for `field = 'value'` (colon-separated equality)
- `*` already works as `Expr::All`
- `key ~ 'glob'` already works via match operator
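
A minimal sketch of the `field:value` desugaring, written as a plain Rust function rather than a PEG rule. `CmpOp`, `Expr` and `desugar_colon` are hypothetical stand-ins for whatever `parser.rs` actually defines; the point is only that the shorthand yields the same node as `field = 'value'`. Rejecting values that start with a comparison operator (as in `weight:>N`) is an assumption about how the grammar keeps equality apart from the comparison stages.

```rust
/// Hypothetical comparison operators; only `Eq` is produced here.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CmpOp {
    Eq,
    Lt,
    Gt,
}

/// Hypothetical expression node for `field <op> 'value'`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Cmp { field: String, op: CmpOp, value: String },
}

/// Desugar `field:value` into the node `field = 'value'` would produce.
/// Returns `None` if the token is not a well-formed shorthand.
pub fn desugar_colon(token: &str) -> Option<Expr> {
    let (field, value) = token.trim().split_once(':')?;
    let field_ok = !field.is_empty()
        && field
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    // `weight:>N` and friends are comparisons, not equality: leave them to
    // the stage rules (an assumption, not confirmed by the plan).
    let value_ok = !value.is_empty()
        && !value.starts_with(|c: char| matches!(c, '<' | '>' | '=' | '!'));
    if field_ok && value_ok {
        Some(Expr::Cmp {
            field: field.to_string(),
            op: CmpOp::Eq,
            value: value.to_string(),
        })
    } else {
        None
    }
}
```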
## New stages (add to `stage` rule in parser.rs)
Domain filter stages from engine.rs:
- `type:X` — filter by node type (episodic, daily, weekly, monthly, semantic)
- `age:<7d` — duration comparison on timestamp
- `key:GLOB` — glob match on key
- `provenance:X` — provenance filter
- `weight:>N` — weight comparison (may already work via general comparison)
- `content-len:>N` — content size filter
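
The domain filters reduce to a `name:arg` split plus, for `age`, `weight` and `content-len`, an optional leading comparison operator. This hypothetical sketch is plain Rust, not the grammar; `Stage`, `CmpOp`, `split_cmp` and `duration_secs` are illustrative names (echoing, but not reproducing, the deleted `parse_cmp()` and `parse_duration_or_number()` helpers), and the `h`/`m` duration units are assumed, since the plan only shows days.

```rust
/// Hypothetical comparison operators for domain filters.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CmpOp {
    Lt,
    Le,
    Gt,
    Ge,
    Eq,
}

/// Hypothetical subset of pipeline stages covering the domain filters.
#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    Type(String),
    AgeSecs(CmpOp, u64),
    KeyGlob(String),
    Provenance(String),
    Weight(CmpOp, f64),
    ContentLen(CmpOp, u64),
}

const NODE_TYPES: [&str; 5] = ["episodic", "daily", "weekly", "monthly", "semantic"];

/// Split a leading comparison operator off `s`; a bare value means equality.
fn split_cmp(s: &str) -> (CmpOp, &str) {
    // Two-character operators must be tried before their one-character prefixes.
    let ops = [
        ("<=", CmpOp::Le),
        (">=", CmpOp::Ge),
        ("<", CmpOp::Lt),
        (">", CmpOp::Gt),
        ("=", CmpOp::Eq),
    ];
    for (prefix, op) in ops {
        if let Some(rest) = s.strip_prefix(prefix) {
            return (op, rest);
        }
    }
    (CmpOp::Eq, s)
}

/// Convert `7d`, `12h`, `30m` or a bare number of seconds into seconds.
/// Only days appear in the plan; the other units are an assumption.
fn duration_secs(s: &str) -> Option<u64> {
    let (digits, unit_secs) = match s.chars().last()? {
        'd' => (&s[..s.len() - 1], 86_400),
        'h' => (&s[..s.len() - 1], 3_600),
        'm' => (&s[..s.len() - 1], 60),
        _ => (s, 1),
    };
    digits.parse::<u64>().ok()?.checked_mul(unit_secs)
}

/// Parse one `name:arg` domain-filter segment, or return `None`.
pub fn parse_domain_stage(segment: &str) -> Option<Stage> {
    let (name, arg) = segment.trim().split_once(':')?;
    if arg.is_empty() {
        return None;
    }
    match name {
        "type" if NODE_TYPES.contains(&arg) => Some(Stage::Type(arg.to_string())),
        "key" => Some(Stage::KeyGlob(arg.to_string())),
        "provenance" => Some(Stage::Provenance(arg.to_string())),
        "age" => {
            let (op, rest) = split_cmp(arg);
            Some(Stage::AgeSecs(op, duration_secs(rest)?))
        }
        "weight" => {
            let (op, rest) = split_cmp(arg);
            Some(Stage::Weight(op, rest.parse::<f64>().ok()?))
        }
        "content-len" => {
            let (op, rest) = split_cmp(arg);
            Some(Stage::ContentLen(op, rest.parse::<u64>().ok()?))
        }
        _ => None,
    }
}
```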

Sort/limit syntax variants:
- `sort:field` alongside existing `sort field`
- `limit:N` alongside existing `limit N`
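
Supporting both spellings only requires that the separator after the keyword be either `:` or whitespace. A hypothetical sketch (`Stage` and `parse_sort_limit` are illustrative names, and composite sorts are not handled):

```rust
/// Hypothetical stage subset for sorting and limiting.
#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    Sort(String),
    Limit(usize),
}

/// Parse `sort:field`, `sort field`, `limit:N` or `limit N`.
pub fn parse_sort_limit(segment: &str) -> Option<Stage> {
    let segment = segment.trim();
    // The keyword ends at the first ':' or whitespace character.
    let sep = segment.find(|c: char| c == ':' || c.is_whitespace())?;
    let (keyword, rest) = segment.split_at(sep);
    // Drop the separator: a single ':' (plus any spaces) or the whitespace run.
    let arg = rest.strip_prefix(':').unwrap_or(rest).trim();
    if arg.is_empty() {
        return None;
    }
    match keyword {
        "sort" => Some(Stage::Sort(arg.to_string())),
        "limit" => arg.parse::<usize>().ok().map(Stage::Limit),
        _ => None,
    }
}
```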

Graph algorithms:
- `spread` — spreading activation
- `spectral` — spectral nearest neighbors
- `confluence` — multi-source reachability
- `geodesic` — straightest spectral paths
- `manifold` — extrapolation along seed direction
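
The plan names the graph stages but not their arguments, so this hypothetical sketch only recognizes the leading keyword and passes any trailing words through uninterpreted. `GraphAlgo` and `parse_graph_stage` are illustrative names; the algorithms themselves would stay in the execution layer.

```rust
/// Hypothetical tags for the graph-algorithm stages named in the plan.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GraphAlgo {
    Spread,
    Spectral,
    Confluence,
    Geodesic,
    Manifold,
}

/// Recognize a graph stage by its leading keyword. Trailing words are
/// returned as-is because the plan does not specify their syntax.
pub fn parse_graph_stage(segment: &str) -> Option<(GraphAlgo, Vec<String>)> {
    let mut words = segment.split_whitespace();
    let algo = match words.next()? {
        "spread" => GraphAlgo::Spread,
        "spectral" => GraphAlgo::Spectral,
        "confluence" => GraphAlgo::Confluence,
        "geodesic" => GraphAlgo::Geodesic,
        "manifold" => GraphAlgo::Manifold,
        _ => return None,
    };
    Some((algo, words.map(str::to_string).collect()))
}
```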
## What changes
1. `parser.rs` — add field:value shorthand to expr, add domain stages
2. `engine.rs` — keep run_pipeline execution logic, have PEG parser emit
compatible Stage types (or convert PEG AST to Stage at boundary)
3. `query()` tool handler (memory.rs) — one parser path for all formats
4. `journal_tail` (memory.rs) — generate unified syntax
5. CLI `poc-memory query` — uses unified parser
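
For the second option in item 2 (convert the PEG AST to `Stage` at the boundary), a `TryFrom` impl is one idiomatic way to keep parser types out of `engine.rs`. A minimal sketch, assuming hypothetical `AstStage` (parser side) and `Stage`/`CmpOp` (engine side) types:

```rust
use std::convert::TryFrom;

/// Hypothetical parser-side AST, borrowing slices of the query string.
#[derive(Debug)]
pub enum AstStage<'a> {
    Filter { field: &'a str, op: &'a str, value: &'a str },
    Sort { field: &'a str },
    Limit { n: &'a str },
}

/// Hypothetical engine-side comparison operator.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CmpOp {
    Eq,
    Lt,
    Gt,
}

/// Hypothetical engine-side stage: owned, validated, parser-agnostic.
#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    Filter { field: String, op: CmpOp, value: String },
    Sort(String),
    Limit(usize),
}

/// The boundary conversion: everything the engine receives is already checked.
impl<'a> TryFrom<AstStage<'a>> for Stage {
    type Error = String;

    fn try_from(ast: AstStage<'a>) -> Result<Self, Self::Error> {
        match ast {
            AstStage::Filter { field, op, value } => {
                let op = match op {
                    // `:` is the colon shorthand for equality.
                    "=" | ":" => CmpOp::Eq,
                    "<" => CmpOp::Lt,
                    ">" => CmpOp::Gt,
                    other => return Err(format!("unknown operator {:?}", other)),
                };
                Ok(Stage::Filter {
                    field: field.to_string(),
                    op,
                    value: value.to_string(),
                })
            }
            AstStage::Sort { field } => Ok(Stage::Sort(field.to_string())),
            AstStage::Limit { n } => n
                .parse::<usize>()
                .map(Stage::Limit)
                .map_err(|e| format!("bad limit {:?}: {}", n, e)),
        }
    }
}
```

Having the grammar emit `Stage` directly (the first option) drops this layer, at the cost of `parser.rs` depending on the engine's types; the conversion instead keeps validation such as the `limit` parse in one place.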
## Migration path
1. Add field:value shorthand and type/age/key stages to PEG parser
2. Route query() through PEG parser for all formats
3. Migrate journal_tail and any other pipeline-syntax callers
4. Remove the pipeline parser (or keep as internal execution layer)
## What was done
**Deleted from engine.rs (-153 lines):**
- `Stage::parse()` and `Stage::parse_pipeline()` — redundant with PEG
- `parse_cmp()`, `parse_duration_or_number()`, `parse_composite_sort()`,
`parse_node_type()`, `parse_sort_field()` — helper functions for deleted parser

**Added to parser.rs (+120 lines):**
- Pipeline syntax in PEG grammar (`type:X`, `age:<Nd`, `sort:field`, etc.)
- `parse_stages()` — unified entry point returning `Vec<Stage>`
- Grammar helper functions

**Net: +17 lines**

**Architecture now:**
- parser.rs: PEG grammar handles ALL parsing (both syntaxes)
- engine.rs: Pure execution — types and `run_query()`, no parsing

Result: `all | type:episodic | sort:timestamp | limit:5` works on every path
(the CLI and both `query()` formats). Mixed syntax such as
`degree > 5 | type:semantic | sort degree` also works.
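
To illustrate what a single entry point like `parse_stages()` has to accept, here is a hand-rolled sketch that splits on `|` and dispatches each segment by syntax. The actual implementation is a PEG grammar; this naive split assumes `|` never appears inside a stage (for example in a quoted value), and `Stage`, `CmpOp`, `parse_segment` and `keyword_stage` are hypothetical. An unrecognized `name:value` falls back to equality, mirroring the `field:value` shorthand.

```rust
/// Hypothetical comparison operators for general comparisons.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CmpOp {
    Lt,
    Gt,
    Eq,
}

/// Hypothetical subset of the unified `Stage` type.
#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    All,
    Cmp { field: String, op: CmpOp, value: String },
    Type(String),
    Sort(String),
    Limit(usize),
}

/// One query string in, one `Vec<Stage>` out, whichever syntax each stage uses.
pub fn parse_stages(query: &str) -> Result<Vec<Stage>, String> {
    // Naive split: assumes `|` never occurs inside a stage.
    query.split('|').map(|seg| parse_segment(seg.trim())).collect()
}

fn parse_segment(seg: &str) -> Result<Stage, String> {
    if seg == "all" || seg == "*" {
        return Ok(Stage::All);
    }
    // Pipeline syntax: `type:X`, `sort:field`, `limit:N`, or `field:value`.
    if let Some((name, arg)) = seg.split_once(':') {
        return keyword_stage(name.trim(), arg.trim());
    }
    // Space syntax: `sort field`, `limit N`, or `field op value`.
    let words: Vec<&str> = seg.split_whitespace().collect();
    match words.as_slice() {
        [kw @ ("sort" | "limit"), arg] => keyword_stage(kw, arg),
        [field, op, value] => {
            let op = match *op {
                "<" => CmpOp::Lt,
                ">" => CmpOp::Gt,
                "=" => CmpOp::Eq,
                other => return Err(format!("unknown operator {:?}", other)),
            };
            Ok(Stage::Cmp {
                field: field.to_string(),
                op,
                value: value.to_string(),
            })
        }
        _ => Err(format!("cannot parse stage {:?}", seg)),
    }
}

fn keyword_stage(name: &str, arg: &str) -> Result<Stage, String> {
    match name {
        "type" => Ok(Stage::Type(arg.to_string())),
        "sort" => Ok(Stage::Sort(arg.to_string())),
        "limit" => arg
            .parse::<usize>()
            .map(Stage::Limit)
            .map_err(|e| format!("bad limit {:?}: {}", arg, e)),
        // Any other `field:value` is the equality shorthand.
        field => Ok(Stage::Cmp {
            field: field.to_string(),
            op: CmpOp::Eq,
            value: arg.to_string(),
        }),
    }
}
```

Both example queries above come out as plain `Vec<Stage>` values, which is what lets `engine.rs` stay free of parsing.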
## What NOT to change (original note)
The `run_pipeline` execution logic stays: it's correct and well-tested.
Only the parsing front-end unifies. The pipeline parser's `Stage` enum
becomes the internal representation that both the PEG parser and any
remaining direct callers produce.
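
Because `Stage` is the contract between front-end and engine, a caller can skip parsing entirely and build stages itself. A minimal sketch, assuming a hypothetical in-memory `Node` and a three-variant `Stage`; the real `run_pipeline` handles far more stage types, and `latest_of_type` is an illustrative direct caller, not an existing function.

```rust
/// Hypothetical node record; the real memory store holds much more.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub key: String,
    pub node_type: String,
    pub timestamp: u64,
}

/// Hypothetical three-variant slice of the `Stage` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    Type(String),
    SortByTimestamp { descending: bool },
    Limit(usize),
}

/// Pure execution: applies stages in order, with no knowledge of syntax.
pub fn run_pipeline(stages: &[Stage], mut nodes: Vec<Node>) -> Vec<Node> {
    for stage in stages {
        match stage {
            Stage::Type(t) => nodes.retain(|n| &n.node_type == t),
            Stage::SortByTimestamp { descending } => {
                nodes.sort_by_key(|n| n.timestamp);
                if *descending {
                    // Newest first.
                    nodes.reverse();
                }
            }
            Stage::Limit(n) => nodes.truncate(*n),
        }
    }
    nodes
}

/// Hypothetical direct caller: builds `Stage` values without any parser.
pub fn latest_of_type(nodes: Vec<Node>, node_type: &str, n: usize) -> Vec<Node> {
    let stages = [
        Stage::Type(node_type.to_string()),
        Stage::SortByTimestamp { descending: true },
        Stage::Limit(n),
    ];
    run_pipeline(&stages, nodes)
}
```

Calling `latest_of_type(nodes, "episodic", 5)` applies the same stages as `all | type:episodic | sort:timestamp | limit:5`, assuming that sort is newest-first.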