PEG parser now handles both expression syntax (degree > 5 | sort degree) and pipeline syntax (all | type:episodic | sort:timestamp). Deleted Stage::parse() and helpers from engine.rs — it's now pure execution. All callers use parse_stages() from parser.rs as the single entry point. Co-Authored-By: Proof of Concept <poc@bcachefs.org>
# Query Language Unification Plan

**Status: DONE** (2026-04-11)

## Problem (was)

Two query parsers that didn't agree on syntax:

1. **PEG parser** (`hippocampus/query/parser.rs`) — boolean logic, general
   comparisons, operator precedence, parentheses. Used by CLI and compact
   format path in `query()` tool.

2. **Pipeline parser** (`hippocampus/query/engine.rs`) — domain-specific
   filters (type, age, provenance), graph algorithms (spread, spectral).
   Used by full format path in `query()` tool.

`journal_tail` generates pipeline syntax but gets routed through the PEG
parser on the compact path. Result: parse errors.

## Approach

Keep the PEG parser (it has the harder-to-build structural foundation)
and extend it with the pipeline parser's domain features.

## Expression extensions (add to `expr` rule in parser.rs)

- `field:value` shorthand for `field = 'value'` (colon-separated equality)
- `*` already works as `Expr::All`
- `key ~ 'glob'` already works via match operator

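The `field:value` shorthand can be pictured as a desugaring step that produces the same AST node as `field = 'value'`. The sketch below is a hand-written stand-in, not the actual PEG rule: `Op`, the `Expr::Cmp` shape, `parse_atom`, and the set of characters allowed in a field name are all assumptions made for illustration. Only `Expr::All` is named in this plan.

```rust
/// Hypothetical operator type; the real parser's AST may differ.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Eq,
}

/// Hypothetical expression node standing in for the parser's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    All,
    Cmp { field: String, op: Op, value: String },
}

/// Parse one atom: `*` becomes `Expr::All`, and `field:value` becomes the
/// node that `field = 'value'` would produce. Anything else yields `None`.
fn parse_atom(token: &str) -> Option<Expr> {
    if token == "*" {
        return Some(Expr::All);
    }
    // Split on the first colon only, so the value may itself contain colons.
    let (field, value) = token.split_once(':')?;
    if field.is_empty() || value.is_empty() {
        return None;
    }
    // Assumption: field names are identifier-like, with '-' allowed for
    // names such as `content-len`.
    let ident = |c: char| c.is_ascii_alphanumeric() || c == '_' || c == '-';
    if !field.chars().all(ident) {
        return None;
    }
    Some(Expr::Cmp {
        field: field.to_string(),
        op: Op::Eq,
        value: value.to_string(),
    })
}
```

Under these assumptions, `parse_atom("type:episodic")` and an explicit `type = 'episodic'` comparison would end up as the same `Expr::Cmp` value, so later stages do not need to know which spelling the caller used.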
## New stages (add to `stage` rule in parser.rs)

Domain filter stages from engine.rs:

- `type:X` — filter by node type (episodic, daily, weekly, monthly, semantic)
- `age:<7d` — duration comparison on timestamp
- `key:GLOB` — glob match on key
- `provenance:X` — provenance filter
- `weight:>N` — weight comparison (may already work via general comparison)
- `content-len:>N` — content size filter

Sort/limit syntax variants:

- `sort:field` alongside existing `sort field`
- `limit:N` alongside existing `limit N`

Graph algorithms:

- `spread` — spreading activation
- `spectral` — spectral nearest neighbors
- `confluence` — multi-source reachability
- `geodesic` — straightest spectral paths
- `manifold` — extrapolation along seed direction

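To make the stage forms above concrete, here is a minimal hand-rolled sketch of how a single stage such as `age:<7d` or `limit 5` might be turned into a typed value. It is not the PEG grammar itself. The `Cmp` and `Stage` shapes, the function names, and the `s`/`m`/`h` duration units are assumptions; the plan only shows the `d` suffix.

```rust
/// Hypothetical comparator type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cmp { Lt, Le, Gt, Ge, Eq }

/// Hypothetical subset of the stage representation.
#[derive(Debug, Clone, PartialEq)]
enum Stage {
    Type(String),
    Age(Cmp, u64), // duration in seconds
    Weight(Cmp, f64),
    Sort(String),
    Limit(usize),
}

/// Split a leading comparator off `s`; a bare value means equality.
fn split_cmp(s: &str) -> (Cmp, &str) {
    // Two-character operators are tested before their one-character prefixes.
    for (prefix, cmp) in [("<=", Cmp::Le), (">=", Cmp::Ge), ("<", Cmp::Lt), (">", Cmp::Gt), ("=", Cmp::Eq)] {
        if let Some(rest) = s.strip_prefix(prefix) {
            return (cmp, rest);
        }
    }
    (Cmp::Eq, s)
}

/// Parse `7d`, `12h`, `30m`, `45s`, or a bare number of seconds.
fn parse_duration_secs(s: &str) -> Option<u64> {
    let (digits, unit) = match s.char_indices().find(|&(_, c)| !c.is_ascii_digit()) {
        Some((i, _)) => s.split_at(i),
        None => (s, "s"),
    };
    let n: u64 = digits.parse().ok()?;
    let scale = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3_600,
        "d" => 86_400,
        _ => return None,
    };
    n.checked_mul(scale)
}

/// Parse one stage. `name:arg` is accepted for every stage; the
/// space-separated form is accepted only for `sort` and `limit`.
fn parse_stage(text: &str) -> Option<Stage> {
    let text = text.trim();
    let (name, arg) = match text.split_once(':') {
        Some(pair) => pair,
        None => {
            let (name, arg) = text.split_once(char::is_whitespace)?;
            if name != "sort" && name != "limit" {
                return None;
            }
            (name, arg)
        }
    };
    let arg = arg.trim();
    match name {
        "type" => Some(Stage::Type(arg.to_string())),
        "age" => {
            let (cmp, rest) = split_cmp(arg);
            Some(Stage::Age(cmp, parse_duration_secs(rest)?))
        }
        "weight" => {
            let (cmp, rest) = split_cmp(arg);
            Some(Stage::Weight(cmp, rest.parse().ok()?))
        }
        "sort" => Some(Stage::Sort(arg.to_string())),
        "limit" => Some(Stage::Limit(arg.parse().ok()?)),
        _ => None,
    }
}
```

In this sketch `parse_stage("sort:timestamp")` and `parse_stage("sort timestamp")` yield the same `Stage::Sort` value, which is the property the syntax variants are meant to guarantee.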
## What changes

1. `parser.rs` — add `field:value` shorthand to `expr`, add domain stages
2. `engine.rs` — keep `run_pipeline` execution logic, have PEG parser emit
   compatible `Stage` types (or convert PEG AST to `Stage` at boundary)
3. `query()` tool handler (`memory.rs`) — one parser path for all formats
4. `journal_tail` (`memory.rs`) — generate unified syntax
5. CLI `poc-memory query` — uses unified parser

## Migration path

1. Add `field:value` shorthand and type/age/key stages to PEG parser
2. Route `query()` through PEG parser for all formats
3. Migrate `journal_tail` and any other pipeline-syntax callers
4. Remove the pipeline parser (or keep as internal execution layer)

## What was done

**Deleted from engine.rs (-153 lines):**

- `Stage::parse()` and `Stage::parse_pipeline()` — redundant with PEG
- `parse_cmp()`, `parse_duration_or_number()`, `parse_composite_sort()`,
  `parse_node_type()`, `parse_sort_field()` — helper functions for deleted parser

**Added to parser.rs (+120 lines):**

- Pipeline syntax in PEG grammar (`type:X`, `age:<Nd`, `sort:field`, etc.)
- `parse_stages()` — unified entry point returning `Vec<Stage>`
- Grammar helper functions

**Net: +17 lines**

**Architecture now:**

- parser.rs: PEG grammar handles ALL parsing (both syntaxes)
- engine.rs: Pure execution — types and `run_query()`, no parsing

Result: `all | type:episodic | sort:timestamp | limit:5` works everywhere.
Mixed syntax like `degree > 5 | type:semantic | sort degree` also works.

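The parse/execute split described above can be illustrated with a toy execution layer that only ever sees already-parsed stages. This is a sketch under assumptions, not the contents of engine.rs: `Node`, this reduced `Stage` enum, and `run_stages` are hypothetical names, and sorting newest-first is an arbitrary choice made for the example.

```rust
/// Hypothetical node record; the real store has more fields.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    key: String,
    node_type: String,
    timestamp: u64,
}

/// Hypothetical, reduced stage representation as a parser might emit it.
#[derive(Debug, Clone, PartialEq)]
enum Stage {
    All,
    Type(String),
    SortTimestamp,
    Limit(usize),
}

/// Execution only: apply already-parsed stages left to right.
/// No string handling happens here; that is the parser's job.
fn run_stages(nodes: &[Node], stages: &[Stage]) -> Vec<Node> {
    let mut out: Vec<Node> = nodes.to_vec();
    for stage in stages {
        match stage {
            // `all` selects everything, so there is nothing to filter.
            Stage::All => {}
            Stage::Type(t) => out.retain(|n| &n.node_type == t),
            // Assumption: newest first. The real sort order may differ.
            Stage::SortTimestamp => out.sort_by(|a, b| b.timestamp.cmp(&a.timestamp)),
            Stage::Limit(n) => out.truncate(*n),
        }
    }
    out
}
```

With this shape, a query like `all | type:episodic | sort:timestamp | limit:5` would reach the execution layer as four stage values, and either surface syntax could produce them.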
## What NOT to change (original note)

The `run_pipeline` execution logic stays — it's correct and well-tested.
Only the parsing front-end unifies. The pipeline parser's `Stage` enum
becomes the internal representation that both the PEG parser and any
remaining direct callers produce.