PEG parser now handles both expression syntax (degree > 5 | sort degree) and pipeline syntax (all | type:episodic | sort:timestamp). Deleted Stage::parse() and helpers from engine.rs — it's now pure execution. All callers use parse_stages() from parser.rs as the single entry point. Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Query Language Unification Plan
Status: DONE (2026-04-11)
Problem (was)
Two query parsers that didn't agree on syntax:
- PEG parser (hippocampus/query/parser.rs): boolean logic, general comparisons, operator precedence, parentheses. Used by the CLI and the compact format path in the query() tool.
- Pipeline parser (hippocampus/query/engine.rs): domain-specific filters (type, age, provenance), graph algorithms (spread, spectral). Used by the full format path in the query() tool.
journal_tail generates pipeline syntax but gets routed through the PEG
parser on the compact path. Result: parse errors.
Approach
Keep the PEG parser, which has the harder-to-build structural foundation, and extend it with the pipeline parser's domain features.
Expression extensions (add to expr rule in parser.rs)
- field:value shorthand for field = 'value' (colon-separated equality)
- * already works as Expr::All
- key ~ 'glob' already works via match operator
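To make the shorthand concrete, here is a minimal hand-written sketch of how field:value could desugar to the same node as field = 'value'. It is not the PEG grammar from parser.rs: the Expr variants Eq and Glob, the unquote helper, and parse_simple_expr are hypothetical names chosen for illustration (only Expr::All is named in this plan), and the sketch ignores precedence, parentheses, and escaping.

```rust
/// Hypothetical, simplified expression node. Only `All` is named in the
/// plan; `Eq` and `Glob` are assumptions made for this sketch.
#[derive(Debug, PartialEq)]
enum Expr {
    All,
    Eq { field: String, value: String },
    Glob { field: String, pattern: String },
}

/// Strip one pair of surrounding single quotes, if present.
fn unquote(s: &str) -> Option<&str> {
    s.strip_prefix('\'')?.strip_suffix('\'')
}

/// Parse `*`, `field ~ 'glob'`, `field = 'value'`, or the `field:value`
/// shorthand. Returns `None` for anything else.
fn parse_simple_expr(input: &str) -> Option<Expr> {
    let s = input.trim();
    if s == "*" {
        return Some(Expr::All);
    }
    if let Some((field, rhs)) = s.split_once('~') {
        let pattern = unquote(rhs.trim())?;
        return Some(Expr::Glob {
            field: field.trim().to_string(),
            pattern: pattern.to_string(),
        });
    }
    if let Some((field, rhs)) = s.split_once('=') {
        let value = unquote(rhs.trim())?;
        return Some(Expr::Eq {
            field: field.trim().to_string(),
            value: value.to_string(),
        });
    }
    // Shorthand: `field:value` produces the same node as `field = 'value'`.
    let (field, value) = s.split_once(':')?;
    if field.is_empty() || value.is_empty() {
        return None;
    }
    Some(Expr::Eq {
        field: field.to_string(),
        value: value.to_string(),
    })
}
```

In this sketch, parse_simple_expr("type:episodic") and parse_simple_expr("type = 'episodic'") return the same Expr::Eq value, which is the property the shorthand is meant to have.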
New stages (add to stage rule in parser.rs)
Domain filter stages from engine.rs:
- type:X: filter by node type (episodic, daily, weekly, monthly, semantic)
- age:<7d: duration comparison on timestamp
- key:GLOB: glob match on key
- provenance:X: provenance filter
- weight:>N: weight comparison (may already work via general comparison)
- content-len:>N: content size filter
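As an illustration of the age:<7d form, the sketch below splits the comparison operator from the duration and evaluates it against a node's age in seconds. The names Cmp, split_cmp, duration_secs, and age_matches are hypothetical, and the unit suffixes (s, m, h, d, w) are an assumption; the plan only shows the d suffix.

```rust
/// Comparison operators accepted in front of a value (assumed set).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Cmp {
    Lt,
    Le,
    Gt,
    Ge,
    Eq,
}

/// Split a leading comparison operator off the argument.
/// A bare value (or a leading `=`) is treated as equality.
fn split_cmp(s: &str) -> (Cmp, &str) {
    if let Some(rest) = s.strip_prefix("<=") {
        (Cmp::Le, rest)
    } else if let Some(rest) = s.strip_prefix(">=") {
        (Cmp::Ge, rest)
    } else if let Some(rest) = s.strip_prefix('<') {
        (Cmp::Lt, rest)
    } else if let Some(rest) = s.strip_prefix('>') {
        (Cmp::Gt, rest)
    } else {
        (Cmp::Eq, s.strip_prefix('=').unwrap_or(s))
    }
}

/// Convert `7d`, `12h`, `90` and similar into seconds.
/// The unit table is an assumption, not taken from engine.rs.
fn duration_secs(s: &str) -> Option<u64> {
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (num, unit) = s.split_at(split);
    let n: u64 = num.parse().ok()?;
    let mult = match unit {
        "" | "s" => 1,
        "m" => 60,
        "h" => 3_600,
        "d" => 86_400,
        "w" => 604_800,
        _ => return None,
    };
    n.checked_mul(mult)
}

/// Evaluate the argument of an `age:` stage (for example `<7d`)
/// against a node's age in seconds.
fn age_matches(filter: &str, age_secs: u64) -> Option<bool> {
    let (cmp, rest) = split_cmp(filter);
    let limit = duration_secs(rest)?;
    Some(match cmp {
        Cmp::Lt => age_secs < limit,
        Cmp::Le => age_secs <= limit,
        Cmp::Gt => age_secs > limit,
        Cmp::Ge => age_secs >= limit,
        Cmp::Eq => age_secs == limit,
    })
}
```

The same operator split would apply to weight:>N and content-len:>N, with a plain number in place of the duration.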
Sort/limit syntax variants:
- sort:field alongside existing sort field
- limit:N alongside existing limit N
Graph algorithms:
- spread: spreading activation
- spectral: spectral nearest neighbors
- confluence: multi-source reachability
- geodesic: straightest spectral paths
- manifold: extrapolation along seed direction
What changes
- parser.rs: add field:value shorthand to expr, add domain stages
- engine.rs: keep run_pipeline execution logic, have PEG parser emit compatible Stage types (or convert PEG AST to Stage at boundary)
- query() tool handler (memory.rs): one parser path for all formats
- journal_tail (memory.rs): generate unified syntax
- CLI poc-memory query: uses unified parser
Migration path
- Add field:value shorthand and type/age/key stages to PEG parser
- Route query() through PEG parser for all formats
- Migrate journal_tail and any other pipeline-syntax callers
- Remove the pipeline parser (or keep as internal execution layer)
What was done
Deleted from engine.rs (-153 lines):
- Stage::parse() and Stage::parse_pipeline(): redundant with PEG
- parse_cmp(), parse_duration_or_number(), parse_composite_sort(), parse_node_type(), parse_sort_field(): helper functions for deleted parser
Added to parser.rs (+120 lines):
- Pipeline syntax in PEG grammar (type:X, age:<Nd, sort:field, etc.)
- parse_stages(): unified entry point returning Vec<Stage>
- Grammar helper functions
Net: +17 lines
Architecture now:
- parser.rs: PEG grammar handles ALL parsing (both syntaxes)
- engine.rs: Pure execution (types and run_query()), no parsing
Result: all | type:episodic | sort:timestamp | limit:5 works everywhere.
Mixed syntax like degree > 5 | type:semantic | sort degree also works.
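The two queries above can be traced through a small hand-written sketch of a single entry point that accepts both spellings. This is not the PEG grammar in parser.rs: the Stage variants, keyword_arg, parse_stage, and parse_stages_sketch are hypothetical, expressions are kept as unparsed strings, and splitting on | ignores quoting and parentheses.

```rust
/// Hypothetical, reduced stage type; the real Stage enum has more variants.
#[derive(Debug, PartialEq)]
enum Stage {
    All,
    Type(String),
    Sort(String),
    Limit(usize),
    /// Anything that is not a keyword stage is kept as a raw expression.
    Expr(String),
}

/// Return the argument of `kw:arg` or `kw arg`, if `s` has that shape.
fn keyword_arg<'a>(s: &'a str, kw: &str) -> Option<&'a str> {
    let rest = s.strip_prefix(kw)?;
    let arg = rest
        .strip_prefix(':')
        .or_else(|| rest.strip_prefix(' '))?
        .trim();
    if arg.is_empty() {
        None
    } else {
        Some(arg)
    }
}

/// Parse one stage, accepting both pipeline and expression spellings.
fn parse_stage(raw: &str) -> Result<Stage, String> {
    let s = raw.trim();
    if s.is_empty() {
        return Err("empty stage".to_string());
    }
    if s == "all" || s == "*" {
        return Ok(Stage::All);
    }
    if let Some(arg) = s.strip_prefix("type:") {
        return Ok(Stage::Type(arg.trim().to_string()));
    }
    if let Some(arg) = keyword_arg(s, "sort") {
        return Ok(Stage::Sort(arg.to_string()));
    }
    if let Some(arg) = keyword_arg(s, "limit") {
        return arg
            .parse::<usize>()
            .map(Stage::Limit)
            .map_err(|e| format!("bad limit `{}`: {}", arg, e));
    }
    Ok(Stage::Expr(s.to_string()))
}

/// Single entry point: split on `|` and parse each stage in order.
fn parse_stages_sketch(query: &str) -> Result<Vec<Stage>, String> {
    query.split('|').map(parse_stage).collect()
}
```

With this sketch, sort:degree and sort degree both produce Stage::Sort("degree"), and degree > 5 falls through to Stage::Expr, so a mixed query yields one ordered list of stages for the execution layer.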
What NOT to change (original note)
The run_pipeline execution logic stays — it's correct and well-tested. Only the parsing front-end unifies. The pipeline parser's Stage enum becomes the internal representation that both the PEG parser and any remaining direct callers produce.
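A minimal sketch of that boundary, assuming a much smaller Stage enum than the real one: the execution side receives already-parsed stages and never sees query text. Node, its fields, the SortTimestamp variant, the newest-first sort order, and run_stages are all hypothetical and are not taken from engine.rs.

```rust
/// Hypothetical node record with only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    key: String,
    node_type: String,
    timestamp: u64,
}

/// Hypothetical subset of stages, already parsed by the front end.
enum Stage {
    All,
    Type(String),
    /// Sort by timestamp, newest first (the direction is an assumption).
    SortTimestamp,
    Limit(usize),
}

/// Apply stages in order. No parsing happens here: the function only
/// consumes the internal representation produced by the parser.
fn run_stages(mut nodes: Vec<Node>, stages: &[Stage]) -> Vec<Node> {
    for stage in stages {
        match stage {
            Stage::All => {}
            Stage::Type(t) => nodes.retain(|n| &n.node_type == t),
            Stage::SortTimestamp => {
                nodes.sort_by(|a, b| b.timestamp.cmp(&a.timestamp))
            }
            Stage::Limit(n) => nodes.truncate(*n),
        }
    }
    nodes
}
```

Because the executor depends only on the stage list, any front end that produces the same stages can drive it without changes.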