ProofOfConcept aad227e487 query: unify PEG and engine parsers

PEG parser now handles both expression syntax (degree > 5 | sort degree)
and pipeline syntax (all | type:episodic | sort:timestamp). Deleted
Stage::parse() and helpers from engine.rs — it's now pure execution.

All callers use parse_stages() from parser.rs as the single entry point.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

2026-04-11 20:42:58 -04:00

Query Language Unification Plan

Status: DONE (2026-04-11)

Problem (was)

Two query parsers that didn't agree on syntax:

PEG parser (hippocampus/query/parser.rs) — boolean logic, general comparisons, operator precedence, parentheses. Used by CLI and compact format path in query() tool.
Pipeline parser (hippocampus/query/engine.rs) — domain-specific filters (type, age, provenance), graph algorithms (spread, spectral). Used by full format path in query() tool.

journal_tail generates pipeline syntax but gets routed through the PEG parser on the compact path. Result: parse errors.

Approach

Keep the PEG parser, which has the harder-to-build structural foundation, and extend it with the pipeline parser's domain features.

Expression extensions (add to `expr` rule in parser.rs)

field:value shorthand for field = 'value' (colon-separated equality)
`*` already works as Expr::All
key ~ 'glob' already works via match operator
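
The shorthand is a pure desugaring step, which a small sketch can make concrete. This is not the PEG grammar from parser.rs: it is a hand-rolled stand-in, and the `Expr` shape here (an `Eq` variant with `field` and `value`) and the function name `desugar_term` are assumptions for illustration only.

```rust
/// Hypothetical, simplified expression AST. The real `Expr` in parser.rs
/// is assumed to have more variants and possibly different field names.
#[derive(Debug, PartialEq)]
enum Expr {
    /// `*` matches every node.
    All,
    /// `field = 'value'`
    Eq { field: String, value: String },
}

/// Desugar one term: `*` becomes `Expr::All`, and `field:value` becomes
/// the same node that `field = 'value'` would produce.
fn desugar_term(term: &str) -> Option<Expr> {
    let term = term.trim();
    if term == "*" {
        return Some(Expr::All);
    }
    // Split on the first colon only, so the value may itself contain colons.
    let (field, value) = term.split_once(':')?;
    if field.is_empty() || value.is_empty() {
        return None;
    }
    Some(Expr::Eq {
        field: field.to_string(),
        value: value.to_string(),
    })
}
```

Because the shorthand produces the same AST node as the long form, nothing downstream of the parser has to know that two spellings exist.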

New stages (add to `stage` rule in parser.rs)

Domain filter stages from engine.rs:

type:X — filter by node type (episodic, daily, weekly, monthly, semantic)
age:<7d — duration comparison on timestamp
key:GLOB — glob match on key
provenance:X — provenance filter
weight:>N — weight comparison (may already work via general comparison)
content-len:>N — content size filter
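
Filters such as age:<7d and weight:>N share one shape: an optional comparison operator followed by a number or a duration. A minimal sketch of that parsing step follows. The function names echo the deleted engine.rs helpers, but the bodies are illustrative rather than the original implementation. The `CmpOp` enum, the conversion of durations to seconds, and every unit suffix other than `d` are assumptions.

```rust
/// Hypothetical comparison operators for domain filters.
#[derive(Debug, PartialEq, Clone, Copy)]
enum CmpOp {
    Lt,
    Le,
    Gt,
    Ge,
    Eq,
}

/// Parse the part after the colon, e.g. `<7d` or `>0.5`.
/// A missing operator is treated as equality (an assumption).
fn parse_cmp(s: &str) -> Option<(CmpOp, f64)> {
    let s = s.trim();
    // Two-character operators must be tried before their one-character prefixes.
    let (op, rest) = if let Some(r) = s.strip_prefix("<=") {
        (CmpOp::Le, r)
    } else if let Some(r) = s.strip_prefix(">=") {
        (CmpOp::Ge, r)
    } else if let Some(r) = s.strip_prefix('<') {
        (CmpOp::Lt, r)
    } else if let Some(r) = s.strip_prefix('>') {
        (CmpOp::Gt, r)
    } else if let Some(r) = s.strip_prefix('=') {
        (CmpOp::Eq, r)
    } else {
        (CmpOp::Eq, s)
    };
    Some((op, parse_duration_or_number(rest)?))
}

/// Parse `7d`, `2h`, `30m`, `10s` into seconds, or a bare number as-is.
fn parse_duration_or_number(s: &str) -> Option<f64> {
    let s = s.trim();
    // The matched suffixes are ASCII, so slicing off one byte is safe.
    let (digits, scale) = match s.chars().last()? {
        'd' => (&s[..s.len() - 1], 86_400.0),
        'h' => (&s[..s.len() - 1], 3_600.0),
        'm' => (&s[..s.len() - 1], 60.0),
        's' => (&s[..s.len() - 1], 1.0),
        _ => (s, 1.0),
    };
    digits.parse::<f64>().ok().map(|n| n * scale)
}
```

With this shape, age:<7d and weight:>0.5 differ only in which field the resulting comparison is applied to.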

Sort/limit syntax variants:

sort:field alongside existing sort field
limit:N alongside existing limit N

Graph algorithms:

spread — spreading activation
spectral — spectral nearest neighbors
confluence — multi-source reachability
geodesic — straightest spectral paths
manifold — extrapolation along seed direction

What changes

parser.rs — add field:value shorthand to expr, add domain stages
engine.rs — keep run_pipeline execution logic, have PEG parser emit compatible Stage types (or convert PEG AST to Stage at boundary)
query() tool handler (memory.rs) — one parser path for all formats
journal_tail (memory.rs) — generate unified syntax
CLI poc-memory query — uses unified parser

Migration path

1. Add field:value shorthand and type/age/key stages to PEG parser
2. Route query() through PEG parser for all formats
3. Migrate journal_tail and any other pipeline-syntax callers
4. Remove the pipeline parser (or keep as internal execution layer)

What was done

Deleted from engine.rs (-153 lines):

Stage::parse() and Stage::parse_pipeline() — redundant with PEG
parse_cmp(), parse_duration_or_number(), parse_composite_sort(), parse_node_type(), parse_sort_field() — helper functions for deleted parser

Added to parser.rs (+120 lines):

Pipeline syntax in PEG grammar (type:X, age:<Nd, sort:field, etc.)
parse_stages() — unified entry point returning Vec<Stage>
Grammar helper functions

Net: -33 lines

Architecture now:

parser.rs: PEG grammar handles ALL parsing (both syntaxes)
engine.rs: Pure execution — types and run_query(), no parsing
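
To illustrate the split, here is a rough sketch of a single entry point that accepts both spellings of sort and limit and hands a Vec<Stage> to the execution side. It is a hand-written stand-in, not the PEG grammar. The `Stage` variants are hypothetical (filters are kept as raw strings), and splitting on `|` is a simplification that a real grammar would handle properly.

```rust
/// Hypothetical, reduced stage type. The real `Stage` enum in engine.rs
/// is assumed to be richer; filters are left unparsed here.
#[derive(Debug, PartialEq)]
enum Stage {
    Filter(String),
    Sort(String),
    Limit(usize),
}

/// Single entry point: split the query on `|` and parse each stage.
fn parse_stages(query: &str) -> Result<Vec<Stage>, String> {
    query.split('|').map(|raw| parse_stage(raw.trim())).collect()
}

fn parse_stage(s: &str) -> Result<Stage, String> {
    if s.is_empty() {
        return Err("empty stage".to_string());
    }
    // Accept both `name arg` and `name:arg` by splitting on the first
    // colon or whitespace character.
    let (head, arg) = match s.split_once(|c: char| c == ':' || c.is_whitespace()) {
        Some((h, a)) => (h, a.trim()),
        None => (s, ""),
    };
    match head {
        "sort" => {
            if arg.is_empty() {
                Err("sort needs a field".to_string())
            } else {
                Ok(Stage::Sort(arg.to_string()))
            }
        }
        "limit" => arg
            .parse::<usize>()
            .map(Stage::Limit)
            .map_err(|e| format!("bad limit {:?}: {}", arg, e)),
        // Anything else is treated as a filter expression.
        _ => Ok(Stage::Filter(s.to_string())),
    }
}
```

Both `degree > 5 | sort degree` and `all | type:episodic | sort:timestamp` go through the same function, so callers never need to choose a parser.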

What NOT to change (original note)

The run_pipeline execution logic stays — it's correct and well-tested. Only the parsing front-end unifies. The pipeline parser's Stage enum becomes the internal representation that both the PEG parser and any remaining direct callers produce.
