ProofOfConcept aad227e487 query: unify PEG and engine parsers
PEG parser now handles both expression syntax (degree > 5 | sort degree)
and pipeline syntax (all | type:episodic | sort:timestamp). Deleted
Stage::parse() and helpers from engine.rs — it's now pure execution.

All callers use parse_stages() from parser.rs as the single entry point.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-11 20:42:58 -04:00


Query Language Unification Plan

Status: DONE (2026-04-11)

Problem (before unification)

Two query parsers that didn't agree on syntax:

  1. PEG parser (hippocampus/query/parser.rs): boolean logic, general comparisons, operator precedence, parentheses. Used by the CLI and by the compact-format path of the query() tool.

  2. Pipeline parser (hippocampus/query/engine.rs): domain-specific filters (type, age, provenance) and graph algorithms (spread, spectral). Used by the full-format path of the query() tool.

journal_tail generates pipeline syntax, but on the compact path its queries are routed through the PEG parser, which rejects that syntax with parse errors.

Approach

Keep the PEG parser, which has the harder-to-build structural foundation, and extend it with the pipeline parser's domain features.

Expression extensions (add to expr rule in parser.rs)

  • field:value shorthand for field = 'value' (colon-separated equality)
  • * already works as Expr::All
  • key ~ 'glob' already works via match operator
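The field:value shorthand can be handled as pure desugaring, so later stages never need to know which spelling the user wrote. This is a minimal sketch, assuming a simplified Expr/CmpOp shape (the real types in parser.rs are not shown in this plan); desugar_colon is a hypothetical name. Values starting with < or > are deliberately rejected here, on the assumption that comparisons such as age:<7d are left to the stage rules.

```rust
/// Hypothetical AST types; the real `Expr` in parser.rs may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    All,
    Cmp { field: String, op: CmpOp, value: String },
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CmpOp {
    Eq,
    Gt,
    Lt,
}

/// Desugar `field:value` into the node that `field = 'value'` would build.
/// Returns `None` when the input is not a plain colon shorthand.
pub fn desugar_colon(input: &str) -> Option<Expr> {
    let (field, value) = input.trim().split_once(':')?;
    let is_ident = !field.is_empty()
        && field
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    // `age:<7d` and `weight:>N` are comparisons, not equality; leave them
    // to the stage rules (an assumption of this sketch).
    if !is_ident || value.is_empty() || value.starts_with('<') || value.starts_with('>') {
        return None;
    }
    Some(Expr::Cmp {
        field: field.to_string(),
        op: CmpOp::Eq,
        // Accept both `type:semantic` and `type:'semantic'`.
        value: value.trim_matches('\'').to_string(),
    })
}
```

Because the shorthand produces exactly the node the long form would, no evaluation code has to change.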

New stages (add to stage rule in parser.rs)

Domain filter stages from engine.rs:

  • type:X — filter by node type (episodic, daily, weekly, monthly, semantic)
  • age:<7d — duration comparison on timestamp
  • key:GLOB — glob match on key
  • provenance:X — provenance filter
  • weight:>N — weight comparison (may already work via general comparison)
  • content-len:>N — content size filter
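A hand-rolled sketch of the domain filter stages, showing the shape of the data each one could produce. It is not the PEG grammar itself: Filter, Cmp, parse_filter and the helper functions are hypothetical names, and the duration units (m, h, d, w) are an assumption that goes beyond the 7d example above.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NodeType { Episodic, Daily, Weekly, Monthly, Semantic }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Cmp { Lt, Gt, Eq }

#[derive(Debug, Clone, PartialEq)]
pub enum Filter {
    Type(NodeType),
    Age(Cmp, Duration),
    KeyGlob(String),
    Provenance(String),
    Weight(Cmp, f64),
    ContentLen(Cmp, usize),
}

/// Split an optional leading `<` or `>` off a value.
fn cmp_prefix(s: &str) -> (Cmp, &str) {
    if let Some(rest) = s.strip_prefix('<') {
        (Cmp::Lt, rest)
    } else if let Some(rest) = s.strip_prefix('>') {
        (Cmp::Gt, rest)
    } else {
        (Cmp::Eq, s)
    }
}

/// Parse `30m`, `12h`, `7d`, `2w` (assumed unit set).
fn duration(s: &str) -> Option<Duration> {
    let unit = s.chars().last()?;
    let n: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let secs_per_unit = match unit {
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400,
        'w' => 604_800,
        _ => return None,
    };
    Some(Duration::from_secs(n.checked_mul(secs_per_unit)?))
}

fn node_type(s: &str) -> Result<NodeType, String> {
    match s {
        "episodic" => Ok(NodeType::Episodic),
        "daily" => Ok(NodeType::Daily),
        "weekly" => Ok(NodeType::Weekly),
        "monthly" => Ok(NodeType::Monthly),
        "semantic" => Ok(NodeType::Semantic),
        other => Err(format!("unknown node type: {other}")),
    }
}

/// Parse one `key:value` domain filter stage.
pub fn parse_filter(stage: &str) -> Result<Filter, String> {
    let (key, val) = stage
        .trim()
        .split_once(':')
        .ok_or_else(|| format!("not a key:value filter: {stage}"))?;
    let (cmp, rest) = cmp_prefix(val);
    match key {
        "type" => node_type(val).map(Filter::Type),
        "age" => duration(rest)
            .map(|d| Filter::Age(cmp, d))
            .ok_or_else(|| format!("bad duration: {rest}")),
        "key" => Ok(Filter::KeyGlob(val.to_string())),
        "provenance" => Ok(Filter::Provenance(val.to_string())),
        "weight" => rest
            .parse::<f64>()
            .map(|w| Filter::Weight(cmp, w))
            .map_err(|e| format!("weight: {e}")),
        "content-len" => rest
            .parse::<usize>()
            .map(|n| Filter::ContentLen(cmp, n))
            .map_err(|e| format!("content-len: {e}")),
        other => Err(format!("unknown filter: {other}")),
    }
}
```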

Sort/limit syntax variants:

  • sort:field alongside existing sort field
  • limit:N alongside existing limit N
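Both spellings should produce the same Stage, so nothing downstream can tell them apart. A minimal sketch with a hypothetical two-variant Stage and a hypothetical parse_sort_limit function; the keyword simply ends at the first colon or whitespace character:

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    Sort(String),
    Limit(usize),
}

/// Accepts `sort degree`, `sort:degree`, `limit 5` and `limit:5`.
pub fn parse_sort_limit(stage: &str) -> Option<Stage> {
    let s = stage.trim();
    // The keyword ends at the first ':' or whitespace character.
    let idx = s.find(|c: char| c == ':' || c.is_whitespace())?;
    let kw = &s[..idx];
    let mut rest = s[idx..].chars();
    rest.next(); // skip the separator itself
    let arg = rest.as_str().trim();
    match kw {
        "sort" if !arg.is_empty() => Some(Stage::Sort(arg.to_string())),
        "limit" => arg.parse::<usize>().ok().map(Stage::Limit),
        _ => None,
    }
}
```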

Graph algorithms:

  • spread — spreading activation
  • spectral — spectral nearest neighbors
  • confluence — multi-source reachability
  • geodesic — straightest spectral paths
  • manifold — extrapolation along seed direction
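Assuming the graph algorithms appear as bare-word stages (the plan does not say how their parameters are written), recognizing them reduces to a keyword table. The Algorithm enum below is hypothetical; it uses the standard FromStr trait so a stage string can be turned into a variant with str::parse.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Algorithm {
    Spread,
    Spectral,
    Confluence,
    Geodesic,
    Manifold,
}

impl FromStr for Algorithm {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "spread" => Ok(Algorithm::Spread),
            "spectral" => Ok(Algorithm::Spectral),
            "confluence" => Ok(Algorithm::Confluence),
            "geodesic" => Ok(Algorithm::Geodesic),
            "manifold" => Ok(Algorithm::Manifold),
            other => Err(format!("unknown graph algorithm: {other}")),
        }
    }
}
```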

What changes

  1. parser.rs — add field:value shorthand to expr, add domain stages
  2. engine.rs — keep run_pipeline execution logic, have PEG parser emit compatible Stage types (or convert PEG AST to Stage at boundary)
  3. query() tool handler (memory.rs) — one parser path for all formats
  4. journal_tail (memory.rs) — generate unified syntax
  5. CLI poc-memory query — uses unified parser
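Item 2 leaves open whether the PEG parser emits Stage directly or converts its AST at a boundary. The sketch below illustrates the second option with hypothetical AstStage and Stage types: a single lowering function is the only place that knows both representations, and the sort:field and sort field spellings converge there. Turning unrecognized key:value pairs into an equality filter is an assumption of this sketch.

```rust
/// Hypothetical PEG-side AST, as a grammar might produce it.
#[derive(Debug, Clone, PartialEq)]
pub enum AstStage {
    Where(String),
    Colon { key: String, value: String },
    Sort(String),
    Limit(String),
}

/// Hypothetical execution-side representation consumed by the engine.
#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    Filter(String),
    Type(String),
    Sort(String),
    Limit(usize),
}

/// The single conversion point; everything past it sees only `Stage`.
pub fn lower(ast: AstStage) -> Result<Stage, String> {
    match ast {
        AstStage::Where(expr) => Ok(Stage::Filter(expr)),
        AstStage::Colon { key, value } => match key.as_str() {
            "type" => Ok(Stage::Type(value)),
            "sort" => Ok(Stage::Sort(value)),
            "limit" => value
                .parse::<usize>()
                .map(Stage::Limit)
                .map_err(|e| format!("limit: {e}")),
            // Any other key:value is treated as `key = 'value'`.
            _ => Ok(Stage::Filter(format!("{key} = '{value}'"))),
        },
        AstStage::Sort(field) => Ok(Stage::Sort(field)),
        AstStage::Limit(n) => n
            .parse::<usize>()
            .map(Stage::Limit)
            .map_err(|e| format!("limit: {e}")),
    }
}

/// Lower a whole pipeline, stopping at the first error.
pub fn lower_all(ast: Vec<AstStage>) -> Result<Vec<Stage>, String> {
    ast.into_iter().map(lower).collect()
}
```

The alternative (emitting Stage straight from the grammar) removes this layer entirely, which is what parse_stages() returning Vec<Stage> implies.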

Migration path

  1. Add field:value shorthand and type/age/key stages to PEG parser
  2. Route query() through PEG parser for all formats
  3. Migrate journal_tail and any other pipeline-syntax callers
  4. Remove the pipeline parser (or keep as internal execution layer)

What was done

Deleted from engine.rs (-153 lines):

  • Stage::parse() and Stage::parse_pipeline() — redundant with PEG
  • parse_cmp(), parse_duration_or_number(), parse_composite_sort(), parse_node_type(), parse_sort_field() — helper functions for deleted parser

Added to parser.rs (+120 lines):

  • Pipeline syntax in PEG grammar (type:X, age:<Nd, sort:field, etc.)
  • parse_stages() — unified entry point returning Vec<Stage>
  • Grammar helper functions
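One job of a unified entry point is splitting a pipeline into stages without breaking quoted values, such as globs that contain |. The stand-alone sketch below shows that step only; split_pipeline is a hypothetical helper, and in the real code this is the PEG grammar's responsibility inside parse_stages().

```rust
/// Split a pipeline on top-level `|`, ignoring `|` inside single quotes.
/// Rejects empty stages and unterminated quotes.
pub fn split_pipeline(input: &str) -> Result<Vec<&str>, String> {
    let mut parts = Vec::new();
    let mut start = 0;
    let mut in_quote = false;
    for (i, c) in input.char_indices() {
        match c {
            '\'' => in_quote = !in_quote,
            '|' if !in_quote => {
                parts.push(input[start..i].trim());
                start = i + 1; // '|' is one byte
            }
            _ => {}
        }
    }
    if in_quote {
        return Err("unterminated quote".to_string());
    }
    parts.push(input[start..].trim());
    if parts.iter().any(|p| p.is_empty()) {
        return Err("empty stage".to_string());
    }
    Ok(parts)
}
```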

Net: +17 lines

Architecture now:

  • parser.rs: PEG grammar handles ALL parsing (both syntaxes)
  • engine.rs: Pure execution — types and run_query(), no parsing
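A minimal sketch of what "pure execution" means here: the engine receives already-parsed stages and never sees query text. Only the run_query name comes from the plan; its signature, the Node fields, the Stage variants, and the descending sort order are assumptions for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub key: String,
    pub node_type: String,
    pub degree: u32,
    pub timestamp: u64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SortField {
    Degree,
    Timestamp,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    DegreeAbove(u32),
    Type(String),
    SortBy(SortField),
    Limit(usize),
}

/// Apply stages in order; no string handling happens here.
pub fn run_query(mut nodes: Vec<Node>, stages: &[Stage]) -> Vec<Node> {
    for stage in stages {
        match stage {
            Stage::DegreeAbove(d) => nodes.retain(|n| n.degree > *d),
            Stage::Type(t) => nodes.retain(|n| n.node_type == *t),
            // Descending order is an assumption made for this sketch.
            Stage::SortBy(SortField::Degree) => nodes.sort_by(|a, b| b.degree.cmp(&a.degree)),
            Stage::SortBy(SortField::Timestamp) => {
                nodes.sort_by(|a, b| b.timestamp.cmp(&a.timestamp))
            }
            Stage::Limit(k) => nodes.truncate(*k),
        }
    }
    nodes
}
```

Keeping execution free of parsing means both syntaxes are tested once, at the parser, while the engine's tests only ever construct Stage values.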

Result: all | type:episodic | sort:timestamp | limit:5 now works on every path (the CLI and both query() formats). Mixed syntax such as degree > 5 | type:semantic | sort degree also works.
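Mixed syntax works because each stage can be classified on its own before it is parsed. A sketch of that dispatch, assuming the keyword lists below; StageKind, classify, DOMAIN_KEYS and KEYWORDS are hypothetical names, not the grammar's actual rules.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum StageKind<'a> {
    /// Pipeline syntax: `type:X`, `age:<7d`, `sort:field`, ...
    Domain { key: &'a str, value: &'a str },
    /// Keyword syntax: `sort degree`, `limit 5`.
    Keyword { kw: &'a str, arg: &'a str },
    /// Everything else: `degree > 5`, `key ~ 'glob'`, `all`.
    Expr(&'a str),
}

const DOMAIN_KEYS: &[&str] = &[
    "type", "age", "key", "provenance", "weight", "content-len", "sort", "limit",
];
const KEYWORDS: &[&str] = &["sort", "limit"];

pub fn classify(stage: &str) -> StageKind<'_> {
    let s = stage.trim();
    if let Some((key, value)) = s.split_once(':') {
        if DOMAIN_KEYS.contains(&key) {
            return StageKind::Domain { key, value };
        }
    }
    if let Some((kw, arg)) = s.split_once(char::is_whitespace) {
        if KEYWORDS.contains(&kw) {
            return StageKind::Keyword { kw, arg: arg.trim() };
        }
    }
    StageKind::Expr(s)
}
```

For example, classifying each stage of degree > 5 | type:semantic | sort degree yields an expression, a domain filter, and a keyword stage, in that order.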

What NOT to change (original note)

The run_pipeline execution logic stays — it's correct and well-tested. Only the parsing front-end unifies. The pipeline parser's Stage enum becomes the internal representation that both the PEG parser and any remaining direct callers produce.