
# Query Language Unification Plan

**Status: DONE** (2026-04-11)

## Problem (was)

Two query parsers that didn't agree on syntax:

1. **PEG parser** (`hippocampus/query/parser.rs`) — boolean logic, general
   comparisons, operator precedence, parentheses. Used by CLI and compact
   format path in `query()` tool.

2. **Pipeline parser** (`hippocampus/query/engine.rs`) — domain-specific
   filters (type, age, provenance), graph algorithms (spread, spectral).
   Used by full format path in `query()` tool.

`journal_tail` generates pipeline syntax but gets routed through the PEG
parser on the compact path. Result: parse errors.

## Approach

Keep the PEG parser (it has the harder-to-build structural foundation) and
extend it with the pipeline parser's domain features.

## Expression extensions (add to `expr` rule in parser.rs)

- `field:value` shorthand for `field = 'value'` (colon-separated equality)
- `*` already works as `Expr::All`
- `key ~ 'glob'` already works via match operator
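A minimal sketch of the desugaring. It assumes a simplified, hypothetical `Expr` with only `All` and an equality node; the real `Expr` in parser.rs is richer, and this hand-rolled function stands in for the PEG rule. Both equality spellings produce the same node, so nothing downstream needs to know which one the user typed.

```rust
/// Simplified stand-in for parser.rs's `Expr` (hypothetical; the real AST is richer).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    All,
    Eq { field: String, value: String },
}

/// Field names: ASCII alphanumerics, `_`, or `-` (an assumption for this sketch).
fn valid_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

/// Parse `*`, `field = 'value'`, or the `field:value` shorthand.
/// Both equality spellings yield the same `Expr::Eq` node.
pub fn parse_simple_expr(input: &str) -> Option<Expr> {
    let s = input.trim();
    if s == "*" {
        return Some(Expr::All);
    }
    // Explicit form: field = 'value'
    if let Some((field, rhs)) = s.split_once('=') {
        let field = field.trim();
        let value = rhs.trim().strip_prefix('\'')?.strip_suffix('\'')?;
        return valid_ident(field).then(|| Expr::Eq {
            field: field.to_string(),
            value: value.to_string(),
        });
    }
    // Shorthand form: field:value, with no whitespace inside the token
    let (field, value) = s.split_once(':')?;
    let ok = valid_ident(field) && !value.is_empty() && !value.contains(char::is_whitespace);
    ok.then(|| Expr::Eq {
        field: field.to_string(),
        value: value.to_string(),
    })
}
```

Rejecting whitespace around the colon keeps the shorthand from swallowing spaced input. That restriction is a choice made for this sketch, not something the plan specifies.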

## New stages (add to `stage` rule in parser.rs)

Domain filter stages from engine.rs:
- `type:X` — filter by node type (episodic, daily, weekly, monthly, semantic)
- `age:<7d` — duration comparison on timestamp
- `key:GLOB` — glob match on key
- `provenance:X` — provenance filter
- `weight:>N` — weight comparison (may already work via general comparison)
- `content-len:>N` — content size filter
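The domain stages could be recognized roughly as follows. Everything here is hypothetical: `Filter`, `Cmp`, and the `s`/`m`/`h`/`d`/`w` duration units are assumptions (the plan only shows `7d`). The sketch illustrates one point that matters for the grammar. A PEG `stage` rule is an ordered choice, so these keyword alternatives would need to be tried before the generic `field:value` shorthand. Otherwise `key:GLOB` would likely be read as plain equality rather than a glob match.

```rust
use std::time::Duration;

/// Comparison operators a domain stage may carry (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Cmp { Lt, Le, Gt, Ge, Eq }

/// Simplified stand-ins for the domain filter stages listed above.
#[derive(Debug, Clone, PartialEq)]
pub enum Filter {
    Type(String),
    Age(Cmp, Duration),
    KeyGlob(String),
    Provenance(String),
    Weight(Cmp, f64),
    ContentLen(Cmp, usize),
}

const NODE_TYPES: [&str; 5] = ["episodic", "daily", "weekly", "monthly", "semantic"];

/// Peel off a leading comparison operator; a bare value means equality.
fn split_cmp(s: &str) -> (Cmp, &str) {
    // Two-character operators first, so "<=" isn't read as "<" followed by "=...".
    for (tok, op) in [("<=", Cmp::Le), (">=", Cmp::Ge), ("<", Cmp::Lt), (">", Cmp::Gt), ("=", Cmp::Eq)] {
        if let Some(rest) = s.strip_prefix(tok) {
            return (op, rest);
        }
    }
    (Cmp::Eq, s)
}

/// Parse `7d`-style durations. The unit set is an assumption.
fn parse_duration(s: &str) -> Option<Duration> {
    let unit_secs: u64 = match s.chars().last()? {
        's' => 1,
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400,
        'w' => 604_800,
        _ => return None,
    };
    // The unit is one ASCII byte, so slicing it off is safe.
    let n: u64 = s[..s.len() - 1].parse().ok()?;
    n.checked_mul(unit_secs).map(Duration::from_secs)
}

/// Try the domain stages; `None` means "not a domain stage", which is where an
/// ordered-choice grammar would fall through to the generic `field:value` rule.
pub fn parse_filter(token: &str) -> Option<Filter> {
    let (name, arg) = token.split_once(':')?;
    let (op, rest) = split_cmp(arg);
    match name {
        "type" => NODE_TYPES.iter().any(|t| *t == arg).then(|| Filter::Type(arg.to_string())),
        "age" => parse_duration(rest).map(|d| Filter::Age(op, d)),
        "key" => Some(Filter::KeyGlob(arg.to_string())),
        "provenance" => Some(Filter::Provenance(arg.to_string())),
        "weight" => rest.parse::<f64>().ok().map(|w| Filter::Weight(op, w)),
        "content-len" => rest.parse::<usize>().ok().map(|n| Filter::ContentLen(op, n)),
        _ => None,
    }
}
```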

Sort/limit syntax variants:
- `sort:field` alongside existing `sort field`
- `limit:N` alongside existing `limit N`
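Supporting both spellings can be as small as splitting the stage at the first colon or whitespace after the keyword. Here is a sketch that uses a hypothetical `Control` type:

```rust
/// Hypothetical result type for the two "control" stages.
#[derive(Debug, Clone, PartialEq)]
pub enum Control {
    Sort(String),
    Limit(usize),
}

/// Accept `sort:field` / `sort field` and `limit:N` / `limit N` alike by
/// splitting at the first colon *or* whitespace character.
pub fn parse_control(stage: &str) -> Option<Control> {
    let (kw, arg) = stage
        .trim()
        .split_once(|c: char| c == ':' || c.is_whitespace())?;
    let arg = arg.trim(); // tolerate extra spaces, e.g. `limit   5`
    match kw {
        "sort" if !arg.is_empty() => Some(Control::Sort(arg.to_string())),
        "limit" => arg.parse::<usize>().ok().map(Control::Limit),
        _ => None,
    }
}
```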

Graph algorithms:
- `spread` — spreading activation
- `spectral` — spectral nearest neighbors
- `confluence` — multi-source reachability
- `geodesic` — straightest spectral paths
- `manifold` — extrapolation along seed direction

## What changes

1. `parser.rs` — add field:value shorthand to expr, add domain stages
2. `engine.rs` — keep run_pipeline execution logic, have PEG parser emit
   compatible Stage types (or convert PEG AST to Stage at boundary)
3. `query()` tool handler (memory.rs) — one parser path for all formats
4. `journal_tail` (memory.rs) — generate unified syntax
5. CLI `poc-memory query` — uses unified parser

## Migration path

1. Add `field:value` shorthand and `type`/`age`/`key` stages to the PEG parser
2. Route query() through PEG parser for all formats
3. Migrate journal_tail and any other pipeline-syntax callers
4. Remove the pipeline parser (or keep as internal execution layer)

## What was done

**Deleted from engine.rs (-153 lines):**
- `Stage::parse()` and `Stage::parse_pipeline()` — redundant with PEG
- `parse_cmp()`, `parse_duration_or_number()`, `parse_composite_sort()`,
  `parse_node_type()`, `parse_sort_field()` — helper functions for deleted parser

**Added to parser.rs (+120 lines):**
- Pipeline syntax in PEG grammar (`type:X`, `age:<Nd`, `sort:field`, etc.)
- `parse_stages()` — unified entry point returning `Vec<Stage>`
- Grammar helper functions

**Net: +17 lines**

**Architecture now:**
- parser.rs: PEG grammar handles ALL parsing (both syntaxes)
- engine.rs: Pure execution — types and `run_query()`, no parsing

Result: `all | type:episodic | sort:timestamp | limit:5` works in every caller (the CLI and both `query()` format paths).
Mixed syntax like `degree > 5 | type:semantic | sort degree` also works.
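This is a rough, hand-written approximation of what a unified `parse_stages()` entry point does. It assumes a trimmed-down, hypothetical `Stage` enum; the real one lives in engine.rs and has many more variants, including the graph algorithms. The sketch accepts both example queries above and turns each into a single `Vec<Stage>`. Its quote-aware split keeps a `|` inside a glob such as `'a|b'` from being taken as a stage separator.

```rust
/// Hypothetical, trimmed-down stand-in for engine.rs's `Stage`.
#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    All,
    Where { field: String, op: String, value: String },
    Type(String),
    Sort(String),
    Limit(usize),
}

/// Split on `|`, ignoring pipes inside single quotes (e.g. `key ~ 'a|b'`).
fn split_pipeline(input: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let (mut start, mut in_quote) = (0, false);
    for (i, c) in input.char_indices() {
        match c {
            '\'' => in_quote = !in_quote,
            '|' if !in_quote => {
                parts.push(input[start..i].trim());
                start = i + 1; // '|' is one byte
            }
            _ => {}
        }
    }
    parts.push(input[start..].trim());
    parts
}

/// Parse one stage in either syntax. Values containing spaces aren't handled
/// here; a real PEG tokenizer would deal with quoting properly.
fn parse_stage(s: &str) -> Result<Stage, String> {
    // Pipeline syntax: `keyword:arg`.
    if let Some((kw, arg)) = s.split_once(':') {
        match kw {
            "type" => return Ok(Stage::Type(arg.to_string())),
            "sort" => return Ok(Stage::Sort(arg.to_string())),
            "limit" => {
                return arg.parse::<usize>().map(Stage::Limit).map_err(|e| format!("limit:{arg}: {e}"))
            }
            _ => {} // not a pipeline keyword: try expression syntax
        }
    }
    // Expression syntax: `all`, `sort field`, `limit N`, `field op value`.
    let words: Vec<&str> = s.split_whitespace().collect();
    match words.as_slice() {
        ["all"] | ["*"] => Ok(Stage::All),
        ["sort", field] => Ok(Stage::Sort(field.to_string())),
        ["limit", n] => n.parse::<usize>().map(Stage::Limit).map_err(|e| format!("limit {n}: {e}")),
        [field, op @ (">" | ">=" | "<" | "<=" | "=" | "~"), value] => Ok(Stage::Where {
            field: field.to_string(),
            op: op.to_string(),
            value: value.to_string(),
        }),
        _ => Err(format!("cannot parse stage `{s}`")),
    }
}

/// Unified entry point: one call, one `Vec<Stage>`, whatever the syntax.
pub fn parse_stages(input: &str) -> Result<Vec<Stage>, String> {
    split_pipeline(input).into_iter().map(parse_stage).collect()
}
```

Because every stage, in either spelling, lands in the same enum, mixing syntaxes within one pipeline needs no special handling.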

## What NOT to change (original note)

The `run_pipeline` execution logic stays; it's correct and well-tested.
Only the parsing front-end unifies. The pipeline parser's `Stage` enum
becomes the internal representation that both the PEG parser and any
remaining direct callers produce.
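The execution side of that split can be sketched with hypothetical `Node` and `Stage` types and a `run_stages` function standing in for engine.rs's execution logic. The function only ever sees `Stage` values. A direct caller that passes `&[Stage::All, Stage::Type(...)]` by hand therefore gets exactly the same behavior as one whose stages came from the parser.

```rust
/// Hypothetical node record; the real store has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub key: String,
    pub node_type: String,
    pub timestamp: i64,
    pub degree: u32,
}

/// Hypothetical stage set, just enough to show execution.
#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    All,
    Type(String),
    DegreeAbove(u32),
    NewestFirst,
    Limit(usize),
}

/// Run stages left to right over an in-memory node list. Nothing here looks
/// at query text, so parsed and hand-built stages execute identically.
pub fn run_stages(store: &[Node], stages: &[Stage]) -> Vec<Node> {
    let mut cur = store.to_vec(); // start from the whole store
    for stage in stages {
        match stage {
            Stage::All => cur = store.to_vec(),
            Stage::Type(t) => cur.retain(|n| n.node_type == *t),
            Stage::DegreeAbove(d) => cur.retain(|n| n.degree > *d),
            Stage::NewestFirst => cur.sort_by(|a, b| b.timestamp.cmp(&a.timestamp)),
            Stage::Limit(k) => cur.truncate(*k),
        }
    }
    cur
}
```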

Commit `query: unify PEG and engine parsers` (2026-04-11 20:42:58 -04:00): PEG parser now handles both expression syntax (`degree > 5 | sort degree`) and pipeline syntax (`all | type:episodic | sort:timestamp`). Deleted `Stage::parse()` and helpers from engine.rs; it's now pure execution. All callers use `parse_stages()` from parser.rs as the single entry point. Co-Authored-By: Proof of Concept <poc@bcachefs.org>
