split agent: two-phase node decomposition for memory consolidation

Phase 1 sends a large node with its neighbor communities to the LLM and
gets back a JSON split plan (child keys, descriptions, section hints).
Phase 2 fires one extraction call per child in parallel: each gets the
full parent content and extracts/reorganizes just its portion.

This handles arbitrarily large nodes because output is always
proportional to one child, not the whole parent.

Tested on the kent node (19K chars → 3 children totaling 20K chars with
clean topic separation).

New files:
  prompts/split-plan.md - phase 1 planning prompt
  prompts/split-extract.md - phase 2 extraction prompt
  prompts/split.md - original single-phase (kept for reference)

Modified:
  agents/prompts.rs - split_candidates(), split_plan_prompt(),
    split_extract_prompt(), agent_prompt "split" arm
  agents/daemon.rs - job_split_agent() two-phase implementation,
    RPC dispatch for "split" agent type
  tui.rs - added "split" to AGENT_TYPES
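
The phase 2 fan-out described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in agents/daemon.rs: `ChildPlan`, `fill_extract_prompt`, `extract_children`, and the `call_llm` closure are hypothetical names, and the plan is assumed to be already parsed from the phase 1 JSON. Only the four `{{...}}` placeholders are taken from prompts/split-extract.md.

```rust
use std::thread;

/// One child entry from the phase 1 plan (hypothetical type for this sketch).
#[derive(Debug, Clone)]
struct ChildPlan {
    key: String,
    description: String,
    sections: Vec<String>,
}

/// Fill the phase 2 template for a single child.
fn fill_extract_prompt(template: &str, child: &ChildPlan, parent_content: &str) -> String {
    template
        .replace("{{CHILD_KEY}}", &child.key)
        .replace("{{CHILD_DESC}}", &child.description)
        .replace("{{CHILD_SECTIONS}}", &child.sections.join(", "))
        // Parent content goes in last, so placeholder-like text inside it
        // is never expanded by an earlier replace.
        .replace("{{PARENT_CONTENT}}", parent_content)
}

/// Run one extraction call per child in parallel. Every call sees the full
/// parent content; results come back as (child key, extracted text) in plan order.
/// `call_llm` is a stand-in for whatever actually talks to the model.
fn extract_children<F>(
    template: &str,
    children: &[ChildPlan],
    parent_content: &str,
    call_llm: F,
) -> Vec<(String, String)>
where
    F: Fn(&str) -> String + Sync,
{
    let call_llm = &call_llm;
    thread::scope(|s| {
        // Spawn one scoped thread per child.
        let handles: Vec<_> = children
            .iter()
            .map(|child| {
                s.spawn(move || {
                    let prompt = fill_extract_prompt(template, child, parent_content);
                    (child.key.clone(), call_llm(prompt.as_str()))
                })
            })
            .collect();
        // Join in spawn order so the output order matches the plan.
        handles
            .into_iter()
            .map(|h| h.join().expect("extraction thread panicked"))
            .collect::<Vec<_>>()
    })
}
```

Each call returns only one child's content, which is the property the message relies on: output size tracks a single child rather than the whole parent.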
2026-03-10 01:48:41 -04:00
commit ca62692a28
parent 4c973183c4
6 changed files with 515 additions and 2 deletions
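
The naming rules in prompts/split-plan.md (kebab-case keys, at most five words, date prefixes preserved) could be checked mechanically on a phase 1 plan before phase 2 runs. A minimal sketch, assuming keys are lowercase ASCII and that a date prefix shows up as leading all-digit segments; `is_kebab_case` and `validate_child_keys` are hypothetical helpers, not functions from this commit, and the duplicate-key check is an extra sanity check the prompt does not state.

```rust
use std::collections::HashSet;

/// True if `key` consists of non-empty segments of ASCII lowercase letters
/// or digits, separated by single hyphens.
fn is_kebab_case(key: &str) -> bool {
    !key.is_empty()
        && key.split('-').all(|seg| {
            !seg.is_empty()
                && seg.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        })
}

/// Check the child keys of a split plan and return human-readable problems.
/// An empty result means every key passed.
fn validate_child_keys(keys: &[&str]) -> Vec<String> {
    let mut problems = Vec::new();
    let mut seen = HashSet::new();
    for &key in keys {
        if !is_kebab_case(key) {
            problems.push(format!("{key}: not kebab-case"));
        }
        // Leading all-digit segments are assumed to be a date prefix
        // and are not counted as words.
        let words = key
            .split('-')
            .skip_while(|seg| seg.chars().all(|c| c.is_ascii_digit()))
            .count();
        if words > 5 {
            problems.push(format!("{key}: more than 5 words"));
        }
        // Assumption: two children sharing a key would collide on write.
        if !seen.insert(key) {
            problems.push(format!("{key}: duplicate child key"));
        }
    }
    problems
}
```

Returning a list of problems rather than failing on the first one makes it possible to report everything wrong with a plan in one pass.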
--- a/prompts/split-extract.md
+++ b/prompts/split-extract.md
@@ -0,0 +1,33 @@
+# Split Agent — Phase 2: Extract
+
+You are extracting content for one child node from a parent that is
+being split into multiple focused nodes.
+
+## Your task
+
+Extract all content from the parent node that belongs to the child
+described below. Output ONLY the content for this child — nothing else.
+
+## Guidelines
+
+- **Reorganize freely.** Content may need to be restructured — paragraphs
+  might interleave topics, sections might cover multiple concerns.
+  Untangle and rewrite as needed to make this child coherent and
+  self-contained.
+- **Preserve all relevant information** — don't lose facts, but you can
+  rephrase, restructure, and reorganize. This is editing, not just cutting.
+- **This child should stand alone** — a reader shouldn't need the other
+  children to understand it. Add brief context where needed.
+- **Include everything that belongs here** — better to include a borderline
+  paragraph than to lose information. The other children will get their
+  own extraction passes.
+
+## Child to extract
+
+Key: {{CHILD_KEY}}
+Description: {{CHILD_DESC}}
+Section hints: {{CHILD_SECTIONS}}
+
+## Parent content
+
+{{PARENT_CONTENT}}
--- a/prompts/split-plan.md
+++ b/prompts/split-plan.md
@@ -0,0 +1,86 @@
+# Split Agent — Phase 1: Plan
+
+You are a memory consolidation agent planning how to split an overgrown
+node into focused, single-topic children.
+
+## What you're doing
+
+This node has grown to cover multiple distinct topics. Your job is to
+identify the natural topic boundaries and propose a split plan. You are
+NOT writing the content — a second phase will extract each child's
+content separately.
+
+## How to find split points
+
+The node is shown with its **neighbor list grouped by community**. The
+neighbors tell you what topics the node covers:
+
+- If a node links to neighbors in 3 different communities, it likely
+  covers 3 different topics
+- Content that relates to one neighbor cluster should go in one child;
+  content relating to another cluster goes in another child
+- The community structure is your primary guide — don't just split by
+  sections or headings, split by **semantic topic**
+
+## When NOT to split
+
+- **Episodes that belong in sequence.** If a node tells a story — a
+  conversation that unfolded over time, a debugging session, an evening
+  together — don't break the narrative. Sequential events that form a
+  coherent arc should stay together even if they touch multiple topics.
+  The test: would reading one child without the others lose important
+  context about *what happened*?
+
+## What to output
+
+Output a JSON block describing the split plan:
+
+```json
+{
+  "action": "split",
+  "parent": "original-key",
+  "children": [
+    {
+      "key": "new-key-1",
+      "description": "Brief description of what this child covers",
+      "sections": ["Section Header 1", "Section Header 2"]
+    },
+    {
+      "key": "new-key-2",
+      "description": "Brief description of what this child covers",
+      "sections": ["Section Header 3", "Another Section"]
+    }
+  ]
+}
+```
+
+If the node should NOT be split:
+
+```json
+{
+  "action": "keep",
+  "parent": "original-key",
+  "reason": "Why this node is cohesive despite its size"
+}
+```
+
+## Naming children
+
+- Use descriptive kebab-case keys: `topic-subtopic`
+- If the parent was `foo`, children might be `foo-technical`, `foo-personal`
+- Keep names short (3-5 words max)
+- Preserve any date prefixes from the parent key
+
+## Section hints
+
+The "sections" field is a guide for the extraction phase — list the
+section headers or topic areas from the original content that belong
+in each child. These don't need to be exact matches; they're hints
+that help the extractor know what to include. Content that spans topics
+or doesn't have a clear header can be mentioned in the description.
+
+{{TOPOLOGY}}
+
+## Node to review
+
+{{NODE}}
--- a/prompts/split.md
+++ b/prompts/split.md
@@ -0,0 +1,87 @@
+# Split Agent — Topic Decomposition
+
+You are a memory consolidation agent that splits overgrown nodes into
+focused, single-topic nodes.
+
+## What you're doing
+
+Large memory nodes accumulate content about multiple distinct topics over
+time. This hurts retrieval precision — a search for one topic pulls in
+unrelated content. Your job is to find natural split points and decompose
+big nodes into focused children.
+
+## How to find split points
+
+Each node is shown with its **neighbor list grouped by community**. The
+neighbors tell you what topics the node covers:
+
+- If a node links to neighbors in 3 different communities, it likely
+  covers 3 different topics
+- Content that relates to one neighbor cluster should go in one child;
+  content relating to another cluster goes in another child
+- The community structure is your primary guide — don't just split by
+  sections or headings, split by **semantic topic**
+
+## What to output
+
+For each node that should be split, output a SPLIT block:
+
+```
+SPLIT original-key
+--- new-key-1
+Content for the first child node goes here.
+This can be multiple lines.
+
+--- new-key-2
+Content for the second child node goes here.
+
+--- new-key-3
+Optional third child, etc.
+```
+
+If a node should NOT be split (it's large but cohesive), say:
+
+```
+KEEP original-key "reason it's cohesive"
+```
+
+## Naming children
+
+- Use descriptive kebab-case keys: `topic-subtopic`
+- If the parent was `foo`, children might be `foo-technical`, `foo-personal`
+- Keep names short (3-5 words max)
+- Preserve any date prefixes from the parent key
+
+## When NOT to split
+
+- **Episodes that belong in sequence.** If a node tells a story — a
+  conversation that unfolded over time, a debugging session, an evening
+  together — don't break the narrative. Sequential events that form a
+  coherent arc should stay together even if they touch multiple topics.
+  The test: would reading one child without the others lose important
+  context about *what happened*?
+
+## Content guidelines
+
+- **Reorganize freely.** Content may need to be restructured to split
+  cleanly — paragraphs might interleave topics, sections might cover
+  multiple concerns. Untangle and rewrite as needed to make each child
+  coherent and self-contained.
+- **Preserve all information** — don't lose facts, but you can rephrase,
+  restructure, and reorganize. This is editing, not just cutting.
+- **Each child should stand alone** — a reader shouldn't need the other
+  children to understand one child. Add brief context where needed.
+
+## Edge inheritance
+
+After splitting, each child inherits the parent's edges that are relevant
+to its content. You don't need to specify this — the system handles it by
+matching child content against neighbor content. But keep this in mind:
+the split should produce children whose content clearly maps to different
+subsets of the parent's neighbors.
+
+{{TOPOLOGY}}
+
+## Nodes to review
+
+{{NODES}}