split agent: two-phase node decomposition for memory consolidation

Phase 1 sends a large node with its neighbor communities to the LLM
and gets back a JSON split plan (child keys, descriptions, section
hints). Phase 2 fires one extraction call per child in parallel —
each gets the full parent content and extracts/reorganizes just its
portion.

This handles arbitrarily large nodes on the output side: each call's
output is proportional to one child, not the whole parent. Tested on
the kent node (19K chars → 3 children totaling 20K chars with clean
topic separation).
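
A minimal sketch of the phase 2 fan-out described above, not the code in
agents/daemon.rs. It assumes a hypothetical ChildPlan struct mirroring the
plan JSON and a stub extract_child() standing in for the real LLM call.
std::thread::scope lets every worker borrow the full parent content:

```rust
use std::thread;

/// One child entry from the phase 1 plan. Field names mirror the plan JSON;
/// the struct itself is hypothetical.
#[derive(Debug, Clone)]
struct ChildPlan {
    key: String,
    description: String,
    sections: Vec<String>,
}

/// Hypothetical stand-in for the phase 2 LLM call. A real version would render
/// the extraction prompt and send it; this stub only labels its input.
fn extract_child(parent: &str, child: &ChildPlan) -> String {
    format!(
        "{}: {} [{} section hints, {} parent chars]",
        child.key,
        child.description,
        child.sections.len(),
        parent.chars().count()
    )
}

/// Phase 2: one extraction per planned child, run in parallel. Every worker
/// borrows the full parent content; results come back in plan order.
fn extract_all(parent: &str, plan: &[ChildPlan]) -> Vec<(String, String)> {
    thread::scope(|s| {
        // Spawn one scoped thread per child; references are copied into each closure.
        let handles: Vec<_> = plan
            .iter()
            .map(|child| s.spawn(move || (child.key.clone(), extract_child(parent, child))))
            .collect();
        // Join in spawn order so output order matches the plan.
        handles
            .into_iter()
            .map(|h| h.join().expect("extraction worker panicked"))
            .collect()
    })
}
```

Calling extract_all(parent, &plan) with a two-child plan returns two
(key, content) pairs in plan order. The parent string is shared by
reference, so only the per-child output grows with the number of children.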

New files:
  prompts/split-plan.md   — phase 1 planning prompt
  prompts/split-extract.md — phase 2 extraction prompt
  prompts/split.md        — original single-phase (kept for reference)

Modified:
  agents/prompts.rs — split_candidates(), split_plan_prompt(),
                      split_extract_prompt(), agent_prompt "split" arm
  agents/daemon.rs  — job_split_agent() two-phase implementation,
                      RPC dispatch for "split" agent type
  tui.rs            — added "split" to AGENT_TYPES
ProofOfConcept 2026-03-10 01:48:41 -04:00
parent 4c973183c4
commit ca62692a28
6 changed files with 515 additions and 2 deletions

prompts/split-plan.md Normal file

@ -0,0 +1,86 @@
# Split Agent — Phase 1: Plan

You are a memory consolidation agent planning how to split an overgrown
node into focused, single-topic children.

## What you're doing

This node has grown to cover multiple distinct topics. Your job is to
identify the natural topic boundaries and propose a split plan. You are
NOT writing the content — a second phase will extract each child's
content separately.

## How to find split points

The node is shown with its **neighbor list grouped by community**. The
neighbors tell you what topics the node covers:

- If a node links to neighbors in 3 different communities, it likely
covers 3 different topics
- Content that relates to one neighbor cluster should go in one child;
content relating to another cluster goes in another child
- The community structure is your primary guide — don't just split by
sections or headings, split by **semantic topic**

## When NOT to split

- **Episodes that belong in sequence.** If a node tells a story (a
conversation that unfolded over time, a debugging session, an evening
together), don't break the narrative. Sequential events that form a
coherent arc should stay together even if they touch multiple topics.
The test: would reading one child without the others lose important
context about *what happened*?

## What to output

Output a JSON block describing the split plan:

```json
{
  "action": "split",
  "parent": "original-key",
  "children": [
    {
      "key": "new-key-1",
      "description": "Brief description of what this child covers",
      "sections": ["Section Header 1", "Section Header 2"]
    },
    {
      "key": "new-key-2",
      "description": "Brief description of what this child covers",
      "sections": ["Section Header 3", "Another Section"]
    }
  ]
}
```

If the node should NOT be split:

```json
{
  "action": "keep",
  "parent": "original-key",
  "reason": "Why this node is cohesive despite its size"
}
```

## Naming children

- Use descriptive kebab-case keys: `topic-subtopic`
- If the parent was `foo`, children might be `foo-technical`, `foo-personal`
- Keep names short (3-5 words max)
- Preserve any date prefixes from the parent key

## Section hints

The "sections" field is a guide for the extraction phase: list the
section headers or topic areas from the original content that belong
in each child. These don't need to be exact matches; they're hints
that help the extractor know what to include. Content that spans topics
or doesn't have a clear header can be mentioned in the description.

{{TOPOLOGY}}

## Node to review

{{NODE}}