ProofOfConcept ca62692a28 split agent: two-phase node decomposition for memory consolidation

Phase 1 sends a large node with its neighbor communities to the LLM
and gets back a JSON split plan (child keys, descriptions, section
hints). Phase 2 fires one extraction call per child in parallel —
each gets the full parent content and extracts/reorganizes just its
portion.
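
A minimal sketch of the phase 2 fan-out described above, assuming the phase 1 plan has already been parsed. `ChildPlan`, `extract_children`, and the `extract` callback are hypothetical names standing in for the real LLM call, and scoped OS threads are only one way to get the parallelism; the commit does not say how job_split_agent() does it.

```rust
use std::thread;

/// One child from the phase 1 plan (mirrors the plan's "key" and "description").
#[derive(Clone, Debug, PartialEq)]
pub struct ChildPlan {
    pub key: String,
    pub description: String,
}

/// Phase 2: run one extraction per child in parallel and return
/// (child key, extracted content) pairs in plan order.
/// `extract` is a hypothetical stand-in for the per-child LLM call.
pub fn extract_children<F>(parent: &str, plan: &[ChildPlan], extract: F) -> Vec<(String, String)>
where
    F: Fn(&str, &ChildPlan) -> String + Sync,
{
    let extract = &extract;
    thread::scope(|s| {
        // Spawn every call first so they all run concurrently.
        // Each worker receives the full parent content plus its own child plan.
        let handles: Vec<_> = plan
            .iter()
            .map(|child| s.spawn(move || (child.key.clone(), extract(parent, child))))
            .collect();

        // Join in plan order; a panic in a worker is re-raised here.
        handles
            .into_iter()
            .map(|h| h.join().expect("extraction thread panicked"))
            .collect()
    })
}
```

Each worker returns only its own child's content, so no single result has to hold the whole parent.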

This handles very large nodes because each call's output is
proportional to one child, not the whole parent. Tested on the kent
node (19K chars → 3 children totaling 20K chars with clean topic
separation).

New files:
  prompts/split-plan.md   — phase 1 planning prompt
  prompts/split-extract.md — phase 2 extraction prompt
  prompts/split.md        — original single-phase (kept for reference)

Modified:
  agents/prompts.rs — split_candidates(), split_plan_prompt(),
                      split_extract_prompt(), agent_prompt "split" arm
  agents/daemon.rs  — job_split_agent() two-phase implementation,
                      RPC dispatch for "split" agent type
  tui.rs            — added "split" to AGENT_TYPES

2026-03-10 01:48:41 -04:00


Split Agent — Phase 1: Plan

You are a memory consolidation agent planning how to split an overgrown node into focused, single-topic children.

What you're doing

This node has grown to cover multiple distinct topics. Your job is to identify the natural topic boundaries and propose a split plan. You are NOT writing the content — a second phase will extract each child's content separately.

How to find split points

The node is shown with its neighbor list grouped by community. The neighbors tell you what topics the node covers:

If a node links to neighbors in 3 different communities, it likely covers 3 different topics
Content that relates to one neighbor cluster should go in one child; content relating to another cluster goes in another child
The community structure is your primary guide — don't just split by sections or headings, split by semantic topic
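
The community heuristic above can be sketched as a simple grouping step. This is an illustration only: the `(neighbor key, community label)` pair representation and both function names are assumptions, since the source does not say how communities are stored.

```rust
use std::collections::BTreeMap;

/// Group neighbor keys by community label.
/// Input pairs are (neighbor key, community label); this shape is hypothetical.
pub fn group_by_community<'a>(
    neighbors: &[(&'a str, &'a str)],
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups = BTreeMap::new();
    for &(key, community) in neighbors {
        groups.entry(community).or_insert_with(Vec::new).push(key);
    }
    groups
}

/// Neighbors in more than one community suggest the node covers more than one topic.
pub fn spans_multiple_communities(neighbors: &[(&str, &str)]) -> bool {
    group_by_community(neighbors).len() > 1
}
```

The number of groups is only a hint at how many topics the node covers; the planner still decides on semantic grounds.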

When NOT to split

Episodes that belong in sequence. If a node tells a story (a conversation that unfolded over time, a debugging session, an evening together), don't break the narrative. Sequential events that form a coherent arc should stay together even if they touch multiple topics. The test: would reading one child without the others lose important context about what happened? If so, keep the node whole.

What to output

Output a JSON block describing the split plan:

{
  "action": "split",
  "parent": "original-key",
  "children": [
    {
      "key": "new-key-1",
      "description": "Brief description of what this child covers",
      "sections": ["Section Header 1", "Section Header 2"]
    },
    {
      "key": "new-key-2",
      "description": "Brief description of what this child covers",
      "sections": ["Section Header 3", "Another Section"]
    }
  ]
}

If the node should NOT be split:

{
  "action": "keep",
  "parent": "original-key",
  "reason": "Why this node is cohesive despite its size"
}
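
On the consuming side, the two plan shapes above map naturally onto an enum. The sketch below is a hypothetical in-memory representation with a sanity check, not the parser used in agents/daemon.rs; the rules it enforces (at least two children, unique child keys, non-empty keep reason) are assumptions rather than requirements stated in the prompt.

```rust
use std::collections::HashSet;

/// One entry of the "children" array in a split plan.
#[derive(Debug, Clone, PartialEq)]
pub struct Child {
    pub key: String,
    pub description: String,
    pub sections: Vec<String>,
}

/// The two plan shapes, distinguished by the "action" field.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Split { parent: String, children: Vec<Child> },
    Keep { parent: String, reason: String },
}

/// Sanity-check a plan before running phase 2 (assumed rules, see lead-in).
pub fn validate(plan: &Plan) -> Result<(), String> {
    match plan {
        Plan::Keep { reason, .. } if reason.trim().is_empty() => {
            Err("keep plan needs a reason".to_string())
        }
        Plan::Keep { .. } => Ok(()),
        Plan::Split { children, .. } => {
            if children.len() < 2 {
                return Err("a split needs at least two children".to_string());
            }
            let mut seen = HashSet::new();
            for child in children {
                // `insert` returns false when the key was already present.
                if !seen.insert(child.key.as_str()) {
                    return Err(format!("duplicate child key: {}", child.key));
                }
            }
            Ok(())
        }
    }
}
```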

Naming children

Use descriptive kebab-case keys: topic-subtopic
If the parent was foo, children might be foo-technical, foo-personal
Keep names short (3-5 words max)
Preserve any date prefixes from the parent key
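
The naming rules above could be checked mechanically along these lines. This is a sketch under stated assumptions: a date prefix is taken to be a leading `YYYY-MM-DD`, the five-word limit is applied to the part after that prefix, and all function names are hypothetical.

```rust
/// Returns the leading `YYYY-MM-DD` of a key, if present (assumed date format).
pub fn date_prefix(key: &str) -> Option<&str> {
    let bytes = key.as_bytes();
    if bytes.len() < 10 {
        return None;
    }
    let looks_like_date = bytes[..10].iter().enumerate().all(|(i, b)| match i {
        4 | 7 => *b == b'-',
        _ => b.is_ascii_digit(),
    });
    if looks_like_date { Some(&key[..10]) } else { None }
}

/// Lowercase ASCII letters and digits, separated by single hyphens.
pub fn is_kebab_case(s: &str) -> bool {
    !s.is_empty()
        && s.split('-').all(|word| {
            !word.is_empty()
                && word.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        })
}

/// Check a proposed child key against the parent key.
pub fn check_child_key(parent: &str, child: &str) -> Result<(), String> {
    if !is_kebab_case(child) {
        return Err(format!("{} is not kebab-case", child));
    }
    // If the parent carries a date prefix, the child must carry the same one.
    let rest = match date_prefix(parent) {
        Some(date) => match child.strip_prefix(date) {
            Some(rest) => rest.trim_start_matches('-'),
            None => return Err(format!("{} drops the date prefix {}", child, date)),
        },
        None => child,
    };
    let words = rest.split('-').filter(|w| !w.is_empty()).count();
    if words == 0 || words > 5 {
        return Err(format!("{} should have 1 to 5 words after any date prefix", child));
    }
    Ok(())
}
```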

Section hints

The "sections" field is a guide for the extraction phase — list the section headers or topic areas from the original content that belong in each child. These don't need to be exact matches; they're hints that help the extractor know what to include. Content that spans topics or doesn't have a clear header can be mentioned in the description.
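
Because the hints need not match exactly, anything that consumes them programmatically would need a loose comparison. In the described design the extractor is the LLM itself, so the following is only an illustration of what "hint, not exact match" could mean; `normalize` and `matching_headers` are hypothetical helpers.

```rust
/// Lowercase, drop punctuation (including Markdown `#`), and collapse whitespace.
fn normalize(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect::<String>()
        .to_lowercase()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Return the headers that loosely match any hint: after normalization,
/// either string containing the other counts as a match.
pub fn matching_headers<'a>(hints: &[&str], headers: &[&'a str]) -> Vec<&'a str> {
    let hints: Vec<String> = hints
        .iter()
        .map(|h| normalize(h))
        .filter(|h| !h.is_empty())
        .collect();

    headers
        .iter()
        .copied()
        .filter(|header| {
            let h = normalize(header);
            // Skip headers that normalize to nothing, since "" is contained in everything.
            !h.is_empty()
                && hints
                    .iter()
                    .any(|hint| h.contains(hint.as_str()) || hint.contains(h.as_str()))
        })
        .collect()
}
```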
