experience-mine: per-segment dedup keys, retry backoff
The whole-file dedup key (_mined-transcripts#f-{UUID}) prevented mining
new compaction segments when session files grew. Replace with per-segment
keys (_mined-transcripts#f-{UUID}.{N}) so each segment is tracked
independently.
Changes:
- daemon session-watcher: segment-aware dedup, migrate 272 existing
whole-file keys to per-segment on restart
- seg_cache with size-based invalidation (re-parse when file grows)
- exponential retry backoff (5min → 30min cap) for failed sessions
- experience_mine(): write per-segment key only, backfill on
content-hash early return
- fact-mining gated on all per-segment keys existing
Also adds documentation:
- docs/claude-code-transcript-format.md: JSONL transcript format
- docs/plan-experience-mine-dedup-fix.md: design document
Parent: 1326a683a5
Commit: 8eb6308760
4 changed files with 367 additions and 95 deletions
docs/claude-code-transcript-format.md (new file, +97 lines)
# Claude Code Transcript Format

Claude Code stores session transcripts as JSONL files (one JSON object per
line) in `~/.claude/projects/<project-slug>/<session-uuid>.jsonl`.
## Common fields

Every line has:

- `type` — message type (see below)
- `uuid` — unique ID for this message
- `parentUuid` — links to the preceding message (forms a chain)
- `sessionId` — session UUID (matches the filename stem)
- `timestamp` — ISO 8601
- `cwd`, `version`, `gitBranch` — session context
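For illustration, a single transcript line with these common fields might look like this (all values here are hypothetical, not taken from a real session):

```json
{"type": "user", "uuid": "5d1f0a2b-3c4d-4e5f-8a9b-0c1d2e3f4a5b", "parentUuid": "4c0e9f1a-2b3c-4d5e-7f8a-9b0c1d2e3f4a", "sessionId": "8cebfc0a-bd33-49f1-85a4-1489bdf7050c", "timestamp": "2025-01-01T12:00:00.000Z", "cwd": "/home/user/project", "version": "1.0.0", "gitBranch": "main", "message": {"role": "user", "content": "hello"}}
```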
## Message types

### `user`

User input or tool results. `message.content` is either:

- A string (plain user text)
- An array of content blocks, each with `type`:
  - `"tool_result"` — result of a tool call, with `tool_use_id`, `content`
    (string or array of text/image blocks), `is_error`

User messages that start a compaction segment begin with:

```
This session is being continued from a previous conversation that ran out of context.
```

These are injected by Claude Code when context is compacted.

Additional fields on user messages:

- `userType` — `"external"` for human input, may differ for system-injected
- `todos` — task list state
- `permissionMode` — permission level for the session
### `assistant`

Model responses. `message` contains the full API response:

- `model` — model ID (e.g. `"claude-opus-4-6"`)
- `role` — `"assistant"`
- `content` — array of content blocks:
  - `{"type": "text", "text": "..."}` — text output
  - `{"type": "tool_use", "id": "...", "name": "Bash", "input": {...}}` — tool call
- `stop_reason` — why generation stopped
- `usage` — token counts (input, output, cache hits)

Additional fields:

- `requestId` — API request ID
### `system`

System events. Has a `subtype` field:

- `"stop_hook_summary"` — hook execution results at end of turn
  - `hookCount`, `hookInfos` (command + duration), `hookErrors`
  - `preventedContinuation`, `stopReason`

### `progress`

Hook execution progress. `data` contains:

- `type` — e.g. `"hook_progress"`
- `hookEvent` — trigger event (e.g. `"PostToolUse"`)
- `hookName` — specific hook (e.g. `"PostToolUse:Bash"`)
- `command` — hook command path

### `queue-operation`

User input queued while the assistant is working:

- `operation` — `"enqueue"`
- `content` — the queued text

### `file-history-snapshot`

File state snapshots for undo/redo:

- `snapshot.trackedFileBackups` — map of file paths to backup state
## Compaction segments

Long-running sessions hit context limits and get compacted. Each compaction
injects a user message starting with the marker text (see above), containing
a summary of the preceding conversation. This splits the transcript into
segments:

- Segment 0: original conversation start through the first compaction
- Segment 1: first compaction summary through the second compaction
- Segment N: Nth compaction summary through the next (or end of file)

Segments are append-only — new compactions add higher-indexed segments.
Existing segment indices are stable and never shift.
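Segment counting under this scheme can be sketched as follows. This is a minimal illustration, not the daemon's actual parser: it matches the marker with a plain substring test on each line rather than parsing every JSON object.

```rust
/// Prefix of the user message that Claude Code injects at each compaction.
const COMPACTION_MARKER: &str =
    "This session is being continued from a previous conversation";

/// Count compaction segments in a transcript, given its lines.
/// Each marker line starts a new segment, so a non-empty file has
/// marker_count + 1 segments; an empty file has none.
fn count_segments<'a>(lines: impl Iterator<Item = &'a str>) -> usize {
    let mut markers = 0;
    let mut any = false;
    for line in lines {
        any = true;
        if line.contains(COMPACTION_MARKER) {
            markers += 1;
        }
    }
    if any { markers + 1 } else { 0 }
}

fn main() {
    let transcript = "\
{\"type\":\"user\",\"message\":{\"content\":\"hello\"}}
{\"type\":\"assistant\"}
{\"type\":\"user\",\"message\":{\"content\":\"This session is being continued from a previous conversation that ran out of context.\"}}
{\"type\":\"assistant\"}";
    // One marker line => two segments (indices 0 and 1).
    assert_eq!(count_segments(transcript.lines()), 2);
}
```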
## File lifecycle

- Created when a session starts
- Grows as messages are exchanged
- Grows further when compaction happens (summary injected, conversation continues)
- Never truncated or rewritten
- Becomes stale when the session ends (no process has the file open)
docs/plan-experience-mine-dedup-fix.md (new file, +112 lines)
# Fix: experience-mine dedup and retry handling

## Problem

1. **Whole-file dedup key prevents mining new segments.** When a session
   is mined, `experience_mine()` writes `_mined-transcripts#f-{UUID}` (a
   whole-file key). If the session later grows (compaction adds segments),
   the daemon sees the whole-file key and skips it forever. New segments
   never get mined.

2. **No retry backoff.** When the `claude` CLI fails (exit status 1), the
   session-watcher re-queues the same session on every 60s tick. This
   produces a wall of failures in the log and wastes resources.
## Design

### Dedup keys: per-segment only

Going forward, dedup keys are per-segment: `_mined-transcripts#f-{UUID}.{N}`,
where N is the segment index. No more whole-file keys.

Segment indices are stable — compaction appends new segments, never
reorders existing ones. See `docs/claude-code-transcript-format.md`.
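As a sketch, the key scheme reduces to a single format string (the helper name here is hypothetical; the codebase may construct keys differently):

```rust
/// Per-segment dedup key: `_mined-transcripts#f-{UUID}.{N}`.
fn per_segment_key(session_uuid: &str, segment: usize) -> String {
    format!("_mined-transcripts#f-{session_uuid}.{segment}")
}

fn main() {
    assert_eq!(
        per_segment_key("8cebfc0a-bd33-49f1-85a4-1489bdf7050c", 3),
        "_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c.3"
    );
}
```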
### Migration of existing whole-file keys

~276 sessions have whole-file keys (`_mined-transcripts#f-{UUID}` with
no segment suffix) and no per-segment keys. These were mined correctly
at the time.

When the session-watcher encounters a whole-file key:

- Count the current segments in the file
- Write per-segment keys for all current segments (they were covered
  by the old whole-file key)
- If the file has grown since (new segments beyond the migrated set),
  those won't have per-segment keys and will be mined normally

This is a one-time migration per file. After migration, the whole-file
key is harmless dead weight — nothing creates new ones.
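The migration step above amounts to generating one key per existing segment. A rough sketch, with a hypothetical helper name (the real code writes through the store API, which is not shown here):

```rust
/// One-time migration: a whole-file key covered every segment that existed
/// when it was written, so emit per-segment keys for all current segments.
fn migrate_whole_file_key(session_uuid: &str, current_segments: usize) -> Vec<String> {
    (0..current_segments)
        .map(|n| format!("_mined-transcripts#f-{session_uuid}.{n}"))
        .collect()
}

fn main() {
    let keys = migrate_whole_file_key("8cebfc0a-bd33-49f1-85a4-1489bdf7050c", 2);
    assert_eq!(keys, vec![
        "_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c.0",
        "_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c.1",
    ]);
}
```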
### Retry backoff

The session-watcher tracks failed sessions in a local
`HashMap<String, (Instant, Duration)>` mapping path to
(next_retry_after, current_backoff).

- Initial backoff: 5 minutes
- Each failure: double the backoff
- Cap: 30 minutes
- Resets on daemon restart (map is thread-local, not persisted)
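The backoff progression is deterministic and easy to state in code. A minimal sketch of the update rule only (the map bookkeeping and `Instant` handling are omitted):

```rust
use std::time::Duration;

const INITIAL_BACKOFF: Duration = Duration::from_secs(5 * 60);  // 5 min
const MAX_BACKOFF: Duration = Duration::from_secs(30 * 60);     // 30 min

/// Backoff after the next failure: first failure starts at 5 minutes,
/// each subsequent failure doubles it, capped at 30 minutes.
fn next_backoff(current: Option<Duration>) -> Duration {
    match current {
        None => INITIAL_BACKOFF,
        Some(d) => (d * 2).min(MAX_BACKOFF),
    }
}

fn main() {
    let mut b = next_backoff(None);
    assert_eq!(b, Duration::from_secs(300));   // 5 min
    b = next_backoff(Some(b));
    assert_eq!(b, Duration::from_secs(600));   // 10 min
    b = next_backoff(Some(b));                 // 20 min
    b = next_backoff(Some(b));
    assert_eq!(b, Duration::from_secs(1800));  // capped at 30 min
}
```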
## Changes

### `poc-memory/src/agents/enrich.rs`

`experience_mine()`: stop writing the bare filename key for unsegmented
calls. Only write the content-hash key (for the legacy dedup check at
the top of the function) and per-segment keys.

**Already done** — edited earlier in this session.

### `poc-memory/src/agents/daemon.rs`
Session-watcher changes:

1. **Remove the whole-file fast path.** Delete the `is_transcript_mined_with_keys`
   check that short-circuits before segment counting.

2. **Always go through the segment-aware path.** Every stale session gets
   segment counting (cached) and per-segment key checks.

3. **Migrate whole-file keys.** When we find that a whole-file key exists but
   no per-segment keys do: write per-segment keys for all current segments
   into the store. One-time cost per file, batched into a single
   store load/save per tick.

4. **seg_cache with size invalidation.** Change from `HashMap<String, usize>`
   to `HashMap<String, (u64, usize)>` — `(file_size, seg_count)`. When
   stat shows a different size, evict and re-parse.

5. **Remove `mark_transcript_done`.** Stop writing whole-file keys for
   fully-mined multi-segment files.

6. **Add retry backoff.** A `HashMap<String, (Instant, Duration)>` for
   tracking failed sessions. Skip sessions whose backoff hasn't expired.
   On failure (the task finishes with an error), update the backoff:
   exponential from 5min, capped at 30min.

7. **Fact-mining check.** Currently fact-mining is gated behind
   `experience_done` (the whole-file key). After removing the whole-file
   fast path, fact-mining should be gated on "all segments mined" —
   i.e., all per-segment keys exist for the current segment count.
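Items 4 and 7 above can be sketched together. These are illustrative function shapes, not the daemon's actual signatures:

```rust
use std::collections::HashMap;

/// Size-invalidated segment-count cache: re-parse only when the file grew.
fn cached_seg_count(
    cache: &mut HashMap<String, (u64, usize)>,
    path: &str,
    file_size: u64,
    parse: impl FnOnce() -> usize,
) -> usize {
    match cache.get(path) {
        Some(&(size, count)) if size == file_size => count,
        _ => {
            let count = parse();
            cache.insert(path.to_string(), (file_size, count));
            count
        }
    }
}

/// Fact-mining gate: every per-segment key must exist for the current count.
fn all_segments_mined(mined_keys: &[String], uuid: &str, seg_count: usize) -> bool {
    (0..seg_count).all(|n| {
        mined_keys
            .iter()
            .any(|k| k == &format!("_mined-transcripts#f-{uuid}.{n}"))
    })
}

fn main() {
    let mut cache = HashMap::new();
    // First lookup parses; second (same size) hits the cache.
    assert_eq!(cached_seg_count(&mut cache, "s.jsonl", 100, || 2), 2);
    assert_eq!(cached_seg_count(&mut cache, "s.jsonl", 100, || unreachable!()), 2);
    // Size changed: evict and re-parse.
    assert_eq!(cached_seg_count(&mut cache, "s.jsonl", 150, || 3), 3);

    let keys = vec![
        "_mined-transcripts#f-u1.0".to_string(),
        "_mined-transcripts#f-u1.1".to_string(),
    ];
    assert!(all_segments_mined(&keys, "u1", 2));
    assert!(!all_segments_mined(&keys, "u1", 3));
}
```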
### Manual cleanup after deploy

Delete the dedup keys for sessions that failed repeatedly (like
`8cebfc0a-bd33-49f1-85a4-1489bdf7050c`) so they get re-processed:

```
poc-memory delete-node '_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c'
# also any content-hash key for the same file
```
## Verification

After deploying:

- `tail -f ~/.claude/memory/daemon.log | grep session-watcher` should
  show ticks with migration activity, then settle to idle
- Failed sessions should show increasing backoff intervals, not
  per-second retries
- After fixing the `claude` CLI issue, backed-off sessions should
  retry and succeed on the next daemon restart