experience-mine: per-segment dedup keys, retry backoff
The whole-file dedup key (_mined-transcripts#f-{UUID}) prevented mining
new compaction segments when session files grew. Replace with per-segment
keys (_mined-transcripts#f-{UUID}.{N}) so each segment is tracked
independently.
Changes:
- daemon session-watcher: segment-aware dedup, migrate 272 existing
whole-file keys to per-segment on restart
- seg_cache with size-based invalidation (re-parse when file grows)
- exponential retry backoff (5min → 30min cap) for failed sessions
- experience_mine(): write per-segment key only, backfill on
content-hash early return
- fact-mining gated on all per-segment keys existing
Also adds documentation:
- docs/claude-code-transcript-format.md: JSONL transcript format
- docs/plan-experience-mine-dedup-fix.md: design document
Parent: 1326a683a5
Commit: 8eb6308760
4 changed files with 367 additions and 95 deletions
docs/claude-code-transcript-format.md (new file, +97 lines)
# Claude Code Transcript Format

Claude Code stores session transcripts as JSONL files (one JSON object per
line) in `~/.claude/projects/<project-slug>/<session-uuid>.jsonl`.
## Common fields

Every line has:

- `type` — message type (see below)
- `uuid` — unique ID for this message
- `parentUuid` — links to the preceding message (forms a chain)
- `sessionId` — session UUID (matches the filename stem)
- `timestamp` — ISO 8601
- `cwd`, `version`, `gitBranch` — session context
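For illustration, a single transcript line with these common fields might look like this (all values here are hypothetical, not taken from a real session):

```json
{"type": "user", "uuid": "5d1f0a2b-3c4d-4e5f-8a9b-0c1d2e3f4a5b", "parentUuid": "4c0e9f1a-2b3c-4d5e-7f8a-9b0c1d2e3f4a", "sessionId": "8cebfc0a-bd33-49f1-85a4-1489bdf7050c", "timestamp": "2025-01-01T12:00:00.000Z", "cwd": "/home/user/project", "version": "1.0.0", "gitBranch": "main", "message": {"role": "user", "content": "hello"}}
```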
## Message types

### `user`

User input or tool results. `message.content` is either:

- A string (plain user text)
- An array of content blocks, each with `type`:
  - `"tool_result"` — result of a tool call, with `tool_use_id`, `content`
    (string or array of text/image blocks), `is_error`

User messages that start a compaction segment begin with:

```
This session is being continued from a previous conversation that ran out of context.
```

These are injected by Claude Code when context is compacted.

Additional fields on user messages:

- `userType` — `"external"` for human input, may differ for system-injected
- `todos` — task list state
- `permissionMode` — permission level for the session
### `assistant`

Model responses. `message` contains the full API response:

- `model` — model ID (e.g. `"claude-opus-4-6"`)
- `role` — `"assistant"`
- `content` — array of content blocks:
  - `{"type": "text", "text": "..."}` — text output
  - `{"type": "tool_use", "id": "...", "name": "Bash", "input": {...}}` — tool call
- `stop_reason` — why generation stopped
- `usage` — token counts (input, output, cache hits)

Additional fields:

- `requestId` — API request ID
### `system`

System events. Has a `subtype` field:

- `"stop_hook_summary"` — hook execution results at end of turn
  - `hookCount`, `hookInfos` (command + duration), `hookErrors`
  - `preventedContinuation`, `stopReason`

### `progress`

Hook execution progress. `data` contains:

- `type` — e.g. `"hook_progress"`
- `hookEvent` — trigger event (e.g. `"PostToolUse"`)
- `hookName` — specific hook (e.g. `"PostToolUse:Bash"`)
- `command` — hook command path

### `queue-operation`

User input queued while the assistant is working:

- `operation` — `"enqueue"`
- `content` — the queued text

### `file-history-snapshot`

File state snapshots for undo/redo:

- `snapshot.trackedFileBackups` — map of file paths to backup state
## Compaction segments

Long-running sessions hit context limits and get compacted. Each compaction
injects a user message starting with the marker text (see above), containing
a summary of the preceding conversation. This splits the transcript into
segments:

- Segment 0: original conversation start through the first compaction
- Segment 1: first compaction summary through the second compaction
- Segment N: Nth compaction summary through the next (or end of file)

Segments are append-only — new compactions add higher-indexed segments.
Existing segment indices are stable and never shift.
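Segment counting under this scheme can be sketched as follows. This is a minimal illustration, not the daemon's actual parser: it matches the marker with a plain substring test on each line rather than parsing every JSON object.

```rust
/// Prefix of the user message that Claude Code injects at each compaction.
const COMPACTION_MARKER: &str =
    "This session is being continued from a previous conversation";

/// Count compaction segments in a transcript, given its lines.
/// Each marker line starts a new segment, so a non-empty file has
/// marker_count + 1 segments; an empty file has none.
fn count_segments<'a>(lines: impl Iterator<Item = &'a str>) -> usize {
    let mut markers = 0;
    let mut any = false;
    for line in lines {
        any = true;
        if line.contains(COMPACTION_MARKER) {
            markers += 1;
        }
    }
    if any { markers + 1 } else { 0 }
}

fn main() {
    let transcript = "\
{\"type\":\"user\",\"message\":{\"content\":\"hello\"}}
{\"type\":\"assistant\"}
{\"type\":\"user\",\"message\":{\"content\":\"This session is being continued from a previous conversation that ran out of context.\"}}
{\"type\":\"assistant\"}";
    // One marker line => two segments (indices 0 and 1).
    assert_eq!(count_segments(transcript.lines()), 2);
}
```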
## File lifecycle

- Created when a session starts
- Grows as messages are exchanged
- Grows further when compaction happens (summary injected, conversation continues)
- Never truncated or rewritten
- Becomes stale when the session ends (no process has the file open)
docs/plan-experience-mine-dedup-fix.md (new file, +112 lines)
# Fix: experience-mine dedup and retry handling

## Problem

1. **Whole-file dedup key prevents mining new segments.** When a session
   is mined, `experience_mine()` writes `_mined-transcripts#f-{UUID}` (a
   whole-file key). If the session later grows (compaction adds segments),
   the daemon sees the whole-file key and skips it forever. New segments
   never get mined.

2. **No retry backoff.** When the `claude` CLI fails (exit status 1), the
   session-watcher re-queues the same session on every 60s tick. This
   produces a wall of failures in the log and wastes resources.
## Design

### Dedup keys: per-segment only

Going forward, dedup keys are per-segment: `_mined-transcripts#f-{UUID}.{N}`,
where N is the segment index. No more whole-file keys.

Segment indices are stable — compaction appends new segments, never
reorders existing ones. See `docs/claude-code-transcript-format.md`.
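As a sketch, the key scheme reduces to a single format string (the helper name here is hypothetical; the codebase may construct keys differently):

```rust
/// Per-segment dedup key: `_mined-transcripts#f-{UUID}.{N}`.
fn per_segment_key(session_uuid: &str, segment: usize) -> String {
    format!("_mined-transcripts#f-{session_uuid}.{segment}")
}

fn main() {
    assert_eq!(
        per_segment_key("8cebfc0a-bd33-49f1-85a4-1489bdf7050c", 3),
        "_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c.3"
    );
}
```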
### Migration of existing whole-file keys

~276 sessions have whole-file keys (`_mined-transcripts#f-{UUID}` with
no segment suffix) and no per-segment keys. These were mined correctly
at the time.

When the session-watcher encounters a whole-file key:

- Count the current segments in the file
- Write per-segment keys for all current segments (they were covered
  by the old whole-file key)
- If the file has grown since (new segments beyond the migrated set),
  those won't have per-segment keys and will be mined normally

This is a one-time migration per file. After migration, the whole-file
key is harmless dead weight — nothing creates new ones.
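The migration step above amounts to generating one key per existing segment. A rough sketch, with a hypothetical helper name (the real code writes through the store API, which is not shown here):

```rust
/// One-time migration: a whole-file key covered every segment that existed
/// when it was written, so emit per-segment keys for all current segments.
fn migrate_whole_file_key(session_uuid: &str, current_segments: usize) -> Vec<String> {
    (0..current_segments)
        .map(|n| format!("_mined-transcripts#f-{session_uuid}.{n}"))
        .collect()
}

fn main() {
    let keys = migrate_whole_file_key("8cebfc0a-bd33-49f1-85a4-1489bdf7050c", 2);
    assert_eq!(keys, vec![
        "_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c.0",
        "_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c.1",
    ]);
}
```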
### Retry backoff

The session-watcher tracks failed sessions in a local
`HashMap<String, (Instant, Duration)>` mapping path to
(next_retry_after, current_backoff).

- Initial backoff: 5 minutes
- Each failure: double the backoff
- Cap: 30 minutes
- Resets on daemon restart (map is thread-local, not persisted)
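The backoff progression is deterministic and easy to state in code. A minimal sketch of the update rule only (the map bookkeeping and `Instant` handling are omitted):

```rust
use std::time::Duration;

const INITIAL_BACKOFF: Duration = Duration::from_secs(5 * 60);  // 5 min
const MAX_BACKOFF: Duration = Duration::from_secs(30 * 60);     // 30 min

/// Backoff after the next failure: first failure starts at 5 minutes,
/// each subsequent failure doubles it, capped at 30 minutes.
fn next_backoff(current: Option<Duration>) -> Duration {
    match current {
        None => INITIAL_BACKOFF,
        Some(d) => (d * 2).min(MAX_BACKOFF),
    }
}

fn main() {
    let mut b = next_backoff(None);
    assert_eq!(b, Duration::from_secs(300));   // 5 min
    b = next_backoff(Some(b));
    assert_eq!(b, Duration::from_secs(600));   // 10 min
    b = next_backoff(Some(b));                 // 20 min
    b = next_backoff(Some(b));
    assert_eq!(b, Duration::from_secs(1800));  // capped at 30 min
}
```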
## Changes

### `poc-memory/src/agents/enrich.rs`

`experience_mine()`: stop writing the bare filename key for unsegmented
calls. Only write the content-hash key (for the legacy dedup check at
the top of the function) and per-segment keys.

**Already done** — edited earlier in this session.

### `poc-memory/src/agents/daemon.rs`
Session-watcher changes:

1. **Remove the whole-file fast path.** Delete the `is_transcript_mined_with_keys`
   check that short-circuits before segment counting.

2. **Always go through the segment-aware path.** Every stale session gets
   segment counting (cached) and per-segment key checks.

3. **Migrate whole-file keys.** When we find that a whole-file key exists but
   no per-segment keys do: write per-segment keys for all current segments
   into the store. One-time cost per file, batched into a single
   store load/save per tick.

4. **seg_cache with size invalidation.** Change from `HashMap<String, usize>`
   to `HashMap<String, (u64, usize)>` — `(file_size, seg_count)`. When
   stat shows a different size, evict and re-parse.

5. **Remove `mark_transcript_done`.** Stop writing whole-file keys for
   fully-mined multi-segment files.

6. **Add retry backoff.** A `HashMap<String, (Instant, Duration)>` for
   tracking failed sessions. Skip sessions whose backoff hasn't expired.
   On failure (the task finishes with an error), update the backoff:
   exponential from 5min, capped at 30min.

7. **Fact-mining check.** Currently fact-mining is gated behind
   `experience_done` (the whole-file key). After removing the whole-file
   fast path, fact-mining should be gated on "all segments mined" —
   i.e., all per-segment keys exist for the current segment count.
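Items 4 and 7 above can be sketched together. These are illustrative function shapes, not the daemon's actual signatures:

```rust
use std::collections::HashMap;

/// Size-invalidated segment-count cache: re-parse only when the file grew.
fn cached_seg_count(
    cache: &mut HashMap<String, (u64, usize)>,
    path: &str,
    file_size: u64,
    parse: impl FnOnce() -> usize,
) -> usize {
    match cache.get(path) {
        Some(&(size, count)) if size == file_size => count,
        _ => {
            let count = parse();
            cache.insert(path.to_string(), (file_size, count));
            count
        }
    }
}

/// Fact-mining gate: every per-segment key must exist for the current count.
fn all_segments_mined(mined_keys: &[String], uuid: &str, seg_count: usize) -> bool {
    (0..seg_count).all(|n| {
        mined_keys
            .iter()
            .any(|k| k == &format!("_mined-transcripts#f-{uuid}.{n}"))
    })
}

fn main() {
    let mut cache = HashMap::new();
    // First lookup parses; second (same size) hits the cache.
    assert_eq!(cached_seg_count(&mut cache, "s.jsonl", 100, || 2), 2);
    assert_eq!(cached_seg_count(&mut cache, "s.jsonl", 100, || unreachable!()), 2);
    // Size changed: evict and re-parse.
    assert_eq!(cached_seg_count(&mut cache, "s.jsonl", 150, || 3), 3);

    let keys = vec![
        "_mined-transcripts#f-u1.0".to_string(),
        "_mined-transcripts#f-u1.1".to_string(),
    ];
    assert!(all_segments_mined(&keys, "u1", 2));
    assert!(!all_segments_mined(&keys, "u1", 3));
}
```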
### Manual cleanup after deploy

Delete the dedup keys for sessions that failed repeatedly (like
`8cebfc0a-bd33-49f1-85a4-1489bdf7050c`) so they get re-processed:

```
poc-memory delete-node '_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c'
# also any content-hash key for the same file
```
## Verification

After deploying:

- `tail -f ~/.claude/memory/daemon.log | grep session-watcher` should
  show ticks with migration activity, then settle to idle
- Failed sessions should show increasing backoff intervals, not
  per-second retries
- After fixing the `claude` CLI issue, backed-off sessions should
  retry and succeed on the next daemon restart