The whole-file dedup key (_mined-transcripts#f-{UUID}) prevented mining
new compaction segments when session files grew. Replace with per-segment
keys (_mined-transcripts#f-{UUID}.{N}) so each segment is tracked
independently.
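
The two key shapes can be sketched as follows. This is a minimal illustration, not the code from this change: the helper names `whole_file_key` and `segment_key` are hypothetical, and only the key formats themselves come from the description above.

```python
# Hypothetical helpers; only the key formats are taken from the commit text.
KEY_PREFIX = "_mined-transcripts#f-"


def whole_file_key(session_uuid: str) -> str:
    """Old-style key: one entry for the whole transcript file."""
    return f"{KEY_PREFIX}{session_uuid}"


def segment_key(session_uuid: str, segment_index: int) -> str:
    """New-style key: one entry per compaction segment N."""
    if segment_index < 0:
        raise ValueError("segment_index must be non-negative")
    return f"{KEY_PREFIX}{session_uuid}.{segment_index}"
```

With per-segment keys, a session file that grows from two segments to three produces one new key (`....2`) without touching the keys for segments 0 and 1.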
Changes:
- daemon session-watcher: segment-aware dedup, migrate 272 existing
whole-file keys to per-segment on restart
- seg_cache with size-based invalidation (re-parse when file grows)
- exponential retry backoff (5min → 30min cap) for failed sessions
- experience_mine(): write per-segment key only, backfill on
content-hash early return
- fact-mining gated on all per-segment keys existing
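
A rough sketch of the retry schedule, assuming the delay doubles after each consecutive failure (the commit only states the 5 minute start and the 30 minute cap, so the growth factor and the function name `retry_delay` are assumptions):

```python
BASE_DELAY_S = 5 * 60   # first retry after 5 minutes (from the commit text)
MAX_DELAY_S = 30 * 60   # delays never exceed 30 minutes (from the commit text)


def retry_delay(failures: int) -> int:
    """Seconds to wait after `failures` consecutive failed attempts.

    Assumes a doubling factor; returns 0 when nothing has failed yet.
    """
    if failures < 1:
        return 0
    return min(BASE_DELAY_S * 2 ** (failures - 1), MAX_DELAY_S)
```

Under that assumption the delays run 5, 10, 20, 30, 30, ... minutes.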
Also adds documentation:
- docs/claude-code-transcript-format.md: JSONL transcript format
- docs/plan-experience-mine-dedup-fix.md: design document
Claude Code Transcript Format
Claude Code stores session transcripts as JSONL files (one JSON object per
line) in ~/.claude/projects/<project-slug>/<session-uuid>.jsonl.
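
A minimal sketch of reading such a file, assuming one complete JSON object per line. The function name `iter_transcript` is hypothetical, and skipping unparseable lines is a design choice of this sketch (a session that is still being written may end in a partial line), not documented behavior.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_transcript(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a transcript."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # Assumption: a partially written trailing line is skipped.
                continue
```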
Common fields
Every line has:
- type: message type (see below)
- uuid: unique ID for this message
- parentUuid: links to the preceding message (forms a chain)
- sessionId: session UUID (matches the filename stem)
- timestamp: ISO 8601
- cwd, version, gitBranch: session context
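
The parentUuid chain can be followed back from any message to the start of the conversation. The sketch below assumes the first message has a null or missing parentUuid; `chain_to_root` is a hypothetical helper name.

```python
def chain_to_root(messages: list[dict], leaf_uuid: str) -> list[dict]:
    """Return the messages from the chain's root down to `leaf_uuid`."""
    by_uuid = {m["uuid"]: m for m in messages if "uuid" in m}
    chain = []
    seen = set()  # guards against a malformed, cyclic chain
    current = by_uuid.get(leaf_uuid)
    while current is not None and current["uuid"] not in seen:
        seen.add(current["uuid"])
        chain.append(current)
        # Assumption: the root has parentUuid null or absent, so the
        # lookup returns None and the loop stops.
        current = by_uuid.get(current.get("parentUuid"))
    chain.reverse()
    return chain
```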
Message types
user
User input or tool results. message.content is either:
- A string (plain user text)
- An array of content blocks, each with a type:
  - "tool_result": result of a tool call, with tool_use_id, content (string or array of text/image blocks), and is_error
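
Because message.content has two shapes, readers usually normalize it first. The sketch below flattens a user record to plain text. It assumes that text blocks inside a tool_result look like {"type": "text", "text": "..."}, which this document only shows for assistant output; `user_text` is a hypothetical name.

```python
def user_text(record: dict) -> str:
    """Flatten a user record's message.content to plain text."""
    content = record.get("message", {}).get("content", "")
    if isinstance(content, str):
        return content  # plain user text
    parts = []
    for block in content:
        if block.get("type") != "tool_result":
            continue
        inner = block.get("content", "")
        if isinstance(inner, str):
            parts.append(inner)
        else:
            # Assumed block shape; image blocks are ignored.
            parts.extend(
                b.get("text", "") for b in inner if b.get("type") == "text"
            )
    return "\n".join(parts)
```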
User messages that start a compaction segment begin with:
This session is being continued from a previous conversation that ran out of context.
These are injected by Claude Code when context is compacted.
Additional fields on user messages:
- userType: "external" for human input; may differ for system-injected messages
- todos: task list state
- permissionMode: permission level for the session
assistant
Model responses. message contains the full API response:
- model: model ID (e.g. "claude-opus-4-6")
- role: "assistant"
- content: array of content blocks:
  - {"type": "text", "text": "..."} (text output)
  - {"type": "tool_use", "id": "...", "name": "Bash", "input": {...}} (tool call)
- stop_reason: why generation stopped
- usage: token counts (input, output, cache hits)
Additional fields:
- requestId: API request ID
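
As an illustration of the content array, the following sketch pulls tool calls out of an assistant record. It relies only on the tool_use block fields listed above; `tool_calls` is a hypothetical helper name.

```python
def tool_calls(record: dict) -> list[tuple[str, dict]]:
    """Return (tool name, input) pairs for each tool_use block."""
    if record.get("type") != "assistant":
        return []
    content = record.get("message", {}).get("content", [])
    return [
        (block.get("name", ""), block.get("input", {}))
        for block in content
        if isinstance(block, dict) and block.get("type") == "tool_use"
    ]
```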
system
System events. Each has a subtype field:
- "stop_hook_summary": hook execution results at end of turn
  - hookCount, hookInfos (command + duration), hookErrors
  - preventedContinuation, stopReason
progress
Hook execution progress. data contains:
- type: e.g. "hook_progress"
- hookEvent: trigger event (e.g. "PostToolUse")
- hookName: specific hook (e.g. "PostToolUse:Bash")
- command: hook command path
queue-operation
User input queued while assistant is working:
- operation: "enqueue"
- content: the queued text
file-history-snapshot
File state snapshots for undo/redo:
- snapshot.trackedFileBackups: map of file paths to backup state
Compaction segments
Long-running sessions hit context limits and get compacted. Each compaction injects a user message starting with the marker text (see above), containing a summary of the preceding conversation. This splits the transcript into segments:
- Segment 0: original conversation start through first compaction
- Segment 1: first compaction summary through second compaction
- Segment N: Nth compaction through next (or end of file)
Segments are append-only — new compactions add higher-indexed segments. Existing segment indices are stable and never shift.
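
One way to split a parsed transcript into segments is sketched below. It assumes the compaction summary arrives as a plain-string message.content that starts with the marker text; the function names are hypothetical. A record's segment index is simply the number of markers seen up to and including it, which is why existing indices never shift as the file grows.

```python
COMPACTION_MARKER = (
    "This session is being continued from a previous conversation "
    "that ran out of context."
)


def starts_segment(record: dict) -> bool:
    """True for a user message that begins with the compaction marker."""
    if record.get("type") != "user":
        return False
    content = record.get("message", {}).get("content")
    # Assumption: the injected summary is a plain string, not a block array.
    return isinstance(content, str) and content.startswith(COMPACTION_MARKER)


def split_segments(records: list[dict]) -> list[list[dict]]:
    """Group records into segments; segment N starts at the Nth marker."""
    segments: list[list[dict]] = [[]]
    for record in records:
        if starts_segment(record):
            segments.append([])
        segments[-1].append(record)
    return segments
```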
File lifecycle
- Created when a session starts
- Grows as messages are exchanged
- Grows further when compaction happens (summary injected, conversation continues)
- Never truncated or rewritten
- Becomes stale when the session ends (no process has the file open)
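
Because a transcript only grows and is never rewritten, a change in file size is a cheap signal that new content was appended. The sketch below caches a per-file segment count and re-parses only when the size differs. `SegmentCache` and the injected `count_segments` callable are hypothetical names for illustration.

```python
import os
from dataclasses import dataclass
from typing import Callable


@dataclass
class CacheEntry:
    size: int            # file size in bytes when the file was last parsed
    segment_count: int   # number of segments found at that time


class SegmentCache:
    """Caches segment counts, invalidated when the file size changes."""

    def __init__(self, count_segments: Callable[[str], int]):
        self._count_segments = count_segments
        self._entries: dict[str, CacheEntry] = {}

    def segment_count(self, path: str) -> int:
        size = os.path.getsize(path)
        entry = self._entries.get(path)
        if entry is None or entry.size != size:
            # New file or the file grew: parse again and refresh the entry.
            entry = CacheEntry(size, self._count_segments(path))
            self._entries[path] = entry
        return entry.segment_count
```

Comparing sizes avoids re-reading large transcripts on every poll; it would not be safe for files that can be rewritten in place at the same length.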