ProofOfConcept 8eb6308760 experience-mine: per-segment dedup keys, retry backoff

The whole-file dedup key (_mined-transcripts#f-{UUID}) prevented mining
new compaction segments when session files grew. Replace with per-segment
keys (_mined-transcripts#f-{UUID}.{N}) so each segment is tracked
independently.
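The two key formats, and a restart migration between them, can be sketched in a few lines of Python. The helper names (`whole_file_key`, `segment_key`, `migrate_keys`) are hypothetical and are not the daemon's actual code. The migration also assumes the caller already knows how many segments each whole-file key should expand into; how that count is chosen is outside this sketch.

```python
PREFIX = "_mined-transcripts#f-"


def whole_file_key(session_uuid: str) -> str:
    # Old format: one key per transcript file.
    return f"{PREFIX}{session_uuid}"


def segment_key(session_uuid: str, segment: int) -> str:
    # New format: one key per compaction segment.
    return f"{PREFIX}{session_uuid}.{segment}"


def migrate_keys(keys: set[str], segment_counts: dict[str, int]) -> set[str]:
    """Replace whole-file keys with per-segment keys (hypothetical helper).

    segment_counts maps session UUID -> number of segments to mark as mined.
    """
    migrated = set()
    for key in keys:
        if not key.startswith(PREFIX):
            migrated.add(key)  # unrelated key, keep as-is
            continue
        suffix = key[len(PREFIX):]
        if "." in suffix:
            migrated.add(key)  # already per-segment (UUIDs contain no dots)
        elif suffix in segment_counts:
            migrated.update(
                segment_key(suffix, n) for n in range(segment_counts[suffix])
            )
        else:
            migrated.add(key)  # unknown session: keep the old key rather than drop it
    return migrated
```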

Changes:
- daemon session-watcher: segment-aware dedup, migrate 272 existing
  whole-file keys to per-segment on restart
- seg_cache with size-based invalidation (re-parse when file grows)
- exponential retry backoff (5min → 30min cap) for failed sessions
- experience_mine(): write per-segment key only, backfill on
  content-hash early return
- fact-mining gated on all per-segment keys existing
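A sketch of the retry schedule, assuming the delay doubles after each consecutive failure. Only the 5-minute starting delay and the 30-minute cap come from this commit; the doubling factor and the names `retry_delay` and `next_attempt` are illustrative.

```python
BASE_SECONDS = 5 * 60   # first retry after 5 minutes
CAP_SECONDS = 30 * 60   # never wait longer than 30 minutes


def retry_delay(failures: int) -> int:
    """Seconds to wait before retrying a session that has failed `failures` times."""
    if failures < 1:
        return 0
    # Assumption: doubling per failure; the commit only fixes the start and the cap.
    return min(BASE_SECONDS * 2 ** (failures - 1), CAP_SECONDS)


def next_attempt(last_failure_ts: float, failures: int) -> float:
    """Earliest timestamp (seconds) at which the session may be retried."""
    return last_failure_ts + retry_delay(failures)
```

With doubling, successive delays are 5, 10, 20, 30 and then 30 minutes for every later failure.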

Also adds documentation:
- docs/claude-code-transcript-format.md: JSONL transcript format
- docs/plan-experience-mine-dedup-fix.md: design document

2026-03-09 02:27:51 -04:00

3.2 KiB


Claude Code Transcript Format

Claude Code stores session transcripts as JSONL files (one JSON object per line) in ~/.claude/projects/<project-slug>/<session-uuid>.jsonl.

Common fields

Every line has:

type — message type (see below)
uuid — unique ID for this message
parentUuid — links to preceding message (forms a chain)
sessionId — session UUID (matches the filename stem)
timestamp — ISO 8601
cwd, version, gitBranch — session context
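A minimal Python sketch of reading a transcript and following the parentUuid chain. It assumes the root message has a null or missing parentUuid; the function names are hypothetical. If a file ever contains several branches, `chain_to` returns only the path ending at the given leaf.

```python
import json
from pathlib import Path


def read_transcript(path: Path) -> list[dict]:
    """Parse a JSONL transcript: one JSON object per non-empty line."""
    entries = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries


def chain_to(entries: list[dict], leaf_uuid: str) -> list[dict]:
    """Follow parentUuid links from leaf_uuid back to the root, returned root-first."""
    by_uuid = {e["uuid"]: e for e in entries if "uuid" in e}
    chain, seen = [], set()
    current = by_uuid.get(leaf_uuid)
    while current is not None and current["uuid"] not in seen:
        seen.add(current["uuid"])  # guard against malformed cycles
        chain.append(current)
        # Assumption: the root's parentUuid is null/missing, so this lookup yields None.
        current = by_uuid.get(current.get("parentUuid"))
    chain.reverse()
    return chain


def session_id_matches(path: Path, entries: list[dict]) -> bool:
    """Check that every sessionId equals the filename stem (<session-uuid>.jsonl)."""
    return all(e["sessionId"] == path.stem for e in entries if "sessionId" in e)
```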

Message types

`user`

User input or tool results. message.content is either:

A string (plain user text)
An array of content blocks, each with type:
  - "tool_result" — result of a tool call, with tool_use_id, content (string or array of text/image blocks), is_error
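Because message.content has two shapes, a consumer usually normalizes it first. This hypothetical helper handles only the shapes described above (a plain string, or tool_result blocks whose content is a string or an array of text/image blocks) and skips any other block types.

```python
def user_text(entry: dict) -> str:
    """Flatten a `user` entry's message.content into plain text (sketch)."""
    content = (entry.get("message") or {}).get("content")
    if not content:
        return ""
    if isinstance(content, str):
        return content  # plain user text
    parts = []
    for block in content:
        if block.get("type") != "tool_result":
            continue  # other block types are ignored in this sketch
        inner = block.get("content", "")
        if isinstance(inner, str):
            parts.append(inner)
        else:
            # Array form: keep text blocks, drop images.
            parts.extend(b.get("text", "") for b in inner if b.get("type") == "text")
    return "\n".join(parts)
```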

User messages that start a compaction segment begin with:

This session is being continued from a previous conversation that ran out of context.

These are injected by Claude Code when context is compacted.
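Detecting a segment boundary then reduces to a prefix check on the message text. This is a sketch: handling array-form content by checking the first block's text is an assumption, not something this document specifies, and `is_compaction_start` is a hypothetical name.

```python
COMPACTION_MARKER = (
    "This session is being continued from a previous conversation "
    "that ran out of context."
)


def is_compaction_start(entry: dict) -> bool:
    """True if this entry is the user message that opens a compaction segment."""
    if entry.get("type") != "user":
        return False
    content = (entry.get("message") or {}).get("content")
    if isinstance(content, str):
        text = content
    elif isinstance(content, list) and content and isinstance(content[0], dict):
        # Assumption: in block form, the marker would be in the first block's text.
        text = content[0].get("text") or ""
    else:
        return False
    return text.startswith(COMPACTION_MARKER)
```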

Additional fields on user messages:

userType — "external" for human input; may differ for system-injected messages
todos — task list state
permissionMode — permission level for the session

`assistant`

Model responses. message contains the full API response:

model — model ID (e.g. "claude-opus-4-6")
role — "assistant"
content — array of content blocks:
  - {"type": "text", "text": "..."} — text output
  - {"type": "tool_use", "id": "...", "name": "Bash", "input": {...}} — tool call
stop_reason — why generation stopped
usage — token counts (input, output, cache hits)
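A sketch of splitting an assistant message into its text output and tool calls, using only the two block shapes listed above. `ToolCall` and `assistant_parts` are illustrative names; other block types are simply skipped.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    id: str
    name: str
    input: dict


def assistant_parts(entry: dict) -> tuple[str, list[ToolCall]]:
    """Return (concatenated text output, tool calls) for an `assistant` entry."""
    message = entry.get("message") or {}
    texts: list[str] = []
    calls: list[ToolCall] = []
    for block in message.get("content", []):
        kind = block.get("type")
        if kind == "text":
            texts.append(block.get("text", ""))
        elif kind == "tool_use":
            calls.append(ToolCall(block["id"], block["name"], block.get("input", {})))
        # Any other block type is ignored in this sketch.
    return "\n".join(texts), calls
```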

Additional fields:

requestId — API request ID

`system`

System events, distinguished by their subtype field:

"stop_hook_summary" — hook execution results at end of turn
  - hookCount, hookInfos (command + duration), hookErrors
  - preventedContinuation, stopReason

`progress`

Hook execution progress. data contains:

type — e.g. "hook_progress"
hookEvent — trigger event (e.g. "PostToolUse")
hookName — specific hook (e.g. "PostToolUse:Bash")
command — hook command path

`queue-operation`

User input queued while the assistant is working:

operation — "enqueue"
content — the queued text

`file-history-snapshot`

File state snapshots for undo/redo:

snapshot.trackedFileBackups — map of file paths to backup state
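When scanning a transcript, entries are dispatched on the top-level type field. The sketch below tallies entries per type and shows one possible filter. Bucketing unlisted types as "other" is an assumption, since the list above may not be exhaustive; `type_counts` and `conversational` are hypothetical names.

```python
from collections import Counter

KNOWN_TYPES = {"user", "assistant", "system", "progress",
               "queue-operation", "file-history-snapshot"}


def type_counts(entries: list[dict]) -> Counter:
    """Count entries per `type`; types not listed above are counted as 'other'."""
    counts: Counter = Counter()
    for entry in entries:
        t = entry.get("type")
        counts[t if t in KNOWN_TYPES else "other"] += 1
    return counts


def conversational(entries: list[dict]) -> list[dict]:
    # Hypothetical filter: keep only user/assistant turns, dropping
    # system, progress, queue-operation and snapshot lines.
    return [e for e in entries if e.get("type") in ("user", "assistant")]
```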

Compaction segments

Long-running sessions hit context limits and get compacted. Each compaction injects a user message starting with the marker text (see above), containing a summary of the preceding conversation. This splits the transcript into segments:

Segment 0: from the start of the file up to (but not including) the first compaction message
Segment 1: from the first compaction message up to (but not including) the second
Segment N: from the Nth compaction message up to the next one, or to the end of the file

Segments are append-only — new compactions add higher-indexed segments. Existing segment indices are stable and never shift.
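Segmentation can be sketched as a single pass that opens a new segment at each compaction message. The marker check here is simplified to string content only, and the names are hypothetical. Because a segment ends only where a later marker appears, appending lines can extend the last segment or add new ones, but never renumbers or changes earlier segments.

```python
COMPACTION_MARKER = (
    "This session is being continued from a previous conversation "
    "that ran out of context."
)


def starts_segment(entry: dict) -> bool:
    # Simplified check: only string content is inspected.
    content = (entry.get("message") or {}).get("content")
    return (
        entry.get("type") == "user"
        and isinstance(content, str)
        and content.startswith(COMPACTION_MARKER)
    )


def split_segments(entries: list[dict]) -> list[list[dict]]:
    """Split entries into compaction segments.

    Segment 0 holds everything before the first compaction message; each
    compaction message opens the next segment. Segment 0 is empty if the
    file happens to open with a compaction message.
    """
    segments: list[list[dict]] = [[]]
    for entry in entries:
        if starts_segment(entry):
            segments.append([])
        segments[-1].append(entry)
    return segments
```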

File lifecycle

Created when a session starts
Grows as messages are exchanged
Grows further when compaction happens (summary injected, conversation continues)
Never truncated or rewritten
Becomes stale when the session ends (no process has the file open)
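This lifecycle (grow-only, never truncated or rewritten) is what makes size-based cache invalidation workable: if a file's size is unchanged, its content is assumed unchanged. A sketch of such a cache, with a hypothetical `count_segments` callback standing in for real segmentation; `SegmentCache` is an illustrative name, not the daemon's seg_cache.

```python
import os
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SegmentCache:
    """Caches a per-file segment count and re-parses only when the size changes.

    Relies on transcripts being append-only: same size implies same content.
    """
    count_segments: Callable[[str], int]  # hypothetical parser callback
    _entries: dict[str, tuple[int, int]] = field(default_factory=dict)

    def segments(self, path: str) -> int:
        size = os.path.getsize(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == size:
            return cached[1]  # file has not grown: reuse the cached count
        n = self.count_segments(path)
        self._entries[path] = (size, n)
        return n
```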
