# Fix: experience-mine dedup and retry handling

## Problem

1. **Whole-file dedup key prevents mining new segments.** When a session is mined, `experience_mine()` writes `_mined-transcripts#f-{UUID}` (a whole-file key). If the session later grows (compaction adds segments), the daemon sees the whole-file key and skips the session forever. New segments never get mined.
2. **No retry backoff.** When the `claude` CLI fails (exit status 1), the session-watcher re-queues the same session on every 60s tick. This produces a wall of failures in the log and wastes resources.

## Design

### Dedup keys: per-segment only

Going forward, dedup keys are per-segment: `_mined-transcripts#f-{UUID}.{N}`, where `N` is the segment index. No more whole-file keys. Segment indices are stable: compaction appends new segments and never reorders existing ones. See `docs/claude-code-transcript-format.md`.

### Migration of existing whole-file keys

~276 sessions have whole-file keys (`_mined-transcripts#f-{UUID}` with no segment suffix) and no per-segment keys. These were mined correctly at the time. When the session-watcher encounters a whole-file key:

- Count the current segments in the file.
- Write per-segment keys for all current segments (they were covered by the old whole-file key).
- If the file has grown since (new segments beyond the migrated set), those segments won't have per-segment keys and will be mined normally.

This is a one-time migration per file. After migration, the whole-file key is harmless dead weight; nothing creates new ones.

### Retry backoff

The session-watcher tracks failed sessions in a local `HashMap` mapping path to `(next_retry_after, current_backoff)`.

- Initial backoff: 5 minutes
- Each failure: double the backoff
- Cap: 30 minutes
- Resets on daemon restart (the map is in-memory only, not persisted)

## Changes

### `poc-memory/src/agents/enrich.rs`

`experience_mine()`: stop writing the bare filename key for unsegmented calls.
Only write the content-hash key (for the legacy dedup check at the top of the function) and per-segment keys. **Already done**: edited earlier in this session.

### `poc-memory/src/agents/daemon.rs`

Session-watcher changes:

1. **Remove the whole-file fast path.** Delete the `is_transcript_mined_with_keys` check that short-circuits before segment counting.
2. **Always go through the segment-aware path.** Every stale session gets segment counting (cached) and per-segment key checks.
3. **Migrate whole-file keys.** When a whole-file key exists but no per-segment keys do: write per-segment keys for all current segments into the store. One-time cost per file, batched into a single store load/save per tick.
4. **`seg_cache` with size invalidation.** Change the cached value from a bare segment count to `(file_size, seg_count)`. When stat shows a different size, evict and re-parse.
5. **Remove `mark_transcript_done`.** Stop writing whole-file keys for fully-mined multi-segment files.
6. **Add retry backoff.** An in-memory `HashMap` tracks failed sessions. Skip sessions whose backoff hasn't expired. On failure (the task finishes with an error), update the backoff: exponential from 5 minutes, capped at 30 minutes.
7. **Fact-mining check.** Fact-mining is currently gated behind `experience_done` (the whole-file key). After removing the whole-file fast path, fact-mining should be gated on "all segments mined", i.e. all per-segment keys exist for the current segment count.
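The backoff tracking from change 6 can be sketched as follows. This is a minimal sketch, not the daemon's actual code: `RetryTracker` and its method names are hypothetical, and the constants come from the design above.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

// Constants from the design: start at 5 minutes, double per failure, cap at 30.
const INITIAL_BACKOFF: Duration = Duration::from_secs(5 * 60);
const MAX_BACKOFF: Duration = Duration::from_secs(30 * 60);

/// Hypothetical in-memory tracker: session path -> (next_retry_after, current_backoff).
/// Not persisted, so it resets on daemon restart.
#[derive(Default)]
struct RetryTracker {
    failed: HashMap<PathBuf, (Instant, Duration)>,
}

impl RetryTracker {
    /// Should this session be skipped on the current tick?
    fn should_skip(&self, path: &Path, now: Instant) -> bool {
        self.failed
            .get(path)
            .map(|(next_retry, _)| now < *next_retry)
            .unwrap_or(false)
    }

    /// Record a failure: first failure gets the initial backoff,
    /// each subsequent failure doubles it, capped at MAX_BACKOFF.
    fn record_failure(&mut self, path: PathBuf, now: Instant) {
        let backoff = match self.failed.get(&path) {
            Some((_, prev)) => (*prev * 2).min(MAX_BACKOFF),
            None => INITIAL_BACKOFF,
        };
        self.failed.insert(path, (now + backoff, backoff));
    }

    /// Clear the entry on success so future failures restart from the initial backoff.
    fn record_success(&mut self, path: &Path) {
        self.failed.remove(path);
    }
}
```

Keeping the map keyed by path (rather than session UUID) matches the session-watcher's unit of work; a successful mine clears the entry so a later regression starts from 5 minutes again.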
### Manual cleanup after deploy

Delete the dedup keys for sessions that failed repeatedly (like `8cebfc0a-bd33-49f1-85a4-1489bdf7050c`) so they get re-processed:

```
poc-memory delete-node '_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c'
# also any content-hash key for the same file
```

## Verification

After deploying:

- `tail -f ~/.claude/memory/daemon.log | grep session-watcher` should show ticks with migration activity, then settle to idle.
- Failed sessions should show increasing backoff intervals, not per-second retries.
- After fixing the `claude` CLI issue, backed-off sessions should retry and succeed on the next daemon restart.
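For reference, the per-segment key scheme, the whole-file-key migration (change 3), and the "all segments mined" gate (change 7) can be sketched together. Function names are hypothetical, and segment indices are assumed to be 0-based; only the key format itself comes from the design above.

```rust
/// Per-segment dedup key for segment `n` of session `uuid`
/// (format from the design: `_mined-transcripts#f-{UUID}.{N}`).
fn segment_key(uuid: &str, n: usize) -> String {
    format!("_mined-transcripts#f-{uuid}.{n}")
}

/// Migration step: a whole-file key covered every segment that existed
/// when it was written, so mint per-segment keys for all current segments.
fn migrate_whole_file_key(uuid: &str, seg_count: usize) -> Vec<String> {
    (0..seg_count).map(|n| segment_key(uuid, n)).collect()
}

/// Fact-mining gate: all per-segment keys exist for the current segment
/// count. `key_exists` stands in for a lookup against the store.
fn all_segments_mined(uuid: &str, seg_count: usize, key_exists: impl Fn(&str) -> bool) -> bool {
    (0..seg_count).all(|n| key_exists(&segment_key(uuid, n)))
}
```

If the file has grown since migration, `all_segments_mined` returns false for the new count, so the trailing segments get mined normally and fact-mining waits until they are done.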