consciousness/docs/plan-experience-mine-dedup-fix.md


# Fix: experience-mine dedup and retry handling
## Problem
1. **Whole-file dedup key prevents mining new segments.** When a session
is mined, `experience_mine()` writes `_mined-transcripts#f-{UUID}` (a
whole-file key). If the session later grows (compaction adds segments),
the daemon sees the whole-file key and skips it forever. New segments
never get mined.
2. **No retry backoff.** When `claude` CLI fails (exit status 1), the
session-watcher re-queues the same session every 60s tick. This
produces a wall of failures in the log and wastes resources.
## Design
### Dedup keys: per-segment only
Going forward, dedup keys are per-segment: `_mined-transcripts#f-{UUID}.{N}`
where N is the segment index. No more whole-file keys.
Segment indices are stable — compaction appends new segments, never
reorders existing ones. See `docs/claude-code-transcript-format.md`.
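The key scheme above can be sketched as follows. This is a hypothetical helper for illustration; the actual format string lives in `experience_mine()`:

```rust
// Illustrative sketch of the per-segment dedup key format described above.
// `segment_key` is a hypothetical name, not the daemon's actual API.
fn segment_key(session_uuid: &str, segment_index: usize) -> String {
    format!("_mined-transcripts#f-{}.{}", session_uuid, segment_index)
}

fn main() {
    // Segment indices are stable, so key N always refers to the same segment
    // even after compaction appends more segments.
    let key = segment_key("8cebfc0a-bd33-49f1-85a4-1489bdf7050c", 3);
    assert!(key.ends_with(".3"));
}
```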
### Migration of existing whole-file keys
~276 sessions have whole-file keys (`_mined-transcripts#f-{UUID}` with
no segment suffix) and no per-segment keys. These were mined correctly
at the time.
When the session-watcher encounters a whole-file key:
- Count current segments in the file
- Write per-segment keys for all current segments (they were covered
by the old whole-file key)
- If the file has grown since (new segments beyond the migrated set),
those won't have per-segment keys and will be mined normally
This is a one-time migration per file. After migration, the whole-file
key is harmless dead weight — nothing creates new ones.
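The migration steps above can be sketched like this, modeling the dedup store as a plain key set. The function and variable names (`keys`, `migrate_whole_file_key`, `segment_key`) are illustrative assumptions, not the daemon's real API:

```rust
use std::collections::HashSet;

// Hypothetical key helper mirroring the per-segment format described above.
fn segment_key(uuid: &str, n: usize) -> String {
    format!("_mined-transcripts#f-{}.{}", uuid, n)
}

// One-time migration sketch: expand a whole-file key into per-segment keys
// for all segments present at migration time.
fn migrate_whole_file_key(keys: &mut HashSet<String>, uuid: &str, current_segments: usize) {
    let whole_file = format!("_mined-transcripts#f-{}", uuid);
    let has_segment_keys =
        (0..current_segments).any(|n| keys.contains(&segment_key(uuid, n)));
    if keys.contains(&whole_file) && !has_segment_keys {
        // Segments present now were covered by the old whole-file key;
        // segments appended later will lack keys and get mined normally.
        for n in 0..current_segments {
            keys.insert(segment_key(uuid, n));
        }
        // The whole-file key is left behind as harmless dead weight.
    }
}

fn main() {
    let mut keys: HashSet<String> = HashSet::new();
    keys.insert("_mined-transcripts#f-abc".to_string());
    migrate_whole_file_key(&mut keys, "abc", 2);
    assert!(keys.contains("_mined-transcripts#f-abc.0"));
    assert!(keys.contains("_mined-transcripts#f-abc.1"));
}
```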
### Retry backoff
The session-watcher tracks failed sessions in a local
`HashMap<String, (Instant, Duration)>` mapping path to
(next_retry_after, current_backoff).
- Initial backoff: 5 minutes
- Each failure: double the backoff
- Cap: 30 minutes
- Resets on daemon restart (map is thread-local, not persisted)
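A minimal sketch of that backoff map, assuming the watcher calls `should_skip` each tick and `record_failure` when the `claude` CLI exits non-zero. The struct and method names are illustrative, not the daemon's actual code:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const INITIAL_BACKOFF: Duration = Duration::from_secs(5 * 60); // 5 minutes
const MAX_BACKOFF: Duration = Duration::from_secs(30 * 60);    // 30-minute cap

// In-memory only: dropped on daemon restart, matching the design above.
struct RetryState {
    // path -> (next_retry_after, current_backoff)
    failed: HashMap<String, (Instant, Duration)>,
}

impl RetryState {
    fn new() -> Self {
        Self { failed: HashMap::new() }
    }

    // Skip a session whose backoff window has not yet expired.
    fn should_skip(&self, path: &str, now: Instant) -> bool {
        matches!(self.failed.get(path), Some((next, _)) if now < *next)
    }

    // Double the backoff on each failure, capped at MAX_BACKOFF.
    fn record_failure(&mut self, path: &str, now: Instant) {
        let backoff = self
            .failed
            .get(path)
            .map(|(_, b)| (*b * 2).min(MAX_BACKOFF))
            .unwrap_or(INITIAL_BACKOFF);
        self.failed.insert(path.to_string(), (now + backoff, backoff));
    }

    // A successful mine clears the entry so the session is eligible again.
    fn record_success(&mut self, path: &str) {
        self.failed.remove(path);
    }
}

fn main() {
    let mut state = RetryState::new();
    let t0 = Instant::now();
    state.record_failure("session.jsonl", t0);
    // First failure: skipped for the next 5 minutes.
    assert!(state.should_skip("session.jsonl", t0 + Duration::from_secs(60)));
    state.record_failure("session.jsonl", t0);
    // Second failure doubles the backoff to 10 minutes.
    assert_eq!(state.failed["session.jsonl"].1, Duration::from_secs(600));
    state.record_success("session.jsonl");
    assert!(!state.should_skip("session.jsonl", t0));
}
```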
## Changes
### `poc-memory/src/agents/enrich.rs`
`experience_mine()`: stop writing the bare filename key for unsegmented
calls. Only write the content-hash key (for the legacy dedup check at
the top of the function) and per-segment keys.
**Already done** — edited earlier in this session.
### `poc-memory/src/agents/daemon.rs`
Session-watcher changes:
1. **Remove whole-file fast path.** Delete the `is_transcript_mined_with_keys`
check that short-circuits before segment counting.
2. **Always go through segment-aware path.** Every stale session gets
segment counting (cached) and per-segment key checks.
3. **Migrate whole-file keys.** When we find a whole-file key exists but
no per-segment keys: write per-segment keys for all current segments
into the store. One-time cost per file, batched into a single
store load/save per tick.
4. **seg_cache with size invalidation.** Change from `HashMap<String, usize>`
   to `HashMap<String, (u64, usize)>` storing `(file_size, seg_count)`. When
   a stat shows a different file size, evict the entry and re-parse.
5. **Remove `mark_transcript_done`.** Stop writing whole-file keys for
fully-mined multi-segment files.
6. **Add retry backoff.** `HashMap<String, (Instant, Duration)>` for
tracking failed sessions. Skip sessions whose backoff hasn't expired.
On failure (task finishes with error), update the backoff. Exponential
from 5min, cap at 30min.
7. **Fact-mining check.** Currently fact-mining is gated behind
`experience_done` (the whole-file key). After removing the whole-file
fast path, fact-mining should be gated on "all segments mined" —
i.e., all per-segment keys exist for the current segment count.
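The new fact-mining gate from item 7 can be sketched as a predicate over the dedup key set. `segment_key` and `all_segments_mined` are illustrative names assumed for this sketch:

```rust
use std::collections::HashSet;

// Hypothetical helper mirroring the per-segment key format.
fn segment_key(uuid: &str, n: usize) -> String {
    format!("_mined-transcripts#f-{}.{}", uuid, n)
}

// Fact-mining is gated on every current segment having a per-segment key,
// replacing the old whole-file `experience_done` check.
fn all_segments_mined(keys: &HashSet<String>, uuid: &str, current_segments: usize) -> bool {
    (0..current_segments).all(|n| keys.contains(&segment_key(uuid, n)))
}

fn main() {
    let mut keys = HashSet::new();
    keys.insert(segment_key("abc", 0));
    // One of two segments mined: not yet eligible for fact-mining.
    assert!(!all_segments_mined(&keys, "abc", 2));
    keys.insert(segment_key("abc", 1));
    assert!(all_segments_mined(&keys, "abc", 2));
}
```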
### Manual cleanup after deploy
Delete the dedup keys for sessions that failed repeatedly (like
`8cebfc0a-bd33-49f1-85a4-1489bdf7050c`) so they get re-processed:
```
poc-memory delete-node '_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c'
# also any content-hash key for the same file
```
## Verification
After deploying:
- `tail -f ~/.consciousness/memory/daemon.log | grep session-watcher` should
show ticks with migration activity, then settle to idle
- Failed sessions should show increasing backoff intervals, not
per-second retries
- After fixing the `claude` CLI issue, backed-off sessions should
retry and succeed on the next daemon restart