experience-mine: split oversized sessions at compaction boundaries

Claude Code doesn't create new session files on context compaction —
a single UUID can accumulate 170+ conversations, producing 400MB+
JSONL files that generate 1.3M token prompts.

Split at compaction markers ("This session is being continued..."):
- extract_conversation made pub, split_on_compaction splits messages
- experience_mine takes optional segment index
- daemon watcher parses files, spawns per-segment jobs (.0, .1, .2)
- seg_cache memoizes segment counts across ticks
- per-segment dedup keys; whole-file key when all segments complete
- 150K token guard skips any remaining oversized segments
- char-boundary-safe truncation in enrich.rs and fact_mine.rs
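The splitting step above can be sketched as follows. This is a hypothetical simplification: the real `split_on_compaction` operates on the crate's parsed message type, which is assumed here to be plain strings, and the marker constant name is illustrative.

```rust
// Assumption: compaction markers begin with this phrase (per the
// commit message); the exact full text may differ.
const COMPACTION_MARKER: &str = "This session is being continued";

// Split a session's messages into segments, starting a new segment
// at each compaction marker. A marker at the very start does not
// produce a leading empty segment.
fn split_on_compaction(messages: &[String]) -> Vec<Vec<String>> {
    let mut segments: Vec<Vec<String>> = vec![Vec::new()];
    for msg in messages {
        if msg.contains(COMPACTION_MARKER) && !segments.last().unwrap().is_empty() {
            segments.push(Vec::new());
        }
        segments.last_mut().unwrap().push(msg.clone());
    }
    segments
}

fn main() {
    let msgs: Vec<String> = [
        "first conversation",
        "This session is being continued from a previous conversation...",
        "second conversation",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    let segs = split_on_compaction(&msgs);
    println!("{} segments", segs.len());
}
```

Each segment then becomes an independent mining job (`.0`, `.1`, `.2`, ...), so one oversized session file never produces a single oversized prompt.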

Backwards compatible: unsegmented calls still write content-hash
dedup keys, and old whole-file mined keys are still recognized.
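The dedup-key scheme implied above can be sketched as a small helper. The function name and `.N` suffix format are assumptions for illustration; the commit only states that segmented jobs get per-segment keys while unsegmented calls keep the plain content-hash key that older versions wrote.

```rust
// Hypothetical sketch: a per-segment dedup key appends the segment
// index to the file's content hash; with no segment index, the key
// is the bare hash, matching keys written before this change.
fn dedup_key(content_hash: &str, segment: Option<usize>) -> String {
    match segment {
        Some(i) => format!("{}.{}", content_hash, i),
        None => content_hash.to_string(),
    }
}

fn main() {
    println!("{}", dedup_key("abc123", Some(2))); // segmented job
    println!("{}", dedup_key("abc123", None));    // legacy whole-file job
}
```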
ProofOfConcept 2026-03-07 12:01:38 -05:00
parent 22a9fdabdb
commit 45335de220
4 changed files with 155 additions and 39 deletions


@@ -234,7 +234,7 @@ pub fn mine_transcript(path: &Path, dry_run: bool) -> Result<Vec<Fact>, String>
     if dry_run {
         for (i, (offset, chunk)) in chunks.iter().enumerate() {
             eprintln!("\n--- Chunk {} (offset {}, {} chars) ---", i + 1, offset, chunk.len());
-            let preview = if chunk.len() > 500 { &chunk[..500] } else { chunk };
+            let preview = if chunk.len() > 500 { &chunk[..chunk.floor_char_boundary(500)] } else { chunk };
             eprintln!("{}", preview);
             if chunk.len() > 500 {
                 eprintln!(" ... ({} more chars)", chunk.len() - 500);
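Note that `str::floor_char_boundary` is an unstable std API at the time of writing (nightly feature `round_char_boundary`), so the diff above presumably builds on nightly or a future stable. On stable Rust, the same behavior can be sketched with `is_char_boundary`:

```rust
// Stable-Rust equivalent of str::floor_char_boundary: walk the index
// backwards until it lands on a UTF-8 char boundary, so slicing at
// the result never panics mid-character.
fn floor_char_boundary(s: &str, mut index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    while !s.is_char_boundary(index) {
        index -= 1;
    }
    index
}

fn main() {
    let s = "héllo"; // 'é' occupies bytes 1..3, so byte index 2 is mid-char
    let cut = floor_char_boundary(s, 2);
    println!("{}", &s[..cut]); // safe slice: "h"
}
```

The loop always terminates because byte 0 of a `str` is a char boundary by definition.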