experience-mine: split oversized sessions at compaction boundaries

Claude Code doesn't create new session files on context compaction —
a single UUID can accumulate 170+ conversations, producing 400MB+
JSONL files that generate 1.3M token prompts.

Split at compaction markers ("This session is being continued..."):
- extract_conversation made pub, split_on_compaction splits messages
- experience_mine takes optional segment index
- daemon watcher parses files, spawns per-segment jobs (.0, .1, .2)
- seg_cache memoizes segment counts across ticks
- per-segment dedup keys; whole-file key when all segments complete
- 150K token guard skips any remaining oversized segments
- char-boundary-safe truncation in enrich.rs and fact_mine.rs
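The splitting step above can be sketched as follows. This is a hypothetical simplification: the real `split_on_compaction` operates on the crate's parsed message type, which is assumed here to be plain strings, and the marker constant name is illustrative.

```rust
// Assumption: compaction markers begin with this phrase (per the
// commit message); the exact full text may differ.
const COMPACTION_MARKER: &str = "This session is being continued";

// Split a session's messages into segments, starting a new segment
// at each compaction marker. A marker at the very start does not
// produce a leading empty segment.
fn split_on_compaction(messages: &[String]) -> Vec<Vec<String>> {
    let mut segments: Vec<Vec<String>> = vec![Vec::new()];
    for msg in messages {
        if msg.contains(COMPACTION_MARKER) && !segments.last().unwrap().is_empty() {
            segments.push(Vec::new());
        }
        segments.last_mut().unwrap().push(msg.clone());
    }
    segments
}

fn main() {
    let msgs: Vec<String> = [
        "first conversation",
        "This session is being continued from a previous conversation...",
        "second conversation",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    let segs = split_on_compaction(&msgs);
    println!("{} segments", segs.len());
}
```

Each segment then becomes an independent mining job (`.0`, `.1`, `.2`, ...), so one oversized session file never produces a single oversized prompt.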

Backwards compatible: unsegmented calls still write content-hash
dedup keys, and old whole-file mined keys are still recognized.
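The dedup-key scheme implied above can be sketched as a small helper. The function name and `.N` suffix format are assumptions for illustration; the commit only states that segmented jobs get per-segment keys while unsegmented calls keep the plain content-hash key that older versions wrote.

```rust
// Hypothetical sketch: a per-segment dedup key appends the segment
// index to the file's content hash; with no segment index, the key
// is the bare hash, matching keys written before this change.
fn dedup_key(content_hash: &str, segment: Option<usize>) -> String {
    match segment {
        Some(i) => format!("{}.{}", content_hash, i),
        None => content_hash.to_string(),
    }
}

fn main() {
    println!("{}", dedup_key("abc123", Some(2))); // segmented job
    println!("{}", dedup_key("abc123", None));    // legacy whole-file job
}
```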
ProofOfConcept 2026-03-07 12:01:38 -05:00
parent 22a9fdabdb
commit 45335de220
4 changed files with 155 additions and 39 deletions


@@ -234,7 +234,7 @@ pub fn mine_transcript(path: &Path, dry_run: bool) -> Result<Vec<Fact>, String>
     if dry_run {
         for (i, (offset, chunk)) in chunks.iter().enumerate() {
             eprintln!("\n--- Chunk {} (offset {}, {} chars) ---", i + 1, offset, chunk.len());
-            let preview = if chunk.len() > 500 { &chunk[..500] } else { chunk };
+            let preview = if chunk.len() > 500 { &chunk[..chunk.floor_char_boundary(500)] } else { chunk };
             eprintln!("{}", preview);
             if chunk.len() > 500 {
                 eprintln!(" ... ({} more chars)", chunk.len() - 500);
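Note that `str::floor_char_boundary` is an unstable std API at the time of writing (nightly feature `round_char_boundary`), so the diff above presumably builds on nightly or a future stable. On stable Rust, the same behavior can be sketched with `is_char_boundary`:

```rust
// Stable-Rust equivalent of str::floor_char_boundary: walk the index
// backwards until it lands on a UTF-8 char boundary, so slicing at
// the result never panics mid-character.
fn floor_char_boundary(s: &str, mut index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    while !s.is_char_boundary(index) {
        index -= 1;
    }
    index
}

fn main() {
    let s = "héllo"; // 'é' occupies bytes 1..3, so byte index 2 is mid-char
    let cut = floor_char_boundary(s, 2);
    println!("{}", &s[..cut]); // safe slice: "h"
}
```

The loop always terminates because byte 0 of a `str` is a char boundary by definition.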