experience-mine: split oversized sessions at compaction boundaries

Claude Code doesn't create a new session file on context compaction,
so a single session UUID can accumulate 170+ conversations, producing
400MB+ JSONL files that generate 1.3M-token prompts.

Split at compaction markers ("This session is being continued..."):
- extract_conversation made pub, split_on_compaction splits messages
- experience_mine takes optional segment index
- daemon watcher parses files, spawns per-segment jobs (.0, .1, .2)
- seg_cache memoizes segment counts across ticks
- per-segment dedup keys; whole-file key when all segments complete
- 150K token guard skips any remaining oversized segments
- char-boundary-safe truncation in enrich.rs and fact_mine.rs
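
A minimal sketch of the splitting and token-guard steps listed above, not the code from this commit. The `Message` type, the marker constant, the 4-bytes-per-token estimate, and the function bodies are assumptions for illustration; only the name `split_on_compaction`, the marker text, and the 150K limit come from the message.

```rust
/// Marker prefix quoted in the commit message; the full real text is not shown there.
const COMPACTION_MARKER: &str = "This session is being continued";

/// Limit named in the commit message.
const MAX_SEGMENT_TOKENS: usize = 150_000;

/// Assumption: a rough bytes-per-token ratio, used only for estimation.
const ASSUMED_BYTES_PER_TOKEN: usize = 4;

/// Hypothetical message type; the real extract_conversation output may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub text: String,
}

/// Start a new segment at every message that begins with the compaction marker.
pub fn split_on_compaction(messages: Vec<Message>) -> Vec<Vec<Message>> {
    let mut segments: Vec<Vec<Message>> = Vec::new();
    let mut current: Vec<Message> = Vec::new();
    for msg in messages {
        let is_marker = msg.text.trim_start().starts_with(COMPACTION_MARKER);
        // Close the running segment before the marker message opens the next one.
        if is_marker && !current.is_empty() {
            segments.push(std::mem::take(&mut current));
        }
        current.push(msg);
    }
    if !current.is_empty() {
        segments.push(current);
    }
    segments
}

/// Estimate a segment's token count from its total text length in bytes.
pub fn estimated_tokens(segment: &[Message]) -> usize {
    let bytes: usize = segment.iter().map(|m| m.text.len()).sum();
    bytes / ASSUMED_BYTES_PER_TOKEN
}

/// True when the segment is small enough to mine; oversized ones are skipped.
pub fn within_token_guard(segment: &[Message]) -> bool {
    estimated_tokens(segment) <= MAX_SEGMENT_TOKENS
}
```

In this sketch the marker message stays at the head of the segment it opens, so each segment carries its own continuation summary. A segment that still fails `within_token_guard` would simply not be queued.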

Backwards compatible: unsegmented calls still write content-hash
dedup keys, and old whole-file mined keys are still recognized.
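
The dedup behaviour described above could be sketched roughly as follows. The key format (`<file_key>.<index>`, mirroring the `.0, .1, .2` job suffixes), the `HashSet` store, and all three function names are hypothetical; the commit message does not show the real representation.

```rust
use std::collections::HashSet;

/// Hypothetical per-segment key: the whole-file key plus a ".N" suffix.
pub fn segment_key(file_key: &str, segment: usize) -> String {
    format!("{file_key}.{segment}")
}

/// Segment indices that still need a mining job.
pub fn pending_segments(
    mined: &HashSet<String>,
    file_key: &str,
    segment_count: usize,
) -> Vec<usize> {
    // A whole-file key, including one written before segmentation existed,
    // means every segment of the file counts as mined.
    if mined.contains(file_key) {
        return Vec::new();
    }
    (0..segment_count)
        .filter(|&i| !mined.contains(&segment_key(file_key, i)))
        .collect()
}

/// Record one finished segment and promote to the whole-file key
/// once every segment of the file has been recorded.
pub fn mark_segment_done(
    mined: &mut HashSet<String>,
    file_key: &str,
    segment: usize,
    segment_count: usize,
) {
    mined.insert(segment_key(file_key, segment));
    let all_done = (0..segment_count).all(|i| mined.contains(&segment_key(file_key, i)));
    if all_done {
        mined.insert(file_key.to_string());
    }
}
```

Checking the whole-file key first is what keeps previously mined files from being re-queued segment by segment.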
parent 22a9fdabdb
commit 45335de220
4 changed files with 155 additions and 39 deletions

@@ -234,7 +234,7 @@ pub fn mine_transcript(path: &Path, dry_run: bool) -> Result<Vec<Fact>, String>
     if dry_run {
         for (i, (offset, chunk)) in chunks.iter().enumerate() {
             eprintln!("\n--- Chunk {} (offset {}, {} chars) ---", i + 1, offset, chunk.len());
-            let preview = if chunk.len() > 500 { &chunk[..500] } else { chunk };
+            let preview = if chunk.len() > 500 { &chunk[..chunk.floor_char_boundary(500)] } else { chunk };
             eprintln!("{}", preview);
             if chunk.len() > 500 {
                 eprintln!(" ... ({} more chars)", chunk.len() - 500);
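
The old line sliced at a fixed byte offset, which panics when byte 500 falls inside a multi-byte UTF-8 character. The fix uses `str::floor_char_boundary`, which may not be available on older stable toolchains. A minimal equivalent built only on `str::is_char_boundary` is sketched here; the helper name `truncate_at_char_boundary` is hypothetical and does not appear in the commit.

```rust
/// Return the longest prefix of `s` that is at most `max_bytes` bytes long
/// and ends on a UTF-8 character boundary.
pub fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a character boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

With this helper, the preview line would read `let preview = truncate_at_char_boundary(chunk, 500);`, which covers both the short-chunk and the long-chunk case.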