Running the miner twice on the same transcript produced near-duplicate entries because:

1. Prompt-based dedup (passing recent entries to Sonnet) doesn't catch semantic duplicates written in a different emotional register.
2. Key-based dedup (timestamp + content slug) fails because Sonnet assigns different timestamps and wording each run.

Fix: hash the transcript file content before mining. Store the hash as a _mined-transcripts node. Skip if already present.

Limitation: doesn't catch overlapping content when a live transcript grows between runs (content hash changes). This is fine, because the miner is intended for archived conversations, not live ones.

Tested: second run on same transcript correctly skipped with "Already mined this transcript" message.
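The hash-then-skip check described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `content_hash`, `MinedTranscripts`, `MineOutcome`, and `mine_once` are hypothetical names, the `_mined-transcripts` node is modeled as an in-memory set, and FNV-1a is used only as a dependency-free stand-in for whatever content hash the miner really uses (a cryptographic hash such as SHA-256 would be the more usual choice).

```rust
use std::collections::HashSet;

/// FNV-1a (64-bit) over the raw transcript bytes.
/// Assumption: a stand-in for the real content hash; chosen here only
/// because it is deterministic and needs no external crates.
fn content_hash(bytes: &[u8]) -> String {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    format!("{:016x}", hash)
}

/// Hypothetical in-memory model of the `_mined-transcripts` node:
/// the set of content hashes that have already been mined.
#[derive(Default)]
struct MinedTranscripts {
    hashes: HashSet<String>,
}

#[derive(Debug, PartialEq)]
enum MineOutcome {
    Mined,
    Skipped,
}

/// Hash the transcript before mining and skip it if the hash is known.
fn mine_once(store: &mut MinedTranscripts, transcript: &[u8]) -> MineOutcome {
    let hash = content_hash(transcript);
    if store.hashes.contains(&hash) {
        println!("Already mined this transcript");
        return MineOutcome::Skipped;
    }
    // The real mining step (the model call) would run here.
    // Assumption: the hash is recorded only after mining completes,
    // so an interrupted run can be retried.
    store.hashes.insert(hash);
    MineOutcome::Mined
}
```

Calling `mine_once` twice with identical bytes returns `Mined` and then `Skipped`. Calling it again after the transcript has grown returns `Mined`, because the content hash changes; that is the limitation noted above.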
| Name |
|---|
| prompts |
| schema |
| scripts |
| src |
| .gitignore |
| build.rs |
| Cargo.lock |
| Cargo.toml |