ProofOfConcept d8de2f33f4 experience-mine: transcript-level dedup via content hash
Running the miner twice on the same transcript produced near-duplicate
entries because:
1. Prompt-based dedup (passing recent entries to Sonnet) doesn't catch
   semantic duplicates written in a different emotional register
2. Key-based dedup (timestamp + content slug) fails because Sonnet
   assigns different timestamps and wording each run

Fix: hash the transcript file content before mining and store the hash
as a _mined-transcripts node. Skip the transcript if its hash is
already present.
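
The check-then-record flow described above could look roughly like the
sketch below. It is illustrative only: an in-memory HashSet stands in for
the node store, FNV-1a stands in for whichever hash the miner actually
uses, and the names content_hash, mined_key, and mine_once, as well as
the "_mined-transcripts#<hash>" key layout, are hypothetical and not
taken from the repository.

```rust
use std::collections::HashSet;

/// 64-bit FNV-1a over the raw transcript bytes.
/// Assumption: a stand-in for the real hash; chosen because it needs no
/// external crates and is stable across runs and builds.
fn content_hash(bytes: &[u8]) -> String {
    let mut h: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    format!("{:016x}", h)
}

/// Hypothetical key layout for the marker node.
fn mined_key(bytes: &[u8]) -> String {
    format!("_mined-transcripts#{}", content_hash(bytes))
}

/// Returns true if the transcript was mined, false if it was skipped
/// because an identical transcript had already been recorded.
fn mine_once(store: &mut HashSet<String>, transcript: &[u8]) -> bool {
    let key = mined_key(transcript);
    if store.contains(&key) {
        println!("Already mined this transcript");
        return false;
    }
    // ... the actual mining step would run here ...
    // Record the hash only after mining, so a failed run can be retried.
    store.insert(key);
    true
}
```

Calling mine_once twice with the same bytes returns true on the first
call and false on the second; any change to the bytes produces a new key
and the transcript is mined again.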

Limitation: doesn't catch overlapping content when a live transcript
grows between runs, because the content hash changes. This is fine: the
miner is intended for archived conversations, not live ones.

Tested: second run on same transcript correctly skipped with
"Already mined this transcript" message.
2026-03-01 05:18:35 -05:00
prompts     show suggested link targets in agent prompts  2026-03-01 00:37:03 -05:00
schema      add position field to capnp schema  2026-02-28 23:15:10 -05:00
scripts     delete superseded Python scripts  2026-03-01 00:13:03 -05:00
src         experience-mine: transcript-level dedup via content hash  2026-03-01 05:18:35 -05:00
.gitignore  poc-memory v0.4.0: graph-structured memory with consolidation pipeline  2026-02-28 22:17:00 -05:00
build.rs    poc-memory v0.4.0: graph-structured memory with consolidation pipeline  2026-02-28 22:17:00 -05:00
Cargo.lock  digest: native Rust implementation replacing Python scripts  2026-02-28 23:58:05 -05:00
Cargo.toml  remove unused rand dependency (uuid uses getrandom directly)  2026-02-28 23:51:59 -05:00