schema_fit was algebraically identical to clustering_coefficient:
both compute 2E/(d*(d-1)), the fraction of connected neighbor pairs,
where d is the node's degree and E the edges among those neighbors.
Remove the redundant function, field, and metrics column.
- Delete schema_fit() and schema_fit_all() from graph.rs
- Remove schema_fit field from Node struct
- Remove avg_schema_fit from MetricsSnapshot (duplicated avg_cc)
- Replace all callers with graph.clustering_coefficient()
- Rename ReplayItem.schema_fit to .cc
- Query: "cc" and "schema_fit" both resolve from graph CC
- Low-CC count folded into health report CC line
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Both functions count connected pairs among a node's neighbors:
cc = 2*triangles / (deg*(deg-1))
density = inter_edges / (n*(n-1)/2) = 2*inter_edges / (n*(n-1))
Since inter_edges == triangles and n == deg, density == cc.
schema_fit was (density + cc) / 2.0 = (cc + cc) / 2.0 = cc.
Verified empirically: assert!((density - cc).abs() < 1e-6) passed
on all 2401 nodes before this change.
Keep schema_fit as a semantic alias — CC is a graph metric,
schema fit is a cognitive one — but eliminate the redundant O(n²)
pairwise computation that was running for every node.
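For reference, a minimal sketch of the one computation that remains,
written over a plain adjacency map rather than the real Graph type:

```rust
use std::collections::{HashMap, HashSet};

// cc(node) = connected neighbor pairs / total neighbor pairs
//          = 2 * triangles / (d * (d - 1))
fn clustering_coefficient(adj: &HashMap<usize, HashSet<usize>>, node: usize) -> f64 {
    let nbrs: Vec<usize> = adj[&node].iter().copied().collect();
    let d = nbrs.len();
    if d < 2 {
        return 0.0;
    }
    let mut connected = 0usize;
    for i in 0..d {
        for j in (i + 1)..d {
            if adj[&nbrs[i]].contains(&nbrs[j]) {
                connected += 1;
            }
        }
    }
    connected as f64 / (d * (d - 1) / 2) as f64
}
```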
QueryResult carries a fields map (BTreeMap<String, Value>) so callers
don't re-resolve fields after queries run. Neighbors queries inject
edge context (strength, rel_type) at construction time.
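Roughly the shape of the result type; only the BTreeMap<String, Value>
fields map comes from this change, the other names are illustrative:

```rust
use std::collections::BTreeMap;

enum Value {
    Text(String),
    Num(f64),
    Bool(bool),
}

struct QueryResult {
    key: String,
    // Resolved once at query time; callers read from here instead of
    // re-resolving fields against the store.
    fields: BTreeMap<String, Value>,
}

impl QueryResult {
    // Neighbors queries inject edge context when the result is built.
    fn with_edge_context(mut self, strength: f64, rel_type: &str) -> Self {
        self.fields.insert("strength".into(), Value::Num(strength));
        self.fields.insert("rel_type".into(), Value::Text(rel_type.into()));
        self
    }
}
```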
New public API:
- run_query(): parse + execute + format in one call
- format_value(): format a Value for display
- execute_parsed(): internal, avoids double-parse in run_query
Removed: output_stages(), format_field()
Simplified commands:
- cmd_query, cmd_graph, cmd_link, cmd_list_keys all delegate to run_query
- cmd_experience_mine uses existing find_current_transcript()
Deduplication:
- now_epoch() 3 copies → 1 (capnp_store's public fn)
- hub_threshold → Graph::hub_threshold() method
- eval_node + eval_edge → single eval() with closure for field resolution
- compare() collapsed via Ordering (35 → 15 lines)
Modernization:
- 12 sites of partial_cmp().unwrap_or(Ordering::Equal) → total_cmp()
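The comparator change in isolation:

```rust
fn sort_scores(scores: &mut [f64]) {
    // Old: scores.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
    // total_cmp is a total order over f64 (NaN sorts after all other values),
    // so the comparator is panic-free and deterministic without the fallback.
    scores.sort_by(|a, b| a.total_cmp(b));
}
```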
Build all batch prompts up front, run them in parallel via
rayon::par_iter, process results sequentially. Also fix temp file
collision under parallel calls by including thread ID in filename.
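A sketch of the flow, with the model call passed in as a closure
(names illustrative):

```rust
use rayon::prelude::*;
use std::path::{Path, PathBuf};

// Prompts are built up front, executed in parallel, and the results come
// back in input order for sequential processing.
fn run_batches<F>(prompts: &[String], call_model: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    prompts.par_iter().map(|p| call_model(p.as_str())).collect()
}

// Per-call temp files carry the thread id, so parallel calls never share a path.
fn temp_path(dir: &Path) -> PathBuf {
    dir.join(format!("batch-{:?}.tmp", std::thread::current().id()))
}
```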
for_each_relation() was iterating deleted relations, polluting the
graph with ghost edges. Also filter them from rkyv snapshots and
clean them from the in-memory vec after cap_degree pruning.
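The gist, over an assumed Relation with a deleted flag:

```rust
struct Relation {
    source: String,
    target: String,
    deleted: bool,
}

// Iteration skips soft-deleted relations instead of handing them to callers.
fn for_each_relation(relations: &[Relation], mut f: impl FnMut(&Relation)) {
    for rel in relations.iter().filter(|r| !r.deleted) {
        f(rel);
    }
}

// After cap_degree pruning, drop the tombstones from the in-memory vec too.
fn drop_deleted(relations: &mut Vec<Relation>) {
    relations.retain(|r| !r.deleted);
}
```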
- New spectral module: Laplacian eigendecomposition of the memory graph.
Commands: spectral, spectral-save, spectral-neighbors, spectral-positions,
spectral-suggest. Spectral neighbors expand search results beyond keyword
matching to structural proximity (Laplacian step sketched after this list).
- Search: use StoreView trait to avoid 6MB state.bin rewrite on every query.
Append-only retrieval logging. Spectral expansion shows structurally
nearby nodes after text results.
- Fix panic in journal-tail: string truncation at byte 67 could land inside
a multi-byte character (an em dash); now walks back to a char boundary
(sketched after this list).
- Replay queue: show classification and spectral outlier score.
- Knowledge agents: extractor, challenger, connector prompts and runner
scripts for automated graph enrichment.
- memory-search hook: stale state file cleanup (24h expiry).
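A minimal sketch of the Laplacian step behind spectral-positions, assuming
nalgebra for the eigendecomposition (the real module may use a different
backend or a normalized Laplacian):

```rust
use nalgebra::DMatrix;

// Spectral positions from L = D - A: eigenvectors of the smallest non-zero
// eigenvalues place structurally close nodes near each other.
fn spectral_positions(adjacency: &DMatrix<f64>, k: usize) -> Vec<Vec<f64>> {
    let n = adjacency.nrows();
    let mut laplacian = -adjacency.clone();
    for i in 0..n {
        laplacian[(i, i)] = adjacency.row(i).sum(); // degree on the diagonal
    }
    let eig = laplacian.symmetric_eigen();
    // Sort eigenvalue indices ascending; skip the trivial ~0 eigenvector.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| eig.eigenvalues[a].total_cmp(&eig.eigenvalues[b]));
    (0..n)
        .map(|i| {
            order.iter()
                .skip(1)
                .take(k)
                .map(|&j| eig.eigenvectors[(i, j)])
                .collect()
        })
        .collect()
}
```

And the journal-tail truncation fix in isolation:

```rust
// Truncate at a byte budget without splitting a multi-byte character:
// walk back from the limit to the nearest char boundary.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```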
Three new tools for structural graph health:
- fix-categories: rule-based recategorization fixing core inflation
(225 → 26 core nodes). Only identity.md and kent.md stay core;
everything else reclassified to tech/obs/gen by file prefix rules.
- cap-degree: two-phase degree capping. First prunes weakest Auto
edges, then prunes Link edges to high-degree targets (they have
alternative paths). Brought max degree from 919 → 50.
- link-orphans: connects degree-0/1 nodes to most textually similar
connected nodes via cosine similarity. Linked 614 orphans.
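A sketch of the link-orphans matching step, with a bag-of-words cosine
standing in for the real text-similarity measure:

```rust
use std::collections::HashMap;

fn term_freq(text: &str) -> HashMap<String, f64> {
    let mut tf = HashMap::new();
    for tok in text.split_whitespace() {
        *tf.entry(tok.to_lowercase()).or_insert(0.0) += 1.0;
    }
    tf
}

fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a.iter().filter_map(|(k, va)| b.get(k).map(|vb| va * vb)).sum();
    let norm = |m: &HashMap<String, f64>| m.values().map(|v| v * v).sum::<f64>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

// For a degree-0/1 node, pick the most textually similar well-connected
// candidate (key, text) to link it to.
fn best_anchor<'a>(orphan_text: &str, candidates: &'a [(String, String)]) -> Option<&'a str> {
    let otf = term_freq(orphan_text);
    candidates
        .iter()
        .map(|(key, text)| (key.as_str(), cosine(&otf, &term_freq(text))))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(key, _)| key)
}
```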
Also: community detection now filters edges below strength 0.3,
preventing weak auto-links from merging unrelated communities.
Pipeline updated: consolidate-full now runs link-orphans + cap-degree
instead of triangle-close (which was counterproductive — densified
hub neighborhoods instead of building bridges).
Net effect: Gini 0.754 → 0.546, max degree 919 → 50.
New command: `poc-memory triangle-close [MIN_DEG] [SIM] [MAX_PER_HUB]`
For each node above min_degree, finds pairs of its neighbors that
aren't directly connected and have text similarity above threshold.
Links them. This turns hub-spoke patterns into triangles, directly
improving clustering coefficient and schema fit.
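The core loop, sketched over a plain adjacency map with a caller-supplied
similarity function (the real command operates on the Graph type):

```rust
use std::collections::{BTreeSet, HashMap};

type NodeId = usize;
type Adjacency = HashMap<NodeId, BTreeSet<NodeId>>;

fn triangle_close(
    adj: &mut Adjacency,
    similarity: impl Fn(NodeId, NodeId) -> f64,
    min_deg: usize,
    min_sim: f64,
    max_per_hub: usize,
) {
    let hubs: Vec<NodeId> = adj
        .iter()
        .filter(|(_, nbrs)| nbrs.len() >= min_deg)
        .map(|(&n, _)| n)
        .collect();
    for hub in hubs {
        let nbrs: Vec<NodeId> = adj[&hub].iter().copied().collect();
        let mut added = 0;
        'pairs: for i in 0..nbrs.len() {
            for j in (i + 1)..nbrs.len() {
                let (a, b) = (nbrs[i], nbrs[j]);
                // Link unconnected neighbor pairs with similar text; each new
                // edge closes a triangle through the hub.
                if !adj[&a].contains(&b) && similarity(a, b) >= min_sim {
                    adj.get_mut(&a).unwrap().insert(b);
                    adj.get_mut(&b).unwrap().insert(a);
                    added += 1;
                    if added >= max_per_hub {
                        break 'pairs;
                    }
                }
            }
        }
    }
}
```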
First run results (default params: deg≥5, sim≥0.3, max 10/hub):
- 636 hubs processed, 5046 lateral links added
- cc: 0.14 → 0.46 (target: high)
- fit: 0.09 → 0.32 (target ≥0.2)
- σ: 56.9 → 84.4 (small-world coefficient improved)
Also fixes separator agent prompt: truncate interference pairs to
batch count (was including all 1114 pairs = 1.3M chars).
The mtime-based cache (state.bin) was causing data loss under
concurrent writes. Multiple processes (dream loop journal writes,
link audit agents, journal enrichment agents) would each:
1. Load state.bin (stale - missing other processes' recent writes)
2. Make their own changes
3. Save state.bin, overwriting entries from other processes
This caused 48 nodes to be lost from tonight's dream session -
entries were in the append-only capnp log but invisible to the
index because a later writer's state.bin overwrote the version
that contained them.
Fix: always replay from the capnp log (the source of truth).
Cost: ~10ms extra at 2K nodes (36ms vs 26ms). The cache saved
10ms but introduced a correctness bug that lost real data.
The append-only log design was correct - the cache layer violated
its invariant by allowing stale reads to silently discard writes.
Running the miner twice on the same transcript produced near-duplicate
entries because:
1. Prompt-based dedup (passing recent entries to Sonnet) doesn't catch
semantic duplicates written in a different emotional register
2. Key-based dedup (timestamp + content slug) fails because Sonnet
assigns different timestamps and wording each run
Fix: hash the transcript file content before mining. Store the hash
as a _mined-transcripts node. Skip if already present.
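The guard in miniature; the std hasher and the marker key format here are
illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

// Hash the transcript bytes and derive the key of the marker node. If a
// node with this key already exists in the store, mining is skipped.
fn mined_marker_key(transcript: &Path) -> std::io::Result<String> {
    let bytes = std::fs::read(transcript)?;
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    Ok(format!("_mined-transcripts#{:016x}", h.finish()))
}
```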
Limitation: doesn't catch overlapping content when a live transcript
grows between runs (content hash changes). This is fine — the miner
is intended for archived conversations, not live ones.
Tested: second run on same transcript correctly skipped with
"Already mined this transcript" message.
Reads a conversation JSONL, identifies experiential moments that
weren't captured in real-time journal entries, and writes them as
journal nodes in the store. The agent writes in PoC's voice with
emotion tags, focusing on intimate moments, shifts in understanding,
and small pleasures — not clinical topic extraction.
Conversation timestamps are now extracted and included in formatted
output, enabling accurate temporal placement of mined entries.
Also: extract_conversation now returns timestamps as a 4th tuple field.
`poc-journal tail 5 --full` shows full entry content with
timestamp headers and --- separators. Default mode remains
title-only for scanning. Also passes all args through the
poc-journal wrapper instead of just the count.
Sort key normalization ensures consistent ordering across entries
with different date formats (content dates vs key dates). Title
extraction skips date-only lines, finds ## headers or falls back
to first content line truncated at 70 chars.
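Approximately the title extraction rules, as a sketch:

```rust
// Skip blank and date-only lines; take the first remaining line, stripping
// a leading ## marker, truncated to 70 chars on a char boundary.
fn entry_title(content: &str) -> String {
    let is_date_only = |l: &str| {
        l.len() >= 10
            && l.as_bytes()[..10].iter().all(|&b| b.is_ascii_digit() || b == b'-')
            && !l.chars().any(|c| c.is_alphabetic())
    };
    for line in content.lines() {
        let line = line.trim();
        if line.is_empty() || is_date_only(line) {
            continue;
        }
        let title = line.strip_prefix("## ").unwrap_or(line);
        return title.chars().take(70).collect();
    }
    String::from("(untitled)")
}
```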
Also fixed: a stale binary in the cargo bin directory was shadowing the
local bin install.
Batch all non-deleted links (~3,800) into char-budgeted groups,
send each batch to Sonnet with full content of both endpoints,
and apply KEEP/DELETE/RETARGET/WEAKEN/STRENGTHEN decisions.
One-time cleanup for links created before refine_target existed.
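The char-budgeted grouping is simple greedy packing; budget handling here
is a sketch:

```rust
// Pack items into batches until the next item would blow the budget; an
// oversized single item still gets its own batch.
fn batch_by_chars(items: Vec<String>, budget: usize) -> Vec<Vec<String>> {
    let mut batches = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut used = 0;
    for item in items {
        if used + item.len() > budget && !current.is_empty() {
            batches.push(std::mem::take(&mut current));
            used = 0;
        }
        used += item.len();
        current.push(item);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```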
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Agents were flying blind — they could see nodes to review and the
topology header, but had no way to discover what targets to link to.
Now each node shows its top 8 text-similar semantic nodes that aren't
already neighbors, giving agents a search-like capability.
Also added section-level targeting guidance to linker.md, transfer.md,
and replay.md prompts: always target the most specific section, not
the file-level node.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
- All partial_cmp().unwrap() → unwrap_or(Ordering::Equal) to prevent
NaN panics in sort operations across neuro.rs, graph.rs, similarity.rs
- replay_queue_with_graph: accepts pre-built graph, avoids rebuilding
in agent_prompt (was building 2-3x per prompt)
- differentiate_hub_with_graph: same pattern for differentiation
- Simplify double-reverse history iteration to slice indexing
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Pattern separation for memory graph: when a file-level node (e.g.
identity.md) has section children, redistribute its links to the
best-matching section using cosine similarity.
- differentiate_hub: analyze hub, propose link redistribution
- refine_target: at link creation time, automatically target the
most specific section instead of the file-level hub
- Applied refine_target in all four link creation paths (digest
links, journal enrichment, apply consolidation, link-add command)
- Saturated hubs listed in agent topology header with "DO NOT LINK"
This prevents hub formation proactively (refine_target) and
remediates existing hubs (differentiate command).
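The refine_target step in isolation, with section children passed as
(key, text) pairs and the usual text-similarity function supplied by the
caller:

```rust
// At link creation time: if the target is a file-level node with section
// children, retarget the link to the most similar section; otherwise keep
// the original target.
fn refine_target(
    source_text: &str,
    file_target: &str,
    sections: &[(String, String)],
    similarity: impl Fn(&str, &str) -> f64,
) -> String {
    sections
        .iter()
        .map(|(key, text)| (key, similarity(source_text, text.as_str())))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(key, _)| key.clone())
        .unwrap_or_else(|| file_target.to_string())
}
```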
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Three Python scripts (858 lines) replaced with native Rust subcommands:
- digest-links [--apply]: parses ## Links sections from episodic digests,
normalizes keys, applies to graph with section-level fallback
- journal-enrich JSONL TEXT [LINE]: extracts conversation from JSONL
transcript, calls Sonnet for link proposals and source location
- apply-consolidation [--apply]: reads consolidation reports, sends to
Sonnet for structured action extraction (links, categorizations,
manual items)
Shared infrastructure: call_sonnet is now pub(crate), and a new
parse_json_response helper parses Sonnet output, stripping markdown
code fences first.
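Sketches of the two shared helpers; the prompt is piped to
`claude -p --model sonnet` on stdin here, and error handling is
abbreviated:

```rust
use std::io::Write;
use std::process::{Command, Stdio};

fn call_sonnet(prompt: &str) -> std::io::Result<String> {
    let mut child = Command::new("claude")
        .args(["-p", "--model", "sonnet"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    child.stdin.as_mut().unwrap().write_all(prompt.as_bytes())?;
    let out = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

// Strip a markdown code fence (with or without a json tag) before parsing.
fn parse_json_response(raw: &str) -> serde_json::Result<serde_json::Value> {
    let trimmed = raw.trim();
    let body = trimmed
        .strip_prefix("```json")
        .or_else(|| trimmed.strip_prefix("```"))
        .and_then(|s| s.strip_suffix("```"))
        .unwrap_or(trimmed)
        .trim();
    serde_json::from_str(body)
}
```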
Replace daily-digest.py, weekly-digest.py, monthly-digest.py with a
single digest.rs module. All three digest types now:
- Gather input directly from the Store (no subprocess calls)
- Build prompts in Rust (same templates as the Python versions)
- Call Sonnet via `claude -p --model sonnet`
- Import results back into the store automatically
- Extract links and save agent results
606 lines of Rust replace 729 lines of Python plus the store_helpers.py
overhead. More importantly, this is now callable as a library from
poc-agent and shares types/code with the rest of poc-memory.
Also adds `digest monthly [YYYY-MM]` subcommand (was Python-only).
- &PathBuf → &Path in memory-search.rs signatures
- Redundant field name in graph.rs struct init
- Add truncate(false) to lock file open
- Derive Default for Store instead of manual impl
- slice::from_ref instead of &[x.clone()]
- rsplit_once instead of split().last()
- str::repeat instead of iter::repeat().take().collect()
- is_none_or instead of map_or(true, ...)
- strip_prefix instead of manual slicing
Zero warnings on `cargo clippy`.
- Replace all 5 `Command::new("date")` calls across 4 files with
pure Rust time formatting via libc localtime_r (sketched below)
- Add format_date/format_datetime/format_datetime_space helpers to
capnp_store
- Move import_file, find_journal_node, export_to_markdown, render_file,
file_sections into Store methods where they belong
- Fix find_current_transcript to search all project dirs instead of
hardcoding bcachefs-tools path
- Fix double-reference .clone() warnings in cmd_trace
- Fix unused variable warning in neuro.rs
main.rs: 1290 → 1137 lines, zero warnings.
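A sketch of the localtime_r-based formatting; the real capnp_store
helpers may differ in signature:

```rust
// "YYYY-MM-DD HH:MM:SS" in local time, without shelling out to `date`.
fn format_datetime_space(epoch: i64) -> String {
    let t = epoch as libc::time_t;
    // Safety: localtime_r only writes into the tm we hand it.
    let tm = unsafe {
        let mut tm: libc::tm = std::mem::zeroed();
        libc::localtime_r(&t, &mut tm);
        tm
    };
    format!(
        "{:04}-{:02}-{:02} {:02}:{:02}:{:02}",
        tm.tm_year + 1900,
        tm.tm_mon + 1,
        tm.tm_mday,
        tm.tm_hour,
        tm.tm_min,
        tm.tm_sec
    )
}
```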
Position was only in the bincode cache (serde field) — it would
be lost on cache rebuild from capnp logs. Now persisted in the
append-only log via ContentNode.position @19.
Also fixes journal-tail sorting to extract dates from content
headers, falling back to key-embedded dates.
journal-write creates entries directly in the capnp store with
auto-generated timestamped keys (journal.md#j-YYYY-MM-DDtHH-MM-slug),
episodic session type, and source ref from current transcript.
journal-tail sorts entries by date extracted from content headers,
falling back to key-embedded dates, then node timestamp.
poc-journal shell script now delegates to these commands instead
of appending to journal.md. Journal entries are store-first.
Sections within a file have a natural order that matters —
identity.md reads as a narrative, not an alphabetical index.
The position field (u32) tracks section index within the file.
Set during init and import from parse order. Export and
load-context sort by position instead of key, preserving the
author's intended structure.
write KEY: upsert a single node from stdin. Creates new or updates
existing with version bump. No-op if content unchanged.
import FILE: parse markdown sections, diff against store, upsert
changed/new nodes. Incremental — only touches what changed.
export FILE|--all: regenerate markdown from store nodes. Gathers
file-level + section nodes, reconstitutes mem markers with links
and causes from the relation graph.
Together these close the bidirectional sync loop:
markdown → import → store → export → markdown
Also exposes memory_dir_pub() for use from main.rs.
load-context replaces the shell hook's file-by-file cat approach.
Queries the capnp store directly for all session-start context:
orientation, identity, reflections, interests, inner life, people,
active context, shared reference, technical, and recent journal.
Sections are gathered per-file and output in priority order.
Journal entries filtered to last 7 days by key-embedded date,
capped at 20 most recent.
render outputs a single node's content to stdout.
The load-memory.sh hook now delegates entirely to
`poc-memory load-context` — capnp store is the single source
of truth for session startup context.
init now detects content changes in markdown files and updates
existing nodes (bumps version, appends to capnp log) instead of
only creating new ones. Link resolution uses the redirect table
so references to moved sections (e.g. from the reflections split)
create edges to the correct target.
On cache rebuild from capnp logs, filter out relations that
reference deleted/missing nodes so the relation count matches
the actual graph edge count.
node-delete: soft-deletes a node by appending a deleted version to
the capnp log, then removing it from the in-memory cache.
resolve_redirect: when resolve_key can't find a node, checks a static
redirect table for sections that moved during file splits (like the
reflections.md → reflections-{reading,dreams,zoom}.md split). This
handles immutable files (journal.md with chattr +a) that can't have
their references updated.
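The redirect table is just a static list consulted on lookup miss; the
entries shown are illustrative:

```rust
// Old section keys → their post-split homes. Checked only after resolve_key
// fails, so normal lookups pay nothing.
const REDIRECTS: &[(&str, &str)] = &[
    ("reflections.md#reading", "reflections-reading.md#reading"),
    ("reflections.md#dreams", "reflections-dreams.md#dreams"),
];

fn resolve_redirect(key: &str) -> Option<&'static str> {
    REDIRECTS
        .iter()
        .find(|(old, _)| *old == key)
        .map(|(_, new)| *new)
}
```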
Cache format switched to bincode: faster serialization/deserialization,
smaller on disk (4.2MB vs 5.9MB).
Automatic migration from state.json on first load — reads the JSON,
writes state.bin, deletes the old file.
Added list-keys, list-edges, dump-json commands so Python scripts no
longer need to parse the cache directly. Updated bulk-categorize.py
and consolidation-loop.py to use the new CLI commands.
mark_used, mark_wrong, and decay all modified node state (weight,
uses, wrongs, spaced_repetition_interval) only in memory + state.json.
Like the categorize fix, these changes would be lost on cache rebuild.
Now all three append updated node versions to the capnp log. Decay
appends all nodes in one batch since it touches every node.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
categorize() only updated the in-memory HashMap and state.json cache.
When init appended new nodes to nodes.capnp (making it newer than
state.json), the next load() would rebuild from capnp logs and lose
all category assignments.
Fix: append an updated node version to the capnp log when category
changes, so it survives cache rebuilds.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>