Commit graph

23 commits

Author SHA1 Message Date
ProofOfConcept
635da6d3e2 split capnp_store.rs into src/store/ module hierarchy
capnp_store.rs (1772 lines) → four focused modules:
  store/types.rs  — types, macros, constants, path helpers
  store/parse.rs  — markdown parsing (MemoryUnit, parse_units)
  store/view.rs   — StoreView trait, MmapView, AnyView
  store/mod.rs    — Store impl methods, re-exports

new_node/new_relation become free functions in types.rs.
All callers updated: capnp_store:: → store::
2026-03-03 12:56:15 -05:00
ProofOfConcept
e34c0ccf4c capnp_store: cache compiled regexes with OnceLock
parse_units and parse_marker_attrs were recompiling 4 regexes on
every call. Since they're called per-file during init, this was
measurable overhead. Use std::sync::OnceLock to compile once.
2026-03-03 12:44:02 -05:00
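
The compile-once pattern from e34c0ccf4c can be sketched with `std::sync::OnceLock`. This is a minimal illustration, not the repository's code: `Pattern` is a hypothetical stand-in for a compiled `regex::Regex` (which lives in an external crate), the marker prefix is an assumed example, and the counter exists only to show that the constructor runs once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts constructor calls; for demonstration only.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled regex.
pub struct Pattern {
    prefix: &'static str,
}

impl Pattern {
    fn compile(prefix: &'static str) -> Pattern {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        Pattern { prefix }
    }

    pub fn is_match(&self, line: &str) -> bool {
        line.starts_with(self.prefix)
    }
}

/// Returns the shared pattern, building it on first use only.
pub fn marker_pattern() -> &'static Pattern {
    static MARKER: OnceLock<Pattern> = OnceLock::new();
    // The prefix is an assumed example, not the real marker syntax.
    MARKER.get_or_init(|| Pattern::compile("<!-- mem:"))
}

pub fn compile_count() -> usize {
    COMPILE_COUNT.load(Ordering::SeqCst)
}
```

Every later call to `marker_pattern()` returns the same `&'static Pattern`, so per-file parsing pays the construction cost only once per process.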
ProofOfConcept
a2ec8657d2 capnp_store: remove dead has_node trait method, fix fix_categories bulk write
- has_node: defined on StoreView trait but never called externally
- fix_categories: was appending ALL nodes when only changed ones needed
  persisting; now collects changed nodes and appends only those
- save_snapshot: pass log sizes from caller instead of re-statting files
- params: use Copy instead of .clone() in snapshot construction
2026-03-03 12:42:16 -05:00
ProofOfConcept
70a5f05ce0 capnp_store: remove dead code, consolidate CRUD API
Dead code removed:
- rebuild_uuid_index (never called, index built during load)
- node_weight inherent method (all callers use StoreView trait)
- node_community (no callers)
- state_json_path (no callers)
- log_retrieval, log_retrieval_append (no callers; only _static is used)
- memory_dir_pub wrapper (just make memory_dir pub directly)

API consolidation:
- insert_node eliminated — callers use upsert_node (same behavior
  for new nodes, plus handles re-upsert gracefully)

AnyView StoreView dispatch compressed to one line per method
(also removes UFCS workaround that was needed when inherent
node_weight shadowed the trait method).

-69 lines net.
2026-03-03 12:38:52 -05:00
ProofOfConcept
0bce6aac3c capnp_store: extract helpers, eliminate duplication
- modify_node(): get_mut→modify→version++→append pattern was duplicated
  across mark_used, mark_wrong, categorize — extract once
- resolve_node_uuid(): resolve-or-redirect pattern was inlined in both
  link and causal edge creation — extract once
- ingest_units() + classify_filename(): shared logic between
  scan_dir_for_init and import_file — import_file shrinks to 6 lines
- Remove dead seen_keys HashSet (built but never read)
- partial_cmp().unwrap() → total_cmp() in cap_degree

-95 lines net.
2026-03-03 12:35:00 -05:00
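
The get_mut→modify→version++→append pattern that 0bce6aac3c extracts might look roughly like the sketch below. The `Node` fields, the `Vec` standing in for the capnp log, and the error type are simplifying assumptions, not the actual definitions.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Node {
    pub key: String,
    pub version: u32,
    pub uses: u32,
    pub wrongs: u32,
}

#[derive(Default)]
pub struct Store {
    pub nodes: HashMap<String, Node>,
    /// Stand-in for the append-only capnp log.
    pub log: Vec<Node>,
}

impl Store {
    /// Looks up a node, applies `f`, bumps the version, and appends
    /// the new version to the log.
    pub fn modify_node(&mut self, key: &str, f: impl FnOnce(&mut Node)) -> Result<(), String> {
        let node = self
            .nodes
            .get_mut(key)
            .ok_or_else(|| format!("no node '{key}'"))?;
        f(node);
        node.version += 1;
        let snapshot = node.clone();
        self.log.push(snapshot);
        Ok(())
    }

    pub fn mark_used(&mut self, key: &str) -> Result<(), String> {
        self.modify_node(key, |n| n.uses += 1)
    }

    pub fn mark_wrong(&mut self, key: &str) -> Result<(), String> {
        self.modify_node(key, |n| n.wrongs += 1)
    }
}
```

With the shared helper, each mutation reduces to a one-line closure, and the version bump and log append cannot be forgotten in one caller but not another.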
ProofOfConcept
ea0d631051 capnp_store: declarative serialization via macros
Replace 130 lines of manual field-by-field capnp serialization with
two declarative macros:

  capnp_enum!  — generates to_capnp/from_capnp for enum types
  capnp_message! — generates from_capnp/to_capnp for structs

Adding a field to the capnp schema now means adding it in one place;
both read and write directions are generated from the same declaration.

Eliminates: read_content_node, write_content_node, read_relation,
write_relation, read_provenance (5 functions → 2 macro invocations).

Callers updated to method syntax: Node::from_capnp() / node.to_capnp().

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-03 12:25:10 -05:00
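
A rough sketch of the idea behind `capnp_enum!` in ea0d631051: one declaration generates both conversion directions. The macro below is hypothetical and simplified; it maps variants to a plain `u16` wire tag instead of a capnp-generated type, and the name `wire_enum!` and the tag values are assumptions. The variant names follow the core/tech/obs/gen categories mentioned elsewhere in this log.

```rust
/// Hypothetical simplification: generates an enum plus to/from
/// conversions against a `u16` tag from a single declaration.
macro_rules! wire_enum {
    ($name:ident { $($variant:ident = $tag:literal),+ $(,)? }) => {
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        pub enum $name { $($variant),+ }

        impl $name {
            /// Write direction: enum to wire tag.
            pub fn to_wire(self) -> u16 {
                match self { $($name::$variant => $tag),+ }
            }

            /// Read direction: wire tag to enum, `None` if unknown.
            pub fn from_wire(tag: u16) -> Option<Self> {
                match tag {
                    $($tag => Some($name::$variant),)+
                    _ => None,
                }
            }
        }
    };
}

wire_enum!(Category { Core = 0, Tech = 1, Obs = 2, Gen = 3 });
```

Adding a variant means touching only the `wire_enum!` invocation; the read and write arms stay in sync because both are expanded from the same list.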
ProofOfConcept
ec8b4b2ed2 eliminate schema_fit: it's clustering coefficient
schema_fit was algebraically identical to clustering_coefficient:
both compute 2E/(d*(d-1)), the fraction of a node's neighbor pairs that
are themselves connected (d = node degree, E = edges among neighbors).
Remove the redundant function, field, and metrics column.

- Delete schema_fit() and schema_fit_all() from graph.rs
- Remove schema_fit field from Node struct
- Remove avg_schema_fit from MetricsSnapshot (duplicated avg_cc)
- Replace all callers with graph.clustering_coefficient()
- Rename ReplayItem.schema_fit to .cc
- Query: "cc" and "schema_fit" both resolve from graph CC
- Low-CC count folded into health report CC line

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-03-03 12:21:04 -05:00
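
For reference, the formula 2E/(d*(d-1)) cited in ec8b4b2ed2 can be computed as in the sketch below, where d is a node's degree and E is the number of edges among its neighbors. The adjacency-map representation and function names are assumptions for illustration; the real `graph.rs` may be structured differently.

```rust
use std::collections::{HashMap, HashSet};

/// Builds an undirected adjacency map from an edge list.
pub fn undirected(edges: &[(u32, u32)]) -> HashMap<u32, HashSet<u32>> {
    let mut adj: HashMap<u32, HashSet<u32>> = HashMap::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().insert(b);
        adj.entry(b).or_default().insert(a);
    }
    adj
}

/// Local clustering coefficient: 2E / (d * (d - 1)).
/// Returns 0.0 for nodes with fewer than two neighbors.
pub fn clustering_coefficient(adj: &HashMap<u32, HashSet<u32>>, node: u32) -> f64 {
    let neighbors: Vec<u32> = match adj.get(&node) {
        Some(set) => set.iter().copied().collect(),
        None => return 0.0,
    };
    let d = neighbors.len();
    if d < 2 {
        return 0.0;
    }
    // Count each connected neighbor pair once.
    let mut edges = 0usize;
    for (i, a) in neighbors.iter().enumerate() {
        for b in &neighbors[i + 1..] {
            if adj.get(a).map_or(false, |s| s.contains(b)) {
                edges += 1;
            }
        }
    }
    (2 * edges) as f64 / (d * (d - 1)) as f64
}
```

In a triangle 1-2-3 with an extra edge 1-4, node 1 has three neighbors and one edge among them, giving 2·1/(3·2) = 1/3.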
ProofOfConcept
e33328e515 store: filter deleted relations from graph building and snapshots
for_each_relation() was iterating deleted relations, polluting the
graph with ghost edges. Also filter them from rkyv snapshots and
clean them from the in-memory vec after cap_degree pruning.
2026-03-03 10:55:56 -05:00
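
The ghost-edge fix in e33328e515 amounts to skipping soft-deleted relations during iteration and dropping them from the in-memory vector. A minimal sketch, assuming a `deleted` flag on each relation; the field names and `purge_deleted` are hypothetical:

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Relation {
    pub source: String,
    pub target: String,
    pub deleted: bool,
}

pub struct Store {
    pub relations: Vec<Relation>,
}

impl Store {
    /// Visits only live relations, so graph building never sees ghost edges.
    pub fn for_each_relation(&self, mut f: impl FnMut(&Relation)) {
        for r in self.relations.iter().filter(|r| !r.deleted) {
            f(r);
        }
    }

    /// Drops soft-deleted relations from memory (e.g. after pruning).
    pub fn purge_deleted(&mut self) {
        self.relations.retain(|r| !r.deleted);
    }
}
```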
ProofOfConcept
71e6f15d82 spectral decomposition, search improvements, char boundary fix
- New spectral module: Laplacian eigendecomposition of the memory graph.
  Commands: spectral, spectral-save, spectral-neighbors, spectral-positions,
  spectral-suggest. Spectral neighbors expand search results beyond keyword
  matching to structural proximity.

- Search: use StoreView trait to avoid 6MB state.bin rewrite on every query.
  Append-only retrieval logging. Spectral expansion shows structurally
  nearby nodes after text results.

- Fix panic in journal-tail: string truncation at byte 67 could land inside
  a multi-byte character (em dash). Now walks back to char boundary.

- Replay queue: show classification and spectral outlier score.

- Knowledge agents: extractor, challenger, connector prompts and runner
  scripts for automated graph enrichment.

- memory-search hook: stale state file cleanup (24h expiry).
2026-03-03 01:33:31 -05:00
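
The journal-tail panic fixed in 71e6f15d82 comes from slicing a `&str` at a byte offset that falls inside a multi-byte character. A minimal sketch of the walk-back approach, using `str::is_char_boundary`; the function name is hypothetical:

```rust
/// Truncates `s` to at most `max_bytes` bytes without splitting a
/// UTF-8 character, walking back to the nearest char boundary.
pub fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For the string `"ab\u{2014}cd"` (an em dash occupies bytes 2..5), a limit of 3 yields `"ab"` rather than panicking, while a limit of 5 keeps the dash.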
ProofOfConcept
94dbca6018 graph health: fix-categories, cap-degree, link-orphans
Three new tools for structural graph health:

- fix-categories: rule-based recategorization fixing core inflation
  (225 → 26 core nodes). Only identity.md and kent.md stay core;
  everything else reclassified to tech/obs/gen by file prefix rules.

- cap-degree: two-phase degree capping. First prunes weakest Auto
  edges, then prunes Link edges to high-degree targets (they have
  alternative paths). Brought max degree from 919 → 50.

- link-orphans: connects degree-0/1 nodes to most textually similar
  connected nodes via cosine similarity. Linked 614 orphans.

Also: community detection now filters edges below strength 0.3,
preventing weak auto-links from merging unrelated communities.

Pipeline updated: consolidate-full now runs link-orphans + cap-degree
instead of triangle-close (which was counterproductive — densified
hub neighborhoods instead of building bridges).

Net effect: Gini 0.754 → 0.546, max degree 919 → 50.
2026-03-01 08:18:07 -05:00
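
The Gini figures in 94dbca6018 measure how unevenly edges are spread across nodes. The commit does not say how the value is computed; the sketch below uses one common formulation over sorted degrees, G = 2·Σ(i·xᵢ)/(n·Σxᵢ) − (n+1)/n with i starting at 1, and should be read as an assumption about the metric rather than the repository's implementation.

```rust
/// Gini coefficient of a degree distribution.
/// 0.0 means all nodes have equal degree; values near 1.0 mean a few
/// hubs hold almost all edges.
pub fn gini(degrees: &[u32]) -> f64 {
    let n = degrees.len();
    let total: u64 = degrees.iter().map(|&d| d as u64).sum();
    if n == 0 || total == 0 {
        return 0.0;
    }
    let mut sorted = degrees.to_vec();
    sorted.sort_unstable();
    // Sum of rank * value, with ranks starting at 1.
    let weighted: u64 = sorted
        .iter()
        .enumerate()
        .map(|(i, &d)| (i as u64 + 1) * d as u64)
        .sum();
    let n = n as f64;
    2.0 * weighted as f64 / (n * total as f64) - (n + 1.0) / n
}
```

Capping the maximum degree pulls the largest values toward the rest of the distribution, which is why the coefficient drops after cap-degree runs.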
ProofOfConcept
c7e7cfb7af store: always replay from capnp log, remove stale cache optimization
The mtime-based cache (state.bin) was causing data loss under
concurrent writes. Multiple processes (dream loop journal writes,
link audit agents, journal enrichment agents) would each:
1. Load state.bin (stale - missing other processes' recent writes)
2. Make their own changes
3. Save state.bin, overwriting entries from other processes

This caused 48 nodes to be lost from tonight's dream session -
entries were in the append-only capnp log but invisible to the
index because a later writer's state.bin overwrote the version
that contained them.

Fix: always replay from the capnp log (the source of truth).
Cost: ~10ms extra at 2K nodes (36ms vs 26ms). The cache saved
10ms but introduced a correctness bug that lost real data.

The append-only log design was correct - the cache layer violated
its invariant by allowing stale reads to silently discard writes.
2026-03-01 05:46:35 -05:00
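
The fix in c7e7cfb7af treats the append-only log as the only source of truth and rebuilds the index from it on every load. A minimal sketch of such a replay, assuming each log entry carries a key, a version, and a deleted flag; the types here are hypothetical stand-ins for the capnp messages:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct NodeVersion {
    pub key: String,
    pub version: u32,
    pub content: String,
    pub deleted: bool,
}

impl NodeVersion {
    pub fn new(key: &str, version: u32, content: &str, deleted: bool) -> Self {
        NodeVersion {
            key: key.to_string(),
            version,
            content: content.to_string(),
            deleted,
        }
    }
}

/// Rebuilds the index from the log: for each key the highest version
/// wins (later entries win ties), and deleted nodes are dropped.
pub fn replay(log: &[NodeVersion]) -> HashMap<String, NodeVersion> {
    let mut latest: HashMap<String, NodeVersion> = HashMap::new();
    for entry in log {
        let newer = latest
            .get(&entry.key)
            .map_or(true, |cur| entry.version >= cur.version);
        if newer {
            latest.insert(entry.key.clone(), entry.clone());
        }
    }
    latest.retain(|_, n| !n.deleted);
    latest
}
```

Because every process appends to the same log and none of them overwrites a shared snapshot, entries written by one process remain visible to all others on the next replay.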
ProofOfConcept
0ea86b8d54 refactor: extract Store methods, clean up shell-outs
- Add Store::upsert() — generic create-or-update, used by cmd_write
- Add Store::insert_node() — for pre-constructed nodes (journal entries)
- Add Store::delete_node() — soft-delete with version bump
- Simplify cmd_write (20 → 8 lines), cmd_node_delete (16 → 7 lines),
  cmd_journal_write (removes manual append/insert/save boilerplate)
- Replace generate_cookie shell-out to head/urandom with direct
  /dev/urandom read + const alphabet table

main.rs: 1137 → 1109 lines.
2026-02-28 23:49:43 -05:00
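
Replacing the head/urandom shell-out in 0ea86b8d54 with a direct read could look like the sketch below. The alphabet contents and the split into a generic `cookie_from` helper are assumptions made so the mapping can be exercised without real randomness; only the /dev/urandom read and the const alphabet table come from the commit message.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Assumed alphabet; the real table may differ.
const ALPHABET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";

/// Maps `len` bytes from `source` onto the alphabet.
/// Note: the modulo introduces a slight bias because 256 is not a
/// multiple of the alphabet length.
pub fn cookie_from<R: Read>(mut source: R, len: usize) -> io::Result<String> {
    let mut buf = vec![0u8; len];
    source.read_exact(&mut buf)?;
    Ok(buf
        .iter()
        .map(|b| ALPHABET[*b as usize % ALPHABET.len()] as char)
        .collect())
}

/// Reads random bytes directly from /dev/urandom (Unix-like systems).
pub fn generate_cookie(len: usize) -> io::Result<String> {
    cookie_from(File::open("/dev/urandom")?, len)
}
```

Taking any `Read` as the byte source keeps the mapping testable with a fixed byte slice, while `generate_cookie` remains a thin wrapper around the device file.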
ProofOfConcept
29d5ed47a1 clippy: fix all warnings across all binaries
- &PathBuf → &Path in memory-search.rs signatures
- Redundant field name in graph.rs struct init
- Add truncate(false) to lock file open
- Derive Default for Store instead of manual impl
- slice::from_ref instead of &[x.clone()]
- rsplit_once instead of split().last()
- str::repeat instead of iter::repeat().take().collect()
- is_none_or instead of map_or(true, ...)
- strip_prefix instead of manual slicing

Zero warnings on `cargo clippy`.
2026-02-28 23:47:11 -05:00
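
Three of the idioms adopted in 29d5ed47a1, shown in isolation. The helper names and the `journal.md#` key prefix are hypothetical examples, not taken from the codebase.

```rust
/// `rsplit_once` instead of `split('.').last()`: returns `None` when
/// there is no separator instead of returning the whole string.
pub fn file_extension(name: &str) -> Option<&str> {
    name.rsplit_once('.').map(|(_, ext)| ext)
}

/// `strip_prefix` instead of manual slicing such as `&key[prefix.len()..]`.
pub fn journal_section(key: &str) -> Option<&str> {
    key.strip_prefix("journal.md#")
}

/// `str::repeat` instead of `iter::repeat(..).take(n).collect()`.
pub fn rule(width: usize) -> String {
    "-".repeat(width)
}
```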
ProofOfConcept
7ee6f9c651 refactor: eliminate date shell-outs, move logic to Store methods
- Replace all 5 `Command::new("date")` calls across 4 files with
  pure Rust time formatting via libc localtime_r
- Add format_date/format_datetime/format_datetime_space helpers to
  capnp_store
- Move import_file, find_journal_node, export_to_markdown, render_file,
  file_sections into Store methods where they belong
- Fix find_current_transcript to search all project dirs instead of
  hardcoding bcachefs-tools path
- Fix double-reference .clone() warnings in cmd_trace
- Fix unused variable warning in neuro.rs

main.rs: 1290 → 1137 lines, zero warnings.
2026-02-28 23:44:44 -05:00
ProofOfConcept
f20ea4f827 add position field to capnp schema
Position was only in the bincode cache (serde field) — it would
be lost on cache rebuild from capnp logs. Now persisted in the
append-only log via ContentNode.position @19.

Also fixes journal-tail sorting to extract dates from content
headers, falling back to key-embedded dates.
2026-02-28 23:15:10 -05:00
ProofOfConcept
7b811125ca add position field to nodes for stable section ordering
Sections within a file have a natural order that matters —
identity.md reads as a narrative, not an alphabetical index.

The position field (u32) tracks section index within the file.
Set during init and import from parse order. Export and
load-context sort by position instead of key, preserving the
author's intended structure.
2026-02-28 23:06:27 -05:00
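
The ordering behavior described in 7b811125ca can be illustrated as follows: positions are assigned from parse order, and export sorts by position rather than by key. The `Section` type, the function names, and the example keys are assumptions for this sketch.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Section {
    pub key: String,
    /// Index of the section within its file, set from parse order.
    pub position: u32,
}

/// Assigns positions in the order sections were parsed.
pub fn assign_positions(keys: &[&str]) -> Vec<Section> {
    keys.iter()
        .enumerate()
        .map(|(i, k)| Section { key: k.to_string(), position: i as u32 })
        .collect()
}

/// Returns keys in the author's order, regardless of storage order.
pub fn export_order(mut sections: Vec<Section>) -> Vec<String> {
    sections.sort_by_key(|s| s.position);
    sections.into_iter().map(|s| s.key).collect()
}
```

Sorting the same sections by key would put them in alphabetical order, which is the behavior the position field was added to avoid.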
ProofOfConcept
57cf61de44 add write, import, and export commands
write KEY: upsert a single node from stdin. Creates new or updates
existing with version bump. No-op if content unchanged.

import FILE: parse markdown sections, diff against store, upsert
changed/new nodes. Incremental — only touches what changed.

export FILE|--all: regenerate markdown from store nodes. Gathers
file-level + section nodes, reconstitutes mem markers with links
and causes from the relation graph.

Together these close the bidirectional sync loop:
  markdown → import → store → export → markdown

Also exposes memory_dir_pub() for use from main.rs.
2026-02-28 23:00:52 -05:00
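
The write semantics in 57cf61de44 (create, update with version bump, or no-op when content is unchanged) can be sketched as below. The `Upsert` outcome enum, the field names, and the `Vec` standing in for the capnp log are assumptions.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Node {
    pub key: String,
    pub content: String,
    pub version: u32,
}

#[derive(Debug, PartialEq)]
pub enum Upsert {
    Created,
    Updated,
    Unchanged,
}

#[derive(Default)]
pub struct Store {
    pub nodes: HashMap<String, Node>,
    /// Stand-in for the append-only capnp log.
    pub log: Vec<Node>,
}

impl Store {
    /// Creates or updates a node; identical content is a no-op and
    /// appends nothing to the log.
    pub fn upsert(&mut self, key: &str, content: &str) -> Upsert {
        let (version, outcome) = match self.nodes.get(key) {
            Some(old) if old.content == content => return Upsert::Unchanged,
            Some(old) => (old.version + 1, Upsert::Updated),
            None => (1, Upsert::Created),
        };
        let node = Node {
            key: key.to_string(),
            content: content.to_string(),
            version,
        };
        self.log.push(node.clone());
        self.nodes.insert(key.to_string(), node);
        outcome
    }
}
```

An incremental import then reduces to calling `upsert` for each parsed section; unchanged sections leave the log untouched.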
ProofOfConcept
1a01cbf8f8 init: reconcile with existing nodes, filter orphaned edges
init now detects content changes in markdown files and updates
existing nodes (bumps version, appends to capnp log) instead of
only creating new ones. Link resolution uses the redirect table
so references to moved sections (e.g. from the reflections split)
create edges to the correct target.

On cache rebuild from capnp logs, filter out relations that
reference deleted/missing nodes so the relation count matches
the actual graph edge count.
2026-02-28 22:45:31 -05:00
ProofOfConcept
2d6c8d5199 add node-delete command and redirect table for split files
node-delete: soft-deletes a node by appending a deleted version to
the capnp log, then removing it from the in-memory cache.

resolve_redirect: when resolve_key can't find a node, checks a static
redirect table for sections that moved during file splits (like the
reflections.md → reflections-{reading,dreams,zoom}.md split). This
handles immutable files (journal.md with chattr +a) that can't have
their references updated.
2026-02-28 22:40:17 -05:00
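
The redirect fallback from 2d6c8d5199 might be sketched like this: look the key up directly, and only on a miss consult a static table of moved sections. The file names follow the reflections split named in the commit; the section names in the table and the function signatures are hypothetical.

```rust
use std::collections::HashSet;

/// Static table of (old key, new key) pairs. Section names are
/// hypothetical examples.
const REDIRECTS: &[(&str, &str)] = &[
    ("reflections.md#reading-notes", "reflections-reading.md#reading-notes"),
    ("reflections.md#dream-log", "reflections-dreams.md#dream-log"),
];

/// Returns the new location of a moved section, if any.
pub fn resolve_redirect(key: &str) -> Option<&'static str> {
    REDIRECTS
        .iter()
        .find(|(old, _)| *old == key)
        .map(|(_, new)| *new)
}

/// Resolves a key against the known set, falling back to the
/// redirect table when the key itself is not found.
pub fn resolve_key(known: &HashSet<String>, key: &str) -> Option<String> {
    if known.contains(key) {
        return Some(key.to_string());
    }
    resolve_redirect(key)
        .filter(|target| known.contains(*target))
        .map(str::to_string)
}
```

References inside files that cannot be edited keep using the old key, and the lookup silently lands on the section's new home.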
ProofOfConcept
4b0bba7c56 replace state.json cache with bincode state.bin
Faster serialization/deserialization, smaller on disk (4.2MB vs 5.9MB).
Automatic migration from state.json on first load — reads the JSON,
writes state.bin, deletes the old file.

Added list-keys, list-edges, dump-json commands so Python scripts no
longer need to parse the cache directly. Updated bulk-categorize.py
and consolidation-loop.py to use the new CLI commands.
2026-02-28 22:30:03 -05:00
ProofOfConcept
c4d1675128 fix: persist all mutations to capnp log
mark_used, mark_wrong, and decay all modified node state (weight,
uses, wrongs, spaced_repetition_interval) only in memory + state.json.
Like the categorize fix, these changes would be lost on cache rebuild.

Now all three append updated node versions to the capnp log. Decay
appends all nodes in one batch since it touches every node.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-02-28 22:24:53 -05:00
ProofOfConcept
6322b3fd61 fix: persist categorizations to capnp log
categorize() only updated the in-memory HashMap and state.json cache.
When init appended new nodes to nodes.capnp (making it newer than
state.json), the next load() would rebuild from capnp logs and lose
all category assignments.

Fix: append an updated node version to the capnp log when category
changes, so it survives cache rebuilds.

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-02-28 22:19:17 -05:00
ProofOfConcept
23fac4e5fe poc-memory v0.4.0: graph-structured memory with consolidation pipeline
Rust core:
- Cap'n Proto append-only storage (nodes + relations)
- Graph algorithms: clustering coefficient, community detection,
  schema fit, small-world metrics, interference detection
- BM25 text similarity with Porter stemming
- Spaced repetition replay queue
- Commands: search, init, health, status, graph, categorize,
  link-add, link-impact, decay, consolidate-session, etc.

Python scripts:
- Episodic digest pipeline: daily/weekly/monthly-digest.py
- retroactive-digest.py for backfilling
- consolidation-agents.py: 3 parallel Sonnet agents
- apply-consolidation.py: structured action extraction + apply
- digest-link-parser.py: extract ~400 explicit links from digests
- content-promotion-agent.py: promote episodic obs to semantic files
- bulk-categorize.py: categorize all nodes via single Sonnet call
- consolidation-loop.py: multi-round automated consolidation

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-02-28 22:17:00 -05:00