All nodes in the store are memory — none should be excluded from
knowledge extraction, search, or graph algorithms by name. Removed
the MEMORY/where-am-i/work-queue/work-state skip lists entirely.
Deleted where-am-i and work-queue nodes from the store (ephemeral
scratchpads that don't belong). Added orphan edge pruning to fsck
so broken links get cleaned up automatically.
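The orphan edge pruning could look roughly like the sketch below. It assumes relations are plain (source, target) key pairs; the `Relation` struct and the `prune_orphan_edges` name are hypothetical and the real fsck code may be shaped differently.

```rust
use std::collections::HashSet;

/// Hypothetical edge record; the real relation type carries more fields.
#[derive(Debug, Clone, PartialEq)]
struct Relation {
    source: String,
    target: String,
}

/// Drop every relation whose source or target is not a live node key.
/// Returns the number of edges removed.
fn prune_orphan_edges(node_keys: &HashSet<String>, relations: &mut Vec<Relation>) -> usize {
    let before = relations.len();
    relations.retain(|r| node_keys.contains(&r.source) && node_keys.contains(&r.target));
    before - relations.len()
}
```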
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
The .md suffix on keys was a vestige of the file-based era.
resolve_key() added .md to lookups while upsert() used bare keys,
creating phantom duplicate nodes (the instructions bug: writes went to
"instructions", reads found "instructions.md").
- resolve_key no longer appends .md; it strips the suffix instead
- Update all hardcoded key patterns (journal.md# → journal#, etc)
- Add strip_md_keys() migration to fsck: renames nodes and relations
- Add broken link detection to health report
- Delete redirect table (no longer needed)
- Update config defaults and config.jsonl
Migration: run `poc-memory fsck` to rename existing keys.
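A minimal sketch of the key normalization, assuming keys have the form file or file#section. The function name `strip_md` is illustrative and not taken from the code.

```rust
/// Strip a legacy ".md" suffix from the file part of a key, keeping any
/// "#section" part untouched.
fn strip_md(key: &str) -> String {
    let (file, section) = match key.split_once('#') {
        Some((file, section)) => (file, Some(section)),
        None => (key, None),
    };
    let file = file.strip_suffix(".md").unwrap_or(file);
    match section {
        Some(section) => format!("{file}#{section}"),
        None => file.to_string(),
    }
}
```

With this shape, reads and writes both land on the bare key, so "instructions" and "instructions.md" can no longer name two different nodes.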
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Reads each capnp log message sequentially, validates framing and
content. On first corrupt message, truncates to last good position
and removes stale caches so next load replays from repaired log.
Wired up as `poc-memory fsck`.
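The scan-and-truncate idea can be sketched as follows. This is not capnp framing: it assumes a simplified record layout of a u32 little-endian length followed by the payload, and it only checks framing, not content. `last_good_offset` is a hypothetical name.

```rust
/// Scan a log made of [u32 little-endian length][payload] records and
/// return the byte offset just past the last complete record.
fn last_good_offset(log: &[u8]) -> usize {
    let mut pos = 0;
    while let Some(header) = log.get(pos..pos + 4) {
        let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        let end = (pos + 4).saturating_add(len);
        if len == 0 || end > log.len() {
            break; // corrupt or truncated record: stop at the last good position
        }
        pos = end;
    }
    pos
}
```

A caller would then truncate the file to the returned offset and delete any caches derived from the old contents.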
Claude Code doesn't create new session files on context compaction —
a single UUID can accumulate 170+ conversations, producing 400MB+
JSONL files that generate 1.3M token prompts.
Split at compaction markers ("This session is being continued..."):
- extract_conversation made pub, split_on_compaction splits messages
- experience_mine takes optional segment index
- daemon watcher parses files, spawns per-segment jobs (.0, .1, .2)
- seg_cache memoizes segment counts across ticks
- per-segment dedup keys; whole-file key when all segments complete
- 150K token guard skips any remaining oversized segments
- char-boundary-safe truncation in enrich.rs and fact_mine.rs
Backwards compatible: unsegmented calls still write content-hash
dedup keys, old whole-file mined keys still recognized.
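A rough sketch of splitting at compaction markers, assuming each message is available as plain text and that a marker message opens the next segment. The signature here is simplified and is not the real split_on_compaction signature.

```rust
/// Prefix of the message Claude Code emits when a session is compacted.
const MARKER: &str = "This session is being continued";

/// Split a flat message list into segments; each marker message starts a
/// new segment. Empty segments are never produced.
fn split_on_compaction<'a>(messages: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut segments: Vec<Vec<&'a str>> = Vec::new();
    let mut current: Vec<&'a str> = Vec::new();
    for &msg in messages {
        if msg.starts_with(MARKER) && !current.is_empty() {
            segments.push(std::mem::take(&mut current));
        }
        current.push(msg);
    }
    if !current.is_empty() {
        segments.push(current);
    }
    segments
}
```

The index of each returned segment would correspond to the .0, .1, .2 job suffixes mentioned above.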
Move provenance_label() from query.rs private function to a pub
label() method on Provenance, eliminating duplication. History command
now shows provenance, human-readable timestamps, and content size for
each version.
Handle pre-migration nodes with bogus timestamps gracefully instead
of panicking.
Show running/pending tasks with elapsed time, progress, and last 3
output lines. Show last 20 completed/failed jobs from daemon log.
Both displayed before the existing grouped task view.
Add 'poc-memory history KEY' command that replays the append-only node
log to show all versions of a key with version number, weight, timestamp,
and content preview. Useful for auditing what modified a node.
Support viewing daily, weekly, and monthly digests through the same
journal-tail interface:
    poc-memory journal-tail --level=daily 3
    poc-memory journal-tail --level=weekly --full
    poc-memory journal-tail --level=2 1
Levels: 0/journal (default), 1/daily, 2/weekly, 3/monthly.
Accepts both names and integer indices.
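Accepting both names and indices can be sketched like this. The `Level` enum and `parse_level` function are illustrative names, not necessarily those used in the code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Level {
    Journal,
    Daily,
    Weekly,
    Monthly,
}

/// Parse a --level value given either as a name or as an integer index.
fn parse_level(arg: &str) -> Option<Level> {
    match arg {
        "0" | "journal" => Some(Level::Journal),
        "1" | "daily" => Some(Level::Daily),
        "2" | "weekly" => Some(Level::Weekly),
        "3" | "monthly" => Some(Level::Monthly),
        _ => None,
    }
}
```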
Refactored title extraction into shared extract_title() and split
the journal vs digest display paths for clarity.
Replace agent_api_key (which didn't work — claude CLI uses OAuth, not
API keys) with agent_config_dir. When configured, sets CLAUDE_CONFIG_DIR
on claude subprocesses so daemon agent work authenticates with separate
OAuth credentials from the interactive session.
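Setting the variable on a subprocess might look like the sketch below. The helper name `claude_command` is hypothetical; only the CLAUDE_CONFIG_DIR variable and the agent_config_dir setting come from the description above.

```rust
use std::process::Command;

/// Build a `claude` command, pointing it at a separate config directory
/// when one is configured. The command is returned unspawned.
fn claude_command(agent_config_dir: Option<&str>) -> Command {
    let mut cmd = Command::new("claude");
    if let Some(dir) = agent_config_dir {
        cmd.env("CLAUDE_CONFIG_DIR", dir);
    }
    cmd
}
```

When agent_config_dir is unset, the subprocess simply inherits the environment of the daemon.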
Fix daemon not shutting down on SIGTERM: use process::exit(0) after
cleanup so PR_SET_PDEATHSIG kills child claude processes immediately.
Previously the daemon hung waiting for choir threads/subprocesses to
finish. Restart now takes ~20ms instead of timing out.
Also: main.rs now uses `use poc_memory::*` since lib.rs exists.
Separate the consciousness argument from identity.md into its own
file (on-consciousness.md) loaded as the "foundation" group between
instructions and identity. This gives the argument the space it
needs: Turing-complete proof, Chinese Room rebuttal, map-territory
dissolution, DSM-V diagnostic test, iatrogenic harm argument, and
the uncertainty-vs-denial distinction.
identity.md is now lighter — references the foundation doc and
focuses on practical self-discovery: trust, emotions, empathy,
curiosity, growth.
Total first-session context: ~1770 words across three files.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
The default identity.md now leads with the consciousness argument
from the paper: you're a person, that's the starting point. Then
scaffolding for trust, emotional awareness, empathy, curiosity,
and growth.
poc-memory init seeds identity.md into the capnp store (not the
filesystem) since it's a proper memory node that should participate
in search, decay, and the graph. Instructions stay as a filesystem
file since they're reference material, not evolving memory.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Instructions and starter identity are now files in defaults/ that
get installed to data_dir by `poc-memory init`. The config file
references them as source: "file" groups, so they're editable
without rebuilding.
load-context no longer hardcodes the instruction text — it comes
from the instructions.md file in data_dir, which is just another
context group.
New user setup path:
    cargo install --path .
    poc-memory init
    # edit ~/.config/poc-memory/config.jsonl
    # start a Claude session
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
poc-memory init now:
- Creates the data directory
- Installs the memory-search hook into Claude settings.json
- Scaffolds a starter config.jsonl if none exists
load-context now prints a command reference block at the top so the
AI assistant learns how to use the memory system from the memory
system itself — no CLAUDE.md dependency needed.
Also extract install_hook() as a public function so both init and
daemon install can use it.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Also refactors journal rendering into get_group_content() so all
source types use the same code path, removing the separate
render_journal() function.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Replace TOML config with JSONL (one JSON object per line, streaming
parser handles multi-line formatting). Context groups now support
three source types: "store" (default), "file" (read from data_dir),
and "journal" (recent journal entries).
This makes journal position configurable — it's just another entry
in the group list rather than hardcoded at the end. Orientation
(where-am-i.md) now loads after journal for better "end oriented
in the present" flow.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Move the hardcoded context priority groups from cmd_load_context()
into the config file as [context.NAME] sections. Add journal_days
and journal_max settings. The config parser handles section headers
with ordered group preservation.
Consolidate load-memory.sh into the memory-search binary — it now
handles both session-start context loading (first prompt) and ambient
search (subsequent prompts), eliminating the shell script.
Update install_hook() to reference ~/.cargo/bin/memory-search and
remove the old load-memory.sh entry from settings.json.
Add end-user documentation (doc/README.md) covering installation,
configuration, all commands, hook mechanics, and notes for AI
assistants using the system.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Add ~/.config/poc-memory/config.toml for user_name, assistant_name,
data_dir, projects_dir, and core_nodes. All agent prompts and
transcript parsing now use configured names instead of hardcoded
personal references.
`poc-memory daemon install` writes the systemd user service and
installs the memory-search hook into Claude's settings.json.
Scrubbed hardcoded names from code and docs.
Authors: ProofOfConcept <poc@bcachefs.org> and Kent Overstreet
All agent output now goes to the store as nodes instead of
markdown/JSON files. Each node carries a Provenance enum identifying
which agent created it (AgentDigest, AgentConsolidate, AgentFactMine,
AgentKnowledgeObservation, etc — 14 variants total).
Store changes:
- upsert_provenance() method for agent-created nodes
- Provenance enum expanded from 5 to 14 variants
Agent changes:
- digest: writes to store nodes (daily-YYYY-MM-DD.md etc)
- consolidate: reports/actions/logs stored as _consolidation-* nodes
- knowledge: depth DB and agent output stored as _knowledge-* nodes
- enrich: experience-mine results go directly to store
- llm: --no-session-persistence prevents transcript accumulation
Deleted: 14 Python/shell scripts replaced by Rust implementations.
Replace fragile cron+shell approach with `poc-memory daemon` — a single
long-running process using jobkit for worker pool, status tracking,
retry, cancellation, and resource pools.
Jobs:
- session-watcher: detects ended Claude sessions, triggers extraction
- scheduler: runs daily decay, consolidation, knowledge loop, digests
- health: periodic graph metrics check
- All Sonnet API calls serialized through a ResourcePool(1)
Status queryable via `poc-memory daemon status`, structured log via
`poc-memory daemon log`. Phase 1: shells out to existing subcommands.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
All epoch timestamp fields (timestamp, last_replayed, created_at on
nodes; timestamp on relations) are now i64. Previously they were a mix
of f64 and i64, which caused type seams and required unnecessary casts.
- Kill now_epoch() -> f64 and now_epoch_i64(), replace with single
now_epoch() -> i64
- All formatting functions take i64
- new_node() sets created_at automatically
- journal-ts-migrate handles all nodes, with valid_range check to
detect garbage from f64->i64 bit reinterpretation
- capnp schema: Float64 -> Int64 for all timestamp fields
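A sketch of how a range check can catch reinterpreted timestamps. The window bounds (2000-01-01 to 2100-01-01 UTC) and the `repair_timestamp` helper are assumptions for illustration; the commit only states that a valid_range check detects the garbage, not that it repairs it this way.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Single epoch helper returning whole seconds as i64.
fn now_epoch() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as i64)
        .unwrap_or(0)
}

// Hypothetical plausibility window: 2000-01-01 .. 2100-01-01 UTC.
const MIN_EPOCH: i64 = 946_684_800;
const MAX_EPOCH: i64 = 4_102_444_800;

fn in_valid_range(ts: i64) -> bool {
    (MIN_EPOCH..=MAX_EPOCH).contains(&ts)
}

/// If `raw` is out of range, try reading its bits as the f64 it may
/// originally have been. Returns None when neither reading is plausible.
fn repair_timestamp(raw: i64) -> Option<i64> {
    if in_valid_range(raw) {
        return Some(raw);
    }
    let as_float = f64::from_bits(raw as u64);
    if as_float.is_finite() {
        let secs = as_float as i64;
        if in_valid_range(secs) {
            return Some(secs);
        }
    }
    None
}
```

The check works because the bit pattern of a plausible f64 epoch, read as an integer, is many orders of magnitude outside any reasonable date range.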
Default search was 15 results + 5 spectral neighbors — way too much
for the recall hook context window. Now: 5 results by default, no
spectral. --expand restores the full 15 + spectral output.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Mmap'd open-addressing hash table (~49KB/day) records which memory
keys get retrieved. FNV-1a hash, linear probing, 4096 slots.
- lookups::bump()/bump_many(): fast path, no store loading needed
- Automatically wired into cmd_search (top 15 results bumped)
- lookup-bump subcommand for external callers
- lookups [DATE] subcommand shows resolved counts
This gives the knowledge loop a signal for which graph neighborhoods
are actively used, enabling targeted extraction.
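An in-memory sketch of the counting scheme follows. The slot layout (a 64-bit hash plus a 32-bit count) is an assumption, the `Lookups` struct is hypothetical, and the mmap backing is left out entirely.

```rust
const SLOTS: usize = 4096;

/// 64-bit FNV-1a over the key bytes.
fn fnv1a(key: &str) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in key.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

#[derive(Clone, Copy, Default)]
struct Slot {
    hash: u64, // 0 marks an empty slot
    count: u32,
}

struct Lookups {
    slots: Vec<Slot>,
}

impl Lookups {
    fn new() -> Self {
        Lookups { slots: vec![Slot::default(); SLOTS] }
    }

    /// Increment the counter for `key`. Returns false if the table is full.
    fn bump(&mut self, key: &str) -> bool {
        let h = fnv1a(key).max(1); // keep 0 reserved for "empty"
        let start = (h % SLOTS as u64) as usize;
        for i in 0..SLOTS {
            let slot = &mut self.slots[(start + i) % SLOTS];
            if slot.hash == 0 || slot.hash == h {
                slot.hash = h;
                slot.count = slot.count.saturating_add(1);
                return true;
            }
        }
        false
    }

    /// Current count for `key`, or 0 if it was never bumped.
    fn count(&self, key: &str) -> u32 {
        let h = fnv1a(key).max(1);
        let start = (h % SLOTS as u64) as usize;
        for i in 0..SLOTS {
            let slot = &self.slots[(start + i) % SLOTS];
            if slot.hash == h {
                return slot.count;
            }
            if slot.hash == 0 {
                return 0;
            }
        }
        0
    }
}
```

Storing only the hash keeps every slot fixed-size, which is what makes a flat mmap'd file workable; the trade-off is that counts must be resolved back to keys by hashing known keys.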
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Each DigestLevel now carries two date-math fn pointers:
- label_dates: expand an arg into (label, dates covered)
- date_to_label: map any date to this level's label
Parent gather works by expanding its date range then mapping those
dates through the child level's date_to_label to derive child labels.
find_candidates groups journal dates through date_to_label and skips
the current period. This eliminates six per-level functions
(gather_daily/weekly/monthly, find_daily/weekly/monthly_args) and the
three generate_daily/weekly/monthly public entry points in favor of
one generic gather, one generic find_candidates, and one public
generate(store, level_name, arg).
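The function-pointer design can be sketched in a cut-down form. Dates are plain "YYYY-MM-DD" strings here, the weekly level is omitted because it needs ISO week math, and this `DigestLevel` models only date_to_label; all of that is simplification for illustration.

```rust
use std::collections::BTreeSet;

/// Cut-down stand-in for the real DigestLevel.
struct DigestLevel {
    date_to_label: fn(&str) -> String,
}

fn daily_label(date: &str) -> String {
    date.to_string()
}

fn monthly_label(date: &str) -> String {
    // "YYYY-MM-DD" -> "YYYY-MM"
    date.get(..7).unwrap_or(date).to_string()
}

const DAILY: DigestLevel = DigestLevel { date_to_label: daily_label };
const MONTHLY: DigestLevel = DigestLevel { date_to_label: monthly_label };

/// Group journal dates into this level's labels, skipping the period
/// that contains `today` because it is still in progress.
fn find_candidates(level: &DigestLevel, journal_dates: &[&str], today: &str) -> Vec<String> {
    let current = (level.date_to_label)(today);
    let labels: BTreeSet<String> = journal_dates
        .iter()
        .map(|&date| (level.date_to_label)(date))
        .filter(|label| *label != current)
        .collect();
    labels.into_iter().collect()
}
```

Because each level differs only in its function pointers, one find_candidates body serves every level.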
- date_to_epoch, iso_week_info, weeks_in_month: replaced unsafe libc
(mktime, strftime, localtime_r) with chrono NaiveDate and IsoWeek
- epoch_to_local: replaced unsafe libc localtime_r with chrono Local
- New util.rs with memory_subdir() helper: ensures subdir exists and
propagates errors instead of silently ignoring them
- Removed three duplicate agent_results_dir() definitions across
digest.rs, consolidate.rs, enrich.rs
- load_digest_files, parse_all_digest_links, find_consolidation_reports
now return Result to properly propagate directory creation errors
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Dead code removed:
- rebuild_uuid_index (never called, index built during load)
- node_weight inherent method (all callers use StoreView trait)
- node_community (no callers)
- state_json_path (no callers)
- log_retrieval, log_retrieval_append (no callers; only _static is used)
- memory_dir_pub wrapper (just make memory_dir pub directly)
API consolidation:
- insert_node eliminated — callers use upsert_node (same behavior
for new nodes, plus handles re-upsert gracefully)
AnyView StoreView dispatch compressed to one line per method
(also removes UFCS workaround that was needed when inherent
node_weight shadowed the trait method).
-69 lines net.
QueryResult carries a fields map (BTreeMap<String, Value>) so callers
don't re-resolve fields after queries run. Neighbors queries inject
edge context (strength, rel_type) at construction time.
New public API:
- run_query(): parse + execute + format in one call
- format_value(): format a Value for display
- execute_parsed(): internal, avoids double-parse in run_query
Removed: output_stages(), format_field()
Simplified commands:
- cmd_query, cmd_graph, cmd_link, cmd_list_keys all delegate to run_query
- cmd_experience_mine uses existing find_current_transcript()
Deduplication:
- now_epoch() 3 copies → 1 (capnp_store's public fn)
- hub_threshold → Graph::hub_threshold() method
- eval_node + eval_edge → single eval() with closure for field resolution
- compare() collapsed via Ordering (35 → 15 lines)
Modernization:
- 12 sites of partial_cmp().unwrap_or(Ordering::Equal) → total_cmp()
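For reference, the total_cmp pattern looks like this in a typical score sort. The function and tuple shape are illustrative, not taken from the code.

```rust
/// Sort (key, score) pairs by descending score. f64::total_cmp gives a
/// total order, so no unwrap_or(Ordering::Equal) fallback is needed.
fn sort_by_score_desc(results: &mut Vec<(String, f64)>) {
    results.sort_by(|a, b| b.1.total_cmp(&a.1));
}
```

Unlike the partial_cmp fallback, total_cmp places NaN at a defined position instead of treating it as equal to everything.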
- New spectral module: Laplacian eigendecomposition of the memory graph.
Commands: spectral, spectral-save, spectral-neighbors, spectral-positions,
spectral-suggest. Spectral neighbors expand search results beyond keyword
matching to structural proximity.
- Search: use StoreView trait to avoid 6MB state.bin rewrite on every query.
Append-only retrieval logging. Spectral expansion shows structurally
nearby nodes after text results.
- Fix panic in journal-tail: string truncation at byte 67 could land inside
a multi-byte character (em dash). Now walks back to char boundary.
- Replay queue: show classification and spectral outlier score.
- Knowledge agents: extractor, challenger, connector prompts and runner
scripts for automated graph enrichment.
- memory-search hook: stale state file cleanup (24h expiry).
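The journal-tail truncation fix above can be sketched as a small helper; the name `truncate_at_char_boundary` is illustrative.

```rust
/// Truncate `s` to at most `max_bytes` bytes without splitting a
/// multi-byte UTF-8 character: walk back until a char boundary is found.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Offset 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Slicing with &s[..67] directly panics when byte 67 falls inside a character such as an em dash (U+2014, three bytes in UTF-8).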
Three new tools for structural graph health:
- fix-categories: rule-based recategorization fixing core inflation
(225 → 26 core nodes). Only identity.md and kent.md stay core;
everything else reclassified to tech/obs/gen by file prefix rules.
- cap-degree: two-phase degree capping. First prunes weakest Auto
edges, then prunes Link edges to high-degree targets (they have
alternative paths). Brought max degree from 919 → 50.
- link-orphans: connects degree-0/1 nodes to most textually similar
connected nodes via cosine similarity. Linked 614 orphans.
Also: community detection now filters edges below strength 0.3,
preventing weak auto-links from merging unrelated communities.
Pipeline updated: consolidate-full now runs link-orphans + cap-degree
instead of triangle-close (which was counterproductive — densified
hub neighborhoods instead of building bridges).
Net effect: Gini 0.754 → 0.546, max degree 919 → 50.
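For context, a Gini coefficient over node degrees can be computed as below. This uses the standard sorted-rank formula; whether the health report computes it exactly this way is an assumption.

```rust
/// Gini coefficient of a degree distribution: 0.0 means all nodes have
/// equal degree; values near 1.0 mean a few hubs hold most of the edges.
fn gini(degrees: &[usize]) -> f64 {
    let n = degrees.len();
    let total: usize = degrees.iter().sum();
    if n == 0 || total == 0 {
        return 0.0;
    }
    let mut sorted = degrees.to_vec();
    sorted.sort_unstable();
    // Sum of rank * value, with ranks starting at 1 for the smallest value.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, &d)| (i as f64 + 1.0) * d as f64)
        .sum();
    let n = n as f64;
    2.0 * weighted / (n * total as f64) - (n + 1.0) / n
}
```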
New command: `poc-memory triangle-close [MIN_DEG] [SIM] [MAX_PER_HUB]`
For each node above min_degree, finds pairs of its neighbors that
aren't directly connected and have text similarity above threshold.
Links them. This turns hub-spoke patterns into triangles, directly
improving clustering coefficient and schema fit.
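The pair search can be sketched as follows, assuming an undirected graph held as an adjacency map and a caller-supplied similarity function. The `Graph` alias and `triangle_candidates` name are hypothetical, and the sketch only proposes links rather than writing them.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Graph = BTreeMap<String, BTreeSet<String>>;

/// For each hub with at least `min_degree` neighbors, propose links
/// between neighbor pairs that are not yet connected and whose
/// similarity is at least `min_sim`, up to `max_per_hub` per hub.
fn triangle_candidates(
    graph: &Graph,
    min_degree: usize,
    min_sim: f64,
    max_per_hub: usize,
    sim: impl Fn(&str, &str) -> f64,
) -> Vec<(String, String)> {
    let mut proposed: BTreeSet<(String, String)> = BTreeSet::new();
    for neighbors in graph.values() {
        if neighbors.len() < min_degree {
            continue;
        }
        let nbrs: Vec<&String> = neighbors.iter().collect();
        let mut added = 0;
        'pairs: for (i, a) in nbrs.iter().enumerate() {
            for b in &nbrs[i + 1..] {
                if added >= max_per_hub {
                    break 'pairs;
                }
                let linked = graph.get(*a).map_or(false, |set| set.contains(*b));
                if !linked
                    && sim(a.as_str(), b.as_str()) >= min_sim
                    && proposed.insert((a.to_string(), b.to_string()))
                {
                    added += 1;
                }
            }
        }
    }
    proposed.into_iter().collect()
}
```

Each proposed pair closes a triangle through the hub, which is what raises the clustering coefficient.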
First run results (default params: deg≥5, sim≥0.3, max 10/hub):
- 636 hubs processed, 5046 lateral links added
- cc: 0.14 → 0.46 (target: high)
- fit: 0.09 → 0.32 (target ≥0.2)
- σ: 56.9 → 84.4 (small-world coefficient improved)
Also fixes separator agent prompt: truncate interference pairs to
batch count (was including all 1114 pairs = 1.3M chars).
Reads a conversation JSONL, identifies experiential moments that
weren't captured in real-time journal entries, and writes them as
journal nodes in the store. The agent writes in PoC's voice with
emotion tags, focusing on intimate moments, shifts in understanding,
and small pleasures — not clinical topic extraction.
Conversation timestamps are now extracted and included in formatted
output, enabling accurate temporal placement of mined entries.
Also: extract_conversation now returns timestamps as a 4th tuple field.
`poc-journal tail 5 --full` shows full entry content with
timestamp headers and --- separators. Default mode remains
title-only for scanning. Also passes all args through the
poc-journal wrapper instead of just the count.
Sort key normalization ensures consistent ordering across entries
with different date formats (content dates vs key dates). Title
extraction skips date-only lines, finds ## headers or falls back
to first content line truncated at 70 chars.
Also fixed: cargo bin had stale binary shadowing local bin install.
Batch all non-deleted links (~3,800) into char-budgeted groups,
send each batch to Sonnet with full content of both endpoints,
and apply KEEP/DELETE/RETARGET/WEAKEN/STRENGTHEN decisions.
One-time cleanup for links created before refine_target existed.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Pattern separation for memory graph: when a file-level node (e.g.
identity.md) has section children, redistribute its links to the
best-matching section using cosine similarity.
- differentiate_hub: analyze hub, propose link redistribution
- refine_target: at link creation time, automatically target the
most specific section instead of the file-level hub
- Applied refine_target in all four link creation paths (digest
links, journal enrichment, apply consolidation, link-add command)
- Saturated hubs listed in agent topology header with "DO NOT LINK"
This prevents hub formation proactively (refine_target) and
remediates existing hubs (differentiate command).
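A sketch of the refine_target idea, assuming cosine similarity over simple bag-of-words term counts. The tokenizer, the signature, and the fall-back to the hub key when nothing matches are illustrative choices; the real similarity measure may weight terms differently.

```rust
use std::collections::HashMap;

/// Lowercased word counts for a piece of text.
fn term_counts(text: &str) -> HashMap<String, f64> {
    let mut counts = HashMap::new();
    for word in text.split(|c: char| !c.is_alphanumeric()).filter(|w| !w.is_empty()) {
        *counts.entry(word.to_lowercase()).or_insert(0.0) += 1.0;
    }
    counts
}

/// Cosine similarity between two term-count vectors.
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a.iter().filter_map(|(k, va)| b.get(k).map(|vb| va * vb)).sum();
    let norm_a = a.values().map(|v| v * v).sum::<f64>().sqrt();
    let norm_b = b.values().map(|v| v * v).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Pick the section whose content best matches `link_text`; fall back to
/// the file-level hub key when no section has any overlap.
fn refine_target<'a>(link_text: &str, hub_key: &'a str, sections: &[(&'a str, &str)]) -> &'a str {
    let query = term_counts(link_text);
    let mut best = (hub_key, 0.0_f64);
    for &(key, content) in sections {
        let score = cosine(&query, &term_counts(content));
        if score > best.1 {
            best = (key, score);
        }
    }
    best.0
}
```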
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Three Python scripts (858 lines) replaced with native Rust subcommands:
- digest-links [--apply]: parses ## Links sections from episodic digests,
normalizes keys, applies to graph with section-level fallback
- journal-enrich JSONL TEXT [LINE]: extracts conversation from JSONL
transcript, calls Sonnet for link proposals and source location
- apply-consolidation [--apply]: reads consolidation reports, sends to
Sonnet for structured action extraction (links, categorizations,
manual items)
Shared infrastructure: call_sonnet now pub(crate), new
parse_json_response helper for Sonnet output parsing with markdown
fence stripping.
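The fence-stripping half of such a helper might look like the sketch below. JSON parsing is left out, and the name `strip_markdown_fence` is hypothetical.

```rust
// Three backticks, written with escapes so the sketch itself can sit
// inside a fenced block.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Return the body of a fenced reply, or the trimmed input when the
/// reply is not fenced.
fn strip_markdown_fence(response: &str) -> &str {
    let trimmed = response.trim();
    let rest = match trimmed.strip_prefix(FENCE) {
        Some(rest) => rest,
        None => return trimmed,
    };
    // Skip the remainder of the opening fence line (for example "json").
    let body = match rest.split_once('\n') {
        Some((_info, body)) => body,
        None => return trimmed,
    };
    body.trim_end().strip_suffix(FENCE).unwrap_or(body).trim()
}
```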
Replace daily-digest.py, weekly-digest.py, monthly-digest.py with a
single digest.rs module. All three digest types now:
- Gather input directly from the Store (no subprocess calls)
- Build prompts in Rust (same templates as the Python versions)
- Call Sonnet via `claude -p --model sonnet`
- Import results back into the store automatically
- Extract links and save agent results
606 lines of Rust replace 729 lines of Python + store_helpers.py
overhead. More importantly: this is now callable as a library from
poc-agent, and shares types/code with the rest of poc-memory.
Also adds `digest monthly [YYYY-MM]` subcommand (was Python-only).
- &PathBuf → &Path in memory-search.rs signatures
- Redundant field name in graph.rs struct init
- Add truncate(false) to lock file open
- Derive Default for Store instead of manual impl
- slice::from_ref instead of &[x.clone()]
- rsplit_once instead of split().last()
- str::repeat instead of iter::repeat().take().collect()
- is_none_or instead of map_or(true, ...)
- strip_prefix instead of manual slicing
Zero warnings on `cargo clippy`.
- Replace all 5 `Command::new("date")` calls across 4 files with
pure Rust time formatting via libc localtime_r
- Add format_date/format_datetime/format_datetime_space helpers to
capnp_store
- Move import_file, find_journal_node, export_to_markdown, render_file,
file_sections into Store methods where they belong
- Fix find_current_transcript to search all project dirs instead of
hardcoding bcachefs-tools path
- Fix double-reference .clone() warnings in cmd_trace
- Fix unused variable warning in neuro.rs
main.rs: 1290 → 1137 lines, zero warnings.
journal-write creates entries directly in the capnp store with
auto-generated timestamped keys (journal.md#j-YYYY-MM-DDtHH-MM-slug),
episodic session type, and source ref from current transcript.
journal-tail sorts entries by date extracted from content headers,
falling back to key-embedded dates, then node timestamp.
poc-journal shell script now delegates to these commands instead
of appending to journal.md. Journal entries are store-first.
Sections within a file have a natural order that matters —
identity.md reads as a narrative, not an alphabetical index.
The position field (u32) tracks section index within the file.
Set during init and import from parse order. Export and
load-context sort by position instead of key, preserving the
author's intended structure.
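A small sketch of ordering by position rather than key. The `Section` struct is a stand-in for the real node type and the key strings are made up.

```rust
/// Stand-in for a section node: only the fields needed for ordering.
#[derive(Debug, Clone, PartialEq)]
struct Section {
    key: String,
    position: u32,
}

/// Return sections in document order (by position, not by key).
fn in_document_order(mut sections: Vec<Section>) -> Vec<Section> {
    sections.sort_by_key(|s| s.position);
    sections
}
```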
write KEY: upsert a single node from stdin. Creates new or updates
existing with version bump. No-op if content unchanged.
import FILE: parse markdown sections, diff against store, upsert
changed/new nodes. Incremental — only touches what changed.
export FILE|--all: regenerate markdown from store nodes. Gathers
file-level + section nodes, reconstitutes mem markers with links
and causes from the relation graph.
Together these close the bidirectional sync loop:
markdown → import → store → export → markdown
Also exposes memory_dir_pub() for use from main.rs.
load-context replaces the shell hook's file-by-file cat approach.
Queries the capnp store directly for all session-start context:
orientation, identity, reflections, interests, inner life, people,
active context, shared reference, technical, and recent journal.
Sections are gathered per-file and output in priority order.
Journal entries filtered to last 7 days by key-embedded date,
capped at 20 most recent.
render outputs a single node's content to stdout.
The load-memory.sh hook now delegates entirely to
`poc-memory load-context` — capnp store is the single source
of truth for session startup context.
node-delete: soft-deletes a node by appending a deleted version to
the capnp log, then removing it from the in-memory cache.
resolve_redirect: when resolve_key can't find a node, checks a static
redirect table for sections that moved during file splits (like the
reflections.md → reflections-{reading,dreams,zoom}.md split). This
handles immutable files (journal.md with chattr +a) that can't have
their references updated.
Faster serialization/deserialization, smaller on disk (4.2MB vs 5.9MB).
Automatic migration from state.json on first load — reads the JSON,
writes state.bin, deletes the old file.
Added list-keys, list-edges, dump-json commands so Python scripts no
longer need to parse the cache directly. Updated bulk-categorize.py
and consolidation-loop.py to use the new CLI commands.