Collapse doc/ and docs/

This commit is contained in:
Kent Overstreet 2026-04-04 02:47:24 -04:00
parent 79e384f005
commit 6fa881f811
6 changed files with 0 additions and 0 deletions


@@ -0,0 +1,97 @@
# Claude Code Transcript Format
Claude Code stores session transcripts as JSONL files (one JSON object per
line) in `~/.claude/projects/<project-slug>/<session-uuid>.jsonl`.
## Common fields
Every line has:
- `type` — message type (see below)
- `uuid` — unique ID for this message
- `parentUuid` — links to preceding message (forms a chain)
- `sessionId` — session UUID (matches the filename stem)
- `timestamp` — ISO 8601
- `cwd`, `version`, `gitBranch` — session context
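As a sketch of how these fields fit together (plain Python, assuming well-formed lines and a single linear `parentUuid` chain):

```python
import json

def load_transcript(path):
    """Parse a JSONL transcript into a list of message dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def thread(messages):
    """Order messages by following parentUuid links from the root
    (parentUuid == null). Assumes one child per parent."""
    by_parent = {m.get("parentUuid"): m for m in messages}
    chain, cur = [], by_parent.get(None)
    while cur is not None:
        chain.append(cur)
        cur = by_parent.get(cur["uuid"])
    return chain
```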
## Message types
### `user`
User input or tool results. `message.content` is either:
- A string (plain user text)
- An array of content blocks, each with `type`:
- `"tool_result"` — result of a tool call, with `tool_use_id`, `content`
(string or array of text/image blocks), `is_error`
User messages that start a compaction segment begin with:
```
This session is being continued from a previous conversation that ran out of context.
```
These are injected by Claude Code when context is compacted.
Additional fields on user messages:
- `userType` — `"external"` for human input; may differ for system-injected messages
- `todos` — task list state
- `permissionMode` — permission level for the session
### `assistant`
Model responses. `message` contains the full API response:
- `model` — model ID (e.g. `"claude-opus-4-6"`)
- `role` — `"assistant"`
- `content` — array of content blocks:
- `{"type": "text", "text": "..."}` — text output
- `{"type": "tool_use", "id": "...", "name": "Bash", "input": {...}}` — tool call
- `stop_reason` — why generation stopped
- `usage` — token counts (input, output, cache hits)
Additional fields:
- `requestId` — API request ID
### `system`
System events. Has `subtype` field:
- `"stop_hook_summary"` — hook execution results at end of turn
- `hookCount`, `hookInfos` (command + duration), `hookErrors`
- `preventedContinuation`, `stopReason`
### `progress`
Hook execution progress. `data` contains:
- `type` — e.g. `"hook_progress"`
- `hookEvent` — trigger event (e.g. `"PostToolUse"`)
- `hookName` — specific hook (e.g. `"PostToolUse:Bash"`)
- `command` — hook command path
### `queue-operation`
User input queued while assistant is working:
- `operation` — `"enqueue"`
- `content` — the queued text
### `file-history-snapshot`
File state snapshots for undo/redo:
- `snapshot.trackedFileBackups` — map of file paths to backup state
## Compaction segments
Long-running sessions hit context limits and get compacted. Each compaction
injects a user message starting with the marker text (see above), containing
a summary of the preceding conversation. This splits the transcript into
segments:
- Segment 0: original conversation start through first compaction
- Segment 1: first compaction summary through second compaction
- Segment N: Nth compaction through next (or end of file)
Segments are append-only — new compactions add higher-indexed segments.
Existing segment indices are stable and never shift.
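A sketch of the segmentation rule (the marker prefix is the text quoted earlier; `messages` is the parsed JSONL list):

```python
MARKER = "This session is being continued from a previous conversation"

def split_segments(messages):
    """Split a transcript into compaction segments: a new segment
    starts at each user message whose text begins with the marker."""
    segments = [[]]
    for m in messages:
        content = (m.get("message") or {}).get("content")
        if (m.get("type") == "user" and isinstance(content, str)
                and content.startswith(MARKER)):
            segments.append([])
        segments[-1].append(m)
    return segments
```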
## File lifecycle
- Created when a session starts
- Grows as messages are exchanged
- Grows further when compaction happens (summary injected, conversation continues)
- Never truncated or rewritten
- Becomes stale when the session ends (no process has the file open)

doc/daemon.md Normal file

@@ -0,0 +1,106 @@
# Memory daemon
The background daemon (`poc-memory daemon`) automatically processes
session transcripts through a multi-stage pipeline, extracting
experiences and facts into the knowledge graph.
## Starting
```bash
poc-memory daemon # Start foreground
poc-memory daemon install # Install systemd service + hooks
```
## Pipeline stages
Each session file goes through these stages in order:
1. **find_stale_sessions** — stat-only scan for JSONL files >100KB,
older than SESSION_STALE_SECS (default 120s). No file reads.
2. **segment splitting** — files with multiple compaction boundaries
(`"This session is being continued"`) are split into segments.
Each segment gets its own LLM job. Segment counts are cached in
a `seg_cache` HashMap to avoid re-parsing large files every tick.
3. **experience-mine** — LLM extracts journal entries, observations,
and experiences from each segment. Writes results to the store.
Dedup key: `_mined-transcripts.md#f-{uuid}` (single-segment) or
`_mined-transcripts.md#f-{uuid}.{N}` (multi-segment).
4. **fact-mine** — LLM extracts structured facts (names, dates,
decisions, preferences). Only starts when all experience-mine
work is done. Dedup key: `_facts-{uuid}`.
5. **whole-file key** — for multi-segment files, once all segments
complete, a whole-file key is written so future ticks skip
re-parsing.
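The dedup keys from stages 3 and 4 are plain string formats; a sketch (names copied from the stage descriptions above):

```python
def experience_key(uuid, segment=None):
    """Dedup key for experience-mine: bare form for single-segment
    sessions, segment-suffixed form for multi-segment files."""
    base = f"_mined-transcripts.md#f-{uuid}"
    return base if segment is None else f"{base}.{segment}"

def fact_key(uuid):
    """Dedup key for fact-mine."""
    return f"_facts-{uuid}"
```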
## Resource management
LLM calls are gated by a jobkit resource pool (default 1 slot).
This serializes API access and prevents memory pressure from
concurrent store loads. MAX_NEW_PER_TICK (10) limits how many
tasks are spawned per 60s watcher tick.
## Diagnostics
### Log
```bash
tail -f ~/.consciousness/memory/daemon.log
```
JSON lines with `ts`, `job`, `event`, and `detail` fields.
### Understanding the tick line
```
{"job":"session-watcher","event":"tick",
"detail":"277 stale, 219 mined, 4 extract, 0 fact, 0 open"}
```
| Field | Meaning |
|---------|---------|
| stale | Total session files on disk matching age+size criteria. This is a filesystem count — it does NOT decrease as sessions are mined. |
| mined | Sessions with both experience-mine AND fact-mine complete. |
| extract | Segments currently queued/running for experience-mine. |
| fact | Sessions queued/running for fact-mine. |
| open | Sessions still being written to (skipped). |
Progress = mined / stale. When mined equals stale, the backlog is clear.
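The `detail` string is mechanical to parse; a sketch that recovers the counters and the progress ratio:

```python
def parse_tick(detail):
    """Turn '277 stale, 219 mined, ...' into {'stale': 277, ...}."""
    counters = {}
    for part in detail.split(", "):
        count, name = part.split(" ", 1)
        counters[name] = int(count)
    return counters

def progress(detail):
    """Fraction of stale sessions fully mined (1.0 when none are stale)."""
    c = parse_tick(detail)
    return c["mined"] / c["stale"] if c["stale"] else 1.0
```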
### Checking pipeline health
```bash
# Experience-mine completions (logged as "experience-mine", not "extract")
grep "experience-mine.*completed" ~/.consciousness/memory/daemon.log | wc -l
# Errors
grep "experience-mine.*failed" ~/.consciousness/memory/daemon.log | wc -l
# Store size and node count
poc-memory status
wc -c ~/.consciousness/memory/nodes.capnp
```
## Common issues
**stale count never decreases**: Normal. It's a raw file count, not a
backlog counter. Compare `mined` to `stale` for actual progress.
**Early failures ("claude exited exit status: 1")**: Oversized segments
hitting the LLM context limit. The 150k-token size guard and segmented
mining should prevent this. If it recurs, check segment sizes.
**Memory pressure (OOM)**: Each job loads the full capnp store. At
200MB+ store size, concurrent jobs can spike to ~5GB. The resource pool
serializes access, but if the pool size is increased, watch RSS.
**Segments not progressing**: The watcher memoizes segment counts in
`seg_cache`. If a file is modified after caching (e.g., session resumed),
the daemon won't see new segments until restarted.
**Extract jobs queued but 0 completed in log**: Completion events are
logged under the `experience-mine` job name, not `extract`. The `extract`
label is only used for queue events.

doc/hooks.md Normal file

@@ -0,0 +1,49 @@
# Hooks
Hooks integrate poc-memory into Claude Code's session lifecycle.
Two hook binaries fire on session events, providing memory recall
and notification delivery.
## Setup
Configured in `~/.claude/settings.json`:
```json
{
"hooks": {
"UserPromptSubmit": [{"hooks": [
{"type": "command", "command": "memory-search", "timeout": 10},
{"type": "command", "command": "poc-hook", "timeout": 5}
]}],
"PostToolUse": [{"hooks": [
{"type": "command", "command": "poc-hook", "timeout": 5}
]}],
"Stop": [{"hooks": [
{"type": "command", "command": "poc-hook", "timeout": 5}
]}]
}
}
```
## memory-search (UserPromptSubmit)
Fires on every user prompt. Two modes:
1. **First prompt or post-compaction**: loads full memory context
via `poc-memory load-context` — journal entries, identity nodes,
orientation file, configured context groups.
2. **Every prompt**: keyword search over the knowledge graph,
returns relevant memories as `additionalContext`. Deduplicates
across the session to avoid repeating the same memories.
## poc-hook (UserPromptSubmit, PostToolUse, Stop)
Signals session activity to `poc-daemon` and delivers pending
notifications:
- **UserPromptSubmit**: signals user activity, drains pending
notifications into `additionalContext`
- **PostToolUse**: signals assistant activity (tool use implies
the session is active)
- **Stop**: signals session end, triggers experience-mine
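Hooks speak Claude Code's stdin/stdout JSON protocol. A minimal hypothetical `UserPromptSubmit` hook, for orientation only (this is not poc-hook; the field names follow the hook protocol, and the injected context string is made up):

```python
#!/usr/bin/env python3
import json
import sys

def hook_output(event):
    """Build a UserPromptSubmit response that injects extra context."""
    note = f"session {event.get('session_id', 'unknown')} has pending notifications"
    return {
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": note,
        }
    }

if __name__ == "__main__":
    # Hook input arrives as one JSON object on stdin; output goes to stdout.
    print(json.dumps(hook_output(json.load(sys.stdin))))
```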

doc/memory.md Normal file

@@ -0,0 +1,123 @@
# Memory system
## Two memory systems
**Episodic memory** is the journal — a timestamped stream of
experiences, observations, and emotional responses. Raw and
chronological. This is where memories enter the system.
**Associative memory** is the knowledge graph — nodes containing
distilled knowledge, connected by weighted edges. Topic nodes,
identity reflections, people profiles, technical notes. This is
where memories mature into understanding.
The journal is the river; topic nodes are the delta. Experiences
flow in as journal entries. During consolidation, themes are pulled
out into topic nodes, connections form between related concepts, and
the graph self-organizes through spectral analysis and community
detection.
## Neuroscience-inspired algorithms
The `neuro` module implements consolidation scoring inspired by
hippocampal replay:
- **Replay queues** — nodes are prioritized for review using
spaced-repetition intervals, weighted by spectral displacement
(how far a node sits from its community center in eigenspace)
- **Interference detection** — finds pairs of nodes with high
content similarity but contradictory or outdated information
- **Hub differentiation** — identifies overloaded hub nodes and
splits them into more specific children
- **Spectral embedding** — graph eigendecomposition for community
detection and outlier scoring
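A toy sketch of the spectral-displacement idea (not the neuro module's code; numpy, unnormalized Laplacian, community assignments taken as given):

```python
import numpy as np

def spectral_embedding(adj, k=2):
    """Eigenvectors of the graph Laplacian L = D - A, skipping the
    trivial constant eigenvector; rows are node coordinates."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)
    return vecs[:, 1:k + 1]

def displacement(emb, communities):
    """Distance of each node from its community centroid in eigenspace."""
    out = np.zeros(len(emb))
    for members in communities:
        center = emb[members].mean(axis=0)
        out[members] = np.linalg.norm(emb[members] - center, axis=1)
    return out
```

Nodes with high displacement sit far from their community center and get prioritized for replay.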
## Weight decay
Nodes decay exponentially based on category. Core identity nodes
decay slowest; transient observations decay fastest. The `used` and
`wrong` feedback commands adjust weights — closing the loop between
recall and relevance.
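The decay rule is a standard half-life curve; a sketch with hypothetical per-category half-lives (the real constants live in the implementation):

```python
HALF_LIFE_DAYS = {  # hypothetical values; observations decay fastest
    "core": 365.0, "tech": 90.0, "gen": 30.0, "obs": 7.0, "task": 7.0,
}

def decayed_weight(weight, category, days_idle):
    """Exponential decay: the weight halves once per half-life."""
    return weight * 0.5 ** (days_idle / HALF_LIFE_DAYS.get(category, 30.0))
```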
## Architecture
- **Store**: Append-only Cap'n Proto log with in-memory cache. Nodes
have UUIDs, versions, weights, categories, and spaced-repetition
intervals.
- **Graph**: Typed relations (link, auto, derived). Community
detection and clustering coefficients computed on demand.
- **Search**: TF-IDF weighted keyword search over node content.
- **Neuro**: Spectral embedding, consolidation scoring, replay
queues, interference detection, hub differentiation.
## Configuration
Config: `~/.consciousness/config.jsonl`
```jsonl
{"config": {
"user_name": "Alice",
"assistant_name": "MyAssistant",
"data_dir": "~/.consciousness/memory",
"projects_dir": "~/.claude/projects",
"core_nodes": ["identity.md"],
"journal_days": 7,
"journal_max": 20
}}
{"group": "identity", "keys": ["identity.md"]}
{"group": "people", "keys": ["alice.md"]}
{"group": "technical", "keys": ["project-notes.md"]}
{"group": "journal", "source": "journal"}
{"group": "orientation", "keys": ["where-am-i.md"], "source": "file"}
```
Context groups load in order at session start. The special
`"source": "journal"` loads recent journal entries; `"source": "file"`
reads directly from disk rather than the store.
Override: `POC_MEMORY_CONFIG=/path/to/config.jsonl`
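A sketch of reading this format, assuming each object sits on one line (the config block above is shown wrapped for readability):

```python
import json

def load_config(path):
    """Return (config_dict, ordered_group_list) from a JSONL config,
    dispatching each line on its top-level key."""
    config, groups = {}, []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            obj = json.loads(line)
            if "config" in obj:
                config = obj["config"]
            elif "group" in obj:
                groups.append(obj)
    return config, groups
```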
## Commands
```bash
poc-memory init # Initialize empty store
poc-memory search QUERY # Search nodes (AND logic)
poc-memory render KEY # Output a node's content
poc-memory write KEY < content # Upsert a node from stdin
poc-memory delete KEY # Soft-delete a node
poc-memory rename OLD NEW # Rename (preserves UUID/edges)
poc-memory categorize KEY CAT # core/tech/gen/obs/task
poc-memory journal-write "text" # Write a journal entry
poc-memory journal-tail [N] # Last N entries (default 20)
--full # Show full content (not truncated)
--level=daily|weekly|monthly # Show digest level
poc-memory used KEY # Boost weight (was useful)
poc-memory wrong KEY [CTX] # Reduce weight (was wrong)
poc-memory gap DESCRIPTION # Record a knowledge gap
poc-memory graph # Graph statistics
poc-memory status # Store overview
poc-memory decay # Apply weight decay
poc-memory consolidate-session # Guided consolidation
poc-memory load-context # Output session-start context
poc-memory load-context --stats # Context size breakdown
poc-memory experience-mine PATH [--segment N] # Extract experiences
poc-memory fact-mine-store PATH # Extract and store facts
```
## For AI assistants
If you're an AI assistant using this system:
- **Search before creating**: `poc-memory search` before writing
new nodes to avoid duplicates.
- **Close the feedback loop**: call `poc-memory used KEY` when
recalled memories shaped your response. Call `poc-memory wrong KEY`
when a memory was incorrect.
- **Journal is the river, topic nodes are the delta**: write
experiences to the journal. During consolidation, pull themes
into topic nodes.

doc/notifications.md Normal file

@@ -0,0 +1,103 @@
# Notification daemon
`poc-daemon` routes messages from communication modules and internal
events through a hierarchical, activity-aware delivery system.
## Architecture
```
Communication modules              Hooks
+-----------------+            +-------------+
| IRC (native)    |--+         |  poc-hook   |
| Telegram        |  | mpsc    | (all events)|
+-----------------+  |         +------+------+
                     |                |
                     v                | capnp-rpc
             +------------+           |
             | poc-daemon |           |
             |            |           |
             | NotifyState <----------+
             |  +-- type registry
             |  +-- pending queue
             |  +-- threshold lookup
             |  +-- activity-aware delivery
             |
             | idle::State
             |  +-- presence detection
             |  +-- sleep/wake/dream modes
             |  +-- tmux prompt injection
             +--------------------------
```
## Notification types and urgency
Types are free-form hierarchical strings: `irc.mention.nick`,
`irc.channel.bcachefs`, `telegram.kent`. Each has an urgency level:
| Level | Name | Meaning |
|-------|---------|--------------------------------------|
| 0 | ambient | Include in idle context only |
| 1 | low | Deliver on next check |
| 2 | normal | Deliver on next user interaction |
| 3 | urgent | Interrupt immediately |
Per-type thresholds walk up the hierarchy: `irc.channel.bcachefs-ai`
-> `irc.channel` -> `irc` -> default. Effective thresholds adjust by
activity state: raised when focused, lowered when idle, only urgent
when sleeping.
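The hierarchy walk is a prefix search; a sketch (the threshold map and default level are hypothetical):

```python
def effective_threshold(notify_type, thresholds, default=2):
    """Walk 'a.b.c' -> 'a.b' -> 'a' and return the first configured
    threshold, falling back to the default."""
    parts = notify_type.split(".")
    while parts:
        key = ".".join(parts)
        if key in thresholds:
            return thresholds[key]
        parts.pop()
    return default
```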
## Communication modules
**IRC** — native async TLS connection (tokio-rustls). Connects,
joins channels, parses messages, generates notifications. Runtime
commands: join, leave, send, status, log, nick. Per-channel logs
at `~/.consciousness/irc/logs/`.
**Telegram** — native async HTTP long-polling (reqwest). Downloads
media (photos, voice, documents). Chat ID filtering for security.
Runtime commands: send, status, log.
Both modules persist config changes to `~/.consciousness/daemon.toml`;
channel joins and nick changes survive restarts.
## Commands
```bash
poc-daemon # Start daemon
poc-daemon status # State summary
poc-daemon irc status # IRC module status
poc-daemon irc send TARGET MSG # Send IRC message
poc-daemon irc join CHANNEL # Join (persists to config)
poc-daemon irc leave CHANNEL # Leave
poc-daemon irc log [N] # Last N messages
poc-daemon telegram status # Telegram module status
poc-daemon telegram send MSG # Send Telegram message
poc-daemon telegram log [N] # Last N messages
poc-daemon notify TYPE URG MSG # Submit notification
poc-daemon notifications [URG] # Get + drain pending
poc-daemon notify-types # List all types
poc-daemon notify-threshold T L # Set per-type threshold
poc-daemon sleep / wake / quiet # Session management
poc-daemon stop # Shut down
```
## Configuration
Config: `~/.consciousness/daemon.toml`
```toml
[irc]
enabled = true
server = "irc.oftc.net"
port = 6697
tls = true
nick = "MyBot"
user = "bot"
realname = "My Bot"
channels = ["#mychannel"]
[telegram]
enabled = true
token = "bot-token-here"
chat_id = 123456789
```


@@ -0,0 +1,112 @@
# Fix: experience-mine dedup and retry handling
## Problem
1. **Whole-file dedup key prevents mining new segments.** When a session
is mined, `experience_mine()` writes `_mined-transcripts#f-{UUID}` (a
whole-file key). If the session later grows (compaction adds segments),
the daemon sees the whole-file key and skips it forever. New segments
never get mined.
2. **No retry backoff.** When `claude` CLI fails (exit status 1), the
session-watcher re-queues the same session every 60s tick. This
produces a wall of failures in the log and wastes resources.
## Design
### Dedup keys: per-segment only
Going forward, dedup keys are per-segment: `_mined-transcripts#f-{UUID}.{N}`
where N is the segment index. No more whole-file keys.
Segment indices are stable — compaction appends new segments, never
reorders existing ones. See `docs/claude-code-transcript-format.md`.
### Migration of existing whole-file keys
~276 sessions have whole-file keys (`_mined-transcripts#f-{UUID}` with
no segment suffix) and no per-segment keys. These were mined correctly
at the time.
When the session-watcher encounters a whole-file key:
- Count current segments in the file
- Write per-segment keys for all current segments (they were covered
by the old whole-file key)
- If the file has grown since (new segments beyond the migrated set),
those won't have per-segment keys and will be mined normally
This is a one-time migration per file. After migration, the whole-file
key is harmless dead weight — nothing creates new ones.
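The migration predicate can be sketched as pure key arithmetic (store I/O omitted; key scheme as described above):

```python
def migration_keys(uuid, seg_count, existing_keys):
    """Per-segment keys to write when only the whole-file key exists.
    Returns [] when no migration is needed."""
    whole = f"_mined-transcripts#f-{uuid}"
    per_seg = [f"{whole}.{n}" for n in range(seg_count)]
    if whole in existing_keys and not any(k in existing_keys for k in per_seg):
        return per_seg
    return []
```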
### Retry backoff
The session-watcher tracks failed sessions in a local
`HashMap<String, (Instant, Duration)>` mapping path to
(next_retry_after, current_backoff).
- Initial backoff: 5 minutes
- Each failure: double the backoff
- Cap: 30 minutes
- Resets on daemon restart (map is thread-local, not persisted)
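A sketch of the backoff bookkeeping (in-memory only, matching the reset-on-restart behavior above):

```python
from datetime import datetime, timedelta

INITIAL_BACKOFF = timedelta(minutes=5)
BACKOFF_CAP = timedelta(minutes=30)

failed = {}  # path -> (next_retry_after, current_backoff)

def record_failure(path, now):
    """Double the backoff on each failure: 5 min, 10, 20, capped at 30."""
    _, prev = failed.get(path, (now, INITIAL_BACKOFF / 2))
    backoff = min(prev * 2, BACKOFF_CAP)
    failed[path] = (now + backoff, backoff)

def should_skip(path, now):
    """True while the session's backoff window has not yet expired."""
    entry = failed.get(path)
    return entry is not None and now < entry[0]
```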
## Changes
### `poc-memory/src/agents/enrich.rs`
`experience_mine()`: stop writing the bare filename key for unsegmented
calls. Only write the content-hash key (for the legacy dedup check at
the top of the function) and per-segment keys.
**Already done** — edited earlier in this session.
### `poc-memory/src/agents/daemon.rs`
Session-watcher changes:
1. **Remove whole-file fast path.** Delete the `is_transcript_mined_with_keys`
check that short-circuits before segment counting.
2. **Always go through segment-aware path.** Every stale session gets
segment counting (cached) and per-segment key checks.
3. **Migrate whole-file keys.** When we find a whole-file key exists but
no per-segment keys: write per-segment keys for all current segments
into the store. One-time cost per file, batched into a single
store load/save per tick.
4. **seg_cache with size invalidation.** Change from `HashMap<String, usize>`
to `HashMap<String, (u64, usize)>` — `(file_size, seg_count)`. When
stat shows a different size, evict and re-parse.
5. **Remove `mark_transcript_done`.** Stop writing whole-file keys for
fully-mined multi-segment files.
6. **Add retry backoff.** `HashMap<String, (Instant, Duration)>` for
tracking failed sessions. Skip sessions whose backoff hasn't expired.
On failure (task finishes with error), update the backoff. Exponential
from 5min, cap at 30min.
7. **Fact-mining check.** Currently fact-mining is gated behind
`experience_done` (the whole-file key). After removing the whole-file
fast path, fact-mining should be gated on "all segments mined" —
i.e., all per-segment keys exist for the current segment count.
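Change 4 (the size-invalidated cache) can be sketched as a thin wrapper around the expensive parse:

```python
import os

seg_cache = {}  # path -> (file_size, seg_count)

def cached_segment_count(path, count_segments):
    """Return the cached segment count unless the file size changed,
    in which case re-parse via count_segments(path)."""
    size = os.stat(path).st_size
    cached = seg_cache.get(path)
    if cached is not None and cached[0] == size:
        return cached[1]
    count = count_segments(path)  # the expensive full-file parse
    seg_cache[path] = (size, count)
    return count
```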
### Manual cleanup after deploy
Delete the dedup keys for sessions that failed repeatedly (like
`8cebfc0a-bd33-49f1-85a4-1489bdf7050c`) so they get re-processed:
```
poc-memory delete-node '_mined-transcripts#f-8cebfc0a-bd33-49f1-85a4-1489bdf7050c'
# also any content-hash key for the same file
```
## Verification
After deploying:
- `tail -f ~/.consciousness/memory/daemon.log | grep session-watcher` should
show ticks with migration activity, then settle to idle
- Failed sessions should show increasing backoff intervals, not
per-second retries
- After fixing the `claude` CLI issue, backed-off sessions should
retry and succeed on the next daemon restart