poc-memory daemon design sketch
2026-03-05, ProofOfConcept + Kent
Problem
Memory maintenance (extraction, consolidation, digests, knowledge loop)
currently runs via a cron job that shells out to separate scripts. This
is fragile (set -e silently ate today's failure), hard to observe,
and has no backpressure or dependency management. The conversation
extraction agent hasn't run in 2 days because Phase 1 crashed and
took Phase 2 with it.
Goal
poc-memory daemon — a single long-running process that owns all
background memory work. Structured so the orchestration logic can later
move into poc-agent when that's ready.
Jobs
Six job types, roughly in pipeline order:
| Job | Trigger | Cadence | Depends on |
|---|---|---|---|
| extract | session end detected | event | — |
| decay | schedule | daily | — |
| consolidate | schedule or post-extract | daily + after extract | extract |
| knowledge-loop | post-consolidate | daily | consolidate |
| digest | schedule | daily (+ weekly) | consolidate |
| health | schedule | hourly | — |
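The table above could be loaded as data at daemon startup. A sketch, assuming a hypothetical `JobSpec` type (names are illustrative, not final):

```rust
/// Illustrative job declaration; the real daemon would pair this
/// with the Trigger enum described under "Scheduler" below.
#[derive(Debug, Clone)]
enum Cadence {
    Event,                  // fired by the session watcher
    Hourly,
    Daily,
    AfterJob(&'static str), // runs when the named job completes
}

#[derive(Debug, Clone)]
struct JobSpec {
    name: &'static str,
    cadence: Vec<Cadence>,
    depends_on: Option<&'static str>,
}

fn job_table() -> Vec<JobSpec> {
    use Cadence::*;
    vec![
        JobSpec { name: "extract", cadence: vec![Event], depends_on: None },
        JobSpec { name: "decay", cadence: vec![Daily], depends_on: None },
        JobSpec { name: "consolidate", cadence: vec![Daily, AfterJob("extract")], depends_on: Some("extract") },
        JobSpec { name: "knowledge-loop", cadence: vec![AfterJob("consolidate")], depends_on: Some("consolidate") },
        JobSpec { name: "digest", cadence: vec![Daily], depends_on: Some("consolidate") },
        JobSpec { name: "health", cadence: vec![Hourly], depends_on: None },
    ]
}
```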
extract
Watch ~/.claude/projects/*/ for conversation jsonl files. A session
is "ended" when its jsonl hasn't been written to in 10 minutes AND no
claude process has it open (lsof/fuser check). Extract durable knowledge
using the observation agent (currently Python/Sonnet).
This is the most time-sensitive job — new material should enter the graph within ~15 minutes of a session ending, not wait 24 hours.
decay
Run weight decay on all nodes. Quick, no API calls. Currently
poc-memory decay.
consolidate
Graph health check, then agent runs: replay, linker, separator.
Link orphans, cap degree. Currently poc-memory consolidate-full
minus the digest step.
knowledge-loop
The four knowledge agents: observation (on any un-extracted fragments),
extractor, connector, challenger. Currently knowledge_loop.py.
Runs until convergence or budget exhausted.
digest
Generate daily and weekly digests from journal entries.
Currently part of consolidate-full.
health
Quick graph metrics snapshot. No API calls. Log metrics for trend tracking. Alert (via status file) if graph health degrades.
Architecture
poc-memory daemon
├── Scheduler
│ ├── clock triggers (daily, hourly)
│ ├── event triggers (session end, post-job)
│ └── condition triggers (health threshold)
├── Job Runner
│ ├── job queue (ordered by priority + dependencies)
│ ├── single-threaded Sonnet executor (backpressure)
│ └── retry logic (exponential backoff on API errors)
├── Session Watcher
│ ├── inotify on ~/.claude/projects/*/
│ ├── staleness + lsof check for session end
│ └── tracks which sessions have been extracted
├── Status Store
│ └── ~/.claude/memory/daemon-status.json
└── Logger
└── structured log → ~/.claude/memory/daemon.log
Scheduler
Three trigger types unified into one interface:
enum Trigger {
/// Fire at fixed intervals
Schedule { interval: Duration, last_run: Option<Instant> },
/// Fire when a condition is met
Event { kind: EventKind },
/// Fire when another job completes
After { job: JobId, only_on_success: bool },
}
enum EventKind {
SessionEnded(PathBuf), // specific jsonl path
HealthBelowThreshold,
}
Jobs declare their triggers and dependencies. The scheduler resolves ordering and ensures a job doesn't run while its dependency is still in progress.
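The dependency rule can be sketched as one readiness check (names are illustrative, not the final API): a job may start only when its declared dependency is neither queued nor mid-run.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum JobState { Idle, Queued, Running }

/// True if `job` may start: either it has no dependency, or the
/// dependency is idle (not queued and not currently in progress).
fn ready_to_run(
    job: &str,
    deps: &HashMap<&str, &str>,       // job → dependency
    states: &HashMap<&str, JobState>, // job → current state
) -> bool {
    match deps.get(job) {
        None => true,
        Some(dep) => matches!(states.get(dep), None | Some(&JobState::Idle)),
    }
}
```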
Sonnet executor
All API-calling jobs (extract, consolidate agents, knowledge loop) go through a single Sonnet executor. This provides:
- Serialization: one Sonnet call at a time (simple, avoids rate limits)
- Backpressure: if calls are failing, back off globally
- Cost tracking: log tokens used per job
- Timeout: kill calls that hang (the current scripts have no timeout)
Initially this just shells out to call-sonnet.sh like the Python
scripts do. Later it can use the API directly.
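The shell-out path with the timeout the current scripts lack might look like this — a std-only polling sketch rather than a real async runtime; the retry/backoff numbers are illustrative:

```rust
use std::process::{Child, Command};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Run one shelled-out call with a hard timeout, killing the child
/// if it hangs. Returns Ok(success) or Err on timeout/spawn failure.
fn run_with_timeout(mut cmd: Command, timeout: Duration) -> Result<bool, String> {
    let mut child: Child = cmd.spawn().map_err(|e| format!("spawn failed: {e}"))?;
    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait() {
            Ok(Some(status)) => return Ok(status.success()),
            Ok(None) if Instant::now() >= deadline => {
                let _ = child.kill(); // hung call: kill, let retry logic back off
                return Err(format!("timeout after {}s", timeout.as_secs()));
            }
            Ok(None) => sleep(Duration::from_millis(250)),
            Err(e) => return Err(format!("wait failed: {e}")),
        }
    }
}

/// Exponential backoff schedule for retries: 2 min, 4 min, 8 min, ... capped.
fn backoff(attempt: u32) -> Duration {
    Duration::from_secs(120 * 2u64.pow(attempt.min(5)))
}
```

The executor would build the `Command` around `call-sonnet.sh` and consult `backoff` globally, so a failing call slows every API job, not just the one that failed.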
Status store
{
"daemon": {
"pid": 12345,
"started": "2026-03-05T09:15:00-05:00",
"uptime_secs": 3600
},
"jobs": {
"extract": {
"state": "idle",
"last_run": "2026-03-05T10:30:00-05:00",
"last_result": "ok",
"last_duration_secs": 45,
"sessions_extracted": 3,
"next_scheduled": null
},
"consolidate": {
"state": "running",
"started": "2026-03-05T11:00:00-05:00",
"progress": "linker (3/5)",
"last_result": "ok",
"last_duration_secs": 892
},
"knowledge-loop": {
"state": "waiting",
"waiting_on": "consolidate",
"last_result": "ok",
"last_cycles": 25
}
},
"sonnet": {
"state": "busy",
"current_job": "consolidate",
"calls_today": 47,
"errors_today": 0,
"tokens_today": 125000
}
}
Queryable via poc-memory daemon status (reads the JSON, pretty-prints).
Also: poc-memory daemon status --json for programmatic access.
Session watcher
struct SessionWatcher {
/// jsonl paths we've already fully extracted
extracted: HashSet<PathBuf>,
/// jsonl paths we're watching for staleness
watching: HashMap<PathBuf, SystemTime>, // path → last_modified
}
On each tick (every 60s):
- Scan ~/.claude/projects/*/ for *.jsonl files
- For each unknown file, start watching
- For each watched file where mtime is >10 min old AND not open by any process → mark as ended, emit SessionEnded event
- Skip files in the extracted set
The extracted set persists to disk so we don't re-extract after daemon restart.
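The tick above can be sketched as one function over the two maps in the struct. The lsof check shells out as described under "extract" (exit status 0 means some process holds the file open); the 10-minute threshold matches that section. A sketch, not the final implementation:

```rust
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;
use std::process::Command;
use std::time::{Duration, SystemTime};

const STALE_AFTER: Duration = Duration::from_secs(10 * 60);

/// True if any process still has the file open, per lsof's exit status.
fn is_open(path: &PathBuf) -> bool {
    Command::new("lsof")
        .arg(path)
        .output()
        .map(|o| o.status.success())
        .unwrap_or(false) // if lsof itself fails, err toward "not open"
}

/// One watcher tick: returns the paths whose sessions just ended.
fn tick(
    watching: &mut HashMap<PathBuf, SystemTime>, // path → last_modified
    extracted: &HashSet<PathBuf>,
    now: SystemTime,
) -> Vec<PathBuf> {
    let mut ended = Vec::new();
    watching.retain(|path, mtime| {
        if extracted.contains(path) {
            return false; // already handled; stop watching
        }
        let stale = now.duration_since(*mtime).unwrap_or_default() >= STALE_AFTER;
        if stale && !is_open(path) {
            ended.push(path.clone());
            false // emit SessionEnded, drop from the watch map
        } else {
            true
        }
    });
    ended
}
```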
Logging
Structured log lines, one JSON object per line:
{"ts":"2026-03-05T11:00:00","job":"consolidate","event":"started"}
{"ts":"2026-03-05T11:00:05","job":"consolidate","event":"sonnet_call","tokens":2400,"duration_ms":3200}
{"ts":"2026-03-05T11:14:52","job":"consolidate","event":"completed","duration_secs":892,"result":"ok"}
{"ts":"2026-03-05T11:14:52","job":"consolidate","event":"error","msg":"Sonnet timeout after 600s","retry_in":120}
poc-memory daemon log tails the log with human-friendly formatting.
poc-memory daemon log --job extract filters to one job.
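Emitting those flat lines needs no JSON library — a minimal std-only sketch, mirroring the field set in the examples above (a real daemon would likely reach for serde_json once fields nest):

```rust
/// Build one structured log line matching the examples above.
/// `extras` are pre-formatted `"key":value` JSON fragments, e.g. `"tokens":2400`.
fn format_line(ts: &str, job: &str, event: &str, extras: &[&str]) -> String {
    let mut fields = vec![
        format!("\"ts\":\"{ts}\""),
        format!("\"job\":\"{job}\""),
        format!("\"event\":\"{event}\""),
    ];
    fields.extend(extras.iter().map(|s| s.to_string()));
    format!("{{{}}}", fields.join(","))
}

/// Append the line to the daemon log (one JSON object per line).
fn append(path: &std::path::Path, line: &str) -> std::io::Result<()> {
    use std::io::Write;
    let mut f = std::fs::OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(f, "{line}")
}
```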
Migration path
Phase 1: orchestration only
The daemon is just a scheduler + session watcher. Jobs still shell out
to poc-memory consolidate-full, knowledge_loop.py, etc. The value
is: reliable scheduling, session-end detection, centralized status,
error logging that doesn't disappear.
Phase 2: inline jobs
Move job logic into Rust one at a time. Decay and health are trivial (already Rust). Digests next. Consolidation agents and knowledge loop are bigger (currently Python + Sonnet prompts).
Phase 3: poc-agent integration
The daemon becomes a subsystem of poc-agent. The scheduler, status store, and Sonnet executor are reusable. The session watcher feeds into poc-agent's broader awareness of what's happening on the machine.
Open questions
- Signal handling: SIGHUP to reload config? SIGUSR1 to trigger immediate consolidation?
- Multiple claude sessions: extract should handle overlapping sessions (different project dirs). Currently there's usually just one.
- Budget limits: should the daemon have a daily token budget and stop when exhausted? Prevents runaway costs if something loops.
- Notification: when something fails, should it write to the telegram inbox? Or just log and let daemon status surface it?