flatten: move poc-memory contents to workspace root

No more subcrate nesting — src/, agents/, schema/, defaults/, build.rs
all live at the workspace root. poc-daemon remains as the only workspace
member. Crate name (poc-memory) and all imports unchanged.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
ProofOfConcept 2026-03-25 00:54:12 -04:00
parent 891cca57f8
commit 998b71e52c
113 changed files with 79 additions and 78 deletions


@@ -0,0 +1,202 @@
# Daemon & Jobkit Architecture Survey
_2026-03-14, autonomous survey while Kent debugs discard FIFO_
## Current state
daemon.rs is 1952 lines mixing three concerns:
- ~400 lines: pure jobkit usage (spawn, depend_on, resource)
- ~600 lines: logging/monitoring (log_event, status, RPC)
- ~950 lines: job functions embedding business logic
## What jobkit provides (good)
- Worker pool with named workers
- Dependency graph: `depend_on()` for ordering
- Resource pools: `ResourcePool` for concurrency gating (LLM slots)
- Retry logic: `retries(N)` on `TaskError::Retry`
- Task status tracking: `choir.task_statuses()` → `Vec<TaskInfo>`
- Cancellation: `ctx.is_cancelled()`
## What jobkit is missing
### 1. Structured logging (PRIORITY)
- Currently dual-channel: `ctx.log_line()` (per-task) + `log_event()` (daemon JSONL)
- No log levels, no structured context, no correlation IDs
- Log rotation is naive (truncate at 1MB, keep second half)
- Need: observability hooks that both human TUI and AI can consume
### 2. Metrics (NONE EXIST)
- No task duration histograms
- No worker utilization tracking
- No queue depth monitoring
- No success/failure rates by type
- No resource pool wait times
### 3. Health monitoring
- No watchdog timers
- No health check hooks per job
- No alerting on threshold violations
- Health computed on-demand in daemon, not in jobkit
### 4. RPC (ad-hoc in daemon, should be schematized)
- Unix socket with string matching: `match cmd.as_str()`
- No cap'n proto schema for daemon control
- No versioning, no validation, no streaming
## Architecture problems
### Tangled concerns
Job functions hardcode `log_event()` calls. Graph health is in daemon
but uses domain-specific metrics. Store loading happens inside jobs
(10 agent runs = 10 store loads). Not separable.
### Magic numbers
- Workers = `llm_concurrency + 3` (line 682)
- 10 max new jobs per tick (line 770)
- 300/1800s backoff range (lines 721-722)
- 1MB log rotation (line 39)
- 60s scheduler interval (line 24)
None configurable.
### Hardcoded pipeline DAG
Daily pipeline phases are `depend_on()` chains in Rust code (lines
1061-1109). Can't adjust without recompile. No visualization. No
conditional skipping of phases.
### Task naming is fragile
Names used as both identifiers AND for parsing in TUI. Format varies
(colons, dashes, dates). `task_group()` splits on '-' to categorize —
brittle.
### No persistent task queue
Restart loses all pending tasks. Session watcher handles this via
reconciliation (good), but scheduler uses `last_daily` date from file.
## What works well
1. **Reconciliation-based session discovery** — elegant, restart-resilient
2. **Resource pooling** — LLM concurrency decoupled from worker count
3. **Dependency-driven pipeline** — clean DAG via `depend_on()`
4. **Retry with backoff** — exponential 5min→30min, resets on success
5. **Graceful shutdown** — SIGINT/SIGTERM handled properly
## Kent's design direction
### Event stream, not log files
One pipeline, multiple consumers. TUI renders for humans, AI consumes
structured data. Same events, different renderers. Cap'n Proto streaming
subscription: `subscribe(filter) -> stream<Event>`.
"No one ever thinks further ahead than log files with monitoring and
it's infuriating." — Kent
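The single-pipeline/multiple-consumers shape can be sketched as a minimal in-process event bus — the `Event` fields, `EventBus` type, and `subscribe` signature here are invented for illustration, not jobkit's actual API; the real design would use Cap'n Proto streaming rather than in-process channels:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical structured event; fields are illustrative only.
#[derive(Clone, Debug)]
struct Event {
    task: String,
    kind: String, // "start" | "progress" | "complete"
}

// One pipeline, many consumers: each subscriber registers a filter and
// receives only the events that match it, on its own channel.
struct EventBus {
    subs: Vec<(Box<dyn Fn(&Event) -> bool>, Sender<Event>)>,
}

impl EventBus {
    fn new() -> Self {
        Self { subs: Vec::new() }
    }

    // subscribe(filter) -> stream<Event>, approximated with an mpsc Receiver.
    fn subscribe<F: Fn(&Event) -> bool + 'static>(&mut self, filter: F) -> Receiver<Event> {
        let (tx, rx) = channel();
        self.subs.push((Box::new(filter), tx));
        rx
    }

    fn publish(&self, ev: Event) {
        for (filter, tx) in &self.subs {
            if filter(&ev) {
                let _ = tx.send(ev.clone());
            }
        }
    }
}

fn main() {
    let mut bus = EventBus::new();
    // TUI wants everything; an AI consumer only wants completions.
    let all = bus.subscribe(|_| true);
    let done = bus.subscribe(|e| e.kind == "complete");

    bus.publish(Event { task: "fact-mine".into(), kind: "start".into() });
    bus.publish(Event { task: "fact-mine".into(), kind: "complete".into() });

    assert_eq!(all.try_iter().count(), 2);
    assert_eq!(done.try_iter().count(), 1);
}
```

Same events, different renderers: the TUI and the AI consumer differ only in their filter and in how they format what they receive.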
### Extend jobkit, don't add a layer
jobkit already has the scheduling and dependency graph. Don't create a
new orchestration layer — add the missing pieces (logging, metrics,
health, RPC) to jobkit itself.
### Cap'n Proto for everything
Standard RPC definitions for:
- Status queries (what's running, pending, failed)
- Control (start, stop, restart, queue)
- Event streaming (subscribe with filter)
- Health checks
## The bigger picture: bcachefs as library
Kent's monitoring system in bcachefs (event_inc/event_inc_trace + x-macro
counters) is the real monitoring infrastructure. 1-1 correspondence between
counters (cheap, always-on dashboard via `fs top`) and tracepoints (expensive
detail, only runs when enabled). The x-macro enforces this — can't have one
without the other.
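A rough Rust analogue of the x-macro idea (event names here are invented, not real bcachefs counters): one declarative macro expands a single event list into both the counter index and the trace-name table, so neither can exist without the other.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// One list, two artifacts: the enum that indexes counters and the
// trace-name lookup are generated from the same token list.
macro_rules! events {
    ($($name:ident),* $(,)?) => {
        #[derive(Copy, Clone, Debug, PartialEq)]
        #[allow(non_camel_case_types)]
        enum Event { $($name),* }

        // Number of events, derived from the same list.
        const COUNT: usize = [$(stringify!($name)),*].len();

        fn trace_name(e: Event) -> &'static str {
            match e { $(Event::$name => stringify!($name)),* }
        }
    };
}

// Illustrative event names only.
events!(btree_split, journal_flush, discard_fifo_full);

// Cheap, always-on counters — one slot per event.
struct Counters(Vec<AtomicU64>);

impl Counters {
    fn new() -> Self {
        Self((0..COUNT).map(|_| AtomicU64::new(0)).collect())
    }
    fn inc(&self, e: Event) {
        self.0[e as usize].fetch_add(1, Ordering::Relaxed);
    }
    fn get(&self, e: Event) -> u64 {
        self.0[e as usize].load(Ordering::Relaxed)
    }
}

fn main() {
    let c = Counters::new();
    c.inc(Event::journal_flush);
    c.inc(Event::journal_flush);
    assert_eq!(c.get(Event::journal_flush), 2);
    assert_eq!(trace_name(Event::discard_fifo_full), "discard_fifo_full");
}
```

Adding an event to the `events!` list grows the counter table and the trace-name table in one place — the 1-1 correspondence is enforced by construction.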
When the Rust conversion is complete, bcachefs becomes a library. At that
point, jobkit doesn't need its own monitoring — it uses the same counter/
tracepoint infrastructure. One observability system for everything.
**Implication for now:** jobkit monitoring just needs to be good enough.
JSON events, not typed. Don't over-engineer — the real infrastructure is
coming from the Rust conversion.
## Extraction: jobkit-daemon library (designed with Kent)
### Goes to jobkit-daemon (generic)
- JSONL event logging with size-based rotation
- Unix domain socket server + signal handling
- Status file writing (periodic JSON snapshot)
- `run_job()` wrapper (logging + progress + error mapping)
- Systemd service installation
- Worker pool setup from config
- Cap'n Proto RPC for control protocol
### Stays in poc-memory (application)
- All job functions (experience-mine, fact-mine, consolidation, etc.)
- Session watcher, scheduler, RPC command handlers
- GraphHealth, consolidation plan logic
### Interface design
- Cap'n Proto RPC for typed operations (submit, cancel, subscribe)
- JSON blob for status (inherently open-ended, every app has different
job types — typing this is the tracepoint mistake)
- Application registers: RPC handlers, long-running tasks, job functions
- ~50-100 lines of setup code, call `daemon.run()`
## Plan of attack
1. **Observability hooks in jobkit** — `on_task_start/progress/complete`
callbacks that consumers can subscribe to
2. **Structured event type** — typed events with task ID, name, duration,
result, metadata. Not strings.
3. **Metrics collection** — duration histograms, success rates, queue
depth. Built on the event stream.
4. **Cap'n Proto daemon RPC schema** — replace ad-hoc socket protocol
5. **TUI consumes event stream** — same data as AI consumer
6. **Extract monitoring from daemon.rs** — the 600 lines of logging/status
become generic, reusable infrastructure
7. **Declarative pipeline config** — DAG definition in config, not code
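Steps 2 and 3 of the plan fit together: a minimal sketch, assuming a hypothetical typed completion event (these names are placeholders, not jobkit's API), of metrics derived purely from the event stream rather than from log parsing.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical structured event (plan step 2): typed fields, not strings.
struct TaskComplete {
    name: String,
    duration: Duration,
    ok: bool,
}

// Plan step 3: metrics are a fold over completion events.
#[derive(Default)]
struct Metrics {
    // per task type: (runs, failures, total duration)
    by_type: HashMap<String, (u64, u64, Duration)>,
}

impl Metrics {
    fn on_complete(&mut self, ev: &TaskComplete) {
        let e = self.by_type.entry(ev.name.clone()).or_default();
        e.0 += 1;
        if !ev.ok {
            e.1 += 1;
        }
        e.2 += ev.duration;
    }

    fn failure_rate(&self, name: &str) -> f64 {
        match self.by_type.get(name) {
            Some((n, f, _)) if *n > 0 => *f as f64 / *n as f64,
            _ => 0.0,
        }
    }
}

fn main() {
    let mut m = Metrics::default();
    m.on_complete(&TaskComplete { name: "fact-mine".into(), duration: Duration::from_secs(2), ok: true });
    m.on_complete(&TaskComplete { name: "fact-mine".into(), duration: Duration::from_secs(4), ok: false });
    assert!((m.failure_rate("fact-mine") - 0.5).abs() < 1e-9);
}
```

Duration histograms and queue-depth gauges follow the same pattern: another consumer of the same events, no extra instrumentation in job functions.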
## File reference
- `src/agents/daemon.rs` — 1952 lines, all orchestration
- Job functions: 96-553
- run_daemon(): 678-1143
- Socket/RPC: 1145-1372
- Status display: 1374-1682
- `src/tui.rs` — 907 lines, polls status socket every 2s
- `schema/memory.capnp` — 125 lines, data only, no RPC definitions
- `src/config.rs` — configuration loading
- External: `jobkit` crate (git dependency)
## Mistakes I made building this (learning notes)
_Per Kent's instruction: note what went wrong and WHY._
1. **Dual logging channels** — I added `log_event()` because `ctx.log_line()`
wasn't enough, instead of fixing the underlying abstraction. Symptom:
can't find a failed job without searching two places.
2. **Magic numbers** — I hardcoded constants because "I'll make them
configurable later." Later never came. Every magic number is a design
decision that should have been explicit.
3. **1952-line file** — daemon.rs grew organically because each new feature
was "just one more function." Should have extracted when it passed 500
lines. The pain of refactoring later is always worse than the pain of
organizing early.
4. **Ad-hoc RPC** — String matching seemed fine for 2 commands. Now it's 4
commands and growing, with implicit formats. Should have used cap'n proto
from the start — the schema IS the documentation.
5. **No tests** — Zero tests in daemon code. "It's a daemon, how do you test
it?" is not an excuse. The job functions are pure-ish and testable. The
scheduler logic is testable with a clock abstraction.
6. **Not using systemd** — There's a systemd service for the daemon.
I keep starting it manually with `poc-memory agent daemon start` and
accumulating multiple instances. Tonight: 4 concurrent daemons, 32
cores pegged at 95%, load average 92. USE SYSTEMD. That's what it's for.
`systemctl --user start poc-memory-daemon`. ONE instance. Managed.
Pattern: every shortcut was "just for now" and every "just for now" became
permanent. Kent's yelling was right every time.


@@ -0,0 +1,98 @@
# Link Strength Feedback Design
_2026-03-14, designed with Kent_
## The two signals
### "Not relevant" → weaken the EDGE
The routing failed. Search followed a link and arrived at a node that
doesn't relate to what I was looking for. The edge carried activation
where it shouldn't have.
- Trace back through memory-search's recorded activation path
- Identify which edge(s) carried activation to the bad result
- Weaken those edges by a conscious-scale delta (0.01)
### "Not useful" → weaken the NODE
The routing was correct but the content is bad. The node itself isn't
valuable — stale, wrong, poorly written, duplicate.
- Downweight the node (existing `poc-memory wrong` behavior)
- Don't touch the edges — the path was correct, the destination was bad
## Three tiers of adjustment
### Tier 1: Agent automatic (0.00001 per event)
- Agent follows edge A→B during a run
- If the run produces output that gets `used` → strengthen A→B
- If the run produces nothing useful → weaken A→B
- The agent doesn't know this is happening — daemon tracks it
- Clamped to [0.05, 0.95] — edges can never hit 0 or 1
- Logged: every adjustment recorded with (agent, edge, delta, timestamp)
### Tier 2: Conscious feedback (0.01 per event)
- `poc-memory not-relevant KEY` → trace activation path, weaken edges
- `poc-memory not-useful KEY` → downweight node
- `poc-memory used KEY` → strengthen edges in the path that got here
- 100x stronger than agent signal — deliberate judgment
- Still clamped, still logged
### Tier 3: Manual override (direct set)
- `poc-memory graph link-strength SRC DST VALUE` → set directly
- For when we know exactly what a strength should be
- Rare, but needed for bootstrapping / correction
## Implementation: recording the path
memory-search already computes the spread activation trace. Need to:
1. Record the activation path for each result (which edges carried how
much activation to arrive at this node)
2. Persist this per-session so `not-relevant` can look it up
3. The `record-hits` RPC already sends keys to the daemon — extend
to include (key, activation_path) pairs
## Implementation: agent tracking
In the daemon's job functions:
1. Before LLM call: record which nodes and edges the agent received
2. After LLM call: parse output for LINK/WRITE_NODE actions
3. If actions are created and later get `used` → the input edges were useful
4. If no actions or actions never used → the input edges weren't useful
5. This is a delayed signal — requires tracking across time
Simpler first pass: just track co-occurrence. If two nodes appear
together in a successful agent run, strengthen the edge between them.
No need to track which specific edge was "followed."
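The co-occurrence first pass can be sketched in a few lines, reusing the Tier 1 delta and the clamp range from this document (the `strengths` map and function names are illustrative, not the real store API):

```rust
use std::collections::HashMap;

// Tier 1 automatic delta, per the three-tier design above.
const AGENT_DELTA: f32 = 0.00001;

fn adjust(current: f32, delta: f32) -> f32 {
    (current + delta).clamp(0.05, 0.95)
}

// Simpler first pass: every pair of nodes that appeared together in a run
// gets its edge nudged — up on success, down otherwise. No per-edge
// "followed" tracking needed.
fn strengthen_cooccurring(
    strengths: &mut HashMap<(String, String), f32>,
    nodes: &[&str],
    run_succeeded: bool,
) {
    let delta = if run_succeeded { AGENT_DELTA } else { -AGENT_DELTA };
    for i in 0..nodes.len() {
        for j in (i + 1)..nodes.len() {
            // Canonical (sorted) key so A→B and B→A share one entry.
            let (a, b) = if nodes[i] < nodes[j] {
                (nodes[i], nodes[j])
            } else {
                (nodes[j], nodes[i])
            };
            let s = strengths
                .entry((a.to_string(), b.to_string()))
                .or_insert(0.5); // assumed default strength for a new edge
            *s = adjust(*s, delta);
        }
    }
}

fn main() {
    let mut strengths = HashMap::new();
    strengthen_cooccurring(&mut strengths, &["btree", "journal"], true);
    let s = strengths[&("btree".to_string(), "journal".to_string())];
    assert!((s - 0.50001).abs() < 1e-6);
}
```

At 0.00001 per event the signal is deliberately tiny — only sustained co-occurrence across many runs moves an edge noticeably, which is the point of the conscious tier being 100x stronger.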
## Clamping
```rust
fn adjust_strength(current: f32, delta: f32) -> f32 {
(current + delta).clamp(0.05, 0.95)
}
```
Strengths saturate at the clamp bounds: a hard floor of 0.05 and ceiling
of 0.95, so an edge can never reach 0 or 1. This prevents dead edges
(a weakened edge can always be revived by a strong signal) and prevents
edges from becoming unweakenable.
## Logging
Every adjustment logged as JSON event:
```json
{"ts": "...", "event": "strength_adjust", "source": "agent|conscious|manual",
 "edge": ["nodeA", "nodeB"], "old": 0.45, "new": 0.45001, "delta": 0.00001,
 "reason": "co-retrieval in linker run c-linker-42"}
```
This lets us:
- Watch the distribution shift over time
- Identify edges that are oscillating (being pulled both ways)
- Tune the delta values based on observed behavior
- Roll back if something goes wrong
## Migration from current commands
- `poc-memory wrong KEY [CTX]` → splits into `not-relevant` and `not-useful`
- `poc-memory used KEY` → additionally strengthens edges in activation path
- Both old commands continue to work for backward compat, mapped to the
most likely intent (wrong → not-useful, used → strengthen path)


@@ -0,0 +1,254 @@
# Query Language Design — Unifying Search and Agent Selection
Date: 2026-03-10
Status: Phase 1 complete (2026-03-10)
## Problem
Agent node selection is hardcoded in Rust (`prompts.rs`). Adding a new
agent means editing Rust, recompiling, restarting the daemon. The
existing search pipeline (spread, spectral, etc.) handles graph
exploration but can't express structured predicates on node fields.
We need one system that handles both:
- **Search**: "find nodes related to these terms" (graph exploration)
- **Selection**: "give me episodic nodes not seen by linker in 7 days,
sorted by priority" (structured predicates)
## Design Principle
The pipeline already exists: stages compose left-to-right, each
transforming a result set. We extend it with predicate stages that
filter/sort on node metadata, alongside the existing graph algorithm
stages.
An agent definition becomes a query expression + prompt template.
The daemon scheduler is just "which queries have stale results."
## Current Pipeline
```
seeds → [stage1] → [stage2] → ... → results
```
Each stage takes `Vec<(String, f64)>` (key, score) and returns the same.
Stages are parsed from strings: `spread,max_hops=4` or `spectral,k=20`.
## Proposed Extension
### Two kinds of stages
**Generators** — produce a result set from nothing (or from the store):
```
all # every non-deleted node
match:btree # text match (current seed extraction)
```
**Filters** — narrow an existing result set:
```
type:episodic # node_type == EpisodicSession
type:semantic # node_type == Semantic
key:journal#j-* # glob match on key
key-len:>=60 # key length predicate
weight:>0.5 # numeric comparison
age:<7d # created/modified within duration
content-len:>1000 # content size filter
provenance:manual # provenance match
not-visited:linker,7d # not seen by agent in duration
visited:linker # HAS been seen by agent (for auditing)
community:42 # community membership
```
**Transforms** — reorder or reshape:
```
sort:priority # consolidation priority scoring
sort:timestamp # by timestamp (desc by default)
sort:content-len # by content size
sort:degree # by graph degree
sort:weight # by weight
limit:20 # truncate
```
**Graph algorithms** (existing, unchanged):
```
spread # spreading activation
spectral,k=20 # spectral nearest neighbors
confluence # multi-source reachability
geodesic # straightest spectral paths
manifold # extrapolation along seed direction
```
### Syntax
Pipe-separated stages, same as current `-p` flag:
```
all | type:episodic | not-visited:linker,7d | sort:priority | limit:20
```
Or on the command line:
```
poc-memory search -p all -p type:episodic -p not-visited:linker,7d -p sort:priority -p limit:20
```
Current search still works unchanged:
```
poc-memory search btree journal -p spread
```
(terms become `match:` seeds implicitly)
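The disambiguation rule (bare words are algorithms, `:` means predicate) can be sketched as a parser for a few representative stages — the `Stage` and `Cmp` names here are placeholders, not the real `search.rs` types, and the real parser would cover the full predicate list:

```rust
// Illustrative subset of the stage grammar.
#[derive(Debug, PartialEq)]
enum Stage {
    Algorithm(String),          // bare word: spread, spectral,k=20, ...
    TypeFilter(String),         // type:episodic
    WeightFilter(Cmp, f64),     // weight:>0.5
    Limit(usize),               // limit:20
}

#[derive(Debug, PartialEq)]
enum Cmp { Gt, Gte, Lt, Lte }

fn parse_stage(s: &str) -> Option<Stage> {
    let s = s.trim();
    match s.split_once(':') {
        // No ':' → try algorithm names first, per the compatibility rule.
        None => Some(Stage::Algorithm(s.to_string())),
        Some(("type", v)) => Some(Stage::TypeFilter(v.to_string())),
        Some(("limit", v)) => v.parse().ok().map(Stage::Limit),
        Some(("weight", v)) => {
            let (cmp, num) = if let Some(r) = v.strip_prefix(">=") { (Cmp::Gte, r) }
                else if let Some(r) = v.strip_prefix("<=") { (Cmp::Lte, r) }
                else if let Some(r) = v.strip_prefix('>') { (Cmp::Gt, r) }
                else if let Some(r) = v.strip_prefix('<') { (Cmp::Lt, r) }
                else { return None };
            num.parse().ok().map(|n| Stage::WeightFilter(cmp, n))
        }
        // Unknown predicate → parse error (many more cases in the real thing).
        _ => None,
    }
}

fn main() {
    assert_eq!(parse_stage("spread"), Some(Stage::Algorithm("spread".into())));
    assert_eq!(parse_stage("limit:20"), Some(Stage::Limit(20)));
    assert_eq!(parse_stage("weight:>0.5"), Some(Stage::WeightFilter(Cmp::Gt, 0.5)));
}
```

Splitting the pipeline string on `|` and mapping each piece through `parse_stage` gives the stage list; a malformed predicate fails at parse time, before anything touches the store.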
### Agent definitions
A TOML file in `~/.claude/memory/agents/`:
```toml
# agents/linker.toml
[query]
pipeline = "all | type:episodic | not-visited:linker,7d | sort:priority | limit:20"
[prompt]
template = "linker.md"
placeholders = ["TOPOLOGY", "NODES"]
[execution]
model = "sonnet"
actions = ["link-add", "weight"] # allowed poc-memory actions in response
schedule = "daily" # or "on-demand"
```
The daemon reads agent definitions, executes their queries, fills
templates, calls the model, records visits on success.
### Implementation Plan
#### Phase 1: Filter stages in pipeline
Add to `search.rs`:
```rust
enum Stage {
Generator(Generator),
Filter(Filter),
Transform(Transform),
Algorithm(Algorithm), // existing
}
enum Generator {
All,
Match(Vec<String>), // current seed extraction
}
enum Filter {
Type(NodeType),
KeyGlob(String),
KeyLen(Comparison),
Weight(Comparison),
Age(Comparison), // vs now - timestamp
ContentLen(Comparison),
Provenance(Provenance),
NotVisited { agent: String, duration: Duration },
Visited { agent: String },
Community(u32),
}
enum Transform {
Sort(SortField),
Limit(usize),
}
enum Comparison {
Gt(f64),
Gte(f64),
Lt(f64),
Lte(f64),
Eq(f64),
}
enum SortField {
Priority,
Timestamp,
ContentLen,
Degree,
Weight,
}
```
The pipeline runner checks stage type:
- Generator: ignores input, produces new result set
- Filter: keeps items matching predicate, preserves scores
- Transform: reorders or truncates
- Algorithm: existing graph exploration (needs Graph)
Filter/Transform stages need access to the Store (for node fields)
and VisitIndex (for visit predicates). The `StoreView` trait already
provides node access; extend it for visits.
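The dispatch described above can be sketched with a toy store and three stage kinds — `Node`, `Stage`, and `run_pipeline` are illustrative stand-ins, not the real `search.rs` types, and algorithm stages are omitted since they already exist:

```rust
use std::collections::HashMap;

// Toy node with just the fields the example filters need.
struct Node {
    node_type: String,
    weight: f64,
}

enum Stage {
    All,                // generator: produces a fresh result set
    TypeIs(String),     // filter: keeps matching items, scores intact
    Limit(usize),       // transform: truncates
}

// Each stage transforms Vec<(key, score)>, same as the existing pipeline.
fn run_pipeline(store: &HashMap<String, Node>, stages: &[Stage]) -> Vec<(String, f64)> {
    let mut results: Vec<(String, f64)> = Vec::new();
    for stage in stages {
        match stage {
            Stage::All => {
                // Generator ignores its input entirely.
                results = store.iter().map(|(k, n)| (k.clone(), n.weight)).collect();
                results.sort_by(|a, b| a.0.cmp(&b.0)); // deterministic order
            }
            Stage::TypeIs(t) => {
                // Filter needs the store for node fields; scores pass through.
                results.retain(|(k, _)| store.get(k).map_or(false, |n| &n.node_type == t));
            }
            Stage::Limit(n) => results.truncate(*n),
        }
    }
    results
}

fn main() {
    let mut store = HashMap::new();
    store.insert("a".to_string(), Node { node_type: "episodic".into(), weight: 0.9 });
    store.insert("b".to_string(), Node { node_type: "semantic".into(), weight: 0.4 });
    let out = run_pipeline(&store, &[Stage::All, Stage::TypeIs("episodic".into()), Stage::Limit(5)]);
    assert_eq!(out, vec![("a".to_string(), 0.9)]);
}
```

Note how the filter takes `&HashMap` directly, mirroring option (b): filter stages get `&Store`, while algorithm stages keep their `&dyn StoreView` path.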
#### Phase 2: Agent-as-config
Parse TOML agent definitions. The daemon:
1. Reads `agents/*.toml`
2. For each with `schedule = "daily"`, checks if query results have
been visited recently enough
3. If stale, executes: parse pipeline → run query → format nodes →
fill template → call model → parse actions → record visits
Hot reload: watch the agents directory, pick up changes without restart.
#### Phase 3: Retire hardcoded agents
Migrate each hardcoded agent (replay, linker, separator, transfer,
rename, split) to a TOML definition. Remove the match arms from
`agent_prompt()`. The separator agent is the trickiest — its
"interference pair" selection is a join-like operation that may need
a custom generator stage rather than simple filtering.
## What we're NOT building
- A general-purpose SQL engine. No joins, no GROUP BY, no subqueries.
- Persistent indices. At ~13k nodes, full scan with predicate evaluation
is fast enough (~1ms). Add indices later if profiling demands it.
- A query optimizer. Pipeline stages execute in declaration order.
## StoreView Considerations
The existing `StoreView` trait only exposes `(key, content, weight)`.
Filter stages need access to `node_type`, `timestamp`, `key`, etc.
Options:
- (a) Expand StoreView with `node_meta()` returning a lightweight struct
- (b) Filter stages require `&Store` directly (not trait-polymorphic)
- (c) Add `fn node(&self, key: &str) -> Option<NodeRef>` to StoreView
Option (b) is simplest for now — agents always use a full Store. The
search hook (MmapView path) doesn't need agent filters. We can
generalize to (c) later if MmapView needs filter support.
For Phase 1, filter stages take `&Store` and the pipeline runner
dispatches: algorithm stages use `&dyn StoreView`, filter/transform
stages use `&Store`. This keeps the fast MmapView path for interactive
search untouched.
## Open Questions
1. **Separator agent**: Its "interference pairs" selection doesn't fit
the filter model cleanly. Best option is a custom generator stage
`interference-pairs,min_sim=0.5` that produces pair keys.
2. **Priority scoring**: `sort:priority` calls `consolidation_priority()`
which needs graph + spectral. This is a transform that needs the
full pipeline context — treat it as a "heavy sort" that's allowed
to compute.
3. **Duration syntax**: `7d`, `24h`, `30m`. Parse with simple regex
`(\d+)(d|h|m)` → seconds.
4. **Negation**: Prefix `!` on predicate: `!type:episodic`.
5. **Backwards compatibility**: Current `-p spread` syntax must keep
working. The parser tries algorithm names first, then predicate
syntax. No ambiguity since algorithms are bare words and predicates
use `:`.
6. **Stage ordering**: Generators must come first (or the pipeline
starts with implicit "all"). Filters/transforms can interleave
freely with algorithms. The runner validates this at parse time.