Logging overhaul: per-task log files, daemon.log drill-down

Switch from jobkit-daemon crate to jobkit with daemon feature.
Wire up per-task log files for all daemon-spawned agent tasks.

Changes:
- Use jobkit::daemon:: instead of jobkit_daemon::
- All agent tasks get .log_dir() set to $data_dir/logs/
- Task log path shown in daemon status and TUI
- New CLI: poc-memory agent daemon log --task NAME
  Finds the task's log path from status or daemon.log, tails the file
- LLM backend selection logged to daemon.log via log_event
- Targeted agent job names include the target key for debuggability
- Logging architecture documented in doc/logging.md

Two-level logging, no duplication:
- daemon.log: lifecycle events with task log path for drill-down
- per-task logs: full agent output via ctx.log_line()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 11:17:07 -04:00 · 49f72cdac3
commit 49f72cdac3
parent f2c2c02a22
7 changed files with 192 additions and 54 deletions
--- a/doc/logging.md
+++ b/doc/logging.md
@@ -0,0 +1,76 @@
+# Logging Architecture
+
+poc-memory has multiple logging channels serving different purposes.
+Understanding which log to check is essential for debugging.
+
+## Log files
+
+### daemon.log — structured event log
+- **Path**: `$data_dir/daemon.log` (default: `~/.claude/memory/daemon.log`)
+- **Format**: JSONL — `{"ts", "job", "event", "detail"}`
+- **Written by**: `jobkit_daemon::event_log::log()`, wrapped by `log_event()` in daemon.rs
+- **Rotation**: truncates to last half when file exceeds 1MB
+- **Contains**: task lifecycle events (started, completed, failed, progress),
+  session-watcher ticks, scheduler events
+- **View**: `poc-memory agent daemon log [--job NAME] [--lines N]`
+- **Note**: the "daemon log" command reads this file and formats the JSONL
+  as human-readable lines with timestamps. The `--job` filter shows only
+  entries for a specific job name.
+
+### daemon-status.json — live snapshot
+- **Path**: `$data_dir/daemon-status.json`
+- **Format**: pretty-printed JSON
+- **Written by**: `write_status()` in daemon.rs, called periodically
+- **Contains**: current task list with states (pending/running/completed),
+  graph health metrics, consolidation plan, uptime
+- **View**: `poc-memory agent daemon status`
+
+### llm-logs/ — per-agent LLM call transcripts
+- **Path**: `$data_dir/llm-logs/{agent_name}/{timestamp}.txt`
+- **Format**: plaintext sections: `=== PROMPT ===`, `=== CALLING LLM ===`,
+  `=== RESPONSE ===`
+- **Written by**: `run_one_agent_inner()` in knowledge.rs
+- **Contains**: full prompt sent to the LLM and full response received.
+  One file per agent invocation. Invaluable for debugging agent quality —
+  shows exactly what the model saw and what it produced.
+- **Volume**: can be large — 292 files for distill alone as of Mar 19.
+
+### retrieval.log — memory search queries
+- **Path**: `$data_dir/retrieval.log`
+- **Format**: plaintext, one line per search: `[date] q="..." hits=N`
+- **Contains**: every memory search query and hit count. Useful for
+  understanding what the memory-search hook is doing and whether
+  queries are finding useful results.
+
+### daily-check.log — graph health history
+- **Path**: `$data_dir/daily-check.log`
+- **Format**: plaintext, multi-line entries with metrics
+- **Contains**: graph topology metrics over time (σ, α, gini, cc, fit).
+  Only ~10 entries — appended by the daily health check.
+
+## In-memory state (redundant with daemon.log)
+
+### ctx.log_line() — task output log
+- **Stored in**: jobkit task state (last 20 lines per task)
+- **Also writes to**: daemon.log via `log_event()` (as of Mar 19)
+- **View**: `daemon-status.json` → task → output_log, or just tail daemon.log
+- **Design note**: the in-memory buffer is redundant now that progress
+  events go to daemon.log. The status viewer should eventually just
+  tail daemon.log filtered by job name, eliminating the in-memory state.
+
+### ctx.set_progress() — current activity string
+- **Stored in**: jobkit task state
+- **View**: shown in status display next to the task name
+- **Note**: overwritten by each `ctx.log_line()` call.
+
+## What to check when
+
+| Problem                          | Check                              |
+|----------------------------------|------------------------------------|
+| Task not starting                | daemon-status.json (task states)   |
+| Task failing                     | daemon.log (failed events)         |
+| Agent producing bad output       | llm-logs/{agent}/{timestamp}.txt   |
+| Agent not finding right nodes    | retrieval.log (search queries)     |
+| Graph health declining           | daily-check.log                    |
+| Resource pool / parallelism      | **currently no log** — need to add |
+| Which LLM backend is being used  | daemon.log (llm-backend event)     |
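
The rotation rule that doc/logging.md gives for daemon.log ("truncates to last half when file exceeds 1MB") can be sketched as follows. This is a minimal illustration, not the jobkit implementation: the function name `truncate_to_last_half` is hypothetical, and aligning the cut to a newline so that no JSONL record is split is an assumption the doc does not state.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: if the file at `path` is larger than `max_bytes`,
/// rewrite it so that only (roughly) its last half remains.
/// Returns Ok(true) when the file was truncated.
fn truncate_to_last_half(path: &Path, max_bytes: u64) -> io::Result<bool> {
    let size = fs::metadata(path)?.len();
    if size <= max_bytes {
        return Ok(false);
    }

    let data = fs::read(path)?;
    let mid = data.len() / 2;

    // Assumption: keep whole lines only. If the midpoint already sits at a
    // line start, cut there; otherwise skip past the next newline.
    let start = if mid == 0 || data[mid - 1] == b'\n' {
        mid
    } else {
        match data[mid..].iter().position(|&b| b == b'\n') {
            Some(offset) => mid + offset + 1,
            None => mid, // no newline in the second half: cut at the midpoint
        }
    };

    fs::write(path, &data[start..])?;
    Ok(true)
}

fn main() -> io::Result<()> {
    // Demo on a scratch file with a small threshold instead of 1MB.
    let path = std::env::temp_dir().join("daemon-log-rotation-sketch.log");
    let mut content = String::new();
    for i in 0..100 {
        content.push_str(&format!("{{\"ts\":{},\"job\":\"demo\"}}\n", i));
    }
    fs::write(&path, &content)?;

    let truncated = truncate_to_last_half(&path, 1024)?;
    let kept = fs::read_to_string(&path)?;
    println!("truncated={} kept_lines={}", truncated, kept.lines().count());

    fs::remove_file(&path)?;
    Ok(())
}
```

Calling the helper with `1024 * 1024` as `max_bytes` would match the 1MB threshold in the doc. Reading the whole file into memory is acceptable at that size. A real daemon would also need to decide how this interacts with concurrent appends, which the sketch ignores.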