Kent Overstreet 49f72cdac3 Logging overhaul: per-task log files, daemon.log drill-down

Switch from jobkit-daemon crate to jobkit with daemon feature.
Wire up per-task log files for all daemon-spawned agent tasks.

Changes:
- Use jobkit::daemon:: instead of jobkit_daemon::
- All agent tasks get .log_dir() set to $data_dir/logs/
- Task log path shown in daemon status and TUI
- New CLI: poc-memory agent daemon log --task NAME
  Finds the task's log path from status or daemon.log, tails the file
- LLM backend selection logged to daemon.log via log_event
- Targeted agent job names include the target key for debuggability
- Logging architecture documented in doc/logging.md

Two-level logging, no duplication:
- daemon.log: lifecycle events with task log path for drill-down
- per-task logs: full agent output via ctx.log_line()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-19 11:17:07 -04:00


Logging Architecture

poc-memory has multiple logging channels serving different purposes. Understanding which log to check is essential for debugging.

Log files

daemon.log — structured event log

Path: $data_dir/daemon.log (default: ~/.claude/memory/daemon.log)
Format: JSONL — {"ts", "job", "event", "detail"}
Written by: jobkit::daemon::event_log::log(), wrapped by log_event() in daemon.rs
Rotation: truncates to last half when file exceeds 1MB
Contains: task lifecycle events (started, completed, failed, progress), session-watcher ticks, scheduler events
View: poc-memory agent daemon log [--job NAME] [--lines N]
Note: the "daemon log" command reads this file and formats the JSONL as human-readable lines with timestamps. The --job filter shows only entries for a specific job name.
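For ad-hoc inspection outside the CLI, the same filtering can be approximated in a few lines. This is a minimal sketch, assuming each line is a JSON object with the four keys listed above. The function names and the output layout are illustrative and are not taken from poc-memory.

```python
import json
from typing import Iterable, Iterator, List, Optional


def iter_events(lines: Iterable[str], job: Optional[str] = None) -> Iterator[dict]:
    """Yield parsed JSONL events, optionally keeping only one job name."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Skip unparseable lines (for example a partial line left behind
            # if rotation cut mid-line; that behaviour is an assumption).
            continue
        if not isinstance(event, dict):
            continue
        if job is None or event.get("job") == job:
            yield event


def format_event(event: dict) -> str:
    """Render one event as a single line. The layout is hypothetical."""
    return "{ts} [{job}] {event}: {detail}".format(
        ts=event.get("ts", "?"),
        job=event.get("job", "?"),
        event=event.get("event", "?"),
        detail=event.get("detail", ""),
    )


def tail_events(lines: Iterable[str], job: Optional[str] = None,
                last_n: int = 20) -> List[str]:
    """Return the last `last_n` formatted events, optionally for one job."""
    if last_n <= 0:
        return []
    return [format_event(e) for e in iter_events(lines, job)][-last_n:]
```

To use it, open daemon.log and pass the file object as `lines`, for example `tail_events(open(path), job="NAME")`.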

daemon-status.json — live snapshot

Path: $data_dir/daemon-status.json
Format: pretty-printed JSON
Written by: write_status() in daemon.rs, called periodically
Contains: current task list with states (pending/running/completed), graph health metrics, consolidation plan, uptime
View: poc-memory agent daemon status
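The snapshot can also be read directly when the CLI is not available. The sketch below groups task names by state. The key names `tasks`, `name`, and `status` are assumptions made for illustration; only the state values come from the description above, so check a real daemon-status.json before relying on them.

```python
import json
from collections import defaultdict


def tasks_by_state(status_text: str) -> dict:
    """Group task names by state from a status snapshot.

    ASSUMPTION: the snapshot has a top-level "tasks" list whose items carry
    "name" and "status" keys. These key names are hypothetical.
    """
    snapshot = json.loads(status_text)
    grouped = defaultdict(list)
    for task in snapshot.get("tasks", []):
        grouped[task.get("status", "unknown")].append(task.get("name", "?"))
    return dict(grouped)
```

A task that stays under `pending` across several snapshots is the usual sign of a task that is not starting.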

llm-logs/ — per-agent LLM call transcripts

Path: $data_dir/llm-logs/{agent_name}/{timestamp}.txt
Format: plaintext sections: === PROMPT ===, === CALLING LLM ===, === RESPONSE ===
Written by: run_one_agent_inner() in knowledge.rs
Contains: full prompt sent to the LLM and full response received. One file per agent invocation. Invaluable for debugging agent quality — shows exactly what the model saw and what it produced.
Volume: can be large — 292 files for distill alone as of Mar 19.
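With this many files, it helps to pull the prompt and response apart programmatically. A minimal sketch, assuming each of the three markers named above sits on its own line; the function name is hypothetical.

```python
import re

# Section markers taken from the format description above.
# ASSUMPTION: each marker occupies a full line of the transcript.
MARKER = re.compile(r"^=== (PROMPT|CALLING LLM|RESPONSE) ===[ \t]*$", re.MULTILINE)


def split_transcript(text: str) -> dict:
    """Return {section_name: body} for one transcript file."""
    sections = {}
    matches = list(MARKER.finditer(text))
    for i, match in enumerate(matches):
        # A section runs from the end of its marker to the next marker.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections
```

A prompt that itself contains one of the marker lines would be split in the wrong place, so treat the result as a convenience view rather than an exact parse.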

retrieval.log — memory search queries

Path: $data_dir/retrieval.log
Format: plaintext, one line per search: [date] q="..." hits=N
Contains: every memory search query and hit count. Useful for understanding what the memory-search hook is doing and whether queries are finding useful results.
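Queries that return nothing are usually the first thing to look for. The sketch below parses lines of the shape shown above and lists zero-hit queries. The regular expression accepts any text inside the brackets because the exact date format is not specified here; the names `Search`, `parse_line`, and `zero_hit_queries` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches: [date] q="..." hits=N
# The query group is greedy so that quotes inside the query are kept.
LINE = re.compile(
    r'^\[(?P<date>[^\]]*)\]\s+q="(?P<query>.*)"\s+hits=(?P<hits>\d+)\s*$'
)


class Search(NamedTuple):
    date: str
    query: str
    hits: int


def parse_line(line: str) -> Optional[Search]:
    """Parse one retrieval.log line, or return None if it does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    return Search(match.group("date"), match.group("query"), int(match.group("hits")))


def zero_hit_queries(lines: Iterable[str]) -> List[str]:
    """Return the query strings of all searches that found nothing."""
    result = []
    for line in lines:
        search = parse_line(line)
        if search is not None and search.hits == 0:
            result.append(search.query)
    return result
```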

daily-check.log — graph health history

Path: $data_dir/daily-check.log
Format: plaintext, multi-line entries with metrics
Contains: graph topology metrics over time (σ, α, gini, cc, fit). The daily health check appends to it, so the file is small (only ~10 entries so far).

In-memory state (redundant with daemon.log)

ctx.log_line() — task output log

Stored in: jobkit task state (last 20 lines per task)
Also writes to: daemon.log via log_event() (as of Mar 19)
View: daemon-status.json → task → output_log, or just tail daemon.log
Design note: the in-memory buffer is redundant now that progress events go to daemon.log. The status viewer should eventually just tail daemon.log filtered by job name, eliminating the in-memory state.

ctx.set_progress() — current activity string

Stored in: jobkit task state
View: shown in status display next to the task name
Note: each ctx.log_line() call overwrites this string.
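How the two calls interact can be modelled with a toy class. This is an illustrative Python model, not jobkit's Rust implementation: the class and attribute names are hypothetical, and the only behaviours taken from the text are the 20-line cap and the fact that log_line() replaces the progress string. That the progress string becomes the logged line itself is an assumption.

```python
from collections import deque


class TaskState:
    """Toy model of the per-task in-memory state described above."""

    MAX_LINES = 20  # "last 20 lines per task"

    def __init__(self) -> None:
        # A bounded deque discards the oldest entry once it is full.
        self.output_log = deque(maxlen=self.MAX_LINES)
        self.progress = ""

    def set_progress(self, text: str) -> None:
        """Set the current activity string."""
        self.progress = text

    def log_line(self, line: str) -> None:
        """Record an output line and replace the progress string."""
        self.output_log.append(line)
        # ASSUMPTION: the progress string is set to the logged line.
        self.progress = line
```

In this model, a progress string set with set_progress() survives only until the next log_line() call, which is why the status display can change right after a progress update.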

What to check when

Problem	Check
Task not starting	daemon-status.json (task states)
Task failing	daemon.log (failed events)
Agent producing bad output	llm-logs/{agent}/{timestamp}.txt
Agent not finding right nodes	retrieval.log (search queries)
Graph health declining	daily-check.log
Resource pool / parallelism	no log yet; one needs to be added
Which LLM backend is being used	daemon.log (llm-backend event)
