Commit graph

6 commits

Author SHA1 Message Date
Kent Overstreet
6c41b50e04 JsonlBackwardIter: use memrchr3 for SIMD-accelerated scanning
Replaces byte-by-byte backward iteration with memrchr3('{', '}', '"')
which uses SIMD to jump between structurally significant bytes. Major
speedup on large transcripts (1.4GB+).

Also simplifies tail_messages to use a byte budget (200KB) instead
of token counting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 03:11:30 -04:00
Kent Overstreet
d7d631d77d tail_messages: parse each object once, skip non-message types early
Was parsing every object twice (compaction check + message extract)
and running contains_bytes on every object for the compaction marker.
Now: quick byte pre-filter for "user"/"assistant", parse once, check
compaction after text extraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 03:05:04 -04:00
Kent Overstreet
e39096b787 add tail_messages() for fast reverse transcript scanning
Reverse-scans the mmap'd transcript using JsonlBackwardIter,
collecting user/assistant messages up to a token budget, stopping
at the compaction boundary. Returns messages in chronological order.

resolve_conversation() now uses this instead of parsing the entire
file through extract_conversation + split_on_compaction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 03:02:11 -04:00
Kent Overstreet
653da40dcd cleanup: auto-fix clippy warnings in poc-memory
Applied cargo clippy --fix for collapsible_if, manual_char_comparison,
and other auto-fixable warnings.
2026-03-21 19:42:38 -04:00
Kent Overstreet
5d6b2021f8 Agent identity, parallel scheduling, memory-search fixes, stemmer optimization
- Agent identity injection: prepend core-personality to all agent prompts
  so agents dream as me, not as generic graph workers. Include instructions
  to walk the graph and connect new nodes to core concepts.

- Parallel agent scheduling: sequential within type, parallel across types.
  Different agent types (linker, organize, replay) run concurrently.

- Linker prompt: graph walking instead of keyword search for connections.
  "Explore the local topology and walk the graph until you find the best
  connections."

- memory-search fixes: format_results no longer truncates to 5 results,
  pipeline default raised to 50, returned file cleared on compaction,
  --seen and --seen-full merged, compaction timestamp in --seen output,
  max_entries=3 per prompt for steady memory drip.

- Stemmer optimization: strip_suffix now works in-place on a single String
  buffer instead of allocating 18 new Strings per word. Note for future:
  reversed-suffix trie for O(suffix_len) instead of O(n_rules).

- Transcript: add compaction_timestamp() for --seen display.

- Agent budget configurable (default 4000 from config).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 12:49:10 -04:00
Kent Overstreet
c2f245740c transcript: extract JSONL backward scanner and compaction detection into library
Move JsonlBackwardIter and find_last_compaction() from
parse-claude-conversation into a shared transcript module. Both
memory-search and parse-claude-conversation now use the same robust
compaction detection: mmap-based backward scan, JSON parsing to
verify user-type message, content prefix check.

Replaces memory-search's old detect_compaction() which did a forward
scan with raw string matching on "continued from a previous
conversation" — that could false-positive on the string appearing
in assistant output or tool results.

Add parse-claude-conversation as a new binary for debugging what's
in the context window post-compaction.

Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
2026-03-09 17:06:32 -04:00