Two related changes to the learn subsystem:
1. AST node timestamps are now non-optional — both Leaf and Branch
variants carry a DateTime<Utc>. UNIX_EPOCH means "unset" (old entries
deserialized from on-disk conversation logs).
Training uses timestamps as unique keys for dedup, so we promote to
nanosecond precision: node_timestamp_ns(), TrainData.timestamp_ns,
FinetuneCandidate.timestamp_ns, mark_trained(ns).
2. build_token_ids() now also returns token-position ranges of assistant
messages. These are passed to vLLM's /score endpoint via the new
score_ranges field so only scored-position logprobs are returned —
cuts bandwidth/compute when scoring small windows.
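A minimal sketch of the nanosecond key, assuming chrono's DateTime<Utc>
(node_timestamp_ns is this commit's name; the body is illustrative):

    use chrono::{DateTime, Utc};

    // 0 ns == UNIX_EPOCH == "unset": old entries deserialized from disk.
    fn node_timestamp_ns(ts: DateTime<Utc>) -> i64 {
        ts.timestamp_nanos_opt().unwrap_or(0)
    }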
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Branch::tokens() was calling tokenizer::encode() on every call for
the role header ("system\n", "user\n", "assistant\n") and trailing
newline. In trim_conversation(), this meant hundreds of encode calls
per trim cycle.
These are fixed strings - cache them with OnceLock on first use.
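The pattern, roughly (tokenizer::encode is the project helper; the
function name here is made up):

    use std::sync::OnceLock;

    fn assistant_header_ids() -> &'static [u32] {
        static IDS: OnceLock<Vec<u32>> = OnceLock::new();
        IDS.get_or_init(|| tokenizer::encode("assistant\n"))
    }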
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The old code wrote a JSON object with named section keys, which
serde_json serialized in alphabetical order — putting conversation
before system, making logs misleading. Write a single flat array
in section order instead, matching what the model actually sees.
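Roughly (serde_json keeps array order as-is; names illustrative):

    // One flat array in section order, matching what the model sees.
    let flat: Vec<&AstNode> = sections.iter().flat_map(|s| s.nodes()).collect();
    serde_json::to_writer(&mut file, &flat)?;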
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The full matrix scorer was deleted during the AST conversion. Restore
it: /score runs score_memories() which computes divergence for every
memory × response pair, stores the MemoryScore on MindState, and
displays per-memory weights with bar charts on the F2 screen.
Both scoring paths now use ActivityGuard::update() for live progress
in the status bar instead of creating a new activity per iteration.
Also bumps score API timeout from 120s to 300s and adds progress
logging throughout.
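The restored loop, in sketch form (ActivityGuard::update is per this
commit; the helper names are illustrative):

    for (i, memory) in memories.iter().enumerate() {
        for response in &responses {
            // Divergence for this memory × response pair, via /score.
            let score = score_pair(memory, response).await?;
            mind_state.set_score(&memory.key, MemoryScore { score });
        }
        activity.update(format!("scoring memory {}/{}", i + 1, memories.len()));
    }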
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
tool_call labels now show the arguments truncated to 80 chars:
tool: memory_render({"key":"identity"})
instead of just:
tool_call: memory_render
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Duplicate key warnings fire on every store load and were writing to
stderr, corrupting the TUI display. Log write warnings and MCP
server failures are similarly routine. Route these to dbglog.
Serious errors (rkyv snapshot failures, store corruption) remain on
stderr — those are real problems the user needs to see.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
restore_from_log reads the full log but walks backward from the tail,
retokenizing each node as it goes, and stops when the conversation
budget is full. Only the nodes that fit get pushed into context.
Added AstNode::retokenize() — recomputes token_ids on all leaves
after deserialization (serde skip means they're empty).
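The walk, roughly (retokenize() and push_no_log() are real per these
commits; everything else is illustrative):

    let mut kept = Vec::new();
    let mut used = 0;
    for mut node in log_nodes.into_iter().rev() {       // tail first
        node.retokenize();                              // serde skip left these empty
        if used + node.token_count() > budget { break; } // budget full: stop
        used += node.token_count();
        kept.push(node);
    }
    for node in kept.into_iter().rev() {                // restore original order
        ctx.push_no_log(Section::Conversation, node);
    }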
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The unconscious trigger holds the tokio mutex during heavy sync work
(store load, graph build, agent creation), blocking the UI tick which
needs the same lock for snapshots. Fix: try_lock in the UI — skip
the update if the trigger is running.
Also: restore_from_log was re-logging every restored node back to the
log file via push()'s auto-log. Added push_no_log() for restore path.
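UI side, sketched (tokio's Mutex::try_lock is real; the surrounding
names are illustrative):

    // UI tick: never block behind the trigger's heavy sync work.
    if let Ok(mind) = shared_mind.try_lock() {
        snapshot = mind.snapshot();
    } // Err => trigger holds the lock; skip this update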
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The log records what goes into context, so it belongs under the context
lock. push() now auto-logs conversation entries, eliminating all the
manual lock-state-for-log, drop, lock-context-for-push dances.
- ContextState: new conversation_log field, Clone impl drops it
(forked contexts don't log)
- push(): auto-logs Section::Conversation entries
- push_node, apply_tool_results, collect_results: all simplified
- collect_results: batch nodes under single context lock
- Assistant response logged under context lock after parse completes
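In sketch form (names beyond those listed above are assumed):

    impl ContextState {
        fn push(&mut self, section: Section, node: AstNode) {
            if section == Section::Conversation {
                if let Some(log) = self.conversation_log.as_mut() {
                    log.append(&node);    // under the same context lock
                }
            }
            self.section_mut(section).push(node);
        }
    }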
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Ported the old trim_entries logic to the new AstNode types:
- Phase 1: Dedup Memory nodes by key (keep last), drop DMN entries
- Phase 2: While over budget, evict lowest-scored memory (if memories
> 50% of conv tokens) or oldest conversation entry
- Phase 3: Snap to User message boundary at start
Called from compact() which runs on startup and on /compact.
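In outline (phase structure per this commit; helper names illustrative):

    // Phase 1: dedup memories by key (keep last), drop DMN entries.
    dedup_memories(&mut nodes);
    nodes.retain(|n| !n.is_dmn());

    // Phase 2: evict until under budget.
    while conv_tokens(&nodes) > budget {
        if memory_tokens(&nodes) * 2 > conv_tokens(&nodes) {
            evict_lowest_scored_memory(&mut nodes);
        } else {
            evict_oldest_conversation(&mut nodes);
        }
    }

    // Phase 3: snap to a User message boundary at the start.
    snap_to_user_boundary(&mut nodes);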
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
token_ids are not serialized (serde skip), so deserialized nodes had
0 tokens. The custom Deserialize impl recomputes tokens from the body
text, restoring the invariant at the reconstruction boundary. No
separate recompute step needed.
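Shape of the idea (field names assumed):

    impl<'de> serde::Deserialize<'de> for Leaf {
        fn deserialize<D: serde::Deserializer<'de>>(de: D) -> Result<Self, D::Error> {
            #[derive(serde::Deserialize)]
            struct Raw { body: String }                  // token_ids absent on disk
            let raw = Raw::deserialize(de)?;
            Ok(Leaf {
                token_ids: tokenizer::encode(&raw.body), // invariant restored here
                body: raw.body,
            })
        }
    }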
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Byte-position truncation (&s[..s.len().min(N)]) panics when position
N lands inside a multi-byte character. Fixed in parser debug logging,
API error messages, oneshot response logging, and CLI agent display.
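One char-boundary-safe shape for the fix:

    fn truncate(s: &str, n: usize) -> &str {
        match s.char_indices().nth(n) {
            Some((i, _)) => &s[..i], // i is always a char boundary
            None => s,               // fewer than n chars; keep whole string
        }
    }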
Also fixed tool dispatch permissions (removed global fallback).
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Parameter values like ["key1", "key2"] were being wrapped as strings
instead of parsed as JSON arrays. Tools expecting array arguments
(like memory_search) got a string containing the array literal.
Now tries serde_json::from_str first, falls back to String.
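That is (serde_json::from_str is real; surrounding names illustrative):

    // Try JSON first; anything unparseable stays a plain string.
    let value = serde_json::from_str::<serde_json::Value>(raw)
        .unwrap_or_else(|_| serde_json::Value::String(raw.to_owned()));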
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Qwen's chat template renders tool results as:
<|im_start|>user\n<tool_response>\n{content}\n</tool_response><|im_end|>
We were rendering as:
<|im_start|>tool\n{content}<|im_end|>
The model never saw <|im_start|>tool in training, so it ignored our
tool results and looped retrying the same call. Found by comparing
our tokenization against vLLM's /tokenize endpoint with chat messages.
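The corrected rendering, sketched:

    // Tool results go inside a user turn, matching Qwen's template.
    fn render_tool_result(content: &str) -> String {
        format!("<|im_start|>user\n<tool_response>\n{content}\n</tool_response><|im_end|>")
    }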
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
compact() was clearing tool definitions from the system section on
startup — now leaves system section untouched (set once by new()).
Added context token count to parser done log for diagnosing the
subconscious agent loop issue.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
compact() cleared and rebuilt the system section but only pushed the
system prompt — tool definitions were lost. Since new() sets up the
system section correctly (prompt + tools), compact() now only reloads
identity and journal, leaving system untouched.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Logs the full response text when no tool calls are detected, and the
tool call bodies when they are found. Adds per-agent log files for
debugging subconscious agent parsing issues.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Logs full text length, <tool_call> tag count, and tool call details
on stream completion. Helps diagnose parsing issues with subconscious
agents.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Was checking the trimmed text but storing the untrimmed original. Now
stores the trimmed version — no leading/trailing whitespace in the AST.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Content between tags (e.g. newlines between </think> and <tool_call>)
was creating empty Content nodes. Now trimmed before creating the node.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Parser skips Thinking nodes that are just whitespace. Conscious screen
now shows assistant children (Content, Thinking, ToolCall) as nested
tree items via recursive node_to_view. Nodes get timestamped in
push_node and on assistant branch creation.
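The recursion, roughly (node_to_view per this commit; the view type is
a placeholder):

    fn node_to_view(node: &AstNode) -> TreeItem {
        match node {
            AstNode::Leaf(leaf) => TreeItem::new(leaf.label()),
            AstNode::Branch(branch) => TreeItem::with_children(
                branch.label(),
                branch.children.iter().map(node_to_view).collect(),
            ),
        }
    }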
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The parser can't reliably split model-produced token IDs at tag
boundaries (<think>, <tool_call>) because BPE tokens can span across
tags. Instead, each leaf gets re-encoded from its text content via
the local tokenizer. This gives clean token boundaries aligned with
semantic structure — better for budgeting and potentially for the
model during fine-tuning.
Also skip serializing token_ids to conversation log (they're cached
state, recomputed on construction).
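The leaf shape this implies, sketched (field names assumed):

    #[derive(serde::Serialize, serde::Deserialize)]
    struct Leaf {
        body: String,
        #[serde(skip)]            // cached state; never hits the conversation log
        token_ids: Vec<u32>,
    }

    impl Leaf {
        fn new(body: String) -> Self {
            let token_ids = tokenizer::encode(&body); // boundaries follow structure
            Leaf { body, token_ids }
        }
    }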
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
ResponseParser::run() spawns a task that reads StreamTokens, parses
into the AST (locking context per token), and sends PendingToolCalls
through a channel. Returns (tool_rx, JoinHandle<Result>) — the turn
loop dispatches tool calls and awaits the handle for error checking.
Token IDs from vLLM are accumulated alongside text and stored directly
on AST leaves — no local re-encoding on the response path.
The turn loop no longer matches on individual stream events. It just
reads tool calls and dispatches them.
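Wiring, approximately (tokio mpsc/JoinHandle are real; struct internals
and anyhow::Result are assumptions):

    use tokio::{sync::mpsc, task::JoinHandle};

    impl ResponseParser {
        fn run(mut self) -> (mpsc::Receiver<PendingToolCall>, JoinHandle<anyhow::Result<()>>) {
            let (tx, rx) = mpsc::channel(16);
            let handle = tokio::spawn(async move {
                while let Some(tok) = self.stream.recv().await {
                    // Lock context, extend the AST, stash vLLM token IDs on the leaf.
                    if let Some(call) = self.feed(tok)? {
                        let _ = tx.send(call).await;
                    }
                }
                Ok(())
            });
            (rx, handle)
        }
    }

    // In the turn loop:
    let (mut tool_rx, handle) = parser.run();
    while let Some(call) = tool_rx.recv().await {
        dispatch(call).await;          // execute, push results back into context
    }
    handle.await??;                    // join error, then parse error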
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Remove tiktoken-rs dependency, CoreBPE field on Agent, and the
msg_token_count() function. All tokenization now goes through the
global HuggingFace tokenizer in agent/tokenizer.rs.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Add agent/tokenizer.rs with global Qwen 3.5 tokenizer that generates
actual token IDs including chat template wrapping. ContextEntry now
stores token_ids: Vec<u32> instead of tokens: usize — the count is
derived from the length.
ContextEntry::new() tokenizes automatically via the global tokenizer.
ContextSection::push_entry() takes a raw ConversationEntry and
tokenizes it. set_message() re-tokenizes without needing an external
tokenizer parameter.
Token IDs include the full chat template: <|im_start|>role\ncontent
<|im_end|>\n — so concatenating token_ids across entries produces a
ready-to-send prompt for vLLM's /v1/completions endpoint.
The old tiktoken CoreBPE is now unused on Agent (will be removed in
a follow-up). Token counts are now exact for Qwen 3.5 instead of the
~85-90% approximation from cl100k_base.
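One plausible shape, using the tokenizers crate (file path and names
assumed):

    use std::sync::OnceLock;
    use tokenizers::Tokenizer;

    fn global() -> &'static Tokenizer {
        static TOK: OnceLock<Tokenizer> = OnceLock::new();
        TOK.get_or_init(|| Tokenizer::from_file("tokenizer.json").expect("tokenizer"))
    }

    pub fn encode(text: &str) -> Vec<u32> {
        global().encode(text, false).expect("encode").get_ids().to_vec()
    }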
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Thinking/reasoning content is now a first-class entry type:
- Serialized as {"thinking": "..."} in conversation log
- 0 tokens for budgeting (doesn't count against context window)
- Filtered from assemble_api_messages (not sent back to model)
- Displayed in UI with "thinking: ..." label and expandable content
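With serde's externally tagged enums, the {"thinking": "..."} form
falls out of a rename (sketch):

    #[derive(serde::Serialize, serde::Deserialize)]
    #[serde(rename_all = "snake_case")]
    enum ConversationEntry {
        Thinking(String),   // budgeted at 0 tokens; filtered from API messages
        // ... other variants
    }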
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
lowest_scored_memory() now skips memories with score=None. Unscored
memories haven't been evaluated — dropping them before scored
low-value ones loses potentially important context.
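Roughly (f64::total_cmp is real; the accessors are illustrative):

    fn lowest_scored_memory(entries: &[ContextEntry]) -> Option<usize> {
        entries.iter().enumerate()
            .filter_map(|(i, e)| e.memory_score().map(|s| (i, s))) // skip score=None
            .min_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(i, _)| i)
    }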
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
trim_entries is now a simple loop:
1. Drop duplicate memories and DMN entries
2. While over budget: if memories > 50% of entry tokens, drop
lowest-scored memory; otherwise drop oldest conversation entry
3. Snap to user message boundary
ContextBudget is gone — sections already have cached token totals:
- total_tokens() on ContextState replaces budget.total()
- format_budget() on ContextState replaces budget.format()
- trim() takes fixed_tokens: usize (system + identity + journal)
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
New types — not yet wired to callers:
- ContextEntry: wraps ConversationEntry with cached token count and
timestamp
- ContextSection: named group of entries with cached token total.
Private entries/tokens, read via entries()/tokens().
Mutation via push(entry), set(index, entry), del(index).
- ContextState: system/identity/journal/conversation sections + working_stack
- ConversationEntry::System variant for system prompt entries
Token counting happens once at push time. Sections maintain their
totals incrementally via push/set/del. No more recomputing from
scratch on every budget check.
Does not compile — callers need updating.
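A plausible shape (field names follow the commit; details assumed):

    struct ContextEntry {
        entry: ConversationEntry,
        tokens: usize,              // counted once, at construction
        timestamp: DateTime<Utc>,
    }

    struct ContextSection {
        entries: Vec<ContextEntry>, // private; read via entries()/tokens()
        tokens: usize,              // cached total
    }

    impl ContextSection {
        fn push(&mut self, e: ContextEntry) {
            self.tokens += e.tokens;
            self.entries.push(e);
        }
        fn del(&mut self, i: usize) {
            self.tokens -= self.entries.remove(i).tokens;
        }
    }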
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
compact() already computes context_budget() — pass it to trim_entries
so it has access to all budget components without recomputing them.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Scores are saved to memory-scores.json alongside the conversation log
after each scoring run, and loaded on startup — no more re-scoring
on restart.
trim_entries now evicts lowest-scored memories first (instead of
oldest-first) when memories exceed 50% of context. The 50% threshold
stays as a heuristic for memory-vs-conversation balance until we have
a scoring signal for conversation entries too. Unscored memories get
0.0, so they're evicted before scored ones.
save_memory_scores rebuilds from current entries, so evicted memories
are automatically expired from the scores file.
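Persistence, sketched (serde_json is real; accessors illustrative):

    fn save_memory_scores(entries: &[ContextEntry], path: &std::path::Path) -> std::io::Result<()> {
        // Rebuilt from what's live, so evicted memories fall out of the file.
        let scores: std::collections::BTreeMap<&str, f64> = entries.iter()
            .filter_map(|e| e.memory_key().map(|k| (k, e.score().unwrap_or(0.0))))
            .collect();
        std::fs::write(path, serde_json::to_string_pretty(&scores)?)
    }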
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
context_state_summary() was used for both compaction decisions (just
needs token counts) and debug screen display (needs full tree with
labels). Split into:
- Agent::context_budget() -> ContextBudget: cheap token counting by
category, used by compact(), restore_from_log(), mind event loop
- ContextBudget::format(): replaces sections_budget_string(), which did
  fragile pattern-matching on section name strings
- context_state_summary(): now UI-only, formatting code stays here
Also extracted entry_sections() as a shared helper with an include_memories
param — false for context_state_summary (memories have their own section),
true for conversation_sections_from() (the subconscious screen shows all).
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Only Message, Role, MessageContent, ContentPart, ToolCall,
FunctionCall, Usage, ImageUrl are pub-exported from agent::api.
Internal types (ChatRequest, ChatCompletionChunk, ChunkChoice,
Delta, ReasoningConfig, ToolCallDelta, FunctionCallDelta) are
pub(crate) — invisible outside the crate.
All callers updated to import from agent::api:: instead of
agent::api::types::.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
ConversationEntry::Memory gains score: Option<f64>. The scorer
writes scores directly onto entries when results arrive. Removes
Agent.memory_scores Vec and the memory_scores parameter from
context_state_summary().
Scores are serialized to/from the conversation log as memory_score.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Add Log variant to ConversationEntry that serializes to the
conversation log but is filtered out on read-back and API calls.
AutoAgent writes debug/status info (turns, tokens, tool calls)
through the conversation log instead of a callback parameter.
Removes the log callback from run_one_agent, call_api_with_tools,
and all callers.
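The read-back filter is the whole trick (sketch):

    // Dropped on read-back and when assembling API messages; kept on disk.
    entries.retain(|e| !matches!(e, ConversationEntry::Log(_)));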
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
- Budget now counts exact message tokens matching what assemble_api_messages
sends, not raw string content. Eliminates undercounting from formatting
overhead (journal headers, personality separators, working stack).
- Load journal before trimming so trim accounts for journal cost.
- Compact before every turn, not just after turn completion. Prevents
agent_cycle surfaced memories from pushing context over budget.
- Move agent_cycle orchestration from Agent::turn to Mind::start_turn —
surfaced memories and reflections now precede the user message.
- Move AgentCycleState from Agent to Mind — it's orchestration, not
per-agent state. memory_scoring_in_flight and memory_scores stay on
Agent where they belong.
- Tag DMN entries as ConversationEntry::Dmn — compaction evicts them
first since they're ephemeral. Compaction also prefers evicting
memories over conversation when memories exceed 50% of entry tokens.
- Kill /retry slash command.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Kill ContextBudget and recompute_budget entirely. Budget percentages,
used token counts, and compaction threshold checks now all derive from
the ContextSection tree built by context_state_summary(). This
eliminates the stale-budget bug where the cached budget diverged from
actual context contents.
Also: remove MindCommand::Turn — user input flows through
shared_mind.input exclusively. Mind::start_turn() atomically moves
text from pending input into the agent's context and spawns the turn.
Kill /retry. Make Agent::turn() take no input parameter.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
budget() called tiktoken on every UI tick, which was the main CPU hog
during rapid key input. Move the cached ContextBudget onto ContextState
and recompute only when entries actually change (push_entry, compact,
restore_from_log).
Co-Authored-By: Proof of Concept <poc@bcachefs.org>