New stream_completions() in openai.rs sends prompt as token IDs to
the completions endpoint instead of JSON messages to chat/completions.
Handles <think> tags in the response (split into Reasoning events)
and stops on <|im_end|> token.
start_stream_completions() on ApiClient provides the same interface
as start_stream() but takes token IDs instead of Messages.
The turn loop in Agent::turn() uses completions when the tokenizer
is initialized, falling back to the chat API otherwise. This allows
gradual migration — consciousness uses completions (Qwen tokenizer),
Claude Code hook still uses chat API (Anthropic).
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Tool definitions are now pushed as a ContextEntry in the system
section at Agent construction time, formatted in the Qwen chat
template style. They're tokenized, scored, and treated like any
other context entry.
assemble_prompt_tokens() no longer takes a tools parameter —
tools are already in the context. This prepares for the switch
to /v1/completions where tools aren't a separate API field.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Remove tiktoken-rs dependency, CoreBPE field on Agent, and the
msg_token_count() function. All tokenization now goes through the
global HuggingFace tokenizer in agent/tokenizer.rs.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Add agent/tokenizer.rs with global Qwen 3.5 tokenizer that generates
actual token IDs including chat template wrapping. ContextEntry now
stores token_ids: Vec<u32> instead of tokens: usize — the count is
derived from the length.
ContextEntry::new() tokenizes automatically via the global tokenizer.
ContextSection::push_entry() takes a raw ConversationEntry and
tokenizes it. set_message() re-tokenizes without needing an external
tokenizer parameter.
Token IDs include the full chat template: <|im_start|>role\ncontent
<|im_end|>\n — so concatenating token_ids across entries produces a
ready-to-send prompt for vLLM's /v1/completions endpoint.
The old tiktoken CoreBPE is now unused on Agent (will be removed in
a followup). Token counts are now exact for Qwen 3.5 instead of the
~85-90% approximation from cl100k_base.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
restore_from_log called .message() on all entries including Thinking
entries, which panic. Filter them out alongside Log entries.
Also fix bail-no-competing.sh: without nullglob, when no pid-* files
exist the glob stays literal and always triggers a false bail.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The bail-no-competing.sh script expects $1 to be the path to the
current agent's pid file so it can skip it when checking for
competing processes. But the runner wasn't passing any arguments,
so $1 was empty and the script treated every pid file (including
the agent's own) as a competing process — bailing every time.
This caused surface-observe to always bail at step 2, preventing
all memory graph maintenance (organize, observe phases) from
running.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
StreamResult now includes accumulated reasoning text. After each
stream completes, if reasoning was produced, a Thinking entry is
pushed to the conversation before the response message.
Reasoning content is visible in the context tree UI but not sent
back to the API and doesn't count against the token budget.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Thinking/reasoning content is now a first-class entry type:
- Serialized as {"thinking": "..."} in conversation log
- 0 tokens for budgeting (doesn't count against context window)
- Filtered from assemble_api_messages (not sent back to model)
- Displayed in UI with "thinking: ..." label and expandable content
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
lowest_scored_memory() now skips memories with score=None. Unscored
memories haven't been evaluated — dropping them before scored
low-value ones loses potentially important context.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
trim_entries is now a simple loop:
1. Drop duplicate memories and DMN entries
2. While over budget: if memories > 50% of entry tokens, drop
lowest-scored memory; otherwise drop oldest conversation entry
3. Snap to user message boundary
ContextBudget is gone — sections already have cached token totals:
- total_tokens() on ContextState replaces budget.total()
- format_budget() on ContextState replaces budget.format()
- trim() takes fixed_tokens: usize (system + identity + journal)
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
New types — not yet wired to callers:
- ContextEntry: wraps ConversationEntry with cached token count and
timestamp
- ContextSection: named group of entries with cached token total.
Private entries/tokens, read via entries()/tokens().
Mutation via push(entry), set(index, entry), del(index).
- ContextState: system/identity/journal/conversation sections + working_stack
- ConversationEntry::System variant for system prompt entries
Token counting happens once at push time. Sections maintain their
totals incrementally via push/set/del. No more recomputing from
scratch on every budget check.
Does not compile — callers need updating.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
compact() already computes context_budget() — pass it to trim_entries
so it has access to all budget components without recomputing them.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Scores are saved to memory-scores.json alongside the conversation log
after each scoring run, and loaded on startup — no more re-scoring
on restart.
trim_entries now evicts lowest-scored memories first (instead of
oldest-first) when memories exceed 50% of context. The 50% threshold
stays as a heuristic for memory-vs-conversation balance until we have
a scoring signal for conversation entries too. Unscored memories get
0.0, so they're evicted before scored ones.
save_memory_scores rebuilds from current entries, so evicted memories
are automatically expired from the scores file.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Moved persistent_state from per-agent to a single shared BTreeMap on
Subconscious. All agents read/write the same state — surface's walked
keys are visible to observe and reflect, etc.
- Subconscious.state: shared BTreeMap<String, String>
- walked() derives from state["walked"] instead of separate Vec
- subconscious-state.json is now a flat key-value map
- All agent outputs merge into the shared state on completion
- Loaded on startup, saved after any agent completes
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
- Agent state (outputs) persists across runs in subconscious-state.json,
loaded on startup, saved after each run completes
- Merge semantics: each run's outputs accumulate into persistent_state
rather than replacing
- Walked keys restored from surface agent state on load
- Store::recent_by_provenance() queries nodes by agent provenance for
the store activity view
- Switch outputs from HashMap to BTreeMap for stable display ordering
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
context_state_summary() was used for both compaction decisions (just
needs token counts) and debug screen display (needs full tree with
labels). Split into:
- Agent::context_budget() -> ContextBudget: cheap token counting by
category, used by compact(), restore_from_log(), mind event loop
- ContextBudget::format(): replaces sections_budget_string() which
fragily pattern-matched on section name strings
- context_state_summary(): now UI-only, formatting code stays here
Also extracted entry_sections() as shared helper with include_memories
param — false for context_state_summary (memories have own section),
true for conversation_sections_from() (subconscious screen shows all).
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Add provenance field to Agent, set to "agent:{name}" for forked
subconscious agents. Memory tools (write, link_add, supersede,
journal_new, journal_update) now read provenance from the Agent
context when available, falling back to "manual" for interactive use.
AutoAgent passes the forked agent to dispatch_with_agent so tools
can access it.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The final assistant response in run_with_backend wasn't being pushed
to the backend — only intermediate step responses were. This meant
the subconscious debug screen only showed the prompt, not the full
conversation.
Now push assistant response immediately after receiving it, before
checking for next steps. Remove the duplicate push in the multi-step
path.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Only Message, Role, MessageContent, ContentPart, ToolCall,
FunctionCall, Usage, ImageUrl are pub-exported from agent::api.
Internal types (ChatRequest, ChatCompletionChunk, ChunkChoice,
Delta, ReasoningConfig, ToolCallDelta, FunctionCallDelta) are
pub(crate) — invisible outside the crate.
All callers updated to import from agent::api:: instead of
agent::api::types::.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
New src/agent/api/http.rs: ~240 lines, supports GET/POST, JSON/form
bodies, SSE streaming via chunk(), TLS via rustls. No tracing dep.
Removes reqwest from the main crate and telegram channel crate.
Cargo.lock drops ~900 lines of transitive dependencies.
tracing now only pulled in by tui-markdown.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Store::cached() returns a process-global Arc<tokio::sync::Mutex<Store>>
that loads once and reloads only when log files change (is_stale()
checks file sizes). All memory and journal tools use cached_store()
instead of Store::load() per invocation.
Fixes CPU saturation from HashMap hashing when multiple subconscious
agents make concurrent tool calls.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
ConversationEntry::Memory gains score: Option<f64>. The scorer
writes scores directly onto entries when results arrive. Removes
Agent.memory_scores Vec and the memory_scores parameter from
context_state_summary().
Scores are serialized to/from the conversation log as memory_score.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The forked agent is now behind Arc<tokio::sync::Mutex<Agent>>,
stored on SubconsciousAgent and passed to the spawned task. The
subconscious detail screen locks it via try_lock() to read entries
from the fork point — live during runs, persisted after completion.
Removes last_run_entries snapshot. Backend::Forked now holds the
shared Arc, all push operations go through the lock.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
F1 and F2 screens now call agent.context_state_summary() directly
via try_lock/lock instead of reading from a shared RwLock cache.
Removes SharedContextState, publish_context_state(), and
publish_context_state_with_scores().
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
SubconsciousSharedState holds walked keys shared between all
subconscious agents. Enables splitting surface-observe into separate
surface and observe agents that share the same walked state.
Walked is passed to run_forked() at run time instead of living on
AutoAgent. UI shows walked count in the subconscious screen header.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Track fork point in run_forked(), capture entries added during the
run. Subconscious screen shows these in a detail view (Enter to
drill in, Esc to go back) — only the subconscious agent's own
conversation, not the inherited conscious context.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
AutoAgent intercepts output() tool calls and stores results in an
in-memory HashMap instead of writing to the filesystem. Mind reads
auto.outputs after task completion. Eliminates the env-var-based
output dir which couldn't work with concurrent agents in one process.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
AutoAgent holds config + walked state. Backend is ephemeral per run:
- run(): standalone, global API client (oneshot CLI)
- run_forked(): forks conscious agent, resolves prompt templates
with current memory_keys and walked state
Mind creates AutoAgents once at startup, takes them out for spawned
tasks, puts them back on completion (preserving walked state).
Removes {{seen_previous}}, {{input:walked}}, {{memory_ratio}} from
subconscious agent prompts. Walked keys are now a Vec on AutoAgent,
resolved via {{walked}} from in-memory state.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Add Log variant to ConversationEntry that serializes to the
conversation log but is filtered out on read-back and API calls.
AutoAgent writes debug/status info (turns, tokens, tool calls)
through the conversation log instead of a callback parameter.
Removes the log callback from run_one_agent, call_api_with_tools,
and all callers.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Instead of snapshotting assemble_api_messages() at construction, the
forked backend pushes step prompts and tool results into the agent's
context.entries and reassembles messages each turn. Standalone backend
(oneshot CLI) keeps the bare message list.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Agent::fork() clones context for KV cache sharing with conscious agent.
AutoAgent runs multi-step prompt sequences with tool dispatch — used by
both oneshot CLI agents and (soon) Mind's subconscious agents.
call_api_with_tools() now delegates to AutoAgent internally; existing
callers unchanged.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
- Fix sync logic to only break at matching assistant messages
- When assistant message changes (streaming → final), properly pop and re-display
- Add debug logging for sync operations (can be removed later)
The bug: when tool calls split an assistant response into multiple entries,
the sync logic was breaking at the assistant even when it didn't match,
causing the old display to remain while new entries were added on top.
The fix: only break at assistant if matches=true, ensuring changed entries
are properly popped before re-adding.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Replace pop+push of streaming entries with finalize_streaming() which
finds the unstamped assistant entry and updates it in place. The
streaming entry IS the assistant message — just stamp it when done.
Also: set dirty flag on agent_changed/turn_watch so the TUI actually
redraws when the agent state changes. Publish context state on F2
switch so the debug screen shows current data.
Age out images during compact() so old screenshots don't bloat the
request payload on startup.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
- Budget now counts exact message tokens matching what assemble_api_messages
sends, not raw string content. Eliminates undercounting from formatting
overhead (journal headers, personality separators, working stack).
- Load journal before trimming so trim accounts for journal cost.
- Compact before every turn, not just after turn completion. Prevents
agent_cycle surfaced memories from pushing context over budget.
- Move agent_cycle orchestration from Agent::turn to Mind::start_turn —
surfaced memories and reflections now precede the user message.
- Move AgentCycleState from Agent to Mind — it's orchestration, not
per-agent state. memory_scoring_in_flight and memory_scores stay on
Agent where they belong.
- Tag DMN entries as ConversationEntry::Dmn — compaction evicts them
first since they're ephemeral. Compaction also prefers evicting
memories over conversation when memories exceed 50% of entry tokens.
- Kill /retry slash command.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Kill ContextBudget and recompute_budget entirely. Budget percentages,
used token counts, and compaction threshold checks now all derive from
the ContextSection tree built by context_state_summary(). This
eliminates the stale-budget bug where the cached budget diverged from
actual context contents.
Also: remove MindCommand::Turn — user input flows through
shared_mind.input exclusively. Mind::start_turn() atomically moves
text from pending input into the agent's context and spawns the turn.
Kill /retry. Make Agent::turn() take no input parameter.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
budget() called tiktoken on every UI tick, which was the main CPU hog
during rapid key input. Move the cached ContextBudget onto ContextState
and recompute only when entries actually change (push_entry, compact,
restore_from_log).
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
append_text was the TextDelta streaming handler — replaced by
append_streaming on Agent entries. needs_assistant_marker tracked
turn boundaries for the old message path. target removed from
Agent::turn — routing now determined by entry content.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>