Cloud API support:
- Add chat_api config flag to BackendConfig, threaded through
SessionConfig → ResolvedModel → Agent → Mind
- New StreamToken::TextDelta variant for chat completions streaming
- stream_chat_completion() method on ApiClient: builds messages array,
sends to /v1/chat/completions, parses SSE stream
- ChatMessage struct and wire_messages() on ContextState: converts the
AST (system/identity/journal/conversation nodes) into a messages
array for the chat API, handling images as base64 data URIs
- ResponseParser handles TextDelta alongside Token variants
- TUI rendering fix: tokens() returns byte-length estimate (~4
bytes/token) when tokenizer isn't loaded, so the change detector
actually triggers re-renders
- Gate all vLLM-specific scoring (memory scoring, finetune scoring,
compare scoring) behind !chat_api checks
Per-agent model override:
- Add model field to agent definition headers (.agent files)
- Thread through AutoAgent → prepare_spawn → resolve_model
- Agents fall back to default_backend when model is unset
- Enables cheaper backends (e.g. Kimi) for graph maintenance agents
while keeping Sonnet for conversation
Tested: end-to-end with Poe API + Haiku, chat_api: true in config.
TUI starts, messages send, responses stream and render.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
StreamToken::Token is now a struct variant with an optional
TokenReadout (shape [n_layers][n_concepts]) per token — parsed from
the vLLM completion response's choices[i].readout field when the
server has readout enabled.
ApiClient gains a fetch_readout_manifest() method that hits
GET /v1/readout/manifest. Returns Ok(None) on 404 (server has
readout disabled), so callers can gracefully fall back when pointed
at a non-readout-enabled endpoint.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
- context.rs gains is_assistant, render_branch_text, render_prior_context
alongside memory_key / is_memory_node. They're pure AST helpers, used
by both the finetune pipeline and the forthcoming compare screen.
- new subconscious/generate.rs holds gen_continuation(context, entry_idx,
skip, client): build the prompt from a context prefix with an arbitrary
skip predicate, send to the model, decode the completion. Takes both
the predicate and the client so callers can aim it at memory-stripped
contexts (finetune), same-context-different-model (F7 compare), or
whatever else.
- learn.rs drops its private copies of those helpers and the inline
generate_alternate; the finetune path now reads as
gen_continuation(context, idx, is_memory_node, client).
Pure refactor, no behavior change.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>