Collapse the split `temperature` / `top_p` / `top_k` fields on
AgentState into a single `sampling: SamplingParams` struct, mirroring
how the wire-level fields flow into the Generate RPC. Adds
`max_tokens` to SamplingParams so it's actually plumbed end to end
(previously the client had a hardcoded 4096 fallback inside
`run_session_generate`).
AgentState construction sites now set `sampling: SamplingParams { ...
max_tokens: 4096 }` as the default. The assignment sites in
oneshot.rs / subconscious.rs / unconscious.rs switch from
`st.temperature = X` to `st.sampling.temperature = X`.
`stream_session_mm` takes `SamplingParams` directly; the
`sampling_max_tokens()` helper goes away. `pb::GenerateRequest` is
populated with `sampling.max_tokens` (and the other fields) in
`run_session_generate`. SamplingParams is `pub` so it can be
embedded in the public AgentState without a visibility warning.
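A minimal sketch of the consolidated struct and how it feeds a request. It assumes plain f32/u32 field types. `GenerateRequest` here is a hand-written stand-in for the tonic-generated `pb::GenerateRequest`, and `fill_request` is a hypothetical helper. The `Default` values are the Qwen3.5 defaults recorded elsewhere in this log plus the old 4096 fallback; the real construction sites may spell them out rather than use `Default`.

```rust
/// Sketch of the consolidated sampling struct (field types assumed).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
    pub max_tokens: u32,
}

impl Default for SamplingParams {
    fn default() -> Self {
        // Assumed defaults: Qwen3.5 sampling values plus the old 4096 cap.
        SamplingParams { temperature: 0.6, top_p: 0.95, top_k: 20, max_tokens: 4096 }
    }
}

/// Hypothetical stand-in for the generated pb::GenerateRequest.
#[derive(Debug, Default, PartialEq)]
pub struct GenerateRequest {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
    pub max_tokens: u32,
}

/// Copy every sampling field onto the request, so max_tokens comes from the
/// caller instead of a fallback buried in the RPC path.
pub fn fill_request(sampling: &SamplingParams) -> GenerateRequest {
    GenerateRequest {
        temperature: sampling.temperature,
        top_p: sampling.top_p,
        top_k: sampling.top_k,
        max_tokens: sampling.max_tokens,
    }
}

/// Only the field relevant to this change is shown.
pub struct AgentState {
    pub sampling: SamplingParams,
}
```

Assignment sites then read `st.sampling.temperature = 0.0;` and the request picks up all four fields in one place.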
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Wires the client side of the new salience protocol so inference
actually runs over gRPC instead of emitting the stubbed "not yet
wired" error. Each turn walks the AST as interleaved chunks, sends
only what's new to the server, and streams decode tokens back.
context.rs:
* `WireChunk` enum: `Tokens(Vec<u32>)` or `Image { bytes, mime,
known_expanded_len }`. Preserves text/image/text ordering the
wire path can't flatten.
* `wire_chunks(range, skip)` walker, parallel to `wire_prompt` —
branches emit `<|im_start|>…<|im_end|>` tokens, image leaves
emit a single Image chunk (no inline vision tokens).
* `NodeLeaf::set_image_token_count(n)` + recompute of cached
`token_ids`; `ContextState::commit_image_token_counts(&[u32])`
fills in the first-N zero-count image leaves in wire order.
* `ResponseParser::run` handles the new
`StreamToken::ImageAppended` by committing the server's N into
the AST before the final Generate's Token events stream in.
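A sketch of the chunk enum and the first-N fill rule. It assumes `known_expanded_len` is an `Option<u32>` and that a zero `token_count` marks an image leaf the server hasn't sized yet. `ImageLeaf` is a hypothetical stand-in for the real `NodeLeaf`, and the `token_ids` recompute is omitted.

```rust
/// Interleaved wire chunks; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum WireChunk {
    Tokens(Vec<u32>),
    Image { bytes: Vec<u8>, mime: String, known_expanded_len: Option<u32> },
}

/// Hypothetical image leaf: token_count == 0 means "not yet reported by the server".
#[derive(Debug, Clone, PartialEq)]
pub struct ImageLeaf {
    pub token_count: u32,
}

/// Fill the first N zero-count image leaves, in wire order, with the counts
/// the server reported. Leaves that already have a count are skipped.
/// Returns how many counts were committed.
pub fn commit_image_token_counts(leaves: &mut [ImageLeaf], counts: &[u32]) -> usize {
    let mut committed = 0;
    for (leaf, &n) in leaves.iter_mut().filter(|l| l.token_count == 0).zip(counts) {
        leaf.token_count = n;
        committed += 1;
    }
    committed
}
```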
salience.rs:
* `SessionHandle` tracks `committed_len`. `append_image` advances
it from the RPC response. New `generate(req)` opens the
server-streaming RPC.
api/mod.rs:
* `stream_session_mm(session_lock, chunks, sampling, priority,
readout_shape)` replaces the stub. Spawns `run_session_generate`.
* `run_session_generate`: takes the session out of the Mutex (or
opens fresh), skips chunks covered by `committed_len` (bails on
mid-chunk straddle or unknown-length image in the committed
prefix), walks the delta: accumulates Tokens into `pending`, on
Image flushes pending via `flush_pending` (max_tokens=0 Generate
that just prefills), then AppendImage + emits
StreamToken::ImageAppended. Final Generate carries any trailing
pending text as `append_tokens` and the sampling params; Token
events stream out as StreamToken::Token, Done as
StreamToken::Done. On success, handle with updated
`committed_len` returns to the Mutex; on error, handle drops
and next call reopens.
* `StreamToken::ImageAppended { placeholder_count }` variant —
emitted in wire order before the final Generate's tokens.
* Prefix-cache cap for readout coverage: `readout_ranges` covers
`[prompt_len_after_append, u32::MAX)` when the caller provides
a readout_shape, so decode positions stream their readouts.
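The delta walk in `run_session_generate` can be sketched as a pure planning function that turns the chunks plus `committed_len` into the ordered RPC sequence. `Step` and `plan_delta` are hypothetical names, image bytes and MIME are elided, and a committed image is assumed to occupy exactly `known_expanded_len` positions. Emitting a prefill only when `pending` is non-empty is also an assumption.

```rust
/// Simplified wire chunk: image bytes and MIME are elided for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum WireChunk {
    Tokens(Vec<u32>),
    Image { known_expanded_len: Option<u32> },
}

/// One RPC the client would issue, in order.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// max_tokens=0 Generate that only prefills these tokens.
    Prefill(Vec<u32>),
    /// AppendImage for the chunk at this index (then emit ImageAppended).
    AppendImage(usize),
    /// Final Generate: trailing text as append_tokens, plus sampling params.
    Final(Vec<u32>),
}

pub fn plan_delta(chunks: &[WireChunk], committed_len: u32) -> Result<Vec<Step>, String> {
    // 1. Skip the prefix the server already holds.
    let (mut pos, mut start) = (0u32, 0usize);
    while pos < committed_len {
        let Some(chunk) = chunks.get(start) else {
            return Err(format!("committed_len {committed_len} is past the end of the prompt"));
        };
        let len = match chunk {
            WireChunk::Tokens(t) => t.len() as u32,
            WireChunk::Image { known_expanded_len: Some(n) } => *n,
            WireChunk::Image { known_expanded_len: None } => {
                return Err(format!("image chunk {start} has unknown length in the committed prefix"));
            }
        };
        if pos + len > committed_len {
            return Err(format!("chunk {start} straddles committed_len {committed_len}"));
        }
        pos += len;
        start += 1;
    }

    // 2. Walk the delta, flushing pending text before each image.
    let mut steps = Vec::new();
    let mut pending: Vec<u32> = Vec::new();
    for (i, chunk) in chunks.iter().enumerate().skip(start) {
        match chunk {
            WireChunk::Tokens(t) => pending.extend_from_slice(t),
            WireChunk::Image { .. } => {
                if !pending.is_empty() {
                    steps.push(Step::Prefill(std::mem::take(&mut pending)));
                }
                steps.push(Step::AppendImage(i));
            }
        }
    }
    steps.push(Step::Final(pending));
    Ok(steps)
}
```

With the first three chunks (3 + 4 + 2 = 9 positions) already on the server, only the second image and the trailing text are sent.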
agent/mod.rs:
* `assemble_prompt` returns `Vec<WireChunk>` with the assistant
prologue merged into the trailing Tokens chunk. Caller in
`turn` passes chunks + readout_shape (pulled from
`agent.readout.lock().manifest`) to `stream_session_mm`.
* Dropped `assemble_prompt_tokens` — dead.
mind + unconscious:
* `Unconscious::new(client)` stores a shared `ApiClient`. Fixes
the repeated-manifest-fetch bug caused by each subagent's
`ApiClient::new` having its own OnceCell. The client's Arc-
wrapped manifest cache is now shared across every agent Mind
spawns.
* `prepare_spawn(name, auto, wake, base_client)` clones the base
client and overrides `.model` for the resolved backend instead
of constructing fresh. All three callers
(`toggle`/`trigger`/unconscious loop) pass `self.client.clone()`.
* `Mind::new` passes `agent.client.clone()` into
`Unconscious::new`.
subconscious/generate.rs:
* gen_continuation switched to `wire_chunks` + the new
`stream_session_mm` signature. Ephemeral session opens on each
call, tears down at scope end. No readouts requested.
Not changed yet, noted for follow-up:
* Subconscious ablation scoring in learn.rs still talks to
`/v1/score` over HTTP. Will migrate once we have time to verify
the Generate+max_tokens=0+prompt_logprobs path end-to-end.
* compare.rs constructs its own ApiClient for the
`compare.test_backend` (which is intentionally a different
endpoint) — left alone.
* Readout manifest still fetched via HTTP at Agent::new.
Migration to GetReadoutManifest gRPC is a separate cleanup.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Adds the client side of a stateful gRPC protocol against vLLM, plus
the TLS trust machinery so we can talk to self-signed vLLM servers.
Protocol (proto/salience.proto):
Bidi-streaming Session RPC carries OpenSession / AppendTokens /
Generate / Cancel from client and SessionReady / PrefillProgress /
Token / GenerateDone / Error from server. Separate Fork unary RPC
for cheap branching (prefix cache shares KV automatically). Plus
ListSessions, CloseSession, GetReadoutManifest admin RPCs.
Per-token readouts ship as packed f32 ([n_layers * n_concepts] per
token, flat). Logprobs use range-selected positions plus a top-k
parameter: empty ranges mean no logprobs, any range means emit the
sampled-token logprob at those positions, and top_k > 0 adds
alternatives.
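A sketch of the two addressing rules above. It assumes the packed buffer is laid out token-major, then layer, then concept, and that logprob ranges are half-open `[start, end)`. `readout_at`, `LogprobEmit`, and `logprob_emit` are hypothetical helpers, not part of the proto.

```rust
/// Read one value from a packed readout buffer, assuming layout
/// [token][layer][concept] with n_layers * n_concepts f32s per token.
pub fn readout_at(
    buf: &[f32],
    n_layers: usize,
    n_concepts: usize,
    token: usize,
    layer: usize,
    concept: usize,
) -> Option<f32> {
    if layer >= n_layers || concept >= n_concepts {
        return None;
    }
    let idx = (token * n_layers + layer) * n_concepts + concept;
    buf.get(idx).copied()
}

#[derive(Debug, PartialEq)]
pub enum LogprobEmit {
    Off,
    Sampled,
    SampledWithTopK(u32),
}

/// Which logprobs a position gets, assuming half-open [start, end) ranges.
pub fn logprob_emit(ranges: &[(u32, u32)], top_k: u32, pos: u32) -> LogprobEmit {
    let selected = ranges.iter().any(|&(start, end)| start <= pos && pos < end);
    match (selected, top_k) {
        (false, _) => LogprobEmit::Off,
        (true, 0) => LogprobEmit::Sampled,
        (true, k) => LogprobEmit::SampledWithTopK(k),
    }
}
```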
Client (src/agent/api/salience.rs):
Tonic-generated types under pb::, a connect() helper, with_auth()
for bearer metadata, and a Session handle wrapping the bidi stream:
open() handshakes SessionReady; append() is fire-and-forget;
generate() returns impl Stream<Item = Event> that drains inbound
until Done or terminating Error. One generate at a time per session.
Peak picker (src/agent/salience.rs):
Pure function over ReadoutEntry traces. Per-concept z-score against
trace global stats; contiguous above-threshold regions emit one
peak at the local max. Configurable sigma threshold and min-std
safety floor. Deterministic tie-break on offset then concept name.
12 unit tests covering empty traces, flat channels, single/multi
spikes, contiguous humps, multi-concept independence, trailing
runs, sub-threshold noise, layer-out-of-range, manifest shape
mismatch, and threshold tunability.
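A simplified sketch of the peak picker over one already-selected layer. It uses the population mean and std of each concept's whole trace and floors the std at `min_std`. `Peak` and `pick_peaks` are hypothetical names; the real function works on `ReadoutEntry` traces and checks them against the manifest.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Peak {
    pub offset: usize,
    pub concept: String,
    pub z: f32,
}

/// traces: one (concept, per-token values) pair per concept for a single layer.
/// A contiguous run with z >= sigma yields one peak at the run's maximum.
pub fn pick_peaks(traces: &[(&str, Vec<f32>)], sigma: f32, min_std: f32) -> Vec<Peak> {
    let mut peaks = Vec::new();
    for (concept, trace) in traces {
        if trace.is_empty() {
            continue;
        }
        // Global stats over the whole trace for this concept.
        let n = trace.len() as f32;
        let mean = trace.iter().sum::<f32>() / n;
        let var = trace.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
        // Floor the std so near-flat channels don't turn noise into huge z-scores.
        let std = var.sqrt().max(min_std);

        let mut best: Option<(usize, f32)> = None; // max of the current run
        for (i, &x) in trace.iter().enumerate() {
            let z = (x - mean) / std;
            if z >= sigma {
                // Strict > keeps the earliest offset when values tie.
                if best.map_or(true, |(_, bz)| z > bz) {
                    best = Some((i, z));
                }
            } else if let Some((offset, z)) = best.take() {
                peaks.push(Peak { offset, concept: concept.to_string(), z });
            }
        }
        // A run that reaches the end of the trace still counts.
        if let Some((offset, z)) = best {
            peaks.push(Peak { offset, concept: concept.to_string(), z });
        }
    }
    // Deterministic order: offset first, then concept name.
    peaks.sort_by(|a, b| a.offset.cmp(&b.offset).then_with(|| a.concept.cmp(&b.concept)));
    peaks
}
```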
TLS (src/agent/api/http.rs):
HttpClient::build now also loads every .pem file under
~/.consciousness/certs/ into the rustls root store — so dropping
a <host>.pem in that directory is enough to trust a new self-
signed server; no code changes per new host. Also installs the
rustls default crypto provider explicitly via OnceLock: tonic's
tls features pulled in both ring and aws-lc-rs on the resolver
path, and rustls 0.23 refuses to auto-pick when either could win.
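The directory scan can be sketched with std alone. `pem_files` is a hypothetical helper, and the step that parses each file (via rustls-pemfile) into the rustls root store is left out. Sorting is an assumption, added so load order is deterministic.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Collect every *.pem file directly under `dir`, sorted by path.
/// A missing or unreadable directory just means "no extra roots".
pub fn pem_files(dir: &Path) -> Vec<PathBuf> {
    let Ok(entries) = fs::read_dir(dir) else {
        return Vec::new();
    };
    let mut out: Vec<PathBuf> = entries
        .filter_map(|e| e.ok())
        .map(|e| e.path())
        .filter(|p| p.is_file() && p.extension().map_or(false, |ext| ext == "pem"))
        .collect();
    out.sort();
    out
}
```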
Build (build.rs, Cargo.toml):
tonic-build generates Rust types from proto/salience.proto at
cargo-build time, using a vendored protoc binary
(protoc-bin-vendored) so no system install is required. New
runtime deps: tonic, prost, async-stream, tokio-stream,
rustls-pemfile.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
StreamToken::Token is now a struct variant with an optional
TokenReadout (shape [n_layers][n_concepts]) per token — parsed from
the vLLM completion response's choices[i].readout field when the
server has readout enabled.
ApiClient gains a fetch_readout_manifest() method that hits
GET /v1/readout/manifest. Returns Ok(None) on 404 (server has
readout disabled), so callers can gracefully fall back when pointed
at a non-readout-enabled endpoint.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
stream_completion was a thin wrapper around stream_completion_mm (just
passing an empty image list); the last caller switched to _mm directly
when learn's generate_alternate gained image support. Delete the
wrapper — callers can pass `&[]` if they have no images.
MindState::dmn_tick has been sitting unused (called only from a
commented-out block in the Mind loop). Rename to _dmn_tick so the
compiler stops warning; Kent may uncomment the call path later.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Split the prompt assembly into two forms: the AST keeps the
fully-expanded representation (N image_pads per image, for accurate
context budget accounting), while the request wire form collapses
each image to a single <|image_pad|> bookended by vision_start/end
and ships the raw bytes out-of-band as a base64 data URI in a new
`multi_modal_data.image` field on /v1/completions.
vLLM's Qwen3VL processor uses PromptReplacement with target=single
<|image_pad|> and replacement=N image_pads, so the wire form matches
what the processor expects and it re-expands to N server-side.
Server side needs /v1/completions to accept multi_modal_data for
this to land images end-to-end — that's the next piece.
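A sketch of the AST-to-wire collapse. The token IDs below are made up for illustration (the real IDs for vision_start, vision_end, and `<|image_pad|>` come from the Qwen tokenizer), and the base64 data-URI payload is not shown.

```rust
// Hypothetical token IDs; real values come from the tokenizer.
pub const VISION_START: u32 = 1001;
pub const VISION_END: u32 = 1002;
pub const IMAGE_PAD: u32 = 1003;

/// AST form -> wire form: inside each vision_start..vision_end span, collapse
/// the run of N image_pad tokens to exactly one. The AST keeps all N for
/// budget accounting; the server's processor re-expands the single pad.
pub fn collapse_image_pads(ast: &[u32]) -> Vec<u32> {
    let mut out = Vec::with_capacity(ast.len());
    let mut in_vision = false;
    for &t in ast {
        match t {
            VISION_START => {
                in_vision = true;
                out.push(t);
            }
            VISION_END => {
                in_vision = false;
                out.push(t);
            }
            // Keep only the first pad of each run inside a vision span.
            IMAGE_PAD if in_vision => {
                if out.last() != Some(&IMAGE_PAD) {
                    out.push(t);
                }
            }
            _ => out.push(t),
        }
    }
    out
}
```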
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Byte-position truncation (&s[..s.len().min(N)]) panics when position
N lands inside a multi-byte character. Fixed in parser debug logging,
API error messages, oneshot response logging, and CLI agent display.
Also fixed tool dispatch permissions (removed global fallback).
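A boundary-safe replacement for `&s[..s.len().min(N)]` backs off to the nearest char boundary at or below N. `truncate_utf8` is a hypothetical name for such a helper.

```rust
/// Truncate to at most `max` bytes without splitting a UTF-8 character.
pub fn truncate_utf8(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For example, `"héllo"` truncated to 2 bytes yields `"h"`, because byte 2 falls inside the two-byte `é`.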
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Made StreamToken pub (was pub(crate), needed by context.rs).
Removed dead API_CLIENT, get_client, sampling/priority fields
from oneshot. Suppressed pre-existing SkipIndex warning in learn.rs.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
ResponseParser::run() spawns a task that reads StreamTokens, parses
into the AST (locking context per token), and sends PendingToolCalls
through a channel. Returns (tool_rx, JoinHandle<Result>) — the turn
loop dispatches tool calls and awaits the handle for error checking.
Token IDs from vLLM are accumulated alongside text and stored directly
on AST leaves — no local re-encoding on the response path.
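The task/channel shape of `ResponseParser::run` can be sketched with std threads and `mpsc` standing in for tokio. The `<tool_call>` tag strings, `PendingToolCall` with a plain `body`, and appending raw text to a `Mutex<String>` instead of parsing into an AST are all simplifying assumptions.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical tag strings for this sketch.
const OPEN: &str = "<tool_call>";
const CLOSE: &str = "</tool_call>";

#[derive(Debug, PartialEq)]
pub struct PendingToolCall {
    pub body: String,
}

/// std-thread analog of ResponseParser::run. The worker appends each token to
/// the shared context (locking per token) and sends every completed tool call
/// through the channel as soon as its closing tag arrives. The JoinHandle
/// carries the final result for error checking.
pub fn run(
    tokens: Vec<String>,
    context: Arc<Mutex<String>>,
) -> (Receiver<PendingToolCall>, JoinHandle<Result<usize, String>>) {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || -> Result<usize, String> {
        let mut buf = String::new();
        let mut scan_from = 0; // everything before this offset is already parsed
        for tok in tokens {
            // Hold the context lock only long enough to append this token.
            context
                .lock()
                .map_err(|_| "context lock poisoned".to_string())?
                .push_str(&tok);
            buf.push_str(&tok);
            // A tag split across tokens is found once the rest of it arrives.
            while let Some(rel) = buf[scan_from..].find(OPEN) {
                let body_start = scan_from + rel + OPEN.len();
                let Some(len) = buf[body_start..].find(CLOSE) else { break };
                let body = buf[body_start..body_start + len].to_string();
                scan_from = body_start + len + CLOSE.len();
                tx.send(PendingToolCall { body })
                    .map_err(|_| "turn loop dropped the receiver".to_string())?;
            }
        }
        Ok(buf.len())
    });
    (rx, handle)
}
```

The caller drains `rx` to dispatch tool calls, then joins the handle to surface any parse error.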
The turn loop no longer matches on individual stream events. It just
reads tool calls and dispatches them.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The API is now two files: mod.rs (430 lines) and http.rs. Together they contain:
Usage, StreamToken, SamplingParams, ApiClient, stream_completions,
SseReader, send_and_check. Everything else is dead and gone.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Removed all chat completions wire types that are no longer used:
ChatRequest, ReasoningConfig, ChatCompletionChunk, ChunkChoice,
Delta, FunctionCallDelta, ToolCallDelta, append_content, user_with_images.
Remaining types in api/types.rs are transitional (Message, ToolCall, etc.)
— they'll go away as outer callers migrate to AstNode.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Deleted: api/parsing.rs entirely (parsing now in context_new.rs),
stream_events (chat completions path), collect_stream, build_response_message,
log_diagnostics, tools_to_json_str, start_stream, chat_completion_stream_temp.
API layer is now just: stream_completion (token IDs in/out), SseReader,
send_and_check, and types. Zero errors in api/.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Work in progress. New turn loop uses ResponseParser + StreamToken.
Killed StreamEvent, append_streaming, finalize_streaming, streaming_index,
assemble_api_messages, working_stack. Many methods still reference old
types — fixing next.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
AstNode fields are now private with read-only accessors. All mutation
goes through ContextState methods (push, set_message, set_score, del)
which guarantee token_ids stays in sync with text on every leaf.
Also fix ResponseParser to use AstNode::tool_call() constructor,
widen parsing module visibility to pub(crate).
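A sketch of the invariant, assuming a toy one-token-per-byte tokenizer. Because the fields are private to the module, code outside it can only change a leaf's text through `ContextState`, which re-tokenizes in the same step. The method signatures here are hypothetical.

```rust
pub mod ast {
    /// Hypothetical toy tokenizer: one token per byte.
    fn tokenize(text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }

    /// Fields are private: code outside `ast` can read them but not set them.
    pub struct AstNode {
        text: String,
        token_ids: Vec<u32>,
    }

    impl AstNode {
        pub fn text(&self) -> &str {
            &self.text
        }
        pub fn token_ids(&self) -> &[u32] {
            &self.token_ids
        }
    }

    #[derive(Default)]
    pub struct ContextState {
        nodes: Vec<AstNode>,
    }

    impl ContextState {
        /// Append a leaf, tokenizing its text at the same time. Returns its index.
        pub fn push(&mut self, text: &str) -> usize {
            self.nodes.push(AstNode { text: text.to_string(), token_ids: tokenize(text) });
            self.nodes.len() - 1
        }

        /// Replace a leaf's text; token_ids are recomputed in the same call.
        pub fn set_message(&mut self, idx: usize, text: &str) -> bool {
            match self.nodes.get_mut(idx) {
                Some(node) => {
                    node.text = text.to_string();
                    node.token_ids = tokenize(text);
                    true
                }
                None => false,
            }
        }

        pub fn del(&mut self, idx: usize) -> Option<AstNode> {
            (idx < self.nodes.len()).then(|| self.nodes.remove(idx))
        }

        pub fn get(&self, idx: usize) -> Option<&AstNode> {
            self.nodes.get(idx)
        }
    }
}
```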
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
New stream_completions() in openai.rs sends prompt as token IDs to
the completions endpoint instead of JSON messages to chat/completions.
Handles <think> tags in the response (split into Reasoning events)
and stops on <|im_end|> token.
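A sketch of the routing rule, assuming each tag arrives as its own decoded token (a real stream may split tags across chunks and would need buffering). `Event` and `route_tokens` are hypothetical names, not the actual stream event types.

```rust
#[derive(Debug, PartialEq)]
pub enum Event {
    Content(String),
    Reasoning(String),
}

/// Route decoded tokens: text inside <think>...</think> becomes Reasoning,
/// everything else Content, and decoding stops at <|im_end|>.
pub fn route_tokens(tokens: &[&str]) -> Vec<Event> {
    let mut events: Vec<Event> = Vec::new();
    let mut thinking = false;
    for &tok in tokens {
        match tok {
            "<|im_end|>" => break,
            "<think>" => thinking = true,
            "</think>" => thinking = false,
            text => {
                // Coalesce consecutive text of the same kind into one event.
                let same_kind = matches!(
                    (events.last(), thinking),
                    (Some(Event::Reasoning(_)), true) | (Some(Event::Content(_)), false)
                );
                if same_kind {
                    if let Some(Event::Reasoning(s) | Event::Content(s)) = events.last_mut() {
                        s.push_str(text);
                    }
                } else if thinking {
                    events.push(Event::Reasoning(text.to_string()));
                } else {
                    events.push(Event::Content(text.to_string()));
                }
            }
        }
    }
    events
}
```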
start_stream_completions() on ApiClient provides the same interface
as start_stream() but takes token IDs instead of Messages.
The turn loop in Agent::turn() uses completions when the tokenizer
is initialized, falling back to the chat API otherwise. This allows
gradual migration — consciousness uses completions (Qwen tokenizer),
Claude Code hook still uses chat API (Anthropic).
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
StreamResult now includes accumulated reasoning text. After each
stream completes, if reasoning was produced, a Thinking entry is
pushed to the conversation before the response message.
Reasoning content is visible in the context tree UI but not sent
back to the API and doesn't count against the token budget.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Only Message, Role, MessageContent, ContentPart, ToolCall,
FunctionCall, Usage, ImageUrl are pub-exported from agent::api.
Internal types (ChatRequest, ChatCompletionChunk, ChunkChoice,
Delta, ReasoningConfig, ToolCallDelta, FunctionCallDelta) are
pub(crate) — invisible outside the crate.
All callers updated to import from agent::api:: instead of
agent::api::types::.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
New src/agent/api/http.rs: ~240 lines, supports GET/POST, JSON/form
bodies, SSE streaming via chunk(), TLS via rustls. No tracing dep.
Removes reqwest from the main crate and telegram channel crate.
Cargo.lock drops ~900 lines of transitive dependencies.
tracing now only pulled in by tui-markdown.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
- Fix sync logic to only break at matching assistant messages
- When assistant message changes (streaming → final), properly pop and re-display
- Add debug logging for sync operations (can be removed later)
The bug: when tool calls split an assistant response into multiple entries,
the sync logic was breaking at the assistant even when it didn't match,
causing the old display to remain while new entries were added on top.
The fix: only break at assistant if matches=true, ensuring changed entries
are properly popped before re-adding.
Co-Authored-By: ProofOfConcept <poc@bcachefs.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Streaming text now goes directly to agent entries via append_streaming().
sync_from_agent diffs the growing entry each tick. The streaming entry
is popped when the response completes; build_response_message pushes
the final version.
All status feedback uses RAII ActivityGuards:
- push_activity() for long-running work (thinking, streaming, scoring)
- notify() for instant feedback (compacted, DMN state changes, commands)
- Guards auto-remove on Drop, appending "(complete)" and lingering 5s
- expire_activities() cleans up timed-out notifications on render tick
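A std-only sketch of the guard pattern. `Activities` and `Activity` are hypothetical types, and `expire_activities` takes the current time as a parameter (an assumption) so the linger logic can be tested without sleeping.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// How long a finished activity or a notification stays visible.
const LINGER: Duration = Duration::from_secs(5);

pub struct Activity {
    pub id: u64,
    pub label: String,
    /// None while the work is still running.
    pub expires: Option<Instant>,
}

/// Shared list read by the UI; the u64 is the last-issued id.
#[derive(Clone, Default)]
pub struct Activities {
    inner: Arc<Mutex<(u64, Vec<Activity>)>>,
}

/// On Drop, marks its entry complete and starts the linger timer;
/// expire_activities removes it later.
pub struct ActivityGuard {
    id: u64,
    list: Activities,
}

impl Activities {
    fn add(&self, label: &str, expires: Option<Instant>) -> u64 {
        let mut g = self.inner.lock().unwrap();
        g.0 += 1;
        let id = g.0;
        g.1.push(Activity { id, label: label.to_string(), expires });
        id
    }

    /// Long-running work: visible until the returned guard is dropped.
    pub fn push_activity(&self, label: &str) -> ActivityGuard {
        ActivityGuard { id: self.add(label, None), list: self.clone() }
    }

    /// Instant feedback: lingers for LINGER, no guard needed.
    pub fn notify(&self, label: &str) {
        self.add(label, Some(Instant::now() + LINGER));
    }

    /// Called on the render tick: drop entries whose linger has elapsed.
    pub fn expire_activities(&self, now: Instant) {
        self.inner.lock().unwrap().1.retain(|a| a.expires.map_or(true, |t| now < t));
    }

    pub fn labels(&self) -> Vec<String> {
        self.inner.lock().unwrap().1.iter().map(|a| a.label.clone()).collect()
    }
}

impl Drop for ActivityGuard {
    fn drop(&mut self) {
        // Ignore a poisoned lock rather than panicking inside Drop.
        if let Ok(mut g) = self.list.inner.lock() {
            if let Some(a) = g.1.iter_mut().find(|a| a.id == self.id) {
                a.label.push_str(" (complete)");
                a.expires = Some(Instant::now() + LINGER);
            }
        }
    }
}
```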
UiMessage enum reduced to a single Info variant with zero sends.
The channel infrastructure remains for now (Mind/Agent still take
UiSender in signatures) — mechanical cleanup for a follow-up.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Reasoning tokens: dropped for now, will land in context entries later.
Debug sends: converted to dbglog! macro (writes to debug.log).
Activity: now a field on Agent, set directly, read by UI via try_lock.
score_memories_incremental takes agent Arc for activity writes.
UiMessage down to 2 variants: TextDelta, Info.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Status bar reads directly from Agent and MindState on each render tick.
Activity is now a field on Agent — set by agent code directly, read by
UI via try_lock. DmnAnnotation, ContextInfoUpdate, AgentUpdate were
already dead (no senders).
UiMessage down to 4 variants: TextDelta, Reasoning, Debug, Info.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The std::sync::Mutex detour caught every place a MutexGuard lived
across an await point in Agent::turn — the compiler enforced Send
safety that tokio::sync::Mutex silently allows. With those fixed,
switch back to tokio::sync::Mutex (std::sync blocks tokio worker
threads and panics inside the runtime).
Input and command dispatch now live in InteractScreen (chat.rs):
- Enter pushes directly to SharedMindState.input (no app.submitted hop)
- sync_from_agent displays pending input with dimmed color
- Slash command table moved from event_loop.rs to chat.rs
- cmd_switch_model kept as pub fn for tool-initiated switches
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The agent lock is never held across await points — turns lock briefly,
do work, drop, then do async API calls. std::sync::Mutex works and
can be locked from sync contexts (screen tick inside terminal.draw).
Fixes: blocking_lock() panic when called inside tokio runtime.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Move the entire stream event processing loop (content accumulation,
leaked tool call detection/dispatch, ToolCallDelta assembly, UI
forwarding, display buffering) into api::collect_stream(). The turn
loop now calls collect_stream() and processes the StreamResult.
Also move FunctionCall, ToolCall, ToolCallDelta to api/types.rs where
they belong (API wire format, not tool definitions). Move parsing.rs
to api/parsing.rs.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Move FunctionCall, FunctionCallDelta, ToolCall, ToolCallDelta from
tools/mod.rs to api/types.rs — these are API wire format, not tool
definitions. Re-export from tools for existing callers.
Move parsing.rs to api/parsing.rs — leaked tool call parsing is API
plumbing.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
ToolDef and FunctionDef are gone. Tool definitions are static strings
on the Tool struct. The API layer builds JSON from Tool::to_json().
- ChatRequest.tools is now Option<serde_json::Value>
- start_stream takes &[Tool] instead of Option<&[ToolDef]>
- openai::stream_events takes &serde_json::Value for tools
- memory_and_journal_tools() returns Vec<Tool> for subconscious agents
- Subconscious agents filter by t.name instead of t.function.name
No more runtime JSON construction for tool definitions.
No more ToolDef::new(). No more FunctionDef.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Move temperature from a per-call parameter to an Agent field,
add top_p and top_k. All three are sent to the API via a new
SamplingParams struct, displayed on the F5 thalamus screen.
Defaults: temperature=0.6, top_p=0.95, top_k=20 (Qwen3.5 defaults).
Also adds top_p and top_k to ChatRequest so they're sent in the
API payload. Previously only temperature was sent.
UI controls for adjusting these at runtime are not yet implemented.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Mechanical rename: src/agent/ -> src/user/, all crate::agent:: ->
crate::user:: references updated. Binary poc-agent renamed to
consciousness with CLI name and user-facing strings updated.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
/score snapshots the context and client, releases the agent lock,
runs scoring in background. Only one score task at a time
(scoring_in_flight flag). Results stored on Agent and shown on
the F10 context debug screen with importance scores per memory.
ApiClient derives Clone. ContextState derives Clone.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
score_memories() drops each memory from the context one at a time,
runs prompt_logprobs against the full conversation, and builds a
divergence matrix: memories × responses.
Row sums = memory importance (for graph weight updates)
Column sums = response memory-dependence (training candidates)
Uses vLLM's prompt_logprobs to check "would the model have said
this without this memory?" — one forward pass per memory, all
responses scored at once. ~3s per memory on B200.
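Given the matrix, the two aggregates are plain row and column sums. This sketch assumes rows are memories and columns are responses, and leaves the per-cell divergence (computed from prompt_logprobs) abstract; `importance_and_dependence` is a hypothetical name.

```rust
/// m[i][j] = divergence of response j when memory i is dropped (assumed layout).
/// Returns (row sums = memory importance, column sums = response dependence).
pub fn importance_and_dependence(m: &[Vec<f64>]) -> (Vec<f64>, Vec<f64>) {
    let n_resp = m.first().map_or(0, |r| r.len());
    let rows: Vec<f64> = m.iter().map(|r| r.iter().sum()).collect();
    let mut cols = vec![0.0; n_resp];
    for row in m {
        assert_eq!(row.len(), n_resp, "ragged divergence matrix");
        for (c, v) in cols.iter_mut().zip(row) {
            *c += v;
        }
    }
    (rows, cols)
}
```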
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Show chunks received, SSE lines parsed, and the contents of
the line buffer (up to 500 bytes) on both stream errors and
timeouts. This tells us whether we got partial data, a non-SSE
response, or truly nothing from the server.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Stream chunk timeout is now api_stream_timeout_secs in config
(default 60s). Status bar shows total turn time and per-call
time with timeout: "thinking... 45s, 12/60s".
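A sketch of the status formatting, assuming whole seconds truncated toward zero. `Config` and `status_line` are hypothetical names; only the `api_stream_timeout_secs` field and its 60s default come from this entry.

```rust
use std::time::Duration;

/// Hypothetical config holder; only the field named in the commit is shown.
pub struct Config {
    pub api_stream_timeout_secs: u64,
}

impl Default for Config {
    fn default() -> Self {
        Config { api_stream_timeout_secs: 60 }
    }
}

/// Total turn time, then current-call time over the configured timeout.
pub fn status_line(turn: Duration, call: Duration, cfg: &Config) -> String {
    format!(
        "thinking... {}s, {}/{}s",
        turn.as_secs(),
        call.as_secs(),
        cfg.api_stream_timeout_secs
    )
}
```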
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Spawned streaming tasks were never cancelled when a turn ended or
retried, leaving zombie tasks blocked on dead vLLM connections.
AbortOnDrop wrapper aborts the task when it goes out of scope.
Chunk timeout reduced from 120s to 60s.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>