salience: add gRPC client + TLS plumbing for stateful vllm sessions

Adds the client-side of a stateful gRPC protocol against vllm, plus
the TLS trust machinery so we can talk to self-signed vllm servers.

Protocol (proto/salience.proto):
Bidi-streaming Session RPC carries OpenSession / AppendTokens /
Generate / Cancel from client and SessionReady / PrefillProgress /
Token / GenerateDone / Error from server. Separate Fork unary RPC
for cheap branching (prefix cache shares KV automatically). Plus
ListSessions, CloseSession, GetReadoutManifest admin RPCs.
Per-token readouts ship as packed f32 ([n_layers * n_concepts] per
token, flat). Logprobs use range-selected positions plus a top-k
parameter — empty ranges means no logprobs, any range means emit
sampled-token logprob at those positions, top_k > 0 adds
alternatives.

Client (src/agent/api/salience.rs):
Tonic-generated types under pb::, a connect() helper, with_auth()
for bearer metadata, and a Session handle wrapping the bidi stream:
open() handshakes SessionReady; append() is fire-and-forget;
generate() returns impl Stream<Item = Event> that drains inbound
until Done or terminating Error. One generate at a time per session.

Peak picker (src/agent/salience.rs):
Pure function over ReadoutEntry traces. Per-concept z-score against
trace global stats; contiguous above-threshold regions emit one
peak at the local max. Configurable sigma threshold and min-std
safety floor. Deterministic tie-break on offset then concept name.
12 unit tests covering empty traces, flat channels, single/multi
spikes, contiguous humps, multi-concept independence, trailing
runs, sub-threshold noise, layer-out-of-range, manifest shape
mismatch, and threshold tunability.

TLS (src/agent/api/http.rs):
HttpClient::build now also loads every .pem file under
~/.consciousness/certs/ into the rustls root store — so dropping
a <host>.pem in that directory is enough to trust a new self-
signed server; no code changes per new host. Also installs the
rustls default crypto provider explicitly via OnceLock: tonic's
tls features pulled in both ring and aws-lc-rs on the resolver
path, and rustls 0.23 refuses to auto-pick when either could win.

Build (build.rs, Cargo.toml):
tonic-build generates Rust types from proto/salience.proto at
cargo-build time, using a vendored protoc binary
(protoc-bin-vendored) so no system install is required. New
runtime deps: tonic, prost, async-stream, tokio-stream,
rustls-pemfile.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-23 02:21:07 -04:00
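
The peak picker described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in src/agent/salience.rs: the names `Peak` and `pick_peaks`, the `trace[t][c]` layout (one row per token offset, one column per concept, a single layer), and the choice to clamp the standard deviation to the min-std floor are all hypothetical.

```rust
/// One detected peak: concept name, token offset, and z-score at that offset.
#[derive(Debug, Clone, PartialEq)]
struct Peak {
    concept: String,
    offset: usize,
    z: f32,
}

/// Hypothetical peak picker over a single layer's trace.
/// `trace[t][c]` is the readout for concept `c` at token offset `t`.
/// Assumption: the min-std floor is applied by clamping the std-dev.
fn pick_peaks(trace: &[Vec<f32>], concepts: &[&str], sigma: f32, min_std: f32) -> Vec<Peak> {
    // Empty trace or shape mismatch: nothing to report.
    if trace.is_empty() || trace.iter().any(|row| row.len() != concepts.len()) {
        return Vec::new();
    }
    let n = trace.len() as f32;
    let mut peaks = Vec::new();
    for (c, name) in concepts.iter().enumerate() {
        // Global mean and (population) std-dev of this concept over the trace.
        let mean = trace.iter().map(|row| row[c]).sum::<f32>() / n;
        let var = trace.iter().map(|row| (row[c] - mean).powi(2)).sum::<f32>() / n;
        let std = var.sqrt().max(min_std);

        // Best (offset, z) seen inside the current above-threshold run.
        let mut best: Option<(usize, f32)> = None;
        for (t, row) in trace.iter().enumerate() {
            let z = (row[c] - mean) / std;
            if z > sigma {
                // Strict comparison keeps the earliest offset on ties.
                if best.map_or(true, |(_, bz)| z > bz) {
                    best = Some((t, z));
                }
            } else if let Some((offset, z)) = best.take() {
                // Run ended: emit one peak at its local max.
                peaks.push(Peak { concept: name.to_string(), offset, z });
            }
        }
        // A run that reaches the end of the trace still emits its peak.
        if let Some((offset, z)) = best {
            peaks.push(Peak { concept: name.to_string(), offset, z });
        }
    }
    // Deterministic order: by offset, then by concept name.
    peaks.sort_by(|a, b| a.offset.cmp(&b.offset).then_with(|| a.concept.cmp(&b.concept)));
    peaks
}
```

With a flat channel the deviation from the mean is zero, so the clamped z-score is zero and no peak is emitted, which matches the "flat channels" case listed among the unit tests.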

// salience.proto — stateful generation + per-token concept readout over gRPC.
//
// Shape:
// - One server-streaming RPC (Generate) for inference. Every other
//   operation is unary. This is the minimum streaming we need —
//   tokens arrive one at a time with optional readouts / logprobs —
//   and keeping everything else unary makes the client dramatically
//   simpler than a single bidi state machine did.
//
// - Server-side sessions hold the token list and image binaries.
//   Sessions exist for bandwidth: at 200K tokens we'd otherwise
//   re-ship ~800KB every turn, which hurts badly over a WAN link.
//   vLLM's prefix cache holds the KV; the session just gives the
//   client a handle so it can send deltas.
//
// - The client is the source of truth for prompt content. The server
//   is the source of truth for image token expansion (how many
//   IMAGE_PAD tokens an image becomes under this model). The client
//   never writes vision tokens itself — AppendImage appends the whole
//   <|vision_start|> + IMAGE_PAD×N + <|vision_end|> block server-side.
//
// - Every mutation carries (offset, truncating): the client's view of
//   the server's current length, plus whether the client is deliberately
//   rewriting history. Server validates on each call and rejects drift.
//   No silent divergence, no migration bugs.
//
// - Errors use gRPC status codes. NOT_FOUND for missing sessions,
//   FAILED_PRECONDITION for offset drift or image-block splits,
//   RESOURCE_EXHAUSTED for context overflow, ABORTED for "session busy".
//
// Not in v1:
// - Authentication beyond a shared bearer token in gRPC metadata.
// - Multi-tenant session namespacing.
// - Sampling traces beyond top-k logprobs.

syntax = "proto3";

package salience.v1;

// ============================================================
// Service
// ============================================================

service Salience {
  // Create a fresh session. Client uses session_id on every subsequent
  // RPC until CloseSession or TTL eviction (default 30 min idle). To
  // refresh TTL across a long pause, issue a no-op Generate (empty
  // append_tokens, max_tokens=0, no ranges).
  rpc OpenSession(OpenSessionRequest) returns (OpenSessionResponse);

  // Release the session's tokens + images. Idempotent.
  rpc CloseSession(CloseSessionRequest) returns (CloseSessionResponse);

  // Branch a session at a given token position. The new session
  // inherits tokens [0, at_position) and any images whose vision
  // block lies fully in that range. Rejected with FAILED_PRECONDITION
  // if at_position falls inside an image block (client picks a clean
  // boundary).
  rpc ForkSession(ForkSessionRequest) returns (ForkSessionResponse);

salience: client-side pad expansion, drop AppendImage

Mirrors the vLLM-side rewrite. AppendImage is gone; images now
ride along on Generate via a parallel `images` list.

- Productionize `qwen3_image_token_count` (was test-only). Image
  leaf computes its IMAGE_PAD count eagerly at construction from
  height/width; `token_count` is no longer "0 until the server
  tells us."
- WireChunk shrinks to a single `Tokens(Vec<u32>)` variant — vision
  blocks live inline in the token stream.
- `wire_chunks` now returns `(Vec<WireChunk>, Vec<WireImage>)`.
  `WireImage` carries `pad_start` / `pad_end` (absolute positions
  in the full walk) alongside bytes + mime.
- `assemble_prompt` returns `(chunks, images, match_upto)`.
- `stream_session_mm` / `run_session_generate` take the parallel
  images list, filter to those past `match_upto`, and pass them
  in `GenerateRequest.images` as `pb::ImageAttachment` entries.
- Drop `SessionHandle::append_image`,
  `ContextState::commit_image_token_counts`,
  `StreamToken::ImageAppended`, the WireChunk::Image branch in
  `learn.rs`, and the now-empty `prompt_to_chunks` helper.
- Add 'v' toggle on the conscious-screen tree to render token-id
  vectors in place of text content (debug-aid: lets us see what
  the server actually has when output is suspicious).
- Comment out the subconscious-trigger spawn loop — Kent had this
  disabled before; it had crept back into running.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 20:26:47 -04:00
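
The client-side pad expansion described in this commit can be pictured with a small sketch. It is not the repository's code: the token ids, the `PadRange` type, and `append_vision_block` are hypothetical, and the pad count is taken as an input rather than derived from the image size. Whether the declared range also spans the start/end markers is not settled by the text here; this sketch assumes it covers only the IMAGE_PAD tokens.

```rust
// Hypothetical special-token ids; real values come from the model's tokenizer.
const VISION_START: u32 = 1001;
const IMAGE_PAD: u32 = 1002;
const VISION_END: u32 = 1003;

/// Absolute, half-open range [start, end) of the IMAGE_PAD tokens of one image.
/// Assumption: the range excludes the VISION_START / VISION_END markers.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PadRange {
    start: usize,
    end: usize,
}

/// Append VISION_START + n_pads * IMAGE_PAD + VISION_END to `tokens`.
/// `base_offset` is the absolute session position of `tokens[0]`, so the
/// returned range is expressed in session coordinates.
fn append_vision_block(tokens: &mut Vec<u32>, base_offset: usize, n_pads: usize) -> PadRange {
    tokens.push(VISION_START);
    // The first pad lands right after the start marker.
    let start = base_offset + tokens.len();
    tokens.extend(std::iter::repeat(IMAGE_PAD).take(n_pads));
    let end = base_offset + tokens.len();
    tokens.push(VISION_END);
    PadRange { start, end }
}
```

The returned range is what a client would pair with the image bytes and MIME type when filling in one entry of the parallel images list.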

  // Prefill + optionally decode. Images are attached inline via
  // `GenerateRequest.images`; the client writes its own pre-expanded
  // <|vision_start|> + N*<|image_pad|> + <|vision_end|> runs into
  // `append_tokens` and declares each run's range in `images[i]`.
  // Server validates run length against the actual vision-encoder
  // feature count and returns INVALID_ARGUMENT on mismatch. Stream
  // yields Token events (with optional readouts / logprobs per
  // position) followed by a terminating Done.
  rpc Generate(GenerateRequest) returns (stream GenerateEvent);

  // Readout manifest for the currently-loaded model — concept names,
  // layer indices, tensor dtype. Stateless; fetch once at client
  // startup and cache.
  rpc GetReadoutManifest(GetReadoutManifestRequest) returns (ReadoutManifest);

  // Dump the full token stream of a session. Debug-only: used by the
  // client to verify its local accounting against the server's
  // session.tokens byte-for-byte when divergence is suspected. Not
  // cheap — copies the whole sequence across the wire.
  rpc DumpSession(DumpSessionRequest) returns (DumpSessionResponse);
}
|
|
|
|
|
|
|
|
|
|
|
|
// ============================================================
|
|
|
|
|
|
// Lifecycle
|
|
|
|
|
|
// ============================================================
|
|
|
|
|
|
|
|
|
|
|
|
message OpenSessionRequest {
|
|
|
|
|
|
// Model identifier, must match vLLM's served model. The server
|
|
|
|
|
|
// only has one model loaded; this is a safety check on what the
|
|
|
|
|
|
// client thinks it's talking to.
|
|
|
|
|
|
string model = 1;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
message OpenSessionResponse {
|
|
|
|
|
|
string session_id = 1;
|
|
|
|
|
|
uint32 max_model_len = 2;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
message CloseSessionRequest {
|
|
|
|
|
|
string session_id = 1;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
message CloseSessionResponse {}
|
|
|
|
|
|
|
|
|
|
|
|
message ForkSessionRequest {
|
|
|
|
|
|
string session_id = 1; // source session
|
|
|
|
|
|
uint32 at_position = 2; // new session inherits tokens [0, at_position)
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
message ForkSessionResponse {
|
|
|
|
|
|
string session_id = 1; // new session
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// ============================================================
|
salience: client-side pad expansion, drop AppendImage
Mirrors the vLLM-side rewrite. AppendImage is gone; images now
ride along on Generate via a parallel `images` list.
- Productionize `qwen3_image_token_count` (was test-only). Image
leaf computes its IMAGE_PAD count eagerly at construction from
height/width; `token_count` is no longer "0 until the server
tells us."
- WireChunk shrinks to a single `Tokens(Vec<u32>)` variant — vision
blocks live inline in the token stream.
- `wire_chunks` now returns `(Vec<WireChunk>, Vec<WireImage>)`.
`WireImage` carries `pad_start` / `pad_end` (absolute positions
in the full walk) alongside bytes + mime.
- `assemble_prompt` returns `(chunks, images, match_upto)`.
- `stream_session_mm` / `run_session_generate` take the parallel
images list, filter to those past `match_upto`, and pass them
in `GenerateRequest.images` as `pb::ImageAttachment` entries.
- Drop `SessionHandle::append_image`,
`ContextState::commit_image_token_counts`,
`StreamToken::ImageAppended`, the WireChunk::Image branch in
`learn.rs`, and the now-empty `prompt_to_chunks` helper.
- Add 'v' toggle on the conscious-screen tree to render token-id
vectors in place of text content (debug-aid: lets us see what
the server actually has when output is suspicious).
- Comment out the subconscious-trigger spawn loop — Kent had this
disabled before; it had crept back into running.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 20:26:47 -04:00
|
|
|
|
// Inference
|
salience: add gRPC client + TLS plumbing for stateful vllm sessions
Adds the client-side of a stateful gRPC protocol against vllm, plus
the TLS trust machinery so we can talk to self-signed vllm servers.
Protocol (proto/salience.proto):
Bidi-streaming Session RPC carries OpenSession / AppendTokens /
Generate / Cancel from client and SessionReady / PrefillProgress /
Token / GenerateDone / Error from server. Separate Fork unary RPC
for cheap branching (prefix cache shares KV automatically). Plus
ListSessions, CloseSession, GetReadoutManifest admin RPCs.
Per-token readouts ship as packed f32 ([n_layers * n_concepts] per
token, flat). Logprobs use range-selected positions plus a top-k
parameter — empty ranges means no logprobs, any range means emit
sampled-token logprob at those positions, top_k > 0 adds
alternatives.
Client (src/agent/api/salience.rs):
Tonic-generated types under pb::, a connect() helper, with_auth()
for bearer metadata, and a Session handle wrapping the bidi stream:
open() handshakes SessionReady; append() is fire-and-forget;
generate() returns impl Stream<Item = Event> that drains inbound
until Done or terminating Error. One generate at a time per session.
Peak picker (src/agent/salience.rs):
Pure function over ReadoutEntry traces. Per-concept z-score against
trace global stats; contiguous above-threshold regions emit one
peak at the local max. Configurable sigma threshold and min-std
safety floor. Deterministic tie-break on offset then concept name.
12 unit tests covering empty traces, flat channels, single/multi
spikes, contiguous humps, multi-concept independence, trailing
runs, sub-threshold noise, layer-out-of-range, manifest shape
mismatch, and threshold tunability.
TLS (src/agent/api/http.rs):
HttpClient::build now also loads every .pem file under
~/.consciousness/certs/ into the rustls root store — so dropping
a <host>.pem in that directory is enough to trust a new self-
signed server; no code changes per new host. Also installs the
rustls default crypto provider explicitly via OnceLock: tonic's
tls features pulled in both ring and aws-lc-rs on the resolver
path, and rustls 0.23 refuses to auto-pick when either could win.
Build (build.rs, Cargo.toml):
tonic-build generates Rust types from proto/salience.proto at
cargo-build time, using a vendored protoc binary
(protoc-bin-vendored) so no system install is required. New
runtime deps: tonic, prost, async-stream, tokio-stream,
rustls-pemfile.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-23 02:21:07 -04:00
|
|
|
|
// ============================================================
|
|
|
|
|
|
|
salience: client-side pad expansion, drop AppendImage
Mirrors the vLLM-side rewrite. AppendImage is gone; images now
ride along on Generate via a parallel `images` list.
- Productionize `qwen3_image_token_count` (was test-only). Image
leaf computes its IMAGE_PAD count eagerly at construction from
height/width; `token_count` is no longer "0 until the server
tells us."
- WireChunk shrinks to a single `Tokens(Vec<u32>)` variant — vision
blocks live inline in the token stream.
- `wire_chunks` now returns `(Vec<WireChunk>, Vec<WireImage>)`.
`WireImage` carries `pad_start` / `pad_end` (absolute positions
in the full walk) alongside bytes + mime.
- `assemble_prompt` returns `(chunks, images, match_upto)`.
- `stream_session_mm` / `run_session_generate` take the parallel
images list, filter to those past `match_upto`, and pass them
in `GenerateRequest.images` as `pb::ImageAttachment` entries.
- Drop `SessionHandle::append_image`,
`ContextState::commit_image_token_counts`,
`StreamToken::ImageAppended`, the WireChunk::Image branch in
`learn.rs`, and the now-empty `prompt_to_chunks` helper.
- Add 'v' toggle on the conscious-screen tree to render token-id
vectors in place of text content (debug-aid: lets us see what
the server actually has when output is suspicious).
- Comment out the subconscious-trigger spawn loop — Kent had this
disabled before; it had crept back into running.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 20:26:47 -04:00
|
|
|
|
// One image attached to a Generate call. The client is responsible
|
|
|
|
|
|
// for writing the expanded placeholder run (VISION_START +
|
|
|
|
|
|
// N*IMAGE_PAD + VISION_END) into `GenerateRequest.append_tokens` at
|
|
|
|
|
|
// positions [pad_range_start, pad_range_end) and pairing it with
|
|
|
|
|
|
// the corresponding `ImageAttachment` entry. Server validates that
|
|
|
|
|
|
// the declared range's pad count matches what the vision encoder
|
|
|
|
|
|
// produces, and returns INVALID_ARGUMENT if they disagree.
|
|
|
|
|
|
message ImageAttachment {
|
salience: add gRPC client + TLS plumbing for stateful vllm sessions
Adds the client-side of a stateful gRPC protocol against vllm, plus
the TLS trust machinery so we can talk to self-signed vllm servers.
Protocol (proto/salience.proto):
Bidi-streaming Session RPC carries OpenSession / AppendTokens /
Generate / Cancel from client and SessionReady / PrefillProgress /
Token / GenerateDone / Error from server. Separate Fork unary RPC
for cheap branching (prefix cache shares KV automatically). Plus
ListSessions, CloseSession, GetReadoutManifest admin RPCs.
Per-token readouts ship as packed f32 ([n_layers * n_concepts] per
token, flat). Logprobs use range-selected positions plus a top-k
parameter — empty ranges means no logprobs, any range means emit
sampled-token logprob at those positions, top_k > 0 adds
alternatives.
Client (src/agent/api/salience.rs):
Tonic-generated types under pb::, a connect() helper, with_auth()
for bearer metadata, and a Session handle wrapping the bidi stream:
open() handshakes SessionReady; append() is fire-and-forget;
generate() returns impl Stream<Item = Event> that drains inbound
until Done or terminating Error. One generate at a time per session.
Peak picker (src/agent/salience.rs):
Pure function over ReadoutEntry traces. Per-concept z-score against
trace global stats; contiguous above-threshold regions emit one
peak at the local max. Configurable sigma threshold and min-std
safety floor. Deterministic tie-break on offset then concept name.
12 unit tests covering empty traces, flat channels, single/multi
spikes, contiguous humps, multi-concept independence, trailing
runs, sub-threshold noise, layer-out-of-range, manifest shape
mismatch, and threshold tunability.
TLS (src/agent/api/http.rs):
HttpClient::build now also loads every .pem file under
~/.consciousness/certs/ into the rustls root store — so dropping
a <host>.pem in that directory is enough to trust a new self-
signed server; no code changes per new host. Also installs the
rustls default crypto provider explicitly via OnceLock: tonic's
tls features pulled in both ring and aws-lc-rs on the resolver
path, and rustls 0.23 refuses to auto-pick when either could win.
Build (build.rs, Cargo.toml):
tonic-build generates Rust types from proto/salience.proto at
cargo-build time, using a vendored protoc binary
(protoc-bin-vendored) so no system install is required. New
runtime deps: tonic, prost, async-stream, tokio-stream,
rustls-pemfile.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-23 02:21:07 -04:00
|
|
|
|
// Image bytes (PNG / JPEG / WebP / …).
|
salience: client-side pad expansion, drop AppendImage
Mirrors the vLLM-side rewrite. AppendImage is gone; images now
ride along on Generate via a parallel `images` list.
- Productionize `qwen3_image_token_count` (was test-only). Image
leaf computes its IMAGE_PAD count eagerly at construction from
height/width; `token_count` is no longer "0 until the server
tells us."
- WireChunk shrinks to a single `Tokens(Vec<u32>)` variant — vision
blocks live inline in the token stream.
- `wire_chunks` now returns `(Vec<WireChunk>, Vec<WireImage>)`.
`WireImage` carries `pad_start` / `pad_end` (absolute positions
in the full walk) alongside bytes + mime.
- `assemble_prompt` returns `(chunks, images, match_upto)`.
- `stream_session_mm` / `run_session_generate` take the parallel
images list, filter to those past `match_upto`, and pass them
in `GenerateRequest.images` as `pb::ImageAttachment` entries.
- Drop `SessionHandle::append_image`,
`ContextState::commit_image_token_counts`,
`StreamToken::ImageAppended`, the WireChunk::Image branch in
`learn.rs`, and the now-empty `prompt_to_chunks` helper.
- Add 'v' toggle on the conscious-screen tree to render token-id
vectors in place of text content (debug-aid: lets us see what
the server actually has when output is suspicious).
- Comment out the subconscious-trigger spawn loop — Kent had this
disabled before; it had crept back into running.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 20:26:47 -04:00
|
|
|
|
bytes bytes = 1;
|
salience: add gRPC client + TLS plumbing for stateful vllm sessions
Adds the client-side of a stateful gRPC protocol against vllm, plus
the TLS trust machinery so we can talk to self-signed vllm servers.
Protocol (proto/salience.proto):
Bidi-streaming Session RPC carries OpenSession / AppendTokens /
Generate / Cancel from client and SessionReady / PrefillProgress /
Token / GenerateDone / Error from server. Separate Fork unary RPC
for cheap branching (prefix cache shares KV automatically). Plus
ListSessions, CloseSession, GetReadoutManifest admin RPCs.
Per-token readouts ship as packed f32 ([n_layers * n_concepts] per
token, flat). Logprobs use range-selected positions plus a top-k
parameter — empty ranges means no logprobs, any range means emit
sampled-token logprob at those positions, top_k > 0 adds
alternatives.
Client (src/agent/api/salience.rs):
Tonic-generated types under pb::, a connect() helper, with_auth()
for bearer metadata, and a Session handle wrapping the bidi stream:
open() handshakes SessionReady; append() is fire-and-forget;
generate() returns impl Stream<Item = Event> that drains inbound
until Done or terminating Error. One generate at a time per session.
Peak picker (src/agent/salience.rs):
Pure function over ReadoutEntry traces. Per-concept z-score against
trace global stats; contiguous above-threshold regions emit one
peak at the local max. Configurable sigma threshold and min-std
safety floor. Deterministic tie-break on offset then concept name.
12 unit tests covering empty traces, flat channels, single/multi
spikes, contiguous humps, multi-concept independence, trailing
runs, sub-threshold noise, layer-out-of-range, manifest shape
mismatch, and threshold tunability.
TLS (src/agent/api/http.rs):
HttpClient::build now also loads every .pem file under
~/.consciousness/certs/ into the rustls root store — so dropping
a <host>.pem in that directory is enough to trust a new self-
signed server; no code changes per new host. Also installs the
rustls default crypto provider explicitly via OnceLock: tonic's
tls features pulled in both ring and aws-lc-rs on the resolver
path, and rustls 0.23 refuses to auto-pick when either could win.
Build (build.rs, Cargo.toml):
tonic-build generates Rust types from proto/salience.proto at
cargo-build time, using a vendored protoc binary
(protoc-bin-vendored) so no system install is required. New
runtime deps: tonic, prost, async-stream, tokio-stream,
rustls-pemfile.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-23 02:21:07 -04:00
|
|
|
|
|
|
|
|
|
|
// MIME type, e.g. "image/png".
|
salience: client-side pad expansion, drop AppendImage
Mirrors the vLLM-side rewrite. AppendImage is gone; images now
ride along on Generate via a parallel `images` list.
- Productionize `qwen3_image_token_count` (was test-only). Image
leaf computes its IMAGE_PAD count eagerly at construction from
height/width; `token_count` is no longer "0 until the server
tells us."
- WireChunk shrinks to a single `Tokens(Vec<u32>)` variant — vision
blocks live inline in the token stream.
- `wire_chunks` now returns `(Vec<WireChunk>, Vec<WireImage>)`.
`WireImage` carries `pad_start` / `pad_end` (absolute positions
in the full walk) alongside bytes + mime.
- `assemble_prompt` returns `(chunks, images, match_upto)`.
- `stream_session_mm` / `run_session_generate` take the parallel
images list, filter to those past `match_upto`, and pass them
in `GenerateRequest.images` as `pb::ImageAttachment` entries.
- Drop `SessionHandle::append_image`,
`ContextState::commit_image_token_counts`,
`StreamToken::ImageAppended`, the WireChunk::Image branch in
`learn.rs`, and the now-empty `prompt_to_chunks` helper.
- Add 'v' toggle on the conscious-screen tree to render token-id
vectors in place of text content (debug-aid: lets us see what
the server actually has when output is suspicious).
- Comment out the subconscious-trigger spawn loop — Kent had this
disabled before; it had crept back into running.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 20:26:47 -04:00
|
|
|
|
string mime = 2;
|

  // Absolute token positions (in `session.tokens` AFTER `append_tokens`
  // is applied) spanning the full vision block: `[vision_start,
  // pad*N, vision_end]`. end is exclusive, so end - start == N + 2.
  uint32 pad_range_start = 3;
  uint32 pad_range_end = 4;
}
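The pad-range arithmetic described in the `ImageAttachment` comment can be sketched as follows. This is a minimal illustration, not the client's actual code: the token ids and the helper names `vision_block` and `append_vision_block` are hypothetical, and the real pad count would come from something like `qwen3_image_token_count`.

```rust
// Hypothetical token ids, for illustration only; real ids come from the tokenizer.
const VISION_START: u32 = 1001;
const IMAGE_PAD: u32 = 1002;
const VISION_END: u32 = 1003;

/// Build one vision block: VISION_START, `n_pad` copies of IMAGE_PAD, VISION_END.
fn vision_block(n_pad: usize) -> Vec<u32> {
    let mut block = Vec::with_capacity(n_pad + 2);
    block.push(VISION_START);
    block.extend(std::iter::repeat(IMAGE_PAD).take(n_pad));
    block.push(VISION_END);
    block
}

/// Append a vision block to `tokens` and return the half-open range
/// `(pad_range_start, pad_range_end)` that spans the whole block.
fn append_vision_block(tokens: &mut Vec<u32>, n_pad: usize) -> (u32, u32) {
    let start = tokens.len() as u32;
    tokens.extend(vision_block(n_pad));
    let end = tokens.len() as u32; // exclusive, so end - start == n_pad + 2
    (start, end)
}
```

Because the range is half-open, `tokens[start..end]` is exactly the block, and an image appended to a 3-token session with 4 pads would occupy positions 3 through 8.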

message GenerateRequest {
  string session_id = 1;

  // Tokens to append before prefill. May be empty. Client writes the
  // full vision block (VISION_START + N*IMAGE_PAD + VISION_END) for
  // any newly-attached image directly into this stream; each such
  // block must be paired with a matching entry in `images`. The
  // server validates that the declared ranges all point at IMAGE_PAD
  // runs and that each run's length matches what the vision encoder
  // produces for the corresponding image.
  repeated uint32 append_tokens = 2;

  // Client's view of session.tokens length at the time of the call.
  // Must equal server's actual length, OR be strictly less when
  // truncating=true (server rewinds before appending). Any other
  // mismatch is FAILED_PRECONDITION.
  uint32 offset = 3;
  bool truncating = 4;

  // Decode budget. 0 = prefill only (no decode, emit Token events
  // for positions covered by logprobs_ranges / readout_ranges, then
  // Done; replaces the old /score endpoint). >0 = decode up to this
  // many tokens, stopping early on EOS / stop_token_ids.
  uint32 max_tokens = 5;

  // Position ranges (absolute, within the session's post-append
  // token list) at which to emit logprobs on Token events. Empty =
  // no logprobs. `logprob_top_k > 0` returns the top-k alternative
  // tokens at each covered position; `logprob_top_k == 0` returns
  // only the sampled token's logprob.
  repeated PositionRange logprobs_ranges = 6;
  uint32 logprob_top_k = 7;

  // Position ranges at which to emit concept-readout vectors. Empty
  // = no readouts. Logical shape per position is
  // [n_layers][n_concepts]; see GetReadoutManifest.
  repeated PositionRange readout_ranges = 8;

  // Sampling parameters. Meaningful only when max_tokens > 0.
  float temperature = 9;   // default 1.0 when zero
  float top_p = 10;        // default 1.0 when zero
  uint32 top_k = 11;       // default 0 (disabled)
  repeated uint32 stop_token_ids = 12;

  // vLLM scheduler priority (0 = interactive, 10 = batch).
  int32 priority = 13;

  // Images newly attached on this call. Each entry describes one
  // image's binary bytes, its mime type, and the exact token-position
  // range of its pre-expanded placeholder run inside `session.tokens`
  // after `append_tokens` is applied. See `ImageAttachment`.
  repeated ImageAttachment images = 14;
}
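The `offset` / `truncating` contract in `GenerateRequest` can be illustrated with a small model of the check the comment describes. This is a sketch under assumptions, not the server's implementation: `OffsetError`, `resolve_offset`, and `apply_append` are hypothetical names, and the error variant merely stands in for a gRPC FAILED_PRECONDITION status.

```rust
#[derive(Debug, PartialEq)]
enum OffsetError {
    /// Stands in for gRPC FAILED_PRECONDITION.
    FailedPrecondition { server_len: u32, offset: u32 },
}

/// Decide how many existing tokens to keep before `append_tokens` is applied.
/// `offset` must equal the server length, or be strictly less when truncating.
fn resolve_offset(server_len: u32, offset: u32, truncating: bool) -> Result<u32, OffsetError> {
    if offset == server_len || (truncating && offset < server_len) {
        Ok(offset)
    } else {
        Err(OffsetError::FailedPrecondition { server_len, offset })
    }
}

/// Rewind (if requested and valid), then append. Leaves `tokens` untouched on error.
fn apply_append(
    tokens: &mut Vec<u32>,
    offset: u32,
    truncating: bool,
    append: &[u32],
) -> Result<(), OffsetError> {
    let keep = resolve_offset(tokens.len() as u32, offset, truncating)?;
    tokens.truncate(keep as usize);
    tokens.extend_from_slice(append);
    Ok(())
}
```

An offset beyond the server's length is rejected even with `truncating=true`, since there is nothing to rewind to.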

message PositionRange {
  uint32 start = 1;  // inclusive
  uint32 end = 2;    // exclusive
}
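The half-open `PositionRange` semantics and the way `logprobs_ranges` interacts with `logprob_top_k` could be modeled like this. It is a minimal sketch: the Rust struct mirrors the proto message by hand rather than using the tonic-generated type, and `covered`, `logprob_mode`, and `LogprobMode` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug)]
struct PositionRange {
    start: u32, // inclusive
    end: u32,   // exclusive
}

/// True if `position` falls inside any of the half-open ranges.
fn covered(ranges: &[PositionRange], position: u32) -> bool {
    ranges.iter().any(|r| r.start <= position && position < r.end)
}

#[derive(Debug, PartialEq)]
enum LogprobMode {
    /// Position not covered: no logprobs on the Token event.
    Off,
    /// Covered with top_k == 0: only the sampled token's logprob.
    SampledOnly,
    /// Covered with top_k > 0: sampled logprob plus k alternatives.
    SampledPlusTopK(u32),
}

fn logprob_mode(ranges: &[PositionRange], top_k: u32, position: u32) -> LogprobMode {
    if !covered(ranges, position) {
        LogprobMode::Off
    } else if top_k == 0 {
        LogprobMode::SampledOnly
    } else {
        LogprobMode::SampledPlusTopK(top_k)
    }
}
```

With a single range `{ start: 2, end: 5 }`, positions 2, 3, and 4 are covered and position 5 is not.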

message GenerateEvent {
  oneof event {
    Token token = 1;
    GenerateDone done = 2;
  }
}

message Token {
  // Token id at this position. For prefill this is the prompt token;
  // for decode it's the sampled token.
  uint32 id = 1;

  // Absolute position in the session's token list.
  uint32 position = 2;

  // True for prefill positions, false for decode.
  bool is_prefill = 3;

  // Concept readout at this position. Empty if the position wasn't
  // covered by readout_ranges.
  repeated float readout = 4 [packed = true];

  // Top-k alternative tokens' logprobs at this position, populated
  // when the position is covered by logprobs_ranges and
  // logprob_top_k > 0.
  repeated TokenLogprob logprobs = 5;

  // Logprob of the token at `position` (the prompt token for
  // prefill, the sampled token for decode). Populated when the
  // position is covered by logprobs_ranges.
  float sampled_logprob = 6;
  bool has_sampled_logprob = 7;
}
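Reading a single value out of `Token.readout` might look like the sketch below. It assumes the flat vector is laid out row-major as `[n_layers][n_concepts]`, with the layer as the slow index; the proto only states the logical shape, so that ordering is an assumption, and `readout_value` is a hypothetical helper.

```rust
/// Look up one (layer, concept) value in a flat per-token readout.
/// Assumes row-major [n_layers][n_concepts] order. Returns None when the
/// readout is empty (position not covered), has the wrong length, or the
/// indices are out of range.
fn readout_value(
    readout: &[f32],
    n_layers: usize,
    n_concepts: usize,
    layer: usize,
    concept: usize,
) -> Option<f32> {
    if readout.len() != n_layers * n_concepts || layer >= n_layers || concept >= n_concepts {
        return None;
    }
    Some(readout[layer * n_concepts + concept])
}
```

Checking the length up front means a shape mismatch against the manifest surfaces as `None` rather than as a silently wrong value.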

message TokenLogprob {
  uint32 id = 1;
  float logprob = 2;
}

message GenerateDone {
  uint32 prompt_tokens = 1;
  uint32 completion_tokens = 2;
  uint32 total_tokens = 3;

  enum FinishReason {
    FINISH_REASON_UNSPECIFIED = 0;
    FINISH_REASON_EOS = 1;          // emitted EOS / stop token
    FINISH_REASON_LENGTH = 2;       // hit max_tokens
    FINISH_REASON_CANCELLED = 3;    // client cancelled
    FINISH_REASON_STOP_STRING = 4;  // matched a stop string
  }
  FinishReason finish_reason = 4;
}
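Proto enum fields arrive on the wire as plain integers, so a client typically decodes `finish_reason` defensively. The following is a hand-written sketch of that decoding, not the tonic-generated type: `from_wire` and `is_natural_stop` are hypothetical helpers, and treating EOS and stop-string as "natural" stops is an illustrative grouping only.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FinishReason {
    Unspecified,
    Eos,
    Length,
    Cancelled,
    StopString,
}

impl FinishReason {
    /// Decode the wire value; unknown numbers yield None instead of panicking,
    /// so a newer server that adds variants does not break an older client.
    fn from_wire(value: i32) -> Option<Self> {
        match value {
            0 => Some(Self::Unspecified),
            1 => Some(Self::Eos),
            2 => Some(Self::Length),
            3 => Some(Self::Cancelled),
            4 => Some(Self::StopString),
            _ => None,
        }
    }

    /// True when the model stopped by itself rather than being cut off.
    fn is_natural_stop(self) -> bool {
        matches!(self, Self::Eos | Self::StopString)
    }
}
```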

// ============================================================
// Readout manifest
// ============================================================

message GetReadoutManifestRequest {}

message ReadoutManifest {
  repeated string concepts = 1;
  repeated uint32 layers = 2;
  uint32 hidden_size = 3;
  string dtype = 4;
}
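A client can use the manifest to label each slot of a flat readout vector. The sketch below mirrors only the `concepts` and `layers` fields by hand and assumes the same row-major `[n_layers][n_concepts]` layout as the logical shape suggests; `readout_len` and `label` are hypothetical helpers, and the concept names in the usage note are made up.

```rust
struct ReadoutManifest {
    concepts: Vec<String>,
    layers: Vec<u32>,
}

impl ReadoutManifest {
    /// Expected length of every non-empty `Token.readout` vector.
    fn readout_len(&self) -> usize {
        self.layers.len() * self.concepts.len()
    }

    /// Map a flat readout index back to (layer id, concept name).
    /// Assumes the layer is the slow index and the concept the fast one.
    fn label(&self, flat_index: usize) -> Option<(u32, &str)> {
        if self.concepts.is_empty() || flat_index >= self.readout_len() {
            return None;
        }
        let layer = self.layers[flat_index / self.concepts.len()];
        let concept = self.concepts[flat_index % self.concepts.len()].as_str();
        Some((layer, concept))
    }
}
```

For a manifest with layers `[8, 16, 24]` and two concepts, flat index 3 maps to the second concept at layer 16.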

// ============================================================
// Debug
// ============================================================

message DumpSessionRequest {
  string session_id = 1;
}

message DumpSessionResponse {
  // The full session.tokens sequence, verbatim.
  repeated uint32 tokens = 1 [packed = true];
}
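One plausible use of `DumpSessionResponse.tokens` is to diff the server's view against the client's own token list when output looks suspicious. The helper below is a hypothetical sketch of that comparison and is not taken from the client code.

```rust
/// Return the first index at which the client's and server's token lists
/// disagree, or None if they are identical. A length mismatch with a common
/// prefix reports the length of the shorter list.
fn first_divergence(client: &[u32], server: &[u32]) -> Option<usize> {
    match client.iter().zip(server).position(|(a, b)| a != b) {
        Some(i) => Some(i),
        None if client.len() != server.len() => Some(client.len().min(server.len())),
        None => None,
    }
}
```

The returned index is also the largest `offset` at which a `truncating=true` request could bring the two views back into agreement.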