2026-04-02 14:13:23 -04:00
|
|
|
// api/ — LLM API client (OpenAI-compatible)
|
2026-03-25 00:52:41 -04:00
|
|
|
//
|
2026-04-02 14:13:23 -04:00
|
|
|
// Works with any provider that implements the OpenAI chat completions
|
|
|
|
|
// API: OpenRouter, vLLM, llama.cpp, Fireworks, Together, etc.
|
2026-03-25 00:52:41 -04:00
|
|
|
//
|
|
|
|
|
// Diagnostics: anomalies always logged to debug panel.
|
|
|
|
|
// Set POC_DEBUG=1 for verbose per-turn logging.
|
|
|
|
|
|
2026-04-07 12:50:40 -04:00
|
|
|
pub mod http;
|
salience: add gRPC client + TLS plumbing for stateful vllm sessions
Adds the client-side of a stateful gRPC protocol against vllm, plus
the TLS trust machinery so we can talk to self-signed vllm servers.
Protocol (proto/salience.proto):
Bidi-streaming Session RPC carries OpenSession / AppendTokens /
Generate / Cancel from client and SessionReady / PrefillProgress /
Token / GenerateDone / Error from server. Separate Fork unary RPC
for cheap branching (prefix cache shares KV automatically). Plus
ListSessions, CloseSession, GetReadoutManifest admin RPCs.
Per-token readouts ship as packed f32 ([n_layers * n_concepts] per
token, flat). Logprobs use range-selected positions plus a top-k
parameter — empty ranges mean no logprobs, any range means emit
sampled-token logprob at those positions, top_k > 0 adds
alternatives.
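A minimal sketch of turning one token's packed readout back into the nested [n_layers][n_concepts] shape. The helper name unflatten_readout is hypothetical, and layer-major ordering of the flat buffer is an assumption; the message above only states the total length.

```rust
/// Hypothetical helper: split one token's flat readout into rows,
/// one row per layer. Assumes layer-major order, i.e. all concepts
/// for layer 0 come first, then all concepts for layer 1, and so on.
fn unflatten_readout(
    flat: &[f32],
    n_layers: usize,
    n_concepts: usize,
) -> Option<Vec<Vec<f32>>> {
    // Reject a zero-width shape or a length mismatch instead of panicking.
    if n_concepts == 0 || flat.len() != n_layers.checked_mul(n_concepts)? {
        return None;
    }
    // Each chunk of `n_concepts` values is one layer's row.
    Some(flat.chunks_exact(n_concepts).map(|row| row.to_vec()).collect())
}
```

Returning None on a shape mismatch lets a caller drop the readout for that token without aborting the stream.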
Client (src/agent/api/salience.rs):
Tonic-generated types under pb::, a connect() helper, with_auth()
for bearer metadata, and a Session handle wrapping the bidi stream:
open() handshakes SessionReady; append() is fire-and-forget;
generate() returns impl Stream<Item = Event> that drains inbound
until Done or terminating Error. One generate at a time per session.
Peak picker (src/agent/salience.rs):
Pure function over ReadoutEntry traces. Per-concept z-score against
trace global stats; contiguous above-threshold regions emit one
peak at the local max. Configurable sigma threshold and min-std
safety floor. Deterministic tie-break on offset then concept name.
12 unit tests covering empty traces, flat channels, single/multi
spikes, contiguous humps, multi-concept independence, trailing
runs, sub-threshold noise, layer-out-of-range, manifest shape
mismatch, and threshold tunability.
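The single-channel core of such a picker might look like the sketch below. It is not the code in src/agent/salience.rs: the Peak struct, the pick_peaks name, and the use of population variance are assumptions, and the manifest lookup and multi-concept tie-break are left out.

```rust
/// One detected peak: index into the trace and its z-score.
#[derive(Debug, Clone, PartialEq)]
struct Peak {
    offset: usize,
    z: f32,
}

/// Hypothetical picker for one concept channel. `sigma` is the z-score
/// threshold; `min_std` is the floor applied to the standard deviation.
fn pick_peaks(trace: &[f32], sigma: f32, min_std: f32) -> Vec<Peak> {
    if trace.is_empty() {
        return Vec::new();
    }
    let n = trace.len() as f32;
    let mean = trace.iter().sum::<f32>() / n;
    let var = trace.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    // Floor the deviation so a flat channel cannot produce huge z-scores.
    let std = var.sqrt().max(min_std);

    let mut peaks = Vec::new();
    let mut best: Option<Peak> = None; // running maximum of the current region
    for (offset, &v) in trace.iter().enumerate() {
        let z = (v - mean) / std;
        if z > sigma {
            // Strict `>` keeps the earliest offset when values tie.
            if best.as_ref().map_or(true, |b| z > b.z) {
                best = Some(Peak { offset, z });
            }
        } else if let Some(p) = best.take() {
            // The region ended; emit its maximum.
            peaks.push(p);
        }
    }
    // A region that runs to the end of the trace still emits its peak.
    peaks.extend(best);
    peaks
}
```

Running this once per concept and merging the results, sorted by offset and then concept name, would give the deterministic ordering described above.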
TLS (src/agent/api/http.rs):
HttpClient::build now also loads every .pem file under
~/.consciousness/certs/ into the rustls root store — so dropping
a <host>.pem in that directory is enough to trust a new self-
signed server; no code changes per new host. Also installs the
rustls default crypto provider explicitly via OnceLock: tonic's
tls features pulled in both ring and aws-lc-rs on the resolver
path, and rustls 0.23 refuses to auto-pick when either could win.
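The directory scan half of that behaviour can be sketched with the standard library alone. The function name pem_files_in is hypothetical, and parsing each file and adding it to the rustls root store is deliberately left out, since those calls are not shown in this message.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: list every regular file ending in `.pem`
/// directly under `dir`, in sorted order.
fn pem_files_in(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Match on the extension only; the file stem can be any host name.
        let is_pem = path.extension().and_then(|e| e.to_str()) == Some("pem");
        if is_pem && path.is_file() {
            found.push(path);
        }
    }
    // Sort so the load order does not depend on directory iteration order.
    found.sort();
    Ok(found)
}
```

A caller would probably treat a missing directory as "no extra certificates" rather than as a hard error, since read_dir fails when the path does not exist.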
Build (build.rs, Cargo.toml):
tonic-build generates Rust types from proto/salience.proto at
cargo-build time, using a vendored protoc binary
(protoc-bin-vendored) so no system install is required. New
runtime deps: tonic, prost, async-stream, tokio-stream,
rustls-pemfile.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-23 02:21:07 -04:00
|
|
|
pub mod salience;
|
2026-04-04 00:29:11 -04:00
|
|
|
|
2026-04-23 02:21:07 -04:00
|
|
|
use std::time::Duration;
|
2026-04-08 16:32:00 -04:00
|
|
|
use anyhow::Result;
|
2026-03-29 21:22:42 -04:00
|
|
|
use tokio::sync::mpsc;
|
2026-04-08 15:15:21 -04:00
|
|
|
use serde::Deserialize;
|
|
|
|
|
|
2026-04-23 02:21:07 -04:00
|
|
|
use http::HttpClient;
|
2026-04-08 16:32:00 -04:00
|
|
|
|
2026-04-08 15:15:21 -04:00
|
|
|
#[derive(Debug, Clone, Deserialize)]
|
|
|
|
|
pub struct Usage {
|
|
|
|
|
pub prompt_tokens: u32,
|
|
|
|
|
pub completion_tokens: u32,
|
|
|
|
|
pub total_tokens: u32,
|
|
|
|
|
}
|
2026-03-29 21:22:42 -04:00
|
|
|
|
2026-04-18 01:15:46 -04:00
|
|
|
/// Concept-readout manifest returned by the vLLM server's
|
|
|
|
|
/// `/v1/readout/manifest` endpoint. Maps the nameless tensor indices
|
|
|
|
|
/// in streaming `readout` fields back to concept names and layer
|
|
|
|
|
/// indices.
|
|
|
|
|
#[derive(Debug, Clone, Deserialize)]
|
|
|
|
|
pub struct ReadoutManifest {
|
|
|
|
|
pub concepts: Vec<String>,
|
|
|
|
|
pub layers: Vec<u32>,
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/// Per-token per-layer concept projections streamed alongside each
|
|
|
|
|
/// sampled token. Shape `[n_layers][n_concepts]`. Named values come
|
|
|
|
|
/// from pairing with the manifest fetched at startup.
|
|
|
|
|
pub type TokenReadout = Vec<Vec<f32>>;
|
|
|
|
|
|
2026-04-02 18:41:02 -04:00
|
|
|
/// A JoinHandle that aborts its task when dropped.
|
2026-04-07 13:43:25 -04:00
|
|
|
pub(crate) struct AbortOnDrop(tokio::task::JoinHandle<()>);
|
2026-04-02 18:41:02 -04:00
|
|
|
|
|
|
|
|
impl Drop for AbortOnDrop {
|
|
|
|
|
fn drop(&mut self) {
|
|
|
|
|
self.0.abort();
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-04-04 13:48:24 -04:00
|
|
|
/// Sampling parameters for model generation.
|
|
|
|
|
#[derive(Clone, Copy)]
|
2026-04-23 02:21:07 -04:00
|
|
|
#[allow(dead_code)] // fields used once Generate RPC lands in a later step
|
2026-04-07 13:43:25 -04:00
|
|
|
pub(crate) struct SamplingParams {
|
2026-04-04 13:48:24 -04:00
|
|
|
pub temperature: f32,
|
|
|
|
|
pub top_p: f32,
|
|
|
|
|
pub top_k: u32,
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-29 21:22:42 -04:00
|
|
|
// ─────────────────────────────────────────────────────────────
|
|
|
|
|
// Stream events — yielded by backends, consumed by the runner
|
|
|
|
|
// ─────────────────────────────────────────────────────────────
|
|
|
|
|
|
2026-04-08 14:55:10 -04:00
|
|
|
/// One token from the streaming completions API.
|
2026-04-08 16:35:57 -04:00
|
|
|
pub enum StreamToken {
|
2026-04-18 01:15:46 -04:00
|
|
|
/// A sampled token, optionally with its per-layer concept readout.
|
|
|
|
|
/// `readout` is `None` when the server has readout disabled or
|
|
|
|
|
/// returned no readout for this chunk.
|
|
|
|
|
Token { id: u32, readout: Option<TokenReadout> },
|
2026-04-08 14:55:10 -04:00
|
|
|
Done { usage: Option<Usage> },
|
2026-03-29 21:22:42 -04:00
|
|
|
Error(String),
|
|
|
|
|
}
|
2026-03-25 00:52:41 -04:00
|
|
|
|
2026-04-02 22:18:50 -04:00
|
|
|
#[derive(Clone)]
|
2026-03-25 00:52:41 -04:00
|
|
|
pub struct ApiClient {
|
2026-04-07 12:50:40 -04:00
|
|
|
client: HttpClient,
|
2026-03-25 00:52:41 -04:00
|
|
|
api_key: String,
|
|
|
|
|
pub model: String,
|
2026-04-02 14:13:23 -04:00
|
|
|
base_url: String,
|
2026-04-23 02:21:07 -04:00
|
|
|
/// Cached readout manifest — fetched once per process and shared
|
|
|
|
|
/// across ApiClient clones (every Agent/fork gets the same cell).
|
|
|
|
|
/// `None` after fetch means the server has readout disabled (404).
|
|
|
|
|
manifest: std::sync::Arc<tokio::sync::OnceCell<Option<ReadoutManifest>>>,
|
2026-03-25 00:52:41 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
impl ApiClient {
|
|
|
|
|
pub fn new(base_url: &str, api_key: &str, model: &str) -> Self {
|
2026-04-07 12:50:40 -04:00
|
|
|
let client = HttpClient::builder()
|
2026-03-25 00:52:41 -04:00
|
|
|
.connect_timeout(Duration::from_secs(30))
|
|
|
|
|
.timeout(Duration::from_secs(600))
|
2026-04-07 12:50:40 -04:00
|
|
|
.build();
|
2026-03-25 00:52:41 -04:00
|
|
|
|
|
|
|
|
Self {
|
|
|
|
|
client,
|
|
|
|
|
api_key: api_key.to_string(),
|
|
|
|
|
model: model.to_string(),
|
2026-04-02 14:13:23 -04:00
|
|
|
base_url: base_url.trim_end_matches('/').to_string(),
|
2026-04-23 02:21:07 -04:00
|
|
|
manifest: std::sync::Arc::new(tokio::sync::OnceCell::new()),
|
2026-03-25 00:52:41 -04:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-04-23 02:21:07 -04:00
|
|
|
/// Stream generation via a gRPC session. Stubbed during the
|
|
|
|
|
/// unary-rewrite transition — the Generate RPC is wired in a
|
|
|
|
|
/// later step of this series. Until then, callers that reach
|
|
|
|
|
/// this path get a StreamToken::Error.
|
|
|
|
|
pub(crate) fn stream_session_mm(
|
2026-04-16 18:08:26 -04:00
|
|
|
&self,
|
2026-04-23 02:21:07 -04:00
|
|
|
_session_lock: std::sync::Arc<crate::Mutex<Option<salience::SessionHandle>>>,
|
|
|
|
|
_prompt_tokens: &[u32],
|
|
|
|
|
_images: &[super::context::WireImage],
|
|
|
|
|
_sampling: SamplingParams,
|
|
|
|
|
_priority: Option<i32>,
|
2026-04-08 14:55:10 -04:00
|
|
|
) -> (mpsc::UnboundedReceiver<StreamToken>, AbortOnDrop) {
|
2026-04-08 11:42:22 -04:00
|
|
|
let (tx, rx) = mpsc::unbounded_channel();
|
|
|
|
|
let handle = tokio::spawn(async move {
|
2026-04-23 02:21:07 -04:00
|
|
|
let _ = tx.send(StreamToken::Error(
|
|
|
|
|
"Generate RPC not yet wired after protocol rewrite — see \
|
|
|
|
|
proto/salience.proto; AppendImage / Generate land next."
|
|
|
|
|
.into(),
|
|
|
|
|
));
|
2026-04-08 11:42:22 -04:00
|
|
|
});
|
|
|
|
|
(rx, AbortOnDrop(handle))
|
|
|
|
|
}
|
|
|
|
|
|
2026-04-02 22:13:55 -04:00
|
|
|
pub fn base_url(&self) -> &str { &self.base_url }
|
|
|
|
|
pub fn api_key(&self) -> &str { &self.api_key }
|
|
|
|
|
|
2026-04-18 01:15:46 -04:00
|
|
|
/// Fetch `/v1/readout/manifest` — returns `Ok(Some(..))` if
|
|
|
|
|
/// readout is enabled on the server, `Ok(None)` on 404 (disabled),
|
|
|
|
|
/// or an error on any other failure.
|
|
|
|
|
///
|
2026-04-23 02:21:07 -04:00
    /// First call performs the HTTP fetch; subsequent calls (including
    /// across ApiClient clones sharing the same cell) return the
    /// cached result. The manifest doesn't change during a server run.
    pub async fn fetch_readout_manifest(&self) -> Result<Option<ReadoutManifest>> {
        let manifest = self.manifest.get_or_try_init(|| async {
            let url = format!("{}/readout/manifest", self.base_url);
            let auth = format!("Bearer {}", self.api_key);
            let response = self
                .client
                .get_with_headers(&url, &[("Authorization", &auth)])
                .await
                .map_err(|e| anyhow::anyhow!("readout manifest fetch ({}): {}", url, e))?;
            let status = response.status();
            // A 404 is reported as "no manifest" (None) rather than as an error.
            if status.as_u16() == 404 {
                return Ok::<_, anyhow::Error>(None);
            }
            if !status.is_success() {
                let body = response.text().await.unwrap_or_default();
                // Echo at most 500 bytes of the body, cut on a UTF-8 char
                // boundary so the slice below cannot panic.
                let n = body.floor_char_boundary(body.len().min(500));
                anyhow::bail!("readout manifest HTTP {} ({}): {}", status, url, &body[..n]);
            }
            Ok(Some(response.json().await?))
        }).await?;
        Ok(manifest.clone())
    }
}
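The error path in `fetch_readout_manifest` truncates the response body with `floor_char_boundary` before slicing, because slicing a `&str` at a byte index that falls inside a multi-byte character panics. As a minimal sketch of the same idea using only `str::is_char_boundary`, the helper below walks back from the byte limit to the nearest boundary. The name `truncate_at_char_boundary` is hypothetical and does not appear in this file.

```rust
/// Return the longest prefix of `s` that is at most `max_bytes` long and
/// ends on a UTF-8 character boundary.
/// (Hypothetical helper, shown only to illustrate the truncation step.)
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

With a 500-byte limit this would play the same role as the `n` computed in the function above: the error message carries a bounded, valid-UTF-8 excerpt of the body instead of the whole response.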