consciousness/src/subconscious/generate.rs

// generate.rs — Continuation generation for scoring / comparison flows.
//
// Shared by the finetune pipeline (learn.rs) and the compare screen:
// given a context prefix and a skip predicate, generate what the model
// would say as the next assistant turn.

use std::sync::Arc;

use crate::agent::api::{ApiClient, SamplingParams, StreamToken};
use crate::agent::context::{AstNode, ContextState};
use crate::agent::tokenizer;

/// Generate an assistant continuation from the context entries before
/// `entry_idx` (the prompt covers `0..entry_idx`, so `entry_idx` itself is
/// excluded), with `skip` applied to identity + conversation entries during
/// prompt assembly. The model is whichever `client` points at: the default
/// runtime client for memory-ablation alternates, a test-model client
/// for F7 comparison.
///
/// Uses a fresh ephemeral gRPC session (no cross-call KV reuse): one
/// Open / Append / Generate round-trip, then the session is dropped.
pub async fn gen_continuation<F>(
    context: &ContextState,
    entry_idx: usize,
    skip: F,
    client: &ApiClient,
) -> anyhow::Result<String>
where F: FnMut(&AstNode) -> bool,
{
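    // Build the prompt from entries `0..entry_idx`; the third tuple field
    // is not needed here.
    let (mut prompt, images, _) = context.wire_prompt(0..entry_idx, skip);

    // Open an assistant turn so the model continues as the assistant.
    prompt.push(tokenizer::IM_START);
    prompt.extend(tokenizer::encode("assistant\n"));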

    let sampling = SamplingParams {
        temperature: 0.6,
        top_p: 0.95,
        top_k: 20,
    };

    // Ephemeral per-call session — opens on first touch, drops when
    // `_guard` drops at function end.
    let session_lock = Arc::new(crate::Mutex::new(None));
    let (mut rx, _guard) = client.stream_session_mm(
        session_lock, &prompt, &images, sampling, Some(-5),
    );

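    // Collect token ids until the stream reports `Done`. An `Error` aborts
    // the call; a channel that closes without `Done` ends the loop with
    // whatever tokens arrived.
    let mut tokens = Vec::new();
    while let Some(tok) = rx.recv().await {
        match tok {
            StreamToken::Token { id, .. } => tokens.push(id),
            StreamToken::Done { .. } => break,
            StreamToken::Error(e) => anyhow::bail!("generation error: {}", e),
        }
    }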

    Ok(tokenizer::decode(&tokens))
}
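
The receive loop at the end of `gen_continuation` can be modelled without the crate's async types. The following is a minimal sketch only: it swaps the async receiver for a blocking `std::sync::mpsc::Receiver`, and its `StreamToken` enum and `drain_tokens` function are hypothetical stand-ins, not the real definitions in `crate::agent::api`.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical, simplified stand-in for the crate's `StreamToken`.
#[derive(Debug)]
pub enum StreamToken {
    Token { id: u32 },
    Done,
    Error(String),
}

/// Collect token ids until `Done`, an `Error`, or the sender hanging up.
/// Assumption: a blocking std channel replaces the async receiver.
pub fn drain_tokens(rx: Receiver<StreamToken>) -> Result<Vec<u32>, String> {
    let mut tokens = Vec::new();
    // `recv` returns `Err` once every sender is dropped, which ends the loop.
    while let Ok(tok) = rx.recv() {
        match tok {
            StreamToken::Token { id } => tokens.push(id),
            // Anything queued after `Done` is ignored.
            StreamToken::Done => break,
            StreamToken::Error(e) => return Err(format!("generation error: {}", e)),
        }
    }
    Ok(tokens)
}
```

As in the original loop, tokens received before an error are discarded, and a channel that closes early yields a partial result rather than an error.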
subconscious: lift continuation gen + render helpers into shared homes

- context.rs gains is_assistant, render_branch_text, render_prior_context
  alongside memory_key / is_memory_node. They're pure AST helpers, used by
  both the finetune pipeline and the forthcoming compare screen.
- new subconscious/generate.rs holds gen_continuation(context, entry_idx,
  skip, client): build the prompt from a context prefix with an arbitrary
  skip predicate, send to the model, decode the completion. Takes both the
  predicate and the client so callers can aim it at memory-stripped contexts
  (finetune), same-context-different-model (F7 compare), or whatever else.
- learn.rs drops its private copies of those helpers and the inline
  generate_alternate; the finetune path now reads as
  gen_continuation(context, idx, is_memory_node, client).

Pure refactor, no behavior change.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 15:20:02 -04:00
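
The "context prefix plus skip predicate" idea from the commit message can be illustrated in isolation. This is a sketch under stated assumptions: `Node`, `render`, and `build_prompt` are hypothetical names invented for the example, and the real `wire_prompt` works on `AstNode` values and produces tokens and images rather than strings.

```rust
/// Hypothetical, simplified stand-in for a context entry.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Memory(String),
    User(String),
    Assistant(String),
}

/// Render one entry as a line of text (illustrative format only).
pub fn render(node: &Node) -> String {
    match node {
        Node::Memory(s) => format!("memory: {}", s),
        Node::User(s) => format!("user: {}", s),
        Node::Assistant(s) => format!("assistant: {}", s),
    }
}

/// Assemble a prompt from `entries[0..end]`, leaving out every entry
/// for which `skip` returns true.
pub fn build_prompt<F>(entries: &[Node], end: usize, mut skip: F) -> Vec<String>
where
    F: FnMut(&Node) -> bool,
{
    let mut out = Vec::new();
    // `take(end)` keeps the prefix exclusive of `end` and tolerates
    // an `end` larger than the slice.
    for node in entries.iter().take(end) {
        if skip(node) {
            continue;
        }
        out.push(render(node));
    }
    out
}
```

Passing a predicate that matches memory entries gives the memory-stripped prompt; passing `|_| false` keeps the prefix intact, which is the shape a same-context, different-model comparison would want.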

salience: add gRPC client + TLS plumbing for stateful vllm sessions

Adds the client-side of a stateful gRPC protocol against vllm, plus the
TLS trust machinery so we can talk to self-signed vllm servers.

Protocol (proto/salience.proto): Bidi-streaming Session RPC carries
OpenSession / AppendTokens / Generate / Cancel from client and
SessionReady / PrefillProgress / Token / GenerateDone / Error from server.
Separate Fork unary RPC for cheap branching (prefix cache shares KV
automatically). Plus ListSessions, CloseSession, GetReadoutManifest admin
RPCs. Per-token readouts ship as packed f32 ([n_layers * n_concepts] per
token, flat). Logprobs use range-selected positions plus a top-k parameter:
empty ranges means no logprobs, any range means emit sampled-token logprob
at those positions, top_k > 0 adds alternatives.

Client (src/agent/api/salience.rs): Tonic-generated types under pb::, a
connect() helper, with_auth() for bearer metadata, and a Session handle
wrapping the bidi stream: open() handshakes SessionReady; append() is
fire-and-forget; generate() returns impl Stream<Item = Event> that drains
inbound until Done or terminating Error. One generate at a time per session.

Peak picker (src/agent/salience.rs): Pure function over ReadoutEntry traces.
Per-concept z-score against trace global stats; contiguous above-threshold
regions emit one peak at the local max. Configurable sigma threshold and
min-std safety floor. Deterministic tie-break on offset then concept name.
12 unit tests covering empty traces, flat channels, single/multi spikes,
contiguous humps, multi-concept independence, trailing runs, sub-threshold
noise, layer-out-of-range, manifest shape mismatch, and threshold tunability.

TLS (src/agent/api/http.rs): HttpClient::build now also loads every .pem
file under ~/.consciousness/certs/ into the rustls root store, so dropping
a <host>.pem in that directory is enough to trust a new self-signed server;
no code changes per new host. Also installs the rustls default crypto
provider explicitly via OnceLock: tonic's tls features pulled in both ring
and aws-lc-rs on the resolver path, and rustls 0.23 refuses to auto-pick
when either could win.

Build (build.rs, Cargo.toml): tonic-build generates Rust types from
proto/salience.proto at cargo-build time, using a vendored protoc binary
(protoc-bin-vendored) so no system install is required. New runtime deps:
tonic, prost, async-stream, tokio-stream, rustls-pemfile.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-23 02:21:07 -04:00
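
The peak-picking rule described in the commit message (z-score against the trace's global stats, one peak per contiguous above-threshold run) can be sketched for a single concept channel. This is not the code in src/agent/salience.rs: `Peak` and `pick_peaks` are hypothetical names, the min-std floor is assumed to be a clamp on the standard deviation, and ties inside a run are assumed to resolve to the earliest offset.

```rust
/// Hypothetical peak record: position in the trace and its z-score.
#[derive(Debug, Clone, PartialEq)]
pub struct Peak {
    pub offset: usize,
    pub z: f32,
}

/// Pick one peak per contiguous run of samples whose z-score exceeds `sigma`.
/// `min_std` is a floor on the standard deviation (assumed to be a clamp),
/// so a flat channel produces z-scores of zero instead of dividing by zero.
pub fn pick_peaks(trace: &[f32], sigma: f32, min_std: f32) -> Vec<Peak> {
    if trace.is_empty() {
        return Vec::new();
    }
    let n = trace.len() as f32;
    let mean = trace.iter().sum::<f32>() / n;
    let var = trace.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let std = var.sqrt().max(min_std);

    let mut peaks = Vec::new();
    let mut best: Option<Peak> = None; // best sample of the current run
    for (offset, &v) in trace.iter().enumerate() {
        let z = (v - mean) / std;
        if z > sigma {
            // Strict comparison keeps the earliest offset on ties.
            let is_better = match &best {
                Some(b) => z > b.z,
                None => true,
            };
            if is_better {
                best = Some(Peak { offset, z });
            }
        } else if let Some(p) = best.take() {
            // The run just ended: emit its local maximum.
            peaks.push(p);
        }
    }
    // A run that reaches the end of the trace still emits its peak.
    if let Some(p) = best {
        peaks.push(p);
    }
    peaks
}
```

A multi-concept version would run this per channel and then sort the combined peaks by offset and concept name to match the tie-break the commit message describes.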

agent/api: carry readout alongside streamed tokens

StreamToken::Token is now a struct variant with an optional TokenReadout
(shape [n_layers][n_concepts]) per token, parsed from the vLLM completion
response's choices[i].readout field when the server has readout enabled.

ApiClient gains a fetch_readout_manifest() method that hits
GET /v1/readout/manifest. Returns Ok(None) on 404 (server has readout
disabled), so callers can gracefully fall back when pointed at a
non-readout-enabled endpoint.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:15:46 -04:00
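
The commit messages give two shapes for a per-token readout: a flat packed buffer of `n_layers * n_concepts` floats on the wire, and a nested `[n_layers][n_concepts]` form on the client. A minimal sketch of converting one into the other follows. `unpack_readout` is a hypothetical helper, and the layer-major ordering (all concepts for layer 0, then layer 1, and so on) is an assumption that the source does not state.

```rust
/// Split a flat readout buffer into one row per layer.
/// Assumption: values are laid out layer-major. Returns `None` when the
/// buffer length does not match `n_layers * n_concepts`.
pub fn unpack_readout(
    flat: &[f32],
    n_layers: usize,
    n_concepts: usize,
) -> Option<Vec<Vec<f32>>> {
    let expected = n_layers.checked_mul(n_concepts)?;
    // `chunks_exact` panics on a chunk size of zero, so reject it up front.
    if n_concepts == 0 || flat.len() != expected {
        return None;
    }
    Some(
        flat.chunks_exact(n_concepts)
            .map(|row| row.to_vec())
            .collect(),
    )
}
```

Returning `None` on a length mismatch mirrors the spirit of the `Ok(None)` fallback in `fetch_readout_manifest`: the caller can carry on without readouts rather than fail the whole generation.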