consciousness

Author	SHA1	Message	Date
Kent Overstreet	fe232cf292	salience: client-side pad expansion, drop AppendImage Mirrors the vLLM-side rewrite. AppendImage is gone; images now ride along on Generate via a parallel `images` list. - Productionize `qwen3_image_token_count` (was test-only). Image leaf computes its IMAGE_PAD count eagerly at construction from height/width; `token_count` is no longer "0 until the server tells us." - WireChunk shrinks to a single `Tokens(Vec<u32>)` variant — vision blocks live inline in the token stream. - `wire_chunks` now returns `(Vec<WireChunk>, Vec<WireImage>)`. `WireImage` carries `pad_start` / `pad_end` (absolute positions in the full walk) alongside bytes + mime. - `assemble_prompt` returns `(chunks, images, match_upto)`. - `stream_session_mm` / `run_session_generate` take the parallel images list, filter to those past `match_upto`, and pass them in `GenerateRequest.images` as `pb::ImageAttachment` entries. - Drop `SessionHandle::append_image`, `ContextState::commit_image_token_counts`, `StreamToken::ImageAppended`, the WireChunk::Image branch in `learn.rs`, and the now-empty `prompt_to_chunks` helper. - Add 'v' toggle on the conscious-screen tree to render token-id vectors in place of text content (debug-aid: lets us see what the server actually has when output is suspicious). - Comment out the subconscious-trigger spawn loop — Kent had this disabled before; it had crept back into running. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 20:26:47 -04:00
Kent Overstreet	4feebb7bc4	agent: share one tonic Channel + migrate scoring to gRPC Generate Two changes that bolt together — the shared connection means the new scoring path actually costs one HTTP/2 handshake across the whole process instead of one-per-RPC. ApiClient gains `salience_channel: Arc<OnceCell<Channel>>`. First call to `ApiClient::salience_client()` opens the channel via `connect_channel()` and stores the Channel; subsequent calls clone it (cheap — tonic multiplexes concurrent RPCs over the single HTTP/2 connection). Every ApiClient clone shares the same OnceCell, so all agents spawned from Mind's client — plus every ephemeral scoring session — reuse one connection. SessionHandle refactored to hold an `ApiClient` clone instead of a bag of (base_url, api_key) strings. `open` / `append_image` / `generate` go through `self.client.salience_client()` now. New `prefill_only(tokens)` method encapsulates the "Generate with max_tokens=0 to append text" pattern (previously a private free function in api/mod.rs called `flush_pending`). Drop impl on SessionHandle stays — still fires CloseSession on the shared channel in a detached task. `run_session_generate` switched from `(base_url, api_key, model)` to `&ApiClient`; the agent-turn flow that uses it keeps the same shape but `stream_session_mm` clones the ApiClient into the spawned worker. learn.rs migrated from the HTTP `/v1/score` endpoint to a gRPC session-based score: * `call_score` opens an ephemeral SessionHandle on the client, converts (prompt_tokens, images) → Vec<WireChunk> via the new `prompt_to_chunks` helper (splits on VISION_START/VISION_END), walks chunks calling `prefill_only` + `append_image`, runs a final Generate with `max_tokens=0` + `logprobs_ranges` over the scored positions, and sums each Token event's `sampled_logprob` per range to produce `ScoreResult`s. * SessionHandle drops at end of scope → CloseSession auto-fires, keeping the server's session map clean between calls. * No more HTTP path, no more `http_client()` helper, no more `ScoreResponse` / serde plumbing for /v1/score. * `send_to_train` still uses HTTP (it talks to /v1/train which isn't on the gRPC protocol); its ad-hoc HTTP client lives inline now instead of reaching for the deleted `http_client()`. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 12:51:53 -04:00
Kent Overstreet	8d9c9e9f7b	agent: end-to-end gRPC Generate with delta-based session orchestration Wires the client side of the new salience protocol so inference actually runs over gRPC instead of emitting the stubbed "not yet wired" error. Each turn walks the AST as interleaved chunks, sends only what's new to the server, and streams decode tokens back. context.rs: * `WireChunk` enum: `Tokens(Vec<u32>)` or `Image { bytes, mime, known_expanded_len }`. Preserves text/image/text ordering the wire path can't flatten. * `wire_chunks(range, skip)` walker, parallel to `wire_prompt` — branches emit `<\|im_start\|>…<\|im_end\|>` tokens, image leaves emit a single Image chunk (no inline vision tokens). * `NodeLeaf::set_image_token_count(n)` + recompute of cached `token_ids`; `ContextState::commit_image_token_counts(&[u32])` fills in the first-N zero-count image leaves in wire order. * `ResponseParser::run` handles the new `StreamToken::ImageAppended` by committing the server's N into the AST before the final Generate's Token events stream in. salience.rs: * `SessionHandle` tracks `committed_len`. `append_image` advances it from the RPC response. New `generate(req)` opens the server-streaming RPC. api/mod.rs: * `stream_session_mm(session_lock, chunks, sampling, priority, readout_shape)` replaces the stub. Spawns `run_session_generate`. * `run_session_generate`: takes the session out of the Mutex (or opens fresh), skips chunks covered by `committed_len` (bails on mid-chunk straddle or unknown-length image in the committed prefix), walks the delta: accumulates Tokens into `pending`, on Image flushes pending via `flush_pending` (max_tokens=0 Generate that just prefills), then AppendImage + emits StreamToken::ImageAppended. Final Generate carries any trailing pending text as `append_tokens` and the sampling params; Token events stream out as StreamToken::Token, Done as StreamToken::Done. On success, handle with updated `committed_len` returns to the Mutex; on error, handle drops and next call reopens. * `StreamToken::ImageAppended { placeholder_count }` variant — emitted in wire order before the final Generate's tokens. * Prefix-cache cap for readout coverage: `readout_ranges` covers `[prompt_len_after_append, u32::MAX)` when the caller provides a readout_shape, so decode positions stream their readouts. agent/mod.rs: * `assemble_prompt` returns `Vec<WireChunk>` with the assistant prologue merged into the trailing Tokens chunk. Caller in `turn` passes chunks + readout_shape (pulled from `agent.readout.lock().manifest`) to `stream_session_mm`. * Dropped `assemble_prompt_tokens` — dead. mind + unconscious: * `Unconscious::new(client)` stores a shared `ApiClient`. Fixes the repeated-manifest-fetch bug caused by each subagent's `ApiClient::new` having its own OnceCell. The client's Arc- wrapped manifest cache is now shared across every agent Mind spawns. * `prepare_spawn(name, auto, wake, base_client)` clones the base client and overrides `.model` for the resolved backend instead of constructing fresh. All three callers (`toggle`/`trigger`/unconscious loop) pass `self.client.clone()`. * `Mind::new` passes `agent.client.clone()` into `Unconscious::new`. subconscious/generate.rs: * gen_continuation switched to `wire_chunks` + the new `stream_session_mm` signature. Ephemeral session opens on each call, tears down at scope end. No readouts requested. Not changed yet, noted for follow-up: * Subconscious ablation scoring in learn.rs still talks to `/v1/score` over HTTP. Will migrate once we have time to verify the Generate+max_tokens=0+prompt_logprobs path end-to-end. * compare.rs constructs its own ApiClient for the `compare.test_backend` (which is intentionally a different endpoint) — left alone. * Readout manifest still fetched via HTTP at Agent::new. Migration to GetReadoutManifest gRPC is a separate cleanup. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 12:27:55 -04:00
Kent Overstreet	08213f9093	salience: add gRPC client + TLS plumbing for stateful vllm sessions Adds the client-side of a stateful gRPC protocol against vllm, plus the TLS trust machinery so we can talk to self-signed vllm servers. Protocol (proto/salience.proto): Bidi-streaming Session RPC carries OpenSession / AppendTokens / Generate / Cancel from client and SessionReady / PrefillProgress / Token / GenerateDone / Error from server. Separate Fork unary RPC for cheap branching (prefix cache shares KV automatically). Plus ListSessions, CloseSession, GetReadoutManifest admin RPCs. Per-token readouts ship as packed f32 ([n_layers * n_concepts] per token, flat). Logprobs use range-selected positions plus a top-k parameter — empty ranges means no logprobs, any range means emit sampled-token logprob at those positions, top_k > 0 adds alternatives. Client (src/agent/api/salience.rs): Tonic-generated types under pb::, a connect() helper, with_auth() for bearer metadata, and a Session handle wrapping the bidi stream: open() handshakes SessionReady; append() is fire-and-forget; generate() returns impl Stream<Item = Event> that drains inbound until Done or terminating Error. One generate at a time per session. Peak picker (src/agent/salience.rs): Pure function over ReadoutEntry traces. Per-concept z-score against trace global stats; contiguous above-threshold regions emit one peak at the local max. Configurable sigma threshold and min-std safety floor. Deterministic tie-break on offset then concept name. 12 unit tests covering empty traces, flat channels, single/multi spikes, contiguous humps, multi-concept independence, trailing runs, sub-threshold noise, layer-out-of-range, manifest shape mismatch, and threshold tunability. TLS (src/agent/api/http.rs): HttpClient::build now also loads every .pem file under ~/.consciousness/certs/ into the rustls root store — so dropping a <host>.pem in that directory is enough to trust a new self- signed server; no code changes per new host. Also installs the rustls default crypto provider explicitly via OnceLock: tonic's tls features pulled in both ring and aws-lc-rs on the resolver path, and rustls 0.23 refuses to auto-pick when either could win. Build (build.rs, Cargo.toml): tonic-build generates Rust types from proto/salience.proto at cargo-build time, using a vendored protoc binary (protoc-bin-vendored) so no system install is required. New runtime deps: tonic, prost, async-stream, tokio-stream, rustls-pemfile. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 11:56:32 -04:00

4 commits