agent: end-to-end gRPC Generate with delta-based session orchestration
Wires the client side of the new salience protocol so inference
actually runs over gRPC instead of emitting the stubbed "not yet
wired" error. Each turn walks the AST as interleaved chunks, sends
only what's new to the server, and streams decode tokens back.
context.rs:
* `WireChunk` enum: `Tokens(Vec<u32>)` or `Image { bytes, mime,
known_expanded_len }`. Preserves the text/image/text ordering,
which the wire path can't flatten.
* `wire_chunks(range, skip)` walker, parallel to `wire_prompt` —
branches emit `<|im_start|>…<|im_end|>` tokens, image leaves
emit a single Image chunk (no inline vision tokens).
* `NodeLeaf::set_image_token_count(n)` plus a recompute of the
cached `token_ids`; `ContextState::commit_image_token_counts(&[u32])`
fills in the first N zero-count image leaves in wire order.
* `ResponseParser::run` handles the new
`StreamToken::ImageAppended` by committing the server's N into
the AST before the final Generate's Token events stream in.
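A minimal sketch of the chunk shape and the commit rule described above. The `WireChunk` variants follow the commit message, but the field types (notably `known_expanded_len: Option<u32>`) are assumptions, and `ImageLeaf` plus the free function over a slice are hypothetical stand-ins for `NodeLeaf` and `ContextState`. The point is the rule itself: server-reported counts go, in wire order, to the first N image leaves whose count is still zero.

```rust
/// One unit of the interleaved prompt (variant shapes follow the commit
/// message; the field types here are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub enum WireChunk {
    Tokens(Vec<u32>),
    Image {
        bytes: Vec<u8>,
        mime: String,
        /// `None` until the server reports the expanded placeholder count.
        known_expanded_len: Option<u32>,
    },
}

/// Hypothetical stand-in for an image leaf in the context AST.
#[derive(Debug, Clone, PartialEq)]
pub struct ImageLeaf {
    /// 0 means "not yet known".
    pub token_count: u32,
}

/// Fill server-reported counts into the first `counts.len()` image leaves
/// whose count is still zero, in wire order. Leaves that already have a
/// count are skipped, never overwritten. Returns how many were filled.
pub fn commit_image_token_counts(leaves: &mut [ImageLeaf], counts: &[u32]) -> usize {
    let mut counts = counts.iter().copied();
    let mut filled = 0;
    for leaf in leaves.iter_mut().filter(|leaf| leaf.token_count == 0) {
        let Some(n) = counts.next() else { break };
        leaf.token_count = n;
        filled += 1;
    }
    filled
}
```

Skipping leaves that already carry a count is what lets the same call run after every AppendImage without disturbing images committed on earlier turns.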
salience.rs:
* `SessionHandle` tracks `committed_len`. `append_image` advances
it from the RPC response. New `generate(req)` opens the
server-streaming RPC.
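The `committed_len` bookkeeping, together with the take-out/put-back lifecycle that `run_session_generate` (api/mod.rs) applies to the handle, can be sketched synchronously as follows. This assumes the Mutex holds an `Option<SessionHandle>`; `SessionSlot`, `run_turn`, `advance`, and `peek` are hypothetical names, and a std `Mutex` with a plain closure stands in for whatever async machinery the real code uses.

```rust
use std::sync::Mutex;

/// Simplified stand-in for the gRPC session handle.
#[derive(Debug)]
pub struct SessionHandle {
    pub id: u64,
    /// Positions the server already holds for this session.
    pub committed_len: u32,
}

impl SessionHandle {
    /// Advance by a server-reported length (e.g. an image's placeholder
    /// count from the AppendImage response).
    pub fn advance(&mut self, n: u32) {
        self.committed_len += n;
    }
}

/// Hypothetical per-conversation slot holding at most one live session.
pub struct SessionSlot {
    cached: Mutex<Option<SessionHandle>>,
    next_id: Mutex<u64>,
}

impl SessionSlot {
    pub fn new() -> Self {
        SessionSlot { cached: Mutex::new(None), next_id: Mutex::new(0) }
    }

    /// Take the cached session (or open a fresh one), run `turn`, and put
    /// the handle back only if the turn succeeded.
    pub fn run_turn<T, E>(
        &self,
        turn: impl FnOnce(&mut SessionHandle) -> Result<T, E>,
    ) -> Result<T, E> {
        // The guard is a temporary, so the lock is released at the `;`.
        let taken = self.cached.lock().unwrap().take();
        let mut handle = taken.unwrap_or_else(|| self.open_fresh());
        let result = turn(&mut handle);
        if result.is_ok() {
            *self.cached.lock().unwrap() = Some(handle);
        }
        // On error, `handle` is dropped when this function returns and the
        // slot stays empty, so the next turn reopens instead of trusting a
        // possibly stale `committed_len`.
        result
    }

    /// Stand-in for opening a new server-side session.
    fn open_fresh(&self) -> SessionHandle {
        let mut id = self.next_id.lock().unwrap();
        *id += 1;
        SessionHandle { id: *id, committed_len: 0 }
    }

    /// Inspect the cached session as `(id, committed_len)`.
    pub fn peek(&self) -> Option<(u64, u32)> {
        self.cached.lock().unwrap().as_ref().map(|h| (h.id, h.committed_len))
    }
}
```

Taking the handle out, rather than holding the lock for the whole turn, keeps the lock out of the streaming RPC, and dropping on error means a failed turn never leaves behind a length the server may not agree with.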
api/mod.rs:
* `stream_session_mm(session_lock, chunks, sampling, priority,
readout_shape)` replaces the stub. Spawns `run_session_generate`.
* `run_session_generate`: takes the session out of the Mutex (or
opens a fresh one) and skips chunks already covered by
`committed_len`, bailing if that boundary falls mid-chunk or if an
image in the committed prefix has no known length. It then walks
the delta: Tokens chunks accumulate into `pending`; on an Image,
`flush_pending` sends the pending text as a max_tokens=0 Generate
that only prefills, then AppendImage runs and
StreamToken::ImageAppended is emitted. The final Generate carries
any trailing pending text as `append_tokens` plus the sampling
params; its Token events stream out as StreamToken::Token and Done
as StreamToken::Done. On success, the handle (with its updated
`committed_len`) goes back into the Mutex; on error, it is dropped
and the next call reopens a session.
* `StreamToken::ImageAppended { placeholder_count }` variant —
emitted in wire order before the final Generate's tokens.
* Prefix-cache cap for readout coverage: `readout_ranges` covers
`[prompt_len_after_append, u32::MAX)` when the caller provides
a readout_shape, so decode positions stream their readouts.
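The skip-then-walk logic of `run_session_generate` can be sketched as a pure planning function. This is an illustration, not the actual implementation: `Chunk`, `Op`, `PlanError`, and `plan_delta` are hypothetical names, images are reduced to an id plus their known expanded length, and the `CommittedPastEnd` check is an extra guard this sketch adds. Phase 1 skips chunks wholly inside the committed prefix and bails on a straddle or an unknown-length image; phase 2 turns the delta into prefill Generates, AppendImage calls, and one final Generate.

```rust
/// Prompt chunk reduced to what planning needs (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum Chunk {
    Tokens(Vec<u32>),
    /// `expanded_len` is `None` until the server has reported it.
    Image { id: u32, expanded_len: Option<u32> },
}

/// RPCs to issue, in order (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum Op {
    /// Generate with max_tokens = 0: prefill only.
    Prefill(Vec<u32>),
    AppendImage(u32),
    /// Final Generate: trailing text as append_tokens, then sampling.
    FinalGenerate(Vec<u32>),
}

#[derive(Debug, PartialEq)]
pub enum PlanError {
    /// `committed_len` ends inside this chunk.
    Straddle { chunk: usize },
    /// This image lies in the committed prefix but its length is unknown.
    UnknownImageLen { chunk: usize },
    /// The server holds more than the prompt contains (extra check added
    /// by this sketch, not mentioned in the commit).
    CommittedPastEnd,
}

pub fn plan_delta(chunks: &[Chunk], committed_len: u32) -> Result<Vec<Op>, PlanError> {
    // Phase 1: skip chunks that lie entirely inside the committed prefix.
    let (mut pos, mut start) = (0u32, 0usize);
    while pos < committed_len {
        let Some(chunk) = chunks.get(start) else {
            return Err(PlanError::CommittedPastEnd);
        };
        let len = match chunk {
            Chunk::Tokens(t) => t.len() as u32,
            Chunk::Image { expanded_len: Some(n), .. } => *n,
            Chunk::Image { expanded_len: None, .. } => {
                return Err(PlanError::UnknownImageLen { chunk: start });
            }
        };
        if pos + len > committed_len {
            return Err(PlanError::Straddle { chunk: start });
        }
        pos += len;
        start += 1;
    }

    // Phase 2: walk the delta, batching text between images.
    let mut ops = Vec::new();
    let mut pending: Vec<u32> = Vec::new();
    for chunk in &chunks[start..] {
        match chunk {
            Chunk::Tokens(t) => pending.extend_from_slice(t),
            Chunk::Image { id, .. } => {
                // Flush text before the image (this sketch skips empty flushes).
                if !pending.is_empty() {
                    ops.push(Op::Prefill(std::mem::take(&mut pending)));
                }
                ops.push(Op::AppendImage(*id));
            }
        }
    }
    // Always finish with one Generate carrying any trailing text.
    ops.push(Op::FinalGenerate(pending));
    Ok(ops)
}
```

Keeping the plan separate from the RPC calls makes the bail-out cases easy to check in isolation; the real function interleaves these steps with the streaming calls.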
agent/mod.rs:
* `assemble_prompt` returns `Vec<WireChunk>` with the assistant
prologue merged into the trailing Tokens chunk. Caller in
`turn` passes chunks + readout_shape (pulled from
`agent.readout.lock().manifest`) to `stream_session_mm`.
* Dropped `assemble_prompt_tokens` — dead.
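The prologue merge in `assemble_prompt` comes down to one rule about the tail of the chunk list. In this sketch `append_prologue` is a hypothetical helper, not a function from the commit, the `WireChunk` field types are assumed, and the prologue token ids in the usage below are made up.

```rust
/// Same chunk shape as in context.rs (field types assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum WireChunk {
    Tokens(Vec<u32>),
    Image { bytes: Vec<u8>, mime: String, known_expanded_len: Option<u32> },
}

/// Hypothetical helper: append the assistant prologue so it ends up in
/// the trailing Tokens chunk. If the prompt ends in text, merge into that
/// chunk; if it ends in an image (or is empty), start a new Tokens chunk.
pub fn append_prologue(mut chunks: Vec<WireChunk>, prologue: &[u32]) -> Vec<WireChunk> {
    if let Some(WireChunk::Tokens(last)) = chunks.last_mut() {
        last.extend_from_slice(prologue);
    } else {
        chunks.push(WireChunk::Tokens(prologue.to_vec()));
    }
    chunks
}
```

Either way the result ends in exactly one Tokens chunk that contains the prologue, so no separate trailing chunk appears after existing text.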
mind + unconscious:
* `Unconscious::new(client)` stores a shared `ApiClient`. Fixes
the repeated-manifest-fetch bug caused by each subagent's
`ApiClient::new` having its own OnceCell. The client's Arc-
wrapped manifest cache is now shared across every agent Mind
spawns.
* `prepare_spawn(name, auto, wake, base_client)` clones the base
client and overrides `.model` for the resolved backend instead
of constructing fresh. All three callers
(`toggle`/`trigger`/unconscious loop) pass `self.client.clone()`.
* `Mind::new` passes `agent.client.clone()` into
`Unconscious::new`.
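Why cloning fixes the repeated manifest fetch can be shown with a small model, assuming the cache is an `Arc` around a once-initialized cell. std's `OnceLock` stands in for the `OnceCell` mentioned above; `Manifest`, the fetch counter, and this `ApiClient::new(model, fetches)` signature are hypothetical. Clones share one cell, while every `new` call starts with an empty one.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical manifest payload.
#[derive(Debug)]
pub struct Manifest {
    pub model_count: usize,
}

/// Sketch of a client whose manifest cache survives cloning.
#[derive(Clone)]
pub struct ApiClient {
    pub model: String,
    /// Cloning copies the Arc, so every clone sees the same cell.
    manifest: Arc<OnceLock<Manifest>>,
    /// Counts simulated manifest fetches (demo instrumentation only).
    fetches: Arc<AtomicUsize>,
}

impl ApiClient {
    /// Each call creates a brand-new, empty cache.
    pub fn new(model: &str, fetches: Arc<AtomicUsize>) -> Self {
        ApiClient {
            model: model.to_string(),
            manifest: Arc::new(OnceLock::new()),
            fetches,
        }
    }

    /// Return the manifest, "fetching" it only the first time any clone
    /// sharing this cache asks for it.
    pub fn manifest(&self) -> &Manifest {
        self.manifest.get_or_init(|| {
            self.fetches.fetch_add(1, Ordering::SeqCst);
            Manifest { model_count: 1 }
        })
    }
}
```

Overriding `.model` on a clone touches only that clone's `String`, so per-agent backends can differ while the manifest is fetched once.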
subconscious/generate.rs:
* `gen_continuation` switched to `wire_chunks` + the new
`stream_session_mm` signature. An ephemeral session opens on each
call and is torn down at scope end. No readouts are requested.
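"Tears down at scope end" is the usual Rust RAII pattern. A minimal sketch, assuming a guard type whose `Drop` releases the session: `EphemeralSession` and `gen_continuation_sketch` are hypothetical, and the shared `closed` log only makes the teardown observable.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical guard for a session that lives for a single call.
pub struct EphemeralSession {
    pub id: u64,
    /// Shared log of closed session ids (demo instrumentation only).
    closed: Rc<RefCell<Vec<u64>>>,
}

impl EphemeralSession {
    pub fn open(id: u64, closed: Rc<RefCell<Vec<u64>>>) -> Self {
        EphemeralSession { id, closed }
    }
}

impl Drop for EphemeralSession {
    fn drop(&mut self) {
        // A real client would release the server-side session here. Drop
        // cannot `.await`, so an async client would need another route,
        // such as handing the close off to a spawned task.
        self.closed.borrow_mut().push(self.id);
    }
}

/// Hypothetical stand-in for `gen_continuation`: the session is dropped on
/// every exit path, including the early error return.
pub fn gen_continuation_sketch(
    id: u64,
    closed: Rc<RefCell<Vec<u64>>>,
    fail: bool,
) -> Result<String, String> {
    let session = EphemeralSession::open(id, closed);
    if fail {
        return Err(format!("generate failed on session {}", session.id));
    }
    Ok(format!("continuation from session {}", session.id))
}
```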
Not changed yet, noted for follow-up:
* Subconscious ablation scoring in learn.rs still talks to
`/v1/score` over HTTP. Will migrate once we have time to verify
the Generate+max_tokens=0+prompt_logprobs path end-to-end.
* compare.rs constructs its own ApiClient for the
`compare.test_backend` (which is intentionally a different
endpoint) — left alone.
* Readout manifest still fetched via HTTP at Agent::new.
Migration to GetReadoutManifest gRPC is a separate cleanup.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Parent: 08213f9093
Commit: 8d9c9e9f7b
7 changed files with 536 additions and 60 deletions
@@ -73,10 +73,15 @@ pub struct Unconscious {
     last_health_check: Option<Instant>,
     /// Notified when agent state changes (finished, toggled)
     pub wake: std::sync::Arc<tokio::sync::Notify>,
+    /// Shared API client — cloned (cheap) into each spawned agent's
+    /// Agent::new call so they all share the manifest cache and
+    /// gRPC endpoint state. Override `.model` on the clone when a
+    /// per-agent backend differs from the default.
+    pub client: crate::agent::api::ApiClient,
 }
 
 impl Unconscious {
-    pub fn new() -> Self {
+    pub fn new(client: crate::agent::api::ApiClient) -> Self {
         let enabled_map = load_enabled_config();
 
         // Scan all .agent files, exclude subconscious-* and surface-observe
@@ -120,6 +125,7 @@ impl Unconscious {
             graph_health: None,
             last_health_check: None,
             wake: std::sync::Arc::new(tokio::sync::Notify::new()),
+            client,
         }
     }
 
@@ -134,7 +140,8 @@ impl Unconscious {
         let agent_name = self.agents[idx].name.clone();
         let auto = self.agents[idx].auto.take().unwrap();
         let wake = self.wake.clone();
-        match prepare_spawn(&agent_name, auto, wake).await {
+        let client = self.client.clone();
+        match prepare_spawn(&agent_name, auto, wake, client).await {
             Ok(result) => self.complete_spawn(idx, result),
             Err(auto) => self.abort_spawn(idx, auto),
         }
@@ -250,7 +257,12 @@ pub struct SpawnResult {
 /// Called outside the Unconscious lock.
 /// On success, auto is consumed (moved into spawned task).
 /// On failure, auto is returned so it can be restored.
-pub async fn prepare_spawn(name: &str, mut auto: AutoAgent, wake: std::sync::Arc<tokio::sync::Notify>) -> Result<SpawnResult, AutoAgent> {
+pub async fn prepare_spawn(
+    name: &str,
+    mut auto: AutoAgent,
+    wake: std::sync::Arc<tokio::sync::Notify>,
+    base_client: crate::agent::api::ApiClient,
+) -> Result<SpawnResult, AutoAgent> {
     dbglog!("[unconscious] spawning {}", name);
 
     let def = match defs::get_def(name) {
@@ -295,8 +307,10 @@ pub async fn prepare_spawn(name: &str, mut auto: AutoAgent, wake: std::sync::Arc
     };
 
     // Unconscious agents have self-contained prompts — no standard context.
-    let client = crate::agent::api::ApiClient::new(
-        &resolved.api_base, &resolved.api_key, &resolved.model_id);
+    // Clone the shared client so we inherit the manifest cache and
+    // only override the model id per-agent.
+    let mut client = base_client;
+    client.model = resolved.model_id.clone();
     let agent = crate::agent::Agent::new(
         client, Vec::new(),
         app, None,
@@ -329,8 +343,9 @@ impl Unconscious {
         self.reap_finished();
         let to_spawn = self.select_to_spawn();
         let wake = self.wake.clone();
+        let client = self.client.clone();
         for (idx, name, auto) in to_spawn {
-            match prepare_spawn(&name, auto, wake.clone()).await {
+            match prepare_spawn(&name, auto, wake.clone(), client.clone()).await {
                 Ok(result) => self.complete_spawn(idx, result),
                 Err(auto) => self.abort_spawn(idx, auto),
             }