forked from kent/consciousness
agent: bump tonic gRPC message caps to 64 MiB

The default 4 MiB cap on encoded/decoded messages is too small for the multimodal Generate path: Qwen3.6-VL high-res patches put 5–8 MiB of pre-encoded image bytes inline in a single Generate request, and Done events carrying full per-token readout vectors can also exceed 4 MiB on long runs.

Hit "ResourceExhausted: Received message larger than max (5799108 vs. 4194304)" from the salience server.

Bump both encode and decode caps on every cloned SalienceClient. The matching server-side bump is in vllm/entrypoints/salience/server.py.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
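The numbers in the error message can be checked directly. The following is a minimal sketch, assuming a message is rejected whenever its encoded length exceeds the configured cap. `fits_under_cap` and `DEFAULT_CAP_BYTES` are hypothetical names used only for illustration; `MAX_GRPC_MESSAGE_BYTES` mirrors the constant added in this commit.

```rust
/// The 4 MiB default cap quoted in the error message (4194304 bytes).
const DEFAULT_CAP_BYTES: usize = 4 * 1024 * 1024;

/// The raised cap from this commit (64 MiB).
const MAX_GRPC_MESSAGE_BYTES: usize = 64 * 1024 * 1024;

/// Hypothetical helper: true if a message of `len` bytes fits under `cap`.
/// Assumption: a message is rejected only when its length exceeds the cap.
fn fits_under_cap(len: usize, cap: usize) -> bool {
    len <= cap
}
```

For the 5799108-byte message from the error, `fits_under_cap(5_799_108, DEFAULT_CAP_BYTES)` is false and `fits_under_cap(5_799_108, MAX_GRPC_MESSAGE_BYTES)` is true. An 8 MiB image payload, the top of the range quoted above, also fits under the new cap.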
parent 11a7e4043e
commit 10c8878f1c
1 changed file with 10 additions and 1 deletion
@@ -117,6 +117,12 @@ impl ApiClient {
     /// the channel on first call and reuses it thereafter across
     /// every ApiClient clone. All scoring / inference / session
     /// RPCs flow through this single multiplexed HTTP/2 connection.
+    ///
+    /// Bumps tonic's default 4 MiB encode/decode caps to 64 MiB on
+    /// every client. Multimodal Generate requests carry pre-encoded
+    /// image bytes inline (Qwen3.6's 768×768 patches at high res
+    /// land around 5–8 MiB per turn), and Done events with full
+    /// per-token readout vectors can also exceed 4 MiB on long runs.
     pub async fn salience_client(&self) -> Result<
         salience::pb::salience_client::SalienceClient<tonic::transport::Channel>
     > {
@@ -127,7 +133,10 @@ impl ApiClient {
                 self.base_url, grpc_url);
             salience::connect_channel(&grpc_url).await
         }).await?;
-        Ok(salience::pb::salience_client::SalienceClient::new(ch.clone()))
+        const MAX_GRPC_MESSAGE_BYTES: usize = 64 * 1024 * 1024;
+        Ok(salience::pb::salience_client::SalienceClient::new(ch.clone())
+            .max_decoding_message_size(MAX_GRPC_MESSAGE_BYTES)
+            .max_encoding_message_size(MAX_GRPC_MESSAGE_BYTES))
     }
 
     /// Stream generation via a gRPC session. Walks the prompt chunks
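The change sets the caps where each client is constructed, because the limits are configured through builder methods on the client value itself. A client freshly wrapped around the shared channel would otherwise start with the defaults. The sketch below illustrates that pattern with a hypothetical `MockClient`. It is a stand-in for the generated `SalienceClient`, not tonic code, and it assumes the 4 MiB default described in the commit message.

```rust
/// Assumed default cap (4 MiB), as described in the commit message.
const DEFAULT_CAP_BYTES: usize = 4 * 1024 * 1024;

/// The raised cap from this commit (64 MiB).
const MAX_GRPC_MESSAGE_BYTES: usize = 64 * 1024 * 1024;

/// Hypothetical stand-in for a generated gRPC client. It tracks only the
/// two size limits that the commit changes.
#[derive(Debug, Clone, PartialEq)]
struct MockClient {
    max_decoding: usize,
    max_encoding: usize,
}

impl MockClient {
    /// A freshly constructed client starts with the default caps.
    fn new() -> Self {
        MockClient {
            max_decoding: DEFAULT_CAP_BYTES,
            max_encoding: DEFAULT_CAP_BYTES,
        }
    }

    /// Builder-style setter: consumes the client and returns it with a
    /// new cap on incoming (decoded) messages.
    fn max_decoding_message_size(mut self, limit: usize) -> Self {
        self.max_decoding = limit;
        self
    }

    /// Builder-style setter: consumes the client and returns it with a
    /// new cap on outgoing (encoded) messages.
    fn max_encoding_message_size(mut self, limit: usize) -> Self {
        self.max_encoding = limit;
        self
    }
}

/// Mirrors the shape of the patched return expression: every client handed
/// out has both caps raised, so no call site sees the defaults.
fn salience_client_sketch() -> MockClient {
    MockClient::new()
        .max_decoding_message_size(MAX_GRPC_MESSAGE_BYTES)
        .max_encoding_message_size(MAX_GRPC_MESSAGE_BYTES)
}
```

Raising both directions matters here because the oversized payloads flow both ways: image-bearing Generate requests are encoded by the client, and Done events with readout vectors are decoded by it.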