vLLM priority scheduling for agents
Thread request priority through the API call chain to vLLM's priority
scheduler. Lower value = higher priority, with preemption.

Priority is set per-agent in the .agent header:
- interactive (runner): 0 (default, highest)
- surface-observe: 1 (near-realtime, watches conversation)
- all other agents: 10 (batch, default if not specified)

Requires vLLM started with --scheduling-policy priority.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
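The priority scheme described in the commit message can be sketched as follows. This is a minimal illustration, not the code from this commit: `DEFAULT_AGENT_PRIORITY`, `agent_priority`, and `schedule_order` are hypothetical names, and the sort only mimics the "lower value first" ordering. It does not model preemption.

```rust
/// Hypothetical constant: the fallback for agents whose .agent header
/// does not specify a priority (10 = batch, per the commit message).
const DEFAULT_AGENT_PRIORITY: i32 = 10;

/// Resolve an agent's priority from an optional header value.
/// The Option<i32> shape is an assumption made for this sketch.
fn agent_priority(header_value: Option<i32>) -> i32 {
    header_value.unwrap_or(DEFAULT_AGENT_PRIORITY)
}

/// Order pending requests the way a lower-value-first scheduler would.
/// The sort is stable, so requests with equal priority keep arrival order.
fn schedule_order(mut reqs: Vec<(&'static str, i32)>) -> Vec<&'static str> {
    reqs.sort_by_key(|&(_, priority)| priority);
    reqs.into_iter().map(|(name, _)| name).collect()
}

fn main() {
    let reqs = vec![
        ("batch-agent", agent_priority(None)),        // no header value
        ("surface-observe", agent_priority(Some(1))), // near-realtime
        ("interactive", 0),                           // highest priority
    ];
    println!("{:?}", schedule_order(reqs));
}
```

Treating the header value as optional keeps existing .agent files valid: an agent that says nothing about priority falls into the batch tier rather than competing with interactive traffic.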
parent 503e2995c1
commit c72eb4d528
8 changed files with 27 additions and 7 deletions
@@ -261,6 +261,7 @@ impl Agent {
             ui_tx,
             &self.reasoning_effort,
             None,
+            None, // priority: interactive
         );
 
         let mut content = String::new();
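The hunk above passes `None` for the interactive path. One way an optional priority could reach the request body is sketched below. It assumes the request is JSON, that the server reads an integer `priority` field, and that an absent field means 0, which would match the commit message's description of 0 as the default. `request_body` is a hypothetical helper, not a function from this commit, and it skips JSON escaping for brevity.

```rust
/// Build a minimal chat-completions style JSON body by hand.
/// `priority` is emitted only when set, so `None` leaves the choice to the
/// server's default. Inputs are assumed to need no JSON escaping.
fn request_body(model: &str, prompt: &str, priority: Option<i32>) -> String {
    let mut body = format!(
        "{{\"model\":\"{}\",\"messages\":[{{\"role\":\"user\",\"content\":\"{}\"}}]",
        model, prompt
    );
    if let Some(p) = priority {
        // Lower value = higher priority, per the commit message.
        body.push_str(&format!(",\"priority\":{}", p));
    }
    body.push('}');
    body
}

fn main() {
    // Interactive path: no explicit priority, field omitted.
    println!("{}", request_body("some-model", "hello", None));
    // Batch agent: explicit priority 10.
    println!("{}", request_body("some-model", "hello", Some(10)));
}
```

Real code would use a JSON serializer rather than string formatting. The point here is only that an `Option` maps naturally onto an omitted field.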