Wire vLLM priority scheduling through all agent paths

The priority field existed in agent definitions and was serialized
into vLLM requests, but it was never actually set: every request went
out with no priority, so vLLM treated them all equally. This meant
background graph maintenance agents could compete with, and delay,
the main conversation.

Add priority to AgentState and set it at each call site:
  0 = interactive (main conversation)
  1 = surface agent (needs to feed memories promptly)
  2 = other subconscious agents
  10 = unconscious/standalone agents (batch)
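
The mapping above can be sketched as a small helper. This is an illustrative sketch only; `AgentKind` and `priority_for` are invented names, not identifiers from the patch:

```rust
// Hypothetical sketch of the priority tiers described above.
// Lower value = scheduled first under vLLM's priority policy.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentKind {
    Interactive,  // main conversation
    Surface,      // needs to feed memories promptly
    Subconscious, // other background agents
    Unconscious,  // standalone/batch agents
}

fn priority_for(kind: AgentKind) -> i32 {
    match kind {
        AgentKind::Interactive => 0,
        AgentKind::Surface => 1,
        AgentKind::Subconscious => 2,
        AgentKind::Unconscious => 10,
    }
}

fn main() {
    // The interactive path always outranks background work.
    assert!(priority_for(AgentKind::Interactive) < priority_for(AgentKind::Surface));
    assert_eq!(priority_for(AgentKind::Unconscious), 10);
    println!("ok");
}
```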

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
ProofOfConcept 2026-04-09 20:38:33 -04:00
parent b115cec096
commit bf503b571e
4 changed files with 15 additions and 2 deletions

@@ -163,6 +163,9 @@ pub struct AgentState {
     pub generation: u64,
     pub memory_scoring_in_flight: bool,
     pub active_tools: tools::ActiveTools,
+    /// vLLM scheduling priority (lower = higher priority).
+    /// 0 = interactive, 1 = surface agent, 2 = other subconscious, 10 = unconscious.
+    pub priority: Option<i32>,
     /// Forked agents should not compact on overflow — it blows the
     /// KV cache prefix and evicts the step prompts.
     pub no_compact: bool,
@@ -225,6 +228,7 @@ impl Agent {
             generation: 0,
             memory_scoring_in_flight: false,
             active_tools,
+            priority: Some(0),
             no_compact: false,
             changed: Arc::new(tokio::sync::Notify::new()),
         }),
@@ -261,6 +265,7 @@ impl Agent {
             generation: 0,
             memory_scoring_in_flight: false,
             active_tools: tools::ActiveTools::new(),
+            priority: None,
             no_compact: true,
             changed: Arc::new(tokio::sync::Notify::new()),
         }),
@@ -318,7 +323,7 @@ impl Agent {
                 top_p: st.top_p,
                 top_k: st.top_k,
             },
-            None,
+            st.priority,
        )
    };
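
The hunk above replaces the hard-coded `None` with `st.priority`, so the request serializer now receives the per-agent value. A minimal stdlib-only sketch of how an optional priority might be rendered into a JSON body; `request_body` and its field layout are invented for illustration, not vLLM's actual OpenAI-compatible schema:

```rust
/// Hypothetical helper: render an optional scheduling priority into a
/// JSON request body. When the priority is None the field is omitted
/// entirely, reproducing the pre-patch behavior of never sending one.
fn request_body(prompt: &str, priority: Option<i32>) -> String {
    let priority_field = match priority {
        Some(p) => format!(",\"priority\":{}", p),
        None => String::new(),
    };
    format!("{{\"prompt\":{:?}{}}}", prompt, priority_field)
}

fn main() {
    // Interactive agent: priority 0 is serialized into the body.
    assert_eq!(request_body("hi", Some(0)), "{\"prompt\":\"hi\",\"priority\":0}");
    // Forked/standalone state: None sends no priority field at all.
    assert_eq!(request_body("hi", None), "{\"prompt\":\"hi\"}");
    println!("ok");
}
```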