forked from kent/consciousness
agent: kill no_compact, add pre-send size check in assemble_prompt
Two related fixes that follow from last night's crash diagnosis:
1. Kill AgentState::no_compact. The reasoning ("forked agents
shouldn't compact because it blows the KV cache prefix") wasn't
worth the cost — forks with no compact recovery just *died* on
any oversize prompt, with no fallback. The KV cache invalidation
is a performance loss; failing the request entirely is a
correctness loss. Remove the flag, let every agent's overflow-
retry path call compact() up to 2 times.
2. Add pre-send size check in Agent::assemble_prompt. If the
context has grown past budget (context_window * 80%) since the
last compact — accumulation between turns, a fork assembling
more than expected, etc. — trim_conversation() is called before
wire_prompt. Since we tokenize client-side, we already know the
exact count, so there's no reason to round-trip an oversize
request to vLLM and get rejected.
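The pre-send check in item 2 can be sketched roughly as below. This is a minimal illustration, not the code in this commit: `Context`, `trim_to`, `budget` and `pre_send_check` are hypothetical stand-ins, the trim simply drops the oldest conversation entries (the real trim_conversation() also dedups), and the 80% budget is computed in integer arithmetic.

```rust
/// Hypothetical stand-in for the real context state. It only stores
/// cached token counts, which is all the budget check needs.
struct Context {
    /// Tokens in the sections this sketch never trims (assumption).
    fixed_tokens: usize,
    /// Cached token count per conversation entry, oldest first.
    conversation: Vec<usize>,
}

impl Context {
    /// Cheap sum over the cached counts; no re-tokenization.
    fn total_tokens(&self) -> usize {
        self.fixed_tokens + self.conversation.iter().sum::<usize>()
    }

    /// Drop the oldest conversation entries until the total fits the
    /// budget or nothing is left to drop. Returns how many were dropped.
    fn trim_to(&mut self, budget: usize) -> usize {
        let mut dropped = 0;
        while self.total_tokens() > budget && !self.conversation.is_empty() {
            self.conversation.remove(0);
            dropped += 1;
        }
        dropped
    }
}

/// 80% of the context window, rounded down.
fn budget(context_window: usize) -> usize {
    context_window * 80 / 100
}

/// Run before the prompt goes on the wire: trim only if over budget.
fn pre_send_check(ctx: &mut Context, context_window: usize) -> usize {
    let limit = budget(context_window);
    if ctx.total_tokens() > limit {
        ctx.trim_to(limit)
    } else {
        0
    }
}
```

With an 8192-token window the budget is 6553 tokens, so a context holding 8000 tokens is trimmed locally instead of being sent and rejected.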
Together these prevent the failure mode from last night: a
subconscious/unconscious agent's prompt exceeded max_model_len,
vLLM returned 400, agent had no_compact=true so it couldn't
recover, request failed. Now: the trim happens before send, so
the request rarely hits the 400 path at all; and if it somehow
does, compact+retry works for every agent.
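The fallback path (compact, then retry, at most twice) might look something like the sketch below. All names here (`SendError`, the `Agent` trait, `send_with_retry`, `ToyAgent`, `MAX_COMPACTS`) are hypothetical and exist only to illustrate the bounded retry; the real agent's send and compact() signatures are not shown in this commit page.

```rust
/// Hypothetical error type; `Oversize` stands in for the 400 that the
/// server returns when the prompt exceeds max_model_len.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum SendError {
    Oversize,
    Other(String),
}

/// Hypothetical minimal surface of an agent for this sketch.
trait Agent {
    fn send(&mut self) -> Result<String, SendError>;
    fn compact(&mut self);
}

/// Upper bound on compactions per request, per the commit message.
const MAX_COMPACTS: usize = 2;

/// Send; on an oversize rejection, compact and retry, at most
/// MAX_COMPACTS times. Any other outcome is returned unchanged.
fn send_with_retry<A: Agent>(agent: &mut A) -> Result<String, SendError> {
    let mut compacts = 0;
    loop {
        match agent.send() {
            Err(SendError::Oversize) if compacts < MAX_COMPACTS => {
                agent.compact();
                compacts += 1;
            }
            other => return other,
        }
    }
}

/// Toy agent for demonstration: each compaction halves the token count.
struct ToyAgent {
    tokens: usize,
    limit: usize,
    compacts: usize,
}

impl Agent for ToyAgent {
    fn send(&mut self) -> Result<String, SendError> {
        if self.tokens > self.limit {
            Err(SendError::Oversize)
        } else {
            Ok(format!("sent {} tokens", self.tokens))
        }
    }

    fn compact(&mut self) {
        self.tokens /= 2;
        self.compacts += 1;
    }
}
```

Because no flag exempts any agent from this loop, a fork that somehow slips past the pre-send trim still gets the same two recovery attempts as every other agent before the error is surfaced.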
Also adds ContextState::total_tokens() as the cheap pre-send
budget check.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
parent 0592c5f78d
commit c7b0052f1d
2 changed files with 30 additions and 20 deletions
@@ -1096,6 +1096,16 @@ impl ContextState {
         self.section_mut(section).clear();
     }
 
+    /// Total tokens across every section that gets serialized into the prompt.
+    /// Cheap sum over cached `node.tokens()`; call this before assembling to
+    /// decide whether to trim.
+    pub fn total_tokens(&self) -> usize {
+        self.system().iter().map(|n| n.tokens()).sum::<usize>()
+            + self.identity().iter().map(|n| n.tokens()).sum::<usize>()
+            + self.journal().iter().map(|n| n.tokens()).sum::<usize>()
+            + self.conversation().iter().map(|n| n.tokens()).sum::<usize>()
+    }
+
     /// Dedup and trim conversation entries to fit within the context budget.
     ///
     /// Phase 1: Drop duplicate memories (keep last) and DMN entries.