Thread request priority through the API call chain to vLLM's
priority scheduler. Lower value = higher priority, with preemption.
Priority is set per-agent in the .agent header:
- interactive (runner): 0 (default, highest)
- surface-observe: 1 (near-realtime, watches conversation)
- all other agents: 10 (batch, default if not specified)
Requires vLLM started with --scheduling-policy priority.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Split the streaming pipeline: API backends yield StreamEvents through
a channel, the runner reads them and routes to the appropriate UI pane.
- Add StreamEvent enum (Content, Reasoning, ToolCallDelta, etc.)
- API start_stream() spawns backend as a task, returns event receiver
- Runner loops over events, sends content to conversation pane but
suppresses <tool_call> XML with a buffered tail for partial tags
- OpenAI backend refactored to stream_events() — no more UI coupling
- Anthropic backend gets a wrapper that synthesizes events from the
existing stream() (TODO: native event streaming)
- chat_completion_stream() kept for subconscious agents, reimplemented
on top of the event stream
- Usage derives Clone
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
No more subcrate nesting — src/, agents/, schema/, defaults/, build.rs
all live at the workspace root. poc-daemon remains as the only workspace
member. Crate name (poc-memory) and all imports unchanged.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-03-25 00:54:12 -04:00
Renamed from poc-memory/src/agent/api/openai.rs (Browse further)