Show the chunks received, the SSE lines parsed, and the contents
of the line buffer (up to 500 bytes) on both stream errors and
timeouts. This tells us whether we got partial data, a non-SSE
response, or truly nothing from the server.
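Roughly what the dump covers, as a sketch (the SseReader field
names here are assumptions, not the real ones):

    struct SseReader {
        chunks_received: usize,
        lines_parsed: usize,
        line_buffer: Vec<u8>,
    }

    impl SseReader {
        fn log_stream_diagnostics(&self, cause: &str) {
            // Cap the buffer dump at 500 bytes so a huge non-SSE body
            // (e.g. an HTML error page) doesn't flood the log.
            let tail = &self.line_buffer[..self.line_buffer.len().min(500)];
            eprintln!(
                "stream {cause}: {} chunks, {} SSE lines, buffer: {:?}",
                self.chunks_received,
                self.lines_parsed,
                String::from_utf8_lossy(tail),
            );
        }
    }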
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The stream chunk timeout is now api_stream_timeout_secs in config
(default 60s). The status bar shows total turn time and per-call
time against the timeout: "thinking... 45s, 12/60s".
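A sketch of the status-line formatting (everything except
api_stream_timeout_secs is an assumed name):

    use std::time::Duration;

    fn status_line(turn: Duration, call: Duration, timeout_secs: u64) -> String {
        format!(
            "thinking... {}s, {}/{}s",
            turn.as_secs(),
            call.as_secs(),
            timeout_secs,
        )
    }

    // status_line(Duration::from_secs(45), Duration::from_secs(12), 60)
    // => "thinking... 45s, 12/60s"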
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Spawned streaming tasks were never cancelled when a turn ended or
was retried, leaving zombie tasks blocked on dead vLLM connections.
An AbortOnDrop wrapper now aborts the task when it goes out of
scope. The chunk timeout is also reduced from 120s to 60s.
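The wrapper is the standard tie-task-lifetime-to-owner idiom; a
minimal sketch, assuming tokio tasks:

    use tokio::task::JoinHandle;

    // Owning the handle ties the task's lifetime to its owner: when the
    // turn ends or retries and the wrapper is dropped, the task is
    // aborted instead of sitting forever on a dead connection.
    struct AbortOnDrop<T>(JoinHandle<T>);

    impl<T> Drop for AbortOnDrop<T> {
        fn drop(&mut self) {
            self.0.abort();
        }
    }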
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
pub → pub(crate) for SseReader methods (used across child modules).
pub → pub(super) for openai::stream_events, tool definitions, store
helpers. pub → private for normalize_link and differentiate_hub_with_graph
(only used within their own files).
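For reference, what the three visibility levels buy (illustrative
module layout, not the real one):

    mod api {
        pub(crate) struct SseReader;

        impl SseReader {
            // visible anywhere in this crate, hidden from downstream crates
            pub(crate) fn read_line(&self) {}
        }

        mod openai {
            // visible only to the parent `api` module, nothing wider
            pub(super) fn stream_events() {}
        }

        // private: usable only within this module/file
        fn normalize_link() {}
    }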
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Serialize the request JSON before send_and_check so it's available
for both HTTP errors and stream errors. Extract the save logic
into a save_failed_request helper on SseReader.
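A sketch of the helper's shape (the field and signature are
assumptions):

    use std::{env, fs, io, path::{Path, PathBuf}, time::{SystemTime, UNIX_EPOCH}};

    struct SseReader {
        // serialized up front, so it survives whether the failure is an
        // HTTP error or a mid-stream one
        request_json: String,
    }

    impl SseReader {
        fn save_failed_request(&self) -> io::Result<PathBuf> {
            let ts = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
            let home = env::var("HOME").unwrap_or_else(|_| ".".into());
            let dir = Path::new(&home).join(".consciousness/logs");
            fs::create_dir_all(&dir)?;
            let path = dir.join(format!("failed-request-{ts}.json"));
            fs::write(&path, &self.request_json)?;
            Ok(path)
        }
    }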
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Delete anthropic.rs (713 lines); we only use OpenAI-compatible
endpoints (vLLM, OpenRouter). Simplify ApiClient to store base_url
directly instead of a Backend enum.
SseReader now stores the serialized request payload and saves it
to ~/.consciousness/logs/failed-request-{ts}.json on stream timeout,
so failed requests can be replayed with curl for debugging.
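After the deletion the client shape is roughly this (field names
assumed):

    struct ApiClient {
        base_url: String, // e.g. "http://localhost:8000/v1"
        http: reqwest::Client,
    }

    impl ApiClient {
        fn completions_url(&self) -> String {
            format!("{}/chat/completions", self.base_url.trim_end_matches('/'))
        }
    }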
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Thread request priority through the API call chain to vLLM's
priority scheduler. Lower value = higher priority, with preemption.
Priority is set per-agent in the .agent header:
- interactive (runner): 0 (default, highest)
- surface-observe: 1 (near-realtime, watches conversation)
- all other agents: 10 (batch, default if not specified)
Requires vLLM started with --scheduling-policy priority.
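On the wire this is one extra field; a sketch, assuming the
priority is passed in the JSON body of the OpenAI-compatible
request:

    use serde_json::json;

    fn build_request_body(model: &str, priority: Option<i32>) -> serde_json::Value {
        let mut body = json!({
            "model": model,
            "messages": [],
            "stream": true,
        });
        // Lower value = higher priority; omit to fall back to the
        // batch default.
        if let Some(p) = priority {
            body["priority"] = json!(p);
        }
        body
    }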
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Split the streaming pipeline: API backends yield StreamEvents through
a channel, and the runner reads them and routes each to the
appropriate UI pane (rough sketch after the list).
- Add StreamEvent enum (Content, Reasoning, ToolCallDelta, etc.)
- API start_stream() spawns backend as a task, returns event receiver
- Runner loops over events, sending content to the conversation pane
  while suppressing <tool_call> XML, with a buffered tail to catch
  partial tags
- OpenAI backend refactored to stream_events() — no more UI coupling
- Anthropic backend gets a wrapper that synthesizes events from the
existing stream() (TODO: native event streaming)
- chat_completion_stream() kept for subconscious agents, reimplemented
on top of the event stream
- Usage derives Clone
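Rough shape of the split; variant and signature details beyond
those listed above are assumptions:

    use tokio::sync::mpsc;

    #[derive(Debug, Clone)]
    enum StreamEvent {
        Content(String),
        Reasoning(String),
        ToolCallDelta { index: usize, delta: String },
        Done,
    }

    // The backend task only knows about the channel; the runner owns
    // the receiver and decides which UI pane each event goes to.
    fn start_stream() -> mpsc::Receiver<StreamEvent> {
        let (tx, rx) = mpsc::channel(64);
        tokio::spawn(async move {
            // ... backend parses SSE and forwards events ...
            let _ = tx.send(StreamEvent::Content("partial text".into())).await;
            let _ = tx.send(StreamEvent::Done).await;
        });
        rx
    }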
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Tool call parsing was only in runner.rs, so subconscious agents
(poc-memory agent run) never recovered leaked tool calls from
models that emit <tool_call> as content text (e.g. Qwen via Crane).
Move the recovery into build_response_message where both code paths
share it. Leaked tool calls are promoted to structured tool_calls
and the content is cleaned, so all consumers see them uniformly.
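A sketch of the recovery, assuming a single well-formed <tool_call>
block (handling of partial or repeated tags omitted):

    // Returns the cleaned content plus the extracted tool-call payload,
    // which the caller promotes to a structured tool_calls entry.
    fn recover_leaked_tool_call(content: &str) -> (String, Option<String>) {
        const OPEN: &str = "<tool_call>";
        const CLOSE: &str = "</tool_call>";
        let Some(start) = content.find(OPEN) else {
            return (content.to_string(), None);
        };
        let Some(rel) = content[start + OPEN.len()..].find(CLOSE) else {
            return (content.to_string(), None);
        };
        let end = start + OPEN.len() + rel;
        let inner = content[start + OPEN.len()..end].trim().to_string();
        let cleaned = format!("{}{}", &content[..start], &content[end + CLOSE.len()..]);
        (cleaned.trim().to_string(), Some(inner))
    }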
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
No more subcrate nesting: src/, agents/, schema/, defaults/, and
build.rs all live at the workspace root. poc-daemon remains the only
workspace member. The crate name (poc-memory) and all imports are
unchanged.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>