New src/agent/api/http.rs: ~240 lines, supports GET/POST, JSON/form
bodies, SSE streaming via chunk(), TLS via rustls. No tracing dep.
Removes reqwest from the main crate and telegram channel crate.
Cargo.lock drops ~900 lines of transitive dependencies.
tracing now only pulled in by tui-markdown.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Streaming text now goes directly to agent entries via append_streaming().
sync_from_agent diffs the growing entry each tick. The streaming entry
is popped when the response completes; build_response_message pushes
the final version.
All status feedback uses RAII ActivityGuards:
- push_activity() for long-running work (thinking, streaming, scoring)
- notify() for instant feedback (compacted, DMN state changes, commands)
- Guards auto-remove on Drop, appending "(complete)" and lingering 5s
- expire_activities() cleans up timed-out notifications on render tick
UiMessage enum reduced to a single Info variant with zero sends.
The channel infrastructure remains for now (Mind/Agent still take
UiSender in signatures) — mechanical cleanup for a follow-up.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Reasoning tokens: dropped for now, will land in context entries later.
Debug sends: converted to dbglog! macro (writes to debug.log).
Activity: now a field on Agent, set directly, read by UI via try_lock.
score_memories_incremental takes agent Arc for activity writes.
UiMessage down to 2 variants: TextDelta, Info.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Status bar reads directly from Agent and MindState on each render tick.
Activity is now a field on Agent — set by agent code directly, read by
UI via try_lock. DmnAnnotation, ContextInfoUpdate, AgentUpdate were
already dead (no senders).
UiMessage down to 4 variants: TextDelta, Reasoning, Debug, Info.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The std::sync::Mutex detour caught every place a MutexGuard lived
across an await point in Agent::turn — the compiler enforced Send
safety that tokio::sync::Mutex silently allows. With those fixed,
switch back to tokio::sync::Mutex (std::sync blocks tokio worker
threads and panics inside the runtime).
Input and command dispatch now live in InteractScreen (chat.rs):
- Enter pushes directly to SharedMindState.input (no app.submitted hop)
- sync_from_agent displays pending input with dimmed color
- Slash command table moved from event_loop.rs to chat.rs
- cmd_switch_model kept as pub fn for tool-initiated switches
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The agent lock is never held across await points — turns lock briefly,
do work, drop, then do async API calls. std::sync::Mutex works and
can be locked from sync contexts (screen tick inside terminal.draw).
Fixes: blocking_lock() panic when called inside tokio runtime.
Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Move the entire stream event processing loop (content accumulation,
leaked tool call detection/dispatch, ToolCallDelta assembly, UI
forwarding, display buffering) into api::collect_stream(). The turn
loop now calls collect_stream() and processes the StreamResult.
Also move FunctionCall, ToolCall, ToolCallDelta to api/types.rs where
they belong (API wire format, not tool definitions). Move parsing.rs
to api/parsing.rs.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Move FunctionCall, FunctionCallDelta, ToolCall, ToolCallDelta from
tools/mod.rs to api/types.rs — these are API wire format, not tool
definitions. Re-export from tools for existing callers.
Move parsing.rs to api/parsing.rs — leaked tool call parsing is API
plumbing.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
ToolDef and FunctionDef are gone. Tool definitions are static strings
on the Tool struct. The API layer builds JSON from Tool::to_json().
- ChatRequest.tools is now Option<serde_json::Value>
- start_stream takes &[Tool] instead of Option<&[ToolDef]>
- openai::stream_events takes &serde_json::Value for tools
- memory_and_journal_tools() returns Vec<Tool> for subconscious agents
- Subconscious agents filter by t.name instead of t.function.name
No more runtime JSON construction for tool definitions.
No more ToolDef::new(). No more FunctionDef.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Move temperature from a per-call parameter to an Agent field,
add top_p and top_k. All three are sent to the API via a new
SamplingParams struct, displayed on the F5 thalamus screen.
Defaults: temperature=0.6, top_p=0.95, top_k=20 (Qwen3.5 defaults).
Also adds top_p and top_k to ChatRequest so they're sent in the
API payload. Previously only temperature was sent.
UI controls for adjusting these at runtime are not yet implemented.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Mechanical rename: src/agent/ -> src/user/, all crate::agent:: ->
crate::user:: references updated. Binary poc-agent renamed to
consciousness with CLI name and user-facing strings updated.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
/score snapshots the context and client, releases the agent lock,
runs scoring in background. Only one score task at a time
(scoring_in_flight flag). Results stored on Agent and shown on
the F10 context debug screen with importance scores per memory.
ApiClient derives Clone. ContextState derives Clone.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
score_memories() drops each memory from the context one at a time,
runs prompt_logprobs against the full conversation, and builds a
divergence matrix: memories × responses.
Row sums = memory importance (for graph weight updates)
Column sums = response memory-dependence (training candidates)
Uses vLLM's prompt_logprobs to check "would the model have said
this without this memory?" — one forward pass per memory, all
responses scored at once. ~3s per memory on B200.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Show chunks received, SSE lines parsed, and the contents of
the line buffer (up to 500 bytes) on both stream errors and
timeouts. This tells us whether we got partial data, a non-SSE
response, or truly nothing from the server.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Stream chunk timeout is now api_stream_timeout_secs in config
(default 60s). Status bar shows total turn time and per-call
time with timeout: "thinking... 45s, 12/60s".
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Spawned streaming tasks were never cancelled when a turn ended or
retried, leaving zombie tasks blocked on dead vLLM connections.
AbortOnDrop wrapper aborts the task when it goes out of scope.
Chunk timeout reduced from 120s to 60s.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
pub → pub(crate) for SseReader methods (used across child modules).
pub → pub(super) for openai::stream_events, tool definitions, store
helpers. pub → private for normalize_link and differentiate_hub_with_graph
(only used within their own files).
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Serialize request JSON before send_and_check so it's available
for both HTTP errors and stream errors. Extracted save logic
into save_failed_request helper on SseReader.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Delete anthropic.rs (713 lines) — we only use OpenAI-compatible
endpoints (vLLM, OpenRouter). Simplify ApiClient to store base_url
directly instead of Backend enum.
SseReader now stores the serialized request payload and saves it
to ~/.consciousness/logs/failed-request-{ts}.json on stream timeout,
so failed requests can be replayed with curl for debugging.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Thread request priority through the API call chain to vLLM's
priority scheduler. Lower value = higher priority, with preemption.
Priority is set per-agent in the .agent header:
- interactive (runner): 0 (default, highest)
- surface-observe: 1 (near-realtime, watches conversation)
- all other agents: 10 (batch, default if not specified)
Requires vLLM started with --scheduling-policy priority.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Split the streaming pipeline: API backends yield StreamEvents through
a channel, the runner reads them and routes to the appropriate UI pane.
- Add StreamEvent enum (Content, Reasoning, ToolCallDelta, etc.)
- API start_stream() spawns backend as a task, returns event receiver
- Runner loops over events, sends content to conversation pane but
suppresses <tool_call> XML with a buffered tail for partial tags
- OpenAI backend refactored to stream_events() — no more UI coupling
- Anthropic backend gets a wrapper that synthesizes events from the
existing stream() (TODO: native event streaming)
- chat_completion_stream() kept for subconscious agents, reimplemented
on top of the event stream
- Usage derives Clone
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Tool call parsing was only in runner.rs, so subconscious agents
(poc-memory agent run) never recovered leaked tool calls from
models that emit <tool_call> as content text (e.g. Qwen via Crane).
Move the recovery into build_response_message where both code paths
share it. Leaked tool calls are promoted to structured tool_calls
and the content is cleaned, so all consumers see them uniformly.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
No more subcrate nesting — src/, agents/, schema/, defaults/, build.rs
all live at the workspace root. poc-daemon remains as the only workspace
member. Crate name (poc-memory) and all imports unchanged.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-03-25 00:54:12 -04:00
Renamed from poc-memory/src/agent/api/mod.rs (Browse further)