Add /v1/completions streaming path with raw token IDs

New stream_completions() in openai.rs sends the prompt as raw token IDs
to the completions endpoint instead of JSON messages to chat/completions.
It handles <think> tags in the response (splitting them out as Reasoning
events) and stops on the <|im_end|> token.
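A minimal sketch of the two response-handling behaviors described above. The types and function names here are simplified stand-ins for illustration, not the repository's actual code:

```rust
/// Simplified stand-in for the crate's stream event type.
#[derive(Debug, PartialEq)]
enum StreamEvent {
    Reasoning(String),
    Content(String),
}

/// Split text containing <think>...</think> spans into Reasoning and
/// Content events, in order of appearance.
fn split_think_tags(text: &str) -> Vec<StreamEvent> {
    let mut events = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("<think>") {
        if start > 0 {
            events.push(StreamEvent::Content(rest[..start].to_string()));
        }
        let after = &rest[start + "<think>".len()..];
        match after.find("</think>") {
            Some(end) => {
                events.push(StreamEvent::Reasoning(after[..end].to_string()));
                rest = &after[end + "</think>".len()..];
            }
            None => {
                // Unterminated tag: treat the remainder as reasoning.
                events.push(StreamEvent::Reasoning(after.to_string()));
                rest = "";
            }
        }
    }
    if !rest.is_empty() {
        events.push(StreamEvent::Content(rest.to_string()));
    }
    events
}

/// Truncate output at the <|im_end|> stop token, if present.
fn strip_at_im_end(text: &str) -> &str {
    // split() always yields at least one item, so next() is safe here.
    text.split("<|im_end|>").next().unwrap_or(text)
}

fn main() {
    let events = split_think_tags("a<think>plan</think>answer");
    assert_eq!(
        events,
        vec![
            StreamEvent::Content("a".into()),
            StreamEvent::Reasoning("plan".into()),
            StreamEvent::Content("answer".into()),
        ]
    );
    assert_eq!(strip_at_im_end("done<|im_end|>junk"), "done");
}
```

The real implementation operates on streamed chunks rather than a complete string, so it also has to handle tags that straddle chunk boundaries.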

start_stream_completions() on ApiClient provides the same interface
as start_stream() but takes token IDs instead of Messages.

The turn loop in Agent::turn() uses completions when the tokenizer
is initialized, falling back to the chat API otherwise. This allows
gradual migration: consciousness uses completions (Qwen tokenizer),
while the Claude Code hook still uses the chat API (Anthropic).
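The dispatch described above can be sketched as follows. The Tokenizer here is a toy byte-level stand-in (not the Qwen tokenizer), and Backend is a hypothetical type invented for this illustration:

```rust
/// Toy tokenizer stand-in; the real code would use the Qwen tokenizer.
struct Tokenizer;

impl Tokenizer {
    fn encode(&self, text: &str) -> Vec<u32> {
        // Byte-level encoding, purely for illustration.
        text.bytes().map(u32::from).collect()
    }
}

/// Hypothetical representation of the two request paths.
#[derive(Debug, PartialEq)]
enum Backend {
    Completions(Vec<u32>), // raw token IDs for /v1/completions
    Chat(String),          // formatted prompt for /v1/chat/completions
}

/// Use the completions path when a tokenizer is available,
/// otherwise fall back to the chat API.
fn pick_backend(tokenizer: Option<&Tokenizer>, prompt: &str) -> Backend {
    match tokenizer {
        Some(tok) => Backend::Completions(tok.encode(prompt)),
        None => Backend::Chat(prompt.to_string()),
    }
}

fn main() {
    assert_eq!(pick_backend(None, "hi"), Backend::Chat("hi".to_string()));
    assert!(matches!(
        pick_backend(Some(&Tokenizer), "hi"),
        Backend::Completions(_)
    ));
}
```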

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Kent Overstreet 2026-04-08 11:42:22 -04:00
parent e9765799c4
commit f458af6dec
3 changed files with 188 additions and 8 deletions

@@ -133,6 +133,34 @@ impl ApiClient {
        (rx, AbortOnDrop(handle))
    }

    /// Start a streaming completion with raw token IDs.
    /// No message formatting — the caller provides the complete prompt as tokens.
    pub(crate) fn start_stream_completions(
        &self,
        prompt_tokens: &[u32],
        sampling: SamplingParams,
        priority: Option<i32>,
    ) -> (mpsc::UnboundedReceiver<StreamEvent>, AbortOnDrop) {
        let (tx, rx) = mpsc::unbounded_channel();
        let client = self.client.clone();
        let api_key = self.api_key.clone();
        let model = self.model.clone();
        let prompt_tokens = prompt_tokens.to_vec();
        let base_url = self.base_url.clone();
        let handle = tokio::spawn(async move {
            let result = openai::stream_completions(
                &client, &base_url, &api_key, &model,
                &prompt_tokens, &tx, sampling, priority,
            ).await;
            if let Err(e) = result {
                let _ = tx.send(StreamEvent::Error(e.to_string()));
            }
        });
        (rx, AbortOnDrop(handle))
    }

    pub(crate) async fn chat_completion_stream_temp(
        &self,
        messages: &[Message],