Parser consumes stream directly, yields tool calls via channel

ResponseParser::run() spawns a task that reads StreamTokens, parses
into the AST (locking context per token), and sends PendingToolCalls
through a channel. Returns (tool_rx, JoinHandle<Result>) — the turn
loop dispatches tool calls and awaits the handle for error checking.
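
A minimal sketch of this shape, not the code from this commit. It uses std threads and std::sync::mpsc in place of the async task and tokio channel so that it stands alone. The StreamToken variants, the PendingToolCall fields, the free function run(), and the turn() helper are all hypothetical stand-ins for illustration.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for the stream events the parser consumes.
#[derive(Debug)]
enum StreamToken {
    Text(String),
    ToolCall { name: String },
    Error(String),
}

// Hypothetical stand-in for a parsed tool call awaiting dispatch.
#[derive(Debug)]
struct PendingToolCall {
    name: String,
}

// Spawn the parser. The caller gets the tool-call receiver plus a handle
// whose Result reports whether parsing finished cleanly.
fn run(tokens: Vec<StreamToken>) -> (Receiver<PendingToolCall>, JoinHandle<Result<String, String>>) {
    let (tool_tx, tool_rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut text = String::new();
        for token in tokens {
            match token {
                StreamToken::Text(t) => text.push_str(&t),
                StreamToken::ToolCall { name } => {
                    // A send error means the receiver was dropped, so stop parsing.
                    if tool_tx.send(PendingToolCall { name }).is_err() {
                        break;
                    }
                }
                StreamToken::Error(e) => return Err(e),
            }
        }
        Ok(text)
    });
    (tool_rx, handle)
}

// The turn loop only sees tool calls; stream errors surface via the handle.
fn turn(tokens: Vec<StreamToken>) -> Result<Vec<String>, String> {
    let (tool_rx, handle) = run(tokens);
    let mut dispatched = Vec::new();
    // Iteration ends once the parser drops its sender.
    for call in tool_rx {
        dispatched.push(call.name);
    }
    // First `?` handles a panicked parser, second `?` handles a parse error.
    handle.join().map_err(|_| "parser panicked".to_string())??;
    Ok(dispatched)
}
```

Because the sender is dropped when the parser exits, the receive loop terminates on its own, and the handle is checked only after every tool call has been drained.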

Token IDs from vLLM are accumulated alongside text and stored directly
on AST leaves — no local re-encoding on the response path.
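
As a rough illustration of keeping token IDs next to the text they came from, assuming a hypothetical Leaf type and (text, id) chunks; the real AST node layout is not shown in this commit message:

```rust
// Hypothetical AST leaf holding decoded text and the IDs that produced it.
#[derive(Debug, Default, PartialEq)]
struct Leaf {
    text: String,
    token_ids: Vec<u32>,
}

impl Leaf {
    // Append one streamed chunk: its text and the token ID reported with it.
    fn push(&mut self, text: &str, token_id: u32) {
        self.text.push_str(text);
        self.token_ids.push(token_id);
    }
}

// Build a leaf from streamed chunks without running a tokenizer locally.
fn collect_leaf(chunks: &[(&str, u32)]) -> Leaf {
    let mut leaf = Leaf::default();
    for &(text, id) in chunks {
        leaf.push(text, id);
    }
    leaf
}
```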

The turn loop no longer matches on individual stream events. It just
reads tool calls and dispatches them.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Kent Overstreet 2026-04-08 16:32:00 -04:00
parent 0b9813431a
commit 2c401e24d6
3 changed files with 119 additions and 85 deletions

@@ -8,14 +8,13 @@
pub mod http;
use anyhow::Result;
use std::time::{Duration, Instant};
use self::http::{HttpClient, HttpResponse};
use anyhow::Result;
use tokio::sync::mpsc;
use serde::Deserialize;
use http::{HttpClient, HttpResponse};
#[derive(Debug, Clone, Deserialize)]
pub struct Usage {
pub prompt_tokens: u32,