Split the prompt assembly into two forms: the AST keeps the fully-expanded representation (N image_pads per image, for accurate context-budget accounting), while the request wire form collapses each image to a single <|image_pad|> bookended by vision_start/end and ships the raw bytes out-of-band as a base64 data URI in a new `multi_modal_data.image` field on /v1/completions.

vLLM's Qwen3VL processor uses PromptReplacement with a single <|image_pad|> as the target and N image_pads as the replacement, so the wire form matches what the processor expects and is re-expanded to N pads server-side.

The server side still needs /v1/completions to accept multi_modal_data before images land end-to-end; that is the next piece.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
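To make the split concrete, here is a minimal Rust sketch of the two render paths. All names (`Segment`, `expanded_form`, `wire_form`, `pad_count`) are hypothetical illustrations, not the actual types in context.rs, and the `base64` crate is assumed for the data-URI encoding.

```rust
// Hypothetical sketch of the two prompt forms; the real structures differ.
use base64::{engine::general_purpose::STANDARD as B64, Engine as _};

/// One segment of the assembled prompt.
enum Segment {
    Text(String),
    /// An image: raw bytes, MIME type, and the number of <|image_pad|>
    /// tokens it expands to (N, as reported by the vision processor).
    Image { bytes: Vec<u8>, mime: &'static str, pad_count: usize },
}

/// AST form: fully expanded, N pads per image, so counting tokens
/// against the context budget is accurate.
fn expanded_form(segments: &[Segment]) -> String {
    let mut out = String::new();
    for seg in segments {
        match seg {
            Segment::Text(t) => out.push_str(t),
            Segment::Image { pad_count, .. } => {
                out.push_str("<|vision_start|>");
                for _ in 0..*pad_count {
                    out.push_str("<|image_pad|>");
                }
                out.push_str("<|vision_end|>");
            }
        }
    }
    out
}

/// Wire form: each image collapses to a single <|image_pad|>, and the
/// bytes travel out-of-band as base64 data URIs; the server-side
/// processor re-expands the single pad back to N.
fn wire_form(segments: &[Segment]) -> (String, Vec<String>) {
    let mut prompt = String::new();
    let mut images = Vec::new();
    for seg in segments {
        match seg {
            Segment::Text(t) => prompt.push_str(t),
            Segment::Image { bytes, mime, .. } => {
                prompt.push_str("<|vision_start|><|image_pad|><|vision_end|>");
                images.push(format!("data:{};base64,{}", mime, B64.encode(bytes)));
            }
        }
    }
    (prompt, images)
}

fn main() {
    let segments = vec![
        Segment::Text("Describe this image: ".into()),
        // Placeholder bytes and pad count, purely for illustration.
        Segment::Image { bytes: vec![0x89, 0x50, 0x4e, 0x47], mime: "image/png", pad_count: 729 },
    ];
    // Budget against the expanded form, send the wire form.
    println!("expanded len: {}", expanded_form(&segments).len());
    let (prompt, images) = wire_form(&segments);
    println!("wire prompt: {prompt}");
    println!("images: {images:?}");
}
```

The returned `images` vector is what would populate the request's `multi_modal_data.image` field, one data URI per collapsed `<|image_pad|>`, in order.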
Files:

- api
- tools
- context.rs
- mod.rs
- oneshot.rs
- tokenizer.rs