consciousness/src/agent
Kent Overstreet 204ba5570a agent: send images as multi_modal_data on completion requests
Split the prompt assembly into two forms: the AST keeps the
fully-expanded representation (N image_pads per image, for accurate
context budget accounting), while the request wire form collapses
each image to a single <|image_pad|> bookended by vision_start/end
and ships the raw bytes out-of-band as a base64 data URI in a new
`multi_modal_data.image` field on /v1/completions.
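The wire-form collapse and out-of-band byte shipping described above can be sketched as follows. This is a minimal illustration, not the repo's actual code: `wire_form_prompt`, `completion_body`, and the hand-rolled `encode_base64` are hypothetical stand-ins, and the JSON is assembled with plain string formatting for self-containment.

```rust
// Minimal stand-in for a real base64 encoder (e.g. the `base64` crate).
fn encode_base64(data: &[u8]) -> String {
    const TABLE: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in data.chunks(3) {
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = ((b[0] as u32) << 16) | ((b[1] as u32) << 8) | b[2] as u32;
        out.push(TABLE[(n >> 18) as usize & 63] as char);
        out.push(TABLE[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { TABLE[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Wire form: one <|image_pad|> bookended by vision_start/end, no matter
/// how many pads the AST expanded the image to for budget accounting.
fn wire_form_prompt(text_before: &str, text_after: &str) -> String {
    format!("{text_before}<|vision_start|><|image_pad|><|vision_end|>{text_after}")
}

/// Hypothetical request-body assembly: the raw image bytes travel
/// out-of-band as a base64 data URI in `multi_modal_data.image`.
fn completion_body(prompt: &str, png_bytes: &[u8]) -> String {
    let uri = format!("data:image/png;base64,{}", encode_base64(png_bytes));
    format!(
        "{{\"prompt\":{:?},\"multi_modal_data\":{{\"image\":[{:?}]}}}}",
        prompt, uri
    )
}

fn main() {
    let prompt = wire_form_prompt("Describe this image: ", "");
    let body = completion_body(&prompt, b"\x89PNG");
    println!("{body}");
}
```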

vLLM's Qwen3VL processor uses PromptReplacement with target=single
<|image_pad|> and replacement=N image_pads, so the wire form matches
what the processor expects and is re-expanded to N pads server-side.
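Conceptually, that server-side re-expansion behaves like the sketch below. This is an illustration of the PromptReplacement idea, not vLLM's actual implementation; `expand_pads` and the choice of N are assumptions (in vLLM, N comes from the image's processed grid size).

```rust
/// Conceptual sketch: map the single wire-form <|image_pad|> back to
/// the N pads the image actually occupies in the expanded prompt.
fn expand_pads(wire_prompt: &str, n_pads: usize) -> String {
    wire_prompt.replace("<|image_pad|>", &"<|image_pad|>".repeat(n_pads))
}

fn main() {
    let wire = "<|vision_start|><|image_pad|><|vision_end|>";
    // With this scheme, the expanded prompt's pad count matches the
    // fully-expanded AST form the client used for budget accounting.
    println!("{}", expand_pads(wire, 4));
}
```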

The server side still needs /v1/completions to accept multi_modal_data
before images land end-to-end; that's the next piece.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 18:08:26 -04:00
api agent: send images as multi_modal_data on completion requests 2026-04-16 18:08:26 -04:00
tools agent: rewrite view_image to emit Image leaves 2026-04-16 18:06:25 -04:00
context.rs agent: send images as multi_modal_data on completion requests 2026-04-16 18:08:26 -04:00
mod.rs agent: send images as multi_modal_data on completion requests 2026-04-16 18:08:26 -04:00
oneshot.rs config: move user_name/assistant_name to AppConfig (top level) 2026-04-16 16:20:17 -04:00
tokenizer.rs agent: add NodeBody::Image for Qwen3-VL vision input 2026-04-16 18:00:10 -04:00