forked from kent/consciousness
Split the prompt assembly into two forms. The AST keeps the fully expanded representation (N image_pads per image, for accurate context budget accounting), while the request wire form collapses each image to a single <|image_pad|> bookended by vision_start/end and ships the raw bytes out-of-band as a base64 data URI in a new `multi_modal_data.image` field on /v1/completions.

vLLM's Qwen3VL processor uses PromptReplacement with target=single <|image_pad|> and replacement=N image_pads, so the wire form matches what the processor expects, and the processor re-expands it to N server-side.

The server side still needs /v1/completions to accept multi_modal_data for images to land end-to-end; that's the next piece.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

| Name | | |
|---|---|---|
| agent | ||
| bin | ||
| cli | ||
| hippocampus | ||
| learn | ||
| mind | ||
| subconscious | ||
| thalamus | ||
| user | ||
| config.rs | ||
| config_writer.rs | ||
| lib.rs | ||
| locks.rs | ||
| main.rs | ||
| mcp_server.rs | ||
| session.rs | ||
| util.rs | ||
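
The two prompt forms described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `Node` type and the functions `expanded_form`, `wire_form`, `image_data_uris`, and `base64_encode` are hypothetical names, the `image/png` MIME type is an assumption, and it is also an assumption that the expanded form wraps its pads in the same `<|vision_start|>`/`<|vision_end|>` markers as the wire form. A hand-written base64 encoder is used only to keep the sketch free of external crates.

```rust
// Sketch only: all types and function names below are hypothetical.

const VISION_START: &str = "<|vision_start|>";
const VISION_END: &str = "<|vision_end|>";
const IMAGE_PAD: &str = "<|image_pad|>";

/// Hypothetical prompt node: plain text, or an image that occupies
/// `pad_count` image_pad tokens once expanded.
enum Node {
    Text(String),
    Image { pad_count: usize, bytes: Vec<u8> },
}

/// Render the prompt, emitting either one pad per image (collapsed)
/// or `pad_count` pads per image (expanded).
fn render(nodes: &[Node], collapse: bool) -> String {
    let mut out = String::new();
    for node in nodes {
        match node {
            Node::Text(t) => out.push_str(t),
            Node::Image { pad_count, .. } => {
                let pads = if collapse { 1 } else { *pad_count };
                out.push_str(VISION_START);
                out.push_str(&IMAGE_PAD.repeat(pads));
                out.push_str(VISION_END);
            }
        }
    }
    out
}

/// Fully expanded form, suitable for context budget accounting.
fn expanded_form(nodes: &[Node]) -> String {
    render(nodes, false)
}

/// Wire form: a single image_pad per image, re-expanded server-side.
fn wire_form(nodes: &[Node]) -> String {
    render(nodes, true)
}

/// Standard base64 (RFC 4648 alphabet, with `=` padding).
fn base64_encode(input: &[u8]) -> String {
    const TABLE: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = ((chunk[0] as u32) << 16) | ((b1 as u32) << 8) | (b2 as u32);
        out.push(TABLE[((n >> 18) & 63) as usize] as char);
        out.push(TABLE[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { TABLE[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Data URIs for every image, in prompt order; these are the values that
/// would travel out-of-band next to the wire-form prompt.
fn image_data_uris(nodes: &[Node]) -> Vec<String> {
    nodes
        .iter()
        .filter_map(|node| match node {
            Node::Image { bytes, .. } => {
                // MIME type is assumed here; a real caller would detect it.
                Some(format!("data:image/png;base64,{}", base64_encode(bytes)))
            }
            Node::Text(_) => None,
        })
        .collect()
}
```

In this sketch, a caller would count tokens against `expanded_form` but send `wire_form` as the prompt, placing the output of `image_data_uris` in the `multi_modal_data.image` field of the request. The exact JSON shape of that field is not shown because the commit message does not specify it.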