agent: rewrite view_image to emit Image leaves

view_image now reads the file, grabs dimensions via imagesize (no full
decode), and pushes a user-role branch containing a NodeBody::Image
leaf straight into the conversation. The tool_result is just a short
acknowledgment — the actual pixels ride in the Image leaf for the API
layer to extract into multi_modal_data.
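
A minimal sketch of that flow, using only the standard library. Everything here is illustrative rather than the project's code: `Role`, `Branch`, the field layout of `NodeBody::Image`, the acknowledgment text, and the byte-slice signature of `view_image` are all assumptions. The hand-rolled PNG header read stands in for the imagesize call, and covers only PNG, to show how dimensions can be obtained without decoding pixels.

```rust
// Hypothetical stand-ins for the conversation types; the real definitions
// are not shown in this commit excerpt.
#[derive(Debug, PartialEq)]
enum Role {
    User,
}

#[derive(Debug, PartialEq)]
enum NodeBody {
    // Assumed shape: raw file bytes plus the dimensions read from the header.
    Image { bytes: Vec<u8>, width: u32, height: u32 },
}

#[derive(Debug, PartialEq)]
struct Branch {
    role: Role,
    leaves: Vec<NodeBody>,
}

// Read width and height from a PNG's IHDR chunk. Only the first 24 bytes
// are inspected, so no pixel data is decoded. This is a stand-in for the
// imagesize crate and handles PNG only.
fn png_dimensions(data: &[u8]) -> Option<(u32, u32)> {
    const SIG: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    if data.len() < 24 || !data.starts_with(&SIG) || &data[12..16] != b"IHDR" {
        return None;
    }
    let width = u32::from_be_bytes([data[16], data[17], data[18], data[19]]);
    let height = u32::from_be_bytes([data[20], data[21], data[22], data[23]]);
    Some((width, height))
}

// Build the user-role branch carrying the Image leaf, and return it together
// with the short acknowledgment that would go in the tool_result.
fn view_image(data: Vec<u8>) -> Result<(Branch, String), String> {
    let (width, height) = png_dimensions(&data).ok_or("unrecognized image header")?;
    let ack = format!("image attached ({}x{})", width, height);
    let branch = Branch {
        role: Role::User,
        leaves: vec![NodeBody::Image { bytes: data, width, height }],
    };
    Ok((branch, ack))
}
```

The point of the split is that the acknowledgment string stays small while the bytes travel in the leaf, so a later layer can lift them out without parsing the tool_result.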

Drops the capture_tmux_pane path, which had no business living under
"vision" (tmux text capture belongs in bash or a dedicated tool, and
this one just returned rendered text anyway).

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Kent Overstreet 2026-04-16 18:06:25 -04:00
parent 0bf71b9110
commit 91106deaa1
4 changed files with 48 additions and 72 deletions

Cargo.lock generated

@@ -492,6 +492,7 @@ dependencies = [
"http-body-util",
"hyper",
"hyper-util",
"imagesize",
"json-five",
"libc",
"log",
@@ -1423,6 +1424,12 @@ dependencies = [
"winapi-util",
]
[[package]]
name = "imagesize"
version = "0.14.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "09e54e57b4c48b40f7aec75635392b12b3421fa26fe8b4332e63138ed278459c"
[[package]]
name = "indexmap"
version = "2.14.0"