Commit graph

7 commits

Author SHA1 Message Date
Kent Overstreet
e8c3ed3d96 switch memory scoring to /v1/score endpoint
Replace prompt_logprobs-based scoring with the new vLLM /v1/score
endpoint. Much simpler: one API call per memory drop, returns
per-message total_logprob directly. No chunking needed, no OOM risk
— the endpoint only computes logits for scored tokens.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-03 00:40:27 -04:00
Kent Overstreet
4f19c02e50 reuse HTTP client across scoring calls for connection pooling
Single reqwest::Client shared across all prompt_logprobs calls
instead of creating a new one per call. Keeps HTTP connections
alive for faster sequential requests.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-02 23:11:40 -04:00
Kent Overstreet
78abf90461 fix scoring: HTTP error checking, context refresh, chunk logging
Check HTTP status from logprobs API (was silently ignoring 500s).
Call publish_context_state() after storing scores so F10 screen
updates. Add chunk size logging for OOM debugging.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-02 22:47:44 -04:00
Kent Overstreet
29b3aeca57 chunk scoring calls to avoid OOM on large contexts
Split conversation into ~50K token chunks (configurable via
scoring_chunk_tokens in config) for prompt_logprobs calls.
Each chunk ends at an assistant message boundary. Avoids the
~40GB logprobs tensor allocation that OOM'd on full contexts.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-02 22:35:29 -04:00
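
The chunking rule in 29b3aeca57 (accumulate tokens up to a budget, then cut at the next assistant message) can be sketched as below. This is a minimal illustration, not the code from the commit: `Role`, `Message`, and `chunk_ranges` are hypothetical names, and token counts are assumed to be precomputed per message.

```rust
use std::ops::Range;

/// Hypothetical message roles; the real conversation type is not shown in the log.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
}

/// Hypothetical message record carrying a precomputed token count.
pub struct Message {
    pub role: Role,
    pub tokens: usize,
}

/// Split `messages` into index ranges of roughly `budget` tokens each.
/// A chunk is closed only on an assistant message, so it can overshoot the
/// budget; the final chunk holds whatever is left and may end on any role.
pub fn chunk_ranges(messages: &[Message], budget: usize) -> Vec<Range<usize>> {
    let mut chunks = Vec::new();
    let mut start = 0;
    let mut used = 0;
    for (i, m) in messages.iter().enumerate() {
        used += m.tokens;
        if used >= budget && m.role == Role::Assistant {
            chunks.push(start..i + 1);
            start = i + 1;
            used = 0;
        }
    }
    if start < messages.len() {
        chunks.push(start..messages.len());
    }
    chunks
}
```

With a budget taken from something like `scoring_chunk_tokens`, each returned range would be scored in its own request. How the real code treats a trailing chunk that does not end on an assistant message is not stated in the log.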
Kent Overstreet
19205b9bae show scoring progress and per-response memory attribution
Status bar shows "scoring 3/7..." during scoring. Debug pane logs
per-memory importance and top-5 response breakdowns. F10 context
screen shows which memories were important for each assistant
response as drilldown children (← memory_key (score)).

Added important_memories_for_entry() to look up the matrix by
conversation entry index.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-02 22:27:43 -04:00
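
A lookup like the one 19205b9bae describes (find the memories that mattered for one assistant response, given its conversation entry index) might look roughly like this. The function name, the `response_entries` mapping from matrix column to entry index, and the "score above zero" filter are all assumptions made for illustration; the log does not show the actual signature of important_memories_for_entry().

```rust
/// Hypothetical sketch: `matrix[i][j]` is the score of memory `i` for
/// response `j`, `memory_keys[i]` names memory `i`, and `response_entries[j]`
/// is the conversation entry index of response `j`.
/// Returns (key, score) pairs for one entry, highest score first.
pub fn important_for_entry<'a>(
    matrix: &[Vec<f64>],
    memory_keys: &'a [String],
    response_entries: &[usize],
    entry_index: usize,
) -> Vec<(&'a str, f64)> {
    // Map the conversation entry back to its matrix column, if it has one.
    let Some(col) = response_entries.iter().position(|&e| e == entry_index) else {
        return Vec::new();
    };
    let mut hits: Vec<(&str, f64)> = matrix
        .iter()
        .zip(memory_keys)
        .filter_map(|(row, key)| {
            let score = *row.get(col)?;
            // Assumption: only positive scores count as "important".
            (score > 0.0).then(|| (key.as_str(), score))
        })
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits
}
```

A drilldown view could render each returned pair as one child line under the response, in the "← memory_key (score)" form mentioned in the commit message.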
Kent Overstreet
c01d4a5b08 wire up /score command and debug screen for memory importance
/score snapshots the context and client, releases the agent lock,
runs scoring in background. Only one score task at a time
(scoring_in_flight flag). Results stored on Agent and shown on
the F10 context debug screen with importance scores per memory.

ApiClient derives Clone. ContextState derives Clone.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-02 22:21:31 -04:00
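
The "only one score task at a time" rule in c01d4a5b08 can be illustrated with an atomic flag and an RAII guard. This is a sketch under assumptions: the log only names a scoring_in_flight flag, so the `AtomicBool` representation and the `ScoreGuard` type are hypothetical, and the real flag may simply be a field protected by the agent lock.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical guard: holding one means a scoring task is in flight.
pub struct ScoreGuard {
    flag: Arc<AtomicBool>,
}

impl ScoreGuard {
    /// Try to claim the flag. Returns `None` if another task already holds it.
    pub fn try_acquire(flag: &Arc<AtomicBool>) -> Option<ScoreGuard> {
        flag.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| ScoreGuard { flag: Arc::clone(flag) })
    }
}

impl Drop for ScoreGuard {
    /// Clear the flag when the task finishes, including on early return or panic.
    fn drop(&mut self) {
        self.flag.store(false, Ordering::Release);
    }
}
```

A /score handler following this pattern would call `try_acquire`, bail out if it gets `None`, and otherwise move the guard into the background task so the flag clears when scoring ends.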
Kent Overstreet
df9b610c7f add memory importance scoring via prompt logprobs
score_memories() drops each memory from the context one at a time,
runs prompt_logprobs against the full conversation, and builds a
divergence matrix: memories × responses.

Row sums = memory importance (for graph weight updates)
Column sums = response memory-dependence (training candidates)

Uses vLLM's prompt_logprobs to check "would the model have said
this without this memory?" — one forward pass per memory, all
responses scored at once. ~3s per memory on B200.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-02 22:13:55 -04:00
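
The memories × responses matrix from df9b610c7f, with its row and column sums, can be sketched as follows. The commit message does not say how divergence is computed; this example assumes it is the drop in a response's total logprob when one memory is removed (baseline minus ablated). All function names here are hypothetical.

```rust
/// Assumed inputs: `baseline[j]` is the total logprob of response `j` with
/// every memory present; `ablated[i][j]` is the same with memory `i` dropped.
/// Returns one row per memory and one column per response.
pub fn divergence_matrix(baseline: &[f64], ablated: &[Vec<f64>]) -> Vec<Vec<f64>> {
    ablated
        .iter()
        .map(|row| {
            baseline
                .iter()
                .zip(row)
                .map(|(full, dropped)| full - dropped)
                .collect()
        })
        .collect()
}

/// Row sums: one importance value per memory.
pub fn row_sums(matrix: &[Vec<f64>]) -> Vec<f64> {
    matrix.iter().map(|row| row.iter().sum::<f64>()).collect()
}

/// Column sums: one memory-dependence value per response.
pub fn column_sums(matrix: &[Vec<f64>]) -> Vec<f64> {
    let cols = matrix.first().map_or(0, |row| row.len());
    let mut sums = vec![0.0; cols];
    for row in matrix {
        for (sum, value) in sums.iter_mut().zip(row) {
            *sum += value;
        }
    }
    sums
}
```

Under this assumption, a large positive cell means the response became much less likely once that memory was removed, which matches the "would the model have said this without this memory?" question in the commit message.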