The /score endpoint was receiving chat-format messages, which had to go through the chat template tokenizer. This failed with "System message must be first" errors because the AST structure doesn't map cleanly to the chat message format.

Send raw token IDs via the new `prompt` field instead, matching what the /completions endpoint already does. The vLLM score endpoint finds assistant boundaries by scanning for `<|im_start|>assistant` token patterns, so no message-level metadata is needed.

Also includes the identity and journal sections in the scored context, matching what the model actually sees during inference.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
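A minimal sketch of the request-shape change, assuming `serde_json`; the token IDs are placeholders, and nothing beyond the `prompt` field itself is taken from the actual API:

```rust
use serde_json::json;

fn main() {
    // Raw token IDs for the full scored context (identity and journal
    // sections included), produced by the same tokenizer used at
    // inference time. These particular IDs are placeholders.
    let token_ids: Vec<u32> = vec![151644, 8948, 198, 151645];

    // Old shape (rejected): chat-format `messages` routed through the
    // chat template tokenizer, which enforces "System message must be
    // first" and chokes on the AST structure.
    //
    // New shape: pre-tokenized IDs in `prompt`, matching what
    // /completions already accepts. The score endpoint locates
    // assistant spans by scanning the IDs for the <|im_start|>assistant
    // token pattern, so no per-message metadata travels with the request.
    let body = json!({
        "prompt": token_ids,
    });

    println!("{}", serde_json::to_string_pretty(&body).unwrap());
}
```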