Replace token counting with token generation via HuggingFace tokenizer

Add agent/tokenizer.rs with a global Qwen 3.5 tokenizer that generates
actual token IDs, including chat template wrapping. ContextEntry now
stores token_ids: Vec<u32> instead of tokens: usize; the count is
derived from the vector's length.

ContextEntry::new() tokenizes automatically via the global tokenizer.
ContextSection::push_entry() takes a raw ConversationEntry and
tokenizes it. set_message() re-tokenizes without needing an external
tokenizer parameter.
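
As a rough illustration of the pattern described above (not the code in
this commit), the sketch below keeps one process-wide tokenizer in a
std::sync::OnceLock and has the entry tokenize itself on construction and
on set_message(). ByteTokenizer, global_tokenizer(), and the reduced
ContextEntry fields are hypothetical stand-ins; the real code wraps the
HuggingFace tokenizer for Qwen 3.5 and stores a ConversationEntry rather
than a String.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the real tokenizer: one ID per UTF-8 byte.
pub struct ByteTokenizer;

impl ByteTokenizer {
    pub fn encode(&self, text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }
}

/// Process-wide tokenizer, initialized on first use.
static TOKENIZER: OnceLock<ByteTokenizer> = OnceLock::new();

pub fn global_tokenizer() -> &'static ByteTokenizer {
    TOKENIZER.get_or_init(|| ByteTokenizer)
}

/// Reduced, assumed shape of ContextEntry: a message plus its token IDs.
pub struct ContextEntry {
    message: String,
    token_ids: Vec<u32>,
}

impl ContextEntry {
    /// Tokenizes via the global tokenizer; no tokenizer parameter needed.
    pub fn new(message: impl Into<String>) -> Self {
        let message = message.into();
        let token_ids = global_tokenizer().encode(&message);
        Self { message, token_ids }
    }

    /// Replaces the message and re-tokenizes it in one step.
    pub fn set_message(&mut self, message: impl Into<String>) {
        self.message = message.into();
        self.token_ids = global_tokenizer().encode(&self.message);
    }

    pub fn message(&self) -> &str {
        &self.message
    }

    /// The token count is derived from the stored IDs, never cached.
    pub fn tokens(&self) -> usize {
        self.token_ids.len()
    }
}
```

Because the count is always token_ids.len(), it cannot drift out of sync
with the message after set_message().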

Token IDs include the full chat template,
<|im_start|>role\ncontent<|im_end|>\n, so concatenating token_ids
across entries produces a ready-to-send prompt for vLLM's
/v1/completions endpoint.
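
The framing and concatenation can be sketched roughly as follows. The
function names wrap_chat_template(), encode(), and build_prompt() are
hypothetical, and encode() is a byte-level stand-in: a real Qwen tokenizer
would map <|im_start|> and <|im_end|> to single special-token IDs rather
than to one ID per byte.

```rust
/// Wraps one message in the <|im_start|>role\ncontent<|im_end|>\n framing.
pub fn wrap_chat_template(role: &str, content: &str) -> String {
    format!("<|im_start|>{role}\n{content}<|im_end|>\n")
}

/// Hypothetical stand-in tokenizer: one ID per UTF-8 byte.
pub fn encode(text: &str) -> Vec<u32> {
    text.bytes().map(u32::from).collect()
}

/// Concatenates per-entry token IDs, in order, into one prompt.
pub fn build_prompt(entries: &[Vec<u32>]) -> Vec<u32> {
    entries.iter().flatten().copied().collect()
}
```

Since each entry already carries its own framing, building the prompt is
plain concatenation and no text has to be re-tokenized.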

The old tiktoken CoreBPE is now unused on Agent (it will be removed in
a follow-up). Token counts are now exact for Qwen 3.5 instead of the
~85-90% approximation from cl100k_base.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

Kent Overstreet

2026-04-08 11:20:03 -04:00

parent 70ee7abea5

commit 5e4067c04f

10 changed files with 540 additions and 97 deletions

									
Cargo.toml

@@ -60,6 +60,7 @@ capnp = "0.25"
 capnp-rpc = "0.25"
 tiktoken-rs = "0.9.1"
+tokenizers = "0.21"
 skillratings = "0.28"
 http = "1"
