consciousness

spqrz/consciousness

forked from kent/consciousness

Commit graph

Author

SHA1

Message

Date

Kent Overstreet

bf3e2a9b73

WIP: Rename context_new → context, delete old files, fix UI layer

Renamed context_new.rs to context.rs, deleted context_old.rs,
types.rs, openai.rs, parsing.rs. Updated all imports. Rewrote
user/context.rs and user/widgets.rs for new types. Stubbed
working_stack tool. Killed tokenize_conv_entry.

Remaining: mind/mod.rs, mind/dmn.rs, learn.rs, chat.rs,
subconscious.rs, oneshot.rs.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

2026-04-08 15:20:26 -04:00

Kent Overstreet

f1397b7783

Redesign context AST: typed NodeBody, Role as grammar roles, tests

Role is now just System/User/Assistant — maps 1:1 to the grammar.
Leaf types are NodeBody variants: Content, Thinking, ToolCall,
ToolResult, Memory, Dmn, Log. Each variant renders itself; no Role
needed on leaves. AstNode is Leaf(NodeLeaf) | Branch{role, children}.
ContextState holds four Vec<AstNode> sections directly.
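
As an illustration of the shape described above, here is a minimal sketch, not the repository's actual code. The variant and type names come from the commit message. The payload types (a plain `String` per variant), the `render` output for each leaf, the `<think>` wrapper, and the unnamed four-slot `sections` array are assumptions made for the example. The `<|im_start|>role\n...<|im_end|>\n` wrapping follows the chat template quoted in commit 5e4067c04f.

```rust
// Hypothetical sketch of the context AST; payloads and render formats are assumed.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    System,
    User,
    Assistant,
}

impl Role {
    /// Role name as it would appear in the chat template.
    pub fn as_str(self) -> &'static str {
        match self {
            Role::System => "system",
            Role::User => "user",
            Role::Assistant => "assistant",
        }
    }
}

/// Leaf payloads. Each variant renders itself, so leaves carry no Role.
#[derive(Debug, Clone, PartialEq)]
pub enum NodeBody {
    Content(String),
    Thinking(String),
    ToolCall(String),
    ToolResult(String),
    Memory(String),
    Dmn(String),
    Log(String),
}

impl NodeBody {
    pub fn render(&self) -> String {
        match self {
            // Assumed format: thinking is wrapped in <think> tags.
            NodeBody::Thinking(t) => format!("<think>\n{}\n</think>\n", t),
            // Assumed format: every other variant renders its text verbatim.
            NodeBody::Content(t)
            | NodeBody::ToolCall(t)
            | NodeBody::ToolResult(t)
            | NodeBody::Memory(t)
            | NodeBody::Dmn(t)
            | NodeBody::Log(t) => t.clone(),
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct NodeLeaf {
    pub body: NodeBody,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AstNode {
    Leaf(NodeLeaf),
    Branch { role: Role, children: Vec<AstNode> },
}

impl AstNode {
    /// Leaves render their body; branches wrap their children in the chat template.
    pub fn render(&self) -> String {
        match self {
            AstNode::Leaf(leaf) => leaf.body.render(),
            AstNode::Branch { role, children } => {
                let inner: String = children.iter().map(|c| c.render()).collect();
                format!("<|im_start|>{}\n{}<|im_end|>\n", role.as_str(), inner)
            }
        }
    }
}

/// Four sections held directly; the section names are not given in the
/// commit message, so an unnamed array is used here.
pub struct ContextState {
    pub sections: [Vec<AstNode>; 4],
}

impl ContextState {
    pub fn render(&self) -> String {
        self.sections.iter().flatten().map(|n| n.render()).collect()
    }
}
```

Keeping the role only on `Branch` means a leaf can never disagree with the message it sits in, which is one plausible reading of "no Role needed on leaves".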

Moved tool call XML parsing from api/parsing.rs into context_new.rs
so all grammar knowledge lives in one place.

Tokenizer encode() now returns empty vec when uninitialized instead
of panicking, so tests work without the tokenizer file.
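
One way the "empty vec when uninitialized" behaviour could look is sketched below. This is an assumption-laden illustration: `ByteTokenizer` is a hypothetical stand-in that maps each byte to one ID (the real code loads a HuggingFace tokenizer file), and the `init`/`encode` names and the use of `std::sync::OnceLock` are choices made for the example, not taken from the repository.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the real tokenizer: one token ID per byte.
pub struct ByteTokenizer;

impl ByteTokenizer {
    fn encode(&self, text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }
}

/// Global tokenizer slot; stays empty until `init` is called.
static TOKENIZER: OnceLock<ByteTokenizer> = OnceLock::new();

/// Installs the global tokenizer. Returns false if it was already installed.
pub fn init() -> bool {
    TOKENIZER.set(ByteTokenizer).is_ok()
}

/// Encodes `text`, or returns an empty vec when no tokenizer has been
/// installed, so callers (and tests) do not panic without a tokenizer file.
pub fn encode(text: &str) -> Vec<u32> {
    match TOKENIZER.get() {
        Some(tok) => tok.encode(text),
        None => Vec::new(),
    }
}
```

The trade-off is that an uninitialized tokenizer silently yields zero tokens, so token counts read as 0 until `init` runs.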

26 tests: XML parsing, incremental streaming (char-by-char feeds
found and fixed a lookahead bug), rendering for all node types,
tokenizer round-trip verification.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

2026-04-08 13:35:04 -04:00

Kent Overstreet

5e4067c04f

Replace token counting with token generation via HuggingFace tokenizer

Add agent/tokenizer.rs with global Qwen 3.5 tokenizer that generates
actual token IDs including chat template wrapping. ContextEntry now
stores token_ids: Vec<u32> instead of tokens: usize — the count is
derived from the length.

ContextEntry::new() tokenizes automatically via the global tokenizer.
ContextSection::push_entry() takes a raw ConversationEntry and
tokenizes it. set_message() re-tokenizes without needing an external
tokenizer parameter.

Token IDs include the full chat template: <|im_start|>role\ncontent
<|im_end|>\n — so concatenating token_ids across entries produces a
ready-to-send prompt for vLLM's /v1/completions endpoint.
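
The idea of storing template-wrapped token IDs per entry and concatenating them into a prompt can be sketched roughly as follows. The struct and field names `ContextEntry` and `token_ids` come from the commit message. `toy_encode` (one ID per byte), the `role` and `content` string parameters, and `build_prompt` are hypothetical and stand in for the real Qwen tokenizer and the real entry types.

```rust
/// Hypothetical stand-in for the real tokenizer: one token ID per byte.
fn toy_encode(text: &str) -> Vec<u32> {
    text.bytes().map(u32::from).collect()
}

pub struct ContextEntry {
    /// Token IDs for the whole entry, chat template included.
    pub token_ids: Vec<u32>,
}

impl ContextEntry {
    /// Wraps the content in the chat template and tokenizes the result.
    pub fn new(role: &str, content: &str) -> Self {
        let wrapped = format!("<|im_start|>{}\n{}<|im_end|>\n", role, content);
        ContextEntry { token_ids: toy_encode(&wrapped) }
    }

    /// The token count is derived from the stored IDs instead of stored separately.
    pub fn tokens(&self) -> usize {
        self.token_ids.len()
    }
}

/// Concatenates the per-entry token IDs into one prompt, in order.
pub fn build_prompt(entries: &[ContextEntry]) -> Vec<u32> {
    entries
        .iter()
        .flat_map(|e| e.token_ids.iter().copied())
        .collect()
}
```

With a byte-level stand-in, concatenating entries is trivially equal to encoding the concatenated text; whether that holds for a real tokenizer depends on the template tokens acting as hard boundaries, which this sketch does not verify.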

The old tiktoken CoreBPE is now unused on Agent (will be removed in
a followup). Token counts are now exact for Qwen 3.5 instead of the
~85-90% approximation from cl100k_base.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

2026-04-08 11:20:03 -04:00

3 commits