Qwen 3.5 Thinking Mode Fix
Problem
poc-agent uses Qwen 3.5 27B but thinking traces (<think>...</think>) aren't appearing.
Root Causes
1. Generation prompt missing <think>\n
Qwen 3.5's chat template adds <think>\n after <|im_start|>assistant\n when thinking is enabled. poc-agent doesn't do this.
Current (mod.rs:287):
tokens.extend(tokenizer::encode("assistant\n"));
Fix:
tokens.extend(tokenizer::encode("assistant\n<think>\n"));
2. Missing presence_penalty
The Unsloth documentation recommends presence_penalty: 1.5 in thinking mode to prevent repetitive or circular thinking.
Current (api/mod.rs:36-40):
pub(crate) struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
}
Fix - add to struct:
pub presence_penalty: f32,
And add to API request (api/mod.rs:117-128):
"presence_penalty": sampling.presence_penalty,
3. Using /completions endpoint
poc-agent uses /completions with raw tokens, not /chat/completions. This bypasses vLLM's chat template handling entirely. Any server-side --chat-template-kwargs '{"enable_thinking": true}' config has no effect.
This isn't necessarily wrong - it just means poc-agent must handle thinking tokens manually.
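Handling thinking tokens manually also applies to the output side. Below is a minimal sketch, assuming the generation prompt already ended with <think>\n, so the completion text starts inside the think block and only the closing tag appears in it. The function name split_thinking is hypothetical and not part of poc-agent.

```rust
/// Splits a raw completion into (thinking trace, answer).
///
/// Assumption: the prompt ended with "<think>\n", so the opening tag is
/// not repeated in the completion and only "</think>" needs to be found.
pub fn split_thinking(completion: &str) -> (Option<&str>, &str) {
    const CLOSE: &str = "</think>";
    match completion.find(CLOSE) {
        Some(idx) => {
            // Everything before the closing tag is the reasoning trace.
            let thinking = completion[..idx].trim();
            // Everything after it is the user-visible answer.
            let answer = completion[idx + CLOSE.len()..].trim_start();
            (Some(thinking), answer)
        }
        // No closing tag found: the text is returned unchanged and the
        // caller decides how to treat it (for example, a truncated trace).
        None => (None, completion),
    }
}
```

Returning borrowed slices avoids copying the completion. Whether a missing closing tag should count as an error is left to the caller.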
Qwen 3.5 vs Qwen 3
Important: Qwen 3.5 removed soft switch support. The /think and /no_think commands that worked in Qwen 3 do NOT work in Qwen 3.5.
Thinking must be controlled via:
- enable_thinking parameter in chat template
- Or manually adding <think>\n to the generation prompt
Recommended Sampling Parameters
From Unsloth documentation:
Thinking Mode - Precise Coding:
- Temperature: 0.6 (poc-agent already uses this)
- Top-p: 0.95
- Top-k: 20
- Presence penalty: 1.5
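The struct change from Root Cause 2 and the values listed above can be sketched together as follows. The constant THINKING_PRECISE_CODING and the helper sampling_json are hypothetical names. The JSON is built with format! only to keep the sketch dependency-free, and the real request code in api/mod.rs may build it differently.

```rust
/// Sampling parameters, including the proposed `presence_penalty` field.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
    pub presence_penalty: f32,
}

/// Preset holding the "Thinking Mode - Precise Coding" values listed above.
/// The constant name is hypothetical.
pub const THINKING_PRECISE_CODING: SamplingParams = SamplingParams {
    temperature: 0.6,
    top_p: 0.95,
    top_k: 20,
    presence_penalty: 1.5,
};

/// Hypothetical helper: renders the sampling fields as a JSON object.
/// Assumes all floats are finite (NaN or infinity would not be valid JSON).
pub fn sampling_json(s: &SamplingParams) -> String {
    format!(
        "{{\"temperature\":{},\"top_p\":{},\"top_k\":{},\"presence_penalty\":{}}}",
        s.temperature, s.top_p, s.top_k, s.presence_penalty
    )
}
```

Keeping the preset as a constant puts the recommended values in one place, so a later change to the recommendation touches a single definition.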
Implementation Options
Option A: Always enable thinking
Just add <think>\n to the generation prompt. Simple, always-on thinking.
Option B: Configurable thinking
Add enable_thinking: bool to agent state/config. When true, add <think>\n. When false, add <think>\n\n</think>\n\n (empty think block tells model to skip thinking).
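A minimal sketch of Option B, assuming the flag is passed in as a plain bool. The function name assistant_prompt is hypothetical. The two strings are the ones described above, appended after the "assistant\n" text that poc-agent already encodes.

```rust
/// Returns the text to encode after <|im_start|> for the assistant turn.
///
/// With thinking enabled, the prompt opens a think block for the model to
/// fill in. With thinking disabled, an empty, already-closed think block is
/// supplied so the model skips straight to the answer.
pub fn assistant_prompt(enable_thinking: bool) -> &'static str {
    if enable_thinking {
        "assistant\n<think>\n"
    } else {
        "assistant\n<think>\n\n</think>\n\n"
    }
}
```

The call site in mod.rs would then pass assistant_prompt(enable_thinking) to tokenizer::encode in place of the hard-coded string.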
Option C: Think tool approach
Instead of native <think> tags, add a "think" tool (like Anthropic's approach). The model calls it explicitly when it needs to reason. More control, but different from Qwen's native approach.
Sources
- Unsloth Qwen3.5 Guide
- HuggingFace Qwen3.5-27B
- Anthropic Think Tool
- Chat template: ~/.consciousness/qwen-chat-template.jinja2 lines 147-154