# Qwen 3.5 Thinking Mode Fix

## Problem

poc-agent uses Qwen 3.5 27B, but thinking traces (`<think>...</think>`) aren't appearing.

## Root Causes

### 1. Generation prompt missing `<think>\n`

Qwen 3.5's chat template adds `<think>\n` after `<|im_start|>assistant\n` when thinking is enabled. poc-agent doesn't do this.

**Current** (`mod.rs:287`):

```rust
tokens.extend(tokenizer::encode("assistant\n"));
```

**Fix**:

```rust
tokens.extend(tokenizer::encode("assistant\n<think>\n"));
```

### 2. Missing `presence_penalty`

The recommended sampling parameters for thinking mode (listed below) include `presence_penalty: 1.5`, which helps prevent repetitive or circular thinking. poc-agent doesn't send this parameter.

**Current** (`api/mod.rs:36-40`):

```rust
pub(crate) struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
}
```

**Fix** - add to struct:

```rust
pub presence_penalty: f32,
```

**And add to API request** (`api/mod.rs:117-128`):

```json
"presence_penalty": sampling.presence_penalty,
```

### 3. Using `/completions` endpoint

poc-agent uses `/completions` with raw tokens, not `/chat/completions`. This bypasses vLLM's chat template handling entirely, so any server-side `--chat-template-kwargs '{"enable_thinking": true}'` config has no effect.

This isn't necessarily wrong - it just means poc-agent must handle thinking tokens manually.

## Qwen 3.5 vs Qwen 3

Important: **Qwen 3.5 removed soft switch support**. The `/think` and `/no_think` commands that worked in Qwen 3 do NOT work in Qwen 3.5. Thinking must be controlled via:

- The `enable_thinking` parameter in the chat template
- Or manually adding `<think>\n` to the generation prompt

## Recommended Sampling Parameters

From Unsloth documentation:

**Thinking Mode - Precise Coding:**

- Temperature: 0.6 (poc-agent already uses this)
- Top-p: 0.95
- Top-k: 20
- Presence penalty: 1.5

## Implementation Options

### Option A: Always enable thinking

Just add `<think>\n` to the generation prompt. Simple, always-on thinking.

### Option B: Configurable thinking

Add `enable_thinking: bool` to agent state/config. When true, add `<think>\n`. When false, add `<think>\n\n</think>\n\n` (an empty think block tells the model to skip thinking).

### Option C: Think tool approach

Instead of native `<think>` tags, add a "think" tool (like Anthropic's approach). The model calls it explicitly when it needs to reason. This gives more control, but differs from Qwen's native approach.

## Sources

- [Unsloth Qwen3.5 Guide](https://unsloth.ai/docs/models/qwen3.5)
- [HuggingFace Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B)
- [Anthropic Think Tool](https://www.anthropic.com/engineering/claude-think-tool)
- Chat template: `~/.consciousness/qwen-chat-template.jinja2` lines 147-154
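
As a rough illustration of how Option B could combine with root causes 1-3, here is a minimal, standalone sketch. It is not poc-agent's actual code: `generation_prompt`, `split_thinking`, and `thinking_precise_coding` are hypothetical helper names, the struct is a simplified copy of `SamplingParams`, and it assumes the `<|im_start|>` token is emitted separately before the returned text. Because `/completions` returns untagged text, the sketch also assumes the agent splits the thinking trace from the answer itself.

```rust
/// Simplified copy of the sampling struct with the proposed field added.
#[derive(Debug, Clone, PartialEq)]
pub struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
    pub presence_penalty: f32,
}

impl SamplingParams {
    /// Hypothetical constructor using the "Thinking Mode - Precise Coding"
    /// values listed in this note.
    pub fn thinking_precise_coding() -> Self {
        Self {
            temperature: 0.6,
            top_p: 0.95,
            top_k: 20,
            presence_penalty: 1.5,
        }
    }
}

/// Text that opens the assistant turn (Option B).
/// Assumption: the caller has already emitted `<|im_start|>`.
pub fn generation_prompt(enable_thinking: bool) -> &'static str {
    if enable_thinking {
        // Open think block: the model continues with its reasoning.
        "assistant\n<think>\n"
    } else {
        // Closed, empty think block: tells the model to skip thinking.
        "assistant\n<think>\n\n</think>\n\n"
    }
}

/// Splits a raw completion into (thinking, answer).
/// Assumes the prompt already opened the think block, so the completion
/// starts inside it. If no `</think>` is present (thinking disabled, or the
/// output was cut off mid-thought), the whole text is returned as the answer.
pub fn split_thinking(completion: &str) -> (Option<&str>, &str) {
    match completion.split_once("</think>") {
        Some((thinking, answer)) => (Some(thinking.trim()), answer.trim_start()),
        None => (None, completion),
    }
}
```

In poc-agent, the string returned by `generation_prompt` would presumably be passed to `tokenizer::encode` in place of the hard-coded `"assistant\n"`. A caller that needs to distinguish a truncated thinking trace from a plain answer would also have to check the `enable_thinking` setting, which this sketch leaves out.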