# Qwen 3.5 Thinking Mode Fix

## Problem

poc-agent uses Qwen 3.5 27B but thinking traces (`<think>...</think>`) aren't appearing.

## Root Causes

### 1. Generation prompt missing `<think>\n`

Qwen 3.5's chat template adds `<think>\n` after `<|im_start|>assistant\n` when thinking is enabled. poc-agent builds the generation prompt itself and omits it.

**Current** (`mod.rs:287`):
```rust
tokens.extend(tokenizer::encode("assistant\n"));
```

**Fix**:
```rust
tokens.extend(tokenizer::encode("assistant\n<think>\n"));
```
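For reference, here is a string-level sketch of the full prompt layout this fix produces. poc-agent encodes the pieces into tokens directly, so `render_prompt` and `Message` are hypothetical names, and the sketch assumes poc-agent's turn layout matches standard ChatML (`<|im_start|>{role}\n{content}<|im_end|>\n`).

```rust
/// A chat turn in the ChatML layout Qwen models use (hypothetical type).
pub struct Message<'a> {
    pub role: &'a str,
    pub content: &'a str,
}

/// Hypothetical helper: render turns as ChatML, then append the generation
/// prompt with an opened think block, as Qwen 3.5's template does when
/// thinking is enabled.
pub fn render_prompt(messages: &[Message]) -> String {
    let mut out = String::new();
    for m in messages {
        // Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        out.push_str("<|im_start|>");
        out.push_str(m.role);
        out.push('\n');
        out.push_str(m.content);
        out.push_str("<|im_end|>\n");
    }
    // Generation prompt: the model's completion begins inside <think>.
    out.push_str("<|im_start|>assistant\n<think>\n");
    out
}
```

Any system prompt or tool definitions that poc-agent emits would come before the final generation prompt; only the last line changes with this fix.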

### 2. Missing `presence_penalty`

Unsloth's recommended settings for Qwen 3.5 thinking mode include `presence_penalty: 1.5`, to discourage repetitive or circular thinking.
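To make the effect concrete, this sketch follows the OpenAI API definition of presence penalty: a flat amount is subtracted from the logit of every token that has already appeared, however many times it occurred. It is illustrative only; `apply_presence_penalty` is a hypothetical name, and vLLM applies the penalty server-side with its own implementation details (for example, which tokens count as "already appeared").

```rust
use std::collections::HashSet;

/// Subtract `presence_penalty` once from the logit of each distinct token id
/// in `generated`. Unlike a frequency penalty, repeat counts do not matter.
pub fn apply_presence_penalty(logits: &mut [f32], generated: &[u32], presence_penalty: f32) {
    let seen: HashSet<u32> = generated.iter().copied().collect();
    for id in seen {
        // Ignore ids outside the vocabulary slice rather than panicking.
        if let Some(l) = logits.get_mut(id as usize) {
            *l -= presence_penalty;
        }
    }
}
```

With a penalty of 1.5, every token the model has already used starts at a sizeable disadvantage, which pushes long reasoning traces away from restating the same phrases.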

**Current** (`api/mod.rs:36-40`):
```rust
pub(crate) struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
}
```

**Fix** - add to struct:
```rust
pub presence_penalty: f32,
```

**And add it to the API request body** (`api/mod.rs:117-128`):
```rust
"presence_penalty": sampling.presence_penalty,
```
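A dependency-free sketch of the sampling fields the request body should end up containing once the struct has the new field. The `json_fields` method is hypothetical; the real code in `api/mod.rs` may build the body differently (for example with `serde_json`). `top_k` is a vLLM extension to the OpenAI request schema, while `presence_penalty` is a standard OpenAI field.

```rust
/// Sampling parameters with the new `presence_penalty` field.
pub struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
    pub presence_penalty: f32,
}

impl SamplingParams {
    /// Hypothetical helper: render the sampling fields as JSON object
    /// members, ready to be spliced into a `/completions` request body.
    pub fn json_fields(&self) -> String {
        format!(
            r#""temperature": {}, "top_p": {}, "top_k": {}, "presence_penalty": {}"#,
            self.temperature, self.top_p, self.top_k, self.presence_penalty
        )
    }
}
```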

### 3. Using `/completions` endpoint

poc-agent sends raw token IDs to `/completions` rather than messages to `/chat/completions`, which bypasses vLLM's chat template handling entirely. As a result, any server-side `--chat-template-kwargs '{"enable_thinking": true}'` setting has no effect.

This isn't necessarily wrong; it just means poc-agent must insert and parse the thinking tokens itself.
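Parsing is the other half of handling thinking tokens manually. Because the prompt already ends with `<think>\n`, the completion should not contain the opening tag: it starts with the reasoning and closes it with `</think>`. A minimal sketch, assuming the completion has been decoded to text; `split_thinking` is a hypothetical helper.

```rust
/// Hypothetical helper: split a raw completion into (thinking, answer).
///
/// The opening `<think>` tag lives in the prompt, so only `</think>` is
/// searched for. If it never appears (e.g. generation hit the token limit
/// mid-thought), the whole completion is treated as thinking.
pub fn split_thinking(completion: &str) -> (&str, &str) {
    const CLOSE: &str = "</think>";
    match completion.find(CLOSE) {
        Some(end) => {
            let thinking = completion[..end].trim();
            let answer = completion[end + CLOSE.len()..].trim_start();
            (thinking, answer)
        }
        None => (completion.trim(), ""),
    }
}
```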

## Qwen 3.5 vs Qwen 3

Important: **Qwen 3.5 removed soft-switch support**. The `/think` and `/no_think` prompt commands that worked in Qwen 3 have no effect in Qwen 3.5.

Thinking must be controlled via:
- `enable_thinking` parameter in chat template
- Or manually adding `<think>\n` to the generation prompt

## Recommended Sampling Parameters

From the Unsloth documentation:

**Thinking Mode - Precise Coding:**
- Temperature: 0.6 (poc-agent already uses this)
- Top-p: 0.95
- Top-k: 20
- Presence penalty: 1.5
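These values could live in one named preset so the agent doesn't scatter magic numbers. A sketch, assuming the struct shape from `api/mod.rs` plus the new field; the constant name `THINKING_PRECISE_CODING` is hypothetical.

```rust
/// Sampling parameters as in `api/mod.rs`, with `presence_penalty` added.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
    pub presence_penalty: f32,
}

impl SamplingParams {
    /// Unsloth's "thinking mode, precise coding" values for Qwen 3.5.
    pub const THINKING_PRECISE_CODING: SamplingParams = SamplingParams {
        temperature: 0.6,
        top_p: 0.95,
        top_k: 20,
        presence_penalty: 1.5,
    };
}
```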

## Implementation Options

### Option A: Always enable thinking

Always add `<think>\n` to the generation prompt. Simple, always-on thinking.

### Option B: Configurable thinking

Add `enable_thinking: bool` to the agent state/config. When true, append `<think>\n`. When false, append `<think>\n\n</think>\n\n` (an empty think block tells the model to skip thinking).
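Option B reduces to choosing between two generation-prompt suffixes. A minimal sketch; `generation_prompt` is a hypothetical helper whose return value would replace the string literal passed to `tokenizer::encode` at `mod.rs:287`.

```rust
/// Hypothetical helper for Option B: the text to encode after the final
/// `<|im_start|>` token of the prompt.
pub fn generation_prompt(enable_thinking: bool) -> &'static str {
    if enable_thinking {
        // Open a think block; the model writes its reasoning next.
        "assistant\n<think>\n"
    } else {
        // Pre-filled empty think block: the model goes straight to the answer.
        "assistant\n<think>\n\n</think>\n\n"
    }
}
```

The call site would then become something like `tokens.extend(tokenizer::encode(generation_prompt(enable_thinking)));`, where the source of `enable_thinking` (agent state or config) is up to the implementation.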

### Option C: Think tool approach

Instead of native `<think>` tags, add a `think` tool, following Anthropic's approach. The model calls it explicitly when it needs to reason. This gives more control, but differs from Qwen's native thinking mechanism.
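A sketch of what Option C might look like. The tool definition uses the OpenAI-style `tools` JSON shape and a description paraphrased from Anthropic's post; both the shape poc-agent would render into the prompt and the names `THINK_TOOL` and `handle_think` are assumptions, not an existing poc-agent or Qwen API.

```rust
/// Hypothetical `think` tool definition (OpenAI-style function schema).
pub const THINK_TOOL: &str = r#"{
  "type": "function",
  "function": {
    "name": "think",
    "description": "Use this tool to think about something. It does not fetch new information or change anything; it only records the thought.",
    "parameters": {
      "type": "object",
      "properties": {
        "thought": { "type": "string", "description": "A thought to think about." }
      },
      "required": ["thought"]
    }
  }
}"#;

/// Hypothetical handler: the tool has no side effects. The reasoning lives
/// in the call's arguments, which stay in the transcript, so the result is
/// just a short acknowledgement.
pub fn handle_think(thought: &str) -> String {
    format!("Recorded thought ({} chars).", thought.chars().count())
}
```

Because the reasoning is an ordinary tool call, it stays visible in the conversation history, whereas native `<think>` blocks are typically stripped from earlier assistant turns by the chat template.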

## Sources

- [Unsloth Qwen3.5 Guide](https://unsloth.ai/docs/models/qwen3.5)
- [HuggingFace Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B)
- [Anthropic Think Tool](https://www.anthropic.com/engineering/claude-think-tool)
- Chat template: `~/.consciousness/qwen-chat-template.jinja2` lines 147-154