Remove find_context_files — identity comes from memory nodes
Deleted the directory-walking CLAUDE.md/POC.md loader. Identity now comes entirely from personality_nodes in the memory graph.

Simplified:
- assemble_context_message() takes just personality_nodes
- Removed config_file_count/memory_file_count tracking
- reload_for_model() → reload_context() (no longer model-specific)

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
parent e847a313b4 · commit fc978e2f2e
14 changed files with 779 additions and 107 deletions
research/qwen35-thinking-fix.md (new file, 89 lines)
# Qwen 3.5 Thinking Mode Fix

## Problem

poc-agent uses Qwen 3.5 27B but thinking traces (`<think>...</think>`) aren't appearing.

## Root Causes

### 1. Generation prompt missing `<think>\n`

Qwen 3.5's chat template adds `<think>\n` after `<|im_start|>assistant\n` when thinking is enabled. poc-agent doesn't do this.

**Current** (`mod.rs:287`):

```rust
tokens.extend(tokenizer::encode("assistant\n"));
```

**Fix**:

```rust
tokens.extend(tokenizer::encode("assistant\n<think>\n"));
```

### 2. Missing `presence_penalty`

Research shows thinking mode needs `presence_penalty: 1.5` to prevent repetitive/circular thinking.

**Current** (`api/mod.rs:36-40`):

```rust
pub(crate) struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
}
```

**Fix** - add to struct:

```rust
pub presence_penalty: f32,
```
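Put together, the extended struct could look like the sketch below. The `Default` impl is an assumption (poc-agent may construct `SamplingParams` elsewhere); the values match the thinking-mode recommendations cited later in this doc.

```rust
// Sketch of the extended struct with presence_penalty added.
// The Default impl is hypothetical; defaults use the Unsloth
// thinking-mode recommendations (0.6 / 0.95 / 20 / 1.5).
#[derive(Debug, Clone, Copy, PartialEq)]
pub(crate) struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
    pub presence_penalty: f32,
}

impl Default for SamplingParams {
    fn default() -> Self {
        Self {
            temperature: 0.6,
            top_p: 0.95,
            top_k: 20,
            presence_penalty: 1.5,
        }
    }
}
```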

**And add to API request** (`api/mod.rs:117-128`):

```json
"presence_penalty": sampling.presence_penalty,
```

### 3. Using `/completions` endpoint

poc-agent uses `/completions` with raw tokens, not `/chat/completions`. This bypasses vLLM's chat template handling entirely, so any server-side `--chat-template-kwargs '{"enable_thinking": true}'` config has no effect.

This isn't necessarily wrong; it just means poc-agent must handle thinking tokens manually.
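That manual handling can be sketched as follows. `split_thinking` is a hypothetical helper, not an existing poc-agent function:

```rust
// Hypothetical helper: with /completions and raw tokens, the agent must
// separate the thinking trace from the answer in the decoded text itself.
// Because the generation prompt already opened <think> (fix #1), only the
// closing tag appears in the model output.
fn split_thinking(output: &str) -> (Option<&str>, &str) {
    match output.split_once("</think>") {
        Some((think, answer)) => (Some(think.trim()), answer.trim_start()),
        None => (None, output),
    }
}
```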

## Qwen 3.5 vs Qwen 3

Important: **Qwen 3.5 removed soft-switch support**. The `/think` and `/no_think` commands that worked in Qwen 3 do NOT work in Qwen 3.5.

Thinking must be controlled via:

- the `enable_thinking` parameter in the chat template
- or manually adding `<think>\n` to the generation prompt

## Recommended Sampling Parameters

From Unsloth documentation:

**Thinking Mode - Precise Coding:**

- Temperature: 0.6 (poc-agent already uses this)
- Top-p: 0.95
- Top-k: 20
- Presence penalty: 1.5

## Implementation Options

### Option A: Always enable thinking

Just add `<think>\n` to the generation prompt. Simple, always-on thinking.

### Option B: Configurable thinking

Add `enable_thinking: bool` to agent state/config. When true, add `<think>\n`; when false, add `<think>\n\n</think>\n\n` (an empty think block tells the model to skip thinking).
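Option B can be sketched as a small helper; the function name is illustrative, and the token strings follow the fix in section 1:

```rust
// Hypothetical helper for Option B: pick the generation-prompt suffix
// based on whether thinking is enabled.
fn assistant_suffix(enable_thinking: bool) -> &'static str {
    if enable_thinking {
        // Open a think block; the model fills it in and emits </think>.
        "assistant\n<think>\n"
    } else {
        // An empty think block tells the model to skip thinking.
        "assistant\n<think>\n\n</think>\n\n"
    }
}
```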

### Option C: Think tool approach

Instead of native `<think>` tags, add a "think" tool (like Anthropic's approach). The model calls it explicitly when it needs to reason. More control, but different from Qwen's native approach.
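A sketch of what such a tool definition might look like in OpenAI-style function schema. The name, description, and parameter shape are illustrative assumptions, not Anthropic's or poc-agent's actual definitions:

```rust
// Illustrative tool definition for Option C (OpenAI-style function schema).
// The model would call think({"thought": "..."}) to reason explicitly.
const THINK_TOOL: &str = r#"{
  "type": "function",
  "function": {
    "name": "think",
    "description": "Think step by step before answering. Output is logged, not shown to the user.",
    "parameters": {
      "type": "object",
      "properties": {
        "thought": { "type": "string", "description": "Your reasoning." }
      },
      "required": ["thought"]
    }
  }
}"#;
```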

## Sources

- [Unsloth Qwen3.5 Guide](https://unsloth.ai/docs/models/qwen3.5)
- [HuggingFace Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B)
- [Anthropic Think Tool](https://www.anthropic.com/engineering/claude-think-tool)
- Chat template: `~/.consciousness/qwen-chat-template.jinja2` lines 147-154