- Always display reasoning tokens regardless of reasoning_effort
setting — Qwen 3.5 thinks natively and the reasoning parser
separates it into its own field
- Remove chat_template_kwargs that disabled thinking when
reasoning_effort was "none"
- Add chat_template_kwargs field to ChatRequest for vllm compat
- Update provision script: qwen3_xml tool parser, qwen3 reasoning
parser, 262K context, 95% GPU memory utilization
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split poc-agent into lib + bin so its API client, types, and tool
dispatch can be imported by poc-memory. This is the foundation for
replacing claude CLI subprocess calls with direct API calls to
vllm/OpenAI-compatible endpoints.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move poc-agent (substrate-independent AI agent framework) into the
memory workspace as a step toward using its API client for direct
LLM calls instead of shelling out to claude CLI.
Agent prompt improvements:
- distill: rewrite from hub-focused to knowledge-flow-focused.
Now walks upward from seed nodes to find and refine topic nodes,
instead of only maintaining high-degree hubs.
- distill: remove "don't touch journal entries" restriction
- memory-instructions-core: add "Make it alive" section — write
with creativity and emotional texture, not spreadsheet summaries
- memory-instructions-core: add "Show your reasoning" section —
agents must explain decisions, especially when they do nothing
- linker: already had emotional texture guidance (kept as-is)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>