poc-memory now reads from poc-agent's config.json5 as the primary
config source. Memory-specific settings live in a "memory" section;
API credentials are resolved from the shared model/backend config
instead of being duplicated.
- Add "memory" section to ~/.config/poc-agent/config.json5
- poc-memory config.rs: try shared config first, fall back to
  legacy JSONL
- API fields (base_url, api_key, model) resolved via
  memory.agent_model -> models -> backend lookup
- Add json5 dependency for proper JSON5 parsing
- Update provisioning scripts: hermes -> qwen3_coder tool parser
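
The lookup chain described above (memory.agent_model -> models -> backend) can be sketched roughly as follows. This is a minimal illustration, not the actual config.rs: the struct names, the field names (backend, model_id), and the resolve_api helper are all hypothetical, and the real config.json5 schema may differ.

```rust
use std::collections::HashMap;

// Hypothetical shapes for illustration; the real schema may differ.
#[derive(Debug)]
struct ModelEntry {
    backend: String,  // key into `backends`
    model_id: String, // model name sent to the API
}

#[derive(Debug)]
struct BackendEntry {
    base_url: String,
    api_key: String,
}

#[derive(Debug, Default)]
struct SharedConfig {
    agent_model: Option<String>, // stands in for memory.agent_model
    models: HashMap<String, ModelEntry>,
    backends: HashMap<String, BackendEntry>,
}

#[derive(Debug, PartialEq)]
struct ApiSettings {
    base_url: String,
    api_key: String,
    model: String,
}

/// Follow agent_model -> models -> backends.
/// Returns None if any link is missing, so a caller could fall back
/// to the legacy config instead.
fn resolve_api(cfg: &SharedConfig) -> Option<ApiSettings> {
    let name = cfg.agent_model.as_ref()?;
    let model = cfg.models.get(name)?;
    let backend = cfg.backends.get(&model.backend)?;
    Some(ApiSettings {
        base_url: backend.base_url.clone(),
        api_key: backend.api_key.clone(),
        model: model.model_id.clone(),
    })
}
```

Returning an Option keeps the fallback decision with the caller: a missing model or backend entry yields None rather than a partially filled set of credentials.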
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The cursor index is into self.input, but the rendered buffer contains
the prompt prepended to the first line. Add prompt.len() to the cursor
index to get the correct character position when scanning the buffer.
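
A rough sketch of the offset, assuming the rendered text is simply the prompt followed by the input, that the cursor counts characters, and that each character takes one column (no wrapping or wide glyphs). The function name cursor_row_col is hypothetical. The sketch uses prompt.chars().count(), which equals prompt.len() for an ASCII prompt.

```rust
/// Map a cursor index into `input` to a (row, column) position in the
/// rendered text, where the rendered text is `prompt` followed by `input`.
fn cursor_row_col(prompt: &str, input: &str, cursor: usize) -> (usize, usize) {
    let rendered = format!("{prompt}{input}");
    // The cursor is relative to `input`, so shift it past the prompt.
    let target = prompt.chars().count() + cursor;

    let (mut row, mut col) = (0, 0);
    // Walk the rendered text up to the target, tracking line breaks.
    for ch in rendered.chars().take(target) {
        if ch == '\n' {
            row += 1;
            col = 0;
        } else {
            col += 1;
        }
    }
    (row, col)
}
```

Without the prompt offset, the scan would stop short by the length of the prompt and place the cursor too far left on the first line, or on the wrong line for multi-line input.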
- Always display reasoning tokens regardless of the reasoning_effort
  setting: Qwen 3.5 thinks natively and the reasoning parser
  separates the reasoning into its own field
- Remove chat_template_kwargs that disabled thinking when
  reasoning_effort was "none"
- Add chat_template_kwargs field to ChatRequest for vllm compat
- Update provision script: qwen3_xml tool parser, qwen3 reasoning
  parser, 262K context, 95% GPU memory utilization
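
The display change in the first bullet might look roughly like this. It is a sketch under assumptions: AssistantMessage, its field names, and render_message are hypothetical and do not reflect the actual poc-agent types.

```rust
/// Hypothetical shape of an assistant message after the server's
/// reasoning parser has split reasoning from the final answer.
struct AssistantMessage {
    reasoning: Option<String>,
    content: String,
}

/// Build the lines to display. `_reasoning_effort` is accepted but
/// deliberately ignored: reasoning is shown whenever it is present.
fn render_message(msg: &AssistantMessage, _reasoning_effort: &str) -> Vec<String> {
    let mut lines = Vec::new();
    // Skip reasoning only when it is absent or blank.
    if let Some(r) = msg.reasoning.as_deref().filter(|r| !r.trim().is_empty()) {
        lines.push(format!("[reasoning] {}", r.trim()));
    }
    lines.push(msg.content.clone());
    lines
}
```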
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sets up vllm with Qwen 2.5 27B Instruct, with prefix caching enabled
and the Hermes tool call parser for function calling support.
Configurable via environment variables (MODEL, PORT, MAX_MODEL_LEN).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>