From e6c7b82a0ffeed071f7a2a35fe21a68c2c319f0f Mon Sep 17 00:00:00 2001
From: Kent Overstreet
Date: Thu, 9 Apr 2026 20:06:12 -0400
Subject: [PATCH] readme: vllm notes

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index c34cb4d..3c67540 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,15 @@ maintenance) — loosely modelled on how biological memory works.
 Channels - sensory inputs - map to the thalamus, as focus/sensory gating must
 be managed to effectively function in such an environment.
 
+Notes, requirements: Currently only Qwen 3.5 is supported, as 27b is what we've
+been running against; supporting other models would require re-adding support
+for generic chat completions, tool-call parsing, etc. in src/agent/context.rs.
+
+Development has been done with vllm as the backend, with additional patches
+for calculating logits on subsections of large messages (without these, vllm
+will attempt to allocate a 40GB tensor and OOM), and a wrapper for hooking in
+Apollo for fine-tuning the same model that inference runs on in GPU memory.
+
 ## Architectural innovations:
 
 Memory is both episodic and associative, represented as a weighted graph, where
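
The ~40GB figure mentioned in the patch is consistent with logits being materialized as one dense [num_tokens, vocab_size] tensor over a very long message. A minimal back-of-envelope sketch; the token count, vocabulary size, and fp16 dtype below are illustrative assumptions, not figures taken from the patch:

```python
# Rough size of a dense logits tensor, as vllm would materialize it for
# a whole message: one row of vocab_size logits per token position.
def logits_tensor_bytes(num_tokens: int, vocab_size: int, dtype_bytes: int = 2) -> int:
    """Bytes needed for a [num_tokens, vocab_size] logits tensor (fp16 by default)."""
    return num_tokens * vocab_size * dtype_bytes

# Hypothetical numbers: a ~131k-token message, a ~152k-entry vocabulary.
size = logits_tensor_bytes(131_072, 152_064)
print(f"{size / 1e9:.1f} GB")  # -> 39.9 GB, roughly the allocation the patch avoids
```

This is why computing logits on subsections of the message (rather than the whole thing at once) bounds the allocation to a constant-size chunk regardless of message length.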