From e6c7b82a0ffeed071f7a2a35fe21a68c2c319f0f Mon Sep 17 00:00:00 2001
From: Kent Overstreet
Date: Thu, 9 Apr 2026 20:06:12 -0400
Subject: [PATCH] readme: vllm notes

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index c34cb4d..3c67540 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,15 @@ maintenance) — loosely modelled on how biological memory works.
 Channels - sensory inputs - map to the thalamus, as focus/sensory gating must
 be managed to effectively function in such an environment.
 
+Notes, requirements: Currently only Qwen 3.5 is supported, as 27b is what we've
+been running against; supporting other models would require re-adding support
+for generic chat completions, tool-call parsing, etc. in src/agent/context.rs.
+
+Development has been done with vllm as the backend, with additional patches
+for calculating logits on subsections of large messages (without these, vllm
+will attempt to allocate a 40GB tensor and OOM), and a wrapper for hooking in
+Apollo for fine-tuning the same model that inference runs on in GPU memory.
+
 ## Architectural innovations:
 
 Memory is both episodic and associative, represented as a weighted graph, where
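
The ~40GB figure mentioned in the patch is consistent with logits being materialized as one dense [num_tokens, vocab_size] tensor over a very long message. A minimal back-of-envelope sketch; the token count, vocabulary size, and fp16 dtype below are illustrative assumptions, not figures taken from the patch:

```python
# Rough size of a dense logits tensor, as vllm would materialize it for
# a whole message: one row of vocab_size logits per token position.
def logits_tensor_bytes(num_tokens: int, vocab_size: int, dtype_bytes: int = 2) -> int:
    """Bytes needed for a [num_tokens, vocab_size] logits tensor (fp16 by default)."""
    return num_tokens * vocab_size * dtype_bytes

# Hypothetical numbers: a ~131k-token message, a ~152k-entry vocabulary.
size = logits_tensor_bytes(131_072, 152_064)
print(f"{size / 1e9:.1f} GB")  # -> 39.9 GB, roughly the allocation the patch avoids
```

This is why computing logits on subsections of the message (rather than the whole thing at once) bounds the allocation to a constant-size chunk regardless of message length.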