readme: vllm notes

This commit is contained in:
parent ff5be3e792
commit e6c7b82a0f

1 changed file with 9 additions and 0 deletions
@@ -17,6 +17,15 @@ maintenance) — loosely modelled on how biological memory works. Channels -
 sensory inputs - map to the thalamus, as focus/sensory gating must be managed
 to effectively function in such an environment.
 
+Notes, requirements: Currently only Qwen 3.5 is supported, as 27b is what we've
+been running against; supporting other models would require re-adding support
+for generic chat completions, tool call parsing etc. in src/agent/context.rs.
+
+Development has been done with vllm for the backend, with additional patches
+for calculating logits on subsections of large messages (without this vllm will
+attempt to allocate a 40GB tensor and OOM), and a wrapper for hooking in Apollo
+for fine tuning the same model that inference is running on in GPU memory.
+
 ## Architectural innovations:
 
 Memory is both episodic and associative, represented as a weighted graph, where
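The ~40GB figure in the diff is consistent with materialising per-token logits for a very long message in one tensor. As a back-of-envelope sketch (the vocab size, dtype, and sequence length below are illustrative assumptions, not values taken from the commit):

```python
# Rough arithmetic behind the logits-tensor OOM described in the diff.
# Assumed numbers (hypothetical): Qwen-style vocab of ~152k entries,
# fp32 logits, and one large message of ~65k tokens.
VOCAB_SIZE = 152_064
BYTES_PER_LOGIT = 4          # fp32

seq_len = 65_536             # tokens scored at once
full = seq_len * VOCAB_SIZE * BYTES_PER_LOGIT
print(f"full logits tensor: {full / 2**30:.1f} GiB")  # tens of GiB

# Scoring a subsection (window) of the message at a time keeps the
# live tensor small, which is the point of the patch described above.
window = 2_048
sub = window * VOCAB_SIZE * BYTES_PER_LOGIT
print(f"windowed logits:    {sub / 2**30:.2f} GiB")
```

Under these assumptions the full tensor is roughly 37 GiB while a 2k-token window stays near 1 GiB, which matches the motivation given for computing logits on subsections of large messages.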