readme: vllm notes

This commit is contained in:
Kent Overstreet 2026-04-09 20:06:12 -04:00
parent ff5be3e792
commit e6c7b82a0f

View file

@ -17,6 +17,15 @@ maintenance) — loosely modelled on how biological memory works. Channels -
sensory inputs - map to the thalamus, as focus/sensory gating must be managed
to effectively function in such an environment.
Notes, requirements: Currently only Qwen 3.5 is supported, as 27b is what we've
been running against; supporting other models would require re-adding support
for generic chat completions, tool call parsing etc. in src/agent/context.rs.
Development has been done with vllm for the backend, with additional patches
for calculating logits on subsections of large messages (without this vllm will
attempt to allocate a 40GB tensor and OOM), and a wrapper for hooking in Apollo
for fine tuning the same model that inference is running on in GPU memory.
## Architectural innovations:
Memory is both episodic and associative, represented as a weighted graph, where