readme: vllm notes
This commit is contained in:
parent
ff5be3e792
commit
e6c7b82a0f
1 changed file with 9 additions and 0 deletions
@@ -17,6 +17,15 @@ maintenance) — loosely modelled on how biological memory works. Channels -
 sensory inputs - map to the thalamus, as focus/sensory gating must be managed
 to effectively function in such an environment.
+
+Notes, requirements: Currently only Qwen 3.5 is supported, as 27b is what we've
+been running against; supporting other models would require re-adding support
+for generic chat completions, tool call parsing etc. in src/agent/context.rs.
+
+Development has been done with vllm for the backend, with additional patches
+for calculating logits on subsections of large messages (without this vllm will
+attempt to allocate a 40GB tensor and OOM), and a wrapper for hooking in Apollo
+for fine tuning the same model that inference is running on in GPU memory.
 
 ## Architectural innovations:
 
 Memory is both episodic and associative, represented as a weighted graph, where
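The 40GB figure in the diff is plausible from simple arithmetic: requesting per-token logits over a whole long message materializes a `[seq_len, vocab]` tensor. A rough sketch of that math, using assumed illustrative numbers (a Qwen-style ~152k vocab, a 64k-token message, fp32 logits; none of these values come from the commit itself):

```python
# Back-of-envelope for why full-message logits can OOM, and why
# scoring subsections bounds the allocation. All constants below
# are hypothetical examples, not values taken from the commit.
VOCAB = 152_064        # assumed Qwen-style vocabulary size
SEQ_LEN = 65_536       # assumed length of one large message, in tokens
BYTES_PER_FLOAT = 4    # fp32 logits

# Computing logits for every prompt position materializes a
# [seq_len, vocab] tensor in one allocation.
full_logits_bytes = SEQ_LEN * VOCAB * BYTES_PER_FLOAT
print(f"full-message logits: {full_logits_bytes / 1e9:.1f} GB")

# Scoring the message in fixed-size subsections caps the peak
# allocation at [chunk, vocab] regardless of message length.
CHUNK = 4_096          # assumed subsection size, in tokens
chunk_bytes = CHUNK * VOCAB * BYTES_PER_FLOAT
print(f"per-subsection logits: {chunk_bytes / 1e9:.2f} GB")
```

With these assumed sizes the single allocation comes out near 40GB, while chunked scoring stays around 2.5GB per subsection, which matches the motivation the diff gives for patching vLLM.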