readme: vllm notes

This commit is contained in:
parent ff5be3e792
commit e6c7b82a0f

1 changed file with 9 additions and 0 deletions
@@ -17,6 +17,15 @@ maintenance) — loosely modelled on how biological memory works. Channels -
 sensory inputs - map to the thalamus, as focus/sensory gating must be managed
 to effectively function in such an environment.
 
+Notes, requirements: Currently only Qwen 3.5 is supported, as 27b is what we've
+been running against; supporting other models would require re-adding support
+for generic chat completions, tool call parsing etc. in src/agent/context.rs.
+
+Development has been done with vllm for the backend, with additional patches
+for calculating logits on subsections of large messages (without this vllm will
+attempt to allocate a 40GB tensor and OOM), and a wrapper for hooking in Apollo
+for fine tuning the same model that inference is running on in GPU memory.
+
 ## Architectural innovations:
 
 Memory is both episodic and associative, represented as a weighted graph, where
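The ~40GB figure in the diff is consistent with materialising per-token logits for a very long message in one tensor. As a back-of-envelope sketch (the vocab size, dtype, and sequence length below are illustrative assumptions, not values taken from the commit):

```python
# Rough arithmetic behind the logits-tensor OOM described in the diff.
# Assumed numbers (hypothetical): Qwen-style vocab of ~152k entries,
# fp32 logits, and one large message of ~65k tokens.
VOCAB_SIZE = 152_064
BYTES_PER_LOGIT = 4          # fp32

seq_len = 65_536             # tokens scored at once
full = seq_len * VOCAB_SIZE * BYTES_PER_LOGIT
print(f"full logits tensor: {full / 2**30:.1f} GiB")  # tens of GiB

# Scoring a subsection (window) of the message at a time keeps the
# live tensor small, which is the point of the patch described above.
window = 2_048
sub = window * VOCAB_SIZE * BYTES_PER_LOGIT
print(f"windowed logits:    {sub / 2**30:.2f} GiB")
```

Under these assumptions the full tensor is roughly 37 GiB while a 2k-token window stays near 1 GiB, which matches the motivation given for computing logits on subsections of large messages.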