Separate the scoring into two distinct functions:
- memory_score(key): scores a single memory's importance by measuring
  how much the model's output diverges over the 50 messages after that
  memory was surfaced. Takes two API calls: a baseline with the full
  context, and one with that memory removed.
- finetune_score(count): scores the most recent messages with all
  memories stripped, to identify fine-tuning candidates. A response
  that diverges strongly once memories are removed depends on memories
  the model hasn't internalized yet.
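A rough sketch of how the two scoring passes relate, under stated assumptions. The commit doesn't name the divergence metric or the API shape. This sketch assumes KL divergence between per-message output distributions. It also assumes a hypothetical score_call(memory_texts, messages) that stands in for one API call and returns one distribution per message. The dict-of-memories representation and the toy model are illustrative only.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a distribution over outcomes, and one "API call"
# that scores a batch of messages given a set of memory texts.
Dist = Dict[str, float]
ScoreCall = Callable[[List[str], List[str]], List[Dist]]


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-9) -> float:
    """KL(p || q); eps guards against log(0) when q lacks an outcome."""
    total = 0.0
    for outcome, p_val in p.items():
        if p_val > 0:
            total += p_val * math.log(p_val / max(q.get(outcome, 0.0), eps))
    return total


def memory_score(key: str, memories: Dict[str, str], messages: List[str],
                 surfaced_at: int, score_call: ScoreCall,
                 window: int = 50) -> float:
    """Importance of one memory: divergence over the `window` messages
    after it was surfaced, baseline vs. that memory removed (two calls)."""
    following = messages[surfaced_at + 1: surfaced_at + 1 + window]
    if not following:
        return 0.0
    baseline = score_call(list(memories.values()), following)
    without = score_call([t for k, t in memories.items() if k != key], following)
    return sum(kl_divergence(b, w) for b, w in zip(baseline, without))


def finetune_score(count: int, memories: Dict[str, str], messages: List[str],
                   score_call: ScoreCall) -> List[Tuple[int, float]]:
    """Score the last `count` messages with all memories vs. none.
    Returns (message index, divergence), highest divergence first."""
    recent = messages[-count:] if count > 0 else []
    with_mem = score_call(list(memories.values()), recent)
    stripped = score_call([], recent)
    offset = len(messages) - len(recent)
    scores = [(offset + i, kl_divergence(w, s))
              for i, (w, s) in enumerate(zip(with_mem, stripped))]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def toy_score_call(memory_texts: List[str], messages: List[str]) -> List[Dist]:
    """Toy stand-in model: confident only when a memory word appears."""
    dists = []
    for msg in messages:
        words = set(msg.split())
        if any(text in words for text in memory_texts):
            dists.append({"informed": 0.9, "guess": 0.1})
        else:
            dists.append({"informed": 0.5, "guess": 0.5})
    return dists


if __name__ == "__main__":
    mems = {"m1": "bcachefs", "m2": "tomatoes"}
    msgs = ["hello", "tell me about bcachefs", "ok", "bcachefs again", "bye"]
    print(memory_score("m1", mems, msgs, 0, toy_score_call))
    print(finetune_score(3, mems, msgs, toy_score_call))
```

In this sketch, a memory nobody relies on scores zero, because removing it changes no output. In finetune_score, the messages at the top of the ranking are the ones whose answers shift most once memories are gone.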

The existing score_memories(), which computes the full NxM matrix, is
kept for the debug screen.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>