learn: nanosecond timestamps, token ranges for /score
Two related changes to the learn subsystem:

1. AST node timestamps are now non-optional: both the Leaf and Branch variants carry a DateTime<Utc>, with UNIX_EPOCH meaning "unset" (old entries deserialized from on-disk conversation logs). Training uses timestamps as unique keys for dedup, so we promote them to nanosecond precision: node_timestamp_ns(), TrainData.timestamp_ns, FinetuneCandidate.timestamp_ns, mark_trained(ns). A hedged sketch of the dedup keying follows this message.

2. build_token_ids() now also returns the token-position ranges of assistant messages. These are passed to vLLM's /score endpoint via the new score_ranges field so that only logprobs at the scored positions are returned, which cuts bandwidth and compute when scoring small windows. A sketch of the request shape follows the diff below.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
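A minimal sketch of how the nanosecond key can drive trained-set dedup, assuming the chrono crate for DateTime<Utc>. Only node_timestamp_ns() and the timestamp_ns field are named in this commit; the in-memory HashSet and the untrained() helper are hypothetical stand-ins for whatever bookkeeping mark_trained(ns) actually does:

```rust
// Sketch only, not the actual learn-subsystem code.
use std::collections::HashSet;

use chrono::{DateTime, Utc};

struct FinetuneCandidate {
    timestamp_ns: i64,
    // remaining fields elided; see the diff below
}

/// Nanoseconds since the UNIX epoch. 0 doubles as "unset", matching
/// the UNIX_EPOCH convention for old deserialized entries.
fn node_timestamp_ns(ts: &DateTime<Utc>) -> i64 {
    // timestamp_nanos_opt() is None outside roughly 1677..=2262,
    // where i64 nanoseconds overflow; treat that as "unset" too.
    ts.timestamp_nanos_opt().unwrap_or(0)
}

/// Drop candidates whose timestamp key was already trained on.
/// Millisecond keys could collide for nodes created in one burst;
/// nanosecond keys make collisions practically impossible.
fn untrained(
    candidates: Vec<FinetuneCandidate>,
    trained: &HashSet<i64>,
) -> Vec<FinetuneCandidate> {
    candidates
        .into_iter()
        .filter(|c| !trained.contains(&c.timestamp_ns))
        .collect()
}

fn main() {
    let mut trained: HashSet<i64> = HashSet::new();
    let now_ns = node_timestamp_ns(&Utc::now());
    let fresh = untrained(vec![FinetuneCandidate { timestamp_ns: now_ns }], &trained);
    assert_eq!(fresh.len(), 1);
    trained.insert(now_ns); // roughly what mark_trained(ns) would record
}
```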
This commit is contained in:
parent 5d9d3ffc5b
commit 2b632d568b

5 changed files with 130 additions and 44 deletions
```diff
@@ -31,8 +31,8 @@ pub struct FinetuneCandidate {
     pub continuation_ids: Vec<u32>,
     /// What the model would have said without memories (if generated).
     pub alternate_text: Option<String>,
-    /// Timestamp in millis for tracking trained status.
-    pub timestamp_ms: i64,
+    /// Timestamp in nanos — used as unique key for trained-set dedup.
+    pub timestamp_ns: i64,
 }
 
 #[derive(Clone, Debug, PartialEq)]
@@ -53,7 +53,7 @@ impl From<crate::subconscious::learn::FinetuneCandidate> for FinetuneCandidate {
             context_ids: c.context_ids,
             continuation_ids: c.continuation_ids,
             alternate_text: c.alternate_text,
-            timestamp_ms: c.timestamp_ms,
+            timestamp_ns: c.timestamp_ns,
         }
     }
 }
```
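As promised above, a hedged sketch of the wire shape implied by the commit message: build_token_ids() yields the prompt token ids plus the assistant-message ranges, and the request forwards the latter as score_ranges. Only score_ranges is named in the commit; the other field names and the exact endpoint schema are assumptions for illustration.

```rust
// Sketch of a /score request body carrying score_ranges.
use serde::Serialize;

#[derive(Serialize)]
struct ScoreRequest {
    /// Full prompt as token ids (first output of build_token_ids()).
    token_ids: Vec<u32>,
    /// Half-open [start, end) token positions of assistant messages;
    /// the server returns logprobs only at these positions.
    score_ranges: Vec<(usize, usize)>,
}

fn main() -> serde_json::Result<()> {
    // Toy prompt: positions 2..5 hold the assistant tokens we care
    // about, so a 5-token prompt returns just 3 logprobs.
    let req = ScoreRequest {
        token_ids: vec![101, 2023, 2003, 1037, 102],
        score_ranges: vec![(2, 5)],
    };
    println!("{}", serde_json::to_string_pretty(&req)?);
    Ok(())
}
```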