forked from kent/consciousness
context: tighten timestamp schema; every AstNode has one
Previously, NodeLeaf.timestamp and AstNode::Branch.timestamp accepted null or missing values via a deserialize_timestamp_or_epoch fallback. Legacy entries in conversation.jsonl from before Branch timestamps existed (and from before chrono serialization was wired up) would load with UNIX_EPOCH as a sentinel. Downstream, node_timestamp_ns() returned Option<i64>, and callers had to handle None as "old entry, skip."

That second filter was silently dropping every candidate in score_finetune_candidates when scoring an older session. The F6 screen showed "0 above threshold" even when max_divergence was orders of magnitude above the threshold, because every entry was failing the None check, not the divergence check.

The fix, in three parts:

1. src/bin/fix-timestamps.rs: a one-off migration tool that walks a conversation.jsonl, linearly interpolates timestamps for entries stuck at UNIX_EPOCH (using surrounding real timestamps as anchors), propagates them to child leaves with per-sibling ns offsets, and bumps any collisions by 1 ns for uniqueness. Ran against the current session's log: 11887 entries, 72289 ns bumps, all unique.
2. context.rs: drop default_timestamp and deserialize_timestamp_or_epoch. NodeLeaf and Branch now require a present, non-null timestamp on deserialize. Tests flip from "missing/null → UNIX_EPOCH" to "missing/null → Err."
3. subconscious/learn.rs: node_timestamp_ns now returns i64, not Option<i64>. The matching caller in score_finetune_candidates collapses from a Some/None match to a single trained-set check. mind/log.rs's oldest_timestamp no longer filters UNIX_EPOCH.

Every line currently on disk has already been migrated. Going forward, new AstNodes always carry real timestamps (Utc::now() at construction time), so the strict schema is the invariant, not an aspiration.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
parent 77822992c8
commit 080b4f9084
4 changed files with 210 additions and 71 deletions
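The migration in part 1 of the commit message can be sketched on a flat list of nanosecond timestamps. This is a minimal sketch, not the contents of src/bin/fix-timestamps.rs: it assumes the UNIX_EPOCH sentinel shows up as 0 ns, works on a plain slice instead of parsed conversation.jsonl entries, and does not model propagation to child leaves. The names EPOCH_NS, interpolate_epoch_runs, and make_unique are hypothetical, and the handling of runs that have only one anchor (stepping 1 ns away from it) is an assumption; the commit message does not say how the tool treats them.

```rust
use std::collections::HashSet;

/// Hypothetical sentinel: legacy entries loaded as UNIX_EPOCH, i.e. 0 ns.
const EPOCH_NS: i64 = 0;

/// Replace every maximal run of EPOCH_NS entries, using the nearest real
/// timestamps on either side of the run as anchors.
fn interpolate_epoch_runs(ts: &mut [i64]) {
    let n = ts.len();
    let mut i = 0;
    while i < n {
        if ts[i] != EPOCH_NS {
            i += 1;
            continue;
        }
        // ts[start..end] is a maximal run of sentinel entries.
        let start = i;
        let mut end = i;
        while end < n && ts[end] == EPOCH_NS {
            end += 1;
        }
        let len = end - start;
        let before = if start > 0 { Some(ts[start - 1]) } else { None };
        let after = if end < n { Some(ts[end]) } else { None };
        for k in 0..len {
            ts[start + k] = match (before, after) {
                // Both anchors: spread the run evenly over the open interval (lo, hi).
                // i128 avoids overflow in the multiplication.
                (Some(lo), Some(hi)) => {
                    let offset =
                        (hi as i128 - lo as i128) * (k as i128 + 1) / (len as i128 + 1);
                    (lo as i128 + offset) as i64
                }
                // Run at the start of the log (assumption): count back 1 ns per entry.
                (None, Some(hi)) => hi - (len - k) as i64,
                // Run at the end of the log (assumption): count forward 1 ns per entry.
                (Some(lo), None) => lo + k as i64 + 1,
                // No real timestamps at all: nothing to anchor on, leave unchanged.
                (None, None) => EPOCH_NS,
            };
        }
        i = end;
    }
}

/// Bump each duplicate up by 1 ns at a time until it is unique.
/// Returns the total number of 1-ns bumps performed.
fn make_unique(ts: &mut [i64]) -> usize {
    let mut seen = HashSet::with_capacity(ts.len());
    let mut bumps = 0;
    for t in ts.iter_mut() {
        // HashSet::insert returns false when the value is already present.
        while !seen.insert(*t) {
            *t += 1;
            bumps += 1;
        }
    }
    bumps
}
```

Interpolating first and deduplicating second keeps the two concerns separate: when two anchors are closer together than the run is long, integer division produces repeated values, and the uniqueness pass resolves them one nanosecond at a time.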
@@ -516,16 +516,11 @@ pub async fn score_finetune_candidates(
 
         let node = &entries[entry_idx];
 
-        // Get timestamp and skip if already trained
-        let timestamp_ns = match node_timestamp_ns(node) {
-            Some(ts) => {
-                if trained.contains(&ts) {
-                    continue; // Already trained, skip
-                }
-                ts
-            }
-            None => continue, // No timestamp, skip
-        };
+        // Skip if already trained on.
+        let timestamp_ns = node_timestamp_ns(node);
+        if trained.contains(&timestamp_ns) {
+            continue;
+        }
 
         // Extract response text
         let response_text = match node {
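The behavioral difference between the two versions of the candidate filter can be reproduced in isolation. In this sketch, Node, node_timestamp_ns_old, candidates_old, and candidates_new are hypothetical stand-ins (not the real AstNode or score_finetune_candidates), and 0 ns plays the role of the UNIX_EPOCH sentinel. On an unmigrated log every node maps to None, so the old filter returns nothing regardless of what the divergence scores would have been; on migrated data both filters agree.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for AstNode; only the timestamp matters here.
struct Node {
    timestamp_ns: i64,
}

/// Old contract: the UNIX_EPOCH sentinel (0 ns here) maps to None.
fn node_timestamp_ns_old(node: &Node) -> Option<i64> {
    if node.timestamp_ns == 0 {
        None
    } else {
        Some(node.timestamp_ns)
    }
}

/// Old filter: None is skipped before any later (divergence) check can run.
fn candidates_old(nodes: &[Node], trained: &HashSet<i64>) -> Vec<i64> {
    let mut out = Vec::new();
    for node in nodes {
        let ts = match node_timestamp_ns_old(node) {
            Some(ts) => {
                if trained.contains(&ts) {
                    continue; // already trained
                }
                ts
            }
            None => continue, // legacy entry: silently dropped
        };
        out.push(ts);
    }
    out
}

/// New filter: every node has a timestamp, so only the trained-set check remains.
fn candidates_new(nodes: &[Node], trained: &HashSet<i64>) -> Vec<i64> {
    nodes
        .iter()
        .map(|node| node.timestamp_ns)
        .filter(|ts| !trained.contains(ts))
        .collect()
}
```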
@@ -661,18 +656,15 @@ pub fn mark_trained(timestamp_ns: i64) {
 }
 
 /// Get timestamp in nanoseconds from an AstNode.
-/// Returns None for entries with default UNIX_EPOCH timestamp (old data)
-/// or timestamps outside the representable nano range (pre-1677 or post-2262).
-pub fn node_timestamp_ns(node: &AstNode) -> Option<i64> {
+/// i64-ns representation covers 1677..2262 via chrono; timestamps
+/// outside that window would be bugs we'd want to surface, hence panic.
+pub fn node_timestamp_ns(node: &AstNode) -> i64 {
     let ts = match node {
         AstNode::Leaf(leaf) => leaf.timestamp(),
         AstNode::Branch { timestamp, .. } => *timestamp,
     };
-    if ts == chrono::DateTime::UNIX_EPOCH {
-        None // Old entry without real timestamp
-    } else {
-        ts.timestamp_nanos_opt()
-    }
+    ts.timestamp_nanos_opt()
+        .expect("timestamp outside i64-ns representable range (1677..2262)")
 }
 
 // ── Training API ────────────────────────────────────────────────
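The real function relies on chrono's DateTime::timestamp_nanos_opt, which returns None for instants outside the i64-nanosecond window (roughly 1677 to 2262); the new code turns that None into a panic with expect. The sketch below shows the same design choice using only the standard library's SystemTime, so it is an analog rather than the project's code, and system_time_to_ns is a hypothetical name.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Convert a SystemTime to signed nanoseconds since the Unix epoch.
/// Panics if the value does not fit in i64 (roughly years 1677..2262),
/// mirroring the "surface it as a bug" choice in node_timestamp_ns.
fn system_time_to_ns(t: SystemTime) -> i64 {
    match t.duration_since(UNIX_EPOCH) {
        // At or after the epoch: as_nanos() is u128, so narrow with a checked conversion.
        Ok(after) => i64::try_from(after.as_nanos())
            .expect("timestamp outside i64-ns representable range"),
        // Before the epoch: the error carries how far before it the time is.
        Err(err) => {
            let before = i64::try_from(err.duration().as_nanos())
                .expect("timestamp outside i64-ns representable range");
            -before
        }
    }
}
```

Returning i64 and panicking keeps an out-of-range timestamp from turning into another silent skip, which is the failure mode this commit removes from score_finetune_candidates.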
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue