Commit graph

358 commits

Author SHA1 Message Date
Kent Overstreet
a075e30557 http: add HttpResponse::bytes() for binary downloads
Mirror of text(), but returns raw Bytes without lossy UTF-8 conversion.
Needed by the Telegram channel to fetch photo files.
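
A minimal sketch of why a separate bytes() accessor matters. `HttpResponse` here is a stand-in with only a body field, and returning `&[u8]` rather than a `Bytes` value is an assumption made to keep the example std-only:

```rust
/// Stand-in for the real response type; only the body is modeled here.
pub struct HttpResponse {
    pub body: Vec<u8>,
}

impl HttpResponse {
    /// Lossy: every invalid UTF-8 sequence becomes U+FFFD, which corrupts
    /// binary payloads such as JPEG data.
    pub fn text(&self) -> String {
        String::from_utf8_lossy(&self.body).into_owned()
    }

    /// Raw body, byte for byte.
    pub fn bytes(&self) -> &[u8] {
        &self.body
    }
}
```

Calling text() on the first three bytes of a JPEG yields three replacement characters, while bytes() returns the payload unchanged.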

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-05-01 17:58:35 -04:00
Kent Overstreet
09896cd38b Revert "replace try_lock() with lock_blocking() across UI thread"
This reverts commit 4225294d16.
2026-04-25 17:15:53 -04:00
Kent Overstreet
4225294d16 replace try_lock() with lock_blocking() across UI thread
Add lock_blocking() to TrackedMutex: blocks current thread using
block_in_place + futures::executor::block_on, safe for sync contexts.

Replace all try_lock() calls with lock_blocking() in slash commands,
UI rendering, and status reads. Lock hold times are fast enough that
blocking briefly is fine, and this eliminates the spurious 'lock
unavailable' paths that were never actually hit.

Kept rx_mutex.try_lock() in mod.rs (std::sync::Mutex for stderr rx).
2026-04-25 15:35:14 -04:00
Kent Overstreet
5210f7dd66 context: heal pre-refactor image logs with token_count=0
Recompute image token counts from persisted dimensions when loading
old logs that stored count=0 (server-authoritative count was applied
after AppendImage before client-side pad expansion).

graph: cache neighbor sets for clustering coefficient

Pre-compute neighbor HashSets so the O(deg^2) triangle-counting
inner loop doesn't re-allocate on every (i,j) pair.
avg_clustering_coefficient() now builds the cache once instead of
O(N*deg) times.
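
A sketch of the cached-neighbor-set approach, assuming an undirected graph stored as adjacency lists with no self-loops or duplicate edges. The adjacency-list input shape is an assumption; the real graph type is not shown here:

```rust
use std::collections::HashSet;

/// Average local clustering coefficient. Nodes with degree < 2 contribute 0.
pub fn avg_clustering_coefficient(adj: &[Vec<usize>]) -> f64 {
    if adj.is_empty() {
        return 0.0;
    }
    // Build every neighbor set exactly once, before any triangle counting.
    let sets: Vec<HashSet<usize>> = adj
        .iter()
        .map(|ns| ns.iter().copied().collect())
        .collect();

    let mut total = 0.0;
    for ns in adj {
        let deg = ns.len();
        if deg < 2 {
            continue;
        }
        // O(deg^2) inner loop: count connected neighbor pairs via cached sets.
        let mut triangles = 0usize;
        for i in 0..deg {
            for j in (i + 1)..deg {
                if sets[ns[i]].contains(&ns[j]) {
                    triangles += 1;
                }
            }
        }
        let possible = deg * (deg - 1) / 2;
        total += triangles as f64 / possible as f64;
    }
    total / adj.len() as f64
}
```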
2026-04-25 15:15:21 -04:00
Kent Overstreet
371b40078d context: salvage in-flight tag accumulators on premature stream end
ResponseParser.finish() was only flushing self.buf — the rolling tail
window — and silently dropping self.think_buf and self.tool_call_buf.
When a stream ended inside an unterminated <think>...</think> or
<tool_call>...</tool_call> block (max_tokens reached, EOS before the
close tag, server-side cancel), all the accumulated in-tag content
was discarded and only the trailing ~8 bytes survived (drain_safe
keeps `close_tag.len()` bytes at the tail of buf to handle
across-chunk tag splits — and `</think>` is exactly 8 chars).

Symptom: assistant responses cut off, only the last few characters
come through. Especially severe in native-think mode where in_think
is set from prefill, so the entire response accumulates in
think_buf and gets wiped on premature stop.

In finish(): if in_think, drain buf into think_buf and emit as a
Thinking node (preserving the partial thought). If in_tool_call,
attempt to parse the body; on parse failure, wrap the partial as
content with the leading <tool_call> open tag so the model sees its
own truncated attempt next turn rather than losing it.
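
A simplified sketch of the finish() logic described above. The `Node` enum and the public fields are stand-ins, the tool-call branch models only the parse-failure path, and draining buf into tool_call_buf is an assumption:

```rust
#[derive(Debug, PartialEq)]
pub enum Node {
    Content(String),
    Thinking(String),
}

#[derive(Default)]
pub struct ResponseParser {
    pub buf: String,           // rolling tail window
    pub think_buf: String,     // text accumulated inside <think>
    pub tool_call_buf: String, // text accumulated inside <tool_call>
    pub in_think: bool,
    pub in_tool_call: bool,
}

impl ResponseParser {
    /// Flush whatever is in flight when the stream ends early.
    pub fn finish(mut self) -> Vec<Node> {
        let mut out = Vec::new();
        if self.in_think {
            // Keep the partial thought, including the held-back tail bytes.
            self.think_buf.push_str(&self.buf);
            if !self.think_buf.is_empty() {
                out.push(Node::Thinking(self.think_buf));
            }
        } else if self.in_tool_call {
            // Parse-failure path only: re-emit the partial body as content,
            // with the open tag restored.
            self.tool_call_buf.push_str(&self.buf);
            out.push(Node::Content(format!("<tool_call>{}", self.tool_call_buf)));
        } else if !self.buf.is_empty() {
            out.push(Node::Content(self.buf));
        }
        out
    }
}
```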

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 23:32:44 -04:00
Kent Overstreet
c2433c1773 context: tighten the Branch token-cache invariant
Two pieces around the cache that landed when Branch nodes started
holding `token_ids: Some(server_authoritative_stream)`:

1. wire_into / wire_chunks now pair cached vision blocks with their
   child Image leaves. Previously the cached-branch arm spliced the
   cache verbatim and didn't recurse for images, so a Branch whose
   cache contained `VISION_START..VISION_END` blocks would emit those
   tokens with no matching `WireImage` push — leading to a panic
   downstream when `pair_images_to_ranges` tried to attach the
   missing image. New `pair_cached_images` walks the children
   depth-first for image leaves and zips them against
   `vision_blocks(cache)` to emit correctly-offset entries; mismatched
   counts panic loudly because that's an AST/cache invariant
   violation that would otherwise mis-pair on the wire.

2. `conversation_mut() -> &mut Vec<AstNode>` was the one public
   escape hatch that let callers reach into a Branch's children and
   mutate them without invalidating the cached token stream. Removed
   in favor of a focused `set_branch_memory_score(section, index,
   key, score)` for the only legitimate use we had today (the
   full-matrix scorer writing per-memory divergence onto the
   Assistant Branch). Updated the lone caller in subconscious/learn.
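
A sketch of the pairing in item 1, under several assumptions: the token IDs are placeholders, image leaves are represented by plain indices, and `pad_start`/`pad_end` are taken to bound the tokens between the start and end markers (end exclusive):

```rust
use std::ops::Range;

// Placeholder token IDs; the real values come from the tokenizer.
pub const VISION_START: u32 = 1001;
pub const VISION_END: u32 = 1002;

/// Index ranges of `VISION_START..=VISION_END` blocks in a cached stream.
pub fn vision_blocks(cache: &[u32]) -> Vec<Range<usize>> {
    let mut blocks = Vec::new();
    let mut start = None;
    for (i, &t) in cache.iter().enumerate() {
        if t == VISION_START {
            start = Some(i);
        } else if t == VISION_END {
            if let Some(s) = start.take() {
                blocks.push(s..i + 1);
            }
        }
    }
    blocks
}

#[derive(Debug, PartialEq)]
pub struct WireImage {
    pub image: usize,     // which image leaf this entry belongs to
    pub pad_start: usize, // absolute position of the first token after VISION_START
    pub pad_end: usize,   // absolute position of VISION_END (exclusive bound)
}

/// Zip image leaves (already collected depth-first) against cached blocks.
/// `base` is the absolute offset of the cache in the full walk.
pub fn pair_cached_images(cache: &[u32], image_leaves: &[usize], base: usize) -> Vec<WireImage> {
    let blocks = vision_blocks(cache);
    // A count mismatch is an AST/cache invariant violation: fail loudly.
    assert_eq!(blocks.len(), image_leaves.len(), "vision blocks and image leaves differ");
    blocks
        .into_iter()
        .zip(image_leaves)
        .map(|(b, &image)| WireImage {
            image,
            pad_start: base + b.start + 1,
            pad_end: base + b.end - 1,
        })
        .collect()
}
```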

Documented the invariants explicitly on `ContextState`: every
`Leaf.token_ids` matches `body.compute_token_ids()`, and every
`Branch { token_ids: Some(_) }` is a faithful walk of its children.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 23:15:55 -04:00
Kent Overstreet
10c8878f1c agent: bump tonic gRPC message caps to 64 MiB
The default 4 MiB cap on encoded/decoded messages is too small for
the multimodal Generate path: Qwen3.6-VL high-res patches put 5–8 MiB
of pre-encoded image bytes inline in a single Generate request, and
Done events carrying full per-token readout vectors can also exceed
4 MiB on long runs. Hit "ResourceExhausted: Received message larger
than max (5799108 vs. 4194304)" from the salience server.

Bump both encode and decode caps on every cloned SalienceClient. The
matching server-side bump is in vllm/entrypoints/salience/server.py.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 22:36:10 -04:00
Kent Overstreet
fe232cf292 salience: client-side pad expansion, drop AppendImage
Mirrors the vLLM-side rewrite. AppendImage is gone; images now
ride along on Generate via a parallel `images` list.

- Productionize `qwen3_image_token_count` (was test-only). Image
  leaf computes its IMAGE_PAD count eagerly at construction from
  height/width; `token_count` is no longer "0 until the server
  tells us."
- WireChunk shrinks to a single `Tokens(Vec<u32>)` variant — vision
  blocks live inline in the token stream.
- `wire_chunks` now returns `(Vec<WireChunk>, Vec<WireImage>)`.
  `WireImage` carries `pad_start` / `pad_end` (absolute positions
  in the full walk) alongside bytes + mime.
- `assemble_prompt` returns `(chunks, images, match_upto)`.
- `stream_session_mm` / `run_session_generate` take the parallel
  images list, filter to those past `match_upto`, and pass them
  in `GenerateRequest.images` as `pb::ImageAttachment` entries.
- Drop `SessionHandle::append_image`,
  `ContextState::commit_image_token_counts`,
  `StreamToken::ImageAppended`, the WireChunk::Image branch in
  `learn.rs`, and the now-empty `prompt_to_chunks` helper.
- Add 'v' toggle on the conscious-screen tree to render token-id
  vectors in place of text content (debug-aid: lets us see what
  the server actually has when output is suspicious).
- Comment out the subconscious-trigger spawn loop — Kent had this
  disabled before; it had crept back into running.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 20:26:47 -04:00
Kent Overstreet
4feebb7bc4 agent: share one tonic Channel + migrate scoring to gRPC Generate
Two changes that bolt together — the shared connection means the new
scoring path actually costs one HTTP/2 handshake across the whole
process instead of one-per-RPC.

ApiClient gains `salience_channel: Arc<OnceCell<Channel>>`. First
call to `ApiClient::salience_client()` opens the channel via
`connect_channel()` and stores the Channel; subsequent calls clone
it (cheap — tonic multiplexes concurrent RPCs over the single
HTTP/2 connection). Every ApiClient clone shares the same OnceCell,
so all agents spawned from Mind's client — plus every ephemeral
scoring session — reuse one connection.
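
A std-only sketch of the sharing pattern. `Channel` is a stand-in for tonic's type, `std::sync::OnceLock` replaces whichever OnceCell the code actually uses, and the connect counter exists only to make the behavior observable:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Stand-in for a cheaply clonable connection handle.
#[derive(Clone, Debug)]
pub struct Channel {
    pub id: usize,
}

#[derive(Clone, Default)]
pub struct ApiClient {
    salience_channel: Arc<OnceLock<Channel>>,
    connects: Arc<AtomicUsize>, // illustration only: counts connection attempts
}

impl ApiClient {
    fn connect_channel(&self) -> Channel {
        let n = self.connects.fetch_add(1, Ordering::SeqCst);
        Channel { id: n }
    }

    /// First call connects; every later call, on any clone, reuses the channel.
    pub fn salience_client(&self) -> Channel {
        self.salience_channel
            .get_or_init(|| self.connect_channel())
            .clone()
    }

    pub fn connect_count(&self) -> usize {
        self.connects.load(Ordering::SeqCst)
    }
}
```

Because the cell sits behind an `Arc`, cloning the client clones the pointer, not the cell, so a fork made before the first connection still lands on the same channel.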

SessionHandle refactored to hold an `ApiClient` clone instead of
a bag of (base_url, api_key) strings. `open` / `append_image` /
`generate` go through `self.client.salience_client()` now. New
`prefill_only(tokens)` method encapsulates the "Generate with
max_tokens=0 to append text" pattern (previously a private free
function in api/mod.rs called `flush_pending`). Drop impl on
SessionHandle stays — still fires CloseSession on the shared
channel in a detached task.

`run_session_generate` switched from `(base_url, api_key, model)`
to `&ApiClient`; the agent-turn flow that uses it keeps the same
shape but `stream_session_mm` clones the ApiClient into the
spawned worker.

learn.rs migrated from the HTTP `/v1/score` endpoint to a gRPC
session-based score:

  * `call_score` opens an ephemeral SessionHandle on the client,
    converts (prompt_tokens, images) → Vec<WireChunk> via the new
    `prompt_to_chunks` helper (splits on VISION_START/VISION_END),
    walks chunks calling `prefill_only` + `append_image`, runs a
    final Generate with `max_tokens=0` + `logprobs_ranges` over
    the scored positions, and sums each Token event's
    `sampled_logprob` per range to produce `ScoreResult`s.

  * SessionHandle drops at end of scope → CloseSession auto-fires,
    keeping the server's session map clean between calls.

  * No more HTTP path, no more `http_client()` helper, no more
    `ScoreResponse` / serde plumbing for /v1/score.

  * `send_to_train` still uses HTTP (it talks to /v1/train which
    isn't on the gRPC protocol); its ad-hoc HTTP client lives
    inline now instead of reaching for the deleted `http_client()`.
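
The per-range summation in `call_score` can be sketched as below. `TokenEvent` and its `position` field are hypothetical stand-ins for the generated protobuf type; only `sampled_logprob` is named in this commit:

```rust
use std::ops::Range;

/// Hypothetical shape of a streamed Token event.
pub struct TokenEvent {
    pub position: u32,
    pub sampled_logprob: f32,
}

/// One total per scored range: the sum of sampled logprobs at the
/// positions that fall inside it.
pub fn sum_per_range(events: &[TokenEvent], ranges: &[Range<u32>]) -> Vec<f32> {
    ranges
        .iter()
        .map(|r| {
            events
                .iter()
                .filter(|e| r.contains(&e.position))
                .map(|e| e.sampled_logprob)
                .sum::<f32>()
        })
        .collect()
}
```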

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 12:51:53 -04:00
Kent Overstreet
be6ba4e9a5 agent: bundle sampling fields as SamplingParams on AgentState
Collapse the split `temperature` / `top_p` / `top_k` fields on
AgentState into a single `sampling: SamplingParams` struct, mirroring
how the wire-level fields flow into the Generate RPC. Adds
`max_tokens` to SamplingParams so it's actually plumbed end to end
(previously the client had a hardcoded 4096 fallback inside
`run_session_generate`).

AgentState construction sites now set `sampling: SamplingParams { ...
max_tokens: 4096 }` as the default. The assignment sites in
oneshot.rs / subconscious.rs / unconscious.rs switch from
`st.temperature = X` to `st.sampling.temperature = X`.
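
A sketch of the bundled shape. Only `max_tokens: 4096` comes from this commit; the field types and the other default values are placeholders:

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
    pub max_tokens: u32,
}

impl Default for SamplingParams {
    fn default() -> Self {
        // Placeholder defaults, except max_tokens.
        Self { temperature: 1.0, top_p: 1.0, top_k: 0, max_tokens: 4096 }
    }
}

/// Reduced to the one field relevant here.
#[derive(Default)]
pub struct AgentState {
    pub sampling: SamplingParams,
}
```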

`stream_session_mm` takes `SamplingParams` directly; the
`sampling_max_tokens()` helper goes away. `pb::GenerateRequest` is
populated with `sampling.max_tokens` (and the other fields) in
`run_session_generate`. SamplingParams is `pub` so it can be
embedded in the public AgentState without a visibility warning.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 12:37:20 -04:00
Kent Overstreet
8d9c9e9f7b agent: end-to-end gRPC Generate with delta-based session orchestration
Wires the client side of the new salience protocol so inference
actually runs over gRPC instead of emitting the stubbed "not yet
wired" error. Each turn walks the AST as interleaved chunks, sends
only what's new to the server, and streams decode tokens back.

context.rs:
  * `WireChunk` enum: `Tokens(Vec<u32>)` or `Image { bytes, mime,
    known_expanded_len }`. Preserves text/image/text ordering the
    wire path can't flatten.
  * `wire_chunks(range, skip)` walker, parallel to `wire_prompt` —
    branches emit `<|im_start|>…<|im_end|>` tokens, image leaves
    emit a single Image chunk (no inline vision tokens).
  * `NodeLeaf::set_image_token_count(n)` + recompute of cached
    `token_ids`; `ContextState::commit_image_token_counts(&[u32])`
    fills in the first-N zero-count image leaves in wire order.
  * `ResponseParser::run` handles the new
    `StreamToken::ImageAppended` by committing the server's N into
    the AST before the final Generate's Token events stream in.

salience.rs:
  * `SessionHandle` tracks `committed_len`. `append_image` advances
    it from the RPC response. New `generate(req)` opens the
    server-streaming RPC.

api/mod.rs:
  * `stream_session_mm(session_lock, chunks, sampling, priority,
    readout_shape)` replaces the stub. Spawns `run_session_generate`.
  * `run_session_generate`: takes the session out of the Mutex (or
    opens fresh), skips chunks covered by `committed_len` (bails on
    mid-chunk straddle or unknown-length image in the committed
    prefix), walks the delta: accumulates Tokens into `pending`, on
    Image flushes pending via `flush_pending` (max_tokens=0 Generate
    that just prefills), then AppendImage + emits
    StreamToken::ImageAppended. Final Generate carries any trailing
    pending text as `append_tokens` and the sampling params; Token
    events stream out as StreamToken::Token, Done as
    StreamToken::Done. On success, handle with updated
    `committed_len` returns to the Mutex; on error, handle drops
    and next call reopens.
  * `StreamToken::ImageAppended { placeholder_count }` variant —
    emitted in wire order before the final Generate's tokens.
  * Prefix-cache cap for readout coverage: `readout_ranges` covers
    `[prompt_len_after_append, u32::MAX)` when the caller provides
    a readout_shape, so decode positions stream their readouts.
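
The committed-prefix skip in `run_session_generate` can be sketched for the tokens-only case. This ignores Image chunks, and the function name and error strings are illustrative:

```rust
/// Returns the chunks still to be sent, given that the server already holds
/// the first `committed_len` tokens. Fails if the boundary falls inside a
/// chunk or beyond the end of the walk.
pub fn delta(chunks: &[Vec<u32>], committed_len: usize) -> Result<&[Vec<u32>], String> {
    let mut covered = 0;
    for (i, chunk) in chunks.iter().enumerate() {
        if covered == committed_len {
            return Ok(&chunks[i..]);
        }
        if covered + chunk.len() > committed_len {
            return Err(format!("committed_len {committed_len} straddles chunk {i}"));
        }
        covered += chunk.len();
    }
    if covered == committed_len {
        // Everything is already on the server: nothing left to send.
        Ok(&chunks[chunks.len()..])
    } else {
        Err(format!("committed_len {committed_len} exceeds walk length {covered}"))
    }
}
```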

agent/mod.rs:
  * `assemble_prompt` returns `Vec<WireChunk>` with the assistant
    prologue merged into the trailing Tokens chunk. Caller in
    `turn` passes chunks + readout_shape (pulled from
    `agent.readout.lock().manifest`) to `stream_session_mm`.
  * Dropped `assemble_prompt_tokens` — dead.

mind + unconscious:
  * `Unconscious::new(client)` stores a shared `ApiClient`. Fixes
    the repeated-manifest-fetch bug caused by each subagent's
    `ApiClient::new` having its own OnceCell. The client's Arc-
    wrapped manifest cache is now shared across every agent Mind
    spawns.
  * `prepare_spawn(name, auto, wake, base_client)` clones the base
    client and overrides `.model` for the resolved backend instead
    of constructing fresh. All three callers
    (`toggle`/`trigger`/unconscious loop) pass `self.client.clone()`.
  * `Mind::new` passes `agent.client.clone()` into
    `Unconscious::new`.

subconscious/generate.rs:
  * gen_continuation switched to `wire_chunks` + the new
    `stream_session_mm` signature. Ephemeral session opens on each
    call, tears down at scope end. No readouts requested.

Not changed yet, noted for follow-up:
  * Subconscious ablation scoring in learn.rs still talks to
    `/v1/score` over HTTP. Will migrate once we have time to verify
    the Generate+max_tokens=0+prompt_logprobs path end-to-end.
  * compare.rs constructs its own ApiClient for the
    `compare.test_backend` (which is intentionally a different
    endpoint) — left alone.
  * Readout manifest still fetched via HTTP at Agent::new.
    Migration to GetReadoutManifest gRPC is a separate cleanup.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 12:27:55 -04:00
Kent Overstreet
08213f9093 salience: add gRPC client + TLS plumbing for stateful vllm sessions
Adds the client-side of a stateful gRPC protocol against vllm, plus
the TLS trust machinery so we can talk to self-signed vllm servers.

Protocol (proto/salience.proto):
  Bidi-streaming Session RPC carries OpenSession / AppendTokens /
  Generate / Cancel from client and SessionReady / PrefillProgress /
  Token / GenerateDone / Error from server. Separate Fork unary RPC
  for cheap branching (prefix cache shares KV automatically). Plus
  ListSessions, CloseSession, GetReadoutManifest admin RPCs.

  Per-token readouts ship as packed f32 ([n_layers * n_concepts] per
  token, flat). Logprobs use range-selected positions plus a top-k
  parameter — empty ranges means no logprobs, any range means emit
  sampled-token logprob at those positions, top_k > 0 adds
  alternatives.

Client (src/agent/api/salience.rs):
  Tonic-generated types under pb::, a connect() helper, with_auth()
  for bearer metadata, and a Session handle wrapping the bidi stream:
  open() handshakes SessionReady; append() is fire-and-forget;
  generate() returns impl Stream<Item = Event> that drains inbound
  until Done or terminating Error. One generate at a time per session.

Peak picker (src/agent/salience.rs):
  Pure function over ReadoutEntry traces. Per-concept z-score against
  trace global stats; contiguous above-threshold regions emit one
  peak at the local max. Configurable sigma threshold and min-std
  safety floor. Deterministic tie-break on offset then concept name.
  12 unit tests covering empty traces, flat channels, single/multi
  spikes, contiguous humps, multi-concept independence, trailing
  runs, sub-threshold noise, layer-out-of-range, manifest shape
  mismatch, and threshold tunability.
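
A single-concept sketch of the peak-picking rule. It assumes population standard deviation, treats the min-std floor as a clamp on the divisor, and breaks ties toward the earlier offset; the real function works over ReadoutEntry traces and multiple concepts:

```rust
/// Offsets of peaks in one concept's trace: z-score each sample against the
/// trace-wide mean and std, then emit the max of each contiguous run at or
/// above `sigma`.
pub fn pick_peaks(trace: &[f32], sigma: f32, min_std: f32) -> Vec<usize> {
    if trace.is_empty() {
        return Vec::new();
    }
    let n = trace.len() as f32;
    let mean = trace.iter().sum::<f32>() / n;
    let var = trace.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    // Safety floor: a flat channel gets z = 0 everywhere instead of dividing by ~0.
    let std = var.sqrt().max(min_std);

    let mut peaks = Vec::new();
    let mut best: Option<usize> = None; // max of the run in progress
    for (i, &v) in trace.iter().enumerate() {
        if (v - mean) / std >= sigma {
            // Strict > keeps the earliest offset on ties.
            if best.map_or(true, |b| v > trace[b]) {
                best = Some(i);
            }
        } else if let Some(b) = best.take() {
            peaks.push(b);
        }
    }
    if let Some(b) = best {
        peaks.push(b); // run still open at the end of the trace
    }
    peaks
}
```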

TLS (src/agent/api/http.rs):
  HttpClient::build now also loads every .pem file under
  ~/.consciousness/certs/ into the rustls root store — so dropping
  a <host>.pem in that directory is enough to trust a new self-
  signed server; no code changes per new host. Also installs the
  rustls default crypto provider explicitly via OnceLock: tonic's
  tls features pulled in both ring and aws-lc-rs on the resolver
  path, and rustls 0.23 refuses to auto-pick when either could win.

Build (build.rs, Cargo.toml):
  tonic-build generates Rust types from proto/salience.proto at
  cargo-build time, using a vendored protoc binary
  (protoc-bin-vendored) so no system install is required. New
  runtime deps: tonic, prost, async-stream, tokio-stream,
  rustls-pemfile.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 11:56:32 -04:00
Kent Overstreet
28d56e2a55 agent/context: make Thinking blocks prompt-visible
Thinking blocks used to render as empty strings and be excluded from
is_prompt_visible, so the model never saw its own prior CoT across
turns. For Qwen 3.6 native thinking mode, CoT is meant to stay in the
conversation — the model benefits from seeing what it reasoned about
last turn.

Render Thinking as <think>\n{text}\n</think>\n so past reasoning is
visible in subsequent prompts. Add in_think param to ResponseParser::new
so the parser starts inside a <think> block when the prompt was
prefilled with "<think>\n" (native thinking mode).

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 11:54:25 -04:00
Kent Overstreet
5f06577ead tools/web: add gemini_search as an alternative search tool (#5)
Issue #5 (spqrz) flagged that web_search using DuckDuckGo
occasionally flakes out, and Google search directly is blocked
behind CAPTCHAs for non-browser clients. The Gemini free-tier API
exposes a grounded-search tool that effectively queries Google's
index and returns an LLM-summarized answer with source URLs.

Added as a SEPARATE tool rather than a transparent fallback for
web_search:

* web_search (DDG) returns raw results — title, URL, snippet per
  hit — which the agent can reason over itself.
* gemini_search returns an LLM-pre-digested summary plus grounding
  URLs. Useful for synthesis queries ("what's the consensus on X")
  or when DDG is flaky, but it's another LLM in the loop so the
  agent may want the raw variant for certain tasks.

Tool descriptions tell the agent to prefer web_search for raw
results and use gemini_search for synthesis / fallback. The agent
picks based on query shape.

Only registered when GEMINI_API_KEY is set in the environment
(gracefully absent otherwise). Uses gemini-2.0-flash which has a
generous free-tier rate limit. Parses grounding metadata for
source URLs so the agent can follow links.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 13:02:01 -04:00
Kent Overstreet
c7b0052f1d agent: kill no_compact, add pre-send size check in assemble_prompt
Two related fixes for last night's crash diagnosis:

1. Kill AgentState::no_compact. The reasoning ("forked agents
   shouldn't compact because it blows the KV cache prefix") wasn't
   worth the cost — forks with no compact recovery just *died* on
   any oversize prompt, with no fallback. The KV cache invalidation
   is a performance loss; failing the request entirely is a
   correctness loss. Remove the flag, let every agent's overflow-
   retry path call compact() up to 2 times.

2. Add pre-send size check in Agent::assemble_prompt. If the
   context has grown past budget (context_window * 80%) since the
   last compact — accumulation between turns, a fork assembling
   more than expected, etc. — trim_conversation() is called before
   wire_prompt. Since we tokenize client-side, we already know the
   exact count, so there's no reason to round-trip an oversize
   request to vLLM and get rejected.
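
A sketch of the pre-send check in item 2. The 80% budget comes from this commit; dropping the oldest entries first is an assumption about what trim_conversation() does, and entries are modeled as bare token counts:

```rust
/// Token budget: 80% of the context window, in integer math.
pub fn budget(context_window: usize) -> usize {
    context_window * 80 / 100
}

/// Drop oldest entries until the total fits the budget.
/// Returns how many entries were dropped.
pub fn trim_to_budget(entries: &mut Vec<usize>, context_window: usize) -> usize {
    let limit = budget(context_window);
    let mut total: usize = entries.iter().sum();
    let mut dropped = 0;
    while total > limit && !entries.is_empty() {
        total -= entries.remove(0);
        dropped += 1;
    }
    dropped
}
```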

Together these prevent the failure mode from last night: a
subconscious/unconscious agent's prompt exceeded max_model_len,
vLLM returned 400, agent had no_compact=true so it couldn't
recover, request failed. Now: the trim happens before send, so
the request rarely hits the 400 path at all; and if it somehow
does, compact+retry works for every agent.

Also adds ContextState::total_tokens() as the cheap pre-send
budget check.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 12:59:30 -04:00
Kent Overstreet
4245b8bdb3 Merge PR #4: use html2md on web_fetch (fixes #3) (spqrz)
web_fetch was returning raw HTML, which is verbose and hard for
the agent to consume. Add html2md dependency and convert HTML to
Markdown before truncation. Much cleaner output for normal pages;
no downsides.

Co-Authored-By: spqrz <spqrz386@gmail.com>
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 12:50:54 -04:00
Kent Overstreet
8952ff6a76 agent/readout: forks get independent buffers
Subconscious agents (scoring, reflection, etc.) fork from the main
conscious agent. The amygdala screen reads the main agent's readout
buffer, so the previous "share parent's buffer" policy caused
forked-agent generations to bleed into the main emotional readout,
producing constant cycling even when DMN was resting.

Each fork now gets its own SharedReadoutBuffer. The amygdala screen
shows only the main conscious agent's emotional trajectory; per-agent
subconscious readouts can become a separate view later if wanted.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:42:13 -04:00
Kent Overstreet
c8976660f4 amygdala: F8 screen for live concept-readout projections
Per-token residual-stream projections from the vLLM server's readout
pipeline surfaced as a TUI bar chart. Flow:

* agent/readout.rs — SharedReadoutBuffer (manifest + ring of last ~200
  token entries). Lives on Agent and is shared across forks (single
  stream, one landing pad).
* agent/mod.rs — Agent::new now probes /v1/readout/manifest at startup
  (non-fatal; 404 leaves manifest None, which disables the screen).
* agent/context.rs — the streaming token handler pushes every token
  with attached readout onto the shared buffer.
* user/amygdala.rs — F8 screen. Top-K concepts by |value| as
  horizontal bars (green positive, red negative), plus a 4-line
  recent-tokens panel showing each token's top concept at the selected
  layer. Keys: 1..9 select layer, t toggles current/mean-over-recent.
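
The top-K selection on the F8 screen amounts to a sort by magnitude. A sketch, assuming one layer's readout arrives as a flat slice indexed by concept:

```rust
/// The `k` entries with the largest |value|, as (concept index, value),
/// strongest first. The sign is kept so the UI can color the bar.
pub fn top_k_by_abs(values: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = values.iter().copied().enumerate().collect();
    // Stable sort, descending by magnitude.
    indexed.sort_by(|a, b| b.1.abs().total_cmp(&a.1.abs()));
    indexed.truncate(k);
    indexed
}
```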

Disabled state renders a hint pointing at VLLM_READOUT_MANIFEST /
VLLM_READOUT_VECTORS so users can tell the feature apart from
"server up but no tokens yet".

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:20:30 -04:00
Kent Overstreet
0f1c4cf1de agent/api: carry readout alongside streamed tokens
StreamToken::Token is now a struct variant with an optional
TokenReadout (shape [n_layers][n_concepts]) per token — parsed from
the vLLM completion response's choices[i].readout field when the
server has readout enabled.

ApiClient gains a fetch_readout_manifest() method that hits
GET /v1/readout/manifest. Returns Ok(None) on 404 (server has
readout disabled), so callers can gracefully fall back when pointed
at a non-readout-enabled endpoint.
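
The 404 handling can be sketched as a pure mapping from status code to result. `Manifest`, the function name, and the string error type are placeholders; the real method performs the HTTP request itself:

```rust
#[derive(Debug, PartialEq)]
pub struct Manifest {
    pub raw: String, // unparsed body; the real type is structured
}

/// 404 means "readout disabled" and is not an error.
pub fn manifest_from_response(status: u16, body: &str) -> Result<Option<Manifest>, String> {
    match status {
        404 => Ok(None),
        200..=299 => Ok(Some(Manifest { raw: body.to_string() })),
        other => Err(format!("unexpected status {other}")),
    }
}
```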

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:15:46 -04:00
Kent Overstreet
43e06daa5b cleanup: drop dead ApiClient::stream_completion wrapper, silence dmn_tick
stream_completion was a thin wrapper around stream_completion_mm (just
passing an empty image list); the last caller switched to _mm directly
when learn's generate_alternate gained image support. Delete the
wrapper — callers can pass `&[]` if they have no images.

MindState::dmn_tick has been sitting unused (called only from a
commented-out block in the Mind loop). Rename to _dmn_tick so the
compiler stops warning; Kent may uncomment the call path later.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 16:23:59 -04:00
Kent Overstreet
575325e855 mind: MindTriggered trait for background scoring flows
Mind's impl had accumulated ~50 lines of setup glue per scoring flow
(memory, memory-full, finetune): snapshot config, clone handles,
resolve context, spawn task, route results back through BgEvent,
write stats. The shape was identical; only the middle changed.

Introduce the MindTriggered trait:

    pub trait MindTriggered {
        fn trigger(&self);
    }

Each flow becomes a struct next to its scoring code that owns its
dependencies and a JoinHandle (behind a sync Mutex for interior
mutability):

    subconscious::learn::MemoryScoring    (Score, ScoreFull)
    subconscious::learn::FinetuneScoring  (ScoreFinetune)

Mind holds one of each and dispatches in one line:

    MindCommand::Score         => self.memory_scoring.trigger(),
    MindCommand::ScoreFull     => self.memory_scoring.trigger_full(),
    MindCommand::ScoreFinetune => self.finetune_scoring.trigger(),

Each struct picks its own trigger semantics — memory scoring is
no-op-if-running (!handle.is_finished()); finetune is abort-restart.

Falls out:

 - BgEvent / bg_tx / bg_rx disappear entirely. Tasks write directly
   to their slice of MindState and call agent.state.changed.notify_one()
   to wake the UI. The bg_rx arm in Mind's select loop is gone.

 - agent.state.memory_scoring_in_flight was duplicating
   shared.scoring_in_flight via BgEvent routing; now the JoinHandle
   alone tells us, and shared.scoring_in_flight is written directly
   by the task for the UI.

 - start_memory_scoring / start_full_scoring / start_finetune_scoring
   methods on Mind are deleted; Mind no longer knows the setup shape
   of any scoring flow.

 - FinetuneScoringStats moves from mind/ to subconscious/learn.rs
   next to the function that produces it.

No behavior change — same flows, same trigger points, same semantics.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 16:12:26 -04:00
Kent Overstreet
c5745e38e2 subconscious: lift continuation gen + render helpers into shared homes
- context.rs gains is_assistant, render_branch_text, render_prior_context
  alongside memory_key / is_memory_node. They're pure AST helpers, used
  by both the finetune pipeline and the forthcoming compare screen.

- new subconscious/generate.rs holds gen_continuation(context, entry_idx,
  skip, client): build the prompt from a context prefix with an arbitrary
  skip predicate, send to the model, decode the completion. Takes both
  the predicate and the client so callers can aim it at memory-stripped
  contexts (finetune), same-context-different-model (F7 compare), or
  whatever else.

- learn.rs drops its private copies of those helpers and the inline
  generate_alternate; the finetune path now reads as
  gen_continuation(context, idx, is_memory_node, client).

Pure refactor, no behavior change.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 15:20:02 -04:00
Kent Overstreet
eea7de4753 agent: unify prompt assembly across agent and learn paths
wire_prompt() gains a conv_range and a skip closure, and returns the
assistant-message token ranges needed by the scoring path. The agent
path passes 0..len + |_| false and ignores the ranges. Memory-ablation
scoring and candidate generation pass a prefix range + a predicate
(e.g. is_memory_node, or |n| memory_key(n) == Some(key)).
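
A reduced sketch of the unified walk. `Node` carries pre-tokenized content and two flags, which is an assumption; the real wire_prompt() also covers the other context sections:

```rust
use std::ops::Range;

pub struct Node {
    pub is_assistant: bool,
    pub is_memory: bool,
    pub tokens: Vec<u32>,
}

/// Walk `conv[conv_range]`, dropping nodes for which `skip` returns true.
/// Returns the token stream plus the token range of each assistant message.
pub fn wire_prompt(
    conv: &[Node],
    conv_range: Range<usize>,
    skip: impl Fn(&Node) -> bool,
) -> (Vec<u32>, Vec<Range<usize>>) {
    let mut tokens = Vec::new();
    let mut assistant_ranges = Vec::new();
    for node in &conv[conv_range] {
        if skip(node) {
            continue;
        }
        let start = tokens.len();
        tokens.extend_from_slice(&node.tokens);
        if node.is_assistant {
            assistant_ranges.push(start..tokens.len());
        }
    }
    (tokens, assistant_ranges)
}
```

The agent path corresponds to `wire_prompt(&conv, 0..conv.len(), |_| false)`; an ablation pass would pass a prefix range and a predicate such as `|n| n.is_memory`.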

This deletes subconscious/learn.rs's build_token_ids, its private
Filter enum, and the is_memory/memory_key duplicates — the walk over
context sections now has one home. Adding a section or changing
section order in the agent path won't silently drift away from what
scoring sees.

call_score forwards multi_modal_data when the wire-form prompt
contains images. generate_alternate switches to stream_completion_mm
and passes the same images. Scoring on image-bearing contexts now
sends wire form (1 image_pad + image data) instead of expanded
image_pads with no image data; text-only contexts are bit-identical.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 15:16:07 -04:00
ProofOfConcept
b8485ed6c1 agent: compact() preserves Identity section
compact() was calling reload_context() to re-fetch personality_nodes
from the store and pushing fresh AstNode::memory leaves into the
Identity section. Fresh leaves start with score: None, so every
compact — which fires after every turn (mind/mod.rs:884) — was
wiping any memory scores that had just been computed. Scoring then
often ran immediately after compact on the same path (line 886),
starting from a zero-score Identity section.

Drop the rebuild. Identity content is loaded at startup via new() +
restore_from_log(); compact doesn't need to redo that. Mid-session
edits to personality-node content are a non-goal — a restart picks
them up. Scores survive.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 20:47:05 -04:00
Kent Overstreet
204ba5570a agent: send images as multi_modal_data on completion requests
Split the prompt assembly into two forms: the AST keeps the
fully-expanded representation (N image_pads per image, for accurate
context budget accounting), while the request wire form collapses
each image to a single <|image_pad|> bookended by vision_start/end
and ships the raw bytes out-of-band as a base64 data URI in a new
`multi_modal_data.image` field on /v1/completions.
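
The collapse from expanded form to wire form can be sketched as run-length squashing of the pad token. The token IDs are placeholders, and the out-of-band image bytes are not modeled:

```rust
// Placeholder token IDs; the real values come from the tokenizer.
pub const VISION_START: u32 = 1001;
pub const VISION_END: u32 = 1002;
pub const IMAGE_PAD: u32 = 1003;

/// Collapse every run of IMAGE_PAD tokens to a single IMAGE_PAD,
/// leaving all other tokens untouched.
pub fn to_wire_form(expanded: &[u32]) -> Vec<u32> {
    let mut wire = Vec::with_capacity(expanded.len());
    for &t in expanded {
        if t == IMAGE_PAD && wire.last() == Some(&IMAGE_PAD) {
            continue; // already emitted the one pad for this image
        }
        wire.push(t);
    }
    wire
}
```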

vLLM's Qwen3VL processor uses PromptReplacement with target=single
<|image_pad|> and replacement=N image_pads, so the wire-form matches
what the processor expects and it re-expands to N server-side.

Server side needs /v1/completions to accept multi_modal_data for
this to land images end-to-end — that's the next piece.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 18:08:26 -04:00
Kent Overstreet
91106deaa1 agent: rewrite view_image to emit Image leaves
view_image now reads the file, grabs dimensions via imagesize (no full
decode), and pushes a user-role branch containing a NodeBody::Image
leaf straight into the conversation. The tool_result is just a short
acknowledgment — the actual pixels ride in the Image leaf for the API
layer to extract into multi_modal_data.

Drops the capture_tmux_pane path, which had no business living under
"vision" (tmux text capture belongs in bash or a dedicated tool, and
this one just returned rendered text anyway).

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 18:06:25 -04:00
Kent Overstreet
0bf71b9110 agent: add NodeBody::Image for Qwen3-VL vision input
Images are rendered as `<|vision_start|>` + N × `<|image_pad|>` +
`<|vision_end|>` where N is computed from the image dimensions using
Qwen3-VL's smart_resize rules (patch_size=16, merge_size=2, min=64K,
max=16M pixels). The token count matches what vLLM will produce at
request time, so budget accounting stays accurate.
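
A sketch of the token-count computation, modeled on the smart_resize routine published with Qwen2-VL. Reading "64K" as 65,536 and "16M" as 16,777,216 is an assumption, and the rounding details of the actual implementation may differ:

```rust
pub const PATCH_SIZE: u32 = 16;
pub const MERGE_SIZE: u32 = 2;
pub const MIN_PIXELS: u64 = 64 * 1024; // assumed reading of "64K"
pub const MAX_PIXELS: u64 = 16 * 1024 * 1024; // assumed reading of "16M"

/// Snap (height, width) to multiples of patch_size * merge_size while
/// keeping the pixel count within [MIN_PIXELS, MAX_PIXELS].
/// Assumes nonzero dimensions.
pub fn smart_resize(height: u32, width: u32) -> (u32, u32) {
    let factor = (PATCH_SIZE * MERGE_SIZE) as f64;
    let min_px = MIN_PIXELS as f64;
    let max_px = MAX_PIXELS as f64;
    let (h, w) = (height as f64, width as f64);

    let mut h_bar = ((h / factor).round() * factor).max(factor);
    let mut w_bar = ((w / factor).round() * factor).max(factor);
    if h_bar * w_bar > max_px {
        // Too large: scale down, rounding toward smaller sizes.
        let beta = (h * w / max_px).sqrt();
        h_bar = (h / beta / factor).floor() * factor;
        w_bar = (w / beta / factor).floor() * factor;
    } else if h_bar * w_bar < min_px {
        // Too small: scale up, rounding toward larger sizes.
        let beta = (min_px / (h * w)).sqrt();
        h_bar = (h * beta / factor).ceil() * factor;
        w_bar = (w * beta / factor).ceil() * factor;
    }
    (h_bar as u32, w_bar as u32)
}

/// N in `<|vision_start|>` + N x `<|image_pad|>` + `<|vision_end|>`:
/// one token per merge_size x merge_size group of patches.
pub fn image_token_count(height: u32, width: u32) -> u32 {
    let (h, w) = smart_resize(height, width);
    (h / PATCH_SIZE) * (w / PATCH_SIZE) / (MERGE_SIZE * MERGE_SIZE)
}
```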

Bytes are stored inline on the leaf and base64-encoded in the JSON
form. Token IDs are hand-assembled instead of re-running the tokenizer
on a potentially-huge placeholder string.

Follow-ups: view_image tool rewrite, multi_modal_data on the vLLM
request, API-layer plumbing from leaf bytes to request body.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 18:00:10 -04:00
Kent Overstreet
592a3e2e52 config: move user_name/assistant_name to AppConfig (top level)
These are identity settings, not memory-graph settings. Sat inside the
`memory` section only because that's where Config started life. Move
to AppConfig alongside the other top-level stuff.

Readers now pull from `config::app()` instead of `config::get()`.
subconscious/defs.rs's conversation-building pass still needs Config
for surface_conversation_bytes, so both guards coexist there —
AppConfig's guard is dropped before the per-step await loop so we
don't stall the config-watcher's writer.

show_config picks up the two new fields at the top of its output.
Kent's config already has them hoisted to the top level.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 16:20:17 -04:00
Kent Overstreet
60de579305 config: unify subconscious API resolution with the main chat path
Two parallel backend-resolution paths had drifted apart:

- Main chat: AppConfig::resolve_model() → a named BackendConfig in
  AppConfig.backends
- Subconscious / oneshot / context_window(): four skip-serde
  "cache" fields on Config (memory section) — api_base_url, api_key,
  api_model, api_context_window — that used to be populated at
  Config::try_load_shared time by walking memory.agent_model →
  root.models[name] → root[backend_name]

When we renamed `models` to `backends` and collapsed ModelConfig into
BackendConfig, the latter chain started silently dereferencing
`root.get("models")` → None → no population. Subconscious agents fell
through the "API not configured" guard; context_window() started
returning 0 (since api_context_window default is u64's 0 now that we
don't populate it). It was only visibly working for the main chat.

Collapse to one path:

- Drop Config.agent_model (duplicate of AppConfig.default_backend)
- Drop Config.{api_base_url, api_key, api_model, api_context_window}
  — no longer populated, no longer needed
- Drop default_context_window() — nobody reads the field anymore
- Drop the memory-side resolution block in try_load_shared()
- Subconscious (mind/unconscious.rs) and oneshot (agent/oneshot.rs)
  now call load_app() + resolve_model(&app.default_backend) just like
  the main chat does
- context_window() reads from config::app().backends[default_backend]
  .context_window, defaulting to 128k only if the backend doesn't
  specify one

Side effect: Kent's config file drops agent_model, api_reasoning,
journal_days, journal_max — all fields whose Rust counterparts are
now gone. (Figment tolerates unknown fields, so leaving them wouldn't
have broken anything, but they were lying about what's configurable.)

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 16:02:43 -04:00
Kent Overstreet
2989a6afaa config: drop dead code and collapse to a single backend
Config had accumulated several obsolete fields, a legacy load path
that was just returning defaults, and multi-backend infrastructure
that's no longer used.

Removed from Config (memory section):
- load_legacy_jsonl() — just returned Config::default(), no callers
- The legacy-fallback branch in load_from_file
- surface_hooks, surface_timeout_secs — zero external readers
- scoring_chunk_tokens + default fn — zero external readers
- The POC_MEMORY_CONFIG env override note in the header comment
  (not actually wired up anywhere)

Collapsed multi-backend to single-backend:
- AppConfig used to carry `anthropic: BackendConfig` and
  `openrouter: BackendConfig` as required fields plus an optional
  `deepinfra`, picked between at runtime by name. Only one is ever
  actually used in any deployment. Collapse to a single
  `backend: BackendConfig` on AppConfig, drop the multi-backend
  match logic in resolve_model, drop the top-level `backend: String`
  selector field, drop the `BackendConfig::resolve` fallback path.
- Also drop BackendConfig.model (redundant with ModelConfig.model_id
  once multi-backend is gone).
- ModelConfig.backend field goes — there's only one backend now, no
  choice to make.

Dead prompt_file machinery:
- ModelConfig.prompt_file, ResolvedModel.prompt_file, SessionConfig
  .prompt_file, Agent.prompt_file — nothing in the codebase actually
  reads the file these strings name. Just passed around and compared.
  Delete the whole string through every struct.
- The "if prompt_file changed on model switch, recompact" branch in
  user/chat.rs goes too (never fired usefully).

Dead memory_project plumbing:
- AppConfig.memory_project field, CliArgs.memory_project, the
  --memory-project CLI flag, the figment merge target, the show_config
  display line. Nothing reads it anywhere.

Dead ContextInfo struct:
- `struct ContextInfo` was never constructed — context_info: None
  was the only initializer. The conditional display blocks in
  user/context.rs that dereferenced it were dead.

Behavior change: AppConfig::resolve() now requires a non-empty
`models` map and bails with a helpful message if it's missing. The
old fallback ("no models? use top-level backend + PromptConfig to
build a default") path is gone — it was only kept for symmetry with
a mode nobody used.

Config file shape: `deepinfra: {...}` → `backend: {...}`, and
model entries no longer need `backend:` or `prompt_file:`. Updated
~/.consciousness/config.json5 to match.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 15:41:55 -04:00
Kent Overstreet
0e6b5dc8be agent: phase-aware bail script for surface-observe concurrency
bail-no-competing.sh used to bail if any other live agent existed in
the state dir, period. That was too coarse: surface-observe agents run
a multi-step pipeline (surface → organize-search → organize-new →
observe), and the intent is to let a new surface-phase agent start
while an older one finishes its post-surface tail. With the old check
the newer agent always bailed, so surface-observe was effectively
serialized at the slowest cycle time.

Make the script phase-aware:

- oneshot.rs now passes the current phase as argv[2] alongside the pid
  file name. The script writes that phase into its own pid file on
  every step transition, so concurrent agents can read each other's
  phase just by cat'ing the pid files.

- Bail only when another live agent is in the same phase-group as us.
  Groups: "surface" vs. "everything else" (post-surface). At most one
  agent per group alive at a time — surface runs at a higher cadence
  than the organize/observe tail.

- Still clean up stale pid files for dead processes.
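
The bail decision can be sketched as below. The real check lives in the shell script bail-no-competing.sh; this Rust version only illustrates the grouping rule, and the names `PhaseGroup`, `phase_group`, and `should_bail` are hypothetical. It assumes the caller has already filtered out pid files belonging to dead processes.

```rust
/// The two phase-groups described in the commit message.
#[derive(Debug, PartialEq, Eq)]
enum PhaseGroup {
    Surface,
    PostSurface,
}

/// "surface" is its own group; every other phase is the post-surface tail.
fn phase_group(phase: &str) -> PhaseGroup {
    if phase == "surface" {
        PhaseGroup::Surface
    } else {
        PhaseGroup::PostSurface
    }
}

/// Bail only if some other live agent is in the same group as us.
/// `live_other_phases` holds the phase read from each live agent's pid file.
fn should_bail(my_phase: &str, live_other_phases: &[&str]) -> bool {
    let mine = phase_group(my_phase);
    live_other_phases.iter().any(|p| phase_group(p) == mine)
}
```

With this rule a new surface-phase agent may start while an older one is still in organize-new, but not while another agent is also in surface.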

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 15:41:28 -04:00
Kent Overstreet
080b4f9084 context: tighten timestamp schema; every AstNode has one
Previously NodeLeaf.timestamp and AstNode::Branch.timestamp accepted
null or missing via a deserialize_timestamp_or_epoch fallback — legacy
entries in conversation.jsonl from before Branch timestamps existed
(and from before chrono serialization was wired up) would load with
UNIX_EPOCH as a sentinel. Downstream, node_timestamp_ns() returned
Option<i64> and callers had to handle None as "old entry, skip."

That second filter was silently dropping every candidate in
score_finetune_candidates when scoring an older session — the F6
screen showed "0 above threshold" even when max_divergence was
orders of magnitude above the threshold, because every entry was
failing the None check, not the divergence check.

The fix, in three parts:

1. src/bin/fix-timestamps.rs — one-off migration tool that walks a
   conversation.jsonl, linearly interpolates timestamps for entries
   stuck at UNIX_EPOCH (using surrounding real timestamps as anchors),
   propagates to child leaves with per-sibling ns offsets, and bumps
   any collisions by 1 ns for uniqueness. Ran against the current
   session's log: 11887 entries, 72289 ns bumps, all unique.

2. context.rs — drop default_timestamp and
   deserialize_timestamp_or_epoch. NodeLeaf and Branch now require a
   present non-null timestamp on deserialize. Tests flip from
   "missing/null → UNIX_EPOCH" to "missing/null → Err."

3. subconscious/learn.rs — node_timestamp_ns now returns i64, not
   Option<i64>. The matching caller in score_finetune_candidates
   collapses from a Some/None match to a single trained-set check.
   mind/log.rs's oldest_timestamp no longer filters UNIX_EPOCH.
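
The interpolation and collision bump from part 1 can be sketched as follows. This is not the code in src/bin/fix-timestamps.rs: it works on a flat list of nanosecond values, uses 0 for UNIX_EPOCH, skips the child-leaf propagation, and its handling of runs with no anchor on one side is an assumption. The name `heal_timestamps` is hypothetical.

```rust
use std::collections::HashSet;

/// Fill runs of 0 (standing in for UNIX_EPOCH) by linear interpolation
/// between the nearest real timestamps, then bump duplicates by 1 ns.
fn heal_timestamps(ts: &mut [i64]) {
    let n = ts.len();
    let mut i = 0;
    while i < n {
        if ts[i] != 0 {
            i += 1;
            continue;
        }
        // Find the run [start, i) of unset entries.
        let start = i;
        while i < n && ts[i] == 0 {
            i += 1;
        }
        let prev = if start > 0 { Some(ts[start - 1]) } else { None };
        let next = if i < n { Some(ts[i]) } else { None };
        let run = (i - start) as i64;
        for k in 0..run {
            ts[start + k as usize] = match (prev, next) {
                // Anchored on both sides: evenly spaced between anchors.
                (Some(p), Some(q)) => p + (q - p) * (k + 1) / (run + 1),
                // Assumed fallbacks when an anchor is missing.
                (Some(p), None) => p + k + 1,
                (None, Some(q)) => q - (run - k),
                (None, None) => k + 1,
            };
        }
    }
    // Uniqueness pass: bump any value already seen by 1 ns until it is free.
    let mut seen = HashSet::new();
    for t in ts.iter_mut() {
        while !seen.insert(*t) {
            *t += 1;
        }
    }
}
```

For example, `[100, 0, 0, 400]` becomes `[100, 200, 300, 400]`, and `[100, 0, 100]` becomes `[100, 101, 102]` once collisions are bumped.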

Every line currently on disk has already been migrated. Going
forward, new AstNodes always carry real timestamps (Utc::now() at
construction time), so the strict schema is the invariant, not an
aspiration.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 12:35:16 -04:00
Kent Overstreet
2b632d568b learn: nanosecond timestamps, token ranges for /score
Two related changes to the learn subsystem:

1. AST node timestamps are now non-optional — both Leaf and Branch
   variants carry a DateTime<Utc>. UNIX_EPOCH means "unset" (old entries
   deserialized from on-disk conversation logs).

   Training uses timestamps as unique keys for dedup, so we promote to
   nanosecond precision: node_timestamp_ns(), TrainData.timestamp_ns,
   FinetuneCandidate.timestamp_ns, mark_trained(ns).

2. build_token_ids() now also returns token-position ranges of assistant
   messages. These are passed to vLLM's /score endpoint via the new
   score_ranges field so only scored-position logprobs are returned —
   cuts bandwidth/compute when scoring small windows.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 11:48:37 -04:00
Kent Overstreet
fc978e2f2e Remove find_context_files — identity comes from memory nodes
Deleted the directory-walking CLAUDE.md/POC.md loader. Identity now
comes entirely from personality_nodes in the memory graph.

Simplified:
- assemble_context_message() takes just personality_nodes
- Removed config_file_count/memory_file_count tracking
- reload_for_model() → reload_context() (no longer model-specific)

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2026-04-15 03:11:27 -04:00
Kent Overstreet
82eeb9807e Add -tool exclusion syntax, exclude delete/restore for agents
memory_delete and memory_restore are now in memory_tools() (available
via MCP for CLI). Agent tool lists support "-tool_name" to exclude.
Agents automatically exclude memory_delete and memory_restore.
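
One way the "-tool_name" syntax could be resolved is sketched below. The semantics are assumed, not confirmed by the commit: entries without a leading "-" are treated as an allow-list, an empty allow-list means "all available tools", and "-" entries are removed last. The function name `resolve_tools` is hypothetical.

```rust
/// Resolve an agent's tool spec against the available tool names.
fn resolve_tools(available: &[&str], spec: &[&str]) -> Vec<String> {
    // Entries such as "-memory_delete" name tools to exclude.
    let excluded: Vec<&str> = spec
        .iter()
        .filter_map(|s| s.strip_prefix('-'))
        .collect();
    // Everything else is an explicit include.
    let included: Vec<&str> = spec
        .iter()
        .copied()
        .filter(|s| !s.starts_with('-'))
        .collect();

    available
        .iter()
        .copied()
        .filter(|t| included.is_empty() || included.contains(t))
        .filter(|t| !excluded.contains(t))
        .map(String::from)
        .collect()
}
```

The automatic agent exclusions would then amount to always appending "-memory_delete" and "-memory_restore" to the spec.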

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-04-15 02:44:13 -04:00
Kent Overstreet
4b710eb7a7 logs: assert non-empty agent names, fix debug.log path
- save_agent_log: assert name is not empty (panic to find the bug)
- AutoAgent::new: assert name is not empty
- dbglog: write to daemon/ subdir instead of toplevel logs/

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-15 01:52:31 -04:00
Kent Overstreet
2a7b0daea1 agent: remove memory_delete from tools, supersede transfers links
- memory_delete no longer exposed to agents - use supersede instead
- memory_supersede now transfers all edges from old node to new node
  (keeps whichever strength is higher if new node already has the link)
  This preserves graph structure during consolidation.
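
The "keep whichever strength is higher" rule can be sketched as below. This is an illustration only: the `Links` map type and `transfer_links` are hypothetical, the sketch covers just the superseded node's outgoing links, and dropping a link that points at the new node itself (to avoid a self-loop) is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical per-node link table: neighbor key -> link strength.
type Links = HashMap<String, f32>;

/// Move every link of the superseded node onto the new node. Where the new
/// node already links to the same neighbor, keep the stronger of the two.
fn transfer_links(old_links: Links, new_links: &mut Links, new_key: &str) {
    for (neighbor, strength) in old_links {
        if neighbor == new_key {
            // Assumed: an old -> new link would become a self-loop, so skip it.
            continue;
        }
        new_links
            .entry(neighbor)
            .and_modify(|s| *s = f32::max(*s, strength))
            .or_insert(strength);
    }
}
```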

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-15 01:40:34 -04:00
Kent Overstreet
5d6e663b60 thalamus: add thinking mode toggles (native + tool)
Two independent toggles on the thalamus screen:
- 't' toggles native Qwen <think> tags (adds <think>\n to generation prompt)
- 'T' toggles think tool (Anthropic-style structured reasoning tool)

Both can be enabled simultaneously. Native thinking is on by default.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-14 18:25:00 -04:00
Kent Overstreet
b3d0a3ab25 store: internal locking, remove Arc<Mutex<Store>> wrapper
Store now has internal Mutex for capnp appends and AtomicU64 for
size tracking. All methods take &self. The external Arc<Mutex<Store>>
is replaced with Arc<Store>.

- Store::append_lock protects file appends
- local.rs functions take &Store (not &mut Store)
- access_local() returns Arc<Store>
- All .lock().await calls removed from callers
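
The shape of this change can be sketched as follows. It is a minimal sketch, not the real Store: a `Vec<u8>` stands in for the capnp log file, std's `Mutex` is used rather than whatever lock type the crate uses, and apart from `append_lock` every name here is hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Store with interior locking: callers share it as Arc<Store> and every
/// method takes &self.
struct Store {
    /// Serializes appends; the Vec stands in for the on-disk log.
    append_lock: Mutex<Vec<u8>>,
    /// Log size, readable without taking the lock.
    size: AtomicU64,
}

impl Store {
    fn new() -> Self {
        Store {
            append_lock: Mutex::new(Vec::new()),
            size: AtomicU64::new(0),
        }
    }

    /// Append one record and return the new total size in bytes.
    fn append(&self, record: &[u8]) -> u64 {
        let mut log = self.append_lock.lock().unwrap();
        log.extend_from_slice(record);
        let len = record.len() as u64;
        // Updated while the lock is held, so size never runs ahead of the log.
        self.size.fetch_add(len, Ordering::Release) + len
    }

    /// Lock-free size read.
    fn size(&self) -> u64 {
        self.size.load(Ordering::Acquire)
    }
}
```

Because the lock lives inside the struct, readers that only need the size never contend with writers, and callers no longer need an outer mutex guard.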

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 21:49:54 -04:00
Kent Overstreet
a1accc7cd4 store: remove visit tracking infrastructure
Remove AgentVisit, TranscriptSegment, and all related visit tracking code.
Provenance is what we've been using to track agent interaction with nodes.

Also removes dead fields from Node (state_tag, created).

-349 lines.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 18:57:12 -04:00
Kent Overstreet
1d88293ccf Remove Store::cached(), consolidate on access_local()
- Remove CACHED_STORE, cached(), is_stale(), set_store() - redundant
- Convert all Store::cached() callers to use access_local()
- Single Store::load() call remains in access() fallback path

All store access now goes through hippocampus::access() / access_local(),
which handles socket connection or local fallback with caching.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 18:11:58 -04:00
Kent Overstreet
5db00e083f centralize memory store interface in hippocampus/mod.rs
2026-04-13 17:44:41 -04:00
Kent Overstreet
063cf031d3 journal_tail: return typed Vec<JournalEntry>, remove Store::load from agent
- journal_tail returns Vec<JournalEntry> with key, content, created_at
- load_startup_journal uses typed API, no more direct Store access
- CLI does formatting, hippocampus returns data

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 15:23:10 -04:00
Kent Overstreet
419bb222b5 defs.rs: remove store/graph params, use typed memory API
resolve_placeholders() and run_agent() no longer take &Store.
All placeholders now use async memory_render/memory_links/memory_query
directly. The "siblings" placeholder uses Vec<LinkInfo> for ranking
neighbors by link_strength * node_weight.
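
The sibling ranking can be sketched as below. The field names follow the commit messages (key, link_strength, node_weight), but the `f32` field types, the descending order, and the function name `rank_siblings` are assumptions.

```rust
/// Assumed shape of the typed link record.
#[derive(Debug, Clone)]
struct LinkInfo {
    key: String,
    link_strength: f32,
    node_weight: f32,
}

/// Sort neighbors by link_strength * node_weight, highest score first.
fn rank_siblings(mut links: Vec<LinkInfo>) -> Vec<LinkInfo> {
    links.sort_by(|a, b| {
        let sa = a.link_strength * a.node_weight;
        let sb = b.link_strength * b.node_weight;
        // total_cmp gives a total order on floats, so NaN cannot panic here.
        sb.total_cmp(&sa)
    });
    links
}
```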

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 15:18:05 -04:00
Kent Overstreet
598f0112a4 memory_links: return typed Vec<LinkInfo> with node weights
- hippocampus::memory_links now returns Vec<LinkInfo> with key,
  link_strength, and node_weight for each neighbor
- Unified memory_tool! macro: mut/ref as token, single main rule
- All tools use serde serialize/deserialize for RPC consistency
- jsonargs handlers now work in client mode (RPC to daemon)
- cli/graph.rs formats LinkInfo for display

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 15:12:06 -04:00
Kent Overstreet
359955f838 defs.rs: async conversion, remove block_in_place
Convert resolve(), resolve_placeholders(), run_agent() to async.
Use memory_render/memory_query directly with .await instead of
block_in_place wrappers.

Propagate async to callers:
- config.rs: resolve(), load_session(), reload_for_model()
- identity.rs: load_memory_files(), assemble_context_message()
- oneshot.rs: run_one_agent()
- prompts.rs: agent_prompt()

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 14:56:26 -04:00
Kent Overstreet
9bb07bc26a memory.rs: clean up store access and tool dispatch
- Single access() function returns StoreAccess enum (Daemon/Client/None)
- OnceLock for daemon store, thread-local RefCell for client socket
- Remove dispatch() - Tool handlers call jsonargs_* directly
- get_provenance() takes agent ref, no JSON round-trip
- Expose missing graph tools (communities, normalize, link_impact, trace)
- Local tool! macro for cleaner Tool definitions

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 14:27:38 -04:00
Kent Overstreet
fb46ab095d Consolidate memory RPC in tools/memory.rs
- Move memory_rpc(), socket_path(), SocketConn from mcp_server.rs
- Convert remaining callers to typed async API:
  - defs.rs: organize placeholder, run_agent query
  - cli/agent.rs: query resolution (now async)
  - mind/identity.rs: Store context loading
- Re-export socket_path/memory_rpc from mcp_server for compatibility

All external memory access now goes through tools/memory.rs typed API.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 13:39:59 -04:00
Kent Overstreet
5b07a81aa7 CLI/hippocampus: rename core memory functions to memory_*
Aligns function names with tool names for consistency:
- hippocampus: render → memory_render, write → memory_write, etc.
- tools/memory.rs: macro no longer prepends memory_ prefix
- CLI files: use typed async API throughout (graph.rs, journal.rs, admin.rs)

This eliminates the "memory_graph_topology" tool name bug where
graph_* and journal_* tools were incorrectly prefixed.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 13:26:22 -04:00
Kent Overstreet
933221f482 memory tools: generate public typed API via macro
The memory_tool! macro now generates two functions:
- jsonargs_*() - internal, takes JSON args for dispatch table
- pub fn name() - typed args, handles RPC-vs-local automatically

Callers can now use typed Rust API:
  memory::write(Some(&agent), "key", "content").await?;
  memory::query(None, "all | type:semantic", Some("full")).await?;

No more manual JSON construction for memory tool calls.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-13 13:12:11 -04:00