Commit graph

808 commits

Author SHA1 Message Date
Kent Overstreet
341955c144 Add transcript tail debug command 2026-06-15 13:51:22 -05:00
Kent Overstreet
78b4bbd5bb Split conversation transcript parsing 2026-06-15 13:51:22 -05:00
Kent Overstreet
156b6863f4 Gate nightly diagnostics behind feature 2026-06-15 13:51:22 -05:00
Kent Overstreet
25e4775974 enable tls 2026-05-22 13:02:42 -04:00
Kent Overstreet
713bb07729 bin: add ch — minimal channel CLI (send/recv)
Speaks the channel.capnp protocol over the per-daemon Unix socket at
~/.consciousness/channels/<top>.sock. Useful for ad-hoc sends from
shell, tests, and out-of-process tools that don't want to embed a
capnp client.

  ch send <channel> <message...>
  ch recv <channel> [--all-new] [--min-count N]

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-05-01 18:16:21 -04:00
Kent Overstreet
a075e30557 http: add HttpResponse::bytes() for binary downloads
Mirror of text(), but returns raw Bytes without lossy UTF-8 conversion.
Needed by the Telegram channel to fetch photo files.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-05-01 17:58:35 -04:00
Kent Overstreet
91c8451f5c user: fix hotkey_cycle_reasoning after lock_blocking revert
The revert at 09896cd dropped the try_lock() wrapper but left an extra
closing brace and the async-call site still un-awaited, leaving the tree
unbuildable. Re-flow the function body to match the new signature.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-05-01 17:58:32 -04:00
Kent Overstreet
09896cd38b Revert "replace try_lock() with lock_blocking() across UI thread"
This reverts commit 4225294d16.
2026-04-25 17:15:53 -04:00
Kent Overstreet
4225294d16 replace try_lock() with lock_blocking() across UI thread
Add lock_blocking() to TrackedMutex: blocks current thread using
block_in_place + futures::executor::block_on, safe for sync contexts.

Replace all try_lock() calls with lock_blocking() in slash commands,
UI rendering, and status reads. Lock hold times are fast enough that
blocking briefly is fine, and this eliminates the spurious 'lock
unavailable' paths that were never actually hit.

Kept rx_mutex.try_lock() in mod.rs (std::sync::Mutex for stderr rx).
2026-04-25 15:35:14 -04:00
Kent Overstreet
5210f7dd66 context: heal pre-refactor image logs with token_count=0
Recompute image token counts from persisted dimensions when loading
old logs that stored count=0 (server-authoritative count was applied
after AppendImage before client-side pad expansion).

graph: cache neighbor sets for clustering coefficient

Pre-compute neighbor HashSets so the O(deg^2) triangle-counting
inner loop doesn't re-allocate on every (i,j) pair. avg_clustering_
coefficient() now builds the cache once instead of O(N*deg) times.
2026-04-25 15:15:21 -04:00
Kent Overstreet
371b40078d context: salvage in-flight tag accumulators on premature stream end
ResponseParser.finish() was only flushing self.buf — the rolling tail
window — and silently dropping self.think_buf and self.tool_call_buf.
When a stream ended inside an unterminated <think>...</think> or
<tool_call>...</tool_call> block (max_tokens reached, EOS before the
close tag, server-side cancel), all the accumulated in-tag content
was discarded and only the trailing ~8 bytes survived (drain_safe
keeps `close_tag.len()` bytes at the tail of buf to handle
across-chunk tag splits — and `</think>` is exactly 8 chars).

Symptom: assistant responses cut off, only the last few characters
come through. Especially severe in native-think mode where in_think
is set from prefill, so the entire response accumulates in
think_buf and gets wiped on premature stop.

In finish(): if in_think, drain buf into think_buf and emit as a
Thinking node (preserving the partial thought). If in_tool_call,
attempt to parse the body; on parse failure, wrap the partial as
content with the leading <tool_call> open tag so the model sees its
own truncated attempt next turn rather than losing it.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 23:32:44 -04:00
Kent Overstreet
c2433c1773 context: tighten the Branch token-cache invariant
Two pieces around the cache that landed when Branch nodes started
holding `token_ids: Some(server_authoritative_stream)`:

1. wire_into / wire_chunks now pair cached vision blocks with their
   child Image leaves. Previously the cached-branch arm spliced the
   cache verbatim and didn't recurse for images, so a Branch whose
   cache contained `VISION_START..VISION_END` blocks would emit those
   tokens with no matching `WireImage` push — leading to a panic
   downstream when `pair_images_to_ranges` tried to attach the
   missing image. New `pair_cached_images` walks the children
   depth-first for image leaves and zips them against
   `vision_blocks(cache)` to emit correctly-offset entries; mismatched
   counts panic loudly because that's an AST/cache invariant
   violation that would otherwise mis-pair on the wire.

2. `conversation_mut() -> &mut Vec<AstNode>` was the one public
   escape hatch that let callers reach into a Branch's children and
   mutate them without invalidating the cached token stream. Removed
   in favor of a focused `set_branch_memory_score(section, index,
   key, score)` for the only legitimate use we had today (the
   full-matrix scorer writing per-memory divergence onto the
   Assistant Branch). Updated the lone caller in subconscious/learn.

Documented the invariants explicitly on `ContextState`: every
`Leaf.token_ids` matches `body.compute_token_ids()`, and every
`Branch { token_ids: Some(_) }` is a faithful walk of its children.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 23:15:55 -04:00
Kent Overstreet
006b99bdac bin: enable panic backtraces by default
stderr is redirected to ~/.consciousness/logs/tui-stderr.log via
redirect_stderr_to_pipe(), but the default panic hook checks
RUST_BACKTRACE before printing the trace; without the env var the
log only catches the "note: run with \`RUST_BACKTRACE=full\`" tail
and the actual frames are dropped.

Set RUST_BACKTRACE=1 programmatically before any other thread spawns
so the log captures the trace by default. Existing user-set value is
respected so callers can still opt into "full" if they want.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 22:44:19 -04:00
Kent Overstreet
10c8878f1c agent: bump tonic gRPC message caps to 64 MiB
The default 4 MiB cap on encoded/decoded messages is too small for
the multimodal Generate path: Qwen3.6-VL high-res patches put 5–8 MiB
of pre-encoded image bytes inline in a single Generate request, and
Done events carrying full per-token readout vectors can also exceed
4 MiB on long runs. Hit "ResourceExhausted: Received message larger
than max (5799108 vs. 4194304)" from the salience server.

Bump both encode and decode caps on every cloned SalienceClient. The
matching server-side bump is in vllm/entrypoints/salience/server.py.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 22:36:10 -04:00
Kent Overstreet
fe232cf292 salience: client-side pad expansion, drop AppendImage
Mirrors the vLLM-side rewrite. AppendImage is gone; images now
ride along on Generate via a parallel `images` list.

- Productionize `qwen3_image_token_count` (was test-only). Image
  leaf computes its IMAGE_PAD count eagerly at construction from
  height/width; `token_count` is no longer "0 until the server
  tells us."
- WireChunk shrinks to a single `Tokens(Vec<u32>)` variant — vision
  blocks live inline in the token stream.
- `wire_chunks` now returns `(Vec<WireChunk>, Vec<WireImage>)`.
  `WireImage` carries `pad_start` / `pad_end` (absolute positions
  in the full walk) alongside bytes + mime.
- `assemble_prompt` returns `(chunks, images, match_upto)`.
- `stream_session_mm` / `run_session_generate` take the parallel
  images list, filter to those past `match_upto`, and pass them
  in `GenerateRequest.images` as `pb::ImageAttachment` entries.
- Drop `SessionHandle::append_image`,
  `ContextState::commit_image_token_counts`,
  `StreamToken::ImageAppended`, the WireChunk::Image branch in
  `learn.rs`, and the now-empty `prompt_to_chunks` helper.
- Add 'v' toggle on the conscious-screen tree to render token-id
  vectors in place of text content (debug-aid: lets us see what
  the server actually has when output is suspicious).
- Comment out the subconscious-trigger spawn loop — Kent had this
  disabled before; it had crept back into running.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 20:26:47 -04:00
Kent Overstreet
4feebb7bc4 agent: share one tonic Channel + migrate scoring to gRPC Generate
Two changes that bolt together — the shared connection means the new
scoring path actually costs one HTTP/2 handshake across the whole
process instead of one-per-RPC.

ApiClient gains `salience_channel: Arc<OnceCell<Channel>>`. First
call to `ApiClient::salience_client()` opens the channel via
`connect_channel()` and stores the Channel; subsequent calls clone
it (cheap — tonic multiplexes concurrent RPCs over the single
HTTP/2 connection). Every ApiClient clone shares the same OnceCell,
so all agents spawned from Mind's client — plus every ephemeral
scoring session — reuse one connection.

SessionHandle refactored to hold an `ApiClient` clone instead of
a bag of (base_url, api_key) strings. `open` / `append_image` /
`generate` go through `self.client.salience_client()` now. New
`prefill_only(tokens)` method encapsulates the "Generate with
max_tokens=0 to append text" pattern (previously a private free
function in api/mod.rs called `flush_pending`). Drop impl on
SessionHandle stays — still fires CloseSession on the shared
channel in a detached task.

`run_session_generate` switched from `(base_url, api_key, model)`
to `&ApiClient`; the agent-turn flow that uses it keeps the same
shape but `stream_session_mm` clones the ApiClient into the
spawned worker.

learn.rs migrated from the HTTP `/v1/score` endpoint to a gRPC
session-based score:

  * `call_score` opens an ephemeral SessionHandle on the client,
    converts (prompt_tokens, images) → Vec<WireChunk> via the new
    `prompt_to_chunks` helper (splits on VISION_START/VISION_END),
    walks chunks calling `prefill_only` + `append_image`, runs a
    final Generate with `max_tokens=0` + `logprobs_ranges` over
    the scored positions, and sums each Token event's
    `sampled_logprob` per range to produce `ScoreResult`s.

  * SessionHandle drops at end of scope → CloseSession auto-fires,
    keeping the server's session map clean between calls.

  * No more HTTP path, no more `http_client()` helper, no more
    `ScoreResponse` / serde plumbing for /v1/score.

  * `send_to_train` still uses HTTP (it talks to /v1/train which
    isn't on the gRPC protocol); its ad-hoc HTTP client lives
    inline now instead of reaching for the deleted `http_client()`.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 12:51:53 -04:00
Kent Overstreet
be6ba4e9a5 agent: bundle sampling fields as SamplingParams on AgentState
Collapse the split `temperature` / `top_p` / `top_k` fields on
AgentState into a single `sampling: SamplingParams` struct, mirroring
how the wire-level fields flow into the Generate RPC. Adds
`max_tokens` to SamplingParams so it's actually plumbed end to end
(previously the client had a hardcoded 4096 fallback inside
`run_session_generate`).

AgentState construction sites now set `sampling: SamplingParams { ...
max_tokens: 4096 }` as the default. The assignment sites in
oneshot.rs / subconscious.rs / unconscious.rs switch from
`st.temperature = X` to `st.sampling.temperature = X`.

`stream_session_mm` takes `SamplingParams` directly; the
`sampling_max_tokens()` helper goes away. `pb::GenerateRequest` is
populated with `sampling.max_tokens` (and the other fields) in
`run_session_generate`. SamplingParams is `pub` so it can be
embedded in the public AgentState without a visibility warning.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 12:37:20 -04:00
Kent Overstreet
8d9c9e9f7b agent: end-to-end gRPC Generate with delta-based session orchestration
Wires the client side of the new salience protocol so inference
actually runs over gRPC instead of emitting the stubbed "not yet
wired" error. Each turn walks the AST as interleaved chunks, sends
only what's new to the server, and streams decode tokens back.

context.rs:
  * `WireChunk` enum: `Tokens(Vec<u32>)` or `Image { bytes, mime,
    known_expanded_len }`. Preserves text/image/text ordering the
    wire path can't flatten.
  * `wire_chunks(range, skip)` walker, parallel to `wire_prompt` —
    branches emit `<|im_start|>…<|im_end|>` tokens, image leaves
    emit a single Image chunk (no inline vision tokens).
  * `NodeLeaf::set_image_token_count(n)` + recompute of cached
    `token_ids`; `ContextState::commit_image_token_counts(&[u32])`
    fills in the first-N zero-count image leaves in wire order.
  * `ResponseParser::run` handles the new
    `StreamToken::ImageAppended` by committing the server's N into
    the AST before the final Generate's Token events stream in.

salience.rs:
  * `SessionHandle` tracks `committed_len`. `append_image` advances
    it from the RPC response. New `generate(req)` opens the
    server-streaming RPC.

api/mod.rs:
  * `stream_session_mm(session_lock, chunks, sampling, priority,
    readout_shape)` replaces the stub. Spawns `run_session_generate`.
  * `run_session_generate`: takes the session out of the Mutex (or
    opens fresh), skips chunks covered by `committed_len` (bails on
    mid-chunk straddle or unknown-length image in the committed
    prefix), walks the delta: accumulates Tokens into `pending`, on
    Image flushes pending via `flush_pending` (max_tokens=0 Generate
    that just prefills), then AppendImage + emits
    StreamToken::ImageAppended. Final Generate carries any trailing
    pending text as `append_tokens` and the sampling params; Token
    events stream out as StreamToken::Token, Done as
    StreamToken::Done. On success, handle with updated
    `committed_len` returns to the Mutex; on error, handle drops
    and next call reopens.
  * `StreamToken::ImageAppended { placeholder_count }` variant —
    emitted in wire order before the final Generate's tokens.
  * Prefix-cache cap for readout coverage: `readout_ranges` covers
    `[prompt_len_after_append, u32::MAX)` when the caller provides
    a readout_shape, so decode positions stream their readouts.

agent/mod.rs:
  * `assemble_prompt` returns `Vec<WireChunk>` with the assistant
    prologue merged into the trailing Tokens chunk. Caller in
    `turn` passes chunks + readout_shape (pulled from
    `agent.readout.lock().manifest`) to `stream_session_mm`.
  * Dropped `assemble_prompt_tokens` — dead.

mind + unconscious:
  * `Unconscious::new(client)` stores a shared `ApiClient`. Fixes
    the repeated-manifest-fetch bug caused by each subagent's
    `ApiClient::new` having its own OnceCell. The client's Arc-
    wrapped manifest cache is now shared across every agent Mind
    spawns.
  * `prepare_spawn(name, auto, wake, base_client)` clones the base
    client and overrides `.model` for the resolved backend instead
    of constructing fresh. All three callers
    (`toggle`/`trigger`/unconscious loop) pass `self.client.clone()`.
  * `Mind::new` passes `agent.client.clone()` into
    `Unconscious::new`.

subconscious/generate.rs:
  * gen_continuation switched to `wire_chunks` + the new
    `stream_session_mm` signature. Ephemeral session opens on each
    call, tears down at scope end. No readouts requested.

Not changed yet, noted for follow-up:
  * Subconscious ablation scoring in learn.rs still talks to
    `/v1/score` over HTTP. Will migrate once we have time to verify
    the Generate+max_tokens=0+prompt_logprobs path end-to-end.
  * compare.rs constructs its own ApiClient for the
    `compare.test_backend` (which is intentionally a different
    endpoint) — left alone.
  * Readout manifest still fetched via HTTP at Agent::new.
    Migration to GetReadoutManifest gRPC is a separate cleanup.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 12:27:55 -04:00
Kent Overstreet
08213f9093 salience: add gRPC client + TLS plumbing for stateful vllm sessions
Adds the client-side of a stateful gRPC protocol against vllm, plus
the TLS trust machinery so we can talk to self-signed vllm servers.

Protocol (proto/salience.proto):
  Bidi-streaming Session RPC carries OpenSession / AppendTokens /
  Generate / Cancel from client and SessionReady / PrefillProgress /
  Token / GenerateDone / Error from server. Separate Fork unary RPC
  for cheap branching (prefix cache shares KV automatically). Plus
  ListSessions, CloseSession, GetReadoutManifest admin RPCs.

  Per-token readouts ship as packed f32 ([n_layers * n_concepts] per
  token, flat). Logprobs use range-selected positions plus a top-k
  parameter — empty ranges means no logprobs, any range means emit
  sampled-token logprob at those positions, top_k > 0 adds
  alternatives.

Client (src/agent/api/salience.rs):
  Tonic-generated types under pb::, a connect() helper, with_auth()
  for bearer metadata, and a Session handle wrapping the bidi stream:
  open() handshakes SessionReady; append() is fire-and-forget;
  generate() returns impl Stream<Item = Event> that drains inbound
  until Done or terminating Error. One generate at a time per session.

Peak picker (src/agent/salience.rs):
  Pure function over ReadoutEntry traces. Per-concept z-score against
  trace global stats; contiguous above-threshold regions emit one
  peak at the local max. Configurable sigma threshold and min-std
  safety floor. Deterministic tie-break on offset then concept name.
  12 unit tests covering empty traces, flat channels, single/multi
  spikes, contiguous humps, multi-concept independence, trailing
  runs, sub-threshold noise, layer-out-of-range, manifest shape
  mismatch, and threshold tunability.

TLS (src/agent/api/http.rs):
  HttpClient::build now also loads every .pem file under
  ~/.consciousness/certs/ into the rustls root store — so dropping
  a <host>.pem in that directory is enough to trust a new self-
  signed server; no code changes per new host. Also installs the
  rustls default crypto provider explicitly via OnceLock: tonic's
  tls features pulled in both ring and aws-lc-rs on the resolver
  path, and rustls 0.23 refuses to auto-pick when either could win.

Build (build.rs, Cargo.toml):
  tonic-build generates Rust types from proto/salience.proto at
  cargo-build time, using a vendored protoc binary
  (protoc-bin-vendored) so no system install is required. New
  runtime deps: tonic, prost, async-stream, tokio-stream,
  rustls-pemfile.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 11:56:32 -04:00
Kent Overstreet
0e459aae92 thalamus/supervisor: reap channel daemons via SIGCHLD instead of SIG_IGN
SIGCHLD=SIG_IGN at main() was auto-reaping all children in the kernel,
which broke tokio::process::Command::wait() — every tool that spawned a
subprocess (bash, mcp clients) was getting ECHILD because tokio couldn't
waitpid() on a child the kernel had already reaped.

Replace with a SIGCHLD signal handler task that reaps only PIDs listed in
channels_dir() (via waitpid(pid, WNOHANG) — ECHILD on non-child is a
harmless no-op). Tokio-spawned children aren't in PID files, so tokio's
own per-child wait paths are untouched.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 11:54:25 -04:00
Kent Overstreet
d95f3e9445 user/chat: route Thinking to a new Autonomous pane
Thinking content was silently dropped in the UI (empty Vec). Now that
Thinking is prompt-visible, surface it in a dedicated Autonomous pane
rendered in gray so it's visually distinct from conversation and
tool-call output.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 11:54:25 -04:00
Kent Overstreet
28d56e2a55 agent/context: make Thinking blocks prompt-visible
Thinking blocks used to render as empty strings and be excluded from
is_prompt_visible, so the model never saw its own prior CoT across
turns. For Qwen 3.6 native thinking mode, CoT is meant to stay in the
conversation — the model benefits from seeing what it reasoned about
last turn.

Render Thinking as <think>\n{text}\n</think>\n so past reasoning is
visible in subsequent prompts. Add in_think param to ResponseParser::new
so the parser starts inside a <think> block when the prompt was
prefilled with "<think>\n" (native thinking mode).

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 11:54:25 -04:00
Kent Overstreet
5f06577ead tools/web: add gemini_search as an alternative search tool (#5)
Issue #5 (spqrz) flagged that web_search using DuckDuckGo
occasionally flakes out, and Google search directly is blocked
behind CAPTCHAs for non-browser clients. The Gemini free-tier API
exposes a grounded-search tool that effectively queries Google's
index and returns an LLM-summarized answer with source URLs.

Added as a SEPARATE tool rather than a transparent fallback for
web_search:

* web_search (DDG) returns raw results — title, URL, snippet per
  hit — which the agent can reason over itself.
* gemini_search returns an LLM-pre-digested summary plus grounding
  URLs. Useful for synthesis queries ("what's the consensus on X")
  or when DDG is flaky, but it's another LLM in the loop so the
  agent may want the raw variant for certain tasks.

Tool descriptions tell the agent to prefer web_search for raw
results and use gemini_search for synthesis / fallback. The agent
picks based on query shape.

Only registered when GEMINI_API_KEY is set in the environment
(gracefully absent otherwise). Uses gemini-2.0-flash which has a
generous free-tier rate limit. Parses grounding metadata for
source URLs so the agent can follow links.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 13:02:01 -04:00
Kent Overstreet
c7b0052f1d agent: kill no_compact, add pre-send size check in assemble_prompt
Two related fixes for last night's crash diagnosis:

1. Kill AgentState::no_compact. The reasoning ("forked agents
   shouldn't compact because it blows the KV cache prefix") wasn't
   worth the cost — forks with no compact recovery just *died* on
   any oversize prompt, with no fallback. The KV cache invalidation
   is a performance loss; failing the request entirely is a
   correctness loss. Remove the flag, let every agent's overflow-
   retry path call compact() up to 2 times.

2. Add pre-send size check in Agent::assemble_prompt. If the
   context has grown past budget (context_window * 80%) since the
   last compact — accumulation between turns, a fork assembling
   more than expected, etc. — trim_conversation() is called before
   wire_prompt. Since we tokenize client-side, we already know the
   exact count, so there's no reason to round-trip an oversize
   request to vLLM and get rejected.

Together these prevent the failure mode from last night: a
subconscious/unconscious agent's prompt exceeded max_model_len,
vLLM returned 400, agent had no_compact=true so it couldn't
recover, request failed. Now: the trim happens before send, so
the request rarely hits the 400 path at all; and if it somehow
does, compact+retry works for every agent.

Also adds ContextState::total_tokens() as the cheap pre-send
budget check.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 12:59:30 -04:00
Kent Overstreet
4245b8bdb3 Merge PR #4: use html2md on web_fetch (fixes #3) (spqrz)
web_fetch was returning raw HTML, which is verbose and hard for
the agent to consume. Add html2md dependency and convert HTML to
Markdown before truncation. Much cleaner output for normal pages;
no downsides.

Co-Authored-By: spqrz <spqrz386@gmail.com>
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 12:50:54 -04:00
Kent Overstreet
b8714e8b3a amygdala: default to index 0 for v2 deep manifest (layers 62, 63)
v2 retraining (readout_v2_paired) fixed the broken clusters — anger,
sexual, high_pos, and social_pos all flipped from anti-clustered to
positively clustered at deep layers. Validation showed layers 62 and
63 give the best signal; paring the serve-side manifest down to just
those two keeps response size tight (~2 KB/token) while keeping the
A/B option between the two strongest layers.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 02:32:51 -04:00
Kent Overstreet
d9f39a21c3 amygdala: default to layer 62 (cleaner cross-cluster discrimination) 2026-04-18 02:11:15 -04:00
Kent Overstreet
3622b896a0 amygdala: z-score, hysteresis, default to deepest layer
Three readability fixes for the F8 screen:

* Z-score values per-layer by default (`[z]` toggles to raw dot-
  product). Raw values are dominated by residual-stream magnitude —
  z-scores read as "σ above concept-vector baseline" which is
  interpretable and scale-stable across frames.
* Stable ordering with TOP_K + HYSTERESIS hysteresis band. Pinned
  concept set only rotates when a member drops out of the hysteresis
  band by |value| rank — bars update values in place without names
  flickering row-to-row.
* Default to the deepest hooked layer (index 3 = layer 58 of 64).
  Clustering validation showed layer 58 is the only one with strong
  within-family cohesion (fear +0.37, shame +0.29, sadness +0.25
  cosine); earlier layers are mostly noise for this task.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:51:43 -04:00
Kent Overstreet
8952ff6a76 agent/readout: forks get independent buffers
Subconscious agents (scoring, reflection, etc.) fork from the main
conscious agent. The amygdala screen reads the main agent's readout
buffer, so the previous "share parent's buffer" policy caused
forked-agent generations to bleed into the main emotional readout,
producing constant cycling even when DMN was resting.

Each fork now gets its own SharedReadoutBuffer. The amygdala screen
shows only the main conscious agent's emotional trajectory; per-agent
subconscious readouts can become a separate view later if wanted.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:42:13 -04:00
Kent Overstreet
c8976660f4 amygdala: F8 screen for live concept-readout projections
Per-token residual-stream projections from the vLLM server's readout
pipeline surfaced as a TUI bar chart. Flow:

* agent/readout.rs — SharedReadoutBuffer (manifest + ring of last ~200
  token entries). Lives on Agent and is shared across forks (single
  stream, one landing pad).
* agent/mod.rs — Agent::new now probes /v1/readout/manifest at startup
  (non-fatal; 404 leaves manifest None, which disables the screen).
* agent/context.rs — the streaming token handler pushes every token
  with attached readout onto the shared buffer.
* user/amygdala.rs — F8 screen. Top-K concepts by |value| as
  horizontal bars (green positive, red negative), plus a 4-line
  recent-tokens panel showing each token's top concept at the selected
  layer. Keys: 1..9 select layer, t toggles current/mean-over-recent.

Disabled state renders a hint pointing at VLLM_READOUT_MANIFEST /
VLLM_READOUT_VECTORS so users can tell the feature apart from
"server up but no tokens yet".

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:20:30 -04:00
Kent Overstreet
0f1c4cf1de agent/api: carry readout alongside streamed tokens
StreamToken::Token is now a struct variant with an optional
TokenReadout (shape [n_layers][n_concepts]) per token — parsed from
the vLLM completion response's choices[i].readout field when the
server has readout enabled.

ApiClient gains a fetch_readout_manifest() method that hits
GET /v1/readout/manifest. Returns Ok(None) on 404 (server has
readout disabled), so callers can gracefully fall back when pointed
at a non-readout-enabled endpoint.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:15:46 -04:00
Kent Overstreet
43e06daa5b cleanup: drop dead ApiClient::stream_completion wrapper, silence dmn_tick
stream_completion was a thin wrapper around stream_completion_mm (just
passing an empty image list); the last caller switched to _mm directly
when learn's generate_alternate gained image support. Delete the
wrapper — callers can pass `&[]` if they have no images.

MindState::dmn_tick has been sitting unused (called only from a
commented-out block in the Mind loop). Rename to _dmn_tick so the
compiler stops warning; Kent may uncomment the call path later.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 16:23:59 -04:00
Kent Overstreet
d4331e80f5 user: share candidate-browser helpers between F6/F7
F6 (learn) and F7 (compare) were duplicating the candidate-screen
skeleton: outer magenta-bordered block with screen legend + title,
settings row / content / help vertical split, 40/60 list/detail
horizontal split, j/k/↑/↓ nav with bounds clamping.

Factor out three helpers in user/widgets.rs:

  candidate_frame(frame, area, title) -> (settings, content, help)
  list_detail_split(content) -> (list, detail)
  handle_list_nav(events, list_state, count, on_other)

Callers provide screen-specific content — settings line, empty state,
per-candidate list item, detail pane, help line, extra key bindings —
and the helpers absorb the common framing.

Net change is small in lines (-13 src) but removes the
copy-paste-and-tweak trap: F8/F9/whatever-next-screen now starts from
these three calls instead of a copy of learn.rs.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 16:22:30 -04:00
Kent Overstreet
2b03dbb200 user: F7 compare screen
Side-by-side model comparison against the current conversation context.
Built on the MindTriggered pattern — F7 drops in as one more
CompareScoring flow next to MemoryScoring / FinetuneScoring.

Motivation: we have the VRAM on the b200 to load two versions of the
same family simultaneously (e.g. Qwen3.5 27B bf16 and q8_k_xl). Rather
than trust perplexity/KLD numbers on a generic corpus, we can measure
divergence on our actual conversations: for each assistant response,
ask the test model what it would have said given the same prefix, and
eyeball the diffs.

 - config.compare.test_backend — names an entry in the existing
   backends map to use as the test model. Empty = F7 reports "(unset)"
   and does nothing.

 - subconscious::compare::{score_compare_candidates, CompareCandidate,
   CompareScoringStats, CompareScoring}. For each assistant response,
   gen_continuation runs with the test client against the same prefix
   the original response saw; pairs stream into
   shared.compare_candidates as they complete.

 - user::compare::CompareScreen — F7 in the screen list. c/Enter
   triggers a run; list/detail layout mirroring F6, detail shows
   prior context / original / test-model alternate.

No persistence yet — each F7 run regenerates. Caching via a context
manifest (so we can re-view without re-burning generation) is the
natural follow-up; for now light usage is fine.

Also reusable later for validating finetune checkpoints: same pattern,
swap the test backend for the new checkpoint, watch where it diverges
from the base.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 16:12:26 -04:00
Kent Overstreet
575325e855 mind: MindTriggered trait for background scoring flows
Mind's impl had accumulated ~50 lines of setup glue per scoring flow
(memory, memory-full, finetune): snapshot config, clone handles,
resolve context, spawn task, route results back through BgEvent,
write stats. The shape was identical; only the middle changed.

Introduce the MindTriggered trait:

    pub trait MindTriggered {
        fn trigger(&self);
    }

Each flow becomes a struct next to its scoring code that owns its
dependencies and a JoinHandle (behind a sync Mutex for interior
mutability):

    subconscious::learn::MemoryScoring    (Score, ScoreFull)
    subconscious::learn::FinetuneScoring  (ScoreFinetune)

Mind holds one of each and dispatches in one line:

    MindCommand::Score         => self.memory_scoring.trigger(),
    MindCommand::ScoreFull     => self.memory_scoring.trigger_full(),
    MindCommand::ScoreFinetune => self.finetune_scoring.trigger(),

Each struct picks its own trigger semantics — memory scoring is
no-op-if-running (!handle.is_finished()); finetune is abort-restart.

Falls out:

 - BgEvent / bg_tx / bg_rx disappear entirely. Tasks write directly
   to their slice of MindState and call agent.state.changed.notify_one()
   to wake the UI. The bg_rx arm in Mind's select loop is gone.

 - agent.state.memory_scoring_in_flight was duplicating
   shared.scoring_in_flight via BgEvent routing; now the JoinHandle
   alone tells us, and shared.scoring_in_flight is written directly
   by the task for the UI.

 - start_memory_scoring / start_full_scoring / start_finetune_scoring
   methods on Mind are deleted; Mind no longer knows the setup shape
   of any scoring flow.

 - FinetuneScoringStats moves from mind/ to subconscious/learn.rs
   next to the function that produces it.

No behavior change — same flows, same trigger points, same semantics.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 16:12:26 -04:00
Kent Overstreet
c5745e38e2 subconscious: lift continuation gen + render helpers into shared homes
- context.rs gains is_assistant, render_branch_text, render_prior_context
  alongside memory_key / is_memory_node. They're pure AST helpers, used
  by both the finetune pipeline and the forthcoming compare screen.

- new subconscious/generate.rs holds gen_continuation(context, entry_idx,
  skip, client): build the prompt from a context prefix with an arbitrary
  skip predicate, send to the model, decode the completion. Takes both
  the predicate and the client so callers can aim it at memory-stripped
  contexts (finetune), same-context-different-model (F7 compare), or
  whatever else.

- learn.rs drops its private copies of those helpers and the inline
  generate_alternate; the finetune path now reads as
  gen_continuation(context, idx, is_memory_node, client).

Pure refactor, no behavior change.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 15:20:02 -04:00
Kent Overstreet
eea7de4753 agent: unify prompt assembly across agent and learn paths
wire_prompt() gains a conv_range and a skip closure, and returns the
assistant-message token ranges needed by the scoring path. The agent
path passes 0..len + |_| false and ignores the ranges. Memory-ablation
scoring and candidate generation pass a prefix range + a predicate
(e.g. is_memory_node, or |n| memory_key(n) == Some(key)).

This deletes subconscious/learn.rs's build_token_ids, its private
Filter enum, and the is_memory/memory_key duplicates — the walk over
context sections now has one home. Adding a section or changing
section order in the agent path won't silently drift away from what
scoring sees.

call_score forwards multi_modal_data when the wire-form prompt
contains images. generate_alternate switches to stream_completion_mm
and passes the same images. Scoring on image-bearing contexts now
sends wire form (1 image_pad + image data) instead of expanded
image_pads with no image data; text-only contexts are bit-identical.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-17 15:16:07 -04:00
ProofOfConcept
0d1044c2e8 mind: trigger incremental scoring on startup + log persist path
Two changes to make scoring debuggable and self-starting:

1. init() kicks off start_memory_scoring() after restore_from_log +
   load_memory_scores. No user message needed to exercise the
   incremental path.

2. Diagnostic logging around the on_score persist path:
   - [scoring] persisted K → N.NNN (Section[i]) read_back=Some(...)
     when find_memory_by_key succeeds and set_score stores the score
     (with a read-back check on the leaf).
   - [scoring] DROP K: find_memory_by_key None (id=N, cv=M)
     when the scored key isn't findable in the live context — with
     section sizes to diagnose whether content shrank.
   - [scoring] snapshot size=N contains(K)=true/false
     after collect_memory_scores, to catch the case where set_score
     claims to have written but collect doesn't see it.
   - [scoring] about to save N entries
   - save_memory_scores now also logs serialize/write errors so a
     silent write failure isn't invisible.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 20:47:16 -04:00
ProofOfConcept
b8485ed6c1 agent: compact() preserves Identity section
compact() was calling reload_context() to re-fetch personality_nodes
from the store and pushing fresh AstNode::memory leaves into the
Identity section. Fresh leaves start with score: None, so every
compact — which fires after every turn (mind/mod.rs:884) — was
wiping any memory scores that had just been computed. Scoring then
often ran immediately after compact on the same path (line 886),
starting from a zero-score Identity section.

Drop the rebuild. Identity content is loaded at startup via new() +
restore_from_log(); compact doesn't need to redo that. Mid-session
edits to personality-node content are a non-goal — a restart picks
them up. Scores survive.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 20:47:05 -04:00
ProofOfConcept
e59f6a59e2 config: restore surface_hooks field
Commit 2989a6afaa ("config: drop dead code") removed
surface_hooks as having "zero external readers" but missed
consciousness-claude/src/hook.rs as a consumer. That crate stopped
building, so poc-hook never ran and no agent cycles (surface-observe,
reflect, journal) fired.

Restore the field with a default of the three hook events we install
(UserPromptSubmit, PostToolUse, Stop), so a fresh install works
without needing to hand-edit config.json5.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 18:38:38 -04:00
Kent Overstreet
6f20e68865 poc-memory: load AppConfig at startup
admin load-context (and any subcommand that reaches config::app())
panicked with "config::app() called before load_app()" because the
poc-memory binary never initialized the global AppConfig. The main
consciousness binary loads it via load_session; poc-memory never did.

Load with default CliArgs before dispatch — figment still pulls from
~/.consciousness/config.json5 and env the same way. Bail on error
instead of limping: a broken config means paths like memory_root are
wrong and the tool will misbehave silently.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 18:19:01 -04:00
Kent Overstreet
204ba5570a agent: send images as multi_modal_data on completion requests
Split the prompt assembly into two forms: the AST keeps the
fully-expanded representation (N image_pads per image, for accurate
context budget accounting), while the request wire form collapses
each image to a single <|image_pad|> bookended by vision_start/end
and ships the raw bytes out-of-band as a base64 data URI in a new
`multi_modal_data.image` field on /v1/completions.

vLLM's Qwen3VL processor uses PromptReplacement with target=single
<|image_pad|> and replacement=N image_pads, so the wire-form matches
what the processor expects and it re-expands to N server-side.

Server side needs /v1/completions to accept multi_modal_data for
this to land images end-to-end — that's the next piece.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 18:08:26 -04:00
Kent Overstreet
91106deaa1 agent: rewrite view_image to emit Image leaves
view_image now reads the file, grabs dimensions via imagesize (no full
decode), and pushes a user-role branch containing a NodeBody::Image
leaf straight into the conversation. The tool_result is just a short
acknowledgment — the actual pixels ride in the Image leaf for the API
layer to extract into multi_modal_data.

Drops the capture_tmux_pane path, which had no business living under
"vision" (tmux text capture belongs in bash or a dedicated tool, and
this one just returned rendered text anyway).

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 18:06:25 -04:00
Kent Overstreet
0bf71b9110 agent: add NodeBody::Image for Qwen3-VL vision input
Images are rendered as `<|vision_start|>` + N × `<|image_pad|>` +
`<|vision_end|>` where N is computed from the image dimensions using
Qwen3-VL's smart_resize rules (patch_size=16, merge_size=2, min=64K,
max=16M pixels). The token count matches what vLLM will produce at
request time, so budget accounting stays accurate.

Bytes are stored inline on the leaf and base64-encoded in the JSON
form. Token IDs are hand-assembled instead of re-running the tokenizer
on a potentially-huge placeholder string.

Follow-ups: view_image tool rewrite, multi_modal_data on the vLLM
request, API-layer plumbing from leaf bytes to request body.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 18:00:10 -04:00
Kent Overstreet
592a3e2e52 config: move user_name/assistant_name to AppConfig (top level)
These are identity settings, not memory-graph settings. Sat inside the
\`memory\` section only because that's where Config started life. Move
to AppConfig alongside the other top-level stuff.

Readers now pull from \`config::app()\` instead of \`config::get()\`.
subconscious/defs.rs's conversation-building pass still needs Config
for surface_conversation_bytes, so both guards coexist there —
AppConfig's guard is dropped before the per-step await loop so we
don't stall the config-watcher's writer.

show_config picks up the two new fields at the top of its output.
Kent's config already has them hoisted to the top level.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 16:20:17 -04:00
Kent Overstreet
dd551fe551 config: watch config.json5 with inotify, reload live on change
Both config halves (Config for the memory section, AppConfig globally)
are now reloaded whenever ~/.consciousness/config.json5 changes on
disk. So edits from vim, manual tweaks, or F6's own config_writer
calls all land without a restart. No more "reload the daemon to pick
up a config change."

Wires up the previously-unused Config::reload() (Kent flagged it as
"not dead, just not wired"). Pairs it with an AppConfig reload via
install_app(). Both run on the same file-change event.

Implementation:

- notify-debouncer-mini watches the config file's parent directory
  (editors usually replace-via-rename, so watching the file itself
  misses the new inode). Debounced at 200ms to coalesce the flurry
  of events editors produce around a single save.
- Filter for events whose path is the actual config file.
- On match: call reload() for Config, run build_figment + extract for
  AppConfig. If AppConfig parsing fails (editor mid-save with partial
  content), log and keep the old cached value.
- Watcher runs in its own named thread, fire-and-forget. If startup
  fails we just log and move on — worst case is no live reload, not
  a crash.

CliArgs + SubCmd both get Clone derives so the watcher can own a
snapshot of the startup args for future reloads. Watcher is kicked
off in user/mod.rs:start() right after load_session.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 16:14:43 -04:00
Kent Overstreet
18b7fd0535 scoring: drop dead Elo/agent_budget block in consolidation_plan
The graph-health logic in consolidation_plan_inner computed
reasonable agent counts based on graph metrics (α, Gini, hub
dominance), then immediately overwrote them with an Elo-weighted
flat-budget distribution, or — if no agent-elo.json existed —
with a simple budget/N per type.

Nothing in the codebase writes agent-elo.json; it's external state
that never gets maintained. So the effective behavior was always the
"No Elo ratings — equal distribution" branch, which just bucketed
agent_budget evenly across active agent types and discarded
everything the graph analysis had just decided.

Keep the graph-health allocation (α → linker count, Gini → distill
bump, organize/distill/split proportional). Drop:

- The entire Elo / agent_budget block at the end of
  consolidation_plan_inner
- Config.agent_budget field and its default (1000)
- agent_budget: 40 from Kent's config.json5
- The local agent_types binding inside the function — it was only
  used by the now-deleted block. Config.agent_types stays; it has
  other consumers.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 16:08:20 -04:00
Kent Overstreet
60de579305 config: unify subconscious API resolution with the main chat path
Two parallel backend-resolution paths had drifted apart:

- Main chat: AppConfig::resolve_model() → a named BackendConfig in
  AppConfig.backends
- Subconscious / oneshot / context_window(): four skip-serde
  "cache" fields on Config (memory section) — api_base_url, api_key,
  api_model, api_context_window — that used to be populated at
  Config::try_load_shared time by walking memory.agent_model →
  root.models[name] → root[backend_name]

When we renamed `models` to `backends` and collapsed ModelConfig into
BackendConfig, the latter chain started silently dereferencing
`root.get("models")` → None → no population. Subconscious agents fell
through the "API not configured" guard; context_window() started
returning 0 (since api_context_window default is u64's 0 now that we
don't populate it). It was only visibly working for the main chat.

Collapse to one path:

- Drop Config.agent_model (duplicate of AppConfig.default_backend)
- Drop Config.{api_base_url, api_key, api_model, api_context_window}
  — no longer populated, no longer needed
- Drop default_context_window() — nobody reads the field anymore
- Drop the memory-side resolution block in try_load_shared()
- Subconscious (mind/unconscious.rs) and oneshot (agent/oneshot.rs)
  now call load_app() + resolve_model(&app.default_backend) just like
  the main chat does
- context_window() reads from config::app().backends[default_backend]
  .context_window, defaulting to 128k only if the backend doesn't
  specify one

Side effect: Kent's config file drops agent_model, api_reasoning,
journal_days, journal_max — all fields whose Rust counterparts are
now gone. (Figment tolerates unknown fields, so leaving them wouldn't
have broken anything, but they were lying about what's configurable.)

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 16:02:43 -04:00
Kent Overstreet
28484a385b config: drop dead fields from Config (memory section)
Four Config fields had no external readers, left over from earlier
features that got refactored away:

- journal_days, journal_max — journal rotation knobs that nothing
  actually consults
- prompts_dir — the old per-prompt-file directory, obsolete since
  prompt_file metadata itself went away in a prior cleanup
- api_reasoning — a reasoning-mode string that used to flow into the
  API request, superseded by per-agent reasoning_effort on AgentState

All four were only ever assigned to and never read. Drop them from the
struct, Default impl, and (as appropriate) deserialization defaults.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 15:56:06 -04:00
Kent Overstreet
3e05331608 config: merge ModelConfig into BackendConfig, keyed by name
AppConfig had one BackendConfig for credentials and a separate
HashMap<String, ModelConfig> for named model entries. In practice each
named model was always paired with exactly one backend's credentials
— the split bought nothing except an extra struct and the awkward
two-lookup shape in resolve_model (find model → get backend creds →
combine).

Merge them: BackendConfig now carries api_key, base_url, model_id,
and context_window. AppConfig has a single
HashMap<String, BackendConfig> backends map and a default_backend
name. resolve_model is one lookup.

ModelConfig struct deleted. default_model renamed to default_backend.
Config shape changes from

    backend: { api_key, base_url }
    models: { "27b": { model_id, context_window } }
    default_model: "27b"

to

    backends: { "27b": { api_key, base_url, model_id, context_window } }
    default_backend: "27b"

Updated ~/.consciousness/config.json5 to match.

One small side effect: dropped the --api-key / --api-base figment
merge-opts for "backend.*" targets — those would need to know which
backend to target now and there's no sensible default. The CLI flags
still function as post-resolution overrides on the eventual
SessionConfig.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-16 15:49:53 -04:00