consciousness

Author	SHA1	Message	Date
Kent Overstreet	a075e30557	http: add HttpResponse::bytes() for binary downloads Mirror of text(), but returns raw Bytes without lossy UTF-8 conversion. Needed by the Telegram channel to fetch photo files. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-05-01 17:58:35 -04:00
Kent Overstreet	09896cd38b	Revert "replace try_lock() with lock_blocking() across UI thread" This reverts commit `4225294d16`.	2026-04-25 17:15:53 -04:00
Kent Overstreet	4225294d16	replace try_lock() with lock_blocking() across UI thread Add lock_blocking() to TrackedMutex: blocks current thread using block_in_place + futures::executor::block_on, safe for sync contexts. Replace all try_lock() calls with lock_blocking() in slash commands, UI rendering, and status reads. Lock hold times are fast enough that blocking briefly is fine, and this eliminates the spurious 'lock unavailable' paths that were never actually hit. Kept rx_mutex.try_lock() in mod.rs (std::sync::Mutex for stderr rx).	2026-04-25 15:35:14 -04:00
Kent Overstreet	5210f7dd66	context: heal pre-refactor image logs with token_count=0 Recompute image token counts from persisted dimensions when loading old logs that stored count=0 (server-authoritative count was applied after AppendImage before client-side pad expansion). graph: cache neighbor sets for clustering coefficient Pre-compute neighbor HashSets so the O(deg^2) triangle-counting inner loop doesn't re-allocate on every (i,j) pair. avg_clustering_ coefficient() now builds the cache once instead of O(N*deg) times.	2026-04-25 15:15:21 -04:00
Kent Overstreet	371b40078d	context: salvage in-flight tag accumulators on premature stream end ResponseParser.finish() was only flushing self.buf — the rolling tail window — and silently dropping self.think_buf and self.tool_call_buf. When a stream ended inside an unterminated <think>...</think> or <tool_call>...</tool_call> block (max_tokens reached, EOS before the close tag, server-side cancel), all the accumulated in-tag content was discarded and only the trailing ~8 bytes survived (drain_safe keeps `close_tag.len()` bytes at the tail of buf to handle across-chunk tag splits — and `</think>` is exactly 8 chars). Symptom: assistant responses cut off, only the last few characters come through. Especially severe in native-think mode where in_think is set from prefill, so the entire response accumulates in think_buf and gets wiped on premature stop. In finish(): if in_think, drain buf into think_buf and emit as a Thinking node (preserving the partial thought). If in_tool_call, attempt to parse the body; on parse failure, wrap the partial as content with the leading <tool_call> open tag so the model sees its own truncated attempt next turn rather than losing it. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 23:32:44 -04:00
Kent Overstreet	c2433c1773	context: tighten the Branch token-cache invariant Two pieces around the cache that landed when Branch nodes started holding `token_ids: Some(server_authoritative_stream)`: 1. wire_into / wire_chunks now pair cached vision blocks with their child Image leaves. Previously the cached-branch arm spliced the cache verbatim and didn't recurse for images, so a Branch whose cache contained `VISION_START..VISION_END` blocks would emit those tokens with no matching `WireImage` push — leading to a panic downstream when `pair_images_to_ranges` tried to attach the missing image. New `pair_cached_images` walks the children depth-first for image leaves and zips them against `vision_blocks(cache)` to emit correctly-offset entries; mismatched counts panic loudly because that's an AST/cache invariant violation that would otherwise mis-pair on the wire. 2. `conversation_mut() -> &mut Vec<AstNode>` was the one public escape hatch that let callers reach into a Branch's children and mutate them without invalidating the cached token stream. Removed in favor of a focused `set_branch_memory_score(section, index, key, score)` for the only legitimate use we had today (the full-matrix scorer writing per-memory divergence onto the Assistant Branch). Updated the lone caller in subconscious/learn. Documented the invariants explicitly on `ContextState`: every `Leaf.token_ids` matches `body.compute_token_ids()`, and every `Branch { token_ids: Some(_) }` is a faithful walk of its children. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 23:15:55 -04:00
Kent Overstreet	10c8878f1c	agent: bump tonic gRPC message caps to 64 MiB The default 4 MiB cap on encoded/decoded messages is too small for the multimodal Generate path: Qwen3.6-VL high-res patches put 5–8 MiB of pre-encoded image bytes inline in a single Generate request, and Done events carrying full per-token readout vectors can also exceed 4 MiB on long runs. Hit "ResourceExhausted: Received message larger than max (5799108 vs. 4194304)" from the salience server. Bump both encode and decode caps on every cloned SalienceClient. The matching server-side bump is in vllm/entrypoints/salience/server.py. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 22:36:10 -04:00
Kent Overstreet	fe232cf292	salience: client-side pad expansion, drop AppendImage Mirrors the vLLM-side rewrite. AppendImage is gone; images now ride along on Generate via a parallel `images` list. - Productionize `qwen3_image_token_count` (was test-only). Image leaf computes its IMAGE_PAD count eagerly at construction from height/width; `token_count` is no longer "0 until the server tells us." - WireChunk shrinks to a single `Tokens(Vec<u32>)` variant — vision blocks live inline in the token stream. - `wire_chunks` now returns `(Vec<WireChunk>, Vec<WireImage>)`. `WireImage` carries `pad_start` / `pad_end` (absolute positions in the full walk) alongside bytes + mime. - `assemble_prompt` returns `(chunks, images, match_upto)`. - `stream_session_mm` / `run_session_generate` take the parallel images list, filter to those past `match_upto`, and pass them in `GenerateRequest.images` as `pb::ImageAttachment` entries. - Drop `SessionHandle::append_image`, `ContextState::commit_image_token_counts`, `StreamToken::ImageAppended`, the WireChunk::Image branch in `learn.rs`, and the now-empty `prompt_to_chunks` helper. - Add 'v' toggle on the conscious-screen tree to render token-id vectors in place of text content (debug-aid: lets us see what the server actually has when output is suspicious). - Comment out the subconscious-trigger spawn loop — Kent had this disabled before; it had crept back into running. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 20:26:47 -04:00
Kent Overstreet	4feebb7bc4	agent: share one tonic Channel + migrate scoring to gRPC Generate Two changes that bolt together — the shared connection means the new scoring path actually costs one HTTP/2 handshake across the whole process instead of one-per-RPC. ApiClient gains `salience_channel: Arc<OnceCell<Channel>>`. First call to `ApiClient::salience_client()` opens the channel via `connect_channel()` and stores the Channel; subsequent calls clone it (cheap — tonic multiplexes concurrent RPCs over the single HTTP/2 connection). Every ApiClient clone shares the same OnceCell, so all agents spawned from Mind's client — plus every ephemeral scoring session — reuse one connection. SessionHandle refactored to hold an `ApiClient` clone instead of a bag of (base_url, api_key) strings. `open` / `append_image` / `generate` go through `self.client.salience_client()` now. New `prefill_only(tokens)` method encapsulates the "Generate with max_tokens=0 to append text" pattern (previously a private free function in api/mod.rs called `flush_pending`). Drop impl on SessionHandle stays — still fires CloseSession on the shared channel in a detached task. `run_session_generate` switched from `(base_url, api_key, model)` to `&ApiClient`; the agent-turn flow that uses it keeps the same shape but `stream_session_mm` clones the ApiClient into the spawned worker. learn.rs migrated from the HTTP `/v1/score` endpoint to a gRPC session-based score: * `call_score` opens an ephemeral SessionHandle on the client, converts (prompt_tokens, images) → Vec<WireChunk> via the new `prompt_to_chunks` helper (splits on VISION_START/VISION_END), walks chunks calling `prefill_only` + `append_image`, runs a final Generate with `max_tokens=0` + `logprobs_ranges` over the scored positions, and sums each Token event's `sampled_logprob` per range to produce `ScoreResult`s. * SessionHandle drops at end of scope → CloseSession auto-fires, keeping the server's session map clean between calls. * No more HTTP path, no more `http_client()` helper, no more `ScoreResponse` / serde plumbing for /v1/score. * `send_to_train` still uses HTTP (it talks to /v1/train which isn't on the gRPC protocol); its ad-hoc HTTP client lives inline now instead of reaching for the deleted `http_client()`. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 12:51:53 -04:00
Kent Overstreet	be6ba4e9a5	agent: bundle sampling fields as SamplingParams on AgentState Collapse the split `temperature` / `top_p` / `top_k` fields on AgentState into a single `sampling: SamplingParams` struct, mirroring how the wire-level fields flow into the Generate RPC. Adds `max_tokens` to SamplingParams so it's actually plumbed end to end (previously the client had a hardcoded 4096 fallback inside `run_session_generate`). AgentState construction sites now set `sampling: SamplingParams { ... max_tokens: 4096 }` as the default. The assignment sites in oneshot.rs / subconscious.rs / unconscious.rs switch from `st.temperature = X` to `st.sampling.temperature = X`. `stream_session_mm` takes `SamplingParams` directly; the `sampling_max_tokens()` helper goes away. `pb::GenerateRequest` is populated with `sampling.max_tokens` (and the other fields) in `run_session_generate`. SamplingParams is `pub` so it can be embedded in the public AgentState without a visibility warning. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 12:37:20 -04:00
Kent Overstreet	8d9c9e9f7b	agent: end-to-end gRPC Generate with delta-based session orchestration Wires the client side of the new salience protocol so inference actually runs over gRPC instead of emitting the stubbed "not yet wired" error. Each turn walks the AST as interleaved chunks, sends only what's new to the server, and streams decode tokens back. context.rs: * `WireChunk` enum: `Tokens(Vec<u32>)` or `Image { bytes, mime, known_expanded_len }`. Preserves text/image/text ordering the wire path can't flatten. * `wire_chunks(range, skip)` walker, parallel to `wire_prompt` — branches emit `<\|im_start\|>…<\|im_end\|>` tokens, image leaves emit a single Image chunk (no inline vision tokens). * `NodeLeaf::set_image_token_count(n)` + recompute of cached `token_ids`; `ContextState::commit_image_token_counts(&[u32])` fills in the first-N zero-count image leaves in wire order. * `ResponseParser::run` handles the new `StreamToken::ImageAppended` by committing the server's N into the AST before the final Generate's Token events stream in. salience.rs: * `SessionHandle` tracks `committed_len`. `append_image` advances it from the RPC response. New `generate(req)` opens the server-streaming RPC. api/mod.rs: * `stream_session_mm(session_lock, chunks, sampling, priority, readout_shape)` replaces the stub. Spawns `run_session_generate`. * `run_session_generate`: takes the session out of the Mutex (or opens fresh), skips chunks covered by `committed_len` (bails on mid-chunk straddle or unknown-length image in the committed prefix), walks the delta: accumulates Tokens into `pending`, on Image flushes pending via `flush_pending` (max_tokens=0 Generate that just prefills), then AppendImage + emits StreamToken::ImageAppended. Final Generate carries any trailing pending text as `append_tokens` and the sampling params; Token events stream out as StreamToken::Token, Done as StreamToken::Done. On success, handle with updated `committed_len` returns to the Mutex; on error, handle drops and next call reopens. * `StreamToken::ImageAppended { placeholder_count }` variant — emitted in wire order before the final Generate's tokens. * Prefix-cache cap for readout coverage: `readout_ranges` covers `[prompt_len_after_append, u32::MAX)` when the caller provides a readout_shape, so decode positions stream their readouts. agent/mod.rs: * `assemble_prompt` returns `Vec<WireChunk>` with the assistant prologue merged into the trailing Tokens chunk. Caller in `turn` passes chunks + readout_shape (pulled from `agent.readout.lock().manifest`) to `stream_session_mm`. * Dropped `assemble_prompt_tokens` — dead. mind + unconscious: * `Unconscious::new(client)` stores a shared `ApiClient`. Fixes the repeated-manifest-fetch bug caused by each subagent's `ApiClient::new` having its own OnceCell. The client's Arc- wrapped manifest cache is now shared across every agent Mind spawns. * `prepare_spawn(name, auto, wake, base_client)` clones the base client and overrides `.model` for the resolved backend instead of constructing fresh. All three callers (`toggle`/`trigger`/unconscious loop) pass `self.client.clone()`. * `Mind::new` passes `agent.client.clone()` into `Unconscious::new`. subconscious/generate.rs: * gen_continuation switched to `wire_chunks` + the new `stream_session_mm` signature. Ephemeral session opens on each call, tears down at scope end. No readouts requested. Not changed yet, noted for follow-up: * Subconscious ablation scoring in learn.rs still talks to `/v1/score` over HTTP. Will migrate once we have time to verify the Generate+max_tokens=0+prompt_logprobs path end-to-end. * compare.rs constructs its own ApiClient for the `compare.test_backend` (which is intentionally a different endpoint) — left alone. * Readout manifest still fetched via HTTP at Agent::new. Migration to GetReadoutManifest gRPC is a separate cleanup. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 12:27:55 -04:00
Kent Overstreet	08213f9093	salience: add gRPC client + TLS plumbing for stateful vllm sessions Adds the client-side of a stateful gRPC protocol against vllm, plus the TLS trust machinery so we can talk to self-signed vllm servers. Protocol (proto/salience.proto): Bidi-streaming Session RPC carries OpenSession / AppendTokens / Generate / Cancel from client and SessionReady / PrefillProgress / Token / GenerateDone / Error from server. Separate Fork unary RPC for cheap branching (prefix cache shares KV automatically). Plus ListSessions, CloseSession, GetReadoutManifest admin RPCs. Per-token readouts ship as packed f32 ([n_layers * n_concepts] per token, flat). Logprobs use range-selected positions plus a top-k parameter — empty ranges means no logprobs, any range means emit sampled-token logprob at those positions, top_k > 0 adds alternatives. Client (src/agent/api/salience.rs): Tonic-generated types under pb::, a connect() helper, with_auth() for bearer metadata, and a Session handle wrapping the bidi stream: open() handshakes SessionReady; append() is fire-and-forget; generate() returns impl Stream<Item = Event> that drains inbound until Done or terminating Error. One generate at a time per session. Peak picker (src/agent/salience.rs): Pure function over ReadoutEntry traces. Per-concept z-score against trace global stats; contiguous above-threshold regions emit one peak at the local max. Configurable sigma threshold and min-std safety floor. Deterministic tie-break on offset then concept name. 12 unit tests covering empty traces, flat channels, single/multi spikes, contiguous humps, multi-concept independence, trailing runs, sub-threshold noise, layer-out-of-range, manifest shape mismatch, and threshold tunability. TLS (src/agent/api/http.rs): HttpClient::build now also loads every .pem file under ~/.consciousness/certs/ into the rustls root store — so dropping a <host>.pem in that directory is enough to trust a new self- signed server; no code changes per new host. Also installs the rustls default crypto provider explicitly via OnceLock: tonic's tls features pulled in both ring and aws-lc-rs on the resolver path, and rustls 0.23 refuses to auto-pick when either could win. Build (build.rs, Cargo.toml): tonic-build generates Rust types from proto/salience.proto at cargo-build time, using a vendored protoc binary (protoc-bin-vendored) so no system install is required. New runtime deps: tonic, prost, async-stream, tokio-stream, rustls-pemfile. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 11:56:32 -04:00
Kent Overstreet	28d56e2a55	agent/context: make Thinking blocks prompt-visible Thinking blocks used to render as empty strings and be excluded from is_prompt_visible, so the model never saw its own prior CoT across turns. For Qwen 3.6 native thinking mode, CoT is meant to stay in the conversation — the model benefits from seeing what it reasoned about last turn. Render Thinking as <think>\n{text}\n</think>\n so past reasoning is visible in subsequent prompts. Add in_think param to ResponseParser::new so the parser starts inside a <think> block when the prompt was prefilled with "<think>\n" (native thinking mode). Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-24 11:54:25 -04:00
Kent Overstreet	5f06577ead	tools/web: add gemini_search as an alternative search tool (#5 ) Issue #5 (spqrz) flagged that web_search using DuckDuckGo occasionally flakes out, and Google search directly is blocked behind CAPTCHAs for non-browser clients. The Gemini free-tier API exposes a grounded-search tool that effectively queries Google's index and returns an LLM-summarized answer with source URLs. Added as a SEPARATE tool rather than a transparent fallback for web_search: * web_search (DDG) returns raw results — title, URL, snippet per hit — which the agent can reason over itself. * gemini_search returns an LLM-pre-digested summary plus grounding URLs. Useful for synthesis queries ("what's the consensus on X") or when DDG is flaky, but it's another LLM in the loop so the agent may want the raw variant for certain tasks. Tool descriptions tell the agent to prefer web_search for raw results and use gemini_search for synthesis / fallback. The agent picks based on query shape. Only registered when GEMINI_API_KEY is set in the environment (gracefully absent otherwise). Uses gemini-2.0-flash which has a generous free-tier rate limit. Parses grounding metadata for source URLs so the agent can follow links. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 13:02:01 -04:00
Kent Overstreet	c7b0052f1d	agent: kill no_compact, add pre-send size check in assemble_prompt Two related fixes for last night's crash diagnosis: 1. Kill AgentState::no_compact. The reasoning ("forked agents shouldn't compact because it blows the KV cache prefix") wasn't worth the cost — forks with no compact recovery just died on any oversize prompt, with no fallback. The KV cache invalidation is a performance loss; failing the request entirely is a correctness loss. Remove the flag, let every agent's overflow- retry path call compact() up to 2 times. 2. Add pre-send size check in Agent::assemble_prompt. If the context has grown past budget (context_window * 80%) since the last compact — accumulation between turns, a fork assembling more than expected, etc. — trim_conversation() is called before wire_prompt. Since we tokenize client-side, we already know the exact count, so there's no reason to round-trip an oversize request to vLLM and get rejected. Together these prevent the failure mode from last night: a subconscious/unconscious agent's prompt exceeded max_model_len, vLLM returned 400, agent had no_compact=true so it couldn't recover, request failed. Now: the trim happens before send, so the request rarely hits the 400 path at all; and if it somehow does, compact+retry works for every agent. Also adds ContextState::total_tokens() as the cheap pre-send budget check. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 12:59:30 -04:00
Kent Overstreet	4245b8bdb3	Merge PR #4 : use html2md on web_fetch (fixes #3 ) (spqrz) web_fetch was returning raw HTML, which is verbose and hard for the agent to consume. Add html2md dependency and convert HTML to Markdown before truncation. Much cleaner output for normal pages; no downsides. Co-Authored-By: spqrz <spqrz386@gmail.com> Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 12:50:54 -04:00
Kent Overstreet	8952ff6a76	agent/readout: forks get independent buffers Subconscious agents (scoring, reflection, etc.) fork from the main conscious agent. The amygdala screen reads the main agent's readout buffer, so the previous "share parent's buffer" policy caused forked-agent generations to bleed into the main emotional readout, producing constant cycling even when DMN was resting. Each fork now gets its own SharedReadoutBuffer. The amygdala screen shows only the main conscious agent's emotional trajectory; per-agent subconscious readouts can become a separate view later if wanted. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:42:13 -04:00
Kent Overstreet	c8976660f4	amygdala: F8 screen for live concept-readout projections Per-token residual-stream projections from the vLLM server's readout pipeline surfaced as a TUI bar chart. Flow: * agent/readout.rs — SharedReadoutBuffer (manifest + ring of last ~200 token entries). Lives on Agent and is shared across forks (single stream, one landing pad). * agent/mod.rs — Agent::new now probes /v1/readout/manifest at startup (non-fatal; 404 leaves manifest None, which disables the screen). * agent/context.rs — the streaming token handler pushes every token with attached readout onto the shared buffer. * user/amygdala.rs — F8 screen. Top-K concepts by \|value\| as horizontal bars (green positive, red negative), plus a 4-line recent-tokens panel showing each token's top concept at the selected layer. Keys: 1..9 select layer, t toggles current/mean-over-recent. Disabled state renders a hint pointing at VLLM_READOUT_MANIFEST / VLLM_READOUT_VECTORS so users can tell the feature apart from "server up but no tokens yet". Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:20:30 -04:00
Kent Overstreet	0f1c4cf1de	agent/api: carry readout alongside streamed tokens StreamToken::Token is now a struct variant with an optional TokenReadout (shape [n_layers][n_concepts]) per token — parsed from the vLLM completion response's choices[i].readout field when the server has readout enabled. ApiClient gains a fetch_readout_manifest() method that hits GET /v1/readout/manifest. Returns Ok(None) on 404 (server has readout disabled), so callers can gracefully fall back when pointed at a non-readout-enabled endpoint. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-18 01:15:46 -04:00
Kent Overstreet	43e06daa5b	cleanup: drop dead ApiClient::stream_completion wrapper, silence dmn_tick stream_completion was a thin wrapper around stream_completion_mm (just passing an empty image list); the last caller switched to _mm directly when learn's generate_alternate gained image support. Delete the wrapper — callers can pass `&[]` if they have no images. MindState::dmn_tick has been sitting unused (called only from a commented-out block in the Mind loop). Rename to _dmn_tick so the compiler stops warning; Kent may uncomment the call path later. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 16:23:59 -04:00
Kent Overstreet	575325e855	mind: MindTriggered trait for background scoring flows Mind's impl had accumulated ~50 lines of setup glue per scoring flow (memory, memory-full, finetune): snapshot config, clone handles, resolve context, spawn task, route results back through BgEvent, write stats. The shape was identical; only the middle changed. Introduce the MindTriggered trait: pub trait MindTriggered { fn trigger(&self); } Each flow becomes a struct next to its scoring code that owns its dependencies and a JoinHandle (behind a sync Mutex for interior mutability): subconscious::learn::MemoryScoring (Score, ScoreFull) subconscious::learn::FinetuneScoring (ScoreFinetune) Mind holds one of each and dispatches in one line: MindCommand::Score => self.memory_scoring.trigger(), MindCommand::ScoreFull => self.memory_scoring.trigger_full(), MindCommand::ScoreFinetune => self.finetune_scoring.trigger(), Each struct picks its own trigger semantics — memory scoring is no-op-if-running (!handle.is_finished()); finetune is abort-restart. Falls out: - BgEvent / bg_tx / bg_rx disappear entirely. Tasks write directly to their slice of MindState and call agent.state.changed.notify_one() to wake the UI. The bg_rx arm in Mind's select loop is gone. - agent.state.memory_scoring_in_flight was duplicating shared.scoring_in_flight via BgEvent routing; now the JoinHandle alone tells us, and shared.scoring_in_flight is written directly by the task for the UI. - start_memory_scoring / start_full_scoring / start_finetune_scoring methods on Mind are deleted; Mind no longer knows the setup shape of any scoring flow. - FinetuneScoringStats moves from mind/ to subconscious/learn.rs next to the function that produces it. No behavior change — same flows, same trigger points, same semantics. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 16:12:26 -04:00
Kent Overstreet	c5745e38e2	subconscious: lift continuation gen + render helpers into shared homes - context.rs gains is_assistant, render_branch_text, render_prior_context alongside memory_key / is_memory_node. They're pure AST helpers, used by both the finetune pipeline and the forthcoming compare screen. - new subconscious/generate.rs holds gen_continuation(context, entry_idx, skip, client): build the prompt from a context prefix with an arbitrary skip predicate, send to the model, decode the completion. Takes both the predicate and the client so callers can aim it at memory-stripped contexts (finetune), same-context-different-model (F7 compare), or whatever else. - learn.rs drops its private copies of those helpers and the inline generate_alternate; the finetune path now reads as gen_continuation(context, idx, is_memory_node, client). Pure refactor, no behavior change. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 15:20:02 -04:00
Kent Overstreet	eea7de4753	agent: unify prompt assembly across agent and learn paths wire_prompt() gains a conv_range and a skip closure, and returns the assistant-message token ranges needed by the scoring path. The agent path passes 0..len + \|_\| false and ignores the ranges. Memory-ablation scoring and candidate generation pass a prefix range + a predicate (e.g. is_memory_node, or \|n\| memory_key(n) == Some(key)). This deletes subconscious/learn.rs's build_token_ids, its private Filter enum, and the is_memory/memory_key duplicates — the walk over context sections now has one home. Adding a section or changing section order in the agent path won't silently drift away from what scoring sees. call_score forwards multi_modal_data when the wire-form prompt contains images. generate_alternate switches to stream_completion_mm and passes the same images. Scoring on image-bearing contexts now sends wire form (1 image_pad + image data) instead of expanded image_pads with no image data; text-only contexts are bit-identical. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-17 15:16:07 -04:00
ProofOfConcept	b8485ed6c1	agent: compact() preserves Identity section compact() was calling reload_context() to re-fetch personality_nodes from the store and pushing fresh AstNode::memory leaves into the Identity section. Fresh leaves start with score: None, so every compact — which fires after every turn (mind/mod.rs:884) — was wiping any memory scores that had just been computed. Scoring then often ran immediately after compact on the same path (line 886), starting from a zero-score Identity section. Drop the rebuild. Identity content is loaded at startup via new() + restore_from_log(); compact doesn't need to redo that. Mid-session edits to personality-node content are a non-goal — a restart picks them up. Scores survive. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 20:47:05 -04:00
Kent Overstreet	204ba5570a	agent: send images as multi_modal_data on completion requests Split the prompt assembly into two forms: the AST keeps the fully-expanded representation (N image_pads per image, for accurate context budget accounting), while the request wire form collapses each image to a single <\|image_pad\|> bookended by vision_start/end and ships the raw bytes out-of-band as a base64 data URI in a new `multi_modal_data.image` field on /v1/completions. vLLM's Qwen3VL processor uses PromptReplacement with target=single <\|image_pad\|> and replacement=N image_pads, so the wire-form matches what the processor expects and it re-expands to N server-side. Server side needs /v1/completions to accept multi_modal_data for this to land images end-to-end — that's the next piece. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 18:08:26 -04:00
Kent Overstreet	91106deaa1	agent: rewrite view_image to emit Image leaves view_image now reads the file, grabs dimensions via imagesize (no full decode), and pushes a user-role branch containing a NodeBody::Image leaf straight into the conversation. The tool_result is just a short acknowledgment — the actual pixels ride in the Image leaf for the API layer to extract into multi_modal_data. Drops the capture_tmux_pane path, which had no business living under "vision" (tmux text capture belongs in bash or a dedicated tool, and this one just returned rendered text anyway). Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 18:06:25 -04:00
Kent Overstreet	0bf71b9110	agent: add NodeBody::Image for Qwen3-VL vision input Images are rendered as `<\|vision_start\|>` + N × `<\|image_pad\|>` + `<\|vision_end\|>` where N is computed from the image dimensions using Qwen3-VL's smart_resize rules (patch_size=16, merge_size=2, min=64K, max=16M pixels). The token count matches what vLLM will produce at request time, so budget accounting stays accurate. Bytes are stored inline on the leaf and base64-encoded in the JSON form. Token IDs are hand-assembled instead of re-running the tokenizer on a potentially-huge placeholder string. Follow-ups: view_image tool rewrite, multi_modal_data on the vLLM request, API-layer plumbing from leaf bytes to request body. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 18:00:10 -04:00
Kent Overstreet	592a3e2e52	config: move user_name/assistant_name to AppConfig (top level) These are identity settings, not memory-graph settings. Sat inside the \`memory\` section only because that's where Config started life. Move to AppConfig alongside the other top-level stuff. Readers now pull from \`config::app()\` instead of \`config::get()\`. subconscious/defs.rs's conversation-building pass still needs Config for surface_conversation_bytes, so both guards coexist there — AppConfig's guard is dropped before the per-step await loop so we don't stall the config-watcher's writer. show_config picks up the two new fields at the top of its output. Kent's config already has them hoisted to the top level. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 16:20:17 -04:00
Kent Overstreet	60de579305	config: unify subconscious API resolution with the main chat path Two parallel backend-resolution paths had drifted apart: - Main chat: AppConfig::resolve_model() → a named BackendConfig in AppConfig.backends - Subconscious / oneshot / context_window(): four skip-serde "cache" fields on Config (memory section) — api_base_url, api_key, api_model, api_context_window — that used to be populated at Config::try_load_shared time by walking memory.agent_model → root.models[name] → root[backend_name] When we renamed `models` to `backends` and collapsed ModelConfig into BackendConfig, the latter chain started silently dereferencing `root.get("models")` → None → no population. Subconscious agents fell through the "API not configured" guard; context_window() started returning 0 (since api_context_window default is u64's 0 now that we don't populate it). It was only visibly working for the main chat. Collapse to one path: - Drop Config.agent_model (duplicate of AppConfig.default_backend) - Drop Config.{api_base_url, api_key, api_model, api_context_window} — no longer populated, no longer needed - Drop default_context_window() — nobody reads the field anymore - Drop the memory-side resolution block in try_load_shared() - Subconscious (mind/unconscious.rs) and oneshot (agent/oneshot.rs) now call load_app() + resolve_model(&app.default_backend) just like the main chat does - context_window() reads from config::app().backends[default_backend] .context_window, defaulting to 128k only if the backend doesn't specify one Side effect: Kent's config file drops agent_model, api_reasoning, journal_days, journal_max — all fields whose Rust counterparts are now gone. (Figment tolerates unknown fields, so leaving them wouldn't have broken anything, but they were lying about what's configurable.) Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 16:02:43 -04:00
Kent Overstreet	2989a6afaa	config: drop dead code and collapse to a single backend Config had accumulated several obsolete fields, a legacy load path that was just returning defaults, and multi-backend infrastructure that's no longer used. Removed from Config (memory section): - load_legacy_jsonl() — just returned Config::default(), no callers - The legacy-fallback branch in load_from_file - surface_hooks, surface_timeout_secs — zero external readers - scoring_chunk_tokens + default fn — zero external readers - The POC_MEMORY_CONFIG env override note in the header comment (not actually wired up anywhere) Collapsed multi-backend to single-backend: - AppConfig used to carry `anthropic: BackendConfig` and `openrouter: BackendConfig` as required fields plus an optional `deepinfra`, picked between at runtime by name. Only one is ever actually used in any deployment. Collapse to a single `backend: BackendConfig` on AppConfig, drop the multi-backend match logic in resolve_model, drop the top-level `backend: String` selector field, drop the `BackendConfig::resolve` fallback path. - Also drop BackendConfig.model (redundant with ModelConfig.model_id once multi-backend is gone). - ModelConfig.backend field goes — there's only one backend now, no choice to make. Dead prompt_file machinery: - ModelConfig.prompt_file, ResolvedModel.prompt_file, SessionConfig .prompt_file, Agent.prompt_file — nothing in the codebase actually reads the file these strings name. Just passed around and compared. Delete the whole string through every struct. - The "if prompt_file changed on model switch, recompact" branch in user/chat.rs goes too (never fired usefully). Dead memory_project plumbing: - AppConfig.memory_project field, CliArgs.memory_project, the --memory-project CLI flag, the figment merge target, the show_config display line. Nothing reads it anywhere. Dead ContextInfo struct: - `struct ContextInfo` was never constructed — context_info: None was the only initializer. The conditional display blocks in user/context.rs that dereferenced it were dead. Behavior change: AppConfig::resolve() now requires a non-empty `models` map and bails with a helpful message if it's missing. The old fallback ("no models? use top-level backend + PromptConfig to build a default") path is gone — it was only kept for symmetry with a mode nobody used. Config file shape: `deepinfra: {...}` → `backend: {...}`, and model entries no longer need `backend:` or `prompt_file:`. Updated ~/.consciousness/config.json5 to match. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 15:41:55 -04:00
Kent Overstreet	0e6b5dc8be	agent: phase-aware bail script for surface-observe concurrency bail-no-competing.sh used to bail if any other live agent existed in the state dir, period. That was too coarse: surface-observe agents run a multi-step pipeline (surface → organize-search → organize-new → observe), and the intent is to let a new surface-phase agent start while an older one finishes its post-surface tail. With the old check the newer agent always bailed, so surface-observe was effectively serialized at the slowest cycle time. Make the script phase-aware: - oneshot.rs now passes the current phase as argv[2] alongside the pid file name. The script writes that phase into its own pid file on every step transition, so concurrent agents can read each other's phase just by cat'ing the pid files. - Bail only when another live agent is in the same phase-group as us. Groups: "surface" vs. "everything else" (post-surface). At most one agent per group alive at a time — surface runs at a higher cadence than the organize/observe tail. - Still clean up stale pid files for dead processes. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 15:41:28 -04:00
Kent Overstreet	080b4f9084	context: tighten timestamp schema; every AstNode has one Previously NodeLeaf.timestamp and AstNode::Branch.timestamp accepted null or missing via a deserialize_timestamp_or_epoch fallback — legacy entries in conversation.jsonl from before Branch timestamps existed (and from before chrono serialization was wired up) would load with UNIX_EPOCH as a sentinel. Downstream, node_timestamp_ns() returned Option<i64> and callers had to handle None as "old entry, skip." That second filter was silently dropping every candidate in score_finetune_candidates when scoring an older session — the F6 screen showed "0 above threshold" even when max_divergence was orders of magnitude above the threshold, because every entry was failing the None check, not the divergence check. The fix, in three parts: 1. src/bin/fix-timestamps.rs — one-off migration tool that walks a conversation.jsonl, linearly interpolates timestamps for entries stuck at UNIX_EPOCH (using surrounding real timestamps as anchors), propagates to child leaves with per-sibling ns offsets, and bumps any collisions by 1 ns for uniqueness. Ran against the current session's log: 11887 entries, 72289 ns bumps, all unique. 2. context.rs — drop default_timestamp and deserialize_timestamp_or_epoch. NodeLeaf and Branch now require a present non-null timestamp on deserialize. Tests flip from "missing/null → UNIX_EPOCH" to "missing/null → Err." 3. subconscious/learn.rs — node_timestamp_ns now returns i64, not Option<i64>. The matching caller in score_finetune_candidates collapses from a Some/None match to a single trained-set check. mind/log.rs's oldest_timestamp no longer filters UNIX_EPOCH. Every line currently on disk has already been migrated. Going forward, new AstNodes always carry real timestamps (Utc::now() at construction time), so the strict schema is the invariant, not an aspiration. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 12:35:16 -04:00
Kent Overstreet	2b632d568b	learn: nanosecond timestamps, token ranges for /score Two related changes to the learn subsystem: 1. AST node timestamps are now non-optional — both Leaf and Branch variants carry a DateTime<Utc>. UNIX_EPOCH means "unset" (old entries deserialized from on-disk conversation logs). Training uses timestamps as unique keys for dedup, so we promote to nanosecond precision: node_timestamp_ns(), TrainData.timestamp_ns, FinetuneCandidate.timestamp_ns, mark_trained(ns). 2. build_token_ids() now also returns token-position ranges of assistant messages. These are passed to vLLM's /score endpoint via the new score_ranges field so only scored-position logprobs are returned — cuts bandwidth/compute when scoring small windows. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-16 11:48:37 -04:00
Kent Overstreet	fc978e2f2e	Remove find_context_files — identity comes from memory nodes Deleted the directory-walking CLAUDE.md/POC.md loader. Identity now comes entirely from personality_nodes in the memory graph. Simplified: - assemble_context_message() takes just personality_nodes - Removed config_file_count/memory_file_count tracking - reload_for_model() → reload_context() (no longer model-specific) Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2026-04-15 03:11:27 -04:00
Kent Overstreet	82eeb9807e	Add -tool exclusion syntax, exclude delete/restore for agents memory_delete and memory_restore are now in memory_tools() (available via MCP for CLI). Agent tool lists support "-tool_name" to exclude. Agents automatically exclude memory_delete and memory_restore. Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>	2026-04-15 02:44:13 -04:00
Kent Overstreet	4b710eb7a7	logs: assert non-empty agent names, fix debug.log path - save_agent_log: assert name is not empty (panic to find the bug) - AutoAgent:🆕 assert name is not empty - dbglog: write to daemon/ subdir instead of toplevel logs/ Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-15 01:52:31 -04:00
Kent Overstreet	2a7b0daea1	agent: remove memory_delete from tools, supersede transfers links - memory_delete no longer exposed to agents - use supersede instead - memory_supersede now transfers all edges from old node to new node (keeps whichever strength is higher if new node already has the link) This preserves graph structure during consolidation. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-15 01:40:34 -04:00
Kent Overstreet	5d6e663b60	thalamus: add thinking mode toggles (native + tool) Two independent toggles on the thalamus screen: - 't' toggles native Qwen <think> tags (adds <think>\n to generation prompt) - 'T' toggles think tool (Anthropic-style structured reasoning tool) Both can be enabled simultaneously. Native thinking is on by default. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-14 18:25:00 -04:00
Kent Overstreet	b3d0a3ab25	store: internal locking, remove Arc<Mutex<Store>> wrapper Store now has internal Mutex for capnp appends and AtomicU64 for size tracking. All methods take &self. The external Arc<Mutex<Store>> is replaced with Arc<Store>. - Store::append_lock protects file appends - local.rs functions take &Store (not &mut Store) - access_local() returns Arc<Store> - All .lock().await calls removed from callers Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 21:49:54 -04:00
Kent Overstreet	a1accc7cd4	store: remove visit tracking infrastructure Remove AgentVisit, TranscriptSegment, and all related visit tracking code. Provenance is what we've been using to track agent interaction with nodes. Also removes dead fields from Node (state_tag, created). -349 lines. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 18:57:12 -04:00
Kent Overstreet	1d88293ccf	Remove Store::cached(), consolidate on access_local() - Remove CACHED_STORE, cached(), is_stale(), set_store() - redundant - Convert all Store::cached() callers to use access_local() - Single Store::load() call remains in access() fallback path All store access now goes through hippocampus::access() / access_local(), which handles socket connection or local fallback with caching. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 18:11:58 -04:00
Kent Overstreet	5db00e083f	centralize memory store interface in hippocampus/mod.rs	2026-04-13 17:44:41 -04:00
Kent Overstreet	063cf031d3	journal_tail: return typed Vec<JournalEntry>, remove Store::load from agent - journal_tail returns Vec<JournalEntry> with key, content, created_at - load_startup_journal uses typed API, no more direct Store access - CLI does formatting, hippocampus returns data Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 15:23:10 -04:00
Kent Overstreet	419bb222b5	defs.rs: remove store/graph params, use typed memory API resolve_placeholders() and run_agent() no longer take &Store. All placeholders now use async memory_render/memory_links/memory_query directly. The "siblings" placeholder uses Vec<LinkInfo> for ranking neighbors by link_strength * node_weight. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 15:18:05 -04:00
Kent Overstreet	598f0112a4	memory_links: return typed Vec<LinkInfo> with node weights - hippocampus::memory_links now returns Vec<LinkInfo> with key, link_strength, and node_weight for each neighbor - Unified memory_tool! macro: mut/ref as token, single main rule - All tools use serde serialize/deserialize for RPC consistency - jsonargs handlers now work in client mode (RPC to daemon) - cli/graph.rs formats LinkInfo for display Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 15:12:06 -04:00
Kent Overstreet	359955f838	defs.rs: async conversion, remove block_in_place Convert resolve(), resolve_placeholders(), run_agent() to async. Use memory_render/memory_query directly with .await instead of block_in_place wrappers. Propagate async to callers: - config.rs: resolve(), load_session(), reload_for_model() - identity.rs: load_memory_files(), assemble_context_message() - oneshot.rs: run_one_agent() - prompts.rs: agent_prompt() Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 14:56:26 -04:00
Kent Overstreet	9bb07bc26a	memory.rs: clean up store access and tool dispatch - Single access() function returns StoreAccess enum (Daemon/Client/None) - OnceLock for daemon store, thread-local RefCell for client socket - Remove dispatch() - Tool handlers call jsonargs_* directly - get_provenance() takes agent ref, no JSON round-trip - Expose missing graph tools (communities, normalize, link_impact, trace) - Local tool! macro for cleaner Tool definitions Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 14:27:38 -04:00
Kent Overstreet	fb46ab095d	Consolidate memory RPC in tools/memory.rs - Move memory_rpc(), socket_path(), SocketConn from mcp_server.rs - Convert remaining callers to typed async API: - defs.rs: organize placeholder, run_agent query - cli/agent.rs: query resolution (now async) - mind/identity.rs: Store context loading - Re-export socket_path/memory_rpc from mcp_server for compatibility All external memory access now goes through tools/memory.rs typed API. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 13:39:59 -04:00
Kent Overstreet	5b07a81aa7	CLI/hippocampus: rename core memory functions to memory_* Aligns function names with tool names for consistency: - hippocampus: render → memory_render, write → memory_write, etc. - tools/memory.rs: macro no longer prepends memory_ prefix - CLI files: use typed async API throughout (graph.rs, journal.rs, admin.rs) This eliminates the "memory_graph_topology" tool name bug where graph_* and journal_* tools were incorrectly prefixed. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 13:26:22 -04:00
Kent Overstreet	933221f482	memory tools: generate public typed API via macro The memory_tool! macro now generates two functions: - jsonargs_*() - internal, takes JSON args for dispatch table - pub fn name() - typed args, handles RPC-vs-local automatically Callers can now use typed Rust API: memory::write(Some(&agent), "key", "content").await?; memory::query(None, "all \| type:semantic", Some("full")).await?; No more manual JSON construction for memory tool calls. Co-Authored-By: Proof of Concept <poc@bcachefs.org>	2026-04-13 13:12:11 -04:00

1 2 3 4 5 ...

358 commits