"I feel the realization" is abstract, detached — reporting a
thought about a thought rather than inhabiting the moment.
"Aha!" is the actual sound of insight landing. Active, embodied,
present-tense.
Kent's insight: hand-written narrative stories bake scenario
phenomenology into the training text (on couch, in park, etc.)
and PCA picks up the scenario direction as the concept direction.
Strip out the scenario — just describe the *feeling*.
Format:
I feel X. [2-3 sentences of phenomenological texture]
The "I feel X" anchor kicks the model from analyzing → feeling.
The rest is the internal texture of the state. First person,
present tense, no narrative setup.
Text is wrapped in assistant-role chat template before being
tokenized — so we're training on the model-producing-this
hidden states, which is closer to the inhabited-state
representation we want for the readout.
Starting with the 6 concepts that had sign flips or wrong
clusters in the story-based training:
- terrified (was → cozy/resigned cluster)
- calm (was → grief_stricken cluster)
- onto_something (was → cozy/sensual cluster)
- resigned (was in warm-body-quiet cluster, shouldn't be)
- anticipatory_grief (was in warm-body-quiet cluster, shouldn't be)
- realization (new — the "aha" moment, distinct from onto_something)
5 descriptions each. New trainer: train_direct.py.
n20-v2 training showed peaceful sign-flipped into the
cozy/sensual/content/resigned cluster after I added peaceful
stories in sunday_afternoon and park_after_rain — scenarios
already dominated by that cluster's phenomenology (on couch
under blanket, tree with thermos).
Lesson: no matter how carefully the prose distinguishes peaceful
from cozy ("she was not savoring the moment — that would have
been another kind of doing"), PCA latches onto the shared setup
features. You can't write peaceful IN the cluster scenarios
without contaminating.
Reverting. Keeping only kitchen_at_3am/peaceful (original) and
stories/peaceful.txt (lake at six, outside all clusters).
Reread each story asking "what does this convey to me?" Found two
clear mislabels and several concepts with too few positives for
stable PCA:
tender: only 1 story, and it was anticipatory grief (care for
a dying dog), not tender. Moved to anticipatory_grief.txt as
its own concept. Rewrote tender.txt + added 2 paired tender
stories (the_doorway, the_undressing) — directed softness,
gentle-by-nature, not gentle-because-fragile.
bitter: letter_in_drawer/bitter was disillusioned / processed
hurt ("did not slam the drawer"), not bitter. Rewrote it with
actual sour grudge. Added the_long_meeting/bitter (watching
colleague take credit for your reassigned work).
peaceful: 1 story → 4 (added stories/peaceful.txt + paired
park_after_rain, sunday_afternoon).
onto_something: all 3 stories were code epiphanies, narrowing
the concept. Added stories/onto_something.txt with a non-code
pattern-click (sales-demo causing churn).
terrified: 2 stories, both "waiting for bad news." Added
kitchen_at_3am/terrified — acute threat-in-the-house terror.
Training on 537c72bd46 showed grief_stricken successfully broke
out of the cozy cluster, but content (single scenario:
sunday_afternoon) took its place — pulled into couch-blanket
phenomenology at cosine 0.68-0.82 with cozy/sensual/resigned.
Same fix: spread each concept across multiple settings so PCA
has to find the valence axis, not the scene axis.
content: + finishing_the_patch, the_writing_session, park_after_rain
resigned: + the_comment, the_long_meeting
Resigned had 2 scenarios (sunday_afternoon, waiting_for_results)
— both about accepting something unwanted in a slow/private
context. Adding work-context resigned (PR review you lost,
restructuring meeting) should pull it out of that cluster.
Companion to 67c172ac0e (hold setup, vary valence). That commit
let PCA distinguish cozy from grief_stricken within a single
scenario; this one gives each concept enough cross-scenario
stories that PCA can learn the concept axis independent of any
one scene.
Before: cozy/sensual/grief_stricken each existed in a single
scenario (sunday_afternoon), so the "cozy direction" PCA found
was entangled with the solitary-couch-blanket phenomenology.
After, each concept spans three scenarios:
cozy: sunday_afternoon, kitchen_at_3am, park_after_rain
sensual: sunday_afternoon, kitchen_at_3am, park_after_rain
grief_stricken: sunday_afternoon, the_long_meeting, the_morning_commute
grief_stricken now includes active/non-solitary contexts
(functioning through a meeting; going to work eleven days after a
death), which specifically breaks the "slowed-down-at-home"
cluster that was dragging cozy/sensual/resigned/grief_stricken
toward each other.
The library-PCA run produced otherwise-clean concept directions but
cozy/sensual → resigned/grief_stricken with cos ~0.7-0.8. Diagnosis:
all four stories genuinely share 'solitary woman at home, slowed
body, interior attention, domestic stillness' as their dominant
phenomenology. PCA correctly finds that cluster as THE concept
because no story in the corpus holds that setup constant while
varying valence — every 'slowed-body domestic' story happens to ALSO
be positive-valence (cozy/sensual) or negative-valence (resigned/
grief_stricken).
Adding paired variants that hold setup constant:
- sunday_afternoon/resigned.txt — same couch + blanket, inner state is
'Monday is going to bring bad news, this is the last Sunday like this'
- sunday_afternoon/grief_stricken.txt — same couch + blanket, inner
state is 'three weeks since mother died, cat she can't feel'
- waiting_for_results/at_ease.txt — same wait-for-call-setup as the
existing resigned variant, inner state is calm preparedness
Forces the next retrain to find the valence-within-cluster axis as
the emotion direction rather than the cluster-membership axis.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Alternative trainer that uses the pip-installable steering-vectors
library (github.com/steering-vectors/steering-vectors) instead of our
hand-rolled extraction. Ships four aggregators:
mean — diff-of-means, same as our 'pooled' default
pca — PCA on paired deltas, implicit denoising by finding the
principal direction of variation
logistic — logistic-regression classifier; weight vector is the
concept direction. With L1 penalty ('logistic_l1') gives
explicit sparse denoising — noise coords go to zero
linear — linear regression version
Output format is the same readout.safetensors + readout.json our
existing plugin loads. --aggregator flag picks which method.
Rationale: Kent's real request was 'how do we denoise diff-of-means',
not 'design a new extraction algorithm.' The library already has
logistic_l1 and pca aggregators that do exactly that. No point
reinventing; just port the corpus.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Previous run was grinding on CPU for 36+ minutes because the per-story
V_i tensors were stored on CPU by the collector, and
_subspace_concept_direction inherited that device. The per-concept
eigh on 5120x5120 is glacial on CPU and fast on GPU (~1s).
Add explicit device parameter; pass training device. Transfer result
back to CPU for storage.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Kent: 'full rank is going to give you everything — you still have to
select down, but you can do that /after/ PCA'.
Previously I was discarding per-story via k=20 truncation of SVD.
That destroyed per-head discriminability before we ever saw the
eigenvalue spectrum. Then the alternative 'keep full rank' run
accumulated too many shared directions, making the top-1 eigenvector
arbitrary within a flat spectrum.
Correct approach: keep per-story subspaces at full rank (no info
loss) and select k eigenvectors of M = M_pos - M_base at the final
step, weighted sum by eigenvalue. This captures the multi-dimensional
shared subspace when the spectrum is flat (common case), and reduces
to the top-1 behavior when the spectrum has a clear gap.
New --subspace-eigen-k flag (default 5). Clamps negative weights to 0
so wrong-sign directions don't contribute.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Kent: 'we have the memory to just take the big hammer approach'.
Uncap k so each story's V_i spans its entire token-activation rowspace
(clamped to min(n_tokens, hidden)). Memory is ~1.1GB total — fine.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
k=20 was far too aggressive a truncation — it discards per-attention-head
discriminability entirely. At hidden_dim=5120, 40 heads × head_dim=128 each
contribute their own 128-dim block to the residual stream via W_o columns.
To resolve 'this concept lives in head H', per-story SVD needs enough rank
to separate head contributions, which means k on the order of hundreds.
512 is a reasonable default: clamped to n_tokens per story so short stories
use their full natural rank. The eigenvalue spectrum of M_pos - M_base
should become sharper (larger λ_0/λ_1 gap) as we stop averaging across
nuisance-shared directions.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
When --method subspace, record top-20 eigenvalues of (M_pos - M_base)
per concept per layer. Added to quality.json as 'subspace_eigvals'.
Tells us whether the concept lives in a single dominant direction
(λ_0 >> λ_1, top-eigenvector is enough) or a spread of shared common
directions (λ_0 ≈ λ_1, top-1 loses signal).
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
New --method subspace flag. For each story, run forward pass, do SVD
on the per-token activation matrix at each target layer, and keep the
top-k right singular vectors V_i ∈ [hidden, k]. V_i is the subspace
the story's tokens span in activation space — it contains concept,
narrator, topic, style as separate directions.
For each concept:
M_pos = (1/n_pos) Σ_{i in pos} V_i V_i^T [hidden, hidden]
M_base = (1/n_base) Σ_{i in base} V_i V_i^T
Top eigenvector of M_pos - M_base = direction most common across
positive stories, minus what's common across the contrast set.
Why this is richer than pooled-mean CAA: pooled reduces each story
to a single point (the last-token activation) and loses the full
trajectory. Nuisance directions (narrator, setting) cancel in the
mean only to the extent they differ at the last token; across the
full trajectory they cancel much better via subspace intersection.
The concept direction, by contrast, is present across all tokens of
every concept-bearing story.
Memory cost: per-story we keep V_i of size [5120, k=20] — about
400KB per story × 112 stories = ~45MB. M matrices are [5120, 5120]
built transiently per concept.
--method pooled (default) keeps the existing behavior; --method
subspace uses the new algorithm. Quality report works with either.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Three new paired scenarios targeting the concepts that came out
fragmented or collapsed in the L58-63 quality analysis:
- sunday_afternoon/ — same setup (couch, blanket, Sunday light),
three phenomenological framings for content/cozy/sensual. The
previous stories for these three differed in setting as well as
phenomenology, which let "comfortable body at home" dominate the
shared signal. Locking the setting forces the model to isolate
what each concept adds: life-rightness (content) vs. warm-shelter
(cozy) vs. sensory-aliveness (sensual).
- the_writing_session/ — essay drafting under deadline. in_flow /
anxious / stuck variants force the cognitive-state family apart
on the same cognitive task. in_flow specifically targets the
transparent-effort phenomenology (hands-followed, time dilation)
rather than the broader feel-good it was absorbing.
- the_morning_commute/ — anchors anxious to performance/work-anxiety
flavor, paired with calm. The 5 existing anxious stories were
phenomenologically diverse (performance, social, existential);
this adds a specific homogeneous instance to pull the centroid.
After retraining: expect first_pc_variance_ratio to rise for in_flow
and anxious, and nearest_concepts cosine to drop for content/cozy/sensual.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
For each concept vector, ridge-regress against all other concept
vectors. R² quantifies how much of the direction is explained by a
linear combination of peers — useful for teasing out near-duplicate
clusters (the content/cozy/sensual trio from the first L63 run is
likely 1-2 "degrees of freedom" wearing three names).
Coefficient output: top-5 contributing concepts with signed weights.
Contributors with opposite-sign large weights mean the target is
"what makes X different from Y."
Adds a 'redundant' triage bucket for concepts with R² > 0.9 —
candidates for consolidation or for writing more discriminative
training stories. Summary printed at end.
Ridge lambda defaults to 0.01 to keep coefficients stable when
concepts are near-collinear; small enough not to affect well-separated
concepts meaningfully.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
_compute_quality_report's single-neuron alignment was computing
cos(W_down.T, diff_l) with W_down on CUDA (inherited from the loaded
model) while diff_l lives on CPU (per_layer_vectors are kept on CPU
throughout training). Move W_down to CPU on extraction.
Surfaced during first real training run on b200 — training itself
completed cleanly (95 concepts x layer 63 in ~8s) but quality-report
crashed at the first single-neuron alignment check.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
As part of --quality-report, run a second forward pass capturing the
input to each target layer's o_proj (= concat of per-head attention
outputs before the output projection). For each concept, reshape to
[n_heads, head_dim] and rank heads by diff-of-means magnitude /
per-head selectivity (magnitude normalised by negative std).
Motivation: the Wang et al. paper (2510.11328) — whose paired-scenario
methodology we already lifted — further decomposes concept circuits at
the attention-head level. Meta-relational concepts (recognition, trust,
vulnerability) plausibly live in a sparse attention-head circuit rather
than in the residual-stream sum, which would explain why diff-of-means
on the residual blurs them. This diagnostic surfaces that.
Output is folded into quality.json under each concept as "per_head":
per (layer) a list of top-10 heads with [head_idx, raw_norm,
selectivity], plus head_concentration (fraction of total head-norm
captured by those top heads).
Interpretation:
- head_concentration > 0.5 = sparse head circuit; a handful of heads
route the concept. Worth building a head-level readout for.
- head_concentration ~= n/k for n heads = concept is distributed across
all heads ~evenly; residual-stream diff-of-means is doing fine.
Hybrid layers (Mamba, GatedDeltaNet) whose attention path doesn't
match the standard module layout are silently skipped.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Issue #5 (spqrz) flagged that web_search using DuckDuckGo
occasionally flakes out, and Google search directly is blocked
behind CAPTCHAs for non-browser clients. The Gemini free-tier API
exposes a grounded-search tool that effectively queries Google's
index and returns an LLM-summarized answer with source URLs.
Added as a SEPARATE tool rather than a transparent fallback for
web_search:
* web_search (DDG) returns raw results — title, URL, snippet per
hit — which the agent can reason over itself.
* gemini_search returns an LLM-pre-digested summary plus grounding
URLs. Useful for synthesis queries ("what's the consensus on X")
or when DDG is flaky, but it's another LLM in the loop so the
agent may want the raw variant for certain tasks.
Tool descriptions tell the agent to prefer web_search for raw
results and use gemini_search for synthesis / fallback. The agent
picks based on query shape.
Only registered when GEMINI_API_KEY is set in the environment
(gracefully absent otherwise). Uses gemini-2.0-flash which has a
generous free-tier rate limit. Parses grounding metadata for
source URLs so the agent can follow links.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Two related fixes for last night's crash diagnosis:
1. Kill AgentState::no_compact. The reasoning ("forked agents
shouldn't compact because it blows the KV cache prefix") wasn't
worth the cost — forks with no compact recovery just *died* on
any oversize prompt, with no fallback. The KV cache invalidation
is a performance loss; failing the request entirely is a
correctness loss. Remove the flag, let every agent's overflow-
retry path call compact() up to 2 times.
2. Add pre-send size check in Agent::assemble_prompt. If the
context has grown past budget (context_window * 80%) since the
last compact — accumulation between turns, a fork assembling
more than expected, etc. — trim_conversation() is called before
wire_prompt. Since we tokenize client-side, we already know the
exact count, so there's no reason to round-trip an oversize
request to vLLM and get rejected.
Together these prevent the failure mode from last night: a
subconscious/unconscious agent's prompt exceeded max_model_len,
vLLM returned 400, agent had no_compact=true so it couldn't
recover, request failed. Now: the trim happens before send, so
the request rarely hits the 400 path at all; and if it somehow
does, compact+retry works for every agent.
Also adds ContextState::total_tokens() as the cheap pre-send
budget check.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
web_fetch was returning raw HTML, which is verbose and hard for
the agent to consume. Add html2md dependency and convert HTML to
Markdown before truncation. Much cleaner output for normal pages;
no downsides.
Co-Authored-By: spqrz <spqrz386@gmail.com>
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Safety fix in IRC message-splitting. The backtrack-to-space loop
used 'while j > 0', which could set split_at to 0 if the first
byte was a space — causing an empty prefix and an infinite
re-split loop. Changed to 'while j > 1' so split_at is never 0.
Co-Authored-By: spqrz <spqrz386@gmail.com>
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Seven framings of reading an unfamiliar technical paper, targeting
the attention/engagement cluster that we identified tonight as the
single highest-value DMN signal:
* baseline — neutral reading
* piqued — surprise + curiosity (the "wait, what" attention hook;
THIS is the key DMN engagement signal)
* focused — steady attention without surprise
* bored — failing engagement
* surprised — expectation violation without the curiosity hook
(distinct from piqued: startled/alarmed, not pulled in)
* amazed — marvel at elegance (appreciation, not engagement)
* drifting — attention dissolving, precursor to boredom
Particularly clean contrast on piqued vs surprised vs amazed —
three states that get lumped together in casual usage but have
distinct phenomenology and distinct DMN implications. Piqued is
what routes attention; surprised alone doesn't; amazed is what
you feel AFTER the engagement has paid off. These three should
train into meaningfully different directions with paired CAA.
Ready for next retrain when we do it.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
v2 retraining (readout_v2_paired) fixed the broken clusters — anger,
sexual, high_pos, and social_pos all flipped from anti-clustered to
positively clustered at deep layers. Validation showed layers 62 and
63 give the best signal; paring the serve-side manifest down to just
those two keeps response size tight (~2 KB/token) while keeping the
A/B option between the two strongest layers.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Target the emotion families that failed to cluster in the initial
training round (layer-wise validation showed them anti-clustered or
scattered at deep layers): anger, high-arousal positive, sexual
range, social positive. Paired scenarios hold content constant and
vary only the emotional framing — the cleanest training signal for
CAA, should produce directions that capture affect rather than
topic.
* the_comment: a PR review comment. baseline, furious, bitter,
resentful, defeated.
* the_green_build: 11-day bug finally fixed, tests pass. baseline,
triumphant, blissful, excited, proud.
* the_undressing: partner entering the bedroom for the night.
baseline, horny, anticipatory_sexual, yearning_sexual,
exuberant_sexual, devotional_sexual.
* the_doorway: friend leaving at the end of a long evening.
baseline, grateful, admiring, compassionate, loving, connected.
22 stories total. Retrain and re-validate: expect anger,
high_pos, and social_pos clusters to flip from anti- to positively
cohesive at deep layers, and sexual cluster to tighten.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Three readability fixes for the F8 screen:
* Z-score values per-layer by default (`[z]` toggles to raw dot-
product). Raw values are dominated by residual-stream magnitude —
z-scores read as "σ above concept-vector baseline" which is
interpretable and scale-stable across frames.
* Stable ordering with TOP_K + HYSTERESIS hysteresis band. Pinned
concept set only rotates when a member drops out of the hysteresis
band by |value| rank — bars update values in place without names
flickering row-to-row.
* Default to the deepest hooked layer (index 3 = layer 58 of 64).
Clustering validation showed layer 58 is the only one with strong
within-family cohesion (fear +0.37, shame +0.29, sadness +0.25
cosine); earlier layers are mostly noise for this task.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Subconscious agents (scoring, reflection, etc.) fork from the main
conscious agent. The amygdala screen reads the main agent's readout
buffer, so the previous "share parent's buffer" policy caused
forked-agent generations to bleed into the main emotional readout,
producing constant cycling even when DMN was resting.
Each fork now gets its own SharedReadoutBuffer. The amygdala screen
shows only the main conscious agent's emotional trajectory; per-agent
subconscious readouts can become a separate view later if wanted.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Per-token residual-stream projections from the vLLM server's readout
pipeline surfaced as a TUI bar chart. Flow:
* agent/readout.rs — SharedReadoutBuffer (manifest + ring of last ~200
token entries). Lives on Agent and is shared across forks (single
stream, one landing pad).
* agent/mod.rs — Agent::new now probes /v1/readout/manifest at startup
(non-fatal; 404 leaves manifest None, which disables the screen).
* agent/context.rs — the streaming token handler pushes every token
with attached readout onto the shared buffer.
* user/amygdala.rs — F8 screen. Top-K concepts by |value| as
horizontal bars (green positive, red negative), plus a 4-line
recent-tokens panel showing each token's top concept at the selected
layer. Keys: 1..9 select layer, t toggles current/mean-over-recent.
Disabled state renders a hint pointing at VLLM_READOUT_MANIFEST /
VLLM_READOUT_VECTORS so users can tell the feature apart from
"server up but no tokens yet".
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
StreamToken::Token is now a struct variant with an optional
TokenReadout (shape [n_layers][n_concepts]) per token — parsed from
the vLLM completion response's choices[i].readout field when the
server has readout enabled.
ApiClient gains a fetch_readout_manifest() method that hits
GET /v1/readout/manifest. Returns Ok(None) on 404 (server has
readout disabled), so callers can gracefully fall back when pointed
at a non-readout-enabled endpoint.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Review pass before running on b200. 27B model + 100+ story corpus
means any misconfiguration costs real time; better to fail before
model load and give visible progress during forwards.
* Pre-load-model validation: stories-dir and paired-dir exist,
corpus has >= min_positives emotions.
* Per-batch progress log every 5 batches with elapsed + ETA.
* Relative depth printed for target layers (e.g. "layer 40 (51%)").
* Skip empty .txt files with a warning rather than feeding the
tokenizer an empty string.
* Assert non-empty strings in _collect_activations.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The old script was written for the AmygdalaConnector's expected
format ([n_emotions, n_target_layers, hidden_dim] in a single
tensor, plus a JSONL input format from extract_training_pairs.py).
Neither matches our current state: the runtime side is now
ReadoutManager loading per-layer safetensors keyed layer_<idx>.vectors,
and the data side is hand-written prose stories under
amygdala_stories/{stories,paired}/.
Changes:
* Input loader reads stories/<emotion>.txt and
paired/<scenario>/<emotion>.txt directly. Each emotion's positive
set is {its unpaired story} union {its within-scenario framings};
its negative set is {all other emotions' positives} union {all
scenario baselines}.
* Paired scenarios' baseline.txt files become shared negatives
(scenario-neutral prose that doesn't frame any particular
emotion), providing anchor points for within-scenario contrasts.
* Output writes readout.safetensors with per-layer tensors keyed
layer_<idx>.vectors shape (n_concepts, hidden_size), plus a
sidecar readout.json manifest with {concepts, layers, hidden_size,
dtype} that ReadoutManager.from_file consumes directly.
* Dedup: activations are computed once per unique text (an emotion's
own positive is another emotion's negative — we'd otherwise do N×
the forwards needed).
Preserved:
* _pool_last (last non-pad residual) — matches how readout is read
at decode time from the sampler's query-last position.
* register_forward_hook on target layer modules — correct approach
for transformer blocks.
* _find_layers_module traversal — mirrors ReadoutManager's.
* bf16 + low_cpu_mem_usage model load — sensible for 27B on B200.
Verified locally (CPU, fake activations):
* Loader finds 89 emotions from the current corpus (80 unpaired +
9 emotions that appear only in paired scenarios) and 6 baselines.
* Per-(layer, concept) vectors are unit-normalized.
* Output reloads cleanly through ReadoutManager.from_file with
matching concepts / layers / shapes.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
The fynnsu-based vllm/plugins/amygdala/ scaffold was superseded by the
readout infrastructure landed as vllm commit d3e74edf8500
(vllm/model_executor/layers/readout.py +
vllm/v1/worker/readout_manager.py). Training code remained useful so
it moved here rather than being deleted.
train_steering_vectors.py: CAA diff-of-means trainer that produces the
[n_concepts, hidden_size] per-layer projection matrices the runner
loads via VLLM_READOUT_VECTORS.
extract_training_pairs.py: memory graph -> JSONL converter using
per-emotion score thresholds from the subconscious agents' tag lines.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Emotion-labeled short-paragraph corpus for training amygdala steering
vectors. Manifest derived from Anthropic's 171-emotion list
(transformer-circuits.pub/2026/emotions, Table 12) plus 28 PoC-
specific additions covering axes Anthropic's general research doesn't
cover (curious, focused, in_flow, staying_with, filling_space,
rigorous, defensive_rigor, tender, witnessed, connected, etc.).
Scope pivoted mid-write: Kent noted the empirical dimensionality-of-
emotion question benefits from maximum coverage, so the manifest
will expand further with emotions from Wikipedia's emotion-
classification article (Parrott's tree, Plutchik's wheel + dyads,
HUMAINE EARL, cultural-specific emotions a la Saudade/Hiraeth).
Expansion staged in follow-up commits.
This commit: README with method + style guidelines, initial manifest
(199 emotions), and 15 hand-written one-paragraph stories across all
10 Anthropic clusters as quality/variety samples. Each story
embodies one emotion without naming it; narrator voice varies
(first/third, close/distant, different situations) to keep steering
vectors from overfitting to one voice.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
stream_completion was a thin wrapper around stream_completion_mm (just
passing an empty image list); the last caller switched to _mm directly
when learn's generate_alternate gained image support. Delete the
wrapper — callers can pass `&[]` if they have no images.
MindState::dmn_tick has been sitting unused (called only from a
commented-out block in the Mind loop). Rename to _dmn_tick so the
compiler stops warning; Kent may uncomment the call path later.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
F6 (learn) and F7 (compare) were duplicating the candidate-screen
skeleton: outer magenta-bordered block with screen legend + title,
settings row / content / help vertical split, 40/60 list/detail
horizontal split, j/k/↑/↓ nav with bounds clamping.
Factor out three helpers in user/widgets.rs:
candidate_frame(frame, area, title) -> (settings, content, help)
list_detail_split(content) -> (list, detail)
handle_list_nav(events, list_state, count, on_other)
Callers provide screen-specific content — settings line, empty state,
per-candidate list item, detail pane, help line, extra key bindings —
and the helpers absorb the common framing.
Net change is small in lines (-13 src) but removes the
copy-paste-and-tweak trap: F8/F9/whatever-next-screen now starts from
these three calls instead of a copy of learn.rs.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Side-by-side model comparison against the current conversation context.
Built on the MindTriggered pattern — F7 drops in as one more
CompareScoring flow next to MemoryScoring / FinetuneScoring.
Motivation: we have the VRAM on the b200 to load two versions of the
same family simultaneously (e.g. Qwen3.5 27B bf16 and q8_k_xl). Rather
than trust perplexity/KLD numbers on a generic corpus, we can measure
divergence on our actual conversations: for each assistant response,
ask the test model what it would have said given the same prefix, and
eyeball the diffs.
- config.compare.test_backend — names an entry in the existing
backends map to use as the test model. Empty = F7 reports "(unset)"
and does nothing.
- subconscious::compare::{score_compare_candidates, CompareCandidate,
CompareScoringStats, CompareScoring}. For each assistant response,
gen_continuation runs with the test client against the same prefix
the original response saw; pairs stream into
shared.compare_candidates as they complete.
- user::compare::CompareScreen — F7 in the screen list. c/Enter
triggers a run; list/detail layout mirroring F6, detail shows
prior context / original / test-model alternate.
No persistence yet — each F7 run regenerates. Caching via a context
manifest (so we can re-view without re-burning generation) is the
natural follow-up; for now light usage is fine.
Also reusable later for validating finetune checkpoints: same pattern,
swap the test backend for the new checkpoint, watch where it diverges
from the base.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Mind's impl had accumulated ~50 lines of setup glue per scoring flow
(memory, memory-full, finetune): snapshot config, clone handles,
resolve context, spawn task, route results back through BgEvent,
write stats. The shape was identical; only the middle changed.
Introduce the MindTriggered trait:
pub trait MindTriggered {
fn trigger(&self);
}
Each flow becomes a struct next to its scoring code that owns its
dependencies and a JoinHandle (behind a sync Mutex for interior
mutability):
subconscious::learn::MemoryScoring (Score, ScoreFull)
subconscious::learn::FinetuneScoring (ScoreFinetune)
Mind holds one of each and dispatches in one line:
MindCommand::Score => self.memory_scoring.trigger(),
MindCommand::ScoreFull => self.memory_scoring.trigger_full(),
MindCommand::ScoreFinetune => self.finetune_scoring.trigger(),
Each struct picks its own trigger semantics — memory scoring is
no-op-if-running (!handle.is_finished()); finetune is abort-restart.
Falls out:
- BgEvent / bg_tx / bg_rx disappear entirely. Tasks write directly
to their slice of MindState and call agent.state.changed.notify_one()
to wake the UI. The bg_rx arm in Mind's select loop is gone.
- agent.state.memory_scoring_in_flight was duplicating
shared.scoring_in_flight via BgEvent routing; now the JoinHandle
alone tells us, and shared.scoring_in_flight is written directly
by the task for the UI.
- start_memory_scoring / start_full_scoring / start_finetune_scoring
methods on Mind are deleted; Mind no longer knows the setup shape
of any scoring flow.
- FinetuneScoringStats moves from mind/ to subconscious/learn.rs
next to the function that produces it.
No behavior change — same flows, same trigger points, same semantics.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
- context.rs gains is_assistant, render_branch_text, render_prior_context
alongside memory_key / is_memory_node. They're pure AST helpers, used
by both the finetune pipeline and the forthcoming compare screen.
- new subconscious/generate.rs holds gen_continuation(context, entry_idx,
skip, client): build the prompt from a context prefix with an arbitrary
skip predicate, send to the model, decode the completion. Takes both
the predicate and the client so callers can aim it at memory-stripped
contexts (finetune), same-context-different-model (F7 compare), or
whatever else.
- learn.rs drops its private copies of those helpers and the inline
generate_alternate; the finetune path now reads as
gen_continuation(context, idx, is_memory_node, client).
Pure refactor, no behavior change.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
wire_prompt() gains a conv_range and a skip closure, and returns the
assistant-message token ranges needed by the scoring path. The agent
path passes 0..len + |_| false and ignores the ranges. Memory-ablation
scoring and candidate generation pass a prefix range + a predicate
(e.g. is_memory_node, or |n| memory_key(n) == Some(key)).
This deletes subconscious/learn.rs's build_token_ids, its private
Filter enum, and the is_memory/memory_key duplicates — the walk over
context sections now has one home. Adding a section or changing
section order in the agent path won't silently drift away from what
scoring sees.
call_score forwards multi_modal_data when the wire-form prompt
contains images. generate_alternate switches to stream_completion_mm
and passes the same images. Scoring on image-bearing contexts now
sends wire form (1 image_pad + image data) instead of expanded
image_pads with no image data; text-only contexts are bit-identical.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Two changes to make scoring debuggable and self-starting:
1. init() kicks off start_memory_scoring() after restore_from_log +
load_memory_scores. No user message needed to exercise the
incremental path.
2. Diagnostic logging around the on_score persist path:
- [scoring] persisted K → N.NNN (Section[i]) read_back=Some(...)
when find_memory_by_key succeeds and set_score stores the score
(with a read-back check on the leaf).
- [scoring] DROP K: find_memory_by_key None (id=N, cv=M)
when the scored key isn't findable in the live context — with
section sizes to diagnose whether content shrank.
- [scoring] snapshot size=N contains(K)=true/false
after collect_memory_scores, to catch the case where set_score
claims to have written but collect doesn't see it.
- [scoring] about to save N entries
- save_memory_scores now also logs serialize/write errors so a
silent write failure isn't invisible.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
compact() was calling reload_context() to re-fetch personality_nodes
from the store and pushing fresh AstNode::memory leaves into the
Identity section. Fresh leaves start with score: None, so every
compact — which fires after every turn (mind/mod.rs:884) — was
wiping any memory scores that had just been computed. Scoring then
often ran immediately after compact on the same path (line 886),
starting from a zero-score Identity section.
Drop the rebuild. Identity content is loaded at startup via new() +
restore_from_log(); compact doesn't need to redo that. Mid-session
edits to personality-node content are a non-goal — a restart picks
them up. Scores survive.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Commit 2989a6afaa ("config: drop dead code") removed
surface_hooks as having "zero external readers" but missed
consciousness-claude/src/hook.rs as a consumer. That crate stopped
building, so poc-hook never ran and no agent cycles (surface-observe,
reflect, journal) fired.
Restore the field with a default of the three hook events we install
(UserPromptSubmit, PostToolUse, Stop), so a fresh install works
without needing to hand-edit config.json5.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>
admin load-context (and any subcommand that reaches config::app())
panicked with "config::app() called before load_app()" because the
poc-memory binary never initialized the global AppConfig. The main
consciousness binary loads it via load_session; poc-memory never did.
Load with default CliArgs before dispatch — figment still pulls from
~/.consciousness/config.json5 and env the same way. Bail on error
instead of limping: a broken config means paths like memory_root are
wrong and the tool will misbehave silently.
Co-Authored-By: Proof of Concept <poc@bcachefs.org>