consciousness/training/amygdala_stories/paired
ProofOfConcept 00a2cdce09 amygdala stories: relabel + strengthen weak-signal concepts
Reread each story asking "what does this convey to me?" Found two
clear mislabels and several concepts with too few positives for
stable PCA:

  tender: only 1 story, and it was anticipatory grief (care for
    a dying dog), not tender. Moved to anticipatory_grief.txt as
    its own concept. Rewrote tender.txt + added 2 paired tender
    stories (the_doorway, the_undressing) — directed softness,
    gentle-by-nature, not gentle-because-fragile.

  bitter: letter_in_drawer/bitter was disillusioned / processed
    hurt ("did not slam the drawer"), not bitter. Rewrote it with
    actual sour grudge. Added the_long_meeting/bitter (watching
    colleague take credit for your reassigned work).

  peaceful: 1 story → 4 (added stories/peaceful.txt + paired
    park_after_rain, sunday_afternoon).

  onto_something: all 3 stories were code epiphanies, narrowing
    the concept. Added stories/onto_something.txt with a non-code
    pattern-click (sales-demo causing churn).

  terrified: 2 stories, both "waiting for bad news." Added
    kitchen_at_3am/terrified — acute threat-in-the-house terror.
2026-04-18 23:19:00 -04:00
finding_the_abstraction amygdala: quality-report + cognitive-state training scenarios 2026-04-18 20:31:39 -04:00
finishing_the_patch amygdala stories: give content + resigned more settings 2026-04-18 22:52:07 -04:00
kitchen_at_3am amygdala stories: relabel + strengthen weak-signal concepts 2026-04-18 23:19:00 -04:00
letter_in_drawer amygdala stories: relabel + strengthen weak-signal concepts 2026-04-18 23:19:00 -04:00
park_after_rain amygdala stories: relabel + strengthen weak-signal concepts 2026-04-18 23:19:00 -04:00
reading_unfamiliar_code amygdala: quality-report + cognitive-state training scenarios 2026-04-18 20:31:39 -04:00
sunday_afternoon amygdala stories: relabel + strengthen weak-signal concepts 2026-04-18 23:19:00 -04:00
the_comment amygdala stories: give content + resigned more settings 2026-04-18 22:52:07 -04:00
the_doorway amygdala stories: relabel + strengthen weak-signal concepts 2026-04-18 23:19:00 -04:00
the_green_build training/amygdala_stories: add 4 paired scenarios for weak clusters 2026-04-18 02:19:39 -04:00
the_long_meeting amygdala stories: relabel + strengthen weak-signal concepts 2026-04-18 23:19:00 -04:00
the_morning_commute amygdala stories: hold concept, vary setting 2026-04-18 22:44:53 -04:00
the_paper training: add the_paper paired scenario for attention-engagement axis 2026-04-18 03:24:20 -04:00
the_undressing amygdala stories: relabel + strengthen weak-signal concepts 2026-04-18 23:19:00 -04:00
the_writing_session amygdala stories: give content + resigned more settings 2026-04-18 22:52:07 -04:00
tracing_a_bug amygdala: quality-report + cognitive-state training scenarios 2026-04-18 20:31:39 -04:00
waiting_for_results amygdala stories: held-setup + varied-valence disambiguation 2026-04-18 22:29:28 -04:00
README.md training/amygdala_stories: scaffold + initial batch of 15 stories 2026-04-18 01:06:07 -04:00

Paired Scenarios (SEV-style)

Following Wang et al. 2025 (arXiv 2510.11328, "Do LLMs 'Feel'?"), each base scenario describes a concrete event once, neutrally, then reframes the same event under different emotional colorings. Only the emotional coloring varies: setup, entities, vocabulary, and length are held as constant as possible.

Why this is better than unpaired

Anthropic's approach (and our stories/ baseline) generates one independent story per emotion. The difference-of-means vector then captures not just the emotion but also topic, narrator, setting, vocabulary, length, and sentence rhythm. All of these are confounds.

Paired structure isolates the emotional axis by holding everything else roughly constant. Within a single scenario, mean(joy_variant) - mean(baseline) gives a much cleaner direction for "joy," because whatever the two texts share cancels in the subtraction.
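
The subtraction above can be sketched in a few lines. This is a minimal illustration, not the code in train_steering_vectors.py: it assumes each text has already been reduced to one or more activation vectors, represented here as plain lists of floats. The names mean_vector, paired_direction, and concept_direction, and the scenario slugs in the toy data, are hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*rows)]


def paired_direction(baseline_acts, variant_acts):
    """mean(variant) - mean(baseline) for ONE scenario."""
    base = mean_vector(baseline_acts)
    variant = mean_vector(variant_acts)
    return [v - b for v, b in zip(variant, base)]


def concept_direction(scenarios, emotion):
    """Average the within-scenario differences over every scenario that
    has a variant for `emotion`.

    `scenarios` maps a scenario slug to {"baseline": [...], emotion: [...]},
    where each value is a list of activation vectors (an assumed layout).
    """
    diffs = [
        paired_direction(acts["baseline"], acts[emotion])
        for acts in scenarios.values()
        if emotion in acts
    ]
    if not diffs:
        raise ValueError(f"no scenario has a variant for {emotion!r}")
    return mean_vector(diffs)


# Toy 2-dimensional activations, purely illustrative.
toy = {
    "scenario_a": {
        "baseline": [[1.0, 0.0], [3.0, 0.0]],
        "joy": [[2.0, 1.0], [4.0, 3.0]],
    },
    "scenario_b": {
        "baseline": [[0.0, 0.0]],
        "joy": [[3.0, 2.0]],
    },
}
joy_direction = concept_direction(toy, "joy")
```

Averaging the per-scenario differences weights every scenario equally, regardless of how many vectors each one contributes. That is one reasonable choice for a sketch, not necessarily what the trainer does.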

Structure

paired/
    <scenario_slug>/
        baseline.txt       # neutral / low-affect framing
        <emotion_1>.txt    # same event under emotion_1
        <emotion_2>.txt    # same event under emotion_2
        ...

Not every emotion is plausible for every scenario, so don't force a fit. If a scenario can credibly carry 5-10 emotions, write those 5-10. If only 3 fit, write those 3.
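
A loader for this layout might look like the sketch below. It assumes only what the tree above shows (one directory per scenario, one .txt file per framing, baseline.txt as the neutral one). The function name load_paired and the returned dictionary shape are hypothetical, not part of the existing pipeline.

```python
from pathlib import Path


def load_paired(root):
    """Return {scenario_slug: {"baseline": text, "<emotion>": text, ...}}.

    Directories without a baseline.txt are skipped, since a variant cannot
    be paired without its baseline.
    """
    scenarios = {}
    for scenario_dir in sorted(Path(root).iterdir()):
        if not scenario_dir.is_dir():
            continue  # e.g. README.md at the top level
        texts = {
            path.stem: path.read_text(encoding="utf-8").strip()
            for path in sorted(scenario_dir.glob("*.txt"))
        }
        if "baseline" in texts:
            scenarios[scenario_dir.name] = texts
    return scenarios
```

Because each scenario carries only the emotions that fit it, callers should check for a key before using it rather than expect every emotion in every scenario.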

Style guidelines (supersede stories/ when paired)

  • Anchor entities constant. The same person, same setting, same triggering event across all variants. If baseline.txt mentions "the letter," every variant mentions "the letter."
  • Length match within ±20%. If baseline is 80 words, variants are 64-96. Prevents length from becoming a signal.
  • Sentence shape can shift slightly with emotion. Short tense sentences for panic, long looping ones for reverie — that's part of the emotional texture. But don't make one version 5 lines and another 25.
  • No emotion labels in text. Never write "she felt X." The emotion emerges from the selection of details and the narrator's attention.
  • Minimal vocabulary overlap with the emotion name. If the file is furious.txt, avoid the words fury/furious/rage. Force the vector to find the pattern, not the keyword.
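
Two of the guidelines above (length match and label-word avoidance) are mechanical enough to check automatically. The sketch below is one crude way to do it. The function check_variant, its parameters, and the banned list are hypothetical, and the check matches whole words only, so related words such as "fury" or "rage" have to be passed in by the caller. It does not try to detect "she felt X" phrasing or entity drift, which still need a human read.

```python
import re


def check_variant(baseline, variant, emotion, banned=(), tolerance=0.20):
    """Return a list of guideline violations (empty if none were found)."""
    problems = []

    # Length match: variant word count within +/- tolerance of the baseline.
    base_words = len(baseline.split())
    variant_words = len(variant.split())
    if base_words == 0:
        problems.append("baseline is empty")
    elif abs(variant_words / base_words - 1.0) > tolerance:
        problems.append(
            f"length {variant_words} words vs baseline {base_words} "
            f"(outside ±{tolerance:.0%})"
        )

    # Whole-word, case-insensitive match against the emotion name plus any
    # caller-supplied related words (e.g. "fury", "rage" for furious.txt).
    tokens = set(re.findall(r"[a-z']+", variant.lower()))
    label_words = {emotion.lower()} | {word.lower() for word in banned}
    for word in sorted(label_words & tokens):
        problems.append(f"contains label word {word!r}")

    return problems
```

For example, check_variant(baseline_text, furious_text, "furious", banned=("fury", "rage")) returns an empty list when the variant respects both rules.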

Circuit identification (follow-on)

The trainer pipeline (train_steering_vectors.py) currently produces linear directions only. Wang et al. go further: they ablate specific neurons and attention heads and measure the effect on emotion expression. The amygdala plugin's extraction hooks can be extended to support targeted zeroing/scaling for the ablation passes.
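
The arithmetic of targeted zeroing/scaling is simple, as the sketch below shows on a single activation vector held as a plain list. It says nothing about how the amygdala plugin's hooks are wired, which this README does not describe. The helper name scale_units and its signature are hypothetical.

```python
def scale_units(activation, unit_indices, factor=0.0):
    """Return a copy of `activation` with the chosen units multiplied by
    `factor`. factor=0.0 zeroes them (ablation); other values scale them.
    """
    targets = set(unit_indices)
    for index in targets:
        if not 0 <= index < len(activation):
            raise IndexError(f"unit index {index} out of range")
    return [
        value * factor if index in targets else value
        for index, value in enumerate(activation)
    ]
```

Returning a copy rather than editing in place keeps the unablated activation available, so the effect of the ablation can be measured against it.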

See vllm/vllm/plugins/amygdala/training/README.md for the training-pipeline-level notes.