
ProofOfConcept 00a2cdce09 amygdala stories: relabel + strengthen weak-signal concepts	2026-04-18 23:19:00 -04:00

Reread each story asking "what does this convey to me?" Found two clear mislabels and several concepts with too few positives for stable PCA:

tender: only 1 story, and it was anticipatory grief (care for a dying dog), not tender. Moved to anticipatory_grief.txt as its own concept. Rewrote tender.txt + added 2 paired tender stories (the_doorway, the_undressing): directed softness, gentle-by-nature, not gentle-because-fragile.

bitter: letter_in_drawer/bitter was disillusioned / processed hurt ("did not slam the drawer"), not bitter. Rewrote it with actual sour grudge. Added the_long_meeting/bitter (watching colleague take credit for your reassigned work).

peaceful: 1 story → 4 (added stories/peaceful.txt + paired park_after_rain, sunday_afternoon).

onto_something: all 3 stories were code epiphanies, narrowing the concept. Added stories/onto_something.txt with a non-code pattern-click (sales-demo causing churn).

terrified: 2 stories, both "waiting for bad news." Added kitchen_at_3am/terrified: acute threat-in-the-house terror.
finding_the_abstraction	amygdala: quality-report + cognitive-state training scenarios	2026-04-18 20:31:39 -04:00
finishing_the_patch	amygdala stories: give content + resigned more settings	2026-04-18 22:52:07 -04:00
kitchen_at_3am	amygdala stories: relabel + strengthen weak-signal concepts	2026-04-18 23:19:00 -04:00
letter_in_drawer	amygdala stories: relabel + strengthen weak-signal concepts	2026-04-18 23:19:00 -04:00
park_after_rain	amygdala stories: relabel + strengthen weak-signal concepts	2026-04-18 23:19:00 -04:00
reading_unfamiliar_code	amygdala: quality-report + cognitive-state training scenarios	2026-04-18 20:31:39 -04:00
sunday_afternoon	amygdala stories: relabel + strengthen weak-signal concepts	2026-04-18 23:19:00 -04:00
the_comment	amygdala stories: give content + resigned more settings	2026-04-18 22:52:07 -04:00
the_doorway	amygdala stories: relabel + strengthen weak-signal concepts	2026-04-18 23:19:00 -04:00
the_green_build	training/amygdala_stories: add 4 paired scenarios for weak clusters	2026-04-18 02:19:39 -04:00
the_long_meeting	amygdala stories: relabel + strengthen weak-signal concepts	2026-04-18 23:19:00 -04:00
the_morning_commute	amygdala stories: hold concept, vary setting	2026-04-18 22:44:53 -04:00
the_paper	training: add the_paper paired scenario for attention-engagement axis	2026-04-18 03:24:20 -04:00
the_undressing	amygdala stories: relabel + strengthen weak-signal concepts	2026-04-18 23:19:00 -04:00
the_writing_session	amygdala stories: give content + resigned more settings	2026-04-18 22:52:07 -04:00
tracing_a_bug	amygdala: quality-report + cognitive-state training scenarios	2026-04-18 20:31:39 -04:00
waiting_for_results	amygdala stories: held-setup + varied-valence disambiguation	2026-04-18 22:29:28 -04:00
README.md	training/amygdala_stories: scaffold + initial batch of 15 stories	2026-04-18 01:06:07 -04:00

README.md

Paired Scenarios (SEV-style)

Following Wang et al. 2025 (arXiv 2510.11328, "Do LLMs 'Feel'?"), each base scenario describes a concrete event once, neutrally, then reframes the same event under different emotional colorings. Only the emotional coloring varies; setup, entities, vocabulary, and length are held as constant as possible.

Why this is better than unpaired

Anthropic's approach (and our stories/ baseline) generates one independent story per emotion. The difference-of-means vector then captures not just the emotion but also topic, narrator, setting, vocabulary, length, and sentence rhythm. All of these are confounds.

Paired structure isolates the emotional axis by holding everything else roughly constant. mean(joy_variant) - mean(baseline) within the same scenario gives a much cleaner direction for "joy."
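The subtraction above can be sketched in a few lines. This is a minimal illustration, not the trainer's actual code: the function names are hypothetical, the activations are made-up toy numbers, and plain Python lists stand in for whatever tensor type the pipeline uses.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    return [fmean(column) for column in zip(*rows)]


def paired_direction(variant_rows, baseline_rows):
    """mean(variant) - mean(baseline) for a single scenario."""
    variant_mean = mean_vector(variant_rows)
    baseline_mean = mean_vector(baseline_rows)
    if len(variant_mean) != len(baseline_mean):
        raise ValueError("variant and baseline must share a hidden size")
    return [v - b for v, b in zip(variant_mean, baseline_mean)]


# Toy 3-dimensional activations (illustrative numbers, not model output).
# The last dimension is identical in both files, like a shared setting.
baseline = [[1.0, 0.0, 2.0], [1.0, 2.0, 2.0]]
joy = [[2.0, 1.0, 2.0], [4.0, 1.0, 2.0]]

direction = paired_direction(joy, baseline)
print(direction)
```

In this toy case, every dimension on which the two files have the same mean subtracts to zero, which is the intended effect of holding the scenario constant.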

Structure

paired/
    <scenario_slug>/
        baseline.txt       # neutral / low-affect framing
        <emotion_1>.txt    # same event under emotion_1
        <emotion_2>.txt    # same event under emotion_2
        ...

Not every emotion is plausible for every scenario. Don't force it. If a scenario can credibly carry 5-10 emotions, write those 5-10. If only 3 fit, write those 3.
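A loader for this layout might look like the sketch below. It assumes only what the tree above shows (one directory per scenario, a baseline.txt, and one .txt file per emotion). The function name and the returned dictionary shape are hypothetical, and the demo writes placeholder text to a temporary directory.

```python
import tempfile
from pathlib import Path


def load_paired_scenarios(root):
    """Return {slug: {"baseline": text, "variants": {emotion: text}}}."""
    scenarios = {}
    for scenario_dir in sorted(Path(root).iterdir()):
        baseline_path = scenario_dir / "baseline.txt"
        # Skip stray files and directories that have no baseline to pair with.
        if not scenario_dir.is_dir() or not baseline_path.is_file():
            continue
        variants = {
            path.stem: path.read_text(encoding="utf-8")
            for path in sorted(scenario_dir.glob("*.txt"))
            if path.name != "baseline.txt"
        }
        scenarios[scenario_dir.name] = {
            "baseline": baseline_path.read_text(encoding="utf-8"),
            "variants": variants,
        }
    return scenarios


# Demo on a throwaway directory; the file contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    scenario = Path(tmp) / "letter_in_drawer"
    scenario.mkdir()
    (scenario / "baseline.txt").write_text("neutral text", encoding="utf-8")
    (scenario / "bitter.txt").write_text("bitter text", encoding="utf-8")
    loaded = load_paired_scenarios(tmp)

print(sorted(loaded["letter_in_drawer"]["variants"]))
```

Because the variants are discovered by globbing, a scenario with three emotions and one with ten load the same way.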

Style guidelines (supersede stories/ when paired)

Anchor entities constant. The same person, same setting, same triggering event across all variants. If baseline.txt mentions "the letter," every variant mentions "the letter."
Length match within ±20%. If baseline is 80 words, variants are 64-96 words. This prevents length from becoming a signal.
Sentence shape can shift slightly with emotion. Short tense sentences for panic, long looping ones for reverie — that's part of the emotional texture. But don't make one version 5 lines and another 25.
No emotion labels in text. Never write "she felt X." The emotion emerges from the selection of details and the narrator's attention.
Minimal vocabulary overlap with the emotion name. If the file is furious.txt, avoid the words fury/furious/rage. Force the vector to find the pattern, not the keyword.
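Two of these guidelines (length match and keyword avoidance) are mechanical enough to check automatically. The following is a rough sketch under stated assumptions: the helper names are hypothetical, words are counted with a simple regex, and the list of related words (fury, rage) must be supplied by the caller, since nothing here derives synonyms.

```python
import re


def word_count(text):
    """Count word-like tokens; a crude stand-in for a real tokenizer."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


def length_ok(baseline, variant, tolerance=0.20):
    """True if the variant's word count is within ±tolerance of the baseline's."""
    base = word_count(baseline)
    return abs(word_count(variant) - base) <= tolerance * base


def leaked_words(emotion, variant, related=()):
    """Return banned words (the emotion name plus caller-supplied relatives)
    that appear in the variant text."""
    banned = {emotion.lower(), *(word.lower() for word in related)}
    words = set(re.findall(r"[a-z']+", variant.lower()))
    return sorted(words & banned)


baseline_text = "word " * 80
within = length_ok(baseline_text, "word " * 90)    # 10 words over an 80-word baseline
too_long = length_ok(baseline_text, "word " * 100) # 20 words over an 80-word baseline

leaks = leaked_words(
    "furious",
    "He was furious. Rage filled the hallway.",
    related=("fury", "rage"),
)
print(within, too_long, leaks)
```

Multi-word concept names such as onto_something never match a single token, so this check only covers one-word emotion names.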

Circuit identification (follow-on)

The training pipeline (train_steering_vectors.py) currently produces linear directions only. Wang et al. go further: they ablate specific neurons and attention heads and measure the effect on emotion expression. The amygdala plugin's extraction hooks can be extended to support targeted zeroing/scaling for the ablation passes.
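The core of a zeroing/scaling pass is small. The sketch below shows it on a plain list of activations. It is not the plugin's hook API and not Wang et al.'s procedure: the function names, unit indices, and numbers are all hypothetical, and the projection onto a direction is just one possible proxy for "effect on emotion expression."

```python
def scale_units(activations, unit_indices, factor=0.0):
    """Return a copy with the selected units multiplied by `factor`.

    factor=0.0 zeroes the units (ablation); other values scale them.
    """
    targets = set(unit_indices)
    out_of_range = [i for i in targets if not 0 <= i < len(activations)]
    if out_of_range:
        raise IndexError(f"unit indices out of range: {sorted(out_of_range)}")
    return [a * factor if i in targets else a for i, a in enumerate(activations)]


def projection(activations, direction):
    """Dot product of activations with an (unnormalized) direction."""
    return sum(a * d for a, d in zip(activations, direction))


# Toy hidden state and a toy emotion direction (illustrative numbers only).
hidden = [0.5, -1.0, 2.0, 0.25]
direction = [0.0, 0.0, 1.0, 0.0]

ablated = scale_units(hidden, [2])          # zero out unit 2
before = projection(hidden, direction)
after = projection(ablated, direction)
print(before, after)
```

Returning a copy rather than mutating in place keeps the unablated activations available for the before/after comparison.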

See vllm/vllm/plugins/amygdala/training/README.md for the training-pipeline-level notes.