# Paired Scenarios (SEV-style)

After Wang et al. 2025 (arXiv 2510.11328, "Do LLMs 'Feel'?"), each base scenario describes a concrete event once, neutrally, then reframes the same event under different emotional colorings. Only the emotional coloring varies; setup, entities, vocabulary, and length are held as constant as possible.

## Why this is better than unpaired

Anthropic's approach (and our `stories/` baseline) generates one independent story per emotion. The difference-of-means vector then captures not just emotion but ALSO topic, narrator, setting, vocabulary, length, and sentence rhythm. All of that is confound.

Paired structure isolates the emotional axis by holding everything else roughly constant. `mean(joy_variant) - mean(baseline)` within the same scenario gives a much cleaner direction for "joy."

## Structure

```
paired/
  <scenario>/
    baseline.txt       # neutral / low-affect framing
    <emotion_1>.txt    # same event under emotion_1
    <emotion_2>.txt    # same event under emotion_2
    ...
```

Not every emotion is plausible for every scenario. Don't force it. If a scenario can credibly carry 5-10 emotions, write those 5-10. If only 3 fit, write those 3.

## Style guidelines (supersede `stories/` when paired)

- **Anchor entities constant.** The same person, same setting, same triggering event across all variants. If `baseline.txt` mentions "the letter," every variant mentions "the letter."
- **Length match within ±20%.** If the baseline is 80 words, variants should fall between 64 and 96 words. This prevents length from becoming a signal.
- **Sentence shape can shift slightly with emotion.** Short tense sentences for panic, long looping ones for reverie: that's part of the emotional texture. But don't make one version 5 lines and another 25.
- **No emotion labels in text.** Never write "she felt X." The emotion emerges from the selection of details and the narrator's attention.
- **Minimal vocabulary overlap with the emotion name.** If the file is `furious.txt`, avoid the words fury/furious/rage. Force the vector to find the pattern, not the keyword.

## Circuit identification (follow-on)

The trainer pipeline (`train_steering_vectors.py`) currently produces linear directions only. Wang et al. go further: they ablate specific neurons and attention heads, then measure the effect on emotion expression. The amygdala plugin's extraction hooks can be extended to support targeted zeroing/scaling for the ablation passes. See `vllm/vllm/plugins/amygdala/training/README.md` for the training-pipeline-level notes.
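
As a toy illustration of the zeroing/scaling idea in the circuit-identification notes, the sketch below ablates chosen units of a single activation vector and reports how far its projection onto an emotion direction drops. It does not use the amygdala plugin or any model hooks. The function names, the example numbers, and the use of projection as a stand-in for "emotion expression" are all assumptions made for illustration.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ablate(activation, indices, scale=0.0):
    """Return a copy with the chosen units multiplied by `scale` (0.0 zeroes them)."""
    chosen = set(indices)
    return [x * scale if i in chosen else x for i, x in enumerate(activation)]


def projection_drop(activation, direction, indices, scale=0.0):
    """Projection onto `direction` before ablation minus projection after."""
    before = dot(activation, direction)
    after = dot(ablate(activation, indices, scale), direction)
    return before - after


# Hypothetical 4-unit activation and "joy" direction.
activation = [0.5, 2.0, -1.0, 3.0]
joy_direction = [0.0, 1.0, 0.0, 0.5]

drop_unit_1 = projection_drop(activation, joy_direction, indices=[1])
drop_unit_0 = projection_drop(activation, joy_direction, indices=[0])
```

Here zeroing unit 1 removes most of the projection while zeroing unit 0 removes none, which is the kind of contrast an ablation pass would look for. In a real pass the ablation would happen inside the model's forward computation rather than on a stored vector.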
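
The paired difference from "Why this is better than unpaired" can be sketched without any dependencies. This is a minimal illustration, not the code in `train_steering_vectors.py`. It assumes each text has already been reduced to one pooled activation vector, and the names `paired_direction` and `toy` are hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Average a list of equal-length vectors, dimension by dimension."""
    return [fmean(column) for column in zip(*rows)]


def paired_direction(scenarios, emotion):
    """Average (variant - baseline) over the scenarios that include `emotion`.

    `scenarios` maps scenario name -> {"baseline": vec, "<emotion>": vec, ...}.
    One pooled vector per text is an assumption of this sketch.
    """
    diffs = []
    for variants in scenarios.values():
        if emotion not in variants:
            continue  # not every scenario carries every emotion
        baseline = variants["baseline"]
        diffs.append([v - b for v, b in zip(variants[emotion], baseline)])
    if not diffs:
        raise ValueError(f"no scenario contains a {emotion!r} variant")
    return mean_vector(diffs)


# Toy data: each scenario has its own large offset (topic, setting, ...),
# which cancels because the subtraction happens within the scenario.
toy = {
    "letter": {"baseline": [1.0, 5.0], "joy": [2.0, 5.0]},
    "train": {"baseline": [-3.0, 0.5], "joy": [-2.0, 0.5], "dread": [-3.0, 1.5]},
}
joy_direction = paired_direction(toy, "joy")
dread_direction = paired_direction(toy, "dread")
```

Subtracting inside each scenario before averaging is the design choice that matters: scenario-specific content never enters the direction, even when scenarios carry different subsets of emotions.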
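
Two of the style guidelines (length match within ±20%, and no emotion keyword in the text) are mechanical enough to check automatically. The following is a rough sketch of such a check, assuming one directory per scenario containing `baseline.txt` plus one `.txt` file per emotion. The function `lint_scenario` and the `banned` word list are hypothetical and not part of the existing pipeline.

```python
import re
import tempfile
from pathlib import Path


def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


def lint_scenario(scenario_dir, tolerance=0.20, banned=None):
    """Return warning strings for one scenario directory.

    `banned` maps an emotion file stem to extra words to avoid,
    e.g. {"furious": ["fury", "rage"]} (a hypothetical list).
    """
    scenario_dir = Path(scenario_dir)
    banned = banned or {}
    baseline = (scenario_dir / "baseline.txt").read_text(encoding="utf-8")
    base_words = word_count(baseline)
    warnings = []
    for path in sorted(scenario_dir.glob("*.txt")):
        if path.name == "baseline.txt":
            continue
        text = path.read_text(encoding="utf-8")
        n = word_count(text)
        if abs(n - base_words) > tolerance * base_words:
            warnings.append(f"{path.name}: {n} words vs baseline {base_words}")
        # Always check the emotion name itself, then any extra banned words.
        for word in dict.fromkeys([path.stem, *banned.get(path.stem, [])]):
            if re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE):
                warnings.append(f"{path.name}: contains emotion keyword {word!r}")
    return warnings


# Demo on a throwaway directory with placeholder text.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "baseline.txt").write_text("word " * 80, encoding="utf-8")
    (demo / "furious.txt").write_text("word " * 79 + "rage", encoding="utf-8")
    (demo / "wistful.txt").write_text("word " * 120, encoding="utf-8")
    demo_warnings = lint_scenario(demo, banned={"furious": ["fury", "rage"]})
```

The other guidelines (constant anchor entities, no "she felt X" labels) depend on judgment and are better left to review.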