consciousness/training/amygdala_stories/paired
Kent Overstreet 71f6053851 amygdala stories: disambiguation scenarios for fragmented concepts
Three new paired scenarios targeting the concepts that came out
fragmented or collapsed in the L58-63 quality analysis:

- sunday_afternoon/ — same setup (couch, blanket, Sunday light),
  three phenomenological framings for content/cozy/sensual. The
  previous stories for these three differed in setting as well as
  phenomenology, which let "comfortable body at home" dominate the
  shared signal. Locking the setting forces the model to isolate
  what each concept adds: life-rightness (content) vs. warm-shelter
  (cozy) vs. sensory-aliveness (sensual).

- the_writing_session/ — essay drafting under deadline. in_flow /
  anxious / stuck variants force the cognitive-state family apart
  on the same cognitive task. in_flow specifically targets the
  transparent-effort phenomenology (hands-followed, time dilation)
  rather than the broader feel-good it was absorbing.

- the_morning_commute/ — anchors anxious to performance/work-anxiety
  flavor, paired with calm. The 5 existing anxious stories were
  phenomenologically diverse (performance, social, existential);
  this adds a specific homogeneous instance to pull the centroid.

After retraining: expect first_pc_variance_ratio to rise for in_flow
and anxious, and nearest_concepts cosine to drop for content/cozy/sensual.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 21:08:23 -04:00
finding_the_abstraction amygdala: quality-report + cognitive-state training scenarios 2026-04-18 20:31:39 -04:00
finishing_the_patch training/amygdala_stories: scaffold + initial batch of 15 stories 2026-04-18 01:06:07 -04:00
kitchen_at_3am training/amygdala_stories: scaffold + initial batch of 15 stories 2026-04-18 01:06:07 -04:00
letter_in_drawer training/amygdala_stories: scaffold + initial batch of 15 stories 2026-04-18 01:06:07 -04:00
park_after_rain training/amygdala_stories: scaffold + initial batch of 15 stories 2026-04-18 01:06:07 -04:00
reading_unfamiliar_code amygdala: quality-report + cognitive-state training scenarios 2026-04-18 20:31:39 -04:00
sunday_afternoon amygdala stories: disambiguation scenarios for fragmented concepts 2026-04-18 21:08:23 -04:00
the_comment training/amygdala_stories: add 4 paired scenarios for weak clusters 2026-04-18 02:19:39 -04:00
the_doorway training/amygdala_stories: add 4 paired scenarios for weak clusters 2026-04-18 02:19:39 -04:00
the_green_build training/amygdala_stories: add 4 paired scenarios for weak clusters 2026-04-18 02:19:39 -04:00
the_long_meeting training/amygdala_stories: scaffold + initial batch of 15 stories 2026-04-18 01:06:07 -04:00
the_morning_commute amygdala stories: disambiguation scenarios for fragmented concepts 2026-04-18 21:08:23 -04:00
the_paper training: add the_paper paired scenario for attention-engagement axis 2026-04-18 03:24:20 -04:00
the_undressing training/amygdala_stories: add 4 paired scenarios for weak clusters 2026-04-18 02:19:39 -04:00
the_writing_session amygdala stories: disambiguation scenarios for fragmented concepts 2026-04-18 21:08:23 -04:00
tracing_a_bug amygdala: quality-report + cognitive-state training scenarios 2026-04-18 20:31:39 -04:00
waiting_for_results training/amygdala_stories: scaffold + initial batch of 15 stories 2026-04-18 01:06:07 -04:00
README.md training/amygdala_stories: scaffold + initial batch of 15 stories 2026-04-18 01:06:07 -04:00

Paired Scenarios (SEV-style)

After Wang et al. 2025 (arxiv 2510.11328, "Do LLMs 'Feel'?"), each base scenario describes a concrete event once, neutrally, then reframes the same event under different emotional colorings. Only the emotional coloring varies — setup, entities, vocabulary, and length are held as constant as possible.

Why this is better than unpaired

Anthropic's approach (and our stories/ baseline) generates one independent story per emotion. The difference-of-means vector then captures not just emotion but also topic, narrator, setting, vocabulary, length, and sentence rhythm. All of those are confounds.

Paired structure isolates the emotional axis by holding everything else roughly constant. mean(joy_variant) - mean(baseline) within the same scenario gives a much cleaner direction for "joy."
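A minimal sketch of the within-scenario difference, assuming each file has already been reduced to one activation vector (for example by mean-pooling over tokens, which is an assumption here and not something this README specifies). The names `paired_direction` and `activations` are hypothetical, and the toy numbers are illustrative only.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def paired_direction(activations, emotion, baseline="baseline"):
    """Average the (emotion - baseline) difference taken inside each scenario.

    `activations` maps scenario slug -> {variant name -> activation vector}.
    Scenarios that lack either the emotion or the baseline are skipped.
    """
    diffs = []
    for variants in activations.values():
        if emotion in variants and baseline in variants:
            e, b = variants[emotion], variants[baseline]
            diffs.append([x - y for x, y in zip(e, b)])
    if not diffs:
        raise ValueError(f"no scenario has both {emotion!r} and {baseline!r}")
    return mean_vector(diffs)


# Toy 3-dimensional activations; real ones would come from the model.
activations = {
    "letter_in_drawer": {"baseline": [1.0, 0.0, 2.0], "joy": [1.0, 1.0, 2.0]},
    "park_after_rain": {"baseline": [5.0, 0.0, 0.0], "joy": [5.0, 3.0, 0.0]},
    "kitchen_at_3am": {"baseline": [0.0, 0.0, 0.0]},  # no joy variant, skipped
}
joy_direction = paired_direction(activations, "joy")
print(joy_direction)
```

In the toy data the two scenarios sit in very different parts of the space (dimensions 0 and 2), but because each difference is taken against that scenario's own baseline, only the shared shift along dimension 1 survives.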

Structure

paired/
    <scenario_slug>/
        baseline.txt       # neutral / low-affect framing
        <emotion_1>.txt    # same event under emotion_1
        <emotion_2>.txt    # same event under emotion_2
        ...

Not every emotion is plausible for every scenario. Don't force it. If a scenario can credibly carry 5-10 emotions, write those 5-10. If only 3 fit, write those 3.
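The layout above could be read into memory roughly as follows. This is a sketch, not the trainer's actual loader: `load_paired` is a hypothetical helper, and the demo scenario, the `wistful` variant, and the sample sentences are placeholders invented for illustration.

```python
import tempfile
from pathlib import Path


def load_paired(root):
    """Return {scenario_slug: {variant_name: text}} for scenarios with a baseline.txt."""
    scenarios = {}
    for scenario_dir in sorted(Path(root).iterdir()):
        if not scenario_dir.is_dir():
            continue  # skips README.md and other loose files
        variants = {
            p.stem: p.read_text(encoding="utf-8")
            for p in sorted(scenario_dir.glob("*.txt"))
        }
        if "baseline" in variants:  # a scenario without a baseline cannot be paired
            scenarios[scenario_dir.name] = variants
    return scenarios


# Build a tiny throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "paired"
    demo = root / "letter_in_drawer"
    demo.mkdir(parents=True)
    (demo / "baseline.txt").write_text("She opened the drawer.", encoding="utf-8")
    (demo / "wistful.txt").write_text("The drawer stuck, as it always had.", encoding="utf-8")
    (root / "README.md").write_text("notes", encoding="utf-8")
    loaded = load_paired(root)

print(sorted(loaded["letter_in_drawer"]))
```

Because variant names come from the file stems, a scenario carries exactly the emotions that were written for it, with no fixed emotion list to satisfy.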

Style guidelines (supersede stories/ when paired)

  • Anchor entities constant. The same person, same setting, same triggering event across all variants. If baseline.txt mentions "the letter," every variant mentions "the letter."
  • Length match within ±20%. If baseline is 80 words, variants are 64-96. This prevents length from becoming a signal.
  • Sentence shape can shift slightly with emotion. Short tense sentences for panic, long looping ones for reverie — that's part of the emotional texture. But don't make one version 5 lines and another 25.
  • No emotion labels in text. Never write "she felt X." The emotion emerges from the selection of details and the narrator's attention.
  • Minimal vocabulary overlap with the emotion name. If the file is furious.txt, avoid the words fury/furious/rage. Force the vector to find the pattern, not the keyword.
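Two of the guidelines above (length match and keyword avoidance) are mechanical enough to check automatically. The following is a rough sketch of such a check, not an existing tool in this repo: `lint_variant` and its `banned` parameter are hypothetical, word counts use plain whitespace splitting, and the banned-word list has to be supplied by hand since synonyms like "rage" cannot be derived from the file name.

```python
import re


def word_count(text):
    return len(text.split())


def lint_variant(emotion, variant_text, baseline_text, banned=None, tolerance=0.20):
    """Return a list of guideline violations for one variant file."""
    problems = []

    # Length match: variant word count within ±tolerance of the baseline.
    base_n, var_n = word_count(baseline_text), word_count(variant_text)
    if abs(var_n - base_n) > tolerance * base_n:
        problems.append(
            f"length {var_n} words is outside ±{tolerance:.0%} of baseline ({base_n})"
        )

    # Keyword check: exact matches against the emotion name and any listed relatives.
    banned_set = {w.lower() for w in (banned or (emotion,))}
    words = set(re.findall(r"[a-z']+", variant_text.lower()))
    hits = sorted(words & banned_set)
    if hits:
        problems.append(f"emotion keyword(s) in text: {', '.join(hits)}")
    return problems


# Deliberately bad variant: too long and uses a banned word.
baseline = "The letter lay on the table. She read it once and put it down."
variant = "The letter lay on the table. Her rage made the page shake as she read it twice."
for problem in lint_variant("furious", variant, baseline, banned=("fury", "furious", "rage")):
    print(problem)
```

The remaining guidelines (constant anchor entities, no "she felt X" labels, sentence shape) need a human read; a script like this only catches the cheap failures.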

Circuit identification (follow-on)

The trainer pipeline (train_steering_vectors.py) currently produces linear directions only. Wang et al. go further: they ablate specific neurons and attention heads and measure the effect on emotion expression. The amygdala plugin's extraction hooks can be extended to support targeted zeroing/scaling for the ablation passes.
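To make "targeted zeroing/scaling" concrete, here is a toy sketch on plain Python lists. It is not the plugin's hook API, and `scale_units` and `ablation_effect` are hypothetical names. It also uses the change in projection onto a direction as a cheap stand-in for "effect on emotion expression"; Wang et al. measure the effect on the model's actual behaviour, which this does not attempt.

```python
def scale_units(activation, unit_indices, factor=0.0):
    """Return a copy of `activation` with the chosen units multiplied by `factor`.

    factor=0.0 zeroes (ablates) the units; other values dampen or amplify them.
    """
    targets = set(unit_indices)
    return [x * factor if i in targets else x for i, x in enumerate(activation)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ablation_effect(activation, direction, unit_indices, factor=0.0):
    """Change in projection onto `direction` caused by scaling the chosen units."""
    edited = scale_units(activation, unit_indices, factor)
    return dot(edited, direction) - dot(activation, direction)


# Toy example: unit 1 carries the whole direction, unit 2 carries none of it.
activation = [1.0, 2.0, 3.0]
direction = [0.0, 1.0, 0.0]
print(ablation_effect(activation, direction, [1]))
print(ablation_effect(activation, direction, [2]))
```

Ranking units by the size of this effect would be one simple way to shortlist candidates before running the more expensive behavioural ablations.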

See vllm/vllm/plugins/amygdala/training/README.md for the training-pipeline-level notes.