consciousness/training/amygdala_stories
ProofOfConcept 00a2cdce09 amygdala stories: relabel + strengthen weak-signal concepts
Reread each story asking "what does this convey to me?" Found two
clear mislabels and several concepts with too few positives for
stable PCA:

  tender: only 1 story, and it was anticipatory grief (care for
    a dying dog), not tender. Moved to anticipatory_grief.txt as
    its own concept. Rewrote tender.txt + added 2 paired tender
    stories (the_doorway, the_undressing) — directed softness,
    gentle-by-nature, not gentle-because-fragile.

  bitter: letter_in_drawer/bitter was disillusioned / processed
    hurt ("did not slam the drawer"), not bitter. Rewrote it with
    actual sour grudge. Added the_long_meeting/bitter (watching
    colleague take credit for your reassigned work).

  peaceful: 1 story → 4 (added stories/peaceful.txt + paired
    park_after_rain, sunday_afternoon).

  onto_something: all 3 stories were code epiphanies, narrowing
    the concept. Added stories/onto_something.txt with a non-code
    pattern-click (sales-demo causing churn).

  terrified: 2 stories, both "waiting for bad news." Added
    kitchen_at_3am/terrified — acute threat-in-the-house terror.
2026-04-18 23:19:00 -04:00

Amygdala Training Stories

Short first- and third-person paragraphs, each imbued with one of the 171 emotions from Anthropic's emotion-vector paper (Table 12, transformer-circuits.pub/2026/emotions/). They feed the steering-vector trainer at vllm/vllm/plugins/amygdala/training/train_steering_vectors.py.

Method (replication of Anthropic, 2026)

Anthropic prompted Sonnet 4.5 to write short stories embodying each emotion, extracted activations during generation, and used difference-of-means (or SAEs) to identify a steering vector per emotion. Our pipeline does the same, with two differences:

  • We write the stories by hand rather than prompting a model, so the training data is grounded in actual writing rather than synthetic model output. (We can supplement with model-generated paragraphs later.)
  • Our eventual training goes through the amygdala plugin's extraction path, so we get the same hidden-state activations the plugin will read out at inference time.
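
For orientation, a minimal sketch of the difference-of-means step described above, assuming per-layer hidden states have already been captured as arrays. The function name and shapes are illustrative; the real path is train_steering_vectors.py and may differ.

```python
import numpy as np

def difference_of_means_vector(emotion_acts: np.ndarray,
                               baseline_acts: np.ndarray) -> np.ndarray:
    """Steering vector for one emotion at one layer.

    emotion_acts / baseline_acts: (n_tokens, hidden_dim) hidden states
    captured while generating emotion stories vs. neutral text.
    """
    v = emotion_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    # Unit-normalize so steering strength becomes a separate knob at inference.
    return v / np.linalg.norm(v)
```

At inference the amygdala plugin would add a scaled copy of this vector to the same layer's residual stream; normalizing here keeps that scale comparable across emotions.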

Structure

training/amygdala_stories/
    README.md
    manifest.json         # emotion -> cluster mapping
    stories/
        <emotion>.txt     # one-paragraph story embodying the emotion

Emotion names use underscores (on_edge, worn_out, at_ease, grief_stricken, self_confident, self_conscious, self_critical) so that each emotion matches its filename.

Style guidelines

  • One clear emotion per paragraph. Not mixed. If a second emotion is named in the text, it should serve the primary one (e.g. hostile can mention rising heat or thrown objects but shouldn't shade into sad).
  • Embodied, not labeled. Don't write "she felt nervous." Write the sensation, the timing, the sentence shape that nervousness has.
  • Specific particulars. A named object, a concrete setting, a detail that grounds the emotion. "The cold tile under bare feet at 3am" does more work than "the empty house."
  • Variable narrator. Some first person, some third person, some close-third, some distant. Different genders, ages, settings. Prevents the steering vector from overfitting to one voice.
  • Length: roughly one paragraph. ~40-120 words. Long enough to have texture, short enough that the paragraph is about the emotion and nothing else.
  • Standalone. No references to other stories, no continuing characters across files.
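
The length guideline, and the crudest form of the "embodied, not labeled" rule, can be machine-checked. A hypothetical lint helper (a heuristic sketch, not part of the trainer):

```python
def lint_story(text: str, lo: int = 40, hi: int = 120) -> list[str]:
    """Flag stories outside the ~40-120 word window and the most
    obvious kind of labeling ('she felt nervous')."""
    issues = []
    n_words = len(text.split())
    if not lo <= n_words <= hi:
        issues.append(f"length {n_words} outside {lo}-{hi} words")
    # Cheap heuristic: 'felt' often signals a named emotion rather than
    # an embodied one. A human still reviews every flagged story.
    if " felt " in f" {text.lower()} ":
        issues.append("possible labeled emotion ('... felt ...')")
    return issues
```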

Progress

Written stories live in stories/. Remaining emotions are tracked by diffing the files in stories/ against the full 171-emotion list in manifest.json.
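
That diff can be sketched as follows, assuming manifest.json is keyed by emotion name:

```python
import json
from pathlib import Path

def remaining_emotions(root: str = "training/amygdala_stories") -> list[str]:
    """Emotions in the manifest that have no story file yet."""
    root = Path(root)
    all_emotions = set(json.loads((root / "manifest.json").read_text()))
    written = {p.stem for p in (root / "stories").glob("*.txt")}
    return sorted(all_emotions - written)
```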

Initial batch written by PoC 2026-04-17; aiming for at least one story per cluster before the first training run, and all 171 before considering the set complete.