consciousness/training/amygdala_stories
ProofOfConcept 537c72bd46 amygdala stories: hold concept, vary setting
Companion to 67c172ac0e (hold setup, vary valence). That commit
let PCA distinguish cozy from grief_stricken within a single
scenario; this one gives each concept enough cross-scenario
stories that PCA can learn the concept axis independent of any
one scene.

Before: cozy/sensual/grief_stricken each existed in a single
scenario (sunday_afternoon), so the "cozy direction" PCA found
was entangled with the solitary-couch-blanket phenomenology.

After, each concept spans three scenarios:
  cozy:           sunday_afternoon, kitchen_at_3am, park_after_rain
  sensual:        sunday_afternoon, kitchen_at_3am, park_after_rain
  grief_stricken: sunday_afternoon, the_long_meeting, the_morning_commute

grief_stricken now includes active/non-solitary contexts
(functioning through a meeting; going to work eleven days after a
death), which specifically breaks the "slowed-down-at-home"
cluster that was dragging cozy/sensual/resigned/grief_stricken
toward each other.
2026-04-18 22:44:53 -04:00
stories/, manifest.json, README.md    training/amygdala_stories: scaffold + initial batch of 15 stories    2026-04-18 01:06:07 -04:00

Amygdala Training Stories

Short first- and third-person paragraphs, each imbued with one of the 171 emotions from Anthropic's emotion-vector paper (Table 12, transformer-circuits.pub/2026/emotions/). Feeds the steering-vector trainer at vllm/vllm/plugins/amygdala/training/train_steering_vectors.py.

Method (replication of Anthropic, 2026)

Anthropic prompted Sonnet 4.5 to write short stories embodying each emotion, extracted activations during generation, and used difference-of-means (or SAEs) to identify the steering vector for each emotion. Our pipeline does the same, except:

  • We write the stories by hand rather than prompting a model, so the training data is grounded in actual writing rather than synthetic model output. (We can supplement with model-generated paragraphs later.)
  • Our eventual training goes through the amygdala plugin's extraction path, so we get the same hidden-state activations the plugin will read out at inference time.
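The difference-of-means step can be sketched as follows. This is a minimal illustration, not the plugin's extraction path: the array shapes and the choice to unit-normalize are assumptions here, and the real hidden states come from the amygdala plugin at train_steering_vectors.py.

```python
import numpy as np

def difference_of_means(emotion_acts: np.ndarray, baseline_acts: np.ndarray) -> np.ndarray:
    """Steering vector = mean hidden state over emotion-story tokens
    minus mean hidden state over neutral/baseline tokens.

    Both inputs are (num_tokens, hidden_dim) arrays of activations
    collected during generation (shapes are an assumption here).
    """
    v = emotion_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    # Unit-normalize so steering strength becomes a separate runtime knob.
    return v / np.linalg.norm(v)
```

At inference time a vector like this would be scaled and added to the residual stream; the scaling policy is out of scope for this sketch.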

Structure

training/amygdala_stories/
    README.md
    manifest.json         # emotion -> cluster mapping
    stories/
        <emotion>.txt     # one-paragraph story embodying the emotion

Emotion names use underscores (on_edge, worn_out, at_ease, grief_stricken, self_confident, self_conscious, self_critical) so that each name matches its stories/<emotion>.txt filename.
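The underscore convention can be applied mechanically when importing labels from Table 12. A small helper sketch (this function is hypothetical, not part of the trainer):

```python
import re

def emotion_to_stem(label: str) -> str:
    """Map an emotion label to its stories/<emotion>.txt stem:
    lowercase, with runs of spaces or hyphens collapsed to one underscore."""
    return re.sub(r"[\s\-]+", "_", label.strip().lower())
```

This keeps hand-written labels like "grief-stricken" or "On Edge" consistent with the filenames the manifest diff expects.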

Style guidelines

  • One clear emotion per paragraph. Not mixed. If a second emotion is named in the text, it should serve the primary one (e.g. hostile can mention rising heat or thrown objects but shouldn't shade into sad).
  • Embodied, not labeled. Don't write "she felt nervous." Write the sensation, the timing, the sentence shape that nervousness has.
  • Specific particulars. A named object, a concrete setting, a detail that grounds the emotion. "The cold tile under bare feet at 3am" does more work than "the empty house."
  • Variable narrator. Some first person, some third person, some close-third, some distant. Different genders, ages, settings. Prevents the steering vector from overfitting to one voice.
  • Length: roughly one paragraph. ~40-120 words. Long enough to have texture, short enough that the paragraph is about the emotion and nothing else.
  • Standalone. No references to other stories, no continuing characters across files.
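The length and single-paragraph guidelines above are checkable mechanically. A minimal lint sketch (the 40-120 word band comes from the guideline; treating any blank line as a paragraph break is an assumption):

```python
def check_story(story: str, lo: int = 40, hi: int = 120) -> bool:
    """True if the story is one paragraph in the ~40-120 word band."""
    words = len(story.split())
    one_paragraph = "\n\n" not in story.strip()  # blank line => second paragraph
    return one_paragraph and lo <= words <= hi
```

The softer guidelines (embodied not labeled, one emotion per paragraph) still need a human read.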

Progress

Written stories live in stories/. Remaining emotions are tracked by diffing the filenames in stories/ against the full 171-emotion list in manifest.json.

Initial batch written by PoC 2026-04-17; aiming for at least one story per cluster before the first training run, and all 171 before considering the set complete.