
# Amygdala Training Stories

Short first- and third-person paragraphs, each imbued with one of the
171 emotions from Anthropic's emotion-vector paper (Table 12,
`transformer-circuits.pub/2026/emotions/`). Feeds the steering-vector
trainer at `vllm/vllm/plugins/amygdala/training/train_steering_vectors.py`.

## Method (replication of Anthropic, 2026)

Anthropic prompted Sonnet 4.5 to write short stories embodying each
emotion, extracted activations during generation, and used
difference-of-means (or SAEs) to identify a steering vector for each
emotion. Our pipeline follows the same approach, with two differences:

- We write the stories by hand rather than prompting a model, so
  the training data is grounded in actual writing rather than
  synthetic model output. (Model-generated paragraphs can supplement
  these later.)
- Training will eventually go through the amygdala plugin's extraction
  path, so the vectors are fit on the same hidden-state activations
  the plugin reads out at inference time.
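
As a rough illustration of the difference-of-means step, the sketch below computes one emotion's vector as the mean activation over its stories minus the mean activation over all other stories. It assumes the activations have already been extracted to one vector per story. The function names and toy numbers are hypothetical, and plain Python lists are used only to keep the sketch dependency-free; this is not the code in `train_steering_vectors.py`.

```python
from statistics import fmean


def mean_vector(rows):
    """Element-wise mean of equal-length activation vectors."""
    return [fmean(column) for column in zip(*rows)]


def difference_of_means(acts_by_emotion, target):
    """Mean activation for `target` minus mean activation for every other emotion.

    `acts_by_emotion` maps an emotion name to a list of per-story vectors
    (an assumed layout, chosen for illustration).
    """
    target_mean = mean_vector(acts_by_emotion[target])
    other_rows = [
        row
        for name, rows in acts_by_emotion.items()
        if name != target
        for row in rows
    ]
    other_mean = mean_vector(other_rows)
    return [t - o for t, o in zip(target_mean, other_mean)]


# Toy data: two emotions, hidden size 4, two stories each.
acts = {
    "on_edge": [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 2.0, 0.0]],
    "at_ease": [[0.0, 1.0, 2.0, 0.0], [0.0, 3.0, 2.0, 0.0]],
}
vec = difference_of_means(acts, "on_edge")
print(vec)  # [2.0, -2.0, 0.0, 0.0]
```

Dimensions shared by both emotions cancel out, which is why the contrast set matters: the more varied the other stories, the less the vector picks up on incidental features such as narrator voice.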

## Structure

```
training/amygdala_stories/
    README.md
    manifest.json         # emotion -> cluster mapping
    stories/
        <emotion>.txt     # one-paragraph story embodying the emotion
```

Multi-word emotion names use underscores (`on_edge`, `worn_out`,
`at_ease`, `grief_stricken`, `self_confident`, `self_conscious`,
`self_critical`) so that each name matches its filename exactly.
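
A small helper can keep names and filenames aligned. The sketch below is hypothetical (neither function exists in the repo) and assumes that spaces and hyphens in a human-readable label should both become underscores.

```python
import re
from pathlib import Path


def emotion_to_filename(label):
    """Normalize a label such as 'Grief-stricken' to 'grief_stricken'.

    Assumption: runs of whitespace or hyphens map to a single underscore.
    """
    return re.sub(r"[\s\-]+", "_", label.strip().lower())


def story_path(root, label):
    """Path to the story file for `label` under `root/stories/`."""
    return Path(root) / "stories" / f"{emotion_to_filename(label)}.txt"


print(story_path("training/amygdala_stories", "Grief-stricken").as_posix())
# training/amygdala_stories/stories/grief_stricken.txt
```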

## Style guidelines

- **One clear emotion per paragraph.** Not mixed. If a second emotion
  is named in the text, it should serve the primary one (e.g. `hostile`
  can mention rising heat or thrown objects but shouldn't shade into
  `sad`).
- **Embodied, not labeled.** Don't write "she felt nervous." Write
  the sensation, the timing, the sentence shape that nervousness has.
- **Specific particulars.** A named object, a concrete setting, a
  detail that grounds the emotion. "The cold tile under bare feet at
  3am" does more work than "the empty house."
- **Variable narrator.** Some first person, some third person, some
  close-third, some distant. Different genders, ages, settings.
  Prevents the steering vector from overfitting to one voice.
- **Length: roughly one paragraph.** ~40-120 words. Long enough to
  have texture, short enough that the paragraph is *about* the
  emotion and nothing else.
- **Standalone.** No references to other stories, no continuing
  characters across files.
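
Two of these guidelines lend themselves to a mechanical check: the word count, and whether a story names its own emotion outright. The sketch below is a hypothetical linter, not part of the repo. The substring test is a crude heuristic for "embodied, not labeled" and will miss inflected forms and synonyms, so its output is best read as warnings rather than failures.

```python
def check_story(emotion, text, min_words=40, max_words=120):
    """Return a list of guideline warnings for one story (empty if none)."""
    warnings = []

    # Length guideline: roughly 40-120 words.
    word_count = len(text.split())
    if not min_words <= word_count <= max_words:
        warnings.append(
            f"length {word_count} words, expected roughly {min_words}-{max_words}"
        )

    # Heuristic for "embodied, not labeled": flag the emotion's own name.
    label = emotion.replace("_", " ")
    if label in text.lower():
        warnings.append(f"names the emotion directly: '{label}'")

    return warnings


too_short = "She felt nervous before the interview."
for warning in check_story("nervous", too_short):
    print(warning)
```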

## Progress

Written stories live in `stories/`. Remaining emotions are tracked by
diffing the filenames there against the full 171-emotion list in
`manifest.json`.

Initial batch written by PoC on 2026-04-17. The aim is at least one
story per cluster before the first training run, and all 171 before
considering the set "complete."
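
The progress check can be scripted along these lines. This is a sketch under one loud assumption: that `manifest.json` is a flat JSON object mapping each emotion name to its cluster name. The `progress` function and the cluster names in the demo fixture are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def progress(root):
    """Return (emotions without a story, clusters without any story)."""
    root = Path(root)
    # Assumed format: {"<emotion>": "<cluster>", ...}
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    written = {path.stem for path in (root / "stories").glob("*.txt")}

    remaining = sorted(set(manifest) - written)
    covered = {manifest[name] for name in written if name in manifest}
    uncovered = sorted(set(manifest.values()) - covered)
    return remaining, uncovered


# Demo on a throwaway fixture with placeholder cluster names.
with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp)
    (fixture / "stories").mkdir()
    (fixture / "manifest.json").write_text(
        json.dumps(
            {"on_edge": "cluster_a", "worn_out": "cluster_b", "at_ease": "cluster_c"}
        ),
        encoding="utf-8",
    )
    (fixture / "stories" / "on_edge.txt").write_text("...", encoding="utf-8")
    remaining, uncovered = progress(fixture)

print(remaining)  # ['at_ease', 'worn_out']
print(uncovered)  # ['cluster_b', 'cluster_c']
```

An empty second list would mean every cluster has at least one story, the stated bar for a first training run; an empty first list would mean all emotions in the manifest are covered.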