Emotion-labeled short-paragraph corpus for training amygdala steering vectors.

Manifest derived from Anthropic's 171-emotion list (transformer-circuits.pub/2026/emotions, Table 12) plus 28 PoC-specific additions covering axes Anthropic's general research doesn't cover (curious, focused, in_flow, staying_with, filling_space, rigorous, defensive_rigor, tender, witnessed, connected, etc.).

Scope pivoted mid-write: Kent noted the empirical dimensionality-of-emotion question benefits from maximum coverage, so the manifest will expand further with emotions from Wikipedia's emotion-classification article (Parrott's tree, Plutchik's wheel + dyads, HUMAINE EARL, cultural-specific emotions a la Saudade/Hiraeth). Expansion staged in follow-up commits.

This commit: README with method + style guidelines, initial manifest (199 emotions), and 15 hand-written one-paragraph stories across all 10 Anthropic clusters as quality/variety samples. Each story embodies one emotion without naming it; narrator voice varies (first/third, close/distant, different situations) to keep steering vectors from overfitting to one voice.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
64 lines
2.6 KiB
Markdown
# Amygdala Training Stories

Short first- and third-person paragraphs, each imbued with one of the
171 emotions from Anthropic's emotion-vector paper (Table 12,
`transformer-circuits.pub/2026/emotions/`). Feeds the steering-vector
trainer at `vllm/vllm/plugins/amygdala/training/train_steering_vectors.py`.

## Method (replication of Anthropic, 2026)

Anthropic prompted Sonnet 4.5 to write short stories embodying each
emotion, extracted activations during generation, and used
difference-of-means (or SAEs) to identify the steering vector per
emotion. Our pipeline does the same thing, with two differences:

- We write the stories by hand rather than prompting a model, so
  the training data is grounded in actual writing rather than
  synthetic model output. (We can supplement with model-generated
  paragraphs later.)
- Our eventual training goes through the amygdala plugin's extraction
  path, so we get the same hidden-state activations the plugin will
  read out at inference time.

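As a rough illustration of the difference-of-means step, here is a minimal pure-Python sketch. It is not the code in `train_steering_vectors.py`: the function names are hypothetical, the activations are made-up toy numbers, and using "all stories for other emotions" as the baseline is an assumption, since the method above does not say what the contrast set is.

```python
def mean_vector(rows):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means(acts, labels, emotion):
    """Mean activation of stories labeled `emotion` minus the mean
    activation of every other story (assumed baseline)."""
    pos = [a for a, lab in zip(acts, labels) if lab == emotion]
    neg = [a for a, lab in zip(acts, labels) if lab != emotion]
    if not pos or not neg:
        raise ValueError("need at least one story inside and outside the emotion")
    mean_pos, mean_neg = mean_vector(pos), mean_vector(neg)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


# Toy data: four "stories" with 3-dimensional activations (illustrative only).
acts = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [0.0, 1.0, 2.0],
    [0.0, 3.0, 2.0],
]
labels = ["nervous", "nervous", "at_ease", "at_ease"]

vec = difference_of_means(acts, labels, "nervous")
print(vec)  # [2.0, -2.0, 0.0]
```

The third component cancels because both groups share it, which is the point of subtracting a baseline: only directions that distinguish the emotion survive.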
## Structure

```
training/amygdala_stories/
  README.md
  manifest.json         # emotion -> cluster mapping
  stories/
    <emotion>.txt       # one-paragraph story embodying the emotion
```

Emotion names use underscores (`on_edge`, `worn_out`, `at_ease`,
`grief_stricken`, `self_confident`, `self_conscious`, `self_critical`)
to match the filename.

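A loader for this layout might look like the sketch below. It assumes `manifest.json` is a flat JSON object mapping each emotion name to its cluster name, which the layout comment suggests but does not confirm. `load_corpus` and the cluster names are hypothetical, and the demo builds a throwaway directory rather than touching the real corpus.

```python
import json
import tempfile
from pathlib import Path


def load_corpus(root):
    """Return (stories, unknown): stories is a list of
    (emotion, cluster, text) tuples; unknown lists file stems that are
    not keys in the manifest (e.g. a hyphen instead of an underscore)."""
    root = Path(root)
    # Assumed manifest shape: {"<emotion>": "<cluster>", ...}
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    stories, unknown = [], []
    for path in sorted((root / "stories").glob("*.txt")):
        emotion = path.stem  # the filename without .txt is the emotion label
        if emotion not in manifest:
            unknown.append(emotion)
            continue
        text = path.read_text(encoding="utf-8").strip()
        stories.append((emotion, manifest[emotion], text))
    return stories, unknown


# Demo on a temporary tree with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "stories").mkdir()
    (root / "manifest.json").write_text(
        json.dumps({"on_edge": "cluster_a", "at_ease": "cluster_b"}),
        encoding="utf-8",
    )
    (root / "stories" / "on_edge.txt").write_text(
        "Placeholder story text.\n", encoding="utf-8"
    )
    (root / "stories" / "on-edge.txt").write_text(
        "Misnamed file.\n", encoding="utf-8"
    )
    stories, unknown = load_corpus(root)

print(stories)  # [('on_edge', 'cluster_a', 'Placeholder story text.')]
print(unknown)  # ['on-edge']
```

Reporting unknown stems instead of silently skipping them makes filename typos visible before a training run.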
## Style guidelines

- **One clear emotion per paragraph.** Not mixed. If a second emotion
  is named in the text, it should serve the primary one (e.g. `hostile`
  can mention rising heat or thrown objects but shouldn't shade into
  `sad`).
- **Embodied, not labeled.** Don't write "she felt nervous." Write
  the sensation, the timing, the sentence shape that nervousness has.
- **Specific particulars.** A named object, a concrete setting, a
  detail that grounds the emotion. "The cold tile under bare feet at
  3am" does more work than "the empty house."
- **Variable narrator.** Some first person, some third person, some
  close-third, some distant. Different genders, ages, settings.
  Prevents the steering vector from overfitting to one voice.
- **Length: roughly one paragraph.** ~40-120 words. Long enough to
  have texture, short enough that the paragraph is *about* the
  emotion and nothing else.
- **Standalone.** No references to other stories, no continuing
  characters across files.

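Two of these guidelines (length, and not labeling the emotion) are mechanical enough to check automatically. The sketch below is one possible heuristic, not an existing tool in this repository: `check_story` is a hypothetical name, the word bounds treat "roughly" as a hard range, and the label check only catches text that starts with the literal emotion name, so it cannot judge whether a story is actually embodied.

```python
import re

MIN_WORDS, MAX_WORDS = 40, 120  # from the "~40-120 words" guideline


def check_story(emotion, text):
    """Return a list of human-readable warnings for one story."""
    problems = []

    n_words = len(text.split())
    if not MIN_WORDS <= n_words <= MAX_WORDS:
        problems.append(
            f"length {n_words} words, expected {MIN_WORDS}-{MAX_WORDS}"
        )

    # Match the label with a space, underscore, or hyphen between its
    # parts, so `worn_out` also catches "worn out" and "worn-out".
    parts = [re.escape(p) for p in emotion.split("_")]
    pattern = r"\b" + r"[ _-]".join(parts)
    if re.search(pattern, text, flags=re.IGNORECASE):
        problems.append(f"names the emotion '{emotion}' directly")

    return problems


filler = " ".join(["tile"] * 50)  # 50 neutral words, placeholder text
print(check_story("on_edge", filler))
print(check_story("worn_out", "He was worn-out. " + filler))
print(check_story("nervous", "She felt nervous."))
```

The warnings are meant as prompts for a human reviewer; a story that deliberately runs slightly over the range can still be fine.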
## Progress

Written stories live in `stories/`. Remaining emotions are tracked by
diffing the files in `stories/` against the full emotion list in
`manifest.json`.

Initial batch written by PoC 2026-04-17; aiming for at least one
story per cluster before the first training run, and all 171 before
considering the corpus "complete."
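The progress diff and the one-story-per-cluster check can be sketched as two set operations. This again assumes the manifest is a flat emotion-to-cluster mapping; the function names and cluster names below are hypothetical, and `written` stands in for the stems of the files in `stories/`.

```python
def remaining_emotions(manifest, written):
    """Emotions listed in the manifest that have no story yet."""
    return sorted(set(manifest) - set(written))


def clusters_without_story(manifest, written):
    """Clusters that no written story covers yet."""
    covered = {manifest[e] for e in written if e in manifest}
    return sorted(set(manifest.values()) - covered)


# Toy manifest with made-up cluster names.
manifest = {
    "on_edge": "cluster_a",
    "self_critical": "cluster_a",
    "worn_out": "cluster_b",
    "at_ease": "cluster_c",
}
written = ["on_edge", "at_ease"]

print(remaining_emotions(manifest, written))      # ['self_critical', 'worn_out']
print(clusters_without_story(manifest, written))  # ['cluster_b']
```

An empty result from `clusters_without_story` corresponds to the "at least one story per cluster" milestone; an empty result from `remaining_emotions` corresponds to full coverage of the manifest.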