training/amygdala_stories: scaffold + initial batch of 15 stories
Emotion-labeled short-paragraph corpus for training amygdala steering
vectors.

The manifest is derived from Anthropic's 171-emotion list
(transformer-circuits.pub/2026/emotions, Table 12) plus 28
PoC-specific additions covering axes that Anthropic's general research
doesn't cover (curious, focused, in_flow, staying_with, filling_space,
rigorous, defensive_rigor, tender, witnessed, connected, etc.).

Scope pivoted mid-write: Kent noted that the empirical question of
emotion's dimensionality benefits from maximum coverage, so the
manifest will expand further with emotions from Wikipedia's
emotion-classification article (Parrott's tree, Plutchik's wheel and
dyads, HUMAINE EARL, culture-specific emotions à la Saudade/Hiraeth).
The expansion is staged in follow-up commits.

This commit: a README with method and style guidelines, the initial
manifest (199 emotions), and 15 hand-written one-paragraph stories
across all 10 Anthropic clusters as quality/variety samples. Each story
embodies one emotion without naming it; the narrator voice varies
(first/third person, close/distant, different situations) to keep the
steering vectors from overfitting to one voice.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
Parent: 43e06daa5b
Commit: ec7568c726
117 changed files with 290 additions and 0 deletions
training/amygdala_stories/README.md (normal file, +64 lines)
# Amygdala Training Stories

Short first- and third-person paragraphs, each imbued with one emotion:
one of the 171 emotions from Anthropic's emotion-vector paper (Table 12,
`transformer-circuits.pub/2026/emotions/`) or one of the PoC-specific
additions listed in `manifest.json`. The corpus feeds the steering-vector
trainer at `vllm/vllm/plugins/amygdala/training/train_steering_vectors.py`.

## Method (replication of Anthropic, 2026)

Anthropic prompted Sonnet 4.5 to write short stories embodying each
emotion, extracted activations during generation, and used
difference-of-means (or SAEs) to identify the steering vector for each
emotion. Our pipeline follows the same approach, with two differences:

- We write the stories by hand rather than prompting a model, so the
  training data is grounded in actual writing rather than synthetic
  model output. (Model-generated paragraphs can supplement it later.)
- Training will go through the amygdala plugin's extraction path, so
  we get the same hidden-state activations the plugin reads out at
  inference time.

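
The difference-of-means step can be sketched as follows. This is a minimal illustration, not the code in `train_steering_vectors.py`: it assumes activations have already been extracted as one NumPy array of shape `(n_samples, d_model)` per emotion, and it uses the pooled activations of all *other* emotions as the baseline, which is one common choice rather than a confirmed detail of Anthropic's method or ours. `difference_of_means` is a hypothetical name.

```python
import numpy as np


def difference_of_means(
    acts: dict[str, np.ndarray], normalize: bool = True
) -> dict[str, np.ndarray]:
    """Hypothetical sketch: one steering vector per emotion.

    acts maps emotion name -> array of shape (n_samples, d_model) holding
    hidden-state activations collected from that emotion's stories.
    Requires at least two emotions (each needs a baseline).
    """
    vectors = {}
    for emotion, own in acts.items():
        # Assumption: baseline = pooled activations of every other emotion.
        others = np.concatenate(
            [a for e, a in acts.items() if e != emotion], axis=0
        )
        vec = own.mean(axis=0) - others.mean(axis=0)
        if normalize:
            # Unit-normalize so every emotion's vector is on the same scale.
            norm = np.linalg.norm(vec)
            if norm > 0:
                vec = vec / norm
        vectors[emotion] = vec
    return vectors


# Toy usage with random data standing in for real activations.
rng = np.random.default_rng(0)
acts = {
    "on_edge": rng.normal(size=(50, 8)) + 1.0,  # shifted along every dimension
    "at_ease": rng.normal(size=(50, 8)),
}
vecs = difference_of_means(acts)
print(vecs["on_edge"].shape)  # (8,)
```

With only two emotions the two vectors are exact negatives of each other; with the full manifest, each emotion is contrasted against everything else.
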
## Structure

```
training/amygdala_stories/
  README.md
  manifest.json      # emotion -> cluster mapping
  stories/
    <emotion>.txt    # one-paragraph story embodying the emotion
```

Emotion names use underscores (`on_edge`, `worn_out`, `at_ease`,
`grief_stricken`, `self_confident`, `self_conscious`, `self_critical`)
so that each name matches its story's filename, e.g.
`stories/on_edge.txt`.

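
The naming rule can be expressed as a small helper. This is a hypothetical sketch: it assumes the source emotion lists write multi-word names with spaces or hyphens (e.g. "worn out", "grief-stricken"), and `emotion_key` and `story_path` are illustrative names, not functions in the repository.

```python
import re
from pathlib import Path


def emotion_key(display_name: str) -> str:
    """Hypothetical helper: 'Grief-stricken' -> 'grief_stricken'."""
    # Lowercase, then collapse each run of spaces/hyphens into one underscore.
    return re.sub(r"[\s\-]+", "_", display_name.strip().lower())


def story_path(root: Path, display_name: str) -> Path:
    """Story file for an emotion, following the stories/<emotion>.txt layout."""
    return root / "stories" / f"{emotion_key(display_name)}.txt"


print(emotion_key("on edge"))         # on_edge
print(emotion_key("Self-confident"))  # self_confident
print(story_path(Path("training/amygdala_stories"), "worn out").as_posix())
```
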
## Style guidelines

- **One clear emotion per paragraph.** Not mixed. If a second emotion
  appears at all, it should serve the primary one (e.g. a `hostile`
  paragraph can carry rising heat or thrown objects but shouldn't shade
  into `sad`).
- **Embodied, not labeled.** Don't write "she felt nervous." Write
  the sensation, the timing, the sentence shape that nervousness has.
- **Specific particulars.** A named object, a concrete setting, a
  detail that grounds the emotion. "The cold tile under bare feet at
  3am" does more work than "the empty house."
- **Variable narrator.** Some first person, some third person, some
  close-third, some distant. Different genders, ages, settings. This
  keeps the steering vector from overfitting to one voice.
- **Length: roughly one paragraph.** ~40-120 words. Long enough to
  have texture, short enough that the paragraph is *about* the
  emotion and nothing else.
- **Standalone.** No references to other stories, no continuing
  characters across files.

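
Two of these guidelines can be checked mechanically. This is a minimal sketch, assuming each story is plain text: it flags paragraphs outside the ~40-120 word range and paragraphs that name their own emotion outright. `lint_story` is a hypothetical helper; counting words by whitespace and matching only the exact emotion name (underscores read as a space or hyphen) are simplifying assumptions, and neither check can judge whether the emotion is actually embodied.

```python
import re


def lint_story(
    emotion: str, text: str, min_words: int = 40, max_words: int = 120
) -> list[str]:
    """Return style problems for one story (an empty list means it passes)."""
    problems = []
    n_words = len(text.split())
    if not min_words <= n_words <= max_words:
        problems.append(f"length: {n_words} words (want {min_words}-{max_words})")
    # 'Embodied, not labeled': the emotion's own name shouldn't appear.
    # 'grief_stricken' matches "grief stricken" or "grief-stricken".
    parts = [re.escape(p) for p in emotion.split("_")]
    pattern = r"[\s\-]+".join(parts)
    if re.search(rf"\b{pattern}\b", text, flags=re.IGNORECASE):
        problems.append(f"labels the emotion: contains '{emotion.replace('_', ' ')}'")
    return problems


# Too short (30 words) and names the emotion, so both checks fire.
print(lint_story("nervous", "She felt nervous. " * 10))
```
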
## Progress

Written stories live in `stories/`. Remaining emotions are tracked by
diffing `stories/` against the emotion list in `manifest.json`.

The initial batch was written by PoC on 2026-04-17. The goal is at
least one story per cluster before the first training run, and all 171
before considering the corpus "complete."

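
The diff against the manifest can be scripted. This is a minimal sketch, assuming `manifest.json` is a flat JSON object mapping emotion name to cluster name (this README only says "emotion -> cluster mapping", so the exact schema is an assumption) and that stories are named `stories/<emotion>.txt`. All function names are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_manifest(root: Path) -> dict[str, str]:
    # Assumed schema: a flat object {"<emotion>": "<cluster>", ...}.
    return json.loads((root / "manifest.json").read_text(encoding="utf-8"))


def written_stories(root: Path) -> set[str]:
    # stories/<emotion>.txt -> "<emotion>"
    return {p.stem for p in (root / "stories").glob("*.txt")}


def remaining_by_cluster(
    manifest: dict[str, str], written: set[str]
) -> dict[str, list[str]]:
    """Manifest emotions that have no story yet, grouped by cluster."""
    missing = defaultdict(list)
    for emotion, cluster in manifest.items():
        if emotion not in written:
            missing[cluster].append(emotion)
    return {cluster: sorted(names) for cluster, names in sorted(missing.items())}


def uncovered_clusters(manifest: dict[str, str], written: set[str]) -> list[str]:
    """Clusters with no story at all (the 'one per cluster' milestone)."""
    covered = {c for e, c in manifest.items() if e in written}
    return sorted(set(manifest.values()) - covered)
```

For example, with `root = Path("training/amygdala_stories")`, `remaining_by_cluster(load_manifest(root), written_stories(root))` lists what is left per cluster, and an empty `uncovered_clusters(...)` result means every cluster has at least one story.
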