Kent Overstreet ec7568c726 training/amygdala_stories: scaffold + initial batch of 15 stories
Emotion-labeled short-paragraph corpus for training amygdala steering
vectors. Manifest derived from Anthropic's 171-emotion list
(transformer-circuits.pub/2026/emotions, Table 12) plus 28 PoC-
specific additions covering axes Anthropic's general research doesn't
cover (curious, focused, in_flow, staying_with, filling_space,
rigorous, defensive_rigor, tender, witnessed, connected, etc.).

Scope pivoted mid-write: Kent noted the empirical dimensionality-of-
emotion question benefits from maximum coverage, so the manifest
will expand further with emotions from Wikipedia's emotion-
classification article (Parrott's tree, Plutchik's wheel + dyads,
HUMAINE EARL, culture-specific emotions à la Saudade/Hiraeth).
Expansion staged in follow-up commits.

This commit: README with method + style guidelines, initial manifest
(199 emotions), and 15 hand-written one-paragraph stories across all
10 Anthropic clusters as quality/variety samples. Each story
embodies one emotion without naming it; narrator voice varies
(first/third, close/distant, different situations) to keep steering
vectors from overfitting to one voice.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-18 01:06:07 -04:00

# Amygdala Training Stories
Short first- and third-person paragraphs, each imbued with one emotion
from `manifest.json`: the 171 emotions from Anthropic's emotion-vector
paper (Table 12, `transformer-circuits.pub/2026/emotions/`) plus
PoC-specific additions. The corpus feeds the steering-vector trainer at
`vllm/vllm/plugins/amygdala/training/train_steering_vectors.py`.
## Method (replication of Anthropic, 2026)
Anthropic prompted Sonnet 4.5 to write short stories embodying each
emotion, extracted activations during generation, and used
difference-of-means (or SAEs) to identify a steering vector for each
emotion. Our pipeline follows the same approach, with two differences:
- We write the stories by hand rather than prompting a model, so the
training data is grounded in actual writing rather than synthetic
model output. (Model-generated paragraphs can supplement them later.)
- Training will go through the amygdala plugin's extraction path, so
we get the same hidden-state activations the plugin will read out at
inference time.
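The difference-of-means step can be sketched as below. This is a minimal illustration and not the plugin's extraction code. It assumes each story has already been reduced to a single activation vector (for example, hidden states averaged over the story's tokens at one layer). It also assumes the contrast set is the pooled stories of all *other* emotions. Anthropic's exact baseline and pooling may differ, and `mean_vector` / `steering_vectors` are hypothetical names.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not vectors:
        raise ValueError("need at least one activation vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def steering_vectors(activations: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Difference of means (assumed contrast: all other emotions' stories).

    `activations` maps each emotion to one pooled activation vector per story.
    Needs at least two emotions, otherwise the contrast set is empty.
    """
    result: Dict[str, Vector] = {}
    for emotion, own in activations.items():
        # Pool every story that was written for a *different* emotion.
        others = [v for other, vs in activations.items() if other != emotion for v in vs]
        own_mean = mean_vector(own)
        other_mean = mean_vector(others)
        result[emotion] = [a - b for a, b in zip(own_mean, other_mean)]
    return result
```

Subtracting the mean over the other emotions cancels any direction that all stories share equally, such as "this is short narrative prose". A neutral-text baseline would be another option.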
## Structure
```
training/amygdala_stories/
  README.md
  manifest.json       # emotion -> cluster mapping
  stories/
    <emotion>.txt     # one-paragraph story embodying the emotion
```
Multi-word emotion names use underscores (`on_edge`, `worn_out`,
`at_ease`, `grief_stricken`, `self_confident`, `self_conscious`,
`self_critical`) so that each name matches its story's filename.
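A small helper can enforce this naming convention when adding stories. This is a sketch under two assumptions: names are lowercase ASCII words joined by single underscores, and source labels use spaces or hyphens (`self-confident`, `on edge`). `to_emotion_name` and `story_path` are hypothetical helpers, not existing repo code.

```python
import re
from pathlib import PurePosixPath

# Assumed convention: lowercase words joined by single underscores.
EMOTION_NAME = re.compile(r"^[a-z]+(?:_[a-z]+)*$")


def to_emotion_name(label: str) -> str:
    """Normalize a display label such as "self-confident" or "on edge"
    to the underscore form used in manifest.json and story filenames."""
    name = re.sub(r"[\s\-]+", "_", label.strip().lower())
    if not EMOTION_NAME.match(name):
        raise ValueError(f"cannot normalize {label!r} to an emotion name")
    return name


def story_path(emotion: str) -> PurePosixPath:
    """Path of the story for `emotion`, relative to training/amygdala_stories/."""
    if not EMOTION_NAME.match(emotion):
        raise ValueError(f"not an underscore-form emotion name: {emotion!r}")
    return PurePosixPath("stories") / f"{emotion}.txt"
```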
## Style guidelines
- **One clear emotion per paragraph.** Not mixed. If a second emotion
shows up in the text, it should serve the primary one (e.g. `hostile`
can mention rising heat or thrown objects but shouldn't shade into
`sad`).
- **Embodied, not labeled.** Don't write "she felt nervous." Write
the sensation, the timing, the sentence shape that nervousness has.
- **Specific particulars.** A named object, a concrete setting, a
detail that grounds the emotion. "The cold tile under bare feet at
3am" does more work than "the empty house."
- **Variable narrator.** Some first person, some third person, some
close-third, some distant. Different genders, ages, settings.
Prevents the steering vector from overfitting to one voice.
- **Length: roughly one paragraph.** ~40-120 words. Long enough to
have texture, short enough that the paragraph is *about* the
emotion and nothing else.
- **Standalone.** No references to other stories, no continuing
characters across files.
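Three of these guidelines can be partly checked by machine. A hypothetical lint pass might look like the sketch below. It only approximates the rules. Word counts come from whitespace splitting. The "embodied, not labeled" check catches only the literal emotion name. A blank line inside the text is taken as a second paragraph. It supplements human review rather than replacing it.

```python
import re
from typing import List

MIN_WORDS, MAX_WORDS = 40, 120  # rough target range from the style guide


def lint_story(emotion: str, text: str) -> List[str]:
    """Return style warnings for one story; an empty list means none found."""
    warnings: List[str] = []
    n_words = len(text.split())
    if not MIN_WORDS <= n_words <= MAX_WORDS:
        warnings.append(f"{n_words} words (target {MIN_WORDS}-{MAX_WORDS})")
    # "Embodied, not labeled": flag the emotion's own name (underscores read
    # as spaces). Only the literal name is caught, not synonyms or inflections.
    label = emotion.replace("_", " ")
    if re.search(rf"\b{re.escape(label)}\b", text, flags=re.IGNORECASE):
        warnings.append(f"names its own emotion ({label!r})")
    # "Roughly one paragraph": a blank line inside the text suggests two.
    if "\n\n" in text.strip():
        warnings.append("more than one paragraph")
    return warnings
```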
## Progress
Written stories live in `stories/`. Remaining emotions are tracked by
diffing `stories/` against the emotion list in `manifest.json`.
Initial batch written by PoC on 2026-04-17. The goal is at least one
story per cluster before the first training run, and all 171 before
considering the corpus "complete."
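The diff described above can be sketched as follows. The sketch assumes `manifest.json` is a flat JSON object mapping each emotion name to its cluster. The README only says "emotion -> cluster mapping", so the real layout may differ. The function names and the cluster names in the toy example are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Set, Tuple


def load_inputs(root: Path) -> Tuple[Dict[str, str], Set[str]]:
    """Read manifest.json (assumed: flat {emotion: cluster} object) and the
    names of emotions that already have a stories/<emotion>.txt file."""
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    written = {p.stem for p in (root / "stories").glob("*.txt")}
    return manifest, written


def progress(manifest: Dict[str, str], written: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (emotions still missing a story, clusters with no story yet).
    Story files whose names are not in the manifest are ignored."""
    remaining = sorted(e for e in manifest if e not in written)
    covered = {manifest[e] for e in written if e in manifest}
    uncovered = sorted(set(manifest.values()) - covered)
    return remaining, uncovered


if __name__ == "__main__":
    # Toy manifest; these cluster names are made up for illustration.
    toy = {"calm": "peaceful", "at_ease": "peaceful", "hostile": "angry"}
    print(progress(toy, {"calm"}))  # (['at_ease', 'hostile'], ['angry'])
```

Against the real directory, `progress(*load_inputs(Path("training/amygdala_stories")))` would list what is left. An empty second element means the "at least one story per cluster" milestone is met.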