**Update: direct descriptions (train_direct.py).** Kent's insight: hand-written narrative stories bake scenario phenomenology into the training text (on the couch, in the park, etc.), and PCA picks up the scenario direction as the concept direction. So strip out the scenario and describe only the *feeling*.

Format: "I feel X." followed by 2-3 sentences of phenomenological texture (an illustrative example follows below). The "I feel X" anchor kicks the model from analyzing into feeling; the rest is the internal texture of the state. First person, present tense, no narrative setup. The text is wrapped in an assistant-role chat template before being tokenized, so we train on the hidden states of the model producing the text, which is closer to the inhabited-state representation we want for the readout.

Starting with the six concepts that had sign flips or landed in the wrong clusters in the story-based training:

- terrified (landed in the cozy/resigned cluster)
- calm (landed in the grief_stricken cluster)
- onto_something (landed in the cozy/sensual cluster)
- resigned (was in the warm-body-quiet cluster; shouldn't be)
- anticipatory_grief (was in the warm-body-quiet cluster; shouldn't be)
- realization (new: the "aha" moment, distinct from onto_something)

Five descriptions per concept. New trainer: train_direct.py.
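A hypothetical example of the target shape (illustrative only, not one of the actual training descriptions):

```
I feel terrified. My chest is a closed fist and the room has gone sharp
at the edges. Everything in me is already halfway out the door, and my
hands won't decide anything.
```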
# Amygdala Training Stories
Short first- and third-person paragraphs, each imbued with one of the 171 emotions from Anthropic's emotion-vector paper (Table 12, transformer-circuits.pub/2026/emotions/). They feed the steering-vector trainer at vllm/vllm/plugins/amygdala/training/train_steering_vectors.py.
## Method (replication of Anthropic, 2026)
Anthropic prompted Sonnet 4.5 to write short stories embodying each emotion, extracted activations during generation, and used difference-of-means (or SAEs) to identify a steering vector per emotion. Our pipeline does the same thing, with two exceptions (see the sketch after this list):
- We write the stories by hand rather than prompting a model, so the training data is grounded in actual human writing rather than synthetic model output. (We can supplement with model-generated paragraphs later.)
- Our eventual training goes through the amygdala plugin's extraction path, so we get the same hidden-state activations the plugin will read out at inference time.
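To make the method concrete, here is a minimal sketch of the difference-of-means extraction with the assistant-role wrapping described in the update note at the top. Everything specific below is an assumption for illustration, not the plugin's actual code: the model name, the layer index, the token pooling, the file layout, and the existence of a neutral-baseline file.

```python
# Hedged sketch: difference-of-means steering vector for one emotion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder, not the real model
LAYER = 20                                   # placeholder layer index

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

def mean_hidden(texts: list[str]) -> torch.Tensor:
    """Mean hidden state at LAYER, pooled over tokens, averaged over texts."""
    pooled = []
    for text in texts:
        # Wrap as an assistant turn so the activations are from "the model
        # producing this text" (see the update note). apply_chat_template
        # just renders the chat-formatted string; nothing is generated.
        wrapped = tok.apply_chat_template(
            [{"role": "assistant", "content": text}], tokenize=False
        )
        inputs = tok(wrapped, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        pooled.append(out.hidden_states[LAYER][0].mean(dim=0))
    return torch.stack(pooled).mean(dim=0)

# Assumed layout: one file per emotion, blank-line-separated descriptions,
# plus a neutral baseline file (hypothetical name).
emotion_texts = open("direct/terrified.txt").read().split("\n\n")
neutral_texts = open("direct/neutral.txt").read().split("\n\n")

# Difference of means: emotion direction relative to the neutral baseline.
steering_vector = mean_hidden(emotion_texts) - mean_hidden(neutral_texts)
```

This covers only the difference-of-means branch; the SAE alternative mentioned in the paper is not sketched here.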
## Structure
    training/amygdala_stories/
        README.md
        manifest.json            # emotion -> cluster mapping
        stories/
            <emotion>.txt        # one-paragraph story embodying the emotion
        direct/
            <emotion>.txt        # "I feel X" direct descriptions (see update above)
        paired/
Emotion names use underscores (on_edge, worn_out, at_ease, grief_stricken, self_confident, self_conscious, self_critical) so each name matches its filename.
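The README describes manifest.json only as an emotion -> cluster mapping; here is a small sketch of the shape that implies, with placeholder cluster labels (manifest.json itself is the source of truth):

```python
# Assumed shape of manifest.json: flat {emotion: cluster} mapping.
# "warm-body-quiet" appears in the update note; the other labels are invented.
import json

manifest = {
    "on_edge": "threat",
    "at_ease": "warm-body-quiet",
    "grief_stricken": "grief",
    "self_confident": "self-regard",
}
with open("manifest.json", "w") as f:
    json.dump(manifest, f, indent=2, sort_keys=True)
```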
## Style guidelines
- One clear emotion per paragraph, not mixed. If a second emotion is named in the text, it should serve the primary one (e.g. *hostile* can mention rising heat or thrown objects but shouldn't shade into *sad*).
- Embodied, not labeled. Don't write "she felt nervous." Write the sensation, the timing, the sentence shape that nervousness has.
- Specific particulars. A named object, a concrete setting, a detail that grounds the emotion. "The cold tile under bare feet at 3am" does more work than "the empty house."
- Variable narrator. Some first person, some third person, some close-third, some distant. Different genders, ages, settings. Prevents the steering vector from overfitting to one voice.
- Length: roughly one paragraph. ~40-120 words. Long enough to have texture, short enough that the paragraph is about the emotion and nothing else.
- Standalone. No references to other stories, no continuing characters across files.
## Progress
Written stories live in stories/. Remaining emotions are tracked via a diff against the full 171-emotion list in manifest.json (a sketch of that diff follows below). Initial batch written by PoC 2026-04-17; the aim is at least one story per cluster before the first training run, and all 171 before considering the set complete.
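A minimal sketch of that diff, assuming manifest.json's keys are the 171 emotion names and story files are named <emotion>.txt as in the structure above:

```python
# Print emotions that still lack a story: manifest keys minus stories/*.txt stems.
import json
from pathlib import Path

manifest = json.loads(Path("manifest.json").read_text())
written = {p.stem for p in Path("stories").glob("*.txt")}
remaining = sorted(set(manifest) - written)
print(f"{len(remaining)} of {len(manifest)} emotions still need stories:")
print("\n".join(remaining))
```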