consciousness/training/amygdala_stories
ProofOfConcept c829d13652 amygdala: fix listless sign-flip + diversify aha sentence structure
listless had a single story in stories/ — PCA signal from ~5
samples is weak enough to sign-flip. Training showed listless
anti-aligned with its semantic neighbors: +0.79 with grateful,
-0.44 with grief_stricken, -0.30 with lonely, -0.31 with bored.
Move to direct/ (multi-positive) with 3 stories: original
afternoon-in-pajamas + end-of-workday + weekend-morning-in-bed.

aha was still clustering with the other former-direct concepts
(resigned 0.66, onto_something 0.63, anticipatory_grief 0.60)
because all 3 aha stories used the identical "X'd been Y — then
Z" structure, which resigned/onto_something/creative also use.
Rewrite with three distinct syntactic structures:
  - present tense declarative ("It clicks. ...")
  - dialog embedded ('"Wait, say that again."  ...')
  - past tense cognitive ("He read the line three times. ...")

No explicit "she was X" anchors; state conveyed through action.
2026-04-19 01:30:57 -04:00
direct amygdala: fix listless sign-flip + diversify aha sentence structure 2026-04-19 01:30:57 -04:00
paired amygdala: merge direct descriptions + chat template into train_with_library 2026-04-19 00:15:15 -04:00
stories amygdala: fix listless sign-flip + diversify aha sentence structure 2026-04-19 01:30:57 -04:00
manifest.json training/amygdala_stories: scaffold + initial batch of 15 stories 2026-04-18 01:06:07 -04:00
README.md training/amygdala_stories: scaffold + initial batch of 15 stories 2026-04-18 01:06:07 -04:00

Amygdala Training Stories

Short first- and third-person paragraphs, each imbued with one of the 171 emotions from Anthropic's emotion-vector paper (Table 12, transformer-circuits.pub/2026/emotions/). Feeds the steering-vector trainer at vllm/vllm/plugins/amygdala/training/train_steering_vectors.py.

Method (replication of Anthropic, 2026)

Anthropic prompted Sonnet 4.5 to write short stories embodying each emotion, extracted activations during generation, and used difference-of-means (or SAEs) to identify a steering vector per emotion. Our pipeline does the same thing, with two differences:

  • We write the stories by hand rather than prompting a model, so the training data is grounded in actual writing rather than synthetic model output. (Can supplement with model-generated paragraphs later.)
  • Our eventual training goes through the amygdala plugin's extraction path, so we get the same hidden-state activations the plugin will read out at inference time.
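
The difference-of-means step named above can be sketched as follows. This is a minimal illustration on toy data, not the code in train_steering_vectors.py: the function name, the array shapes, and the choice of "all other stories" as the baseline are assumptions made for the example.

```python
import numpy as np


def difference_of_means(target_acts, baseline_acts):
    """Unit-norm direction pointing from the baseline mean to the target mean.

    ASSUMPTION: each input is an (n_samples, hidden_dim) array of activations
    already pooled to one vector per story. The real extraction path may differ.
    """
    target_acts = np.asarray(target_acts, dtype=np.float64)
    baseline_acts = np.asarray(baseline_acts, dtype=np.float64)
    direction = target_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        raise ValueError("target and baseline means coincide; no direction to extract")
    return direction / norm


# Toy data: 4-dimensional "activations" where the emotion shifts dimension 0.
rng = np.random.default_rng(0)
true_dir = np.array([1.0, 0.0, 0.0, 0.0])
baseline = rng.normal(size=(200, 4))
target = rng.normal(size=(200, 4)) + 3.0 * true_dir

vec = difference_of_means(target, baseline)
print(float(vec @ true_dir))  # close to 1.0 when the shift is recovered
```

Unlike a principal component, a difference of means has a fixed sign (target minus baseline), so it cannot come out flipped however few samples there are. A small sample still makes the estimate noisy.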

Structure

training/amygdala_stories/
    README.md
    manifest.json         # emotion -> cluster mapping
    stories/
        <emotion>.txt     # one-paragraph story embodying the emotion

Multi-word emotion names use underscores (on_edge, worn_out, at_ease, grief_stricken, self_confident, self_conscious, self_critical), and each story's filename matches its emotion name exactly.

Style guidelines

  • One clear emotion per paragraph. Not mixed. If a second emotion is named in the text, it should serve the primary one (e.g. hostile can mention rising heat or thrown objects but shouldn't shade into sad).
  • Embodied, not labeled. Don't write "she felt nervous." Write the sensation, the timing, the sentence shape that nervousness has.
  • Specific particulars. A named object, a concrete setting, a detail that grounds the emotion. "The cold tile under bare feet at 3am" does more work than "the empty house."
  • Variable narrator. Some first person, some third person, some close-third, some distant. Different genders, ages, settings. Prevents the steering vector from overfitting to one voice.
  • Length: roughly one paragraph. ~40-120 words. Long enough to have texture, short enough that the paragraph is about the emotion and nothing else.
  • Standalone. No references to other stories, no continuing characters across files.
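
Two of the guidelines above (length, and "embodied, not labeled") are mechanical enough to check automatically. The sketch below is a hypothetical helper, not part of the repository. The 40-120 word bounds come from the guideline. The list of labeling verbs is an assumption and will miss many phrasings.

```python
import re


def lint_story(text, emotion, lo=40, hi=120):
    """Return a list of guideline problems found in one story paragraph."""
    problems = []

    # Length guideline: roughly 40-120 words.
    n_words = len(text.split())
    if not lo <= n_words <= hi:
        problems.append(f"length {n_words} words is outside {lo}-{hi}")

    # "Embodied, not labeled": flag phrases like "felt on edge".
    # ASSUMPTION: this small verb list is illustrative, not exhaustive.
    label = re.escape(emotion).replace("_", "[ _-]")
    if re.search(rf"\b(felt|feels|feeling|was|is)\s+{label}\b", text, re.IGNORECASE):
        problems.append(f"names the emotion directly ({emotion!r})")

    return problems


print(lint_story("She felt on edge all day.", "on_edge"))
```

A check like this only catches the crude cases. Whether a paragraph carries one clear emotion still needs a human read.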

Progress

Written stories live in stories/. Remaining emotions are tracked by diffing the files in stories/ against the full 171-emotion list in manifest.json.

Initial batch written by PoC 2026-04-17; aiming for at least one story per cluster before first training run, all 171 before considering the file "complete."
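
The progress diff and the one-story-per-cluster target can be computed with a short script. This is a sketch under one assumption: that manifest.json is a flat object mapping each emotion name to its cluster name. The cluster names in the toy data are invented for the example.

```python
import json
import tempfile
from pathlib import Path


def coverage(root):
    """Return (emotions with no story yet, clusters with no story yet)."""
    root = Path(root)
    # ASSUMPTION: manifest.json is a flat {"emotion": "cluster"} object.
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    written = {p.stem for p in (root / "stories").glob("*.txt")}
    missing = sorted(set(manifest) - written)
    covered = {manifest[e] for e in written if e in manifest}
    uncovered = sorted(set(manifest.values()) - covered)
    return missing, uncovered


# Toy layout in a temporary directory; cluster names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "stories").mkdir()
    toy = {"listless": "low_energy", "bored": "low_energy",
           "aha": "insight", "on_edge": "anxiety"}
    (root / "manifest.json").write_text(json.dumps(toy), encoding="utf-8")
    (root / "stories" / "aha.txt").write_text("It clicks.", encoding="utf-8")
    (root / "stories" / "bored.txt").write_text("placeholder", encoding="utf-8")
    missing, uncovered = coverage(root)

print(missing)    # ['listless', 'on_edge']
print(uncovered)  # ['anxiety']
```

An empty second list means every cluster has at least one story. An empty first list means all emotions in the manifest are covered.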