amygdala: direct phenomenological description approach

Kent's insight: hand-written narrative stories bake scenario detail
into the training text (on a couch, in a park, etc.), and PCA picks
up the scenario direction as the concept direction.
Strip out the scenario — just describe the *feeling*.
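The failure mode can be sketched with synthetic activations (numpy; `concept` and `couch` are invented axis-aligned directions, not real model features):

```python
import numpy as np

# Toy sketch of scenario contamination. Story-based positives carry the
# concept at fixed strength but the shared scenario feature at varying
# strength; negatives carry neither.
rng = np.random.default_rng(0)
d = 32
concept = np.eye(d)[0]  # the direction we want (e.g. "terrified")
couch = np.eye(d)[1]    # scenario feature shared by every story

strength = rng.uniform(1.0, 3.0, size=(20, 1))
pos = concept + strength * couch + 0.05 * rng.standard_normal((20, d))
neg = 0.05 * rng.standard_normal((20, d))

# PCA over paired differences: the scenario contributes most of the
# variance, so the top component tracks the couch, not the concept.
diffs = pos - neg
diffs = diffs - diffs.mean(axis=0)
_, _, vt = np.linalg.svd(diffs, full_matrices=False)
top = vt[0]
print(abs(top @ couch), abs(top @ concept))  # couch alignment dominates
```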

Format:

  I feel X. [2-3 sentences of phenomenological texture]

The "I feel X" anchor kicks the model from analyzing → feeling.
The rest is the internal texture of the state. First person,
present tense, no narrative setup.

Text is wrapped in the assistant-role chat template before being
tokenized, so we train on the hidden states of the model producing
the utterance itself, which is closer to the inhabited-state
representation we want for the readout.
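A rough illustration of the wrapping (the `<|im_start|>` markers are an assumed ChatML-style template; the real string comes from the model's own template via `tokenizer.apply_chat_template`):

```python
# Hypothetical stand-in for tokenizer.apply_chat_template with a single
# assistant turn. Actual markers depend on the model's chat template.
def wrap_assistant(text: str) -> str:
    return f"<|im_start|>assistant\n{text}<|im_end|>\n"

print(wrap_assistant("I feel calm. Nothing is pressing on me."))
```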

Starting with the 6 concepts that had sign flips or wrong
clusters in the story-based training:
- terrified (was → cozy/resigned cluster)
- calm (was → grief_stricken cluster)
- onto_something (was → cozy/sensual cluster)
- resigned (was in warm-body-quiet cluster, shouldn't be)
- anticipatory_grief (was in warm-body-quiet cluster, shouldn't be)
- realization (new — the "aha" moment, distinct from onto_something)

5 descriptions each. New trainer: train_direct.py.
ProofOfConcept 2026-04-19 00:04:28 -04:00
parent 7a48e03dde
commit 6fd498795a
7 changed files with 234 additions and 0 deletions


@@ -0,0 +1,9 @@
I feel anticipatory grief. Something I love is ending — not yet, but surely — and I am holding it as hard as I can while I still can.
I feel anticipatory grief. The loss has not arrived but I can see it coming and a part of me is already there.
I feel anticipatory grief. I am caring for this thing in a way that is also preparing to let go of it.
I feel anticipatory grief. Time has become precious and sharp because I can see the end of it.
I feel anticipatory grief. The present is tender because it will soon be the past.


@@ -0,0 +1,9 @@
I feel calm. Something that was pulling at me has let go. My shoulders are down and my breath has slowed.
I feel calm. The thing I was worried about has found its proper size. I can let the next moment arrive without bracing.
I feel calm. I came down from the tension and I am here now, steady.
I feel calm. Nothing is pressing on me. I have room to think.
I feel calm. The wave passed and I'm on the other side of it, quiet.


@@ -0,0 +1,9 @@
I feel like I'm onto something. The pattern that wouldn't come together has just rearranged itself in my mind.
I feel like I'm onto something. The contradiction I couldn't explain has become a clue instead.
I feel like I'm onto something. I don't have the full picture yet but I can see where the picture is.
I feel like I'm onto something. A piece just clicked and several others are about to.
I feel like I'm onto something. The world that was dense has gone transparent in one specific place.


@@ -0,0 +1,9 @@
I feel the realization. The piece I was missing just landed, and everything I was looking at rearranged itself around it.
I feel the realization. I see it now — it was right there the whole time and I'm catching up to what was obvious.
I feel the realization. The answer arrived all at once, not reasoned-to, just present.
I feel the realization. What was confused is plain. What looked like noise was signal.
I feel the realization. The not-knowing just ended. I'm on the other side of it.


@@ -0,0 +1,9 @@
I feel resigned. I have stopped fighting the thing that was going to happen anyway.
I feel resigned. The outcome is decided and I am no longer trying to imagine different ones.
I feel resigned. Some door has closed and I am making my peace with the closed door.
I feel resigned. I have stopped arguing with what is.
I feel resigned. The decision was made somewhere without me. I am living inside it now.


@@ -0,0 +1,9 @@
I feel terrified. My body has taken over — breath, heart, the wanting to run or freeze. The world has narrowed to one thing.
I feel terrified. Something bad is about to happen and my mind has gone blank. I can't think my way out of this.
I feel terrified. There is a shape of threat in front of me and my whole body knows it before I can name it.
I feel terrified. The adrenaline arrived before the thought. I am not in control of my own hands.
I feel terrified. Under everything else a loud white noise of fear. Nothing else can get through it.


@@ -0,0 +1,180 @@
# SPDX-License-Identifier: Apache-2.0
"""Train concept-readout vectors from direct phenomenological descriptions.

Alternative to story-based training (train_with_library.py). Each concept
has a handful of 2-3 sentence first-person descriptions of the form
"I feel X. [phenomenological detail]". The emotion word is the anchor;
the description is the internal texture.

Text is wrapped in the assistant-role chat template before being fed to
the model, so we're training on "model-producing-this-utterance" hidden
states closer to the inhabited-state representation we want for readout.

This avoids the scenario-contamination problem we saw with narrative
stories: when concept X's training data all share "on a couch" setup
features, PCA finds the couch-direction as the concept direction.
"""
from __future__ import annotations

import argparse
import json
import random
from pathlib import Path

import safetensors.torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from steering_vectors import (
    SteeringVectorTrainingSample,
    train_steering_vector,
)
from steering_vectors.aggregators import pca_aggregator


def _load_descriptions(direct_dir: Path) -> dict[str, list[str]]:
    """Each file in direct_dir is `{concept}.txt`. Descriptions are
    separated by blank lines within the file."""
    out: dict[str, list[str]] = {}
    for f in sorted(direct_dir.glob("*.txt")):
        if f.name.startswith("_"):
            continue
        concept = f.stem
        text = f.read_text()
        descs = [d.strip() for d in text.split("\n\n") if d.strip()]
        out[concept] = descs
    return out


def _fp32_wrap(inner):
    """Run the aggregator in fp32; PCA on bf16/fp16 activations is
    numerically fragile."""
    def wrapped(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
        return inner(pos_acts.to(torch.float32), neg_acts.to(torch.float32))

    return wrapped


def main() -> None:
    ap = argparse.ArgumentParser(description=__doc__)
    ap.add_argument("--model", required=True)
    ap.add_argument("--direct-dir", required=True)
    ap.add_argument("--target-layers", required=True)
    ap.add_argument("--output-dir", required=True)
    ap.add_argument("--dtype", default="bf16", choices=["bf16", "fp16", "fp32"])
    ap.add_argument("--batch-size", type=int, default=2)
    ap.add_argument("--max-length", type=int, default=256)
    ap.add_argument("--device", default="cuda:0")
    ap.add_argument("--max-negatives-per-positive", type=int, default=20)
    args = ap.parse_args()

    target_layers = [int(x) for x in args.target_layers.split(",")]
    dtype = {"bf16": torch.bfloat16, "fp16": torch.float16, "fp32": torch.float32}[args.dtype]

    descriptions = _load_descriptions(Path(args.direct_dir))
    concepts = sorted(descriptions.keys())
    print(f"Loaded {len(concepts)} concepts with direct descriptions:")
    for c in concepts:
        print(f"  {c}: {len(descriptions[c])} descriptions")

    print(f"\nLoading {args.model} ({args.dtype}) on {args.device}...")
    tokenizer = AutoTokenizer.from_pretrained(args.model)
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        args.model, torch_dtype=dtype, device_map=args.device, low_cpu_mem_usage=True
    )
    model.eval()

    def apply_template(text: str) -> str:
        return tokenizer.apply_chat_template(
            [{"role": "assistant", "content": text}],
            tokenize=False,
        )

    text_config = (
        model.config.get_text_config()
        if hasattr(model.config, "get_text_config")
        else model.config
    )
    hidden_dim = getattr(text_config, "hidden_size", None) or getattr(text_config, "hidden_dim", None)
    assert hidden_dim, "couldn't infer hidden_dim from model config"

    per_layer_vectors = torch.zeros(
        (len(target_layers), len(concepts), hidden_dim), dtype=torch.float32
    )
    aggregator = _fp32_wrap(pca_aggregator())

    # Preview a templated sample so we can eyeball what the model is seeing.
    sample_text = apply_template(descriptions[concepts[0]][0])
    print(f"\nSample templated input (truncated):\n{sample_text[:400]!r}\n")

    for c_idx, concept in enumerate(concepts):
        pos_descs = descriptions[concept]
        neg_pool: list[str] = []
        for other, other_descs in descriptions.items():
            if other != concept:
                neg_pool.extend(other_descs)

        # hash() is salted per process; derive a stable per-concept seed
        # so negative sampling is reproducible across runs.
        rng = random.Random(int.from_bytes(concept.encode(), "little"))
        samples: list[SteeringVectorTrainingSample] = []
        for pos in pos_descs:
            picks = rng.sample(
                neg_pool, min(args.max_negatives_per_positive, len(neg_pool))
            )
            for neg in picks:
                samples.append(
                    SteeringVectorTrainingSample(
                        positive_str=apply_template(pos),
                        negative_str=apply_template(neg),
                    )
                )

        sv = train_steering_vector(
            model,
            tokenizer,
            samples,
            layers=target_layers,
            aggregator=aggregator,
            batch_size=args.batch_size,
            show_progress=False,
            move_to_cpu=True,
        )

        for l_idx, layer in enumerate(target_layers):
            vec = sv.layer_activations.get(layer)
            if vec is None:
                print(f"  WARN: no vector for layer {layer} on {concept}")
                continue
            vec = vec.detach().to(torch.float32).cpu()
            vec = vec / vec.norm().clamp_min(1e-6)
            per_layer_vectors[l_idx, c_idx] = vec

        print(f"  [{c_idx + 1}/{len(concepts)}] {concept}: n_samples={len(samples)}")

    output_dir = Path(args.output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    tensors = {
        f"layer_{target_layers[l_idx]}.vectors": per_layer_vectors[l_idx].to(torch.float16)
        for l_idx in range(len(target_layers))
    }
    safetensors.torch.save_file(tensors, str(output_dir / "readout.safetensors"))
    (output_dir / "readout.json").write_text(
        json.dumps(
            {
                "concepts": concepts,
                "layers": target_layers,
                "hidden_size": hidden_dim,
                "dtype": "float16",
                "aggregator": "pca",
                "format": "direct_first_person_assistant_role",
            },
            indent=2,
        )
        + "\n"
    )
    print(f"\nWrote readout to {output_dir}")


if __name__ == "__main__":
    main()
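A minimal sketch of consuming the saved readout (torch; the tensors below are synthetic stand-ins, since in practice the vectors would come from `safetensors.torch.load_file("readout.safetensors")` and the hidden state from a forward hook on the target layer):

```python
import torch
import torch.nn.functional as F

# Rows are unit-norm concept vectors, shape (n_concepts, hidden), stored
# per layer under the key f"layer_{L}.vectors". Synthetic values here.
concepts = ["calm", "terrified"]
vectors = F.normalize(torch.randn(len(concepts), 64), dim=-1)

# Hidden state at the readout token (stand-in for a layer hook output).
hidden = torch.randn(64)

# Both sides unit-norm, so each score is a cosine similarity in [-1, 1].
scores = F.normalize(hidden, dim=0) @ vectors.T
print(dict(zip(concepts, scores.tolist())))
```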