Design document for wiring the model's internal uncertainty, error
detection, and emotional valence circuits to the observe agent.
Based on contrastive activation probing, following Contrastive
Activation Addition (CAA, ACL 2024). Most of the infrastructure already
exists in extract_steering_vector.py and vllm_export_hook.py; the
remaining bottleneck is building the contrastive datasets.

Co-Authored-By: Kent Overstreet <kent.overstreet@gmail.com>