amygdala: default subspace-k to full per-story rank

Kent: 'we have the memory to just take the big hammer approach'.
Uncap k so each story's V_i spans its entire token-activation rowspace
(clamped to min(n_tokens, hidden)). Memory is ~1.1GB total — fine.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
This commit is contained in:
Kent Overstreet 2026-04-18 21:41:32 -04:00
parent 389f1bbe03
commit 2411925700

View file

@ -850,12 +850,13 @@ def main() -> None:
ap.add_argument(
"--subspace-k",
type=int,
default=512,
default=99999,
help="Max top-k right singular vectors per story for subspace method "
"(clamped to n_tokens per story). Default 512 is enough to span "
"each story's full natural subspace including per-attention-head "
"contributions on a hidden_dim=5120 residual stream. Smaller "
"values (e.g. 20) discard per-head discriminability.",
"(clamped to min(n_tokens, hidden_dim) per story). Default is "
"effectively 'keep full per-story subspace' — each story's V_i "
"spans its entire natural row space. On a hidden_dim=5120 "
"residual and ~500-token stories, that's ~500 vectors per story. "
"Memory is fine: 112 × 5120 × 500 × 4 bytes ≈ 1.1 GB.",
)
ap.add_argument(
"--quality-report",