amygdala: default subspace-k to full per-story rank
Kent: 'we have the memory to just take the big hammer approach'. Uncap k so each story's V_i spans its entire token-activation rowspace (clamped to min(n_tokens, hidden)). Memory is ~1.1GB total — fine. Co-Authored-By: Proof of Concept <poc@bcachefs.org>
This commit is contained in:
parent
389f1bbe03
commit
2411925700
1 changed files with 6 additions and 5 deletions
|
|
@ -850,12 +850,13 @@ def main() -> None:
|
||||||
ap.add_argument(
|
ap.add_argument(
|
||||||
"--subspace-k",
|
"--subspace-k",
|
||||||
type=int,
|
type=int,
|
||||||
default=512,
|
default=99999,
|
||||||
help="Max top-k right singular vectors per story for subspace method "
|
help="Max top-k right singular vectors per story for subspace method "
|
||||||
"(clamped to n_tokens per story). Default 512 is enough to span "
|
"(clamped to min(n_tokens, hidden_dim) per story). Default is "
|
||||||
"each story's full natural subspace including per-attention-head "
|
"effectively 'keep full per-story subspace' — each story's V_i "
|
||||||
"contributions on a hidden_dim=5120 residual stream. Smaller "
|
"spans its entire natural row space. On a hidden_dim=5120 "
|
||||||
"values (e.g. 20) discard per-head discriminability.",
|
"residual and ~500-token stories, that's ~500 vectors per story. "
|
||||||
|
"Memory is fine: 112 × 5120 × 500 × 4 bytes ≈ 1.1 GB.",
|
||||||
)
|
)
|
||||||
ap.add_argument(
|
ap.add_argument(
|
||||||
"--quality-report",
|
"--quality-report",
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue