amygdala: fix device mismatch in quality-report W_down handling
_compute_quality_report's single-neuron alignment was computing cos(W_down.T, diff_l) with W_down on CUDA (inherited from the loaded model) while diff_l lives on CPU (per_layer_vectors are kept on CPU throughout training). Move W_down to CPU on extraction.

Surfaced during the first real training run on b200: training itself completed cleanly (95 concepts x layer 63 in ~8s) but the quality report crashed at the first single-neuron alignment check.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
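For context, a minimal runnable sketch of the failure mode and the fix (not the project's actual code; the helper name single_neuron_alignment and the toy shapes are assumptions). The point is that w_down.T @ diff_l raises a device-mismatch RuntimeError when w_down is on CUDA and diff_l is on CPU, so w_down is moved to CPU at extraction time to match per_layer_vectors:

    import torch

    def single_neuron_alignment(w_down: torch.Tensor, diff_l: torch.Tensor) -> torch.Tensor:
        # w_down: [hidden, mlp_inner] with unit-norm columns; diff_l: [hidden]
        diff_unit = diff_l / diff_l.norm().clamp_min(1e-6)
        return w_down.T @ diff_unit  # cosine per MLP neuron, shape [mlp_inner]

    hidden, mlp_inner = 8, 16
    diff_l = torch.randn(hidden)                        # per_layer_vectors stay on CPU
    w = torch.randn(hidden, mlp_inner)                  # stand-in for the down_proj weight
    w = w.to(torch.float32).cpu()                       # the fix: match diff_l's device
    w = w / w.norm(dim=0, keepdim=True).clamp_min(1e-6) # unit-normalize each column
    print(single_neuron_alignment(w, diff_l).shape)     # torch.Size([16])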
This commit is contained in: parent af17b0f0df, commit f4fb6db1ee.
1 changed file with 3 additions and 2 deletions.
@@ -464,13 +464,14 @@ def _compute_quality_report(
     report: dict = {}
     n_layers = per_layer_vectors.shape[0]

-    # Pre-compute per-layer W_down for single-neuron alignment.
+    # Pre-compute per-layer W_down for single-neuron alignment. Keep on
+    # CPU to match the per_layer_vectors tensor.
     w_down: dict[int, torch.Tensor] = {}
     for target_l in target_layers:
         w = _find_mlp_down_proj(model, target_l)
         if w is not None:
             # Unit-normalize each column (one per MLP neuron).
-            w = w.to(torch.float32)
+            w = w.to(torch.float32).cpu()
             norms = w.norm(dim=0, keepdim=True).clamp_min(1e-6)
             w_down[target_l] = w / norms  # [hidden, mlp_inner]
