_compute_quality_report's single-neuron alignment was computing cos(W_down.T, diff_l) with W_down on CUDA (inherited from the loaded model) while diff_l lives on CPU (per_layer_vectors are kept on CPU throughout training). Move W_down to CPU on extraction. Surfaced during first real training run on b200 — training itself completed cleanly (95 concepts x layer 63 in ~8s) but quality-report crashed at the first single-neuron alignment check. Co-Authored-By: Proof of Concept <poc@bcachefs.org> |
||
|---|---|---|
| .. | ||
| amygdala_stories | ||
| amygdala_training | ||
| apollo_plugin | ||
| research | ||
| DESIGN.md | ||
| pyproject.toml | ||