consciousness


Kent Overstreet 11a7e4043e scripts: FP8 quantize Qwen3.6-27B for vLLM (multimodal + MTP)	2026-04-24 22:15:31 -04:00

Quantization recipe targeting the multimodal Qwen3.6-27B for vLLM serving.

Three pitfalls the script avoids, each documented inline:

1. Loader strip: `AutoModelForCausalLM` silently drops the vision tower; we load via the config-declared `Qwen3_5ForConditionalGeneration` instead.
2. Pattern anchor: llmcompressor matches the `ignore` list against module names (no `.weight` suffix) when walking `named_modules()`, not against full tensor names. Patterns now anchor on `$` at the module name; the earlier `\.weight$` form silently quantized lm_head and every linear_attn projection.
3. vLLM fusion: vLLM fuses {q,k,v}_proj into qkv_proj, gate+up into gate_up_proj, and in_proj_qkv+in_proj_z into in_proj_qkvz. The compressed_tensors loader rejects mixed schemes within a fused layer, so the `ignore` list is shaped to keep all sub-components of a fused layer consistent.

After `oneshot()` writes the FP8 output, MTP tensors (which the HF class doesn't expose) are spliced in at BF16 from the upstream cached snapshot, with the compressed_tensors metadata header preserved.

Recipe follows Unsloth's UD-Q8_K_XL late-stack overrides (FFN: 50, 51, 59, 62, 63; ATTN: 51, 59, 63), extended to include `v_proj` for fusion compat.

Final checkpoint is ~35 GB (matches Unsloth's GGUF size to within ~1%) with vision tower BF16, MTP head BF16, and most mlp/self_attn Linears at FP8_DYNAMIC.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
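The pattern-anchor pitfall can be shown with Python's `re` module. This is a minimal sketch that makes two assumptions. First, the module names are hypothetical. Second, it calls `re.match` directly instead of llmcompressor's own matcher, so the exact matching semantics (match vs. search, any pattern prefix convention) are also assumed. Because module names carry no `.weight` suffix, the old patterns never match, and the modules they were meant to protect get quantized anyway.

```python
import re

# Hypothetical module names, shaped like what named_modules() yields.
# Module names have no ".weight" suffix; only tensor names do.
MODULE_NAMES = [
    "lm_head",
    "model.language_model.layers.0.linear_attn.in_proj_qkv",
    "model.language_model.layers.0.mlp.gate_proj",
]

# Old form: anchored on the tensor-name suffix, so it never matches a module name.
OLD_IGNORE = [r".*lm_head\.weight$", r".*linear_attn\..*\.weight$"]
# New form: anchored at the end of the module name itself.
NEW_IGNORE = [r".*lm_head$", r".*linear_attn\..*$"]


def ignored(name, patterns):
    """Return True if any pattern matches the module name (re.match plus a $ anchor)."""
    return any(re.match(p, name) for p in patterns)


for name in MODULE_NAMES:
    print(f"{name}: old={ignored(name, OLD_IGNORE)} new={ignored(name, NEW_IGNORE)}")
```

With the old list, every module reports `old=False`, so lm_head and the linear_attn projection would fall through to quantization. The new list keeps them out and still quantizes `mlp.gate_proj`.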
Dockerfile.vllm	Consolidate poc-memory and poc-agent configs	2026-03-19 21:49:58 -04:00
provision-mi300x.sh	Consolidate poc-memory and poc-agent configs	2026-03-19 21:49:58 -04:00
provision-mistralrs.sh	poc-agent: read context_groups from config instead of hardcoded list	2026-03-24 01:53:28 -04:00
provision-vllm.sh	Consolidate poc-memory and poc-agent configs	2026-03-19 21:49:58 -04:00
quantize_qwen3_6_mm.py	scripts: FP8 quantize Qwen3.6-27B for vLLM (multimodal + MTP)	2026-04-24 22:15:31 -04:00
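The quantize_qwen3_6_mm.py commit notes that vLLM fuses sub-projections and that the compressed_tensors loader rejects mixed schemes inside a fused layer. A consistency check over the `ignore` list could be sketched as follows. The fusion map comes from the commit message. The helper names (`is_ignored`, `inconsistent_fusions`) and the layer paths are hypothetical, and `re.match` stands in for llmcompressor's matcher as an assumption. The check groups each sub-projection under its parent module and its fused name, then flags any group whose parts disagree on being ignored.

```python
import re
from collections import defaultdict

# Fused vLLM layer -> HF sub-projections it is built from (per the commit message).
FUSED_GROUPS = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
    "in_proj_qkvz": ["in_proj_qkv", "in_proj_z"],
}
# Reverse lookup: sub-projection leaf name -> fused layer name.
PART_TO_FUSED = {part: fused for fused, parts in FUSED_GROUPS.items() for part in parts}


def is_ignored(module_name, patterns):
    """Hypothetical stand-in for the ignore-list matcher (assumes re.match semantics)."""
    return any(re.match(p, module_name) for p in patterns)


def inconsistent_fusions(module_names, patterns):
    """Return fused layers whose sub-projections are not all ignored or all quantized."""
    flags = defaultdict(set)  # (parent, fused) -> set of ignored/not-ignored flags seen
    for name in module_names:
        parent, _, leaf = name.rpartition(".")
        fused = PART_TO_FUSED.get(leaf)
        if fused is not None:
            flags[(parent, fused)].add(is_ignored(name, patterns))
    return sorted(f"{p}.{f}" for (p, f), seen in flags.items() if len(seen) > 1)


# Hypothetical layer-51 modules.
names = [f"model.layers.51.self_attn.{p}" for p in ("q_proj", "k_proj", "v_proj")]
names += [f"model.layers.51.mlp.{p}" for p in ("gate_proj", "up_proj")]

print(inconsistent_fusions(names, [r".*layers\.51\.self_attn\.(q|k)_proj$"]))
print(inconsistent_fusions(names, [r".*layers\.51\.self_attn\.(q|k|v)_proj$"]))
```

Ignoring only `q_proj` and `k_proj` leaves `qkv_proj` with mixed schemes. Adding `v_proj` makes the group consistent, which matches the commit's note about extending the overrides to `v_proj` for fusion compatibility.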