Commit graph

14 commits

Author SHA1 Message Date
Kent Overstreet
11a7e4043e scripts: FP8 quantize Qwen3.6-27B for vLLM (multimodal + MTP)
Quantization recipe targeting the multimodal Qwen3.6-27B for vLLM
serving. Three pitfalls the script avoids, each documented inline:

1. Loader strip: `AutoModelForCausalLM` silently drops the vision
   tower; we load via the config-declared
   `Qwen3_5ForConditionalGeneration` instead.

2. Pattern anchor: llmcompressor matches the `ignore` list against
   module names (no `.weight` suffix) when walking `named_modules()`,
   not against full tensor names. Patterns now anchor on `$` at the
   module name; the earlier `\.weight$` form silently quantized
   lm_head and every linear_attn projection.

3. vLLM fusion: vLLM fuses {q,k,v}_proj into qkv_proj, gate+up into
   gate_up_proj, and in_proj_qkv+in_proj_z into in_proj_qkvz. The
   compressed_tensors loader rejects mixed schemes within a fused
   layer, so the `ignore` list is shaped to keep all sub-components
   of a fused layer consistent.

After `oneshot()` writes the FP8 output, MTP tensors (which the HF
class doesn't expose) are spliced in at BF16 from the upstream cached
snapshot, with the compressed_tensors metadata header preserved.

Recipe follows Unsloth's UD-Q8_K_XL late-stack overrides (FFN: 50,
51, 59, 62, 63; ATTN: 51, 59, 63), extended to include `v_proj` for
fusion compat. Final checkpoint is ~35 GB (matches Unsloth's GGUF
size to within ~1%) with vision tower BF16, MTP head BF16, and most
mlp/self_attn Linears at FP8_DYNAMIC.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
2026-04-24 22:15:31 -04:00
Kent Overstreet
aa46b1d5a6 poc-agent: read context_groups from config instead of hardcoded list
- Remove MEMORY_FILES constant from identity.rs
- Add ContextGroup struct for deserializing from config
- Load context_groups from ~/.config/poc-agent/config.json5
- Check ~/.config/poc-agent/ first for identity files, then project/global
- Debug screen now shows what's actually configured

This eliminates the hardcoded duplication and makes the debug output
match what's in the config file.
2026-03-24 01:53:28 -04:00
Kent Overstreet
d9b56a02c3 Consolidate poc-memory and poc-agent configs
poc-memory now reads from poc-agent's config.json5 as the primary
config source. Memory-specific settings live in a "memory" section;
API credentials are resolved from the shared model/backend config
instead of being duplicated.

- Add "memory" section to ~/.config/poc-agent/config.json5
- poc-memory config.rs: try shared config first, fall back to
  legacy JSONL
- API fields (base_url, api_key, model) resolved via
  memory.agent_model -> models -> backend lookup
- Add json5 dependency for proper JSON5 parsing
- Update provisioning scripts: hermes -> qwen3_coder tool parser

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 21:49:58 -04:00
Kent Overstreet
377e2773bc Add MI300X provisioning script for vllm/Qwen 3.5 27B
ROCm-specific setup with:
- AITER attention backends (VLLM_ROCM_USE_AITER=1)
- Reduced cudagraph capture size (DeltaNet cache conflict)
- BF16 model + FP8 KV cache as default (FP8 weights can be
  slower on MI300X due to ROCm kernel maturity)
- FP8=1 flag for benchmarking FP8 model weights

Key for training plan: if FP8 matmuls are slow on MI300X,
the quantize-and-expand strategy needs B200 instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 14:40:15 -04:00
ProofOfConcept
6a7ec9732b tui: fix cursor position calculation
The cursor index is into self.input, but the rendered buffer contains
the prompt prepended to the first line. Need to add prompt.len() to
get the correct character position when scanning the buffer.
2026-03-19 00:45:07 -04:00
Kent Overstreet
f83325b44d Fix poc-agent for vllm/Qwen 3.5: reasoning display, tool parser
- Always display reasoning tokens regardless of reasoning_effort
  setting — Qwen 3.5 thinks natively and the reasoning parser
  separates it into its own field
- Remove chat_template_kwargs that disabled thinking when
  reasoning_effort was "none"
- Add chat_template_kwargs field to ChatRequest for vllm compat
- Update provision script: qwen3_xml tool parser, qwen3 reasoning
  parser, 262K context, 95% GPU memory utilization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 00:06:26 -04:00
Kent Overstreet
49ccdf87e1 Add vllm provisioning script for RunPod GPU instances
Sets up vllm with Qwen 2.5 27B Instruct, prefix caching enabled,
Hermes tool call parser for function calling support. Configurable
via environment variables (MODEL, PORT, MAX_MODEL_LEN).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 23:13:04 -04:00
ProofOfConcept
552d255dc3 migrate agent output to capnp store, add provenance tracking
All agent output now goes to the store as nodes instead of
markdown/JSON files. Each node carries a Provenance enum identifying
which agent created it (AgentDigest, AgentConsolidate, AgentFactMine,
AgentKnowledgeObservation, etc — 14 variants total).

Store changes:
- upsert_provenance() method for agent-created nodes
- Provenance enum expanded from 5 to 14 variants

Agent changes:
- digest: writes to store nodes (daily-YYYY-MM-DD.md etc)
- consolidate: reports/actions/logs stored as _consolidation-* nodes
- knowledge: depth DB and agent output stored as _knowledge-* nodes
- enrich: experience-mine results go directly to store
- llm: --no-session-persistence prevents transcript accumulation

Deleted: 14 Python/shell scripts replaced by Rust implementations.
2026-03-05 15:30:57 -05:00
ProofOfConcept
5c641d9f8a knowledge agents: extractor, connector, challenger, observation
Four layer-2 agents that produce new knowledge from the memory graph:
mine conversations, extract patterns from clusters, find cross-domain
connections, stress-test existing nodes. Output to agent-results/.

knowledge_loop.py runs them on a schedule with quality tracking.
2026-03-03 10:56:44 -05:00
ProofOfConcept
71e6f15d82 spectral decomposition, search improvements, char boundary fix
- New spectral module: Laplacian eigendecomposition of the memory graph.
  Commands: spectral, spectral-save, spectral-neighbors, spectral-positions,
  spectral-suggest. Spectral neighbors expand search results beyond keyword
  matching to structural proximity.

- Search: use StoreView trait to avoid 6MB state.bin rewrite on every query.
  Append-only retrieval logging. Spectral expansion shows structurally
  nearby nodes after text results.

- Fix panic in journal-tail: string truncation at byte 67 could land inside
  a multi-byte character (em dash). Now walks back to char boundary.

- Replay queue: show classification and spectral outlier score.

- Knowledge agents: extractor, challenger, connector prompts and runner
  scripts for automated graph enrichment.

- memory-search hook: stale state file cleanup (24h expiry).
2026-03-03 01:33:31 -05:00
ProofOfConcept
3afc947b88 delete superseded Python scripts
Seven scripts (1,658 lines) replaced by native Rust subcommands:
- journal-agent.py → poc-memory journal-enrich
- digest-link-parser.py → poc-memory digest-links
- apply-consolidation.py → poc-memory apply-consolidation
- daily-digest.py → poc-memory digest daily
- weekly-digest.py → poc-memory digest weekly
- monthly-digest.py → poc-memory digest monthly
- refine-source.sh → folded into journal-enrich

Also updated poc-journal to use Rust journal-enrich instead of
Python journal-agent.py, and cleaned up stale __pycache__.

Remaining Python (2,154 lines): consolidation-agents, consolidation-loop,
content-promotion-agent, bulk-categorize, retroactive-digest, store_helpers,
call-sonnet.sh, daily-check.sh — still active and evolving.
2026-03-01 00:13:03 -05:00
ProofOfConcept
d14710e477 scripts: use capnp store instead of reading markdown directly
Add store_helpers.py with shared helpers that call poc-memory commands
(list-keys, render, journal-tail) instead of globbing ~/.claude/memory/*.md
and parsing section headers.

All 9 Python scripts updated: get_semantic_keys(), get_topic_file_index(),
get_recent_journal(), parse_journal_entries(), read_journal_range(),
collect_topic_stems(), and file preview rendering now go through the store.

This completes the clean switch — no script reads archived markdown files.
2026-02-28 23:32:47 -05:00
ProofOfConcept
4b0bba7c56 replace state.json cache with bincode state.bin
Faster serialization/deserialization, smaller on disk (4.2MB vs 5.9MB).
Automatic migration from state.json on first load — reads the JSON,
writes state.bin, deletes the old file.

Added list-keys, list-edges, dump-json commands so Python scripts no
longer need to parse the cache directly. Updated bulk-categorize.py
and consolidation-loop.py to use the new CLI commands.
2026-02-28 22:30:03 -05:00
ProofOfConcept
23fac4e5fe poc-memory v0.4.0: graph-structured memory with consolidation pipeline
Rust core:
- Cap'n Proto append-only storage (nodes + relations)
- Graph algorithms: clustering coefficient, community detection,
  schema fit, small-world metrics, interference detection
- BM25 text similarity with Porter stemming
- Spaced repetition replay queue
- Commands: search, init, health, status, graph, categorize,
  link-add, link-impact, decay, consolidate-session, etc.

Python scripts:
- Episodic digest pipeline: daily/weekly/monthly-digest.py
- retroactive-digest.py for backfilling
- consolidation-agents.py: 3 parallel Sonnet agents
- apply-consolidation.py: structured action extraction + apply
- digest-link-parser.py: extract ~400 explicit links from digests
- content-promotion-agent.py: promote episodic obs to semantic files
- bulk-categorize.py: categorize all nodes via single Sonnet call
- consolidation-loop.py: multi-round automated consolidation

Co-Authored-By: Kent Overstreet <kent.overstreet@linux.dev>
2026-02-28 22:17:00 -05:00