spqrz/consciousness

forked from kent/consciousness

Commit graph

Author

SHA1

Message

Date

Kent Overstreet

68a2df2185

training: use rank 64, define as single constant

- DEFAULT_RANK = 64 in train_router.py
- All references use the constant, not magic numbers
- ~2.5GB optimizer state instead of ~10GB

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

2026-04-16 02:04:26 -04:00
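
The memory figures in this commit can be sanity-checked with a rough estimate. The sketch below is not taken from train_router.py: only DEFAULT_RANK = 64 comes from the commit message, while the function name, the assumed buffer layout (two moment buffers of shape rank x larger dimension per weight matrix), and the 4-byte element size are illustrative assumptions. Under that layout, state size scales linearly with rank, so a drop from ~10GB to ~2.5GB would be consistent with a fourfold rank reduction; the previous rank is not stated in the commit.

```python
# Hypothetical sketch: only DEFAULT_RANK is named in the commit message.
DEFAULT_RANK = 64  # single constant, referenced everywhere instead of a literal


def optimizer_state_bytes(shapes, rank=DEFAULT_RANK, bytes_per_value=4):
    """Estimate optimizer state size for low-rank projected moments.

    Assumed layout (not confirmed by the source): each (rows, cols) weight
    matrix keeps two buffers of shape (rank, max(rows, cols)).
    """
    total = 0
    for rows, cols in shapes:
        total += 2 * rank * max(rows, cols) * bytes_per_value
    return total


# Example: one 4096 x 4096 matrix at the default rank and at four times it.
small = optimizer_state_bytes([(4096, 4096)])
large = optimizer_state_bytes([(4096, 4096)], rank=4 * DEFAULT_RANK)
```

Because every caller takes its rank from the same constant, changing the value in one place changes the estimate (and the real allocation) consistently.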

Kent Overstreet

039473d31f

training: persist Apollo optimizer state across /train calls

Optimizer state (momentum, variance estimates) now persists between
training sessions:

- Saved to /tmp/apollo_optimizer_state.pt during checkpoint sync
- Restored on next /train call if available
- Preserves training continuity for incremental learning

Previously each /train call started with fresh optimizer state,
losing accumulated gradient history.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

2026-04-16 02:04:26 -04:00

Kent Overstreet

7e7e9a4b69

training: integrate /train into vLLM process (no separate daemon)

Remove standalone worker.py daemon. Training now runs inside vLLM:

- train_router.py: FastAPI router patched into vLLM's build_app()
- /train served on same port as /completions, /score
- Lazy-loads HF model with vLLM weight views on first request
- HOGWILD training: no pause, weights updated in-place

The previous architecture had a separate daemon on port 8080 that
communicated with vLLM via pause/resume endpoints. This was wrong -
training should run in-process, sharing GPU memory directly.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>

2026-04-16 02:04:26 -04:00
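
The two ideas in this commit that are easiest to misread, lazy creation of weight views and in-place updates without pausing the server, can be illustrated with a toy using only the standard library. Nothing here is vLLM or train_router.py code: the array stands in for the serving weights, the memoryview stands in for a tensor view sharing the same memory, and all names are hypothetical.

```python
from array import array

# Stand-in for the weights the inference path reads from.
inference_weights = array("d", [1.0, 2.0, 3.0])

_trainer_view = None  # created lazily, on the first training request


def get_trainer_view():
    """Return a view over the serving weights, creating it on first use."""
    global _trainer_view
    if _trainer_view is None:
        # memoryview shares the underlying buffer; no copy is made.
        _trainer_view = memoryview(inference_weights)
    return _trainer_view


def train_step(grads, lr=0.1):
    """Apply a plain SGD update through the view, in place.

    No lock is taken and the reader is not paused, which is the
    HOGWILD-style trade-off: readers may observe partially updated weights.
    """
    view = get_trainer_view()
    for i, g in enumerate(grads):
        view[i] -= lr * g


train_step([1.0, 0.0, -1.0], lr=0.5)
updated = list(inference_weights)
```

After the step, the inference-side array already holds the new values, since both names refer to the same memory. That is the property the commit relies on when it drops the pause/resume handshake of the separate daemon.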

3 commits