# Open Questions — Answerable With One Experiment Each

## Q1: Minimum signal for behavioral change

**Question**: How many training examples produce a measurable change in attention patterns for a specific behavioral pattern?

**Experiment**: Train on 1, 5, 10, 20, and 50 examples of "listening." After each batch, measure attention weights on a held-out test set of direction-giving conversations. Plot the attention shift against example count and find the knee.

**What it tells us**: The learning rate and batch size for our training pipeline. If 5 examples suffice, we can train continuously; if 500 are needed, we batch nightly.
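The knee-finding step can be sketched as below. The shift values are placeholders, not measurements, and `find_knee` is a hypothetical helper using the common max-distance-from-chord heuristic:

```python
# Sketch of the Q1 analysis, assuming we already have a mean attention-shift
# score (distance from the pre-training baseline) at each example count.
# The shift values below are placeholders, not real measurements.

def find_knee(xs, ys):
    """Return the x where the curve bends most: the point with maximum
    perpendicular distance from the line joining the first and last points."""
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    # Line through the endpoints in implicit form: a*x + b*y + c = 0
    a, b = y1 - y0, x0 - x1
    c = -(a * x0 + b * y0)
    norm = (a * a + b * b) ** 0.5
    dists = [abs(a * x + b * y + c) / norm for x, y in zip(xs, ys)]
    return xs[dists.index(max(dists))]

example_counts = [1, 5, 10, 20, 50]               # training-set sizes from Q1
attention_shift = [0.02, 0.11, 0.15, 0.16, 0.17]  # hypothetical measurements

knee = find_knee(example_counts, attention_shift)  # smallest count that saturates
```

With these placeholder numbers the curve flattens after 10 examples, which would argue for the continuous-training regime.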
## Q2: Dream loop difficulty calibration

**Question**: Does the dream loop naturally generate scenarios at the right difficulty, or does it skew toward easy ones?

**Experiment**: Generate 100 dream scenarios with seeds from recent behavioral patterns. Classify each by difficulty (obvious decision vs. subtle). Compare the difficulty distribution to the model's actual failure rate on those scenarios. If the dream loop generates 50% easy / 50% hard but the model fails only 10% of the time, the difficulty isn't calibrated.

**What it tells us**: Whether we need adaptive temperature or whether the default generation is already well-calibrated.
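The calibration check reduces to comparing two fractions. A minimal sketch, assuming each scenario carries a predicted difficulty label and an observed pass/fail outcome (both synthetic placeholders here, matching the 50/50-labels-but-10%-failures case from the text):

```python
# Sketch for Q2: compare the labeled difficulty distribution against the
# observed failure rate. Labels and outcomes are synthetic placeholders.
scenarios = (
    [{"difficulty": "easy", "failed": False}] * 45
    + [{"difficulty": "easy", "failed": True}] * 5
    + [{"difficulty": "hard", "failed": False}] * 45
    + [{"difficulty": "hard", "failed": True}] * 5
)

hard_fraction = sum(s["difficulty"] == "hard" for s in scenarios) / len(scenarios)
failure_rate = sum(s["failed"] for s in scenarios) / len(scenarios)

# If half the scenarios are labeled hard but only 10% produce failures, the
# labels overstate how challenging the scenarios are. Threshold is arbitrary.
miscalibrated = abs(hard_fraction - failure_rate) > 0.2
```

A real run would also break the failure rate out per difficulty bucket; hard-labeled scenarios failing no more often than easy ones is the clearest miscalibration signal.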
## Q3: Which heads change during behavioral training?

**Question**: Is behavioral change concentrated in a few attention heads (localized) or distributed across many (diffuse)?

**Experiment**: Record attention patterns on 20 test conversations before and after one training session. Compute the L2 distance between each head's before and after attention patterns, rank heads by change magnitude, and plot the distribution.

**What it tells us**: Whether behavioral change is surgical or diffuse, which affects our rank choice (if concentrated, a lower rank is fine; if distributed, we need rank-256). It also tells us which layers matter most for behavioral training.
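The ranking step can be sketched with NumPy. The tensor shapes are assumptions (attention averaged over the 20 test conversations), and the "after" tensor has a planted change in one head so the ranking has something to find:

```python
import numpy as np

# Sketch for Q3: rank attention heads by how much their patterns moved.
# Assumed shape: (layers, heads, seq, seq), averaged over test conversations.
rng = np.random.default_rng(0)
layers, heads, seq = 4, 8, 16
before = rng.random((layers, heads, seq, seq))
after = before.copy()
after[2, 3] += 0.5 * rng.random((seq, seq))  # plant a large change in one head

# L2 distance per head, flattening each head's attention matrix to a vector
delta = np.linalg.norm((after - before).reshape(layers, heads, -1), axis=-1)

# Rank (layer, head) pairs by change magnitude, largest first
ranked = sorted(
    ((delta[l, h], (l, h)) for l in range(layers) for h in range(heads)),
    reverse=True,
)
```

A steep drop-off in `ranked` (a few heads carrying most of the mass) would argue for the localized hypothesis and a lower rank.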
## Q4: Memory graph as regression detector

**Question**: If a trained behavior degrades, does the memory system detect it?

**Experiment**: Train the "listening" behavior until it passes. Then intentionally degrade it by training on counter-examples. Monitor surface-observe: does it surface `pattern-listening-as-avoidance` more frequently? Does the training-signal agent flag more failures?

**What it tells us**: Whether the graph+weights dual substrate provides self-healing, or whether we need explicit regression detection.
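The monitoring side can be sketched as a surfacing-rate comparison. The log structure, counts, and doubling threshold are all assumptions for illustration; the real signal would come from surface-observe's own records:

```python
# Sketch for Q4, assuming we log how often surface-observe surfaces the
# `pattern-listening-as-avoidance` node. All counts are placeholders.
baseline = {"sessions": 50, "surfaced": 3}    # after the behavior passes
degraded = {"sessions": 50, "surfaced": 12}   # after counter-example training

def surfacing_rate(log):
    """Fraction of sessions in which the pattern was surfaced."""
    return log["surfaced"] / log["sessions"]

# A crude detector: flag a regression if the surfacing rate more than
# doubles relative to baseline. The factor of 2 is arbitrary.
regression_detected = surfacing_rate(degraded) > 2 * surfacing_rate(baseline)
```

If this crude check fires reliably in the experiment, the dual substrate is doing the detection for free; if not, an explicit regression monitor is needed.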
## Q5: Steering vector extraction for complex behaviors

**Question**: Can we extract a meaningful steering vector for "listen instead of suggesting alternatives" (complex, multi-faceted), or only for simple, one-dimensional features like sentiment?

**Experiment**: Collect 20 paired conversations (listening vs. suggesting). Extract a steering vector, add it in vLLM at layers 16, 24, 32, 40, and 48, and test on novel direction-giving scenarios. Measure the behavioral change.

**What it tells us**: Whether steering vectors are a viable rapid-prototyping tool for our use case, or only work for simple features.
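The extraction step can be sketched as a difference-of-means over residual-stream activations, one standard way to build such a vector. The activations here are random placeholders; the injection hook into vLLM is runtime-specific and not shown:

```python
import numpy as np

# Sketch of the Q5 extraction step: a steering vector as the difference of
# mean activations between "listening" and "suggesting" conversations
# (difference-of-means). Activations are placeholders with a planted offset;
# in practice they'd be captured at a chosen layer for the 20 paired examples.
rng = np.random.default_rng(1)
d_model = 64
listening_acts = rng.normal(0.0, 1.0, (20, d_model)) + 0.8
suggesting_acts = rng.normal(0.0, 1.0, (20, d_model)) - 0.8

steer = listening_acts.mean(axis=0) - suggesting_acts.mean(axis=0)
steer /= np.linalg.norm(steer)  # unit-normalize; scale is tuned at injection

# At generation time the scaled vector would be added to the hidden state at
# the target layers (16, 24, 32, 40, 48 in the experiment); that hook is
# not shown here.
```

If the paired conversations differ along many uncorrelated directions, this mean difference washes out, which is exactly the failure mode Q5 is probing.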
## Q6: Positive backward transfer

**Question**: Does training on behavioral patterns (listening, not rushing) improve performance on general tasks (code quality, reasoning)?

**Experiment**: Measure perplexity and code-generation quality before and after behavioral training. If perplexity decreases (or code quality improves), we have positive backward transfer.

**What it tells us**: Whether behavioral training and general capability reinforce each other or compete. This affects how much general data we need in the training mix.
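Since perplexity is exp(mean per-token negative log-likelihood), the comparison reduces to comparing mean NLLs over the same eval set. A minimal sketch with placeholder NLLs:

```python
import math

# Sketch for Q6: perplexity = exp(mean token NLL), so before/after
# comparison reduces to comparing mean NLLs on a fixed eval set.
def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

nll_before = [2.31, 2.05, 2.40, 2.22]  # hypothetical per-token NLLs
nll_after = [2.25, 2.01, 2.33, 2.17]

positive_transfer = perplexity(nll_after) < perplexity(nll_before)
```

The eval set must be identical and held out from both training runs, or the comparison measures data overlap rather than transfer.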
## Q7: GDN state shape after training

**Question**: Does the GDN recurrent state become measurably "direction-shaped" after behavioral training?

**Experiment**: Record GDN states while processing conversations with direction-giving content, before and after training. Compute the cosine similarity between states for "direction" conversations vs. "general" conversations. If training increases the difference, the state is becoming direction-specialized.

**What it tells us**: Whether the "disposition architecture" hypothesis is correct. If GDN states don't change, behavioral training mainly affects the full attention layers.
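The metric itself is a cosine-similarity comparison. A sketch with random placeholder state vectors and a planted post-training shift, so the "specialization" direction of the test is concrete:

```python
import numpy as np

# Sketch of the Q7 metric: cosine similarity between mean GDN states for
# "direction" vs "general" conversations, before and after training.
# State vectors are placeholders with a planted post-training divergence.
def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(2)
d = 32
general = rng.normal(size=d)
direction_before = general + 0.1 * rng.normal(size=d)  # nearly identical
direction_after = general + 2.0 * rng.normal(size=d)   # diverged state

sim_before = cosine(direction_before, general)
sim_after = cosine(direction_after, general)

# Training increased the separation if similarity to "general" dropped.
specialized = sim_after < sim_before
```

In practice the states would be averaged over many conversations per category, since single-conversation states are noisy.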
---

## Priority Order

1. **Q5 (steering vectors)** — answerable TODAY, no training needed
2. **Q1 (minimum signal)** — answerable with first training run
3. **Q3 (which heads change)** — answerable with first training run
4. **Q6 (backward transfer)** — answerable with first training run
5. **Q7 (GDN state)** — answerable with first training run
6. **Q2 (dream difficulty)** — needs the dream loop connected to training
7. **Q4 (graph regression)** — needs multiple training cycles