DOC23_FAMILY_SELF_LEARNING_COHERENCE_MAP_V1.md

Current Specs/DOC23/DOC23_FAMILY_SELF_LEARNING_COHERENCE_MAP_V1.md
Short text page 5b3cba1872e7. Generated 2026-06-09T01:23:58.539Z from commit dbaa25962edc11ab30e8d4ca1715f9ae5bf77331. Worktree: clean.
ELNOR REPO READER TEXT MIRROR
Original path: Current Specs/DOC23/DOC23_FAMILY_SELF_LEARNING_COHERENCE_MAP_V1.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z

---

# DOC23 Family Self-Learning Coherence Map V1

**Document type:** Scoping document (hybrid scope + spec references; not a final spec)

**Scope:** All self-learning and self-improvement loops in the DOC23 family (Addenda A R4.1 V6 + Addenda B Core R0.7 + Outcome Evaluator+Revisor V3.3 + Evaluation Common Contracts V1.1 + the three Addenda B sub-addenda V1). Covers Compiler, Evaluator, Revisor, Judge, Experiment, Claim Extractor, Task Agent, sub-agent dispatch. Does NOT cover system-wide self-diagnosis (Piece 3, separate work).

**Discipline:** Every loop is traced through its full lifecycle: trigger → emitter → schema → envelope → persistence → policy gate → consumer → utility compilation → downstream consumer → action surface. Where any step is unspecified or vague in the current spec set, it is explicitly flagged as a gap rather than assumed.

**Status:** V1 scoping. Once reviewed, becomes input to a proper spec (DOC23 Family Self-Learning Architecture R1) that revises Addenda A and Addenda B coordinately.

**Date:** 2026-05-19

**Source documents consulted:**
- DOC23 Addenda B Core R0.7 (2026-05-17)
- DOC23 Addenda B / Outcome Evaluator+Revisor V3.3 (2026-05-17)
- DOC23 Evaluation Common Contracts V1.1 (2026-05-17)
- DOC23 Addenda A / Task Optimization R4.1 V6 (2026-05-17)
- ELNOR Sub-Agent Architecture V5.2 (audited, 2026-05-19)
- OP-A V3.16 (2026-05-19)

---

## §1 Executive summary

The DOC23 family has a partially-specified self-learning surface. Some loops are complete and operative. Many are partial — emitter is specified but consumer or downstream action is not. A few are entirely missing despite being mentioned as "auto-learning" features. One (goal-advancement learning) is specified but architecturally wrong and produces no useful signal in practice.

**What works (fully traceable):**
- Pattern performance slice updates for the non-goal axes (`usage_count`, `convergence_count`, `failure_count`, `regression_count`, `user_override_count`, `contested_finding_count`, `rollback_count`)
- Sub-agent reputation tracking
- TaintClearanceSignal end-to-end
- HardCallResolutionSignal end-to-end

**What's partial (emitter specified, consumer or action unspecified):**
- PromptComparisonSignal (Experiment emits to BDSM; BDSM consumption rules undefined)
- TaskAgentProposalEditTrace (consumption defined as obligation; schema, diff format, and action path undefined)
- TaskContextFeedbackEvent (R0.6.4-origin; consumer and aggregation undefined)
- RepairCycleSignal (emitter and envelope defined; downstream action surface undefined)
- OutcomeEvaluationSignal (same — Core-owned but downstream undefined)
- TaskInvocationUtility / TaskSuggestionFeedback / TaskAgentDesignUtility (defined as consumption obligations; full lifecycle not specified)

**What's missing entirely:**
- OutcomeCompilerProposalEditTrace (highest-leverage signal in entire pipeline; doesn't exist)
- ClaimExtractorProposalEditTrace (analogous gap)
- JudgeCriteriaEditTrace (analogous gap)
- ExperimentDesignEditTrace (analogous gap)
- TaskAgentChatDuringSetupSignal (chat interactions during task setup; only post-proposal edits captured)
- Outcome cluster emergence in DOC72 (proposed replacement for goal-advancement axis)
- Multi-prior coordination policy in module prompt assembly (DOC15 CIL contract)
- Optional rationale/comment field on accept/reject feedback signals (panel feedback and task suggestion feedback lack it; direct instruction signals partially have it)
- DSPy training data architecture per registered target (R5-reserved but no scoping work yet)

**What's wrong (currently specified but architecturally broken):**
- `goal_advancement_count` axis on PatternPerformanceSlice: it is matter-specific, exposes a sycophancy attack surface, and requires the user to manually wire a comparative-judge evaluator that no UI exposes, so it produces no useful signal in practice. Proposed replacement: outcome cluster axis.

**What this map proposes (high-level):**
1. Replace `goal_advancement_count` axis with `outcome_cluster_id` axis; drive cluster emergence from DOC72 nightly job; pattern effectiveness measured per emergent cluster (not per matter-specific goal)
2. Add five missing edit-trace signals (Outcome Compiler, Claim Extractor, Judge, Experiment, Task Agent chat-during-setup)
3. Standardize optional rationale field on all accept/reject/edit feedback signals (revealed-preference + stated-preference)
4. Specify full lifecycle for the partial loops (consumer rules, utility compilation, action surfaces)
5. Specify multi-prior coordination policy in module prompt assembly
6. Specify DSPy training data per target (one input among several per target; configurable composite metric)
7. Remove comparative-judge requirement from Revisor self-grading; sycophancy fix becomes "no self-graded learning axis at all" rather than "self-graded with manual wiring requirement"

---

## §2 Learning surface inventory

### §2.1 Pattern performance slice axes (the central scorecard)

PatternPerformanceSlice is defined in V3.3 §13.3. Each slice is keyed by `context_signature` and tracks counters for a (pattern, context) pair. The context signature already includes `domain_tags`, `artifact_kind`, `failure_kind`, `risk_class`, `assurance_basis`, `privilege_class`, and (V3.2 addition) `model_class`.

**Axes that auto-increment (working):**
| Counter | Increment trigger | Source spec |
|---|---|---|
| `usage_count` | Pattern applied in a revision | V3.3 §13.3 |
| `convergence_count` | Targeted outcomes transitioned to satisfied | V3.3 §13.3 |
| `failure_count` | Outcomes still failed after pattern application | V3.3 §13.3 |
| `regression_count` | Pattern caused regression in other outcomes | V3.3 §13.3 |
| `user_override_count` | User rejected pattern's suggestion | V3.3 §13.3 |
| `contested_finding_count` | Findings produced were disputed | V3.3 §13.3 |
| `rollback_count` | Rollback applied after pattern use | V3.3 §13.3 |
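
A minimal sketch of how a slice keyed by `context_signature` might hold the seven auto-incrementing counters. The counter names and signature field names come from this section; the field types, the `record` method, and the sample values (`"pattern-001"`, `"deterministic_check"`, and so on) are illustrative assumptions, not part of V3.3 §13.3.

```python
from collections import Counter
from dataclasses import dataclass, field

# Counter names are taken from the table above (V3.3 §13.3).
AUTO_COUNTERS = (
    "usage_count", "convergence_count", "failure_count", "regression_count",
    "user_override_count", "contested_finding_count", "rollback_count",
)


@dataclass(frozen=True)
class ContextSignature:
    # Field names follow §2.1; the concrete types are assumptions.
    domain_tags: tuple[str, ...]
    artifact_kind: str
    failure_kind: str
    risk_class: str
    assurance_basis: str
    privilege_class: str
    model_class: str


@dataclass
class PatternPerformanceSlice:
    pattern_id: str
    context_signature: ContextSignature
    counters: Counter = field(default_factory=Counter)

    def record(self, counter_name: str) -> None:
        """Increment one auto-incrementing counter; reject unknown names."""
        if counter_name not in AUTO_COUNTERS:
            raise ValueError(f"unknown counter: {counter_name}")
        self.counters[counter_name] += 1


# Hypothetical sample values, for illustration only.
sig = ContextSignature(("contracts",), "memo", "missing_citation",
                       "medium", "deterministic_check", "internal", "large")
slice_ = PatternPerformanceSlice("pattern-001", sig)
slice_.record("usage_count")
slice_.record("convergence_count")
```

Rejecting unknown counter names keeps the manually wired goal axes out of the auto-increment path in this sketch.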

**Axes that require manual wiring (broken):**
| Counter | Increment trigger | Status |
|---|---|---|
| `goal_advancement_count` | Requires user to wire `step.evaluator` with `AssuranceBasis: comparative_judge`; no UI for DOC72 goal declaration exists | **PROPOSED FOR REMOVAL** |
| `goal_regression_count` | Same wiring requirement | **PROPOSED FOR REMOVAL** |

**Axis proposed for addition:**
| Field | Assignment trigger | Status |
|---|---|---|
| `outcome_cluster_id` field on context_signature | DOC72 nightly clustering job assigns emergent cluster ID to each OutcomeSpec | **NEW — see §4.1** |

### §2.2 Core-owned signal payloads (Addenda B Core R0.7 §9.0)

Five payloads specified as Core-owned, wrapped in Common Contracts §5.1 `EvaluationLearningSignalEnvelope`:

| Signal | Emitter | Status | Where consumer specified |
|---|---|---|---|
| OutcomeEvaluationSignal | Evaluator (per outcome resolution) | Schema specified; consumer rules **partial** | `OBL-XDOC-BDSM-CONSUME-SIGNALS-01` (consumption obligation only; action surface not specified) |
| RepairCycleSignal | Revisor (per revision cycle) | Schema specified; consumer rules **partial** | Same — obligation level only |
| TaskProcessGapSignal | Runtime (per process gap detection) | Schema specified; consumer rules **partial** | Same |
| TaintClearanceSignal | V3.2 Revisor or user-action surfaces (per clearance event) | **Fully traced** | Core R0.7 §9.0.4; consumer learning patterns defined |
| HardCallResolutionSignal | HardCall resolution flow | **Fully traced** | Core R0.7 §9; consumer rules defined |

### §2.3 Addenda B module-specific signals

| Signal | Emitter | OP-A row | Lifecycle status |
|---|---|---|---|
| TaskAgentProposalEditTrace | Task Agent | `OBL-D8-TASK-AGENT-PROPOSAL-EDIT-TRACE-01` | **Partial** — schema for the diff not specified; what gets diffed (full proposal vs subset) unclear |
| TaskAgentDesignUtility | Task Agent design surface | `OBL-D8-TASK-AGENT-DESIGN-UTILITY-01` | **Partial** — what gets measured per session not specified |
| TaskAgentPanelFeedback | Task Agent panel UI | `OBL-D8-TASK-AGENT-PANEL-FEEDBACK-01` | **Partial** — thumbs/accept/reject specified; optional rationale field NOT specified (see §4.3) |
| TaskInvocationUtility | DOC24 packet machinery | `OBL-D8-TASK-INVOCATION-UTILITY-01` | **Partial** — task_invocation_kind enum specified; outcome metric schema not specified |
| TaskSuggestionFeedback | DOC24 packet (TaskInvocationDirective cards) | `OBL-D8-TASK-SUGGESTION-FEEDBACK-01` | **Partial** — accept/reject/snooze/ignore specified; optional rationale NOT specified (see §4.3) |
| TaskContextFeedbackEvent | DOC24 (per packet inclusion/exclusion) | `OBL-D8-TASK-CONTEXT-FEEDBACK-EVENT-01` (R0.6.4-origin) | **Underspecified** — full event schema and emission triggers not in current spec set |
| ArtifactUtilitySignals | DOC20 user actions on outputs | `OBL-D8-ARTIFACT-UTILITY-SIGNALS-01` (R0.6.4-origin) | **Partial** — action kinds enumerated; signal schema not specified |
| RunForkFollowupUtility | Run fork outcomes | `OBL-D8-RUN-FORK-FOLLOWUP-UTILITY-01` (R0.6.4-origin) | **Partial** — fork outcome trigger specified; correlation with original run unclear |
| TaskSegmentReuseSignals | Task Segment instantiation | `OBL-D8-TASK-SEGMENT-REUSE-SIGNALS-01` (R0.6.4-origin) | **Partial** — reuse trigger specified; duplicate-detection signal mechanism unclear |
| PromptEditEvalSignals | Prompt Advisor service + prompt editor UI + PromptEvaluationTask runs | `OBL-D8-PROMPT-EDIT-EVAL-SIGNALS-01` | **Partial** — three sources mentioned; payload distinction between them unclear |

### §2.4 Addenda A module-specific signals

| Signal | Emitter | Spec ref | Lifecycle status |
|---|---|---|---|
| PromptComparisonSignal | Experiment module (post-completion, ≥1 downstream evaluator produced EvaluationResultEnvelope for ≥1 variant) | Addenda A V6 §A2.7 + V5 R221 + `OBL-XDOC-PROMPT-COMPARISON-SIGNAL-01` | **Partial** — emission triggers fully specified; BDSM consumption rules and action surface NOT specified |
| Evaluation-bundle exports | Judge module output | Addenda A V6 — Judge results land in `EvaluationResultEnvelope` per V3.3 §10 | **Adjacent to learning, not a learning signal** — Judge results are evaluation output, but Judge does not emit a dedicated learning signal of its own |
| Claim Extractor output bundles | Claim Extractor module | Addenda A V6 §A6 | **No learning signal at module level** — extraction quality is observed via downstream Evaluator feedback only |

### §2.5 BDSM consumption obligations

| Obligation | What's specified | What's not |
|---|---|---|
| `OBL-XDOC-BDSM-CONSUME-SIGNALS-01` | BDSM consumes unified signal stream; discriminates by `signal_type`; produces utility bundles consumed by DOC72 Pattern primitive store; threshold-gates pattern surfacing via PatternSurfacingThreshold; emits aggregate `TaskDesignCorrelationSignal` | Per-signal compilation rules (which fields aggregate, what derived metrics); threshold values; correlation signal payload schema |
| `OBL-D8-TASK-SUPPRESSION-BOOST-POLICIES-01` | BDSM compiles suppression/boost policies per context class | Policy output schema; consumption mechanism by DOC24 top-k ranking |
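
The "discriminates by `signal_type`" behavior could be sketched as a dispatch table, which also makes the missing per-signal compilation rules visible. The envelope shape (a dict with `signal_type` and `payload` keys), the `state` payload field, and the placeholder counting rule are assumptions for illustration; the actual compilation rules are the unspecified part.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical envelope shape: {"signal_type": str, "payload": dict}.
Compiler = Callable[[list[dict]], dict]


def compile_outcome_evaluation(payloads: list[dict]) -> dict:
    # Placeholder rule: count resolutions per state. Real rules are unspecified.
    counts: dict[str, int] = defaultdict(int)
    for payload in payloads:
        counts[payload["state"]] += 1
    return dict(counts)


COMPILERS: dict[str, Compiler] = {
    "OutcomeEvaluationSignal": compile_outcome_evaluation,
}


def compile_utility_bundles(stream: list[dict]) -> tuple[dict, list[str]]:
    """Group envelopes by signal_type, compile each group, report unhandled types."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for envelope in stream:
        grouped[envelope["signal_type"]].append(envelope["payload"])
    bundles: dict[str, dict] = {}
    unhandled: list[str] = []
    for signal_type, payloads in grouped.items():
        compiler = COMPILERS.get(signal_type)
        if compiler is None:
            unhandled.append(signal_type)  # a missing compilation rule
        else:
            bundles[signal_type] = compiler(payloads)
    return bundles, unhandled


stream = [
    {"signal_type": "OutcomeEvaluationSignal", "payload": {"state": "satisfied"}},
    {"signal_type": "OutcomeEvaluationSignal", "payload": {"state": "unsatisfied"}},
    {"signal_type": "RepairCycleSignal", "payload": {}},
]
bundles, unhandled = compile_utility_bundles(stream)
```

Returning the unhandled signal types alongside the bundles is one way to keep "consumed in principle" signals from being dropped silently.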

### §2.6 DOC72 consumption obligations

| Obligation | What's specified | What's not |
|---|---|---|
| `OBL-D72-TASKRUN-EXECUTION-TRACE-01` | Task runs stored as execution_trace hubs with eight edge kinds | Storage timing (synchronous on run completion vs nightly batch); query patterns for pattern emergence |
| `OBL-D72-TASK-DESIGN-PATTERN-STORE-01` | Semantic projections of templates/presets stored as searchable cards | Projection generation trigger; refresh policy; emergence into Pattern primitive |
| `OBL-D72-DESIGN-CASEBOOK-01` | Casebook of prior-task rationale linked to design decisions/outcomes | Ingestion trigger; user-notes integration mechanism |
| Pattern primitive emergence (V3.3 §13.5) | Patterns emerge from accumulated CompiledRevisionStrategy successes | Specific emergence threshold; clustering algorithm; cross-validation against context_signature |

### §2.7 DOC24 in-session prior consumption

| Obligation | Status |
|---|---|
| `OBL-D24-BDSM-UTILITY-BUNDLE-CONSUMPTION-01` | DOC24 reads utility bundles for top-k ranking; **bundle schema and reading mechanism not specified in DOC24 R3.1.1+** |
| `OBL-D24-TOPK-INJECTION-01` | Top-k injection mechanics specified; **how bundles drive ranking (algorithm, freshness, fallback) not specified** |
| `OBL-D24-TASK-DESIGN-INTELLIGENCE-CARD-RENDERING-01` | Pattern-derived design hints render in Task Agent design surface; **pattern matching mechanism unspecified** |

Multi-prior coordination — when several priors want to inject at once — is **not specified anywhere in current spec set**. See §4.2 for proposed mechanism.

### §2.8 DSPy target registrations (R5-gated)

Per `OBL-XDOC-PROPA-DSPY-TARGETS-01` (V3.10) + Addenda A V6 §A4 + V5 R225:

| Target ID | Module | Spec ref | Training data sources |
|---|---|---|---|
| `claim_extractor_main` | Addenda A Claim Extractor | Addenda A V6 §A4 | **Not specified** |
| `outcome_evaluator_main` | Addenda B Evaluator | Module System Prompts proposal (deferred) | **Not specified** |
| `revision_compiler_main` | Addenda B Revisor (compile stage) | Module System Prompts proposal | **Not specified** |
| `outcome_compiler_main` | Addenda B Outcome Compiler | Module System Prompts proposal | **Not specified** |

All four targets reserved until R5 substrate ships (`OBL-D23-A-DSPY-GEPA-R5-GATE-01`). No training data architecture exists for any target. See §4.4.

### §2.9 EC Core policy gates

| Obligation | What's specified | Lifecycle status |
|---|---|---|
| `OBL-XDOC-EC-POLICY-SIGNALS-01` | EC Core gates signal envelope persistence by data_class, matter_id firewall, pattern_promotion_eligibility; cost governance per learning_mode | **Specified at obligation level** — concrete gate implementation per signal type not specified |
| `OBL-EC-NO-HIDDEN-GRAPH-RUNS-01` | Visible record requirement for every graph run | Specified; relevant to learning because it bounds what can be invisibly used for learning |

### §2.10 Surface obligations (DOC20 — where users see/edit/correct)

| Obligation | Relevant to which loop |
|---|---|
| `OBL-D20-TASK-AGENT-PANELS-01` | Five Task Agent panels (task editor, Run Inspector, task list, templates/presets, prompt editor) — surfaces for proposal edits, panel feedback, chat interactions |
| `OBL-D20-TASK-OPPORTUNITY-CHIPS-01` | Task suggestion accept/reject surfaces |
| `OBL-D20-ARTIFACT-CONTEXT-MENU-01` | Action surfaces feeding ArtifactUtilitySignals |
| `OBL-D20-TKP-READINESS-DRIFT-01` | TKP drift visibility (related to system-level health, not directly learning) |

**Missing surfaces:**
- Optional comment box on thumbs-up/down feedback (see §4.3)
- Outcome Compiler proposal-vs-accepted diff view + accept/edit with rationale
- Claim Extractor proposal-vs-accepted diff view
- Judge criteria edit interface with diff capture
- Experiment design edit interface with diff capture

---

## §3 Lifecycle traces per loop

This section traces the full lifecycle of each loop. Only loops with significant gaps or proposals get a full trace; loops marked "fully traced" in §2 above are not repeated here.

### §3.1 OutcomeEvaluationSignal (PARTIAL)

| Step | Spec | Status |
|---|---|---|
| 1. Trigger | When Evaluator resolves an outcome (state transition to `satisfied`, `unsatisfied`, or `indeterminate`) | Core R0.7 §9.0; specified |
| 2. Emitter | `step.evaluator` runtime | V3.3 §5; specified |
| 3. Payload schema | `OutcomeEvaluationSignal` | Core R0.7 §9.0; specified |
| 4. Envelope | `EvaluationLearningSignalEnvelope` (Common Contracts §5.1) | Common Contracts V1.1 §5.1; specified |
| 5. Persistence | Via EC; `data_class` enforced per envelope | `OBL-XDOC-EC-POLICY-SIGNALS-01`; specified |
| 6. EC policy gate | data_class / matter_id firewall / pattern_promotion_eligible | `OBL-XDOC-EC-POLICY-SIGNALS-01`; specified |
| 7. Consumer | BDSM via `OBL-XDOC-BDSM-CONSUME-SIGNALS-01` | Specified as obligation |
| 8. Utility compilation | **NOT SPECIFIED** — what BDSM computes from accumulated OutcomeEvaluationSignal stream | **GAP** |
| 9. Downstream consumer | DOC72 Pattern primitive store via threshold-gated surfacing | `OBL-XDOC-BDSM-CONSUME-SIGNALS-01`; obligation only |
| 10. Action surface | **NOT SPECIFIED** — how aggregated outcome-evaluation patterns affect user-visible behavior | **GAP** |

**Gap summary:** Steps 8 and 10 unspecified. Signal is emitted, persisted, consumed (in principle), but what BDSM computes and where the compiled output drives user-visible behavior are undefined.

**Spec changes needed:**
- Specify utility compilation rules per signal type (which fields aggregate; what derived metrics emerge)
- Specify action surfaces per compiled utility (Pattern primitive emergence threshold; Pattern slice updates; DOC24 in-session priors)
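
The ten-step trace discipline can be represented as data, so gaps are derived rather than asserted. The sketch below transcribes the statuses from the §3.1 table; the `Status` enum and `gap_steps` helper are hypothetical names introduced for illustration.

```python
from enum import Enum


class Status(Enum):
    SPECIFIED = "specified"
    OBLIGATION_ONLY = "obligation only"
    GAP = "gap"


# The ten lifecycle steps named in the Discipline statement.
STEPS = ("trigger", "emitter", "schema", "envelope", "persistence",
         "policy gate", "consumer", "utility compilation",
         "downstream consumer", "action surface")

# Statuses transcribed from the OutcomeEvaluationSignal table in §3.1.
outcome_evaluation_trace = dict(zip(STEPS, [
    Status.SPECIFIED, Status.SPECIFIED, Status.SPECIFIED, Status.SPECIFIED,
    Status.SPECIFIED, Status.SPECIFIED, Status.OBLIGATION_ONLY, Status.GAP,
    Status.OBLIGATION_ONLY, Status.GAP,
]))


def gap_steps(trace: dict[str, Status]) -> list[int]:
    """Return the 1-based step numbers whose status is GAP."""
    return [i for i, step in enumerate(STEPS, start=1)
            if trace[step] is Status.GAP]


print(gap_steps(outcome_evaluation_trace))  # [8, 10]
```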

### §3.2 RepairCycleSignal (PARTIAL)

Same lifecycle gaps as OutcomeEvaluationSignal. Steps 1-2: triggered once per revision cycle and emitted by the Revisor. Steps 8 and 10 unspecified.

**Specific to this signal:** payload includes `taint_evolution` and `qualitative_delta` per Core R0.7 §9.0 (V3.2 expansion). What BDSM does with taint_evolution patterns specifically (cross-cycle taint propagation learning) is not specified — this is potentially high-leverage because it can drive privilege firewall tuning.

### §3.3 PromptComparisonSignal (PARTIAL — Addenda A side)

| Step | Spec | Status |
|---|---|---|
| 1. Trigger | Experiment activation completes AND ≥1 downstream evaluator produced EvaluationResultEnvelope for ≥1 variant | Addenda A V6 §A2.7; specified |
| 2. Emitter | `utility.experiment` | Addenda A V6 §A2.7; specified |
| 3. Payload schema | `PromptComparisonSignal` with `task_design_signature` when applicable | Addenda A V6 §A2.7; specified |
| 4. Envelope | `EvaluationLearningSignalEnvelope` | Addenda A V6 §A2.7; specified |
| 5. Persistence | Via EC | Implied by envelope; not explicitly traced |
| 6. EC policy gate | data_class | Implied by envelope; not explicitly traced |
| 7. Consumer | DOC8/BDSM | `OBL-XDOC-PROMPT-COMPARISON-SIGNAL-01`; obligation level |
| 8. Utility compilation | **NOT SPECIFIED** — what BDSM computes from prompt-comparison patterns | **GAP** |
| 9. Downstream consumer | Presumably DOC72 Pattern primitive store for prompt-effectiveness patterns; presumably DSPy training data when R5 ships | **GAP** — not explicitly specified |
| 10. Action surface | Eventually DSPy retraining; meanwhile no current-day action surface | **GAP for current-day; specified-by-implication for R5** |

**Gap summary:** Once R5 ships, this signal becomes one of the most important training data sources. Until R5 ships, the signal accumulates without consumer-side action. The spec doesn't say what (if anything) happens to accumulated PromptComparisonSignal data pre-R5 — does it just sit in EC, awaiting R5? Is there an interim consumer (e.g., quality dashboards)?

**Spec changes needed:**
- Specify pre-R5 utility (quality dashboards? Pattern primitive ingestion? cold storage?)
- Specify post-R5 DSPy training-data extraction from PromptComparisonSignal accumulation

### §3.4 TaskAgentProposalEditTrace (PARTIAL)

| Step | Spec | Status |
|---|---|---|
| 1. Trigger | User accepts/edits a Task Agent proposal (the moment of acceptance is the diff capture point) | Implied by obligation; not explicitly specified |
| 2. Emitter | Task Agent | `OBL-D8-TASK-AGENT-PROPOSAL-EDIT-TRACE-01`; specified |
| 3. Payload schema | "Diff between proposed task design and accepted/edited version" | **GAP** — no formal schema; what exactly gets diffed (full proposal, structural fields only, etc.) is unspecified |
| 4. Envelope | Presumably EvaluationLearningSignalEnvelope but not explicit | **GAP** |
| 5. Persistence | Via EC | Implied; not explicit |
| 6. EC policy gate | data_class (proposals may contain matter-sensitive content) | Implied |
| 7. Consumer | BDSM | Specified |
| 8. Utility compilation | "Aggregation per proposal element; signals feed DSPy training data when Task Agent prompt is a DSPy target" | Specified at high level; specific aggregation rules unspecified |
| 9. Downstream consumer | DSPy training (R5-gated); Pattern emergence; DOC24 in-session priors | **GAP** — last two not specified |
| 10. Action surface | Eventually improved Task Agent proposals via DSPy; meanwhile no current-day action surface | **GAP for current-day** |

**Gap summary:** Schema for the diff is the biggest gap. "Diff between proposed and accepted" is ambiguous: text-level diff, structural-field diff, semantic diff, all of the above? Different choices have different signal value. Also: should the user be prompted for an optional rationale when editing the proposal (stated-preference signal in addition to revealed-preference)? Currently no.

**Spec changes needed:**
- Specify diff schema: structural-field diffs (changed AssuranceBasis values, added/removed lanes, changed module configs) + optional text-level diffs for free-text fields
- Add optional rationale field on user acceptance (see §4.3)
- Specify Pattern emergence rules (when does an accumulated edit-trace pattern become a candidate pattern?)
- Specify DOC24 in-session prior consumption (when recent edit-trace clusters bias Task Agent's next proposal)
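
One possible reading of "structural-field diff" is a field-level comparison of the proposed and accepted designs. This is a sketch under the assumption that a task design can be flattened to a dict; the field names (`lanes`, `max_cycles`, `reviewer`) and the `self_check` value are hypothetical.

```python
from typing import Any


def structural_diff(proposed: dict[str, Any],
                    accepted: dict[str, Any]) -> dict[str, dict]:
    """Field-level diff of two flat task-design dicts."""
    diff: dict[str, dict] = {"added": {}, "removed": {}, "changed": {}}
    for key in proposed.keys() | accepted.keys():
        if key not in proposed:
            diff["added"][key] = accepted[key]
        elif key not in accepted:
            diff["removed"][key] = proposed[key]
        elif proposed[key] != accepted[key]:
            diff["changed"][key] = {"from": proposed[key], "to": accepted[key]}
    return diff


# Hypothetical designs: the user changed the assurance basis, dropped one
# field, and added another; the lanes were left untouched.
proposed = {"assurance_basis": "self_check",
            "lanes": ["draft", "review"], "max_cycles": 3}
accepted = {"assurance_basis": "comparative_judge",
            "lanes": ["draft", "review"], "reviewer": "human"}
diff = structural_diff(proposed, accepted)
```

Unchanged fields are omitted from the diff, so aggregation per proposal element only sees what the user actually touched. Text-level diffs for free-text fields would need a separate mechanism.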

### §3.5 TaskAgentPanelFeedback (PARTIAL)

| Step | Spec | Status |
|---|---|---|
| 1. Trigger | User clicks thumbs-up/thumbs-down on a Task Agent response, accepts/rejects suggestion, etc. | `OBL-D8-TASK-AGENT-PANEL-FEEDBACK-01`; specified at trigger level |
| 2. Emitter | DOC20 Task Agent panel UI | Specified |
| 3. Payload schema | "thumbs up/down, accept/reject, success/failure, module-followup utility" | **GAP** — no formal schema; **no optional rationale field** |
| 4. Envelope | Presumably EvaluationLearningSignalEnvelope | Implied; not explicit |
| 5. Persistence | Via EC | Implied |
| 6. EC policy gate | data_class | Implied |
| 7. Consumer | BDSM | Specified |
| 8. Utility compilation | Aggregation per Task Agent entrypoint | Specified at high level |
| 9. Downstream consumer | Task Agent prompt optimization (R5 DSPy); panel quality dashboards | Implied |
| 10. Action surface | Eventually improved Task Agent responses via DSPy | Implied |

**Gap summary:** Will's explicit observation: there is no optional rationale/comment field. The revealed-preference signal (the user did or didn't accept) is roughly one bit per interaction. A stated-preference signal with a reason ("wrong tone," "missed key step," "irrelevant to my case type") carries far more information per interaction, because it says what to change rather than only that something was wrong. The spec implicitly allows free-text via standard UI patterns but doesn't formalize it as a payload field.

**Spec changes needed:**
- Add optional `rationale: string` field to feedback payload schema
- Add optional `correction_hint: string` field (different from rationale — what should have happened instead)
- Specify UI surface: comment box appears on negative feedback by default, optional on positive
- Specify utility compilation: BDSM treats rationale field as semantic clustering input (cluster similar rationales over time → emerges as candidate pattern for "users frequently complain about X")
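
The proposed payload additions might look like the following. Only `rationale` and `correction_hint` come from the list above; the `verdict` field, its string values, and both helper functions are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PanelFeedback:
    verdict: str                           # e.g. "thumbs_up", "thumbs_down", "accept", "reject"
    rationale: Optional[str] = None        # why the user reacted this way
    correction_hint: Optional[str] = None  # what should have happened instead


NEGATIVE_VERDICTS = {"thumbs_down", "reject"}


def show_comment_box_by_default(verdict: str) -> bool:
    """Comment box opens on negative feedback; it stays optional on positive."""
    return verdict in NEGATIVE_VERDICTS


def rationales_for_clustering(events: list[PanelFeedback]) -> list[str]:
    """Collect non-empty rationales as input for semantic clustering."""
    return [e.rationale.strip() for e in events
            if e.rationale and e.rationale.strip()]


events = [
    PanelFeedback("thumbs_down", rationale="wrong tone"),
    PanelFeedback("thumbs_up"),
    PanelFeedback("reject", rationale="  ", correction_hint="use the short template"),
]
texts = rationales_for_clustering(events)
```

Keeping both fields optional preserves the one-click path: feedback without a comment is still a valid revealed-preference signal.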

### §3.6 TaskSuggestionFeedback (PARTIAL — same rationale gap)

Same gap as Panel Feedback. Accept/reject/snooze/ignore on TaskInvocationDirective cards is revealed-preference only. No optional rationale.

**Spec changes needed:** Same fix as Panel Feedback. Plus a specific consideration — for `snooze`, the rationale is implicitly "not now"; for `ignore`, it's "not interested"; for `reject`, the user may want to say why. Distinguish optional-rationale per action kind.
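
The per-action distinction could be sketched as a small policy: implicit rationales for `snooze` and `ignore`, an optional free-text rationale otherwise. The enum and function names are hypothetical, and letting the implicit rationale take precedence over typed text is an assumption of this sketch.

```python
from enum import Enum
from typing import Optional


class SuggestionAction(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    SNOOZE = "snooze"
    IGNORE = "ignore"


# Implicit rationales as described in the text above.
IMPLICIT_RATIONALE = {
    SuggestionAction.SNOOZE: "not now",
    SuggestionAction.IGNORE: "not interested",
}


def resolve_rationale(action: SuggestionAction,
                      user_text: Optional[str] = None) -> Optional[str]:
    """Return the implicit rationale if the action has one, else the user's text."""
    if action in IMPLICIT_RATIONALE:
        return IMPLICIT_RATIONALE[action]
    text = (user_text or "").strip()
    return text or None
```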

### §3.7 TaskContextFeedbackEvent (UNDERSPECIFIED — R0.6.4-origin)

| Step | Spec | Status |
|---|---|---|
| 1. Trigger | DOC24 packet inclusion/exclusion event | Specified at obligation level |
| 2. Emitter | DOC24 | Specified |
| 3. Payload schema | "What was injected vs. excluded and the user's downstream behavior (used / ignored / promoted / suppressed)" | **GAP** — full schema not in current spec set |
| 4-10 | All inherit gaps from underspecified schema | **GAP** |

**Gap summary:** This signal is potentially high-value (it directly measures DOC24 packet assembly quality), but its R0.6.4 lineage means the schema details exist only in R0.6.4, which has been superseded. The obligation was carried forward to V3.14 but the schema was not re-derived.

**Spec changes needed:**
- Re-derive full schema in the new Self-Learning Architecture R1 spec
- Specify "downstream behavior" measurement: how does the system know whether injected content was "used"? Token-level attention tracking is impractical; need a behavioral proxy (e.g., did the module output reference the injected content; did the user accept the module output without editing the parts that relied on the injection)
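
As one illustration of a behavioral proxy for "did the module output reference the injected content", the sketch below uses plain token overlap. This is a deliberately crude stand-in, not a proposed mechanism: the function names and the 0.5 threshold are assumptions, and lexical overlap will miss paraphrased use.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def reference_overlap(injected: str, module_output: str) -> float:
    """Fraction of the injected snippet's tokens that reappear in the output."""
    injected_tokens = _tokens(injected)
    if not injected_tokens:
        return 0.0
    return len(injected_tokens & _tokens(module_output)) / len(injected_tokens)


def classify_usage(injected: str, module_output: str,
                   threshold: float = 0.5) -> str:
    """Label an injection 'used' or 'ignored' by overlap with the output."""
    if reference_overlap(injected, module_output) >= threshold:
        return "used"
    return "ignored"


injected = "limitation period expires 2024"
used = classify_usage(injected, "The limitation period expires in 2024, so filing is urgent.")
ignored = classify_usage(injected, "Summary of witness statements.")
```

The second proxy named above (the user accepting output without editing the parts that relied on the injection) would need edit-trace data and is not covered by this sketch.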

### §3.8 ArtifactUtilitySignals (PARTIAL — R0.6.4-origin)

| Step | Spec | Status |
|---|---|---|
| 1. Trigger | User action on module output (open, save, promote to artifact, share, delete) | Specified |
| 2-4 | Schema implied by action_kind enum; envelope not explicit | **GAP** |
| 5-7 | Implied | **GAP** |
| 8. Utility compilation | Per module/artifact kind | Specified at high level |
| 9. Downstream consumer | Artifact-promotion learning | Implied |
| 10. Action surface | Pattern primitive emergence (which module configs produce kept artifacts vs deleted ones); DOC24 in-session priors (boost configs that produce kept outputs) | **GAP** |

**Spec changes needed:** Same pattern — full schema and action surface need specification.

### §3.9 RunForkFollowupUtility (PARTIAL — R0.6.4-origin)

| Step | Spec | Status |
|---|---|---|
| 1. Trigger | Fork closure (fork run terminates) | Specified |
| 2-4 | Schema partially implied | **GAP** |
| 5-7 | Implied | **GAP** |
| 8. Utility compilation | "Aggregated per fork_kind and per original-failure-type" | Specified at high level |
| 9. Downstream consumer | "Repair-pattern training data" | Specified at high level |
| 10. Action surface | **NOT SPECIFIED** — where compiled repair patterns get used | **GAP** |

**Specific potential surface:** Revisor's pattern retrieval at compile time could rank patterns by fork-success-history (patterns associated with successful forks rank higher when the current revision context resembles a known failure-then-fork pattern). Not currently specified.
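
Ranking by fork-success history could be sketched with a smoothed success rate, so that patterns with very few forks are not over-ranked. The smoothing scheme, the `prior_weight` parameter, and the pattern names are assumptions for illustration; the source does not specify a ranking formula.

```python
def fork_success_score(successes: int, attempts: int,
                       prior_weight: float = 2.0) -> float:
    """Smoothed success rate; a pattern with no fork history scores 0.5."""
    return (successes + prior_weight / 2) / (attempts + prior_weight)


def rank_patterns(history: dict[str, tuple[int, int]]) -> list[str]:
    """Order pattern ids by smoothed fork-success score, best first."""
    return sorted(history,
                  key=lambda pattern_id: fork_success_score(*history[pattern_id]),
                  reverse=True)


# Hypothetical history: pattern id -> (successful forks, total forks).
history = {"pattern-a": (8, 10), "pattern-b": (1, 1), "pattern-c": (0, 0)}
ranking = rank_patterns(history)
```

With this smoothing, `pattern-a` (8 of 10) outranks `pattern-b` (1 of 1) even though the latter has a perfect raw rate, because a single success is weak evidence.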

### §3.10 TaskSegmentReuseSignals (PARTIAL — R0.6.4-origin)

Same pattern. Specific gap: duplicate-detection signal mechanism. The obligation mentions "duplicate/near-duplicate segment detection signals" but doesn't specify how detection works (embedding similarity threshold? Hash-based? Structural?).

**Spec changes needed:** Specify duplicate-detection mechanism; specify Pattern primitive consumption.

### §3.11 PromptEditEvalSignals (PARTIAL)

Three sources mentioned in `OBL-D8-PROMPT-EDIT-EVAL-SIGNALS-01`:
1. Prompt Advisor service edits
2. Prompt editor UI direct edits
3. PromptEvaluationTask runs

Payload distinction between these three is not specified. They likely warrant separate signal types because the source semantics differ:
- Prompt Advisor edits are *suggested* edits (revealed acceptance of suggestion)
- Prompt editor UI edits are *direct user authoring* (no suggestion baseline)
- PromptEvaluationTask runs are *systematic evaluation outcomes* (test results)

**Spec changes needed:** Split into three distinct signal types with clear payload schemas. Specify utility compilation differently for each (revealed-acceptance pattern for Prompt Advisor; authoring pattern for direct UI; evaluation outcome pattern for PromptEvaluationTask).
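
The proposed split might be sketched as three payload types with a routing function that picks the compilation style for each. All class names, field names, and route labels are hypothetical; only the three-way distinction comes from the text above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class PromptAdvisorEditSignal:
    suggested_prompt: str
    final_prompt: str

    @property
    def accepted_verbatim(self) -> bool:
        return self.suggested_prompt == self.final_prompt


@dataclass
class PromptDirectEditSignal:
    previous_prompt: str
    new_prompt: str


@dataclass
class PromptEvaluationRunSignal:
    prompt_ref: str
    passed: int
    total: int


PromptSignal = Union[PromptAdvisorEditSignal, PromptDirectEditSignal,
                     PromptEvaluationRunSignal]


def compilation_route(signal: PromptSignal) -> str:
    """Pick the utility-compilation style that matches the signal's source."""
    if isinstance(signal, PromptAdvisorEditSignal):
        return "revealed_acceptance"
    if isinstance(signal, PromptDirectEditSignal):
        return "authoring"
    if isinstance(signal, PromptEvaluationRunSignal):
        return "evaluation_outcome"
    raise TypeError(f"unsupported signal: {type(signal).__name__}")
```

Separate types make the missing suggestion baseline explicit: a direct edit has no `suggested_prompt` to compare against, so acceptance-rate metrics cannot be computed for it by mistake.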

### §3.12 OutcomeCompilerProposalEditTrace (MISSING — proposed)

This is the most critical missing signal in the entire DOC23 family learning surface. Full proposed lifecycle:

| Step | Specification |
|---|---|
| 1. Trigger | User accepts or edits a `CompiledEvaluationPlan` proposal from the Outcome Compiler. Trigger fires at acceptance (or at any save point in iterative editing). |
| 2. Emitter | Outcome Compiler (`step.outcome_compiler`) |
| 3. Payload schema | `OutcomeCompilerProposalEditTrace` containing: `proposed_plan_ref: StorageRef`, `accepted_plan_ref: StorageRef`, `structural_diff: PlanDiff`, `optional_rationale: string?`, `optional_correction_hint: string?`, `context_signature: ContextSignature` (domain, artifact kind, etc.) |
| 4. Envelope | `EvaluationLearningSignalEnvelope` |
| 5. Persistence | Via EC; data_class = "internal" by default (proposals contain plan structure but not necessarily matter content); user-rationale may bump to "privileged" if it references matter facts |
| 6. EC policy gate | Standard signal envelope governance per `OBL-XDOC-EC-POLICY-SIGNALS-01` |
| 7. Consumer | BDSM via standard consumption obligation |
| 8. Utility compilation | BDSM aggregates structural-diff patterns per context_signature: which AssuranceBasis values get edited toward what; which lane compositions get added/removed; which source binding patterns get changed. Rationale field, when present, gets semantic-clustered to surface frequent reasons. |
| 9. Downstream consumer | (a) DOC72 Pattern primitive store — frequent structural edits become candidate patterns; (b) DOC24 in-session priors — recent edit-trace clusters bias Outcome Compiler's next proposal; (c) DSPy training data when R5 ships — Outcome Compiler prompt optimized against accumulated edit traces |
| 10. Action surface | (a) Outcome Compiler proposals get measurably closer to what user would accept; (b) Compiler quality dashboards show edit rate per context signature, surface degradation; (c) Pattern primitive ranking includes "users typically edit toward X in this context" hints |

**Why critical:** The Outcome Compiler does the hardest interpretive work in the pipeline, so every captured Compiler edit is a gold-standard training signal.
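
The step 3 payload and the step 5 `data_class` rule can be sketched together. Field names follow the proposed schema; modelling `StorageRef`, `PlanDiff`, and `ContextSignature` as plain strings and dicts is an assumption, and so is passing "rationale references matter facts" as a caller-supplied flag, since the source does not say how that is detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutcomeCompilerProposalEditTrace:
    proposed_plan_ref: str   # StorageRef, modelled here as an opaque string
    accepted_plan_ref: str   # StorageRef
    structural_diff: dict    # PlanDiff; its shape is not defined in the source
    context_signature: dict  # ContextSignature (domain, artifact kind, etc.)
    optional_rationale: Optional[str] = None
    optional_correction_hint: Optional[str] = None


def data_class_for(trace: OutcomeCompilerProposalEditTrace,
                   rationale_references_matter_facts: bool) -> str:
    """Default to "internal"; bump to "privileged" for matter-referencing rationale."""
    if trace.optional_rationale and rationale_references_matter_facts:
        return "privileged"
    return "internal"


# Hypothetical refs and diff content, for illustration only.
trace = OutcomeCompilerProposalEditTrace(
    proposed_plan_ref="plan/proposed-1",
    accepted_plan_ref="plan/accepted-1",
    structural_diff={"changed": {"assurance_basis": "comparative_judge"}},
    context_signature={"artifact_kind": "memo"},
    optional_rationale="needs a second reviewer for this client",
)
```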

### §3.13 ClaimExtractorProposalEditTrace (MISSING — proposed)

Same lifecycle pattern as Outcome Compiler edit trace, but for `step.claim_extractor`. Captures user edits to extracted claim sets before they flow to the Evaluator's `claims_in` port. Critical for extraction prompt improvement (R5 DSPy target).

### §3.14 JudgeCriteriaEditTrace (MISSING — proposed)

Same pattern for `step.judge`. Captures user edits to Judge criteria/rubrics before judgment runs. Different from PromptComparisonSignal (which compares variants); this captures the design-time edits the user makes to a Judge config.

### §3.15 ExperimentDesignEditTrace (MISSING — proposed)

Same pattern for `utility.experiment`. Captures user edits to Experiment configuration (variant selection, scoring rubric, dispatch rules) before Experiment runs. Different from PromptComparisonSignal (which is per-completed-experiment output); this is design-time learning signal.

### §3.16 TaskAgentChatDuringSetupSignal (MISSING — proposed)

`OBL-D8-TASK-AGENT-PROPOSAL-EDIT-TRACE-01` captures the diff between final proposal and user-accepted version. **But the chat conversation during task setup itself contains the highest-density user-correction signal** — the user iteratively refining what they want before the Task Agent produces its final proposal.

Proposed signal captures structured events from the chat-during-setup phase:
- User clarifications (when the user explains something the Task Agent misunderstood)
- User rejections of intermediate proposals (before final acceptance)
- User satisfaction signals (when the user confirms the Task Agent got something right)
- Topic shifts (when the user redirects the conversation)

Captured as discrete events with optional semantic clustering, not as a continuous transcript log. Schema needs design — particularly distinguishing actionable correction signal from incidental chat.

**This is non-trivial:** unlike edit traces (clean before/after diffs), chat signals are noisy and require interpretation. Recommendation: the signal carries the chat-segment ref plus a Task-Agent-assigned "correction" tag, applied when the Task Agent's subsequent proposal changed in response to that segment; BDSM aggregates correction-tagged segments only, not full chat content.
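
The recommendation above can be sketched as a filter over discrete setup-phase events. This is only an illustration of the idea: the schema is explicitly undesigned, so `SetupChatEvent`, `proposal_changed_after`, and the event-kind names are hypothetical.

```typescript
// Hypothetical event shape for TaskAgentChatDuringSetupSignal. The schema
// still needs design; these names are placeholders.
type SetupChatEventKind =
  | "clarification"
  | "intermediate_rejection"
  | "satisfaction"
  | "topic_shift";

interface SetupChatEvent {
  kind: SetupChatEventKind;
  chat_segment_ref: string;        // reference to the segment, not its content
  proposal_changed_after: boolean; // did the Task Agent's next proposal change?
}

interface CorrectionTaggedSegment {
  chat_segment_ref: string;
  kind: SetupChatEventKind;
}

// Keeps only segments that qualify as corrections: a clarification or an
// intermediate rejection that was followed by a changed proposal.
// Satisfaction and topic-shift events are never treated as corrections here.
function correctionTaggedSegments(events: SetupChatEvent[]): CorrectionTaggedSegment[] {
  return events
    .filter((e) =>
      (e.kind === "clarification" || e.kind === "intermediate_rejection") &&
      e.proposal_changed_after)
    .map((e) => ({ chat_segment_ref: e.chat_segment_ref, kind: e.kind }));
}

const setupEvents: SetupChatEvent[] = [
  { kind: "clarification", chat_segment_ref: "seg-1", proposal_changed_after: true },
  { kind: "clarification", chat_segment_ref: "seg-2", proposal_changed_after: false },
  { kind: "satisfaction", chat_segment_ref: "seg-3", proposal_changed_after: true },
  { kind: "intermediate_rejection", chat_segment_ref: "seg-4", proposal_changed_after: true },
];
const corrections = correctionTaggedSegments(setupEvents);
```

Only `seg-1` and `seg-4` survive the filter, so BDSM would aggregate two segment refs and never see the chat text itself.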

### §3.17 Sub-agent reputation (FULLY TRACED)

Specified in V3.3 §15.8 + V5.2 §1.5/§2.x. Reputation per dispatch outcome (accepted/rejected/deferred). Reads back to discovery query ranking. Working loop.

### §3.18 Module quality metrics (FULLY TRACED)

V3.3 §15.1-§15.7 specifies metrics. These are dashboard signals, not learning signals per se — they don't drive automated behavior change, they drive operator visibility into degradation. Working as specified.

---

## §4 Cross-module mechanisms

### §4.1 Outcome cluster emergence (NEW MECHANISM)

**Purpose:** Replace the broken `goal_advancement_count` axis with emergent outcome categorization. Patterns get scored against emergent clusters of similar outcomes rather than against matter-specific user-declared goals.

**Owner:** DOC72 (symmetric to existing Pattern primitive emergence in DOC72 §13.5)

**Lifecycle:**

| Step | Specification |
|---|---|
| 1. Trigger | Outcome Compiler produces a `CompiledEvaluationPlan` containing `OutcomeSpec[]`; spec persisted via EC |
| 2. Embedding | DOC72 embeds each OutcomeSpec's structured content (not natural-language description; the structured schema fields: AssuranceBasis, lane composition, source binding pattern, criteria type) |
| 3. Storage | Embeddings stored alongside OutcomeSpec refs in DOC72 |
| 4. Clustering | Nightly job runs density-based clustering (HDBSCAN or similar) over recent OutcomeSpec embeddings; produces cluster IDs |
| 5. Cluster promotion | Stable clusters (re-discovered across re-runs, member count ≥ threshold) get promoted to durable clusters with stable IDs |
| 6. Cluster metadata | Each cluster stores: centroid, member refs, auto-generated label derived from centroid, structural distributions (AssuranceBasis distribution, lane composition distribution, source binding distribution), performance aggregates (which patterns succeed against this cluster) |
| 7. Assignment to new outcomes | When a new OutcomeSpec is compiled, DOC72 assigns its nearest-cluster ID (or seeds a new candidate cluster if no existing cluster fits within similarity threshold) |
| 8. PatternPerformanceSlice integration | `context_signature` adds `outcome_cluster_id` field; pattern slices accumulate per cluster |
| 9. Module prompt consumption | Outcome Compiler's prompt receives cluster context as a slot: "similar outcomes I've handled clustered as c-2891; dominant patterns in that cluster: [structural features]; 3 examples." Compiler reads as context, generates fresh for current task |

**Key design point:** clusters store *distributions over structural features*, not templates. Compiler is informed but not bound. Nuance is preserved because the Compiler still generates fresh.
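
Step 7 of the lifecycle (assign to the nearest cluster, or seed a new candidate) could look roughly like the sketch below. It assumes cosine similarity against stored centroids, which the spec does not prescribe; `assignToCluster`, `DurableCluster`, the `0.8` threshold, and the ID `c-17` are hypothetical.

```typescript
interface DurableCluster {
  cluster_id: string;
  centroid: number[];
}

type Assignment =
  | { kind: "assigned"; cluster_id: string; similarity: number }
  | { kind: "seed_candidate" }; // no cluster is similar enough

// Cosine similarity; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Picks the most similar cluster; falls back to seeding a candidate cluster
// when the best similarity is below the admission threshold.
function assignToCluster(
  embedding: number[],
  clusters: DurableCluster[],
  minSimilarity: number
): Assignment {
  let best: { cluster_id: string; similarity: number } | null = null;
  for (const c of clusters) {
    const s = cosineSimilarity(embedding, c.centroid);
    if (best === null || s > best.similarity) {
      best = { cluster_id: c.cluster_id, similarity: s };
    }
  }
  if (best !== null && best.similarity >= minSimilarity) {
    return { kind: "assigned", cluster_id: best.cluster_id, similarity: best.similarity };
  }
  return { kind: "seed_candidate" };
}

const durableClusters: DurableCluster[] = [
  { cluster_id: "c-2891", centroid: [0.9, 0.1] },
  { cluster_id: "c-17", centroid: [0.0, 1.0] },
];
const nearAssignment = assignToCluster([1, 0], durableClusters, 0.8);
const farAssignment = assignToCluster([-1, 0], durableClusters, 0.8);
```

The threshold is the single knob behind the first open design question below: raising it seeds more candidate clusters, lowering it folds more outcomes into existing ones.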

**Open design questions:**
- Similarity threshold for cluster admission (false-positive cluster assignment vs false-negative new-cluster seeding)
- Cluster ID stability across re-runs (HDBSCAN doesn't natively guarantee stability; need ID-preservation mechanism)
- Auto-labeling quality (centroid-to-label generation needs to produce human-readable labels, probably via short LLM call)
- Cluster decay (old clusters that stop accumulating new members — retire or archive?)
- Cross-matter cluster firewall (do clusters cross matter boundaries? If yes, privilege firewall; if no, sparse data problem)
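
For the cluster ID stability question, one possible ID-preservation mechanism is centroid-distance-based inheritance: after each re-run, a new cluster inherits the ID of the closest previous centroid if it is within some distance. The sketch below uses greedy closest-pair matching, which is an assumption and not a decided design; `inheritClusterIds`, `maxDistance`, and `mintId` are hypothetical names.

```typescript
interface PreviousCluster {
  cluster_id: string;
  centroid: number[];
}

function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  }
  return Math.sqrt(sum);
}

// Returns one ID per new centroid, in input order. Each previous ID is
// inherited at most once; unmatched new clusters receive a freshly minted ID.
function inheritClusterIds(
  previous: PreviousCluster[],
  newCentroids: number[][],
  maxDistance: number,
  mintId: () => string
): string[] {
  // Collect every (new, previous) pair within range, then match closest first.
  const pairs: { n: number; p: number; d: number }[] = [];
  newCentroids.forEach((c, n) => {
    previous.forEach((prev, p) => {
      const d = euclidean(c, prev.centroid);
      if (d <= maxDistance) pairs.push({ n, p, d });
    });
  });
  pairs.sort((x, y) => x.d - y.d);

  const ids: (string | null)[] = newCentroids.map(() => null);
  const usedPrevious: Record<number, boolean> = {};
  for (const pair of pairs) {
    if (ids[pair.n] === null && !usedPrevious[pair.p]) {
      ids[pair.n] = previous[pair.p].cluster_id;
      usedPrevious[pair.p] = true;
    }
  }
  return ids.map((id) => (id === null ? mintId() : id));
}

let mintedCount = 0;
const stableIds = inheritClusterIds(
  [
    { cluster_id: "c-1", centroid: [0, 0] },
    { cluster_id: "c-2", centroid: [10, 10] },
  ],
  [[10.2, 9.9], [0.1, 0], [50, 50]],
  1.0,
  () => "c-new-" + (++mintedCount)
);
```

In this example the first two new clusters inherit `c-2` and `c-1` and the third gets a minted ID. Greedy matching is simple but not globally optimal; an assignment solver would be the heavier alternative if clusters drift close to one another.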

**Dependencies:** Requires DOC72 schema extension; requires `context_signature` schema extension on PatternPerformanceSlice; requires Outcome Compiler prompt template extension (Module System Prompts proposal); requires DOC24 packet assembly extension (slot for cluster context).

### §4.2 Multi-prior coordination policy (NEW)

**Purpose:** When multiple priors want to inject into a module's prompt at activation time, specify priority, conflict resolution, and budget allocation.

**Owner:** DOC15 CIL (prompt assembly layer) with policy stored in `RevisorConfig` and per-module overrides.

**Prior taxonomy:**

| Prior kind | Source | Example |
|---|---|---|
| User-stated preference | RevisorConfig, user profile | "I always use comparative_judge for argumentation outcomes" |
| Recent revealed preference | BDSM utility bundles (recent edit traces) | "User changed AssuranceBasis to X 3 times this week" |
| Similar-task memory | DOC72 (cluster or pattern match) | "Similar outcomes clustered as c-2891" |
| Capability availability | DOC24 capability registry | "These specialists are available" |
| Scope/policy context | EC Core | "This is a privileged matter; certain content excluded" |
| Module-specific defaults | RevisorConfig per-module | "Default verbosity: high for legal briefs" |

**Default priority order (configurable):**
1. Hard policy (EC Core scope/firewall — non-negotiable)
2. User-stated preference (RevisorConfig)
3. Recent revealed preference (BDSM utility bundles, time-decayed)
4. Similar-task memory (DOC72 cluster context)
5. Capability availability (DOC24)
6. Module-specific defaults

**Conflict resolution:** When two priors of different kinds disagree, higher priority wins. When two priors of same kind disagree (e.g., two recent edit patterns suggest different things), prompt assembly presents both with annotations and lets the LLM weigh them. When same-kind disagreement is severe (e.g., user-stated preference contradicts itself across two RevisorConfig entries), surface a validation warning.
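
A minimal sketch of this resolution rule follows, assuming priors can be grouped by a `topic` key and compared by `content` equality; both are simplifications. The type names, the kind labels, and the choice to raise the validation warning only for self-contradicting user-stated preferences (the one severe case the text names) are hypothetical.

```typescript
type PriorKind =
  | "hard_policy"
  | "user_stated"
  | "recent_revealed"
  | "similar_task"
  | "capability"
  | "module_default";

// Mirrors the default priority order listed above (configurable in the spec).
const DEFAULT_PRIORITY: PriorKind[] = [
  "hard_policy", "user_stated", "recent_revealed",
  "similar_task", "capability", "module_default",
];

interface Prior {
  kind: PriorKind;
  topic: string;   // what the prior is about, e.g. "assurance_basis"
  content: string; // what it recommends
}

interface Resolution {
  topic: string;
  presented: Prior[];          // priors that reach the prompt for this topic
  same_kind_conflict: boolean; // winners disagree; the LLM sees all of them
  validation_warning: boolean; // assumed: only for user-stated self-contradiction
}

function resolveTopic(
  topic: string,
  priors: Prior[],
  order: PriorKind[] = DEFAULT_PRIORITY
): Resolution {
  const relevant = priors.filter((p) => p.topic === topic);
  let bestRank = order.length;
  for (const p of relevant) {
    const rank = order.indexOf(p.kind);
    if (rank >= 0 && rank < bestRank) bestRank = rank;
  }
  // Different kinds: only the highest-priority kind survives.
  const winners = relevant.filter((p) => order.indexOf(p.kind) === bestRank);
  // Same kind: keep every winner and flag disagreement instead of picking one.
  const distinct: Record<string, true> = {};
  winners.forEach((p) => { distinct[p.content] = true; });
  const conflict = Object.keys(distinct).length > 1;
  return {
    topic,
    presented: winners,
    same_kind_conflict: conflict,
    validation_warning: conflict && winners[0].kind === "user_stated",
  };
}

const crossKind = resolveTopic("assurance_basis", [
  { kind: "recent_revealed", topic: "assurance_basis", content: "X" },
  { kind: "user_stated", topic: "assurance_basis", content: "comparative_judge" },
]);
const sameKind = resolveTopic("assurance_basis", [
  { kind: "recent_revealed", topic: "assurance_basis", content: "X" },
  { kind: "recent_revealed", topic: "assurance_basis", content: "Y" },
]);
```

`crossKind` presents only the user-stated prior, while `sameKind` presents both revealed preferences with `same_kind_conflict` set so prompt assembly can annotate them.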

**Budget allocation:** Each prior kind gets a token budget allocation in the prompt template, configurable per module. When the priors together exceed the budget, the truncation rule is: keep the highest-priority priors intact; truncate lower-priority priors first; among priors of equal priority, truncate older priors first.
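
The truncation rule can be sketched as a greedy allocation in keep order. For brevity this sketch uses a single total budget rather than per-kind allocations, and it measures priors in pre-counted tokens; `BudgetedPrior`, `allocateTokens`, and all field names are hypothetical.

```typescript
interface BudgetedPrior {
  id: string;
  priority_rank: number; // 1 = highest priority
  tokens: number;        // size of the prior if included in full
  created_at_ms: number; // larger = newer
}

// Grants tokens in keep order: higher priority first, and newer first within
// the same priority. Whatever runs out of budget is truncated (possibly to 0),
// so lower-priority and older priors are the first to lose tokens.
function allocateTokens(priors: BudgetedPrior[], totalBudget: number): Record<string, number> {
  const keepOrder = priors.slice().sort((a, b) =>
    a.priority_rank !== b.priority_rank
      ? a.priority_rank - b.priority_rank
      : b.created_at_ms - a.created_at_ms);
  let remaining = Math.max(0, totalBudget);
  const allocation: Record<string, number> = {};
  for (const p of keepOrder) {
    const granted = Math.min(p.tokens, remaining);
    allocation[p.id] = granted;
    remaining -= granted;
  }
  return allocation;
}

const tokenAllocation = allocateTokens(
  [
    { id: "old-edit", priority_rank: 3, tokens: 150, created_at_ms: 1000 },
    { id: "policy", priority_rank: 1, tokens: 100, created_at_ms: 500 },
    { id: "new-edit", priority_rank: 3, tokens: 150, created_at_ms: 2000 },
    { id: "preference", priority_rank: 2, tokens: 200, created_at_ms: 800 },
  ],
  500
);
```

With a 500-token budget, the policy and preference priors stay intact, the newer edit trace keeps all 150 tokens, and the older one is cut to the remaining 50.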

**Open design questions:**
- Should priority be configurable per module, or system-wide with module overrides?
- How to surface conflicts to the user (silent resolution vs explicit notification)
- Time-decay constants for "recent" revealed preferences
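
On the time-decay question, one common choice is exponential decay with a half-life, sketched below. The decay form and the 7-day half-life are assumptions for illustration; the constants are exactly what remains open.

```typescript
// Weight of a single observation that is `ageDays` old, halving every
// `halfLifeDays`. Negative ages are clamped to 0 (weight 1).
function decayedWeight(ageDays: number, halfLifeDays: number): number {
  if (halfLifeDays <= 0) throw new Error("halfLifeDays must be positive");
  return Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}

// Total decayed evidence for one revealed preference, given the ages of the
// edits that support it.
function decayedEvidence(observationAgesDays: number[], halfLifeDays: number): number {
  return observationAgesDays.reduce(
    (sum, age) => sum + decayedWeight(age, halfLifeDays),
    0
  );
}

// "User changed AssuranceBasis to X 3 times this week": edits 1, 3 and 6 days ago.
const thisWeekEvidence = decayedEvidence([1, 3, 6], 7);
// The same three edits, but made about two months ago.
const staleEvidence = decayedEvidence([57, 59, 62], 7);
```

Three edits from this week carry far more weight than the same three edits from two months ago, which is the behavior the "recent revealed preference" prior kind depends on.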

### §4.3 Optional rationale field standardization (NEW)

**Purpose:** Every accept/reject/edit feedback signal in the system should support optional rationale capture. Revealed-preference data is one bit per interaction; stated-preference data with reason multiplies the signal value.

**Scope:** Standardize across all of:
- TaskAgentPanelFeedback (per §3.5)
- TaskSuggestionFeedback (per §3.6)
- TaskAgentProposalEditTrace (per §3.4)
- OutcomeCompilerProposalEditTrace (proposed, per §3.12)
- ClaimExtractorProposalEditTrace (proposed, per §3.13)
- JudgeCriteriaEditTrace (proposed, per §3.14)
- ExperimentDesignEditTrace (proposed, per §3.15)
- ArtifactUtilitySignals (per §3.8)
- PromptEditEvalSignals (per §3.11)
- DirectInstructionSignal family (V3.3 §14.8.3 — `direct_instruction_edited` already has a rationale field per V3.3 §14; the rest of the family does not)

**Schema standardization:**
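```ts
// Representation of `timestamp` is not specified here; an ISO-8601 string is assumed.
type Timestamp = string;

interface OptionalUserRationale {
  rationale?: string;          // free-text reason for the action
  correction_hint?: string;    // free-text what-should-have-happened-instead
  rationale_tags?: string[];   // user-selected tags from a domain-relevant taxonomy
  captured_at: Timestamp;      // when the rationale was captured
}
```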

Every applicable feedback signal payload schema extends with `user_rationale: OptionalUserRationale?`.

**UI surface specification:** A comment box appears on negative feedback by default (thumbs-down, reject, edit-away-from-proposed), optional on positive feedback. Comment is never required; the user can dismiss the dialog.
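
The default-on-negative rule and the "never required" constraint can be sketched as two small functions. The action names, `FeedbackPayload`, and both function names are hypothetical; only the `OptionalUserRationale` fields come from the schema above, restated here with an assumed ISO-8601 string timestamp.

```typescript
interface OptionalUserRationale {
  rationale?: string;
  correction_hint?: string;
  rationale_tags?: string[];
  captured_at: string; // ISO-8601 string assumed
}

type FeedbackAction =
  | "thumbs_up"
  | "accept"
  | "thumbs_down"
  | "reject"
  | "edit_away_from_proposed";

// Negative feedback shows the comment box by default; positive feedback only
// offers it on request.
function rationalePromptMode(action: FeedbackAction): "shown_by_default" | "available_on_request" {
  const negative =
    action === "thumbs_down" ||
    action === "reject" ||
    action === "edit_away_from_proposed";
  return negative ? "shown_by_default" : "available_on_request";
}

interface FeedbackPayload {
  signal_type: string;
  action: FeedbackAction;
  user_rationale?: OptionalUserRationale;
}

// A dismissed dialog (null) or blank comment leaves the payload unchanged, so
// the signal is still emitted without a rationale.
function withRationale(payload: FeedbackPayload, text: string | null, capturedAt: string): FeedbackPayload {
  const trimmed = text === null ? "" : text.trim();
  if (trimmed === "") return payload;
  return {
    signal_type: payload.signal_type,
    action: payload.action,
    user_rationale: { rationale: trimmed, captured_at: capturedAt },
  };
}

const rejected: FeedbackPayload = { signal_type: "TaskSuggestionFeedback", action: "reject" };
const dismissed = withRationale(rejected, null, "2026-05-19T00:00:00Z");
const explained = withRationale(rejected, "  wrong matter  ", "2026-05-19T00:00:00Z");
```

Keeping the rationale optional at the payload level means existing consumers that ignore `user_rationale` keep working unchanged.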

**BDSM treatment:** Rationale fields are not just stored; BDSM semantic-clusters them. Frequently recurring rationales become candidate patterns ("users frequently say X" → surface as Pattern primitive or design hint). This is what makes optional rationale substantially more valuable than thumbs-only feedback: a thumbs-down records that something was wrong, while a recurring rationale records what was wrong.

### §4.4 DSPy training data architecture per target (R5-gated)

**Purpose:** When the R5 substrate ships, each registered DSPy target needs a specified training-data architecture: which signals feed it, how they're combined into training examples, what the composite metric looks like.

**Owner:** Addenda A V6 §A4 mentions DSPy targets, but the actual training-data architecture is not yet specified. This work needs to happen in coordination with R5 substrate spec drafting.

**Per-target draft:**

| Target | Primary training data | Secondary training data | Composite metric components |
|---|---|---|---|
| `claim_extractor_main` | ClaimExtractorProposalEditTrace (proposed §3.13) | PromptComparisonSignal | Edit distance (low), downstream Evaluator findings accuracy (high), claim verification rate (high) |
| `outcome_evaluator_main` | OutcomeEvaluationSignal aggregate | EvaluationLimitationKind frequency | Finding quality vs human review (high), false-positive rate (low), missing-finding rate (low) |
| `revision_compiler_main` | RepairCycleSignal aggregate | Pattern slice convergence/failure rates | Plans-that-converge rate (high), regression-introduction rate (low), cost-per-success (low) |
| `outcome_compiler_main` | OutcomeCompilerProposalEditTrace (proposed §3.12) | Downstream Compiler-quality metrics §15.2 | Edit distance (low), downstream convergence (high), human-stated approval (high), semantic lint pass rate (high) |

**Open design questions:**
- Composite metric weighting (and whether weights are global, per-user, or learned)
- Training data freshness window (how recent must signals be to count)
- Cross-matter data class enforcement (which signals can train cross-matter, which can't, per privilege firewall)
- Multi-objective DSPy configuration (when to use Pareto frontier vs scalarized objective)
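
To make the scalarized option concrete, the composite metric components from the per-target draft can be combined as a weighted average once each is normalized to [0, 1] and oriented so that higher is better. This is a sketch of that one option only; the equal weights and the sample values are invented for illustration.

```typescript
interface MetricComponent {
  name: string;
  value: number;             // must already be normalized to [0, 1]
  direction: "high" | "low"; // "low" means smaller raw values are better
  weight: number;
}

// Weighted average of oriented components: "low" components are flipped to
// 1 - value so that every term rewards improvement.
function compositeScore(components: MetricComponent[]): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const c of components) {
    if (c.value < 0 || c.value > 1) throw new Error(c.name + " must be normalized to [0, 1]");
    if (c.weight < 0) throw new Error(c.name + " has a negative weight");
    const oriented = c.direction === "high" ? c.value : 1 - c.value;
    weighted += c.weight * oriented;
    totalWeight += c.weight;
  }
  if (totalWeight === 0) throw new Error("at least one component needs a positive weight");
  return weighted / totalWeight;
}

// Components for `outcome_compiler_main` from the table; values are made up.
const outcomeCompilerScore = compositeScore([
  { name: "edit_distance", value: 0.2, direction: "low", weight: 1 },
  { name: "downstream_convergence", value: 0.7, direction: "high", weight: 1 },
  { name: "human_stated_approval", value: 0.9, direction: "high", weight: 1 },
  { name: "semantic_lint_pass_rate", value: 1.0, direction: "high", weight: 1 },
]);
```

With equal weights these sample values average to roughly 0.85. A Pareto-frontier configuration would instead keep the four oriented values separate and compare candidates by dominance.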

---

## §5 Loops to REMOVE

### §5.1 `goal_advancement_count` axis

**Current state:** V3.3 §13.3 defines `goal_advancement_count` with required `goal_advancement_source` enum (`independent_comparative_judge` | `explicit_human_feedback`); §6.12.2 specifies the "severed learning loop" sycophancy fix.

**Problems:**
1. Requires user to manually wire a `step.evaluator` with `AssuranceBasis: comparative_judge` (no auto-attach)
2. No UI for declaring DOC72 goals on a task exists in current DOC20 spec
3. Goal is matter-specific ("win Paramount") — doesn't generalize across matters; comparative judge would hallucinate verdicts against vague strategic goals
4. Attribution noise: whole-revision-cycle comparison can't isolate pattern contribution from module-execution contribution
5. Adds compute cost (extra LLM call per revision cycle) for a signal that's not actually useful

**Replacement:** `outcome_cluster_id` axis per §4.1. Emergent clustering by structural similarity replaces matter-specific goals. No manual wiring; no comparative judge; no sycophancy attack surface; no compute cost beyond nightly DOC72 clustering job (cheap, batch).

**Spec changes:**
- Remove `goal_advancement_count`, `goal_regression_count`, `goal_advancement_source`, `goal_advancement_evaluator_ref`, `goal_advancement_human_feedback_ref` from PatternPerformanceSlice
- Remove V3.3 §6.12 `GoalImpactAssessment` UI-only display (or repurpose it as `OutcomeClusterImpactAssessment` informational display)
- Remove V3.3 §15 test fixture F-LEARN-01 (the sycophancy failure it guards against becomes structurally impossible once the axis no longer exists)
- Remove validation rule `validation.goal_advancement_self_graded`
- Add `outcome_cluster_id` field on `context_signature`
- Update Core R0.7 §9 OutcomeEvaluationSignal payload to include `outcome_cluster_id` when available

**Migration consideration:** If any production data already has `goal_advancement_count > 0` for any slice, that data is preserved as historical but not incremented further; the slice schema version is bumped to v3 to mark the transition. (Likely no such data exists yet; ELNOR is pre-production.)
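
A rough sketch of that migration is below. It assumes the pre-transition schema is v2 and that historical goal counts move into a read-only side record; `legacy_goal_axis`, `migrateSlice`, and the simplified slice shapes are hypothetical, and the two goal ref fields are omitted for brevity.

```typescript
type GoalAdvancementSource = "independent_comparative_judge" | "explicit_human_feedback";

// Simplified pre-transition slice; only the fields relevant to the migration.
interface SliceV2 {
  schema_version: 2; // assumed predecessor of v3
  context_signature: Record<string, string>;
  goal_advancement_count: number;
  goal_regression_count: number;
  goal_advancement_source?: GoalAdvancementSource;
}

interface ContextSignatureV3 {
  outcome_cluster_id?: string;
  [field: string]: string | undefined;
}

interface SliceV3 {
  schema_version: 3;
  context_signature: ContextSignatureV3;
  // Historical only: preserved for audit, never incremented after migration.
  legacy_goal_axis?: {
    goal_advancement_count: number;
    goal_regression_count: number;
    goal_advancement_source?: GoalAdvancementSource;
  };
}

function migrateSlice(v2: SliceV2, outcomeClusterId?: string): SliceV3 {
  const context_signature: ContextSignatureV3 = {};
  Object.keys(v2.context_signature).forEach((k) => {
    context_signature[k] = v2.context_signature[k];
  });
  if (outcomeClusterId !== undefined) {
    context_signature.outcome_cluster_id = outcomeClusterId;
  }
  const v3: SliceV3 = { schema_version: 3, context_signature };
  // Carry the goal axis forward only when there is history to preserve.
  if (v2.goal_advancement_count > 0 || v2.goal_regression_count > 0) {
    v3.legacy_goal_axis = {
      goal_advancement_count: v2.goal_advancement_count,
      goal_regression_count: v2.goal_regression_count,
      goal_advancement_source: v2.goal_advancement_source,
    };
  }
  return v3;
}

const migratedWithHistory = migrateSlice(
  {
    schema_version: 2,
    context_signature: { module: "revisor" },
    goal_advancement_count: 2,
    goal_regression_count: 0,
    goal_advancement_source: "explicit_human_feedback",
  },
  "c-2891"
);
const migratedEmpty = migrateSlice({
  schema_version: 2,
  context_signature: { module: "revisor" },
  goal_advancement_count: 0,
  goal_regression_count: 0,
});
```

Slices with no goal history come out of the migration with no trace of the removed axis, which matches the expectation that most pre-production data needs no special handling.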

### §5.2 GoalRef on RevisionPlan (downstream)

If the goal axis is removed, the `GoalRef[]` field on RevisionPlan and its propagation to GoalImpactAssessment become vestigial. Options:

1. **Remove entirely:** cleanest, but loses any future use of strategic goal annotation
2. **Repurpose as user-organization metadata:** keep the field, demote it to "for task-listing organization only," and explicitly mark it as a non-learning axis
3. **Defer decision:** leave the field in place, mark it deprecated, and decide at the next revision

Recommend Option 2 (repurpose as organization metadata). Future DOC50+ shared-surfaces work might use goal tags for cross-matter task organization; preserving the field is cheap.

---

## §6 DOC23 proper note

DOC23 proper (the task system spec, not the addenda) has a deliberately minimal learning surface. The high-leverage learning happens in module specs (Addenda A + B), which is correct architectural separation.

DOC23-proper-owned signals:
- `OBL-D8-TASK-INVOCATION-UTILITY-01` — task invocation outcome metrics (task scope, not module scope)
- `OBL-D8-TASK-SUGGESTION-FEEDBACK-01` — task-level suggestion feedback
- `OBL-D8-TASK-CONTEXT-FEEDBACK-EVENT-01` — DOC24 context injection feedback (task-execution scope)
- `OBL-D8-TASK-SEGMENT-REUSE-SIGNALS-01` — task segment reuse outcomes

These are all task-level (whole-task) signals, not module-level. They feed task design pattern emergence in DOC72. The lifecycle gaps identified for each in §3 apply.

**No additional DOC23-proper learning loops proposed.** The Task Agent learning surfaces (which Will called out) belong to Addenda B per V5.2 §1.10 explicit boundary; they're covered in §3.4, §3.5, §3.16 above.

---

## §7 Out-of-scope (Piece 3 substrate)

The following are deliberately NOT addressed in this scoping document because they belong to the system-wide self-diagnosis substrate (Piece 3, separate proposal):

- BDSM quality threshold-crossing detection and proactive surfacing
- Error/bug/disconnection pattern detection
- Repeated-issue-in-chat pattern recognition
- Code-level diagnostics of failing components
- Self-diagnosis task graph (ELNOR investigating its own failures)
- Multi-modal investigation invoking red-team subevaluators or multi-agent panels

The boundary: this scoping document is about per-module BDSM signal-driven learning that drives prompt/pattern/preset improvement. Piece 3 is about system-wide health monitoring and self-investigation triggered by accumulated quality data crossing thresholds.

Piece 3 will consume data from the learning loops scoped here (BDSM quality metrics, edit-trace volumes, pattern failure rates), but its own architecture is independent and cross-cuts DOC23/DOC72/DOC8/EC Core/DOC11/DOC24/DOC20.

---

## §8 OP-A obligations affected

This scoping document, once converted to the actual Self-Learning Architecture R1 spec, will produce coordinated OP-A row additions and modifications. Approximate landing:

**Modified rows:**
- `OBL-D8-TASK-AGENT-PANEL-FEEDBACK-01` — add optional rationale field requirement
- `OBL-D8-TASK-SUGGESTION-FEEDBACK-01` — add optional rationale field requirement
- `OBL-D8-TASK-AGENT-PROPOSAL-EDIT-TRACE-01` — specify diff schema + optional rationale
- `OBL-D8-PROMPT-EDIT-EVAL-SIGNALS-01` — split into three distinct signals
- `OBL-D8-TASK-CONTEXT-FEEDBACK-EVENT-01` — full schema specification (currently R0.6.4-origin gap)
- `OBL-D8-ARTIFACT-UTILITY-SIGNALS-01` — full schema specification
- `OBL-D8-RUN-FORK-FOLLOWUP-UTILITY-01` — action surface specification
- `OBL-D8-TASK-SEGMENT-REUSE-SIGNALS-01` — duplicate-detection mechanism specification
- `OBL-XDOC-BDSM-CONSUME-SIGNALS-01` — utility compilation rules per signal type
- `OBL-XDOC-EVAL-SIGNAL-OWNERSHIP-01` — RepairCycleSignal taint_evolution downstream consumer
- `OBL-XDOC-PROMPT-COMPARISON-SIGNAL-01` — pre-R5 and post-R5 consumer/action specification

**New rows expected:**
- `OBL-D8-OUTCOME-COMPILER-EDIT-TRACE-01` (NEW signal)
- `OBL-D8-CLAIM-EXTRACTOR-EDIT-TRACE-01` (NEW signal)
- `OBL-D8-JUDGE-CRITERIA-EDIT-TRACE-01` (NEW signal)
- `OBL-D8-EXPERIMENT-DESIGN-EDIT-TRACE-01` (NEW signal)
- `OBL-D8-TASK-AGENT-CHAT-DURING-SETUP-SIGNAL-01` (NEW signal)
- `OBL-D72-OUTCOME-CLUSTER-EMERGENCE-01` (DOC72 nightly clustering job)
- `OBL-D72-OUTCOME-SPEC-EMBEDDING-01` (DOC72 embedding generation)
- `OBL-D23-CONTEXT-SIGNATURE-CLUSTER-ID-01` (PatternPerformanceSlice schema extension)
- `OBL-D15-MULTI-PRIOR-COORDINATION-01` (DOC15 CIL prompt assembly policy)
- `OBL-D24-MULTI-PRIOR-BUDGET-ALLOCATION-01` (DOC24 packet assembly budget per prior kind)
- `OBL-D23-RATIONALE-FIELD-STANDARD-01` (cross-module schema standardization)
- `OBL-D8-RATIONALE-CLUSTERING-01` (BDSM semantic clustering of rationale text)
- `OBL-XDOC-DSPY-TRAINING-DATA-ARCHITECTURE-01` (per-target training data spec; R5-gated)

**Rows to mark deprecated:**
- `goal_advancement_count` references throughout (no direct OP-A row; embedded in V3.3 §13.3 schema)

**Separately noted (V3.16 audit gap):**
- Addenda A V6 is not listed in the OP-A V3.16 §3 Source Registry or §5 Currency snapshot. This is an oversight inherited from the V3.16 sub-agent-architecture fold-in, which did not audit §3/§5. It should be addressed at the next OP-A maintenance pass; it is outside the scope of this document and is noted here only for tracking.

---

## §9 Open questions for refinement

Items where this scoping document makes a recommendation but the design needs validation or further discussion before becoming spec:

1. **Outcome cluster similarity threshold.** What admission threshold prevents false-cluster-collapse while allowing nuance? Needs empirical tuning once data exists.

2. **Cluster ID stability across HDBSCAN re-runs.** HDBSCAN doesn't natively preserve cluster IDs. Need an ID-preservation mechanism (e.g., centroid-distance-based ID inheritance).

3. **Cross-matter cluster firewall.** Should clusters span matter boundaries (richer data, privilege risk) or stay matter-scoped (sparse data, safer)? Likely workspace-scoped with explicit cross-matter promotion gate.

4. **Multi-prior priority policy.** Default order proposed in §4.2 is a starting point. Real-world priority probably varies by module. User-configurable per RevisorConfig with system-default fallback.

5. **Optional rationale UI surface design.** Comment box on every feedback action could be friction. Default-on-negative, optional-on-positive is one heuristic; might need user-preference toggle.

6. **Chat-during-setup signal noise filtering.** How to separate actionable correction signal from incidental chat in TaskAgentChatDuringSetupSignal. Initial proposal: only Task-Agent-labeled segments where the user's input changed the next proposal.

7. **DSPy composite metric weighting.** Per-target weights are configurable but defaults need to be set. Probably learned per-user over time (meta-learning), but initial defaults need design.

8. **Pre-R5 PromptComparisonSignal action surface.** Currently no consumer. Should it sit cold until R5, or feed quality dashboards in the interim?

9. **TaskContextFeedbackEvent downstream-behavior measurement.** What's the behavioral proxy for "user used the injected content"? Module-output reference to injected content is one option; user-edit-distance on output sections that depended on injection is another.

10. **Existing signal lifecycle gap closure.** Many partial loops (§3.1-§3.11) need utility-compilation rules and action surfaces specified. These are spec drafting work, not architectural decisions, but the order/priority of closure matters.

---

## §10 Recommendations and next steps

### §10.1 Immediate (this thread)

Review this scoping document. Refine the §4 cross-module mechanisms (especially outcome clustering and multi-prior coordination), since those are the most architecturally load-bearing pieces. Decide on the §5 removal scope (full removal vs. the Option 2 metadata-only repurpose for GoalRef).

### §10.2 Next session

Take refined scoping doc and run it through ChatGPT (or other adversarial review) for architecture-level feedback per Will's stated workflow. Refine the design propositions as needed.

### §10.3 After review

Draft DOC23 Family Self-Learning Architecture R1 as the actual spec. Single consolidated document. Includes:
- Schema additions: `outcome_cluster_id` on the PatternPerformanceSlice `context_signature`, plus the new signal types (5 new + standardized rationale)
- DOC72 outcome clustering specification
- DOC15 multi-prior coordination policy
- Coordinated revisions to Addenda A R4.1 V6 and Addenda B V3.3 / Common Contracts V1.1
- OP-A row additions and modifications listed in §8

### §10.4 Coordination

The Module System Prompts proposal (deferred earlier in this work cycle) is downstream of this. Module prompts can't be finalized without knowing what priors get injected and how. After Self-Learning Architecture R1 ships, return to the prompts proposal with the prior-injection contract finalized.

### §10.5 Piece 3 trigger

Piece 3 (system-wide self-diagnosis substrate) becomes much easier to design after this. Many of Piece 3's data inputs are the same signals scoped here. Piece 3 can be drafted after R1 ships.

### §10.6 V3.16 OP-A gap

Separately: Addenda A V6 is not registered in V3.16 §3/§5. Address at the next OP-A maintenance pass (V3.17 or later). This does not block the work described here.

---

## §11 End of scoping document

This V1 scoping document is self-contained for review purposes. After refinement, becomes input to the Self-Learning Architecture R1 spec drafting cycle. Until that spec ships, the current state remains: partial learning loops, missing edit-trace signals, broken goal-advancement axis.