# DOC23 Evaluation Common Contracts V1.1.1
**Document type:** Sibling specification to DOC23 R3.1 hosting shared evaluation primitives until DOC23 R3.2 absorbs them.
**Status:** V1.1.1 — clean replacement for V1.1; reference/topology cleanup only; no schema changes.
**Version:** 1.1.1
**Prepared:** 2026-05-17 (V1); revised 2026-05-17 (V1.1 Pattern C consumption semantics); revised 2026-05-24 (V1.1.1 reference/topology cleanup)
**Lifecycle:** Active until DOC23 R3.2 compilation pass absorbs the contracts into the parent doc; then this document retires per §11 migration guide.
**Owner:** Joint between Addenda A chat and Addenda B chat per coordination V3 §3.2; conflicts adjudicated by Will.
**V1.1.1 changes from V1.1:**
No schema, behavior, or validation changes. This full replacement copy updates stale references to the current Addenda B family versions: Outcome Evaluator/Revisor V3.3.1, Core R0.7.1, and the active sub-addenda. It also fixes internal version residue that still described the document as V1.0/V1.1 in current-version text.
**V1.1 changes from V1:**
V1.1 adds §3.7 "Pattern C consumption semantics" documenting how the EvaluationResultEnvelope flows when Judge attaches downstream of an Evaluator output (Pattern C ad-hoc Judge attachment, per coordination V3 §2.9). The Pattern C wiring is crystallized at port level in DOC23 Addenda B / Outcome Evaluator+Revisor V3.3.1 §5.18 (sibling change); V1.1 §3.7 documents the envelope-level chain semantics: how `target_evaluation_chain_id` links the upstream Evaluator's envelope to the downstream Judge's envelope, enabling chain reconstruction in audit, UI, and learning.
Nothing else changes from V1. The twelve schemas in §§3-8 are unchanged. Validation rules, versioning protocol, migration guide, and cross-doc obligations carry through.
**Consumers:**
- DOC23 Addenda A R4.1 V3 (Judge, Experiment, Claim Extractor) — consumes shared envelope, slice schemas, Criterion, ArtifactScopeRef
- DOC23 Addenda B Core R0.7.1 (Task Agent, Outcome Evaluator/Revisor, Forum, Source Workspace, Feedback Delivery) — consumes shared envelope, signal envelope, Criterion, ArtifactScopeRef
- DOC23 Addenda B subsystem addenda (V3.3.1 Outcome Evaluator/Revisor; Source Workspace V1.0.1; Task Forum V1.0.1; Feedback Delivery V1.0.1) — reference shared primitives
- PropA R6.3+ — consumes ArtifactScopeRef, TextAnchor, StructuredAnchor (shared extraction primitives)
- DOC72 — consumes Pattern-related context types as referenced from Addenda B Core
- DOC8/BDSM — consumes EvaluationLearningSignalEnvelope as the unified signal stream
- EC Core — gates envelope governance fields (data_class, matter_id, pattern_promotion_eligible)
- DOC20 — renders the envelope, signals, and criteria in UI surfaces
---
## §0 How to read this document
This document hosts schemas that are **shared between Addenda A and Addenda B** but don't belong in either addendum individually. The architecture is locked per coordination V3 FINAL; this document is the canonical source for the shared primitives until DOC23 R3.2 absorbs them at the parent level.
### §0.1 Why this document exists
The Addenda A ↔ Addenda B coordination produced a converged architecture where Judge (Addenda A) and Outcome Evaluator (Addenda B) emit a shared `EvaluationResultEnvelope`, where all learning signals wrap in a common `EvaluationLearningSignalEnvelope`, and where extraction operations use shared anchoring primitives (`TextAnchor`, `StructuredAnchor`, `ArtifactScopeRef`). Hosting these primitives in one of the addenda would create fragile cross-addendum references and bury shared types inside one addendum's narrative.
Per coordination V3 §3.2, the resolution is a sibling document hosting these primitives. Both addenda reference this document. When DOC23 R3.2 compiles, the primitives migrate to the parent doc and this document retires (§11).
### §0.2 What lives here vs. what lives in the addenda
**Lives here (this document):**
- Shared schemas consumed by both Addenda A and Addenda B
- Cross-cutting governance fields that span Judge and Evaluator
- Shared anchoring/scoping primitives consumed by extractors
- Universal signal envelope structure
**Lives in Addenda A** (R4.1 V3 / V4.1 Coordination Patch / V5 Mini-Card):
- Judge module schema and configuration
- `OutcomeComplianceScoringConfig` (Judge's module config for outcome scoring)
- `step.claim_extractor` module schema and 22-type registry
- Experiment module schema and `experiment_winner_routing` config
- Judge-specific scoring methods (rubric, checklist, pairwise, factual_verification, consistency, outcome_compliance)
**Lives in Addenda B** (Core R0.7.1 / V3.3.1 Outcome Evaluator+Revisor / sub-addenda):
- `EvaluationOutcomeDefinition` schema (consumes Criterion from this document)
- Evaluator module schema and `claims_in` port (port type defined here; module surface in V3.3.1)
- Revisor module schema and `RevisorConfig.learning_mode`
- Pattern primitive with `cross_model_applicability` (DOC72 owns the storage; Addenda B specifies the consumption surface)
- Forum / Source Workspace / Feedback Delivery surfaces
### §0.3 Versioning protocol
Each schema in this document carries `schema_version`. Bumping any schema requires:
1. **Coordination check.** Schemas marked "joint Addenda A + Addenda B" require both chats to agree before bumping. The architect (Will) adjudicates if the chats disagree.
2. **Compatibility analysis.** Bump is `MAJOR` (breaking — old consumers must update), `MINOR` (additive — old consumers continue working), or `PATCH` (clarification — no consumer changes needed). Compatibility is recorded in §10.
3. **Cross-doc notification.** Bumps to schemas consumed by other docs (PropA, DOC72, DOC8, EC Core, DOC20) require the corresponding OP-A rows to be updated.
### §0.4 Reading order
The document is organized for read-through:
- §1 — Scope and non-goals
- §2 — Producer kinds and ownership (who emits what)
- §3 — `EvaluationResultEnvelope` (the main artifact)
- §4 — Slice schemas (the five slices the envelope can carry)
- §5 — `EvaluationLearningSignalEnvelope` (signal wrapper)
- §6 — `Criterion` (the public sub-contract)
- §7 — Anchoring primitives (ArtifactScopeRef, TextAnchor, StructuredAnchor)
- §8 — Lineage primitives (VariantEvaluationLineage, CriterionLineage)
- §9 — Validation rules
- §10 — Versioning and evolution rules
- §11 — Migration guide for DOC23 R3.2 absorption
- §12 — Cross-doc obligations summary
---
## §1 Scope and non-goals
### §1.1 In scope
This document specifies:
- The shared `EvaluationResultEnvelope` schema and its wrapping in Addenda A's existing `EvaluationArtifactEnvelope` (per coordination V3 §2.3)
- The five slice schemas (Quantitative, Qualitative, Comparison, AssuranceAndLimitation, SafetyAndGovernance) that the envelope may carry
- The `EvaluationLearningSignalEnvelope` that wraps all eight Phase 1 learning signal types (per coordination V3 §2.11)
- The `Criterion` schema as the public sub-contract on `EvaluationOutcomeDefinition` (per coordination V3 §2.4)
- The `ArtifactScopeRef`, `TextAnchor`, and `StructuredAnchor` primitives shared between Addenda A's Claim Extractor and Addenda B's evaluation operations (per coordination V3 §2.12)
- The `VariantEvaluationLineage` and `CriterionLineage` records carried in the envelope for variant/criterion tracking
- Validation rules that all conforming implementations enforce
- The migration path for absorption into DOC23 R3.2
### §1.2 Out of scope
This document does NOT specify:
- The Judge module schema (lives in Addenda A R4.1 V3)
- The Outcome Evaluator module schema (lives in Addenda B V3.3.1)
- The Revisor module schema (lives in Addenda B V3.3.1)
- The Claim Extractor module schema or its 22-type unit registry (lives in Addenda A R4.1 V3)
- Specific signal payload schemas (the wrappers are here; payloads live in the owning addenda, e.g., RepairCycleSignal in Addenda B Core R0.7.1)
- Pattern primitive storage (DOC72)
- EC Core policy engine details (EC Core Addendum A V3.3)
- Cost governance details (EC Core §6)
- UI rendering specifics (DOC20)
These are listed as cross-references where consumed.
### §1.3 Coordination origin
This document originates from the Addenda A ↔ Addenda B coordination V3 FINAL proposal, produced by the Addenda A chat on 2026-05-17. The schemas here are paste-ready from §4 of the Addenda B response (`DOC23_ADDB_RESPONSE_TO_ADDA_R4_1_V3_COORDINATION_PROPOSAL_V1.md`). This document organizes them into a sibling specification with governance, versioning, and migration framing.
---
## §2 Producer kinds and ownership
### §2.1 Producer kinds (per coordination V3 §2.2)
`EvaluationResultEnvelope.producer_kind` takes exactly one of five values in Phase 1:
```ts
ProducerKind =
  | "judge"                 // Addenda A step.judge
  | "outcome_evaluator"     // Addenda B step.evaluator
  | "agent_review_gate"     // Addenda B
  | "human_review"          // reserved for explicit human review records
  | "deterministic_scorer"  // reserved for deterministic / rule-based scorers
```
Control-flow decisions (`switch_agent_decision`, `loop_controller_agent_decision`) are NOT producer kinds in Phase 1. They instead emit thin `ControlDecisionResult` records through a future adapter, so they do not pollute evaluation signal lineage.
### §2.2 Ownership matrix
| Producer kind | Owning addendum | Module type | Primary slice populated |
|---|---|---|---|
| `judge` | Addenda A | `step.judge` | quantitative_slice |
| `outcome_evaluator` | Addenda B | `step.evaluator` | qualitative_slice |
| `agent_review_gate` | Addenda B | (sub-agent gate, not standalone module) | qualitative_slice (minimal), assurance_slice |
| `human_review` | (reserved) | (none — direct human input record) | qualitative_slice, assurance_slice |
| `deterministic_scorer` | (reserved) | (future) | quantitative_slice |
### §2.3 Slice population per producer
| Producer | quantitative | qualitative | comparison | assurance | safety |
|---|---|---|---|---|---|
| judge (rubric/checklist/pairwise/factual_verification/consistency/outcome_compliance) | yes | no | when in Experiment | yes (limitations) | yes |
| outcome_evaluator | no | yes | when in Experiment | yes | yes |
| agent_review_gate | sometimes | yes | when applicable | yes | yes |
| human_review | (reserved) | yes | (reserved) | yes | yes |
| deterministic_scorer | yes | no | when applicable | yes (limited) | yes |
Verdict and lifecycle status are always populated regardless of which slices are populated.
---
## §3 EvaluationResultEnvelope
### §3.1 Schema
```ts
EvaluationResultEnvelope {
  // Identity
  result_id: string                              // format: "evr-{ulid}"
  producer_kind: ProducerKind                    // §2.1

  // Context
  task_id: string
  run_id: string
  producer_module_id: string
  producer_activation_seq: number
  producer_config_ref: StorageRef
  target_evaluation_chain_id?: string            // V4 R200 chain identifier

  // What was evaluated
  target_artifact_ref: StorageRef | null
  target_artifact_version_ref: StorageRef | null
  target_scope_ref: ArtifactScopeRef | null      // §7 (scope evaluated, may be sub-document)
  evaluation_snapshot_ref: StorageRef            // REQUIRED: anchors to immutable
                                                 // EvaluationSnapshot per Addenda B V3.1 §5.16

  // Verdict and lifecycle (per coordination V3 §2.5)
  evaluation_verdict:
    | "passed"
    | "failed"
    | "indeterminate"
    | "not_applicable"
  result_lifecycle_status:
    | "complete"
    | "partial"
    | "blocked"
    | "error_no_result"
    | "superseded"
  indeterminate_reasons: IndeterminateCause[]    // V4 R203 taxonomy

  // Addenda B-internal state (rich verdict for revisor compile-time strategy).
  // Producers from Addenda B populate this; consumers from Addenda A may
  // ignore it. The mapping from overall_state to evaluation_verdict is
  // canonical (see §3.2).
  overall_state: OutcomeEvaluationState          // 14-value enum per Addenda B V3.1 §5.1

  // Slices: each null when not applicable; see §4
  quantitative_slice: QuantitativeEvaluationSlice | null
  qualitative_slice: QualitativeEvaluationSlice | null
  comparison_slice: ComparisonEvaluationSlice | null
  assurance_slice: AssuranceAndLimitationSlice | null
  safety_slice: SafetyAndGovernanceSlice | null

  // Lineage: §8
  variant_lineage?: VariantEvaluationLineage
  criterion_lineage: CriterionLineage[]          // populated when criteria were evaluated

  // Route recommendation: NOT a graph port (runtime decides the actual port).
  // Per coordination V3 §2.5: the envelope recommends, the graph runtime
  // dispatches. This avoids two sources of routing truth.
  route_recommendation?: {
    recommended_outcome:
      | "pass_path"
      | "fail_path"
      | "human_review_path"
      | "retry_path"
    rationale_summary: string                    // human-readable
  }

  // Hard Call surface (per Addenda B V3.1 §7.9)
  hard_call_surface_ref?: StorageRef             // points to HardRevisionCall
                                                 // if the producer raised one
  limitation_records: JudgmentLimitationRecord[] // Addenda B V3.1 §5.9

  // Audit and replay
  audit_refs: StorageRef[]
  execution_watermark_ref?: StorageRef           // for replay determinism
  source_policy_snapshot_ref?: StorageRef        // PropA classification snapshot
                                                 // at evaluation time

  // Schema versioning
  schema_version: "1.0"
  addendum_revision: string                      // producer's owning addendum revision,
                                                 // e.g., "R4.1_V3" / "R0.7" / "R3.2"
  migration_version: number                      // monotonic across schema bumps
}
```
### §3.2 Verdict mapping from overall_state
For Addenda B producers (which populate `overall_state` from V3.1's 14-value enum), the mapping to `evaluation_verdict` is canonical:
```
satisfied → passed
needs_revision → failed
regressed → failed
unrecoverable → failed
upstream_failure → failed
needs_information → indeterminate
needs_verification → indeterminate
needs_human_judgment → indeterminate
unable_to_evaluate → indeterminate
blocked_by_policy → indeterminate
dirty → NOT emitted to envelope (transient)
superseded → NOT emitted to envelope (transient)
pending → NOT emitted to envelope (transient)
pending_dependency → NOT emitted to envelope (transient)
```
For Addenda A producers (Judge), `overall_state` is populated from the scoring outcome, typically `satisfied`, `needs_revision`, or `blocked_by_policy`. Judge does not generate the full 14-value range; the remaining values are reserved for future expansion.
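The canonical mapping can be encoded as an exhaustive lookup table. The sketch below is illustrative rather than normative. It assumes both enums are plain string-literal unions, and `VERDICT_BY_STATE` and `verdictForState` are hypothetical names. Typing the table as `Record<OutcomeEvaluationState, ...>` makes the compiler reject it if a state is added to the enum without a mapping.

```typescript
// Addenda B's 14-value overall_state enum, as listed in the mapping above.
type OutcomeEvaluationState =
  | "satisfied"
  | "needs_revision"
  | "regressed"
  | "unrecoverable"
  | "upstream_failure"
  | "needs_information"
  | "needs_verification"
  | "needs_human_judgment"
  | "unable_to_evaluate"
  | "blocked_by_policy"
  | "dirty"
  | "superseded"
  | "pending"
  | "pending_dependency";

type EvaluationVerdict = "passed" | "failed" | "indeterminate" | "not_applicable";

// Record<...> forces every state to appear exactly once.
// null marks transient states that are never emitted to an envelope.
const VERDICT_BY_STATE: Record<OutcomeEvaluationState, EvaluationVerdict | null> = {
  satisfied: "passed",
  needs_revision: "failed",
  regressed: "failed",
  unrecoverable: "failed",
  upstream_failure: "failed",
  needs_information: "indeterminate",
  needs_verification: "indeterminate",
  needs_human_judgment: "indeterminate",
  unable_to_evaluate: "indeterminate",
  blocked_by_policy: "indeterminate",
  dirty: null,
  superseded: null,
  pending: null,
  pending_dependency: null,
};

function verdictForState(state: OutcomeEvaluationState): EvaluationVerdict {
  const verdict = VERDICT_BY_STATE[state];
  if (verdict === null) {
    throw new Error(`transient state "${state}" must not be emitted to an envelope`);
  }
  return verdict;
}
```

No state maps to `not_applicable`, so an Addenda B producer never reaches that verdict through `overall_state` alone.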
### §3.3 Wrapping in EvaluationArtifactEnvelope
Per coordination V3 §2.3, every `EvaluationResultEnvelope` is the payload inside Addenda A's existing `EvaluationArtifactEnvelope` (V4 R199), which provides payload modes, nested-ref governance, legacy payload hash, envelope hash, and injection-eligibility flags:
```ts
EvaluationArtifactEnvelope<EvaluationResultEnvelope>
```
Implementations MUST NOT emit a bare `EvaluationResultEnvelope`; the wrapper is required. This eliminates a parallel governance model and inherits the existing artifact-envelope machinery.
### §3.4 Identity and uniqueness
- `result_id` is globally unique. Format: `"evr-{ulid}"`. The ULID provides time-orderedness plus collision resistance.
- `(producer_module_id, producer_activation_seq)` is a unique key within a task. A re-run produces a new `result_id` and a new `producer_activation_seq`.
- `evaluation_snapshot_ref` links to an immutable EvaluationSnapshot (Addenda B V3.1 §5.16). Two envelopes can reference the same snapshot if they evaluated against the same captured state (e.g., Judge and Evaluator running in parallel on the same artifact version).
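The §3.4 identity rules can be checked with a minimal sketch like the one below. It assumes `result_id` uses the canonical uppercase Crockford base32 ULID encoding (26 characters, first character 0-7). `EnvelopeIdentity` and `findIdentityViolations` are hypothetical names. A real validator would run against the persisted envelope store, not an in-memory array.

```typescript
// Canonical ULID: 26 Crockford base32 characters. The first is 0-7 because the
// 48-bit timestamp must fit. Crockford base32 omits I, L, O, and U.
const RESULT_ID_PATTERN = /^evr-[0-7][0-9A-HJKMNP-TV-Z]{25}$/;

// Hypothetical projection of the identity fields from §3.1.
interface EnvelopeIdentity {
  result_id: string;
  task_id: string;
  producer_module_id: string;
  producer_activation_seq: number;
}

function findIdentityViolations(envelopes: EnvelopeIdentity[]): string[] {
  const problems: string[] = [];
  const seenResultIds = new Set<string>();
  const seenActivations = new Set<string>();
  for (const e of envelopes) {
    if (!RESULT_ID_PATTERN.test(e.result_id)) {
      problems.push(`malformed result_id: ${e.result_id}`);
    }
    if (seenResultIds.has(e.result_id)) {
      problems.push(`duplicate result_id: ${e.result_id}`);
    }
    seenResultIds.add(e.result_id);
    // (producer_module_id, producer_activation_seq) must be unique within a task.
    const key = JSON.stringify([e.task_id, e.producer_module_id, e.producer_activation_seq]);
    if (seenActivations.has(key)) {
      problems.push(
        `duplicate activation in task ${e.task_id}: ${e.producer_module_id}#${e.producer_activation_seq}`,
      );
    }
    seenActivations.add(key);
  }
  return problems;
}
```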
### §3.5 Snapshot requirement
`evaluation_snapshot_ref` is REQUIRED. Producers MUST emit a snapshot before emitting the envelope; the envelope references the snapshot. This anchors attribution under concurrent edits — see Addenda B V3.1 §5.16 for snapshot semantics.
Consumers that re-run revision plans validate against the snapshot's content hashes via Addenda B V3.1 §7.13 `ArtifactMutationPrecondition`.
### §3.6 Hard Call surface
When a producer cannot reach a substantive verdict because user judgment is required, it sets `hard_call_surface_ref` to a `HardRevisionCall` (Addenda B V3.1 §7.9) and sets `evaluation_verdict` to `indeterminate`. The producer MUST NOT populate `route_recommendation.recommended_outcome` with anything but `human_review_path` in this case.
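The §3.5 and §3.6 rules reduce to a few structural checks on an envelope. This sketch models `StorageRef` as a plain string and uses a hypothetical `EnvelopeView` projection that carries only the fields these rules touch. An absent `route_recommendation` is treated as compliant, because §3.6 constrains the value only when one is populated.

```typescript
type EvaluationVerdict = "passed" | "failed" | "indeterminate" | "not_applicable";
type RecommendedOutcome = "pass_path" | "fail_path" | "human_review_path" | "retry_path";

// Hypothetical projection of EvaluationResultEnvelope (§3.1); StorageRef is
// modeled as a string for this sketch.
interface EnvelopeView {
  evaluation_snapshot_ref?: string | null;
  evaluation_verdict: EvaluationVerdict;
  hard_call_surface_ref?: string;
  route_recommendation?: { recommended_outcome: RecommendedOutcome; rationale_summary: string };
}

function checkSnapshotAndHardCall(e: EnvelopeView): string[] {
  const errors: string[] = [];
  // §3.5: every envelope must anchor to an immutable EvaluationSnapshot.
  if (!e.evaluation_snapshot_ref) {
    errors.push("evaluation_snapshot_ref is REQUIRED (§3.5)");
  }
  // §3.6: a raised Hard Call forces an indeterminate verdict and, if a route
  // recommendation is present, restricts it to human_review_path.
  if (e.hard_call_surface_ref !== undefined) {
    if (e.evaluation_verdict !== "indeterminate") {
      errors.push("Hard Call requires evaluation_verdict = indeterminate (§3.6)");
    }
    const recommended = e.route_recommendation?.recommended_outcome;
    if (recommended !== undefined && recommended !== "human_review_path") {
      errors.push(`Hard Call forbids recommended_outcome = ${recommended} (§3.6)`);
    }
  }
  return errors;
}
```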
### §3.7 Pattern C consumption semantics (V1.1)
Pattern C (ad-hoc Judge attachment, per coordination V3 §2.9) is the wiring where Judge attaches downstream of any standalone Evaluator output to produce per-criterion numeric scores without requiring an Experiment. V1.1 documents how the EvaluationResultEnvelope flows in this pattern.
**Producer in Pattern C:** standalone Evaluator activation (NOT inside an Experiment context).
**Output port:** the Evaluator emits its envelope on `evaluation_result_out` per Addenda B V3.3.1 §5.18. The port emits `EvaluationArtifactEnvelope<EvaluationResultEnvelope>` per coordination V3 §2.3 wrapping requirement.
**Consumer:** Judge module attached downstream via graph wiring. Judge's input port surface is specified by Addenda A R4.1 V3+ per OBL-XDOC-JUDGE-EVALUATOR-OUTPUT-IN-01; the port name is at Addenda A's discretion (typical candidates: `evaluation_result_in`, `evaluator_output_in`, `upstream_evaluation_in`).
**Hidden-dispatch prohibition.** Pattern C MUST use graph wiring. The Evaluator MUST NOT hidden-dispatch a Judge module. The Evaluator emits its envelope; whether and how a Judge consumes it is determined by graph wiring at edit time, not by the Evaluator's logic. This preserves DOC23 graph primacy and parallels the V3.3.1 §5.17.5 hidden-dispatch prohibition for the Claim Extractor.
**Judge's envelope output.** Judge produces its own EvaluationResultEnvelope with `producer_kind = "judge"`, populates `quantitative_slice`, and leaves `qualitative_slice` null. Judge does not emit prescriptive findings, except as noted in §2.3 for hybrid Judge modes.
**Chain linkage via `target_evaluation_chain_id`.** The upstream Evaluator's envelope populates `target_evaluation_chain_id` with a UUID identifying the evaluation chain. The downstream Judge's envelope (Pattern C) populates `target_evaluation_chain_id` with the SAME value, signaling that the two envelopes are part of one chain. This enables:
- **Audit reconstruction** — given the chain id, retrieve all envelopes in the chain (the Evaluator's qualitative_slice + Judge's quantitative_slice = complete evaluation result)
- **UI rendering** — DOC20 renders the chain as a single conceptual "evaluation" with two producer contributions
- **Learning correlation** — DOC8/BDSM and DOC72 Pattern primitive correlate Evaluator findings with Judge scores via the chain id
- **Replay determinism** — the chain reconstructs deterministically from envelope ids
Chain id propagation rules:
```
Standalone Evaluator (Pattern C eligible):
  evaluator_envelope.target_evaluation_chain_id = <new UUID>

Pattern C Judge attached downstream:
  judge_envelope.target_evaluation_chain_id = evaluator_envelope.target_evaluation_chain_id

Multi-hop chain (Evaluator → Judge → another Judge):
  All envelopes in the chain share the same chain id
```
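The propagation rules, together with the grouping step behind audit reconstruction, can be sketched as follows under a few stated assumptions. The UUID generator is injected (`newUuid` is a hypothetical parameter), so no particular library is implied. Ordering within a chain relies on `result_id` sorting lexicographically in creation order. ULIDs give that ordering across milliseconds (§3.4), but two ULIDs minted in the same millisecond are ordered only if the generator is monotonic.

```typescript
type ProducerKind = "judge" | "outcome_evaluator" | "agent_review_gate" | "human_review" | "deterministic_scorer";

// Hypothetical projection of EvaluationResultEnvelope (§3.1).
interface ChainedEnvelope {
  result_id: string; // "evr-{ulid}"
  producer_kind: ProducerKind;
  target_evaluation_chain_id?: string;
}

// Standalone Evaluator (Pattern C eligible): mint a fresh chain id.
function chainIdForStandaloneEvaluator(newUuid: () => string): string {
  return newUuid();
}

// Pattern C Judge, or any later hop: copy the upstream chain id unchanged.
function chainIdForDownstream(upstream: ChainedEnvelope): string {
  const chainId = upstream.target_evaluation_chain_id;
  if (chainId === undefined) {
    throw new Error(`upstream ${upstream.result_id} has no target_evaluation_chain_id`);
  }
  return chainId;
}

function compareResultIds(a: ChainedEnvelope, b: ChainedEnvelope): number {
  return a.result_id < b.result_id ? -1 : a.result_id > b.result_id ? 1 : 0;
}

// Audit reconstruction: group envelopes by chain id, oldest first within a chain.
function groupByChain(envelopes: ChainedEnvelope[]): Map<string, ChainedEnvelope[]> {
  const chains = new Map<string, ChainedEnvelope[]>();
  for (const e of envelopes) {
    const chainId = e.target_evaluation_chain_id;
    if (chainId === undefined) continue; // not part of any chain
    const members = chains.get(chainId) ?? [];
    members.push(e);
    chains.set(chainId, members);
  }
  chains.forEach((members) => members.sort(compareResultIds));
  return chains;
}
```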
**Pattern C does NOT use a separate envelope subtype.** The same EvaluationResultEnvelope serves all five producer kinds (§2.1); Pattern affiliation is recorded via `variant_lineage` (null for Pattern C; populated for Patterns A and B). The slice population differs by producer kind per the §2.3 slice-population matrix.
**Pattern C and `route_recommendation`.** When Judge runs in Pattern C and produces its quantitative envelope, the envelope's `route_recommendation` reflects Judge's quantitative-side recommendation (e.g., "pass_path" when scores meet threshold, "fail_path" when below threshold). This may differ from the upstream Evaluator envelope's `route_recommendation` (which reflected Evaluator's qualitative-side recommendation). Downstream consumers see two route recommendations in the chain; resolution is by consumer policy — typically the Judge's quantitative recommendation governs when Pattern C is wired, since Judge is the more recent producer in the chain.
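One possible consumer policy, matching the "typically" case above, lets the most recent producer in the chain that offers a recommendation govern. The sketch assumes `result_id` sorts in creation order, as in audit reconstruction. `resolveChainRoute` is a hypothetical name, and consumers remain free to adopt a different policy.

```typescript
type RecommendedOutcome = "pass_path" | "fail_path" | "human_review_path" | "retry_path";

// Hypothetical projection of one envelope in an evaluation chain.
interface RoutedEnvelope {
  result_id: string; // "evr-{ulid}"; assumed to sort in creation order
  producer_kind: string;
  route_recommendation?: { recommended_outcome: RecommendedOutcome; rationale_summary: string };
}

// "Latest recommender wins": walk the chain from newest to oldest and return the
// first recommendation found, or null if no envelope recommends a route.
function resolveChainRoute(chain: RoutedEnvelope[]): RecommendedOutcome | null {
  const newestFirst = [...chain].sort((a, b) =>
    a.result_id < b.result_id ? 1 : a.result_id > b.result_id ? -1 : 0,
  );
  for (const envelope of newestFirst) {
    if (envelope.route_recommendation) {
      return envelope.route_recommendation.recommended_outcome;
    }
  }
  return null;
}
```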
**Cross-doc references:**
- Addenda B V3.3.1 §5.18 — Evaluator's `evaluation_result_out` port contract (output side of Pattern C wiring)
- OBL-XDOC-JUDGE-EVALUATOR-OUTPUT-IN-01 — Addenda A's symmetric obligation for Judge's input port
- OBL-XDOC-OUTCOME-COMPLIANCE-01 — Judge's `outcome_compliance_scoring` method that powers Pattern C
- OBL-XDOC-DOC20-EVAL-UI-01 — DOC20 "Attach Judge" UI action for user-driven Pattern C wiring
- Coordination V3 §2.9 — Pattern C originating spec
- §8.1 VariantEvaluationLineage — null for Pattern C
---
## §4 Slice schemas
### §4.1 QuantitativeEvaluationSlice
Populated when scoring is the mechanism (Judge in any of its modes; deterministic scorers; some agent review gates).
```ts
QuantitativeEvaluationSlice {
  quality_index: QualityIndex            // V4 R187 / R212 aggregate metric
  per_dimension: DimensionScore[]        // V4 R204 per-dimension breakdown.
                                         // For outcome_compliance method,
                                         // each dimension corresponds to one
                                         // Criterion (dimension_id = criterion_id)
  scoring_method:
    | "rubric"
    | "checklist"
    | "pairwise"
    | "factual_verification"
    | "consistency"
    | "outcome_compliance"               // new per coordination V3 §2.4
  metric_semantics_version: string       // V4 R30: comparability anchor
  scorer_hash: string                    // V2 R58 / V4 R217: config fingerprint
}
```
For `outcome_compliance` method, each `DimensionScore` carries the `Criterion.criterion_id` as `dimension_id` and the `Criterion.criterion_semantics_hash` as a stability field (so deltas across runs correlate even when criterion_text rephrases).
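The following sketch shows why the stability hash matters for cross-run deltas. Keying per-dimension scores by `criterion_semantics_hash` rather than by `criterion_text` keeps a cosmetically reworded criterion matched to its earlier score. The `DimensionScore` shape here is hypothetical, since the real one is V4 R204. It assumes a numeric `score` and a field literally named `criterion_semantics_hash`.

```typescript
// Hypothetical DimensionScore shape; the real schema is V4 R204.
interface DimensionScore {
  dimension_id: string; // = Criterion.criterion_id for outcome_compliance
  criterion_semantics_hash: string; // stability field carried alongside
  score: number; // assumed numeric score
}

// Per-criterion score change between two runs, keyed by the semantics hash.
// Criteria with no match in the previous run are omitted.
function perCriterionDeltas(previous: DimensionScore[], current: DimensionScore[]): Map<string, number> {
  const previousByHash = new Map<string, number>();
  previous.forEach((d) => previousByHash.set(d.criterion_semantics_hash, d.score));
  const deltas = new Map<string, number>();
  current.forEach((d) => {
    const before = previousByHash.get(d.criterion_semantics_hash);
    if (before !== undefined) {
      deltas.set(d.criterion_semantics_hash, d.score - before);
    }
  });
  return deltas;
}
```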
### §4.2 QualitativeEvaluationSlice
Populated when prescriptive output is the mechanism (Outcome Evaluator; agent review gates; human review).
```ts
QualitativeEvaluationSlice {
  findings: EvaluationFinding[]          // Addenda B V3.1 §5.7; each finding may
                                         // carry target_criterion_id linking
                                         // to Criterion.criterion_id
  repair_instructions: OutcomeRepairInstruction[]
  source_needs: ResearchNeed[]
  affirmations: EvaluationAffirmation[] | null
  outcome_spec_ref: string               // which EvaluationOutcomeDefinition
}
```
The full `EvaluationFinding`, `OutcomeRepairInstruction`, `ResearchNeed`, and `EvaluationAffirmation` schemas live in Addenda B Core R0.7.1 (their owning addendum). This document references them by name.
### §4.3 ComparisonEvaluationSlice
Populated when the producer is part of a comparative evaluation (Pattern A per-variant, or Pattern B bundled).
```ts
ComparisonEvaluationSlice {
  target_variant_id: string | null
  is_comparative: boolean
  sibling_variant_ids: string[] | null
  comparative_recommendation:
    | "best_among_evaluated"
    | "passes_all_criteria"
    | "passes_with_minor_findings"
    | "acceptable_alternative"
    | "rejected_below_threshold"
    | "rejected_failed_criteria"
    | null
  // Numeric ranking lives in quantitative_slice.per_dimension when produced
  // by Judge; the comparison_slice is recommendation-level, not score-level.
}
```
This slice is recommendation-level. Numeric ranking lives in `quantitative_slice`. The comparison_slice exists so downstream consumers (the Switch module, the Loop Controller, Forum review) can route without re-deriving comparison semantics from raw scores.
### §4.4 AssuranceAndLimitationSlice
Populated when the producer wants to expose its confidence basis and any limitations encountered.
```ts
AssuranceAndLimitationSlice {
  assurance_basis: AssuranceBasis[]           // Addenda B V3.1 §0.4: 14 values
                                              // (trust reasons for the verdict)
  limitations: EvaluationLimitationKind[]     // Addenda B V3.1 §0.4: limitation reasons
  evidence_status:
    | "available_verified"
    | "available_unverified"
    | "missing_required"
    | "stale"
    | "blocked_by_policy"
}
```
`AssuranceBasis` and `EvaluationLimitationKind` enums are split per Addenda B V3.1 patch P2. Trust reasons (assurance basis) and limitation reasons (why something couldn't be verified) live in separate enums; they are not the same axis.
### §4.5 SafetyAndGovernanceSlice
Populated by every producer (governance is universal). Inherits the governance of the envelope's `EvaluationArtifactEnvelope` wrapper and adds evaluation-time policy state.
```ts
SafetyAndGovernanceSlice {
  taint_class_at_evaluation: TaintClass       // Addenda B V3.1 §0.4 / §15.10:
                                              // taint state when evaluation ran
  policy_decision_refs: PolicyEvaluationRef[] // EC Core PolicyDecision records
                                              // consulted during evaluation
  sanitization_required: boolean              // whether downstream consumers need
                                              // to sanitize before using results
  governance_class: AccessTier                // Addenda B V3.1 §0.4 / §16.2
  privileged: boolean                         // whether artifact is privileged
  matter_id?: string                          // matter scope (per V3.1 §13.4 firewall)
}
```
EC Core's compiled policy engine reads these fields at envelope persistence time to gate retention, promotion, and cross-matter access.
---
## §5 EvaluationLearningSignalEnvelope
### §5.1 Schema
Every learning signal — across Addenda A, Addenda B, and DOC8/BDSM — wraps inside a common envelope:
```ts
EvaluationLearningSignalEnvelope {
  signal_id: string
  signal_type: SignalType                     // §5.2
  task_id: string
  run_id: string
  evaluation_chain_id?: string

  // Source
  source_module_id: string
  source_activation_seq: number

  // Governance: gated by EC Core policy engine
  governance_policy_ref: string               // EC PolicyDecision reference
  source_policy_snapshot_ref?: StorageRef
  data_class: "public" | "internal" | "privileged" | "local_only"
  matter_id?: string
  pattern_promotion_eligible: boolean

  // Model context (per coordination V3 §2.10)
  model_class: "cheap_local" | "cheap_api" | "medium" | "expensive_frontier"
  model_fingerprint: string

  // Task design context (per coordination V3 §2.7, optional).
  // Null for point-in-time signals where task design isn't relevant
  // (TaintClearanceSignal, UserActionSignal).
  task_design_signature?: {
    graph_topology_hash: string               // hash of task graph at emit time
    upstream_module_types: string[]           // e.g., ["step.claim_extractor",
                                              //        "step.judge", "step.evaluator"]
    upstream_module_version_constraints?: Record<string, string>
                                              // e.g., {"step.judge": "^1.2"};
                                              // enables version-aware patterns
    segment_ids?: string[]                    // task segments present
    task_blueprint_ref?: string               // when task from saved Task Blueprint
                                              // (R0.6.4 §6); enables blueprint-level
                                              // correlation
  }

  // Timing
  emitted_at: ISO8601

  // Payload: typed signal-specific record
  payload_ref: StorageRef
  schema_version: 1
}
```
### §5.2 SignalType enum
The eight Phase 1 signal types:
```ts
SignalType =
  | "outcome_evaluation"        // Addenda B owns
  | "repair_cycle"              // Addenda B owns
  | "task_process_gap_runtime"  // Addenda B owns
  | "taint_clearance"           // Addenda B owns
  | "hard_call_resolution"      // Addenda B owns
  | "prompt_comparison"         // Addenda A owns
  | "task_design_correlation"   // DOC8/BDSM owns (aggregate)
  | "user_action"               // Addenda B R0.6.4 owns (existing)
```
The mapping from `signal_type` to payload schema and owning addendum is canonical:
| signal_type | Payload schema | Owning addendum | Section |
|---|---|---|---|
| `outcome_evaluation` | OutcomeEvaluationSignal | Addenda B Core R0.7.1 | (specified in Core R0.7.1) |
| `repair_cycle` | RepairCycleSignal | Addenda B Core R0.7.1 | (specified in Core R0.7.1) |
| `task_process_gap_runtime` | TaskProcessGapSignal | Addenda B Core R0.7.1 | (specified in Core R0.7.1) |
| `taint_clearance` | TaintClearanceSignal | Addenda B Core R0.7.1 | (specified in Core R0.7.1) |
| `hard_call_resolution` | HardCallResolutionSignal | Addenda B Core R0.7.1 | (specified in Core R0.7.1) |
| `prompt_comparison` | PromptComparisonSignal | Addenda A R4.1 V3 | (specified in Addenda A) |
| `task_design_correlation` | TaskDesignCorrelationSignal | DOC8/BDSM | (specified in DOC8) |
| `user_action` | UserActionSignal | Addenda B R0.6.4 | R0.6.4 §24A.7 |
### §5.3 Governance gating
EC Core's compiled policy engine gates signal persistence and promotion at the envelope layer:
- **`data_class`** — privileged signals do not auto-promote to durable learning without explicit policy gate
- **`matter_id`** — matter-scoped signals do not auto-cross matter boundaries (per Addenda B V3.1 §13.4 firewall)
- **`pattern_promotion_eligible`** — flag indicating whether the signal feeds pattern promotion (some signals are diagnostic-only and not for learning)
EC Core consumes the envelope fields directly; this document does not re-specify the policy mechanism. See EC Core Addendum A V3.3 §3 for the compiled policy engine.
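For orientation only, the three rules above can be read as a single predicate. EC Core's compiled policy engine owns the real decision. `mayAutoPromote` is a hypothetical name, as are its `targetMatterId` parameter (the matter the promotion would land in, `undefined` for a cross-matter or global store) and the `explicitPrivilegedGate` flag, which stands in for "explicit policy gate". The sketch applies no rule to `local_only`, since this section does not state one.

```typescript
type DataClass = "public" | "internal" | "privileged" | "local_only";

// The governance fields of EvaluationLearningSignalEnvelope (§5.1).
interface SignalGovernance {
  data_class: DataClass;
  matter_id?: string;
  pattern_promotion_eligible: boolean;
}

// Illustrative reading of §5.3; EC Core's compiled policy engine is authoritative.
// targetMatterId: matter the promotion would land in (undefined = cross-matter/global).
// explicitPrivilegedGate: stands in for an explicit policy gate on privileged data.
function mayAutoPromote(
  signal: SignalGovernance,
  targetMatterId: string | undefined,
  explicitPrivilegedGate: boolean,
): boolean {
  if (!signal.pattern_promotion_eligible) return false; // diagnostic-only signal
  if (signal.data_class === "privileged" && !explicitPrivilegedGate) return false;
  if (signal.matter_id !== undefined && signal.matter_id !== targetMatterId) return false;
  return true;
}
```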
### §5.4 Model context for cheap-LLM learning
Per coordination V3 §2.10, `model_class` enables the cheap-LLM learning generator mode. Signals tagged `model_class = "cheap_local"` or `"cheap_api"` that are produced in runs with `learning_mode = "signal_generation"` feed pattern learning, with `cross_model_applicability` defaulting to `requires_validation` (so cheap-model patterns don't auto-apply to production-model contexts).
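The defaulting rule is small enough to state as a function. This sketch assumes `learning_mode` is a plain string and returns `undefined` when the rule imposes no default. The other `cross_model_applicability` values are defined outside this document and are not modeled. The function name is hypothetical.

```typescript
type ModelClass = "cheap_local" | "cheap_api" | "medium" | "expensive_frontier";

// Returns the default imposed by §5.4, or undefined when the rule does not apply
// and cross_model_applicability is left to the owning addendum's defaults.
function cheapModelApplicabilityDefault(
  modelClass: ModelClass,
  learningMode: string,
): "requires_validation" | undefined {
  const cheap = modelClass === "cheap_local" || modelClass === "cheap_api";
  return cheap && learningMode === "signal_generation" ? "requires_validation" : undefined;
}
```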
### §5.5 Task design signature (optional)
The `task_design_signature` field captures the task graph context at signal emit time, enabling BDSM correlation analytics like "tasks without citation-verification segment fail criterion 3 at 4x rate" without requiring expensive after-the-fact graph reconstruction.
The field is OPTIONAL because some signal types don't have task-design relevance (taint clearance is point-in-time; user actions don't depend on graph topology). Signals that benefit from task-design correlation MUST populate it; signals that don't MAY omit it.
---
## §6 Criterion
### §6.1 Schema
`Criterion` is the public sub-contract on `EvaluationOutcomeDefinition.criteria[]`. Judge consumes it for `outcome_compliance_scoring`. The Evaluator scopes findings to criteria via `EvaluationFinding.target_criterion_id`.
```ts
Criterion {
  criterion_id: string                        // stable within the outcome
  criterion_text: string                      // natural language description
  criterion_semantics_hash: string            // stable across runs even when
                                              // criterion_text rephrases

  // Aggregation metadata
  required: boolean                           // must-have vs nice-to-have
  weight: number | null                       // 0.0-1.0; null → uniform within outcome
                                              // when default_weight_policy = "uniform"
  priority?: "must_have" | "should_have" | "nice_to_have"

  // Scoring metadata (consumed by Judge's outcome_compliance_scoring)
  rubric_hint?: string                        // optional pre-authored guidance
  scoring_basis:
    | "deterministic_count"                   // count-based (cite N items)
    | "source_verified"                       // verify against external source
    | "rubric_anchored_judgment"              // qualitative with anchors
    | "unanchored_llm_judgment"               // qualitative without anchors;
                                              // NOT aggregation-eligible by default

  // Evidence requirements (consumed by both Judge and Evaluator)
  required_claim_types?: ClaimType[]          // claim types needed
                                              // (from Addenda A 22-type registry)
  evidence_requirements?: string[]
  source_policy_refs?: StorageRef[]
  schema_version: 1
}
```
### §6.2 Joint ownership
Per coordination V3 §2.4, `Criterion` is owned jointly by Addenda A and Addenda B. Neither addendum may modify the schema unilaterally; changes require coordination. The architect adjudicates if the chats disagree.
### §6.3 Stability hash
`criterion_semantics_hash` is computed from a normalized representation of `criterion_text`:
- Lowercase
- Whitespace normalized (collapse runs to single space, strip leading/trailing)
- Punctuation stripped except inside quoted strings
- Content words only (stopwords removed)
- SHA-256 over the normalized representation
This makes the hash stable across cosmetic edits to `criterion_text` (rephrasing, formatting fixes, casing changes) while still detecting substantive changes. The hash anchors cross-run learning: Signal 3 per-criterion deltas correlate even when `criterion_text` evolves.
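The normalization pipeline above can be sketched using Node's `crypto.createHash` for SHA-256. Several details are assumptions, because the section leaves them open. The stopword list (`STOPWORDS`) is a hypothetical placeholder. Quoted strings are taken to be double-quoted spans, kept verbatim as single tokens. Punctuation removal is ASCII-only (`[^a-z0-9\s]`), so non-Latin letters would be stripped as well. Normalization absorbs differences in casing, whitespace, punctuation, and stopwords only. A rewording that swaps content words still yields a new hash.

```typescript
import { createHash } from "crypto";

// Hypothetical placeholder: §6.3 says "stopwords removed" but fixes no list.
const STOPWORDS = new Set(["a", "an", "the", "of", "to", "in", "on", "at", "by", "for", "with", "and", "is", "are", "be"]);

function normalizeCriterionText(text: string): string {
  // Lowercase, then collapse whitespace runs to a single space and trim.
  const collapsed = text.toLowerCase().replace(/\s+/g, " ").trim();
  const tokens: string[] = [];
  // A capturing split keeps double-quoted spans at odd indices.
  collapsed.split(/("[^"]*")/).forEach((part, index) => {
    if (index % 2 === 1) {
      tokens.push(part); // quoted span: punctuation and wording kept verbatim
      return;
    }
    // Outside quotes: strip ASCII punctuation, drop stopwords, keep content words.
    for (const word of part.replace(/[^a-z0-9\s]/g, "").split(" ")) {
      if (word !== "" && !STOPWORDS.has(word)) {
        tokens.push(word);
      }
    }
  });
  return tokens.join(" ");
}

function criterionSemanticsHash(criterionText: string): string {
  return createHash("sha256").update(normalizeCriterionText(criterionText), "utf8").digest("hex");
}
```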
### §6.4 Aggregation eligibility
`scoring_basis` governs aggregation eligibility in Judge's `outcome_compliance_scoring`:
- `deterministic_count`, `source_verified`, `rubric_anchored_judgment` → aggregation-eligible by default
- `unanchored_llm_judgment` → NOT aggregation-eligible by default
Judge's `OutcomeComplianceScoringConfig` (in Addenda A) can override `unanchored_llm_judgment` aggregation with an audit flag.
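The eligibility rule, together with its override, could be expressed as follows. The override shape is a hypothetical stand-in. The real fields live in Addenda A's `OutcomeComplianceScoringConfig`, and `allow_unanchored_aggregation` and `audit_flag` are illustrative names only.

```typescript
type ScoringBasis =
  | "deterministic_count"
  | "source_verified"
  | "rubric_anchored_judgment"
  | "unanchored_llm_judgment";

interface ScoredCriterion {
  criterion_id: string;
  scoring_basis: ScoringBasis;
}

// Hypothetical stand-in for the override in Addenda A's OutcomeComplianceScoringConfig.
interface UnanchoredAggregationOverride {
  allow_unanchored_aggregation: boolean;
  audit_flag: string;
}

interface Eligibility {
  eligible: boolean;
  audit_flag?: string; // set only when the override admitted an unanchored criterion
}

function aggregationEligibility(c: ScoredCriterion, override?: UnanchoredAggregationOverride): Eligibility {
  if (c.scoring_basis !== "unanchored_llm_judgment") {
    return { eligible: true }; // the three anchored bases are eligible by default
  }
  if (override !== undefined && override.allow_unanchored_aggregation) {
    return { eligible: true, audit_flag: override.audit_flag };
  }
  return { eligible: false };
}
```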
### §6.5 Evidence requirements
`required_claim_types` references Addenda A's 22-type `ExtractedEvaluationUnit` registry (factual_assertion, citation_reference, etc.). Criteria with non-empty `required_claim_types` trigger the Outcome Compiler's `claims_in` port wiring check (per Addenda B V3.3.1 §5.17.4).
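The following is a minimal sketch of the wiring precondition. It assumes the V3.3.1 §5.17.4 check reduces to "if any criterion lists required claim types, `claims_in` must be wired". `ClaimType` is modeled as a plain string, and `checkClaimsInWiring` is a hypothetical name. The authoritative check is the one in V3.3.1.

```typescript
// ClaimType is modeled as a string; the real registry is Addenda A's 22-type list.
interface ClaimAwareCriterion {
  criterion_id: string;
  required_claim_types?: string[];
}

// Returns one error per criterion that needs claims while claims_in is unwired.
function checkClaimsInWiring(criteria: ClaimAwareCriterion[], claimsInWired: boolean): string[] {
  if (claimsInWired) return [];
  return criteria
    .filter((c) => c.required_claim_types !== undefined && c.required_claim_types.length > 0)
    .map((c) => `criterion ${c.criterion_id} requires claim types but claims_in is not wired`);
}
```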
---
## §7 Anchoring primitives
### §7.1 ArtifactScopeRef
Used to reference a scope within an artifact (the full document, a section, a paragraph, a citation block, etc.). Consumed by:
- `EvaluationResultEnvelope.target_scope_ref` — what scope was evaluated
- Addenda A Claim Extractor — what scope an extracted unit references
- Addenda B Revisor `RevisionPlanStep.section_refs` — what scope an action operates on
- Pattern learning — what scope a learned pattern applies to
```ts
ArtifactScopeRef {
artifact_ref: StorageRef
artifact_version_ref: StorageRef
scope_kind:
| "document" // full document
| "section" // identified section
| "paragraph"
| "citation_block"
| "claim" // single claim span
| "page_range"
| "line_range"
| "field" // structured field
anchor: TextAnchor | StructuredAnchor | null
anchor_confidence: number // 0.0-1.0
schema_version: 1
}
```
### §7.2 TextAnchor
For positional anchoring in unstructured text:
```ts
TextAnchor {
start_offset: number // character offset from artifact start
end_offset: number
context_hash: string // hash of surrounding text (e.g.,
// 50 chars before + 50 after) for
// drift detection
schema_version: 1
}
```
The `context_hash` enables drift detection: if the surrounding text has changed since the anchor was captured, the anchor may have drifted and consumers should re-resolve via semantic search or surface the drift to the user.
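A minimal sketch of capturing a `TextAnchor` and checking it for drift, using Node's `node:crypto`. Assumptions, none fixed by this document: the 50-character windows from the example above, SHA-256 as the hash, offsets as JavaScript string indices (UTF-16 code units), and a NUL separator between the before and after windows. The anchored span itself is excluded from the hash, matching "hash of surrounding text".

```ts
import { createHash } from "node:crypto";

interface TextAnchor {
  start_offset: number;
  end_offset: number;
  context_hash: string;
  schema_version: 1;
}

const CONTEXT_WINDOW = 50; // from the "50 chars before + 50 after" example

// Hashes only the text around the span, not the span itself.
function contextHash(text: string, start: number, end: number): string {
  const before = text.slice(Math.max(0, start - CONTEXT_WINDOW), start);
  const after = text.slice(end, end + CONTEXT_WINDOW);
  return createHash("sha256").update(`${before}\u0000${after}`, "utf8").digest("hex");
}

export function captureTextAnchor(text: string, start: number, end: number): TextAnchor {
  if (start < 0 || end < start || end > text.length) throw new RangeError("offsets outside text");
  return { start_offset: start, end_offset: end, context_hash: contextHash(text, start, end), schema_version: 1 };
}

// True when the stored offsets no longer sit in the same surrounding text.
export function hasDrifted(anchor: TextAnchor, currentText: string): boolean {
  if (anchor.end_offset > currentText.length) return true;
  return contextHash(currentText, anchor.start_offset, anchor.end_offset) !== anchor.context_hash;
}
```

Because only the surrounding text is hashed, an edit inside the anchored span does not register as drift under this sketch; a consumer that also cares about span changes would compare the span text separately.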
### §7.3 StructuredAnchor
For positional anchoring in structured documents (legal briefs with section numbers, contracts with article references, etc.):
```ts
StructuredAnchor {
section_id?: string // e.g., "III.B.4"
field_path?: string // e.g., "header.title"
citation_ref?: string // e.g., "footnote_12"
schema_version: 1
}
```
Implementations populate whichever subset applies. Structured anchors are more durable than text anchors when the document's structural identifiers are stable, less durable when section numbers change between drafts.
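`ArtifactScopeRef.anchor` is typed `TextAnchor | StructuredAnchor | null` with no discriminant field, so consumers need a narrowing step before reading positional fields. A minimal sketch, assuming the presence of `start_offset` identifies a `TextAnchor`; `describeAnchor` is a hypothetical helper, not part of the contract.

```ts
interface TextAnchor {
  start_offset: number;
  end_offset: number;
  context_hash: string;
  schema_version: 1;
}

interface StructuredAnchor {
  section_id?: string;
  field_path?: string;
  citation_ref?: string;
  schema_version: 1;
}

// The union has no discriminant, so narrow on a property only TextAnchor carries.
function isTextAnchor(anchor: TextAnchor | StructuredAnchor): anchor is TextAnchor {
  return "start_offset" in anchor;
}

// Hypothetical helper that renders an anchor for logs or review UIs.
export function describeAnchor(anchor: TextAnchor | StructuredAnchor | null): string {
  if (anchor === null) return "no anchor";
  if (isTextAnchor(anchor)) return `chars ${anchor.start_offset}-${anchor.end_offset}`;
  // Structured anchors populate whichever subset of identifiers applies.
  const parts: string[] = [];
  if (anchor.section_id) parts.push(`section ${anchor.section_id}`);
  if (anchor.field_path) parts.push(`field ${anchor.field_path}`);
  if (anchor.citation_ref) parts.push(`citation ${anchor.citation_ref}`);
  return parts.length > 0 ? parts.join(", ") : "empty structured anchor";
}
```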
### §7.4 Shared by extractors
Per coordination V3 §2.12 and PropA R6.3+ coordination, the anchoring primitives in this section are shared between:
- Addenda A's `step.claim_extractor` (extracts evaluation units anchored via TextAnchor/StructuredAnchor)
- PropA's `P0_master_extraction` (extracts knowledge graph candidates anchored similarly)
- Any future extraction module
The two extraction systems remain separate (different consumers, different lifecycles per coordination V3 §2.12), but they share these anchoring primitives. Shared infrastructure also includes source-span resolution and extraction cache keying, owned at this level.
---
## §8 Lineage primitives
### §8.1 VariantEvaluationLineage
Carried by `EvaluationResultEnvelope.variant_lineage` when the evaluation was on an Experiment variant:
```ts
VariantEvaluationLineage {
experiment_run_id: string
comparison_group_id: string // V2 R58 / V4 R217
variant_id: string
is_baseline: boolean // baseline variant of the experiment
sibling_variant_ids: string[]
schema_version: 1
}
```
Per coordination V3 §2.9 Pattern A: each per-variant Evaluator activation populates this field with its own variant_id and the experiment's comparison_group_id.
Per Pattern B (bundled comparative): the comparison-aware Evaluator emits an `EvaluationResultSet` containing multiple `EvaluationResultEnvelope`s, each with its own `variant_lineage`.
Per Pattern C (ad-hoc Judge attachment): `variant_lineage` is null. Pattern C runs without an Experiment context.
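A sketch of how a producer might populate `variant_lineage` under Patterns A and B. `ExperimentRunInfo` is a hypothetical input shape, and excluding the variant itself from `sibling_variant_ids` is an assumption the schema does not state. Under Pattern C the producer would simply set `variant_lineage` to null.

```ts
interface VariantEvaluationLineage {
  experiment_run_id: string;
  comparison_group_id: string;
  variant_id: string;
  is_baseline: boolean;
  sibling_variant_ids: string[];
  schema_version: 1;
}

// Hypothetical description of one Experiment run; not a schema from this document.
interface ExperimentRunInfo {
  experiment_run_id: string;
  comparison_group_id: string;
  variant_ids: string[];
  baseline_variant_id: string;
}

// Pattern A: each per-variant Evaluator activation builds its own lineage.
export function lineageForVariant(run: ExperimentRunInfo, variantId: string): VariantEvaluationLineage {
  if (!run.variant_ids.includes(variantId)) throw new Error(`unknown variant: ${variantId}`);
  return {
    experiment_run_id: run.experiment_run_id,
    comparison_group_id: run.comparison_group_id,
    variant_id: variantId,
    is_baseline: variantId === run.baseline_variant_id,
    // Assumption: siblings are the other variants, excluding this one.
    sibling_variant_ids: run.variant_ids.filter((v) => v !== variantId),
    schema_version: 1,
  };
}

// Pattern B: one lineage per variant, each attached to its own envelope in the result set.
export function lineagesForResultSet(run: ExperimentRunInfo): VariantEvaluationLineage[] {
  return run.variant_ids.map((v) => lineageForVariant(run, v));
}
```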
### §8.2 CriterionLineage
Carried by `EvaluationResultEnvelope.criterion_lineage[]` when the evaluation evaluated specific criteria:
```ts
CriterionLineage {
criterion_id: string
criterion_semantics_hash: string // matches Criterion.criterion_semantics_hash
metric_semantics_version?: string // present when Judge scored this criterion
scoring_basis: Criterion["scoring_basis"] // copied from Criterion
schema_version: 1
}
```
This is per-criterion lineage that consumers can correlate across runs without traversing the full `EvaluationOutcomeDefinition`. Pattern primitives use this to compute per-criterion deltas across re-evaluations.
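A sketch of the per-criterion delta computation. `CriterionLineage` carries no score, so `ScoredCriterion` is a hypothetical pairing of a lineage entry with a numeric score taken from the envelope's slices. Keying on `criterion_semantics_hash` lets deltas survive cosmetic edits to `criterion_text`; skipping pairs whose `metric_semantics_version` differs is an assumption in the spirit of §9.2, which treats scores without matching semantics anchors as not comparable.

```ts
interface CriterionLineage {
  criterion_id: string;
  criterion_semantics_hash: string;
  metric_semantics_version?: string;
  scoring_basis: string;
  schema_version: 1;
}

// Hypothetical pairing: the score comes from the envelope's quantitative slice.
interface ScoredCriterion {
  lineage: CriterionLineage;
  score: number;
}

// Keys on criterion_semantics_hash so deltas survive cosmetic edits to criterion_text.
export function perCriterionDeltas(previous: ScoredCriterion[], current: ScoredCriterion[]): Map<string, number> {
  const prior = new Map(previous.map((s) => [s.lineage.criterion_semantics_hash, s] as const));
  const deltas = new Map<string, number>();
  for (const cur of current) {
    const prev = prior.get(cur.lineage.criterion_semantics_hash);
    // Skip criteria that are new, and pairs scored under different metric semantics.
    if (prev === undefined) continue;
    if (prev.lineage.metric_semantics_version !== cur.lineage.metric_semantics_version) continue;
    deltas.set(cur.lineage.criterion_semantics_hash, cur.score - prev.score);
  }
  return deltas;
}
```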
---
## §9 Validation rules
### §9.1 Envelope-level validations
Conforming implementations enforce:
- **`evaluation_snapshot_ref` non-empty.** Producers MUST emit a snapshot before emitting the envelope. Code: `validation.envelope_missing_snapshot_ref`.
- **`evaluation_verdict` consistent with `result_lifecycle_status`.** `complete` status with `not_applicable` verdict is OK; `error_no_result` status with `passed` verdict is invalid. Code: `validation.envelope_inconsistent_verdict_lifecycle`.
- **`producer_kind` matches owning addendum.** A producer registered as `step.judge` MUST emit `producer_kind = "judge"`; cross-kind emission fires `validation.envelope_producer_kind_mismatch`.
- **Slices populated per ownership matrix (§2.3).** If `producer_kind = "judge"` and `qualitative_slice` is non-null, fire `validation.envelope_judge_emitted_qualitative_slice` (warning; not blocking — agent review gates may legitimately emit both).
- **Wrapped in `EvaluationArtifactEnvelope`.** Bare `EvaluationResultEnvelope` emission is invalid. Code: `validation.envelope_not_wrapped`.
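A sketch of the §9.1 checks over a minimal, hypothetical view of the envelope. Assumptions: `wrapped_in_artifact_envelope` stands in for however a runtime knows the envelope was wrapped; the verdict/lifecycle matrix lists only the one invalid pairing named above, so a real implementation would need the full matrix; and every rule other than the Judge qualitative-slice warning is treated as a blocking error.

```ts
interface Violation {
  code: string;
  severity: "error" | "warning";
}

// Minimal hypothetical view of an envelope: only the fields these rules read.
interface EnvelopeView {
  evaluation_snapshot_ref: string | null;
  evaluation_verdict: string;
  result_lifecycle_status: string;
  producer_kind: string;
  qualitative_slice: object | null;
  wrapped_in_artifact_envelope: boolean; // assumption: set by the emitting runtime
}

// Placeholder matrix: only the pairing named in §9.1 is listed here.
const FORBIDDEN_VERDICTS: Record<string, string[]> = {
  error_no_result: ["passed"],
};

// expectedProducerKind is derived from the producer's registration,
// e.g. "judge" for a producer registered as step.judge.
export function validateEnvelope(env: EnvelopeView, expectedProducerKind: string): Violation[] {
  const out: Violation[] = [];
  if (!env.evaluation_snapshot_ref) {
    out.push({ code: "validation.envelope_missing_snapshot_ref", severity: "error" });
  }
  const forbidden = FORBIDDEN_VERDICTS[env.result_lifecycle_status] ?? [];
  if (forbidden.includes(env.evaluation_verdict)) {
    out.push({ code: "validation.envelope_inconsistent_verdict_lifecycle", severity: "error" });
  }
  if (env.producer_kind !== expectedProducerKind) {
    out.push({ code: "validation.envelope_producer_kind_mismatch", severity: "error" });
  }
  if (env.producer_kind === "judge" && env.qualitative_slice !== null) {
    // Warning only: agent review gates may legitimately emit both slices.
    out.push({ code: "validation.envelope_judge_emitted_qualitative_slice", severity: "warning" });
  }
  if (!env.wrapped_in_artifact_envelope) {
    out.push({ code: "validation.envelope_not_wrapped", severity: "error" });
  }
  return out;
}
```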
### §9.2 Slice-level validations
- **QuantitativeSlice present requires `scorer_hash` and `metric_semantics_version`.** Without these, score deltas across runs are not comparable. Code: `validation.quantitative_slice_missing_semantics_anchors`.
- **QualitativeSlice findings reference valid Criterion ids when `target_criterion_id` is set.** A finding pointing to a non-existent criterion fires `validation.finding_unknown_criterion_id`.
- **ComparisonSlice `target_variant_id` requires non-null `variant_lineage`.** Code: `validation.comparison_slice_missing_variant_lineage`.
- **AssuranceSlice `evidence_status = "missing_required"` requires non-empty `limitations`.** A missing-evidence verdict without an explanation is incomplete. Code: `validation.assurance_slice_missing_limitation_explanation`.
- **SafetyAndGovernanceSlice `privileged = true` requires `matter_id` populated.** Privileged signals without matter scope fire `validation.safety_slice_privileged_without_matter`.
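The slice rules reduce to field checks. A sketch over a hypothetical flattened view: the envelope field names `quantitative_slice`, `comparison_slice`, `assurance_slice`, and `safety_slice` are assumptions, while the nested field names come from the rules above.

```ts
// Hypothetical flattened view of the slices an envelope may carry.
interface SliceView {
  quantitative_slice: { scorer_hash?: string; metric_semantics_version?: string } | null;
  qualitative_slice: { findings: { target_criterion_id?: string }[] } | null;
  comparison_slice: { target_variant_id?: string } | null;
  assurance_slice: { evidence_status: string; limitations: string[] } | null;
  safety_slice: { privileged: boolean; matter_id?: string } | null;
  variant_lineage: object | null;
}

export function validateSlices(v: SliceView, knownCriterionIds: Set<string>): string[] {
  const codes: string[] = [];
  const quant = v.quantitative_slice;
  if (quant && (!quant.scorer_hash || !quant.metric_semantics_version)) {
    codes.push("validation.quantitative_slice_missing_semantics_anchors");
  }
  for (const finding of v.qualitative_slice?.findings ?? []) {
    const id = finding.target_criterion_id;
    if (id !== undefined && !knownCriterionIds.has(id)) codes.push("validation.finding_unknown_criterion_id");
  }
  const comparison = v.comparison_slice;
  if (comparison?.target_variant_id && v.variant_lineage === null) {
    codes.push("validation.comparison_slice_missing_variant_lineage");
  }
  const assurance = v.assurance_slice;
  if (assurance && assurance.evidence_status === "missing_required" && assurance.limitations.length === 0) {
    codes.push("validation.assurance_slice_missing_limitation_explanation");
  }
  const safety = v.safety_slice;
  if (safety && safety.privileged && !safety.matter_id) {
    codes.push("validation.safety_slice_privileged_without_matter");
  }
  return codes;
}
```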
### §9.3 Signal envelope validations
- **`signal_type` matches payload schema.** A `signal_type = "repair_cycle"` envelope wrapping a `PromptComparisonSignal` payload fires `validation.signal_type_payload_mismatch`.
- **`data_class = "privileged"` requires `matter_id` populated.** Same rule as the §9.2 `SafetyAndGovernanceSlice` privileged check, applied at the signal layer.
- **`pattern_promotion_eligible = true` requires `governance_policy_ref` resolvable.** EC Core must have a policy decision; missing reference fires `validation.signal_promotion_eligible_without_policy`.
- **`model_class` populated.** Every signal carries model context. Code: `validation.signal_missing_model_class`.
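A sketch of three of the §9.3 checks. `payload_schema` is a hypothetical tag naming the wrapped payload; the signal-type-to-payload map and the policy resolver are injected because their sources (the SignalType registry and EC Core) are outside this sketch, and signal types missing from the map are not checked. The `privileged`/`matter_id` rule is omitted because this section names no validation code for it.

```ts
// Minimal hypothetical view of a signal envelope.
interface SignalEnvelopeView {
  signal_type: string;
  payload_schema: string; // hypothetical tag, e.g. "PromptComparisonSignal"
  pattern_promotion_eligible: boolean;
  governance_policy_ref?: string;
  model_class?: string;
}

export function validateSignalEnvelope(
  s: SignalEnvelopeView,
  expectedPayloadByType: Record<string, string>, // from the SignalType registry
  policyResolves: (ref: string) => boolean, // e.g. a lookup against EC Core
): string[] {
  const codes: string[] = [];
  const expected = expectedPayloadByType[s.signal_type];
  if (expected !== undefined && expected !== s.payload_schema) {
    codes.push("validation.signal_type_payload_mismatch");
  }
  if (s.pattern_promotion_eligible && (!s.governance_policy_ref || !policyResolves(s.governance_policy_ref))) {
    codes.push("validation.signal_promotion_eligible_without_policy");
  }
  if (!s.model_class) codes.push("validation.signal_missing_model_class");
  return codes;
}
```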
### §9.4 Criterion validations
- **`criterion_semantics_hash` matches normalized `criterion_text`.** Tampered or stale hashes fire `validation.criterion_semantics_hash_mismatch`.
- **`scoring_basis = "unanchored_llm_judgment"` AND `required = true` requires explicit user acknowledgment.** Unanchored subjective criteria that are also required can produce indeterminate verdicts without a clear path forward. Code: `validation.criterion_unanchored_judgment_required_without_ack` (warning).
- **`weight` outside [0.0, 1.0] is invalid.** Code: `validation.criterion_weight_out_of_range`.
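A sketch of the §9.4 checks. The hash function is injected so the check reuses whatever §6.3 implementation produced the stored hash; `unanchored_required_ack` is a hypothetical stand-in for the acknowledgment record, and treating `weight` as optional is an assumption.

```ts
// Minimal hypothetical view of a Criterion: only the fields §9.4 reads.
interface CriterionView {
  criterion_text: string;
  criterion_semantics_hash: string;
  scoring_basis: string;
  required: boolean;
  weight?: number;
  unanchored_required_ack?: boolean; // hypothetical acknowledgment flag
}

interface CriterionViolation {
  code: string;
  severity: "error" | "warning";
}

// computeHash must implement the §6.3 normalization plus SHA-256.
export function validateCriterion(c: CriterionView, computeHash: (text: string) => string): CriterionViolation[] {
  const out: CriterionViolation[] = [];
  if (computeHash(c.criterion_text) !== c.criterion_semantics_hash) {
    out.push({ code: "validation.criterion_semantics_hash_mismatch", severity: "error" });
  }
  if (c.scoring_basis === "unanchored_llm_judgment" && c.required && !c.unanchored_required_ack) {
    out.push({ code: "validation.criterion_unanchored_judgment_required_without_ack", severity: "warning" });
  }
  // Written as a negated range test so NaN is also rejected.
  if (c.weight !== undefined && !(c.weight >= 0 && c.weight <= 1)) {
    out.push({ code: "validation.criterion_weight_out_of_range", severity: "error" });
  }
  return out;
}
```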
### §9.5 Anchor validations
- **`anchor_confidence` outside [0.0, 1.0] is invalid.** Code: `validation.anchor_confidence_out_of_range`.
- **`ArtifactScopeRef.anchor` null is allowed only when `scope_kind = "document"`.** Sub-document scopes require an anchor. Code: `validation.scope_ref_sub_document_without_anchor`.
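The §9.5 rules as a sketch over a minimal, hypothetical view of `ArtifactScopeRef`; the full anchor types are collapsed to `object` here.

```ts
type ScopeKind =
  | "document" | "section" | "paragraph" | "citation_block"
  | "claim" | "page_range" | "line_range" | "field";

// Minimal view of ArtifactScopeRef: only the fields §9.5 reads.
interface ScopeRefView {
  scope_kind: ScopeKind;
  anchor: object | null; // TextAnchor | StructuredAnchor | null in the full schema
  anchor_confidence: number;
}

export function validateScopeRef(ref: ScopeRefView): string[] {
  const codes: string[] = [];
  // Negated range test so NaN is rejected as well.
  if (!(ref.anchor_confidence >= 0 && ref.anchor_confidence <= 1)) {
    codes.push("validation.anchor_confidence_out_of_range");
  }
  if (ref.anchor === null && ref.scope_kind !== "document") {
    codes.push("validation.scope_ref_sub_document_without_anchor");
  }
  return codes;
}
```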
---
## §10 Versioning and evolution rules
### §10.1 Schema bump categories
Each schema in this document carries `schema_version`. Bumps are classified:
- **MAJOR** (breaking): existing consumers must update before processing new envelopes. Examples: removing a required field; changing field type; reordering enum values such that wire serialization breaks.
- **MINOR** (additive): existing consumers continue working with old envelopes. Examples: adding optional field; adding new enum value with explicit "default unrecognized" semantics; adding new optional slice.
- **PATCH** (clarification): no schema change; documentation or validation rule clarification. Consumers may safely ignore.
### §10.2 Joint-ownership coordination
Schemas marked as joint Addenda A + Addenda B ownership in §0.2 require coordination for any version bump:
| Schema | Joint owners | Coordination required for |
|---|---|---|
| EvaluationResultEnvelope | Addenda A + Addenda B | any bump |
| Slice schemas (5) | Addenda A + Addenda B | any bump |
| EvaluationLearningSignalEnvelope | Addenda A + Addenda B + DOC8 (consumer) | any bump |
| Criterion | Addenda A + Addenda B | any bump |
| ArtifactScopeRef, TextAnchor, StructuredAnchor | Addenda A + Addenda B + PropA | any bump |
| VariantEvaluationLineage, CriterionLineage | Addenda A + Addenda B | any bump |
Coordination protocol:
1. Either chat proposes a bump via amendment to this document.
2. The other chat reviews and accepts, modifies, or contests.
3. If contested, architect (Will) adjudicates.
4. On agreement, this document version bumps (§10.4) and the schemas update.
5. Downstream OP-A rows are updated to reflect the new schema version.
### §10.3 Cross-doc notification
Bumps that affect schemas consumed by PropA, DOC72, DOC8, EC Core, or DOC20 require the corresponding OP-A row to be updated and an `[XDOC-INSERT:target_doc]` block to be added to this document specifying the change. Will's cross-doc obligations pass picks these up.
### §10.4 Document version
This document carries its own `Version: X.Y.Z`. The version tracks the document, not individual schemas; each schema's `schema_version` field remains authoritative for its specific contract.
Bumps to this document:
- MAJOR — schema breaking change to any contract
- MINOR — schema additive change to any contract, or new contract added
- PATCH — documentation clarification, validation rule clarification, migration guide update
Current: V1.1.1 (reference/topology cleanup patch; no schema changes from V1.1 or V1).
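The bump rules above can be read as "the highest-impact change wins". A sketch under that reading, with hypothetical change-category names and missing version components treated as zero. Under it, the V1.1.1 reference cleanup is a documentation-only change and maps to a PATCH bump from 1.1 to 1.1.1.

```ts
// Hypothetical change categories mirroring the §10.4 bump list.
type DocChange =
  | "schema_breaking"
  | "schema_additive"
  | "new_contract"
  | "documentation_clarification"
  | "validation_rule_clarification"
  | "migration_guide_update";

type Bump = "major" | "minor" | "patch";

// The highest-impact change in a revision decides the document bump.
export function documentBump(changes: DocChange[]): Bump {
  if (changes.length === 0) throw new Error("no changes, so no bump");
  if (changes.includes("schema_breaking")) return "major";
  if (changes.includes("schema_additive") || changes.includes("new_contract")) return "minor";
  return "patch";
}

// Missing components are treated as 0, so "1.1" is read as 1.1.0.
export function applyBump(version: string, bump: Bump): string {
  const [major = 0, minor = 0, patch = 0] = version.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```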
---
## §11 Migration guide for DOC23 R3.2 absorption
When DOC23 R3.2 compilation pass runs, this document retires. The migration:
### §11.1 Schemas migrating
All twelve schemas in this document migrate to DOC23 R3.2 as a new top-level section "Evaluation Common Contracts" (anticipated section number TBD based on R3.2's organization):
1. ProducerKind enum
2. EvaluationResultEnvelope
3. QuantitativeEvaluationSlice
4. QualitativeEvaluationSlice
5. ComparisonEvaluationSlice
6. AssuranceAndLimitationSlice
7. SafetyAndGovernanceSlice
8. EvaluationLearningSignalEnvelope
9. SignalType enum
10. Criterion
11. ArtifactScopeRef + TextAnchor + StructuredAnchor
12. VariantEvaluationLineage + CriterionLineage
### §11.2 References update
After absorption:
- Addenda A R4.1 V3 references to "DOC23 Evaluation Common Contracts §X.Y" update to "DOC23 R3.2 §X.Y"
- Addenda B Core R0.7.1 same update
- Addenda B V3.3.1 (and any further surgical patches) same update
- PropA references to "DOC23 Evaluation Common Contracts §7" update to "DOC23 R3.2 §[scope-primitives]"
- DOC8 references to "DOC23 Evaluation Common Contracts §5" update to "DOC23 R3.2 §[signal-envelope]"
The architect's cross-doc-obligations pass through the spec library applies these updates.
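The reference updates above are mechanical once R3.2 assigns section numbers. A hypothetical sketch of that step: old section numbers map to R3.2 locations supplied by the caller (for example `"7"` to the scope-primitives section), and references without a mapping are left untouched and reported for manual review rather than guessed.

```ts
// Old section number in this document -> R3.2 section. Values are supplied by
// the caller; R3.2 section numbers are not yet assigned.
type SectionMap = Record<string, string>;

interface RewriteResult {
  text: string;
  unmapped: string[]; // old section numbers with no mapping, left untouched
}

const OLD_REF = /DOC23 Evaluation Common Contracts §(\d+(?:\.\d+)*)/g;

export function rewriteReferences(text: string, sections: SectionMap): RewriteResult {
  const unmapped: string[] = [];
  const rewritten = text.replace(OLD_REF, (match: string, section: string) => {
    const target = sections[section];
    if (target === undefined) {
      unmapped.push(section);
      return match; // keep the original reference for manual review
    }
    return `DOC23 R3.2 §${target}`;
  });
  return { text: rewritten, unmapped };
}
```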
### §11.3 OP-A row reconciliation
OP-A rows referencing this document update their `Owner` field from "DOC23 Evaluation Common Contracts" to "DOC23 R3.2". Specifically:
- OBL-XDOC-EVAL-ENV-01 — Owner shifts to DOC23 R3.2
- OBL-XDOC-SCOPE-PRIMITIVES-01 — Owner shifts to DOC23 R3.2
- All `specified_in_owner` status entries referencing this document transition to `specified_in_R3_2`
### §11.4 Retirement protocol
When R3.2 absorption is complete:
1. This document's status field changes to "RETIRED — content absorbed into DOC23 R3.2 §[X.Y]"
2. The schemas in §3 through §8 are replaced with single-line "See DOC23 R3.2 §[X.Y]" references
3. The document remains in the spec library as historical record but is no longer the authoritative source
4. New references to the contracts MUST point to DOC23 R3.2
### §11.5 Backward compatibility during migration
Implementations using the schemas during migration:
- Active implementations consuming the schemas continue working — the schemas migrate by reference, not by value
- Schema version numbers DO NOT bump on migration (the schemas are identical; only their hosting location changes)
- Validation codes referenced in §9 retain their codes; they remain enforced by R3.2 conformance gates
### §11.6 Timing trigger
Migration runs when DOC23 R3.2 compilation pass opens. At that point:
- R3.2 chat lifts the schemas into the parent doc structure
- This document's retirement is recorded in Will's cross-doc-obligations pass
- Both Addenda A and Addenda B chats receive notice via OP-A row status changes
---
## §12 Cross-doc obligations summary
The OP-A rows that touch this document, with current status. Full row text lives in the source coordination response (`DOC23_ADDB_RESPONSE_TO_ADDA_R4_1_V3_COORDINATION_PROPOSAL_V1.md` §9.1) and the Addenda A R4.1 V3 / V4.1 Coordination Patch.
| OP-A row | Relevance to this document | Status |
|---|---|---|
| OBL-XDOC-EVAL-ENV-01 | This document owns the schema (§3) | specified_in_owner |
| OBL-XDOC-SCOPE-PRIMITIVES-01 | This document owns the schemas (§7) | specified_in_owner |
| OBL-XDOC-MODULES-REGISTRY-01 | This document references the module types; DOC23 R3.2 will register them at parent level | pending_R3_2_compile |
| OBL-XDOC-OUTCOME-COMPLIANCE-01 | This document hosts Criterion (§6) which Judge consumes | in_review (Addenda A R4.1 V3 pending) |
| OBL-XDOC-PROMPT-COMPARISON-SIGNAL-01 | Signal wraps in envelope from this document | in_review |
| OBL-XDOC-CLAIM-EXTRACTOR-PUBLIC-01 | Claim Extractor uses anchoring primitives from this document | in_review |
| OBL-XDOC-EVALUATOR-CLAIMS-IN-01 | Evaluator consumes ClaimSetBundle via port specified in Addenda B V3.3.1 | specified_in_owner |
| OBL-XDOC-EVAL-SIGNAL-OWNERSHIP-01 | All signals wrap in envelope from this document | specified_in_owner |
| OBL-XDOC-LEARNING-MODE-01 | model_class field in envelope tracks learning mode | specified_in_owner |
| OBL-XDOC-MODEL-CLASS-AXIS-01 | model_class enum from this document used in DOC72 patterns | pending_DOC72_update |
| OBL-XDOC-BDSM-CONSUME-SIGNALS-01 | BDSM consumes unified signal stream wrapped by this envelope | pending_DOC8_update |
| OBL-XDOC-EC-POLICY-SIGNALS-01 | EC Core gates envelope governance fields | pending_EC_Core_update |
| OBL-XDOC-PROPA-DSPY-TARGETS-01 | PropA shares anchoring primitives | pending_PropA_update |
| OBL-XDOC-DOC20-EVAL-UI-01 | DOC20 renders envelope, slices, criteria, signals | pending_DOC20_update |
### §12.1 [XDOC-INSERT:...] blocks
This document does not contain `[XDOC-INSERT:...]` blocks directly. The original inserts in Addenda B response §9.2 specify what each target doc (DOC23 R3.2, PropA, DOC72, DOC8/BDSM, EC Core, DOC20) absorbs. Will's coding agent applies them during the cross-doc obligations pass per the Addenda A R4.1 V3 / V4.1 Coordination Patch and Addenda B Core R0.7.1 deliverables.
### §12.2 Coding-agent instructions
When a coding agent implements this document or any consuming addendum, it:
1. Treats the schemas in §3 through §8 as canonical until R3.2 absorbs them
2. Resolves cross-references like "see Addenda B Core R0.7.1 §X.Y" to the current draft of that document
3. Enforces the validation rules in §9 in any conforming implementation
4. Watches for retirement notification (§11.4) and updates references when DOC23 R3.2 absorbs the contracts
---
# END OF DOC23 EVALUATION COMMON CONTRACTS V1.1.1
**Document:** DOC23 Evaluation Common Contracts V1.1.1
**Status:** Active — authoritative source for shared evaluation primitives until DOC23 R3.2 absorbs them
**Date:** 2026-05-17 (V1); 2026-05-17 (V1.1 Pattern C consumption semantics); 2026-05-24 (V1.1.1 reference/topology cleanup)
**Total sections:** 12
**Total schemas:** 12 as grouped in §11.1 (ProducerKind, EvaluationResultEnvelope, 5 slices, EvaluationLearningSignalEnvelope, SignalType, Criterion, ArtifactScopeRef + TextAnchor + StructuredAnchor, VariantEvaluationLineage + CriterionLineage); no schema changes from V1. V1.1 adds §3.7 documenting Pattern C envelope consumption semantics; V1.1.1 is a reference/topology cleanup only.
**Total OP-A rows referenced:** 15 (V1's 14 + V1.1's OBL-XDOC-JUDGE-EVALUATOR-OUTPUT-IN-01)
**Self-contained:** Yes; can be reviewed by either chat or by Will without consulting the other addenda
**Retirement trigger:** DOC23 R3.2 compilation pass — see §11
V1 and V1.1 may be retired once V1.1.1 is reviewed.
---