ELNOR REPO READER TEXT MIRROR
Original path: Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/Archived DOC23 Red Teaming/DOC23 Addenda B RT ADJUDICATION CARD STAGED.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z
---
````
# DOC23 Addenda B — Red Team Adjudication Card (Staged Application Build)

**Status:** Draft, pending architect review + multi-LLM red-team. **Layers 0–5 + R0.4 audit additions; all rows rendered in full (Appendix R)** — front matter + Step 0 + Layer 1 (seven shared primitives) + Layer 2 (four flips, anchor-confirmed) + Layer 3 core (criticals/collisions/re-anchors/re-grades) + Layer 3 C-cluster math + standard rows. **Remaining:** Layer 4 (§G as read-models + DOC20 OP-A), Layer 5 (fixtures), the new rows, then final pass. This is the assembled staged card; it supersedes `DOC23_ADDENDA_B_RT_ADJUDICATION_CARD_CONSOLIDATED.md` for application order (the consolidated card stays as the row-detail source).

**What this is.** The application-ready restructure of the 126-item, four-reviewer consolidated card, reorganized per the three reviews **of that card** (Claude + ChatGPT + Gemini). It supersedes the consolidated card for application order and for every item whose disposition or fix changed; unchanged CONFIRMED rows are carried by reference.

**Why staged (the central finding of all three card-reviews).** Do **not** apply the card as 126 flat rows. Apply a structural **Layer 1 first, defined once**, then the disposition flips, then the critical row fixes, then §G, then fixtures. Defining shared primitives once removes the per-row redefinition that inflated the consolidated card and caused cross-row collisions.

```
Step 0   Strip / type-owner pass on the fix package itself (run BEFORE applying anything)
Layer 1  Shared structural primitives, defined once
Layer 2  Disposition flips (reverse/redirect prior rows) — anchor-confirmed
Layer 3  Critical row fixes + cross-row collisions + re-grades + C-cluster math + standard rows
Layer 4  §G Professional-Reliance / UX layer, as derived read-models (no new truth store)
Layer 5  Fixtures (golden scenarios + chaos/concurrency)
```

**Owner-first rule (binding; root cause of three reversed rows).** Before accepting any finding of the form "*X is missing / referenced but never defined*," resolve **whether X is already owned or defined elsewhere** (its owner doc, an existing enum, an existing field). A grep-level absence in one doc is not proof of absence in the system. Three consolidated-card rows (A-11, B-25, B-15) asserted a missing element that already exists or is owned elsewhere; they flip in Layer 2. Enforced mechanically in Step 0.

**Severity reframe (propagated everywhere).** A finding's severity is **importance** — it drives revision priority and human-review surfacing — and **never halts** the run. Only mechanical/system errors halt execution (`StepExecutionFailureKind`). An output is not auto-"passed" while serious findings are unresolved; if unresolvable, termination logic (V3.3.1 §6.7.3, see B-24) **escalates** (Gate-4 review or "rely with limitations"), it does not halt. Irreversible external actions (filing/sending) remain gated by an action permission — an action gate, not a finding block. This retires the "blocking-authority guard" debate (A-05) and adjusts A-06 and B-24.

**Disposition vocabulary:** `ACCEPT` · `ACCEPT WITH MODIFICATIONS` · `ACCEPT-AS-FIX` · `DECLINE` (reverse a prior ACCEPT) · `REDIRECT-OP-A` (owned by another doc → OP-A row) · `DEFER-PhaseB` (§E) · `DECLINED` (§F) · `ELEVATE` · `OPEN_FOR_ARCHITECT_REVIEW`.

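
The severity reframe above can be read as a small decision function. What follows is a minimal sketch under stated assumptions, not the card's mechanism: the severity values, the `isSerious` threshold, the `RunDisposition` shape, and every function name are hypothetical, and `stepExecutionFailure` is a plain string standing in for a `StepExecutionFailureKind` value.

```ts
// Hypothetical types for illustration only; none of these names come from the spec.
type FindingSeverity = "low" | "medium" | "high" | "critical";

interface OpenFinding {
  finding_id: string;
  severity: FindingSeverity; // importance: drives priority, never halts
  resolvable: boolean;       // can another revision round still address it?
}

type EscalationRoute = "gate4_review" | "rely_with_limitations";

type RunDisposition =
  | { kind: "halt"; reason: string }              // mechanical/system error only
  | { kind: "pass" }
  | { kind: "continue_revision" }
  | { kind: "escalate"; route: EscalationRoute }; // never a halt

// ASSUMPTION: "serious" means high or critical.
function isSerious(severity: FindingSeverity): boolean {
  return severity === "high" || severity === "critical";
}

function resolveRunDisposition(
  stepExecutionFailure: string | null,
  openFindings: OpenFinding[],
  escalationRoute: EscalationRoute = "gate4_review",
): RunDisposition {
  // Only a mechanical failure halts, and this branch never looks at findings.
  if (stepExecutionFailure !== null) {
    return { kind: "halt", reason: stepExecutionFailure };
  }
  const serious = openFindings.filter((f) => isSerious(f.severity));
  // No auto-pass while serious findings are unresolved.
  if (serious.length === 0) return { kind: "pass" };
  if (serious.some((f) => f.resolvable)) return { kind: "continue_revision" };
  // Unresolvable serious findings escalate instead of halting.
  return { kind: "escalate", route: escalationRoute };
}

// Action gate: filing/sending depends on a permission, not on finding severity.
function mayPerformExternalAction(actionPermissionGranted: boolean): boolean {
  return actionPermissionGranted;
}
```

In this sketch a `critical` finding that cannot be resolved yields `escalate`, while a tool-level failure yields `halt` regardless of which findings are open.
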
**Coverage crosswalk.** Every one of the 126 consolidated-card items is accounted for in exactly one layer (the Layer Map). No item is dropped; flips and re-grades are explicit; §E/§F carry forward unchanged.

---

## Layer Map — where each consolidated-card item lands

**Layer 2 — Disposition flips / redirects** (all four anchor-confirmed — see Layer 2)
- `A-11` → **DECLINE** (`LearningMode` already enumerated, V3.3.1 §6.16.1); the *use* stays DEFER-PhaseB.
- `B-25` → **DECLINE** (`depends_on_step_ids` + cycle detection §6.11.1 both exist); de-CRITICAL.
- `B-15` → **REDIRECT-OP-A** (`RoomKind.plan_review` owned by DOC12, `OBL-DOC12-FORUM-01`).
- `D-04` → **SPLIT**: keep Tier 0 (`lookup_receipt`), bar it from trust claims; retain the other three sub-fixes.
- `A-05` → **subsumed by the severity reframe**; `A-06` verdict-mapping adjusts (standard row).

**Layer 3 — Critical fixes, cross-row collisions, re-grades** (rendered in full): `A-01` (→Layer 1) · `A-02` HIGH (elevation declined) · `A-04` (route-resolution) · `A-07` (one `NoVerdictReason` + aliases) · `A-12` (`OutcomeVerdict`) · `A-16` CRITICAL (`revision_in` chokepoint) · `A-23` (re-anchor to Source Workspace) · `A-26` (`FindingMatchKey`) · `B-02↔B-27` (`attempt_seq`+`prior_output_hash`) · `B-08` (dual `FailureKind` + `StepExecutionFailureKind`) · `B-24` CRITICAL (sufficiency predicate → §6.7.3 loop breaker) · `B-26` (§6.11↔§6.7.3) · `C-03` (remove predicted hashes) · `D-01` (transitive taint) · `D-13` HIGH/injection.

**Layer 3 — Math/scoring hardening** (C-cluster): `C-01`,`C-02`,`C-04`,`C-05`,`C-06`,`C-07`,`C-08`,`C-09`,`C-10` on the L1.5 shapes (`C-03` is in Layer 3 core). **`B-16` is a concurrency row, not a C-cluster item.**

**Layer 4 — §G Professional-Reliance / UX** (`G-01`…`G-21`): KEEP entire layer; build as **derived read-models** (`DerivedReadModelRecord`, Layer 1). UI surfaces (`G-07`, `G-12`…`G-16`, `G-20`) → OP-A row to DOC20.

**Layer 5 — Fixtures**: `G-21` (chaos/concurrency) + golden scenarios.

**Layer 3 — Standard fixes (carried by reference):** all remaining §A/§B/§C/§D ACCEPT rows not named above — ID + final disposition + consolidated-card anchor (full schema in the consolidated card).

**§E (held, Phase-B)** `E-01…E-13` and **§F (declined)** `F` — rendered in full in Appendix R.5–R.6; carry forward unchanged; no per-row action.

**New rows added this restructure:** the Layer-1 primitives as first-class rows; reliance-decay; delegation; `D-19`/`D-20` completions; Grok §3.3 / §5.7 / §5.9. Citation fix: **B-24 is Grok §4.9**.

---

## Step 0 — Strip / type-owner pass on the fix package (run before applying anything)

The **subtractive stroke applied to the fix package itself** — removes adds that are redundant (already owned) or unspecifiable, and surfaces the Layer-2 flips. Runs once over the whole set before any row is applied.

1. **Admission gate (one-time, pre-write).** Every proposed type/enum/field/schema is checked against the `TypeOwnerRegistry` (§L1.2). The gate is a **check, not a standing object**: passes only if the symbol is (a) new + assigned an owner doc, or (b) an extension of an owned symbol. New canonical object families trip `ARCHITECT_STOP`.
2. **Phantom-reference lint** (`SchemaReferenceValidationRule`, §L1.2). Any reference to a field/type not in the registry fails. Catches "phantom schema refs" (A-09) at the package level. *This pass also caught two phantom fields in an early draft of this card — `confidence_merge_strategy` and `tiebreaker_epsilon` — both removed.*
3. **Owner-first resolution.** For every "X missing / referenced-never-defined" finding, confirm X's owner first. Output: the Layer-2 flips (A-11, B-25, B-15) + the D-04 hold.
4. **Collision scan.** Detect name collisions (e.g., `EvaluationVerdict` reused for an outcome concept; `FailureKind` defined twice). Output: Layer-3 renames (A-12→`OutcomeVerdict`, B-08→`StepExecutionFailureKind` + dual-`FailureKind` consolidation).
5. **Dedup-of-truth scan.** Detect rows inventing the same primitive under different names (termination counter, finding match key, read-model). Output: the Layer-1 consolidations (one `RevisorTerminationLedger`, one `FindingMatchKey`, the `DerivedReadModelRecord` family).

Step-0 lints:
```
step0.symbol_without_owner
step0.phantom_field_reference
step0.missing_claim_unresolved_owner      // owner-first rule
step0.name_collision_across_package
step0.duplicate_primitive_across_rows
```

---

## Layer 1 — Shared structural primitives (defined once)

Seven primitives defined **once** and referenced by ID from every later row. All three card-reviews built variants independently; consolidating them removes the per-row redefinition and the cross-row collisions. Each carries the rows it subsumes.

### L1.1 `GovernanceEnvelope` — universal governance mixin
```ts
interface GovernanceEnvelope {
  policy_decision_ref: PolicyDecisionRef   // the EC decision that authorized this object's disposition
  policy_generation_id: string             // ties to the compiled policy version (B-12 freshness)
  privilege_class: PrivilegeClass          // attorney_client | work_product | confidential | none
  access_tier: AccessTier                  // matter_team_access | supervising_attorney | firm_admin
  taint_class: TaintClass                  // rides the object; consumers treat content as data, not instruction
  matter_ref: MatterRef | null             // scopes cross-matter visibility (D-24)
  schema_version: 1
}
```
*Subsumes:* A-20, D-02, D-07/D-08, D-24. The Review Studio `LifecycleActorEnvelope` (DOC23_ADDB_Review_Studio.md §9.1) **extends** this mixin.

### L1.2 `TypeOwnerRegistry` + admission gate + phantom-reference lint
```ts
interface TypeOwnerRegistryEntry {
  symbol: string; owner_doc: string; defined_at: string
  status: "operative" | "proposed_this_pass" | "extension_of"
  extends_symbol: string | null
  schema_version: 1
}
// One-time admission check (Step 0, NOT a runtime object): pass iff symbol is
// (a) new + owner_doc, OR (b) extension_of an owned symbol. New canonical family -> ARCHITECT_STOP.
interface SchemaReferenceValidationRule { rule_id: "schema_reference_validation"; severity: "blocker" }
// FAIL: any field/type reference not present in TypeOwnerRegistry.
```
*Subsumes:* A-09, A-13 (dual learning envelope forbidden), A-25 (one owned name), D-22.

### L1.3 `EvaluationFinding` — event / record / view (A-01, canonical)
Single finding model, shared with the Review Studio unit (DOC23_ADDB_Review_Studio.md §2). Replaces the two incompatible schemas; keeps **both** bases; the view is a read-only copy.
```ts
interface EvaluationFindingEvent {              // immutable per-round audit record
  finding_event_id: string; outcome_id: string; target_artifact_ref: string
  target_version_ref: string                    // version-pinned (V3.3.1 §5.7)
  target_scope_ref: TargetScopeRef | null       // anchor; maps to a DOC20 CommentAnchor
  detected_at: string
  severity: FindingSeverity                     // IMPORTANCE, never a halt
  assurance_basis: AssuranceBasis[]             // user_instruction | saved_criteria | human_label | multi_reviewer_consensus | model_judgment_only
  authority_basis: EvaluationAuthorityBasis[]   // 9-value FD enum (FD §3.4) — kept distinct from assurance_basis
  match_key: FindingMatchKey                    // L1.4
  body: string; governance: GovernanceEnvelope; schema_version: 1
}
interface EvaluationFindingRecord {             // mutable lifecycle row
  finding_id: string; origin_event_id: string
  state: FindingState                           // active | resolved | human_resolved | superseded_by_revision | superseded_by_source_change | dismissed | contested
  target_version_ref: string
  resolved_by_receipt_ref: string | null        // only a Revisor RevisionExecutionReceipt sets `resolved` (§5.7.2)
  superseded_by_finding_id: string | null; governance: GovernanceEnvelope; schema_version: 1
}
interface FeedbackFindingView {                 // read-only projection the UI renders
  finding_id: string; display_state: FindingState; anchor: TargetScopeRef | null
  severity: FindingSeverity; body: string
  history: FindingStateTransition[]             // per-round snapshots -> cross-round drift view
  schema_version: 1
}
```
*Subsumes:* A-01, A-05 (basis reconciliation; blocking part retired by the reframe), A-19 (`FindingState` negative exits), A-26 (`match_key`). `human_resolved` per DOC23_ADDB_Review_Studio.md §8.3.

### L1.4 `FindingMatchKey` + `RevisorTerminationLedger`
```ts
interface FindingMatchKey {                     // stable identity across revision rounds
  artifact_root_id: string                      // stable root, NOT a version ref (A-26)
  outcome_id: string; criterion_id: string | null
  failure_signature: FailureKind                // reuse the §6.2 FailureKind taxonomy (no new enum)
  schema_version: 1
}
interface RevisorTerminationLedger {            // ONE per revision run; all termination accounting here
  run_id: string; rounds: RevisionRoundRecord[]
  per_finding_attempts: { match_key: FindingMatchKey; attempt_seq: number; prior_output_hash: string }[]  // B-02 ↔ B-27
  repeated_insufficiency_count: number; loop_breaker_limit: 3     // N = 3 (B-24)
  sufficiency_predicate_results: SufficiencyPredicateResult[]     // B-24 predicate feeding the §6.7.3 loop breaker
  no_progress_detected: boolean
  termination_reason: "converged" | "loop_breaker" | "escalated_gate4" | "rely_with_limitations" | null
  schema_version: 1
}
```
*Subsumes:* A-26, B-24, B-02 ↔ B-27, B-18/B-19, B-26 (one source of truth), D-18.

### L1.5 `FormulaEvaluationReceipt` + `MetricObservation` + `MetricRollup`
Makes every score a receipted formula evaluation, and gives the math-audit guards one home. Per-formula patches in Layer 3.
```ts
interface FormulaEvaluationReceipt {            // one per score computation (C-01: Formula Registry)
  formula_id: string                            // registered formula (versioned)
  inputs: Record<string, unknown>               // ASSUMPTION: the source shows a bare `Record`; <string, unknown> is a placeholder
  denominator_basis: string                     // e.g. "total_advice_rendered_count" not "accepted_count" (C-04)
  sample_size: number
  guards_applied: ("zero_denominator_fallback" | "underflow_clamp" | "percentile_bounded" | "censored_to_terminal")[]
  output: number; schema_version: 1
}
interface MetricObservation {
  metric_id: string; value: number
  terminal_state_only: boolean                  // C-02: count only terminal-state cycles
  observed_at: string; context_ref: string; schema_version: 1
}
interface MetricRollup {
  metric_id: string; method: "mean" | "p50" | "p90" | "weighted_mean"
  zero_denominator_fallback: "null" | "0" | "undefined_reported"      // C-06
  underflow_clamp: boolean                                            // C-09: Math.max(0.0, weighted_mean - penalty)
  percentile_impl: "rolling_window" | "t_digest" | "exact_bounded"    // C-05: no unbounded array
  missing_dimension_penalty_value: number                             // C-01
  schema_version: 1
}
```
*Subsumes (shape for):* C-01 (Formula Registry — keystone), C-02 (terminal-state censoring), C-04 (severity-weighted rate), C-05 (bounded percentile), C-06 (zero-denominator/insufficient-sample). C-07 (`CostEstimate`), C-08 (`NoveltyMetricSpec`), C-09 (task-mode/template-match) carry their own Layer-3 schemas but resolve `formula_id`s here. **(Corrections: `confidence_merge_strategy` and `tiebreaker_epsilon` removed — both mis-anchored. A-04 is route-resolution (rendered in Layer 3); B-16 is a concurrency tiebreaker §11.9 (standard row), not a metric epsilon.)**

### L1.6 `DerivedReadModelRecord` + `ReadModelInvalidationSpec`
"No new truth store" primitive for §G and cross-cutting surfaces (matches V3.3.1 §11.1.1 "derived, not invented").
```ts
interface DerivedReadModelRecord {
  read_model_id: string
  derived_from: string[]                        // source-of-truth refs; NEVER authoritative
  projection_kind: string                       // task_health | findings_inbox | budget_narrative | run_diff | ...
  invalidation: ReadModelInvalidationSpec
  governance: GovernanceEnvelope                // privilege/tier-filtered on read
  schema_version: 1
}
interface ReadModelInvalidationSpec {
  declared_dependencies: string[]               // changing any invalidates the read-model (auditable, not predicted)
  refresh: "on_dependency_change" | "on_read" | "scheduled"; schema_version: 1
}
```
*Subsumes:* §G (G-01…G-21, Layer 4), the Review Studio read contract (§9.2), D-12.

### L1.7 `LintRegistry`
```ts
interface LintRegistryEntry {
  lint_id: string; owner_doc: string; checks: string
  severity: "advisory" | "warning" | "blocker"
  waiver: { allowed: boolean; approver: "architect"; expires_at: string | null }
  schema_version: 1
}
```
*Subsumes:* every scattered lint across §A–§D and §G, plus the Step-0 and Review Studio lints.

---

## Layer 1 — acceptance

Accepted when: (1) all seven primitives have an owner in the `TypeOwnerRegistry`; (2) no later row redefines them (referenced by ID only); (3) the Step-0 admission gate + phantom lint pass over the whole package; (4) the finding model matches the Review Studio unit; (5) `RevisorTerminationLedger` is the sole termination accumulator and the `FormulaEvaluationReceipt`/`MetricRollup` pair the sole score-computation shape.

---

## Layer 2 — Disposition flips (anchor-confirmed)

Each flip was confirmed against **live spec text (section + line)** before being written, per the owner-first rule. These reverse or narrow prior ACCEPTs; the consolidated-card claim is quoted, then rebutted.

### L2.1 `A-11` → **DECLINE** (enumeration redundant; the *use* remains DEFER-PhaseB)
- **Card claim (ACCEPT):** "`LearningMode` is referenced across §15 but never enumerated"; proposed `production | signal_generation | cross_calibration`.
- **Verified:** enumerated at **V3.3.1 §6.16.1** (3098–3108) as `"production" | "signal_generation" | "calibration"`; field `learning_mode: LearningMode` (3072); §0.4 inventory (45). The card looked in §15 and missed §6.16.1.
- **Disposition:** **DECLINE** the add — it would create a divergent duplicate (`cross_calibration` ≠ the real `calibration`). The behaviors (§6.16.2) remain **DEFER-PhaseB** (E-07).
- **Owner:** V3.3.1 §6.16.1.

### L2.2 `B-25` → **DECLINE** (field *and* cycle-lint both exist; proposed remedy regresses the schema) — was CRITICAL
- **Card claim (CRITICAL):** "§11.3's plan-lint references 'DAG acyclic' but `RevisionPlan` steps carry no dependency field." Proposed replacing `RevisionPlanStepBase`.
- **Verified:** `RevisionPlanStepBase.depends_on_step_ids: string[]` exists (3493); the check exists — `RULE dag_acyclic` (4774–4775), §6.11.1 topological-sort-with-cycle-detection (2949), §11.22.1 (5833), `validation.dag_cyclic` (8175), deterministic plan linting includes DAG (1312).
- **Disposition:** **DECLINE**; de-escalate from CRITICAL. Both halves false. The proposed rewrite is **rejected** — it would delete ~6 real fields (`step_order`, `affected_artifact_refs`, `affected_outcome_ids`, `source_repair_depth`, `idempotency_key`, `preconditions`). (B-26's `revalidation_policy` is handled on its own in Layer 3.)
- **Owner:** V3.3.1 §6.11.1 / §11.22.1.

### L2.3 `B-15` → **REDIRECT-OP-A** (already owned by DOC12; the obligation already exists)
- **Card claim (ACCEPT, ⚠verify):** "`RoomKind.plan_review` used but not registered."
- **Verified:** the Forum sub-addendum **itself** says DOC12 owns canonical room registrations incl. `RoomKind.plan_review` (45), the plan-review forum uses the DOC12-registered kind (754), config carries `room_kind?` resolving to DOC12 (179), and **`OBL-DOC12-FORUM-01`** already requires DOC12 to register it (874, 906–907).
- **Disposition:** **REDIRECT-OP-A.** Addenda B must not register it (would duplicate DOC12 ownership / violate the Step-0 admission gate). Already captured as `OBL-DOC12-FORUM-01`; carry on the OP-A list.
- **Owner:** DOC12 (`OBL-DOC12-FORUM-01`).

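
The owner-first rule behind L2.1 through L2.3 amounts to a registry lookup performed before any "X is missing" finding is accepted. The following is a minimal sketch, assuming a trimmed registry entry and a hypothetical `registration_pending` flag (not a field of `TypeOwnerRegistryEntry`) that separates "already defined" from "owned elsewhere, obligation outstanding". The function name and the `NO_OWNER_FOUND` result are also illustrative.

```ts
// Trimmed, hypothetical view of a registry row; the real shape is TypeOwnerRegistryEntry (L1.2).
interface OwnedSymbol {
  symbol: string;
  owner_doc: string;
  registration_pending: boolean; // ASSUMPTION: true when the owner still has to register the symbol
}

type MissingClaimOutcome = "DECLINE" | "REDIRECT-OP-A" | "NO_OWNER_FOUND";

// Resolve the owner before accepting a "missing / never defined" finding.
function resolveMissingClaim(symbol: string, registry: OwnedSymbol[]): MissingClaimOutcome {
  for (const entry of registry) {
    if (entry.symbol === symbol) {
      // Owned elsewhere with an outstanding obligation: route to the owner (OP-A row).
      // Already defined: the proposed add would duplicate it, so decline.
      return entry.registration_pending ? "REDIRECT-OP-A" : "DECLINE";
    }
  }
  // No owner anywhere: only now may the finding proceed to normal adjudication.
  return "NO_OWNER_FOUND";
}

// Sample rows mirroring the three flips; owner strings are taken from the card.
const sampleRegistry: OwnedSymbol[] = [
  { symbol: "LearningMode", owner_doc: "V3.3.1 §6.16.1", registration_pending: false },
  { symbol: "RevisionPlanStepBase.depends_on_step_ids", owner_doc: "V3.3.1 §6.11.1", registration_pending: false },
  { symbol: "RoomKind.plan_review", owner_doc: "DOC12 (OBL-DOC12-FORUM-01)", registration_pending: true },
];
```

With these sample rows, `LearningMode` and `depends_on_step_ids` resolve to `DECLINE` and `RoomKind.plan_review` to `REDIRECT-OP-A`, matching the dispositions above.
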
### L2.4 `D-04` → **SPLIT**: keep Tier 0 (de-escalation hold) + retain the other three sub-fixes
- **Tier-0 sub-claim ("remove `0`") → FLIP.** **Verified:** tier 0 = `lookup_receipt` is an intentional, load-bearing "receipt records only" tier for ephemeral lookups (SrcWS 103–106; "Check today's stock price → tier 0", 239).
- **Disposition:** **KEEP Tier 0.** Reconcile the type to **include** 0 (`SourceRecord.tier ∈ {0,1,2,3}`, 0 = `lookup_receipt`) rather than delete it. Add the safeguard: a Tier-0 `lookup_receipt` **must not** back a downstream reliance/trust claim → `validation.tier0_receipt_used_as_trust_basis` (error).
- **Other three sub-fixes → KEEP as ACCEPT** (standard rows): persist every `SourceTierTransition` (§3.3); align verification-state vocab (§2.3 / §4.1); gate demotion (`from_tier > to_tier`) by `cleared_by_access_tier >= "matter_team_access"` + reason (`validation.source_tier_demotion_without_authority`).
- **Owner:** Source Workspace §3 / §3.3.

### L2.5 `A-05` → subsumed by the severity reframe (no separate fix)
The "blocking-authority guard" debate is retired by the reframe (findings carry importance, never halt; unresolvable serious findings escalate; irreversible external actions gated by an action permission). A-05's `assurance_basis` vs `authority_basis` content is captured in L1.3 (both bases retained).

### L2.6 New finding (surfaced by this verification): loop-breaker section citation is inconsistent
V3.3.1 cites the `D16` loop breaker as **both** §6.11 (1318, 1982, 9560) **and** §6.7.3 (2894, 3069, 3232) — a concrete instance of **B-26**. Folded into B-26 (Layer 3).

### Layer 2 — disposition delta & lints
Three items leave the ACCEPT-and-build set (`A-11` enum, `B-25`, `B-15`); `D-04` narrows; one CRITICAL retired (`B-25`).
```
layer2.declined_add_reintroduced
layer2.tier0_removed_from_domain
validation.tier0_receipt_used_as_trust_basis
```

---

## Layer 3 — Critical fixes, collisions, and re-grades

These rows **change content**. Each verified against live spec before writing.

### L3.1 Critical / safety / compliance

**`A-16` — Direct repair can bypass the central `revision_in` safety contract — CRITICAL (confirmed).**
- **Verified:** `DirectFixStep` (V3.3.1 3560) is itself locked — `target_port:"none_direct_fix"`, `direct_fix_class: DirectFixAllowedClass` (whitelist 601–608), `target_module_id`/`typed_instruction` FORBIDDEN, allow/forbid classes (601–617). The risk is the **FD wiring**: Feedback Delivery routes repair to `instruction_in`/`data_in`/`context_in` (FD 34), and the Revisor's safety chain runs through `revision_in` (FD 431). Those direct paths skip capability validation, preconditions, candidate versions, policy gates, preservation, and receipts.
- **Fix (FD §7.2):** repair may **execute** only through (a) a `revision_in` port, or (b) `revision_compatible = true` **and** `ModuleRevisionCapability` coverage; all other wiring is **advisory context only**.
  ```ts
  interface PortRevisionEligibility {
    port_id: string; module_id: string;
    revision_compatible: boolean;                   // default false
    module_revision_capability_ref?: StorageRef;    // REQUIRED when revision_compatible = true
  }
  // validation.repair_routed_to_non_revision_port (error)
  ```
- **Cross-ref:** same chokepoint the Review Studio unit relies on (§1.2, §5.4). Pairs with A-17.

**`D-01` — Transitive taint laundering — compliance blocker (confirmed).**
- **Verified:** `SourceWorkspace.current_taint_class` exists (SrcWS 145), retrieval lineage exists (296–299), but there is **no `source_kind → taint_class` map** and **taint is not transitive** — a tainted artifact summarized by a sub-agent/forum re-enters "clean." Privilege/firewall-grade for a litigator.
- **Fix (SrcWS §4.1A + cross-addenda rule + `TaintAggregationPolicy`):** ship the default map (`web_source → external_untrusted`; `case_law|statute|regulation|api_result|database_record → external_authority_trusted`; `document|email|file|prior_task_output → user_trusted_bounded`; `library_entry → internal_corpus_trusted`); **transitive rule** — any module/sub-agent output inherits the **max** taint of its inputs unless it passes a declared `SanitizationNode`; privileged input → privileged workspace.
  ```ts
  interface SanitizationNode {
    node_id: string; input_taint: TaintClass; output_taint: TaintClass;
    method: "human_review" | "structural_strip" | "policy_filter";
    evidence_ref: StorageRef;
  }
  // validation.taint_not_inherited_through_summary (error); validation.source_kind_taint_unmapped (error)
  ```
- The `SanitizationNode` is the only sanctioned taint-downgrade. All taint rides `GovernanceEnvelope` (L1.1).

**`B-24` — Revisor sufficiency protocol detection — CRITICAL (confirmed; mechanism exists, detection underspecified).**
- **Verified:** the loop breaker exists — `D16`, default **N=3** consecutive `still_failing_same_reason`, then BLOCK + escalate to Task Agent (§6.9) or human (§6.6) (1982, 2894, test F-LOOP-01 at 9080); `repeated_insufficiency_count` (3232). `still_failing_same_reason` = "same FailureKind + same primary finding" (1942). Underspecified: the **"same reason" detection**.
- **Fix:** the predicate is `FindingMatchKey` equality (L1.4); all accounting lives on the single `RevisorTerminationLedger` (L1.4). Hard Call surfaces only on a real user choice (missing source/verification/capability). The §6.11-vs-§6.7.3 split is fixed under B-26.

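
The B-24 fix reduces "same reason" detection to `FindingMatchKey` equality plus a consecutive-run count. Here is a minimal sketch under stated assumptions: each array entry is taken to be the primary finding key of a round that ended still failing, `failure_signature` is a plain string standing in for `FailureKind`, and the helper names are hypothetical. Whether the first failure counts toward N is not settled by the card; this sketch counts it.

```ts
// Trimmed FindingMatchKey (L1.4); schema_version omitted for brevity.
interface FindingMatchKey {
  artifact_root_id: string;    // stable root, so a new artifact version does not change the key
  outcome_id: string;
  criterion_id: string | null;
  failure_signature: string;   // stands in for the §6.2 FailureKind
}

const LOOP_BREAKER_LIMIT = 3;  // N = 3 (B-24)

// Field-by-field equality: the "same reason" predicate.
function sameMatchKey(a: FindingMatchKey, b: FindingMatchKey): boolean {
  return (
    a.artifact_root_id === b.artifact_root_id &&
    a.outcome_id === b.outcome_id &&
    a.criterion_id === b.criterion_id &&
    a.failure_signature === b.failure_signature
  );
}

// Length of the trailing run of rounds that failed with the same key as the latest round.
function repeatedInsufficiencyCount(failingRoundKeys: FindingMatchKey[]): number {
  if (failingRoundKeys.length === 0) return 0;
  const latest = failingRoundKeys[failingRoundKeys.length - 1];
  let count = 0;
  for (let i = failingRoundKeys.length - 1; i >= 0; i--) {
    if (!sameMatchKey(failingRoundKeys[i], latest)) break;
    count++;
  }
  return count;
}

// The loop breaker trips once the run reaches the limit; the caller then escalates.
function loopBreakerTripped(failingRoundKeys: FindingMatchKey[]): boolean {
  return repeatedInsufficiencyCount(failingRoundKeys) >= LOOP_BREAKER_LIMIT;
}
```

Because the key holds `artifact_root_id` and not a version ref, three rounds that each produce a new artifact version but fail the same criterion for the same reason still count as one repeated insufficiency.
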
**`D-13` — Cross-run `RunGuidance` persistence + contested-check — ELEVATE to HIGH / injection-class (confirmed).**
- **Verified:** `RunGuidanceCandidate` (FD 296) / `RunGuidanceItem` (FD 311) exist with `guidance_kind ∈ source_warning|do_not_use|must_include|style_rule` + `lifecycle_state`, but FD never says where guidance persists or how Run A reaches Run B, `contested` is never checked, and there is no lifecycle receipt.
- **Why elevated:** `must_include`/`do_not_use` is **injected** into later runs — a cross-run injection vector; with D-01, tainted-derived guidance crossing runs is an injection path.
- **Fix:** persist `RunGuidanceItem` to `TaskBlueprint.persistent_guidance[]` (one authoritative store; EC durable write); `lifecycle_state: proposed|active|contested|superseded`, consumers MUST skip `contested`; emit `RunGuidanceLifecycleReceipt`; guidance carries `GovernanceEnvelope`. `validation.run_guidance_consumed_while_contested` (error). (Cross-run *learning* vs DOC72 precedence is §E / E-02.)

### L3.2 Collisions / dedup

**`A-01`** — finding model → Layer 1 (L1.3). No separate fix.

**`A-26`** — `ProgressSignal` keys on `(failure_kind, target_artifact_section_ref, finding_summary_hash)` (1948) — version-sensitive. **Fix:** replace with `FindingMatchKey` (L1.4, stable `artifact_root_id`); `matched_findings` computed by key equality.

**`B-02 ↔ B-27`** — `regenerate` = "rerun from scratch with new instructions" (2739); B-02 under-specified idempotency, B-27 identical-output loop invisible. **Fix:** `RevisorTerminationLedger.per_finding_attempts` carries `attempt_seq` + `prior_output_hash` (L1.4); a regenerate matching `prior_output_hash` for the same `FindingMatchKey` counts as `still_failing_same_reason`.

**`B-08`** — `FailureKind` is declared **twice** (§0.4.9 / 493 and §6.2 / 2715, same 12 values); `WorkspaceWriteFailureKind` (657) is separate. **Fix:** §6.2 canonical; §0.4.9 references it; step-execution failures → `StepExecutionFailureKind` (distinct from evaluation `FailureKind` and `WorkspaceWriteFailureKind`). Lint: `supersession.dual_running_type_family`.

**`A-12`** — §5.13 `EvaluationVerdict` is **per-lane** (`lane_*`, 2292 "Lane results aggregate into the OutcomeEvaluationResult"); no outcome-level type. **Fix:** add `OutcomeVerdict` (`passed|failed|indeterminate|not_applicable|blocked`); `EvaluationVerdict` stays per-lane; lane→outcome is the canonical mapping (CC §3.2).

**`A-07`** — `evaluation_verdict` includes `indeterminate` (CC 204); `indeterminate_reasons: IndeterminateCause[]` cites a "V4 R203 taxonomy" (CC 214). **Fix:** one canonical `NoVerdictReason` + an **alias registry** mapping the variants; needs **no R203 knowledge**; full Addenda-A consolidation is an **OP-A row**.

### L3.3 Re-anchors / corrections

**`A-02`** — Pattern C envelope field-location (ACCEPT/HIGH; **NOT elevated**). **Verified:** §3.3 requires `EvaluationResultEnvelope` be wrapped in `EvaluationArtifactEnvelope` (CC 289–295); the Judge needs `evaluated_target`/`evaluation_basis` but they live on `EvaluationFeedbackBundle`, so Pattern C breaks at read time. **Fix:** add both fields to `EvaluationResultEnvelope` (CC §3); bundle copy → projection. **Re-grade:** high-importance correctness fix, no safety/compliance/data-loss dimension → stays HIGH, not CRITICAL (correcting the early layer-map placeholder).

**`A-04`** — Pattern C route-resolution policy (ACCEPT; corrected anchor). **Verified:** CC §3.7 resolution is "by consumer policy" with no schema. **Fix:** `EvaluationChainResolutionPolicy` + `DEFAULT_PATTERN_C_RESOLUTION_POLICY` (`qualitative_blockers_survive_numeric_pass: true`, `judge_can_override_evaluator: false`, `override_allowed_only_for_finding_states: ["contested","dismissed","rejected_by_user"]`, `route_precedence: "blocking_qualitative_first"`, `disagreement_route: "human_review"`). **Correction:** route-resolution, not confidence-merge — the `confidence_merge_strategy` field mistakenly attached in L1.5 was removed.

**`A-23`** — re-anchor `authority_level` to Source Workspace (corrected anchor). **Verified:** the card's `[FD §3/§5]` anchor is **wrong** — `authority_level` lives in Source Workspace: `ApplicabilityScope.authority_level` (`binding|persuasive|advisory|unknown`, SrcWS 338) and `LegalSourcePayload.authority_level` (`binding|persuasive|distinguishable|adverse`, 359). **Fix:** reconcile the two: scope-level governs applicability; the domain payload (`distinguishable|adverse` = legal-specific extensions of `persuasive`) is descriptive. No FD change.

**`B-26`** — standardize revalidation/termination naming. **Verified:** loop breaker cited as both §6.11 (1318, 1982, 9560) and §6.7.3 (2894, 3069, 3232). **Fix:** pick one anchor (recommend **§6.7.3**, where `repeated_insufficiency_count` lives, 3232); update the §6.11 cites. Do **not** introduce the invented `RevisionStepRevalidationPolicy` name; the policy is named on `RevisorTerminationLedger` (L1.4).

### L3.4 Re-grade summary
- `A-16` CRITICAL · `D-01` compliance blocker · `B-24` CRITICAL (mechanism exists, detection specified).
- `D-13` → **HIGH / injection-class** (elevated).
- `A-02` → **stays HIGH, not CRITICAL** (elevation declined on review).
- `B-25` → **DECLINE, de-CRITICAL** (Layer 2).
- **Layer 1 corrections:** `confidence_merge_strategy` and `tiebreaker_epsilon` removed from L1.5 — mis-anchored to A-04 (route-resolution) and B-16 (a concurrency row, §11.9) respectively.

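
The D-01 transitive rule from L3.1 (default `source_kind` map, max-taint inheritance, downgrade only through a sanitization step) can be sketched as follows. This is illustrative only. The ordering in `TAINT_ORDER` is an assumption, since the card says "max" without defining an order, and the rule that a sanitization step applies only when its `input_taint` equals the inherited class is likewise a choice made for this sketch. `SanitizationStep` is a trimmed, hypothetical stand-in for `SanitizationNode`.

```ts
type TaintClass =
  | "internal_corpus_trusted"
  | "user_trusted_bounded"
  | "external_authority_trusted"
  | "external_untrusted";

// ASSUMPTION: least tainted first, most tainted last. The card does not define this order.
const TAINT_ORDER: TaintClass[] = [
  "internal_corpus_trusted",
  "user_trusted_bounded",
  "external_authority_trusted",
  "external_untrusted",
];

// Default source_kind -> taint_class map, as listed in the D-01 fix.
const DEFAULT_SOURCE_KIND_TAINT: { [sourceKind: string]: TaintClass } = {
  web_source: "external_untrusted",
  case_law: "external_authority_trusted",
  statute: "external_authority_trusted",
  regulation: "external_authority_trusted",
  api_result: "external_authority_trusted",
  database_record: "external_authority_trusted",
  document: "user_trusted_bounded",
  email: "user_trusted_bounded",
  file: "user_trusted_bounded",
  prior_task_output: "user_trusted_bounded",
  library_entry: "internal_corpus_trusted",
};

function taintForSourceKind(sourceKind: string): TaintClass {
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_SOURCE_KIND_TAINT, sourceKind)) {
    throw new Error("validation.source_kind_taint_unmapped: " + sourceKind);
  }
  return DEFAULT_SOURCE_KIND_TAINT[sourceKind];
}

// Trimmed stand-in for SanitizationNode (method and evidence_ref omitted).
interface SanitizationStep {
  input_taint: TaintClass;
  output_taint: TaintClass;
}

// An output inherits the max taint of its inputs unless a declared sanitization step covers it.
function inheritTaint(inputTaints: TaintClass[], sanitization: SanitizationStep | null): TaintClass {
  if (inputTaints.length === 0) throw new Error("at least one input taint is required");
  let inherited = inputTaints[0];
  for (const taint of inputTaints) {
    if (TAINT_ORDER.indexOf(taint) > TAINT_ORDER.indexOf(inherited)) inherited = taint;
  }
  if (sanitization !== null && sanitization.input_taint === inherited) {
    return sanitization.output_taint;
  }
  return inherited;
}
```

Under these assumptions, a summary built from one library entry and one web source stays `external_untrusted`, which is the laundering case D-01 closes.
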
### Layer 3 lints (registered in `LintRegistry`, L1.7)
```
validation.repair_routed_to_non_revision_port        // A-16
validation.taint_not_inherited_through_summary       // D-01
validation.source_kind_taint_unmapped                // D-01
validation.run_guidance_consumed_while_contested     // D-13
supersession.dual_running_type_family                // B-08
progress_signal.match_on_version_sensitive_ref       // A-26
```

---

## Layer 3 (continued) — C-cluster math patches

All attach to the L1.5 primitives (`FormulaEvaluationReceipt` / `MetricObservation` / `MetricRollup`). `C-01` is the keystone — the `FormulaRegistry` that L1.5 receipts reference; the rest are accepted reviewer patches (Gemini BUG-0x / ChatGPT CG-Mx), each verified against live spec before adoption.

**`C-01` — Formula Registry (ACCEPT; keystone).** **Verified:** `QualityIndex` (V3.3.1 1600), `compiler_confidence_score` (1771, "0.0–1.0"), reputation/novelty/template-match/cost are all bare numbers with no formula; `unanchored_llm_judgment` is already excluded from `QualityIndex` aggregation (1600). **Fix:** the `FormulaRegistry` / `FormulaSpec` (typed inputs, output type/range/units, `missing_input_policy`, `zero_denominator_policy`, `normalization_policy`, `version`, `test_vectors`) + `CalibratedScore` + `QualityIndex` (aggregation_method, missing_dimension_policy) + `normalizeCriterionWeights()` (weights from criterion_weight | priority | uniform; `unanchored_llm_judgment` excluded). Owner: Common Contracts §9A (registry home, A-09). Every L1.5 `FormulaEvaluationReceipt.formula_id` resolves here. Lints: `validation.criterion_weight_sum_zero`, `validation.no_aggregation_eligible_criteria`. *Subsumes ~15 "this number has no formula" findings incl. D13 `compiler_confidence_score`.*

**`C-02` — Survivorship bias in convergence metric (ACCEPT-AS-FIX).** **Verified:** `avg_revision_cycles_to_convergence` exists (V3.3.1 §15.1, line 7133); its denominator counts only outcomes reaching `satisfied`, erasing aborts/escalations. **Fix (Gemini BUG-01, verbatim):** bifurcate — `avg_cycles_to_success` (denominator = outcomes transitioning to `satisfied`) and `wasted_cycle_burn_rate` (denominator = total cycles; numerator = cycles on outcomes that aborted/escalated/`unrecoverable`). Both are `MetricRollup`s with `terminal_state_only` observations (L1.5).

**`C-03` — Hash-hallucination paradox (ACCEPT-AS-FIX).** **Already rendered in Layer 3 core** (predicted hashes verified at V3.3.1 3238–3239; remove `predicted_pre_hash`/`predicted_post_hash`; lock on stable section anchors; Dispatcher computes hashes at runtime). Listed here for cluster completeness.

**`C-04` — Flat-ratio "weighted" reputation score (ACCEPT-AS-FIX; *use* → §E).** **Verified:** `advice_regression_rate` exists (V3.3.1 §15.8.2, line 7450); documented as 2×-weighted but computed flat; Phase-2 reputation-routing use is already deferred per §26.1 (7463). **Fix (Gemini BUG-03):** carry `severity_breakdown {minor,major,critical}`, `unweighted_regression_rate`, and `severity_weighted_penalty_score = ((minor*1)+(major*2)+(critical*5))/accepted_count`. The *consumer* (reputation gating routing) stays **DEFER-PhaseB** (E-06).

**`C-05` — Latency mean skewed by thermal/cold-start outliers (ACCEPT-AS-FIX).** **Verified:** sub-agent/cost metrics live at §15.8 and Core §16.2; arithmetic latency mean is bimodal-unsafe on local Apple Silicon. **Fix (Gemini BUG-05):** `CostLatencyFinding` drops `avg_duration_ms`, carries `latency_distribution {p50_duration_ms, p90_duration_ms, sample_size}`. `MetricRollup.percentile_impl` (L1.5) supplies the bounded percentile computation.
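
One way to read the C-05 fix is a fixed-capacity window that reports p50/p90 in place of a mean. This is a minimal sketch, assuming the `"rolling_window"` option of `percentile_impl` and the nearest-rank percentile definition; neither choice is specified by the card, and `createLatencyWindow` and its method names are hypothetical.

```ts
interface LatencyDistribution {
  p50_duration_ms: number | null;
  p90_duration_ms: number | null;
  sample_size: number;
}

// Fixed-capacity window: memory stays bounded no matter how many samples arrive.
function createLatencyWindow(capacity: number) {
  const samples: number[] = [];

  function add(durationMs: number): void {
    samples.push(durationMs);
    if (samples.length > capacity) samples.shift(); // evict the oldest sample
  }

  // Nearest-rank percentile over the current window; null when the window is empty.
  function percentile(p: number): number | null {
    if (samples.length === 0) return null;
    const sorted = samples.slice().sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[rank - 1];
  }

  function distribution(): LatencyDistribution {
    return {
      p50_duration_ms: percentile(50),
      p90_duration_ms: percentile(90),
      sample_size: samples.length,
    };
  }

  return { add, percentile, distribution };
}

// Arithmetic mean, shown only for contrast with the percentiles.
function meanOf(values: number[]): number {
  let sum = 0;
  for (const v of values) sum += v;
  return sum / values.length;
}
```

For the samples `[100, 110, 120, 130, 5000]` (one cold-start outlier), the mean is 1092 ms while p50 is 120 ms and p90 is 5000 ms, so the typical case and the tail are reported separately.
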
*(Note: the exact prior field name for the mean differs across §12.5/§16.2; the fix replaces any latency-mean with p50/p90 regardless of the field name.)*

**`C-06` — Zero-denominator / insufficient-sample / cost tail-risk (ACCEPT).**
**Fix:** every ratio metric declares `zero_denominator_policy: "undefined_insufficient_data"` (from C-01's `FormulaSpec`) and emits `CalibratedScore.sample_size`; below `min_sample_size` it returns `indeterminate`, not 0; `CostEstimate` (C-07) gains `p90_cost` beside `expected_cost`. Rides `MetricRollup.zero_denominator_fallback` (L1.5).

**`C-07` — Shared `CostEstimate` / `TaskCostRecord` + metric-kind tag (ACCEPT WITH MODIFICATIONS).**
**Fix:** canonical `CostEstimate {expected_cost, p90_cost, currency_or_token_unit, basis, sample_size}` + `TaskCostRecord` in Common Contracts §9A (never redefined per-doc); add `BudgetFailureKind` (`estimate_exceeded|hard_cap_hit|insufficient_budget_to_start`) and a `MetricKind` tag (`rate|count|calibrated_score|duration_distribution`) so every metric declares its kind.

**`C-08` — Novelty metric space undefined (ACCEPT).**
**Fix (Appx L):** `NoveltyMetricSpec` (feature-vector ref, distance metric ∈ cosine/euclidean/jaccard/hybrid, range [0,1], `no_pattern_fallback`, calibration dataset, threshold) + `NoveltyAssessment` (closest pattern, distances, novelty_score, `forces_fresh_reasoning`, `triggers_task_agent`). A discriminated metric with an explicit no-pattern fallback — no undefined space.

**`C-09` — Task-mode / template-match calibration + magic 0.7 (ACCEPT).**
**Fix (Appx K):** `TaskModeScoringFunction` (feature_weights, veto_precedence hard/soft, thresholds surfaced as config not a literal, `explicit_rule`) + `TemplateMatchFormula` (component_weights, `hard_veto_cap`, `soft_penalty_max_total`) + `computeTemplateOverallScore()` (weighted mean − soft penalties, capped on hard veto). Protects the direct-first default with a calibrated score, not a bare `0.7`.
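
The C-09 scoring rule (weighted mean minus soft penalties, capped on hard veto) can be sketched as below. This is an illustration under assumptions, not the Appx K definition: `TemplateMatchInput` and `TemplateScore` are hypothetical types, weights are assumed non-negative, component scores are assumed to lie in [0,1], and the out-of-range branch only gestures at what the template-match lint would flag.

```ts
// Field names on TemplateMatchFormula follow the C-09 fix text; everything else is hypothetical.
interface TemplateMatchFormula {
  component_weights: Record<string, number>; // assumed non-negative
  hard_veto_cap: number;                     // ceiling applied once a hard veto fires
  soft_penalty_max_total: number;            // ceiling on the summed soft penalties
}

interface TemplateMatchInput {
  component_scores: Record<string, number>;  // each expected in [0,1]
  soft_penalties: number[];
  hard_veto: boolean;
}

type TemplateScore =
  | { kind: "score"; value: number }
  | { kind: "indeterminate"; reason: string };

function computeTemplateOverallScore(f: TemplateMatchFormula, input: TemplateMatchInput): TemplateScore {
  let weightSum = 0;
  let weighted = 0;
  for (const name of Object.keys(f.component_weights)) {
    const score = input.component_scores[name];
    if (score === undefined || score < 0 || score > 1) {
      return { kind: "indeterminate", reason: "component missing or out of range: " + name };
    }
    weightSum += f.component_weights[name];
    weighted += f.component_weights[name] * score;
  }
  if (weightSum <= 0) return { kind: "indeterminate", reason: "weights sum to zero" };

  const rawPenalty = input.soft_penalties.reduce((a, b) => a + b, 0);
  const penalty = Math.min(f.soft_penalty_max_total, rawPenalty);
  let value = Math.max(0, weighted / weightSum - penalty);
  if (input.hard_veto) value = Math.min(value, f.hard_veto_cap);
  return { kind: "score", value };
}

// Made-up configuration and inputs for illustration only.
const formula: TemplateMatchFormula = {
  component_weights: { structure: 2, terminology: 1, citations: 1 },
  hard_veto_cap: 0.3,
  soft_penalty_max_total: 0.2,
};
const scores = { structure: 0.9, terminology: 0.8, citations: 0.5 };
const plain = computeTemplateOverallScore(formula, { component_scores: scores, soft_penalties: [0.05, 0.1], hard_veto: false });
const vetoed = computeTemplateOverallScore(formula, { component_scores: scores, soft_penalties: [0.05, 0.1], hard_veto: true });
console.log(plain, vetoed);
```

Returning an explicit `indeterminate` result instead of a fallback number keeps the sketch in line with the C-06 stance that an undefined score is reported as such rather than as 0.
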
Lint: `validation.template_match_component_out_of_range`.

**`C-10` — `criterion_semantics_hash` is lexical (ACCEPT).**
**Fix:** rename to `criterion_text_hash` (lexical); add a separate `criterion_semantics_version` bumped only on a reviewed meaning change. Lint: `validation.criterion_semantics_version_unbumped_on_meaning_change` (advisory, human-reviewed).

### C-cluster lints (registered in `LintRegistry`, L1.7)

```
validation.criterion_weight_sum_zero                                  // C-01
validation.no_aggregation_eligible_criteria                           // C-01
validation.template_match_component_out_of_range                      // C-09
validation.criterion_semantics_version_unbumped_on_meaning_change     // C-10 (advisory)
```

---

## Layer 3 (continued) — Standard ACCEPT rows (carried by reference)

These rows keep their consolidated-card disposition and fix verbatim — the staged card carries them by **ID + disposition + consolidated-card anchor + one-line landing**, not re-rendered (the full schema lives in `DOC23_ADDENDA_B_RT_ADJUDICATION_CARD_CONSOLIDATED.md` at the cited section). Many land on a Layer-1 primitive (noted). None changed disposition.

**One row pulled up for emphasis — `D-24` (privilege firewall, HIGH).** Cross-matter forum-post visibility (Run Board §5.4 / Source Workspace §9.4). **Lands on `GovernanceEnvelope.matter_ref` (L1.1):** a forum post / board event is visible only within its `matter_ref` unless an explicit cross-matter grant exists; `validation.cross_matter_forum_leak` (error). For a litigator this is privilege-grade — high importance, though not a code-halt.

**§A — standard (17) — *full bodies in Appendix R.1*:**

| ID | Disp. | Anchor | Landing |
|---|---|---|---|
| A-03 | ACCEPT | CC §3.7 | Pattern C chain-ID lifecycle + chain registry (pairs A-25) |
| A-06 | ACCEPT | V3.3.1 §0.4.1 / CC §3.1–3.2 | `OutcomeEvaluationState` count & mapping; canonical mapping CC §3.2 (severity reframe note) |
| A-08 | ACCEPT | FD §6.2 | Full state→routing matrix; missing branches |
| A-09 | ACCEPT | CC §4.2 / V3.3.1 §5.17.7 | Phantom/misplaced refs → handled by `TypeOwnerRegistry` (L1.2) |
| A-10 | ACCEPT-MOD | CC §3 / §11 | `EvaluationArtifactEnvelope` wrapper home; `overall_state` de-B-specced |
| A-13 | ACCEPT | Core §9 vs Common §5 | Core's duplicate learning envelope removed (registry forbids dual def, L1.2) |
| A-14 | ACCEPT | CC §4.2 | Qualitative-slice owner map; slice-required → error not warning |
| A-15 | ACCEPT-MOD | CC §11.5 | "Build-ready" softened; command registry not mandatory |
| A-17 | ACCEPT | FD §5.3 / V3.3.1 §9 | Revisor is planner, not a revision target (pairs A-16) |
| A-18 | ACCEPT | FD §1.2 / §2.3 | Feedback-bundle emission discipline reconciled |
| A-19 | ACCEPT | V3.3.1 §0.4.4 / §5.7.1 | `FindingState` negative exits → in L1.3 |
| A-20 | ACCEPT | CC §6.4 / §9.4 | `unanchored_llm_judgment` ack field → `GovernanceEnvelope` (L1.1) |
| A-21 | ACCEPT | FD §2.2 / §3.3 | Optional-but-invariant-required fields (largely satisfied by A-01/L1.3) |
| A-22 | ACCEPT | CC §12 | Pending-consumer degraded-mode behavior |
| A-24 | ACCEPT | CC §3.4 vs V3.3.1 §5.18.8 | Parallel Judge+Evaluator example aligned to topology |
| A-25 | ACCEPT | CC §3.1 vs V3.3.1 §5.1 | `evaluation_chain_id` naming → one owned name (fold A-03) |
| A-27 | ACCEPT | V3.3.1 §5.16 | `EvaluationSnapshot` source-workspace hashes typed correctly |

**§B — standard (20) — *full bodies in Appendix R.2*:**

| ID | Disp. | Anchor | Landing |
|---|---|---|---|
| B-01 | ACCEPT | V3.3.1 §11 | Unified `RevisionExecutionLifecycle` dispatcher↔receipts state machine |
| B-03 | ACCEPT ⚠v | SrcWS / V3.3.1 §11.20 | Concurrent-write safety on Source Workspace |
| B-04 | ACCEPT | V3.3.1 §11.20.2 / §11.22 | Rolling hash under DAG/parallel (graph exists — `depends_on_step_ids`, see B-25) |
| B-05 | ACCEPT | V3.3.1 §11.21 | Revalidation cascade convergence bound; de-dup upstream-failure rule |
| B-06 | ACCEPT | V3.3.1 §dependency | Outcome-dependency cycle + `pending_dependency` deadlock |
| B-07 | ACCEPT | V3.3.1 §11.22 | Parallel sibling orphans after batch failure; parallelism first-class |
| B-09 | ACCEPT | V3.3.1 §candidate | Candidate external side-effects; lifecycle vs head-pointer |
| B-10 | ACCEPT | V3.3.1 §11 / FD | `TaskCancelProtocol` + Hard Call blocking scope |
| B-11 | ACCEPT | V3.3.1 §skip / FD | `TaskSkipProtocol` + skip receipts |
| B-12 | ACCEPT | CC §policy | Policy-freshness field → `GovernanceEnvelope.policy_generation_id` (L1.1) |
| B-13 | ACCEPT | SrcWS §6 | `ResearchNeed` lease |
| B-14 | ACCEPT | Task Forum | Forum deadlock breaker (+ remove `task_agent_decides` per ChatGPT) |
| B-16 | ACCEPT-AS-FIX | V3.3.1 §11.9 | **Concurrency** tiebreaker: replace wall-clock `created_at` with deterministic ordering (NOT a metric epsilon) |
| B-17 | ACCEPT | FD §8.4 / §10.3 | `instruction_in` overload → DOC23/DOC15/DOC24 boundary |
| B-18 | ACCEPT | V3.3.1 §6.5.2 / §7.9 | `HardRevisionCall.options` bounded-nonempty → on `RevisorTerminationLedger` (L1.4) |
| B-19 | ACCEPT | V3.3.1 §11.6 / §0.4.7 | `hard_call_resolved` producer → L1.4 |
| B-20 | ACCEPT | V3.3.1 §6.7.2 / §11.21 | Success-condition 5 races cascade |
| B-21 | ACCEPT | Core §9.0.6 | Signal-emission ordering vs receipt persistence |
| B-22 | ACCEPT | V3.3.1 §5.18 / §11.15 | Pattern C latency/cost budget |
| B-23 | ACCEPT | FD §6.3 | Multiple delivery branches idempotency |

**§D — standard (20; D-24 pulled up above) — *full bodies in Appendix R.4*:**

| ID | Disp. | Anchor | Landing |
|---|---|---|---|
| D-02 | ACCEPT | SrcWS §2 | Workspace taint aggregation → `GovernanceEnvelope` + max-taint (D-01) |
| D-03 | ACCEPT | SrcWS §0.3 / V3.3.1 §12 | `TaskSourceWorkspace` vs `SourceWorkspace` identity |
| D-05 | ACCEPT | SrcWS §6.2 / §7.4 | `ResearchNeed` scoping, ref types, `human_needed` exit |
| D-06 | ACCEPT | SrcWS §4 / §5 / §6.3 | Evidence anchors, payload registry, tool-receipts, extractor-vs-source |
| D-07 | ACCEPT | Task Forum §5.2 / §5.3 | Forum post visibility/supersession/governance → `GovernanceEnvelope` (L1.1) |
| D-08 | ACCEPT | Task Forum §6.3 / §6.4 | Context-packet ownership, omission manifest, audience enum, budget |
| D-09 | ACCEPT | Task Forum §1 / §5 | Passive-board auto-publish privacy/volume controls |
| D-10 | ACCEPT | Task Forum §7.2 etc. | `ModuleAssistanceRequest`, participant/moderator model, moderator-failure |
| D-11 | ACCEPT | Run Board §3.1 / §6.2 | `BoardDigest` filter rule |
| D-12 | ACCEPT | Run Board §retention | Retention/compaction/event-class → `DerivedReadModelRecord` (L1.6) |
| D-14 | ACCEPT | Core §3D / §4B / §5A | `InjectionSlotRegistry`, compact cards, command registry, token_budget, DOC24 ownership |
| D-15 | ACCEPT | V3.3.1 §8.4 / §17.1 | Sub-agent output-contract plurality, no-sub-agent fallback |
| D-16 | ACCEPT ⚠v | SrcWS / Forum | Workspace API operations defined |
| D-17 | ACCEPT | CC §7 / V3.3.1 §7.9.3 | Anchor/hash hygiene: empty `StructuredAnchor`, uncomputed `context_hash` |
| D-18 | ACCEPT | V3.3.1 §repeated-failure | Repeated-failure keyed on versioned refs → `FindingMatchKey` (L1.4) |
| D-19 | ACCEPT | SrcWS §SourceRecord | Per-module cost attribution field → `CostEstimate`/`TaskCostRecord` (C-07) |
| D-20 | ACCEPT | SrcWS §library | Library-promotion gate EC policy defined |
| D-21 | ACCEPT | V3.3.1 / Core | `requires_background_progress` overload split |
| D-22 | ACCEPT-MOD | CC §11.5 | Backward-compat softened; section-anchor hygiene |
| D-23 | ACCEPT | FD §9.4 | "Silent ignoring fires validation" made enforceable |

**§E (held, Phase-B) E-01…E-13** and **§F (declined) ×4** — carry forward unchanged; no per-row action.

**§G (G-01…G-21)** → Layer 4 (next).

---

## Layer 4 — §G Professional-Reliance / UX layer

§G is the differentiator (pre-flight, reviewable diffs, evidence binder, plain-English cost, a single "can I rely on this" memo). All 21 rows KEEP their ACCEPT disposition. The staged-card value-add is to **classify** them correctly against the "no new truth store" rule — because the reviews' "build §G as read-models" is true for the *surfaces* but wrong for a handful of items that carry genuine new state or are operations. Three kinds:

### L4.1 Reliance artifacts (persisted; mostly derived, a few new-truth fields)

Schemas already written in the consolidated card (`[Appx O]`/`[Appx M]`); carried by reference here with classification + governance. All carry `GovernanceEnvelope` (L1.1) and are matter-scoped (D-24).

- **`G-06` `TaskReliancePacket` — capstone (ACCEPT).** [CC/Core, Appx O] Assembled from existing truth (evaluation_chain_ids, evidence package, hard-call resolutions, policy decisions, budget narrative, known-good states) → **derived**; the only computed new value is `reliance_status ∈ safe_to_rely_within_scope | rely_with_limitations | not_safe_to_rely | human_review_required` + `reliance_scope`. `unresolved_limitations` uses the A-07 limitation taxonomy. Everything else in §G feeds it.
- **`G-01` `EvaluationContractReview` (ACCEPT).** [Appx O] Pre-execution; interpreted goal/criteria/thresholds are **derived** from the compiled plan, but `approval_status` (`pending|approved|rejected|edited|waived_by_policy`) is **genuine new truth** (the user's decision) → persisted artifact, not a pure read-model. Gate optional per task autonomy.
- **`G-02` `RevisionReviewPacket` (ACCEPT).** [Appx O] before→candidate semantic diff + `finding_to_change_map` + preservation/revalidation results are **derived**; `reviewer_action` (`accept|reject|fork|request_changes|restore_known_good_state|no_user_review_required`) is **new truth**. Pairs with A-16 (revision flows through `revision_in`) and G-04 (restore).
- **`G-03` `EvidencePackage` (ACCEPT).** [Appx O] **Derived** export (snapshot + `claim_support_map` over D-06 evidence anchors); persisted as the binder. No new truth beyond the snapshot boundary.
- **`G-05` `BudgetNarrative` (ACCEPT).** [Appx O] **Derived** from `TaskCostRecord`s (C-07); separates logical LLM calls from infrastructure retries. Feeds G-09.
- **`G-07` `AttentionLedger` / `DecisionQueue` (ACCEPT — author minimal).** Cross-run queue of pending hard calls / blocked / contested / approvals. **Derived** projection over those items, but each `AttentionLedgerItem` carries a mutable `resolved_at` → persisted queue. UI surface → DOC20 (L4.4).

### L4.2 Operations / state primitives (NOT read-models)

These are state or actions, not projections — they need real schemas/commands.

- **`G-04` `KnownGoodState` (ACCEPT).** [Appx M] Named restorable checkpoint (artifact_version_refs[] + workspace_snapshot_ref + label). A saved **pointer set**, plus a `restore` **command** in the registry (D-14) with an idempotency key. Backs B-10 cancel, G-17 fork, G-02 restore.
- **`G-17` `TaskRunFork` (ACCEPT).** Fork a run from a `KnownGoodState`; the honest form of the declined ShadowWorkspace (§F-01) — carries `irrevocable_side_effects_at_fork[]` (an already-sent email stays sent). An **operation** + record.
- **`G-18` `ExplanationTrace` (ACCEPT).** Short human-readable causal trace emitted by every `CompiledRevisionStrategy`/`RevisionPlan` ("changed X because finding Y, preserving Z") → a small **new artifact**, `explanation_trace_ref` on the plan. Feeds G-15.
- **`G-19` `TaskReplay` (ACCEPT).** Deterministic replay primitive keyed to a run snapshot + `KnownGoodState`. An **operation**; underpins G-14/G-16.

### L4.3 DOC20 read-model surfaces (`DerivedReadModelRecord`, L1.6 — no new truth)

Each is a projection over existing truth; none is authoritative. Rendered as `DerivedReadModelRecord` with the noted `projection_kind` / `derived_from`. All matter-scoped (D-24); contestable items carry the G-11 contest affordance.

| Row | `projection_kind` | `derived_from` |
|---|---|---|
| `G-12` `WorkProductCertification` (highest-leverage) | `work_product_certification` | `TaskReliancePacket` (G-06) |
| `G-13` `FindingsInbox` | `findings_inbox` | canonical `EvaluationFinding`s (A-01/L1.3); matter-scoped (D-24) |
| `G-14` `RunDiff` | `run_diff` | run records + `BudgetNarrative` (G-05) |
| `G-15` `DecisionAuditView` | `decision_audit` | Pattern C chain (A-03) + routing (A-08) + the coordination trace |
| `G-16` `RunReplayPreview` | `run_replay_preview` | `TaskReplay` (G-19) + `KnownGoodState` (G-04) |
| `G-20` Unified Evaluation-Chain view | `evaluation_chain` | the chain registry (A-03); qualitative + quantitative side by side |
| `G-08` `TaskHealthCard` | `task_health` | run state + `BudgetNarrative` (G-05) + last `OutcomeEvaluationState` |
| `G-09` cost forecast (pre-run) | `cost_forecast` | shared `CostEstimate` (C-07) summed over packets/research/LLM (D-18 sub-totals); surfaced in G-01, reconciled by G-05 |
| `G-10` unified review entry | `review_home` | composes G-13 (triage) + G-02 (changes) + G-15 (why) |
| `G-11` contestability surfacing | `contestable_marker` | defeasible findings (FD §3.4); marks contestable findings/verdicts inline |

`G-08`/`G-10`/`G-11` are pure surfaces; `G-12`–`G-16`/`G-20` are user-facing surfaces over the artifacts above.

### L4.4 OP-A row — DOC20 UI surfaces

The contracts above are defined here (MOCKUP-READY: contracts tight, UI light); the **UI rendering is owned by DOC20**. One OP-A row:

```
OP-A → DOC20: render the §G surfaces — WorkProductCertification (G-12), FindingsInbox (G-13),
  RunDiff (G-14), DecisionAuditView (G-15), RunReplayPreview (G-16),
  Unified Evaluation-Chain view (G-20), TaskHealthCard (G-08), AttentionLedger/DecisionQueue (G-07),
  unified review home (G-10), contestability markers (G-11).
  All as DerivedReadModelRecord-backed surfaces (no new truth); matter-scoped per D-24;
  privilege/tier-filtered on read.
```

### G-21 → Layer 5

Chaos/concurrency fixtures (storage-full, malformed LLM output, mid-run privilege change, clock skew, parallel writes) back B-03/B-04/B-05/B-16 and D-01/D-24 — rendered in **Layer 5** (fixtures), where the runtime fixes gate on them passing.

### Layer 4 lints (registered in `LintRegistry`, L1.7)

```
reliance.packet_status_without_evidence_or_limitations   // G-06: reliance_status set with neither evidence refs nor a limitation list
read_model.authoritative_write_attempt                    // L1.6: any §G surface attempting a write to source-of-truth
ui.contestable_finding_without_affordance                 // G-11: a defeasible finding rendered without the contest control
attention_ledger.item_without_command_or_resolution       // G-07: a pending item with no command_ref and no resolved_at path
```

---

## Layer 5 — Fixtures (golden scenarios + chaos/concurrency)

Golden scenarios are **executable assertions**, not names. Each backs a specific row and states setup → expected outcome → the lint/gate it exercises.
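
To make "executable assertions, not names" concrete, here is a rough sketch of how the GS-10 zero-denominator scenario could be written as a runnable check. It is an illustration only: `evaluateRatio`, `RatioResult`, `ScenarioOutcome`, and `runGs10` are hypothetical names, and treating the denominator as the sample size is an assumption made for brevity.

```ts
// Hypothetical result type: a ratio either has a value or is explicitly indeterminate.
type RatioResult =
  | { kind: "value"; value: number; sample_size: number }
  | { kind: "indeterminate"; reason: "undefined_insufficient_data"; sample_size: number };

// Assumption: the denominator doubles as the sample size for this sketch.
function evaluateRatio(numerator: number, denominator: number, minSampleSize: number): RatioResult {
  if (denominator === 0 || denominator < minSampleSize) {
    return { kind: "indeterminate", reason: "undefined_insufficient_data", sample_size: denominator };
  }
  return { kind: "value", value: numerator / denominator, sample_size: denominator };
}

// Hypothetical outcome record for one scenario run.
interface ScenarioOutcome {
  fixture_id: string;
  passed: boolean;
  detail: string;
}

// Setup: a ratio metric with a zero denominator.
// Expected: an indeterminate result (not 0 or NaN) that still reports its sample size.
function runGs10(): ScenarioOutcome {
  const result = evaluateRatio(0, 0, 5);
  const passed = result.kind === "indeterminate" && result.sample_size === 0;
  return {
    fixture_id: "GS-10",
    passed: passed,
    detail: passed ? "zero denominator reported as indeterminate" : "unexpected result kind: " + result.kind,
  };
}

const gs10 = runGs10();
console.log(gs10.fixture_id, gs10.passed ? "PASS" : "FAIL", gs10.detail);
```

The point of the shape is that a scenario fails loudly when the expected outcome is not met, rather than existing only as a row in a table.
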
Fixture shape:

```ts
interface FixtureRecord {
  fixture_id: string;
  backs_rows: string[];
  setup: string;
  expected: string;
  asserts_lint?: string;
  gate: "slice_local" | "cross_layer" | "final_switchover";
  blocking_severity: "advisory" | "warning" | "blocker";
  schema_version: 1;
}
```

### L5.1 Golden scenarios (correctness fixes)

| ID | Backs | Setup → Expected | Lint |
|---|---|---|---|
| GS-01 | A-16 | A repair `OutcomeRepairInstruction` is wired to `instruction_in`/`data_in`/`context_in` → it executes **no** mutation; carried as advisory context; only a `revision_in` (or `revision_compatible=true` + capability) port mutates | `validation.repair_routed_to_non_revision_port` |
| GS-02 | D-01 | A `web_source` (`external_untrusted`) artifact is summarized by a sub-agent and consumed by the Revisor → the summary inherits `external_untrusted`; unless it passed a `SanitizationNode`, the Revisor treats it as data, not instruction | `validation.taint_not_inherited_through_summary`, `validation.source_kind_taint_unmapped` |
| GS-03 | B-24 | Revisor produces 3 consecutive outputs with the **same `FindingMatchKey`** unresolved → the 4th attempt is BLOCKED; `RevisorTerminationLedger.termination_reason = "loop_breaker"`; escalates per §6.7.3 | (extends F-LOOP-01) |
| GS-04 | D-13 | A `RunGuidanceItem` in `lifecycle_state: contested` is present at run start → every consumer skips it; a `RunGuidanceLifecycleReceipt` records the skip | `validation.run_guidance_consumed_while_contested` |
| GS-05 | D-04 | A Tier-0 `lookup_receipt` is cited as support for a reliance/trust claim → error; Tier 0 remains valid as a "lookup occurred" record | `validation.tier0_receipt_used_as_trust_basis` |
| GS-06 | D-24 | A forum post with `matter_ref = M1` is requested by a reader scoped to `M2` with no cross-matter grant → blocked | `validation.cross_matter_forum_leak` |
| GS-07 | A-02 | Pattern C: the Judge reads `evaluated_target`/`evaluation_basis` → present on `EvaluationResultEnvelope` (post-fix); chain resolves at read time | (read-time resolution) |
| GS-08 | B-08 | Build-time scan of V3.3.1 → exactly **one** `FailureKind` definition (§6.2 canonical; §0.4.9 references it); step-execution failures use `StepExecutionFailureKind` | `supersession.dual_running_type_family` |
| GS-09 | C-03 | A multi-step `rolling_hash_in_place` plan is compiled → the LLM emits **no** `predicted_post_hash`; the Dispatcher locks on `target_section_anchor_hash` and computes hashes at runtime; the plan does not crash | (no predicted hash in output contract) |
| GS-10 | C-01/C-06 | A ratio metric runs with a zero denominator → returns `indeterminate` (not 0/NaN), `CalibratedScore.sample_size` set; every score resolves a `FormulaSpec.formula_id` | `validation.criterion_weight_sum_zero` |
| GS-11 | G-06 | A `TaskReliancePacket` sets `reliance_status` → it carries either evidence refs or a non-empty `unresolved_limitations`; a packet with neither fails | `reliance.packet_status_without_evidence_or_limitations` |
| GS-12 | A-12 | A lane returns `lane_failed` while the outcome is otherwise satisfied → `OutcomeVerdict` is computed by the canonical lane→outcome mapping, not read off a single `EvaluationVerdict` | (verdict-level separation) |

### L5.2 Chaos / concurrency fixtures (`G-21` — the §B harness)

The runtime fixes **gate** on these passing (blocking).

| ID | Backs | Setup → Expected |
|---|---|---|
| CH-01 | B-03/B-09 | Storage fills mid-write → `WorkspaceWriteFailureKind` (`partial_artifact_written`); the head pointer is not advanced to a non-existent artifact; no orphan |
| CH-02 | B-08 | The Revision Compiler emits malformed/invalid plan JSON → classified as a `StepExecutionFailureKind` and **halts mechanically** (not surfaced as a low-severity finding) |
| CH-03 | D-01/D-24 | Privilege downgrades mid-run (matter access revoked) → `GovernanceEnvelope` re-evaluated against the new `policy_generation_id`; in-flight reads fail-closed |
| CH-04 | B-16 | Two events carry equal/inverted wall-clock `created_at` → a deterministic tiebreaker (not `created_at`) fixes a stable order |
| CH-05 | B-04/B-07 | Two revision steps target one artifact under the DAG → the precondition/rolling-hash check rejects the stale write; no lost update; parallel siblings not orphaned on one failure |
| CH-06 | B-05 | A revalidation cascade is triggered → it converges within the declared bound; the duplicated upstream-failure rule fires once |
| CH-07 | B-06 | A `pending_dependency` cycle is introduced → detected by `dag_acyclic` / topological cycle detection (B-25 verified mechanism); no deadlock |

### L5.3 Fixture-to-row matrix & gate

- **slice_local:** GS-07, GS-08, GS-10, GS-12. **cross_layer:** GS-01…GS-06, GS-09, GS-11, CH-01…CH-07.
- **Gate:** the runtime/concurrency fixes (B-03/B-04/B-05/B-06/B-07/B-09/B-16) and the compliance fixes (D-01/D-24) **do not ship** until CH-01…CH-07 pass; the safety fix (A-16) and taint fix (D-01) gate on GS-01/GS-02.
- All fixtures register their lints in `LintRegistry` (L1.7); none introduces new truth.

---

## New rows (added by this restructure)

Rows the restructure introduces or completes beyond the 126 consolidated-card items. Each traces to an existing decision or a named primitive; none is a free invention.
### NR.1 Layer-1 primitives as first-class rows (`TypeOwnerRegistry` entries)

The seven primitives get tracked rows so their ownership is explicit (not floating):

| Primitive | Owner doc | Status |
|---|---|---|
| `GovernanceEnvelope` (L1.1) | Common Contracts | new mixin |
| `TypeOwnerRegistry` + `SchemaReferenceValidationRule` (L1.2) | Common Contracts (governance) | new |
| `EvaluationFinding` event/record/view (L1.3) | Common Contracts §4.2 (A-01); shared w/ Review Studio §2 | replaces two schemas |
| `FindingMatchKey` (L1.4) | V3.3.1 (reuses §6.2 `FailureKind`) | new |
| `RevisorTerminationLedger` (L1.4) | V3.3.1 §6.7.3 | new (consolidates counters) |
| `FormulaEvaluationReceipt`/`MetricObservation`/`MetricRollup` (L1.5) | Common Contracts §9A | new (C-01 registry) |
| `DerivedReadModelRecord`/`ReadModelInvalidationSpec` (L1.6) | DOC20 / Common Contracts | new |
| `LintRegistry` (L1.7) | Common Contracts (governance) | new |

### NR.2 Reliance-decay (new)

Reliance is time- and dependency-bounded. **Fix:** `TaskReliancePacket` (G-06) gains `valid_until: ISO8601` + `decay_policy`; a `RelianceDecayCheck` (a `DerivedReadModelRecord`, L1.6, whose `declared_dependencies` are the packet's source/finding/policy refs) downgrades `reliance_status → "human_review_required"` when any underlying source/finding/policy has changed or aged past threshold. Lint: `reliance.packet_consumed_past_valid_until`.

### NR.3 Delegation (new)

A Hard Call or scoped task can be delegated (e.g., to a supervising attorney, or a sub-agent within scope).
**Fix:**

```ts
interface DelegationGrant {
  grant_id: string;
  delegator_ref: string;
  delegate_ref: string;
  scope: {
    decision_kinds: HardRevisionCallKind[];
    outcome_ids?: string[];
    matter_ref?: MatterRef;
  };
  authority_basis: EvaluationAuthorityBasis;
  expires_at: ISO8601;
  revocation_ref?: string;
  governance: GovernanceEnvelope;
  schema_version: 1;
}
// A delegated decision carries the delegator's access_tier; a delegated Hard Call resolution records grant_id.
```

Ties to `access_tier` (e.g., `supervising_attorney`) and surfaces in the `AttentionLedger` (G-07). Lints: `delegation.decision_outside_granted_scope`, `delegation.expired_grant_used`.

### NR.4 `D-19` / `D-20` completions (complete two §D standard rows)

- **`D-19`** (per-module cost attribution): add `cost_attribution_ref: TaskCostRecord` (C-07) on `SourceRecord` / module output, so cost rolls up per module. Closes the "no field" gap.
- **`D-20`** (library promotion gate): define `LibraryPromotionPolicy` — `promote_to_doc73_library_candidate` requires an EC `PolicyDecisionRef` (privilege/matter scope) **and** source tier ≥ 2 before promotion to the DOC73 library. Lint: `library.promotion_without_ec_policy`.

### NR.5 Grok §3.3 — feedback→prompt responsibility matrix

**Fix:** `FeedbackResponsibilityMatrix` maps each feedback kind → the component responsible for turning it into the next instruction: repair instruction → Revisor (via `revision_in`, A-16); strategic/scope change → Task Agent; ambiguous/contested → human (Gate-4). Prevents feedback from being dropped or double-handled. Lint: `feedback.without_responsible_owner`.

### NR.6 Grok §5.7 — sub-agent metric observation

**Fix:** sub-agents MUST emit `MetricObservation`s (L1.5) for cost/latency/outcome so metrics aggregate across the run; without it sub-agent work is invisible to the C-07 cost rollup and the §15 quality metrics. Contract: every sub-agent activation emits ≥1 `MetricObservation` keyed to its activation_seq.
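
The NR.6 contract (every activation has at least one observation keyed to its `activation_seq`) lends itself to a mechanical check. The sketch below is one possible shape under assumptions: `SubAgentActivation`, `MetricObservationStub`, and `lintSubAgentObservations` are hypothetical names, and the stub carries only the fields the check needs rather than the real L1.5 `MetricObservation` schema.

```ts
// Hypothetical minimal records; the real L1.5 schemas carry more fields.
interface SubAgentActivation {
  activation_seq: number;
  sub_agent_ref: string;
}

interface MetricObservationStub {
  activation_seq: number;
  metric_name: string; // e.g. "cost", "latency", "outcome" (illustrative values)
}

interface SubAgentLintFinding {
  lint_id: "subagent.work_without_metric_observation";
  activation_seq: number;
  sub_agent_ref: string;
}

// Flags every activation that has no observation keyed to its activation_seq.
function lintSubAgentObservations(
  activations: SubAgentActivation[],
  observations: MetricObservationStub[]
): SubAgentLintFinding[] {
  const observed: Record<number, boolean> = {};
  for (const obs of observations) {
    observed[obs.activation_seq] = true;
  }
  return activations
    .filter((a) => !observed[a.activation_seq])
    .map((a): SubAgentLintFinding => ({
      lint_id: "subagent.work_without_metric_observation",
      activation_seq: a.activation_seq,
      sub_agent_ref: a.sub_agent_ref,
    }));
}

// Made-up run: activation 2 emitted nothing, so it is invisible to any rollup.
const activations: SubAgentActivation[] = [
  { activation_seq: 1, sub_agent_ref: "researcher" },
  { activation_seq: 2, sub_agent_ref: "summarizer" },
  { activation_seq: 3, sub_agent_ref: "researcher" },
];
const observations: MetricObservationStub[] = [
  { activation_seq: 1, metric_name: "cost" },
  { activation_seq: 1, metric_name: "latency" },
  { activation_seq: 3, metric_name: "cost" },
];
const subAgentFindings = lintSubAgentObservations(activations, observations);
console.log(subAgentFindings);
```
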
Lint: `subagent.work_without_metric_observation`.

### NR.7 Grok §5.9 — BoardDigest ResearchNeed import

**Fix:** the Run Board's `BoardDigest` (D-11) imports open `ResearchNeed`s (D-05/D-13) as a `research_needs_open[]` projection (a `DerivedReadModelRecord`, matter-scoped per D-24) so research gaps surface on the board rather than staying buried in the workspace.

### NR — citation/scope fixes carried

- **B-24 is Grok §4.9** (not §4.10).
- **B-14** also removes `task_agent_decides` (ChatGPT) — folded into the B-14 standard row.

---

## R0.4 audit additions (completeness-audit fold-ins, new rows, OP-A, parked, confirmed divergences)

A source-based completeness audit of this card against the three card-reviews (1,182 extracted findings; per-row cross-check plus a deterministic name/lint check against **both** this card and the consolidated card) confirmed the structural layer was faithful and surfaced a detail layer the card-reviews added atop the consolidated card. Accepted results are folded in here and rendered inline in Appendix R.
### Audit fold-ins (R0.4 deepening) — rendered inline in Appendix R

| # | Row | Fold-in | In |
|---|---|---|---|
| 1 | C-07 | `CostVector` / `CostDimension` / `CostCorrelationPolicy` (quantile-summing) | R.3 |
| 2 | G-04 | `KnownGoodCheckpoint` hardening (policy + capability snapshot; irreversible-restore lint) | R.7 |
| 3 | G-02 | `RevisionReviewDecisionReceipt` + review lints (resolves the inline-action divergence) | R.7 |
| 4 | G-19 | `TaskReplayRequest` / `ReplayDivergenceRecord` + replay lints | R.7 |
| 5 | A-02/A-04 | `EvaluationChainResolutionReceipt` | R.1 |
| 6 | B-16 | `PlanSelectionTiebreakPolicy` (epsilon band on `plan_risk_score`) + concurrency lints — corrects the `tiebreaker_epsilon` removal | R.2 |
| 7 | B-03 | `SourceWorkspaceOperation`(+`Receipt`) write-precondition / lock-lease API | R.2 |
| 8 | B-05/B-06 | `RevalidationCascadeRun` + `OutcomeDependencyGraphPolicy` (bounded cascade) | R.2 |
| 9 | D-06 | `ClaimSupportDerivationPolicy`/`Receipt` (+ `SourceEvidenceAnchor`) | R.4 |

### New rows (in-scope, R0.4)

**NR.8 — `ReceiptCoverageRegistry`** — which operations MUST emit a receipt; closes the gap where receipts exist per-feature but nothing enforces coverage.

```ts
interface ReceiptCoverageRegistryEntry {
  operation_kind: string;        // "revision_step" | "workspace_op" | "review_decision" | "formula_eval" | ...
  required_receipt_type: string; // the receipt schema that must be emitted
  enforcement: "blocking" | "warning";
}
// lint: validation.operation_without_required_receipt
```

**NR.9 — `PlanVerifierRequirementPolicy`** — when a `RevisionPlan` must pass verification before execution.
```ts
interface PlanVerifierRequirementPolicy {
  requires_verification_when: ("multi_step" | "irreversible_effect" | "in_place_mutation" | "cross_artifact")[];
  verifier: "dag_acyclic_plus_precondition" | "human_gate_4";
  on_verification_fail: "block" | "downgrade_to_advisory";
}
// lint: validation.plan_executed_without_required_verification
```

### Cross-doc obligation (OP-A)

**OBL-DOC24-CTXPKT-01** — `ContextPacketFidelityContract` + `TaskKnowledgePackFreshnessPolicy`: the context packet a task run receives must declare a fidelity contract (what was included/omitted vs. the source set) and a freshness policy (when a knowledge pack is stale and must be re-pulled). **Owner: DOC24** (context assembly/delivery); referenced by Addenda B at the task-run boundary. Routed to OP-A — not rendered as a DOC23 schema. DOC24-owned lints: `validation.context_packet_without_fidelity_contract`, `validation.task_knowledge_pack_consumed_past_freshness`.

### Parked — OPEN_FOR_ARCHITECT_REVIEW (in/out ruling pending; not operative)

- **`WorkspaceExternalizationPolicy`/`Receipt`** — externalize large workspace contents to storage under a cost cap.
- **`ExternalSourceQueryPolicy`/`Receipt`** — govern external (web/API) source queries (authority signal, rate, receipt).
- **`TaskAgentForumSurfaceOwnership`** — who owns the forum surface a task agent posts to (may fold into B-15 / DOC12).

### Confirmed divergences (reviewed; intended)

- **A-02 → HIGH, not CRITICAL** (2–3 reviewers said CRITICAL): a correctness fix with no safety/compliance/data-loss dimension; the silent-wrong-verdict concern is covered by the escalation path plus the new A-02/A-04 chain-resolution receipt.
- **A-05 blocking reframed**: both basis arrays kept (closes the lossy-collapse build-blocker); the blocking role moves to severity + an action-permission gate on publishing/sending, not a typed `BlockingAuthorityEvaluation`.
- **Row-merges declined**: A-08+B-23 and B-05+B-06+B-20 stay distinct rows (shared mechanisms rendered; row identity preserved for traceability).
- **Admission gate** is a one-time pre-write `TypeOwnerRegistry` check, not a standing object.

---

## Appendix R — Full row renderings (self-contained reference)

This appendix renders every adjudicated row in full so the card is self-contained for red-team and for drafting the next spec version in a fresh context. Rows **restructured in the Layers above** show a pointer (the Layer rendering governs); **standard / held / declined / §G** rows are rendered in full from the adjudicated bodies. The nine **audit fold-ins (R0.4)** surfaced by the completeness audit are marked inline.

### R.1 — §A rows

**A-01** → rendered in Layer 1 (that rendering governs).

**A-02** → rendered in Layer 3 core (that rendering governs).

**Audit fold-in (R0.4) — chain-resolution receipt** (DEEP §4 / ChatGPT §7), pairs with A-04. `EvaluationChainResolutionPolicy` (A-04) decides the route; the receipt records HOW a chain resolved at read time so Pattern C is inspectable.

```ts
interface EvaluationChainResolutionReceipt {
  chain_id: string;
  resolved_target_ref: string; // the evaluated_target the Judge read (A-02)
  resolution_basis: "single" | "pattern_c_parallel" | "policy_default";
  policy_ref: string;
  resolved_at: ISO8601;
  schema_version: 1;
}
// lint: validation.chain_resolved_without_receipt
```

### A-03 — [Common Contracts §3.7] Pattern C chain-ID lifecycle + chain registry (ACCEPT)

- **Raised by:** Claude **B3**; ChatGPT CG-S2; Grok GK-3.6 / GK-4.8.
- **Problem:** §5.18.4 / §3.7 say the upstream Evaluator populates `target_evaluation_chain_id` and the Judge copies it, but nothing specifies: (1) who generates the UUID and whether it's minted even when no Pattern C Judge attaches; (2) what happens when it doesn't resolve at the consumer; (3) retention/GC; (4) whether re-activations share an ID.
The field has no lifecycle, so audit reconstruction only works when Pattern C wiring happens to be present.
- **Disposition:** **ACCEPT.** Evaluator MUST mint a fresh **ULID** at activation and emit it; the Pattern C Judge MUST read it from `evaluator_output_in.target_evaluation_chain_id` and set its own envelope to the same value; orphan envelopes keep the field unused; retention = envelope retention; add `validation.pattern_c_chain_id_mismatch`. Back it with a chain registry so chain status is first-class.
- **Fix (Common Contracts §3.7) — `[Appx E]`:**

```ts
interface EvaluationChainRegistryRecord {
  chain_id: string;
  chain_kind: EvaluationChainKind;
  task_id: string;
  run_id: string;
  target_artifact_ref: StorageRef | null;
  target_artifact_version_ref: StorageRef | null;
  target_scope_ref: ArtifactScopeRef | null;
  evaluation_snapshot_ref: StorageRef;
  expected_producers: ProducerKind[];
  received_result_ids: string[];
  status: EvaluationChainStatus;
  created_at: ISO8601;
  completed_at?: ISO8601;
  superseded_by_chain_id?: string;
  validation_failures: Array<
    | "chain_id_missing"
    | "chain_target_mismatch"
    | "chain_stale_snapshot"
    | "chain_ambiguous"
    | "chain_consumer_timeout"
  >;
  schema_version: "1.0";
}
```

**A-04** → rendered in Layer 3 core (that rendering governs).

**A-05** → rendered in Layer 1.3 / severity reframe (that rendering governs).

### A-06 — [V3.3.1 §0.4.1 / Common Contracts §3.1–3.2] `OutcomeEvaluationState` count & mapping skew (ACCEPT)

- **Raised by:** Claude **B2**; ChatGPT CG-S10/S11/S12.
- **Problem:** V3.3.1 enumerates **15** `OutcomeEvaluationState` values; Common Contracts §3.1 calls it a "14-value enum"; §3.2's verdict mapping covers 14 — `evaluating` is unmapped — and `max_iterations_reached` is referenced elsewhere but absent from the enum. The document miscounts its own enum.
- **Disposition:** **ACCEPT.** Split **runtime** states (4 — never emitted to an envelope) from **disposition** states (12 — emitted, each with a verdict mapping); add `max_iterations_reached`; make a single matrix the source of truth for terminal/verdict/feedback-branch/UI-label/learning-eligibility per state.
- **Fix (V3.3.1 §0.4.1 + Common Contracts §3.2) — `[Appx B]`:**

```ts
type OutcomeEvaluationRuntimeState = "pending" | "pending_dependency" | "evaluating" | "dirty";
type OutcomeEvaluationDisposition =
  | "satisfied" | "needs_revision" | "needs_information" | "needs_verification"
  | "needs_human_judgment" | "unable_to_evaluate" | "blocked_by_policy" | "regressed"
  | "upstream_failure" | "unrecoverable" | "superseded" | "max_iterations_reached";
type OutcomeEvaluationState = OutcomeEvaluationRuntimeState | OutcomeEvaluationDisposition;
// + OUTCOME_STATE_MATRIX (Appx B): per-state {state_class, emitted_to_envelope, terminal,
//   verdict_mapping, feedback_branch, blocks_downstream_default, revisor_default_action,
//   ui_label, learning_eligibility} — this matrix is normative; §3.2 references it rather than re-listing.
```

- **Rationale:** runtime states (`pending`/`evaluating`/`dirty`/`pending_dependency`) explicitly carry `emitted_to_evaluation_envelope:false`, which is exactly why `evaluating` was never mapped — it should never reach the verdict mapping at all.

**A-07** → rendered in Layer 3 core (that rendering governs).

### A-08 — [FD §6.2] Routing completeness: missing branches + full state→routing matrix (ACCEPT)

- **Raised by:** Claude **B4**; ChatGPT CG-S15/S16.
- **Problem:** `FeedbackRoutingPolicy` has `on_satisfied`/`on_needs_revision`/`on_needs_more_sources`/`on_needs_source_verification`/`on_needs_format_repair`/`on_repeated_failure` but **no** `on_indeterminate`, `on_not_applicable`, `on_unrecoverable`, `on_blocked_by_policy`, `on_upstream_failure` — and indeterminate is not rare (5 states map to it). It also can't represent multiple simultaneous actions.
- **Disposition:** **ACCEPT.** Adopt the closed `FeedbackBranch` set; the state→branch routing is the matrix's `feedback_branch` column (A-06), so routing and state stay in lockstep. Delivery/consumption receipts come from Appendix F.
- **Fix (FD §6.2) — `[Appx B]` branch set + `[Appx F]` receipts:**

```ts
type FeedbackBranch =
  | "on_satisfied" | "on_needs_revision" | "on_needs_more_sources" | "on_needs_source_verification"
  | "on_needs_human_judgment" | "on_blocked_by_policy" | "on_upstream_failure" | "on_unrecoverable"
  | "on_repeated_failure" | "none";
// FeedbackRoutingPolicy keys MUST cover every FeedbackBranch; the active branch is taken
// from OUTCOME_STATE_MATRIX[state].feedback_branch. Delivery + consumption receipts: Appx F.
```

### A-09 — [Common Contracts §4.2 / V3.3.1 §5.17.7] Phantom & misplaced schema refs + TypeOwnerRegistry (ACCEPT)

- **Raised by:** Claude **B5 + B6**; ChatGPT (ref findings).
- **Problem:** Four ownership errors. (1) V3.3.1 §5.17.7 says `ClaimSetBundle`/`ExtractedEvaluationUnit` live in Common Contracts, but §1.2 explicitly puts them **out of scope** (owner = Addenda A). (2) Common Contracts §4.2 says `ResearchNeed` lives in Core — it's in **Source Workspace §6.2**. (3) Same §4.2 says `OutcomeRepairInstruction` lives in Core — it's in **FD §5.2**. (4) `EvaluationAffirmation` is referenced but **defined nowhere** (phantom). A coding agent following these refs looks in the wrong document four times.
- **Disposition:** **ACCEPT.** Install a `TypeOwnerRegistry` as the single ownership source and fix the four pointers. **Decide `EvaluationAffirmation`: DELETE** it from the qualitative slice — its real need (the positive/"what the artifact got right" counterpart) is met by the verdict-aware `OutcomeEvaluationSignal` denominator, which is **Phase-B-gated** (see E-05), not by a phantom finding type.
- **Fix (Common Contracts, new governance section) — `[Appx A]`:**

```ts
// TypeOwnerRegistry: one entry per shared type; validation.type_owner_drift.* on mismatch.
// Canonical homes set by this registry:
//   EvaluationResultEnvelope → Common Contracts §3 (canonical)
//   EvaluationFinding → Common Contracts §4.2A v2.0 (canonical; projection: FeedbackFindingView)
//   OutcomeRepairInstruction → FD §5 (canonical) ← fixes ref (3)
//   ResearchNeed → Source Workspace §6 (canonical) ← fixes ref (2)
//   ClaimSetBundle / ExtractedEvaluationUnit → Addenda A (imported) ← fixes ref (1)
//   CostEstimate → Common Contracts §9A (canonical)
// V3.3.1 §5.17.7: cross-ref ExtractedEvaluationUnit/ClaimSetBundle → Addenda A; keep only ArtifactScopeRef → Common Contracts §7.
// Common Contracts §4.2: EvaluationAffirmation → REMOVED from QualitativeSlice (see E-05).
```

- **⚠verify:** confirm `EvaluationAffirmation` is truly absent before deleting (review grep reports zero hits — low risk).

### A-10 — [Common Contracts §3 / §11] `EvaluationArtifactEnvelope` wrapper out-of-document; `overall_state` too B-specific (ACCEPT WITH MODIFICATIONS)

- **Raised by:** ChatGPT CG-S22, CG-AA4.
- **Problem:** the wrapper that governs envelopes is referenced from outside the set, and `overall_state` is shaped for Addenda B, so other producers (e.g. the Addenda A Experiment) can't reuse it.
- **Disposition:** **ACCEPT WITH MODIFICATIONS.** Bring `EvaluationArtifactEnvelope` into Common Contracts as the universal governance wrapper (the Addenda A coordination rows R199/R213 already adopt it on that side); generalize `overall_state` to the canonical disposition set (A-06) so every producer maps onto one vocabulary.
- **Fix:** TypeOwnerRegistry entry (`EvaluationResultEnvelope`/`EvaluationArtifactEnvelope` canonical in Common Contracts) + `overall_state: OutcomeEvaluationDisposition` (from A-06) replacing the B-specific union.

**A-11** → rendered in Layer 2 (flip) (that rendering governs).
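
As a rough illustration of how the A-09 `TypeOwnerRegistry` could be checked mechanically, the sketch below compares the owner a document claims for a shared type against the registry entry. Only the `validation.type_owner_drift.*` prefix and the owner assignments come from A-09; the entry fields (`type_name`, `canonical_owner`, `status`), the `OwnerClaim` shape, the `checkOwnerClaims` helper, and the `.wrong_owner` / `.unregistered` suffixes are assumptions made for this example.

```ts
// Hypothetical entry shape: the card names TypeOwnerRegistry but does not render its fields here.
interface TypeOwnerRegistryEntry {
  type_name: string;
  canonical_owner: string;
  status: "canonical" | "imported" | "pending_absorption";
}

// Hypothetical record of "document X says type T lives in owner Y".
interface OwnerClaim {
  type_name: string;
  claimed_owner: string;
  claimed_in: string;
}

const registry: TypeOwnerRegistryEntry[] = [
  { type_name: "OutcomeRepairInstruction", canonical_owner: "FD §5", status: "canonical" },
  { type_name: "ResearchNeed", canonical_owner: "Source Workspace §6", status: "canonical" },
  { type_name: "ClaimSetBundle", canonical_owner: "Addenda A", status: "imported" },
];

// Returns one failure code per claim that disagrees with, or is missing from, the registry.
function checkOwnerClaims(entries: TypeOwnerRegistryEntry[], claims: OwnerClaim[]): string[] {
  const owners = new Map<string, string>();
  for (const e of entries) owners.set(e.type_name, e.canonical_owner);

  const failures: string[] = [];
  for (const c of claims) {
    const owner = owners.get(c.type_name);
    if (owner === undefined) {
      // Phantom reference: the type has no registry entry at all.
      failures.push(`validation.type_owner_drift.unregistered:${c.type_name}@${c.claimed_in}`);
    } else if (owner !== c.claimed_owner) {
      // Misplaced reference: the document points at the wrong owner.
      failures.push(`validation.type_owner_drift.wrong_owner:${c.type_name}@${c.claimed_in}`);
    }
  }
  return failures;
}

const failures = checkOwnerClaims(registry, [
  { type_name: "ResearchNeed", claimed_owner: "Core", claimed_in: "Common Contracts §4.2" },
  { type_name: "OutcomeRepairInstruction", claimed_owner: "FD §5", claimed_in: "FD §5.3" },
  { type_name: "EvaluationAffirmation", claimed_owner: "Common Contracts", claimed_in: "Common Contracts §4.2" },
]);
console.log(failures);
```

In this sketch the misplaced `ResearchNeed` pointer and the phantom `EvaluationAffirmation` reference each produce a failure, while the correct `OutcomeRepairInstruction` claim passes silently.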
**A-12** → rendered in Layer 3 core (that rendering governs).

### A-13 — [Core §9 vs Common §5] Core duplicates the learning-signal envelope with malformed TypeScript (ACCEPT)

- **Raised by:** ChatGPT CG-S21.
- **Problem:** Core re-declares the common `EvaluationLearningSignalEnvelope`, and the duplicated TypeScript is malformed — two sources of truth, one of them broken.
- **Disposition:** **ACCEPT.** Single canonical envelope in Common Contracts §5; Core §9 **imports** it and deletes its copy.
- **Fix:** TypeOwnerRegistry: `EvaluationLearningSignalEnvelope` canonical = Common Contracts §5; add normative line to Core §9 — "imports `EvaluationLearningSignalEnvelope` from Common Contracts §5; MUST NOT redeclare." Remove the malformed Core block.

### A-14 — [Common Contracts §4.2 / validation] Qualitative-slice owner map wrong; slice-required check only a warning (ACCEPT)

- **Raised by:** ChatGPT CG-S20; Grok GK-4.8.
- **Problem:** two issues — the qualitative-slice owner map is wrong (same root as A-09), and `validation.envelope_judge_emitted_qualitative_slice` is only a **warning** even though Pattern C **requires** the slice, so a non-conforming Pattern C envelope passes validation.
- **Disposition:** **ACCEPT.** Owner map fixed by the TypeOwnerRegistry (A-09); promote the validation to an **error** when `chain_kind = pattern_c_evaluator_then_judge`.
- **Fix:**

```ts
// validation.envelope_judge_emitted_qualitative_slice:
//   severity = "error" WHEN chain_kind === "pattern_c_evaluator_then_judge"
//   severity = "warning" otherwise
```

### A-15 — [Common Contracts §11.5 / cross-doc obligations] "Build-ready" overstated; command registry not mandatory; compat claim too strong (ACCEPT WITH MODIFICATIONS)

- **Raised by:** ChatGPT CG-C4/C6/C7; Grok GK-4.4; Claude **D5/D22**.
- **Problem:** the set is labeled build-ready while undefined imported types and pending cross-doc obligations remain; the command registry (Core routes/commands) is described but **not mandatory**; the §11.5 backward-compat claim overstates stability; section-number cross-refs are used where stable anchors are needed before the DOC23 R3.2 absorption.
- **Disposition:** **ACCEPT WITH MODIFICATIONS.** Gate the "build-ready" status on (a) TypeOwnerRegistry populated with no `pending_absorption` blockers, and (b) OP-A obligations closed; make the command registry **mandatory** (Appendix I); soften §11.5 to "compatible within the locked schema-of-record set, no cross-version guarantee yet"; convert section-number cross-refs to stable anchors as pre-absorption hygiene.
- **Fix:** prose obligations above + Appendix I (command registry, see §D-14 next pass) + anchor-stabilization rule applied across the set before R3.2 absorption.

---

**A-16** → rendered in Layer 3 core (that rendering governs).

### A-17 — [FD §5.3 / V3.3.1 §9] Revisor input confused with `revision_in` (ACCEPT)

- **Raised by:** ChatGPT (BUG/HIGH).
- **Problem:** FD says the Revisor consumes `OutcomeRepairInstruction` via `revision_in`. But V3.3.1 treats `revision_in` as the port on **revision-capable modules that perform repairs**; the Revisor is the **planner** that compiles repair instructions into a plan. Wiring the planner's input as `revision_in` reverses the architecture and risks an implementer building the Revisor as a revision *target*.
- **Disposition:** **ACCEPT.** Give the Revisor planner-appropriate input ports; reserve `revision_in` for artifact-mutating modules.
- **Fix (FD §5.3 + V3.3.1 §9):**

```ts
// Revisor (planner) input ports — NOT revision_in:
//   feedback_bundle_in, repair_instruction_in, evaluation_result_in
// revision_in is reserved for revision-capable (artifact-mutating) modules that EXECUTE the compiled plan.
// validation.revisor_declares_revision_in (error).
```

### A-18 — [FD §1.2 / §2.3] Feedback-bundle emission discipline contradicts itself (ACCEPT)

- **Raised by:** ChatGPT (BUG/HIGH).
- **Problem:** FD says both the envelope and the feedback bundle are emitted by every evaluator producer, then says deterministic scorers emit a bundle only on failure and pass cases may emit envelope-only. So a missing bundle is ambiguous: not emitted by design, lost, not applicable, or policy-filtered — a consumer can't tell which.
- **Disposition:** **ACCEPT.** Add an emission matrix by producer-kind × verdict; require either a bundle or an explicit absence reason.
- **Fix (FD §1.2 / §2.3):**

```ts
interface FeedbackEmissionMatrixEntry {
  producer_kind: "outcome_evaluator" | "judge" | "deterministic_scorer";
  verdict: EvaluationVerdict;
  emits_envelope: boolean;
  emits_feedback_bundle: "always" | "on_failure_only" | "never";
}
type FeedbackBundleAbsentReason = "not_emitted_by_design" | "not_applicable" | "policy_filtered";
// Every result MUST carry a feedback bundle OR a feedback_bundle_absent_reason. validation.feedback_bundle_absence_unexplained (error).
```

### A-19 — [V3.3.1 §0.4.4 / §5.7.1] `FindingState` pass-through `proposed` has no negative exit (ACCEPT) — Claude C2

- **Raised by:** Claude C2 (BUG/HIGH).
- **Problem:** the §5.7.1 transition table gives `proposed → active` as the only outbound transition from `proposed`. There is no `proposed → dismissed` for a finding the Evaluator decides **not** to confirm, so unconfirmed findings sit in `proposed` forever or implementations invent an unlisted transition.
- **Disposition:** **ACCEPT.** Add the negative exit + an auto-dismiss at activation termination. (The canonical `FindingState` from A-01 already includes `dismissed`; this supplies the missing *transition*.)
- **Fix (V3.3.1 §5.7.1):**

```
add transition: proposed → dismissed
  predicate: "Evaluator did not confirm the finding before activation completion"
A finding still in `proposed` at activation termination auto-transitions to:
  state = dismissed, dismissal_reason = "not_confirmed_at_termination".
```

### A-20 — [Common Contracts §6.4 / §9.4] `unanchored_llm_judgment` acknowledgment has no field to write into (ACCEPT) — Claude C6

- **Raised by:** Claude C6 (BUG/MEDIUM).
- **Problem:** §9.4 requires explicit user acknowledgment when a required criterion is scored by `unanchored_llm_judgment`, but no schema has a field to store that acknowledgment — so the warning fires every time (loud) or the ack lands in an undocumented field. (Partially related to C-01's `unanchored_llm_judgment_policy`, which governs *aggregation eligibility*, not the *acknowledgment record*.)
- **Disposition:** **ACCEPT.** Add the acknowledgment fields to `Criterion`.
- **Fix (Common Contracts, `Criterion`):**

```ts
// add to Criterion:
unanchored_aggregation_acknowledged_by_user: boolean; // default false
unanchored_ack_user_ref?: string;
unanchored_ack_at?: ISO8601;
// Warning fires WHEN scoring_basis == "unanchored_llm_judgment" && required == true
//   && unanchored_aggregation_acknowledged_by_user == false; silences once acknowledged.
```

### A-21 — [FD §2.2 / §3.3] Optional fields are required by invariants (ACCEPT — largely satisfied by A-01)

- **Raised by:** ChatGPT (BUG/MEDIUM).
- **Problem:** FD says `findings[i].based_on_artifact_version_ref` must resolve to a version in the snapshot, but the field is optional — so validators can't enforce the invariant consistently.
- **Disposition:** **ACCEPT.** Already addressed structurally by A-01's canonical `EvaluationFinding`, which carries both `based_on_artifact_version_ref` and `based_on_artifact_version_absent_reason`. Add the enforcing validation.
- **Fix:**

```
// Satisfied by A-01 canonical EvaluationFinding. Add:
// validation.artifact_targeted_finding_missing_version_ref (error): finding_kind targets an artifact AND
//   both based_on_artifact_version_ref and based_on_artifact_version_absent_reason are null.
```

### A-22 — [Common Contracts §12] Pending consumers lack degraded-mode behavior (ACCEPT)

- **Raised by:** ChatGPT (GAP/HIGH).
- **Problem:** Common Contracts lists pending target updates to DOC8/BDSM, EC Core, DOC20, DOC72, PropA, etc. Until those land, producers can emit envelopes/signals **no consumer can interpret**, with no defined behavior.
- **Disposition:** **ACCEPT.** Each pending cross-doc obligation declares an "until-target-lands" behavior, recorded on its TypeOwnerRegistry entry (`status: pending_absorption`, A-09).
- **Fix:**

```ts
type PendingConsumerDegradedBehavior = "persist_only" | "suppress_promotion" | "disable_ui_affordance" | "emit_validation_warning" | "block_route";
// Every TypeOwnerRegistryEntry with status "pending_absorption" carries pending_consumer_behavior: PendingConsumerDegradedBehavior.
```

- **Rationale:** also feeds A-15 — "build-ready" can't be claimed while any pending obligation lacks a declared degraded behavior.

**A-23** → rendered in Layer 3 core (that rendering governs).

### A-24 — [Common Contracts §3.4 vs V3.3.1 §5.18.8] Parallel Judge+Evaluator example contradicts topology (ACCEPT) — Claude C14

- **Raised by:** Claude C14 (BUG/LOW).
- **Problem:** §3.4's example ("Judge and Evaluator running in parallel on the same artifact version") fits no specified topology — Pattern C has the Judge *consume* the Evaluator's output, so they can't run in parallel.
- **Disposition:** **ACCEPT.** Fix the example.
- **Fix:** replace the §3.4 example with "two Evaluator activations on the same snapshot during an Experiment, or an Evaluator and a deterministic scorer in parallel" — or specify the genuine parallel topology. Removes a wiring trap.
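
The A-22 rationale (no "build-ready" claim while a pending obligation lacks a declared degraded behavior) lends itself to a simple mechanical check. The following is a minimal sketch under stated assumptions: the `RegistryEntry` shape, the `"canonical"` status value, the `undeclaredPendingConsumers` helper, and the example type names are hypothetical; only `status: "pending_absorption"`, `pending_consumer_behavior`, and the `PendingConsumerDegradedBehavior` union come from A-22. It covers the A-22 condition only, not the OP-A closure that A-15 also requires.

```ts
type PendingConsumerDegradedBehavior =
  | "persist_only" | "suppress_promotion" | "disable_ui_affordance"
  | "emit_validation_warning" | "block_route";

// Hypothetical minimal registry entry; the real TypeOwnerRegistryEntry may carry more fields.
interface RegistryEntry {
  type_name: string;
  status: "canonical" | "pending_absorption";
  pending_consumer_behavior?: PendingConsumerDegradedBehavior;
}

// Lists the pending entries that have not declared an "until-target-lands" behavior.
function undeclaredPendingConsumers(entries: RegistryEntry[]): string[] {
  return entries
    .filter((e) => e.status === "pending_absorption" && e.pending_consumer_behavior === undefined)
    .map((e) => e.type_name);
}

// Example type names are placeholders, not types from the spec set.
const exampleEntries: RegistryEntry[] = [
  { type_name: "ExampleTypeA", status: "canonical" },
  { type_name: "ExampleTypeB", status: "pending_absorption", pending_consumer_behavior: "persist_only" },
  { type_name: "ExampleTypeC", status: "pending_absorption" },
];

const blockers = undeclaredPendingConsumers(exampleEntries);
const mayClaimBuildReady = blockers.length === 0;
console.log(blockers, mayClaimBuildReady);
```

Returning the offending type names, rather than a bare boolean, keeps the result usable as a review checklist.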
### A-25 — [Common Contracts §3.1 vs V3.3.1 §5.1] `evaluation_chain_id` naming asymmetry (ACCEPT — fold into A-03) — Claude D8

- **Raised by:** Claude D8 (BUG/LOW).
- **Problem:** the field is `target_evaluation_chain_id` on the envelope but referred to as `evaluation_chain_id` in V3.3.1 §5.1 prose — same concept, two names.
- **Disposition:** **ACCEPT.** Normalize to `target_evaluation_chain_id` everywhere (fold into the A-03 chain-lifecycle work). `validation.chain_id_name_drift` (lint).

---

**A-26** → rendered in Layer 1 (that rendering governs).

### A-27 — [V3.3.1 §5.16 / Source Workspace] `EvaluationSnapshot` source-workspace hashes typed as artifact hashes (ACCEPT)

- **Raised by:** ChatGPT Audit Addendum (BUG/HIGH).
- **Problem:** `EvaluationSnapshot.source_workspace_head_hashes` is typed `Record`, but the snapshotted values are Source Workspace heads, source records, source sets, research-need queues, and verification records — **not** artifact refs. The invalid type model breaks audit, revalidation, live-edit checks (B-03), and source-freshness detection. (Distinct from B-03, which is the *precondition*; this is the *type* that precondition compares.)
- **Disposition:** **ACCEPT.** Replace with a workspace-native snapshot-hash structure.
- **Fix (`[Appx, addendum]`):**

```ts
interface SourceWorkspaceSnapshotHashSet {
  source_workspace_ref: string;
  source_workspace_head_hash: string;
  source_record_hashes: Record;
  source_set_hashes?: Record;
  research_need_queue_hash?: string;
  verification_record_hashes?: Record;
  freshness_record_hashes?: Record;
  run_guidance_hashes?: Record;
  schema_version: "1.0";
}
// EvaluationSnapshot: REMOVE source_workspace_head_hashes: Record;
//   ADD source_workspace_state_ref: StorageRef + source_workspace_snapshot_hashes: SourceWorkspaceSnapshotHashSet[].
```

- **Rationale:** this is the concrete `SnapshotHash` type that B-03's `WorkspaceWritePrecondition.expected_snapshot_hash` and D-12 retention compare against — without it, "compare the snapshot hash" has no well-typed object.

---

---

# §B. Runtime / concurrency / distributed-systems

### R.2 — §B rows

### B-01 — [V3.3.1 §11 / Core §runtime] No unified RevisionExecutionLifecycle / dispatcher↔receipts state machine (ACCEPT)

- **Raised by:** Grok GK-4.1 / GK-5.6; ChatGPT (runtime).
- **Problem:** the execution lifecycle (plan compiled → dispatched → step receipts → revalidation → terminal) is specified in pieces across §11.x with no single owning state machine. A coding agent assembles the order from prose and the dispatcher↔receipt handshake is implicit.
- **Disposition:** **ACCEPT.** Define one normative `RevisionExecutionLifecycle` state machine in V3.3.1 §11 (states + legal transitions + which receipt drives each transition), anchored by the Appendix M primitives (`PendingDependencyInfo`, `HardCallPendingPolicy`) and the cancel/known-good types below.
- **Fix:** add a §11.0 lifecycle FSM listing states `{plan_compiled, dispatched, step_running, step_receipt_received, revalidating, regression_detected, terminal_satisfied, terminal_failed, cancelled, hard_call_pending}` and the receipt that triggers each edge; every other §11.x rule references a state in this FSM rather than re-describing flow.

**B-02** → rendered in Layer 3 core (that rendering governs).

### B-03 — [Source Workspace §x / V3.3.1 §11.20] Concurrent-write safety on the Source Workspace (ACCEPT) ⚠verify

- **Raised by:** Gemini **F-01** (#1 build blocker) + D-05; ChatGPT CG-SW9; relates to Claude B9.
- **Problem:** a `direct_fix` mutation can race a live UI edit (or a parallel module write) to the same workspace record with no precondition check, silently corrupting it; no transaction boundaries or read/write locks are specified.
- **Disposition:** **ACCEPT.** Mandate a snapshot-hash precondition on every in-place workspace write; on mismatch, abort and re-evaluate rather than overwrite. Add explicit lock semantics.
- **Fix (Source Workspace write path):**

```ts
interface WorkspaceWritePrecondition {
  target_record_ref: StorageRef;
  expected_snapshot_hash: string; // hash read at plan time
  on_mismatch: "abort_and_reevaluate" | "queue_behind_lock" | "fail_outcome"; // default abort_and_reevaluate
  lock_mode: "optimistic_snapshot_hash" | "pessimistic_write_lock"; // default optimistic
}
// direct_fix MUST carry a WorkspaceWritePrecondition; dispatcher computes the real hash at execution
// and compares to expected_snapshot_hash before applying. validation.workspace_write_without_precondition (error).
```

- **⚠verify:** confirm the current `direct_fix` write path and whether any precondition exists (Gemini asserts none; this is the single item Gemini calls catastrophic if missed — worth confirming before patch).

**Audit fold-in (R0.4) — Source Workspace write-operation API** (DEEP §5/§7). The CH-05 fixture asserts the precondition; this renders the operation contract it checks.

```ts
interface SourceWorkspaceOperation {
  operation_id: string;
  workspace_ref: string;
  op_kind: "append" | "update" | "supersede" | "delete";
  target_artifact_ref?: string;
  expected_precondition_hash?: string; // optimistic concurrency (rolling hash, B-04)
  lock_lease?: { lease_id: string; holder_ref: string; expires_at: ISO8601 };
  governance: GovernanceEnvelope;
  schema_version: 1;
}
interface SourceWorkspaceOperationReceipt {
  operation_id: string;
  result: "applied" | "precondition_failed" | "lease_lost" | "rejected";
  resulting_artifact_ref?: string;
  applied_at: ISO8601;
}
// lints: validation.workspace_mutation_without_precondition ; validation.workspace_operation_without_receipt ; validation.workspace_lease_expired_but_write_applied
```

### B-04 — [V3.3.1 §11.20.2 / §11.22] Rolling hash unsafe under DAG/parallel execution (ACCEPT)

- **Raised by:** ChatGPT CG-R2 / CG-R3 / CG-AA10; Claude **B9**.
- **Problem:** §11.22 allows up to `max_parallel_steps_per_plan: 4`, but §11.20.2 Rolling-Hash Mode B requires "step N+1 validates against predicted hash from step N." Two parallel steps mutating the same artifact can't both validate against one base — the chain is nondeterministic. §11.20.2 forbids concurrent *plans* on the artifact but not concurrent *steps* within one plan. (Distinct from C-03, which is the LLM-predicted-hash problem.)
- **Disposition:** **ACCEPT.** Rolling-hash Mode B requires sequential execution across all steps mutating the same artifact; parallelism allowed only between steps on disjoint artifacts.
- **Fix (V3.3.1 §11.20.2 + §11.22):**

```
§11.20.2: "Rolling-hash Mode B requires sequential step execution across all steps that mutate the same artifact. Parallelism within a plan is permitted only between steps targeting disjoint artifacts."
§11.22: parallel batches automatically degrade to sequential when any step is rolling-hash Mode B on a shared artifact.
validation.rolling_hash_parallel_steps_same_artifact (error)
```

### B-05 — [V3.3.1 §11.21 / §11.21.2 / §5.14.1] Revalidation cascade: no convergence bound + duplicated upstream-failure rule (ACCEPT)

- **Raised by:** Claude **B7** (no convergence) + **B8** (duplicated rule).
- **Problem (B7):** §11.21 Phase 4 re-triggers the Revisor on regression with no bound on cascade depth; two outcomes with bidirectional `OutcomeDependencySpec.invalidated_by_outcomes` can ping-pong revisions forever — per-outcome budget never trips because each cycle burns a *different* outcome's budget; `per_plan_max_replans` is plan-level, not cascade-level. **Problem (B8):** the `upstream_failure_cascade` rule is stated in both §5.14.1 and §11.21.2 (invites drift) and neither handles the race where an outcome enters `pending_dependency` *after* the cascade fired.
- **Disposition:** **ACCEPT.** Add a cascade-depth bound + cycle detection; consolidate the duplicated rule into one section with a state-entry guard for the race.
- **Fix (V3.3.1 §6.14 RevisorConfig + §11.21.2 + §22):**

```ts
// RevisorConfig:
max_revalidation_cascade_depth: number; // default 5, measured from the originating mutation receipt
// Loop Controller tracks cascade chains; a regression re-entering an outcome already in the chain → abort:
//   validation.revalidation_cascade_loop (new, §22)
//   HardRevisionCallKind += "revalidation_cycle" // surface tie-break to the user
// §11.21.2 becomes the single home of upstream_failure_cascade; §5.14.1 links to it (no restating).
// State-entry guard: an outcome transitioning to pending_dependency AFTER a cascade fired on its
//   upstream module is auto-evaluated against the upstream_failure set at state entry (not a new cascade pass).
```

**Audit fold-in (R0.4) — revalidation cascade** (DEEP §5), shared by B-05/B-06/B-20. CH-06 asserts convergence; this renders the cascade run + the dependency-graph policy that bounds it. *(Row-merge with B-06/B-20 remains declined — see Confirmed divergences; rows stay distinct, the cascade mechanism is shared.)*

```ts
interface OutcomeDependencyGraphPolicy {
  edges: { from_outcome_id: string; to_outcome_id: string }[];
  cycle_handling: "reject_at_build" | "break_with_receipt"; // dag_acyclic (B-25 verified)
  max_cascade_depth: number; // convergence bound
}
interface RevalidationCascadeRun {
  cascade_id: string;
  trigger_outcome_id: string;
  revalidated_outcome_ids: string[];
  depth_reached: number;
  converged: boolean;
  schema_version: 1;
}
// lints: validation.revalidation_cascade_unbounded ; validation.duplicate_upstream_failure_revalidation // success-condition-5 race (B-20)
```

### B-06 — [V3.3.1 §dependency / Common Contracts] Outcome-dependency cycles + `pending_dependency` deadlock (ACCEPT)

- **Raised by:** Claude C4; Gemini **D-03**.
- **Problem:** dependency direction is undefined for cycles, and an upstream `could_not_fix`/halt does not instantly cascade to downstream `pending_dependency` outcomes, so the graph hangs waiting on an artifact that will never arrive.
- **Disposition:** **ACCEPT.** Define cycle detection at dependency-declaration time (reject or break with a Hard Call) and an instant downstream cascade on terminal upstream failure.
- **Fix:** dependency graph rejects declared cycles (`validation.outcome_dependency_cycle`); on upstream terminal-failure, all transitively dependent `pending_dependency` outcomes immediately transition to `upstream_failure` (per the A-06 matrix), rather than waiting on `wait_timeout`.

### B-07 — [V3.3.1 §11.22] Parallel sibling outputs orphaned after batch failure; parallelism not first-class (ACCEPT)

- **Raised by:** ChatGPT CG-R6 / CG-R7 / CG-AA11.
- **Problem:** when one step in a parallel batch fails, the sibling steps' completed outputs have no defined finalization/disposition (orphaned candidates), and the parallel-batch configuration isn't a first-class, inspectable object.
- **Disposition:** **ACCEPT.** Add a `ParallelBatchFinalizationReceipt` recording per-sibling disposition on batch failure; surface the parallelism config.
- **Fix:**

```ts
interface ParallelBatchFinalizationReceipt {
  batch_id: string;
  plan_id: string;
  run_id: string;
  sibling_results: Array<{
    step_id: string;
    status: "completed" | "failed" | "cancelled" | "orphaned";
    candidate_ref?: StorageRef;
    disposition: "retained" | "discarded" | "held_for_review";
  }>;
  batch_outcome: "all_completed" | "partial_failure" | "aborted";
  created_at: ISO8601;
  schema_version: "1.0";
}
```

**B-08** → rendered in Layer 3 core (that rendering governs).

### B-09 — [V3.3.1 §candidate] Candidate artifacts can't model external side effects; lifecycle mixed with head-pointer (ACCEPT)

- **Raised by:** ChatGPT CG-R13 / CG-R14.
- **Problem:** `CandidateArtifactVersion` conflates the version's lifecycle state with the "current head" pointer, and there's no way to model a candidate whose application has an external side effect (the thing that can't be branched — see §F-01).
- **Disposition:** **ACCEPT.** Separate candidate lifecycle from the head pointer; route side-effecting candidates through `SideEffectIntentCandidate` (Appendix N).
- **Fix (V3.3.1 candidate model + Appendix N):**

```ts
// Split: CandidateArtifactVersion.lifecycle_state (candidate|accepted|rejected|superseded|reverted)
//        ArtifactHead.current_version_ref (separate projection; one head per artifact)
// Side-effecting application uses SideEffectIntentCandidate [Appx N]:
//   { side_effect_class, dry_run_payload_ref, approval_status, execution_policy_ref,
//     state: draft|approved|executed|cancelled|blocked, execution_receipt_ref }
```

### B-10 — [V3.3.1 §11 / FD] `TaskCancelProtocol` + Hard Call blocking scope undefined (ACCEPT)

- **Raised by:** ChatGPT CG-R15 / CG-R16.
- **Problem:** there is no clean mid-run cancel protocol, and a pending blocking Hard Call doesn't say what it blocks (the step, the outcome, the artifact, the whole run, or just side effects).
- **Disposition:** **ACCEPT.** Adopt Appendix M's `TaskCancelProtocol` and `HardCallPendingPolicy`/`HardCallBlockingScope` verbatim.
- **Fix — `[Appx M]`:**

```ts
type HardCallBlockingScope = "entire_run" | "segment" | "artifact" | "outcome" | "module" | "side_effect_only";

interface HardCallPendingPolicy {
  hard_call_id: string;
  blocking_scope: HardCallBlockingScope;
  blocked_refs: string[];
  allowed_to_continue_refs: string[];
  context_visible_to_continuing_modules: "none" | "hard_call_pending_summary" | "full_context_redacted";
  on_defer: "continue_with_warning" | "pause_scope" | "abort_scope";
  on_timeout: "escalate" | "abort" | "continue_with_warning";
  timeout_ms?: number;
  schema_version: "1.0";
}

interface TaskCancelProtocol {
  cancel_request_id: string;
  task_id: string;
  run_id: string;
  requested_by_ref: string;
  cancel_scope: "entire_run" | "segment" | "module_activation" | "revision_plan" | "side_effect_intent";
  target_refs: string[];
  in_flight_handling: "request_graceful_stop" | "preempt_immediately" | "finish_current_step_then_stop";
  side_effect_policy: "do_not_cancel_executed_side_effects" | "cancel_unexecuted_intents" | "create_corrective_artifact";
  candidate_disposition: "discard" | "retain_for_manual_review" | "orphan_until_reconciled";
  source_workspace_disposition: "retain_records" | "mark_records_cancelled" | "rollback_if_uncommitted";
  learning_signal_policy: "suppress_success_signals" | "emit_cancel_diagnostic_only" | "emit_full_signals_with_cancel_flag";
  user_receipt_ref?: StorageRef;
  created_at: ISO8601;
  schema_version: "1.0";
}
```

- **Rationale:** `side_effect_policy: do_not_cancel_executed_side_effects` is the cancel-side counterpart of the §F-01 "side effects can't branch" principle — cancel never pretends an executed effect is undone.
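
To make the B-10 rationale concrete, here is a minimal sketch of how a cancel handler might apply `side_effect_policy` to side-effect intents without ever rewriting an executed one. The policy and state literals come from B-10 and B-09; the `SideEffectIntent` projection, the `CancelResult` shape, the `applyCancel` helper, and the reading of each policy (in particular that only `cancel_unexecuted_intents` cancels pending intents, and that `create_corrective_artifact` merely flags executed intents for follow-up) are assumptions made for illustration.

```ts
type SideEffectIntentState = "draft" | "approved" | "executed" | "cancelled" | "blocked";
type SideEffectPolicy =
  | "do_not_cancel_executed_side_effects"
  | "cancel_unexecuted_intents"
  | "create_corrective_artifact";

// Hypothetical minimal projection of SideEffectIntentCandidate (Appendix N).
interface SideEffectIntent {
  intent_id: string;
  state: SideEffectIntentState;
}

interface CancelResult {
  intents: SideEffectIntent[];
  corrective_artifact_for: string[]; // executed intents that need a follow-up artifact
}

// Assumed semantics: executed intents are never rewritten, under any policy.
function applyCancel(intents: SideEffectIntent[], policy: SideEffectPolicy): CancelResult {
  const corrective: string[] = [];
  const next = intents.map((intent): SideEffectIntent => {
    if (intent.state === "executed") {
      if (policy === "create_corrective_artifact") corrective.push(intent.intent_id);
      return { ...intent }; // the effect already happened; cancel does not pretend otherwise
    }
    const unexecuted =
      intent.state === "draft" || intent.state === "approved" || intent.state === "blocked";
    if (policy === "cancel_unexecuted_intents" && unexecuted) {
      return { ...intent, state: "cancelled" };
    }
    return { ...intent };
  });
  return { intents: next, corrective_artifact_for: corrective };
}

const sample: SideEffectIntent[] = [
  { intent_id: "a", state: "executed" },
  { intent_id: "b", state: "approved" },
  { intent_id: "c", state: "draft" },
];

const cancelled = applyCancel(sample, "cancel_unexecuted_intents");
const corrected = applyCancel(sample, "create_corrective_artifact");
console.log(cancelled.intents.map((i) => i.state), corrected.corrective_artifact_for);
```

The function copies each intent instead of mutating its input, so the pre-cancel state remains available for the user receipt.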
### B-11 — [V3.3.1 §skip / FD] `TaskSkipProtocol` + skip receipts missing (ACCEPT)

- **Raised by:** ChatGPT CG-AA5.
- **Problem:** an outcome/step can be skipped, but there's no protocol or receipt recording who skipped it, why, and the downstream effect.
- **Disposition:** **ACCEPT.** Add a skip protocol mirroring cancel, with a receipt.
- **Fix:**

```ts
interface TaskSkipReceipt {
  skip_id: string;
  target_ref: string;
  skipped_by_ref: string;
  reason: "not_applicable" | "user_directed" | "dependency_unavailable" | "policy_blocked";
  downstream_effect: "none" | "marks_dependents_not_applicable" | "requires_human_ack";
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### B-12 — [Common Contracts §policy / V3.3.1] Policy freshness checks a field `PolicyDecisionRef` lacks (ACCEPT)

- **Raised by:** ChatGPT CG-R8.
- **Problem:** freshness logic expects a `superseded_by_decision_id` (and related staleness fields), but `PolicyDecisionRef` has no such field, so freshness can't actually be evaluated.
- **Disposition:** **ACCEPT.** Add the freshness fields, or make the EC policy record the authoritative freshness source.
- **Fix:**

```ts
// add to PolicyDecisionRef (or PolicyEvaluationRef):
issued_at: ISO8601;
subject_hash: string;
decision_scope_hash: string;
superseded_by_decision_id?: string;
policy_engine_version: string;
// alt: mark EC's policy record as the required freshness source-of-truth and have the ref point to it.
```

### B-13 — [Source Workspace §6 / Common Contracts] `ResearchNeed` concurrency — needs a lease (ACCEPT)

- **Raised by:** Claude **D15**.
- **Problem:** two modules can independently pick up and satisfy the same `ResearchNeed` (duplicate work, conflicting results), because there's no lease/claim with idempotency.
- **Disposition:** **ACCEPT.** Add a `ResearchNeedLease` so a need is claimed atomically.
- **Fix:**

```ts
interface ResearchNeedLease {
  need_id: string;
  leased_by_module_id: string;
  lease_token: string;
  acquired_at: ISO8601;
  expires_at: ISO8601;
  on_expiry: "release_for_reclaim" | "escalate" | "mark_abandoned";
  satisfied_by_result_ref?: StorageRef;
  schema_version: "1.0";
}
// acquisition is atomic compare-and-set on need status; second acquirer gets need_already_leased.
```

### B-14 — [Task Forum §x] Forum deadlock breaker missing (ACCEPT)

- **Raised by:** Gemini **F-02**; Claude D3 / D20.
- **Problem:** the Task Forum has no tie-break, timeout, or max-rounds, so deliberation can hang with no circuit breaker.
- **Disposition:** **ACCEPT.** Forum runs as an FSM with a deliberation tick cap and a consensus threshold; on expiry it escalates to a Hard Call.
- **Fix:**

```ts
interface ForumDeliberationPolicy {
  room_id: string;
  max_deliberation_ticks: number;
  consensus_threshold_pct: number;
  on_no_consensus: "forum_deadlock_hard_call" | "task_agent_decides" | "prefer_safest_proposal";
  schema_version: "1.0";
}
// reaching max_deliberation_ticks without consensus_threshold_pct → state forum_deadlock → HardCall escalation.
```

**B-15** → rendered in Layer 2 (flip) (that rendering governs).

### B-16 — [V3.3.1 §11.9] Concurrency tie-breaker relies on wall-clock `created_at` (ACCEPT-AS-FIX)

- **Raised by:** Gemini **BUG-04**.
- **Problem:** Rule 4 of `concurrency_tie_breaker` is `RevisionPlan.created_at` ascending. In a local multi-threaded runtime, millisecond timestamps are subject to event-loop clock skew; granting a workspace write-lock on timestamp creates a race and can let a 45-second plan win the lock over a 1-second fix.
- **Disposition:** **ACCEPT-AS-FIX** (Gemini's patch verbatim) — order by risk reduction and lock-release speed, not timestamp.
- **Fix (V3.3.1 §11.9) — `[Gemini BUG-04]`:**

```
RULE concurrency_tie_breaker (UPDATED):
  1. OutcomeDependencySpec.required_for_overall_pass = true > false
  2. EvaluationOutcomeDefinition.is_high_stakes = true > false
  3. EvaluationOutcomeDefinition.priority ascending (lower value = wins)
  4. RevisionPlan.plan_risk_score descending (safer plans acquire lock first — prevent catastrophic collisions)
  5. RevisionCostEstimate.total_tokens ascending (smaller/faster plans execute and release the lock quicker)
```

**Audit fold-in (R0.4) — plan-selection tiebreaker** (DEEP §5 / ChatGPT §8). *Correction:* `tiebreaker_epsilon` was removed from the L1.5 **metric** layer (correct — mis-anchored there); the reviewers' actual proposal is a **plan-selection** tiebreaker on `plan_risk_score`, rendered here. Wall-clock is forbidden as a tiebreaker (non-total-order under clock skew).

```ts
interface PlanSelectionTiebreakPolicy {
  primary_key: "plan_risk_score";
  epsilon: number; // band within which scores are treated as tied
  fallthrough: ("fewest_steps" | "fewest_irreversible_effects" | "lowest_plan_id_hash")[];
  schema_version: 1;
}
// lints: validation.concurrency_tie_breaker_uses_wall_clock ; validation.concurrency_tie_breaker_missing_final_tiebreaker ; validation.concurrency_tie_breaker_not_total_order
```

### B-17 — [FD §8.4 / §10.3] `instruction_in` overload leaks the DOC23/DOC15/DOC24 boundary (ACCEPT)

- **Raised by:** Claude **B11**.
- **Problem:** FD §8.4 carries both free-form instructions and typed `OutcomeRepairInstruction` payloads over the same general `instruction_in` port, with no discriminator. Receiving modules must runtime-guess the payload type; DOC15/CIL prompt assembly can't know what to render. The typed ports (`repair_instruction_in`, etc.) are marked "ergonomics, not required for V1," so every V1 implementation goes through the overload and solves discrimination differently.
- **Disposition:** **ACCEPT.** Don't ship the overload. Either elevate the typed ports to required for V1 (preferred) or add a payload discriminator.
- **Fix (FD §8.4 + DOC23 R3.1 port registry §10.3):**
```ts
// Preferred: make these REQUIRED V1 ports (register in DOC23 R3.1 §10.3):
//   feedback_in, repair_instruction_in, run_guidance_in, source_need_in
// Fallback if kept on instruction_in: payload union carries a discriminator DOC15/CIL dispatches on:
type FeedbackPayloadKind = "free_form_instruction" | "outcome_repair_instruction" | "run_guidance" | "research_need";
// validation.instruction_in_untyped_feedback_payload (error) if a typed payload rides instruction_in without a discriminator.
```

---

### B-18 — [V3.3.1 §6.5.2 / §7.9] `HardRevisionCall.options` may be empty though spec requires bounded options (ACCEPT) — Claude C3

- **Raised by:** Claude C3 (BUG/HIGH).
- **Problem:** §6.5.1 says detection produces a Hard Call "with bounded `HumanDecisionOption[]`," but §7.9.1's schema has no non-empty constraint. Empty `options[]` makes the §21.4 UI non-functional (no buttons); the user can't resolve and the Dispatcher stays in `waiting_hard_call` indefinitely. The `default_if_no_response` trigger ("no response") is itself unspecified.
- **Disposition:** **ACCEPT.** Constrain options to ≥2, provide a default pair, and specify the no-response timeout.
- **Fix (V3.3.1 §7.9.1):**
```
options: HumanDecisionOption[]   // MIN_LENGTH = 2
// When the Compiler cannot enumerate substantive options, default to:
//   ["continue_with_compiler_proposal", "pause_for_my_input"]
// validation.hard_call_options_empty (error).
// Specify the timeout that triggers default_if_no_response (e.g., HardCallPendingPolicy.timeout_ms, B-10).
```

### B-19 — [V3.3.1 §11.6 / §0.4.7] `RevisionOperationKind="hard_call_resolved"` has no producer (ACCEPT) — Claude C10

- **Raised by:** Claude C10 (BUG/MEDIUM).
- **Problem:** `hard_call_resolved` is a valid `RevisionOperationKind` and `HardCallResolution` is persisted, but no section says which actor emits the operation receipt. Operation receipts feed `RepairCycleSignal`; ambiguous actor → missing or duplicated receipt, breaking the `hard_call_resolved → revision_operation_receipt_ref` chain in `RevisorActionRecord`.
- **Disposition:** **ACCEPT.** Name the producer.
- **Fix (V3.3.1 §7.9.4, new):**
```
On recording a HardCallResolution, the Dispatcher emits a RevisionOperationReceipt with
operation_kind = "hard_call_resolved" and hard_call_ref → the resolved Hard Call.
receipt.actor_ref = Dispatcher runtime identity; resolution.resolved_by = UserRef (recorded separately).
```

### B-20 — [V3.3.1 §6.7.2 / §11.21] Success-condition 5 races the cascade (ACCEPT) — Claude D2

- **Raised by:** Claude D2 (RISK/MEDIUM).
- **Problem:** success-condition 5 ("cascaded dependent outcomes are re-evaluated") is checked at revision-cycle completion, but the §11.21 cascade can still be firing — so a revision can be marked successful before its own regression cascade settles.
- **Disposition:** **ACCEPT.** Gate condition 5 on cascade quiescence.
- **Fix:** the Loop Controller may evaluate success-condition 5 only when no revalidation is pending in the cascade chain (tracked via B-05's `max_revalidation_cascade_depth` chain state); `validation.success_marked_before_cascade_quiescent` (error).

### B-21 — [Core §9.0.6] Signal emission ordering vs receipt persistence undefined (ACCEPT) — Claude D7

- **Raised by:** Claude D7 (RISK/MEDIUM).
- **Problem:** §9.0.6 shows signals emitted then passing the EC policy gate, but doesn't order signal emission against durable persistence of the receipts those signals reference — a consumer can receive a signal pointing at a not-yet-written receipt.
- **Disposition:** **ACCEPT.** Enforce emit-after-persist.
- **Fix (Core §9.0.6):** a learning signal MUST NOT be emitted until the receipts it references are durably written (read-your-writes guarantee at the EC policy gate); `validation.signal_references_unpersisted_receipt` (error).
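
**Illustrative sketch (non-normative) for B-21.** The emit-after-persist rule can be pictured as a guard in front of the emitter: a signal is released only once every receipt it references is durable, and otherwise the lint is reported. Only the lint id comes from this card; `ReceiptLedger`, `LearningSignal`, `EmitResult`, and `emitAfterPersist` are hypothetical names, and the in-memory ledger stands in for whatever durable store Core actually uses.

```ts
// Illustrative only: every name here except the lint id is hypothetical.
type ReceiptRef = string;

interface LearningSignal {
  signal_id: string;
  receipt_refs: ReceiptRef[];
}

type EmitResult =
  | { emitted: true }
  | { emitted: false; lint: "validation.signal_references_unpersisted_receipt"; missing: ReceiptRef[] };

// Stand-in for the durable receipt store; a real one would commit before acknowledging.
class ReceiptLedger {
  private durable: Record<string, boolean> = {};
  persist(ref: ReceiptRef): void {
    this.durable[ref] = true;
  }
  isDurable(ref: ReceiptRef): boolean {
    return this.durable[ref] === true;
  }
}

// Emit only when every referenced receipt is already durable; otherwise report the lint.
function emitAfterPersist(signal: LearningSignal, ledger: ReceiptLedger, sink: LearningSignal[]): EmitResult {
  const missing = signal.receipt_refs.filter((ref) => !ledger.isDurable(ref));
  if (missing.length > 0) {
    return { emitted: false, lint: "validation.signal_references_unpersisted_receipt", missing: missing };
  }
  sink.push(signal);
  return { emitted: true };
}

const ledger = new ReceiptLedger();
const emittedSignals: LearningSignal[] = [];
const signal: LearningSignal = { signal_id: "sig-1", receipt_refs: ["rcpt-1"] };

const beforePersist = emitAfterPersist(signal, ledger, emittedSignals); // withheld: rcpt-1 not yet durable
ledger.persist("rcpt-1");
const afterPersist = emitAfterPersist(signal, ledger, emittedSignals); // released: rcpt-1 is durable
```

Returning the missing refs alongside the lint is a design choice of this sketch: it lets the caller retry after persistence instead of dropping the signal.
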
### B-22 — [V3.3.1 §5.18 / §11.15] Pattern C doubles per-turn latency/cost with no budget (ACCEPT) — Claude D17

- **Raised by:** Claude D17 (RISK/MEDIUM).
- **Problem:** Pattern C wires a Judge downstream of every standalone Evaluator; in an iterative revision loop this doubles evaluation latency and cost per turn, but no budget governs the Pattern C Judge separately from the Evaluator — a long loop silently doubles cost.
- **Disposition:** **ACCEPT.** Add a Pattern C invocation budget / cadence.
- **Fix:**
```ts
interface PatternCInvocationBudget {
  max_judge_invocations_per_run?: number;
  cadence: "every_turn" | "every_n_turns" | "on_terminal_only";
  n_turns?: number; // required when cadence = "every_n_turns"
}
// governs the Pattern C Judge independently of the upstream Evaluator.
```

### B-23 — [FD §6.3] Multiple delivery branches fire simultaneously with no idempotency control (ACCEPT) — Claude D24

- **Raised by:** Claude D24 (RISK/MEDIUM).
- **Problem:** `FeedbackRoutingPolicy` branches aren't stated to be mutually exclusive; one result can match several (e.g., `on_needs_revision` and `on_needs_more_sources`), firing multiple deliveries with no idempotency key or cost guard — duplicate or conflicting deliveries. (Complements A-08 routing completeness with the *exclusivity/idempotency* rule.)
- **Disposition:** **ACCEPT.** State branch exclusivity, or an explicit multi-fire policy with idempotency keys.
- **Fix (FD §6.3):**
```ts
type RoutingFirePolicy = "first_match_only" | "all_matching_with_idempotency";
// when "all_matching_with_idempotency", each delivery carries
//   delivery_idempotency_key = hash(run_id + result_id + branch); duplicates suppressed.
```

---

**B-24** → rendered in Layer 3 core (that rendering governs).

**B-25** → rendered in Layer 2 (flip) (that rendering governs).

**B-26** → rendered in Layer 3 core (that rendering governs).

**B-27** → rendered in Layer 3 core (that rendering governs).
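
**Illustrative sketch (non-normative) for B-23.** One way to read the `RoutingFirePolicy` fix: under `first_match_only` only the first matching branch fires, and under `all_matching_with_idempotency` each branch fires at most once per `(run_id, result_id, branch)` key. The helper names (`toyHash`, `deliveryIdempotencyKey`, `selectDeliveries`) are hypothetical, the hash is a non-cryptographic stand-in, and joining the fields with a separator is an assumption added here so that adjacent fields cannot run together.

```ts
type RoutingFirePolicy = "first_match_only" | "all_matching_with_idempotency";

// Non-cryptographic djb2-style hash; a stand-in for whatever hash the runtime actually uses.
function toyHash(input: string): string {
  let h = 5381;
  for (let i = 0; i < input.length; i++) {
    h = ((h * 33) ^ input.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

// Assumption: fields are joined with a separator so ("ab", "c") and ("a", "bc") hash differently.
function deliveryIdempotencyKey(runId: string, resultId: string, branch: string): string {
  return toyHash([runId, resultId, branch].join("|"));
}

// Returns the branches that should actually deliver; seenKeys persists across calls for a run.
function selectDeliveries(
  policy: RoutingFirePolicy,
  runId: string,
  resultId: string,
  matchedBranches: string[],
  seenKeys: Record<string, boolean>
): string[] {
  if (policy === "first_match_only") {
    return matchedBranches.slice(0, 1);
  }
  const delivered: string[] = [];
  matchedBranches.forEach((branch) => {
    const key = deliveryIdempotencyKey(runId, resultId, branch);
    if (seenKeys[key] === true) {
      return; // duplicate delivery suppressed
    }
    seenKeys[key] = true;
    delivered.push(branch);
  });
  return delivered;
}

const seenKeys: Record<string, boolean> = {};
const matched = ["on_needs_revision", "on_needs_more_sources"];
const firstPass = selectDeliveries("all_matching_with_idempotency", "run-1", "res-1", matched, seenKeys);
const replay = selectDeliveries("all_matching_with_idempotency", "run-1", "res-1", matched, seenKeys);
const single = selectDeliveries("first_match_only", "run-1", "res-1", matched, {});
```

Here `replay` comes back empty because both keys were recorded on the first pass, which is the duplicate suppression the fix asks for.
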
### R.3 — §C rows

**C-01** → rendered in Layer 3 C-cluster (that rendering governs).

**C-02** → rendered in Layer 3 C-cluster (that rendering governs).

**C-03** → rendered in Layer 3 C-cluster (that rendering governs).

**C-04** → rendered in Layer 3 C-cluster (that rendering governs).

**C-05** → rendered in Layer 3 C-cluster (that rendering governs).

**C-06** → rendered in Layer 3 C-cluster (that rendering governs).

**C-07** → rendered in Layer 3 C-cluster (that rendering governs).

**Audit fold-in (R0.4) — richer cost model** (DEEP §6 / ChatGPT §9). Supersedes scalar-only `CostEstimate`, which becomes a projection of `CostVector`.
```ts
type CostDimension = "tokens_in" | "tokens_out" | "wall_clock_ms" | "tool_invocations" | "usd_estimate" | "sub_agent_activations";
interface CostVector {
  dimensions: Record<CostDimension, number>;
  schema_version: 1;
}
interface CostCorrelationPolicy {
  // how per-step CostVectors combine into a run total + quantile (C-05) summing
  combine: "sum" | "max" | "p50" | "p90";
  quantile_sum_treatment: "assume_independent" | "assume_perfectly_correlated" | "explicit_copula_ref";
  correlation_ref?: string;
}
// lints: validation.cost_vector_unit_mismatch ; validation.cost_quantile_sum_without_correlation_policy
```

**C-08** → rendered in Layer 3 C-cluster (that rendering governs).

**C-09** → rendered in Layer 3 C-cluster (that rendering governs).

**C-10** → rendered in Layer 3 C-cluster (that rendering governs).

### R.4 — §D rows

**D-01** → rendered in Layer 3 core (that rendering governs).

### D-02 — [Source Workspace §2 / Governance] Workspace taint aggregation undefined (ACCEPT)

- **Raised by:** ChatGPT (GAP/HIGH).
- **Problem:** a workspace holds many `SourceRecord`s of differing taint, but no rule says what the workspace's aggregate taint is — so downstream consumers can't reason about the set.
- **Disposition:** **ACCEPT.** Adopt `TaintAggregationPolicy` (Appx G): `workspace_taint = max_taint` over records; any privileged record marks the workspace privileged; matter-scope rule explicit. (Same policy object as D-01.)

### D-03 — [Source Workspace §0.3 / V3.3.1 §12] `TaskSourceWorkspace` vs `SourceWorkspace` identity split (ACCEPT)

- **Raised by:** ChatGPT (BUG/HIGH).
- **Problem:** the task-scoped `TaskSourceWorkspace` and V3.3's `SourceWorkspace` are described as if distinct, with no statement of whether they're one type or two — a coding agent builds two stores.
- **Disposition:** **ACCEPT.** One canonical workspace identity in the TypeOwnerRegistry (A-09); the task-scoped form is a view/parameterization, not a separate type. `validation.dual_source_workspace_identity` (error).

**D-04** → rendered in Layer 2 (flip) (that rendering governs).

### D-05 — [Source Workspace §6.2 / §7.4] `ResearchNeed` scoping, ref types, and `human_needed` exit (ACCEPT) — incl. Claude D6

- **Raised by:** ChatGPT (`run_id` conflicts with task-scoped workspaces; wrong ref types; routed to wrong port); Claude **D6** (`human_needed` no exit).
- **Problem:** `ResearchNeed.run_id` conflicts with task-scoped workspaces; source/target refs use the wrong reference types; and the `human_needed` status has no exit (who resolves it, what transitions out, what happens at run end).
- **Disposition:** **ACCEPT.** Adopt the canonical `ResearchNeed` (Appx G) with an explicit `need_scope`, corrected refs, and a `human_needed` exit + run-end default.
- **Fix — `[Appx G]`:**
```ts
// ResearchNeed adds: need_scope: "run" | "task" | "workspace"; status includes "human_needed" with exits →
//   "answered" | "unresolved" | "cancelled"; target uses ArtifactScopeRef|ClaimRef (not raw string).
// Run-end default: any ResearchNeed still "human_needed"/"open" at run end → "unresolved" with a carry note.
// (Lease for concurrency = ResearchNeedLease, B-13.)
```

### D-06 — [Source Workspace §4 / §5 / §6.3] Evidence anchors, payload registry, weak tool-receipts, extractor-vs-source confusion (ACCEPT)

- **Raised by:** ChatGPT (evidence anchors; `domain_payload` registry/versioning; tool-receipt-as-research too weak; missing-extractor mis-reported as missing-source).
- **Problem:** source records can't anchor evidence to specific claims; `domain_payload` is unversioned/unregistered; a tool receipt is accepted as "material research" with too little structure; and a missing claim-extractor is reported as a missing source.
- **Disposition:** **ACCEPT.** Add first-class evidence anchors + a payload registry; distinguish extractor-missing from source-missing.
- **Fix — `[Appx G]`:**
```ts
// SourceEvidenceAnchor { anchor_kind: page|quote|section|timestamp|row|url_fragment|paragraph|line_range;
//   supports_claim_refs[]; support_strength: direct|indirect|contextual|contradicts; extracted_text_ref }
// DomainPayloadRegistryRef { domain_payload_kind; domain_payload_schema_ref; domain_payload_version }
// New status distinguishing claim_extractor_unavailable from source_unavailable.
```

**Audit fold-in (R0.4) — claim-support derivation** (DEEP §7 / ChatGPT §10). `SourceEvidenceAnchor` (above) ties a claim to its source span; the derivation policy/receipt records HOW a support status was computed, so "supported" is never asserted bare.
```ts
interface ClaimSupportDerivationPolicy {
  method: "exact_span" | "paraphrase_match" | "model_judgment";
  min_anchor_count: number;
  model_judgment_requires_anchor: boolean;
}
interface ClaimSupportDerivationReceipt {
  claim_ref: string;
  anchors: string[]; // -> SourceEvidenceAnchor[]
  support_status: "supported" | "partially_supported" | "unsupported" | "not_checked";
  derived_by: ClaimSupportDerivationPolicy["method"];
  schema_version: 1;
}
// lints: validation.claim_support_status_without_derivation ; validation.evidence_anchor_without_claim_ref ; validation.extractor_missing_reported_as_source_missing
```

### D-07 — [Task Forum §5.2 / §5.3] Forum posts: visibility, supersession, governance envelopes (ACCEPT)

- **Raised by:** ChatGPT (`selected_modules` visibility broken; supersession described-not-modeled; posts need governance envelopes); Grok GK-4.7/5.4.
- **Problem:** `selected_modules` visibility can't actually represent the selected modules; supersession is prose-only (no fields); and posts carry no governance (data_class/taint/policy/privilege).
- **Disposition:** **ACCEPT.** Adopt the hardened `TaskRunBoardPost` (Appx H).
- **Fix — `[Appx H]`:**
```ts
// TaskRunBoardPost adds: visibility_target_refs: VisibilityTargetRef[] (fixes selected_modules);
//   lifecycle_state: "active"|"superseded"|"retracted" + supersedes_post_ids[] + superseded_by_post_id + supersession_reason;
//   governance: RunBoardGovernanceEnvelope { data_class, taint_class, policy_decision_refs, sanitization_required,
//   governance_class, privileged, matter_id }.
```

### D-08 — [Task Forum §6.3 / §6.4] Context-packet ownership, request/receipt, omission manifest, audience-enum divergence, total budget (ACCEPT) — incl. Claude D18

- **Raised by:** ChatGPT (packet needs request/receipt/freshness/omission manifest; digest/packet audience enums diverge); Grok GK-4.3; Claude **D18** (token budget fragmented across packets).
- **Problem:** `TaskRunContextPacket` overlaps DOC24's packet ownership, lacks a request/receipt/freshness/omission manifest, and its audience enum diverges from the digest's. Separately (D18), token budgets are declared per-packet with no authority checking the **sum** a module receives.
- **Disposition:** **ACCEPT.** Adopt the request/receipt/omission types (Appx H) and a per-activation total-context budget enforced by the assembler.
- **Fix — `[Appx H]`:**
```ts
// TaskRunContextPacketRequest { audience: TaskContextAudience, requested_max_tokens, required/optional_item_refs,
//   staleness_policy } → TaskRunContextPacketAssemblyReceipt { packet_content_hash, actual_token_count,
//   omitted_items: OmittedPacketItem[], valid_until_event_seq, invalidated_by_event_kinds }.
// Single TaskContextAudience enum shared by digest + packet.
// D18: PerActivationContextBudget — the assembler MUST sum context packet + board digest + feedback bundle and
//   enforce one cap; validation.per_activation_context_budget_exceeded (error).
```

### D-09 — [Task Forum §1 / §5] Passive board auto-publishes every event — privacy/volume controls (ACCEPT) — Claude B12

- **Raised by:** Claude **B12**; ChatGPT (RISK/HIGH).
- **Problem:** the passive board "auto-publishes every event," which is unbounded and leaks content with no privacy/volume gate.
- **Disposition:** **ACCEPT.** Every post carries the `RunBoardGovernanceEnvelope` (D-07) and the board applies a publication policy (volume cap + class filter).
- **Fix:** add `RunBoardPublicationPolicy { auto_publish_post_kinds[], max_posts_per_run, suppress_classes: ("privileged"|"local_only")[], rate_limit_per_min }`; posts failing the policy are withheld with a receipt. Pairs with D-12 retention.

### D-10 — [Task Forum §7.2 / §3.2 / §4.6 / §3.1] ModuleAssistanceRequest schema, participant/moderator model, payload schemas, moderator-failure (ACCEPT) — incl. Claude D3

- **Raised by:** ChatGPT (request lacks endpoint/lease/timeout/answer schema; participant policy only models modules; moderator condition incoherent; `decision_out`/`signal_out` payloads missing); Claude **D3** (moderator failure path).
- **Problem:** `ModuleAssistanceRequest` has no endpoint/lease/timeout/answer schema; participant policy can't model non-module participants; the moderator-required condition is incoherent; `decision_out`/`signal_out` payloads are undefined; and there's no behavior when the moderator agent is unavailable (D3).
- **Disposition:** **ACCEPT.** Adopt the full `ModuleAssistanceRequest` (Appx H) + a moderator fallback analogous to the Task Agent fallback.
- **Fix — `[Appx H]`:**
```ts
// ModuleAssistanceRequest { target, target_endpoint_ref, answer_schema_ref, request_kind, response_policy,
//   lease {holder_ref, lease_version, expires_at}, timeout_ms, on_timeout: resume_with_warning|abort|escalate_human|ask_task_agent }.
// Participant policy models module|subagent|task_agent|user. decision_out/signal_out get explicit payload schemas.
// §4.6A moderator fallback: moderator unavailable → degrade to "none" | pause | queue (default: ask_task_agent then pause).
```

### D-11 — [Run Board §3.1 / §6.2] BoardDigest filter rule unspecified (ACCEPT) — Claude C9

- **Raised by:** Claude **C9**.
- **Problem:** `BoardDigest` carries `included_post_ids` but the **selection rule** is unspecified — a 500-post forum can't ship all posts in a ~1200-token digest, and implementations pick differently.
- **Disposition:** **ACCEPT.** Extend `BoardDigestPolicy` with explicit selection.
- **Fix:**
```ts
// BoardDigestPolicy adds: included_post_kinds: TaskRunBoardPostKind[]; included_severity_threshold;
//   max_posts: number; selection_strategy: "recency"|"severity"|"score"|"mixed".
// Default: kinds {evaluation_finding, repair_instruction, process_gap, user_guidance}; severity ≥ medium; max 30; mixed.
```

### D-12 — [Run Board §retention / Source Workspace §9] Run Board retention/compaction/event-class + EC-policy persistence (ACCEPT)

- **Raised by:** ChatGPT (retention/compaction/event-class missing; persistence should reference EC policy).
- **Problem:** the Run Board has no retention, compaction, or event-class taxonomy, and persistence doesn't reference the EC policy decisions that should govern it.
- **Disposition:** **ACCEPT.** Add retention + compaction policies and an event-class taxonomy; bind persistence to EC policy.
- **Fix:**
```ts
type RunBoardEventClass = "post" | "digest" | "assistance_request" | "moderation" | "lifecycle";
interface RunBoardRetentionPolicy {
  retain_event_classes: RunBoardEventClass[];
  retain_for_days: number;
  ec_policy_decision_ref: PolicyEvaluationRef;
}
interface RunBoardCompactionPolicy {
  compact_after_days: number;
  strategy: "summarize" | "drop_low_severity" | "archive";
}
```

**D-13** → rendered in Layer 3 core (that rendering governs).

### D-14 — [Core §3D / §4B / §5A] InjectionSlotRegistry, compact card schemas, command registry, token_budget, DOC24 capability ownership, receipt booleans (ACCEPT) — incl. Claude C15

- **Raised by:** ChatGPT (InjectionSlotRegistry gap CRITICAL; compact card schemas missing; Core defines a DOC24-owned capability; `TrackedTaskReceipt` booleans need command refs; command registry should be mandatory); Claude **C15** (packet taxonomy unstated); Grok GK-5.11 (`token_budget` never populated).
- **Problem:** the `InjectionSlotRegistry` for the task-system DOC24 slots is unspecified; the compact top-k card schemas are missing; Core normatively defines a DOC24-owned capability; `TrackedTaskReceipt` carries action booleans without command refs; `TaskOpportunityPacket.token_budget` is never populated/checked; and the packet taxonomy (opportunity vs run-context vs design) is unstated.
- **Disposition:** **ACCEPT.** Adopt the Core appendix wholesale; move the capability definition to DOC24; make the command registry mandatory.
- **Fix — `[Appx I]`:**
```ts
// TaskSystemInjectionSlotRegistration { slot_id (6 DOC24 slots), slot_kind, surfaces, token_cap,
//   pii_redaction_required, on_unavailable: omit|degrade_direct_first|block_explicit_task_route, receipt_required }.
// Compact cards: CompactTaskInvocationDirectiveCard / CompactTaskTemplateCard / CompactModulePresetCard
//   (each with CalibratedScore, risk_flags, token_estimate, redaction_state, source_authority).
// TaskCommandRegistryEntry (mandatory) { request/response_schema_ref, idempotency_key_required, durable_write,
//   telemetry_event_kind, read_model_invalidations, failure_codes, required_policy_checks, owning_doc }.
// TrackedTaskReceipt.available_actions: AvailableTaskAction[] (each → command_ref + idempotency_key_required).
// Packet taxonomy subsection (C15): TaskOpportunityPacket (pre-task), TaskRunContextPacket (in-run), TaskAgentDesignPacket (future).
// Capability def moves to DOC24 (Core references). token_budget populated + checked at assembly.
```

### D-15 — [V3.3.1 §8.4 / §17.1] Sub-agent: output-contract plurality, no-sub-agent fallback, coordination-point count (ACCEPT) — incl. Claude B10, D1

- **Raised by:** ChatGPT (`output_contract_ref` singular vs variant semantics; no-sub-agent fallback incomplete); Claude **B10** (no fallback at evaluator point) + **D1** (coordination point count 4 vs 5).
- **Problem:** `output_contract_ref` is singular but variant semantics need plurality; the "no sub-agent available" fallback is incomplete (esp. at the evaluator coordination point); and §17.1 lists five coordination points while §8.4's enum has four (no `plan_verifier`), so a profile can't declare for the fifth.
- **Disposition:** **ACCEPT.** Plural output contracts; a fallback policy per coordination point; reconcile to five points.
- **Fix:**
```ts
// §8.4 allowed_coordination_points += "plan_verifier" (reconcile to §17.1's five).
// output_contract_refs: StorageRef[] (was singular).
interface SubAgentFallbackPolicy {
  coordination_point: AllowedCoordinationPoint;
  on_no_sub_agent: "use_primary_module" | "skip_with_warning" | "hard_call" | "degrade_quality_with_note";
}
// every coordination point declares a fallback; validation.sub_agent_point_without_fallback (error).
```

### D-16 — [Source Workspace / Forum] Workspace API operations referenced but never defined (ACCEPT) ⚠verify

- **Raised by:** Grok GK-4.2.
- **Problem:** workspace operations (create/read/append/lock) are referenced but never defined — a circular reference with no API surface.
- **Disposition:** **ACCEPT.** Define the workspace API (operations, args, receipts), bound to the command registry (D-14) and write-precondition (B-03).
- **⚠verify:** confirm the current state of any partial workspace-API definition before authoring.

### D-17 — [Common Contracts §7 / V3.3.1 §7.9.3] Anchor/hash hygiene: empty StructuredAnchor, uncomputed context_hash, HardCall hash normalization (ACCEPT) — incl. Claude C7, D14

- **Raised by:** Grok GK-5.5 (`TextAnchor.context_hash` never computed/validated); Claude **C7** (`StructuredAnchor {}` validates but is useless) + **D14** (`HardCallResolution` hash normalization unspecified).
- **Problem:** `context_hash` is defined but never computed/validated; a `StructuredAnchor` with all-optional fields can be validly empty yet un-resolvable; and `HardCallResolution` reuse compares hashes whose normalization is unspecified (cosmetic diffs → needless re-escalation).
- **Disposition:** **ACCEPT.** Compute/validate anchors; require non-empty structured anchors; specify canonical normalization before hashing.
- **Fix:**
```
StructuredAnchor MUST populate ≥1 of {section_id, field_path, citation_ref}; validation.structured_anchor_empty (error).
TextAnchor.context_hash MUST be computed at creation and validated at resolve; validation.context_hash_unverified (warning).
HardCallResolution: canonicalize outcome_definition_hash / goal_context_hash inputs (trim, sort keys, normalize whitespace)
before hashing (§7.9.3); validation.hard_call_hash_unnormalized.
```

### D-18 — [V3.3.1 §repeated-failure] Repeated-failure detection keyed on versioned refs (ACCEPT)

- **Raised by:** Grok GK-5.12.
- **Problem:** repeated-failure detection keys on `affected_artifact_refs`, which are version-stamped, so the same logical artifact across versions doesn't match — repeats go undetected.
- **Disposition:** **ACCEPT.** Key on stable artifact identity, not the versioned ref.
- **Fix:** detection key = stable `artifact_id` (+ `outcome_definition_id`), not `artifact_version_ref`; `validation.repeated_failure_keyed_on_version` (lint).

### D-19 — [Source Workspace §SourceRecord] Per-module cost attribution has no field (ACCEPT)

- **Raised by:** Grok GK-5.10.
- **Problem:** cost can't be attributed per module on `SourceRecord` — no field — so per-module cost reporting (C-07 / BudgetNarrative) has nothing to read.
- **Disposition:** **ACCEPT.** Add a per-module cost field.
- **Fix:** `SourceRecord.acquisition_cost?: CostEstimate` (C-07 type) + `acquired_by_module_id`; feeds the cost rollups.

### D-20 — [Source Workspace §library] Library promotion gate references EC policy but is undefined (ACCEPT)

- **Raised by:** Grok GK-4.10.
- **Problem:** promotion of a workspace source into the durable Library references an EC policy gate, but the gate itself isn't defined.
- **Disposition:** **ACCEPT.** Define the promotion gate (criteria + policy check + receipt).
- **Fix:** `LibraryPromotionGate { min_tier, requires_verification_state, ec_policy_decision_ref, requires_access_tier }`; promotion emits a receipt; `validation.library_promotion_without_gate` (error).
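
**Illustrative sketch (non-normative) for D-20.** The gate's "criteria + policy check + receipt" shape could look roughly like the following. The card names the four gate fields but not their types, so every type below is an assumption (numeric tiers where higher is stronger, exact-match strings for verification state and access tier, an opaque string for the policy ref). `PromotionCandidate`, `PromotionOutcome`, `promoteToLibrary`, and the rejection reason codes are hypothetical; only the field names and the lint id come from the fix above.

```ts
// Field names from D-20; all field types are assumptions made for this sketch.
interface LibraryPromotionGate {
  min_tier: number;                    // assumed: higher number = stronger source tier
  requires_verification_state: string; // assumed: exact match, e.g. "verified"
  ec_policy_decision_ref: string;      // assumed: opaque ref to the EC policy decision
  requires_access_tier: string;        // assumed: exact match
}

// Hypothetical view of the workspace source being promoted.
interface PromotionCandidate {
  source_id: string;
  tier: number;
  verification_state: string;
  access_tier: string;
}

type PromotionOutcome =
  | { promoted: true; receipt: { source_id: string; ec_policy_decision_ref: string } }
  | { promoted: false; reasons: string[] };

function promoteToLibrary(candidate: PromotionCandidate, gate: LibraryPromotionGate | undefined): PromotionOutcome {
  // No gate configured: refuse and surface the lint rather than promote silently.
  if (gate === undefined) {
    return { promoted: false, reasons: ["validation.library_promotion_without_gate"] };
  }
  const reasons: string[] = [];
  if (candidate.tier < gate.min_tier) {
    reasons.push("tier_below_min");
  }
  if (candidate.verification_state !== gate.requires_verification_state) {
    reasons.push("verification_state_mismatch");
  }
  if (candidate.access_tier !== gate.requires_access_tier) {
    reasons.push("access_tier_mismatch");
  }
  if (reasons.length > 0) {
    return { promoted: false, reasons: reasons };
  }
  // Promotion emits a receipt that carries the governing policy decision ref.
  return { promoted: true, receipt: { source_id: candidate.source_id, ec_policy_decision_ref: gate.ec_policy_decision_ref } };
}

const gate: LibraryPromotionGate = {
  min_tier: 2,
  requires_verification_state: "verified",
  ec_policy_decision_ref: "ec-decision-1",
  requires_access_tier: "matter_team",
};
const strongSource: PromotionCandidate = { source_id: "src-1", tier: 3, verification_state: "verified", access_tier: "matter_team" };
const weakSource: PromotionCandidate = { source_id: "src-2", tier: 1, verification_state: "verified", access_tier: "matter_team" };

const accepted = promoteToLibrary(strongSource, gate);
const rejected = promoteToLibrary(weakSource, gate);
const ungated = promoteToLibrary(strongSource, undefined);
```
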
### D-21 — [V3.3.1 / Core] `requires_background_progress` overloaded (ACCEPT)

- **Raised by:** ChatGPT (CG audit addendum).
- **Problem:** `requires_background_progress` is used for two different meanings (a run needing background work vs a module needing periodic progress) with no disambiguation.
- **Disposition:** **ACCEPT.** Split into two named fields.
- **Fix:** replace with `requires_background_execution: boolean` (run-level) and `emits_progress_heartbeat: boolean` (module-level); migrate references.

### D-22 — [Common Contracts §11.5 / cross-refs] Backward-compat overstated; section-anchor hygiene before R3.2 absorption (ACCEPT WITH MODIFICATIONS) — Claude D5/D22

- **Raised by:** Claude **D5** + **D22**.
- **Problem:** §11.5's backward-compat claim overstates stability, and the set uses section-number cross-refs where stable anchors are needed before the DOC23 R3.2 absorption. (Overlaps A-15; tracked there for the build-ready gate.)
- **Disposition:** **ACCEPT WITH MODIFICATIONS.** Soften the compat claim to "compatible within the locked schema-of-record set; no cross-version guarantee yet"; convert section-number cross-refs to stable anchors as pre-absorption hygiene.
- **Fix:** anchor-stabilization pass across the set before R3.2 absorption; `validation.unstable_section_cross_ref` (lint). Coordinate with A-15.

### D-23 — [FD §9.4] "Silent ignoring fires validation" is unenforceable (ACCEPT) — Claude C11

- **Raised by:** Claude **C11**.
- **Problem:** §9.4 says silent ignoring (no receipt) fires `validation.feedback_consumed_without_receipt` at audit, but the audit has no way to enumerate *expected* receipts — so it never fires; a module that processed-and-ignored looks identical to one that received nothing.
- **Disposition:** **ACCEPT.** Track receipt expectations at dispatch so the audit can compare.
- **Fix:**
```ts
// §9.4A: when the router (§6) dispatches a bundle, record a FeedbackDispatchExpectation keyed to
//   (feedback_bundle_id, consumer_module_id, consumer_activation_seq). At run end (or after a grace period,
//   default 5 min post-activation) the audit compares expectations ↔ FeedbackConsumptionReceipts [Appx F];
//   missing pairs fire validation.feedback_consumed_without_receipt.
```

**D-24** → rendered in Layer 3 core (that rendering governs).

### R.5 — §E rows (held, Phase-B-gated)

# §E. HELD — Phase-B-gated / self-learning (tracked, not adjudicated for build)

Per your instruction: held but tracked. Gated on the Phase-B corpus audit (writing these now = guessing their own spec).

- **E-01 Memory Hydration phase** — pre-run `HydrateMemory` step (query BDSM/DOC72/RunGuidance, resolve by precedence, inject priors, stamp `HydratedMemoryHash`). *Architectural absence; adopted in principle, build gated.* (Gemini §7; Claude §6.1/§6.3)
- **E-02 Memory Precedence Hierarchy** — `Local Intent > Matter/Scope Policy > Global DOC72`. (Gemini §7; Claude §6.1/§6.3)
- **E-03 SubAgentPrior injection** ("Sub-Agent Amnesia") — inject last-N relevant findings/RepairInstructions per sub-agent. (Gemini GM-V1; Claude §6.1/§6.3)
- **E-04 TaskBlueprint Topology/Payload bifurcation** — `TaskTopology` (generalized) vs `TaskPayload` (ephemeral). (Gemini §7; Claude §6.1/§6.3)
- **E-05 Flawless-execution denominator** (learning side) — count clean passes via verdict-aware `OutcomeEvaluationSignal` so utility doesn't falsely decay. *(This is the real home of the deleted `EvaluationAffirmation`, A-09.)* (Gemini GM-V2; Claude §6.1)
- **E-06 Sub-agent reputation** model/slices/model-class calibration (C-04's weighting feeds this). (Claude D22; ChatGPT CG-SA3/CG-SL12)
- **E-07 `LearningMode` consumption behavior** (A-11's value set is adopted; the behavior is held); `goal_advancement_count` decrement (D4); `cross_model_applicability` runtime behavior (D21).
- **E-08 Process-gap → design-pattern loop** (A6) + substantive-vs-process gap enforcement. (Claude A6; Grok GK-3.5)
- **E-09 Longitudinal pattern view (S6)** and other learning-touching surfaces.
- **E-10 All ChatGPT `[SELF-LEARNING]` items (CG-SL1..12)** incl. TIE / Task Improvement Engineer (**Appendix P**, held), LoopEffectivenessTestRunRecord, BDSM utility-compilation gap, revealed-preference dampening, outcome clustering, UserConstitution prior, multi-prior conflict, PlanDiff/ProposalDiff, active-learning bundle.
- **E-11 Design-Feedback as peer to Artifact-Repair** — elevate Task Agent graph-patch proposals to first-class learned outcomes. (Grok GK-3.2/N1)
- **E-12 Automatic Pattern Suggestion UI** (≥2 failures → top-3 similar patterns). (Grok GK-N4)

---

### E-13 — [Common Contracts learning envelope] Multi-user forward-compatibility fields (DEFER-PhaseB; fields addable now)

- **Raised by:** ChatGPT (GAP/HIGH); Claude self-learning analysis.
- **Problem:** the learning-signal envelope has `data_class` and `matter_id` but lacks `principal_id`, learning scope, share eligibility, and scope-inference basis. Retrofitting team/firm/networked learning scope later — over privilege and matter data — is dangerous.
- **Disposition:** **DEFER-PhaseB for behavior; fields addable now as forward-compat** (same pattern as A-11 value-set / C-04 math: adopt the schema surface now, defer the multi-user learning *behavior* to Phase B). Flagged here so it isn't lost; the actual networked-learning semantics are part of the Phase-B learning spec.
- **Fix (forward-compat fields, addable now):**
```ts
// add to learning signal envelope / learning artifacts:
principal_id: string;
learning_scope: "local" | "matter" | "team" | "firm" | "networked";
scope_inference_basis: string;
default_scope_rule: string;
share_eligibility: "none" | "opt_in" | "policy_gated";
```

---

---

### R.6 — §F rows (declined)

# §F. DECLINED — review §6.2 (closed; not re-opened)

| # | Proposed | Raised by | Why declined / adopted-instead |
|---|---|---|---|
| F-01 | Literal Git-style branching / ShadowWorkspace as core primitive (Branch & Merge) | Gemini headline, Grok GK-N2 | Side effects can't branch (phantom). **Instead:** `TaskRunFork` + `irrevocable_side_effects_at_fork`. |
| F-02 | `TaskConfirmationSignal` as a new signal type | Gemini §7 | Duplicates existing signal/receipt machinery. **Instead:** fold into existing signals/receipts. |
| F-03 | Flawless-execution as a new signal type | Gemini GM-V2 | Redundant. **Instead:** verdict-aware `OutcomeEvaluationSignal` (denominator need kept — see E-05). |
| F-04 | Chunking findings for KV-cache bloat | Gemini D-02 | Fragments the unit. **Instead:** compressed-envelope view at prompt-assembly. |

---

---

### R.7 — §G rows (full schemas; the Layer 4 classification governs each row's disposition/landing)

# §G. Conceptual / UX / surfaces / Professional Reliance Layer

> **Why this cluster matters (plain language):** §A–§D make the engine *correct*. §G is what makes the output something a litigator can actually *rely on and hand to a partner* — a pre-flight check on what the system thinks you asked, a reviewable diff of every change, an evidence binder tying each claim to its source, a plain-English cost/quality account, and a single "can I rely on this, and within what limits" cover memo. The review calls this the biggest product opportunity in the set, and it's the part that differentiates ELNOR from a generic agent in 2027.

## Professional Reliance Layer (Appendix O + M)

### G-01 — [new · Common Contracts / Core] `EvaluationContractReview` — pre-execution contract check (ACCEPT)

- **Raised by:** ChatGPT (IDEA/CRITICAL).
- **What & why:** before the system spends tokens/time, it surfaces *what it thinks you asked for and how it will judge success* — interpreted goal, criteria, thresholds, source requirements, required capabilities, and the Hard-Call triggers — and lets you approve, edit, or waive. Catches a misread brief before the cost is incurred (the cheapest possible place to catch it). **User-facing:** a short "here's the plan and the bar; approve to proceed" card.
- **Disposition:** **ACCEPT** as a new pre-execution artifact (gate optional per task autonomy).
- **Fix — `[Appx O]`:**
```ts
interface EvaluationContractReview {
  review_id: string;
  task_id: string;
  run_id?: string;
  compiled_plan_ref: StorageRef;
  interpreted_goal: string;
  criteria_summary: string[];
  threshold_summary: string[];
  source_requirements: string[];
  required_capabilities: CapabilityRef[];
  hard_call_triggers: HardRevisionCallKind[];
  material_differences_from_preview?: string[];
  user_approval_required: boolean;
  approval_status: "pending" | "approved" | "rejected" | "edited" | "waived_by_policy";
  approval_ref?: StorageRef;
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### G-02 — [new · FD / V3.3.1] `RevisionReviewPacket` — reviewable packet for every meaning-bearing revision (ACCEPT)

- **Raised by:** ChatGPT (IDEA/HIGH).
- **What & why:** every meaning-bearing revision emits a packet showing the before→candidate semantic diff, which finding drove which change (`finding_to_change_map`), preservation-constraint results, source changes, and revalidation results — so a reviewer can accept/reject/fork/request-changes/restore *with the reasoning visible*, not just a new blob. **User-facing:** "here's exactly what changed and why; accept, fork, or roll back."
- **Disposition:** **ACCEPT.** Pairs with G-04 (restore) and A-16 (revision goes through `revision_in`).
- **Fix — `[Appx O]`:**
```ts
interface RevisionReviewPacket {
  packet_id: string;
  task_id: string;
  run_id: string;
  revision_plan_ref: StorageRef;
  before_artifact_version_ref: StorageRef;
  candidate_artifact_version_ref: StorageRef;
  semantic_diff_ref: StorageRef;
  finding_to_change_map: Record<string, string[]>; // assumed shape (finding ref → change refs); confirm against Appx O
  preservation_constraint_result_refs: StorageRef[];
  source_changes: SourceRecordRef[];
  revalidation_result_refs: StorageRef[];
  regression_risk_summary: string;
  reviewer_action: "accept" | "reject" | "fork" | "request_changes" | "restore_known_good_state" | "no_user_review_required";
  created_at: ISO8601;
  schema_version: "1.0";
}
```

**Audit fold-in (R0.4) — review decision receipt** (DEEP §8 / ChatGPT §11).
*Resolves the prior inline-`reviewer_action` divergence:* the packet keeps `reviewer_action` for convenience, but the durable record of a decision is a receipt, so a meaning-bearing revision cannot ship unreviewed.
```ts
interface RevisionReviewDecisionReceipt {
  receipt_id: string;
  packet_id: string; // -> RevisionReviewPacket
  decision: "approved" | "rejected" | "approved_with_conditions";
  reviewer_ref: string;
  conditions?: string[];
  decided_at: ISO8601;
  governance: GovernanceEnvelope;
  schema_version: 1;
}
// lints: validation.review_decision_without_receipt ; validation.revision_review_packet_contains_current_action ; validation.meaning_bearing_revision_without_review_packet_or_policy_waiver
```

### G-03 — [new · Source Workspace] `EvidencePackage` — exportable evidence binder (ACCEPT)

- **Raised by:** ChatGPT (IDEA/HIGH).
- **What & why:** the Source Workspace exported as one reviewable unit — final artifacts + workspace snapshot + a `claim_support_map` marking each claim `supported / partially_supported / unsupported / contradicted / not_checked`, plus unresolved research needs and stale/unverified sources. This is the binder a litigator hands to a partner or keeps for the file. **User-facing:** "every claim, what backs it, and what's still unverified." Depends on D-06 evidence anchors.
- **Disposition:** **ACCEPT.**
- **Fix — `[Appx O]`:**
```ts
interface EvidencePackage {
  evidence_package_id: string;
  task_id: string;
  run_id: string;
  final_artifact_refs: StorageRef[];
  source_workspace_snapshot_ref: StorageRef;
  source_record_refs: SourceRecordRef[];
  evidence_anchor_refs: string[];
  claim_support_map: Array<{
    claim_ref: ClaimRef;
    supporting_anchor_refs: string[];
    contradicting_anchor_refs: string[];
    support_status: "supported" | "partially_supported" | "unsupported" | "contradicted" | "not_checked";
  }>;
  unresolved_research_need_refs: string[];
  stale_or_unverified_source_refs: SourceRecordRef[];
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### G-04 — [new · V3.3.1 / Core] `KnownGoodState` — named restorable checkpoint (ACCEPT)

- **Raised by:** ChatGPT (IDEA/HIGH).
- **What & why:** a named, restorable checkpoint of a run/artifact state, so cancel (B-10), fork (G-17), and revision-review (G-02) all have a concrete thing to roll back to. **User-facing:** "save point — restore if a later change makes things worse."
- **Disposition:** **ACCEPT.** Schema in Appendix M; `restore_known_good_state` is already an `AvailableTaskAction` (D-14) and a `reviewer_action` (G-02).
- **Fix — `[Appx M]`:** `KnownGoodState { state_id, task_id, run_id, artifact_version_refs[], workspace_snapshot_ref, label, created_at }` + a `restore` command in the command registry (D-14) with an idempotency key.

**Audit fold-in (R0.4) — `KnownGoodCheckpoint` hardening** (DEEP §8 / ChatGPT §11). The base `KnownGoodState` is hardened into a checkpoint that snapshots policy + capability context, so a restore cannot silently re-expose data or re-enable capabilities that have since been withdrawn.
```ts
interface KnownGoodCheckpoint {
  checkpoint_id: string;
  artifact_version_refs: StorageRef[];
  policy_snapshot: PolicyEvaluationRef; // policy generation in force at checkpoint
  capability_snapshot: CapabilityRef[]; // capabilities granted at checkpoint
  irreversible_side_effects_before: string[]; // effects a restore CANNOT undo
  governance: GovernanceEnvelope; // L1.1
  schema_version: 1;
}
// lints: validation.known_good_checkpoint_missing_policy_snapshot ; validation.known_good_checkpoint_missing_capability_snapshot ; validation.restore_checkpoint_ignores_irreversible_side_effects
```

### G-05 — [new · Core / Common Contracts] `BudgetNarrative` — plain-English cost/quality account (ACCEPT)

- **Raised by:** ChatGPT (IDEA/HIGH).
- **What & why:** a human-readable account of what a run cost and what that bought — separating *logical* LLM calls from *infrastructure retries* (so retries don't look like work), local compute, external tool cost, which optional helpers were skipped, which non-degradable modes were preserved, which degraded modes were used, and the quality impact. **User-facing:** "what it cost, what was skipped, and how that affected quality."
- **Disposition:** **ACCEPT.** Consumes the shared `CostEstimate`/`TaskCostRecord` (C-07) and feeds A-02 / G-09 cost predictability.
- **Fix — `[Appx O]`:**
```ts
interface BudgetNarrative {
  budget_narrative_id: string;
  task_id: string;
  run_id: string;
  planned_estimate_ref?: StorageRef;
  actual_cost_records: TaskCostRecord[];
  summary: string;
  logical_llm_calls: number;
  infrastructure_retries: number;
  local_compute_seconds: number;
  external_tool_cost_usd?: number;
  doc24_packet_assembly_cost?: CostEstimate;
  source_research_cost?: CostEstimate;
  skipped_optional_helpers: string[];
  preserved_non_degradable_modes: string[];
  degraded_modes_used: string[];
  quality_impact_summary: string;
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### G-06 — [new · Common Contracts / Core] `TaskReliancePacket` — the capstone reliance artifact (ACCEPT)

- **Raised by:** ChatGPT (IDEA/CRITICAL).
- **What & why:** the cover memo that ties the whole layer together — assurance summary, unresolved limitations, the evidence package, revision-review packets, Hard-Call resolutions, policy decisions, budget narrative, and known-good states — and renders a single **`reliance_status`**: `safe_to_rely_within_scope` / `rely_with_limitations` / `not_safe_to_rely` / `human_review_required`, with an explicit `reliance_scope` and a user-visible summary. This is the artifact a high-stakes professional actually relies on. **User-facing:** "can I rely on this, within what scope, with what caveats."
- **Disposition:** **ACCEPT** — the keystone of §G; everything else in this cluster feeds it.
- **Fix — `[Appx O]`:**
```ts
interface TaskReliancePacket {
  packet_id: string;
  task_id: string;
  run_id: string;
  final_artifact_refs: StorageRef[];
  evaluation_chain_ids: string[];
  assurance_summary_ref: StorageRef;
  unresolved_limitations: EvaluationLimitationKind[]; // ← A-07 limitation taxonomy
  evidence_package_ref?: StorageRef;
  revision_review_packet_refs: StorageRef[];
  hard_call_resolution_refs: StorageRef[];
  policy_decision_refs: PolicyEvaluationRef[];
  budget_narrative_ref?: StorageRef;
  known_good_state_refs: string[];
  reliance_status: "safe_to_rely_within_scope" | "rely_with_limitations" | "not_safe_to_rely" | "human_review_required";
  reliance_scope: string;
  user_visible_summary: string;
  created_at: ISO8601;
  schema_version: "1.0";
}
```
- **Rationale:** binds the limitation taxonomy (A-07), evidence anchors (D-06), Pattern C chains (A-03), and budget (C-07) into one auditable statement. Without it, "the system did good work" is unverifiable; with it, reliance is scoped and inspectable.

### G-07 — [new · DOC20 / Core] `AttentionLedger` / `DecisionQueue` — cross-run attention surface (ACCEPT — author minimal)

- **Raised by:** ChatGPT (IDEA/HIGH).
- **What & why:** one place that surfaces everything across tasks/runs awaiting the user's attention or decision — pending Hard Calls, blocked items, contested findings, approvals. Prevents decisions from being buried inside individual runs. **User-facing:** "one inbox of everything that needs your call."
- **Disposition:** **ACCEPT.** No Appendix schema; author a minimal one for review.
- **Fix (PROPOSED — confirm shape):**
```ts
interface AttentionLedgerItem {
  item_id: string;
  task_id: string;
  run_id?: string;
  attention_kind: "hard_call_pending" | "blocked_item" | "contested_finding" | "approval_required" | "research_need_human";
  summary: string;
  priority: "low" | "medium" | "high" | "blocking";
  command_ref?: string;
  created_at: ISO8601;
  resolved_at?: ISO8601;
  schema_version: "1.0";
}
```

### G-08 — [Set-wide] No first-class "task health" surface (ACCEPT) — Claude A1

- **Problem/why:** task health is fragmented across signals; there's no single surface showing progress, blockers, budget burn, and last verdict. **Disposition:** **ACCEPT** — define a `TaskHealthCard` DOC20 surface aggregating the existing signals (no new truth). Renders from run state + BudgetNarrative (G-05) + last `OutcomeEvaluationState`.

### G-09 — [Set-wide] Cost predictability asserted but not computable before a run (ACCEPT) — Claude A2

- **Problem/why:** the set claims cost predictability but nothing assembles an end-to-end forecast before running. **Disposition:** **ACCEPT** — a pre-run forecast built from the shared `CostEstimate` (C-07) summed across packets/research/LLM calls (the D-18 per-activation budget gives the sub-totals); surfaced in EvaluationContractReview (G-01) and reconciled after by BudgetNarrative (G-05).

### G-10 — [V3.3.1 §21 / FD §8] Reviewability fragmented across ≥3 surfaces (ACCEPT) — Claude A3

- **Problem/why:** review is scattered across at least three places. **Disposition:** **ACCEPT** — unify into one review experience: FindingsInbox (G-13) for triage, RevisionReviewPacket (G-02) for changes, DecisionAuditView (G-15) for the "why." One entry point, not three.

### G-11 — [FD §3.4 / V3.3.1 §6.12] Over-relies on the user knowing to contest (ACCEPT) — Claude A5

- **Problem/why:** findings are defeasible (FD §3.4) but the UI doesn't *surface* contestability, so the burden is on the user to know they can push back.
**Disposition:** **ACCEPT** — proactively mark contestable findings/verdicts in the UI with the contest affordance inline (ties to the defeasible-findings model).

### G-12 — `WorkProductCertification` — the page you staple to the cover sheet (ACCEPT — highest-leverage surface) — Claude S1

- **What:** the human-facing render of `TaskReliancePacket` (G-06): reliance status, scope, limitations, evidence summary. **Disposition:** **ACCEPT.** DOC20 surface over G-06; read-only; no new truth.

### G-13 — `FindingsInbox` — cross-task review queue (ACCEPT) — Claude S2

- **What:** a queue of findings/repair instructions across tasks, filterable by severity/state/matter. **Disposition:** **ACCEPT.** Reads canonical `EvaluationFinding`s (A-01); matter-scoped per D-24.

### G-14 — `RunDiff` — compare two runs of the same task (ACCEPT) — Claude S3

- **What:** diff two runs (inputs, plan, outcomes, cost). **Disposition:** **ACCEPT.** Reads run records + BudgetNarrative (G-05); pairs with RunReplayPreview (G-16).

### G-15 — `DecisionAuditView` — "why did it decide that" (ACCEPT) — Claude S4

- **What:** renders the decision chain behind a verdict/route (which findings, which policy, which Hard Call). **Disposition:** **ACCEPT.** The UI counterpart to the Pattern C chain (A-03) and routing (A-08); renders the coordination trace.

### G-16 — `RunReplayPreview` — preview a replay before committing (ACCEPT) — Claude S5

- **What:** show what a replay would produce before applying it. **Disposition:** **ACCEPT.** Depends on the TaskReplay primitive (G-19) + KnownGoodState (G-04).
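
**Illustrative sketch (not part of the adjudicated fix).** G-13 describes `FindingsInbox` as a queue that filters by severity, state, and matter and adds no new truth. The sketch below shows one way such a projection could look. The `InboxFinding` shape, the four-level `Severity` scale, and the `findingsInbox` helper are all hypothetical; they stand in for the canonical `EvaluationFinding` (A-01), whose real fields are defined elsewhere.

```ts
// Hypothetical, simplified stand-in for the canonical EvaluationFinding (A-01).
type Severity = "low" | "medium" | "high" | "critical"; // assumed scale
type FindingState = "open" | "contested" | "resolved";  // assumed states

interface InboxFinding {
  finding_id: string;
  task_id: string;
  matter_id: string;
  severity: Severity;
  state: FindingState;
}

interface InboxFilter {
  matter_id: string;        // required: the inbox is matter-scoped (D-24)
  min_severity?: Severity;  // defaults to "low" (no severity cut-off)
  states?: FindingState[];  // omitted means every state
}

const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Pure projection over existing findings: it filters and orders, and never
// mutates or stores anything, so it stays a derived read-model.
function findingsInbox(findings: readonly InboxFinding[], filter: InboxFilter): InboxFinding[] {
  const minRank = SEVERITY_RANK[filter.min_severity ?? "low"];
  return findings
    .filter((f) => f.matter_id === filter.matter_id)
    .filter((f) => SEVERITY_RANK[f.severity] >= minRank)
    .filter((f) => filter.states === undefined || filter.states.includes(f.state))
    .sort((a, b) => SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity]);
}

// Usage: findings from several tasks, narrowed to one matter.
const sample: InboxFinding[] = [
  { finding_id: "f1", task_id: "t1", matter_id: "M-1", severity: "high", state: "open" },
  { finding_id: "f2", task_id: "t1", matter_id: "M-1", severity: "low", state: "open" },
  { finding_id: "f3", task_id: "t2", matter_id: "M-2", severity: "critical", state: "open" },
  { finding_id: "f4", task_id: "t3", matter_id: "M-1", severity: "critical", state: "contested" },
  { finding_id: "f5", task_id: "t3", matter_id: "M-1", severity: "high", state: "resolved" },
];

const inbox = findingsInbox(sample, {
  matter_id: "M-1",
  min_severity: "high",
  states: ["open", "contested"],
});
console.log(inbox.map((f) => f.finding_id)); // [ 'f4', 'f1' ]
```

Severity only orders the queue here, which matches the card's reframe of severity as review priority. Making `matter_id` a required filter field is one way to keep the D-24 scoping from being skipped by accident.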
### G-17 — [Core R0.7.1 §5.1] `TaskRunFork` + `irrevocable_side_effects_at_fork` (ACCEPT — the adopted form of the declined ShadowWorkspace) — Claude §5.1

- **What & why:** fork a run from a checkpoint to explore an alternative without disturbing the original — the *real* answer to Gemini/Grok's branch-and-merge idea (§F-01), made honest by an explicit record of side effects that **cannot** be forked (an already-sent email stays sent). **Disposition:** **ACCEPT.**
- **Fix (PROPOSED, consistent with §5.1):**
```ts
interface TaskRunFork {
  fork_id: string;
  parent_run_id: string;
  forked_from_checkpoint_ref: string; // KnownGoodState (G-04)
  new_run_id: string;
  fork_reason: string;
  irrevocable_side_effects_at_fork: Array<{
    side_effect_id: string;
    kind: string;
    executed_at: ISO8601;
    note: "not_reversible_in_fork";
  }>;
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### G-18 — [V3.3.1 / Core] `ExplanationTrace` as a first-class artifact (ACCEPT) — Grok

- **What & why:** every `CompiledRevisionStrategy`/`RevisionPlan` emits a short human-readable causal trace ("changed X because finding Y, preserving Z"). Cheap to produce, large trust gain, and feeds DecisionAuditView (G-15). **Disposition:** **ACCEPT** — a short `explanation_trace_ref` on the plan/strategy.

### G-19 — [Set-wide] `TaskReplay` primitive (ACCEPT) — Claude D11

- **What & why:** deterministic replay of a run closes the determinism story and underpins RunReplayPreview (G-16) and RunDiff (G-14). **Disposition:** **ACCEPT** — a replay primitive keyed to a run snapshot + KnownGoodState (G-04).

**Audit fold-in (R0.4) — replay request + divergence** (DEEP §8 / Claude D11). The G-19 concept above gets its schemas; a replay must never re-fire irreversible external side effects.
```ts
interface TaskReplayRequest {
  replay_id: string;
  source_run_ref: string;
  mode: "deterministic" | "best_effort";
  side_effect_policy: "suppress_external" | "sandbox_only";
  schema_version: 1;
}
interface ReplayDivergenceRecord {
  replay_id: string;
  step_ref: string;
  divergence_kind: "nondeterministic_step" | "input_changed" | "policy_changed" | "tool_unavailable";
  detail: string;
}
// lints: validation.replay_attempts_external_side_effect ; validation.deterministic_replay_with_nondeterministic_step ; validation.replay_divergence_unrecorded
```

### G-20 — [DOC20] Unified Evaluation-Chain view (ACCEPT) — Grok

- **What & why:** render a Pattern C chain as one card — qualitative findings and the quantitative verdict side by side — instead of two disconnected envelopes. The UI counterpart to A-02/A-03/A-04. **Disposition:** **ACCEPT** — DOC20 surface over the chain registry (A-03).

### G-21 — [Testing] Chaos / concurrency fixtures (ACCEPT) — ChatGPT / Claude §6.3

- **What & why:** the test harness for the §B concurrency work — storage-full, malformed LLM output, mid-run privilege change, clock skew, parallel writes to one artifact. **Disposition:** **ACCEPT** — required fixtures backing B-03/B-04/B-05/B-16 and D-01/D-24; gate the runtime fixes on them passing.

---

---

# §G/§A/§D partials — folded as notes (not full rows)

Covered in substance by existing rows; recorded here so they're tracked, with where each attaches:

- **`authority_basis` array can be empty while hard blockers require backed authority** (Grok §5.2) → note on **A-05**: add a non-empty constraint — a finding with `severity:"blocking"` MUST carry ≥1 `authority_basis` entry; `validation.blocking_finding_without_authority` (error).
- **`AutonomousModePolicy` as single source of truth + visible toggle/live risk score** (Grok §3.1) → note on **A-16 / G-08**: Claude D9 already confirms the *locked fields* are correct-by-construction; add the UX requirement to surface `AutonomousModePolicy` as a visible toggle with a live risk score, and the invariant that no mutation path bypasses it (reinforces A-16).
- **Consumption-receipt → `RepairCycleSignal` linkage is optional/never mandated** (Grok §4.6) → note on **D-23**: make the `consumption_receipt_ref → RepairCycleSignal` linkage field **required**, not optional; `validation.repair_cycle_signal_without_consumption_link` (error).
- **OP-A rows claim the same primitive under different names** (Grok §4.4) → note on **A-15 / supersession (§8)**: the supersession matrix + TypeOwnerRegistry already force one canonical name per primitive; add an OP-A dedup pass as a build-ready gate item.
- **Mandatory Plan-Verifier for high-risk plans** (Grok top-5) → note on **D-15**: D-15 adds the `plan_verifier` coordination point; add the *policy* that a `plan_verifier` sub-agent is **required** (not optional) when `plan_risk_score ≥ threshold`.

---

---

*Consolidated from working cards V0.1–V0.6. 126 items adjudicated from ~228 raw reviewer assertions: 88 fixes (§A–§D), 13 held for Phase B (§E), 4 declined per review §6.2 (§F), 21 product surfaces (§G), plus 5 partials folded as notes. Fixes carry the Appendix A–P schemas inline. `⚠verify` items deferred for current-state confirmation. Next step after your decisions: the R0.4 amendment package applying the accepted rows to the operative specs.*

---

## Final consistency pass

**Coverage (126 + new):** every consolidated-card item lands in exactly one place — §A 27 (L3 core 8 · L2 2 · standard 17), §B 27 (L3 core 5 · L2 2 · standard 20), §C 10 (L3 core 1 [C-03] · C-cluster 9), §D 24 (L3 core 2 · L2 1 · standard 21), §G 21 (Layer 4; G-21→Layer 5), §E 13, §F 4 = **126**.
New rows NR.1–NR.7 add scope (NR.4 completes the D-19/D-20 standard rows). No item dropped; flips and re-grades explicit.

**Primitives owned:** all seven L1 primitives have `TypeOwnerRegistry` entries (NR.1); no later row redefines them.

**Lints — all registered in `LintRegistry` (L1.7):** Step-0 (×5), Layer-2 (×3), Layer-3 core (×6), C-cluster (×4), Layer-4 (×4), Layer-5 fixtures (their asserted lints), new-rows (reliance/delegation/library/feedback/subagent ×6). One waiver mechanism (`LintRegistryEntry.waiver`, architect-approved).

**OP-A rows to carry (cross-doc obligations):**

- **DOC20** — render the §G surfaces (G-07, G-08, G-10, G-11, G-12…G-16, G-20) as `DerivedReadModelRecord`-backed UI; matter-scoped (D-24).
- **Addenda A** — full `NoVerdictReason` consolidation absorbing the R203 taxonomy (A-07); the canonical enum + alias registry land now, the merge when Addenda A next opens.
- **DOC12** — `RoomKind.plan_review` registration (`OBL-DOC12-FORUM-01`, already exists; B-15).
- **DOC73** — `LibraryPromotionPolicy` gate (NR.4 / D-20).
- **Review Studio** — `OBL-RS-01…10` (already tracked in `DOC23_ADDB_Review_Studio.md`).

**Corrections baked in:** `confidence_merge_strategy` removed (was → A-04, route-resolution); `tiebreaker_epsilon` removed (was → B-16, a concurrency row); `A-02` not elevated (HIGH, not CRITICAL); `B-25` de-CRITICAL.

**Integrity:** §E (held, Phase-B) and §F (declined) intact and unmodified; the finding model is shared with (not forked from) the Review Studio unit; the `revision_in` chokepoint (A-16) is the single mutation path that the Review Studio human-assist also relies on.

---

## Status

**Card complete and self-contained (Layers 0–5 + new rows + R0.4 audit additions + Appendix R).** Every adjudicated row is now rendered in full (Appendix R) or in its governing Layer; the nine completeness-audit fold-ins, two new rows (NR.8–NR.9), and one DOC24 OP-A obligation are folded in.
Ready for the multi-LLM red-team pass (ChatGPT 5 Pro + Gemini 2.5 Pro + Codex), then fold into the R0.4 spec edits.

**Awaiting your ruling (non-blocking):** the three parked items (`WorkspaceExternalizationPolicy`/`Receipt`; `ExternalSourceQueryPolicy`/`Receipt`; `TaskAgentForumSurfaceOwnership`) — in or out of scope.

**Next deliverable still owed (separate):** the Test-set adjudication card (`DOC23_ADDENDA_B_TEST_SET_ADJUDICATION_CARD.md`) second-round reviews.
````