ELNOR REPO READER TEXT MIRROR
Original path: Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/Add B Adj Card Review 5.29/DOC23_ADDENDA_B_RT_ADJUDICATION_CARD_CONSOLIDATED.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z

---

# DOC23 Addenda B — Red Team Adjudication Card (Consolidated)

**What this is.** The complete adjudication of the DOC23 Addenda B comprehensive red-team review (`DOC_23_Add_B_RT_Reviews_1__5_28_.md`), deduplicated across its four reviewers (Claude consolidation + ChatGPT + Gemini + Grok) into one row per unique finding. Each row carries the problem, the current spec state, a **disposition**, and the **fix as concrete schema/code** ready to apply in the next spec version (R0.4). Consolidates working cards V0.1–V0.6.

**Operative specs (fix targets):** Core R0.7.1 · Outcome Evaluator/Revisor V3.3.1 · Evaluation Common Contracts V1.1.1 · Source Workspace V1.0.1 · Task Forum + Run Board V1.0.1 · Feedback Delivery V1.0.1.

**Totals — deduped from ~228 raw assertions across 4 reviewers → 126 adjudicated items:**

| Cluster | Rows |
|---|---|
| §A Cross-document contract & schema | 27 |
| §B Runtime / concurrency / distributed-systems | 27 |
| §C Math / scoring / metrics | 10 |
| §D Source Workspace / Forum / Core / Taint | 24 |
| **Fix rows (A–D)** | **88** |
| §E Held (Phase-B / self-learning) | 13 |
| §F Declined (review §6.2) | 4 |
| §G Conceptual / UX / Professional Reliance Layer | 21 |

Severity highlights: **A-16** (CRITICAL — direct repair bypasses the `revision_in` safety contract), **A-01** (CRITICAL — incompatible `EvaluationFinding` schemas), **B-24** (CRITICAL — Revisor sufficiency protocol has no detection algorithm), **B-25** (CRITICAL — `RevisionPlan` steps lack a `depends_on` field the DAG-lint assumes), **D-01** (compliance blocker — transitive taint laundering), **D-24** (privilege — cross-matter forum leak).

**Provenance note.** A two-stage audit produced this card. Stage 1 audited §A/§B/§C against the Claude findings and ChatGPT's main typed findings (added 16 missing rows). Stage 2 audited the merged card against the **ChatGPT Audit Addendum** and **Grok's exhaustive §4/§5 lists**, which surfaced six further omissions (A-26/A-27, B-24…B-27, two CRITICAL) — now included — plus five partials folded as notes after §G. Merge fidelity verified (no dropped/duplicated rows; §E/§F intact).

**How the review self-disposes (carry-forward context).** The review states (RT§1) that *every finding is to be addressed except the four §6.2 declines*; severity describes how broken something is, not priority. Learning-touching items are **Phase-B-gated** (§E). Default disposition here is ACCEPT — your pass is to confirm / modify / reject per row, and to adjudicate the Appendix A–P fix schemas, which appear inline as each row's fix.

**Disposition vocabulary:** `ACCEPT` · `ACCEPT WITH MODIFICATIONS` · `ACCEPT-AS-FIX` (adopt a reviewer's exact patch verbatim) · `DEFER-PhaseB` (held, §E) · `DECLINED` (§F).

**Conventions.** Schemas in fenced `ts` blocks are the proposed spec inserts; `[Appx _]` marks schemas lifted from the review's own paste-ready package. `⚠verify` marks a current-state claim **not yet confirmed against live spec text** (deferred — confirm before patch). Section anchors `§x` are to the operative specs; `RT§` = the review.

**How to use.** The index below lists every fix/surface row with its disposition and a decision column (`✓` accept / `✗` reject / `~` modify). Fastest path: "accept all except where marked," matching the review's §1 disposition, then mark only your exceptions. §E (held) and §F (declined) need no per-row decision now.

**Open `⚠verify` items** (confirm before patch): A-02, A-05, A-11, A-23, B-03, B-15, B-16, C-03, D-16.

---

## Master decision index

**§A — Contract & schema (27)**

| ID | Finding · disposition | ✓/✗/~ |
|---|---|---|
| A-01 | [Common Contracts §4.2 / V3.3.1 §5.7 / FD §3.3] Two incompatible `EvaluationFinding` schemas (ACCEPT) | |
| A-02 | [Common Contracts §3 / V3.3.1 §5.18] Pattern C envelope FIELD-location bug (ACCEPT) | |
| A-03 | [Common Contracts §3.7] Pattern C chain-ID lifecycle + chain registry (ACCEPT) | |
| A-04 | [Common Contracts §3.7] Pattern C route-resolution policy (ACCEPT) | |
| A-05 | [FD §3.4 / Common Contracts §4.2A] `EvaluationAuthorityBasis` vs `AssuranceBasis` reconciliation (ACCEPT WITH MODIFICATIONS) | |
| A-06 | [V3.3.1 §0.4.1 / Common Contracts §3.1–3.2] `OutcomeEvaluationState` count & mapping skew (ACCEPT) | |
| A-07 | [FD §2.1 / V3.3.1 §5.18] Indeterminate not representable; `IndeterminateCause` undefined (ACCEPT) | |
| A-08 | [FD §6.2] Routing completeness: missing branches + full state→routing matrix (ACCEPT) | |
| A-09 | [Common Contracts §4.2 / V3.3.1 §5.17.7] Phantom & misplaced schema refs + TypeOwnerRegistry (ACCEPT) | |
| A-10 | [Common Contracts §3 / §11] `EvaluationArtifactEnvelope` wrapper out-of-document; `overall_state` too B-specific (ACCEPT WITH MODIFICATIONS) | |
| A-11 | [V3.3.1 §15 / Common Contracts] `LearningMode` enum referenced, never enumerated (ACCEPT — value set; *use* DEFER-PhaseB) | |
| A-12 | [FD §5 / Common Contracts] `HumanOutcomeFeedbackEvent` referenced, never defined (ACCEPT) | |
| A-13 | [Core §9 vs Common §5] Core duplicates the learning-signal envelope with malformed TypeScript (ACCEPT) | |
| A-14 | [Common Contracts §4.2 / validation] Qualitative-slice owner map wrong; slice-required check only a warning (ACCEPT) | |
| A-15 | [Common Contracts §11.5 / cross-doc obligations] "Build-ready" overstated; command registry not mandatory; compat claim too strong (ACCEPT WITH MODIFICATIONS) | |
| A-16 | [FD §7.2 / V3.3.1 §3.1.3] Direct repair wiring can bypass the central `revision_in` safety contract (ACCEPT — **CRITICAL**) | |
| A-17 | [FD §5.3 / V3.3.1 §9] Revisor input confused with `revision_in` (ACCEPT) | |
| A-18 | [FD §1.2 / §2.3] Feedback-bundle emission discipline contradicts itself (ACCEPT) | |
| A-19 | [V3.3.1 §0.4.4 / §5.7.1] `FindingState` pass-through `proposed` has no negative exit (ACCEPT) — Claude C2 | |
| A-20 | [Common Contracts §6.4 / §9.4] `unanchored_llm_judgment` acknowledgment has no field to write into (ACCEPT) — Claude C6 | |
| A-21 | [FD §2.2 / §3.3] Optional fields are required by invariants (ACCEPT — largely satisfied by A-01) | |
| A-22 | [Common Contracts §12] Pending consumers lack degraded-mode behavior (ACCEPT) | |
| A-23 | [FD §3 / §5] `ApplicabilityScope.authority_level` vs `domain_payload.authority_level` conflict (ACCEPT WITH MODIFICATIONS) — Claude D23 | |
| A-24 | [Common Contracts §3.4 vs V3.3.1 §5.18.8] Parallel Judge+Evaluator example contradicts topology (ACCEPT) — Claude C14 | |
| A-25 | [Common Contracts §3.1 vs V3.3.1 §5.1] `evaluation_chain_id` naming asymmetry (ACCEPT — fold into A-03) — Claude D8 | |
| A-26 | [V3.3.1 §5.5.4 / §5.7] `ProgressSignal` matching references fields missing from `EvaluationFinding` (ACCEPT) | |
| A-27 | [V3.3.1 §5.16 / Source Workspace] `EvaluationSnapshot` source-workspace hashes typed as artifact hashes (ACCEPT) | |

**§B — Runtime / concurrency (27)**

| ID | Finding · disposition | ✓/✗/~ |
|---|---|---|
| B-01 | [V3.3.1 §11 / Core §runtime] No unified RevisionExecutionLifecycle / dispatcher↔receipts state machine (ACCEPT) | |
| B-02 | [V3.3.1 §11.x] Idempotency under-specified and not canonical (ACCEPT) | |
| B-03 | [Source Workspace §x / V3.3.1 §11.20] Concurrent-write safety on the Source Workspace (ACCEPT) ⚠verify | |
| B-04 | [V3.3.1 §11.20.2 / §11.22] Rolling hash unsafe under DAG/parallel execution (ACCEPT) | |
| B-05 | [V3.3.1 §11.21 / §11.21.2 / §5.14.1] Revalidation cascade: no convergence bound + duplicated upstream-failure rule (ACCEPT) | |
| B-06 | [V3.3.1 §dependency / Common Contracts] Outcome-dependency cycles + `pending_dependency` deadlock (ACCEPT) | |
| B-07 | [V3.3.1 §11.22] Parallel sibling outputs orphaned after batch failure; parallelism not first-class (ACCEPT) | |
| B-08 | [V3.3.1 §failure / Common Contracts] Failure-taxonomy gaps (ACCEPT) | |
| B-09 | [V3.3.1 §candidate] Candidate artifacts can't model external side effects; lifecycle mixed with head-pointer (ACCEPT) | |
| B-10 | [V3.3.1 §11 / FD] `TaskCancelProtocol` + Hard Call blocking scope undefined (ACCEPT) | |
| B-11 | [V3.3.1 §skip / FD] `TaskSkipProtocol` + skip receipts missing (ACCEPT) | |
| B-12 | [Common Contracts §policy / V3.3.1] Policy freshness checks a field `PolicyDecisionRef` lacks (ACCEPT) | |
| B-13 | [Source Workspace §6 / Common Contracts] `ResearchNeed` concurrency — needs a lease (ACCEPT) | |
| B-14 | [Task Forum §x] Forum deadlock breaker missing (ACCEPT) | |
| B-15 | [Task Forum §room] `RoomKind.plan_review` referenced but never registered (ACCEPT) ⚠verify | |
| B-16 | [V3.3.1 §11.9] Concurrency tie-breaker relies on wall-clock `created_at` (ACCEPT-AS-FIX) | |
| B-17 | [FD §8.4 / §10.3] `instruction_in` overload leaks the DOC23/DOC15/DOC24 boundary (ACCEPT) | |
| B-18 | [V3.3.1 §6.5.2 / §7.9] `HardRevisionCall.options` may be empty though spec requires bounded options (ACCEPT) — Claude C3 | |
| B-19 | [V3.3.1 §11.6 / §0.4.7] `RevisionOperationKind="hard_call_resolved"` has no producer (ACCEPT) — Claude C10 | |
| B-20 | [V3.3.1 §6.7.2 / §11.21] Success-condition 5 races the cascade (ACCEPT) — Claude D2 | |
| B-21 | [Core §9.0.6] Signal emission ordering vs receipt persistence undefined (ACCEPT) — Claude D7 | |
| B-22 | [V3.3.1 §5.18 / §11.15] Pattern C doubles per-turn latency/cost with no budget (ACCEPT) — Claude D17 | |
| B-23 | [FD §6.3] Multiple delivery branches fire simultaneously with no idempotency control (ACCEPT) — Claude D24 | |
| B-24 | [V3.3.1 §6.7] Revisor sufficiency protocol has no detection algorithm (ACCEPT — **CRITICAL**) | |
| B-25 | [V3.3.1 §11.3] `RevisionPlan` steps have no `depends_on` field, but the plan-lint checks "DAG acyclic" (ACCEPT — **CRITICAL**) | |
| B-26 | [V3.3.1 §7.5 / §11.7] Revalidation policy named inconsistently — three sources of truth (ACCEPT) | |
| B-27 | [V3.3.1 §6.7 / `revision_in`] Ambiguous `regenerate` capability contract → identical-output loop (ACCEPT) — Gemini D-04 | |

**§C — Math / scoring (10)**

| ID | Finding · disposition | ✓/✗/~ |
|---|---|---|
| C-01 | [V3.3.1 §15 / §3.10 / Core] No Formula Registry — scores are bare numbers (ACCEPT) | |
| C-02 | [V3.3.1 §15.1] Survivorship bias in `avg_revision_cycles_to_convergence` (ACCEPT-AS-FIX) | |
| C-03 | [V3.3.1 §7.2.1] Hash-hallucination paradox in `RevisionPlan` mutation mode (ACCEPT-AS-FIX) ⚠verify | |
| C-04 | [V3.3.1 §15.8.2] Flat-ratio math in "weighted" reputation score (ACCEPT-AS-FIX; reputation *use* → §E) | |
| C-05 | [Core §16.2 / V3.3.1 §12.5] Thermal-throttling outliers skew latency means (ACCEPT-AS-FIX) | |
| C-06 | [V3.3.1 §15.8 / cost] Zero-denominator & insufficient-sample handling missing; cost estimator ignores tail risk (ACCEPT) | |
| C-07 | [Common Contracts §9A] No shared `CostEstimate` / `TaskCostRecord`; metric terminology misused (ACCEPT WITH MODIFICATIONS) | |
| C-08 | [V3.3.1 §novelty] Novelty score assumes an undefined metric space (ACCEPT) | |
| C-09 | [V3.3.1 §task-mode / §template-match] Task-mode not calibrated to protect direct-first; `TemplateMatchScore` no aggregation; magic 0.7 (ACCEPT) | |
| C-10 | [V3.3.1 §criterion] `criterion_semantics_hash` is lexical, not semantic (ACCEPT) | |

**§D — Source / Forum / Core / Taint (24)**

| ID | Finding · disposition | ✓/✗/~ |
|---|---|---|
| D-01 | [Source Workspace §4.1 / V3.3.1 §15.10 / Task Forum §8.3] Transitive taint laundering via sub-agents and forum (ACCEPT — **compliance blocker**) | |
| D-02 | [Source Workspace §2 / Governance] Workspace taint aggregation undefined (ACCEPT) | |
| D-03 | [Source Workspace §0.3 / V3.3.1 §12] `TaskSourceWorkspace` vs `SourceWorkspace` identity split (ACCEPT) | |
| D-04 | [Source Workspace §3 / §4 / §2.3] Source-tier defects: Tier 0, unstored transitions, misaligned verification, demotion authority (ACCEPT) — incl. Claude C13 | |
| D-05 | [Source Workspace §6.2 / §7.4] `ResearchNeed` scoping, ref types, and `human_needed` exit (ACCEPT) — incl. Claude D6 | |
| D-06 | [Source Workspace §4 / §5 / §6.3] Evidence anchors, payload registry, weak tool-receipts, extractor-vs-source confusion (ACCEPT) | |
| D-07 | [Task Forum §5.2 / §5.3] Forum posts: visibility, supersession, governance envelopes (ACCEPT) | |
| D-08 | [Task Forum §6.3 / §6.4] Context-packet ownership, request/receipt, omission manifest, audience-enum divergence, total budget (ACCEPT) — incl. Claude D18 | |
| D-09 | [Task Forum §1 / §5] Passive board auto-publishes every event — privacy/volume controls (ACCEPT) — Claude B12 | |
| D-10 | [Task Forum §7.2 / §3.2 / §4.6 / §3.1] ModuleAssistanceRequest schema, participant/moderator model, payload schemas, moderator-failure (ACCEPT) — incl. Claude D3 | |
| D-11 | [Run Board §3.1 / §6.2] BoardDigest filter rule unspecified (ACCEPT) — Claude C9 | |
| D-12 | [Run Board §retention / Source Workspace §9] Run Board retention/compaction/event-class + EC-policy persistence (ACCEPT) | |
| D-13 | [FD §RunGuidance / TaskBlueprint] RunGuidanceItem persistence, cross-run injection, lifecycle, contested-check (ACCEPT) — incl. Gemini D-01 | |
| D-14 | [Core §3D / §4B / §5A] InjectionSlotRegistry, compact card schemas, command registry, token_budget, DOC24 capability ownership, receipt booleans (ACCEPT) — incl. Claude C15 | |
| D-15 | [V3.3.1 §8.4 / §17.1] Sub-agent: output-contract plurality, no-sub-agent fallback, coordination-point count (ACCEPT) — incl. Claude B10, D1 | |
| D-16 | [Source Workspace / Forum] Workspace API operations referenced but never defined (ACCEPT) ⚠verify | |
| D-17 | [Common Contracts §7 / V3.3.1 §7.9.3] Anchor/hash hygiene: empty StructuredAnchor, uncomputed context_hash, HardCall hash normalization (ACCEPT) — incl. Claude C7, D14 | |
| D-18 | [V3.3.1 §repeated-failure] Repeated-failure detection keyed on versioned refs (ACCEPT) | |
| D-19 | [Source Workspace §SourceRecord] Per-module cost attribution has no field (ACCEPT) | |
| D-20 | [Source Workspace §library] Library promotion gate references EC policy but is undefined (ACCEPT) | |
| D-21 | [V3.3.1 / Core] `requires_background_progress` overloaded (ACCEPT) | |
| D-22 | [Common Contracts §11.5 / cross-refs] Backward-compat overstated; section-anchor hygiene before R3.2 absorption (ACCEPT WITH MODIFICATIONS) — Claude D5/D22 | |
| D-23 | [FD §9.4] "Silent ignoring fires validation" is unenforceable (ACCEPT) — Claude C11 | |
| D-24 | [Run Board §5.4 / Source Workspace §9.4] Cross-matter forum-post visibility unspecified (ACCEPT — **privilege firewall**) — Claude C12 | |

**§G — Conceptual / UX / Reliance Layer (21)**

| ID | Finding · disposition | ✓/✗/~ |
|---|---|---|
| G-01 | [new · Common Contracts / Core] `EvaluationContractReview` — pre-execution contract check (ACCEPT) | |
| G-02 | [new · FD / V3.3.1] `RevisionReviewPacket` — reviewable packet for every meaning-bearing revision (ACCEPT) | |
| G-03 | [new · Source Workspace] `EvidencePackage` — exportable evidence binder (ACCEPT) | |
| G-04 | [new · V3.3.1 / Core] `KnownGoodState` — named restorable checkpoint (ACCEPT) | |
| G-05 | [new · Core / Common Contracts] `BudgetNarrative` — plain-English cost/quality account (ACCEPT) | |
| G-06 | [new · Common Contracts / Core] `TaskReliancePacket` — the capstone reliance artifact (ACCEPT) | |
| G-07 | [new · DOC20 / Core] `AttentionLedger` / `DecisionQueue` — cross-run attention surface (ACCEPT — author minimal) | |
| G-08 | [Set-wide] No first-class "task health" surface (ACCEPT) — Claude A1 | |
| G-09 | [Set-wide] Cost predictability asserted but not computable before a run (ACCEPT) — Claude A2 | |
| G-10 | [V3.3.1 §21 / FD §8] Reviewability fragmented across ≥3 surfaces (ACCEPT) — Claude A3 | |
| G-11 | [FD §3.4 / V3.3.1 §6.12] Over-relies on the user knowing to contest (ACCEPT) — Claude A5 | |
| G-12 | `WorkProductCertification` — the page you staple to the cover sheet (ACCEPT — highest-leverage surface) — Claude S1 | |
| G-13 | `FindingsInbox` — cross-task review queue (ACCEPT) — Claude S2 | |
| G-14 | `RunDiff` — compare two runs of the same task (ACCEPT) — Claude S3 | |
| G-15 | `DecisionAuditView` — "why did it decide that" (ACCEPT) — Claude S4 | |
| G-16 | `RunReplayPreview` — preview a replay before committing (ACCEPT) — Claude S5 | |
| G-17 | [Core R0.7.1 §5.1] `TaskRunFork` + `irrevocable_side_effects_at_fork` (ACCEPT — the adopted form of the declined ShadowWorkspace) — Claude §5.1 | |
| G-18 | [V3.3.1 / Core] `ExplanationTrace` as a first-class artifact (ACCEPT) — Grok | |
| G-19 | [Set-wide] `TaskReplay` primitive (ACCEPT) — Claude D11 | |
| G-20 | [DOC20] Unified Evaluation-Chain view (ACCEPT) — Grok | |
| G-21 | [Testing] Chaos / concurrency fixtures (ACCEPT) — ChatGPT / Claude §6.3 | |

**§E — Held / Phase-B (13)** and **§F — Declined, review §6.2 (4)** — no per-row decision required now; see those sections below.

---

# §A. Cross-document contract & schema cluster

### A-01 — [Common Contracts §4.2 / V3.3.1 §5.7 / FD §3.3] Two incompatible `EvaluationFinding` schemas (ACCEPT)

- **Raised by:** Claude **B1** (BUG/CRITICAL); ChatGPT CG-S4. *Review's "most consequential single defect."*
- **Problem:** `EvaluationFinding` is defined in **both** V3.3.1 §5.7 and FD §3.3 with mutually exclusive fields. V3.3.1: `finding_text`, `severity(4)`, `state: FindingState(12)`, `basis: AssuranceBasis`, `target_artifact_ref`, `taint_class`, `confidence: low|medium|high`.
  FD: `finding_kind(12)`, `authority_basis: EvaluationAuthorityBasis[](9, new enum)`, `lifecycle_state: EvaluationFindingLifecycleState(7)`, `target_criterion_id`, `target_scope_ref`, `affected_claim_refs`, `confidence: number`, `based_on_board_digest_ref`. Common Contracts §4.2 cites the V3.3.1 schema as canonical — so FD declares a schema that does not match the canonical reference. A coding agent emits per V3.3.1 and reads per FD; the two share no field set.
- **Disposition:** **ACCEPT.** One canonical `EvaluationFinding` in Common Contracts (new **§4.2A**, `schema_version: "2.0"`), with a thin `FeedbackFindingView` projection owned by FD. `state` and FD's `lifecycle_state` are the same concept → single `FindingState`. `AssuranceBasis` (assurance) and FD's `authority_basis` (what makes a finding a *blocker*) are **not** redundant (see A-05): the finding carries `assurance_basis[]`; the blocking role becomes `blocking_authority_satisfied` on the view.
- **Fix (Common Contracts §4.2A + FD §3) — `[Appx D]`:**

```ts
interface EvaluationFinding {
  finding_id: string;
  result_id: string;
  finding_kind: FindingKind;
  finding_text: string;
  explanation: string;
  severity: "low" | "medium" | "high" | "blocking";
  state: FindingState;
  assurance_basis: AssuranceBasis[];
  confidence_score: number; // 0..1 (replaces the low|med|high string)
  confidence_basis: ConfidenceBasis[];
  confidence_explanation: string;
  target_artifact_ref?: StorageRef;
  target_version_ref?: StorageRef;
  target_scope_ref?: ArtifactScopeRef;
  target_criterion_id?: string;
  affected_claim_refs?: string[];
  evidence_refs: StorageRef[];
  verification_record_refs: StorageRef[];
  supporting_material_snapshot_refs: StorageRef[];
  based_on_artifact_version_ref?: StorageRef;
  based_on_artifact_version_absent_reason?:
    | "non_artifact_target"
    | "human_review_no_artifact"
    | "process_observation";
  based_on_source_workspace_snapshot_ref?: StorageRef;
  based_on_board_digest_ref?: StorageRef;
  taint_class: TaintClass;
  data_class: "public" | "internal" | "privileged" | "local_only";
  matter_id?: string;
  privileged: boolean;
  policy_decision_refs: PolicyEvaluationRef[];
  evaluation_target_state: "current_artifact" | "candidate_artifact" | "sandboxed_candidate";
  candidate_artifact_version_ref?: CandidateArtifactVersionRef;
  promotion_policy_ref?: StorageRef;
  match_key: FindingMatchKey;
  superseded_by_finding_id?: string;
  expires_at?: ISO8601;
  created_at: ISO8601;
  schema_version: "2.0";
}

interface FeedbackFindingView {
  finding_id: string;
  source_evaluation_finding_ref: string;
  display_summary: string;
  display_explanation: string;
  finding_kind: FindingKind;
  severity: "low" | "medium" | "high" | "blocking";
  lifecycle_state: FindingState;
  blocking_authority_satisfied: boolean;
  routed_action_refs: StorageRef[];
  schema_version: "1.1";
}
```

- **Rationale:** single owner kills the single-name/two-schemas defect; the projection keeps FD's delivery view without re-declaring the type. Register both in the TypeOwnerRegistry (A-09). Apply **before** any V3.4 / FD V1.2 work (review B1).

### A-02 — [Common Contracts §3 / V3.3.1 §5.18] Pattern C envelope FIELD-location bug (ACCEPT)

- **Raised by:** ChatGPT CG-S1; Claude **§6.1** ("highest-value catch"). Distinct from chain-ID lifecycle (A-03).
- **Problem:** In Pattern C the Judge reads `evaluated_target` and `evaluation_basis`, but those fields are **not** on `EvaluationResultEnvelope` — they live on `EvaluationFeedbackBundle`. The wiring the review markets as a headline feature breaks at read time.
- **Disposition:** **ACCEPT.** Add the two fields to `EvaluationResultEnvelope` (Common Contracts §3); the bundle copy becomes a projection or is dropped.
- **Fix (Common Contracts §3, on `EvaluationResultEnvelope`):**

```ts
// add to EvaluationResultEnvelope:
evaluated_target: EvaluatedTargetRef;   // what was evaluated (artifact/version/scope)
evaluation_basis: EvaluationBasisRef;   // criteria/rubric/snapshot the verdict rests on
```

- **⚠verify:** confirm the current home of `evaluated_target`/`evaluation_basis` (review says bundle; one-line grep of Common Contracts §3 vs the bundle schema).

### A-03 — [Common Contracts §3.7] Pattern C chain-ID lifecycle + chain registry (ACCEPT)

- **Raised by:** Claude **B3**; ChatGPT CG-S2; Grok GK-3.6 / GK-4.8.
- **Problem:** §5.18.4 / §3.7 say the upstream Evaluator populates `target_evaluation_chain_id` and the Judge copies it, but nothing specifies: (1) who generates the UUID and whether it's minted even when no Pattern C Judge attaches; (2) what happens when it doesn't resolve at the consumer; (3) retention/GC; (4) whether re-activations share an ID. The field has no lifecycle, so audit reconstruction only works when Pattern C wiring happens to be present.
- **Disposition:** **ACCEPT.** Evaluator MUST mint a fresh **ULID** at activation and emit it; the Pattern C Judge MUST read it from `evaluator_output_in.target_evaluation_chain_id` and set its own envelope to the same value; orphan envelopes keep the field unused; retention = envelope retention; add `validation.pattern_c_chain_id_mismatch`. Back it with a chain registry so chain status is first-class.
- **Fix (Common Contracts §3.7) — `[Appx E]`:**

```ts
interface EvaluationChainRegistryRecord {
  chain_id: string;
  chain_kind: EvaluationChainKind;
  task_id: string;
  run_id: string;
  target_artifact_ref: StorageRef | null;
  target_artifact_version_ref: StorageRef | null;
  target_scope_ref: ArtifactScopeRef | null;
  evaluation_snapshot_ref: StorageRef;
  expected_producers: ProducerKind[];
  received_result_ids: string[];
  status: EvaluationChainStatus;
  created_at: ISO8601;
  completed_at?: ISO8601;
  superseded_by_chain_id?: string;
  validation_failures: Array<
    | "chain_id_missing"
    | "chain_target_mismatch"
    | "chain_stale_snapshot"
    | "chain_ambiguous"
    | "chain_consumer_timeout"
  >;
  schema_version: "1.0";
}
```

### A-04 — [Common Contracts §3.7] Pattern C route-resolution policy (ACCEPT)

- **Raised by:** Claude **C1**; ChatGPT CG-S3.
- **Problem:** §3.7 says "Judge's quantitative recommendation governs when Pattern C is wired" but resolution is "by consumer policy" with **no consumer-policy schema**. A Switch wired to both envelopes has no rule for which `route_recommendation` to obey — every consumer invents its own.
- **Disposition:** **ACCEPT.** Define a resolution policy with a shipped default: qualitative blockers survive a numeric pass; the Judge cannot override the Evaluator except for `contested`/`dismissed`/`rejected_by_user` findings; disagreement routes to human review.
- **Fix (Common Contracts §3.7) — `[Appx E]`:**

```ts
interface EvaluationChainResolutionPolicy {
  policy_id: string;
  chain_kind: EvaluationChainKind;
  qualitative_blockers_survive_numeric_pass: boolean;
  judge_can_override_evaluator: boolean;
  override_allowed_only_for_finding_states: FindingState[];
  route_precedence:
    | "blocking_qualitative_first"
    | "judge_quantitative_first"
    | "human_if_disagreement";
  disagreement_route:
    | "human_review"
    | "task_agent_assessment"
    | "block_until_resolved"
    | "prefer_blocking";
  schema_version: "1.0";
}

const DEFAULT_PATTERN_C_RESOLUTION_POLICY: EvaluationChainResolutionPolicy = {
  policy_id: "default.pattern_c.v1",
  chain_kind: "pattern_c_evaluator_then_judge",
  qualitative_blockers_survive_numeric_pass: true,
  judge_can_override_evaluator: false,
  override_allowed_only_for_finding_states: ["contested", "dismissed", "rejected_by_user"],
  route_precedence: "blocking_qualitative_first",
  disagreement_route: "human_review",
  schema_version: "1.0"
};
```

### A-05 — [FD §3.4 / Common Contracts §4.2A] `EvaluationAuthorityBasis` vs `AssuranceBasis` reconciliation (ACCEPT WITH MODIFICATIONS)

- **Raised by:** ChatGPT CG-S5; Claude B1 (sub-issue).
- **Problem:** FD's `authority_basis: EvaluationAuthorityBasis[]` (9 values — what makes a finding a hard blocker, per FD §3.4 rule 2) is an unrelated model to V3.3.1's `basis: AssuranceBasis`. Both are currently live; naively merging them loses the blocker semantics.
- **Disposition:** **ACCEPT WITH MODIFICATIONS.** Do **not** keep two parallel basis arrays. The canonical finding (A-01) carries `assurance_basis: AssuranceBasis[]`.
  The *blocking authority* role moves to a computed boolean on the delivery view:
- **Fix:**

```ts
// FeedbackFindingView already carries:
blocking_authority_satisfied: boolean; // computed per FD §3.4 rule 2 from the finding's assurance_basis + severity
```

- **⚠verify:** read FD §3.4 rule 2 so the boolean's computation is faithful (this is the one place the two models genuinely differ — don't collapse blindly).

### A-06 — [V3.3.1 §0.4.1 / Common Contracts §3.1–3.2] `OutcomeEvaluationState` count & mapping skew (ACCEPT)

- **Raised by:** Claude **B2**; ChatGPT CG-S10/S11/S12.
- **Problem:** V3.3.1 enumerates **15** `OutcomeEvaluationState` values; Common Contracts §3.1 calls it a "14-value enum"; §3.2's verdict mapping covers 14 — `evaluating` is unmapped — and `max_iterations_reached` is referenced elsewhere but absent from the enum. The document miscounts its own enum.
- **Disposition:** **ACCEPT.** Split **runtime** states (4 — never emitted to an envelope) from **disposition** states (12 — emitted, each with a verdict mapping); add `max_iterations_reached`; make a single matrix the source of truth for terminal/verdict/feedback-branch/UI-label/learning-eligibility per state.
- **Fix (V3.3.1 §0.4.1 + Common Contracts §3.2) — `[Appx B]`:**

```ts
type OutcomeEvaluationRuntimeState =
  | "pending"
  | "pending_dependency"
  | "evaluating"
  | "dirty";

type OutcomeEvaluationDisposition =
  | "satisfied"
  | "needs_revision"
  | "needs_information"
  | "needs_verification"
  | "needs_human_judgment"
  | "unable_to_evaluate"
  | "blocked_by_policy"
  | "regressed"
  | "upstream_failure"
  | "unrecoverable"
  | "superseded"
  | "max_iterations_reached";

type OutcomeEvaluationState = OutcomeEvaluationRuntimeState | OutcomeEvaluationDisposition;

// + OUTCOME_STATE_MATRIX (Appx B): per-state {state_class, emitted_to_envelope, terminal,
//   verdict_mapping, feedback_branch, blocks_downstream_default, revisor_default_action,
//   ui_label, learning_eligibility} — this matrix is normative; §3.2 references it rather than re-listing.
```

- **Rationale:** runtime states (`pending`/`evaluating`/`dirty`/`pending_dependency`) explicitly carry `emitted_to_evaluation_envelope:false`, which is exactly why `evaluating` was never mapped — it should never reach the verdict mapping at all.

### A-07 — [FD §2.1 / V3.3.1 §5.18] Indeterminate not representable; `IndeterminateCause` undefined (ACCEPT)

- **Raised by:** ChatGPT CG-S12/S13/S14.
- **Problem:** `EvaluationDecision.pass: boolean` cannot carry `indeterminate`/`not_applicable`; `IndeterminateCause` is referenced but never defined; there is no limitation taxonomy, so five distinct disposition states all collapse to "indeterminate" with no cause.
- **Disposition:** **ACCEPT.** Replace the boolean with the verdict enum (A-06); add `IndeterminateCause`, `EvaluationLimitationKind`, the limitation→state mapping, and `SubstantiveVerdictStatus`.
- **Fix (Common Contracts, new taxonomy section) — `[Appx C]`:**

```ts
type EvaluationLimitationKind =
  | "insufficient_evidence"
  | "human_judgment_needed"
  | "missing_capability"
  | "source_unavailable"
  | "policy_blocked"
  | "stale_evidence"
  | "unable_to_ground_claim";

type IndeterminateCause =
  | "missing_information"
  | "missing_source"
  | "stale_source"
  | "missing_capability"
  | "policy_block"
  | "human_judgment_required"
  | "conflicting_evidence"
  | "tool_failure"
  | "timeout"
  | "unsupported_scope"
  | "blocked_before_substantive_verdict";

type SubstantiveVerdictStatus =
  | "substantive_verdict_reached"
  | "blocked_before_substantive_verdict"
  | "partial_substantive_verdict";

// + LIMITATION_STATE_MAPPING (Appx C): limitation → {default_state, default_indeterminate_cause, default_recovery_route}
```

### A-08 — [FD §6.2] Routing completeness: missing branches + full state→routing matrix (ACCEPT)

- **Raised by:** Claude **B4**; ChatGPT CG-S15/S16.
- **Problem:** `FeedbackRoutingPolicy` has `on_satisfied`/`on_needs_revision`/`on_needs_more_sources`/`on_needs_source_verification`/`on_needs_format_repair`/`on_repeated_failure` but **no** `on_indeterminate`, `on_not_applicable`, `on_unrecoverable`, `on_blocked_by_policy`, `on_upstream_failure` — and indeterminate is not rare (5 states map to it). It also can't represent multiple simultaneous actions.
- **Disposition:** **ACCEPT.** Adopt the closed `FeedbackBranch` set; the state→branch routing is the matrix's `feedback_branch` column (A-06), so routing and state stay in lockstep. Delivery/consumption receipts come from Appendix F.
- **Fix (FD §6.2) — `[Appx B]` branch set + `[Appx F]` receipts:**

```ts
type FeedbackBranch =
  | "on_satisfied"
  | "on_needs_revision"
  | "on_needs_more_sources"
  | "on_needs_source_verification"
  | "on_needs_human_judgment"
  | "on_blocked_by_policy"
  | "on_upstream_failure"
  | "on_unrecoverable"
  | "on_repeated_failure"
  | "none";

// FeedbackRoutingPolicy keys MUST cover every FeedbackBranch; the active branch is taken
// from OUTCOME_STATE_MATRIX[state].feedback_branch. Delivery + consumption receipts: Appx F.
```

### A-09 — [Common Contracts §4.2 / V3.3.1 §5.17.7] Phantom & misplaced schema refs + TypeOwnerRegistry (ACCEPT)

- **Raised by:** Claude **B5 + B6**; ChatGPT (ref findings).
- **Problem:** Four ownership errors. (1) V3.3.1 §5.17.7 says `ClaimSetBundle`/`ExtractedEvaluationUnit` live in Common Contracts, but §1.2 explicitly puts them **out of scope** (owner = Addenda A). (2) Common Contracts §4.2 says `ResearchNeed` lives in Core — it's in **Source Workspace §6.2**. (3) Same §4.2 says `OutcomeRepairInstruction` lives in Core — it's in **FD §5.2**. (4) `EvaluationAffirmation` is referenced but **defined nowhere** (phantom). A coding agent following these refs looks in the wrong document four times.
- **Disposition:** **ACCEPT.** Install a `TypeOwnerRegistry` as the single ownership source and fix the four pointers. **Decide `EvaluationAffirmation`: DELETE** it from the qualitative slice — its real need (the positive/"what the artifact got right" counterpart) is met by the verdict-aware `OutcomeEvaluationSignal` denominator, which is **Phase-B-gated** (see E-05), not by a phantom finding type.
- **Fix (Common Contracts, new governance section) — `[Appx A]`:**

```ts
// TypeOwnerRegistry: one entry per shared type; validation.type_owner_drift.* on mismatch.
// Canonical homes set by this registry:
//   EvaluationResultEnvelope    → Common Contracts §3 (canonical)
//   EvaluationFinding           → Common Contracts §4.2A v2.0 (canonical; projection: FeedbackFindingView)
//   OutcomeRepairInstruction    → FD §5 (canonical)               ← fixes ref (3)
//   ResearchNeed                → Source Workspace §6 (canonical) ← fixes ref (2)
//   ClaimSetBundle / ExtractedEvaluationUnit → Addenda A (imported) ← fixes ref (1)
//   CostEstimate                → Common Contracts §9A (canonical)
// V3.3.1 §5.17.7: cross-ref ExtractedEvaluationUnit/ClaimSetBundle → Addenda A; keep only ArtifactScopeRef → Common Contracts §7.
// Common Contracts §4.2: EvaluationAffirmation → REMOVED from QualitativeSlice (see E-05).
```

- **⚠verify:** confirm `EvaluationAffirmation` is truly absent before deleting (review grep reports zero hits — low risk).

### A-10 — [Common Contracts §3 / §11] `EvaluationArtifactEnvelope` wrapper out-of-document; `overall_state` too B-specific (ACCEPT WITH MODIFICATIONS)

- **Raised by:** ChatGPT CG-S22, CG-AA4.
- **Problem:** the wrapper that governs envelopes is referenced from outside the set, and `overall_state` is shaped for Addenda B, so other producers (e.g. the Addenda A Experiment) can't reuse it.
- **Disposition:** **ACCEPT WITH MODIFICATIONS.** Bring `EvaluationArtifactEnvelope` into Common Contracts as the universal governance wrapper (the Addenda A coordination rows R199/R213 already adopt it on that side); generalize `overall_state` to the canonical disposition set (A-06) so every producer maps onto one vocabulary.
- **Fix:** TypeOwnerRegistry entry (`EvaluationResultEnvelope`/`EvaluationArtifactEnvelope` canonical in Common Contracts) + `overall_state: OutcomeEvaluationDisposition` (from A-06) replacing the B-specific union.

### A-11 — [V3.3.1 §15 / Common Contracts] `LearningMode` enum referenced, never enumerated (ACCEPT — value set; *use* DEFER-PhaseB)

- **Raised by:** Claude **C5**; ChatGPT (learning/math).
- **Problem:** `LearningMode` is referenced across §15 but never enumerated, so its value set is unknown to an implementer.
- **Disposition:** **ACCEPT** the enumeration now (it's a contract surface). The *behavior* it gates — how the learning engine consumes each mode — is **DEFER-PhaseB** (E-07), because that behavior is part of the phantom learning spec the corpus audit must surface.
- **Fix:**

```ts
type LearningMode = "production" | "signal_generation" | "cross_calibration"; // value set only
```

- **⚠verify:** confirm the intended members against §15 (names inferred from §15 usage, not from a definition).

### A-12 — [FD §5 / Common Contracts] `HumanOutcomeFeedbackEvent` referenced, never defined (ACCEPT)

- **Raised by:** Claude **D16**.
- **Problem:** the human-override path references `HumanOutcomeFeedbackEvent` (and a `HumanGateSummary`-class type) but no schema exists, so a human override has no recordable shape.
- **Disposition:** **ACCEPT.** Define the schema. *(Not in the appendix package — authored here as a proposed definition for your review.)*
- **Fix (Common Contracts, human-feedback section) — PROPOSED, review wording:**

```ts
interface HumanOutcomeFeedbackEvent {
  event_id: string;
  result_id: string;
  outcome_id: string;
  run_id: string;
  human_actor_ref: string;
  occurred_at: ISO8601;
  decision: "override_to_pass" | "override_to_fail" | "confirm" | "request_changes" | "defer";
  overridden_verdict?: EvaluationVerdict;
  resulting_verdict: EvaluationVerdict;
  rationale_text: string;
  target_scope_ref?: ArtifactScopeRef;
  taint_class: TaintClass;
  data_class: "public" | "internal" | "privileged" | "local_only";
  privileged: boolean;
  policy_decision_refs: PolicyEvaluationRef[];
  schema_version: "1.0";
}
```

- **Rationale:** mirrors the canonical finding's governance fields (taint/data_class/policy refs) so human overrides flow through the same audit/learning seams. Confirm the `decision` enum matches the intended human-gate UX.
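
The proposed `HumanOutcomeFeedbackEvent` leaves a few cross-field relationships implicit. The sketch below shows how a validator might check them. It rests on three assumptions the card does not state: an override decision must name the verdict it replaces, a non-override decision carries no `overridden_verdict`, and `privileged` mirrors `data_class`. The `HumanFeedbackEventSubset` type, the `EvaluationVerdict` alias, the function name, and the violation codes are all hypothetical placeholders, not spec names.

```typescript
// Placeholder alias: the real EvaluationVerdict lives in the operative specs.
type EvaluationVerdict = string;
type Decision = "override_to_pass" | "override_to_fail" | "confirm" | "request_changes" | "defer";
type DataClass = "public" | "internal" | "privileged" | "local_only";

// Only the fields the checks below read; the full proposed schema is in the A-12 fix.
interface HumanFeedbackEventSubset {
  decision: Decision;
  overridden_verdict?: EvaluationVerdict;
  resulting_verdict: EvaluationVerdict;
  rationale_text: string;
  data_class: DataClass;
  privileged: boolean;
}

// Returns hypothetical violation codes; an empty array means no assumed rule was broken.
function checkHumanFeedbackEvent(e: HumanFeedbackEventSubset): string[] {
  const violations: string[] = [];
  const isOverride = e.decision === "override_to_pass" || e.decision === "override_to_fail";

  // Assumption: an override must record which verdict it replaced.
  if (isOverride && e.overridden_verdict === undefined) {
    violations.push("override_missing_overridden_verdict");
  }
  // Assumption: confirm / request_changes / defer do not replace a verdict.
  if (!isOverride && e.overridden_verdict !== undefined) {
    violations.push("non_override_carries_overridden_verdict");
  }
  // Assumption: an override needs a non-blank rationale for the audit trail.
  if (isOverride && e.rationale_text.trim() === "") {
    violations.push("override_missing_rationale");
  }
  // Assumption: the boolean flag and the data_class value must agree.
  if ((e.data_class === "privileged") !== e.privileged) {
    violations.push("privileged_flag_disagrees_with_data_class");
  }
  return violations;
}
```

If the intended human-gate UX allows an override without a recorded prior verdict, the first check would be dropped rather than relaxed; that is a decision for the row's reviewer, not something this sketch settles.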
### A-13 — [Core §9 vs Common §5] Core duplicates the learning-signal envelope with malformed TypeScript (ACCEPT)

- **Raised by:** ChatGPT CG-S21.
- **Problem:** Core re-declares the common `EvaluationLearningSignalEnvelope`, and the duplicated TypeScript is malformed — two sources of truth, one of them broken.
- **Disposition:** **ACCEPT.** Single canonical envelope in Common Contracts §5; Core §9 **imports** it and deletes its copy.
- **Fix:** TypeOwnerRegistry: `EvaluationLearningSignalEnvelope` canonical = Common Contracts §5; add normative line to Core §9 — "imports `EvaluationLearningSignalEnvelope` from Common Contracts §5; MUST NOT redeclare." Remove the malformed Core block.

### A-14 — [Common Contracts §4.2 / validation] Qualitative-slice owner map wrong; slice-required check only a warning (ACCEPT)

- **Raised by:** ChatGPT CG-S20; Grok GK-4.8.
- **Problem:** two issues — the qualitative-slice owner map is wrong (same root as A-09), and `validation.envelope_judge_emitted_qualitative_slice` is only a **warning** even though Pattern C **requires** the slice, so a non-conforming Pattern C envelope passes validation.
- **Disposition:** **ACCEPT.** Owner map fixed by the TypeOwnerRegistry (A-09); promote the validation to an **error** when `chain_kind = pattern_c_evaluator_then_judge`.
- **Fix:**

```ts
// validation.envelope_judge_emitted_qualitative_slice:
//   severity = "error" WHEN chain_kind === "pattern_c_evaluator_then_judge"
//   severity = "warning" otherwise
```

### A-15 — [Common Contracts §11.5 / cross-doc obligations] "Build-ready" overstated; command registry not mandatory; compat claim too strong (ACCEPT WITH MODIFICATIONS)

- **Raised by:** ChatGPT CG-C4/C6/C7; Grok GK-4.4; Claude **D5/D22**.
- **Problem:** the set is labeled build-ready while undefined imported types and pending cross-doc obligations remain; the command registry (Core routes/commands) is described but **not mandatory**; the §11.5 backward-compat claim overstates stability; section-number cross-refs are used where stable anchors are needed before the DOC23 R3.2 absorption.
- **Disposition:** **ACCEPT WITH MODIFICATIONS.** Gate the "build-ready" status on (a) TypeOwnerRegistry populated with no `pending_absorption` blockers, and (b) OP-A obligations closed; make the command registry **mandatory** (Appendix I); soften §11.5 to "compatible within the locked schema-of-record set, no cross-version guarantee yet"; convert section-number cross-refs to stable anchors as pre-absorption hygiene.
- **Fix:** prose obligations above + Appendix I (command registry, see §D-14 next pass) + anchor-stabilization rule applied across the set before R3.2 absorption.

---

### A-16 — [FD §7.2 / V3.3.1 §3.1.3] Direct repair wiring can bypass the central `revision_in` safety contract (ACCEPT — **CRITICAL**)

- **Raised by:** ChatGPT (BUG/CRITICAL).
- **Problem:** V3.3.1 requires meaning-bearing repair to go through a declared `revision_in` capability. But Feedback Delivery's examples route repair directly to `DraftRevision.instruction_in`, `FormatChecker.data_in`, and `RevisionModule.context_in`. Those direct paths **bypass capability validation, preconditions, candidate versions, policy gates, preservation checks, and receipts** — i.e., every safety mechanism the review (§6.1) calls the highest-leverage design idea in the set. This is the most severe omission the audit found.
- **Disposition:** **ACCEPT.** Amend FD so repair can execute only through a revision-eligible port; all other wiring is advisory context that must not mutate the artifact.
- **Fix (FD §7.2 amendment):**

```ts
// Repair instructions may EXECUTE (mutate an artifact) only through:
//   (a) a port named `revision_in`, OR
//   (b) a port explicitly declared revision_compatible = true WITH ModuleRevisionCapability coverage.
interface PortRevisionEligibility {
  port_id: string;
  module_id: string;
  revision_compatible: boolean;                // default false
  module_revision_capability_ref?: StorageRef; // REQUIRED when revision_compatible = true
}
// All other direct wiring (instruction_in / data_in / context_in) carries repair as ADVISORY CONTEXT ONLY.
// validation.repair_routed_to_non_revision_port (error): a meaning-bearing OutcomeRepairInstruction wired to a
//   port that is neither revision_in nor revision_compatible=true with capability coverage.
```

- **Rationale:** restores the single safety chokepoint. Without it, a coding agent can wire a "shortcut" repair path that silently skips policy/preservation/receipts — exactly the phantom-control class Will guards against. Pairs with A-17 (Revisor is the planner, not a revision target).

### A-17 — [FD §5.3 / V3.3.1 §9] Revisor input confused with `revision_in` (ACCEPT)

- **Raised by:** ChatGPT (BUG/HIGH).
- **Problem:** FD says the Revisor consumes `OutcomeRepairInstruction` via `revision_in`. But V3.3.1 treats `revision_in` as the port on **revision-capable modules that perform repairs**; the Revisor is the **planner** that compiles repair instructions into a plan. Wiring the planner's input as `revision_in` reverses the architecture and risks an implementer building the Revisor as a revision *target*.
- **Disposition:** **ACCEPT.** Give the Revisor planner-appropriate input ports; reserve `revision_in` for artifact-mutating modules.
- **Fix (FD §5.3 + V3.3.1 §9):**

```ts
// Revisor (planner) input ports — NOT revision_in:
//   feedback_bundle_in, repair_instruction_in, evaluation_result_in
// revision_in is reserved for revision-capable (artifact-mutating) modules that EXECUTE the compiled plan.
// validation.revisor_declares_revision_in (error).
```

### A-18 — [FD §1.2 / §2.3] Feedback-bundle emission discipline contradicts itself (ACCEPT)

- **Raised by:** ChatGPT (BUG/HIGH).
- **Problem:** FD says both the envelope and the feedback bundle are emitted by every evaluator producer, then says deterministic scorers emit a bundle only on failure and pass cases may emit envelope-only. So a missing bundle is ambiguous: not emitted by design, lost, not applicable, or policy-filtered — a consumer can't tell which.
- **Disposition:** **ACCEPT.** Add an emission matrix by producer-kind × verdict; require either a bundle or an explicit absence reason.
- **Fix (FD §1.2 / §2.3):**

```ts
interface FeedbackEmissionMatrixEntry {
  producer_kind: "outcome_evaluator" | "judge" | "deterministic_scorer";
  verdict: EvaluationVerdict;
  emits_envelope: boolean;
  emits_feedback_bundle: "always" | "on_failure_only" | "never";
}
type FeedbackBundleAbsentReason = "not_emitted_by_design" | "not_applicable" | "policy_filtered";
// Every result MUST carry a feedback bundle OR a feedback_bundle_absent_reason.
// validation.feedback_bundle_absence_unexplained (error).
```

### A-19 — [V3.3.1 §0.4.4 / §5.7.1] `FindingState` pass-through `proposed` has no negative exit (ACCEPT) — Claude C2

- **Raised by:** Claude C2 (BUG/HIGH).
- **Problem:** the §5.7.1 transition table gives `proposed → active` as the only outbound transition from `proposed`. There is no `proposed → dismissed` for a finding the Evaluator decides **not** to confirm, so unconfirmed findings sit in `proposed` forever or implementations invent an unlisted transition.
- **Disposition:** **ACCEPT.** Add the negative exit + an auto-dismiss at activation termination. (The canonical `FindingState` from A-01 already includes `dismissed`; this supplies the missing *transition*.)
- **Fix (V3.3.1 §5.7.1):**

```
add transition: proposed → dismissed
  predicate: "Evaluator did not confirm the finding before activation completion"
A finding still in `proposed` at activation termination auto-transitions to:
  state = dismissed, dismissal_reason = "not_confirmed_at_termination".
```

### A-20 — [Common Contracts §6.4 / §9.4] `unanchored_llm_judgment` acknowledgment has no field to write into (ACCEPT) — Claude C6

- **Raised by:** Claude C6 (BUG/MEDIUM).
- **Problem:** §9.4 requires explicit user acknowledgment when a required criterion is scored by `unanchored_llm_judgment`, but no schema has a field to store that acknowledgment — so the warning fires every time (loud) or the ack lands in an undocumented field. (Partially related to C-01's `unanchored_llm_judgment_policy`, which governs *aggregation eligibility*, not the *acknowledgment record*.)
- **Disposition:** **ACCEPT.** Add the acknowledgment fields to `Criterion`.
- **Fix (Common Contracts, `Criterion`):**

```ts
// add to Criterion:
unanchored_aggregation_acknowledged_by_user: boolean; // default false
unanchored_ack_user_ref?: string;
unanchored_ack_at?: ISO8601;
// Warning fires WHEN scoring_basis == "unanchored_llm_judgment" && required == true
//   && unanchored_aggregation_acknowledged_by_user == false; silences once acknowledged.
```

### A-21 — [FD §2.2 / §3.3] Optional fields are required by invariants (ACCEPT — largely satisfied by A-01)

- **Raised by:** ChatGPT (BUG/MEDIUM).
- **Problem:** FD says `findings[i].based_on_artifact_version_ref` must resolve to a version in the snapshot, but the field is optional — so validators can't enforce the invariant consistently.
- **Disposition:** **ACCEPT.** Already addressed structurally by A-01's canonical `EvaluationFinding`, which carries both `based_on_artifact_version_ref` and `based_on_artifact_version_absent_reason`. Add the enforcing validation.
- **Fix:**

```
// Satisfied by A-01 canonical EvaluationFinding. Add:
// validation.artifact_targeted_finding_missing_version_ref (error): finding_kind targets an artifact AND
//   both based_on_artifact_version_ref and based_on_artifact_version_absent_reason are null.
```

### A-22 — [Common Contracts §12] Pending consumers lack degraded-mode behavior (ACCEPT)

- **Raised by:** ChatGPT (GAP/HIGH).
- **Problem:** Common Contracts lists pending target updates to DOC8/BDSM, EC Core, DOC20, DOC72, PropA, etc. Until those land, producers can emit envelopes/signals **no consumer can interpret**, with no defined behavior.
- **Disposition:** **ACCEPT.** Each pending cross-doc obligation declares an "until-target-lands" behavior, recorded on its TypeOwnerRegistry entry (`status: pending_absorption`, A-09).
- **Fix:**

```ts
type PendingConsumerDegradedBehavior =
  | "persist_only" | "suppress_promotion" | "disable_ui_affordance"
  | "emit_validation_warning" | "block_route";
// Every TypeOwnerRegistryEntry with status "pending_absorption" carries pending_consumer_behavior: PendingConsumerDegradedBehavior.
```

- **Rationale:** also feeds A-15 — "build-ready" can't be claimed while any pending obligation lacks a declared degraded behavior.

### A-23 — [FD §3 / §5] `ApplicabilityScope.authority_level` vs `domain_payload.authority_level` conflict (ACCEPT WITH MODIFICATIONS) — Claude D23

- **Raised by:** Claude D23 (BUG/MEDIUM).
- **Problem:** two `authority_level` fields at different nesting levels with no rule for which governs on disagreement — a coding agent can't determine the effective authority level.
- **Disposition:** **ACCEPT WITH MODIFICATIONS.** Define precedence (proposed below — confirm direction) or merge to one field.
- **Fix (PROPOSED — confirm precedence):**

```
// ApplicabilityScope.authority_level GOVERNS; domain_payload.authority_level is advisory metadata.
// (Alternative: merge to a single authority_level on ApplicabilityScope and drop the payload copy.)
// validation.authority_level_conflict (warning) when both present and differing.
```

### A-24 — [Common Contracts §3.4 vs V3.3.1 §5.18.8] Parallel Judge+Evaluator example contradicts topology (ACCEPT) — Claude C14

- **Raised by:** Claude C14 (BUG/LOW).
- **Problem:** §3.4's example ("Judge and Evaluator running in parallel on the same artifact version") fits no specified topology — Pattern C has the Judge *consume* the Evaluator's output, so they can't run in parallel.
- **Disposition:** **ACCEPT.** Fix the example.
- **Fix:** replace the §3.4 example with "two Evaluator activations on the same snapshot during an Experiment, or an Evaluator and a deterministic scorer in parallel" — or specify the genuine parallel topology. Removes a wiring trap.

### A-25 — [Common Contracts §3.1 vs V3.3.1 §5.1] `evaluation_chain_id` naming asymmetry (ACCEPT — fold into A-03) — Claude D8

- **Raised by:** Claude D8 (BUG/LOW).
- **Problem:** the field is `target_evaluation_chain_id` on the envelope but referred to as `evaluation_chain_id` in V3.3.1 §5.1 prose — same concept, two names.
- **Disposition:** **ACCEPT.** Normalize to `target_evaluation_chain_id` everywhere (fold into the A-03 chain-lifecycle work). `validation.chain_id_name_drift` (lint).

---

### A-26 — [V3.3.1 §5.5.4 / §5.7] `ProgressSignal` matching references fields missing from `EvaluationFinding` (ACCEPT)

- **Raised by:** ChatGPT Audit Addendum (BUG/HIGH).
- **Problem:** `ProgressSignalRecord` compares findings by `(failure_kind, target_artifact_section_ref, finding_summary_hash)`, but `EvaluationFinding` doesn't carry those fields — so repeated-failure detection, same-reason classification, and strategy-switching have no defined match keys, forcing implementers to invent hashing/scope rules.
- **Disposition:** **ACCEPT.** Define the `FindingMatchKey` shape and mandate `ProgressSignal` consume it.
  *(The canonical `EvaluationFinding` from A-01 already carries a `match_key` field — this completes A-01 by defining that field's structure.)*
- **Fix (Common Contracts §4.2A — `[Appx, addendum]`):**

```ts
interface FindingMatchKey {
  failure_kind: FailureKind;
  target_scope_ref: ArtifactScopeRef | null;
  finding_summary_hash: string;
  normalized_finding_text_hash: string;
  criterion_id?: string;
  evidence_signature_hash?: string;
  schema_version: "1.0";
}
// EvaluationFinding.match_key: FindingMatchKey (A-01 field, now typed)
// ProgressSignalRecord MUST match on EvaluationFinding.match_key (not ad-hoc fields).
// validation.progress_signal_match_key_drift (error) if ProgressSignal references match fields absent from match_key.
```

### A-27 — [V3.3.1 §5.16 / Source Workspace] `EvaluationSnapshot` source-workspace hashes typed as artifact hashes (ACCEPT)

- **Raised by:** ChatGPT Audit Addendum (BUG/HIGH).
- **Problem:** `EvaluationSnapshot.source_workspace_head_hashes` is typed `Record`, but the snapshotted values are Source Workspace heads, source records, source sets, research-need queues, and verification records — **not** artifact refs. The invalid type model breaks audit, revalidation, live-edit checks (B-03), and source-freshness detection. (Distinct from B-03, which is the *precondition*; this is the *type* that precondition compares.)
- **Disposition:** **ACCEPT.** Replace with a workspace-native snapshot-hash structure.
- **Fix (`[Appx, addendum]`):**

```ts
interface SourceWorkspaceSnapshotHashSet {
  source_workspace_ref: string;
  source_workspace_head_hash: string;
  source_record_hashes: Record;
  source_set_hashes?: Record;
  research_need_queue_hash?: string;
  verification_record_hashes?: Record;
  freshness_record_hashes?: Record;
  run_guidance_hashes?: Record;
  schema_version: "1.0";
}
// EvaluationSnapshot: REMOVE source_workspace_head_hashes: Record;
//                     ADD source_workspace_state_ref: StorageRef + source_workspace_snapshot_hashes: SourceWorkspaceSnapshotHashSet[].
```

- **Rationale:** this is the concrete `SnapshotHash` type that B-03's `WorkspaceWritePrecondition.expected_snapshot_hash` and D-12 retention compare against — without it, "compare the snapshot hash" has no well-typed object.

---

# §B. Runtime / concurrency / distributed-systems

### B-01 — [V3.3.1 §11 / Core §runtime] No unified RevisionExecutionLifecycle / dispatcher↔receipts state machine (ACCEPT)

- **Raised by:** Grok GK-4.1 / GK-5.6; ChatGPT (runtime).
- **Problem:** the execution lifecycle (plan compiled → dispatched → step receipts → revalidation → terminal) is specified in pieces across §11.x with no single owning state machine. A coding agent assembles the order from prose and the dispatcher↔receipt handshake is implicit.
- **Disposition:** **ACCEPT.** Define one normative `RevisionExecutionLifecycle` state machine in V3.3.1 §11 (states + legal transitions + which receipt drives each transition), anchored by the Appendix M primitives (`PendingDependencyInfo`, `HardCallPendingPolicy`) and the cancel/known-good types below.
- **Fix:** add a §11.0 lifecycle FSM listing states `{plan_compiled, dispatched, step_running, step_receipt_received, revalidating, regression_detected, terminal_satisfied, terminal_failed, cancelled, hard_call_pending}` and the receipt that triggers each edge; every other §11.x rule references a state in this FSM rather than re-describing flow.

### B-02 — [V3.3.1 §11.x] Idempotency under-specified and not canonical (ACCEPT)

- **Raised by:** ChatGPT CG-R1 / CG-AA9; Grok GK-4.5 / GK-5.13.
- **Problem:** the idempotency formula isn't total over plan-step kinds (some step kinds have no defined key), and the key format is specified in two places without a canonical form, so two dispatchers compute different keys for the same logical step.
- **Disposition:** **ACCEPT.** One canonical idempotency key, total over a discriminated step-kind union, defined once.
- **Fix (V3.3.1 §11, single normative block — PROPOSED form, confirm field names):**

```ts
type StepKind = "artifact_mutation" | "source_research" | "tool_call" | "sub_agent_invocation"
  | "human_request" | "forum_post" | "side_effect_intent" | "no_op_marker";
interface StepIdempotencyKey {
  run_id: string;
  plan_id: string;
  step_id: string;
  step_kind: StepKind;        // total: every StepKind maps to a key recipe
  target_ref_hash: string;    // canonical hash of the step's target (artifact/version/tool/etc.)
  input_payload_hash: string; // RFC 8785 canonical-JSON hash of inputs
  attempt_class: "first" | "retry" | "regenerate";
}
// canonical string = sha256(canonicalJson(key))
```

- **Rationale:** discriminated `step_kind` makes the formula total; `target_ref_hash`+`input_payload_hash` make retries of the same logical step collide deterministically while a `regenerate` (B-04 / `previous_attempt_hash`) is distinguishable.

### B-03 — [Source Workspace §x / V3.3.1 §11.20] Concurrent-write safety on the Source Workspace (ACCEPT) ⚠verify

- **Raised by:** Gemini **F-01** (#1 build blocker) + D-05; ChatGPT CG-SW9; relates to Claude B9.
- **Problem:** a `direct_fix` mutation can race a live UI edit (or a parallel module write) to the same workspace record with no precondition check, silently corrupting it; no transaction boundaries or read/write locks are specified.
- **Disposition:** **ACCEPT.** Mandate a snapshot-hash precondition on every in-place workspace write; on mismatch, abort and re-evaluate rather than overwrite. Add explicit lock semantics.
- **Fix (Source Workspace write path):**

```ts
interface WorkspaceWritePrecondition {
  target_record_ref: StorageRef;
  expected_snapshot_hash: string; // hash read at plan time
  on_mismatch: "abort_and_reevaluate" | "queue_behind_lock" | "fail_outcome"; // default abort_and_reevaluate
  lock_mode: "optimistic_snapshot_hash" | "pessimistic_write_lock"; // default optimistic
}
// direct_fix MUST carry a WorkspaceWritePrecondition; dispatcher computes the real hash at execution
// and compares to expected_snapshot_hash before applying. validation.workspace_write_without_precondition (error).
```

- **⚠verify:** confirm the current `direct_fix` write path and whether any precondition exists (Gemini asserts none; this is the single item Gemini calls catastrophic if missed — worth confirming before patch).

### B-04 — [V3.3.1 §11.20.2 / §11.22] Rolling hash unsafe under DAG/parallel execution (ACCEPT)

- **Raised by:** ChatGPT CG-R2 / CG-R3 / CG-AA10; Claude **B9**.
- **Problem:** §11.22 allows up to `max_parallel_steps_per_plan: 4`, but §11.20.2 Rolling-Hash Mode B requires "step N+1 validates against predicted hash from step N." Two parallel steps mutating the same artifact can't both validate against one base — the chain is nondeterministic. §11.20.2 forbids concurrent *plans* on the artifact but not concurrent *steps* within one plan. (Distinct from C-03, which is the LLM-predicted-hash problem.)
- **Disposition:** **ACCEPT.** Rolling-hash Mode B requires sequential execution across all steps mutating the same artifact; parallelism allowed only between steps on disjoint artifacts.
- **Fix (V3.3.1 §11.20.2 + §11.22):**

```
§11.20.2: "Rolling-hash Mode B requires sequential step execution across all steps that mutate the same artifact.
           Parallelism within a plan is permitted only between steps targeting disjoint artifacts."
§11.22:   parallel batches automatically degrade to sequential when any step is rolling-hash Mode B on a shared artifact.
validation.rolling_hash_parallel_steps_same_artifact (error)
```

### B-05 — [V3.3.1 §11.21 / §11.21.2 / §5.14.1] Revalidation cascade: no convergence bound + duplicated upstream-failure rule (ACCEPT)

- **Raised by:** Claude **B7** (no convergence) + **B8** (duplicated rule).
- **Problem (B7):** §11.21 Phase 4 re-triggers the Revisor on regression with no bound on cascade depth; two outcomes with bidirectional `OutcomeDependencySpec.invalidated_by_outcomes` can ping-pong revisions forever — per-outcome budget never trips because each cycle burns a *different* outcome's budget; `per_plan_max_replans` is plan-level, not cascade-level. **Problem (B8):** the `upstream_failure_cascade` rule is stated in both §5.14.1 and §11.21.2 (invites drift) and neither handles the race where an outcome enters `pending_dependency` *after* the cascade fired.
- **Disposition:** **ACCEPT.** Add a cascade-depth bound + cycle detection; consolidate the duplicated rule into one section with a state-entry guard for the race.
- **Fix (V3.3.1 §6.14 RevisorConfig + §11.21.2 + §22):**

```ts
// RevisorConfig:
max_revalidation_cascade_depth: number; // default 5, measured from the originating mutation receipt
// Loop Controller tracks cascade chains; a regression re-entering an outcome already in the chain → abort:
//   validation.revalidation_cascade_loop (new, §22)
//   HardRevisionCallKind += "revalidation_cycle" // surface tie-break to the user
// §11.21.2 becomes the single home of upstream_failure_cascade; §5.14.1 links to it (no restating).
// State-entry guard: an outcome transitioning to pending_dependency AFTER a cascade fired on its
//   upstream module is auto-evaluated against the upstream_failure set at state entry (not a new cascade pass).
```

### B-06 — [V3.3.1 §dependency / Common Contracts] Outcome-dependency cycles + `pending_dependency` deadlock (ACCEPT)

- **Raised by:** Claude C4; Gemini **D-03**.
- **Problem:** dependency direction is undefined for cycles, and an upstream `could_not_fix`/halt does not instantly cascade to downstream `pending_dependency` outcomes, so the graph hangs waiting on an artifact that will never arrive.
- **Disposition:** **ACCEPT.** Define cycle detection at dependency-declaration time (reject or break with a Hard Call) and an instant downstream cascade on terminal upstream failure.
- **Fix:** dependency graph rejects declared cycles (`validation.outcome_dependency_cycle`); on upstream terminal-failure, all transitively dependent `pending_dependency` outcomes immediately transition to `upstream_failure` (per the A-06 matrix), rather than waiting on `wait_timeout`.

### B-07 — [V3.3.1 §11.22] Parallel sibling outputs orphaned after batch failure; parallelism not first-class (ACCEPT)

- **Raised by:** ChatGPT CG-R6 / CG-R7 / CG-AA11.
- **Problem:** when one step in a parallel batch fails, the sibling steps' completed outputs have no defined finalization/disposition (orphaned candidates), and the parallel-batch configuration isn't a first-class, inspectable object.
- **Disposition:** **ACCEPT.** Add a `ParallelBatchFinalizationReceipt` recording per-sibling disposition on batch failure; surface the parallelism config.
- **Fix:**

```ts
interface ParallelBatchFinalizationReceipt {
  batch_id: string;
  plan_id: string;
  run_id: string;
  sibling_results: Array<{
    step_id: string;
    status: "completed" | "failed" | "cancelled" | "orphaned";
    candidate_ref?: StorageRef;
    disposition: "retained" | "discarded" | "held_for_review";
  }>;
  batch_outcome: "all_completed" | "partial_failure" | "aborted";
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### B-08 — [V3.3.1 §failure / Common Contracts] Failure-taxonomy gaps (ACCEPT)

- **Raised by:** ChatGPT CG-R9 / CG-R10 / CG-R11 / CG-R12.
- **Problem:** several failure states are emitted but not registered in any taxonomy: `receipt_persist_failed`, `preempted`, `local_resource_exhausted`, and budget-failure is split three ways inconsistently. An implementer can emit a failure value no consumer recognizes.
- **Disposition:** **ACCEPT.** One registered failure taxonomy covering all emitted values, with budget-failure unified.
- **Fix:**

```ts
type ExecutionFailureKind =
  | "could_not_fix" | "failed_runtime" | "rejected_capability"
  | "receipt_persist_failed" | "preempted" | "local_resource_exhausted"
  | "budget_exhausted" | "timeout" | "upstream_failure";
// budget_exhausted unifies the three split forms
// validation.unregistered_failure_kind (error) on any emitted value outside this union.
```

### B-09 — [V3.3.1 §candidate] Candidate artifacts can't model external side effects; lifecycle mixed with head-pointer (ACCEPT)

- **Raised by:** ChatGPT CG-R13 / CG-R14.
- **Problem:** `CandidateArtifactVersion` conflates the version's lifecycle state with the "current head" pointer, and there's no way to model a candidate whose application has an external side effect (the thing that can't be branched — see §F-01).
- **Disposition:** **ACCEPT.** Separate candidate lifecycle from the head pointer; route side-effecting candidates through `SideEffectIntentCandidate` (Appendix N).
- **Fix (V3.3.1 candidate model + Appendix N):**

```ts
// Split: CandidateArtifactVersion.lifecycle_state (candidate|accepted|rejected|superseded|reverted)
//        ArtifactHead.current_version_ref (separate projection; one head per artifact)
// Side-effecting application uses SideEffectIntentCandidate [Appx N]:
//   { side_effect_class, dry_run_payload_ref, approval_status, execution_policy_ref,
//     state: draft|approved|executed|cancelled|blocked, execution_receipt_ref }
```

### B-10 — [V3.3.1 §11 / FD] `TaskCancelProtocol` + Hard Call blocking scope undefined (ACCEPT)

- **Raised by:** ChatGPT CG-R15 / CG-R16.
- **Problem:** there is no clean mid-run cancel protocol, and a pending blocking Hard Call doesn't say what it blocks (the step, the outcome, the artifact, the whole run, or just side effects).
- **Disposition:** **ACCEPT.** Adopt Appendix M's `TaskCancelProtocol` and `HardCallPendingPolicy`/`HardCallBlockingScope` verbatim.
- **Fix — `[Appx M]`:**

```ts
type HardCallBlockingScope = "entire_run" | "segment" | "artifact" | "outcome" | "module" | "side_effect_only";

interface HardCallPendingPolicy {
  hard_call_id: string;
  blocking_scope: HardCallBlockingScope;
  blocked_refs: string[];
  allowed_to_continue_refs: string[];
  context_visible_to_continuing_modules: "none" | "hard_call_pending_summary" | "full_context_redacted";
  on_defer: "continue_with_warning" | "pause_scope" | "abort_scope";
  on_timeout: "escalate" | "abort" | "continue_with_warning";
  timeout_ms?: number;
  schema_version: "1.0";
}

interface TaskCancelProtocol {
  cancel_request_id: string;
  task_id: string;
  run_id: string;
  requested_by_ref: string;
  cancel_scope: "entire_run" | "segment" | "module_activation" | "revision_plan" | "side_effect_intent";
  target_refs: string[];
  in_flight_handling: "request_graceful_stop" | "preempt_immediately" | "finish_current_step_then_stop";
  side_effect_policy: "do_not_cancel_executed_side_effects" | "cancel_unexecuted_intents" | "create_corrective_artifact";
  candidate_disposition: "discard" | "retain_for_manual_review" | "orphan_until_reconciled";
  source_workspace_disposition: "retain_records" | "mark_records_cancelled" | "rollback_if_uncommitted";
  learning_signal_policy: "suppress_success_signals" | "emit_cancel_diagnostic_only" | "emit_full_signals_with_cancel_flag";
  user_receipt_ref?: StorageRef;
  created_at: ISO8601;
  schema_version: "1.0";
}
```

- **Rationale:** `side_effect_policy: do_not_cancel_executed_side_effects` is the cancel-side counterpart of the §F-01 "side effects can't branch" principle — cancel never pretends an executed effect is undone.
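
`HardCallPendingPolicy` names what a pending Hard Call blocks, but the card does not spell out how a dispatcher would resolve `blocking_scope`, `blocked_refs`, and `allowed_to_continue_refs` for one piece of work. The sketch below is one possible reading, not the Appendix M algorithm. It assumes an explicit entry in `blocked_refs` wins over an entry in `allowed_to_continue_refs`, that `entire_run` blocks everything not explicitly allowed, and that `side_effect_only` blocks only work flagged as side-effecting. `WorkItem`, its `has_side_effect` flag, `HardCallPolicySubset`, and `isBlockedByHardCall` are hypothetical names introduced for illustration.

```typescript
type HardCallBlockingScope =
  | "entire_run" | "segment" | "artifact" | "outcome" | "module" | "side_effect_only";

// Only the HardCallPendingPolicy fields this check reads.
interface HardCallPolicySubset {
  blocking_scope: HardCallBlockingScope;
  blocked_refs: string[];
  allowed_to_continue_refs: string[];
}

// Hypothetical view of a unit of work the dispatcher is about to run.
interface WorkItem {
  ref: string;              // the step / outcome / artifact / module ref being dispatched
  has_side_effect: boolean; // assumed flag: true when running it would execute an external effect
}

function isBlockedByHardCall(policy: HardCallPolicySubset, item: WorkItem): boolean {
  // Assumed precedence 1: an explicit block always wins (conservative reading).
  if (policy.blocked_refs.indexOf(item.ref) !== -1) return true;
  // Assumed precedence 2: an explicit allow lets the item continue.
  if (policy.allowed_to_continue_refs.indexOf(item.ref) !== -1) return false;
  // Otherwise fall back to the scope.
  switch (policy.blocking_scope) {
    case "entire_run":
      return true; // everything not explicitly allowed waits
    case "side_effect_only":
      return item.has_side_effect; // pure computation continues, external effects wait
    default:
      return false; // segment / artifact / outcome / module: only listed refs are blocked
  }
}
```

Under this reading, a policy with `blocking_scope: "artifact"` and `blocked_refs: ["artifact:a"]` pauses work on `artifact:a` and leaves unlisted refs running. If the intended semantics are that unlisted refs inside a blocked segment also pause, the `default` branch would need a containment lookup that this sketch does not model.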
### B-11 — [V3.3.1 §skip / FD] `TaskSkipProtocol` + skip receipts missing (ACCEPT)

- **Raised by:** ChatGPT CG-AA5.
- **Problem:** an outcome/step can be skipped, but there's no protocol or receipt recording who skipped it, why, and the downstream effect.
- **Disposition:** **ACCEPT.** Add a skip protocol mirroring cancel, with a receipt.
- **Fix:**

```ts
interface TaskSkipReceipt {
  skip_id: string;
  target_ref: string;
  skipped_by_ref: string;
  reason: "not_applicable" | "user_directed" | "dependency_unavailable" | "policy_blocked";
  downstream_effect: "none" | "marks_dependents_not_applicable" | "requires_human_ack";
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### B-12 — [Common Contracts §policy / V3.3.1] Policy freshness checks a field `PolicyDecisionRef` lacks (ACCEPT)

- **Raised by:** ChatGPT CG-R8.
- **Problem:** freshness logic expects a `superseded_by_decision_id` (and related staleness fields), but `PolicyDecisionRef` has no such field, so freshness can't actually be evaluated.
- **Disposition:** **ACCEPT.** Add the freshness fields, or make the EC policy record the authoritative freshness source.
- **Fix:**

```ts
// add to PolicyDecisionRef (or PolicyEvaluationRef):
issued_at: ISO8601;
subject_hash: string;
decision_scope_hash: string;
superseded_by_decision_id?: string;
policy_engine_version: string;
// alt: mark EC's policy record as the required freshness source-of-truth and have the ref point to it.
```

### B-13 — [Source Workspace §6 / Common Contracts] `ResearchNeed` concurrency — needs a lease (ACCEPT)

- **Raised by:** Claude **D15**.
- **Problem:** two modules can independently pick up and satisfy the same `ResearchNeed` (duplicate work, conflicting results), because there's no lease/claim with idempotency.
- **Disposition:** **ACCEPT.** Add a `ResearchNeedLease` so a need is claimed atomically.
- **Fix:**

```ts
interface ResearchNeedLease {
  need_id: string;
  leased_by_module_id: string;
  lease_token: string;
  acquired_at: ISO8601;
  expires_at: ISO8601;
  on_expiry: "release_for_reclaim" | "escalate" | "mark_abandoned";
  satisfied_by_result_ref?: StorageRef;
  schema_version: "1.0";
}
// acquisition is atomic compare-and-set on need status; second acquirer gets need_already_leased.
```

### B-14 — [Task Forum §x] Forum deadlock breaker missing (ACCEPT)

- **Raised by:** Gemini **F-02**; Claude D3 / D20.
- **Problem:** the Task Forum has no tie-break, timeout, or max-rounds, so deliberation can hang with no circuit breaker.
- **Disposition:** **ACCEPT.** Forum runs as an FSM with a deliberation tick cap and a consensus threshold; on expiry it escalates to a Hard Call.
- **Fix:**

```ts
interface ForumDeliberationPolicy {
  room_id: string;
  max_deliberation_ticks: number;
  consensus_threshold_pct: number;
  on_no_consensus: "forum_deadlock_hard_call" | "task_agent_decides" | "prefer_safest_proposal";
  schema_version: "1.0";
}
// reaching max_deliberation_ticks without consensus_threshold_pct → state forum_deadlock → HardCall escalation.
```

### B-15 — [Task Forum §room] `RoomKind.plan_review` referenced but never registered (ACCEPT) ⚠verify

- **Raised by:** Grok GK-5.14.
- **Problem:** `RoomKind.plan_review` is used but not registered in any room-kind registry, so a coding agent can't instantiate the plan-review forum.
- **Disposition:** **ACCEPT.** Register the room kind in the Forum room-kind registry with its participant policy.
- **Fix:** add `plan_review` to the `RoomKind` registry with allowed participants, moderator condition, and the `ForumDeliberationPolicy` default (B-14).
- **⚠verify:** confirm the registry's current member list and that `plan_review` is genuinely absent.

### B-16 — [V3.3.1 §11.9] Concurrency tie-breaker relies on wall-clock `created_at` (ACCEPT-AS-FIX)

- **Raised by:** Gemini **BUG-04**.
- **Problem:** Rule 4 of `concurrency_tie_breaker` is `RevisionPlan.created_at` ascending. In a local multi-threaded runtime, millisecond timestamps are subject to event-loop clock skew; granting a workspace write-lock on timestamp creates a race and can let a 45-second plan win the lock over a 1-second fix.
- **Disposition:** **ACCEPT-AS-FIX** (Gemini's patch verbatim) — order by risk reduction and lock-release speed, not timestamp.
- **Fix (V3.3.1 §11.9) — `[Gemini BUG-04]`:**

```
RULE concurrency_tie_breaker (UPDATED):
  1. OutcomeDependencySpec.required_for_overall_pass = true > false
  2. EvaluationOutcomeDefinition.is_high_stakes = true > false
  3. EvaluationOutcomeDefinition.priority ascending (lower value = wins)
  4. RevisionPlan.plan_risk_score descending (safer plans acquire lock first — prevent catastrophic collisions)
  5. RevisionCostEstimate.total_tokens ascending (smaller/faster plans execute and release the lock quicker)
```

### B-17 — [FD §8.4 / §10.3] `instruction_in` overload leaks the DOC23/DOC15/DOC24 boundary (ACCEPT)

- **Raised by:** Claude **B11**.
- **Problem:** FD §8.4 carries both free-form instructions and typed `OutcomeRepairInstruction` payloads over the same general `instruction_in` port, with no discriminator. Receiving modules must runtime-guess the payload type; DOC15/CIL prompt assembly can't know what to render. The typed ports (`repair_instruction_in`, etc.) are marked "ergonomics, not required for V1," so every V1 implementation goes through the overload and solves discrimination differently.
- **Disposition:** **ACCEPT.** Don't ship the overload. Either elevate the typed ports to required for V1 (preferred) or add a payload discriminator.
- **Fix (FD §8.4 + DOC23 R3.1 port registry §10.3):**

```ts
// Preferred: make these REQUIRED V1 ports (register in DOC23 R3.1 §10.3):
//   feedback_in, repair_instruction_in, run_guidance_in, source_need_in
// Fallback if kept on instruction_in: payload union carries a discriminator DOC15/CIL dispatches on:
type FeedbackPayloadKind = "free_form_instruction" | "outcome_repair_instruction" | "run_guidance" | "research_need";
// validation.instruction_in_untyped_feedback_payload (error) if a typed payload rides instruction_in without a discriminator.
```

---

### B-18 — [V3.3.1 §6.5.2 / §7.9] `HardRevisionCall.options` may be empty though spec requires bounded options (ACCEPT) — Claude C3

- **Raised by:** Claude C3 (BUG/HIGH).
- **Problem:** §6.5.1 says detection produces a Hard Call "with bounded `HumanDecisionOption[]`," but §7.9.1's schema has no non-empty constraint. Empty `options[]` makes the §21.4 UI non-functional (no buttons); the user can't resolve and the Dispatcher stays in `waiting_hard_call` indefinitely. The `default_if_no_response` trigger ("no response") is itself unspecified.
- **Disposition:** **ACCEPT.** Constrain options to ≥2, provide a default pair, and specify the no-response timeout.
- **Fix (V3.3.1 §7.9.1):**

```
options: HumanDecisionOption[] // MIN_LENGTH = 2
// When the Compiler cannot enumerate substantive options, default to:
//   ["continue_with_compiler_proposal", "pause_for_my_input"]
// validation.hard_call_options_empty (error).
// Specify the timeout that triggers default_if_no_response (e.g., HardCallPendingPolicy.timeout_ms, B-10).
```

### B-19 — [V3.3.1 §11.6 / §0.4.7] `RevisionOperationKind="hard_call_resolved"` has no producer (ACCEPT) — Claude C10

- **Raised by:** Claude C10 (BUG/MEDIUM).
- **Problem:** `hard_call_resolved` is a valid `RevisionOperationKind` and `HardCallResolution` is persisted, but no section says which actor emits the operation receipt. Operation receipts feed `RepairCycleSignal`; ambiguous actor → missing or duplicated receipt, breaking the `hard_call_resolved → revision_operation_receipt_ref` chain in `RevisorActionRecord`.
- **Disposition:** **ACCEPT.** Name the producer.
- **Fix (V3.3.1 §7.9.4, new):**

```
On recording a HardCallResolution, the Dispatcher emits a RevisionOperationReceipt with
operation_kind = "hard_call_resolved" and hard_call_ref → the resolved Hard Call.
receipt.actor_ref = Dispatcher runtime identity; resolution.resolved_by = UserRef (recorded separately).
```

### B-20 — [V3.3.1 §6.7.2 / §11.21] Success-condition 5 races the cascade (ACCEPT) — Claude D2

- **Raised by:** Claude D2 (RISK/MEDIUM).
- **Problem:** success-condition 5 ("cascaded dependent outcomes are re-evaluated") is checked at revision-cycle completion, but the §11.21 cascade can still be firing — so a revision can be marked successful before its own regression cascade settles.
- **Disposition:** **ACCEPT.** Gate condition 5 on cascade quiescence.
- **Fix:** the Loop Controller may evaluate success-condition 5 only when no revalidation is pending in the cascade chain (tracked via B-05's `max_revalidation_cascade_depth` chain state); `validation.success_marked_before_cascade_quiescent` (error).

### B-21 — [Core §9.0.6] Signal emission ordering vs receipt persistence undefined (ACCEPT) — Claude D7

- **Raised by:** Claude D7 (RISK/MEDIUM).
- **Problem:** §9.0.6 shows signals emitted then passing the EC policy gate, but doesn't order signal emission against durable persistence of the receipts those signals reference — a consumer can receive a signal pointing at a not-yet-written receipt.
- **Disposition:** **ACCEPT.** Enforce emit-after-persist.
- **Fix (Core §9.0.6):** a learning signal MUST NOT be emitted until the receipts it references are durably written (read-your-writes guarantee at the EC policy gate); `validation.signal_references_unpersisted_receipt` (error).
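
The emit-after-persist rule in B-21 can be illustrated with a small gate. This is a sketch under stated assumptions rather than the Core §9.0.6 design: `ReceiptStore`, `LearningSignalLike`, `EmitResult`, and `emitAfterPersist` are hypothetical names, and an in-memory record stands in for whatever durable store the runtime actually uses. Only the validation code string comes from the fix above.

```typescript
// Hypothetical signal shape: just an id plus the receipts it points at.
interface LearningSignalLike {
  signal_id: string;
  receipt_refs: string[];
}

// In-memory stand-in for a durable receipt store (ASSUMPTION: a real store
// would report a receipt as durable only after the write is acknowledged).
class ReceiptStore {
  private durable: Record<string, true> = {};

  persist(receiptId: string): void {
    this.durable[receiptId] = true;
  }

  isDurable(receiptId: string): boolean {
    return this.durable[receiptId] === true;
  }
}

type EmitResult =
  | { ok: true }
  | { ok: false; error: "validation.signal_references_unpersisted_receipt"; missing: string[] };

// Refuses to emit while any referenced receipt is not yet durably written.
function emitAfterPersist(
  store: ReceiptStore,
  signal: LearningSignalLike,
  sink: LearningSignalLike[]
): EmitResult {
  const missing = signal.receipt_refs.filter((ref) => !store.isDurable(ref));
  if (missing.length > 0) {
    return { ok: false, error: "validation.signal_references_unpersisted_receipt", missing };
  }
  sink.push(signal);
  return { ok: true };
}

const store = new ReceiptStore();
const sink: LearningSignalLike[] = [];
const signal: LearningSignalLike = { signal_id: "sig-1", receipt_refs: ["rcpt-1"] };

console.log(emitAfterPersist(store, signal, sink).ok); // false (receipt not written yet)
store.persist("rcpt-1");
console.log(emitAfterPersist(store, signal, sink).ok); // true
```

Returning the list of missing refs, instead of only failing, gives the caller enough to retry after the write lands; whether a rejected signal is queued or dropped is a policy choice this sketch does not make.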
### B-22 — [V3.3.1 §5.18 / §11.15] Pattern C doubles per-turn latency/cost with no budget (ACCEPT) — Claude D17

- **Raised by:** Claude D17 (RISK/MEDIUM).
- **Problem:** Pattern C wires a Judge downstream of every standalone Evaluator; in an iterative revision loop this doubles evaluation latency and cost per turn, but no budget governs the Pattern C Judge separately from the Evaluator — a long loop silently doubles cost.
- **Disposition:** **ACCEPT.** Add a Pattern C invocation budget / cadence.
- **Fix:**

```ts
interface PatternCInvocationBudget {
  max_judge_invocations_per_run?: number;
  cadence: "every_turn" | "every_n_turns" | "on_terminal_only";
  n_turns?: number; // required when cadence = "every_n_turns"
}
// governs the Pattern C Judge independently of the upstream Evaluator.
```

### B-23 — [FD §6.3] Multiple delivery branches fire simultaneously with no idempotency control (ACCEPT) — Claude D24

- **Raised by:** Claude D24 (RISK/MEDIUM).
- **Problem:** `FeedbackRoutingPolicy` branches aren't stated to be mutually exclusive; one result can match several (e.g., `on_needs_revision` and `on_needs_more_sources`), firing multiple deliveries with no idempotency key or cost guard — duplicate or conflicting deliveries. (Complements A-08 routing completeness with the *exclusivity/idempotency* rule.)
- **Disposition:** **ACCEPT.** State branch exclusivity, or an explicit multi-fire policy with idempotency keys.
- **Fix (FD §6.3):**

```ts
type RoutingFirePolicy = "first_match_only" | "all_matching_with_idempotency";
// when "all_matching_with_idempotency", each delivery carries
// delivery_idempotency_key = hash(run_id + result_id + branch); duplicates suppressed.
```

---

### B-24 — [V3.3.1 §6.7] Revisor sufficiency protocol has no detection algorithm (ACCEPT — **CRITICAL**)

- **Raised by:** Grok §4.10 (RISK/CRITICAL).
- **Problem:** the Revisor sufficiency protocol refers to "no sufficient procedure exists" but defines **no detection algorithm** for that condition — so the Revisor can't deterministically decide when to stop attempting and escalate. Left unspecified, it either loops or silently gives up, and two implementations behave differently on the same un-fixable outcome.
- **Disposition:** **ACCEPT.** Define the detection criteria and the terminal route. *(Authored — no appendix schema; review the criteria set.)*
- **Fix (V3.3.1 §6.7 — PROPOSED):**

```ts
interface SufficiencyDetectionResult {
  outcome_id: string;
  no_sufficient_procedure: boolean;
  triggered_by: Array<
    | "no_capability_covers_outcome"            // no module/sub-agent declares a covering capability
    | "max_attempts_without_score_improvement"  // N attempts, ΔCalibratedScore < ε
    | "all_candidate_procedures_exhausted"      // every applicable procedure tried and failed
    | "evidence_insufficient_to_proceed"        // required sources/inputs unavailable (→ research need)
  >;
  attempts_made: number;
  best_score_seen?: CalibratedScore;
  recommended_route: "needs_human_judgment" | "unable_to_evaluate" | "needs_information";
  schema_version: "1.0";
}
// When no_sufficient_procedure = true, the Revisor MUST emit the recommended_route state (per the A-06 matrix)
// and raise a Hard Call rather than re-attempting. validation.sufficiency_protocol_without_detection (error).
```

- **Rationale:** the missing stop-condition is a safety/cost issue for a litigator — without it the system can burn budget indefinitely or quietly produce nothing. Pairs with B-05 (cascade depth) and B-27 (regenerate loop guard) as the loop-termination set.

### B-25 — [V3.3.1 §11.3] `RevisionPlan` steps have no `depends_on` field, but the plan-lint checks "DAG acyclic" (ACCEPT — **CRITICAL**)

- **Raised by:** Grok §5.1 (BUG/CRITICAL).
- **Problem:** §11.3's deterministic plan-lint references a "DAG acyclic" check, but `RevisionPlan` steps carry **no dependency field** — so the DAG it's meant to validate isn't expressible in the schema, and B-04's DAG-safe rolling hash has no graph to walk.
- **Disposition:** **ACCEPT.** Add the dependency field to the canonical step base (shared with B-26's `RevisionPlanStepBase`).
- **Fix (V3.3.1 §11 — `[Appx, addendum]` `RevisionPlanStepBase`):**

```ts
interface RevisionPlanStepBase {
  step_id: string;
  step_kind: RevisionPlanStepKind;
  depends_on_step_ids: string[]; // ← the missing DAG edges
  target_refs: StorageRef[];
  revalidation_policy: RevisionStepRevalidationPolicy; // ← B-26
  revalidation_rationale: string;
}
// §11.3 acyclic lint now operates on depends_on_step_ids; validation.revision_plan_step_dependency_undefined (error);
// validation.revision_plan_cycle (error). B-04's rolling-hash sequencing reads this graph.
```

- **Rationale:** underpins B-04 (you can't have DAG-safe execution without a DAG) and co-resolves with B-26 — both are fields on one canonical `RevisionPlanStepBase`.

### B-26 — [V3.3.1 §7.5 / §11.7] Revalidation policy named inconsistently — three sources of truth (ACCEPT)

- **Raised by:** ChatGPT Audit Addendum (BUG/HIGH).
- **Problem:** the step base uses `revalidation_trigger`, the mutation protocol refers to `step.revalidation_expectation`, and the module-level typed instruction carries its own revalidation expectation — three possible governing fields, so the Dispatcher can't deterministically know which schedules revalidation.
- **Disposition:** **ACCEPT.** Collapse to one base-level field on `RevisionPlanStepBase`.
- **Fix (V3.3.1 §7.5 / §11.7 — `[Appx, addendum]`):**

```ts
type RevisionStepRevalidationPolicy =
  | "none"
  | "revalidate_targeted_outcomes"
  | "revalidate_declared_dependents"
  | "revalidate_full_closure"
  | "revalidate_full_task";
// RevisionPlanStepBase.revalidation_policy is the SINGLE source of truth (B-25 schema).
// Remove revalidation_trigger and step.revalidation_expectation. validation.revalidation_policy_ambiguous (error).
```

- **Note:** distinct from B-05 (the revalidation *cascade* convergence); this is the *field-naming* collision that tells the Dispatcher *whether/what* to revalidate.

### B-27 — [V3.3.1 §6.7 / `revision_in`] Ambiguous `regenerate` capability contract → identical-output loop (ACCEPT) — Gemini D-04

- **Raised by:** Gemini D-04 (MEDIUM).
- **Problem:** if `regenerate` is triggered and the module returns the **same failing output** (deterministic temperature / rigid template), the Revisor loops, burning budget bouncing between identical regenerations.
- **Disposition:** **ACCEPT.** The `revision_in` payload must carry prior-attempt hashes, and the module must reject a regeneration that reproduces a prior output.
- **Fix (V3.3.1 `revision_in` contract):**

```ts
// revision_in payload (for regenerate-capable modules):
previous_attempt_hashes: string[];
// The target module's integration layer MUST reject a generation whose output hash ∈ previous_attempt_hashes,
// forcing a temperature/top-k change or escalation. validation.regenerate_identical_output (error → escalate).
```

- **Rationale:** connects to B-02 (`attempt_class: "regenerate"` and the `previous_attempt_hash` referenced there) and B-24 (both are loop-termination guards). Closes the "regenerate forever" budget hole.

---

---

# §C. Math / scoring / metrics

### C-01 — [V3.3.1 §15 / §3.10 / Core] No Formula Registry — scores are bare numbers (ACCEPT)

- **Raised by:** Claude **§6.1 + D13**; ChatGPT CG-M1 / M2 / M4 / M6 / M11. *Review's highest-leverage structural fix; subsumes ~15 "this number has no formula" findings incl. D13 (`compiler_confidence_score`).*
- **Problem:** `QualityIndex`, `DimensionScore`, `compiler_confidence_score`, reputation, novelty, template-match, cost — all referenced as numbers with no defined computation, inputs, range, units, missing-input policy, or test vectors. Two implementations produce different numbers for the same artifact.
- **Disposition:** **ACCEPT.** One `FormulaRegistry`: every numeric score has a `FormulaSpec` (typed inputs, output type/range/units, missing-input + zero-denominator policy, test vectors). Add `CalibratedScore` so a 0.7 from one scorer is comparable to a 0.7 from another.
- **Fix (new FormulaRegistry section, Common Contracts) — `[Appx J]`:**

```ts
interface FormulaSpec {
  formula_id: string;
  owner_doc: OwnerDoc;
  inputs: TypedInput[];
  output_type: string;
  units?: string;
  range?: [number, number];
  missing_input_policy: "fail_validation" | "indeterminate" | "default_value" | "exclude_with_penalty";
  zero_denominator_policy?: "undefined_insufficient_data" | "return_zero" | "return_one" | "fail_validation";
  normalization_policy?: string;
  version: string;
  test_vectors: FormulaTestVector[];
}

interface CalibratedScore {
  value: number;
  range: [number, number];
  metric_version: string;
  calibration_ref?: StorageRef;
  confidence_interval?: [number, number];
  sample_size?: number;
  explanation_ref?: StorageRef;
}

interface QualityIndex {
  aggregate_score: number;
  pass_threshold: number;
  required_gate_failures: string[];
  aggregation_method: "all_required_then_weighted_mean" | "weighted_mean" | "min_required_score" | "all_or_nothing";
  missing_dimension_policy: "fail_required" | "exclude_optional_with_penalty" | "indeterminate";
  passed: boolean;
  metric_semantics_version: string;
}
// + DimensionScore, OutcomeComplianceScoringConfig, normalizeCriterionWeights() [Appx J]:
// weights from criterion_weight | priority | uniform; unanchored_llm_judgment excluded from aggregation;
// validation.criterion_weight_sum_zero / .no_aggregation_eligible_criteria.
```

- **Rationale:** D13 collapses into one `FormulaSpec` entry for `compiler_confidence_score`; the registry's `test_vectors` make every score CI-checkable. This is the structural keystone of the math cluster.
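
To make the weight-normalization comment above concrete, here is one possible reading of `normalizeCriterionWeights()`. It is an illustrative sketch, not the Appendix J function: the `CriterionLike` shape and its `scoring_basis` field are hypothetical, the `priority` weight source is deliberately left out because the card does not say how a priority maps to a weight, and negative weights are not handled. The two validation code strings are the ones named in the fix.

```typescript
// Hypothetical criterion shape; `scoring_basis` is an illustrative field name.
interface CriterionLike {
  criterion_id: string;
  criterion_weight?: number;
  scoring_basis: "anchored" | "unanchored_llm_judgment";
}

function normalizeCriterionWeights(criteria: CriterionLike[]): Record<string, number> {
  // Unanchored LLM judgments are excluded from aggregation entirely.
  const eligible = criteria.filter((c) => c.scoring_basis !== "unanchored_llm_judgment");
  if (eligible.length === 0) {
    throw new Error("validation.no_aggregation_eligible_criteria");
  }

  // ASSUMPTION: if any eligible criterion declares a weight, undeclared ones
  // count as 0; if none declare a weight, fall back to uniform weights.
  const anyDeclared = eligible.some((c) => c.criterion_weight !== undefined);
  const rawWeight = (c: CriterionLike): number =>
    anyDeclared ? (c.criterion_weight ?? 0) : 1;

  const total = eligible.reduce((sum, c) => sum + rawWeight(c), 0);
  if (total === 0) {
    throw new Error("validation.criterion_weight_sum_zero");
  }

  const normalized: Record<string, number> = {};
  for (const c of eligible) {
    normalized[c.criterion_id] = rawWeight(c) / total;
  }
  return normalized;
}

const weights = normalizeCriterionWeights([
  { criterion_id: "a", criterion_weight: 1, scoring_basis: "anchored" },
  { criterion_id: "b", criterion_weight: 3, scoring_basis: "anchored" },
  { criterion_id: "c", criterion_weight: 10, scoring_basis: "unanchored_llm_judgment" },
]);
console.log(weights); // { a: 0.25, b: 0.75 }
```

A case like this one (two weighted criteria plus one excluded judgment) is the kind of input a `FormulaTestVector` could pin down, so that two implementations cannot disagree on the normalized weights.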
### C-02 — [V3.3.1 §15.1] Survivorship bias in `avg_revision_cycles_to_convergence` (ACCEPT-AS-FIX)

- **Raised by:** Gemini **BUG-01**.
- **Problem:** the metric's denominator is "outcomes reaching satisfied," so outcomes that burn the retry budget and escalate/abort are erased — the Revisor scores as highly efficient because only easy successes count.
- **Disposition:** **ACCEPT-AS-FIX** (Gemini's bifurcation verbatim).
- **Fix (V3.3.1 §15.1) — `[Gemini BUG-01]`:**

```
Metric: avg_cycles_to_success
  Denominator: outcomes transitioning to satisfied state
  Numerator:   revision cycles spent on those specific outcomes

Metric: wasted_cycle_burn_rate
  Denominator: total revision cycles executed globally
  Numerator:   revision cycles spent on outcomes that ultimately aborted, escalated, or reached 'unrecoverable'
```

### C-03 — [V3.3.1 §7.2.1] Hash-hallucination paradox in `RevisionPlan` mutation mode (ACCEPT-AS-FIX) ⚠verify

- **Raised by:** Gemini **BUG-02**.
- **Problem:** `rolling_hash_chain` requires the LLM Revision Compiler to emit a `predicted_post_hash` (a SHA-256/BLAKE3 string) for a mutation it will apply in the future. LLMs can't compute cryptographic hashes of future states; the Dispatcher's exact-match gate then fails ~100% of the time, crashing all multi-step in-place plans. (Distinct from B-04, which is the parallel-execution problem.)
- **Disposition:** **ACCEPT-AS-FIX.** Remove `predicted_post_hash` from the LLM's output contract; lock on stable section anchors; the Dispatcher computes hashes at runtime.
- **Fix (V3.3.1 §7.2.1) — `[Gemini BUG-02]`:**

```ts
mutation_mode: "candidate_only" | "rolling_hash_in_place"
in_place_mutation_locks?: Array<{
  step_id: string;
  target_section_anchor_hash: string; // stable hash from §12.4.2, known at read-time
  expected_base_version_id: string;
}>
// predicted_post_hash REMOVED from RevisionPlan; Dispatcher computes expected hashes dynamically at runtime.
```

- **⚠verify:** confirm §7.2.1 still carries `predicted_post_hash`/`rolling_hash_chain` as an LLM output (Gemini quotes it; quick to confirm). Pairs with B-04.

### C-04 — [V3.3.1 §15.8.2] Flat-ratio math in "weighted" reputation score (ACCEPT-AS-FIX; reputation *use* → §E)

- **Raised by:** Gemini **BUG-03**.
- **Problem:** `advice_regression_rate` is documented as asymmetric-weighted (false-negative cost 2×) but the actual formula is a flat `regression_count / accepted_count` — 5 critical regressions score identically to 5 minor ones.
- **Disposition:** **ACCEPT-AS-FIX** for the math. Note: the *consumer* of sub-agent reputation (how it gates routing/learning) is **DEFER-PhaseB** — tracked at E-06.
- **Fix (V3.3.1 §15.8.2) — `[Gemini BUG-03]`:**

```ts
advice_regression_rate: {
  accepted_count: number;
  regression_introduced_count: number;
  severity_breakdown: Record<"minor" | "major" | "critical", number>;
  unweighted_regression_rate: number;       // regression_count / accepted_count
  severity_weighted_penalty_score: number;  // ((minor*1)+(major*2)+(critical*5)) / accepted_count
}
```

### C-05 — [Core §16.2 / V3.3.1 §12.5] Thermal-throttling outliers skew latency means (ACCEPT-AS-FIX)

- **Raised by:** Gemini **BUG-05**.
- **Problem:** metrics use arithmetic `avg_duration_ms`; on local Apple Silicon, latency is violently bimodal (memory swap, cold starts, thermal throttling), so a few outliers create false "performance degradation" signals and useless Task Agent alerts.
- **Disposition:** **ACCEPT-AS-FIX** — replace the mean with p50/p90 (or an IQR filter).
- **Fix (Core §16.2 / Common Contracts `CostLatencyFinding`) — `[Gemini BUG-05]`:**

```ts
CostLatencyFinding {
  // REMOVE: avg_duration_ms
  latency_distribution: {
    p50_duration_ms: number;
    p90_duration_ms: number;
    sample_size: number;
  }
}
```

### C-06 — [V3.3.1 §15.8 / cost] Zero-denominator & insufficient-sample handling missing; cost estimator ignores tail risk (ACCEPT)

- **Raised by:** ChatGPT CG-M8 / CG-M9.
- **Problem:** sub-agent metrics and the cost estimator can divide by zero and have no "insufficient data" path; the cost estimator uses point estimates and ignores tail risk.
- **Disposition:** **ACCEPT.** Use the `FormulaSpec.zero_denominator_policy` (C-01) on every ratio metric; add an explicit insufficient-sample state and a tail-risk band to cost estimates.
- **Fix:** every ratio metric declares `zero_denominator_policy: "undefined_insufficient_data"` and emits `CalibratedScore.sample_size`; `CostEstimate` (C-07) gains `p90_cost` alongside `expected_cost`; below a `min_sample_size` the metric returns `indeterminate`, not 0.

### C-07 — [Common Contracts §9A] No shared `CostEstimate` / `TaskCostRecord`; metric terminology misused (ACCEPT WITH MODIFICATIONS)

- **Raised by:** ChatGPT CG-M7 / CG-M10 / CG-AA12.
- **Problem:** cost types are redefined per-document, and several "quality metrics" misuse numerator/denominator terminology, so cost and quality numbers aren't comparable across modules.
- **Disposition:** **ACCEPT WITH MODIFICATIONS.** Canonical `CostEstimate`/`TaskCostRecord` in Common Contracts §9A (already the registry's canonical home per A-09); add `BudgetFailureKind` and a `MetricKind` tag so each metric declares whether it's a rate/count/score.
- **Fix:**

```ts
interface CostEstimate {
  expected_cost: number;
  p90_cost: number;
  currency_or_token_unit: "usd" | "tokens";
  basis: "historical" | "model_priced" | "heuristic";
  sample_size?: number;
  schema_version: "1.0";
}
type BudgetFailureKind = "estimate_exceeded" | "hard_cap_hit" | "insufficient_budget_to_start";
type MetricKind = "rate" | "count" | "calibrated_score" | "duration_distribution";
// all metric schemas tag metric_kind; cost types imported from Common Contracts §9A, never redefined.
```

### C-08 — [V3.3.1 §novelty] Novelty score assumes an undefined metric space (ACCEPT)

- **Raised by:** Claude (D-series); ChatGPT CG-M4.
- **Problem:** novelty is scored without defining the feature space, distance metric, range, or the no-pattern fallback.
- **Disposition:** **ACCEPT.** Adopt Appendix L's `NoveltyMetricSpec` + `NoveltyAssessment`.
- **Fix — `[Appx L]`:**

```ts
interface NoveltyMetricSpec {
  metric_id: string;
  feature_vector_definition_ref: StorageRef;
  embedding_model_ref?: string;
  distance_metric: "cosine_distance" | "euclidean_normalized" | "jaccard" | "hybrid";
  distance_range: [0,1];
  no_pattern_fallback: "novelty_one" | "indeterminate" | "use_domain_baseline";
  calibration_dataset_ref: StorageRef;
  threshold_default: number;
  metric_semantics_version: string;
  schema_version: "1.0";
}

interface NoveltyAssessment {
  input_signature_hash: string;
  closest_pattern_id?: string;
  closest_pattern_distance: number;
  similarity_score: number;
  novelty_score: number;
  forces_fresh_reasoning: boolean;
  triggers_task_agent: boolean;
  metric_spec_ref: string;
  metric_semantics_version: string;
  schema_version: 2;
}
```

### C-09 — [V3.3.1 §task-mode / §template-match] Task-mode not calibrated to protect direct-first; `TemplateMatchScore` no aggregation; magic 0.7 (ACCEPT)

- **Raised by:** ChatGPT CG-M5 / CG-M6; Claude D12.
- **Problem:** task-mode scoring isn't calibrated enough to reliably protect the direct-first default; `TemplateMatchScore` has no aggregation formula; planner confidence uses a bare `0.7` magic number with no surfaced threshold.
- **Disposition:** **ACCEPT.** Adopt Appendix K's `TaskModeScoringFunction` + `TemplateMatchFormula` + `computeTemplateOverallScore`; surface the active threshold as config, not a literal.
- **Fix — `[Appx K]`:** ```ts interface TaskModeScoringFunction { formula_id: "task_mode_score_v1"; feature_weights: Record; veto_precedence: Record; score_range: [0,1]; thresholds: { none_max: number; low_max: number; medium_min: number; high_min: number; explicit_rule: "explicit_task_request_or_existing_task_reference"; }; calibration_metric_ref: StorageRef; schema_version: "1.0"; } interface TemplateMatchFormula { formula_id: "template_match_score_v1"; component_weights: { semantic_intent_match: number; task_type_match: number; input_contract_match: number; output_contract_match: number; capability_availability_match: number; entity_context_match: number; user_preference_match: number; prior_assessment_score: number; recency_or_staleness_score: number; }; hard_veto_cap: number; soft_penalty_max_total: number; schema_version: "1.0"; } // computeTemplateOverallScore(): weighted mean / totalWeight, minus softPenaltySum, capped at hard_veto_cap if any hard veto. // validation.template_match_component_out_of_range. ``` ### C-10 — [V3.3.1 §criterion] `criterion_semantics_hash` is lexical, not semantic (ACCEPT) - **Raised by:** Claude (C-series); ChatGPT CG-M3. - **Problem:** the field named `criterion_semantics_hash` is computed lexically, so a wording change with identical meaning looks like a semantic change (and vice versa) — misleading any consumer that gates on it. - **Disposition:** **ACCEPT.** Either add a semantic-stability/drift-review rule or rename the field to reflect that it's lexical. - **Fix:** rename to `criterion_text_hash` (lexical) and, where semantic stability matters, add a separate `criterion_semantics_version` bumped only on a reviewed meaning change; `validation.criterion_semantics_version_unbumped_on_meaning_change` (advisory, human-reviewed). --- --- # §D. 
Source Workspace / Forum / Core / Taint ### D-01 — [Source Workspace §4.1 / V3.3.1 §15.10 / Task Forum §8.3] Transitive taint laundering via sub-agents and forum (ACCEPT — **compliance blocker**) - **Raised by:** Gemini **F-03**; Claude **C8**; ChatGPT (GAP/HIGH, taint into feedback/forum). The review's #2 distributed-systems gap. - **Problem:** Two coupled defects. (1) `SourceRecord.taint_class` is "inherited from source kind / retrieval method," but **no `source_kind → taint_class` map exists** — the 15 source kinds map to none of the 8 taint classes, so two implementations pick different defaults (a PACER `case_law` source is `external_authority_trusted` in one, `external_untrusted` in another). (2) **Taint is not transitive**: a tainted artifact posted to the Forum (§8.3 mentions stream) is summarized by sub-agents into "clean" critiques that don't inherit the `external_untrusted` label; when the Revisor consumes that critique, the adversarial payload re-enters through an untainted summary, bypassing the safety boundary. For a litigator this is a privilege/firewall-grade defect. - **Disposition:** **ACCEPT.** Define the default map, make taint a transitive graph property across all sub-addenda, and require a `SanitizationNode` to clear it. - **Fix (Source Workspace §4.1A + cross-addenda taint rule + `[Appx G]` `TaintAggregationPolicy`):** ```ts // §4.1A default taint_class per source_kind (override per query via taint_class_override): // document | email | file | prior_task_output → user_trusted_bounded // web_source → external_untrusted // api_result | database_record → external_authority_trusted // case_law | statute | regulation → external_authority_trusted // library_entry → internal_corpus_trusted // TRANSITIVE RULE: any output of a module or sub-agent that consumed a tainted input MUST inherit the // maximum taint of its inputs, UNLESS it passes through a declared SanitizationNode. 
interface SanitizationNode { node_id: string; input_taint: TaintClass; output_taint: TaintClass; method: "human_review" | "structural_strip" | "policy_filter"; evidence_ref: StorageRef; } // TaintAggregationPolicy [Appx G]: workspace_taint_rule "max_taint"; any privileged record → workspace privileged. // validation.taint_not_inherited_through_summary (error); validation.source_kind_taint_unmapped (error). ``` - **Rationale:** closes the laundering path the review calls catastrophic; the `SanitizationNode` is the only sanctioned taint-downgrade, so a coding agent can't silently launder by summarizing. ### D-02 — [Source Workspace §2 / Governance] Workspace taint aggregation undefined (ACCEPT) - **Raised by:** ChatGPT (GAP/HIGH). - **Problem:** a workspace holds many `SourceRecord`s of differing taint, but no rule says what the workspace's aggregate taint is — so downstream consumers can't reason about the set. - **Disposition:** **ACCEPT.** Adopt `TaintAggregationPolicy` (Appx G): `workspace_taint = max_taint` over records; any privileged record marks the workspace privileged; matter-scope rule explicit. (Same policy object as D-01.) ### D-03 — [Source Workspace §0.3 / V3.3.1 §12] `TaskSourceWorkspace` vs `SourceWorkspace` identity split (ACCEPT) - **Raised by:** ChatGPT (BUG/HIGH). - **Problem:** the task-scoped `TaskSourceWorkspace` and V3.3's `SourceWorkspace` are described as if distinct, with no statement of whether they're one type or two — a coding agent builds two stores. - **Disposition:** **ACCEPT.** One canonical workspace identity in the TypeOwnerRegistry (A-09); the task-scoped form is a view/parameterization, not a separate type. `validation.dual_source_workspace_identity` (error). ### D-04 — [Source Workspace §3 / §4 / §2.3] Source-tier defects: Tier 0, unstored transitions, misaligned verification, demotion authority (ACCEPT) — incl. 
Claude C13 - **Raised by:** ChatGPT (Tier 0 contradiction; transitions defined-not-stored; verification states misalign); Claude **C13** (tier transition policy). - **Problem:** `0` appears as a tier but contradicts `SourceRecord.tier`; `SourceTierTransition` is defined but not persisted; verification states across §2.3/§4.1 don't align; and there's no rule preventing a read-only user from **demoting** a tier-3 card to tier-1 (dropping rich content), per C13. - **Disposition:** **ACCEPT.** Remove `0` from the tier domain; persist transitions; align verification-state vocab; gate demotion by access tier. - **Fix:** ``` SourceRecord.tier ∈ {1,2,3} (remove 0). Persist every SourceTierTransition (§3.3). §3.3A tier-transition policy: demotion (from_tier > to_tier) requires cleared_by_access_tier >= "matter_team_access" AND a non-empty reason; promotion unrestricted. validation.source_tier_demotion_without_authority (error). Align verification states to one enum across §2.3 and §4.1. ``` ### D-05 — [Source Workspace §6.2 / §7.4] `ResearchNeed` scoping, ref types, and `human_needed` exit (ACCEPT) — incl. Claude D6 - **Raised by:** ChatGPT (`run_id` conflicts with task-scoped workspaces; wrong ref types; routed to wrong port); Claude **D6** (`human_needed` no exit). - **Problem:** `ResearchNeed.run_id` conflicts with task-scoped workspaces; source/target refs use the wrong reference types; and the `human_needed` status has no exit (who resolves it, what transitions out, what happens at run end). - **Disposition:** **ACCEPT.** Adopt the canonical `ResearchNeed` (Appx G) with an explicit `need_scope`, corrected refs, and a `human_needed` exit + run-end default. - **Fix — `[Appx G]`:** ```ts // ResearchNeed adds: need_scope: "run" | "task" | "workspace"; status includes "human_needed" with exits → // "answered" | "unresolved" | "cancelled"; target uses ArtifactScopeRef|ClaimRef (not raw string). 
// Run-end default: any ResearchNeed still "human_needed"/"open" at run end → "unresolved" with a carry note.
// (Lease for concurrency = ResearchNeedLease, B-13.)
```

### D-06 — [Source Workspace §4 / §5 / §6.3] Evidence anchors, payload registry, weak tool-receipts, extractor-vs-source confusion (ACCEPT)
- **Raised by:** ChatGPT (evidence anchors; `domain_payload` registry/versioning; tool-receipt-as-research too weak; missing-extractor mis-reported as missing-source).
- **Problem:** source records can't anchor evidence to specific claims; `domain_payload` is unversioned/unregistered; a tool receipt is accepted as "material research" with too little structure; and a missing claim-extractor is reported as a missing source.
- **Disposition:** **ACCEPT.** Add first-class evidence anchors + a payload registry; distinguish extractor-missing from source-missing.
- **Fix — `[Appx G]`:**
```ts
// SourceEvidenceAnchor { anchor_kind: page|quote|section|timestamp|row|url_fragment|paragraph|line_range;
//   supports_claim_refs[]; support_strength: direct|indirect|contextual|contradicts; extracted_text_ref }
// DomainPayloadRegistryRef { domain_payload_kind; domain_payload_schema_ref; domain_payload_version }
// New status distinguishing claim_extractor_unavailable from source_unavailable.
```

### D-07 — [Task Forum §5.2 / §5.3] Forum posts: visibility, supersession, governance envelopes (ACCEPT)
- **Raised by:** ChatGPT (`selected_modules` visibility broken; supersession described-not-modeled; posts need governance envelopes); Grok GK-4.7/5.4.
- **Problem:** `selected_modules` visibility can't actually represent the selected modules; supersession is prose-only (no fields); and posts carry no governance (data_class/taint/policy/privilege).
- **Disposition:** **ACCEPT.** Adopt the hardened `TaskRunBoardPost` (Appx H).
- **Fix — `[Appx H]`:**
```ts
// TaskRunBoardPost adds: visibility_target_refs: VisibilityTargetRef[] (fixes selected_modules);
// lifecycle_state: "active"|"superseded"|"retracted" + supersedes_post_ids[] + superseded_by_post_id + supersession_reason;
// governance: RunBoardGovernanceEnvelope { data_class, taint_class, policy_decision_refs, sanitization_required,
//   governance_class, privileged, matter_id }.
```

### D-08 — [Task Forum §6.3 / §6.4] Context-packet ownership, request/receipt, omission manifest, audience-enum divergence, total budget (ACCEPT) — incl. Claude D18
- **Raised by:** ChatGPT (packet needs request/receipt/freshness/omission manifest; digest/packet audience enums diverge); Grok GK-4.3; Claude **D18** (token budget fragmented across packets).
- **Problem:** `TaskRunContextPacket` overlaps DOC24's packet ownership, lacks a request/receipt/freshness/omission manifest, and its audience enum diverges from the digest's. Separately (D18), token budgets are declared per-packet with no authority checking the **sum** a module receives.
- **Disposition:** **ACCEPT.** Adopt the request/receipt/omission types (Appx H) and a per-activation total-context budget enforced by the assembler.
- **Fix — `[Appx H]`:**
```ts
// TaskRunContextPacketRequest { audience: TaskContextAudience, requested_max_tokens, required/optional_item_refs,
//   staleness_policy } → TaskRunContextPacketAssemblyReceipt { packet_content_hash, actual_token_count,
//   omitted_items: OmittedPacketItem[], valid_until_event_seq, invalidated_by_event_kinds }.
// Single TaskContextAudience enum shared by digest + packet.
// D18: PerActivationContextBudget — the assembler MUST sum context packet + board digest + feedback bundle and
//   enforce one cap; validation.per_activation_context_budget_exceeded (error).
```

### D-09 — [Task Forum §1 / §5] Passive board auto-publishes every event — privacy/volume controls (ACCEPT) — Claude B12
- **Raised by:** Claude **B12**; ChatGPT (RISK/HIGH).
- **Problem:** the passive board "auto-publishes every event," which is unbounded and leaks content with no privacy/volume gate.
- **Disposition:** **ACCEPT.** Every post carries the `RunBoardGovernanceEnvelope` (D-07) and the board applies a publication policy (volume cap + class filter).
- **Fix:** add `RunBoardPublicationPolicy { auto_publish_post_kinds[], max_posts_per_run, suppress_classes: ("privileged"|"local_only")[], rate_limit_per_min }`; posts failing the policy are withheld with a receipt. Pairs with D-12 retention.

### D-10 — [Task Forum §7.2 / §3.2 / §4.6 / §3.1] ModuleAssistanceRequest schema, participant/moderator model, payload schemas, moderator-failure (ACCEPT) — incl. Claude D3
- **Raised by:** ChatGPT (request lacks endpoint/lease/timeout/answer schema; participant policy only models modules; moderator condition incoherent; `decision_out`/`signal_out` payloads missing); Claude **D3** (moderator failure path).
- **Problem:** `ModuleAssistanceRequest` has no endpoint/lease/timeout/answer schema; participant policy can't model non-module participants; the moderator-required condition is incoherent; `decision_out`/`signal_out` payloads are undefined; and there's no behavior when the moderator agent is unavailable (D3).
- **Disposition:** **ACCEPT.** Adopt the full `ModuleAssistanceRequest` (Appx H) + a moderator fallback analogous to the Task Agent fallback.
- **Fix — `[Appx H]`:**
```ts
// ModuleAssistanceRequest { target, target_endpoint_ref, answer_schema_ref, request_kind, response_policy,
//   lease {holder_ref, lease_version, expires_at}, timeout_ms, on_timeout: resume_with_warning|abort|escalate_human|ask_task_agent }.
// Participant policy models module|subagent|task_agent|user. decision_out/signal_out get explicit payload schemas.
// §4.6A moderator fallback: moderator unavailable → degrade to "none" | pause | queue (default: ask_task_agent then pause).
```

### D-11 — [Run Board §3.1 / §6.2] BoardDigest filter rule unspecified (ACCEPT) — Claude C9
- **Raised by:** Claude **C9**.
- **Problem:** `BoardDigest` carries `included_post_ids` but the **selection rule** is unspecified — a 500-post forum can't ship all posts in a ~1200-token digest, and implementations pick differently.
- **Disposition:** **ACCEPT.** Extend `BoardDigestPolicy` with explicit selection.
- **Fix:**
```ts
// BoardDigestPolicy adds: included_post_kinds: TaskRunBoardPostKind[]; included_severity_threshold;
//   max_posts: number; selection_strategy: "recency"|"severity"|"score"|"mixed".
// Default: kinds {evaluation_finding, repair_instruction, process_gap, user_guidance}; severity ≥ medium; max 30; mixed.
```

### D-12 — [Run Board §retention / Source Workspace §9] Run Board retention/compaction/event-class + EC-policy persistence (ACCEPT)
- **Raised by:** ChatGPT (retention/compaction/event-class missing; persistence should reference EC policy).
- **Problem:** the Run Board has no retention, compaction, or event-class taxonomy, and persistence doesn't reference the EC policy decisions that should govern it.
- **Disposition:** **ACCEPT.** Add retention + compaction policies and an event-class taxonomy; bind persistence to EC policy.
- **Fix:**
```ts
type RunBoardEventClass = "post" | "digest" | "assistance_request" | "moderation" | "lifecycle";
interface RunBoardRetentionPolicy {
  retain_event_classes: RunBoardEventClass[];
  retain_for_days: number;
  ec_policy_decision_ref: PolicyEvaluationRef;
}
interface RunBoardCompactionPolicy {
  compact_after_days: number;
  strategy: "summarize" | "drop_low_severity" | "archive";
}
```

### D-13 — [FD §RunGuidance / TaskBlueprint] RunGuidanceItem persistence, cross-run injection, lifecycle, contested-check (ACCEPT) — incl. Gemini D-01
- **Raised by:** ChatGPT (promotion + lifecycle receipts; `contested` never checked before use); Gemini **D-01** (cross-run injection vector unspecified); Grok GK-5.8.
- **Problem:** `RunGuidanceItem`s are generated from feedback but FD never says **where they persist** or **how a Run-A item reaches Run B**; `lifecycle_state="contested"` is never checked before a consumer uses the guidance; and there's no promotion/lifecycle receipt.
- **Disposition:** **ACCEPT.** Persist guidance durably and define the cross-run injection vector + lifecycle.
- **Fix (Gemini D-01 vector):**
```ts
// RunGuidanceItem is written to TaskBlueprint.persistent_guidance[] via an EC durable write (local intent),
//   OR mapped to a DOC24/BDSM ledger update — pick one authoritative store (recommend TaskBlueprint for local intent).
// lifecycle_state: "proposed"|"active"|"contested"|"superseded"; consumers MUST skip items in "contested".
// Emit RunGuidanceLifecycleReceipt on each transition. validation.run_guidance_consumed_while_contested (error).
```
- **Note:** the cross-run *learning* use (precedence vs DOC72) is the Memory-Precedence-Hierarchy item, **§E (E-02)**; this row only fixes durable persistence + the contested-check.

### D-14 — [Core §3D / §4B / §5A] InjectionSlotRegistry, compact card schemas, command registry, token_budget, DOC24 capability ownership, receipt booleans (ACCEPT) — incl. Claude C15
- **Raised by:** ChatGPT (InjectionSlotRegistry gap CRITICAL; compact card schemas missing; Core defines a DOC24-owned capability; `TrackedTaskReceipt` booleans need command refs; command registry should be mandatory); Claude **C15** (packet taxonomy unstated); Grok GK-5.11 (`token_budget` never populated).
- **Problem:** the `InjectionSlotRegistry` for the task-system DOC24 slots is unspecified; the compact top-k card schemas are missing; Core normatively defines a DOC24-owned capability; `TrackedTaskReceipt` carries action booleans without command refs; `TaskOpportunityPacket.token_budget` is never populated/checked; and the packet taxonomy (opportunity vs run-context vs design) is unstated.
- **Disposition:** **ACCEPT.** Adopt the Core appendix wholesale; move the capability definition to DOC24; make the command registry mandatory.
- **Fix — `[Appx I]`:**
```ts
// TaskSystemInjectionSlotRegistration { slot_id (6 DOC24 slots), slot_kind, surfaces, token_cap,
//   pii_redaction_required, on_unavailable: omit|degrade_direct_first|block_explicit_task_route, receipt_required }.
// Compact cards: CompactTaskInvocationDirectiveCard / CompactTaskTemplateCard / CompactModulePresetCard
//   (each with CalibratedScore, risk_flags, token_estimate, redaction_state, source_authority).
// TaskCommandRegistryEntry (mandatory) { request/response_schema_ref, idempotency_key_required, durable_write,
//   telemetry_event_kind, read_model_invalidations, failure_codes, required_policy_checks, owning_doc }.
// TrackedTaskReceipt.available_actions: AvailableTaskAction[] (each → command_ref + idempotency_key_required).
// Packet taxonomy subsection (C15): TaskOpportunityPacket (pre-task), TaskRunContextPacket (in-run), TaskAgentDesignPacket (future).
// Capability def moves to DOC24 (Core references). token_budget populated + checked at assembly.
```

### D-15 — [V3.3.1 §8.4 / §17.1] Sub-agent: output-contract plurality, no-sub-agent fallback, coordination-point count (ACCEPT) — incl. Claude B10, D1
- **Raised by:** ChatGPT (`output_contract_ref` singular vs variant semantics; no-sub-agent fallback incomplete); Claude **B10** (no fallback at evaluator point) + **D1** (coordination point count 4 vs 5).
- **Problem:** `output_contract_ref` is singular but variant semantics need plurality; the "no sub-agent available" fallback is incomplete (esp. at the evaluator coordination point); and §17.1 lists five coordination points while §8.4's enum has four (no `plan_verifier`), so a profile can't declare for the fifth.
- **Disposition:** **ACCEPT.** Plural output contracts; a fallback policy per coordination point; reconcile to five points.
- **Fix:**
```ts
// §8.4 allowed_coordination_points += "plan_verifier" (reconcile to §17.1's five).
// output_contract_refs: StorageRef[] (was singular).
interface SubAgentFallbackPolicy {
  coordination_point: AllowedCoordinationPoint;
  on_no_sub_agent: "use_primary_module" | "skip_with_warning" | "hard_call" | "degrade_quality_with_note";
}
// every coordination point declares a fallback; validation.sub_agent_point_without_fallback (error).
```

### D-16 — [Source Workspace / Forum] Workspace API operations referenced but never defined (ACCEPT) ⚠verify
- **Raised by:** Grok GK-4.2.
- **Problem:** workspace operations (create/read/append/lock) are referenced but never defined — a circular reference with no API surface.
- **Disposition:** **ACCEPT.** Define the workspace API (operations, args, receipts), bound to the command registry (D-14) and write-precondition (B-03).
- **⚠verify:** confirm the current state of any partial workspace-API definition before authoring.

### D-17 — [Common Contracts §7 / V3.3.1 §7.9.3] Anchor/hash hygiene: empty StructuredAnchor, uncomputed context_hash, HardCall hash normalization (ACCEPT) — incl. Claude C7, D14
- **Raised by:** Grok GK-5.5 (`TextAnchor.context_hash` never computed/validated); Claude **C7** (`StructuredAnchor {}` validates but is useless) + **D14** (`HardCallResolution` hash normalization unspecified).
- **Problem:** `context_hash` is defined but never computed/validated; a `StructuredAnchor` with all-optional fields can be validly empty yet un-resolvable; and `HardCallResolution` reuse compares hashes whose normalization is unspecified (cosmetic diffs → needless re-escalation).
- **Disposition:** **ACCEPT.** Compute/validate anchors; require non-empty structured anchors; specify canonical normalization before hashing.
- **Fix:**
```
StructuredAnchor MUST populate ≥1 of {section_id, field_path, citation_ref}; validation.structured_anchor_empty (error).
TextAnchor.context_hash MUST be computed at creation and validated at resolve; validation.context_hash_unverified (warning).
HardCallResolution: canonicalize outcome_definition_hash / goal_context_hash inputs (trim, sort keys, normalize whitespace) before hashing (§7.9.3); validation.hard_call_hash_unnormalized.
```

### D-18 — [V3.3.1 §repeated-failure] Repeated-failure detection keyed on versioned refs (ACCEPT)
- **Raised by:** Grok GK-5.12.
- **Problem:** repeated-failure detection keys on `affected_artifact_refs`, which are version-stamped, so the same logical artifact across versions doesn't match — repeats go undetected.
- **Disposition:** **ACCEPT.** Key on stable artifact identity, not the versioned ref.
- **Fix:** detection key = stable `artifact_id` (+ `outcome_definition_id`), not `artifact_version_ref`; `validation.repeated_failure_keyed_on_version` (lint).

### D-19 — [Source Workspace §SourceRecord] Per-module cost attribution has no field (ACCEPT)
- **Raised by:** Grok GK-5.10.
- **Problem:** cost can't be attributed per module on `SourceRecord` — no field — so per-module cost reporting (C-07 / BudgetNarrative) has nothing to read.
- **Disposition:** **ACCEPT.** Add a per-module cost field.
- **Fix:** `SourceRecord.acquisition_cost?: CostEstimate` (C-07 type) + `acquired_by_module_id`; feeds the cost rollups.

### D-20 — [Source Workspace §library] Library promotion gate references EC policy but is undefined (ACCEPT)
- **Raised by:** Grok GK-4.10.
- **Problem:** promotion of a workspace source into the durable Library references an EC policy gate, but the gate itself isn't defined.
- **Disposition:** **ACCEPT.** Define the promotion gate (criteria + policy check + receipt).
- **Fix:** `LibraryPromotionGate { min_tier, requires_verification_state, ec_policy_decision_ref, requires_access_tier }`; promotion emits a receipt; `validation.library_promotion_without_gate` (error).
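
A minimal sketch of how the D-20 gate might be evaluated, offered as illustration and not as the spec's implementation. The four `LibraryPromotionGate` fields come from the fix above and the tier domain `{1,2,3}` from D-04; everything else is assumed for the example: the `VerificationState` and `AccessTier` value sets and their ordering, the `PromotionCandidate` and `PromotionResult` shapes, the function name, and the choice to report every failed criterion (not only a missing gate) under `validation.library_promotion_without_gate`.

```ts
// Illustrative sketch only. Value sets marked "hypothetical" are not from the spec.
type SourceTier = 1 | 2 | 3; // tier domain after D-04 removes 0
type VerificationState = "unverified" | "verified"; // hypothetical value set
type AccessTier = "read_only" | "matter_team_access" | "admin"; // hypothetical ordering

interface LibraryPromotionGate {
  min_tier: SourceTier;
  requires_verification_state: VerificationState;
  ec_policy_decision_ref: string; // a PolicyEvaluationRef in the spec; plain string here
  requires_access_tier: AccessTier;
}

// Hypothetical projection of the SourceRecord fields the gate needs.
interface PromotionCandidate {
  source_record_id: string;
  tier: SourceTier;
  verification_state: VerificationState;
}

type PromotionResult =
  | { ok: true; receipt: { source_record_id: string; ec_policy_decision_ref: string } }
  | { ok: false; validation: "validation.library_promotion_without_gate"; reasons: string[] };

const ACCESS_RANK: Record<AccessTier, number> = { read_only: 0, matter_team_access: 1, admin: 2 };

function evaluateLibraryPromotion(
  candidate: PromotionCandidate,
  gate: LibraryPromotionGate | undefined,
  actorAccessTier: AccessTier,
): PromotionResult {
  const reasons: string[] = [];
  if (!gate) {
    reasons.push("no gate configured");
  } else {
    if (candidate.tier < gate.min_tier) reasons.push("tier below min_tier");
    if (candidate.verification_state !== gate.requires_verification_state) {
      reasons.push("verification state mismatch");
    }
    if (ACCESS_RANK[actorAccessTier] < ACCESS_RANK[gate.requires_access_tier]) {
      reasons.push("insufficient access tier");
    }
    if (gate.ec_policy_decision_ref.trim() === "") reasons.push("missing EC policy decision");
  }
  if (!gate || reasons.length > 0) {
    return { ok: false, validation: "validation.library_promotion_without_gate", reasons };
  }
  // Promotion emits a receipt that records which policy decision allowed it.
  return {
    ok: true,
    receipt: {
      source_record_id: candidate.source_record_id,
      ec_policy_decision_ref: gate.ec_policy_decision_ref,
    },
  };
}
```

Returning the full list of failed criteria, instead of stopping at the first, is a design choice for the sketch: it lets a reviewer see everything that blocks a promotion in one pass.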
### D-21 — [V3.3.1 / Core] `requires_background_progress` overloaded (ACCEPT)
- **Raised by:** ChatGPT (CG audit addendum).
- **Problem:** `requires_background_progress` is used for two different meanings (a run needing background work vs a module needing periodic progress) with no disambiguation.
- **Disposition:** **ACCEPT.** Split into two named fields.
- **Fix:** replace with `requires_background_execution: boolean` (run-level) and `emits_progress_heartbeat: boolean` (module-level); migrate references.

### D-22 — [Common Contracts §11.5 / cross-refs] Backward-compat overstated; section-anchor hygiene before R3.2 absorption (ACCEPT WITH MODIFICATIONS) — Claude D5/D22
- **Raised by:** Claude **D5** + **D22**.
- **Problem:** §11.5's backward-compat claim overstates stability, and the set uses section-number cross-refs where stable anchors are needed before the DOC23 R3.2 absorption. (Overlaps A-15; tracked there for the build-ready gate.)
- **Disposition:** **ACCEPT WITH MODIFICATIONS.** Soften the compat claim to "compatible within the locked schema-of-record set; no cross-version guarantee yet"; convert section-number cross-refs to stable anchors as pre-absorption hygiene.
- **Fix:** anchor-stabilization pass across the set before R3.2 absorption; `validation.unstable_section_cross_ref` (lint). Coordinate with A-15.

### D-23 — [FD §9.4] "Silent ignoring fires validation" is unenforceable (ACCEPT) — Claude C11
- **Raised by:** Claude **C11**.
- **Problem:** §9.4 says silent ignoring (no receipt) fires `validation.feedback_consumed_without_receipt` at audit, but the audit has no way to enumerate *expected* receipts — so it never fires; a module that processed-and-ignored looks identical to one that received nothing.
- **Disposition:** **ACCEPT.** Track receipt expectations at dispatch so the audit can compare.
- **Fix:**
```ts
// §9.4A: when the router (§6) dispatches a bundle, record a FeedbackDispatchExpectation keyed to
//   (feedback_bundle_id, consumer_module_id, consumer_activation_seq). At run end (or after a grace period,
//   default 5 min post-activation) the audit compares expectations ↔ FeedbackConsumptionReceipts [Appx F];
//   missing pairs fire validation.feedback_consumed_without_receipt.
```

### D-24 — [Run Board §5.4 / Source Workspace §9.4] Cross-matter forum-post visibility unspecified (ACCEPT — **privilege firewall**) — Claude C12
- **Raised by:** Claude **C12**.
- **Problem:** the 5 visibility values don't restrict by matter; a privileged-matter post with `visibility: all_task_modules` has no stated rule about whether a module in **another matter** can read it. For a litigator, an implicit (therefore unenforced) matter boundary is a privilege-breach risk.
- **Disposition:** **ACCEPT.** Make matter-scoping explicit and prior to `visibility`.
- **Fix (Run Board §5.5):**
```
A post with matter_id == X is visible ONLY to readers operating under matter_id == X, regardless of `visibility`; `visibility` scopes WITHIN the matter.
privileged: true is always matter-scoped, never cross-matter at any access tier.
validation.forum_post_cross_matter_leak (error).
```
- **Rationale:** complements D-01 — taint transitivity stops adversarial-payload laundering; this stops privileged-content leakage across the matter firewall. Together they close the two forum-borne confidentiality holes.

---

# §E. HELD — Phase-B-gated / self-learning (tracked, not adjudicated for build)

Per your instruction: held but tracked. Gated on the Phase-B corpus audit (writing these now = guessing their own spec).

- **E-01 Memory Hydration phase** — pre-run `HydrateMemory` step (query BDSM/DOC72/RunGuidance, resolve by precedence, inject priors, stamp `HydratedMemoryHash`).
  *Architectural absence; adopted in principle, build gated.* (Gemini §7; Claude §6.1/§6.3)
- **E-02 Memory Precedence Hierarchy** — `Local Intent > Matter/Scope Policy > Global DOC72`. (Gemini §7; Claude §6.1/§6.3)
- **E-03 SubAgentPrior injection** ("Sub-Agent Amnesia") — inject last-N relevant findings/RepairInstructions per sub-agent. (Gemini GM-V1; Claude §6.1/§6.3)
- **E-04 TaskBlueprint Topology/Payload bifurcation** — `TaskTopology` (generalized) vs `TaskPayload` (ephemeral). (Gemini §7; Claude §6.1/§6.3)
- **E-05 Flawless-execution denominator** (learning side) — count clean passes via verdict-aware `OutcomeEvaluationSignal` so utility doesn't falsely decay. *(This is the real home of the deleted `EvaluationAffirmation`, A-09.)* (Gemini GM-V2; Claude §6.1)
- **E-06 Sub-agent reputation** model/slices/model-class calibration (C-04's weighting feeds this). (Claude D22; ChatGPT CG-SA3/CG-SL12)
- **E-07 `LearningMode` consumption behavior** (A-11's value set is adopted; the behavior is held); `goal_advancement_count` decrement (D4); `cross_model_applicability` runtime behavior (D21).
- **E-08 Process-gap → design-pattern loop** (A6) + substantive-vs-process gap enforcement. (Claude A6; Grok GK-3.5)
- **E-09 Longitudinal pattern view (S6)** and other learning-touching surfaces.
- **E-10 All ChatGPT `[SELF-LEARNING]` items (CG-SL1..12)** incl. TIE / Task Improvement Engineer (**Appendix P**, held), LoopEffectivenessTestRunRecord, BDSM utility-compilation gap, revealed-preference dampening, outcome clustering, UserConstitution prior, multi-prior conflict, PlanDiff/ProposalDiff, active-learning bundle.
- **E-11 Design-Feedback as peer to Artifact-Repair** — elevate Task Agent graph-patch proposals to first-class learned outcomes. (Grok GK-3.2/N1)
- **E-12 Automatic Pattern Suggestion UI** (≥2 failures → top-3 similar patterns).
  (Grok GK-N4)

---

### E-13 — [Common Contracts learning envelope] Multi-user forward-compatibility fields (DEFER-PhaseB; fields addable now)
- **Raised by:** ChatGPT (GAP/HIGH); Claude self-learning analysis.
- **Problem:** the learning-signal envelope has `data_class` and `matter_id` but lacks `principal_id`, learning scope, share eligibility, and scope-inference basis. Retrofitting team/firm/networked learning scope later — over privilege and matter data — is dangerous.
- **Disposition:** **DEFER-PhaseB for behavior; fields addable now as forward-compat** (same pattern as A-11 value-set / C-04 math: adopt the schema surface now, defer the multi-user learning *behavior* to Phase B). Flagged here so it isn't lost; the actual networked-learning semantics are part of the Phase-B learning spec.
- **Fix (forward-compat fields, addable now):**
```ts
// add to learning signal envelope / learning artifacts:
principal_id: string;
learning_scope: "local" | "matter" | "team" | "firm" | "networked";
scope_inference_basis: string;
default_scope_rule: string;
share_eligibility: "none" | "opt_in" | "policy_gated";
```

---

# §F. DECLINED — review §6.2 (closed; not re-opened)

| # | Proposed | Raised by | Why declined / adopted-instead |
|---|---|---|---|
| F-01 | Literal Git-style branching / ShadowWorkspace as core primitive (Branch & Merge) | Gemini headline, Grok GK-N2 | Side effects can't branch (phantom). **Instead:** `TaskRunFork` + `irrevocable_side_effects_at_fork`. |
| F-02 | `TaskConfirmationSignal` as a new signal type | Gemini §7 | Duplicates existing signal/receipt machinery. **Instead:** fold into existing signals/receipts. |
| F-03 | Flawless-execution as a new signal type | Gemini GM-V2 | Redundant. **Instead:** verdict-aware `OutcomeEvaluationSignal` (denominator need kept — see E-05). |
| F-04 | Chunking findings for KV-cache bloat | Gemini D-02 | Fragments the unit. **Instead:** compressed-envelope view at prompt-assembly. |

---

# §G.
Conceptual / UX / surfaces / Professional Reliance Layer

> **Why this cluster matters (plain language):** §A–§D make the engine *correct*. §G is what makes the output something a litigator can actually *rely on and hand to a partner* — a pre-flight check on what the system thinks you asked, a reviewable diff of every change, an evidence binder tying each claim to its source, a plain-English cost/quality account, and a single "can I rely on this, and within what limits" cover memo. The review calls this the biggest product opportunity in the set, and it's the part that differentiates ELNOR from a generic agent in 2027.

## Professional Reliance Layer (Appendix O + M)

### G-01 — [new · Common Contracts / Core] `EvaluationContractReview` — pre-execution contract check (ACCEPT)
- **Raised by:** ChatGPT (IDEA/CRITICAL).
- **What & why:** before the system spends tokens/time, it surfaces *what it thinks you asked for and how it will judge success* — interpreted goal, criteria, thresholds, source requirements, required capabilities, and the Hard-Call triggers — and lets you approve, edit, or waive. Catches a misread brief before the cost is incurred (the cheapest possible place to catch it). **User-facing:** a short "here's the plan and the bar; approve to proceed" card.
- **Disposition:** **ACCEPT** as a new pre-execution artifact (gate optional per task autonomy).
- **Fix — `[Appx O]`:**
```ts
interface EvaluationContractReview {
  review_id: string;
  task_id: string;
  run_id?: string;
  compiled_plan_ref: StorageRef;
  interpreted_goal: string;
  criteria_summary: string[];
  threshold_summary: string[];
  source_requirements: string[];
  required_capabilities: CapabilityRef[];
  hard_call_triggers: HardRevisionCallKind[];
  material_differences_from_preview?: string[];
  user_approval_required: boolean;
  approval_status: "pending" | "approved" | "rejected" | "edited" | "waived_by_policy";
  approval_ref?: StorageRef;
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### G-02 — [new · FD / V3.3.1] `RevisionReviewPacket` — reviewable packet for every meaning-bearing revision (ACCEPT)
- **Raised by:** ChatGPT (IDEA/HIGH).
- **What & why:** every meaning-bearing revision emits a packet showing the before→candidate semantic diff, which finding drove which change (`finding_to_change_map`), preservation-constraint results, source changes, and revalidation results — so a reviewer can accept/reject/fork/request-changes/restore *with the reasoning visible*, not just a new blob. **User-facing:** "here's exactly what changed and why; accept, fork, or roll back."
- **Disposition:** **ACCEPT.** Pairs with G-04 (restore) and A-16 (revision goes through `revision_in`).
- **Fix — `[Appx O]`:**
```ts
interface RevisionReviewPacket {
  packet_id: string;
  task_id: string;
  run_id: string;
  revision_plan_ref: StorageRef;
  before_artifact_version_ref: StorageRef;
  candidate_artifact_version_ref: StorageRef;
  semantic_diff_ref: StorageRef;
  finding_to_change_map: Record<string, string[]>; // finding id → change refs; key/value types assumed (⚠verify against Appx O)
  preservation_constraint_result_refs: StorageRef[];
  source_changes: SourceRecordRef[];
  revalidation_result_refs: StorageRef[];
  regression_risk_summary: string;
  reviewer_action: "accept" | "reject" | "fork" | "request_changes" | "restore_known_good_state" | "no_user_review_required";
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### G-03 — [new · Source Workspace] `EvidencePackage` — exportable evidence binder (ACCEPT)
- **Raised by:** ChatGPT (IDEA/HIGH).
- **What & why:** the Source Workspace exported as one reviewable unit — final artifacts + workspace snapshot + a `claim_support_map` marking each claim `supported / partially_supported / unsupported / contradicted / not_checked`, plus unresolved research needs and stale/unverified sources. This is the binder a litigator hands to a partner or keeps for the file. **User-facing:** "every claim, what backs it, and what's still unverified." Depends on D-06 evidence anchors.
- **Disposition:** **ACCEPT.**
- **Fix — `[Appx O]`:**
```ts
interface EvidencePackage {
  evidence_package_id: string;
  task_id: string;
  run_id: string;
  final_artifact_refs: StorageRef[];
  source_workspace_snapshot_ref: StorageRef;
  source_record_refs: SourceRecordRef[];
  evidence_anchor_refs: string[];
  claim_support_map: Array<{
    claim_ref: ClaimRef;
    supporting_anchor_refs: string[];
    contradicting_anchor_refs: string[];
    support_status: "supported" | "partially_supported" | "unsupported" | "contradicted" | "not_checked";
  }>;
  unresolved_research_need_refs: string[];
  stale_or_unverified_source_refs: SourceRecordRef[];
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### G-04 — [new · V3.3.1 / Core] `KnownGoodState` — named restorable checkpoint (ACCEPT)
- **Raised by:** ChatGPT (IDEA/HIGH).
- **What & why:** a named, restorable checkpoint of a run/artifact state, so cancel (B-10), fork (G-17), and revision-review (G-02) all have a concrete thing to roll back to. **User-facing:** "save point — restore if a later change makes things worse."
- **Disposition:** **ACCEPT.** Schema in Appendix M; `restore_known_good_state` is already an `AvailableTaskAction` (D-14) and a `reviewer_action` (G-02).
- **Fix — `[Appx M]`:** `KnownGoodState { state_id, task_id, run_id, artifact_version_refs[], workspace_snapshot_ref, label, created_at }` + a `restore` command in the command registry (D-14) with an idempotency key.

### G-05 — [new · Core / Common Contracts] `BudgetNarrative` — plain-English cost/quality account (ACCEPT)
- **Raised by:** ChatGPT (IDEA/HIGH).
- **What & why:** a human-readable account of what a run cost and what that bought — separating *logical* LLM calls from *infrastructure retries* (so retries don't look like work), local compute, external tool cost, which optional helpers were skipped, which non-degradable modes were preserved, which degraded modes were used, and the quality impact. **User-facing:** "what it cost, what was skipped, and how that affected quality."
- **Disposition:** **ACCEPT.** Consumes the shared `CostEstimate`/`TaskCostRecord` (C-07) and feeds A-02 / G-09 cost predictability.
- **Fix — `[Appx O]`:**
```ts
interface BudgetNarrative {
  budget_narrative_id: string;
  task_id: string;
  run_id: string;
  planned_estimate_ref?: StorageRef;
  actual_cost_records: TaskCostRecord[];
  summary: string;
  logical_llm_calls: number;
  infrastructure_retries: number;
  local_compute_seconds: number;
  external_tool_cost_usd?: number;
  doc24_packet_assembly_cost?: CostEstimate;
  source_research_cost?: CostEstimate;
  skipped_optional_helpers: string[];
  preserved_non_degradable_modes: string[];
  degraded_modes_used: string[];
  quality_impact_summary: string;
  created_at: ISO8601;
  schema_version: "1.0";
}
```

### G-06 — [new · Common Contracts / Core] `TaskReliancePacket` — the capstone reliance artifact (ACCEPT)
- **Raised by:** ChatGPT (IDEA/CRITICAL).
- **What & why:** the cover memo that ties the whole layer together — assurance summary, unresolved limitations, the evidence package, revision-review packets, Hard-Call resolutions, policy decisions, budget narrative, and known-good states — and renders a single **`reliance_status`**: `safe_to_rely_within_scope` / `rely_with_limitations` / `not_safe_to_rely` / `human_review_required`, with an explicit `reliance_scope` and a user-visible summary. This is the artifact a high-stakes professional actually relies on. **User-facing:** "can I rely on this, within what scope, with what caveats."
- **Disposition:** **ACCEPT** — the keystone of §G; everything else in this cluster feeds it.
- **Fix — `[Appx O]`:**
```ts
interface TaskReliancePacket {
  packet_id: string;
  task_id: string;
  run_id: string;
  final_artifact_refs: StorageRef[];
  evaluation_chain_ids: string[];
  assurance_summary_ref: StorageRef;
  unresolved_limitations: EvaluationLimitationKind[]; // ← A-07 limitation taxonomy
  evidence_package_ref?: StorageRef;
  revision_review_packet_refs: StorageRef[];
  hard_call_resolution_refs: StorageRef[];
  policy_decision_refs: PolicyEvaluationRef[];
  budget_narrative_ref?: StorageRef;
  known_good_state_refs: string[];
  reliance_status: "safe_to_rely_within_scope" | "rely_with_limitations" | "not_safe_to_rely" | "human_review_required";
  reliance_scope: string;
  user_visible_summary: string;
  created_at: ISO8601;
  schema_version: "1.0";
}
```
- **Rationale:** binds the limitation taxonomy (A-07), evidence anchors (D-06), Pattern C chains (A-03), and budget (C-07) into one auditable statement. Without it, "the system did good work" is unverifiable; with it, reliance is scoped and inspectable.

### G-07 — [new · DOC20 / Core] `AttentionLedger` / `DecisionQueue` — cross-run attention surface (ACCEPT — author minimal)
- **Raised by:** ChatGPT (IDEA/HIGH).
- **What & why:** one place that surfaces everything across tasks/runs awaiting the user's attention or decision — pending Hard Calls, blocked items, contested findings, approvals. Prevents decisions from being buried inside individual runs. **User-facing:** "one inbox of everything that needs your call."
- **Disposition:** **ACCEPT.** No Appendix schema; author a minimal one for review.
- **Fix (PROPOSED — confirm shape):**
  ```ts
  interface AttentionLedgerItem {
    item_id: string;
    task_id: string;
    run_id?: string;
    attention_kind:
      | "hard_call_pending"
      | "blocked_item"
      | "contested_finding"
      | "approval_required"
      | "research_need_human";
    summary: string;
    priority: "low" | "medium" | "high" | "blocking";
    command_ref?: string;
    created_at: ISO8601;
    resolved_at?: ISO8601;
    schema_version: "1.0";
  }
  ```

## Conceptual / UX findings (Claude A1–A5)

### G-08 — [Set-wide] No first-class "task health" surface (ACCEPT) — Claude A1
- **Problem/why:** task health is fragmented across signals; there's no single surface showing progress, blockers, budget burn, and last verdict. **Disposition:** **ACCEPT** — define a `TaskHealthCard` DOC20 surface aggregating the existing signals (no new truth). Renders from run state + BudgetNarrative (G-05) + last `OutcomeEvaluationState`.

### G-09 — [Set-wide] Cost predictability asserted but not computable before a run (ACCEPT) — Claude A2
- **Problem/why:** the set claims cost predictability but nothing assembles an end-to-end forecast before running. **Disposition:** **ACCEPT** — a pre-run forecast built from the shared `CostEstimate` (C-07) summed across packets/research/LLM calls (the D-18 per-activation budget gives the sub-totals); surfaced in EvaluationContractReview (G-01) and reconciled after by BudgetNarrative (G-05).

### G-10 — [V3.3.1 §21 / FD §8] Reviewability fragmented across ≥3 surfaces (ACCEPT) — Claude A3
- **Problem/why:** review is scattered across at least three places. **Disposition:** **ACCEPT** — unify into one review experience: FindingsInbox (G-13) for triage, RevisionReviewPacket (G-02) for changes, DecisionAuditView (G-15) for the "why." One entry point, not three.
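
A minimal sketch of the pre-run forecast described in G-09, under stated assumptions. The real `CostEstimate` is defined under C-07 and is not reproduced in this section, so `CostEstimateSketch`, its `low_usd` / `expected_usd` / `high_usd` fields, the three category names, and `forecastRunCost` are all hypothetical stand-ins chosen only to show the summation into sub-totals and a total.

```typescript
// Hypothetical stand-in for the shared CostEstimate (C-07); the real shape may differ.
interface CostEstimateSketch {
  low_usd: number;
  expected_usd: number;
  high_usd: number;
}

// Hypothetical category names, taken from the G-09 wording (packets / research / LLM calls).
type ForecastCategory = "packet_assembly" | "source_research" | "llm_calls";

interface PreRunForecast {
  sub_totals: Record<ForecastCategory, CostEstimateSketch>;
  total: CostEstimateSketch;
}

const ZERO: CostEstimateSketch = { low_usd: 0, expected_usd: 0, high_usd: 0 };

// Field-wise sum of two estimates; returns a new object and mutates neither input.
function addEstimates(a: CostEstimateSketch, b: CostEstimateSketch): CostEstimateSketch {
  return {
    low_usd: a.low_usd + b.low_usd,
    expected_usd: a.expected_usd + b.expected_usd,
    high_usd: a.high_usd + b.high_usd,
  };
}

// Sums per-item estimates into per-category sub-totals, then into one run total.
function forecastRunCost(
  items: Array<{ category: ForecastCategory; estimate: CostEstimateSketch }>
): PreRunForecast {
  const sub_totals: Record<ForecastCategory, CostEstimateSketch> = {
    packet_assembly: ZERO,
    source_research: ZERO,
    llm_calls: ZERO,
  };
  for (const item of items) {
    sub_totals[item.category] = addEstimates(sub_totals[item.category], item.estimate);
  }
  const total = addEstimates(
    addEstimates(sub_totals.packet_assembly, sub_totals.source_research),
    sub_totals.llm_calls
  );
  return { sub_totals, total };
}
```

Keeping the sub-totals alongside the total is what would let a later `BudgetNarrative` (G-05) reconcile actuals category by category rather than only against one headline number.
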
### G-11 — [FD §3.4 / V3.3.1 §6.12] Over-relies on the user knowing to contest (ACCEPT) — Claude A5
- **Problem/why:** findings are defeasible (FD §3.4) but the UI doesn't *surface* contestability, so the burden is on the user to know they can push back. **Disposition:** **ACCEPT** — proactively mark contestable findings/verdicts in the UI with the contest affordance inline (ties to the defeasible-findings model).

## Proposed user surfaces (Claude S1–S5)

### G-12 — `WorkProductCertification` — the page you staple to the cover sheet (ACCEPT — highest-leverage surface) — Claude S1
- **What:** the human-facing render of `TaskReliancePacket` (G-06): reliance status, scope, limitations, evidence summary. **Disposition:** **ACCEPT.** DOC20 surface over G-06; read-only; no new truth.

### G-13 — `FindingsInbox` — cross-task review queue (ACCEPT) — Claude S2
- **What:** a queue of findings/repair instructions across tasks, filterable by severity/state/matter. **Disposition:** **ACCEPT.** Reads canonical `EvaluationFinding`s (A-01); matter-scoped per D-24.

### G-14 — `RunDiff` — compare two runs of the same task (ACCEPT) — Claude S3
- **What:** diff two runs (inputs, plan, outcomes, cost). **Disposition:** **ACCEPT.** Reads run records + BudgetNarrative (G-05); pairs with RunReplayPreview (G-16).

### G-15 — `DecisionAuditView` — "why did it decide that" (ACCEPT) — Claude S4
- **What:** renders the decision chain behind a verdict/route (which findings, which policy, which Hard Call). **Disposition:** **ACCEPT.** The UI counterpart to the Pattern C chain (A-03) and routing (A-08); renders the coordination trace.

### G-16 — `RunReplayPreview` — preview a replay before committing (ACCEPT) — Claude S5
- **What:** show what a replay would produce before applying it. **Disposition:** **ACCEPT.** Depends on the TaskReplay primitive (G-19) + KnownGoodState (G-04).
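
As a rough illustration of the "read-only; no new truth" constraint on `WorkProductCertification` (G-12), the sketch below renders a certification page purely from fields that the `TaskReliancePacket` schema (G-06) already carries. `CertificationInput` is a hypothetical subset of that packet, limitations are treated as plain strings because `EvaluationLimitationKind` (A-07) is not reproduced here, and the headline wording and the `renderCertification` name are illustrative assumptions.

```typescript
type RelianceStatus =
  | "safe_to_rely_within_scope"
  | "rely_with_limitations"
  | "not_safe_to_rely"
  | "human_review_required";

// Hypothetical subset of TaskReliancePacket (G-06) that the surface reads.
interface CertificationInput {
  packet_id: string;
  reliance_status: RelianceStatus;
  reliance_scope: string;
  unresolved_limitations: string[]; // assumption: EvaluationLimitationKind shown as strings
  user_visible_summary: string;
}

// Illustrative headline text; the source does not specify this wording.
const HEADLINES: Record<RelianceStatus, string> = {
  safe_to_rely_within_scope: "Safe to rely on within the stated scope",
  rely_with_limitations: "Rely on with the limitations listed",
  not_safe_to_rely: "Not safe to rely on",
  human_review_required: "Human review required before reliance",
};

// Pure function: formats what the packet already states and derives nothing new.
function renderCertification(packet: Readonly<CertificationInput>): string {
  const limitations =
    packet.unresolved_limitations.length === 0
      ? ["(none recorded)"]
      : packet.unresolved_limitations;
  return [
    `Reliance status: ${HEADLINES[packet.reliance_status]}`,
    `Scope: ${packet.reliance_scope}`,
    "Unresolved limitations:",
    ...limitations.map((l) => `  - ${l}`),
    `Summary: ${packet.user_visible_summary}`,
    `Source packet: ${packet.packet_id}`,
  ].join("\n");
}
```

Because the renderer takes the packet as `Readonly` and returns only a string, any change to reliance status has to happen upstream in the packet itself, which is the property G-12 asks for.
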
## Non-learning ideas

### G-17 — [Core R0.7.1 §5.1] `TaskRunFork` + `irrevocable_side_effects_at_fork` (ACCEPT — the adopted form of the declined ShadowWorkspace) — Claude §5.1
- **What & why:** fork a run from a checkpoint to explore an alternative without disturbing the original — the *real* answer to Gemini/Grok's branch-and-merge idea (§F-01), made honest by an explicit record of side effects that **cannot** be forked (an already-sent email stays sent). **Disposition:** **ACCEPT.**
- **Fix (PROPOSED, consistent with §5.1):**
  ```ts
  interface TaskRunFork {
    fork_id: string;
    parent_run_id: string;
    forked_from_checkpoint_ref: string; // KnownGoodState (G-04)
    new_run_id: string;
    fork_reason: string;
    irrevocable_side_effects_at_fork: Array<{
      side_effect_id: string;
      kind: string;
      executed_at: ISO8601;
      note: "not_reversible_in_fork";
    }>;
    created_at: ISO8601;
    schema_version: "1.0";
  }
  ```

### G-18 — [V3.3.1 / Core] `ExplanationTrace` as a first-class artifact (ACCEPT) — Grok
- **What & why:** every `CompiledRevisionStrategy`/`RevisionPlan` emits a short human-readable causal trace ("changed X because finding Y, preserving Z"). Cheap to produce, large trust gain, and feeds DecisionAuditView (G-15). **Disposition:** **ACCEPT** — a short `explanation_trace_ref` on the plan/strategy.

### G-19 — [Set-wide] `TaskReplay` primitive (ACCEPT) — Claude D11
- **What & why:** deterministic replay of a run closes the determinism story and underpins RunReplayPreview (G-16) and RunDiff (G-14). **Disposition:** **ACCEPT** — a replay primitive keyed to a run snapshot + KnownGoodState (G-04).

### G-20 — [DOC20] Unified Evaluation-Chain view (ACCEPT) — Grok
- **What & why:** render a Pattern C chain as one card — qualitative findings and the quantitative verdict side by side — instead of two disconnected envelopes. The UI counterpart to A-02/A-03/A-04. **Disposition:** **ACCEPT** — DOC20 surface over the chain registry (A-03).
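
Returning to the `TaskRunFork` shape proposed under G-17, the following sketch shows one way a fork record might be assembled so that side effects already executed by the parent run are carried forward as non-reversible. It is an illustration only: `ExecutedSideEffect`, `forkRun`, and the rule "everything executed at or before the fork time is irrevocable" are assumptions, and `ISO8601` is modelled as a plain string.

```typescript
type ISO8601 = string; // assumption: ISO 8601 timestamps carried as strings

interface IrrevocableSideEffect {
  side_effect_id: string;
  kind: string;
  executed_at: ISO8601;
  note: "not_reversible_in_fork";
}

// Mirrors the TaskRunFork shape proposed under G-17.
interface TaskRunFork {
  fork_id: string;
  parent_run_id: string;
  forked_from_checkpoint_ref: string;
  new_run_id: string;
  fork_reason: string;
  irrevocable_side_effects_at_fork: IrrevocableSideEffect[];
  created_at: ISO8601;
  schema_version: "1.0";
}

// Hypothetical log entry for a side effect the parent run has executed.
interface ExecutedSideEffect {
  side_effect_id: string;
  kind: string;
  executed_at: ISO8601;
}

// Hypothetical helper: builds the fork record and never mutates the parent's log.
function forkRun(args: {
  fork_id: string;
  parent_run_id: string;
  new_run_id: string;
  checkpoint_ref: string;
  fork_reason: string;
  forked_at: ISO8601;
  parent_side_effects: ExecutedSideEffect[];
}): TaskRunFork {
  const forkTime = Date.parse(args.forked_at);
  // Assumption: anything executed at or before the fork time stays executed in the fork.
  const carried = args.parent_side_effects
    .filter((e) => Date.parse(e.executed_at) <= forkTime)
    .map((e): IrrevocableSideEffect => ({
      side_effect_id: e.side_effect_id,
      kind: e.kind,
      executed_at: e.executed_at,
      note: "not_reversible_in_fork",
    }));
  return {
    fork_id: args.fork_id,
    parent_run_id: args.parent_run_id,
    forked_from_checkpoint_ref: args.checkpoint_ref,
    new_run_id: args.new_run_id,
    fork_reason: args.fork_reason,
    irrevocable_side_effects_at_fork: carried,
    created_at: args.forked_at,
    schema_version: "1.0",
  };
}
```
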
### G-21 — [Testing] Chaos / concurrency fixtures (ACCEPT) — ChatGPT / Claude §6.3
- **What & why:** the test harness for the §B concurrency work — storage-full, malformed LLM output, mid-run privilege change, clock skew, parallel writes to one artifact. **Disposition:** **ACCEPT** — required fixtures backing B-03/B-04/B-05/B-16 and D-01/D-24; gate the runtime fixes on them passing.

---

# §G/§A/§D partials — folded as notes (not full rows)

Covered in substance by existing rows; recorded here so they're tracked, with where each attaches:

- **`authority_basis` array can be empty while hard blockers require backed authority** (Grok §5.2) → note on **A-05**: add a non-empty constraint — a finding with `severity:"blocking"` MUST carry ≥1 `authority_basis` entry; `validation.blocking_finding_without_authority` (error).
- **`AutonomousModePolicy` as single source of truth + visible toggle/live risk score** (Grok §3.1) → note on **A-16 / G-08**: Claude D9 already confirms the *locked fields* are correct-by-construction; add the UX requirement to surface `AutonomousModePolicy` as a visible toggle with a live risk score, and the invariant that no mutation path bypasses it (reinforces A-16).
- **Consumption-receipt → `RepairCycleSignal` linkage is optional/never mandated** (Grok §4.6) → note on **D-23**: make the `consumption_receipt_ref → RepairCycleSignal` linkage field **required**, not optional; `validation.repair_cycle_signal_without_consumption_link` (error).
- **OP-A rows claim the same primitive under different names** (Grok §4.4) → note on **A-15 / supersession (§8)**: the supersession matrix + TypeOwnerRegistry already force one canonical name per primitive; add an OP-A dedup pass as a build-ready gate item.
- **Mandatory Plan-Verifier for high-risk plans** (Grok top-5) → note on **D-15**: D-15 adds the `plan_verifier` coordination point; add the *policy* that a `plan_verifier` sub-agent is **required** (not optional) when `plan_risk_score ≥ threshold`.
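
Three of the notes above reduce to simple checks, sketched here under stated assumptions. The two validation codes are the ones named in the notes; everything else (`FindingSketch`, `RepairCycleSignalSketch`, `ValidationIssue`, the function names, and the idea that the threshold is passed in by the caller) is hypothetical, since the canonical `EvaluationFinding` (A-01) and `RepairCycleSignal` shapes are not reproduced in this section.

```typescript
interface ValidationIssue {
  code: string;
  severity: "error";
  subject_id: string;
}

// Hypothetical minimal shapes; the canonical types live in the operative specs.
interface FindingSketch {
  finding_id: string;
  severity: string;
  authority_basis: string[];
}

interface RepairCycleSignalSketch {
  signal_id: string;
  consumption_receipt_ref?: string; // the D-23 note would make this required
}

// A-05 note: a blocking finding must carry at least one authority_basis entry.
function validateFinding(f: FindingSketch): ValidationIssue[] {
  if (f.severity === "blocking" && f.authority_basis.length === 0) {
    return [{
      code: "validation.blocking_finding_without_authority",
      severity: "error",
      subject_id: f.finding_id,
    }];
  }
  return [];
}

// D-23 note: a RepairCycleSignal must link back to a consumption receipt.
function validateRepairCycleSignal(s: RepairCycleSignalSketch): ValidationIssue[] {
  if (!s.consumption_receipt_ref) {
    return [{
      code: "validation.repair_cycle_signal_without_consumption_link",
      severity: "error",
      subject_id: s.signal_id,
    }];
  }
  return [];
}

// D-15 note: plan_verifier is required when plan_risk_score >= threshold.
// The threshold value is not given in the source, so the caller supplies it.
function planVerifierRequired(planRiskScore: number, threshold: number): boolean {
  return planRiskScore >= threshold;
}
```

Returning a list of issues rather than throwing lets a caller collect every violation for a row in one pass, which suits a build-ready gate that reports all failures together.
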
---

*Consolidated from working cards V0.1–V0.6. 126 items adjudicated from ~228 raw reviewer assertions: 88 fixes (§A–§D), 13 held for Phase B (§E), 4 declined per review §6.2 (§F), 21 product surfaces (§G), plus 5 partials folded as notes. Fixes carry the Appendix A–P schemas inline. `⚠verify` items deferred for current-state confirmation. Next step after your decisions: the R0.4 amendment package applying the accepted rows to the operative specs.*