Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/DOC23_ADDENDA_B_TEST_SET_ADJUDICATION_CARD.md
Generated 2026-06-09T01:23:58.539Z from commit dbaa25962edc11ab30e8d4ca1715f9ae5bf77331. Worktree: clean.
# DOC23 Addenda B — Test-Set (Purpose-Audit) Adjudication Card

**Scope:** Adjudicates the experimental "Test prompt" red-team harvest (`DOC23_Add_B_TEST_Prompts_RT_1`) against the operative Addenda B set and the already-flushed comprehensive review (`RED_TEAM_DOC23_ADDENDA_B_SET_V2` + amendment appendices). Decides, per the architect's instruction, **(a) what in the Test set is net-new vs. already covered, and (b) what new features/add-ons it warrants.**

**Operative set verified (read from `main`, 2026-05-28):** Core R0.7.1, V3.3.1 (Outcome Evaluator/Revisor), Common Contracts V1.1.1, Source Workspace V1.0.1, Task Forum + Run Board V1.0.1, Feedback Delivery V1.0.1, base R3.1.

**Completeness:** Every Test-file finding (≈40, enumerated by reviewer in §6) is mapped to a disposition. Genuine net-new → §2 (ADOPT) / §3 (DISCUSS); already-covered → §4 (DUP/DECLINED) with comprehensive-review anchors; full matrix in §6.

**Status:** Draft for red-team review → revise → OP-A candidate rows. No spec edits made.

**Typing:** `BUG` · `GAP` · `SUGGESTION/IDEA` · `CONFIRMED` · disposition tags `ADOPT` / `FOLD-INTO-R0.4` / `DISCUSS` / `DUP` / `DECLINED-OVERLAP`.

---

## 0. Structural fact + the one idea

The Test harvest is the Round-1/Round-2 **purpose-audit** output that §8 of the comprehensive review set up (its reviewer grid, L767–778, maps 1:1 onto the Test sections). The review's author designed the questions and pre-built appendices (O reliance, N side-effects, taint/concurrency/cancel findings) to catch the answers, so most of the harvest **deduplicates** against existing coverage. The net-new residue is small and centers on one idea:

**Reporting → Enforcement.** The comprehensive review built the layer that *describes* trust (EvidencePackage, TaskReliancePacket, BudgetNarrative, the limitation taxonomy — observability). The Test harvest's net-new contribution is the layer that *enforces* it: machinery that prevents a clean `satisfied`/`approved`/"verified" verdict when the system (i) couldn't affirmatively prove the claim, (ii) couldn't check it at the assurance level requested, or (iii) wasn't sure which matter it belonged to. Plain-language: a system that *will not misrepresent* what it did, not merely one that describes it.

---

## 1. Verified facts (checked against the operative text, not inferred)

**V-1 — `HumanGateSummary` referenced-but-undefined; `decision_log_required` unbacked (Claude-2 — CONFIRMED).** `human_gate_status: HumanGateSummary[]` (Core **L1228**) and `human_gate_summary: HumanGateSummary` (Core **L4881**); no `interface/type HumanGateSummary` in Core/R3.1/V3.3.1/Forum. `decision_log_required: true` asserted Core **L6487** (§14.2) with no decision-log record type. R3.1 captures gate *outcome* only.

**V-2 — N:1 matter-resolution has no ambiguity gate (Claude-1 — CONFIRMED, correctness bug).** §13A.3 cond. 5 (Core **L5816**) admits scope context on a "high-confidence entity/matter/library match," with no behavior for the correctly-uncertain case. §13A.8 `EmailTriggerScopeSummary` (Core **L6002–6016**) carries `new_case_or_matter_candidate_refs: EntityRef[]` — **plural** — but no `confidence`, no `resolution_status`, no "multiple candidates → hold" rule. §13A.4 + **L6019** only bar inferring scope from *unrelated concurrent chats*.

**V-3 — Staleness contract substantially EXISTS; the Grok contradiction resolves against *both* runs (Grok-1a vs Grok-2b).**
- **Present:** §5.16 `EvaluationSnapshot`; §7.13 `ArtifactMutationPrecondition`; §11.20 live-edit rolling-hash; **§11.21 revalidation cascade** ("dependent outcomes become stale… revalidates automatically", V3.3.1 **L5776**); hard-call re-escalation ("Stale resolutions do not silently apply", **L3895**); `evaluation_snapshot_ref` REQUIRED + `validation.envelope_missing_snapshot_ref` (Common §3); `dirty`/`regressed` states (**L288/L290**).
- **Grok-1a overstated:** "single largest *silent* gap" is false; §11.21/§11.20/L3895/`dirty` are exactly the reactions it says are missing.
- **Grok-2b overstated differently:** citations §5.2/§6.2/§4.3 are fabricated section numbers; its conclusions are directionally right.
- **Two real residuals survive:** (a) read/write staleness: the spec's own **F-CONCUR-02** (V3.3.1 **L8806**, bug at **L5162**), already in comprehensive scope (D15); (b) the **detection-vs-enforcement gap**: for *findings, run-guidance, and sources*, staleness is *detected* but enforcement on consumption is *advisory* (Grok-1a items 3/5/8).
- Residual (b) is the same Reporting→Enforcement pattern as §2 and partly tracked by comprehensive D-01/defeasibility (see DUP-9).

**V-4 — The proof-gate enforces concepts the spec already names.** `AssuranceBasis` already includes `claim_grounded_internal`, `source_verified_external`, `specialist_panel_judgment` (V3.3.1 **L363–377**); `OutcomeEvaluationState` already includes `needs_verification` (**L284**), `needs_information` (**L283**). Presence check across the operative set for `affirmative`/`absence of contradiction`/`SourceMissing`/`EvidencePackage`/`claim_support`/`TargetAssurance`/`ExecutedAssurance`/`satisfied_downgraded`/`quorum`: **no real hits** — confirms §2 items net-new, and that `EvidencePackage`/`claim_support_map`/`TaskReliancePacket` are still **proposals** (comprehensive Appendix O), not yet in spec.

---

## 2. ADOPT — net-new, high value

### AB-T-01 · `GAP` → ADOPT · Affirmative-proof verdict gate + `SourceMissing` token *(architect #1)*
**Source:** Will #1; Gemini-2 §2; Gemini-1 §3.
**Problem (net-new, V-4).** An outcome reaches `satisfied` on *absence of contradiction*; a silently-failed fetch leaves an empty workspace that still reads as a pass. No affirmative claim→source grounding requirement; Source Workspace emits no positive marker for a failed/absent retrieval.
**Mechanism.** (1) Source Workspace `SourceRetrievalOutcome { status: "retrieved"|"source_missing"|"access_blocked"|"stale_only"|"ambiguous_match"; reason_code?; … }` — absence becomes signal, visible to the evaluator. (2) Outcome/criterion flag `requires_affirmative_grounding: boolean`, bound to existing `AssuranceBasis` `claim_grounded_internal`/`source_verified_external`; for such outcomes `satisfied` is unreachable unless every in-scope factual claim is `support_status = supported`; any `unsupported`/`not_checked`/`contradicted` claim, or any in-scope `source_missing`/`access_blocked`, forces the **existing** `needs_verification` (no new state — avoids enum drift). (3) Consumes `EvidencePackage.claim_support_map` (still a *proposal*; co-lands).
**Lints/fixtures.** `outcome.affirmative_grounding_required_but_satisfied_with_ungrounded_claim`; `source.retrieval_failure_not_recorded_as_outcome`; `evaluator.empty_workspace_read_as_verified`. Fixture: 404 mid-run on a fact-bound outcome → `needs_verification`, offending claim + `SourceRetrievalOutcome` surfaced.
**User/feature.** Claim-by-claim panel: green "supported" (→ exact source ¶), yellow "couldn't verify — no source," red "contradicted," gray "source failed to load"; headline "Needs verification — 3 of 47 unproven, 1 source unavailable." This is the architect's #1: the *basis* for each decision incl. the negative/absence cases.
**Dependency:** EvidencePackage proposal (Appendix O). No Phase B dependency.
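
The gate in (1) and (2) can be sketched roughly as below. This is a minimal illustration, not spec text: the status literals, `requires_affirmative_grounding`, the `support_status` values and `needs_verification` come from this card, while `source_ref`, `claim_ref`, `GateInput`, `GateResult` and `applyAffirmativeProofGate` are hypothetical names chosen for the example.

```typescript
// Status literals are taken from the card; field names marked "assumed" are not.
type RetrievalStatus =
  | "retrieved"
  | "source_missing"
  | "access_blocked"
  | "stale_only"
  | "ambiguous_match";

interface SourceRetrievalOutcome {
  source_ref: string; // assumed field name
  status: RetrievalStatus;
  reason_code?: string;
}

type SupportStatus = "supported" | "unsupported" | "not_checked" | "contradicted";

interface ClaimSupport {
  claim_ref: string; // assumed field name
  support_status: SupportStatus;
}

type GatedState = "satisfied" | "needs_verification";

// Hypothetical input bundle: only in-scope claims and retrievals are passed in.
interface GateInput {
  requires_affirmative_grounding: boolean;
  proposed_state: GatedState;
  claims: ClaimSupport[];
  retrievals: SourceRetrievalOutcome[];
}

interface GateResult {
  state: GatedState;
  offending_claims: string[];
  failed_sources: string[];
}

function applyAffirmativeProofGate(input: GateInput): GateResult {
  // Any claim that is not affirmatively supported counts against the verdict.
  const offending_claims = input.claims
    .filter((c) => c.support_status !== "supported")
    .map((c) => c.claim_ref);

  // A missing or blocked source is a positive signal, not silence.
  const failed_sources = input.retrievals
    .filter((r) => r.status === "source_missing" || r.status === "access_blocked")
    .map((r) => r.source_ref);

  const blocked =
    input.requires_affirmative_grounding &&
    (offending_claims.length > 0 || failed_sources.length > 0);

  // Reuses the existing needs_verification state; no new enum member.
  return {
    state: blocked ? "needs_verification" : input.proposed_state,
    offending_claims,
    failed_sources,
  };
}
```

The sketch deliberately leaves one case open: an outcome that requires grounding but arrives with zero recorded claims and zero retrieval outcomes. Whether that should also force `needs_verification` is a decision for the proposal, and the `source.retrieval_failure_not_recorded_as_outcome` lint exists to keep that case from arising silently.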

### AB-T-02 · `GAP` → ADOPT · Assurance / quorum / budget verdict floor
**Source:** Gemini-1 §1/§2/§4.
**Problem (net-new, V-4).** Under constraint the system does the cheaper thing and still shows green: (a) requested `specialist_panel_judgment` silently falls back to a cheaper basis → `satisfied`; (b) a forum that loses a mandated specialist tallies survivors → approved (no `quorum` concept); (c) a 5-step revision hitting `max_llm_calls` leaves a half-mutated artifact.
**Mechanism.** (2a) `target_assurance_basis` + `executed_assurance_basis` on `EvaluationResultEnvelope`; `executed < target` ⇒ `assurance_downgraded:true`, cannot be presented as a clean pass — **field + lint, not a new `satisfied_downgraded` state** (Gemini proposed a state; a flag is cleaner). (2b) `RequiredQuorumManifest` on the plan-review forum (§8.5); mandated voice absent ⇒ `quorum_satisfied:false` + `missing_required_participants` — **distinct from** the forum *deadlock* breaker (F-02 = consensus never reached; this = "consensus" reached with a required voice missing). (2c) Revision plan atomic over its `CandidateArtifactVersion`; on interrupt before all outcomes `satisfied`, `GraphStateRollback` (§11.13) reverts to `base_version` + `interrupt_reason`.
**Lints/fixtures.** `envelope.executed_below_target_assurance_presented_as_clean`; `forum.quorum_unsatisfied_returned_as_approved`; `revision.partial_plan_committed_after_interrupt`.
**User/feature.** Degraded badges ("Verified (downgraded — fast model, not the specialist)"; "Approved — specialist unavailable, 2/3"); "Revision stopped at 3/5 (budget). Reverted to last clean draft. Resume?" Matters most for unattended overnight/scheduled runs. Reconciles Claude-2's "trace-honest" CONFIRMED with Gemini-1's "verdict-dishonest": trace layer already honest (§4.7), verdict layer is what this gates.
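
A rough sketch of (2a) and (2b) as field-plus-lint checks follows. The three basis names, `target_assurance_basis`, `executed_assurance_basis`, `quorum_satisfied` and `missing_required_participants` come from this card. The numeric ranking of the bases is **not** stated anywhere in the card and is a placeholder assumption, as are the helper names `isAssuranceDowngraded`, `checkRequiredQuorum` and `mayPresentAsCleanPass`.

```typescript
// ASSUMPTION: the card writes `executed < target` but gives no ordering.
// These ranks are placeholders for illustration only.
const ASSUMED_BASIS_RANK = {
  claim_grounded_internal: 1,
  source_verified_external: 2,
  specialist_panel_judgment: 3,
} as const;

type AssuranceBasis = keyof typeof ASSUMED_BASIS_RANK;

interface AssuranceFields {
  target_assurance_basis: AssuranceBasis;
  executed_assurance_basis: AssuranceBasis;
}

// (2a) A flag on the envelope, not a new satisfied_downgraded state.
function isAssuranceDowngraded(e: AssuranceFields): boolean {
  return (
    ASSUMED_BASIS_RANK[e.executed_assurance_basis] <
    ASSUMED_BASIS_RANK[e.target_assurance_basis]
  );
}

interface QuorumCheck {
  quorum_satisfied: boolean;
  missing_required_participants: string[];
}

// (2b) Compare the mandated voices against those that actually took part.
function checkRequiredQuorum(required: string[], present: string[]): QuorumCheck {
  const presentSet = new Set(present);
  const missing = required.filter((p) => !presentSet.has(p));
  return {
    quorum_satisfied: missing.length === 0,
    missing_required_participants: missing,
  };
}

// Lint-style predicate: a clean pass requires neither flag to be raised.
function mayPresentAsCleanPass(e: AssuranceFields, q: QuorumCheck): boolean {
  return !isAssuranceDowngraded(e) && q.quorum_satisfied;
}
```

Keeping both checks as pure predicates over recorded fields matches the field-not-state call above: the verdict enum is untouched, and the presentation layer decides which degraded badge to show.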

### AB-T-03 · `BUG` → ADOPT · N:1 ambiguous matter-resolution → hold gate
**Source:** Claude-1. **Verified V-2.** Correctness/safety bug — silent, crosses a privilege boundary.
**Mechanism.** `MatterResolution { candidate_matter_refs; top_confidence; separation; status:"resolved"|"ambiguous_hold"|"unresolved_hold" }` on trigger-scope resolution (§13A.3 cond.5, §13A.8). If >1 candidate above floor with `separation < threshold`, or no candidate above floor for a privileged/matter-scoped trigger ⇒ `*_hold` ⇒ quarantine + user prompt; MUST NOT auto-bind scope/privileged context across a matter boundary on an unresolved binding. Distinct from the matter *firewall* (comprehensive C12), which governs isolation *after* assignment.
**Lint/fixture.** `task.auto_bound_matter_on_ambiguous_resolution`; `trigger.matter_resolution_missing_status`. Fixture: email from a firm in 6 matters, low separation → `ambiguous_hold`.
**User/feature.** "Needs matter assignment" holding area — "3 items couldn't be confidently matched — assign?"; nothing privileged auto-routes on a low-confidence guess.
**OP-A:** DOC24 supplies confidence/separation inputs → cross-doc obligation row required.
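
The hold rule can be illustrated with a short sketch. The `MatterResolution` fields and status literals are from this card. Everything else is assumed for the example: `MatterCandidate`, `ResolutionThresholds`, the definition of `separation` as top confidence minus runner-up confidence, and the function names. The sketch covers only privileged/matter-scoped triggers, since the card does not say what a non-privileged trigger with no candidate should do.

```typescript
interface MatterCandidate {
  matter_ref: string; // assumed field name
  confidence: number; // assumed to be supplied by DOC24, range 0..1
}

type MatterResolutionStatus = "resolved" | "ambiguous_hold" | "unresolved_hold";

interface MatterResolution {
  candidate_matter_refs: string[];
  top_confidence: number;
  separation: number;
  status: MatterResolutionStatus;
}

// Hypothetical thresholds; the real values would come from calibration.
interface ResolutionThresholds {
  floor: number;
  min_separation: number;
}

function resolveMatter(
  candidates: MatterCandidate[],
  t: ResolutionThresholds
): MatterResolution {
  const ranked = [...candidates].sort((a, b) => b.confidence - a.confidence);
  const candidate_matter_refs = ranked.map((c) => c.matter_ref);
  const aboveFloor = ranked.filter((c) => c.confidence >= t.floor);
  const top = ranked[0];
  const top_confidence = top === undefined ? 0 : top.confidence;

  // No candidate clears the floor: hold rather than guess.
  if (aboveFloor.length === 0) {
    return { candidate_matter_refs, top_confidence, separation: 0, status: "unresolved_hold" };
  }

  // ASSUMPTION: separation = gap between the best and second-best candidate;
  // with a single candidate the gap is measured against zero.
  const runnerUp = ranked[1];
  const separation =
    runnerUp === undefined ? top_confidence : top_confidence - runnerUp.confidence;

  const ambiguous = aboveFloor.length > 1 && separation < t.min_separation;
  return {
    candidate_matter_refs,
    top_confidence,
    separation,
    status: ambiguous ? "ambiguous_hold" : "resolved",
  };
}

// Scope or privileged context may bind only on a resolved status.
function mayAutoBindScope(r: MatterResolution): boolean {
  return r.status === "resolved";
}
```

Any `*_hold` status would route the item to the "Needs matter assignment" holding area instead of binding it.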

### AB-T-04 · `BUG` → FOLD-INTO-R0.4 · Define `HumanGateSummary`; back `decision_log_required`
**Source:** Claude-2. **Verified V-1.** Same class as B5/B6 phantom refs + D16 (`HumanOutcomeFeedbackEvent`). Define `HumanGateSummary` and add `HumanGateDecisionRecord { decider_ref; decision; rationale?; standard_applied?; shown_refs[]; weighed_refs?; decided_at }` to discharge L6487. Lint `core.referenced_type_undefined`; `gate.decision_log_required_without_record_schema`. User/feature: sign-offs (who/why/against-what-standard) become auditable.
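
As a sketch of the record shape, assuming plain string refs and an ISO-8601 timestamp (neither is specified here): the field names of `HumanGateDecisionRecord` and `decision_log_required` are from this card, while `GateUnderLint` and `findGatesMissingDecisionRecord` are hypothetical. Note that the card's lint `gate.decision_log_required_without_record_schema` is schema-level; the check below is an instance-level companion added only for illustration.

```typescript
interface HumanGateDecisionRecord {
  decider_ref: string;
  decision: string; // value set is not specified in the card
  rationale?: string;
  standard_applied?: string;
  shown_refs: string[];
  weighed_refs?: string[];
  decided_at: string; // assumed ISO-8601 timestamp
}

// Hypothetical view of a gate for linting purposes.
interface GateUnderLint {
  gate_id: string;
  decision_log_required: boolean;
  decision_record?: HumanGateDecisionRecord;
}

// Returns the ids of gates that demand a decision log but carry no record.
function findGatesMissingDecisionRecord(gates: GateUnderLint[]): string[] {
  return gates
    .filter((g) => g.decision_log_required && g.decision_record === undefined)
    .map((g) => g.gate_id);
}
```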

### AB-T-05 · `GAP` → ADOPT · Risk-based minimum source-documentation tier matrix *(was missed in v1)*
**Source:** ChatGPT-2 #6. **Net-new** — the comprehensive review has only source-tier *bugs* (Tier 0 vs `SourceRecord.tier` L1552; transitions not stored L1564), not a risk-based *requirement*.
**Problem.** `documentation_mode` governs default tier, but nothing requires that source-dependent legal claims, filing-bound outputs, privileged matters, or evaluator-consumed criteria carry tier-2/3/4 (full workspace) documentation. A tier-0/1 ephemeral lookup can silently become load-bearing evidence with only a thin receipt.
**Mechanism.** A `MinimumDocumentationTierPolicy` mapping risk class → required tier (e.g., `filing_bound|privileged_matter|evaluator_load_bearing ⇒ tier ≥ 2`); enforced at the point a source is used to support an affirmatively-graded claim. Pairs directly with AB-T-01 (a claim's grounding is only as strong as its source's documentation tier).
**Lint.** `source.load_bearing_claim_supported_by_subminimum_tier`.
**User/feature.** When ELNOR uses a quick lookup to support a filing allegation, it either upgrades the source to a documented tier or flags "this support is thinly documented" — no silent reliance on an ephemeral lookup.
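
One way to picture the policy is a lookup from risk class to minimum tier, checked when a source is used to support a claim. The three risk-class names, the example floor of tier 2, and the lint id are from this card. The `SourceUse` shape, the numeric `tier` field, and the function names are assumptions for the sketch; the real tier values per class are still to be decided.

```typescript
type RiskClass = "filing_bound" | "privileged_matter" | "evaluator_load_bearing";

// The card gives "tier >= 2" as an example only; these values are placeholders.
const MINIMUM_TIER: Record<RiskClass, number> = {
  filing_bound: 2,
  privileged_matter: 2,
  evaluator_load_bearing: 2,
};

// Hypothetical record of one source being used to support a graded claim.
interface SourceUse {
  source_ref: string;
  tier: number;
  risk_classes: RiskClass[];
}

interface TierCheck {
  ok: boolean;
  required_tier: number;
  lint: string | null;
}

// The strictest applicable class wins; no risk class means no minimum.
function requiredTier(riskClasses: RiskClass[]): number {
  return riskClasses.reduce((max, rc) => Math.max(max, MINIMUM_TIER[rc]), 0);
}

function checkSourceTier(use: SourceUse): TierCheck {
  const required_tier = requiredTier(use.risk_classes);
  const ok = use.tier >= required_tier;
  return {
    ok,
    required_tier,
    lint: ok ? null : "source.load_bearing_claim_supported_by_subminimum_tier",
  };
}
```

On a failed check the caller would either upgrade the source to a documented tier or surface the "thinly documented" flag described above.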

### AB-T-06 · `GAP` → ADOPT (lightweight) · `ModuleDecisionRationale` for ordinary agent modules *(was missed in v1)*
**Source:** Claude-2 ("ordinary agent-module decisions: not captured"). **Net-new.** The Evaluator/Revisor pipeline has rich rationale capture (`AssuranceBasis`, `HardCallResolutionLedger`, `FailureKind`); ordinary `step.agent_task`/`step.red_team`/`coding` modules capture prompt + output + policy but **no structured input→output rationale** — yet most of a real run is these modules. Add an optional lightweight `ModuleDecisionRationale` (key choice + brief why) generalizing the Evaluator pattern, so a halfway run is legible without re-reading the raw output. Keep it light to avoid prompting overhead; do not over-engineer into CausalProof (AB-T-08).
**User/feature.** "Why did the drafter structure the argument this way?" answerable from recorded state, not just inferred from the prose.

---

## 3. DISCUSS — genuine but contestable; architect decision before adoption

### AB-T-07 · IDEA · DISCUSS · `CitationManifest` (write-time text→source binding)
Gemini-2 §1. Stronger than the proposed `EvidencePackage`: mandates that any module generating/mutating factual text bind it to a `SourceRecord` + exact quote *at write time* (dense citation graph), not bundle after the fact. Scope with AB-T-01. Tension: guarantee strength vs. drafting-module overhead.

### AB-T-08 · IDEA · DISCUSS (lean against hard gate) · `CausalProof` ledger on `RevisionPlan`
Gemini-2 §3. Highest theater risk (an LLM *asserting* a rigorous legal causal chain is not the same as that chain existing); partially served by `RevisionReviewPacket.finding_to_change_map` + `regression_risk_summary`. Recommend a light "change rationale" field at most, not a hard escalation gate.

### AB-T-09 · SUGGESTION → DISCUSS (architect-stop) · Portfolio resource/attention governor *(consolidates Claude-1's scaling tail)*
Claude-1's 40-matter analysis: every budget/queue is per-run/per-plan; §2.2 deliberately rejects a central orchestrator, so there is no home for a matter-fairness arbiter. A thin governor is an architect-level call (tension with §2.2). Its **constituent sub-findings** (each verified present, each line-itemed so none is lost):
- unbounded `TaskPortfolioAssessment` — `all_tasks` + flat unbounded arrays, no pagination/top-k/severity floor (§16A.2);
- single global attention channel — ambient card §3B.4 + chips §3C.8, no per-matter rollup *(attention-rollup half partly served by FindingsInbox S2 + AttentionLedger)*;
- single global learning-review queue — `TaskSystemImprovementProposal` lifecycle, no per-matter view / `support_count` triage (§20.1, §9C.3);
- EC sole-writer write contention + no cross-matter write prioritization (§2.4);
- machine-wide parallelism ceiling divided by nobody (§11.22 `LocalHardwareContext`);
- embedding/model fan-in: one Qwen pipeline, one set of Gemini credentials, one set of Claude credentials; nightly extraction lane (§10.8.2);
- no portfolio-level cost ceiling — every budget local (§17.2, DOC73 §14.7 $15/run);
- nightly batch backlog → silent stale current views / no "extraction debt" surfacing (§10.8, §10.10);
- context-packet write amplification — per-activation packets, no packet-specific compaction (§13A.9, §22.1) *(partial DUP of RunBoard CompactionPolicy L4385, which does not cover packets)*.
Recommendation: decide governor scope vs. §2.2 as one architect-stop item; the rollup half can borrow S2/AttentionLedger.

### AB-T-10 · SUGGESTION → DISCUSS · Per-matter sharding of privileged append logs
Claude-1. `context_feedback.jsonl` / `task_audit_events.jsonl` are single global streams (§22.1); privileged matter A and B share one physical file — wrong granularity for legal-hold/export/privilege-log. Cheap now, expensive to retrofit.

### AB-T-11 · SUGGESTION → DISCUSS · Two-isolation-units inconsistency *(was missed in v1)*
Claude-1. Pattern promotion is matter-scoped (§16.6.5) but suggestion/invocation learning is `context_class_key`-scoped (§9A), not matter-aware — so the system isolates learned *patterns* by matter but learns *invocation behavior* across matters; and rejecting a suggestion on 3 matters suppresses it on the other 37 (§9A.4). Invisible at one matter, inconsistent at 40. Decide the canonical isolation unit. (Correctness/consistency, not pure resource — kept separate from AB-T-09.)

### AB-T-12 · SUGGESTION → DISCUSS · `RunOperatorContext` / handoff primitive
Claude-2. No operator identity/intent on a run (`created_by` is a category; "next" is graph-derived; only `user_note_for_downstream`); multi-actor ownership is deferred to DOC50/§20H. Pull-forward decision for the architect; out of scope for the current operative set by design.

### AB-T-13 · GAP (partial) → DISCUSS · Behavior contracts behind two adopted chaos fixtures
Mid-run privilege reclassification (Grok-1b-b) and storage-full during durable write (Grok-1b-d): chaos fixtures already adopted (§6.3) and `local_resource_exhausted` is a known taxonomy fix, but the **behavioral contract** (pause? re-taint emitted artifacts? roll back side effects? fail-closed write?) is unspecified. The fixtures test behavior that doesn't yet exist.

### AB-T-14 · GAP (partial) → DISCUSS · Recovery path for malformed output on a load-bearing eval call *(was missed in v1)*
Grok-1b-c. Validation codes catch malformed envelopes (Common §9) and a chaos fixture exists, but the *recovery path* for a load-bearing evaluation call (retry different model / cheap fallback / `indeterminate` / immediate escalate) is undefined beyond the generic error ack. Overlaps `SubAgentFallbackPolicy` (comprehensive L1894) but is its own decision.

### AB-T-15 · GAP → DISCUSS · TKP freshness preflight ("don't design from stale TKP") *(was missed in v1)*
ChatGPT-2 #3. **Net-new** (zero coverage in the comprehensive review). Task Agent has a degraded mode for a stale knowledge pack but no freshness budget, invalidation trigger, or "do not design from stale TKP" preflight — so it can propose illegal graphs / stale ports from a stale substrate.

### AB-T-16 · SUGGESTION → DISCUSS · DOC24 task-opportunity classifier calibration + kill-switch *(was missed in v1)*
ChatGPT-2 #2. Comprehensive D12 calibrates the *planner* confidence threshold, not the *DOC24 task-opportunity* classifier. The missing piece is FP/FN thresholds, regression fixtures, and a kill-switch for task-routing drift that protects the direct-first experience. (Partial DUP of D12/CalibratedScore.)

### AB-T-17 · GAP (partial) → DISCUSS · Global snapshot retention/indexing contract *(was missed in v1)*
ChatGPT-2 #4. `evaluation_snapshot_ref` is required per-envelope, but there is no global retention/indexing contract guaranteeing the immutable state every producer points to still exists when replay/rollback/audit/learning needs it. (Partial DUP of the required-snapshot-ref rule.)

---

## 4. DUP / DECLINED-OVERLAP — do not re-open (with anchors)

| Test finding | Disp. | Already lives in |
|---|---|---|
| Transitive taint laundering (Gemini-1 §5, Gemini-2 §4) | DUP | F-03 (inherit-or-`SanitizationNode`, L5290/L5300) + C8 source_kind→taint. *Confirm F-03 is written into the amendment, not just flagged — re-surfaced twice.* |
| Concurrent `ResearchNeed` (Grok-1b-a) | DUP | §6.3 `ResearchNeedLease` / D15 |
| Cancel after side-effect (Grok-1b-e) | DUP | §6.3 `TaskCancelProtocol`; Appendix M |
| "Safe to touch" — idempotency should→must, concurrent-edit lock (Claude-2 #4) | DUP | Idempotency BUG §11.8 (L1212) + `Step idempotency` (L4460) + F-CONCUR-02/D15 |
| Pattern C dual-verdict resolution (ChatGPT-2 #15) | DUP | C1 + Appendix E |
| V3.3 vs Source Workspace semantics (ChatGPT-2 #5) | DUP | `TaskSourceWorkspace` identity-split BUG (L1540) |
| External source-query / export-open side-effect (ChatGPT-1 IRREV-01/02; ChatGPT-2 #7) | DUP | Appendix N; `ExternalSourceQueryPolicy`/`WorkspaceExternalizationPolicy` |
| Reliance/evidence/budget/known-good/attention surfaces (ChatGPT-2 #10 receipts incl.) | DUP (proposals) | Appendix O (`TaskReliancePacket`/`EvidencePackage`/`RevisionReviewPacket`/`BudgetNarrative`/`KnownGoodState`/`AttentionLedger`); delivery+consumption receipts (L1138/L1150). *Not yet in spec — AB-T-01/02 depend on them landing.* |
| Cross-doc build-readiness (ChatGPT-2 #1) | DUP | "Build-ready conflicts with pending cross-doc obligations" (L1868); L54/L804 |
| Packet fidelity / omission manifest (ChatGPT-2 #8) | DUP | `TaskRunContextPacket needs … omission manifest` (L1722) |
| Lifecycle causal propagation / monotonic read-model (ChatGPT-2 #9) | partial DUP | D-01 cross-run propagation (L5304); D4 monotonicity (L429) |
| Model-class learning calibration gate (ChatGPT-2 #11) | partial DUP (Phase-B) | D21 `cross_model_applicability` no runtime behavior (L495); reputation by model_class (L1906); LearningMode value set (L303) |
| Criterion semantic stability (ChatGPT-2 #12) | DUP | `criterion_semantics_hash is lexical not semantic` (L1430) |
| Optional-substrate / branch-consumer preflight (ChatGPT-2 #13) | partial DUP | `SubAgentFallbackPolicy` (L1894/L233); "block/no-op/named degraded path" (L800) |
| Substantive vs process-gap classifier (ChatGPT-2 #14) | DUP | A6 (L112); two-forum-surface clarification (L489); Grok L5646 |
| Staleness "all silent" (Grok-1a, Grok-2b) | DECLINED-OVERLAP | V-3: contract exists; residual (a) read/write = F-CONCUR-02 + D15; residual (b) advisory-enforcement → partly D-01/defeasibility, conceptually AB-T-01/02 |
| Git-branching/ShadowWorkspace; new `TaskConfirmationSignal`; flawless-exec as new signal; finding-chunking | DECLINED | Comprehensive §6.2 (already adjudicated, with replacements) |
| CONFIRMED: surface-context bleed closed; pattern-privilege bleed closed (Claude-1) | CONFIRMED | §13A.2/3/7; §16.6.5 — note as verified strengths, no action |
| CONFIRMED: policy & Evaluator/Revisor decisions richly captured; trace-honesty bright spot (Claude-2) | CONFIRMED | §12.5; AssuranceBasis/HardCallResolutionLedger; §4.7 — reconciled in AB-T-02 |

---

## 5. Recommended next steps

1. **Route this card** to ChatGPT/Grok/Gemini for confirmation (pressure-test AB-T-02's field-not-state call, AB-T-08 theater risk, AB-T-09 vs §2.2, and whether AB-T-05/06/11 are correctly net-new).
2. **Package AB-T-01 + AB-T-02 + AB-T-05 as one proposal** — "Proof & Honesty Verdict Floor" (V3.3.1 + Common Contracts + Source Workspace), co-landed with the `EvidencePackage`/`TaskReliancePacket` proposals (Appendix O). **No Phase B dependency.**
3. **AB-T-03** independently shippable; raise the DOC24 confidence/separation inputs as an OP-A candidate row.
4. **AB-T-04** folds into the R0.4 contract-hardening pass.
5. **AB-T-06 / AB-T-15 / AB-T-16** are small standalone adds; **AB-T-07–14, AB-T-17** → architect decision queue (AB-T-09 and AB-T-12 are architect-stop class).
6. **Housekeeping (from V-3 / DUP table):** confirm transitive-taint F-03 and read/write staleness (F-CONCUR-02) are in the amendment package, not just flagged.

---

## 6. Complete coverage matrix (every Test finding → disposition)

**Will intro:** evaluator ties findings to source/evidence incl. negative/absence proof → **AB-T-01**.

**Claude-1 (15):** unbounded portfolio review→AB-T-09 · global attention channel→AB-T-09(+S2 dup) · global learning queue→AB-T-09 · suppression keying→AB-T-11 · *CONFIRMED surface bleed closed*→§4 · *CONFIRMED pattern bleed closed*→§4 · **N:1 matter resolution→AB-T-03** · two isolation units→AB-T-11 · comingled logs→AB-T-10 · EC writer funnel→AB-T-09 · parallelism ceiling→AB-T-09 · embedding fan-in→AB-T-09 · no portfolio cost ceiling→AB-T-09 · nightly backlog/extraction debt→AB-T-09 · packet write amplification→AB-T-09.

**Claude-2 (handover):** ordinary-module "why"→**AB-T-06** · "next" graph-derived→AB-T-12 · *CONFIRMED policy decisions captured*→§4 · *CONFIRMED Evaluator decisions captured*→§4 · human-gate rationale/identity→**AB-T-04** · *CONFIRMED design-time why (orthogonal)*→§4 · idempotency should→must→DUP(§4) · concurrent-edit lock→DUP(§4) · live-session-gone→CONFIRMED-honest(note) · operator identity/intent→AB-T-12 · safe-to-touch→DUP(§4) · *CONFIRMED trace-honesty*→§4/AB-T-02 · intra-module discarded reasoning→folded into AB-T-06.

**Grok-1a (9) + Grok-2b (6):** EvaluationSnapshot, ArtifactMutationPrecondition, Resolved Hard Calls, Applied Patterns, hash preconditions, context packets, TKP source verifications, evaluation_snapshot_ref → **V-3 + DECLINED-OVERLAP** (contract exists). SourceFreshnessRecord, RunGuidanceItem, Defeasible Findings (advisory enforcement) → V-3 residual (b) + DUP(D-01/defeasibility).

**Grok-1b (5):** (a) ResearchNeed→DUP · (b) mid-run privilege→AB-T-13 · (c) malformed output recovery→**AB-T-14** · (d) storage-full→AB-T-13 · (e) cancel after side-effect→DUP.

**ChatGPT-1 (2 + table):** IRREV-01 external query→DUP · IRREV-02 export/open→DUP · inventory table (most actions well-guarded)→CONFIRMED.

**ChatGPT-2 (15):** #1 build-readiness→DUP · #2 DOC24 calibration/kill-switch→**AB-T-16** · #3 TKP freshness→**AB-T-15** · #4 snapshot retention/indexing→**AB-T-17** · #5 workspace semantics→DUP · #6 source-tier matrix→**AB-T-05** · #7 external-action side-effect→DUP · #8 packet omission-manifest→DUP · #9 lifecycle propagation→partial DUP · #10 receipt taxonomy→partial DUP · #11 model-class learning→partial DUP · #12 criterion semantic-hash→DUP · #13 optional-substrate preflight→partial DUP · #14 substantive/process gap→DUP · #15 Pattern C→DUP.

**Gemini-1 (5 + core fix):** quorum illusion→AB-T-02 · assurance downgrade→AB-T-02 · evidence-of-absence→AB-T-01 · budget truncation→AB-T-02 · transitive taint→DUP · outcome-state matrix→AB-T-02 (as field+lint).

**Gemini-2 (4):** CitationManifest→AB-T-07 · StrictAffirmativeProof→AB-T-01 · CausalProof→AB-T-08 · taint laundering→DUP.

---
*End of card. Inputs: Test harvest `DOC23_Add_B_TEST_Prompts_RT_1`; comprehensive review `RED_TEAM_DOC23_ADDENDA_B_SET_V2`; operative Addenda B set @ `main` 2026-05-28. All section/line anchors verified against the operative files. Every Test finding accounted for in §6.*