ELNOR REPO READER TEXT MIRROR
Original path: Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/Test-set Card V2 Red Team Responses/DOC23_ADDENDA_B_TEST_SET_ADJUDICATION_CARD_V2.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z

---

# DOC23 Addenda B -- Test-Set (Purpose-Audit) Adjudication Card -- V2

**Supersedes (active use):** `DOC23_ADDENDA_B_TEST_SET_ADJUDICATION_CARD.md` (V1). V1 is retained for lineage.

**What V2 does:** folds in the three-reviewer red-team of V1 (Claude, ChatGPT, Gemini; multi-pass) per architect rulings (2026-05-31). All disposition calls below are architect-confirmed (D1-D6). No spec edits are made here; cross-doc changes are captured as OP-A candidate rows (Sec. 9), to be written through the normal OP-A / flattening process once finalized. DOC24 and the memory stack are flattening-scoped and are NOT edited here.

**Scope:** Adjudicates the experimental "Test prompt" red-team harvest (`DOC23_Add_B_TEST_Prompts_RT_1`) against the operative Addenda B set and the already-flushed comprehensive review (`RED_TEAM_DOC23_ADDENDA_B_SET_V2`). It decides (a) what in the Test set is net-new vs. already covered, and (b) what new features it warrants.

**Operative set verified (read from `main`):** Core R0.7.1, V3.3.1 (Outcome Evaluator/Revisor), Common Contracts V1.1.1, Source Workspace V1.0.1, Task Forum + Run Board V1.0.1, Feedback Delivery V1.0.1, base R3.1.

**Typing:** `BUG` - `GAP` - `SUGGESTION/IDEA` - `BETTER_IDEA` - `CONFIRMED`; dispositions `ADOPT` / `ADOPT (re-scoped)` / `FOLD-INTO-R0.4` / `DISCUSS` / `DUP` / `DECLINED-OVERLAP` / `REDIRECT-OP-A`.

### What changed from V1 (summary)

- **Re-weight:** AB-T-03 (ambiguous matter-resolution hold) is now the highest-severity item on the card (silent privilege-boundary crossing). Ship it first. (All 3 reviewers.)
- **Six unifying constructs added (Sec. 2):** the scattered enforcement flags are consolidated into one **Verdict-Honesty / Clean-Verdict-Eligibility object** (P-1), one canonical **ContextBoundaryRef** isolation primitive (P-2), one **RecoveryPolicyRegistry** (P-3), a cross-cutting **gate-signal-integrity** rule (P-4), **Evidentiary Quarantine** -> DOC73 (P-5, OP-A), and an **EnforcementBadge** read-model -> DOC20 (P-6, OP-A).
- **DISCUSS -> ADOPT (reviewer consensus):** AB-T-07, AB-T-10, AB-T-11, AB-T-13, AB-T-14, AB-T-15, AB-T-16, AB-T-17. AB-T-09 -> scoped ADOPT (admission policy, not orchestrator). AB-T-12 -> lightweight ADOPT. AB-T-08 stays DISCUSS (light rationale only; no hard gate).
- **Re-scopes (accuracy):** AB-T-15 (enforcement exists; net-new = freshness *detection*), AB-T-17 (net-new = a reference-aware *pin*, not new retention), AB-T-13 (storage half largely covered; net-new = taxonomy extension + privilege-reclassification cascade), AB-T-06 (runtime per-activation only), AB-T-14 (reuse the existing `task_agent_fallback_policy`).
- **Interaction bugs in the adopted set (Sec. 4):** 8 defects that appear only when the adopted items run together (Claude pass 2).
- **Gemini systemic bugs folded in (Sec. 5):** BUG-01..04, with net-new-vs-covered triage.
- **New item (Sec. 6):** AB-T-HYDR -- task-path injection precedence ("memory hydration" at task start), as a tracked obligation with an open placement question for reviewers.
- **Three architect questions resolved (Sec. 7).**

---

## 0. Framing -- one idea became several axes

V1 ran on one axis: **Reporting -> Enforcement**. The comprehensive review built the layer that *describes* trust (EvidencePackage, TaskReliancePacket, BudgetNarrative, the limitation taxonomy); the Test harvest's net-new contribution is the layer that *enforces* it: machinery that refuses a clean `satisfied`/`approved`/"verified" verdict the system cannot actually back.

All three reviewers endorse that thesis but say it is understated and under-structured, and that it mis-files two clusters that are not verdict honesty at all. V2 keeps Reporting -> Enforcement as the spine and adds the axes the reviewers surfaced:

1. **Verdict honesty (the spine).** Refuse a clean verdict when the system could not affirmatively prove the claim (AB-T-01), could not check it at the requested assurance (AB-T-02), was unsure which matter it belonged to (AB-T-03), or relied on a sub-minimum source (AB-T-05). Consolidated in P-1.
2. **Admission / preflight control (ChatGPT).** Some Test findings are not about the *verdict* at the end but about refusing to *start* on a bad footing: TKP freshness (AB-T-15), task-opportunity classifier drift (AB-T-16), matter resolution (AB-T-03), and portfolio pressure (AB-T-09) are preflight/admission checks, not verdict reporting.
3. **Portfolio-scale boundary / isolation (Claude + ChatGPT + Gemini).** The scaling and isolation cluster (AB-T-09/10/11) is not verdict honesty; it asks "which boundary owns this work, this log, this budget, this learned behavior?" Consolidated in P-2 (`ContextBoundaryRef`; canonical unit = matter).
4. **Substrate durability / freshness (Claude).** AB-T-15 (TKP) and AB-T-17 (snapshots) are one family: is the durable substrate a task relies on still fresh and still present?
5. **Trustworthy gate triggers -- the precondition (Claude, P-4).** Every gate above fires on a *signal*; if the producing model supplies the signal, the gate can be gamed. For high-stakes / privileged / filing-bound work, a gate trigger must have deterministic or independent-check provenance, not be self-reported. Without this, axes 1-4 are only a stricter *reporting* layer, not enforcement.

The plain-language goal is unchanged: a system that **will not misrepresent** what it did -- and (P-4) one whose refusal cannot be talked out of by the thing being judged.

---

## 1. Verified facts (checked against the operative text)

**V-1 -- `HumanGateSummary` referenced but undefined; `decision_log_required` unbacked (CONFIRMED).** `human_gate_status: HumanGateSummary[]` (Core L1228) and `human_gate_summary: HumanGateSummary` (Core L4881) reference the type, but no `interface`/`type HumanGateSummary` is defined anywhere. `decision_log_required: true` is asserted at Core L6487 with no decision-log record type. R3.1 captures the gate *outcome* only. -> AB-T-04.

**V-2 -- N:1 matter resolution has no ambiguity gate (CONFIRMED, correctness bug).** Sec. 13A.3 cond. 5 (Core L5816) admits scope context on a "high-confidence entity/matter/library match," with no behavior defined for the correctly-uncertain case. Sec. 13A.8 `EmailTriggerScopeSummary` (Core L6002-6016) carries `new_case_or_matter_candidate_refs: EntityRef[]` -- plural -- but no `confidence`, no `resolution_status`, and no "multiple candidates -> hold" rule. -> AB-T-03 (now highest severity).

**V-3 -- A staleness contract substantially EXISTS; the AB-T-15/17 "net-new" claims must be re-scoped.** Present: Sec. 5.16 `EvaluationSnapshot`; Sec. 7.13 `ArtifactMutationPrecondition`; Sec. 11.20 live-edit rolling hash; Sec. 11.21 revalidation cascade (V3.3.1 L5776); hard-call re-escalation ("Stale resolutions do not silently apply", L3895); `evaluation_snapshot_ref` REQUIRED + `validation.envelope_missing_snapshot_ref` (Common Sec. 3); `dirty`/`regressed` states (L288/L290). **Net-new residue, re-scoped (Sec. 3):** (a) read/write staleness = the spec's own F-CONCUR-02 (already in D15 scope); (b) detection vs. enforcement: for findings/run-guidance/sources, staleness is *detected* but consumption enforcement is only *advisory* -- the same Reporting -> Enforcement pattern (P-1); (c) **TKP-itself freshness detection** (AB-T-15 -- the enforcement substrate `stale_pack_behavior` + Sec.
8A.5 readiness already exists; only the freshness-*detection* policy is missing); (d) **reference-aware snapshot pinning** (AB-T-17 -- time/governance retention already exists; only a pin that survives while a live reliance packet, dirty outcome, or replay references the snapshot is missing).

**V-4 -- The proof gate enforces concepts the spec already names; the honesty objects are still proposals.** `AssuranceBasis` already includes `claim_grounded_internal`, `source_verified_external`, and `specialist_panel_judgment` (V3.3.1 L363-377); `OutcomeEvaluationState` already includes `needs_verification` (L284) and `needs_information` (L283). A presence check for `affirmative` / `absence of contradiction` / `SourceMissing` / `EvidencePackage` / `claim_support` / `TargetAssurance` / `ExecutedAssurance` / `satisfied_downgraded` / `quorum` returns no real hits. This confirms that the Sec. 3 items are net-new and that `EvidencePackage` / `claim_support_map` / `TaskReliancePacket` are still proposals (comprehensive Appendix O), not yet in the spec.

**V-5 -- Gate-signal provenance is mixed (CONFIRMED; basis for P-4).** Of the four flagship gate triggers, two are independent of the worker: AB-T-03 confidence is produced by DOC24 (independent of the drafting module), and AB-T-01 claim extraction is produced by the Addenda-A extractor. Neither can be faked by the worker. The other two, AB-T-05 `risk_class` and AB-T-02 `executed_assurance_basis`, are, as written, settable by the producing/evaluating model itself: self-reported and therefore gameable. No `GateSignalProvenance`-style construct exists in the operative set. -> P-4.

---

## 2. Unifying constructs (the synthesis the reviewers converged on)

All three reviewers independently said the enforcement facts should not live as scattered booleans that can disagree. V2 consolidates them into six named constructs, which the per-item dispositions (Sec. 3) reference.

### P-1 -- `CleanVerdictEligibility` (Verdict-Honesty object) -- unifies AB-T-01/02/03/05 + staleness

*Type: BETTER_IDEA.
Source: Claude `VerdictHonestyEnvelope` + ChatGPT `ProofAndHonestyVerdictFloor`/"single clean-verdict gate" (Issue 1) + Gemini enforcement emphasis. Disposition: ADOPT.*

One object, computed by the evaluator, aggregates every "cannot show clean" fact and applies **one precedence rule**, so the gates cannot disagree. It is carried as a single passthrough into `FeedbackFindingView` (so no state-only consumer can drop it) and plugs into the existing `EvaluationChainResolutionPolicy` (the A-05 action-gate machinery from the main card) rather than minting a parallel verdict state.

```ts
type VerdictHonestyDimension =
  | "affirmative_grounding"      // AB-T-01: every in-scope factual claim supported
  | "assurance_floor"            // AB-T-02: executed_assurance >= target (incl. quorum)
  | "matter_resolution"          // AB-T-03: matter resolved, not *_hold
  | "source_documentation_tier"  // AB-T-05: load-bearing claim's source meets min tier
  | "substrate_freshness";       // V-3(b)/(c): consumed findings/sources/TKP not stale

type DimensionStatus = "met" | "downgraded_disclosed" | "unmet";

type CleanVerdictEligibility = {
  outcome_ref: string;
  dimensions: {
    dimension: VerdictHonestyDimension;
    status: DimensionStatus;
    detail_ref?: string;                     // e.g., the offending claim / SourceRetrievalOutcome
    signal_provenance: GateSignalProvenance; // P-4 (type defined under P-4)
  }[];
  clean_verdict_allowed: boolean;            // false if ANY dimension unmet (per precedence)
  presentation_status:
    | "clean"
    | "clean_with_disclosure"   // all downgrades are policy-accepted + disclosed
    | "needs_verification"      // existing OutcomeEvaluationState; affirmative/freshness unmet
    | "needs_human_judgment";   // non-degradable assurance/matter unmet at high risk
  precedence_applied: string;   // id of the rule below
  recomputed_at: string;        // MUST be recomputed on Sec. 11.21 revalidation, never carried stale
};
```

**Precedence rule (resolves the Sec.
4 conflicts).** Evaluate the dimensions in this order; the first `unmet` dimension that is non-degradable for the outcome's risk class sets `presentation_status` and forces `clean_verdict_allowed=false`:

1. `matter_resolution` -- a hold dominates everything; you cannot grade work you cannot attribute.
2. `affirmative_grounding` -> `needs_verification`.
3. `assurance_floor` / `source_documentation_tier` -> `needs_verification`. These are degradable: `clean_with_disclosure` is allowed only if the downgrade is policy-accepted AND disclosed.
4. `substrate_freshness` -> re-validate first, then re-evaluate.

A downgrade counts as "disclosed" only if it rides the `FeedbackFindingView` passthrough.

**Lints/fixtures.** `verdict.clean_presented_with_unmet_dimension`; `verdict.downgrade_not_passed_through_to_view`; `verdict.eligibility_carried_stale_after_revalidation`; `verdict.precedence_not_applied`. Fixture: an outcome that is simultaneously `affirmative_grounding=unmet` and `assurance_floor=downgraded` resolves to `needs_verification` (grounding dominates), with both facts surfaced.

### P-2 -- `ContextBoundaryRef` (canonical isolation primitive) -- unifies AB-T-09/10/11

*Type: BETTER_IDEA. Source: ChatGPT `ContextBoundaryRef` + Claude "canonical isolation = matter" + Gemini "matter absolute". Disposition: ADOPT. Resolves architect Q2 (Sec. 7).*

One boundary primitive that every per-matter mechanism keys off. **Matter is the dominant boundary; `context_class_key` is a subordinate within-matter sub-key.** This fixes the two-isolation-units inconsistency (AB-T-11): pattern promotion (matter-scoped, Sec. 16.6.5) and suggestion/invocation learning (`context_class_key`-scoped, Sec. 9A) now resolve through the same ref, so rejecting a suggestion on 3 matters no longer suppresses it on the other 37.
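
To make the "3 vs. 37" effect concrete, here is a minimal sketch, assuming a hypothetical in-memory `SuggestionSuppressionStore` that records rejections under a composite `matter_ref` + `context_class_key` key. The store, its methods, the key format, the `"global"`/`"*"` placeholders, and the `lease_review` context class are illustrative assumptions, not spec text.

```typescript
// Hypothetical sketch (not spec text): suggestion-rejection learning keyed by
// context_class_key *under* matter_ref, so a rejection never leaks across matters.

// Minimal subset of the ContextBoundaryRef fields this sketch needs.
type BoundaryKeyFields = {
  matter_ref?: string;        // dominant boundary
  context_class_key?: string; // subordinate within-matter sub-key
};

// Composite key; "global" and "*" are assumed placeholders for absent fields.
function boundaryKey(b: BoundaryKeyFields): string {
  return `${b.matter_ref ?? "global"}::${b.context_class_key ?? "*"}`;
}

class SuggestionSuppressionStore {
  // boundary key -> ids of suggestions rejected inside that boundary
  private rejected = new Map<string, Set<string>>();

  reject(suggestionId: string, b: BoundaryKeyFields): void {
    const key = boundaryKey(b);
    const ids = this.rejected.get(key) ?? new Set<string>();
    ids.add(suggestionId);
    this.rejected.set(key, ids);
  }

  isSuppressed(suggestionId: string, b: BoundaryKeyFields): boolean {
    return this.rejected.get(boundaryKey(b))?.has(suggestionId) ?? false;
  }
}

// Usage: the same suggestion is rejected on 3 of 40 matters sharing one context class.
const store = new SuggestionSuppressionStore();
const matters = Array.from({ length: 40 }, (_, i) => `matter-${i + 1}`);
for (const m of matters.slice(0, 3)) {
  store.reject("sugg-42", { matter_ref: m, context_class_key: "lease_review" });
}
const suppressedCount = matters.filter((m) =>
  store.isSuppressed("sugg-42", { matter_ref: m, context_class_key: "lease_review" }),
).length;
console.log(suppressedCount); // 3: the other 37 matters still see the suggestion
```

If `boundaryKey` dropped `matter_ref` and keyed on `context_class_key` alone, the same three rejections would suppress the suggestion on all 40 matters, which is the behavior P-2 removes.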
```ts
type ContextBoundaryRef = {
  boundary_id: string;
  matter_ref?: string;        // dominant; absent only for explicitly cross-matter/global scope
  context_class_key?: string; // subordinate sub-key within the matter
  scope_kind: "matter" | "work_product" | "library" | "source_set" | "global";
  privilege_class?: "privileged" | "work_product" | "ordinary";
};
```

Applied to:

- AB-T-09 portfolio admission/ledger (per-`ContextBoundaryRef` fairness);
- AB-T-10 append-log partition (one physical stream per boundary, for legal hold, export, and the privilege log);
- AB-T-11 Sec. 9A learning signals (`context_class_key` keyed *under* `matter_ref`);
- P-1 `matter_resolution` (an `*_hold` yields a `PENDING_MATTER_ASSIGNMENT` boundary; see Sec. 4 bug 5).

Snapshot retention (AB-T-17) and quarantine (P-5) also carry it.

**Lints.** `boundary.matter_scoped_mechanism_keyed_without_matter_ref`; `learning.sec9a_signal_crosses_matter_without_boundary`; `appendlog.privileged_streams_share_physical_file`.

### P-3 -- `RecoveryPolicyRegistry` -- unifies AB-T-13/14

*Type: BETTER_IDEA. Source: ChatGPT. Disposition: ADOPT. Reuses the existing `task_agent_fallback_policy` (Core Sec.
6.9.1) shape; it does not invent a parallel fallback engine.*

Named, deterministic recovery policies for the failure classes that the adopted chaos fixtures already test but whose *behavior* is unspecified:

```ts
type RecoveryTrigger =
  | "malformed_loadbearing_eval_output"  // AB-T-14
  | "durable_store_exhausted"            // AB-T-13 storage half (taxonomy extension)
  | "mid_run_privilege_reclassification" // AB-T-13 privilege half (the high-severity net-new)
  | "tool_or_model_unavailable";

type RecoveryPolicy = {
  trigger: RecoveryTrigger;
  strategy:
    | "retry_alternate_model"
    | "cheap_fallback"
    | "mark_indeterminate"
    | "escalate_human"
    | "fail_closed_write"
    | "pause_and_retaint";
  retaint_emitted_artifacts?: boolean; // privilege reclass: cascade re-taint over already-produced artifacts
  rollback_side_effects?: boolean;
  records_to: "ModuleDecisionRationale" | "HardCallResolutionLedger";
};
```

The **mid-run privilege-reclassification** path is the genuine high-severity net-new (Claude split): on a reclassification event (existing, Core ~L7723), the registry triggers a re-taint cascade over artifacts already produced under the old classification, which does not happen today. The storage-full path is largely a taxonomy extension (`durable_store_exhausted` + `fail_closed_write`) atop V3.3.1 Sec. 11.17 `WorkspaceWriteFailureKind`.

**Lints.** `recovery.loadbearing_eval_malformed_without_policy`; `recovery.privilege_reclass_without_retaint_cascade`; `recovery.durable_store_exhausted_not_fail_closed`.

### P-4 -- Gate-signal integrity (cross-cutting precondition)

*Type: BETTER_IDEA (the single most important addition -- Claude). Disposition: ADOPT. Resolves architect concern D2. Basis: V-5.*

Every enforcement gate fires on a signal; if the producing/evaluating model supplies that signal, the gate launders self-assessment into a clean verdict. Tag each gate signal with provenance and forbid self-certified bypass for high-stakes work.
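
A minimal sketch of how a gate could evaluate one signal under this principle, assuming a hypothetical `effectiveStatus` helper that demotes a disallowed signal to `unmet`. The helper, the `GateSignal` shape, and the `ordinary` catch-all risk class are illustrative assumptions; the normative wording is the `no_self_certified_bypass` rule.

```typescript
// Hypothetical sketch (not spec text) of the P-4 `no_self_certified_bypass` check.
type GateSignalProvenance = "deterministic" | "independent_check" | "self_reported";
type DimensionStatus = "met" | "downgraded_disclosed" | "unmet";
// The three high-stakes classes come from P-4; "ordinary" is an assumed catch-all.
type RiskClass = "filing_bound" | "privileged_matter" | "evaluator_load_bearing" | "ordinary";

const HIGH_STAKES: ReadonlySet<RiskClass> = new Set<RiskClass>([
  "filing_bound",
  "privileged_matter",
  "evaluator_load_bearing",
]);

interface GateSignal {
  status: DimensionStatus;
  provenance: GateSignalProvenance;
}

// Returns the status the P-1 object should actually record. For high-stakes work a
// self-reported "met"/"downgraded_disclosed" cannot open the gate; this sketch demotes
// it to "unmet" so the normal precedence rule decides what happens next.
function effectiveStatus(risk: RiskClass, signal: GateSignal): DimensionStatus {
  const wouldOpenGate = signal.status !== "unmet";
  if (wouldOpenGate && HIGH_STAKES.has(risk) && signal.provenance === "self_reported") {
    return "unmet";
  }
  return signal.status;
}

// Usage
const selfReported: GateSignal = { status: "met", provenance: "self_reported" };
const traceDerived: GateSignal = { status: "met", provenance: "deterministic" };
console.log(effectiveStatus("privileged_matter", selfReported)); // unmet
console.log(effectiveStatus("privileged_matter", traceDerived)); // met
console.log(effectiveStatus("ordinary", selfReported)); // met
```

Demoting rather than throwing is one possible design choice: it keeps the outcome inside the single P-1 precedence path instead of opening a second failure route.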
```ts
type GateSignalProvenance = "deterministic" | "independent_check" | "self_reported";
```

**Rule `no_self_certified_bypass`.** For an outcome whose risk class is `filing_bound | privileged_matter | evaluator_load_bearing`, a P-1 dimension may be `met`/`downgraded_disclosed` **only** if its `signal_provenance` is `deterministic` or `independent_check`. A `self_reported` signal cannot open the gate for that class. Two concrete consequences:

1. `executed_assurance_basis` (AB-T-02) MUST be derived from the execution trace (the system reads which models/voices/checks actually ran), never accepted from the evaluating model's claim.
2. `risk_class` (AB-T-05) for high-stakes work MUST come from a deterministic/independent source (matter classification, task type), not from the producing model.

**Lints.** `gate.high_stakes_dimension_met_on_self_reported_signal`; `assurance.executed_basis_model_claimed_not_trace_derived`; `risk_class.self_reported_for_high_stakes`.

### P-5 -- Evidentiary Quarantine -> DOC73 (OP-A)

*Type: BETTER_IDEA. Source: Gemini. Disposition: ADOPT as OP-A obligation (owner DOC73). Cross-doc; not built here.*

An artifact from a run that ended with `presentation_status != clean` (downgraded / `SourceMissing` / `ambiguous_hold`) can still be picked up by background extraction (DOC73) and promoted to the Library/Corpus, so degraded work silently becomes corpus truth. Quarantine such artifacts (`quarantined_from_promotion: true`, carrying the `ContextBoundaryRef`); DOC73 is forbidden from extracting or promoting them until a human signs a `WorkProductCertification`. Tracked as Sec. 9 OP-A row `OBL-DOC73-QUARANTINE-01`.

### P-6 -- `EnforcementBadge` read-model -> DOC20 (OP-A)

*Type: BETTER_IDEA. Source: ChatGPT (badge) + ChatGPT new-idea B (grouped `EnforcementCase` explanation). Disposition: ADOPT as OP-A obligation (owner DOC20).
Cross-doc; not built here.*

One UI read-model computes the badge from `CleanVerdictEligibility` (P-1), so DOC20/21/22 do not each invent badge semantics. It surfaces the headline ("Needs verification -- 3 of 47 unproven, 1 source unavailable") and a click-through `EnforcementCase` that groups the contributing dimensions and their provenance. Tracked as Sec. 9 OP-A row `OBL-DOC20-ENFORCEMENT-BADGE-01`.

---

## 3. Per-item dispositions (updated; cross-refs to Sec. 2 packages)

Ship order (by severity): **AB-T-03** (highest -- silent privilege crossing) -> AB-T-13 privilege-reclass half -> P-4 gate-signal integrity -> P-1 (AB-T-01/02/05 as one object) -> the rest.

**AB-T-01 - `GAP` - ADOPT - Affirmative-proof verdict gate + `SourceMissing` token (architect #1).** Rolls into P-1 (`affirmative_grounding`). The mechanism is unchanged from V1: `SourceRetrievalOutcome` makes absence a signal, and `requires_affirmative_grounding` makes `satisfied` unreachable while any in-scope claim is ungrounded, forcing the existing `needs_verification`. **V2 adds:** (i) ChatGPT's "expected source" semantics -- an outcome declares its expected sources, and a missing *expected* source is `source_missing`, not a silent pass; (ii) the gate must compose with AB-T-05 via the `ADEQUATELY_GROUNDED` predicate (Sec. 4 bug 1). OP-A dependency: the `EvidencePackage`/`claim_support_map` proposal co-lands.

**AB-T-02 - `GAP` - ADOPT - Assurance / quorum / budget verdict floor.** Rolls into P-1 (`assurance_floor`). **D1/Q1 resolved: field, not state.** A structured `AssuranceExecutionRecord { target_basis[], executed_basis[], assurance_status: "met"|"downgraded_policy_accepted"|"downgraded_needs_review"|"not_met" }` (not floating booleans, not a new `satisfied_downgraded` state) is carried on the P-1 passthrough, with a per-risk-class flip rule that forces `needs_verification`/`needs_human_judgment` when a *required* basis is dropped, resolved through the existing `EvaluationChainResolutionPolicy`.
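
A minimal sketch of how the record and its flip rule could fit together. It assumes, per P-4, that `executed_basis` is derived from an execution trace and, per Sec. 4 bug 3, that `specialist_panel_judgment` counts as executed only when its quorum is met. The trace shape, the `required` list, the `policyAccepted` flag, the `ordinary` risk class, and the exact status-to-state mapping are illustrative assumptions, not spec text.

```typescript
// Hypothetical sketch (not spec text): AssuranceExecutionRecord + per-risk-class flip rule.
type AssuranceBasis =
  | "claim_grounded_internal"
  | "source_verified_external"
  | "specialist_panel_judgment";
type AssuranceStatus =
  | "met"
  | "downgraded_policy_accepted"
  | "downgraded_needs_review"
  | "not_met";
type RiskClass = "filing_bound" | "privileged_matter" | "evaluator_load_bearing" | "ordinary";
type ForcedState = "needs_verification" | "needs_human_judgment";

interface AssuranceExecutionRecord {
  target_basis: AssuranceBasis[];
  executed_basis: AssuranceBasis[];
  assurance_status: AssuranceStatus;
}

// Assumed trace shape: which checks actually ran, plus panel voices returned vs. quorum.
interface ExecutionTrace {
  ran: AssuranceBasis[];
  panel_voices_returned: number;
  panel_quorum: number;
}

// executed_basis is read from the trace, never from the evaluator's own claim (P-4).
// A panel that missed quorum does not count as executed (Sec. 4 bug 3).
function executedBasisFromTrace(t: ExecutionTrace): AssuranceBasis[] {
  return t.ran.filter(
    (b) => b !== "specialist_panel_judgment" || t.panel_voices_returned >= t.panel_quorum,
  );
}

function buildRecord(
  target: AssuranceBasis[],
  required: AssuranceBasis[],
  trace: ExecutionTrace,
  policyAccepted: boolean,
): AssuranceExecutionRecord {
  const executed = executedBasisFromTrace(trace);
  const dropped = target.filter((b) => !executed.includes(b));
  let status: AssuranceStatus;
  if (dropped.length === 0) status = "met";
  else if (dropped.some((b) => required.includes(b))) status = "not_met";
  else status = policyAccepted ? "downgraded_policy_accepted" : "downgraded_needs_review";
  return { target_basis: target, executed_basis: executed, assurance_status: status };
}

const HIGH_RISK = new Set<RiskClass>(["filing_bound", "privileged_matter", "evaluator_load_bearing"]);

// Flip rule: null means no forced state (a policy-accepted downgrade is disclosed, not blocked).
function flip(rec: AssuranceExecutionRecord, risk: RiskClass): ForcedState | null {
  if (rec.assurance_status === "not_met") {
    return HIGH_RISK.has(risk) ? "needs_human_judgment" : "needs_verification";
  }
  if (rec.assurance_status === "downgraded_needs_review") return "needs_verification";
  return null;
}

// Usage: a privileged matter targeted a specialist panel, but only 2 of 3 voices returned.
const rec = buildRecord(
  ["claim_grounded_internal", "specialist_panel_judgment"],
  ["specialist_panel_judgment"],
  { ran: ["claim_grounded_internal", "specialist_panel_judgment"], panel_voices_returned: 2, panel_quorum: 3 },
  false,
);
console.log(rec.assurance_status, flip(rec, "privileged_matter")); // not_met needs_human_judgment
```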
*Gemini dissent noted:* Gemini wanted a discrete `satisfied_downgraded` state, arguing that a boolean flag can be flown past by a DAG that routes only on state. The flip rule plus the mandatory passthrough give a DAG-routable halt for the cases that matter, without a new enum that every consumer must learn. **V2 also folds in:** quorum loss is an assurance downgrade (Sec. 4 bug 3); budget rollback preserves the evidence layer (Sec. 4 bug 4). The quorum mechanism (`RequiredQuorumManifest`) and atomic-revision rollback (`GraphStateRollback`) are unchanged from V1.

**AB-T-03 - `BUG` - ADOPT (CRITICAL; ship first) - N:1 ambiguous matter resolution -> hold gate.** Rolls into P-1 (`matter_resolution`, the dominant dimension). All three reviewers rank this highest (malpractice-grade silent privilege crossing). `MatterResolution { candidate_matter_refs; top_confidence; separation; status: "resolved"|"ambiguous_hold"|"unresolved_hold" }`. **V2 tightens (Claude schema hole):** resolve iff a single candidate >= floor, OR >=2 candidates with `separation >= threshold`; privileged triggers use a higher floor. On `*_hold`: emit a `PENDING_MATTER_ASSIGNMENT` `ContextBoundaryRef` (P-2), fail closed to privileged, quarantine, and prompt the user; never auto-bind scope/privileged context across a matter boundary (Sec. 4 bug 5). OP-A `OBL-DOC24-MATTER-CONF-01`: DOC24 supplies the confidence/separation inputs.

**AB-T-04 - `BUG` - FOLD-INTO-R0.4 - define `HumanGateSummary`; back `decision_log_required`.** Define `HumanGateSummary` + `HumanGateDecisionRecord { decider_ref; decision; rationale?; standard_applied?; shown_refs[]; weighed_refs?; decided_at; quorum_waived? }`. **V2 adds** `quorum_waived` (Sec. 4 bug 6) and requires `shown_refs` to include the refs of all displayed material. Lints: `core.referenced_type_undefined`; `gate.decision_log_required_without_record_schema`.

**AB-T-05 - `GAP` - ADOPT - Risk-based minimum source-documentation tier.** Rolls into P-1 (`source_documentation_tier`).
`MinimumDocumentationTierPolicy` maps risk class -> required tier (`filing_bound|privileged_matter|evaluator_load_bearing => tier >= 2`). **V2 strengthens (ChatGPT Issue 4):** enforce at the point where a source is *used* to support an affirmatively graded claim (source USE, not mere existence), and `risk_class` must be provenance-clean (P-4), not self-reported. Lint: `source.load_bearing_claim_supported_by_subminimum_tier`.

**AB-T-06 - `GAP` - ADOPT (re-scoped, lightweight) - runtime `ModuleDecisionRationale`.** **Re-scope:** design-time rationale already exists (`TaskModuleDesignRationaleCard`); the only net-new piece is the *runtime per-activation* rationale for ordinary `step.agent_task`/`step.red_team`/`coding` modules. It is optional and materiality-scoped (material decisions only), with no hidden chain-of-thought. Do not escalate it into `CausalProof` (AB-T-08).

**AB-T-07 - `IDEA` - ADOPT (was DISCUSS) - `CitationManifest` (write-time binding), scoped.** Reviewers agreed to adopt, **scoped** to final/filing/public/load-bearing factual text only (it is too heavy for every draft): a `factual_drafting` capability emits a `CitationManifest` that binds generated text -> `SourceRecord` + exact quote at write time. This strengthens P-1 `affirmative_grounding`; the scoping resolves the drafting-overhead tension.

**AB-T-08 - `IDEA` - DISCUSS (decline hard gate) - `CausalProof`.** All three reviewers see the highest theater risk here: an LLM asserting a rigorous causal chain does not mean one exists. Adopt at most a light `RevisionChangeRationale` field on the revision plan, with no hard escalation gate. (Partially served already by `RevisionReviewPacket.finding_to_change_map`.)

**AB-T-09 - `SUGGESTION` - scoped ADOPT (was architect-stop) - portfolio admission policy.** **Q2 resolved:** a thin governor is allowed only as admission/read-model policy over the *existing* scheduler, never as an orchestrator (Sec. 2.2-safe). `PortfolioAdmissionDecision { may_directly_mutate_task_graph: false; may_directly_dispatch_module: false; ...
}` + a `PortfolioResourceLedger`, both keyed by `ContextBoundaryRef` (P-2). Gemini's `LocalHardwareSemaphoreLease` (a physical VRAM/thermal lease) is a *complementary, distinct* primitive (physical-resource layer, likely Core-owned) flagged for later; it is not the same as cost/attention admission. The V1 constituent sub-findings (unbounded `TaskPortfolioAssessment`; single attention channel; single learning-review queue; EC write contention; parallelism ceiling; model fan-in; no portfolio cost ceiling; extraction debt; packet write amplification) remain the implementation checklist.

**AB-T-10 - `SUGGESTION` - ADOPT (was DISCUSS) - per-matter append-log partition.** Via P-2: one physical stream per `ContextBoundaryRef` for `context_feedback.jsonl` / `task_audit_events.jsonl`, supporting legal hold, export, and the privilege log. Cheap now, expensive to retrofit.

**AB-T-11 - `SUGGESTION` - ADOPT (was DISCUSS) - canonical isolation unit.** **Resolved by P-2 + Q2:** matter is dominant; Sec. 9A suggestion/invocation learning is keyed by `context_class_key` *under* `matter_ref`, ending the two-isolation-units inconsistency (rejecting a suggestion on 3 matters no longer suppresses it on the other 37).

**AB-T-12 - `SUGGESTION` - ADOPT (lightweight; was DISCUSS) - `RunOperatorContext`.** Record operator identity/authority/handoff on a run now (a minimal `RunOperatorContext`); defer full multi-actor ownership to DOC50/Sec. 20H. (ChatGPT proposed a lightweight adopt; Claude had it out of scope. This splits the difference: identity now, the full model later.)

**AB-T-13 - `GAP` - ADOPT (was DISCUSS; split) - chaos-fixture behavior contracts.** Via P-3. **Split (Claude):** the storage-full half is largely covered by V3.3.1 Sec.
11.17 `WorkspaceWriteFailureKind`, so its net-new part is only a taxonomy extension (`durable_store_exhausted` + `fail_closed_write`); the **mid-run privilege-reclassification half is the high-severity net-new**: a re-taint cascade over already-produced artifacts on a reclassification event (Core ~L7723), which does not happen today.

**AB-T-14 - `GAP` - ADOPT (was DISCUSS; re-scoped) - malformed load-bearing eval recovery.** Via P-3 (`malformed_loadbearing_eval_output`). **Re-scope:** reuse the existing `task_agent_fallback_policy` (Core Sec. 6.9.1) shape, scoped to the evaluator/judge path (the V1 anchor `SubAgentFallbackPolicy` was a proposal, not live text).

**AB-T-15 - `GAP` - ADOPT (was DISCUSS; re-scoped) - TKP freshness DETECTION.** **Re-scope (V-3c):** enforcement already exists (`stale_pack_behavior: block_graph_proposals` + Sec. 8A.5 `TaskKnowledgePackReadiness`); the only net-new piece is the freshness-*detection* policy `TaskKnowledgePackFreshnessPolicy` (TTL + invalidation triggers) that feeds the existing enforcement. This is the same obligation as the main card's `OBL-DOC24-CTXPKT-01`; cross-reference it, do not duplicate it.

**AB-T-16 - `SUGGESTION` - ADOPT (was DISCUSS) - task-opportunity classifier calibration + kill-switch.** `TaskOpportunityClassifierPolicy`: FP/FN thresholds, regression fixtures, and a kill-switch for task-routing drift, protecting the direct-first experience. (Distinct from comprehensive D12, which calibrates the *planner* threshold.)

**AB-T-17 - `GAP` - ADOPT (was DISCUSS; re-scoped) - reference-aware snapshot pin.** **Re-scope (V-3d):** time/governance retention already exists (90-day default + matter-class table); the net-new piece is a `SnapshotReferencePin` that pins a snapshot while a live reliance packet, dirty outcome, or replay references it (overriding the Sec. 16 default), carrying the `ContextBoundaryRef` + privilege constraint. It is not a new retention contract.

---

## 4. Interaction bugs in the adopted set (Claude pass 2 -- visible only when items run together)

These defects appear only once the adopted items co-exist; P-1's single object + precedence rule resolves most of them.

1. **AB-T-01 ^ AB-T-05 do not compose.** A claim marked "supported" by a tier-0 ephemeral source passes AB-T-01 but violates AB-T-05. Fix: an `ADEQUATELY_GROUNDED` predicate = `support_status==supported` AND source tier >= `MinimumDocumentationTierPolicy`; P-1 `affirmative_grounding` consumes the predicate, not raw support.
2. **AB-T-01 vs. AB-T-02 verdict precedence.** A grounding failure wants `needs_verification`; an assurance downgrade wants `satisfied` + flag. Which wins? Per the P-1 precedence rule, grounding dominates -> `needs_verification`.
3. **AB-T-02a and AB-T-02b are one mechanism.** Quorum loss IS an assurance downgrade; `specialist_panel_judgment` cannot be an `executed_basis` if quorum is unmet. Unify both under `assurance_floor` (do not maintain a separate quorum verdict path).
4. **Budget rollback discards grounded work.** `GraphStateRollback` on a budget interrupt reverts the artifact, but it must preserve the evidence/grounding layer (a `ResumeProgressSummary`) so that a resumed run does not re-prove everything.
5. **Matter limbo (AB-T-03 ^ 09 ^ 10) has no home.** `ambiguous_hold` has no resolved matter, so whose append-log and whose budget apply? Fix: the `PENDING_MATTER_ASSIGNMENT` `ContextBoundaryRef` (P-2) owns the limbo log/budget; fail closed to privileged.
6. **A quorum waiver needs a record (AB-T-02b ^ AB-T-04).** A human waiving a missing quorum must produce a `HumanGateDecisionRecord` with `quorum_waived: true`; otherwise the waiver itself is unaudited.
7. **`needs_verification` can spin (AB-T-01).** Repeated failed verification must be bounded: wire it to the `RevisorTerminationLedger` (prior round) so that bounded failed verification escalates to `needs_human_judgment` rather than looping.
8. **Stale eligibility (cross-cutting).** `assurance_downgraded` and the whole `CleanVerdictEligibility` (P-1) MUST be recomputed on the Sec. 11.21 revalidation cascade, never carried stale, and must clear if re-evaluation achieves full assurance.

---

## 5. Gemini systemic bugs (folded in per architect; whole-family re-review, triaged)

These came from Gemini's deep-dive over the entire Addenda B family rather than from the Test harvest. They are folded in per architect instruction, each triaged net-new vs. covered, with its binding site.

**BUG-01 - `BUG` - ADOPT - Quadratic token blowout (unpruned iteration context).** The revision loop ingests the full prior-iteration history on every pass (`RevisionIntelligencePacket`) with no eviction. Pass k re-reads all k-1 earlier iterations, so total ingestion over n passes grows as n(n-1)/2 iterations' worth of tokens, i.e. O(n^2), and the per-pass context eventually overflows the window. Fix: a `ContextEvictionPolicy` (sliding window: compact iterations `1..N-2` to summaries, keep the last two verbatim). Net-new (no pruning policy found at V3.3.1 Sec. 7.4/Sec. 11.20). Binds: the V3.3.1 revision loop (main-card territory; flagged for that track, recorded here).

**BUG-02 - `BUG` - ADOPT - Deterministic replay dedups stochastic retries.** A deterministic `idempotency_key` causes a stochastic retry to return the cached first-run output, defeating variant evaluation / true re-sampling. Fix: include a `stochastic_nonce`/`sampling_salt` in the key formula **iff** the capability declares itself non-deterministic. Net-new: the main card's TaskReplay/G-19 is about replay *determinism*; this is the inverse. Binds: V3.3.1 Sec. 11.8 idempotency. (Equivalent to Gemini's "Stochastic Idempotency Nonce Registry" proposal.)

**BUG-03 - `BUG` - REDIRECT-OP-A - Async context-packet race.** The `TaskRunContextPacket` budget is capped at assembly, but the forum is async and append-only; a race between budget calculation and prompt serialization can overflow or truncate the window. Fix: a `ContextSequenceLock` (freeze the forum timeline at a sequence id for the assembly).
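
For illustration only (the binding contract belongs to the main card's `OBL-DOC24-CTXPKT-01`), a minimal sketch of the sequence-lock idea: assembly pins one sequence id, and both the budget pass and the serialization pass read through it, so a late append cannot change what was budgeted. The in-memory `AppendOnlyForum`, the whitespace word count used as a token estimate, and newest-first trimming are all assumptions, not spec text.

```typescript
// Hypothetical sketch (not spec text) of a ContextSequenceLock for BUG-03.
interface ForumEntry {
  seq: number; // monotonically increasing append sequence id
  text: string;
}

class AppendOnlyForum {
  private entries: ForumEntry[] = [];
  private nextSeq = 1;

  append(text: string): number {
    const seq = this.nextSeq++;
    this.entries.push({ seq, text });
    return seq;
  }

  headSeq(): number {
    return this.nextSeq - 1;
  }

  // Only entries at or below the locked sequence id are visible.
  readUpTo(lockSeq: number): ForumEntry[] {
    return this.entries.filter((e) => e.seq <= lockSeq);
  }
}

// Crude token estimate (assumption): whitespace-delimited words.
const estimateTokens = (s: string): number => s.split(/\s+/).filter(Boolean).length;

function assemblePacket(forum: AppendOnlyForum, budgetTokens: number, concurrentAppend?: () => void) {
  const lockSeq = forum.headSeq(); // freeze the timeline once, at assembly start

  // Budget pass: keep the newest entries of the frozen prefix that fit.
  const frozen = forum.readUpTo(lockSeq);
  let used = 0;
  let startSeq = lockSeq + 1; // nothing kept yet
  for (let i = frozen.length - 1; i >= 0; i--) {
    const cost = estimateTokens(frozen[i].text);
    if (used + cost > budgetTokens) break;
    used += cost;
    startSeq = frozen[i].seq;
  }

  concurrentAppend?.(); // another writer appends while assembly is in flight

  // Serialization pass: re-read through the SAME lock, so the late append is invisible
  // and the serialized packet matches what was budgeted.
  const serialized = forum
    .readUpTo(lockSeq)
    .filter((e) => e.seq >= startSeq)
    .map((e) => e.text)
    .join("\n");
  return { lockSeq, usedTokens: used, serialized };
}

// Usage
const forum = new AppendOnlyForum();
forum.append("alpha beta");
forum.append("gamma delta epsilon");
const packet = assemblePacket(forum, 5, () => forum.append("late append arrives mid assembly"));
console.log(packet.lockSeq, packet.usedTokens); // 2 5
console.log(packet.serialized.includes("late")); // false
```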
**This is the main card's OP-A `OBL-DOC24-CTXPKT-01` (`ContextPacketFidelityContract`)**: the sequence lock is that contract. Route it there; do not duplicate it.

**BUG-04 - `GAP` - DISCUSS (security caveat) - Syntactic-taint deadlock.** A deterministic mechanical formatter run on an `external_untrusted` document yields output that stays `external_untrusted` forever, so layout can never be cleaned without quarantine. Gemini's fix bifurcates taint propagation (`semantic_ingestion` vs. `syntactic_transformation`; `deterministic_mechanical` tools preserve the payload but do not widen quarantine). **Caveat (P-4):** relaxing taint for "mechanical" tools is security-sensitive. The `deterministic_mechanical` classification must itself be provenance-clean (gate-signal integrity), or it becomes an injection-laundering path. DISCUSS in the taint-model context under that constraint.

*(Gemini's "Pre-Execution Memory Hydration Engine" proposal lands as AB-T-HYDR, Sec. 6.)*

---

## 6. AB-T-HYDR (new) - Task-path injection precedence ("memory hydration" at task start)

*Type: BETTER_IDEA (Gemini Test-set Proposal 1 + architect additions). Disposition: ADOPT as a tracked obligation. Do NOT write into DOC24 now -- DOC24 is flattening-scoped; this card holds the obligation until placement is decided.*

- **Mostly already covered (do not duplicate).** DOC23 Core Sec. 13A specifies the task-context source set (Sec. 13A.3 Task Context Isolation Invariant), the single sealed pass at run start (Sec. 13A.7 `TaskRunScopeEnvelope`), attachments + ingestion results as scope evidence (Sec. 13A.4), and eligible-category/profile rules (Sec. 13A.5-13A.6). DOC24 supplies the de-confliction primitives (Sec. 10.3A user-directive-highest; Sec. 30 factual-contradiction hedging; Sec. 27.0A budget droppability; Sec. 38 sealed assembly + manifests).
- **The genuine gap (what to build).** An explicit **instruction-precedence order** among the Sec. 13A.3 sources for conflicts that are neither factual (Sec. 30) nor budget (Sec. 27.0A): local blueprint guidance vs. matter policy vs. global DOC72 learned pattern. Proposed default, still to be developed: user directives (highest) > task initial input/objective > local blueprint guidance/RunGuidanceItems > matter/scope policy > global learned pattern; safety/policy-required items are non-droppable. It also needs an instruction-conflict strategy (mask / merge-flag / escalate, recorded in the Sec. 38 manifest) and a provenance constraint: for high-stakes/privileged/filing work, the source-class label driving precedence must be trace-derived, not self-reported (P-4 gate-signal integrity applied to injection).
- **Thin open point.** Large-attachment handling: small attachments are injected directly (Sec. 27.0A); large ones are ingested via Source Workspace/DOC25 (`SourceArtifact`/`ArtifactSegment`) and routed to the file lane (Sec. 27) or compacted (Sec. 28). Front-load extraction vs. lazy retrieval remains open. All of this sits under the matter/privilege boundary (P-2).
- **REVIEWER QUESTION (placement).** Where should this ultimately live: a DOC24 OP-A (extending `OBL-DOC24-CTXPKT-01`), a Core Sec. 13A.x subsection, or split? And is the proposed precedence order right? Carried as OP-A candidate `OBL-DOC24-TASKCTX-PRECEDENCE-01` (Sec. 9); not written into DOC24 here.

---

## 7. Architect questions resolved

- **Q1 - AB-T-02 field vs. state.** RESOLVED: **field, not state** -- a structured `AssuranceExecutionRecord` + mandatory passthrough + per-risk-class flip rule, resolved through the existing `EvaluationChainResolutionPolicy`; no new `satisfied_downgraded` state. Gemini's DAG-routing concern is met by the flip rule + passthrough. (AB-T-02; P-1.)
- **Q2 - portfolio governor + canonical isolation unit.** RESOLVED: a thin governor is allowed only as admission/read-model policy over the existing scheduler, never as an orchestrator (Sec. 2.2-safe; AB-T-09). The canonical isolation unit is **matter**, expressed as `ContextBoundaryRef` with `context_class_key` subordinate (P-2; AB-T-11); Sec. 9A learning signals are keyed under it.
Gemini's hardware lease is a separate physical-resource primitive for later.
- **Q3 - are AB-T-05/06/15 already satisfied?** RESOLVED: AB-T-05 net-new (ADOPT; bind to source USE). AB-T-06 partial (design-time exists; net-new = runtime per-activation only). AB-T-15 partial (enforcement exists; net-new = freshness *detection*). (See V-3, AB-T-05/06/15.)

---

## 8. DUP / DECLINED-OVERLAP -- do not re-open (carried from V1, still valid)

| Test finding | Disp. | Already lives in |
|---|---|---|
| Transitive taint laundering (Gemini-1 Sec. 5, Gemini-2 Sec. 4) | DUP | F-03 (inherit-or-`SanitizationNode`, L5290/L5300) + C8. *Confirm F-03 is written into the amendment.* (BUG-04 refines the mechanical-tool edge.) |
| Concurrent `ResearchNeed` (Grok-1b-a) | DUP | Sec. 6.3 `ResearchNeedLease` / D15 |
| Cancel after side-effect (Grok-1b-e) | DUP | Sec. 6.3 `TaskCancelProtocol`; Appendix M |
| "Safe to touch" idempotency should->must, concurrent-edit lock | DUP | Idempotency BUG Sec. 11.8 (L1212) + F-CONCUR-02/D15 (BUG-02 adds the stochastic-retry edge) |
| Pattern C dual-verdict resolution | DUP | C1 + Appendix E |
| V3.3 vs Source Workspace semantics | DUP | `TaskSourceWorkspace` identity-split BUG (L1540) |
| External source-query / export-open side-effect | DUP | Appendix N; `ExternalSourceQueryPolicy`/`WorkspaceExternalizationPolicy` |
| Reliance/evidence/budget/known-good/attention surfaces | DUP (proposals) | Appendix O. *Not yet in spec -- AB-T-01/02/P-1 depend on them landing.* |
| Cross-doc build-readiness | DUP | "Build-ready conflicts with pending cross-doc obligations" (L1868) |
| Packet fidelity / omission manifest | DUP | `TaskRunContextPacket needs omission manifest` (L1722) (BUG-03 sequence-lock -> OBL-DOC24-CTXPKT-01) |
| Lifecycle causal propagation / monotonic read-model | partial DUP | D-01 (L5304); D4 monotonicity (L429) |
| Model-class learning calibration gate | partial DUP (Phase-B) | D21 `cross_model_applicability` (L495) |
| Criterion semantic stability | DUP | `criterion_semantics_hash is lexical not semantic` (L1430) |
| Optional-substrate / branch-consumer preflight | partial DUP | `SubAgentFallbackPolicy` (L1894); "block/no-op/named degraded path" (L800) |
| Substantive vs process-gap classifier | DUP | A6 (L112); two-forum clarification (L489) |
| Staleness "all silent" (Grok-1a/2b) | DECLINED-OVERLAP | V-3: contract exists; residuals re-scoped into AB-T-15/17 + P-1 |
| Git-branching/ShadowWorkspace; new `TaskConfirmationSignal`; flawless-exec signal; finding-chunking | DECLINED | Comprehensive Sec. 6.2 (already adjudicated) |
| CONFIRMED: surface-context bleed closed; pattern-privilege bleed closed | CONFIRMED | Sec. 13A.2/3/7; Sec. 16.6.5 -- verified strengths |
| CONFIRMED: policy & Evaluator/Revisor decisions richly captured; trace-honesty | CONFIRMED | Sec. 12.5; AssuranceBasis/HardCallResolutionLedger; Sec. 4.7 -- reconciled in AB-T-02 |

---

## 9. Next steps + OP-A candidate rows

**Ship order:** AB-T-03 (CRITICAL) -> AB-T-13 privilege-reclass half -> P-4 gate-signal integrity -> P-1 (AB-T-01/02/05 + AB-T-07 as one proposal: "Proof & Honesty Verdict Floor") -> P-2 (`ContextBoundaryRef`, with AB-T-09/10/11) -> P-3 (AB-T-13/14) -> AB-T-15/16/17/06/12 standalone adds -> BUG-01/02 (V3.3.1 track) -> AB-T-08 / BUG-04 architect-discuss.
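The reviewer question on AB-T-HYDR (Sec. 6) asks whether the proposed precedence order is right; a small executable model makes the proposal easier to probe before it is carried as `OBL-DOC24-TASKCTX-PRECEDENCE-01`. The Python below is a minimal sketch under stated assumptions, not a DOC24 or Core Sec. 13A design: `SourceClass`, `Instruction`, `resolve`, the `topic` conflict key, and the tuple-shaped manifest entries are all hypothetical. It shows the default order, the mask / merge-flag / escalate strategy, and the P-4 rule that a self-reported source-class label cannot decide precedence on high-stakes work.

```python
from dataclasses import dataclass
from enum import IntEnum


class SourceClass(IntEnum):
    """Proposed AB-T-HYDR default order; a lower value wins (hypothetical encoding)."""
    USER_DIRECTIVE = 0       # user directives (highest)
    TASK_OBJECTIVE = 1       # task initial input / objective
    BLUEPRINT_GUIDANCE = 2   # local blueprint guidance / RunGuidanceItems
    MATTER_POLICY = 3        # matter / scope policy
    GLOBAL_LEARNED = 4       # global DOC72 learned pattern


@dataclass(frozen=True)
class Instruction:
    topic: str                     # hypothetical conflict key: same topic = conflicting
    text: str
    source_class: SourceClass
    trace_derived: bool            # label derived from the trace, not self-reported
    safety_required: bool = False  # safety/policy-required items are non-droppable


def resolve(
    instructions: list[Instruction], high_stakes: bool
) -> tuple[list[Instruction], list[tuple[str, str, str]]]:
    """Apply the precedence order per topic and record each decision
    (mask / merge-flag / escalate) as a manifest entry."""
    groups: dict[str, list[Instruction]] = {}
    for ins in instructions:
        groups.setdefault(ins.topic, []).append(ins)

    kept: list[Instruction] = []
    manifest: list[tuple[str, str, str]] = []
    for topic, group in groups.items():
        conflict = len(group) > 1
        if conflict and high_stakes and not all(i.trace_derived for i in group):
            # P-4: a self-reported label must not decide precedence; hold the whole group.
            manifest.append((topic, "escalate", "source class not trace-derived"))
            continue
        ranked = sorted(group, key=lambda i: i.source_class)  # stable for ties
        kept.append(ranked[0])
        for loser in ranked[1:]:
            if loser.safety_required:
                kept.append(loser)  # non-droppable: kept alongside the winner, flagged
                manifest.append((topic, "merge-flag", loser.source_class.name))
            else:
                manifest.append((topic, "mask", loser.source_class.name))
    return kept, manifest


if __name__ == "__main__":
    items = [
        Instruction("tone", "formal register", SourceClass.GLOBAL_LEARNED, True),
        Instruction("tone", "plain register", SourceClass.MATTER_POLICY, True),
    ]
    kept, manifest = resolve(items, high_stakes=True)
    print([k.text for k in kept], manifest)
    # ['plain register'] [('tone', 'mask', 'GLOBAL_LEARNED')]
```

Treating "same topic" as a conflict is the load-bearing simplification here; a real detector would also have to separate instruction conflicts from the factual (Sec. 30) and budget (Sec. 27.0A) cases the card routes elsewhere.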
**OP-A candidate rows (to be written through the OP-A / flattening process when finalized; nothing written to OPA/DOC24/DOC73/DOC20 here):**

| OP-A id (candidate) | Owner | Obligation summary | From |
|---|---|---|---|
| `OBL-DOC24-MATTER-CONF-01` | DOC24 | Supply confidence/separation inputs for `MatterResolution` | AB-T-03 |
| `OBL-DOC24-CTXPKT-01` (existing) | DOC24 | `TaskKnowledgePackFreshnessPolicy` + `ContextPacketFidelityContract` (sequence-lock) | AB-T-15, BUG-03 |
| `OBL-DOC73-QUARANTINE-01` | DOC73 | Quarantine non-clean-verdict artifacts from Library/Corpus promotion until `WorkProductCertification` | P-5 |
| `OBL-DOC20-ENFORCEMENT-BADGE-01` | DOC20 | `EnforcementBadge` read-model + `EnforcementCase` from `CleanVerdictEligibility` | P-6 |
| `OBL-DOC24-TASKCTX-PRECEDENCE-01` | DOC24 (placement OPEN) | Task-path injection precedence order + conflict strategy; **reviewer question: DOC24 OP-A vs Core Sec. 13A.x vs split** | AB-T-HYDR |

**Housekeeping (from V-3 / DUP):** confirm transitive-taint F-03 and read/write staleness (F-CONCUR-02) are in the amendment package, not just flagged.

---

## 10. Refreshed coverage map (V1 Sec. 6 mapping holds; V2 deltas)

Every Test finding from V1 Sec. 6 remains mapped; V2 changes the *disposition/home* as follows:

- **Verdict-honesty cluster -> P-1:** AB-T-01 (affirmative grounding), AB-T-02 (assurance floor; Gemini quorum/downgrade/budget all here), AB-T-03 (matter resolution; CRITICAL), AB-T-05 (source tier), staleness residual V-3(b).
- **Boundary cluster -> P-2 (`ContextBoundaryRef`):** AB-T-09 (admission, scoped), AB-T-10 (append-log partition), AB-T-11 (isolation unit) + all of Claude-1's scaling sub-findings.
- **Recovery cluster -> P-3:** AB-T-13 (split), AB-T-14.
- **Cross-cutting:** P-4 gate-signal integrity (AB-T-02/05 triggers); P-5 quarantine (DOC73); P-6 badge (DOC20).
- **Promoted DISCUSS->ADOPT:** AB-T-07, 10, 11, 13, 14, 15, 16, 17; AB-T-09 scoped; AB-T-12 lightweight; AB-T-08 stays DISCUSS (light).
- **Re-scoped:** AB-T-06 (runtime only), AB-T-14 (reuse fallback), AB-T-15 (detection), AB-T-17 (pin).
- **New:** AB-T-HYDR (Sec. 6); BUG-01..04 (Sec. 5); 8 interaction bugs (Sec. 4).
- **FOLD-INTO-R0.4:** AB-T-04.
- **DUP/DECLINED:** unchanged (Sec. 8).

---

*End of card V2. Inputs: Test harvest `DOC23_Add_B_TEST_Prompts_RT_1`; comprehensive review `RED_TEAM_DOC23_ADDENDA_B_SET_V2`; the three-reviewer red-team of V1 (`DOC23 Test Set Adj Card Reviews`); operative Addenda B set @ `main`. All section/line anchors verified against the operative files. Cross-doc changes are OP-A candidates only; no spec edits made.*