DOC23 Test Set Adj Card Reviews .md
Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/DOC23 Test Set Adj Card Reviews .md
ELNOR REPO READER TEXT MIRROR
Original path: Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/DOC23 Test Set Adj Card Reviews .md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z
---
# Claude:
Before I dig in, let me locate the uploaded card and confirm which spec files I have cached, since my tools this session reach the sandbox and GitHub rather than the upload mount directly.
The card itself isn't reachable from my sandbox (the upload mount isn't mounted here). Before I tell you I'm stuck, let me check whether it (or the purpose-audit reviews) was committed to the repo, and pull the spec files I don't have cached.
Found it: the card is committed at `Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/DOC23_ADDENDA_B_TEST_SET_ADJUDICATION_CARD.md`, and the basis reviews are almost certainly `DOC23 Add B TEST Prompts RT 1 (5.28).md`. Pulling the card, the basis reviews, and the spec files I don't have cached.
Card is in hand. Let me read it in full before anything else.
The card is fully read. It's a strong, well-organized card (17 AB-T items, a clean DUP table, honest about its own Grok-citation problems). Now I need to verify the load-bearing claims against the spec text, especially the three you flagged. Let me retry the CORE spec fetch (it errored) and run the verification greps.
CORE in hand. Now the verification batch against the actual spec text.
Strong confirmations. V-1 (HumanGateSummary undefined), V-2 (no matter-resolution ambiguity gate), and AB-T-05 (no risk-based tier requirement) all check out against the text. But AB-T-15 needs a closer look: there *is* a `stale_pack_behavior` field and a §8A "TKP boundary and readiness" section, which bears on the card's "net-new" claim. Let me verify the remaining load-bearing items and read those TKP sections.
Verifications are landing cleanly. Two things I want to read in full before writing: the §8A "TKP boundary and readiness" section plus `stale_pack_behavior` (to test AB-T-15's "net-new" claim; this is the one I suspect is at least partially covered), and the exact `EmailTriggerScopeSummary` / §9A fields (for precise AB-T-03 and AB-T-11 schemas).
Verification complete.
The card's spec anchors check out almost everywhere I tested (HumanGateSummary undefined, the matter-resolution gap, the AssuranceBasis/state enums, §2.2, the isolation split) — with **one overclaim I'll push back on (AB-T-15)**. Here's the review.
------
# Independent review — DOC23 Addenda B Test-Set (Purpose-Audit) Adjudication Card
*Verified against live `main` text (Core R0.7.1, V3.3.1, Common V1.1.1, SW V1.0.1, Forum V1.0.1, FD V1.0.1). The basis harvest (`DOC23 Add B TEST Prompts RT 1`) is in the repo and I pulled it; I verified the card's spec claims directly rather than trusting line anchors. "Prior review" = my consolidated review of the previous (comprehensive) adjudication round.*
## 1. Top-line verdict
The "Reporting → Enforcement" framing is **correct and is the right primary organizing idea — but it's understated, and it under-credits my own prior layer.** It's correct that AB-T-01/02 (and the staleness residual-b) are a genuine *enforcement* layer — machinery that refuses a clean verdict the system can't back — sitting on top of the *observability* layer (EvidencePackage/reliance/budget) the comprehensive review designed. That's real and valuable, and the field-not-state instinct (AB-T-02) is exactly the right instantiation. **But the framing mis-files two distinct net-new residues that aren't reporting-vs-enforcement at all:** (a) a **portfolio-scale / isolation** cluster (AB-T-09/10/11) the single-run, single-matter architecture simply doesn't have primitives for, and (b) a **substrate-durability** cluster (AB-T-15/17) about whether the things verdicts point at still exist and are fresh. Calling the residue "small" also undersells **AB-T-03**, which is the single highest-severity item on the card — a silent privilege-boundary crossing on a low-confidence matter guess is a malpractice-grade risk for your practice, not a verdict-honesty nicety. Net: the dedup work is sound and the central idea is right, but I'd re-weight (AB-T-03 first) and add a second framing axis (portfolio + substrate durability).
## 2. Per-item table
| AB-T | Verdict | One-line reason |
| ---- | -------------------------------------------- | ------------------------------------------------------------ |
| 01 | **Agree** | Net-new confirmed (no `requires_affirmative_grounding`/`SourceRetrievalOutcome` in any spec); reuse my prior `ClaimSupportStatus`/`EvidencePackageClaimEntry` for the enforcement. |
| 02 | **Agree** | Field-not-state is right — but the downgrade flag must ride the view passthrough + a per-risk-class flip rule (see Q1). Net-new confirmed (no `quorum`/`target/executed_assurance`). |
| 03 | **Agree (raise severity → highest)** | Verified: `EmailTriggerScopeSummary.new_case_or_matter_candidate_refs` is plural with no confidence/status/hold; §13A.3 cond.5 admits "high-confidence match" with no uncertain-case behavior. Privilege-boundary bug. |
| 04 | **Agree** | Verified phantom ref (Core L1228/L4881 use `HumanGateSummary`, no definition; L6487 `decision_log_required:true`, no record type). This is an *instance my prior §6.2 type-owner sweep catches* — add it to that sweep. |
| 05 | **Agree** | Verified net-new: SW has only `documentation_mode` (default/suggested tier), no risk→tier *requirement*. Generalizes my prior D-04 (tier-0 can't support claims). |
| 06 | **Partial** | Net-new for the *runtime* scope, but design-time rationale already exists (`TaskModuleDesignRationaleCard`, `design_rationale`); scope it explicitly to per-activation and reuse the card pattern. |
| 07 | **Agree (DISCUSS)** | `CitationManifest` is a stronger write-time form of my `claim_support_map`; scope with AB-T-01; real drafting-overhead tension. |
| 08 | **Agree (DISCUSS, lean against gate)** | Theater risk is real; my prior `finding_to_change_map.resolution_claim` already gives the light version. A "change rationale" field at most. |
| 09 | **Partial** | A thin governor *is* justified for the 40-matter case; the card under-commits. There's a §2.2-safe minimum form (policy + ledger over the existing EC scheduler) — see Q2. |
| 10 | **Agree (DISCUSS)** | Per-matter sharding of privileged append logs is correct and cheap now, expensive to retrofit (legal-hold/export/privilege-log granularity). |
| 11 | **Agree it's real; Disagree it should wait** | Verified: §9A signals carry `context_class_key`, no `matter_id`; firewall/patterns are matter-scoped. It's a correctness bug with a clear answer (matter canonical) resolvable now via my prior `learning_scope` — see Q2. |
| 12 | **Agree (DISCUSS)** | `RunOperatorContext` is a legitimate pull-forward; genuinely out of the current operative scope by design (DOC50/§20H). |
| 13 | **Agree (DISCUSS)** | Correct — the chaos fixtures test behavior that doesn't exist yet; the behavior contract must be written first. |
| 14 | **Agree (DISCUSS)** | Malformed-output recovery on a load-bearing eval call is its own decision; connect to my prior B-24 triggers + `SubAgentFallbackPolicy`. |
| 15 | **Partial / Disagree on "net-new"** | Overclaim. Enforcement *exists* (`stale_pack_behavior:"block_graph_proposals"` + `require_preflight_before_save` + §8A.5 `TaskKnowledgePackReadiness`). Only the freshness-*detection* policy is missing — re-scope. |
| 16 | **Agree (DISCUSS)** | DOC24 task-opportunity classifier kill-switch + FP/FN thresholds is a real gap vs D12 (which calibrates the *planner*). |
| 17 | **Agree (DISCUSS)** | Global snapshot retention/indexing is the durability backing my prior reliance-decay/read-model-invalidation needs; partial-DUP call is right. |
DUP table: agree across the board, with two notes — (i) ChatGPT-2 #9 (lifecycle propagation/monotonic read-model) is better covered by my prior **read-model invalidation** (`DerivedReadModelRecord`/`ReadModelInvalidationSpec`) than by D-01/D4 alone; cite that. (ii) The housekeeping flag is correct: transitive-taint (F-03) and read/write staleness (F-CONCUR-02) are still *proposals/flagged*, not in-spec — same "depends on the amendment landing" status as AB-T-01/02's Appendix-O dependency.
## 3. Detailed notes (flagged items, with exact insertions)
Types referenced from my prior review (`GovernanceEnvelope`, `ClaimSupportStatus`, `EvidencePackageClaimEntry`, the §6.2 type-owner sweep, read-model invalidation) are reused, not re-minted.
### AB-T-01 — Affirmative-proof gate + `SourceMissing` — ADOPT
Verified net-new (only unrelated `card_source_missing`/`goal_advancement_source_missing` tokens exist). Reuse my prior `ClaimSupportStatus`.
```ts
// Source Workspace — absence becomes a positive signal
type SourceRetrievalStatus = "retrieved"|"source_missing"|"access_blocked"|"stale_only"|"ambiguous_match"|"partial";
interface SourceRetrievalOutcome {
retrieval_id: string; requested_source_ref?: SourceRef; status: SourceRetrievalStatus;
reason_code?: "http_404"|"http_403"|"auth_failed"|"timeout"|"not_found_in_corpus"|"tool_error"|"freshness_expired"|"ambiguous_query";
attempted_at: ISO8601; workspace_id: string; governance: GovernanceEnvelope; schema_version: "1.0";
}
// V3.3.1 outcome/criterion flag — bound to existing AssuranceBasis {claim_grounded_internal|source_verified_external}
interface EvaluationOutcomeDefinition { /* …existing… */ requires_affirmative_grounding?: boolean; }
// RULE affirmative_grounding: for requires_affirmative_grounding outcomes, `satisfied` is UNREACHABLE unless
// every in-scope claim EvidencePackageClaimEntry.support_status === "supported" (my prior ClaimSupportStatus)
// AND no in-scope SourceRetrievalOutcome.status ∈ {source_missing, access_blocked, ambiguous_match};
// otherwise force the EXISTING needs_verification (no new state — correct, avoids enum drift).
// Lints: outcome.affirmative_grounding_required_but_satisfied_with_ungrounded_claim;
// source.retrieval_failure_not_recorded_as_outcome; evaluator.empty_workspace_read_as_verified
```
**Prior-review modification:** my hardened `EvidencePackage`/`G-03` was the *reporting* layer (it described `claim_support_map` + `extractor_status`). AB-T-01 is the *enforcement* binding it lacked — add `requires_affirmative_grounding` + the satisfied-unreachable rule to that proposal so the two co-land.
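The satisfied-unreachable rule reduces to a small pure function. This is a minimal sketch under stated assumptions: `ClaimEntry`, `RetrievalOutcome`, their `in_scope` flags, the two-value `Verdict`, and `applyAffirmativeGrounding` are hypothetical stand-ins for the spec types; only `"supported"`, `needs_verification`, and the `SourceRetrievalStatus` values come from the schema above.

```typescript
// Values from the SourceRetrievalStatus proposal above.
type SourceRetrievalStatus =
  | "retrieved" | "source_missing" | "access_blocked"
  | "stale_only" | "ambiguous_match" | "partial";

// Hypothetical subset: the rule above only names "supported".
type ClaimSupportStatus = "supported" | "unsupported";

// Hypothetical simplified stand-ins for EvidencePackageClaimEntry / SourceRetrievalOutcome.
interface ClaimEntry { claim_id: string; in_scope: boolean; support_status: ClaimSupportStatus; }
interface RetrievalOutcome { retrieval_id: string; in_scope: boolean; status: SourceRetrievalStatus; }

// Hypothetical two-value subset of OutcomeEvaluationState.
type Verdict = "satisfied" | "needs_verification";

// Retrieval statuses that make `satisfied` unreachable.
const BLOCKING_RETRIEVAL: ReadonlySet<SourceRetrievalStatus> = new Set<SourceRetrievalStatus>([
  "source_missing", "access_blocked", "ambiguous_match",
]);

function applyAffirmativeGrounding(
  proposed: Verdict,
  requiresAffirmativeGrounding: boolean,
  claims: ClaimEntry[],
  retrievals: RetrievalOutcome[],
): Verdict {
  // The gate only ever lowers a proposed `satisfied`; it never raises a verdict.
  if (!requiresAffirmativeGrounding || proposed !== "satisfied") return proposed;
  // With zero in-scope claims this is vacuously true, so the retrieval check
  // (and the empty-workspace lint) must catch the "nothing came back" case.
  const allSupported = claims
    .filter((c) => c.in_scope)
    .every((c) => c.support_status === "supported");
  const anyBlocking = retrievals.some((r) => r.in_scope && BLOCKING_RETRIEVAL.has(r.status));
  // Fall back to the EXISTING needs_verification state; no new enum value.
  return allSupported && !anyBlocking ? "satisfied" : "needs_verification";
}
```

A fully supported claim set still drops to `needs_verification` if any in-scope fetch came back `source_missing`, which is the "absence is not proof" behavior the card is after.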
### AB-T-02 — Assurance/quorum/budget verdict floor — ADOPT (field-not-state; see Q1 for the schema + the passthrough/flip rule).
**Prior-review modification:** `assurance_downgraded` + `executed_assurance_basis` must be added to my `FeedbackFindingView` passthrough (§6.3), exactly like the `blocking_authority_satisfied` passthrough — otherwise a state-only consumer drops the downgrade.
### AB-T-03 — N:1 ambiguous matter-resolution hold — ADOPT (highest severity)
Verified exact shape (Core §13A.8): the summary carries plural candidates and nothing else. Add:
```ts
interface MatterResolution {
candidate_matter_refs: EntityRef[]; top_confidence: number; separation: number; // top1 − top2
status: "resolved"|"ambiguous_hold"|"unresolved_hold"; resolved_matter_ref?: EntityRef; schema_version: "1.0";
}
// Add `matter_resolution: MatterResolution` to EmailTriggerScopeSummary (+ TriggerScopeSummary/ScheduledRunScopeSummary).
// RULE: bind scope/privileged context ONLY when status==="resolved".
// >1 candidate above floor with separation < threshold ⇒ ambiguous_hold;
// 0 above floor for a privileged/matter-scoped trigger ⇒ unresolved_hold; *_hold ⇒ quarantine + user prompt;
// MUST NOT auto-bind across a matter boundary on a *_hold.
// Binds to my prior GovernanceEnvelope: GovernanceEnvelope.matter_id MUST NOT be set from a non-"resolved" MatterResolution.
// Lints: task.auto_bound_matter_on_ambiguous_resolution; trigger.matter_resolution_missing_status
// OP-A: DOC24 supplies confidence/separation → cross-doc obligation row.
```
This is **distinct from and prior to** my D-24 matter firewall (which governs isolation *after* assignment). AB-T-03 is what makes a `matter_id` trustworthy in the first place. Correct, independently shippable, and I'd ship it first.
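The hold rule can be sketched as a ranking plus two threshold checks. A minimal sketch, assuming plain string refs in place of `EntityRef`, hypothetical `floor` / `minSeparation` thresholds (per OP-A, the real scores would come from DOC24), and that every trigger passed in is privileged/matter-scoped, so zero candidates above the floor always yields `unresolved_hold`. `MatterCandidate`, `resolveMatter`, and `matterIdForEnvelope` are hypothetical names.

```typescript
type MatterResolutionStatus = "resolved" | "ambiguous_hold" | "unresolved_hold";

// Hypothetical scored candidate; DOC24 would supply the confidence.
interface MatterCandidate { matter_ref: string; confidence: number; }

interface MatterResolution {
  candidate_matter_refs: string[];
  top_confidence: number;
  separation: number; // top1 − top2 (equals top1 when there is a single candidate)
  status: MatterResolutionStatus;
  resolved_matter_ref?: string;
}

function resolveMatter(candidates: MatterCandidate[], floor: number, minSeparation: number): MatterResolution {
  const ranked = candidates.slice().sort((a, b) => b.confidence - a.confidence);
  const aboveFloor = ranked.filter((c) => c.confidence >= floor);
  const top = ranked.length > 0 ? ranked[0].confidence : 0;
  const second = ranked.length > 1 ? ranked[1].confidence : 0;
  const base = {
    candidate_matter_refs: ranked.map((c) => c.matter_ref),
    top_confidence: top,
    separation: top - second,
  };
  if (aboveFloor.length === 0) return { ...base, status: "unresolved_hold" };
  if (aboveFloor.length > 1 && base.separation < minSeparation) {
    return { ...base, status: "ambiguous_hold" };
  }
  return { ...base, status: "resolved", resolved_matter_ref: aboveFloor[0].matter_ref };
}

// GovernanceEnvelope.matter_id MUST NOT be set from a non-"resolved" resolution.
function matterIdForEnvelope(r: MatterResolution): string | undefined {
  return r.status === "resolved" ? r.resolved_matter_ref : undefined;
}
```

The design point is that `matterIdForEnvelope` is the only path to a `matter_id`, so a `*_hold` can never silently bind privileged context.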
### AB-T-04 — `HumanGateSummary` / `HumanGateDecisionRecord` — FOLD-INTO-R0.4
Verified phantom ref. Same class as the prior round's B-08/D-16. Add:
```ts
interface HumanGateSummary { // discharge Core L1228/L4881
gate_id: string; gate_kind: "approval"|"review"|"hard_call"|"sign_off";
status: "pending"|"approved"|"rejected"|"deferred"|"expired";
decider_ref?: string; decided_at?: ISO8601; decision_record_ref?: StorageRef; schema_version: "1.0";
}
interface HumanGateDecisionRecord { // discharge Core L6487 decision_log_required
decision_id: string; gate_id: string; decider_ref: string;
decision: "approved"|"rejected"|"modified"|"deferred";
rationale?: string; standard_applied?: string; shown_refs: StorageRef[]; weighed_refs?: StorageRef[];
decided_at: ISO8601; governance: GovernanceEnvelope; schema_version: "1.0"; // EC-written
}
// Lints: core.referenced_type_undefined; gate.decision_log_required_without_record_schema
```
**Prior-review note:** add `HumanGateSummary` to the seed list of my **exhaustive TypeOwnerRegistry sweep (§6.2)** — that sweep is precisely what surfaces this whole class mechanically; AB-T-04 is one row of it.
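The sweep itself is mechanical once references are extracted. A minimal sketch, assuming the referenced type names have already been pulled from the spec text; `TypeOwnerRow` and `findUndefinedReferences` are hypothetical names for the registry row and the `core.referenced_type_undefined` check.

```typescript
// Hypothetical registry row: a type is "owned" once some document defines it.
interface TypeOwnerRow {
  type_name: string;
  owner_doc?: string; // e.g. "Core R0.7.1"; undefined = referenced but never defined
}

// Returns every referenced type with no defining owner, deduplicated and sorted:
// the rows that would raise core.referenced_type_undefined.
function findUndefinedReferences(referenced: string[], registry: TypeOwnerRow[]): string[] {
  const defined = new Set(
    registry.filter((row) => row.owner_doc !== undefined).map((row) => row.type_name),
  );
  const unique = Array.from(new Set(referenced));
  return unique.filter((name) => !defined.has(name)).sort();
}
```

For example, `findUndefinedReferences(["HumanGateSummary", "EvaluationSnapshot", "HumanGateSummary"], [{ type_name: "EvaluationSnapshot", owner_doc: "V3.3.1" }])` returns `["HumanGateSummary"]`: one row of the sweep, as described above.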
### AB-T-05 — Risk-based minimum source-tier matrix — ADOPT
Verified net-new. This **generalizes my prior D-04** (which only said tier-0 can't support claims):
```ts
type DocumentationRiskClass = "filing_bound"|"privileged_matter"|"evaluator_load_bearing"|"client_deliverable"|"ordinary";
interface MinimumDocumentationTierPolicy {
policy_id: string;
rules: Array<{ risk_class: DocumentationRiskClass; min_tier: 0|1|2|3|4 }>; // filing/privileged/load_bearing ⇒ ≥2
enforced_at: "source_use_for_affirmative_claim"; schema_version: "1.0";
}
// Binds to AB-T-01 (a claim's grounding strength is capped by its source's tier).
// Reconcile with prior D-04: tier-0 lookup_receipt (may_support_downstream_claims:false) can never satisfy min_tier≥1.
// Lint: source.load_bearing_claim_supported_by_subminimum_tier
```
### AB-T-06 — `ModuleDecisionRationale` — ADOPT (lightweight, scoped)
```ts
interface ModuleDecisionRationale { // RUNTIME per-activation — NOT the design-time TaskModuleDesignRationaleCard
activation_id: string; module_id: string; step_id: string;
key_choice: string; why: string; considered_alternatives?: string[];
governance: GovernanceEnvelope; schema_version: "1.0";
}
// SCOPE: design-time rationale already exists (TaskModuleDesignRationaleCard/design_rationale). This fills the
// missing RUNTIME activation rationale for ordinary step.agent_task/red_team/coding modules. Keep light (no CausalProof).
```
### AB-T-15 — TKP freshness preflight — re-scope (PARTIAL, not net-new)
**This is my one substantive disagreement with a "net-new" call.** The enforcement the card says is missing already exists: `TaskAgentEducationProfile.stale_pack_behavior: "block_graph_proposals" | "use_live_catalog_and_warn" | "degraded_lookup_only"`, plus `require_preflight_before_save`, `require_validation_before_proposal`, and §8A.5 `TaskKnowledgePackReadiness.readiness_state`. "Don't design from a stale TKP" is `stale_pack_behavior:"block_graph_proposals"`. What's genuinely missing is the **detection** half — *when* does readiness flip to stale:
```ts
interface TaskKnowledgePackFreshnessPolicy {
pack_id: string;
max_age_before_stale_ms: number; // the missing freshness BUDGET
invalidation_triggers: Array<"module_catalog_changed"|"validator_changed"|"port_contract_changed"|"spec_source_changed"|"template_set_changed"|"policy_changed">;
on_stale: "drive_existing_stale_pack_behavior"; // feeds the EXISTING enforcement; NO new enforcement path
schema_version: "1.0";
}
// RULE: readiness_state := stale when age > budget OR any trigger fires (then existing block_graph_proposals applies).
// Lint: tkp.readiness_stale_without_freshness_policy
```
**Prior-review connection:** these `invalidation_triggers` are the same family as my `ReadModelInvalidationSpec` — a stale TKP is a stale read-model. Don't mint a bespoke mechanism; see New Idea #2.
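The detection half can be sketched as a pure readiness computation that feeds the existing enforcement. A minimal sketch, assuming a hypothetical two-value `Readiness` (the real §8A.5 `readiness_state` enum is richer) and hypothetical `computeReadiness` / `graphProposalsBlocked` helpers; only the `block_graph_proposals` behavior is modeled, since the other two `stale_pack_behavior` values are outside this sketch.

```typescript
type InvalidationTrigger =
  | "module_catalog_changed" | "validator_changed" | "port_contract_changed"
  | "spec_source_changed" | "template_set_changed" | "policy_changed";

// Existing enforcement values (TaskAgentEducationProfile.stale_pack_behavior).
type StalePackBehavior = "block_graph_proposals" | "use_live_catalog_and_warn" | "degraded_lookup_only";

interface TaskKnowledgePackFreshnessPolicy {
  pack_id: string;
  max_age_before_stale_ms: number;
  invalidation_triggers: InvalidationTrigger[];
}

// Hypothetical two-value simplification of §8A.5 readiness_state.
type Readiness = "ready" | "stale";

// Detection: stale when age exceeds the budget OR any watched trigger has fired.
function computeReadiness(
  policy: TaskKnowledgePackFreshnessPolicy,
  builtAtMs: number,
  nowMs: number,
  firedTriggers: InvalidationTrigger[],
): Readiness {
  const tooOld = nowMs - builtAtMs > policy.max_age_before_stale_ms;
  const watched = new Set(policy.invalidation_triggers);
  const triggered = firedTriggers.some((t) => watched.has(t));
  return tooOld || triggered ? "stale" : "ready";
}

// Enforcement already exists; this only shows detection driving it.
function graphProposalsBlocked(readiness: Readiness, behavior: StalePackBehavior): boolean {
  return readiness === "stale" && behavior === "block_graph_proposals";
}
```

A trigger the pack's policy does not watch leaves readiness untouched, which keeps each pack's invalidation surface explicit.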
### AB-T-09 / AB-T-11 — see Q2 (governor + isolation unit).
### AB-T-13 / 14 / 16 / 17 — DISCUSS, adopt-leaning
- **AB-T-13**: write the behavior contract before the fixtures pass. Minimum: on mid-run privilege reclassification ⇒ re-taint already-emitted artifacts (reuse my prior **transitive-taint/D-01** + read-model invalidation) and pause downstream; on storage-full during a durable write ⇒ **fail-closed** (no partial durable record), surface `local_resource_exhausted`.
- **AB-T-14**: define the recovery ladder for a load-bearing eval call — `retry_alt_model → cheap_fallback → indeterminate → escalate`, reusing my prior **B-24** triggers (`capability_unavailable_or_unhealthy`, `verification_unresolvable`) and `SubAgentFallbackPolicy`.
- **AB-T-16**: a `TaskOpportunityClassifierPolicy { fp_ceiling; fn_ceiling; kill_switch: boolean; regression_fixture_refs[] }` consuming my prior `CalibratedScore`/FormulaRegistry; the kill-switch protects the direct-first experience.
- **AB-T-17**: a `SnapshotRetentionPolicy { retain_until_event_seq | retain_while_referenced_by: ReadModelKind[]; index_by: ["evaluation_snapshot_ref"] }` — this is the **durability backing** my prior reliance-decay needs (a reliance packet that points at a GC'd snapshot is worse than none). Make my `DerivedReadModelRecord.valid_until_event_seq` depend on it.
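The AB-T-14 ladder can be sketched as an ordered walk that never defaults to a pass. A minimal sketch, assuming a hypothetical `tryStep` callback that re-runs the evaluator at a given rung and reports whether its envelope parsed; `RetryRung`, `LadderResult`, and `recoverMalformedEval` are hypothetical, and the B-24 trigger mapping and `SubAgentFallbackPolicy` wiring are not modeled.

```typescript
type RetryRung = "retry_alt_model" | "cheap_fallback";

type LadderResult =
  | { kind: "recovered"; rung: RetryRung; trail: string[] }
  | { kind: "indeterminate"; escalated: true; trail: string[] };

// Walks retry_alt_model → cheap_fallback → indeterminate → escalate.
function recoverMalformedEval(tryStep: (rung: RetryRung) => boolean): LadderResult {
  const trail: string[] = [];
  const rungs: RetryRung[] = ["retry_alt_model", "cheap_fallback"];
  for (const rung of rungs) {
    trail.push(rung);
    if (tryStep(rung)) return { kind: "recovered", rung, trail };
  }
  // Ladder exhausted: record an explicit non-verdict and hand off to a human.
  trail.push("indeterminate", "escalate");
  return { kind: "indeterminate", escalated: true, trail };
}
```

The `trail` gives an auditable record of which rungs ran, so a recovered verdict is distinguishable from a first-try one.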
## 4. New ideas (sparked by these adjudications)
1. **One `VerdictHonestyEnvelope`, not four scattered flags.** AB-T-01 (grounding), AB-T-02 (assurance downgrade + quorum), AB-T-03 (matter-resolution confidence), and the staleness residual-b are all the *same* question: "may this be shown as clean?" Consolidate into one envelope on the result — `{ affirmatively_grounded: bool; assurance_downgraded: bool; quorum_satisfied: bool; matter_resolution_status; consumed_stale_inputs: bool; clean_presentation_allowed: bool }` — with one rule (`clean_presentation_allowed = all clear`) and one passthrough into `FeedbackFindingView`. This is the *enforcement* analog of my prior `GovernanceEnvelope` (one mixin instead of N ad-hoc fields), and it's the cleanest expression of the card's own "Reporting→Enforcement" thesis.
2. **A single `SubstrateFreshnessPolicy` family.** AB-T-15 (TKP), AB-T-17 (snapshots), my reliance-decay, and read-model invalidation are four instances of "durable substrate goes stale; detect + act." Define one freshness/retention/invalidation-trigger taxonomy that every durable substrate references, rather than a bespoke mechanism each. Prevents exactly the drift the prior round's type-owner sweep was built to kill.
3. **Portfolio fairness as policy-over-existing-scheduler** (the Q2 mechanism, generalized): a `ResourceFairnessPolicy` consulted at *every existing admission point* (EC scheduler, §11.22 parallelism ceiling, §2.4 write queue, model fan-in) + a `PortfolioResourceLedger` read-model — a cross-cutting policy, never a new actor. Reusable for any shared-resource arbitration without tripping §2.2.
4. **"Absence is a first-class result" as a cross-cutting primitive.** AB-T-01's `SourceRetrievalOutcome` (a positive token for a failed fetch) is one instance of a general rule: every retrieval/tool-call/evidence-check/capability-lookup must emit a positive outcome even on failure, so "nothing came back" is never indistinguishable from "didn't run." A shared `NegativeResultRecord` discipline (ties to my prior `NullResult`/B-24 work) would close a whole family of silent-empty bugs at once.
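Idea 4 can be sketched as a wrapper that turns any lookup into a positive record whatever happens. A minimal synchronous sketch; `ResultRecord` and `recordResult` are hypothetical names, not the `NegativeResultRecord` schema itself, and a real version would also carry the governance envelope and a reason code.

```typescript
type ResultRecord<T> =
  | { kind: "found"; value: T; attempted_at: string }
  | { kind: "empty"; attempted_at: string }                 // ran, nothing came back
  | { kind: "failed"; error: string; attempted_at: string }; // did not complete

// Every call yields a record, so "nothing came back" is never
// indistinguishable from "didn't run" or "crashed".
function recordResult<T>(op: () => T | undefined): ResultRecord<T> {
  const attempted_at = new Date().toISOString();
  try {
    const value = op();
    return value === undefined
      ? { kind: "empty", attempted_at }
      : { kind: "found", value, attempted_at };
  } catch (e) {
    return { kind: "failed", error: e instanceof Error ? e.message : String(e), attempted_at };
  }
}
```

Because `kind` is a discriminant, a consumer that forgets the `empty` or `failed` branch gets a type error instead of a silent empty result.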
## 5. Answers to your three questions
**Q1 — AB-T-02 field-not-state: right?** Yes — with two additions. `OutcomeEvaluationState` already carries the verdict (`satisfied`, `needs_verification`, …); "downgraded" is an orthogonal *assurance-quality* axis, so a `satisfied_downgraded` state would conflate two axes and force every state-consumer (routing matrix, transitions, UI) to learn a new state. The field is correct **iff** you add: (1) a **required passthrough** of `assurance_downgraded` + `executed_assurance_basis` onto the presentation projection (`FeedbackFindingView`) with a hard rule that downgraded ⇒ no clean badge — so the flag rides the same channel as the verdict and can't be silently dropped; and (2) a **per-risk-class flip rule** deciding when a downgrade additionally forces `needs_verification` rather than passing as `satisfied (downgraded)`:
```ts
interface EvaluationResultEnvelope { // Common §3 additions
/* …existing… */
target_assurance_basis: AssuranceBasis[]; // requested
executed_assurance_basis: AssuranceBasis[]; // achieved (both reuse the existing 14-value AssuranceBasis)
assurance_downgraded: boolean; // executed does not meet target
assurance_downgrade_explanation?: string;
}
interface AssuranceDowngradePolicy { // the flip rule
risk_class: DocumentationRiskClass;
required_bases: AssuranceBasis[]; // if a required basis was dropped → force needs_verification
tolerated_substitutions: Array<{ from: AssuranceBasis; to: AssuranceBasis }>; // else satisfied + downgraded:true
schema_version: "1.0";
}
// FeedbackFindingView (extend my prior §6.3): + assurance_downgraded: boolean; executed_assurance_basis: AssuranceBasis[];
// RULE verdict_presentation: assurance_downgraded === true ⇒ MUST NOT render "verified"/clean; render "verified (downgraded)".
// Lint: envelope.executed_below_target_assurance_presented_as_clean
```
So: a flag for the general case, governed by a policy for when it must also move the verdict — strictly better than a blanket new state.
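A minimal sketch of how the flag and the flip rule combine for an outcome whose underlying verdict is `satisfied`. It assumes `AssuranceBasis` is a plain string (the spec's 14-value enum is not reproduced); `assessAssurance` and `AssuranceAssessment` are hypothetical. One assumption goes beyond the text: a dropped basis that is neither required nor covered by a tolerated substitution also fails closed to `needs_verification`.

```typescript
type AssuranceBasis = string; // stand-in for the spec's 14-value enum

interface AssuranceDowngradePolicy {
  required_bases: AssuranceBasis[];
  tolerated_substitutions: Array<{ from: AssuranceBasis; to: AssuranceBasis }>;
}

type Presentation = "verified" | "verified (downgraded)" | "needs_verification";

interface AssuranceAssessment { assurance_downgraded: boolean; presentation: Presentation; }

function assessAssurance(
  target: AssuranceBasis[],
  executed: AssuranceBasis[],
  policy: AssuranceDowngradePolicy,
): AssuranceAssessment {
  const achieved = new Set(executed);
  const required = new Set(policy.required_bases);
  const missing = target.filter((b) => !achieved.has(b));
  if (missing.length === 0) return { assurance_downgraded: false, presentation: "verified" };
  for (const basis of missing) {
    // A required basis was dropped → force needs_verification.
    if (required.has(basis)) return { assurance_downgraded: true, presentation: "needs_verification" };
    const tolerated = policy.tolerated_substitutions.some(
      (s) => s.from === basis && achieved.has(s.to),
    );
    // ASSUMPTION: an untolerated drop fails closed, like a required one.
    if (!tolerated) return { assurance_downgraded: true, presentation: "needs_verification" };
  }
  // Downgraded but tolerated: still satisfied, never rendered as a clean badge.
  return { assurance_downgraded: true, presentation: "verified (downgraded)" };
}
```

The example values in the tests (which basis substitutes for which) are illustrative only.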
**Q2 — AB-T-09 governor vs §2.2; minimum-viable form; canonical isolation unit.** Verified §2.2 ("No central runtime orchestrator," Core L323) rejects an orchestrator that **acts on workflow** — "makes ad-hoc workflow decisions, rewires execution, or spawns sub-agents." It does **not** prohibit bounded, policy-driven admission control; it explicitly says "runtime autonomy is **bounded by**…", and an EC scheduler already exists (L1631). So a thin governor is justified and §2.2-safe **if it is a policy + accounting read-model, not an actor**:
```ts
interface PortfolioResourcePolicy {
portfolio_cost_ceiling_usd?: number; // the missing portfolio-level cap (every current budget is per-run)
per_matter_fair_share: Record<string, number>; parallelism_shares?: Record<string,number>;
write_priority_weights?: Record<string, number>; // consumed by the EXISTING EC write queue (§2.4)
schema_version: "1.0";
}
interface PortfolioResourceLedger { /* cross-matter usage accounting */ } // a DerivedReadModelRecord (my §6.7)
// Enforcement = DETERMINISTIC GATE at existing admission points (EC scheduler L1631, §11.22 ceiling, §2.4 queue):
// it may DELAY / THROTTLE / ORDER a queue; it MUST NOT rewire graphs, make ad-hoc workflow decisions, or spawn sub-agents.
// Admission control ≠ orchestration. Bounded by explicit policy (no ad-hoc decisions) + inspectable (the ledger).
```
The line that keeps it on the right side of §2.2: it **orders and throttles a queue; it never decides workflow or spawns**. That's the bounded-autonomy case §2.2 contemplates, not the orchestrator it bans.
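A minimal sketch of the admission gate, assuming `per_matter_fair_share` holds a fraction of the portfolio ceiling and that the ledger exposes simple spend totals; `LedgerTotals`, `AdmissionRequest`, and `admit` are hypothetical. The only outputs are admit or delay: the gate throttles a queue, it never rewires or spawns.

```typescript
interface PortfolioResourcePolicy {
  portfolio_cost_ceiling_usd?: number;
  per_matter_fair_share: Record<string, number>; // ASSUMPTION: fraction of the portfolio ceiling
}

// Hypothetical read-only totals from PortfolioResourceLedger.
interface LedgerTotals {
  portfolio_spend_usd: number;
  matter_spend_usd: { [matterId: string]: number | undefined };
}

interface AdmissionRequest { run_id: string; matter_id: string; estimated_cost_usd: number; }

type AdmissionDecision = "admit" | "delay";

// Deterministic: the same policy, ledger, and request always give the same decision.
function admit(req: AdmissionRequest, policy: PortfolioResourcePolicy, ledger: LedgerTotals): AdmissionDecision {
  const ceiling = policy.portfolio_cost_ceiling_usd;
  if (ceiling === undefined) return "admit"; // no portfolio cap configured
  if (ledger.portfolio_spend_usd + req.estimated_cost_usd > ceiling) return "delay";
  const share: number | undefined = policy.per_matter_fair_share[req.matter_id];
  if (share !== undefined) {
    const matterSpend = (ledger.matter_spend_usd[req.matter_id] ?? 0) + req.estimated_cost_usd;
    if (matterSpend > share * ceiling) return "delay";
  }
  return "admit";
}
```

Keeping the gate a pure function of policy plus ledger is what makes it inspectable rather than an actor.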
**Canonical isolation unit: matter, with `context_class_key` as a within-matter sub-key.** Verified the inconsistency: firewall + pattern promotion are matter-scoped (L4073, §16.6.5, L9901-9935), but §9A's `TaskSuggestionFeedbackEvent`/`TaskInvocationLearningSignal` carry `context_class_key` and **no `matter_id`** — so rejecting a suggestion on 3 matters suppresses it on the other 37 (§9A.4). Matter is the privilege boundary, so it must be the outer key. The fix is **my prior `GovernanceEnvelope.learning_scope`** applied to §9A:
```ts
// Add to §9A.1/§9A.2 signals: matter_id: string; governance: GovernanceEnvelope;
// Isolation key := (matter_id, context_class_key). matter_id is the firewall; context_class_key generalizes WITHIN it.
// Cross-matter generalization of invocation behavior ALLOWED only when GovernanceEnvelope.learning_scope ∈ {firm, networked}
// AND data_class ∈ {public, internal} (never for privileged/matter-scoped signals).
// Lint: learning.invocation_signal_missing_matter_scope ; learning.suppression_crossed_matter_without_scope_permission
```
This makes both subsystems consistent (both key on matter as the firewall) and is the concrete §9A application of the `learning_scope` field I already proposed — so **AB-T-11 is resolvable now, not an open architect question.**
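The (matter_id, context_class_key) isolation rule can be sketched as a suppression lookup. A minimal sketch, assuming the relevant `GovernanceEnvelope` fields are flattened onto the signal; the `"matter"` scope value, the `DataClass` subset, and the `SuppressionSignal` / `isSuppressed` names are hypothetical (the text only names `firm`, `networked`, `public`, and `internal`).

```typescript
type LearningScope = "matter" | "firm" | "networked"; // "matter" is a hypothetical default
type DataClass = "public" | "internal" | "privileged"; // illustrative subset

// §9A signal with the proposed matter_id and flattened GovernanceEnvelope fields.
interface SuppressionSignal {
  matter_id: string;
  context_class_key: string;
  learning_scope: LearningScope;
  data_class: DataClass;
}

function mayGeneralizeAcrossMatters(s: SuppressionSignal): boolean {
  const scopeOk = s.learning_scope === "firm" || s.learning_scope === "networked";
  const dataOk = s.data_class === "public" || s.data_class === "internal";
  return scopeOk && dataOk;
}

// A rejection suppresses the same context class in the same matter only,
// unless the signal is explicitly permitted to generalize across matters.
function isSuppressed(signals: SuppressionSignal[], matterId: string, contextClassKey: string): boolean {
  return signals.some(
    (s) => s.context_class_key === contextClassKey &&
      (s.matter_id === matterId || mayGeneralizeAcrossMatters(s)),
  );
}
```

This reproduces the fix for the §9A.4 example: rejections on three privileged matters no longer suppress the suggestion on the other 37.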
**Q3 — of AB-T-05/06/15, is any already satisfied by the specs?**
- **AB-T-05 — no, genuinely net-new.** SW has only `documentation_mode` (a default/suggested tier, L115/L235); there is no risk-based *requirement*. The card is right.
- **AB-T-06 — partially.** Net-new for *runtime per-activation* rationale, but **design-time** rationale already exists (`TaskModuleDesignRationaleCard`, `design_rationale`, §8.x). Adopt, but scope it to runtime and reuse the existing card pattern (don't imply design-time rationale is absent).
- **AB-T-15 — yes, partially already satisfied, and the card overclaims it.** The *enforcement* preflight exists: `stale_pack_behavior:"block_graph_proposals"` + `require_preflight_before_save` + §8A.5 `TaskKnowledgePackReadiness`. Only the freshness-*detection* policy (TTL + invalidation triggers that flip readiness to stale) is genuinely missing. Re-scope AB-T-15 from "net-new preflight" to "freshness-detection policy feeding the existing readiness/stale-pack enforcement."
------
**Modifications to my prior-round suggestions, consolidated:** (1) my `EvidencePackage`/G-03 gains AB-T-01's enforcement binding (`requires_affirmative_grounding` + satisfied-unreachable rule); (2) my `FeedbackFindingView` passthrough gains `assurance_downgraded`/`executed_assurance_basis` (AB-T-02); (3) my D-04 tier-0 rule is subsumed by AB-T-05's full `MinimumDocumentationTierPolicy`; (4) my §6.2 type-owner sweep seed-list gains `HumanGateSummary` (AB-T-04); (5) my `learning_scope` is the concrete fix for AB-T-11 and must be applied to §9A; (6) my read-model invalidation + reliance-decay gain AB-T-17's snapshot-retention as their durability backing and absorb AB-T-15's TKP freshness as one instance of New-Idea-#2's `SubstrateFreshnessPolicy`.
Two offers: I can run the same **line-by-line dedup-completeness check** on the card's §6 matrix against the raw `TEST Prompts RT 1` harvest that I did last round (I have it pulled), and/or **commit this review** to the repo's `DOC23 Red Teaming/` folder. Say the word on either.
Good, let me push harder on this. The most valuable thing I can do is hunt for *cross-item interaction bugs* in the card's proposals (the way the B-02/B-27 collision surfaced last round), re-test a few more "net-new" and "DUP" calls against the text, and check whether my own first-pass schemas have holes. Let me verify the spec-dependent questions first.
The sandbox reset and dropped the cached specs. Re-fetching and running the verification in a single cell so a restart can't strand it midway.
Verifications are back, and they materially change three of the card's calls plus surface a cluster of cross-item interaction bugs. Here's the deeper dive.
# Deeper dive — DOC23 Addenda B Test-Set card (re-review)
## The sharpened framing point (the biggest thing I'd add)
On a second pass, the "Reporting → Enforcement" framing isn't just *understated* — it's missing the precondition that makes the enforcement layer real. **Every enforcement gate in this card fires on a signal that, if the producing model supplies it, can be gamed or hallucinated to bypass the gate.** AB-T-05's risk class, AB-T-02's `executed_assurance_basis`, AB-T-01's "which claims are in-scope/factual," AB-T-03's confidence — if the model that did the work also self-certifies whether the gate applies, you've built a gate with the key taped to it. An enforcement layer whose *triggers* are self-reported doesn't enforce; it launders. So the deepest net-new requirement isn't "enforcement," it's **enforcement with trustworthy triggers** — and the card doesn't touch trigger integrity. Two of its four flagship gates are exposed (AB-T-05, AB-T-02); two are already protected by independent producers (AB-T-03 gets confidence from DOC24; AB-T-01 gets claims from the Addenda-A extractor) — which is exactly the pattern the exposed two should copy. I make this New Idea #1 below; it's the single most important addition.
## A. Corrections to the card's net-new / DUP calls (verified against text)
**AB-T-17 is over-claimed as net-new; it's partially covered, like AB-T-15 was.** The card says "no global retention/indexing contract." Not true: `EvaluationSnapshot` retention already exists: "snapshots are persisted via EC. Retention follows the EvaluationArtifactGovernancePolicy per §16. Default retention: 90 days" (V3.3.1 L2415), plus the matter-classification retention table (L7739: 7y/indefinite/10y/…), `retention_rules[]` (L7623), and `retention_policy_ref` on the workspace (L5916). The genuine gap is narrower and sharper: **retention is time/governance-based, not reference-aware**. A snapshot can be GC'd at its 90-day expiry *while a live `TaskReliancePacket`, a `dirty` downstream outcome, or a replay/rollback still references it*. Re-scope AB-T-17 to a **reference-pin**, not a new retention contract:
```ts
interface SnapshotReferencePin {
snapshot_ref: StorageRef;
pinned_by_refs: StorageRef[]; // reliance packets, dirty outcomes, open read-models
survives_time_expiry: true; // overrides the §16 90-day default while pinned
pin_released_when: "all_referrers_resolved_or_invalidated";
schema_version: "1.0";
}
// RULE: §16 retention MAY NOT GC a snapshot with a live SnapshotReferencePin.
// PRIVILEGE CONSTRAINT: a privileged snapshot may not be pinned into a non-privileged matter's read-model (leak).
// Lint: snapshot.gc_eligible_while_referenced ; snapshot.cross_matter_pin_violates_privilege
```
This is also the durability backing my prior reliance-decay needs — fold it there.
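The pin rule can be sketched as a GC-eligibility check. A minimal sketch, assuming hypothetical `SnapshotMeta` fields and an `isReferrerLive` callback standing in for "not yet resolved or invalidated"; the privilege constraint on cross-matter pins is not modeled.

```typescript
interface SnapshotReferencePin {
  snapshot_ref: string;
  pinned_by_refs: string[]; // reliance packets, dirty outcomes, open read-models
}

// Hypothetical snapshot metadata; retention_ms would come from §16 (default 90 days).
interface SnapshotMeta { snapshot_ref: string; created_at_ms: number; retention_ms: number; }

const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

// A pin is released only when ALL referrers are resolved or invalidated,
// so it stays live while ANY referrer is still live.
function isGcEligible(
  snap: SnapshotMeta,
  nowMs: number,
  pins: SnapshotReferencePin[],
  isReferrerLive: (ref: string) => boolean,
): boolean {
  const expired = nowMs - snap.created_at_ms >= snap.retention_ms;
  if (!expired) return false;
  const pinned = pins.some(
    (p) => p.snapshot_ref === snap.snapshot_ref && p.pinned_by_refs.some(isReferrerLive),
  );
  return !pinned; // a live pin overrides time expiry
}
```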
**AB-T-13 should be split; the two halves have very different novelty.** Verified: the storage-full half is largely covered. V3.3.1 §11.17 already has a typed `WorkspaceWriteFailureKind` taxonomy with `receipt_recovery_required` ("artifact written but receipt write failed," L4430/L5593) and `validation.workspace_failure_used_partially_completed_blanket` (L8493). So the storage-exhaustion case is a **taxonomy extension** (add `durable_store_exhausted` plus a fail-closed contract), not a new mechanism. The mid-run **privilege-reclassification** half is the genuinely high-severity net-new piece: matter re-classification is already a contemplated event ("Re-classification…", V3.3.1 L7723); it just has no cascade. **Elevate the privilege half** (it belongs to the same privilege-firewall family as AB-T-03/D-24) and connect it to my prior transitive-taint + read-model invalidation:
```ts
// On matter privilege re-classification (existing event, L7723) → MANDATORY cascade:
// re-taint every already-emitted artifact, forum post, context packet, and reliance packet for that matter;
// invalidate derived read-models (my prior ReadModelInvalidationSpec); quarantine any already-exported items.
// add WorkspaceWriteFailureKind: "durable_store_exhausted" (fail-closed: no partial durable record)
// Lints: privilege.reclassification_without_retaint_cascade ; write.durable_store_exhausted_committed_partial
```
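The privilege half of the cascade can be sketched as a pure function over emitted items. A minimal sketch, assuming a hypothetical flat `EmittedItem` record with a `privilege_class` label; `cascadeReclassification` is a hypothetical name, and the storage-exhaustion half (just the new `WorkspaceWriteFailureKind` value) is not shown.

```typescript
type EmittedKind = "artifact" | "forum_post" | "context_packet" | "reliance_packet" | "derived_read_model";

interface EmittedItem {
  ref: string;
  matter_id: string;
  kind: EmittedKind;
  exported: boolean;
  privilege_class?: string; // hypothetical taint label
  invalidated?: boolean;
  quarantined?: boolean;
}

// Returns a new list: every item in the reclassified matter is re-tainted,
// derived read-models are invalidated, and exported items are quarantined.
// Items in other matters are returned unchanged (same object).
function cascadeReclassification(items: EmittedItem[], matterId: string, newClass: string): EmittedItem[] {
  return items.map((item) => {
    if (item.matter_id !== matterId) return item;
    const next: EmittedItem = { ...item, privilege_class: newClass };
    if (item.kind === "derived_read_model") next.invalidated = true;
    if (item.exported) next.quarantined = true;
    return next;
  });
}
```

Making the cascade total over the matter's items (rather than event-driven per artifact) is what the `privilege.reclassification_without_retaint_cascade` lint would check for.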
**AB-T-14 cites a proposal, not the live spec.** The card anchors it to "SubAgentFallbackPolicy (comprehensive L1894)" — but that's a *proposal* from the comprehensive review, not live text. The live spec has `task_agent_fallback_policy` (§6.9.1, L2914) and a rolling-hash "fallback to candidate-only mode plus replan" (L5768) — neither covers a malformed envelope from a **load-bearing evaluator/judge** call. So AB-T-14 stays net-new, but reuse the **existing** `task_agent_fallback_policy` shape and scope it to the evaluator path; don't invent against a proposal.
## B. Cross-item interaction bugs (the category that matters most)
These are defects that appear only when the adopted items run together — none is visible reading a single item.
**1. AB-T-01 ∧ AB-T-05 don't compose — you can pass grounding with a thinly-documented source.** AB-T-01 gates on `support_status === "supported"`; AB-T-05 gates on source tier ≥ 2. They're independent checks on the *same* claim, so a claim "supported" by a tier-0 lookup receipt passes AB-T-01 and silently violates AB-T-05. Compose them into one predicate:
```ts
// a claim is ADEQUATELY GROUNDED iff:
// support_status === "supported"
// AND supporting_source.tier >= MinimumDocumentationTierPolicy.min_tier_for(claim.risk_class)
// AB-T-01's satisfied-gate consumes ADEQUATELY GROUNDED, not bare support_status.
// Lint: claim.supported_but_subminimum_tier_treated_as_grounded
```
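The composed predicate can be sketched as a single function that AB-T-01's gate consumes. The tier floors below are copied from the draft default policy for four risk classes only; `GroundedClaim` and `MIN_TIER_FOR` are illustrative stand-ins for `MinimumDocumentationTierPolicy.min_tier_for`, not its actual API.

```typescript
type SupportStatus = "supported" | "partially_supported" | "unsupported" | "contradicted" | "not_checked";
type RiskClass = "casual_lookup" | "claim_support" | "legal_authority" | "filing_bound";
type Tier = 0 | 1 | 2 | 3 | 4;

// Illustrative floors taken from the draft default policy; live values come from the policy object.
const MIN_TIER_FOR: Record<RiskClass, Tier> = {
  casual_lookup: 0,
  claim_support: 2,
  legal_authority: 3,
  filing_bound: 3,
};

interface GroundedClaim {
  support_status: SupportStatus;
  supporting_source_tier: Tier | null; // null when no source is attached
  risk_class: RiskClass;
}

// ADEQUATELY GROUNDED = "supported" AND the supporting source clears the risk-class tier floor.
function adequatelyGrounded(c: GroundedClaim): boolean {
  return (
    c.support_status === "supported" &&
    c.supporting_source_tier !== null &&
    c.supporting_source_tier >= MIN_TIER_FOR[c.risk_class]
  );
}

// Lint sketch: claim.supported_but_subminimum_tier_treated_as_grounded
function supportedButSubminimumTier(c: GroundedClaim): boolean {
  return c.support_status === "supported" && !adequatelyGrounded(c);
}
```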
**2. AB-T-01 ∧ AB-T-02 have no verdict precedence.** When grounding fails (AB-T-01 → `needs_verification`) *and* assurance is downgraded (AB-T-02 → `satisfied`+flag), which wins? Undefined today. This is what my "VerdictHonestyEnvelope" idea needs to resolve — now with the precedence I omitted last round:
```ts
interface VerdictHonestyEnvelope {
affirmatively_grounded: boolean; assurance_downgraded: boolean; quorum_satisfied: boolean;
matter_resolution_status: "resolved"|"ambiguous_hold"|"unresolved_hold";
consumed_stale_inputs: boolean; signal_integrity: "clean"|"some_self_reported";
presentation: "clean"|"downgraded"|"needs_verification"|"blocked";
}
// PRECEDENCE (highest wins — this is the missing rule):
// matter_resolution_status != "resolved" → blocked (no verdict at all)
// !affirmatively_grounded || consumed_stale_inputs → needs_verification
// (signal_integrity=="some_self_reported" && work is high_stakes/privileged) → needs_verification
// assurance_downgraded || !quorum_satisfied → downgraded (never a clean badge)
// else → clean
```
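The precedence reads naturally as an ordered early-return chain, where the first matching rule wins. A minimal sketch, assuming the caller supplies `high_stakes_or_privileged` (the envelope as drafted does not carry it, so that input is an assumption):

```typescript
type MatterResolutionStatus = "resolved" | "ambiguous_hold" | "unresolved_hold";
type VerdictPresentation = "clean" | "downgraded" | "needs_verification" | "blocked";

interface VerdictInputs {
  affirmatively_grounded: boolean;
  assurance_downgraded: boolean;
  quorum_satisfied: boolean;
  matter_resolution_status: MatterResolutionStatus;
  consumed_stale_inputs: boolean;
  signal_integrity: "clean" | "some_self_reported";
  high_stakes_or_privileged: boolean; // assumption: supplied by the caller, not part of the envelope
}

// Highest-precedence rule first; the first match decides the presentation.
function presentVerdict(v: VerdictInputs): VerdictPresentation {
  if (v.matter_resolution_status !== "resolved") return "blocked";
  if (!v.affirmatively_grounded || v.consumed_stale_inputs) return "needs_verification";
  if (v.signal_integrity === "some_self_reported" && v.high_stakes_or_privileged) return "needs_verification";
  if (v.assurance_downgraded || !v.quorum_satisfied) return "downgraded";
  return "clean";
}
```

With this ordering, the AB-T-01 ∧ AB-T-02 collision (ungrounded and downgraded) resolves to `needs_verification`, never to a downgraded pass.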
**3. AB-T-02a and AB-T-02b are the same mechanism.** Verified `specialist_panel_judgment` = "multiple specialists, aggregated" (V3.3.1 L1842). So a forum that loses a mandated specialist (2b, quorum) *cannot truthfully* report `specialist_panel_judgment` as its executed basis; that **is** an assurance downgrade (2a). Don't model them as two independent checks:
```ts
// RULE: quorum_satisfied === false for a panel outcome ⇒ executed_assurance_basis EXCLUDES specialist_panel_judgment
// ⇒ assurance_downgraded === true. One mechanism; quorum is a special case of assurance integrity.
```
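One way to make the single mechanism concrete: strip the panel basis when quorum fails, then derive the downgrade flag from the corrected basis instead of setting it separately. `Basis` is a three-value subset of `AssuranceBasis` chosen for illustration, and `reconcileQuorum` is a hypothetical helper.

```typescript
// Illustrative subset of AssuranceBasis.
type Basis = "specialist_panel_judgment" | "llm_expert_judgment" | "deterministic_check";

interface PanelOutcome {
  claimed_executed_basis: Basis[];
  quorum_satisfied: boolean;
  target_basis: Basis[];
}

// Quorum failure removes the panel basis; the downgrade flag is then derived, never set independently.
function reconcileQuorum(o: PanelOutcome): { executed: Basis[]; assurance_downgraded: boolean } {
  const executed = o.quorum_satisfied
    ? [...o.claimed_executed_basis]
    : o.claimed_executed_basis.filter((b) => b !== "specialist_panel_judgment");
  const assurance_downgraded = o.target_basis.some((b) => !executed.includes(b));
  return { executed, assurance_downgraded };
}
```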
**4. AB-T-02 budget-rollback discards grounded work.** `GraphStateRollback` to `base_version` correctly reverts the half-mutated *artifact*, but if it also drops the accumulated `SourceRetrievalOutcome`s / claim-support, then "Resume?" re-fetches everything and the grounding work is lost. Preserve the evidence layer across an artifact rollback:
```ts
// On budget-interrupt rollback: revert artifact → base_version BUT retain SourceWorkspace evidence state +
// write a ResumeProgressSummary { grounded_claims[]; retrieved_sources[]; outcomes_satisfied[] } so resume is incremental.
```
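A sketch of the split between the artifact layer (reverted) and the evidence layer (retained), assuming evidence can be summarised as three id lists. `rollbackOnBudgetInterrupt` and `claimsStillToGround` are hypothetical helpers; the real `GraphStateRollback` and `SourceWorkspace` records are richer.

```typescript
interface ArtifactVersion {
  version: number;
  content: string;
}

interface EvidenceState {
  grounded_claims: string[];
  retrieved_sources: string[];
  outcomes_satisfied: string[];
}

// Same three lists, frozen at interrupt time.
type ResumeProgressSummary = EvidenceState;

interface RollbackOutcome {
  artifact: ArtifactVersion;
  evidence: EvidenceState;
  resume: ResumeProgressSummary;
}

// Revert only the artifact to base_version; the evidence layer survives untouched.
function rollbackOnBudgetInterrupt(base: ArtifactVersion, evidence: EvidenceState): RollbackOutcome {
  const resume: ResumeProgressSummary = {
    grounded_claims: [...evidence.grounded_claims],
    retrieved_sources: [...evidence.retrieved_sources],
    outcomes_satisfied: [...evidence.outcomes_satisfied],
  };
  return { artifact: { ...base }, evidence, resume };
}

// On "Resume?", only claims not already grounded need fresh retrieval.
function claimsStillToGround(allClaims: string[], resume: ResumeProgressSummary): string[] {
  const done = new Set(resume.grounded_claims);
  return allClaims.filter((c) => !done.has(c));
}
```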
**5. AB-T-03 ∧ AB-T-09 ∧ AB-T-10: matter-limbo has no home.** An `ambiguous_hold`/`unresolved_hold` trigger has no resolved matter — so which matter's append log (AB-T-10) takes its audit events, and whose budget (AB-T-09) accounts the quarantine work? Nothing answers this. Add a quarantine context, fail-closed to privileged:
```ts
const PENDING_MATTER_ASSIGNMENT = "matter:__pending_assignment__";
// *_hold records (logs, resource accounting, GovernanceEnvelope) attach here with data_class:"privileged"
// (treat unresolved as maximally restricted until proven otherwise). On resolution → migrate to the real matter.
// Lint: hold.records_written_to_concrete_matter_before_resolution
```
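A minimal sketch of attach, migrate, and the lint, assuming hold-scoped records carry only a matter id and a data class. Keeping `data_class: "privileged"` after migration is a conservative assumption; relaxing it would be a separate, explicit decision.

```typescript
const PENDING_MATTER_ASSIGNMENT = "matter:__pending_assignment__";

type HoldResolutionStatus = "resolved" | "ambiguous_hold" | "unresolved_hold";

interface HoldScopedRecord {
  record_id: string;
  matter_id: string;
  data_class: "ordinary" | "privileged";
}

// *_hold records attach to the pending context and fail closed to privileged.
function attachHoldRecord(recordId: string): HoldScopedRecord {
  return { record_id: recordId, matter_id: PENDING_MATTER_ASSIGNMENT, data_class: "privileged" };
}

// On resolution, re-home pending records to the real matter (data_class is left as-is).
function migrateOnResolution(records: HoldScopedRecord[], resolvedMatterId: string): HoldScopedRecord[] {
  return records.map((r) => (r.matter_id === PENDING_MATTER_ASSIGNMENT ? { ...r, matter_id: resolvedMatterId } : r));
}

// Lint sketch: hold.records_written_to_concrete_matter_before_resolution
function holdRecordWrittenToConcreteMatter(status: HoldResolutionStatus, r: HoldScopedRecord): boolean {
  return status !== "resolved" && r.matter_id !== PENDING_MATTER_ASSIGNMENT;
}
```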
**6. AB-T-02b ∧ AB-T-04: a quorum waiver needs a decision record.** A human who approves *despite* unmet quorum must leave a trail, or you get a clean approval with a silently-waived specialist requirement:
```ts
// HumanGateDecisionRecord gains: quorum_waived?: boolean; waived_required_participants?: string[];
// RULE: ForumApprovalResult.quorum_satisfied===false may only proceed via a HumanGateDecisionRecord with quorum_waived===true.
```
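The rule reduces to a small guard. This sketch also requires the waived participants to be named; that is an assumption added here so that a bare `quorum_waived: true` does not hide which specialist was skipped.

```typescript
interface ForumApprovalResult {
  quorum_satisfied: boolean;
}

// Minimal slice of HumanGateDecisionRecord with the two proposed fields.
interface QuorumWaiverRecord {
  decision: "approved" | "approved_with_limitations" | "rejected";
  quorum_waived?: boolean;
  waived_required_participants?: string[];
}

function mayProceedDespiteQuorum(result: ForumApprovalResult, record?: QuorumWaiverRecord): boolean {
  if (result.quorum_satisfied) return true;
  if (record === undefined || record.decision === "rejected") return false;
  // Assumption: the waiver must name at least one waived participant.
  return record.quorum_waived === true && (record.waived_required_participants?.length ?? 0) > 0;
}
```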
**7. AB-T-01 `needs_verification` can spin — connect it to last round's termination ledger.** A permanently-404 source routes a fact-bound outcome to `needs_verification` indefinitely. AB-T-01 must hand off to the `RevisorTerminationLedger` I specified last round so bounded failed-verification attempts escalate to `needs_human_judgment`/`unable_to_evaluate`. Cross-round dependency — flag it explicitly so the implementer wires them.
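The hand-off can be sketched as a bounded counter per outcome. `FailedVerificationCounter` is a hypothetical stand-in; the real state belongs in the `RevisorTerminationLedger` from last round, whose exact shape is not reproduced here.

```typescript
type VerificationRoute = "needs_verification" | "needs_human_judgment" | "unable_to_evaluate";

// Hypothetical per-outcome counter standing in for a RevisorTerminationLedger entry.
interface FailedVerificationCounter {
  outcome_id: string;
  failed_attempts: number;
  max_attempts: number;
  human_available: boolean;
}

interface VerificationStep {
  counter: FailedVerificationCounter;
  route: VerificationRoute;
}

// Each failed attempt increments the counter; at the bound the outcome escalates instead of spinning.
function recordFailedVerification(c: FailedVerificationCounter): VerificationStep {
  const counter = { ...c, failed_attempts: c.failed_attempts + 1 };
  if (counter.failed_attempts < c.max_attempts) return { counter, route: "needs_verification" };
  return { counter, route: c.human_available ? "needs_human_judgment" : "unable_to_evaluate" };
}
```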
**8. `assurance_downgraded` must be recomputed on cascade revalidation.** §11.21 marks downstream outcomes `dirty` and re-evaluates them; the downgrade flag must be **recomputed each revalidation, never carried stale** — and must *clear* if the re-eval achieves full assurance. Same rule for the whole VerdictHonestyEnvelope. Without it, a fixed problem still shows a downgrade badge (or worse, a stale clean badge).
## C. Holes in my own first-pass schemas
**9. AB-T-02's `executed < target` was undefined over `AssuranceBasis[]`.** Comparing assurance arrays needs an ordering. Use a required-set rule as primary, a rank table as secondary:
```ts
// primary (safer): downgrade iff a REQUIRED basis is absent
// assurance_downgraded ||= outcome.required_assurance_bases.some(b => !executed_assurance_basis.includes(b))
// secondary (weaker-substitute detection): an illustrative rank, architect sets exact values
const ASSURANCE_RANK: Partial<Record<AssuranceBasis,number>> = {
deterministic_check:5, human_confirmed_in_run:5, structured_validation:4, source_verified_external:4,
specialist_panel_judgment:4, policy_backed:4, claim_grounded_internal:3, trace_verified:3,
coverage_mapping:3, statistical_threshold:3, comparative_judge:2, historical_baseline:2, llm_expert_judgment:1 };
// assurance_downgraded ||= max(rank of executed) < max(rank of target)
```
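Both rules can be combined into one runnable check. This sketch assumes the 13 values in the rank table are the whole `AssuranceBasis` union (the live union may be larger, which is why the original uses `Partial`), and it reuses the illustrative ranks unchanged.

```typescript
type AssuranceBasis =
  | "deterministic_check" | "human_confirmed_in_run" | "structured_validation" | "source_verified_external"
  | "specialist_panel_judgment" | "policy_backed" | "claim_grounded_internal" | "trace_verified"
  | "coverage_mapping" | "statistical_threshold" | "comparative_judge" | "historical_baseline"
  | "llm_expert_judgment";

// Illustrative ranks from the table above; the architect sets real values.
const ASSURANCE_RANK: Record<AssuranceBasis, number> = {
  deterministic_check: 5, human_confirmed_in_run: 5, structured_validation: 4, source_verified_external: 4,
  specialist_panel_judgment: 4, policy_backed: 4, claim_grounded_internal: 3, trace_verified: 3,
  coverage_mapping: 3, statistical_threshold: 3, comparative_judge: 2, historical_baseline: 2,
  llm_expert_judgment: 1,
};

function maxRank(bases: AssuranceBasis[]): number {
  return bases.reduce((m, b) => Math.max(m, ASSURANCE_RANK[b]), 0);
}

function isAssuranceDowngraded(
  required: AssuranceBasis[],
  target: AssuranceBasis[],
  executed: AssuranceBasis[],
): boolean {
  // Primary: a required basis is absent.
  const missingRequired = required.some((b) => !executed.includes(b));
  // Secondary: the strongest executed basis is weaker than the strongest targeted one.
  const weakerSubstitute = maxRank(executed) < maxRank(target);
  return missingRequired || weakerSubstitute;
}
```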
**10. `MatterResolution` didn't handle the single-candidate case.** Tighten the resolve condition:
```ts
// resolved iff: (exactly one candidate with top_confidence >= resolve_floor)
// OR (>=2 candidates, top >= resolve_floor AND separation >= sep_threshold)
// privileged/matter-scoped triggers use a HIGHER resolve_floor than ordinary triggers.
// everything else → ambiguous_hold (≥2 close) or unresolved_hold (none clears floor)
```
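A minimal implementation of the tightened condition, reading "exactly one candidate" as exactly one candidate in the list. `Thresholds` and its three field names are hypothetical; the floors are parameters, not recommended values.

```typescript
interface MatterCandidate {
  matter_id: string;
  confidence: number; // 0..1
}

type ResolveStatus = "resolved" | "ambiguous_hold" | "unresolved_hold";

interface Thresholds {
  resolve_floor: number;
  privileged_resolve_floor: number; // higher floor for privileged / matter-scoped triggers
  sep_threshold: number;
}

function resolveMatter(
  candidates: MatterCandidate[],
  privileged: boolean,
  t: Thresholds,
): { status: ResolveStatus; matter_id?: string } {
  const floor = privileged ? t.privileged_resolve_floor : t.resolve_floor;
  const [top, second] = [...candidates].sort((a, b) => b.confidence - a.confidence);
  if (top === undefined || top.confidence < floor) return { status: "unresolved_hold" };
  if (second === undefined) return { status: "resolved", matter_id: top.matter_id }; // single candidate
  if (top.confidence - second.confidence >= t.sep_threshold) {
    return { status: "resolved", matter_id: top.matter_id };
  }
  return { status: "ambiguous_hold" }; // two or more close candidates
}
```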
## D. New ideas
**1. Gate-signal integrity ("no self-certified bypass") — the headline.** Make every enforcement gate's *trigger* carry provenance, and forbid self-reported triggers from satisfying a gate on high-stakes/privileged/filing-bound work:
```ts
type GateSignalProvenance = "deterministic" | "independent_check" | "self_reported";
// attach to: AB-T-05 risk_class, AB-T-02 executed_assurance_basis, AB-T-01 claim-scope, AB-T-03 confidence
// RULE no_self_certified_bypass: for high_stakes|privileged|filing_bound, a gate MUST require
// deterministic OR independent_check provenance; a self_reported signal cannot satisfy/relax it.
// Lint: gate.satisfied_on_self_reported_signal_for_high_stakes
// Today: AB-T-05 risk_class and AB-T-02 executed_basis are self-reportable (fix); AB-T-03 (DOC24) and
// AB-T-01 (independent Addenda-A extractor) are already external — keep them that way.
```
This is the precondition the whole "enforcement" layer rests on, and it's the enforcement-side analog of my prior "store derivation receipts, not asserted statuses."
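A minimal sketch of `no_self_certified_bypass`, assuming stakes arrive as a single label. `effectiveRiskClass` is a hypothetical illustration of the AB-T-05 fix: a self-reported risk class on guarded work falls back to a conservative default instead of relaxing the tier floor.

```typescript
type GateSignalProvenance = "deterministic" | "independent_check" | "self_reported";
type WorkStakes = "routine" | "high_stakes" | "privileged" | "filing_bound";

interface GateSignal<T> {
  value: T;
  provenance: GateSignalProvenance;
}

// On guarded work, only deterministic or independently checked signals may satisfy or relax a gate.
function signalMaySatisfyGate(signal: GateSignal<unknown>, stakes: WorkStakes): boolean {
  if (stakes === "routine") return true;
  return signal.provenance !== "self_reported";
}

// Hypothetical: a self-reported risk_class cannot lower the floor, so fall back to a conservative class.
function effectiveRiskClass(declared: GateSignal<string>, stakes: WorkStakes, conservativeDefault: string): string {
  return signalMaySatisfyGate(declared, stakes) ? declared.value : conservativeDefault;
}

// Lint sketch: gate.satisfied_on_self_reported_signal_for_high_stakes
function satisfiedOnSelfReportedSignal(signal: GateSignal<unknown>, stakes: WorkStakes, gateSatisfied: boolean): boolean {
  return gateSatisfied && !signalMaySatisfyGate(signal, stakes);
}
```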
**2. `executed_assurance_basis` must be trace-attested, not model-claimed (an instance of #1, but worth its own contract).** The card credits trace-honesty (§4.7). Extend it: `executed_assurance_basis` is **computed from the execution trace** (which voices/models/checks actually ran), never set by the evaluating model. A model claiming `[specialist_panel_judgment]` after running one model is exactly the dishonesty AB-T-02 exists to stop — so the field that drives AB-T-02 must itself be system-derived. `validation.executed_assurance_basis_not_trace_derived`.
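A sketch of deriving the field from the trace, assuming trace events expose a kind, a success flag, and a distinct voice id for model calls. The event kinds, `deriveExecutedBasis`, and the at-least-two-voices panel threshold are assumptions for illustration, not V3.3.1 trace types.

```typescript
// Illustrative subset of AssuranceBasis.
type TracedBasis = "llm_expert_judgment" | "specialist_panel_judgment" | "deterministic_check" | "source_verified_external";

interface TraceEvent {
  kind: "model_call" | "deterministic_check" | "source_verification";
  voice_id?: string; // distinct evaluator voice / specialist identity for model calls
  succeeded: boolean;
}

// Computed from what actually ran; the evaluating model never sets this field.
function deriveExecutedBasis(trace: TraceEvent[], minPanelVoices: number): TracedBasis[] {
  const succeeded = trace.filter((e) => e.succeeded);
  const voices = new Set<string>();
  for (const e of succeeded) {
    if (e.kind === "model_call" && e.voice_id !== undefined) voices.add(e.voice_id);
  }
  const basis: TracedBasis[] = [];
  // A panel needs multiple distinct specialists, so never fewer than two voices.
  if (voices.size >= Math.max(2, minPanelVoices)) basis.push("specialist_panel_judgment");
  else if (voices.size > 0) basis.push("llm_expert_judgment");
  if (succeeded.some((e) => e.kind === "deterministic_check")) basis.push("deterministic_check");
  if (succeeded.some((e) => e.kind === "source_verification")) basis.push("source_verified_external");
  return basis;
}

// Lint sketch: validation.executed_assurance_basis_not_trace_derived
function executedBasisNotTraceDerived(claimed: TracedBasis[], derived: TracedBasis[]): boolean {
  return claimed.some((b) => !derived.includes(b));
}
```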
**3. Consolidate the four flags into the one `VerdictHonestyEnvelope`** (schema in B-2) rather than shipping AB-T-01/02/03 + staleness as four independent fields+lints. One envelope, one precedence rule, one passthrough into the view — the enforcement analog of the single `GovernanceEnvelope` from last round. It also gives the UI one place to compute the badge instead of four consumers each guessing.
## E. Re-weighting and bottom line
Severity order I'd actually ship in: **AB-T-03** (privilege-boundary bug) → **AB-T-13 privilege-reclassification half** (re-taint cascade; I'd split it out and raise it from DISCUSS) → **New Idea #1 gate-signal integrity** (without it AB-T-01/02/05 are partly theater) → **AB-T-01/02/05 as the unified VerdictHonestyEnvelope** → the rest. The card's dedup and its core insight are sound; what this pass adds is that (a) two more "net-new" calls (AB-T-17, AB-T-13-storage) are partly pre-covered and should be re-scoped like AB-T-15 was, (b) the adopted items have real composition bugs (grounding×tier, verdict precedence, quorum=downgrade, matter-limbo, rollback-loses-evidence, needs_verification spin), and (c) the enforcement layer needs trigger integrity to be more than a reporting layer with stricter labels.
Two offers stand from last round (line-by-line dedup check of §6 against the raw harvest; commit this to the repo) — and if useful, I can now also write the **VerdictHonestyEnvelope + gate-signal-integrity** as a single consolidated proposal spec, since five of the card's items collapse into it.
# ChatGPT
## 1. Top-line verdict
The card’s central framing is **correct but slightly understated**. The net-new residue is indeed mostly **Reporting → Enforcement**: the existing and prior-proposed system can describe trust, but the test set adds gates that prevent clean pass/approval/verified surfaces when proof, assurance level, quorum, matter assignment, or source documentation is insufficient. The card itself states that the residue is about preventing a clean `satisfied`/`approved`/`verified` verdict where the system could not prove, check, or assign the work correctly. I would add one sibling theme: **admission control before execution**—TKP freshness, task-opportunity classifier drift, matter-resolution ambiguity, and portfolio pressure are all preflight or admission problems, not just verdict-reporting problems.
I agree with most net-new calls. The strongest adoptions are **AB-T-01, AB-T-02, AB-T-03, AB-T-05, AB-T-06, AB-T-15, AB-T-16, and AB-T-17**. I would change several `DISCUSS` rows to scoped `ADOPT`, especially AB-T-07, AB-T-10, AB-T-13, AB-T-14, AB-T-15, AB-T-16, and AB-T-17.
------
## 2. Per-item table
| ID | View | One-line reason |
| ------- | ----------------------------------------------- | ------------------------------------------------------------ |
| AB-T-01 | **Agree / ADOPT** | Net-new. Current specs have assurance concepts and source workspace records, but no affirmative proof gate or retrieval-failure token. |
| AB-T-02 | **Partial / ADOPT with mods** | Field-not-state is right for downgraded assurance, but the field needs a derivation record and a “clean verdict allowed” gate. |
| AB-T-03 | **Agree / ADOPT** | Ambiguous N:1 matter binding is a real privilege/correctness gap; hold gate is needed. |
| AB-T-04 | **Agree / FOLD-INTO-R0.4** | Phantom human-gate summary/decision log belongs in the same contract-hardening pass as human feedback and gate receipts. |
| AB-T-05 | **Agree / ADOPT** | Current Source Workspace has tiers and `documentation_mode`, but no risk-based minimum tier matrix. |
| AB-T-06 | **Partial / ADOPT lightweight** | Real gap, but only material module decisions should require rationale; do not capture hidden chain-of-thought. |
| AB-T-07 | **Partial / change to scoped ADOPT** | Write-time citation binding is valuable for claim-bearing/final text, but too heavy if applied to every drafting token. |
| AB-T-08 | **Agree with DISCUSS / lean against hard gate** | `CausalProof` is likely theater; use a lightweight change-rationale record instead. |
| AB-T-09 | **Partial / DISCUSS but specify MVP** | A thin governor is justified if it is a read-model/admission policy, not a central orchestrator. |
| AB-T-10 | **Partial / change to ADOPT** | Per-matter append-log partitioning is a cheap privilege/legal-hold improvement; do it if logs carry privileged material. |
| AB-T-11 | **Partial / change to ADOPT policy decision** | The matter-vs-context-class inconsistency is real; canonical isolation must be explicit. |
| AB-T-12 | **Partial / DISCUSS lightweight now** | Full multi-actor ownership can defer, but run operator intent/identity should be recorded now. |
| AB-T-13 | **Partial / change to ADOPT** | Chaos fixtures without behavior contracts are incomplete; define privilege-reclassification and storage-full behavior. |
| AB-T-14 | **Partial / change to ADOPT** | Load-bearing malformed eval output needs deterministic recovery, not generic error handling. |
| AB-T-15 | **Partial / change to ADOPT** | Not satisfied by current specs; already covered by my prior R2 recommendation, but should land now. |
| AB-T-16 | **Partial / change to ADOPT** | DOC24 task-opportunity calibration is distinct from planner calibration; add thresholds and kill switch. |
| AB-T-17 | **Partial / change to ADOPT** | Required snapshot refs are not enough; retention/indexing must guarantee replay/audit availability. |
------
## 3. Detailed notes and spec insertions
### AB-T-01 — Affirmative-proof verdict gate + retrieval-failure token
**Disposition:** Agree with `ADOPT`; modify mechanism to reuse the prior R2 derivation/receipt discipline.
This is genuinely net-new in the operative specs. V3.3.1 already has `AssuranceBasis` values such as `source_verified_external` and `claim_grounded_internal`, and it already has `needs_information` / `needs_verification` outcome states. But Source Workspace currently records query result status only as `success | empty | partial | error`, which is too coarse for an evaluator proof gate. The card’s proposed `SourceRetrievalOutcome` is therefore not duplicative; it is the missing enforcement token.
**Spec insert: Source Workspace V1.1 §2.4 — SourceRetrievalOutcome**
```ts
type SourceRetrievalOutcomeStatus =
| "retrieved"
| "source_missing"
| "access_blocked"
| "stale_only"
| "ambiguous_match"
| "retrieval_error"
| "not_attempted_policy_blocked";
type SourceRetrievalReasonCode =
| "http_404"
| "permission_denied"
| "paywall_or_auth_required"
| "provider_timeout"
| "provider_rate_limited"
| "query_returned_no_results"
| "only_stale_versions_found"
| "multiple_candidate_sources"
| "policy_disallowed_source_type"
| "connector_unavailable"
| "unknown";
interface SourceRetrievalOutcome {
retrieval_outcome_id: string; // "srcro-{ulid}"
query_id?: string; // SourceQueryRecord.query_id
source_record_ref?: SourceRecordRef; // present when retrieved or stale_only
status: SourceRetrievalOutcomeStatus;
reason_code?: SourceRetrievalReasonCode;
attempted_source_type:
| "document_library"
| "web"
| "api"
| "database"
| "email"
| "file_system"
| "browser"
| "connector"
| "manual_upload"
| "prior_task_output";
target_claim_refs: ClaimRef[];
target_scope_refs: ArtifactScopeRef[];
provider_ref?: string;
retrieval_receipt_ref?: StorageRef;
policy_decision_refs: PolicyEvaluationRef[];
created_at: ISO8601;
schema_version: "1.0";
}
```
**Modify existing SourceQueryRecord:**
```ts
interface SourceQueryRecord {
query_id: string;
query_text: string;
query_kind: "freeform" | "structured" | "database_query" | "api_call";
executed_by_module_id: string;
executed_by_activation_seq: number;
executed_at: ISO8601;
result_source_ids: string[];
result_status: "success" | "empty" | "partial" | "error";
retrieval_outcome_refs: StorageRef[]; // NEW: SourceRetrievalOutcome[]
result_cost: CostEstimate;
schema_version: "1.1";
}
```
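To show why the token matters for the proof gate, here is a hypothetical mapping from retrieval outcome to the claim-support floor it permits: only `"retrieved"` defers to a claim-level check, so a failed retrieval can never surface as `"supported"`. The mapping values and `claimSupportFromRetrieval` are assumptions, not spec text.

```typescript
type SourceRetrievalOutcomeStatus =
  | "retrieved" | "source_missing" | "access_blocked" | "stale_only"
  | "ambiguous_match" | "retrieval_error" | "not_attempted_policy_blocked";

type ClaimSupportStatus =
  | "supported" | "partially_supported" | "unsupported" | "contradicted"
  | "not_checked" | "source_missing" | "access_blocked" | "indeterminate_stale_or_unverified";

// Hypothetical floor mapping; "requires_claim_check" means the claim-level check decides.
const SUPPORT_FROM_RETRIEVAL: Record<SourceRetrievalOutcomeStatus, ClaimSupportStatus | "requires_claim_check"> = {
  retrieved: "requires_claim_check",
  source_missing: "source_missing",
  access_blocked: "access_blocked",
  stale_only: "indeterminate_stale_or_unverified",
  ambiguous_match: "indeterminate_stale_or_unverified",
  retrieval_error: "not_checked",
  not_attempted_policy_blocked: "not_checked",
};

function claimSupportFromRetrieval(
  status: SourceRetrievalOutcomeStatus,
  claimCheckResult?: ClaimSupportStatus,
): ClaimSupportStatus {
  const floor = SUPPORT_FROM_RETRIEVAL[status];
  if (floor !== "requires_claim_check") return floor; // a failed retrieval overrides any claimed support
  return claimCheckResult ?? "not_checked";
}
```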
**Spec insert: Common Contracts §3.8 — Affirmative grounding gate**
```ts
type ClaimSupportStatus =
| "supported"
| "partially_supported"
| "unsupported"
| "contradicted"
| "not_checked"
| "source_missing"
| "access_blocked"
| "indeterminate_stale_or_unverified";
interface OutcomeGroundingRequirement {
outcome_id: string;
requires_affirmative_grounding: boolean;
claim_scope:
| "all_factual_claims"
| "claims_marked_load_bearing"
| "claims_within_target_scope"
| "custom_claim_set";
required_support_statuses_for_clean_pass: ClaimSupportStatus[]; // default ["supported"]
required_assurance_basis: Array<
| "claim_grounded_internal"
| "source_verified_external"
>;
missing_or_blocked_source_route:
| "needs_verification"
| "needs_information"
| "needs_human_judgment";
schema_version: "1.0";
}
interface OutcomeProofGateReceipt {
receipt_id: string;
result_id: string;
outcome_id: string;
grounding_requirement_ref: StorageRef;
claim_support_derivation_receipt_refs: StorageRef[];
retrieval_outcome_refs: StorageRef[];
all_in_scope_claims_supported: boolean;
blocked_by_claim_refs: ClaimRef[];
blocked_by_retrieval_outcome_refs: StorageRef[];
clean_satisfied_allowed: boolean;
forced_state_if_not_allowed:
| "needs_verification"
| "needs_information"
| "needs_human_judgment";
created_at: ISO8601;
schema_version: "1.0";
}
```
**Normative rule:**
```ts
RULE affirmative_grounding_verdict_floor:
IF OutcomeGroundingRequirement.requires_affirmative_grounding == true
AND OutcomeProofGateReceipt.clean_satisfied_allowed == false:
EvaluationResultEnvelope.overall_state MUST NOT be "satisfied".
The producer MUST set overall_state to forced_state_if_not_allowed.
```
**Lints:**
```ts
validation.outcome.affirmative_grounding_required_but_satisfied_with_ungrounded_claim
validation.source.retrieval_failure_not_recorded_as_outcome
validation.evaluator.empty_workspace_read_as_verified
validation.proof_gate_missing_for_affirmatively_grounded_outcome
```
This should co-land with the prior R2 `EvidencePackage` / `ClaimSupportDerivationReceipt` work. The current card correctly says the proposed `EvidencePackage` is not yet operative and AB-T-01 depends on it.
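A minimal sketch of computing the gate and applying `affirmative_grounding_verdict_floor`, assuming `OverallState` is the relevant subset of `OutcomeEvaluationState` and claim refs are plain strings. Treating zero in-scope claims as not grounded is an added, conservative assumption aimed at `validation.evaluator.empty_workspace_read_as_verified`.

```typescript
type ClaimSupportStatus =
  | "supported" | "partially_supported" | "unsupported" | "contradicted"
  | "not_checked" | "source_missing" | "access_blocked" | "indeterminate_stale_or_unverified";
type ForcedState = "needs_verification" | "needs_information" | "needs_human_judgment";
type OverallState = "satisfied" | ForcedState; // assumed subset of OutcomeEvaluationState

interface GroundingRequirement {
  requires_affirmative_grounding: boolean;
  required_support_statuses_for_clean_pass: ClaimSupportStatus[]; // default ["supported"]
  missing_or_blocked_source_route: ForcedState;
}

interface InScopeClaim {
  claim_ref: string;
  support_status: ClaimSupportStatus;
}

interface ProofGateOutcome {
  all_in_scope_claims_supported: boolean;
  blocked_by_claim_refs: string[];
  clean_satisfied_allowed: boolean;
  forced_state_if_not_allowed: ForcedState;
}

function computeProofGate(req: GroundingRequirement, claims: InScopeClaim[]): ProofGateOutcome {
  const blocked = claims
    .filter((c) => !req.required_support_statuses_for_clean_pass.includes(c.support_status))
    .map((c) => c.claim_ref);
  // Assumption: an empty claim set is not vacuously grounded.
  const allSupported = claims.length > 0 && blocked.length === 0;
  return {
    all_in_scope_claims_supported: allSupported,
    blocked_by_claim_refs: blocked,
    clean_satisfied_allowed: !req.requires_affirmative_grounding || allSupported,
    forced_state_if_not_allowed: req.missing_or_blocked_source_route,
  };
}

// RULE affirmative_grounding_verdict_floor: "satisfied" is replaced by the forced state.
function applyGroundingFloor(proposed: OverallState, gate: ProofGateOutcome): OverallState {
  return proposed === "satisfied" && !gate.clean_satisfied_allowed ? gate.forced_state_if_not_allowed : proposed;
}
```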
------
### AB-T-02 — Assurance/quorum/budget verdict floor
**Disposition:** Partial. Adopt, but not as a new `OutcomeEvaluationState`.
The card is right to reject a new `satisfied_downgraded` state. `OutcomeEvaluationState` already carries runtime/disposition semantics; adding a downgraded-pass state would blur state, assurance, and UI presentation. The existing V3.3.1 enum already separates dispositions like `satisfied`, `needs_information`, `needs_verification`, and `needs_human_judgment`. Assurance mismatch is not itself an outcome state; it is a **verdict floor / presentation constraint**.
But the card’s field-only proposal is too thin. A coding agent needs a derivation object, a “clean verdict allowed” boolean, and routes for non-degradable assurance miss.
**Spec insert: Common Contracts §3.9 — Assurance execution and verdict floor**
```ts
type AssuranceDowngradeKind =
| "model_class_downgrade"
| "specialist_unavailable"
| "panel_quorum_not_met"
| "source_verification_unavailable"
| "budget_or_time_cap"
| "tool_unavailable"
| "policy_disallowed_required_assurance";
type AssuranceSatisfactionStatus =
| "met"
| "downgraded_policy_accepted"
| "downgraded_needs_review"
| "not_met";
type VerdictPresentationStatus =
| "clean_pass"
| "pass_with_disclosed_downgrade"
| "blocked_no_clean_verdict";
interface AssuranceExecutionRecord {
assurance_execution_id: string;
result_id: string;
outcome_id: string;
target_assurance_basis: AssuranceBasis[];
minimum_required_assurance_basis: AssuranceBasis[];
executed_assurance_basis: AssuranceBasis[];
assurance_status: AssuranceSatisfactionStatus;
downgrade_kind?: AssuranceDowngradeKind;
downgrade_allowed_by_policy_ref?: PolicyEvaluationRef;
downgrade_accepted_by_user_ref?: StorageRef;
clean_verdict_allowed: boolean;
presentation_status: VerdictPresentationStatus;
explanation: string;
evidence_refs: StorageRef[];
created_at: ISO8601;
schema_version: "1.0";
}
```
**Modify EvaluationResultEnvelope:**
```ts
interface EvaluationResultEnvelope {
// existing fields...
assurance_execution_ref?: StorageRef; // AssuranceExecutionRecord
// DO NOT add "satisfied_downgraded" to OutcomeEvaluationState.
}
```
**Normative rule:**
```ts
RULE assurance_verdict_floor:
IF AssuranceExecutionRecord.assurance_status == "met":
clean_verdict_allowed = true;
presentation_status = "clean_pass";
IF assurance_status == "downgraded_policy_accepted":
EvaluationResultEnvelope.overall_state MAY be "satisfied";
clean_verdict_allowed = false;
presentation_status = "pass_with_disclosed_downgrade";
TaskReliancePacket MUST include "assurance_downgrade" as a limitation.
IF assurance_status IN ("downgraded_needs_review", "not_met"):
EvaluationResultEnvelope.overall_state MUST NOT be "satisfied";
route to "needs_verification" or "needs_human_judgment" per policy.
```
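The rule can be sketched as one pure function over the assurance status. Two choices here are assumptions not stated in the rule: non-`satisfied` proposals pass through unchanged, and the review and `not_met` cases present as `"blocked_no_clean_verdict"` with an `"assurance_downgrade"` limitation.

```typescript
type AssuranceSatisfactionStatus = "met" | "downgraded_policy_accepted" | "downgraded_needs_review" | "not_met";
type VerdictPresentationStatus = "clean_pass" | "pass_with_disclosed_downgrade" | "blocked_no_clean_verdict";
type OverallState = "satisfied" | "needs_verification" | "needs_information" | "needs_human_judgment";
type ReviewRoute = "needs_verification" | "needs_human_judgment";

interface AssuranceFloorDecision {
  overall_state: OverallState;
  clean_verdict_allowed: boolean;
  presentation_status: VerdictPresentationStatus;
  reliance_limitations: string[]; // carried into TaskReliancePacket
}

function applyAssuranceVerdictFloor(
  status: AssuranceSatisfactionStatus,
  proposed: OverallState,
  routeIfBlocked: ReviewRoute, // chosen "per policy"
): AssuranceFloorDecision {
  if (status === "met") {
    return { overall_state: proposed, clean_verdict_allowed: true, presentation_status: "clean_pass", reliance_limitations: [] };
  }
  if (status === "downgraded_policy_accepted") {
    // "satisfied" is allowed, but never as a clean pass.
    return {
      overall_state: proposed,
      clean_verdict_allowed: false,
      presentation_status: "pass_with_disclosed_downgrade",
      reliance_limitations: ["assurance_downgrade"],
    };
  }
  // downgraded_needs_review | not_met: "satisfied" is forbidden.
  return {
    overall_state: proposed === "satisfied" ? routeIfBlocked : proposed,
    clean_verdict_allowed: false,
    presentation_status: "blocked_no_clean_verdict",
    reliance_limitations: ["assurance_downgrade"],
  };
}
```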
**Plan-review quorum: Task Forum §8.5A**
```ts
type RequiredParticipantRole =
| "domain_specialist"
| "source_verifier"
| "format_checker"
| "risk_reviewer"
| "task_agent"
| "human_user"
| "custom";
interface RequiredQuorumManifest {
manifest_id: string;
room_id: string;
plan_id?: string;
run_id: string;
required_participants: Array<{
role: RequiredParticipantRole;
participant_ref?: string;
min_count: number;
mandatory: boolean;
}>;
present_participants: Array<{
role: RequiredParticipantRole;
participant_ref: string;
attended: boolean;
vote_ref?: StorageRef;
}>;
quorum_satisfied: boolean;
missing_required_participants: Array<{
role: RequiredParticipantRole;
participant_ref?: string;
missing_count: number;
}>;
decision_allowed_clean: boolean;
degraded_decision_allowed_by_policy_ref?: PolicyEvaluationRef;
created_at: ISO8601;
schema_version: "1.0";
}
```
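A sketch of how `quorum_satisfied` and `missing_required_participants` could be derived rather than asserted. Two simplifications are assumptions: only `mandatory` entries count toward quorum, and one attendee may satisfy two required entries that share a role.

```typescript
type RequiredParticipantRole =
  | "domain_specialist" | "source_verifier" | "format_checker" | "risk_reviewer"
  | "task_agent" | "human_user" | "custom";

interface RequiredParticipant {
  role: RequiredParticipantRole;
  participant_ref?: string;
  min_count: number;
  mandatory: boolean;
}

interface PresentParticipant {
  role: RequiredParticipantRole;
  participant_ref: string;
  attended: boolean;
}

interface MissingParticipant {
  role: RequiredParticipantRole;
  participant_ref?: string;
  missing_count: number;
}

function evaluateQuorum(
  required: RequiredParticipant[],
  present: PresentParticipant[],
): { quorum_satisfied: boolean; missing_required_participants: MissingParticipant[] } {
  const attended = present.filter((p) => p.attended);
  const missing: MissingParticipant[] = [];
  for (const req of required) {
    if (!req.mandatory) continue; // assumption: optional seats never block quorum
    const count = attended.filter(
      (p) => p.role === req.role && (req.participant_ref === undefined || p.participant_ref === req.participant_ref),
    ).length;
    if (count < req.min_count) {
      missing.push({ role: req.role, participant_ref: req.participant_ref, missing_count: req.min_count - count });
    }
  }
  return { quorum_satisfied: missing.length === 0, missing_required_participants: missing };
}
```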
**Budget/interruption: V3.3.1 §11.13A**
```ts
interface RevisionInterruptRollbackReceipt {
receipt_id: string;
run_id: string;
plan_id: string;
base_candidate_version_ref: CandidateArtifactVersionRef;
interrupted_candidate_version_ref?: CandidateArtifactVersionRef;
completed_step_ids: string[];
incomplete_step_ids: string[];
interrupt_reason:
| "max_llm_calls"
| "budget_exhausted"
| "timeout"
| "user_cancelled"
| "local_resource_exhausted"
| "policy_blocked";
rolled_back_to_base: boolean;
graph_state_rollback_ref: StorageRef;
clean_revision_allowed: false;
created_at: ISO8601;
schema_version: "1.0";
}
```
**Lints:**
```ts
validation.envelope.executed_below_target_assurance_presented_as_clean
validation.forum.quorum_unsatisfied_returned_as_approved
validation.revision.partial_plan_committed_after_interrupt
validation.assurance_execution_record_missing_for_target_assurance
```
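The `validation.revision.partial_plan_committed_after_interrupt` lint can be sketched against the rollback receipt. Version refs are treated as opaque strings here, which is an assumption about `CandidateArtifactVersionRef`.

```typescript
// Minimal slice of RevisionInterruptRollbackReceipt; refs assumed to be opaque strings.
interface RollbackReceiptSlice {
  base_candidate_version_ref: string;
  interrupted_candidate_version_ref?: string;
  incomplete_step_ids: string[];
  rolled_back_to_base: boolean;
}

// If any plan step is incomplete, the rollback must have happened and the committed version must be the base.
function partialPlanCommittedAfterInterrupt(r: RollbackReceiptSlice, committedVersionRef: string): boolean {
  if (r.incomplete_step_ids.length === 0) return false;
  return !r.rolled_back_to_base || committedVersionRef !== r.base_candidate_version_ref;
}
```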
------
### AB-T-03 — Ambiguous matter-resolution hold gate
**Disposition:** Agree with `ADOPT`.
This is a privilege-boundary bug. The card identifies the right distinction: the matter firewall protects after assignment; it does not protect against bad assignment. The fix should produce a reusable binding object, not just a one-off email trigger field.
**Spec insert: Core §13A.9 — MatterResolution**
```ts
type MatterResolutionStatus =
| "resolved"
| "ambiguous_hold"
| "unresolved_hold"
| "not_matter_scoped";
interface MatterResolutionCandidate {
matter_ref: EntityRef;
confidence: number; // 0..1
evidence_refs: StorageRef[];
match_features: string[];
}
interface MatterResolution {
resolution_id: string;
trigger_ref: StorageRef;
trigger_kind:
| "email"
| "calendar_event"
| "document"
| "task_invocation"
| "user_instruction"
| "connector_event";
candidate_matter_refs: MatterResolutionCandidate[];
top_confidence?: number;
separation?: number; // top_confidence - next_confidence
confidence_floor: number;
separation_floor: number;
status: MatterResolutionStatus;
resolved_matter_ref?: EntityRef;
hold_reason?:
| "multiple_candidates_above_floor"
| "top_confidence_below_floor"
| "privileged_scope_requires_user_assignment"
| "conflicting_evidence"
| "no_candidate";
user_assignment_required: boolean;
user_assignment_ref?: StorageRef;
created_at: ISO8601;
schema_version: "1.0";
}
```
**Normative rule:**
```ts
RULE no_privileged_autobind_on_ambiguous_matter:
IF trigger is privileged OR matter-scoped
AND MatterResolution.status != "resolved":
the system MUST quarantine the trigger into a "needs matter assignment" holding area;
MUST NOT route privileged content into a task, workspace, forum, or memory scope.
```
**Lints:**
```ts
validation.task.auto_bound_matter_on_ambiguous_resolution
validation.trigger.matter_resolution_missing_status
validation.privileged_trigger_routed_without_resolved_matter
```
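A minimal routing sketch for `no_privileged_autobind_on_ambiguous_matter`, assuming the trigger's privileged and matter-scoped flags are already known. `routeTrigger` and the `TriggerRoute` union are hypothetical; a `"resolved"` status without a matter ref is treated as unresolved.

```typescript
type MatterResolutionStatus = "resolved" | "ambiguous_hold" | "unresolved_hold" | "not_matter_scoped";

type TriggerRoute =
  | { kind: "route_to_matter"; matter_id: string }
  | { kind: "quarantine_needs_assignment" }
  | { kind: "route_unscoped" };

interface TriggerContext {
  privileged: boolean;
  matter_scoped: boolean;
  status: MatterResolutionStatus;
  resolved_matter_id?: string;
}

function routeTrigger(t: TriggerContext): TriggerRoute {
  if (t.status === "resolved" && t.resolved_matter_id !== undefined) {
    return { kind: "route_to_matter", matter_id: t.resolved_matter_id };
  }
  // Privileged or matter-scoped content never reaches a task, workspace, forum, or memory scope unresolved.
  if (t.privileged || t.matter_scoped) return { kind: "quarantine_needs_assignment" };
  return { kind: "route_unscoped" };
}
```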
------
### AB-T-04 — HumanGateSummary and decision log
**Disposition:** Agree with `FOLD-INTO-R0.4`.
This belongs in the same R0.4 contract-hardening bundle as `HumanOutcomeFeedbackEvent`, `HardCallResolution`, and the prior “all meaningful UI actions must map to durable command/receipt” discipline.
**Spec insert: Common Contracts §10A — Human gate records**
```ts
type HumanGateDecision =
| "approved"
| "rejected"
| "approved_with_limitations"
| "requested_changes"
| "deferred"
| "cancelled";
interface HumanGateSummary {
gate_id: string;
gate_kind:
| "plan_review"
| "assurance_downgrade_acceptance"
| "external_side_effect_approval"
| "hard_call_resolution"
| "candidate_version_acceptance"
| "library_promotion"
| "memory_promotion"
| "matter_assignment";
required: boolean;
status:
| "not_required"
| "pending"
| "satisfied"
| "rejected"
| "expired"
| "superseded";
decision_record_ref?: StorageRef;
blocking_scope:
| "entire_run"
| "segment"
| "artifact"
| "outcome"
| "module"
| "side_effect_only"
| "none";
created_at: ISO8601;
updated_at: ISO8601;
schema_version: "1.0";
}
interface HumanGateDecisionRecord {
decision_record_id: string;
gate_id: string;
decider_ref: string;
decision: HumanGateDecision;
rationale?: string;
standard_applied?: string;
shown_refs: StorageRef[];
weighed_refs?: StorageRef[];
policy_decision_refs: PolicyEvaluationRef[];
decided_at: ISO8601;
schema_version: "1.0";
}
```
**Lints:**
```ts
validation.core.referenced_type_undefined
validation.gate.decision_log_required_without_record_schema
validation.human_gate_summary_without_decision_record
```
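The third lint can be sketched as a referential check: a decided gate must point at a decision record that exists and belongs to the same gate. Treating only `"satisfied"` and `"rejected"` as decided statuses, and `StorageRef` as a string key, are assumptions.

```typescript
type HumanGateStatus = "not_required" | "pending" | "satisfied" | "rejected" | "expired" | "superseded";

interface GateSummarySlice {
  gate_id: string;
  required: boolean;
  status: HumanGateStatus;
  decision_record_ref?: string; // StorageRef assumed to be a string key
}

interface DecisionRecordSlice {
  decision_record_id: string;
  gate_id: string;
}

// Lint sketch: validation.human_gate_summary_without_decision_record
function gateWithoutDecisionRecord(s: GateSummarySlice, records: Map<string, DecisionRecordSlice>): boolean {
  if (s.status !== "satisfied" && s.status !== "rejected") return false;
  if (s.decision_record_ref === undefined) return true;
  const rec = records.get(s.decision_record_ref);
  return rec === undefined || rec.gate_id !== s.gate_id;
}
```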
------
### AB-T-05 — Risk-based minimum source documentation tier
**Disposition:** Agree with `ADOPT`; strengthen as a policy matrix tied to claim support.
This is not satisfied by the current Source Workspace. The operative spec has `SourceDocumentationMode`, a tier table, and `documentation_mode` governing default tier selection. It also defines tier 0 through tier 4. But there is no rule that a filing-bound, privileged, legal-authority, or evaluator-load-bearing source must meet a minimum tier before it supports a claim. The card’s “net-new” call is right.
**Spec insert: Source Workspace §3.4 — Minimum documentation tier policy**
```ts
type SourceUseRiskClass =
| "casual_lookup"
| "internal_context"
| "drafting_background"
| "claim_support"
| "evaluator_load_bearing"
| "legal_authority"
| "filing_bound"
| "external_publication"
| "privileged_matter"
| "memory_or_library_promotion";
type SourceDocumentationTier = 0 | 1 | 2 | 3 | 4;
interface MinimumDocumentationTierRule {
risk_class: SourceUseRiskClass;
min_tier: SourceDocumentationTier;
require_verification_state?: Array<
| "verified"
| "partially_verified"
| "unverified"
| "failed"
>;
require_freshness_state?: Array<
| "fresh"
| "unknown"
| "needs_update"
| "stale"
>;
on_subminimum:
| "promote_source_tier"
| "needs_verification"
| "needs_information"
| "block_clean_pass"
| "allow_with_disclosed_limitation";
policy_ref?: string;
schema_version: "1.0";
}
interface MinimumDocumentationTierPolicy {
policy_id: string;
default_rules: MinimumDocumentationTierRule[];
matter_override_rules?: Array<{
matter_id: string;
rules: MinimumDocumentationTierRule[];
}>;
created_at: ISO8601;
schema_version: "1.0";
}
const DEFAULT_MIN_DOCUMENTATION_TIER_POLICY: MinimumDocumentationTierPolicy = {
policy_id: "default.source_min_tier.v1",
default_rules: [
{
risk_class: "casual_lookup",
min_tier: 0,
on_subminimum: "allow_with_disclosed_limitation",
schema_version: "1.0"
},
{
risk_class: "drafting_background",
min_tier: 1,
on_subminimum: "promote_source_tier",
schema_version: "1.0"
},
{
risk_class: "claim_support",
min_tier: 2,
require_verification_state: ["verified", "partially_verified"],
on_subminimum: "needs_verification",
schema_version: "1.0"
},
{
risk_class: "evaluator_load_bearing",
min_tier: 2,
require_verification_state: ["verified"],
on_subminimum: "block_clean_pass",
schema_version: "1.0"
},
{
risk_class: "legal_authority",
min_tier: 3,
require_verification_state: ["verified"],
require_freshness_state: ["fresh"],
on_subminimum: "block_clean_pass",
schema_version: "1.0"
},
{
risk_class: "filing_bound",
min_tier: 3,
require_verification_state: ["verified"],
require_freshness_state: ["fresh"],
on_subminimum: "block_clean_pass",
schema_version: "1.0"
},
{
risk_class: "privileged_matter",
min_tier: 2,
on_subminimum: "needs_verification",
schema_version: "1.0"
},
{
risk_class: "memory_or_library_promotion",
min_tier: 3,
require_verification_state: ["verified"],
on_subminimum: "block_clean_pass",
schema_version: "1.0"
}
],
created_at: "<set_at_policy_install>",
schema_version: "1.0"
};
```
**Source-use declaration:**
```ts
interface SourceUseDeclaration {
declaration_id: string;
source_record_ref: SourceRecordRef;
used_for:
| "claim_support"
| "background_context"
| "citation"
| "legal_authority"
| "evaluator_basis"
| "library_promotion"
| "memory_promotion";
risk_classes: SourceUseRiskClass[];
target_claim_refs: ClaimRef[];
target_artifact_scope_refs: ArtifactScopeRef[];
observed_tier: SourceDocumentationTier;
required_min_tier: SourceDocumentationTier;
verification_state_at_use:
| "verified"
| "partially_verified"
| "unverified"
| "failed";
freshness_state_at_use:
| "fresh"
| "stale"
| "needs_update"
| "unknown";
policy_result:
| "allowed"
| "allowed_with_disclosed_limitation"
| "requires_tier_promotion"
| "blocked_clean_pass"
| "needs_verification"
| "needs_information";
policy_ref: string;
created_at: ISO8601;
schema_version: "1.0";
}
```
**Lints:**
```ts
validation.source.load_bearing_claim_supported_by_subminimum_tier
validation.source.tier0_used_as_claim_support
validation.source.legal_authority_used_without_fresh_verified_tier3
validation.source.use_declaration_missing_for_claim_support
```
This modifies my prior R2 package by adding a real **risk matrix**, not only tier hygiene.
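A sketch of evaluating a `SourceUseDeclaration.policy_result` against the matrix. Several choices are assumptions: a matter override replaces the default rule for that class; `on_subminimum` also applies to verification and freshness misses; the strictest result across multiple risk classes wins by the severity order below; and a class with no configured rule (the draft defaults leave `internal_context` and `external_publication` unset) imposes no constraint.

```typescript
type SourceUseRiskClass =
  | "casual_lookup" | "internal_context" | "drafting_background" | "claim_support" | "evaluator_load_bearing"
  | "legal_authority" | "filing_bound" | "external_publication" | "privileged_matter" | "memory_or_library_promotion";
type SourceDocumentationTier = 0 | 1 | 2 | 3 | 4;
type VerificationState = "verified" | "partially_verified" | "unverified" | "failed";
type FreshnessState = "fresh" | "stale" | "needs_update" | "unknown";
type OnSubminimum =
  | "promote_source_tier" | "needs_verification" | "needs_information" | "block_clean_pass" | "allow_with_disclosed_limitation";
type PolicyResult =
  | "allowed" | "allowed_with_disclosed_limitation" | "requires_tier_promotion"
  | "blocked_clean_pass" | "needs_verification" | "needs_information";

interface TierRule {
  risk_class: SourceUseRiskClass;
  min_tier: SourceDocumentationTier;
  require_verification_state?: VerificationState[];
  require_freshness_state?: FreshnessState[];
  on_subminimum: OnSubminimum;
}

const RESULT_FOR: Record<OnSubminimum, PolicyResult> = {
  promote_source_tier: "requires_tier_promotion",
  needs_verification: "needs_verification",
  needs_information: "needs_information",
  block_clean_pass: "blocked_clean_pass",
  allow_with_disclosed_limitation: "allowed_with_disclosed_limitation",
};

// Assumed severity order used to pick the strictest result across risk classes.
const SEVERITY: Record<PolicyResult, number> = {
  allowed: 0, allowed_with_disclosed_limitation: 1, requires_tier_promotion: 2,
  needs_information: 3, needs_verification: 4, blocked_clean_pass: 5,
};

function evaluateSourceUse(
  riskClasses: SourceUseRiskClass[],
  observedTier: SourceDocumentationTier,
  verification: VerificationState,
  freshness: FreshnessState,
  defaults: TierRule[],
  matterOverrides?: TierRule[],
): PolicyResult {
  let result: PolicyResult = "allowed";
  for (const cls of riskClasses) {
    const rule = matterOverrides?.find((r) => r.risk_class === cls) ?? defaults.find((r) => r.risk_class === cls);
    if (rule === undefined) continue; // unconfigured class: no constraint in this sketch
    const meets =
      observedTier >= rule.min_tier &&
      (rule.require_verification_state?.includes(verification) ?? true) &&
      (rule.require_freshness_state?.includes(freshness) ?? true);
    if (!meets) {
      const candidate = RESULT_FOR[rule.on_subminimum];
      if (SEVERITY[candidate] > SEVERITY[result]) result = candidate;
    }
  }
  return result;
}

// Subset of the draft default rules, for illustration.
const SAMPLE_RULES: TierRule[] = [
  { risk_class: "casual_lookup", min_tier: 0, on_subminimum: "allow_with_disclosed_limitation" },
  { risk_class: "claim_support", min_tier: 2, require_verification_state: ["verified", "partially_verified"], on_subminimum: "needs_verification" },
  { risk_class: "filing_bound", min_tier: 3, require_verification_state: ["verified"], require_freshness_state: ["fresh"], on_subminimum: "block_clean_pass" },
];
```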
------
### AB-T-06 — ModuleDecisionRationale
**Disposition:** Partial. Adopt lightweight and materiality-scoped.
The gap is real: Evaluator/Revisor have structured assurance, states, failures, and receipts; ordinary modules may leave only prompt/output artifacts. But requiring a rationale for every module action would add overhead and tempt hidden chain-of-thought capture. The right contract is a **rationale summary for material decisions only**.
**Spec insert: Common Contracts §10B — ModuleDecisionRationale**
```ts
type ModuleDecisionKind =
| "content_structure_choice"
| "argument_or_strategy_choice"
| "source_selection"
| "tool_choice"
| "degradation_choice"
| "routing_choice"
| "scope_choice"
| "artifact_mutation_choice"
| "externalization_choice"
| "not_recorded";
type ModuleDecisionMateriality =
| "non_material"
| "material_to_artifact"
| "material_to_verdict"
| "material_to_source_support"
| "material_to_policy_or_privilege"
| "material_to_external_side_effect";
interface ModuleDecisionRationale {
decision_id: string;
task_id: string;
run_id: string;
module_id: string;
module_activation_seq: number;
decision_kind: ModuleDecisionKind;
materiality: ModuleDecisionMateriality;
input_refs: StorageRef[];
output_refs: StorageRef[];
rationale_summary: string; // user-visible; no hidden reasoning
alternatives_considered_summary?: string; // optional, brief
policy_decision_refs: PolicyEvaluationRef[];
source_workspace_snapshot_ref?: StorageRef;
evaluation_snapshot_ref?: StorageRef;
hidden_reasoning_included: false;
created_at: ISO8601;
schema_version: "1.0";
}
```
**Normative rule:**
```ts
RULE module_decision_rationale_required:
A ModuleDecisionRationale is REQUIRED when materiality is any value other than "non_material".
It is OPTIONAL for non-material routine operations.
It MUST NOT include hidden chain-of-thought; rationale_summary is a user-visible explanation.
```
**Lints:**
```ts
validation.material_module_decision_without_rationale
validation.module_decision_rationale_contains_hidden_reasoning
validation.module_decision_rationale_missing_input_or_output_ref
```
This should be integrated with the prior R2 `ExplanationTrace` discipline.
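The `module_decision_rationale_required` rule and its three lints can be checked mechanically. Below is a minimal TypeScript sketch. It assumes a hypothetical `RecordedModuleDecision` capture shape in which the rationale may be absent, which gives the lint something to catch. `StorageRef` is a plain-string stand-in. It is an illustration, not the spec's validator.

```ts
// Local stand-in: the spec's StorageRef type is not defined in this section.
type StorageRef = string;

type ModuleDecisionMateriality =
  | "non_material"
  | "material_to_artifact"
  | "material_to_verdict"
  | "material_to_source_support"
  | "material_to_policy_or_privilege"
  | "material_to_external_side_effect";

// Hypothetical capture record. hidden_reasoning_included is widened to boolean
// so that a non-conforming record (e.g. parsed from JSON) can be detected.
interface RecordedModuleDecision {
  decision_id: string;
  materiality: ModuleDecisionMateriality;
  rationale?: {
    rationale_summary: string;
    input_refs: StorageRef[];
    output_refs: StorageRef[];
    hidden_reasoning_included: boolean;
  };
}

// Returns the lint IDs raised by a single decision record.
function lintModuleDecision(d: RecordedModuleDecision): string[] {
  const lints: string[] = [];
  if (d.materiality !== "non_material" && d.rationale === undefined) {
    lints.push("validation.material_module_decision_without_rationale");
  }
  if (d.rationale !== undefined) {
    if (d.rationale.hidden_reasoning_included) {
      lints.push("validation.module_decision_rationale_contains_hidden_reasoning");
    }
    if (d.rationale.input_refs.length === 0 || d.rationale.output_refs.length === 0) {
      lints.push("validation.module_decision_rationale_missing_input_or_output_ref");
    }
  }
  return lints;
}
```

A non-material routine operation with no rationale passes. The same record marked `material_to_verdict` raises `validation.material_module_decision_without_rationale`.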
------
### AB-T-07 — CitationManifest
**Disposition:** Change `DISCUSS` to **scoped ADOPT**.
The card is right that a `CitationManifest` is stronger than post-hoc `EvidencePackage`. I would not require it for all text generation/mutation. Require it for **claim-bearing final/public/filing-bound text** and for claims that later feed AB-T-01’s proof gate.
**Spec insert: Common Contracts §7A — CitationManifest**
```ts
type CitationManifestScope =
| "final_artifact"
| "filing_bound_artifact"
| "external_publication"
| "claim_bearing_section"
| "draft_internal_optional";
interface CitationManifestEntry {
entry_id: string;
claim_ref: ClaimRef;
artifact_scope_ref: ArtifactScopeRef;
source_record_ref: SourceRecordRef;
source_tier_at_write: SourceDocumentationTier;
support_anchor_ref: StorageRef; // SourceEvidenceAnchor
quote_or_excerpt_ref?: StorageRef;
support_strength:
| "direct"
| "indirect"
| "contextual"
| "contradicts";
verification_state_at_write:
| "verified"
| "partially_verified"
| "unverified"
| "failed";
freshness_state_at_write:
| "fresh"
| "stale"
| "needs_update"
| "unknown";
created_by_module_id: string;
created_by_activation_seq: number;
schema_version: "1.0";
}
interface CitationManifest {
manifest_id: string;
task_id: string;
run_id: string;
artifact_ref: StorageRef;
artifact_version_ref: StorageRef;
manifest_scope: CitationManifestScope;
entries: CitationManifestEntry[];
coverage_status:
| "complete_for_claim_bearing_text"
| "partial"
| "not_required_internal_draft"
| "claim_extractor_unavailable";
created_at: ISO8601;
schema_version: "1.0";
}
```
**Normative rule:**
```ts
RULE citation_manifest_required_for_high_risk_text:
A CitationManifest is REQUIRED for final_artifact, filing_bound_artifact,
external_publication, and claim_bearing_section scopes.
For draft_internal_optional scopes, it MAY be omitted unless the draft is used
as an evaluator load-bearing source.
```
**Lints:**
```ts
validation.claim_bearing_final_artifact_without_citation_manifest
validation.citation_manifest_entry_source_below_min_tier
validation.citation_manifest_claim_without_support_anchor
```
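The scope rule and the entry lints can be combined into one check. This TypeScript sketch makes two assumptions. First, `SourceDocumentationTier` is modelled as an integer, where higher means stronger documentation. Second, `ManifestEntryLite` is a hypothetical reduced entry that carries only the fields the lints read.

```ts
type CitationManifestScope =
  | "final_artifact"
  | "filing_bound_artifact"
  | "external_publication"
  | "claim_bearing_section"
  | "draft_internal_optional";

// Scopes for which the normative rule makes a manifest mandatory.
const MANIFEST_REQUIRED_SCOPES: ReadonlySet<CitationManifestScope> = new Set<CitationManifestScope>([
  "final_artifact",
  "filing_bound_artifact",
  "external_publication",
  "claim_bearing_section",
]);

// Reduced entry (assumption): tier as an integer, higher = stronger.
interface ManifestEntryLite {
  claim_ref: string;
  source_tier_at_write: number;
  support_anchor_ref?: string;
}

interface ManifestLint {
  lint: string;
  claim_ref?: string;
}

function lintCitationManifest(
  scope: CitationManifestScope,
  usedAsEvaluatorLoadBearingSource: boolean,
  entries: ManifestEntryLite[] | undefined,
  minTier: number,
): ManifestLint[] {
  // A draft becomes manifest-bound once an evaluator relies on it.
  const required = MANIFEST_REQUIRED_SCOPES.has(scope) || usedAsEvaluatorLoadBearingSource;
  if (entries === undefined) {
    // The load-bearing-draft case reuses the first lint ID here (assumption).
    return required
      ? [{ lint: "validation.claim_bearing_final_artifact_without_citation_manifest" }]
      : [];
  }
  const lints: ManifestLint[] = [];
  for (const e of entries) {
    if (e.source_tier_at_write < minTier) {
      lints.push({ lint: "validation.citation_manifest_entry_source_below_min_tier", claim_ref: e.claim_ref });
    }
    if (e.support_anchor_ref === undefined) {
      lints.push({ lint: "validation.citation_manifest_claim_without_support_anchor", claim_ref: e.claim_ref });
    }
  }
  return lints;
}
```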
------
### AB-T-08 — CausalProof ledger
**Disposition:** Agree with `DISCUSS`; lean against hard gate.
Do not add a `CausalProof` hard gate. It would often be an LLM asserting proof, not a proof. The useful, buildable piece is a lightweight change-rationale record that links finding → revision step → changed scope.
**Spec insert: V3.3.1 §7.5A — RevisionChangeRationale**
```ts
interface RevisionChangeRationale {
rationale_id: string;
plan_id: string;
step_id: string;
source_finding_refs: string[]; // EvaluationFinding.finding_id
changed_scope_refs: ArtifactScopeRef[];
rationale_summary: string; // user-visible
intended_effect:
| "resolve_finding"
| "reduce_severity"
| "preserve_constraint"
| "avoid_regression"
| "improve_support"
| "format_only"
| "other";
confidence:
| "low"
| "medium"
| "high";
hidden_reasoning_included: false;
created_at: ISO8601;
schema_version: "1.0";
}
```
**Lint:**
```ts
validation.meaning_bearing_revision_without_change_rationale
```
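The lint only has to check that each meaning-bearing step has a linked rationale. It does not judge whether the rationale is correct, which is exactly what keeps this short of a `CausalProof` gate. In this sketch, the `meaning_bearing` flag on `RevisionStepLite` is a hypothetical input that is assumed to be decided upstream, for example as any step that is not a format-only edit.

```ts
// Hypothetical reduced step record; meaning_bearing is decided upstream.
interface RevisionStepLite {
  step_id: string;
  meaning_bearing: boolean;
}

interface RevisionChangeRationaleLite {
  rationale_id: string;
  step_id: string;
  source_finding_refs: string[];
}

// Returns the step_ids that should raise
// validation.meaning_bearing_revision_without_change_rationale.
function stepsMissingChangeRationale(
  steps: RevisionStepLite[],
  rationales: RevisionChangeRationaleLite[],
): string[] {
  const covered = new Set(rationales.map((r) => r.step_id));
  return steps
    .filter((s) => s.meaning_bearing && !covered.has(s.step_id))
    .map((s) => s.step_id);
}
```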
------
### AB-T-09 and AB-T-11 — Portfolio governor and canonical isolation unit
**Disposition:** Partial. Keep `DISCUSS` as architect-stop for the full governor, but define an MVP that does not violate §2.2’s anti-central-orchestrator rule. AB-T-11 should be promoted to **ADOPT as a policy decision** because the matter-vs-context-class inconsistency is a correctness issue. The card’s AB-T-09 list of scaling tail risks is real: unbounded assessment arrays, single attention/learning queues, machine-wide contention, no portfolio cost ceiling, and extraction debt all show up as a 40-matter problem. AB-T-11 is the isolation seam behind that problem.
**Canonical isolation answer:** do not choose “matter vs context-class.” Define a `ContextBoundaryRef` where **matter dominates context class** for legal/privileged work. Context class can be a secondary bucket only inside the governing boundary.
**Spec insert: Core §3A.9 — ContextBoundaryRef**
```ts
type IsolationBoundaryKind =
| "privileged_matter"
| "matter"
| "client"
| "project"
| "task_family"
| "context_class"
| "global_nonprivileged";
interface ContextBoundaryRef {
boundary_id: string;
boundary_kind: IsolationBoundaryKind;
matter_id?: string;
client_id?: string;
project_id?: string;
task_family_id?: string;
context_class_key?: string;
privileged: boolean;
canonical_for:
| "pattern_promotion"
| "suggestion_learning"
| "invocation_learning"
| "attention_rollup"
| "budget_governance"
| "append_log_partition"
| "source_workspace";
parent_boundary_ref?: string;
schema_version: "1.0";
}
```
**Normative isolation rule:**
```ts
RULE matter_boundary_dominates_context_class:
For privileged or matter-scoped work, matter_id / privileged_matter is the
primary isolation boundary.
context_class_key MAY be used only inside the matter/client/project boundary.
A rejection, suppression, preference, or learned invocation behavior from
one matter MUST NOT suppress or alter behavior in another matter unless it
passes a governed cross-boundary promotion policy.
```
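The isolation rule reduces to one predicate: may behavior learned under one boundary alter behavior under another? This TypeScript sketch reads only the `ContextBoundaryRef` fields the rule needs. The `crossBoundaryPromotionApproved` flag is a hypothetical stand-in for the governed promotion policy. Non-matter origins, such as global defaults, are treated as outside this rule, which is an assumption.

```ts
type IsolationBoundaryKind =
  | "privileged_matter" | "matter" | "client" | "project"
  | "task_family" | "context_class" | "global_nonprivileged";

// Only the ContextBoundaryRef fields this check reads.
interface BoundaryLite {
  boundary_kind: IsolationBoundaryKind;
  matter_id?: string;
  context_class_key?: string;
  privileged: boolean;
}

function isMatterScoped(b: BoundaryLite): boolean {
  return (
    b.boundary_kind === "privileged_matter" ||
    b.boundary_kind === "matter" ||
    b.privileged ||
    b.matter_id !== undefined
  );
}

// May a rejection, suppression, preference, or learned invocation behavior
// recorded under `origin` alter behavior under `target`?
function mayApplyLearnedBehavior(
  origin: BoundaryLite,
  target: BoundaryLite,
  crossBoundaryPromotionApproved: boolean,
): boolean {
  // Non-matter origins are outside this rule; other policies may still apply.
  if (!isMatterScoped(origin)) return true;
  // Matter dominates: a shared context_class_key alone never bridges matters.
  // A privileged origin with no matter_id never counts as "same matter".
  const sameMatter = origin.matter_id !== undefined && origin.matter_id === target.matter_id;
  return sameMatter || crossBoundaryPromotionApproved;
}
```

Two matters that share the context class `nda_review` stay isolated unless promotion is approved.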
**Thin governor MVP: Core §16B — PortfolioGovernor**
```ts
type PortfolioGovernorDecision =
| "allow_now"
| "queue"
| "defer_low_priority"
| "require_user_attention"
| "throttle_background_only"
| "block_due_to_policy";
interface PortfolioResourceSnapshot {
snapshot_id: string;
boundary_ref: ContextBoundaryRef;
active_run_count: number;
queued_run_count: number;
pending_hard_call_count: number;
pending_user_attention_count: number;
estimated_cost_vector_ref?: StorageRef;
actual_cost_vector_ref?: StorageRef;
local_parallelism_slots_used: number;
local_parallelism_slots_available: number;
extraction_backlog_count?: number;
stale_current_view_count?: number;
created_at: ISO8601;
schema_version: "1.0";
}
interface PortfolioAdmissionPolicy {
policy_id: string;
boundary_kind: IsolationBoundaryKind;
max_active_runs?: number;
max_background_runs?: number;
max_pending_attention_items?: number;
max_estimated_cost_vector_ref?: StorageRef;
fairness_policy:
| "matter_round_robin"
| "priority_then_age"
| "deadline_first"
| "user_pinned_first"
| "manual_only";
may_dispatch_work: false; // key anti-orchestrator invariant
schema_version: "1.0";
}
interface PortfolioAdmissionDecision {
decision_id: string;
request_ref: StorageRef;
boundary_ref: ContextBoundaryRef;
resource_snapshot_ref: StorageRef;
policy_ref: string;
decision: PortfolioGovernorDecision;
rationale_summary: string;
// The governor recommends/advises; EC or the existing task runtime enforces.
emits_command_ref?: string;
may_directly_mutate_task_graph: false;
may_directly_dispatch_module: false;
created_at: ISO8601;
schema_version: "1.0";
}
```
**Lints:**
```ts
validation.portfolio_governor_dispatches_work_directly
validation.context_class_learning_crosses_matter_without_policy
validation.privileged_matter_missing_context_boundary
validation.portfolio_attention_item_without_boundary
```
This changes my prior suggestions by making `ContextBoundaryRef` the canonical isolation primitive and by explicitly making the governor non-orchestrating.
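The anti-orchestrator property is easiest to see when the governor is written as a pure function. It reads a snapshot and a policy and returns a recommendation, and it holds no handle to the task graph or module dispatcher. In this TypeScript sketch, the `AdmissionRequestLite` request shape and the order of the checks are assumptions.

```ts
type PortfolioGovernorDecision =
  | "allow_now" | "queue" | "defer_low_priority"
  | "require_user_attention" | "throttle_background_only" | "block_due_to_policy";

interface ResourceSnapshotLite {
  active_run_count: number;
  pending_user_attention_count: number;
  local_parallelism_slots_available: number;
}

interface AdmissionPolicyLite {
  max_active_runs?: number;
  max_pending_attention_items?: number;
  may_dispatch_work: false;
}

// Hypothetical request shape.
interface AdmissionRequestLite {
  background: boolean;
  blocked_by_policy: boolean;
}

// Pure: no side effects, no dispatch. EC or the task runtime decides whether
// and how to honor the returned decision through existing command paths.
function decideAdmission(
  req: AdmissionRequestLite,
  snap: ResourceSnapshotLite,
  policy: AdmissionPolicyLite,
): PortfolioGovernorDecision {
  if (req.blocked_by_policy) return "block_due_to_policy";
  if (
    policy.max_pending_attention_items !== undefined &&
    snap.pending_user_attention_count >= policy.max_pending_attention_items
  ) {
    return "require_user_attention";
  }
  if (snap.local_parallelism_slots_available <= 0) {
    return req.background ? "throttle_background_only" : "queue";
  }
  if (policy.max_active_runs !== undefined && snap.active_run_count >= policy.max_active_runs) {
    return req.background ? "defer_low_priority" : "queue";
  }
  return "allow_now";
}
```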
------
### AB-T-10 — Per-matter privileged append-log sharding
**Disposition:** Change `DISCUSS` to **ADOPT**, if `context_feedback.jsonl` / `task_audit_events.jsonl` contain privileged or matter-scoped records. This is cheap now and very expensive later.
**Spec insert: Core §22.1A — AppendLogPartitionPolicy**
```ts
type AppendLogPartitionKind =
| "global_nonprivileged"
| "by_matter"
| "by_client"
| "by_context_boundary"
| "local_only";
interface AppendLogPartitionPolicy {
policy_id: string;
log_kind:
| "context_feedback"
| "task_audit_events"
| "run_board_events"
| "feedback_consumption_receipts"
| "learning_signal_spool";
partition_kind: AppendLogPartitionKind;
partition_key_source:
| "matter_id"
| "client_id"
| "context_boundary_ref"
| "data_class"
| "none";
privileged_requires_partition: boolean;
local_only_requires_partition: boolean;
export_policy:
| "export_per_partition"
| "export_global_with_redaction"
| "export_blocked";
legal_hold_scope:
| "partition"
| "global_log"
| "matter"
| "client";
schema_version: "1.0";
}
interface AppendLogPartitionRef {
log_kind: AppendLogPartitionPolicy["log_kind"];
partition_kind: AppendLogPartitionKind;
partition_key: string;
physical_path_ref: StorageRef;
governance: GovernanceEnvelope;
schema_version: "1.0";
}
```
**Lints:**
```ts
validation.privileged_event_written_to_global_unpartitioned_log
validation.append_log_partition_missing_for_matter_scoped_event
```
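Partition resolution can be made fail-closed: a privileged event with no matter key is refused rather than routed to the global log. This TypeScript sketch rests on three assumptions. `LogEventLite` is a hypothetical event header, matter takes precedence over client, and the file-naming scheme in `partitionFileName` is purely illustrative.

```ts
type AppendLogPartitionKind =
  | "global_nonprivileged" | "by_matter" | "by_client" | "by_context_boundary" | "local_only";

// Hypothetical event header: only the fields partitioning reads.
interface LogEventLite {
  log_kind: string;
  privileged: boolean;
  matter_id?: string;
  client_id?: string;
}

type PartitionResult =
  | { ok: true; partition_kind: AppendLogPartitionKind; partition_key: string }
  | { ok: false; lint: string };

// Matter wins over client; privileged events never fall through to global.
function resolvePartition(e: LogEventLite): PartitionResult {
  if (e.matter_id !== undefined) {
    return { ok: true, partition_kind: "by_matter", partition_key: e.matter_id };
  }
  if (e.privileged) {
    // Refuse rather than write a privileged record to an unpartitioned log.
    return { ok: false, lint: "validation.privileged_event_written_to_global_unpartitioned_log" };
  }
  if (e.client_id !== undefined) {
    return { ok: true, partition_kind: "by_client", partition_key: e.client_id };
  }
  return { ok: true, partition_kind: "global_nonprivileged", partition_key: "global" };
}

// Hypothetical file naming: one JSONL file per log kind and partition.
function partitionFileName(logKind: string, kind: AppendLogPartitionKind, key: string): string {
  return `${logKind}.${kind}.${encodeURIComponent(key)}.jsonl`;
}
```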
------
### AB-T-12 — RunOperatorContext / handoff primitive
**Disposition:** Keep as `DISCUSS`, but adopt a narrow version now.
This should not pull DOC50-style multi-user ownership forward wholesale. But a run should record who is operating it, what role they have, and any handoff instruction.
**Spec insert: Core §5.4A — RunOperatorContext**
```ts
type RunOperatorRole =
| "owner"
| "reviewer"
| "delegate"
| "assistant_operator"
| "observer";
interface RunOperatorContext {
context_id: string;
task_id: string;
run_id: string;
operator_ref: string;
operator_role: RunOperatorRole;
operator_intent_summary?: string;
handoff_note_for_downstream?: string;
authority_scope:
| "view_only"
| "approve_internal_changes"
| "approve_external_side_effects"
| "assign_matter"
| "full_owner";
created_at: ISO8601;
expires_at?: ISO8601;
schema_version: "1.0";
}
```
**Lint:**
```ts
validation.human_handoff_without_operator_context
```
------
### AB-T-13 — Behavior contracts behind chaos fixtures
**Disposition:** Change `DISCUSS` to **ADOPT**.
The card is right: fixtures without behavior contracts do not test anything real. Mid-run privilege reclassification and storage-full during durable write need normative fail-closed behavior.
**Spec insert: Common Contracts §10C — RuntimeIntegrityEvent**
```ts
type RuntimeIntegrityEventKind =
| "privilege_reclassification"
| "storage_pressure"
| "durable_write_failed"
| "local_resource_exhausted"
| "model_output_malformed"
| "policy_snapshot_changed";
interface RuntimeIntegrityEvent {
event_id: string;
event_kind: RuntimeIntegrityEventKind;
task_id: string;
run_id?: string;
affected_refs: StorageRef[];
severity:
| "info"
| "warning"
| "blocking"
| "critical";
default_action:
| "pause_scope"
| "fail_closed"
| "retry_once"
| "rollback_uncommitted"
| "re_taint_emitted_artifacts"
| "require_human_review";
created_at: ISO8601;
schema_version: "1.0";
}
```
**Privilege reclassification:**
```ts
interface PrivilegeReclassificationEvent {
event_id: string;
prior_governance_ref: StorageRef;
new_governance: GovernanceEnvelope;
affected_artifact_refs: StorageRef[];
affected_source_record_refs: SourceRecordRef[];
affected_forum_post_refs: StorageRef[];
affected_learning_signal_refs: StorageRef[];
required_remediation:
| "re_taint"
| "quarantine"
| "withdraw_from_prompt_context"
| "invalidate_read_models"
| "human_review";
remediation_receipt_refs: StorageRef[];
created_at: ISO8601;
schema_version: "1.0";
}
```
**Storage pressure:**
```ts
interface DurableWritePressurePolicy {
policy_id: string;
on_storage_full:
| "fail_closed_no_partial_write"
| "pause_scope"
| "spill_to_approved_partition"
| "block_new_runs";
preserve_audit_receipt: boolean;
allow_lossy_compaction: false;
schema_version: "1.0";
}
```
**Lints:**
```ts
validation.chaos_fixture_without_behavior_contract
validation.privilege_reclassification_without_read_model_invalidation
validation.storage_full_partial_write_without_failure_receipt
```
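The storage-full contract comes down to an all-or-nothing write that always leaves a receipt. The in-memory `BoundedAppendLog` class below is a hypothetical stand-in for a durable log with a fixed byte budget. In a real store the failure receipts would need reserved headroom, which is an assumption this sketch does not model.

```ts
type OnStorageFull =
  | "fail_closed_no_partial_write" | "pause_scope" | "spill_to_approved_partition" | "block_new_runs";

interface DurableWriteReceipt {
  committed: boolean;
  bytes_written: number;
  on_storage_full_action?: OnStorageFull;
}

// In-memory stand-in for a durable append log with a fixed byte budget.
class BoundedAppendLog {
  private usedBytes = 0;
  private readonly records: string[] = [];
  // Kept apart from data records; a real store reserves space for these.
  readonly failureReceipts: DurableWriteReceipt[] = [];
  private readonly capacityBytes: number;
  private readonly onFull: OnStorageFull;

  constructor(capacityBytes: number, onFull: OnStorageFull) {
    this.capacityBytes = capacityBytes;
    this.onFull = onFull;
  }

  // All-or-nothing: capacity is checked before any state changes, so a full
  // store never holds a truncated record, and every refusal leaves a receipt.
  append(record: string): DurableWriteReceipt {
    const size = new TextEncoder().encode(record).length;
    if (this.usedBytes + size > this.capacityBytes) {
      const receipt: DurableWriteReceipt = {
        committed: false,
        bytes_written: 0,
        on_storage_full_action: this.onFull,
      };
      this.failureReceipts.push(receipt);
      return receipt;
    }
    this.records.push(record);
    this.usedBytes += size;
    return { committed: true, bytes_written: size };
  }

  get recordCount(): number {
    return this.records.length;
  }
}
```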
------
### AB-T-14 — Malformed output recovery for load-bearing eval call
**Disposition:** Change `DISCUSS` to **ADOPT**.
Generic schema validation is not enough. A load-bearing evaluation call needs deterministic recovery: retry, fallback, downgrade to indeterminate, or escalate.
**Spec insert: Common Contracts §9.8 — LoadBearingEvaluationRecoveryPolicy**
```ts
type LoadBearingEvaluationOutputFailure =
| "schema_invalid"
| "missing_required_field"
| "unknown_enum_value"
| "malformed_json"
| "inconsistent_verdict_state"
| "missing_snapshot_ref"
| "unsafe_free_text_error";
type EvaluationRecoveryAction =
| "retry_same_model_once"
| "retry_different_model"
| "retry_with_stricter_schema"
| "fallback_deterministic_scorer"
| "emit_indeterminate"
| "raise_hard_call"
| "abort_scope";
interface LoadBearingEvaluationRecoveryPolicy {
policy_id: string;
applies_when:
| "load_bearing_outcome"
| "filing_bound"
| "external_publication"
| "privileged_matter"
| "always";
max_retry_count: number;
recovery_order: EvaluationRecoveryAction[];
final_failure_state:
| "needs_verification"
| "needs_human_judgment"
| "unable_to_evaluate"
| "unrecoverable";
preserve_malformed_output_ref: boolean;
schema_version: "1.0";
}
interface MalformedEvaluationOutputReceipt {
receipt_id: string;
result_attempt_id: string;
failure_kind: LoadBearingEvaluationOutputFailure;
raw_output_ref?: StorageRef;
validation_failure_refs: string[];
policy_ref: string;
recovery_action_taken: EvaluationRecoveryAction;
resulting_state:
| "needs_verification"
| "needs_human_judgment"
| "unable_to_evaluate"
| "unrecoverable"
| "retrying";
created_at: ISO8601;
schema_version: "1.0";
}
```
**Lints:**
```ts
validation.load_bearing_malformed_eval_without_recovery_receipt
validation.malformed_eval_output_silently_retried
validation.malformed_eval_output_returned_as_clean_pass
```
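Here is one possible deterministic reading of `LoadBearingEvaluationRecoveryPolicy`. The loop walks `recovery_order`, spends retries only up to `max_retry_count`, records a receipt for every action, and stops at the first terminal action. It never converts a malformed output into a clean pass. The `attempt` callback is hypothetical, and so is the split between terminal and non-terminal actions.

```ts
type EvaluationRecoveryAction =
  | "retry_same_model_once" | "retry_different_model" | "retry_with_stricter_schema"
  | "fallback_deterministic_scorer" | "emit_indeterminate" | "raise_hard_call" | "abort_scope";

type FinalFailureState = "needs_verification" | "needs_human_judgment" | "unable_to_evaluate" | "unrecoverable";

interface RecoveryPolicyLite {
  max_retry_count: number;
  recovery_order: EvaluationRecoveryAction[];
  final_failure_state: FinalFailureState;
}

interface RecoveryReceiptLite {
  recovery_action_taken: EvaluationRecoveryAction;
  resulting_state: FinalFailureState | "retrying";
}

// Assumption: these actions end recovery immediately.
const TERMINAL_ACTIONS: ReadonlySet<EvaluationRecoveryAction> = new Set<EvaluationRecoveryAction>([
  "emit_indeterminate", "raise_hard_call", "abort_scope",
]);

// `attempt` (hypothetical) runs one action and returns a schema-valid result,
// or null if the output is still malformed.
function recoverMalformedEvaluation<T>(
  policy: RecoveryPolicyLite,
  attempt: (action: EvaluationRecoveryAction) => T | null,
): { result: T | null; state: "recovered" | FinalFailureState; receipts: RecoveryReceiptLite[] } {
  const receipts: RecoveryReceiptLite[] = [];
  let retriesUsed = 0;
  for (const action of policy.recovery_order) {
    if (TERMINAL_ACTIONS.has(action)) {
      receipts.push({ recovery_action_taken: action, resulting_state: policy.final_failure_state });
      return { result: null, state: policy.final_failure_state, receipts };
    }
    if (action.startsWith("retry_")) {
      if (retriesUsed >= policy.max_retry_count) continue; // retry budget spent
      retriesUsed += 1;
    }
    // One receipt per action taken, so no retry is silent.
    receipts.push({ recovery_action_taken: action, resulting_state: "retrying" });
    const result = attempt(action);
    // A recovered result still goes through normal verdict logic; it is not
    // promoted straight to a clean pass here.
    if (result !== null) return { result, state: "recovered", receipts };
  }
  // Order exhausted without a terminal action: fall to the policy's state.
  return { result: null, state: policy.final_failure_state, receipts };
}
```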
------
### AB-T-15 — TKP freshness preflight
**Disposition:** Change `DISCUSS` to **ADOPT**.
This is **not satisfied by the operative specs**. The Core owns Task Agent education/TKP as part of its purpose, but current text does not give the “don’t design from stale TKP” preflight. The test card’s “zero coverage in comprehensive review” is no longer true relative to my final R2 package, because I already added a TKP freshness policy there. But it is still net-new relative to the live specs, so it should be included.
**Spec insert: Core §8.7A — Task Knowledge Pack freshness preflight**
```ts
type TKPFreshnessStatus =
| "fresh"
| "stale"
| "invalidated"
| "unknown"
| "missing";
type TKPDesignMode =
| "full_design_allowed"
| "advisory_only"
| "blocked_until_recompile"
| "user_confirmation_required";
type TKPInvalidatingEventKind =
| "module_catalog_changed"
| "port_registry_changed"
| "capability_registry_changed"
| "side_effect_policy_changed"
| "template_registry_changed"
| "task_runtime_contract_changed"
| "common_contract_schema_changed"
| "type_owner_registry_changed";
interface TaskKnowledgePackFreshnessPolicy {
policy_id: string;
max_age_hours: number;
invalidating_event_kinds: TKPInvalidatingEventKind[];
min_required_tkp_semver: string;
min_required_module_catalog_hash?: string;
min_required_port_registry_hash?: string;
min_required_policy_snapshot_hash?: string;
on_stale_for_task_design: TKPDesignMode;
on_unknown_freshness: TKPDesignMode;
schema_version: "1.0";
}
interface TaskKnowledgePackPreflightReceipt {
receipt_id: string;
task_agent_activation_id: string;
tkp_ref?: StorageRef;
freshness_status: TKPFreshnessStatus;
tkp_compiled_at?: ISO8601;
tkp_semver?: string;
observed_module_catalog_hash?: string;
observed_port_registry_hash?: string;
observed_policy_snapshot_hash?: string;
invalidating_event_refs: StorageRef[];
design_mode: TKPDesignMode;
policy_ref: string;
created_at: ISO8601;
schema_version: "1.0";
}
```
**Normative rule:**
```ts
RULE task_agent_must_preflight_tkp:
Before proposing a task graph, template, module preset, capability binding,
or side-effect policy recommendation, Task Agent MUST emit a
TaskKnowledgePackPreflightReceipt.
If design_mode is "advisory_only", Task Agent may explain options but MUST NOT
produce executable graph patches.
If design_mode is "blocked_until_recompile", Task Agent MUST request TKP
recompile and stop design.
```
**Lints:**
```ts
validation.task_agent_design_from_stale_tkp
validation.task_agent_design_without_tkp_preflight
validation.tkp_freshness_unknown_but_full_design_allowed
```
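The preflight can be split into two steps: assess freshness, then map the status to a design mode. This TypeScript sketch makes several assumptions. The catalog hash is treated as must-match, because a hash has no ordering. `invalidated` and `missing` always block. `TkpObservation` is a hypothetical input gathered by the preflight.

```ts
type TKPFreshnessStatus = "fresh" | "stale" | "invalidated" | "unknown" | "missing";
type TKPDesignMode = "full_design_allowed" | "advisory_only" | "blocked_until_recompile" | "user_confirmation_required";

interface TkpPolicyLite {
  max_age_hours: number;
  min_required_module_catalog_hash?: string;
  on_stale_for_task_design: TKPDesignMode;
  on_unknown_freshness: TKPDesignMode;
}

// Hypothetical observation gathered by the preflight.
interface TkpObservation {
  present: boolean;
  compiled_at_ms?: number;
  module_catalog_hash?: string;
  invalidating_event_count: number;
}

const MS_PER_HOUR = 3_600_000;

function assessTkpFreshness(p: TkpPolicyLite, o: TkpObservation, nowMs: number): TKPFreshnessStatus {
  if (!o.present) return "missing";
  if (o.invalidating_event_count > 0) return "invalidated";
  if (o.compiled_at_ms === undefined) return "unknown";
  // Assumption: the required hash must match exactly.
  if (p.min_required_module_catalog_hash !== undefined && o.module_catalog_hash !== p.min_required_module_catalog_hash) {
    return "stale";
  }
  return (nowMs - o.compiled_at_ms) / MS_PER_HOUR > p.max_age_hours ? "stale" : "fresh";
}

function tkpDesignMode(status: TKPFreshnessStatus, p: TkpPolicyLite): TKPDesignMode {
  switch (status) {
    case "fresh":
      return "full_design_allowed";
    case "stale":
      return p.on_stale_for_task_design;
    case "unknown":
      return p.on_unknown_freshness;
    case "invalidated":
    case "missing":
      return "blocked_until_recompile"; // assumption: no policy override
  }
}

// Executable graph patches only under full design, or after explicit user
// confirmation when the mode requires it.
function mayEmitGraphPatch(mode: TKPDesignMode, userConfirmed: boolean): boolean {
  return mode === "full_design_allowed" || (mode === "user_confirmation_required" && userConfirmed);
}
```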
------
### AB-T-16 — DOC24 task-opportunity classifier calibration + kill switch
**Disposition:** Change `DISCUSS` to **ADOPT**.
This is distinct from planner confidence. It protects the product’s direct-first UX: when a user wants a simple direct answer, task machinery should not over-trigger.
**Spec insert: Core §13A.10 / DOC24 obligation — TaskOpportunityClassifierQualityGate**
```ts
type TaskOpportunityClassifierDecision =
| "stay_direct"
| "quiet_task_chip"
| "suggest_saved_task"
| "invoke_task_agent"
| "explicit_task_request";
interface TaskOpportunityClassifierCalibrationPolicy {
policy_id: string;
measurement_window_days: number;
min_sample_size: number;
max_false_positive_rate_for_task_suggestion: number;
max_false_negative_rate_for_explicit_task_like_request: number;
direct_first_protection_enabled: boolean;
kill_switch:
| "off"
| "disable_saved_task_suggestions"
| "disable_task_agent_invocation"
| "stay_direct_only";
review_trigger:
| "threshold_breach"
| "user_complaint"
| "regression_fixture_failure"
| "manual";
schema_version: "1.0";
}
interface TaskOpportunityClassifierObservation {
observation_id: string;
request_ref: StorageRef;
predicted_decision: TaskOpportunityClassifierDecision;
final_user_accepted_decision?: TaskOpportunityClassifierDecision;
false_positive?: boolean;
false_negative?: boolean;
reason_codes: string[];
created_at: ISO8601;
schema_version: "1.0";
}
interface TaskOpportunityClassifierCalibrationReport {
report_id: string;
policy_ref: string;
measurement_window: DateRange;
sample_size: number;
false_positive_rate: number;
false_negative_rate: number;
regression_fixture_refs: StorageRef[];
kill_switch_action_taken?: TaskOpportunityClassifierCalibrationPolicy["kill_switch"];
created_at: ISO8601;
schema_version: "1.0";
}
```
**Lints:**
```ts
validation.task_opportunity_classifier_without_calibration_policy
validation.task_opportunity_false_positive_rate_exceeded
validation.task_opportunity_kill_switch_not_applied_after_threshold_breach
```
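A calibration pass can produce the report and apply the kill switch in one step. This TypeScript sketch makes three assumptions. The policy's `kill_switch` value names the action taken on a threshold breach, with `"off"` meaning no action. No action is taken below `min_sample_size`. Both rates use the whole window as the denominator, which is a simplification; a stricter definition would divide by the relevant negative or positive class.

```ts
type KillSwitch = "off" | "disable_saved_task_suggestions" | "disable_task_agent_invocation" | "stay_direct_only";

interface CalibrationPolicyLite {
  min_sample_size: number;
  max_false_positive_rate_for_task_suggestion: number;
  max_false_negative_rate_for_explicit_task_like_request: number;
  kill_switch: KillSwitch;
}

interface ObservationLite {
  false_positive?: boolean;
  false_negative?: boolean;
}

interface CalibrationReportLite {
  sample_size: number;
  false_positive_rate: number;
  false_negative_rate: number;
  kill_switch_action_taken?: KillSwitch;
}

function calibrate(obs: ObservationLite[], p: CalibrationPolicyLite): CalibrationReportLite {
  const n = obs.length;
  // Simplification: whole-window denominator for both rates.
  const fp = n === 0 ? 0 : obs.filter((o) => o.false_positive === true).length / n;
  const fn = n === 0 ? 0 : obs.filter((o) => o.false_negative === true).length / n;
  const report: CalibrationReportLite = { sample_size: n, false_positive_rate: fp, false_negative_rate: fn };
  if (n < p.min_sample_size) return report; // too little data to act on
  const breached =
    fp > p.max_false_positive_rate_for_task_suggestion ||
    fn > p.max_false_negative_rate_for_explicit_task_like_request;
  if (breached && p.kill_switch !== "off") report.kill_switch_action_taken = p.kill_switch;
  return report;
}
```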
------
### AB-T-17 — Global snapshot retention/indexing contract
**Disposition:** Change `DISCUSS` to **ADOPT**.
Common Contracts requires `evaluation_snapshot_ref` on envelopes, and V3.3.1 added `EvaluationSnapshot`. But a required reference is not a retention/indexing guarantee. This was already part of my prior R2 read-model/retention thinking; the card is right to call it out.
**Spec insert: Common Contracts §10D — SnapshotRetentionPolicy**
```ts
type SnapshotKind =
| "evaluation_snapshot"
| "source_workspace_snapshot"
| "artifact_version_snapshot"
| "graph_topology_snapshot"
| "policy_snapshot"
| "capability_snapshot"
| "task_run_context_packet"
| "board_digest";
type SnapshotRetentionClass =
| "run_lifetime"
| "task_lifetime"
| "matter_lifetime"
| "reliance_bound"
| "legal_hold"
| "indefinite_until_superseded";
interface SnapshotRetentionPolicy {
policy_id: string;
snapshot_kind: SnapshotKind;
retention_class: SnapshotRetentionClass;
min_retention_days?: number;
must_retain_while_referenced_by:
| "evaluation_result_envelope"
| "feedback_bundle"
| "task_reliance_packet"
| "evidence_package"
| "revision_review_packet"
| "learning_signal"
| "hard_call_resolution"
| "legal_hold";
index_required: boolean;
index_keys: Array<
| "snapshot_ref"
| "task_id"
| "run_id"
| "matter_id"
| "artifact_ref"
| "source_workspace_ref"
| "evaluation_chain_id"
>;
garbage_collection_allowed: boolean;
schema_version: "1.0";
}
interface SnapshotIndexRecord {
snapshot_ref: StorageRef;
snapshot_kind: SnapshotKind;
task_id?: string;
run_id?: string;
matter_id?: string;
produced_at: ISO8601;
retained_until?: ISO8601;
retention_policy_ref: string;
referenced_by_refs: StorageRef[];
availability_status:
| "available"
| "archived_available"
| "redacted"
| "purged_by_policy"
| "missing_error";
schema_version: "1.0";
}
```
**Lints:**
```ts
validation.snapshot_ref_missing_from_index
validation.snapshot_purged_while_referenced
validation.evaluation_snapshot_ref_unresolvable
validation.reliance_packet_snapshot_unavailable
```
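Two checks carry most of the retention contract: whether a snapshot may be purged now, and whether a ref resolves through the index. This is a minimal TypeScript sketch. `SnapshotIndexLite` is a hypothetical merge of the index record with the retention facts the purge check needs. The reason strings are hypothetical as well.

```ts
type SnapshotRetentionClass =
  | "run_lifetime" | "task_lifetime" | "matter_lifetime"
  | "reliance_bound" | "legal_hold" | "indefinite_until_superseded";

// Hypothetical: index record plus the retention facts the purge check reads.
interface SnapshotIndexLite {
  snapshot_ref: string;
  retention_class: SnapshotRetentionClass;
  garbage_collection_allowed: boolean;
  retained_until_ms?: number;
  referenced_by_refs: string[];
}

type PurgeVerdict = { allowed: true } | { allowed: false; reason: string };

function mayPurgeSnapshot(rec: SnapshotIndexLite, nowMs: number): PurgeVerdict {
  if (!rec.garbage_collection_allowed) return { allowed: false, reason: "gc_disabled_by_policy" };
  if (rec.retention_class === "legal_hold") return { allowed: false, reason: "legal_hold" };
  // Purging here would later surface as validation.snapshot_purged_while_referenced.
  if (rec.referenced_by_refs.length > 0) return { allowed: false, reason: "still_referenced" };
  if (rec.retained_until_ms !== undefined && nowMs < rec.retained_until_ms) {
    return { allowed: false, reason: "retention_window_open" };
  }
  return { allowed: true };
}

type ResolveResult = { ok: true; record: SnapshotIndexLite } | { ok: false; lint: string };

function resolveSnapshot(index: Map<string, SnapshotIndexLite>, ref: string): ResolveResult {
  const record = index.get(ref);
  return record !== undefined
    ? { ok: true, record }
    : { ok: false, lint: "validation.snapshot_ref_missing_from_index" };
}
```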
------
## 4. New ideas sparked by this test-set card
### New idea 1 — Proof & Honesty Verdict Floor as a named package
The card suggests packaging AB-T-01 + AB-T-02 + AB-T-05 together. I agree and would add AB-T-07 in scoped form.
**Package name:** `ProofAndHonestyVerdictFloor`.
**Members:**
```ts
type ProofAndHonestyVerdictFloor =
| SourceRetrievalOutcome
| OutcomeGroundingRequirement
| OutcomeProofGateReceipt
| AssuranceExecutionRecord
| RequiredQuorumManifest
| MinimumDocumentationTierPolicy
| SourceUseDeclaration
| CitationManifest;
```
**Core rule:**
```ts
RULE proof_and_honesty_floor:
The system MUST NOT present an outcome, forum decision, revision, or final
artifact as cleanly satisfied/approved/verified when any required proof,
assurance, quorum, source documentation tier, or matter assignment is missing.
```
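The floor rule can be read as a single aggregation over per-gate facts, so the separate gates cannot disagree. Below is a minimal TypeScript sketch. The `FloorGate` names, the `GateFact` shape, and the `required` parameter, which lets callers drop gates such as quorum when they do not apply, are all hypothetical. A gate that reports no fact is treated as a failure, so the check fails closed.

```ts
type FloorGate = "proof" | "assurance" | "quorum" | "source_tier" | "matter_assignment";

// One fact per gate, emitted by the receipt that checked it (hypothetical).
interface GateFact {
  gate: FloorGate;
  satisfied: boolean;
  receipt_ref: string;
}

interface CleanVerdictEligibility {
  clean_verdict_allowed: boolean;
  failed: GateFact[];
  unreported: FloorGate[];
}

// Fail closed: a required gate with no reported fact blocks a clean verdict.
function evaluateVerdictFloor(facts: GateFact[], required: FloorGate[]): CleanVerdictEligibility {
  const failed = facts.filter((f) => required.includes(f.gate) && !f.satisfied);
  const reported = new Set(facts.map((f) => f.gate));
  const unreported = required.filter((g) => !reported.has(g));
  return {
    clean_verdict_allowed: failed.length === 0 && unreported.length === 0,
    failed,
    unreported,
  };
}
```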
### New idea 2 — ContextBoundaryRef as a cross-system primitive
AB-T-09 and AB-T-11 reveal the same missing concept: portfolio resource governance, append-log partitioning, learning isolation, attention queues, and matter-bound privilege all need one boundary primitive.
Use `ContextBoundaryRef` for:
```ts
pattern_promotion
suggestion_learning
invocation_learning
portfolio_governor
attention_ledger
append_log_partition
source_workspace
forum_room
snapshot_retention
```
This should be the canonical isolation unit; `matter_id` and `context_class_key` become fields inside it, not competing top-level owners.
### New idea 3 — RecoveryPolicyRegistry
AB-T-13 and AB-T-14 show a broader defect: chaos fixtures and malformed outputs need named recovery policies. Add a small registry.
```ts
interface RecoveryPolicyRegistryEntry {
policy_id: string;
trigger_kind:
| "malformed_evaluation_output"
| "storage_full"
| "privilege_reclassification"
| "external_tool_failure"
| "source_retrieval_failure"
| "model_unavailable"
| "policy_snapshot_changed";
applies_to:
| "evaluation"
| "revision"
| "source_workspace"
| "forum"
| "task_agent_design"
| "external_side_effect";
recovery_actions: string[];
terminal_fallback:
| "needs_verification"
| "needs_information"
| "needs_human_judgment"
| "unable_to_evaluate"
| "blocked_by_policy"
| "unrecoverable";
fixture_refs: StorageRef[];
schema_version: "1.0";
}
```
### New idea 4 — Enforcement badge contract
The UI surfaces should not invent their own badges for proof/assurance/quorum/tier. Define one compact read model.
```ts
type EnforcementBadgeKind =
| "clean"
| "downgraded_assurance"
| "missing_source"
| "subminimum_source_tier"
| "quorum_missing"
| "ambiguous_matter"
| "stale_snapshot"
| "needs_human_review";
interface EnforcementBadge {
badge_id: string;
badge_kind: EnforcementBadgeKind;
target_ref: StorageRef;
user_visible_label: string;
severity: "info" | "warning" | "blocking";
source_receipt_refs: StorageRef[];
schema_version: "1.0";
}
```
This prevents DOC20/DOC21/DOC22 from inventing separate visual semantics for the same enforcement facts.
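A shared read model also means sharing one rule for which badge a compact surface shows. In this sketch the rule is to show the highest-severity badge and to never show `clean` alongside any other enforcement fact. That rule, the ranking, and the `BadgeLite` shape are assumptions, not spec text.

```ts
type EnforcementBadgeKind =
  | "clean" | "downgraded_assurance" | "missing_source" | "subminimum_source_tier"
  | "quorum_missing" | "ambiguous_matter" | "stale_snapshot" | "needs_human_review";

type BadgeSeverity = "info" | "warning" | "blocking";

interface BadgeLite {
  badge_kind: EnforcementBadgeKind;
  severity: BadgeSeverity;
  user_visible_label: string;
}

const SEVERITY_RANK: Record<BadgeSeverity, number> = { info: 0, warning: 1, blocking: 2 };

// Picks the one badge a compact surface shows for a target. "clean" is
// dropped whenever any other badge exists; ties keep the first badge seen.
function headlineBadge(badges: BadgeLite[]): BadgeLite | undefined {
  const nonClean = badges.filter((b) => b.badge_kind !== "clean");
  const pool = nonClean.length > 0 ? nonClean : badges;
  let worst: BadgeLite | undefined;
  for (const b of pool) {
    if (worst === undefined || SEVERITY_RANK[b.severity] > SEVERITY_RANK[worst.severity]) worst = b;
  }
  return worst;
}
```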
------
## 5. Completeness and overlap with prior review
The test-set card’s coverage matrix is plausible: it maps the test harvest back to AB-T rows and DUP/DECLINED rows, and it explicitly maps ChatGPT-2 #2/#3/#4/#6 to AB-T-16/15/17/05. I do not see a dropped class of test finding in the card itself.
But relative to **my prior R2 package**, the card overstates “net-new” for a few items:
| Item | Live specs? | Prior R2 recommendations? | My final call |
| ------- | ------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| AB-T-05 | Not satisfied | Partly touched by tier/source hardening, but not as full risk matrix | Adopt; add explicit policy. |
| AB-T-06 | Not satisfied | Partly adjacent to ExplanationTrace/DecisionAudit, but not ordinary module rationale | Adopt materiality-scoped version. |
| AB-T-15 | Not satisfied | Yes, prior R2 had TKP freshness preflight | Adopt; treat as already covered by my prior recommendation, not by the current spec. |
| AB-T-16 | Not satisfied | Prior R2 identified DOC24 classifier calibration, but not operative | Adopt. |
| AB-T-17 | Not satisfied | Prior R2 had read-model/snapshot invalidation direction | Adopt; make retention/indexing explicit. |
I agree with the card’s DUP call for external source-query and workspace export/open side effects **only if** the prior R2 `ExternalSourceQueryPolicy` / `WorkspaceExternalizationPolicy` actually lands. Source Workspace currently exposes browser/Finder/native/save/export actions and external source mechanisms, so the risk remains real in the operative text. The test card marks it DUP because the prior comprehensive package covers it, not because the live spec already does.
------
## 6. Explicit answers to the three questions
### 1. AB-T-02 — field-not-state or first-class state?
**The field-not-state choice is right**, but “field” should mean a structured `AssuranceExecutionRecord`, not just `target_assurance_basis` and `executed_assurance_basis` floating on the envelope.
Do **not** add `satisfied_downgraded` to `OutcomeEvaluationState`. A downgraded assurance pass is not a separate outcome state; it is either:
1. `satisfied` with a disclosed limitation, when policy/user explicitly accepts the downgrade; or
2. `needs_verification` / `needs_human_judgment`, when the target assurance was non-degradable or unaccepted.
Use the schema above:
```ts
interface AssuranceExecutionRecord {
assurance_execution_id: string;
result_id: string;
outcome_id: string;
target_assurance_basis: AssuranceBasis[];
minimum_required_assurance_basis: AssuranceBasis[];
executed_assurance_basis: AssuranceBasis[];
assurance_status:
| "met"
| "downgraded_policy_accepted"
| "downgraded_needs_review"
| "not_met";
downgrade_kind?:
| "model_class_downgrade"
| "specialist_unavailable"
| "panel_quorum_not_met"
| "source_verification_unavailable"
| "budget_or_time_cap"
| "tool_unavailable"
| "policy_disallowed_required_assurance";
clean_verdict_allowed: boolean;
presentation_status:
| "clean_pass"
| "pass_with_disclosed_downgrade"
| "blocked_no_clean_verdict";
explanation: string;
evidence_refs: StorageRef[];
created_at: ISO8601;
schema_version: "1.0";
}
```
Core lint:
```ts
validation.envelope.executed_below_target_assurance_presented_as_clean
```
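The two outcomes above, disclosed downgrade versus needs-review, can be derived from the target, minimum, and executed bases without adding an outcome state. This TypeScript sketch makes three assumptions. `AssuranceBasis` is modelled as a string, because its definition is not shown here. "Covers" means set inclusion. An accepted downgrade is presented with a disclosure and does not count as a clean verdict.

```ts
// AssuranceBasis is not defined in this section; modelled as a string here.
type AssuranceBasis = string;

type AssuranceStatus = "met" | "downgraded_policy_accepted" | "downgraded_needs_review" | "not_met";
type PresentationStatus = "clean_pass" | "pass_with_disclosed_downgrade" | "blocked_no_clean_verdict";

const covers = (executed: AssuranceBasis[], required: AssuranceBasis[]): boolean =>
  required.every((b) => executed.includes(b));

function deriveAssuranceStatus(
  target: AssuranceBasis[],
  minimumRequired: AssuranceBasis[],
  executed: AssuranceBasis[],
  downgradeAccepted: boolean,
): AssuranceStatus {
  if (covers(executed, target)) return "met";
  if (!covers(executed, minimumRequired)) return "not_met";
  return downgradeAccepted ? "downgraded_policy_accepted" : "downgraded_needs_review";
}

// Only "met" yields a clean pass; an accepted downgrade is a pass with a
// disclosed limitation; everything else blocks a clean verdict.
function presentationFor(
  status: AssuranceStatus,
): { presentation_status: PresentationStatus; clean_verdict_allowed: boolean } {
  switch (status) {
    case "met":
      return { presentation_status: "clean_pass", clean_verdict_allowed: true };
    case "downgraded_policy_accepted":
      return { presentation_status: "pass_with_disclosed_downgrade", clean_verdict_allowed: false };
    case "downgraded_needs_review":
    case "not_met":
      return { presentation_status: "blocked_no_clean_verdict", clean_verdict_allowed: false };
  }
}
```

With the hypothetical bases `panel` and `specialist`, a run that executes only `panel` against a `[panel, specialist]` target and no accepted downgrade comes out `downgraded_needs_review` and is blocked from a clean verdict.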
### 2. AB-T-09 — thin governor vs no central orchestrator; canonical isolation unit?
A thin governor is justified **only** if it is not an orchestrator. It should not dispatch modules, mutate graphs, decide substantive priority, or become a second runtime. It should be a **portfolio read model + admission/throttle policy** that produces decisions EC/runtime can honor through existing command paths.
Minimum viable form:
```ts
interface PortfolioAdmissionDecision {
decision_id: string;
request_ref: StorageRef;
boundary_ref: ContextBoundaryRef;
resource_snapshot_ref: StorageRef;
policy_ref: string;
decision:
| "allow_now"
| "queue"
| "defer_low_priority"
| "require_user_attention"
| "throttle_background_only"
| "block_due_to_policy";
rationale_summary: string;
may_directly_mutate_task_graph: false;
may_directly_dispatch_module: false;
created_at: ISO8601;
schema_version: "1.0";
}
```
Canonical isolation unit: **`ContextBoundaryRef`**, not matter alone and not context-class alone. For legal/privileged work, matter dominates. `context_class_key` is subordinate and only operates inside the matter/client/project boundary.
```ts
interface ContextBoundaryRef {
boundary_id: string;
boundary_kind:
| "privileged_matter"
| "matter"
| "client"
| "project"
| "task_family"
| "context_class"
| "global_nonprivileged";
matter_id?: string;
client_id?: string;
project_id?: string;
task_family_id?: string;
context_class_key?: string;
privileged: boolean;
canonical_for:
| "pattern_promotion"
| "suggestion_learning"
| "invocation_learning"
| "attention_rollup"
| "budget_governance"
| "append_log_partition"
| "source_workspace";
parent_boundary_ref?: string;
schema_version: "1.0";
}
```
### 3. Are AB-T-05, AB-T-06, AB-T-15 already satisfied?
**AB-T-05:** Not satisfied by operative specs. Source Workspace has documentation modes and tiers, including tier 0 and tier 4, and `documentation_mode` governs default tier. But there is no risk-based minimum tier matrix. Adopt it.
**AB-T-06:** Not satisfied. Evaluator/Revisor have rich structured records, but ordinary modules do not have a lightweight material-decision rationale contract. The closest prior proposals are `ExplanationTrace` and `DecisionAuditView`, but those are not the same as a per-module material choice summary. Adopt, but materiality-scoped.
**AB-T-15:** Not satisfied by current operative specs. The Core owns Task Agent education/TKP in purpose and topology, but I did not find a current TKP freshness preflight in the operative Addenda B set. It **is** already addressed by my prior R2 recommendations, so the new card should say: “net-new versus live spec; covered by prior R2 suggestion; should land in R0.4.”
------
## Cut / keep summary
**Cut or keep as DISCUSS only:**
- AB-T-08 as hard `CausalProof`: do not adopt as hard gate.
- Full AB-T-09 central governor: do not adopt if it dispatches or mutates; adopt only thin read-model/admission version.
- Full AB-T-12 multi-actor ownership: keep for later; adopt only `RunOperatorContext`.
**Upgrade from DISCUSS to ADOPT:**
- AB-T-07 scoped `CitationManifest`.
- AB-T-10 per-matter append-log partitioning.
- AB-T-13 runtime integrity behavior contracts.
- AB-T-14 load-bearing malformed eval recovery.
- AB-T-15 TKP freshness preflight.
- AB-T-16 task-opportunity classifier calibration/kill switch.
- AB-T-17 snapshot retention/indexing.
**Package together:**
- AB-T-01 + AB-T-02 + AB-T-05 + scoped AB-T-07 as **Proof & Honesty Verdict Floor**.
- AB-T-09 + AB-T-10 + AB-T-11 as **Context Boundary / Portfolio Isolation package**.
- AB-T-13 + AB-T-14 as **RecoveryPolicyRegistry / chaos behavior package**.
According to a document from May 29, 2026, the test-set card frames the net-new residue as **Reporting → Enforcement**: the system should not show a clean `satisfied` / `approved` / “verified” verdict when proof, assurance level, or matter binding is missing. It packages AB-T-01 + AB-T-02 + AB-T-05 as a proposed “Proof & Honesty Verdict Floor,” while leaving AB-T-07–14 and AB-T-17 mostly in an architect-decision queue. My deeper pass: that framing is right, but the proposed enforcement layer still needs a **single eligibility/denial object** so the gates do not become scattered booleans that disagree.
## Top-line after the second pass
I would keep the card’s main thesis, but I would **upgrade more of the DISCUSS rows to ADOPT** and make the enforcement layer more systematic. The biggest missed issue is that AB-T-01, AB-T-02, AB-T-05, scoped AB-T-07, AB-T-13, AB-T-14, AB-T-15, AB-T-16, and AB-T-17 all produce “you cannot safely show a clean result” facts, but the card does not yet define one canonical **clean-verdict eligibility gate** that aggregates them.
The second biggest missed issue is **negative evidence semantics**. `source_missing`, `access_blocked`, `not_checked`, `unsupported`, `contradicted`, and “not attempted” must not collapse into one generic failure bucket. They imply different user-facing messages and different recovery routes.
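One way to keep these kinds from collapsing is an exhaustive mapping. A compile-time `never` check makes adding a kind without a message and route a type error, and the mapping separates "looked and found against the claim" from "did not or could not look". In this sketch the messages, the route names, and the `counts_against_claim` field are hypothetical.

```ts
type NegativeEvidenceKind =
  | "source_missing" | "access_blocked" | "not_checked"
  | "unsupported" | "contradicted" | "not_attempted";

interface NegativeEvidenceHandling {
  // True only when sources were actually checked and went against the claim;
  // absence of a check is not evidence against the claim.
  counts_against_claim: boolean;
  user_message: string;
  recovery_route: string; // hypothetical route names
}

function handleNegativeEvidence(kind: NegativeEvidenceKind): NegativeEvidenceHandling {
  switch (kind) {
    case "source_missing":
      return { counts_against_claim: false, user_message: "The expected source was not found.", recovery_route: "request_source" };
    case "access_blocked":
      return { counts_against_claim: false, user_message: "The source exists but could not be accessed.", recovery_route: "request_access" };
    case "not_checked":
      return { counts_against_claim: false, user_message: "This claim has not been checked yet.", recovery_route: "run_check" };
    case "not_attempted":
      return { counts_against_claim: false, user_message: "No verification was attempted for this claim.", recovery_route: "schedule_verification" };
    case "unsupported":
      return { counts_against_claim: true, user_message: "The checked sources do not support this claim.", recovery_route: "revise_or_find_support" };
    case "contradicted":
      return { counts_against_claim: true, user_message: "A checked source contradicts this claim.", recovery_route: "revise_claim" };
    default: {
      // Compile-time guard: a new kind without a mapping is a type error.
      const unhandled: never = kind;
      throw new Error(`unhandled negative evidence kind: ${String(unhandled)}`);
    }
  }
}
```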
The third issue is **boundary consistency**. AB-T-09/10/11 should not be left as loose portfolio ideas; they expose one shared missing primitive: `ContextBoundaryRef`. Without it, matter-scoped privilege, learning isolation, append-log partitioning, attention rollups, portfolio budgets, and snapshot retention will each invent slightly different boundary logic.
------
# 1. Revised per-item table
| ID | Updated view | One-line reason |
| -------------------------- | ----------------------------------- | ------------------------------------------------------------ |
| AB-T-01 | **Agree, modify** | Adopt, but `SourceRetrievalOutcome` needs expected-source semantics, not just query-result status. |
| AB-T-02 | **Agree, modify** | Field-not-state is still right, but use a structured `AssuranceExecutionRecord` plus clean-verdict eligibility aggregation. |
| AB-T-03 | **Agree** | Real safety bug; matter ambiguity must hold before scope/memory/forum routing. |
| AB-T-04 | **Agree** | Fold into R0.4, but require displayed-material refs so the decision log proves what the user saw. |
| AB-T-05 | **Agree, strengthen** | Net-new and important; minimum tier policy must bind to *source use*, not just source records. |
| AB-T-06 | **Agree, narrow** | Require only for material module decisions; prohibit hidden reasoning. |
| AB-T-07 | **Change to ADOPT, scoped** | Required for final/filing/public/load-bearing factual text; too heavy for all drafts. |
| AB-T-08 | **Keep DISCUSS / light adopt only** | Do not create `CausalProof`; create `RevisionChangeRationale` only. |
| AB-T-09 | **Partial / scoped ADOPT** | A thin governor is justified only as admission/read-model policy, not orchestration. |
| AB-T-10 | **Change to ADOPT** | Privileged/matter logs should be partitioned now; this is cheap before implementation. |
| AB-T-11 | **Change to ADOPT** | Canonical isolation unit cannot be deferred; use `ContextBoundaryRef`. |
| AB-T-12 | **Partial / lightweight ADOPT** | Do not import full multi-user ownership, but record operator identity/authority/handoff. |
| AB-T-13 | **Change to ADOPT** | Chaos fixtures without behavior contracts are false tests. |
| AB-T-14 | **Change to ADOPT** | Load-bearing malformed evaluation output needs deterministic recovery. |
| AB-T-15 | **Change to ADOPT** | Stale TKP design is a preflight blocker, not just a discussion item. |
| AB-T-16 | **Change to ADOPT** | Direct-first task classifier drift needs FP/FN thresholds and kill switch. |
| AB-T-17 | **Change to ADOPT** | Snapshot refs without retention/indexing are audit theater. |
| DUP side-effect rows | **Keep DUP, but conditional** | DUP only if the prior `ExternalSourceQueryPolicy` / `WorkspaceExternalizationPolicy` lands; neither is covered by the live spec. |
| Staleness DECLINED-OVERLAP | **Partial** | The base staleness machinery exists, but consumption-time enforcement still needs a gate. |
------
# 2. Deeper issues I missed or would now change
## Issue 1 — Add one canonical Clean Verdict Eligibility layer
The card says the new residue is “Reporting → Enforcement,” and AB-T-01/02/05 are meant to prevent clean verdicts when proof, assurance, or source tier is insufficient. That is right, but the current proposal spreads the enforcement across separate lints. That will produce drift: AB-T-01 says “not grounded,” AB-T-02 says “downgraded,” AB-T-05 says “source below tier,” AB-T-17 says “snapshot unavailable,” and the UI has to infer the final display state.
Add a single aggregation object.
```ts
// Common Contracts §3.10 — Clean Verdict Eligibility
type CleanVerdictBlockerKind =
| "affirmative_grounding_missing"
| "source_retrieval_failed"
| "source_access_blocked"
| "claim_not_checked"
| "claim_unsupported"
| "claim_contradicted"
| "source_documentation_tier_below_minimum"
| "source_verification_missing"
| "source_freshness_stale"
| "assurance_downgraded"
| "assurance_not_met"
| "quorum_missing"
| "matter_resolution_ambiguous"
| "snapshot_unavailable"
| "tkp_stale_for_design"
| "malformed_eval_recovery_unresolved"
| "runtime_integrity_event_unresolved"
| "policy_blocked"
| "human_review_required";
type CleanVerdictEligibilityStatus =
| "clean_verdict_allowed"
| "allowed_with_disclosed_limitation"
| "clean_verdict_blocked";
type CleanVerdictRecoveryRoute =
| "needs_information"
| "needs_verification"
| "needs_human_judgment"
| "blocked_by_policy"
| "unable_to_evaluate"
| "unrecoverable"
| "defer_until_recheck";
interface CleanVerdictBlocker {
blocker_id: string;
blocker_kind: CleanVerdictBlockerKind;
source_ref: StorageRef; // proof gate receipt, assurance record, source-use declaration, etc.
target_ref: StorageRef; // result/envelope/artifact/claim/outcome/forum decision
severity: "warning" | "blocking";
recovery_route: CleanVerdictRecoveryRoute;
user_visible_summary: string;
internal_detail_ref?: StorageRef;
created_at: ISO8601;
schema_version: "1.0";
}
interface CleanVerdictEligibilityReceipt {
receipt_id: string;
result_id?: string;
outcome_id?: string;
artifact_ref?: StorageRef;
forum_decision_ref?: StorageRef;
task_reliance_packet_ref?: StorageRef;
input_receipt_refs: StorageRef[];
blockers: CleanVerdictBlocker[];
eligibility_status: CleanVerdictEligibilityStatus;
allowed_final_state?:
| "satisfied"
| "needs_information"
| "needs_verification"
| "needs_human_judgment"
| "blocked_by_policy"
| "unable_to_evaluate"
| "unrecoverable";
presentation_status:
| "clean"
| "disclosed_limitation"
| "blocked_no_clean_verdict";
created_at: ISO8601;
schema_version: "1.0";
}
```
Normative rule:
```ts
RULE clean_verdict_eligibility_controls_presentation:
Any object presented to the user as "satisfied", "approved", "verified",
"safe to rely", or "clean pass" MUST have a CleanVerdictEligibilityReceipt.
IF any CleanVerdictBlocker.severity == "blocking":
eligibility_status = "clean_verdict_blocked";
presentation_status = "blocked_no_clean_verdict";
the target MUST NOT be displayed as a clean pass.
IF blockers.length > 0 AND no blocker is "blocking":
eligibility_status = "allowed_with_disclosed_limitation";
presentation_status = "disclosed_limitation".
IF blockers.length == 0:
eligibility_status = "clean_verdict_allowed";
presentation_status = "clean".
```
Lints:
```ts
validation.clean_verdict_without_eligibility_receipt
validation.clean_verdict_presented_despite_blocker
validation.clean_verdict_blocker_without_recovery_route
validation.clean_verdict_eligibility_inputs_stale
```
This is the strongest consolidation I would add. It prevents one-off lints from becoming inconsistent UI semantics.
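A minimal sketch of how the aggregation rule could be evaluated, assuming blockers arrive already normalized. `evaluateCleanVerdictEligibility`, `mayDisplayAsClean`, and the trimmed `BlockerLite` shape are hypothetical names for illustration, not part of the spec.

```typescript
// Hypothetical sketch of RULE clean_verdict_eligibility_controls_presentation.
// Only the blocker fields the rule reads are modeled here.
type BlockerSeverity = "warning" | "blocking";

interface BlockerLite {
  blocker_kind: string;
  severity: BlockerSeverity;
  recovery_route?: string; // optional here so the missing-route lint can fire
}

type EligibilityStatus =
  | "clean_verdict_allowed"
  | "allowed_with_disclosed_limitation"
  | "clean_verdict_blocked";

type PresentationStatus = "clean" | "disclosed_limitation" | "blocked_no_clean_verdict";

interface EligibilityOutcome {
  eligibility_status: EligibilityStatus;
  presentation_status: PresentationStatus;
  lint_codes: string[];
}

function evaluateCleanVerdictEligibility(blockers: BlockerLite[]): EligibilityOutcome {
  const lint_codes: string[] = [];
  if (blockers.some((b) => !b.recovery_route)) {
    lint_codes.push("validation.clean_verdict_blocker_without_recovery_route");
  }
  // Any blocking blocker wins, regardless of how many warnings accompany it.
  if (blockers.some((b) => b.severity === "blocking")) {
    return {
      eligibility_status: "clean_verdict_blocked",
      presentation_status: "blocked_no_clean_verdict",
      lint_codes,
    };
  }
  // Warnings only: the result may be shown, but with its limitation disclosed.
  if (blockers.length > 0) {
    return {
      eligibility_status: "allowed_with_disclosed_limitation",
      presentation_status: "disclosed_limitation",
      lint_codes,
    };
  }
  return { eligibility_status: "clean_verdict_allowed", presentation_status: "clean", lint_codes };
}

// Presentation guard: no receipt means no clean label.
function mayDisplayAsClean(receipt: EligibilityOutcome | undefined): boolean {
  return receipt !== undefined && receipt.presentation_status === "clean";
}
```

Because the UI reads only `presentation_status`, each upstream lint contributes a blocker instead of deciding display on its own, which is the drift this consolidation is meant to prevent.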
------
## Issue 2 — AB-T-01 needs “expected source” semantics
The card’s `SourceRetrievalOutcome` is directionally right: it makes absence visible, and AB-T-01 correctly says a silently failed fetch should not leave an empty workspace that reads as a pass. But a simple query outcome cannot tell the difference between:
- “we expected a specific source and could not retrieve it,”
- “we searched generally and found nothing,”
- “we never attempted the source due to policy,”
- “we found a source but it is not the one needed,”
- “we found a source but the user lacks authority to use it.”
That distinction matters for legal work. A missing expected 10-K exhibit and a broad web query returning zero results are not the same proof state.
Revise AB-T-01 like this.
```ts
// Source Workspace §2.4 — Source retrieval outcome, revised
type ExpectedSourceBasis =
| "user_named_source"
| "citation_in_artifact"
| "claim_requires_authority"
| "prior_source_record"
| "library_candidate"
| "policy_required_source"
| "research_plan_item"
| "general_query_no_expected_source";
type SourceRetrievalOutcomeStatus =
| "retrieved_expected_source"
| "retrieved_alternative_source"
| "source_missing"
| "access_blocked"
| "stale_only"
| "ambiguous_match"
| "not_attempted_policy_blocked"
| "not_attempted_missing_credentials"
| "query_returned_no_results"
| "retrieval_error";
type SourceRetrievalReasonCode =
| "http_404"
| "permission_denied"
| "paywall_or_auth_required"
| "provider_timeout"
| "provider_rate_limited"
| "query_returned_no_results"
| "only_stale_versions_found"
| "multiple_candidate_sources"
| "policy_disallowed_source_type"
| "connector_unavailable"
| "credential_missing"
| "ambiguous_identifier"
| "unknown";
interface ExpectedSourceTarget {
expected_source_id?: string;
expected_source_label: string;
expected_source_basis: ExpectedSourceBasis;
expected_citation_text?: string;
expected_url_or_identifier?: string;
expected_jurisdiction_or_corpus?: string;
target_claim_refs: ClaimRef[];
target_scope_refs: ArtifactScopeRef[];
schema_version: "1.0";
}
interface SourceRetrievalOutcome {
retrieval_outcome_id: string;
query_id?: string;
source_record_ref?: SourceRecordRef;
expected_source_target: ExpectedSourceTarget;
status: SourceRetrievalOutcomeStatus;
reason_code?: SourceRetrievalReasonCode;
attempted_source_type:
| "document_library"
| "web"
| "api"
| "database"
| "email"
| "file_system"
| "browser"
| "connector"
| "manual_upload"
| "prior_task_output";
provider_ref?: string;
retrieval_receipt_ref?: StorageRef;
workspace_externalization_receipt_ref?: StorageRef;
policy_decision_refs: PolicyEvaluationRef[];
created_at: ISO8601;
schema_version: "1.1";
}
```
Add a distinction to the proof gate:
```ts
type SourceAbsenceMeaning =
| "expected_source_absent"
| "general_search_no_results"
| "not_allowed_to_search"
| "could_not_access"
| "ambiguous_result"
| "not_checked";
interface SourceAbsenceAssessment {
assessment_id: string;
retrieval_outcome_ref: StorageRef;
absence_meaning: SourceAbsenceMeaning;
blocks_clean_pass: boolean;
recommended_route:
| "needs_information"
| "needs_verification"
| "needs_human_judgment"
| "blocked_by_policy";
explanation: string;
schema_version: "1.0";
}
```
Lints:
```ts
validation.source_missing_without_expected_source_target
validation.general_query_empty_treated_as_expected_source_missing
validation.expected_source_absent_but_clean_pass_allowed
validation.source_access_blocked_without_recovery_route
```
This prevents “absence” from becoming a blunt instrument.
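A sketch of how a `SourceRetrievalOutcome` might be mapped to a `SourceAbsenceMeaning`, assuming the status and `expected_source_basis` are the only inputs. `classifyAbsence` and `ABSENCE_POLICY` are hypothetical, and the per-meaning routes and blocking defaults are illustrative assumptions, not values taken from the card.

```typescript
// Hypothetical sketch: classify a retrieval outcome into an absence meaning.
type RetrievalStatus =
  | "retrieved_expected_source"
  | "retrieved_alternative_source"
  | "source_missing"
  | "access_blocked"
  | "stale_only"
  | "ambiguous_match"
  | "not_attempted_policy_blocked"
  | "not_attempted_missing_credentials"
  | "query_returned_no_results"
  | "retrieval_error";

type AbsenceMeaning =
  | "expected_source_absent"
  | "general_search_no_results"
  | "not_allowed_to_search"
  | "could_not_access"
  | "ambiguous_result"
  | "not_checked";

type AbsenceRoute = "needs_information" | "needs_verification" | "needs_human_judgment" | "blocked_by_policy";

// Returns null when the outcome is not an absence (retrieved_* and stale_only;
// staleness is assumed to be handled by the freshness gate instead).
function classifyAbsence(status: RetrievalStatus, expectedSourceBasis: string): AbsenceMeaning | null {
  const hadExpectedSource = expectedSourceBasis !== "general_query_no_expected_source";
  switch (status) {
    case "source_missing":
      return "expected_source_absent";
    case "query_returned_no_results":
      // Same provider result, different proof state depending on what was expected.
      return hadExpectedSource ? "expected_source_absent" : "general_search_no_results";
    case "access_blocked":
    case "not_attempted_missing_credentials":
    case "retrieval_error":
      return "could_not_access";
    case "not_attempted_policy_blocked":
      return "not_allowed_to_search";
    case "ambiguous_match":
      return "ambiguous_result";
    default:
      return null;
  }
}

// Assumed defaults: only an unanchored general search with no results leaves a clean pass possible.
// "not_checked" is reserved for claims with no retrieval outcome at all, so the classifier never returns it.
const ABSENCE_POLICY: Record<AbsenceMeaning, { blocks_clean_pass: boolean; route: AbsenceRoute }> = {
  expected_source_absent: { blocks_clean_pass: true, route: "needs_information" },
  general_search_no_results: { blocks_clean_pass: false, route: "needs_verification" },
  not_allowed_to_search: { blocks_clean_pass: true, route: "blocked_by_policy" },
  could_not_access: { blocks_clean_pass: true, route: "needs_information" },
  ambiguous_result: { blocks_clean_pass: true, route: "needs_human_judgment" },
  not_checked: { blocks_clean_pass: true, route: "needs_verification" },
};
```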
------
## Issue 3 — AB-T-02 field-not-state is right, but the field needs a verdict-floor record
I still agree with the card: do **not** add `satisfied_downgraded` as a new outcome state. The card explicitly proposes field + lint rather than a new state. That is the right call because assurance downgrade is an overlay on a result, not a distinct domain outcome.
But the field cannot be just `target_assurance_basis` and `executed_assurance_basis`. It needs policy, reason, user acceptance, and presentation effects.
```ts
// Common Contracts §3.9 — Assurance execution record
type AssuranceDowngradeKind =
| "model_class_downgrade"
| "specialist_unavailable"
| "specialist_panel_quorum_not_met"
| "source_verification_unavailable"
| "budget_or_time_cap"
| "tool_unavailable"
| "policy_disallowed_required_assurance"
| "human_reviewer_unavailable";
type AssuranceSatisfactionStatus =
| "met"
| "downgraded_policy_accepted"
| "downgraded_user_accepted"
| "downgraded_needs_review"
| "not_met";
interface AssuranceExecutionRecord {
assurance_execution_id: string;
result_id: string;
outcome_id: string;
target_assurance_basis: AssuranceBasis[];
minimum_required_assurance_basis: AssuranceBasis[];
executed_assurance_basis: AssuranceBasis[];
target_model_class?: ModelClass;
executed_model_class?: ModelClass;
target_quorum_ref?: StorageRef;
executed_quorum_ref?: StorageRef;
assurance_status: AssuranceSatisfactionStatus;
downgrade_kind?: AssuranceDowngradeKind;
downgrade_allowed_by_policy_ref?: PolicyEvaluationRef;
downgrade_accepted_by_user_ref?: StorageRef;
clean_verdict_allowed: boolean;
clean_verdict_eligibility_receipt_ref?: StorageRef;
explanation: string;
evidence_refs: StorageRef[];
created_at: ISO8601;
schema_version: "1.0";
}
```
Normative rule:
```ts
RULE assurance_execution_to_clean_verdict:
IF assurance_status == "met":
CleanVerdictBlocker is not emitted.
IF assurance_status IN ("downgraded_policy_accepted", "downgraded_user_accepted"):
emit CleanVerdictBlocker {
blocker_kind: "assurance_downgraded",
severity: "warning",
recovery_route: "defer_until_recheck"
}
IF assurance_status IN ("downgraded_needs_review", "not_met"):
emit CleanVerdictBlocker {
blocker_kind: "assurance_not_met",
severity: "blocking",
recovery_route: "needs_verification" OR "needs_human_judgment"
}
```
Lints:
```ts
validation.assurance_execution_record_missing_for_target_assurance
validation.executed_below_target_assurance_presented_as_clean
validation.assurance_downgrade_without_policy_or_user_acceptance
validation.assurance_downgrade_not_reflected_in_reliance_packet
```
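A sketch of `RULE assurance_execution_to_clean_verdict` as plain functions, plus the policy-or-user-acceptance lint. `assuranceBlocker` and `lintAssurance` are hypothetical names. The rule allows either `needs_verification` or `needs_human_judgment` for the blocking case; the split chosen below is an assumption.

```typescript
// Hypothetical sketch of RULE assurance_execution_to_clean_verdict plus one lint.
type AssuranceStatus =
  | "met"
  | "downgraded_policy_accepted"
  | "downgraded_user_accepted"
  | "downgraded_needs_review"
  | "not_met";

interface AssuranceRecordLite {
  assurance_status: AssuranceStatus;
  downgrade_allowed_by_policy_ref?: string;
  downgrade_accepted_by_user_ref?: string;
}

interface AssuranceBlocker {
  blocker_kind: "assurance_downgraded" | "assurance_not_met";
  severity: "warning" | "blocking";
  recovery_route: "defer_until_recheck" | "needs_verification" | "needs_human_judgment";
}

function assuranceBlocker(record: AssuranceRecordLite): AssuranceBlocker | null {
  const s = record.assurance_status;
  if (s === "met") return null;
  if (s === "downgraded_policy_accepted" || s === "downgraded_user_accepted") {
    return { blocker_kind: "assurance_downgraded", severity: "warning", recovery_route: "defer_until_recheck" };
  }
  // Assumed mapping: an unreviewed downgrade goes to a human, an outright miss goes to verification.
  return {
    blocker_kind: "assurance_not_met",
    severity: "blocking",
    recovery_route: s === "downgraded_needs_review" ? "needs_human_judgment" : "needs_verification",
  };
}

function lintAssurance(record: AssuranceRecordLite): string[] {
  const lints: string[] = [];
  // An "accepted" downgrade must point at the policy or user acceptance that justifies it.
  const missingAcceptance =
    (record.assurance_status === "downgraded_policy_accepted" && !record.downgrade_allowed_by_policy_ref) ||
    (record.assurance_status === "downgraded_user_accepted" && !record.downgrade_accepted_by_user_ref);
  if (missingAcceptance) lints.push("validation.assurance_downgrade_without_policy_or_user_acceptance");
  return lints;
}
```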
------
## Issue 4 — AB-T-05 must be about source *use*, not source existence
The card’s minimum tier policy is important and net-new: current source tiers do not create a risk-based requirement, so a thin lookup can silently become load-bearing evidence. But the policy should not attach only to `SourceRecord`. A source can be tier 1 in the workspace and be harmless as background; it becomes dangerous when used to support a claim, citation, legal authority proposition, evaluator criterion, library promotion, or final work product.
So the controlling object should be `SourceUseDeclaration`.
```ts
// Source Workspace §3.4 — Risk-based source-use floor
type SourceUsePurpose =
| "background_context"
| "drafting_inspiration"
| "claim_support"
| "citation_support"
| "legal_authority"
| "evaluator_basis"
| "filing_bound_support"
| "external_publication_support"
| "library_promotion"
| "memory_promotion";
type SourceUseRiskClass =
| "casual_lookup"
| "internal_context"
| "drafting_background"
| "claim_support"
| "evaluator_load_bearing"
| "legal_authority"
| "filing_bound"
| "external_publication"
| "privileged_matter"
| "memory_or_library_promotion";
type MinimumTierPolicyResult =
| "allowed"
| "allowed_with_disclosed_limitation"
| "requires_tier_promotion"
| "blocked_clean_pass"
| "needs_verification"
| "needs_information";
interface MinimumDocumentationTierRule {
risk_class: SourceUseRiskClass;
min_tier: SourceDocumentationTier;
require_verification_state?: Array<
| "verified"
| "partially_verified"
| "unverified"
| "failed"
>;
require_freshness_state?: Array<
| "fresh"
| "unknown"
| "needs_update"
| "stale"
>;
on_subminimum: MinimumTierPolicyResult;
schema_version: "1.0";
}
interface SourceUseDeclaration {
declaration_id: string;
source_record_ref: SourceRecordRef;
source_workspace_snapshot_ref: StorageRef;
used_for: SourceUsePurpose;
risk_classes: SourceUseRiskClass[];
target_claim_refs: ClaimRef[];
target_artifact_scope_refs: ArtifactScopeRef[];
observed_tier: SourceDocumentationTier;
required_min_tier: SourceDocumentationTier;
verification_state_at_use:
| "verified"
| "partially_verified"
| "unverified"
| "failed";
freshness_state_at_use:
| "fresh"
| "stale"
| "needs_update"
| "unknown";
policy_result: MinimumTierPolicyResult;
policy_ref: string;
clean_verdict_blocker_ref?: StorageRef;
created_at: ISO8601;
schema_version: "1.0";
}
```
Lints:
```ts
validation.source_use_declaration_missing_for_claim_support
validation.source.load_bearing_claim_supported_by_subminimum_tier
validation.source.legal_authority_used_without_fresh_verified_tier3
validation.source.evaluator_basis_below_minimum_tier
validation.source.tier0_used_as_claim_support
```
This also resolves the remaining ambiguity around tier 0: tier 0 can exist as a receipt-only lookup, but it cannot support a clean claim.
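A sketch of evaluating one `SourceUseDeclaration` against a set of `MinimumDocumentationTierRule`s. It assumes `SourceDocumentationTier` is an ordered numeric scale from 0 to 3 and that the strictest applicable result wins; both the numeric encoding and the `STRICTNESS` ordering are assumptions, and `evaluateSourceUse` is a hypothetical name.

```typescript
// Hypothetical sketch: evaluate a source use against risk-based minimum-tier rules.
// Assumption: SourceDocumentationTier is an ordered numeric scale from 0 to 3.
type Tier = 0 | 1 | 2 | 3;
type Verification = "verified" | "partially_verified" | "unverified" | "failed";
type Freshness = "fresh" | "stale" | "needs_update" | "unknown";
type TierPolicyResult =
  | "allowed"
  | "allowed_with_disclosed_limitation"
  | "needs_information"
  | "needs_verification"
  | "requires_tier_promotion"
  | "blocked_clean_pass";

// Assumed strictness order, least to most restrictive.
const STRICTNESS: TierPolicyResult[] = [
  "allowed",
  "allowed_with_disclosed_limitation",
  "needs_information",
  "needs_verification",
  "requires_tier_promotion",
  "blocked_clean_pass",
];

interface TierRule {
  risk_class: string;
  min_tier: Tier;
  require_verification_state?: Verification[];
  require_freshness_state?: Freshness[];
  on_subminimum: TierPolicyResult;
}

interface SourceUseLite {
  risk_classes: string[];
  observed_tier: Tier;
  verification_state_at_use: Verification;
  freshness_state_at_use: Freshness;
}

function stricter(a: TierPolicyResult, b: TierPolicyResult): TierPolicyResult {
  return STRICTNESS.indexOf(a) >= STRICTNESS.indexOf(b) ? a : b;
}

function evaluateSourceUse(use: SourceUseLite, rules: TierRule[]): TierPolicyResult {
  let result: TierPolicyResult = "allowed";
  // Hard floor from the text: tier 0 may exist as a receipt-only lookup but never supports a claim.
  if (use.observed_tier === 0 && use.risk_classes.indexOf("claim_support") >= 0) {
    result = "blocked_clean_pass";
  }
  for (const rule of rules) {
    if (use.risk_classes.indexOf(rule.risk_class) < 0) continue; // rule does not apply to this use
    if (use.observed_tier < rule.min_tier) result = stricter(result, rule.on_subminimum);
    const v = rule.require_verification_state;
    if (v && v.indexOf(use.verification_state_at_use) < 0) result = stricter(result, "needs_verification");
    const f = rule.require_freshness_state;
    if (f && f.indexOf(use.freshness_state_at_use) < 0) result = stricter(result, "needs_verification");
  }
  return result;
}
```

Keying the rules on `risk_classes` rather than on the source record is what lets the same tier-1 source be harmless as background and blocked as legal authority.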
------
## Issue 5 — AB-T-06 should be tied to materiality and decision class
The card says ordinary modules lack structured input→output rationale, while Evaluator/Revisor have rich rationale capture. That is a real gap. But a blanket rationale for every module activation will bloat prompts and tempt hidden chain-of-thought capture. Make it materiality-scoped.
```ts
// Common Contracts §10B — ModuleDecisionRationale
type ModuleDecisionKind =
| "content_structure_choice"
| "argument_or_strategy_choice"
| "source_selection"
| "tool_choice"
| "degradation_choice"
| "routing_choice"
| "scope_choice"
| "artifact_mutation_choice"
| "externalization_choice"
| "not_recorded_non_material";
type ModuleDecisionMateriality =
| "non_material"
| "material_to_artifact"
| "material_to_verdict"
| "material_to_source_support"
| "material_to_policy_or_privilege"
| "material_to_external_side_effect"
| "material_to_user_reliance";
interface ModuleDecisionRationale {
decision_id: string;
task_id: string;
run_id: string;
module_id: string;
module_activation_seq: number;
decision_kind: ModuleDecisionKind;
materiality: ModuleDecisionMateriality;
input_refs: StorageRef[];
output_refs: StorageRef[];
rationale_summary: string;
alternatives_considered_summary?: string;
source_workspace_snapshot_ref?: StorageRef;
evaluation_snapshot_ref?: StorageRef;
clean_verdict_eligibility_receipt_ref?: StorageRef;
policy_decision_refs: PolicyEvaluationRef[];
hidden_reasoning_included: false;
created_at: ISO8601;
schema_version: "1.0";
}
```
Rule:
```ts
RULE material_module_decision_rationale_required:
ModuleDecisionRationale is REQUIRED when materiality != "non_material".
It is OPTIONAL for routine non-material activation choices.
It MUST NOT include hidden chain-of-thought.
```
Lints:
```ts
validation.material_module_decision_without_rationale
validation.module_decision_rationale_contains_hidden_reasoning
validation.module_decision_rationale_missing_input_or_output_ref
```
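A sketch of the materiality rule as a lint pass over one decision record. `lintModuleDecision` and the trimmed record shapes are hypothetical. The spec types `hidden_reasoning_included` as the literal `false`; the sketch uses a plain boolean because records read back from storage have to be checked at runtime.

```typescript
// Hypothetical sketch of RULE material_module_decision_rationale_required as a lint pass.
type Materiality =
  | "non_material"
  | "material_to_artifact"
  | "material_to_verdict"
  | "material_to_source_support"
  | "material_to_policy_or_privilege"
  | "material_to_external_side_effect"
  | "material_to_user_reliance";

interface RationaleLite {
  input_refs: string[];
  output_refs: string[];
  hidden_reasoning_included: boolean; // literal `false` in the spec; checked at runtime here
}

interface ModuleDecisionLite {
  module_id: string;
  materiality: Materiality;
  rationale?: RationaleLite;
}

function lintModuleDecision(d: ModuleDecisionLite): string[] {
  const lints: string[] = [];
  if (!d.rationale) {
    // Optional for routine choices; required once anything material depends on the decision.
    if (d.materiality !== "non_material") lints.push("validation.material_module_decision_without_rationale");
    return lints;
  }
  if (d.rationale.hidden_reasoning_included) {
    lints.push("validation.module_decision_rationale_contains_hidden_reasoning");
  }
  if (d.rationale.input_refs.length === 0 || d.rationale.output_refs.length === 0) {
    lints.push("validation.module_decision_rationale_missing_input_or_output_ref");
  }
  return lints;
}
```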
------
## Issue 6 — AB-T-07 should be adopted as a scoped write-time manifest
The card leaves `CitationManifest` as DISCUSS because it is stronger than post-hoc `EvidencePackage` and carries drafting overhead. My deeper view: adopt it, but scope it. For final / filing-bound / external-publication / load-bearing factual text, post-hoc evidence binding is too late.
```ts
// Common Contracts §7A — CitationManifest
type CitationManifestRequirementLevel =
| "required"
| "optional"
| "not_required";
type CitationManifestScope =
| "final_artifact"
| "filing_bound_artifact"
| "external_publication"
| "claim_bearing_section"
| "evaluator_load_bearing_draft"
| "draft_internal_optional";
interface CitationManifestPolicy {
policy_id: string;
scope: CitationManifestScope;
requirement_level: CitationManifestRequirementLevel;
min_source_tier: SourceDocumentationTier;
require_support_anchor: boolean;
require_quote_or_excerpt_for_direct_support: boolean;
on_missing_manifest:
| "block_clean_verdict"
| "needs_verification"
| "allow_with_disclosed_limitation"
| "not_applicable";
schema_version: "1.0";
}
interface CitationManifestEntry {
entry_id: string;
claim_ref: ClaimRef;
artifact_scope_ref: ArtifactScopeRef;
source_record_ref: SourceRecordRef;
source_use_declaration_ref: StorageRef;
support_anchor_ref: StorageRef;
quote_or_excerpt_ref?: StorageRef;
support_strength:
| "direct"
| "indirect"
| "contextual"
| "contradicts";
verification_state_at_write:
| "verified"
| "partially_verified"
| "unverified"
| "failed";
freshness_state_at_write:
| "fresh"
| "stale"
| "needs_update"
| "unknown";
created_by_module_id: string;
created_by_activation_seq: number;
schema_version: "1.0";
}
interface CitationManifest {
manifest_id: string;
task_id: string;
run_id: string;
artifact_ref: StorageRef;
artifact_version_ref: StorageRef;
manifest_scope: CitationManifestScope;
entries: CitationManifestEntry[];
coverage_status:
| "complete_for_claim_bearing_text"
| "partial"
| "not_required_internal_draft"
| "claim_extractor_unavailable";
policy_ref: string;
created_at: ISO8601;
schema_version: "1.0";
}
```
Lints:
```ts
validation.claim_bearing_final_artifact_without_citation_manifest
validation.citation_manifest_entry_source_below_min_tier
validation.citation_manifest_claim_without_support_anchor
validation.citation_manifest_claim_extractor_unavailable_but_clean_pass
```
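A sketch of how the four CitationManifest lints could run against a manifest and the claims extracted from the claim-bearing text. `lintCitationManifest`, the numeric `source_tier` on each entry, and the `uncovered` output are hypothetical; the sketch assumes the entry's tier has already been resolved from its `SourceRecordRef`.

```typescript
// Hypothetical sketch: lint a CitationManifest against its policy and the claim-bearing text.
type RequirementLevel = "required" | "optional" | "not_required";

interface ManifestPolicyLite {
  requirement_level: RequirementLevel;
  min_source_tier: number; // assumed numeric tier scale
  require_support_anchor: boolean;
}

interface ManifestEntryLite {
  claim_ref: string;
  source_tier: number; // hypothetical: tier resolved from the entry's SourceRecordRef
  support_anchor_ref?: string;
}

interface ManifestLite {
  entries: ManifestEntryLite[];
  coverage_status:
    | "complete_for_claim_bearing_text"
    | "partial"
    | "not_required_internal_draft"
    | "claim_extractor_unavailable";
}

function lintCitationManifest(
  policy: ManifestPolicyLite,
  claimRefs: string[], // claims found in the claim-bearing text
  manifest: ManifestLite | undefined,
  cleanPassRequested: boolean,
): { lints: string[]; uncovered: string[] } {
  const lints: string[] = [];
  if (!manifest) {
    if (policy.requirement_level === "required") {
      lints.push("validation.claim_bearing_final_artifact_without_citation_manifest");
    }
    return { lints, uncovered: claimRefs.slice() };
  }
  for (const e of manifest.entries) {
    if (e.source_tier < policy.min_source_tier) lints.push("validation.citation_manifest_entry_source_below_min_tier");
    if (policy.require_support_anchor && !e.support_anchor_ref) {
      lints.push("validation.citation_manifest_claim_without_support_anchor");
    }
  }
  if (manifest.coverage_status === "claim_extractor_unavailable" && cleanPassRequested) {
    lints.push("validation.citation_manifest_claim_extractor_unavailable_but_clean_pass");
  }
  // Claims with no manifest entry; a non-empty list means coverage is only "partial".
  const covered = manifest.entries.map((e) => e.claim_ref);
  const uncovered = claimRefs.filter((c) => covered.indexOf(c) < 0);
  return { lints, uncovered };
}
```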
------
## Issue 7 — AB-T-09/10/11 should be one Context Boundary package
The card’s AB-T-09 lists a real 40-matter scaling cluster: unbounded portfolio arrays, global attention/learning queues, EC writer contention, model fan-in, no portfolio cost ceiling, extraction debt, and context-packet write amplification. AB-T-10 adds global privileged logs, and AB-T-11 adds inconsistent matter-vs-context-class learning isolation. These are not three separate ideas. They need one boundary primitive.
```ts
// Core §3A.9 — ContextBoundaryRef
type IsolationBoundaryKind =
| "privileged_matter"
| "matter"
| "client"
| "project"
| "task_family"
| "context_class"
| "global_nonprivileged";
interface ContextBoundaryRef {
boundary_id: string;
boundary_kind: IsolationBoundaryKind;
matter_id?: string;
client_id?: string;
project_id?: string;
task_family_id?: string;
context_class_key?: string;
privileged: boolean;
canonical_for:
| "pattern_promotion"
| "suggestion_learning"
| "invocation_learning"
| "attention_rollup"
| "budget_governance"
| "append_log_partition"
| "source_workspace"
| "snapshot_retention"
| "forum_scope";
parent_boundary_ref?: string;
schema_version: "1.0";
}
```
Rule:
```ts
RULE matter_boundary_dominates_context_class:
For privileged or matter-scoped work, boundary_kind "privileged_matter"
or "matter" is the primary isolation boundary.
context_class_key MAY be used only as a secondary classifier within the
governing matter/client/project boundary.
Rejection, suppression, learned invocation behavior, and suggestion learning
MUST NOT cross matter boundaries without governed promotion.
```
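A sketch of the matter-dominance rule applied to learning signals (rejections, suppressions, learned invocation behavior). `mayShareLearningSignal` is a hypothetical name. Two choices are assumptions: work that is not matter-scoped on either side is left unrestricted, and a matter-scoped boundary missing its `matter_id` fails closed.

```typescript
// Hypothetical sketch of RULE matter_boundary_dominates_context_class for learning signals.
type BoundaryKind =
  | "privileged_matter"
  | "matter"
  | "client"
  | "project"
  | "task_family"
  | "context_class"
  | "global_nonprivileged";

interface BoundaryLite {
  boundary_kind: BoundaryKind;
  matter_id?: string;
  context_class_key?: string;
}

function isMatterScoped(b: BoundaryLite): boolean {
  return b.boundary_kind === "privileged_matter" || b.boundary_kind === "matter" || b.matter_id !== undefined;
}

// May a signal recorded under `from` influence behavior under `to`?
function mayShareLearningSignal(from: BoundaryLite, to: BoundaryLite, governedPromotion: boolean): boolean {
  // Fail closed when a matter-scoped boundary lacks its matter id.
  if ((isMatterScoped(from) && from.matter_id === undefined) || (isMatterScoped(to) && to.matter_id === undefined)) {
    return false;
  }
  if (!isMatterScoped(from) && !isMatterScoped(to)) {
    return true; // assumption: the rule restricts only matter-scoped work
  }
  if (from.matter_id !== to.matter_id) {
    return governedPromotion; // crossing a matter boundary (including to global) needs governed promotion
  }
  // Same matter: context_class_key is only a secondary classifier, so it never blocks sharing.
  return true;
}
```

Note that matching `context_class_key` values across two matters never grants sharing on their own, which is the inconsistency AB-T-11 flags.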
### Thin governor that does not violate the anti-orchestrator rule
```ts
// Core §16B — Portfolio admission read model
type PortfolioGovernorDecision =
| "allow_now"
| "queue"
| "defer_low_priority"
| "require_user_attention"
| "throttle_background_only"
| "block_due_to_policy";
interface PortfolioResourceSnapshot {
snapshot_id: string;
boundary_ref: ContextBoundaryRef;
active_run_count: number;
queued_run_count: number;
pending_hard_call_count: number;
pending_user_attention_count: number;
estimated_cost_vector_ref?: StorageRef;
actual_cost_vector_ref?: StorageRef;
local_parallelism_slots_used: number;
local_parallelism_slots_available: number;
extraction_backlog_count?: number;
stale_current_view_count?: number;
context_packet_pending_write_count?: number;
created_at: ISO8601;
schema_version: "1.0";
}
interface PortfolioAdmissionPolicy {
policy_id: string;
boundary_kind: IsolationBoundaryKind;
max_active_runs?: number;
max_background_runs?: number;
max_pending_attention_items?: number;
max_estimated_cost_vector_ref?: StorageRef;
fairness_policy:
| "matter_round_robin"
| "priority_then_age"
| "deadline_first"
| "user_pinned_first"
| "manual_only";
may_dispatch_work: false;
may_mutate_task_graph: false;
schema_version: "1.0";
}
interface PortfolioAdmissionDecision {
decision_id: string;
request_ref: StorageRef;
boundary_ref: ContextBoundaryRef;
resource_snapshot_ref: StorageRef;
policy_ref: string;
decision: PortfolioGovernorDecision;
rationale_summary: string;
emits_command_ref?: string;
may_directly_mutate_task_graph: false;
may_directly_dispatch_module: false;
created_at: ISO8601;
schema_version: "1.0";
}
```
Lints:
```ts
validation.portfolio_governor_dispatches_work_directly
validation.portfolio_governor_mutates_task_graph
validation.context_class_learning_crosses_matter_without_policy
validation.privileged_matter_missing_context_boundary
validation.portfolio_attention_item_without_boundary
```
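A sketch of the admission read model as a pure function: it returns a `PortfolioGovernorDecision` value and performs no dispatch or graph mutation, matching `may_dispatch_work: false`. `decideAdmission`, the `background` and `policy_blocked` request flags, and the precedence of the checks are hypothetical.

```typescript
// Hypothetical sketch of a portfolio admission read model. It only returns a decision;
// any resulting command goes through the existing command path, not through this function.
type GovernorDecision =
  | "allow_now"
  | "queue"
  | "defer_low_priority"
  | "require_user_attention"
  | "throttle_background_only"
  | "block_due_to_policy";

interface ResourceSnapshotLite {
  active_run_count: number;
  pending_user_attention_count: number;
  local_parallelism_slots_available: number;
}

interface AdmissionPolicyLite {
  max_active_runs?: number;
  max_pending_attention_items?: number;
  may_dispatch_work: false;
  may_mutate_task_graph: false;
}

interface AdmissionRequestLite {
  background: boolean;
  policy_blocked: boolean; // hypothetical: result of an upstream policy evaluation
}

function decideAdmission(
  req: AdmissionRequestLite,
  snap: ResourceSnapshotLite,
  policy: AdmissionPolicyLite,
): GovernorDecision {
  if (req.policy_blocked) return "block_due_to_policy";
  const attentionCap = policy.max_pending_attention_items;
  if (attentionCap !== undefined && snap.pending_user_attention_count >= attentionCap) {
    return "require_user_attention"; // assumed: do not stack more work behind an unread queue
  }
  const runCap = policy.max_active_runs;
  const saturated =
    (runCap !== undefined && snap.active_run_count >= runCap) || snap.local_parallelism_slots_available <= 0;
  if (saturated) return req.background ? "defer_low_priority" : "queue";
  return "allow_now";
}
```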
### Append-log partitioning uses the same boundary
```ts
// Core §22.1A — AppendLogPartitionPolicy
type AppendLogPartitionKind =
| "global_nonprivileged"
| "by_matter"
| "by_client"
| "by_context_boundary"
| "local_only";
interface AppendLogPartitionPolicy {
policy_id: string;
log_kind:
| "context_feedback"
| "task_audit_events"
| "run_board_events"
| "feedback_consumption_receipts"
| "learning_signal_spool"
| "workspace_externalization_receipts";
partition_kind: AppendLogPartitionKind;
partition_key_source:
| "matter_id"
| "client_id"
| "context_boundary_ref"
| "data_class"
| "none";
privileged_requires_partition: boolean;
local_only_requires_partition: boolean;
export_policy:
| "export_per_partition"
| "export_global_with_redaction"
| "export_blocked";
legal_hold_scope:
| "partition"
| "matter"
| "client"
| "global_log";
schema_version: "1.0";
}
```
Lints:
```ts
validation.privileged_event_written_to_global_unpartitioned_log
validation.append_log_partition_missing_for_matter_scoped_event
validation.legal_hold_scope_not_partitionable
```
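A sketch of resolving the partition an append-log event lands in, emitting one of the two partition lints instead of a key when the policy cannot place the event safely. `resolvePartition` and the `"matter:" + id` key format are hypothetical.

```typescript
// Hypothetical sketch: resolve the partition for an append-log event under AppendLogPartitionPolicy.
type PartitionKind = "global_nonprivileged" | "by_matter" | "by_client" | "by_context_boundary" | "local_only";

interface PartitionPolicyLite {
  partition_kind: PartitionKind;
  privileged_requires_partition: boolean;
}

interface LogEventLite {
  privileged: boolean;
  matter_id?: string;
  client_id?: string;
  context_boundary_id?: string;
}

type PartitionResult = { ok: true; partition_key: string } | { ok: false; lint: string };

function resolvePartition(policy: PartitionPolicyLite, ev: LogEventLite): PartitionResult {
  const missing: PartitionResult = { ok: false, lint: "validation.append_log_partition_missing_for_matter_scoped_event" };
  switch (policy.partition_kind) {
    case "global_nonprivileged":
      if (ev.privileged && policy.privileged_requires_partition) {
        return { ok: false, lint: "validation.privileged_event_written_to_global_unpartitioned_log" };
      }
      // Matter-scoped events must not land in the global log either.
      return ev.matter_id !== undefined ? missing : { ok: true, partition_key: "global" };
    case "by_matter":
      return ev.matter_id !== undefined ? { ok: true, partition_key: "matter:" + ev.matter_id } : missing;
    case "by_client":
      return ev.client_id !== undefined ? { ok: true, partition_key: "client:" + ev.client_id } : missing;
    case "by_context_boundary":
      return ev.context_boundary_id !== undefined
        ? { ok: true, partition_key: "boundary:" + ev.context_boundary_id }
        : missing;
    default: // "local_only"
      return { ok: true, partition_key: "local" };
  }
}
```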
------
## Issue 8 — AB-T-13/14 need a RecoveryPolicyRegistry
The card correctly observes that chaos fixtures for mid-run privilege reclassification and storage-full do not define expected behavior. It also correctly identifies malformed load-bearing eval output as more than generic validation. But these should not be two ad hoc recovery objects. Add a general registry.
```ts
// Common Contracts §10C — Recovery policy registry
type RecoveryTriggerKind =
| "malformed_evaluation_output"
| "storage_full"
| "durable_write_failed"
| "privilege_reclassification"
| "source_retrieval_failure"
| "external_tool_failure"
| "model_unavailable"
| "policy_snapshot_changed"
| "tkp_stale"
| "snapshot_unavailable";
type RecoveryAction =
| "retry_same_model_once"
| "retry_different_model"
| "retry_with_stricter_schema"
| "fallback_deterministic_scorer"
| "emit_indeterminate"
| "raise_hard_call"
| "pause_scope"
| "fail_closed"
| "rollback_uncommitted"
| "re_taint_emitted_artifacts"
| "invalidate_read_models"
| "abort_scope"
| "block_clean_verdict";
interface RecoveryPolicyRegistryEntry {
policy_id: string;
trigger_kind: RecoveryTriggerKind;
applies_to:
| "evaluation"
| "revision"
| "source_workspace"
| "forum"
| "task_agent_design"
| "external_side_effect"
| "portfolio_governor";
recovery_order: RecoveryAction[];
terminal_fallback:
| "needs_verification"
| "needs_information"
| "needs_human_judgment"
| "unable_to_evaluate"
| "blocked_by_policy"
| "unrecoverable";
clean_verdict_blocker_kind?: CleanVerdictBlockerKind;
fixture_refs: StorageRef[];
schema_version: "1.0";
}
interface RecoveryActionReceipt {
receipt_id: string;
policy_ref: string;
trigger_kind: RecoveryTriggerKind;
target_ref: StorageRef;
action_taken: RecoveryAction;
resulting_state:
| "recovered"
| "retrying"
| "needs_verification"
| "needs_information"
| "needs_human_judgment"
| "unable_to_evaluate"
| "blocked_by_policy"
| "unrecoverable";
clean_verdict_blocker_ref?: StorageRef;
created_at: ISO8601;
schema_version: "1.0";
}
```
Lints:
```ts
validation.chaos_fixture_without_recovery_policy
validation.recovery_trigger_without_receipt
validation.load_bearing_malformed_eval_without_recovery_receipt
validation.storage_full_partial_write_without_failure_receipt
validation.privilege_reclassification_without_read_model_invalidation
```
Then specialize for load-bearing eval:
```ts
interface LoadBearingEvaluationRecoveryPolicy extends RecoveryPolicyRegistryEntry {
trigger_kind: "malformed_evaluation_output";
max_retry_count: number;
final_failure_state:
| "needs_verification"
| "needs_human_judgment"
| "unable_to_evaluate"
| "unrecoverable";
preserve_malformed_output_ref: boolean;
schema_version: "1.0";
}
```
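A sketch of walking a registry entry's `recovery_order` so every trigger leaves at least one `RecoveryActionReceipt`. `runRecovery` and its `attempt` callback are hypothetical. The assumptions are that the last failed action carries `terminal_fallback` as its resulting state, and that an empty order fails closed.

```typescript
// Hypothetical sketch: walk a RecoveryPolicyRegistryEntry's recovery_order and emit receipts.
type RecoveryAction =
  | "retry_same_model_once"
  | "retry_different_model"
  | "retry_with_stricter_schema"
  | "fallback_deterministic_scorer"
  | "emit_indeterminate"
  | "raise_hard_call"
  | "pause_scope"
  | "fail_closed"
  | "rollback_uncommitted"
  | "re_taint_emitted_artifacts"
  | "invalidate_read_models"
  | "abort_scope"
  | "block_clean_verdict";

type TerminalState =
  | "needs_verification"
  | "needs_information"
  | "needs_human_judgment"
  | "unable_to_evaluate"
  | "blocked_by_policy"
  | "unrecoverable";

interface RecoveryPolicyLite {
  policy_id: string;
  recovery_order: RecoveryAction[];
  terminal_fallback: TerminalState;
}

interface RecoveryReceiptLite {
  policy_ref: string;
  action_taken: RecoveryAction;
  resulting_state: "recovered" | "retrying" | TerminalState;
}

// `attempt` is a hypothetical callback that performs one action and reports success.
function runRecovery(policy: RecoveryPolicyLite, attempt: (a: RecoveryAction) => boolean): RecoveryReceiptLite[] {
  if (policy.recovery_order.length === 0) {
    // Assumption: an empty order fails closed rather than leaving the trigger without a receipt.
    return [{ policy_ref: policy.policy_id, action_taken: "fail_closed", resulting_state: policy.terminal_fallback }];
  }
  const order = policy.recovery_order;
  const receipts: RecoveryReceiptLite[] = [];
  for (let i = 0; i < order.length; i++) {
    const action = order[i];
    const ok = attempt(action);
    const isLast = i === order.length - 1;
    receipts.push({
      policy_ref: policy.policy_id,
      action_taken: action,
      resulting_state: ok ? "recovered" : isLast ? policy.terminal_fallback : "retrying",
    });
    if (ok) break; // stop at the first action that recovers
  }
  return receipts;
}
```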
------
## Issue 9 — AB-T-15 should block design, not just degrade
The card marks TKP freshness as net-new because Task Agent has stale-pack degraded mode but no freshness budget, invalidation trigger, or “do not design from stale TKP” preflight. I would strengthen it: stale TKP can still advise generally, but it cannot emit executable graph patches, module presets, port bindings, side-effect policy changes, or routing changes.
```ts
// Core §8.7A — Task Knowledge Pack freshness preflight
type TKPFreshnessStatus =
| "fresh"
| "stale"
| "invalidated"
| "unknown"
| "missing";
type TKPDesignMode =
| "full_design_allowed"
| "advisory_only"
| "blocked_until_recompile"
| "user_confirmation_required";
type TKPInvalidatingEventKind =
| "module_catalog_changed"
| "port_registry_changed"
| "capability_registry_changed"
| "side_effect_policy_changed"
| "template_registry_changed"
| "task_runtime_contract_changed"
| "common_contract_schema_changed"
| "type_owner_registry_changed"
| "revisor_contract_changed"
| "source_workspace_api_changed";
interface TaskKnowledgePackFreshnessPolicy {
policy_id: string;
max_age_hours: number;
invalidating_event_kinds: TKPInvalidatingEventKind[];
min_required_tkp_semver: string;
min_required_module_catalog_hash?: string;
min_required_port_registry_hash?: string;
min_required_policy_snapshot_hash?: string;
min_required_type_owner_registry_hash?: string;
on_stale_for_task_design: TKPDesignMode;
on_unknown_freshness: TKPDesignMode;
on_missing_tkp: TKPDesignMode;
schema_version: "1.0";
}
interface TaskKnowledgePackPreflightReceipt {
receipt_id: string;
task_agent_activation_id: string;
tkp_ref?: StorageRef;
freshness_status: TKPFreshnessStatus;
tkp_compiled_at?: ISO8601;
tkp_semver?: string;
observed_module_catalog_hash?: string;
observed_port_registry_hash?: string;
observed_policy_snapshot_hash?: string;
observed_type_owner_registry_hash?: string;
invalidating_event_refs: StorageRef[];
design_mode: TKPDesignMode;
policy_ref: string;
created_at: ISO8601;
schema_version: "1.0";
}
```
Rule:
```ts
RULE task_agent_design_requires_fresh_tkp:
Before proposing a task graph, executable template, module preset,
port binding, side-effect-capability binding, or graph patch, Task Agent
MUST emit a TaskKnowledgePackPreflightReceipt.
If design_mode == "advisory_only":
Task Agent may explain options but MUST NOT emit executable graph patches.
If design_mode == "blocked_until_recompile":
Task Agent MUST request TKP recompile and stop design.
If design_mode == "user_confirmation_required":
generated proposals MUST be non-executable until user confirmation.
```
Lints:
```ts
validation.task_agent_design_without_tkp_preflight
validation.task_agent_design_from_stale_tkp
validation.tkp_freshness_unknown_but_full_design_allowed
validation.stale_tkp_emits_executable_graph_patch
```
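A sketch of the preflight chain: observed TKP state yields a freshness status, which yields a design mode, which gates executable output. `assessTkpFreshness`, `designModeFor`, and `mayEmitExecutablePatch` are hypothetical. Treating a catalog-hash mismatch as invalidation, and treating invalidation as stricter than age-based staleness, are both assumptions.

```typescript
// Hypothetical sketch of the TKP freshness preflight: freshness -> design mode -> output gate.
type TKPFreshness = "fresh" | "stale" | "invalidated" | "unknown" | "missing";
type TKPDesignMode = "full_design_allowed" | "advisory_only" | "blocked_until_recompile" | "user_confirmation_required";

interface TKPPolicyLite {
  max_age_hours: number;
  min_required_module_catalog_hash?: string;
  on_stale_for_task_design: TKPDesignMode;
  on_unknown_freshness: TKPDesignMode;
  on_missing_tkp: TKPDesignMode;
}

interface TKPObservationLite {
  present: boolean;
  compiled_at_ms?: number; // epoch milliseconds; absent when provenance is unknown
  now_ms: number;
  invalidating_event_count: number;
  observed_module_catalog_hash?: string;
}

function assessTkpFreshness(policy: TKPPolicyLite, obs: TKPObservationLite): TKPFreshness {
  if (!obs.present) return "missing";
  if (obs.invalidating_event_count > 0) return "invalidated";
  // Assumption: a catalog hash that does not match the required one counts as invalidation.
  const required = policy.min_required_module_catalog_hash;
  if (required !== undefined && obs.observed_module_catalog_hash !== required) return "invalidated";
  if (obs.compiled_at_ms === undefined) return "unknown";
  const ageHours = (obs.now_ms - obs.compiled_at_ms) / 3600000;
  return ageHours > policy.max_age_hours ? "stale" : "fresh";
}

function designModeFor(policy: TKPPolicyLite, freshness: TKPFreshness): TKPDesignMode {
  if (freshness === "fresh") return "full_design_allowed";
  if (freshness === "stale") return policy.on_stale_for_task_design;
  if (freshness === "unknown") return policy.on_unknown_freshness;
  if (freshness === "missing") return policy.on_missing_tkp;
  return "blocked_until_recompile"; // invalidated: assumed stricter than age-based staleness
}

// Executable graph patches, presets, and bindings are allowed only under full design,
// or after explicit user confirmation when the policy asks for it.
function mayEmitExecutablePatch(mode: TKPDesignMode, userConfirmed: boolean): boolean {
  return mode === "full_design_allowed" || (mode === "user_confirmation_required" && userConfirmed);
}
```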
------
## Issue 10 — AB-T-16 should have a hard “direct-first guardrail”
The card correctly separates DOC24 task-opportunity classifier calibration from planner calibration. I would add a product guardrail: if the classifier is degraded or uncalibrated, direct mode wins unless the user explicitly invokes a task.
```ts
// Core / DOC24 obligation — TaskOpportunityClassifierQualityGate
type TaskOpportunityClassifierDecision =
| "stay_direct"
| "quiet_task_chip"
| "suggest_saved_task"
| "invoke_task_agent"
| "explicit_task_request";
type TaskOpportunityClassifierMode =
| "normal"
| "suggestions_only"
| "quiet_chip_only"
| "explicit_task_only"
| "disabled_stay_direct";
interface TaskOpportunityClassifierCalibrationPolicy {
policy_id: string;
measurement_window_days: number;
min_sample_size: number;
max_false_positive_rate_for_task_suggestion: number;
max_false_positive_rate_for_task_agent_invocation: number;
max_false_negative_rate_for_explicit_task_like_request: number;
direct_first_protection_enabled: boolean;
kill_switch:
| "off"
| "disable_saved_task_suggestions"
| "disable_task_agent_invocation"
| "explicit_task_only"
| "stay_direct_only";
review_trigger:
| "threshold_breach"
| "user_complaint"
| "regression_fixture_failure"
| "manual";
schema_version: "1.0";
}
interface TaskOpportunityClassifierObservation {
observation_id: string;
request_ref: StorageRef;
predicted_decision: TaskOpportunityClassifierDecision;
final_user_accepted_decision?: TaskOpportunityClassifierDecision;
user_dismissed_task_suggestion?: boolean;
user_accepted_task_suggestion?: boolean;
false_positive?: boolean;
false_negative?: boolean;
reason_codes: string[];
created_at: ISO8601;
schema_version: "1.0";
}
interface TaskOpportunityClassifierCalibrationReport {
report_id: string;
policy_ref: string;
measurement_window: DateRange;
sample_size: number;
false_positive_rate_task_suggestion: number;
false_positive_rate_task_agent_invocation: number;
false_negative_rate_explicit_task_like_request: number;
regression_fixture_refs: StorageRef[];
effective_classifier_mode: TaskOpportunityClassifierMode;
kill_switch_action_taken?: TaskOpportunityClassifierCalibrationPolicy["kill_switch"];
created_at: ISO8601;
schema_version: "1.0";
}
```
Rule:
```ts
RULE direct_first_on_classifier_degradation:
If TaskOpportunityClassifierCalibrationReport.effective_classifier_mode is
"explicit_task_only" or "disabled_stay_direct", DOC24 MUST NOT invoke Task
Agent or suggest saved tasks unless the user explicitly asks for a task.
```
Lints:
```ts
validation.task_opportunity_classifier_without_calibration_policy
validation.task_opportunity_false_positive_rate_exceeded
validation.task_opportunity_kill_switch_not_applied_after_threshold_breach
validation.task_agent_invoked_while_classifier_in_direct_only_mode
```
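Here is a minimal sketch of how the degradation rule and the false-positive lint could be checked, assuming the policy and report shapes above. The helpers `falsePositiveLints` and `directFirstGateAllows` are hypothetical, and the interfaces are trimmed to the fields the checks read. Treating an undersized measurement window as non-reportable is an assumption; the contract does not state it.

```typescript
// Hypothetical enforcement sketch for RULE direct_first_on_classifier_degradation
// and the false-positive lint. Types are trimmed to the fields read here.
type TaskOpportunityClassifierMode =
  | "normal"
  | "suggestions_only"
  | "quiet_chip_only"
  | "explicit_task_only"
  | "disabled_stay_direct";

interface CalibrationThresholds {
  min_sample_size: number;
  max_false_positive_rate_for_task_suggestion: number;
  max_false_positive_rate_for_task_agent_invocation: number;
}

interface CalibrationMeasurements {
  sample_size: number;
  false_positive_rate_task_suggestion: number;
  false_positive_rate_task_agent_invocation: number;
}

// Returns the lint IDs the report triggers. Assumption: a window smaller than
// min_sample_size is not yet measurable, so it triggers nothing.
function falsePositiveLints(
  policy: CalibrationThresholds,
  report: CalibrationMeasurements,
): string[] {
  if (report.sample_size < policy.min_sample_size) return [];
  const exceeded =
    report.false_positive_rate_task_suggestion >
      policy.max_false_positive_rate_for_task_suggestion ||
    report.false_positive_rate_task_agent_invocation >
      policy.max_false_positive_rate_for_task_agent_invocation;
  return exceeded ? ["validation.task_opportunity_false_positive_rate_exceeded"] : [];
}

// Direct-first gate. It can only forbid: `true` means "this rule does not block",
// and other policy may still decline to invoke Task Agent.
function directFirstGateAllows(
  mode: TaskOpportunityClassifierMode,
  userExplicitlyAskedForTask: boolean,
): boolean {
  const degraded = mode === "explicit_task_only" || mode === "disabled_stay_direct";
  return !degraded || userExplicitlyAskedForTask;
}
```

Keeping the gate as a pure predicate over the effective mode lets `validation.task_agent_invoked_while_classifier_in_direct_only_mode` be checked against the same function that runtime uses.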
------
## Issue 11 — AB-T-17 should be tied to the clean-verdict gate
The card correctly notes that requiring `evaluation_snapshot_ref` does not guarantee the snapshot still exists. I would treat a missing snapshot not merely as an audit problem but as a clean-verdict blocker whenever the snapshot is needed for reliance.
```ts
// Common Contracts §10D — Snapshot retention and indexing
type SnapshotKind =
| "evaluation_snapshot"
| "source_workspace_snapshot"
| "artifact_version_snapshot"
| "graph_topology_snapshot"
| "policy_snapshot"
| "capability_snapshot"
| "task_run_context_packet"
| "board_digest"
| "tkp_snapshot";
type SnapshotRetentionClass =
| "run_lifetime"
| "task_lifetime"
| "matter_lifetime"
| "reliance_bound"
| "legal_hold"
| "indefinite_until_superseded";
interface SnapshotRetentionPolicy {
policy_id: string;
snapshot_kind: SnapshotKind;
retention_class: SnapshotRetentionClass;
min_retention_days?: number;
must_retain_while_referenced_by: Array<
| "evaluation_result_envelope"
| "feedback_bundle"
| "task_reliance_packet"
| "evidence_package"
| "revision_review_packet"
| "learning_signal"
| "hard_call_resolution"
| "legal_hold"
| "clean_verdict_eligibility_receipt"
>;
index_required: boolean;
index_keys: Array<
| "snapshot_ref"
| "task_id"
| "run_id"
| "matter_id"
| "artifact_ref"
| "source_workspace_ref"
| "evaluation_chain_id"
| "context_boundary_ref"
>;
garbage_collection_allowed: boolean;
schema_version: "1.0";
}
interface SnapshotIndexRecord {
snapshot_ref: StorageRef;
snapshot_kind: SnapshotKind;
task_id?: string;
run_id?: string;
matter_id?: string;
context_boundary_ref?: ContextBoundaryRef;
produced_at: ISO8601;
retained_until?: ISO8601;
retention_policy_ref: string;
referenced_by_refs: StorageRef[];
availability_status:
| "available"
| "archived_available"
| "redacted"
| "purged_by_policy"
| "missing_error";
clean_verdict_blocker_ref?: StorageRef;
schema_version: "1.0";
}
```
Lints:
```ts
validation.snapshot_ref_missing_from_index
validation.snapshot_purged_while_referenced
validation.evaluation_snapshot_ref_unresolvable
validation.reliance_packet_snapshot_unavailable
validation.clean_verdict_snapshot_unavailable
```
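The proposed escalation can be sketched as a single check over an index record. This is an illustration under assumptions, not the contract: `checkSnapshot` and `RELIANCE_REFERRERS` are hypothetical. Two choices are judgment calls the spec would need to settle: which referrer kinds count as reliance (here `task_reliance_packet` and `clean_verdict_eligibility_receipt`), and treating `redacted` snapshots as unusable.

```typescript
type AvailabilityStatus =
  | "available"
  | "archived_available"
  | "redacted"
  | "purged_by_policy"
  | "missing_error";

type ReferrerKind =
  | "evaluation_result_envelope"
  | "feedback_bundle"
  | "task_reliance_packet"
  | "evidence_package"
  | "revision_review_packet"
  | "learning_signal"
  | "hard_call_resolution"
  | "legal_hold"
  | "clean_verdict_eligibility_receipt";

interface SnapshotCheckInput {
  indexed: boolean; // false when no SnapshotIndexRecord resolves the ref
  availability_status?: AvailabilityStatus;
  referenced_by_kinds: ReferrerKind[];
}

// Assumption: these referrers mean someone is relying on the snapshot.
const RELIANCE_REFERRERS: ReadonlySet<ReferrerKind> = new Set<ReferrerKind>([
  "task_reliance_packet",
  "clean_verdict_eligibility_receipt",
]);

function checkSnapshot(input: SnapshotCheckInput): { lints: string[]; blocksCleanVerdict: boolean } {
  const lints: string[] = [];
  const status = input.availability_status;
  const refs = input.referenced_by_kinds;
  if (!input.indexed) lints.push("validation.snapshot_ref_missing_from_index");
  // Assumption: only these two statuses can back reliance ("redacted" cannot).
  const usable = input.indexed && (status === "available" || status === "archived_available");
  if (status === "purged_by_policy" && refs.length > 0) {
    lints.push("validation.snapshot_purged_while_referenced");
  }
  if (!usable && refs.includes("evaluation_result_envelope")) {
    lints.push("validation.evaluation_snapshot_ref_unresolvable");
  }
  if (!usable && refs.includes("task_reliance_packet")) {
    lints.push("validation.reliance_packet_snapshot_unavailable");
  }
  // The escalation: an unusable snapshot that backs reliance blocks a clean verdict.
  const blocksCleanVerdict = !usable && refs.some((kind) => RELIANCE_REFERRERS.has(kind));
  if (blocksCleanVerdict) lints.push("validation.clean_verdict_snapshot_unavailable");
  return { lints, blocksCleanVerdict };
}
```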
------
# 3. New ideas I would add now
## New idea A — EnforcementBadge as the UI contract
The card’s desired user experiences include support panels, degraded assurance badges, matter-hold prompts, and source-verification warnings. Define one UI-facing badge record so DOC20/DOC21/DOC22 do not invent separate semantics.
```ts
// DOC20 / Common Contracts — EnforcementBadge
type EnforcementBadgeKind =
| "clean"
| "downgraded_assurance"
| "missing_source"
| "access_blocked"
| "subminimum_source_tier"
| "quorum_missing"
| "ambiguous_matter"
| "stale_snapshot"
| "stale_tkp"
| "needs_human_review"
| "malformed_eval_recovered"
| "runtime_integrity_block";
interface EnforcementBadge {
badge_id: string;
badge_kind: EnforcementBadgeKind;
target_ref: StorageRef;
user_visible_label: string;
user_visible_detail?: string;
severity: "info" | "warning" | "blocking";
source_receipt_refs: StorageRef[];
command_ref?: string; // e.g., "assign matter", "re-run verification", "open evidence"
created_at: ISO8601;
schema_version: "1.0";
}
```
Lints:
```ts
validation.enforcement_badge_without_source_receipt
validation.blocking_enforcement_badge_without_command_or_recovery_route
```
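Both lints are mechanical enough to sketch directly. In this hypothetical `lintBadge`, the badge is trimmed to the fields the lints read. Because the record has no separate recovery-route field, `command_ref` stands in for "command or recovery route"; that substitution is an assumption.

```typescript
interface EnforcementBadgeSlice {
  badge_id: string;
  severity: "info" | "warning" | "blocking";
  source_receipt_refs: string[];
  command_ref?: string;
}

function lintBadge(badge: EnforcementBadgeSlice): string[] {
  const lints: string[] = [];
  if (badge.source_receipt_refs.length === 0) {
    lints.push("validation.enforcement_badge_without_source_receipt");
  }
  // Assumption: command_ref doubles as the recovery route for blocking badges.
  if (badge.severity === "blocking" && !badge.command_ref) {
    lints.push("validation.blocking_enforcement_badge_without_command_or_recovery_route");
  }
  return lints;
}
```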
## New idea B — EnforcementCase as a user-facing grouped explanation
Badges are atomic; users need grouped explanations. Example: “Cannot certify because 3 claims lack support, 1 source is inaccessible, and the evaluator ran below requested assurance.”
```ts
interface EnforcementCase {
case_id: string;
target_ref: StorageRef;
title: string;
summary: string;
badge_refs: StorageRef[];
clean_verdict_eligibility_receipt_ref: StorageRef;
recommended_actions: Array<{
action_label: string;
command_ref?: string;
expected_effect: string;
}>;
status:
| "open"
| "partially_resolved"
| "resolved"
| "dismissed_by_user"
| "superseded";
created_at: ISO8601;
schema_version: "1.0";
}
```
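The grouped explanation in the example above can be generated from badge counts. This is a rough sketch assuming a hypothetical phrase table that covers four badge kinds. The wording is illustrative, and `summarizeCase` is not a proposed API name.

```typescript
type CaseBadgeKind =
  | "missing_source"
  | "access_blocked"
  | "downgraded_assurance"
  | "quorum_missing";

// Hypothetical phrase table; the wording is illustrative only.
const PHRASES: Record<CaseBadgeKind, (n: number) => string> = {
  missing_source: (n) => `${n} ${n === 1 ? "claim lacks" : "claims lack"} support`,
  access_blocked: (n) => `${n} ${n === 1 ? "source is" : "sources are"} inaccessible`,
  downgraded_assurance: () => "the evaluator ran below requested assurance",
  quorum_missing: () => "the required quorum was not met",
};

// Counts badges per kind (first-seen order) and joins the phrases into one sentence.
function summarizeCase(kinds: CaseBadgeKind[]): string {
  if (kinds.length === 0) return "No blocking findings.";
  const counts = new Map<CaseBadgeKind, number>();
  for (const kind of kinds) counts.set(kind, (counts.get(kind) ?? 0) + 1);
  const parts = Array.from(counts, ([kind, n]) => PHRASES[kind](n));
  const last = parts.pop()!; // safe: kinds is non-empty, so parts is too
  const joined =
    parts.length === 0
      ? last
      : parts.length === 1
        ? `${parts[0]} and ${last}`
        : `${parts.join(", ")}, and ${last}`;
  return `Cannot certify because ${joined}.`;
}
```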
## New idea C — Evidence Debt Ledger
AB-T-01/05/07/17 all create “not safe yet, but fixable” debt. A dedicated ledger makes it reviewable and batchable.
```ts
type EvidenceDebtKind =
| "missing_source"
| "access_blocked"
| "subminimum_documentation_tier"
| "unverified_source"
| "stale_source"
| "citation_manifest_missing"
| "claim_not_checked"
| "snapshot_unavailable";
interface EvidenceDebtItem {
debt_id: string;
debt_kind: EvidenceDebtKind;
task_id: string;
run_id?: string;
matter_id?: string;
target_claim_ref?: ClaimRef;
source_record_ref?: SourceRecordRef;
artifact_scope_ref?: ArtifactScopeRef;
blocking_clean_verdict: boolean;
recovery_route:
| "source_research"
| "source_verification"
| "citation_manifest_generation"
| "human_review"
| "snapshot_restore"
| "cannot_recover";
status:
| "open"
| "leased"
| "resolved"
| "waived_with_limitation"
| "unrecoverable";
related_clean_verdict_blocker_ref?: StorageRef;
created_at: ISO8601;
schema_version: "1.0";
}
```
This lets Q show an “Evidence Debt” queue rather than burying proof failures in individual runs.
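One way to derive the "Evidence Debt" queue from ledger items is sketched below under several assumptions. Unresolved means `open` or `leased`. Clean-verdict blockers sort first, then the oldest items. Items are bucketed by `recovery_route` so they can be batched. `evidenceDebtQueue` and `DebtSlice` are hypothetical names.

```typescript
type RecoveryRoute =
  | "source_research"
  | "source_verification"
  | "citation_manifest_generation"
  | "human_review"
  | "snapshot_restore"
  | "cannot_recover";

interface DebtSlice {
  debt_id: string;
  blocking_clean_verdict: boolean;
  recovery_route: RecoveryRoute;
  status: "open" | "leased" | "resolved" | "waived_with_limitation" | "unrecoverable";
  created_at: string; // ISO 8601
}

function evidenceDebtQueue(items: DebtSlice[]): Map<RecoveryRoute, DebtSlice[]> {
  const pending = items
    .filter((d) => d.status === "open" || d.status === "leased") // filter copies, so sort cannot mutate input
    .sort((a, b) => {
      if (a.blocking_clean_verdict !== b.blocking_clean_verdict) {
        return a.blocking_clean_verdict ? -1 : 1; // blockers first
      }
      return Date.parse(a.created_at) - Date.parse(b.created_at); // then oldest first
    });
  const queue = new Map<RecoveryRoute, DebtSlice[]>();
  for (const item of pending) {
    const bucket = queue.get(item.recovery_route) ?? [];
    bucket.push(item);
    queue.set(item.recovery_route, bucket);
  }
  return queue;
}
```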
------
# 4. Updated answers to the three specific questions
## 1. AB-T-02: field or first-class state?
**Field, not state — but make it a record and feed a clean-verdict eligibility receipt.** A `satisfied_downgraded` state would conflate the outcome’s substantive result with the assurance path used to reach it. The clean state should remain `satisfied` only when allowed, and the downgrade should appear as `AssuranceExecutionRecord` + `CleanVerdictBlocker`.
Use this exact pattern:
```ts
interface AssuranceExecutionRecord {
assurance_execution_id: string;
result_id: string;
outcome_id: string;
target_assurance_basis: AssuranceBasis[];
minimum_required_assurance_basis: AssuranceBasis[];
executed_assurance_basis: AssuranceBasis[];
assurance_status:
| "met"
| "downgraded_policy_accepted"
| "downgraded_user_accepted"
| "downgraded_needs_review"
| "not_met";
downgrade_kind?:
| "model_class_downgrade"
| "specialist_unavailable"
| "specialist_panel_quorum_not_met"
| "source_verification_unavailable"
| "budget_or_time_cap"
| "tool_unavailable"
| "policy_disallowed_required_assurance"
| "human_reviewer_unavailable";
clean_verdict_allowed: boolean;
clean_verdict_eligibility_receipt_ref?: StorageRef;
explanation: string;
evidence_refs: StorageRef[];
created_at: ISO8601;
schema_version: "1.0";
}
```
Do not add:
```ts
// DO NOT ADD
"satisfied_downgraded"
```
Add lints:
```ts
validation.assurance_execution_record_missing_for_target_assurance
validation.executed_below_target_assurance_presented_as_clean
validation.assurance_downgrade_without_policy_or_user_acceptance
validation.assurance_downgrade_not_reflected_in_reliance_packet
```
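The next sketch shows how `assurance_status` might be derived from the three basis lists. `AssuranceBasis` is stubbed as a string because its union is defined elsewhere. The helper names are hypothetical. The sketch takes the conservative reading that only `met` permits a clean verdict; the record's separate `clean_verdict_allowed` field leaves room for policy to decide otherwise.

```typescript
type AssuranceBasis = string; // placeholder for the Common Contracts union
type AssuranceStatus =
  | "met"
  | "downgraded_policy_accepted"
  | "downgraded_user_accepted"
  | "downgraded_needs_review"
  | "not_met";

function covers(executed: AssuranceBasis[], required: AssuranceBasis[]): boolean {
  const have = new Set(executed);
  return required.every((basis) => have.has(basis));
}

// Assumption: an accepted downgrade is still a downgrade, so acceptance changes
// the status but never turns the result into a clean verdict.
function assessAssurance(
  target: AssuranceBasis[],
  minimum: AssuranceBasis[],
  executed: AssuranceBasis[],
  acceptedBy?: "policy" | "user",
): { assurance_status: AssuranceStatus; clean_verdict_allowed: boolean } {
  let status: AssuranceStatus;
  if (covers(executed, target)) status = "met";
  else if (!covers(executed, minimum)) status = "not_met";
  else if (acceptedBy === "policy") status = "downgraded_policy_accepted";
  else if (acceptedBy === "user") status = "downgraded_user_accepted";
  else status = "downgraded_needs_review";
  return { assurance_status: status, clean_verdict_allowed: status === "met" };
}

function lintCleanPresentation(status: AssuranceStatus, presentedAsClean: boolean): string[] {
  return presentedAsClean && status !== "met"
    ? ["validation.executed_below_target_assurance_presented_as_clean"]
    : [];
}
```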
## 2. AB-T-09: thin governor and canonical isolation unit?
A thin governor is justified only as **admission/read-model policy**. It must not become a scheduler, module dispatcher, graph mutator, or second orchestrator. The minimum viable form is `PortfolioResourceSnapshot` + `PortfolioAdmissionPolicy` + `PortfolioAdmissionDecision`, all explicitly marked `may_dispatch_work:false` and `may_mutate_task_graph:false`.
The canonical isolation unit should be **`ContextBoundaryRef`**, not raw matter and not raw context class. For privileged legal work, matter dominates context class. Context class is subordinate inside the governing matter/client/project boundary.
Core types:
```ts
interface ContextBoundaryRef {
boundary_id: string;
boundary_kind:
| "privileged_matter"
| "matter"
| "client"
| "project"
| "task_family"
| "context_class"
| "global_nonprivileged";
matter_id?: string;
client_id?: string;
project_id?: string;
task_family_id?: string;
context_class_key?: string;
privileged: boolean;
canonical_for:
| "pattern_promotion"
| "suggestion_learning"
| "invocation_learning"
| "attention_rollup"
| "budget_governance"
| "append_log_partition"
| "source_workspace"
| "snapshot_retention"
| "forum_scope";
parent_boundary_ref?: string;
schema_version: "1.0";
}
```
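One way to resolve the canonical boundary is to walk `parent_boundary_ref` upward and let the governing matter, client, or project boundary own any context class beneath it. `governingBoundary` and `mayShareLearning` are hypothetical helpers. Two rules in the sketch are assumptions: a matter anywhere in the ancestry dominates, and learning is never shared when either side lacks a governing ancestor.

```typescript
type BoundaryKind =
  | "privileged_matter"
  | "matter"
  | "client"
  | "project"
  | "task_family"
  | "context_class"
  | "global_nonprivileged";

interface BoundarySlice {
  boundary_id: string;
  boundary_kind: BoundaryKind;
  parent_boundary_ref?: string;
}

const isMatter = (b: BoundarySlice) =>
  b.boundary_kind === "privileged_matter" || b.boundary_kind === "matter";
const isClientOrProject = (b: BoundarySlice) =>
  b.boundary_kind === "client" || b.boundary_kind === "project";

// Collects the ancestry (with a cycle guard), then prefers a matter anywhere in
// it; otherwise the nearest client or project boundary governs.
function governingBoundary(
  start: BoundarySlice,
  byId: Map<string, BoundarySlice>,
): BoundarySlice | undefined {
  const chain: BoundarySlice[] = [];
  const seen = new Set<string>();
  let current: BoundarySlice | undefined = start;
  while (current !== undefined && !seen.has(current.boundary_id)) {
    chain.push(current);
    seen.add(current.boundary_id);
    current =
      current.parent_boundary_ref === undefined ? undefined : byId.get(current.parent_boundary_ref);
  }
  return chain.find(isMatter) ?? chain.find(isClientOrProject);
}

// Conservative: learning flows only between boundaries under the same governor.
function mayShareLearning(
  a: BoundarySlice,
  b: BoundarySlice,
  byId: Map<string, BoundarySlice>,
): boolean {
  const ga = governingBoundary(a, byId);
  const gb = governingBoundary(b, byId);
  return ga !== undefined && gb !== undefined && ga.boundary_id === gb.boundary_id;
}
```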
## 3. Are AB-T-05, AB-T-06, AB-T-15 already satisfied?
**AB-T-05 is not satisfied by live specs.** The card correctly says current source-tier work covers tier bugs, not a risk-based minimum tier requirement. It is also only partially covered by prior R2 recommendations. Adopt it with `SourceUseDeclaration`.
**AB-T-06 is not satisfied by live specs.** Existing Evaluator/Revisor traceability does not cover ordinary module material choices. The card correctly identifies that most real runs involve ordinary modules whose “why” is not structured. Adopt it, but materiality-scoped.
**AB-T-15 is not satisfied by live specs.** The card correctly says stale TKP degraded mode exists as an idea, but there is no freshness budget, invalidation trigger, or “do not design from stale TKP” preflight. It is covered by my prior R2 suggestions, but not by the operative text; adopt it now.
------
# 5. Final cut / keep changes after deeper pass
## Upgrade to ADOPT
- **AB-T-07** scoped `CitationManifest`.
- **AB-T-10** append-log partitioning.
- **AB-T-11** canonical `ContextBoundaryRef`.
- **AB-T-13** runtime integrity / recovery behavior contracts.
- **AB-T-14** load-bearing eval recovery.
- **AB-T-15** TKP preflight.
- **AB-T-16** task-opportunity classifier calibration + kill switch.
- **AB-T-17** snapshot retention/indexing.
## Keep as DISCUSS / light version only
- **AB-T-08**: no `CausalProof` hard gate; adopt only `RevisionChangeRationale`.
- **Full AB-T-09 central governor**: reject if it dispatches work; adopt only thin admission/read-model governor.
- **Full AB-T-12 multi-user ownership**: defer; adopt only `RunOperatorContext`.
## Add as new package
- **CleanVerdictEligibilityReceipt**
- **EnforcementBadge**
- **EnforcementCase**
- **EvidenceDebtItem**
- **RecoveryPolicyRegistryEntry**
- **ContextBoundaryRef**
## Final synthesis
The card is right that this test-set residue is about **not lying with clean verdicts**. But the next draft should not implement that as scattered lints. It should implement a named enforcement substrate:
```text
Proof & Honesty Verdict Floor
= SourceRetrievalOutcome
+ SourceAbsenceAssessment
+ OutcomeProofGateReceipt
+ AssuranceExecutionRecord
+ RequiredQuorumManifest
+ SourceUseDeclaration / MinimumDocumentationTierPolicy
+ scoped CitationManifest
+ SnapshotIndexRecord
+ CleanVerdictEligibilityReceipt
+ EnforcementBadge / EnforcementCase
```
That package turns the card’s core insight into a coherent buildable system: every clean pass has a receipt proving why it is allowed, every blocked pass has a typed reason and recovery path, and every user-facing badge points back to the underlying receipts.
# Gemini
**Top-Line Verdict**
The central framing of this card—shifting from "Reporting" to "Enforcement"—is absolutely correct, but the card slightly understates the ruthlessness required to enforce it. Generating a detailed telemetry trace that accurately describes how a system failed (reporting) does not protect a final filing if the execution graph treats a degraded pass as a green light (enforcement). By injecting strict affirmative-proof gates and ambiguity holds, this card bridges the gap between observability and evidentiary proof. However, the card's instinct to rely on boolean fields instead of discrete state transitions for downgraded results is a systemic risk that will allow automated pipelines to blindly consume compromised artifacts.
### Per-Item Adjudication Table
| **ID** | **Agree/Disagree/Partial** | **Reason** |
| ----------- | -------------------------------- | ------------------------------------------------------------ |
| **AB-T-01** | **Agree** | "Absence of contradiction" is legally useless; affirmative grounding is mandatory for load-bearing claims. |
| **AB-T-02** | **Disagree (State vs Field)** | A boolean flag on an envelope does not halt a DAG. Downgraded assurance must be a discrete routing state. |
| **AB-T-03** | **Agree** | N:1 matter ambiguity is a critical privilege-breach vector; hard holds are non-negotiable. |
| **AB-T-05** | **Agree** | This is net-new and closes the loophole where an ephemeral (Tier 1) search result is silently used as load-bearing evidence. |
| **AB-T-06** | **Partial (Merge with AB-T-08)** | Lightweight rationale for ordinary modules is good, but it should absorb the `CausalProof` requirement (AB-T-08) to prevent KV cache bloat while maintaining auditability. |
| **AB-T-07** | **Agree (Elevate to ADOPT)** | I demanded this in the prior red-team pass. A `CitationManifest` is non-negotiable for Rule 11/PSLRA standards. |
| **AB-T-09** | **Partial** | A central orchestrator violates §2.2, but local unified memory will throttle. The solution is a decentralized hardware lease, not an orchestrator. |
| **AB-T-11** | **Agree** | Matter MUST be the canonical isolation unit. Learning invocation weights across matters violates the privilege firewall. |
### Detailed Notes & Schema Insertions
#### AB-T-02: Assurance Downgrade (State vs. Field)
The card proposes keeping the outcome as `satisfied` and adding a boolean `assurance_downgraded: true`. This is a critical error. The Loop Controller routes execution based on the `OutcomeEvaluationState`. If the state is `satisfied`, downstream modules will consume the artifact, assuming it passed the rigorous standard. The UI might show a yellow warning, but the DAG will fly right past it.
**Fix:** You must add a discrete state. This forces the `EvaluationChainResolutionPolicy` (from the prior adjudication card) to explicitly decide whether to route a downgraded pass to the next node or halt for human review.
```ts
// Update to Common Contracts §3.2 (OutcomeEvaluationDisposition)
type OutcomeEvaluationDisposition =
| "satisfied"
| "satisfied_downgraded" // NEW: Passed, but executed below target assurance or quorum
| "needs_revision" | "needs_information" | "needs_verification"
| "needs_human_judgment" | "unable_to_evaluate" | "blocked_by_policy" | "regressed"
| "upstream_failure" | "unrecoverable" | "superseded" | "max_iterations_reached";
// Envelope Schema update
interface EvaluationResultEnvelope {
// ... existing fields ...
target_assurance_basis: AssuranceBasis[];
executed_assurance_basis: AssuranceBasis[];
quorum_satisfied: boolean;
// If executed_assurance_basis lacks a required target, or quorum_satisfied === false,
// the overall_state MUST be "satisfied_downgraded" (or worse), NEVER "satisfied".
// validation.downgraded_assurance_emitted_as_clean_satisfied (error)
}
```
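The invariant in the envelope comment reduces to a small derivation plus a lint. The sketch is hypothetical, with `AssuranceBasis` stubbed as a string. It only chooses between the two passing states and leaves worse states to the evaluator.

```typescript
type AssuranceBasis = string; // placeholder for the Common Contracts union
type PassingState = "satisfied" | "satisfied_downgraded";

function derivePassingState(
  target: AssuranceBasis[],
  executed: AssuranceBasis[],
  quorumSatisfied: boolean,
): PassingState {
  const ran = new Set(executed);
  const allTargetsRan = target.every((basis) => ran.has(basis));
  return allTargetsRan && quorumSatisfied ? "satisfied" : "satisfied_downgraded";
}

// Fires only when a pass that should be downgraded was emitted as plain "satisfied".
function lintEmittedState(
  emitted: string,
  target: AssuranceBasis[],
  executed: AssuranceBasis[],
  quorumSatisfied: boolean,
): string[] {
  const downgraded =
    derivePassingState(target, executed, quorumSatisfied) === "satisfied_downgraded";
  return emitted === "satisfied" && downgraded
    ? ["validation.downgraded_assurance_emitted_as_clean_satisfied"]
    : [];
}
```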
#### AB-T-07 & AB-T-08: CitationManifest & CausalProof (Merged via AB-T-06)
The card marks `CitationManifest` (AB-T-07) as DISCUSS due to drafting-module overhead. For federal securities litigation, overhead on factual assertion is a feature, not a bug. If the system writes a factual claim, it must bind it to the source at write-time. However, `CausalProof` (AB-T-08) for the Revisor can be safely merged into `ModuleDecisionRationale` (AB-T-06) to keep token costs manageable.
```ts
// Add to Source Workspace V1.0.1 (or Common Contracts)
interface CitationManifest {
manifest_id: string;
generating_module_id: string;
generated_artifact_ref: StorageRef;
citations: Array<{
text_anchor_hash: string; // The exact generated text boundary
source_record_ref: string;
extracted_quote: string; // The exact text from the source relied upon
support_type: "direct_fact" | "synthesized_inference";
}>;
schema_version: "1.0";
}
// Any module with capability `factual_drafting` MUST emit a CitationManifest.
// validation.factual_draft_missing_citation_manifest (error).
interface ModuleDecisionRationale {
rationale_id: string;
run_id: string;
module_id: string;
decision_category: "strategic_structure" | "revisor_causality" | "fallback_path_chosen";
rationale_summary: string;
primary_driving_finding_refs?: string[]; // Links to EvaluationFinding
schema_version: "1.0";
}
```
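Below is a sketch of the write-time lint, assuming a lookup that returns a source record's text by reference. `lintFactualDraft` and the `validation.citation_quote_not_found_in_source` lint ID are hypothetical; only the missing-manifest lint appears in the proposal above. Verbatim substring matching is a deliberately strict assumption, and real quotes may need whitespace normalization.

```typescript
interface CitationEntry {
  text_anchor_hash: string;
  source_record_ref: string;
  extracted_quote: string;
  support_type: "direct_fact" | "synthesized_inference";
}

interface CitationManifestSlice {
  generated_artifact_ref: string;
  citations: CitationEntry[];
}

// Returns lint IDs; quote failures carry the offending anchor after a colon.
function lintFactualDraft(
  capabilities: string[],
  manifest: CitationManifestSlice | undefined,
  sourceText: (sourceRecordRef: string) => string | undefined,
): string[] {
  if (!capabilities.includes("factual_drafting")) return [];
  if (manifest === undefined) return ["validation.factual_draft_missing_citation_manifest"];
  const lints: string[] = [];
  for (const citation of manifest.citations) {
    const text = sourceText(citation.source_record_ref);
    // Every citation, direct or inferential, must quote text that exists in its source.
    if (text === undefined || !text.includes(citation.extracted_quote)) {
      lints.push(`validation.citation_quote_not_found_in_source:${citation.text_anchor_hash}`);
    }
  }
  return lints;
}
```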
#### AB-T-09: Portfolio Resource Governor vs. Local Constraints
You cannot introduce a central orchestrator without violating Core §2.2. However, running 40 parallel tasks across different matters on local Apple Silicon will result in catastrophic unified memory swapping, context eviction, and thermal throttling. The solution is a **Decentralized Hardware Semaphore**. Tasks don't ask an orchestrator for permission; they attempt to acquire a local hardware lease.
```ts
// Add to Core R0.7.1
interface LocalHardwareSemaphoreLease {
lease_id: string;
task_id: string;
requested_vram_tier: "low" | "medium" | "high"; // Maps to concurrent active model footprint
acquired_at: ISO8601;
expires_at: ISO8601;
status: "acquired" | "queued_for_compute" | "rejected_thermal_throttle";
}
// If status === "queued_for_compute", the task module self-suspends (yields)
// and registers an SSE listener for hardware-release events. No central orchestrator required.
```
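The next sketch is an in-process model of the lease semantics. It assumes hypothetical tier weights (low = 1, medium = 2, high = 4 capacity units) and strict FIFO admission. It is not the proposed cross-process mechanism: a lease shared between separate task processes would need an OS-level primitive. The list returned by `release` only stands in for the SSE hardware-release events.

```typescript
type VramTier = "low" | "medium" | "high";
type LeaseStatus = "acquired" | "queued_for_compute" | "rejected_thermal_throttle";

// Hypothetical capacity units consumed by each tier.
const TIER_UNITS: Record<VramTier, number> = { low: 1, medium: 2, high: 4 };

class LocalHardwareSemaphore {
  thermalThrottled = false;
  private readonly capacityUnits: number;
  private usedUnits = 0;
  private nextLease = 1;
  private readonly held = new Map<string, number>(); // lease_id -> units held
  private readonly waiting: Array<{ task_id: string; tier: VramTier }> = [];

  constructor(capacityUnits: number) {
    this.capacityUnits = capacityUnits;
  }

  request(taskId: string, tier: VramTier): { status: LeaseStatus; lease_id?: string } {
    if (this.thermalThrottled) return { status: "rejected_thermal_throttle" };
    // Strict FIFO: a small request may not overtake a queued larger one.
    if (this.waiting.length === 0 && this.fits(tier)) {
      return { status: "acquired", lease_id: this.grant(tier) };
    }
    this.waiting.push({ task_id: taskId, tier });
    return { status: "queued_for_compute" };
  }

  // Frees a lease and admits waiting tasks in order; the result stands in for
  // the hardware-release events a self-suspended task would listen for.
  release(leaseId: string): Array<{ task_id: string; lease_id: string }> {
    const units = this.held.get(leaseId);
    if (units === undefined) return [];
    this.held.delete(leaseId);
    this.usedUnits -= units;
    const admitted: Array<{ task_id: string; lease_id: string }> = [];
    while (this.waiting.length > 0 && this.fits(this.waiting[0].tier)) {
      const next = this.waiting.shift()!;
      admitted.push({ task_id: next.task_id, lease_id: this.grant(next.tier) });
    }
    return admitted;
  }

  private fits(tier: VramTier): boolean {
    return this.usedUnits + TIER_UNITS[tier] <= this.capacityUnits;
  }

  private grant(tier: VramTier): string {
    const leaseId = `lease-${this.nextLease++}`;
    this.held.set(leaseId, TIER_UNITS[tier]);
    this.usedUnits += TIER_UNITS[tier];
    return leaseId;
  }
}
```

The lock knows nothing about matters or task logic, which is the property the proposal relies on to stay clear of §2.2.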
### New Ideas (Net-New Proposals)
**1. Evidentiary Quarantine (Corpus/Library Defense)**
Currently, if a downgraded artifact or a hold-gated artifact completes its local run, it might still be picked up by the Persistent Background Extraction (DOC73) and promoted to the Library/Corpus. We need an **Evidentiary Quarantine**. Any artifact generated in a run that concluded with a `satisfied_downgraded` state, a `SourceMissing` token, or an `ambiguous_hold` MUST be cryptographically tagged as `quarantined`. DOC73 must be strictly forbidden from extracting, embedding, or promoting quarantined artifacts to the highest structural tiers until a human explicitly signs the `WorkProductCertification`.
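The quarantine trigger is a simple predicate over how the run concluded. The following is a hypothetical sketch. The field names are assumptions, and the cryptographic tag itself is not modeled.

```typescript
interface RunConclusionSlice {
  final_state: string; // e.g. "satisfied" or "satisfied_downgraded"
  source_missing_token_emitted: boolean;
  ambiguous_hold_active: boolean;
}

function mustQuarantine(run: RunConclusionSlice): boolean {
  return (
    run.final_state === "satisfied_downgraded" ||
    run.source_missing_token_emitted ||
    run.ambiguous_hold_active
  );
}

// DOC73 gate: quarantined artifacts are not extracted, embedded, or promoted to
// the highest tiers until a human has signed the WorkProductCertification.
function doc73MayProcess(run: RunConclusionSlice, certificationSigned: boolean): boolean {
  return !mustQuarantine(run) || certificationSigned;
}
```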
### Answers to Specific Questions
**1. AB-T-02: State vs. Field for Assurance Downgrade?**
It **must** be a first-class state (`satisfied_downgraded`). A field-only approach is structurally dishonest at the orchestration layer. The DAG routes on state. If you pass `satisfied` with a metadata field saying "but I didn't actually use the required specialist," you are relying on every downstream module and UI surface to manually check that boolean before proceeding. Adding `satisfied_downgraded` forces the task graph to explicitly handle the compromise via the `EvaluationChainResolutionPolicy`. (See exact schema in the Detailed Notes above).
**2. AB-T-09 & AB-T-11: Portfolio Governor & Isolation Unit**
- **The Governor:** A thin governor is justified by physical hardware constraints, but it must not be a central task arbiter. Use the `LocalHardwareSemaphoreLease` (defined above). It acts purely as a local resource lock (VRAM/Compute), completely agnostic to the *business logic* of the tasks. This preserves the decentralized, modular nature of §2.2.
- **The Canonical Isolation Unit (AB-T-11):** **Matter is the absolute canonical isolation unit.** Learning across context-classes while ignoring matter boundaries is a fatal privilege leak. If the Task Agent learns that a specific judge prefers a certain formatting style in Matter A, and applies that autonomously to Matter B, that is a cross-matter contamination of work product. All learning, pattern matching, and sub-agent priors MUST be partitioned by `matter_id` first.
**3. Are AB-T-05, AB-T-06, and AB-T-15 actually net-new?**
- **AB-T-05 (Risk-based tier matrix):** **Net-New.** The prior adjudication card (D-04) fixed the *mechanics* of source tiers (removed Tier 0, fixed demotion authorization). However, it did nothing to enforce a *floor* based on task risk. AB-T-05 is the policy layer that dictates you cannot use a Tier 1 (ephemeral) source for a load-bearing claim.
- **AB-T-06 (ModuleDecisionRationale):** **Partially Covered/Net-New Extension.** In the prior round, Grok proposed `ExplanationTrace` (adopted as G-18). However, G-18 was scoped specifically to the *Revisor* (`CompiledRevisionStrategy`). AB-T-06 extends this concept to *all* standard execution modules (`step.agent_task`). It is technically net-new for non-revisor modules, but borrows existing conceptual architecture.
- **AB-T-15 (TKP freshness preflight):** **Net-New.** The prior round heavily audited staleness for the *Source Workspace* (rolling hashes, live-edits, revalidation cascades—see B-03, B-04, B-05). It completely missed the staleness of the *Task Knowledge Pack (TKP)* itself. If the foundational blueprint/instructions are stale, checking the source documents for freshness is irrelevant. This is a critical new catch.
## Top-Line Assessment
After a deep re-review of the entire `DOC23 Addenda B` family, the architecture reads as an exceptionally high-fidelity blueprint for observable agent orchestration, but it contains **latent systemic vulnerabilities** at the runtime-execution boundary. The system accurately tracks state changes after the fact, but its real-time contracts and transaction boundaries are susceptible to **quadratic token bloat, deterministic replay deadlocks on stochastic modules, and asynchronous context race conditions**. It also establishes a scannable, discrete memory logging trail while creating a **taint-propagation deadlock** that treats mechanical transformations of untrusted inputs identically to semantic ingestion of them.
## 1. Systemic Primitives & Mathematical Bugs Audit
### [BUG-01] Quadratic Token Blowout via Unpruned Iteration Context
- **Location:** Outcome Evaluator/Revisor V3.3.1 §7.4 (`RevisionIntelligencePacket`) & §11.20 (Iterative Cycles)
- **The Technical Flaw:** During an active revision loop, the Revisor compiles a `RevisionPlan` that ingests the historical chain of `EvaluationFindings`, `OutcomeEvaluationResults`, and `SemanticChangelogs` from all prior iterations within the run. However, the spec fails to declare an **active memory pruning or context eviction policy** for in-flight execution.
- **The Consequence:** At iteration $N$ the `RevisionIntelligencePacket` carries the full history of every prior iteration, so the packet grows linearly in $N$ and the total tokens processed across the loop grow quadratically, $O(N^2)$. The local OpenClaw model runtime will then experience context-window degradation, KV-cache bloat, escalating latency spikes, or out-of-memory crashes on the local substrate.
- **The Fix:** Implement a strict **Sliding Window Context Eviction Policy** within the `RevisionIntelligencePacket` contract, forcing the compaction of iterations $1$ through $N-2$ into historical semantic summaries while preserving exact cryptographic hashes of the raw milestones.
- Schema Patch (V3.3.1 §7.4):
```ts
interface ContextEvictionPolicy {
max_uncompacted_iterations: number; // Defaults to 2
compilation_context_mode: "full_history" | "sliding_window_compacted";
compaction_strategy: {
preserve_raw_hashes_only: boolean;
semantic_summary_token_cap: number;
};
}
// Fields appended to the existing RevisionIntelligencePacket schema
// (TypeScript declaration merging; existing fields omitted):
interface RevisionIntelligencePacket {
context_eviction_profile: ContextEvictionPolicy;
evicted_iteration_manifest: Array<{
iteration_seq: number;
original_packet_hash: string;
compacted_summary_ref: StorageRef; // Truncated high-level delta text
}>;
}
```
- **Lint ID:** `validation.context_bloat_unpruned_iteration_chain`
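A back-of-envelope model makes the growth concrete. If each iteration adds roughly $t$ tokens and the packet carries full history, iteration $n$ costs $n \cdot t$ tokens, and the loop as a whole costs $t \cdot N(N+1)/2$. The sketch below compares that with a sliding window. The function names are hypothetical. It also interprets `semantic_summary_token_cap` (an assumption) as meaning that all compacted iterations collapse into a single rolling summary capped at `summaryCap` tokens.

```typescript
type CompilationMode = "full_history" | "sliding_window_compacted";

// Tokens in the packet at iteration n (1-based).
function packetTokens(
  n: number,
  perIteration: number,
  mode: CompilationMode,
  window = 2,
  summaryCap = 0,
): number {
  if (mode === "full_history") return n * perIteration;
  const raw = Math.min(n, window) * perIteration; // newest iterations kept raw
  return n > window ? raw + summaryCap : raw; // older ones share one capped summary
}

// Total tokens processed across iterations 1..N.
function cumulativeTokens(
  N: number,
  perIteration: number,
  mode: CompilationMode,
  window = 2,
  summaryCap = 0,
): number {
  let total = 0;
  for (let n = 1; n <= N; n++) total += packetTokens(n, perIteration, mode, window, summaryCap);
  return total;
}
```

With 1,000 tokens per iteration over ten iterations, full history processes 55,000 tokens, matching $t \cdot N(N+1)/2$. A two-iteration window with a 500-token summary processes 23,000 tokens, and under this model the window grows linearly while full history keeps growing quadratically.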
### [BUG-02] Deterministic Replay Deadlock on Stochastic Capabilities
- **Location:** Outcome Evaluator/Revisor V3.3.1 §11.8 (Deterministic Idempotency)
- **The Mathematical Flaw:** The spec mandates that `TypedRevisionInstruction.idempotency_key` is calculated deterministically via a hash of inputs (`params_hash`, `preserve_constraints_hash`, etc.). If a module revision capability has stochastic properties—such as an agent running at `temperature = 0.7` for creative drafting or alternative prompt variant generation—a retry or rerun of that step will yield an identical key.
- **The Consequence:** The deterministic execution kernel will intercept the rerun, see the matching key, bypass model execution entirely, and return the cached output from the *first* run. This completely defeats the purpose of variant evaluation, creative retries, or probabilistic exploration loops within experiments.
- **The Fix:** Incorporate a dynamic `stochastic_nonce` or `sampling_salt` into the idempotency key formula if, and only if, the underlying `ModuleRevisionCapability` declares non-deterministic execution semantics.
- Formula Patch (V3.3.1 §11.8):
```ts
// Update the TypedRevisionInstruction key derivation formula:
interface StochasticIdempotencyExtension {
is_stochastic: boolean; // Derived from ModuleRevisionCapability.idempotency_semantics
sampling_salt?: string; // sha256(run_id + iteration_seq + retry_count)
}
// Invariant Key Generation Adjustment:
// If capability.idempotency_semantics === "non_idempotent_stochastic", then:
// TypedRevisionInstruction.idempotency_key = sha256(
// base_step_key + params_hash + StochasticIdempotencyExtension.sampling_salt
// )
```
- **Lint ID:** `validation.stochastic_capability_bypassed_by_deterministic_key`
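The following sketches the key derivation using Node's `crypto.createHash`. The formula is trimmed to `base_step_key` and `params_hash`, as in the patch comment. The delimiter between hashed fields is an addition, so distinct field tuples cannot concatenate to the same string. `idempotencyKey` and `samplingSalt` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hashes fields joined by a unit separator so ("ab","c") and ("a","bc") differ.
function sha256Hex(fields: Array<string | number>): string {
  return createHash("sha256").update(fields.join("\u001f")).digest("hex");
}

function samplingSalt(runId: string, iterationSeq: number, retryCount: number): string {
  return sha256Hex([runId, iterationSeq, retryCount]);
}

interface KeyInputs {
  base_step_key: string;
  params_hash: string;
  idempotency_semantics: string; // from ModuleRevisionCapability
  run_id: string;
  iteration_seq: number;
  retry_count: number;
}

function idempotencyKey(k: KeyInputs): string {
  if (k.idempotency_semantics !== "non_idempotent_stochastic") {
    // Deterministic capabilities keep the original replay-safe key.
    return sha256Hex([k.base_step_key, k.params_hash]);
  }
  // Stochastic capabilities salt the key per attempt, so a retry is re-executed.
  const salt = samplingSalt(k.run_id, k.iteration_seq, k.retry_count);
  return sha256Hex([k.base_step_key, k.params_hash, salt]);
}
```

A replay of the same attempt (same `retry_count`) still maps to the same key, so the cache continues to deduplicate genuine duplicates while new retries get fresh keys.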
### [BUG-03] Asynchronous Context Drift and Race Conditions
- **Location:** Task Forum + Run Board V1.0.1 §6.3 (`TaskRunContextPacket`) & Source Workspace V1.0.1
- **The Technical Flaw:** The `TaskRunContextPacket` budget is evaluated and capped at `max_tokens` when assembled by DOC24. However, the Task Forum operates on an **asynchronous, append-only paradigm** where multiple sub-agents can post updates or annotations mid-flight.
- **The Consequence:** A race condition occurs if sub-agents append new critical posts to the forum stream *after* DOC24 calculates the context packet budget but *before* the OpenClaw prompt compiler serializes the final text payload. The serialized prompt can then exceed the token ceiling the budget was computed against, causing silent text truncation or execution failure at the inference boundary.
- **The Fix:** Enforce an immutable **Context Sequence Lock** at packet initialization. The packet freezes the observable forum timeline at a specific transaction sequence ID; posts appended later stay in the append-only forum log but are excluded from this packet until the next tick.
- Schema Patch (Task Forum V1.0.1 §6.3):
```ts
interface ContextSequenceLock {
frozen_forum_sequence_id: number; // Highest auto-incremented post index at packet assembly
frozen_source_workspace_version: string;
concurrency_mutation_allowed: false; // Literal false: no mutation during in-flight compilation
}
// Field appended to the existing TaskRunContextPacket schema
// (TypeScript declaration merging; existing fields omitted):
interface TaskRunContextPacket {
runtime_sequence_lock: ContextSequenceLock;
}
```
- **Lint ID:** `validation.context_packet_mutation_during_flight`
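The next sketch models the lock. It assumes forum posts carry a monotonically increasing `seq` and a precomputed token count. `lockAt` and `serializeUnderLock` are hypothetical. The point is that serialization filters on the frozen sequence ID, so a post appended after budgeting cannot change what gets serialized.

```typescript
interface ForumPost {
  seq: number; // monotonically increasing per forum
  tokens: number; // precomputed token count
  body: string;
}

interface ContextSequenceLock {
  frozen_forum_sequence_id: number;
  frozen_source_workspace_version: string;
  concurrency_mutation_allowed: false;
}

// Freezes the timeline at the highest sequence ID visible at assembly time.
function lockAt(posts: ForumPost[], workspaceVersion: string): ContextSequenceLock {
  const highest = posts.reduce((max, p) => Math.max(max, p.seq), 0);
  return {
    frozen_forum_sequence_id: highest,
    frozen_source_workspace_version: workspaceVersion,
    concurrency_mutation_allowed: false,
  };
}

// Serializes only posts at or below the frozen ID; later appends wait for the next tick.
function serializeUnderLock(
  posts: ForumPost[],
  lock: ContextSequenceLock,
): { included: ForumPost[]; tokens: number } {
  const included = posts
    .filter((p) => p.seq <= lock.frozen_forum_sequence_id)
    .sort((a, b) => a.seq - b.seq);
  const tokens = included.reduce((sum, p) => sum + p.tokens, 0);
  return { included, tokens };
}
```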
### [BUG-04] Syntactic Taint Deadlock (The Formatting Trap)
- **Location:** Source Workspace V1.0.1 (`SourceRecord`) & Outcome Evaluator/Revisor V3.3.1 §15.10
- **The Logical Flaw:** The architecture states that any output of a module or sub-agent that consumed a tainted input automatically inherits the maximum taint class of its inputs.
- **The Consequence:** This creates a **syntactic deadlock**. If a purely mechanical, deterministic tool (e.g., a syntax normalizer, markdown formatter, or AST lint corrector) operates on an `external_untrusted` document to fix formatting errors, the output file remains permanently locked as `external_untrusted`. The Revisor cannot apply a simple layout cleanup to a document without forcing the whole task graph into a high-stakes quarantine, even if zero semantic text processing occurred.
- **The Fix:** Bifurcate taint propagation rules to distinguish between **Semantic Ingestion** (which propagates taint) and **Deterministic Syntactic Transformation** (which preserves the payload but allows safe mechanical tools to return success receipts without triggering structural quarantine).
- Logic Patch (V3.3.1 §15.10):
```ts
type TaintPropagationVector = "semantic_ingestion" | "syntactic_transformation";
interface TaintInheritanceRule {
propagation_vector: TaintPropagationVector;
tool_execution_class: "deterministic_mechanical" | "llm_generative";
// If tool_execution_class === "deterministic_mechanical"
// AND propagation_vector === "syntactic_transformation",
// then output document retains payload content but does NOT widen
// downstream execution quarantine scopes.
}
```
- **Lint ID:** `validation.syntactic_transformation_locked_by_semantic_taint`
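The bifurcated rule can be sketched as follows. Apart from `external_untrusted`, the taint classes are hypothetical placeholders for the Source Workspace taint ladder, and `propagate` is not a proposed API. The output still inherits the maximum input taint, since formatting does not sanitize content. The exemption only stops the step from widening quarantine.

```typescript
type TaintPropagationVector = "semantic_ingestion" | "syntactic_transformation";
type ToolExecutionClass = "deterministic_mechanical" | "llm_generative";

// Hypothetical ordered taint classes, least to most tainted; only
// "external_untrusted" is named in the source.
const TAINT_ORDER = ["internal_trusted", "user_supplied", "external_untrusted"] as const;
type TaintClass = (typeof TAINT_ORDER)[number];

function maxTaint(inputs: TaintClass[]): TaintClass {
  return inputs.reduce<TaintClass>(
    (acc, t) => (TAINT_ORDER.indexOf(t) > TAINT_ORDER.indexOf(acc) ? t : acc),
    "internal_trusted",
  );
}

function propagate(
  inputs: TaintClass[],
  vector: TaintPropagationVector,
  toolClass: ToolExecutionClass,
): { output_taint: TaintClass; may_widen_quarantine: boolean } {
  const exempt =
    toolClass === "deterministic_mechanical" && vector === "syntactic_transformation";
  // Content taint is always inherited; only quarantine widening is exempted.
  return { output_taint: maxTaint(inputs), may_widen_quarantine: !exempt };
}
```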
## 2. Memory System Evaluation: Coherence & Unification
### Does this structure accomplish the goals of unifying and improving the memory system?
**No, it handles logs seamlessly but fragments operational context.**
The current design establishes excellent observational ledgering via `EvaluationResultEnvelopes` and `TaskTraceSpans`. It acts as a world-class recorder. However, the mechanism for **re-injecting** those accumulated records back into future tasks is split across competing scopes:
1. Local edits update the `TaskBlueprint` (local operational scope).
2. `RunGuidanceItems` float inside `Feedback Delivery`.
3. `SourceRecords` track history inside the workspace substrate.
Because there is no unified **Memory Hydration Lifecycle Phase** before a task runs, these distinct memory tracks collide. A global pattern retrieved from the DOC72 Skill Graph can easily contradict a highly localized `RunGuidanceItem` inside the blueprint. Without a mathematical priority engine to resolve these collisions at runtime, the system will experience **instruction cross-contamination or agent amnesia**.
## 3. New Architectural Proposals (Net-New Primitives)
### Proposal 1: Pre-Execution Memory Hydration Engine
To unify the memory architecture, we must formalize how data is read and de-conflicted *before* the execution graph ticks. We introduce a strict, deterministic compilation gateway called the **Memory Hydration Phase** into Core R0.7.1.
```ts
interface MemoryPrecedenceHierarchy {
priority_order: Array<
| "local_task_blueprint_guidance"
| "matter_scope_policy"
| "source_workspace_warnings"
| "global_skill_graph_doc72"
>;
conflict_resolution_strategy: "mask_lower_precedence" | "merge_with_warning_flag" | "escalate_human";
}
interface HydratedMemoryState {
hydration_id: string;
task_id: string;
run_id: string;
resolved_precedence_policy: MemoryPrecedenceHierarchy;
// The consolidated, de-conflicted instruction set fed to the context assembler
active_instruction_vector_hash: string;
compiled_priors_manifest_ref: StorageRef;
hydrated_at: ISO8601;
schema_version: "1.0";
}
```
- **Execution Rule:** Before `RevisionDispatcherState` transitions from `ready` to `dispatching`, the kernel MUST execute the hydration phase, emit a `HydratedMemoryState` record, and seal it with an immutable hash. If any global pattern directly contradicts local blueprint guidance, the global pattern is explicitly masked out of the final context packet.
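The `mask_lower_precedence` strategy can be sketched as per-key winner selection. This assumes each memory item carries a `key` naming what it governs (for example, citation style), which the proposal does not define. `hydrate`, `MemoryItem`, and the key field are hypothetical.

```typescript
type MemorySource =
  | "local_task_blueprint_guidance"
  | "matter_scope_policy"
  | "source_workspace_warnings"
  | "global_skill_graph_doc72";

interface MemoryItem {
  source: MemorySource;
  key: string; // hypothetical: what the instruction governs, e.g. "citation_style"
  instruction: string;
}

// For each key, keep the item whose source ranks highest in priorityOrder
// (index 0 = highest); every other item on that key is masked.
function hydrate(
  items: MemoryItem[],
  priorityOrder: MemorySource[],
): { active: MemoryItem[]; masked: MemoryItem[] } {
  const rank = (s: MemorySource): number => {
    const i = priorityOrder.indexOf(s);
    return i === -1 ? Number.MAX_SAFE_INTEGER : i; // unlisted sources rank last
  };
  const winners = new Map<string, MemoryItem>();
  for (const item of items) {
    const current = winners.get(item.key);
    if (current === undefined || rank(item.source) < rank(current.source)) {
      winners.set(item.key, item);
    }
  }
  const active = Array.from(winners.values());
  const masked = items.filter((item) => winners.get(item.key) !== item);
  return { active, masked };
}
```

Returning the masked items, not just dropping them, gives `compiled_priors_manifest_ref` something concrete to record about what the hydration phase suppressed.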
### Proposal 2: Stochastic Idempotency Nonce Registry
To prevent the deterministic replay deadlocks outlined in Bug 2, we introduce a decentralized runtime nonce tracker to verify stochastic operations safely.
```ts
interface StochasticNonceRegistryEntry {
nonce_id: string;
run_id: string;
step_id: string;
iteration_seq: number;
retry_count: number;
stochastic_seed: number; // Dynamically generated random seed passed to OpenClaw
produced_artifact_hash_placeholder?: string;
created_at: ISO8601;
}
```
- **Execution Rule:** When a module with capability safety class `meaning_bearing` and stochastic properties executes, the Dispatcher injects the `stochastic_seed` into the execution environment. Each retry then receives a unique idempotency key and is genuinely re-executed rather than served from cache. Distinct seeds make distinct outputs likely but cannot guarantee distinct output hashes. The registry entry keeps every seed auditable.
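The registry can be sketched as follows, assuming an attempt is identified by `(run_id, step_id, iteration_seq, retry_count)`. Re-registering the same attempt returns the same entry, which keeps replays auditable. A new `retry_count` draws a fresh seed via Node's `crypto.randomInt`. The class name mirrors the entry type, everything else is hypothetical, and the optional `produced_artifact_hash_placeholder` field is omitted for brevity.

```typescript
import { randomInt } from "node:crypto";

interface StochasticNonceRegistryEntry {
  nonce_id: string;
  run_id: string;
  step_id: string;
  iteration_seq: number;
  retry_count: number;
  stochastic_seed: number;
  created_at: string; // ISO 8601
}

class StochasticNonceRegistry {
  private readonly entries = new Map<string, StochasticNonceRegistryEntry>();

  register(
    runId: string,
    stepId: string,
    iterationSeq: number,
    retryCount: number,
  ): StochasticNonceRegistryEntry {
    // JSON encoding keeps the attempt tuple unambiguous even if IDs contain separators.
    const nonceId = JSON.stringify([runId, stepId, iterationSeq, retryCount]);
    const existing = this.entries.get(nonceId);
    if (existing !== undefined) return existing; // replay of the same attempt
    const entry: StochasticNonceRegistryEntry = {
      nonce_id: nonceId,
      run_id: runId,
      step_id: stepId,
      iteration_seq: iterationSeq,
      retry_count: retryCount,
      stochastic_seed: randomInt(2 ** 31), // fresh seed in [0, 2^31)
      created_at: new Date().toISOString(),
    };
    this.entries.set(nonceId, entry);
    return entry;
  }
}
```

Uniqueness comes from `nonce_id`, not from the seed. Two retries could in principle draw the same seed, so the idempotency key should be salted with the attempt identity rather than with the seed alone.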