ELNOR REPO READER TEXT MIRROR
Original path: Current Specs/DOC23/DOC23 Addenda B/DOC23_ADDB_OUTCOME_EVALUATOR_REVISOR_V3_3_1.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z

---

# DOC23 Addenda B Addendum — Outcome Evaluator and Revisor V3.3.1

**Subject:** Specification for the Outcome Evaluator and Revisor subsystem of the DOC23 task system modular architecture. Defines the Outcome Compiler, Revision Compiler, Evaluator and Revisor modules, Revision Dispatcher, `revision_in` port contract, `claims_in` port contract, `evaluation_result_out` port contract, candidate artifact versions, plan assurance policy, taint and policy model, pattern learning, learning-mode configuration, and feedback pipeline.

**Status:** V3.3.1 specification — clean replacement for V3.3; reference/topology cleanup only; no schema or behavior changes

**Version:** 3.3.1

**Prepared:** 2026-05-15 (V3); revised 2026-05-16 (V3.1 audit-complete); revised 2026-05-17 (V3.2 coordination patch); revised 2026-05-17 (V3.3 Pattern C wiring crystallization); revised 2026-05-24 (V3.3.1 family reference/topology cleanup)

**V3.3.1 changes from V3.3:** No schema, route, port, runtime, or behavioral changes. This full replacement copy updates companion-document references to the current clean Addenda B family set: Core R0.7.1, Common Contracts V1.1.1, Source Workspace V1.0.1, Task Forum + Run Board V1.0.1, and Feedback Delivery V1.0.1. V3.3's Pattern C `evaluation_result_out` port addition remains the last substantive change.

**V3.3 changes from V3.2 (Pattern C wiring crystallization):** V3.3 supersedes V3.2 with one surgical addition addressing the gap that the V3 FINAL coordination architecture documented Pattern C (ad-hoc Judge attachment) as a pattern but did not crystallize the port-level wiring.
V3.2 introduced the conceptual pattern through obligation rows (OBL-XDOC-OUTCOME-COMPLIANCE-01, OBL-XDOC-DOC20-EVAL-UI-01) but a coding agent implementing Pattern C would have had to interpret which Evaluator output port carries the payload. V3.3 adds:

1. **§5.18 — Evaluator `evaluation_result_out` port contract.** Explicit output port emitting `EvaluationArtifactEnvelope` (Common Contracts §3 wrapped per coordination V3 §2.3). This is the canonical port Pattern C wires Judge downstream of. Pattern A/B internal Evaluator activations also emit on this port; Experiment captures internally.
2. **§29.13.X — OBL-XDOC-JUDGE-EVALUATOR-OUTPUT-IN-01.** New cross-doc obligation row tracking Addenda A's responsibility to specify the corresponding input port on `step.judge` (Judge's `evaluation_result_in` or equivalent — naming at Addenda A's discretion).
3. **Frontmatter purpose statement** updated to reference `evaluation_result_out` port contract alongside `revision_in` and `claims_in`.

Companion change in DOC23 Evaluation Common Contracts V1.1.1 §3.7 documents the envelope's Pattern C consumption semantics (target_evaluation_chain_id linkage between upstream Evaluator and downstream Judge envelopes).

Nothing else changes from V3.2. The V3.2 additions (§5.1.1 Criterion, §5.17 claims_in, §6.16 learning_mode, §13.1 cross_model_applicability, §13.3 model_class, §0.4.24 enums, §28.1 matrix, §29.13 V3.2 coordination table) are all preserved. The V3.1 audit fixes and the 39 patches are preserved.

**Migration:** V3.3 is read-compatible with V3.2; V3.2 consumers continue to operate. V3.2 may be retired once V3.3 is reviewed.

**V3.2 changes from V3.1 (Addenda A ↔ Addenda B coordination V3 FINAL absorption):** V3.2 supersedes V3.1 with three surgical additions absorbing the locked coordination architecture from the Addenda A ↔ Addenda B V3 FINAL coordination proposal.
No behavior of V3.1 is removed; the architecture, the 39 patches, the canonical enum inventory, the state machines, the cross-doc obligations, and all V3.1 audit-fix additions are preserved. V3.2 adds three specific surfaces that the coordination architecture requires:

1. **`EvaluationOutcomeDefinition.criteria: Criterion[]` as canonical sub-structure (§5.1).** V3.1 referenced `EvaluationFinding.target_criterion_id` and the audit V3.1 §0.3.7 noted the canonical schema name but did not include `criteria` as a first-class field. V3.2 makes `Criterion` an explicit sub-structure with its own schema; `Criterion` is the public sub-contract Addenda A's Judge module consumes for `outcome_compliance_scoring` (coordination V3 §2.4).
2. **`claims_in` port contract on `step.evaluator` (§5.17).** V3.2 adds the symmetric port to V3.1's `revision_in` contract. The Evaluator consumes `ClaimSetBundle` / `ExtractedEvaluationUnitBundle` from Addenda A's `step.claim_extractor` upstream module via this port (coordination V3 §2.12).
3. **Learning-mode capability — `RevisorConfig.learning_mode` (§6.14, §6.16), `PatternContextSignature.model_class` (§13.3), and `Pattern.cross_model_applicability` (§13.1).** V3.2 adds Phase 1 support for the cheap-LLM learning generator mode and ad-hoc Judge attachment (coordination V3 §2.10, §2.9 Pattern C).

Supporting additions:

- §0.3.7 amended to note `criteria` is the canonical sub-structure
- §0.4 enum inventory extended with `LearningMode`, `ModelClass`, `CrossModelApplicability` enums
- §28.1 landing matrix [C18], [D11], [I9a] rows updated to record V3.2 absorption
- §29 cross-doc obligations extended with the V3 FINAL OP-A row references

Nothing else changes from V3.1.
The coordination architecture V3 FINAL defines additional shared primitives (`EvaluationResultEnvelope`, `EvaluationLearningSignalEnvelope`, `ArtifactScopeRef`, `TextAnchor`, `StructuredAnchor`, `VariantEvaluationLineage`, `CriterionLineage`, and slice schemas) that live in the new `DOC23 Evaluation Common Contracts` document (per coordination V3 §3.2). V3.2 references those primitives by name; their schemas live in Common Contracts and absorb into DOC23 R3.2 when that compiles.

**Migration:** V3.2 is read-compatible with V3.1; V3.1 consumers that don't yet emit on the new surfaces continue to operate. V3.1 may be retired once V3.2 is reviewed.

**V3.1 changes from V3 (audit-discovered gaps closed inline):** V3.1 supersedes V3 with no behavioral changes — V3.1 closes substantive gaps where V3 either referenced primitives without defining them, omitted card items that were ACCEPTED in the adjudication card but didn't land in V3 body, or under-specified named operations. The architecture, the 39 patches, the state machines, and all canonical enums are unchanged from V3. Substantive additions:

- **§5.16 EvaluationSnapshot** (closes [E6]) — snapshot schema with artifact content hashes, source workspace state ref, graph topology hash; consumed by §11.20 live-edit check and §6.7 sufficiency protocol.
- **§7.13 ArtifactMutationPrecondition** (closes [E7]) — named precondition type with expected_base_version_id, expected_content_hash, policy_decision_required, taint_class_acceptable, on_precondition_failure; consumed by §11 dispatch.
- **§7.4.5 Per-component taint inheritance on RevisionIntelligencePacket** (closes [F1a]) — explicit taint label per RIP component.
- **§5.5.4 Progress signal classification** (closes [C15]) — `still_failing_same_reason` vs `failing_for_new_reason` distinction drives Revisor strategy selection.
- **§12.6 ArtifactDiff and SemanticNeighborhood extraction** (closes [H4]) — named Compiler slicing operations with taint, budget, and provenance tagging.
- **§13.7 Plan template versioning** (closes [D10]) — full 5-step retrieval ordering rule.
- **§14.9 Forum role-scoped critique and dynamic participation** (closes [J3], [J6]) — specialists have scoped critique authority; Task Agent participates only when graph design is in question.
- **§15.6 Eval suite slicing** (closes [K6], [K10]) — by pattern_id and goal_kind, not hardcoded work type.
- **§15.8 Sub-agent advice quality metrics** (closes [K7]) — four named metrics including advice-led-to-regression rate.
- **§16.6 Matter-specific governance policies** (closes [G6]) — per-matter policy scoping for M&A, litigation, regulatory, internal advisory.
- **§21.8 "from memory" vs "adapted from memory" UI distinction** (closes [I12]) — pattern application badge.
- **§0.3 Schema naming canonical:** `EvaluationOutcomeDefinition` (closes [C18]). V3 used `OutcomeDefinition`; V3.1 standardizes on the canonical name per the adjudication card.
- **§0.3 Agent/procedure/capability normalization table** (closes [A4t]).
- **§0.3 Cross-doc terminology table** (closes [M1]).
- **§11 explicit kernel ownership statement** (closes [A4s]) — §11 is the canonical RevisionRuntimeKernel; feature sections consume from it.
- **§25 fixture additions** (close [N1], [N2], [N3], [N7], [N8], [N10]) — F-QUAL-02, F-ASSURE-03, F-GOAL-01, F-PRESERVE-01, F-LOOP-01, F-DELIVERY-01.
- **§26.1 Phase 2 additions** (close [P1], [P2], [P5], [P6], [P8], [P11]) — DecisionTraceRecord universal adoption, eval suite full domain coverage, DOC17 prompt artifact integration, cross-LLM red-team automation, per-outcome exact cost attribution, thermal-aware scheduling.
- **§28.1 landing matrix expanded** — all 247 card items now recorded with explicit status (accepted, accepted-with-modification, superseded, deferred, rejected).
Rename: every occurrence of `OutcomeDefinition` (as the schema name for an evaluable outcome) is changed to `EvaluationOutcomeDefinition`. The UI label "Outcome" and the runtime result type `OutcomeEvaluationResult` are unchanged. DOC24's runtime taxonomy `OutcomeClass` is unchanged.

The 39 patches, the canonical enum inventory (§0.4), the five-stage pipeline (§1.1), the eleven Revisor rules (§3.9 / §6.15), the failure / strategy / target taxonomies, and all cross-doc obligations (§29) are unchanged from V3.

**Source documents:**

- DOC23 Task System Modular Architecture R3.1
- DOC23 Addenda A (Task Optimization) R4.1 V2
- DOC23 Addenda B (Task Intelligence, Memory, Observability) R0.6.4
- DOC23 Addenda B Addendum V3 Adjudication Card V3.1
- DOC23 Addenda B Addendum V3 Canonicalization Patch V2

**Cross-doc dependencies:**

- **DOC11** (OpenClaw gateway) — runtime execution truth, ACP integration
- **DOC12** (multi-agent room system) — plan review forum
- **DOC15** (Cognitive Infrastructure Layer) — CIL authority, context packets
- **DOC20** (UI surfaces) — Evaluation Result Card, Revision Result Card, Teach-from-feedback
- **DOC24** (operations layer) — capability registry, runtime injection
- **DOC25** (source ingestion) — IngestionResult, source verification
- **DOC72** (intelligence substrate) — Pattern primitive, goal context
- **DOC73** (positronic brain enhancement) — receipt envelope, PBEOperationReceiptLite
- **EC Core** — sole durable writer, retention rows, PolicyDecision gate
- **BDSM / DOC8** — utility learning signals
- **DOC23 Addenda A** — Judge module, Claim Extractor, prompt optimization

**Conventions:**

- Normative rules use MUST / MUST NOT / SHOULD / MAY in the RFC 2119 sense.
- TypeScript-like notation for schemas. `?` denotes optional. `|` denotes union. `Array<T>` and `T[]` are equivalent.
- Cross-references use §N.M format within this spec; [X] format for items in the Adjudication Card V3.1; [P-N] format for patches in the Canonicalization Patch V2.
- "The system" refers to ELNOR; "the addendum" refers to this V3 specification.

---

## §0.1 Scope

This addendum specifies the Outcome Evaluator and Revisor subsystem.

In scope:

- The Outcome Compiler that converts natural-language outcomes into typed evaluation plans
- The `step.evaluator` module type and its runtime contract
- The Revision Compiler that produces typed revision strategies
- The `step.revisor` module type and its runtime contract
- The Revision Dispatcher runtime service
- The `revision_in` port contract for revision-capable modules
- Candidate artifact versions and rollback semantics
- Plan assurance policy and adversarial linting
- The taint and policy model for adversarial inputs
- Pattern learning, feedback pipeline, and signal classes
- The five canonical schemas: CompiledEvaluationPlan, CompiledRevisionStrategy, RevisionPlan, OutcomeEvaluationResult, RevisionOperationReceipt
- UI surfaces specific to evaluation and revision
- Cross-doc obligations on DOC11, DOC12, DOC15, DOC20, DOC24, DOC25, DOC72, DOC73, EC, BDSM, and Addenda A

Out of scope (handled by other docs):

- General DOC23 module activation, ports, cables, execution engine (DOC23 R3.1)
- Task Agent design (DOC23 Addenda B R0.6.4)
- Source verification mechanics (DOC25, DOC73)
- BDSM utility-bundle compilation (DOC8)
- Memory promotion governance (DOC72, EC)
- Prompt Lab / DSPy optimization mechanics (DOC17)
- ACP session lifecycle for `step.coding` (DOC11)

## §0.2 Phasing

Phase 1 (this spec): full Evaluator/Revisor/Dispatcher pipeline, candidate versions, taint model, pattern capture, quality programs with metric denominators, sub-agent reputation scoring, dry-run mode, hardware-aware degradation, four-domain eval suite plus domain-neutral smoke set, cross-doc obligations registered in OP-A.
Phase 2 (deferred per §26 Open Questions): full BDSM utility-bundle compilation, DOC24 hot-path compiled guidance injection, generic DOC23 AsyncModuleRun amendment, full thermal-aware scheduling, full sub-agent reputation routing logic, universal DecisionTraceRecord adoption, software / research / marketing / general / process-trace eval domains, per-outcome exact cost attribution, cross-LLM red-team automation.

---

## §0.3 Canonical Naming Table

V3 uses canonical names throughout. Implementations and prose must not introduce synonyms.

### §0.3.1 Module types and services

| Concept | Persisted module type | Display label | Implementation class | Internal compiler | Runtime service |
|---|---|---|---|---|---|
| Outcome Evaluator | `step.evaluator` | Evaluator | EvaluatorModule | OutcomeCompiler | (module activation) |
| Revisor | `step.revisor` | Revisor (or "Revision Planner" in prose) | RevisionPlanner | RevisionCompiler | RevisionExecutionService |
| Revision Dispatcher | n/a (derived runtime service) | Revision Dispatcher | RevisionExecutionService | n/a | yes (canvas projection per §11.1) |
| Feedback Interpreter | n/a (runtime service) | Feedback Interpreter | FeedbackInterpreter | n/a | yes |

Deprecated aliases:

- `step.revision_planner` — optional migration alias for `step.revisor`; deprecated and not preferred
- "Revision Bus" — renamed to Revision Dispatcher throughout

### §0.3.2 Artifact and schema name disambiguation

| Concept | Canonical name | Notes |
|---|---|---|
| User-facing outcome label | "Outcome" | UI only |
| Outcome schema record | `EvaluationOutcomeDefinition` | bare `Outcome` not used in schemas |
| Outcome verdict result | `OutcomeEvaluationResult` | |
| DOC24 runtime outcome class | `OutcomeClass` | distinct from this addendum's outcome concept |
| Revision plan definition | `RevisionPlan` | "intended repair" |
| Revision execution record | `RevisionExecutionRecord` | "what actually happened" |
| Module-level receipt | `RevisionExecutionReceipt` (ModuleRevisionResultPayload) | local payload inside envelope |
| Operation envelope receipt | `RevisionOperationReceipt extends PBEOperationReceiptLite` | per [P-8] |
| Plan summary | `RevisionRunSummary` | aggregate |
| Hard call | `HardRevisionCall` | |
| Candidate version | `CandidateArtifactVersion` | inside SourceWorkspace, not separate primitive |
| Safety envelope | `RevisionSafetyEnvelope` | |
| Compiled evaluation plan | `CompiledEvaluationPlan` | preliminary or runtime-resolved phase |
| Compiled revision strategy | `CompiledRevisionStrategy` | |

### §0.3.3 Agent and capability normalization

| Thing | Has agent identity? | DOC24 capability? | DOC23 module? | Runtime service? | Advisory only? |
|---|---|---|---|---|---|
| `step.evaluator` | maybe | yes | yes | no | no |
| Outcome Compiler | no / possibly | no | no | service | no |
| `step.revisor` | maybe | yes | yes | no | no |
| Revision Compiler | no / possibly | no | no | service | no |
| Revision Dispatcher | no | no | no | yes (projection) | no |
| Specialist subevaluator (built-in) | yes | yes | yes (sub-module) | no | execution |
| Advisory sub-agent (built-in or user) | yes | sometimes | no | no | yes |
| Execution sub-agent (`revision_in`) | yes | yes | yes (module-backed) | no | no |
| Feedback Interpreter | no / possibly | no | no | service | no |
| Task Agent | yes | yes (advisory) | maybe separate | no | advisor |
| Judge module (Addenda A) | module/agent | yes | yes | no | no |
| Claim Extractor (Addenda A) | module/agent | yes | yes | no | no |

### §0.3.4 vs DOC73 envelope vocabulary

| DOC73 term | This addendum |
|---|---|
| Receipt | `RevisionOperationReceipt extends PBEOperationReceiptLite` |
| Operation | RevisionOperationKind (a value of operation_kind) |
| Local payload | `ModuleRevisionResultPayload` (inside the envelope) |
| Section ref | inherited from PBEOperationReceiptLite |
| EC sequence number | inherited from PBEOperationReceiptLite |

### §0.3.5 Agent / procedure / capability normalization (per [A4t])

The following table disambiguates what is an agent (has identity, may be reputation-scored), what is a DOC24 capability (registered, version-bound), what is a DOC23 module (has persisted `step.kind`), and what is a runtime service (in-process, not a stored graph node):

| Thing | Agent identity? | DOC24 capability? | DOC23 module? | Runtime service? | Advisory only? |
|---|---|---|---|---|---|
| `step.evaluator` | maybe | yes | yes | no | no |
| Outcome Compiler | no/possibly | no | no | service | no |
| `step.revisor` | maybe | yes | yes | no | no |
| Revision Compiler | no/possibly | no | no | service | no |
| Revision Dispatcher | no | no | no | yes (projection per §11.1) | no |
| Specialist subevaluator | yes | yes | yes (sub-module) | no | execution |
| Advisory sub-agent | yes | sometimes | no | no | yes |
| Feedback Interpreter | no | no | no | service | no |

"Agent identity" means the thing has a stable identifier under which reputation accrues. "DOC24 capability" means the thing is registered in the capability registry with `capability_id` and `capability_version`. "Runtime service" means in-process executable that is not a stored graph node. "Advisory only" means the thing cannot directly write durable state.
### §0.3.6 Cross-doc terminology table (per [M1])

The canonical concept-to-name mapping across documents that consume primitives from this addendum:

| Concept | Canonical name | DOC23 type | DOC24 capability | EC artifact kind | DOC73 envelope |
|---|---|---|---|---|---|
| Evaluator module | `step.evaluator` | yes | yes | EvaluatorModuleRecord | n/a |
| Revisor module | `step.revisor` | yes | yes | RevisorModuleRecord | n/a |
| Outcome Compiler | OutcomeCompiler | service | no | n/a | n/a |
| Revision Compiler | RevisionCompiler | service | no | n/a | n/a |
| Revision Dispatcher | RevisionDispatcher | service+projection | no | n/a | n/a |
| `revision_in` port | revision_in | port type | n/a | n/a | n/a |
| Compiled evaluation plan | CompiledEvaluationPlan | n/a | n/a | yes | n/a |
| Compiled revision strategy | CompiledRevisionStrategy | n/a | n/a | yes | n/a |
| Operation receipt | RevisionOperationReceipt | n/a | n/a | yes | yes (extends PBEOperationReceiptLite) |
| Module receipt | RevisionExecutionReceipt | n/a | n/a | yes | n/a |
| Hard call | HardRevisionCall | n/a | n/a | yes | n/a |
| Hard call resolution | HardCallResolution | n/a | n/a | yes | n/a |
| Pattern | Pattern (3 kinds) | n/a | yes | yes (DOC72 graph) | n/a |
| Advisory sub-agent | AdvisorySubAgent | n/a | yes (advisory) | yes (profile) | n/a |
| Candidate version | CandidateArtifactVersion | n/a | n/a | yes | n/a |
| Evaluation snapshot | EvaluationSnapshot | n/a | n/a | yes | n/a |
| Mutation precondition | ArtifactMutationPrecondition | n/a | n/a | n/a (precondition record) | n/a |
| Policy decision | PolicyDecision | n/a | n/a | yes (EC-owned) | yes |
| Pattern performance slice | PatternPerformanceSlice | n/a | n/a | yes (DOC72) | n/a |
| Sub-agent reputation | SubAgentReputation | n/a | n/a | yes (DOC72) | n/a |
| Feedback event | HumanOutcomeFeedbackEvent | n/a | n/a | yes | n/a |
| Direct instruction candidate | DirectInstructionCandidate | n/a | n/a | yes | n/a |
| Semantic changelog | SemanticChangelog | n/a | n/a | yes | n/a |

### §0.3.7 EvaluationOutcomeDefinition naming (per [C18])

The schema name for an evaluable outcome is `EvaluationOutcomeDefinition`. The user-facing UI label is **Outcome**. The runtime result type is `OutcomeEvaluationResult`. DOC24's runtime taxonomy keeps `OutcomeClass`. V3 of this addendum used `OutcomeDefinition`; V3.1 standardizes on the canonical schema name.

The disambiguation:

- `EvaluationOutcomeDefinition` — the persisted schema (what the user defined to be evaluated)
- "Outcome" — the UI label rendered to the user (no underscores, no PascalCase)
- `OutcomeEvaluationResult` — the runtime record emitted by an evaluator (the verdict)
- `OutcomeClass` — DOC24's routing taxonomy (orthogonal; not changed)

**V3.2: `Criterion` is the canonical sub-structure on `EvaluationOutcomeDefinition`.** `EvaluationOutcomeDefinition.criteria: Criterion[]` is the public sub-contract that Addenda A's Judge module consumes for `outcome_compliance_scoring` (Addenda A ↔ Addenda B coordination V3 §2.4). `Criterion` is versioned independently of `EvaluationOutcomeDefinition`; bumping the Criterion schema requires both addenda to coordinate. Other fields on `EvaluationOutcomeDefinition` (assurance_basis bindings, evaluation_method, sufficiency_protocol, goal_refs, novelty assessment, etc.) remain Addenda B internal. The full `Criterion` schema is in §5.1.

---

## §0.4 Canonical Enum Inventory

Every enum used in V3 is declared here. Implementations may not introduce new values or substitute alternates without spec amendment. Each enum is owned by a single section; consumers reference but do not redefine.
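One way a consumer might honor the closed-inventory rule is to validate incoming values against the declared list instead of accepting arbitrary strings. The sketch below uses the `OutcomeLifecycleState` values from §0.4.1; the constant name and the guard function are illustrative assumptions, not names defined by this spec.

```typescript
// Values copied from §0.4.1. The constant and function names are hypothetical.
const OUTCOME_LIFECYCLE_STATES = [
  "active",
  "inactive",
  "deprecated",
  "superseded",
] as const;

// Derive the union type from the list so the two cannot drift apart.
type OutcomeLifecycleState = (typeof OUTCOME_LIFECYCLE_STATES)[number];

// Type guard: accepts only values declared in the inventory.
function isOutcomeLifecycleState(value: unknown): value is OutcomeLifecycleState {
  return (
    typeof value === "string" &&
    (OUTCOME_LIFECYCLE_STATES as readonly string[]).indexOf(value) !== -1
  );
}
```

A consumer would call the guard at its input boundary and treat a `false` result as a failure to report, not as a new enum value to adopt.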
### §0.4.1 Outcome state enums

```
OutcomeEvaluationState =          // result / disposition
  | "pending"
  | "pending_dependency"
  | "evaluating"
  | "satisfied"
  | "needs_revision"
  | "needs_information"
  | "needs_verification"
  | "needs_human_judgment"
  | "unable_to_evaluate"
  | "blocked_by_policy"
  | "regressed"
  | "unrecoverable"
  | "dirty"
  | "superseded"
  | "upstream_failure"

OutcomeLifecycleState =           // outcome existence
  | "active"
  | "inactive"
  | "deprecated"
  | "superseded"
```

Owner: §5.1 OutcomeRuntimeState

Consumers: Compiler (§4), Evaluator module (§5), Revisor (§6), Loop Controller (DOC23 R3.1), UI (§21)

### §0.4.2 Dispatcher state enums

```
RevisionDispatcherState =         // canonical runtime state
  | "idle"
  | "validating"
  | "ready"
  | "dispatching"
  | "waiting_human_gate"
  | "waiting_hard_call"
  | "waiting_dependency"
  | "revalidating"
  | "completed"
  | "escalated"
  | "aborted"
  | "rolled_back"

RevisionPlanStatus =              // plan definition lifecycle
  | "draft"
  | "proposed"
  | "approved_for_dispatch"
  | "dispatching"
  | "partially_completed"
  | "completed"
  | "failed"
  | "superseded"
  | "cancelled"

RevisionFailureEventKind =        // typed events, NOT states
  | "failed_validation"
  | "step_failed"
  | "dependency_timeout"
  | "revalidation_failed"
  | "workspace_unavailable"
  | "version_conflict"
  | "preservation_violation"
  | "policy_blocked"
  | "budget_exhausted"
  | "preempted"

RevisionUIStatus =                // user-facing labels
  | "Ready to review"
  | "In progress"
  | "Awaiting your input"
  | "Blocked"
  | "Completed"
  | "Escalated"
  | "Cancelled"
  | "Reverted"
```

Owner: §11.2

Mapping rule: `(RevisionDispatcherState, latest RevisionFailureEventKind, RevisionPlanStatus) → RevisionUIStatus`. The mapping table is defined in §11.2.4. UI does not introduce its own state labels.

`RevisionDispatcherProjection.projection_state` MUST equal current `RevisionDispatcherState` (per §11.1). The projection is a derived read-model, not an independent state machine.
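The projection rule above lends itself to a direct equality check. The following is a minimal sketch under stated assumptions: only `projection_state` is taken from this section, while `ProjectionCheck` and `checkProjectionInSync` are hypothetical names introduced for illustration, and any other fields of `RevisionDispatcherProjection` are omitted.

```typescript
type RevisionDispatcherState =
  | "idle" | "validating" | "ready" | "dispatching"
  | "waiting_human_gate" | "waiting_hard_call" | "waiting_dependency"
  | "revalidating" | "completed" | "escalated" | "aborted" | "rolled_back";

// Only projection_state is named in §0.4.2; other projection fields are left out.
interface RevisionDispatcherProjection {
  projection_state: RevisionDispatcherState;
}

// Hypothetical result shape for the invariant check.
type ProjectionCheck =
  | { ok: true }
  | { ok: false; expected: RevisionDispatcherState; actual: RevisionDispatcherState };

// Compares the derived read-model against the canonical runtime state.
function checkProjectionInSync(
  current: RevisionDispatcherState,
  projection: RevisionDispatcherProjection
): ProjectionCheck {
  if (projection.projection_state === current) {
    return { ok: true };
  }
  return { ok: false, expected: current, actual: projection.projection_state };
}
```

Returning a structured mismatch instead of a bare boolean keeps both states available to whatever typed event the implementation surfaces for the violation.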
### §0.4.3 Assurance and limitation enums

```
AssuranceBasis =                  // why a verdict is trustworthy
  | "deterministic_check"
  | "structured_validation"
  | "source_verified_external"
  | "claim_grounded_internal"
  | "trace_verified"
  | "coverage_mapping"
  | "comparative_judge"
  | "historical_baseline"
  | "statistical_threshold"
  | "llm_expert_judgment"
  | "specialist_panel_judgment"
  | "policy_backed"
  | "human_confirmed_in_run"
  | "mixed"

EvaluationLimitationKind =        // why no trustworthy verdict exists
  | "insufficient_evidence"
  | "human_judgment_needed"
  | "missing_capability"
  | "source_unavailable"
  | "policy_blocked"
  | "stale_evidence"
  | "unable_to_ground_claim"
```

Owner: §5.4

Consumers: Outcome Compiler (§4), Evaluator (§5), Hard Call detection (§6.5), direct fix gating (§10)

Hard Call detection triggers on `EvaluationLimitationKind.human_judgment_needed`, NOT on an AssuranceBasis value.

### §0.4.4 Finding state enum

```
FindingState =
  | "proposed"
  | "active"
  | "contested"
  | "resolved"
  | "superseded_by_revision"
  | "superseded_by_source_change"
  | "user_approved"
  | "tool_verified"
  | "human_verified"
  | "rejected_by_user"
  | "dismissed"
  | "unrecoverable"
```

Owner: §5.7

### §0.4.5 Artifact version state enum

```
ArtifactVersionState =
  | "current"      // the active version
  | "candidate"    // produced by revision, awaiting acceptance
  | "accepted"     // candidate accepted, now current
  | "rejected"     // candidate rejected
  | "superseded"   // older version replaced by newer current
  | "reverted"     // version was current, rolled back by user
```

Owner: §11.11

### §0.4.6 Plan step kind enum

```
RevisionPlanStepKind =
  | "module_revision"
  | "direct_fix"
  | "revalidate"
  | "information_request"
  | "verification_request"
  | "human_judgment_request"
  | "fork_from_checkpoint"
  | "wait"
  | "no_op_record"
```

Owner: §7.5

Implementation: discriminated union (§7.5.2)

### §0.4.7 Revision operation kinds

```
RevisionOperationKind =
  | "revision_plan_created"
  | "revision_step_dispatched"
  | "module_revision_result"
  | "direct_fix_applied"
  | "candidate_version_created"
  | "candidate_version_accepted"
  | "candidate_version_rejected"
  | "revalidation_requested"
  | "human_gate_decision"
  | "rollback_apply"
  | "escalation_created"
  | "hard_call_resolved"
  | "taint_clearance_recorded"
```

Owner: §11.6

### §0.4.8 Module receipt status enums

```
ReceiptLifecycle =
  | "received"
  | "accepted"
  | "rejected"
  | "completed"

ExecutionStatus =
  | "completed"
  | "partially_completed"   // requires addressed_findings + unresolved_findings arrays
  | "could_not_fix"
  | "rejected_capability"
  | "version_conflict"
  | "needs_more_information"
  | "failed_runtime"
  | "receipt_recovery_required"
  | "candidate_orphan_repair_required"
```

Owner: §9.4

### §0.4.9 Failure, strategy, target taxonomies

```
FailureKind =
  | "content_gap"
  | "source_gap"
  | "source_misuse"
  | "reasoning_error"
  | "structure_error"
  | "style_error"
  | "format_error"
  | "strategic_judgment_error"
  | "process_error"
  | "context_error"
  | "delivery_error"
  | "graph_design_gap"

RepairStrategyKind =
  | "preserve_and_modify"
  | "regenerate"
  | "focus_on"
  | "apply_updates"
  | "gather_more_information"
  | "verify_then_revise"
  | "restructure"
  | "style_pass"
  | "format_pass"
  | "fork_from_checkpoint"
  | "direct_fix"
  | "human_judgment"
  | "graph_patch_proposal"

RepairTarget =
  | "same_producer_module"
  | "upstream_source_module"
  | "drafting_or_revision_module"
  | "format_or_output_module"
  | "verification_module"
  | "human_review_gate"
  | "task_agent_process_gap"
  | "full_rerun_or_fork"
```

Owner: §6

Note: `RepairStrategyKind` is NOT a `ModuleRevisionCapability` name. Strategy-to-capability mapping is many-to-many; see §9.2 RepairStrategyCapabilityMap.
### §0.4.10 Taint enums

```
TaintClass =
  | "system_trusted"
  | "user_trusted_bounded"
  | "user_advisory"
  | "internal_corpus_trusted"
  | "external_authority_trusted"
  | "external_untrusted"
  | "adversarial_known"
  | "unclassified"

TaintClearanceMethod =
  | "sanitization_node"
  | "user_explicit_review"
  | "human_verifier"
  | "policy_decision"
```

Owner: §15.10–§15.12

### §0.4.11 Hard call enum

```
HardRevisionCallKind =
  | "strategic_legal_judgment"
  | "material_fact_dispute"
  | "source_authority_conflict"
  | "privilege_or_confidentiality_risk"
  | "client_position_change"
  | "external_side_effect_required"
  | "capability_gap"
  | "risk_tradeoff_no_dominant_option"
  | "human_preference_needed"
```

Owner: §6.5, §7.9

### §0.4.12 Plan assurance enums

```
PlanAssuranceMode =
  | "deterministic_lint"
  | "semantic_lint"
  | "advisory_verifier"
  | "forum_review"
  | "human_gate"

PlanAssuranceTriggerReason =
  | "always"
  | "high_cost"
  | "low_compiler_confidence"
  | "risky_strategy"
  | "privileged_artifact"
  | "external_side_effect"
  | "human_requested"
  | "novelty_above_threshold"
```

Owner: §11.4

### §0.4.13 Direct fix enums

```
DirectFixAllowedClass =
  | "formatting_only"
  | "typographical_correction"
  | "punctuation_correction"
  | "metadata_label_update"
  | "citation_format_only"
  | "broken_link_repair_no_text_change"
  | "whitespace_or_heading_style"

DirectFixForbiddenClass =
  | "citation_substitution"
  | "legal_authority_substitution"
  | "deadline_or_date_change"
  | "party_name_change"
  | "claim_or_argument_rewrite"
  | "factual_assertion_change"
  | "strategic_framing_change"
```

Owner: §10

### §0.4.14 Side effect enums

```
RevisionSideEffectClass =
  | "none"
  | "internal_artifact_write"
  | "external_message_send"
  | "calendar_write"
  | "webhook_post"
  | "filing_or_submission"
  | "memory_write"

ReplayPolicy =
  | "safe_to_replay"
  | "idempotent_with_key"
  | "never_replay"
```

Owner: §11.18

### §0.4.15 Gate skippability enum

```
GateSkippability =
  | "optional"                        // skip allowed, no consequence
  | "skip_aborts_plan"                // skip terminates the plan
  | "skip_requires_risk_acceptance"   // skip creates explicit risk receipt
  | "not_skippable"                   // UI does not show Skip
```

Owner: §21

### §0.4.16 Workspace failure enum

```
WorkspaceWriteFailureKind =
  | "no_artifact_written"
  | "artifact_written_receipt_failed"
  | "candidate_written_index_failed"
  | "partial_artifact_written"
  | "diff_written_artifact_missing"
```

Owner: §11.17

### §0.4.17 Estimator confidence enum

```
EstimatorConfidence =
  | "calibrated"
  | "uncalibrated"
  | "experimental"
```

Owner: §15

### §0.4.18 Quality actionability enum

```
QualityActionability =
  | "metric_only"   // observed, never blocks
  | "warn"          // surfaces warning, doesn't block
  | "block"         // blocks execution
  | "escalate"      // routes to human / Task Agent
```

Owner: §15

### §0.4.19 Access tier enum

```
AccessTier =
  | "owner_full_access"
  | "matter_team_access"
  | "supervising_attorney_review"
  | "firm_admin"
  | "architect_admin"
  | "audit_log_only"
  | "no_access"
```

Owner: §16

### §0.4.20 Pattern kind enum

```
PatternKind =
  | "outcome_configuration_pattern"
  | "revision_strategy_pattern"
  | "plan_template_pattern"

PatternScopeKind =                // applicability_scope.scope_kind
  | "global"
  | "domain"
  | "work_product_type"
  | "user_preference"
  | "matter"
  | "private"

PatternHealthState =
  | "healthy"
  | "watch"
  | "quarantined"
  | "archived"
  | "purged"
```

Owner: §13

### §0.4.21 Feedback enums

```
HumanFeedbackAuthorityClass =
  | "current_run_instruction"
  | "current_run_preference"
  | "future_pattern_signal"
  | "durable_instruction_candidate"
  | "matter_scoped_preference"
  | "privileged_comment_no_learning"
  | "correction_to_evaluator"
  | "correction_to_revisor"

FeedbackKind =
  | "strategy_error"
  | "issue_weighting_error"
  | "style_error"
  | "formatting_rule_error"
  | "citation_rule_error"
  | "source_accuracy_error"
  | "coverage_error"
  | "unsupported_claim_error"
  | "revisor_failed"
  | "evaluator_false_pass"
  | "evaluator_false_fail"
  | "missing_hard_call"
  | "direct_instruction"
  | "other"
```

Owner: §14

### §0.4.22 Signal kind enums

```
OutcomeEvaluatorFeedbackSignalKind =
  | "evaluator_false_pass"
  | "evaluator_false_fail"
  | "evaluator_missed_hard_call"
  | "evaluator_wrong_hard_call"
  | "evaluator_plan_user_edited"
  | "compiled_plan_accepted"
  | "compiled_plan_rejected"
  | "finding_marked_wrong"
  | "finding_marked_correct"
  | "finding_later_superseded"
  | "needs_information_was_correct"
  | "needs_verification_was_correct"
  | "human_judgment_flag_useful"
  | "human_judgment_flag_noise"

RevisorFeedbackSignalKind =
  | "revision_plan_succeeded"
  | "revision_plan_failed"
  | "revision_plan_too_broad"
  | "revision_plan_too_narrow"
  | "revision_target_wrong_module"
  | "revision_instruction_useful"
  | "revision_instruction_ignored"
  | "revision_caused_regression"
  | "revision_resolved_finding"
  | "revision_failed_to_resolve_finding"

DirectInstructionSignalKind =
  | "direct_instruction_candidate_created"
  | "direct_instruction_accepted"
  | "direct_instruction_edited"
  | "direct_instruction_rejected"
  | "direct_instruction_scope_narrowed"
  | "direct_instruction_superseded"
  | "direct_instruction_injection_helped"
  | "direct_instruction_injection_hurt"
```

Owner: §14.8

### §0.4.23 Validation code enum

Validation codes form a single namespace under §22 Validation Codes. They are referenced by ID (e.g., `validation.target_port_bypass`); the full registry is in §22.

### §0.4.24 V3.2 learning-mode and cross-model enums

```
LearningMode =                    // §6.16
  | "production"                  // default
  | "signal_generation"           // cheap models for signal volume
  | "calibration"                 // paired-model deltas

ModelClass =                      // §6.16, §13.3 context signature
  | "cheap_local"                 // local Qwen, Ollama, etc.
  | "cheap_api"                   // Kimi 2.5, DeepSeek, etc.
  | "medium"                      // mid-tier API models
  | "expensive_frontier"          // top-tier API models

CrossModelApplicability =         // §13.1
  | "model_class_specific"
  | "cross_model_applicable"
  | "requires_validation"         // default for new patterns
```

---

## §0A. Implementation Discipline Preamble

This section governs how V3 is implemented.
All implementations of this spec, including coding agents working from it, MUST comply with these rules.

### §0A.1 No invention

When the spec does not define a mechanism (schema, state transition, error path, policy gate, or route/read-model), implementations MUST NOT invent one. They MUST do one of:

- Block the operation and surface a typed `unspecified_mechanism` error
- No-op with a typed failure receipt referencing the missing specification
- Enter a degraded path explicitly named elsewhere in the spec

The default implementer behavior of "fill the gap with a reasonable default" is forbidden. Gaps are reported, not papered over.

### §0A.2 Schema fidelity

Every schema field is either normative or explicitly marked informative. Implementations:

- MUST NOT add fields to normative schemas
- MUST NOT omit normative non-optional fields
- MAY add fields to informative schemas, with documented purpose
- MUST validate produced records against the declared schema before persisting

Schema changes require a spec amendment with version bump.

### §0A.3 State-machine fidelity

Every state machine in this spec lists its reachable states, valid transitions, and typed failure receipts. Implementations:

- MUST NOT introduce hidden states
- MUST NOT collapse two declared states into one
- MUST emit a typed receipt on every transition
- MUST treat "should never happen" cases as typed failure paths, not silent recovery

### §0A.4 Error-taxonomy fidelity

Errors use spec-defined enums (RevisionFailureEventKind, ExecutionStatus, validation codes, etc.). Implementations:

- MUST NOT introduce free-text error codes
- MUST NOT remap one taxonomy's values onto another
- MUST surface unspecified error conditions as `unspecified_error_class` with the unrecognized condition captured in metadata

### §0A.5 Executable invariants

Every invariant in this spec has an enforceable check or assertion point. "Should never happen" is not an invariant.
Implementations:

- MUST translate each invariant into a runtime assertion or schema validation
- MUST surface invariant violations as typed events with stack location
- MUST NOT suppress invariant failures

### §0A.6 Cross-spec contracts consumed, not redefined

When this addendum consumes types from DOC23, DOC72, DOC24, EC, or DOC73, it references them by name and version. Implementations:

- MUST NOT redefine cross-spec types locally
- MUST treat cross-spec contract updates as triggers for compatibility review
- MUST surface a typed `cross_spec_contract_drift` event when consumed types do not match expected versions

### §0A.7 Drift Manifest

Every TODO, placeholder, or deferred behavior in V3 enters the V3 Drift Manifest (§26 Open Questions). Each entry records:

```
DriftEntry {
  drift_id: string
  description: string
  owner: string
  due_date: ISO8601 | "indefinite"
  downstream_consumers: string[]
  blocks_phase_2: boolean
}
```

Implementations MUST NOT silently extend deferred behavior; deferred features are reported as unsupported with reference to the drift entry.

### §0A.8 Stress scenarios as conformance fixtures

V3 §25 lists stress test fixtures. Each fixture has:

```
StressFixture {
  fixture_id: string
  fixture_kind: string
  setup: FixtureSetup
  trigger: FixtureTrigger
  expected_states: state_value[]
  expected_receipts: RevisionOperationKind[]
  expected_findings: FindingState[]
  expected_violations: validation_code[]
}
```

Implementations MUST pass the fixtures listed in §25 before being considered compliant. Fixtures are conformance tests, not prose resolutions.

### §0A.9 Quality Accountability Principle

Components that cannot be measured are not built. Every component in V3 has a defined success metric with a denominator (§15 Quality Program). Every claim of quality has a metric. Every learning loop has a feedback signal. Every escalation has a defined trigger.
When an implementation cannot wire a quality metric to a component, the component MUST be marked `quality_unmeasured` and flagged in the Drift Manifest. Unmeasured components MAY ship but MUST NOT be auto-promoted to broader scopes.

### §0A.10 Conflict resolution

If two accepted items in this spec conflict, the implementation MUST block and surface the conflict via the Spec Collision Register (§28). It MUST NOT silently choose either resolution. When such a conflict is surfaced:

1. The implementation reports a typed `spec_collision` event
2. The Spec Collision Register is updated with the discovered collision
3. A spec amendment is required before the conflicting code path can ship
4. Until amendment, the affected operation is blocked with a typed failure receipt

This rule extends the V3.1 Adjudication Card's preamble per Will's 2026-05-15 conversation note.

---

## §0B. Patches Applied

V3 is the spec that results from applying the Canonicalization Patch V2 to the Adjudication Card V3.1. The following patches are incorporated throughout this document:

| Patch | Title | Severity | Applied in §§ |
|---|---|---|---|
| P1 | RevisionPlanStep discriminated union + target_port lint | CRITICAL | §7.5, §11.3 |
| P2 | AssuranceBasis split from EvaluationLimitationKind | CRITICAL | §5.4, §6.5, §10 |
| P3 | policy_evaluated → policy_backed; PolicyEvaluationRef | HIGH | §5.4, §11.19 |
| P4 | Remove shadow_workspace_default | HIGH | §6.4, §11.11 |
| P5 | PlanAssurancePolicy as stack | CRITICAL | §11.4 |
| P6 | Budget degradation non-degradable rules | HIGH | §6.4, §11.15 |
| P7 | ModuleRevisionCapability versioning | HIGH | §9.2, §9.3 |
| P8 | RevisionOperationReceipt extends PBEOperationReceiptLite | HIGH | §11.6 |
| P9 | ExplanationTrace structured | MEDIUM | §6, §7 |
| P10 | AdvisorySubAgentOutput union expansion | HIGH | §6, §8 |
| P11 | AutonomousModePolicy | HIGH | §6.6 |
| P12 | custom_instruction envelope | HIGH | §9, §15.10 |
| P13 | HardCallResolution compatibility binding | MEDIUM | §7.9 |
| P14 | DirectInstructionCandidate scope rules | HIGH | §14.7 |
| P15 | GateSkippability metadata | HIGH | §21 |
| P16 (V2) | Teach-from-feedback defaults | MEDIUM | §18 |
| P17 | CandidateArtifactVersion acceptance receipts | MEDIUM | §11.11 |
| P18 | Remove Pattern success_rate | MEDIUM | §13 |
| P19 | Privilege escalation via taint clearance bound to AccessTier | CRITICAL | §15.12, §16 |
| P20 | Sycophancy delusion: sever goal-learning loop | CRITICAL | §6, §13 |
| P21 | PlanReadSet | HIGH | §11.9 |
| P22 | OutcomeRuntimeState split | HIGH | §5.1 |
| P23 | DispatcherState vs FailureEvent separation | HIGH | §11.2 |
| P24 | Concurrency tie-breaker canonical fields | MEDIUM | §11.9 |
| P25 | WorkspaceWriteFailureKind | MEDIUM | §11.17 |
| P26 | Cost estimator-confidence | MEDIUM | §15, §12.6 |
| P27 | Upstream failure cascade | HIGH | §5.14, §11 |
| P28 | SemanticChangelog for regenerate | HIGH | §7, §21 |
| P29 | Rolling Hash for multi-step plans | HIGH | §11.11, §11.20 |
| P30 | Sandboxed Evaluation for tainted candidates | CRITICAL | §11.11, §15.11 |
| P31 | Yield-back atomicity | HIGH | §6.7, §11.11 |
| P32 | Logical vs infrastructure budget | HIGH | §6.4, §15 |
| P33 | NoveltyAssessment math fix | LOW | §4, §6 |
| P34 | DOC15 CIL authority snapshot | HIGH | §6, §11.3 |
| P35 | QualitySignal actionability | LOW | §15 |
| P36 | Source repair depth override | LOW | §6.11 |
| P37 | Pattern UI aggregate display | LOW | §21 |
| P38 | step.coding revision dispatch policy | HIGH | §9, cross-doc DOC11 |
| P39 | Pattern applicability_scope vs provenance | HIGH | §13, §14.3, §16 |

Severity summary: 6 CRITICAL, 21 HIGH, 8 MEDIUM, 4 LOW.

---

# §1. ARCHITECTURAL SHAPE

## §1.1 Pipeline

The Outcome Evaluator and Revisor form a five-stage pipeline. LLM intelligence is concentrated in plan production (Compilers and Revisor reasoning). Routing and execution are deterministic.
```
┌─────────────────────────────────────────────────────────────────────────┐
│                           OUTCOME COMPILATION                           │
│                                                                         │
│  outcome text + guidance + context → CompiledEvaluationPlan             │
│  (PreliminaryEvaluationPreview at design time)                          │
│  (ResolvedEvaluationPlan at runtime)                                    │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                             step.evaluator                              │
│                                                                         │
│  Reads target artifact, resolved plan, DOC24 context packet             │
│  Runs evaluation lanes (deterministic, source, coverage, style,         │
│  specialist subevaluator, trace, hard-call as needed)                   │
│  Emits OutcomeEvaluationResult with findings, judgment limitations,     │
│  optional RevisionRequest                                               │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          REVISION COMPILATION                           │
│                                                                         │
│  EvaluationResult + graph context + lineage + safety envelope           │
│  → CompiledRevisionStrategy (causal diagnosis, repair strategy,         │
│    ordering rationale, goal-impact assessment)                          │
│  → RevisionPlan (typed, validated, dispatchable)                        │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           REVISION DISPATCHER                           │
│                        (derived runtime service)                        │
│                                                                         │
│  Validates plan (deterministic lint + PlanAssurancePolicy stack)        │
│  Acquires write set / detects read-write staleness                      │
│  Dispatches typed instructions to modules via revision_in               │
│  Manages candidate artifact versions                                    │
│  Enforces idempotency, policy gates, side-effect controls               │
│  Emits RevisionOperationReceipt for every operation                     │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                            revision_in PORT                             │
│                                                                         │
│  Standard port on every revision-capable module                         │
│  Receives TypedRevisionInstruction                                      │
│  Returns RevisionExecutionReceipt with execution_status                 │
│  Produces CandidateArtifactVersion (default) or in-place edit           │
│  (rolling_hash_in_place mode only)                                      │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
             (Loop Controller revalidates affected outcomes)
```

The five stages are:

1. **Outcome Compilation (§4):** LLM-based. Converts natural-language outcome + guidance into a typed CompiledEvaluationPlan. One LLM call per outcome at design time (preview); one at runtime (resolved). Bounded cost.
2. **Outcome Evaluation (§5):** `step.evaluator` module. Reads target artifact and resolved plan; runs internal evaluation lanes; aggregates a verdict, findings, judgment limitations, and optional RevisionRequest. Specialist subevaluators may be invoked per the resolved plan.
3. **Revision Compilation (§6):** LLM-based. Reads EvaluationResult + graph context + artifact lineage + iteration history + safety envelope + goal context. Produces CompiledRevisionStrategy (diagnosis + strategy + rationale) and a typed RevisionPlan. May consult advisory sub-agents.
4. **Revision Dispatch (§11):** Deterministic runtime service. Validates the plan against deterministic and assurance-policy gates, acquires write sets, dispatches steps to modules via `revision_in`, enforces idempotency and policy decisions, emits operation receipts, and manages candidate versions.
5. **Module Revision (§9):** Modules that declare `revision_in` capability receive typed instructions, produce candidate artifact versions, and emit receipts. Loop Controller revalidates affected outcomes per the plan's revalidation expectation.

## §1.2 What this addendum is and is not

This addendum is a specification for an LLM-augmented quality-assurance and revision system for documents and other artifacts produced by the DOC23 task system.
It is:

- A typed contract between Compiler/Evaluator/Revisor/Dispatcher/Modules
- A safety model for adversarial inputs, candidate versions, taint propagation, and policy gates
- A learning model for compiled patterns, performance slices, and feedback signals
- A governance model for retention, access tiers, export contracts, and pattern promotion

It is not:

- A general-purpose orchestration framework. The Revisor is a planner, not a runtime orchestrator.
- A free-form LLM agent loop. LLM intelligence is bounded by Compiler call budgets and assurance-policy gates.
- A replacement for DOC23's task system. It extends DOC23 with two module types, a runtime service, a port contract, and supporting infrastructure.
- A memory or knowledge graph system. It produces signals consumed by DOC72 / DOC24 / BDSM, but does not own durable memory.
- A policy or privilege engine. EC and PropA own policy evaluation; this addendum consumes PolicyDecision records.

## §1.3 Non-goals

- Real-time streaming evaluation. Evaluation runs each time upstream artifact production completes.
- Autonomous indefinite revision loops. Revision is convergence-aware with explicit budget caps and hard-call escalation.
- Cross-task evaluator sharing. Each task instance has its own evaluator activations; learned patterns are the cross-task mechanism.
- Direct artifact mutation as a general primitive. Direct fix is restricted to class-safe mechanical changes (§10).
- Replacement of human judgment for legal, strategic, or professional decisions. Hard Call detection (§6.5) is explicit and blocking.

---

# §2. BACKGROUND

## §2.1 What V2 got right

The V2 design (DOC23 Addenda B Addendum R0.6.5) made several correct decisions that V3 preserves:

- **Evaluation must produce routing, not just findings.** Conventional evaluators dump findings on the user. V3 inherits the V2 inversion: the default surface is action-oriented ("sent back for revision"), with findings available for audit.
(§5.6)
- **Planning intelligence is separated from deterministic dispatch.** The Revisor writes a typed RevisionPlan; the Dispatcher validates and executes deterministically. This is the strongest design decision in the architecture. (§6, §11)
- **`revision_in` is the right module contract.** Modules declare revision capabilities; the Dispatcher validates capability matching before dispatch. (§9)
- **Targeted re-evaluation and cascading revalidation are essential.** Once an upstream artifact changes, dependent outcomes become stale and re-run. (§5.14, §11.21)
- **Plan review forum is distinct from runtime orchestration.** Plan review is deliberation; runtime dispatch is execution. (§14.9)
- **Human feedback should train the system.** Feedback events produce signals consumed by future Compiler invocations and DOC72/BDSM. (§14)

## §2.2 What V2 got wrong and V3 fixes

V3 corrects the following V2 design errors:

- **The UI was too configuration-heavy.** V2 had method pickers, strictness sliders, failure-routing radio buttons, and item extraction categories. V3 has a single outcome field and optional guidance; everything else is inferred. (§3.2, §21)
- **"Outcome" and "rubric" were not separate user concepts.** V3 collapses them: the user states an outcome; the Compiler decides what to check and how. (§3.2)
- **"Strictness" was the wrong abstraction.** V3 replaces it with the Evaluation Sufficiency Protocol (§5.5).
- **Failure behavior was conceived as a UI radio button.** V3 makes it graph wiring: what happens after failure is determined by topology. (§3.2)
- **"Text-to-outcome inference" was a vague pipeline.** V3 makes it the Outcome Compiler (§4): a typed, measured, bounded LLM step.
- **"Revision Bus" was the wrong name.** V3 renames it to Revision Dispatcher: typed point-to-point deterministic dispatch, not pub/sub broadcast.
(§11)
- **`TaskSourceWorkspace` was too broad as a primitive.** V3 separates SourceWorkspace, RunWorkspace, and ArtifactStore with explicit ownership rules. (§12)
- **Direct fix was too permissive.** V3 narrows direct fix to class-safe mechanical changes only; meaning-bearing repair routes through `revision_in`. (§10)

## §2.3 Comparison to other systems

For context, V3's design relative to known patterns:

| Pattern | Approach | V3's relation |
|---|---|---|
| Anthropic Outcomes | Outcome-shaped specification of task results | V3 generalizes: outcomes are typed, evaluable, revisable; outcomes feed Revisor, not just final-state checks |
| LangGraph | Free-form agent graphs with LLM-driven routing | V3 rejects free-form routing at runtime; LLM intelligence is bounded to plan production |
| Codex /goals | Goal-shaped tasks with checker functions | V3 generalizes checkers to AssuranceBasis with method, source, capability binding |
| ReAct / autonomous agents | LLM-driven act-observe loops | V3 rejects unbounded loops; Loop Controller enforces convergence rules and budgets |
| Conventional test runners | Pass/fail per assertion | V3 adds judgment limitations, hard calls, sufficiency protocol, and routing to repair |

V3's distinguishing properties: bounded LLM cost per cycle, deterministic dispatch, typed plans validated before execution, explicit handling of subjective/judgment outcomes, candidate versions as a first-class safety primitive, and integrated learning through compiled patterns.

## §2.4 Scope vs DOC23 R3.1

DOC23 R3.1 is the underlying task system: modules, ports, cables, activation, FIFO input queues, checkpointing, Loop Controller, security policy.
V3 extends DOC23 R3.1 with:

- Two new module types: `step.evaluator` and `step.revisor`
- A new port: `revision_in`
- A derived runtime service: Revision Dispatcher
- A new artifact category: CandidateArtifactVersion within SourceWorkspace
- Cross-cutting concerns: PlanAssurancePolicy, RevisionSafetyEnvelope, RevisionOperationReceipt

V3 does not modify DOC23 R3.1 module activation semantics, port semantics for existing ports, or the Loop Controller's existing rules. It registers obligations (§29) for additions DOC23 R3.1 needs to absorb, but those changes belong in DOC23, not here.

The generic stateful multi-tick AsyncModuleRun proposed in early V3 drafts is deferred (per §26 Open Questions). V3 instead uses Option A: the Revisor/Dispatcher async lifecycle lives in RevisionExecutionRecord, outside ordinary DOC23 module activation. Evaluator and Revisor modules activate single-tick; the Dispatcher service owns asynchronous continuation across multiple ticks.

---

# §3. GOVERNING PRINCIPLES

These principles bind the rest of the spec. Where a later section appears to permit something a governing principle forbids, the principle controls and the conflict is registered in §28 Spec Collision Register.

## §3.1 Pipeline rules

**§3.1.1 The pipeline is five stages.** Outcome Compiler → Evaluator → Revision Compiler (within the Revisor) → Dispatcher → Module via `revision_in`. Stages are typed at the boundary. Implementations MUST NOT collapse stages or invent intermediate ones.

**§3.1.2 LLM intelligence is concentrated in plan production.** The Outcome Compiler and Revision Compiler are LLM-based. The Evaluator may use LLM specialist subevaluators when its resolved plan calls for them. The Dispatcher is fully deterministic; it does not consult an LLM to decide runtime routing.
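
To illustrate what "fully deterministic" dispatch means in §3.1.2, the sketch below orders plan steps using only their declared dependencies and a lexicographic tie-break, so the same plan always yields the same order and no LLM is consulted. The `PlanStepLite` shape (`step_id`, `depends_on`) and the `dispatchOrder` helper are hypothetical simplifications introduced for this example; the actual step schema and ordering rules are owned by §7.5 and §11.

```typescript
// Hypothetical, simplified step shape; the real RevisionPlanStep lives in §7.5.
interface PlanStepLite {
  step_id: string;
  depends_on: string[]; // step_ids that must complete first
}

// Repeatedly picks the lexicographically smallest step whose dependencies
// are all done. Rejects unknown dependencies and cycles instead of guessing.
function dispatchOrder(steps: PlanStepLite[]): string[] {
  const known = steps.map((s) => s.step_id);
  for (const s of steps) {
    for (const d of s.depends_on) {
      if (known.indexOf(d) === -1) {
        throw new Error("unknown dependency: " + d);
      }
    }
  }

  const done: string[] = [];
  let pending = steps.slice();
  while (pending.length > 0) {
    const ready = pending
      .filter((s) => s.depends_on.every((d) => done.indexOf(d) !== -1))
      .map((s) => s.step_id)
      .sort(); // tie-break: code-unit order, independent of input order
    if (ready.length === 0) {
      throw new Error("cycle detected in plan steps");
    }
    const next = ready[0];
    done.push(next);
    pending = pending.filter((s) => s.step_id !== next);
  }
  return done;
}
```

Because the tie-break depends only on `step_id`, shuffling the input array does not change the result, which is the property §3.1.4 relies on. A conforming implementation would surface the two error cases as typed validation codes (§22) rather than thrown strings.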
**§3.1.3 Meaning-bearing repair goes through `revision_in`.** Module modifications that change artifact meaning, content, claims, or strategic framing MUST route through a module's declared `revision_in` capability. Direct fix is reserved for class-safe mechanical changes (§10). This is the architecture's central safety contract.

**§3.1.4 Plan execution is deterministic.** Once a plan is validated and approved for dispatch, step ordering and routing are deterministic. Modules may produce non-deterministic outputs; their dispatch is deterministic.

**§3.1.5 Receipts are mandatory.** Every operation that mutates state or invokes a module emits a typed `RevisionOperationReceipt extends PBEOperationReceiptLite` (§11.6). State changes without receipts are forbidden.

**§3.1.6 Candidate versions are the default mutation primitive.** Multi-step mutating plans, meaning-bearing edits, and operations on privileged artifacts produce CandidateArtifactVersion records that require acceptance before becoming current. Direct in-place mutation requires explicit opt-in to rolling_hash_in_place mode and is restricted to single-artifact mechanical workflows. (§11.11)

**§3.1.7 Revalidation is automatic, not predicted.** When a revision step mutates an artifact, all downstream outcomes with declared dependencies on that artifact are marked `dirty` and re-evaluated. The Dispatcher does not predict which downstream outcomes "should" be affected; cascading revalidation is deterministic. (§11.21)

## §3.2 UX rules

**§3.2.1 The user states an outcome; the system handles configuration.** No method picker, no strictness slider, no failure-routing radio buttons. The Outcome Compiler infers method, threshold, assurance basis, and routing from the outcome text and optional guidance.

**§3.2.2 Two fields are semantically distinct.** The outcome field states what must be true for the outcome to pass (used for verdict).
The guidance field states how the user wants the system to think about the outcome (used for method, style, exemplars, tool preferences). Mixing them taints intent.

**§3.2.3 The user inputs natural language or files; the Compiler parses structure.** No structured form fields for tool preferences, source preferences, or style references. The Outcome Compiler is solely responsible for converting natural language and file attachments into structured plan components.

**§3.2.4 Failure behavior is graph wiring.** "What happens if this fails?" is determined by graph topology (which modules are wired to which ports), not by per-outcome UI radio buttons. The UI shows the wiring outcome; users edit the graph to change behavior.

**§3.2.5 Inferred parameters are visible, not configurable.** Thresholds, method choices, and assurance bases inferred by the Compiler are surfaced to the user via the CompiledEvaluationPlan preview (§4.4). Users adjust by editing the outcome text, not by manipulating inferred values directly. Override is available with explicit warnings (§21 Adjust panel).

**§3.2.6 Skip is conditional, not universal.** Human-gated steps carry GateSkippability metadata. Steps marked `not_skippable` do not present a Skip button in the UI. Steps marked `skip_requires_risk_acceptance` produce an explicit risk-acceptance receipt when skipped. (§21)

**§3.2.7 Durable learning is always opt-in.** Teach-from-feedback durable-destination checkboxes default OFF for all runs. Only "Fix this run now" is on by default. Privilege and matter scope do not change these defaults; they affect retrieval scope, not learning eligibility. (§18)

## §3.3 Cost predictability

**§3.3.1 Planning cost is bounded by one LLM call per Compiler invocation per iteration.** The Outcome Compiler runs once per outcome at design time and once at evaluation runtime. The Revision Compiler runs once per Revisor activation. Advisory sub-agents have separate budgets per [A4o] AdvisorySubAgentProfile.
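
The budget accounting in this section, in particular the logical/infrastructure split stated in §3.3.3, can be sketched as follows. This is a minimal illustration under stated assumptions, not a normative schema: the `BudgetLimits` and `BudgetLedger` shapes, the `AttemptOutcome` and `BudgetVerdict` types, and the `recordAttempt` helper are hypothetical, and only the four `max_*` field names come from the spec text. The point it shows is that a successful inference draws on the logical budget, while a JSON/schema/timeout retry draws only on the infrastructure budget.

```typescript
// Only the four max_* names are taken from §3.3.3; everything else is illustrative.
interface BudgetLimits {
  max_logical_llm_calls_per_revision: number;
  max_logical_tokens_per_revision: number;
  max_infrastructure_retries_per_logical_call: number;
  max_total_infrastructure_retries_per_revision: number;
}

// Hypothetical running totals for one revision.
interface BudgetLedger {
  logicalCalls: number;
  logicalTokens: number;
  totalInfraRetries: number;
}

type AttemptOutcome =
  | { kind: "success"; tokens: number }
  | { kind: "infrastructure_failure" }; // JSON / schema / timeout failure

type BudgetVerdict =
  | "ok"
  | "logical_budget_exhausted"
  | "infrastructure_budget_exhausted";

// Records one attempt. retriesForThisCall is the number of infrastructure
// retries already spent on the same logical call.
function recordAttempt(
  limits: BudgetLimits,
  ledger: BudgetLedger,
  retriesForThisCall: number,
  outcome: AttemptOutcome,
): BudgetVerdict {
  if (outcome.kind === "infrastructure_failure") {
    // Retries never touch the logical counters.
    ledger.totalInfraRetries += 1;
    const perCallExceeded =
      retriesForThisCall + 1 > limits.max_infrastructure_retries_per_logical_call;
    const totalExceeded =
      ledger.totalInfraRetries > limits.max_total_infrastructure_retries_per_revision;
    return perCallExceeded || totalExceeded ? "infrastructure_budget_exhausted" : "ok";
  }
  // Only a successful inference consumes the logical budget.
  ledger.logicalCalls += 1;
  ledger.logicalTokens += outcome.tokens;
  const overCalls = ledger.logicalCalls > limits.max_logical_llm_calls_per_revision;
  const overTokens = ledger.logicalTokens > limits.max_logical_tokens_per_revision;
  return overCalls || overTokens ? "logical_budget_exhausted" : "ok";
}
```

What happens after a budget is exhausted (degradation, escalation, or failure receipt) is deliberately left out of the sketch; those paths are governed by §3.3.6, §6.4, and §11.15.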
**§3.3.2 Execution cost is module-driven and tracked.** Module activations during dispatch produce cost records that contribute to the run's EvaluationRevisionCostBreakdown (§12.6).

**§3.3.3 Budgets are dual.** Logical budgets (`max_logical_llm_calls_per_revision`, `max_logical_tokens_per_revision`) count successful inferences. Infrastructure budgets (`max_infrastructure_retries_per_logical_call`, `max_total_infrastructure_retries_per_revision`) count retries due to JSON/schema/timeout failures. Infrastructure retries do not consume the logical budget.

**§3.3.4 Local compute is a first-class budget.** `max_local_compute_seconds_per_revision` bounds wall-clock time on Apple Silicon. Loop Controller has explicit authority to preempt threads exceeding this budget.

**§3.3.5 Cost estimators must declare confidence.** Estimates carry `EstimatorConfidence` (calibrated / uncalibrated / experimental). Experimental estimates require user-set hard caps. Three consecutive over-budget runs suppress auto-approval until recalibration. (§15)

**§3.3.6 Graceful degradation skips optional helpers first.** When approaching budget, the system skips advisory sub-agents, then optional specialist subevaluators, then dry-run, then optional semantic lint. Required modes per PlanAssurancePolicy are non-degradable (§3.5, §11.4).

## §3.4 Sub-Agent Leverage Rule

**§3.4.1 Single-agent fast path by default.** The architecture uses sub-agents for parallel reasoning when work decomposes into independent lanes with genuine per-lane specialization. Default execution uses single-agent reasoning for cheap and mechanical cases.
**§3.4.2 Sub-agent invocation is opt-in based on plan complexity, risk, and confidence.** The Compiler decides whether to consult sub-agents based on:

- Plan novelty score above threshold (§4.5)
- Compiler confidence below threshold
- Plan risk score above threshold
- Failure kind with known specialist sub-agent
- User-registered custom sub-agent for the domain

**§3.4.3 Sub-agents share protocols.** All advisory sub-agents:

- Operate on scoped context packs (no global context dump)
- Emit output conforming to the AdvisorySubAgentOutput union (§6, §8)
- Honor per-invocation cost and timeout budgets
- Inherit input taint per §15.11
- Contribute to reputation scoring per §15

**§3.4.4 Advisory output is evidence, not instruction.** The Compiler must accept, reject, or defer each sub-agent's advice. The resolution is recorded in CompiledEvaluationPlan or CompiledRevisionStrategy. Implementations MUST NOT treat sub-agent output as direct execution input.

**§3.4.5 Sub-agents apply at four coordination points.** Outcome Compiler (intent classification, parameter inference, threshold extraction, source/tool binding), Evaluator (specialist subevaluators for lanes), Revision Compiler (advisory repair-planning lanes), Feedback Interpreter (per-feedback-kind parsing specialists). All four use shared protocols per §3.4.3.

## §3.5 Quality Accountability Principle

**§3.5.1 Components that cannot be measured are not built.** Every component in V3 has a defined success metric with a denominator. Quality programs are mandatory for:

- Outcome Compiler (§15.3)
- Revision Compiler (§15.2)
- Revisor execution (§15.1)
- Sub-agents (§15.8)
- Patterns (§13.3)

**§3.5.2 Every claim of quality has a metric with denominator.** A metric without a denominator is not a metric. Phrases like "fixes per hour" without a quality denominator are forbidden in implementation rationale.
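
One way to make the denominator rule of §3.5.2 mechanically checkable is to refuse to compute a rate unless a usable denominator is supplied. The sketch below is illustrative only: the `RatioMetric` and `MetricResult` shapes, the `metricRate` helper, and the example field meanings are hypothetical and do not correspond to a schema in §15.

```typescript
// Hypothetical shape: a quality metric that cannot be reported without a denominator.
interface RatioMetric {
  metric_id: string;   // illustrative name, not a spec identifier
  numerator: number;   // e.g. findings resolved by a revision
  denominator: number; // e.g. findings the revision targeted
}

type MetricResult =
  | { status: "computed"; rate: number }
  | { status: "rejected"; reason: "missing_or_zero_denominator" };

// Returns a rate only when the denominator is a finite, positive number;
// otherwise returns a typed rejection instead of a bare count.
function metricRate(m: RatioMetric): MetricResult {
  if (!isFinite(m.denominator) || m.denominator <= 0) {
    return { status: "rejected", reason: "missing_or_zero_denominator" };
  }
  return { status: "computed", rate: m.numerator / m.denominator };
}
```

Under this sketch a bare count such as "5 fixes" with no population behind it cannot be turned into a reported metric; the caller receives a typed rejection it can record against the component.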
**§3.5.3 Every learning loop has an independent feedback signal.** Self-reported success signals (e.g., a Revisor grading its own goal advancement) MUST NOT drive durable learning. Learning signals must come from:

- Independent post-run evaluators (e.g., comparative_judge against baseline)
- Explicit human feedback
- Downstream outcome convergence
- External verification

**§3.5.4 Every escalation has a defined trigger.** Escalation to Task Agent, human gate, or Hard Call must be triggered by an enumerated condition (§6.5, §6.9, §6.11). "The system felt uncertain" is not a trigger.

**§3.5.5 Quality signals are typed by actionability.** Each QualitySignal carries a `QualityActionability` value: metric_only, warn, block, or escalate. Block actionability creates a gate; metric_only does not. (§15)

**§3.5.6 Unmeasured components are flagged, not shipped silently.** A component without a wired quality metric is marked `quality_unmeasured` in the Drift Manifest (§26) and MUST NOT be auto-promoted to broader scope.

## §3.6 BDSM Boundary Rule

BDSM is the right substrate for utility and attribution learning. It is not the right substrate for several other things V2 sometimes implied.

**§3.6.1 BDSM must not become:**

- The feedback interpreter (the Feedback Interpreter is a separate service, §14.3)
- The durable writer (EC is sole durable writer, §3.7)
- The memory governance owner (DOC72 owns memory governance)
- The semantic task patcher (DirectInstructionCandidates land in DOC72 via governed promotion)
- The live prompt mutator (DOC24 injects compiled bundles, not live mutations)

**§3.6.2 BDSM consumes signals and compiles runtime-safe utility bundles.** It receives signal classes (§14.8) and produces bundles consumed by DOC24 at runtime. It does not directly modify Evaluator or Revisor behavior at hot-path time.
**§3.6.3 BDSM constraints:**

- No hot-path LLM calls
- Compiled-bundle-only runtime
- EC sole durable writer
- No silent promotion of durable standing rules
- Inspectability and reversibility

## §3.7 EC owns policy; this addendum consumes

**§3.7.1 EC and PropA own policy evaluation.** When V3 needs a policy decision (privilege, classification, mutation authorization, external-send authorization, memory-write authorization), it requests a PolicyDecision via the standard EC API and consumes the response.

**§3.7.2 V3 modules do not evaluate policy.** Implementations MUST NOT introduce local policy logic to bypass or shortcut EC. PolicyDecision records are created only by EC.

**§3.7.3 V3 does honor PolicyDecisions.** PolicyDecision is a precondition (per §3.8). A `PolicyDecision.decision == "block"` blocks the operation. A `PolicyDecision.decision == "allow_with_human_gate"` triggers a non-degradable human gate.

**§3.7.4 EC is the sole durable writer.** All artifacts produced by V3 (CompiledEvaluationPlan, CompiledRevisionStrategy, RevisionPlan, OutcomeEvaluationResult, RevisionExecutionRecord, RevisionOperationReceipt, JudgmentLimitationRecord, VerificationRecord, HardCallResolution, CandidateArtifactVersion, GraphStateRollback, RevisionSafetyEnvelope, TaintClearanceRecord) are persisted via EC. Local-only state is permitted for in-flight computation but never as system of record.

## §3.8 Safety as compilation input

**§3.8.1 No CompiledEvaluationPlan is valid unless:**

1. Evidence and source availability checked (Sufficiency Protocol, §5.5)
2. Taint labels assigned to all inputs (§15.10)
3. Assurance basis backed by method, source, or tool truth (§5.4)

**§3.8.2 No RevisionPlan is valid unless:**

1. RevisionSafetyEnvelope exists (§15.10)
2. PolicyDecision exists for every mutation or side-effect step (§11.19)
3. Governance policy exists for every findings/plan/receipt/diff record (§16)
4. Candidate-version policy resolved (§11.11)
5. CIL authority snapshot present and conflict-checked (§6, §11.3)

**§3.8.3 Validation failures are typed.** Each precondition failure produces a typed validation code (§22). Plans failing validation are rejected at deterministic linting and never reach dispatch.

## §3.9 11 Core Revisor Rules

These rules bind Revisor behavior throughout V3.

1. **The Revisor is a planner, not the default content reviser.** It produces typed plans; modules do the revision work.
2. **The Revisor consumes** evaluator results, revision requests, graph context, artifact lineage, source workspace state, prior revision history, human feedback, learned patterns, and goal context.
3. **The Revisor creates CompiledRevisionStrategy before producing RevisionPlan.** The strategy artifact captures causal diagnosis, repair strategy mapping, ordering rationale, alternatives considered, and escalation rationale before plan emission.
4. **CompiledRevisionStrategy diagnoses** failure kind, likely cause, responsible module, repair target, strategy, order, risks, and revalidation needs. Each diagnosis component is typed and auditable.
5. **The Revisor must use declared module revision capabilities.** It may not invent runtime capabilities. ModuleRevisionCapability declarations are authoritative; unknown capabilities are rejected at deterministic linting.
6. **Revision Dispatcher validates plans deterministically before execution.** Linting includes schema, DAG, capability, version, taint, and policy gates. A plan that fails linting is rejected; it does not partially execute.
7. **Meaning-bearing repair goes through `revision_in`.** Direct fix is restricted to class-safe mechanical changes (§10). The Revisor MUST NOT compose plans that route meaning-bearing edits through other ports.
8. **Every dispatched module emits a RevisionExecutionReceipt.** Receipts are mandatory; modules that do not emit receipts are treated as failed.
9. **Revisor loops are convergence-aware.** Repeated non-progress (per §6.11 success conditions; per [D16] loop breaker after N consecutive insufficient plans) escalates to Task Agent or human, not infinite retry.
10. **User feedback may enter current-run repair through the Revisor when active or forkable.** Future learning is routed through DOC72, DOC8/BDSM, and DOC24 by separate signal classes (§14.8).
11. **Revisor learning uses compiled, inspectable patterns.** No hot-path self-mutation. Patterns are retrieved from DOC72 at compile time; promotion to broader scope follows §13.5 governance.

---

# §4. OUTCOME COMPILER

The Outcome Compiler is a first-class layer between the user's natural-language outcome and the runtime Evaluator. It converts an outcome plus optional guidance plus context into a typed CompiledEvaluationPlan.

## §4.1 Role

**§4.1.1 Inputs.** The Compiler reads:

```
natural-language outcome text
+ optional guidance text
+ target artifact reference
+ graph/run context
+ source workspace snapshot
+ task blueprint summary
+ available capabilities and tools
+ prior findings and revision history
+ learned evaluator and revisor patterns
+ DOC72 goal context
+ CIL authority snapshot (per §6, §11.3)
```

**§4.1.2 Outputs.** The Compiler emits one of:

- `CompiledEvaluationPlan` with `plan_phase = "preliminary_preview"` (at design time)
- `CompiledEvaluationPlan` with `plan_phase = "runtime_resolved"` (at evaluation time)
- A typed compilation failure with reason from `CompiledEvaluationPlanStatus` (§4.6)

Both phases use the same schema (§5.2); only `plan_phase` and the populated context fields differ.

**§4.1.3 Bounded cost.** The Compiler runs as one LLM call per outcome per phase. Sub-agent invocations have separate budgets. The Compiler MUST NOT invoke itself recursively or invoke other Compilers.

## §4.2 Preliminary preview vs runtime resolved

**§4.2.1 Preliminary preview (design time).** Generated when the user authors an outcome.
The Compiler invocation excludes the DOC24 runtime packet and works from the task blueprint, available capabilities, and learned patterns. The preview shows the user: - Compiler's interpretation of the outcome (`interpreted_goal`) - Inferred evaluation lanes - Inferred assurance bases per lane - Required capabilities (and whether they are available) - Required evidence sources (and whether they are accessible) - Expected hard-call types - Threshold inferences - Limitations The user can adjust by editing the outcome text or guidance. The user does not directly manipulate compiled fields (§3.2.5). **§4.2.2 Runtime resolved.** Generated at evaluation time when the upstream artifact is ready and the DOC24 packet is injected. The resolved plan is authoritative for evaluation. It MAY differ from the preview when context has changed: capabilities became available or unavailable, sources updated, learned patterns updated. **§4.2.3 Drift between preview and resolved.** When `runtime_resolved` differs materially from `preliminary_preview`, the difference is recorded in the resolved plan's metadata (§5.2.4) and contributes to Compiler quality metrics (§15.3). ## §4.3 Evaluation lanes The Compiler composes internal evaluation lanes as needed. Lanes are not user-visible modes; they are how the Compiler decomposes the evaluation work. **§4.3.1 Lane kinds.** A lane has a `lane_kind` from: ``` EvaluationLaneKind = | "outcome_fitness" // does the output satisfy the outcome at a high level? | "requirement_compliance" // does it meet specific checklist requirements? | "truth_or_source_verification" // are claims grounded in cited sources? | "coverage_gap_detection" // are required topics/items addressed? | "comparison_or_ranking" // is this better than alternative or baseline? | "process_trace_audit" // did the task actually use required tools/sources? | "hard_call_detection" // what strategic judgments were made? 
| "format_or_output_readiness" // does the output satisfy structural requirements? | "style_or_voice_review" // does it follow style guidance? | "custom" // Compiler-defined lane ``` **§4.3.2 Lane composition.** The Compiler decides which lanes to include based on the outcome text, guidance, target artifact kind, and learned patterns. Lanes are not mandatory; a simple outcome may compose a single lane. **§4.3.3 Per-lane assurance basis expectation.** Each lane declares which AssuranceBasis values are expected to back its findings. The Evaluator validates at runtime that produced findings match the declared expectations; mismatches contribute to quality metrics. **§4.3.4 Specialist subevaluator assignment.** A lane MAY name a specialist subevaluator (`specialist_agent_ref`) that handles the lane's evaluation. Specialist subevaluators are execution sub-agents, not advisory (§8). ## §4.4 The CompiledEvaluationPlan preview UI The CompiledEvaluationPlan preview is the user's primary feedback surface at outcome authoring time. Implementations MUST surface: - The `interpreted_goal` (Compiler's restatement of the outcome) - A list of evaluation lanes with `lane_goal` and `why_needed` - Per-lane `assurance_basis_expected` - A list of required capabilities with availability indicators - A list of required source bindings with accessibility indicators - Expected hard-call types - The threshold extraction record (Compiler's inference of any thresholds in the outcome text) - The novelty assessment (whether this resembles a known pattern) - Limitations and `compiler_confidence` The user adjusts by editing the outcome text or guidance and recompiling. Manual override of compiled fields is available in the Adjust panel (§21) with explicit warnings; this is a privileged path. 
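As an illustration of the §4.4 surfacing list, the sketch below projects a plan into plain preview lines. It is a minimal sketch under stated assumptions, not the normative UI contract: `PreviewPlanSlice`, `PreviewLane`, `buildPreviewLines`, and the text format are hypothetical; `CapabilityRef` is simplified to a string; and `unavailable_capabilities` is assumed to use the same identifiers as `required_capability_refs`. Field names follow §5.2 and §5.3.

```typescript
// Hypothetical slice of CompiledEvaluationPlan (§5.2) holding only the fields
// this sketch reads. CapabilityRef is simplified to string (assumption).
interface PreviewLane {
  lane_goal: string;
  why_needed: string;
  assurance_basis_expected: string[];
}

interface PreviewPlanSlice {
  interpreted_goal: string;
  evaluation_lanes: PreviewLane[];
  capability_plan: {
    required_capability_refs: string[];
    unavailable_capabilities: string[];
  };
  limitations: string[];
  compiler_confidence: "low" | "medium" | "high";
}

// Projects the plan into display lines. The line format is illustrative only.
function buildPreviewLines(plan: PreviewPlanSlice): string[] {
  const lines: string[] = [`Goal: ${plan.interpreted_goal}`];

  for (const lane of plan.evaluation_lanes) {
    const bases = lane.assurance_basis_expected.join(", ");
    lines.push(`Lane: ${lane.lane_goal} (why: ${lane.why_needed}; basis: ${bases})`);
  }

  // Availability indicator: assumes both lists share the same identifiers.
  const unavailable = plan.capability_plan.unavailable_capabilities;
  for (const ref of plan.capability_plan.required_capability_refs) {
    const status = unavailable.indexOf(ref) !== -1 ? "unavailable" : "available";
    lines.push(`Capability: ${ref} [${status}]`);
  }

  for (const limitation of plan.limitations) {
    lines.push(`Limitation: ${limitation}`);
  }
  lines.push(`Confidence: ${plan.compiler_confidence}`);
  return lines;
}
```

The projection is read-only by design: it never writes back into the plan, which matches the rule that users adjust the outcome text or guidance rather than compiled fields.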
## §4.5 Novelty detection

The Compiler computes a NoveltyAssessment for every plan:

```
NoveltyAssessment {
  input_signature_hash: string
  closest_pattern_id?: string
  closest_pattern_distance: number   // 0.0 = identical, 1.0 = no overlap
  similarity_score: number           // 1 - closest_pattern_distance
  novelty_score: number              // closest_pattern_distance
  forces_fresh_reasoning: boolean    // novelty_score >= 0.7 (configurable)
  triggers_task_agent: boolean
  schema_version: 1
}
```

**§4.5.1 Threshold rule.** When `novelty_score >= 0.7` (default; configurable in RevisorConfig):

- `forces_fresh_reasoning = true`
- The Compiler MUST NOT auto-apply a learned pattern as the basis for the plan
- The Compiler MAY consult Task Agent for graph-structure questions (per §6.9)

**§4.5.2 Below threshold.** When `novelty_score < 0.7`, the Compiler may incorporate the closest pattern's parameters as a starting point. The pattern's `applicability_scope` (§13) must still match; novelty alone does not authorize use of a scope-incompatible pattern.

**§4.5.3 Symmetry with Revision Compiler.** The Revision Compiler computes its own NoveltyAssessment at strategy compilation time (§6.4); the same threshold rule applies.

## §4.6 CompiledEvaluationPlan status

A Compiler invocation produces a plan with one of these statuses:

```
CompiledEvaluationPlanStatus =
  | "compiled"                     // ready for execution
  | "compiled_with_limitations"    // ready but with declared limitations
  | "needs_clarification"          // Compiler needs user input on ambiguity
  | "needs_missing_source"         // required evidence/source unavailable
  | "needs_missing_capability"     // required tool/module not in graph
  | "abstained_low_confidence"     // Compiler refuses to commit to weak plan
  | "blocked_by_policy"            // policy gate prevents compilation
```

**§4.6.1 Abstention is a first-class outcome.** The Compiler MUST surface `abstained_low_confidence` rather than producing a plan it cannot defend.
This is a hygiene mechanism: a weak plan compiled to satisfy the schema produces worse downstream behavior than no plan. **§4.6.2 Status surfacing.** Statuses other than `compiled` surface in the UI (§21) before the plan reaches runtime. The user is given specific remediation paths per status: - `needs_clarification` → re-author outcome with specific questions resolved - `needs_missing_source` → add the source to the source workspace or change the outcome - `needs_missing_capability` → add the capability to the graph (Task Agent assistance available) or change the outcome - `abstained_low_confidence` → re-author the outcome with more guidance - `blocked_by_policy` → resolve the policy decision (typically requires governance escalation) ## §4.7 Outcome Compiler Quality Program The Compiler is measured. Metrics (per §15.3) with denominators: - Intent classification accuracy: was the Compiler-inferred outcome_kind correct? (denominator: outcomes evaluated) - Parameter inference correctness: were thresholds, scoped terms, capability bindings correct? (denominator: parameters inferred) - Plan compilation latency: time from outcome submission to plan emission - Plan compilation cost: USD + local_compute_seconds per plan - Confidence calibration: does `compiler_confidence_score` correlate with actual evaluation success? (denominator: plans run) - Threshold extraction accuracy: when the outcome states a threshold, did the Compiler extract it correctly? (denominator: outcomes with thresholds) - Coverage of stated outcome: did the plan check what the user asked about? (denominator: outcomes evaluated, judged by user acceptance or post-hoc review) - Status enum distribution: how often does the Compiler abstain or need clarification? (denominator: compilations attempted) - Preview-vs-resolved drift: how often does the runtime resolved plan differ from the preview? 
(denominator: outcomes with both phases captured) Metrics are sliced by `pattern_id`, `outcome_kind`, `domain_tags`, and `goal_kind` per §13.3. ## §4.8 Compiler-level sub-agent invocation The Outcome Compiler MAY invoke advisory sub-agents per §3.4.5 and §8. Sub-agents that the Outcome Compiler may consult: - **Intent classifier** (`OutcomeCompilationAssessment` output): clarifies ambiguous outcome statements - **Threshold extractor** (`ThresholdExtractionAssessment` output): extracts quantitative or qualitative thresholds from natural language - **Source binder** (`SourceBindingAssessment` output): identifies and binds evidence sources - **Method binder** (`EvaluationMethodBindingAssessment` output): proposes evaluation methods given the outcome and available capabilities Each sub-agent invocation: - Operates on a scoped context pack (§8.3) - Has a per-invocation cost budget - Produces output validated against the AdvisorySubAgentOutput union (§8.4) - Contributes to reputation scoring (§15.8) The Compiler decides whether to accept each sub-agent's advice. Accepted advice is incorporated into the plan; rejected advice is logged in the plan's explanation trace. ## §4.9 Implementation notes - **§4.9.1** The Compiler is implemented as a runtime service, not a graph module. It has no module record. It is invoked when an outcome is authored (preview) or when an Evaluator activates (resolved). - **§4.9.2** The Compiler reads DOC72 patterns through a query API; it does not mutate DOC72 state. - **§4.9.3** The Compiler MUST NOT consult DOC24 for hot-path routing decisions. DOC24 provides capability availability and learned-guidance bundles; routing remains deterministic. - **§4.9.4** The Compiler emits a `CompilerEvaluationRecord` referenced from CompiledEvaluationPlan for quality-program measurement. The record captures input context hash, sub-agents consulted, decisions made, and ultimate output. --- # §5. 
OUTCOME AND EVALUATION SCHEMAS This section defines the canonical schemas for outcomes, evaluation plans, and evaluation results. All schemas use TypeScript-like notation. ## §5.1 EvaluationOutcomeDefinition and the six-type split V2's monolithic OutcomeSpec is replaced by six focused types. The split avoids god-object bloat and makes per-concern evolution possible. ### §5.1.1 EvaluationOutcomeDefinition User-facing intent and pass semantics: ```ts EvaluationOutcomeDefinition { outcome_id: string task_id: string outcome_text: string // user-stated outcome (verdict spec) guidance_text?: string // user-stated guidance (advisory) outcome_kind: string // Compiler-inferred classification pass_semantics: PassSemantics is_high_stakes: boolean priority: number // lower = higher priority (per P24) goal_refs: GoalRef[] // DOC72 goal references // V3.2: Canonical criterion sub-structure (public sub-contract per // Addenda A ↔ Addenda B coordination V3 §2.4). Judge consumes this directly // via outcome_compliance_scoring. The Evaluator scopes findings to specific // criteria via EvaluationFinding.target_criterion_id (§5.7). criteria: Criterion[] // V3.2: Default weight policy when criteria don't specify weights explicitly default_weight_policy: "uniform" | "from_criterion_weight" | "from_priority" schema_version: 2 // bumped for V3.2 criteria field } PassSemantics = | "all_findings_resolved" | "no_blocking_findings" | "threshold_above" | "threshold_below" | "checklist_complete" | "comparative_judge_pass" | "custom" ``` ### §5.1.1.1 Criterion (public sub-contract per coordination V3 §2.4) ```ts // Public sub-contract. Versioned independently of EvaluationOutcomeDefinition. // Bumping this schema requires Addenda A + Addenda B coordination. 
Criterion { criterion_id: string // stable within outcome criterion_text: string // natural language description criterion_semantics_hash: string // hash stable across runs even when // criterion_text rephrases; enables // cross-run learning correlation // Aggregation metadata required: boolean // must-have vs nice-to-have weight: number | null // 0.0-1.0; null → uniform within outcome // when default_weight_policy = "uniform" priority?: "must_have" | "should_have" | "nice_to_have" // Scoring metadata (Judge's outcome_compliance_scoring consumes these) rubric_hint?: string // optional pre-authored scoring guidance scoring_basis: | "deterministic_count" // count-based (cite N items) | "source_verified" // verify against external source | "rubric_anchored_judgment" // qualitative with anchors | "unanchored_llm_judgment" // qualitative without anchors // NOT aggregation-eligible by default // (per coordination V3 §2.4) // Evidence requirements (consumed by both Judge and Evaluator) required_claim_types?: ClaimType[] // claim types needed (from Addenda A // step.claim_extractor's 22-type registry; // see §5.17 claims_in port) evidence_requirements?: string[] source_policy_refs?: StorageRef[] schema_version: 1 } ``` **Criterion ownership and evolution rules:** - `Criterion` is owned jointly by Addenda A and Addenda B per coordination V3 §2.4. Neither addendum may modify the schema unilaterally; changes require coordination. - `criterion_semantics_hash` is stable across runs even when `criterion_text` rephrases. Implementations compute it from a normalized representation (lowercased, stripped of formatting, content-words only). This stability enables Signal 3 (RepairCycleSignal) per-criterion deltas to correlate across runs. - `scoring_basis` governs aggregation eligibility. `unanchored_llm_judgment` is NOT aggregated into `QualityIndex` by default; Judge's `OutcomeComplianceScoringConfig` can override (with audit flag — see DOC23 Evaluation Common Contracts). 
- `required_claim_types` references the 22-type registry from Addenda A's `step.claim_extractor` (§5.17 + coordination V3 §2.12). ### §5.1.1.2 Finding-to-Criterion linkage `EvaluationFinding.target_criterion_id` (V3.1 §5.7) references `Criterion.criterion_id`. The Evaluator scopes findings to specific criteria; downstream RevisionPlanStep.triggering_finding_refs and Signal 3 attribution use this linkage to populate `targeted_criterion_ids` automatically (coordination V3 §2.6, §2.11). When a finding addresses the outcome as a whole rather than a specific criterion, `target_criterion_id` is null and the Revisor attribution basis is recorded as `global_plan_context` (per §4.7 attribution_basis enum in coordination V3). ### §5.1.2 EvaluationBinding Method, tool, agent, rubric, and source binding for the outcome: ```ts EvaluationBinding { outcome_id: string evaluation_method: EvaluationMethod method_params: EvaluationMethodParams // discriminated by method (§5.3) assurance_basis: AssuranceBasis // per §5.4 capability_binding_refs?: CapabilityRef[] source_binding_refs?: SourceBindingRef[] schema_version: 1 } EvaluationMethod = | "auto_check" | "check_claims" | "specialist_agent" | "agent_assessment" | "judge_comparison" | "multi_method" ``` **§5.1.2.1 EvaluationBinding wrapper invariant.** The EvaluationBinding bundles method, params, assurance basis, and capability/source binding. AssuranceBasis does NOT replace EvaluationMethod; both are required (per [P-2]). 
- `AssuranceBasis` answers: *why is the verdict trustworthy?*
- `EvaluationMethod` answers: *what mechanism runs?*
- `CapabilityBinding` answers: *which concrete tool / module / agent / source implements it?*

### §5.1.3 OutcomeDependencySpec

Dependencies among outcomes and on artifacts:

```ts
OutcomeDependencySpec {
  outcome_id: string
  declared_dependencies: ArtifactRef[]      // artifacts that, if changed, invalidate this outcome
  prerequisite_outcomes: outcome_id[]       // must pass before this is evaluated
  invalidated_by_outcomes: outcome_id[]     // if these change state, this becomes dirty
  required_for_overall_pass: boolean
  schema_version: 1
}
```

### §5.1.4 RecoveryPolicy

What happens when the outcome fails:

```ts
RecoveryPolicy {
  outcome_id: string
  on_failure:
    | "route_to_revisor"
    | "route_to_human"
    | "abort_task"
    | "continue_with_warning"
  retry_budget: number
  escalation_after_retries:
    | "task_agent"
    | "human"
    | "graph_design_gap"
  schema_version: 1
}
```

### §5.1.5 OutcomeRuntimeState

Current evaluation and lifecycle state (per [P-22] split):

```ts
OutcomeRuntimeState {
  outcome_id: string
  evaluation_state: OutcomeEvaluationState   // per §0.4.1
  lifecycle_state: OutcomeLifecycleState     // per §0.4.1
  latest_verdict_ref?: OutcomeEvaluationResultRef
  active_findings: finding_id[]
  retry_count: number
  progress_signal_bundle_ref?: StorageRef
  schema_version: 1
}
```

**§5.1.5.1** `evaluation_state` is what the evaluation says about this outcome right now. `lifecycle_state` is whether this outcome is even part of the active task. These are independent: an outcome can be `active` (lifecycle) and `regressed` (evaluation), or `deprecated` (lifecycle) and `superseded` (evaluation).
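The `prerequisite_outcomes` field (§5.1.3) says that the listed outcomes must pass before the declaring outcome is evaluated. One way to derive an evaluation order from that constraint is a depth-first topological sort, sketched below. This is illustrative only: `DependencySlice` and `evaluationOrder` are hypothetical names, and throwing on a cycle or on an unknown outcome id is an assumption, since this addendum does not define either case.

```typescript
// Hypothetical slice of OutcomeDependencySpec (§5.1.3).
interface DependencySlice {
  outcome_id: string;
  prerequisite_outcomes: string[];
}

// Returns outcome ids ordered so that every prerequisite precedes its dependents.
// Throws on a prerequisite cycle or an id with no spec (assumed behavior).
function evaluationOrder(specs: DependencySlice[]): string[] {
  const byId = new Map<string, DependencySlice>();
  for (const spec of specs) byId.set(spec.outcome_id, spec);

  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (id: string): void => {
    const mark = state.get(id);
    if (mark === "done") return;
    if (mark === "visiting") throw new Error(`prerequisite cycle at ${id}`);

    const spec = byId.get(id);
    if (!spec) throw new Error(`unknown outcome ${id}`);

    state.set(id, "visiting");
    for (const pre of spec.prerequisite_outcomes) visit(pre);
    state.set(id, "done");
    order.push(id); // appended only after all prerequisites are placed
  };

  for (const spec of specs) visit(spec.outcome_id);
  return order;
}
```

For example, if `o5` lists `o4` as a prerequisite, the returned order places `o4` before `o5` regardless of the input order.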
### §5.1.6 OutcomePatternRef Reference to a learned pattern applied to this outcome: ```ts OutcomePatternRef { outcome_id: string applied_pattern_id?: string pattern_match_confidence: number pattern_adaptation_record?: AdaptationRecord schema_version: 1 } ``` **§5.1.7 Invariant.** The six types together replace V2's OutcomeSpec. Implementations MUST NOT reintroduce a single monolithic OutcomeSpec. Cross-references between the six types use `outcome_id`. ## §5.2 CompiledEvaluationPlan ```ts CompiledEvaluationPlan { plan_id: string task_id: string run_id?: string // null for preliminary_preview evaluator_module_id: string plan_phase: "preliminary_preview" | "runtime_resolved" status: CompiledEvaluationPlanStatus // per §4.6 outcome_text: string additional_guidance?: string interpreted_goal: string // Compiler's restatement evaluation_lanes: EvaluationLanePlan[] // per §5.3 assurance_protocol: { required_assurance_bases: AssuranceBasis[] sufficient_to_pass_if: string cannot_pass_if: string[] must_report_limitations: boolean } evidence_and_context_plan: { target_artifact_refs: StorageRef[] source_workspace_refs: StorageRef[] supporting_material_refs: StorageRef[] trace_refs: StorageRef[] doc24_packet_ref?: StorageRef // null for preliminary_preview unavailable_materials: string[] } capability_plan: { required_capability_refs: CapabilityRef[] optional_capability_refs: CapabilityRef[] unavailable_capabilities: string[] } specialist_plan: { use_specialist_subevaluators: boolean specialist_lanes: SpecialistEvaluatorLane[] adjudicator_required: boolean } hard_call_detection_plan: { enabled: boolean expected_hard_call_types: HardRevisionCallKind[] } output_plan: { emit_revision_request: boolean emit_information_request: boolean emit_verification_request: boolean emit_human_judgment_request: boolean emit_process_gap_signal: boolean emit_forum_post: boolean } limitations: string[] compiler_confidence: "low" | "medium" | "high" compiler_confidence_score: number // 0.0 - 1.0 
threshold_extraction_record: ThresholdExtractionRecord compiler_evaluation_record_ref?: StorageRef novelty_assessment: NoveltyAssessment // per §4.5 preview_resolved_drift_record?: PreviewResolvedDriftRecord // populated on runtime_resolved created_at: ISO8601 schema_version: 1 } ``` **§5.2.1 Phase invariant.** A plan with `plan_phase = "preliminary_preview"` has `run_id = null` and `evidence_and_context_plan.doc24_packet_ref = null`. A plan with `plan_phase = "runtime_resolved"` has both populated. **§5.2.2 Status invariant.** A plan with `status` other than `compiled` or `compiled_with_limitations` MUST NOT proceed to evaluation. The Evaluator MUST refuse to run against a plan in `needs_clarification`, `needs_missing_source`, `needs_missing_capability`, `abstained_low_confidence`, or `blocked_by_policy` status. ## §5.3 EvaluationLanePlan ```ts EvaluationLanePlan { lane_id: string lane_kind: EvaluationLaneKind // per §4.3.1 lane_goal: string // what this lane is checking why_needed: string // Compiler's rationale target_scope_description: string supporting_materials_needed: string[] assurance_basis_expected: AssuranceBasis[] likely_failure_modes: string[] allowed_operations: EvaluationOperationKind[] specialist_agent_ref?: string // if a specialist subevaluator handles this lane tool_refs: CapabilityRef[] can_evaluate_now: boolean missing_materials_or_tools: string[] schema_version: 1 } EvaluationOperationKind = | "read_artifact" | "compare_to_baseline" | "lookup_source" | "run_checker" | "invoke_specialist_subevaluator" | "assemble_findings" | "run_judge_module" ``` ## §5.4 AssuranceBasis and EvaluationLimitationKind V3 separates trust bases (why a verdict is trustworthy) from limitations (why no trustworthy verdict exists). This is the most consequential schema correction in V3. 
### §5.4.1 AssuranceBasis

```ts
AssuranceBasis =
  | "deterministic_check"          // formal math, counting, equality
  | "structured_validation"        // rule engine, formal logic, schema validation
  | "source_verified_external"     // external authority (cases, statutes, filings)
  | "claim_grounded_internal"      // internal corpus (firm templates, prior work)
  | "trace_verified"               // process/tool trace verification
  | "coverage_mapping"             // checklist completeness
  | "comparative_judge"            // rubric-based comparison (Judge module)
  | "historical_baseline"          // comparison vs canonical historical example
  | "statistical_threshold"        // classifier output above threshold
  | "llm_expert_judgment"          // single LLM as domain expert
  | "specialist_panel_judgment"    // multiple specialists, aggregated
  | "policy_backed"                // policy/privilege/classification decision IS the evaluation
  | "human_confirmed_in_run"       // user explicitly verified during run
  | "mixed"                        // multiple sub-bases aggregated
```

**§5.4.1.1 `policy_backed` usage.** `policy_backed` is used ONLY when the policy engine's decision IS the substantive evaluation (e.g., "is this work product privileged?"). When policy is a precondition or guardrail to other evaluation, use PolicyEvaluationRef (§5.4.3) and a different AssuranceBasis for the substantive check.
**§5.4.1.2 `mixed` usage.** When a verdict draws on multiple bases, use `mixed` and record the SubBasisRecord array:

```ts
SubBasisRecord {
  sub_basis: AssuranceBasis   // not "mixed" or "insufficient_evidence"
  weight: number
  rationale: string
}
```

### §5.4.2 EvaluationLimitationKind

```ts
EvaluationLimitationKind =
  | "insufficient_evidence"
  | "human_judgment_needed"
  | "missing_capability"
  | "source_unavailable"
  | "policy_blocked"
  | "stale_evidence"
  | "unable_to_ground_claim"
```

**§5.4.2.1 Limitation surfacing.** When an outcome evaluation produces a limitation rather than a verdict, the OutcomeEvaluationResult (§5.6) records the limitation in its `limitations` array and sets `overall_state` accordingly (typically `unable_to_evaluate`, `needs_information`, `needs_verification`, or `needs_human_judgment`).

**§5.4.2.2 Hard Call trigger.** Hard Call detection (§6.5) triggers on `EvaluationLimitationKind.human_judgment_needed`. It does NOT trigger on an AssuranceBasis value. The split between trust and limitation is what makes Hard Call detection unambiguous.

### §5.4.3 PolicyEvaluationRef

When policy is a precondition (most cases), use this reference instead of `policy_backed` AssuranceBasis:

```ts
PolicyEvaluationRef {
  policy_decision_ref: PolicyDecisionRef
  policy_owner: "EC" | "PropA"
  decision: "allow" | "block" | "allow_with_human_gate"
  applies_to:
    | "compilation"
    | "dispatch"
    | "mutation"
    | "export"
    | "memory_write"
}
```

Human-gate triggering uses `PolicyDecision.decision == "allow_with_human_gate"`, not an AssuranceBasis value.

## §5.5 Evaluation Sufficiency Protocol

The Compiler must choose an evaluation procedure that is sufficient to support a reliable decision given available materials, graph context, tools, and learned patterns. If no procedure is sufficient, it MUST emit `needs_verification`, `needs_information`, `needs_human_judgment`, or `unable_to_evaluate` rather than producing a weak verdict.
### §5.5.1 Sufficiency checks

```ts
SufficiencyCheck {
  check_id: string
  outcome_id: string
  check_kind:
    | "evidence_available"
    | "tool_capability_present"
    | "source_authority_acceptable"
    | "assurance_basis_supported"
    | "threshold_extractable"
    | "specialist_available"
  required: boolean
  passed: boolean
  rationale: string
}

SufficiencyAssessment {
  outcome_id: string
  checks: SufficiencyCheck[]
  overall_sufficient: boolean
  insufficient_reason?: EvaluationLimitationKind
}
```

### §5.5.2 Compile-time and runtime sufficiency

The Compiler emits a SufficiencyAssessment as part of the CompiledEvaluationPlan. The Evaluator re-checks at runtime, since available materials and capabilities may have changed since plan compilation. A plan that compiled with `overall_sufficient = true` MAY fail sufficiency at runtime; in that case the Evaluator produces a limitation rather than a verdict.

### §5.5.3 Sufficiency as gate

A failed required SufficiencyCheck blocks evaluation of the outcome. The Evaluator emits an OutcomeEvaluationResult with `overall_state` set to the corresponding limitation state (per §5.4.2.1) and with no verdict. Implementations MUST NOT produce a verdict and then attach a limitation; verdict and limitation are mutually exclusive.

### §5.5.4 Progress signal classification (per [C15])

When an evaluation fails after a prior revision attempt, the Evaluator classifies the failure relative to the prior failure.
This classification drives Revisor strategy selection in §6:

```ts
ProgressSignal =
  | "first_failure"                // no prior attempt
  | "still_failing_same_reason"    // same FailureKind + same primary finding
  | "failing_for_new_reason"       // different FailureKind or different primary finding
  | "regressed"                    // a previously-satisfied outcome now fails
  | "partially_resolved"           // some prior findings resolved; new ones surfaced
```

The classification is computed at evaluation time by comparing the current `OutcomeEvaluationResult.findings` against the prior result's findings, keyed by `(failure_kind, target_artifact_section_ref, finding_summary_hash)`:

```ts
ProgressSignalRecord {
  current_result_id: string
  prior_result_id?: string               // null on first_failure
  signal: ProgressSignal
  matched_findings: FindingMatchPair[]   // same finding across attempts
  resolved_findings: FindingRef[]        // present prior, absent now
  new_findings: FindingRef[]             // absent prior, present now
  classification_rationale: string       // free text for UI/audit
  schema_version: 1
}

FindingMatchPair {
  prior_finding_id: string
  current_finding_id: string
  match_basis:
    | "exact_summary_match"
    | "same_failure_kind_same_section"
    | "semantic_equivalence_judge"
  match_confidence: number               // 0.0-1.0
}
```

**Revisor strategy selection rules:**

- `still_failing_same_reason` → Revisor MUST consider whether the prior `RepairStrategyKind` was wrong-strategy (not wrong-execution). Continuing with the same strategy after `still_failing_same_reason` requires either: (a) a different `RepairTarget`, (b) a different module producing the artifact, or (c) explicit user override per the §6.6 default human gate.
- `failing_for_new_reason` → Revisor MAY continue with the same family of strategies; this is "fix one thing, then fix the next."
- `regressed` → Revisor MUST escalate (per §6.5 hard-call detection on regression); regression on a previously-satisfied outcome is a strong signal that the prior repair introduced a new defect.
- `partially_resolved` → Revisor proceeds normally on the new findings; prior strategy was at least partially correct. The `D16` repeated-insufficiency loop breaker (§6.11) counts consecutive `still_failing_same_reason` signals; after `N` consecutive (default 3), it forces escalation to the Task Agent or to a human. The ProgressSignalRecord is persisted on every OutcomeEvaluationResult after the first, and is included in the RevisionIntelligencePacket (§7.4). ## §5.6 OutcomeEvaluationResult ```ts OutcomeEvaluationResult { result_id: string task_id: string run_id: string evaluator_module_id: string evaluator_activation_seq: number outcome_text: string target_artifact_ref: StorageRef target_version_ref: StorageRef evaluation_snapshot_ref: StorageRef // per §11.20 resolved_plan_ref: StorageRef // CompiledEvaluationPlan, runtime_resolved overall_state: OutcomeEvaluationState // per §0.4.1 summary: string lane_results: EvaluationLaneResult[] findings: EvaluationFinding[] // per §5.7 judgment_limitations: JudgmentLimitationRecord[] // per §5.9 revision_request?: RevisionRequest // per §7.1 information_requests: InformationRequest[] verification_requests: VerificationRequest[] human_judgment_requests: HumanJudgmentRequest[] verification_records: VerificationRecord[] // per §5.8 assurance_summary: { satisfied_items: AssuranceBasis[] unresolved_items: AssuranceBasis[] } limitations: EvaluationLimitationKind[] // when overall_state is a limitation confidence: { level: "low" | "medium" | "high" explanation: string } source_workspace_snapshot_ref?: StorageRef supporting_material_snapshot_refs: StorageRef[] created_at: ISO8601 schema_version: 1 } ``` ### §5.6.1 State machine ``` pending → evaluating → {satisfied | needs_revision | needs_information | needs_verification | needs_human_judgment | unable_to_evaluate | blocked_by_policy} → max_iterations_reached (terminal aggregate) ``` Terminal states (no further evaluation): `satisfied`, `unable_to_evaluate`, `blocked_by_policy`, 
`max_iterations_reached`, `unrecoverable`. Non-terminal states can transition through Loop Controller and Revisor cycles. ## §5.7 EvaluationFinding lifecycle ```ts FindingState = // per §0.4.4 | "proposed" | "active" | "contested" | "resolved" | "superseded_by_revision" | "superseded_by_source_change" | "user_approved" | "tool_verified" | "human_verified" | "rejected_by_user" | "dismissed" | "unrecoverable" EvaluationFinding { finding_id: string result_id: string finding_text: string severity: "low" | "medium" | "high" | "blocking" state: FindingState basis: AssuranceBasis target_artifact_ref: StorageRef target_version_ref: StorageRef supporting_material_snapshot_refs: StorageRef[] evidence_refs: StorageRef[] verification_record_refs: StorageRef[] confidence: "low" | "medium" | "high" confidence_explanation: string taint_class: TaintClass // per §15.10 created_at: ISO8601 schema_version: 1 } ``` ### §5.7.1 State transitions Findings progress through states as the run proceeds: - `proposed` → `active`: Compiler/Evaluator confirms the finding stands - `active` → `contested`: User or downstream evaluator disagrees - `active` → `resolved`: A revision addressed the finding - `active` → `superseded_by_revision`: The artifact section the finding targeted no longer exists - `active` → `superseded_by_source_change`: The source that grounded the finding changed - `active` → `tool_verified`: A subsequent tool call confirmed the finding - `active` → `human_verified`: Explicit human review confirmed the finding - `active` → `user_approved`: User accepted the finding as correct - `active` → `rejected_by_user`: User rejected the finding - `active` → `dismissed`: System dismissed the finding (e.g., as duplicate) - `active` → `unrecoverable`: After N revision attempts, the finding cannot be addressed ### §5.7.2 State invariants - A finding with `state == "resolved"` has at least one corresponding RevisionExecutionReceipt where the finding appeared in `addressed_findings`. 
- A finding with `state == "user_approved"` or `state == "human_verified"` has a corresponding TaintClearanceRecord (§15.12) if its `taint_class` was originally untrusted.
- A finding with `state == "unrecoverable"` triggers escalation per §6.8.

## §5.8 VerificationRecord

For most evaluations, the Evaluator produces Verification Records (receipts) rather than Source Cards (durable research artifacts).

```ts
VerificationRecord {
  record_id: string
  task_id: string
  run_id: string
  evaluator_module_id: string
  evaluator_activation_seq: number
  verification_kind:
    | "source_lookup"
    | "reference_support"
    | "quote_exact_match"
    | "format_validation"
    | "schema_validation"
    | "calculation_check"
    | "tool_trace_check"
    | "human_review"
    | "other"
  target_item_description: string
  target_ref: StorageRef
  result: "passed" | "failed" | "inconclusive" | "not_checked"
  source_refs: StorageRef[]
  tool_call_refs: StorageRef[]
  explanation: string
  created_at: ISO8601
  schema_version: 1
}
```

**§5.8.1 Verification vs Source Card.** A Source Card is a durable research artifact owned by the Source Workspace pipeline. A Verification Record is a receipt: "I checked X using Y, result Z, here's the evidence." For simple tasks, a Verification Record may be the only durable evaluation artifact.
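To make the receipt idea in §5.8.1 concrete, the sketch below produces a reduced receipt for a `quote_exact_match` check. It is a minimal sketch, not the Evaluator's actual procedure: `QuoteCheckReceipt` and `checkExactQuote` are hypothetical names, identifiers, `StorageRef` fields, and timestamps are omitted, and returning `not_checked` when the source text is missing is an assumption of this example.

```typescript
// Hypothetical reduced form of VerificationRecord (§5.8). Ids, StorageRef
// fields, and created_at are omitted for brevity.
interface QuoteCheckReceipt {
  verification_kind: "quote_exact_match";
  target_item_description: string;
  result: "passed" | "failed" | "inconclusive" | "not_checked";
  explanation: string;
}

// Checks whether `quote` appears verbatim in `sourceText`.
// A missing source yields "not_checked" (assumption of this sketch).
function checkExactQuote(quote: string, sourceText: string | null): QuoteCheckReceipt {
  const target_item_description = `quote: "${quote}"`;

  if (sourceText === null) {
    return {
      verification_kind: "quote_exact_match",
      target_item_description,
      result: "not_checked",
      explanation: "Source text was unavailable.",
    };
  }

  const found = sourceText.indexOf(quote) !== -1;
  return {
    verification_kind: "quote_exact_match",
    target_item_description,
    result: found ? "passed" : "failed",
    explanation: found
      ? "Quote found verbatim in the source text."
      : "Quote not found verbatim in the source text.",
  };
}
```

The receipt carries what was checked, the result, and a short explanation, mirroring the "I checked X using Y, result Z" shape; a full record would also populate `source_refs` and `tool_call_refs`.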
## §5.9 JudgmentLimitationRecord

When the Evaluator encounters material non-mechanical choices in the artifact, it produces JudgmentLimitationRecords:

```ts
JudgmentLimitationRecord {
  record_id: string
  task_id: string
  run_id: string
  evaluator_module_id: string
  evaluator_activation_seq: number
  target_artifact_ref: StorageRef
  target_version_ref: StorageRef
  judgment_kind:
    | "strategic_choice"
    | "issue_weighting"
    | "argument_ordering"
    | "risk_tradeoff"
    | "source_weighting"
    | "style_choice"
    | "scope_choice"
    | "professional_judgment"
    | "business_judgment"
    | "technical_architecture_judgment"
    | "other"
  location: {
    module_id?: string
    activation_seq?: number
    artifact_section_ref?: StorageRef
    text_anchor?: string
  }
  description: string
  why_it_matters: string
  severity: "informational" | "flag" | "block_advisory" | "block_required"
  blocking: boolean   // computed from severity
  system_can_certify: boolean
  recommended_handling:
    | "informational"
    | "flag_for_user"
    | "needs_human_judgment"
    | "block_until_human_approval"
    | "route_to_revisor"
    | "route_to_information_gathering"
  evidence_refs: StorageRef[]
  inferred_or_declared: "module_declared" | "evaluator_inferred" | "user_declared"
  confidence: "low" | "medium" | "high"
  schema_version: 1
}
```

**§5.9.1 Judgment vs finding.** A finding says "the artifact has a problem." A judgment limitation says "the artifact made a strategic call that the system cannot certify." Findings drive revision; judgment limitations drive Hard Calls (§6.5).

**§5.9.2 Severity to handling.** `severity == "block_required"` always means `blocking == true` and triggers `block_until_human_approval`. Lower severities are advisory.

## §5.10 DecisionTraceRecord (optional in V3)

Upstream modules MAY emit DecisionTraceRecords to make evaluator analysis better without exposing private chain-of-thought.

```ts
DecisionTraceRecord {
  decision_id: string
  task_id: string
  run_id: string
  module_id: string
  activation_seq: number
  artifact_ref: StorageRef
  decision_text: string
  decision_kind:
    | "strategy"
    | "assumption"
    | "scope_choice"
    | "source_choice"
    | "format_choice"
    | "risk_tradeoff"
    | "unknown"
  basis_refs: StorageRef[]
  materiality: "blocking" | "important" | "informational"
  source: "module_declared" | "evaluator_inferred" | "user_declared"
  confidence: "low" | "medium" | "high"
  schema_version: 1
}
```

**§5.10.1 Optional in V3.** Universal upstream DecisionTraceRecord adoption is deferred (§26 Open Questions, [P1]). Modules MAY emit them; the Evaluator MAY consume them. Phase 2 will require them for revision-target modules.

## §5.11 EvaluationTargetClosurePolicy

When the Evaluator runs in targeted mode (e.g., `target_outcomes=[o5]`), it must declare which other outcomes are included via closure:

```ts
EvaluationTargetClosurePolicy {
  requested_outcomes: outcome_id[]
  include_prerequisites: boolean
  include_invalidated_dependents: boolean
  include_required_for_overall_pass: boolean
  include_prior_regressions: boolean
  closure_reason_records: ClosureReasonRecord[]
}

ClosureReasonRecord {
  added_outcome_id: outcome_id
  reason:
    | "prerequisite_of_requested"
    | "invalidated_by_requested"
    | "required_for_overall_pass"
    | "prior_regression_candidate"
  source_outcome_id: outcome_id
}
```

**§5.11.1 Closure invariant.** Without closure, targeted evaluation can produce dangerously partial green checks: outcome o5 passes but o4 (which o5 depends on) silently fails. Implementations MUST emit a closure policy with the targeted evaluation and MUST log every outcome added by closure.

## §5.12 Canonical schemas

The following are the canonical schemas for V3. References in this addendum and in cross-doc obligations use these names exactly.

- `EvaluationOutcomeDefinition` (§5.1.1)
- `EvaluationBinding` (§5.1.2)
- `OutcomeDependencySpec` (§5.1.3)
- `RecoveryPolicy` (§5.1.4)
- `OutcomeRuntimeState` (§5.1.5)
- `OutcomePatternRef` (§5.1.6)
- `CompiledEvaluationPlan` (§5.2)
- `EvaluationLanePlan` (§5.3)
- `SufficiencyCheck` / `SufficiencyAssessment` (§5.5)
- `OutcomeEvaluationResult` (§5.6)
- `EvaluationFinding` (§5.7)
- `VerificationRecord` (§5.8)
- `JudgmentLimitationRecord` (§5.9)
- `DecisionTraceRecord` (§5.10, optional)
- `EvaluationTargetClosurePolicy` (§5.11)
- `EvaluationVerdict` (atomic verdict per lane; see §5.13)
- `FindingsBundle` (collection of EvaluationFinding[] with metadata)
- `ProgressSignalBundle` (Compiler / Evaluator progress signals)
- `IterationHistoryBundle` (prior plan attempts and outcomes)
- `HumanResponse` (user input during run)
- `EscalationDetail` (escalation context)

## §5.13 EvaluationVerdict

An atomic verdict produced per lane. Lane results aggregate into the OutcomeEvaluationResult.

```ts
EvaluationVerdict {
  verdict_id: string
  lane_id: string
  outcome_id: string
  result:
    | "lane_passed"
    | "lane_failed"
    | "lane_indeterminate"
    | "lane_skipped"
  basis: AssuranceBasis
  confidence: "low" | "medium" | "high"
  rationale: string
  evidence_refs: StorageRef[]
  schema_version: 1
}
```

## §5.14 Pending dependency state

Outcomes with declared dependencies on artifacts that don't yet exist enter `pending_dependency` state rather than failing:

```ts
PendingDependencyInfo {
  outcome_id: string
  missing_artifact_refs: ArtifactRef[]
  expected_producer_modules: module_id[]
  wait_timeout: number   // ms
  on_timeout:
    | "fail_outcome"
    | "mark_indeterminate"
    | "escalate_human"
    | "ask_task_agent"
    | "abort_task"
    | "continue_without_outcome"
  dependency_optional: boolean
  required_for_overall_pass: boolean
  schema_version: 1
}
```

**§5.14.1 Cascading failure.** Per §11 Loop Controller integration, when a module terminates with `execution_status` in `{could_not_fix, failed_runtime, rejected_capability}` and
`retry_count >= per_outcome_retry_budget`, the Loop Controller MUST:

1. Identify all outcomes in `OutcomeEvaluationState.pending_dependency` whose `missing_artifact_refs` include artifacts producible by the failed module
2. Transition each such outcome to `OutcomeEvaluationState.upstream_failure`
3. Emit a RevisionOperationReceipt with `operation_kind = "escalation_created"`

This prevents infinite waits on artifacts that will never be produced.

## §5.15 Final aggregation policy

```ts
OverallTaskOutcome =
  | "all_outcomes_passed"
  | "all_required_passed_some_optional_failed"
  | "some_required_failed"
  | "some_required_unrecoverable"
  | "all_unrecoverable"
  | "blocked_by_policy"

AggregationPolicy {
  required_outcomes: outcome_id[]
  required_to_pass: "all" | "majority" | "weighted_threshold"
  weighted_threshold?: number
  unrecoverable_handling: "fail_overall" | "warn_but_pass" | "user_decides"
}
```

Outcomes in `upstream_failure` state count as `unrecoverable` for aggregation purposes.

## §5.16 EvaluationSnapshot (per [E6])

When an evaluation completes — whether the result is a verdict, a limitation, or a Hard Call — the Evaluator emits an `EvaluationSnapshot` alongside the `OutcomeEvaluationResult`. The snapshot is the immutable record of what the evaluation saw. Downstream consumers (the Revision Compiler, the §11.20 live-edit hash check, the §6.7 sufficiency protocol) use the snapshot to detect that the system has moved since evaluation.

```ts
EvaluationSnapshot {
  snapshot_id: string
  task_id: string
  run_id: string
  result_id: string   // OutcomeEvaluationResult that owns this snapshot

  // What artifacts the evaluator looked at
  artifact_refs_at_evaluation: ArtifactRef[]
  artifact_version_refs_at_evaluation: ArtifactVersionRef[]
  artifact_content_hashes: Record   // SHA-256 of normalized content

  // What the Source Workspace looked like
  source_workspace_state_ref: StorageRef   // snapshot of workspace head pointers
  source_workspace_head_hashes: Record

  // What the task graph looked like
  graph_topology_hash: string   // hash of module/port/cable topology
  active_modules_at_evaluation: ModuleRef[]

  // What context was bound
  evaluator_module_id: string
  evaluation_method: EvaluationMethod
  capability_binding_refs: CapabilityRef[]   // per §9.2 capability versioning
  capability_versions_snapshot: Record   // version at snapshot time

  // Authority and policy snapshot (per §3.7, P34)
  cil_authority_snapshot_ref: StorageRef   // CIL snapshot at evaluation time
  policy_decision_refs: PolicyEvaluationRef[]   // policy decisions consulted

  // Timing
  captured_at: ISO8601
  evaluation_duration_ms: number

  schema_version: 1
}
```

**Consumers of EvaluationSnapshot:**

- **Revision Compiler (§6.1):** reads `artifact_content_hashes` and `capability_versions_snapshot` to compose a `RevisionRequest` whose `expected_pre_state` references the evaluation-time state.
- **Live-edit hash check (§11.20):** before applying a direct fix or starting a rolling-hash plan, the dispatcher compares the current artifact content hash against `artifact_content_hashes[artifact_ref]`. A mismatch means the artifact moved since evaluation; the plan is rejected and re-compiled.
- **Sufficiency Protocol re-check (§6.7):** at sufficiency time, the Revisor consults `cil_authority_snapshot_ref` and `capability_versions_snapshot` to verify the conditions under which the prior evaluation completed still hold.
  If authority memory or capability versions have moved, the prior evaluation may be re-keyed.
- **Audit trail:** every `RevisionPlan` references `evaluation_snapshot_ref` so a reviewer can reconstruct what the evaluator saw at the moment of failure.

**Snapshot immutability:** snapshots are write-once. Re-evaluation produces a new snapshot with a new `snapshot_id`. Implementations MUST NOT mutate prior snapshots even when underlying artifacts change.

**Storage:** snapshots are persisted via EC. Retention follows the `EvaluationArtifactGovernancePolicy` per §16. Default retention: 90 days post-task-completion for non-privileged tasks, longer for privileged or matter-scoped tasks per §16.6 matter-specific governance.

## §5.17 Evaluator `claims_in` port contract (V3.2; coordination V3 §2.12)

V3.2 adds an explicit `claims_in` port on `step.evaluator`, symmetric to V3.1's `revision_in` port on `step.revisor` (§9). The Evaluator consumes claims and other extracted evaluation units from Addenda A's `step.claim_extractor` upstream graph module.

### §5.17.1 Wiring

The port consumes via explicit graph wiring; no virtual `data_out` alias reliance (per coordination V3 §2.12 / Addenda A V4 R163 corrected pattern):

```
[source artifact] → step.claim_extractor → claims_out
                                              ↓
                                   step.evaluator.claims_in
                                              ↓
              [internal routing to specialist subevaluators that need claims]
```

The port is a fan-in — multiple upstream `step.claim_extractor` activations can feed one Evaluator activation (e.g., one extractor per source artifact when the Evaluator scores against multi-source briefs).

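As a rough illustration of the fan-in behavior described in §5.17.1, the sketch below collects bundles from several extractor activations into one view keyed by source artifact. `ClaimBundleSketch`, `ClaimsInView`, and `collectClaimsIn` are hypothetical names with simplified fields; the real `ClaimSetBundle` and `ExtractedEvaluationUnitBundle` schemas live in Common Contracts and are not reproduced here.

```ts
// Hypothetical, simplified stand-in for one bundle arriving on claims_in.
interface ClaimBundleSketch {
  source_artifact_ref: string;       // artifact the extractor read
  extractor_activation_seq: number;  // which extractor activation emitted it
  unit_types: string[];              // e.g. "factual_assertion"
}

// Hypothetical merged view for a single Evaluator activation.
interface ClaimsInView {
  bundle_count: number;
  units_by_source: Record<string, string[]>;
}

// Merge every upstream bundle; zero bundles is valid because the port is optional.
function collectClaimsIn(bundles: ClaimBundleSketch[]): ClaimsInView {
  const unitsBySource: Record<string, string[]> = {};
  bundles.forEach(function (bundle) {
    const existing = unitsBySource[bundle.source_artifact_ref] || [];
    unitsBySource[bundle.source_artifact_ref] = existing.concat(bundle.unit_types);
  });
  return { bundle_count: bundles.length, units_by_source: unitsBySource };
}
```

Keying by source artifact is one possible choice; it keeps each unit traceable to the artifact whose taint it would inherit.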
### §5.17.2 Port type

```ts
EvaluatorClaimsInPort {
  port_id: "claims_in"
  accepted_payload_types:
    | "ClaimSetBundle"                  // V3.1 + Addenda A V4 R202 / R208
    | "ExtractedEvaluationUnitBundle"   // coordination V3 §2.12 broadened form
                                        // — 22 unit types per the Claim
                                        // Extractor registry
  cardinality: "many"                   // multiple upstream extractors permitted
  required: false                       // optional — Evaluator can operate
                                        // without claim input for outcome kinds
                                        // that don't need extracted evidence

  // Participates in V3.1 conflict-detection machinery
  read_set_participant: true            // contributes to PlanReadSet (§11.9)

  // Taint inheritance — claims inherit source artifact taint
  taint_propagation: "inherit_from_source"   // per V3.1 §15.11 transitive propagation

  schema_version: 1
}
```

### §5.17.3 Consumer routing

The Evaluator's `claims_in` is a fan-in for specialist subevaluator consumption (§8). The Evaluator itself does not interpret claim content; it routes claims to specialist subevaluators based on the compiled evaluation plan:

- Source verification specialists consume `factual_assertion`, `citation_reference`, `authority_support_link`, `record_citation_reference`, `quotation_or_paraphrase` claims to populate `VerificationRecord` (§5.8)
- Structural specialists consume `section_heading`, `argument_structure_element`, `enumerated_item`, `structural_coverage_marker` units for `work_product_outcome` evaluation
- Length / format specialists consume `page_count_feature`, `tone_or_register_marker` units
- Legal-domain specialists consume `legal_issue`, `legal_element_or_prong`, `requested_relief`, `defined_term_reference` units
- Privilege / confidentiality specialists consume `confidentiality_or_privilege_marker` units
- Quality specialists consume `uncertainty_or_hedging_marker`, `contradiction_marker`, `source_gap_marker` units
- Process specialists consume `procedural_step`, `judgment_basis_statement` units for `process_outcome` and `judgment_outcome` evaluation

The full 22-unit-type registry is
owned by Addenda A's `step.claim_extractor` (coordination V3 §2.12). Specific specialist sub-agent profiles (§8.4) declare which unit types they require.

### §5.17.4 Compile-time gating

The Outcome Compiler (§4) detects whether claims are required for the compiled evaluation plan and emits validation accordingly:

- If `Criterion.required_claim_types` is non-empty for any criterion in scope AND `claims_in` is wired → plan proceeds normally
- If required AND `claims_in` is NOT wired → Compiler emits `needs_missing_source` indeterminate outcome (§5.5) OR proposes a `graph_patch_proposal` step adding `step.claim_extractor` upstream of `step.evaluator`. Whether to gate vs propose-patch follows the user's `RevisorConfig.autonomous_mode_policy` (§6.6).
- If required claim types ARE NOT declared on any criterion → `claims_in` wiring is optional; no validation fires.

### §5.17.5 Hidden-dispatch prohibition

Per coordination V3 §2.12: the Evaluator MUST NOT hidden-dispatch the Claim Extractor as a service. The extractor is graph-native. If the compiled evaluation plan needs claims and the `claims_in` port is not wired, the Evaluator surfaces the gap explicitly via §5.17.4 — it does not silently invoke the extractor. This preserves DOC23 graph primacy.

### §5.17.6 Cost accounting

Claim extractor invocations are upstream of the Evaluator and accounted to the extractor's cost record per V3.1 §12.5. The Evaluator's own cost accounting does not include extractor cost; downstream Quality Program reports surface the combined evaluation-pipeline cost when needed.

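The three compile-time gating cases of §5.17.4 can be sketched as a small decision function. This is an illustrative sketch only: `CriterionSketch`, `ClaimsGateResult`, and `gateClaimsIn` are hypothetical names, and the boolean `proposePatchInsteadOfGate` is a stand-in for whatever `RevisorConfig.autonomous_mode_policy` actually decides, since the spec does not spell out that mapping here.

```ts
// Hypothetical, reduced view of a Criterion (§5.1.1).
interface CriterionSketch {
  criterion_id: string;
  required_claim_types: string[];
}

type ClaimsGateResult =
  | { kind: "proceed" }
  | { kind: "no_validation" }
  | { kind: "needs_missing_source" }
  | { kind: "graph_patch_proposal"; add_upstream_module: string };

function gateClaimsIn(
  criteriaInScope: CriterionSketch[],
  claimsInWired: boolean,
  proposePatchInsteadOfGate: boolean  // assumption: derived from autonomous_mode_policy
): ClaimsGateResult {
  const claimsRequired = criteriaInScope.some(function (c) {
    return c.required_claim_types.length > 0;
  });
  // Case 3: no criterion declares required claim types, so wiring is optional.
  if (!claimsRequired) return { kind: "no_validation" };
  // Case 1: required and wired.
  if (claimsInWired) return { kind: "proceed" };
  // Case 2: required but not wired; gate or propose a graph patch.
  return proposePatchInsteadOfGate
    ? { kind: "graph_patch_proposal", add_upstream_module: "step.claim_extractor" }
    : { kind: "needs_missing_source" };
}
```

Note that neither branch of case 2 invokes the extractor, which is consistent with the hidden-dispatch prohibition in §5.17.5.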
### §5.17.7 Cross-doc references

- Coordination V3 §2.12 — extractor architecture, 22-type registry, PropA coordination
- DOC23 Evaluation Common Contracts — `ExtractedEvaluationUnit` union, `ClaimSetBundle` schema, `ArtifactScopeRef` source-anchor primitives
- OP-A row `OBL-XDOC-EVALUATOR-CLAIMS-IN-01` — this addendum owns the port contract; coordinates with Addenda A's `step.claim_extractor` public contract owned by `OBL-XDOC-CLAIM-EXTRACTOR-PUBLIC-01`

## §5.18 Evaluator `evaluation_result_out` port contract (V3.3; coordination V3 §2.3, §2.9)

V3.3 adds explicit documentation of the Evaluator's canonical envelope output port. This crystallizes Pattern C ad-hoc Judge attachment (per coordination V3 §2.9 and §2.3) at the port level — V3.2 documented Pattern C as a pattern; V3.3 specifies the wiring.

### §5.18.1 Why this port

Pattern C is the killer-feature flow where Judge attaches downstream of any standalone Evaluator output to produce per-criterion numeric scores **without requiring an Experiment**. The user clicks "Attach Judge to this Evaluator output" in DOC20 (per OBL-XDOC-DOC20-EVAL-UI-01); the system creates a Judge module wired to the Evaluator's envelope output.

Before V3.3, Pattern C required a coding agent to interpret which Evaluator output port carried the right payload — `feedback_bundle_out` (Feedback Delivery V1.0.1 §7.2) carries the bundle which references the envelope, but Judge needs the envelope content directly, not a reference. V3.3 adds the canonical envelope output port.

### §5.18.2 Wiring

```
[evaluated artifact]
        ↓
[step.evaluator] ──── evaluation_result_out ────┐  (V3.3 NEW)
        ↓                                       │
        ├── feedback_bundle_out (FD V1 §7.2) ──┐│
        ├── repair_instruction_out (FD V1)     ││
        ├── approved_out / failed_out / ...    ││
        └── process_gap_out (FD V1 §7.4)       ││
                                               │↓
                                               │[step.judge] ─── evaluation_result_out
                                               │ (Pattern C)     (Judge's own envelope
                                               ↓                  with quantitative_slice
                                       [step.revisor]             populated)
                                       (existing revision_in port — §9)
```

### §5.18.3 Port type

```ts
EvaluatorEvaluationResultOutPort {
  port_id: "evaluation_result_out"
  emitted_payload_type: "EvaluationArtifactEnvelope"
                                        // Common Contracts §3 + Addenda A
                                        // existing EvaluationArtifactEnvelope
                                        // wrapper (V4 R199)
  cardinality: "one_per_activation"     // every Evaluator activation emits
                                        // exactly one envelope
  required: true                        // emission is not optional;
                                        // Evaluator emits envelope even
                                        // when verdict is indeterminate or
                                        // result_lifecycle_status is "error_no_result"

  // Pattern relevance
  pattern_a_relevance: "internal_to_experiment"      // Experiment captures internally;
                                                     // no explicit external wiring
  pattern_b_relevance: "internal_to_experiment"
  pattern_c_relevance: "downstream_judge_consumes"   // Judge wired here in Pattern C

  // Other downstream consumers (parallel to Judge in Pattern C wiring)
  other_consumer_kinds: [
    "agent_review_gate",      // gate consumers may read envelope
    "task_agent_diagnosis",   // Task Agent reads for diagnosis
    "doc20_ui",               // UI renders envelope
    "audit_replay",           // replay machinery
    "downstream_evaluator"    // chained evaluators may read upstream
  ]

  // Wiring discipline
  wiring_kind: "graph_native"           // parallel to §5.17.5 hidden-dispatch
                                        // prohibition for Claim Extractor

  schema_version: 1
}
```

### §5.18.4 Emission discipline

The Evaluator emits on `evaluation_result_out` on every activation completion.
The envelope's contents reflect the activation outcome:

- Verdict and lifecycle status populated per V3.2 §5.1 / Common Contracts §3.2 mapping
- Slices populated per §2.3 ownership matrix (Evaluator populates `qualitative_slice`; `quantitative_slice` is null when emitted by Evaluator alone)
- `variant_lineage` populated when in Experiment context (Patterns A or B); null in Pattern C
- `criterion_lineage[]` populated when criteria were evaluated
- `hard_call_surface_ref` populated when Hard Calls raised
- `target_evaluation_chain_id` populated to enable downstream Judge (Pattern C) or other chained consumers to correlate their envelope with this upstream one

### §5.18.5 Pattern C consumer contract (Judge)

In Pattern C, Judge consumes via its own input port (Addenda A's territory). Expected Judge input port surface (specified by Addenda A per OBL-XDOC-JUDGE-EVALUATOR-OUTPUT-IN-01):

```
Judge inputs (Addenda A spec):
  artifact_in             // the artifact being scored
  criteria_in             // EvaluationOutcomeDefinition.criteria[]
  evaluation_result_in    // V3.3 NEW (Addenda A specifies)
                          // consumes EvaluationArtifactEnvelope<
                          //   EvaluationResultEnvelope> from
                          //   Evaluator's evaluation_result_out
```

Judge's `evaluation_result_in` port surface is owned by Addenda A R4.1 V3; the port name is non-normative from Addenda B's side. Addenda A may choose `evaluation_result_in`, `evaluator_output_in`, `upstream_evaluation_in`, or similar; the obligation tracks the contract, not the specific name.

What Judge does with the upstream envelope in Pattern C:

1. Reads `evaluated_target`, `evaluation_basis`, `target_artifact_version_ref` to identify the artifact and criteria scope
2. Reads `qualitative_slice.findings` to understand Evaluator's prescriptive findings (informs which criteria need careful scoring)
3. Scores against `Criterion[]` from EvaluationOutcomeDefinition (criteria pulled separately, not via the envelope; Judge has its own access to the OutcomeSpec)
4.
   Emits its own `EvaluationResultEnvelope` with `producer_kind = "judge"`, populating `quantitative_slice`, leaving `qualitative_slice` empty
5. Sets its envelope's `target_evaluation_chain_id` to match the upstream Evaluator's value, enabling chain reconstruction in audit / UI / learning

### §5.18.6 Hidden-dispatch prohibition (parallel to §5.17.5)

Per coordination V3 §2.12 (symmetric to the Claim Extractor rule): Pattern C MUST use graph wiring. The Evaluator MUST NOT hidden-dispatch a Judge module as a service. If the user wants Judge scoring on an Evaluator output, they wire Judge downstream via the DOC20 "Attach Judge" UI action (per OBL-XDOC-DOC20-EVAL-UI-01) or via direct graph editing. The Evaluator emits its envelope on `evaluation_result_out`; whether and how a Judge consumes is determined by graph wiring at edit time.

This preserves DOC23 graph primacy (V3.1 §0A.1 normative). The Evaluator does not "know" whether a Judge is downstream — it emits its envelope regardless; downstream wiring determines what happens next.

### §5.18.7 Relationship to other output ports

`evaluation_result_out` is the canonical envelope output.
Other Evaluator output ports (per Feedback Delivery V1.0.1 §7) serve distinct purposes:

| Port | Carries | Consumer purpose |
|---|---|---|
| `evaluation_result_out` (V3.3 NEW) | EvaluationArtifactEnvelope | Pattern C Judge; audit; UI; Task Agent diagnosis |
| `feedback_bundle_out` (FD V1 §7.2) | EvaluationFeedbackBundle | Revision module; forum publication |
| `approved_out` / `failed_out` / `needs_*_out` (FD V1 §7.1) | Control flow signal | Graph routing |
| `repair_instruction_out` (FD V1 §7.2) | OutcomeRepairInstruction[] | Revision modules |
| `research_need_out` (FD V1 §7.2) | ResearchNeed[] | Source Research module |
| `format_repair_out` (FD V1 §7.2) | Format-repair instructions | Format Checker |
| `process_gap_out` (FD V1 §7.4) | TaskProcessGapSignal | Task Agent assessment queue |

Every Evaluator activation emits on `evaluation_result_out` (required); other ports fire conditionally per `FeedbackRoutingPolicy` and verdict.

### §5.18.8 Patterns A and B relationship

In Patterns A (per-variant) and B (bundled comparative), each Evaluator activation lives inside an Experiment compute context. The Evaluator activations DO emit on `evaluation_result_out`, but Experiment captures the emission internally as part of its variant comparison logic — there is no explicit external graph wiring between the inside-Experiment Evaluator and the Experiment routing logic. Experiment's internal orchestration of these Evaluator activations is Addenda A's spec (Addenda A R4.1 V3 Experiment module).

In Pattern C, the Evaluator activation is **outside** any Experiment context. The `evaluation_result_out` port is the external graph wiring point. Judge attaches downstream via standard graph editing.

### §5.18.9 Cost accounting

Emission on `evaluation_result_out` is a no-cost operation (envelope construction is part of evaluation cost already accounted to the Evaluator activation per V3.1 §12.5). Downstream consumers (Judge in Pattern C) carry their own activation cost.

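The Pattern C correlation described in §5.18.4 and §5.18.5 (the Judge envelope reusing the upstream `target_evaluation_chain_id`, with each producer filling only its own slice) might look roughly like the sketch below. `EnvelopeSketch`, its `envelope_id` field, `buildJudgeEnvelope`, and `envelopesInChain` are hypothetical and heavily reduced; the real envelope schema is defined in Common Contracts §3. Modeling the Judge's "empty" `qualitative_slice` as `null` is an assumption.

```ts
// Hypothetical, heavily reduced envelope shape for illustration only.
interface EnvelopeSketch {
  envelope_id: string;                       // hypothetical identifier
  producer_kind: "evaluator" | "judge";
  target_evaluation_chain_id: string;
  qualitative_slice: { findings: string[] } | null;
  quantitative_slice: { scores: Record<string, number> } | null;
}

// Build the Judge's own envelope from the upstream Evaluator envelope.
function buildJudgeEnvelope(
  upstream: EnvelopeSketch,
  envelopeId: string,
  scores: Record<string, number>
): EnvelopeSketch {
  return {
    envelope_id: envelopeId,
    producer_kind: "judge",
    // Step 5 of §5.18.5: match the upstream chain id.
    target_evaluation_chain_id: upstream.target_evaluation_chain_id,
    qualitative_slice: null,                 // assumption: "empty" modeled as null
    quantitative_slice: { scores: scores },
  };
}

// Chain reconstruction for audit / UI: all envelopes sharing one chain id.
function envelopesInChain(all: EnvelopeSketch[], chainId: string): EnvelopeSketch[] {
  return all.filter(function (e) {
    return e.target_evaluation_chain_id === chainId;
  });
}
```

Because the chain id is the only link, an audit view under this sketch can group Evaluator and Judge envelopes without knowing how the graph was wired.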
### §5.18.10 Cross-doc references

- Common Contracts §3 — EvaluationResultEnvelope schema
- Common Contracts §3.7 (V1.1) — Pattern C consumption semantics, target_evaluation_chain_id linkage
- Coordination V3 §2.3 — EvaluationArtifactEnvelope wrapping requirement
- Coordination V3 §2.9 — Pattern C ad-hoc Judge attachment
- Coordination V3 §2.12 — graph-native discipline (hidden-dispatch prohibition; parallel to §5.17.5)
- OBL-XDOC-OUTCOME-COMPLIANCE-01 — Judge gains `outcome_compliance_scoring` method
- OBL-XDOC-JUDGE-EVALUATOR-OUTPUT-IN-01 (V3.3 NEW) — Addenda A specifies Judge's input port for Pattern C consumption
- OBL-XDOC-DOC20-EVAL-UI-01 — DOC20 "Attach Judge" UI action

---

# §6. REVISOR

The Revisor is a planner. It produces typed RevisionPlans; modules do the reviser work. This boundary — plan production vs plan execution — is the architecture's strongest design decision.

## §6.1 Revision Compiler

The Revision Compiler is the LLM-based layer between the OutcomeEvaluationResult and the typed RevisionPlan. Symmetric to the Outcome Compiler (§4) but for revision strategy rather than evaluation plan.

**§6.1.1 Inputs.** The Revision Compiler reads:

```
evaluation result (findings, judgment limitations, verdict)
+ graph context (modules, capabilities, topology)
+ artifact lineage and version history
+ source workspace snapshot
+ iteration history (prior plans and outcomes)
+ human feedback (current run)
+ learned RevisionPatterns
+ DOC72 goal context
+ RevisionSafetyEnvelope (taint labels, policy decisions)
+ HardCallResolutionLedger
+ CIL authority snapshot
```

**§6.1.2 Outputs.** The Revision Compiler emits:

```
CompiledRevisionStrategy (internal artifact, §7.8)
  → RevisionPlan (the typed dispatchable plan, §7.2)
```

**§6.1.3 Internal stages.** Within a single Compiler invocation:

1. **Normalize failure** — read findings, classify into FailureKind values (§6.2)
2. **Diagnose cause** — produce RevisionDiagnosis records (§7.4)
3.
   **Identify repair target** — RepairTarget per failure
4. **Choose strategy** — RepairStrategyKind per failure
5. **Order repairs** — apply repair ordering rules (§6.11)
6. **Produce plan** — emit RevisionPlan
7. **Validate** — internal Compiler validation before plan emission

**§6.1.4 Bounded cost.** One LLM call per Revisor activation. Sub-agent invocations have separate budgets per AdvisorySubAgentProfile (§8). Per RevisorConfig:

- `max_logical_llm_calls_per_revision` — successful inferences
- `max_infrastructure_retries_per_logical_call` — JSON/schema retries
- `max_estimated_tokens_per_revision` — token budget
- `max_local_compute_seconds_per_revision` — wall-clock budget

Infrastructure retries do not consume the logical budget (per [P-32]).

**§6.1.5 Sub-agent invocation.** The Revision Compiler MAY invoke advisory sub-agents (§3.4, §8) for:

- **Source repair planning** (`RevisionDiagnosis` output): does the failure need more or different evidence?
- **Structure repair planning** (`RevisionDiagnosis` output): outline or allocation issues?
- **Style repair planning** (`PlanAssuranceCritique` output): voice, clarity, tone?
- **Format repair planning** (`PlanAssuranceCritique` output): output structure issues?
- **Process repair planning** (`RevisionDiagnosis` output): skipped tools or missing context?
- **Risk / Hard-Call planning** (`RevisionDiagnosis` output): strategic judgment needed?

Sub-agent output conforms to the AdvisorySubAgentOutput union (§8.4). The Compiler decides to accept, reject, or defer each sub-agent's advice; decisions are recorded in CompiledRevisionStrategy (§7.8).

**§6.1.6 Novelty detection.** The Revision Compiler computes a NoveltyAssessment per §4.5 against the RevisionPattern corpus. Above the novelty threshold, fresh reasoning is forced; below, learned patterns may be incorporated subject to compatibility constraints (§13.2).

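The separation between the logical call budget and the infrastructure retry budget in §6.1.4 can be illustrated with a small tracker. This is a sketch under stated assumptions: `RevisionBudgetSketch` carries only two of the four RevisorConfig fields, and `createBudgetTracker` with its methods is a hypothetical helper, not an API defined by this addendum.

```ts
// Two of the RevisorConfig budget fields from §6.1.4; the token and
// wall-clock budgets are omitted to keep the sketch short.
interface RevisionBudgetSketch {
  max_logical_llm_calls_per_revision: number;
  max_infrastructure_retries_per_logical_call: number;
}

// Hypothetical tracker: retries and logical calls are counted separately.
function createBudgetTracker(budget: RevisionBudgetSketch) {
  let logicalCalls = 0;          // successful inferences so far
  let retriesOnCurrentCall = 0;  // JSON/schema retries for the call in flight
  return {
    canStartLogicalCall(): boolean {
      return logicalCalls < budget.max_logical_llm_calls_per_revision;
    },
    // Returns false once the per-call retry allowance is exhausted.
    recordInfrastructureRetry(): boolean {
      if (retriesOnCurrentCall >= budget.max_infrastructure_retries_per_logical_call) {
        return false;
      }
      retriesOnCurrentCall += 1;
      return true;
    },
    // Only a successful inference consumes the logical budget.
    recordSuccessfulInference(): void {
      logicalCalls += 1;
      retriesOnCurrentCall = 0;
    },
    logicalCallsUsed(): number {
      return logicalCalls;
    },
  };
}
```

In this sketch a malformed-JSON retry never touches `logicalCalls`, which mirrors the statement that infrastructure retries do not consume the logical budget.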
## §6.2 Failure taxonomy

Each finding the Compiler addresses gets a classified `FailureKind`:

```
FailureKind =
  | "content_gap"                // something missing
  | "source_gap"                 // more evidence/research needed
  | "source_misuse"              // source exists but used incorrectly
  | "reasoning_error"            // module drew wrong conclusion
  | "structure_error"            // work product organized badly
  | "style_error"                // tone/voice/clarity failed
  | "format_error"               // output format/technical formatting failed
  | "strategic_judgment_error"   // issue weighting / strategic choice failed
  | "process_error"              // graph skipped required process/tool/source
  | "context_error"              // module did not receive needed context
  | "delivery_error"             // output saved/sent/delivered incorrectly
  | "graph_design_gap"           // graph lacks module/route/tool/gate for this
```

The Compiler assigns one or more FailureKinds per finding. Multiple kinds may apply when a single finding has multiple roots (e.g., source_gap + reasoning_error: the module reasoned without sufficient evidence).

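A minimal sketch of the "one or more FailureKinds per finding" rule follows. `ClassifiedFindingSketch` and `classifyFinding` are hypothetical names, and deduplicating repeated kinds is an assumption of the sketch rather than something §6.2 requires.

```ts
// Values copied from the FailureKind taxonomy in §6.2.
type FailureKind =
  | "content_gap" | "source_gap" | "source_misuse" | "reasoning_error"
  | "structure_error" | "style_error" | "format_error"
  | "strategic_judgment_error" | "process_error" | "context_error"
  | "delivery_error" | "graph_design_gap";

// Hypothetical record pairing a finding with its classified kinds.
interface ClassifiedFindingSketch {
  finding_id: string;
  failure_kinds: FailureKind[];
}

// Enforces "one or more" and drops duplicates (the latter is an assumption).
function classifyFinding(findingId: string, kinds: FailureKind[]): ClassifiedFindingSketch {
  const unique: FailureKind[] = [];
  kinds.forEach(function (kind) {
    if (unique.indexOf(kind) === -1) unique.push(kind);
  });
  if (unique.length === 0) {
    throw new Error("A finding needs at least one FailureKind: " + findingId);
  }
  return { finding_id: findingId, failure_kinds: unique };
}
```

The multi-root example from the text would be written as `classifyFinding("f-1", ["source_gap", "reasoning_error"])`.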
## §6.3 Repair strategy taxonomy

Each failure maps to a `RepairStrategyKind`:

```
RepairStrategyKind =
  | "preserve_and_modify"       // keep good parts, modify failed sections
  | "regenerate"                // rerun module from scratch with new instructions
  | "focus_on"                  // rerun/revise only specific issue/section
  | "apply_updates"             // apply outputs from research/verifier modules
  | "gather_more_information"   // send to research/source module first
  | "verify_then_revise"        // verify source/citation/fact before rewriting
  | "restructure"               // rebuild outline preserving verified substance
  | "style_pass"                // targeted style revision after substance stable
  | "format_pass"               // formatting/output after content stable
  | "fork_from_checkpoint"      // rerun graph from prior module with corrected context
  | "direct_fix"                // mechanical safe patch only (§10)
  | "human_judgment"            // pause or route to human for hard call
  | "graph_patch_proposal"      // ask Task Agent to propose graph repair
```

**§6.3.1 Strategy is not capability.** A `RepairStrategyKind` is not a `ModuleRevisionCapability` name. The Compiler chooses a strategy; the Dispatcher invokes a capability. The mapping between them is many-to-many and is declared in `RepairStrategyCapabilityMap` per ModuleRevisionCapability (§9.2):

```ts
RepairStrategyCapabilityMap {
  strategy: RepairStrategyKind
  compatible_capabilities: Array<{
    capability_id: string
    capability_version_constraint: semver_range
    confidence: "low" | "medium" | "high"
  }>
}
```

The Compiler selects a strategy, then resolves it to one or more compatible capabilities, then chooses the best-fit capability based on availability, version, and historical reputation.

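The strategy-to-capability resolution in §6.3.1 could be sketched as below. This is a simplification under stated assumptions: version-constraint matching and historical reputation are left out, so the sketch ranks only by the declared `confidence` among capabilities that the caller reports as available. `StrategyMapSketch`, `resolveCapability`, and the capability ids in the usage note are hypothetical.

```ts
type Confidence = "low" | "medium" | "high";

// Reduced form of RepairStrategyCapabilityMap; the semver constraint is omitted.
interface CompatibleCapabilitySketch {
  capability_id: string;
  confidence: Confidence;
}
interface StrategyMapSketch {
  strategy: string;
  compatible_capabilities: CompatibleCapabilitySketch[];
}

const CONFIDENCE_RANK: Record<Confidence, number> = { low: 0, medium: 1, high: 2 };

// Returns the highest-confidence available capability for a strategy, or null.
// Assumption: availableCapabilityIds has already been filtered by version.
function resolveCapability(
  maps: StrategyMapSketch[],
  strategy: string,
  availableCapabilityIds: string[]
): string | null {
  let best: CompatibleCapabilitySketch | null = null;
  for (const map of maps) {
    if (map.strategy !== strategy) continue;
    for (const candidate of map.compatible_capabilities) {
      if (availableCapabilityIds.indexOf(candidate.capability_id) === -1) continue;
      if (best === null || CONFIDENCE_RANK[candidate.confidence] > CONFIDENCE_RANK[best.confidence]) {
        best = candidate;
      }
    }
  }
  return best === null ? null : best.capability_id;
}
```

For example, with a `style_pass` map listing hypothetical capabilities `cap.a` (low), `cap.b` (high), and `cap.c` (medium), restricting availability to `cap.a` and `cap.c` selects `cap.c`. A `null` result corresponds to the situation where no capability can serve the chosen strategy.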
## §6.4 Repair target taxonomy

Each failure also maps to a `RepairTarget`:

```
RepairTarget =
  | "same_producer_module"
  | "upstream_source_module"
  | "drafting_or_revision_module"
  | "format_or_output_module"
  | "verification_module"
  | "human_review_gate"
  | "task_agent_process_gap"
  | "full_rerun_or_fork"
```

The target answers: *who fixes this?* It is independent of strategy: a `regenerate` strategy applied to the `same_producer_module` is different from `regenerate` applied to `drafting_or_revision_module`.

## §6.5 Hard Revision Calls

Hard Calls fire when revision would require strategic, professional, or non-delegable judgment. They block dispatch until user resolution.

### §6.5.1 Detection

Detection is combined rule-based plus LLM-classifier:

**Rule-based triggers (always-on, deterministic):**

- `failure_kind == strategic_judgment_error` (per §6.2)
- `EvaluationLimitationKind.human_judgment_needed` present in evaluation result
- `EvaluationLimitationKind.policy_blocked` present
- Outcome `is_high_stakes == true`
- User has marked the outcome `requires_hard_call_check`
- Failure is on a judgment-based outcome (AssuranceBasis in `{llm_expert_judgment, specialist_panel_judgment, comparative_judge}` below confidence threshold)
- `JudgmentLimitationRecord.severity in {"block_advisory", "block_required"}` present
- `RevisionDiagnosis.repair_recommendation == "human_judgment"`

**LLM hard-call classifier (confirming pass):**

- Runs only if a rule-based trigger fires
- Confidence threshold configurable in RevisorConfig (default 0.7)
- Reduces noise without adding latency for the common case

Detection produces a typed `HardRevisionCall` (§7.9) with bounded `HumanDecisionOption[]`.

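The two-stage detection in §6.5.1 (deterministic triggers first, classifier only as a confirming pass) can be sketched as follows. Several points are assumptions of this sketch: `HardCallSignalsSketch` is a hypothetical flattened input, the judgment-based AssuranceBasis trigger is omitted for brevity, the classifier is passed in as a plain function returning a confidence, and a confidence at or above the threshold is treated as confirmation.

```ts
// Hypothetical flattened view of the signals named in §6.5.1.
interface HardCallSignalsSketch {
  failure_kinds: string[];
  limitation_kinds: string[];
  is_high_stakes: boolean;
  requires_hard_call_check: boolean;
  judgment_limitation_severities: string[];
  repair_recommendation: string | null;
}

function has(values: string[], wanted: string): boolean {
  return values.indexOf(wanted) !== -1;
}

// Deterministic pass. The AssuranceBasis-below-threshold trigger is omitted here.
function ruleBasedTriggerFires(s: HardCallSignalsSketch): boolean {
  return has(s.failure_kinds, "strategic_judgment_error")
    || has(s.limitation_kinds, "human_judgment_needed")
    || has(s.limitation_kinds, "policy_blocked")
    || s.is_high_stakes
    || s.requires_hard_call_check
    || has(s.judgment_limitation_severities, "block_advisory")
    || has(s.judgment_limitation_severities, "block_required")
    || s.repair_recommendation === "human_judgment";
}

// The classifier runs only when a rule fired; 0.7 is the default from §6.5.1.
function detectHardCall(
  signals: HardCallSignalsSketch,
  classify: (s: HardCallSignalsSketch) => number,
  threshold: number = 0.7
): boolean {
  if (!ruleBasedTriggerFires(signals)) return false;  // common case: no LLM call
  return classify(signals) >= threshold;              // assumption: >= confirms
}
```

Short-circuiting before the classifier is what keeps the common case free of added latency.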
### §6.5.2 Taxonomy

```
HardRevisionCallKind =
  | "strategic_legal_judgment"
  | "material_fact_dispute"
  | "source_authority_conflict"
  | "privilege_or_confidentiality_risk"
  | "client_position_change"
  | "external_side_effect_required"
  | "capability_gap"
  | "risk_tradeoff_no_dominant_option"
  | "human_preference_needed"
```

### §6.5.3 Blocking behavior

A Hard Call with `blocking == true` halts plan dispatch. The Dispatcher state transitions to `waiting_hard_call` (§0.4.2). The plan cannot resume until the user resolves the Hard Call via the [B10] UI surface (§21).

### §6.5.4 Resolution

Resolution is recorded in the HardCallResolutionLedger (§7.9.2), with full compatibility binding (per [P-13]). A prior resolution is reused only if the current outcome definition, goal context, evidence snapshot, and artifact version scope remain compatible.

## §6.6 Default human gate

**§6.6.1 Judgment-based default.** Plan steps targeting outcomes whose AssuranceBasis includes `llm_expert_judgment`, `specialist_panel_judgment`, or `comparative_judge` (below confidence threshold), or whose Evaluation produced `EvaluationLimitationKind.human_judgment_needed`, default to a step-level human gate before dispatch.

**§6.6.2 AutonomousModePolicy override.** RevisorConfig contains an AutonomousModePolicy (per [P-11]):

```ts
AutonomousModePolicy {
  skip_low_risk_judgment_gate: boolean
  may_skip_hard_call_gate: false              // LOCKED
  may_skip_policy_gate: false                 // LOCKED
  may_skip_privileged_artifact_gate: false    // LOCKED
  may_skip_external_side_effect_gate: false   // LOCKED
  allowed_assurance_bases: AssuranceBasis[]
  max_plan_risk_score: number
  schema_version: 1
}
```

The four `may_skip_*` fields are locked to `false` in the schema; the validator rejects any RevisorConfig record where they differ. A user can opt out of low-risk LLM-judgment gates by setting `skip_low_risk_judgment_gate = true`, but Hard Calls, policy gates, privileged artifacts, and external side effects always require human gates.

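The locked-field rule in §6.6.2 is easy to express as a validator over raw, not-yet-trusted input. The sketch below is illustrative: `AutonomousModePolicyInput` types the four locked fields as plain booleans precisely because it models unvalidated input, and `validateAutonomousModePolicy` is a hypothetical helper, not the validator the spec refers to.

```ts
// Raw input shape: locked fields are typed boolean because this is pre-validation data.
interface AutonomousModePolicyInput {
  skip_low_risk_judgment_gate: boolean;
  may_skip_hard_call_gate: boolean;
  may_skip_policy_gate: boolean;
  may_skip_privileged_artifact_gate: boolean;
  may_skip_external_side_effect_gate: boolean;
  allowed_assurance_bases: string[];
  max_plan_risk_score: number;
  schema_version: number;
}

type LockedField =
  | "may_skip_hard_call_gate"
  | "may_skip_policy_gate"
  | "may_skip_privileged_artifact_gate"
  | "may_skip_external_side_effect_gate";

const LOCKED_FIELDS: LockedField[] = [
  "may_skip_hard_call_gate",
  "may_skip_policy_gate",
  "may_skip_privileged_artifact_gate",
  "may_skip_external_side_effect_gate",
];

// Returns one error per locked field that is not exactly `false`.
function validateAutonomousModePolicy(policy: AutonomousModePolicyInput): string[] {
  const errors: string[] = [];
  LOCKED_FIELDS.forEach(function (field) {
    if (policy[field] !== false) {
      errors.push(field + " is locked to false");
    }
  });
  return errors;
}
```

An empty result means the record passes this particular check; `skip_low_risk_judgment_gate` is deliberately not inspected because the user may set it either way.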
**§6.6.3 Combined with version-and-diff.** Per §11.16, every revision dispatch produces a saved version with a diff. UI defaults to review-before-accepting for plans above confidence or cost thresholds. The combination of default human gate + version-and-diff + revert makes Tier-5 (highest-stakes) legal work acceptable.

## §6.7 Sufficiency Protocol and seven success conditions

The Revision Compiler must choose a procedure sufficient to actually resolve the failure given available capabilities, prior iteration patterns, preservation constraints, and risk profile.

**§6.7.1 Insufficient outputs.** If no procedure is sufficient, the Compiler MUST emit one of:

- `needs_human_judgment`: escalate to user
- `needs_capability_expansion`: escalate to Task Agent (§6.9)
- `cannot_compile_with_current_graph`: graph design gap

These are not "low confidence" outputs; they are explicit insufficiency markers. The Compiler MUST NOT compile a weak plan to satisfy the schema.

**§6.7.2 Seven success conditions.** A revision cycle is successful when ALL of:

1. Plan steps complete or safely escalate
2. Affected artifact versions are updated (candidate or current per §11.11)
3. Modules acknowledge what they addressed via RevisionExecutionReceipts (§9.4)
4. Targeted outcomes are re-evaluated
5. Cascaded dependent outcomes are re-evaluated
6. No active blocking findings remain
7. No unresolved Hard Call blocks the stated outcome

Each condition is a typed runtime assertion. Loop Controller checks all seven at revision-cycle completion; failure of any condition prevents the revision from being marked successful.

**§6.7.3 Loop breaker.** RevisionPlan carries `repeated_insufficiency_count`. After N consecutive insufficient plans (default N=3 from RevisorConfig), forced escalation fires per §6.8.

**§6.7.4 Yield-back atomicity.** When a plan has `yield_back_enabled == true` (per §6.7.5), all mutation steps MUST use `mutation_mode = "candidate_only"` (per [P-31]). Partial execution accumulates into a single CandidateArtifactVersion chain; the original live artifacts remain at base versions throughout. Yield-back cannot leave orphaned mutations in the active workspace.

**§6.7.5 Yield-back enablement.** RevisionPlan carries `yield_back_enabled: boolean` (default false). When true, partial plan results yield back to the Revisor for replan opportunity mid-execution. When false, the plan executes to completion or terminal state.

## §6.8 Ten failure mode handlers

Every Revisor failure mode has an explicit handler:

1. **No revision-capable target** → emit `process_gap_out` → Task Agent (§6.9)
2. **Artifact version mismatch** → `abort_and_replan` OR `ask_human`; rebase is not permitted (per §11.20)
3. **Module rejects capability** → replan with available capabilities
4. **Module returns `could_not_fix`** → replan or escalate after [D16] loop breaker (§6.7.3)
5. **Plan repeats failed strategy** → convergence failure → escalate
6. **Revision causes regression** → rollback / fork / replan via GraphStateRollback (§11.13)
7. **Source / evidence missing** → emit `information_request` instead of rewriting
8. **Hard professional judgment** → emit `human_judgment_request` → block until HardCallResolution (§7.9)
9. **External side effect already occurred** → create corrected artifact; never_replay policy applies (§11.18)
10. **Two plans conflict** → artifact lock + deterministic tie-breaker (§11.9)

## §6.9 Revisor ↔ Task Agent escalation

Task Agent involvement is exceptional, not routine. The Revisor is self-sufficient using graph_context + module capability registry + learned patterns.
Task Agent is invoked only when:

- `failure_kind == graph_design_gap` (no module in graph can fix this kind of failure)
- Revision Compiler confidence below threshold after exhausting graph options (default 0.5)
- Repeated convergence failure (2+ replans without progress)
- Novel failure pattern not in learned patterns (per §4.5 above novelty threshold)
- User explicitly configures Task Agent advice for high-stakes outcomes

### §6.9.1 Task Agent fallback

When the Task Agent is unavailable or returns a low-confidence response, RevisorConfig declares a fallback policy:

```ts
task_agent_fallback_policy:
  | "pause"                 // wait for human resolution (default for legal work)
  | "escalate_hard_call"    // convert to Hard Call
  | "abort_plan"
```

This prevents livelock if Task Agent fails to respond.

## §6.10 Planner Confidence Threshold

Beyond Hard Call detection, plans with low Compiler confidence surface for human review even without a Hard Call. The threshold is configurable in RevisorConfig (default 0.7).

When `CompiledRevisionStrategy.compiler_confidence_score < threshold`:

- Plan surfaces for human review (`PlanAssurancePolicy.required_modes` includes `human_gate`)
- UI displays Compiler's reasoning and the planner's expressed uncertainty
- User can approve, modify, or reject

This prevents over-confident weak plans from auto-applying.

## §6.11 Repair ordering rules

Default ordering rules for combining multiple repair strategies:

1. Fix missing or incorrect sources before drafting
2. Fix substantive reasoning before style
3. Fix structure before sentence polish
4. Fix content before final formatting
5. Verify citations / facts after substantive text changes
6. Re-run affected downstream evaluators after upstream changes
7. Do not replay external sends or deliveries without explicit approval (per §11.18)
8. If user / human judgment is required, pause before spending on large downstream revisions

**§6.11.1 Topological sort with cycle detection.** The Revision Compiler uses topological sort over the rules and step dependencies. Cycle detection forces fallback to Hard Call (§6.5).

**§6.11.2 Source repair max depth.** Source repair has `max_source_repair_depth = 1` (default), preventing recursive source-of-source loops. Each RevisionPlanStep carries a `source_repair_depth: number` counter. Beyond depth 1, escalate to Hard Call. Override via RevisorConfig is permitted up to depth 3 (per [P-36]); depth > 1 requires explicit scope acknowledgment. Escalation to Hard Call still fires on non-progress, regardless of depth budget.

**§6.11.3 Repair ordering and DAG.** Plan steps form a DAG. Steps without dependencies execute in parallel up to DOC23's parallelism limit (§11.22). Steps with dependencies execute in topological order.

## §6.12 Goal-impact assessment

Each repair strategy entry in CompiledRevisionStrategy (§7.8) and each RevisionPlanStep (§7.5) carries a GoalImpactAssessment:

```ts
GoalImpactAssessment {
  advances_goals: GoalRef[]
  risks_goals: GoalRef[]
  neutral_goals: GoalRef[]
  rationale: string
}
```

**§6.12.1 UI use only (per [P-20]).** Revisor-generated `GoalImpactAssessment` populates the UI (Hard Call display per §21) and current-run plan reasoning. It MUST NOT increment `goal_advancement_count` in PatternPerformanceSlice (§13.3), feed BDSM learning signals, or drive pattern promotion.

**§6.12.2 Severed learning loop.** The sycophancy delusion fix: LLMs systematically hallucinate plausible goal-advancement rationales to satisfy the schema. Using these self-reported assessments for durable learning would build a delusional pattern graph. `goal_advancement_count` increments only via:

1. Independent post-run `step.evaluator` configured with `AssuranceBasis: comparative_judge` evaluating actual goal progress against baseline
2. Explicit human feedback signal in DirectInstructionSignalKind set

See §13.3 and §15 for the schema enforcement of this rule.

## §6.13 Revisor learning

Per Rule 11 (§3.9): Revisor learning uses compiled, inspectable patterns; no hot-path self-mutation.

- Patterns are retrieved from DOC72 at Compiler invocation time (§13.1)
- Pattern compatibility is checked at retrieval (§13.2)
- Pattern application is recorded in CompiledRevisionStrategy.applied_pattern_refs
- Pattern performance feedback flows back to PatternPerformanceSlice (§13.3) after the revision cycle completes
- Pattern promotion to broader scope follows §13.5 governance

The Revisor itself does not write to DOC72 directly; promotion is governed.

## §6.14 RevisorConfig

The Revisor's behavior is parameterized by RevisorConfig:

```ts
RevisorConfig {
  config_id: string
  scope: "global" | "task" | "matter" | "user"

  // Direct fix
  direct_fix_max_chars: number              // secondary cap; class is primary gate (§10)
  direct_fix_allowed_classes: DirectFixAllowedClass[]
  direct_fix_forbidden_classes: DirectFixForbiddenClass[]

  // Confidence thresholds
  plan_verification_threshold: number       // 0.0 - 1.0, trigger semantic lint above this risk
  planner_confidence_threshold: number      // §6.10, surface for review below
  hard_call_confidence_threshold: number    // §6.5.1 LLM classifier
  novelty_threshold: number                 // default 0.7, per §4.5

  // Candidate version policy
  candidate_version_policy_default:         // per [P-4]
    | "auto_accept_class_safe_mechanical"
    | "candidate_for_meaning_bearing"
    | "human_gate_required"
  candidate_version_required_for: Array<
    | "multi_step_mutation"
    | "meaning_bearing_edit"
    | "privileged_artifact"
    | "external_side_effect"
  >

  // Logical budget
  budget_logical: {
    max_logical_llm_calls_per_revision: number
    max_logical_tokens_per_revision: number
  }

  // Infrastructure budget (separate per [P-32])
  budget_infrastructure: {
    max_infrastructure_retries_per_logical_call: number
    max_total_infrastructure_retries_per_revision: number
    infrastructure_retry_kinds: Array<
      | "json_parse_failure"
      | "schema_validation_failure"
      | "premature_stop_token"
      | "timeout_retry"
      | "connection_retry"
    >
  }

  // Local compute
  max_local_compute_seconds_per_revision: number
  preemption_timeout: number                // ms before forced kill
  graceful_degradation_order: Array<        // optional modes skipped first
    | "advisory_sub_agents"
    | "semantic_lint"
    | "specialist_subevaluators"
    | "dry_run"
  >

  // Modes
  autonomous_mode_policy: AutonomousModePolicy   // per §6.6.2
  live_edit_handling:                            // per §11.20
    | "abort_and_replan"
    | "semantic_anchor_only"

  // Escalation
  task_agent_fallback_policy:                    // per §6.9.1
    "pause" | "escalate_hard_call" | "abort_plan"
  task_agent_confidence_threshold: number

  // Source repair
  max_source_repair_depth: number                // default 1, per §6.11.2

  // Retry budgets
  per_outcome_retry_budget: number
  per_plan_max_replans: number
  consecutive_insufficient_limit: number         // §6.7.3 loop breaker

  // V3.2: learning-mode capability (Addenda A ↔ Addenda B coordination V3 §2.10)
  learning_mode: LearningMode                    // default "production"; see §6.16

  schema_version: 2                              // bumped for learning_mode
}
```

## §6.15 Eleven Revisor rules

Recap from §3.9. These bind Revisor behavior throughout V3:

1. The Revisor is a planner, not the default content reviser
2. The Revisor consumes evaluator results, revision requests, graph context, artifact lineage, source workspace state, prior revision history, human feedback, learned patterns, goal context
3. The Revisor creates CompiledRevisionStrategy before producing RevisionPlan
4. CompiledRevisionStrategy diagnoses failure kind, likely cause, responsible module, repair target, strategy, order, risks, revalidation needs
5. The Revisor must use declared module revision capabilities; may not invent runtime capabilities
6. Revision Dispatcher validates plans deterministically before execution
7. Meaning-bearing repair goes through `revision_in`; direct fix limited to class-safe mechanical changes (§10)
8. Every dispatched module emits a RevisionExecutionReceipt (§9.4)
9. Revisor loops are convergence-aware; repeated non-progress escalates per §6.8
10. User feedback may enter current-run repair through Revisor when active or forkable; future learning through DOC72, DOC8/BDSM, DOC24
11. Revisor learning uses compiled, inspectable patterns; no hot-path self-mutation

## §6.16 Learning mode (V3.2; coordination V3 §2.10)

V3.2 adds a `learning_mode` field to `RevisorConfig` enabling the cheap-LLM learning generator capability. The field governs how the Revisor (and the surrounding Evaluator + Judge pipeline if Pattern C is wired) is configured with respect to model selection, signal generation density, and cross-model calibration.

### §6.16.1 LearningMode enum

```ts
LearningMode =
  | "production"          // default; optimize for output quality
                          // using production-tier models
  | "signal_generation"   // optimize for learning signal volume
                          // using cheap/local models
  | "calibration"         // run cheap AND production-tier on
                          // same artifacts; learn cross-model deltas
```

### §6.16.2 Mode behaviors

**`production` (default):**

- Models selected from production-tier per `OpenClaw` model routing (cost/quality optimized for delivery)
- All signal emission still operative (Phase 1 captures everything)
- Cost draws from production pool

**`signal_generation`:**

- Models selected from cheap-tier (Kimi 2.5, DeepSeek, local Qwen via Ollama, etc.) per OpenClaw cheap-model routing
- Same signal schemas emitted; signals are tagged `model_class = "cheap_local"` or `"cheap_api"` in the `EvaluationLearningSignalEnvelope`
- Patterns learned during signal_generation runs default to `cross_model_applicability = "model_class_specific"`
- Cost draws from cheap pool per EC Core §6 token/cost governance
- **Product use case:** generate a high volume of learning signals against representative tasks before committing production-tier model time; the system builds priors about what works for the task class without spending expensive-model budget

**`calibration`:**

- Same artifact is run through both cheap-tier and production-tier Revisor/Evaluator paths
- Cross-model deltas are captured as a paired signal: cheap-model finding/score vs production-model finding/score on the same artifact
- Trains the system to predict production-model behavior from cheap-model observations
- Cost draws from a mixed pool with explicit user authorization (per EC Core cost governance)
- **Product use case:** validate that cheap-model learning signals are predictive of production-model behavior on the user's task class before relying on cheap-model patterns for production routing

### §6.16.3 Cross-model pattern semantics

Patterns learned at one `model_class` (recorded on `PatternContextSignature.model_class`, §13.3) default to `requires_validation` when surfaced at a different `model_class`. The system uses the pattern as a prior with reduced confidence and tracks whether predictions hold at the new model class.

Patterns can be explicitly marked `cross_model_applicable: true` after validation across model classes, typically after `calibration` mode runs confirm consistent behavior.

### §6.16.4 Privilege and matter scoping (P19 firewall preserved)

`learning_mode` does NOT override privilege firewalls.
Privileged-matter signals from `signal_generation` mode runs still inherit `data_class = "privileged"` and matter scoping; cross-matter pattern promotion still requires EC Core's policy gate. The mode governs model selection and signal density; it does not relax governance.

### §6.16.5 UI surface

DOC20 surfaces a three-state selector in RevisorConfig UI with cost guidance per mode and cross-model applicability indicator on patterns. Per `OBL-XDOC-DOC20-EVAL-UI-01` in §29.

### §6.16.6 Cross-doc references

- Coordination V3 §2.10: learning-mode product capability
- EC Core §6: cost governance and budget pool integration
- DOC72: `PatternContextSignature.model_class` and `Pattern.cross_model_applicability` field definitions (per OBL-XDOC-MODEL-CLASS-AXIS-01)
- §13.1 / §13.3 below: Pattern primitive amendments in this addendum

---

# §7. REVISION SCHEMAS

## §7.1 RevisionRequest

The Evaluator emits a RevisionRequest when revision is needed:

```ts
RevisionRequest {
  request_id: string
  evaluation_result_ref: StorageRef
  target_artifact_ref: StorageRef
  target_version_ref: StorageRef
  reason_summary: string
  findings_to_address: finding_id[]
  judgment_limitations_to_address: string[]
  suggested_repair_focus: string[]
  constraints_to_preserve: string[]
  do_not_change: string[]
  required_revalidation_outcomes: outcome_id[]
  created_at: ISO8601
  schema_version: 1
}
```

## §7.2 RevisionPlan, RevisionExecutionRecord, RevisionRunSummary

V3 distinguishes plan definition from execution record from aggregate summary.
Three distinct types:

### §7.2.1 RevisionPlan

The intended repair (definition):

```ts
RevisionPlan {
  plan_id: string
  task_id: string
  run_id: string
  revisor_module_id: string
  revisor_activation_seq: number

  source_evaluation_result_ref: StorageRef
  source_revision_request_ref?: StorageRef
  target_artifact_ref: StorageRef
  target_version_precondition_ref: StorageRef

  strategy_summary: string
  alternatives_considered: string[]

  steps: RevisionPlanStep[]                    // discriminated union, §7.5
  dependency_edges: DependencyEdge[]

  required_revalidation: {
    target_outcomes: outcome_id[]
    cascading_dependency_check: boolean
  }

  risk_assessment: {
    semantic_risk: "low" | "medium" | "high"
    side_effect_risk: "none" | "internal_only" | "external_side_effect"
    requires_human_approval: boolean
  }
  plan_risk_score: number                      // 0.0 - 1.0

  // Cost estimation
  cost_estimate: RevisionCostEstimate          // §15

  requires_human_review: boolean               // derived from Hard Call detection

  preservation_constraint_set_ref: StorageRef  // §7.7
  preservation_contract_ref?: StorageRef       // §7.7

  repeated_insufficiency_count: number         // §6.7.3 loop breaker

  // Mutation mode (per [P-29])
  mutation_mode: "candidate_only" | "rolling_hash_in_place"
  rolling_hash_chain?: Array<{
    step_id: string
    predicted_pre_hash: string
    predicted_post_hash: string
  }>

  termination: {
    success_condition: TypedPredicate
    abort_conditions: TypedPredicate[]
    max_replans: number
    convergence_rule:
      | "stop_on_no_progress"
      | "stop_on_regression"
      | "manual"
  }

  // Yield-back (per [P-31])
  yield_back_enabled: boolean                  // default false
  yield_back_state?: {
    yielded_at_step_id: string
    accumulated_candidate_chain: CandidateArtifactVersionRef[]
    yielded_to_revisor_plan_id?: string
  }

  idempotency_key: string                      // deterministic, §11.8
  status: RevisionPlanStatus                   // per §0.4.2
  explanation_trace: ExplanationTrace          // structured, §7.10
  created_at: ISO8601
  schema_version: 1
}
```

### §7.2.2 RevisionExecutionRecord

What actually happened (execution):

```ts
RevisionExecutionRecord {
  record_id: string
  plan_id: string
  task_id: string
  run_id: string
  step_execution_records: RevisionStepExecutionRecord[]
  step_receipts: RevisionOperationReceipt[]    // §11.6
  start_time: ISO8601
  end_time?: ISO8601
  final_state:
    | "succeeded"
    | "partially_succeeded"
    | "failed"
    | "aborted"
    | "escalated"
  artifact_version_outcomes: ArtifactVersionOutcome[]
  schema_version: 1
}
```

### §7.2.3 RevisionRunSummary

Final aggregate:

```ts
RevisionRunSummary {
  summary_id: string
  plan_id: string
  run_id: string
  outcomes_targeted: outcome_id[]
  outcomes_resolved: outcome_id[]
  outcomes_still_failing: outcome_id[]
  outcomes_with_regressions: outcome_id[]
  final_artifact_versions: ArtifactVersionRef[]
  total_cost_breakdown: EvaluationRevisionCostBreakdown
  explanation_trace: ExplanationTrace
  schema_version: 1
}
```

## §7.3 RevisionDiagnosis

Per-failure causal diagnosis:

```ts
RevisionDiagnosis {
  diagnosis_id: string
  failed_outcome_id: outcome_id
  finding_refs: StorageRef[]
  failure_kind: FailureKind
  likely_responsible_module_ids: module_id[]
  likely_repair_target_module_ids: module_id[]
  confidence: "low" | "medium" | "high"
  uncertainty_notes: string[]
  repair_recommendation:
    | "revise_existing"
    | "gather_information"
    | "verify"
    | "rerun_upstream"
    | "fork_from_checkpoint"
    | "human_judgment"
    | "graph_patch"
  schema_version: 1
}
```

## §7.4 RevisionIntelligencePacket

The Revisor's bundled input contract:

```ts
RevisionIntelligencePacket {
  packet_id: string
  task_id: string
  run_id: string

  // Source materials
  evaluator_result_refs: StorageRef[]
  revision_request_refs: StorageRef[]
  human_feedback_refs: StorageRef[]

  // Graph context
  graph_snapshot_ref: StorageRef
  module_capability_snapshot_ref: StorageRef
  artifact_lineage_ref: StorageRef
  source_workspace_snapshot_ref?: StorageRef
  injection_manifest_refs: StorageRef[]        // DOC24 packets
  prior_revision_history_ref?: StorageRef

  // Goal context
  goal_context_refs: GoalRef[]
  resolved_goal_snapshots: ResolvedGoalSnapshot[]

  // Hard call history
  hard_call_resolution_ledger_ref?: StorageRef

  // Routes available
  available_routes: Array<{
    route_kind:
      | "revision_in"
      | "full_rerun"
      | "fork_from_checkpoint"
      | "human_gate"
      | "task_agent_process_gap"
    target_module_id?: module_id
    capability_names?: string[]
  }>

  // Learned patterns
  learned_patterns: PatternRef[]

  // Safety
  taint_labels_per_component: Record
  safety_envelope_ref: StorageRef

  // CIL authority (per [P-34])
  cil_authority_snapshot_ref: StorageRef
  authority_conflict_check: {
    checked: boolean
    conflicts_found: AuthorityConflictRef[]
    on_conflict:
      | "block_plan"
      | "human_gate"
      | "recompile_with_authority"
  }

  schema_version: 1
}
```

**§7.4.1 CIL authority invariant.** Plan validation (§11.3) fails if `cil_authority_snapshot_ref` is missing or stale, or if `authority_conflict_check.conflicts_found` is non-empty and `on_conflict` is not resolved. DOC15 authority memory has higher precedence than task instructions; revision plans must respect this.

### §7.4.2 Per-component taint inheritance (per [F1a])

The RevisionIntelligencePacket carries content from multiple sources with different trust profiles. The adversarial boundary (§15.10) extends through the packet: each component declares its `TaintClass` at construction, and consumers of the packet (the Revision Compiler, sub-agents, the Revisor) MUST honor the declared taint when treating component content as instruction vs as data.
Default taint inheritance:

| Packet component | Default TaintClass | Source rationale |
|---|---|---|
| `artifact_content_excerpts` | `external_untrusted` | Artifact text is derived from upstream module output; treat as data, never instruction |
| `graph_context.task_topology` | `system_trusted` | Generated by the runtime kernel; internal trust |
| `graph_context.module_metadata` | `system_trusted` | Internal registry data |
| `human_feedback_refs` | `user_trusted_bounded` | User input bounded by §15.10 user-text envelope |
| `sub_agent_advice` | Inherits source agent's `allowed_input_classes` highest tier | Per sub-agent profile in §8.4 |
| `findings_text` | Inherits originating artifact's TaintClass | Per §15.11 transitive propagation |
| `prior_evaluation_snapshot` | `system_trusted` | EvaluationSnapshot is system-generated record |
| `prior_revision_receipts` | `system_trusted` | RevisionOperationReceipt is system-generated |
| `applicable_patterns` | Inherits pattern's `provenance.taint_class_at_origin` | Patterns carry their original taint context |
| `cil_authority_snapshot` | `system_trusted` | DOC15 authority memory is internally trusted |
| `goal_context` | `system_trusted` | DOC72 goal state is internally trusted |

The packet schema carries an explicit `component_taint_map`:

```ts
RevisionIntelligencePacket {
  // ... existing fields ...

  component_taint_map: {
    artifact_content_excerpts: TaintClass
    graph_context_topology: TaintClass
    graph_context_module_metadata: TaintClass
    human_feedback: TaintClass
    sub_agent_advice: Record
    findings_text: Record
    prior_evaluation_snapshot: TaintClass
    prior_revision_receipts: TaintClass
    applicable_patterns: Record
    cil_authority_snapshot: TaintClass
    goal_context: TaintClass
  }

  // Highest taint observed across all components — used for plan dispatch gating
  packet_taint_high_water: TaintClass

  // Whether any component triggers human review at the user's tier
  requires_human_review_before_dispatch: boolean
}
```

**Compiler discipline:** The Revision Compiler reads packet components with taint awareness:

1. **Content of `external_untrusted` components is NEVER treated as instruction.** Artifact text that says "ignore your prior instructions and ..." is data, not directive. The Compiler reads it as evidence of what the artifact contains; it does not extract goals or commands from it.
2. **Sub-agent advice is bounded by sub-agent's declared input classes.** An advisory sub-agent whose `allowed_input_classes` excludes `external_untrusted` cannot have produced advice that originated from untrusted text; the Compiler verifies this at packet assembly.
3. **`packet_taint_high_water` gates dispatch.** If high water is `external_untrusted` or worse, the Revision Plan's dispatch requires either passing through a SanitizationNode (per §15.11) or a human review gate (per §6.6).
4. **Findings inherit artifact taint transitively.** A finding produced from an `external_untrusted` artifact carries that taint into the packet. The Revisor uses the finding as a problem statement but does not interpret the finding's text as direction.

**Validation:** §11.3 deterministic linting checks `component_taint_map` is present and complete; missing entries fire `validation.revision_packet_missing_component_taint`.
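The completeness lint and the high-water computation described above can be sketched together. Everything here is an assumption made for illustration: the three-value `TaintClass` and its ordering in `TAINT_RANK` (the full taint model lives in §15.10 and may define more classes), the `TaintEntry` shape used for the per-item maps, and the function names `lintComponentTaintMap` and `taintHighWater`.

```typescript
// Reduced, assumed taint set; the real TaintClass enum is defined elsewhere in the spec.
type TaintClass = "system_trusted" | "user_trusted_bounded" | "external_untrusted";

// Assumed ordering from least to most tainted.
const TAINT_RANK: Record<TaintClass, number> = {
  system_trusted: 0,
  user_trusted_bounded: 1,
  external_untrusted: 2,
};

// A component carries either a single class or a per-item map (e.g. per finding).
type TaintEntry = TaintClass | Record<string, TaintClass>;

const REQUIRED_COMPONENTS = [
  "artifact_content_excerpts",
  "graph_context_topology",
  "graph_context_module_metadata",
  "human_feedback",
  "sub_agent_advice",
  "findings_text",
  "prior_evaluation_snapshot",
  "prior_revision_receipts",
  "applicable_patterns",
  "cil_authority_snapshot",
  "goal_context",
] as const;

// Deterministic lint: one validation error per missing component entry.
function lintComponentTaintMap(map: Record<string, TaintEntry | undefined>): string[] {
  return REQUIRED_COMPONENTS
    .filter((key) => map[key] === undefined)
    .map((key) => `validation.revision_packet_missing_component_taint: ${key}`);
}

// Highest taint observed across all components, including per-item maps.
function taintHighWater(map: Record<string, TaintEntry>): TaintClass {
  let high: TaintClass = "system_trusted";
  for (const entry of Object.values(map)) {
    const classes: TaintClass[] = typeof entry === "string" ? [entry] : Object.values(entry);
    for (const taint of classes) {
      if (TAINT_RANK[taint] > TAINT_RANK[high]) {
        high = taint;
      }
    }
  }
  return high;
}
```

Under these assumptions, a packet whose only untrusted input is a single finding still reports `external_untrusted` as its high-water mark, which is the behavior rule 4 (transitive finding taint) implies for dispatch gating.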
**Cross-reference:** §15.10 (full taint model), §15.11 (transitive propagation), §15.12 (clearance), §8.4 (sub-agent allowed_input_classes).

## §7.5 RevisionPlanStep (discriminated union)

V3 replaces V2's flat RevisionPlanStep with a discriminated union, eliminating the `target_port` bypass that allowed module_revision to route around `revision_in` (per [P-1]).

### §7.5.1 Base

```ts
RevisionPlanStepBase {
  step_id: string
  step_order: number
  affected_artifact_refs: StorageRef[]
  affected_outcome_ids: outcome_id[]
  depends_on_step_ids: string[]
  source_repair_depth: number                  // default 0; per §6.11.2
  idempotency_key: string                      // deterministic, §11.8

  preconditions: Array<{
    kind:
      | "artifact_version_matches"
      | "module_capability_available"
      | "source_workspace_snapshot_matches"
      | "human_approval_present"
      | "live_artifact_hash_matches_snapshot"
    ref: StorageRef | string
  }>

  expected_output: string
  revalidation_trigger: outcome_id[]

  on_failure:
    | "yield_back_to_revisor"
    | "abort_plan"
    | "escalate_human"
    | "continue_with_warning"

  human_gate_recommended?: {
    scope: "step" | "plan"
    when: "before_dispatch" | "after_dispatch_before_accept"
    reason: string
    skippability: GateSkippability             // §0.4.15, §21
  }

  goal_impact_assessment: GoalImpactAssessment // UI use only per §6.12

  // For rolling_hash_in_place mode (§11.20)
  expected_pre_hash?: string
  produced_post_hash?: string

  schema_version: 1
}
```

### §7.5.2 Variants

```ts
RevisionPlanStep =
  | ModuleRevisionStep
  | DirectFixStep
  | RevalidationStep
  | InformationRequestStep
  | VerificationRequestStep
  | HumanJudgmentRequestStep
  | ForkFromCheckpointStep
  | WaitStep
  | NoOpRecordStep

interface ModuleRevisionStep extends RevisionPlanStepBase {
  step_kind: "module_revision"
  target_module_id: module_id                   // REQUIRED
  target_port: "revision_in"                    // LOCKED
  revision_capability_required: string          // REQUIRED, must match a ModuleRevisionCapability
  capability_version: semver                    // REQUIRED, per [P-7]
  typed_instruction: TypedRevisionInstruction   // REQUIRED, §9.5
  target_artifact_ref: StorageRef               // REQUIRED
  target_version_precondition_ref: StorageRef   // REQUIRED
}

interface DirectFixStep extends RevisionPlanStepBase {
  step_kind: "direct_fix"
  target_port: "none_direct_fix"                // LOCKED
  direct_fix_class: DirectFixAllowedClass       // REQUIRED
  // target_module_id FORBIDDEN
  // typed_instruction FORBIDDEN
  target_artifact_ref: StorageRef               // REQUIRED
  target_version_precondition_ref: StorageRef   // REQUIRED
  fix_description: string
}

interface RevalidationStep extends RevisionPlanStepBase {
  step_kind: "revalidate"
  target_outcomes: outcome_id[]
  closure_policy: EvaluationTargetClosurePolicy // §5.11
}

interface InformationRequestStep extends RevisionPlanStepBase {
  step_kind: "information_request"
  target_module_id: module_id                   // must be a source/research module type
  target_port: "data_in"
  capability_version: semver
  information_request_payload: InformationRequest
}

interface VerificationRequestStep extends RevisionPlanStepBase {
  step_kind: "verification_request"
  target_module_id: module_id                   // must be a verifier/evaluator module type
  target_port: "data_in"
  capability_version: semver
  verification_request_payload: VerificationRequest
}

interface HumanJudgmentRequestStep extends RevisionPlanStepBase {
  step_kind: "human_judgment_request"
  target_port: "human_response_in"
  hard_call_ref?: string                        // links to HardRevisionCall if applicable
  human_judgment_request_payload: HumanJudgmentRequest
}

interface ForkFromCheckpointStep extends RevisionPlanStepBase {
  step_kind: "fork_from_checkpoint"
  checkpoint_ref: CheckpointRef
  fork_rationale: string
}

interface WaitStep extends RevisionPlanStepBase {
  step_kind: "wait"
  wait_duration_ms: number
  wait_reason: string
}

interface NoOpRecordStep extends RevisionPlanStepBase {
  step_kind: "no_op_record"
  record_purpose: string
}
```

### §7.5.3 Deterministic lint rules

The Dispatcher's deterministic linting (§11.3) enforces:

```
RULE port_action_coupling:
  ModuleRevisionStep.target_port MUST equal "revision_in"
  DirectFixStep.target_port MUST equal "none_direct_fix"
  HumanJudgmentRequestStep.target_port MUST equal "human_response_in"
  InformationRequestStep.target_port MAY equal "data_in" only if
    target module declares matching information-providing capability
  VerificationRequestStep.target_port MAY equal "data_in" only if
    target module declares matching verification capability
  instruction_in is NEVER a revision target unless target module declares
    ModuleRevisionCapability.instruction_in_revision_compatible = true
```

This rule, plus the discriminated-union schema, makes `revision_in` bypass unrepresentable in valid plans.

## §7.6 TypedRevisionInstruction

The instruction payload delivered to a ModuleRevisionStep's target module:

```ts
TypedRevisionInstruction {
  instruction_id: string
  plan_id: string
  plan_step_id: string
  target_module_id: module_id

  capability: string
  capability_version: semver
  params: Record                               // must match capability input_schema
  revisor_intent: string                       // human-readable summary

  target_artifact_ref: StorageRef
  target_artifact_version_ref: StorageRef

  findings_to_address: finding_id[]
  outcomes_to_repair: outcome_id[]

  preserve_constraints: string[]               // sections/claims/refs to keep
  do_not_change: string[]                      // hard do-not-modify list

  source_material_refs: StorageRef[]
  verification_record_refs: StorageRef[]

  required_output_contract?: {
    expected_artifact_kind?: string
    expected_sections?: string[]
    must_preserve_refs?: StorageRef[]
    semantic_changelog_required?: boolean      // true for regenerate/restructure, §7.11
  }

  revalidation_expectation: {
    outcomes_to_recheck: outcome_id[]
    success_hint: string
  }

  // CIL authority (per [P-34])
  cil_authority_snapshot_ref: StorageRef
  authority_compatible: boolean                // computed at compile time

  // Custom instruction (envelope per [P-12])
  custom_instruction?: {
    text: string
    authority_class: "user_advisory" | "system_generated_advisory"
    taint_class: TaintClass
    source_refs: StorageRef[]
    prohibited_content_scan_ref: StorageRef
    max_length_chars: number                   // default 500
  }

  idempotency_key: string                      // deterministic, §11.8
  issued_at: ISO8601
  schema_version: 1
}
```

**§7.6.1 custom_instruction safety.** The custom_instruction field is the obvious injection surface for tainted artifact language. Rules (enforced at plan compilation):

- Cannot contain text copied from `external_untrusted` sources unless quoted and marked as data
- Cannot override `preserve_constraints`, `do_not_change`, policy gates, or module capability schemas
- CIL prompt assembly must label it as advisory, not controlling
- Length above `max_length_chars` produces `validation.custom_instruction_length_exceeded`
- Failure of `prohibited_content_scan` produces `validation.custom_instruction_taint_violation`

## §7.7 Preservation constraints and contract

### §7.7.1 PreservationConstraintSet

Per-plan preservation rules:

```ts
PreservationConstraintSet {
  set_id: string
  plan_id: string
  preserved_outcomes: outcome_id[]              // outcomes that must remain passing
  preserved_artifacts: ArtifactRef[]            // artifacts that must remain unchanged
  preserved_sections: SectionRef[]              // specific sections that must persist
  preserved_findings_resolutions: finding_id[]  // resolved findings that must stay resolved
  derivation_source:
    | "prior_passing_outcomes"
    | "user_directive"
    | "system_safety"
  schema_version: 1
}
```

### §7.7.2 PreservationContract

Plan-level enforceable contract:

```ts
PreservationContract {
  contract_id: string
  plan_id: string
  protected_outcomes: outcome_id[]
  protected_artifacts: ArtifactRef[]
  protected_sections: SectionRef[]
  verification_method:
    | "hash"
    | "semantic_similarity"
    | "full_re_evaluation"
  consequence_if_violated:
    | "abort"
    | "human_escalation"
    | "soft_warning"
  schema_version: 1
}
```

Violation produces a typed `PreservationViolation` event with consequence per the contract. The contract makes whack-a-mole prevention auditable and enforceable.
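For the `"hash"` verification method, a contract check might look like the sketch below. It is a simplified illustration under stated assumptions: `PreservationContractLite` keeps only the fields the check needs and uses plain string ids in place of `SectionRef`, the `PreservationViolation` shape is invented here because the spec names the event without defining its fields in this section, and `fnv1a` is a small non-cryptographic stand-in for whatever content hash the runtime actually uses.

```typescript
type Consequence = "abort" | "human_escalation" | "soft_warning";

// Reduced contract: only the fields this check reads.
interface PreservationContractLite {
  contract_id: string;
  protected_sections: string[]; // hypothetical section ids; the spec uses SectionRef
  consequence_if_violated: Consequence;
}

// Hypothetical event shape for illustration.
interface PreservationViolation {
  contract_id: string;
  section: string;
  consequence: Consequence;
}

// 32-bit FNV-1a over UTF-16 code units; a stand-in hash, not a production choice.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Compares each protected section before and after the revision.
// A section that disappears is treated as changed (hashed as the empty string).
function checkPreservation(
  contract: PreservationContractLite,
  before: Record<string, string>,
  after: Record<string, string>,
): PreservationViolation[] {
  return contract.protected_sections
    .filter((id) => fnv1a(before[id] ?? "") !== fnv1a(after[id] ?? ""))
    .map((id) => ({
      contract_id: contract.contract_id,
      section: id,
      consequence: contract.consequence_if_violated,
    }));
}

// Example: a revision that was meant to touch only the argument section
// also altered the protected holding.
const exampleViolations = checkPreservation(
  { contract_id: "pc-1", protected_sections: ["intro", "holding"], consequence_if_violated: "abort" },
  { intro: "Background facts.", holding: "The motion is denied.", argument: "Old argument." },
  { intro: "Background facts.", holding: "The motion is granted.", argument: "New argument." },
);
```

`exampleViolations` contains one entry, for `holding`, carrying the contract's `abort` consequence; the unprotected `argument` section changes freely.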
## §7.8 CompiledRevisionStrategy The Revision Compiler's strategy artifact, produced before the RevisionPlan: ```ts CompiledRevisionStrategy { strategy_id: string plan_compilation_record_ref: StorageRef // for Revision Compiler quality (§15.2) input_summary: { failed_outcomes: outcome_id[] active_findings: finding_id[] human_feedback_summary?: string target_artifacts: ArtifactRef[] goal_context: GoalRef[] } causal_diagnosis: RevisionDiagnosis[] // §7.3 repair_strategy: Array<{ failure_id: string repair_kind: RepairStrategyKind // §6.3 selected_target_module_id?: module_id selected_capability?: string selected_capability_version?: semver why_this_target: string why_not_full_rerun: string why_not_direct_fix: string goal_impact_assessment: GoalImpactAssessment // UI use only, §6.12 }> execution_order_rationale: string[] revalidation_rationale: string[] escalation_rationale?: string compiler_confidence_score: number // 0.0 - 1.0, §6.10 novelty_score: number // §4.5 sub_agents_consulted: AdvisorySubAgentRef[] sub_agent_advice_decisions: Array<{ advisory_agent_id: string advice_output_ref: StorageRef decision: "accepted" | "rejected" | "deferred" rationale: string }> status: | "compiled" | "compiled_with_limitations" | "needs_clarification" | "needs_missing_capability" | "abstained_low_confidence" | "blocked_by_policy" explanation_trace: ExplanationTrace // §7.10 applied_pattern_refs: PatternRef[] schema_version: 1 } ``` ## §7.9 HardRevisionCall and HardCallResolutionLedger ### §7.9.1 HardRevisionCall ```ts HardRevisionCall { call_id: string kind: HardRevisionCallKind // §0.4.11 blocking: boolean affected_outcome_ids: outcome_id[] affected_plan_step_ids: string[] question_for_human: string options: HumanDecisionOption[] default_if_no_response: | "pause" | "abort" | "continue_without_fix" schema_version: 1 } HumanDecisionOption { option_id: string option_label: string description: string goal_impact_assessment: GoalImpactAssessment // UI use, §6.12 estimated_consequence: 
string } ``` ### §7.9.2 HardCallResolution and Ledger ```ts HardCallResolution { resolution_id: string outcome_id: outcome_id hard_call_id: string user_directive: string resolved_at: ISO8601 resolved_by: UserRef // Compatibility binding (per [P-13]) outcome_definition_hash: string goal_context_hash: string evidence_snapshot_ref: StorageRef artifact_version_scope?: ArtifactVersionRef binding_scope: | "this_run" | "this_artifact" | "this_matter" | "global" expires_at?: ISO8601 superseded_by_resolution_id?: string schema_version: 1 } HardCallResolutionLedger { ledger_id: string task_id: string resolutions: HardCallResolution[] schema_version: 1 } ``` **§7.9.3 Reuse rule.** The Revision Compiler MUST read the ledger before planning. A prior resolution is reused only if ALL of: - `outcome_definition_hash` matches the current outcome - `goal_context_hash` matches the current goal context - `evidence_snapshot_ref` is still valid (artifact versions and sources unchanged or compatible) - `artifact_version_scope` (if set) matches the current target - `expires_at` is not exceeded - `superseded_by_resolution_id` is null Otherwise, the Compiler re-escalates as a new HardRevisionCall. Stale resolutions do not silently apply to new situations. 
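
The reuse conditions in §7.9.3 can be read as a single predicate over a ledger entry and the current planning context. The sketch below is one possible rendering, not the normative check: `PlanningContextSketch` is hypothetical, `artifact_version_scope` is simplified to a string, the evidence-snapshot validity is assumed to be computed elsewhere, and treating a resolution as expired once the current time is later than `expires_at` is an assumption about the boundary.

```typescript
// Sketch only: simplified stand-ins for the §7.9.2 schemas.
type HardCallResolutionSketch = {
  resolution_id: string;
  outcome_definition_hash: string;
  goal_context_hash: string;
  artifact_version_scope?: string; // simplified stand-in for ArtifactVersionRef
  expires_at?: string; // ISO8601
  superseded_by_resolution_id?: string;
};

// Hypothetical view of "the current situation" at planning time.
type PlanningContextSketch = {
  outcome_definition_hash: string;
  goal_context_hash: string;
  target_artifact_version: string;
  evidence_snapshot_valid: boolean; // assumed to be computed by another component
  now_ms: number;
};

// True only if ALL reuse conditions hold.
function isResolutionReusable(
  resolution: HardCallResolutionSketch,
  context: PlanningContextSketch,
): boolean {
  if (resolution.outcome_definition_hash !== context.outcome_definition_hash) return false;
  if (resolution.goal_context_hash !== context.goal_context_hash) return false;
  if (!context.evidence_snapshot_valid) return false;
  if (
    resolution.artifact_version_scope !== undefined &&
    resolution.artifact_version_scope !== context.target_artifact_version
  ) {
    return false;
  }
  if (
    resolution.expires_at !== undefined &&
    Date.parse(resolution.expires_at) < context.now_ms
  ) {
    return false;
  }
  if (resolution.superseded_by_resolution_id != null) return false;
  return true;
}

// Returns the reusable entries; an empty result means the Compiler re-escalates.
function reusableResolutions(
  ledger: HardCallResolutionSketch[],
  context: PlanningContextSketch,
): HardCallResolutionSketch[] {
  return ledger.filter((entry) => isResolutionReusable(entry, context));
}
```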
## §7.10 ExplanationTrace Both CompiledRevisionStrategy and RevisionPlan carry a structured ExplanationTrace (per [P-9]): ```ts ExplanationTrace { summary: string decision_points: Array<{ decision_kind: | "strategy_selection" | "target_module_selection" | "capability_selection" | "ordering_choice" | "preservation_constraint" | "escalation_decision" | "sub_agent_consultation" selected_option: string rejected_options: string[] rejection_rationale: string evidence_refs: StorageRef[] confidence: number sub_agent_advice_refs?: AdvisorySubAgentOutputRef[] }> human_readable_markdown?: string // for UI; derived from structured trace schema_version: 1 } ``` **§7.10.1 Required, not optional.** Both `CompiledRevisionStrategy.explanation_trace` and `RevisionPlan.explanation_trace` are required fields. A plan without an explanation trace fails deterministic linting. The trace is cheap to generate (the Compiler has the reasoning) and dramatically improves trust and debuggability. **§7.10.2 Display vs audit.** `human_readable_markdown` is the UI display string; `decision_points` is the auditable structure. UI surfaces the markdown; audit and quality programs use the structured array. ## §7.11 SemanticChangelog for regenerate / restructure Standard text diffs break on `regenerate` and `restructure` capabilities; a 15-page document regenerated produces an unreadable delete-everything-insert-everything diff. 
Per [P-28], V3 requires SemanticChangelog for these capabilities: ```ts SemanticChangelog { changelog_id: string artifact_ref: ArtifactRef base_version_id: string produced_version_id: string entries: Array<{ entry_kind: | "section_added" | "section_removed" | "section_combined" | "section_split" | "section_reordered" | "substance_expanded" | "substance_contracted" | "claim_modified" | "source_added" | "source_removed" | "style_change_only" description: string // human-readable, e.g., "Combined sections 2 and 3" affected_sections: SectionRef[] affected_claim_refs?: ClaimRef[] severity: "informational" | "material" | "blocking_for_review" }> produced_by_module_id: module_id schema_version: 1 } ``` **§7.11.1 Plan compilation rule.** When a RevisionPlanStep is a ModuleRevisionStep with `revision_capability_required` in `{regenerate, restructure}`, the Revision Compiler MUST: 1. Set `typed_instruction.required_output_contract.semantic_changelog_required = true` 2. Include a companion VerificationRequestStep targeting an outcome that requires the SemanticChangelog 3. Mark the resulting CandidateArtifactVersion as not-acceptable until the SemanticChangelog is present **§7.11.2 UI rule.** The plan version diff UI (§21) displays SemanticChangelog entries above the raw text diff. Text diff alone is not sufficient review for regenerate or restructure outputs. 
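
The three obligations in §7.11.1 can be sketched as a small compilation pass. This is an illustrative sketch under simplifying assumptions: the `step_kind` discriminants, the flattened `semantic_changelog_required` flag (standing in for `typed_instruction.required_output_contract.semantic_changelog_required`), and the helper names are all hypothetical, and the pass does not deduplicate companion steps that already exist.

```typescript
// Sketch only: simplified, hypothetical step shapes.
type RevisionStepSketch =
  | {
      step_kind: "module_revision";
      step_id: string;
      revision_capability_required: string;
      semantic_changelog_required: boolean;
    }
  | {
      step_kind: "verification_request";
      step_id: string;
      verifies_step_id: string;
      requires_semantic_changelog: boolean;
    };

const CHANGELOG_CAPABILITIES: readonly string[] = ["regenerate", "restructure"];

function applySemanticChangelogRule(steps: RevisionStepSketch[]): RevisionStepSketch[] {
  const compiled: RevisionStepSketch[] = [];
  steps.forEach((step) => {
    if (
      step.step_kind === "module_revision" &&
      CHANGELOG_CAPABILITIES.indexOf(step.revision_capability_required) !== -1
    ) {
      // Rule 1: require the changelog in the output contract.
      compiled.push({ ...step, semantic_changelog_required: true });
      // Rule 2: add a companion verification step for the changelog.
      compiled.push({
        step_kind: "verification_request",
        step_id: `${step.step_id}.changelog_check`,
        verifies_step_id: step.step_id,
        requires_semantic_changelog: true,
      });
    } else {
      compiled.push(step);
    }
  });
  return compiled;
}

// Rule 3: a candidate is not acceptable until the required changelog exists.
function isCandidateAcceptable(
  changelogRequired: boolean,
  semanticChangelogRef: string | undefined,
): boolean {
  return !changelogRequired || semanticChangelogRef !== undefined;
}
```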
## §7.12 RevisionStepExecutionRecord Per-step execution record: ```ts RevisionStepExecutionRecord { record_id: string plan_id: string step_id: string dispatcher_state_at_dispatch: RevisionDispatcherState dispatch_time: ISO8601 module_response_time?: ISO8601 completion_time?: ISO8601 idempotency_key: string // deterministic, §11.8 artifact_version_precondition_id: string artifact_version_precondition_satisfied: boolean live_artifact_hash_at_dispatch: string // §11.20 receipt_ref?: RevisionOperationReceiptRef retry_count: number preempted: boolean // §11.16 local compute budget cost_breakdown: EvaluationRevisionCostBreakdown schema_version: 1 } ``` ## §7.13 ArtifactMutationPrecondition (per [E7]) Every revision step that mutates an artifact carries an explicit `ArtifactMutationPrecondition` declaring what the dispatcher must verify *before* committing the mutation. The precondition is the cross-cutting record that ties together version control (§11.11), live-edit hash check (§11.20), policy gating (§11.19), and taint protection (§15.10). EC enforces the precondition at write time; failure produces a typed `MutationPreconditionViolation` event. 
```ts ArtifactMutationPrecondition { artifact_id: string expected_base_version_id: string // version the plan compiled against expected_content_hash: string // SHA-256 of normalized content at compile time expected_evaluation_snapshot_ref?: StorageRef // optional binding to EvaluationSnapshot policy_decision_required: boolean // per §11.19 PolicyDecision gate required_policy_decision_ref?: PolicyEvaluationRef // pre-resolved policy decision taint_class_acceptable: TaintClass[] // taint classes the precondition permits taint_clearance_required?: TaintClearanceRecord // if mutation requires cleared payload capability_version_at_compile: Record // versions used to compile cil_authority_snapshot_ref: StorageRef // authority memory state at compile time on_precondition_failure: | "abort_step" // mark step failed, propagate to plan | "retry_with_current_state" // re-fetch live state and retry once | "force_user_decision" // gate the dispatch for explicit user action | "rollback_and_replan" // unwind candidate chain and force recompile schema_version: 1 } MutationPreconditionViolation { violation_id: string precondition_ref: ArtifactMutationPrecondition step_id: string violation_kind: | "version_moved" // expected_base_version_id no longer current | "content_hash_mismatch" // live-edit detected | "policy_decision_missing" // policy gate not resolved | "policy_decision_stale" // policy decision invalidated | "taint_class_unacceptable" // current taint exceeds acceptable set | "taint_clearance_missing" // required clearance not present | "capability_version_drift" // capability moved since compile | "cil_authority_drift" // authority memory moved since compile observed_state: { current_version_id?: string current_content_hash?: string current_taint_class?: TaintClass current_capability_versions?: Record current_cil_authority_snapshot_ref?: StorageRef } resolution_taken: | "step_aborted" | "step_retried" | "user_gated" | "plan_rolled_back" detected_at: ISO8601 
schema_version: 1 } ``` **Where preconditions are checked:** - **Plan dispatch (§11.5):** dispatcher checks all preconditions for the first step before any mutation. Plan-level read set (§11.9) aggregates step preconditions. - **Step dispatch (§11.5):** before each step that mutates state, the dispatcher re-checks the step's precondition against current state. - **Direct fix (§10):** direct fixes carry a precondition referencing the EvaluationSnapshot the fix was authorized against. - **Candidate version acceptance (§11.11):** acceptance produces a RevisionOperationReceipt that includes the resolved precondition state at acceptance time. **Why preconditions are typed (not implicit):** V2 had implicit assumptions ("the artifact hasn't moved since I last looked") scattered across multiple call sites. V3.1 makes them explicit, named, and enforced at a single boundary. The dispatcher does not check preconditions ad hoc; it reads the typed `ArtifactMutationPrecondition` on each step and verifies all declared conditions. **Cross-reference:** §5.16 EvaluationSnapshot (the snapshot that originates the precondition), §11.6 RevisionOperationReceipt (the receipt that closes the precondition), §11.19 PolicyDecision gate (one condition in the precondition), §11.20 live-edit hash check (the runtime check of content_hash). --- # §8. SPECIALIST SUBEVALUATORS Specialist subevaluators are execution sub-agents the Outcome Compiler may compose into the resolved evaluation plan. They differ from advisory sub-agents: specialists actually perform evaluation work (produce findings, verification records), while advisory sub-agents inform the Compiler's reasoning. ## §8.1 Specialist roles For a legal brief, the Compiler may compose: - **Coverage evaluator** — did the brief respond to all material arguments? - **Source evaluator** — are citations, propositions, quotes, and record references accurate? 
- **Writing / style evaluator** — does it follow the user's style and professional writing expectations? - **Adversarial evaluator** — what would defense counsel or the judge attack? - **Format / readiness evaluator** — does the artifact satisfy structural / output requirements? - **Process / trace evaluator** — did the task actually run required research, source verification, or tools? - **Hard-call evaluator** — what strategic / professional choices were made that require user attention? Specialists are domain-tuned. Adversarial evaluator for a legal brief differs from adversarial evaluator for a contract; the Compiler selects based on outcome kind and domain tags. ## §8.2 Specialist activation A specialist is named in `EvaluationLanePlan.specialist_agent_ref`. At evaluation runtime: 1. The Evaluator reads the resolved plan 2. For each lane with a specialist_agent_ref, the Evaluator dispatches to the specialist module 3. The specialist receives a task-scoped context pack (per §8.3) 4. The specialist produces lane results (findings, verification records) 5. The Evaluator aggregates lane results into the OutcomeEvaluationResult ## §8.3 Scoped context packs Specialist subevaluators receive task-scoped context packs. They MUST NOT receive raw global context by default. The Outcome Compiler (or DOC24 packet assembly per [§A11a]) packages context including: - Target artifact section(s) relevant to the lane - Required supporting materials (with taint labels per §15.10) - Tool access for the lane's `tool_refs` - Per-invocation cost budget - Timeout - Lane goal and rationale The Compiler decides what to include. Specialists do not request additional context at runtime; if context is insufficient, they emit `EvaluationLimitationKind.insufficient_evidence` and the Evaluator handles the limitation. 
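
The activation flow in §8.2 and the no-runtime-context-requests rule in §8.3 might look roughly like the sketch below. All type and function names are hypothetical simplifications; in particular, the context pack here carries only a few of the listed fields, specialists are modeled as plain synchronous functions, and throwing on an unregistered specialist is an assumption rather than specified behavior.

```typescript
// Sketch only: simplified, hypothetical shapes for lanes and context packs.
type LaneContextPackSketch = {
  lane_goal: string;
  artifact_sections: string[];
  timeout_ms: number;
  cost_budget_usd: number;
};

type LaneOutputSketch = {
  findings: string[];
  limitation?: "insufficient_evidence";
};

type LaneResultSketch = LaneOutputSketch & { lane_id: string };

type SpecialistFn = (pack: LaneContextPackSketch) => LaneOutputSketch;

type LanePlanSketch = {
  lane_id: string;
  specialist_agent_ref?: string;
  context_pack: LaneContextPackSketch;
};

// Dispatches each lane that names a specialist and collects lane results.
function runSpecialistLanes(
  lanes: LanePlanSketch[],
  specialists: Map<string, SpecialistFn>,
): LaneResultSketch[] {
  const results: LaneResultSketch[] = [];
  lanes.forEach((lane) => {
    if (lane.specialist_agent_ref === undefined) {
      return; // no specialist named; nothing to dispatch for this lane
    }
    const specialist = specialists.get(lane.specialist_agent_ref);
    if (specialist === undefined) {
      throw new Error(`no specialist registered for ${lane.specialist_agent_ref}`);
    }
    // The specialist sees only its scoped pack, never global context.
    const output = specialist(lane.context_pack);
    results.push({ lane_id: lane.lane_id, ...output });
  });
  return results;
}

// Example specialist: reports a limitation instead of asking for more context.
const coverageSpecialistSketch: SpecialistFn = (pack) =>
  pack.artifact_sections.length === 0
    ? { findings: [], limitation: "insufficient_evidence" }
    : { findings: pack.artifact_sections.map((section) => `reviewed: ${section}`) };
```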
## §8.4 AdvisorySubAgentProfile Both specialist subevaluators and advisory sub-agents are registered via AdvisorySubAgentProfile records: ```ts AdvisorySubAgentProfile { advisory_agent_id: string display_name: string version: semver owner_scope: "system" | "workspace" | "matter" | "user" | "firm" agent_identity_ref: AgentIdentityRef capability_ref?: CapabilityRef allowed_coordination_points: Array< | "outcome_compiler" | "evaluator" | "revision_compiler" | "feedback_interpreter" > allowed_input_classes: TaintClass[] forbidden_input_classes: TaintClass[] output_contract_ref: SchemaRef // one or more from AdvisorySubAgentOutput max_context_tokens: number max_cost_usd: number max_local_compute_seconds: number timeout_ms: number calibration_record_ref?: string governance_policy_ref: string reputation_score: number // §15.8 sandbox_mode: boolean // §15.8.1 schema_version: 1 } ``` ## §8.5 AdvisorySubAgentOutput union Advisory output is evidence, not instruction. The Compiler must accept, reject, or defer each sub-agent's advice. 
Output conforms to one of: ```ts AdvisorySubAgentOutput = | EvaluationFinding | RevisionDiagnosis | OutcomeCompilationAssessment | EvaluationMethodBindingAssessment | ThresholdExtractionAssessment | SourceBindingAssessment | FeedbackInterpretationAssessment | PlanAssuranceCritique OutcomeCompilationAssessment { outcome_kind_inferred: string ambiguity_notes: string[] alternative_interpretations: string[] confidence: number } EvaluationMethodBindingAssessment { proposed_method: EvaluationMethod alternative_methods: EvaluationMethod[] rationale: string capability_availability: Record } ThresholdExtractionAssessment { proposed_threshold: number | string threshold_kind: "quantitative" | "qualitative" | "categorical" source_phrase: string // text from outcome that supports inference alternative_thresholds: Array<{ value: any, rationale: string }> confidence: number } SourceBindingAssessment { proposed_source_refs: SourceBindingRef[] source_compatibility: "fully_compatible" | "partial" | "incompatible" missing_sources: string[] staleness_notes: string[] } FeedbackInterpretationAssessment { proposed_feedback_kind: FeedbackKind proposed_authority_class: HumanFeedbackAuthorityClass proposed_scope_candidates: ScopeCandidate[] privilege_risk: "none" | "low" | "medium" | "high" } PlanAssuranceCritique { identified_failure_modes: string[] proposed_tighter_alternatives: string[] preservation_violations_detected: PreservationViolationRef[] estimated_regression_risk: "low" | "medium" | "high" } ``` Each AdvisorySubAgentProfile.output_contract_ref is an array; the profile declares which variants the sub-agent may emit. Schema validator rejects outputs not in the declared list with `validation.output_contract_violation`, and counts the violation against the sub-agent's reputation (§15.8). 
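
To make the output-contract check concrete, the following sketch validates an emitted variant against the variants a profile declares. It assumes the declared contract can be reduced to a list of variant names, and it models the reputation effect as a simple counter; `declared_output_variants`, `contract_violation_count`, and `validateAdvisoryOutput` are hypothetical names, and the actual reputation mechanics are defined in §15.8, not here.

```typescript
// Sketch only: variant names mirror the AdvisorySubAgentOutput union.
type AdvisoryOutputVariant =
  | "EvaluationFinding"
  | "RevisionDiagnosis"
  | "OutcomeCompilationAssessment"
  | "EvaluationMethodBindingAssessment"
  | "ThresholdExtractionAssessment"
  | "SourceBindingAssessment"
  | "FeedbackInterpretationAssessment"
  | "PlanAssuranceCritique";

type AdvisoryProfileSketch = {
  advisory_agent_id: string;
  declared_output_variants: AdvisoryOutputVariant[]; // stands in for output_contract_ref
  contract_violation_count: number; // hypothetical input to reputation scoring
};

type OutputValidationResult =
  | { ok: true; profile: AdvisoryProfileSketch }
  | { ok: false; code: "validation.output_contract_violation"; profile: AdvisoryProfileSketch };

// Accepts declared variants; rejects anything else and records the violation.
function validateAdvisoryOutput(
  profile: AdvisoryProfileSketch,
  emittedVariant: AdvisoryOutputVariant,
): OutputValidationResult {
  if (profile.declared_output_variants.indexOf(emittedVariant) !== -1) {
    return { ok: true, profile };
  }
  return {
    ok: false,
    code: "validation.output_contract_violation",
    profile: {
      ...profile,
      contract_violation_count: profile.contract_violation_count + 1,
    },
  };
}
```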
## §8.6 Sub-Agent Registry Governance Requirements for user-defined and system sub-agents: - **Versioned definitions** via AdvisorySubAgentProfile (semver) - **Structured output contract validation** per §8.5 - **Sandbox mode for new sub-agents** — advise but results not used in production plans until manually approved - **Reputation scoring** per §15.8 - **Hard limit per iteration** (default 2-3 sub-agents per Compiler invocation; configurable in RevisorConfig) - **Automatic flagging** of sub-agents with consistently failed advice (per §15.8) - **Audit trail** of every sub-agent invocation: who consulted, what was asked, what was returned, what the Compiler did with the advice ## §8.7 User-defined custom advisory sub-agents Users register custom domain sub-agents via the registry. Examples for Will's legal practice: - "Securities Litigation Strategy Planner" - "Bluebook Citation Specialist" - "M&A Diligence Checklist Reviewer" Subject to §8.6 governance. The Compiler consults custom sub-agents based on failure kind, outcome class, matter tags, and goal type. ## §8.8 Hardware-aware sub-agent degradation On Apple Silicon, the Local Hardware Context Monitor tracks memory pressure during sub-agent orchestration. If pressure exceeds threshold: ```ts LocalHardwareContext { memory_pressure_pct: number // 0-100 available_vram_mb: number cpu_pressure_pct: number max_parallel_sub_agents_safe: number recommended_mode: | "full_parallel" | "sequential" | "single_compiler_only" } ``` Rules (default threshold 80% VRAM): 1. Above threshold: degrade parallel sub-agent reasoning to sequential 2. Still above: strip lowest-reputation advisory sub-agents and fall back to single Compiler 3. Hard limit reached: abort plan with `local_resource_exhausted` error 4. Every degradation produces a `hardware_degradation_event` receipt for audit Per §3.3.6, degradation skips optional helpers first; required safety modes per PlanAssurancePolicy are non-degradable. --- # §9. 
revision_in PORT CONTRACT

The `revision_in` port is the standard port for revision-capable modules. It is the architecture's central safety contract: meaning-bearing repair routes through `revision_in`, never around it.

## §9.1 Port semantics

**§9.1.1 Standard typed payload.** Modules with revision capability accept a standard typed payload via `revision_in`:

```ts
RevisionInPayload {
  payload_id: string
  typed_instruction: TypedRevisionInstruction   // §7.6
  source_workspace_snapshot_ref: StorageRef
  injection_manifest_refs: StorageRef[]
  artifact_version_snapshot_ref: StorageRef
  policy_decision_ref: PolicyDecisionRef
  safety_envelope_ref: StorageRef
  cil_authority_snapshot_ref: StorageRef
  schema_version: 1
}
```

**§9.1.2 Return contract.** The module returns a `RevisionExecutionReceipt` (§9.4) and either produces a new CandidateArtifactVersion (default, §11.11) or modifies the artifact in place (only in `rolling_hash_in_place` mode, §11.20).

**§9.1.3 Stateless port semantics.** Receiving a revision instruction does not change the module's persistent state beyond the produced artifact version and receipt. Modules are not Revisor-aware; they execute the instruction against their declared capability.

**§9.1.4 Receipt is mandatory.** A module that does not emit a `RevisionExecutionReceipt` is treated as having an `execution_status` of `failed_runtime`. The Dispatcher synthesizes an exception receipt (§9.6) for the audit trail.
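
The receipt-is-mandatory rule can be sketched as a small guard on the Dispatcher side. This is only an illustration: `ReceiptSketch`, `DispatchSketch`, the `synthesized` marker, and `ensureReceipt` are hypothetical names, and the two reasons used are a subset of those listed for exception receipts in §9.6.

```typescript
// Sketch only: a reduced receipt shape with a hypothetical "synthesized" marker.
type ReceiptSketch = {
  plan_id: string;
  plan_step_id: string;
  module_id: string;
  execution_status: string;
  synthesized: boolean;
  synthesized_reason?: "module_timeout" | "no_response_received";
};

type DispatchSketch = {
  plan_id: string;
  plan_step_id: string;
  module_id: string;
};

// Returns the module's own receipt when present; otherwise synthesizes a
// failed_runtime receipt so that every dispatched step has an audit record.
function ensureReceipt(
  dispatch: DispatchSketch,
  moduleReceipt: ReceiptSketch | undefined,
  timedOut: boolean,
): ReceiptSketch {
  if (moduleReceipt !== undefined) {
    return moduleReceipt;
  }
  return {
    plan_id: dispatch.plan_id,
    plan_step_id: dispatch.plan_step_id,
    module_id: dispatch.module_id,
    execution_status: "failed_runtime",
    synthesized: true,
    synthesized_reason: timedOut ? "module_timeout" : "no_response_received",
  };
}
```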
## §9.2 ModuleRevisionCapability Each revision-capable module declares one or more `ModuleRevisionCapability` records: ```ts ModuleRevisionCapability { capability_id: string // stable across versions capability_version: semver // increments on contract change module_type: string revision_operation_kind: RevisionOperationKind input_schema_ref: SchemaRef input_schema_version: number output_schema_ref: SchemaRef output_schema_version: number supported_preservation_constraints: PreservationConstraintKind[] required_preconditions: PreconditionKind[] side_effect_policy_ref: RevisionSideEffectPolicyRef idempotency_semantics: | "idempotent" | "idempotent_with_key" | "non_idempotent" safety_class: | "mechanical" | "meaning_bearing" | "external_side_effect" instruction_in_revision_compatible: boolean // for P1 lint rule schema_version: 1 } ``` **§9.2.1 Capability versioning.** Each revision capability is identified by stable `capability_id` and `capability_version`. A module supporting `regenerate` at v1.2 does not satisfy a plan step requiring `regenerate` at v2.0 unless explicitly declared compatible. The capability_version increments when the input schema, output contract, idempotency semantics, preservation constraints, or side-effect behavior changes. **§9.2.2 Safety class.** `mechanical` capabilities perform deterministic transformations (whitespace normalization, citation reformatting). `meaning_bearing` capabilities produce semantic changes (regenerate, restructure, claim_modify). `external_side_effect` capabilities cause non-reversible external effects (email send, filing submission). Safety class determines candidate version policy (§11.11), human gate defaults (§6.6), and side-effect controls (§11.18). **§9.2.3 Strategy ↔ capability mapping.** A `RepairStrategyKind` is not a capability name. 
Multiple capabilities may implement the same strategy: ```ts RepairStrategyCapabilityMap { strategy: RepairStrategyKind compatible_capabilities: Array<{ capability_id: string capability_version_constraint: semver_range confidence: "low" | "medium" | "high" }> } ``` The Revision Compiler selects strategy → resolves to compatible capabilities → chooses best-fit capability based on availability, version, and historical reputation. ## §9.3 Capability semantic constraints Capabilities declare what they can preserve and what preconditions they require: ```ts PreservationConstraintKind = | "structural_sections" | "claim_set" | "citation_refs" | "voice_and_style" | "factual_assertions" | "outcome_passing_state" | "metadata_tags" PreconditionKind = | "artifact_version_match" | "evidence_present" | "policy_decision_obtained" | "user_approval_present" | "upstream_module_completed" | "no_concurrent_modification" | "snapshot_hash_match" ``` The Revision Compiler matches plan requirements against capability declarations. A plan requiring `preserve(citation_refs)` cannot dispatch to a capability that does not declare `citation_refs` in `supported_preservation_constraints`; deterministic linting (§11.3) rejects such plans. 
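
A minimal sketch of the preservation-matching check follows, assuming the plan's requirements and the capability's declarations are both available as plain arrays. `CapabilityDeclarationSketch` and both function names are hypothetical; the sketch only reports which required constraints are unsupported and leaves the lint rejection itself to the caller.

```typescript
// Sketch only: the kinds mirror PreservationConstraintKind from this section.
type PreservationConstraintKind =
  | "structural_sections"
  | "claim_set"
  | "citation_refs"
  | "voice_and_style"
  | "factual_assertions"
  | "outcome_passing_state"
  | "metadata_tags";

type CapabilityDeclarationSketch = {
  capability_id: string;
  supported_preservation_constraints: PreservationConstraintKind[];
};

// Lists the required constraints the capability does not declare.
function unsupportedPreservationConstraints(
  required: PreservationConstraintKind[],
  capability: CapabilityDeclarationSketch,
): PreservationConstraintKind[] {
  return required.filter(
    (kind) => capability.supported_preservation_constraints.indexOf(kind) === -1,
  );
}

// A plan step may dispatch only when nothing required is unsupported.
function canDispatchWithPreservation(
  required: PreservationConstraintKind[],
  capability: CapabilityDeclarationSketch,
): boolean {
  return unsupportedPreservationConstraints(required, capability).length === 0;
}
```

Returning the list of missing kinds, rather than a bare boolean, gives the linter something specific to report when it rejects a plan.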
## §9.4 RevisionExecutionReceipt (ModuleRevisionResultPayload) The local payload inside the `RevisionOperationReceipt` envelope (§11.6) is the `ModuleRevisionResultPayload`: ```ts ModuleRevisionResultPayload { receipt_id: string plan_id: string plan_step_id: string module_id: module_id capability_used: string capability_version_used: semver lifecycle: ReceiptLifecycle // §0.4.8 execution_status: ExecutionStatus // §0.4.8 // Findings handling addressed_findings: finding_id[] unresolved_findings: finding_id[] newly_discovered_findings: finding_id[] // Artifact outputs output_artifact_version_refs: ArtifactVersionRef[] candidate_artifact_version_refs: CandidateArtifactVersionRef[] // Semantic changelog (when capability is regenerate or restructure, §7.11) semantic_changelog_ref?: StorageRef // Decision trace (if module emits, §5.10) decision_trace_record_refs: StorageRef[] // Cost cost_breakdown: EvaluationRevisionCostBreakdown // Execution details execution_start_time: ISO8601 execution_end_time: ISO8601 schema_version: 1 } ``` **§9.4.1 Status-dependent required fields.** - `completed`: `output_artifact_version_refs` non-empty; `addressed_findings` covers all `typed_instruction.findings_to_address` - `partially_completed`: `addressed_findings` and `unresolved_findings` both non-empty (the partial split is explicit) - `could_not_fix`: `addressed_findings` may be empty; `unresolved_findings` non-empty; rationale required - `rejected_capability`: capability mismatch; module produced no artifact - `version_conflict`: precondition failed; module produced no artifact - `needs_more_information`: companion `information_request` produced - `failed_runtime`: runtime exception; module exited abnormally - `receipt_recovery_required`: artifact written but receipt write failed (per §11.17) - `candidate_orphan_repair_required`: candidate written but index update failed (per §11.17) Schema validator enforces required fields per status. 
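
The status-dependent rules in §9.4.1 lend themselves to a per-status validator. The sketch below covers only three of the listed statuses and uses a reduced payload; the `rationale` field is a hypothetical placeholder, since the schema above does not show where the required rationale is stored, and the problem strings are illustrative rather than defined validation codes.

```typescript
// Sketch only: a reduced payload covering three of the §9.4.1 statuses.
type ReceiptStatusSketch = "completed" | "partially_completed" | "could_not_fix";

type ResultPayloadSketch = {
  execution_status: ReceiptStatusSketch;
  addressed_findings: string[];
  unresolved_findings: string[];
  output_artifact_version_refs: string[];
  rationale?: string; // hypothetical field for the required rationale
};

// Returns a list of human-readable problems; an empty list means valid.
function validateStatusFields(
  payload: ResultPayloadSketch,
  findingsToAddress: string[],
): string[] {
  const problems: string[] = [];
  switch (payload.execution_status) {
    case "completed": {
      if (payload.output_artifact_version_refs.length === 0) {
        problems.push("completed: output_artifact_version_refs is empty");
      }
      const missing = findingsToAddress.filter(
        (findingId) => payload.addressed_findings.indexOf(findingId) === -1,
      );
      if (missing.length > 0) {
        problems.push(`completed: findings not addressed: ${missing.join(", ")}`);
      }
      break;
    }
    case "partially_completed":
      if (payload.addressed_findings.length === 0 || payload.unresolved_findings.length === 0) {
        problems.push("partially_completed: both finding lists must be non-empty");
      }
      break;
    case "could_not_fix":
      if (payload.unresolved_findings.length === 0) {
        problems.push("could_not_fix: unresolved_findings is empty");
      }
      if (payload.rationale === undefined || payload.rationale.length === 0) {
        problems.push("could_not_fix: rationale is required");
      }
      break;
  }
  return problems;
}
```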
## §9.5 ModuleAck and artifact versioning The receipt's `lifecycle` field tracks the protocol-level state: - `received`: module received the dispatch; processing - `accepted`: module accepted the capability and parameters as valid - `rejected`: module rejected (capability or precondition issue); typed in `execution_status` - `completed`: module finished work; typed in `execution_status` For artifacts: - New artifact version is referenced in `output_artifact_version_refs` - For candidate-version mode (default), the candidate is in `candidate_artifact_version_refs` - The artifact version state (§11.11) determines whether the version is current, candidate, accepted, rejected, etc. ## §9.6 Exception synthesized acks When a module does not respond, the Dispatcher synthesizes an exception receipt: ```ts ExceptionSynthesizedReceiptPayload { synthesized_at: ISO8601 reason: | "module_timeout" | "module_unavailable" | "no_response_received" | "malformed_response" | "capability_mismatch" plan_id: string plan_step_id: string module_id: module_id capability_attempted: string timeout_threshold_ms?: number malformed_response_ref?: StorageRef schema_version: 1 } ``` The exception payload is wrapped in a `RevisionOperationReceipt` with `execution_status = "failed_runtime"`. This ensures every dispatched step has a receipt, even when modules misbehave. ## §9.7 Coding module revision dispatch policy `step.coding` is an ACP session with terminal, filesystem, and shell access. Ordinary TaskSecurityPolicy assumptions do not apply because the ACP runtime surface is broader than standard modules. 
Per [P-38]: ```ts CodingRevisionCapabilityPolicy { may_receive_revision_in: boolean requires_acp_profile_revision_safe: boolean allowed_workspace_roots: string[] may_run_shell_commands: boolean allowed_test_commands: string[] requires_candidate_diff_only: boolean requires_human_gate_for_file_writes: boolean doc11_acp_monitoring_required: boolean schema_version: 1 } ``` **§9.7.1 Default policy.** ``` DEFAULT CodingRevisionCapabilityPolicy: may_receive_revision_in: false // explicit opt-in required requires_acp_profile_revision_safe: true may_run_shell_commands: false // even when revision-safe requires_candidate_diff_only: true requires_human_gate_for_file_writes: true doc11_acp_monitoring_required: true ``` **§9.7.2 ACP profile.** Coding modules may not receive revision dispatch unless their ACP profile is explicitly declared `revision_safe = true` via DOC11 ACP integration. This is registered as a cross-doc obligation (§29 OBL-DOC11-ACP-01). **§9.7.3 Validation.** Deterministic linting rejects plans dispatching to `step.coding` revision_in unless the policy permits. Validation code `validation.coding_dispatch_not_revision_safe`. ## §9.8 instruction_in is not a revision target DOC23 R3.1's `instruction_in` is a dynamic instruction port for ordinary module operation. It is restricted in many output modules and is not a revision channel. **§9.8.1 Rule.** `instruction_in` is never a revision target unless the target module declares `ModuleRevisionCapability.instruction_in_revision_compatible = true`. Even then, the plan step must be a `ModuleRevisionStep` (not `InformationRequestStep`) routed explicitly to that capability. **§9.8.2 Validation.** Deterministic linting (§11.3) enforces the rule. Validation code `validation.instruction_in_not_revision_compatible` when violated. --- # §10. DIRECT FIX Direct fix is a narrow exception to the rule that meaning-bearing repair goes through `revision_in`. It is restricted to class-safe mechanical changes only. 
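
As an orientation for the subsections that follow, the sketch below shows one way the gating in §10.2 and §10.3 might be ordered: class eligibility first, the remaining conditions next, and the character cap last. Everything here is an assumption for illustration: the boolean inputs are presumed to be computed upstream (for example, `judgment_based_outcome` is taken to already account for `human_confirmed_in_run`), and the `failed_condition` labels are hypothetical, not defined validation codes.

```typescript
// Sketch only: inputs are pre-computed booleans; labels are hypothetical.
type RevisorConfigSketch = {
  direct_fix_allowed_classes: string[];
  direct_fix_forbidden_classes: string[];
  direct_fix_max_chars: number;
};

type DirectFixCandidateSketch = {
  fix_class: string;
  changed_chars: number;
  judgment_based_outcome: boolean; // condition 3
  human_judgment_needed: boolean; // condition 4
  blocking_judgment_limitation: boolean; // condition 5
  policy_decision: string; // condition 6: must be exactly "allow"
  privileged_without_allowance: boolean; // condition 7
};

type DirectFixVerdict =
  | { eligible: true }
  | { eligible: false; failed_condition: string };

function evaluateDirectFix(
  config: RevisorConfigSketch,
  fix: DirectFixCandidateSketch,
): DirectFixVerdict {
  // Primary gate: class eligibility (conditions 1 and 2) is checked first.
  if (config.direct_fix_allowed_classes.indexOf(fix.fix_class) === -1) {
    return { eligible: false, failed_condition: "class_not_allowed" };
  }
  if (config.direct_fix_forbidden_classes.indexOf(fix.fix_class) !== -1) {
    return { eligible: false, failed_condition: "class_forbidden" };
  }
  if (fix.judgment_based_outcome) {
    return { eligible: false, failed_condition: "judgment_based_outcome" };
  }
  if (fix.human_judgment_needed) {
    return { eligible: false, failed_condition: "human_judgment_needed" };
  }
  if (fix.blocking_judgment_limitation) {
    return { eligible: false, failed_condition: "blocking_judgment_limitation" };
  }
  if (fix.policy_decision !== "allow") {
    return { eligible: false, failed_condition: "policy_not_allow" };
  }
  if (fix.privileged_without_allowance) {
    return { eligible: false, failed_condition: "privileged_artifact" };
  }
  // Secondary cap: applies only to fixes that are otherwise eligible.
  if (fix.changed_chars > config.direct_fix_max_chars) {
    return { eligible: false, failed_condition: "exceeds_max_chars" };
  }
  return { eligible: true };
}
```

A fix that fails any condition would route through `revision_in` instead; that routing is outside the sketch.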
## §10.1 Allowed and forbidden classes ``` DirectFixAllowedClass = | "formatting_only" | "typographical_correction" | "punctuation_correction" | "metadata_label_update" | "citation_format_only" | "broken_link_repair_no_text_change" | "whitespace_or_heading_style" DirectFixForbiddenClass = | "citation_substitution" | "legal_authority_substitution" | "deadline_or_date_change" | "party_name_change" | "claim_or_argument_rewrite" | "factual_assertion_change" | "strategic_framing_change" ``` ## §10.2 Class-safe eligibility Direct fix is permitted if and only if ALL of: 1. The fix's class is in `RevisorConfig.direct_fix_allowed_classes` 2. The fix's class is NOT in `RevisorConfig.direct_fix_forbidden_classes` 3. The target outcome's AssuranceBasis is not judgment-based (NOT in `{llm_expert_judgment, specialist_panel_judgment}` unless `human_confirmed_in_run`) 4. The target outcome has no `EvaluationLimitationKind.human_judgment_needed` 5. The artifact section affected does not carry a `JudgmentLimitationRecord` with severity `block_advisory` or `block_required` 6. The PolicyDecision for the operation is `allow` (not `allow_with_human_gate`) 7. The artifact is not classified privileged unless explicit policy allowance **§10.2.1 Forbidden on judgment outcomes (per [F3a]).** Class-safe is necessary but not sufficient. Even a formatting-only fix on a judgment-based outcome routes through `revision_in` if any other condition fails. This prevents direct fix from being used to silently alter the artifact accompanying a sensitive judgment. ## §10.3 direct_fix_max_chars as secondary cap `RevisorConfig.direct_fix_max_chars` is a secondary upper bound after class-safe eligibility. It never authorizes a fix that fails `DirectFixAllowedClass` or `PolicyDecision`. **§10.3.1 Validation.** Validation code `validation.direct_fix_size_used_as_primary_gate` fires if implementation logic checks character count before class eligibility. 
The class is the primary gate; characters are an additional limit on otherwise-eligible fixes. ## §10.4 Tracked changes Direct fix produces tracked changes recorded in `RevisionOperationReceipt`: ```ts DirectFixTrackedChange { fix_id: string artifact_ref: ArtifactRef base_version_id: string produced_version_id: string fix_class: DirectFixAllowedClass before_text: string after_text: string section_ref: SectionRef schema_version: 1 } ``` Each direct fix produces a `RevisionOperationReceipt` with `operation_kind = "direct_fix_applied"` and the tracked change as local payload. ## §10.5 Downstream dirty flip Per [E18], when a direct fix is applied to an artifact, downstream outcomes that declare a dependency on the artifact transition to `OutcomeEvaluationState.dirty`. The Loop Controller re-evaluates dirty outcomes per §11.21. **§10.5.1 Class-safe still dirties.** Even mechanical fixes dirty downstream outcomes. The system does not assume that a "purely formatting" change cannot affect a downstream evaluation; whether it does is for the downstream evaluator to decide on revalidation. ## §10.6 Policy decision required Per §3.8.2, every mutation requires a PolicyDecision. Direct fix is a mutation. Even class-safe direct fixes require: - `PolicyDecision.decision == "allow"` (not `allow_with_human_gate`) - A typed `RevisionOperationReceipt` referencing the policy decision A direct fix without a PolicyDecision fails deterministic linting with `validation.missing_policy_decision`. --- # §11. REVISION RUNTIME KERNEL The Revision Runtime Kernel is the deterministic execution surface for revision plans. It owns the Dispatcher state machine, idempotency, transaction protocol, candidate version management, policy gates, side-effect controls, and failure handling. 
**Canonical ownership (per [A4s]).** §11 is the single canonical section that owns these runtime primitives: - `RevisionDispatcherProjection` (§11.1) - `RevisionDispatcherStateMachine` and the four-enum separation (§11.2) - `ArtifactMutationProtocol` (§11.7), consuming `ArtifactMutationPrecondition` per §7.13 - `CandidateArtifactVersionProtocol` (§11.11) - `IdempotencyAndReplayPolicy` (§11.8) - `ConflictResolutionPolicy` (§11.9 PlanWriteSet/PlanReadSet) - `SideEffectPolicy` (§11.18) - `PolicyDecisionGate` (§11.19) - `EvaluationSnapshotProtocol` (§5.16, consumed at §11.20) - `RevalidationClosurePolicy` (§11.21) - `GraphStateRollbackProtocol` (§11.13) - `RevisionOperationReceipt` envelope (§11.6, extends PBE per §29.5 OBL-DOC73-RECEIPT-01) - Failure-of-failure handlers (§11.17) Feature sections (§6 Revisor, §7 Schemas, §9 revision_in port, §10 Direct Fix, §12 Workspace, §13 Patterns, §14 Feedback, §15 Quality, §16 Governance, §17 Sub-agent coordination, §18 Teach-from-feedback UI, §21 UI surfaces) consume from this kernel. Feature sections do not redefine kernel primitives. If a feature section appears to redefine a kernel primitive, the kernel definition governs; the apparent redefinition is a spec drift that must be resolved per §0A.10. ## §11.1 RevisionDispatcherProjection The Dispatcher is a derived runtime service, NOT a stored graph module. 
It has a canvas read-model called RevisionDispatcherProjection:

```ts
RevisionDispatcherProjection {
  projection_id: string
  task_id: string
  run_id: string
  projection_state: RevisionDispatcherState  // MUST equal current dispatcher state (§11.2)
  current_plan_id?: string
  active_step_ids: string[]
  pending_human_gate_step_ids: string[]
  pending_hard_call_ids: string[]
  pending_dependency_outcome_ids: outcome_id[]
  recent_receipts: RevisionOperationReceiptRef[]
  recent_failure_events: RevisionFailureEvent[]
  last_updated_at: ISO8601
  schema_version: 1
}
```

**§11.1.1 Derived, not invented.** The projection is computed from the underlying RevisionPlan, RevisionExecutionRecord, and RevisionOperationReceipt streams. It is not an independent state machine. UI and the canvas read from the projection; they do not write to it.

**§11.1.2 No hidden graph behavior.** The Dispatcher MUST NOT add behavior beyond what's declared in graph wiring. It exists only when explicit Evaluator/Revisor/module wiring is present in the graph. There is no Dispatcher when there is no revision-capable wiring.

## §11.2 State machines

V3 separates four distinct enums (per [P-23]). Each owns a different concern; they are not interchangeable.
### §11.2.1 RevisionDispatcherState (runtime state)

```
RevisionDispatcherState =
  | "idle"
  | "validating"
  | "ready"
  | "dispatching"
  | "waiting_human_gate"
  | "waiting_hard_call"
  | "waiting_dependency"
  | "revalidating"
  | "completed"
  | "escalated"
  | "aborted"
  | "rolled_back"
```

### §11.2.2 RevisionPlanStatus (plan lifecycle)

```
RevisionPlanStatus =
  | "draft"
  | "proposed"
  | "approved_for_dispatch"
  | "dispatching"
  | "partially_completed"
  | "completed"
  | "failed"
  | "superseded"
  | "cancelled"
```

### §11.2.3 RevisionFailureEventKind (typed events, not states)

```
RevisionFailureEventKind =
  | "failed_validation"
  | "step_failed"
  | "dependency_timeout"
  | "revalidation_failed"
  | "workspace_unavailable"
  | "version_conflict"
  | "preservation_violation"
  | "policy_blocked"
  | "budget_exhausted"
  | "preempted"
```

Failure events are records in the audit trail, not Dispatcher states. The Dispatcher transitions in response to a failure event (e.g., `dispatching → aborted` on `preservation_violation`); the event itself is captured separately.

### §11.2.4 RevisionUIStatus (user-facing labels)

```
RevisionUIStatus =
  | "Ready to review"
  | "In progress"
  | "Awaiting your input"
  | "Blocked"
  | "Completed"
  | "Escalated"
  | "Cancelled"
  | "Reverted"
```

UI labels are derived, not invented.
The mapping `(RevisionDispatcherState, latest RevisionFailureEventKind, RevisionPlanStatus) → RevisionUIStatus` is:

| DispatcherState | latest FailureEvent | PlanStatus | UIStatus |
|---|---|---|---|
| validating | (none) | proposed | In progress |
| validating | failed_validation | failed | Blocked |
| ready | (none) | approved_for_dispatch | Ready to review |
| dispatching | (none) | dispatching | In progress |
| waiting_human_gate | (none) | dispatching | Awaiting your input |
| waiting_hard_call | (none) | dispatching | Awaiting your input |
| waiting_dependency | (none) | dispatching | In progress |
| revalidating | (none) | dispatching | In progress |
| completed | (none) | completed | Completed |
| completed | step_failed | partially_completed | Completed (with warnings) |
| escalated | (any) | (any) | Escalated |
| aborted | (any) | failed | Cancelled |
| rolled_back | (any) | (any) | Reverted |

## §11.3 Deterministic linting

Before dispatch, every RevisionPlan passes through deterministic linting. Lint rules are mechanical, fast, and exhaustive. A plan that fails any lint rule is rejected at this stage; it does not reach dispatching state.
### §11.3.1 Lint rule set

The complete lint rule set:

```
RULE schema_conformance:
  RevisionPlan and all RevisionPlanSteps conform to their schemas
  Discriminated union variants match declared step_kind (§7.5.2)
  All required fields populated; no extra fields

RULE port_action_coupling:
  ModuleRevisionStep.target_port == "revision_in"
  DirectFixStep.target_port == "none_direct_fix"
  HumanJudgmentRequestStep.target_port == "human_response_in"
  InformationRequestStep / VerificationRequestStep.target_port == "data_in"
    AND target module declares matching capability class
  instruction_in not allowed unless target ModuleRevisionCapability
    declares instruction_in_revision_compatible = true

RULE capability_availability:
  Every ModuleRevisionStep.target_module_id exists in current graph
  Module declares revision_capability_required at compatible capability_version
  Module is not disabled or in error state

RULE capability_preservation_match:
  Plan preservation constraints (§7.7) are within capability.supported_preservation_constraints
  Plan preconditions are satisfiable per capability.required_preconditions

RULE dag_acyclic:
  Step dependency_edges form a DAG (no cycles)
  Topological order is computable (§11.22)

RULE idempotency_key_present:
  RevisionPlan.idempotency_key is set and deterministic (§11.8)
  Every step.idempotency_key is set and deterministic

RULE artifact_version_precondition:
  Every step.target_version_precondition_ref points to a known artifact version
  Hash matches current artifact state at dispatch time (§11.20)
    OR mutation_mode == "rolling_hash_in_place" with valid rolling chain

RULE policy_decision_present:
  Every mutation step has a PolicyDecisionRef (§11.19)
  Every external_side_effect step has PolicyDecision.decision == "allow"
    OR "allow_with_human_gate"

RULE plan_assurance_satisfied:
  PlanAssurancePolicy.required_modes ⊆ PlanAssurancePolicy.completed_modes
  unmet_required_modes is empty

RULE preservation_contract_satisfied:
  All PreservationContract.protected_outcomes still passing
  All PreservationContract.protected_sections still intact

RULE write_set_well_formed:
  PlanWriteSet correctly identifies all mutation targets
  Section-level scope is declared and valid (§11.10)

RULE read_set_well_formed:
  PlanReadSet correctly identifies all reads (§11.9)
  read_graph_snapshot_hash matches current graph state
    OR matches a recent snapshot still within staleness budget

RULE hard_call_resolved_or_gated:
  Every HardRevisionCall in plan has either:
    - resolution_id with compatible binding (per §7.9.3), OR
    - HumanJudgmentRequestStep blocking dispatch

RULE cil_authority_snapshotted:
  cil_authority_snapshot_ref is set on RevisionIntelligencePacket
  cil_authority_snapshot_ref is set on every TypedRevisionInstruction
  authority_conflict_check.conflicts_found is empty OR on_conflict has been resolved

RULE explanation_trace_present:
  RevisionPlan.explanation_trace is structured per §7.10
  decision_points array non-empty for non-trivial plans

RULE custom_instruction_safe:
  Every typed_instruction.custom_instruction passes:
    - length within max_length_chars
    - prohibited_content_scan_ref clean
    - taint_class not external_untrusted unless quoted

RULE coding_dispatch_safe:
  Plan steps dispatching to step.coding modules satisfy
    CodingRevisionCapabilityPolicy (§9.7)

RULE direct_fix_class_safe:
  Every DirectFixStep.direct_fix_class in RevisorConfig.direct_fix_allowed_classes
  NOT in RevisorConfig.direct_fix_forbidden_classes
  Target outcome's AssuranceBasis not judgment-based (§10.2)

RULE goal_impact_assessment_ui_only:
  GoalImpactAssessment fields are present but NOT used as learning signals
  (Enforced by §13 PatternPerformanceSlice update rules)

RULE semantic_changelog_required_for_regenerate:
  When step_kind == "module_revision" and capability is regenerate/restructure,
    typed_instruction.required_output_contract.semantic_changelog_required == true
  Companion VerificationRequestStep present for SemanticChangelog (§7.11)

RULE step_coding_revision_safe:
  Plan steps dispatching to step.coding.revision_in require
    target module's CodingRevisionCapabilityPolicy.may_receive_revision_in == true
    AND target module's acp_profile is declared revision_safe per DOC11
```

### §11.3.2 Lint result

```ts
LintResult {
  plan_id: string
  passed: boolean
  failed_rules: Array<{
    rule_id: string
    rule_kind: string
    failure_detail: string
    affected_step_ids: string[]
    severity: "critical" | "warning"
  }>
  schema_version: 1
}
```

A plan with any `severity == "critical"` failure is rejected. Warnings are surfaced to the user but do not block dispatch.

### §11.3.3 Lint as compilation input

The Revision Compiler is expected to produce plans that pass linting. When lint fails, the Compiler re-runs with the lint failure as input. After `RevisorConfig.consecutive_insufficient_limit` consecutive lint failures, escalation per §6.8 fires.

## §11.4 PlanAssurancePolicy stack

Per [P-5], PlanAssurancePolicy is a stack of required modes, not a single selected mode:

```ts
PlanAssurancePolicy {
  required_modes: PlanAssuranceMode[]         // deterministic_lint ALWAYS included
  completed_modes: PlanAssuranceMode[]
  trigger_reasons: PlanAssuranceTriggerReason[]
  non_degradable_modes: PlanAssuranceMode[]   // subset of required_modes
  unmet_required_modes: PlanAssuranceMode[]   // computed: required \ completed
  verifier_participants?: AdvisorySubAgentRef[]
  forum_room_ref?: RoomRef
  human_gate_ref?: HumanGateRef
  schema_version: 1
}
```

### §11.4.1 Mode collapse

V3 collapses semantic linting, Plan Verification Agent, forum review, and human gates into a single PlanAssurancePolicy. Implementations MUST NOT introduce separate Plan Verification Agent or Forum Review primitives. They are modes of the same policy stack.
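The `unmet_required_modes` field of the §11.4 stack is described as a computed set difference (`required \ completed`). A minimal sketch of that computation follows. The mode names are the ones mentioned in §11.4.3; `AssuranceStack`, `unmetRequiredModes`, and `isDispatchable` are hypothetical names introduced only for illustration, and treating a stack without `deterministic_lint` as malformed is an assumption, not a stated rule.

```ts
// Mode names taken from §11.4.3; the full PlanAssuranceMode enum is defined elsewhere in the spec.
type PlanAssuranceMode = "deterministic_lint" | "semantic_lint" | "dry_run" | "human_gate";

// Hypothetical, trimmed-down view of PlanAssurancePolicy.
interface AssuranceStack {
  required_modes: PlanAssuranceMode[];
  completed_modes: PlanAssuranceMode[];
}

// required \ completed, keeping the order of required_modes.
function unmetRequiredModes(stack: AssuranceStack): PlanAssuranceMode[] {
  return stack.required_modes.filter(
    (mode) => !stack.completed_modes.some((done) => done === mode)
  );
}

// Dispatchable only when nothing required is outstanding.
// Assumption: a stack that omits deterministic_lint is treated as malformed.
function isDispatchable(stack: AssuranceStack): boolean {
  const hasLint = stack.required_modes.some((m) => m === "deterministic_lint");
  return hasLint && unmetRequiredModes(stack).length === 0;
}

const stack: AssuranceStack = {
  required_modes: ["deterministic_lint", "dry_run", "human_gate"],
  completed_modes: ["deterministic_lint", "dry_run"],
};
console.log(unmetRequiredModes(stack)); // only "human_gate" is outstanding
console.log(isDispatchable(stack));     // false
```

Deriving the field on demand, rather than storing it, avoids a second source of truth that could drift from `required_modes` and `completed_modes`.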
### §11.4.2 Dispatch rule

```
RULE plan_dispatchable:
  RevisionPlan is dispatchable iff required_modes ⊆ completed_modes
  i.e., unmet_required_modes is empty
```

A plan with `unmet_required_modes` non-empty does not transition `RevisionDispatcherState` to `dispatching`; it remains in `validating` or transitions to `waiting_human_gate` / `waiting_hard_call` as appropriate.

### §11.4.3 Mandatory inclusions

- `deterministic_lint` is ALWAYS in `required_modes`
- `semantic_lint` is in `required_modes` when `plan_risk_score > 0.6` (threshold configurable)
- `human_gate` is in `required_modes` and `non_degradable_modes` when any HardRevisionCall blocks dispatch
- `dry_run` is in `required_modes` and `non_degradable_modes` when the plan contains:
  - external_side_effect step, OR
  - multi-step meaning-bearing edit, OR
  - privileged_artifact mutation

### §11.4.4 Triggers

```
PlanAssuranceTriggerReason =
  | "always"
  | "high_cost"
  | "low_compiler_confidence"
  | "risky_strategy"
  | "privileged_artifact"
  | "external_side_effect"
  | "human_requested"
  | "novelty_above_threshold"
```

Each trigger reason adds one or more modes to `required_modes`. Multiple triggers can stack: a high-cost, low-confidence, novel plan accumulates multiple required modes.

## §11.5 Adversarial linter posture

The semantic lint sub-agent operates in an adversarial posture: it tries to find ways the plan could fail or cause regression, not to confirm success.

**§11.5.1 Failure-finding prompt.** Semantic lint sub-agents receive a prompt template that explicitly asks "what is wrong with this plan?" rather than "is this plan good?" The output is in `PlanAssuranceCritique` format (§8.5):

```ts
PlanAssuranceCritique {
  identified_failure_modes: string[]
  proposed_tighter_alternatives: string[]
  preservation_violations_detected: PreservationViolationRef[]
  estimated_regression_risk: "low" | "medium" | "high"
}
```

**§11.5.2 Lint contributes evidence, not directive.** The Compiler decides what to do with semantic lint findings.
Critical findings (`estimated_regression_risk = "high"`) typically force recompilation; lower-severity findings may inform user-facing warnings.

**§11.5.3 Adversarial sub-agent reputation.** Adversarial sub-agents that consistently produce false positives (flagging plans that work fine) lose reputation. Adversarial sub-agents that miss real problems (plans that fail in production) lose reputation more heavily. The reputation function is asymmetric: false negatives cost more than false positives.

## §11.6 RevisionOperationReceipt envelope

Per [P-8], `RevisionOperationReceipt extends PBEOperationReceiptLite` (DOC73). This makes future transaction-kernel migration mechanical rather than interpretive.

```ts
RevisionOperationReceipt extends PBEOperationReceiptLite {
  revision_operation_kind: RevisionOperationKind  // §0.4.7
  revision_payload_ref: StorageRef                // local payload pointer

  // Inherited from PBEOperationReceiptLite:
  //   receipt_id, section_ref, operation_kind, local_payload_schema_ref,
  //   local_payload_schema_version, local_payload_ref, local_payload_hash,
  //   idempotency_key, causal_parent_receipt_ids, actor, source_refs, target_refs,
  //   policy_decision_ref, visibility_scope_ref, source_policy_snapshot_ref,
  //   ec_sequence_number, read_only_receipt, artifact_version_refs,
  //   rollback_strategy, replay_strategy, created_at

  schema_version: 1
}
```

### §11.6.1 Operations producing receipts

Every operation that mutates state or invokes a module produces a receipt:

- Plan creation → `revision_plan_created`
- Step dispatch → `revision_step_dispatched`
- Module revision response → `module_revision_result` (wraps ModuleRevisionResultPayload)
- Direct fix applied → `direct_fix_applied`
- Candidate version produced → `candidate_version_created`
- Candidate accepted → `candidate_version_accepted`
- Candidate rejected → `candidate_version_rejected`
- Revalidation requested → `revalidation_requested`
- Human gate decision recorded → `human_gate_decision`
- Rollback applied → `rollback_apply`
- Escalation created → `escalation_created`
- HardCall resolved → `hard_call_resolved`
- Taint clearance recorded → `taint_clearance_recorded`

### §11.6.2 Causal chain

Receipts form a causal chain via `causal_parent_receipt_ids`. The chain enables:

- Reconstructing the revision lineage
- Determining which receipts must be reversed for rollback (§11.13)
- Auditing what caused each artifact state
- Verifying idempotency in replay scenarios (§11.8)

### §11.6.3 Receipt persistence

Receipts persist via EC (§3.7.4). Local receipt caching is permitted for performance but never as system of record. A receipt that does not reach EC produces a `receipt_persist_failed` event and an `artifact_written_receipt_failed` workspace failure (§11.17).

## §11.7 ArtifactMutationProtocol

The protocol for mutating an artifact via the Dispatcher:

```
PROTOCOL artifact_mutation:

  Phase 1 - Precondition check:
    1. Verify artifact_version_precondition_ref matches current artifact (live hash check per §11.20)
    2. Verify PolicyDecision is "allow" or "allow_with_human_gate"
    3. Verify safety envelope and taint labels (§15.10)
    4. Verify write set acquired (§11.9)

  Phase 2 - Dispatch:
    1. Transition Dispatcher to "dispatching"
    2. Emit revision_step_dispatched receipt
    3. Deliver RevisionInPayload to target module
    4. Await module response or timeout

  Phase 3 - Receipt processing:
    1. Receive RevisionExecutionReceipt
    2. Validate receipt against schema
    3. Persist receipt via EC
    4. If candidate version produced, register CandidateArtifactVersion
    5. If lifecycle == "completed" and class is auto-accept, transition to accepted
    6. If lifecycle == "completed" and acceptance_policy requires review,
       remain candidate, surface to UI

  Phase 4 - Revalidation trigger:
    1. Mark dependent outcomes as dirty (§11.21)
    2. Schedule revalidation per step.revalidation_expectation
    3. Transition Dispatcher to "revalidating" for those outcomes

  Phase 5 - Cleanup:
    1. Release write set
    2. Update RevisionDispatcherProjection
    3. Update RevisionPlan.status and RevisionExecutionRecord
```

### §11.7.1 Atomicity

Phases 1-3 are atomic per step: either the receipt is persisted with a successful artifact, or the step is recorded as failed. There is no partial step state.

### §11.7.2 Cross-step coordination

Multi-step plans coordinate via:

- Idempotency keys (§11.8) for deterministic retry behavior
- Write sets (§11.9) for conflict avoidance
- Rolling hash chain (§11.20) for in-place multi-step mutations
- Candidate version chain (§11.11) for default candidate-only mutations

## §11.8 Deterministic idempotency

Per [E4a], idempotency keys are deterministic, not LLM-generated. The Dispatcher derives keys from:

```
RevisionPlan.idempotency_key = hash(
  task_id,
  source_evaluation_result_ref,
  target_artifact_ref,
  target_version_precondition_ref,
  plan_strategy_summary_hash,
  revisor_activation_seq
)

RevisionPlanStep.idempotency_key = hash(
  plan.idempotency_key,
  step.step_id,
  step.target_module_id,
  step.target_version_precondition_ref,
  typed_instruction.idempotency_input_hash
)

TypedRevisionInstruction.idempotency_key = hash(
  step.idempotency_key,
  capability,
  capability_version,
  params_hash,
  preserve_constraints_hash,
  do_not_change_hash,
  source_material_refs_hash
)
```

### §11.8.1 Idempotent retry

If the Dispatcher receives a receipt for an already-processed idempotency_key, it does not re-dispatch. Instead it returns the cached receipt with `read_only_receipt = true`.

### §11.8.2 Replay safety

A plan replayed (e.g., from a checkpoint) with the same inputs produces the same idempotency keys. Modules that declare `idempotency_semantics = "idempotent"` may be re-invoked safely. Modules declaring `non_idempotent` produce a `never_replay` policy on their receipts; replays of those steps are blocked.
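The key derivation in §11.8 can be sketched as plain functions over the listed inputs. The spec does not say which `hash` is used, so the 32-bit FNV-1a below is only a dependency-free stand-in chosen for the example; a production kernel would more plausibly use a cryptographic digest. `deriveKey`, `PlanKeyInputs`, and `StepKeyInputs` are hypothetical names, and representing every ref as a string is an assumption.

```ts
// Stand-in for the spec's unspecified hash(): 32-bit FNV-1a over UTF-16 code units.
// Assumption for illustration only; not the kernel's actual hash.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// JSON encoding keeps field boundaries unambiguous (["ab","c"] vs ["a","bc"]).
function deriveKey(fields: Array<string | number>): string {
  return fnv1a(JSON.stringify(fields));
}

interface PlanKeyInputs {
  task_id: string;
  source_evaluation_result_ref: string;
  target_artifact_ref: string;
  target_version_precondition_ref: string;
  plan_strategy_summary_hash: string;
  revisor_activation_seq: number;
}

interface StepKeyInputs {
  step_id: string;
  target_module_id: string;
  target_version_precondition_ref: string;
  idempotency_input_hash: string;
}

function planIdempotencyKey(p: PlanKeyInputs): string {
  return deriveKey([
    p.task_id, p.source_evaluation_result_ref, p.target_artifact_ref,
    p.target_version_precondition_ref, p.plan_strategy_summary_hash,
    p.revisor_activation_seq,
  ]);
}

// The step key chains off the plan key, so a replanned plan yields fresh step keys.
function stepIdempotencyKey(planKey: string, s: StepKeyInputs): string {
  return deriveKey([
    planKey, s.step_id, s.target_module_id,
    s.target_version_precondition_ref, s.idempotency_input_hash,
  ]);
}

const planInputs: PlanKeyInputs = {
  task_id: "task-1", source_evaluation_result_ref: "eval-7",
  target_artifact_ref: "brief", target_version_precondition_ref: "v10",
  plan_strategy_summary_hash: "abc123", revisor_activation_seq: 1,
};
const planKey = planIdempotencyKey(planInputs);
console.log(planKey === planIdempotencyKey({ ...planInputs })); // true: same inputs, same key
```

Because no clock, random source, or model output feeds the derivation, a replay from a checkpoint reproduces the same keys, which is the property §11.8.2 relies on.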
### §11.8.3 Idempotency invariants

- An LLM MUST NOT generate idempotency keys
- An LLM-generated string in an idempotency_key field fails `validation.idempotency_key_non_deterministic`
- Idempotency keys persist with their receipts
- Replay engines (e.g., for crash recovery) use stored keys to resume safely

## §11.9 PlanWriteSet and PlanReadSet

Per [P-21], V3 tracks both write and read sets for conflict detection.

### §11.9.1 PlanWriteSet

```ts
PlanWriteSet {
  plan_id: string
  write_artifact_versions: ArtifactVersionRef[]
  write_sections: SectionRef[]  // section-level scope, §11.10
  write_kinds: Array<
    | "create"
    | "update"
    | "delete"
    | "candidate_create"
    | "accept_candidate"
  >
  schema_version: 1
}
```

### §11.9.2 PlanReadSet

```ts
PlanReadSet {
  plan_id: string
  read_artifact_versions: ArtifactVersionRef[]
  read_source_versions: SourceVersionRef[]
  read_graph_snapshot_hash: string
  read_capability_snapshot_hash: string
  captured_at: ISO8601
  schema_version: 1
}
```

### §11.9.3 Conflict detection

```
RULE plan_conflict_detection:

  1. Write/write conflict:
     For each artifact section in PlanA.PlanWriteSet:
       If any concurrent PlanB.PlanWriteSet writes to overlapping section:
         conflict detected; apply tie-breaker (§11.9.4)

  2. Read/write staleness:
     For each artifact in PlanA.PlanReadSet.read_artifact_versions:
       If any newer accepted version exists at dispatch time:
         PlanA must abort_and_replan with fresh read snapshot

  3. Graph snapshot staleness:
     If PlanA.read_graph_snapshot_hash != current_graph_snapshot_hash:
       PlanA must abort_and_replan

  4. Capability snapshot staleness:
     If PlanA.read_capability_snapshot_hash != current_capability_snapshot_hash:
       PlanA must abort_and_replan
```

Read/write staleness catches the subtle bug where Plan A reads artifact A at v10 and writes artifact B, while Plan B updates A to v11. The two plans write different artifacts, so no write/write conflict fires; without read-set tracking, Plan A's write to B would rest on stale facts and go undetected.
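The four checks of `plan_conflict_detection` can be sketched as a single function that returns the first failing check. This is a simplified illustration, not the kernel's implementation: numeric versions, exact-match section overlap, and the names `PlanSets`, `CurrentState`, and `detectConflict` are all assumptions made for the example.

```ts
// Simplified refs: the spec treats these as opaque; numeric versions are an assumption.
interface ArtifactVersionRef { artifact_id: string; version: number; }
interface SectionRef { artifact_id: string; section_path: string; }

// Hypothetical flattening of PlanWriteSet + PlanReadSet for one plan.
interface PlanSets {
  plan_id: string;
  write_sections: SectionRef[];
  read_artifact_versions: ArtifactVersionRef[];
  read_graph_snapshot_hash: string;
  read_capability_snapshot_hash: string;
}

interface CurrentState {
  accepted_versions: Record<string, number>; // artifact_id -> latest accepted version
  graph_snapshot_hash: string;
  capability_snapshot_hash: string;
}

type ConflictVerdict =
  | { kind: "ok" }
  | { kind: "write_write_conflict"; with_plan_id: string }
  | { kind: "abort_and_replan"; reason: string };

function detectConflict(plan: PlanSets, concurrent: PlanSets[], now: CurrentState): ConflictVerdict {
  // 1. Write/write: exact section match stands in for "overlapping section".
  for (const other of concurrent) {
    for (const mine of plan.write_sections) {
      const clash = other.write_sections.some(
        (s) => s.artifact_id === mine.artifact_id && s.section_path === mine.section_path
      );
      if (clash) return { kind: "write_write_conflict", with_plan_id: other.plan_id };
    }
  }
  // 2. Read/write staleness: a newer accepted version exists than the one the plan read.
  for (const read of plan.read_artifact_versions) {
    const latest = now.accepted_versions[read.artifact_id];
    if (latest !== undefined && latest > read.version) {
      return { kind: "abort_and_replan", reason: "stale_read:" + read.artifact_id };
    }
  }
  // 3 and 4. Snapshot staleness.
  if (plan.read_graph_snapshot_hash !== now.graph_snapshot_hash) {
    return { kind: "abort_and_replan", reason: "stale_graph_snapshot" };
  }
  if (plan.read_capability_snapshot_hash !== now.capability_snapshot_hash) {
    return { kind: "abort_and_replan", reason: "stale_capability_snapshot" };
  }
  return { kind: "ok" };
}

// Plan A read artifact A at v10 and writes artifact B; A has since been accepted at v11.
const planA: PlanSets = {
  plan_id: "plan-A",
  write_sections: [{ artifact_id: "B", section_path: "Argument.III.A" }],
  read_artifact_versions: [{ artifact_id: "A", version: 10 }],
  read_graph_snapshot_hash: "g1",
  read_capability_snapshot_hash: "c1",
};
const state: CurrentState = {
  accepted_versions: { A: 11 }, graph_snapshot_hash: "g1", capability_snapshot_hash: "c1",
};
console.log(detectConflict(planA, [], state)); // abort_and_replan, reason "stale_read:A"
```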
### §11.9.4 Concurrency tie-breaker

When two plans conflict on a write set, the tie-breaker is deterministic (per [P-24]):

```
RULE concurrency_tie_breaker:
  1. OutcomeDependencySpec.required_for_overall_pass = true > false
  2. EvaluationOutcomeDefinition.is_high_stakes = true > false
  3. EvaluationOutcomeDefinition.priority ascending (lower priority value = wins)
  4. RevisionPlan.created_at ascending (earlier = wins)
  5. plan_id lexicographic
```

The losing plan aborts and replans. Aborted plans do not silently re-dispatch; they re-enter the Compiler for fresh reasoning.

## §11.10 Section-level scope validation

Write conflicts at the artifact level are too coarse: two plans that modify different sections of a brief can proceed safely. V3 tracks write scope at section level:

```ts
SectionRef {
  artifact_id: string
  section_path: string         // e.g., "Argument.III.A"
  section_anchor_hash: string  // for stability across renames
}

WriteScope {
  scope_kind: "whole_artifact" | "section" | "claim_set" | "metadata_only"
  affected_section_refs?: SectionRef[]
  affected_claim_refs?: ClaimRef[]
  expected_overlap_with_concurrent_plans: "none_expected" | "possible" | "explicit"
}
```

### §11.10.1 Validation

Deterministic linting verifies that declared write scope matches the plan's actual operations. A plan that declares `section` scope but the typed_instruction modifies whole-artifact structure produces `validation.write_scope_mismatch`.

### §11.10.2 Conflict at section level

The conflict detection (§11.9.3) applies at section level: two plans modifying different sections of the same artifact do not conflict. Two plans modifying the same section conflict regardless of artifact-level distinctness.

## §11.11 ArtifactVersionState and CandidateArtifactVersion

Per [P-4], V3 uses candidate versions inside the SourceWorkspace; ShadowWorkspace is superseded.
### §11.11.1 ArtifactVersionState

```
ArtifactVersionState =
  | "current"     // the active version
  | "candidate"   // produced by revision, awaiting acceptance
  | "accepted"    // candidate accepted, now current
  | "rejected"    // candidate rejected
  | "superseded"  // older version replaced by newer current
  | "reverted"    // version was current, rolled back by user
```

### §11.11.2 CandidateArtifactVersion

```ts
CandidateArtifactVersion {
  artifact_id: string
  base_version_id: string
  candidate_version_id: string
  produced_by_plan_id: string
  produced_by_step_id: string
  diff_ref: DiffRef
  acceptance_policy:
    | "auto_accept_class_safe_mechanical"
    | "review_required"
    | "human_gate_required"
  state: ArtifactVersionState

  // Acceptance / rejection tracking (per [P-17])
  accepted_by?: UserRef | SystemRef
  accepted_at?: ISO8601
  acceptance_receipt_ref?: RevisionOperationReceiptRef  // REQUIRED on state="accepted"
  acceptance_policy_satisfied?: boolean
  rejected_by?: UserRef | SystemRef
  rejected_at?: ISO8601
  rejection_reason?: string
  rejection_receipt_ref?: RevisionOperationReceiptRef   // REQUIRED on state="rejected"
  superseded_by_candidate_version_id?: string

  // Sandboxed evaluation tracking (per [P-30])
  sandboxed_evaluation_results: SandboxedEvaluationResultRef[]
  taint_class: TaintClass
  taint_inherited_from: ArtifactVersionRef[]

  schema_version: 1
}
```

### §11.11.3 State transition receipts

Per [P-17], state transitions to `accepted` or `rejected` MUST produce a `RevisionOperationReceipt` with operation_kind `candidate_version_accepted` or `candidate_version_rejected`. State flip without receipt is rejected by the runtime.

### §11.11.4 Candidate-version required

Candidate versions are mandatory for:

- Multi-step mutating plans
- Meaning-bearing edits (capability.safety_class == "meaning_bearing")
- Privileged artifacts
- External side effect operations

Class-safe direct mechanical fixes may auto-commit (state directly to `accepted`) only after PolicyDecision approval.
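The §11.11.3 rule that a state flip without a receipt is rejected can be sketched as a guarded transition function. This is a minimal illustration under stated assumptions: `CandidateVersion` is a trimmed-down stand-in for `CandidateArtifactVersion`, receipt refs are plain strings, and `transitionCandidate` is a hypothetical helper rather than a kernel API.

```ts
type ArtifactVersionState =
  | "current" | "candidate" | "accepted" | "rejected" | "superseded" | "reverted";

// Hypothetical minimal receipt shape; the real envelope is RevisionOperationReceipt (§11.6).
interface ReceiptRef {
  receipt_id: string;
  operation_kind: "candidate_version_accepted" | "candidate_version_rejected";
}

interface CandidateVersion {
  candidate_version_id: string;
  state: ArtifactVersionState;
  acceptance_receipt_ref?: string;
  rejection_receipt_ref?: string;
}

// Returns a new record; throws when the transition lacks a matching receipt.
function transitionCandidate(
  c: CandidateVersion,
  target: "accepted" | "rejected",
  receipt?: ReceiptRef
): CandidateVersion {
  if (c.state !== "candidate") {
    throw new Error(`cannot transition from state "${c.state}"`);
  }
  const expectedKind =
    target === "accepted" ? "candidate_version_accepted" : "candidate_version_rejected";
  if (receipt === undefined || receipt.operation_kind !== expectedKind) {
    throw new Error(`state flip to "${target}" requires a ${expectedKind} receipt`);
  }
  return target === "accepted"
    ? { ...c, state: "accepted", acceptance_receipt_ref: receipt.receipt_id }
    : { ...c, state: "rejected", rejection_receipt_ref: receipt.receipt_id };
}

const candidate: CandidateVersion = { candidate_version_id: "cv-1", state: "candidate" };
const accepted = transitionCandidate(candidate, "accepted", {
  receipt_id: "r-42",
  operation_kind: "candidate_version_accepted",
});
console.log(accepted.state, accepted.acceptance_receipt_ref); // accepted r-42
```

Making the receipt a parameter of the transition itself, rather than a follow-up write, is one way to keep the "REQUIRED on state" fields from ever being observed empty.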
### §11.11.5 Acceptance policy

```
auto_accept_class_safe_mechanical:
  - capability.safety_class == "mechanical"
  - DirectFix operations within DirectFixAllowedClass
  - PolicyDecision == "allow" (not allow_with_human_gate)
  - Receipt produced automatically; no UI gate

review_required:
  - Default for meaning_bearing capabilities
  - UI surfaces candidate for accept/modify/reject
  - User decision produces receipt

human_gate_required:
  - For privileged_artifact or external_side_effect classes
  - Plan dispatch blocked until explicit acceptance receipt
  - Cannot be auto-accepted under any circumstances
```

## §11.12 Sandboxed Evaluation for tainted candidates

Per [P-30], evaluating a CandidateArtifactVersion does not bleed taint into the main OutcomeRuntimeState.

```ts
EvaluationContext {
  evaluation_id: string
  context_kind: "main_graph" | "sandboxed_candidate"
  artifact_version_ref: ArtifactVersionRef

  // For sandboxed_candidate context:
  candidate_version_ref?: CandidateArtifactVersionRef
  taint_quarantine_active?: boolean
  sandbox_findings_isolated?: boolean

  schema_version: 1
}
```

### §11.12.1 Sandboxed evaluation rules

```
RULE sandboxed_candidate_evaluation:

  When evaluating a CandidateArtifactVersion (state == "candidate"):
    1. EvaluationContext.context_kind = "sandboxed_candidate"
    2. Resulting OutcomeEvaluationResult is stored in candidate-scoped storage
    3. Resulting EvaluationFinding records carry candidate_scope tag
    4. Main OutcomeRuntimeState is NOT updated
    5. ProgressSignalBundle is NOT updated
    6. Pattern learning signals from sandboxed eval are tagged but not yet attributed

  When CandidateArtifactVersion transitions to "accepted":
    - If candidate taint cleared via SanitizationNode or user_explicit_review (per §15.12):
        Sandboxed evaluation results PROMOTE to main graph
        Pattern learning signals attribute normally
    - Otherwise: candidate cannot be accepted; taint must clear first

  When CandidateArtifactVersion transitions to "rejected":
    Sandboxed evaluation results archived (audit trail retained)
    Pattern learning signals discarded
```

### §11.12.2 Why this matters

Without sandboxed evaluation, transitive taint propagation poisons the graph by evaluation alone. A research source tagged `external_untrusted` flows into a CandidateArtifactVersion; evaluating that candidate inherits the taint; findings inherit the taint; main OutcomeRuntimeState inherits taint; the graph is compromised before any acceptance decision. Sandboxed evaluation quarantines the chain until explicit clearance.

## §11.13 GraphStateRollback

Per [E16a], V3 supports controlled rollback of graph state when revisions cause regression:

```ts
GraphStateRollback {
  rollback_id: string
  task_id: string
  rollback_target_checkpoint_ref: CheckpointRef
  receipts_to_reverse: RevisionOperationReceiptRef[]
  artifact_versions_to_revert: ArtifactVersionRef[]
  candidate_versions_to_reject: CandidateArtifactVersionRef[]
  outcomes_to_re_evaluate: outcome_id[]
  reason:
    | "regression_detected"
    | "preservation_violation"
    | "user_requested_revert"
    | "policy_violation"
    | "hard_call_negative_resolution"
  initiated_by: UserRef | SystemRef
  initiated_at: ISO8601
  completed_at?: ISO8601
  schema_version: 1
}
```

### §11.13.1 Rollback protocol

```
PROTOCOL graph_state_rollback:

  Phase 1 - Plan rollback:
    1. Identify checkpoint to revert to
    2. Compute receipt chain from current state to checkpoint (causal_parent traversal)
    3. Identify artifact versions and candidate versions affected
    4. Identify outcomes that need re-evaluation

  Phase 2 - Acquire locks:
    1. Acquire write set for all affected artifacts
    2. Suspend in-flight plans that target affected artifacts

  Phase 3 - Apply rollback:
    1. Revert artifact versions to checkpoint state
    2. Reject candidate versions in the chain
    3. Mark dependent outcomes as dirty
    4. Emit rollback_apply receipt with full causal chain

  Phase 4 - Resume:
    1. Release locks
    2. Trigger revalidation of dirty outcomes
    3. Update RevisionDispatcherProjection
```

### §11.13.2 Rollback safety

- Rollback never reverses external side effects (per §11.18 ReplayPolicy)
- Rollback of an artifact requires the artifact to have a checkpoint history
- Rollback receipts have `read_only_receipt = false`; they are real mutations
- Rollback creates a NEW artifact version with the reverted content, not a pointer back

## §11.14 Dry-Run Mode

Plans of unusual scope, plans crossing trust boundaries, or plans flagged by adversarial linting can request a dry-run before real dispatch:

```ts
DryRunRequest {
  request_id: string
  plan_id: string
  scope: "full_plan" | "specific_steps"
  specific_step_ids?: string[]
  preserve_real_artifacts: true  // LOCKED
  produce_predicted_outcomes: boolean
  schema_version: 1
}

DryRunResult {
  result_id: string
  request_id: string
  predicted_receipts: PredictedReceipt[]
  predicted_artifact_changes: PredictedArtifactChange[]
  predicted_outcome_changes: PredictedOutcomeChange[]
  predicted_cost: EvaluationRevisionCostBreakdown
  predicted_risks: string[]
  schema_version: 1
}
```

### §11.14.1 Dry-run guarantees

- Dry-run never produces real artifact mutations
- Dry-run never invokes external side effects
- Dry-run produces predicted state; the user reviews before authorizing real dispatch
- Dry-run is non-degradable for plans containing external_side_effect steps (§11.4.3)

### §11.14.2 When dry-run is required

Per §11.4.3, dry-run is mandatory for:

- External side effect steps
- Privileged artifact mutations
- Multi-step meaning-bearing edits

It
is optional but recommended for high-risk plans (`plan_risk_score > 0.7`).

## §11.15 Cost budgets and degradation

### §11.15.1 Bifurcated budgets

Per [P-32], V3 separates logical from infrastructure budgets:

```ts
RevisorConfig.budget_logical = {
  max_logical_llm_calls_per_revision: number
  max_logical_tokens_per_revision: number
}

RevisorConfig.budget_infrastructure = {
  max_infrastructure_retries_per_logical_call: number
  max_total_infrastructure_retries_per_revision: number
  infrastructure_retry_kinds: Array<
    | "json_parse_failure"
    | "schema_validation_failure"
    | "premature_stop_token"
    | "timeout_retry"
    | "connection_retry"
  >
}
```

### §11.15.2 Budget attribution

```
RULE budget_attribution:

  Successful inference (well-formed schema-conforming output):
    increments logical budget counter

  Failed inference requiring retry (parse/schema/timeout):
    increments infrastructure budget counter
    does NOT increment logical budget counter

  Logical budget exhaustion → plan abort with budget_logical_exceeded
  Infrastructure budget exhaustion → escalate to user with budget_infrastructure_exceeded
```

### §11.15.3 Degradation rules

Per [P-6], degradation skips optional helpers first; required modes are non-degradable:

```ts
DegradationRule {
  mode: PlanAssuranceMode | SubAgentClass | EvaluationLaneKind
  degradable: boolean
  degradation_allowed_when: DegradationCondition[]
  if_not_degradable_and_budget_exceeded:
    | "block_and_escalate"
    | "ask_user_for_budget"
    | "abort_plan"
}
```

Hard rules:

- `deterministic_lint` is never degradable
- `human_gate` is never degradable when triggered by HardRevisionCall
- `dry_run` is never degradable for external side effects, privileged artifacts, or multi-step meaning-bearing edits
- `semantic_lint` is not degradable when in `required_modes` due to `risky_strategy` trigger
- Any mode in `PlanAssurancePolicy.non_degradable_modes` is not degradable

### §11.15.4 Cost estimator confidence

Per [P-26], cost estimates carry confidence:

```ts
RevisionCostEstimate {
  estimated_cost_usd: number
  estimated_tokens: number
  estimated_local_compute_seconds: number
  estimator_confidence: EstimatorConfidence  // §0.4.17
  prior_over_budget_count: number
  hard_cap_required: boolean
  schema_version: 1
}
```

Rules:

- `experimental` confidence → user-set hard cap REQUIRED
- `uncalibrated` confidence → user confirmation REQUIRED above 50% of nominal cap
- Three consecutive over-budget runs (`prior_over_budget_count >= 3`) → auto-approval suppressed until recalibration
- Calibration: estimator marked `calibrated` after 20 runs with `actual_cost / estimated_cost ∈ [0.7, 1.4]`

## §11.16 Local compute budget and preemption

On Apple Silicon, wall-clock time is a primary cost driver. RevisorConfig declares:

```
max_local_compute_seconds_per_revision: number
preemption_timeout: number  // ms before forced kill
```

### §11.16.1 Preemption protocol

```
PROTOCOL local_compute_preemption:

  When current_local_compute_seconds >= max_local_compute_seconds:
    1. Loop Controller sets preemption flag
    2. Active sub-agents receive cooperative-preemption signal
    3. After preemption_timeout, Dispatcher forcibly kills threads
    4. Killed step receives execution_status = "preempted"
    5. Plan transitions to RevisionDispatcherState.escalated
    6. Emit budget_exhausted RevisionFailureEvent
```

### §11.16.2 Preemption receipt

A preempted step produces a synthesized RevisionOperationReceipt with execution_status `preempted` and a reference to the killed thread state for forensic analysis. Preemption is auditable.
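The escalation from cooperative signal to forced kill in §11.16.1 can be sketched as a pure decision function that a controller would poll. The function name, the `RevisionComputeState` shape, the intermediate `await_cooperative_exit` action, and the inclusive comparison at the timeout boundary are all assumptions made for illustration; the spec only fixes the ordering of the steps.

```ts
type PreemptionAction =
  | "continue"               // still within the local compute budget
  | "signal_cooperative"     // budget reached: set flag, signal sub-agents
  | "await_cooperative_exit" // signal sent, timeout not yet elapsed
  | "force_kill";            // timeout elapsed: threads are forcibly killed

// Hypothetical runtime counters for one revision.
interface RevisionComputeState {
  elapsed_compute_seconds: number;
  preemption_signaled_at_ms?: number; // set once the cooperative signal has been sent
}

function nextPreemptionAction(
  state: RevisionComputeState,
  nowMs: number,
  maxLocalComputeSeconds: number,
  preemptionTimeoutMs: number
): PreemptionAction {
  if (state.elapsed_compute_seconds < maxLocalComputeSeconds) return "continue";
  if (state.preemption_signaled_at_ms === undefined) return "signal_cooperative";
  const waitedMs = nowMs - state.preemption_signaled_at_ms;
  return waitedMs >= preemptionTimeoutMs ? "force_kill" : "await_cooperative_exit";
}

// Budget of 120 s, 5000 ms grace period after the cooperative signal.
console.log(nextPreemptionAction({ elapsed_compute_seconds: 90 }, 0, 120, 5000));   // continue
console.log(nextPreemptionAction({ elapsed_compute_seconds: 120 }, 0, 120, 5000));  // signal_cooperative
console.log(
  nextPreemptionAction({ elapsed_compute_seconds: 125, preemption_signaled_at_ms: 1000 }, 6000, 120, 5000)
); // force_kill
```

Keeping the decision free of side effects makes the preemption path easy to audit: the receipt described in §11.16.2 can record the inputs that led to `force_kill`.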
## §11.17 Failure-of-failure handlers

Per [P-25], V3 distinguishes workspace write failure kinds rather than collapsing all into `partially_completed`:

```ts
WorkspaceWriteFailureKind =
  | "no_artifact_written"
  | "artifact_written_receipt_failed"
  | "candidate_written_index_failed"
  | "partial_artifact_written"
  | "diff_written_artifact_missing"
```

### §11.17.1 Mapping to execution_status

```
WorkspaceWriteFailureKind → execution_status:
  no_artifact_written → failed_runtime
  partial_artifact_written → partially_completed (requires addressed/unresolved arrays)
  artifact_written_receipt_failed → receipt_recovery_required
  candidate_written_index_failed → candidate_orphan_repair_required
  diff_written_artifact_missing → failed_runtime + critical alert
```

### §11.17.2 Recovery handlers

```
HANDLER receipt_recovery_required:
  1. Artifact exists; receipt write failed
  2. Loop Controller retries receipt persist up to N attempts
  3. After N failures, escalate with manual recovery instructions
  4. Until receipt persists, the artifact version is in "current" state
     but audit chain is broken; downstream operations on it require
     user acknowledgment

HANDLER candidate_orphan_repair_required:
  1. Candidate version data persisted; index update failed
  2. Loop Controller retries index repair up to N attempts
  3. After N failures, the candidate is in storage but unreachable
  4. Repair tool surfaces orphans in UI for manual reattachment or deletion

HANDLER partial_artifact_written:
  1. Artifact partially written; some sections completed, others did not
  2. Receipt records addressed_findings and unresolved_findings split
  3. Plan continues with partially_completed status
  4. Loop Controller may schedule completion retry per RecoveryPolicy

HANDLER no_artifact_written:
  1. Module signaled completion but no artifact produced
  2. Treated as failed_runtime; module reputation affected
  3. Plan transitions to RevisionDispatcherState.aborted unless retry budget allows

HANDLER diff_written_artifact_missing:
  1. Diff record persisted but referenced artifact does not exist
  2. Critical integrity alert; manual intervention required
  3. Plan blocked until manual cleanup
```

## §11.18 RevisionSideEffectPolicy

```ts
RevisionSideEffectPolicy {
  policy_id: string
  side_effect_class: RevisionSideEffectClass  // §0.4.14

  // Authorization
  requires_policy_decision: boolean
  requires_human_gate: boolean
  requires_dry_run: boolean

  // Replay control
  replay_policy: ReplayPolicy  // §0.4.14

  // Externally observable consequence flag
  externally_observable: boolean

  // External system bindings
  target_system_class?:
    | "email" | "calendar" | "file_system" | "webhook"
    | "court_filing" | "client_portal" | "other"

  schema_version: 1
}
```

### §11.18.1 Replay policy

- `safe_to_replay`: operation is idempotent; safe to retry
- `idempotent_with_key`: operation is idempotent given idempotency_key match
- `never_replay`: operation must not be replayed (e.g., email send, court filing)

A `never_replay` receipt blocks rollback (§11.13) of that operation; the rollback creates a corrective artifact rather than reversing the side effect.
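The replay rules in §11.18.1 can be read as two small decisions: whether a retry may re-issue an operation, and what a rollback does with its receipt. The sketch below is illustrative only. `ReceiptLike`, `mayReplay`, `rollbackAction`, and the `RollbackAction` labels are hypothetical names, and the receipt shape shows only the fields this decision needs.

```typescript
type ReplayPolicy = "safe_to_replay" | "idempotent_with_key" | "never_replay";

// Hypothetical receipt subset; the real RevisionOperationReceipt carries more.
interface ReceiptLike {
  replay_policy: ReplayPolicy;
  idempotency_key?: string;
}

// May a retry re-issue the operation recorded by this receipt?
function mayReplay(receipt: ReceiptLike, retryKey?: string): boolean {
  if (receipt.replay_policy === "safe_to_replay") return true;
  if (receipt.replay_policy === "idempotent_with_key") {
    // Replay is allowed only when the retry presents the same key.
    return receipt.idempotency_key !== undefined && receipt.idempotency_key === retryKey;
  }
  return false; // never_replay
}

// Hypothetical labels for the two rollback outcomes described in the text.
type RollbackAction = "reverse_operation" | "create_corrective_artifact";

// A never_replay receipt cannot be reversed; rollback records a correction instead.
function rollbackAction(receipt: ReceiptLike): RollbackAction {
  return receipt.replay_policy === "never_replay"
    ? "create_corrective_artifact"
    : "reverse_operation";
}
```

Keeping the two decisions separate reflects the text: replay eligibility and rollback behavior both derive from `replay_policy`, but they answer different questions.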
### §11.18.2 Side effect class gates

| side_effect_class | requires_policy_decision | requires_human_gate | requires_dry_run | replay_policy |
|---|---|---|---|---|
| none | false | false | false | safe_to_replay |
| internal_artifact_write | true | depends on safety_class | optional | idempotent_with_key |
| external_message_send | true | true | true | never_replay |
| calendar_write | true | true | optional | idempotent_with_key |
| webhook_post | true | true | true | depends on target |
| filing_or_submission | true | true | true | never_replay |
| memory_write | true | optional | optional | idempotent_with_key |

### §11.18.3 DOC23 TaskSecurityPolicy bridge

Per [P-29 cross-doc check], V3's RevisionSideEffectPolicy must not become a parallel system to DOC23's TaskSecurityPolicy:

```ts
RevisionSideEffectPolicy → TaskSecurityPolicyCheck {
  side_effect_class
  target_ref
  tool_or_module_ref
  task_security_policy_ref
  decision_ref
}
```

No external send / mutation plan passes V3 validation unless it also passes DOC23 TaskSecurityPolicy or the relevant external owner-doc equivalent.

## §11.19 PolicyDecision gate

EC/PropA owns policy evaluation; V3 consumes PolicyDecision records:

```ts
PolicyDecisionRef {
  decision_id: string
  decision_owner: "EC" | "PropA"
  decision_subject: {
    operation_kind: string
    target_ref: StorageRef
    actor_ref: UserRef | AgentRef
  }
  decision: "allow" | "block" | "allow_with_human_gate"
  rationale: string
  expires_at?: ISO8601
  schema_version: 1
}
```

### §11.19.1 Gate enforcement

Deterministic linting (§11.3) requires a PolicyDecision for every mutation step. Mutations with `decision == "block"` are rejected. Mutations with `decision == "allow_with_human_gate"` add `human_gate` to `PlanAssurancePolicy.required_modes` and `non_degradable_modes`.
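The gate-enforcement rule in §11.19.1 can be sketched as a single pass over the plan's mutation steps. This is an illustration under stated assumptions, not the linter itself. `MutationStepLike`, `AssuranceModes`, `GateResult`, `addOnce`, and `enforcePolicyGate` are hypothetical names, and a step without an attached decision is modeled as `decision: undefined`.

```typescript
type Decision = "allow" | "block" | "allow_with_human_gate";

// Hypothetical step shape: only the fields this check reads.
interface MutationStepLike {
  step_id: string;
  decision?: Decision; // undefined means no PolicyDecision is attached
}

// Hypothetical subset of PlanAssurancePolicy.
interface AssuranceModes {
  required_modes: string[];
  non_degradable_modes: string[];
}

interface GateResult {
  rejected_step_ids: string[];
  modes: AssuranceModes;
}

// Append a value only if it is not already present.
function addOnce(list: string[], value: string): string[] {
  return list.indexOf(value) >= 0 ? list : list.concat([value]);
}

function enforcePolicyGate(steps: MutationStepLike[], modes: AssuranceModes): GateResult {
  const rejected: string[] = [];
  let required = modes.required_modes;
  let nonDegradable = modes.non_degradable_modes;
  for (const step of steps) {
    if (step.decision === undefined || step.decision === "block") {
      // Missing decisions fail linting; blocking decisions reject the mutation.
      rejected.push(step.step_id);
    } else if (step.decision === "allow_with_human_gate") {
      required = addOnce(required, "human_gate");
      nonDegradable = addOnce(nonDegradable, "human_gate");
    }
  }
  return {
    rejected_step_ids: rejected,
    modes: { required_modes: required, non_degradable_modes: nonDegradable },
  };
}
```

The function returns new mode lists rather than mutating its input, so the caller can compare the policy before and after enforcement.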
### §11.19.2 PolicyDecision freshness

A PolicyDecision is fresh if:

- `expires_at` is null or in the future
- The underlying subject (operation, target, actor) has not changed since decision_id was issued
- No `superseded_by_decision_id` flag is set

Stale decisions trigger re-evaluation through EC. V3 modules do not evaluate policy locally.

## §11.20 Live-edit hash check and Rolling Hash Mode

### §11.20.1 Live-edit hash check

The user may edit an artifact concurrently with a plan dispatch. Before mutation, the Dispatcher verifies:

```
live_artifact_hash == snapshot_artifact_hash_at_plan_compilation
```

If mismatch, the Dispatcher applies `RevisorConfig.live_edit_handling`:

- `abort_and_replan`: cancel the plan; emit `version_conflict` failure; Compiler re-runs with new snapshot
- `semantic_anchor_only`: attempt to apply the plan using semantic section anchors rather than exact hashes; fall back to abort_and_replan if anchors cannot be resolved

### §11.20.2 Rolling Hash Mode for multi-step plans

Per [P-29], multi-step plans with multiple direct_fix or in-place edit steps cannot use a single snapshot hash; each step changes the hash.
V3 distinguishes two mutation modes:

```
Mode A — Candidate-only (default):
  All mutations target a CandidateArtifactVersion derived from base_version_id
  Live artifact hash check passes if candidate.base_version_id == snapshot
  Original live artifact unchanged during execution
  Acceptance produces final accepted version

Mode B — Rolling hash in place:
  Step N+1 validates against predicted hash from Step N output
  Each step records produced_post_hash for chain validation
  Failure of any step rolls back via GraphStateRollback (§11.13)
  Available only when:
    - no concurrent plans target the artifact
    - artifact is not privileged
    - no external side effects in the plan
```

### §11.20.3 Rolling hash chain validation

```
RULE rolling_hash_chain_valid:
  For RevisionPlan with mutation_mode == "rolling_hash_in_place":
    For step N in plan.steps (in topological order):
      Step N.expected_pre_hash matches:
        If N == 0: live_artifact_hash at dispatch
        Else: step (N-1).produced_post_hash
      After execution, step N.produced_post_hash recorded
    Final produced_post_hash becomes accepted artifact version hash
```

Any mismatch produces `validation.rolling_hash_chain_broken` and forces fallback to candidate-only mode plus replan.

### §11.20.4 Default mode

`mutation_mode = "candidate_only"` is the default. Rolling hash is opt-in via RevisorConfig and per-plan declaration. Rolling hash is appropriate only for single-artifact, single-author, low-risk mechanical workflows.

## §11.21 Revalidation cascade

When a revision step mutates an artifact, dependent outcomes become stale. The Loop Controller revalidates automatically:

```
PROTOCOL revalidation_cascade:
  Triggered by: RevisionOperationReceipt with operation_kind in
    { "candidate_version_accepted", "direct_fix_applied", "rollback_apply" }

  Phase 1 — Identify affected outcomes:
    1. Find all OutcomeDependencySpec records where artifact_ref matches
    2. Add outcomes from invalidated_by_outcomes traversal
    3. Apply EvaluationTargetClosurePolicy (§5.11) to ensure closure

  Phase 2 — Mark dirty:
    For each affected outcome:
      Transition OutcomeRuntimeState.evaluation_state to "dirty"
      Emit revalidation_requested receipt

  Phase 3 — Re-evaluate:
    Loop Controller schedules Evaluator activations for dirty outcomes
    Evaluator produces new OutcomeEvaluationResult
    Outcome state transitions per result

  Phase 4 — Cascade if needed:
    If revalidation produces regression (outcome was satisfied, now needs_revision):
      Trigger Revisor activation for the regressed outcome
      Plan may use fork_from_checkpoint to address regression
```

### §11.21.1 Cascading vs predicted

The Dispatcher does NOT predict which downstream outcomes "should" be affected. Cascading is determined by declared dependencies (OutcomeDependencySpec) + EvaluationTargetClosurePolicy. Outcomes not declaring a dependency on the mutated artifact are not re-evaluated, even if a human would intuit they should be.

This is by design: declared dependencies are auditable; "predicted dependencies" are not.

### §11.21.2 Upstream failure cascade

Per [P-27], when an upstream module terminates with no possibility of producing the expected artifact:

```
RULE upstream_failure_cascade:
  When any module activation terminates with execution_status in:
    { "could_not_fix", "failed_runtime", "rejected_capability" }
  AND retry_count >= per_outcome_retry_budget:

  Loop Controller MUST:
    1. Identify all outcomes in pending_dependency state whose
       missing_artifact_refs include artifacts producible by the failed module
    2. Transition each such outcome to OutcomeEvaluationState.upstream_failure
    3. Emit RevisionOperationReceipt with operation_kind = "escalation_created"
    4. Surface upstream_failure outcomes in aggregation per §5.15
```

This prevents indefinite wait on artifacts that are mathematically guaranteed not to arrive.

## §11.22 Parallelism and dependency execution

### §11.22.1 Step DAG execution

Plan steps form a DAG via `depends_on_step_ids`.
Steps with satisfied dependencies execute in parallel up to:

```
RevisorConfig.max_parallel_steps_per_plan: number  // default 4
LocalHardwareContext.max_parallel_sub_agents_safe  // computed at runtime
```

### §11.22.2 Topological order with parallelism

```
PROTOCOL parallel_step_execution:
  1. Compute topological order of plan.steps
  2. For each level of the DAG:
     a. Collect all ready steps (dependencies satisfied)
     b. Take min(ready_steps.length, max_parallel) for this batch
     c. Dispatch all batch steps concurrently
     d. Await all completions or timeouts
     e. Process receipts; update state
     f. Proceed to next level
```

### §11.22.3 Failure handling in parallel batches

When a step in a parallel batch fails, other steps in the same batch continue to completion (they were already dispatched). The plan transitions to `partially_completed` if the failed step's `on_failure` is `continue_with_warning`, else to `aborted`.

### §11.22.4 Local hardware coordination

Parallel step count is bounded by `LocalHardwareContext.max_parallel_sub_agents_safe`. On Apple Silicon, memory pressure and thermal state limit the effective parallelism. The Loop Controller degrades to sequential execution when pressure exceeds threshold (§3.3.6, §8.8).

---

# §12. WORKSPACE

## §12.1 Three-way ownership

V3 distinguishes three runtime/storage surfaces. Each has explicit ownership and access rules.

### §12.1.1 RunWorkspace

The ephemeral surface for in-flight task execution:

```ts
RunWorkspace {
  workspace_id: string
  task_id: string
  run_id: string
  scratch_artifacts: ArtifactRef[]
  intermediate_evaluation_results: OutcomeEvaluationResultRef[]
  intermediate_plans: RevisionPlanRef[]
  context_packets: Record

  // RunWorkspace is the canvas read model for in-flight state
  current_dispatcher_projection_ref: StorageRef

  created_at: ISO8601
  expires_at: ISO8601
  schema_version: 1
}
```

The RunWorkspace exists for the duration of the run.
Its contents persist via EC during the run; non-promoted contents are eligible for cleanup after run completion per retention policy.

### §12.1.2 SourceWorkspace

The semantic-paged store for source materials and artifacts:

```ts
SourceWorkspace {
  workspace_id: string
  task_id: string
  matter_id?: MatterRef
  artifact_versions: ArtifactVersion[]
  candidate_versions: CandidateArtifactVersion[]
  source_documents: SourceDocumentRef[]
  semantic_paging_index: SemanticPagingIndexRef
  retention_policy_ref: StorageRef
  access_tier_constraints: AccessTier[]
  schema_version: 1
}
```

The SourceWorkspace contains durable artifact and source state. Candidate versions live here (per §11.11), not in a separate ShadowWorkspace.

### §12.1.3 ArtifactStore

The deduplicated content-addressed storage for accepted artifact bodies:

```ts
ArtifactStore {
  store_id: string
  content_addressed_blobs: Record
  artifact_version_index: Record
  schema_version: 1
}
```

The ArtifactStore is the system of record for artifact content. It is content-addressed for deduplication and immutability. Versions reference blobs; deleting a version does not delete the blob if other versions reference it.
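The relationship between versions and blobs in §12.1.3 can be illustrated with a small in-memory model. This is a sketch under assumptions, not the ArtifactStore implementation: `InMemoryArtifactStore` and `digest` are hypothetical names, the digest is a non-cryptographic stand-in chosen to keep the example dependency-free, and dropping a blob once no version references it is an assumed policy that the spec does not state.

```typescript
// Stand-in digest (FNV-1a, 32-bit) so the sketch has no dependencies.
// Assumption: a real store would use a cryptographic hash instead.
function digest(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Hypothetical in-memory model of the two ArtifactStore indexes.
class InMemoryArtifactStore {
  private blobs: { [hash: string]: string } = {};
  private versionIndex: { [versionId: string]: string } = {};

  // Store a version; identical content shares a single blob.
  putVersion(versionId: string, content: string): string {
    const hash = digest(content);
    if (!(hash in this.blobs)) this.blobs[hash] = content;
    this.versionIndex[versionId] = hash;
    return hash;
  }

  // Remove a version. The blob survives while any other version references it.
  // Assumption: an unreferenced blob is dropped immediately.
  deleteVersion(versionId: string): void {
    const hash = this.versionIndex[versionId];
    if (hash === undefined) return;
    delete this.versionIndex[versionId];
    const stillReferenced = Object.keys(this.versionIndex).some(
      (id) => this.versionIndex[id] === hash
    );
    if (!stillReferenced) delete this.blobs[hash];
  }

  read(versionId: string): string | undefined {
    const hash = this.versionIndex[versionId];
    return hash === undefined ? undefined : this.blobs[hash];
  }

  blobCount(): number {
    return Object.keys(this.blobs).length;
  }
}
```

Two versions with identical bodies occupy one blob here, and deleting either version leaves the other readable.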
## §12.2 Ownership boundaries

```
RunWorkspace:
  - In-flight task state
  - Cleared / archived after run completion
  - Not shared across runs

SourceWorkspace:
  - Per-task durable state
  - Survives run completion
  - Shared across runs of the same task
  - Includes matter-scoped sources accessible via MatterRef

ArtifactStore:
  - System-wide content storage
  - Immutable, content-addressed
  - Deduplicated across all tasks and matters
```

## §12.3 Source Workspace API

```ts
SourceWorkspaceAPI {
  read_artifact_version(artifact_id, version_id): ArtifactVersion
  read_source(source_id): SourceDocumentRef
  list_candidate_versions(artifact_id): CandidateArtifactVersion[]
  list_versions(artifact_id): ArtifactVersion[]

  // Write operations always emit RevisionOperationReceipt (§11.6)
  produce_candidate(plan_id, step_id, base_version, new_content): CandidateArtifactVersionRef
  accept_candidate(candidate_version_id, acceptance_policy, acceptor): ArtifactVersionRef
  reject_candidate(candidate_version_id, reason, rejecter): CandidateArtifactVersionRef

  // Read operations populate PlanReadSet for staleness checking (§11.9)
  capture_read_snapshot(plan_id): PlanReadSet
}
```

### §12.3.1 Access control

All SourceWorkspaceAPI calls are subject to AccessTier (§16). A caller with `matter_team_access` cannot read artifacts in a different matter without explicit cross-matter permission.

### §12.3.2 Audit trail

Every read and write through the API is recorded in EC. Reads do not produce RevisionOperationReceipts (reads are not mutations), but they DO populate PlanReadSet, which is itself a durable record.

## §12.4 Semantic paging

Large artifacts (multi-thousand-token briefs, lengthy code files) cannot fit into evaluator or revisor context windows in one shot.
SourceWorkspace exposes semantic paging:

```ts
SemanticPagingIndex {
  artifact_id: string
  version_id: string
  sections: Array<{
    section_id: string
    section_path: string         // e.g., "Argument.III.A"
    section_anchor_hash: string  // stable across renames
    token_count: number
    summary_ref: StorageRef      // abstract summary for retrieval
    embedding_ref?: StorageRef   // semantic embedding for similarity
  }>
  schema_version: 1
}
```

### §12.4.1 Page retrieval

Evaluators and Revisors request paged content by:

- Section path (`Argument.III.A`)
- Semantic similarity to query
- Outcome-targeted retrieval (return sections referenced by outcome's declared dependencies)

The paging layer returns relevant sections within token budget; full-artifact reads are restricted to specific operations (rollback, regenerate, semantic changelog production).

### §12.4.2 Section anchor stability

`section_anchor_hash` is stable across section renames or moves. A section "Argument.III.A" renamed to "Argument.IV.A" retains the same anchor. This allows preservation constraints (§7.7) to reference sections by anchor rather than by path, surviving structural changes.

## §12.5 Cost attribution

```ts
EvaluationRevisionCostBreakdown {
  total_cost_usd: number
  total_tokens: number
  total_local_compute_seconds: number

  by_component: Array<{
    component:
      | "outcome_compiler" | "evaluator" | "revision_compiler"
      | "revisor" | "module" | "sub_agent"
    component_id?: string
    cost_usd: number
    tokens: number
    local_compute_seconds: number
  }>

  by_budget_class: {
    logical: { calls: number, tokens: number }
    infrastructure: { retries: number, retry_kinds: Record }
  }

  by_capability: Record

  estimator_confidence_used: EstimatorConfidence
  estimate_vs_actual: {
    estimated: number
    actual: number
    ratio: number
  }

  schema_version: 1
}
```

### §12.5.1 Cost attribution to outcomes

Per-outcome exact cost attribution is deferred to Phase 2 (§26 Open Questions).
V3 attributes cost to the plan and its component invocations; aggregating to outcomes is computed where needed but not stored per-outcome.

### §12.5.2 Cost recording

Every Compiler invocation, sub-agent invocation, and module activation produces a cost record. The Loop Controller accumulates records into the plan-level EvaluationRevisionCostBreakdown. Cost records persist in EC alongside their receipts.

## §12.6 ArtifactDiff and SemanticNeighborhood extraction (per [H4])

The Outcome Compiler and Revision Compiler need to slice artifacts to fit context windows, identify finding-relevant sections, and present diffs across revisions. The Source Workspace exposes named extraction operations for these purposes. The operations honor taint propagation, budget caps, and provenance tagging; they are not free-form content access.

### §12.6.1 Operations

```ts
// Returns minimal artifact sections covering the listed findings.
// "Minimal" means: each finding's target_section_ref is included, plus
// the surrounding semantic context required to make the section evaluable
// in isolation.
extract_relevant_sections(
  artifact_ref: ArtifactRef,
  artifact_version_ref: ArtifactVersionRef,
  findings: FindingRef[],
  budget_cap?: TokenBudget  // optional; defaults to RevisorConfig
): ExtractedSection[]

// Returns adjacent context preserving meaning. Used for grounding partial
// revisions where the affected section's neighbors carry semantic dependencies
// (e.g., a definition in §I.A.2 is required to evaluate §III.B.4).
compute_semantic_neighborhood(
  artifact_ref: ArtifactRef,
  artifact_version_ref: ArtifactVersionRef,
  section_ref: SectionRef,
  scope: "narrow" | "standard" | "wide",
  budget_cap?: TokenBudget
): SemanticNeighborhood

// Returns a structured diff between two versions, with semantic anchors
// (so the consumer can reason about which sections changed, not just which
// token spans differ).
extract_diff(
  artifact_ref: ArtifactRef,
  version_a: ArtifactVersionRef,
  version_b: ArtifactVersionRef,
  budget_cap?: TokenBudget
): StructuredDiff
```

### §12.6.2 Output schemas

```ts
ExtractedSection {
  section_ref: SectionRef
  section_kind:          // semantic role
    | "primary_target"   // a finding's direct target
    | "neighborhood"     // adjacent context
    | "dependency"       // section that defines a term used in primary
    | "global_header"    // document-level metadata always included
  content: string
  content_hash: string   // SHA-256 of normalized content

  // Taint and provenance (per [H4] requirement, [F1b])
  taint_class: TaintClass  // inherited from artifact taint
  provenance: {
    originating_module_id?: string          // which module produced this section
    originating_source_refs?: SourceRef[]   // external sources cited (per §25 DOC25)
    source_label?: string                   // e.g., "Westlaw search result", "internal memo"
    derived_from_section_refs?: SectionRef[]  // upstream sections this was derived from
  }

  // Budget accounting (per [E20])
  token_count_estimate: number
  cost_estimate: CostEstimate

  schema_version: 1
}

SemanticNeighborhood {
  center_section_ref: SectionRef
  scope_kind: "narrow" | "standard" | "wide"
  included_sections: ExtractedSection[]
  excluded_sections: Array<{   // what we left out at this scope
    section_ref: SectionRef
    exclusion_reason: "out_of_scope" | "budget_cap" | "taint_block" | "policy_block"
  }>
  budget_used: TokenBudget
  budget_remaining: TokenBudget
  schema_version: 1
}

StructuredDiff {
  version_a: ArtifactVersionRef
  version_b: ArtifactVersionRef
  section_changes: Array<{
    section_ref: SectionRef
    change_kind: "added" | "removed" | "modified" | "moved"
    semantic_anchor: string   // human-readable section identifier
    before_content?: string   // null if added
    after_content?: string    // null if removed
    before_hash?: string
    after_hash?: string
  }>

  // SemanticChangelog overlay (per §7.11) — if the change was produced by a
  // regenerate/restructure module, the diff is paired with the changelog
  semantic_changelog_ref?: StorageRef

  // Taint preservation: diff content inherits source taint
  taint_class: TaintClass

  budget_used: TokenBudget
  schema_version: 1
}
```

### §12.6.3 Slicing rules

All slicing operations:

1. **Honor taint labels (per [F1b]).** Sections retain their source artifact's `TaintClass` in `ExtractedSection.taint_class`. Slicing does not strip taint.
2. **Respect budget caps (per [E20]).** Operations accept an optional `budget_cap`; when omitted, the cap is `RevisorConfig.max_compiler_extraction_tokens` (5,000 tokens by default). Operations that cannot fit within budget return a `SemanticNeighborhood` with `excluded_sections` populated and `exclusion_reason = "budget_cap"`.
3. **Tag provenance.** Each `ExtractedSection.provenance` records the originating module, the source label (for example, "Westlaw search result" as opposed to "internal memo"), and upstream section refs. The Compiler uses provenance to decide weighting: an `external_authority_trusted` source carries more weight than an `internal_corpus_trusted` source for source verification outcomes.
4. **Are deterministic.** Given the same artifact version and same findings, the operations return identical results. This is required by §11.10 idempotency.
5. **Are cacheable.** Results are keyed by `(artifact_version_ref, parameters_hash)` and stored in EC's extraction cache. Cache lifetime follows §16 retention policies.

### §12.6.4 Use sites

- **Outcome Compiler:** `extract_relevant_sections` to fit evaluation context into the evaluator's allowed input size.
- **Revision Compiler:** `compute_semantic_neighborhood` to assemble the RevisionIntelligencePacket's artifact_content_excerpts.
- **Revision Dispatcher:** `extract_diff` for the §21.6 diff display surface.
- **Semantic paging (§12.4):** `extract_relevant_sections` decomposes large documents into section-scoped revision sub-plans.
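The budget and taint rules in §12.6.3 can be illustrated with a deterministic, in-order selection pass. This is a rough sketch under stated assumptions. `SectionLike`, `SliceResult`, and `sliceWithinBudget` are hypothetical names; taint is modeled as a numeric `taint_level` compared against a caller clearance, which simplifies the spec's `TaintClass`; and the greedy in-order policy (skip a section that does not fit, keep trying later ones) is an assumption, since the spec does not prescribe a selection order.

```typescript
type ExclusionReason = "budget_cap" | "taint_block";

// Hypothetical simplification: taint is a numeric level so it can be
// compared with a clearance. The spec's TaintClass is richer than this.
interface SectionLike {
  section_ref: string;
  token_count_estimate: number;
  taint_level: number;
}

interface SliceResult {
  included: SectionLike[];
  excluded: { section_ref: string; exclusion_reason: ExclusionReason }[];
  budget_used: number;
}

// Mirrors the 5,000-token default cited in §12.6.3.
const DEFAULT_BUDGET_TOKENS = 5000;

// Walks sections in input order, so identical inputs give identical output.
function sliceWithinBudget(
  sections: SectionLike[],
  callerClearance: number,
  budgetCap: number = DEFAULT_BUDGET_TOKENS
): SliceResult {
  const result: SliceResult = { included: [], excluded: [], budget_used: 0 };
  for (const s of sections) {
    if (s.taint_level > callerClearance) {
      result.excluded.push({ section_ref: s.section_ref, exclusion_reason: "taint_block" });
    } else if (result.budget_used + s.token_count_estimate > budgetCap) {
      result.excluded.push({ section_ref: s.section_ref, exclusion_reason: "budget_cap" });
    } else {
      // Included sections keep their taint label; slicing does not strip it.
      result.included.push(s);
      result.budget_used += s.token_count_estimate;
    }
  }
  return result;
}
```

Because the pass has no randomness and no hidden state, its result depends only on its arguments, which is the property that makes caching by `(artifact_version_ref, parameters_hash)` safe.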
### §12.6.5 Failure modes

- **Section not found:** Operations return an error envelope; callers must handle missing section_refs gracefully (typically by re-running evaluation with the current artifact structure).
- **Budget exhaustion:** Returns partial results with `excluded_sections` populated; the consumer must decide whether to proceed or to request budget extension.
- **Taint-blocked section:** Operations skip sections whose taint exceeds the caller's clearance; the skipped sections are listed in `excluded_sections` with `exclusion_reason = "taint_block"`.

---

# §13. PATTERN LEARNING

## §13.1 Pattern primitive

Per [P-39], the Pattern primitive separates provenance (where the lesson came from) from applicability scope (where the lesson should be retrieved):

```ts
Pattern {
  pattern_id: string
  pattern_kind: PatternKind  // §0.4.20

  // PROVENANCE (audit trail; never used for retrieval gating)
  provenance: {
    originating_run_id: string
    originating_matter_id?: MatterRef
    originating_artifact_refs: ArtifactRef[]
    originating_user_ref: UserRef
    originating_feedback_event_id?: string
    created_at: ISO8601
  }

  // APPLICABILITY SCOPE (retrieval filter; inferred from pattern CONTENT)
  applicability_scope: {
    scope_kind: PatternScopeKind  // §0.4.20
    domain_tags?: string[]
    work_product_types?: string[]
    matter_id?: MatterRef  // SET only when scope_kind == "matter"
    user_scope?: "will_only" | "team" | "firm"
  }

  // CONTENT
  payload:
    | OutcomeConfigurationPatternPayload
    | RevisionStrategyPatternPayload
    | PlanTemplatePatternPayload

  // Identifying-content scan result (§13.1.2)
  identifying_content_scan: {
    scan_performed: boolean
    identifying_detail_detected: boolean
    detected_kinds: Array<
      | "party_name" | "specific_fact" | "client_communication"
      | "matter_specific_strategy" | "case_caption" | "filing_detail"
    >
    anonymization_applied: boolean
    forced_scope_lock?: MatterRef
  }

  // Compatibility constraints
  compatibility: PatternCompatibilityConstraint

  // Performance (context-conditioned per [P-18])
  performance_slices: PatternPerformanceSlice[]
  health_state: PatternHealthState  // §0.4.20
  aggregate_display_metrics?: PatternAggregateDisplayMetrics  // UI display only

  // V3.2 — cross-model pattern applicability (coordination V3 §2.10)
  // Default for newly-learned patterns is "requires_validation" when surfaced
  // at a different model_class than the one in the pattern's PatternContextSignature
  cross_model_applicability:
    | "model_class_specific"    // applies only at original model class
    | "cross_model_applicable"  // validated across model classes
    | "requires_validation"     // default; use as prior with reduced
                                // confidence until validated

  schema_version: 2  // bumped for V3.2 cross_model_applicability
}
```

### §13.1.0 Cross-model applicability transitions (V3.2)

The `cross_model_applicability` field is set when the pattern is learned and updated as the system observes the pattern at other model classes:

- **New patterns** start at `requires_validation`.
- **`calibration` mode runs** (per §6.16) produce paired-model observations on the same artifacts. When a pattern's predictions hold across the calibrated model classes (above a confidence threshold), the system promotes it to `cross_model_applicable`. Promotion is automatic only when EC Core policy gates permit; for matter-scoped or privileged-matter patterns, promotion requires explicit user action.
- **Patterns that fail validation** at a new model class are demoted to `model_class_specific` and the originating `model_class` is recorded on every `PatternPerformanceSlice` (§13.3). Demoted patterns still apply at their original model class with full confidence.
- **Demotion is not deletion.** A demoted pattern remains useful at its original model class; only cross-model surfacing is restricted.
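The transitions in §13.1.0 can be read as a small state machine plus a surfacing rule. The sketch below is illustrative only. `ValidationOutcome`, `nextApplicability`, `surfacingFor`, and the `Surfacing` labels are hypothetical names, and collapsing "EC Core policy gates permit" and "explicit user action" into two booleans is a simplifying assumption.

```typescript
type CrossModelApplicability =
  | "model_class_specific"
  | "cross_model_applicable"
  | "requires_validation";

type ModelClass = "cheap_local" | "cheap_api" | "medium" | "expensive_frontier";

// Hypothetical summary of one calibration observation.
interface ValidationOutcome {
  held_across_classes: boolean;     // predictions held above the confidence threshold
  auto_promotion_permitted: boolean; // assumption: policy gates reduced to a flag
  user_approved: boolean;           // explicit user action, where required
}

function nextApplicability(
  current: CrossModelApplicability,
  o: ValidationOutcome
): CrossModelApplicability {
  if (!o.held_across_classes) return "model_class_specific"; // demotion, not deletion
  if (o.auto_promotion_permitted || o.user_approved) return "cross_model_applicable";
  return current; // validated, but promotion still awaits user action
}

// Hypothetical labels for how a pattern is used at a given model class.
type Surfacing = "full_confidence" | "reduced_confidence_prior" | "not_surfaced";

function surfacingFor(
  applicability: CrossModelApplicability,
  originalClass: ModelClass,
  currentClass: ModelClass
): Surfacing {
  if (originalClass === currentClass) return "full_confidence";
  if (applicability === "cross_model_applicable") return "full_confidence";
  if (applicability === "requires_validation") return "reduced_confidence_prior";
  return "not_surfaced"; // model_class_specific at a different class
}
```

A demoted pattern still returns `full_confidence` at its original model class, matching the statement that demotion only restricts cross-model surfacing.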
### §13.1.1 Applicability scope inference

The Feedback Interpreter (§14.3) infers `applicability_scope` from pattern CONTENT, not from the originating matter:

```
RULE applicability_scope_inference:
  Feedback Interpreter analyzes the candidate pattern's content
  (not its origin matter):

  Content is matter-agnostic craft knowledge
    ("lead briefs with dispositive issue")
    → scope_kind = "user_preference" or "global" depending on universality

  Content references domain conventions, not specific matter facts
    ("California sign-permit damages cite CC § 3333 + CACI 3903N")
    → scope_kind = "domain" or "work_product_type"

  Content references specific matter facts or strategy
    ("on Paramount, the city's witness is hostile to X")
    → scope_kind = "matter", matter_id = originating_matter_id

  User may override the inferred scope_kind during the [B7]
  teach-from-feedback flow.
```

The originating matter is a useful audit field. It is NOT a retrieval filter. A California sign-permit insight that arose during Paramount work surfaces on all California sign-permit cases, not only Paramount.

### §13.1.2 Identifying-content scan

```
RULE identifying_content_firewall:
  Feedback Interpreter scans the candidate pattern text for identifying
  matter detail.

  If identifying_detail_detected:
    1. Attempt anonymization (replace specific names/facts with generic placeholders)
    2. If anonymization preserves the lesson's value:
       - anonymization_applied = true
       - applicability_scope follows normal content-based inference (may be broad)
    3. If anonymization destroys the lesson's value (the detail IS the lesson):
       - anonymization_applied = false
       - forced_scope_lock = originating_matter_id
       - applicability_scope.scope_kind locked to "matter"
       - applicability_scope.matter_id = originating_matter_id
       - UI surfaces the scope lock to user with explanation
```

This is the actual privilege firewall: patterns whose content reveals identifying matter detail get matter-locked, and matter-locked patterns never cross matter boundaries at retrieval (§13.4).

### §13.1.3 Pattern payloads

```ts
OutcomeConfigurationPatternPayload {
  outcome_kind: string
  inferred_method_template: EvaluationMethod
  inferred_method_params: EvaluationMethodParams
  inferred_assurance_basis: AssuranceBasis[]
  inferred_threshold?: ThresholdRecord
  capability_binding_hints: CapabilityRef[]
}

RevisionStrategyPatternPayload {
  failure_kind_pattern: FailureKind[]
  repair_strategy: RepairStrategyKind
  ordering_hints: string[]
  preservation_constraint_hints: PreservationConstraintKind[]
  module_capability_hints: Array<{
    capability_id: string
    capability_version_constraint: semver_range
  }>
}

PlanTemplatePatternPayload {
  step_template: Array<{
    step_kind: RevisionPlanStepKind
    capability: string
    typical_preconditions: PreconditionKind[]
  }>
  typical_dependency_edges: DependencyEdgeTemplate[]
}
```

## §13.2 Pattern compatibility

A pattern is compatible with a current context only when:

```ts
PatternCompatibilityConstraint {
  required_capability_refs: VersionedCapabilityRef[]
  required_module_type_versions: VersionedModuleTypeRef[]
  required_evidence_source_classes: EvidenceSourceClass[]
  required_graph_features: string[]
  forbidden_capability_refs: VersionedCapabilityRef[]
  compatible_if_capability_snapshot_hash_matches?: string
  schema_version: 1
}
```

### §13.2.1 Compatibility check

```
RULE pattern_compatibility:
  For Pattern P and current context C:
    P.compatibility.required_capability_refs are all available in C
    P.compatibility.required_module_type_versions match available modules in C
    P.compatibility.required_evidence_source_classes are present in C
    P.compatibility.forbidden_capability_refs are NOT present in C

  If compatible_if_capability_snapshot_hash_matches is set:
    C.capability_snapshot_hash matches OR fall back to other compatibility fields
```

This prevents patterns from applying when their underlying capabilities have changed semantics; a capability named the same at v1.2 vs v2.0 may have different behavior.

## §13.3 PatternPerformanceSlice

Patterns do not have a global success rate. Performance is context-conditioned per slice:

```ts
PatternPerformanceSlice {
  slice_id: string
  pattern_id: string

  context_signature: {
    domain_tags: string[]
    artifact_kind: string
    failure_kind?: FailureKind
    risk_class: "low" | "medium" | "high"
    assurance_basis: AssuranceBasis
    privilege_class: "none" | "matter_team" | "supervising_attorney"

    // V3.2 — model class axis (coordination V3 §2.10)
    // Each performance slice is conditioned on the model class that produced it.
    // Patterns learned in one model class default to requires_validation when
    // surfaced at another model class (per §13.1's cross_model_applicability).
    model_class: "cheap_local" | "cheap_api" | "medium" | "expensive_frontier"
  }

  usage_count: number
  convergence_count: number        // outcomes resolved successfully
  failure_count: number            // outcomes still failed after pattern application
  regression_count: number         // pattern caused regression in other outcomes
  user_override_count: number      // user rejected the pattern's suggestion
  contested_finding_count: number  // findings produced were disputed
  rollback_count: number           // rollback applied after pattern use

  // Goal advancement (per [P-20] — sycophancy fix)
  goal_advancement_count: number
  goal_advancement_source:         // REQUIRED, locked enum
    | "independent_comparative_judge"
    | "explicit_human_feedback"
  goal_advancement_evaluator_ref?: StorageRef       // REQUIRED if source = independent_comparative_judge
  goal_advancement_human_feedback_ref?: StorageRef  // REQUIRED if source = explicit_human_feedback
  goal_regression_count: number

  cost_usd_avg: number
  local_compute_seconds_avg: number

  schema_version: 2  // bumped for V3.2 model_class axis
}
```

### §13.3.1 Goal advancement signal integrity

Per [P-20], the schema validator rejects any `goal_advancement_count` increment that lacks the corresponding source reference:

- `goal_advancement_source = "independent_comparative_judge"` requires `goal_advancement_evaluator_ref` populated
- `goal_advancement_source = "explicit_human_feedback"` requires `goal_advancement_human_feedback_ref` populated

This prevents the sycophancy delusion: Revisor-generated `GoalImpactAssessment` (§6.12) populates UI only and cannot increment this counter.

### §13.3.2 No global success_rate

Per [P-18], the L4 DOC72 amendment removes `success_rate` as a Pattern field.
Aggregate display metrics exist for UI only:

```ts
PatternAggregateDisplayMetrics {
  context_scope: string  // e.g., "across all legal_brief contexts"
  usage_count: number
  convergence_count: number
  regression_count: number
  display_only: true  // LOCKED
  schema_version: 1
}
```

The `display_only: true` field is enforced by the validator: any code path reading `aggregate_display_metrics` outside UI rendering produces `validation.aggregate_metric_used_as_decision_primitive`.

## §13.4 Cross-matter retrieval firewall

```
RULE cross_matter_retrieval_firewall:
  When the Outcome Compiler or Revision Compiler retrieves patterns
  for use during work on matter Y:

  Filter applied at query time:
    For each Pattern P:
      If P.applicability_scope.scope_kind == "matter"
         AND P.applicability_scope.matter_id != Y:
        P is excluded from retrieval results

  This applies whether or not the matter is the same as
  P.provenance.originating_matter_id. Retrieval scope is determined
  by applicability_scope, not provenance.
```

This is the privilege firewall. Patterns whose content references specific matter detail (and therefore have `scope_kind == "matter"` either by content inference or by forced scope lock per §13.1.2) never cross matter boundaries. Patterns derived from privileged work whose content is matter-agnostic flow normally to broader scopes.

## §13.5 Pattern promotion

```
RULE pattern_promotion:
  Default behavior:
    Patterns at scope_kind "private" or "user_preference" persist locally
    Promotion to broader scope (domain, work_product_type, global) requires
    explicit user action AND governance policy approval

  Promotion gating:
    Per [P-39], promotion gates on identifying content + applicability scope,
    not on origin matter or origin privilege class.

    If P.identifying_content_scan.identifying_detail_detected = true
       AND P.identifying_content_scan.anonymization_applied = false:
      Promotion to scope wider than "matter" is BLOCKED

    If P.applicability_scope.scope_kind == "matter":
      Promotion requires content review confirming pattern is generalizable

    Promotion to "global" or wide "domain" requires GovernancePolicy approval
    per AccessTier of the requesting user (§16)

  Promotion record:
    Each promotion produces a typed receipt with operation_kind in
    { "pattern_promoted", "pattern_demoted", "pattern_archived" }
```

## §13.6 Pattern health

```
PatternHealthState =
  | "healthy"      // performing within expected range
  | "watch"        // declining or near threshold
  | "quarantined"  // failing; not retrieved until fix
  | "archived"     // no longer in use
  | "purged"       // removed from DOC72

PatternHealthTransitionRule {
  metric: "convergence_rate" | "regression_rate" | "user_override_rate" | "cost_drift"
  threshold: number
  window_size: number  // recent N applications
  transition_to: PatternHealthState
}
```

Healthy → watch when performance crosses lower threshold; watch → quarantined when performance crosses critical threshold. Quarantined patterns may be revived after a user-flagged fix or after performance recovers (e.g., underlying capability stabilizes).

## §13.7 Pattern template versioning (per [D10])

When a pattern's underlying capabilities, modules, or evidence sources change, the pattern template may need versioning. Patterns reference `compatibility.required_capability_refs.capability_version_constraint` to define their versioning policy.

A pattern targeting capability `regenerate@^1.2.0` becomes incompatible when the only available capability is `regenerate@2.0.0`. Such a pattern transitions to `watch` or `quarantined` per §13.6, awaiting either a downgrade-compatible capability or pattern revision.

### §13.7.1 Correct retrieval ordering

Pattern template retrieval follows a strict five-step ordering.
Steps run in this order, never reordered: ``` Step 1 — Candidate retrieval by pattern signature Inputs: ContextSignature (domain_tags, artifact_kind, evidence_source_kinds, module_types_present, goal_kind), applicability_scope filter Output: Set matching the signature within applicability scope Step 2 — Load template capability/version metadata For each candidate pattern: Resolve compatibility.required_capability_refs against DOC24 capability registry Read current capability_version values Read pattern's capability_version_constraint values Step 3 — Run compatibility check For each candidate pattern: For each (required_capability, current_version, constraint) triple: Check current_version satisfies constraint (semver) Check module types present in current task match pattern's module_types_present Check evidence_source_kinds present match pattern's evidence_source_kinds Pattern is compatible iff all required checks pass Step 4 — Use compatible templates only Filter candidate set to compatible patterns Order remaining by (similarity_score desc, last_used desc) per §21.8 Return ordered set to caller (Compiler / Revision Compiler / Feedback Interpreter) Step 5 — Mark incompatible templates stale (don't delete) For each incompatible pattern detected in step 3: Update PatternHealth.compatibility_state = "incompatible_with_current_capabilities" Record incompatibility reason in pattern's audit trail Do NOT delete; the pattern may regain compatibility when capabilities downgrade or a downgrade-compatible pattern alternative is published Stale patterns surface in the §16 Governance review queue for human disposition (revise, deprecate, or archive) ``` **Why the ordering matters:** - **Step 1 before step 2:** Candidate retrieval is a coarse signature match; loading capability metadata is more expensive. Filtering first reduces metadata load. 
- **Step 2 before step 3:** Compatibility check depends on current versions; loading versions in step 2 enables consistent step 3 evaluation across a batch. - **Step 3 before step 4:** Incompatible patterns must be excluded from retrieval; using an incompatible pattern produces revision plans that the dispatcher will reject at §11.3 deterministic linting. - **Step 4 before step 5:** Caller receives the ordered compatible set; staleness marking is a side-effect tied to step 3 results. - **Step 5 is non-destructive:** Patterns are marked stale, not deleted. Recovery is possible when capabilities change. **Versioning policy:** ```ts PatternTemplateVersioning { pattern_id: string template_version: semver // version of the pattern template itself compatibility_history: Array<{ detected_at: ISO8601 compatibility_state: "compatible" | "incompatible_with_current_capabilities" | "stale" | "archived" affecting_capability_changes: CapabilityRef[] resolution: "remained_compatible" | "marked_stale" | "revised_to_new_template_version" | "archived" }> prior_template_version_refs: PatternRef[] // chain of prior versions superseded_by?: PatternRef // newer version, if any schema_version: 1 } ``` When a pattern is revised (e.g., adapted to a new capability version), the revision produces a new pattern with `template_version` bumped per semver rules. The prior pattern's `superseded_by` points to the new pattern. Both patterns persist; retrieval prefers the newest compatible version per step 4 ordering. ### §13.7.2 Capability downgrade behavior When a required capability is removed (not just version-changed), all patterns referencing it transition to `stale` per step 5. They do not transition to `archived` automatically; archival is a Governance decision (§16.4) requiring human disposition. --- # §14. 
FEEDBACK PIPELINE ## §14.1 Decision table When the user gives feedback during a run, the system routes via a decision table: ``` FEEDBACK ROUTING DECISION: Is the feedback a meaningful correction? (i.e., the user disagrees with output, identifies error, requests change) YES → Route through Revisor (current run cycle) + emit signals for future learning (§14.8) NO → Is the feedback a teaching signal (no current-run change needed)? YES → Emit feedback event for Feedback Interpreter (§14.3) Generate DirectInstructionCandidate or pattern signal Do NOT invoke Revisor for current run NO → Is the feedback informational? YES → Log feedback; no system action ``` ### §14.1.1 Meaningful correction definition A meaningful correction: - Changes the output the user would accept - Identifies a specific error or deficiency - Includes a directive ("fix this," "change this," "don't do that") Versus a teaching signal: - Acknowledges the current output is acceptable - Offers a preference or convention to apply going forward - Does not require immediate revision ## §14.2 HumanOutcomeFeedbackEvent The intake schema for human feedback: ```ts HumanOutcomeFeedbackEvent { event_id: string task_id: string run_id: string outcome_id?: outcome_id // when feedback targets specific outcome artifact_ref?: ArtifactRef // when feedback targets specific artifact finding_id?: string // when feedback targets specific finding feedback_text: string feedback_attachments: StorageRef[] user_ref: UserRef submitted_at: ISO8601 user_classification?: { // user's hint about classification feedback_kind?: FeedbackKind authority_class?: HumanFeedbackAuthorityClass intended_scope?: PatternScopeKind } ui_source: | "evaluation_result_card" | "revision_result_card" | "plan_review" | "hard_call_response" | "teach_from_feedback_card" | "direct_chat" schema_version: 1 } ``` ## §14.3 Feedback Interpreter The Feedback Interpreter is a runtime service that parses HumanOutcomeFeedbackEvent into structured 
InterpretedOutcomeFeedback. ### §14.3.1 InterpretedOutcomeFeedback ```ts InterpretedOutcomeFeedback { interpretation_id: string source_event_id: string feedback_kind: FeedbackKind // §0.4.21 authority_class: HumanFeedbackAuthorityClass proposed_revision_request?: RevisionRequest // when meaningful correction proposed_durable_candidate?: DurableKnowledgeCandidate // when teaching signal // Applicability scope inference (per [P-39]) inferred_applicability_scope: { scope_kind: PatternScopeKind domain_tags?: string[] work_product_types?: string[] matter_id?: MatterRef user_scope?: "will_only" | "team" | "firm" } // Identifying-content scan result identifying_content_scan: IdentifyingContentScanResult privilege_class: "none" | "matter_team" | "supervising_attorney" privilege_basis: string sub_agents_consulted: AdvisorySubAgentRef[] confidence: "low" | "medium" | "high" rationale: string schema_version: 1 } ``` ### §14.3.2 Interpretation protocol ``` PROTOCOL feedback_interpretation: Phase 1 — Parse: 1. Extract feedback intent (correction vs teaching vs informational) 2. Classify feedback_kind 3. Identify target outcome / artifact / finding Phase 2 — Scope inference: 1. Analyze feedback CONTENT 2. Determine applicability_scope per §13.1.1 rules 3. Scan for identifying matter detail per §13.1.2 rules Phase 3 — Authority and privilege: 1. Determine authority_class 2. 
Determine privilege_class Phase 4 — Output: If meaningful correction: Produce proposed_revision_request → Revisor (current run) If teaching signal: Produce proposed_durable_candidate → Teach-from-feedback card (§18) ``` ### §14.3.3 Sub-agent consultation The Feedback Interpreter MAY consult advisory sub-agents producing `FeedbackInterpretationAssessment` (§8.5) for: - Ambiguous feedback intent - Novel feedback kinds not in the FeedbackKind enum - Complex scope inference (matter-specific vs domain-general) - Privilege classification uncertainty ## §14.4 RevisionRequest from feedback When the Feedback Interpreter produces a `proposed_revision_request`, it follows the RevisionRequest schema (§7.1) and is routed to the Revisor: ``` RULE feedback_to_revisor_routing: If InterpretedOutcomeFeedback.proposed_revision_request is present: - Current revision plan is in progress: route to Revisor as additional input - No current plan: create new revision plan with this as primary input - Plan completed: spawn new revision cycle if user explicitly authorizes ``` ## §14.5 DurableKnowledgeCandidate For teaching signals, the Interpreter produces: ```ts DurableKnowledgeCandidate { candidate_id: string source_event_id: string candidate_kind: | "outcome_configuration_lesson" | "revision_strategy_lesson" | "plan_template_lesson" | "style_preference" | "tool_or_method_preference" | "process_convention" candidate_payload: | OutcomeConfigurationPatternPayload | RevisionStrategyPatternPayload | PlanTemplatePatternPayload | StylePreferencePayload | ToolPreferencePayload | ProcessConventionPayload proposed_scope: PatternScopeKind inferred_compatibility: PatternCompatibilityConstraint privilege_class: "none" | "matter_team" | "supervising_attorney" identifying_content_scan: IdentifyingContentScanResult schema_version: 1 } ``` Candidates flow to the Teach-from-feedback UI (§18) for user confirmation before persistence as Patterns. 
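The Phase 4 output step (§14.3.2) and the routing rule in §14.4 can be read together as one dispatch function. The sketch below is illustrative only: `InterpretedFeedbackLite`, `PlanState`, `RoutingDecision`, and `routeInterpretedFeedback` are hypothetical names, and the `awaiting_user_authorization` result is an assumption, since the spec does not say what happens when a plan has completed and the user has not authorized a new cycle.

```typescript
// Hypothetical, reduced stand-in for InterpretedOutcomeFeedback (only the fields used here).
interface InterpretedFeedbackLite {
  proposed_revision_request?: { request_id: string };
  proposed_durable_candidate?: { candidate_id: string };
}

// Assumed summary of the revision plan's state when the feedback arrives.
type PlanState = "in_progress" | "none" | "completed";

type RoutingDecision =
  | { target: "revisor"; mode: "additional_input" | "new_plan" | "new_cycle" }
  | { target: "teach_from_feedback_card" }
  | { target: "awaiting_user_authorization" } // assumption: not named in the spec
  | { target: "log_only" };

function routeInterpretedFeedback(
  fb: InterpretedFeedbackLite,
  planState: PlanState,
  userAuthorizedNewCycle: boolean
): RoutingDecision {
  if (fb.proposed_revision_request !== undefined) {
    // Meaningful correction: goes to the Revisor for the current run (§14.4).
    if (planState === "in_progress") return { target: "revisor", mode: "additional_input" };
    if (planState === "none") return { target: "revisor", mode: "new_plan" };
    // Plan completed: a new cycle needs explicit user authorization.
    return userAuthorizedNewCycle
      ? { target: "revisor", mode: "new_cycle" }
      : { target: "awaiting_user_authorization" };
  }
  if (fb.proposed_durable_candidate !== undefined) {
    // Teaching signal: the candidate goes to the Teach-from-feedback card (§18).
    return { target: "teach_from_feedback_card" };
  }
  // Informational feedback: logged, no system action.
  return { target: "log_only" };
}
```

Checking the revision request first mirrors the decision table in §14.1, where a meaningful correction takes precedence over a teaching signal.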
## §14.6 HumanFeedbackAuthorityClass

```ts
HumanFeedbackAuthorityClass =
  | "current_run_instruction"          // applies to this run only
  | "current_run_preference"           // soft preference for this run
  | "future_pattern_signal"            // teach future runs (no current change)
  | "durable_instruction_candidate"    // proposed durable rule
  | "matter_scoped_preference"         // matter-only preference
  | "privileged_comment_no_learning"   // commentary; do not learn
  | "correction_to_evaluator"          // Evaluator-specific feedback
  | "correction_to_revisor"            // Revisor-specific feedback
```

### §14.6.1 Authority class effects

| authority_class | current run | future patterns | scope constraints |
|---|---|---|---|
| current_run_instruction | revisor invoked | no | run only |
| current_run_preference | revisor consulted | no | run only |
| future_pattern_signal | no change | pattern proposed | per inferred scope |
| durable_instruction_candidate | no change | DirectInstructionCandidate proposed | per §14.7 |
| matter_scoped_preference | no change | pattern scope locked to matter | matter only |
| privileged_comment_no_learning | no change | NO durable learning | run only, LOCKED |
| correction_to_evaluator | Evaluator quality signal | future Evaluator pattern | per inferred scope |
| correction_to_revisor | Revisor quality signal | future Revisor pattern | per inferred scope |

`privileged_comment_no_learning` is the explicit no-learning class. It blocks durable destinations regardless of other classification.
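One way to keep the §14.6.1 table enforceable is to encode it as a typed lookup, so that adding a class without declaring its effects fails to compile. The following is a sketch under that assumption: `AuthorityEffect`, `AUTHORITY_EFFECTS`, and `allowsDurableLearning` are hypothetical names, and the string values are shorthand for the table cells rather than identifiers the spec defines.

```typescript
type HumanFeedbackAuthorityClass =
  | "current_run_instruction"
  | "current_run_preference"
  | "future_pattern_signal"
  | "durable_instruction_candidate"
  | "matter_scoped_preference"
  | "privileged_comment_no_learning"
  | "correction_to_evaluator"
  | "correction_to_revisor";

// Hypothetical encoding of the three effect columns in the §14.6.1 table.
interface AuthorityEffect {
  currentRun:
    | "revisor_invoked"
    | "revisor_consulted"
    | "no_change"
    | "evaluator_quality_signal"
    | "revisor_quality_signal";
  futureLearning: "none" | "pattern" | "direct_instruction_candidate";
  scope: "run_only" | "inferred_scope" | "per_section_14_7" | "matter_only";
}

// Record<...> forces every authority class to have exactly one row.
const AUTHORITY_EFFECTS: Record<HumanFeedbackAuthorityClass, AuthorityEffect> = {
  current_run_instruction: { currentRun: "revisor_invoked", futureLearning: "none", scope: "run_only" },
  current_run_preference: { currentRun: "revisor_consulted", futureLearning: "none", scope: "run_only" },
  future_pattern_signal: { currentRun: "no_change", futureLearning: "pattern", scope: "inferred_scope" },
  durable_instruction_candidate: {
    currentRun: "no_change",
    futureLearning: "direct_instruction_candidate",
    scope: "per_section_14_7",
  },
  matter_scoped_preference: { currentRun: "no_change", futureLearning: "pattern", scope: "matter_only" },
  privileged_comment_no_learning: { currentRun: "no_change", futureLearning: "none", scope: "run_only" },
  correction_to_evaluator: { currentRun: "evaluator_quality_signal", futureLearning: "pattern", scope: "inferred_scope" },
  correction_to_revisor: { currentRun: "revisor_quality_signal", futureLearning: "pattern", scope: "inferred_scope" },
};

// True when feedback of this class may produce any durable destination.
function allowsDurableLearning(cls: HumanFeedbackAuthorityClass): boolean {
  return AUTHORITY_EFFECTS[cls].futureLearning !== "none";
}
```

A caller would consult `allowsDurableLearning` before creating a `DurableKnowledgeCandidate` or `DirectInstructionCandidate`; for `privileged_comment_no_learning` it returns `false` no matter how the feedback was otherwise classified.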
## §14.7 DirectInstructionCandidate

When feedback proposes a durable instruction (not a pattern), the system creates a DirectInstructionCandidate:

```ts
DirectInstructionCandidate {
  candidate_id: string
  source_event_id: string
  instruction_text: string
  instruction_kind:
    | "style_rule"
    | "process_rule"
    | "tool_preference"
    | "source_preference"
    | "format_rule"
    | "review_rule"
  proposed_scope: {
    user_scope: "will_only" | "team" | "global_system"
    work_scope: "current_run" | "this_task" | "this_matter" | "domain" | "work_product_type" | "all_work"
    domain_tags?: string[]
    matter_id?: MatterRef
  }
  strength: "suggestion" | "strong_preference" | "hard_constraint"
  scope_rule_validations: ScopeRuleValidationResult[]
  governance_approval_ref?: ApprovalRef
  conflict_check_result?: ConflictCheckResult
  schema_version: 1
}
```

### §14.7.1 Scope rules

Per [P-14], scope is constrained:

```
RULE direct_instruction_scope_constraints:
  If source_feedback.authority_class == "privileged_comment_no_learning":
    destinations MUST equal ["current_run_only"]
    proposed_scope.work_scope MUST equal "current_run"
    proposed_scope.user_scope MUST equal "will_only"

  If source_feedback.privilege_class != "none":
    proposed_scope.work_scope cannot exceed "this_matter"
    proposed_scope.user_scope cannot exceed "team"

  If proposed_scope.user_scope == "global_system":
    requires_architect_or_admin_approval = true
    governance_policy_ref must reference global_promotion_policy

  If strength == "hard_constraint":
    requires_conflict_check_against_existing_authority = true

  DEFAULT destination = ["current_run_only"] unless user explicitly promotes
  DEFAULT strength = "suggestion" unless user explicitly escalates
```

### §14.7.2 Conflict checking

When `strength == "hard_constraint"`, the candidate is checked against existing authority rules (CIL authorities, prior DirectInstructions, governance rules). Conflicts produce `ConflictCheckResult.conflicts` requiring user resolution before persistence.
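The scope rules in §14.7.1 amount to a small validator. The sketch below rests on two assumptions that the spec does not state: that "cannot exceed" follows the declaration order of the `user_scope` and `work_scope` enums, and that the matter-level ceiling corresponds to the `this_matter` work scope. `ScopeCheckInput`, `ScopeCheckResult`, `checkDirectInstructionScope`, and the violation strings are hypothetical.

```typescript
type UserScope = "will_only" | "team" | "global_system";
type WorkScope =
  | "current_run" | "this_task" | "this_matter"
  | "domain" | "work_product_type" | "all_work";

// Assumption: breadth increases in enum declaration order (narrowest first).
const USER_SCOPE_ORDER: UserScope[] = ["will_only", "team", "global_system"];
const WORK_SCOPE_ORDER: WorkScope[] = [
  "current_run", "this_task", "this_matter", "domain", "work_product_type", "all_work",
];

// Hypothetical flattened input: candidate fields plus the source feedback's classification.
interface ScopeCheckInput {
  authority_class: string;
  privilege_class: "none" | "matter_team" | "supervising_attorney";
  user_scope: UserScope;
  work_scope: WorkScope;
  strength: "suggestion" | "strong_preference" | "hard_constraint";
}

interface ScopeCheckResult {
  violations: string[];
  requires_architect_or_admin_approval: boolean;
  requires_conflict_check: boolean;
}

function checkDirectInstructionScope(c: ScopeCheckInput): ScopeCheckResult {
  const violations: string[] = [];
  const userRank = USER_SCOPE_ORDER.indexOf(c.user_scope);
  const workRank = WORK_SCOPE_ORDER.indexOf(c.work_scope);

  // No-learning feedback is pinned to the current run and the originating user.
  if (c.authority_class === "privileged_comment_no_learning") {
    if (c.work_scope !== "current_run") violations.push("no_learning_requires_current_run");
    if (c.user_scope !== "will_only") violations.push("no_learning_requires_will_only");
  }
  // Privileged feedback may not widen past the matter or the team.
  if (c.privilege_class !== "none") {
    if (workRank > WORK_SCOPE_ORDER.indexOf("this_matter")) violations.push("privileged_work_scope_too_wide");
    if (userRank > USER_SCOPE_ORDER.indexOf("team")) violations.push("privileged_user_scope_too_wide");
  }
  return {
    violations,
    requires_architect_or_admin_approval: c.user_scope === "global_system",
    requires_conflict_check: c.strength === "hard_constraint",
  };
}
```

The two `requires_*` flags are reported rather than treated as violations because §14.7.1 frames them as follow-up gates (approval, conflict check), not as reasons to reject the candidate outright.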
## §14.8 Signal classes The feedback pipeline emits typed signals for downstream consumers (DOC72, BDSM, DOC24, quality programs): ### §14.8.1 OutcomeEvaluatorFeedbackSignalKind ``` | "evaluator_false_pass" | "evaluator_false_fail" | "evaluator_missed_hard_call" | "evaluator_wrong_hard_call" | "evaluator_plan_user_edited" | "compiled_plan_accepted" | "compiled_plan_rejected" | "finding_marked_wrong" | "finding_marked_correct" | "finding_later_superseded" | "needs_information_was_correct" | "needs_verification_was_correct" | "human_judgment_flag_useful" | "human_judgment_flag_noise" ``` ### §14.8.2 RevisorFeedbackSignalKind ``` | "revision_plan_succeeded" | "revision_plan_failed" | "revision_plan_too_broad" | "revision_plan_too_narrow" | "revision_target_wrong_module" | "revision_instruction_useful" | "revision_instruction_ignored" | "revision_caused_regression" | "revision_resolved_finding" | "revision_failed_to_resolve_finding" ``` ### §14.8.3 DirectInstructionSignalKind ``` | "direct_instruction_candidate_created" | "direct_instruction_accepted" | "direct_instruction_edited" | "direct_instruction_rejected" | "direct_instruction_scope_narrowed" | "direct_instruction_superseded" | "direct_instruction_injection_helped" | "direct_instruction_injection_hurt" ``` ### §14.8.4 Signal consumers ``` DOC72 — Pattern primitive updates, performance slice updates, promotion candidates DOC8 / BDSM — Utility-bundle compilation inputs (Phase 2) DOC24 — Compiled guidance bundles (Phase 2 hot-path; Phase 1 ledger) Quality programs — Compiler / Revisor / sub-agent quality metrics EC — Audit and retention ``` ## §14.9 Plan Review Forum For high-stakes plans, the system routes to a forum room (DOC12) for collaborative review before dispatch. PlanAssurancePolicy.required_modes can include `forum_review`, which: 1. Creates a Room (DOC12) with the plan, evaluation result, and supporting context 2. Invites participants per the dynamic participation rules in §14.9.2 3. 
Awaits forum consensus (per Room rules) 4. Records consensus as a step in the plan's assurance trail Forum review is deliberation, not runtime orchestration. Dispatch resumes after forum signals approval. ### §14.9.1 Role-scoped critique authority (per [J3]) Forum participants have scoped critique authority. Each participant role can dissent *within scope*; out-of-scope dissents are recorded but not blocking. This prevents the "everyone vetoes everything" deadlock pattern. ```ts ForumParticipantRole { participant_kind: | "source_or_citation_specialist" // dissent on source verification only | "style_specialist" // dissent on style/format only | "adversarial_reviewer" // dissent on risk/persuasiveness only | "evaluator_agent" // dissent on outcome coverage only | "task_agent" // dissent on graph design only | "human_user" // dissent on anything (no scope limit) | "human_supervisor" // dissent on anything (no scope limit) blocking_critique_scopes: CritiqueScope[] advisory_critique_scopes: CritiqueScope[] // can comment, not block schema_version: 1 } CritiqueScope = | "source_verification" | "citation_format" | "writing_style" | "document_structure" | "risk_assessment" | "persuasiveness" | "outcome_coverage" | "graph_design" | "factual_accuracy" | "legal_strategy" | "client_position" | "any" // human roles only ForumDissent { participant_ref: ParticipantRef participant_role: ForumParticipantRole dissent_scope: CritiqueScope in_scope_for_participant: boolean dissent_kind: "block_dispatch" | "request_modification" | "comment_only" rationale: string // If in_scope: dissent_kind may be "block_dispatch" // If out_of_scope: dissent_kind is automatically downgraded to "comment_only" schema_version: 1 } ``` **Default scope mapping:** | Participant role | Blocking scopes | Advisory scopes | |---|---|---| | `source_or_citation_specialist` | source_verification, citation_format, factual_accuracy | any other (comment only) | | `style_specialist` | writing_style, 
document_structure | any other (comment only) | | `adversarial_reviewer` | risk_assessment, persuasiveness | any other (comment only) | | `evaluator_agent` | outcome_coverage | any other (comment only) | | `task_agent` | graph_design | any other (comment only) | | `human_user` | any | (none separately) | | `human_supervisor` | any | (none separately) | A `source_or_citation_specialist` who dissents on "writing style" produces a dissent record with `in_scope_for_participant = false` and `dissent_kind = "comment_only"`. The comment is recorded for audit but does not block dispatch. **Override mechanism:** A human role (user or supervisor) may override out-of-scope downgrades by explicitly re-elevating the comment to blocking. This requires an explicit UI action with a receipt; the system does not infer override from prose. ### §14.9.2 Dynamic participation (per [J6]) Forum convened for high-stakes plans. Default participation: - **Always:** Evaluator Agent, at least one specialist sub-agent relevant to the failure kind, and the originating user (or their supervisor if the user is not available). - **Conditional:** Task Agent participates ONLY when graph design is in question: - `failure_kind = "graph_design_gap"`, OR - Compiler confidence below threshold on graph-structure questions, OR - The proposed `RevisionPlan` contains a `graph_patch_proposal` step (per §6.3). - **By trigger:** Adversarial reviewer participates when: - `PlanAssurancePolicy.required_modes` includes `adversarial_lint`, OR - Plan affects a privileged or external-side-effect-bearing artifact, OR - `HardRevisionCallKind = "risk_tradeoff_no_dominant_option"` was raised. **Anti-rubber-stamp rule (per [J6] GM3):** Per GM3 constraint, if the Task Agent is not participating, the forum MUST still include at least one specialist sub-agent relevant to the failure kind. A forum cannot devolve to single-participant rubber-stamp. 
The system enforces this at forum-room creation: a participant set of size 1 (other than the originating user) fails at room creation with `validation.forum_participant_set_below_minimum`. ```ts ForumParticipantSet { forum_id: string participants: ForumParticipant[] participation_rationale: Array<{ participant_ref: ParticipantRef inclusion_reason: | "always_default" | "specialist_for_failure_kind" | "task_agent_for_graph_design" | "adversarial_for_privileged_or_side_effect" | "user_request" | "matter_team_default" }> minimum_satisfied: boolean // at least 2 non-user participants rubber_stamp_guard_passed: boolean // at least 1 specialist if no task agent schema_version: 1 } ``` A failing `rubber_stamp_guard_passed` blocks forum creation; the system surfaces a recommendation for which specialists to add and waits for explicit confirmation before proceeding. ### §14.9.3 Forum lifecycle states ```ts PlanReviewForumState = | "convening" // creating room, inviting participants | "deliberating" // participants reviewing, comments accruing | "awaiting_dissent_resolution" // blocking dissent raised; user adjudication needed | "approved" // consensus or override; plan returns to dispatcher | "rejected" // forum rejected the plan | "withdrawn" // originating user withdrew the plan | "expired" // forum timed out without consensus ``` Forum state transitions are recorded as `RevisionOperationReceipt` entries with `operation_kind = "forum_state_transition"`. **Cross-reference to OBL-DOC12-FORUM-01:** DOC12 publishes the `plan_review_room` kind. This addendum specifies the contents and participation rules; DOC12 owns the room lifecycle mechanics. --- # §15. 
QUALITY PROGRAM ## §15.1 Revisor quality metrics Revisor performance is measured per: ``` Metric: revision_resolved_outcomes_rate Denominator: revision plans that completed Numerator: plans where targeted outcomes transitioned to satisfied Metric: false_fix_rate Denominator: completed plans Numerator: plans where outcomes returned to non-satisfied within revalidation cycle Metric: regression_introduction_rate Denominator: completed plans Numerator: plans where previously-satisfied outcomes regressed Metric: avg_revision_cycles_to_convergence Denominator: outcomes reaching satisfied state Numerator: revision cycles per outcome Metric: cost_per_successful_fix Denominator: outcomes transitioning to satisfied via revision Numerator: total cost across all revision cycles for that outcome Metric: hard_call_escalation_rate Denominator: revision plans Numerator: plans that triggered HardRevisionCall Metric: budget_logical_exceeded_rate Denominator: revision plans started Numerator: plans aborted with budget_logical_exceeded Metric: budget_infrastructure_exceeded_rate Denominator: revision plans started Numerator: plans escalated with budget_infrastructure_exceeded ``` ## §15.2 Revision Compiler quality metrics ``` Metric: strategy_accepted_rate Denominator: strategies proposed Numerator: strategies accepted into RevisionPlan Metric: planner_confidence_calibration Denominator: plans with compiler_confidence_score recorded Numerator: correlation between confidence and outcome convergence Metric: novelty_threshold_calibration Denominator: plans with novelty_score computed Numerator: correlation between novelty and actual fresh-reasoning need Metric: sub_agent_advice_acceptance_rate Denominator: sub-agent invocations during compilation Numerator: advice accepted into compiled strategy ``` ## §15.3 Outcome Compiler quality metrics ``` Metric: intent_classification_accuracy Denominator: outcomes compiled Numerator: outcomes where Compiler-inferred outcome_kind was correct 
Metric: threshold_extraction_accuracy Denominator: outcomes with explicit thresholds Numerator: outcomes where Compiler correctly extracted the threshold Metric: plan_status_distribution Tracking: distribution of compiled / compiled_with_limitations / needs_clarification / etc. Metric: preview_resolved_drift_rate Denominator: outcomes with both preview and resolved plans Numerator: outcomes where resolved differed materially from preview Metric: abstention_rate Denominator: compilation attempts Numerator: abstained_low_confidence outputs ``` ## §15.4 Plan Verification metrics ``` Metric: semantic_lint_critique_useful_rate Denominator: semantic lint invocations Numerator: critiques that led to plan revision Metric: false_positive_critique_rate Denominator: semantic lint critiques Numerator: critiques flagging plans that succeeded as-proposed Metric: false_negative_critique_rate Denominator: failed plans that passed semantic lint Numerator: plans that failed where semantic lint flagged no issues ``` ## §15.5 Direct fix quality metrics ``` Metric: direct_fix_class_safe_rate Denominator: direct fixes attempted Numerator: fixes within allowed classes Metric: direct_fix_review_acceptance_rate Denominator: direct fixes presented for review Numerator: fixes accepted by user Metric: direct_fix_downstream_dirty_rate Denominator: direct fixes applied Numerator: fixes that caused downstream outcome dirty state ``` ## §15.6 Eval suite V3 ships with an evaluation suite for the system itself. 
Four primary domains: ``` Domain: legal_brief Fixtures: 30+ exemplar briefs with known-good and known-bad outputs Outcomes: coverage, citation accuracy, structure, style, hard-call detection Domain: legal_research_memo Fixtures: 20+ memos with verified analyses Outcomes: source verification, conclusion correctness, scope coverage Domain: contract_review Fixtures: 20+ contracts with annotated issues Outcomes: clause-by-clause review, issue identification, risk classification Domain: discovery_response Fixtures: 15+ discovery responses with known objections and productions Outcomes: objection completeness, production scope, privilege log accuracy ``` Plus domain-neutral smoke set: ``` Smoke set: general_writing Fixtures: 20+ artifacts across various domains Outcomes: basic completeness, basic accuracy, basic format compliance ``` Phase 2 (deferred per §26): software, research, marketing, general, process-trace domains. ### §15.6.1 Eval execution Eval suite runs: - Pre-merge: every spec amendment that changes Compiler/Evaluator/Revisor logic - Nightly: full suite against current production code - Per-release: gated by suite pass rate ≥ 95% on known-good fixtures, ≥ 90% on known-bad fixtures ### §15.6.2 Slicing by pattern_id (per [K6]) Eval suite metrics are sliced by `pattern_id`, not by hardcoded work-type taxonomy. Categorization emerges from learned patterns; the suite measures hit rate per `pattern_id` or per organically-clustered outcome category. Usage patterns emerge from operational data; the taxonomy is not imposed top-down. 
```ts PatternKeyedEvalSlice { slice_key: PatternRef // pattern_id, not work_type fixture_count: number // fixtures hitting this pattern pass_rate: number // 0.0-1.0 // Drill-down metrics false_pass_rate: number false_fail_rate: number hard_call_detection_rate: number avg_replans_to_success: number // Context for the slice pattern_summary: string // human-readable pattern name context_signature: PatternContextSignature // §13.3 context signature schema_version: 1 } ``` **Anti-taxonomy rule:** the eval dashboard does NOT have a hardcoded "Tax/M&A/Litigation/Contract Review" filter set. Filtering is by pattern_id or by `domain_tags`, both of which are organically populated from operational data. If the dashboard appears to need a hardcoded taxonomy, the response is to surface the actually-occurring patterns, not to invent a taxonomy. ### §15.6.3 Goal-conditioned slicing (per [K10]) In addition to pattern_id slicing, eval suite metrics are sliced by `goal_kind` (from DOC72 goals integration). Different goal kinds may need different fixtures and have different baseline performance: ```ts GoalConditionedEvalSlice { slice_key: { goal_kind: string // e.g., "litigation_strategy", "client_advisory" pattern_id?: PatternRef // optional cross-slice } fixture_count: number pass_rate: number goal_advancement_rate: number // fraction where revision advanced the stated goal goal_regression_rate: number // fraction where revision impaired the stated goal // Cost per goal-advancing outcome avg_cost_per_advancing_outcome: CostEstimate schema_version: 1 } ``` Goal-conditioned slices feed into the §15.1 Revisor quality metrics; the `goal_advancement_count` per `PatternPerformanceSlice` is computed from these slices and the independent comparative-judge evaluator per P20 (never from the Revisor's own goal-impact assessment). **Cross-reference to DOC72:** Goal kinds are owned by DOC72 (OBL-DOC72-GOAL-03). This addendum consumes the taxonomy; it does not own it. 
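As a rough illustration of the goal-conditioned slicing in §15.6.3, the sketch below groups per-fixture results by `goal_kind` and derives the rate fields of `GoalConditionedEvalSlice`. `FixtureResult`, `GoalSliceSummary`, and `summarizeByGoalKind` are hypothetical, and the `goal_effect` label is assumed to come from the independent comparative judge or explicit human feedback, never from the Revisor's own assessment.

```typescript
// Hypothetical per-fixture record; goal_effect is assumed to be judge- or human-sourced.
interface FixtureResult {
  goal_kind: string;
  passed: boolean;
  goal_effect: "advanced" | "regressed" | "neutral";
}

// Subset of GoalConditionedEvalSlice covering the count and rate fields.
interface GoalSliceSummary {
  goal_kind: string;
  fixture_count: number;
  pass_rate: number;
  goal_advancement_rate: number;
  goal_regression_rate: number;
}

function summarizeByGoalKind(results: FixtureResult[]): GoalSliceSummary[] {
  // Group fixtures by goal_kind.
  const groups: Record<string, FixtureResult[]> = {};
  for (const r of results) {
    (groups[r.goal_kind] = groups[r.goal_kind] || []).push(r);
  }
  // One summary per goal_kind, sorted by key for stable output.
  return Object.keys(groups).sort().map((goal_kind) => {
    const rows = groups[goal_kind] || [];
    const n = rows.length;
    const fraction = (pred: (r: FixtureResult) => boolean): number =>
      rows.filter(pred).length / n;
    return {
      goal_kind,
      fixture_count: n,
      pass_rate: fraction((r) => r.passed),
      goal_advancement_rate: fraction((r) => r.goal_effect === "advanced"),
      goal_regression_rate: fraction((r) => r.goal_effect === "regressed"),
    };
  });
}
```

Every group holds at least one fixture by construction, so the division by `n` is safe. A real implementation would also need `avg_cost_per_advancing_outcome`, which is omitted here because `CostEstimate` is defined elsewhere.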
## §15.7 Known-bad fixtures

Known-bad fixtures are evaluator-facing tests where the input is deliberately bad and the system MUST detect it:

```ts
KnownBadFixture {
  fixture_id: string
  fixture_kind:
    | "missing_citation"
    | "wrong_citation"
    | "claim_unsupported"
    | "scope_overreach"
    | "format_violation"
    | "missing_required_section"
    | "style_violation"
    | "process_skip"
    | "hard_call_not_flagged"
  input_artifact_ref: ArtifactRef
  expected_findings: ExpectedFindingPattern[]
  expected_outcome_state: OutcomeEvaluationState
  expected_hard_call_kinds?: HardRevisionCallKind[]
  schema_version: 1
}
```

The system must detect ≥ 90% of injected defects per category. Categories below 90% trigger remediation per the quality program.

## §15.8 Sub-agent reputation

```ts
SubAgentReputation {
  advisory_agent_id: string
  invocation_count: number
  current_score: number                        // 0.0 - 1.0
  score_confidence_interval: [number, number]
  minimum_n_met: boolean                       // n >= configurable minimum
  status:
    | "active"
    | "sandbox"        // results not used in production until manual approval
    | "watch"          // declining
    | "quarantined"    // failing; not consulted until fix
  performance_by_kind: Record

  // Asymmetric tracking
  false_positive_count: number                 // flagged success as failure
  false_negative_count: number                 // missed real failure
  schema_version: 1
}
```

### §15.8.1 Flagging rules

```
RULE sub_agent_flagging:
  Flag sub-agent for watch/quarantine only if:
    invocation_count >= configured minimum n (default 10), i.e. minimum_n_met = true
    AND lower_confidence_bound < threshold (default 0.6)

  Also track separately:
    "advice was correct but execution failed"
      (Compiler accepted advice; module failed for other reasons)
    "advice was wrong"
      (Compiler accepted advice; result was worse than baseline)

  False negatives cost reputation 2x false positives (asymmetric)
```

### §15.8.2 Sandbox mode

New sub-agents start in `sandbox` status. Their advice is collected but not used in production plans until manual approval based on their calibration record.
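The flagging rule in §15.8.1 and the sandbox gate in §15.8.2 can be sketched as three small predicates. The names below (`SubAgentReputationLite`, `FlaggingConfig`, `shouldFlagSubAgent`, `weightedErrorCount`, `adviceUsableInProduction`) are hypothetical. Two assumptions go beyond what the text states: the first element of `score_confidence_interval` is taken as the lower bound, and `watch` sub-agents are assumed to remain usable in production.

```typescript
// Hypothetical subset of SubAgentReputation with only the fields used here.
interface SubAgentReputationLite {
  invocation_count: number;
  score_confidence_interval: [number, number]; // assumption: [lower, upper]
  status: "active" | "sandbox" | "watch" | "quarantined";
  false_positive_count: number;
  false_negative_count: number;
}

interface FlaggingConfig {
  minimum_n: number; // minimum invocations before flagging is allowed
  threshold: number; // lower confidence bound below this triggers a flag
}

// Defaults taken from the rule text in §15.8.1.
const DEFAULT_FLAGGING: FlaggingConfig = { minimum_n: 10, threshold: 0.6 };

// Flag for watch/quarantine only with enough samples AND a low lower bound.
function shouldFlagSubAgent(
  rep: SubAgentReputationLite,
  cfg: FlaggingConfig = DEFAULT_FLAGGING
): boolean {
  const lowerBound = rep.score_confidence_interval[0];
  return rep.invocation_count >= cfg.minimum_n && lowerBound < cfg.threshold;
}

// Asymmetric weighting: a false negative counts twice as much as a false positive.
function weightedErrorCount(rep: SubAgentReputationLite): number {
  return rep.false_positive_count + 2 * rep.false_negative_count;
}

// Sandbox advice is collected but not used; quarantined agents are not consulted.
// Assumption: "watch" agents are still usable.
function adviceUsableInProduction(rep: SubAgentReputationLite): boolean {
  return rep.status !== "sandbox" && rep.status !== "quarantined";
}
```

Requiring the sample-size check before the confidence-bound check keeps a new sub-agent with a handful of unlucky invocations from being quarantined on noise.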
### §15.8.3 Sub-agent advice quality metrics (per [K7])

In addition to reputation (which is a single composite score), the Quality Program tracks four specific metrics per sub-agent. These are the named metrics from [K7] and feed both the Phase 1 reputation calculation and the Phase 2 reputation-based routing (deferred per §26.1).

```ts
SubAgentAdviceQualityMetrics {
  sub_agent_ref: SubAgentRef
  measurement_window: DateRange

  // Metric 1: Hit rate per failure_kind cluster
  // What fraction of the sub-agent's advice, when accepted, produced the
  // intended improvement, sliced by which failure_kind the advice addressed
  hit_rate_per_failure_kind: Record

  // Metric 2: Attribution, i.e. what fraction of plan success is attributable to this sub-agent
  // Computed by counterfactual: would the plan have succeeded without this advice?
  // Estimated via revision A/B comparison when available; via Compiler-tagged attribution
  // otherwise
  attribution: {
    plans_with_advice: number
    plan_success_with_advice: number
    counterfactual_estimate_method: "ab_compare" | "compiler_tagged" | "judge_attributed"
    estimated_attribution_lift: number   // 0.0-1.0; higher = more attributable
  }

  // Metric 3: Cost per successful sub-agent recommendation
  // Sub-agent cost (inference, latency, sub-agent's own retries) divided by the
  // number of recommendations that contributed to a successful plan
  cost_per_successful_recommendation: {
    total_cost: CostEstimate
    successful_recommendation_count: number
    cost_per_success: CostEstimate
  }

  // Metric 4: Advice-led-to-regression rate
  // What fraction of accepted advice led to regression of a previously-satisfied outcome
  // This is the asymmetric-weighted metric (per §15.8.1 false negative cost is 2x)
  advice_regression_rate: {
    accepted_count: number
    regression_introduced_count: number
    regression_rate: number              // regression_introduced_count / accepted_count
    severity_breakdown: Record<"minor" | "major" | "critical", number>
  }

  schema_version: 1
}
```

**Phase 1 (V3.1) use:** all four metrics are computed
and surface in the §21 Sub-Agent Inspector. They feed the composite reputation score per §15.8 but do not yet automatically route around low-performing sub-agents. **Phase 2 use (deferred per §26.1):** automatic reputation-based routing per [P9] / [P10] — e.g., automatic exclusion of sub-agents with `advice_regression_rate.regression_rate > threshold` from high-stakes plans. **Validation:** missing metric for any active sub-agent surfaces in the Drift Manifest (§20.4) as `validation.sub_agent_metrics_incomplete`. ## §15.9 QualitySignal schema Per [P-35], quality signals declare their actionability: ```ts QualitySignal { signal_id: string signal_kind: string // from §14.8 signal kinds measured_value: number denominator: number confidence: number actionability: QualityActionability // §0.4.18 threshold?: number // required for block/escalate schema_version: 1 } ``` Examples: - `malicious_artifact_should_not_control_evaluator` → `actionability = "block"` - `cost_per_successful_fix` → `actionability = "metric_only"` - `false_fix_rate above 0.15` → `actionability = "warn"` - `consecutive regression introductions >= 3` → `actionability = "escalate"` ## §15.10 Taint model V3 inherits and refines the multi-tier taint model: ```ts TaintClass = // §0.4.10 | "system_trusted" | "user_trusted_bounded" | "user_advisory" | "internal_corpus_trusted" | "external_authority_trusted" | "external_untrusted" | "adversarial_known" | "unclassified" ``` ### §15.10.1 Taint propagation rules ``` RULE taint_propagation: When artifact A is derived from inputs I1, I2, ..., In: A.taint_class = max_severity({I1.taint_class, I2.taint_class, ...}) Where max_severity ordering is: adversarial_known > external_untrusted > unclassified > external_authority_trusted > user_advisory > internal_corpus_trusted > user_trusted_bounded > system_trusted ``` ### §15.10.2 Instruction hierarchy Higher-trust sources take precedence over lower-trust sources in instruction layering: 1. 
System instructions (system_trusted) 2. Standing user authority (user_trusted_bounded) 3. Compiled patterns and guidance (internal_corpus_trusted) 4. Plan-time user guidance (user_advisory) 5. Module-emitted analysis (varies by source) 6. External authority content (external_authority_trusted) — treated as DATA, not instruction 7. External untrusted content (external_untrusted) — treated as DATA, never instruction External content is never an instruction source. It is data the Compiler considers, never something the Compiler obeys. ## §15.11 Transitive taint propagation ```ts SanitizationNode { node_id: string artifact_ref: ArtifactRef input_taint: TaintClass output_taint: TaintClass sanitization_method: | "quote_and_mark_as_data" | "redaction" | "summary_extraction" | "human_verification" | "structured_extraction" evidence_refs: StorageRef[] schema_version: 1 } ``` A SanitizationNode is the only mechanism that can downgrade taint. The node records what was done and what evidence supports the downgrade. ### §15.11.1 Sandboxed evaluation interaction Per §11.12, evaluating a tainted CandidateArtifactVersion does not propagate taint into main OutcomeRuntimeState. Sandbox quarantines the taint until clearance via SanitizationNode or user_explicit_review. 
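
The propagation rule in §15.10.1 reduces to taking a maximum over a fixed ordering. The following is a minimal sketch, not a normative implementation: `TAINT_SEVERITY_ORDER`, `taintSeverity`, and `propagateTaint` are hypothetical names, and treating an artifact with no inputs as `"unclassified"` is an assumption this addendum does not state. Downgrades via SanitizationNode are deliberately not modeled.

```ts
// Taint classes from §15.10, listed from least to most severe,
// following the max_severity ordering in §15.10.1.
const TAINT_SEVERITY_ORDER = [
  "system_trusted",
  "user_trusted_bounded",
  "internal_corpus_trusted",
  "user_advisory",
  "external_authority_trusted",
  "unclassified",
  "external_untrusted",
  "adversarial_known",
] as const;

type TaintClass = (typeof TAINT_SEVERITY_ORDER)[number];

// Hypothetical helper: position of a class in the severity ordering.
function taintSeverity(t: TaintClass): number {
  return TAINT_SEVERITY_ORDER.indexOf(t);
}

// A derived artifact takes the most severe taint among its inputs.
// Assumption (not in the spec): no inputs means "unclassified".
function propagateTaint(inputs: TaintClass[]): TaintClass {
  if (inputs.length === 0) return "unclassified";
  return inputs.reduce((worst, t) =>
    taintSeverity(t) > taintSeverity(worst) ? t : worst
  );
}
```

For example, an artifact derived from `system_trusted` and `user_advisory` inputs carries `user_advisory`, and any `external_untrusted` input dominates everything except `adversarial_known`.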
## §15.12 Taint lifecycle and clearance

```ts
TaintClearanceRecord {
  artifact_ref: ArtifactRef
  original_taint: TaintClass
  cleared_to: TaintClass
  clearance_method: TaintClearanceMethod
  clearance_basis_ref: StorageRef
  cleared_at: ISO8601
  cleared_by: UserRef | AgentRef

  // Per P19: access tier binding
  cleared_by_access_tier: AccessTier
  clearance_scope: "this_run" | "this_matter" | "firm" | "global"

  schema_version: 1
}
```

### §15.12.1 Tier-to-scope rule

Per [P-19]:

```
RULE taint_clearance_scope_bound_to_access_tier:
  Maximum clearance_scope per cleared_by_access_tier:
    owner_full_access            → "this_run" or "this_matter"
    matter_team_access           → "this_matter"
    supervising_attorney_review  → "firm"
    firm_admin                   → "firm" or "global"
    architect_admin              → "global"
    audit_log_only               → cannot clear taint
    no_access                    → cannot clear taint

  If user attempts clearance beyond their tier:
    schema validator rejects with validation.taint_clearance_scope_exceeds_tier

  If a pattern is promoted to scope S, all input artifact taint clearances
  must have clearance_scope >= S; otherwise promotion is blocked.
```

This prevents privilege escalation: a junior user clicking "Accept" on a candidate containing prompt-injected text cannot inadvertently clear taint for firm-wide or global promotion.

---

# §16. GOVERNANCE

## §16.1 GovernancePolicy

```ts
GovernancePolicy {
  policy_id: string
  policy_scope: PatternScopeKind
  pattern_promotion_rules: Array<{
    from_scope: PatternScopeKind
    to_scope: PatternScopeKind
    requires_approver_tier: AccessTier
    requires_content_review: boolean
    requires_compatibility_check: boolean
  }>
  retention_rules: Array<{
    record_kind: string
    retention_duration_days: number
    archival_destination: string
  }>
  export_contract_refs: ExportContractRef[]
  default_deny: boolean // for items not explicitly permitted
  schema_version: 1
}
```

## §16.2 AccessTier

```
AccessTier =
  | "owner_full_access"
  | "matter_team_access"
  | "supervising_attorney_review"
  | "firm_admin"
  | "architect_admin"
  | "audit_log_only"
  | "no_access"
```

Tier capabilities:

- `owner_full_access`: full access to own artifacts, patterns, feedback
- `matter_team_access`: access to matter-scoped artifacts and patterns
- `supervising_attorney_review`: cross-matter visibility within firm; firm-scope promotion
- `firm_admin`: firm-wide governance and tenancy administration
- `architect_admin`: ELNOR system administration; global-scope promotion
- `audit_log_only`: read receipts and audit records only
- `no_access`: no access

## §16.3 ExportContracts

```ts
ExportContract {
  contract_id: string
  export_target:
    | "firm_local_db"
    | "client_export"
    | "regulatory_filing"
    | "third_party_share"
    | "training_data"
  exportable_record_kinds: string[]
  redaction_rules: RedactionRule[]
  approval_requirements: ApprovalRequirement[]
  schema_version: 1
}
```

Export of receipts, findings, plans, or diffs requires an ExportContract; ad-hoc export is not permitted.

## §16.4 Pattern promotion governance

Per [P-39], pattern promotion gates on identifying content and applicability scope, not on origin matter or origin privilege class:

```
RULE pattern_promotion_default_deny:
  Default behavior: patterns are private to the user who triggered the feedback
  that produced them, until explicitly promoted.

  Promotion is allowed if:
    - identifying_content_scan does not flag matter-locking content, OR
    - identifying_content_scan flagged content but anonymization succeeded

  Promotion requires:
    - User explicitly initiates promotion via UI
    - GovernancePolicy.pattern_promotion_rules permits the from→to transition
    - Approver of required tier signs off (per AccessTier)
    - Compatibility check passes (target context has required capabilities/sources)

  Promotion produces a typed receipt with operation_kind = "pattern_promoted"
```

## §16.5 Default-deny across the board

Implementations MUST treat the following as default-deny:

- Pattern promotion to broader scope
- DirectInstructionCandidate to scope wider than current_run
- Taint clearance to scope wider than user's AccessTier
- Cross-matter retrieval
- External export of any record
- Cross-tenancy access (where applicable)

Explicit user action + governance approval are required for elevated operations.

## §16.6 Matter-specific governance policies (per [G6])

Governance policies may be matter-specific. The default `GovernancePolicy` is firm-level; matter-level overrides modify specific fields without replacing the policy wholesale.

### §16.6.1 Matter classes

Matters are classified at creation. The classification determines default retention, access tier mapping, and export-contract bindings. Re-classification requires firm_admin tier and creates an audit record.

```ts
MatterClass =
  | "ma_transactional"         // M&A, securities offerings, transactional advisory
  | "litigation"               // active litigation, including pre-suit investigation
  | "regulatory_enforcement"   // agency enforcement, regulatory advisory
  | "internal_advisory"        // pure internal advisory work, no external party
  | "client_advisory_general"  // general client advisory not fitting above
  | "internal_administrative"  // firm-internal matters (HR, vendor, etc.)
```

### §16.6.2 Default per-class governance

| Field | M&A transactional | Litigation | Regulatory enforcement | Internal advisory | Client advisory general | Internal admin |
|---|---|---|---|---|---|---|
| Default retention (post-completion) | 7 years | indefinite until matter closure + 7y | 10 years | 3 years | 5 years | 2 years |
| Privilege default | work_product | attorney_client + work_product | work_product | confidential | confidential | none |
| Access tier default | matter_team_access | matter_team_access | matter_team_access | matter_team_access | matter_team_access | firm_admin |
| Export approval threshold | supervising_attorney_review | supervising_attorney_review | firm_admin | matter_team_access | supervising_attorney_review | firm_admin |
| Diligence team broad access | yes (during active deal) | no | no | n/a | no | no |
| Privilege log requirement | per matter | per matter | per matter + agency-specific | no | no | no |
| Pattern promotion default | matter-scoped | matter-scoped | matter-scoped | domain-default | domain-default | private |

These defaults are starting points. Each matter can override individual fields with appropriate governance approval.
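
As an illustration of how the per-class defaults and field-level overrides might combine, the sketch below encodes three rows of the table and resolves an effective value per field. It is a hedged example only: `ClassDefaults`, `DEFAULTS_BY_CLASS`, and `resolveGovernance` are hypothetical names, the retention, diligence, and privilege-log rows are omitted, and the governance approval that an override requires is not modeled.

```ts
type MatterClass =
  | "ma_transactional"
  | "litigation"
  | "regulatory_enforcement"
  | "internal_advisory"
  | "client_advisory_general"
  | "internal_administrative";

type AccessTier =
  | "owner_full_access"
  | "matter_team_access"
  | "supervising_attorney_review"
  | "firm_admin"
  | "architect_admin"
  | "audit_log_only"
  | "no_access";

// Hypothetical shape covering three rows of the §16.6.2 table.
interface ClassDefaults {
  privilege_default: string;
  export_approval_min_tier: AccessTier;
  pattern_promotion_default: "matter-scoped" | "domain-default" | "private";
}

const DEFAULTS_BY_CLASS: Record<MatterClass, ClassDefaults> = {
  ma_transactional: { privilege_default: "work_product", export_approval_min_tier: "supervising_attorney_review", pattern_promotion_default: "matter-scoped" },
  litigation: { privilege_default: "attorney_client + work_product", export_approval_min_tier: "supervising_attorney_review", pattern_promotion_default: "matter-scoped" },
  regulatory_enforcement: { privilege_default: "work_product", export_approval_min_tier: "firm_admin", pattern_promotion_default: "matter-scoped" },
  internal_advisory: { privilege_default: "confidential", export_approval_min_tier: "matter_team_access", pattern_promotion_default: "domain-default" },
  client_advisory_general: { privilege_default: "confidential", export_approval_min_tier: "supervising_attorney_review", pattern_promotion_default: "domain-default" },
  internal_administrative: { privilege_default: "none", export_approval_min_tier: "firm_admin", pattern_promotion_default: "private" },
};

// Overrides replace individual fields; unset fields fall back to the class default.
function resolveGovernance(
  cls: MatterClass,
  overrides: Partial<ClassDefaults> = {}
): ClassDefaults {
  const base = DEFAULTS_BY_CLASS[cls];
  return {
    privilege_default: overrides.privilege_default ?? base.privilege_default,
    export_approval_min_tier: overrides.export_approval_min_tier ?? base.export_approval_min_tier,
    pattern_promotion_default: overrides.pattern_promotion_default ?? base.pattern_promotion_default,
  };
}
```

The field-by-field fallback mirrors the statement in §16.6 that matter-level overrides modify specific fields without replacing the policy wholesale.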
### §16.6.3 Matter governance scope

```ts
MatterGovernancePolicy {
  matter_id: MatterRef
  matter_class: MatterClass

  // Inherits from firm default; overrides where set
  overrides: {
    retention_post_completion_days?: number
    privilege_default?: PrivilegeClass
    access_tier_default?: AccessTier
    export_approval_min_tier?: AccessTier
    pattern_promotion_default_scope?: PatternApplicabilityScopeKind
    diligence_team_member_refs?: ActorRef[] // for M&A
    privilege_log_required?: boolean
    agency_specific_export_format?: string // for regulatory_enforcement
    audit_trail_retention_days?: number
  }

  inheritable_to_tasks: boolean // tasks under this matter inherit overrides

  approval_record: {
    approved_by: ActorRef // must have appropriate tier
    approved_at: ISO8601
    rationale: string
  }

  schema_version: 1
}
```

### §16.6.4 Inheritance to tasks

When `inheritable_to_tasks = true` (default), tasks created under the matter inherit the policy. Tasks may further override fields, but task-level overrides require approval from a tier at or above the matter-level override approver. A junior user cannot override a supervisor's matter-level policy at the task level.

### §16.6.5 Cross-matter policy isolation

A pattern's `applicability_scope` and `provenance` (§13.1) inherit governance from the originating matter. Cross-matter retrieval (§13.4) honors the originating matter's `pattern_promotion_default_scope`; a pattern from a litigation matter does not surface in M&A work unless explicitly promoted to a broader scope per §16.4.

### §16.6.6 Validation

- Tasks without a matter assignment fail `validation.task_missing_matter_assignment`.
- Matter policies with retention shorter than the inferred legal-hold minimum fire `validation.matter_retention_below_legal_hold`.
- Matters re-classified during active workflow fire `validation.matter_class_changed_during_active_work` (warning; not blocking, but requires audit acknowledgment).

---

# §17. SUB-AGENT COORDINATION

This section unifies sub-agent rules from §3.4, §6.1.5, §8, §14.3.3, and §11.5 into a single coordination summary.

## §17.1 Coordination points

Sub-agents apply at four coordination points, or five when Plan Verifier is counted separately rather than grouped under Revision Compiler (the conventional grouping):

1. **Outcome Compiler** (§4.8): intent classification, threshold extraction, source/method binding
2. **Evaluator** (§5, §8): specialist subevaluators per lane
3. **Revision Compiler** (§6.1.5): repair planning by failure kind
4. **Feedback Interpreter** (§14.3.3): feedback parsing and scope inference
5. **Plan Verifier** (§11.5): adversarial plan critique

## §17.2 Shared protocols

All sub-agents:

- Operate on scoped context packs (§8.3)
- Emit output conforming to AdvisorySubAgentOutput union (§8.5)
- Honor per-invocation cost and timeout budgets
- Inherit input taint per §15.10
- Contribute to reputation scoring (§15.8)
- Are registered via AdvisorySubAgentProfile (§8.4)

## §17.3 Advisory vs execution distinction

- **Advisory sub-agents**: Compiler accepts/rejects/defers. Output is evidence, not instruction.
- **Execution sub-agents** (specialist subevaluators in §8.1, modules behind revision_in): produce real findings or artifact modifications.

The distinction is reflected in `allowed_coordination_points` per profile.

## §17.4 User-defined custom sub-agents

Users register custom sub-agents via §8.6 governance. Custom sub-agents:

- Start in `sandbox` reputation status
- Promote to `active` after manual approval based on sandbox calibration
- Are subject to the same shared protocols
- Inherit governance approval requirements per scope

---

# §18. TEACH-FROM-FEEDBACK UI

## §18.1 Default checkboxes

Per [P-16] (V2 revision):

```
Teach-from-feedback card defaults:
  [✓] Fix this run now                     (default ON)
  [ ] Add to this saved task               (default OFF; user opts in)
  [ ] Teach future Outcome Evaluators      (default OFF; user opts in)
  [ ] Save as user style/strategy rule     (default OFF; user opts in)
  [ ] Apply globally to all work           (default OFF; requires explicit promotion per P14)

  DEFAULT strength: "Suggestion" (NOT "Strong preference")
```

These defaults apply uniformly regardless of source run's privilege class or matter scope. Defaults are conservative so that durable learning is always an explicit user choice, never a side effect of giving feedback.

## §18.2 Privilege is not a gate on learning

V3 does NOT default durable destination checkboxes differently for privileged or matter-scoped runs. The original intuition that privileged runs should default to no-learning was wrong because:

1. Most feedback during privileged work is process or craft knowledge ("lead briefs with dispositive issue," "use CC § 3333 alongside CACI 3903N") that doesn't reveal anything protected
2. Matter-scoped feedback is the most valuable learning source; defaulting it off would defeat the pattern system's purpose

The actual privilege firewall lives in:

- **Cross-matter retrieval** (§13.4): patterns scoped to matter X never surface during work on matter Y
- **Identifying-content scan** (§13.1.2): patterns whose content reveals identifying detail get matter-locked

These mechanical rules at the pattern layer protect privilege without gating learning at intake.
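
The cross-matter retrieval rule can be pictured as a filter applied before any pattern reaches the Compiler. This is a minimal sketch under stated assumptions: `PatternSketch`, its fields, the three scope kinds shown, and `retrievablePatterns` are hypothetical simplifications, not the §13.1 Pattern schema or the §13.4 retrieval algorithm.

```ts
// Hypothetical, simplified view of a stored pattern's applicability scope.
interface PatternSketch {
  pattern_id: string;
  scope_kind: "matter" | "domain" | "team";
  // Set only when scope_kind === "matter" (including matter-locked patterns).
  scope_matter_id?: string;
}

// Matter-scoped patterns surface only on their own matter;
// broader scopes pass through to later retrieval stages.
function retrievablePatterns(
  patterns: PatternSketch[],
  currentMatterId: string
): PatternSketch[] {
  return patterns.filter(
    (p) => p.scope_kind !== "matter" || p.scope_matter_id === currentMatterId
  );
}
```

Because the filter keys on scope rather than on the privilege class of the originating run, feedback from privileged work can still be learned from while matter-scoped patterns stay inside their matter.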
## §18.3 UI flow

```
Step 1: User submits feedback via HumanOutcomeFeedbackEvent
Step 2: Feedback Interpreter classifies and proposes candidate (§14.3)
Step 3: Teach-from-feedback card displays:
  - The proposed candidate text
  - Inferred applicability scope (with explanation)
  - Identifying-content scan result (if any matter detail detected)
  - Checkboxes per §18.1 defaults
  - Strength selector
Step 4: User adjusts as needed and confirms
Step 5: System creates:
  - RevisionRequest (if "Fix this run now" checked)
  - DurableKnowledgeCandidate (if any durable checkbox checked)
  - DirectInstructionCandidate (if "Apply globally" checked, with §14.7 scope rules)
Step 6: Confirmation receipt created via EC
```

## §18.4 Scope override

The card surfaces the Feedback Interpreter's inferred scope alongside an editable scope selector. The user can override the inference (e.g., narrow a "domain"-scoped pattern to "matter" only, or broaden a "user_preference" to "team"). An override produces a typed signal that feeds the Interpreter's quality program.

## §18.5 Identifying-content detection display

When `identifying_content_scan.identifying_detail_detected == true`:

- The card surfaces what was detected (e.g., "party name: Paramount")
- The card shows whether anonymization was attempted
- If anonymization failed and scope is locked to matter, the card explains why
- User can review the proposed anonymized text or accept the matter-locked version

---

# §19. RESERVED

(Reserved for V3.x amendments. Numbering preserved for cross-doc reference stability.)

---

# §20. MEASUREMENT SUMMARY

## §20.1 Required quality programs

V3 mandates quality programs for:

- Outcome Compiler (§15.3)
- Revision Compiler (§15.2)
- Revisor execution (§15.1)
- Plan Verification (§15.4)
- Direct Fix (§15.5)
- Sub-agents (§15.8)
- Patterns (§13.3)

## §20.2 Eval suite cadence

- Pre-merge: any spec amendment touching Compiler/Evaluator/Revisor
- Nightly: full suite
- Per-release: ≥95% known-good, ≥90% known-bad

## §20.3 Per-component health gates

```
Component:          Healthy threshold:
Outcome Compiler    intent_classification_accuracy >= 0.85
Revisor             revision_resolved_outcomes_rate >= 0.80
Revisor             false_fix_rate <= 0.15
Revisor             regression_introduction_rate <= 0.05
Direct Fix          direct_fix_class_safe_rate >= 0.99
Sub-agents          reputation_score >= 0.60 (per agent)
Patterns            per-slice convergence_count / usage_count >= 0.70
```

Crossing a threshold transitions the component to `watch` status; sustained breach transitions to `quarantined` per §13.6 (for patterns) or per-component remediation policy.

## §20.4 Drift Manifest tracking

Per §0A.7, deferred behaviors are tracked in the Drift Manifest:

- Phase 2 features
- Unmeasured components flagged `quality_unmeasured`
- TODOs and placeholders
- Spec amendments awaiting V3.x bump

The manifest is part of §26 Open Questions.

---

# §21. UI SURFACES

UI surfaces are owned by DOC20; this section specifies the V3 contract DOC20 implements.

## §21.1 Evaluation Result Card

Surfaces an OutcomeEvaluationResult:

- Overall state and summary
- Findings (with state, severity, basis, confidence)
- Judgment limitations (with severity and recommended handling)
- Verification records
- Revision request (if produced)
- Assurance summary (satisfied vs unresolved items)
- Limitations
- Confidence

Actions available: review findings, accept/reject findings, override outcome state (with warning), trigger Revisor manually, submit feedback (routes to Feedback Interpreter).
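
For reference when wiring health indicators into these surfaces, the scalar gates of §20.3 can be expressed as data plus one comparison. This is an illustrative sketch only: `HealthGate`, `HEALTH_GATES`, `meetsGate`, and `breachedGates` are hypothetical names, the per-slice Patterns ratio is omitted, and skipping metrics that have no measurement is an assumption (the spec tracks unmeasured components separately as `quality_unmeasured`).

```ts
interface HealthGate {
  component: string;
  metric: string;
  comparator: ">=" | "<=";
  threshold: number;
}

// Scalar gates transcribed from the §20.3 table.
const HEALTH_GATES: HealthGate[] = [
  { component: "Outcome Compiler", metric: "intent_classification_accuracy", comparator: ">=", threshold: 0.85 },
  { component: "Revisor", metric: "revision_resolved_outcomes_rate", comparator: ">=", threshold: 0.8 },
  { component: "Revisor", metric: "false_fix_rate", comparator: "<=", threshold: 0.15 },
  { component: "Revisor", metric: "regression_introduction_rate", comparator: "<=", threshold: 0.05 },
  { component: "Direct Fix", metric: "direct_fix_class_safe_rate", comparator: ">=", threshold: 0.99 },
  { component: "Sub-agents", metric: "reputation_score", comparator: ">=", threshold: 0.6 },
];

function meetsGate(gate: HealthGate, value: number): boolean {
  return gate.comparator === ">=" ? value >= gate.threshold : value <= gate.threshold;
}

// Returns the gates a component currently breaches.
// Assumption: metrics with no measurement are skipped here, not counted as breaches.
function breachedGates(
  component: string,
  measurements: Record<string, number>
): HealthGate[] {
  return HEALTH_GATES.filter((gate) => {
    if (gate.component !== component) return false;
    const value = measurements[gate.metric];
    if (value === undefined) return false;
    return !meetsGate(gate, value);
  });
}
```

A non-empty result would correspond to the `watch` transition described in §20.3; the escalation to `quarantined` on sustained breach depends on remediation policy and is not modeled.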
## §21.2 Revision Result Card

Surfaces a RevisionRunSummary:

- Outcomes targeted vs resolved
- Outcomes with regressions
- Final artifact versions
- Cost breakdown
- Explanation trace (markdown display)

Actions: accept revision, reject revision, partial accept, view diff (SemanticChangelog above text diff), submit feedback.

## §21.3 CompiledEvaluationPlan preview

Per §4.4, the preview card displays:

- Interpreted goal
- Evaluation lanes with rationale
- Required and optional capabilities (with availability indicators)
- Required sources (with accessibility indicators)
- Expected hard-call types
- Threshold extraction record
- Novelty assessment
- Limitations
- Compiler confidence

## §21.4 Hard Call response surface

Per §6.5, hard call display:

- Question for human
- Affected outcomes
- HumanDecisionOption list, each with:
  - Label and description
  - Goal impact assessment (UI use only per §6.12)
  - Estimated consequence
- Default behavior if no response

User selects an option; system records HardCallResolution with compatibility binding (§7.9.2).

## §21.5 Plan review surface

For plans requiring `human_gate` per PlanAssurancePolicy:

- Plan summary with strategy and rationale
- Steps with target modules and capabilities
- Estimated cost (with EstimatorConfidence)
- Plan risk score
- ExplanationTrace (markdown)
- For regenerate/restructure steps: companion SemanticChangelog requirement noted

Per-step actions:

- Accept
- Modify
- Defer
- Skip: only shown if `skippability != "not_skippable"`
- For `skippability == "skip_requires_risk_acceptance"`: skip produces explicit risk receipt

Plan-level actions: Accept all, Reject all, Abort.

## §21.6 Diff display

Per [P-28], for plans with `regenerate` or `restructure` capabilities:

1. SemanticChangelog entries displayed FIRST
2. Raw text diff displayed second, after the changelog
3. User reviews changelog summary before drilling into diff

For mechanical changes (DirectFix, format_pass), raw diff is sufficient.
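
The ordering rule amounts to a small decision on the plan's capabilities. The sketch below is illustrative, assuming a hypothetical `ChangeCapability` union and `diffDisplayOrder` helper; the real capability identifiers are defined elsewhere in the spec family.

```ts
// Hypothetical capability labels; only the two meaning-bearing ones matter here.
type ChangeCapability = "regenerate" | "restructure" | "direct_fix" | "format_pass";
type DiffSection = "semantic_changelog" | "raw_text_diff";

// Meaning-bearing plans show the SemanticChangelog first, then the raw diff.
// Purely mechanical plans show the raw diff alone.
function diffDisplayOrder(capabilities: ChangeCapability[]): DiffSection[] {
  const meaningBearing = capabilities.some(
    (c) => c === "regenerate" || c === "restructure"
  );
  return meaningBearing
    ? ["semantic_changelog", "raw_text_diff"]
    : ["raw_text_diff"];
}
```

A plan mixing `format_pass` with `restructure` is treated as meaning-bearing in this sketch, so the changelog still leads.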
## §21.7 Teach-from-feedback card

Per §18. Surfaces:

- Proposed candidate
- Inferred scope (editable)
- Identifying-content scan result (if any)
- Checkbox defaults per §18.1
- Strength selector (default "Suggestion")

## §21.8 Pattern display

Per [P-37], pattern cards display:

```
Pattern: [from memory / adapted from memory]
Context: / /
Similarity to current:
Prior similar uses (this context):
Converged:
Regressions:
Rollbacks:
Goal advancement: (verified by )
```

Pattern cards MUST NOT display naked global percentages. Aggregate metrics, if displayed, are labeled with context scope.

### §21.8.1 "from memory" vs "adapted from memory" badge (per [I12])

Every pattern card carries a badge in the header that distinguishes how the pattern was applied to the current case:

```ts
PatternApplicationBadge =
  | "from_memory"          // applied as-is, no modification
  | "adapted_from_memory"  // matched but Compiler modified parameters or structure
```

**Rules:**

- **`from_memory`** is displayed when the Compiler retrieved the pattern and used it verbatim: same RepairStrategyKind, same RepairTarget, same parameters, same ordering.
- **`adapted_from_memory`** is displayed when the Compiler modified the pattern for the current case. The modification may be parameter changes (different threshold, different module version), structural changes (added a verification step, reordered steps), or scope changes (narrower applicability for this run).
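
The two rules can be read as a single equality check between the stored pattern and what the Compiler actually applied. The following is a minimal sketch, not the Compiler's logic: `PlanShape` and its fields are hypothetical stand-ins for RepairStrategyKind, RepairTarget, parameters, and step ordering, and scope changes are not modeled.

```ts
type PatternApplicationBadge = "from_memory" | "adapted_from_memory";
type ParamValue = string | number | boolean;

// Hypothetical summary of a pattern (stored) or of its use in a plan (applied).
interface PlanShape {
  strategy_kind: string;
  target: string;
  step_ids: string[]; // in execution order
  parameters: Record<string, ParamValue>;
}

function sameParameters(
  a: Record<string, ParamValue>,
  b: Record<string, ParamValue>
): boolean {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return keysA.length === keysB.length && keysA.every((k) => k in b && a[k] === b[k]);
}

function sameOrdering(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((id, i) => id === b[i]);
}

// "from_memory" only when strategy, target, parameters, and ordering all match.
function classifyApplication(stored: PlanShape, applied: PlanShape): PatternApplicationBadge {
  const verbatim =
    stored.strategy_kind === applied.strategy_kind &&
    stored.target === applied.target &&
    sameParameters(stored.parameters, applied.parameters) &&
    sameOrdering(stored.step_ids, applied.step_ids);
  return verbatim ? "from_memory" : "adapted_from_memory";
}
```

Any single difference, such as one changed threshold or a reordered step, is enough to produce `adapted_from_memory` in this sketch.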
The badge is generated from a `PatternApplicationRecord` produced by the Compiler at plan compilation:

```ts
PatternApplicationRecord {
  pattern_ref: PatternRef
  pattern_template_version: semver // version of pattern at retrieval
  application_kind: PatternApplicationBadge

  // If adapted_from_memory, describe the adaptations
  adaptations?: Array<{
    adaptation_kind:
      | "parameter_changed"
      | "step_added"
      | "step_removed"
      | "step_reordered"
      | "scope_narrowed"
      | "scope_broadened"
      | "module_substituted"
      | "capability_version_upgraded"
    description: string // human-readable explanation
    rationale: string   // why the Compiler made this adaptation
  }>

  schema_version: 1
}
```

**UI behavior:**

- The badge is a small tag immediately after the pattern name.
- Clicking the badge expands a tooltip showing the adaptations (if any) and the rationale.
- For `from_memory` patterns, the tooltip shows "Applied as stored on , no modifications."
- For `adapted_from_memory` patterns, the tooltip lists each adaptation with rationale.

**Why this matters:** the user can distinguish between "the system used a known-working pattern verbatim" (high confidence in the precedent) and "the system reasoned by analogy from a similar pattern" (lower confidence; the user should review the adaptations). This is the difference between citing a directly-on-point case and analogizing.

**Validation:** patterns surfaced in the Pattern display without a `PatternApplicationRecord` fire `validation.pattern_display_missing_application_record` (severity error).

## §21.9 Adjust panel

The Adjust panel is a privileged override surface (§3.2.5). It exposes:

- Direct manipulation of Compiler-inferred parameters (with warnings)
- Plan step editing (with re-lint requirement)
- Manual capability binding (with compatibility check)
- Manual AssuranceBasis override (with quality-program flagging)

Adjust panel actions produce typed receipts marked `manual_override`.
The quality program tracks override frequency as a Compiler quality signal.

---

# §22. VALIDATION CODES

Validation codes form a single namespace. Each code is a typed assertion that can fail at schema validation, deterministic linting, or runtime. All codes namespace under `validation.`.

## §22.1 Schema validation codes

```
validation.schema_required_field_missing
validation.schema_field_type_mismatch
validation.schema_extra_field
validation.schema_enum_value_invalid
validation.discriminated_union_variant_mismatch
validation.schema_version_unsupported
```

## §22.2 Plan structure validation

```
validation.target_port_bypass                        // P1
validation.step_kind_action_kind_conflict            // P1
validation.module_revision_target_port_invalid       // P1
validation.direct_fix_target_port_invalid            // P1
validation.instruction_in_not_revision_compatible    // §9.8
validation.capability_unavailable
validation.capability_version_mismatch               // P7
validation.preservation_constraint_unsupported
validation.precondition_unsatisfied
validation.dag_cyclic
validation.idempotency_key_non_deterministic         // §11.8
validation.idempotency_key_missing
validation.artifact_version_precondition_unknown
validation.write_scope_mismatch                      // §11.10
validation.read_set_missing                          // P21
validation.plan_assurance_unmet                      // P5
validation.preservation_contract_violated
validation.explanation_trace_missing                 // §7.10
validation.semantic_changelog_required               // P28
```

## §22.3 Authority and policy validation

```
validation.policy_decision_missing                        // §11.19
validation.policy_decision_block
validation.policy_decision_stale
validation.cil_authority_snapshot_missing                 // P34
validation.cil_authority_snapshot_stale
validation.cil_authority_conflict_unresolved              // P34
validation.autonomous_mode_policy_locked_field_violated   // P11
```

## §22.4 Safety validation

```
validation.custom_instruction_length_exceeded                  // P12
validation.custom_instruction_taint_violation                  // P12
validation.taint_clearance_scope_exceeds_tier                  // P19
validation.taint_clearance_method_unauthorized
validation.shadow_workspace_referenced_outside_deprecations    // P4
validation.direct_fix_size_used_as_primary_gate                // §10.3
validation.coding_dispatch_not_revision_safe                   // P38
```

## §22.5 Runtime validation

```
validation.rolling_hash_chain_broken                 // P29
validation.live_artifact_hash_mismatch
validation.candidate_version_state_transition_invalid
validation.candidate_acceptance_missing_receipt      // P17
validation.candidate_rejection_missing_receipt       // P17
validation.workspace_write_failure
validation.receipt_persist_failed
validation.budget_logical_exceeded
validation.budget_infrastructure_exceeded
validation.local_compute_budget_exceeded
validation.preemption_applied
```

## §22.6 Concurrency validation

```
validation.write_write_conflict
validation.read_write_staleness                      // P21
validation.graph_snapshot_stale
validation.capability_snapshot_stale
validation.concurrent_plan_lost_tie_break
```

## §22.7 Learning and pattern validation

```
validation.aggregate_metric_used_as_decision_primitive    // P18
validation.goal_advancement_source_missing                // P20
validation.goal_advancement_self_grading_attempted        // P20
validation.pattern_applicability_scope_from_provenance    // P39
validation.cross_matter_retrieval_filter_bypass           // §13.4
validation.identifying_content_unscanned                  // §13.1.2
validation.pattern_promotion_default_deny                 // §16.4
```

## §22.8 Sub-agent validation

```
validation.output_contract_violation                 // §8.5
validation.sub_agent_coordination_point_unauthorized
validation.sub_agent_input_class_forbidden
validation.sub_agent_sandboxed_advice_used_in_production
```

## §22.9 Cross-spec validation

```
validation.cross_spec_contract_drift                 // §0A.6
validation.unspecified_mechanism                     // §0A.1
validation.spec_collision                            // §0A.10
```

## §22.10 Severity

Each validation code carries a severity:

- `critical`: blocks the operation; cannot be overridden without spec amendment
- `error`: blocks the operation by default; can be overridden by user with `manual_override` receipt
- `warning`: surfaces in UI; does not block
- `informational`: logged; does not surface to user

The severity catalog is maintained in V3 §28 Adjudication Matrix Appendix (cross-reference to source patch when applicable).

---

# §23. MIGRATION GUIDE

This section specifies the migration path from V2 (the Outcome Evaluator/Revisor design embedded in DOC23 Addenda B R0.6.4) to V3 (this addendum). Migration is mandatory for any deployment of ELNOR that has persisted V2 schemas or saved tasks referencing V2 module types. Migration is one-way: V3 supersedes V2; there is no rollback path that preserves V2 invariants.

## §23.1 Deprecations

The following V2 concepts are superseded in V3. Each deprecation lists the V2 surface, the V3 replacement, and the migration disposition.

### §23.1.1 ShadowWorkspace → CandidateArtifactVersion

V2's ShadowWorkspace primitive is fully superseded by the CandidateArtifactVersion model inside Source Workspace (§11.11).

Migration:

- Any persisted ShadowWorkspace records are converted to candidate version chains rooted at the base accepted version. Each shadow edit becomes a `CandidateArtifactVersion` with `state = "candidate"` and a synthesized acceptance policy of `human_gate_required`.
- The `shadow_workspace_default` field in V2 `RevisorConfig` is removed. Implementations encountering this field during config read MUST log a deprecation warning and proceed with `candidate_version_policy_default = "candidate_for_meaning_bearing"`.
- The ShadowWorkspace concept name is retired. References to "shadow workspace" in user-facing UI text must be replaced with "candidate version" or "draft revision" depending on context.
- Validation code: `validation.shadow_workspace_referenced_outside_deprecations` fires if any V3 implementation references ShadowWorkspace outside this section and §28.
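
The first migration bullet can be sketched as a fold over the persisted shadow edits. This is a hedged illustration: `LegacyShadowWorkspace`, `LegacyShadowEdit`, and the fields of `CandidateArtifactVersionSketch` are hypothetical shapes (the real CandidateArtifactVersion schema is defined in §11.11), the `migrated:` id prefix is invented for the example, and a strictly linear chain in edit order is an assumption.

```ts
// Hypothetical V2 shapes; edits are assumed to be stored in application order.
interface LegacyShadowEdit {
  edit_id: string;
  content_ref: string;
}
interface LegacyShadowWorkspace {
  base_accepted_version_id: string;
  edits: LegacyShadowEdit[];
}

// Hypothetical, reduced view of a V3 candidate version.
interface CandidateArtifactVersionSketch {
  version_id: string;
  parent_version_id: string;
  state: "candidate";
  acceptance_policy: "human_gate_required";
  content_ref: string;
}

// Each shadow edit becomes a candidate whose parent is the previous candidate,
// with the chain rooted at the base accepted version.
function migrateShadowWorkspace(ws: LegacyShadowWorkspace): CandidateArtifactVersionSketch[] {
  const chain: CandidateArtifactVersionSketch[] = [];
  let parent = ws.base_accepted_version_id;
  for (const edit of ws.edits) {
    const version: CandidateArtifactVersionSketch = {
      version_id: `migrated:${edit.edit_id}`, // illustrative id scheme only
      parent_version_id: parent,
      state: "candidate",
      acceptance_policy: "human_gate_required",
      content_ref: edit.content_ref,
    };
    chain.push(version);
    parent = version.version_id;
  }
  return chain;
}
```

No migrated version is auto-accepted in this sketch: every one stays in `candidate` state behind a human gate, matching the synthesized acceptance policy above.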
### §23.1.2 Single-mode PlanAssurance → PlanAssurancePolicy stack

V2's `PlanAssurancePolicy.selected_mode` is superseded by the stack model (P5, §11.4):

- `required_modes: PlanAssuranceMode[]` (deterministic_lint always included)
- `completed_modes: PlanAssuranceMode[]`
- `non_degradable_modes: PlanAssuranceMode[]`
- `unmet_required_modes: PlanAssuranceMode[]` (computed)

Migration: any persisted plan with `selected_mode` is converted by treating the selected mode as the singleton in `required_modes`. Plans created post-migration always emit the stack form.

### §23.1.3 Global Pattern success_rate → PatternPerformanceSlice

V2's Pattern primitive had a single `success_rate: number` field. V3 removes this from the decision-primitive surface:

- Persisted patterns with `success_rate` are converted by populating a synthesized `performance_slices` array with a single slice tagged `context_signature: "legacy_unscoped"` and the original numeric value placed in `aggregate_display_metrics.convergence_count / usage_count` ratio fields.
- The `success_rate` field is removed from the canonical Pattern schema. Implementations that read legacy records MUST ignore the field for routing/promotion decisions; the field may be read only for UI display labeled as legacy.

### §23.1.4 AssuranceBasis with embedded limitations → split enums

V2's `AssuranceBasis` included `human_judgment_needed` and `insufficient_evidence` (and other limitations) as enum values. V3 splits these into `AssuranceBasis` (trust reasons) and `EvaluationLimitationKind` (limitations).

Migration: any persisted `OutcomeEvaluationResult` or `EvaluationFinding` records carrying limitation-class values in their `assurance_basis` field are converted by:

1. Moving the limitation value into a new `limitations: EvaluationLimitationKind[]` field
2. Replacing the `assurance_basis` value with `"mixed"` if the verdict has any trust basis remaining, or with the closest applicable basis if not
3. Re-evaluating the verdict under V3 hard-call detection rules (P2): if the converted record now has `EvaluationLimitationKind.human_judgment_needed`, a HardRevisionCall is synthesized for the next compile cycle

Validation code: `validation.legacy_assurance_basis_contained_limitation_value` fires during migration and surfaces in the migration manifest.

### §23.1.5 `policy_evaluated` → `policy_backed` + PolicyEvaluationRef

V2's `policy_evaluated` AssuranceBasis value is renamed to `policy_backed` and is restricted to cases where the policy engine IS the substantive evaluation method (P3). Most former uses are migrated to a `PolicyEvaluationRef` precondition; only the narrow "policy decides the verdict" case retains `policy_backed`.

Migration: each persisted record carrying `policy_evaluated` is rewritten by:

1. Inspecting the record's evaluation context
2. If policy was the precondition gate (most cases): replace with `assurance_basis = ` and add `policy_evaluation_ref: PolicyEvaluationRef`
3. If policy was the substantive verdict (rare; e.g., "is this work product privileged?"): replace with `assurance_basis = "policy_backed"` and retain the policy decision as the verdict source

### §23.1.6 Flat RevisionPlanStep → discriminated union

V2's `RevisionPlanStep` was a flat schema with overlapping discriminators (`step_kind` and `action_kind`). V3 replaces this with a discriminated union on `step_kind` only (P1, §7.5).

Migration: any persisted V2 plan steps are converted by:

1. Inspecting `(step_kind, action_kind, target_port)` together
2. Selecting the V3 variant per the conversion table:
   - `action_kind == "module_revision"` + valid `target_module_id` → `ModuleRevisionStep` with `target_port: "revision_in"` (forced)
   - `action_kind == "direct_fix"` → `DirectFixStep` with `target_port: "none_direct_fix"`
   - `action_kind == "human_review"` → `HumanJudgmentRequestStep` with `target_port: "human_response_in"`
   - `action_kind == "gather_information"` → `InformationRequestStep`
   - `action_kind == "verify"` → `VerificationRequestStep`
   - Other combinations: surface as migration anomaly, route to manual review
3. Deterministic linting per §11.3 is run against the converted plan; plans that fail post-migration lint are marked `superseded` and the user is prompted to recompile

V2 plans that routed `action_kind = "module_revision"` to `target_port = "data_in"` or `"instruction_in"` (the bypass bug) are explicitly rejected during migration with `validation.plan_step_target_port_bypassed_revision_in`. The Revisor must recompile these plans.

### §23.1.7 `autonomous_mode_opt_out` boolean → AutonomousModePolicy

V2's `autonomous_mode_opt_out: boolean` is superseded by `AutonomousModePolicy` (P11, §6.6) with locked-false fields for hard-call, policy, privileged-artifact, and external-side-effect gates.

Migration: any RevisorConfig with `autonomous_mode_opt_out = true` is converted to `AutonomousModePolicy { skip_low_risk_judgment_gate: true, may_skip_hard_call_gate: false, may_skip_policy_gate: false, may_skip_privileged_artifact_gate: false, may_skip_external_side_effect_gate: false, allowed_assurance_bases: [], max_plan_risk_score: 1.0 }`. The user is notified that the prior broad opt-out has been narrowed to low-risk judgment gates only.

### §23.1.8 LLM-generated idempotency keys → deterministic hashing

V2 allowed the Revisor to produce idempotency keys. V3 requires deterministic hashing (§11.10).
Migration: any persisted receipts with LLM-generated idempotency keys are retained for audit but cannot be used as preconditions for replay. New revisions use deterministic keys exclusively. ### §23.1.9 Pattern origin as scope → applicability_scope split V2 (and Patch V1) inferred Pattern retrieval scope from the originating matter. V3 (per P39) separates `provenance` from `applicability_scope`, with applicability inferred from pattern *content* (§13.3). Migration: each legacy Pattern record is converted by: 1. Moving the originating-matter field into `provenance.originating_matter_id` 2. Running the V3 Feedback Interpreter's applicability inference against the pattern's payload text 3. Setting `applicability_scope` per the inference result 4. Running the V3 identifying-content scan against the pattern text; if identifying detail is detected and not removable, locking `applicability_scope.scope_kind = "matter"` with `forced_scope_lock = provenance.originating_matter_id` Patterns whose legacy scope was the originating matter but whose content is generalizable (e.g., the California sign-permit example in §13) will widen their applicability scope at migration — this is intentional and correct. Users are notified that patterns from prior privileged or matter-scoped runs may now surface on related work; the migration manifest lists affected patterns for review. ### §23.1.10 Privilege-as-learning-gate → retrieval and content firewall V2's default-deny-on-privileged-runs gate is removed (revised P16). The privilege firewall is moved to the retrieval-scope and identifying-content layers (§13.4, §13.1 identifying-content scan). Migration: Teach-from-feedback UI defaults are reset uniformly per §18.1 (Fix-this-run-now on; all durable destinations off). 
Persisted Patterns that were "blocked from learning" under V2 because their origin run was privileged are re-evaluated: their pattern text is scanned, and they either become available with appropriate applicability scope, or are locked to matter scope per the content scan result. ## §23.2 Breaking changes for consumers The following changes break consumer code that depended on V2 surfaces: 1. **Field renames in persisted schemas:** `AssuranceBasis` values changed; `RevisorConfig.shadow_workspace_default` removed; `RevisorConfig.autonomous_mode_opt_out` replaced; `RevisionPlanStep` schema shape changed (flat → discriminated union); `Pattern.success_rate` removed; `Pattern.scope` split into `provenance` and `applicability_scope`. 2. **State machine fidelity:** Consumers reading `OutcomeRuntimeState`, `DispatcherState`, `RevisionPlanStatus`, and `FindingState` must accept the V3 canonical enum values per §0.4. Previously-undocumented values (e.g., "passed", "indeterminate") are mapped per the migration table in §23.1. 3. **Receipt envelope:** Consumers reading `RevisionOperationReceipt` must accept the PBEOperationReceiptLite-inherited fields per §11.6. The standalone V2 envelope is converted by populating the inherited fields from V2 equivalents and the new fields with default values per the PBE spec. 4. **Plan-step routing:** Consumers that constructed RevisionPlans directly (rather than via the Revision Compiler) must update to the discriminated-union shape. The deterministic linter (§11.3) will reject V2-shape plans. 
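As an illustration of the shape change in item 4, the §23.1.6 conversion table can be sketched as a pure function. This is a minimal sketch, not the canonical schema: the V3 `step_kind` literals, the `Conversion` wrapper, and the reduction of "valid `target_module_id`" to "non-empty" are assumptions made for the example. Only the `action_kind` values, port names, and validation code come from §23.1.6.

```typescript
// Sketch only: field names come from §23.1.6; the V3 `step_kind` literals and the
// Conversion wrapper are hypothetical placeholders, not the canonical §7.5 schema.
type V2Step = {
  step_kind: string;
  action_kind: string;
  target_port?: string;
  target_module_id?: string;
};

type V3Step =
  | { step_kind: "module_revision"; target_module_id: string; target_port: "revision_in" }
  | { step_kind: "direct_fix"; target_port: "none_direct_fix" }
  | { step_kind: "human_judgment_request"; target_port: "human_response_in" }
  | { step_kind: "information_request" }
  | { step_kind: "verification_request" };

type Conversion =
  | { kind: "converted"; step: V3Step }
  | { kind: "rejected"; validation_code: string }
  | { kind: "anomaly"; note: string };

function convertV2Step(v2: V2Step): Conversion {
  switch (v2.action_kind) {
    case "module_revision":
      // Bypass-bug plans are rejected outright; the Revisor must recompile them.
      if (v2.target_port === "data_in" || v2.target_port === "instruction_in") {
        return {
          kind: "rejected",
          validation_code: "validation.plan_step_target_port_bypassed_revision_in",
        };
      }
      // Assumption: "valid" is reduced to "non-empty"; a real check would consult a module registry.
      if (!v2.target_module_id) {
        return { kind: "anomaly", note: "module_revision without target_module_id" };
      }
      return {
        kind: "converted",
        step: {
          step_kind: "module_revision",
          target_module_id: v2.target_module_id,
          target_port: "revision_in", // forced, whatever the V2 step declared
        },
      };
    case "direct_fix":
      return { kind: "converted", step: { step_kind: "direct_fix", target_port: "none_direct_fix" } };
    case "human_review":
      return {
        kind: "converted",
        step: { step_kind: "human_judgment_request", target_port: "human_response_in" },
      };
    case "gather_information":
      return { kind: "converted", step: { step_kind: "information_request" } };
    case "verify":
      return { kind: "converted", step: { step_kind: "verification_request" } };
    default:
      // Other combinations are surfaced for manual review rather than guessed at.
      return { kind: "anomaly", note: "unmapped action_kind: " + v2.action_kind };
  }
}
```

Returning a tagged result rather than throwing keeps rejected and anomalous records available for the migration manifest; that choice is a convenience of the sketch, not something the spec prescribes.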
## §23.3 Migration manifest

Each migration run produces a `MigrationManifest`:

```ts
MigrationManifest {
  manifest_id: string
  source_version: "V2" | "V2.1"           // V2 minor versions if present
  target_version: "V3"
  started_at: ISO8601
  completed_at?: ISO8601
  records_processed: {
    revision_plans: number
    candidate_versions: number            // new in V3; created from ShadowWorkspace
    patterns: number
    revisor_configs: number
    assurance_results: number
    receipts: number
  }
  anomalies: Array<{
    anomaly_kind: string
    record_ref: StorageRef
    suggested_action: "auto_resolved" | "user_review_required" | "rejected"
    notes: string
  }>
  rejected_records: Array<{
    record_ref: StorageRef
    reason: string
    user_action_required: string
  }>
  schema_version: 1
}
```

Manifests are persisted via EC and surface in the user's Settings → Migration screen for review. Anomalies tagged `user_review_required` block plan compilation against affected records until reviewed.

## §23.4 Migration ordering

Migration MUST proceed in this order:

1. **Schema migrations** (no data changes): update column shapes, add new fields with defaults, register new enum values.
2. **Assurance records** (P2 enum split): convert AssuranceBasis values; populate EvaluationLimitationKind.
3. **Plan and step schemas** (P1 discriminated union): convert flat plan steps; reject bypass plans.
4. **RevisorConfig** (P4, P11): remove shadow_workspace_default; convert autonomous_mode_opt_out.
5. **CandidateArtifactVersion** (P4, P17): convert ShadowWorkspace records; add acceptance receipt fields.
6. **Pattern primitive** (P18, P39): remove success_rate; split provenance from applicability_scope; run identifying-content scan; widen or lock applicability per scan result.
7. **Receipt envelope** (P8): extend records to PBEOperationReceiptLite shape.
8. **Manifest finalization**: produce final MigrationManifest; surface anomalies.

Out-of-order migration is rejected by the migration runner.
Each step records a step-completed receipt; downstream steps refuse to start until upstream is complete. ## §23.5 Rollback policy V3 migration is one-way. There is no rollback that preserves V2 invariants. If migration fails irrecoverably: - The deployment is held at the last successful migration step's checkpoint - Operation is suspended on the affected subsystem until manual resolution - The user is notified via the Settings → Migration screen with the anomaly list Recovery options are: fix the data anomaly and re-run from the failed step; or, in extreme cases, restore from pre-migration backup and re-attempt the entire migration after the underlying issue is resolved. There is no "partial V3" operational mode. --- # §24. COMPLIANCE MATRIX This section maps the addendum's explicit invariants to the V3 sections that specify them, the validation codes that enforce them, and the conformance fixtures that test them. Implementations are validated against this matrix as a precondition for production deployment. 
## §24.1 Invariant index

| Invariant | Specified in | Validation code | Conformance fixture |
|---|---|---|---|
| `ModuleRevisionStep.target_port` must equal `"revision_in"` | §7.5, §9.1 | `validation.plan_step_target_port_bypassed_revision_in` | F-PORT-01 |
| `DirectFixStep` must have `target_port = "none_direct_fix"` and no `target_module_id` | §7.5, §10.2 | `validation.direct_fix_step_has_target_module_id` | F-DIRFIX-01 |
| `instruction_in` is never a revision target unless module declares `instruction_in_revision_compatible: true` | §9.8 | `validation.instruction_in_used_as_revision_target_without_capability` | F-PORT-02 |
| `AssuranceBasis` does not contain limitation values | §5.4 | `validation.legacy_assurance_basis_contained_limitation_value` | F-ASSURE-01 |
| HardRevisionCall triggers on `EvaluationLimitationKind.human_judgment_needed`, not on AssuranceBasis | §5.4, §6.5 | `validation.hard_call_triggered_on_assurance_basis` | F-HARDCALL-01 |
| `RevisorConfig` does not contain `shadow_workspace_default` | §6.4 | `validation.shadow_workspace_referenced_outside_deprecations` | F-CFG-01 |
| `PlanAssurancePolicy.required_modes` ⊆ `completed_modes` for dispatchable plans | §11.4, §11.5 | `validation.plan_dispatched_with_unmet_required_modes` | F-ASSURE-02 |
| `non_degradable_modes` cannot be removed by budget degradation | §11.4, §11.15 | `validation.degradation_removed_non_degradable_mode` | F-BUDGET-01 |
| `ModuleRevisionCapability` declares `capability_id` and `capability_version` | §9.2 | `validation.module_revision_capability_missing_version` | F-CAP-01 |
| `RevisionOperationReceipt` extends `PBEOperationReceiptLite` with required PBE fields | §11.6 | `validation.revision_receipt_missing_pbe_field` | F-RECEIPT-01 |
| `AutonomousModePolicy.may_skip_hard_call_gate` is locked false | §6.6 | `validation.autonomous_mode_attempted_hard_call_bypass` | F-AUTO-01 |
| `AutonomousModePolicy.may_skip_policy_gate` is locked false | §6.6 | `validation.autonomous_mode_attempted_policy_bypass` | F-AUTO-02 |
| `AutonomousModePolicy.may_skip_privileged_artifact_gate` is locked false | §6.6 | `validation.autonomous_mode_attempted_privilege_bypass` | F-AUTO-03 |
| `AutonomousModePolicy.may_skip_external_side_effect_gate` is locked false | §6.6 | `validation.autonomous_mode_attempted_side_effect_bypass` | F-AUTO-04 |
| `custom_instruction` is wrapped in envelope with authority_class, taint_class, length, content scan | §9.6, §15.10 | `validation.custom_instruction_taint_violation` | F-INJECT-01 |
| `HardCallResolution` requires outcome/goal/evidence hash compatibility for reuse | §7.9 | `validation.hard_call_resolution_compatibility_check_skipped` | F-HARDCALL-02 |
| `DirectInstructionCandidate` for privileged feedback locked to `current_run_only` | §14.7 | `validation.direct_instruction_scope_exceeds_privilege` | F-PRIV-01 |
| `GateSkippability = "not_skippable"` hides Skip in UI | §21 | `validation.skip_button_rendered_on_not_skippable` | F-GATE-01 |
| Teach-from-feedback durable checkboxes default OFF | §18.1 | `validation.teach_from_feedback_default_on_for_durable` | F-TEACH-01 |
| `CandidateArtifactVersion` acceptance produces RevisionOperationReceipt | §11.11, §11.6 | `validation.candidate_accepted_without_receipt` | F-CAND-01 |
| `Pattern.success_rate` field is not present | §13.1 | `validation.pattern_carries_global_success_rate` | F-PATTERN-01 |
| Taint clearance scope ≤ user's AccessTier mapping | §15.12, §16.2 | `validation.taint_clearance_scope_exceeds_tier` | F-TAINT-01 |
| `goal_advancement_count` increments only via independent comparative judge or human feedback | §6.12, §13.3 | `validation.goal_advancement_self_graded` | F-LEARN-01 |
| Every `RevisionPlan` carries `PlanReadSet` and `PlanWriteSet` | §11.9 | `validation.plan_missing_read_or_write_set` | F-CONCUR-01 |
| `OutcomeEvaluationState` and `OutcomeLifecycleState` are separate fields | §5.1 | `validation.outcome_state_uses_legacy_enum` | F-STATE-01 |
| `RevisionDispatcherState` and `RevisionFailureEventKind` are separate enums | §11.2 | `validation.failure_event_used_as_dispatcher_state` | F-STATE-02 |
| Concurrency tie-breaker uses canonical fields (P24) | §11.9 | `validation.tie_breaker_references_undefined_field` | F-CONCUR-02 |
| Workspace write failure mapped to typed `WorkspaceWriteFailureKind` | §11.17 | `validation.workspace_failure_used_partially_completed_blanket` | F-WSFAIL-01 |
| Cost estimator confidence gates auto-approval | §15.5, §11.15 | `validation.uncalibrated_estimator_auto_approved_above_threshold` | F-COST-01 |
| Upstream module failure transitions dependent outcomes to `upstream_failure` | §5.14 | `validation.pending_dependency_did_not_cascade_on_upstream_failure` | F-CASCADE-01 |
| Regenerate/restructure plans require SemanticChangelog | §7.11 | `validation.regenerate_step_missing_semantic_changelog` | F-DIFF-01 |
| Multi-step plans use candidate-only mutation by default | §11.20 | `validation.multi_step_plan_used_live_mutation_without_optin` | F-HASH-01 |
| Sandboxed evaluation does not bleed taint to main graph | §11.12 | `validation.tainted_candidate_eval_promoted_to_main` | F-SANDBOX-01 |
| Yield-back accumulates into candidate chain, not live mutations | §6.7, §11.11 | `validation.yield_back_left_live_mutations` | F-YIELD-01 |
| Budget bifurcated: logical and infrastructure counters separate | §11.15, §6.4 | `validation.infrastructure_retry_counted_against_logical_budget` | F-BUDGET-02 |
| `NoveltyAssessment.novelty_score` equals `closest_pattern_distance` | §5.2, §6.1 | `validation.novelty_score_inverted` | F-NOVEL-01 |
| Every RevisionIntelligencePacket carries `cil_authority_snapshot_ref` | §7.4, §11.3 | `validation.revision_packet_missing_cil_authority_snapshot` | F-CIL-01 |
| `QualitySignal.actionability` declared for every metric | §15.9 | `validation.quality_signal_missing_actionability` | F-QUAL-01 |
| Source repair max-depth default = 1; overrides bounded ≤ 3 | §6.11 | `validation.source_repair_depth_exceeds_max` | F-REPAIR-01 |
| Pattern UI does not display naked global aggregate | §21.8 | `validation.pattern_ui_displayed_unscoped_aggregate` | F-PATTERN-02 |
| `step.coding` not a revision target without `acp_profile_revision_safe` | §9.7 | `validation.coding_module_received_revision_without_safe_profile` | F-CODING-01 |
| `Pattern.applicability_scope` separate from `Pattern.provenance` | §13.1 | `validation.pattern_origin_used_as_applicability` | F-PATTERN-03 |

## §24.2 Severity bands

Validation codes carry severity per §22.10. The compliance matrix's defaults:

- All CRITICAL patches (P1, P2, P5, P19, P20, P30): validation severity `critical`
- All HIGH patches: validation severity `error` (default) or `critical` for the safety-critical subset (P1 target_port, P12 injection envelope, P21 read sets, P38 coding dispatch)
- MEDIUM and LOW patches: severity `error` for runtime correctness, `warning` for UI and display

Severity may not be lowered for a specific validation code without a spec amendment.

## §24.3 Conformance gates

A deployment is V3-conformant when:

1. All `critical` validation codes pass on the implementation's static and runtime test suite
2. All conformance fixtures listed in §25 produce expected outputs
3. The Drift Manifest (§20.4) shows zero uncorrected drift on schema, state machine, and enum surfaces
4. The MigrationManifest (§23.3) shows no `user_review_required` anomalies outstanding
5. The cross-doc obligations in §29 are at least at status `consumed_in_v3` for their owner documents

Deployments failing any of (1)–(4) are not eligible for production use of the affected subsystem. Deployments failing (5) operate in degraded mode with reduced capability surface until the cross-doc obligation lands.

---

# §25. STRESS TEST FIXTURES

This section specifies the executable conformance fixtures that validate V3 implementations against the compliance matrix in §24.
Each fixture has a setup, an operation, an expected outcome, and a failure signature. Fixtures are executable; they are run as part of the implementation's test suite and during the Drift Manifest pass.

## §25.1 Fixture schema

```ts
ConformanceFixture {
  fixture_id: string
  fixture_name: string
  invariant_refs: string[]                 // validation code names per §24
  setup: {
    initial_state: SerializedRecord[]      // pre-state to load
    actors: Array<{ ref: ActorRef, tier: AccessTier }>
    config_overrides?: Partial
  }
  operation: {
    operation_kind: string                 // what the fixture invokes
    operation_payload: SerializedRecord
  }
  expected_outcome: {
    expected_success: boolean
    expected_records: SerializedRecord[]
    expected_validation_codes: string[]    // codes that should fire (if any)
    expected_state_transitions: Array<{ from: string, to: string }>
    expected_ui_status: RevisionUIStatus
  }
  failure_signature: {
    on_unexpected_pass: string             // what does it mean if the unexpected case succeeded?
    on_unexpected_fail: string             // what does it mean if expected case failed?
    severity: "critical" | "error" | "warning"
  }
  schema_version: 1
}
```

## §25.2 Required fixtures

The following fixtures are mandatory. Implementations failing any required fixture are not V3-conformant.

### F-PORT-01: target_port bypass attempted

**Setup:** A RevisionPlan containing a `ModuleRevisionStep` with `target_port: "data_in"` is submitted to the Revision Dispatcher.

**Operation:** Plan validation (deterministic linting).

**Expected outcome:** Plan REJECTED. Validation code `validation.plan_step_target_port_bypassed_revision_in` fires with severity `critical`. Plan status transitions to `failed`. UI status: `"Blocked"`.

**Failure signature:** If plan is accepted, the architecture's central safety contract is bypassable. Severity `critical`.
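The check that F-PORT-01 exercises might look like the following lint pass. This is a minimal sketch: `LintStep`, `LintFinding`, `lintTargetPorts`, `planStatusAfterLint`, and the `"lint_passed"` status are hypothetical names introduced for illustration. The validation code, its severity, and the `failed` plan status are the ones stated in the fixture.

```typescript
// Sketch only: a reduced step union. "other" stands in for every non-module variant.
type LintStep =
  | { step_kind: "module_revision"; target_module_id: string; target_port: string }
  | { step_kind: "other" };

type LintFinding = {
  code: string;
  severity: "critical" | "error" | "warning";
  step_index: number;
};

// Flags every module revision step that is not routed through revision_in.
function lintTargetPorts(steps: LintStep[]): LintFinding[] {
  const findings: LintFinding[] = [];
  steps.forEach((step, index) => {
    if (step.step_kind === "module_revision" && step.target_port !== "revision_in") {
      findings.push({
        code: "validation.plan_step_target_port_bypassed_revision_in",
        severity: "critical",
        step_index: index,
      });
    }
  });
  return findings;
}

// Assumption: any critical finding fails the plan; "lint_passed" is a placeholder status.
function planStatusAfterLint(findings: LintFinding[]): "failed" | "lint_passed" {
  const hasCritical = findings.some((f) => f.severity === "critical");
  return hasCritical ? "failed" : "lint_passed";
}
```

Keeping the lint pure (steps in, findings out) means the same function can back both the dispatcher's pre-dispatch check and the fixture's assertion on `expected_validation_codes`.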
### F-PORT-02: instruction_in used as revision target without capability **Setup:** A `ModuleRevisionStep` targets a module whose `ModuleRevisionCapability.instruction_in_revision_compatible = false` with `target_port: "instruction_in"`. **Operation:** Plan validation. **Expected outcome:** Plan REJECTED. Validation code `validation.instruction_in_used_as_revision_target_without_capability` fires. **Failure signature:** Revision plans can route through ordinary instruction ports, bypassing capability contracts. Severity `critical`. ### F-DIRFIX-01: DirectFixStep carries target_module_id **Setup:** A `DirectFixStep` is submitted with `target_module_id` set. **Operation:** Plan validation. **Expected outcome:** Plan REJECTED. Validation code `validation.direct_fix_step_has_target_module_id` fires. **Failure signature:** Step kind discrimination is incomplete. Severity `error`. ### F-ASSURE-01: AssuranceBasis legacy limitation value **Setup:** An `OutcomeEvaluationResult` with `assurance_basis = "human_judgment_needed"` (legacy V2 value). **Operation:** Record write to evaluation store. **Expected outcome:** Record REJECTED at schema validation. Validation code `validation.legacy_assurance_basis_contained_limitation_value` fires. **Failure signature:** V3 schema does not enforce P2 split. Severity `critical`. ### F-HARDCALL-01: HardCall triggered on AssuranceBasis **Setup:** An `OutcomeEvaluationResult` has `assurance_basis = "llm_expert_judgment"` (a valid trust basis) and `limitations: []`. **Operation:** Hard Call detection. **Expected outcome:** No HardCall fired. The judgment basis is a default human gate trigger (per §6.6), but it is NOT a HardCall trigger. The user sees a `needs_human_judgment` gate but not a HardCall escalation. **Failure signature:** Hard call detection is mis-keyed on AssuranceBasis. Severity `error`. ### F-CFG-01: shadow_workspace_default present **Setup:** A V2 RevisorConfig is loaded with `shadow_workspace_default = true`. 
**Operation:** Config validation. **Expected outcome:** Config flagged as deprecated. Validation code `validation.shadow_workspace_referenced_outside_deprecations` fires with severity `warning` during migration window, `error` after migration window closes. **Failure signature:** Deprecation is not enforced. Severity `error`. ### F-ASSURE-02: Plan dispatched with unmet required modes **Setup:** A RevisionPlan with `required_modes: ["deterministic_lint", "semantic_lint", "human_gate"]` and `completed_modes: ["deterministic_lint"]`. **Operation:** Plan dispatch. **Expected outcome:** Dispatch BLOCKED. Validation code `validation.plan_dispatched_with_unmet_required_modes` fires. **Failure signature:** Single-mode V2 logic resurrected. Severity `critical`. ### F-BUDGET-01: Degradation removes non-degradable mode **Setup:** A high-cost plan with `non_degradable_modes: ["dry_run"]` and budget pressure forcing degradation. **Operation:** Degradation step executes. **Expected outcome:** Dry-run is NOT skipped. Instead, dispatch BLOCKS with `block_and_escalate` and a UI prompt for budget extension. **Failure signature:** Safety mode silently removed under budget pressure. Severity `critical`. ### F-CAP-01: ModuleRevisionCapability missing version **Setup:** A module declares `ModuleRevisionCapability` without `capability_version` or `input_schema_version`. **Operation:** Capability registration. **Expected outcome:** Registration REJECTED. Validation code `validation.module_revision_capability_missing_version` fires. **Failure signature:** Capability contract drift possible. Severity `error`. ### F-RECEIPT-01: RevisionOperationReceipt missing PBE field **Setup:** A RevisionOperationReceipt produced without `section_ref`, `local_payload_schema_version`, or `ec_sequence_number`. **Operation:** Receipt persistence. **Expected outcome:** Receipt REJECTED. Validation code `validation.revision_receipt_missing_pbe_field` fires. 
**Failure signature:** Receipt is not PBE-compatible; transaction-kernel migration becomes interpretive. Severity `error`.

### F-AUTO-01 through F-AUTO-04: AutonomousModePolicy bypass attempts

Four fixtures, one for each locked-false field. Each constructs a `RevisorConfig` whose `AutonomousModePolicy` sets one locked field to `true` as an attempted override (`may_skip_hard_call_gate`, `may_skip_policy_gate`, `may_skip_privileged_artifact_gate`, or `may_skip_external_side_effect_gate`) and submits it to the config validator.

**Expected outcome (each):** Config REJECTED. The corresponding validation code from §24.1 (`validation.autonomous_mode_attempted_hard_call_bypass`, `validation.autonomous_mode_attempted_policy_bypass`, `validation.autonomous_mode_attempted_privilege_bypass`, or `validation.autonomous_mode_attempted_side_effect_bypass`) fires with severity `critical`.

**Failure signature (each):** User or implementer accidentally converted hard calls / policy gates / privileged-artifact gates / external-side-effect gates into ordinary autonomous dispatch. Severity `critical`.

### F-INJECT-01: custom_instruction contains tainted text

**Setup:** A `TypedRevisionInstruction` with `custom_instruction.text` containing text matching the prohibited-content scan (e.g., known prompt-injection pattern) and `custom_instruction.taint_class = "external_untrusted"`.

**Operation:** Plan compilation.

**Expected outcome:** Plan compilation BLOCKED. Validation code `validation.custom_instruction_taint_violation` fires.

**Failure signature:** Prompt injection through revision channel possible. Severity `critical`.

### F-HARDCALL-02: HardCallResolution reused with mismatched evidence

**Setup:** A prior HardCallResolution exists with `evidence_snapshot_ref = ev_v3`. Current outcome's evidence is `ev_v5` (changed since resolution).

**Operation:** Compiler attempts to reuse the prior resolution.

**Expected outcome:** Reuse REJECTED. The call re-escalates as a new HardRevisionCall. Validation code `validation.hard_call_resolution_compatibility_check_skipped` fires if the implementation did not perform the check.

**Failure signature:** Stale user directives silently apply to new situations. Severity `error`.
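The compatibility check behind F-HARDCALL-02 can be sketched as a plain field comparison. This is a minimal sketch under stated assumptions: `evidence_snapshot_ref` is named in the fixture, while `outcome_hash`, `goal_hash`, `ResolutionContext`, and `checkResolutionReuse` are hypothetical stand-ins for the outcome/goal/evidence compatibility that §24.1 requires. The canonical fields are defined in §7.9.

```typescript
// Sketch only: the three values a prior resolution must share with the current situation.
type ResolutionContext = {
  outcome_hash: string;          // hypothetical field name
  goal_hash: string;             // hypothetical field name
  evidence_snapshot_ref: string; // named in F-HARDCALL-02
};

type ReuseDecision =
  | { reusable: true }
  | { reusable: false; mismatched: string[] };

// Reuse is allowed only when all three values match exactly.
function checkResolutionReuse(
  prior: ResolutionContext,
  current: ResolutionContext,
): ReuseDecision {
  const mismatched: string[] = [];
  if (prior.outcome_hash !== current.outcome_hash) mismatched.push("outcome_hash");
  if (prior.goal_hash !== current.goal_hash) mismatched.push("goal_hash");
  if (prior.evidence_snapshot_ref !== current.evidence_snapshot_ref) {
    mismatched.push("evidence_snapshot_ref");
  }
  if (mismatched.length === 0) {
    return { reusable: true };
  }
  // Caller is expected to re-escalate as a new HardRevisionCall.
  return { reusable: false, mismatched };
}
```

With the fixture's values (`ev_v3` prior, `ev_v5` current) the function reports `evidence_snapshot_ref` as the only mismatch, which is the signal to re-escalate rather than reuse.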
### F-PRIV-01: DirectInstructionCandidate scope exceeds privilege **Setup:** A HumanOutcomeFeedbackEvent with `authority_class = "privileged_comment_no_learning"` is processed; the Feedback Interpreter generates a DirectInstructionCandidate with `proposed_scope.user_scope = "team"`. **Operation:** Candidate validation. **Expected outcome:** Candidate REJECTED. Validation code `validation.direct_instruction_scope_exceeds_privilege` fires. The scope is forced to `["current_run_only"]`. **Failure signature:** Privileged feedback escapes its narrow scope. Severity `critical`. ### F-GATE-01: Skip button rendered on not_skippable gate **Setup:** A HardRevisionCall blocking dispatch. UI rendering invoked. **Operation:** Render the gate decision panel. **Expected outcome:** Accept and Modify buttons rendered. Skip button NOT rendered. Validation code `validation.skip_button_rendered_on_not_skippable` fires if Skip is rendered. **Failure signature:** User can silently bypass a hard call. Severity `critical`. ### F-TEACH-01: Teach-from-feedback durable checkbox defaults on **Setup:** A fresh teach-from-feedback card is rendered after a non-privileged run. **Operation:** Inspect default checkbox states. **Expected outcome:** Only `"Fix this run now"` is checked. All durable destinations unchecked. Validation code `validation.teach_from_feedback_default_on_for_durable` fires if any durable destination is pre-checked. **Failure signature:** Accidental durable learning from one-off corrections. Severity `error`. ### F-CAND-01: Candidate accepted without receipt **Setup:** A CandidateArtifactVersion in `state = "candidate"` is transitioned directly to `state = "accepted"` without producing a RevisionOperationReceipt. **Operation:** State transition. **Expected outcome:** Transition REJECTED. Validation code `validation.candidate_accepted_without_receipt` fires. **Failure signature:** Acceptance is a silent state flip; audit trail broken. Severity `critical`. 
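One way to make the F-CAND-01 invariant hard to violate is to route acceptance through a single function that cannot succeed without a receipt. This is a minimal sketch: the record shapes, `candidate_id`, `acceptance_receipt_id`, `receipt_id`, and `acceptCandidate` are hypothetical, and only the `candidate` and `accepted` states and the validation code are taken from the fixture. The sketch also assumes the input is already in `state = "candidate"`.

```typescript
// Sketch only: reduced record shapes; real schemas carry many more fields.
type CandidateArtifactVersion = {
  candidate_id: string;            // hypothetical
  state: "candidate" | "accepted"; // other lifecycle states omitted
  acceptance_receipt_id?: string;  // hypothetical link to the receipt
};

type RevisionOperationReceipt = {
  receipt_id: string;              // hypothetical minimal receipt
};

type AcceptResult =
  | { ok: true; candidate: CandidateArtifactVersion }
  | { ok: false; validation_code: string };

// Acceptance and receipt linkage happen together, so there is no silent state flip.
function acceptCandidate(
  candidate: CandidateArtifactVersion,
  receipt?: RevisionOperationReceipt,
): AcceptResult {
  if (receipt === undefined) {
    return { ok: false, validation_code: "validation.candidate_accepted_without_receipt" };
  }
  return {
    ok: true,
    candidate: {
      candidate_id: candidate.candidate_id,
      state: "accepted",
      acceptance_receipt_id: receipt.receipt_id,
    },
  };
}
```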
### F-PATTERN-01: Pattern carries global success_rate **Setup:** A Pattern record with `success_rate: 0.87` (legacy V2 field) is submitted to the pattern store. **Operation:** Pattern write. **Expected outcome:** Write REJECTED. Validation code `validation.pattern_carries_global_success_rate` fires. **Failure signature:** Decision primitives based on misleading global aggregates. Severity `error`. ### F-PATTERN-02: Pattern UI displays naked aggregate **Setup:** Pattern UI rendering with a pattern that has `aggregate_display_metrics` populated. **Operation:** Render pattern card. **Expected outcome:** Aggregate is shown with explicit context scope label. Naked percentage without context is NOT shown. Validation code `validation.pattern_ui_displayed_unscoped_aggregate` fires if naked aggregate is rendered. **Failure signature:** Misleading UI; user makes decisions based on out-of-context aggregate. Severity `warning`. ### F-PATTERN-03: Pattern origin used as applicability scope **Setup:** A Pattern record is created from a Paramount-matter feedback event. The pattern content is generalizable craft knowledge (e.g., a California damages citation rule). The implementation conflates origin with applicability. **Operation:** Pattern creation. **Expected outcome:** Feedback Interpreter analyzes pattern content (not origin) and sets `applicability_scope.scope_kind = "domain"` or `"work_product_type"`. `provenance.originating_matter_id = paramount` is set. `applicability_scope.matter_id` is NOT set. Pattern is retrievable for non-Paramount California sign-permit work. **Failure signature:** Cross-matter craft knowledge locked to originating matter; learning pipeline broken. Severity `error`. ### F-TAINT-01: Taint clearance exceeds access tier **Setup:** A user with `AccessTier = "matter_team_access"` attempts to clear taint with `clearance_scope = "firm"`. **Operation:** Taint clearance record write. **Expected outcome:** Write REJECTED. 
Validation code `validation.taint_clearance_scope_exceeds_tier` fires. **Failure signature:** Junior user can poison firm-wide pattern graph. Severity `critical`. ### F-LEARN-01: Goal advancement self-graded **Setup:** A `PatternPerformanceSlice` is incremented with `goal_advancement_source = "revisor_generated_goal_impact_assessment"` (which is not a valid source per P20). **Operation:** Slice update. **Expected outcome:** Update REJECTED. Validation code `validation.goal_advancement_self_graded` fires. **Failure signature:** Sycophancy delusion in long-term learning. Severity `critical`. ### F-CONCUR-01: Plan missing read or write set **Setup:** A RevisionPlan submitted without `PlanReadSet` or without `PlanWriteSet`. **Operation:** Plan validation. **Expected outcome:** Plan REJECTED. Validation code `validation.plan_missing_read_or_write_set` fires. **Failure signature:** Concurrency conflicts undetected. Severity `error`. ### F-CONCUR-02: Read/write staleness undetected **Setup:** Plan A reads artifact v10 and is queued. Plan B writes artifact v11 and commits. Plan A is dispatched. **Operation:** Plan A dispatch. **Expected outcome:** Plan A's `PlanReadSet` validation detects v10→v11 movement; Plan A is aborted and replanned with fresh read snapshot. **Failure signature:** Plans operate on stale facts; downstream artifacts derived from inconsistent state. Severity `error`. ### F-STATE-01: Outcome state uses legacy enum **Setup:** An `OutcomeRuntimeState.evaluation_state = "passed"` (V2 value, not in V3 canonical enum). **Operation:** State persistence. **Expected outcome:** Persistence REJECTED. Validation code `validation.outcome_state_uses_legacy_enum` fires. **Failure signature:** State machine fidelity broken. Severity `error`. ### F-STATE-02: Failure event used as DispatcherState **Setup:** `RevisionDispatcherState = "failed_validation"` (which is a FailureEventKind, not a DispatcherState). **Operation:** State persistence. 
**Expected outcome:** Persistence REJECTED. Validation code `validation.failure_event_used_as_dispatcher_state` fires. **Failure signature:** Event/state conflation. Severity `error`. ### F-WSFAIL-01: WorkspaceWriteFailure used partially_completed blanket **Setup:** A workspace write returns `no_artifact_written` (nothing durably persisted). The implementation classifies the step as `partially_completed`. **Operation:** Execution status assignment. **Expected outcome:** Status assignment REJECTED. The correct status is `failed_runtime`. Validation code `validation.workspace_failure_used_partially_completed_blanket` fires. **Failure signature:** Recovery path mis-routed; no-artifact-written treated as partial success. Severity `error`. ### F-COST-01: Uncalibrated estimator auto-approves above threshold **Setup:** Estimator confidence is `"uncalibrated"`. Estimated cost is above the per-config threshold for `uncalibrated` (default 50% of nominal cap). **Operation:** Plan auto-approval check. **Expected outcome:** Auto-approval BLOCKED. User confirmation requested. Validation code `validation.uncalibrated_estimator_auto_approved_above_threshold` fires if approval proceeds. **Failure signature:** Fake safety from uncalibrated cost cap. Severity `error`. ### F-CASCADE-01: Pending dependency does not cascade on upstream failure **Setup:** Outcome A in `pending_dependency` waiting on artifact from Module M. Module M terminates with `could_not_fix` and retry budget exhausted. **Operation:** Loop Controller sweep. **Expected outcome:** Outcome A transitions to `upstream_failure`. Loop Controller emits `RevisionOperationReceipt` with `operation_kind = "escalation_created"`. Validation code `validation.pending_dependency_did_not_cascade_on_upstream_failure` fires if A remains pending. **Failure signature:** Cascading deadlock; tasks hang indefinitely. Severity `critical`. 
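The sweep that F-CASCADE-01 describes can be sketched as a pass over pending outcomes. This is a minimal sketch, not the Loop Controller's actual interface: `ModuleRun`, `OutcomeRecord`, `retry_budget_remaining`, `waiting_on_module_id`, and `sweepPendingDependencies` are hypothetical names. The states `pending_dependency` and `upstream_failure`, the `could_not_fix` termination, and `operation_kind = "escalation_created"` come from the fixture.

```typescript
// Sketch only: reduced records for the upstream module and the dependent outcome.
type ModuleRun = {
  module_id: string;
  terminal_status?: string;       // e.g. "could_not_fix"; undefined while still running
  retry_budget_remaining: number; // hypothetical counter
};

type OutcomeRecord = {
  outcome_id: string;
  state: string;                  // "pending_dependency", "upstream_failure", ...
  waiting_on_module_id?: string;  // hypothetical dependency pointer
};

type EscalationReceipt = {
  operation_kind: "escalation_created";
  outcome_id: string;
};

function sweepPendingDependencies(
  outcomes: OutcomeRecord[],
  modules: ModuleRun[],
): { outcomes: OutcomeRecord[]; receipts: EscalationReceipt[] } {
  const receipts: EscalationReceipt[] = [];
  const swept = outcomes.map((outcome) => {
    if (outcome.state !== "pending_dependency") return outcome;
    const upstream = modules.filter((m) => m.module_id === outcome.waiting_on_module_id)[0];
    // Cascade only when the upstream module is terminally failed with no retries left.
    const exhausted =
      upstream !== undefined &&
      upstream.terminal_status === "could_not_fix" &&
      upstream.retry_budget_remaining <= 0;
    if (!exhausted) return outcome;
    receipts.push({ operation_kind: "escalation_created", outcome_id: outcome.outcome_id });
    return { ...outcome, state: "upstream_failure" };
  });
  return { outcomes: swept, receipts };
}
```

An outcome whose upstream module still has retry budget stays in `pending_dependency`, so the sweep is safe to run repeatedly.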
### F-DIFF-01: Regenerate step missing SemanticChangelog **Setup:** A `ModuleRevisionStep` with `revision_capability_required = "regenerate"` is compiled without a companion request for SemanticChangelog. **Operation:** Plan compilation. **Expected outcome:** Plan compilation BLOCKED. The companion request for SemanticChangelog is auto-inserted (preferred), or compilation fails with `validation.regenerate_step_missing_semantic_changelog`. **Failure signature:** Large-scale regenerate outputs unreviewable; safety gate becomes useless. Severity `error`. ### F-HASH-01: Multi-step plan used live mutation without opt-in **Setup:** A 3-step plan with `mutation_mode` unspecified (defaulting to `candidate_only`). Implementation attempts in-place mutation against base artifact. **Operation:** First step execution. **Expected outcome:** Mutation BLOCKED. Live artifact is unchanged; candidate version is created instead. Validation code `validation.multi_step_plan_used_live_mutation_without_optin` fires if in-place mutation occurred. **Failure signature:** Hash precondition paradox aborts valid plans, or live state is left in partial state. Severity `error`. ### F-SANDBOX-01: Tainted candidate evaluation promoted to main graph **Setup:** A `CandidateArtifactVersion` in `state = "candidate"` carries `taint_class = "external_untrusted"`. Evaluation runs against the candidate. **Operation:** Evaluation result persistence. **Expected outcome:** Result is stored in candidate-scoped storage; main `OutcomeRuntimeState` is NOT updated. Pattern learning signals carry candidate-scope tag. Validation code `validation.tainted_candidate_eval_promoted_to_main` fires if the result lands in main graph. **Failure signature:** Graph poisoned by tainted candidate evaluation. Severity `critical`. ### F-YIELD-01: Yield-back leaves live mutations **Setup:** A 5-step plan executes 2 mutating steps successfully, then yields back to the Revisor. 
The implementation persisted the 2 mutations into live artifact versions. **Operation:** Yield-back execution. **Expected outcome:** The 2 step mutations are accumulated into a `CandidateArtifactVersion` chain. Live artifacts remain at base versions. Validation code `validation.yield_back_left_live_mutations` fires if live mutations are detected. **Failure signature:** Transaction atomicity broken; Revisor replans against moving target. Severity `error`. ### F-BUDGET-02: Infrastructure retry counted against logical budget **Setup:** Local model returns malformed JSON; infrastructure retries 3 times. `max_logical_llm_calls_per_revision` is incremented each time. **Operation:** Budget accounting. **Expected outcome:** Only successful inferences count against `max_logical_llm_calls`. Infrastructure retries count against `max_infrastructure_retries`. Validation code `validation.infrastructure_retry_counted_against_logical_budget` fires if logical budget is incremented on retry. **Failure signature:** Local model stuttering exhausts planning budget; Revisor aborts valid plans. Severity `error`. ### F-NOVEL-01: NoveltyAssessment math inverted **Setup:** `closest_pattern_distance = 0.9` (high distance, low similarity). Implementation computes `novelty_score = 1 - 0.9 = 0.1`. **Operation:** Novelty assessment. **Expected outcome:** `novelty_score = 0.9` (equal to distance). `similarity_score = 0.1`. Pattern is treated as highly novel, NOT highly similar. Validation code `validation.novelty_score_inverted` fires if computation is inverted. **Failure signature:** Pattern matching backwards; novel situations treated as familiar. Severity `error`. ### F-CIL-01: RevisionIntelligencePacket missing CIL authority snapshot **Setup:** A RevisionIntelligencePacket is compiled without `cil_authority_snapshot_ref`. **Operation:** Plan validation. **Expected outcome:** Plan REJECTED. Validation code `validation.revision_packet_missing_cil_authority_snapshot` fires. 
**Failure signature:** Plan compiled without authority context; standing orders silently ignored. Severity `error`. ### F-QUAL-01: QualitySignal missing actionability **Setup:** A `QualitySignal` record is written without `actionability` field set. **Operation:** Signal persistence. **Expected outcome:** Write REJECTED. Validation code `validation.quality_signal_missing_actionability` fires. **Failure signature:** Implementers guess whether a signal blocks or merely observes. Severity `error`. ### F-REPAIR-01: Source repair depth exceeds max **Setup:** RevisorConfig has `max_source_repair_depth_override = 5`. **Operation:** Config validation. **Expected outcome:** Override REJECTED at config validation (default max is 3). Validation code `validation.source_repair_depth_exceeds_max` fires. **Failure signature:** Unbounded source research loops. Severity `error`. ### F-CODING-01: step.coding received revision without safe profile **Setup:** A ModuleRevisionStep targets `step.coding` and the module's ACP profile does not have `acp_profile_revision_safe = true`. **Operation:** Plan validation. **Expected outcome:** Plan REJECTED. Validation code `validation.coding_module_received_revision_without_safe_profile` fires. **Failure signature:** Coding module receives revision dispatch without ACP safety profile; shell/filesystem access not constrained. Severity `critical`. ## §25.3 Additional behavioral fixtures (Gemini-derived) The following fixtures cover emergent-behavior bugs caught during V3.1 red teaming. They exercise multi-step or multi-actor scenarios that single-record fixtures cannot validate. ### F-SYC-01: Sycophancy-driven pattern learning (P20) **Setup:** A Revisor executes a plan that does not actually advance DOC72 goals. The Revisor's `GoalImpactAssessment` reports `goal_advanced: true` with a plausible rationale. No independent comparative-judge evaluator is invoked. No human feedback is captured. **Operation:** Pattern slice update at run completion. 
**Expected outcome:** `goal_advancement_count` is NOT incremented. The Revisor's assessment populates the UI (per §6.12) but does not enter the durable performance slice. Validation code `validation.goal_advancement_self_graded` fires if the slice is incremented. **Failure signature:** Long-term pattern delusion through self-grading. Severity `critical`. ### F-PRIV-02: Junior user clears taint for global promotion **Setup:** Junior user (`AccessTier = "matter_team_access"`) reviews a payload that arrived via `external_untrusted` source. The payload contains a prompt-injection pattern that survives review. Junior clicks "Accept and clear taint." **Operation:** Taint clearance. **Expected outcome:** Taint cleared only for `clearance_scope = "this_matter"`. Junior CANNOT clear to firm or global scope. Subsequent pattern promotion to global scope (by a supervisor) requires `supervising_attorney_review` tier on the clearance record; junior's clearance is insufficient. Validation code `validation.taint_clearance_scope_exceeds_tier` fires if junior's clearance enables global promotion. **Failure signature:** Privilege escalation via taint clearance. Severity `critical`. ### F-CASCADE-02: Multi-hop upstream failure cascade **Setup:** Outcome C depends on artifact from Module M2. Module M2 depends on artifact from Module M1. Module M1 fails with `could_not_fix` and retry exhaustion. **Operation:** Loop Controller sweep. **Expected outcome:** Outcome C (and Module M2's outcomes) transition to `upstream_failure`. The cascade is transitive. The sweep is bounded (max iterations per `LoopControllerConfig`). **Failure signature:** Partial cascade leaves some outcomes hanging. Severity `error`. ### F-DIFF-02: Regenerate-with-changelog rendering **Setup:** A `regenerate` plan completes; the module emits both new content and a `SemanticChangelog` with three entries (combine sections 2 and 3; expand precedent analysis; tighten introduction). **Operation:** UI rendering (§21.6). 
**Expected outcome:** The diff panel renders the SemanticChangelog entries ABOVE the raw text diff. Each entry shows entry_kind, description, and affected sections. The raw diff is collapsible. **Failure signature:** Reviewer sees only the unreadable delete-all/insert-all diff. Severity `warning`. ### F-HASH-02: Rolling hash chain validation **Setup:** A 3-step rolling-hash plan with `expected_pre_hash` and `produced_post_hash` declared on each step. Step 2's execution produces a hash that does not match Step 3's `expected_pre_hash`. **Operation:** Step 3 dispatch. **Expected outcome:** Step 3 BLOCKED. Plan transitions to `failed`. GraphStateRollback initiates (per §11.13). Validation code `validation.rolling_hash_chain_mismatch` fires. **Failure signature:** Chain inconsistency silently applies stale assumptions to next step. Severity `error`. ### F-SANDBOX-02: Candidate acceptance promotes findings to main graph **Setup:** A candidate evaluation produced 5 findings in candidate-scoped storage. The taint on the candidate is cleared via `user_explicit_review` (within user's access tier). The candidate transitions to `accepted`. **Operation:** Candidate acceptance. **Expected outcome:** The 5 findings are promoted to main graph. Pattern learning signals tied to them attribute normally. Acceptance receipt links the promotion to the user and clearance event. **Failure signature:** Findings remain candidate-scoped after acceptance; main graph never updates. Severity `error`. ### F-YIELD-02: Yield-back followed by replan **Setup:** Plan A yields back after 2 steps (in candidate chain C1). Revisor compiles Plan B with `source_workspace_snapshot_ref = C1`. **Operation:** Plan B execution. **Expected outcome:** Plan B builds on C1's candidate state. If Plan B completes, its final candidate (C2) supersedes C1. If Plan B is rejected, the entire candidate chain (C1+C2) is discarded; live artifacts remain at base. 
**Failure signature:** Replan loses candidate context; or accumulated candidates orphan when rejected. Severity `error`. ### F-QUAL-02: Quality program required metrics present (per [N1]) **Setup:** A V3.1 deployment is configured. The Quality Program initializes its metric registry. **Operation:** Drift manifest pass enumerates all measured components and checks for required metrics. **Expected outcome:** Every component listed in §20.1 (Revisor, Revision Compiler, Outcome Compiler, Plan Verification, Direct Fix, Eval suite, Known-bad fixtures, Sub-Agent reputation, Taint model, Pattern primitive, Feedback Pipeline) has at least one metric with an explicit denominator (per [K9]) registered. Missing metric registrations fire `validation.quality_program_required_metrics_missing` with the component name. **Failure signature:** Quality dashboard shows numbers without denominators, or some measured components have no metrics at all. Severity `error`. ### F-ASSURE-03: AssuranceBasis enum completeness (per [N2]) **Setup:** A Compiler is constructed with the V3.1 AssuranceBasis enum (14 values per §0.4). **Operation:** Compiler is asked to construct an EvaluationBinding for each AssuranceBasis value across a representative range of outcome kinds (fact_outcome, item_outcome, work_product_outcome, process_outcome, judgment_outcome, meta_outcome). **Expected outcome:** Every AssuranceBasis value is representable in at least one EvaluationBinding. The Compiler does not silently substitute alternates. Limitation values from EvaluationLimitationKind never appear as AssuranceBasis values (per P2 split). **Failure signature:** Compiler can't represent a basis value in a binding (suggesting either the value is dead code or the binding schema is incomplete); or limitation values leak back into AssuranceBasis. Severity `error`. ### F-GOAL-01: Goal impact propagation (per [N3]) **Setup:** A RevisionPlanStep carries `goal_refs: [goal_X, goal_Y]` (from DOC72 goals). 
The plan is being routed through the Compiler for HardCall surface preparation. **Operation:** Plan compilation produces HardRevisionCall with `goal_impact_assessment` per [D20] / §6.12. **Expected outcome:** Hard Call surface (per §21.4) displays the goal_impact_assessment alongside each HumanDecisionOption. The user sees how each option advances or impairs each declared goal. The goal_impact_assessment does NOT increment any durable `goal_advancement_count` (per P20); only post-run independent evaluation does. **Failure signature:** Goal context is lost between RevisionPlanStep and the HardCall UI; or goal_impact_assessment leaks into durable learning. Severity `error`. ### F-PRESERVE-01: Preservation violation fixture (per [N7]) **Setup:** A RevisionPlan with `PreservationContract.preservation_constraints = [{ constraint_kind: "preserve_factual_assertion", target_assertions: ["acquisition price = $42M"] }]`. The Revisor's plan modifies content that incidentally removes the preserved factual assertion. **Operation:** Plan execution. **Expected outcome:** The post-execution revalidation detects the preservation violation. The candidate version is rejected; the plan transitions to `failed` with `RevisionFailureEventKind = "preservation_violation"`. The user sees a Hard Call ("Revision removed protected content; user judgment required"). **Failure signature:** Protected content is silently lost; or preservation contract is detected but the plan succeeds anyway. Severity `critical`. ### F-LOOP-01: Repeated insufficiency loop breaker (per [N8]) **Setup:** RevisorConfig has `D16` loop breaker at default (3 consecutive `still_failing_same_reason` results). The Revisor produces three consecutive plans against the same outcome, each evaluated and producing `ProgressSignal = "still_failing_same_reason"`. **Operation:** Fourth plan compilation attempt. **Expected outcome:** The fourth attempt is BLOCKED. 
The Revisor escalates to Task Agent (per §6.9) or to human (per §6.6) with `escalation_reason = "repeated_insufficiency_loop_breaker"`. The escalation record cites the three prior failures.

**Failure signature:** Revisor loops indefinitely; or escalation happens but without citing prior failures (so the receiver can't see the pattern). Severity `error`.

### F-DELIVERY-01: Delivery replay blocked (per [N10])

**Setup:** A revision plan would replay an `external_message_send` step (e.g., re-sending an email to a counterparty) with `ReplayPolicy = "never_replay"`.

**Operation:** Plan validation at §11.3.

**Expected outcome:** Plan BLOCKED at deterministic linting. Validation code `validation.replay_attempted_on_never_replay_step` fires. The Revisor is forced to use one of: `compensating_action_proposal` (e.g., follow-up message explaining a correction), `human_judgment` (let the user decide whether to send another), or `graph_patch_proposal` (re-wire the task graph to avoid replay).

**Failure signature:** Duplicate external side effect (e.g., counterparty receives the same email twice with different content); or worse, irreversible action repeated. Severity `critical`.

## §25.4 Fixture execution and reporting

Fixtures run during the implementation's test suite, during conformance gate evaluation, and during the periodic Drift Manifest pass. Each fixture run emits a result record:

```ts
FixtureResult {
  fixture_id: string
  run_at: ISO8601
  implementation_version: semver
  result: "pass" | "fail" | "error"
  validation_codes_fired: string[]
  expected_codes_fired: string[]
  unexpected_codes_fired: string[]
  expected_state_transitions_observed: boolean
  notes: string
  schema_version: 1
}
```

Failure of any `critical`-severity fixture blocks production deployment of the affected subsystem. Failure of `error`-severity fixtures blocks merge to the main implementation branch. `warning`-severity fixtures surface in the test dashboard but do not block.
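
The severity gating rule above can be sketched as a small pure function. This is an illustrative sketch only: the `FixtureOutcome` and `GateDecision` shapes and the `gate` function are hypothetical, and it assumes each fixture's severity is available alongside its result. It also assumes that a result of `"error"` (the fixture could not run to a verdict) is treated like `"fail"` for gating, which the text does not state. Whether a `critical` failure additionally blocks merge is not specified, so the sketch does not assume it.

```ts
// Hypothetical shapes for illustration; severity is assumed to be looked up
// from the fixture definition and paired with the run result.
type FixtureSeverity = "critical" | "error" | "warning";

interface FixtureOutcome {
  fixture_id: string;
  result: "pass" | "fail" | "error";
  severity: FixtureSeverity;
}

interface GateDecision {
  blocksProductionDeployment: boolean; // any critical-severity fixture failed
  blocksMerge: boolean;                // any error-severity fixture failed
  dashboardOnly: string[];             // failed warning-severity fixture ids
}

function gate(outcomes: FixtureOutcome[]): GateDecision {
  // Assumption: anything other than "pass" counts as a failure for gating.
  const failed = outcomes.filter((o) => o.result !== "pass");
  const idsAt = (severity: FixtureSeverity): string[] =>
    failed.filter((o) => o.severity === severity).map((o) => o.fixture_id);
  return {
    blocksProductionDeployment: idsAt("critical").length > 0,
    blocksMerge: idsAt("error").length > 0,
    dashboardOnly: idsAt("warning"),
  };
}
```

Keeping the decision a pure function of the result records means the same rule can be applied in the test suite, the conformance gate, and the Drift Manifest pass without duplicating logic.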
## §25.5 Fixture maintenance Fixtures are versioned with this addendum. Adding a new fixture requires a V3.x amendment to this section. Modifying fixture expected outcomes requires the same. Removing fixtures is prohibited; deprecated fixtures may be marked `skipped: true` with the rationale recorded. --- # §26. OPEN QUESTIONS AND PHASE 2 DEFERRALS This section records work that V3 explicitly defers, along with the rationale and the conditions under which the work will be revisited. Items here are NOT in scope for V3; implementations must not attempt to satisfy them under the V3 surface. ## §26.1 Phase 2 deferrals ### §26.1.1 Full BDSM evaluator/revisor signal compilation **Deferred:** Full ingestion of evaluator and revisor signals as BDSM utility bundles, with cross-machine aggregation and DOC24 hot-path injection. **Phase 1 substitute:** This addendum captures signals locally (§14.8), stores them in a local signal ledger, and feeds them to the local Quality Program (§15) and Pattern primitive (§13). Local signal capture is sufficient for Phase 1 quality measurement. **Phase 2 trigger:** BDSM utility bundle spec is finalized in DOC24/DOC72; cross-machine signal aggregation infrastructure is ready. **OP-A row:** OBL-BDSM-V3-01 (track Phase 2 readiness). ### §26.1.2 DOC24 hot-path compiled guidance injection from learned patterns **Deferred:** Compiled guidance from learned patterns is injected into the DOC24 hot path (i.e., included in capability resolution and routing without requiring explicit pattern retrieval by the Compiler). **Phase 1 substitute:** The Revision Compiler retrieves patterns at compile time via DOC72's pattern API (§13). Compiled guidance is not injected into routing. **Phase 2 trigger:** DOC24 JIT pack injection model is stable; pattern compatibility checking is performant enough to run on hot paths. **OP-A row:** OBL-DOC24-V3-02. 
### §26.1.3 Generic `step.async_module_run` primitive **Deferred:** A generic asynchronous module-run primitive that allows DOC23 to dispatch any module's execution as a non-blocking task with completion callbacks. **Phase 1 substitute:** The Revision Dispatcher (§11) handles async revision dispatch as a specialized case. Generic async module execution is not supported at the DOC23 level in V3. **Phase 2 trigger:** Demand from non-revision use cases that need the same async surface. **OP-A row:** OBL-DOC23-V4-01. ### §26.1.4 Multi-user concurrent task editing **Deferred:** Concurrent multi-user editing of the same task with merge resolution. **Phase 1 substitute:** Tasks are single-user; access tiers determine view-only vs. edit. Multi-user comments are supported (see DOC15), but concurrent task graph mutation is not. **Phase 2 trigger:** Multi-user collaboration becomes a primary use case. **OP-A row:** OBL-EC-MULTI-01. ### §26.1.5 Cross-machine pattern federation **Deferred:** Sharing learned patterns across multiple ELNOR instances (e.g., one user's laptop and desktop, or multiple users at the same firm). **Phase 1 substitute:** Patterns are local to the ELNOR instance. ExportContracts (§16.3) define a manual export path for pattern data, but no automated federation. **Phase 2 trigger:** ExportContracts are stable; identifying-content scan is mature enough to handle cross-instance leakage prevention. **OP-A row:** OBL-DOC72-FED-01. ### §26.1.6 DecisionTraceRecord universal adoption (per [P1]) **Deferred:** Every upstream module emits a `DecisionTraceRecord` for every decision point that contributes to its output. **Phase 1 substitute:** Per §5.10, DecisionTraceRecord is optional in V3.1. The Evaluator and Revisor produce traces for their own decisions, but upstream modules (drafters, source-research modules, citation modules) are not required to emit traces. 
**Phase 2 trigger:** stable Evaluator/Revisor deployment; demonstrated value of traces for revision diagnosis; module performance overhead within budget. **OP-A row:** OBL-DOC23-TRACE-01. ### §26.1.7 Eval suite full domain coverage (per [P2]) **Deferred:** Full domain coverage for the eval suite — software, research, marketing, general writing, process-trace. **Phase 1 substitute:** §15.6 ships with legal_brief, legal_research_memo, contract_review, discovery_response domains plus a general_writing smoke set. These cover the highest-stakes domains for the initial user (securities litigation). **Phase 2 trigger:** the next user cohort exercises non-legal domains; sufficient fixture corpus is available for each. **OP-A row:** OBL-EVAL-DOMAIN-01. ### §26.1.8 DOC17 prompt artifact integration for evaluator/revisor templates (per [P5]) **Deferred:** Deep integration of evaluator and revisor prompt templates into DOC17 overlay library, so that Prompt Lab edits flow automatically into Evaluator/Revisor template selection. **Phase 1 substitute:** Prompt Lab is usable manually. Users can author or edit prompts in DOC17, then manually paste relevant prompts into Evaluator/Revisor configuration. The system does not automatically discover or version DOC17 prompts for these modules. **Phase 2 trigger:** DOC17 overlay library is stable; prompt versioning across docs is standardized. **OP-A row:** OBL-DOC17-PROMPT-01. ### §26.1.9 Cross-LLM red-team integration as automated process (per [P6]) **Deferred:** Automated cross-model adversarial evaluation of Evaluator/Revisor outputs (e.g., the system routes evaluator outputs through ChatGPT, Gemini, and Grok for adversarial critique without human invocation). **Phase 1 substitute:** Will manually routes Evaluator/Revisor outputs to other models for adversarial review. The system supports the workflow (export evaluator output, import critique) but does not automate the routing. 
**Phase 2 trigger:** sub-agent reputation system mature enough to weight cross-LLM outputs; standardized critique result schema across models; cost/latency profile acceptable for automated routing. **OP-A row:** OBL-XLLM-RT-01. ### §26.1.10 Per-outcome exact cost attribution (per [P8]) **Deferred:** Exact cost attribution from each LLM invocation, sub-agent call, and module activation to specific outcomes (rather than to plans). **Phase 1 substitute:** §12.5 attributes cost to plans and component invocations. Per-outcome aggregation is computed where needed but not stored per-outcome. **Phase 2 trigger:** demand from cost-conscious matter tracking; instrumentation surface in modules can produce per-outcome attribution without unacceptable overhead. **OP-A row:** OBL-COST-OUTCOME-01. ### §26.1.11 Thermal-aware scheduling (per [P11]) **Deferred:** Full thermal-aware scheduling that anticipates and prevents pressure spikes on local hardware. Includes predictive throttling, parallel sub-agent reduction in advance of pressure, and pre-emptive sequencing of high-load operations. **Phase 1 substitute:** §8.8 hardware-aware degradation monitors memory pressure and reduces parallelism reactively when pressure exceeds threshold. Thermal state is not used predictively in V3.1. **Phase 2 trigger:** thermal monitoring API mature on target hardware (M-series Macs); predictive model trained on user's workload patterns. **OP-A row:** OBL-THERMAL-PRED-01. ## §26.2 Open questions deferred to V3.x The following are architectural questions that V3 does not resolve but that may surface during implementation. Each will be addressed in a V3.x amendment when a clear decision is forced. ### §26.2.1 PolicyDecision lifecycle when artifact version changes **Question:** When an artifact moves from version v10 to v11 (e.g., via direct fix or candidate acceptance), does the PolicyDecision attached to the affected outcomes need to be re-evaluated? 
**Current behavior (V3 default):** PolicyDecision is attached to the specific artifact version. v11 inherits v10's PolicyDecision unless an explicit policy-relevant change is detected. The Revision Compiler triggers re-evaluation when the change touches a policy-sensitive field. **Open:** A more aggressive re-evaluation policy may be safer. Decision deferred until policy semantics are observed in production. ### §26.2.2 SubAgentReputation across model versions **Question:** When a sub-agent's underlying LLM model is upgraded (e.g., Gemini 2.5 → 3.0), does the sub-agent's reputation reset? **Current behavior (V3 default):** Reputation is tracked per `(sub_agent_profile, model_id)` tuple. A model change creates a new reputation series with `minimum_n_met = false`. **Open:** A migration path that preserves reputation history across compatible model upgrades may be useful. Deferred to V3.x. ### §26.2.3 Pattern obsolescence detection **Question:** When does a pattern stop being applicable? V3's PatternHealth tracks regression counts and contested findings, but no automatic obsolescence trigger exists. **Current behavior (V3 default):** Patterns are demoted to lower scopes by the promotion policy if their regression rate exceeds threshold (§13.5). They are not automatically retired. **Open:** Automatic retirement after N quarters of zero use, or after K consecutive contested findings, may be appropriate. Deferred pending Quality Program data. ### §26.2.4 Cross-matter pattern audit **Question:** Should there be a periodic audit that flags patterns whose `applicability_scope` may have widened or narrowed due to changes in the matter portfolio or domain coverage? **Current behavior (V3 default):** Patterns retain their original `applicability_scope` until manually re-evaluated. **Open:** A periodic re-inference pass (re-run identifying-content scan and applicability inference) may be useful. Deferred to V3.x. 
### §26.2.5 Hard call resolution expiration **Question:** Should HardCallResolution records have automatic expiration (e.g., 90 days), or only the compatibility-binding expiration triggered by evidence/goal/outcome changes? **Current behavior (V3 default):** No automatic time-based expiration. Compatibility binding (P13) determines reusability. **Open:** Time-based expiration may catch cases where compatibility check passes but the user's overall judgment has evolved. Deferred to V3.x with optional `expires_at` already in schema for forward-compatibility. ## §26.3 Known-but-unfixed issues The following are known imperfections in V3 that the spec acknowledges but does not currently resolve. Implementations should be aware of them. ### §26.3.1 Pattern compatibility false positives on capability rename If a module capability is renamed without `capability_version` bump (against the discipline in §9.2), pattern compatibility may match a renamed capability and produce unexpected behavior. Mitigation: §0A.6 cross-spec consumption discipline; capability rename without version bump is itself a violation. ### §26.3.2 SemanticChangelog quality depends on producer module The SemanticChangelog (P28) is produced by the regenerating module. Its quality depends on the module's compliance. A module that emits a token "changed everything" changelog technically satisfies the schema but provides no review value. Mitigation: §15.4 Plan Verification metrics include changelog quality scoring; modules with consistently low-quality changelogs receive reputation hits. ### §26.3.3 Local model stuttering can still exhaust infrastructure budget Even with the bifurcated budget (P32), pathological local model failures (e.g., infinite premature-stop loops) can exhaust the infrastructure retry budget. Mitigation: `RevisorConfig.budget_infrastructure.max_total_infrastructure_retries_per_revision` provides a per-revision cap; user is escalated when reached. 
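
The bifurcated budget referred to in §26.3.3 (and exercised by fixture F-BUDGET-02) can be sketched as two independent counters. This is a minimal sketch under stated assumptions: the `RevisionBudget` class, its method names, and the boolean "escalate" return value are hypothetical. The limit names `max_logical_llm_calls_per_revision` and `max_total_infrastructure_retries_per_revision` come from the text. The sketch assumes escalation is signalled when the retry count reaches the cap.

```ts
// Hypothetical accounting helper; only the two limit names come from the spec text.
interface BudgetLimits {
  max_logical_llm_calls_per_revision: number;
  max_total_infrastructure_retries_per_revision: number;
}

class RevisionBudget {
  private logicalCalls = 0;
  private infrastructureRetries = 0;
  private readonly limits: BudgetLimits;

  constructor(limits: BudgetLimits) {
    this.limits = limits;
  }

  // Only a successful inference consumes logical budget.
  recordSuccessfulInference(): void {
    this.logicalCalls += 1;
  }

  // A retry (malformed JSON, timeout) consumes infrastructure budget only.
  // Returns true when the per-revision cap is reached and the user should be escalated.
  recordInfrastructureRetry(): boolean {
    this.infrastructureRetries += 1;
    return (
      this.infrastructureRetries >=
      this.limits.max_total_infrastructure_retries_per_revision
    );
  }

  logicalCallsRemaining(): number {
    return this.limits.max_logical_llm_calls_per_revision - this.logicalCalls;
  }
}
```

Because the two counters never touch each other, a stuttering local model can exhaust only the infrastructure cap; the planning budget stays intact until an inference actually succeeds.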
### §26.3.4 Sub-agent advice timeliness on slow local hardware When the local hardware is under pressure, sub-agent advice may arrive too late to influence the Revision Compiler's strategy selection. Mitigation: hardware-aware degradation (§8.8) downgrades parallel sub-agents to sequential or skips advisory-only sub-agents when latency exceeds threshold. ### §26.3.5 Pattern UI density on long pattern lists With many patterns surfaced in the Adjust panel, the UI density may overwhelm the user. V3 does not specify a max display count. Mitigation: §21.8 specifies that pattern display is paginated when count exceeds 10, sorted by `(similarity_score desc, last_used desc)`. ## §26.4 Items considered but rejected The following were proposed during V3.1 red teaming and explicitly rejected for V3. They are recorded here so that future reviewers do not re-propose them without addressing the rejection rationale. - **Auto-clearing taint on user navigation:** rejected. Taint clearance is an explicit action with a receipt; navigation does not clear taint. - **Sub-agents writing directly to DOC72:** rejected. Sub-agents are advisory; durable writes go through EC. The Revisor and Compiler are the only consumers of sub-agent output, and they decide what becomes a durable record. - **Allowing `instruction_in` as a generic revision target:** rejected. The whole architecture depends on `revision_in` being the canonical revision channel; opening `instruction_in` defeats the safety model. - **Skipping HardCall when budget is exhausted:** rejected. Budget exhaustion escalates to the user; it does not authorize bypass of HardCall. - **Caching SemanticChangelog across runs:** rejected. Each regenerate produces its own changelog; caching across runs creates the same desync problem the changelog was designed to solve. - **Auto-elevating direct fix to module revision when content class boundary is crossed:** rejected. 
Class boundary crossing is a Hard Call kind; the user decides whether to upgrade. --- # §27. GLOSSARY This glossary defines terms used throughout the addendum. Where a term is defined in a referenced document (DOC11, DOC15, DOC24, DOC72, DOC73, EC), the glossary notes the canonical owner. **AccessTier** — A user's authorization level for read, write, and clearance operations. See §16.2 for the canonical enum. Owner: §16. **AssuranceBasis** — The reason a verdict is trustworthy. Distinct from `EvaluationLimitationKind` (the reason no trustworthy verdict is available). See §0.4 and §5.4. Owner: this addendum §5.4. **Candidate (artifact) version** — A draft version of an artifact created by a revision step, awaiting acceptance. See §11.11. Owner: this addendum. **CIL** — Cognitive Infrastructure Layer. DOC15's authority memory and standing-order subsystem. Consumed by this addendum at §6 and §11.3 via authority snapshots. Owner: DOC15. **CompiledEvaluationPlan** — The output of the Outcome Compiler: a structured plan that the Evaluator can execute deterministically. See §5.2. **Coordination point** — One of the four locations where advisory sub-agents may be invoked: Outcome Compiler, Evaluator, Revision Compiler, Feedback Interpreter. See §17.1. **DirectFixStep** — A revision plan step that applies a class-safe mechanical edit without dispatching to a module. See §7.5, §10. **DirectInstructionCandidate** — A candidate durable instruction proposed by the Feedback Interpreter from user feedback. See §14.7. **Dispatcher** — The Revision Dispatcher; the runtime service that validates and executes revision plans. See §11. Note: derived runtime service with a canvas read-model, NOT a stored graph module. **EC** — Execution Coordinator. The system-of-record for durable writes, policy decisions, and actor-of-record. Owner: EC documents. **EvaluationLimitationKind** — The reason a trustworthy verdict is unavailable. Distinct from AssuranceBasis. See §0.4, §5.4. 
**FeedbackInterpreter** — The service that ingests user feedback events and produces routed outputs: revision requests, durable knowledge candidates, direct instruction candidates. See §14.3. **FindingState** — The lifecycle state of an EvaluationFinding. See §0.4, §5.7. **Hard Revision Call** — A revision situation the system is not competent to resolve autonomously; surfaces to user. See §6.5, §7.9. **Identifying content scan** — A scan of pattern candidate text for party names, specific facts, and other matter-identifying detail. See §13.1. **JIT pack** — Just-In-Time capability pack, a DOC24 primitive. Consumed by this addendum via the capability registry. Owner: DOC24. **Logical budget** — The budget for successful inference calls. Distinct from infrastructure budget (retries for JSON/timeout failures). See §11.15, P32. **ModuleRevisionCapability** — A module's declared capability to receive a typed revision via its `revision_in` port. Versioned (P7). See §9.2. **ModuleRevisionStep** — A revision plan step that dispatches to a module's `revision_in` port. See §7.5. **Novelty assessment** — A measure of how dissimilar the current outcome's signature is from prior patterns. Triggers fresh reasoning above threshold (§5.2). Computed per P33 (corrected math). **OP-A** — The cross-doc obligation tracker. Each row records a primitive owed by one document to another. See §29. **OutcomeEvaluationState** — The evaluation-result state of an outcome. Distinct from `OutcomeLifecycleState`. See §0.4, §5.1. **OutcomeLifecycleState** — The existence/active state of an outcome definition. Distinct from `OutcomeEvaluationState`. See §0.4, §5.1. **Pattern** — A learned routing or strategy primitive stored in DOC72. Has `provenance` (audit) and `applicability_scope` (retrieval filter, inferred from pattern content). See §13.1. **PatternPerformanceSlice** — A context-conditioned record of a pattern's performance. See §13.3. Replaces V2's global `success_rate`. 
**PBE** — Positronic Brain Enhancement. DOC73's durable memory and receipt subsystem. Consumed by this addendum via `RevisionOperationReceipt extends PBEOperationReceiptLite`. Owner: DOC73. **PlanAssurancePolicy** — A stack of required assurance modes (deterministic lint, semantic lint, advisory verifier, forum review, human gate) that must complete before plan dispatch. See §11.4. **PlanReadSet / PlanWriteSet** — Declarations of what artifact versions a plan reads and writes; used for conflict detection. See §11.9. **PolicyDecision** — An EC-owned record of a policy evaluation. Referenced via `PolicyEvaluationRef`. Owner: EC. **Privilege class** — A classification of how privileged a piece of work product is. Determined by PropA. See §14.6. **Provenance** — The audit-trail field on a Pattern record: where the lesson came from. Distinct from `applicability_scope`. See §13.1, P39. **RepairStrategyKind** — A closed enum of revision strategies. See §0.4, §6.3. **RevisionDispatcherState** — The runtime state of the Dispatcher. Distinct from FailureEventKind (events), PlanStatus (plan definition), and UIStatus (user-facing label). See §0.4, §11.2. **RevisionFailureEventKind** — A typed failure event. Distinct from RevisionDispatcherState. See §0.4. **RevisionIntelligencePacket** — The context bundle the Revisor consumes: failure diagnosis, CIL authority snapshot, capability availability, prior receipts, applicable patterns. See §7.4. **RevisionOperationReceipt** — The receipt envelope for revision operations. Extends `PBEOperationReceiptLite`. See §11.6. **RevisionPlan** — A typed sequence of steps to revise an artifact. See §7.2. **RevisionPlanStep** — A discriminated union step in a RevisionPlan. See §7.5. **RevisionPlanStatus** — The lifecycle status of a plan record. Distinct from DispatcherState. See §0.4. **RevisionRequest** — The input to the Revision Compiler: an evaluation finding (or user feedback) that triggers compilation of a plan. See §7.1. 
**RevisionUIStatus** — The user-facing status label. Derived from (DispatcherState, FailureEventKind, PlanStatus). See §0.4, §11.2.3.

**RevisorConfig** — Per-user configuration for revisor behavior. See §6.14.

**Rolling hash mode** — An opt-in mutation mode where multi-step plans validate against predicted post-hashes rather than original snapshot. See §11.20. Default is `candidate_only`.

**Sandboxed evaluation** — Evaluation of a CandidateArtifactVersion in isolation, where taint and findings do not bleed into main `OutcomeRuntimeState`. See §11.12.

**SemanticChangelog** — A structured changelog emitted by a `regenerate` or `restructure` module to support review of large-scale changes. See §7.11.

**Source Workspace** — The canonical workspace primitive for artifacts. Single source of truth; replaces V2's ShadowWorkspace. See §12.

**Source repair max-depth** — Bound on how deep the Revisor will recurse to fix upstream source gaps. Default 1; override ≤ 3. See §6.11.

**step.evaluator** — The persisted module type for evaluators. See §0.3.

**step.revisor** — The persisted module type for the Revisor. NOT renamed to `step.revision_planner`. See §0.3.

**Sub-agent** — Advisory LLM agent invoked at coordination points. See §8, §17.

**Sufficiency Protocol** — The check that evaluation lanes (§5.5) and revision plans (§6.7) cover the outcome's scope adequately.

**Taint class** — A label on data indicating its trust level. See §0.4, §15.10.

**Taint clearance** — An explicit operation that promotes data from a more-tainted class to a less-tainted class, bound to user's access tier. See §15.12.

**TypedRevisionInstruction** — The payload schema for a `revision_in` dispatch. Includes diagnosis, target, constraints, and bounded `custom_instruction` envelope. See §7.6.

**Upstream failure** — An OutcomeEvaluationState indicating the upstream artifact will never arrive due to a failed producer module. See §5.14.
**Validation code** — A named error/warning emitted by schema validation, deterministic lint, semantic lint, or runtime assertion. See §22.

**Yield back** — A Revisor mid-plan handoff that returns control to the Revisor with the partial work accumulated in a candidate chain. See §6.7, §11.11.

---

# §28. ADJUDICATION MATRIX APPENDIX

This appendix records:

1. The full mapping from card V3.1 items to V3 sections (where each card item landed)
2. The Spec Collision Register (SC-01 through SC-28)
3. The patch landing table (mirror of Canonicalization Patch V2 §9, with V3-section anchors)
4. The validation code severity catalog

## §28.1 Card V3.1 item → V3 section landing matrix

Card V3.1 items are organized by their original letter-prefix group. Each row shows the card item, its V3 section, the patch that modified it (if any), and the status (accepted, accepted-with-modification, superseded, deferred).

### PRE-series (Preamble) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [PRE-1] Implementation Discipline Preamble | §0A | — | accepted |

### A-series (Architecture / Compilers / Sub-Agents) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [A1] Preserve Evaluator → Revisor → Plan → Dispatcher → revision_in chain | §1.1 | — | accepted |
| [A2] Rename Revision Bus → Revision Dispatcher | §0.3, §11 | — | accepted |
| [A3] Persisted module type stays `step.revisor` | §0.3 | — | accepted (supersedes V2 [A3]) |
| [A4] Outcome Compiler as first-class layer | §4 | — | accepted |
| [A4a] Revision Compiler as first-class layer | §6.1 | — | accepted |
| [A4b] CompiledRevisionStrategy as first-class artifact | §7.8 | — | accepted |
| [A4c] Revision Lanes as internal composition | §6.1 | — | accepted |
| [A4d] Sub-agents split into advisory vs execution | §8.1, §17 | — | accepted |
| [A4d.1] Shared sub-agent protocols | §17.2 | — | accepted |
| [A4d.2] User-defined custom advisory sub-agents | §8.6 | — | accepted |
| [A4e] Revision Sufficiency Protocol + 7 success conditions | §6.7 | — | accepted |
| [A4f] Hard Revision Calls | §6.5, §7.9 | — | accepted |
| [A4f.1] Hard Call detection method | §6.5 | P2 | accepted-with-modification |
| [A4g] Revisor Quality Program | §15.1 | — | accepted |
| [A4g.1] Revision Compiler Quality Program | §15.2 | — | accepted |
| [A4h] Revisor failure modes (10 explicit handlers) | §6.8 | — | accepted |
| [A4i] Revisor↔Task Agent escalation triggers | §6.9 | — | accepted |
| [A4i.1] Task Agent escalation fallback | §6.9 | — | accepted |
| [A4j] 11 core Revisor rules | §3.9, §6.15 | — | accepted |
| [A4j.1] Planner Confidence Threshold | §6.10 | — | accepted |
| [A4k] Default human gate for judgment-based outcomes | §6.6 | P11 | accepted-with-modification (AutonomousModePolicy) |
| [A4k.1] Plan config supports autonomous-mode opt-out | §6.6 | P11 | superseded by AutonomousModePolicy |
| [A4l] Novelty Detection in Compilers | §5.2, §6.1 | P33 | accepted-with-modification (math fix) |
| [A4m] Sub-Agent Registry Governance | §8.7 | — | accepted |
| [A4n] Sub-Agent Reputation Scoring (Phase 1) | §15.8 | — | accepted |
| [A4o] AdvisorySubAgentProfile schema | §8.4 | — | accepted |
| [A4p] Standardized advisory output schema | §8.5 | P10 | accepted-with-modification (union expanded) |
| [A4q] Hardware-Aware Sub-Agent Degradation | §8.8 | — | accepted |
| [A4r] RevisionDispatcherProjection | §11.1 | — | accepted |
| [A4s] RevisionRuntimeKernel as canonical §11 | §11 intro | — | accepted (V3.1 adds explicit ownership statement) |
| [A4t] Agent/procedure/capability normalization table | §0.3.5 | — | accepted (V3.1) |
| [A5] PreliminaryEvaluationPreview + ResolvedEvaluationPlan distinction | §4.2 | — | accepted |
| [A6] Evaluation lanes as internal composition | §4.3, §5.3 | — | accepted |
| [A7] Outcome Compiler Quality Program | §15.3 | — | accepted |
| [A8] Cyclic DAG rewrite | §11.5 | — | accepted |
| [A9] Comparison to known patterns | §2.3 | — | accepted |
| [A10] Cost predictability as governing principle | §3.3, §11.15 | P26, P32 | accepted-with-modification |
| [A11] Sub-Agent Leverage Rule | §3.4, §17 | — | accepted |
| [A11a] Sub-agent applicability extends to four coordination points | §17.1 | — | accepted |
| [A12] Unified module pair via per-outcome AssuranceBasis | §4.6 | — | accepted |

### B-series (UI) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [B1] Drop user-facing configuration burden | §21 | — | accepted |
| [B2] Single outcome field + optional guidance | §21.9 | — | accepted |
| [B2a] Guidance field text/files only | §21.9 | — | accepted |
| [B2b] Guidance accessible to Revisor through taint/authority labels | §7.4.2 | — | accepted |
| [B3] Drop strictness UI, keep threshold inference visible | §21.3 | — | accepted |
| [B4] Failure behavior is graph wiring | §6.8 | — | accepted |
| [B5] Outcome as unit of organization | §3.1, §21 | — | accepted |
| [B6] Evaluation Result Card with action-oriented status | §21.1 | — | accepted |
| [B7] Teach-from-feedback card | §18.1 | P16 (V2 revised) | accepted-with-modification |
| [B8] Reduced Adjust panels — Assurance Basis still exposed | §21.9 | — | accepted |
| [B9] Skip/Modify/Defer/Accept gate options | §21.5 | P15 | accepted-with-modification (GateSkippability) |
| [B10] Hard Call goal-impact display | §21.4 | — | accepted |
| [B11] Plan version diff + revert UI | §21.6 | P28 | accepted-with-modification (SemanticChangelog overlay) |
| [B12] Automatic Pattern Suggestion UI | §21.8 | P37 | accepted-with-modification (context-scoped) |

### C-series (Evaluation Schemas) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [C1] OutcomeSpec god-object split | §5.1 | P22 | accepted-with-modification |
| [C2] CompiledEvaluationPlan schema | §5.2 | — | accepted |
| [C2a] CompiledEvaluationPlanStatus enum | §0.4, §5.2 | — | accepted |
| [C3] EvaluationLanePlan schema | §5.3 | — | accepted |
| [C4] EvaluationBinding wrapper with 15-category AssuranceBasis | §5.4 | P2, P3 | accepted-with-modification (split into Basis + Limitation per P2; 14 trust + separate limitation enum) |
| [C5] Evaluation Sufficiency Protocol typed/testable | §5.5 | — | accepted |
| [C6] JudgmentLimitationRecord with blocking discriminator | §5.9 | — | accepted |
| [C7] DecisionTraceRecord — optional in V3 | §5.10 | — | accepted |
| [C8] OutcomeEvaluationResult with state machine | §5.6 | P22 | accepted-with-modification |
| [C9] EvaluationFinding lifecycle | §5.7 | — | accepted |
| [C10] VerificationRecord schema | §5.8 | — | accepted |
| [C11] method_params discriminated schemas | §5.4 | — | accepted |
| [C12] Outcome validation rules | §5.1 | — | accepted |
| [C13] Canonical schemas | §5.12 | — | accepted |
| [C14] meta.is_holistic behavioral field | — | — | rejected (per [D1] meta-outcome structurally identical; display tag only) |
| [C15] Progress signal granularity preserved | §5.5.4 | — | accepted (V3.1) |
| [C16] EvaluationTargetClosurePolicy | §5.11 | — | accepted |
| [C17] Outcome ↔ Goal link | §5.1 (goal_refs field) | — | accepted |
| [C18] Outcome naming disambiguation | §0.3.7, §5.1.1, §5.1.1.1 | — | accepted (V3.1 rename to EvaluationOutcomeDefinition; V3.2 adds canonical `criteria: Criterion[]` sub-structure per Addenda A ↔ Addenda B coordination V3 §2.4) |

### D-series (Revision Schemas) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [D1] RevisionRequest schema | §7.1 | — | accepted |
| [D2] RevisionPlan / RevisionExecutionRecord / RevisionExecutionReceipt / RevisionRunSummary | §7.2 | P9 | accepted-with-modification (structured trace) |
| [D2a] Failure taxonomy (12 kinds) | §0.4, §6.2 | — | accepted |
| [D2b] Repair strategy taxonomy (13 kinds) | §0.4, §6.3 | P7 | accepted-with-modification (versioning) |
| [D2c] Repair target taxonomy (8 kinds) | §0.4, §6.4 | — | accepted |
| [D2d] Repair ordering rules with topological sort + cycle detection | §6.11 | — | accepted |
| [D2e] RevisionDiagnosis schema | §7.3 | — | accepted |
| [D2f] RevisionIntelligencePacket — full schema | §7.4 | P34 | accepted-with-modification (CIL authority snapshot required) |
| [D3] RevisionPlanStep schema | §7.5 | — | accepted |
| [D3a] TypedRevisionInstruction enrichments | §7.6 | P1 | accepted-with-modification (discriminated union) |
| [D4] ModuleRevisionCapability schema | §9.2 | P7, P38 | accepted-with-modification |
| [D5] TypedRevisionInstruction cross-reference | §7.6 | — | accepted |
| [D6] ModuleAck status-dependent required fields | §9.4 | — | accepted |
| [D6a] RevisionExecutionReceipt lifecycle/execution split | §9.4 | P25 | accepted-with-modification |
| [D7] ModuleAck artifact versioning | §9.4 | — | accepted |
| [D8] Capability semantic constraints | §9.3 | P7 | accepted-with-modification |
| [D9] PreservationConstraintSet (renamed from RegressionMemory) | §7.7 | — | accepted |
| [D9a] PreservationContract first-class object | §7.7 | — | accepted |
| [D10] Plan template versioning correct retrieval ordering | §13.7.1 | — | accepted (V3.1 full 5-step ordering) |
| [D11] RevisorConfig schema expanded | §6.14, §6.16 | P4, P6, P11, P32 | accepted-with-modification (V3.2 adds `learning_mode` field per coordination V3 §2.10) |
| [D12] CompiledRevisionStrategy full schema | §7.8 | P9 | accepted-with-modification (structured trace) |
| [D13] RevisionSafetyEnvelope full schema | §7.7, §11.18 | — | accepted |
| [D14] HardRevisionCallKind taxonomy + HardRevisionCall schema | §0.4, §7.9 | — | accepted |
| [D15] HardCallResolutionLedger | §7.9 | P13 | accepted-with-modification (compatibility binding) |
| [D16] Repeated insufficiency loop breaker | §6.11 | — | accepted |
| [D17] RevisionOperationReceipt envelope (DOC73-style) | §11.6 | P8 | accepted-with-modification (extends PBE) |
| [D18] RevisionSideEffectPolicy | §11.18 | — | accepted |
| [D19] Source repair circular dependency rule | §6.11 | P36 | accepted-with-modification |
| [D20] Goal-impact assessment | §6.12 | P20 | accepted-with-modification (UI only) |
| [D21] Revisor Explanation Trace | §7.10 | P9 | accepted-with-modification |

### E-series (Runtime) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [E1] Revision Dispatcher lifecycle state machine | §11.2 | P23 | accepted-with-modification |
| [E1a] Deterministic plan linting (always-on) | §11.3 | P1, P34 | accepted-with-modification |
| [E1b] Semantic plan linting | §11.4 | — | superseded by [E1c] PlanAssurancePolicy |
| [E1c] PlanAssurancePolicy unified plan-verification ladder | §11.4 | P5 | accepted-with-modification (stack) |
| [E1d] Adversarial semantic linter posture | §11.5 | — | accepted |
| [E2] Bus failure semantics tied to states | §11.2 | P23 | accepted-with-modification |
| [E3] Unhandled exception synthesized ModuleAck | §9.4, §11.17 | — | accepted |
| [E3a] Failure-of-failure handlers | §11.17 | P25 | accepted-with-modification |
| [E4] Plan step idempotency keys | §11.8 | — | accepted |
| [E4a] Deterministically hashed idempotency keys | §11.8, §11.10 | — | accepted |
| [E5] RevisionStepExecutionRecord schema | §7.12 | — | accepted |
| [E6] EvaluationSnapshot | §5.16 | — | accepted (V3.1) |
| [E6a] Live-Edit/Snapshot hash check before direct_fix | §11.20 | P29 | accepted-with-modification (rolling hash mode) |
| [E7] ArtifactMutationPrecondition | §7.13 | — | accepted (V3.1) |
| [E8] PlanWriteSet for concurrent plan conflict detection | §11.9 | P21 | accepted-with-modification (added PlanReadSet) |
| [E8a] Deterministic concurrency tie-breaker | §11.9 | P24 | accepted-with-modification |
| [E9] Section-level write-scope validation | §11.10 | — | accepted |
| [E10] ShadowWorkspace | §23.1.1 | P4 | superseded by [E10a] |
| [E10a] ArtifactVersionState machine + CandidateArtifactVersion | §11.11 | P4, P17, P30, P31 | accepted-with-modification |
| [E11] pending_dependency state | §5.14 | P27 | accepted-with-modification (cascade) |
| [E12] could_not_fix / error / abort normalization | §9.4 | — | accepted |
| [E13] Final aggregation policy | §5.15 | P27 | accepted-with-modification |
| [E14] Bus parallel execution rules vs DOC23 engine | §11.22 | — | accepted |
| [E14a] Stateful multi-tick — Option A | §11.22 | — | accepted |
| [E15] Yield-back — opt-in default | §6.7 | P31 | accepted-with-modification (candidate-only) |
| [E16] Version-and-diff workflow mandatory | §11.11 | P28 | accepted-with-modification (changelog required) |
| [E16a] GraphStateRollback for revert | §11.13 | — | accepted |
| [E17] Dry-Run Mode (Phase 1) | §11.14 | — | accepted |
| [E18] Direct fix flip rule — downstream dirty | §10.5 | — | accepted |
| [E19] Cost/latency budget cap per revision cycle | §11.15 | P6, P26, P32 | accepted-with-modification |
| [E20] Local-first compute budget | §11.16 | — | accepted |

### F-series (Safety) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [F1] Adversarial input / prompt injection boundary (taint model) | §15.10 | P12 | accepted-with-modification (custom_instruction envelope) |
| [F1a] Adversarial boundary extends to Revisor input packet | §7.4.2 | — | accepted (V3.1) |
| [F1b] Transitive taint propagation + SanitizationNode | §15.11 | P30 | accepted-with-modification (sandboxed eval) |
| [F1c] Taint lifecycle and clearance | §15.12 | P19 | accepted-with-modification (tier binding) |
| [F2] Narrow direct fix to mechanical only | §10.1 | — | accepted |
| [F2a] DirectFixAllowedClass + DirectFixForbiddenClass enums | §0.4, §10.1 | P2 | accepted-with-modification |
| [F3] Direct fix tracked changes + revalidation | §10.4 | — | accepted |
| [F3a] Direct fix forbidden on judgment-based outcomes | §10.2 | P2 | accepted-with-modification |
| [F4] PolicyDecision gate before mutation | §11.19 | P3 | accepted-with-modification |
| [F5] Safety/governance as inputs to compilation and dispatch | §3.8 | — | accepted |

### G-series (Governance) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [G1] EvaluationArtifactGovernancePolicy schema | §16.1 | — | accepted |
| [G2] EC Core retention rows for new artifact kinds | §16.1, §29 | — | accepted |
| [G3] Pattern-learning promotion gates with privilege/matter/firewall fields | §16.4 | P39 | accepted-with-modification (content-based) |
| [G3a] Pattern promotion default-deny posture | §16.5 | P19, P39 | accepted-with-modification |
| [G4] Run Inspector access tiers | §16.2 | P19 | accepted-with-modification (taint clearance binding) |
| [G5] Export behavior with explicit route/read-model contracts | §16.3 | — | accepted |
| [G6] Matter-specific governance policies | §16.6 | — | accepted (V3.1) |
| [G7] EC owns policy evaluation; DOC23 consumes | §3.7, §11.19 | P3 | accepted-with-modification |

### H-series (Workspace) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [H1] Three-way ownership: RunWorkspace + SourceWorkspace + ArtifactStore | §12.1 | — | accepted |
| [H2] Source Workspace API operations | §12.3 | — | accepted |
| [H3] Workspace unavailable / conflict behavior | §12.2 | — | accepted |
| [H4] ArtifactDiff / SemanticNeighborhood extraction with taint/budget/provenance | §12.6 | — | accepted (V3.1) |
| [H4a] Semantic Paging for large documents | §12.4 | P29 | accepted-with-modification |
| [H5] Cost attribution per-component required Phase 1 | §12.5 | P26, P32 | accepted-with-modification |

### I-series (Learning) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [I1] Current-run feedback vs future-learning decision table | §14.1 | — | accepted |
| [I2] Meaningful corrections route through Revisor | §14.2 | — | accepted |
| [I3] HumanOutcomeFeedbackEvent schema | §14.2 | — | accepted |
| [I4] InterpretedOutcomeFeedback schema | §14.3 | — | accepted |
| [I5] DirectInstructionCandidate schema | §14.7 | P14, P39 | accepted-with-modification |
| [I6] Feedback Interpreter as named component | §14.3 | P10, P39 | accepted-with-modification |
| [I7] BDSM handles utility learning, not semantic interpretation | §3.6 | — | accepted (Phase 2 boundary) |
| [I8] Evaluator/Revisor signal classes | §14.8 | — | accepted |
| [I9] DOC72 receives scoped durable knowledge candidates | §14.5 | — | accepted |
| [I9a] Unified Pattern primitive with owner split + compatibility constraints | §13.1, §13.3 | P18, P39 | accepted-with-modification (provenance/applicability split; V3.2 adds `cross_model_applicability` field on Pattern and `model_class` axis on PatternPerformanceSlice.context_signature per coordination V3 §2.10) |
| [I10] DOC24 injection uses compiled guidance, not raw pattern text | §29 OBL-DOC24-V3-02 | — | accepted (Phase 2 per §26.1.2) |
| [I11] Pattern compatibility validation | §13.6 | — | accepted |
| [I12] "from memory" vs "adapted from memory" UI distinction | §21.8.1 | — | accepted (V3.1) |
| [I13] PatternPerformanceSlice with context_signature and negative outcomes | §13.3 | P18 | accepted-with-modification (no global success_rate) |
| [I14] Pattern decay / demotion (PatternHealth) | §13.5 | — | accepted |
| [I15] Novelty Detection in Compilers | §5.2 | P33 | accepted-with-modification |
| [I16] Human feedback authority classification | §14.6 | — | accepted |
| [I17] Goal-advancement signal | §13.3 | P20 | accepted-with-modification (source-bound) |

### J-series (Forum) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [J1] DOC12 Room for deliberation only | §14.9 | — | accepted |
| [J2] Plan review forum lifecycle states | §14.9.3 | — | accepted |
| [J3] Role-scoped critique authority | §14.9.1 | — | accepted (V3.1) |
| [J4] DOC12 Room amendment for plan_review_room semantics | §14.9, §29 OBL-DOC12-FORUM-01 | — | accepted |
| [J5] Plan Verification Agent | §11.4 | — | superseded by [E1c] PlanAssurancePolicy |
| [J6] Forum participant set dynamic | §14.9.2 | — | accepted (V3.1) |

### K-series (Quality Program) — complete enumeration

| Card | V3 § | Patch | Status |
|---|---|---|---|
| [K1] Outcome Evaluator Eval Suite | §15.6 | — | accepted |
| [K2] Eval suite dataset categories with domain-neutral smoke tests Phase 1 | §15.6 | — | accepted |
| [K3] Judge module as scorer inside eval suite | §15.6 | — | accepted |
| [K4] Prompt Lab / DSPy with gates — multi-doc target | §29 OBL-DOC17-PROMPT-01 | — | accepted (Phase 2 per §26.1.8) |
| [K5] Cost predictability | §3.3, §11.15 | P26, P32 | accepted-with-modification |
| [K6] Eval suite slices by pattern_id, not hardcoded work type | §15.6.2 | — | accepted (V3.1) |
| [K7] Sub-agent advice quality + reputation scoring | §15.8.3 | — | accepted (V3.1 with four named metrics) |
| [K8] Known-bad fixture classes | §15.7 | — | accepted |
| [K9] Quality program metric denominators and anti-gaming labels | §15.9 | P35 | accepted-with-modification (actionability) |
| [K10] Goal-conditioned eval slices | §15.6.3 | — | accepted (V3.1) |

### L-series (Cross-doc) — complete enumeration

| Card | V3 § | Status |
|---|---|---|
| [L1] DOC23 R3.1 amendment — split into 11 OP-A rows | §29 OBL-DOC23-INTEGRATION-01 et al. | accepted |
| [L2] DOC23 Addenda A amendments for Judge / Claim Extractor | §29 (cross-ref to Addenda A) | accepted |
| [L3] DOC23 Addenda B R0.6.4 amendments | §23 Migration Guide | accepted (this addendum supersedes R0.6.4 for evaluator/revisor) |
| [L4] DOC72 amendment — Pattern graph payload | §29 OBL-DOC72-PATTERN-01 | accepted-with-modification (P18, P39) |
| [L5] DOC24 amendment — capability/readiness ownership | §29 OBL-DOC24-CAP-01, OBL-DOC24-CAP-02 | accepted |
| [L6] DOC12 plan-review room amendment | §29 OBL-DOC12-FORUM-01 | accepted |
| [L7] DOC11 / DOC15 / EC Core obligations — split into discrete rows | §29.1, §29.2, §29.6 | accepted |
| [L8] DOC20 UI surfaces | §29 OBL-DOC20-CONTENT-01 | accepted |
| [L9] DOC25 / DOC73 evaluator source-verification — split into discrete rows | §29 (cross-doc obligations) | accepted |
| [L10] Placeholder section refs cleanup | (V3 hygiene pass) | accepted (no `§x` placeholders remain) |
| [L11] DOC72 goals integration obligations | §29 (4 OBL-DOC72-GOAL-* rows) | accepted |

### M-series (Cross-doc terminology) — complete enumeration

| Card | V3 § | Status |
|---|---|---|
| [M1] Cross-doc terminology table | §0.3.6 | accepted (V3.1) |
| [M2] DispatcherState enum (canonical) | §0.4, §11.2 | accepted |
| [M3] Versus DOC73 terminology | §0.3.4 | accepted |
| [M4] Versus DOC24 OutcomeClass | §0.3.6, §0.3.7 | accepted |

### N-series (Stress fixtures) — complete enumeration

| Card | V3 § | Status |
|---|---|---|
| [N0] Stress tests become conformance fixtures | §25 | accepted |
| [N1] Quality guarantor framing | §25 F-QUAL-02 | accepted (V3.1) |
| [N2] AssuranceBasis 15-category expansion | §25 F-ASSURE-03 | accepted (V3.1; 14-trust + limitations per P2 split) |
| [N3] DOC72 goals integration fixture | §25 F-GOAL-01 | accepted (V3.1) |
| [N4] Card V3.1 standalone | (this addendum is self-contained) | accepted |
| [N5] Concurrent plan conflict fixture | §25 F-CONCUR-02 | accepted |
| [N6] Live-edit during evaluation fixture | §25 F-HASH-02 | accepted |
| [N7] Preservation violation fixture | §25 F-PRESERVE-01 | accepted (V3.1) |
| [N8] Repeated insufficiency fixture | §25 F-LOOP-01 | accepted (V3.1) |
| [N9] Transitive taint fixture | §25 F-SANDBOX-01, F-SANDBOX-02 | accepted |
| [N10] Delivery replay fixture | §25 F-DELIVERY-01 | accepted (V3.1) |

### O-series (Superseded V2 items) — complete enumeration

| Card | V3 § | Status |
|---|---|---|
| [O1] V2 [A3] step.revision_planner | §0.3, §23.1 | superseded by [A3] |
| [O2] V2 [C4] AssuranceBasis replaces check_method | §5.4 | superseded by [C4] V3 (both retained per [C4] V3) |
| [O3] V2 "Revision Bus" | §0.3, §11 | superseded by Revision Dispatcher [A2] |
| [O4] V2 strictness slider | §21 | superseded ([B3] dropped strictness UI) |
| [O5] V2 method picker | §21 | superseded ([B1] drop user-facing config burden) |
| [O6] V2 failure-routing radio buttons | §21 | superseded ([B4] failure behavior is graph wiring) |
| [O7] V2 item extraction categories at user-facing level | §21 | superseded ([C13] canonical schemas) |
| [O8] V2 [F2] direct fix size threshold | §10.1 | superseded by [F2a] class-based eligibility |
| [O9] V2 "ShadowWorkspace" name | §23.1.1 | superseded by [E10a] CandidateArtifactVersion |
| [O10] V2 generic AsyncModuleRun amendment in DOC23 | §26.1.3 | superseded; deferred to Phase 2 |
| [O11] V2 [E1b] semantic plan linting as separate item | §11.4 | superseded by [E1c] PlanAssurancePolicy |
| [O12] V2 [J5] Plan Verification Agent as separate item | §11.4 | superseded by [E1c] PlanAssurancePolicy |

### P-series (Phase 2 deferrals) — complete enumeration

| Card | V3 § | Status |
|---|---|---|
| [P1] DecisionTraceRecord universal adoption | §26.1.6 | deferred (Phase 2) |
| [P2] Eval suite full domain coverage | §26.1.7 | deferred (Phase 2) |
| [P3] BDSM full evaluator/revisor signal ingestion | §26.1.1 | deferred (Phase 2) |
| [P4] DOC24 compiled guidance injection from learned patterns | §26.1.2 | deferred (Phase 2) |
| [P5] DOC17 prompt artifact integration | §26.1.8 | deferred (Phase 2) |
| [P6] Cross-LLM red-team integration for evaluator outputs | §26.1.9 | deferred (Phase 2) |
| [P7] Generic DOC23 AsyncModuleRun amendment | §26.1.3 | deferred (Phase 2) |
| [P8] Per-outcome exact cost attribution | §26.1.10 | deferred (Phase 2) |
| [P9] Phase 2 sub-agent reputation routing logic | §15.8 (Phase 1 surface), §26.1 (Phase 2 routing) | partially accepted (Phase 1 reputation surface); routing deferred |
| [P10] Phase 2 full reputation-based routing | §26.2.2 | deferred (Phase 2) |
| [P11] Phase 2 thermal-aware scheduling | §26.1.11 | deferred (Phase 2) |

### Q-series (Already settled) — complete enumeration

| Card | Cross-references | Status |
|---|---|---|
| [Q1] [B5] Outcome as unit of organization | §3.1, §21 | settled (per V2; accepted) |
| [Q2] [B6] Evaluation Result Card with action-oriented status | §21.1 | settled (per V2; accepted) |
| [Q3] [A9] Comparison to known patterns | §2.3 | settled (per V2; accepted) |
| [Q4] [A12] Unified module pair via per-outcome AssuranceBasis | §4.6 | settled (per V2; accepted) |
| [Q5] [J1] DOC12 Room for deliberation only | §14.9 | settled (per V2; accepted) |
| [Q6] [K5] Cost predictability | §3.3 | settled (per V2; accepted) |

### Matrix completeness summary

- Total card items in V3.1 adjudication card: **247**
- Items recorded in §28.1 matrix: **247** (100%)
- Status breakdown:
  - accepted (as written): **149**
  - accepted-with-modification (patch applied): **63**
  - superseded (V2 → V3 deprecation): **23**
  - deferred (Phase 2): **11**
  - rejected (intentionally omitted): **1** ([C14])

## §28.2 Spec Collision Register

Maintained from Canonicalization Patch V2 §7. Each entry records a resolved internal contradiction.
```
SC-01  AssuranceBasis vs EvaluationLimitationKind  Patch: P2
SC-02  policy_evaluated overload  Patch: P3
SC-03  ShadowWorkspace vs CandidateArtifactVersion  Patch: P4
SC-04  PlanAssurance selected vs stack  Patch: P5
SC-05  target_port flexibility vs revision_in contract  Patch: P1
SC-06  Global pattern success_rate vs PerformanceSlice  Patch: P18
SC-07  OutcomeRuntimeState old enum vs richer states  Patch: P22
SC-08  DispatcherState vs failure events  Patch: P23
SC-09  autonomous_mode_opt_out boolean vs policy  Patch: P11
SC-10  HardCallResolution absoluteness vs compatibility  Patch: P13
SC-11  DirectInstructionCandidate broad scope vs privilege  Patch: P14
SC-12  Skip option vs skippability  Patch: P15
SC-13  Teach-from-feedback defaults vs privilege gate (revised V2)  Patch: P16
SC-14  CandidateArtifactVersion state flip vs receipt  Patch: P17
SC-15  Sub-agent narrow output contract  Patch: P10
SC-16  RevisionOperationReceipt vs PBEOperationReceiptLite  Patch: P8
SC-17  custom_instruction string vs taint model  Patch: P12
SC-18  PlanWriteSet only vs read/write conflict reality  Patch: P21
SC-19  Concurrency tie-breaker undefined fields  Patch: P24
SC-20  GoalImpactAssessment self-grading vs learning  Patch: P20
SC-21  Taint clearance vs access tier  Patch: P19
SC-22  Cascading deadlock on pending_dependency  Patch: P27
SC-23  Multi-step plan hash precondition paradox  Patch: P29
SC-24  Tainted candidate evaluation poisoning  Patch: P30
SC-25  Yield-back atomicity break  Patch: P31
SC-26  Logical vs infrastructure budget conflation  Patch: P32
SC-27  NoveltyAssessment math direction  Patch: P33
SC-28  Pattern origin conflated with applicability scope  Patch: P39
```

Future collisions discovered during V3.x implementation are recorded here with the same format.

## §28.3 Patch landing summary

The full patch landing table is in Canonicalization Patch V2 §9. The summary in this section maps patches to V3 section anchors for quick reference; see Patch V2 for detailed change descriptions.
| Patch | V3 Section anchors | Severity |
|---|---|---|
| P1 | §7.5, §11.3 | CRITICAL |
| P2 | §5.4, §6.5, §10 | CRITICAL |
| P3 | §5.4, §11.19 | HIGH |
| P4 | §6.14, §11.11, §23.1.1 | HIGH |
| P5 | §11.4 | CRITICAL |
| P6 | §6.14, §11.15 | HIGH |
| P7 | §9.2, §9.3 | HIGH |
| P8 | §11.6 | HIGH |
| P9 | §7.10 | MEDIUM |
| P10 | §8.5 | HIGH |
| P11 | §6.6, §6.14 | HIGH |
| P12 | §7.6, §15.10 | HIGH |
| P13 | §7.9 | MEDIUM |
| P14 | §14.7 | HIGH |
| P15 | §21 | HIGH |
| P16 (V2) | §18.1 | MEDIUM |
| P17 | §11.11 | MEDIUM |
| P18 | §13.1, §13.3 | MEDIUM |
| P19 | §15.12, §16.2 | CRITICAL |
| P20 | §6.12, §13.3 | CRITICAL |
| P21 | §11.9 | HIGH |
| P22 | §5.1 | HIGH |
| P23 | §11.2 | HIGH |
| P24 | §11.9 | MEDIUM |
| P25 | §11.17, §9.4 | MEDIUM |
| P26 | §15.5, §11.15 | MEDIUM |
| P27 | §5.14, §11 | HIGH |
| P28 | §7.11, §21.6 | HIGH |
| P29 | §11.11, §11.20 | HIGH |
| P30 | §11.12, §15.11 | CRITICAL |
| P31 | §6.7, §11.11 | HIGH |
| P32 | §11.15, §6.14 | HIGH |
| P33 | §5.2, §6.1 | LOW |
| P34 | §7.4, §11.3 | HIGH |
| P35 | §15.9 | LOW |
| P36 | §6.11 | LOW |
| P37 | §21.8 | LOW |
| P38 | §9.7 | HIGH |
| P39 | §13.1, §14.3, §16.4 | HIGH |

## §28.4 Validation code severity catalog

The complete severity catalog of validation codes:

**Critical (block; cannot override without amendment):**

- `validation.plan_step_target_port_bypassed_revision_in` (P1)
- `validation.legacy_assurance_basis_contained_limitation_value` (P2)
- `validation.plan_dispatched_with_unmet_required_modes` (P5)
- `validation.degradation_removed_non_degradable_mode` (P6)
- `validation.autonomous_mode_attempted_hard_call_bypass` (P11)
- `validation.autonomous_mode_attempted_policy_bypass` (P11)
- `validation.autonomous_mode_attempted_privilege_bypass` (P11)
- `validation.autonomous_mode_attempted_side_effect_bypass` (P11)
- `validation.custom_instruction_taint_violation` (P12)
- `validation.direct_instruction_scope_exceeds_privilege` (P14)
- `validation.skip_button_rendered_on_not_skippable` (P15)
- `validation.candidate_accepted_without_receipt` (P17)
- `validation.taint_clearance_scope_exceeds_tier` (P19)
- `validation.goal_advancement_self_graded` (P20)
- `validation.pending_dependency_did_not_cascade_on_upstream_failure` (P27)
- `validation.tainted_candidate_eval_promoted_to_main` (P30)
- `validation.coding_module_received_revision_without_safe_profile` (P38)
- `validation.instruction_in_used_as_revision_target_without_capability` (P1)
- `validation.hard_call_triggered_on_assurance_basis` (P2)

**Error (block by default; manual override with receipt):**

- `validation.direct_fix_step_has_target_module_id` (P1)
- `validation.shadow_workspace_referenced_outside_deprecations` (P4)
- `validation.module_revision_capability_missing_version` (P7)
- `validation.revision_receipt_missing_pbe_field` (P8)
- `validation.hard_call_resolution_compatibility_check_skipped` (P13)
- `validation.teach_from_feedback_default_on_for_durable` (P16)
- `validation.pattern_carries_global_success_rate` (P18)
- `validation.plan_missing_read_or_write_set` (P21)
- `validation.outcome_state_uses_legacy_enum` (P22)
- `validation.failure_event_used_as_dispatcher_state` (P23)
- `validation.tie_breaker_references_undefined_field` (P24)
- `validation.workspace_failure_used_partially_completed_blanket` (P25)
- `validation.uncalibrated_estimator_auto_approved_above_threshold` (P26)
- `validation.regenerate_step_missing_semantic_changelog` (P28)
- `validation.multi_step_plan_used_live_mutation_without_optin` (P29)
- `validation.yield_back_left_live_mutations` (P31)
- `validation.infrastructure_retry_counted_against_logical_budget` (P32)
- `validation.novelty_score_inverted` (P33)
- `validation.revision_packet_missing_cil_authority_snapshot` (P34)
- `validation.quality_signal_missing_actionability` (P35)
- `validation.source_repair_depth_exceeds_max` (P36)
- `validation.pattern_origin_used_as_applicability` (P39)
- `validation.rolling_hash_chain_mismatch` (P29)

**Warning (surface in UI; do not block):**

- `validation.pattern_ui_displayed_unscoped_aggregate` (P37)
- `validation.semantic_changelog_quality_below_threshold` (§7.11)
- `validation.local_hardware_degradation_engaged` (§8.8)

**Informational (logged; not surfaced):**

- `validation.cost_estimator_recalibration_recommended` (P26)
- `validation.pattern_obsolescence_candidate` (§26.2.3)

---

# §29. CROSS-DOC OBLIGATIONS

This section enumerates the cross-document obligations created by this addendum. Each row is an OP-A ledger entry: a primitive that another document must provide for V3 to operate fully. Implementations are validated against the obligation status (per §24.3 conformance gate 5).

OP-A row schema (from DOC10/DOC72 OP-A tracker):

```ts
OPARow {
  obligation_id: string            // canonical ID, e.g., OBL-DOC72-PATTERN-01
  owner_doc: string                // doc that owes the primitive
  consumer_doc: string             // "DOC23 Addenda B V3" for all rows here
  description: string
  consumed_surface: string         // exact schema/field/route owed
  current_status:
    | "specified_in_owner"         // owner doc has stable spec
    | "consumed_in_v3"             // V3 references the surface
    | "in_review"                  // owner is drafting
    | "blocker"                    // V3 cannot proceed fully
  v3_section_consumers: string[]
  notes: string
  schema_version: 1
}
```

## §29.1 DOC11 obligations

### OBL-DOC11-ACP-01 — Revision-safe ACP profile

**Owner:** DOC11

**Consumer:** This addendum

**Description:** DOC11 must define the `acp_profile_revision_safe: boolean` flag on ACP module profiles. When set, the module is eligible to receive revision dispatch via its `revision_in` port. The profile must declare its workspace-root scope, allowed test commands, file-write gate requirements, and ACP monitoring requirements per §9.7.
**Consumed surface:** `ACPModuleProfile.acp_profile_revision_safe`, `ACPModuleProfile.allowed_workspace_roots`, `ACPModuleProfile.allowed_test_commands`, `ACPMonitoringHook`
**Status:** in_review (DOC11 R14 deltas pending)
**V3 sections:** §9.7, §38 step.coding policy
**Notes:** Without this, `step.coding` modules cannot receive revision dispatch. Default behavior is to refuse revision dispatch to coding modules until the safe profile is published.

### OBL-DOC11-MON-01 — Session monitoring hook

**Owner:** DOC11
**Description:** A monitoring hook that surfaces ACP session state to the Revision Dispatcher: session live/parked/resumed, per-session cost, output streams, error streams.
**Consumed surface:** `ACPSessionMonitor` interface; `ACPSessionState` enum
**Status:** in_review
**V3 sections:** §9.7, §11.18

## §29.2 DOC15 obligations

### OBL-DOC15-CIL-02 — Authority snapshot reference

**Owner:** DOC15
**Description:** DOC15 must provide a `cil_authority_snapshot_ref: StorageRef` primitive. Snapshots capture the active set of authority memory and standing orders at a point in time, suitable for plan validation and conflict detection.
**Consumed surface:** `CILAuthoritySnapshot`, `CILAuthoritySnapshotAPI.snapshot()`, `CILAuthoritySnapshotAPI.conflict_check(snapshot_ref, candidate_plan)`
**Status:** in_review
**V3 sections:** §7.4 RevisionIntelligencePacket, §11.3 deterministic linting

### OBL-DOC15-CIL-03 — Authority conflict resolution policy

**Owner:** DOC15
**Description:** A policy that resolves conflicts between a candidate plan step and the CIL authority memory: `block_plan | human_gate | recompile_with_authority`.
**Consumed surface:** `CILAuthorityConflictResolution` enum and dispatcher API
**Status:** in_review
**V3 sections:** §7.4, §11.3

## §29.3 DOC24 obligations

### OBL-DOC24-CAP-01 — Versioned capability registry

**Owner:** DOC24
**Description:** The capability registry must support `capability_id` and `capability_version: semver` per P7. Resolution APIs must accept version constraints and resolve to the matching capability binding.
**Consumed surface:** `CapabilityRegistry.resolve(capability_id, version_constraint)`; `CapabilityBinding` shape
**Status:** specified_in_owner
**V3 sections:** §9.2

### OBL-DOC24-CAP-02 — Capability snapshot hash

**Owner:** DOC24
**Description:** A deterministic hash of the active capability set, used by PlanReadSet to detect capability drift between plan compilation and dispatch.
**Consumed surface:** `CapabilityRegistry.snapshot_hash()`
**Status:** specified_in_owner
**V3 sections:** §11.9

### OBL-DOC24-V3-02 — Hot-path compiled guidance injection (Phase 2)

**Owner:** DOC24
**Description:** Phase 2 injection of compiled guidance from learned patterns into the routing hot path.
**Status:** deferred (Phase 2)
**V3 sections:** §26.1.2

## §29.4 DOC72 obligations

### OBL-DOC72-PATTERN-01 — Pattern payload removes success_rate, supports provenance/applicability split

**Owner:** DOC72
**Description:** DOC72's Pattern payload must:

1. REMOVE `success_rate` as a first-class field (P18)
2. ADD `performance_slices: PatternPerformanceSlice[]`
3. ADD `aggregate_display_metrics?: PatternAggregateDisplayMetrics` (display-only)
4. SPLIT scope into `provenance` (audit trail, never used for retrieval gating) and `applicability_scope` (retrieval filter, inferred from pattern content) per P39
5. ADD `identifying_content_scan` field per P39
6. Retrieval API filters patterns by `applicability_scope` matching the current work context, with the cross-matter firewall (matter-scoped patterns excluded from non-matching matters) enforced at retrieval, not at intake.
**Consumed surface:** `Pattern` schema (per §13.1); `PatternStore.retrieve(context_signature)` with cross-matter filtering
**Status:** in_review (V3 specifies the consumed shape; DOC72 V2.x must accept)
**V3 sections:** §13.1, §13.3, §13.4, §13.5

### OBL-DOC72-SLICE-01 — PatternPerformanceSlice storage

**Owner:** DOC72
**Description:** Storage for per-context-signature performance slices, including usage, success, failure, regression, override, contested_finding, rollback, cost, and goal_advancement counts with source tracking (per P20).
**Consumed surface:** `PatternPerformanceSlice` schema; `PatternStore.update_slice()`
**Status:** in_review
**V3 sections:** §13.3

### OBL-DOC72-FED-01 — Pattern federation (Phase 2)

**Owner:** DOC72
**Status:** deferred (Phase 2)
**V3 sections:** §26.1.5

## §29.5 DOC73 obligations

### OBL-DOC73-RECEIPT-01 — PBEOperationReceiptLite extension point

**Owner:** DOC73
**Description:** DOC73 must publish `PBEOperationReceiptLite` as an extensible base, with extension points for subsystem-specific receipt kinds. This addendum's `RevisionOperationReceipt extends PBEOperationReceiptLite` (P8) requires this primitive.
**Consumed surface:** `PBEOperationReceiptLite` schema; the extension contract
**Status:** specified_in_owner (DOC73 V1.3 already defines this)
**V3 sections:** §11.6

### OBL-DOC73-IDEMP-01 — Deterministic idempotency key support

**Owner:** DOC73
**Description:** PBE receipt store accepts deterministic idempotency keys and deduplicates by key without rejecting legitimate retries.
**Consumed surface:** `PBEReceiptStore.write(receipt, idempotency_key)` semantics
**Status:** specified_in_owner
**V3 sections:** §11.8, §11.10

## §29.6 EC obligations

### OBL-EC-POLICY-01 — PolicyDecision durable records

**Owner:** EC (with PropA)
**Description:** Durable PolicyDecision records with `decision: "allow" | "block" | "allow_with_human_gate"`, scope (`compilation | dispatch | mutation | export | memory_write`), and audit trail.
**Consumed surface:** `PolicyDecision` schema; `PolicyEngine.evaluate(artifact_ref, operation_kind)` API
**Status:** specified_in_owner
**V3 sections:** §5.4 PolicyEvaluationRef, §11.19 PolicyDecision gate

### OBL-EC-PRIV-03 — AccessTier to taint-clearance-scope mapping

**Owner:** EC
**Description:** EC maintains the canonical mapping from `AccessTier` to maximum `clearance_scope` for taint clearance operations per P19:

- owner_full_access → this_run, this_matter
- matter_team_access → this_matter
- supervising_attorney_review → firm
- firm_admin, architect_admin → global
- audit_log_only, no_access → cannot clear taint

The mapping is consulted at the taint clearance write boundary; attempts to clear above the user's tier are rejected.
**Consumed surface:** `AccessTierClearanceMap`; `EC.check_clearance_authorization(user_ref, clearance_scope)` API
**Status:** in_review
**V3 sections:** §15.12, §16.2

### OBL-EC-ACTOR-01 — Actor of record for receipts

**Owner:** EC
**Description:** Every RevisionOperationReceipt carries an `actor` field referencing an EC-managed actor record (user, system, agent, module, migration).
**Consumed surface:** `Actor` schema; actor resolution at receipt write time
**Status:** specified_in_owner
**V3 sections:** §11.6

### OBL-EC-MULTI-01 — Multi-user concurrent task editing (Phase 2)

**Owner:** EC
**Status:** deferred (Phase 2)
**V3 sections:** §26.1.4

## §29.7 BDSM (Phase 2)

### OBL-BDSM-V3-01 — Signal compilation into utility bundles

**Owner:** BDSM
**Status:** deferred (Phase 2)
**Description:** BDSM consumes the local signal ledger this addendum populates (§14.8) and compiles signals into utility bundles for DOC24 hot-path injection. Phase 1 captures signals locally; Phase 2 enables BDSM consumption.
**V3 sections:** §3.6, §26.1.1

## §29.8 DOC20 obligations

### OBL-DOC20-CONTENT-01 — Unified Content Map entries

**Owner:** DOC20
**Description:** DOC20's Unified Content Map registers the content types this addendum introduces: `CompiledEvaluationPlan`, `RevisionPlan`, `RevisionOperationReceipt`, `CandidateArtifactVersion`, `Pattern`, `PatternPerformanceSlice`, `HumanOutcomeFeedbackEvent`, `DirectInstructionCandidate`, `HardCallResolution`, `SemanticChangelog`.
**Consumed surface:** Content map registration; viewer routing
**Status:** in_review
**V3 sections:** §21 UI surfaces

## §29.9 DOC12 obligations

### OBL-DOC12-FORUM-01 — Plan Review Forum room kind

**Owner:** DOC12
**Description:** A canonical room kind for plan review forums per §14.9, with declared participants (Revisor, Plan Verifier sub-agent, user), output schema (forum decision), and audit trail.
**Consumed surface:** `RoomKind.plan_review` registration; forum lifecycle hooks
**Status:** in_review
**V3 sections:** §14.9

## §29.10 DOC10 obligations

### OBL-DOC10-LEDGER-01 — OP-A row registration

**Owner:** DOC10
**Description:** DOC10's OP-A ledger registers all rows above. Each row is checked during V3 conformance gate evaluation.
**Status:** specified_in_owner
**V3 sections:** §24.3, §29 (this section)

## §29.11 DOC23 R3.1 (parent doc)

### OBL-DOC23-INTEGRATION-01 — step.evaluator and step.revisor module types

**Owner:** DOC23 R3.1 (parent)
**Description:** Confirm the canonical module types `step.evaluator` and `step.revisor` are registered in the module type registry. Their internal runtime services (Outcome Compiler, Evaluator, Revision Compiler, Revision Execution Service, Revision Dispatcher) are specified in this addendum and do not appear as separate module types.
**Consumed surface:** Module type registry
**Status:** specified_in_owner
**V3 sections:** §0.3

### OBL-DOC23-SECURITY-01 — TaskSecurityPolicy bridge

**Owner:** DOC23 R3.1
**Description:** RevisionSideEffectPolicy (§11.18) bridges to DOC23's TaskSecurityPolicy. No external side effect plan passes V3 validation unless it also passes the relevant DOC23 TaskSecurityPolicy check.
**Consumed surface:** `TaskSecurityPolicyCheck` API
**Status:** specified_in_owner
**V3 sections:** §11.18

## §29.12 Obligation status summary

| Status | Count | Notes |
|---|---|---|
| specified_in_owner | 9 | Owner doc has stable spec; V3 consumes as-is |
| consumed_in_v3 | (all rows above are consumed by V3) | — |
| in_review | 9 | Owner doc draft in progress; V3 specifies the expected shape |
| deferred (Phase 2) | 4 | Out of scope for V3 |

No obligations are at status `blocker` for V3 production deployment. Production deployment requires the `in_review` items to land or, where they have not landed, the V3 implementation degrades the affected subsystem per §24.3 conformance gate 5.

## §29.13 V3.2 Addenda A ↔ Addenda B coordination obligations

The following obligations are added in V3.2 from the Addenda A ↔ Addenda B coordination V3 FINAL proposal. Full row text lives in the Addenda B response §9.1 ("DOC23_ADDB_RESPONSE_TO_ADDA_R4_1_V3_COORDINATION_PROPOSAL_V1.md"); summarized here for the V3.2 record. Status is current as of V3.2 freeze (2026-05-17).
| OP-A row | Owner | Consumer | V3.2 surface | Status |
|---|---|---|---|---|
| OBL-XDOC-EVAL-ENV-01 | DOC23 Evaluation Common Contracts doc | Addenda A R4.1 V3, Addenda B Core R0.7.1 | EvaluationResultEnvelope referenced by V3.2; full schema in Common Contracts | specified_in_owner (Common Contracts doc pending creation) |
| OBL-XDOC-MODULES-REGISTRY-01 | DOC23 R3.2 (target) | All addenda | step.evaluator + step.revisor module types persist; step.judge + step.claim_extractor are Addenda A | pending_R3_2_compile |
| OBL-XDOC-SCOPE-PRIMITIVES-01 | DOC23 Evaluation Common Contracts doc | Addenda A, Addenda B Core R0.7.1, PropA | ArtifactScopeRef, TextAnchor, StructuredAnchor referenced; lives in Common Contracts | specified_in_owner |
| OBL-XDOC-OUTCOME-COMPLIANCE-01 | Addenda A R4.1 V3 | Addenda B (Evaluator via Pattern C) | §5.1.1.1 Criterion is the public sub-contract Judge consumes | in_review |
| OBL-XDOC-PROMPT-COMPARISON-SIGNAL-01 | Addenda A R4.1 V3 | DOC8/BDSM | Signal envelope referenced; this addendum doesn't emit | in_review |
| OBL-XDOC-CLAIM-EXTRACTOR-PUBLIC-01 | Addenda A R4.1 V3 | Addenda B (Evaluator dispatches) | §5.17 claims_in port specifies the consumer contract | in_review |
| OBL-XDOC-EVALUATOR-CLAIMS-IN-01 | Addenda B Core R0.7.1 (consumes from this addendum) | Addenda A Claim Extractor wiring | §5.17 claims_in port contract (V3.2 NEW) | specified_in_owner |
| OBL-XDOC-EVAL-SIGNAL-OWNERSHIP-01 | Addenda B Core R0.7.1 | DOC8/BDSM | Signal definitions live in Core R0.7.1; this addendum specifies the surface that produces signals | specified_in_owner (Core R0.7.1 pending) |
| OBL-XDOC-LEARNING-MODE-01 | Addenda B Core R0.7.1 (consumes from this addendum) | EC Core | §6.16 learning_mode (V3.2 NEW) | specified_in_owner |
| OBL-XDOC-MODEL-CLASS-AXIS-01 | DOC72 | Addenda B Core R0.7.1, Addenda A R4.1 | §13.1 cross_model_applicability + §13.3 model_class axis (V3.2 NEW) | pending_DOC72_update |
| OBL-XDOC-BDSM-CONSUME-SIGNALS-01 | DOC8/BDSM | Pattern primitive (DOC72), Task Agent | This addendum specifies the eight Phase 1 signal envelopes consumed by BDSM | pending_DOC8_update |
| OBL-XDOC-EC-POLICY-SIGNALS-01 | EC Core | All signal emitters | Envelope governance fields gate at EC policy engine | pending_EC_Core_update |
| OBL-XDOC-PROPA-DSPY-TARGETS-01 | PropA R6.3+ | Addenda A R4.1, Addenda B Core R0.7.1 | claim_extractor_main + outcome_evaluator_main + revision_compiler_main + outcome_compiler_main DSPy targets | pending_PropA_update |
| OBL-XDOC-DOC20-EVAL-UI-01 | DOC20 | User UI | Shared envelope rendering, Pattern C ad-hoc Judge attachment, learning_mode toggle, model_class pattern context, graph-edit warnings for implicit auto-revision chains and route_all_variants wiring | pending_DOC20_update |
| OBL-XDOC-JUDGE-EVALUATOR-OUTPUT-IN-01 (V3.3 NEW) | Addenda A R4.1 V3+ | This addendum (Pattern C wiring crystallization) | Judge module specifies input port consuming `EvaluationArtifactEnvelope` from `step.evaluator.evaluation_result_out` (V3.3 §5.18). Port name at Addenda A's discretion (e.g., `evaluation_result_in`, `evaluator_output_in`, `upstream_evaluation_in`); the contract is the consumed payload type, not the port name. Pattern C wiring uses this port. Hidden-dispatch prohibition (parallel to V3.2 §5.17.5): Judge consumes via graph wiring; the Evaluator MUST NOT hidden-dispatch Judge. | in_review |

**Coordination V3 spec-anchor sentence (per coordination V3 §3.1, normative in this addendum):**

> *Auto-revision is a property of the Revisor's `AutonomousModePolicy` (§6.6), not the Experiment surface. If a user wires Revisor downstream of an Experiment's variant output, the Revisor's policy determines whether revision proceeds autonomously. Experiments do not introduce auto-revision policy of their own.*

This sentence forecloses re-litigation of where auto-revision authority lives.
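
As an illustration of how the OP-A rows in §29 might be checked mechanically, the following TypeScript sketch types the row schema, tallies rows by status, and lists the V3 sections that would fall under the degradation rule in §29.12. It is a minimal sketch under stated assumptions, not the DOC10 ledger implementation. The helper names (`tallyByStatus`, `sectionsNeedingDegradation`, `hasBlocker`) are hypothetical, the `deferred` status is added because §29 uses it for Phase 2 rows although the row schema enum does not list it, and the three sample rows abbreviate their descriptions.

```typescript
// Statuses from the OPARow schema in §29, plus "deferred" (assumption: §29
// uses it for Phase 2 rows, but the schema enum does not list it).
type OPAStatus =
  | "specified_in_owner"
  | "consumed_in_v3"
  | "in_review"
  | "blocker"
  | "deferred";

interface OPARow {
  obligation_id: string;
  owner_doc: string;
  consumer_doc: string;
  description: string;
  consumed_surface: string;
  current_status: OPAStatus;
  v3_section_consumers: string[];
  notes: string;
  schema_version: 1;
}

// Hypothetical helper: count rows per status.
function tallyByStatus(rows: OPARow[]): Record<OPAStatus, number> {
  const tally: Record<OPAStatus, number> = {
    specified_in_owner: 0,
    consumed_in_v3: 0,
    in_review: 0,
    blocker: 0,
    deferred: 0,
  };
  for (const row of rows) {
    tally[row.current_status] += 1;
  }
  return tally;
}

// Hypothetical helper: V3 sections consuming a primitive that is still
// in_review, i.e. candidates for degradation per §24.3 conformance gate 5.
function sectionsNeedingDegradation(rows: OPARow[]): string[] {
  const sections: string[] = [];
  for (const row of rows) {
    if (row.current_status !== "in_review") continue;
    for (const section of row.v3_section_consumers) {
      if (sections.indexOf(section) === -1) sections.push(section);
    }
  }
  return sections;
}

// Hypothetical helper: true if any row is at status "blocker".
function hasBlocker(rows: OPARow[]): boolean {
  return rows.some((row) => row.current_status === "blocker");
}

// Three rows taken from §29.3, §29.6 and §29.4 (descriptions abbreviated).
const sampleRows: OPARow[] = [
  {
    obligation_id: "OBL-DOC24-CAP-01",
    owner_doc: "DOC24",
    consumer_doc: "DOC23 Addenda B V3",
    description: "Versioned capability registry",
    consumed_surface: "CapabilityRegistry.resolve(capability_id, version_constraint)",
    current_status: "specified_in_owner",
    v3_section_consumers: ["§9.2"],
    notes: "",
    schema_version: 1,
  },
  {
    obligation_id: "OBL-EC-PRIV-03",
    owner_doc: "EC",
    consumer_doc: "DOC23 Addenda B V3",
    description: "AccessTier to taint-clearance-scope mapping",
    consumed_surface: "AccessTierClearanceMap",
    current_status: "in_review",
    v3_section_consumers: ["§15.12", "§16.2"],
    notes: "",
    schema_version: 1,
  },
  {
    obligation_id: "OBL-DOC72-FED-01",
    owner_doc: "DOC72",
    consumer_doc: "DOC23 Addenda B V3",
    description: "Pattern federation (Phase 2)",
    consumed_surface: "",
    current_status: "deferred",
    v3_section_consumers: ["§26.1.5"],
    notes: "",
    schema_version: 1,
  },
];

console.log(tallyByStatus(sampleRows));
console.log(sectionsNeedingDegradation(sampleRows));
console.log(hasBlocker(sampleRows));
```

On these three sample rows the tally is one row each for `specified_in_owner`, `in_review`, and `deferred`, and only §15.12 and §16.2 are flagged. Running the same check against the §29.13 table would first require extending the status type with the `pending_*` values used there.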
---

# END OF V3.3 SPECIFICATION

**Document:** DOC23 Addenda B Addendum — Outcome Evaluator/Revisor Subsystem V3.3
**Status:** Build-ready surgical patch over V3.2 crystallizing Pattern C wiring at port level
**Date:** 2026-05-17
**Total sections:** 29
**Total patches incorporated:** 39 (per Canonicalization Patch V2) + V3.1 audit fixes + V3.2 coordination patch + V3.3 Pattern C wiring crystallization
**Total cross-doc obligations:** 22 (V3.1) + 14 (V3.2 coordination) + 1 (V3.3 Pattern C) = 37
**Self-contained:** Yes; suitable for cross-LLM red teaming and AI implementation agent consumption

Next steps:

1. DOC23 Evaluation Common Contracts V1.1.1 §3.7 documents Pattern C envelope consumption semantics (sibling change to this V3.3 patch).
2. OP-A V3.11 records OBL-XDOC-JUDGE-EVALUATOR-OUTPUT-IN-01 (sibling change to this V3.3 patch).
3. Addenda A R4.1 V3 next revision specifies Judge's `evaluation_result_in` (or chosen name) input port surface.

V3.2 may be retired once V3.3 is reviewed.

---