ELNOR REPO READER TEXT MIRROR
Original path: Current Specs/DOC23/DOC23 Addenda B/DOC23_ADDENDA_B_CORE_R0_7_1.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z

---

# DOC23 Addenda B Core — Task Intelligence, Memory, Observability, Natural Task Invocation, and Self-Improving Task Design

**Revision:** R0.7.1 Core — clean replacement for R0.7; reference/topology cleanup only; no substantive architecture changes

**Date:** 2026-05-17

**Status:** Build-ready Core specification. R0.7.1 is the family-core for Addenda B's Task Design domain. Outcome Evaluator/Revisor lives in the V3.3.1 sub-addendum. Source Workspace, Task Forum + Run Board, and Feedback Delivery live in their own sub-addenda. Shared evaluation primitives live in DOC23 Evaluation Common Contracts (sibling to the parent doc).

**Target parent:** DOC23 — ELNOR Task System: Modular Automation Architecture R3.1 (R3.2 absorbs shared primitives later)

**Companion docs:** DOC23 Addenda B / Outcome Evaluator+Revisor V3.3.1; DOC23 Evaluation Common Contracts V1.1.1; DOC23 Addenda A R4.1 V3 (with V4.1 Coordination Patch / V5 Mini-Card from coordination V3 FINAL); DOC23 Addenda B / Source Workspace V1.0.1; DOC23 Addenda B / Task Forum + Run Board V1.0.1; DOC23 Addenda B / Feedback Delivery V1.0.1; DOC23 R3.1; DOC24 R3; DOC72 R5.73; DOC25 V2.0; DOC73 V1.5.1; DOC8 v1.11.4; DOC24 Addendum A BDSM v6.4; DOC3 R11.3 Addenda A R2.2; DOC17 R4.2; DOC11 R15 Amendment Proposal R2; OpenClaw Release Alignment Running Spec Update List R2; EC Core Addendum A V3.3; PropA R6.3+; OP-A V3.7+.

**Purpose:** Define the Task Agent, natural saved-task invocation, Task Blueprint, template/preset intelligence, Task Agent education/TKP, task-design self-learning, DOC72 task-memory integration, telemetry spine, artifact/output intake, DOC24 capability/runtime routing, OpenClaw/DOC11 runtime alignment, unified prompt improvement, prompt-evaluation tasks, task UI/telemetry surfaces, module continuation/rerun mechanics, shared-task forward-compatibility, and all cross-doc seams required to make DOC23 a reliable, inspectable, self-improving saved-task system. R0.7.1 also documents the family topology (Core + V3.3.1 + three sub-addenda + Common Contracts) and integrates the Addenda A ↔ Addenda B coordination V3 FINAL converged architecture.

**R0.7.1 changes from R0.7 (reference/topology cleanup only):** R0.7.1 is a full replacement copy of R0.7. It makes no substantive architecture, schema, route, storage, or runtime changes. It only clarifies the current Addenda B family topology after sibling documents landed: Outcome Evaluator/Revisor V3.3.1, Common Contracts V1.1.1, Source Workspace V1.0.1, Task Forum + Run Board V1.0.1, and Feedback Delivery V1.0.1. It also marks R0.6.4 as superseded/provenance-only.

**R0.7 changes from R0.6.4 (substantive change log):** R0.7 supersedes R0.6.4 with four classes of changes:

**Class 1: Family-topology reorganization (per Will's earlier organizational decision).** R0.6.4 was the singular Addenda B document carrying all task-design concerns. R0.7 retains the Task Design domain core but **does NOT carry Outcome Evaluator/Revisor, Source Workspace, Task Forum + Run Board, or Feedback Delivery content** — those live in their own sub-addenda. Section additions in this Core reflect the family-topology context: §0A (R0.7 supersession statement and topology), updated §0.4 (scope summary excludes sub-addenda content), updated §1 (executive summary scoped to Core).

**Class 2: R0.6.5 base material absorption.** R0.6.5 proposal §§1-4 (governing principles, complexity profiles, what this adds, object taxonomy) absorbed into this Core's existing §2 governing principles and §3A terminology/object taxonomy. R0.6.5 §§5-7 (OutcomeSpec, Outcome Evaluator, States) superseded by V3.3.1 — referenced, not duplicated. R0.6.5 §§8-15, §§16-18, §§19-22 distributed to Feedback Delivery V1.0.1, Source Workspace V1.0.1, and Task Forum + Run Board V1.0.1 respectively. R0.6.5 retires when sub-addenda land.

**Class 3: Addenda A ↔ Addenda B coordination V3 FINAL integration.** The locked V3 FINAL coordination architecture is integrated:

- §9 Task Design Learning extended with the eight Phase 1 learning signals wrapped in `EvaluationLearningSignalEnvelope` (per coordination V3 §2.7, §2.11)
- §13/§13A DOC24 capability binding extended with model_class context for cheap-LLM learning mode
- §17 series Prompt evaluation updated to consume the shared `EvaluationResultEnvelope`, reference DSPy targets coordinated with PropA, and integrate Pattern C ad-hoc Judge attachment (per coordination V3 §2.9, §2.10)
- §24 cross-doc obligations extended with the V3 FINAL coordination OP-A rows (14 new rows; full text in §24B)
- New §0B "Shared evaluation infrastructure" cross-references DOC23 Evaluation Common Contracts schemas and V3.3.1 module surfaces

**Class 4: Resolved open questions.** Several R0.6.4 §26 open questions are resolved by the V3 FINAL coordination outputs and noted in §26.

**Migration note:** R0.6.4 is superseded and should be treated as provenance/source-only. Together, R0.7 Core, Outcome Evaluator+Revisor V3.3.1, Common Contracts V1.1.1, Source Workspace V1.0.1, Task Forum + Run Board V1.0.1, and Feedback Delivery V1.0.1 cover the operative Addenda B family. Cross-references from other addenda to R0.6.4 sections that moved should update during their next revisions.

**Source merge:** R0.7 uses R0.6.4 (this filename's predecessor) as the base.
R0.6.5 proposal base material (§§1-4) merged into §2 and §3A. Domain-specific R0.6.5 content distributed to sub-addenda.

---

## §0A — R0.7 supersession statement and family topology

This Core supersedes R0.6.4 and reorganizes Addenda B into a family of focused specifications. R0.7.1 is the entry point.

### §0A.1 Family topology

```
DOC23 (parent)
├── DOC23 R3.1 — Task System Modular Architecture (operative parent)
├── DOC23 R3.2 — (later) absorbs Evaluation Common Contracts
│
├── DOC23 Addenda A R4.1 V3+ — Judge, Experiment, Claim Extractor
│
├── DOC23 Evaluation Common Contracts V1.1.1 — sibling doc; shared primitives
│     (EvaluationResultEnvelope, slice schemas, EvaluationLearningSignalEnvelope,
│     Criterion, ArtifactScopeRef, TextAnchor, StructuredAnchor,
│     VariantEvaluationLineage, CriterionLineage)
│     Retires when DOC23 R3.2 absorbs
│
└── DOC23 Addenda B family
    ├── DOC23 Addenda B Core R0.7.1 (THIS DOCUMENT)
    │     — Task Agent, Blueprint, Templates, Task Design Learning,
    │       Task Activity Memory, DOC24 binding, Telemetry, Task Assessment,
    │       Prompt improvement, baseline UI/routes/storage/validation
    │
    ├── DOC23 Addenda B / Outcome Evaluator+Revisor V3.3.1 — DONE
    │     — Outcome Compiler, Evaluator, Revisor, Dispatcher,
    │       Candidate versions, Taint, Patterns, Quality, Governance,
    │       claims_in port, learning_mode, model_class
    │
    ├── DOC23 Addenda B / Source Workspace V1.0.1 — ACTIVE
    │     — Workspace object, Source Research module, ingestion, UI
    │
    ├── DOC23 Addenda B / Task Forum + Run Board V1.0.1 — ACTIVE
    │     — Forum object, Board posts, Module Assistance Requests, UI
    │
    └── DOC23 Addenda B / Feedback Delivery V1.0.1 — ACTIVE
          — Evaluation Feedback Bundle, Run Guidance, Repair Instructions,
            Feedback Routing Policy, four delivery channels, Consumption Receipts
```

### §0A.2 Why this organization

R0.6.4 was the singular Addenda B carrying everything.
As the Addenda B subsystems matured (especially Outcome Evaluator/Revisor → V3.1 → V3.3), single-document architecture became impractical: 9,570+ lines is unwieldy for fresh-window red-team review, for cross-LLM evaluation, and for focused implementation. The family topology splits the subsystems into focused documents:

- **Each sub-addendum is independently red-teamable** in a fresh chat window
- **Each has its own version trajectory** — V3.3.1 can iterate without forcing Core R0.7.1 rev
- **Shared primitives live at the family-shared level** (Common Contracts) — no buried-in-one-addendum cross-references
- **The Core stays canonical for cross-cutting concerns** that don't belong in any sub-addendum

### §0A.3 What Core R0.7.1 owns

This Core R0.7.1 owns the **Task Design domain** — everything about designing, instantiating, learning from, and improving tasks. Specifically:

- Task Agent (system agent for natural task invocation)
- Task Blueprint (the persisted task design primitive)
- Templates and presets (reuse-first task creation)
- Task Design Intelligence substrate (the data the Task Agent reasons over)
- Task Design Learning (how the system learns better task designs over time)
- DOC72 Task Activity Memory integration
- DOC24 capability binding and runtime routing
- OpenClaw / DOC11 alignment
- Telemetry and observability
- Task Assessment (pre-execution)
- Artifact and output intake policy
- Prompt improvement and prompt evaluation
- Baseline UI surfaces (with task-detail UI living here; per-subsystem UI moves to sub-addenda)
- Routes, commands, read models, SSE
- Storage paths
- Validation codes
- Cross-doc obligations (Core-owned rows here; sub-addendum-owned rows there)
- Privacy, retention, degradation modes
- Open questions for architect review

### §0A.4 What Core R0.7.1 does NOT own

- Outcome evaluation and revision — V3.3
- Source workspace + source research module — Source Workspace V1.0.1 (when created)
- Task Forum, Run Board, Module Assistance
Requests — Task Forum + Run Board V1.0.1 (when created)
- In-run feedback delivery (Evaluation Feedback Bundle, Run Guidance, Repair Instructions, Routing Policy, Delivery Channels, Consumption Receipts) — Feedback Delivery V1.0.1 (when created)
- Shared evaluation envelope, slices, signal envelope, anchoring primitives, Criterion — DOC23 Evaluation Common Contracts V1.1.1

References to these surfaces in this Core go through the named owning document. Implementations consult the owning document for the canonical schemas.

### §0A.5 Reading order recommendation

For new readers of Addenda B:

1. This Core R0.7.1 (§0–§28) — for the task design system substrate
2. DOC23 Evaluation Common Contracts V1.1.1 — for the shared evaluation primitives the Core references
3. V3.3.1 Outcome Evaluator+Revisor — for the deep outcome evaluation subsystem
4. Sub-addenda (Source Workspace, Forum, Feedback Delivery) — for their respective subsystems when ready

For implementers, all five docs are required reading to build the full system. For red-team review of a specific subsystem, that subsystem's document is largely self-contained.

---

## §0B — Shared evaluation infrastructure integration

Cross-references to the shared evaluation primitives used throughout this Core.

### §0B.1 EvaluationResultEnvelope (Common Contracts §3)

Every evaluation result emitted by an Evaluator (V3.3), a Judge (Addenda A), an Agent Review Gate, a Deterministic Scorer, or a Human Review record is wrapped in `EvaluationResultEnvelope` (inside Addenda A's `EvaluationArtifactEnvelope`). This Core's prompt evaluation surfaces (§17 series) consume the envelope; the Task Agent reads the envelope to inform task-design suggestions.

### §0B.2 EvaluationLearningSignalEnvelope (Common Contracts §5)

Every learning signal emitted anywhere in the system is wrapped in `EvaluationLearningSignalEnvelope`.
This Core owns the emitters for five of the eight Phase 1 signal types (per coordination V3 §2.11 and §9 below):

- OutcomeEvaluationSignal — emitted by V3.3 Evaluator; envelope structure here
- RepairCycleSignal — emitted by V3.3.1 Revisor
- TaskProcessGapSignal (runtime) — emitted by Task Agent or V3.3.1 Revisor
- TaintClearanceSignal — emitted by V3.3.1 Revisor or by user-action surfaces
- HardCallResolutionSignal — emitted by V3.3.1 Revisor when Hard Calls resolve

DOC8/BDSM consumes the unified signal stream. Addenda A owns PromptComparisonSignal. R0.6.4 §24A.7 UserActionSignal continues to exist (now wrapped in the common envelope). DOC8/BDSM emits TaskDesignCorrelationSignal (aggregate; cross-doc obligation per OBL-XDOC-BDSM-CONSUME-SIGNALS-01).

### §0B.3 Criterion (Common Contracts §6)

`Criterion` is the public sub-contract on `EvaluationOutcomeDefinition.criteria[]`. The Task Agent reads criteria when assessing whether a saved task's outcome definitions are well-formed; the Outcome Compiler (V3.3) reads them when compiling evaluation plans; Judge (Addenda A) reads them for `outcome_compliance_scoring`. This Core does not redefine the schema.

### §0B.4 Anchoring primitives (Common Contracts §7)

`ArtifactScopeRef`, `TextAnchor`, `StructuredAnchor` are shared between Addenda A's Claim Extractor (`step.claim_extractor`) and this Core's evaluation surfaces. When the Task Agent surfaces "your saved task X failed criterion Y in section Z" suggestions, the section reference uses `ArtifactScopeRef`.

### §0B.5 Cheap-LLM learning mode integration

Per coordination V3 §2.10 and V3.3 §6.16, `RevisorConfig.learning_mode` enables cheap-LLM signal generation and cross-model calibration. This Core's Task Agent surfaces (§4) consume learning_mode context: pattern surfacing respects the mode the user is currently running in, and the Task Agent's suggestions distinguish patterns learned at the current `model_class` from patterns requiring validation.
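
One way to picture the `model_class` distinction in §0B.5 is a partition of learned patterns by the model class that produced them. The sketch below is illustrative only: `LearnedPattern`, its fields, `PatternSurfacing`, and `partitionByModelClass` are hypothetical names introduced here, not schemas defined by this Core or by V3.3.1.

```typescript
// Hypothetical shape; the canonical pattern schema lives in the owning document.
interface LearnedPattern {
  pattern_id: string;
  // Assumption: each pattern records the model_class it was learned at.
  learned_at_model_class: string;
}

// Hypothetical result shape for what the Task Agent would surface.
interface PatternSurfacing {
  learned_at_current_class: LearnedPattern[];
  requires_validation: LearnedPattern[];
}

// Split patterns into those learned at the current model_class and those
// learned elsewhere, which a suggestion would flag as needing validation.
function partitionByModelClass(
  patterns: LearnedPattern[],
  currentModelClass: string,
): PatternSurfacing {
  const result: PatternSurfacing = {
    learned_at_current_class: [],
    requires_validation: [],
  };
  for (const pattern of patterns) {
    if (pattern.learned_at_model_class === currentModelClass) {
      result.learned_at_current_class.push(pattern);
    } else {
      result.requires_validation.push(pattern);
    }
  }
  return result;
}
```

Keeping the two groups separate, rather than filtering one out, matches the text's wording that suggestions distinguish the groups; whether cross-class patterns are shown with a caveat or hidden is left to the owning surfaces.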
### §0B.6 Pattern C ad-hoc Judge attachment

Per coordination V3 §2.9, Pattern C allows Judge to attach downstream of any Evaluator output to generate per-criterion numeric scores without an Experiment. The Task Agent surfaces this as an optional "attach numeric scoring" action on Evaluator result cards; the user can opt in per task. This Core's §17 series prompt evaluation also leverages Pattern C when measuring prompt-version comparisons against outcomes without standing up full Experiments.

---

## §0 — How to read this addendum

### §0.1 Normative status

This addendum is an end-state architectural draft. It is **not** an implementation phase plan. Sections that describe dependency ordering are non-normative readiness notes only. Operative behavior is defined by the contracts, schemas, invariants, and owner splits below.

### §0.2 Relationship to DOC23 R3.1

DOC23 R3.1 remains the operative parent specification for the task graph, modules, ports, config schemas, execution engine, lifecycle, API routes, SSE events, storage, Run Inspector, templates, and module presets. This addendum does not replace those mechanics. It extends DOC23 with intelligence, memory, observability, and design-learning contracts.

### §0.3 Relationship to DOC23 Addenda A

DOC23 Addenda A R4.1 V2 owns experiments, judges, claim extraction, sub-agent awareness inside modules, prompt optimization reservations, DOC24 context injection for task modules, and session continuity. This addendum consumes those surfaces when it defines telemetry, Task Assessment, evaluation datasets, prompt/module-preset promotion safety, and task-design learning. It does not redefine Judge, Experiment, Claim Extractor, or sub-agent runtime semantics.

### §0.4 Scope summary

Addenda B defines:

1. Task Agent as the conversational task-design, task-inspection, task-retrieval, and task-assessment surface.
2. Task Agent education through a hybrid, self-updating Task Design Intelligence substrate.
3. 
Task Blueprint as the semantic intent contract behind a graph.
4. Reuse-first task creation through task templates and module presets.
5. DOC72 Task Activity Memory for task runs, outputs, artifacts, goals, entities, matters, and follow-on sessions.
6. Task telemetry and trace spans.
7. Artifact/output intake policy for final and intermediate outputs.
8. DOC24 capability binding and runtime expansion for tools, connectors, MCP servers, procedures, skills, and models.
9. Task creation learning, Task Assessment, and self-improvement loops.
10. UI, storage, routes, validation, and cross-doc obligations required to make the above real.

### §0.5 R0.3 audit-pass additions

The R0.3 audit pass confirms the R0.1/R0.2 architecture but strengthens several areas that were under-specified relative to the design discussion:

1. Adds a terminology/object taxonomy so Task, Task Run, Blueprint, Template, Preset, Template Group/Workflow Package, Design Pattern, Design Case, Artifact, Work Product, and Task Assessment do not blur.
2. Expands the Template Group / Workflow Package concept and clarifies that most single-template bundles should remain Task Templates, while groups/packages exist only when multiple templates or shared cross-template assets must be managed together.
3. Strengthens Task Agent education with a non-monolithic `skill.md` rule, required decision-card families, and explicit examples for context routing, named outputs, Carry Forward, Loop Controller, session continuation, holds, signals, email triggers, model choice, and connector/capability use.
4. Adds stronger prior-task Casebook rules: user-created, user-liked, and user-refined tasks can teach the knowledge pack by extracting not just what was built, but why a module/config was used, what goal it served, and whether it worked.
5. Adds minimum DOC72 work-context/entity-link capture requirements so task memory links to matters, entities, work context, artifacts, outputs, follow-up rooms/chats, and goals.
6. 
Adds explicit TaskDesignIntakeBundle and DOC24/DOC3-style task-creation intake rules so intent, rationale, preferences, goals, and template directives are not lost.
7. Adds TaskAgentEducationProfile, TaskDesignHintCard, Spec Librarian behavior, and TaskAgentInstructionCandidate so Task Agent expertise is compiled, retrieved, validated, and improved rather than stored in one monolithic prompt.
8. Adds goal-based design-learning records and non-normative external benchmark rationale.
9. Adds end-to-end examples showing task creation, progress inspection, output retrieval, and post-run learning.

---

### §0.6 R0.6.1 unified merge-pass additions

R0.6.1 merges the R0.3 Audit-Complete Addenda B baseline with the R0.5 Audit-Complete Merge Pack source. R0.6.1 is intended to be the single working Addenda B draft for subsequent red-team and seam review. The R0.5 merge source is no longer intended to operate as a companion addendum.

The R0.6.1 merge adds or reconciles the following topic families:

1. Natural task invocation, direct-first handling, and the rule that saved tasks are used only when task formalization materially improves the user outcome.
2. DOC24 Task Mode Resolver, TaskModeDecision, TaskOpportunityPacket, token-budget limits, and no full task-system prompt stuffing.
3. Task Agent typed entrypoints, runtime identity, system-agent registration, model/think/fallback settings, OpenClaw named-agent compatibility, and visible degraded modes.
4. Native OpenClaw monitoring versus saved-task monitoring, and the workflow formalization continuum from direct chat to standing procedure / recurring saved task.
5. Visible Elnor-created saved tasks with receipts, graph links, edit/cancel controls, DOC72 memory linkage, and no hidden task graphs.
6. DOC72 / DOC3 / DOC23 procedural boundaries, including TaskInvocationDirective and the distinction between memory directives, standing procedures, DOC3 semantic skills, and DOC23 tasks.
7. 
Task invocation learning, Task Agent proposal-edit learning, TaskSystemLearningReport, and BDSM/DOC8 signals for accepted/rejected task suggestions, overuse, underuse, and prompt edits.
8. TKP boundary correction: TKP is the Task Agent's detailed task-design/reference substrate, not the ambient task router, not the Task Agent core contract, and not a user-facing DOC73 library.
9. Unified prompt improvement: one user-facing Improve Prompt capability, DOC17 Prompt Advisor as a quick service, Task Agent as task-aware prompt orchestrator, and deep prompt testing as tracked DOC23 prompt-evaluation tasks.
10. Prompt candidate generation, Experiment/Judge integration, Promptolution as a candidate-generation method rather than a module, and DSPy/GEPA reserved until the Addenda A R5 optimization substrate is operative.
11. Additional routes, read models, SSE events, UI surfaces, failure modes, and cross-doc obligations required to make the above real.

### §0.7 R0.6.4 UI / module-continuation / sharing-forward-compatibility additions

R0.6.4 adds the UI and runtime refinements surfaced during the DOC20/mockup review pass. These additions are normative where they define schemas, owner splits, and UI contracts; visual layout examples remain illustrative.

R0.6.4 adds or clarifies:

1. DOC20 integration for the modular task page: tasks open as workspace tabs, the Browser column remains available, the right chat column defaults closed, and task/list/load/save/duplicate behavior is unified with Browser and task-page controls.
2. A Graph Overview Preservation Rule: the broad modular graph/module-block view remains clean and lightly badged; heavy telemetry belongs in drawers/inspector/detail surfaces.
3. Graph ⇄ Run Inspector navigation with synchronized task/run/module/artifact state.
4. Module detail/config panel quick access to last/current run data, module outputs, and the full Run Inspector.
5. 
A consolidated Run Inspector with Header, Run Flow & Steps, Artifacts & Deliveries, and Context & Audit, replacing any tab-heavy telemetry design.
6. Delivery records for task outputs, including where and when outputs were emailed, messaged, saved, posted, added to a library, written to memory, or delivered elsewhere.
7. A DOC24 context inspector showing injected and excluded memories, entities, tools, connectors, procedures, documents, libraries, prompt layers, and context packets, with thumbs/comment feedback routed to DOC8/BDSM.
8. A Tasks page command center for active, scheduled, saved/reusable, historical, and preset/task-segment assets, including run count, average duration, average cost, success rate, and links into graph/inspector views.
9. User-facing saved-task terminology: “Saved Task,” “Reusable Saved Task,” “Scheduled Saved Task,” and “Triggered Saved Task” are primary UI labels; internal `TaskTemplateRecord` may remain an implementation detail.
10. Task Segment as the reusable-module-group concept formerly described as “sub-task.”
11. Task Agent as a contextual side-panel chat with task/run/module/artifact scope, full chat treatment, attachments, feedback controls, and scoped conversation threads.
12. Module session continuation and follow-up distinctions: ask Task Agent about a module, start a new follow-up from module context, or continue the original OpenClaw session only when runtime truth says it is available.
13. Rerun/fork-from-module mechanics using immutable parent runs, module activation replay records, activation sequence IDs, upstream output freezing, graph/context/source-document policies, and side-effect replay controls.
14. Judge and Experiment module rerun/rescore/variant/fork affordances, including rescoring the same output, rerunning variants, adding variants, and forking downstream from a winning variant.
15. Export audit as a redaction-aware audit bundle, not a raw log dump.
16. 
Standalone/shared task surface forward-compatibility: task canvas, inspector, artifacts/deliveries, and module detail components must be renderable without the full DOC20 shell; full collaborative sharing is deferred to DOC50+ but DOC23 defines task-specific hooks now.

## §1 — Executive summary

DOC23 is already the correct execution substrate for reliable business and personal automation: visible graph, explicit modules, typed ports, typed configs, explicit cables, Run Inspector, templates, module presets, durable task records, and EC sole-writer execution.

The missing layer is not a central autonomous orchestrator. The missing layer is **task intelligence**: the ability for Elnor to understand, design, inspect, remember, assess, and improve tasks without turning them into opaque black-box agent improvisations.

This addendum introduces the **Task Agent** as a system agent/capability that functions as:

- a natural-language task designer;
- a reuse-first template/preset matcher;
- a progress/status window into running tasks;
- a retrieval surface for outputs and intermediate artifacts;
- a Task Assessment generator;
- a DOC72-aware task-memory navigator;
- a task-design learning participant;
- and a user-facing explanation layer over graphs, runs, artifacts, telemetry, and improvements.

Task Agent is **not** a DOC23 module and is **not** a runtime orchestrator. EC and DOC23 execute the graph. DOC24 resolves capabilities and context. DOC72 stores task activity memory and semantic relationships. DOC25 processes produced/downloaded/received documents. DOC8/BDSM learn contextual utility. Task Agent reads these substrates and proposes changes through EC commands.

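
The read-and-propose boundary described above can be sketched as a small gate between a Task Agent proposal and an EC command. This is a minimal illustration under stated assumptions: `TaskAgentProposal`, `EcCommand`, the `apply_task_proposal` command kind, and `toEcCommand` are hypothetical names invented for this example; the real command shapes are owned by EC and are not defined here.

```typescript
// Hypothetical proposal shape produced by Task Agent. It carries intent only;
// nothing in it performs a durable write.
interface TaskAgentProposal {
  proposal_id: string;
  task_id: string;
  summary: string;
  user_approved: boolean;
}

// Hypothetical command handed to EC, which remains the sole durable writer.
interface EcCommand {
  kind: "apply_task_proposal";
  proposal_id: string;
  task_id: string;
}

// Convert a proposal into an EC command only after user approval.
// Unapproved proposals yield null, so nothing reaches EC silently.
function toEcCommand(proposal: TaskAgentProposal): EcCommand | null {
  if (!proposal.user_approved) {
    return null;
  }
  return {
    kind: "apply_task_proposal",
    proposal_id: proposal.proposal_id,
    task_id: proposal.task_id,
  };
}
```

The point of the sketch is the direction of authority: Task Agent produces data describing a change, and only EC, after validation, turns that data into stored state.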
The main architecture is:

```text
User intent / task question
  -> Task Agent
  -> Task Design Intelligence substrate
  -> DOC72 task/entity/template/preset/memory lookup
  -> DOC24 capability/context/tool/model lookup
  -> DOC23 graph/template/preset lookup and validation
  -> Blueprint + graph proposal + capability plan + artifact policy + preflight
  -> EC command if user approves
  -> DOC23 execution
  -> telemetry + artifacts + DOC72 Task Activity Memory + DOC8/BDSM signals
  -> Task Assessment + candidate improvements
```

The product experience target is:

```text
Elnor knows the task system.
Elnor knows what tasks/templates/presets already exist.
Elnor knows what tools/connectors/models are available now.
Elnor knows what prior tasks worked and why.
Elnor knows the user's goals and preferences.
Elnor can set up a process better than the user initially described.
Elnor can tell the user what a running task is doing.
Elnor can retrieve outputs and explain what happened.
Elnor can propose improvements without silently mutating the graph.
```

---

## §2 — Governing principles and invariants

### §2.1 DOC23 remains the visible process substrate

Every executable task remains a DOC23 graph. Modules, ports, cables, configs, triggers, gates, outputs, loops, holds, delays, switches, junctions, and other task mechanics remain visible and auditable.

Task Agent may design, inspect, explain, assess, and propose changes. It must not run hidden workflow steps outside the saved graph.

### §2.2 No central runtime orchestrator

This addendum rejects a central runtime orchestrator that makes ad-hoc workflow decisions, rewires execution, or spawns sub-agents outside module ownership. Runtime autonomy is bounded by:

- graph topology;
- module configs;
- port/cable semantics;
- task security policy;
- durable checkpoints;
- human gates;
- capability bindings;
- side-effect policy;
- DOC24 packet/capability resolution;
- EC execution state.
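
The visibility rule in §2.1 and the bounds in §2.2 lend themselves to a simple audit: every step observed at runtime should map to a module in the saved graph. The following is a minimal sketch, assuming hypothetical `SavedGraph` and `ExecutedStep` shapes and a hypothetical `findHiddenSteps` helper; none of these are DOC23 schemas.

```typescript
// Hypothetical, simplified view of a saved DOC23 graph.
interface SavedGraph {
  task_id: string;
  module_ids: string[];
}

// Hypothetical record of one step observed during a run.
interface ExecutedStep {
  step_id: string;
  module_id: string;
}

// Return every executed step whose module is absent from the saved graph.
// An empty result means the run stayed inside the visible graph.
function findHiddenSteps(graph: SavedGraph, steps: ExecutedStep[]): ExecutedStep[] {
  return steps.filter((step) => graph.module_ids.indexOf(step.module_id) === -1);
}
```

A check like this only covers graph topology, the first bound in the list; the other bounds (security policy, human gates, side-effect policy, and so on) would need their own checks in the owning components.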
### §2.3 Task Agent is an intelligence interface, not an executor

Task Agent is a system agent/capability. It is not a task module. It may invoke DOC23/DOC24/DOC72/DOC25/DOC8 read tools, validation tools, preflight tools, and proposal commands. It may not perform direct durable writes except through EC commands.

### §2.4 EC sole durable writer

All durable writes created by this addendum flow through EC. Q remains the read/control surface. Task Agent emits proposals, write intents, commands, or assessment requests; EC validates and writes.

### §2.5 Operational run truth and DOC72 memory are distinct

DOC23/EC run storage is the source of truth for exact execution details. DOC72 stores structured task activity memory and relationship links. DOC8/BDSM stores derived learning/utility signals. These stores must not be collapsed.

### §2.6 Raw content is not DOC72 memory truth

DOC72 must not become a raw log or document warehouse. Full raw prompts, outputs, documents, and tool results remain in run-scoped storage, DOC25 document stores, DOC73 libraries/corpora, or other owner-doc stores. DOC72 stores references, summaries, relationships, goals, work-product links, and selected structured memory.

### §2.7 Artifact intake is policy-governed

Final outputs and intermediate module outputs are not all equal. Artifact intake must be configurable globally, per task, per module, per port, and per output class. Most intermediate outputs default to run-scoped storage only. High-value outputs can be made findable, summarized, routed through DOC25, or added to a DOC73 library/corpus.

### §2.8 Generated truth plus authored and learned design wisdom

Task Agent education cannot be a single prompt or purely generated from code.
It requires:

- generated canonical mechanics from DOC23/spec/code;
- authored task-design doctrine;
- module/config rationale cards;
- worked examples;
- task-design cases from prior tasks;
- DOC72 goal links;
- template/preset semantic cards;
- user preference overlays;
- DOC8/BDSM learning bundles;
- typed tools and validators.

### §2.9 Reuse before creation

Before creating a new task graph, Task Agent must search existing task templates, module presets, task design patterns, and semantically similar prior tasks. It should recommend reuse, customization, or fork before creating a new graph.

### §2.10 Learning is proposal-based

Task Agent and learning systems may propose improvements to templates, presets, pattern cards, artifact policies, question policies, and Task Agent guidance. Material changes to executable task templates, module presets, security policy, side-effect policy, or capability bindings require versioned proposals and approval.

### §2.11 No phantom UI or controls

Every visible control introduced by this addendum must map to a route, command, read model, durable write or explicit no-op, telemetry event, and refreshed state. Unwired controls must be explicitly marked deferred.

### §2.12 Autonomy (Principle X; R0.6.5 §1 absorption)

The system supports autonomous task execution when explicitly enabled, but autonomy is a **property of the modules and agents in the task graph**, not an emergent property of having those modules.
Specifically:

- Auto-revision is a property of the V3.3.1 Revisor's `AutonomousModePolicy` (V3.3 §6.6), not the Experiment surface
- Auto-execution of saved tasks is a property of the Task Blueprint's invocation policy (§6.4) and the user's standing instructions
- Auto-promotion of learned patterns is gated by EC Core's compiled policy engine (per OBL-XDOC-EC-POLICY-SIGNALS-01)
- Auto-routing in Experiments uses `experiment_winner_routing` (Addenda A R4.1 V3, per coordination V3 §3.1)

Autonomy is opt-in by module, not opt-out by surface. A user wiring a task graph with autonomous Revisor downstream of an Experiment with `pass_through_winner` opt-in is explicitly creating an autonomous chain; DOC20 surfaces this at graph-edit time (per OBL-XDOC-DOC20-EVAL-UI-01) so the chain isn't constructed accidentally.

**Spec-anchor sentence (normative; per coordination V3 §3.1):**

> *Auto-revision is a property of the Revisor's `AutonomousModePolicy`, not the Experiment surface. If a user wires Revisor downstream of an Experiment's variant output, the Revisor's policy determines whether revision proceeds autonomously. Experiments do not introduce auto-revision policy of their own.*

This sentence forecloses re-litigation of where auto-revision authority lives. It applies symmetrically to other autonomy questions: each module owns its own autonomy policy; surfaces do not introduce autonomy policy beyond what the modules consume.

### §2.13 Complexity profiles (R0.6.5 §4 absorption)

Tasks vary in complexity. The Task Agent reasons about complexity along a small set of axes; complexity informs onboarding, suggestion verbosity, default settings, and pattern surfacing. The axes:

- **Modules count.** Simple = ≤3 modules; medium = 4-10; complex = 11+.
- **Outcome count.** Simple = ≤2 outcomes; medium = 3-5; complex = 6+.
- **Source dependency.** Self-contained (no external sources); single-source (one Source Workspace); multi-source (multiple Source Workspaces).
- **Output stakes.** Internal advisory (low); client-facing (medium); filing-or-binding (high).
- **Autonomy level.** Manual (user gates every step); partial (auto-revision opt-in); full (auto-revise + auto-route).

The Task Agent surfaces complexity in onboarding ("this is a complex task; consider...") and in suggestions ("simpler tasks like X often work better when..."). Complexity does NOT gate features; it informs Task Agent presentation.

```ts
TaskComplexityProfile {
  task_id: string
  modules_count: number
  outcome_count: number
  source_dependency: "self_contained" | "single_source" | "multi_source"
  output_stakes: "internal_advisory" | "client_facing" | "filing_or_binding"
  autonomy_level: "manual" | "partial" | "full"

  // Derived classification
  overall_complexity: "simple" | "medium" | "complex"

  schema_version: 1
}
```

The Task Agent reads this when constructing recommendations; the Task Blueprint persists it as a derived field updated whenever the blueprint changes.

### §2.14 What this addendum adds (R0.6.5 §2 absorption)

This Core R0.7.1 adds to DOC23 / Addenda A:

- **Task intelligence layer** — the Task Agent system agent and supporting infrastructure
- **Task design memory** — DOC72 Task Activity Memory integration for cross-run learning
- **Telemetry spine** — observability over task design, execution, and learning signals
- **Self-improving task design** — patterns learned from runs feed back into task design suggestions
- **Prompt improvement and evaluation** — DOC17 prompt registry integration with V3.3 Pattern C ad-hoc Judge attachment

R0.7 does NOT add (these live in sub-addenda):

- Outcome evaluation engine — V3.3
- In-run feedback delivery — Feedback Delivery V1.0.1
- Source workspace infrastructure — Source Workspace V1.0.1
- Task forum and run board — Task Forum + Run Board V1.0.1

### §2.15 Object taxonomy (R0.6.5 §3 absorption; merged into §3A below)

R0.6.5 §3 added object taxonomy expansion.
The expanded taxonomy is folded into §3A (Terminology and object taxonomy) below. --- ## §3 — Owner split ### §3.1 DOC23 owns DOC23 owns: - Task Agent task-facing contracts defined in this addendum. - Task Blueprint schema and graph binding. - Task templates and module presets as executable DOC23 assets. - Task Template Semantic Cards and Module Preset Semantic Cards as DOC23-authored projections, with DOC72 graph links. - Task telemetry schemas for task execution. - Task Artifact Index. - TaskPreflightReport. - TaskAssessment. - Artifact intake policy for task outputs. - Task Agent task-design tools that operate over DOC23 graph/template/preset/catalog/read models. - Run Inspector and task UI extensions. ### §3.2 EC Core owns EC owns: - Durable writes. - Task Agent command handling. - Task Knowledge Compiler execution and pack activation. - Task graph validation/preflight execution. - Read-model generation. - Storage paths and atomic writes. - Background scheduling, nightly extraction, compaction, and pack regeneration. - Effective runtime state, incognito controls, and processing controls. - Route registry and command registry entries. ### §3.3 DOC24 owns DOC24 owns: - Capability registry and live action state. - Tool/connector/MCP/procedure/model availability. - Semantic routing and packet assembly. - Rendering/injection of relevant task-design cards into Task Agent context. - Runtime expansion of module tool packs within allowed envelopes. - DeliveryDirective and compiled utility bundle consumption. - Knowledge/capability packet lifecycle and injection manifests. ### §3.4 DOC72 owns DOC72 owns: - Entity graph storage and node taxonomy. - `execution_trace`, `goal`, `world_entity`, `work_product`, `memory_directive`, `procedure`, `tool_capability`, and other node payload contracts. - Task Activity Memory graph relationships. - Goal nodes and task-design goal links. 
- Semantic projections for templates, presets, task runs, artifacts, design cases, and learned preferences. - Confidence/provenance/temporal/connection/experience dimensions for task-related knowledge. ### §3.5 DOC25 owns DOC25 owns universal ingestion for task-produced, task-downloaded, task-received, or task-attached documents. DOC23 emits ingestion intents/events. DOC25 processes, deduplicates, converts, stores derived artifacts, reports quality, and exposes document retrieval tools. ### §3.6 DOC73 owns DOC73 owns deep extraction into libraries/corpora where a task output or downloaded/received document is intentionally added to a bounded knowledge substrate. DOC23 must not use DOC73 as a generic artifact store; DOC73 is used when the endpoint is a library/corpus or extraction lens. ### §3.7 DOC8 and BDSM own DOC8 and BDSM own: - friction events; - learning signals; - question utility; - tool/procedure/capability utility; - pattern detection; - compiled learning bundles; - self-healing proposal generation; - satisfaction/utility computation. DOC23 defines task-specific signal shapes and consumers. DOC8/BDSM compute and compile. ### §3.8 DOC3 owns DOC3 owns procedural skill lifecycle, demonstrated procedures, semantic skill learning, and DOC3 procedure execution. DOC23 templates/presets can be routable like DOC3 procedures but remain DOC23-owned executable assets. ### §3.9 DOC11/OpenClaw owns DOC11/OpenClaw owns final runtime dispatch, model/provider routing, session truth, `sessions_spawn`, ACP/coding session runtime, and native sub-agent behavior. DOC23 telemetry records task-scoped observations of those events. --- ## §3A — Terminology and object taxonomy This addendum uses the following terms precisely. The Task Agent must use the same taxonomy when explaining, designing, searching, and updating task assets. ### §3A.1 Task A **Task** is an executable DOC23 graph instance or saved graph definition. It is the thing EC/DOC23 can run. 
Its canonical executable truth is the graph: modules, ports, cables, configs, triggers, gates, output modules, and task settings. ### §3A.2 Task Run A **Task Run** is a concrete execution of a Task. It has operational run truth: run record, run snapshot, checkpoints, events, spans, module activation states, artifacts, prompts, tool calls, costs, errors, and receipts. ### §3A.3 Task Blueprint A **Task Blueprint** is the semantic intent contract behind a graph. It explains what the task is for, what process it implements, what outputs matter, what quality criteria apply, what human gates and side effects are expected, what goals it serves, and how graph modules bind to step intents. A Blueprint is not executable by itself. ### §3A.4 Module Preset A **Module Preset** is a saved configuration for one module type. Example: a Complaint Red-Team Judge preset, a Damages Memo Agent preset, a PACER Docket Fetcher preset, or an email-output preset. Module presets are DOC23-owned executable/config assets; DOC72 may store semantic cards and experience links, not canonical configs. ### §3A.5 Task Template A **Task Template** is a reusable executable graph template. A strong Task Template may bundle a Blueprint, graph, module configs, module presets, capability requirements, artifact policy, assessment plan, sample inputs, example outputs, version history, and semantic card. A Task Template can therefore already contain much of what was initially described as a “process pack.” Do not create a separate package object for single-template cases unless multiple templates or shared assets create a real distinction. ### §3A.6 Template Group / Workflow Package A **Template Group** or **Workflow Package** is an optional bundle of multiple related Task Templates and shared assets. It is not required for ordinary tasks. It earns its place only when the user needs to manage a family of tasks together. 
Examples:

```text
Complaint Drafting Workflow Package
- Draft Complaint template
- Evaluate Complaint template
- Red-Team Complaint template
- Revise Complaint template
- Shared presets: fact extractor, damages memo, red-team judge
- Shared output archetypes: issue matrix, revision plan, work-product package
- Shared evaluation rubrics and sample cases

Docket Monitoring Workflow Package
- Recurring PACER Monitor template
- Filing Summary template
- Deadline Extraction template
- Notification/Human Review template
```

User-facing naming remains provisional. “Template Group” is acceptable as an implementation-facing label. Avoid introducing “Playbook,” “Process Pack,” or “Business Process Pack” as a durable product noun until the UI proves the distinction is useful.

### §3A.7 Task Design Pattern

A **Task Design Pattern** is an abstract reusable process motif. It is not an executable graph. Example: source classification → fact extraction → analysis memo → draft → red-team → revise. Patterns help Task Agent design new tasks and explain why certain modules/configs are useful.

### §3A.8 Task Design Case

A **Task Design Case** is a distilled prior task or task-creation session that teaches design judgment. It records the user intent, final design, important module/config decisions, the “because” behind those decisions, the goals served, actual outcomes, and reusable lessons.

### §3A.9 Task Assessment

A **Task Assessment** is a post-run or on-demand report that unifies operational telemetry, artifacts, judge/experiment results, errors, user feedback, DOC8/BDSM signals, capability findings, substantive findings, and proposed improvements. It is broader than a failure playbook.

### §3A.10 Artifact and Work Product

A **Task Artifact** is any material output, receipt, file, message, memo, judge result, subagent result, or module output produced or consumed by a task.
A **Work Product** is a higher-value artifact intended to persist as a document, deliverable, filing, memo, report, brief, complaint, exhibit chart, or other reusable professional/personal output. Work Product status should generally imply DOC25 processing and DOC72/DOC25/DOC73 linkage according to policy. ### §3A.11 Task Agent Knowledge Pack A **Task Agent Knowledge Pack** is the active versioned education substrate consumed by Task Agent. It is hybrid: generated canonical mechanics plus authored doctrine, rationale cards, decision cards, examples, design cases, template/preset cards, DOC72 goal/preference links, and DOC8/BDSM learning overlays. It is not one long prompt and not one giant `skill.md` file. --- ## §3B — Natural invocation and product-language rule ### §3B.1 Problem R0.3 defines how Task Agent designs and inspects tasks once task mode is selected. It does not sufficiently define when Elnor should use a saved task, suggest a task, call Task Agent, or simply answer directly. This is a core product seam. If Elnor suggests tasks too aggressively, ordinary chat quality degrades and users are annoyed. If Elnor never suggests tasks, the system fails to exploit one of ELNOR's most valuable features. ### §3B.2 Direct-first invariant **Direct-First Non-Degradation Invariant.** Task awareness MUST NOT materially degrade Elnor's ordinary chat performance. Unless DOC24 identifies a meaningful task signal, no task-specific packet, task-template list, TKP content, or Task Agent instruction SHALL be injected into the ordinary chat turn beyond the tiny ambient task-awareness rule. Ordinary chat, quick answers, direct interactive reasoning, and simple one-step tool/procedure actions remain direct. This invariant is binding. It protects the entire ELNOR system from becoming task-preoccupied. ### §3B.3 Product-language rule Ambient model-facing and user-facing invocation language MUST NOT rely on internal spec labels such as `DOC23` or `DOC24` as operative action words. 
Internal schemas, route contracts, and owner-doc prose may still use owner-doc identifiers.

Use product terms:

```text
saved task
tracked task
tracked workflow
saved task template
task run
task graph
task template
module preset
Task Agent
```

Do not use ambiguous instructions such as:

```text
use DOC23
use DOC23/Task Agent
use Task Agent
```

Instead use typed verbs:

```text
run saved task
suggest saved task
design task
adapt task template
review task
inspect task run
retrieve task output
explain task graph
assess task
review my tasks
improve task prompt
test prompt variants
```

### §3B.4 Canonical ambient task-awareness card

The following is the recommended compact ambient card for Elnor. It should be short enough for stable baseline injection or equivalent SOUL/core instruction placement.

```text
Saved Tasks:
A saved task is an inspectable, reusable tracked process with graph steps, progress, outputs, logs, approvals, and artifacts.
Use saved tasks when the user explicitly asks to run/create/review/inspect one, references a saved task/template, or the runtime supplies a TaskOpportunityPacket.
Saved tasks are best for repeatable, scheduled/triggered, independently running, gate-controlled, artifact-producing, auditable, or template-backed processes.
Stay direct-first for ordinary quick answers, direct interactive reasoning, and simple one-step actions.

Task Agent:
Task Agent is the specialist for designing, adapting, explaining, reviewing, inspecting, and assessing saved tasks and task runs.
Invoke a typed Task Agent entrypoint only when the user asks, a task/run/template is explicitly involved, or runtime routing marks the task opportunity as medium/high/explicit.
Do not invoke Task Agent merely because a request is complex.
```

Normative clarifications:

1. “Complex” is not by itself a sufficient reason to invoke Task Agent.
2. “Review this complaint” defaults direct unless additional signals suggest saved-task value.
3. “Run the Complaint Evaluation task” is an explicit saved-task request.
4. “Set up a process to review complaints like this going forward” is a Task Agent design request.
5. “Where is the Marex draft task?” is a Task Agent / task-run inspection request.
6. “Send me the output from the Marex draft task” is a task-output retrieval request.
7. The ambient card MUST NOT tell Elnor to consult Task Agent for generic “task fit.” Task opportunity is prefiltered by DOC24/EC; Task Agent consultation occurs only through an allowed typed entrypoint.

### §3B.5 Task Agent canonical naming

`Task Agent` is the canonical name. There is no separate `Workflow Designer`, `Task Inspector`, or `Prompt Agent` identity in the initial Addenda B architecture.

Use modes/entrypoints for what Task Agent is doing:

```text
Task Agent: Design Task
Task Agent: Adapt Template
Task Agent: Review Task
Task Agent: Inspect Run
Task Agent: Retrieve Output
Task Agent: Explain Graph
Task Agent: Assess Task
Task Agent: Review My Tasks
Task Agent: Improve Prompt
Task Agent: Test Prompt Variants
```

User-facing UI labels SHOULD be:

```text
Ask Task Agent
Design task
Review task
Inspect run
Retrieve output
Improve prompt
Test prompt variants
```

No distinct public “Workflow Designer” or “Prompt Agent” should be introduced unless the architect later chooses a product rename.

---

## §3C — Task invocation intelligence

### §3C.1 Execution mode taxonomy

Elnor chooses among execution modes, not between “task” and “no task” only.
```ts
type ElnorExecutionMode =
  | "direct_chat"
  | "direct_tool_or_procedure"
  | "openclaw_background_subagent"
  | "ec_scheduled_job"
  | "run_existing_saved_task"
  | "suggest_existing_saved_task"
  | "invoke_task_agent_for_design"
  | "invoke_task_agent_for_review"
  | "inspect_saved_task_run"
  | "retrieve_saved_task_output"
  | "background_or_recurring_saved_task"
  | "ask_clarifying_question";
```

### §3C.2 Use saved tasks when formalization adds value

A saved/tracked task is not needed merely because a request is complex. Saved tasks earn their place when formalization adds value:

```text
repeatability
independent/background execution
scheduled or triggered execution
durable checkpoints
progress visibility
human gates
side-effect controls
artifact tracking
multi-output tracking
audit trails
telemetry
run inspection
DOC72 task memory
template reuse
Task Assessment and improvement
```

### §3C.3 Do not use saved tasks when task overhead exceeds value

Do not invoke or suggest a saved task for:

```text
ordinary quick answers
single-turn direct reasoning
simple one-step tool actions
ordinary file move/save operations
fast graph/memory lookups
primary interactive reasoning where the user expects to work directly with Elnor
requests with explicit “quick,” “just answer,” “briefly,” “do this here,” or similar direct-mode language
```

### §3C.4 Task opportunity levels

```ts
type TaskOpportunityLevel = "none" | "low" | "medium" | "high" | "explicit";
```

Recommended handling:

| Level | Handling |
|---|---|
| `none` | Inject no task packet. Elnor answers/acts directly. |
| `low` | Usually no task mention. May log weak signal for learning. |
| `medium` | Direct answer first; optional quiet task chip if useful. |
| `high` | Elnor may recommend a saved task or consult Task Agent. |
| `explicit` | User named/asked for task/template/run/output; use task route. |

### §3C.5 TaskModeDecision schema

```ts
export type TaskModeDecision = {
  decision_id: string;
  user_request_ref: string;
  resolved_intent_summary: string;
  selected_mode:
    | "direct_chat"
    | "direct_tool_or_procedure"
    | "openclaw_background_subagent"
    | "ec_scheduled_job"
    | "run_existing_saved_task"
    | "suggest_existing_saved_task"
    | "invoke_task_agent_for_design"
    | "invoke_task_agent_for_review"
    | "inspect_saved_task_run"
    | "retrieve_saved_task_output"
    | "background_or_recurring_saved_task"
    | "ask_clarifying_question";
  task_opportunity_level: "none" | "low" | "medium" | "high" | "explicit";
  task_agent_consult_policy:
    | "forbidden"
    | "not_needed"
    | "allowed_if_user_accepts"
    | "recommended"
    | "required";
  task_fit_score: number;
  direct_answer_fit_score: number;
  existing_template_match_score?: number;
  positive_task_signals: TaskUseSignal[];
  task_veto_signals: TaskUseVetoSignal[];
  matched_invocation_directives: string[];
  matched_task_templates: string[];
  matched_module_presets: string[];
  doc24_packet_ref?: StorageRef;
  task_agent_consult_ref?: StorageRef;
  should_inject_task_packet: boolean;
  should_inject_tkp_cards: boolean; // normally false unless Task Agent active
  visible_to_llm: boolean;
  visible_to_user: boolean;
  created_at: string;
  schema_version: "1.0";
};
```

Default rule:

```text
If task_opportunity_level is none or low, should_inject_task_packet SHOULD be false.
If Task Agent is not active, should_inject_tkp_cards MUST be false.
```

### §3C.6 Positive task signals

```ts
type TaskUseSignal =
  | "explicit_task_request"
  | "existing_task_name_referenced"
  | "existing_template_name_referenced"
  | "matching_template_high_confidence"
  | "long_running"
  | "recurring_or_monitoring"
  | "multi_step_workflow"
  | "multiple_independent_artifacts"
  | "requires_human_gates"
  | "requires_audit_trail"
  | "requires_scheduled_or_triggered_execution"
  | "requires_background_progress"
  | "high_stakes_repeatable_process"
  | "user_asked_to_set_up_process"
  | "known_user_preference_for_task_in_context"
  | "existing_invocation_directive_match"
  | "task_output_or_run_question";
```

### §3C.7 Veto signals

```ts
type TaskUseVetoSignal =
  | "simple_single_turn_answer"
  | "single_tool_action"
  | "user_requested_quick_response"
  | "primary_conversation_should_remain_direct"
  | "task_overhead_exceeds_value"
  | "user_recently_rejected_task_suggestion"
  | "low_stakes_non_repeatable"
  | "ambiguous_but_direct_answer_safe"
  | "native_openclaw_monitoring_sufficient"
  | "ec_scheduled_job_sufficient"
  | "no_matching_template"
  | "weak_template_match_only";
```

### §3C.8 Vague requests

For vague requests like:

```text
Review this complaint.
Review this motion.
Look at this draft.
Check this.
```

Default behavior is direct-first unless one or more strong signals are present:

```text
full/deep/comprehensive review requested
source-document comparison requested
red-team requested
revision/draft/final artifact package requested
known high-confidence template exists and user often accepts it
user references saved task/template/run
user asks to set up a process
background/scheduled/recurring/independent execution requested
```

Medium-fit UX may show a quiet chip after or alongside the direct answer:

```text
[Run deeper saved task]
[Use Complaint Evaluation task]
[Ask Task Agent to set this up]
```

The chip should be suppressed when BDSM/DOC8 learning shows repeated rejection in the same context class.
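
The §3C.5 default rule and the §3C.8 chip behavior can be pictured as one small pure function. The following is a minimal sketch under stated assumptions, not a normative contract: `planTaskSurface`, `SurfaceInput`, `SurfacePlan`, `taskAgentActive`, and `repeatedChipRejection` are hypothetical names introduced only for illustration, and the sketch treats the SHOULD in the default rule as a hard default.

```ts
// Hypothetical sketch. Only the level names and the two should_inject_* flag
// names come from this spec; every other name is an illustrative assumption.
type TaskOpportunityLevel = "none" | "low" | "medium" | "high" | "explicit";

type SurfaceInput = {
  level: TaskOpportunityLevel;
  // Assumed flag: true only when a Task Agent entrypoint is currently active.
  taskAgentActive: boolean;
  // Assumed flag derived from BDSM/DOC8 learning for the current context class.
  repeatedChipRejection: boolean;
};

type SurfacePlan = {
  should_inject_task_packet: boolean;
  should_inject_tkp_cards: boolean;
  show_task_chip: boolean;
};

function planTaskSurface(input: SurfaceInput): SurfacePlan {
  const weakSignal = input.level === "none" || input.level === "low";
  return {
    // Default rule: no task packet when the level is none or low.
    should_inject_task_packet: !weakSignal,
    // Default rule: TKP cards are never injected unless Task Agent is active.
    // Here "true" only means injection is permitted, not required.
    should_inject_tkp_cards: input.taskAgentActive,
    // Quiet chip only at medium fit, suppressed after repeated rejection.
    show_task_chip: input.level === "medium" && !input.repeatedChipRejection,
  };
}

// Usage: a medium-fit turn where the user has repeatedly dismissed the chip.
const plan = planTaskSurface({
  level: "medium",
  taskAgentActive: false,
  repeatedChipRejection: true,
});
console.log(plan.show_task_chip, plan.should_inject_tkp_cards);
```

Keeping the decision in a pure function like this would make the direct-first defaults easy to unit test apart from DOC24 routing, though where such logic actually lives is an owner-doc decision.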
---

## §3D — DOC24 Task Opportunity Packet

### §3D.1 DOC24 ownership

DOC24 owns the live decision of whether task awareness is injected into a user turn. TKP does not participate in ordinary prompt assembly unless DOC24 selects a task-design, task-review, task-inspection, prompt-evaluation, or high-confidence task-opportunity path.

DOC24 SHALL provide:

```text
ambient task-awareness card
TaskModeDecision prefilter
TaskOpportunityPacket assembly
matching task template top-k lookup
matching task invocation directive lookup
matching module preset top-k lookup
Task Agent entrypoint permission packet
token-budget enforcement
active capability/model/runtime state
```

### §3D.2 TaskOpportunityPacket schema

```ts
export type TaskOpportunityPacket = {
  task_system_available: boolean;
  task_agent_available: boolean;
  task_opportunity_level: "none" | "low" | "medium" | "high" | "explicit";
  recommended_handling:
    | "do_not_mention_task"
    | "direct_answer_with_optional_task_chip"
    | "suggest_existing_task"
    | "ask_user_task_or_direct"
    | "invoke_task_agent"
    | "run_explicit_task"
    | "inspect_existing_run"
    | "retrieve_task_output";
  task_agent_consult_policy:
    | "forbidden"
    | "not_needed"
    | "allowed_if_user_accepts"
    | "recommended"
    | "required";
  matched_invocation_directives: CompactTaskInvocationDirectiveCard[];
  matched_task_templates: CompactTaskTemplateCard[];
  matched_module_presets: CompactModulePresetCard[];
  task_agent_entrypoints: TaskAgentEntrypoint[];
  suggested_user_visible_copy?: string;
  reason_codes: string[];
  veto_signals: string[];
  max_tokens: number;
  schema_version: "1.0";
};
```

### §3D.3 Token-budget rules

```text
Ambient task-awareness card: 50–160 tokens, always stable or included in baseline system context.
TaskOpportunityPacket: 150–600 tokens, only when task-like signals exist.
Task Agent design packet: larger, only in explicit task-design/review/inspection/evaluation mode.
TKP cards: retrieved on demand by Task Agent; never dumped into ordinary chat.
```

### §3D.4 No full task-system prompt stuffing

DOC24 MUST NOT inject the full TKP, full task module catalog, full template catalog, or full Addenda B content into ordinary chat. It may inject compact top-k cards only when relevant.

### §3D.5 Degraded behavior

If DOC24 cannot compute task opportunity:

```text
Elnor remains direct-first.
Explicit task commands still use task route if identifiable.
No speculative Task Agent consult occurs solely because routing failed.
```

---

## §4 — Task Agent

### §4.1 Definition

Task Agent is a DOC23 system agent/capability that helps the user design, inspect, retrieve, assess, and improve tasks.

Task Agent may be invoked by:

- conversation with Elnor;
- “Create Task from Description” UI;
- “Explain this task” UI;
- “Where is this run?” UI;
- “Assess this run” UI;
- “Improve this template” UI;
- “Find task output” request;
- scheduled or post-run assessment jobs, when policy permits.

### §4.2 Task Agent is not a module

Task Agent must not be represented as an ordinary DOC23 module in the executable graph. It does not consume or emit task port data during a run. It is a system capability tied into DOC23 read models, telemetry, templates, presets, validation, and proposal commands.

Task Agent may be deeply tied to system modules and surfaces, including:

- Monitor module / task monitoring status;
- Environment module / effective task-wide context;
- Run Inspector;
- Task Artifact Index;
- Task History;
- DOC24 context packets;
- DOC72 Task Activity Memory;
- DOC8/BDSM learning bundles.

### §4.3 Primary roles

Task Agent performs seven primary roles.

#### §4.3.1 Task creation and setup

Task Agent converts natural-language user intent into:

- Task Blueprint;
- template/preset match report;
- graph proposal or graph diff;
- capability binding plan;
- artifact/output intake policy;
- telemetry/memory policy;
- human review / side-effect plan;
- preflight report;
- saved task or template proposal.
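
As a rough illustration of the setup outputs listed above, the sketch below bundles them into one object and reports which are still missing before a save proposal is offered. Everything here is an assumption made for illustration: `TaskDesignBundle`, its `*_ref` field names, `REQUIRED_BEFORE_SAVE`, and `missingDesignOutputs` are hypothetical and not defined by this addendum, and the idea that every output must exist before a save proposal is a simplification rather than a stated rule.

```ts
// Hypothetical bundle of the §4.3.1 outputs. Field names are illustrative
// assumptions; the spec lists the outputs but does not define this shape.
type Ref = string;

type TaskDesignBundle = {
  blueprint_ref?: Ref;
  template_match_report_ref?: Ref;
  graph_proposal_or_diff_ref?: Ref;
  capability_binding_plan_ref?: Ref;
  artifact_intake_policy_ref?: Ref;
  telemetry_memory_policy_ref?: Ref;
  human_review_plan_ref?: Ref;
  preflight_report_ref?: Ref;
  save_proposal_ref?: Ref;
};

// Assumed checklist: every output except the save proposal itself.
const REQUIRED_BEFORE_SAVE: Array<keyof TaskDesignBundle> = [
  "blueprint_ref",
  "template_match_report_ref",
  "graph_proposal_or_diff_ref",
  "capability_binding_plan_ref",
  "artifact_intake_policy_ref",
  "telemetry_memory_policy_ref",
  "human_review_plan_ref",
  "preflight_report_ref",
];

// Returns the checklist entries that the bundle has not produced yet.
function missingDesignOutputs(
  bundle: TaskDesignBundle
): Array<keyof TaskDesignBundle> {
  return REQUIRED_BEFORE_SAVE.filter((key) => !bundle[key]);
}

// Usage: a design session that has produced only a Blueprint and a graph proposal.
const stillMissing = missingDesignOutputs({
  blueprint_ref: "bp-example",
  graph_proposal_or_diff_ref: "graph-example",
});
console.log(stillMissing);
```

A checklist like this could help a design surface show the user what remains before saving, but the actual gating belongs to EC command validation and preflight as described elsewhere in this addendum.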
#### §4.3.2 Task progress window Task Agent answers questions about current and past task runs: ```text Where are we in the Draft Complaint task? What module is running now? Why is it waiting? What did the red team say? Which documents has it reviewed? What sub-agents are active? What is the latest output? What remains before completion? ``` #### §4.3.3 Output and artifact retrieval Task Agent locates, summarizes, links, or returns outputs: ```text Send me the results of the Marex draft task. Show me the damages memo from the middle of that run. Open the final complaint draft. Find the red-team report from the complaint evaluation task. Which task created this document? ``` #### §4.3.4 Task explanation Task Agent explains: - what a task does; - why modules are present; - how data and context flow; - what capabilities are used; - what outputs are produced; - what human review points exist; - what side effects can occur; - how the task differs from a template or prior version. #### §4.3.5 Task Assessment Task Agent generates Task Assessments combining telemetry, judge/experiment data, artifacts, user feedback, DOC8 friction, BDSM utility signals, capability reliability, and substantive findings. #### §4.3.6 Template/preset intelligence Task Agent searches, explains, compares, reuses, customizes, forks, and proposes updates to task templates and module presets. #### §4.3.7 Task-design self-improvement Task Agent participates in learning from task creation sessions, task runs, user edits, accepted/rejected suggestions, output usage, artifact promotion, and Task Assessments. ### §4.4 Hard prohibitions Task Agent must not: 1. Execute hidden task steps outside DOC23 graph execution. 2. Spawn sub-agents outside a task module or a non-task system route explicitly authorized for advisory/design work. 3. Mutate task records directly without EC command validation. 4. Change executable task templates or module presets without a versioned update proposal and required approval. 
5. Bypass validation/preflight when generating a graph proposal. 6. Treat learned heuristics as canonical module semantics. 7. Store full raw task content in DOC72. 8. Add side-effecting capabilities without policy compliance and approval where required. 9. Create duplicate templates when a suitable existing template should be reused. ### §4.5 Task Agent read/control contract Task Agent MUST read task system modules and task-wide context through a permissioned operational lens. In addition to canonical task/run records, it SHOULD be able to inspect the Monitor module, Environment module/effective task-wide context, DOC24 capability/context packets, TaskTraceSpans, prompt/capability/policy snapshots, step-output shelves, tool-result shelves, artifact index, DOC72 TaskRun memory, and DOC8/BDSM learning signals. This is required for progress answers, user-created graph review, failure diagnosis, and Task Assessment generation. ```ts TaskAgentPermissionContract { agent_id: "task_agent" read_surfaces: Array< | "task_catalog" | "task_record" | "task_template" | "module_preset" | "task_blueprint" | "task_run_record" | "task_run_snapshot" | "task_run_log" | "task_checkpoint" | "task_artifact_index" | "task_trace_span" | "monitor_module_status" | "environment_config" | "prompt_snapshot" | "capability_snapshot" | "policy_decision_snapshot" | "step_output_shelf" | "tool_result_shelf" | "run_inspector_read_model" | "doc72_task_activity_memory" | "doc24_capability_registry" | "doc24_context_packet" | "doc25_ingestion_result" | "doc73_library_binding" | "doc8_learning_bundle" | "bdsm_utility_bundle" > command_surfaces: Array< | "task.design_session.start" | "task.blueprint.propose" | "task.graph.propose" | "task.graph.diff.propose" | "task.preflight.run" | "task.assessment.request" | "task.artifact.promote.request" | "task.template.update.propose" | "task.preset.update.propose" | "task.knowledge_pack.rebuild.request" > forbidden_commands: Array< | "direct_file_write" | 
"direct_graph_mutation" | "hidden_runtime_orchestration" | "unapproved_side_effect" > schema_version: "1.0" } ``` ### §4.6 TaskAgentRunLens Task Agent answers operational questions through a structured run lens before drilling into raw logs. ```ts TaskAgentRunLens { task_id: string run_id: string task_name: string template_id?: string blueprint_id?: string current_status: TaskRunStatus definition_status: TaskDefinitionStatus current_module_states: ModuleActivationStateSummary[] graph_progress_summary: GraphProgressSummary active_path_summary: ActivePathSummary[] wait_reasons: WaitReasonSummary[] human_gate_status: HumanGateSummary[] active_subagents: SubAgentTraceSummary[] trigger_summary: TriggerSummary input_summary: TaskInputSummary documents_reviewed_summary: ReviewedDocumentSummary[] capability_usage_summary: CapabilityUsageSummary cost_summary: CostSummary quality_summary?: QualitySummary anomaly_summary?: AnomalySummary artifact_index: TaskArtifactIndexEntry[] latest_outputs: TaskOutputSummary[] final_output_refs: StorageRef[] doc72_task_memory_ref?: string learning_signal_refs: string[] assessment_refs: string[] run_inspector_url: string updated_at: string schema_version: "1.0" } ``` ### §4.7 Degraded modes Task Agent must degrade honestly: | Missing substrate | Allowed degraded behavior | |---|---| | DOC72 unavailable | Answer from DOC23/EC operational run truth only; no memory graph links. | | DOC24 unavailable | Use saved capability snapshots; do not claim live connector availability. | | DOC25 unavailable | Show FileRefs/StorageRefs; no document intelligence summaries. | | DOC8/BDSM unavailable | Omit learning/utility recommendations. | | Knowledge pack stale | Query live module catalog and validators; avoid relying on stale cards. | | Run trace incomplete | State which trace surfaces are incomplete and fall back to available logs. 
| --- ## §4A — Task Agent typed entrypoints ### §4A.1 Entrypoint enum ```ts export type TaskAgentEntrypoint = | "consult_task_opportunity" | "design_task" | "adapt_task_template" | "review_existing_task" | "inspect_task_run" | "retrieve_task_output" | "explain_task_graph" | "assess_task" | "assess_task_portfolio" | "answer_task_system_question" | "improve_task_prompt" | "review_prompt_quality" | "generate_prompt_variants" | "test_prompt_variants" | "propose_prompt_update"; ``` **Gating rule for `consult_task_opportunity`.** This entrypoint is internal-only. It is not a general instruction for Elnor to ask Task Agent whether every complex request should become a saved task. It MAY be invoked only when the TaskModeDecision/TaskOpportunityPacket marks task opportunity as `medium`, `high`, or `explicit`, or when the user directly asks whether a saved task should be used. Low/no-opportunity turns MUST NOT call this entrypoint. ### §4A.2 Entrypoint labels ```ts export type TaskAgentEntrypointDescriptor = { entrypoint: TaskAgentEntrypoint; user_facing_label: | "Check whether a saved task fits" | "Design task" | "Adapt template" | "Review task" | "Inspect run" | "Retrieve output" | "Explain graph" | "Assess task" | "Review my tasks" | "Answer task-system question" | "Improve prompt" | "Review prompt quality" | "Generate prompt variants" | "Test prompt variants" | "Propose prompt update"; may_run_without_llm: boolean; may_spawn_background_session: boolean; requires_tkp: boolean; requires_run_truth: boolean; requires_doc72_memory: boolean; schema_version: "1.0"; }; ``` ### §4A.3 Elnor remains front door Task Agent is never the default primary conversation owner. Elnor remains the front door and summarizes Task Agent results in Elnor's voice unless the user explicitly opens a Task Agent design/review surface. Use: ```text I'll ask Task Agent to draft this as a saved task. Task Agent found an existing template that fits. 
Task Agent reviewed the run and found the final output here. ``` Avoid: ```text I am transferring you to Task Agent. ``` ### §4A.4 Invocation modes ```ts export type TaskAgentInvocationMode = | "read_model_only" | "synchronous_consult" | "visible_design_session" | "async_background_design" | "assessment_job"; ``` Modes: 1. `read_model_only` — no LLM call; tools/read models answer run/output/status queries. 2. `synchronous_consult` — small bounded Task Agent call for template selection or task review. 3. `visible_design_session` — user-facing task design/review mode. 4. `async_background_design` — OpenClaw named sub-agent/background session for larger task design, portfolio review, or prompt evaluation. 5. `assessment_job` — Task Assessment / portfolio report / prompt evaluation job. ### §4A.5 Read-model-only uses Task Agent SHOULD use deterministic read models without LLM calls for: ```text Where is this run? What module is running now? Why is it waiting? What did module X output? Retrieve final output. Open Run Inspector. List artifacts from run. ``` Representative tools: ```ts task_runs.inspect(run_id) task_artifacts.retrieve(run_id, artifact_query) task_graph.explain(task_id) task_runs.current_status(task_id) task_outputs.find(task_id_or_run_id, query) ``` --- ## §4B — Task Agent system-agent registration and runtime profile ### §4B.1 System agent identity Task Agent MUST be registered in the System Agent Identity Registry as a named system agent. 
```ts export type SystemAgentIdentityRegistryEntry_TaskAgent = { agent_id: "task_agent"; display_name: "Task Agent"; user_facing_aliases: ["Task Agent"]; owning_spec_doc: "DOC23 Addenda B"; purpose: "Design, adapt, inspect, explain, assess, and improve saved tasks, task templates, task graphs, task runs, task prompts, and task-system usage."; lifecycle_state: "staged" | "active" | "disabled" | "deprecated"; required_execution_profile: "system_agent_task_design"; declared_dependencies: [ "DOC23", "DOC23 Addenda A", "DOC24", "DOC72", "DOC25", "DOC73", "DOC8", "BDSM", "DOC11", "DOC17", "EC Core" ]; capability_registered: true; schema_version: "1.0"; }; ``` ### §4B.2 Capability registry entry Because Task Agent is invokable, DOC24 MUST register it as an agent capability. ```ts export type TaskAgentCapabilityRegistryEntry = { capability_id: "agent.task_agent"; capability_type: "agent"; implemented_by_agent_id: "task_agent"; semantic_actions: TaskAgentEntrypoint[]; owner_doc: "DOC24"; detailed_contract_ref: "DOC23 Addenda B"; availability_source: "doc11_named_agent_runtime_truth" | "ec_agent_registry" | "both"; health_state: "healthy" | "degraded" | "unavailable" | "unknown"; schema_version: "1.0"; }; ``` ### §4B.3 Runtime profile Task Agent requires user-configurable runtime settings. These settings control Task Agent design/review behavior, not the model settings of modules inside a saved task unless explicitly selected. 
```ts
export type TaskAgentRuntimeProfile = {
  agent_id: "task_agent";
  default_model_ref: string;
  default_think_level: "low" | "medium" | "high" | "max";
  fallback_model_refs: string[];
  model_policy: {
    allow_cloud: boolean;
    allow_local: boolean;
    require_local_for_sensitivity_tags: string[];
    prefer_high_reasoning_for_graph_design: boolean;
  };
  cost_policy: {
    max_design_cost_usd: number | null;
    max_assessment_cost_usd: number | null;
    max_prompt_eval_cost_usd: number | null;
    max_repair_iterations: number;
  };
  invocation_policy: {
    allow_invisible_consult: boolean;
    allow_background_design: boolean;
    require_user_confirmation_before_save: boolean;
    require_user_confirmation_before_run: boolean;
    allow_prompt_evaluation_tasks: boolean;
  };
  context_policy: {
    default_context_mode:
      | "isolated_with_task_pack"
      | "fork_if_safe"
      | "manual_only";
    max_context_pack_tokens: number;
    allow_tkp_retrieval: boolean;
    allow_doc72_task_memory: boolean;
    allow_doc8_bDSM_learning_overlay: boolean;
    allow_doc17_prompt_advisor_service: boolean;
  };
  schema_version: "1.0";
};
```

### §4B.4 Settings UI

```text
Settings > Agents > Task Agent
──────────────────────────────
Default model:                   [GPT-5.5 Pro ▾]
Think level:                     [High ▾]
Fallback chain:                  [Claude Sonnet 4.5 → Gemini 2.5 Pro ▾]
Use local-only when required:    [When policy requires ▾]
Max design iterations:           [3]
Max design cost:                 [$5.00]
Max assessment cost:             [$10.00]
Max prompt-eval cost:            [$15.00]
Auto-consult policy:             [Only when runtime routing marks high/explicit ▾]
Show task suggestion chips:      [On]
Create visible receipts:         [Always]
Allow background design jobs:    [On]
Allow prompt evaluation tasks:   [On]

[Save] [Open Task Agent diagnostics]
```

### §4B.5 Fallback behavior

Task Agent fallback model use MUST be recorded in the Task Agent trace. High-risk graph changes, prompt promotions, side-effecting task creation, and template/preset mutations require validation/preflight after fallback model use.
```ts export type TaskAgentModelFallbackReceipt = { receipt_id: string; task_agent_request_id: string; primary_model_ref: string; fallback_model_ref: string; fallback_reason: | "primary_unavailable" | "primary_auth_failed" | "primary_cost_blocked" | "primary_context_exceeded" | "policy_required_local" | "timeout" | "unknown"; output_requires_extra_validation: boolean; created_at: string; schema_version: "1.0"; }; ``` --- ## §4C — OpenClaw and DOC11 alignment ### §4C.1 DOC11 role DOC11 owns OpenClaw runtime truth: named-agent availability, model catalogs, provider auth, fallback models, sub-agent settings, sessions_spawn behavior, fork/isolated context support, runtime diagnostics, and no-phantom-control readback. Addenda B consumes DOC11 truth. It MUST NOT invent a parallel OpenClaw runtime registry. ### §4C.2 Task Agent as named OpenClaw agent If Task Agent runs as an OpenClaw named agent, DOC11 MUST expose: ```text task_agent installed/registered state task_agent runtime health task_agent model profile task_agent fallback chain task_agent workspace/profile resolution sessions_spawn availability forked context availability and parentForkMaxTokens behavior child session run status sub-agent completion events session archival/cleanup state ``` ### §4C.3 Task Agent context mode Default Task Agent context mode SHOULD be: ```text isolated_with_task_pack ``` Task Agent receives a DOC24-assembled task-scoped context pack rather than the full parent transcript. Use `fork_if_safe` only when: ```text The current conversation transcript itself is needed to design the task; parentForkMaxTokens constraints are satisfied; privacy/policy permits fork; DOC24 cannot produce an adequate task-scoped context pack alone. 
``` ### §4C.4 TaskAgentContextPack schema ```ts export type TaskAgentContextPack = { pack_id: string; request_ref: string; entrypoint: TaskAgentEntrypoint; user_intent_summary: string; current_conversation_excerpt_ref?: StorageRef; relevant_task_templates: CompactTaskTemplateCard[]; relevant_module_presets: CompactModulePresetCard[]; relevant_task_invocation_directives: CompactTaskInvocationDirectiveCard[]; relevant_tkp_card_refs: string[]; work_context: { matter_refs: EntityRef[]; entity_refs: EntityRef[]; goal_refs: GoalRef[]; active_context_ref?: string; }; artifact_context: { source_document_refs: DocumentRef[]; prior_work_product_refs: WorkProductRef[]; prior_task_run_refs: string[]; }; capability_context: { available_capability_refs: string[]; unavailable_capability_refs: string[]; policy_blocked_capability_refs: string[]; model_availability_summary_ref?: StorageRef; }; constraints: { memory_directive_refs: string[]; security_policy_refs: string[]; artifact_policy_refs: string[]; cost_policy_refs: string[]; }; schema_version: "1.0"; }; ``` ### §4C.5 Native OpenClaw monitoring versus saved-task monitoring The term “monitored” MUST be disambiguated. 
Use the following boundary:

| Need | Use |
|---|---|
| Know whether spawned session is alive, completed, failed, timed out, or archived | OpenClaw native session/sub-agent monitoring |
| Run one side lookup or background helper while Elnor continues conversation | OpenClaw `sessions_spawn` / specialist sub-agent |
| Run recurring infrastructure maintenance, nightly compilation, or background system job | EC scheduler/orchestrator |
| Run a repeatable visible process with modules, checkpoints, outputs, gates, artifacts, task memory, and Run Inspector | Saved task / tracked workflow |
| Monitor an external condition and then execute a structured process | EC trigger/schedule launching a saved task |
| Watch something simple and announce if it changes | OpenClaw watcher or EC job unless workflow structure/artifacts/gates are needed |

Saved tasks SHOULD be used for process monitoring that needs durable workflow state, graph visibility, outputs, artifacts, gates, or auditability. Saved tasks SHOULD NOT be used merely for OpenClaw heartbeat/session monitoring or simple background watchers.

### §4C.6 Task Agent sub-agent policy

Task Agent itself may spawn sub-agents only under explicit policy. Default should be conservative.
```ts
export type TaskAgentSubagentPolicy = {
  allow_subagent_spawn: boolean;
  allowed_named_agent_ids: string[];
  max_child_sessions: number;
  max_depth: number;
  max_child_cost_usd: number | null;
  context_mode_default: "isolated" | "fork_if_safe";
  require_sub_agent_context_pack: boolean;
  on_cap_hit: "abort" | "continue_with_warning";
  allowed_entrypoints_for_subagents: TaskAgentEntrypoint[];
  schema_version: "1.0";
};
```

Default:

```text
ordinary task design:   allow_subagent_spawn = false
portfolio assessment:   allow_subagent_spawn = true if configured
heavy template audit:   allow_subagent_spawn = true if configured
prompt evaluation job:  allow_subagent_spawn = true if configured
```

### §4C.7 OpenClaw runtime trace bridge

DOC11/OpenClaw may emit native runtime artifacts that are relevant to task observability but are not themselves DOC23 graph events. These include sub-agent spawn acceptance, child session keys, completion/failure/timeout events, model/fallback resolution, forked-vs-isolated context mode, structured execution-item progress, provider/tool runtime diagnostics, `sessions_yield` / hidden-follow-up orchestration payloads where applicable, and trajectory/export references when policy permits. DOC23 Addenda B SHALL consume those native events through an adapter rather than claiming ownership of OpenClaw runtime truth.
```ts export type OpenClawTaskRuntimeTraceBridge = { bridge_id: string; task_id?: string; run_id?: string; task_agent_consult_id?: string; openclaw_session_key?: string; openclaw_child_session_key?: string; openclaw_run_id?: string; native_event_kind: | "sessions_spawn_accepted" | "subagent_completed" | "subagent_failed" | "subagent_timed_out" | "model_fallback_applied" | "context_mode_resolved" | "structured_progress_item" | "sessions_yield" | "trajectory_export_created" | "runtime_diagnostic"; native_event_ref: StorageRef; mapped_task_trace_span_ref?: StorageRef; mapping_status: | "mapped" | "stored_native_ref_only" | "blocked_by_policy" | "unavailable" | "degraded"; degraded_reason_codes: string[]; created_at: string; schema_version: "1.0"; }; ``` Normative rule: DOC23 consumes OpenClaw runtime traces for task observability, but DOC11/OpenClaw remains the owner of native runtime truth. If an OpenClaw event cannot be mapped safely into a TaskTraceSpan, DOC23 stores a native-event reference and degraded reason instead of inventing a span. This bridge is relevant to the entire task system, not only Task Agent. It lets Run Inspector, Task Assessment, and Task Agent understand native sub-agent/background execution without creating a second runtime ledger. ### §4C.8 Workflow formalization continuum As LLMs improve, Elnor will be increasingly capable of long multi-step work directly in chat or native OpenClaw execution. Saved tasks remain valuable where formalization adds product value. ```text Level 0 — Direct chat response Ordinary answers, quick analysis, interactive reasoning. Level 1 — Direct chat with internal plan Multi-step reasoning where user remains actively engaged. Level 2 — Native OpenClaw agentic run / structured execution items Long in-session action sequence where progress visibility helps, but no saved graph/template is needed. Level 3 — Ad hoc saved/tracked task One request needs independent execution, artifacts, gates, progress, or audit trail. 
Level 4 — Saved task template Process will recur or should be reusable. Level 5 — Standing procedure / recurring task invocation directive Semantic trigger should repeatedly launch, suggest, or adapt a saved task. ``` Native OpenClaw structured progress items are a runtime-progress surface. Saved tasks are durable process objects. Use native OpenClaw progress when Elnor is doing direct or in-session work and no reusable graph/template, durable artifact policy, independent task lifecycle, Run Inspector, or template reuse is needed. Use saved tasks when formalization adds durable value: repeatability, graph visibility, independent/background execution, scheduled/triggered state, human gates, artifact tracking, telemetry, task memory, template reuse, or Task Assessment. Task use should be justified by formalization value, not by model incapability. --- ## §5 — Task creation and onboarding flow ### §5.1 Purpose Task creation is the highest-leverage self-learning surface in this addendum. The system must learn how to convert underspecified user intent into reliable, high-value, reusable DOC23 tasks. ### §5.2 Required creation flow Task Agent task creation follows this sequence: ```text 1. Capture user intent. 2. Resolve task type, domain, work context, entities, goals, and output expectations. 3. Search existing task templates, module presets, design patterns, and prior design cases. 4. Query DOC24 for available capabilities, connectors, MCP servers, procedures, skills, agents, and models. 5. Retrieve relevant Task Design Doctrine, pattern cards, user preferences, goal links, and worked examples. 6. Ask high-value missing-slot questions only when needed. 7. Produce Task Blueprint. 8. Generate or adapt graph. 9. Produce capability binding plan, artifact intake policy, memory policy, side-effect/human-review plan, and assessment plan. 10. Run validation/preflight. 11. Revise proposal based on validation/preflight. 12. Present proposal with rationale and alternatives. 13. 
Save through EC command only after user approval or policy-authorized confirmation.
14. Record TaskCreationSessionTrace for learning.
```

### §5.3 Optional onboarding-style conversation

For complex tasks, Task Agent may run a structured but natural onboarding-style conversation. This must not be a rigid form. The system should know which slots matter; the LLM decides how to ask.

Example onboarding slots:

```ts
TaskCreationOnboardingSlots {
  task_goal?: string
  task_type?: string
  work_context?: EntityRef[]
  source_materials?: SourceMaterialRef[]
  expected_outputs?: OutputArchetype[]
  success_criteria?: QualityCriterion[]
  required_steps?: StepIntent[]
  optional_steps?: StepIntent[]
  review_tolerance?: "low_review" | "normal_review" | "high_review"
  autonomy_tolerance?: "manual" | "assisted" | "mostly_auto"
  side_effects_allowed?: SideEffectDeclaration[]
  capability_needs?: CapabilityNeed[]
  artifact_memory_preferences?: ArtifactPreference[]
  budget_latency_preferences?: BudgetLatencyPreference
  schema_version: "1.0"
}
```

### §5.4 Question policy

Task Agent should ask questions only where the answer changes the design materially. It should learn question utility through DOC8/BDSM.

Examples:

| Context | High-value question | Usually low-value question |
|---|---|---|
| Complaint evaluation | “Do you want issue matrix, memo, or both?” | “Which model should I use?” |
| Drafting legal document | “Should this include red-team and revision?” | “Should I use an agent module?” |
| Email-triggered attachment task | “Should attachments be saved/ingested or only summarized?” | “Should the email trigger emit body_out?” |
| Side-effecting task | “Should external sends require approval?” | “Should I add a signal_out?” |

### §5.5 User-facing proposal panel

Task Agent must show task proposals in business/user terms first, graph terms second.

```text
Proposed task: Draft Complaint
Goal: Draft a filing-ready complaint from selected source documents.
I found: - Existing template: Complaint Drafting Template v3 - Useful presets: Damages Memo Agent, Complaint Red-Team Judge - Relevant preference: complex legal analysis should include second-pass deepening Recommended process: 1. Source document intake and classification 2. Fact extraction 3. Claims and damages memo 4. Complaint draft 5. Second-pass "look deeper" review 6. Red-team review 7. Revision and final package Outputs: - Final complaint draft - Damages analysis memo - Red-team report - Revision summary Memory/artifacts: - Final complaint: save as work product/document - Damages memo: suggest making findable later - Raw intermediate drafts: keep with run only [Inspect template] [Edit blueprint] [Generate graph] [Run preflight] [Save] ``` ### §5.6 TaskDesignIntakeBundle Task creation must produce a structured bundle, not merely a graph. This mirrors DOC3's principle that demonstrations produce multi-output knowledge bundles rather than single procedures, while remaining DOC23-owned and task-specific. 
```ts TaskDesignIntakeBundle { bundle_id: string design_session_id: string user_intent_text: string resolved_intent_summary: string inferred_task_type: string work_context_refs: EntityRef[] matter_or_project_refs: EntityRef[] goal_refs: GoalRef[] template_match_report_ref?: StorageRef selected_template_refs: string[] selected_preset_refs: string[] considered_pattern_refs: string[] considered_case_refs: TaskDesignCaseRef[] task_blueprint_ref: StorageRef graph_proposal_ref: StorageRef capability_plan_ref: StorageRef artifact_intake_policy_ref: StorageRef telemetry_memory_policy_ref: StorageRef side_effect_review_plan_ref: StorageRef preflight_report_ref?: StorageRef extracted_user_preferences: TaskDesignPreferenceCandidate[] extracted_goal_links: TaskGoalLinkCandidate[] extracted_template_routing_directives: TemplateRoutingDirective[] extracted_design_learning_signals: TaskAgentDesignLearningSignal[] schema_version: "1.0" } ``` The saved task graph is only one output of the design intake process. The bundle is the learning unit for task creation. ### §5.7 DOC24/DOC3-style task creation intake posture Task creation should use DOC24's onboarding philosophy: the LLM asks naturally, but the system knows which slots matter, what can be auto-resolved, and when readiness is sufficient. It should also borrow the DOC3 semantic-learning principle: preserve semantic intent, rationale, preferences, constraints, vocabulary, goals, and reusable patterns instead of preserving only mechanical graph operations. The Task Agent must detect and preserve statements such as: ```text "For complex analysis, I like a second pass that says look deeper." "When reviewing a complaint, I usually want an issue matrix, not just a memo." "Don't use PACER unless docket history matters." "Make damages memos findable because I reuse them." 
``` Those statements should become preference candidates, goal links, pattern-card updates, template-routing directives, artifact-policy candidates, or Task Agent instruction candidates as appropriate. They must not disappear as ordinary chat text. --- ## §5A — Visible Elnor-created saved tasks ### §5A.1 Rule Elnor may create or instantiate a saved task through Task Agent when the task fit is high or explicit, but all created tasks must be visible, inspectable, editable, cancelable, and receipt-backed. Hidden saved-task graphs are forbidden. Internally, any DOC23 graph created for execution MUST be surfaced as a saved/tracked task with a visible receipt, graph link, run/status surface, edit controls, and cancellation/disable controls. ### §5A.2 Creation modes ```ts export type TaskCreationMode = | "user_authored" | "task_agent_proposed" | "elnor_created_tracked" | "template_instantiated" | "scratch_draft_not_run"; ``` ### §5A.3 TrackedTaskReceipt ```ts export type TrackedTaskReceipt = { receipt_id: string; task_id: string; run_id?: string; creation_mode: TaskCreationMode; visible_to_user: true; reason_for_task_mode: string; graph_inspector_url: string; run_inspector_url?: string; can_cancel: boolean; can_pause: boolean; can_edit: boolean; can_convert_to_template: boolean; doc72_task_memory_ref?: string; created_at: string; schema_version: "1.0"; }; ``` ### §5A.4 Elnor-created task semantic trace Every Elnor-created saved task MUST produce semantic invocation memory. This trace links the originating request, TaskModeDecision, Task Agent design session, resulting task, visible receipt, user-visible rationale, work context, entities, goals, and any TaskInvocationDirective candidate. 
```ts export type ElnorCreatedTaskSemanticTrace = { trace_id: string; originating_user_request_ref: string; task_mode_decision_ref: string; task_agent_design_session_ref?: string; resulting_task_id: string; resulting_task_template_id?: string; resulting_task_receipt_ref: string; semantic_request_summary: string; why_task_mode_was_selected: string; user_visible_description: string; linked_task_invocation_directive_candidate_id?: string; linked_doc72_goal_refs: string[]; linked_entity_refs: string[]; linked_work_context_refs: string[]; created_at: string; schema_version: "1.0"; }; ``` This trace supports later recognition of similar requests and prevents Elnor-created tasks from becoming disconnected from the semantic user request that produced them. ### §5A.5 User-facing examples ```text I set this up as a saved task because it needs to run independently and preserve outputs. It will check the docket, retrieve new filings, route documents through ingestion, summarize them, flag deadlines, and ask before sending anything externally. [View task] [Edit steps] [Run now] [Disable] ``` For high-stakes side effects: ```text I drafted the saved task but have not run it yet because it can send external messages. Review the graph and approval gates first. ``` --- ## §6 — Task Blueprint ### §6.1 Definition Task Blueprint is the semantic intent contract behind a DOC23 graph. It records what the task is trying to accomplish, why the graph is structured as it is, what outputs are expected, what quality criteria matter, what capabilities are needed, what review/side-effect rules apply, and what memory/artifact behavior should occur. A graph is executable. A blueprint is explanatory, evaluative, reusable, and learnable. ### §6.2 Blueprint is not a graph The Blueprint must not duplicate cable-level execution. It captures user intent, process logic, design rationale, goals, quality criteria, and graph-module bindings. 
### §6.3 TaskBlueprint schema ```ts TaskBlueprint { blueprint_id: string name: string description?: string business_or_personal_goal: string task_type: string domain?: string document_archetypes: string[] work_context_refs: EntityRef[] matter_or_project_refs: EntityRef[] goal_refs: GoalRef[] expected_inputs: InputContract[] expected_outputs: OutputContract[] step_intents: StepIntent[] quality_criteria: QualityCriterion[] human_review_points: ReviewPoint[] side_effects: SideEffectDeclaration[] capability_needs: CapabilityNeed[] model_strategy?: ModelStrategy memory_and_artifact_policy: TaskArtifactIntakePolicy telemetry_policy: TaskTelemetryPolicy assessment_plan: TaskAssessmentPlan security_profile_ref?: string retention_policy_ref?: string template_binding?: TaskTemplateBinding generated_graph_ref?: StorageRef current_graph_version?: number blueprint_version: string prior_version_ref?: StorageRef authored_by: "user" | "task_agent" | "system" | "mixed" created_at: string updated_at: string schema_version: "1.0" } ``` ### §6.4 StepIntent schema ```ts StepIntent { step_intent_id: string label: string purpose: string why_it_exists?: string expected_input_refs: string[] expected_output_refs: string[] suggested_module_type?: string bound_module_ids: string[] required_capabilities: CapabilityNeed[] optional_capabilities: CapabilityNeed[] quality_checks: QualityCriterion[] artifact_policy_override?: ArtifactIntakeLevel goal_refs: GoalRef[] design_rationale?: string alternatives_considered?: string[] schema_version: "1.0" } ``` ### §6.5 Graph binding Each StepIntent may bind to zero, one, or multiple graph modules. A Blueprint step can map to a graph region. Example: ```text Blueprint step: “Assess damages theory” -> source.file -> step.agent_task -> step.judge -> optional human review ``` ### §6.6 Blueprint versioning and diffs Task Blueprint must support versioned diffs. 
```ts BlueprintDiff { diff_id: string prior_blueprint_id: string candidate_blueprint_id: string changed_fields: BlueprintChangedField[] step_intent_changes: StepIntentChange[] output_contract_changes: OutputContractChange[] capability_need_changes: CapabilityNeedChange[] artifact_policy_changes: ArtifactPolicyChange[] rationale: string evidence_refs: StorageRef[] schema_version: "1.0" } ``` ### §6.7 Blueprint-to-graph validation Preflight must compare Blueprint and graph: - Blueprint expects a step but graph lacks bound module(s). - Graph contains a side-effecting output not declared in Blueprint. - Blueprint expects a final output but no output module/port emits it. - Blueprint declares DOC25 ingestion but artifact policy does not route documents. - Blueprint requires PACER but capability binding forbids PACER. - Blueprint requires human review before external send but graph has no gate. ### §6.8 Blueprint and DOC72 goal nodes Blueprint and DOC72 goal nodes are distinct. ```text Task Blueprint = intent contract for one task/template. DOC72 goal node = reusable objective/rationale in the memory matrix. ``` Blueprint should link to DOC72 goal nodes when design decisions serve reusable objectives. Examples: ```text Goal: Improve substantive depth in complex legal analysis. Served by: second-pass deepening, red-team, judge, looped revision. Goal: Preserve valuable intermediate analysis for later reuse. Served by: artifact intake policy, DOC72 summary/reference, DOC25 work_product ingestion. ``` ### §6.9 TaskDesignGoalPayload Use existing DOC72 `goal` node kind. Do not create a new DOC72 node kind. 
```ts
TaskDesignGoalPayload {
  goal_type:
    | "task_outcome_goal"
    | "process_quality_goal"
    | "output_quality_goal"
    | "artifact_memory_goal"
    | "cost_latency_goal"
    | "risk_control_goal"
    | "user_preference_goal"
  goal_statement: string
  applies_to_context_class?: string
  success_criteria: string[]
  disfavored_tradeoffs: string[]
  preferred_patterns: TaskDesignPatternRef[]
  preferred_templates: TaskTemplateRef[]
  preferred_module_presets: ModulePresetRef[]
  source:
    | "architect_authored"
    | "user_stated"
    | "learned_from_task_creation"
    | "learned_from_task_assessment"
    | "system_inferred"
  evidence_refs: StorageRef[]
  confidence: number
  schema_version: "1.0"
}
```

### §6.10 Goal relationship patterns

DOC72 should support or accept relation labels such as:

```text
blueprint               expresses_goal   goal
design_pattern          implements_goal  goal
task_template           serves_goal      goal
module_preset           supports_goal    goal
task_run                advanced_goal    goal
task_assessment         evaluated_goal   goal
artifact                satisfies_goal   goal
module_design_decision  serves_goal      goal
user_preference         constrains_goal  goal
```

---

## §7 — Templates, presets, and reuse-first task creation

### §7.1 Definitions

| Object | Owner | Meaning |
|---|---|---|
| Module Preset | DOC23 | Saved config for one module type. |
| Task Template | DOC23 | Reusable executable graph/template. |
| Task Blueprint | DOC23 | Semantic intent/process contract behind a task/template. |
| Task Design Pattern | DOC23 + DOC72/DOC8 | Abstract reusable process motif. |
| Template Group | DOC23 | Optional bundle of multiple related task templates. |

Avoid introducing a new user-facing noun such as “Playbook” until the distinction earns its keep. If a package is needed, use “Template Group” or “Workflow Package” provisionally.

### §7.2 Reuse-first rule

Before proposing a new graph, Task Agent must search:

1. Task Template Semantic Cards.
2. Module Preset Semantic Cards.
3. Task Design Pattern Cards.
4. Prior Task Design Cases.
5. Similar Task Blueprints.
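The five-source search listed above can be sketched as a single pass that consults every source before any new graph is proposed. This is a minimal illustration only: `searchBeforeDesign`, `ReuseSearchers`, `ReuseCandidate`, the source keys, and the `0.75` strong-match threshold are hypothetical names and values, not part of this addendum. The list order is kept only so results are deterministic; the section does not state that the order is a priority ranking.

```ts
// The five sources named in the reuse-first list, in the order listed there.
// The string keys are illustrative, not spec identifiers.
const REUSE_SEARCH_ORDER = [
  "task_template_semantic_cards",
  "module_preset_semantic_cards",
  "task_design_pattern_cards",
  "prior_task_design_cases",
  "similar_task_blueprints",
] as const;

type ReuseSource = (typeof REUSE_SEARCH_ORDER)[number];

// Hypothetical candidate shape; score is assumed to be normalized to 0..1.
type ReuseCandidate = { source: ReuseSource; ref: string; score: number };

// Hypothetical lookup signature: one search function per source.
type ReuseSearchers = Record<ReuseSource, (intent: string) => ReuseCandidate[]>;

function searchBeforeDesign(
  intent: string,
  searchers: ReuseSearchers,
  strongMatchMin = 0.75, // assumed threshold, for illustration only
): { candidates: ReuseCandidate[]; strongMatches: ReuseCandidate[] } {
  const candidates: ReuseCandidate[] = [];
  // Consult every source; none is skipped even if an earlier one matched.
  for (const source of REUSE_SEARCH_ORDER) {
    candidates.push(...searchers[source](intent));
  }
  const strongMatches = candidates.filter((c) => c.score >= strongMatchMin);
  return { candidates, strongMatches };
}
```

Passing the per-source lookups in as a record keeps the sketch independent of how the semantic cards are actually stored or indexed.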
Task Agent must present strong matches unless the user explicitly asks for a blank/new task. ### §7.3 TaskTemplateSemanticCard The executable template remains DOC23 canonical. The semantic card is a projection used for routing, explanation, reuse, and learning. ```ts TaskTemplateSemanticCard { template_id: string doc23_template_ref: StorageRef canonical_name: string description: string task_type: string domain_tags: string[] document_archetypes: string[] work_context_tags: string[] trigger_phrases: string[] semantic_lookup_phrases: string[] use_conditions: string[] non_use_conditions: string[] expected_inputs: InputContract[] expected_outputs: OutputContract[] default_output_archetypes: OutputArchetype[] blueprint_ref?: StorageRef graph_snapshot_ref: StorageRef module_summary: TemplateModuleSummary[] module_preset_refs: string[] capability_requirements: CapabilityNeed[] connector_requirements: ConnectorRequirement[] memory_artifact_policy_summary: string evaluation_plan_summary?: string known_strengths: string[] known_limitations: string[] user_preference_refs: string[] goal_refs: GoalRef[] last_used_at?: string usage_count: number assessment_summary?: TemplateAssessmentSummary lifecycle_state: | "draft" | "active" | "candidate_update" | "deprecated" | "archived" graph_hash: string blueprint_hash?: string source_authority: KnowledgeAuthorityClass schema_version: "1.0" } ``` ### §7.4 ModulePresetSemanticCard ```ts ModulePresetSemanticCard { preset_id: string doc23_preset_ref: StorageRef canonical_name: string module_type: string description: string role_in_workflow: string trigger_phrases: string[] semantic_lookup_phrases: string[] use_conditions: string[] non_use_conditions: string[] expected_input_kinds: string[] expected_output_kinds: string[] output_shape_summary: string required_capabilities: CapabilityNeed[] optional_capabilities: CapabilityNeed[] forbidden_capabilities: CapabilityNeed[] compatible_task_types: string[] incompatible_task_types: string[] 
  config_summary: string
  why_this_preset_exists?: string
  known_strengths: string[]
  known_limitations: string[]
  common_misuses: string[]
  user_preference_refs: string[]
  goal_refs: GoalRef[]
  design_case_refs: TaskDesignCaseRef[]
  usage_count: number
  last_used_at?: string
  assessment_summary?: PresetAssessmentSummary
  config_hash: string
  lifecycle_state:
    | "draft"
    | "active"
    | "candidate_update"
    | "deprecated"
    | "archived"
  source_authority: KnowledgeAuthorityClass
  schema_version: "1.0"
}
```

### §7.5 TemplateMatchScore

```ts
TemplateMatchScore {
  template_id: string
  semantic_intent_match: number
  task_type_match: number
  input_contract_match: number
  output_contract_match: number
  capability_availability_match: number
  entity_context_match: number
  user_preference_match: number
  prior_assessment_score: number
  recency_or_staleness_score: number
  overall_score: number
  hard_vetoes: MatchVeto[]
  soft_penalties: MatchPenalty[]
  explanation: string
  recommended_action:
    | "use_as_is"
    | "customize"
    | "fork"
    | "create_new_from_pattern"
    | "create_new_blank"
  schema_version: "1.0"
}
```

Hard vetoes include:

- required capability unavailable;
- forbidden side effect present;
- wrong output class;
- archived/deprecated template;
- security policy conflict;
- memory/artifact policy conflict;
- materially wrong task type.
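One plausible reading of the hard-veto list is that any hard veto takes a template out of the reuse path regardless of its numeric score. The sketch below works under that assumption. The `triage` and `rankUsable` functions, the `VetoCode` names, the simplified `MatchVeto` shape, and the `0.85` / `0.6` thresholds are all hypothetical; this addendum does not define how `overall_score` maps to `recommended_action`.

```ts
// Hypothetical veto codes mirroring the hard-veto list; not spec identifiers.
type VetoCode =
  | "required_capability_unavailable"
  | "forbidden_side_effect_present"
  | "wrong_output_class"
  | "archived_or_deprecated_template"
  | "security_policy_conflict"
  | "memory_artifact_policy_conflict"
  | "materially_wrong_task_type";

// Simplified stand-ins for MatchVeto and TemplateMatchScore.
type MatchVeto = { code: VetoCode; detail: string };

type ScoredTemplate = {
  template_id: string;
  overall_score: number; // assumed to be normalized to 0..1
  hard_vetoes: MatchVeto[];
};

type RecommendedAction =
  | "use_as_is"
  | "customize"
  | "fork"
  | "create_new_from_pattern"
  | "create_new_blank";

// Assumed thresholds, for illustration only.
const USE_AS_IS_MIN = 0.85;
const CUSTOMIZE_MIN = 0.6;

function triage(score: ScoredTemplate): RecommendedAction {
  // Assumption: a hard veto removes the template from reuse, whatever its score.
  if (score.hard_vetoes.length > 0) return "create_new_from_pattern";
  if (score.overall_score >= USE_AS_IS_MIN) return "use_as_is";
  if (score.overall_score >= CUSTOMIZE_MIN) return "customize";
  return "create_new_from_pattern";
}

// Drop vetoed templates, then order the rest from best to worst score.
function rankUsable(scores: ScoredTemplate[]): ScoredTemplate[] {
  return scores
    .filter((s) => s.hard_vetoes.length === 0)
    .sort((a, b) => b.overall_score - a.overall_score);
}
```

Checking vetoes before thresholds means a high-scoring but archived or policy-conflicting template can never be recommended as-is, which is the behavior the word "hard" suggests.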
### §7.6 Template update proposals

```ts
TemplateUpdateProposal {
  proposal_id: string
  target_kind: "task_template" | "module_preset" | "task_blueprint" | "semantic_card"
  target_id: string
  update_kind:
    | "semantic_metadata"
    | "blueprint_change"
    | "graph_change"
    | "module_config_change"
    | "capability_binding_change"
    | "artifact_policy_change"
    | "evaluation_plan_change"
    | "deprecation"
  rationale: string
  evidence_refs: StorageRef[]
  supporting_signals: TaskDesignLearningSignal[]
  proposed_patch_ref: StorageRef
  prior_hash: string
  candidate_hash: string
  approval_required: boolean
  rollback_ref?: StorageRef
  lifecycle_state: "candidate" | "staged" | "approved" | "rejected" | "applied" | "rolled_back"
  schema_version: "1.0"
}
```

### §7.7 Low/medium/high-risk update classes

| Risk | Examples | Approval |
|---|---|---|
| Low | add trigger phrase, improve description, add known limitation, update usage count | May auto-stage or auto-apply per policy. |
| Medium | add optional capability, change default output package, add optional judge/red-team step, alter artifact policy default | User or architect review. |
| High | rewire graph, remove module, change side-effecting output, change security/capability policy, promote optimized prompt | Explicit approval + versioning + rollback. |

### §7.8 User teaching directives

```ts
TemplateRoutingDirective {
  directive_id: string
  template_id?: string
  preset_id?: string
  directive_kind:
    | "prefer_when_phrase"
    | "avoid_when_phrase"
    | "prefer_for_task_type"
    | "avoid_for_task_type"
    | "prefer_output_archetype"
    | "require_user_confirmation"
  phrase_or_condition: string
  source: "user_taught" | "task_agent_inferred" | "assessment_inferred"
  confidence: number
  approval_state: "active" | "candidate" | "rejected"
  schema_version: "1.0"
}
```

Example:

```text
When Will says “complaint audit,” prefer Complaint Evaluation Template.
When Will asks only for “citation cleanup,” avoid Complaint Evaluation Template.
``` --- ## §7A — Template Groups / Workflow Packages ### §7A.1 Purpose Template Groups / Workflow Packages are optional aggregation objects for managing multiple related Task Templates and shared assets. They should not be used where a single Task Template is sufficient. The audit rule is: ```text Single reusable graph + blueprint + presets + capability requirements + evals + artifact policy -> Task Template. Multiple related templates with shared presets, shared rubrics, shared examples, shared capability setup, or cross-template learning -> Template Group / Workflow Package. ``` ### §7A.2 Distinction from Task Template A Task Template answers: “What executable graph should I run or instantiate for this process?” A Template Group answers: “What family of related templates, presets, examples, rubrics, and defaults should Task Agent use for this broader workflow area?” A Template Group may include one primary template, but if it never includes more than one template or shared cross-template assets, it should collapse back into the Task Template. ### §7A.3 TemplateGroup schema ```ts TaskTemplateGroup { group_id: string display_name: string description: string domain_tags: string[] task_type_tags: string[] primary_template_id?: string related_template_ids: string[] shared_module_preset_ids: string[] shared_blueprint_pattern_refs: TaskDesignPatternRef[] shared_capability_requirements: CapabilityNeed[] shared_connector_requirements: ConnectorRequirement[] shared_artifact_policy_ref?: StorageRef shared_assessment_plan_ref?: StorageRef shared_eval_dataset_refs: string[] shared_judge_rubric_refs: string[] sample_input_refs: StorageRef[] example_output_refs: StorageRef[] known_limitations: string[] usage_summary?: TemplateGroupUsageSummary version: string lifecycle_state: "draft" | "active" | "deprecated" | "archived" schema_version: "1.0" } ``` ### §7A.4 User-facing experience The Task Agent should not burden the user with this object unless it helps. 
The ordinary user experience should be: ```text I found an existing Complaint Evaluation template that fits. Related templates are also available in the same group: - Draft Complaint - Red-Team Complaint - Revise Complaint Recommendation: Use Complaint Evaluation now. Keep the related drafting templates available if you want to turn this into a full complaint workflow. [Use template] [Inspect related templates] [Customize] [Create new] ``` ### §7A.5 What the group adds beyond the template page A Template Group adds value only when it provides at least one of the following beyond a single preset/template view: - cross-template setup guidance; - shared capability setup, such as PACER + DOC25 + citation checker; - shared output archetype defaults; - shared evaluation rubrics; - shared example inputs/outputs; - cross-template assessment trends; - recommended handoffs between templates; - a consistent onboarding flow for a family of tasks; - a place for Task Agent to learn design patterns across related templates without mutating each template separately. ### §7A.6 Merge-with-template rule If a Template Group has only one template, no shared assets, no cross-template learning value, and no separate user-facing workflow function, EC should flag it for collapse into the underlying Task Template. --- ## §7B — DOC72 / DOC3 / DOC23 procedural boundary ### §7B.1 Four-layer procedural taxonomy Task invocation overlaps with procedural memory and semantic skills, but it must not collapse into DOC3 skills. ```text Procedural memory: Soft guidance about how the user likes work handled. Usually memory_directive. Skill procedure: Reusable app/tool interaction technique. DOC3 owns lifecycle; stored as procedure nodes; delivered by DOC24. Standing procedure: Confirmed trigger-action behavioral automation. DOC1/DOC72 governed. Saved task / DOC23 task: Hardened modular automation graph with modules, typed cables, gates, retries, telemetry, artifacts, and run history. 
``` ### §7B.2 TaskInvocationDirective A TaskInvocationDirective is semantic routing/procedural guidance that says when Elnor should suggest, run, adapt, or design a saved task. It does not duplicate the task graph. ```ts export type TaskInvocationDirective = { directive_id: string; canonical_name: string; description: string; trigger_phrases: string[]; semantic_lookup_phrases: string[]; use_conditions: string[]; non_use_conditions: string[]; invocation_policy: | "suggest_task" | "run_existing_task_after_confirmation" | "invoke_task_agent_for_design" | "invoke_task_agent_for_review" | "answer_direct_but_offer_task_chip" | "do_not_suggest_task"; target_task_template_refs: string[]; target_task_ids?: string[]; target_module_preset_refs?: string[]; task_agent_entrypoint?: TaskAgentEntrypoint; pre_invocation_context_plan?: { gather_matter_context: boolean; gather_background_documents: boolean; required_source_sets: string[]; doc25_required: boolean; doc73_library_refs?: string[]; }; default_artifact_policy_ref?: string; confirmation_policy: | "never_auto_run" | "confirm_before_run" | "auto_run_if_user_explicit" | "auto_run_if_standing_procedure"; prior_task_use_summary?: string; evidence_refs: StorageRef[]; learned_from_refs: string[]; authority_source: | "user_taught" | "user_confirmed" | "learned_from_acceptance" | "learned_from_rejection" | "architect_authored" | "system_candidate"; confidence: number; lifecycle_state: | "candidate" | "active" | "suppressed" | "deprecated"; schema_version: "1.0"; }; ``` ### §7B.3 TaskInvocationDirectiveCandidate TaskInvocationDirective candidates are created from actual use, not from speculative prompt guesses. When Elnor receives a user request, chooses a saved task or Task Agent path, and the user accepts, rejects, edits, or repeats that path, EC SHOULD record a `TaskInvocationTrace`. Repeated or high-confidence traces MAY produce a `TaskInvocationDirectiveCandidate`. 
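As a rough illustration of how repeated traces might roll up into a candidate, the sketch below tallies user reactions per request phrase and proposes a candidate only once enough evidence accumulates. The `InvocationTraceSketch` shape, the `summarizeTraces` helper, the default threshold of 3, and the choice to always set `review_required` are assumptions made for illustration; this spec does not define them. Only the count fields and the two `recommended_lifecycle` values are taken from the candidate schema.

```typescript
// Sketch only: the trace shape, threshold, and helper name are assumptions, not spec.
type TraceOutcome = "accepted" | "rejected" | "edited_after_acceptance";

interface InvocationTraceSketch {
  trace_id: string;
  request_phrase: string; // normalized user phrasing, e.g. "complaint audit"
  outcome: TraceOutcome;
}

interface CandidateTally {
  user_request_pattern: string;
  source_trace_refs: string[];
  acceptance_count: number;
  rejection_count: number;
  edit_count_after_acceptance: number;
  recommended_lifecycle: "memory_directive_candidate" | "suppress_task_suggestion_rule";
  review_required: boolean;
}

function summarizeTraces(
  traces: InvocationTraceSketch[],
  minSignals: number = 3
): CandidateTally[] {
  // Group traces by phrase, preserving first-seen order.
  const order: string[] = [];
  const groups: { [phrase: string]: InvocationTraceSketch[] } = Object.create(null);
  for (const t of traces) {
    if (!groups[t.request_phrase]) {
      groups[t.request_phrase] = [];
      order.push(t.request_phrase);
    }
    groups[t.request_phrase].push(t);
  }

  const candidates: CandidateTally[] = [];
  for (const phrase of order) {
    const group = groups[phrase];
    const edits = group.filter((t) => t.outcome === "edited_after_acceptance").length;
    const rejections = group.filter((t) => t.outcome === "rejected").length;
    const acceptances = group.length - rejections; // an edited run still counts as accepted

    let lifecycle: CandidateTally["recommended_lifecycle"] | null = null;
    if (acceptances >= minSignals && acceptances > rejections) {
      lifecycle = "memory_directive_candidate";
    } else if (rejections >= minSignals && rejections > acceptances) {
      lifecycle = "suppress_task_suggestion_rule";
    }
    if (lifecycle === null) continue; // too little evidence: stay at trace level

    candidates.push({
      user_request_pattern: phrase,
      source_trace_refs: group.map((t) => t.trace_id),
      acceptance_count: acceptances,
      rejection_count: rejections,
      edit_count_after_acceptance: edits,
      recommended_lifecycle: lifecycle,
      review_required: true, // assumption: candidates are always reviewed before activation
    });
  }
  return candidates;
}
```

A single accepted or rejected suggestion produces no candidate in this sketch, which matches the intent that candidates come from actual repeated use rather than speculative guesses.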
```ts export type TaskInvocationDirectiveCandidate = { candidate_id: string; source_trace_refs: string[]; source_task_ids: string[]; source_task_template_ids: string[]; source_task_agent_design_session_ids: string[]; user_request_patterns: string[]; inferred_trigger_phrases: string[]; inferred_semantic_lookup_phrases: string[]; proposed_directive_ref: StorageRef; evidence_summary: string; acceptance_count: number; rejection_count: number; edit_count_after_acceptance: number; recommended_lifecycle: | "memory_directive_candidate" | "standing_procedure_candidate" | "template_semantic_card_update" | "suppress_task_suggestion_rule"; review_required: boolean; schema_version: "1.0"; }; ``` Examples: ```text If the user repeatedly accepts “Use Complaint Evaluation task” after asking “complaint audit,” create a candidate directive mapping that phrase to the Complaint Evaluation template. If the user repeatedly rejects task suggestions after “quick review,” create a suppression/update candidate for that context class. If Elnor creates a visible saved task from “monitor Marex filings,” link the semantic request, resulting task, template/preset choices, and later run outcomes. ``` The directive candidate is not the executable graph. The executable task remains DOC23 canonical. The directive is semantic routing/procedural guidance stored as a DOC72 `memory_directive` candidate or, if it becomes a confirmed trigger-action behavior, a `standing_procedure` candidate. ### §7B.4 Storage classification ```text One-off task-mode decision: TaskInvocationTrace / execution_trace extension. Soft routing preference: memory_directive. Confirmed conditional trigger-action routing: standing_procedure. Executable graph: DOC23 task/template. Task run: execution_trace node linked to task, directive, template, entities, goals, artifacts, work products. ``` ### §7B.5 Boundary invariant A DOC23 task template is not a DOC3 skill procedure. 
A TaskInvocationDirective may reference DOC23 task templates, DOC3 procedures, DOC24 capabilities, DOC72 goals, and user preferences, but the executable graph remains DOC23 canonical. DOC3 graph-backed procedures execute through DOC24 direct injection from DOC72 contracts. DOC23 task graphs execute through DOC23/EC task runtime. Do not materialize DOC23 graphs as `SKILL.md` files. ### §7B.6 Example User request pattern: ```text Review this motion to dismiss and generate an opposition. ``` Potential directive: ```text When Will asks for a full opposition-drafting workflow: - resolve matter/work context; - gather complaint, motion, exhibits, prior filings, source docs; - search for existing Opposition Drafting task/template; - if available, suggest or run it after confirmation; - otherwise invoke Task Agent to design/adapt a saved task; - route substantial documents through DOC25; - save final draft as work product; - preserve task run in DOC72 unless incognito. ``` --- ## §8 — Task Agent education and Task Design Intelligence substrate ### §8.1 Purpose Task Agent cannot be reliable if it merely receives DOC23 specs in a prompt. The task system is too detailed, too schema-sensitive, and too fast-evolving. Task Agent must be educated by a hybrid, versioned, self-updating substrate. ### §8.2 Education stack Task Agent education consists of: ```text A. Task Agent Core Contract B. Task Design Doctrine C. Generated Task System Knowledge Pack D. Module/Port/Config/Validation Cards E. Module/Config Rationale Cards F. Design Pattern and Anti-Pattern Library G. Template/Preset Semantic Index H. Task Design Casebook I. DOC72 Goal/Preference/Entity Links J. DOC24 Capability/Connector/MCP/Model Live Registry K. DOC8/BDSM Learning Bundles L. Typed Task Agent Tools M. Task Agent Design Eval Suite ``` ### §8.3 Generated truth is necessary but insufficient Generated cards from DOC23/code/spec answer: ```text What exists? What ports exist? What config fields exist? 
What validation rules exist? What defaults exist? ``` They do not reliably answer: ```text Why does this feature exist? When should it be used? What design problem does it solve? What user preference makes it valuable? Which prior task demonstrates it working? What goal did it serve? ``` Therefore the Task System Knowledge Pack is hybrid. ### §8.4 TaskAgentCoreContract ```ts TaskAgentCoreContract { agent_id: "task_agent" role_summary: string hard_rules: string[] command_permissions: string[] forbidden_behaviors: string[] required_tool_sequence_for_creation: string[] degraded_mode_rules: string[] schema_version: "1.0" } ``` Core rules include: - Search templates/presets before creating new graphs. - Use tools/validators rather than memory alone. - Propose, do not silently mutate. - Do not bypass EC. - Do not create hidden runtime orchestration. - Validate/preflight before presenting executable proposals. - Explain design choices in user-facing terms. ### §8.5 TaskDesignDoctrine Human-authored design doctrine captures principles not derivable from code. ```ts TaskDesignDoctrine { doctrine_id: string version: string hard_principles: string[] preferred_patterns: string[] disfavored_patterns: string[] approval_rules: string[] examples: TaskDesignCaseRef[] goal_refs: GoalRef[] source_refs: SpecSourceRef[] schema_version: "1.0" } ``` Example doctrine entries: ```text Prefer visible graph topology over hidden autonomy. Prefer reuse of templates/presets before new graph creation. For complex legal analysis, suggest second-pass deepening or red-team review. For final legal work products, consider DOC25 ingestion and work_product linkage. For side-effecting outputs, require or recommend human review before delivery. For long-running tasks, prefer durable checkpoints/gates over implicit agent continuation. 
```

### §8.6 TaskSystemKnowledgePack

```ts
TaskSystemKnowledgePack {
  pack_id: string
  doc23_version: string
  generated_at: string
  source_refs: SpecSourceRef[]
  source_hashes: Record<string, string>
  core_contract_ref: StorageRef
  design_doctrine_refs: StorageRef[]
  module_cards: TaskModuleCard[]
  port_cards: PortSemanticsCard[]
  config_cards: ModuleConfigCard[]
  validation_cards: ValidationRuleCard[]
  rationale_cards: TaskModuleDesignRationaleCard[]
  config_rationale_cards: TaskConfigRationaleCard[]
  decision_cards: TaskDesignDecisionCard[]
  pattern_cards: TaskDesignPatternCard[]
  anti_pattern_cards: TaskAntiPatternCard[]
  template_semantic_cards: TaskTemplateSemanticCard[]
  module_preset_semantic_cards: ModulePresetSemanticCard[]
  capability_cards: CapabilityUseCard[]
  model_selection_cards: ModelSelectionCard[]
  ui_control_cards: TaskUIControlCard[]
  worked_examples: TaskWorkedExample[]
  design_case_refs: TaskDesignCaseRef[]
  learning_bundle_refs: StorageRef[]
  coverage_report_ref: StorageRef
  eval_report_ref?: StorageRef
  lifecycle_state: "active" | "staged" | "rejected" | "deprecated" | "stale"
  schema_version: "1.0"
}
```

### §8.7 Task Knowledge Compiler

“Self-updating” means deterministic compilation, validation, evaluation, and activation. It does not mean the LLM rewrites its own instructions opportunistically.
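One way to picture "deterministic activation" is a pure function from check results to one of the `activation_decision` values used by `TaskKnowledgeCompilerRun`. The sketch below is illustrative only: the `CompilerCheckResults` shape, the `autoActivateAllowed` policy flag, the numeric eval scores, and the order in which checks are applied are all assumptions, not something this spec prescribes.

```typescript
// The five values mirror activation_decision in TaskKnowledgeCompilerRun.
type ActivationDecision =
  | "activated"
  | "staged"
  | "blocked_validation_failure"
  | "blocked_eval_regression"
  | "blocked_source_drift";

// Hypothetical summary of a compiler run's checks (not a spec object).
interface CompilerCheckResults {
  sourceHashesMatch: boolean;   // inputs unchanged since compilation started
  validationPassed: boolean;    // coverage and schema validation succeeded
  evalScore?: number;           // undefined when no eval suite ran
  baselineEvalScore?: number;   // score of the currently active pack, if known
  autoActivateAllowed: boolean; // assumed policy flag; otherwise a human approves
}

function decideActivation(r: CompilerCheckResults): ActivationDecision {
  if (!r.sourceHashesMatch) return "blocked_source_drift";
  if (!r.validationPassed) return "blocked_validation_failure";
  if (
    r.evalScore !== undefined &&
    r.baselineEvalScore !== undefined &&
    r.evalScore < r.baselineEvalScore
  ) {
    return "blocked_eval_regression";
  }
  // Without eval evidence or an auto-activation policy, the pack waits for approval.
  if (r.evalScore === undefined || !r.autoActivateAllowed) return "staged";
  return "activated";
}
```

Because the function has no hidden state, the same sources and check results always yield the same decision, which is the property the compiler description relies on.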
```ts
TaskKnowledgeCompilerRun {
  compiler_run_id: string
  input_sources: TaskKnowledgeSourceRef[]
  generated_pack_id: string
  source_hashes: Record<string, string>
  generation_mode:
    | "spec_change"
    | "code_schema_change"
    | "module_catalog_change"
    | "capability_registry_change"
    | "template_preset_change"
    | "learning_bundle_refresh"
    | "manual_rebuild"
  validation_result_ref: StorageRef
  eval_result_ref?: StorageRef
  activation_decision:
    | "activated"
    | "staged"
    | "blocked_validation_failure"
    | "blocked_eval_regression"
    | "blocked_source_drift"
  prior_active_pack_id?: string
  created_at: string
  schema_version: "1.0"
}
```

### §8.8 TaskSystemKnowledgePackManifest

```ts
TaskSystemKnowledgePackManifest {
  pack_id: string
  generated_at: string
  source_refs: SpecSourceRef[]
  source_hashes: Record<string, string>
  doc23_version: string
  module_catalog_hash: string
  module_schema_hash: string
  validation_rule_hash: string
  route_registry_hash: string
  doc24_capability_registry_hash: string
  template_index_hash: string
  preset_index_hash: string
  learning_bundle_generation_id?: string
  coverage_report_ref: StorageRef
  eval_report_ref?: StorageRef
  lifecycle_state:
    | "active"
    | "staged"
    | "rejected"
    | "deprecated"
    | "stale"
  schema_version: "1.0"
}
```

### §8.8A TaskAgentEducationProfile

Task Agent must have an active education profile that controls which resources it may use, how many cards it may retrieve, and which validation steps are mandatory.
```ts TaskAgentEducationProfile { profile_id: string active_knowledge_pack_id: string active_design_doctrine_version: string allowed_resource_kinds: Array< | "module_card" | "port_card" | "config_card" | "validation_card" | "rationale_card" | "decision_card" | "hint_card" | "pattern_card" | "anti_pattern_card" | "template_card" | "preset_card" | "design_case" | "goal_card" | "user_preference" | "spec_reference" | "worked_example" | "learning_bundle" > max_cards_per_design_turn: number require_template_search_before_new_graph: boolean require_validation_before_proposal: boolean require_preflight_before_save: boolean allow_learned_heuristic_injection: boolean allow_architect_doctrine_injection: boolean allow_prior_task_case_retrieval: boolean stale_pack_behavior: | "block_graph_proposals" | "use_live_catalog_and_warn" | "degraded_lookup_only" schema_version: "1.0" } ``` This profile prevents accidental over-reliance on stale or overly broad resources and makes Task Agent education inspectable. 
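To make the profile's gating role concrete, here is a minimal sketch of a check that could run before Task Agent presents a graph proposal. It uses four fields from `TaskAgentEducationProfile`; the `DesignTurnState` bookkeeping object, the `checkGraphProposal` name, and the reading of `degraded_lookup_only` as "no graph proposals at all" are assumptions for illustration.

```typescript
type StalePackBehavior =
  | "block_graph_proposals"
  | "use_live_catalog_and_warn"
  | "degraded_lookup_only";

// Subset of TaskAgentEducationProfile fields used by this sketch.
interface EducationProfileSubset {
  max_cards_per_design_turn: number;
  require_template_search_before_new_graph: boolean;
  require_validation_before_proposal: boolean;
  stale_pack_behavior: StalePackBehavior;
}

// Hypothetical per-turn bookkeeping; not a spec object.
interface DesignTurnState {
  packIsStale: boolean;
  cardsRetrieved: number;
  templateSearchDone: boolean;
  validationDone: boolean;
  proposesNewGraph: boolean;
}

interface GateResult {
  allowed: boolean;
  warnings: string[];
  blockers: string[];
}

function checkGraphProposal(
  profile: EducationProfileSubset,
  turn: DesignTurnState
): GateResult {
  const warnings: string[] = [];
  const blockers: string[] = [];

  if (turn.cardsRetrieved > profile.max_cards_per_design_turn) {
    blockers.push("card budget exceeded for this design turn");
  }
  if (
    turn.proposesNewGraph &&
    profile.require_template_search_before_new_graph &&
    !turn.templateSearchDone
  ) {
    blockers.push("template search required before proposing a new graph");
  }
  if (profile.require_validation_before_proposal && !turn.validationDone) {
    blockers.push("validation required before proposal");
  }

  if (turn.packIsStale) {
    switch (profile.stale_pack_behavior) {
      case "block_graph_proposals":
        blockers.push("knowledge pack is stale; graph proposals blocked");
        break;
      case "use_live_catalog_and_warn":
        warnings.push("knowledge pack is stale; using live catalog");
        break;
      case "degraded_lookup_only":
        // Assumed interpretation: lookups remain available, proposals do not.
        blockers.push("knowledge pack is stale; lookup-only mode");
        break;
    }
  }

  return { allowed: blockers.length === 0, warnings, blockers };
}
```

Returning the blockers and warnings as data, rather than throwing, keeps the reason for a refusal inspectable in the proposal record.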
### §8.9 TaskModuleCard ```ts TaskModuleCard { module_type: string display_name: string category: "trigger" | "step" | "utility" | "output" | "source" | "environment" | "system" purpose: string when_to_use: string[] when_not_to_use: string[] common_patterns: string[] common_mistakes: string[] input_ports: PortUseCard[] output_ports: PortUseCard[] config_summary: string important_config_fields: ConfigFieldUseCard[] context_behavior: string chain_history_behavior: string session_behavior?: string side_effect_behavior?: string compatible_modules: string[] incompatible_patterns: string[] examples: MiniGraphExample[] source_refs: SpecSourceRef[] source_authority: KnowledgeAuthorityClass schema_version: "1.0" } ``` ### §8.10 TaskModuleDesignRationaleCard ```ts TaskModuleDesignRationaleCard { module_type: string display_name: string why_it_exists: string primary_design_problem: string when_to_use: string[] when_not_to_use: string[] weak_signals_to_consider: string[] common_success_patterns: string[] common_failure_patterns: string[] examples_from_prior_tasks: TaskDesignCaseRef[] related_goal_refs: GoalRef[] related_pattern_refs: TaskDesignPatternRef[] authored_notes: string[] learned_notes: LearnedDesignNote[] source_authority: | "generated_from_doc23" | "architect_authored" | "learned_from_task_cases" | "user_preference" | "mixed" schema_version: "1.0" } ``` ### §8.11 TaskConfigRationaleCard ```ts TaskConfigRationaleCard { module_type: string field_name: string technical_effect: string why_it_exists: string when_to_use: string[] when_not_to_use: string[] interactions_with_other_fields: string[] examples: ConfigUseExample[] common_misconfigurations: string[] validation_refs: string[] source_authority: KnowledgeAuthorityClass schema_version: "1.0" } ``` ### §8.11A TaskDesignHintCard Some “when to use this” knowledge is too broad and situational for a hard rule but too important to leave to fresh LLM reasoning every time. 
These hints are first-class Task Agent education resources. ```ts TaskDesignHintCard { hint_id: string title: string hint_text: string applies_to_task_types: string[] applies_to_domains: string[] applies_to_module_types: string[] applies_to_config_fields: string[] applies_to_decision_families: string[] weak_signals: string[] examples: MiniGraphExample[] counterexamples: MiniGraphExample[] related_goal_refs: GoalRef[] related_preference_refs: string[] related_case_refs: TaskDesignCaseRef[] authority: KnowledgeAuthorityClass activation_policy: "always_retrieve_if_matched" | "retrieve_when_uncertain" | "example_only" schema_version: "1.0" } ``` Examples of hint-card content: ```text If a complex analysis task is likely to produce a shallow first answer, consider a second-pass deepening step or red-team/revision pattern. If a module produces a memo that downstream modules cite or the user often opens later, suggest making it findable with summary. If a prompt arriving mid-process asks for a capability not mounted at design time, allow DOC24 runtime expansion only within the module's capability envelope. If source documents are long or likely to be revisited, prefer DOC25 retrieval tools over dumping all source text into ordinary context. ``` Hint cards are advisory. They may influence recommendations, pattern retrieval, and question policy, but they do not override validation, policy, or canonical module semantics. ### §8.12 TaskDesignDecisionCard The decision table must not be one giant table. It is decomposed into scoped decision cards. 
```ts TaskDesignDecisionCard { card_id: string decision_family: | "context_routing" | "control_flow" | "iteration" | "triggering" | "artifact_intake" | "capability_binding" | "model_selection" | "output_delivery" | "memory_policy" | "evaluation" question: string hard_rules: string[] heuristics: string[] anti_patterns: string[] required_validator_checks: string[] examples: MiniGraphExample[] source_refs: SpecSourceRef[] learned_overlay_refs: string[] schema_version: "1.0" } ``` ### §8.13 Example decision guidance #### Context routing ```text Use data_in for the object the module should operate on. Use context_in for reference material, constraints, source documents, formatting guidance, or background. Use instruction_in for dynamic task instructions when the module supports it. Use chain history for prior step outputs that should remain background. Use Carry Forward when a specific prior output must survive context decay and remain visible downstream. Use environment.config for task-wide ambient instructions or sources. ``` #### Loop vs continuation ```text Use Loop Controller when graph execution must re-run work until a condition is satisfied. Use session continuation when preserving the same agent session state is the goal. Use second-pass deepening when the goal is substantive improvement without requiring full loop machinery. Use Judge/Red-Team -> Revision when an evaluative critique should drive changes. ``` #### Hold / Delay / Human Review / Signal ```text Hold pauses until release signal/condition. Delay waits by time. Human Review waits for user decision. Signal sends control-flow pulse, not substantive content. Junction AND synchronizes multiple incoming paths. ``` #### Email trigger ```text Email trigger emits full email, body, and attachments through explicit ports. Attachments are files and may need DOC25 ingestion. Body text may go to data_in. Full email may go to context_in if headers/sender/thread matter. 
Do not infer hidden routing or summarization at the trigger. ``` ### §8.14 TaskWorkedExample ```ts TaskWorkedExample { example_id: string name: string user_request: string interpreted_intent: string selected_template_refs: string[] selected_preset_refs: string[] blueprint_ref: StorageRef graph_ref: StorageRef explanation: string why_not_alternatives: string[] validation_result_ref: StorageRef source_refs: SpecSourceRef[] schema_version: "1.0" } ``` Required example families: - Draft complaint. - Evaluate complaint. - Red-team document. - Email attachment summarizer. - Docket monitor. - Recurring research monitor. - File watcher -> analysis -> output. - Human review loop. - Judge/experiment setup. - Coding task. - Task-to-task handoff. - Library ingestion task. ### §8.15 Knowledge authority classes ```ts KnowledgeAuthorityClass { source_kind: | "generated_from_doc23_schema" | "generated_from_doc24_registry" | "architect_authored" | "user_taught" | "learned_from_trace" | "learned_from_assessment" | "llm_inferred_candidate" authority_level: | "hard_rule" | "strong_default" | "soft_heuristic" | "example" | "candidate" may_affect: | "validation" | "recommendation" | "template_matching" | "question_policy" | "artifact_policy" | "graph_generation" requires_review_before_activation: boolean schema_version: "1.0" } ``` ### §8.16 TaskKnowledgeDrift ```ts TaskKnowledgeDrift { drift_id: string drift_kind: | "module_added" | "module_removed" | "port_changed" | "config_field_changed" | "validation_rule_changed" | "route_changed" | "capability_registry_changed" | "template_changed" | "preset_changed" | "spec_hash_changed_without_card_update" | "card_source_missing" | "eval_regression" affected_cards: string[] severity: "info" | "warning" | "blocking" remediation: | "regenerate_pack" | "manual_card_review" | "run_eval_suite" | "block_activation" | "accept_noop" schema_version: "1.0" } ``` ### §8.17 Typed Task Agent tools Task Agent should call tools for current truth and 
validation. ```ts task_catalog.search_modules(query, constraints) task_catalog.get_module_card(module_type) task_catalog.get_config_schema(module_type) task_catalog.explain_config(module_type, field_name) task_catalog.search_patterns(query) task_catalog.search_antipatterns(query) task_templates.search(intent, context) task_templates.get_semantic_card(template_id) task_templates.compare_to_intent(template_id, task_intent) task_presets.search(module_type, intent) task_presets.get_semantic_card(preset_id) task_graph.validate(graph) task_graph.simulate(graph, sample_inputs) task_graph.preflight(blueprint_or_graph) task_graph.explain(graph) task_graph.diff(old_graph, new_graph) task_capabilities.resolve_needs(blueprint, graph) task_capabilities.check_availability(capability_ids) task_capabilities.suggest_for_module(module_type, intent) task_models.recommend(module_type, task_type, budget, latency, quality, privacy) task_runs.inspect(run_id) task_artifacts.search(run_id_or_task_id, query) task_spec.lookup(query, owner_doc, expected_object_kind?) ``` ### §8.18 Task Agent Design Eval Suite Task Agent proposals must be evaluated. Evals should include representative user requests and expected task-design properties. ```ts TaskAgentDesignEvalSuite { suite_id: string examples: TaskDesignEvalExample[] scorers: TaskDesignScorer[] baseline_task_agent_version: string candidate_task_agent_version: string result_ref?: StorageRef schema_version: "1.0" } ``` Scorers should cover: - template reuse correctness; - module choice correctness; - port routing correctness; - context routing correctness; - capability binding correctness; - artifact policy correctness; - human gate/side-effect safety; - graph validity; - preflight quality; - question quality; - user preference alignment; - explanation clarity. ### §8.19 Anti-monolithic prompt / skill-file rule Task Agent must not be educated primarily by one huge `skill.md`, one enormous system prompt, or pasted full specs. 
Those may exist as export/debug/fallback artifacts, but they are not the primary runtime education substrate. Required pattern: ```text Small stable Task Agent Core Contract + generated canonical module/port/config/validation cards + authored design doctrine + rationale cards + decision cards + worked examples + prior-task design cases + template/preset semantic cards + DOC24 live capability/model registry + DOC72 goal/entity/preference links + DOC8/BDSM learning bundles + typed lookup/validation/preflight tools ``` Full specs remain available through `task_spec.lookup`, but the Task Agent should retrieve exact sections only when needed. ### §8.19A Spec Librarian behavior Specs are source truth, but full-spec prompt stuffing is forbidden. Task Agent should use generated cards first, typed tools second, and exact spec retrieval only when needed. ```ts task_spec.lookup({ query: string owner_doc?: "DOC23" | "DOC24" | "DOC72" | "DOC25" | "DOC73" | "DOC3" | "DOC8" | "EC" expected_object_kind?: | "module" | "port" | "config" | "validation" | "route" | "storage" | "capability" | "procedure" | "pattern" | "policy" }) ``` Task Agent may enter a Spec Librarian behavior when exact detail is needed: ```text Show me the operative definition of utility.hold. Show me Agent Task named output behavior. Show me email trigger ports. Show me Loop Controller stop semantics. Show me whether context_in cascades. Show me validation rules for cycles. ``` Spec lookup output should be cited internally in the Task Agent proposal record so later audits can trace which spec/card/tool informed the proposed design. ### §8.20 Required decision-card families and minimum examples The Knowledge Pack must include decision cards for at least the following families. These cards are where “when to use this” guidance lives. The space is too broad for one giant decision table, so each family is cardized and backed by examples. 
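A coverage check for this requirement could look roughly like the sketch below, which reports decision families that have no card or whose cards lack examples. The `decision_family` values are copied from `TaskDesignDecisionCard` in §8.12; the `findCoverageGaps` helper, the `CoverageGap` shape, the default of one example per family, and any mapping from this section's headings to those family values are assumptions.

```typescript
// Values copied from the decision_family union in §8.12.
type DecisionFamily =
  | "context_routing"
  | "control_flow"
  | "iteration"
  | "triggering"
  | "artifact_intake"
  | "capability_binding"
  | "model_selection"
  | "output_delivery"
  | "memory_policy"
  | "evaluation";

// Subset of TaskDesignDecisionCard fields used by this sketch.
interface DecisionCardSubset {
  card_id: string;
  decision_family: DecisionFamily;
  examples: unknown[];
}

// Hypothetical report row; not a spec object.
interface CoverageGap {
  family: DecisionFamily;
  reason: "missing_card" | "too_few_examples";
}

function findCoverageGaps(
  cards: DecisionCardSubset[],
  required: DecisionFamily[],
  minExamples: number = 1
): CoverageGap[] {
  const gaps: CoverageGap[] = [];
  for (const family of required) {
    const matching = cards.filter((c) => c.decision_family === family);
    if (matching.length === 0) {
      gaps.push({ family, reason: "missing_card" });
      continue;
    }
    // A family counts as backed if at least one of its cards carries enough examples.
    const backed = matching.some((c) => c.examples.length >= minExamples);
    if (!backed) {
      gaps.push({ family, reason: "too_few_examples" });
    }
  }
  return gaps;
}
```

An empty result would be one input to pack readiness; a non-empty result could feed the `missing_required_cards` list of a readiness report.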
#### Context routing and context survival ```text Use data_in when the downstream module should operate on the item. Use context_in when the downstream module needs reference material, source documents, constraints, formatting guidance, or background. Use instruction_in only when the module supports dynamic instructions and the input is instruction-like. Use chain history for ordinary prior outputs that can be treated as background. Use Carry Forward when a specific prior output must survive context decay and remain visible downstream. Use environment.config for task-wide ambient context or standing task instructions. Use DOC24 runtime expansion when upstream content reveals an unanticipated context/tool need. ``` #### Agent Task output shaping ```text Use separate named outputs when downstream modules need distinct artifacts, such as issue matrix, damages memo, draft complaint, red-team prompt, or file output. Use file_only_out when downstream consumers need actual FileRefs. Use signal_out for completion/control flow, not substantive content. Use error_out for failure routing. Use Task Artifact Index entries for every material output, even if the output remains run-scoped. Avoid forcing one large markdown blob downstream when separate outputs would improve routing, memory policy, or assessment. ``` #### Iteration, loops, and session continuation ```text Use Loop Controller when graph execution must repeat until a condition, judge, review, or iteration limit is satisfied. Use session continuation when preserving the same agent working session is the goal. Use second-pass deepening when the goal is deeper analysis and missed-issue reduction without a full loop. Use Judge/Red-Team -> Revision when critique should drive a revised output. Use fresh isolated sessions for independent variants or experiments where context bleed would distort comparison. 
``` #### Holds, gates, waits, and signals ```text Use Human Review when a user decision, approval, rejection, or revision is required. Use Hold when the graph must wait for a release signal/condition without implying a substantive human judgment. Use Delay for time-based wait. Use Signal for control-flow pulse only. Use Junction AND to synchronize independent incoming paths. Do not hide waits inside Task Agent; waits must remain graph-visible. ``` #### Triggers and email workflows ```text Email trigger emits explicit full-email, body, and attachment outputs. Route email body to data_in when the body is the thing to analyze. Route full email to context_in when sender, headers, thread, or surrounding metadata matter. Route attachments as FileRefs and consider DOC25 processing when they are documents. Do not imply trigger-level summarization. Add an Agent Task or Transform module for summarization. Email sends and other external outputs should generally require human review unless policy says otherwise. ``` #### Capability, connector, skill, procedure, and MCP use ```text Declare known required/allowed/denied capabilities at module config time. Let DOC24 resolve live tools/connectors/MCP servers/procedures/models. Allow runtime expansion only inside the module's policy envelope, with receipts. Use DOC3 procedures when the needed behavior is a learned semantic procedure. Use MCP servers as capability providers, not raw unlimited tool lists. Use capability utility learning for recommendation/ranking/warning, not silent deletion of explicit user bindings. ``` #### Model selection ```text Use stronger models for final drafting, high-stakes analysis, red-team, and judges. Use cheaper/faster models for classification, routing, extraction pre-pass, simple transforms, and low-risk summaries. Use local-only models when PropA/EC policy or sensitivity requires local execution. 
Use coding ACP module only when actual coding workspace, filesystem, terminal, diff, or test operations are needed. Always snapshot the effective model and capability state at run start. ``` ### §8.21 Knowledge Pack self-update authority rule The Knowledge Pack is self-updating only through compiler and activation mechanics. It may receive input from prior tasks, user edits, assessments, and learning signals, but candidate updates must be authority-classified: ```text Generated canonical mechanics Authority: hard rule. Source: DOC23 spec/code/schema/catalog/validator. Auto-update via compiler. Architect-authored doctrine and rationale Authority: hard rule or strong default depending on card. Source: architect/user-authored guidance. Updates require explicit approval unless marked editable soft guidance. Learned heuristics and prior-task cases Authority: soft heuristic, example, or candidate unless promoted. Source: TaskCreationSessionTrace, TaskAssessment, DOC8/BDSM, user edits. Updates may affect recommendations, not canonical semantics. LLM-inferred candidates Authority: candidate only. Source: extraction/synthesis. Requires review or eval before activation. ``` Task Agent must never treat learned examples as changes to port semantics, validation rules, side-effect policy, route truth, or live connector availability. ### §8.22 TaskAgentInstructionCandidate Task Agent's own instructions may improve, but only through proposal and evaluation. 
```ts TaskAgentInstructionCandidate { candidate_id: string target_layer: | "core_contract" | "design_doctrine" | "pattern_card" | "hint_card" | "module_card_annotation" | "question_policy" | "template_matching_rule" proposed_change: string rationale: string evidence_refs: StorageRef[] supporting_signal_refs: string[] counter_evidence_refs: StorageRef[] eval_suite_ref?: string eval_result_ref?: string approval_required: boolean rollback_ref?: StorageRef lifecycle_state: "candidate" | "staged" | "approved" | "rejected" | "applied" | "rolled_back" schema_version: "1.0" } ``` The Task Agent must not silently rewrite its core contract. Low-risk hint-card or semantic-card additions may auto-stage under policy; changes to core hard rules require architect/user approval. --- ## §8A — Task Knowledge Pack boundary and readiness ### §8A.1 TKP scope correction The Task Knowledge Pack (TKP) is the Task Agent's detailed task-system reference and design substrate. It is not: ```text the ambient runtime router; the Task Agent core contract; the DOC24 invocation doctrine; a monolithic prompt; a DOC73 user library; an ordinary prompt artifact; or a replacement for live schema/validator tools. ``` It may reference the active Task Agent Core Contract and invocation doctrine for auditability, but it does not own them. ### §8A.2 Owner split | Artifact | Owner / role | |---|---| | Task Agent Core Contract | System-agent config; stable hard rules, permissions, prohibitions. | | DOC24 Ambient Task Awareness Card | Runtime packet assembly; tiny direct-first rule. | | Task Mode Resolver Policy | DOC24/EC; task signals, vetoes, opportunity packet, token caps. | | Task Invocation Directives | DOC72/DOC24; learned/user-taught routing guidance. | | TKP | DOC23/EC; Task Agent design/reference substrate. | | Task Agent Eval Suite | DOC23/Addenda A/EC; tests design/review/invocation quality. | ### §8A.3 TKP content TKP SHOULD contain: ```text 1. Module cards 2. Port cards 3. 
Config rationale cards 4. Context-routing cards 5. Loop / hold / signal / junction / switch decision cards 6. Trigger cards 7. Output/artifact intake cards 8. Capability binding cards 9. DOC24 runtime-expansion cards 10. DOC72 memory integration cards 11. DOC25/DOC73 artifact-routing cards 12. Model-selection cards 13. Worked examples 14. Anti-patterns 15. Task template semantic cards 16. Module preset semantic cards 17. Task design pattern cards 18. Task design casebook 19. User preference overlays 20. Goal/rationale links 21. Prompt design guide cards 22. Prompt evaluation guide cards 23. Spec source references 24. Coverage and drift reports ``` ### §8A.4 TKP usage TKP is used for: ```text designing tasks reviewing existing tasks explaining graphs answering task-system questions generating blueprints generating graph diffs validating config choices comparing templates/presets improving task prompts creating prompt-evaluation tasks learning from prior task-design cases ``` TKP is not normally used for: ```text deciding every ordinary chat turn ambient Elnor prompt context simple direct answers OpenClaw heartbeat/session monitoring ordinary tool routing ``` ### §8A.5 TKP readiness ```ts export type TaskKnowledgePackReadiness = { pack_id: string; readiness_state: | "schema_only" | "canonical_cards_generated" | "core_rationale_cards_authored" | "core_patterns_authored" | "worked_examples_loaded" | "task_invocation_cards_loaded" | "prompt_design_cards_loaded" | "eval_suite_passing" | "production_task_agent_ready"; activated_for: | "inspection_only" | "task_review" | "task_design_draft_only" | "prompt_review" | "task_design_production"; missing_required_cards: string[]; stale_source_refs: string[]; eval_failures: string[]; schema_version: "1.0"; }; ``` Task Agent MUST NOT be considered production-ready for task design until TKP has at least: ```text module cards for every operative DOC23 module; port/config cards for all agent-capable, utility, trigger, output, system, 
and evaluation modules;
pattern cards for context routing, loops, holds, signals, triggers, outputs, artifact policy, capability binding, session continuation, human gates, and prompt quality;
template/preset semantic cards;
worked examples;
Task Agent eval suite;
drift detection against DOC23 schema/code changes.
```

### §8A.6 Self-updating TKP

“Self-updating” means deterministic or governed regeneration, not ad hoc LLM self-rewriting.

```text
Spec/code/schema/registry changes
  -> Task Knowledge Compiler detects drift
  -> new candidate TKP generated
  -> coverage and validation checks run
  -> Task Agent eval suite runs
  -> EC activates or stages/rejects
  -> active pack pointer updates only on pass/approval
```

```ts
export type TaskKnowledgeCompilerRun = {
  compiler_run_id: string;
  input_sources: TaskKnowledgeSourceRef[];
  generated_pack_id: string;
  source_hashes: Record;
  generation_mode:
    | "spec_change"
    | "code_schema_change"
    | "module_catalog_change"
    | "capability_registry_change"
    | "template_preset_change"
    | "learning_bundle_refresh"
    | "manual_rebuild";
  validation_result_ref: StorageRef;
  eval_result_ref?: StorageRef;
  activation_decision:
    | "activated"
    | "staged"
    | "blocked_validation_failure"
    | "blocked_eval_regression"
    | "blocked_source_drift";
  prior_active_pack_id?: string;
  created_at: string;
  schema_version: "1.0";
};
```

### §8A.7 TKP as internal knowledge repository

TKP may use DOC73-like mechanics such as cardization, source references, versioned artifacts, indexed retrieval, and deep lookup. It should not be presented as a normal user-facing DOC73 library. It is an internal system-operation knowledge repository owned by DOC23/EC and consumed by Task Agent.

---

## §9 — Task design learning and self-improvement

### §9.0 V3 FINAL coordination signal envelope (R0.7)

Per Addenda A ↔ Addenda B coordination V3 FINAL §2.7 and §2.11, all task-design learning signals wrap in the unified `EvaluationLearningSignalEnvelope` (DOC23 Evaluation Common Contracts §5).
This Core owns the emission of five Phase 1 signal types; signal payloads carry the task design signature for cross-run correlation.

**Eight Phase 1 signal types (per coordination V3 §2.11):**

| Signal type | Owning addendum | Emitter | Consumer |
|---|---|---|---|
| `outcome_evaluation` | Addenda B Core R0.7.1 (this doc) | V3.3.1 Outcome Evaluator | DOC8/BDSM, Pattern primitive |
| `repair_cycle` | Addenda B Core R0.7.1 | V3.3.1 Revisor | DOC8/BDSM, Pattern primitive |
| `task_process_gap_runtime` | Addenda B Core R0.7.1 | Task Agent or V3.3.1 Revisor | Task Agent (consume), DOC8/BDSM |
| `taint_clearance` | Addenda B Core R0.7.1 | V3.3.1 Revisor or user-action surfaces | DOC8/BDSM (privilege learning) |
| `hard_call_resolution` | Addenda B Core R0.7.1 | V3.3.1 Revisor when Hard Calls resolve | DOC8/BDSM (escalation learning) |
| `prompt_comparison` | Addenda A R4.1 V3 | Experiment | DOC8/BDSM (prompt learning) |
| `task_design_correlation` | DOC8/BDSM | BDSM correlation analysis | Task Agent (suggestion surfacing) |
| `user_action` | R0.6.4 §24A.7 (continues) | UI surfaces | Pattern primitive |

**Common envelope per Common Contracts §5.1:**

```ts
EvaluationLearningSignalEnvelope {
  signal_id: string
  signal_type: SignalType              // per table above
  task_id, run_id: string
  evaluation_chain_id?: string
  source_module_id: string
  source_activation_seq: number

  // Governance — gated by EC Core (OBL-XDOC-EC-POLICY-SIGNALS-01)
  governance_policy_ref: string
  source_policy_snapshot_ref?: StorageRef
  data_class: "public" | "internal" | "privileged" | "local_only"
  matter_id?: string
  pattern_promotion_eligible: boolean

  // Model context — for cheap-LLM learning mode (V3.3 §6.16)
  model_class: "cheap_local" | "cheap_api" | "medium" | "expensive_frontier"
  model_fingerprint: string

  // Task design context — optional; populated for signals where task design
  // is relevant (per coordination V3 §2.7)
  task_design_signature?: {
    graph_topology_hash: string
    upstream_module_types: string[]
    upstream_module_version_constraints?: Record
    segment_ids?: string[]
    task_blueprint_ref?: string        // links to §6 Task Blueprint
                                       // enables blueprint-level correlation
  }

  emitted_at: ISO8601
  payload_ref: StorageRef              // typed signal-specific payload
  schema_version: 1
}
```

**Signal payloads owned by Core R0.7.1:**

The five Core-owned signal payloads are specified below. Each is wrapped in `EvaluationLearningSignalEnvelope` at emit time.

#### §9.0.1 OutcomeEvaluationSignal payload

```ts
OutcomeEvaluationSignal {
  evaluator_result_ref: string         // EvaluationResultEnvelope.result_id
  outcome_spec_ref: string             // EvaluationOutcomeDefinition reference
  outcome_spec_template_id?: string    // critical for cross-run learning
  verdict: "passed" | "failed" | "indeterminate"
  findings_summary: {
    total_count: number
    critical_count: number
    major_count: number
    minor_count: number
    by_criterion_id: Record
  }
  evidence_status_summary: Record
  schema_version: 1
}
```

Emitted by V3.3 Evaluator on every completed evaluation.

#### §9.0.2 RepairCycleSignal payload (full Phase 1 form)

Per coordination V3 §2.11, RepairCycleSignal emits the full Phase 1 form including `qualitative_delta` (always populated), per-criterion score deltas when Judge is in the loop, and `taint_evolution` per V3.1's actual taint model.

```ts
RepairCycleSignal {
  initial_evaluator_result_ref: string   // EvaluationResultEnvelope.result_id
  final_evaluator_result_ref: string
  before_snapshot_ref: StorageRef        // V3.3 §5.16 EvaluationSnapshot
  after_snapshot_ref: StorageRef
  revisor_actions: RevisorActionRecord[] // per V3.3 §2.7 RevisorActionKind
  cycle_outcome:
    | "passed_after_repair"
    | "still_failing"
    | "diverged"
    | "max_iterations"
    | "escalated_to_human"               // when Hard Call surfaces during cycle

  // Qualitative delta — ALWAYS populated (Evaluator data sufficient)
  qualitative_delta: {
    findings_before: number
    findings_after: number
    critical_findings_before: number
    critical_findings_after: number
    major_findings_before: number
    major_findings_after: number
    minor_findings_before: number
    minor_findings_after: number
    resolved_finding_ids: string[]
    new_finding_ids: string[]
    per_criterion_finding_counts_before: Record
    per_criterion_finding_counts_after: Record
  }

  // Numeric delta — populated when Judge ran in the loop (Pattern A, B, or C)
  per_criterion_score_deltas: Record
    attribution: AttributionRecord[]
  }>

  // Taint evolution — per coordination V3 §2.11
  taint_evolution: {
    taint_at_initial_evaluation: TaintClass
    taint_at_final_evaluation: TaintClass
    taint_transitions: Array<{
      action_id: string
      transition_kind:
        | "expanded"
        | "cleared"
        | "unchanged"
        | "isolated_to_candidate"        // V3.1 P30 sandboxed eval pathway

      // For "expanded":
      expanded_from?: TaintClass
      expanded_to?: TaintClass
      expansion_source_kind?:
        | "new_source_pulled"
        | "tainted_revision_input"
        | "ungoverned_user_input"
        | "sub_agent_advice_taint_inheritance"

      // For "cleared":
      clearance_method?: TaintClearanceMethod  // V3.3 §0.4 typed enum
      clearance_record_ref?: StorageRef
      cleared_by_actor_ref?: ActorRef
      cleared_by_access_tier?: AccessTier      // V3.1 P19 tier-binding

      // For "isolated_to_candidate":
      candidate_version_ref?: StorageRef

      // Cross-link to per-event signals
      related_taint_clearance_signal_ids?: string[]
    }>
  }

  // Semantic guards — required when Judge ran
  metric_semantics_version?: string
  scorer_hash?: string
  evaluation_method?: string
  assurance_basis?: AssuranceBasis[]

  schema_version: 1
}

RevisorActionRecord {
  action_id: string
  action_kind: RevisorActionKind                  // derived projection per V3.3 §2.6
  underlying_step_kind: RevisionPlanStepKind      // V3.1 raw enum
  underlying_strategy_kind: RepairStrategyKind    // V3.1 raw enum
  targeted_criterion_ids: string[]                // empty when global plan context
  targeted_finding_ids: string[]
  action_scope_ref: ArtifactScopeRef              // Common Contracts §7
  revision_operation_receipt_ref: StorageRef      // V3.3 §11.6 receipt linkage
  schema_version: 1
}

AttributionRecord {
  action_id: string
  attribution_basis:
    | "scope_match"
    | "criterion_targeted"
    | "finding_targeted"
    | "user_confirmed"
    | "global_plan_context"
    // explicitly excludes "revisor_self_assessment" per V3.1 P20 sycophancy
    // mitigation; Revisor's own goal-impact assessment NEVER feeds learning
  attribution_confidence: number                  // 0..1
}
```

Emitted by V3.3.1 Revisor at outermost cycle closure (post-evaluation after final revision attempt).

#### §9.0.3 TaskProcessGapSignal payload (runtime)

```ts
TaskProcessGapSignal {
  detected_during_run_id: string
  gap_kind:
    | "missing_module"
    | "missing_capability"
    | "missing_source_module"
    | "missing_verification"
    | "missing_handoff"
    | "policy_blocked_path"
    | "capability_version_mismatch"
  observed_failure_kind?: FailureKind             // V3.1 §0.4
  affected_outcome_refs: string[]
  suggested_remediation?: TaskGraphPatchProposal
  schema_version: 1
}
```

Emitted by Task Agent or V3.3.1 Revisor when a runtime gap is observed mid-execution. Task Agent consumes both this signal (its own emission) and aggregate `task_design_correlation` signals from DOC8/BDSM to surface task-design suggestions to the user.
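
As a rough illustration of how runtime gap signals might be rolled up before Task Agent surfaces a suggestion, the sketch below groups signals by `gap_kind` and keeps only kinds seen in several distinct runs. `GapSignalLite`, `summarizeRecurringGaps`, and the `minRuns` threshold are hypothetical names introduced for this example; the spec does not define an aggregation rule, so treat this as one possible reading rather than the governed behavior.

```typescript
// Hypothetical, trimmed-down view of TaskProcessGapSignal (only the fields used here).
type GapKind =
  | "missing_module"
  | "missing_capability"
  | "missing_source_module"
  | "missing_verification"
  | "missing_handoff"
  | "policy_blocked_path"
  | "capability_version_mismatch";

interface GapSignalLite {
  detected_during_run_id: string;
  gap_kind: GapKind;
  affected_outcome_refs: string[];
}

interface GapSummary {
  gap_kind: GapKind;
  run_ids: string[];               // distinct runs in which the gap was observed
  affected_outcome_refs: string[]; // distinct outcomes touched across those runs
}

// Appends a value only if it is not already present.
function pushUnique(list: string[], value: string): void {
  if (list.indexOf(value) === -1) list.push(value);
}

// Groups signals by gap_kind and keeps kinds seen in at least `minRuns` distinct runs.
// `minRuns` is an illustrative threshold, not a value taken from this spec.
function summarizeRecurringGaps(signals: GapSignalLite[], minRuns: number): GapSummary[] {
  const byKind: { [kind: string]: GapSummary } = {};
  for (const s of signals) {
    let summary = byKind[s.gap_kind];
    if (!summary) {
      summary = { gap_kind: s.gap_kind, run_ids: [], affected_outcome_refs: [] };
      byKind[s.gap_kind] = summary;
    }
    pushUnique(summary.run_ids, s.detected_during_run_id);
    for (const ref of s.affected_outcome_refs) pushUnique(summary.affected_outcome_refs, ref);
  }
  return Object.keys(byKind)
    .map((kind) => byKind[kind])
    .filter((g) => g.run_ids.length >= minRuns)
    .sort((a, b) => b.run_ids.length - a.run_ids.length);
}
```

Counting distinct runs rather than raw signals keeps one noisy run from looking like a recurring design gap.
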

#### §9.0.4 TaintClearanceSignal payload

```ts
TaintClearanceSignal {
  cleared_artifact_ref: StorageRef
  cleared_artifact_version_ref: StorageRef
  cleared_from: TaintClass
  cleared_to: TaintClass                  // typically "system_trusted" after clearance
  clearance_method: TaintClearanceMethod  // V3.3 §0.4
  clearance_record_ref: StorageRef        // V3.1 TaintClearanceRecord
  cleared_by_actor_ref: ActorRef
  cleared_by_access_tier: AccessTier      // V3.1 P19 tier-binding
  scope_cleared: ArtifactScopeRef         // Common Contracts §7

  // Context for learning
  artifact_kind: string
  domain_tags?: string[]

  schema_version: 1
}
```

Emitted when taint is actively cleared (not when isolated to candidate or expanded — those are tracked in RepairCycleSignal's taint_transitions). Enables BDSM to learn clearance patterns and to detect privilege-escalation patterns (junior users clearing more frequently than supervisor-tier policy expects).

#### §9.0.5 HardCallResolutionSignal payload

```ts
HardCallResolutionSignal {
  hard_call_ref: string                   // V3.3 §7.9 HardRevisionCall ID
  resolution_at: ISO8601
  hard_call_kind: HardRevisionCallKind    // V3.1 §0.4
  resolution_path:
    | "accept_proposed"
    | "modify_proposed"
    | "defer"
    | "reject"
    | "escalate_to_supervisor"
  resolution_actor_ref: ActorRef
  resolution_actor_access_tier: AccessTier
  resolution_rationale_summary?: string   // user-stated reason (when provided)

  // For learning correlation
  preceding_revision_plan_ref?: StorageRef  // plan that triggered the Hard Call

  schema_version: 1
}
```

Emitted when a Hard Revision Call resolves. Enables BDSM to learn when to surface Hard Calls vs. auto-resolve, and to detect resolution-path patterns by user/tier.
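
To make "resolution-path patterns by user/tier" concrete, the following sketch tallies `resolution_path` values per access tier and reports the share of a tier's resolutions that took a given path. It is a minimal sketch under stated assumptions: `HardCallResolutionLite`, `tallyResolutionPaths`, and `pathShare` are hypothetical helpers, `AccessTier` is simplified to a plain string because its definition lives outside this document, and the tier names in the usage lines are made up.

```typescript
type ResolutionPath =
  | "accept_proposed"
  | "modify_proposed"
  | "defer"
  | "reject"
  | "escalate_to_supervisor";

// Hypothetical reduced view of HardCallResolutionSignal.
interface HardCallResolutionLite {
  hard_call_ref: string;
  resolution_path: ResolutionPath;
  resolution_actor_access_tier: string; // stand-in for AccessTier (defined elsewhere)
}

// tier -> resolution path -> count
type PathTally = { [tier: string]: { [path: string]: number } };

// Counts how often each resolution path was taken, per access tier.
function tallyResolutionPaths(signals: HardCallResolutionLite[]): PathTally {
  const tally: PathTally = {};
  for (const s of signals) {
    const tier = s.resolution_actor_access_tier;
    if (!tally[tier]) tally[tier] = {};
    tally[tier][s.resolution_path] = (tally[tier][s.resolution_path] || 0) + 1;
  }
  return tally;
}

// Share of a tier's resolutions that took `path`; returns 0 when the tier has no data.
function pathShare(tally: PathTally, tier: string, path: ResolutionPath): number {
  const paths = tally[tier];
  if (!paths) return 0;
  let total = 0;
  for (const key of Object.keys(paths)) total += paths[key];
  return total === 0 ? 0 : (paths[path] || 0) / total;
}

// Usage with made-up tier names and refs.
const exampleTally = tallyResolutionPaths([
  { hard_call_ref: "hc-1", resolution_path: "accept_proposed", resolution_actor_access_tier: "junior" },
  { hard_call_ref: "hc-2", resolution_path: "escalate_to_supervisor", resolution_actor_access_tier: "junior" },
]);
const exampleEscalationShare = pathShare(exampleTally, "junior", "escalate_to_supervisor");
```

A similar per-tier tally over `TaintClearanceSignal.cleared_by_access_tier` could support the clearance-frequency check described in §9.0.4, though the comparison against policy expectations is left open here.
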

#### §9.0.6 Signal emission and consumption flow

```
[V3.3 Evaluator]  → OutcomeEvaluationSignal →
[V3.3.1 Revisor]  → RepairCycleSignal,
                    TaskProcessGapSignal,
                    TaintClearanceSignal,
                    HardCallResolutionSignal →
                          ↓
              [EC Core policy gate]
                          ↓
            [DOC8/BDSM signal stream]
                          ↓
      [TaskDesignCorrelationSignal aggregation]
                          ↓
          [DOC72 Pattern primitive store]
                          ↓
         [Task Agent surfaces suggestions]
                          ↓
                 [User in §20 UI]
```

EC Core's compiled policy engine (OBL-XDOC-EC-POLICY-SIGNALS-01) gates every signal at the envelope layer based on `data_class`, `matter_id`, and `pattern_promotion_eligible`. Privileged-matter signals do not auto-promote to durable learning; matter-scoped signals do not auto-cross matter boundaries (per V3.3 §13.4 firewall).

### §9.1 Highest-value learning target

The highest-value self-learning for DOC23 is **task creation quality**, not merely post-run error detection. The system should learn how to convert underspecified user intent into rich, reliable, reusable processes.

### §9.2 What to learn

Task-design learning should capture:

1. Which templates match which user intents.
2. Which module presets work in which task types.
3. Which optional process steps improve outcomes.
4. Which user questions are useful or annoying.
5. Which output archetypes the user prefers.
6. Which artifact-intake defaults are valuable.
7. Which capabilities/connectors/tools are useful in task contexts.
8. Which human review points prevent errors.
9. Which model strategies balance cost/quality.
10. Which graph patterns create reliable long-running workflows.
11. Which Task Agent suggestions are accepted or rejected.
12. Which prior task-design decisions served which goals.

### §9.3 What not to learn automatically

The system must not learn canonical mechanics by inference.
It must not infer or mutate:

- technical meaning of ports;
- config field semantics;
- validation rules;
- side-effect policy;
- route paths;
- live connector availability;
- security policy overrides;
- executable graph rewrites;
- template/preset material mutations.

Those are generated/canonical or approval-gated.

### §9.4 TaskCreationSessionTrace

```ts
TaskCreationSessionTrace {
  design_session_id: string
  user_initial_request: string
  resolved_intent: string
  inferred_task_type: string
  inferred_domain?: string
  context_class_key: string
  templates_considered: TemplateCandidate[]
  presets_considered: PresetCandidate[]
  design_patterns_considered: PatternCandidate[]
  design_cases_considered: TaskDesignCaseRef[]
  goals_considered: GoalRef[]
  questions_asked: TaskAgentQuestionTrace[]
  user_answers: TaskAgentAnswerTrace[]
  proposed_blueprint_ref: StorageRef
  proposed_graph_ref: StorageRef
  final_graph_ref?: StorageRef
  user_edits: TaskCreationUserEdit[]
  accepted_suggestions: string[]
  rejected_suggestions: string[]
  capability_plan_ref: StorageRef
  artifact_policy_ref: StorageRef
  preflight_report_ref?: StorageRef
  final_saved_task_id?: string
  final_template_id?: string
  later_run_refs: string[]
  outcome:
    | "saved_as_is"
    | "saved_with_user_edits"
    | "abandoned"
    | "restarted"
    | "rejected"
  learning_signal_refs: string[]
  schema_version: "1.0"
}
```

### §9.5 TaskAgentDesignLearningSignal

```ts
TaskAgentDesignLearningSignal {
  signal_id: string
  design_session_id?: string
  task_id?: string
  run_id?: string
  task_type: string
  context_class_key: string
  signal_kind:
    | "template_selected"
    | "template_rejected"
    | "preset_added"
    | "preset_removed"
    | "question_helpful"
    | "question_annoying"
    | "output_preference_confirmed"
    | "artifact_policy_changed"
    | "capability_needed"
    | "capability_unused"
    | "graph_pattern_success"
    | "graph_pattern_failure"
    | "user_manual_edit_after_proposal"
    | "second_pass_deepening_accepted"
    | "second_pass_deepening_rejected"
    | "red_team_default_accepted"
    | "red_team_default_rejected"
  evidence_ref: StorageRef
  recommended_update_kind:
    | "semantic_card_update"
    | "pattern_card_update"
    | "template_update_proposal"
    | "preset_update_proposal"
    | "task_agent_instruction_candidate"
    | "question_policy_update"
    | "none"
  schema_version: "1.0"
}
```

### §9.6 TaskDesignCase

Prior user-created, user-liked, user-refined, or high-value tasks should become design cases.

```ts
TaskDesignCase {
  case_id: string
  source_task_id: string
  source_run_ids: string[]
  source_blueprint_id?: string
  task_type: string
  domain?: string
  context_class_key: string
  user_intent_summary: string
  final_blueprint_summary: string
  design_decisions: TaskDesignDecision[]
  user_refinements: UserRefinementTrace[]
  outcome_summary: string
  assessment_refs: string[]
  reusable_lessons: ReusableDesignLesson[]
  anti_lessons: ReusableAntiPatternLesson[]
  goal_refs: GoalRef[]
  artifact_refs: ArtifactRef[]
  template_or_preset_refs: string[]
  lifecycle_state: "candidate" | "active" | "archived"
  schema_version: "1.0"
}
```

### §9.7 TaskDesignDecision

```ts
TaskDesignDecision {
  decision_id: string
  module_id?: string
  graph_region_ref?: string
  selected_design: string
  alternatives_considered: string[]
  rejected_alternatives: string[]
  because: string
  goal_refs: GoalRef[]
  expected_effect: string
  actual_effect?: string
  evidence_refs: StorageRef[]
  confidence: number
  schema_version: "1.0"
}
```

Example:

```text
Decision: Use Loop Controller after initial complaint draft.
Because: user wanted iterative refinement rather than one-pass analysis.
Goal served: improve substantive depth and catch missed pleading defects.
Actual effect: red-team feedback was incorporated into revised complaint.
```

### §9.8 TaskDesignExtractionBundle

```ts
TaskDesignExtractionBundle {
  bundle_id: string
  source_kind:
    | "task_creation_session"
    | "task_run"
    | "task_assessment"
    | "user_refined_task"
    | "template_update"
    | "manual_architect_note"
  extracted_design_cases: TaskDesignCase[]
  extracted_design_patterns: TaskDesignPatternCard[]
  extracted_user_preferences: TaskDesignPreferenceCandidate[]
  extracted_goal_links: TaskGoalLinkCandidate[]
  extracted_template_updates: TemplateUpdateProposal[]
  extracted_preset_updates: TemplateUpdateProposal[]
  confidence: number
  review_required: boolean
  schema_version: "1.0"
}
```

### §9.9 TaskDesignImprovementProposal

```ts
TaskDesignImprovementProposal {
  proposal_id: string
  proposal_kind:
    | "task_agent_instruction_candidate"
    | "design_pattern_card_update"
    | "anti_pattern_card_update"
    | "template_semantic_card_update"
    | "module_preset_semantic_card_update"
    | "task_template_update"
    | "module_preset_update"
    | "output_archetype_default_update"
    | "artifact_policy_default_update"
    | "question_policy_update"
  target_ref: string
  rationale: string
  evidence_refs: StorageRef[]
  support_count: number
  counter_evidence_refs: StorageRef[]
  proposed_patch_ref: StorageRef
  prior_hash: string
  candidate_hash: string
  requires_eval: boolean
  eval_suite_ref?: string
  eval_result_ref?: string
  approval_policy:
    | "auto_apply_low_risk"
    | "user_review"
    | "architect_review"
    | "never_auto_apply"
  lifecycle_state:
    | "candidate"
    | "staged"
    | "approved"
    | "rejected"
    | "applied"
    | "rolled_back"
  schema_version: "1.0"
}
```

### §9.10 Second-pass deepening pattern

The user has stated a preference for follow-up prompts such as “look deeper” in complex analysis tasks. This must be represented as a pattern/preference, not merely remembered as chat text.

```ts
TaskDesignPatternCard {
  pattern_id: "complex_analysis_second_pass_deepening"
  display_name: "Second-Pass Deepening"
  purpose: "Improve complex analytical outputs by requiring a follow-up pass that looks for missed issues, deeper implications, weaknesses, and revisions."
  when_to_use: [
    "complex legal analysis",
    "brief or complaint evaluation",
    "multi-document synthesis",
    "strategic analysis where first-pass answers may be shallow"
  ]
  when_not_to_use: [
    "simple extraction",
    "low-cost quick answer",
    "deterministic formatting tasks"
  ]
  canonical_graph_shapes: [
    "Agent Task -> Agent Task deeper review -> revision/final output",
    "Agent Task -> Judge/Red Team -> revision Agent Task",
    "Agent Task -> Loop Controller with deepening instructions"
  ]
  user_preference_refs: ["Will prefers second-pass deepening in complex analysis tasks"]
  goal_refs: ["increase substantive depth", "reduce missed issues", "surface hidden weaknesses"]
  schema_version: "1.0"
}
```

### §9.11 Question utility and output archetype learning

Task Agent should learn that some questions and output suggestions are routinely useful.

Examples:

```text
Complaint evaluation:
  default outputs: executive summary + issue matrix + revision plan.

Draft complaint:
  default outputs: complaint draft + damages memo + red-team report + revision summary.

Docket monitoring:
  default outputs: filing summary + deadline table + alert.
```

### §9.12 Capability utility learning

Capability utility learning is not mainly for deleting tools from saved tasks. It is used for:

- Task Agent design recommendations;
- DOC24 capability ranking and fallback ordering;
- preflight warnings;
- failure diagnosis;
- connector health/reliability reports;
- reducing tool bloat in modules;
- procurement/maintenance decisions;
- suggesting capabilities when designing new tasks.

It must not silently remove explicit user-selected capabilities or mutate saved configs.
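
One way to read the constraint above is that utility learning produces advice alongside a saved config, never a rewritten config. The sketch below illustrates that reading. `CapabilityUtility`, `utility_score`, `planCapabilitySuggestions`, and both thresholds are hypothetical; the spec does not define a scoring scale, and the capability ids in the usage lines are invented.

```typescript
// Hypothetical utility record; the 0..1 scale is an assumption for this sketch.
interface CapabilityUtility {
  capability_id: string;
  utility_score: number;
}

interface CapabilityPlanView {
  keep: string[];            // explicit user selections, always preserved in saved order
  suggest_add: string[];     // high-utility capabilities the task does not bind yet
  flag_for_review: string[]; // user-selected capabilities with low observed utility (warning only)
}

// Builds recommendations without removing or reordering anything the user selected.
// `suggestAt` and `warnBelow` are illustrative thresholds, not values from this spec.
function planCapabilitySuggestions(
  userSelected: string[],
  utilities: CapabilityUtility[],
  suggestAt: number,
  warnBelow: number
): CapabilityPlanView {
  const view: CapabilityPlanView = {
    keep: userSelected.slice(), // copy: the saved selection itself is never mutated
    suggest_add: [],
    flag_for_review: [],
  };
  // Rank a copy so the caller's array is left untouched.
  const ranked = utilities.slice().sort((a, b) => b.utility_score - a.utility_score);
  for (const u of ranked) {
    const selected = userSelected.indexOf(u.capability_id) !== -1;
    if (selected && u.utility_score < warnBelow) view.flag_for_review.push(u.capability_id);
    if (!selected && u.utility_score >= suggestAt) view.suggest_add.push(u.capability_id);
  }
  return view;
}

// Usage with made-up capability ids.
const examplePlan = planCapabilitySuggestions(
  ["docket_lookup", "web_search"],
  [
    { capability_id: "docket_lookup", utility_score: 0.9 },
    { capability_id: "web_search", utility_score: 0.1 },
    { capability_id: "citation_checker", utility_score: 0.8 },
  ],
  0.7,
  0.3
);
```

Low-utility selections surface as a review flag, in the spirit of a preflight warning, so the decision to remove them stays with the user.
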

### §9.13 Task Design Casebook promotion rules

The Task Design Casebook is the bridge between prior task experience and Task Agent design wisdom. It should extract lessons from tasks the user created, liked, refined, reused, or assessed as successful.

A prior task is eligible to become an active design case when at least one of the following is true:

- user manually created or substantially refined the graph;
- user reused the task/template;
- user accepted Task Agent's proposal with few edits;
- user edited the proposal in a way that reveals a reusable design preference;
- Task Assessment rated the run or process highly;
- an intermediate artifact was repeatedly opened, cited, reused, promoted, or incorporated downstream;
- the task illustrates a module/config choice that is not obvious from schema alone.

### §9.14 Prior-task “because and goal” extraction

Casebook extraction must capture not just what was built but why. The minimum case-learning unit is:

```ts
PriorTaskDesignLesson {
  lesson_id: string
  source_task_id: string
  source_run_ids: string[]
  module_or_graph_region_ref?: string
  design_choice: string
  because: string
  goal_refs: GoalRef[]
  expected_effect: string
  actual_effect?: string
  when_to_reuse: string[]
  when_not_to_reuse: string[]
  evidence_refs: StorageRef[]
  authority_level: "example" | "soft_heuristic" | "strong_default_candidate"
  review_state: "candidate" | "active" | "rejected" | "archived"
  schema_version: "1.0"
}
```

Example:

```text
Design choice: Loop Controller was used after the red-team module.
Because: The user wanted iterative improvement until the revised complaint addressed major red-team findings.
Goal served: Improve substantive depth and reduce missed pleading weaknesses.
Actual effect: The revised draft incorporated the red-team findings and was accepted by the user.
Reusable lesson: For complex drafting tasks where critique should drive revision, prefer Red-Team -> Loop/Revision over one-pass red-team report only.
```

### §9.15 User preference overlays for task design

Task Agent must preserve user task-design preferences as first-class overlays, not disposable chat commentary.

Examples:

```text
Will prefers second-pass “look deeper” prompts in complex analysis tasks.
Will often wants issue matrix + memo for complaint evaluation.
Will wants final legal work products saved as work products/documents unless incognito.
Will usually wants red-team before final legal drafting output.
Will dislikes unnecessary model-choice questions during task creation.
```

These preferences should influence Blueprint proposals, question policy, output archetype defaults, artifact policy suggestions, and template matching, but they must not silently override explicit user instructions in the current design session.

### §9.16 Goal-based task-design learning

Task design learning should not only record which module or pattern was used. It should record which goal the design served and whether that goal was advanced.

```ts
TaskDesignGoalLearningRecord {
  record_id: string
  goal_ref: GoalRef
  source_kind: "design_session" | "task_run" | "task_assessment" | "user_feedback"
  source_ref: StorageRef
  task_type: string
  context_class_key: string
  pattern_refs: TaskDesignPatternRef[]
  template_refs: TaskTemplateRef[]
  module_preset_refs: ModulePresetRef[]
  design_decision_refs: string[]
  observed_effect: "advanced_goal" | "partially_advanced_goal" | "did_not_advance_goal" | "unknown"
  evidence_summary: string
  confidence: number
  schema_version: "1.0"
}
```

This lets Task Agent recommend patterns because they serve a known objective, not merely because they appeared in a similar graph.
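
A small sketch of how goal learning records might be summarized per pattern for one goal is shown below. It assumes `GoalRef` and `TaskDesignPatternRef` can be compared as plain string ids, which this document does not state; `GoalLearningLite`, `PatternGoalSupport`, and `supportForGoal` are hypothetical names, and records with an `unknown` effect are simply skipped as one possible policy.

```typescript
type ObservedEffect =
  | "advanced_goal"
  | "partially_advanced_goal"
  | "did_not_advance_goal"
  | "unknown";

// Hypothetical reduced view of TaskDesignGoalLearningRecord.
interface GoalLearningLite {
  goal_ref: string;       // GoalRef simplified to a string id for this sketch
  pattern_refs: string[]; // TaskDesignPatternRef simplified likewise
  observed_effect: ObservedEffect;
}

interface PatternGoalSupport {
  pattern_ref: string;
  advanced: number;
  partial: number;
  not_advanced: number;
}

// Counts, per pattern, how often records for `goalRef` observed each effect.
// Records with an "unknown" effect carry no evidence either way and are skipped.
function supportForGoal(records: GoalLearningLite[], goalRef: string): PatternGoalSupport[] {
  const byPattern: { [ref: string]: PatternGoalSupport } = {};
  for (const r of records) {
    if (r.goal_ref !== goalRef || r.observed_effect === "unknown") continue;
    for (const p of r.pattern_refs) {
      if (!byPattern[p]) byPattern[p] = { pattern_ref: p, advanced: 0, partial: 0, not_advanced: 0 };
      if (r.observed_effect === "advanced_goal") byPattern[p].advanced += 1;
      else if (r.observed_effect === "partially_advanced_goal") byPattern[p].partial += 1;
      else byPattern[p].not_advanced += 1;
    }
  }
  // Patterns that advanced the goal most often come first.
  return Object.keys(byPattern)
    .map((ref) => byPattern[ref])
    .sort((a, b) => b.advanced - a.advanced);
}
```

Keeping the counter-evidence (`not_advanced`) next to the support count lets a recommendation cite both, in line with the `support_count` and `counter_evidence_refs` fields used by the proposal types in this section.
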

---

## §9A — Task invocation learning and BDSM/DOC8 integration

### §9A.1 TaskSuggestionFeedbackEvent

```ts
export type TaskSuggestionFeedbackEvent = {
  event_id: string;
  user_request_ref: string;
  task_mode_decision_ref: string;
  suggestion_kind:
    | "task_suggested"
    | "task_chip_shown"
    | "task_not_suggested"
    | "task_agent_invoked"
    | "task_agent_not_invoked";
  user_response:
    | "accepted"
    | "rejected"
    | "ignored"
    | "asked_direct_answer_instead"
    | "later_requested_task"
    | "complained_task_was_overused"
    | "complained_task_was_underused";
  context_class_key: string;
  matched_template_ids: string[];
  matched_invocation_directive_ids: string[];
  evidence_ref: StorageRef;
  schema_version: "1.0";
};
```

### §9A.2 TaskInvocationLearningSignal

```ts
export type TaskInvocationLearningSignal = {
  signal_id: string;
  user_request_ref: string;
  mode_decision_ref: string;
  signal_kind:
    | "task_suggestion_accepted"
    | "task_suggestion_rejected"
    | "task_chip_accepted"
    | "task_chip_ignored"
    | "task_overused"
    | "task_underused"
    | "direct_answer_preferred"
    | "task_mode_preferred"
    | "task_used_successfully"
    | "task_used_unsuccessfully"
    | "existing_template_missed"
    | "wrong_template_suggested"
    | "task_agent_consult_helpful"
    | "task_agent_consult_unhelpful";
  context_class_key: string;
  task_type?: string;
  document_archetype?: string;
  matched_directive_ids: string[];
  matched_template_ids: string[];
  evidence_ref: StorageRef;
  recommended_update_kind:
    | "task_invocation_directive_update"
    | "task_mode_policy_update"
    | "template_semantic_card_update"
    | "question_policy_update"
    | "none";
  schema_version: "1.0";
};
```

### §9A.3 Runtime influence

BDSM/DOC8 compiled bundles may influence:

```text
task suggestion suppression
task suggestion boost
task chip copy
template matching priority
Task Agent question policy
Task Agent design defaults
TaskInvocationDirective candidate promotion
```

BDSM/DOC8 MUST NOT:

```text
silently run tasks
silently mutate task graphs
silently rewrite templates/presets
silently change side-effect policies
force injection of TKP into ordinary chat
```

### §9A.4 Rate limits and suppression

Task suggestions SHOULD be rate-limited by context class and recent rejection. Repeated rejection of task suggestions for a context class SHOULD suppress future non-explicit suggestions until sufficient countervailing evidence appears.

---

## §9B — Task Agent proposal-edit learning

### §9B.1 User edits are high-signal

If Task Agent proposes a task and the user adds a module, removes a module, rewires context, changes output policy, changes a prompt, adds a loop, adds a judge, changes capability binding, or changes artifact intake, that edit is a high-value learning signal.

### §9B.2 TaskAgentProposalEditTrace

```ts
export type TaskAgentProposalEditTrace = {
  trace_id: string;
  design_session_id: string;
  proposed_blueprint_ref: StorageRef;
  proposed_graph_ref: StorageRef;
  final_user_edited_graph_ref: StorageRef;
  graph_diff: GraphDiff;
  blueprint_diff?: BlueprintDiff;
  edit_kinds: Array<
    | "module_added"
    | "module_removed"
    | "module_rewired"
    | "config_changed"
    | "prompt_changed"
    | "capability_added"
    | "capability_removed"
    | "artifact_policy_changed"
    | "human_gate_added"
    | "loop_added"
    | "judge_added"
    | "template_changed"
    | "second_pass_deepening_added"
    | "source_grounding_added"
  >;
  interpreted_learning_signals: TaskAgentDesignLearningSignal[];
  evidence_ref: StorageRef;
  schema_version: "1.0";
};
```

### §9B.3 Example

```text
Task Agent proposed a simple complaint-evaluation graph.
User added a second-pass “look deeper” module.
Signal: Task Agent omitted preferred complex-analysis deepening pattern.
Recommended update: Add or strengthen TaskDesignPatternCard: complex_analysis_second_pass_deepening.
```

---

## §9C — Unified Task System Learning Report

### §9C.1 Separate ledgers, unified report

Do not collapse all learning into one giant ledger.
Keep separate evidence families:

```text
task invocation utility
task suggestion feedback
task agent design utility
template/preset utility
capability/tool utility
question utility
artifact policy utility
execution success/failure
judge/experiment quality
prompt-quality signals
user graph edits
```

Unify them through shared context keys, evidence references, and reports.

### §9C.2 TaskSystemLearningReport

```ts
export type TaskSystemLearningReport = {
  report_id: string;
  generated_at: string;
  scope:
    | "task_invocation"
    | "task_agent_design"
    | "task_execution"
    | "template_library"
    | "prompt_quality"
    | "portfolio"
    | "all";
  invocation_findings: TaskInvocationFinding[];
  task_agent_findings: TaskAgentDesignFinding[];
  template_preset_findings: TemplatePresetFinding[];
  execution_findings: ExecutionFinding[];
  judge_experiment_findings: EvaluationFinding[];
  prompt_quality_findings: PromptQualityFinding[];
  user_feedback_findings: UserFeedbackFinding[];
  artifact_policy_findings: ArtifactPolicyFinding[];
  recommended_updates: TaskSystemImprovementProposal[];
  schema_version: "1.0";
};
```

### §9C.3 TaskSystemImprovementProposal

```ts
export type TaskSystemImprovementProposal = {
  proposal_id: string;
  proposal_kind:
    | "task_invocation_directive_update"
    | "task_mode_policy_update"
    | "task_agent_instruction_candidate"
    | "task_design_pattern_card_update"
    | "task_design_hint_card_update"
    | "template_semantic_card_update"
    | "module_preset_semantic_card_update"
    | "task_template_update"
    | "module_preset_update"
    | "artifact_policy_default_update"
    | "prompt_design_guide_card_update"
    | "prompt_update_proposal"
    | "question_policy_update";
  target_ref: string;
  rationale: string;
  evidence_refs: StorageRef[];
  support_count: number;
  counter_evidence_refs: StorageRef[];
  proposed_patch_ref: StorageRef;
  prior_hash?: string;
  candidate_hash?: string;
  requires_eval: boolean;
  eval_suite_ref?: string;
  eval_result_ref?: string;
  approval_policy:
    | "auto_apply_low_risk"
    | "user_review"
    | "architect_review"
    | "never_auto_apply";
  lifecycle_state:
    | "candidate"
    | "staged"
    | "approved"
    | "rejected"
    | "applied"
    | "rolled_back";
  schema_version: "1.0";
};
```

---

## §10 — DOC72 Task Activity Memory

### §10.1 Purpose

DOC72 Task Activity Memory turns tasks into durable, queryable, relationship-rich memory without turning DOC72 into raw log storage.

### §10.2 Three-layer memory model

```text
Layer 1 — Operational run truth
  Owner: DOC23/EC
  Contents: exact logs, checkpoints, prompts, outputs, tool calls, spans, artifacts, errors, costs.

Layer 2 — DOC72 Task Activity Memory
  Owner: DOC72, written by EC
  Contents: task/matter/entity/artifact/work-product/goal relationships, run summaries, structured activity memory, selected summaries/references.

Layer 3 — Learning and utility signals
  Owner: DOC8/BDSM, written by EC
  Contents: friction, utility, capability fit, question usefulness, pattern success/failure, template/preset learning.
```

### §10.3 TaskRunMemoryPolicy

```ts
TaskRunMemoryPolicy {
  enabled: boolean
  mode:
    | "standard"
    | "reference_only"
    | "summary_reference"
    | "summary_plus_learning"
    | "memory_incognito"
    | "full_ephemeral_test"
  artifact_default_level: ArtifactIntakeLevel
  capture_prompts: "never" | "redacted" | "full_local_only"
  capture_outputs: "reference_only" | "redacted_summary" | "full_local_only"
  doc72_write_scope:
    | "none"
    | "task_run_only"
    | "task_run_plus_final_artifacts"
    | "task_run_plus_linked_artifacts"
    | "task_run_plus_learning_signals"
  entity_linking_enabled: boolean
  goal_linking_enabled: boolean
  matter_linking_enabled: boolean
  followup_session_linking_enabled: boolean
  retention_class:
    | "standard"
    | "high_value"
    | "matter_active"
    | "ephemeral"
    | "legal_work_product"
  user_can_retroactively_promote: boolean
  schema_version: "1.0"
}
```

### §10.4 Memory Incognito vs Full Ephemeral

| Mode | Meaning |
|---|---|
| Memory Incognito | Suppresses DOC72/DOC8/BDSM optional memory and learning writes, but preserves operational run records needed for execution safety, audit, and user retrieval. |
| Full Ephemeral/Test | Suppresses durable run details except minimum safety/failure receipts; intended for private experiments and test runs. |

Business-grade workflows should default to audit-preserving modes, not full ephemeral modes.

### §10.5 TaskRunExecutionTracePayload

Task runs are stored in DOC72 using existing `execution_trace` node kind.

```ts
TaskRunExecutionTracePayload {
  kind: "doc23_task_run"
  task_id: string
  run_id: string
  task_name: string
  template_id?: string
  blueprint_id?: string
  preset_or_template_ref?: string
  status: TaskRunStatus
  started_at: string
  completed_at?: string
  trigger_summary: TriggerSummary
  work_context_refs: EntityRef[]
  matter_or_project_refs: EntityRef[]
  linked_goal_refs: GoalRef[]
  linked_task_entity_ref?: EntityRef
  workflow_summary: string
  graph_snapshot_ref: StorageRef
  blueprint_snapshot_ref?: StorageRef
  key_config_snapshot_ref: StorageRef
  module_activity_summaries: ModuleActivitySummary[]
  reviewed_document_refs: DocumentRef[]
  produced_artifact_refs: ArtifactRef[]
  produced_work_product_refs: WorkProductRef[]
  human_gate_summary: HumanGateSummary
  subagent_lineage_summary: SubAgentLineageSummary
  capability_usage_summary: CapabilityUsageSummary
  cost_summary: CostSummary
  quality_summary?: QualitySummary
  failure_summary?: FailureSummary
  assessment_refs: TaskAssessmentRef[]
  followup_session_refs: ConversationOrRoomRef[]
  raw_run_ref: StorageRef
  schema_version: "1.0"
}
```

### §10.6 Marex example

If the user runs a `DRAFT COMPLAINT` task for Marex, DOC72 should be able to know and link:

```text
Marex matter
  -> DRAFT COMPLAINT task template
  -> task run on date X
  -> source documents reviewed by modules A/B/C
  -> damages analysis memo produced by module D
  -> complaint draft produced by module E
  -> red-team report produced by module F
  -> final revised complaint work_product
  -> later red-team chat/room/session
  -> later revised complaint version
  -> task assessment and improvement proposals
```

This is not raw log storage.
It is activity memory and relationship graphing.

### §10.7 Task Activity Graph

```ts
TaskRunMemoryGraph {
  task_run_node: ExecutionTraceRef
  linked_task_entity: EntityRef
  linked_matter_entities: EntityRef[]
  linked_goal_nodes: GoalRef[]
  reviewed_document_refs: DocumentRef[]
  produced_artifact_refs: ArtifactRef[]
  produced_work_product_refs: WorkProductRef[]
  module_activity_summaries: ModuleActivitySummary[]
  human_review_refs: HumanReviewRef[]
  red_team_refs: EvaluationOrRedTeamRef[]
  followup_session_refs: ConversationOrRoomRef[]
  capability_usage_refs: CapabilityUsageRef[]
  assessment_refs: TaskAssessmentRef[]
  schema_version: "1.0"
}
```

### §10.8 Immediate vs nightly/ongoing ingestion

Task memory integration has two lanes.

#### §10.8.1 Immediate lane

Immediate lane is deterministic or cheap and occurs at run start, module completion, artifact creation, and run completion.

Immediate writes include:

- TaskRun execution_trace stub at run start when memory policy permits.
- Task/matter/entity refs already known from task config, active context, source docs, template, or user selection.
- Graph snapshot ref.
- Prompt/capability/policy snapshot refs.
- Module status summaries.
- ArtifactIndex entries.
- Final output refs.
- Cost and timing summaries.
- Capability snapshot and usage receipts.
- Human gate receipts.
- Error/failure receipts.

Immediate lane must not require a large agent extraction call.

#### §10.8.2 Nightly/ongoing extraction lane

Nightly/ongoing lane may use LLM-assisted extraction or heavier graph analysis. It should process:

- long output summaries;
- design cases from task creation sessions;
- task-design rationale extraction;
- goal-link extraction;
- artifact importance classification;
- template/preset semantic card updates;
- Task Assessment synthesis if not run immediately;
- DOC8/BDSM learning bundle compilation;
- stale-policy/retention compaction;
- cross-run pattern detection.
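The lane split described in §10.8.1 and §10.8.2 can be sketched as a small classifier. This is a minimal illustration, not part of the normative schema: `MemoryWorkKind`, `MemoryWorkItem`, `requires_llm`, `laneFor`, and `partitionByLane` are hypothetical names, and the kind list is only a subset drawn from the two lists in those sections. The one rule it encodes is that anything needing an LLM extraction call is deferred to the nightly lane, because the immediate lane must stay deterministic or cheap.

```typescript
// Hypothetical sketch: every name and shape here is an illustrative
// assumption, not a DOC23/DOC72 contract.
type IngestionLane = "immediate" | "nightly";

type MemoryWorkKind =
  | "execution_trace_stub"
  | "graph_snapshot_ref"
  | "module_status_summary"
  | "artifact_index_entry"
  | "cost_timing_summary"
  | "human_gate_receipt"
  | "failure_receipt"
  | "long_output_summary"
  | "goal_link_extraction"
  | "artifact_importance_classification"
  | "cross_run_pattern_detection";

interface MemoryWorkItem {
  kind: MemoryWorkKind;
  run_id: string;
  requires_llm: boolean; // assumed flag set by whoever produces the item
}

// Kinds that the nightly/ongoing list names as heavier extraction work.
const NIGHTLY_KINDS: ReadonlySet<MemoryWorkKind> = new Set<MemoryWorkKind>([
  "long_output_summary",
  "goal_link_extraction",
  "artifact_importance_classification",
  "cross_run_pattern_detection",
]);

function laneFor(item: MemoryWorkItem): IngestionLane {
  // The immediate lane must not require a large agent extraction call,
  // so any LLM-dependent item is deferred regardless of its kind.
  if (item.requires_llm || NIGHTLY_KINDS.has(item.kind)) return "nightly";
  return "immediate";
}

function partitionByLane(
  items: MemoryWorkItem[],
): Record<IngestionLane, MemoryWorkItem[]> {
  const lanes: Record<IngestionLane, MemoryWorkItem[]> = {
    immediate: [],
    nightly: [],
  };
  for (const item of items) lanes[laneFor(item)].push(item);
  return lanes;
}
```

Keeping the decision in one pure function makes it easy to audit which writes can happen synchronously at run time; a real implementation would presumably also consult the memory policy before writing anything.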
#### §10.8.3 Trigger conditions

A task run or artifact is queued for nightly extraction when:

- artifact policy requests DOC72 summary/reference;
- output exceeds immediate summary threshold;
- output is opened/bookmarked/used downstream/cited;
- user explicitly says save/make findable/remember this;
- Task Agent or Task Assessment flags it as high-value;
- run produced final work product;
- run generated significant failure/learning signals;
- task creation session had substantial user edits or accepted/rejected suggestions.

### §10.9 Retention and compaction

Retention classes:

```ts
TaskRetentionClass =
  | "standard"
  | "high_value"
  | "matter_active"
  | "legal_work_product"
  | "template_training"
  | "ephemeral"
```

Compaction must preserve:

- TaskRun identity;
- task/template/blueprint refs;
- matter/entity/goal links;
- final artifact/work-product refs;
- assessment refs;
- cost/quality/failure summaries;
- enough module summary to explain what happened;
- provenance refs for any durable memory claims.

### §10.10 Minimum DOC72 work-context capture

When task memory is enabled, DOC72 Task Activity Memory must capture enough context for the task run to link through the broader memory matrix. The minimum capture set is:

```text
Task identity
  task_id, run_id, task name, template/preset/blueprint refs, graph snapshot ref.

Work context
  matter/case/project/personal context refs when known; active context slot refs; user-selected context refs.

Entity links
  relevant people, organizations, matters, documents, libraries/corpora, rooms, panels, chats, email accounts, calendars, folders, and other known entities.

Source/document links
  documents reviewed, downloaded, received, produced, or opened; DOC25 ingestion refs where available.

Artifact/work-product links
  final outputs, high-value intermediate outputs, work products, red-team reports, judge/experiment outputs selected for retention.

Goal links
  Blueprint goals and reusable DOC72 goal nodes served by the task or by major design decisions.

Follow-on links
  later chats, rooms, red-team sessions, revised work products, task re-runs, and assessments linked back to the original run.
```

This minimum capture is what lets a later Marex red-team chat, a Marex complaint revision, a damages memo, and the original DRAFT COMPLAINT task all become connected parts of the memory matrix.

### §10.11 Task memory is not solely retrospective

Task Activity Memory also supports prospective task assistance. When the user asks Task Agent to design or run a task, DOC72 memory can supply:

- relevant matters/entities/work contexts;
- prior tasks and templates for the same entity or matter;
- prior outputs and work products;
- user preferences and task-design goals;
- known failure modes or assessments;
- relevant libraries/corpora and documents;
- prior capability/procedure/tool experiences.

Task Agent must distinguish between memory-derived context and current user-provided instructions when presenting a proposal.

---

## §11 — Artifact and output intake policy

### §11.1 Purpose

Task outputs vary widely. Some are transient control data. Some are low-value intermediate drafts. Some are high-value memos. Some are final work products. Some are downloaded/received documents. Some are emails/texts. The system must let the user and Task Agent control what becomes findable, summarized, ingested, or added to a library.

### §11.2 ArtifactIntakeLevel

Use technical enum internally, with user-friendly labels in UI.

```ts
type ArtifactIntakeLevel =
  | "run_only"
  | "doc72_reference"
  | "doc72_summary_reference"
  | "doc25_document_ingestion"
  | "doc25_work_product_ingestion"
  | "doc73_library_ingestion"
  | "no_capture"
```

| Enum | User-facing label | Meaning |
|---|---|---|
| `no_capture` | Do not save | Suppress optional artifact capture; only minimum execution receipts if required. |
| `run_only` | Keep with this run | Stored in task run/artifact storage; retrievable from Run Inspector while retained. |
| `doc72_reference` | Make findable as a linked item | DOC72 stores metadata/ref only; no substantive summary. |
| `doc72_summary_reference` | Make findable with summary | DOC72 stores summary/key points/entities/ref. |
| `doc25_document_ingestion` | Process as document | DOC25 converts/dedups/processes the document; endpoint chosen by policy. |
| `doc25_work_product_ingestion` | Save as work product | DOC25 processes; DOC72 links as work_product where appropriate. |
| `doc73_library_ingestion` | Add to library | DOC25 ingestion plus DOC73 library/corpus binding/extraction. |

### §11.3 DOC25 is a processor, not always the endpoint

DOC25 routes documents through ingestion, conversion, deduplication, quality reporting, and document intelligence. The destination after DOC25 may be:

- task run artifact;
- DOC72 work_product;
- DOC72 document/work-product-linked node;
- DOC73 library/corpus member;
- user file location;
- output delivery attachment.

The UI must not imply “route through DOC25” is the final destination. It should say what happens after processing.
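One way to keep the UI honest about endpoints is to derive the user-facing label and the post-processing destination together. The sketch below is illustrative only: the labels come from the table in §11.2, while `Doc25Destination`, `DESTINATION_TEXT`, `usesDoc25`, and `describeIntake` are hypothetical names and the destination wording is assumed.

```typescript
// Hypothetical sketch: helper names and destination strings are assumptions.
type ArtifactIntakeLevel =
  | "run_only"
  | "doc72_reference"
  | "doc72_summary_reference"
  | "doc25_document_ingestion"
  | "doc25_work_product_ingestion"
  | "doc73_library_ingestion"
  | "no_capture";

// User-facing labels taken from the §11.2 table.
const INTAKE_LABELS: Record<ArtifactIntakeLevel, string> = {
  no_capture: "Do not save",
  run_only: "Keep with this run",
  doc72_reference: "Make findable as a linked item",
  doc72_summary_reference: "Make findable with summary",
  doc25_document_ingestion: "Process as document",
  doc25_work_product_ingestion: "Save as work product",
  doc73_library_ingestion: "Add to library",
};

// Destinations listed in §11.3; the identifiers themselves are assumed.
type Doc25Destination =
  | "task_run_artifact"
  | "doc72_work_product"
  | "doc72_linked_node"
  | "doc73_library_member"
  | "user_file_location"
  | "output_delivery_attachment";

const DESTINATION_TEXT: Record<Doc25Destination, string> = {
  task_run_artifact: "task run artifact",
  doc72_work_product: "DOC72 work product",
  doc72_linked_node: "DOC72 document/work-product-linked node",
  doc73_library_member: "DOC73 library/corpus member",
  user_file_location: "user file location",
  output_delivery_attachment: "output delivery attachment",
};

// Levels whose handling routes through DOC25 processing.
function usesDoc25(level: ArtifactIntakeLevel): boolean {
  return (
    level === "doc25_document_ingestion" ||
    level === "doc25_work_product_ingestion" ||
    level === "doc73_library_ingestion"
  );
}

// Never presents "DOC25" as the endpoint: it always names what follows.
function describeIntake(
  level: ArtifactIntakeLevel,
  destination?: Doc25Destination,
): string {
  const label = INTAKE_LABELS[level];
  if (!usesDoc25(level)) return label;
  if (destination === undefined) {
    return `${label}; destination after DOC25 processing not yet resolved`;
  }
  return `${label}; after DOC25 processing: ${DESTINATION_TEXT[destination]}`;
}
```

For example, `describeIntake("doc25_document_ingestion", "doc73_library_member")` yields a string that names the library destination rather than stopping at the processing step.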
### §11.4 TaskArtifactIntakePolicy

```ts
TaskArtifactIntakePolicy {
  task_default: ArtifactIntakeLevel
  final_output_default: ArtifactIntakeLevel
  intermediate_output_default: ArtifactIntakeLevel
  produced_document_default: ArtifactIntakeLevel
  downloaded_document_default: ArtifactIntakeLevel
  received_attachment_default: ArtifactIntakeLevel
  sent_message_default: ArtifactIntakeLevel
  received_message_default: ArtifactIntakeLevel
  per_module_overrides: Record
  per_port_overrides: Record
  auto_candidate_rules: ArtifactCandidateRule[]
  user_review_required_above_level: ArtifactIntakeLevel
  incognito_override: "suppress_doc72_doc25" | "suppress_all_optional_memory"
  schema_version: "1.0"
}
```

### §11.5 ModuleArtifactIntakePolicy

```ts
ModuleArtifactIntakePolicy {
  module_id: string
  module_type: string
  capture_mode:
    | "not_applicable"
    | "default"
    | "never_capture"
    | "keep_with_run"
    | "make_findable"
    | "make_findable_with_summary"
    | "process_documents"
    | "save_work_product"
    | "add_to_library"
    | "ask_each_time"
  port_overrides: Record
  output_archetype_hints: OutputArchetype[]
  high_value_candidate: boolean
  review_required: boolean
  schema_version: "1.0"
}
```

### §11.6 Default policy

Recommended defaults:

```text
Utility/control modules such as junctions, switches, signals, delay, hold:
  not_applicable or keep_with_run only.

Most intermediate module outputs:
  keep_with_run.

Intermediate analysis outputs marked high-value by module/preset/policy:
  ask_each_time or make_findable_with_summary candidate.

Final textual/file output:
  make_findable_with_summary.

Substantial final document/work product:
  process as document + save work product candidate.

Downloaded/received documents or attachments:
  process as document if policy allows.

Library/corpus-targeted tasks:
  add to library.

Judge/experiment raw artifacts:
  keep_with_run unless submitted to evaluation dataset, assessment, or memory review.

Incognito:
  suppress DOC72/DOC25/DOC8 optional writes unless user explicitly promotes afterward and policy allows.
```

### §11.7 ArtifactCandidateRule

```ts
ArtifactCandidateRule {
  rule_id: string
  applies_to:
    | "final_output"
    | "intermediate_output"
    | "produced_document"
    | "downloaded_document"
    | "received_attachment"
    | "judge_result"
    | "red_team_report"
    | "analysis_memo"
    | "user_opened_artifact"
    | "downstream_used_artifact"
  condition: string
  suggested_level: ArtifactIntakeLevel
  rationale: string
  requires_user_review: boolean
  schema_version: "1.0"
}
```

### §11.8 TaskArtifactIndexEntry

Every material output receives an index entry even if it remains run-only, unless no-capture mode applies.

```ts
TaskArtifactIndexEntry {
  artifact_id: string
  task_id: string
  run_id: string
  module_id: string
  activation_seq?: number
  port_name: string
  artifact_kind:
    | "text_output"
    | "file_output"
    | "document"
    | "work_product"
    | "email"
    | "text_message"
    | "judge_result"
    | "experiment_bundle"
    | "red_team_report"
    | "subagent_result"
    | "tool_result"
    | "system_receipt"
  display_name: string
  output_archetype?: string
  storage_ref: StorageRef
  intake_level: ArtifactIntakeLevel
  doc72_ref?: string
  doc25_ingestion_ref?: string
  doc73_library_ref?: string
  work_product_ref?: string
  summary?: string
  key_points?: string[]
  linked_entity_refs: EntityRef[]
  linked_goal_refs: GoalRef[]
  created_at: string
  lifecycle_state: "active" | "promoted" | "compacted" | "deleted" | "expired"
  schema_version: "1.0"
}
```

### §11.9 User-facing artifact settings

Avoid overly technical or domain-specific labels.
Suggested UI:

```text
Memory & Artifact Handling
──────────────────────────
Final results                      [Make findable with summary ▾]
Intermediate outputs               [Keep with this run ▾]
High-value intermediate work       [Ask whether to make findable ▾]
Documents produced by task         [Process and save if final ▾]
Documents received/downloaded      [Process as documents ▾]
Add selected outputs to library    [Off / Ask / Auto for library tasks ▾]
Private/incognito run              [Off]
```

Per module:

```text
This module's outputs
Capture behavior: [Use task default ▾]

Options:
  Do not save
  Keep with this run
  Make findable
  Make findable with summary
  Process documents
  Save final work product
  Add to library
  Ask each time
```

### §11.10 Message and communication artifacts

Artifact intake must handle more than markdown outputs and files. Tasks may send or receive emails, texts, chat messages, forum posts, task messages, webhook payloads, and notification receipts.

Default handling:

```text
Sent email/text/chat/forum outputs
  Store receipt with run; make substantive body/output findable only if policy says so.

Received email/text/chat/forum trigger inputs
  Store enough metadata for run audit; route documents/attachments through DOC25 if policy allows.

Communication body text
  Treat as text artifact only when it is itself an output/input of analytical value.

Headers/sender/thread metadata
  Use as context/entity linkage; do not promote as substantive memory unless needed.

External-send receipts
  Preserve side-effect idempotency key, policy snapshot, recipient/account, timestamp, and delivery status.
```

The UI should describe the endpoint plainly, e.g., “Save email receipt only,” “Make email body findable,” “Process attachments as documents,” or “Save final sent memo as work product.”

---

## §12 — Telemetry and observability

### §12.1 Purpose

DOC23 needs production-grade telemetry so a business or individual can see exactly what happened, what is happening, what was prompted, what was produced, what tools were used, what failed, and what should improve.

### §12.2 TaskTraceSpan

Every meaningful task operation should emit a span-like record.

```ts
TaskTraceSpan {
  span_id: string
  trace_id: string
  parent_span_id?: string
  task_id: string
  run_id: string
  module_id?: string
  activation_seq?: number
  span_kind:
    | "task_run"
    | "module_activation"
    | "llm_call"
    | "prompt_assembly"
    | "tool_call"
    | "mcp_call"
    | "connector_call"
    | "doc24_packet_resolution"
    | "capability_mount"
    | "runtime_capability_expansion"
    | "subagent_spawn"
    | "policy_decision"
    | "human_review"
    | "file_document_transform"
    | "artifact_emission"
    | "output_delivery"
    | "judge_eval"
    | "experiment_variant"
    | "retry"
    | "fallback"
    | "wait_state"
    | "resume"
    | "error"
  name: string
  status: "pending" | "running" | "success" | "error" | "cancelled" | "skipped" | "waiting"
  started_at: string
  completed_at?: string
  duration_ms?: number
  input_ref?: StorageRef
  output_ref?: StorageRef
  summary?: string
  error?: TaskError
  cost_usd?: number
  token_count?: { input: number; output: number }
  model_ref?: string
  agent_ref?: string
  prompt_snapshot_ref?: StorageRef
  capability_snapshot_ref?: StorageRef
  policy_snapshot_ref?: StorageRef
  doc24_packet_snapshot_ref?: StorageRef
  subagent_trace_ref?: SubAgentTraceRef
  artifact_refs: ArtifactRef[]
  tags: string[]
  schema_version: "1.0"
}
```

### §12.3 EffectivePromptSnapshot

```ts
EffectivePromptSnapshot {
  snapshot_id: string
  task_id: string
  run_id: string
  module_id: string
  activation_seq: number
  model_ref: string
  agent_ref: string
  cIL_layer_refs: StorageRef[]
  doc24_packet_ref?: StorageRef
  system_instruction_ref?: StorageRef
  module_instruction_ref?: StorageRef
  dynamic_instruction_ref?: StorageRef
  context_bundle_ref?: StorageRef
  chain_history_projection_ref?: StorageRef
  attached_file_refs: FileRef[]
  prompt_text_ref?: StorageRef
  redaction_mode: "none" | "redacted" | "hash_only" | "not_captured"
  prompt_hash: string
  created_at: string
  schema_version: "1.0"
}
```

### §12.4 CapabilitySnapshot

```ts
CapabilitySnapshot {
  snapshot_id: string
  task_id: string
  run_id: string
  module_id?: string
  required_capabilities: CapabilityRef[]
  optional_capabilities: CapabilityRef[]
  mounted_capabilities: CapabilityRef[]
  denied_capabilities: CapabilityRef[]
  live_state_summary: CapabilityLiveStateSummary[]
  mcp_server_refs: McpServerRef[]
  connector_refs: ConnectorRef[]
  procedure_refs: ProcedureRef[]
  model_availability_snapshot: ModelAvailabilitySnapshot
  doc24_registry_generation_id: string
  created_at: string
  schema_version: "1.0"
}
```

### §12.5 PolicyDecisionSnapshot

```ts
PolicyDecisionSnapshot {
  snapshot_id: string
  task_id: string
  run_id: string
  module_id?: string
  decision_kind:
    | "capability_mount"
    | "tool_call"
    | "external_send"
    | "file_write"
    | "memory_write"
    | "doc25_ingestion"
    | "doc73_library_add"
    | "subagent_spawn"
    | "model_selection"
  decision: "allowed" | "blocked" | "requires_approval" | "degraded"
  reason_codes: string[]
  policy_generation_id: string
  exposure_context_ref?: StorageRef
  created_at: string
  schema_version: "1.0"
}
```

### §12.6 Run Inspector extensions

Run Inspector should include:

1. **Trace Tree** — hierarchical span tree.
2. **Timeline** — Gantt-style module/subagent/tool/wait view.
3. **Graph Progress** — current active module(s), completed modules, waiting modules.
4. **Prompt & Context** — effective prompt snapshots, CIL layers, DOC24 packets, context bundle, chain history.
5. **Capabilities & Tools** — mounted capabilities, runtime expansions, denied/blocked tools.
6. **Artifacts & Outputs** — all TaskArtifactIndex entries with filters.
7. **Costs** — per module, per model, per subagent, total.
8. **Quality & Judges** — Judge/Experiment/Claim Extractor summaries.
9. **Anomalies & Failures** — errors, retries, validation warnings, policy blocks.
10. **Memory & Learning** — DOC72 TaskRun memory, artifact intake, learning signals, assessments.

### §12.7 Task History view

A global Task History/Runs view should filter by:

- task;
- template;
- matter/entity;
- status;
- date;
- trigger;
- output type;
- artifact saved/promoted;
- assessment quality;
- failure type;
- capability used;
- cost range.

Task Agent uses this view internally and conversationally.

### §12.8 OTEL/export posture

DOC23 should define ELNOR-native trace schemas first. It may provide OpenTelemetry-compatible export adapters, but no external telemetry standard should own DOC23's internal trace truth.

---

## §12A — Task Agent operational truth and evaluation evidence consumption

### §12A.1 Task Agent knowledge sources during operation

When Task Agent answers questions about a task, reviews a task, designs a task, or generates an assessment, it should distinguish three evidence layers:

```text
Operational run truth:
  DOC23/EC-owned run records, activation records, checkpoints, logs, prompt snapshots, capability snapshots, policy snapshots, tool calls, sub-agent traces, artifacts, costs, errors, and Run Inspector read models.

DOC72 task memory:
  summary/reference memory, entity/matter/work-product/artifact links, TaskRun execution_trace hubs, goals, task invocation directives, template/preset semantic cards, and prior design cases.

Learning signals:
  DOC8/BDSM utility, friction, question utility, task invocation utility, prompt utility, capability utility, user edit traces, judge/experiment evidence, and promotion/rollback outcomes.
```

Task Agent MUST NOT confuse these layers. Operational run truth is the audit/debug source of truth. DOC72 task memory is the semantic memory substrate.
Learning signals inform recommendations and policy proposals.

### §12A.2 TaskAgentOperationalConsultTrace

```ts
export type TaskAgentOperationalConsultTrace = {
  consult_id: string;
  entrypoint: TaskAgentEntrypoint;
  user_request_ref: string;
  task_id?: string;
  run_id?: string;
  module_id?: string;
  consulted_operational_truth_refs: StorageRef[];
  consulted_doc72_memory_refs: string[];
  consulted_learning_signal_refs: string[];
  consulted_tkp_card_refs: string[];
  consulted_capability_refs: string[];
  response_kind:
    | "status_answer"
    | "output_retrieval"
    | "graph_explanation"
    | "design_recommendation"
    | "assessment"
    | "prompt_improvement"
    | "task_mode_advice";
  degraded_reasons: string[];
  created_at: string;
  schema_version: "1.0";
};
```

### §12A.3 TaskInvocationTrace

Every nontrivial task-mode decision should be recordable as an audit/learning trace even when no packet is injected.

```ts
export type TaskInvocationTrace = {
  trace_id: string;
  user_request_ref: string;
  task_mode_decision_ref: string;
  initial_user_text_summary: string;
  resolved_intent_summary: string;
  selected_mode: ElnorExecutionMode;
  task_opportunity_level: TaskOpportunityLevel;
  matched_directive_ids: string[];
  matched_task_template_ids: string[];
  matched_module_preset_ids: string[];
  task_agent_invoked: boolean;
  task_agent_entrypoint?: TaskAgentEntrypoint;
  task_agent_consult_ref?: string;
  user_visible_task_suggestion_shown: boolean;
  user_response?:
    | "accepted"
    | "rejected"
    | "ignored"
    | "asked_direct_answer_instead"
    | "later_requested_task";
  resulting_task_id?: string;
  resulting_run_id?: string;
  resulting_task_receipt_ref?: string;
  learning_signal_refs: string[];
  created_at: string;
  schema_version: "1.0";
};
```

### §12A.4 Judge / Experiment / Claim / Optimization evidence

Task Agent MUST be able to inspect Addenda A evidence when reviewing or assessing tasks:

```text
Experiment variants and ComparisonBundles
VariantOutputBundles
JudgeScoreBundles and dimension audit trails
Claim extraction outputs
Claim-level verdicts and source spans
sub-agent trace refs used by judges/verification
prompt variant outcomes
human labels when available
promotion/rollback ledger entries when operative
online scoring/drift signals when operative
DSPy/GEPA optimization results when R5 substrate is operative
```

Task Agent uses this evidence to generate Task Assessment findings such as:

```text
prompt underperformed;
judge rubric was unscorable;
variant B outperformed baseline;
red-team module caught recurring weakness;
source-grounding was insufficient;
module output shape was not downstream-compatible;
second-pass deepening improved results;
capability/tool was unused or misapplied;
template/preset should be updated or not updated.
```

Task Agent MUST NOT auto-promote experiment winners, judge prompt revisions, DSPy candidates, or task template changes without the relevant promotion ledger, approval, and rollback contracts.

### §12A.5 Task Agent eval suite must test invocation and prompt behavior

Task Agent's own evaluation suite SHOULD include:

```text
whether it over-suggests tasks for direct requests;
whether it under-suggests tasks for high-fit saved-task requests;
whether it reuses templates instead of creating duplicates;
whether it uses Task Agent entrypoints correctly;
whether it designs valid graphs;
whether it routes context correctly;
whether it chooses appropriate module prompts;
whether it uses Prompt Advisor service only when appropriate;
whether it creates prompt-evaluation tasks only for reusable/high-value prompts;
whether it avoids Prompt Lab / Promptolution / DSPy product-surface fragmentation;
whether it respects DOC24 packet limits and TKP boundaries;
whether it degrades honestly when DOC11/OpenClaw/TKP/DOC72/Addenda A evidence is unavailable.
```

---

## §13 — DOC24 capability binding and runtime expansion

### §13.1 Purpose

Modules should declare intended capabilities, but DOC24 should resolve live tools/connectors/MCP/procedures/models and may expand at runtime when upstream prompts or outputs create new needs within policy.

### §13.2 Two-pass capability resolution

#### Pass 1 — design-time / run-start binding

At task save or run start, modules declare required/allowed/denied capabilities.

#### Pass 2 — runtime DOC24 expansion

When a module receives unexpected content or a new need, DOC24 may add relevant capabilities within the module's allowed envelope, emitting an expansion receipt.

### §13.3 ModuleCapabilityPolicy

```ts
ModuleCapabilityPolicy {
  mode: "auto" | "explicit" | "explicit_plus_auto_relevant" | "none"
  required_capabilities: CapabilityRef[]
  allowed_capabilities: CapabilityRef[]
  denied_capabilities: CapabilityRef[]
  preferred_procedures: ProcedureRef[]
  allowed_mcp_servers: McpServerRef[]
  connector_refs: ConnectorRef[]
  runtime_expansion_policy: RuntimeExpansionPolicy
  side_effect_policy: SideEffectPolicy
  snapshot_required: boolean
  schema_version: "1.0"
}
```

### §13.4 RuntimeExpansionPolicy

```ts
RuntimeExpansionPolicy {
  enabled: boolean
  expansion_basis:
    | "upstream_prompt_detected"
    | "missing_required_capability"
    | "failure_recovery"
    | "doc24_relevance"
  max_added_capabilities: number
  capabilities_outside_allowlist_require_approval: boolean
  external_side_effects_require_approval: boolean
  emit_expansion_receipt: boolean
  schema_version: "1.0"
}
```

### §13.5 RuntimeCapabilityExpansionReceipt

```ts
RuntimeCapabilityExpansionReceipt {
  receipt_id: string
  task_id: string
  run_id: string
  module_id: string
  activation_seq: number
  detected_need: string
  basis: RuntimeExpansionPolicy["expansion_basis"]
  added_capabilities: CapabilityRef[]
  blocked_capabilities: CapabilityRef[]
  approval_required: boolean
  approval_ref?: string
  doc24_packet_generation_id: string
  policy_snapshot_ref: StorageRef
  created_at: string
  schema_version: "1.0"
}
```

### §13.6 Example: PACER in legal module

```text
Module B was configured for legal research.
Module A sends: “Check the PACER docket for Marex filings.”
DOC24 detects PACER need.
If PACER is allowed but not mounted -> mount PACER and emit receipt.
If PACER is not allowed -> pause or ask approval.
If PACER is denied -> block and explain.
```

Module binding example:

```ts
required_capabilities: [
  "capability:doc25.retrieve_document_pages"
]
allowed_capabilities: [
  "capability:pacer.search_docket",
  "capability:pacer.retrieve_filing",
  "capability:legal_citation_check"
]
denied_capabilities: [
  "capability:email.send",
  "capability:file.delete"
]
```

### §13.7 MCP posture

MCP servers are capability providers. They must register into DOC24 with stable capability IDs, health, auth, side-effect class, schema hash, and least-privilege tool-pack policies. DOC23 modules should not mount raw MCP tool sprawl directly.

---

## §13A — Task-module DOC24 context injection and surface/run scope isolation

### §13A.1 Purpose

DOC24 context injection into DOC23 task modules must make task modules smarter without creating context bleed from unrelated chats, unrelated active work, or stale user focus. A task run may occur later, recur on a schedule, trigger from email, trigger from a webhook, launch from a task-complete event, or run while the user has several unrelated chats open. Therefore a task module's DOC24 packet must be scoped to the **task run and its launch/trigger evidence**, not to a singular notion of what Elnor is currently working on.

This section governs DOC24 context packets for all task modules that receive model-facing instructions, tool/capability awareness, entity/memory cards, procedures, document references, library references, prior task memory, or policy constraints.

### §13A.2 No singular current-work-context assumption

Addenda B SHALL NOT assume there is one global "current work context" that can safely govern task-module injection.
ELNOR may have multiple concurrent chats, rooms, task runs, panels, browser sessions, task editors, and background processes. A user may be discussing DOC73 in one chat, Marex in another chat, and have a scheduled docket-monitor task running for an unrelated matter. The scheduled task must not inherit the DOC73 or Marex chat context unless the task run is explicitly bound to one of those contexts.

Normative rule:

```text
There is no default global current context for DOC23 task modules.
Each task run has its own TaskRunScopeEnvelope.
DOC24 assembles module context from that run envelope, the task definition, the trigger/invocation payload, module inputs, explicit bindings, policies, and relevant durable memory.
```

DOC24's active-context / working-context machinery may help resolve a user utterance or launch surface, but it is not a default source of model-facing task context. DOC24 R3 already treats active context as optional, defeatable, and non-authoritative with respect to graph truth, memory truth, and live capability truth; DOC23 task modules must preserve that boundary.

### §13A.3 Task Context Isolation Invariant

A task module's DOC24 context packet SHALL be assembled from:

```text
- the TaskRunScopeEnvelope;
- the task's declared scope;
- run inputs and trigger payloads;
- module config and module capability/context policy;
- graph wiring and upstream outputs;
- blueprint/template bindings;
- explicit source/document/library bindings;
- task invocation directives;
- environment.config;
- policy-approved durable memory;
- prior task memory relevant to the task/run scope;
- and live DOC24 capability/tool/model availability.
```

The user's active chat, active work, active matter, active library, or currently focused project SHALL NOT be injected into a task module merely because it is currently salient to Elnor, recently used, or present in another surface.

Surface/chat context MAY be used only when one of the following is true:

```text
1. The task run was explicitly created or launched from that surface and the run envelope records that it inherits or selects that surface context.
2. The user manually selected the surface/chat/document/library/matter context for the run.
3. The task blueprint or task template declares an explicit context binding.
4. The trigger payload, source module, selected document, or run input resolves to the same entity, matter, work product, library, source set, or task template.
5. DOC24 resolves a high-confidence entity/matter/library match from task-owned evidence, not from recency alone.
6. The task is itself about that surface/system/spec/work context, such as a task to review DOC73, CSA, or a specified ELNOR spec file.
```

Default rule:

```text
Surface context is candidate evidence for scope resolution.
It is not automatic injected context.
```

### §13A.4 Scheduled, triggered, and background task rule

For scheduled, email-triggered, webhook-triggered, file-watcher-triggered, external-event, task-complete, and other background runs, ordinary chat context is **not applicable by default**. The strongest scope evidence is the trigger/run payload plus the saved task definition.

```text
Scheduled run:
  Scope comes from saved task/template/blueprint, recurrence config, environment.config, selected source sets, saved matter/library/document bindings, and prior run state.
  It does not inherit whichever chat is active at the scheduled time.

Email-triggered run:
  Scope comes from the email account, folder, sender, recipients, subject, body, thread, attachments, source-classification rules, attachment ingestion results, and task trigger config.
  It does not inherit the user's current chat/work focus.

Webhook/file/external-event run:
  Scope comes from the event payload, authenticated source, correlation key, configured trigger mapping, and saved task bindings.
  It does not inherit active chat context.
```

When a background task needs persistent matter/library/source context, that context must be saved in the task definition, blueprint, template, trigger mapping, invocation directive, environment.config, or source module configuration. It must not be recovered from whatever the user is doing at run time.

### §13A.5 Scoped injection does not mean starved injection

This section does not require minimal context. It requires correct context. DOC24 must not over-limit task modules so much that they lose the benefits of ELNOR memory, preferences, tools, and procedures.

The following categories remain eligible even when unrelated surface context is excluded:

```text
1. Global hard constraints and safety rules.
2. Global or domain user preferences that are relevant to the task class.
3. Standing procedures or memory directives that apply to the task's entity, matter, domain, source, trigger, or side-effect class.
4. Task-template and module-preset semantic cards.
5. Prior task-run memory for the same task, template, matter, work product, source set, trigger family, or explicitly linked artifact.
6. Task-relevant tool/capability/procedure utility bundles.
7. Artifact/output intake policy and retention policy.
8. Source-grounding, re-read, citation, privilege, and outbound side-effect restrictions.
9. User-authored task blueprint goals, quality criteria, output preferences, and review gates.
10. Trigger-specific context, such as email thread context or recurrence-state context, when it is part of the TaskRunScopeEnvelope.
```

Example: if Will generally prefers an issue matrix plus revision plan for complaint review, that preference may be injected into a complaint-evaluation task even if several active chats are about DOC73, CSA, or prompt evaluation. But those unrelated chats themselves should not be injected unless the task run is bound to them.
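The isolation rules in §13A.3 and §13A.4 reduce to a guard that treats surface context as candidate evidence and admits it only on an explicit basis. The sketch below is a hedged illustration under stated assumptions: `SurfaceContextEvidence`, its fields, `SurfaceContextBasis`, `HIGH_CONFIDENCE`, and `surfaceContextBasis` are hypothetical names, and the 0.9 threshold is an assumption, since the text says only "high-confidence".

```typescript
// Hypothetical sketch: field names and the threshold are assumptions.
interface SurfaceContextEvidence {
  explicit_inherit: boolean;            // condition 1: run envelope records inheritance
  user_selected: boolean;               // condition 2: user picked the context for the run
  template_binding: boolean;            // condition 3: blueprint/template declares a binding
  trigger_resolves_same_scope: boolean; // condition 4: trigger/input resolves to same scope
  task_owned_match_confidence: number;  // condition 5: 0..1, from task-owned evidence only
  task_is_about_surface: boolean;       // condition 6: the task's subject is that context
  recently_active: boolean;             // salience/recency; never sufficient by itself
}

type SurfaceContextBasis =
  | "explicit_inherit"
  | "user_selected"
  | "template_binding"
  | "trigger_match"
  | "task_owned_match"
  | "task_subject"
  | null;

const HIGH_CONFIDENCE = 0.9; // assumed cutoff for "high-confidence"

// Returns the basis on which surface context may be used, or null when
// it must stay out of the module's DOC24 packet.
function surfaceContextBasis(e: SurfaceContextEvidence): SurfaceContextBasis {
  if (e.explicit_inherit) return "explicit_inherit";
  if (e.user_selected) return "user_selected";
  if (e.template_binding) return "template_binding";
  if (e.trigger_resolves_same_scope) return "trigger_match";
  if (e.task_owned_match_confidence >= HIGH_CONFIDENCE) return "task_owned_match";
  if (e.task_is_about_surface) return "task_subject";
  // recently_active is deliberately never consulted: recency alone
  // does not justify injection.
  return null;
}
```

Returning the basis rather than a bare boolean lets the caller record why a surface context was admitted, which fits the receipt-oriented style used elsewhere in this spec.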
### §13A.6 Profiles control categories, not relevance

Context injection profiles control which categories may be included; they do not decide which specific items are relevant.

```text
profile = "domain_aware"

means:
  task-relevant entities, memories, preferences, procedures, and restrictions may be included.

does not mean:
  inject Elnor's active chat/work context, recent DOC73 discussion, current CSA work, or unrelated recent memories.
```

A profile is an upper bound on eligible component families. Task/run/module scope, trigger evidence, policy, and relevance still decide item selection.

### §13A.7 TaskRunScopeEnvelope

Each task run SHALL have a `TaskRunScopeEnvelope` computed at run start. DOC24 context packets for module activations derive from this envelope plus module-specific activation evidence.

```ts
TaskRunScopeEnvelope {
  envelope_id: string
  task_id: string
  run_id: string
  launch_surface:
    | "chat"
    | "task_ui"
    | "task_canvas"
    | "run_inspector"
    | "schedule_trigger"
    | "email_trigger"
    | "webhook_trigger"
    | "file_watcher_trigger"
    | "external_event"
    | "task_complete_trigger"
    | "manual_api"
    | "ec_system_job"
    | "unknown"
  launch_surface_ref?: StorageRef
  surface_context_policy:
    | "not_applicable"     // background/triggered runs by default
    | "none"               // launch surface exists but context not inherited
    | "explicit_inherit"   // user or task receipt explicitly inherits surface context
    | "explicit_selected"  // user selected specific surface/context refs
    | "trigger_bound"      // trigger payload supplies the run's scope
    | "template_bound"     // template/blueprint supplies scope
    | "forbidden"          // policy forbids surface-context inheritance
  inherited_surface_context_refs: StorageRef[]
  selected_surface_context_refs: StorageRef[]
  resolved_scope: {
    entity_refs: EntityRef[]
    matter_refs: EntityRef[]
    work_product_refs: WorkProductRef[]
    document_refs: DocumentRef[]
    library_refs: LibraryRef[]
    source_set_refs: StorageRef[]
    task_template_refs: string[]
    module_preset_refs: string[]
    goal_refs: GoalRef[]
    invocation_directive_refs: string[]
  }
  trigger_scope?: TriggerScopeSummary
  schedule_scope?: ScheduledRunScopeSummary
  email_scope?: EmailTriggerScopeSummary
  policy_snapshot_ref: StorageRef
  source_visibility_snapshot_ref?: StorageRef
  created_at: string
  schema_version: "1.0"
}
```

Required defaults:

```text
For launch_surface = schedule_trigger / email_trigger / webhook_trigger / file_watcher_trigger / external_event / task_complete_trigger / ec_system_job:
  surface_context_policy defaults to "not_applicable" or "trigger_bound".
  inherited_surface_context_refs defaults to [].

For launch_surface = chat:
  surface_context_policy defaults to "none" unless the user request, task receipt, or TaskModeDecision explicitly chooses inheritance/selection.
```

### §13A.8 TriggerScopeSummary, ScheduledRunScopeSummary, and EmailTriggerScopeSummary

Triggered/background runs need source-specific scope records so DOC24 can assemble context without relying on active chat state.

```ts TriggerScopeSummary { trigger_module_id: string trigger_kind: | "email" | "schedule" | "webhook" | "file_watcher" | "discord" | "forum_panel" | "task_complete" | "manual_api" | "system_job" correlation_key?: string trigger_payload_ref?: StorageRef resolved_entity_refs: EntityRef[] resolved_document_refs: DocumentRef[] resolved_library_refs: LibraryRef[] source_classification_refs: StorageRef[] confidence: number schema_version: "1.0" } ScheduledRunScopeSummary { schedule_id: string recurrence_rule_ref: StorageRef saved_scope_refs: StorageRef[] last_run_scope_ref?: StorageRef run_window_start: string run_window_end?: string carry_forward_scope_from_prior_run: boolean schema_version: "1.0" } EmailTriggerScopeSummary { email_account_ref: string mailbox_or_folder_ref?: string message_id: string thread_id?: string sender_ref?: EntityRef recipient_refs: EntityRef[] subject_summary: string body_summary_ref?: StorageRef attachment_refs: DocumentRef[] doc25_ingestion_result_refs: StorageRef[] source_classification_refs: 
StorageRef[] new_case_or_matter_candidate_refs: EntityRef[] schema_version: "1.0" } ``` Email-triggered tasks that create or update matters, cases, work products, libraries, or task runs SHOULD record the trigger-derived entities as candidate or confirmed scope according to DOC72/DOC24/DOC1 governance, but they MUST NOT infer scope from unrelated concurrent chats. ### §13A.9 TaskModuleContextPacket DOC24 SHALL produce a task-scoped packet for each model-facing task-module activation where DOC24 context is enabled. ```ts TaskModuleContextPacket { packet_id: string task_id: string run_id: string module_id: string module_type: string activation_seq: number run_scope_envelope_ref: StorageRef assembly_scope: { invocation_context: "task_module" task_context_binding: | "none" | "explicit_task_scope" | "inherited_from_launch_surface" | "selected_surface_context" | "trigger_resolved" | "schedule_bound" | "email_bound" | "template_bound" | "blueprint_bound" | "manual_user_selected" | "module_input_resolved" | "library_bound" | "system_context_task" surface_context_used: boolean surface_context_reason?: string surface_context_refs: StorageRef[] surface_context_policy: TaskRunScopeEnvelope["surface_context_policy"] } context_basis: TaskModuleContextBasis injection_profile: TaskContextInjectionProfileRef injected_layers: { tools: ToolCapabilityCard[] procedures: ProcedureCard[] entity_cards: EntityCard[] memory_excerpts: MemoryExcerpt[] restrictions: RestrictionCard[] preferences: PreferenceCard[] documents: DocumentContextCard[] libraries: LibraryContextCard[] prior_task_runs: PriorTaskRunCard[] specialist_agents: SpecialistAgentCard[] output_artifact_policy: ArtifactPolicyCard[] task_blueprint_context: BlueprintContextCard[] task_template_context: TaskTemplateContextCard[] prompt_or_evaluation_guidance: PromptEvaluationGuidanceCard[] trigger_context: TriggerContextCard[] recurrence_context: RecurrenceContextCard[] } excluded_candidates: ExcludedContextCandidate[] 
policy_snapshot_ref: StorageRef capability_snapshot_ref: StorageRef source_refs: StorageRef[] token_budget: { requested_tokens: number actual_tokens: number max_tokens: number trimmed_items: string[] } packet_generation_id: string created_at: string schema_version: "1.0" } ``` ### §13A.10 ExcludedContextCandidate DOC24 SHALL record meaningful excluded candidates when context is considered but not injected. This is required for Run Inspector, Task Assessment, and debugging context omissions or context bleed. ```ts ExcludedContextCandidate { candidate_kind: | "surface_context" | "active_chat_context" | "active_work_context" | "memory_directive" | "entity_card" | "library" | "prior_task_run" | "procedure" | "tool_capability" | "document" | "task_template" | "module_preset" candidate_ref: string exclusion_reason: | "not_task_bound" | "not_run_bound" | "surface_context_not_applicable" | "low_relevance" | "wrong_matter" | "wrong_library" | "wrong_task_type" | "wrong_trigger_scope" | "policy_blocked" | "budget_trimmed" | "profile_excludes_component" | "module_config_denied" | "incognito_or_memory_suppressed" | "active_context_recency_only" | "requires_user_selection" | "source_visibility_blocked" explanation?: string schema_version: "1.0" } ``` ### §13A.11 TaskModuleContextBasis DOC24 must assemble module context from the actual module purpose and activation inputs. It must not assume every module has a single `config.instruction` field. 
```ts TaskModuleContextBasis { module_type: string primary_instruction_summary: string | null output_contract_summary: string | null evaluation_or_review_basis?: string data_input_summary?: string context_input_summary?: string instruction_input_summary?: string file_input_summary?: string trigger_payload_summary?: string resolved_entity_refs: EntityRef[] resolved_document_refs: DocumentRef[] resolved_library_refs: LibraryRef[] resolved_task_artifact_refs: TaskArtifactRef[] side_effect_class?: SideEffectClass schema_version: "1.0" } ``` Required module-specific extraction examples: ```ts getTaskModuleContextBasis(module, activation): TaskModuleContextBasis { switch (module.type) { case "step.agent_task": return { module_type: module.type, primary_instruction_summary: summarize(module.config.instruction), output_contract_summary: summarize(module.config.output_contract), data_input_summary: summarizePortInput(activation.data_in), context_input_summary: summarizePortInput(activation.context_in), instruction_input_summary: summarizePortInput(activation.instruction_in), file_input_summary: summarizeFileInputs(activation.file_only_in), trigger_payload_summary: summarizeTriggerPayload(activation.trigger_payload), resolved_entity_refs: resolveEntitiesFromActivation(activation), resolved_document_refs: resolveDocumentsFromActivation(activation), resolved_library_refs: resolveLibrariesFromActivation(activation), resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation), schema_version: "1.0", } case "step.judge": return { module_type: module.type, primary_instruction_summary: summarizeJudgeDimensions(module.config.dimensions), output_contract_summary: "structured judge scores and audit trail", evaluation_or_review_basis: summarize(module.config.evaluation_instruction), data_input_summary: summarizePortInput(activation.candidate_in ?? activation.data_in), context_input_summary: summarizePortInput(activation.comparison_bundle_in ?? 
activation.context_in), resolved_entity_refs: resolveEntitiesFromActivation(activation), resolved_document_refs: resolveDocumentsFromActivation(activation), resolved_library_refs: resolveLibrariesFromActivation(activation), resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation), schema_version: "1.0", } case "step.agent_review_gate": return { module_type: module.type, primary_instruction_summary: summarize(module.config.review_criteria), output_contract_summary: "approve / revise / reject / feedback", data_input_summary: summarizePortInput(activation.data_in), context_input_summary: summarizePortInput(activation.context_in), resolved_entity_refs: resolveEntitiesFromActivation(activation), resolved_document_refs: resolveDocumentsFromActivation(activation), resolved_library_refs: resolveLibrariesFromActivation(activation), resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation), schema_version: "1.0", } case "output.file": return { module_type: module.type, primary_instruction_summary: summarize( module.config.naming_instructions + "\n" + module.config.directory_instructions ), output_contract_summary: "file path, naming, overwrite policy, receipt", data_input_summary: summarizePortInput(activation.data_in), context_input_summary: summarizePortInput(activation.context_in), side_effect_class: "file_write", resolved_entity_refs: resolveEntitiesFromActivation(activation), resolved_document_refs: resolveDocumentsFromActivation(activation), resolved_library_refs: resolveLibrariesFromActivation(activation), resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation), schema_version: "1.0", } case "step.coding": return { module_type: module.type, primary_instruction_summary: summarizeCodingAssignments(module.config.assignments), output_contract_summary: "diffs, test results, changed files, receipt", data_input_summary: summarizePortInput(activation.data_in), context_input_summary: 
summarizePortInput(activation.context_in), instruction_input_summary: summarizePortInput(activation.instruction_in), resolved_entity_refs: resolveEntitiesFromActivation(activation), resolved_document_refs: resolveDocumentsFromActivation(activation), resolved_library_refs: resolveLibrariesFromActivation(activation), resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation), schema_version: "1.0", } } } ``` ### §13A.12 Context-source priority cascade DOC24 SHALL assemble task-module context using this priority cascade: ```text 1. Hard policy, security, privilege, side-effect, and source-visibility constraints. 2. TaskRunScopeEnvelope: launch/trigger type, selected scope, trigger payload, schedule/email/webhook/file/task-complete context. 3. Module config and module capability/context policy. 4. Runtime inputs: data_in, context_in, instruction_in, file inputs, trigger payloads, and resolved entity/document/library refs. 5. Task-level scope: Blueprint, template, invocation directive, environment.config, selected source modules, source sets, and declared matter/library bindings. 6. Prior task memory: same task/template/matter/work-product/source-set/trigger-family prior runs and linked artifacts. 7. Relevant global or domain user preferences, procedures, and memory directives. 8. Live DOC24 capability/tool/model/connector/MCP availability and utility bundles. 9. Launch-surface context only when explicitly inherited/selected or high-confidence bound by task-owned evidence. ``` The cascade is not a simple token priority list. Items in a lower category may outrank higher-category optional items when the module's task purpose requires them, but hard policy and explicit user exclusions always dominate. Launch-surface context is deliberately last because it is often unrelated to scheduled or background task work. 
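The cascade above can be sketched as a selection routine. This is a minimal illustration under stated assumptions, not the DOC24 algorithm: the `ContextCandidate` shape, the `selectContext` function, and the single `blocked` flag (which folds hard policy and explicit user exclusions together) are hypothetical, and the sketch omits the "high-confidence bound by task-owned evidence" path for launch-surface context. Exclusion reasons reuse values from §13A.10.

```typescript
// Hypothetical shapes for illustration only; they are not DOC24 contracts.
type SurfacePolicy =
  | "not_applicable" | "none" | "explicit_inherit" | "explicit_selected"
  | "trigger_bound" | "template_bound" | "forbidden";

interface ContextCandidate {
  ref: string;
  tier: number;              // 1..9, position in the cascade above
  tokens: number;
  requiredByModule: boolean; // the module's task purpose requires this item
  blocked: boolean;          // hard policy or explicit user exclusion
}

interface Exclusion {
  ref: string;
  reason: "policy_blocked" | "surface_context_not_applicable" | "budget_trimmed";
}

const SURFACE_TIER = 9;

function selectContext(
  candidates: ContextCandidate[],
  surfacePolicy: SurfacePolicy,
  maxTokens: number,
): { injected: string[]; excluded: Exclusion[] } {
  const excluded: Exclusion[] = [];
  const eligible: ContextCandidate[] = [];
  const surfaceAllowed =
    surfacePolicy === "explicit_inherit" || surfacePolicy === "explicit_selected";

  for (const c of candidates) {
    if (c.blocked) {
      // Hard policy and explicit exclusions dominate everything else.
      excluded.push({ ref: c.ref, reason: "policy_blocked" });
    } else if (c.tier === SURFACE_TIER && !surfaceAllowed) {
      // Launch-surface context is dropped unless explicitly inherited/selected.
      excluded.push({ ref: c.ref, reason: "surface_context_not_applicable" });
    } else {
      eligible.push(c);
    }
  }

  // Required items first, then cascade order: not a pure tier sort.
  eligible.sort(
    (a, b) =>
      Number(b.requiredByModule) - Number(a.requiredByModule) || a.tier - b.tier,
  );

  const injected: string[] = [];
  let used = 0;
  for (const c of eligible) {
    if (used + c.tokens <= maxTokens) {
      injected.push(c.ref);
      used += c.tokens;
    } else {
      excluded.push({ ref: c.ref, reason: "budget_trimmed" });
    }
  }
  return { injected, excluded };
}
```

The point of the sort key is that a tier-6 item the module requires is placed ahead of an optional tier-4 item, while blocked items never reach the sort at all.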
### §13A.13 Surface context use cases

Surface context may be used for:

```text
- disambiguating a user invocation such as “run the draft task for Marex”;
- selecting among otherwise ambiguous matching task templates;
- resolving “this complaint” when the task is launched from a chat or document surface;
- carrying explicit user-provided context into an Elnor-created task when the receipt says so;
- providing current system/spec context when the task is explicitly about that system/spec work;
- linking a task created inside a task editor, run inspector, library surface, document viewer, or chat to that surface when the user accepts the binding.
```

Surface context must not be used for:

```text
- filling ordinary task modules with unrelated current chat context;
- injecting a current CSA/DOC73 discussion into an unrelated legal drafting task;
- injecting one chat's matter into another chat's task;
- injecting a currently active matter into a scheduled task for a different matter;
- giving an email-triggered task unrelated chat context because the user happens to be chatting at the same time;
- bypassing explicit task source bindings;
- replacing DOC25/DOC73 source retrieval with conversational recollection.
```

### §13A.14 Runtime expansion from upstream content

If an upstream module produces new content that changes the next module's needs, DOC24 may expand context or capabilities for the next module only within the next module's policy envelope.

Example:

```text
Module A produces:
  “Check the PACER docket for Marex filings.”

Module B originally has a general legal-research profile.

DOC24 may mount PACER or inject PACER procedure guidance only if Module B allows it,
the task policy permits it, and the expansion receipt records why.
```

This supports mid-process adaptation without letting DOC24 smuggle unrelated surface context into the graph.
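The expansion rule in §13A.14 can be read as a two-sided check that always yields a receipt, whether or not the expansion is granted. The sketch below is illustrative only: `ExpansionRequest`, `CapabilityPolicy`, `ExpansionReceipt`, `evaluateExpansion`, and the capability name `pacer_docket_search` are hypothetical, since this section does not define an expansion-receipt schema.

```typescript
// Hypothetical shapes; assumptions made for illustration.
interface ExpansionRequest {
  capability: string;      // e.g. "pacer_docket_search" (illustrative name)
  sourceModuleId: string;  // module whose output suggested the expansion
  targetModuleId: string;  // next module that would receive the capability
  evidence: string;        // upstream content that motivated the request
}

interface CapabilityPolicy {
  allowedCapabilities: ReadonlySet<string>;
}

interface ExpansionReceipt {
  capability: string;
  targetModuleId: string;
  granted: boolean;
  reason: string;   // records why the expansion was granted or denied
  evidence: string;
}

function evaluateExpansion(
  req: ExpansionRequest,
  modulePolicy: CapabilityPolicy,
  taskPolicy: CapabilityPolicy,
): ExpansionReceipt {
  const moduleAllows = modulePolicy.allowedCapabilities.has(req.capability);
  const taskPermits = taskPolicy.allowedCapabilities.has(req.capability);

  let reason: string;
  if (!moduleAllows) {
    reason = "target module policy does not allow capability";
  } else if (!taskPermits) {
    reason = "task policy does not permit capability";
  } else {
    reason = `upstream output from ${req.sourceModuleId} requested capability`;
  }

  return {
    capability: req.capability,
    targetModuleId: req.targetModuleId,
    granted: moduleAllows && taskPermits,
    reason,
    evidence: req.evidence,
  };
}
```

Emitting a receipt on denial as well as on grant keeps the decision inspectable in the same way excluded context candidates are.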
### §13A.15 DOC73 / library context rule

DOC73 library/corpus context may be injected into a task module only if one of the following is true:

```text
1. The task has an explicit library/corpus binding.
2. A source module selected documents from that library.
3. The Task Blueprint declares the library as a source.
4. The trigger payload resolves to a library-scoped document/work product.
5. The user invoked the task from a library surface and the run envelope records selected/inherited library context.
6. A module capability binding requires corpus/library retrieval.
7. Task Agent proposed the library and the user accepted it.
8. The task is a declared broad-scope library audit/search/comparison task.
```

Otherwise, a current DOC73 conversation or library-focused chat must not leak into unrelated tasks. DOC73 libraries are scoped knowledge regions in the graph, not ambient overlays on every task.

### §13A.16 DOC25 document context rule

Task modules should normally receive document references, summaries, page markers, and retrieval tools rather than full document bodies. Full text or page excerpts may be injected only when:

```text
- the module's input contract requires it;
- the selected document/page set fits budget;
- source-visibility policy permits it;
- the task or module explicitly selected the source;
- the trigger/input payload supplied the source;
- and DOC25/DOC24 marker governance is followed.
```

Substantial produced, downloaded, or received documents route through DOC25 ingestion under §11 / §18. DOC25 remains the processor; DOC72/DOC73/task artifacts/work products remain the destination surfaces depending on policy.

### §13A.17 Global preferences, restrictions, and standing orders

This isolation model does not block relevant global memory. It blocks unrelated surface context.
Examples of globally eligible items:

```text
- “Ask before sending external emails.”
- “Use issue matrices for complaint evaluation.”
- “For complex legal analysis, consider second-pass deepening.”
- “Do not rely on memory for source-document quotes; re-read source documents.”
- “Never use personal email for work filings.”
```

Examples of globally ineligible items unless task-bound:

```text
- “Currently reviewing DOC73 V1.5.1.”
- “Currently drafting CSA architecture notes.”
- “Current chat thread about prompt lab naming.”
- “Recently opened unrelated Marex run inspector.”
```

### §13A.18 Cross-scope and broad-scope tasks

Some tasks intentionally require broad context. The isolation invariant must not prevent such tasks from working.

Broad-scope context is allowed when the task explicitly asks for it, for example:

```text
- “Compare all tasks that touch Marex.”
- “Find every library that mentions loss causation.”
- “Audit my saved tasks and suggest improvements.”
- “Summarize recent work across DOC23, DOC24, and DOC73.”
- “Search all prior complaint-review tasks for reusable patterns.”
```

For broad-scope tasks, DOC24 must still emit a packet receipt showing the declared scope, source boundaries, visibility exclusions, and policy filters.

### §13A.19 Module-family defaults

Recommended defaults:

| Module family | DOC24 packet posture |
|---|---|
| Trigger modules | No LLM context packet; trigger payload creates scope evidence and TaskRunScopeEnvelope fields. |
| Schedule trigger | Scope comes from recurrence config, saved task bindings, environment.config, and prior scheduled-run state; no active chat inheritance. |
| Email trigger | Scope comes from email account/folder/thread/body/attachments/source rules/DOC25 ingestion; no active chat inheritance. |
| Source modules | Source selection and metadata only; no unrelated memory. |
| Agent Task / Red Team / Agent Review | Task-scoped entities, documents, procedures, preferences, tools, and restrictions according to module policy. |
| Judge / Claim Extractor / Experiment | Evaluator-mode context: rubric, evidence policy, source scope, candidate/comparison bundles, not broad active context. |
| Utility modules | Usually no DOC24 packet unless they call an LLM or need capability/context policy. |
| Output modules | Delivery/naming/formatting/recipient/policy context only; no broad task-substance injection unless required for composition. |
| Coding module | ACP/session/workspace context, explicit task files, coding policy, and module instructions; no unrelated chat context. |
| Monitor / Environment system modules | May publish task scope, environment defaults, and context policy, but do not themselves authorize unrelated context injection. |

### §13A.20 Packet receipts and Run Inspector display

Each model-facing module activation that receives DOC24 context SHALL link to its `TaskModuleContextPacket` in:

```text
- EffectivePromptSnapshot;
- TaskTraceSpan;
- Run Inspector context/prompt detail view;
- Task Assessment memory/context findings;
- Task Agent operational lens.
```

Run Inspector should show:

```text
Run scope:
- Launch: email_trigger
- Email account: Firm Outlook
- Thread: Marex new filing alert
- Attachments: 2 filings routed through DOC25
- Surface context: not applicable

Injected:
- 2 entity cards: Marex, Draft Complaint work product
- 1 preference: issue matrix output preference
- 3 tools: DOC25 retrieve pages, citation checker, PACER docket search
- 1 source set: selected complaint and exhibits

Excluded:
- chat context “DOC73 spec review” — surface context not applicable
- library “CSA research” — wrong library
- prior task “Narayanan docket monitor” — low relevance
```

### §13A.21 Validation and warnings

Addenda B introduces validation and warning codes for task-module context packets:

```text
validation.task_run_scope_envelope_required
validation.task_module_context_packet_required
validation.task_module_context_surface_context_unjustified
validation.task_module_context_policy_snapshot_missing
validation.task_module_context_source_visibility_violation
validation.task_module_context_profile_overrides_denied_capability
validation.task_module_context_trigger_scope_missing

warning.task_module_context_surface_context_excluded
warning.task_module_context_broad_scope_declared
warning.task_module_context_relevant_memory_trimmed
warning.task_module_context_library_candidate_excluded
warning.task_module_context_prior_task_candidate_excluded
warning.task_module_context_trigger_scope_low_confidence
```

### §13A.22 Relationship to Task Agent and TKP

Task Agent may inspect `TaskRunScopeEnvelope` and `TaskModuleContextPacket` records to answer why a module did or did not receive certain context. TKP may include decision cards explaining context-routing strategy, but TKP is not injected into ordinary modules unless the module is in a task-design/review/evaluation context requiring Task Agent knowledge.

Task Agent must not treat packet omissions as errors merely because context exists elsewhere.
The question is whether the context was run-bound, task-bound, module-relevant, policy-allowed, and budget-appropriate.

---

## §14 — Durable execution readiness

### §14.1 Purpose

DOC23 already has checkpoints and run state. Addenda B strengthens the semantic contract for long-running business-grade workflows.

### §14.2 DurableTaskExecutionContract

```ts
DurableTaskExecutionContract {
  checkpoint_strategy:
    | "module_boundary"
    | "llm_call_boundary"
    | "tool_call_boundary"
  replay_policy:
    | "resume_from_checkpoint"
    | "replay_with_cached_side_effects"
  side_effect_idempotency_key_required: boolean
  pending_human_signal_policy: {
    durable_wait: true
    timeout_policy: "none" | "deadline" | "business_hours"
    decision_log_required: true
  }
  crash_recovery_policy: {
    resume_incomplete_modules: boolean
    never_repeat_committed_side_effects: boolean
    require_policy_recheck_on_resume: boolean
  }
  schema_version: "1.0"
}
```

### §14.3 Human gates and holds

Human review, hold, delay, and signal mechanics should remain graph-visible. Task Agent may explain or propose them, but must not create hidden waits.

### §14.4 Side-effect idempotency

Any email send, file write, calendar write, PACER download, webhook POST, task output send, or other side-effecting operation should carry an idempotency key and policy snapshot where supported.

---

## §15 — Preflight Report

### §15.1 Purpose

Preflight is not a separate duplicate simulator. It is the orchestrated report that unifies validation, test/simulation, capability readiness, policy checks, artifact policy preview, cost estimates, context readiness, and checkpoint coverage.
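One way to read "orchestrated report" is as an aggregation over results that other components already produce, rather than a second simulator. The sketch below is a simplified illustration under assumptions: the `CheckResult` wrapper, the `severity` field on `ValidationIssue`, the `summarizePreflight` function, and the `"blocked"` status label are hypothetical. Only the `blocking_issues` and `warnings` field names come from the report schema in §15.2.

```typescript
// Hypothetical, simplified shapes; the full report is defined in §15.2.
interface ValidationIssue {
  code: string;
  severity: "blocking" | "warning"; // assumed field, not specified by the source
  message: string;
}

interface CheckResult {
  check: string; // e.g. "static_validation", "capability_readiness"
  issues: ValidationIssue[];
}

type PreflightStatus = "ready" | "warning" | "blocked";

interface PreflightSummary {
  status: PreflightStatus;
  blocking_issues: ValidationIssue[];
  warnings: ValidationIssue[];
}

// Merge per-check results into one summary without re-running any check.
function summarizePreflight(results: CheckResult[]): PreflightSummary {
  const blocking_issues: ValidationIssue[] = [];
  const warnings: ValidationIssue[] = [];

  for (const result of results) {
    for (const issue of result.issues) {
      if (issue.severity === "blocking") {
        blocking_issues.push(issue);
      } else {
        warnings.push(issue);
      }
    }
  }

  let status: PreflightStatus = "ready";
  if (blocking_issues.length > 0) {
    status = "blocked";
  } else if (warnings.length > 0) {
    status = "warning";
  }
  return { status, blocking_issues, warnings };
}
```

Under this reading, a run with only policy or cost warnings reports `"warning"`, and any blocking issue from any sub-check takes precedence.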
### §15.2 TaskPreflightReport

```ts
TaskPreflightReport {
  report_id: string
  task_id?: string
  blueprint_id?: string
  graph_version?: number
  blueprint_version?: string
  static_validation: TaskGraphValidationResult
  input_readiness: InputReadinessReport
  capability_readiness: CapabilityReadinessReport
  doc24_context_preview: ContextPreviewReport
  policy_and_side_effects: PolicyRiskReport
  artifact_intake_preview: ArtifactIntakePreview
  cost_estimate: CostEstimate
  duration_estimate: DurationEstimate
  checkpoint_coverage: CheckpointCoverageReport
  test_results: ModuleTestResult[]
  simulation_result?: TaskSimulationSummary
  blocking_issues: ValidationIssue[]
  warnings: ValidationIssue[]
  recommended_fixes: TaskFixProposal[]
  generated_at: string
  schema_version: "1.0"
}
```

### §15.3 User-facing preflight example

```text
Preflight Result: Warning

Ready:
- Graph validates.
- Required source documents selected.
- DOC25 retrieval available.

Warnings:
- PACER requested by module 3 but not allowed in module policy.
- Final email output is configured but no human approval gate exists.
- Estimated cost is $18-$32, above usual threshold.
- Damages memo output is keep-with-run, but similar outputs are often reused.

Recommended fixes:
[Allow PACER for module 3]
[Add human review before email]
[Set cost cap to $25]
[Make damages memo findable with summary]
```

---

## §16 — Task Assessment

### §16.1 Purpose

Task Assessment is the unified post-run and on-demand evaluation surface for task quality, failures, improvements, substantive observations, graph/config issues, learning signals, and template/preset recommendations. It replaces the narrower “failure playbook” concept.
### §16.2 TaskAssessment schema

```ts
TaskAssessment {
  assessment_id: string
  task_id: string
  run_id?: string
  blueprint_id?: string
  graph_version: number
  template_id?: string
  outcome_summary: string
  execution_findings: ExecutionFinding[]
  quality_findings: QualityFinding[]
  capability_findings: CapabilityFinding[]
  memory_context_findings: MemoryContextFinding[]
  artifact_findings: ArtifactFinding[]
  cost_latency_findings: CostLatencyFinding[]
  user_feedback_findings: UserFeedbackFinding[]
  substantive_findings: SubstantiveFinding[]
  template_selection_findings: TemplateSelectionFinding[]
  preset_findings: PresetFinding[]
  design_learning_findings: DesignLearningFinding[]
  recommended_actions: TaskAssessmentAction[]
  graph_diff_proposals: GraphDiff[]
  blueprint_diff_proposals: BlueprintDiff[]
  module_config_patch_proposals: ModuleConfigPatch[]
  capability_binding_patch_proposals: CapabilityBindingPatch[]
  prompt_or_preset_patch_proposals: PromptPatch[]
  artifact_policy_patch_proposals: ArtifactPolicyPatch[]
  template_update_proposal_refs: string[]
  design_improvement_proposal_refs: string[]
  report_ref: StorageRef
  created_at: string
  schema_version: "1.0"
}
```

### §16.3 Inputs to Task Assessment

Task Assessment consumes:

- TaskTraceSpan tree;
- run logs and checkpoints;
- artifact index;
- prompt/context snapshots;
- capability snapshots;
- policy decisions;
- sub-agent traces;
- DOC72 TaskRun memory;
- DOC25 ingestion results;
- DOC73 library ingestion results;
- Judge/Experiment/Claim Extractor results;
- Addenda A eval traces;
- DOC8 friction events;
- BDSM utility bundles;
- user feedback;
- manual edits after output;
- output usage/open/bookmark/downstream-citation signals.

### §16.4 User-facing assessment example

```text
Task Assessment — Marex Draft Complaint Run

Outcome:
Completed. Final complaint generated. Red-team found 6 issues.

What went well:
- Fact extraction was strong.
- Damages memo was useful and reused downstream.
- Red-team caught missing scienter allegations.

Problems:
- Complaint drafter did not receive full damages memo.
- PACER capability was mounted but unused.
- Red-team judge used summarized source docs; confidence lower than expected.
- Final output should have been saved as work product/document.

Recommended changes:
1. Wire damages memo as explicit context into complaint drafter.
2. Make damages memo findable with summary by default.
3. Remove PACER from this task unless docket entries are requested.
4. Add human review after red-team revision.
5. Save final complaint as DOC25 document/work_product.

[Apply selected changes] [Generate graph diff] [Save template update proposal]
```

### §16.5 Assessment-generated learning

Task Assessment may emit:

- TaskDesignLearningSignal;
- TemplateUpdateProposal;
- ModulePresetSemanticCard update candidate;
- TaskDesignCase candidate;
- TaskDesignPatternCard update candidate;
- CapabilityUseObservation;
- ArtifactCandidateRule update candidate;
- QuestionPolicy update candidate;
- TaskAgentInstructionCandidate.

Material changes remain proposal-gated.

---

## §16A — Task review and portfolio assessment

### §16A.1 Ask Task Agent in task canvas

Task editor UI SHOULD expose:

```text
Ask Task Agent
- Explain this graph
- Review this task
- Suggest missing steps
- Check wiring
- Check context routing
- Check capability bindings
- Check memory/artifact policy
- Compare to template
- Run preflight
- Improve selected prompt
- Test selected prompt variants
```

This is a side panel / assistant affordance, not a DOC23 module.

### §16A.2 Portfolio-level review

User command:

```text
Elnor, look at my tasks and suggest improvements.
```

Task Agent may perform a portfolio assessment.
```ts
export type TaskPortfolioAssessment = {
  assessment_id: string;
  task_scope:
    | "all_tasks"
    | "active_tasks"
    | "selected_tasks"
    | "template_library"
    | "recent_failed_runs"
    | "prompt_quality"
    | "task_invocation_policy";
  goals_considered: GoalRef[];
  tasks_reviewed: string[];
  templates_reviewed: string[];
  module_presets_reviewed: string[];
  findings: TaskPortfolioFinding[];
  recommended_actions: TaskPortfolioRecommendation[];
  high_value_candidates: string[];
  low_value_or_duplicate_tasks: string[];
  stale_templates: string[];
  missing_blueprints: string[];
  capability_binding_issues: string[];
  artifact_policy_issues: string[];
  evaluation_coverage_issues: string[];
  prompt_quality_issues: string[];
  report_ref: StorageRef;
  schema_version: "1.0";
};
```

### §16A.3 Optional goal/stakes intake

For portfolio review or complex task creation, Task Agent MAY ask high-value questions such as:

```text
What are your goals for these tasks?
Are these high-stakes or convenience workflows?
Do you care more about speed, quality, auditability, cost, or reuse?
Which tasks do you actually use?
Do you want more automation or more review gates?
```

BDSM/DOC8 question utility should learn whether these questions are useful or annoying by context class.

---

## §17 — Evaluation, datasets, and promotion safety

### §17.1 Purpose

Self-improvement of tasks, prompts, templates, presets, and Task Agent behavior requires eval datasets, traces, promotion ledgers, rollback, and drift detection.
### §17.2 TaskEvalDataset

```ts
TaskEvalDataset {
  dataset_id: string
  source:
    | "manual"
    | "production_trace"
    | "experiment"
    | "judge_failure"
    | "user_label"
    | "task_creation_eval"
    | "task_assessment"
  examples: TaskEvalExampleRef[]
  task_id?: string
  module_id?: string
  template_id?: string
  frozen_at: string
  input_fingerprint_hash: string
  schema_version: "1.0"
}
```

### §17.3 TaskEvalTrace

```ts
TaskEvalTrace {
  eval_trace_id: string
  run_id?: string
  design_session_id?: string
  module_activation_ref?: string
  prompt_snapshot_ref?: StorageRef
  output_ref?: StorageRef
  graph_ref?: StorageRef
  blueprint_ref?: StorageRef
  judge_score_refs: string[]
  human_label_refs: string[]
  capability_snapshot_ref?: StorageRef
  policy_snapshot_ref?: StorageRef
  schema_version: "1.0"
}
```

### §17.4 PromotionLedgerEntry

```ts
PromotionLedgerEntry {
  promotion_id: string
  target_kind:
    | "module_instruction"
    | "task_template"
    | "module_preset"
    | "task_blueprint"
    | "procedure_binding"
    | "capability_binding"
    | "task_agent_instruction"
    | "pattern_card"
    | "semantic_card"
  prior_hash: string
  candidate_hash: string
  validation_dataset_id: string
  validation_score_summary: object
  approved_by: "user" | "architect" | "policy_auto" | "experiment"
  rollback_ref: StorageRef
  post_promotion_monitor_id: string
  created_at: string
  schema_version: "1.0"
}
```

### §17.5 R5/R6 alignment

This addendum should align with Addenda A's R5/R6 eval/optimization substrate rather than creating a parallel prompt optimization system. Task-specific datasets and promotion entries should use the same owner-doc promotion/rollback discipline where possible.
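The promotion/rollback discipline referenced above can be sketched against a subset of the §17.4 fields. This is an illustration under assumptions, not a specified gate: the `LedgerEntry` subset, the `promotionProblems` and `rollbackTarget` helpers, and the specific checks are hypothetical, and `StorageRef` is modeled as a plain string because its shape is not defined in this section.

```typescript
// Subset of the §17.4 PromotionLedgerEntry fields.
// StorageRef is modeled as a string here (assumption).
interface LedgerEntry {
  promotion_id: string;
  target_kind: string;
  prior_hash: string;
  candidate_hash: string;
  validation_dataset_id: string;
  rollback_ref: string;
  post_promotion_monitor_id: string;
}

// Lists reasons an entry should not be promoted; an empty list means no objection.
// The checks are illustrative, not a gate defined by the spec.
function promotionProblems(entry: LedgerEntry): string[] {
  const problems: string[] = [];
  if (entry.prior_hash === entry.candidate_hash) {
    problems.push("candidate is identical to prior");
  }
  if (!entry.validation_dataset_id) {
    problems.push("missing validation dataset");
  }
  if (!entry.rollback_ref) {
    problems.push("missing rollback ref");
  }
  if (!entry.post_promotion_monitor_id) {
    problems.push("missing post-promotion monitor");
  }
  return problems;
}

// Returns the hash that should be active again after rolling back a promotion,
// or undefined when the promotion is not in the ledger.
function rollbackTarget(ledger: LedgerEntry[], promotionId: string): string | undefined {
  const entry = ledger.find((e) => e.promotion_id === promotionId);
  return entry ? entry.prior_hash : undefined;
}
```

Keeping `prior_hash` on every entry is what makes rollback a lookup rather than a reconstruction.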
--- ## §17A — Unified prompt improvement and evaluation ### §17A.1 Product rule: one prompt-help capability There should be one user-facing prompt-help concept: ```text Improve Prompt ``` The system may offer depth levels: ```ts export type PromptImprovementMode = | "quick_check" | "rewrite" | "generate_variants" | "test_variants" | "optimize_and_propose_promotion"; ``` The user SHOULD NOT have to choose among Prompt Advisor, Prompt Lab, Promptolution, DSPy, Experiment, or Judge as product concepts. ### §17A.2 No separate primary Prompt Lab UI A separate user-facing Prompt Lab is not required in the initial system. Prompt Lab, if retained from DOC17, is a backend/offline evaluation lane for reusable prompt artifacts, not the main user workflow. Primary access points: ```text Chat composer: Improve Prompt Prompt recipe editor: Improve / Rewrite / Test Task module prompt editor: Ask Task Agent -> Improve/Test Prompt Run Inspector: Prompt quality findings Task Assessment: Prompt findings and update proposals Prompt artifact/library page: optional management/history surface only ``` ### §17A.3 DOC17 Prompt Advisor as service DOC17 Prompt Advisor remains useful as a lightweight service: ```ts prompt_advisor.improve() prompt_advisor.rewrite() ``` It provides: ```text deterministic prompt-gap analysis; optional AI rewrite through controlled rewrite template; feedback events for viewed/requested/accepted/dismissed/edited rewrites; suggestions for prompt recipes or overlays where relevant. ``` It is not a separate agent and not the task-aware prompt orchestrator. User-facing quick-flow behavior: ```text Improve Prompt: - run deterministic gap analysis; - show gaps/hints/suggested fixes; - optionally offer Rewrite; - if rewritten, show what changed; - user may Use, Dismiss, Edit, or Save as Prompt Recipe. 
``` Prompt Advisor feedback events (`viewed`, `rewrite_requested`, `rewrite_accepted`, `rewrite_dismissed`, `accepted_then_edited`, `saved_as_recipe`) feed the shared prompt-learning substrate and MUST NOT by themselves mutate active prompts. ### §17A.4 Task Agent as task-aware prompt orchestrator Task Agent orchestrates prompt improvement when the prompt belongs to: ```text DOC23 task module instruction DOC23 judge rubric DOC23 red-team prompt DOC23 output contract DOC23 task template DOC23 module preset prompt evaluation task reusable prompt being converted into module preset/template component ``` Task Agent may call DOC17 Prompt Advisor service for quick gap analysis/rewrite, but Task Agent owns task-context interpretation, downstream compatibility analysis, graph-aware recommendations, test-job creation, and task/template/preset update proposals. ### §17A.5 Prompt surfaces ```ts export type PromptSurface = | "chat_composer" | "prompt_recipe" | "overlay" | "room_role_prompt" | "red_team_prompt" | "doc23_module_instruction" | "doc23_judge_rubric" | "doc23_output_contract" | "task_template_prompt" | "module_preset_prompt"; ``` ### §17A.6 PromptImprovementRequest ```ts export type PromptImprovementRequest = { request_id: string; source_surface: PromptSurface; prompt_ref?: StorageRef; prompt_text_inline?: string; requested_mode: PromptImprovementMode; context_packet?: { task_id?: string; task_template_id?: string; module_id?: string; module_type?: string; blueprint_step_id?: string; downstream_consumers?: string[]; expected_output_contract?: string; judge_or_eval_refs?: string[]; }; orchestrator: | "doc17_prompt_advisor_service" | "task_agent" | "doc23_prompt_evaluation_task"; schema_version: "1.0"; }; ``` Routing rules: ```text chat_composer + quick_check/rewrite: DOC17 Prompt Advisor service or direct Elnor reasoning. prompt_recipe + quick_check/rewrite: DOC17 Prompt Advisor service. 
doc23 module/judge/red-team/task-template prompt: Task Agent orchestrates; may call DOC17 Prompt Advisor service. generate_variants/test_variants/optimize: Task Agent orchestrates tracked prompt-evaluation task/job. ``` ### §17A.7 Prompt artifact taxonomy ```text Prompt Recipe: DOC17-owned reusable prompt text. Module Prompt: DOC23-owned instruction text attached to a DOC23 module. Module Preset Prompt: Prompt included in DOC23 module preset with config/model/tools/output policy. Tested Prompt: Not a separate prompt type; a prompt artifact with evaluation evidence. Task Template Prompt: Prompt embedded in a reusable DOC23 task template. ``` Lifecycle: ```text free prompt -> prompt recipe -> tested prompt recipe -> module prompt -> module preset -> task template component ``` ### §17A.8 TaskPromptArtifact ```ts export type TaskPromptArtifact = { prompt_artifact_id: string; owner_surface: | "doc23_task_module" | "doc23_task_template" | "doc23_module_preset" | "doc17_prompt_recipe" | "doc17_overlay" | "doc12_room_role" | "doc14_red_team" | "propa_extraction_prompt"; prompt_role: | "instruction" | "rubric" | "output_contract" | "tool_policy" | "context_policy" | "judge_dimension" | "red_team_role" | "candidate_variant"; prompt_text_ref: StorageRef; prompt_text_hash: string; prompt_recipe_id?: string; module_id?: string; task_id?: string; task_template_id?: string; module_preset_id?: string; runtime_truth_ref?: StorageRef; evaluation_refs: StorageRef[]; tested_status: | "untested" | "tested" | "candidate_winner" | "promoted" | "rejected"; promotion_state: | "draft" | "active" | "candidate" | "promoted" | "rejected" | "rolled_back"; schema_version: "1.0"; }; ``` ### §17A.9 Prompt quality assessment ```ts export type TaskPromptQualityAssessment = { assessment_id: string; task_id?: string; module_id?: string; prompt_surface: | "module_instruction" | "judge_rubric" | "red_team_prompt" | "output_contract" | "task_template_instruction" | "module_preset_prompt"; 
current_prompt_hash: string; quality_findings: PromptQualityFinding[]; suggested_prompt_ref?: StorageRef; evidence_refs: StorageRef[]; confidence: number; recommended_next_step: | "accept" | "manual_review" | "run_judge_eval" | "run_experiment" | "run_dspy_candidate_generation" | "defer"; schema_version: "1.0"; }; export type PromptQualityFinding = { finding_kind: | "missing_goal" | "missing_source_scope" | "missing_output_contract" | "missing_tool_policy" | "missing_evidence_policy" | "missing_uncertainty_policy" | "ambiguous_success_criteria" | "downstream_incompatibility" | "overconstrained" | "underconstrained" | "unsafe_side_effect_instruction" | "judge_unscorable"; severity: "low" | "medium" | "high"; explanation: string; suggested_fix: string; schema_version: "1.0"; }; ``` ### §17A.10 Prompt design guide cards in TKP TKP SHOULD include `PromptDesignGuideCard` records for prompt surfaces. ```ts export type PromptDesignGuideCard = { card_id: string; prompt_surface: | "agent_task_instruction" | "judge_rubric" | "red_team_prompt" | "claim_extractor_instruction" | "experiment_variant_instruction" | "task_agent_design_question" | "output_contract" | "tool_use_instruction" | "document_review_instruction"; purpose: string; when_to_use: string[]; when_not_to_use: string[]; prompt_quality_principles: string[]; common_failures: string[]; repair_patterns: string[]; example_before?: string; example_after?: string; related_modules: string[]; related_eval_methods: string[]; source_refs: SpecSourceRef[]; authority_level: "hard_rule" | "strong_default" | "soft_heuristic" | "example"; schema_version: "1.0"; }; ``` Recommended card families: ```text Agent Task instruction for legal/professional analysis Judge rubric prompt Red-team prompt Output contract Tool-use/source-grounding prompt Document-review prompt Second-pass deepening prompt Experiment variant prompt Prompt-evaluation task prompt ``` --- ## §17B — Prompt candidate generation and Experiment integration ### §17B.1 
No Promptolution-specific module There shall be no Promptolution-specific module in the initial design. Promptolution is a candidate-generation method, not a graph module and not a user-facing surface. ### §17B.2 Candidate-generation methods ```ts export type PromptCandidateGenerationMethod = | "task_agent_structured_rewrite" | "doc17_prompt_advisor_rewrite" | "promptolution" | "dspy_gepa" | "manual_variant" | "template_based_variant" | "prior_winner"; export type PromptCandidateGenerationMethodDescriptor = { method: PromptCandidateGenerationMethod; requires_llm_call: boolean; operative_context: "quick_help" | "tracked_evaluation_only" | "reserved_r5" | "manual_only"; availability_state: "operative" | "reserved" | "unavailable" | "degraded"; notes?: string; schema_version: "1.0"; }; export type PromptCandidateMethodMetadata = { method: PromptCandidateGenerationMethod; execution_class: | "deterministic" | "llm_assisted" | "optimizer" | "manual"; requires_llm_call: boolean; requires_eval_dataset: boolean; requires_judge_or_scorer: boolean; permitted_runtime: | "quick_help" | "tracked_evaluation_only" | "r5_optimization_only"; expected_cost_class: "none" | "low" | "medium" | "high"; expected_latency_class: "instant" | "short" | "long" | "background"; operative_status: | "operative" | "reserved_r5" | "unavailable" | "degraded"; schema_version: "1.0"; }; ``` `PromptCandidateGenerationMethodDescriptor` is the compact UI/availability descriptor. `PromptCandidateMethodMetadata` is the fuller operational descriptor used by Experiment, Task Agent, and evaluation reporting to distinguish deterministic candidate generation from LLM-assisted rewriting and optimizer-backed generation. Rules: ```text Promptolution is treated as a candidate-generation method. If the implementation is deterministic and does not require a model call, mark requires_llm_call=false. It remains tracked-evaluation-only unless the architect later approves live use. 
DSPy/GEPA is reserved for R5 optimization substrate.
Manual variants require no generation call and can feed Experiment directly.
```

Default method metadata guidance:

```text
manual_variant:
  execution_class = manual; requires_llm_call = false.
task_agent_structured_rewrite:
  execution_class = llm_assisted; requires_llm_call = true.
doc17_prompt_advisor_rewrite:
  execution_class = llm_assisted for rewrite, deterministic for gap analysis.
promptolution:
  execution_class = deterministic or llm_assisted depending on implementation;
  if deterministic, it may be a cheap candidate generator; still no separate module or UI.
dspy_gepa:
  execution_class = optimizer; permitted_runtime = r5_optimization_only until Addenda A R5 lands.
```

### §17B.3 PromptCandidateBundle

```ts
export type PromptCandidateBundle = {
  bundle_id: string;
  source_prompt_ref: StorageRef;
  target_prompt_surface:
    | "chat_prompt"
    | "prompt_recipe"
    | "module_instruction"
    | "judge_rubric"
    | "red_team_prompt"
    | "output_contract";
  candidates: PromptCandidate[];
  generation_methods_used: PromptCandidateGenerationMethod[];
  generation_context_ref?: StorageRef;
  created_by:
    | "user"
    | "task_agent"
    | "experiment_module"
    | "prompt_advisor_service";
  schema_version: "1.0";
};

export type PromptCandidate = {
  candidate_id: string;
  label: string;
  prompt_text_ref: StorageRef;
  prompt_hash: string;
  generation_method: PromptCandidateGenerationMethod;
  rationale?: string;
  preserved_constraints: string[];
  changed_elements_summary: string[];
  schema_version: "1.0";
};
```

### §17B.4 Experiment module integration

The `system.experiment` module should be the initial home for prompt candidate testing. Candidate generation may be baked into Experiment via candidate sources rather than a separate graph module.
```ts
export type ExperimentVariantInstructionSource =
  | "manual"
  | "same_as_baseline"
  | "task_agent_candidate"
  | "doc17_prompt_advisor_candidate"
  | "promptolution_candidate"
  | "dspy_candidate"
  | "prompt_recipe"
  | "prior_winner";
```

```ts
export type PromptCandidateGenerationConfig = {
  enabled: boolean;
  methods: PromptCandidateGenerationMethod[];
  candidate_count: number;
  target_prompt_surface:
    | "chat_prompt"
    | "prompt_recipe"
    | "module_instruction"
    | "judge_rubric"
    | "red_team_prompt"
    | "output_contract";
  preserve_constraints: boolean;
  preserve_output_contract: boolean;
  preserve_tool_policy: boolean;
  generation_status_by_method: Record<
    PromptCandidateGenerationMethod,
    "operative" | "reserved_r5" | "unavailable" | "degraded"
  >;
  generated_candidate_bundle_ref?: StorageRef;
  schema_version: "1.0";
};
```

Experiment may consume:

```ts
export type ExperimentPromptCandidateIntegration = {
  prompt_candidate_bundle_ref?: StorageRef;
  prompt_candidate_generation_config?: PromptCandidateGenerationConfig;
  // Assumed shape: keyed by variant id (type arguments are not given in the source).
  variants_instruction_source?: Record<string, ExperimentVariantInstructionSource>;
  schema_version: "1.0";
};

export type ExperimentPromptCandidateMode =
  | "consume_existing_candidate_bundle"
  | "generate_candidates_then_test"
  | "manual_variants_only";

export type ExperimentPromptCandidateSourcePlan = {
  mode: ExperimentPromptCandidateMode;
  source_prompt_ref?: StorageRef;
  existing_candidate_bundle_ref?: StorageRef;
  generation_config?: PromptCandidateGenerationConfig;
  method_metadata: PromptCandidateMethodMetadata[];
  schema_version: "1.0";
};
```

This supports three workflows: Task Agent generates candidates and Experiment tests them; Experiment generates candidates internally and tests them; or the user supplies manual variants and Experiment only compares them.

### §17B.5 DSPy / GEPA status

DSPy/GEPA may appear as a reserved candidate-generation or optimization method in prompt-evaluation schemas, but it is non-operative until the Addenda A R5 optimization substrate lands.
R4.1 must not implement unsafe DSPy-lite behavior. ### §17B.6 Prompt Candidate Generator module reserved A future generic module MAY be reserved: ```text step.prompt_candidate_generator ``` Status: ```text R5/R6 optional; not operative in initial Addenda B. ``` Purpose: ```text Generate candidate prompt variants when manual graph composition proves valuable. ``` Initial preference: ```text Bake candidate generation into Experiment first; reserve separate module only if manual graph construction requires it. ``` --- ## §17C — Prompt Evaluation Task ### §17C.1 Deep prompt testing as a tracked task Deep prompt testing should be a saved/tracked DOC23 task or task-native evaluation job. It should use the existing/forthcoming Experiment, Judge, Claim Extractor, DSPy/GEPA, and Task Assessment infrastructure. ### §17C.2 PromptEvaluationTask schema ```ts export type PromptEvaluationTask = { evaluation_task_id: string; source_prompt_ref: StorageRef; prompt_kind: | "prompt_recipe" | "module_prompt" | "judge_rubric" | "red_team_prompt" | "output_contract"; evaluation_goal: string; candidate_generation_methods: PromptCandidateGenerationMethod[]; candidate_prompt_refs: StorageRef[]; eval_dataset_ref?: StorageRef; judge_rubric_refs: StorageRef[]; experiment_run_refs: StorageRef[]; dspy_job_refs: StorageRef[]; result_summary_ref?: StorageRef; recommendation: | "keep_original" | "adopt_candidate" | "save_as_recipe" | "save_as_module_preset" | "propose_template_update" | "needs_more_examples" | "manual_review"; promotion_proposal_ref?: StorageRef; schema_version: "1.0"; }; ``` ### §17C.3 User-facing flow ```text User: Refine my red-team prompt for effectiveness. Elnor: I can do a quick rewrite, or run a tracked prompt test with variants and judge scoring. [Quick rewrite] [Run tracked prompt test] ``` If tracked: ```text Task Agent creates Prompt Evaluation Task. Experiment runs baseline + candidates. Judge scores outputs. 
Task Assessment summarizes winner, failure modes, and proposed save/update action. ``` ### §17C.4 No silent mutation Prompt updates MUST be proposals with prompt hashes, evidence, approval policy, and rollback where appropriate. No live task prompt, module preset prompt, prompt recipe, or template prompt may silently rewrite itself based on prompt-evaluation results. --- ## §17D — Prompt learning signals ### §17D.1 TaskPromptEditLearningSignal ```ts export type TaskPromptEditLearningSignal = { signal_id: string; design_session_id: string; task_id?: string; module_id?: string; edit_kind: | "added_output_contract" | "added_source_scope" | "added_deeper_analysis_instruction" | "added_tool_policy" | "added_red_team_instruction" | "tightened_rubric" | "changed_tone" | "changed_format" | "removed_overconstraint" | "removed_unwanted_question"; before_prompt_hash: string; after_prompt_hash: string; interpreted_lesson: string; evidence_ref: StorageRef; recommended_update_kind: | "prompt_design_guide_card_update" | "template_prompt_update_proposal" | "module_preset_prompt_update_proposal" | "task_agent_instruction_candidate" | "user_preference_overlay" | "none"; schema_version: "1.0"; }; ``` ### §17D.2 Prompt learning inputs Track: ```text user accepted prompt rewrite user rejected prompt rewrite user edited accepted rewrite user added output contract user added “look deeper” user added source-scope instruction judge found prompt weak experiment variant won prompt caused downstream parser failure prompt produced reusable artifact prompt was promoted to recipe/preset/template ``` These signals feed: ```text Task Agent prompt-design guide cards Prompt recipe recommendations Module preset update proposals Task template update proposals BDSM question/suggestion utility DOC8 friction/quality-degradation signals TaskSystemLearningReport ``` --- ## §17E — Unified prompt help UX Prompt help appears where prompts live: chat composer, prompt recipe editor, task module prompt editor, 
judge/rubric editor, red-team prompt editor, task template editor, module preset editor, Task Assessment prompt findings, and Run Inspector prompt evidence. Prompt help does not require the user to visit a separate Prompt Lab page. A future advanced history/admin surface may list prompt-evaluation jobs and promotion history, but that surface is not the ordinary workflow. ### §17E.1 Chat / composer quick help ```text User: Is this a good prompt? Elnor: It is missing a clear output format and success criteria. Here is a stronger version... ``` Backend: ```text DOC17 Prompt Advisor service or direct Elnor reasoning. No Task Agent unless user asks for testing, task context, saving, or deep improvement. ``` ### §17E.2 Task module prompt editor ```text Ask Task Agent - Improve prompt - Test prompt variants - Explain prompt role - Check downstream compatibility ``` Backend: ```text Task Agent + DOC17 Prompt Advisor service + TKP + graph context. ``` ### §17E.3 Prompt recipe surface ```text Prompt Recipe actions: Improve Rewrite Generate variants Test variants Use in task Convert to module preset ``` Backend: ```text DOC17 owns recipe. Task Agent orchestrates deep testing or task conversion. 
``` ### §17E.4 Task Assessment prompt findings Task Assessment SHOULD include prompt-quality findings when relevant: ```text Prompt Quality - weak module instructions - missing output contracts - unscorable judge rubrics - downstream parsing failures - wrong tool-use instructions - omitted source-grounding - shallow one-pass analysis where second-pass deepening is preferred ``` ```ts export type TaskAssessmentPromptSection = { prompt_artifacts_reviewed: string[]; prompt_quality_assessments: TaskPromptQualityAssessment[]; prompt_eval_job_refs: string[]; suggested_prompt_updates: PromptImprovementProposal[]; schema_version: "1.0"; }; ``` --- ## §18 — Library / DOC73 integration ### §18.1 Purpose Task outputs and downloaded/received documents may become library/corpus material when the task purpose or user instruction requires deep extraction into bounded knowledge. ### §18.2 Task output to library source class DOC23 must support a task-output-to-library source class for DOC73 integration. ```ts TaskOutputLibraryIngestionIntent { intent_id: string task_id: string run_id: string artifact_id: string source_artifact_ref: StorageRef target_library_ref: string ingestion_reason: | "user_requested" | "task_template_policy" | "blueprint_declared_output" | "assessment_recommended" | "recurring_gathering_task" doc25_ingestion_required: boolean doc25_ingestion_ref?: string doc73_extraction_profile_ref?: string policy_snapshot_ref: StorageRef created_at: string schema_version: "1.0" } ``` ### §18.3 Library user-facing language Technical schemas may use corpus/library per DOC73 rules. User-facing Task Agent language should use “library” unless the user uses “corpus” or the UI/debug context requires technical terminology. ### §18.4 DOC73 endpoint rule Adding to a library is a destination. DOC25 processing is a route before destination, not the destination itself. 
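
The §18.4 rule can be sketched as a small routing helper. This is an illustrative sketch only: `IngestionIntentLike` repeats just the routing-relevant fields of `TaskOutputLibraryIngestionIntent`, while `IngestionStep` and `nextIngestionStep` are hypothetical names that do not come from DOC23, DOC25, or DOC73.

```typescript
// Hypothetical subset of TaskOutputLibraryIngestionIntent (§18.2); only the
// fields needed to decide the next hop are repeated here.
type IngestionIntentLike = {
  intent_id: string;
  target_library_ref: string;
  doc25_ingestion_required: boolean;
  doc25_ingestion_ref?: string;
};

// Hypothetical step names: DOC25 is a route, the DOC73 library is the destination.
type IngestionStep =
  | { kind: "route_doc25_processing"; intent_id: string }
  | {
      kind: "destination_doc73_library_add";
      intent_id: string;
      target_library_ref: string;
    };

function nextIngestionStep(intent: IngestionIntentLike): IngestionStep {
  // DOC25 processing is required and has not produced a reference yet:
  // it must run before the library add, but it never replaces it.
  if (intent.doc25_ingestion_required && !intent.doc25_ingestion_ref) {
    return { kind: "route_doc25_processing", intent_id: intent.intent_id };
  }
  // Either DOC25 is not required or it already completed: go to the destination.
  return {
    kind: "destination_doc73_library_add",
    intent_id: intent.intent_id,
    target_library_ref: intent.target_library_ref,
  };
}
```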

---

## §19 — Privacy, incognito, and retention

### §19.1 Purpose

Tasks may handle privileged, personal, confidential, or sensitive material. Addenda B must preserve global EC/PropA/DOC1 privacy controls while adding task-specific memory and artifact behavior.

### §19.2 Effective control order

Task memory/artifact capture is subordinate to:

1. EC global memory/collection/application controls.
2. EC incognito and processing controls.
3. PropA source/sensitivity/visibility policy.
4. DOC1 write gate and memory governance.
5. TaskRunMemoryPolicy.
6. TaskArtifactIntakePolicy.

### §19.3 Incognito UI

```text
Private run options
───────────────────
Memory mode:
  [Standard] [Do not add to Elnor memory] [Ephemeral test run]

Explanation:
  Standard: preserves run audit and selected memories/artifacts.
  Do not add to Elnor memory: no optional DOC72/DOC8 learning, but run audit remains.
  Ephemeral test run: minimal records only; intended for testing.
```

### §19.4 Retention classes

See §10.9. Retention decisions must be visible in Run Inspector and artifact details.

---

## §19A — Failure modes and degraded behavior

### §19A.1 Task Agent degraded matrix

| Failure | Behavior |
|---|---|
| Task Agent unavailable | Elnor remains direct-first; task canvas/manual editor still works. |
| TKP stale | Task Agent may inspect/retrieve; task design recommendations are marked degraded or blocked depending on drift severity. |
| DOC11 named-agent unavailable | Use deterministic tools/read models; no background Task Agent design session. |
| OpenClaw `sessions_spawn` unavailable | No async/background Task Agent; synchronous Gateway call only if available. |
| Fallback model used | Receipt records primary failure and fallback model; high-risk graph/prompt changes require validation. |
| Task Agent proposal invalid | Validator/preflight blocks save; Task Agent may revise up to configured max iterations; then reports unresolved issues. |
| Capability registry unavailable | No capability-binding recommendations beyond saved template metadata. |
| DOC72 memory unavailable | Design proceeds from templates/TKP only; no prior-run learning. |
| DOC8/BDSM unavailable | No learned suppression/preference; static policy only. |
| DOC17 Prompt Advisor unavailable | Task Agent may use TKP prompt guide cards and direct reasoning; no Prompt Advisor service receipt. |
| Addenda A Experiment/Judge unavailable | Deep prompt testing unavailable; quick prompt help remains available. |
| User rejects task suggestion repeatedly | BDSM/DOC8 suppress similar suggestions by context class. |
| Hidden graph would be created | Forbidden; must create visible receipt and graph link. |

### §19A.2 No-phantom controls

Every visible Task Agent, task suggestion, prompt-improvement, prompt-testing, task-assessment, or task-inspection control MUST map to a real route/read model/command or an explicit disabled/degraded/no-op row with reason code.

---

## §20 — UI/UX surfaces

### §20.1 Task Agent surfaces

Required user-facing Task Agent surfaces:

1. **Create from Description** — natural-language task creation.
2. **Task Proposal Panel** — blueprint, templates, graph, capabilities, outputs, memory, risks.
3. **Template Match Panel** — reuse/customize/fork/new decision.
4. **Blueprint Editor** — user-readable intent/process editor.
5. **Graph Diff Review** — proposed graph/config changes.
6. **Task Agent Run Lens** — conversational and visual task progress/status.
7. **Artifact Finder** — search outputs across runs/tasks/templates/matters.
8. **Task Assessment Report** — post-run and on-demand report.
9. **Knowledge Pack Inspector** — active pack/version/drift/eval state.
10. **Task Design Learning Review Queue** — proposed template/preset/pattern/preference updates.
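
Every control on the surfaces listed above falls under the no-phantom rule in §19A.2. A minimal sketch of a check for that rule follows, assuming a hypothetical control registry: `ControlBinding`, `VisibleControl`, and `findPhantomControls` are illustrative names, and the binding kinds simply mirror the wording "route/read model/command" and "disabled/degraded/no-op row with reason code".

```typescript
// Hypothetical binding shapes derived from the wording of §19A.2.
type ControlBinding =
  | { kind: "route" | "read_model" | "command"; target: string }
  | { kind: "disabled" | "degraded" | "no_op"; reason_code: string };

// Hypothetical registry row for one visible control.
type VisibleControl = {
  control_id: string;
  surface: string;
  binding?: ControlBinding;
};

// Returns the ids of controls that would violate the no-phantom rule:
// no binding at all, a live binding without a target, or a
// disabled/degraded/no-op row without a reason code.
function findPhantomControls(controls: VisibleControl[]): string[] {
  return controls
    .filter((control) => {
      const binding = control.binding;
      if (!binding) return true;
      if (
        binding.kind === "route" ||
        binding.kind === "read_model" ||
        binding.kind === "command"
      ) {
        return binding.target.trim() === "";
      }
      return binding.reason_code.trim() === "";
    })
    .map((control) => control.control_id);
}
```

A check like this could run in preflight or in a UI test suite; where it actually belongs is not specified by this document.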
### §20.2 Creation UI states ```text Create Task ──────────────────────── [Describe what you want Elnor to do] After submit: - Intent summary - Existing templates found - Recommended setup - Optional questions - Blueprint preview - Graph preview - Capability/tool plan - Output/memory plan - Preflight result ``` ### §20.3 Run progress conversational examples ```text User: Where are we in the Marex draft task? Task Agent: The task is 5 of 7 steps complete. It finished source classification, fact extraction, damages memo, and initial complaint draft. It is currently waiting on the red-team module. The next step is revision. The latest output is the initial complaint draft, and the damages memo is available here: [artifact]. ``` ```text User: Send me the results of the draft task ran for Marex. Task Agent: I found the completed Marex Draft Complaint run from May 2. Final outputs: complaint draft, damages memo, red-team report, revision summary. The final complaint is saved as [work product link]. I can also show the intermediate damages memo. ``` ### §20.4 Artifact intake UI terminology Use user-friendly labels as in §11.2. Avoid exposing “candidate for memory review” as the primary label; use “Ask whether to make findable” or “Suggest saving for later.” ### §20.5 Task Assessment UI Task Assessment UI should show: - outcome summary; - what went well; - what failed or degraded; - substantive findings; - graph/config findings; - capability/tool findings; - artifact/memory findings; - recommended changes; - graph/blueprint diffs; - apply selected changes controls; - save as template update proposal; - learning signals generated. --- ## §20A — UI additions ### §20A.1 Chat chips Medium/high task opportunity may surface quiet chips: ```text [Use saved task] [Run deeper task] [Ask Task Agent to set this up] [View matching template] ``` These chips must be suppressible by BDSM/DOC8 learning. 
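
The chip rule above can be sketched as a pure selection function. This is a sketch under stated assumptions: the `TaskOpportunityLevel` scale, the `ChatChip` identifiers, and `visibleChatChips` are hypothetical, and the `suppressed` list stands in for whatever BDSM/DOC8 learning actually returns.

```typescript
// Hypothetical opportunity scale; the text only names "medium/high".
type TaskOpportunityLevel = "none" | "low" | "medium" | "high";

// Hypothetical identifiers for the four chips shown in §20A.1.
type ChatChip =
  | "use_saved_task"
  | "run_deeper_task"
  | "ask_task_agent_setup"
  | "view_matching_template";

const CHIP_LABELS: Record<ChatChip, string> = {
  use_saved_task: "Use saved task",
  run_deeper_task: "Run deeper task",
  ask_task_agent_setup: "Ask Task Agent to set this up",
  view_matching_template: "View matching template",
};

const CHIP_ORDER: ChatChip[] = [
  "use_saved_task",
  "run_deeper_task",
  "ask_task_agent_setup",
  "view_matching_template",
];

// Chips appear only for medium/high opportunity, and any chip that learning
// has suppressed is dropped before rendering.
function visibleChatChips(
  level: TaskOpportunityLevel,
  suppressed: ChatChip[],
): string[] {
  if (level !== "medium" && level !== "high") return [];
  return CHIP_ORDER
    .filter((chip) => suppressed.indexOf(chip) === -1)
    .map((chip) => CHIP_LABELS[chip]);
}
```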
### §20A.2 Task Agent panels Task surfaces SHOULD include: ```text Task editor side panel: Ask Task Agent Run Inspector side panel: Ask Task Agent Task list: Review my tasks Template/preset library: Ask Task Agent to compare/improve Task Assessment report: Apply selected proposals Prompt editor: Improve/Test prompt ``` ### §20A.3 Example — direct answer with optional task ```text Here is a quick review. For a deeper tracked review, I can also use your Complaint Evaluation task to produce an issue matrix, red-team report, and revision plan. [Run deeper saved task] ``` ### §20A.4 Example — high-fit task recommendation ```text This looks like a tracked workflow rather than a quick chat answer. I found your Complaint Evaluation template. I recommend using it because it will preserve the source review, red-team pass, outputs, and task history. [Use template] [Customize first] [Just answer here] ``` ### §20A.5 Example — visible Elnor-created task ```text I set this up as a saved task because it needs to run independently and preserve outputs. You can inspect or edit the graph before it runs. [View task] [Edit steps] [Run now] ``` ### §20A.6 Example — prompt testing ```text I can do a quick rewrite, or run a tracked prompt test with variants and judge scoring. [Quick rewrite] [Run tracked prompt test] ``` --- ## §20B — DOC20-integrated modular task surface ### §20B.1 DOC20 shell integration rule The task modular canvas is a DOC20 content surface. It SHALL render inside the DOC20 unified workspace main view area as a `task` tab, not as a separate application shell. Required behavior: 1. Opening a task from the Browser, Tasks page, Q Home status card, Task Agent link, or run-history link opens or focuses a DOC20 task tab. 2. The right main-chat column defaults closed for task tabs unless the user previously pinned it open for that tab. 3. The left DOC20 nav rail remains available so the user can switch to Browser, main chat, settings, split view, or other DOC20 surfaces. 4. 
When the user opens the task page from the Pages navigation, Browser defaults to the Browser sub-tab with the Task filter selected.
5. Task load, save, duplicate, archive, and view actions are unified between the Browser and the Tasks page; Q MUST NOT create parallel task-picking surfaces with inconsistent state.
6. Browser items may be dragged into the modular graph, into a module config/drop zone, or into a Task Segment insertion target when permitted by type and policy.
7. Opening a module output, artifact, input document, delivery receipt, or related work product opens that object in the appropriate DOC20 tab/viewer and preserves the ability to switch back to the task tab using the DOC20 tab bar.

DOC20’s surface-independence principle remains controlling: the task tab is a viewport over EC-owned durable state, not a writer or separate task data owner.

### §20B.2 Blank task-page state

When no task is loaded, the task page SHOULD show:

```text
Task Workspace
────────────────────────────────
[New Task] [Use Saved Task] [Ask Task Agent]

Active / Running
  Draft Complaint — Marex       running · 7/12 steps · $7.84
  Filing Monitor — Paramount    scheduled · next run 5:00 PM

Recently used
  Complaint Evaluation          18 runs · avg 42m · avg $14.80
  Weekly Docket Monitor         9 runs · avg 8m · avg $1.30

Blank graph area:
  Drag a saved task, module preset, document, or Task Segment here,
  or click New Task to start from a blank graph.
```

The blank state SHOULD NOT hide active/scheduled work. It SHOULD provide a larger running-task overview than the compact Browser rows.

### §20B.3 Graph Overview Preservation Rule

The broad modular graph/module-card overview remains the primary task-design/status surface. The visual direction of the existing V11 ALT A module blocks is preserved: compact module cards, ports/cables, light status badges, and minimal visual noise.
The graph overview SHOULD show only: - module name/type/category; - ports/cables; - compact execution status; - compact validation/error badge; - compact output-available badge; - compact Judge/Experiment score/winner badge where relevant; - current/next module highlighting during a selected run. The graph overview SHOULD NOT show full telemetry, raw prompts, full DOC24 packets, delivery receipts, all artifacts, full cost breakdowns, sub-agent traces, or raw audit data. Those belong in module detail panels, the Run Inspector, Artifacts & Deliveries, Context & Audit, and Task Agent side panel. ### §20B.4 Module detail/config panel run-data block Every module detail/config panel MUST include a compact `Run Data` block for the selected run, without requiring the user to leave the modular page. ```ts ModuleRunQuickAccess { task_id: string run_id: string module_id: string activation_seq?: number status: | "not_run" | "pending" | "running" | "waiting" | "success" | "error" | "skipped" | "cancelled" last_started_at?: string last_completed_at?: string duration_ms?: number cost_usd?: number primary_outputs: Array<{ artifact_id: string display_name: string artifact_kind: string open_url: string }> inspector_url: string live_inspector_available: boolean schema_version: "1.0" } ``` Illustrative UI: ```text Run Data ──────── Status: Completed Finished: May 4, 2026 11:04 AM Duration: 6m 14s Cost: $2.18 Outputs Damages Memo.md [Open] Analysis.json [Open] [Open run inspector →] ``` `Open output` opens the artifact in the appropriate DOC20 viewer/tab. `Open run inspector` links to the full run inspector focused on the module activation, not to a separate module-inspector product. ### §20B.5 Graph ⇄ Inspector navigation Graph view and Run Inspector are two synchronized views of the same task/run state. 
```ts TaskGraphInspectorNavigationState { task_id: string run_id?: string view: "graph" | "inspector" selected_module_id?: string selected_activation_seq?: number selected_artifact_id?: string graph_viewport_state?: { zoom: number pan_x: number pan_y: number } inspector_section?: "run_flow_steps" | "artifacts_deliveries" | "context_audit" opened_from?: "graph" | "tasks_page" | "home" | "task_agent" | "artifact_link" | "shared_view" schema_version: "1.0" } ``` Rules: 1. Opening the inspector from a selected module focuses the corresponding activation row. 2. Returning from the inspector to graph focuses/highlights the same module and restores graph viewport state when available. 3. If the inspector is already open as a side/top panel within the graph page, the graph-back arrow is redundant and MAY be hidden; Close returns to the graph. 4. If the inspector is opened from a standalone Tasks/History/Home/Task Agent link without the graph visible, a graph-back/open-graph affordance opens the modular task tab and focuses the selected module/run. 5. Artifact links opened from the inspector retain a backlink to task/run/module context. --- ## §20C — Run Inspector, Run Flow, Artifacts, Deliveries, and Context Audit ### §20C.1 Consolidated Run Inspector sections Run Inspector is a focused detail panel/page, not an 11-tab observability product. It SHALL present: ```text Header / Summary 1. Run Flow & Steps 2. Artifacts & Deliveries 3. Context & Audit ``` Judge/Experiment results appear first on the relevant graph modules and module detail panels. Run Inspector may summarize them inline when present but SHOULD NOT create a permanent top-level “Quality” tab for ordinary tasks. 
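
The three sections above correspond to the `inspector_section` values in `TaskGraphInspectorNavigationState`. As a minimal sketch of §20B.5 rules 1 and 2, the functions below open the inspector on a module activation and return to the graph without losing selection or viewport. `NavState` is a trimmed, hypothetical copy of that read model, and `openInspectorForModule` and `returnToGraph` are illustrative names; defaulting to the `run_flow_steps` section is an assumption.

```typescript
type InspectorSection = "run_flow_steps" | "artifacts_deliveries" | "context_audit";

// Hypothetical subset of TaskGraphInspectorNavigationState (§20B.5).
type NavState = {
  task_id: string;
  run_id?: string;
  view: "graph" | "inspector";
  selected_module_id?: string;
  selected_activation_seq?: number;
  graph_viewport_state?: { zoom: number; pan_x: number; pan_y: number };
  inspector_section?: InspectorSection;
};

// Rule 1: opening the inspector from a selected module focuses the
// corresponding activation row. The graph viewport is carried along untouched.
function openInspectorForModule(
  state: NavState,
  moduleId: string,
  activationSeq: number,
): NavState {
  return {
    ...state,
    view: "inspector",
    selected_module_id: moduleId,
    selected_activation_seq: activationSeq,
    inspector_section: "run_flow_steps", // assumed default section
  };
}

// Rule 2: returning to the graph keeps the same module selected and restores
// the saved viewport, because neither field was dropped while inspecting.
function returnToGraph(state: NavState): NavState {
  return { ...state, view: "graph" };
}
```

Both functions return new objects, so a caller can keep the previous state for back-navigation.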
### §20C.2 Header / Summary Header shows the key run facts: ```text Marex — Draft Complaint Run Completed · May 4, 2026 · 49m 12s · $18.37 12 modules · 11 completed · 1 skipped · 0 failed Final output: Revised Complaint Draft Delivered: emailed to will@will.com · saved to /Marex/Drafts/ [Open final output] [Ask Task Agent] [Compare] [Export audit] ``` Header MUST include dates, run duration, cost when available, final output links, and delivery summary when any output was delivered externally or saved to a durable destination. ### §20C.3 Run Flow & Steps — not a simple timeline The primary inspector section is `Run Flow & Steps`, not merely “Timeline.” DOC23 graphs may branch, loop, fan out, emit multiple outputs, and reactivate modules multiple times. The UI must therefore be activation-aware and flow-aware rather than assuming a straight linear timeline. ```ts TaskRunStepRow { task_id: string run_id: string module_id: string activation_seq: number display_order: number module_label: string module_type: string branch_label?: string loop_iteration?: number output_port_id?: string status: | "pending" | "running" | "waiting" | "success" | "error" | "cancelled" | "skipped" started_at?: string completed_at?: string duration_ms?: number cost_usd?: number agent_summary?: { agent_id?: string model_ref?: string think_level?: string fallback_used?: boolean } input_summary: string[] output_summary: string[] artifact_ids: string[] delivery_ids: string[] has_prompt_context_snapshot: boolean has_doc24_context_packet: boolean has_subagent_trace: boolean has_error: boolean has_evaluation: boolean expandable_detail_ref: StorageRef schema_version: "1.0" } ``` Run Flow & Steps MUST be searchable and filterable. It SHOULD support sorting for power users, but default order should be execution/display order that preserves branch/loop structure. Rows must show date/time, duration, cost, status, and activation sequence when relevant. 
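
The search, filter, and default-order requirements above can be sketched as follows. This is an illustration, not the specified read-model query: `StepRowLike` repeats only a few `TaskRunStepRow` fields, `StepRowCriteria` and `filterStepRows` are hypothetical names, and using `activation_seq` as the tie-breaker within one `display_order` is an assumption about how repeated activations should be grouped.

```typescript
type StepStatus =
  | "pending" | "running" | "waiting" | "success"
  | "error" | "cancelled" | "skipped";

// Hypothetical subset of TaskRunStepRow (§20C.3).
type StepRowLike = {
  module_id: string;
  module_label: string;
  activation_seq: number;
  display_order: number;
  status: StepStatus;
};

// Hypothetical filter input: free-text search plus optional status filter.
type StepRowCriteria = { text?: string; statuses?: StepStatus[] };

function filterStepRows(
  rows: StepRowLike[],
  criteria: StepRowCriteria,
): StepRowLike[] {
  const needle = criteria.text?.trim().toLowerCase() ?? "";
  return rows
    // Status filter: no list means "all statuses".
    .filter((row) => !criteria.statuses || criteria.statuses.indexOf(row.status) !== -1)
    // Text search over the module label, case-insensitive.
    .filter((row) => needle === "" || row.module_label.toLowerCase().indexOf(needle) !== -1)
    // Default order: display order first, then activation sequence, so
    // re-activations of a module stay together in execution order.
    .sort((a, b) =>
      a.display_order - b.display_order || a.activation_seq - b.activation_seq,
    );
}
```

Because `filter` returns a new array, the sort does not reorder the caller's rows.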
### §20C.4 Current/next written status

A single read model powers written run status across Q Home, Tasks page, graph header, notifications, and Task Agent answers.

```ts
TaskRunNarrativeStatus {
  task_id: string
  run_id: string
  one_line_status: string
  current_work_summary: string
  next_work_summary: string
  recent_completion_summary: string
  current_module_ids: string[]
  next_module_ids: string[]
  waiting_reasons: string[]
  generated_from_state_version: string
  generated_at: string
  schema_version: "1.0"
}
```

Example:

```text
The Marex Draft Complaint task completed source intake and damages analysis.
Red-team review is running now.
Final revision and judge review will run after the red-team report finishes.
```

### §20C.5 Artifacts & Deliveries

Artifacts get a dedicated section. It MUST show outputs, intermediate artifacts, consumed documents, receipts, and delivery destinations.

```ts
TaskDeliveryRecord {
  delivery_id: string
  task_id: string
  run_id: string
  module_id: string
  activation_seq: number
  artifact_id: string
  artifact_ref: StorageRef
  delivery_kind:
    | "email"
    | "imessage"
    | "chat"
    | "file_save"
    | "webhook"
    | "task_output"
    | "calendar"
    | "notification"
    | "work_product_save"
    | "doc73_library_add"
    | "doc72_memory_reference"
  status:
    | "pending"
    | "delivered"
    | "failed"
    | "blocked_by_policy"
    | "cancelled"
    | "draft_only"
  destination_summary: string
  delivered_to?: string[]
  delivered_at?: string
  external_message_id?: string
  saved_path?: string
  open_url?: string
  receipt_ref?: StorageRef
  policy_approval_ref?: StorageRef
  schema_version: "1.0"
}
```

```ts
TaskDeliverablesReadModel {
  task_id: string
  run_id: string
  final_outputs: TaskArtifactIndexEntry[]
  intermediate_outputs: TaskArtifactIndexEntry[]
  input_documents: TaskArtifactIndexEntry[]
  receipts: TaskArtifactIndexEntry[]
  deliveries: TaskDeliveryRecord[]
  undelivered_outputs: TaskArtifactIndexEntry[]
  failed_deliveries: TaskDeliveryRecord[]
  schema_version: "1.0"
}
```

Illustrative UI:

```text
Final outputs
  Revised Complaint Draft.docx
  Produced by: Final Output module
  Saved to: /Marex/Drafts/Revised Complaint Draft.docx
  Delivered: emailed to will@will.com on May 4, 2026 at 11:31 AM
  [Open] [Preview] [Copy link] [Save as work product] [Add to library]

Intermediate outputs
  Damages Memo.md
  Produced by: Damages Memo module
  Used by: Complaint Drafter, Red-Team Reviewer
  Intake: kept with run; can make findable
  [Open] [Make findable] [Save as work product]

Inputs / consumed documents
  Complaint.pdf
  DOC25 processed
  Used by: Source Intake, Fact Extraction, Complaint Drafter, Judge
  [Open document] [View pages used]

Receipts
  Email delivery receipt
  To: will@will.com
  Sent: May 4, 2026 11:31 AM
  [Open receipt]
```

### §20C.6 Artifact/document context menu

Right-clicking any document/artifact/output/receipt link SHOULD expose a consistent context menu when allowed by environment and permissions:

```text
Open in Q
Show in Browser
Show in Finder
Save As…
Copy Link
Open in Native App
Add to Library
Save as Work Product
Make Findable
Use in New Task
Ask Task Agent about this
```

Native actions must degrade honestly in non-Electron/web contexts or when OS permissions are unavailable.

### §20C.7 Context & Audit inspector

Context & Audit is the power-user inspection section. It MUST expose what was injected into a module prompt and what was excluded.
```ts
TaskPromptContextInspectorReadModel {
  task_id: string
  run_id: string
  module_id: string
  activation_seq: number
  effective_prompt_snapshot_ref: StorageRef
  redacted_prompt_snapshot_ref?: StorageRef
  prompt_layers: Array<{
    layer_kind:
      | "system"
      | "task_blueprint"
      | "environment_config"
      | "module_instruction"
      | "instruction_in"
      | "doc24_context_packet"
      | "data_in"
      | "context_in"
      | "chain_history"
      | "attached_files"
    display_name: string
    storage_ref?: StorageRef
    token_count?: number
  }>
  doc24_context_packet_ref?: StorageRef
  injected_items: ContextInjectedItem[]
  excluded_items: ContextExcludedItem[]
  feedback_enabled: boolean
  schema_version: "1.0"
}
```

The UI SHOULD show injected memories, entity cards, procedures, patterns, tools, connectors, MCPs, APIs, documents, source pages, DOC73 libraries, and excluded candidates with reason codes. It MUST support opening the source memory/document/procedure/tool record where policy permits.

### §20C.8 Context feedback into DOC8/BDSM

The context inspector SHALL allow thumbs/comment feedback on injected and excluded items.

```ts
TaskContextFeedbackEvent {
  event_id: string
  task_id: string
  run_id: string
  module_id: string
  activation_seq: number
  target_kind:
    | "injected_memory"
    | "injected_entity"
    | "injected_procedure"
    | "injected_tool"
    | "injected_connector"
    | "injected_document"
    | "injected_library"
    | "excluded_candidate"
    | "prompt_layer"
    | "context_packet"
  target_ref: string
  feedback:
    | "helpful"
    | "not_helpful"
    | "wrong"
    | "missing"
    | "too_much"
    | "should_have_been_included"
    | "should_have_been_excluded"
  user_comment?: string
  created_at: string
  schema_version: "1.0"
}
```

These events are learning inputs for DOC8/BDSM. They do not directly mutate DOC24 routing or DOC72 memory.

### §20C.9 Export audit

`Export audit` creates a redaction-aware audit bundle, not a raw log dump.
```ts
TaskAuditExportBundle {
  export_id: string
  task_id: string
  run_id: string
  included_sections: Array<
    | "run_summary"
    | "graph_snapshot"
    | "blueprint_snapshot"
    | "module_activation_list"
    | "cost_duration_summary"
    | "models_and_agents"
    | "tool_connector_calls"
    | "subagent_sessions"
    | "prompt_snapshots"
    | "doc24_context_packets"
    | "input_output_artifact_manifest"
    | "delivery_receipts"
    | "policy_approvals"
    | "errors_retries_fallbacks"
    | "judge_experiment_results"
    | "doc72_memory_summary_links"
    | "raw_trace"
  >
  redaction_profile:
    | "summary_only"
    | "professional_internal"
    | "external_redacted"
    | "full_local_audit"
  output_format: "folder" | "zip" | "markdown" | "pdf_packet"
  export_ref?: StorageRef
  policy_decision_ref?: PolicyDecisionRef
  created_at: string
  schema_version: "1.0"
}
```

Prompt snapshots, context packets, memory excerpts, local paths, privileged content, and raw traces must honor PropA/EC policy decisions and chosen redaction profile.

---

## §20D — Task Agent side panel and scoped Task Agent conversations

### §20D.1 One Task Agent, contextual side panel

Task Agent is the single canonical agent name. It appears as a side-panel chat on task-related surfaces, including the modular graph, module detail/config panel, Run Inspector, artifact detail, Tasks page, and prompt editor.

The side panel should look and behave like a Q chat panel, with attachments, send controls, message feedback/stars when those exist in the main chat design, and surface-scoped context. It must not become a separate app or second agent identity.

### §20D.2 Contextual actions pass refs, not separate agents

Buttons or context-menu actions such as `Ask about module`, `Ask about artifact`, or `Ask about this run` simply open/focus the Task Agent side panel with the relevant task/run/module/artifact refs attached. They do not invoke a separate agent or separate prompt system.

Predetermined suggestion chips inside the Task Agent panel SHOULD be sparse and action-oriented.
They are appropriate when they launch a specific workflow such as:

```text
Deep review and suggestions
Run validation/test
Improve selected prompt
Generate Task Assessment
Start onboarding/intake for this task
```

Generic prefilled questions such as “Where are we?” are usually unnecessary because the user can type them.

```ts
TaskAgentPanelContext {
  surface:
    | "tasks_page"
    | "task_graph"
    | "run_inspector"
    | "module_detail"
    | "artifact_detail"
    | "context_inspector"
    | "prompt_editor"
  task_id?: string
  run_id?: string
  module_id?: string
  activation_seq?: number
  artifact_id?: string
  selected_entrypoint?: TaskAgentEntrypoint
  attached_context_refs: StorageRef[]
  schema_version: "1.0"
}
```

### §20D.3 Scoped Task Agent threads

Task Agent uses one visible side-panel UI but scoped conversation threads to avoid context bleed.

```ts
TaskAgentScopedThread {
  thread_id: string
  scope:
    | "general_task_agent"
    | "task_design"
    | "task_run_inspection"
    | "module_followup"
    | "artifact_followup"
    | "portfolio_review"
    | "prompt_improvement"
  task_id?: string
  run_id?: string
  module_id?: string
  activation_seq?: number
  artifact_id?: string
  openclaw_session_key?: string
  task_agent_entrypoint?: TaskAgentEntrypoint
  created_at: string
  updated_at: string
  schema_version: "1.0"
}
```

The UI may feel like one Task Agent, but transcript storage and context packaging must be scoped to task/run/module/artifact where appropriate.

---

## §20E — Module continuation, follow-up, rerun, and fork-from-module

### §20E.1 Three distinct module-after-run actions

The system distinguishes:

1. **Ask Task Agent about the module** — always available; Task Agent answers from run truth, prompt/context snapshots, artifacts, tools, errors, and outputs.
2. **Start follow-up from module context** — usually available; starts a new session seeded with the module’s prompt, inputs, outputs, DOC24 packet, tool/capability snapshot, and run summary.
3. **Continue original module agent session** — available only when DOC11/OpenClaw runtime truth reports that the original module/session is resumable and policy permits.

The UI MUST NOT show a phantom “continue original session” control when the module was a one-shot call, the session was archived, or continuation is blocked by policy.

```ts
TaskModuleSessionRef {
  task_id: string
  run_id: string
  module_id: string
  activation_seq: number
  agent_id?: string
  openclaw_session_key?: string
  openclaw_run_id?: string
  child_session_key?: string
  session_kind:
    | "one_shot_gateway_call"
    | "resumable_openclaw_session"
    | "spawned_subagent_session"
    | "forked_session"
    | "archived_session"
    | "unavailable"
  continuation_status:
    | "available"
    | "archived"
    | "expired"
    | "not_supported"
    | "blocked_by_policy"
    | "unknown"
  model_ref?: string
  think_level?: string
  agent_config_snapshot_ref?: StorageRef
  effective_prompt_snapshot_ref?: StorageRef
  doc24_context_packet_ref?: StorageRef
  input_artifact_refs: StorageRef[]
  output_artifact_refs: StorageRef[]
  tool_call_trace_refs: StorageRef[]
  subagent_trace_refs: StorageRef[]
  created_at: string
  completed_at?: string
  archived_at?: string
  schema_version: "1.0"
}
```

```ts
ModuleFollowupSession {
  followup_session_id: string
  source_task_id: string
  source_run_id: string
  source_module_id: string
  source_activation_seq: number
  mode:
    | "continue_original_session"
    | "new_session_from_module_context"
    | "ask_task_agent_about_module"
  context_seed_refs: StorageRef[]
  user_question: string
  output_ref?: StorageRef
  linked_artifact_refs: StorageRef[]
  durable_effect:
    | "chat_only"
    | "module_followup_artifact"
    | "proposed_task_patch"
    | "proposed_run_fork"
  schema_version: "1.0"
}
```

Continuing/following up on a completed module MUST NOT mutate the completed parent run. New answers, revised memos, or revised artifacts become follow-up artifacts linked to the original module activation.

### §20E.2 Mid-run interaction limits

Mid-run inspection is allowed.
Mid-run mutation is not presumed. The safe default for a running module:

```text
Observe live trace: allowed.
Ask Task Agent about status: allowed.
Pause after current module: allowed where runtime supports.
Cancel module/run: allowed through existing run controls.
Inject new instruction into current LLM call: not allowed unless module/runtime explicitly supports intervention.
Add note for downstream module or future fork: allowed.
```

```ts
ModuleSessionIntervention {
  intervention_id: string
  task_id: string
  run_id: string
  module_id: string
  activation_seq: number
  intervention_kind:
    | "user_note_for_downstream"
    | "cancel_current_call"
    | "pause_after_module"
    | "inject_instruction_if_supported"
    | "request_status_if_supported"
  supported_by_runtime: boolean
  applied: boolean
  receipt_ref: StorageRef
  schema_version: "1.0"
}
```

### §20E.3 Rerun/fork-from-module

Users must be able to work from a specific module/activation without rerunning a whole 10–20 module graph. Completed parent runs remain immutable. Reruns/forks create new child runs with lineage.
```ts
TaskRunForkRequest {
  request_id: string
  parent_task_id: string
  parent_run_id: string
  fork_from: {
    module_id: string
    activation_seq: number
    output_port_id?: string
    branch_scope?: "selected_branch" | "all_downstream" | "selected_modules"
  }
  rerun_scope:
    | "selected_module_only"
    | "selected_module_and_downstream"
    | "downstream_only"
    | "judge_only"
    | "experiment_only"
    | "branch_only"
    | "full_run_from_checkpoint"
  upstream_state_policy:
    | "freeze_parent_outputs"
    | "recompute_upstream"
    | "use_current_task_inputs"
  graph_policy:
    | "use_parent_graph_snapshot"
    | "use_current_task_definition"
    | "use_user_edited_fork_graph"
  context_policy:
    | "reuse_prior_doc24_packet"
    | "recompute_from_same_task_scope"
    | "recompute_from_current_memory_and_capabilities"
  source_document_policy:
    | "use_parent_document_versions"
    | "use_latest_document_versions"
    | "ask_if_changed"
  side_effect_replay_policy:
    | "block_all_external_side_effects"
    | "draft_only"
    | "allow_with_confirmation"
    | "allow_if_non_duplicate"
  user_patch_refs: StorageRef[]
  created_by: "user" | "task_agent" | "system_repair"
  schema_version: "1.0"
}
```

Side-effect replay defaults to `block_all_external_side_effects` or `draft_only` for external sends, email, iMessage, calendar, webhooks, and file writes outside a safe run-scoped output folder unless the user explicitly authorizes replay.
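
One way to read the side-effect replay default is as a small resolution step applied before a fork runs. The sketch below is illustrative only: `ReplayPolicyInput` and `effectiveReplayPolicy` are hypothetical names, and two choices are assumptions rather than spec text: an omitted policy resolves to `block_all_external_side_effects`, and a permissive policy requested without explicit user authorization is downgraded to `draft_only`.

```ts
// Policy values copied from TaskRunForkRequest.side_effect_replay_policy.
type SideEffectReplayPolicy =
  | "block_all_external_side_effects"
  | "draft_only"
  | "allow_with_confirmation"
  | "allow_if_non_duplicate";

// Hypothetical input shape for this sketch; not a DOC23 schema.
interface ReplayPolicyInput {
  requested?: SideEffectReplayPolicy;
  userExplicitlyAuthorizedReplay: boolean;
}

function effectiveReplayPolicy(input: ReplayPolicyInput): SideEffectReplayPolicy {
  // Assumption: with no requested policy, take the stricter of the two defaults.
  const requested = input.requested ?? "block_all_external_side_effects";

  const isPermissive =
    requested === "allow_with_confirmation" ||
    requested === "allow_if_non_duplicate";

  // Assumption: a permissive policy without explicit user authorization
  // is downgraded to draft_only rather than rejected outright.
  if (isPermissive && !input.userExplicitlyAuthorizedReplay) {
    return "draft_only";
  }
  return requested;
}
```

Keeping the resolution in one pure function would make the decision easy to record alongside the fork request, for example in a policy snapshot, but where that record lives is outside the scope of this sketch.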
### §20E.4 ModuleActivationReplayRecord

```ts
ModuleActivationReplayRecord {
  task_id: string
  run_id: string
  module_id: string
  activation_seq: number
  graph_snapshot_ref: StorageRef
  module_config_snapshot_ref: StorageRef
  input_bundle_ref: StorageRef
  output_bundle_ref?: StorageRef
  checkpoint_ref?: StorageRef
  effective_prompt_snapshot_ref?: StorageRef
  doc24_context_packet_ref?: StorageRef
  capability_snapshot_ref?: StorageRef
  policy_snapshot_ref?: StorageRef
  model_runtime_snapshot_ref?: StorageRef
  source_document_version_refs: StorageRef[]
  artifact_refs: StorageRef[]
  delivery_record_refs: StorageRef[]
  error_refs: StorageRef[]
  downstream_activation_refs: Array<{
    module_id: string
    activation_seq: number
  }>
  schema_version: "1.0"
}
```

`activation_seq` is required because loops, repeated activations, fan-out, and multi-output modules make `module_id` alone insufficient.

### §20E.5 Judge and Experiment rerun UX

Judge and Experiment modules are first-class beneficiaries of rerun/fork mechanics.

Judge module detail actions SHOULD include:

```text
Open judge audit
Change rubric and rescore
Rerun judge only
Rerun with more source context
Compare judge runs
Generate prompt improvement proposal
```

Experiment module detail actions SHOULD include:

```text
Open comparison
Rerun variants
Add variant
Generate candidates
Test winner downstream
Promote winning prompt
Fork downstream from variant B
```

`Experiment variant output → downstream branch fork` is a required fork pattern. Example: if Variant B produced the best damages memo, the user may fork downstream from Variant B into the Complaint Drafter and Final Revision modules without rerunning all upstream intake modules.

---

## §20F — Task Segment reusable module groups

### §20F.1 Definition

A **Task Segment** is a saved reusable group of modules, cables, configs, prompts, optional local artifact/context policies, and optional local validation/evaluation expectations. It is not necessarily a full runnable task.
Task Segment replaces the placeholder term “Sub-Task” for reusable graph fragments, because “sub-task” implies an active child task/run.

Examples:

```text
Source classification + fact extraction segment
Complaint drafting first-pass segment
Red-team + revise segment
Experiment + Judge scoring segment
Email attachment intake + DOC25 processing segment
```

### §20F.2 Schema

```ts
TaskSegmentRecord {
  segment_id: string
  display_name: string
  description?: string
  source_task_id?: string
  source_graph_snapshot_ref?: StorageRef
  selected_module_ids: string[]
  selected_cable_ids: string[]
  module_records_ref: StorageRef
  cable_records_ref: StorageRef
  local_blueprint_step_refs?: string[]
  required_input_ports: PortRef[]
  exposed_output_ports: PortRef[]
  module_preset_refs: string[]
  capability_requirements: CapabilityNeed[]
  artifact_policy_ref?: StorageRef
  evaluation_expectations_ref?: StorageRef
  semantic_card_ref?: StorageRef
  usage_count: number
  last_used_at?: string
  avg_duration_ms?: number
  avg_cost_usd?: number
  lifecycle_state:
    | "draft"
    | "active"
    | "deprecated"
    | "archived"
  schema_version: "1.0"
}
```

### §20F.3 UI behavior

Users may create a Task Segment by selecting multiple modules/cables and choosing:

```text
Save as Task Segment
```

from toolbar or context menu.

Browser and Tasks page should show Task Segments as reusable task assets distinct from full Saved Tasks and Module Presets.

Dragging a Task Segment into a graph inserts the saved module group and prompts for required inputs/outputs and any missing capabilities.

Judge/Experiment modules may evaluate a Task Segment when it has sufficient input/output contracts for testing.

Task Agent may recommend inserting a Task Segment during design, review, or repair.

---

## §20G — Tasks page command center and saved-task terminology

### §20G.1 Tasks page scope

The Tasks page is the command center for task assets and runs.
It should support:

```text
[Active] [Scheduled] [Saved] [History] [Segments] [Presets]
```

Search and filters SHOULD include:

```text
Status
Matter/entity
Trigger type
Date range
Output type
Error status
Saved/reusable type
Template/source task
Capability/tool used
Judge/assessment score
```

### §20G.2 Operational stats for saved/reusable tasks and presets

Saved/reusable task rows SHOULD show run count, average time, average cost, success rate, last run, and common outputs.

```ts
SavedTaskOperationalStats {
  task_id: string
  display_name: string
  run_count: number
  success_count: number
  failure_count: number
  success_rate: number
  avg_duration_ms: number
  p50_duration_ms: number
  p90_duration_ms: number
  avg_cost_usd: number
  p50_cost_usd: number
  p90_cost_usd: number
  last_run_at?: string
  last_success_at?: string
  last_failure_at?: string
  common_output_kinds: string[]
  common_failure_modes: string[]
  latest_assessment_ref?: StorageRef
  schema_version: "1.0"
}
```

```ts
ModulePresetOperationalStats {
  preset_id: string
  module_type: string
  usage_count: number
  tasks_used_in: number
  templates_used_in: number
  avg_module_duration_ms?: number
  avg_module_cost_usd?: number
  common_downstream_consumers: string[]
  known_strengths: string[]
  known_limitations: string[]
  schema_version: "1.0"
}
```

Cost truth should reference DOC13 when DOC13 is available. DOC23 stores task/run/module cost snapshots and references, not a competing authoritative cost ledger.

### §20G.3 Saved task vs task template

For user-facing language:

```text
Saved Task = any saved graph the user can run/edit/inspect.
Reusable Saved Task = a saved task intended to be reused like a template.
Scheduled Saved Task = a saved task with a schedule.
Triggered Saved Task = a saved task with an email/webhook/file/task trigger.
Module Preset = saved config for one module.
Task Segment = saved reusable group of modules/cables.
```

Internal `TaskTemplateRecord` storage may remain where useful, but the primary UI noun should be `Reusable Saved Task` unless the product deliberately exposes templates as an advanced subtype.

```ts
SavedTaskKind {
  task_id: string
  reusable: boolean
  scheduled: boolean
  trigger_enabled: boolean
  archived: boolean
  source_template_id?: string
  user_facing_label:
    | "Saved Task"
    | "Reusable Saved Task"
    | "Scheduled Saved Task"
    | "Triggered Saved Task"
  schema_version: "1.0"
}
```

---

## §20H — Standalone shared task surface forward-compatibility

### §20H.1 Scope and deferral

Full shared task collaboration is not fully specified in this Addenda B revision. It is expected to belong to a future DOC50+ shared/networking spec family. This addendum nevertheless imposes forward-compatible constraints so DOC23 task UI is not hard-wired to the full DOC20 shell.

### §20H.2 DOC50 owner split

Future shared/networking work SHOULD be owned by a DOC50+ family:

```text
DOC50 — Shared Surfaces, Links, and Collaboration Kernel
DOC51 — Team / Firm Identity, Roles, and Trust Domains
DOC52 — Collaborative Workspaces and Shared Task Execution
DOC53 — Sync, Conflict Resolution, and Multi-Client Networking
DOC54 — External Sharing, Audit, and Export Governance
```

DOC23 owns only task-specific shared manifests and task-run behavior. DOC50 should own the common sharing capsule, identity, permissions, revocation, access logging, and collaboration shell. PropA/EC policy governs exposure; DOC24 governs shared-session capability/context packets.

### §20H.3 Shell-independent task components

Task canvas, Run Inspector, Artifacts & Deliveries, module detail panels, and selected task summaries MUST be renderable without the full DOC20 Browser, right chat column, private workspace, or user’s ordinary Q shell.
Internal route and shared route should reuse task content components where possible:

```text
Internal: /tasks/:taskId?runId=:runId&view=graph
Shared:   /share/task/:shareId
```

### §20H.4 TaskShareCapsule sketch

```ts
TaskShareCapsule {
  share_id: string
  task_id: string
  task_version_ref: StorageRef
  graph_snapshot_ref: StorageRef
  optional_run_id?: string
  shared_by_principal_id: string
  recipient_principal_refs: PrincipalRef[]
  permission:
    | "view_only"
    | "comment_only"
    | "copy_task"
    | "edit_proposal_only"
    | "run_with_sender_approval"
    | "run_within_limits"
    | "edit_and_run_within_limits"
    | "trusted_collaborator"
    | "high_risk_warned_full_access"
  share_mode:
    | "task_design_only"
    | "specific_run_summary"
    | "specific_run_with_artifacts"
    | "specific_artifacts_only"
    | "collaborative_task_workspace"
    | "audit_packet"
  memory_policy: SharedTaskMemoryAccessPolicy
  file_policy: SharedTaskFileAccessPolicy
  output_policy: SharedTaskOutputPolicy
  email_policy: SharedTaskEmailPolicy
  capability_policy: SharedTaskCapabilityPolicy
  agent_policy: SharedTaskAgentAccessPolicy
  cost_policy: SharedTaskCostPolicy
  approval_policy: SharedTaskApprovalPolicy
  audit_policy: SharedTaskAuditPolicy
  redaction_profile:
    | "public_safe"
    | "external_professional"
    | "firm_collaborator"
    | "trusted_internal"
    | "full_access_warned"
  expires_at?: string
  revoked_at?: string
  schema_version: "1.0"
}
```

### §20H.5 Access-control principles for shared task workspaces

A shared task workspace may eventually allow recipients to view, copy, propose edits, run with approval, run within limits, edit/run within limits, and use Task Agent within the granted scope. It must never grant implicit access to the sender’s full Q workspace.

Controls must be explicit for:

1. what memories are available;
2. what files/documents/folders/libraries are available;
3. where outputs can be saved;
4. whether outputs can be written to the sender’s DOC72 memory or DOC73 libraries;
5. whether email/iMessage/calendar/webhook delivery is allowed, draft-only, or sender-approved;
6. what APIs/connectors/MCPs/preconfigured agents are available;
7. what cost/run limits apply;
8. whether the sender must approve first runs, every run, graph edits, prompt edits, capability mounting, memory expansion, file expansion, external sends, and output saves.

### §20H.6 Shared Task Agent access

Task Agent may be enabled in a shared task workspace only within the capsule’s policy.

```ts
SharedTaskAgentAccessPolicy {
  enabled: boolean
  allowed_entrypoints: Array<
    | "review_existing_task"
    | "design_task_patch"
    | "inspect_task_run"
    | "retrieve_task_output"
    | "explain_task_graph"
    | "assess_task"
    | "improve_task_prompt"
  >
  grounding_mode:
    | "shared_manifest_only"
    | "shared_manifest_plus_selected_memory"
    | "shared_manifest_plus_sender_approved_context"
  may_query_sender_private_memory: boolean
  may_query_unshared_task_history: boolean
  may_modify_task_directly: boolean
  may_propose_task_patch: boolean
  require_sender_approval_for_task_agent_generated_patch: boolean
  schema_version: "1.0"
}
```

Default shared-task Task Agent mode is `shared_manifest_only`, no private memory search, no unshared task-history access, no direct mutation, and proposal-only edits unless the sender grants stronger authority.

### §20H.7 Shared task audit

All shared access/run/edit/Task Agent/capability/file/memory/output events must emit share-scoped audit events.
```ts
TaskShareAuditEvent {
  event_id: string
  share_id: string
  actor_principal_id: string
  event_kind:
    | "share_opened"
    | "task_copied"
    | "graph_viewed"
    | "graph_patch_proposed"
    | "graph_patch_applied"
    | "task_run_requested"
    | "task_run_approved"
    | "task_run_started"
    | "task_run_completed"
    | "artifact_opened"
    | "artifact_downloaded"
    | "file_access_requested"
    | "memory_access_requested"
    | "capability_mount_requested"
    | "email_draft_created"
    | "external_send_requested"
    | "external_send_approved"
    | "policy_blocked"
    | "share_revoked"
  target_ref?: string
  policy_decision_ref?: PolicyDecisionRef
  created_at: string
  schema_version: "1.0"
}
```

Full shared task workspace behavior is deferred, but all DOC23 task UI components must remain permission-aware and shell-independent enough to support this later.

---

## §21 — Routes, commands, read models, and SSE

### §21.1 Route posture

Routes here are starting contracts for DOC23 R3.2+ incorporation. EC route registry remains canonical.

### §21.2 Proposed routes

| Route | Method | Purpose |
|---|---|---|
| `/api/tasks/agent/design-sessions` | POST | Start Task Agent design session. |
| `/api/tasks/agent/design-sessions/:id` | GET | Get design session state. |
| `/api/tasks/agent/design-sessions/:id/propose-blueprint` | POST | Generate/update blueprint proposal. |
| `/api/tasks/agent/design-sessions/:id/propose-graph` | POST | Generate graph proposal. |
| `/api/tasks/agent/design-sessions/:id/preflight` | POST | Run preflight. |
| `/api/tasks/:taskId/blueprint` | GET/PUT | Read/update blueprint via EC command. |
| `/api/tasks/:taskId/runs/:runId/lens` | GET | TaskAgentRunLens. |
| `/api/tasks/:taskId/runs/:runId/artifacts` | GET | Artifact index for run. |
| `/api/tasks/artifacts/search` | POST | Search artifacts across tasks/runs. |
| `/api/tasks/artifacts/:artifactId/promote` | POST | Request artifact intake/promotion. |
| `/api/tasks/:taskId/runs/:runId/assessment` | POST/GET | Request/read Task Assessment. |
| `/api/tasks/templates/search` | POST | Semantic template search. |
| `/api/tasks/presets/search` | POST | Semantic module preset search. |
| `/api/tasks/knowledge-pack/status` | GET | Active TaskSystemKnowledgePack status. |
| `/api/tasks/knowledge-pack/rebuild` | POST | Request rebuild through EC. |
| `/api/tasks/design-learning/proposals` | GET | Review improvement proposals. |
| `/api/tasks/design-learning/proposals/:id/apply` | POST | Apply approved proposal. |

### §21.3 Proposed commands

```ts
TaskAgentCommandType =
  | "task.design_session.start"
  | "task.design_session.update"
  | "task.blueprint.create"
  | "task.blueprint.update"
  | "task.graph.proposal.create"
  | "task.graph.diff.apply"
  | "task.preflight.run"
  | "task.assessment.request"
  | "task.artifact.promote.request"
  | "task.template.semantic_card.update"
  | "task.preset.semantic_card.update"
  | "task.template.update.propose"
  | "task.preset.update.propose"
  | "task.knowledge_pack.rebuild"
  | "task.design_improvement.proposal.apply"
```

### §21.4 Proposed SSE events

```ts
TaskAgentSseEvent =
  | "task.agent.design_session.started"
  | "task.agent.template_matches.updated"
  | "task.agent.blueprint.proposed"
  | "task.agent.graph.proposed"
  | "task.agent.preflight.completed"
  | "task.agent.run_lens.updated"
  | "task.artifact.index.updated"
  | "task.artifact.promotion.requested"
  | "task.assessment.started"
  | "task.assessment.completed"
  | "task.knowledge_pack.staged"
  | "task.knowledge_pack.activated"
  | "task.knowledge_pack.drift_detected"
  | "task.design_learning.proposal.created"
  | "task.design_learning.proposal.applied"
```

---

## §21A — Routes, commands, read models, and events to add

### §21A.1 Routes / commands

```text
POST /api/tasks/mode-decision
GET  /api/tasks/opportunity-packet
POST /api/tasks/task-agent/consult
POST /api/tasks/task-agent/design
POST /api/tasks/task-agent/adapt-template
POST /api/tasks/task-agent/review
POST /api/tasks/task-agent/inspect-run
POST /api/tasks/task-agent/retrieve-output
POST /api/tasks/task-agent/assess
POST /api/tasks/task-agent/portfolio-assess
POST /api/tasks/task-agent/improve-prompt
POST /api/tasks/task-agent/test-prompt
POST /api/tasks/prompt-evaluation-jobs
GET  /api/tasks/prompt-evaluation-jobs/:jobId
POST /api/tasks/invocation-feedback
POST /api/tasks/proposal-edit-trace
GET  /api/tasks/knowledge-pack/status
POST /api/tasks/knowledge-pack/rebuild
```

Route names are illustrative; final paths must conform to DOC23 route conventions and EC command-envelope standards.

### §21A.2 SSE / events

```text
task.mode.decision_recorded
task.opportunity.packet_ready
task.agent.consult_started
task.agent.consult_completed
task.agent.design_started
task.agent.design_completed
task.agent.assessment_started
task.agent.assessment_completed
task.agent.degraded
task.agent.fallback_model_used
task.suggestion.feedback_recorded
task.invocation.learning_signal_recorded
task.proposal_edit.trace_recorded
task.prompt.quality_assessed
task.prompt.candidates_generated
task.prompt.evaluation_started
task.prompt.evaluation_completed
task.knowledge_pack.stale
task.knowledge_pack.activated
```

### §21A.3 Read models

```text
TaskAgentStatusReadModel
TaskModeDecisionReadModel
TaskOpportunityPacketReadModel
TaskInvocationDirectiveReadModel
TaskSuggestionFeedbackReadModel
TaskAgentDesignSessionReadModel
TaskAgentProposalEditTraceReadModel
TaskPortfolioAssessmentReadModel
TaskPromptQualityReadModel
PromptEvaluationJobReadModel
TaskKnowledgePackReadinessReadModel
TaskModuleContextPacketReadModel
TaskModuleContextExclusionReadModel
```

---

## §21B — Additional routes, commands, read models, and events for R0.6.4 UI/run-control additions

### §21B.1 Additional routes / commands

Route names are illustrative until incorporated into DOC23 route conventions and EC registry.
```text GET /api/tasks/:taskId/runs/:runId/inspector GET /api/tasks/:taskId/runs/:runId/flow GET /api/tasks/:taskId/runs/:runId/deliverables GET /api/tasks/:taskId/runs/:runId/context-audit/:moduleId/:activationSeq POST /api/tasks/:taskId/runs/:runId/context-feedback GET /api/tasks/:taskId/runs/:runId/modules/:moduleId/:activationSeq/quick-access POST /api/tasks/:taskId/runs/:runId/export-audit POST /api/tasks/:taskId/runs/:runId/fork GET /api/tasks/:taskId/runs/:runId/forks POST /api/tasks/:taskId/runs/:runId/modules/:moduleId/:activationSeq/followup POST /api/tasks/:taskId/runs/:runId/modules/:moduleId/:activationSeq/continue-session POST /api/tasks/:taskId/runs/:runId/modules/:moduleId/:activationSeq/intervene POST /api/tasks/segments GET /api/tasks/segments/:segmentId POST /api/tasks/segments/:segmentId/insert POST /api/tasks/:taskId/segments/save-selection GET /api/tasks/stats/saved-tasks GET /api/tasks/stats/presets GET /api/tasks/:taskId/navigation-state PUT /api/tasks/:taskId/navigation-state POST /api/tasks/share-capsules // deferred / DOC50-owned when available GET /api/tasks/share-capsules/:shareId/manifest ``` ### §21B.2 Additional command types ```ts TaskUiAndRunControlCommandType = | "task.run_inspector.open" | "task.navigation_state.update" | "task.context_feedback.record" | "task.audit_export.request" | "task.run.fork.request" | "task.module.followup.start" | "task.module.session.continue" | "task.module.intervention.request" | "task.segment.create" | "task.segment.insert" | "task.segment.update" | "task.segment.archive" | "task.share_capsule.create" // future DOC50 integration | "task.share_capsule.revoke" // future DOC50 integration ``` ### §21B.3 Additional read models ```text TaskGraphInspectorNavigationStateReadModel ModuleRunQuickAccessReadModel TaskRunInspectorReadModel TaskRunStepRowReadModel TaskRunNarrativeStatusReadModel TaskDeliverablesReadModel TaskDeliveryRecordReadModel TaskPromptContextInspectorReadModel 
TaskAuditExportBundleReadModel
TaskModuleSessionRefReadModel
ModuleFollowupSessionReadModel
TaskRunForkRequestReadModel
TaskRunForkHistoryReadModel
ModuleActivationReplayRecordReadModel
TaskSegmentReadModel
SavedTaskOperationalStatsReadModel
ModulePresetOperationalStatsReadModel
TaskShareCapsuleReadModel // future DOC50 integration
SharedTaskSurfaceManifestReadModel // future DOC50 integration
```

### §21B.4 Additional SSE/events

```text
task.ui.navigation_state_updated
task.run.inspector_updated
task.run.flow_updated
task.run.narrative_status_updated
task.module.quick_access_updated
task.delivery.recorded
task.deliverables.updated
task.context.feedback_recorded
task.audit_export.started
task.audit_export.completed
task.module.followup.started
task.module.session.continuation_available
task.module.session.continuation_unavailable
task.module.intervention.recorded
task.run.fork_requested
task.run.fork_created
task.run.fork_completed
task.segment.created
task.segment.inserted
task.segment.updated
task.saved_task.stats_updated
task.preset.stats_updated
task.share_capsule.created // future DOC50 integration
task.share_capsule.revoked // future DOC50 integration
task.share.audit_event_recorded // future DOC50 integration
```

---

## §22 — Storage paths

### §22.1 Proposed paths

```ts
TaskAddendaBPaths = {
  taskBlueprint: "ELNOR_MEMORY/tasks/{task_id}/blueprint.json",
  taskBlueprintVersions: "ELNOR_MEMORY/tasks/{task_id}/blueprints/{blueprint_version}.json",
  taskDesignSessions: "ELNOR_MEMORY/system/task_design/sessions/{design_session_id}.json",
  taskDesignSessionEvents: "ELNOR_MEMORY/system/task_design/sessions/{design_session_id}/events.jsonl",
  taskTraceSpans: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/trace_spans.jsonl",
  taskPromptSnapshots: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/prompt_snapshots/",
  taskCapabilitySnapshots: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/capability_snapshots/",
  taskModuleContextPackets: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/context_packets/{module_id}/{activation_seq}.json",
  taskPolicySnapshots: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/policy_snapshots/",
  taskArtifactIndex: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/artifact_index.json",
  taskAssessments: "ELNOR_MEMORY/tasks/{task_id}/assessments/{assessment_id}.json",
  templateSemanticCards: "ELNOR_MEMORY/system/task_system/template_semantic_cards/{template_id}.json",
  modulePresetSemanticCards: "ELNOR_MEMORY/system/task_system/module_preset_semantic_cards/{preset_id}.json",
  taskDesignCases: "ELNOR_MEMORY/system/task_design/cases/{case_id}.json",
  taskDesignPatterns: "ELNOR_MEMORY/system/task_design/patterns/{pattern_id}.json",
  taskDesignImprovementProposals: "ELNOR_MEMORY/system/task_design/improvement_proposals/{proposal_id}.json",
  taskKnowledgePacks: "ELNOR_MEMORY/system/task_agent/knowledge_packs/{pack_id}/",
  activeTaskKnowledgePackPointer: "ELNOR_MEMORY/system/task_agent/active_knowledge_pack.json",
  taskKnowledgeCompilerRuns: "ELNOR_MEMORY/system/task_agent/compiler_runs/{compiler_run_id}.json",
  taskAgentEvalSuites: "ELNOR_MEMORY/system/task_agent/evals/{suite_id}.json",
  taskAgentEvalRuns: "ELNOR_MEMORY/system/task_agent/eval_runs/{eval_run_id}.json",
  taskDeliveries: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/deliveries.jsonl",
  taskRunNarrativeStatus: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/narrative_status.json",
  taskRunInspectorReadModel: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/inspector_read_model.json",
  taskModuleQuickAccess: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/module_quick_access/{module_id}/{activation_seq}.json",
  taskPromptContextInspectors: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/prompt_context_inspectors/{module_id}/{activation_seq}.json",
  taskContextFeedbackEvents: "ELNOR_MEMORY/system/task_feedback/context_feedback.jsonl",
  taskAuditExports: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/audit_exports/{export_id}/",
  taskModuleSessionRefs: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/module_sessions/{module_id}/{activation_seq}.json",
  moduleFollowupSessions: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/module_followups/{followup_session_id}.json",
  taskRunForkRequests: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/fork_requests/{request_id}.json",
  taskRunForks: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/forks/{child_run_id}.json",
  moduleActivationReplayRecords: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/replay_records/{module_id}/{activation_seq}.json",
  taskSegments: "ELNOR_MEMORY/system/task_system/segments/{segment_id}.json",
  taskSegmentSemanticCards: "ELNOR_MEMORY/system/task_system/segment_semantic_cards/{segment_id}.json",
  savedTaskOperationalStats: "ELNOR_MEMORY/system/task_system/stats/saved_tasks/{task_id}.json",
  modulePresetOperationalStats: "ELNOR_MEMORY/system/task_system/stats/module_presets/{preset_id}.json",
  taskShareCapsules: "ELNOR_MEMORY/system/sharing/task_capsules/{share_id}.json",
  taskShareManifests: "ELNOR_MEMORY/system/sharing/task_manifests/{share_id}.json",
  taskShareAuditEvents: "ELNOR_MEMORY/system/sharing/task_audit_events.jsonl"
}
```

### §22.2 Storage classification

| Artifact | Canonical/derived | Owner |
|---|---|---|
| Task Blueprint | canonical DOC23 task metadata | DOC23/EC |
| TaskTraceSpan | operational truth/log | DOC23/EC |
| TaskArtifactIndex | run read model | DOC23/EC |
| Task Assessment | derived assessment artifact | DOC23/EC |
| Semantic cards | projection/derived unless user-authored fields | DOC23 + DOC72 |
| Knowledge Pack | generated/derived active resource | DOC23/EC |
| Learning signals | append-only learning inputs | DOC8/BDSM + EC |
| Delivery records | append-only/run-indexed receipts | DOC23/EC + output owner docs |
| Context feedback events | append-only learning inputs | DOC8/BDSM + EC |
| Module session refs | runtime truth bridge/read model | DOC23 consumes DOC11/OpenClaw |
| Run fork requests/replay records | canonical child-run lineage inputs | DOC23/EC |
| Task Segments | canonical reusable task asset | DOC23/EC |
| Share capsules/manifests | future shared-surface permission artifact | DOC50/EC/PropA, DOC23 task-specific manifest |

---

## §23 — Validation codes

### §23.1 Proposed validation errors

```text
validation.task_agent_template_search_required
validation.blueprint_missing_required_output
validation.blueprint_graph_mismatch
validation.blueprint_side_effect_undeclared
validation.capability_required_unavailable
validation.capability_denied_but_requested
validation.runtime_expansion_outside_allowlist
validation.artifact_policy_endpoint_ambiguous
validation.doc25_required_but_unavailable
validation.doc73_library_missing
validation.task_memory_policy_conflict
validation.incognito_blocks_requested_memory
validation.prompt_snapshot_required_missing
validation.task_knowledge_pack_stale_blocking
validation.template_update_requires_approval
validation.preset_update_requires_approval
validation.graph_proposal_not_preflighted
validation.task_assessment_input_trace_missing
validation.task_module_context_packet_required
validation.task_module_context_active_context_unjustified
validation.task_module_context_policy_snapshot_missing
validation.task_module_context_source_visibility_violation
validation.task_module_context_profile_overrides_denied_capability
validation.module_run_quick_access_missing
validation.task_delivery_record_missing_for_delivered_output
validation.task_context_feedback_target_missing
validation.task_run_fork_requires_activation_seq
validation.task_run_fork_side_effect_replay_policy_required
validation.module_session_continuation_phantom_control
validation.task_segment_missing_exposed_io
validation.task_share_capsule_policy_missing
validation.task_shared_surface_shell_dependency
```

### §23.2 Proposed warnings

```text
warning.template_match_available_but_new_graph_requested
warning.intermediate_output_likely_high_value
warning.tool_pack_too_broad
warning.capability_historically_unused
warning.capability_historically_unreliable
warning.second_pass_deepening_recommended
warning.human_gate_recommended_before_side_effect
warning.final_output_not_saved_as_work_product
warning.blueprint_goal_unlinked
warning.task_design_case_candidate
warning.knowledge_pack_stale_nonblocking
warning.task_module_context_active_context_excluded
warning.task_module_context_broad_scope_declared
warning.task_module_context_relevant_memory_trimmed
warning.task_module_context_library_candidate_excluded
warning.task_module_context_prior_task_candidate_excluded
warning.run_inspector_timeline_linearization_loss
warning.module_output_link_missing
warning.task_agent_panel_unscoped_thread
warning.task_segment_possible_duplicate
warning.audit_export_redaction_required
warning.shared_task_feature_deferred_doc50
```

---

## §24 — Cross-doc obligations

### §24.1 DOC72 obligations

DOC72 must support or confirm:

1. `execution_trace` payloads for DOC23 TaskRun activity memory.
2. Task/matter/entity/artifact/work-product/followup-session relationship patterns.
3. Task-design `goal` payload/annotations and goal-link relations.
4. Semantic projection support for task templates, module presets, design cases, and design patterns.
5. Work-product links from task outputs.
6. Provenance discipline for any task-derived durable memory.

### §24.2 DOC24 obligations

DOC24 must support or confirm:

1. Task Agent as registered system agent/capability.
2. Capability registry lookup for task modules and Task Agent.
3. Runtime capability expansion receipts.
4. DOC24 packet snapshots for task module dispatch.
5. Rendering/injection of Task Design Intelligence cards.
6. MCP/connector/procedure/model availability surfaces.
7. Tool/procedure/capability utility bundle consumption from BDSM.
8. Task-module DOC24 context packet assembly that is task-scoped, module-scoped, profile-gated, policy-gated, and receipt-backed.
9. Active chat/work context exclusion unless explicitly inherited or high-confidence bound by task-owned evidence.
10. Context packet exclusion receipts for Run Inspector, Task Assessment, and Task Agent operational lens.

### §24.3 DOC25 obligations

DOC25 must support:

1. Task-produced document ingestion.
2. Task-downloaded document ingestion.
3. Task-received attachment ingestion.
4. Task final work-product ingestion.
5. DOC25_IngestionResult references back to TaskArtifactIndex.
6. Quality/degraded-state reporting visible in Run Inspector and Task Assessment.

### §24.4 DOC73 obligations

DOC73 must support:

1. Task-output-to-library/corpus source class.
2. Library/corpus binding from DOC23 artifact promotion.
3. Extraction profile selection from task/template/blueprint.
4. User-facing “library” terminology in task UI.
5. Receipts linking DOC23 artifact -> DOC25 ingestion -> DOC73 library member.

### §24.5 DOC8/BDSM obligations

DOC8/BDSM must support:

1. TaskCreationSessionTrace-derived learning signals.
2. Task Agent question utility.
3. Template/preset suggestion utility.
4. Capability utility in task context classes.
5. Artifact policy utility.
6. Task design pattern utility.
7. Task Assessment learning inputs.
8. Compiled bundles consumable by Task Agent and DOC24.

### §24.6 EC Core obligations

EC must support:

1. Task Agent command registration.
2. Knowledge Pack compiler jobs and activation pointer.
3. Route/read model closure.
4. Storage paths.
5. Atomic writes and rollback/proposal ledgers.
6. Nightly/ongoing extraction jobs.
7. Incognito/effective runtime enforcement.
8. Task run telemetry write paths.
9. Drift detection and eval gating.

### §24.7 DOC3 obligations

DOC3 must support:

1. Semantic procedure discovery by Task Agent through DOC24/DOC72.
2. Procedure use/outcome links to TaskRun memory.
3. Distinction between DOC3 procedures and DOC23 task templates/presets.
4. Optional task-design learning analogies without collapsing task templates into DOC3 procedures.

### §24.8 DOC20/DOC21/DOC22 obligations

UI/content docs must register:

- Task Agent Create surface;
- Blueprint editor;
- Template Match panel;
- Task Agent Run Lens;
- Artifact Finder;
- Task Assessment page/panel;
- Knowledge Pack inspector;
- Task Design Learning Review Queue;
- task modular page as DOC20 `task` tab;
- Browser task filter/default behavior;
- graph overview preservation rule;
- ModuleRunQuickAccess block in every module detail/config panel;
- Graph ⇄ Run Inspector navigation and state synchronization;
- Run Inspector with Header, Run Flow & Steps, Artifacts & Deliveries, and Context & Audit;
- artifact/document context menu and native-app/Finder actions where available;
- Context Inspector with DOC24 injected/excluded context and feedback controls;
- Task Agent side-panel chat with full chat treatment and scoped context;
- Tasks page command center with active/scheduled/saved/history/segments/presets views;
- Task Segment asset type in Browser/Tasks surfaces;
- saved/reusable task operational stats;
- shell-independent task content components for future shared-task rendering.

### §24.9 OP-A obligation tracker

All accepted cross-doc obligations from this addendum must be added to OP-A during the next OP-A maintenance pass.

---

## §24A — Cross-doc obligations to add to OP-A

### §24A.1 DOC24 obligations

1. Add Task Mode Resolver prefilter and `TaskModeDecision` generation.
2. Add `TaskOpportunityPacket` runtime packet lane.
3. Register Task Agent as an invokable agent capability.
4. Register Task Agent entrypoints in capability registry.
5. Support compact top-k task template / module preset / invocation directive injection.
6. Enforce no full TKP injection into ordinary chat.
7. Route TaskInvocationDirective cards through DOC24 delivery architecture.
8. Feed task suggestion feedback into BDSM/DOC8.
9. Expose model/capability/tool availability to Task Agent through live registry and packet assembly.
10. Add prompt-improvement routing so DOC17 Prompt Advisor service and Task Agent do not conflict.
11. Add TaskModuleContextPacket assembly and receipt support for task modules.
12. Enforce task-context isolation: active chat/work context is candidate evidence, not automatic task-module context.
13. Add library/document/source binding gates so DOC73/DOC25 context appears in task modules only when task-bound or explicitly selected.

### §24A.2 DOC11 / OpenClaw obligations

1. Expose Task Agent named-agent runtime truth if registered.
2. Expose model catalog, fallback chain, auth, and provider availability for Task Agent runtime profile.
3. Expose sub-agent settings/status/events for Task Agent background design sessions.
4. Support isolated/fork context mode truth for Task Agent context packaging.
5. Ensure visible Task Agent controls map to actual routes/read models or degraded states.
6. Distinguish OpenClaw heartbeat/session monitoring from saved-task process monitoring.

### §24A.3 DOC72 obligations

1. Store task runs as `execution_trace` hubs linked to entities, matters, artifacts, work products, templates, goals, directives, and follow-up sessions.
2. Support TaskInvocationDirective as memory_directive or standing_procedure payload depending on lifecycle/governance.
3. Support Task Design Goals as `goal` nodes and edge links from blueprints, templates, design decisions, patterns, outputs, and assessments.
4. Store semantic projections/cards for task templates and module presets without duplicating executable graph truth.
5. Support task-design casebook and prior-task rationale links.

### §24A.4 DOC3 obligations

1. Preserve boundary between DOC3 semantic skills/procedures and DOC23 saved task graphs.
2. Allow TaskInvocationDirective to reference DOC3 procedures without materializing task graphs as SKILL.md.
3. Ensure DOC24 delivery of graph-backed procedures remains direct injection from DOC72 contracts.

### §24A.5 DOC17 obligations

1. Reframe Prompt Advisor as lightweight prompt-analysis/rewrite service, not a separate agent.
2. Keep Prompt Lab as backend/offline bridge only if retained; no separate primary user-facing prompt lab required.
3. Allow Task Agent to call Prompt Advisor service in task-context prompt improvement.
4. Keep prompt recipes DOC17-owned while allowing Task Agent to test/use/convert them in task context.

### §24A.6 DOC23 Addenda A obligations

1. Extend Experiment to accept prompt-candidate sources or bundles.
2. Reserve DSPy/GEPA prompt optimization until R5 substrate is implemented.
3. Integrate PromptEvaluationTask with Experiment/Judge/Claim Extractor and Task Assessment.
4. Ensure prompt promotion uses promotion ledger, hashes, datasets, rollback, and post-promotion monitoring when operative.

### §24A.7 DOC8/BDSM obligations

1. Add task invocation utility signals.
2. Add task suggestion accepted/rejected/ignored events.
3. Add Task Agent design utility signals.
4. Add TaskAgentProposalEditTrace learning consumption.
5. Add prompt-edit and prompt-evaluation learning signals.
6. Compile task suggestion suppression/boost policies by context class.

### §24A.8 EC Core obligations

1. Add Task Agent system-agent identity registry entry.
2. Add Task Agent runtime profile storage and settings controls.
3. Add EC command routes for Task Agent proposals, TKP rebuilds, TaskModeDecision receipts, and prompt-evaluation tasks.
4. Enforce visible receipt for Elnor-created saved tasks.
5. Enforce no hidden graph runs.
6. Maintain TKP active/staged/rejected state and drift detection.

### §24A.9 DOC25 / DOC73 obligations

1. Produced, downloaded, received, or task-generated documents are routed through DOC25 ingestion when artifact policy requires.
2. DOC73 libraries/corpora remain destinations for deep bounded knowledge; TKP is not a normal user-facing DOC73 library.
3. Prompt-evaluation tasks producing documents/work products route outputs through DOC25/DOC73 per artifact policy.

### §24A.10 DOC20 / DOC21 / DOC22 obligations

1. Render Task Agent panels in task editor, Run Inspector, task list, templates/presets, and prompt editor.
2. Render Task Agent settings.
3. Render task opportunity chips and suppression states.
4. Render prompt-improvement levels without exposing duplicate Prompt Lab surface.
5. Render TKP readiness/drift diagnostics for advanced/admin users.

### §24A.11 DOC20 / DOC21 / DOC22 UI obligations from R0.6.4

1. Task modular canvas opens as a DOC20 workspace `task` tab with right chat column closed by default.
2. Browser defaults to Task filter when Task page is opened through Pages navigation.
3. Browser drag/drop supports saved tasks, modules, documents, artifacts, and Task Segments into allowed graph/drop targets.
4. Module detail/config panels include ModuleRunQuickAccess with Open Output and Open Run Inspector.
5. Output/artifact clicks open the appropriate DOC20 viewer/tab and preserve task-tab navigation.
6. Graph ⇄ Run Inspector navigation preserves selected task/run/module/artifact state.
7. Run Inspector is consolidated into Header, Run Flow & Steps, Artifacts & Deliveries, and Context & Audit.
8. Artifact context menus include Show in Finder, Show in Browser, Save As, Open in Q, Open in Native App, Copy Link, Add to Library, Save as Work Product, Make Findable, Use in New Task, and Ask Task Agent where supported.
9. Task Agent side panel receives full chat treatment and scoped context refs.
10. Task page supports active/scheduled/saved/history/segments/presets command-center views and operational stats.
11. DOC21/22 component registry must include Task Segment, Run Flow & Steps, Artifacts & Deliveries, Context Inspector, Task Agent Panel Context, and ModuleRunQuickAccess components.

### §24A.12 DOC11 / OpenClaw obligations from R0.6.4

1. Expose whether a task module activation used a one-shot Gateway call, resumable OpenClaw session, spawned sub-agent session, forked session, archived session, or unavailable session.
2. Expose continuation availability/status for module agent sessions.
3. Expose child session keys, run IDs, model refs, think levels, fallback usage, and session archive/expiration status needed for `TaskModuleSessionRef`.
4. Support no-phantom continuation controls: if continuation is unavailable, Q must receive a reason code.
5. Surface sub-agent/session traces for Run Inspector and Task Agent without making DOC23 the OpenClaw runtime owner.

### §24A.13 DOC23 Addenda A obligations from R0.6.4

1. Judge modules must support rescore-same-output and compare-judge-runs flows when the underlying data exists.
2. Experiment modules must support rerun variants, add variant, compare variants, and fork downstream from a selected/winning variant.
3. Experiment prompt-candidate handling must integrate with run fork/downstream fork behavior when prompt candidates produce downstream artifacts.
4. Judge/Experiment detail views must expose evaluation evidence, prompts/rubrics, source context, and Task Assessment entrypoints.

### §24A.14 DOC50+ / sharing obligations to create

1. Create DOC50 Shared Surfaces, Links, and Collaboration Kernel as the common owner for share capsules, identity, permissions, revocation, audit, and shared rendering shells.
2. DOC23 owns only task-specific shared manifests and task-run behavior.
3. DOC73 owns library-specific shared manifests while consuming DOC50 permission capsules.
4. DOC20 renders shared surfaces as shell-independent viewports.
5. EC/PropA policy engine gates shared memory, file, tool, connector, email, cost, output, and Task Agent authority.
6. DOC24 assembles shared-session capability/context packets constrained by the share capsule.
7. Shared Task Agent access must be grounded only in the shared manifest/selected memory unless stronger trusted access is explicitly granted.
8. High-risk full-access sharing requires explicit warning, expiration/revocation, and audit receipts.

### §24A.15 DOC8/BDSM learning obligations from R0.6.4

1. Consume `TaskContextFeedbackEvent` for injected/excluded DOC24 context, tools, memories, documents, connectors, procedures, and libraries.
2. Consume module-output/open/save/promote actions as artifact-utility signals.
3. Consume Task Agent panel feedback, scoped-thread success/failure, and module-followup utility signals.
4. Consume run-fork/follow-up outcomes as task-design and repair utility signals.
5. Consume Task Segment reuse success/failure and duplicate/near-duplicate segment signals.

---

## §24B — Cross-doc obligations from Addenda A ↔ Addenda B coordination V3 FINAL (R0.7 absorption)

The following OP-A rows originate from the Addenda A ↔ Addenda B coordination V3 FINAL proposal. They are paste-ready for the OP-A ledger. Status reflects state as of R0.7 freeze (2026-05-17).
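
The OP-A rows in §24B.1 follow a fixed layout: an `OBL-...` identifier followed by Owner, Consumer, Description, and Status fields. As an illustration only, the sketch below shows one way a ledger tool could parse such a row into a typed record. The names `OpaRow`, `OpaStatus`, and `parseOpaRow` are hypothetical and are not defined by this addendum, and the status union is inferred from the values that appear in the rows, not taken from a normative enum.

```ts
// Hypothetical record shape; field names mirror the row labels in §24B.1.
type OpaStatus = "specified_in_owner" | "in_review" | `pending_${string}`;

interface OpaRow {
  id: string;
  owner: string;
  consumer: string;
  description: string;
  status: OpaStatus;
}

// Assumption: only the status spellings seen in §24B.1 are accepted.
function isOpaStatus(value: string): value is OpaStatus {
  return (
    value === "specified_in_owner" ||
    value === "in_review" ||
    value.indexOf("pending_") === 0
  );
}

// Parses one row written as an ID line followed by "Label: value" lines.
// Lines without a known label are appended to the previous field, so a
// wrapped Description is preserved.
function parseOpaRow(block: string): OpaRow {
  const lines = block
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  const id = lines[0];
  if (!id || id.indexOf("OBL-") !== 0) {
    throw new Error("row must start with an OBL- identifier");
  }
  const fields: Record<string, string> = {};
  let current = "";
  for (const line of lines.slice(1)) {
    const match = /^(Owner|Consumer|Description|Status):\s*(.*)$/.exec(line);
    if (match) {
      current = match[1].toLowerCase();
      fields[current] = match[2];
    } else if (current) {
      fields[current] += " " + line;
    }
  }
  const { owner, consumer, description, status } = fields;
  if (!owner || !consumer || !description || !status) {
    throw new Error("incomplete row " + id);
  }
  if (!isOpaStatus(status)) {
    throw new Error("unknown status " + status);
  }
  return { id, owner, consumer, description, status };
}
```

Under these assumptions, a row whose Status is misspelled or missing would be rejected at parse time instead of being pasted silently into the ledger.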

### §24B.1 OP-A rows

```
OBL-XDOC-EVAL-ENV-01
  Owner: DOC23 Evaluation Common Contracts V1.1.1 (sibling doc; per coord V3 §3.2)
  Consumer: Addenda A R4.1 V3, Addenda B Core R0.7.1, V3.3
  Description: Shared EvaluationResultEnvelope schema with EvaluationArtifactEnvelope wrapper
  Status: specified_in_owner

OBL-XDOC-MODULES-REGISTRY-01
  Owner: DOC23 R3.2 (target)
  Consumer: All addenda using these module types
  Description: Register step.judge, step.evaluator, step.revisor, step.claim_extractor in parent module registry
  Status: pending_R3_2_compile

OBL-XDOC-SCOPE-PRIMITIVES-01
  Owner: DOC23 Evaluation Common Contracts V1.1.1
  Consumer: Addenda A, Addenda B Core R0.7.1, V3.3.1, PropA R6.3+
  Description: ArtifactScopeRef, TextAnchor, StructuredAnchor as shared primitives
  Status: specified_in_owner

OBL-XDOC-OUTCOME-COMPLIANCE-01
  Owner: Addenda A R4.1 V3 (V4.1 Coordination Patch / V5 Mini-Card)
  Consumer: Addenda B (V3.3 Evaluator) — via Pattern C wiring
  Description: Judge gains outcome_compliance_scoring method consuming EvaluationOutcomeDefinition.criteria[] directly. Pattern C wiring allows Judge to attach downstream of any Evaluator output.
  Status: in_review

OBL-XDOC-PROMPT-COMPARISON-SIGNAL-01
  Owner: Addenda A R4.1 V3
  Consumer: DOC8/BDSM
  Description: Experiment emits PromptComparisonSignal wrapped in EvaluationLearningSignalEnvelope including task_design_signature when applicable
  Status: in_review

OBL-XDOC-CLAIM-EXTRACTOR-PUBLIC-01
  Owner: Addenda A R4.1 V3
  Consumer: V3.3 Evaluator dispatches
  Description: step.claim_extractor as public contract with claims_out port; broadened output to ExtractedEvaluationUnit union (22 types); section-anchored + privilege-tagged units; no virtual data_out alias
  Status: in_review

OBL-XDOC-EVALUATOR-CLAIMS-IN-01
  Owner: Addenda B V3.3.1 (specified in V3.3 §5.17)
  Consumer: Addenda A Claim Extractor wiring patterns
  Description: Evaluator adds claims_in port consuming ClaimSetBundle / ExtractedEvaluationUnitBundle
  Status: specified_in_owner

OBL-XDOC-EVAL-SIGNAL-OWNERSHIP-01
  Owner: Addenda B Core R0.7.1 (this doc, §9.0)
  Consumer: DOC8/BDSM
  Description: Define and emit OutcomeEvaluationSignal, RepairCycleSignal (with taint_evolution and qualitative_delta), TaskProcessGapSignal (runtime), TaintClearanceSignal, HardCallResolutionSignal — all wrapped by EvaluationLearningSignalEnvelope including task_design_signature when applicable
  Status: specified_in_owner

OBL-XDOC-LEARNING-MODE-01
  Owner: Addenda B V3.3.1 (specified in V3.3 §6.16)
  Consumer: EC Core (cost governance integration)
  Description: RevisorConfig.learning_mode (production / signal_generation / calibration)
  Status: specified_in_owner

OBL-XDOC-MODEL-CLASS-AXIS-01
  Owner: DOC72
  Consumer: Addenda B Core R0.7.1, V3.3.1, Addenda A R4.1
  Description: Add model_class axis to PatternContextSignature; add cross_model_applicability to Pattern
  Status: pending_DOC72_update

OBL-XDOC-BDSM-CONSUME-SIGNALS-01
  Owner: DOC8/BDSM
  Consumer: Pattern primitive (DOC72), Task Agent (Addenda B)
  Description: Consume governed signal stream (all eight Phase 1 signal types); produce utility bundles; threshold-gate surfacing via
PatternSurfacingThreshold; emit TaskDesignCorrelationSignal Status: pending_DOC8_update OBL-XDOC-EC-POLICY-SIGNALS-01 Owner: EC Core Consumer: All signal emitters Description: Compiled policy engine gates every signal at envelope layer based on data_class, matter_id, pattern_promotion_eligible Status: pending_EC_Core_update OBL-XDOC-PROPA-DSPY-TARGETS-01 Owner: PropA R6.3+ Consumer: Addenda A R4.1, Addenda B Core R0.7.1, V3.3 Description: Add DspyTargetIdSchemaV4 values: claim_extractor_main, outcome_evaluator_main, revision_compiler_main, outcome_compiler_main Status: pending_PropA_update OBL-XDOC-DOC20-EVAL-UI-01 Owner: DOC20 Consumer: User UI Description: UI surfaces for shared envelope, variant comparison (Patterns A and B), Pattern C ad-hoc Judge attachment, learning_mode toggle, pattern display with model_class context, graph-edit warning for implicit auto-revision chains (Revisor downstream of Experiment with pass_through_winner + autonomous Revisor), wiring validation error when non-comparison-aware consumer wired downstream of Experiment with route_all_variants Status: pending_DOC20_update ``` ### §24B.2 Consuming-doc inserts (cross-doc obligation mechanism) The following `[XDOC-INSERT: target_doc]` blocks are reciprocal to the Addenda A R4.1 V3 / V4.1 Coordination Patch outputs. The architect's coding agent applies them when the named target doc is next revised. **[XDOC-INSERT: DOC23 R3.2 (when compilation pass runs)]** ``` 1. Absorb DOC23 Evaluation Common Contracts V1.1.1 into parent doc as new top-level section. Common Contracts V1.1.1 retires per its §11 migration guide. 2. Extend module type registry with: - step.judge (Addenda A) - step.evaluator (Addenda B V3.3) - step.revisor (Addenda B V3.3; not "reviser") - step.claim_extractor (Addenda A) - step.dspy_optimizer (PropA; consumed by all extractors and evaluators) 3. Document Patterns A, B, C wiring as first-class supported patterns. 4. 
Update references in Addenda A R4.1 V3 / Addenda B Core R0.7.1 / V3.3.1 / sub-addenda to point to DOC23 R3.2 sections rather than Common Contracts V1.1.1. ``` **[XDOC-INSERT: DOC72 next revision]** ``` 1. Extend PatternContextSignature with model_class axis (cheap_local / cheap_api / medium / expensive_frontier). Mandatory for new Pattern records. 2. Add cross_model_applicability field to Pattern primitive: - "model_class_specific" - "cross_model_applicable" - "requires_validation" (default for new patterns) 3. Enforce matter-scoped retrieval firewall per Addenda B V3.1/V3.3 §13.4. Matter-scoped Patterns do not surface in cross-matter contexts. Privileged-matter pattern promotion requires EC Core policy gate. 4. Cross-reference OBL-XDOC-MODEL-CLASS-AXIS-01 in DOC72's OP-A list. ``` **[XDOC-INSERT: DOC8/BDSM next revision]** ``` 1. Consume unified governed signal stream wrapped in EvaluationLearningSignalEnvelope per Common Contracts §5.1. Discriminate by signal_type (eight Phase 1 types). 2. Produce utility bundles consumed by DOC72 Pattern primitive store. 3. Threshold-gate pattern surfacing via PatternSurfacingThreshold: min_runs, min_distinct_tasks, min_success_confidence, max_regression_rate. Default values: 10, 3, 0.7, 0.15. 4. Emit TaskDesignCorrelationSignal (aggregate). Consumed by Task Agent for suggestion surfacing. BDSM does NOT emit runtime TaskProcessGapSignal — that is emitted by Revisor or Task Agent during execution. 5. Phase 2 correlation analytics: clustering, deficiency taxonomy emergence, auto-fix candidate detection. Operates on Phase 1 captured data. 6. Cross-reference OBL-XDOC-BDSM-CONSUME-SIGNALS-01 in DOC8's OP-A list. ``` **[XDOC-INSERT: EC Core Addendum A next revision]** ``` 1. 
Extend compiled policy engine to gate signal envelope persistence and promotion: - data_class enforcement (public / internal / privileged / local_only) - matter_id firewall (matter-scoped signals do not auto-cross matter boundaries) - pattern_promotion_eligible governs durable learning feed 2. Cost governance for learning_mode field (V3.3 §6.16): - "signal_generation" mode draws from cheap-model budget pool - "calibration" mode draws from mixed pool with explicit user authorization - "production" mode draws from production pool 3. Retention policy for evaluation envelopes and signals per data_class and matter. 4. Cross-reference OBL-XDOC-EC-POLICY-SIGNALS-01 in EC Core's OP-A list. ``` **[XDOC-INSERT: PropA R6.3 → R6.4 or later]** ``` 1. Extend DspyTargetIdSchemaV4 with new target ids: - "claim_extractor_main" (Addenda A step.claim_extractor prompts) - "outcome_evaluator_main" (Addenda B V3.3.1 Evaluator prompts) - "revision_compiler_main" (Addenda B V3.3.1 Revisor compile prompts) - "outcome_compiler_main" (Addenda B V3.3.1 Outcome Compiler prompts) Each new target requires DspyTargetEligibilitySchemaV4 discipline. 2. Add a new section documenting the extraction landscape coordination: - Boundary between PropA P0_master_extraction (DOC72 graph candidates) and Addenda A step.claim_extractor (Judge/Evaluator inputs) - Affirm separation: different consumers, different lifecycles - Affirm shared infrastructure: anchoring primitives via Common Contracts §7 - Affirm shared DSPy lane: PropA is single optimization lane 3. Cross-reference OBL-XDOC-PROPA-DSPY-TARGETS-01 in PropA's OP-A list. ``` **[XDOC-INSERT: DOC20 next revision]** ``` 1. UI surfaces for shared EvaluationResultEnvelope: - Producer-aware rendering - Slice-aware rendering (quantitative, qualitative, comparison, assurance, safety) - Hard Call surface integration 2. UI surfaces for variant comparison (Patterns A and B per coord V3 §2.9): - Per-variant result cards - Cross-variant comparison view 3. 
UI surface for Pattern C ad-hoc Judge attachment: - "Attach Judge to this Evaluator output" action on result cards - Cost estimate before Judge dispatch - Score display once Judge completes 4. UI for learning_mode toggle in RevisorConfig (V3.3 §6.16): - Three-state selector with cost guidance per mode 5. UI for pattern display with model_class context: - Pattern card shows model_class context - "from memory" vs "adapted from memory" badge (V3.3 §21.8.1) - cross_model_applicability status 6. Graph-edit warning when Revisor wired downstream of Experiment with experiment_winner_routing = "pass_through_winner" AND Revisor's AutonomousModePolicy permits autonomous repair. Surface confirmation dialog explaining implicit auto-revision chain. 7. Wiring validation error at graph-edit time when non-comparison-aware consumer wired downstream of Experiment with route_all_variants. 8. Cross-reference OBL-XDOC-DOC20-EVAL-UI-01 in DOC20's OP-A list. ``` ### §24B.3 Coding-agent instructions [CODING-AGENT-NOTE: cross-doc obligations pass] When implementing changes from Addenda B Core R0.7.1 (this doc), scan §24B.2 for blocks tagged `[XDOC-INSERT: target_doc]` and apply them to the named target docs when those docs are next revised. The architect (Will) does not separately coordinate with other-doc chats; the inserts are the coordination mechanism. Status updates for each OP-A row are queryable via the OP-A ledger. --- ## §25 — Rejected concepts and non-goals | Concept | Disposition | |---|---| | Central runtime orchestrator | Rejected. Undermines visible graph and repeatable execution. | | Task Agent as ordinary DOC23 module | Rejected for primary Task Agent. It is a system capability/read-control interface. | | Hidden subagent spawning outside modules | Rejected. Sub-agents belong inside modules or explicit non-task advisory routes. | | Full raw task content stored in DOC72 | Rejected. Raw content belongs in run storage, DOC25, DOC73, or owner stores. 
| | Automatic ingestion of all intermediate outputs | Rejected. Use artifact policy. | | Giant monolithic Task Agent `skill.md` as primary truth | Rejected. Use hybrid knowledge pack/tools/cards/evals. | | Purely generated knowledge pack | Rejected. Generated truth must be supplemented by doctrine, rationale, examples, cases, goals, preferences, and learning. | | Automatic material mutation of templates/presets | Rejected. Use proposal/versioning/approval. | | Capability utility silently deleting tools | Rejected. Utility informs recommendations/ranking/warnings unless approved. | | DOC25 as final endpoint label | Rejected. DOC25 is processor; endpoint must be specified. | --- ## §25A — Rejected or deferred concepts added by the post-R0.3 merge ### §25A.1 Rejected for initial Addenda B ```text Separate user-facing Workflow Designer agent Separate user-facing Task Inspector agent Separate user-facing Prompt Agent Separate primary Prompt Lab location Promptolution-specific module Task Agent as ordinary DOC23 module Task Agent as default classifier for every chat turn Hidden saved-task graphs Full TKP injection into ordinary chat DSPy-lite implementation before R5 substrate Task graphs materialized as DOC3 SKILL.md files ``` ### §25A.2 Reserved / possible future ```text Generic step.prompt_candidate_generator module Standalone advanced Prompt Evaluation Jobs page Task Agent sub-agent decomposition for heavy portfolio audits Formal prompt artifact library beyond DOC17 prompt recipes Task mode policy simulator Multi-user shared TaskInvocationDirective governance Full collaborative shared task workspace (DOC50+/Phase 2) Generic real-time multi-user graph editing Prompt/Context Inspector feedback auto-mutating DOC24 routing without review Live mid-call instruction injection unless module/runtime explicitly supports intervention ``` --- ## §26 — Open questions for architect review 1. 
   Should “Template Group” be the placeholder name for multi-template bundles, or should this remain unnamed until needed?
2. Which task-design patterns should be architect-authored first beyond second-pass deepening, draft-redteam-revise, email attachment ingestion, and recurring monitor?
3. What default artifact policy should apply to final legal work products: `doc72_summary_reference` or `doc25_work_product_ingestion`?
4. Should Task Agent have a visible identity in the System Agent Identity Registry immediately, or land through DOC24 capability registry first?
5. What level of automatic low-risk semantic-card update is acceptable without user review?
6. How aggressively should prior task cases be extracted into the Design Casebook?
7. Should knowledge pack evals be mandatory before any pack activation, or only before major pack changes?
8. What is the correct default for intermediate high-value analysis outputs: ask each time, suggest after run, or make findable with summary when module preset says high-value?
9. How should legal matter privilege/sensitivity policy alter task memory defaults?
10. Should Task Assessment run automatically after every completed run, only after selected templates, or only when enabled?
11. What exact UI label should replace internal “TaskTemplateRecord” where user-facing “Reusable Saved Task” is sufficient?
12. Should Task Segments become searchable/recommendable through the same semantic-card machinery as saved tasks and module presets immediately, or after usage proves value?
13. Which run-fork defaults should vary by task type: reuse prior DOC24 context, recompute from same scope, or recompute from current memory/capabilities?
14. What redaction profile should be default for external audit export?
15. Which shared-task permissions, if any, belong in Phase 1.5 before full DOC50 collaboration exists?

---

## §27 — Implementation dependency note — non-normative

This addendum is an end-state spec.
Implementation may proceed in dependency-aware order, but the spec should not be rewritten as phases.

Practical dependency order:

```text
1. Telemetry spine + Artifact Index
2. DOC72 Task Activity Memory
3. Artifact intake policy
4. Template/preset semantic cards
5. Task Blueprint
6. Capability binding/runtime expansion
7. Task Agent Run Lens and output retrieval
8. Task Agent creation flow
9. Task Knowledge Pack compiler
10. Task Assessment
11. Task design learning and improvement proposals
12. Eval/promotion/drift substrate
```

Task Agent may be partially useful before all layers exist, but it must degrade honestly and must not pretend unavailable memory/learning/capability substrates exist.

---

## §27A — End-to-end examples

### §27A.1 Create a complaint evaluation task

```text
User: Create a task to evaluate a complaint.

Task Agent: I found an existing Complaint Evaluation template. It matches your request better than a new blank graph.

Recommended customization:
- use DOC25 retrieval for complaint and source documents;
- include issue matrix + revision plan output;
- add second-pass deepening because you prefer “look deeper” in complex analysis;
- add optional red-team review;
- keep raw intermediate drafts with the run;
- make final report findable with summary;
- save final revised complaint as work product if produced.

I will run preflight before saving.

[Use recommended setup] [Inspect template] [Edit blueprint] [Create new instead]
```

### §27A.2 Ask about a running task

```text
User: Where are we in the Marex Draft Complaint task?

Task Agent reads TaskAgentRunLens and answers:
The task is on step 5 of 7. The damages memo and first complaint draft are complete. The red-team module is running now. It has reviewed the complaint draft and 12 source documents. No human gate is pending yet. The next expected step is revision based on red-team findings.

Latest available artifacts: damages memo, first draft complaint, source-doc classification table.
```

### §27A.3 Retrieve a task output

```text
User: Send me the results of the draft task we ran for Marex.

Task Agent: I found the Marex Draft Complaint run from [date]. Final outputs were:
- Final complaint draft
- Damages analysis memo
- Red-team report
- Revision summary

The final complaint was saved as a work product. The damages memo is still run-scoped but can be made findable.

[Open final complaint] [Open damages memo] [Make damages memo findable] [Open Run Inspector]
```

### §27A.4 Learn from a refined task

```text
User creates or edits a complaint drafting task and adds:
- second-pass deepening;
- red-team before revision;
- separate damages memo output;
- final work-product save.

After run and assessment, Task Agent proposes:
This design worked well and appears reusable. I recommend adding a design case: “For complex complaint drafting, use damages memo -> draft -> second-pass deepening -> red-team -> revision.”

Candidate template update:
Add damages memo and second-pass deepening as optional defaults in Complaint Drafting Template.

[Approve design case] [Review template update] [Reject]
```

---

## §27B — Non-normative external design benchmarks

This section preserves external design patterns considered during drafting. These references are not normative and do not override DOC23-native contracts. They explain why Addenda B emphasizes graph-visible execution, telemetry, evals, connector registries, reusable skills/procedures, and durable execution.

| External pattern | Lesson imported into Addenda B |
|---|---|
| Visual agent/workflow builders with connector registries and versioning | DOC23 should keep the graph as the visible execution substrate while adding semantic template/preset reuse and capability binding. |
| Claude Cowork-style natural-language process setup | Task Agent should accept underspecified intent, propose a richer process, show progress, retrieve outputs, and ask for approval when needed.
| LangGraph/Temporal-style durable execution | Long-running tasks need checkpoints, idempotent side effects, durable waits, and safe resume semantics. |
| Phoenix/Braintrust-style traces/evals/datasets | Telemetry must feed eval datasets, promotion ledgers, regression checks, and Task Assessment. |
| Anthropic Skills / progressive disclosure | Task Agent should receive compact routing cards first and deeper cards/spec sections only when needed. |
| MCP connector ecosystems | MCP should be exposed through DOC24 capability IDs, health, auth, policy, and least-privilege tool packs, not raw uncontrolled tool lists. |
| OpenTelemetry / span-based observability | DOC23 should define native TaskTraceSpan records and may later export compatible traces. |
| Typed DAG planning / validator-gated repair research | Task Agent should synthesize typed blueprints/graphs, validate/preflight them, and repair through explicit proposals rather than free-form runtime improvisation. |

The architectural conclusion is: **structured plan + governed execution + trace/eval feedback** is preferable to opaque autonomous runtime improvisation for business-grade tasks.

## §27C — End-state examples

### §27C.1 Simple direct action

```text
User: File this in the Marex folder.
Elnor: Done — I saved it under Marex > Drafts. I kept the original filename.
```

No saved task. No Task Agent. Possible DOC3 procedure/tool action.

### §27C.2 Ambiguous review

```text
User: Review this complaint.
Elnor: Here is a quick review...
Optional chip: [Run deeper Complaint Evaluation task]
```

Task chip appears only if DOC24 task opportunity and learned preferences justify it.

### §27C.3 Deep complaint review

```text
User: Do a deep review of this complaint against the source documents and give me a red-team report.
Elnor: This is a good fit for your Complaint Evaluation task. It will review the complaint and source docs, produce an issue matrix, run red-team, and save the final report.
I recommend using that saved task rather than doing this as an ordinary chat response.
[Use saved task] [Customize first] [Just answer here]
```

### §27C.4 Elnor-created recurring task

```text
User: Keep an eye on Marex filings and tell me when anything new appears.
Elnor: I’ll set this up as a saved task so it can run independently and report back. You’ll be able to inspect the task, logs, and alerts from the Tasks page.
[View task] [Edit steps] [Enable]
```

### §27C.5 Existing task improvement

```text
User: Look at my complaint tasks and suggest improvements.
Elnor: I’ll ask Task Agent to review the relevant task templates, recent runs, judge results, artifacts, and user edits, then give you a ranked improvement report.
```

### §27C.6 Prompt help quick

```text
User: Is this a good prompt?
Elnor: It is missing source scope, a required output format, and a verification rule. Here is a stronger version...
```

No task. DOC17 Prompt Advisor service or direct reasoning.

### §27C.7 Prompt testing deep

```text
User: Refine my red-team prompt for effectiveness.
Elnor: I can do a quick rewrite, or run a tracked prompt test with variants and judge scoring.
[Quick rewrite] [Run tracked prompt test]
```

If tracked, Task Agent creates a Prompt Evaluation Task using Experiment/Judge and candidate-generation methods.

---

## §27D — Coverage audit against post-R0.3 decisions

This R0.6.1 unified draft incorporates the following post-R0.3 architectural decisions:

1. Runtime/user-facing wording should use **saved task** / **tracked workflow** / **Task Agent**, not “use DOC23.”
2. `Workflow Designer` is not a separate agent name. Task Agent is the canonical identity; design/review/inspect/retrieve/assess are modes.
3. “Consult Task Agent for task fit” was too broad. DOC24/EC prefilter first; Task Agent is invoked only through typed entrypoints.
4. Saved tasks are direct-first and formalization-value driven; complexity alone is not enough.
5. `monitored` was ambiguous.
   OpenClaw heartbeat/session monitoring, OpenClaw background sub-agents, EC scheduler jobs, and saved-task monitoring are separate modes.
6. Elnor may create saved tasks to handle long-running/scheduled/multi-step work, but all such tasks must be visible, editable, inspectable, cancelable, and receipt-backed.
7. Task Agent is a system agent and invokable capability, registered through EC Agent Identity Registry and DOC24 Capability Registry, with DOC11/OpenClaw runtime truth.
8. Task Agent needs model, think-level, fallback, cost, context, sub-agent, and degradation settings.
9. TKP is Task Agent's design/reference substrate, not the ambient router, not the core contract, and not a user-facing DOC73 library.
10. Task invocation resembles procedural routing, but TaskInvocationDirective should not collapse DOC23 tasks into DOC3 skills.
11. Task suggestions, acceptances, rejections, overuse, underuse, and user edits are high-value BDSM/DOC8 learning signals.
12. Prompt help should be one user-facing capability, **Improve Prompt**, with quick and deep modes.
13. DOC17 Prompt Advisor should become a lightweight prompt-analysis/rewrite service, not a separate agent competing with Task Agent.
14. No separate primary Prompt Lab UI is required.
15. Deep prompt testing should be a tracked DOC23 prompt-evaluation task orchestrated by Task Agent.
16. Promptolution should not be a dedicated module or UI; it is a prompt-candidate generation method.
17. DSPy/GEPA is reserved until Addenda A R5 optimization substrate lands.
18. Candidate generation should initially be baked into Experiment or invoked by Task Agent; a generic Prompt Candidate Generator module is only reserved for later manual graph use.
19. Task Agent must inspect Judge/Experiment/Claim/optimization evidence when assessing tasks and prompts.
20. No hidden graph runs, no silent prompt mutation, no silent template/preset mutation, no full TKP injection into ordinary chat.
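
Decision 6 in the list above names five properties every Elnor-created saved task must have. A minimal sketch of that check follows, assuming a hypothetical `ElnorCreatedTask` record with one boolean per property and an optional receipt reference. None of these names are spec schema; they exist only to illustrate the rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ElnorCreatedTask:
    """Hypothetical record; field names are illustrative, not spec schema."""
    task_id: str
    visible: bool
    editable: bool
    inspectable: bool
    cancelable: bool
    receipt_id: Optional[str]  # assumption: a receipt reference backs the creation


def missing_guarantees(task: ElnorCreatedTask) -> List[str]:
    """Return the decision-6 properties that this task fails to satisfy."""
    checks = [
        ("visible", task.visible),
        ("editable", task.editable),
        ("inspectable", task.inspectable),
        ("cancelable", task.cancelable),
        ("receipt-backed", bool(task.receipt_id)),
    ]
    return [name for name, ok in checks if not ok]
```

An empty result means the task satisfies all five properties; any returned name identifies a guarantee the creation flow would still need to provide.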

---

## §27E — Merge audit cleanup record

R0.6.1 integrates the audit-complete clarifications from the former post-R0.3 review appendix into their operative sections. The following items are now placed in the body of the addendum rather than left as standalone merge notes:

1. `TaskInvocationDirectiveCandidate` creation from real task-mode usage and user acceptance/rejection/edit traces (§7B).
2. `ElnorCreatedTaskSemanticTrace` for every Elnor-created visible saved task (§5A).
3. `OpenClawTaskRuntimeTraceBridge` and the boundary between native OpenClaw progress and saved-task formalization (§4C).
4. Task Agent access to Monitor, Environment, telemetry, snapshots, step-output shelves, tool-result shelves, DOC72 task memory, and DOC8/BDSM learning signals (§4.5).
5. `PromptCandidateMethodMetadata`, `ExperimentPromptCandidateMode`, and `ExperimentPromptCandidateSourcePlan` (§17B).
6. Contextual prompt-help access without a separate primary Prompt Lab surface (§17E).

Audit verdict: coverage is complete against the R0.3 base and R0.5 merge pack at schema/section/idea level. The remaining work is architectural red-team review, not merge repair.

## §27F — Closing post-R0.3 merge principle

The final design should make saved tasks naturally available without contaminating ordinary Elnor interactions.

```text
Direct-first for ordinary work.
Task-aware when task value is clear.
Task Agent only through typed entrypoints.
TKP only in specialist modes.
Prompt help unified under Improve Prompt.
Deep testing becomes tracked tasks.
Learning tunes overuse, underuse, design quality, prompts, templates, and preferences.
```

The goal is not to use tasks more. The goal is to use saved tasks when they materially improve reliability, auditability, reuse, background execution, artifact management, evaluation, or user outcomes.
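
The direct-first principle can be read as a small routing rule. The sketch below is illustrative only: the `TaskValue` members mirror the seven benefits named in the paragraph above, while `RequestSignals`, `route`, and the returned labels are hypothetical names rather than spec contracts. It assumes an upstream prefilter has already produced the signals.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class TaskValue(Enum):
    """Benefits that can justify a saved task (taken from the paragraph above)."""
    RELIABILITY = "reliability"
    AUDITABILITY = "auditability"
    REUSE = "reuse"
    BACKGROUND_EXECUTION = "background_execution"
    ARTIFACT_MANAGEMENT = "artifact_management"
    EVALUATION = "evaluation"
    USER_OUTCOMES = "user_outcomes"


@dataclass
class RequestSignals:
    """Hypothetical prefilter output for one user request."""
    is_complex: bool = False
    task_values: Set[TaskValue] = field(default_factory=set)


def route(signals: RequestSignals) -> str:
    """Return 'direct' unless at least one formalization benefit is present."""
    # is_complex is deliberately not consulted: complexity alone is not
    # enough to justify a saved task.
    if not signals.task_values:
        return "direct"
    # Assumption: the result is a suggestion the user can decline,
    # never a hidden graph run.
    return "suggest_saved_task"
```

Under these assumptions, `route(RequestSignals(is_complex=True))` stays on the direct path, while a request flagged with `TaskValue.BACKGROUND_EXECUTION` yields a task suggestion.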

---

## §27G — R0.6.2 context-injection patch record

R0.6.3 revises §13A to remove the singular-current-work-context assumption and to distinguish launch-surface context, scheduled/background task context, email/webhook/file/event trigger context, and task-run scope envelopes. R0.6.2 had closed the initial task-module DOC24 context-injection scope gap identified after R0.6.1; R0.6.3 sharpens that rule so scheduled, triggered, and concurrent-chat runs never inherit unrelated active context by default. The patch makes explicit that DOC24 task-module packets are task-scoped and module-scoped, not ordinary active-chat-context dumps; preserves relevant global hard constraints, preferences, procedures, and prior task memory; requires packet receipts and exclusion receipts; clarifies DOC73/library and DOC25/document injection conditions; and adds read-model, storage, validation, and OP-A obligations for the seam.

---

## §27H — R0.6.4 UI, continuation, and sharing-forward-compatibility patch record

R0.6.4 integrates the post-R0.6.3 UI/mockup and task-surface design decisions:

1. DOC20 tab/browser integration for the modular task page.
2. Graph overview preservation and non-obtrusive telemetry rule.
3. Module detail Run Data / Open Output / Open Run Inspector links.
4. Graph ⇄ Inspector state-synchronized navigation.
5. Run Inspector consolidation into Header, Run Flow & Steps, Artifacts & Deliveries, and Context & Audit.
6. TaskDeliveryRecord and deliverables read-model for where/when outputs were saved, emailed, messaged, posted, or otherwise delivered.
7. Context Inspector with DOC24 injection/exclusion visibility and feedback into DOC8/BDSM.
8. Tasks page command center with active/scheduled/saved/history/segments/presets views and operational stats.
9. User-facing saved-task/reusable-saved-task terminology.
10. Task Segment as reusable module group replacing the placeholder “sub-task.”
11. Task Agent contextual side-panel and scoped Task Agent threads.
12.
   Module session continuation/follow-up modes and TaskModuleSessionRef.
13. Rerun/fork-from-module and ModuleActivationReplayRecord.
14. Judge/Experiment rescore, rerun, variant, and downstream-fork affordances.
15. Export audit bundle semantics.
16. Standalone/shared task surface forward-compatibility and DOC50 owner split.

## §27I — R0.7 patch record (family-topology reorganization + V3 FINAL coordination absorption)

R0.7 carries the following deltas from R0.6.4:

### §27I.1 Family-topology reorganization

R0.6.4 was a singular Addenda B document. R0.7 reorganizes Addenda B into a family of focused specifications:

- Outcome Evaluator + Revisor content extracted to V3.3.1 sub-addendum (already produced)
- Source workspace + Source Research module content extracted to Source Workspace V1.0.1 sub-addendum (to be produced)
- Task Forum + Run Board + Module Assistance Requests content extracted to Task Forum + Run Board V1.0.1 sub-addendum (to be produced)
- In-run feedback delivery content extracted to Feedback Delivery V1.0.1 sub-addendum (to be produced)
- Shared evaluation primitives moved to DOC23 Evaluation Common Contracts V1.1.1 sibling document (already produced)

R0.7 retains the Task Design domain core: Task Agent, Blueprint, Templates, Task Design Learning, Task Activity Memory, DOC24 binding, Telemetry, Task Assessment, Prompt improvement, baseline UI/routes/storage/validation.

### §27I.2 R0.6.5 base material absorption

R0.6.5 proposal sections absorbed into R0.7:

- §0 Executive Summary — merged into R0.7 front matter
- §1 Governing Principles (including Principle X "Autonomy") — absorbed into §2 as §2.12
- §2 What This Adds — absorbed into §2 as §2.14
- §3 Object Taxonomy — pointed to §3A
- §4 Complexity Profiles — absorbed into §2 as §2.13

Domain-specific R0.6.5 content (sections §5 onward) distributed to sub-addenda per §27I.1. R0.6.5 proposal retires when sub-addenda are produced.

### §27I.3 Addenda A ↔ Addenda B coordination V3 FINAL absorption

The locked V3 FINAL coordination architecture integrated:

- **§0A** R0.7 supersession statement and family topology — NEW
- **§0B** Shared evaluation infrastructure integration — NEW
- **§2.12** Autonomy principle (Principle X) with normative spec-anchor sentence — NEW
- **§9.0** V3 FINAL coordination signal envelope with five Core-owned signal payloads (OutcomeEvaluationSignal, RepairCycleSignal full Phase 1 form with taint_evolution, TaskProcessGapSignal, TaintClearanceSignal, HardCallResolutionSignal) — NEW
- **§24B** V3 FINAL coordination OP-A rows + six `[XDOC-INSERT: target_doc]` blocks for DOC23 R3.2, DOC72, DOC8/BDSM, EC Core, PropA, DOC20 + coding-agent instruction — NEW

### §27I.4 Resolved open questions from R0.6.4 §26

The following R0.6.4 open questions are now resolved:

- "Should the outcome evaluator be a separate addendum or part of the core?" → resolved by V3.3 family-topology decision
- "How do learning signals coordinate with Addenda A's Judge module?" → resolved by V3 FINAL coordination (signal envelope, Pattern C wiring)
- "How should the system support running on cheap models for learning purposes?" → resolved by V3.3 §6.16 learning_mode field
- "What's the boundary between PropA extraction and Addenda A claim extraction?" → resolved by V3 FINAL §2.12 (separate modules, shared anchoring infrastructure, shared DSPy lane)

### §27I.5 What R0.7 does NOT change

R0.6.4 content for §3 through §22 (modulo the §9.0 insertion and §2 additions) is preserved. The Task Agent, Task Blueprint, templates, telemetry spine, DOC24 binding, OpenClaw alignment, prompt improvement, UI surfaces, routes, storage paths, validation codes, and existing cross-doc obligations carry through unchanged. R0.7 is a surgical absorption of the V3 FINAL coordination outputs plus a topology reorganization — it does not rewrite the Task Design substrate that R0.6.4 established.

### §27I.6 Migration

- R0.6.4 is superseded and should be treated as provenance/source-only now that Source Workspace V1.0.1, Task Forum + Run Board V1.0.1, and Feedback Delivery V1.0.1 have landed
- Cross-references from other addenda to R0.6.4 sections that moved should update during their next revisions
- The OP-A ledger picks up R0.7's new obligation rows (§24B); existing R0.6.4 obligation rows (§24, §24A) remain active

## §28 — Closing note

This addendum's core claim is that DOC23 can become a business-grade and personal-grade agentic process system without sacrificing the core strength of explicit modular graphs.

The architecture is:

```text
Visible process graph for execution.
Task Agent for intelligence and interface.
Task Blueprint for intent.
Telemetry for truth.
DOC72 for task activity memory.
DOC25/DOC73 for document and library outputs.
DOC24 for capabilities/context/tools/models.
DOC8/BDSM for learning and utility.
EC for durable writes and control.
```

The value comes from compounding: every task created, run, inspected, assessed, edited, and reused makes the Task Agent better at turning future intent into reliable task processes.