DOC23_ADDENDA_B_CORE_R0_7_1.md
Current Specs/DOC23/DOC23 Addenda B/DOC23_ADDENDA_B_CORE_R0_7_1.md
ELNOR REPO READER TEXT MIRROR
Original path: Current Specs/DOC23/DOC23 Addenda B/DOC23_ADDENDA_B_CORE_R0_7_1.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z
---
# DOC23 Addenda B Core — Task Intelligence, Memory, Observability, Natural Task Invocation, and Self-Improving Task Design
**Revision:** R0.7.1 Core — clean replacement for R0.7; reference/topology cleanup only; no substantive architecture changes
**Date:** 2026-05-17
**Status:** Build-ready Core specification. R0.7.1 is the family-core for Addenda B's Task Design domain. Outcome Evaluator/Revisor lives in the V3.3.1 sub-addendum. Source Workspace, Task Forum + Run Board, and Feedback Delivery live in their own sub-addenda. Shared evaluation primitives live in DOC23 Evaluation Common Contracts (sibling to the parent doc).
**Target parent:** DOC23 — ELNOR Task System: Modular Automation Architecture R3.1 (R3.2 absorbs shared primitives later)
**Companion docs:** DOC23 Addenda B / Outcome Evaluator+Revisor V3.3.1; DOC23 Evaluation Common Contracts V1.1.1; DOC23 Addenda A R4.1 V3 (with V4.1 Coordination Patch / V5 Mini-Card from coordination V3 FINAL); DOC23 Addenda B / Source Workspace V1.0.1; DOC23 Addenda B / Task Forum + Run Board V1.0.1; DOC23 Addenda B / Feedback Delivery V1.0.1; DOC23 R3.1; DOC24 R3; DOC72 R5.73; DOC25 V2.0; DOC73 V1.5.1; DOC8 v1.11.4; DOC24 Addendum A BDSM v6.4; DOC3 R11.3 Addenda A R2.2; DOC17 R4.2; DOC11 R15 Amendment Proposal R2; OpenClaw Release Alignment Running Spec Update List R2; EC Core Addendum A V3.3; PropA R6.3+; OP-A V3.7+.
**Purpose:** Define the Task Agent, natural saved-task invocation, Task Blueprint, template/preset intelligence, Task Agent education/TKP, task-design self-learning, DOC72 task-memory integration, telemetry spine, artifact/output intake, DOC24 capability/runtime routing, OpenClaw/DOC11 runtime alignment, unified prompt improvement, prompt-evaluation tasks, task UI/telemetry surfaces, module continuation/rerun mechanics, shared-task forward-compatibility, and all cross-doc seams required to make DOC23 a reliable, inspectable, self-improving saved-task system. R0.7.1 also documents the family topology (Core + V3.3.1 + three sub-addenda + Common Contracts) and integrates the Addenda A ↔ Addenda B coordination V3 FINAL converged architecture.
**R0.7.1 changes from R0.7 (reference/topology cleanup only):**
R0.7.1 is a full replacement copy of R0.7. It makes no substantive architecture, schema, route, storage, or runtime changes. It only clarifies the current Addenda B family topology after sibling documents landed: Outcome Evaluator/Revisor V3.3.1, Common Contracts V1.1.1, Source Workspace V1.0.1, Task Forum + Run Board V1.0.1, and Feedback Delivery V1.0.1. It also marks R0.6.4 as superseded/provenance-only.
**R0.7 changes from R0.6.4 (substantive change log):**
R0.7 supersedes R0.6.4 with four classes of changes:
**Class 1: Family-topology reorganization (per Will's earlier organizational decision).** R0.6.4 was the singular Addenda B document carrying all task-design concerns. R0.7 retains the Task Design domain core but **does NOT carry Outcome Evaluator/Revisor, Source Workspace, Task Forum + Run Board, or Feedback Delivery content** — those live in their own sub-addenda. Section additions in this Core reflect the family-topology context: §0A (R0.7 supersession statement and topology), updated §0.4 (scope summary excludes sub-addenda content), updated §1 (executive summary scoped to Core).
**Class 2: R0.6.5 base material absorption.** R0.6.5 proposal §§1-4 (governing principles, complexity profiles, what this adds, object taxonomy) absorbed into this Core's existing §2 governing principles and §3A terminology/object taxonomy. R0.6.5 §§5-7 (OutcomeSpec, Outcome Evaluator, States) superseded by V3.3.1 — referenced, not duplicated. R0.6.5 §§8-15, §§16-18, §§19-22 distributed to Feedback Delivery V1.0.1, Source Workspace V1.0.1, and Task Forum + Run Board V1.0.1 respectively. R0.6.5 retires when sub-addenda land.
**Class 3: Addenda A ↔ Addenda B coordination V3 FINAL integration.** The locked V3 FINAL coordination architecture is integrated:
- §9 Task Design Learning extended with the eight Phase 1 learning signals wrapped in `EvaluationLearningSignalEnvelope` (per coordination V3 §2.7, §2.11)
- §13/§13A DOC24 capability binding extended with model_class context for cheap-LLM learning mode
- §17 series Prompt evaluation updated to consume the shared `EvaluationResultEnvelope`, reference DSPy targets coordinated with PropA, and integrate Pattern C ad-hoc Judge attachment (per coordination V3 §2.9, §2.10)
- §24 cross-doc obligations extended with the V3 FINAL coordination OP-A rows (14 new rows; full text in §24B)
- New §0B "Shared evaluation infrastructure" cross-references DOC23 Evaluation Common Contracts schemas and V3.3.1 module surfaces
**Class 4: Resolved open questions.** Several R0.6.4 §26 open questions are resolved by the V3 FINAL coordination outputs and noted in §26.
**Migration note:** R0.6.4 is superseded and should be treated as provenance/source-only. Together, R0.7 Core, Outcome Evaluator+Revisor V3.3.1, Common Contracts V1.1.1, Source Workspace V1.0.1, Task Forum + Run Board V1.0.1, and Feedback Delivery V1.0.1 cover the operative Addenda B family. Cross-references from other addenda to R0.6.4 sections that moved should update during their next revisions.
**Source merge:** R0.7 uses R0.6.4 (this filename's predecessor) as the base. R0.6.5 proposal base material (§§1-4) merged into §2 and §3A. Domain-specific R0.6.5 content distributed to sub-addenda.
---
## §0A — R0.7 supersession statement and family topology
This Core supersedes R0.6.4 and reorganizes Addenda B into a family of focused specifications. R0.7.1 is the entry point.
### §0A.1 Family topology
```
DOC23 (parent)
├── DOC23 R3.1 — Task System Modular Architecture (operative parent)
├── DOC23 R3.2 — (later) absorbs Evaluation Common Contracts
│
├── DOC23 Addenda A R4.1 V3+ — Judge, Experiment, Claim Extractor
│
├── DOC23 Evaluation Common Contracts V1.1.1 — sibling doc; shared primitives
│ (EvaluationResultEnvelope, slice schemas, EvaluationLearningSignalEnvelope,
│ Criterion, ArtifactScopeRef, TextAnchor, StructuredAnchor,
│ VariantEvaluationLineage, CriterionLineage)
│ Retires when DOC23 R3.2 absorbs
│
└── DOC23 Addenda B family
├── DOC23 Addenda B Core R0.7.1 (THIS DOCUMENT)
│ — Task Agent, Blueprint, Templates, Task Design Learning,
│ Task Activity Memory, DOC24 binding, Telemetry, Task Assessment,
│ Prompt improvement, baseline UI/routes/storage/validation
│
├── DOC23 Addenda B / Outcome Evaluator+Revisor V3.3.1 — DONE
│ — Outcome Compiler, Evaluator, Revisor, Dispatcher,
│ Candidate versions, Taint, Patterns, Quality, Governance,
│ claims_in port, learning_mode, model_class
│
├── DOC23 Addenda B / Source Workspace V1.0.1 — ACTIVE
│ — Workspace object, Source Research module, ingestion, UI
│
├── DOC23 Addenda B / Task Forum + Run Board V1.0.1 — ACTIVE
│ — Forum object, Board posts, Module Assistance Requests, UI
│
└── DOC23 Addenda B / Feedback Delivery V1.0.1 — ACTIVE
— Evaluation Feedback Bundle, Run Guidance, Repair Instructions,
Feedback Routing Policy, four delivery channels, Consumption Receipts
```
### §0A.2 Why this organization
R0.6.4 was the singular Addenda B carrying everything. As the Addenda B subsystems matured (especially Outcome Evaluator/Revisor → V3.1 → V3.3), single-document architecture became impractical: 9,570+ lines is unwieldy for fresh-window red-team review, for cross-LLM evaluation, and for focused implementation.
The family topology splits the subsystems into focused documents:
- **Each sub-addendum is independently red-teamable** in a fresh chat window
- **Each has its own version trajectory** — V3.3.1 can iterate without forcing Core R0.7.1 rev
- **Shared primitives live at the family-shared level** (Common Contracts) — no buried-in-one-addendum cross-references
- **The Core stays canonical for cross-cutting concerns** that don't belong in any sub-addendum
### §0A.3 What Core R0.7.1 owns
This Core R0.7.1 owns the **Task Design domain** — everything about designing, instantiating, learning from, and improving tasks. Specifically:
- Task Agent (system agent for natural task invocation)
- Task Blueprint (the persisted task design primitive)
- Templates and presets (reuse-first task creation)
- Task Design Intelligence substrate (the data the Task Agent reasons over)
- Task Design Learning (how the system learns better task designs over time)
- DOC72 Task Activity Memory integration
- DOC24 capability binding and runtime routing
- OpenClaw / DOC11 alignment
- Telemetry and observability
- Task Assessment (pre-execution)
- Artifact and output intake policy
- Prompt improvement and prompt evaluation
- Baseline UI surfaces (with task-detail UI living here; per-subsystem UI moves to sub-addenda)
- Routes, commands, read models, SSE
- Storage paths
- Validation codes
- Cross-doc obligations (Core-owned rows here; sub-addendum-owned rows there)
- Privacy, retention, degradation modes
- Open questions for architect review
### §0A.4 What Core R0.7.1 does NOT own
- Outcome evaluation and revision — V3.3
- Source workspace + source research module — Source Workspace V1.0.1
- Task Forum, Run Board, Module Assistance Requests — Task Forum + Run Board V1.0.1
- In-run feedback delivery (Evaluation Feedback Bundle, Run Guidance, Repair Instructions, Routing Policy, Delivery Channels, Consumption Receipts) — Feedback Delivery V1.0.1
- Shared evaluation envelope, slices, signal envelope, anchoring primitives, Criterion — DOC23 Evaluation Common Contracts V1.1.1
References to these surfaces in this Core go through the named owning document. Implementations consult the owning document for the canonical schemas.
### §0A.5 Reading order recommendation
For new readers of Addenda B:
1. This Core R0.7.1 (§0–§28) — for the task design system substrate
2. DOC23 Evaluation Common Contracts V1.1.1 — for the shared evaluation primitives the Core references
3. V3.3.1 Outcome Evaluator+Revisor — for the deep outcome evaluation subsystem
4. Sub-addenda (Source Workspace, Forum, Feedback Delivery) — for their respective subsystems when ready
For implementers, every document listed above is required reading to build the full system. For red-team review of a specific subsystem, that subsystem's document is largely self-contained.
---
## §0B — Shared evaluation infrastructure integration
Cross-references to the shared evaluation primitives used throughout this Core.
### §0B.1 EvaluationResultEnvelope (Common Contracts §3)
Every evaluation result emitted by an Evaluator (V3.3), a Judge (Addenda A), an Agent Review Gate, a Deterministic Scorer, or a Human Review record is wrapped in `EvaluationResultEnvelope` (inside Addenda A's `EvaluationArtifactEnvelope`). This Core's prompt evaluation surfaces (§17 series) consume the envelope; the Task Agent reads the envelope to inform task-design suggestions.
### §0B.2 EvaluationLearningSignalEnvelope (Common Contracts §5)
Every learning signal emitted anywhere in the system is wrapped in `EvaluationLearningSignalEnvelope`. This Core owns the emitters for five of the eight Phase 1 signal types (per coordination V3 §2.11 and §9 below):
- OutcomeEvaluationSignal — emitted by V3.3 Evaluator; envelope structure here
- RepairCycleSignal — emitted by V3.3.1 Revisor
- TaskProcessGapSignal (runtime) — emitted by Task Agent or V3.3.1 Revisor
- TaintClearanceSignal — emitted by V3.3.1 Revisor or by user-action surfaces
- HardCallResolutionSignal — emitted by V3.3.1 Revisor when Hard Calls resolve
DOC8/BDSM consumes the unified signal stream. Addenda A owns PromptComparisonSignal. R0.6.4 §24A.7 UserActionSignal continues to exist (now wrapped in the common envelope). DOC8/BDSM emits TaskDesignCorrelationSignal (aggregate; cross-doc obligation per OBL-XDOC-BDSM-CONSUME-SIGNALS-01).
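The signal-to-emitter relationships listed above can be sketched as a lookup table with a guard at wrap time. This is a minimal illustration only: the signal names come from this section, but the emitter label strings, the `SketchSignalEnvelope` fields, and `wrapSignal` are hypothetical. The canonical envelope schema is the one in Common Contracts §5.

```ts
// Signal names are taken from §0B.2; everything else here is an assumption.
type Phase1SignalType =
  | "OutcomeEvaluationSignal"
  | "RepairCycleSignal"
  | "TaskProcessGapSignal"
  | "TaintClearanceSignal"
  | "HardCallResolutionSignal"
  | "PromptComparisonSignal"
  | "UserActionSignal"
  | "TaskDesignCorrelationSignal";

// Hypothetical emitter labels, transcribed from the prose above.
const EMITTERS: Record<Phase1SignalType, readonly string[]> = {
  OutcomeEvaluationSignal: ["V3.3 Evaluator"],
  RepairCycleSignal: ["V3.3.1 Revisor"],
  TaskProcessGapSignal: ["Task Agent", "V3.3.1 Revisor"],
  TaintClearanceSignal: ["V3.3.1 Revisor", "user-action surface"],
  HardCallResolutionSignal: ["V3.3.1 Revisor"],
  PromptComparisonSignal: ["Addenda A"], // stated as owner; emitter detail lives in Addenda A
  UserActionSignal: [], // emitter is not stated in this section
  TaskDesignCorrelationSignal: ["DOC8/BDSM"],
};

// Hypothetical envelope shape; NOT the Common Contracts schema.
interface SketchSignalEnvelope<P> {
  signal_type: Phase1SignalType;
  emitted_by: string;
  payload: P;
}

function wrapSignal<P>(
  signal_type: Phase1SignalType,
  emitted_by: string,
  payload: P
): SketchSignalEnvelope<P> {
  const allowed = EMITTERS[signal_type];
  // An empty list means "not specified here", so the sketch does not enforce it.
  if (allowed.length > 0 && allowed.indexOf(emitted_by) === -1) {
    throw new Error(emitted_by + " is not a listed emitter for " + signal_type);
  }
  return { signal_type, emitted_by, payload };
}
```

Keeping the table as data rather than branching logic means a consumer such as DOC8/BDSM could read the same table to attribute signals, under the assumption that emitter labels are stable strings.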
### §0B.3 Criterion (Common Contracts §6)
`Criterion` is the public sub-contract on `EvaluationOutcomeDefinition.criteria[]`. The Task Agent reads criteria when assessing whether a saved task's outcome definitions are well-formed; the Outcome Compiler (V3.3) reads them when compiling evaluation plans; Judge (Addenda A) reads them for `outcome_compliance_scoring`. This Core does not redefine the schema.
### §0B.4 Anchoring primitives (Common Contracts §7)
`ArtifactScopeRef`, `TextAnchor`, `StructuredAnchor` are shared between Addenda A's Claim Extractor (`step.claim_extractor`) and this Core's evaluation surfaces. When the Task Agent surfaces "your saved task X failed criterion Y in section Z" suggestions, the section reference uses `ArtifactScopeRef`.
### §0B.5 Cheap-LLM learning mode integration
Per coordination V3 §2.10 and V3.3 §6.16, `RevisorConfig.learning_mode` enables cheap-LLM signal generation and cross-model calibration. This Core's Task Agent surfaces (§4) consume learning_mode context: pattern surfacing respects the mode the user is currently running in, and the Task Agent's suggestions distinguish patterns learned at the current `model_class` from patterns requiring validation.
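One way to picture the `model_class` distinction is a simple partition of surfaced patterns. The sketch below is an assumption-heavy illustration: `LearnedPattern`, its `learned_at_model_class` field, and `partitionByModelClass` are hypothetical names, and the real pattern records are defined by V3.3.1, not here.

```ts
// Hypothetical pattern record; only `model_class` as a concept comes from the spec.
interface LearnedPattern {
  pattern_id: string;
  learned_at_model_class: string;
}

interface PatternSurfacing {
  learned_at_current_class: LearnedPattern[];
  requires_validation: LearnedPattern[];
}

// Splits patterns into those learned at the user's current model_class and
// those learned elsewhere, which the Task Agent would flag as needing validation.
function partitionByModelClass(
  patterns: LearnedPattern[],
  currentModelClass: string
): PatternSurfacing {
  const result: PatternSurfacing = {
    learned_at_current_class: [],
    requires_validation: [],
  };
  for (const p of patterns) {
    if (p.learned_at_model_class === currentModelClass) {
      result.learned_at_current_class.push(p);
    } else {
      result.requires_validation.push(p);
    }
  }
  return result;
}
```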
### §0B.6 Pattern C ad-hoc Judge attachment
Per coordination V3 §2.9, Pattern C allows Judge to attach downstream of any Evaluator output to generate per-criterion numeric scores without an Experiment. The Task Agent surfaces this as an optional "attach numeric scoring" action on Evaluator result cards; the user can opt in per task. This Core's §17 series prompt evaluation also leverages Pattern C when measuring prompt-version comparisons against outcomes without standing up full Experiments.
---
## §0 — How to read this addendum
### §0.1 Normative status
This addendum is an end-state architectural draft. It is **not** an implementation phase plan. Sections that describe dependency ordering are non-normative readiness notes only. Operative behavior is defined by the contracts, schemas, invariants, and owner splits below.
### §0.2 Relationship to DOC23 R3.1
DOC23 R3.1 remains the operative parent specification for the task graph, modules, ports, config schemas, execution engine, lifecycle, API routes, SSE events, storage, Run Inspector, templates, and module presets. This addendum does not replace those mechanics. It extends DOC23 with intelligence, memory, observability, and design-learning contracts.
### §0.3 Relationship to DOC23 Addenda A
DOC23 Addenda A R4.1 V2 owns experiments, judges, claim extraction, sub-agent awareness inside modules, prompt optimization reservations, DOC24 context injection for task modules, and session continuity. This addendum consumes those surfaces when it defines telemetry, Task Assessment, evaluation datasets, prompt/module-preset promotion safety, and task-design learning. It does not redefine Judge, Experiment, Claim Extractor, or sub-agent runtime semantics.
### §0.4 Scope summary
Addenda B defines:
1. Task Agent as the conversational task-design, task-inspection, task-retrieval, and task-assessment surface.
2. Task Agent education through a hybrid, self-updating Task Design Intelligence substrate.
3. Task Blueprint as the semantic intent contract behind a graph.
4. Reuse-first task creation through task templates and module presets.
5. DOC72 Task Activity Memory for task runs, outputs, artifacts, goals, entities, matters, and follow-on sessions.
6. Task telemetry and trace spans.
7. Artifact/output intake policy for final and intermediate outputs.
8. DOC24 capability binding and runtime expansion for tools, connectors, MCP servers, procedures, skills, and models.
9. Task creation learning, Task Assessment, and self-improvement loops.
10. UI, storage, routes, validation, and cross-doc obligations required to make the above real.
### §0.5 R0.3 audit-pass additions
The R0.3 audit pass confirms the R0.1/R0.2 architecture but strengthens several areas that were under-specified relative to the design discussion:
1. Adds a terminology/object taxonomy so Task, Task Run, Blueprint, Template, Preset, Template Group/Workflow Package, Design Pattern, Design Case, Artifact, Work Product, and Task Assessment do not blur.
2. Expands the Template Group / Workflow Package concept and clarifies that most single-template bundles should remain Task Templates, while groups/packages exist only when multiple templates or shared cross-template assets must be managed together.
3. Strengthens Task Agent education with a non-monolithic `skill.md` rule, required decision-card families, and explicit examples for context routing, named outputs, Carry Forward, Loop Controller, session continuation, holds, signals, email triggers, model choice, and connector/capability use.
4. Adds stronger prior-task Casebook rules: user-created, user-liked, and user-refined tasks can teach the knowledge pack by extracting not just what was built, but why a module/config was used, what goal it served, and whether it worked.
5. Adds minimum DOC72 work-context/entity-link capture requirements so task memory links to matters, entities, work context, artifacts, outputs, follow-up rooms/chats, and goals.
6. Adds explicit TaskDesignIntakeBundle and DOC24/DOC3-style task-creation intake rules so intent, rationale, preferences, goals, and template directives are not lost.
7. Adds TaskAgentEducationProfile, TaskDesignHintCard, Spec Librarian behavior, and TaskAgentInstructionCandidate so Task Agent expertise is compiled, retrieved, validated, and improved rather than stored in one monolithic prompt.
8. Adds goal-based design-learning records and non-normative external benchmark rationale.
9. Adds end-to-end examples showing task creation, progress inspection, output retrieval, and post-run learning.
---
### §0.6 R0.6.1 unified merge-pass additions
R0.6.1 merges the R0.3 Audit-Complete Addenda B baseline with the R0.5 Audit-Complete Merge Pack source. R0.6.1 is intended to be the single working Addenda B draft for subsequent red-team and seam review. The R0.5 merge source is no longer intended to operate as a companion addendum.
The R0.6.1 merge adds or reconciles the following topic families:
1. Natural task invocation, direct-first handling, and the rule that saved tasks are used only when task formalization materially improves the user outcome.
2. DOC24 Task Mode Resolver, TaskModeDecision, TaskOpportunityPacket, token-budget limits, and no full task-system prompt stuffing.
3. Task Agent typed entrypoints, runtime identity, system-agent registration, model/think/fallback settings, OpenClaw named-agent compatibility, and visible degraded modes.
4. Native OpenClaw monitoring versus saved-task monitoring, and the workflow formalization continuum from direct chat to standing procedure / recurring saved task.
5. Visible Elnor-created saved tasks with receipts, graph links, edit/cancel controls, DOC72 memory linkage, and no hidden task graphs.
6. DOC72 / DOC3 / DOC23 procedural boundaries, including TaskInvocationDirective and the distinction between memory directives, standing procedures, DOC3 semantic skills, and DOC23 tasks.
7. Task invocation learning, Task Agent proposal-edit learning, TaskSystemLearningReport, and BDSM/DOC8 signals for accepted/rejected task suggestions, overuse, underuse, and prompt edits.
8. TKP boundary correction: TKP is the Task Agent's detailed task-design/reference substrate, not the ambient task router, not the Task Agent core contract, and not a user-facing DOC73 library.
9. Unified prompt improvement: one user-facing Improve Prompt capability, DOC17 Prompt Advisor as a quick service, Task Agent as task-aware prompt orchestrator, and deep prompt testing as tracked DOC23 prompt-evaluation tasks.
10. Prompt candidate generation, Experiment/Judge integration, Promptolution as a candidate-generation method rather than a module, and DSPy/GEPA reserved until the Addenda A R5 optimization substrate is operative.
11. Additional routes, read models, SSE events, UI surfaces, failure modes, and cross-doc obligations required to make the above real.
### §0.7 R0.6.4 UI / module-continuation / sharing-forward-compatibility additions
R0.6.4 adds the UI and runtime refinements surfaced during the DOC20/mockup review pass. These additions are normative where they define schemas, owner splits, and UI contracts; visual layout examples remain illustrative.
R0.6.4 adds or clarifies:
1. DOC20 integration for the modular task page: tasks open as workspace tabs, the Browser column remains available, the right chat column defaults closed, and task/list/load/save/duplicate behavior is unified with Browser and task-page controls.
2. A Graph Overview Preservation Rule: the broad modular graph/module-block view remains clean and lightly badged; heavy telemetry belongs in drawers/inspector/detail surfaces.
3. Graph ⇄ Run Inspector navigation with synchronized task/run/module/artifact state.
4. Module detail/config panel quick access to last/current run data, module outputs, and the full Run Inspector.
5. A consolidated Run Inspector with Header, Run Flow & Steps, Artifacts & Deliveries, and Context & Audit, replacing any tab-heavy telemetry design.
6. Delivery records for task outputs, including where and when outputs were emailed, messaged, saved, posted, added to a library, written to memory, or delivered elsewhere.
7. A DOC24 context inspector showing injected and excluded memories, entities, tools, connectors, procedures, documents, libraries, prompt layers, and context packets, with thumbs/comment feedback routed to DOC8/BDSM.
8. A Tasks page command center for active, scheduled, saved/reusable, historical, and preset/task-segment assets, including run count, average duration, average cost, success rate, and links into graph/inspector views.
9. User-facing saved-task terminology: “Saved Task,” “Reusable Saved Task,” “Scheduled Saved Task,” and “Triggered Saved Task” are primary UI labels; internal `TaskTemplateRecord` may remain an implementation detail.
10. Task Segment as the reusable-module-group concept formerly described as “sub-task.”
11. Task Agent as a contextual side-panel chat with task/run/module/artifact scope, full chat treatment, attachments, feedback controls, and scoped conversation threads.
12. Module session continuation and follow-up distinctions: ask Task Agent about a module, start a new follow-up from module context, or continue the original OpenClaw session only when runtime truth says it is available.
13. Rerun/fork-from-module mechanics using immutable parent runs, module activation replay records, activation sequence IDs, upstream output freezing, graph/context/source-document policies, and side-effect replay controls.
14. Judge and Experiment module rerun/rescore/variant/fork affordances, including rescoring the same output, rerunning variants, adding variants, and forking downstream from a winning variant.
15. Export audit as a redaction-aware audit bundle, not a raw log dump.
16. Standalone/shared task surface forward-compatibility: task canvas, inspector, artifacts/deliveries, and module detail components must be renderable without the full DOC20 shell; full collaborative sharing is deferred to DOC50+ but DOC23 defines task-specific hooks now.
## §1 — Executive summary
DOC23 is already the correct execution substrate for reliable business and personal automation: visible graph, explicit modules, typed ports, typed configs, explicit cables, Run Inspector, templates, module presets, durable task records, and EC sole-writer execution. The missing layer is not a central autonomous orchestrator. The missing layer is **task intelligence**: the ability for Elnor to understand, design, inspect, remember, assess, and improve tasks without turning them into opaque black-box agent improvisations.
This addendum introduces the **Task Agent** as a system agent/capability that functions as:
- a natural-language task designer;
- a reuse-first template/preset matcher;
- a progress/status window into running tasks;
- a retrieval surface for outputs and intermediate artifacts;
- a Task Assessment generator;
- a DOC72-aware task-memory navigator;
- a task-design learning participant;
- and a user-facing explanation layer over graphs, runs, artifacts, telemetry, and improvements.
Task Agent is **not** a DOC23 module and is **not** a runtime orchestrator. EC and DOC23 execute the graph. DOC24 resolves capabilities and context. DOC72 stores task activity memory and semantic relationships. DOC25 processes produced/downloaded/received documents. DOC8/BDSM learn contextual utility. Task Agent reads these substrates and proposes changes through EC commands.
The main architecture is:
```text
User intent / task question
-> Task Agent
-> Task Design Intelligence substrate
-> DOC72 task/entity/template/preset/memory lookup
-> DOC24 capability/context/tool/model lookup
-> DOC23 graph/template/preset lookup and validation
-> Blueprint + graph proposal + capability plan + artifact policy + preflight
-> EC command if user approves
-> DOC23 execution
-> telemetry + artifacts + DOC72 Task Activity Memory + DOC8/BDSM signals
-> Task Assessment + candidate improvements
```
The product experience target is:
```text
Elnor knows the task system.
Elnor knows what tasks/templates/presets already exist.
Elnor knows what tools/connectors/models are available now.
Elnor knows what prior tasks worked and why.
Elnor knows the user's goals and preferences.
Elnor can set up a process better than the user initially described.
Elnor can tell the user what a running task is doing.
Elnor can retrieve outputs and explain what happened.
Elnor can propose improvements without silently mutating the graph.
```
---
## §2 — Governing principles and invariants
### §2.1 DOC23 remains the visible process substrate
Every executable task remains a DOC23 graph. Modules, ports, cables, configs, triggers, gates, outputs, loops, holds, delays, switches, junctions, and other task mechanics remain visible and auditable.
Task Agent may design, inspect, explain, assess, and propose changes. It must not run hidden workflow steps outside the saved graph.
### §2.2 No central runtime orchestrator
This addendum rejects a central runtime orchestrator that makes ad-hoc workflow decisions, rewires execution, or spawns sub-agents outside module ownership. Runtime autonomy is bounded by:
- graph topology;
- module configs;
- port/cable semantics;
- task security policy;
- durable checkpoints;
- human gates;
- capability bindings;
- side-effect policy;
- DOC24 packet/capability resolution;
- EC execution state.
### §2.3 Task Agent is an intelligence interface, not an executor
Task Agent is a system agent/capability. It is not a task module. It may invoke DOC23/DOC24/DOC72/DOC25/DOC8 read tools, validation tools, preflight tools, and proposal commands. It may not perform direct durable writes except through EC commands.
### §2.4 EC sole durable writer
All durable writes created by this addendum flow through EC. Q remains read/control surface. Task Agent emits proposals, write intents, commands, or assessment requests; EC validates and writes.
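The sole-writer rule can be sketched as a separation between a function that only builds intents and a class that alone holds durable state. This is a minimal sketch under stated assumptions: `TaskAgentIntent`, `EcSketch`, `taskAgentPropose`, and the single validation rule are hypothetical, and real EC validation is defined by EC Core, not by this example.

```ts
// The four intent kinds mirror the prose in §2.4; field names are assumptions.
type IntentKind = "proposal" | "write_intent" | "command" | "assessment_request";

interface TaskAgentIntent {
  kind: IntentKind;
  target_ref: string;
  body: unknown;
}

interface EcResult {
  accepted: boolean;
  reason?: string;
}

// Stand-in for EC: the only object in this sketch that can mutate durable state.
class EcSketch {
  private readonly durable: Record<string, unknown> = {};

  submit(intent: TaskAgentIntent): EcResult {
    // Placeholder validation; the real rules live in EC.
    if (intent.target_ref.trim() === "") {
      return { accepted: false, reason: "missing target_ref" };
    }
    this.durable[intent.target_ref] = intent.body;
    return { accepted: true };
  }

  has(ref: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.durable, ref);
  }
}

// Task Agent side: builds an intent and never touches the store directly.
function taskAgentPropose(target_ref: string, body: unknown): TaskAgentIntent {
  return { kind: "proposal", target_ref, body };
}
```

Because the store is private to `EcSketch`, the only path to a durable write in this sketch is `submit`, which mirrors the "Task Agent emits, EC validates and writes" split.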
### §2.5 Operational run truth and DOC72 memory are distinct
DOC23/EC run storage is the source of truth for exact execution details. DOC72 stores structured task activity memory and relationship links. DOC8/BDSM stores derived learning/utility signals. These stores must not be collapsed.
### §2.6 Raw content is not DOC72 memory truth
DOC72 must not become a raw log or document warehouse. Full raw prompts, outputs, documents, and tool results remain in run-scoped storage, DOC25 document stores, DOC73 libraries/corpora, or other owner-doc stores. DOC72 stores references, summaries, relationships, goals, work-product links, and selected structured memory.
### §2.7 Artifact intake is policy-governed
Final outputs and intermediate module outputs are not all equal. Artifact intake must be configurable globally, per task, per module, per port, and per output class. Most intermediate outputs default to run-scoped storage only. High-value outputs can be made findable, summarized, routed through DOC25, or added to a DOC73 library/corpus.
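Layered intake configuration like this is often resolved by letting the most specific layer win. The sketch below assumes exactly that precedence (output class over port over module over task over global), which this section does not state. The `IntakeMode` values are hypothetical labels for the options named above, and `run_scoped_only` is used as the fallback because the text says most intermediate outputs default to run-scoped storage.

```ts
// Hypothetical labels for the intake options described in §2.7.
type IntakeMode =
  | "run_scoped_only"
  | "findable"
  | "summarized"
  | "doc25_routed"
  | "doc73_library";

// One optional setting per configuration scope named in the text.
interface IntakePolicyLayers {
  global?: IntakeMode;
  task?: IntakeMode;
  module?: IntakeMode;
  port?: IntakeMode;
  output_class?: IntakeMode;
}

// ASSUMPTION: the most specific configured layer wins.
function resolveIntake(layers: IntakePolicyLayers): IntakeMode {
  return (
    layers.output_class ??
    layers.port ??
    layers.module ??
    layers.task ??
    layers.global ??
    "run_scoped_only"
  );
}
```

For example, a task that sets `findable` globally but marks one port as `summarized` would resolve that port's outputs to `summarized` and everything else to `findable`.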
### §2.8 Generated truth plus authored and learned design wisdom
Task Agent education cannot be a single prompt or purely generated from code. It requires:
- generated canonical mechanics from DOC23/spec/code;
- authored task-design doctrine;
- module/config rationale cards;
- worked examples;
- task-design cases from prior tasks;
- DOC72 goal links;
- template/preset semantic cards;
- user preference overlays;
- DOC8/BDSM learning bundles;
- typed tools and validators.
### §2.9 Reuse before creation
Before creating a new task graph, Task Agent must search existing task templates, module presets, task design patterns, and semantically similar prior tasks. It should recommend reuse, customization, or fork before creating a new graph.
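The reuse-first rule can be illustrated as a decision over the best-matching existing asset. This is a sketch only: the `similarity` score, the three threshold constants, and the `recommend` function are hypothetical, and this section does not define how matches are scored or where the cutoffs sit.

```ts
// The four searchable asset kinds come from §2.9; the rest is assumed.
type CandidateKind = "task_template" | "module_preset" | "design_pattern" | "prior_task";

interface ReuseCandidate {
  id: string;
  kind: CandidateKind;
  similarity: number; // hypothetical score in [0, 1]
}

type ReuseAction = "reuse" | "customize" | "fork" | "create_new";

interface Recommendation {
  action: ReuseAction;
  candidate_id?: string;
}

// Illustrative thresholds; not specified by the addendum.
const REUSE_MIN = 0.9;
const CUSTOMIZE_MIN = 0.7;
const FORK_MIN = 0.5;

function recommend(candidates: ReuseCandidate[]): Recommendation {
  let best: ReuseCandidate | undefined;
  for (const c of candidates) {
    if (best === undefined || c.similarity > best.similarity) best = c;
  }
  // A new graph is proposed only when nothing existing is close enough.
  if (best === undefined || best.similarity < FORK_MIN) {
    return { action: "create_new" };
  }
  const action: ReuseAction =
    best.similarity >= REUSE_MIN ? "reuse" :
    best.similarity >= CUSTOMIZE_MIN ? "customize" : "fork";
  return { action, candidate_id: best.id };
}
```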
### §2.10 Learning is proposal-based
Task Agent and learning systems may propose improvements to templates, presets, pattern cards, artifact policies, question policies, and Task Agent guidance. Material changes to executable task templates, module presets, security policy, side-effect policy, or capability bindings require versioned proposals and approval.
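The approval rule can be read as a membership check on the change target. The sketch below is illustrative: the `ChangeTarget` labels are hypothetical names for the categories listed above, and a `false` result means only that this section does not mandate a versioned proposal with approval, not that no review applies.

```ts
// Hypothetical labels for the change targets named in §2.10.
type ChangeTarget =
  | "executable_task_template"
  | "module_preset"
  | "security_policy"
  | "side_effect_policy"
  | "capability_binding"
  | "pattern_card"
  | "artifact_policy"
  | "question_policy"
  | "task_agent_guidance";

// Targets for which material changes need a versioned proposal and approval.
const APPROVAL_TARGETS: readonly ChangeTarget[] = [
  "executable_task_template",
  "module_preset",
  "security_policy",
  "side_effect_policy",
  "capability_binding",
];

// Returns true when §2.10 requires a versioned proposal plus approval.
// ASSUMPTION: "material" is decided upstream and passed in as a flag.
function requiresVersionedApproval(target: ChangeTarget, material: boolean): boolean {
  return material && APPROVAL_TARGETS.indexOf(target) !== -1;
}
```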
### §2.11 No phantom UI or controls
Every visible control introduced by this addendum must map to a route, command, read model, durable write or explicit no-op, telemetry event, and refreshed state. Unwired controls must be explicitly marked deferred.
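A validation pass for this rule might look like the following sketch. The six required mappings come from the paragraph above; the `ControlWiring` field names and the two helper functions are hypothetical and not part of any DOC23 validation code.

```ts
// One record per visible control; field names are assumptions.
interface ControlWiring {
  control_id: string;
  route?: string;
  command?: string;
  read_model?: string;
  durable_effect?: "durable_write" | "explicit_no_op";
  telemetry_event?: string;
  refreshes_state?: boolean;
  deferred?: boolean; // explicitly marked as not yet wired
}

// Lists which of the six required mappings are absent.
function missingWiring(c: ControlWiring): string[] {
  const missing: string[] = [];
  if (!c.route) missing.push("route");
  if (!c.command) missing.push("command");
  if (!c.read_model) missing.push("read_model");
  if (!c.durable_effect) missing.push("durable_write_or_explicit_no_op");
  if (!c.telemetry_event) missing.push("telemetry_event");
  if (c.refreshes_state !== true) missing.push("refreshed_state");
  return missing;
}

// A control is "phantom" when it is incomplete and not marked deferred.
function isPhantomControl(c: ControlWiring): boolean {
  return c.deferred !== true && missingWiring(c).length > 0;
}
```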
### §2.12 Autonomy (Principle X; R0.6.5 §1 absorption)
The system supports autonomous task execution when explicitly enabled, but autonomy is a **property of the modules and agents in the task graph**, not an emergent property of having those modules. Specifically:
- Auto-revision is a property of the V3.3.1 Revisor's `AutonomousModePolicy` (V3.3 §6.6), not the Experiment surface
- Auto-execution of saved tasks is a property of the Task Blueprint's invocation policy (§6.4) and the user's standing instructions
- Auto-promotion of learned patterns is gated by EC Core's compiled policy engine (per OBL-XDOC-EC-POLICY-SIGNALS-01)
- Auto-routing in Experiments uses `experiment_winner_routing` (Addenda A R4.1 V3, per coordination V3 §3.1)
Autonomy is opt-in by module, not opt-out by surface. A user wiring a task graph with autonomous Revisor downstream of an Experiment with `pass_through_winner` opt-in is explicitly creating an autonomous chain; DOC20 surfaces this at graph-edit time (per OBL-XDOC-DOC20-EVAL-UI-01) so the chain isn't constructed accidentally.
**Spec-anchor sentence (normative; per coordination V3 §3.1):**
> *Auto-revision is a property of the Revisor's `AutonomousModePolicy`, not the Experiment surface. If a user wires Revisor downstream of an Experiment's variant output, the Revisor's policy determines whether revision proceeds autonomously. Experiments do not introduce auto-revision policy of their own.*
This sentence forecloses re-litigation of where auto-revision authority lives. It applies symmetrically to other autonomy questions: each module owns its own autonomy policy; surfaces do not introduce autonomy policy beyond what the modules consume.
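The ownership rule in the spec-anchor sentence can be sketched as two small predicates. This is an illustration under assumptions: `AutonomousModePolicySketch` reduces the real `AutonomousModePolicy` (V3.3 §6.6) to a single hypothetical `enabled` flag, and `ExperimentSketch` models only the `pass_through_winner` opt-in named above.

```ts
// Hypothetical reduction of the Revisor's AutonomousModePolicy to one flag.
interface AutonomousModePolicySketch {
  enabled: boolean;
}

// Only the opt-in named in §2.12 is modelled here.
interface ExperimentSketch {
  pass_through_winner: boolean;
}

// Whether revision proceeds autonomously depends on the Revisor's policy alone.
// The Experiment is deliberately not a parameter.
function revisionProceedsAutonomously(policy: AutonomousModePolicySketch): boolean {
  return policy.enabled;
}

// An autonomous chain exists only when both modules have opted in,
// which is the condition a graph editor would surface at edit time.
function isAutonomousChain(
  experiment: ExperimentSketch,
  revisorPolicy: AutonomousModePolicySketch
): boolean {
  return experiment.pass_through_winner && revisionProceedsAutonomously(revisorPolicy);
}
```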
### §2.13 Complexity profiles (R0.6.5 §4 absorption)
Tasks vary in complexity. The Task Agent reasons about complexity along a small set of axes; complexity informs onboarding, suggestion verbosity, default settings, and pattern surfacing. The axes:
- **Modules count.** Simple = ≤3 modules; medium = 4-10; complex = more than 10.
- **Outcome count.** Simple = ≤2 outcomes; medium = 3-5; complex = 6+.
- **Source dependency.** Self-contained (no external sources); single-source (one Source Workspace); multi-source (multiple Source Workspaces).
- **Output stakes.** Internal advisory (low); client-facing (medium); filing-or-binding (high).
- **Autonomy level.** Manual (user gates every step); partial (auto-revision opt-in); full (auto-revise + auto-route).
The Task Agent surfaces complexity in onboarding ("this is a complex task; consider...") and in suggestions ("simpler tasks like X often work better when..."). Complexity does NOT gate features; it informs Task Agent presentation.
```ts
type TaskComplexityProfile = {
task_id: string
modules_count: number
outcome_count: number
source_dependency: "self_contained" | "single_source" | "multi_source"
output_stakes: "internal_advisory" | "client_facing" | "filing_or_binding"
autonomy_level: "manual" | "partial" | "full"
// Derived classification
overall_complexity: "simple" | "medium" | "complex"
schema_version: 1
}
```
The Task Agent reads this when constructing recommendations; the Task Blueprint persists it as a derived field updated whenever the blueprint changes.
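The spec does not state how `overall_complexity` is derived from the individual axes. The following is a minimal sketch under two loudly labeled assumptions: each categorical axis maps onto simple/medium/complex in the order its values are listed, and the overall value is the highest band found on any axis. The helper names (`bandForModules`, `bandForOutcomes`, `deriveOverallComplexity`) are hypothetical, not part of the schema.

```ts
type ComplexityBand = "simple" | "medium" | "complex";

interface ComplexityAxes {
  modules_count: number;
  outcome_count: number;
  source_dependency: "self_contained" | "single_source" | "multi_source";
  output_stakes: "internal_advisory" | "client_facing" | "filing_or_binding";
  autonomy_level: "manual" | "partial" | "full";
}

const RANK: Record<ComplexityBand, number> = { simple: 0, medium: 1, complex: 2 };

// Assumption: a task with exactly 10 modules is treated as medium.
function bandForModules(n: number): ComplexityBand {
  if (n <= 3) return "simple";
  return n <= 10 ? "medium" : "complex";
}

function bandForOutcomes(n: number): ComplexityBand {
  if (n <= 2) return "simple";
  return n <= 5 ? "medium" : "complex";
}

// Assumption: categorical values map to bands in the order the axis lists them.
const SOURCE_BAND: Record<ComplexityAxes["source_dependency"], ComplexityBand> = {
  self_contained: "simple",
  single_source: "medium",
  multi_source: "complex",
};
const STAKES_BAND: Record<ComplexityAxes["output_stakes"], ComplexityBand> = {
  internal_advisory: "simple",
  client_facing: "medium",
  filing_or_binding: "complex",
};
const AUTONOMY_BAND: Record<ComplexityAxes["autonomy_level"], ComplexityBand> = {
  manual: "simple",
  partial: "medium",
  full: "complex",
};

// Assumption: the overall classification is the highest band across all five axes.
function deriveOverallComplexity(axes: ComplexityAxes): ComplexityBand {
  const bands: ComplexityBand[] = [
    bandForModules(axes.modules_count),
    bandForOutcomes(axes.outcome_count),
    SOURCE_BAND[axes.source_dependency],
    STAKES_BAND[axes.output_stakes],
    AUTONOMY_BAND[axes.autonomy_level],
  ];
  return bands.reduce<ComplexityBand>(
    (highest, b) => (RANK[b] > RANK[highest] ? b : highest),
    "simple",
  );
}
```

A highest-band rule is conservative: a three-module task that produces a filing still classifies as complex. A weighted or majority rule would be an equally plausible reading of the text.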
### §2.14 What this addendum adds (R0.6.5 §2 absorption)
This Core R0.7.1 adds to DOC23 / Addenda A:
- **Task intelligence layer** — the Task Agent system agent and supporting infrastructure
- **Task design memory** — DOC72 Task Activity Memory integration for cross-run learning
- **Telemetry spine** — observability over task design, execution, and learning signals
- **Self-improving task design** — patterns learned from runs feed back into task design suggestions
- **Prompt improvement and evaluation** — DOC17 prompt registry integration with V3.3 Pattern C ad-hoc Judge attachment
R0.7.1 does NOT add (these live in sub-addenda):
- Outcome evaluation engine — V3.3
- In-run feedback delivery — Feedback Delivery V1.0.1
- Source workspace infrastructure — Source Workspace V1.0.1
- Task forum and run board — Task Forum + Run Board V1.0.1
### §2.15 Object taxonomy (R0.6.5 §3 absorption; merged into §3A below)
R0.6.5 §3 expanded the object taxonomy. That expansion is folded into §3A (Terminology and object taxonomy) below.
---
## §3 — Owner split
### §3.1 DOC23 owns
DOC23 owns:
- Task Agent task-facing contracts defined in this addendum.
- Task Blueprint schema and graph binding.
- Task templates and module presets as executable DOC23 assets.
- Task Template Semantic Cards and Module Preset Semantic Cards as DOC23-authored projections, with DOC72 graph links.
- Task telemetry schemas for task execution.
- Task Artifact Index.
- TaskPreflightReport.
- TaskAssessment.
- Artifact intake policy for task outputs.
- Task Agent task-design tools that operate over DOC23 graph/template/preset/catalog/read models.
- Run Inspector and task UI extensions.
### §3.2 EC Core owns
EC owns:
- Durable writes.
- Task Agent command handling.
- Task Knowledge Compiler execution and pack activation.
- Task graph validation/preflight execution.
- Read-model generation.
- Storage paths and atomic writes.
- Background scheduling, nightly extraction, compaction, and pack regeneration.
- Effective runtime state, incognito controls, and processing controls.
- Route registry and command registry entries.
### §3.3 DOC24 owns
DOC24 owns:
- Capability registry and live action state.
- Tool/connector/MCP/procedure/model availability.
- Semantic routing and packet assembly.
- Rendering/injection of relevant task-design cards into Task Agent context.
- Runtime expansion of module tool packs within allowed envelopes.
- DeliveryDirective and compiled utility bundle consumption.
- Knowledge/capability packet lifecycle and injection manifests.
### §3.4 DOC72 owns
DOC72 owns:
- Entity graph storage and node taxonomy.
- `execution_trace`, `goal`, `world_entity`, `work_product`, `memory_directive`, `procedure`, `tool_capability`, and other node payload contracts.
- Task Activity Memory graph relationships.
- Goal nodes and task-design goal links.
- Semantic projections for templates, presets, task runs, artifacts, design cases, and learned preferences.
- Confidence/provenance/temporal/connection/experience dimensions for task-related knowledge.
### §3.5 DOC25 owns
DOC25 owns universal ingestion for task-produced, task-downloaded, task-received, or task-attached documents. DOC23 emits ingestion intents/events. DOC25 processes, deduplicates, converts, stores derived artifacts, reports quality, and exposes document retrieval tools.
### §3.6 DOC73 owns
DOC73 owns deep extraction into libraries/corpora where a task output or downloaded/received document is intentionally added to a bounded knowledge substrate. DOC23 must not use DOC73 as a generic artifact store; DOC73 is used when the endpoint is a library/corpus or extraction lens.
### §3.7 DOC8 and BDSM own
DOC8 and BDSM own:
- friction events;
- learning signals;
- question utility;
- tool/procedure/capability utility;
- pattern detection;
- compiled learning bundles;
- self-healing proposal generation;
- satisfaction/utility computation.
DOC23 defines task-specific signal shapes and consumers. DOC8/BDSM compute and compile.
### §3.8 DOC3 owns
DOC3 owns procedural skill lifecycle, demonstrated procedures, semantic skill learning, and DOC3 procedure execution. DOC23 templates/presets can be routable like DOC3 procedures but remain DOC23-owned executable assets.
### §3.9 DOC11/OpenClaw owns
DOC11/OpenClaw owns final runtime dispatch, model/provider routing, session truth, `sessions_spawn`, ACP/coding session runtime, and native sub-agent behavior. DOC23 telemetry records task-scoped observations of those events.
---
## §3A — Terminology and object taxonomy
This addendum uses the following terms precisely. The Task Agent must use the same taxonomy when explaining, designing, searching, and updating task assets.
### §3A.1 Task
A **Task** is an executable DOC23 graph instance or saved graph definition. It is the thing EC/DOC23 can run. Its canonical executable truth is the graph: modules, ports, cables, configs, triggers, gates, output modules, and task settings.
### §3A.2 Task Run
A **Task Run** is a concrete execution of a Task. It has operational run truth: run record, run snapshot, checkpoints, events, spans, module activation states, artifacts, prompts, tool calls, costs, errors, and receipts.
### §3A.3 Task Blueprint
A **Task Blueprint** is the semantic intent contract behind a graph. It explains what the task is for, what process it implements, what outputs matter, what quality criteria apply, what human gates and side effects are expected, what goals it serves, and how graph modules bind to step intents. A Blueprint is not executable by itself.
### §3A.4 Module Preset
A **Module Preset** is a saved configuration for one module type. Example: a Complaint Red-Team Judge preset, a Damages Memo Agent preset, a PACER Docket Fetcher preset, or an email-output preset. Module presets are DOC23-owned executable/config assets; DOC72 may store semantic cards and experience links, not canonical configs.
### §3A.5 Task Template
A **Task Template** is a reusable executable graph template. A strong Task Template may bundle a Blueprint, graph, module configs, module presets, capability requirements, artifact policy, assessment plan, sample inputs, example outputs, version history, and semantic card.
A Task Template can therefore already contain much of what was initially described as a “process pack.” Do not create a separate package object for single-template cases unless multiple templates or shared assets create a real distinction.
### §3A.6 Template Group / Workflow Package
A **Template Group** or **Workflow Package** is an optional bundle of multiple related Task Templates and shared assets. It is not required for ordinary tasks. It earns its place only when the user needs to manage a family of tasks together.
Examples:
```text
Complaint Drafting Workflow Package
- Draft Complaint template
- Evaluate Complaint template
- Red-Team Complaint template
- Revise Complaint template
- Shared presets: fact extractor, damages memo, red-team judge
- Shared output archetypes: issue matrix, revision plan, work-product package
- Shared evaluation rubrics and sample cases
Docket Monitoring Workflow Package
- Recurring PACER Monitor template
- Filing Summary template
- Deadline Extraction template
- Notification/Human Review template
```
User-facing naming remains provisional. “Template Group” is acceptable as an implementation-facing label. Avoid introducing “Playbook,” “Process Pack,” or “Business Process Pack” as a durable product noun until the UI proves the distinction is useful.
### §3A.7 Task Design Pattern
A **Task Design Pattern** is an abstract reusable process motif. It is not an executable graph. Example: source classification → fact extraction → analysis memo → draft → red-team → revise. Patterns help Task Agent design new tasks and explain why certain modules/configs are useful.
### §3A.8 Task Design Case
A **Task Design Case** is a distilled prior task or task-creation session that teaches design judgment. It records the user intent, final design, important module/config decisions, the “because” behind those decisions, the goals served, actual outcomes, and reusable lessons.
### §3A.9 Task Assessment
A **Task Assessment** is a post-run or on-demand report that unifies operational telemetry, artifacts, judge/experiment results, errors, user feedback, DOC8/BDSM signals, capability findings, substantive findings, and proposed improvements. It is broader than a failure playbook.
### §3A.10 Artifact and Work Product
A **Task Artifact** is any material output, receipt, file, message, memo, judge result, subagent result, or module output produced or consumed by a task.
A **Work Product** is a higher-value artifact intended to persist as a document, deliverable, filing, memo, report, brief, complaint, exhibit chart, or other reusable professional/personal output. Work Product status should generally imply DOC25 processing and DOC72/DOC25/DOC73 linkage according to policy.
### §3A.11 Task Agent Knowledge Pack
A **Task Agent Knowledge Pack** is the active versioned education substrate consumed by Task Agent. It is hybrid: generated canonical mechanics plus authored doctrine, rationale cards, decision cards, examples, design cases, template/preset cards, DOC72 goal/preference links, and DOC8/BDSM learning overlays. It is not one long prompt and not one giant `skill.md` file.
---
## §3B — Natural invocation and product-language rule
### §3B.1 Problem
R0.3 defines how Task Agent designs and inspects tasks once task mode is selected. It does not sufficiently define when Elnor should use a saved task, suggest a task, call Task Agent, or simply answer directly.
This is a core product seam. If Elnor suggests tasks too aggressively, ordinary chat quality degrades and users are annoyed. If Elnor never suggests tasks, the system fails to exploit one of ELNOR's most valuable features.
### §3B.2 Direct-first invariant
**Direct-First Non-Degradation Invariant.** Task awareness MUST NOT materially degrade Elnor's ordinary chat performance. Unless DOC24 identifies a meaningful task signal, no task-specific packet, task-template list, TKP content, or Task Agent instruction SHALL be injected into the ordinary chat turn beyond the tiny ambient task-awareness rule. Ordinary chat, quick answers, direct interactive reasoning, and simple one-step tool/procedure actions remain direct.
This invariant is binding. It protects the entire ELNOR system from becoming task-preoccupied.
### §3B.3 Product-language rule
Ambient model-facing and user-facing invocation language MUST NOT rely on internal spec labels such as `DOC23` or `DOC24` as operative action words. Internal schemas, route contracts, and owner-doc prose may still use owner-doc identifiers.
Use product terms:
```text
saved task
tracked task
tracked workflow
saved task template
task run
task graph
task template
module preset
Task Agent
```
Do not use ambiguous instructions such as:
```text
use DOC23
use DOC23/Task Agent
use Task Agent
```
Instead use typed verbs:
```text
run saved task
suggest saved task
design task
adapt task template
review task
inspect task run
retrieve task output
explain task graph
assess task
review my tasks
improve task prompt
test prompt variants
```
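One way to enforce the product-language rule is a small lint over ambient or user-facing invocation copy. The sketch below is illustrative only: `lintInvocationCopy` is a hypothetical helper, and the two patterns it checks (internal `DOC<number>` labels and the untyped phrase "use Task Agent") are assumptions drawn from the examples above, not an exhaustive rule set.

```ts
// Hypothetical lint for ambient/user-facing invocation copy.
// Returns a list of problems; an empty list means the copy passed both checks.
function lintInvocationCopy(copy: string): string[] {
  const problems: string[] = [];

  // Internal owner-doc identifiers such as DOC23 or DOC24 must not act as action words.
  for (const label of copy.match(/\bDOC\d+\b/g) ?? []) {
    problems.push(`internal label: ${label}`);
  }

  // "use Task Agent" (optionally prefixed by a DOC label) is an untyped, ambiguous verb.
  if (/\buse\s+(DOC\d+\/)?Task Agent\b/i.test(copy)) {
    problems.push("untyped verb: use Task Agent");
  }

  return problems;
}
```

Such a check would apply only to ambient and user-facing text; internal schemas and owner-doc prose may keep their identifiers.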
### §3B.4 Canonical ambient task-awareness card
The following is the recommended compact ambient card for Elnor. It should be short enough for stable baseline injection or equivalent SOUL/core instruction placement.
```text
Saved Tasks:
A saved task is an inspectable, reusable tracked process with graph steps, progress, outputs, logs, approvals, and artifacts. Use saved tasks when the user explicitly asks to run/create/review/inspect one, references a saved task/template, or the runtime supplies a TaskOpportunityPacket. Saved tasks are best for repeatable, scheduled/triggered, independently running, gate-controlled, artifact-producing, auditable, or template-backed processes. Stay direct-first for ordinary quick answers, direct interactive reasoning, and simple one-step actions.
Task Agent:
Task Agent is the specialist for designing, adapting, explaining, reviewing, inspecting, and assessing saved tasks and task runs. Invoke a typed Task Agent entrypoint only when the user asks, a task/run/template is explicitly involved, or runtime routing marks the task opportunity as medium/high/explicit. Do not invoke Task Agent merely because a request is complex.
```
Normative clarifications:
1. “Complex” is not by itself a sufficient reason to invoke Task Agent.
2. “Review this complaint” defaults direct unless additional signals suggest saved-task value.
3. “Run the Complaint Evaluation task” is an explicit saved-task request.
4. “Set up a process to review complaints like this going forward” is a Task Agent design request.
5. “Where is the Marex draft task?” is a Task Agent / task-run inspection request.
6. “Send me the output from the Marex draft task” is a task-output retrieval request.
7. The ambient card MUST NOT tell Elnor to consult Task Agent for generic “task fit.” Task opportunity is prefiltered by DOC24/EC; Task Agent consultation occurs only through an allowed typed entrypoint.
### §3B.5 Task Agent canonical naming
`Task Agent` is the canonical name. There is no separate `Workflow Designer`, `Task Inspector`, or `Prompt Agent` identity in the initial Addenda B architecture.
Use modes/entrypoints for what Task Agent is doing:
```text
Task Agent: Design Task
Task Agent: Adapt Template
Task Agent: Review Task
Task Agent: Inspect Run
Task Agent: Retrieve Output
Task Agent: Explain Graph
Task Agent: Assess Task
Task Agent: Review My Tasks
Task Agent: Improve Prompt
Task Agent: Test Prompt Variants
```
User-facing UI labels SHOULD be:
```text
Ask Task Agent
Design task
Review task
Inspect run
Retrieve output
Improve prompt
Test prompt variants
```
No distinct public “Workflow Designer” or “Prompt Agent” should be introduced unless the architect later chooses a product rename.
---
## §3C — Task invocation intelligence
### §3C.1 Execution mode taxonomy
Elnor chooses among several execution modes, not merely between “task” and “no task.”
```ts
type ElnorExecutionMode =
| "direct_chat"
| "direct_tool_or_procedure"
| "openclaw_background_subagent"
| "ec_scheduled_job"
| "run_existing_saved_task"
| "suggest_existing_saved_task"
| "invoke_task_agent_for_design"
| "invoke_task_agent_for_review"
| "inspect_saved_task_run"
| "retrieve_saved_task_output"
| "background_or_recurring_saved_task"
| "ask_clarifying_question";
```
### §3C.2 Use saved tasks when formalization adds value
A saved/tracked task is not needed merely because a request is complex. Saved tasks earn their place when formalization adds value:
```text
repeatability
independent/background execution
scheduled or triggered execution
durable checkpoints
progress visibility
human gates
side-effect controls
artifact tracking
multi-output tracking
audit trails
telemetry
run inspection
DOC72 task memory
template reuse
Task Assessment and improvement
```
### §3C.3 Do not use saved tasks when task overhead exceeds value
Do not invoke or suggest a saved task for:
```text
ordinary quick answers
single-turn direct reasoning
simple one-step tool actions
ordinary file move/save operations
fast graph/memory lookups
primary interactive reasoning where the user expects to work directly with Elnor
requests with explicit “quick,” “just answer,” “briefly,” “do this here,” or similar direct-mode language
```
### §3C.4 Task opportunity levels
```ts
type TaskOpportunityLevel = "none" | "low" | "medium" | "high" | "explicit";
```
Recommended handling:
| Level | Handling |
|---|---|
| `none` | Inject no task packet. Elnor answers/acts directly. |
| `low` | Usually no task mention. May log weak signal for learning. |
| `medium` | Direct answer first; optional quiet task chip if useful. |
| `high` | Elnor may recommend a saved task or consult Task Agent. |
| `explicit` | User named/asked for task/template/run/output; use task route. |
### §3C.5 TaskModeDecision schema
```ts
export type TaskModeDecision = {
decision_id: string;
user_request_ref: string;
resolved_intent_summary: string;
selected_mode:
| "direct_chat"
| "direct_tool_or_procedure"
| "openclaw_background_subagent"
| "ec_scheduled_job"
| "run_existing_saved_task"
| "suggest_existing_saved_task"
| "invoke_task_agent_for_design"
| "invoke_task_agent_for_review"
| "inspect_saved_task_run"
| "retrieve_saved_task_output"
| "background_or_recurring_saved_task"
| "ask_clarifying_question";
task_opportunity_level: "none" | "low" | "medium" | "high" | "explicit";
task_agent_consult_policy:
| "forbidden"
| "not_needed"
| "allowed_if_user_accepts"
| "recommended"
| "required";
task_fit_score: number;
direct_answer_fit_score: number;
existing_template_match_score?: number;
positive_task_signals: TaskUseSignal[];
task_veto_signals: TaskUseVetoSignal[];
matched_invocation_directives: string[];
matched_task_templates: string[];
matched_module_presets: string[];
doc24_packet_ref?: StorageRef;
task_agent_consult_ref?: StorageRef;
should_inject_task_packet: boolean;
should_inject_tkp_cards: boolean; // normally false unless Task Agent active
visible_to_llm: boolean;
visible_to_user: boolean;
created_at: string;
schema_version: "1.0";
};
```
Default rule:
```text
If task_opportunity_level is none or low, should_inject_task_packet SHOULD be false.
If Task Agent is not active, should_inject_tkp_cards MUST be false.
```
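The two default rules can be checked mechanically against a decision record. The following is a sketch of such a validator, not a prescribed implementation. `checkInjectionDefaults`, `RuleFinding`, and the `taskAgentActive` parameter are hypothetical; the spec does not say where Task Agent activity state is read from.

```ts
type TaskOpportunityLevel = "none" | "low" | "medium" | "high" | "explicit";

// Only the TaskModeDecision fields that the default rule touches.
interface InjectionDecision {
  task_opportunity_level: TaskOpportunityLevel;
  should_inject_task_packet: boolean;
  should_inject_tkp_cards: boolean;
}

interface RuleFinding {
  severity: "SHOULD" | "MUST";
  message: string;
}

// taskAgentActive is a hypothetical input supplied by the caller.
function checkInjectionDefaults(d: InjectionDecision, taskAgentActive: boolean): RuleFinding[] {
  const findings: RuleFinding[] = [];
  const weak = d.task_opportunity_level === "none" || d.task_opportunity_level === "low";

  // SHOULD-level rule: no task packet at none/low opportunity.
  if (weak && d.should_inject_task_packet) {
    findings.push({ severity: "SHOULD", message: "task packet injected at none/low opportunity" });
  }

  // MUST-level rule: no TKP cards unless Task Agent is active.
  if (!taskAgentActive && d.should_inject_tkp_cards) {
    findings.push({ severity: "MUST", message: "TKP cards injected while Task Agent is inactive" });
  }

  return findings;
}
```

Keeping the severity on each finding lets a caller reject MUST violations outright while only logging SHOULD deviations.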
### §3C.6 Positive task signals
```ts
type TaskUseSignal =
| "explicit_task_request"
| "existing_task_name_referenced"
| "existing_template_name_referenced"
| "matching_template_high_confidence"
| "long_running"
| "recurring_or_monitoring"
| "multi_step_workflow"
| "multiple_independent_artifacts"
| "requires_human_gates"
| "requires_audit_trail"
| "requires_scheduled_or_triggered_execution"
| "requires_background_progress"
| "high_stakes_repeatable_process"
| "user_asked_to_set_up_process"
| "known_user_preference_for_task_in_context"
| "existing_invocation_directive_match"
| "task_output_or_run_question";
```
### §3C.7 Veto signals
```ts
type TaskUseVetoSignal =
| "simple_single_turn_answer"
| "single_tool_action"
| "user_requested_quick_response"
| "primary_conversation_should_remain_direct"
| "task_overhead_exceeds_value"
| "user_recently_rejected_task_suggestion"
| "low_stakes_non_repeatable"
| "ambiguous_but_direct_answer_safe"
| "native_openclaw_monitoring_sufficient"
| "ec_scheduled_job_sufficient"
| "no_matching_template"
| "weak_template_match_only";
```
### §3C.8 Vague requests
For vague requests like:
```text
Review this complaint.
Review this motion.
Look at this draft.
Check this.
```
Default behavior is direct-first unless one or more strong signals are present:
```text
full/deep/comprehensive review requested
source-document comparison requested
red-team requested
revision/draft/final artifact package requested
known high-confidence template exists and user often accepts it
user references saved task/template/run
user asks to set up a process
background/scheduled/recurring/independent execution requested
```
Medium-fit UX may show a quiet chip after or alongside the direct answer:
```text
[Run deeper saved task]
[Use Complaint Evaluation task]
[Ask Task Agent to set this up]
```
The chip should be suppressed when BDSM/DOC8 learning shows repeated rejection in the same context class.
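The chip rule for medium-fit requests might be sketched as follows. This covers only the quiet chip at `medium` opportunity. The `rejections_in_context_class` counter and the threshold of 3 are assumptions for illustration; the spec leaves the definition of "repeated rejection" to BDSM/DOC8 learning.

```ts
type TaskOpportunityLevel = "none" | "low" | "medium" | "high" | "explicit";

interface ChipContext {
  level: TaskOpportunityLevel;
  // Hypothetical counter: task suggestions the user rejected in the same context class.
  rejections_in_context_class: number;
}

// Hypothetical threshold; the real value would come from BDSM/DOC8 learning.
const REJECTION_THRESHOLD = 3;

function shouldShowTaskChip(ctx: ChipContext, threshold: number = REJECTION_THRESHOLD): boolean {
  // The quiet chip is a medium-fit affordance only.
  if (ctx.level !== "medium") return false;
  // Suppress once rejections in this context class reach the threshold.
  return ctx.rejections_in_context_class < threshold;
}
```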
---
## §3D — DOC24 Task Opportunity Packet
### §3D.1 DOC24 ownership
DOC24 owns the live decision of whether task awareness is injected into a user turn. TKP does not participate in ordinary prompt assembly unless DOC24 selects a task-design, task-review, task-inspection, prompt-evaluation, or high-confidence task-opportunity path.
DOC24 SHALL provide:
```text
ambient task-awareness card
TaskModeDecision prefilter
TaskOpportunityPacket assembly
matching task template top-k lookup
matching task invocation directive lookup
matching module preset top-k lookup
Task Agent entrypoint permission
packet token-budget enforcement
active capability/model/runtime state
```
### §3D.2 TaskOpportunityPacket schema
```ts
export type TaskOpportunityPacket = {
task_system_available: boolean;
task_agent_available: boolean;
task_opportunity_level: "none" | "low" | "medium" | "high" | "explicit";
recommended_handling:
| "do_not_mention_task"
| "direct_answer_with_optional_task_chip"
| "suggest_existing_task"
| "ask_user_task_or_direct"
| "invoke_task_agent"
| "run_explicit_task"
| "inspect_existing_run"
| "retrieve_task_output";
task_agent_consult_policy:
| "forbidden"
| "not_needed"
| "allowed_if_user_accepts"
| "recommended"
| "required";
matched_invocation_directives: CompactTaskInvocationDirectiveCard[];
matched_task_templates: CompactTaskTemplateCard[];
matched_module_presets: CompactModulePresetCard[];
task_agent_entrypoints: TaskAgentEntrypoint[];
suggested_user_visible_copy?: string;
reason_codes: string[];
veto_signals: string[];
max_tokens: number;
schema_version: "1.0";
};
```
### §3D.3 Token-budget rules
```text
Ambient task-awareness card:
50–160 tokens, always stable or included in baseline system context.
TaskOpportunityPacket:
150–600 tokens, only when task-like signals exist.
Task Agent design packet:
larger, only in explicit task-design/review/inspection/evaluation mode.
TKP cards:
retrieved on demand by Task Agent; never dumped into ordinary chat.
```
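The two numeric budgets above can be expressed as a simple range check. This is a sketch under assumptions: token counting itself is left to the caller, the packet-kind names are hypothetical labels, and treating the lower bound as a reportable condition (rather than ignoring it) is one possible reading of the ranges.

```ts
// Hypothetical labels for the two packet kinds that carry numeric budgets.
type BudgetedPacket = "ambient_task_awareness_card" | "task_opportunity_packet";

interface TokenRange {
  min: number;
  max: number;
}

// Ranges copied from the token-budget rules.
const TOKEN_BUDGETS: Record<BudgetedPacket, TokenRange> = {
  ambient_task_awareness_card: { min: 50, max: 160 },
  task_opportunity_packet: { min: 150, max: 600 },
};

type BudgetVerdict = "under" | "ok" | "over";

// tokenCount is supplied by the caller; no tokenizer is assumed here.
function checkTokenBudget(kind: BudgetedPacket, tokenCount: number): BudgetVerdict {
  const { min, max } = TOKEN_BUDGETS[kind];
  if (tokenCount > max) return "over";
  return tokenCount < min ? "under" : "ok";
}
```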
### §3D.4 No full task-system prompt stuffing
DOC24 MUST NOT inject the full TKP, full task module catalog, full template catalog, or full Addenda B content into ordinary chat. It may inject compact top-k cards only when relevant.
### §3D.5 Degraded behavior
If DOC24 cannot compute task opportunity:
```text
Elnor remains direct-first.
Explicit task commands still use task route if identifiable.
No speculative Task Agent consult occurs solely because routing failed.
```
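The degraded-routing rules might look like this in code. It is a minimal sketch: `routeWhenOpportunityUnavailable` and the `DegradedRouting` shape are hypothetical, and how an explicit task reference is identified from the user's request is outside its scope.

```ts
interface DegradedRouting {
  route: "direct" | "task_route";
  task_ref?: string;
  // Always false: a routing failure never triggers a speculative Task Agent consult.
  task_agent_consult: false;
}

// explicitTaskRef is a hypothetical input: the task/template/run the user named, if identifiable.
function routeWhenOpportunityUnavailable(explicitTaskRef?: string): DegradedRouting {
  if (explicitTaskRef !== undefined && explicitTaskRef !== "") {
    // Explicit task commands still use the task route.
    return { route: "task_route", task_ref: explicitTaskRef, task_agent_consult: false };
  }
  // Everything else stays direct-first.
  return { route: "direct", task_agent_consult: false };
}
```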
---
## §4 — Task Agent
### §4.1 Definition
Task Agent is a DOC23 system agent/capability that helps the user design, inspect, retrieve, assess, and improve tasks.
Task Agent may be invoked by:
- conversation with Elnor;
- “Create Task from Description” UI;
- “Explain this task” UI;
- “Where is this run?” UI;
- “Assess this run” UI;
- “Improve this template” UI;
- “Find task output” request;
- scheduled or post-run assessment jobs, when policy permits.
### §4.2 Task Agent is not a module
Task Agent must not be represented as an ordinary DOC23 module in the executable graph. It does not consume or emit task port data during a run. It is a system capability tied into DOC23 read models, telemetry, templates, presets, validation, and proposal commands.
Task Agent may be deeply tied to system modules and surfaces, including:
- Monitor module / task monitoring status;
- Environment module / effective task-wide context;
- Run Inspector;
- Task Artifact Index;
- Task History;
- DOC24 context packets;
- DOC72 Task Activity Memory;
- DOC8/BDSM learning bundles.
### §4.3 Primary roles
Task Agent performs seven primary roles.
#### §4.3.1 Task creation and setup
Task Agent converts natural-language user intent into:
- Task Blueprint;
- template/preset match report;
- graph proposal or graph diff;
- capability binding plan;
- artifact/output intake policy;
- telemetry/memory policy;
- human review / side-effect plan;
- preflight report;
- saved task or template proposal.
#### §4.3.2 Task progress window
Task Agent answers questions about current and past task runs:
```text
Where are we in the Draft Complaint task?
What module is running now?
Why is it waiting?
What did the red team say?
Which documents has it reviewed?
What sub-agents are active?
What is the latest output?
What remains before completion?
```
#### §4.3.3 Output and artifact retrieval
Task Agent locates, summarizes, links, or returns outputs:
```text
Send me the results of the Marex draft task.
Show me the damages memo from the middle of that run.
Open the final complaint draft.
Find the red-team report from the complaint evaluation task.
Which task created this document?
```
#### §4.3.4 Task explanation
Task Agent explains:
- what a task does;
- why modules are present;
- how data and context flow;
- what capabilities are used;
- what outputs are produced;
- what human review points exist;
- what side effects can occur;
- how the task differs from a template or prior version.
#### §4.3.5 Task Assessment
Task Agent generates Task Assessments combining telemetry, judge/experiment data, artifacts, user feedback, DOC8 friction, BDSM utility signals, capability reliability, and substantive findings.
#### §4.3.6 Template/preset intelligence
Task Agent searches, explains, compares, reuses, customizes, forks, and proposes updates to task templates and module presets.
#### §4.3.7 Task-design self-improvement
Task Agent participates in learning from task creation sessions, task runs, user edits, accepted/rejected suggestions, output usage, artifact promotion, and Task Assessments.
### §4.4 Hard prohibitions
Task Agent must not:
1. Execute hidden task steps outside DOC23 graph execution.
2. Spawn sub-agents outside a task module or a non-task system route explicitly authorized for advisory/design work.
3. Mutate task records directly without EC command validation.
4. Change executable task templates or module presets without a versioned update proposal and required approval.
5. Bypass validation/preflight when generating a graph proposal.
6. Treat learned heuristics as canonical module semantics.
7. Store full raw task content in DOC72.
8. Add side-effecting capabilities without policy compliance and approval where required.
9. Create duplicate templates when a suitable existing template should be reused.
### §4.5 Task Agent read/control contract
Task Agent MUST read task system modules and task-wide context through a permissioned operational lens. In addition to canonical task/run records, it SHOULD be able to inspect the Monitor module, Environment module/effective task-wide context, DOC24 capability/context packets, TaskTraceSpans, prompt/capability/policy snapshots, step-output shelves, tool-result shelves, artifact index, DOC72 TaskRun memory, and DOC8/BDSM learning signals. This is required for progress answers, user-created graph review, failure diagnosis, and Task Assessment generation.
```ts
type TaskAgentPermissionContract = {
agent_id: "task_agent"
read_surfaces: Array<
| "task_catalog"
| "task_record"
| "task_template"
| "module_preset"
| "task_blueprint"
| "task_run_record"
| "task_run_snapshot"
| "task_run_log"
| "task_checkpoint"
| "task_artifact_index"
| "task_trace_span"
| "monitor_module_status"
| "environment_config"
| "prompt_snapshot"
| "capability_snapshot"
| "policy_decision_snapshot"
| "step_output_shelf"
| "tool_result_shelf"
| "run_inspector_read_model"
| "doc72_task_activity_memory"
| "doc24_capability_registry"
| "doc24_context_packet"
| "doc25_ingestion_result"
| "doc73_library_binding"
| "doc8_learning_bundle"
| "bdsm_utility_bundle"
>
command_surfaces: Array<
| "task.design_session.start"
| "task.blueprint.propose"
| "task.graph.propose"
| "task.graph.diff.propose"
| "task.preflight.run"
| "task.assessment.request"
| "task.artifact.promote.request"
| "task.template.update.propose"
| "task.preset.update.propose"
| "task.knowledge_pack.rebuild.request"
>
forbidden_commands: Array<
| "direct_file_write"
| "direct_graph_mutation"
| "hidden_runtime_orchestration"
| "unapproved_side_effect"
>
schema_version: "1.0"
}
```
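A command gate built on this contract could be sketched as below. The command names are copied from the contract; `authorizeCommand` is a hypothetical helper, and denying unknown commands by default is an assumption rather than something the contract states.

```ts
// Command surfaces and forbidden commands as listed in TaskAgentPermissionContract.
const ALLOWED_COMMANDS: readonly string[] = [
  "task.design_session.start",
  "task.blueprint.propose",
  "task.graph.propose",
  "task.graph.diff.propose",
  "task.preflight.run",
  "task.assessment.request",
  "task.artifact.promote.request",
  "task.template.update.propose",
  "task.preset.update.propose",
  "task.knowledge_pack.rebuild.request",
];

const FORBIDDEN_COMMANDS: readonly string[] = [
  "direct_file_write",
  "direct_graph_mutation",
  "hidden_runtime_orchestration",
  "unapproved_side_effect",
];

type CommandVerdict = "allowed" | "forbidden" | "unknown";

// Assumption: callers treat "unknown" the same as "forbidden" (deny by default).
function authorizeCommand(command: string): CommandVerdict {
  if (FORBIDDEN_COMMANDS.indexOf(command) !== -1) return "forbidden";
  if (ALLOWED_COMMANDS.indexOf(command) !== -1) return "allowed";
  return "unknown";
}
```

Note that every allowed surface is a proposal or request; the gate only confirms the command name, while validation and approval remain with EC command handling.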
### §4.6 TaskAgentRunLens
Task Agent answers operational questions through a structured run lens before drilling into raw logs.
```ts
type TaskAgentRunLens = {
task_id: string
run_id: string
task_name: string
template_id?: string
blueprint_id?: string
current_status: TaskRunStatus
definition_status: TaskDefinitionStatus
current_module_states: ModuleActivationStateSummary[]
graph_progress_summary: GraphProgressSummary
active_path_summary: ActivePathSummary[]
wait_reasons: WaitReasonSummary[]
human_gate_status: HumanGateSummary[]
active_subagents: SubAgentTraceSummary[]
trigger_summary: TriggerSummary
input_summary: TaskInputSummary
documents_reviewed_summary: ReviewedDocumentSummary[]
capability_usage_summary: CapabilityUsageSummary
cost_summary: CostSummary
quality_summary?: QualitySummary
anomaly_summary?: AnomalySummary
artifact_index: TaskArtifactIndexEntry[]
latest_outputs: TaskOutputSummary[]
final_output_refs: StorageRef[]
doc72_task_memory_ref?: string
learning_signal_refs: string[]
assessment_refs: string[]
run_inspector_url: string
updated_at: string
schema_version: "1.0"
}
```
### §4.7 Degraded modes
Task Agent must degrade honestly:
| Missing substrate | Allowed degraded behavior |
|---|---|
| DOC72 unavailable | Answer from DOC23/EC operational run truth only; no memory graph links. |
| DOC24 unavailable | Use saved capability snapshots; do not claim live connector availability. |
| DOC25 unavailable | Show FileRefs/StorageRefs; no document intelligence summaries. |
| DOC8/BDSM unavailable | Omit learning/utility recommendations. |
| Knowledge pack stale | Query live module catalog and validators; avoid relying on stale cards. |
| Run trace incomplete | State which trace surfaces are incomplete and fall back to available logs. |
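
The table above lends itself to a lookup so that Task Agent can state its limitations explicitly. The following is a sketch; the `DegradedCondition` labels and the `degradedBehaviors` helper are hypothetical names, while the behavior strings restate the table rows.

```ts
// Hypothetical labels for the conditions in the degraded-modes table.
type DegradedCondition =
  | "doc72_unavailable"
  | "doc24_unavailable"
  | "doc25_unavailable"
  | "doc8_bdsm_unavailable"
  | "knowledge_pack_stale"
  | "run_trace_incomplete";

const DEGRADED_BEHAVIOR: Record<DegradedCondition, string> = {
  doc72_unavailable: "Answer from DOC23/EC operational run truth only; no memory graph links.",
  doc24_unavailable: "Use saved capability snapshots; do not claim live connector availability.",
  doc25_unavailable: "Show FileRefs/StorageRefs; no document intelligence summaries.",
  doc8_bdsm_unavailable: "Omit learning/utility recommendations.",
  knowledge_pack_stale: "Query live module catalog and validators; avoid relying on stale cards.",
  run_trace_incomplete: "State which trace surfaces are incomplete and fall back to available logs.",
};

// Returns the allowed degraded behaviors for the conditions currently detected.
function degradedBehaviors(conditions: DegradedCondition[]): string[] {
  return conditions.map((c) => DEGRADED_BEHAVIOR[c]);
}
```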
---
## §4A — Task Agent typed entrypoints
### §4A.1 Entrypoint enum
```ts
export type TaskAgentEntrypoint =
| "consult_task_opportunity"
| "design_task"
| "adapt_task_template"
| "review_existing_task"
| "inspect_task_run"
| "retrieve_task_output"
| "explain_task_graph"
| "assess_task"
| "assess_task_portfolio"
| "answer_task_system_question"
| "improve_task_prompt"
| "review_prompt_quality"
| "generate_prompt_variants"
| "test_prompt_variants"
| "propose_prompt_update";
```
**Gating rule for `consult_task_opportunity`.** This entrypoint is internal-only. It is not a general instruction for Elnor to ask Task Agent whether every complex request should become a saved task. It MAY be invoked only when the TaskModeDecision/TaskOpportunityPacket marks task opportunity as `medium`, `high`, or `explicit`, or when the user directly asks whether a saved task should be used. Low/no-opportunity turns MUST NOT call this entrypoint.
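The gating rule reduces to a single predicate. In this sketch, `mayConsultTaskOpportunity` and the `userAskedAboutSavedTask` flag are hypothetical names; how the user's direct question is detected is not specified here.

```ts
type TaskOpportunityLevel = "none" | "low" | "medium" | "high" | "explicit";

// userAskedAboutSavedTask: hypothetical flag, true when the user directly asks
// whether a saved task should be used.
function mayConsultTaskOpportunity(
  level: TaskOpportunityLevel,
  userAskedAboutSavedTask: boolean,
): boolean {
  if (userAskedAboutSavedTask) return true;
  // Low/no-opportunity turns must not call the entrypoint.
  return level === "medium" || level === "high" || level === "explicit";
}
```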
### §4A.2 Entrypoint labels
```ts
export type TaskAgentEntrypointDescriptor = {
entrypoint: TaskAgentEntrypoint;
user_facing_label:
| "Check whether a saved task fits"
| "Design task"
| "Adapt template"
| "Review task"
| "Inspect run"
| "Retrieve output"
| "Explain graph"
| "Assess task"
| "Review my tasks"
| "Answer task-system question"
| "Improve prompt"
| "Review prompt quality"
| "Generate prompt variants"
| "Test prompt variants"
| "Propose prompt update";
may_run_without_llm: boolean;
may_spawn_background_session: boolean;
requires_tkp: boolean;
requires_run_truth: boolean;
requires_doc72_memory: boolean;
schema_version: "1.0";
};
```
### §4A.3 Elnor remains front door
Task Agent is never the default primary conversation owner. Elnor remains the front door and summarizes Task Agent results in Elnor's voice unless the user explicitly opens a Task Agent design/review surface.
Use:
```text
I'll ask Task Agent to draft this as a saved task.
Task Agent found an existing template that fits.
Task Agent reviewed the run and found the final output here.
```
Avoid:
```text
I am transferring you to Task Agent.
```
### §4A.4 Invocation modes
```ts
export type TaskAgentInvocationMode =
| "read_model_only"
| "synchronous_consult"
| "visible_design_session"
| "async_background_design"
| "assessment_job";
```
Modes:
1. `read_model_only` — no LLM call; tools/read models answer run/output/status queries.
2. `synchronous_consult` — small bounded Task Agent call for template selection or task review.
3. `visible_design_session` — user-facing task design/review mode.
4. `async_background_design` — OpenClaw named sub-agent/background session for larger task design, portfolio review, or prompt evaluation.
5. `assessment_job` — Task Assessment / portfolio report / prompt evaluation job.
### §4A.5 Read-model-only uses
Task Agent SHOULD use deterministic read models without LLM calls for:
```text
Where is this run?
What module is running now?
Why is it waiting?
What did module X output?
Retrieve final output.
Open Run Inspector.
List artifacts from run.
```
Representative tools:
```ts
task_runs.inspect(run_id)
task_artifacts.retrieve(run_id, artifact_query)
task_graph.explain(task_id)
task_runs.current_status(task_id)
task_outputs.find(task_id_or_run_id, query)
```
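One possible way to enforce the read-model preference is sketched below. It assumes the caller has the `may_run_without_llm` and `may_spawn_background_session` flags from the entrypoint descriptor; `resolveInvocationMode`, `ModeResolution`, and the reason strings are hypothetical, and treating `may_run_without_llm` as "always prefer the read model" is an assumption of this sketch.

```typescript
type TaskAgentInvocationMode =
  | "read_model_only"
  | "synchronous_consult"
  | "visible_design_session"
  | "async_background_design"
  | "assessment_job";

// Subset of TaskAgentEntrypointDescriptor fields used by this sketch.
type EntrypointFlags = {
  may_run_without_llm: boolean;
  may_spawn_background_session: boolean;
};

// Hypothetical result shape for illustration only.
type ModeResolution =
  | { ok: true; mode: TaskAgentInvocationMode }
  | { ok: false; reason: string };

function resolveInvocationMode(
  flags: EntrypointFlags,
  requested: TaskAgentInvocationMode,
): ModeResolution {
  // Assumption: when the entrypoint can be answered deterministically, skip the LLM.
  if (flags.may_run_without_llm) return { ok: true, mode: "read_model_only" };
  if (requested === "read_model_only") {
    return { ok: false, reason: "entrypoint_requires_llm" };
  }
  if (requested === "async_background_design" && !flags.may_spawn_background_session) {
    return { ok: false, reason: "background_session_not_allowed" };
  }
  return { ok: true, mode: requested };
}
```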
---
## §4B — Task Agent system-agent registration and runtime profile
### §4B.1 System agent identity
Task Agent MUST be registered in the System Agent Identity Registry as a named system agent.
```ts
export type SystemAgentIdentityRegistryEntry_TaskAgent = {
agent_id: "task_agent";
display_name: "Task Agent";
user_facing_aliases: ["Task Agent"];
owning_spec_doc: "DOC23 Addenda B";
purpose: "Design, adapt, inspect, explain, assess, and improve saved tasks, task templates, task graphs, task runs, task prompts, and task-system usage.";
lifecycle_state: "staged" | "active" | "disabled" | "deprecated";
required_execution_profile: "system_agent_task_design";
declared_dependencies: [
"DOC23",
"DOC23 Addenda A",
"DOC24",
"DOC72",
"DOC25",
"DOC73",
"DOC8",
"BDSM",
"DOC11",
"DOC17",
"EC Core"
];
capability_registered: true;
schema_version: "1.0";
};
```
### §4B.2 Capability registry entry
Because Task Agent is invokable, DOC24 MUST register it as an agent capability.
```ts
export type TaskAgentCapabilityRegistryEntry = {
capability_id: "agent.task_agent";
capability_type: "agent";
implemented_by_agent_id: "task_agent";
semantic_actions: TaskAgentEntrypoint[];
owner_doc: "DOC24";
detailed_contract_ref: "DOC23 Addenda B";
availability_source: "doc11_named_agent_runtime_truth" | "ec_agent_registry" | "both";
health_state: "healthy" | "degraded" | "unavailable" | "unknown";
schema_version: "1.0";
};
```
### §4B.3 Runtime profile
Task Agent requires user-configurable runtime settings. These settings control Task Agent's own design/review behavior. They do not apply to the model settings of modules inside a saved task unless explicitly selected for those modules.
```ts
export type TaskAgentRuntimeProfile = {
agent_id: "task_agent";
default_model_ref: string;
default_think_level: "low" | "medium" | "high" | "max";
fallback_model_refs: string[];
model_policy: {
allow_cloud: boolean;
allow_local: boolean;
require_local_for_sensitivity_tags: string[];
prefer_high_reasoning_for_graph_design: boolean;
};
cost_policy: {
max_design_cost_usd: number | null;
max_assessment_cost_usd: number | null;
max_prompt_eval_cost_usd: number | null;
max_repair_iterations: number;
};
invocation_policy: {
allow_invisible_consult: boolean;
allow_background_design: boolean;
require_user_confirmation_before_save: boolean;
require_user_confirmation_before_run: boolean;
allow_prompt_evaluation_tasks: boolean;
};
context_policy: {
default_context_mode:
| "isolated_with_task_pack"
| "fork_if_safe"
| "manual_only";
max_context_pack_tokens: number;
allow_tkp_retrieval: boolean;
allow_doc72_task_memory: boolean;
allow_doc8_bdsm_learning_overlay: boolean;
allow_doc17_prompt_advisor_service: boolean;
};
schema_version: "1.0";
};
```
### §4B.4 Settings UI
```text
Settings > Agents > Task Agent
──────────────────────────────
Default model: [GPT-5.5 Pro ▾]
Think level: [High ▾]
Fallback chain: [Claude Sonnet 4.5 → Gemini 2.5 Pro ▾]
Use local-only when required: [When policy requires ▾]
Max design iterations: [3]
Max design cost: [$5.00]
Max assessment cost: [$10.00]
Max prompt-eval cost: [$15.00]
Auto-consult policy: [Only when runtime routing marks high/explicit ▾]
Show task suggestion chips: [On]
Create visible receipts: [Always]
Allow background design jobs: [On]
Allow prompt evaluation tasks: [On]
[Save] [Open Task Agent diagnostics]
```
### §4B.5 Fallback behavior
Task Agent fallback model use MUST be recorded in the Task Agent trace. High-risk graph changes, prompt promotions, side-effecting task creation, and template/preset mutations require validation/preflight after fallback model use.
```ts
export type TaskAgentModelFallbackReceipt = {
receipt_id: string;
task_agent_request_id: string;
primary_model_ref: string;
fallback_model_ref: string;
fallback_reason:
| "primary_unavailable"
| "primary_auth_failed"
| "primary_cost_blocked"
| "primary_context_exceeded"
| "policy_required_local"
| "timeout"
| "unknown";
output_requires_extra_validation: boolean;
created_at: string;
schema_version: "1.0";
};
```
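A minimal sketch of how a receipt might be built so that the validation flag follows the rule in §4B.5. The `TaskAgentOperation` labels and `buildFallbackReceipt` are hypothetical, and the ISO-8601 format for `created_at` is an assumption; the receipt fields mirror the schema above.

```typescript
type FallbackReason =
  | "primary_unavailable"
  | "primary_auth_failed"
  | "primary_cost_blocked"
  | "primary_context_exceeded"
  | "policy_required_local"
  | "timeout"
  | "unknown";

// Hypothetical labels for the four operation classes that require validation after fallback.
type TaskAgentOperation =
  | "high_risk_graph_change"
  | "prompt_promotion"
  | "side_effecting_task_creation"
  | "template_or_preset_mutation"
  | "other";

type TaskAgentModelFallbackReceipt = {
  receipt_id: string;
  task_agent_request_id: string;
  primary_model_ref: string;
  fallback_model_ref: string;
  fallback_reason: FallbackReason;
  output_requires_extra_validation: boolean;
  created_at: string;
  schema_version: "1.0";
};

function buildFallbackReceipt(input: {
  receipt_id: string;
  task_agent_request_id: string;
  primary_model_ref: string;
  fallback_model_ref: string;
  fallback_reason: FallbackReason;
  operation: TaskAgentOperation;
  now: Date;
}): TaskAgentModelFallbackReceipt {
  return {
    receipt_id: input.receipt_id,
    task_agent_request_id: input.task_agent_request_id,
    primary_model_ref: input.primary_model_ref,
    fallback_model_ref: input.fallback_model_ref,
    fallback_reason: input.fallback_reason,
    // Every listed high-risk operation class needs validation/preflight after fallback.
    output_requires_extra_validation: input.operation !== "other",
    created_at: input.now.toISOString(), // assumption: timestamps are ISO-8601 strings
    schema_version: "1.0",
  };
}
```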
---
## §4C — OpenClaw and DOC11 alignment
### §4C.1 DOC11 role
DOC11 owns OpenClaw runtime truth: named-agent availability, model catalogs, provider auth, fallback models, sub-agent settings, `sessions_spawn` behavior, fork/isolated context support, runtime diagnostics, and no-phantom-control readback.
Addenda B consumes DOC11 truth. It MUST NOT invent a parallel OpenClaw runtime registry.
### §4C.2 Task Agent as named OpenClaw agent
If Task Agent runs as an OpenClaw named agent, DOC11 MUST expose:
```text
task_agent installed/registered state
task_agent runtime health
task_agent model profile
task_agent fallback chain
task_agent workspace/profile resolution
sessions_spawn availability
forked context availability and parentForkMaxTokens behavior
child session run status
sub-agent completion events
session archival/cleanup state
```
### §4C.3 Task Agent context mode
Default Task Agent context mode SHOULD be:
```text
isolated_with_task_pack
```
Task Agent receives a DOC24-assembled task-scoped context pack rather than the full parent transcript.
Use `fork_if_safe` only when:
```text
The current conversation transcript itself is needed to design the task;
parentForkMaxTokens constraints are satisfied;
privacy/policy permits fork;
DOC24 cannot produce an adequate task-scoped context pack alone.
```
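The conditions above can be read as a single conjunction. The sketch below assumes all four must hold and models the `parentForkMaxTokens` constraint as a simple token-count comparison; `ForkConditions` and `resolveContextMode` are hypothetical names, and the `manual_only` mode is deliberately left out.

```typescript
type ResolvedContextMode = "isolated_with_task_pack" | "fork_if_safe";

// Hypothetical inputs; real values would come from DOC11 runtime truth and DOC24.
type ForkConditions = {
  transcript_needed_for_design: boolean;
  parent_transcript_tokens: number;
  parent_fork_max_tokens: number;
  policy_permits_fork: boolean;
  task_pack_is_adequate: boolean;
};

function resolveContextMode(c: ForkConditions): ResolvedContextMode {
  const forkAllowed =
    c.transcript_needed_for_design &&
    c.parent_transcript_tokens <= c.parent_fork_max_tokens && // assumption: constraint is a token cap
    c.policy_permits_fork &&
    !c.task_pack_is_adequate;
  // Fall back to the default isolated mode whenever any condition fails.
  return forkAllowed ? "fork_if_safe" : "isolated_with_task_pack";
}
```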
### §4C.4 TaskAgentContextPack schema
```ts
export type TaskAgentContextPack = {
pack_id: string;
request_ref: string;
entrypoint: TaskAgentEntrypoint;
user_intent_summary: string;
current_conversation_excerpt_ref?: StorageRef;
relevant_task_templates: CompactTaskTemplateCard[];
relevant_module_presets: CompactModulePresetCard[];
relevant_task_invocation_directives: CompactTaskInvocationDirectiveCard[];
relevant_tkp_card_refs: string[];
work_context: {
matter_refs: EntityRef[];
entity_refs: EntityRef[];
goal_refs: GoalRef[];
active_context_ref?: string;
};
artifact_context: {
source_document_refs: DocumentRef[];
prior_work_product_refs: WorkProductRef[];
prior_task_run_refs: string[];
};
capability_context: {
available_capability_refs: string[];
unavailable_capability_refs: string[];
policy_blocked_capability_refs: string[];
model_availability_summary_ref?: StorageRef;
};
constraints: {
memory_directive_refs: string[];
security_policy_refs: string[];
artifact_policy_refs: string[];
cost_policy_refs: string[];
};
schema_version: "1.0";
};
```
### §4C.5 Native OpenClaw monitoring versus saved-task monitoring
The term “monitored” MUST be disambiguated. Use the following boundary:
| Need | Use |
|---|---|
| Know whether spawned session is alive, completed, failed, timed out, or archived | OpenClaw native session/sub-agent monitoring |
| Run one side lookup or background helper while Elnor continues conversation | OpenClaw `sessions_spawn` / specialist sub-agent |
| Run recurring infrastructure maintenance, nightly compilation, or background system job | EC scheduler/orchestrator |
| Run a repeatable visible process with modules, checkpoints, outputs, gates, artifacts, task memory, and Run Inspector | Saved task / tracked workflow |
| Monitor an external condition and then execute a structured process | EC trigger/schedule launching a saved task |
| Watch something simple and announce if it changes | OpenClaw watcher or EC job unless workflow structure/artifacts/gates are needed |
Saved tasks SHOULD be used for process monitoring that needs durable workflow state, graph visibility, outputs, artifacts, gates, or auditability. Saved tasks SHOULD NOT be used merely for OpenClaw heartbeat/session monitoring or simple background watchers.
### §4C.6 Task Agent sub-agent policy
Task Agent itself may spawn sub-agents only under explicit policy. The default SHOULD be conservative.
```ts
export type TaskAgentSubagentPolicy = {
allow_subagent_spawn: boolean;
allowed_named_agent_ids: string[];
max_child_sessions: number;
max_depth: number;
max_child_cost_usd: number | null;
context_mode_default: "isolated" | "fork_if_safe";
require_sub_agent_context_pack: boolean;
on_cap_hit: "abort" | "continue_with_warning";
allowed_entrypoints_for_subagents: TaskAgentEntrypoint[];
schema_version: "1.0";
};
```
Default:
```text
ordinary task design: allow_subagent_spawn = false
portfolio assessment: allow_subagent_spawn = true if configured
heavy template audit: allow_subagent_spawn = true if configured
prompt evaluation job: allow_subagent_spawn = true if configured
```
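A spawn check against this policy might look like the following sketch. It uses a subset of `TaskAgentSubagentPolicy`; `SpawnRequest`, `SpawnDecision`, `decideSpawn`, and the reason strings are hypothetical, and interpreting depth as "parent depth plus one must not exceed `max_depth`" is an assumption.

```typescript
// Subset of TaskAgentSubagentPolicy used by this sketch.
type SubagentPolicyLite = {
  allow_subagent_spawn: boolean;
  allowed_named_agent_ids: string[];
  max_child_sessions: number;
  max_depth: number;
  require_sub_agent_context_pack: boolean;
  on_cap_hit: "abort" | "continue_with_warning";
};

// Hypothetical request shape for illustration.
type SpawnRequest = {
  named_agent_id: string;
  active_child_sessions: number;
  parent_depth: number;
  has_context_pack: boolean;
};

type SpawnDecision =
  | { allowed: true }
  | { allowed: false; reason: string; cap_action?: "abort" | "continue_with_warning" };

function decideSpawn(policy: SubagentPolicyLite, req: SpawnRequest): SpawnDecision {
  if (!policy.allow_subagent_spawn) return { allowed: false, reason: "spawn_disabled" };
  if (policy.allowed_named_agent_ids.indexOf(req.named_agent_id) === -1) {
    return { allowed: false, reason: "agent_not_allowed" };
  }
  if (policy.require_sub_agent_context_pack && !req.has_context_pack) {
    return { allowed: false, reason: "context_pack_required" };
  }
  // Cap hits carry the policy's on_cap_hit value so the caller can abort or warn.
  if (req.active_child_sessions >= policy.max_child_sessions) {
    return { allowed: false, reason: "max_child_sessions", cap_action: policy.on_cap_hit };
  }
  if (req.parent_depth + 1 > policy.max_depth) {
    return { allowed: false, reason: "max_depth", cap_action: policy.on_cap_hit };
  }
  return { allowed: true };
}
```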
### §4C.7 OpenClaw runtime trace bridge
DOC11/OpenClaw may emit native runtime artifacts that are relevant to task observability but are not, by themselves, DOC23 graph events. These include sub-agent spawn acceptance, child session keys, completion/failure/timeout events, model/fallback resolution, forked-vs-isolated context mode, structured execution-item progress, provider/tool runtime diagnostics, `sessions_yield` / hidden-follow-up orchestration payloads where applicable, and trajectory/export references when policy permits.
DOC23 Addenda B SHALL consume those native events through an adapter rather than claiming ownership of OpenClaw runtime truth.
```ts
export type OpenClawTaskRuntimeTraceBridge = {
bridge_id: string;
task_id?: string;
run_id?: string;
task_agent_consult_id?: string;
openclaw_session_key?: string;
openclaw_child_session_key?: string;
openclaw_run_id?: string;
native_event_kind:
| "sessions_spawn_accepted"
| "subagent_completed"
| "subagent_failed"
| "subagent_timed_out"
| "model_fallback_applied"
| "context_mode_resolved"
| "structured_progress_item"
| "sessions_yield"
| "trajectory_export_created"
| "runtime_diagnostic";
native_event_ref: StorageRef;
mapped_task_trace_span_ref?: StorageRef;
mapping_status:
| "mapped"
| "stored_native_ref_only"
| "blocked_by_policy"
| "unavailable"
| "degraded";
degraded_reason_codes: string[];
created_at: string;
schema_version: "1.0";
};
```
Normative rule: DOC23 consumes OpenClaw runtime traces for task observability, but DOC11/OpenClaw remains the owner of native runtime truth. If an OpenClaw event cannot be mapped safely into a TaskTraceSpan, DOC23 stores a native-event reference and degraded reason instead of inventing a span.
This bridge is relevant to the entire task system, not only Task Agent. It lets Run Inspector, Task Assessment, and Task Agent understand native sub-agent/background execution without creating a second runtime ledger.
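The "store a reference instead of inventing a span" rule could be implemented along these lines. This is a sketch only: `StorageRef` is assumed to be a string alias here, and `NativeEventInput`, `BridgeOutcome`, `bridgeNativeEvent`, and the reason codes are hypothetical.

```typescript
type StorageRef = string; // assumption: the real StorageRef type is defined elsewhere

type MappingStatus =
  | "mapped"
  | "stored_native_ref_only"
  | "blocked_by_policy"
  | "unavailable"
  | "degraded";

// Hypothetical input shape for illustration.
type NativeEventInput = {
  native_event_kind: string;
  native_event_ref: StorageRef;
  policy_allows_mapping: boolean;
};

// Subset of OpenClawTaskRuntimeTraceBridge produced by this sketch.
type BridgeOutcome = {
  native_event_ref: StorageRef;
  mapped_task_trace_span_ref?: StorageRef;
  mapping_status: MappingStatus;
  degraded_reason_codes: string[];
};

function bridgeNativeEvent(
  event: NativeEventInput,
  mapToSpan: (e: NativeEventInput) => StorageRef | null,
): BridgeOutcome {
  if (!event.policy_allows_mapping) {
    return {
      native_event_ref: event.native_event_ref,
      mapping_status: "blocked_by_policy",
      degraded_reason_codes: ["policy_blocked"],
    };
  }
  const spanRef = mapToSpan(event);
  if (spanRef === null) {
    // No safe mapping: keep the native reference and record why, never a made-up span.
    return {
      native_event_ref: event.native_event_ref,
      mapping_status: "stored_native_ref_only",
      degraded_reason_codes: ["no_safe_span_mapping"],
    };
  }
  return {
    native_event_ref: event.native_event_ref,
    mapped_task_trace_span_ref: spanRef,
    mapping_status: "mapped",
    degraded_reason_codes: [],
  };
}
```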
### §4C.8 Workflow formalization continuum
As LLMs improve, Elnor will be increasingly capable of long multi-step work directly in chat or native OpenClaw execution. Saved tasks remain valuable where formalization adds product value.
```text
Level 0 — Direct chat response
Ordinary answers, quick analysis, interactive reasoning.
Level 1 — Direct chat with internal plan
Multi-step reasoning where user remains actively engaged.
Level 2 — Native OpenClaw agentic run / structured execution items
Long in-session action sequence where progress visibility helps, but no saved graph/template is needed.
Level 3 — Ad hoc saved/tracked task
One request needs independent execution, artifacts, gates, progress, or audit trail.
Level 4 — Saved task template
Process will recur or should be reusable.
Level 5 — Standing procedure / recurring task invocation directive
Semantic trigger should repeatedly launch, suggest, or adapt a saved task.
```
Native OpenClaw structured progress items are a runtime-progress surface. Saved tasks are durable process objects. Use native OpenClaw progress when Elnor is doing direct or in-session work and no reusable graph/template, durable artifact policy, independent task lifecycle, Run Inspector, or template reuse is needed. Use saved tasks when formalization adds durable value: repeatability, graph visibility, independent/background execution, scheduled/triggered state, human gates, artifact tracking, telemetry, task memory, template reuse, or Task Assessment.
Task use should be justified by formalization value, not by model incapability.
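As an illustration of the continuum, the sketch below picks the highest level whose justification is present. The `FormalizationSignals` fields and `formalizationLevel` are hypothetical simplifications of the criteria listed above, not a normative routing algorithm.

```typescript
// Hypothetical boolean signals summarizing the criteria for each level.
type FormalizationSignals = {
  multi_step: boolean;                     // Level 1: internal plan, user stays engaged
  long_in_session_run: boolean;            // Level 2: progress visibility helps
  needs_durable_process: boolean;          // Level 3: independent execution, artifacts, gates, audit
  will_recur: boolean;                     // Level 4: reusable template
  has_recurring_semantic_trigger: boolean; // Level 5: standing procedure / directive
};

type FormalizationLevel = 0 | 1 | 2 | 3 | 4 | 5;

function formalizationLevel(s: FormalizationSignals): FormalizationLevel {
  // Check from most to least formal so the strongest justification wins.
  if (s.has_recurring_semantic_trigger) return 5;
  if (s.will_recur) return 4;
  if (s.needs_durable_process) return 3;
  if (s.long_in_session_run) return 2;
  if (s.multi_step) return 1;
  return 0;
}
```

Note that no signal refers to model capability: the level rises only when formalization itself adds value.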
---
## §5 — Task creation and onboarding flow
### §5.1 Purpose
Task creation is the highest-leverage self-learning surface in this addendum. The system must learn how to convert underspecified user intent into reliable, high-value, reusable DOC23 tasks.
### §5.2 Required creation flow
Task Agent task creation follows this sequence:
```text
1. Capture user intent.
2. Resolve task type, domain, work context, entities, goals, and output expectations.
3. Search existing task templates, module presets, design patterns, and prior design cases.
4. Query DOC24 for available capabilities, connectors, MCP servers, procedures, skills, agents, and models.
5. Retrieve relevant Task Design Doctrine, pattern cards, user preferences, goal links, and worked examples.
6. Ask high-value missing-slot questions only when needed.
7. Produce Task Blueprint.
8. Generate or adapt graph.
9. Produce capability binding plan, artifact intake policy, memory policy, side-effect/human-review plan, and assessment plan.
10. Run validation/preflight.
11. Revise proposal based on validation/preflight.
12. Present proposal with rationale and alternatives.
13. Save through EC command only after user approval or policy-authorized confirmation.
14. Record TaskCreationSessionTrace for learning.
```
### §5.3 Optional onboarding-style conversation
For complex tasks, Task Agent may run a structured but natural onboarding-style conversation. This must not be a rigid form. The system should know which slots matter; the LLM decides how to ask.
Example onboarding slots:
```ts
export type TaskCreationOnboardingSlots = {
task_goal?: string
task_type?: string
work_context?: EntityRef[]
source_materials?: SourceMaterialRef[]
expected_outputs?: OutputArchetype[]
success_criteria?: QualityCriterion[]
required_steps?: StepIntent[]
optional_steps?: StepIntent[]
review_tolerance?: "low_review" | "normal_review" | "high_review"
autonomy_tolerance?: "manual" | "assisted" | "mostly_auto"
side_effects_allowed?: SideEffectDeclaration[]
capability_needs?: CapabilityNeed[]
artifact_memory_preferences?: ArtifactPreference[]
budget_latency_preferences?: BudgetLatencyPreference
schema_version: "1.0"
}
```
### §5.4 Question policy
Task Agent should ask questions only where the answer changes the design materially. It should learn question utility through DOC8/BDSM.
Examples:
| Context | High-value question | Usually low-value question |
|---|---|---|
| Complaint evaluation | “Do you want issue matrix, memo, or both?” | “Which model should I use?” |
| Drafting legal document | “Should this include red-team and revision?” | “Should I use an agent module?” |
| Email-triggered attachment task | “Should attachments be saved/ingested or only summarized?” | “Should the email trigger emit body_out?” |
| Side-effecting task | “Should external sends require approval?” | “Should I add a signal_out?” |
### §5.5 User-facing proposal panel
Task Agent must show task proposals in business/user terms first, graph terms second.
```text
Proposed task: Draft Complaint
Goal:
Draft a filing-ready complaint from selected source documents.
I found:
- Existing template: Complaint Drafting Template v3
- Useful presets: Damages Memo Agent, Complaint Red-Team Judge
- Relevant preference: complex legal analysis should include second-pass deepening
Recommended process:
1. Source document intake and classification
2. Fact extraction
3. Claims and damages memo
4. Complaint draft
5. Second-pass "look deeper" review
6. Red-team review
7. Revision and final package
Outputs:
- Final complaint draft
- Damages analysis memo
- Red-team report
- Revision summary
Memory/artifacts:
- Final complaint: save as work product/document
- Damages memo: suggest making findable later
- Raw intermediate drafts: keep with run only
[Inspect template] [Edit blueprint] [Generate graph] [Run preflight] [Save]
```
### §5.6 TaskDesignIntakeBundle
Task creation must produce a structured bundle, not merely a graph. This mirrors DOC3's principle that demonstrations produce multi-output knowledge bundles rather than single procedures, while remaining DOC23-owned and task-specific.
```ts
export type TaskDesignIntakeBundle = {
bundle_id: string
design_session_id: string
user_intent_text: string
resolved_intent_summary: string
inferred_task_type: string
work_context_refs: EntityRef[]
matter_or_project_refs: EntityRef[]
goal_refs: GoalRef[]
template_match_report_ref?: StorageRef
selected_template_refs: string[]
selected_preset_refs: string[]
considered_pattern_refs: string[]
considered_case_refs: TaskDesignCaseRef[]
task_blueprint_ref: StorageRef
graph_proposal_ref: StorageRef
capability_plan_ref: StorageRef
artifact_intake_policy_ref: StorageRef
telemetry_memory_policy_ref: StorageRef
side_effect_review_plan_ref: StorageRef
preflight_report_ref?: StorageRef
extracted_user_preferences: TaskDesignPreferenceCandidate[]
extracted_goal_links: TaskGoalLinkCandidate[]
extracted_template_routing_directives: TemplateRoutingDirective[]
extracted_design_learning_signals: TaskAgentDesignLearningSignal[]
schema_version: "1.0"
}
```
The saved task graph is only one output of the design intake process. The bundle is the learning unit for task creation.
### §5.7 DOC24/DOC3-style task creation intake posture
Task creation should use DOC24's onboarding philosophy: the LLM asks naturally, but the system knows which slots matter, what can be auto-resolved, and when readiness is sufficient. It should also borrow the DOC3 semantic-learning principle: preserve semantic intent, rationale, preferences, constraints, vocabulary, goals, and reusable patterns instead of preserving only mechanical graph operations.
The Task Agent must detect and preserve statements such as:
```text
"For complex analysis, I like a second pass that says look deeper."
"When reviewing a complaint, I usually want an issue matrix, not just a memo."
"Don't use PACER unless docket history matters."
"Make damages memos findable because I reuse them."
```
Those statements should become preference candidates, goal links, pattern-card updates, template-routing directives, artifact-policy candidates, or Task Agent instruction candidates as appropriate. They must not disappear as ordinary chat text.
---
## §5A — Visible Elnor-created saved tasks
### §5A.1 Rule
Elnor may create or instantiate a saved task through Task Agent when the task fit is high or explicit, but all created tasks must be visible, inspectable, editable, cancelable, and receipt-backed.
Hidden saved-task graphs are forbidden. Any DOC23 graph that is created internally for execution MUST be surfaced as a saved/tracked task with a visible receipt, graph link, run/status surface, edit controls, and cancellation/disable controls.
### §5A.2 Creation modes
```ts
export type TaskCreationMode =
| "user_authored"
| "task_agent_proposed"
| "elnor_created_tracked"
| "template_instantiated"
| "scratch_draft_not_run";
```
### §5A.3 TrackedTaskReceipt
```ts
export type TrackedTaskReceipt = {
receipt_id: string;
task_id: string;
run_id?: string;
creation_mode: TaskCreationMode;
visible_to_user: true;
reason_for_task_mode: string;
graph_inspector_url: string;
run_inspector_url?: string;
can_cancel: boolean;
can_pause: boolean;
can_edit: boolean;
can_convert_to_template: boolean;
doc72_task_memory_ref?: string;
created_at: string;
schema_version: "1.0";
};
```
### §5A.4 Elnor-created task semantic trace
Every Elnor-created saved task MUST produce semantic invocation memory. This trace links the originating request, TaskModeDecision, Task Agent design session, resulting task, visible receipt, user-visible rationale, work context, entities, goals, and any TaskInvocationDirective candidate.
```ts
export type ElnorCreatedTaskSemanticTrace = {
trace_id: string;
originating_user_request_ref: string;
task_mode_decision_ref: string;
task_agent_design_session_ref?: string;
resulting_task_id: string;
resulting_task_template_id?: string;
resulting_task_receipt_ref: string;
semantic_request_summary: string;
why_task_mode_was_selected: string;
user_visible_description: string;
linked_task_invocation_directive_candidate_id?: string;
linked_doc72_goal_refs: string[];
linked_entity_refs: string[];
linked_work_context_refs: string[];
created_at: string;
schema_version: "1.0";
};
```
This trace supports later recognition of similar requests and prevents Elnor-created tasks from becoming disconnected from the semantic user request that produced them.
### §5A.5 User-facing examples
```text
I set this up as a saved task because it needs to run independently and preserve outputs. It will check the docket, retrieve new filings, route documents through ingestion, summarize them, flag deadlines, and ask before sending anything externally.
[View task] [Edit steps] [Run now] [Disable]
```
For high-stakes side effects:
```text
I drafted the saved task but have not run it yet because it can send external messages. Review the graph and approval gates first.
```
---
## §6 — Task Blueprint
### §6.1 Definition
Task Blueprint is the semantic intent contract behind a DOC23 graph. It records what the task is trying to accomplish, why the graph is structured as it is, what outputs are expected, what quality criteria matter, what capabilities are needed, what review/side-effect rules apply, and what memory/artifact behavior should occur.
A graph is executable. A blueprint is explanatory, evaluative, reusable, and learnable.
### §6.2 Blueprint is not a graph
The Blueprint must not duplicate cable-level execution. It captures user intent, process logic, design rationale, goals, quality criteria, and graph-module bindings.
### §6.3 TaskBlueprint schema
```ts
export type TaskBlueprint = {
blueprint_id: string
name: string
description?: string
business_or_personal_goal: string
task_type: string
domain?: string
document_archetypes: string[]
work_context_refs: EntityRef[]
matter_or_project_refs: EntityRef[]
goal_refs: GoalRef[]
expected_inputs: InputContract[]
expected_outputs: OutputContract[]
step_intents: StepIntent[]
quality_criteria: QualityCriterion[]
human_review_points: ReviewPoint[]
side_effects: SideEffectDeclaration[]
capability_needs: CapabilityNeed[]
model_strategy?: ModelStrategy
memory_and_artifact_policy: TaskArtifactIntakePolicy
telemetry_policy: TaskTelemetryPolicy
assessment_plan: TaskAssessmentPlan
security_profile_ref?: string
retention_policy_ref?: string
template_binding?: TaskTemplateBinding
generated_graph_ref?: StorageRef
current_graph_version?: number
blueprint_version: string
prior_version_ref?: StorageRef
authored_by: "user" | "task_agent" | "system" | "mixed"
created_at: string
updated_at: string
schema_version: "1.0"
}
```
### §6.4 StepIntent schema
```ts
export type StepIntent = {
step_intent_id: string
label: string
purpose: string
why_it_exists?: string
expected_input_refs: string[]
expected_output_refs: string[]
suggested_module_type?: string
bound_module_ids: string[]
required_capabilities: CapabilityNeed[]
optional_capabilities: CapabilityNeed[]
quality_checks: QualityCriterion[]
artifact_policy_override?: ArtifactIntakeLevel
goal_refs: GoalRef[]
design_rationale?: string
alternatives_considered?: string[]
schema_version: "1.0"
}
```
### §6.5 Graph binding
Each StepIntent may bind to zero, one, or multiple graph modules. A Blueprint step can map to a graph region.
Example:
```text
Blueprint step: “Assess damages theory”
-> source.file
-> step.agent_task
-> step.judge
-> optional human review
```
### §6.6 Blueprint versioning and diffs
Task Blueprint must support versioned diffs.
```ts
export type BlueprintDiff = {
diff_id: string
prior_blueprint_id: string
candidate_blueprint_id: string
changed_fields: BlueprintChangedField[]
step_intent_changes: StepIntentChange[]
output_contract_changes: OutputContractChange[]
capability_need_changes: CapabilityNeedChange[]
artifact_policy_changes: ArtifactPolicyChange[]
rationale: string
evidence_refs: StorageRef[]
schema_version: "1.0"
}
```
### §6.7 Blueprint-to-graph validation
Preflight must compare the Blueprint against the graph and flag mismatches, including:
- Blueprint expects a step but graph lacks bound module(s).
- Graph contains a side-effecting output not declared in Blueprint.
- Blueprint expects a final output but no output module/port emits it.
- Blueprint declares DOC25 ingestion but artifact policy does not route documents.
- Blueprint requires PACER but capability binding forbids PACER.
- Blueprint requires human review before external send but graph has no gate.
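Several of these checks can be sketched as a pure comparison function. `BlueprintLite`, `GraphLite`, `PreflightFinding`, `compareBlueprintToGraph`, and the finding codes are hypothetical, heavily simplified stand-ins for the real Blueprint and graph structures; the DOC25 and PACER checks are omitted.

```typescript
// Hypothetical, simplified projections of a Blueprint and a graph.
type BlueprintLite = {
  step_intents: { step_intent_id: string; bound_module_ids: string[] }[];
  declared_side_effect_output_ids: string[];
  expected_final_output_ids: string[];
  requires_review_before_external_send: boolean;
};

type GraphLite = {
  module_ids: string[];
  side_effecting_output_ids: string[];
  emitted_output_ids: string[];
  has_review_gate_before_external_send: boolean;
};

type PreflightFinding = { code: string; subject: string };

function compareBlueprintToGraph(b: BlueprintLite, g: GraphLite): PreflightFinding[] {
  const findings: PreflightFinding[] = [];
  const has = (list: string[], id: string): boolean => list.indexOf(id) !== -1;

  // Blueprint step references a module that the graph does not contain.
  b.step_intents.forEach((step) => {
    step.bound_module_ids.forEach((moduleId) => {
      if (!has(g.module_ids, moduleId)) {
        findings.push({ code: "bound_module_missing", subject: `${step.step_intent_id}:${moduleId}` });
      }
    });
  });
  // Graph has a side-effecting output the Blueprint never declared.
  g.side_effecting_output_ids.forEach((id) => {
    if (!has(b.declared_side_effect_output_ids, id)) {
      findings.push({ code: "undeclared_side_effect", subject: id });
    }
  });
  // Blueprint expects a final output that nothing in the graph emits.
  b.expected_final_output_ids.forEach((id) => {
    if (!has(g.emitted_output_ids, id)) {
      findings.push({ code: "final_output_not_emitted", subject: id });
    }
  });
  // Blueprint requires human review before external send, but the graph has no gate.
  if (b.requires_review_before_external_send && !g.has_review_gate_before_external_send) {
    findings.push({ code: "missing_review_gate", subject: "external_send" });
  }
  return findings;
}
```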
### §6.8 Blueprint and DOC72 goal nodes
Blueprint and DOC72 goal nodes are distinct.
```text
Task Blueprint = intent contract for one task/template.
DOC72 goal node = reusable objective/rationale in the memory matrix.
```
Blueprint should link to DOC72 goal nodes when design decisions serve reusable objectives.
Examples:
```text
Goal: Improve substantive depth in complex legal analysis.
Served by: second-pass deepening, red-team, judge, looped revision.
Goal: Preserve valuable intermediate analysis for later reuse.
Served by: artifact intake policy, DOC72 summary/reference, DOC25 work_product ingestion.
```
### §6.9 TaskDesignGoalPayload
Use existing DOC72 `goal` node kind. Do not create a new DOC72 node kind.
```ts
export type TaskDesignGoalPayload = {
goal_type:
| "task_outcome_goal"
| "process_quality_goal"
| "output_quality_goal"
| "artifact_memory_goal"
| "cost_latency_goal"
| "risk_control_goal"
| "user_preference_goal"
goal_statement: string
applies_to_context_class?: string
success_criteria: string[]
disfavored_tradeoffs: string[]
preferred_patterns: TaskDesignPatternRef[]
preferred_templates: TaskTemplateRef[]
preferred_module_presets: ModulePresetRef[]
source:
| "architect_authored"
| "user_stated"
| "learned_from_task_creation"
| "learned_from_task_assessment"
| "system_inferred"
evidence_refs: StorageRef[]
confidence: number
schema_version: "1.0"
}
```
### §6.10 Goal relationship patterns
DOC72 should support or accept relation labels such as:
```text
blueprint expresses_goal goal
design_pattern implements_goal goal
task_template serves_goal goal
module_preset supports_goal goal
task_run advanced_goal goal
task_assessment evaluated_goal goal
artifact satisfies_goal goal
module_design_decision serves_goal goal
user_preference constrains_goal goal
```
---
## §7 — Templates, presets, and reuse-first task creation
### §7.1 Definitions
| Object | Owner | Meaning |
|---|---|---|
| Module Preset | DOC23 | Saved config for one module type. |
| Task Template | DOC23 | Reusable executable graph/template. |
| Task Blueprint | DOC23 | Semantic intent/process contract behind a task/template. |
| Task Design Pattern | DOC23 + DOC72/DOC8 | Abstract reusable process motif. |
| Template Group | DOC23 | Optional bundle of multiple related task templates. |
Avoid introducing a new user-facing noun such as “Playbook” until the distinction earns its keep. If a package is needed, use “Template Group” or “Workflow Package” provisionally.
### §7.2 Reuse-first rule
Before proposing a new graph, Task Agent must search:
1. Task Template Semantic Cards.
2. Module Preset Semantic Cards.
3. Task Design Pattern Cards.
4. Prior Task Design Cases.
5. Similar Task Blueprints.
Task Agent must present strong matches unless the user explicitly asks for a blank/new task.
### §7.3 TaskTemplateSemanticCard
The executable template remains DOC23 canonical. The semantic card is a projection used for routing, explanation, reuse, and learning.
```ts
export type TaskTemplateSemanticCard = {
template_id: string
doc23_template_ref: StorageRef
canonical_name: string
description: string
task_type: string
domain_tags: string[]
document_archetypes: string[]
work_context_tags: string[]
trigger_phrases: string[]
semantic_lookup_phrases: string[]
use_conditions: string[]
non_use_conditions: string[]
expected_inputs: InputContract[]
expected_outputs: OutputContract[]
default_output_archetypes: OutputArchetype[]
blueprint_ref?: StorageRef
graph_snapshot_ref: StorageRef
module_summary: TemplateModuleSummary[]
module_preset_refs: string[]
capability_requirements: CapabilityNeed[]
connector_requirements: ConnectorRequirement[]
memory_artifact_policy_summary: string
evaluation_plan_summary?: string
known_strengths: string[]
known_limitations: string[]
user_preference_refs: string[]
goal_refs: GoalRef[]
last_used_at?: string
usage_count: number
assessment_summary?: TemplateAssessmentSummary
lifecycle_state:
| "draft"
| "active"
| "candidate_update"
| "deprecated"
| "archived"
graph_hash: string
blueprint_hash?: string
source_authority: KnowledgeAuthorityClass
schema_version: "1.0"
}
```
### §7.4 ModulePresetSemanticCard
```ts
export type ModulePresetSemanticCard = {
preset_id: string
doc23_preset_ref: StorageRef
canonical_name: string
module_type: string
description: string
role_in_workflow: string
trigger_phrases: string[]
semantic_lookup_phrases: string[]
use_conditions: string[]
non_use_conditions: string[]
expected_input_kinds: string[]
expected_output_kinds: string[]
output_shape_summary: string
required_capabilities: CapabilityNeed[]
optional_capabilities: CapabilityNeed[]
forbidden_capabilities: CapabilityNeed[]
compatible_task_types: string[]
incompatible_task_types: string[]
config_summary: string
why_this_preset_exists?: string
known_strengths: string[]
known_limitations: string[]
common_misuses: string[]
user_preference_refs: string[]
goal_refs: GoalRef[]
design_case_refs: TaskDesignCaseRef[]
usage_count: number
last_used_at?: string
assessment_summary?: PresetAssessmentSummary
config_hash: string
lifecycle_state:
| "draft"
| "active"
| "candidate_update"
| "deprecated"
| "archived"
source_authority: KnowledgeAuthorityClass
schema_version: "1.0"
}
```
### §7.5 TemplateMatchScore
```ts
export type TemplateMatchScore = {
template_id: string
semantic_intent_match: number
task_type_match: number
input_contract_match: number
output_contract_match: number
capability_availability_match: number
entity_context_match: number
user_preference_match: number
prior_assessment_score: number
recency_or_staleness_score: number
overall_score: number
hard_vetoes: MatchVeto[]
soft_penalties: MatchPenalty[]
explanation: string
recommended_action:
| "use_as_is"
| "customize"
| "fork"
| "create_new_from_pattern"
| "create_new_blank"
schema_version: "1.0"
}
```
Hard vetoes include:
- required capability unavailable;
- forbidden side effect present;
- wrong output class;
- archived/deprecated template;
- security policy conflict;
- memory/artifact policy conflict;
- materially wrong task type.
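Because hard vetoes disqualify a template regardless of score, ranking might filter first and sort second. The sketch below uses a subset of `TemplateMatchScore`; the `MatchVeto` shape, `ScoredTemplate`, and `rankEligibleTemplates` are hypothetical.

```typescript
type MatchVeto = { code: string }; // assumption: the real MatchVeto shape is defined elsewhere

// Subset of TemplateMatchScore used by this sketch.
type ScoredTemplate = {
  template_id: string;
  overall_score: number;
  hard_vetoes: MatchVeto[];
};

// Drops any template with a hard veto, then orders the rest by overall_score, best first.
function rankEligibleTemplates(scores: ScoredTemplate[]): ScoredTemplate[] {
  return scores
    .filter((s) => s.hard_vetoes.length === 0) // filter() copies, so the input is not mutated
    .sort((a, b) => b.overall_score - a.overall_score);
}
```

A high-scoring template with a single veto, such as an unavailable required capability, never reaches the proposal panel under this scheme.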
### §7.6 Template update proposals
```ts
export type TemplateUpdateProposal = {
proposal_id: string
target_kind: "task_template" | "module_preset" | "task_blueprint" | "semantic_card"
target_id: string
update_kind:
| "semantic_metadata"
| "blueprint_change"
| "graph_change"
| "module_config_change"
| "capability_binding_change"
| "artifact_policy_change"
| "evaluation_plan_change"
| "deprecation"
rationale: string
evidence_refs: StorageRef[]
supporting_signals: TaskDesignLearningSignal[]
proposed_patch_ref: StorageRef
prior_hash: string
candidate_hash: string
approval_required: boolean
rollback_ref?: StorageRef
lifecycle_state: "candidate" | "staged" | "approved" | "rejected" | "applied" | "rolled_back"
schema_version: "1.0"
}
```
### §7.7 Low/medium/high-risk update classes
| Risk | Examples | Approval |
|---|---|---|
| Low | add trigger phrase, improve description, add known limitation, update usage count | May auto-stage or auto-apply per policy. |
| Medium | add optional capability, change default output package, add optional judge/red-team step, alter artifact policy default | User or architect review. |
| High | rewire graph, remove module, change side-effecting output, change security/capability policy, promote optimized prompt | Explicit approval + versioning + rollback. |
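The approval column can be expressed as a small lookup. This is a sketch only: the field names in `ApprovalRequirement` and the helper `canApplyWithoutReview` are hypothetical, and treating "explicit approval" as implying a reviewer is an assumption.

```ts
type UpdateRiskClass = "low" | "medium" | "high";

// Hypothetical shape; the spec states these requirements only in prose.
type ApprovalRequirement = {
  may_auto_stage_or_apply: boolean; // still subject to policy
  needs_user_or_architect_review: boolean;
  needs_explicit_approval: boolean;
  needs_versioning: boolean;
  needs_rollback: boolean;
};

const APPROVAL_BY_RISK: Record<UpdateRiskClass, ApprovalRequirement> = {
  low: {
    may_auto_stage_or_apply: true,
    needs_user_or_architect_review: false,
    needs_explicit_approval: false,
    needs_versioning: false,
    needs_rollback: false,
  },
  medium: {
    may_auto_stage_or_apply: false,
    needs_user_or_architect_review: true,
    needs_explicit_approval: false,
    needs_versioning: false,
    needs_rollback: false,
  },
  high: {
    may_auto_stage_or_apply: false,
    needs_user_or_architect_review: true, // assumption: explicit approval implies a reviewer
    needs_explicit_approval: true,
    needs_versioning: true,
    needs_rollback: true,
  },
};

// Low-risk updates may skip review only when policy also allows it.
function canApplyWithoutReview(risk: UpdateRiskClass, policyAllowsAuto: boolean): boolean {
  return APPROVAL_BY_RISK[risk].may_auto_stage_or_apply && policyAllowsAuto;
}
```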
### §7.8 User teaching directives
```ts
TemplateRoutingDirective {
directive_id: string
template_id?: string
preset_id?: string
directive_kind:
| "prefer_when_phrase"
| "avoid_when_phrase"
| "prefer_for_task_type"
| "avoid_for_task_type"
| "prefer_output_archetype"
| "require_user_confirmation"
phrase_or_condition: string
source: "user_taught" | "task_agent_inferred" | "assessment_inferred"
confidence: number
approval_state: "active" | "candidate" | "rejected"
schema_version: "1.0"
}
```
Example:
```text
When Will says “complaint audit,” prefer Complaint Evaluation Template.
When Will asks only for “citation cleanup,” avoid Complaint Evaluation Template.
```
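The two taught rules above might be stored as directive records along these lines. The ids, the `template_id` value, and the confidence of `1.0` are assumptions made for illustration, and the substring matcher is deliberately naive: it cannot express the "asks only for" qualifier in the second rule.

```ts
// Trimmed copy of TemplateRoutingDirective from §7.8.
type RoutingDirectiveLite = {
  directive_id: string;
  template_id?: string;
  directive_kind: "prefer_when_phrase" | "avoid_when_phrase";
  phrase_or_condition: string;
  source: "user_taught" | "task_agent_inferred" | "assessment_inferred";
  confidence: number;
  approval_state: "active" | "candidate" | "rejected";
  schema_version: "1.0";
};

// Hypothetical ids; only the phrases come from the example text.
const complaintDirectives: RoutingDirectiveLite[] = [
  {
    directive_id: "dir_001",
    template_id: "complaint_evaluation_template",
    directive_kind: "prefer_when_phrase",
    phrase_or_condition: "complaint audit",
    source: "user_taught",
    confidence: 1.0,
    approval_state: "active",
    schema_version: "1.0",
  },
  {
    directive_id: "dir_002",
    template_id: "complaint_evaluation_template",
    directive_kind: "avoid_when_phrase",
    phrase_or_condition: "citation cleanup",
    source: "user_taught",
    confidence: 1.0,
    approval_state: "active",
    schema_version: "1.0",
  },
];

// Naive matcher: case-insensitive substring test over active directives only.
function matchingDirectives(request: string, all: RoutingDirectiveLite[]): RoutingDirectiveLite[] {
  const text = request.toLowerCase();
  return all.filter(
    (d) => d.approval_state === "active" && text.indexOf(d.phrase_or_condition.toLowerCase()) !== -1
  );
}
```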
---
## §7A — Template Groups / Workflow Packages
### §7A.1 Purpose
Template Groups / Workflow Packages are optional aggregation objects for managing multiple related Task Templates and shared assets. They should not be used where a single Task Template is sufficient.
The audit rule is:
```text
Single reusable graph + blueprint + presets + capability requirements + evals + artifact policy
-> Task Template.
Multiple related templates with shared presets, shared rubrics, shared examples, shared capability setup, or cross-template learning
-> Template Group / Workflow Package.
```
### §7A.2 Distinction from Task Template
A Task Template answers: “What executable graph should I run or instantiate for this process?”
A Template Group answers: “What family of related templates, presets, examples, rubrics, and defaults should Task Agent use for this broader workflow area?”
A Template Group may designate one primary template. If it never grows beyond a single template and carries no shared cross-template assets, it should collapse back into the Task Template.
### §7A.3 TemplateGroup schema
```ts
TaskTemplateGroup {
group_id: string
display_name: string
description: string
domain_tags: string[]
task_type_tags: string[]
primary_template_id?: string
related_template_ids: string[]
shared_module_preset_ids: string[]
shared_blueprint_pattern_refs: TaskDesignPatternRef[]
shared_capability_requirements: CapabilityNeed[]
shared_connector_requirements: ConnectorRequirement[]
shared_artifact_policy_ref?: StorageRef
shared_assessment_plan_ref?: StorageRef
shared_eval_dataset_refs: string[]
shared_judge_rubric_refs: string[]
sample_input_refs: StorageRef[]
example_output_refs: StorageRef[]
known_limitations: string[]
usage_summary?: TemplateGroupUsageSummary
version: string
lifecycle_state: "draft" | "active" | "deprecated" | "archived"
schema_version: "1.0"
}
```
### §7A.4 User-facing experience
The Task Agent should not burden the user with this object unless it helps. The ordinary user experience should be:
```text
I found an existing Complaint Evaluation template that fits.
Related templates are also available in the same group:
- Draft Complaint
- Red-Team Complaint
- Revise Complaint
Recommendation:
Use Complaint Evaluation now. Keep the related drafting templates available if you want to turn this into a full complaint workflow.
[Use template] [Inspect related templates] [Customize] [Create new]
```
### §7A.5 What the group adds beyond the template page
A Template Group adds value only when it provides at least one of the following beyond a single preset/template view:
- cross-template setup guidance;
- shared capability setup, such as PACER + DOC25 + citation checker;
- shared output archetype defaults;
- shared evaluation rubrics;
- shared example inputs/outputs;
- cross-template assessment trends;
- recommended handoffs between templates;
- a consistent onboarding flow for a family of tasks;
- a place for Task Agent to learn design patterns across related templates without mutating each template separately.
### §7A.6 Merge-with-template rule
If a Template Group has only one template, no shared assets, no cross-template learning value, and no separate user-facing workflow function, EC should flag it for collapse into the underlying Task Template.
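The collapse check could be sketched as a pure predicate. `GroupCollapseInput` is a hypothetical summary shape: the two boolean flags have no counterpart in the `TaskTemplateGroup` schema and would have to be derived elsewhere.

```ts
// Hypothetical summary of a TaskTemplateGroup, prepared for the collapse check.
type GroupCollapseInput = {
  primary_template_id?: string;
  related_template_ids: string[];
  shared_asset_count: number; // presets, rubrics, datasets, examples, capability setup
  has_cross_template_learning: boolean; // assumed flag, not in the schema
  has_user_facing_workflow: boolean; // assumed flag, not in the schema
};

function shouldFlagForCollapse(g: GroupCollapseInput): boolean {
  // Count distinct templates, treating the primary template as a member.
  const ids =
    g.primary_template_id === undefined
      ? g.related_template_ids
      : g.related_template_ids.concat([g.primary_template_id]);
  const distinct = ids.filter((id, i) => ids.indexOf(id) === i);

  return (
    distinct.length <= 1 &&
    g.shared_asset_count === 0 &&
    !g.has_cross_template_learning &&
    !g.has_user_facing_workflow
  );
}
```

All four conditions must hold before the group is flagged, which matches the conjunctive wording of the rule.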
---
## §7B — DOC72 / DOC3 / DOC23 procedural boundary
### §7B.1 Four-layer procedural taxonomy
Task invocation overlaps with procedural memory and semantic skills, but it must not collapse into DOC3 skills.
```text
Procedural memory:
Soft guidance about how the user likes work handled.
Usually memory_directive.
Skill procedure:
Reusable app/tool interaction technique.
DOC3 owns lifecycle; stored as procedure nodes; delivered by DOC24.
Standing procedure:
Confirmed trigger-action behavioral automation.
DOC1/DOC72 governed.
Saved task / DOC23 task:
Hardened modular automation graph with modules, typed cables, gates, retries, telemetry, artifacts, and run history.
```
### §7B.2 TaskInvocationDirective
A TaskInvocationDirective is semantic routing/procedural guidance that says when Elnor should suggest, run, adapt, or design a saved task. It does not duplicate the task graph.
```ts
export type TaskInvocationDirective = {
directive_id: string;
canonical_name: string;
description: string;
trigger_phrases: string[];
semantic_lookup_phrases: string[];
use_conditions: string[];
non_use_conditions: string[];
invocation_policy:
| "suggest_task"
| "run_existing_task_after_confirmation"
| "invoke_task_agent_for_design"
| "invoke_task_agent_for_review"
| "answer_direct_but_offer_task_chip"
| "do_not_suggest_task";
target_task_template_refs: string[];
target_task_ids?: string[];
target_module_preset_refs?: string[];
task_agent_entrypoint?: TaskAgentEntrypoint;
pre_invocation_context_plan?: {
gather_matter_context: boolean;
gather_background_documents: boolean;
required_source_sets: string[];
doc25_required: boolean;
doc73_library_refs?: string[];
};
default_artifact_policy_ref?: string;
confirmation_policy:
| "never_auto_run"
| "confirm_before_run"
| "auto_run_if_user_explicit"
| "auto_run_if_standing_procedure";
prior_task_use_summary?: string;
evidence_refs: StorageRef[];
learned_from_refs: string[];
authority_source:
| "user_taught"
| "user_confirmed"
| "learned_from_acceptance"
| "learned_from_rejection"
| "architect_authored"
| "system_candidate";
confidence: number;
lifecycle_state:
| "candidate"
| "active"
| "suppressed"
| "deprecated";
schema_version: "1.0";
};
```
### §7B.3 TaskInvocationDirectiveCandidate
TaskInvocationDirective candidates are created from actual use, not from speculative prompt guesses. When Elnor receives a user request, chooses a saved task or Task Agent path, and the user accepts, rejects, edits, or repeats that path, EC SHOULD record a `TaskInvocationTrace`. Repeated or high-confidence traces MAY produce a `TaskInvocationDirectiveCandidate`.
```ts
export type TaskInvocationDirectiveCandidate = {
candidate_id: string;
source_trace_refs: string[];
source_task_ids: string[];
source_task_template_ids: string[];
source_task_agent_design_session_ids: string[];
user_request_patterns: string[];
inferred_trigger_phrases: string[];
inferred_semantic_lookup_phrases: string[];
proposed_directive_ref: StorageRef;
evidence_summary: string;
acceptance_count: number;
rejection_count: number;
edit_count_after_acceptance: number;
recommended_lifecycle:
| "memory_directive_candidate"
| "standing_procedure_candidate"
| "template_semantic_card_update"
| "suppress_task_suggestion_rule";
review_required: boolean;
schema_version: "1.0";
};
```
Examples:
```text
If the user repeatedly accepts “Use Complaint Evaluation task” after asking “complaint audit,” create a candidate directive mapping that phrase to the Complaint Evaluation template.
If the user repeatedly rejects task suggestions after “quick review,” create a suppression/update candidate for that context class.
If Elnor creates a visible saved task from “monitor Marex filings,” link the semantic request, resulting task, template/preset choices, and later run outcomes.
```
The directive candidate is not the executable graph. The executable task remains DOC23 canonical. The directive is semantic routing/procedural guidance stored as a DOC72 `memory_directive` candidate or, if it becomes a confirmed trigger-action behavior, a `standing_procedure` candidate.
### §7B.4 Storage classification
```text
One-off task-mode decision:
TaskInvocationTrace / execution_trace extension.
Soft routing preference:
memory_directive.
Confirmed conditional trigger-action routing:
standing_procedure.
Executable graph:
DOC23 task/template.
Task run:
execution_trace node linked to task, directive, template, entities, goals, artifacts, work products.
```
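The classification above can be read as a total function from record kind to storage class. The sketch below is one way to encode it; the `InvocationRecord` discriminants and the `doc23_task_or_template` label are names invented for this example.

```ts
type InvocationStorageClass =
  | "TaskInvocationTrace"
  | "memory_directive"
  | "standing_procedure"
  | "doc23_task_or_template"
  | "execution_trace";

// Hypothetical discriminated union describing what is being stored.
type InvocationRecord =
  | { kind: "one_off_decision" }
  | { kind: "routing_guidance"; confirmed_trigger_action: boolean }
  | { kind: "executable_graph" }
  | { kind: "task_run" };

function storageClassFor(r: InvocationRecord): InvocationStorageClass {
  if (r.kind === "one_off_decision") return "TaskInvocationTrace";
  if (r.kind === "routing_guidance") {
    // Soft preference stays a memory_directive; confirmed trigger-action
    // routing becomes a standing_procedure.
    return r.confirmed_trigger_action ? "standing_procedure" : "memory_directive";
  }
  if (r.kind === "executable_graph") return "doc23_task_or_template";
  return "execution_trace"; // task run
}
```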
### §7B.5 Boundary invariant
A DOC23 task template is not a DOC3 skill procedure. A TaskInvocationDirective may reference DOC23 task templates, DOC3 procedures, DOC24 capabilities, DOC72 goals, and user preferences, but the executable graph remains DOC23 canonical.
DOC3 graph-backed procedures execute through DOC24 direct injection from DOC72 contracts. DOC23 task graphs execute through DOC23/EC task runtime. Do not materialize DOC23 graphs as `SKILL.md` files.
### §7B.6 Example
User request pattern:
```text
Review this motion to dismiss and generate an opposition.
```
Potential directive:
```text
When Will asks for a full opposition-drafting workflow:
- resolve matter/work context;
- gather complaint, motion, exhibits, prior filings, source docs;
- search for existing Opposition Drafting task/template;
- if available, suggest or run it after confirmation;
- otherwise invoke Task Agent to design/adapt a saved task;
- route substantial documents through DOC25;
- save final draft as work product;
- preserve task run in DOC72 unless incognito.
```
---
## §8 — Task Agent education and Task Design Intelligence substrate
### §8.1 Purpose
Task Agent cannot be reliable if it merely receives DOC23 specs in a prompt. The task system is too detailed, too schema-sensitive, and too fast-evolving. Task Agent must be educated by a hybrid, versioned, self-updating substrate.
### §8.2 Education stack
Task Agent education consists of:
```text
A. Task Agent Core Contract
B. Task Design Doctrine
C. Generated Task System Knowledge Pack
D. Module/Port/Config/Validation Cards
E. Module/Config Rationale Cards
F. Design Pattern and Anti-Pattern Library
G. Template/Preset Semantic Index
H. Task Design Casebook
I. DOC72 Goal/Preference/Entity Links
J. DOC24 Capability/Connector/MCP/Model Live Registry
K. DOC8/BDSM Learning Bundles
L. Typed Task Agent Tools
M. Task Agent Design Eval Suite
```
### §8.3 Generated truth is necessary but insufficient
Generated cards from DOC23/code/spec answer:
```text
What exists?
What ports exist?
What config fields exist?
What validation rules exist?
What defaults exist?
```
They do not reliably answer:
```text
Why does this feature exist?
When should it be used?
What design problem does it solve?
What user preference makes it valuable?
Which prior task demonstrates it working?
What goal did it serve?
```
Therefore the Task System Knowledge Pack is hybrid.
### §8.4 TaskAgentCoreContract
```ts
TaskAgentCoreContract {
agent_id: "task_agent"
role_summary: string
hard_rules: string[]
command_permissions: string[]
forbidden_behaviors: string[]
required_tool_sequence_for_creation: string[]
degraded_mode_rules: string[]
schema_version: "1.0"
}
```
Core rules include:
- Search templates/presets before creating new graphs.
- Use tools/validators rather than memory alone.
- Propose, do not silently mutate.
- Do not bypass EC.
- Do not create hidden runtime orchestration.
- Validate/preflight before presenting executable proposals.
- Explain design choices in user-facing terms.
### §8.5 TaskDesignDoctrine
Human-authored design doctrine captures principles not derivable from code.
```ts
TaskDesignDoctrine {
doctrine_id: string
version: string
hard_principles: string[]
preferred_patterns: string[]
disfavored_patterns: string[]
approval_rules: string[]
examples: TaskDesignCaseRef[]
goal_refs: GoalRef[]
source_refs: SpecSourceRef[]
schema_version: "1.0"
}
```
Example doctrine entries:
```text
Prefer visible graph topology over hidden autonomy.
Prefer reuse of templates/presets before new graph creation.
For complex legal analysis, suggest second-pass deepening or red-team review.
For final legal work products, consider DOC25 ingestion and work_product linkage.
For side-effecting outputs, require or recommend human review before delivery.
For long-running tasks, prefer durable checkpoints/gates over implicit agent continuation.
```
### §8.6 TaskSystemKnowledgePack
```ts
TaskSystemKnowledgePack {
pack_id: string
doc23_version: string
generated_at: string
source_refs: SpecSourceRef[]
source_hashes: Record<string, string>
core_contract_ref: StorageRef
design_doctrine_refs: StorageRef[]
module_cards: TaskModuleCard[]
port_cards: PortSemanticsCard[]
config_cards: ModuleConfigCard[]
validation_cards: ValidationRuleCard[]
rationale_cards: TaskModuleDesignRationaleCard[]
config_rationale_cards: TaskConfigRationaleCard[]
decision_cards: TaskDesignDecisionCard[]
pattern_cards: TaskDesignPatternCard[]
anti_pattern_cards: TaskAntiPatternCard[]
template_semantic_cards: TaskTemplateSemanticCard[]
module_preset_semantic_cards: ModulePresetSemanticCard[]
capability_cards: CapabilityUseCard[]
model_selection_cards: ModelSelectionCard[]
ui_control_cards: TaskUIControlCard[]
worked_examples: TaskWorkedExample[]
design_case_refs: TaskDesignCaseRef[]
learning_bundle_refs: StorageRef[]
coverage_report_ref: StorageRef
eval_report_ref?: StorageRef
lifecycle_state: "active" | "staged" | "rejected" | "deprecated" | "stale"
schema_version: "1.0"
}
```
### §8.7 Task Knowledge Compiler
“Self-updating” means deterministic compilation, validation, evaluation, and activation. It does not mean the LLM rewrites its own instructions opportunistically.
```ts
TaskKnowledgeCompilerRun {
compiler_run_id: string
input_sources: TaskKnowledgeSourceRef[]
generated_pack_id: string
source_hashes: Record<string, string>
generation_mode:
| "spec_change"
| "code_schema_change"
| "module_catalog_change"
| "capability_registry_change"
| "template_preset_change"
| "learning_bundle_refresh"
| "manual_rebuild"
validation_result_ref: StorageRef
eval_result_ref?: StorageRef
activation_decision:
| "activated"
| "staged"
| "blocked_validation_failure"
| "blocked_eval_regression"
| "blocked_source_drift"
prior_active_pack_id?: string
created_at: string
schema_version: "1.0"
}
```
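A deterministic activation gate might look like the following sketch. The check order, the `CompilerChecks` shape, and the `auto_activate_allowed` flag are assumptions; the spec defines only the possible `activation_decision` values.

```ts
type CompilerActivationDecision =
  | "activated"
  | "staged"
  | "blocked_validation_failure"
  | "blocked_eval_regression"
  | "blocked_source_drift";

// Hypothetical inputs gathered by a compiler run.
type CompilerChecks = {
  source_hashes_match: boolean; // false means sources drifted during compilation
  validation_passed: boolean;
  eval_regressed: boolean | undefined; // undefined when no eval suite was run
  auto_activate_allowed: boolean; // assumed policy flag
};

function decideActivation(c: CompilerChecks): CompilerActivationDecision {
  if (!c.source_hashes_match) return "blocked_source_drift";
  if (!c.validation_passed) return "blocked_validation_failure";
  if (c.eval_regressed === true) return "blocked_eval_regression";
  // Assumption: without an eval result, or without policy permission,
  // the pack is staged rather than activated.
  if (c.eval_regressed === undefined || !c.auto_activate_allowed) return "staged";
  return "activated";
}
```

Because the function is pure, the same inputs always yield the same decision, which is the property the section asks of "self-updating".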
### §8.8 TaskSystemKnowledgePackManifest
```ts
TaskSystemKnowledgePackManifest {
pack_id: string
generated_at: string
source_refs: SpecSourceRef[]
source_hashes: Record<string, string>
doc23_version: string
module_catalog_hash: string
module_schema_hash: string
validation_rule_hash: string
route_registry_hash: string
doc24_capability_registry_hash: string
template_index_hash: string
preset_index_hash: string
learning_bundle_generation_id?: string
coverage_report_ref: StorageRef
eval_report_ref?: StorageRef
lifecycle_state:
| "active"
| "staged"
| "rejected"
| "deprecated"
| "stale"
schema_version: "1.0"
}
```
### §8.8A TaskAgentEducationProfile
Task Agent must have an active education profile that controls which resources it may use, how many cards it may retrieve, and which validation steps are mandatory.
```ts
TaskAgentEducationProfile {
profile_id: string
active_knowledge_pack_id: string
active_design_doctrine_version: string
allowed_resource_kinds: Array<
| "module_card"
| "port_card"
| "config_card"
| "validation_card"
| "rationale_card"
| "decision_card"
| "hint_card"
| "pattern_card"
| "anti_pattern_card"
| "template_card"
| "preset_card"
| "design_case"
| "goal_card"
| "user_preference"
| "spec_reference"
| "worked_example"
| "learning_bundle"
>
max_cards_per_design_turn: number
require_template_search_before_new_graph: boolean
require_validation_before_proposal: boolean
require_preflight_before_save: boolean
allow_learned_heuristic_injection: boolean
allow_architect_doctrine_injection: boolean
allow_prior_task_case_retrieval: boolean
stale_pack_behavior:
| "block_graph_proposals"
| "use_live_catalog_and_warn"
| "degraded_lookup_only"
schema_version: "1.0"
}
```
This profile prevents accidental over-reliance on stale or overly broad resources and makes Task Agent education inspectable.
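How `stale_pack_behavior` changes a design turn is not spelled out, so the sketch below is an interpretation rather than a definition. `DesignTurnPlan`, `planDesignTurn`, and the meaning attached to each behavior value are assumptions.

```ts
type StalePackBehavior =
  | "block_graph_proposals"
  | "use_live_catalog_and_warn"
  | "degraded_lookup_only";

// Hypothetical summary of what Task Agent may do in the current turn.
type DesignTurnPlan = {
  may_propose_graph: boolean;
  card_source: "knowledge_pack" | "live_catalog" | "lookup_only";
  warnings: string[];
};

function planDesignTurn(packIsStale: boolean, behavior: StalePackBehavior): DesignTurnPlan {
  if (!packIsStale) {
    return { may_propose_graph: true, card_source: "knowledge_pack", warnings: [] };
  }
  if (behavior === "block_graph_proposals") {
    return {
      may_propose_graph: false,
      card_source: "knowledge_pack",
      warnings: ["Knowledge pack is stale; graph proposals are blocked."],
    };
  }
  if (behavior === "use_live_catalog_and_warn") {
    return {
      may_propose_graph: true,
      card_source: "live_catalog",
      warnings: ["Knowledge pack is stale; using the live catalog."],
    };
  }
  // degraded_lookup_only: assumed to allow lookups but no proposals.
  return {
    may_propose_graph: false,
    card_source: "lookup_only",
    warnings: ["Knowledge pack is stale; lookup-only mode."],
  };
}
```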
### §8.9 TaskModuleCard
```ts
TaskModuleCard {
module_type: string
display_name: string
category: "trigger" | "step" | "utility" | "output" | "source" | "environment" | "system"
purpose: string
when_to_use: string[]
when_not_to_use: string[]
common_patterns: string[]
common_mistakes: string[]
input_ports: PortUseCard[]
output_ports: PortUseCard[]
config_summary: string
important_config_fields: ConfigFieldUseCard[]
context_behavior: string
chain_history_behavior: string
session_behavior?: string
side_effect_behavior?: string
compatible_modules: string[]
incompatible_patterns: string[]
examples: MiniGraphExample[]
source_refs: SpecSourceRef[]
source_authority: KnowledgeAuthorityClass
schema_version: "1.0"
}
```
### §8.10 TaskModuleDesignRationaleCard
```ts
TaskModuleDesignRationaleCard {
module_type: string
display_name: string
why_it_exists: string
primary_design_problem: string
when_to_use: string[]
when_not_to_use: string[]
weak_signals_to_consider: string[]
common_success_patterns: string[]
common_failure_patterns: string[]
examples_from_prior_tasks: TaskDesignCaseRef[]
related_goal_refs: GoalRef[]
related_pattern_refs: TaskDesignPatternRef[]
authored_notes: string[]
learned_notes: LearnedDesignNote[]
source_authority:
| "generated_from_doc23"
| "architect_authored"
| "learned_from_task_cases"
| "user_preference"
| "mixed"
schema_version: "1.0"
}
```
### §8.11 TaskConfigRationaleCard
```ts
TaskConfigRationaleCard {
module_type: string
field_name: string
technical_effect: string
why_it_exists: string
when_to_use: string[]
when_not_to_use: string[]
interactions_with_other_fields: string[]
examples: ConfigUseExample[]
common_misconfigurations: string[]
validation_refs: string[]
source_authority: KnowledgeAuthorityClass
schema_version: "1.0"
}
```
### §8.11A TaskDesignHintCard
Some “when to use this” knowledge is too broad and situational for a hard rule but too important to leave to fresh LLM reasoning every time. These hints are first-class Task Agent education resources.
```ts
TaskDesignHintCard {
hint_id: string
title: string
hint_text: string
applies_to_task_types: string[]
applies_to_domains: string[]
applies_to_module_types: string[]
applies_to_config_fields: string[]
applies_to_decision_families: string[]
weak_signals: string[]
examples: MiniGraphExample[]
counterexamples: MiniGraphExample[]
related_goal_refs: GoalRef[]
related_preference_refs: string[]
related_case_refs: TaskDesignCaseRef[]
authority: KnowledgeAuthorityClass
activation_policy: "always_retrieve_if_matched" | "retrieve_when_uncertain" | "example_only"
schema_version: "1.0"
}
```
Examples of hint-card content:
```text
If a complex analysis task is likely to produce a shallow first answer, consider a second-pass deepening step or red-team/revision pattern.
If a module produces a memo that downstream modules cite or the user often opens later, suggest making it findable with a summary.
If a prompt arriving mid-process asks for a capability not mounted at design time, allow DOC24 runtime expansion only within the module's capability envelope.
If source documents are long or likely to be revisited, prefer DOC25 retrieval tools over dumping all source text into ordinary context.
```
Hint cards are advisory. They may influence recommendations, pattern retrieval, and question policy, but they do not override validation, policy, or canonical module semantics.
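Hint retrieval driven by `activation_policy` might be sketched as below. Treating `example_only` cards as never injected as guidance, matching on task type alone, and capping the result with a `maxCards` argument (in the spirit of `max_cards_per_design_turn`) are all assumptions of this example.

```ts
type HintActivationPolicy = "always_retrieve_if_matched" | "retrieve_when_uncertain" | "example_only";

// Trimmed copy of TaskDesignHintCard from §8.11A.
type HintLite = {
  hint_id: string;
  applies_to_task_types: string[];
  activation_policy: HintActivationPolicy;
};

function selectHints(
  hints: HintLite[],
  taskType: string,
  agentIsUncertain: boolean,
  maxCards: number
): HintLite[] {
  return hints
    .filter((h) => h.applies_to_task_types.indexOf(taskType) !== -1)
    .filter(
      (h) =>
        h.activation_policy === "always_retrieve_if_matched" ||
        (h.activation_policy === "retrieve_when_uncertain" && agentIsUncertain)
      // example_only cards are skipped here (assumed to surface only as examples).
    )
    .slice(0, maxCards);
}
```

The selected hints remain advisory; nothing in this function touches validation or policy.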
### §8.12 TaskDesignDecisionCard
The decision table must not be one giant table. It is decomposed into scoped decision cards.
```ts
TaskDesignDecisionCard {
card_id: string
decision_family:
| "context_routing"
| "control_flow"
| "iteration"
| "triggering"
| "artifact_intake"
| "capability_binding"
| "model_selection"
| "output_delivery"
| "memory_policy"
| "evaluation"
question: string
hard_rules: string[]
heuristics: string[]
anti_patterns: string[]
required_validator_checks: string[]
examples: MiniGraphExample[]
source_refs: SpecSourceRef[]
learned_overlay_refs: string[]
schema_version: "1.0"
}
```
### §8.13 Example decision guidance
#### Context routing
```text
Use data_in for the object the module should operate on.
Use context_in for reference material, constraints, source documents, formatting guidance, or background.
Use instruction_in for dynamic task instructions when the module supports it.
Use chain history for prior step outputs that should remain background.
Use Carry Forward when a specific prior output must survive context decay and remain visible downstream.
Use environment.config for task-wide ambient instructions or sources.
```
#### Loop vs continuation
```text
Use Loop Controller when graph execution must re-run work until a condition is satisfied.
Use session continuation when preserving the same agent session state is the goal.
Use second-pass deepening when the goal is substantive improvement without requiring full loop machinery.
Use Judge/Red-Team -> Revision when an evaluative critique should drive changes.
```
#### Hold / Delay / Human Review / Signal
```text
Hold pauses until release signal/condition.
Delay waits by time.
Human Review waits for user decision.
Signal sends control-flow pulse, not substantive content.
Junction AND synchronizes multiple incoming paths.
```
#### Email trigger
```text
Email trigger emits full email, body, and attachments through explicit ports.
Attachments are files and may need DOC25 ingestion.
Body text may go to data_in.
Full email may go to context_in if headers/sender/thread matter.
Do not infer hidden routing or summarization at the trigger.
```
### §8.14 TaskWorkedExample
```ts
TaskWorkedExample {
example_id: string
name: string
user_request: string
interpreted_intent: string
selected_template_refs: string[]
selected_preset_refs: string[]
blueprint_ref: StorageRef
graph_ref: StorageRef
explanation: string
why_not_alternatives: string[]
validation_result_ref: StorageRef
source_refs: SpecSourceRef[]
schema_version: "1.0"
}
```
Required example families:
- Draft complaint.
- Evaluate complaint.
- Red-team document.
- Email attachment summarizer.
- Docket monitor.
- Recurring research monitor.
- File watcher -> analysis -> output.
- Human review loop.
- Judge/experiment setup.
- Coding task.
- Task-to-task handoff.
- Library ingestion task.
### §8.15 Knowledge authority classes
```ts
KnowledgeAuthorityClass {
source_kind:
| "generated_from_doc23_schema"
| "generated_from_doc24_registry"
| "architect_authored"
| "user_taught"
| "learned_from_trace"
| "learned_from_assessment"
| "llm_inferred_candidate"
authority_level:
| "hard_rule"
| "strong_default"
| "soft_heuristic"
| "example"
| "candidate"
may_affect:
| "validation"
| "recommendation"
| "template_matching"
| "question_policy"
| "artifact_policy"
| "graph_generation"
requires_review_before_activation: boolean
schema_version: "1.0"
}
```
### §8.16 TaskKnowledgeDrift
```ts
TaskKnowledgeDrift {
drift_id: string
drift_kind:
| "module_added"
| "module_removed"
| "port_changed"
| "config_field_changed"
| "validation_rule_changed"
| "route_changed"
| "capability_registry_changed"
| "template_changed"
| "preset_changed"
| "spec_hash_changed_without_card_update"
| "card_source_missing"
| "eval_regression"
affected_cards: string[]
severity: "info" | "warning" | "blocking"
remediation:
| "regenerate_pack"
| "manual_card_review"
| "run_eval_suite"
| "block_activation"
| "accept_noop"
schema_version: "1.0"
}
```
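Drift records lend themselves to a simple activation gate. The sketch below assumes that a record blocks activation when its severity is `blocking` or its remediation is `block_activation`; that rule and the helper name `blockingDrifts` are not stated in the spec.

```ts
type DriftSeverity = "info" | "warning" | "blocking";
type DriftRemediation =
  | "regenerate_pack"
  | "manual_card_review"
  | "run_eval_suite"
  | "block_activation"
  | "accept_noop";

// Trimmed copy of TaskKnowledgeDrift from §8.16.
type DriftLite = {
  drift_id: string;
  severity: DriftSeverity;
  remediation: DriftRemediation;
};

// Returns the drift records that should stop a pack from being activated.
function blockingDrifts(drifts: DriftLite[]): DriftLite[] {
  return drifts.filter((d) => d.severity === "blocking" || d.remediation === "block_activation");
}
```

A compiler run could refuse activation whenever `blockingDrifts(...)` is non-empty and attach the returned records to its validation result.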
### §8.17 Typed Task Agent tools
Task Agent should call tools for current truth and validation.
```ts
task_catalog.search_modules(query, constraints)
task_catalog.get_module_card(module_type)
task_catalog.get_config_schema(module_type)
task_catalog.explain_config(module_type, field_name)
task_catalog.search_patterns(query)
task_catalog.search_antipatterns(query)
task_templates.search(intent, context)
task_templates.get_semantic_card(template_id)
task_templates.compare_to_intent(template_id, task_intent)
task_presets.search(module_type, intent)
task_presets.get_semantic_card(preset_id)
task_graph.validate(graph)
task_graph.simulate(graph, sample_inputs)
task_graph.preflight(blueprint_or_graph)
task_graph.explain(graph)
task_graph.diff(old_graph, new_graph)
task_capabilities.resolve_needs(blueprint, graph)
task_capabilities.check_availability(capability_ids)
task_capabilities.suggest_for_module(module_type, intent)
task_models.recommend(module_type, task_type, budget, latency, quality, privacy)
task_runs.inspect(run_id)
task_artifacts.search(run_id_or_task_id, query)
task_spec.lookup(query, owner_doc?, expected_object_kind?)
```
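The ordering requirements from the education profile (`require_template_search_before_new_graph`, `require_validation_before_proposal`, `require_preflight_before_save`) can be checked against a recorded call log. In this sketch the log is a plain list of names, and `propose_graph` and `save_task` are hypothetical event names, since the spec does not name those steps as tools.

```ts
// The three ordering flags from TaskAgentEducationProfile (§8.8A).
type EducationFlags = {
  require_template_search_before_new_graph: boolean;
  require_validation_before_proposal: boolean;
  require_preflight_before_save: boolean;
};

// Returns human-readable violations for one design session's call log.
function sequenceViolations(calls: string[], flags: EducationFlags): string[] {
  const violations: string[] = [];
  // True when `name` was called at some index earlier than `limit`.
  const before = (name: string, limit: number): boolean => {
    const i = calls.indexOf(name);
    return i !== -1 && i < limit;
  };
  const propose = calls.indexOf("propose_graph"); // hypothetical event name
  const save = calls.indexOf("save_task"); // hypothetical event name

  if (propose !== -1) {
    if (flags.require_template_search_before_new_graph && !before("task_templates.search", propose)) {
      violations.push("template search must precede a new graph proposal");
    }
    if (flags.require_validation_before_proposal && !before("task_graph.validate", propose)) {
      violations.push("task_graph.validate must precede the proposal");
    }
  }
  if (save !== -1 && flags.require_preflight_before_save && !before("task_graph.preflight", save)) {
    violations.push("task_graph.preflight must precede save");
  }
  return violations;
}
```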
### §8.18 Task Agent Design Eval Suite
Task Agent proposals must be evaluated. Evals should include representative user requests and expected task-design properties.
```ts
TaskAgentDesignEvalSuite {
suite_id: string
examples: TaskDesignEvalExample[]
scorers: TaskDesignScorer[]
baseline_task_agent_version: string
candidate_task_agent_version: string
result_ref?: StorageRef
schema_version: "1.0"
}
```
Scorers should cover:
- template reuse correctness;
- module choice correctness;
- port routing correctness;
- context routing correctness;
- capability binding correctness;
- artifact policy correctness;
- human gate/side-effect safety;
- graph validity;
- preflight quality;
- question quality;
- user preference alignment;
- explanation clarity.
### §8.19 Anti-monolithic prompt / skill-file rule
Task Agent must not be educated primarily by one huge `SKILL.md`, one enormous system prompt, or pasted full specs. Those may exist as export/debug/fallback artifacts, but they are not the primary runtime education substrate.
Required pattern:
```text
Small stable Task Agent Core Contract
+ generated canonical module/port/config/validation cards
+ authored design doctrine
+ rationale cards
+ decision cards
+ worked examples
+ prior-task design cases
+ template/preset semantic cards
+ DOC24 live capability/model registry
+ DOC72 goal/entity/preference links
+ DOC8/BDSM learning bundles
+ typed lookup/validation/preflight tools
```
Full specs remain available through `task_spec.lookup`, but the Task Agent should retrieve exact sections only when needed.
### §8.19A Spec Librarian behavior
Specs are the source of truth, but full-spec prompt stuffing is forbidden. Task Agent should use generated cards first, typed tools second, and exact spec retrieval only when needed.
```ts
task_spec.lookup({
query: string
owner_doc?: "DOC23" | "DOC24" | "DOC72" | "DOC25" | "DOC73" | "DOC3" | "DOC8" | "EC"
expected_object_kind?:
| "module"
| "port"
| "config"
| "validation"
| "route"
| "storage"
| "capability"
| "procedure"
| "pattern"
| "policy"
})
```
Task Agent may switch to Spec Librarian behavior when exact detail is needed:
```text
Show me the operative definition of utility.hold.
Show me Agent Task named output behavior.
Show me email trigger ports.
Show me Loop Controller stop semantics.
Show me whether context_in cascades.
Show me validation rules for cycles.
```
Spec lookup output should be cited internally in the Task Agent proposal record so later audits can trace which spec/card/tool informed the proposed design.
### §8.20 Required decision-card families and minimum examples
The Knowledge Pack must include decision cards for at least the following families. These cards are where “when to use this” guidance lives. The space is too broad for one giant decision table, so each family is cardized and backed by examples.
#### Context routing and context survival
```text
Use data_in when the downstream module should operate on the item.
Use context_in when the downstream module needs reference material, source documents, constraints, formatting guidance, or background.
Use instruction_in only when the module supports dynamic instructions and the input is instruction-like.
Use chain history for ordinary prior outputs that can be treated as background.
Use Carry Forward when a specific prior output must survive context decay and remain visible downstream.
Use environment.config for task-wide ambient context or standing task instructions.
Use DOC24 runtime expansion when upstream content reveals an unanticipated context/tool need.
```
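The routing guidance above can be condensed into a small chooser. This is a sketch: the `ContentRole` labels are invented for the example, the fallback to `context_in` when a module lacks `instruction_in` is an assumption, and DOC24 runtime expansion is left out because it is a runtime decision rather than a wiring choice.

```ts
// Hypothetical labels for the role a piece of content plays downstream.
type ContentRole =
  | "operand" // the item the module should operate on
  | "reference_material" // sources, constraints, formatting guidance, background
  | "dynamic_instruction"
  | "ordinary_prior_output"
  | "must_survive_prior_output"
  | "task_wide_ambient";

type RoutingTarget =
  | "data_in"
  | "context_in"
  | "instruction_in"
  | "chain_history"
  | "carry_forward"
  | "environment.config";

const DEFAULT_ROUTE: Record<Exclude<ContentRole, "dynamic_instruction">, RoutingTarget> = {
  operand: "data_in",
  reference_material: "context_in",
  ordinary_prior_output: "chain_history",
  must_survive_prior_output: "carry_forward",
  task_wide_ambient: "environment.config",
};

function routeContent(role: ContentRole, moduleSupportsInstructionIn: boolean): RoutingTarget {
  if (role === "dynamic_instruction") {
    // instruction_in only when the module supports it; the fallback is an assumption.
    return moduleSupportsInstructionIn ? "instruction_in" : "context_in";
  }
  return DEFAULT_ROUTE[role];
}
```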
#### Agent Task output shaping
```text
Use separate named outputs when downstream modules need distinct artifacts, such as issue matrix, damages memo, draft complaint, red-team prompt, or file output.
Use file_only_out when downstream consumers need actual FileRefs.
Use signal_out for completion/control flow, not substantive content.
Use error_out for failure routing.
Use Task Artifact Index entries for every material output, even if the output remains run-scoped.
Avoid forcing one large markdown blob downstream when separate outputs would improve routing, memory policy, or assessment.
```
#### Iteration, loops, and session continuation
```text
Use Loop Controller when graph execution must repeat until a condition, judge, review, or iteration limit is satisfied.
Use session continuation when preserving the same agent working session is the goal.
Use second-pass deepening when the goal is deeper analysis and missed-issue reduction without a full loop.
Use Judge/Red-Team -> Revision when critique should drive a revised output.
Use fresh isolated sessions for independent variants or experiments where context bleed would distort comparison.
```
#### Holds, gates, waits, and signals
```text
Use Human Review when a user decision, approval, rejection, or revision is required.
Use Hold when the graph must wait for a release signal/condition without implying a substantive human judgment.
Use Delay for time-based wait.
Use Signal for control-flow pulse only.
Use Junction AND to synchronize independent incoming paths.
Do not hide waits inside Task Agent; waits must remain graph-visible.
```
#### Triggers and email workflows
```text
Email trigger emits explicit full-email, body, and attachment outputs.
Route email body to data_in when the body is the thing to analyze.
Route full email to context_in when sender, headers, thread, or surrounding metadata matter.
Route attachments as FileRefs and consider DOC25 processing when they are documents.
Do not imply trigger-level summarization. Add an Agent Task or Transform module for summarization.
Email sends and other external outputs should generally require human review unless policy says otherwise.
```
#### Capability, connector, skill, procedure, and MCP use
```text
Declare known required/allowed/denied capabilities at module config time.
Let DOC24 resolve live tools/connectors/MCP servers/procedures/models.
Allow runtime expansion only inside the module's policy envelope, with receipts.
Use DOC3 procedures when the needed behavior is a learned semantic procedure.
Use MCP servers as capability providers, not raw unlimited tool lists.
Use capability utility learning for recommendation/ranking/warning, not silent deletion of explicit user bindings.
```
#### Model selection
```text
Use stronger models for final drafting, high-stakes analysis, red-team, and judges.
Use cheaper/faster models for classification, routing, extraction pre-pass, simple transforms, and low-risk summaries.
Use local-only models when PropA/EC policy or sensitivity requires local execution.
Use coding ACP module only when actual coding workspace, filesystem, terminal, diff, or test operations are needed.
Always snapshot the effective model and capability state at run start.
```
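A minimal sketch of the model-selection guidance above. The `ModuleWorkKind` and `ModelTier` names are hypothetical, and checking the local-only constraint before the strength preference is an assumption rather than a rule stated in this section.

```ts
// Hypothetical work categories taken from the guidance above; not a DOC23 enum.
type ModuleWorkKind =
  | "final_drafting"
  | "high_stakes_analysis"
  | "red_team"
  | "judge"
  | "classification"
  | "routing"
  | "extraction_prepass"
  | "simple_transform"
  | "low_risk_summary";

type ModelTier = "stronger" | "cheaper_faster" | "local_only";

const STRONGER_WORK = new Set<ModuleWorkKind>([
  "final_drafting",
  "high_stakes_analysis",
  "red_team",
  "judge",
]);

// Assumption: a local-only requirement (PropA/EC policy or sensitivity) is a
// hard constraint, so it is checked before the strength preference.
function chooseModelTier(work: ModuleWorkKind, localExecutionRequired: boolean): ModelTier {
  if (localExecutionRequired) return "local_only";
  return STRONGER_WORK.has(work) ? "stronger" : "cheaper_faster";
}
```

The result is a recommendation input only; the effective model still has to be snapshotted at run start as the last rule requires.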
### §8.21 Knowledge Pack self-update authority rule
The Knowledge Pack is self-updating only through compiler and activation mechanics. It may receive input from prior tasks, user edits, assessments, and learning signals, but candidate updates must be authority-classified:
```text
Generated canonical mechanics
  Authority: hard rule.
  Source: DOC23 spec/code/schema/catalog/validator.
  Auto-update via compiler.
Architect-authored doctrine and rationale
  Authority: hard rule or strong default depending on card.
  Source: architect/user-authored guidance.
  Updates require explicit approval unless marked editable soft guidance.
Learned heuristics and prior-task cases
  Authority: soft heuristic, example, or candidate unless promoted.
  Source: TaskCreationSessionTrace, TaskAssessment, DOC8/BDSM, user edits.
  Updates may affect recommendations, not canonical semantics.
LLM-inferred candidates
  Authority: candidate only.
  Source: extraction/synthesis.
  Requires review or eval before activation.
```
Task Agent must never treat learned examples as changes to port semantics, validation rules, side-effect policy, route truth, or live connector availability.
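The authority classes above could be enforced by a small decision function. The sketch below is illustrative: `AuthorityClass`, `KnowledgeUpdateCandidate`, and `decideKnowledgeUpdate` are hypothetical names, and applying the canonical-mechanics prohibition to LLM-inferred candidates as well as learned examples is an assumption.

```ts
// Hypothetical labels for the four authority classes listed above.
type AuthorityClass =
  | "generated_canonical"
  | "architect_authored"
  | "learned_heuristic"
  | "llm_inferred";

// Hypothetical candidate shape; field names are illustrative only.
type KnowledgeUpdateCandidate = {
  authority: AuthorityClass;
  touchesCanonicalMechanics: boolean; // port semantics, validation rules, side-effect policy, etc.
  editableSoftGuidance: boolean;      // only meaningful for architect-authored cards
  approved: boolean;
  reviewedOrEvalPassed: boolean;
};

type UpdateDecision = "apply" | "needs_approval" | "needs_review_or_eval" | "reject";

function decideKnowledgeUpdate(c: KnowledgeUpdateCandidate): UpdateDecision {
  const learnedOrInferred =
    c.authority === "learned_heuristic" || c.authority === "llm_inferred";
  // Learned or inferred material never redefines canonical mechanics.
  if (learnedOrInferred && c.touchesCanonicalMechanics) return "reject";
  switch (c.authority) {
    case "generated_canonical":
      return "apply"; // regenerated by the compiler
    case "architect_authored":
      return c.editableSoftGuidance || c.approved ? "apply" : "needs_approval";
    case "learned_heuristic":
      return "apply"; // may affect recommendations only
    case "llm_inferred":
      return c.reviewedOrEvalPassed ? "apply" : "needs_review_or_eval";
  }
}
```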
### §8.22 TaskAgentInstructionCandidate
Task Agent's own instructions may improve, but only through proposal and evaluation.
```ts
TaskAgentInstructionCandidate {
candidate_id: string
target_layer:
| "core_contract"
| "design_doctrine"
| "pattern_card"
| "hint_card"
| "module_card_annotation"
| "question_policy"
| "template_matching_rule"
proposed_change: string
rationale: string
evidence_refs: StorageRef[]
supporting_signal_refs: string[]
counter_evidence_refs: StorageRef[]
eval_suite_ref?: string
eval_result_ref?: string
approval_required: boolean
rollback_ref?: StorageRef
lifecycle_state: "candidate" | "staged" | "approved" | "rejected" | "applied" | "rolled_back"
schema_version: "1.0"
}
```
The Task Agent must not silently rewrite its core contract. Low-risk hint-card or semantic-card additions may auto-stage under policy; changes to core hard rules require architect/user approval.
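One way to express the staging rule is a function that assigns the initial lifecycle state of a new candidate. This is a sketch under stated assumptions: `initialCandidateState` is a hypothetical helper, only `hint_card` is treated as low-risk here, and every other non-core layer conservatively requires approval.

```ts
type TargetLayer =
  | "core_contract"
  | "design_doctrine"
  | "pattern_card"
  | "hint_card"
  | "module_card_annotation"
  | "question_policy"
  | "template_matching_rule";

type LifecycleState = "candidate" | "staged" | "approved" | "rejected" | "applied" | "rolled_back";

// Assumption: only hint cards are low-risk; a real policy may list more layers.
const AUTO_STAGE_LAYERS = new Set<TargetLayer>(["hint_card"]);

function initialCandidateState(
  targetLayer: TargetLayer,
  policyAllowsAutoStage: boolean,
): { lifecycle_state: LifecycleState; approval_required: boolean } {
  // Core hard rules always require architect/user approval.
  if (targetLayer === "core_contract") {
    return { lifecycle_state: "candidate", approval_required: true };
  }
  if (policyAllowsAutoStage && AUTO_STAGE_LAYERS.has(targetLayer)) {
    // Assumption: auto-staged items skip explicit approval; a stricter policy could still require it.
    return { lifecycle_state: "staged", approval_required: false };
  }
  // Conservative default for everything else (an assumption, not a stated rule).
  return { lifecycle_state: "candidate", approval_required: true };
}
```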
---
## §8A — Task Knowledge Pack boundary and readiness
### §8A.1 TKP scope correction
The Task Knowledge Pack (TKP) is the Task Agent's detailed task-system reference and design substrate. It is not:
```text
the ambient runtime router;
the Task Agent core contract;
the DOC24 invocation doctrine;
a monolithic prompt;
a DOC73 user library;
an ordinary prompt artifact;
or a replacement for live schema/validator tools.
```
It may reference the active Task Agent Core Contract and invocation doctrine for auditability, but it does not own them.
### §8A.2 Owner split
| Artifact | Owner / role |
|---|---|
| Task Agent Core Contract | System-agent config; stable hard rules, permissions, prohibitions. |
| DOC24 Ambient Task Awareness Card | Runtime packet assembly; tiny direct-first rule. |
| Task Mode Resolver Policy | DOC24/EC; task signals, vetoes, opportunity packet, token caps. |
| Task Invocation Directives | DOC72/DOC24; learned/user-taught routing guidance. |
| TKP | DOC23/EC; Task Agent design/reference substrate. |
| Task Agent Eval Suite | DOC23/Addenda A/EC; tests design/review/invocation quality. |
### §8A.3 TKP content
TKP SHOULD contain:
```text
1. Module cards
2. Port cards
3. Config rationale cards
4. Context-routing cards
5. Loop / hold / signal / junction / switch decision cards
6. Trigger cards
7. Output/artifact intake cards
8. Capability binding cards
9. DOC24 runtime-expansion cards
10. DOC72 memory integration cards
11. DOC25/DOC73 artifact-routing cards
12. Model-selection cards
13. Worked examples
14. Anti-patterns
15. Task template semantic cards
16. Module preset semantic cards
17. Task design pattern cards
18. Task design casebook
19. User preference overlays
20. Goal/rationale links
21. Prompt design guide cards
22. Prompt evaluation guide cards
23. Spec source references
24. Coverage and drift reports
```
### §8A.4 TKP usage
TKP is used for:
```text
designing tasks
reviewing existing tasks
explaining graphs
answering task-system questions
generating blueprints
generating graph diffs
validating config choices
comparing templates/presets
improving task prompts
creating prompt-evaluation tasks
learning from prior task-design cases
```
TKP is not normally used for:
```text
deciding every ordinary chat turn
ambient Elnor prompt context
simple direct answers
OpenClaw heartbeat/session monitoring
ordinary tool routing
```
### §8A.5 TKP readiness
```ts
export type TaskKnowledgePackReadiness = {
pack_id: string;
readiness_state:
| "schema_only"
| "canonical_cards_generated"
| "core_rationale_cards_authored"
| "core_patterns_authored"
| "worked_examples_loaded"
| "task_invocation_cards_loaded"
| "prompt_design_cards_loaded"
| "eval_suite_passing"
| "production_task_agent_ready";
activated_for:
| "inspection_only"
| "task_review"
| "task_design_draft_only"
| "prompt_review"
| "task_design_production";
missing_required_cards: string[];
stale_source_refs: string[];
eval_failures: string[];
schema_version: "1.0";
};
```
Task Agent MUST NOT be considered production-ready for task design until TKP has at least:
```text
module cards for every operative DOC23 module;
port/config cards for all agent-capable, utility, trigger, output, system, and evaluation modules;
pattern cards for context routing, loops, holds, signals, triggers, outputs, artifact policy, capability binding, session continuation, human gates, and prompt quality;
template/preset semantic cards;
worked examples;
Task Agent eval suite;
drift detection against DOC23 schema/code changes.
```
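A readiness check over the `TaskKnowledgePackReadiness` fields might look like the sketch below. `ReadinessView`, `productionBlockers`, and `mayActivateForProduction` are hypothetical names, and treating any missing card, stale source, or eval failure as a blocker is an interpretation of the requirement above.

```ts
// Subset of TaskKnowledgePackReadiness fields used by this check.
type ReadinessView = {
  readiness_state: string;
  missing_required_cards: string[];
  stale_source_refs: string[];
  eval_failures: string[];
};

// Collects human-readable reasons why a pack cannot serve production task design.
function productionBlockers(r: ReadinessView): string[] {
  const blockers: string[] = [];
  if (r.readiness_state !== "production_task_agent_ready") {
    blockers.push(`readiness_state is ${r.readiness_state}`);
  }
  for (const card of r.missing_required_cards) blockers.push(`missing card: ${card}`);
  for (const ref of r.stale_source_refs) blockers.push(`stale source: ${ref}`);
  for (const failure of r.eval_failures) blockers.push(`eval failure: ${failure}`);
  return blockers;
}

function mayActivateForProduction(r: ReadinessView): boolean {
  return productionBlockers(r).length === 0;
}
```

Returning the blocker list rather than a bare boolean keeps the result inspectable, which fits the coverage and drift reporting the TKP is expected to carry.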
### §8A.6 Self-updating TKP
“Self-updating” means deterministic or governed regeneration, not ad hoc LLM self-rewriting.
```text
Spec/code/schema/registry changes
-> Task Knowledge Compiler detects drift
-> new candidate TKP generated
-> coverage and validation checks run
-> Task Agent eval suite runs
-> EC activates or stages/rejects
-> active pack pointer updates only on pass/approval
```
```ts
export type TaskKnowledgeCompilerRun = {
compiler_run_id: string;
input_sources: TaskKnowledgeSourceRef[];
generated_pack_id: string;
source_hashes: Record<string, string>;
generation_mode:
| "spec_change"
| "code_schema_change"
| "module_catalog_change"
| "capability_registry_change"
| "template_preset_change"
| "learning_bundle_refresh"
| "manual_rebuild";
validation_result_ref: StorageRef;
eval_result_ref?: StorageRef;
activation_decision:
| "activated"
| "staged"
| "blocked_validation_failure"
| "blocked_eval_regression"
| "blocked_source_drift";
prior_active_pack_id?: string;
created_at: string;
schema_version: "1.0";
};
```
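The pipeline above ends in an `activation_decision`. A minimal sketch of how that decision could be derived follows; `CandidatePackChecks` is a hypothetical summary type, and the order in which the blocking conditions are tested is an assumption.

```ts
type ActivationDecision =
  | "activated"
  | "staged"
  | "blocked_validation_failure"
  | "blocked_eval_regression"
  | "blocked_source_drift";

// Hypothetical summary of the checks run on a candidate pack.
type CandidatePackChecks = {
  sourcesChangedDuringBuild: boolean; // inputs drifted again after hashing
  validationPassed: boolean;
  evalRegressed: boolean;
  approvalRequired: boolean;
  approvalGranted: boolean;
};

function decideActivation(c: CandidatePackChecks): ActivationDecision {
  if (c.sourcesChangedDuringBuild) return "blocked_source_drift";
  if (!c.validationPassed) return "blocked_validation_failure";
  if (c.evalRegressed) return "blocked_eval_regression";
  if (c.approvalRequired && !c.approvalGranted) return "staged";
  return "activated";
}

// The active pack pointer moves only when the decision is "activated".
function nextActivePackId(current: string, candidate: string, decision: ActivationDecision): string {
  return decision === "activated" ? candidate : current;
}
```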
### §8A.7 TKP as internal knowledge repository
TKP may use DOC73-like mechanics such as cardization, source references, versioned artifacts, indexed retrieval, and deep lookup. It should not be presented as a normal user-facing DOC73 library. It is an internal system-operation knowledge repository owned by DOC23/EC and consumed by Task Agent.
---
## §9 — Task design learning and self-improvement
### §9.0 V3 FINAL coordination signal envelope (R0.7)
Per Addenda A ↔ Addenda B coordination V3 FINAL §2.7 and §2.11, all task-design learning signals wrap in the unified `EvaluationLearningSignalEnvelope` (DOC23 Evaluation Common Contracts §5). This Core owns the emission of five of the eight Phase 1 signal types listed below. Where task design is relevant, the envelope carries an optional `task_design_signature` for cross-run correlation.
**Eight Phase 1 signal types (per coordination V3 §2.11):**
| Signal type | Owning addendum | Emitter | Consumer |
|---|---|---|---|
| `outcome_evaluation` | Addenda B Core R0.7.1 (this doc) | V3.3.1 Outcome Evaluator | DOC8/BDSM, Pattern primitive |
| `repair_cycle` | Addenda B Core R0.7.1 | V3.3.1 Revisor | DOC8/BDSM, Pattern primitive |
| `task_process_gap_runtime` | Addenda B Core R0.7.1 | Task Agent or V3.3.1 Revisor | Task Agent (consume), DOC8/BDSM |
| `taint_clearance` | Addenda B Core R0.7.1 | V3.3.1 Revisor or user-action surfaces | DOC8/BDSM (privilege learning) |
| `hard_call_resolution` | Addenda B Core R0.7.1 | V3.3.1 Revisor when Hard Calls resolve | DOC8/BDSM (escalation learning) |
| `prompt_comparison` | Addenda A R4.1 V3 | Experiment | DOC8/BDSM (prompt learning) |
| `task_design_correlation` | DOC8/BDSM | BDSM correlation analysis | Task Agent (suggestion surfacing) |
| `user_action` | R0.6.4 §24A.7 (continues) | UI surfaces | Pattern primitive |
**Common envelope per Common Contracts §5.1:**
```ts
EvaluationLearningSignalEnvelope {
signal_id: string
signal_type: SignalType // per table above
task_id: string
run_id: string
evaluation_chain_id?: string
source_module_id: string
source_activation_seq: number
// Governance — gated by EC Core (OBL-XDOC-EC-POLICY-SIGNALS-01)
governance_policy_ref: string
source_policy_snapshot_ref?: StorageRef
data_class: "public" | "internal" | "privileged" | "local_only"
matter_id?: string
pattern_promotion_eligible: boolean
// Model context — for cheap-LLM learning mode (V3.3 §6.16)
model_class: "cheap_local" | "cheap_api" | "medium" | "expensive_frontier"
model_fingerprint: string
// Task design context — optional; populated for signals where task design
// is relevant (per coordination V3 §2.7)
task_design_signature?: {
graph_topology_hash: string
upstream_module_types: string[]
upstream_module_version_constraints?: Record<string, string>
segment_ids?: string[]
task_blueprint_ref?: string // links to §6 Task Blueprint
// enables blueprint-level correlation
}
emitted_at: ISO8601
payload_ref: StorageRef // typed signal-specific payload
schema_version: 1
}
```
**Signal payloads owned by Core R0.7.1:**
The five Core-owned signal payloads are specified below. Each is wrapped in `EvaluationLearningSignalEnvelope` at emit time.
#### §9.0.1 OutcomeEvaluationSignal payload
```ts
OutcomeEvaluationSignal {
evaluator_result_ref: string // EvaluationResultEnvelope.result_id
outcome_spec_ref: string // EvaluationOutcomeDefinition reference
outcome_spec_template_id?: string // critical for cross-run learning
verdict: "passed" | "failed" | "indeterminate"
findings_summary: {
total_count: number
critical_count: number
major_count: number
minor_count: number
by_criterion_id: Record<string, number>
}
evidence_status_summary: Record<EvidenceStatus, number>
schema_version: 1
}
```
Emitted by the V3.3.1 Outcome Evaluator on every completed evaluation.
#### §9.0.2 RepairCycleSignal payload (full Phase 1 form)
Per coordination V3 §2.11, RepairCycleSignal emits the full Phase 1 form including `qualitative_delta` (always populated), per-criterion score deltas when Judge is in the loop, and `taint_evolution` per V3.1's actual taint model.
```ts
RepairCycleSignal {
initial_evaluator_result_ref: string // EvaluationResultEnvelope.result_id
final_evaluator_result_ref: string
before_snapshot_ref: StorageRef // V3.3 §5.16 EvaluationSnapshot
after_snapshot_ref: StorageRef
revisor_actions: RevisorActionRecord[] // per V3.3 §2.7 RevisorActionKind
cycle_outcome:
| "passed_after_repair"
| "still_failing"
| "diverged"
| "max_iterations"
| "escalated_to_human" // when Hard Call surfaces during cycle
// Qualitative delta — ALWAYS populated (Evaluator data sufficient)
qualitative_delta: {
findings_before: number
findings_after: number
critical_findings_before: number
critical_findings_after: number
major_findings_before: number
major_findings_after: number
minor_findings_before: number
minor_findings_after: number
resolved_finding_ids: string[]
new_finding_ids: string[]
per_criterion_finding_counts_before: Record<CriterionRef, number>
per_criterion_finding_counts_after: Record<CriterionRef, number>
}
// Numeric delta — populated when Judge ran in the loop (Pattern A, B, or C)
per_criterion_score_deltas: Record<string, {
criterion_id: string
criterion_semantics_hash: string
judge_score_before: number | null
judge_score_after: number | null
judge_score_delta: number | null
finding_count_before: number
finding_count_after: number
finding_count_delta: number
severity_distribution_delta: Record<"critical" | "major" | "minor", number>
attribution: AttributionRecord[]
}>
// Taint evolution — per coordination V3 §2.11
taint_evolution: {
taint_at_initial_evaluation: TaintClass
taint_at_final_evaluation: TaintClass
taint_transitions: Array<{
action_id: string
transition_kind:
| "expanded"
| "cleared"
| "unchanged"
| "isolated_to_candidate" // V3.1 P30 sandboxed eval pathway
// For "expanded":
expanded_from?: TaintClass
expanded_to?: TaintClass
expansion_source_kind?:
| "new_source_pulled"
| "tainted_revision_input"
| "ungoverned_user_input"
| "sub_agent_advice_taint_inheritance"
// For "cleared":
clearance_method?: TaintClearanceMethod // V3.3 §0.4 typed enum
clearance_record_ref?: StorageRef
cleared_by_actor_ref?: ActorRef
cleared_by_access_tier?: AccessTier // V3.1 P19 tier-binding
// For "isolated_to_candidate":
candidate_version_ref?: StorageRef
// Cross-link to per-event signals
related_taint_clearance_signal_ids?: string[]
}>
}
// Semantic guards — required when Judge ran
metric_semantics_version?: string
scorer_hash?: string
evaluation_method?: string
assurance_basis?: AssuranceBasis[]
schema_version: 1
}
RevisorActionRecord {
action_id: string
action_kind: RevisorActionKind // derived projection per V3.3 §2.6
underlying_step_kind: RevisionPlanStepKind // V3.1 raw enum
underlying_strategy_kind: RepairStrategyKind // V3.1 raw enum
targeted_criterion_ids: string[] // empty when global plan context
targeted_finding_ids: string[]
action_scope_ref: ArtifactScopeRef // Common Contracts §7
revision_operation_receipt_ref: StorageRef // V3.3 §11.6 receipt linkage
schema_version: 1
}
AttributionRecord {
action_id: string
attribution_basis:
| "scope_match"
| "criterion_targeted"
| "finding_targeted"
| "user_confirmed"
| "global_plan_context"
// explicitly excludes "revisor_self_assessment" per V3.1 P20 sycophancy
// mitigation; Revisor's own goal-impact assessment NEVER feeds learning
attribution_confidence: number // 0..1
}
```
Emitted by V3.3.1 Revisor at outermost cycle closure (post-evaluation after final revision attempt).
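Because `qualitative_delta` is always populated, it can be derived from Evaluator findings alone. The sketch below shows one way to do that; `FindingView` and `computeQualitativeDelta` are hypothetical, and it assumes finding IDs are stable between the initial and final evaluation.

```ts
type Severity = "critical" | "major" | "minor";

// Hypothetical minimal view of an Evaluator finding.
type FindingView = { finding_id: string; severity: Severity; criterion_id: string };

function countBySeverity(findings: FindingView[], severity: Severity): number {
  return findings.filter((f) => f.severity === severity).length;
}

function countByCriterion(findings: FindingView[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const f of findings) counts[f.criterion_id] = (counts[f.criterion_id] ?? 0) + 1;
  return counts;
}

// Assumes a finding that persists across the cycle keeps the same finding_id.
function computeQualitativeDelta(before: FindingView[], after: FindingView[]) {
  const beforeIds = new Set(before.map((f) => f.finding_id));
  const afterIds = new Set(after.map((f) => f.finding_id));
  return {
    findings_before: before.length,
    findings_after: after.length,
    critical_findings_before: countBySeverity(before, "critical"),
    critical_findings_after: countBySeverity(after, "critical"),
    major_findings_before: countBySeverity(before, "major"),
    major_findings_after: countBySeverity(after, "major"),
    minor_findings_before: countBySeverity(before, "minor"),
    minor_findings_after: countBySeverity(after, "minor"),
    resolved_finding_ids: before.map((f) => f.finding_id).filter((id) => !afterIds.has(id)),
    new_finding_ids: after.map((f) => f.finding_id).filter((id) => !beforeIds.has(id)),
    per_criterion_finding_counts_before: countByCriterion(before),
    per_criterion_finding_counts_after: countByCriterion(after),
  };
}
```

No Judge scores are needed for this computation, which is why the numeric `per_criterion_score_deltas` block is kept separate.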
#### §9.0.3 TaskProcessGapSignal payload (runtime)
```ts
TaskProcessGapSignal {
detected_during_run_id: string
gap_kind:
| "missing_module"
| "missing_capability"
| "missing_source_module"
| "missing_verification"
| "missing_handoff"
| "policy_blocked_path"
| "capability_version_mismatch"
observed_failure_kind?: FailureKind // V3.1 §0.4
affected_outcome_refs: string[]
suggested_remediation?: TaskGraphPatchProposal
schema_version: 1
}
```
Emitted by Task Agent or V3.3.1 Revisor when a runtime gap is observed mid-execution. Task Agent consumes both this signal (whether emitted by itself or by the Revisor) and aggregate `task_design_correlation` signals from DOC8/BDSM to surface task-design suggestions to the user.
#### §9.0.4 TaintClearanceSignal payload
```ts
TaintClearanceSignal {
cleared_artifact_ref: StorageRef
cleared_artifact_version_ref: StorageRef
cleared_from: TaintClass
cleared_to: TaintClass // typically "system_trusted" after clearance
clearance_method: TaintClearanceMethod // V3.3 §0.4
clearance_record_ref: StorageRef // V3.1 TaintClearanceRecord
cleared_by_actor_ref: ActorRef
cleared_by_access_tier: AccessTier // V3.1 P19 tier-binding
scope_cleared: ArtifactScopeRef // Common Contracts §7
// Context for learning
artifact_kind: string
domain_tags?: string[]
schema_version: 1
}
```
Emitted only when taint is actively cleared. Isolation to a candidate and expansion do not emit this signal; they are tracked in RepairCycleSignal's `taint_transitions`. The signal enables BDSM to learn clearance patterns and to detect privilege-escalation patterns, for example junior users clearing taint more frequently than supervisor-tier policy expects.
#### §9.0.5 HardCallResolutionSignal payload
```ts
HardCallResolutionSignal {
hard_call_ref: string // V3.3 §7.9 HardRevisionCall ID
resolution_at: ISO8601
hard_call_kind: HardRevisionCallKind // V3.1 §0.4
resolution_path:
| "accept_proposed"
| "modify_proposed"
| "defer"
| "reject"
| "escalate_to_supervisor"
resolution_actor_ref: ActorRef
resolution_actor_access_tier: AccessTier
resolution_rationale_summary?: string // user-stated reason (when provided)
// For learning correlation
preceding_revision_plan_ref?: StorageRef // plan that triggered the Hard Call
schema_version: 1
}
```
Emitted when a Hard Revision Call resolves. Enables BDSM to learn when to surface Hard Calls vs. auto-resolve, and to detect resolution-path patterns by user/tier.
#### §9.0.6 Signal emission and consumption flow
```text
[V3.3.1 Outcome Evaluator] → OutcomeEvaluationSignal →
[V3.3.1 Revisor] → RepairCycleSignal, TaskProcessGapSignal,
TaintClearanceSignal, HardCallResolutionSignal →
↓
[EC Core policy gate]
↓
[DOC8/BDSM signal stream]
↓
[TaskDesignCorrelationSignal aggregation]
↓
[DOC72 Pattern primitive store]
↓
[Task Agent surfaces suggestions]
↓
[User in §20 UI]
```
EC Core's compiled policy engine (OBL-XDOC-EC-POLICY-SIGNALS-01) gates every signal at the envelope layer based on `data_class`, `matter_id`, and `pattern_promotion_eligible`. Privileged-matter signals do not auto-promote to durable learning; matter-scoped signals do not auto-cross matter boundaries (per V3.3 §13.4 firewall).
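The two gating rules in the paragraph above can be sketched as predicates over the envelope's governance fields. This is not the EC Core policy engine: `EnvelopeGovernanceView`, `mayAutoPromote`, and `mayFlowToMatter` are hypothetical names, and the handling of `local_only` data is left out because this section does not specify it.

```ts
type DataClass = "public" | "internal" | "privileged" | "local_only";

// Subset of EvaluationLearningSignalEnvelope fields the gate inspects.
type EnvelopeGovernanceView = {
  data_class: DataClass;
  matter_id?: string;
  pattern_promotion_eligible: boolean;
};

// Privileged signals never auto-promote to durable learning.
function mayAutoPromote(s: EnvelopeGovernanceView): boolean {
  return s.pattern_promotion_eligible && s.data_class !== "privileged";
}

// Matter-scoped signals do not automatically cross matter boundaries.
function mayFlowToMatter(s: EnvelopeGovernanceView, targetMatterId?: string): boolean {
  if (s.matter_id === undefined) return true; // not matter-scoped
  return s.matter_id === targetMatterId;
}
```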
### §9.1 Highest-value learning target
The highest-value self-learning for DOC23 is **task creation quality**, not merely post-run error detection. The system should learn how to convert underspecified user intent into rich, reliable, reusable processes.
### §9.2 What to learn
Task-design learning should capture:
1. Which templates match which user intents.
2. Which module presets work in which task types.
3. Which optional process steps improve outcomes.
4. Which user questions are useful or annoying.
5. Which output archetypes the user prefers.
6. Which artifact-intake defaults are valuable.
7. Which capabilities/connectors/tools are useful in task contexts.
8. Which human review points prevent errors.
9. Which model strategies balance cost/quality.
10. Which graph patterns create reliable long-running workflows.
11. Which Task Agent suggestions are accepted or rejected.
12. Which prior task-design decisions served which goals.
### §9.3 What not to learn automatically
The system must not learn canonical mechanics by inference. It must not infer or mutate:
- technical meaning of ports;
- config field semantics;
- validation rules;
- side-effect policy;
- route paths;
- live connector availability;
- security policy overrides;
- executable graph rewrites;
- template/preset material mutations.
These items are either generated from canonical sources or changed only through approval gates.
### §9.4 TaskCreationSessionTrace
```ts
TaskCreationSessionTrace {
design_session_id: string
user_initial_request: string
resolved_intent: string
inferred_task_type: string
inferred_domain?: string
context_class_key: string
templates_considered: TemplateCandidate[]
presets_considered: PresetCandidate[]
design_patterns_considered: PatternCandidate[]
design_cases_considered: TaskDesignCaseRef[]
goals_considered: GoalRef[]
questions_asked: TaskAgentQuestionTrace[]
user_answers: TaskAgentAnswerTrace[]
proposed_blueprint_ref: StorageRef
proposed_graph_ref: StorageRef
final_graph_ref?: StorageRef
user_edits: TaskCreationUserEdit[]
accepted_suggestions: string[]
rejected_suggestions: string[]
capability_plan_ref: StorageRef
artifact_policy_ref: StorageRef
preflight_report_ref?: StorageRef
final_saved_task_id?: string
final_template_id?: string
later_run_refs: string[]
outcome:
| "saved_as_is"
| "saved_with_user_edits"
| "abandoned"
| "restarted"
| "rejected"
learning_signal_refs: string[]
schema_version: "1.0"
}
```
### §9.5 TaskAgentDesignLearningSignal
```ts
TaskAgentDesignLearningSignal {
signal_id: string
design_session_id?: string
task_id?: string
run_id?: string
task_type: string
context_class_key: string
signal_kind:
| "template_selected"
| "template_rejected"
| "preset_added"
| "preset_removed"
| "question_helpful"
| "question_annoying"
| "output_preference_confirmed"
| "artifact_policy_changed"
| "capability_needed"
| "capability_unused"
| "graph_pattern_success"
| "graph_pattern_failure"
| "user_manual_edit_after_proposal"
| "second_pass_deepening_accepted"
| "second_pass_deepening_rejected"
| "red_team_default_accepted"
| "red_team_default_rejected"
evidence_ref: StorageRef
recommended_update_kind:
| "semantic_card_update"
| "pattern_card_update"
| "template_update_proposal"
| "preset_update_proposal"
| "task_agent_instruction_candidate"
| "question_policy_update"
| "none"
schema_version: "1.0"
}
```
### §9.6 TaskDesignCase
Prior user-created, user-liked, user-refined, or high-value tasks should become design cases.
```ts
TaskDesignCase {
case_id: string
source_task_id: string
source_run_ids: string[]
source_blueprint_id?: string
task_type: string
domain?: string
context_class_key: string
user_intent_summary: string
final_blueprint_summary: string
design_decisions: TaskDesignDecision[]
user_refinements: UserRefinementTrace[]
outcome_summary: string
assessment_refs: string[]
reusable_lessons: ReusableDesignLesson[]
anti_lessons: ReusableAntiPatternLesson[]
goal_refs: GoalRef[]
artifact_refs: ArtifactRef[]
template_or_preset_refs: string[]
lifecycle_state: "candidate" | "active" | "archived"
schema_version: "1.0"
}
```
### §9.7 TaskDesignDecision
```ts
TaskDesignDecision {
decision_id: string
module_id?: string
graph_region_ref?: string
selected_design: string
alternatives_considered: string[]
rejected_alternatives: string[]
because: string
goal_refs: GoalRef[]
expected_effect: string
actual_effect?: string
evidence_refs: StorageRef[]
confidence: number
schema_version: "1.0"
}
```
Example:
```text
Decision: Use Loop Controller after initial complaint draft.
Because: user wanted iterative refinement rather than one-pass analysis.
Goal served: improve substantive depth and catch missed pleading defects.
Actual effect: red-team feedback was incorporated into revised complaint.
```
### §9.8 TaskDesignExtractionBundle
```ts
TaskDesignExtractionBundle {
bundle_id: string
source_kind:
| "task_creation_session"
| "task_run"
| "task_assessment"
| "user_refined_task"
| "template_update"
| "manual_architect_note"
extracted_design_cases: TaskDesignCase[]
extracted_design_patterns: TaskDesignPatternCard[]
extracted_user_preferences: TaskDesignPreferenceCandidate[]
extracted_goal_links: TaskGoalLinkCandidate[]
extracted_template_updates: TemplateUpdateProposal[]
extracted_preset_updates: TemplateUpdateProposal[]
confidence: number
review_required: boolean
schema_version: "1.0"
}
```
### §9.9 TaskDesignImprovementProposal
```ts
TaskDesignImprovementProposal {
proposal_id: string
proposal_kind:
| "task_agent_instruction_candidate"
| "design_pattern_card_update"
| "anti_pattern_card_update"
| "template_semantic_card_update"
| "module_preset_semantic_card_update"
| "task_template_update"
| "module_preset_update"
| "output_archetype_default_update"
| "artifact_policy_default_update"
| "question_policy_update"
target_ref: string
rationale: string
evidence_refs: StorageRef[]
support_count: number
counter_evidence_refs: StorageRef[]
proposed_patch_ref: StorageRef
prior_hash: string
candidate_hash: string
requires_eval: boolean
eval_suite_ref?: string
eval_result_ref?: string
approval_policy:
| "auto_apply_low_risk"
| "user_review"
| "architect_review"
| "never_auto_apply"
lifecycle_state:
| "candidate"
| "staged"
| "approved"
| "rejected"
| "applied"
| "rolled_back"
schema_version: "1.0"
}
```
### §9.10 Second-pass deepening pattern
The user has stated a preference for follow-up prompts such as “look deeper” in complex analysis tasks. This must be represented as a pattern/preference, not merely remembered as chat text.
```ts
TaskDesignPatternCard {
pattern_id: "complex_analysis_second_pass_deepening"
display_name: "Second-Pass Deepening"
purpose: "Improve complex analytical outputs by requiring a follow-up pass that looks for missed issues, deeper implications, weaknesses, and revisions."
when_to_use: [
"complex legal analysis",
"brief or complaint evaluation",
"multi-document synthesis",
"strategic analysis where first-pass answers may be shallow"
]
when_not_to_use: [
"simple extraction",
"low-cost quick answer",
"deterministic formatting tasks"
]
canonical_graph_shapes: [
"Agent Task -> Agent Task deeper review -> revision/final output",
"Agent Task -> Judge/Red Team -> revision Agent Task",
"Agent Task -> Loop Controller with deepening instructions"
]
user_preference_refs: ["Will prefers second-pass deepening in complex analysis tasks"]
goal_refs: ["increase substantive depth", "reduce missed issues", "surface hidden weaknesses"]
schema_version: "1.0"
}
```
### §9.11 Question utility and output archetype learning
Task Agent should learn which questions and output suggestions are routinely useful for a given task type, so that they can become defaults rather than repeated questions.
Examples:
```text
Complaint evaluation:
  default outputs: executive summary + issue matrix + revision plan.
Draft complaint:
  default outputs: complaint draft + damages memo + red-team report + revision summary.
Docket monitoring:
  default outputs: filing summary + deadline table + alert.
```
### §9.12 Capability utility learning
The primary purpose of capability utility learning is not to delete tools from saved tasks. It is used for:
- Task Agent design recommendations;
- DOC24 capability ranking and fallback ordering;
- preflight warnings;
- failure diagnosis;
- connector health/reliability reports;
- reducing tool bloat in modules;
- procurement/maintenance decisions;
- suggesting capabilities when designing new tasks.
It must not silently remove explicit user-selected capabilities or mutate saved configs.
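A minimal sketch of ranking with warnings instead of removal. `CapabilityStat` and `rankCapabilities` are hypothetical, and the utility score and the `warnBelow` threshold are illustrative assumptions rather than values defined by this spec.

```ts
// Hypothetical per-capability statistic; utility is an illustrative score.
type CapabilityStat = {
  capability_id: string;
  utility: number;
  explicitUserBinding: boolean;
};

// Orders capabilities for recommendation and emits warnings for low utility.
// Nothing is removed, so explicit user bindings always survive.
function rankCapabilities(stats: CapabilityStat[], warnBelow = 0.2) {
  const ordered = [...stats]
    .sort((a, b) => b.utility - a.utility)
    .map((s) => s.capability_id);
  const warnings = stats
    .filter((s) => s.utility < warnBelow)
    .map((s) => `${s.capability_id}: low observed utility`);
  return { ordered, warnings };
}
```

The input array is copied before sorting, so the saved configuration that produced it is not mutated.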
### §9.13 Task Design Casebook promotion rules
The Task Design Casebook is the bridge between prior task experience and Task Agent design wisdom. It should extract lessons from tasks the user created, liked, refined, reused, or assessed as successful.
A prior task is eligible to become an active design case when at least one of the following is true:
- user manually created or substantially refined the graph;
- user reused the task/template;
- user accepted Task Agent's proposal with few edits;
- user edited the proposal in a way that reveals a reusable design preference;
- Task Assessment rated the run or process highly;
- an intermediate artifact was repeatedly opened, cited, reused, promoted, or incorporated downstream;
- the task illustrates a module/config choice that is not obvious from schema alone.
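The eligibility criteria above are an any-of test, which the following sketch makes explicit. `PriorTaskEvidence` and its field names are hypothetical; how each flag is actually measured (for example what counts as "few edits" or "repeatedly") is not defined here.

```ts
// One boolean per eligibility criterion listed above; names are illustrative.
type PriorTaskEvidence = {
  userCreatedOrRefinedGraph: boolean;
  userReusedTaskOrTemplate: boolean;
  proposalAcceptedWithFewEdits: boolean;
  editRevealsReusablePreference: boolean;
  assessmentRatedHighly: boolean;
  intermediateArtifactRepeatedlyUsed: boolean;
  illustratesNonObviousChoice: boolean;
};

// Returns the criteria that hold, so the promotion decision stays explainable.
function eligibilityReasons(e: PriorTaskEvidence): Array<keyof PriorTaskEvidence> {
  return (Object.keys(e) as Array<keyof PriorTaskEvidence>).filter((k) => e[k]);
}

// A prior task is eligible when at least one criterion is true.
function isEligibleDesignCase(e: PriorTaskEvidence): boolean {
  return eligibilityReasons(e).length > 0;
}
```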
### §9.14 Prior-task “because and goal” extraction
Casebook extraction must capture not just what was built but why. The minimum case-learning unit is:
```ts
PriorTaskDesignLesson {
lesson_id: string
source_task_id: string
source_run_ids: string[]
module_or_graph_region_ref?: string
design_choice: string
because: string
goal_refs: GoalRef[]
expected_effect: string
actual_effect?: string
when_to_reuse: string[]
when_not_to_reuse: string[]
evidence_refs: StorageRef[]
authority_level: "example" | "soft_heuristic" | "strong_default_candidate"
review_state: "candidate" | "active" | "rejected" | "archived"
schema_version: "1.0"
}
```
Example:
```text
Design choice:
  Loop Controller was used after the red-team module.
Because:
  The user wanted iterative improvement until the revised complaint addressed major red-team findings.
Goal served:
  Improve substantive depth and reduce missed pleading weaknesses.
Actual effect:
  The revised draft incorporated the red-team findings and was accepted by the user.
Reusable lesson:
  For complex drafting tasks where critique should drive revision, prefer Red-Team -> Loop/Revision over one-pass red-team report only.
```
### §9.15 User preference overlays for task design
Task Agent must preserve user task-design preferences as first-class overlays, not disposable chat commentary. Examples:
```text
Will prefers second-pass “look deeper” prompts in complex analysis tasks.
Will often wants issue matrix + memo for complaint evaluation.
Will wants final legal work products saved as work products/documents unless incognito.
Will usually wants red-team before final legal drafting output.
Will dislikes unnecessary model-choice questions during task creation.
```
These preferences should influence Blueprint proposals, question policy, output archetype defaults, artifact policy suggestions, and template matching, but they must not silently override explicit user instructions in the current design session.
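The precedence rule (overlays inform defaults, explicit instructions win) can be sketched as a merge that records where each value came from. `DesignSettings` and `resolveDesignSettings` are hypothetical, and the flat key/value shape is a simplification of real Blueprint settings.

```ts
// Hypothetical flat settings map, e.g. { red_team: "before_final_output" }.
type DesignSettings = Record<string, string>;

type ResolvedSetting = {
  value: string;
  source: "explicit_instruction" | "user_preference_overlay";
};

// Applies the overlay first, then lets explicit session instructions override it.
function resolveDesignSettings(
  overlay: DesignSettings,
  explicit: DesignSettings,
): Record<string, ResolvedSetting> {
  const resolved: Record<string, ResolvedSetting> = {};
  for (const key of Object.keys(overlay)) {
    const value = overlay[key];
    if (value !== undefined) resolved[key] = { value, source: "user_preference_overlay" };
  }
  for (const key of Object.keys(explicit)) {
    const value = explicit[key];
    if (value !== undefined) resolved[key] = { value, source: "explicit_instruction" };
  }
  return resolved;
}
```

Keeping the `source` on each resolved value lets Task Agent explain why a default was proposed and makes an override visible rather than silent.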
### §9.16 Goal-based task-design learning
Task design learning should not only record which module or pattern was used. It should record which goal the design served and whether that goal was advanced.
```ts
TaskDesignGoalLearningRecord {
record_id: string
goal_ref: GoalRef
source_kind: "design_session" | "task_run" | "task_assessment" | "user_feedback"
source_ref: StorageRef
task_type: string
context_class_key: string
pattern_refs: TaskDesignPatternRef[]
template_refs: TaskTemplateRef[]
module_preset_refs: ModulePresetRef[]
design_decision_refs: string[]
observed_effect: "advanced_goal" | "partially_advanced_goal" | "did_not_advance_goal" | "unknown"
evidence_summary: string
confidence: number
schema_version: "1.0"
}
```
This lets Task Agent recommend patterns because they serve a known objective, not merely because they appeared in a similar graph.
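As an illustration of goal-based recommendation, the sketch below ranks patterns by how well they advanced a given goal. The spec does not define a scoring formula: the `EFFECT_WEIGHT` values, the `GoalLearningView` shape, and the use of plain strings for goal and pattern references are all assumptions.

```ts
type ObservedEffect =
  | "advanced_goal"
  | "partially_advanced_goal"
  | "did_not_advance_goal"
  | "unknown";

// Simplified view of TaskDesignGoalLearningRecord; refs are plain strings here.
type GoalLearningView = {
  goal_ref: string;
  pattern_refs: string[];
  observed_effect: ObservedEffect;
  confidence: number;
};

// Hypothetical weights; not defined by the spec.
const EFFECT_WEIGHT: Record<ObservedEffect, number> = {
  advanced_goal: 1,
  partially_advanced_goal: 0.5,
  did_not_advance_goal: -1,
  unknown: 0,
};

// Sums confidence-weighted effects per pattern for one goal, best first.
function rankPatternsForGoal(records: GoalLearningView[], goalRef: string) {
  const scores: Record<string, number> = {};
  for (const r of records) {
    if (r.goal_ref !== goalRef) continue;
    for (const p of r.pattern_refs) {
      scores[p] = (scores[p] ?? 0) + EFFECT_WEIGHT[r.observed_effect] * r.confidence;
    }
  }
  return Object.keys(scores)
    .map((pattern_ref) => ({ pattern_ref, score: scores[pattern_ref] ?? 0 }))
    .sort((a, b) => b.score - a.score);
}
```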
---
## §9A — Task invocation learning and BDSM/DOC8 integration
### §9A.1 TaskSuggestionFeedbackEvent
```ts
export type TaskSuggestionFeedbackEvent = {
event_id: string;
user_request_ref: string;
task_mode_decision_ref: string;
suggestion_kind:
| "task_suggested"
| "task_chip_shown"
| "task_not_suggested"
| "task_agent_invoked"
| "task_agent_not_invoked";
user_response:
| "accepted"
| "rejected"
| "ignored"
| "asked_direct_answer_instead"
| "later_requested_task"
| "complained_task_was_overused"
| "complained_task_was_underused";
context_class_key: string;
matched_template_ids: string[];
matched_invocation_directive_ids: string[];
evidence_ref: StorageRef;
schema_version: "1.0";
};
```
### §9A.2 TaskInvocationLearningSignal
```ts
export type TaskInvocationLearningSignal = {
signal_id: string;
user_request_ref: string;
mode_decision_ref: string;
signal_kind:
| "task_suggestion_accepted"
| "task_suggestion_rejected"
| "task_chip_accepted"
| "task_chip_ignored"
| "task_overused"
| "task_underused"
| "direct_answer_preferred"
| "task_mode_preferred"
| "task_used_successfully"
| "task_used_unsuccessfully"
| "existing_template_missed"
| "wrong_template_suggested"
| "task_agent_consult_helpful"
| "task_agent_consult_unhelpful";
context_class_key: string;
task_type?: string;
document_archetype?: string;
matched_directive_ids: string[];
matched_template_ids: string[];
evidence_ref: StorageRef;
recommended_update_kind:
| "task_invocation_directive_update"
| "task_mode_policy_update"
| "template_semantic_card_update"
| "question_policy_update"
| "none";
schema_version: "1.0";
};
```
### §9A.3 Runtime influence
BDSM/DOC8 compiled bundles may influence:
```text
task suggestion suppression
task suggestion boost
task chip copy
template matching priority
Task Agent question policy
Task Agent design defaults
TaskInvocationDirective candidate promotion
```
BDSM/DOC8 MUST NOT:
```text
silently run tasks
silently mutate task graphs
silently rewrite templates/presets
silently change side-effect policies
force injection of TKP into ordinary chat
```
### §9A.4 Rate limits and suppression
Task suggestions SHOULD be rate-limited by context class and recent rejection. Repeated rejection of task suggestions for a context class SHOULD suppress future non-explicit suggestions until sufficient countervailing evidence appears.
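A minimal sketch of suppression by context class. `SuggestionEvent`, `shouldSuppressSuggestion`, and the default `rejectionThreshold` of 3 are hypothetical, and treating a single later acceptance as "sufficient countervailing evidence" is an assumption.

```ts
type SuggestionResponse = "accepted" | "rejected" | "ignored";

// Hypothetical simplified feedback event; `at` is a timestamp in milliseconds.
type SuggestionEvent = {
  context_class_key: string;
  user_response: SuggestionResponse;
  at: number;
};

function shouldSuppressSuggestion(
  history: SuggestionEvent[],
  contextClassKey: string,
  userAskedExplicitly: boolean,
  rejectionThreshold = 3,
): boolean {
  // Explicit requests are never suppressed; only non-explicit suggestions are.
  if (userAskedExplicitly) return false;
  const newestFirst = history
    .filter((e) => e.context_class_key === contextClassKey)
    .sort((a, b) => b.at - a.at);
  let rejections = 0;
  for (const e of newestFirst) {
    if (e.user_response === "accepted") break; // countervailing evidence ends the streak
    if (e.user_response === "rejected") rejections += 1;
  }
  return rejections >= rejectionThreshold;
}
```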
---
## §9B — Task Agent proposal-edit learning
### §9B.1 User edits are high-signal
If Task Agent proposes a task and the user adds a module, removes a module, rewires context, changes output policy, changes a prompt, adds a loop, adds a judge, changes capability binding, or changes artifact intake, that edit is a high-value learning signal.
### §9B.2 TaskAgentProposalEditTrace
```ts
export type TaskAgentProposalEditTrace = {
trace_id: string;
design_session_id: string;
proposed_blueprint_ref: StorageRef;
proposed_graph_ref: StorageRef;
final_user_edited_graph_ref: StorageRef;
graph_diff: GraphDiff;
blueprint_diff?: BlueprintDiff;
edit_kinds: Array<
| "module_added"
| "module_removed"
| "module_rewired"
| "config_changed"
| "prompt_changed"
| "capability_added"
| "capability_removed"
| "artifact_policy_changed"
| "human_gate_added"
| "loop_added"
| "judge_added"
| "template_changed"
| "second_pass_deepening_added"
| "source_grounding_added"
>;
interpreted_learning_signals: TaskAgentDesignLearningSignal[];
evidence_ref: StorageRef;
schema_version: "1.0";
};
```
### §9B.3 Example
```text
Task Agent proposed a simple complaint-evaluation graph.
User added a second-pass “look deeper” module.
Signal:
Task Agent omitted preferred complex-analysis deepening pattern.
Recommended update:
Add or strengthen TaskDesignPatternCard: complex_analysis_second_pass_deepening.
```
---
## §9C — Unified Task System Learning Report
### §9C.1 Separate ledgers, unified report
Do not collapse all learning into one giant ledger. Keep separate evidence families:
```text
task invocation utility
task suggestion feedback
task agent design utility
template/preset utility
capability/tool utility
question utility
artifact policy utility
execution success/failure
judge/experiment quality
prompt-quality signals
user graph edits
```
Unify them through shared context keys, evidence references, and reports.
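The following sketch shows one possible way to keep ledgers separate while producing a joined view keyed by context. The `LedgerEntry` shape, the `joinByContextKey` helper, and the use of plain strings in place of `StorageRef` are assumptions made for illustration; only a subset of the evidence families is listed.

```ts
// ASSUMPTION: a small subset of the evidence families, named for illustration.
type EvidenceFamily =
  | "task_invocation_utility"
  | "task_suggestion_feedback"
  | "task_agent_design_utility"
  | "user_graph_edits";

// ASSUMPTION: hypothetical minimal ledger row; evidence_ref stands in for StorageRef.
type LedgerEntry = {
  family: EvidenceFamily;
  context_class_key: string;
  evidence_ref: string;
};

type ContextView = Record<string, Partial<Record<EvidenceFamily, string[]>>>;

// Ledgers stay separate inputs; the report is a read-only join over them.
function joinByContextKey(ledgers: LedgerEntry[][]): ContextView {
  const view: ContextView = {};
  for (const ledger of ledgers) {
    for (const entry of ledger) {
      let byFamily = view[entry.context_class_key];
      if (byFamily === undefined) {
        byFamily = {};
        view[entry.context_class_key] = byFamily;
      }
      let refs = byFamily[entry.family];
      if (refs === undefined) {
        refs = [];
        byFamily[entry.family] = refs;
      }
      refs.push(entry.evidence_ref);
    }
  }
  return view;
}
```

Because the join only collects references, each ledger keeps its own schema and retention rules.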
### §9C.2 TaskSystemLearningReport
```ts
export type TaskSystemLearningReport = {
report_id: string;
generated_at: string;
scope:
| "task_invocation"
| "task_agent_design"
| "task_execution"
| "template_library"
| "prompt_quality"
| "portfolio"
| "all";
invocation_findings: TaskInvocationFinding[];
task_agent_findings: TaskAgentDesignFinding[];
template_preset_findings: TemplatePresetFinding[];
execution_findings: ExecutionFinding[];
judge_experiment_findings: EvaluationFinding[];
prompt_quality_findings: PromptQualityFinding[];
user_feedback_findings: UserFeedbackFinding[];
artifact_policy_findings: ArtifactPolicyFinding[];
recommended_updates: TaskSystemImprovementProposal[];
schema_version: "1.0";
};
```
### §9C.3 TaskSystemImprovementProposal
```ts
export type TaskSystemImprovementProposal = {
proposal_id: string;
proposal_kind:
| "task_invocation_directive_update"
| "task_mode_policy_update"
| "task_agent_instruction_candidate"
| "task_design_pattern_card_update"
| "task_design_hint_card_update"
| "template_semantic_card_update"
| "module_preset_semantic_card_update"
| "task_template_update"
| "module_preset_update"
| "artifact_policy_default_update"
| "prompt_design_guide_card_update"
| "prompt_update_proposal"
| "question_policy_update";
target_ref: string;
rationale: string;
evidence_refs: StorageRef[];
support_count: number;
counter_evidence_refs: StorageRef[];
proposed_patch_ref: StorageRef;
prior_hash?: string;
candidate_hash?: string;
requires_eval: boolean;
eval_suite_ref?: string;
eval_result_ref?: string;
approval_policy:
| "auto_apply_low_risk"
| "user_review"
| "architect_review"
| "never_auto_apply";
lifecycle_state:
| "candidate"
| "staged"
| "approved"
| "rejected"
| "applied"
| "rolled_back";
schema_version: "1.0";
};
```
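As an illustration of how `approval_policy`, `requires_eval`, and `lifecycle_state` might interact, the sketch below gates approval and checks state transitions. The transition table and the `mayApprove` rule are assumptions introduced here; this section does not define the permitted transitions.

```ts
type ApprovalPolicy = "auto_apply_low_risk" | "user_review" | "architect_review" | "never_auto_apply";
type LifecycleState = "candidate" | "staged" | "approved" | "rejected" | "applied" | "rolled_back";

// Subset of TaskSystemImprovementProposal needed for gating.
type ProposalGateView = {
  approval_policy: ApprovalPolicy;
  lifecycle_state: LifecycleState;
  requires_eval: boolean;
  eval_result_ref?: string;
};

// ASSUMPTION: one plausible transition table; the spec does not enumerate transitions.
const ALLOWED_TRANSITIONS: Record<LifecycleState, LifecycleState[]> = {
  candidate: ["staged", "rejected"],
  staged: ["approved", "rejected"],
  approved: ["applied", "rejected"],
  rejected: [],
  applied: ["rolled_back"],
  rolled_back: [],
};

function canTransition(from: LifecycleState, to: LifecycleState): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// A staged proposal may be approved only once any required eval has a result,
// and without a reviewer only under the low-risk auto-apply policy.
function mayApprove(p: ProposalGateView, reviewerApproved: boolean): boolean {
  if (p.lifecycle_state !== "staged") return false;
  if (p.requires_eval && !p.eval_result_ref) return false;
  if (p.approval_policy === "auto_apply_low_risk") return true;
  return reviewerApproved;
}
```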
---
## §10 — DOC72 Task Activity Memory
### §10.1 Purpose
DOC72 Task Activity Memory turns tasks into durable, queryable, relationship-rich memory without turning DOC72 into raw log storage.
### §10.2 Three-layer memory model
```text
Layer 1 — Operational run truth
Owner: DOC23/EC
Contents: exact logs, checkpoints, prompts, outputs, tool calls, spans, artifacts, errors, costs.
Layer 2 — DOC72 Task Activity Memory
Owner: DOC72, written by EC
Contents: task/matter/entity/artifact/work-product/goal relationships, run summaries, structured activity memory, selected summaries/references.
Layer 3 — Learning and utility signals
Owner: DOC8/BDSM, written by EC
Contents: friction, utility, capability fit, question usefulness, pattern success/failure, template/preset learning.
```
### §10.3 TaskRunMemoryPolicy
```ts
export type TaskRunMemoryPolicy = {
enabled: boolean
mode:
| "standard"
| "reference_only"
| "summary_reference"
| "summary_plus_learning"
| "memory_incognito"
| "full_ephemeral_test"
artifact_default_level: ArtifactIntakeLevel
capture_prompts: "never" | "redacted" | "full_local_only"
capture_outputs: "reference_only" | "redacted_summary" | "full_local_only"
doc72_write_scope:
| "none"
| "task_run_only"
| "task_run_plus_final_artifacts"
| "task_run_plus_linked_artifacts"
| "task_run_plus_learning_signals"
entity_linking_enabled: boolean
goal_linking_enabled: boolean
matter_linking_enabled: boolean
followup_session_linking_enabled: boolean
retention_class:
| "standard"
| "high_value"
| "matter_active"
| "ephemeral"
| "legal_work_product"
user_can_retroactively_promote: boolean
schema_version: "1.0"
}
```
### §10.4 Memory Incognito vs Full Ephemeral
| Mode | Meaning |
|---|---|
| Memory Incognito | Suppresses DOC72/DOC8/BDSM optional memory and learning writes, but preserves operational run records needed for execution safety, audit, and user retrieval. |
| Full Ephemeral/Test | Suppresses durable run details except minimum safety/failure receipts; intended for private experiments and test runs. |
Business-grade workflows should default to audit-preserving modes, not full ephemeral modes.
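The distinction in the table can be read as a set of write gates per mode. The sketch below is illustrative: the `WriteGates` shape and `writeGatesFor` helper are hypothetical, and the behavior of modes other than `memory_incognito` and `full_ephemeral_test` is an assumption, since the table above defines only those two.

```ts
type TaskRunMemoryMode =
  | "standard"
  | "reference_only"
  | "summary_reference"
  | "summary_plus_learning"
  | "memory_incognito"
  | "full_ephemeral_test";

// ASSUMPTION: hypothetical gate summary, introduced for illustration.
type WriteGates = {
  operational_run_records: "full" | "minimum_receipts_only";
  doc72_memory_writes: boolean;
  learning_signal_writes: boolean;
};

function writeGatesFor(mode: TaskRunMemoryMode): WriteGates {
  switch (mode) {
    case "memory_incognito":
      // Optional memory/learning writes suppressed; operational records preserved.
      return { operational_run_records: "full", doc72_memory_writes: false, learning_signal_writes: false };
    case "full_ephemeral_test":
      // Only minimum safety/failure receipts survive.
      return { operational_run_records: "minimum_receipts_only", doc72_memory_writes: false, learning_signal_writes: false };
    default:
      // ASSUMPTION: remaining modes keep full records and DOC72 writes;
      // learning writes are enabled only where the mode name suggests it.
      return {
        operational_run_records: "full",
        doc72_memory_writes: true,
        learning_signal_writes: mode === "standard" || mode === "summary_plus_learning",
      };
  }
}
```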
### §10.5 TaskRunExecutionTracePayload
Task runs are stored in DOC72 using the existing `execution_trace` node kind.
```ts
export type TaskRunExecutionTracePayload = {
kind: "doc23_task_run"
task_id: string
run_id: string
task_name: string
template_id?: string
blueprint_id?: string
preset_or_template_ref?: string
status: TaskRunStatus
started_at: string
completed_at?: string
trigger_summary: TriggerSummary
work_context_refs: EntityRef[]
matter_or_project_refs: EntityRef[]
linked_goal_refs: GoalRef[]
linked_task_entity_ref?: EntityRef
workflow_summary: string
graph_snapshot_ref: StorageRef
blueprint_snapshot_ref?: StorageRef
key_config_snapshot_ref: StorageRef
module_activity_summaries: ModuleActivitySummary[]
reviewed_document_refs: DocumentRef[]
produced_artifact_refs: ArtifactRef[]
produced_work_product_refs: WorkProductRef[]
human_gate_summary: HumanGateSummary
subagent_lineage_summary: SubAgentLineageSummary
capability_usage_summary: CapabilityUsageSummary
cost_summary: CostSummary
quality_summary?: QualitySummary
failure_summary?: FailureSummary
assessment_refs: TaskAssessmentRef[]
followup_session_refs: ConversationOrRoomRef[]
raw_run_ref: StorageRef
schema_version: "1.0"
}
```
### §10.6 Marex example
If the user runs a `DRAFT COMPLAINT` task for Marex, DOC72 should be able to know and link:
```text
Marex matter
-> DRAFT COMPLAINT task template
-> task run on date X
-> source documents reviewed by modules A/B/C
-> damages analysis memo produced by module D
-> complaint draft produced by module E
-> red-team report produced by module F
-> final revised complaint work_product
-> later red-team chat/room/session
-> later revised complaint version
-> task assessment and improvement proposals
```
This is not raw log storage. It is activity memory and relationship graphing.
### §10.7 Task Activity Graph
```ts
export type TaskRunMemoryGraph = {
task_run_node: ExecutionTraceRef
linked_task_entity: EntityRef
linked_matter_entities: EntityRef[]
linked_goal_nodes: GoalRef[]
reviewed_document_refs: DocumentRef[]
produced_artifact_refs: ArtifactRef[]
produced_work_product_refs: WorkProductRef[]
module_activity_summaries: ModuleActivitySummary[]
human_review_refs: HumanReviewRef[]
red_team_refs: EvaluationOrRedTeamRef[]
followup_session_refs: ConversationOrRoomRef[]
capability_usage_refs: CapabilityUsageRef[]
assessment_refs: TaskAssessmentRef[]
schema_version: "1.0"
}
```
### §10.8 Immediate vs nightly/ongoing ingestion
Task memory integration has two lanes.
#### §10.8.1 Immediate lane
The immediate lane is deterministic or cheap and runs at run start, module completion, artifact creation, and run completion.
Immediate writes include:
- TaskRun execution_trace stub at run start when memory policy permits.
- Task/matter/entity refs already known from task config, active context, source docs, template, or user selection.
- Graph snapshot ref.
- Prompt/capability/policy snapshot refs.
- Module status summaries.
- ArtifactIndex entries.
- Final output refs.
- Cost and timing summaries.
- Capability snapshot and usage receipts.
- Human gate receipts.
- Error/failure receipts.
The immediate lane must not require a large agent extraction call.
#### §10.8.2 Nightly/ongoing extraction lane
The nightly/ongoing lane may use LLM-assisted extraction or heavier graph analysis. It should process:
- long output summaries;
- design cases from task creation sessions;
- task-design rationale extraction;
- goal-link extraction;
- artifact importance classification;
- template/preset semantic card updates;
- Task Assessment synthesis if not run immediately;
- DOC8/BDSM learning bundle compilation;
- stale-policy/retention compaction;
- cross-run pattern detection.
#### §10.8.3 Trigger conditions
A task run or artifact is queued for nightly extraction when:
- artifact policy requests DOC72 summary/reference;
- output exceeds immediate summary threshold;
- output is opened/bookmarked/used downstream/cited;
- user explicitly says save/make findable/remember this;
- Task Agent or Task Assessment flags it as high-value;
- run produced final work product;
- run generated significant failure/learning signals;
- task creation session had substantial user edits or accepted/rejected suggestions.
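The trigger conditions above can be sketched as a single predicate that also records why an item was queued. The `NightlyExtractionInputs` shape, the reason codes, and the character-based threshold are assumptions for illustration; the spec does not define how each condition is detected or measured.

```ts
// ASSUMPTION: hypothetical flattened inputs, one field per trigger condition.
type NightlyExtractionInputs = {
  artifact_policy_requests_doc72_summary_or_reference: boolean;
  output_chars: number;
  immediate_summary_threshold_chars: number;
  opened_bookmarked_used_downstream_or_cited: boolean;
  user_asked_to_save_or_remember: boolean;
  flagged_high_value: boolean;
  produced_final_work_product: boolean;
  significant_failure_or_learning_signals: boolean;
  substantial_creation_session_edits: boolean;
};

// Returns every matching reason so the queue entry is explainable.
function nightlyExtractionReasons(i: NightlyExtractionInputs): string[] {
  const reasons: string[] = [];
  if (i.artifact_policy_requests_doc72_summary_or_reference) reasons.push("artifact_policy");
  if (i.output_chars > i.immediate_summary_threshold_chars) reasons.push("output_exceeds_threshold");
  if (i.opened_bookmarked_used_downstream_or_cited) reasons.push("artifact_engagement");
  if (i.user_asked_to_save_or_remember) reasons.push("explicit_user_request");
  if (i.flagged_high_value) reasons.push("flagged_high_value");
  if (i.produced_final_work_product) reasons.push("final_work_product");
  if (i.significant_failure_or_learning_signals) reasons.push("failure_or_learning_signals");
  if (i.substantial_creation_session_edits) reasons.push("creation_session_edits");
  return reasons;
}

// Any single condition is sufficient to queue the run or artifact.
function shouldQueueForNightlyExtraction(i: NightlyExtractionInputs): boolean {
  return nightlyExtractionReasons(i).length > 0;
}
```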
### §10.9 Retention and compaction
Retention classes:
```ts
export type TaskRetentionClass =
| "standard"
| "high_value"
| "matter_active"
| "legal_work_product"
| "template_training"
| "ephemeral"
```
Compaction must preserve:
- TaskRun identity;
- task/template/blueprint refs;
- matter/entity/goal links;
- final artifact/work-product refs;
- assessment refs;
- cost/quality/failure summaries;
- enough module summary to explain what happened;
- provenance refs for any durable memory claims.
### §10.10 Minimum DOC72 work-context capture
When task memory is enabled, DOC72 Task Activity Memory must capture enough context for the task run to link through the broader memory matrix. The minimum capture set is:
```text
Task identity
task_id, run_id, task name, template/preset/blueprint refs, graph snapshot ref.
Work context
matter/case/project/personal context refs when known; active context slot refs; user-selected context refs.
Entity links
relevant people, organizations, matters, documents, libraries/corpora, rooms, panels, chats, email accounts, calendars, folders, and other known entities.
Source/document links
documents reviewed, downloaded, received, produced, or opened; DOC25 ingestion refs where available.
Artifact/work-product links
final outputs, high-value intermediate outputs, work products, red-team reports, judge/experiment outputs selected for retention.
Goal links
Blueprint goals and reusable DOC72 goal nodes served by the task or by major design decisions.
Follow-on links
later chats, rooms, red-team sessions, revised work products, task re-runs, and assessments linked back to the original run.
```
This minimum capture is what lets a later Marex red-team chat, a Marex complaint revision, a damages memo, and the original DRAFT COMPLAINT task all become connected parts of the memory matrix.
### §10.11 Task memory is not solely retrospective
Task Activity Memory also supports prospective task assistance. When the user asks Task Agent to design or run a task, DOC72 memory can supply:
- relevant matters/entities/work contexts;
- prior tasks and templates for the same entity or matter;
- prior outputs and work products;
- user preferences and task-design goals;
- known failure modes or assessments;
- relevant libraries/corpora and documents;
- prior capability/procedure/tool experiences.
Task Agent must distinguish between memory-derived context and current user-provided instructions when presenting a proposal.
---
## §11 — Artifact and output intake policy
### §11.1 Purpose
Task outputs vary widely. Some are transient control data. Some are low-value intermediate drafts. Some are high-value memos. Some are final work products. Some are downloaded/received documents. Some are emails/texts. The system must let the user and Task Agent control what becomes findable, summarized, ingested, or added to a library.
### §11.2 ArtifactIntakeLevel
Use the technical enum internally, with user-friendly labels in the UI.
```ts
type ArtifactIntakeLevel =
| "run_only"
| "doc72_reference"
| "doc72_summary_reference"
| "doc25_document_ingestion"
| "doc25_work_product_ingestion"
| "doc73_library_ingestion"
| "no_capture"
```
| Enum | User-facing label | Meaning |
|---|---|---|
| `no_capture` | Do not save | Suppress optional artifact capture; keep only minimum execution receipts if required. |
| `run_only` | Keep with this run | Stored in task run/artifact storage; retrievable from Run Inspector while retained. |
| `doc72_reference` | Make findable as a linked item | DOC72 stores metadata/ref only; no substantive summary. |
| `doc72_summary_reference` | Make findable with summary | DOC72 stores summary/key points/entities/ref. |
| `doc25_document_ingestion` | Process as document | DOC25 converts/dedups/processes the document; endpoint chosen by policy. |
| `doc25_work_product_ingestion` | Save as work product | DOC25 processes; DOC72 links as work_product where appropriate. |
| `doc73_library_ingestion` | Add to library | DOC25 ingestion plus DOC73 library/corpus binding/extraction. |
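A minimal sketch of the enum-to-label mapping from the table follows. The labels are taken from the table; the `INTAKE_LEVEL_ORDER` ranking and the `isAboveLevel` helper are assumptions, shown only to illustrate how a threshold field such as `user_review_required_above_level` could be compared if the table's row order were treated as an ordering.

```ts
type ArtifactIntakeLevel =
  | "run_only"
  | "doc72_reference"
  | "doc72_summary_reference"
  | "doc25_document_ingestion"
  | "doc25_work_product_ingestion"
  | "doc73_library_ingestion"
  | "no_capture";

// Record<...> makes the compiler reject a missing or misspelled level.
const INTAKE_LEVEL_LABELS: Record<ArtifactIntakeLevel, string> = {
  no_capture: "Do not save",
  run_only: "Keep with this run",
  doc72_reference: "Make findable as a linked item",
  doc72_summary_reference: "Make findable with summary",
  doc25_document_ingestion: "Process as document",
  doc25_work_product_ingestion: "Save as work product",
  doc73_library_ingestion: "Add to library",
};

// ASSUMPTION: table row order used as a least-to-most durable ranking.
const INTAKE_LEVEL_ORDER: ArtifactIntakeLevel[] = [
  "no_capture",
  "run_only",
  "doc72_reference",
  "doc72_summary_reference",
  "doc25_document_ingestion",
  "doc25_work_product_ingestion",
  "doc73_library_ingestion",
];

function isAboveLevel(level: ArtifactIntakeLevel, threshold: ArtifactIntakeLevel): boolean {
  return INTAKE_LEVEL_ORDER.indexOf(level) > INTAKE_LEVEL_ORDER.indexOf(threshold);
}
```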
### §11.3 DOC25 is a processor, not always the endpoint
DOC25 routes documents through ingestion, conversion, deduplication, quality reporting, and document intelligence. The destination after DOC25 may be:
- task run artifact;
- DOC72 work_product;
- DOC72 document/work-product-linked node;
- DOC73 library/corpus member;
- user file location;
- output delivery attachment.
The UI must not imply “route through DOC25” is the final destination. It should say what happens after processing.
### §11.4 TaskArtifactIntakePolicy
```ts
export type TaskArtifactIntakePolicy = {
task_default: ArtifactIntakeLevel
final_output_default: ArtifactIntakeLevel
intermediate_output_default: ArtifactIntakeLevel
produced_document_default: ArtifactIntakeLevel
downloaded_document_default: ArtifactIntakeLevel
received_attachment_default: ArtifactIntakeLevel
sent_message_default: ArtifactIntakeLevel
received_message_default: ArtifactIntakeLevel
per_module_overrides: Record<string, ModuleArtifactIntakePolicy>
per_port_overrides: Record<string, ArtifactIntakeLevel>
auto_candidate_rules: ArtifactCandidateRule[]
user_review_required_above_level: ArtifactIntakeLevel
incognito_override: "suppress_doc72_doc25" | "suppress_all_optional_memory"
schema_version: "1.0"
}
```
### §11.5 ModuleArtifactIntakePolicy
```ts
export type ModuleArtifactIntakePolicy = {
module_id: string
module_type: string
capture_mode:
| "not_applicable"
| "default"
| "never_capture"
| "keep_with_run"
| "make_findable"
| "make_findable_with_summary"
| "process_documents"
| "save_work_product"
| "add_to_library"
| "ask_each_time"
port_overrides: Record<string, ArtifactIntakeLevel>
output_archetype_hints: OutputArchetype[]
high_value_candidate: boolean
review_required: boolean
schema_version: "1.0"
}
```
### §11.6 Default policy
Recommended defaults:
```text
Utility/control modules such as junctions, switches, signals, delay, hold:
not_applicable or keep_with_run only.
Most intermediate module outputs:
keep_with_run.
Intermediate analysis outputs marked high-value by module/preset/policy:
ask_each_time or make_findable_with_summary candidate.
Final textual/file output:
make_findable_with_summary.
Substantial final document/work product:
process as document + save work product candidate.
Downloaded/received documents or attachments:
process as document if policy allows.
Library/corpus-targeted tasks:
add to library.
Judge/experiment raw artifacts:
keep_with_run unless submitted to evaluation dataset, assessment, or memory review.
Incognito:
suppress DOC72/DOC25/DOC8 optional writes unless user explicitly promotes afterward and policy allows.
```
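How defaults and overrides combine into one effective level is not fixed by the text above. The sketch below assumes a most-specific-wins precedence (port override, then output-kind default, then task default) and clamps incognito runs to `run_only`. The `IntakePolicyView` shape, the `kind_defaults` map (standing in for the per-kind `*_default` fields), and the port key format are hypothetical.

```ts
type ArtifactIntakeLevel =
  | "run_only"
  | "doc72_reference"
  | "doc72_summary_reference"
  | "doc25_document_ingestion"
  | "doc25_work_product_ingestion"
  | "doc73_library_ingestion"
  | "no_capture";

type OutputKind =
  | "final_output"
  | "intermediate_output"
  | "produced_document"
  | "downloaded_document"
  | "received_attachment";

// ASSUMPTION: simplified view of TaskArtifactIntakePolicy for illustration.
type IntakePolicyView = {
  task_default: ArtifactIntakeLevel;
  kind_defaults: Partial<Record<OutputKind, ArtifactIntakeLevel>>;
  per_port_overrides: Record<string, ArtifactIntakeLevel>;
};

function resolveIntakeLevel(
  policy: IntakePolicyView,
  kind: OutputKind,
  portKey: string,
  incognito: boolean,
): ArtifactIntakeLevel {
  // ASSUMPTION: most specific setting wins.
  const portLevel: ArtifactIntakeLevel | undefined = policy.per_port_overrides[portKey];
  const resolved = portLevel ?? policy.kind_defaults[kind] ?? policy.task_default;
  // Incognito suppresses optional DOC72/DOC25/DOC73 writes but never upgrades no_capture.
  if (incognito && resolved !== "no_capture") return "run_only";
  return resolved;
}
```

Later promotion by the user, where policy allows it, would be a separate explicit action rather than part of this resolution.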
### §11.7 ArtifactCandidateRule
```ts
export type ArtifactCandidateRule = {
rule_id: string
applies_to:
| "final_output"
| "intermediate_output"
| "produced_document"
| "downloaded_document"
| "received_attachment"
| "judge_result"
| "red_team_report"
| "analysis_memo"
| "user_opened_artifact"
| "downstream_used_artifact"
condition: string
suggested_level: ArtifactIntakeLevel
rationale: string
requires_user_review: boolean
schema_version: "1.0"
}
```
### §11.8 TaskArtifactIndexEntry
Every material output receives an index entry even if it remains run-only, unless no-capture mode applies.
```ts
export type TaskArtifactIndexEntry = {
artifact_id: string
task_id: string
run_id: string
module_id: string
activation_seq?: number
port_name: string
artifact_kind:
| "text_output"
| "file_output"
| "document"
| "work_product"
| "email"
| "text_message"
| "judge_result"
| "experiment_bundle"
| "red_team_report"
| "subagent_result"
| "tool_result"
| "system_receipt"
display_name: string
output_archetype?: string
storage_ref: StorageRef
intake_level: ArtifactIntakeLevel
doc72_ref?: string
doc25_ingestion_ref?: string
doc73_library_ref?: string
work_product_ref?: string
summary?: string
key_points?: string[]
linked_entity_refs: EntityRef[]
linked_goal_refs: GoalRef[]
created_at: string
lifecycle_state: "active" | "promoted" | "compacted" | "deleted" | "expired"
schema_version: "1.0"
}
```
### §11.9 User-facing artifact settings
Avoid overly technical or domain-specific labels. Suggested UI:
```text
Memory & Artifact Handling
──────────────────────────
Final results [Make findable with summary ▾]
Intermediate outputs [Keep with this run ▾]
High-value intermediate work [Ask whether to make findable ▾]
Documents produced by task [Process and save if final ▾]
Documents received/downloaded [Process as documents ▾]
Add selected outputs to library [Off / Ask / Auto for library tasks ▾]
Private/incognito run [Off]
```
Per module:
```text
This module's outputs
Capture behavior: [Use task default ▾]
Options:
Do not save
Keep with this run
Make findable
Make findable with summary
Process documents
Save final work product
Add to library
Ask each time
```
### §11.10 Message and communication artifacts
Artifact intake must handle more than markdown outputs and files. Tasks may send or receive emails, texts, chat messages, forum posts, task messages, webhook payloads, and notification receipts.
Default handling:
```text
Sent email/text/chat/forum outputs
Store receipt with run; make substantive body/output findable only if policy says so.
Received email/text/chat/forum trigger inputs
Store enough metadata for run audit; route documents/attachments through DOC25 if policy allows.
Communication body text
Treat as text artifact only when it is itself an output/input of analytical value.
Headers/sender/thread metadata
Use as context/entity linkage; do not promote as substantive memory unless needed.
External-send receipts
Preserve side-effect idempotency key, policy snapshot, recipient/account, timestamp, and delivery status.
```
The UI should describe the endpoint plainly, e.g., “Save email receipt only,” “Make email body findable,” “Process attachments as documents,” or “Save final sent memo as work product.”
---
## §12 — Telemetry and observability
### §12.1 Purpose
DOC23 needs production-grade telemetry so a business or individual can see exactly what happened, what is happening, what was prompted, what was produced, what tools were used, what failed, and what should improve.
### §12.2 TaskTraceSpan
Every meaningful task operation should emit a span-like record.
```ts
export type TaskTraceSpan = {
span_id: string
trace_id: string
parent_span_id?: string
task_id: string
run_id: string
module_id?: string
activation_seq?: number
span_kind:
| "task_run"
| "module_activation"
| "llm_call"
| "prompt_assembly"
| "tool_call"
| "mcp_call"
| "connector_call"
| "doc24_packet_resolution"
| "capability_mount"
| "runtime_capability_expansion"
| "subagent_spawn"
| "policy_decision"
| "human_review"
| "file_document_transform"
| "artifact_emission"
| "output_delivery"
| "judge_eval"
| "experiment_variant"
| "retry"
| "fallback"
| "wait_state"
| "resume"
| "error"
name: string
status: "pending" | "running" | "success" | "error" | "cancelled" | "skipped" | "waiting"
started_at: string
completed_at?: string
duration_ms?: number
input_ref?: StorageRef
output_ref?: StorageRef
summary?: string
error?: TaskError
cost_usd?: number
token_count?: { input: number; output: number }
model_ref?: string
agent_ref?: string
prompt_snapshot_ref?: StorageRef
capability_snapshot_ref?: StorageRef
policy_snapshot_ref?: StorageRef
doc24_packet_snapshot_ref?: StorageRef
subagent_trace_ref?: SubAgentTraceRef
artifact_refs: ArtifactRef[]
tags: string[]
schema_version: "1.0"
}
```
### §12.3 EffectivePromptSnapshot
```ts
export type EffectivePromptSnapshot = {
snapshot_id: string
task_id: string
run_id: string
module_id: string
activation_seq: number
model_ref: string
agent_ref: string
cIL_layer_refs: StorageRef[]
doc24_packet_ref?: StorageRef
system_instruction_ref?: StorageRef
module_instruction_ref?: StorageRef
dynamic_instruction_ref?: StorageRef
context_bundle_ref?: StorageRef
chain_history_projection_ref?: StorageRef
attached_file_refs: FileRef[]
prompt_text_ref?: StorageRef
redaction_mode: "none" | "redacted" | "hash_only" | "not_captured"
prompt_hash: string
created_at: string
schema_version: "1.0"
}
```
### §12.4 CapabilitySnapshot
```ts
export type CapabilitySnapshot = {
snapshot_id: string
task_id: string
run_id: string
module_id?: string
required_capabilities: CapabilityRef[]
optional_capabilities: CapabilityRef[]
mounted_capabilities: CapabilityRef[]
denied_capabilities: CapabilityRef[]
live_state_summary: CapabilityLiveStateSummary[]
mcp_server_refs: McpServerRef[]
connector_refs: ConnectorRef[]
procedure_refs: ProcedureRef[]
model_availability_snapshot: ModelAvailabilitySnapshot
doc24_registry_generation_id: string
created_at: string
schema_version: "1.0"
}
```
### §12.5 PolicyDecisionSnapshot
```ts
export type PolicyDecisionSnapshot = {
snapshot_id: string
task_id: string
run_id: string
module_id?: string
decision_kind:
| "capability_mount"
| "tool_call"
| "external_send"
| "file_write"
| "memory_write"
| "doc25_ingestion"
| "doc73_library_add"
| "subagent_spawn"
| "model_selection"
decision: "allowed" | "blocked" | "requires_approval" | "degraded"
reason_codes: string[]
policy_generation_id: string
exposure_context_ref?: StorageRef
created_at: string
schema_version: "1.0"
}
```
### §12.6 Run Inspector extensions
Run Inspector should include:
1. **Trace Tree** — hierarchical span tree.
2. **Timeline** — Gantt-style module/subagent/tool/wait view.
3. **Graph Progress** — current active module(s), completed modules, waiting modules.
4. **Prompt & Context** — effective prompt snapshots, CIL layers, DOC24 packets, context bundle, chain history.
5. **Capabilities & Tools** — mounted capabilities, runtime expansions, denied/blocked tools.
6. **Artifacts & Outputs** — all TaskArtifactIndex entries with filters.
7. **Costs** — per module, per model, per subagent, total.
8. **Quality & Judges** — Judge/Experiment/Claim Extractor summaries.
9. **Anomalies & Failures** — errors, retries, validation warnings, policy blocks.
10. **Memory & Learning** — DOC72 TaskRun memory, artifact intake, learning signals, assessments.
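The Trace Tree and Costs views can both be derived from flat span records. The sketch below builds a tree from `span_id` / `parent_span_id` and rolls costs up each subtree. It uses a reduced `SpanView`, assumes `cost_usd` on a span excludes its children, assumes the parent links are acyclic, and treats spans whose parent is missing as roots; none of these choices is specified above.

```ts
// Reduced view of TaskTraceSpan containing only the fields used here.
type SpanView = {
  span_id: string;
  parent_span_id?: string;
  name: string;
  cost_usd?: number; // ASSUMPTION: own cost only, excluding child spans
};

type SpanNode = {
  span: SpanView;
  children: SpanNode[];
  subtree_cost_usd: number;
};

function buildTraceTree(spans: SpanView[]): SpanNode[] {
  const nodes: Record<string, SpanNode> = {};
  for (const s of spans) {
    nodes[s.span_id] = { span: s, children: [], subtree_cost_usd: 0 };
  }
  const roots: SpanNode[] = [];
  for (const s of spans) {
    const node = nodes[s.span_id];
    const parent = s.parent_span_id !== undefined ? nodes[s.parent_span_id] : undefined;
    // Orphaned spans surface as roots instead of being dropped.
    if (parent) parent.children.push(node);
    else roots.push(node);
  }
  // Post-order roll-up of costs.
  const rollUp = (n: SpanNode): number => {
    n.subtree_cost_usd =
      (n.span.cost_usd ?? 0) + n.children.reduce((acc, c) => acc + rollUp(c), 0);
    return n.subtree_cost_usd;
  };
  roots.forEach(rollUp);
  return roots;
}
```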
### §12.7 Task History view
A global Task History/Runs view should filter by:
- task;
- template;
- matter/entity;
- status;
- date;
- trigger;
- output type;
- artifact saved/promoted;
- assessment quality;
- failure type;
- capability used;
- cost range.
Task Agent uses this view internally and conversationally.
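A few of the filters listed above are sketched here as a pure predicate over run rows, which is one way the same query logic could serve both the UI and Task Agent. `RunRow`, `RunFilter`, and the field names are hypothetical read-model shapes, and timestamps are assumed to be ISO 8601 strings.

```ts
// ASSUMPTION: hypothetical read-model row for the Task History view.
type RunRow = {
  run_id: string;
  task_id: string;
  template_id?: string;
  status: string;
  started_at: string; // ISO 8601
  cost_usd: number;
  capability_ids: string[];
};

// Every field is optional; an empty filter matches all runs.
type RunFilter = {
  task_id?: string;
  template_id?: string;
  status?: string;
  started_on_or_after?: string;
  started_before?: string;
  capability_id?: string;
  min_cost_usd?: number;
  max_cost_usd?: number;
};

function matchesFilter(r: RunRow, f: RunFilter): boolean {
  if (f.task_id !== undefined && r.task_id !== f.task_id) return false;
  if (f.template_id !== undefined && r.template_id !== f.template_id) return false;
  if (f.status !== undefined && r.status !== f.status) return false;
  const started = Date.parse(r.started_at);
  if (f.started_on_or_after !== undefined && started < Date.parse(f.started_on_or_after)) return false;
  if (f.started_before !== undefined && started >= Date.parse(f.started_before)) return false;
  if (f.capability_id !== undefined && r.capability_ids.indexOf(f.capability_id) === -1) return false;
  if (f.min_cost_usd !== undefined && r.cost_usd < f.min_cost_usd) return false;
  if (f.max_cost_usd !== undefined && r.cost_usd > f.max_cost_usd) return false;
  return true;
}

function filterRuns(rows: RunRow[], f: RunFilter): RunRow[] {
  return rows.filter((r) => matchesFilter(r, f));
}
```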
### §12.8 OTEL/export posture
DOC23 should define ELNOR-native trace schemas first. It may provide OpenTelemetry-compatible export adapters, but no external telemetry standard should own DOC23's internal trace truth.
---
## §12A — Task Agent operational truth and evaluation evidence consumption
### §12A.1 Task Agent knowledge sources during operation
When Task Agent answers questions about a task, reviews a task, designs a task, or generates an assessment, it should distinguish three evidence layers:
```text
Operational run truth:
DOC23/EC-owned run records, activation records, checkpoints, logs, prompt snapshots, capability snapshots, policy snapshots, tool calls, sub-agent traces, artifacts, costs, errors, and Run Inspector read models.
DOC72 task memory:
summary/reference memory, entity/matter/work-product/artifact links, TaskRun execution_trace hubs, goals, task invocation directives, template/preset semantic cards, and prior design cases.
Learning signals:
DOC8/BDSM utility, friction, question utility, task invocation utility, prompt utility, capability utility, user edit traces, judge/experiment evidence, and promotion/rollback outcomes.
```
Task Agent MUST NOT confuse these layers. Operational run truth is the audit/debug source of truth. DOC72 task memory is the semantic memory substrate. Learning signals inform recommendations and policy proposals.
### §12A.2 TaskAgentOperationalConsultTrace
```ts
export type TaskAgentOperationalConsultTrace = {
consult_id: string;
entrypoint: TaskAgentEntrypoint;
user_request_ref: string;
task_id?: string;
run_id?: string;
module_id?: string;
consulted_operational_truth_refs: StorageRef[];
consulted_doc72_memory_refs: string[];
consulted_learning_signal_refs: string[];
consulted_tkp_card_refs: string[];
consulted_capability_refs: string[];
response_kind:
| "status_answer"
| "output_retrieval"
| "graph_explanation"
| "design_recommendation"
| "assessment"
| "prompt_improvement"
| "task_mode_advice";
degraded_reasons: string[];
created_at: string;
schema_version: "1.0";
};
```
### §12A.3 TaskInvocationTrace
Every nontrivial task-mode decision should be recordable as an audit/learning trace even when no packet is injected.
```ts
export type TaskInvocationTrace = {
trace_id: string;
user_request_ref: string;
task_mode_decision_ref: string;
initial_user_text_summary: string;
resolved_intent_summary: string;
selected_mode: ElnorExecutionMode;
task_opportunity_level: TaskOpportunityLevel;
matched_directive_ids: string[];
matched_task_template_ids: string[];
matched_module_preset_ids: string[];
task_agent_invoked: boolean;
task_agent_entrypoint?: TaskAgentEntrypoint;
task_agent_consult_ref?: string;
user_visible_task_suggestion_shown: boolean;
user_response?:
| "accepted"
| "rejected"
| "ignored"
| "asked_direct_answer_instead"
| "later_requested_task";
resulting_task_id?: string;
resulting_run_id?: string;
resulting_task_receipt_ref?: string;
learning_signal_refs: string[];
created_at: string;
schema_version: "1.0";
};
```
### §12A.4 Judge / Experiment / Claim / Optimization evidence
Task Agent MUST be able to inspect Addenda A evidence when reviewing or assessing tasks:
```text
Experiment variants and ComparisonBundles
VariantOutputBundles
JudgeScoreBundles and dimension audit trails
Claim extraction outputs
Claim-level verdicts and source spans
sub-agent trace refs used by judges/verification
prompt variant outcomes
human labels when available
promotion/rollback ledger entries when operative
online scoring/drift signals when operative
DSPy/GEPA optimization results when R5 substrate is operative
```
Task Agent uses this evidence to generate Task Assessment findings such as:
```text
prompt underperformed;
judge rubric was unscorable;
variant B outperformed baseline;
red-team module caught recurring weakness;
source-grounding was insufficient;
module output shape was not downstream-compatible;
second-pass deepening improved results;
capability/tool was unused or misapplied;
template/preset should be updated or not updated.
```
Task Agent MUST NOT auto-promote experiment winners, judge prompt revisions, DSPy candidates, or task template changes without the relevant promotion ledger, approval, and rollback contracts.
### §12A.5 Task Agent eval suite must test invocation and prompt behavior
Task Agent's own evaluation suite SHOULD include:
```text
whether it over-suggests tasks for direct requests;
whether it under-suggests tasks for high-fit saved-task requests;
whether it reuses templates instead of creating duplicates;
whether it uses Task Agent entrypoints correctly;
whether it designs valid graphs;
whether it routes context correctly;
whether it chooses appropriate module prompts;
whether it uses Prompt Advisor service only when appropriate;
whether it creates prompt-evaluation tasks only for reusable/high-value prompts;
whether it avoids Prompt Lab / Promptolution / DSPy product-surface fragmentation;
whether it respects DOC24 packet limits and TKP boundaries;
whether it degrades honestly when DOC11/OpenClaw/TKP/DOC72/Addenda A evidence is unavailable.
```
---
## §13 — DOC24 capability binding and runtime expansion
### §13.1 Purpose
Modules should declare intended capabilities, but DOC24 should resolve live tools/connectors/MCP/procedures/models and may expand at runtime when upstream prompts or outputs create new needs within policy.
### §13.2 Two-pass capability resolution
#### Pass 1 — design-time / run-start binding
At task save or run start, modules declare required/allowed/denied capabilities.
#### Pass 2 — runtime DOC24 expansion
When a module receives unexpected content or a new need, DOC24 may add relevant capabilities within the module's allowed envelope, emitting an expansion receipt.
### §13.3 ModuleCapabilityPolicy
```ts
export type ModuleCapabilityPolicy = {
mode: "auto" | "explicit" | "explicit_plus_auto_relevant" | "none"
required_capabilities: CapabilityRef[]
allowed_capabilities: CapabilityRef[]
denied_capabilities: CapabilityRef[]
preferred_procedures: ProcedureRef[]
allowed_mcp_servers: McpServerRef[]
connector_refs: ConnectorRef[]
runtime_expansion_policy: RuntimeExpansionPolicy
side_effect_policy: SideEffectPolicy
snapshot_required: boolean
schema_version: "1.0"
}
```
### §13.4 RuntimeExpansionPolicy
```ts
export type RuntimeExpansionPolicy = {
enabled: boolean
expansion_basis:
| "upstream_prompt_detected"
| "missing_required_capability"
| "failure_recovery"
| "doc24_relevance"
max_added_capabilities: number
capabilities_outside_allowlist_require_approval: boolean
external_side_effects_require_approval: boolean
emit_expansion_receipt: boolean
schema_version: "1.0"
}
```
### §13.5 RuntimeCapabilityExpansionReceipt
```ts
export type RuntimeCapabilityExpansionReceipt = {
receipt_id: string
task_id: string
run_id: string
module_id: string
activation_seq: number
detected_need: string
basis: RuntimeExpansionPolicy["expansion_basis"]
added_capabilities: CapabilityRef[]
blocked_capabilities: CapabilityRef[]
approval_required: boolean
approval_ref?: string
doc24_packet_generation_id: string
policy_snapshot_ref: StorageRef
created_at: string
schema_version: "1.0"
}
```
### §13.6 Example: PACER in legal module
```text
Module B was configured for legal research.
Module A sends: “Check the PACER docket for Marex filings.”
DOC24 detects PACER need.
If PACER is allowed but not mounted -> mount PACER and emit receipt.
If PACER is not allowed -> pause or ask approval.
If PACER is denied -> block and explain.
```
Module binding example:
```ts
// Illustrative binding fragment for Module B; other policy fields are omitted.
const moduleBCapabilityBinding = {
  required_capabilities: [
    "capability:doc25.retrieve_document_pages",
  ],
  optional_capabilities: [
    "capability:pacer.search_docket",
    "capability:pacer.retrieve_filing",
    "capability:legal_citation_check",
  ],
  denied_capabilities: [
    "capability:email.send",
    "capability:file.delete",
  ],
};
```
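The three PACER outcomes in this example can be sketched as a small decision function. The `CapabilityEnvelope` shape, the decision labels, and the rule that a denial takes precedence over an allowance are assumptions for illustration; capability IDs are shown as plain strings in place of `CapabilityRef`.

```ts
type ExpansionDecision =
  | "already_mounted"
  | "mount_and_emit_receipt"
  | "pause_for_approval"
  | "block_and_explain";

// ASSUMPTION: hypothetical flattened view of a module's capability envelope.
type CapabilityEnvelope = {
  allowed: string[];
  denied: string[];
  mounted: string[];
};

function decideExpansion(capabilityId: string, env: CapabilityEnvelope): ExpansionDecision {
  // ASSUMPTION: an explicit denial wins over any allowance.
  if (env.denied.indexOf(capabilityId) !== -1) return "block_and_explain";
  if (env.mounted.indexOf(capabilityId) !== -1) return "already_mounted";
  // Allowed but not mounted: mount within the envelope and emit a receipt.
  if (env.allowed.indexOf(capabilityId) !== -1) return "mount_and_emit_receipt";
  // Neither allowed nor denied: outside the envelope, so pause or ask approval.
  return "pause_for_approval";
}
```

For instance, with `capability:pacer.search_docket` in `allowed` and not yet in `mounted`, the function returns `"mount_and_emit_receipt"`, which corresponds to the first branch of the example.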
### §13.7 MCP posture
MCP servers are capability providers. They must register into DOC24 with stable capability IDs, health, auth, side-effect class, schema hash, and least-privilege tool-pack policies. DOC23 modules should not mount raw MCP tool sprawl directly.
---
## §13A — Task-module DOC24 context injection and surface/run scope isolation
### §13A.1 Purpose
DOC24 context injection into DOC23 task modules must make task modules smarter without creating context bleed from unrelated chats, unrelated active work, or stale user focus. A task run may occur later, recur on a schedule, trigger from email, trigger from a webhook, launch from a task-complete event, or run while the user has several unrelated chats open. Therefore a task module's DOC24 packet must be scoped to the **task run and its launch/trigger evidence**, not to a singular notion of what Elnor is currently working on.
This section governs DOC24 context packets for all task modules that receive model-facing instructions, tool/capability awareness, entity/memory cards, procedures, document references, library references, prior task memory, or policy constraints.
### §13A.2 No singular current-work-context assumption
Addenda B SHALL NOT assume there is one global "current work context" that can safely govern task-module injection. ELNOR may have multiple concurrent chats, rooms, task runs, panels, browser sessions, task editors, and background processes. A user may be discussing DOC73 in one chat, Marex in another chat, and have a scheduled docket-monitor task running for an unrelated matter. The scheduled task must not inherit the DOC73 or Marex chat context unless the task run is explicitly bound to one of those contexts.
Normative rule:
```text
There is no default global current context for DOC23 task modules.
Each task run has its own TaskRunScopeEnvelope.
DOC24 assembles module context from that run envelope, the task definition, the trigger/invocation payload, module inputs, explicit bindings, policies, and relevant durable memory.
```
DOC24's active-context / working-context machinery may help resolve a user utterance or launch surface, but it is not a default source of model-facing task context. DOC24 R3 already treats active context as optional, defeatable, and non-authoritative with respect to graph truth, memory truth, and live capability truth; DOC23 task modules must preserve that boundary.
### §13A.3 Task Context Isolation Invariant
A task module's DOC24 context packet SHALL be assembled from:
```text
- the TaskRunScopeEnvelope;
- the task's declared scope;
- run inputs and trigger payloads;
- module config and module capability/context policy;
- graph wiring and upstream outputs;
- blueprint/template bindings;
- explicit source/document/library bindings;
- task invocation directives;
- environment.config;
- policy-approved durable memory;
- prior task memory relevant to the task/run scope;
- and live DOC24 capability/tool/model availability.
```
The user's active chat, active work, active matter, active library, or currently focused project SHALL NOT be injected into a task module merely because it is currently salient to Elnor, recently used, or present in another surface.
Surface/chat context MAY be used only when one of the following is true:
```text
1. The task run was explicitly created or launched from that surface and the run envelope records that it inherits or selects that surface context.
2. The user manually selected the surface/chat/document/library/matter context for the run.
3. The task blueprint or task template declares an explicit context binding.
4. The trigger payload, source module, selected document, or run input resolves to the same entity, matter, work product, library, source set, or task template.
5. DOC24 resolves a high-confidence entity/matter/library match from task-owned evidence, not from recency alone.
6. The task is itself about that surface/system/spec/work context, such as a task to review DOC73, CSA, or a specified ELNOR spec file.
```
Default rule:
```text
Surface context is candidate evidence for scope resolution.
It is not automatic injected context.
```
### §13A.4 Scheduled, triggered, and background task rule
For scheduled, email-triggered, webhook-triggered, file-watcher-triggered, external-event, task-complete, and other background runs, ordinary chat context is **not applicable by default**. The strongest scope evidence is the trigger/run payload plus the saved task definition.
```text
Scheduled run:
Scope comes from saved task/template/blueprint, recurrence config, environment.config,
selected source sets, saved matter/library/document bindings, and prior run state.
It does not inherit whichever chat is active at the scheduled time.
Email-triggered run:
Scope comes from the email account, folder, sender, recipients, subject, body,
thread, attachments, source-classification rules, attachment ingestion results,
and task trigger config.
It does not inherit the user's current chat/work focus.
Webhook/file/external-event run:
Scope comes from the event payload, authenticated source, correlation key,
configured trigger mapping, and saved task bindings.
It does not inherit active chat context.
```
When a background task needs persistent matter/library/source context, that context must be saved in the task definition, blueprint, template, trigger mapping, invocation directive, environment.config, or source module configuration. It must not be recovered from whatever the user is doing at run time.
### §13A.5 Scoped injection does not mean starved injection
This section does not require minimal context. It requires correct context. DOC24 must not restrict task modules so tightly that they lose the benefits of ELNOR memory, preferences, tools, and procedures.
The following categories remain eligible even when unrelated surface context is excluded:
```text
1. Global hard constraints and safety rules.
2. Global or domain user preferences that are relevant to the task class.
3. Standing procedures or memory directives that apply to the task's entity, matter, domain, source, trigger, or side-effect class.
4. Task-template and module-preset semantic cards.
5. Prior task-run memory for the same task, template, matter, work product, source set, trigger family, or explicitly linked artifact.
6. Task-relevant tool/capability/procedure utility bundles.
7. Artifact/output intake policy and retention policy.
8. Source-grounding, re-read, citation, privilege, and outbound side-effect restrictions.
9. User-authored task blueprint goals, quality criteria, output preferences, and review gates.
10. Trigger-specific context, such as email thread context or recurrence-state context, when it is part of the TaskRunScopeEnvelope.
```
Example: if Will generally prefers an issue matrix plus revision plan for complaint review, that preference may be injected into a complaint-evaluation task even if several active chats are about DOC73, CSA, or prompt evaluation. But those unrelated chats themselves should not be injected unless the task run is bound to them.
### §13A.6 Profiles control categories, not relevance
Context injection profiles control which categories may be included; they do not decide which specific items are relevant.
```text
profile = "domain_aware"
means:
task-relevant entities, memories, preferences, procedures, and restrictions may be included.
does not mean:
inject Elnor's active chat/work context, recent DOC73 discussion, current CSA work,
or unrelated recent memories.
```
A profile is an upper bound on eligible component families. Task/run/module scope, trigger evidence, policy, and relevance still decide item selection.
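A minimal sketch of the two-stage selection described above: the profile gates component families first, and relevance then decides individual items. The names `ComponentFamily`, `ContextCandidate`, `PROFILE_FAMILIES`, and `selectCandidates` are hypothetical, the family list for `domain_aware` is an assumption, and the `relevant_to_task_scope` flag stands in for whatever scope/relevance resolution DOC24 actually performs.

```ts
// Illustrative sketch only; names and the profile table are assumptions.
type ComponentFamily =
  | "entity_cards"
  | "memory_excerpts"
  | "preferences"
  | "procedures"
  | "restrictions"
  | "surface_context";

interface ContextCandidate {
  ref: string;
  family: ComponentFamily;
  // Assumed to be computed elsewhere from task/run/module scope and trigger evidence.
  relevant_to_task_scope: boolean;
}

// Hypothetical table: the families each profile MAY include (an upper bound).
const PROFILE_FAMILIES: Record<string, ComponentFamily[]> = {
  domain_aware: ["entity_cards", "memory_excerpts", "preferences", "procedures", "restrictions"],
};

function selectCandidates(profile: string, candidates: ContextCandidate[]) {
  const eligible = new Set<ComponentFamily>(PROFILE_FAMILIES[profile] ?? []);
  const injected: ContextCandidate[] = [];
  const excluded: {
    ref: string;
    exclusion_reason: "profile_excludes_component" | "low_relevance";
  }[] = [];
  for (const c of candidates) {
    if (!eligible.has(c.family)) {
      // The profile never admits this family, regardless of relevance.
      excluded.push({ ref: c.ref, exclusion_reason: "profile_excludes_component" });
    } else if (!c.relevant_to_task_scope) {
      // The family is eligible, but this specific item is not relevant to the run.
      excluded.push({ ref: c.ref, exclusion_reason: "low_relevance" });
    } else {
      injected.push(c);
    }
  }
  return { injected, excluded };
}
```

The exclusion reasons reuse values from the `ExcludedContextCandidate` enum so that both stages leave an inspectable trace.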
### §13A.7 TaskRunScopeEnvelope
Each task run SHALL have a `TaskRunScopeEnvelope` computed at run start. DOC24 context packets for module activations derive from this envelope plus module-specific activation evidence.
```ts
TaskRunScopeEnvelope {
envelope_id: string
task_id: string
run_id: string
launch_surface:
| "chat"
| "task_ui"
| "task_canvas"
| "run_inspector"
| "schedule_trigger"
| "email_trigger"
| "webhook_trigger"
| "file_watcher_trigger"
| "external_event"
| "task_complete_trigger"
| "manual_api"
| "ec_system_job"
| "unknown"
launch_surface_ref?: StorageRef
surface_context_policy:
| "not_applicable" // background/triggered runs by default
| "none" // launch surface exists but context not inherited
| "explicit_inherit" // user or task receipt explicitly inherits surface context
| "explicit_selected" // user selected specific surface/context refs
| "trigger_bound" // trigger payload supplies the run's scope
| "template_bound" // template/blueprint supplies scope
| "forbidden" // policy forbids surface-context inheritance
inherited_surface_context_refs: StorageRef[]
selected_surface_context_refs: StorageRef[]
resolved_scope: {
entity_refs: EntityRef[]
matter_refs: EntityRef[]
work_product_refs: WorkProductRef[]
document_refs: DocumentRef[]
library_refs: LibraryRef[]
source_set_refs: StorageRef[]
task_template_refs: string[]
module_preset_refs: string[]
goal_refs: GoalRef[]
invocation_directive_refs: string[]
}
trigger_scope?: TriggerScopeSummary
schedule_scope?: ScheduledRunScopeSummary
email_scope?: EmailTriggerScopeSummary
policy_snapshot_ref: StorageRef
source_visibility_snapshot_ref?: StorageRef
created_at: string
schema_version: "1.0"
}
```
Required defaults:
```text
For launch_surface = schedule_trigger / email_trigger / webhook_trigger / file_watcher_trigger / external_event / task_complete_trigger / ec_system_job:
surface_context_policy defaults to "not_applicable", or to "trigger_bound" when the trigger payload supplies the run's scope.
inherited_surface_context_refs defaults to [].
For launch_surface = chat:
surface_context_policy defaults to "none" unless the user request, task receipt,
or TaskModeDecision explicitly chooses inheritance/selection.
```
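The required defaults can be sketched as a lookup on `launch_surface`. This is an assumption-laden illustration: `defaultSurfaceContextPolicy` and the `triggerSuppliesScope` flag are hypothetical, choosing between `"not_applicable"` and `"trigger_bound"` by that flag is inferred from the enum comments, and returning `"none"` for interactive surfaces other than `chat` is an assumption because this section states a default only for `chat`.

```ts
// Illustrative sketch only; the function and its second parameter are hypothetical.
type LaunchSurface =
  | "chat" | "task_ui" | "task_canvas" | "run_inspector"
  | "schedule_trigger" | "email_trigger" | "webhook_trigger"
  | "file_watcher_trigger" | "external_event" | "task_complete_trigger"
  | "manual_api" | "ec_system_job" | "unknown";

type SurfaceContextPolicy =
  | "not_applicable" | "none" | "explicit_inherit" | "explicit_selected"
  | "trigger_bound" | "template_bound" | "forbidden";

// Launch surfaces that the defaults above treat as background/triggered runs.
const BACKGROUND_SURFACES: ReadonlySet<LaunchSurface> = new Set<LaunchSurface>([
  "schedule_trigger", "email_trigger", "webhook_trigger", "file_watcher_trigger",
  "external_event", "task_complete_trigger", "ec_system_job",
]);

function defaultSurfaceContextPolicy(
  launch: LaunchSurface,
  triggerSuppliesScope: boolean,
): SurfaceContextPolicy {
  if (BACKGROUND_SURFACES.has(launch)) {
    return triggerSuppliesScope ? "trigger_bound" : "not_applicable";
  }
  // chat defaults to "none"; applying the same default to other surfaces is an assumption.
  return "none";
}
```

Explicit inheritance or selection would overwrite this default only when the user request, task receipt, or TaskModeDecision says so; that override path is not modeled here.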
### §13A.8 TriggerScopeSummary, ScheduledRunScopeSummary, and EmailTriggerScopeSummary
Triggered/background runs need source-specific scope records so DOC24 can assemble context without relying on active chat state.
```ts
TriggerScopeSummary {
trigger_module_id: string
trigger_kind:
| "email"
| "schedule"
| "webhook"
| "file_watcher"
| "discord"
| "forum_panel"
| "task_complete"
| "manual_api"
| "system_job"
correlation_key?: string
trigger_payload_ref?: StorageRef
resolved_entity_refs: EntityRef[]
resolved_document_refs: DocumentRef[]
resolved_library_refs: LibraryRef[]
source_classification_refs: StorageRef[]
confidence: number
schema_version: "1.0"
}
ScheduledRunScopeSummary {
schedule_id: string
recurrence_rule_ref: StorageRef
saved_scope_refs: StorageRef[]
last_run_scope_ref?: StorageRef
run_window_start: string
run_window_end?: string
carry_forward_scope_from_prior_run: boolean
schema_version: "1.0"
}
EmailTriggerScopeSummary {
email_account_ref: string
mailbox_or_folder_ref?: string
message_id: string
thread_id?: string
sender_ref?: EntityRef
recipient_refs: EntityRef[]
subject_summary: string
body_summary_ref?: StorageRef
attachment_refs: DocumentRef[]
doc25_ingestion_result_refs: StorageRef[]
source_classification_refs: StorageRef[]
new_case_or_matter_candidate_refs: EntityRef[]
schema_version: "1.0"
}
```
Email-triggered tasks that create or update matters, cases, work products, libraries, or task runs SHOULD record the trigger-derived entities as candidate or confirmed scope according to DOC72/DOC24/DOC1 governance, but they MUST NOT infer scope from unrelated concurrent chats.
### §13A.9 TaskModuleContextPacket
DOC24 SHALL produce a task-scoped packet for each model-facing task-module activation where DOC24 context is enabled.
```ts
TaskModuleContextPacket {
packet_id: string
task_id: string
run_id: string
module_id: string
module_type: string
activation_seq: number
run_scope_envelope_ref: StorageRef
assembly_scope: {
invocation_context: "task_module"
task_context_binding:
| "none"
| "explicit_task_scope"
| "inherited_from_launch_surface"
| "selected_surface_context"
| "trigger_resolved"
| "schedule_bound"
| "email_bound"
| "template_bound"
| "blueprint_bound"
| "manual_user_selected"
| "module_input_resolved"
| "library_bound"
| "system_context_task"
surface_context_used: boolean
surface_context_reason?: string
surface_context_refs: StorageRef[]
surface_context_policy: TaskRunScopeEnvelope["surface_context_policy"]
}
context_basis: TaskModuleContextBasis
injection_profile: TaskContextInjectionProfileRef
injected_layers: {
tools: ToolCapabilityCard[]
procedures: ProcedureCard[]
entity_cards: EntityCard[]
memory_excerpts: MemoryExcerpt[]
restrictions: RestrictionCard[]
preferences: PreferenceCard[]
documents: DocumentContextCard[]
libraries: LibraryContextCard[]
prior_task_runs: PriorTaskRunCard[]
specialist_agents: SpecialistAgentCard[]
output_artifact_policy: ArtifactPolicyCard[]
task_blueprint_context: BlueprintContextCard[]
task_template_context: TaskTemplateContextCard[]
prompt_or_evaluation_guidance: PromptEvaluationGuidanceCard[]
trigger_context: TriggerContextCard[]
recurrence_context: RecurrenceContextCard[]
}
excluded_candidates: ExcludedContextCandidate[]
policy_snapshot_ref: StorageRef
capability_snapshot_ref: StorageRef
source_refs: StorageRef[]
token_budget: {
requested_tokens: number
actual_tokens: number
max_tokens: number
trimmed_items: string[]
}
packet_generation_id: string
created_at: string
schema_version: "1.0"
}
```
### §13A.10 ExcludedContextCandidate
DOC24 SHALL record meaningful excluded candidates when context is considered but not injected. This is required for Run Inspector, Task Assessment, and debugging context omissions or context bleed.
```ts
ExcludedContextCandidate {
candidate_kind:
| "surface_context"
| "active_chat_context"
| "active_work_context"
| "memory_directive"
| "entity_card"
| "library"
| "prior_task_run"
| "procedure"
| "tool_capability"
| "document"
| "task_template"
| "module_preset"
candidate_ref: string
exclusion_reason:
| "not_task_bound"
| "not_run_bound"
| "surface_context_not_applicable"
| "low_relevance"
| "wrong_matter"
| "wrong_library"
| "wrong_task_type"
| "wrong_trigger_scope"
| "policy_blocked"
| "budget_trimmed"
| "profile_excludes_component"
| "module_config_denied"
| "incognito_or_memory_suppressed"
| "active_context_recency_only"
| "requires_user_selection"
| "source_visibility_blocked"
explanation?: string
schema_version: "1.0"
}
```
### §13A.11 TaskModuleContextBasis
DOC24 must assemble module context from the actual module purpose and activation inputs. It must not assume every module has a single `config.instruction` field.
```ts
TaskModuleContextBasis {
module_type: string
primary_instruction_summary: string | null
output_contract_summary: string | null
evaluation_or_review_basis?: string
data_input_summary?: string
context_input_summary?: string
instruction_input_summary?: string
file_input_summary?: string
trigger_payload_summary?: string
resolved_entity_refs: EntityRef[]
resolved_document_refs: DocumentRef[]
resolved_library_refs: LibraryRef[]
resolved_task_artifact_refs: TaskArtifactRef[]
side_effect_class?: SideEffectClass
schema_version: "1.0"
}
```
Required module-specific extraction examples:
```ts
function getTaskModuleContextBasis(module, activation): TaskModuleContextBasis {
switch (module.type) {
case "step.agent_task":
return {
module_type: module.type,
primary_instruction_summary: summarize(module.config.instruction),
output_contract_summary: summarize(module.config.output_contract),
data_input_summary: summarizePortInput(activation.data_in),
context_input_summary: summarizePortInput(activation.context_in),
instruction_input_summary: summarizePortInput(activation.instruction_in),
file_input_summary: summarizeFileInputs(activation.file_only_in),
trigger_payload_summary: summarizeTriggerPayload(activation.trigger_payload),
resolved_entity_refs: resolveEntitiesFromActivation(activation),
resolved_document_refs: resolveDocumentsFromActivation(activation),
resolved_library_refs: resolveLibrariesFromActivation(activation),
resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation),
schema_version: "1.0",
}
case "step.judge":
return {
module_type: module.type,
primary_instruction_summary: summarizeJudgeDimensions(module.config.dimensions),
output_contract_summary: "structured judge scores and audit trail",
evaluation_or_review_basis: summarize(module.config.evaluation_instruction),
data_input_summary: summarizePortInput(activation.candidate_in ?? activation.data_in),
context_input_summary: summarizePortInput(activation.comparison_bundle_in ?? activation.context_in),
resolved_entity_refs: resolveEntitiesFromActivation(activation),
resolved_document_refs: resolveDocumentsFromActivation(activation),
resolved_library_refs: resolveLibrariesFromActivation(activation),
resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation),
schema_version: "1.0",
}
case "step.agent_review_gate":
return {
module_type: module.type,
primary_instruction_summary: summarize(module.config.review_criteria),
output_contract_summary: "approve / revise / reject / feedback",
data_input_summary: summarizePortInput(activation.data_in),
context_input_summary: summarizePortInput(activation.context_in),
resolved_entity_refs: resolveEntitiesFromActivation(activation),
resolved_document_refs: resolveDocumentsFromActivation(activation),
resolved_library_refs: resolveLibrariesFromActivation(activation),
resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation),
schema_version: "1.0",
}
case "output.file":
return {
module_type: module.type,
primary_instruction_summary: summarize(
[module.config.naming_instructions, module.config.directory_instructions]
.filter(Boolean) // skip unset fields instead of concatenating "undefined"
.join("\n")
),
output_contract_summary: "file path, naming, overwrite policy, receipt",
data_input_summary: summarizePortInput(activation.data_in),
context_input_summary: summarizePortInput(activation.context_in),
side_effect_class: "file_write",
resolved_entity_refs: resolveEntitiesFromActivation(activation),
resolved_document_refs: resolveDocumentsFromActivation(activation),
resolved_library_refs: resolveLibrariesFromActivation(activation),
resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation),
schema_version: "1.0",
}
case "step.coding":
return {
module_type: module.type,
primary_instruction_summary: summarizeCodingAssignments(module.config.assignments),
output_contract_summary: "diffs, test results, changed files, receipt",
data_input_summary: summarizePortInput(activation.data_in),
context_input_summary: summarizePortInput(activation.context_in),
instruction_input_summary: summarizePortInput(activation.instruction_in),
resolved_entity_refs: resolveEntitiesFromActivation(activation),
resolved_document_refs: resolveDocumentsFromActivation(activation),
resolved_library_refs: resolveLibrariesFromActivation(activation),
resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation),
schema_version: "1.0",
}
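default:
// Illustrative fallback (not normative): other module types get a minimal basis
// so the function never returns undefined.
return {
module_type: module.type,
primary_instruction_summary: null,
output_contract_summary: null,
resolved_entity_refs: resolveEntitiesFromActivation(activation),
resolved_document_refs: resolveDocumentsFromActivation(activation),
resolved_library_refs: resolveLibrariesFromActivation(activation),
resolved_task_artifact_refs: resolveTaskArtifactsFromActivation(activation),
schema_version: "1.0",
}
}
}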
```
### §13A.12 Context-source priority cascade
DOC24 SHALL assemble task-module context using this priority cascade:
```text
1. Hard policy, security, privilege, side-effect, and source-visibility constraints.
2. TaskRunScopeEnvelope: launch/trigger type, selected scope, trigger payload, schedule/email/webhook/file/task-complete context.
3. Module config and module capability/context policy.
4. Runtime inputs: data_in, context_in, instruction_in, file inputs, trigger payloads, and resolved entity/document/library refs.
5. Task-level scope: Blueprint, template, invocation directive, environment.config, selected source modules, source sets, and declared matter/library bindings.
6. Prior task memory: same task/template/matter/work-product/source-set/trigger-family prior runs and linked artifacts.
7. Relevant global or domain user preferences, procedures, and memory directives.
8. Live DOC24 capability/tool/model/connector/MCP availability and utility bundles.
9. Launch-surface context only when explicitly inherited/selected or high-confidence bound by task-owned evidence.
```
The cascade is not a simple token priority list. Items in a lower category may outrank higher-category optional items when the module's task purpose requires them, but hard policy and explicit user exclusions always dominate. Launch-surface context is deliberately last because it is often unrelated to scheduled or background task work.
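One way to read the cascade, sketched under stated assumptions: hard policy and explicit user exclusions act as filters, items the module's purpose requires come first, and the cascade rank orders the rest. `CascadeCandidate` and `orderForAssembly` are hypothetical names, and the boolean flags stand in for decisions DOC24 would make elsewhere.

```ts
// Illustrative sketch only; the shape and flags are assumptions.
interface CascadeCandidate {
  ref: string;
  cascade_rank: number; // 1..9, following the numbered list above
  policy_blocked: boolean; // hard policy / visibility constraint applies
  user_excluded: boolean; // explicit user exclusion applies
  required_by_module_purpose: boolean;
}

function orderForAssembly(candidates: CascadeCandidate[]): CascadeCandidate[] {
  return candidates
    // Hard policy and explicit user exclusions always dominate.
    .filter((c) => !c.policy_blocked && !c.user_excluded)
    // filter() returned a new array, so sorting does not mutate the input.
    .sort((a, b) => {
      if (a.required_by_module_purpose !== b.required_by_module_purpose) {
        // A required lower-category item outranks an optional higher-category item.
        return a.required_by_module_purpose ? -1 : 1;
      }
      return a.cascade_rank - b.cascade_rank;
    });
}
```

Token budgeting would then trim from the end of this ordering; that step is omitted from the sketch.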
### §13A.13 Surface context use cases
Surface context may be used for:
```text
- disambiguating a user invocation such as “run the draft task for Marex”;
- selecting among otherwise ambiguous matching task templates;
- resolving “this complaint” when the task is launched from a chat or document surface;
- carrying explicit user-provided context into an Elnor-created task when the receipt says so;
- providing current system/spec context when the task is explicitly about that system/spec work;
- linking a task created inside a task editor, run inspector, library surface, document viewer, or chat to that surface when the user accepts the binding.
```
Surface context must not be used for:
```text
- filling ordinary task modules with unrelated current chat context;
- injecting a current CSA/DOC73 discussion into an unrelated legal drafting task;
- injecting one chat's matter into another chat's task;
- injecting a currently active matter into a scheduled task for a different matter;
- giving an email-triggered task unrelated chat context because the user happens to be chatting at the same time;
- bypassing explicit task source bindings;
- replacing DOC25/DOC73 source retrieval with conversational recollection.
```
### §13A.14 Runtime expansion from upstream content
If an upstream module produces new content that changes the next module's needs, DOC24 may expand context or capabilities for the next module only within the next module's policy envelope.
Example:
```text
Module A produces: “Check the PACER docket for Marex filings.”
Module B originally has a general legal-research profile.
DOC24 may mount PACER or inject PACER procedure guidance only if Module B allows it, the task policy permits it, and the expansion receipt records why.
```
This supports mid-process adaptation without letting DOC24 smuggle unrelated surface context into the graph.
### §13A.15 DOC73 / library context rule
DOC73 library/corpus context may be injected into a task module only if one of the following is true:
```text
1. The task has an explicit library/corpus binding.
2. A source module selected documents from that library.
3. The Task Blueprint declares the library as a source.
4. The trigger payload resolves to a library-scoped document/work product.
5. The user invoked the task from a library surface and the run envelope records selected/inherited library context.
6. A module capability binding requires corpus/library retrieval.
7. Task Agent proposed the library and the user accepted it.
8. The task is a declared broad-scope library audit/search/comparison task.
```
Otherwise, a current DOC73 conversation or library-focused chat must not leak into unrelated tasks. DOC73 libraries are scoped knowledge regions in the graph, not ambient overlays on every task.
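The any-of rule above can be sketched as a check for at least one recorded binding basis per library. This is a hedged illustration: `LibraryBindingBasis`, `LibraryExclusion`, and `checkLibraryInjection` are hypothetical names, the basis labels paraphrase the eight conditions, and using `"not_task_bound"` as the exclusion reason is an assumption drawn from the `ExcludedContextCandidate` enum.

```ts
// Illustrative sketch only; names and basis labels are assumptions.
type LibraryBindingBasis =
  | "explicit_task_binding"
  | "source_module_selection"
  | "blueprint_declared_source"
  | "trigger_payload_resolved"
  | "library_surface_recorded_in_envelope"
  | "module_capability_requires_retrieval"
  | "task_agent_proposal_accepted"
  | "declared_broad_scope_task";

interface LibraryExclusion {
  candidate_kind: "library";
  candidate_ref: string;
  exclusion_reason: "not_task_bound";
  schema_version: "1.0";
}

function checkLibraryInjection(
  libraryRef: string,
  recordedBases: ReadonlyMap<string, LibraryBindingBasis[]>,
): { allowed: true; bases: LibraryBindingBasis[] } | { allowed: false; exclusion: LibraryExclusion } {
  const bases = recordedBases.get(libraryRef) ?? [];
  // Any single recorded basis is enough to make the library eligible.
  if (bases.length > 0) return { allowed: true, bases };
  // No basis: record the exclusion rather than dropping the candidate silently.
  return {
    allowed: false,
    exclusion: {
      candidate_kind: "library",
      candidate_ref: libraryRef,
      exclusion_reason: "not_task_bound",
      schema_version: "1.0",
    },
  };
}
```

Eligibility here only means the library may be considered; policy, visibility, and budget checks would still apply afterwards.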
### §13A.16 DOC25 document context rule
Task modules should normally receive document references, summaries, page markers, and retrieval tools rather than full document bodies. Full text or page excerpts may be injected only when:
```text
- the module's input contract requires it;
- the selected document/page set fits budget;
- source-visibility policy permits it;
- the task or module explicitly selected the source;
- the trigger/input payload supplied the source;
- and DOC25/DOC24 marker governance is followed.
```
Substantial produced, downloaded, or received documents route through DOC25 ingestion under §11 / §18. DOC25 remains the processor; DOC72/DOC73/task artifacts/work products remain the destination surfaces depending on policy.
### §13A.17 Global preferences, restrictions, and standing orders
This isolation model does not block relevant global memory. It blocks unrelated surface context.
Examples of globally eligible items:
```text
- “Ask before sending external emails.”
- “Use issue matrices for complaint evaluation.”
- “For complex legal analysis, consider second-pass deepening.”
- “Do not rely on memory for source-document quotes; re-read source documents.”
- “Never use personal email for work filings.”
```
Examples of items that are ineligible unless task-bound:
```text
- “Currently reviewing DOC73 V1.5.1.”
- “Currently drafting CSA architecture notes.”
- “Current chat thread about prompt lab naming.”
- “Recently opened unrelated Marex run inspector.”
```
### §13A.18 Cross-scope and broad-scope tasks
Some tasks intentionally require broad context. The isolation invariant must not prevent such tasks from working.
Broad-scope context is allowed when the task explicitly asks for it, for example:
```text
- “Compare all tasks that touch Marex.”
- “Find every library that mentions loss causation.”
- “Audit my saved tasks and suggest improvements.”
- “Summarize recent work across DOC23, DOC24, and DOC73.”
- “Search all prior complaint-review tasks for reusable patterns.”
```
For broad-scope tasks, DOC24 must still emit a packet receipt showing the declared scope, source boundaries, visibility exclusions, and policy filters.
### §13A.19 Module-family defaults
Recommended defaults:
| Module family | DOC24 packet posture |
|---|---|
| Trigger modules | No LLM context packet; trigger payload creates scope evidence and TaskRunScopeEnvelope fields. |
| Schedule trigger | Scope comes from recurrence config, saved task bindings, environment.config, and prior scheduled-run state; no active chat inheritance. |
| Email trigger | Scope comes from email account/folder/thread/body/attachments/source rules/DOC25 ingestion; no active chat inheritance. |
| Source modules | Source selection and metadata only; no unrelated memory. |
| Agent Task / Red Team / Agent Review | Task-scoped entities, documents, procedures, preferences, tools, and restrictions according to module policy. |
| Judge / Claim Extractor / Experiment | Evaluator-mode context: rubric, evidence policy, source scope, candidate/comparison bundles, not broad active context. |
| Utility modules | Usually no DOC24 packet unless they call an LLM or need capability/context policy. |
| Output modules | Delivery/naming/formatting/recipient/policy context only; no broad task-substance injection unless required for composition. |
| Coding module | ACP/session/workspace context, explicit task files, coding policy, and module instructions; no unrelated chat context. |
| Monitor / Environment system modules | May publish task scope, environment defaults, and context policy, but do not themselves authorize unrelated context injection. |
### §13A.20 Packet receipts and Run Inspector display
Each model-facing module activation that receives DOC24 context SHALL link to its `TaskModuleContextPacket` in:
```text
- EffectivePromptSnapshot;
- TaskTraceSpan;
- Run Inspector context/prompt detail view;
- Task Assessment memory/context findings;
- Task Agent operational lens.
```
Run Inspector should show the run scope, the injected items, and the excluded candidates, for example:
```text
Run scope:
- Launch: email_trigger
- Email account: Firm Outlook
- Thread: Marex new filing alert
- Attachments: 2 filings routed through DOC25
- Surface context: not applicable
Injected:
- 2 entity cards: Marex, Draft Complaint work product
- 1 preference: issue matrix output preference
- 3 tools: DOC25 retrieve pages, citation checker, PACER docket search
- 1 source set: selected complaint and exhibits
Excluded:
- chat context “DOC73 spec review” — surface context not applicable
- library “CSA research” — wrong library
- prior task “Narayanan docket monitor” — low relevance
```
### §13A.21 Validation and warnings
Addenda B introduces validation and warning codes for task-module context packets:
```text
validation.task_run_scope_envelope_required
validation.task_module_context_packet_required
validation.task_module_context_surface_context_unjustified
validation.task_module_context_policy_snapshot_missing
validation.task_module_context_source_visibility_violation
validation.task_module_context_profile_overrides_denied_capability
validation.task_module_context_trigger_scope_missing
warning.task_module_context_surface_context_excluded
warning.task_module_context_broad_scope_declared
warning.task_module_context_relevant_memory_trimmed
warning.task_module_context_library_candidate_excluded
warning.task_module_context_prior_task_candidate_excluded
warning.task_module_context_trigger_scope_low_confidence
```
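A minimal sketch of how two of these codes might be raised. The conditions are assumptions, not normative rules: the sketch flags surface context as unjustified when it is used under a policy that does not inherit it (`"not_applicable"`, `"none"`, `"forbidden"`) or when no reason is recorded. `PacketForValidation` is a hypothetical reduced view of `TaskModuleContextPacket`, and `validateContextPacket` is a hypothetical name.

```ts
// Illustrative sketch only; the triggering conditions are assumptions.
type SurfaceContextPolicy =
  | "not_applicable" | "none" | "explicit_inherit" | "explicit_selected"
  | "trigger_bound" | "template_bound" | "forbidden";

interface PacketForValidation {
  assembly_scope: {
    surface_context_used: boolean;
    surface_context_reason?: string;
    surface_context_policy: SurfaceContextPolicy;
  };
  policy_snapshot_ref?: string;
}

// Policies under which surface context is, by definition, not inherited.
const NON_INHERITING: ReadonlySet<SurfaceContextPolicy> = new Set<SurfaceContextPolicy>([
  "not_applicable", "none", "forbidden",
]);

function validateContextPacket(packet: PacketForValidation): string[] {
  const codes: string[] = [];
  if (!packet.policy_snapshot_ref) {
    codes.push("validation.task_module_context_policy_snapshot_missing");
  }
  const scope = packet.assembly_scope;
  const unjustified =
    scope.surface_context_used &&
    (NON_INHERITING.has(scope.surface_context_policy) || !scope.surface_context_reason);
  if (unjustified) {
    codes.push("validation.task_module_context_surface_context_unjustified");
  }
  return codes;
}
```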
### §13A.22 Relationship to Task Agent and TKP
Task Agent may inspect `TaskRunScopeEnvelope` and `TaskModuleContextPacket` records to answer why a module did or did not receive certain context. TKP may include decision cards explaining context-routing strategy, but TKP is not injected into ordinary modules unless the module is in a task-design/review/evaluation context requiring Task Agent knowledge.
Task Agent must not treat packet omissions as errors merely because context exists elsewhere. The question is whether the context was run-bound, task-bound, module-relevant, policy-allowed, and budget-appropriate.
---
## §14 — Durable execution readiness
### §14.1 Purpose
DOC23 already has checkpoints and run state. Addenda B strengthens the semantic contract for long-running business-grade workflows.
### §14.2 DurableTaskExecutionContract
```ts
DurableTaskExecutionContract {
checkpoint_strategy:
| "module_boundary"
| "llm_call_boundary"
| "tool_call_boundary"
replay_policy:
| "resume_from_checkpoint"
| "replay_with_cached_side_effects"
side_effect_idempotency_key_required: boolean
pending_human_signal_policy: {
durable_wait: true
timeout_policy: "none" | "deadline" | "business_hours"
decision_log_required: true
}
crash_recovery_policy: {
resume_incomplete_modules: boolean
never_repeat_committed_side_effects: boolean
require_policy_recheck_on_resume: boolean
}
schema_version: "1.0"
}
```
### §14.3 Human gates and holds
Human review, hold, delay, and signal mechanics should remain graph-visible. Task Agent may explain or propose them, but must not create hidden waits.
### §14.4 Side-effect idempotency
Any email send, file write, calendar write, PACER download, webhook POST, task output send, or other side-effecting operation should carry an idempotency key and policy snapshot where supported.
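A minimal sketch of the idempotency idea, assuming a key derived from run, module, activation, operation, and target. `SideEffectRequest`, `idempotencyKey`, and `SideEffectLedger` are hypothetical names, the key format is an assumption, and the in-memory `Map` stands in for whatever durable store a real runtime would need so the ledger survives a crash.

```ts
// Illustrative sketch only; a real ledger must be durable, not in-memory.
interface SideEffectRequest {
  run_id: string;
  module_id: string;
  activation_seq: number;
  operation: string; // e.g. "email.send"
  target: string; // e.g. recipient or file path
}

function idempotencyKey(r: SideEffectRequest): string {
  // Assumption: these five fields identify one logical side effect.
  return [r.run_id, r.module_id, r.activation_seq, r.operation, r.target].join("|");
}

class SideEffectLedger {
  private committed = new Map<string, string>(); // idempotency key -> receipt ref

  execute(r: SideEffectRequest, perform: () => string): { receipt: string; replayed: boolean } {
    const key = idempotencyKey(r);
    const prior = this.committed.get(key);
    // A committed side effect is never repeated; the cached receipt is returned.
    if (prior !== undefined) return { receipt: prior, replayed: true };
    const receipt = perform();
    this.committed.set(key, receipt);
    return { receipt, replayed: false };
  }
}

// Usage: the same request is executed twice, as it might be after a resume.
let sendCount = 0;
const ledger = new SideEffectLedger();
const request: SideEffectRequest = {
  run_id: "run-1", module_id: "output.email", activation_seq: 1,
  operation: "email.send", target: "reviewer@example.com",
};
const firstAttempt = ledger.execute(request, () => { sendCount += 1; return "receipt:email:1"; });
const afterResume = ledger.execute(request, () => { sendCount += 1; return "receipt:email:2"; });
```

This corresponds to the `replay_with_cached_side_effects` replay policy and the `never_repeat_committed_side_effects` recovery flag in §14.2: the second call returns the first receipt without invoking the send again.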
---
## §15 — Preflight Report
### §15.1 Purpose
Preflight is not a separate simulator that duplicates existing checks. It is the orchestrated report that unifies validation, test/simulation, capability readiness, policy checks, artifact policy preview, cost estimates, context readiness, and checkpoint coverage.
### §15.2 TaskPreflightReport
```ts
TaskPreflightReport {
report_id: string
task_id?: string
blueprint_id?: string
graph_version?: number
blueprint_version?: string
static_validation: TaskGraphValidationResult
input_readiness: InputReadinessReport
capability_readiness: CapabilityReadinessReport
doc24_context_preview: ContextPreviewReport
policy_and_side_effects: PolicyRiskReport
artifact_intake_preview: ArtifactIntakePreview
cost_estimate: CostEstimate
duration_estimate: DurationEstimate
checkpoint_coverage: CheckpointCoverageReport
test_results: ModuleTestResult[]
simulation_result?: TaskSimulationSummary
blocking_issues: ValidationIssue[]
warnings: ValidationIssue[]
recommended_fixes: TaskFixProposal[]
generated_at: string
schema_version: "1.0"
}
```
### §15.3 User-facing preflight example
```text
Preflight Result: Warning
Ready:
- Graph validates.
- Required source documents selected.
- DOC25 retrieval available.
Warnings:
- PACER requested by module 3 but not allowed in module policy.
- Final email output is configured but no human approval gate exists.
- Estimated cost is $18-$32, above usual threshold.
- Damages memo output is keep-with-run, but similar outputs are often reused.
Recommended fixes:
[Allow PACER for module 3]
[Add human review before email]
[Set cost cap to $25]
[Make damages memo findable with summary]
```
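The headline in the example above could be derived from the report's issue lists, roughly as follows. This is a sketch under assumptions: only the "Warning" label appears in this section, so "Blocked" and "Ready" are hypothetical labels, and `PreflightIssueLists`, `preflightStatus`, and `formatPreflightHeader` are hypothetical names.

```ts
// Illustrative sketch only; "Blocked" and "Ready" are assumed labels.
type PreflightStatus = "Blocked" | "Warning" | "Ready";

// Reduced view of TaskPreflightReport: only the two issue lists matter here.
interface PreflightIssueLists {
  blocking_issues: unknown[];
  warnings: unknown[];
}

function preflightStatus(report: PreflightIssueLists): PreflightStatus {
  // Any blocking issue outranks warnings.
  if (report.blocking_issues.length > 0) return "Blocked";
  if (report.warnings.length > 0) return "Warning";
  return "Ready";
}

function formatPreflightHeader(report: PreflightIssueLists): string {
  return "Preflight Result: " + preflightStatus(report);
}
```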
---
## §16 — Task Assessment
### §16.1 Purpose
Task Assessment is the unified post-run and on-demand evaluation surface for task quality, failures, improvements, substantive observations, graph/config issues, learning signals, and template/preset recommendations.
It replaces the narrower “failure playbook” concept.
### §16.2 TaskAssessment schema
```ts
TaskAssessment {
assessment_id: string
task_id: string
run_id?: string
blueprint_id?: string
graph_version: number
template_id?: string
outcome_summary: string
execution_findings: ExecutionFinding[]
quality_findings: QualityFinding[]
capability_findings: CapabilityFinding[]
memory_context_findings: MemoryContextFinding[]
artifact_findings: ArtifactFinding[]
cost_latency_findings: CostLatencyFinding[]
user_feedback_findings: UserFeedbackFinding[]
substantive_findings: SubstantiveFinding[]
template_selection_findings: TemplateSelectionFinding[]
preset_findings: PresetFinding[]
design_learning_findings: DesignLearningFinding[]
recommended_actions: TaskAssessmentAction[]
graph_diff_proposals: GraphDiff[]
blueprint_diff_proposals: BlueprintDiff[]
module_config_patch_proposals: ModuleConfigPatch[]
capability_binding_patch_proposals: CapabilityBindingPatch[]
prompt_or_preset_patch_proposals: PromptPatch[]
artifact_policy_patch_proposals: ArtifactPolicyPatch[]
template_update_proposal_refs: string[]
design_improvement_proposal_refs: string[]
report_ref: StorageRef
created_at: string
schema_version: "1.0"
}
```
### §16.3 Inputs to Task Assessment
Task Assessment consumes:
- TaskTraceSpan tree;
- run logs and checkpoints;
- artifact index;
- prompt/context snapshots;
- capability snapshots;
- policy decisions;
- sub-agent traces;
- DOC72 TaskRun memory;
- DOC25 ingestion results;
- DOC73 library ingestion results;
- Judge/Experiment/Claim Extractor results;
- Addenda A eval traces;
- DOC8 friction events;
- BDSM utility bundles;
- user feedback;
- manual edits after output;
- output usage/open/bookmark/downstream-citation signals.
### §16.4 User-facing assessment example
```text
Task Assessment — Marex Draft Complaint Run
Outcome:
Completed. Final complaint generated. Red-team found 6 issues.
What went well:
- Fact extraction was strong.
- Damages memo was useful and reused downstream.
- Red-team caught missing scienter allegations.
Problems:
- Complaint drafter did not receive full damages memo.
- PACER capability was mounted but unused.
- Red-team judge used summarized source docs; confidence lower than expected.
- Final output should have been saved as work product/document.
Recommended changes:
1. Wire damages memo as explicit context into complaint drafter.
2. Make damages memo findable with summary by default.
3. Remove PACER from this task unless docket entries are requested.
4. Add human review after red-team revision.
5. Save final complaint as DOC25 document/work_product.
[Apply selected changes] [Generate graph diff] [Save template update proposal]
```
### §16.5 Assessment-generated learning
Task Assessment may emit:
- TaskDesignLearningSignal;
- TemplateUpdateProposal;
- ModulePresetSemanticCard update candidate;
- TaskDesignCase candidate;
- TaskDesignPatternCard update candidate;
- CapabilityUseObservation;
- ArtifactCandidateRule update candidate;
- QuestionPolicy update candidate;
- TaskAgentInstructionCandidate.
Material changes remain proposal-gated.
---
## §16A — Task review and portfolio assessment
### §16A.1 Ask Task Agent in task canvas
Task editor UI SHOULD expose:
```text
Ask Task Agent
- Explain this graph
- Review this task
- Suggest missing steps
- Check wiring
- Check context routing
- Check capability bindings
- Check memory/artifact policy
- Compare to template
- Run preflight
- Improve selected prompt
- Test selected prompt variants
```
This is a side panel / assistant affordance, not a DOC23 module.
### §16A.2 Portfolio-level review
User command:
```text
Elnor, look at my tasks and suggest improvements.
```
Task Agent may perform a portfolio assessment.
```ts
export type TaskPortfolioAssessment = {
assessment_id: string;
task_scope:
| "all_tasks"
| "active_tasks"
| "selected_tasks"
| "template_library"
| "recent_failed_runs"
| "prompt_quality"
| "task_invocation_policy";
goals_considered: GoalRef[];
tasks_reviewed: string[];
templates_reviewed: string[];
module_presets_reviewed: string[];
findings: TaskPortfolioFinding[];
recommended_actions: TaskPortfolioRecommendation[];
high_value_candidates: string[];
low_value_or_duplicate_tasks: string[];
stale_templates: string[];
missing_blueprints: string[];
capability_binding_issues: string[];
artifact_policy_issues: string[];
evaluation_coverage_issues: string[];
prompt_quality_issues: string[];
report_ref: StorageRef;
schema_version: "1.0";
};
```
### §16A.3 Optional goal/stakes intake
For portfolio review or complex task creation, Task Agent MAY ask high-value questions such as:
```text
What are your goals for these tasks?
Are these high-stakes or convenience workflows?
Do you care more about speed, quality, auditability, cost, or reuse?
Which tasks do you actually use?
Do you want more automation or more review gates?
```
BDSM/DOC8 question utility should learn whether these questions are useful or annoying by context class.
---
## §17 — Evaluation, datasets, and promotion safety
### §17.1 Purpose
Self-improvement of tasks, prompts, templates, presets, and Task Agent behavior requires eval datasets, traces, promotion ledgers, rollback, and drift detection.
### §17.2 TaskEvalDataset
```ts
TaskEvalDataset {
dataset_id: string
source:
| "manual"
| "production_trace"
| "experiment"
| "judge_failure"
| "user_label"
| "task_creation_eval"
| "task_assessment"
examples: TaskEvalExampleRef[]
task_id?: string
module_id?: string
template_id?: string
frozen_at: string
input_fingerprint_hash: string
schema_version: "1.0"
}
```
### §17.3 TaskEvalTrace
```ts
TaskEvalTrace {
eval_trace_id: string
run_id?: string
design_session_id?: string
module_activation_ref?: string
prompt_snapshot_ref?: StorageRef
output_ref?: StorageRef
graph_ref?: StorageRef
blueprint_ref?: StorageRef
judge_score_refs: string[]
human_label_refs: string[]
capability_snapshot_ref?: StorageRef
policy_snapshot_ref?: StorageRef
schema_version: "1.0"
}
```
### §17.4 PromotionLedgerEntry
```ts
PromotionLedgerEntry {
promotion_id: string
target_kind:
| "module_instruction"
| "task_template"
| "module_preset"
| "task_blueprint"
| "procedure_binding"
| "capability_binding"
| "task_agent_instruction"
| "pattern_card"
| "semantic_card"
prior_hash: string
candidate_hash: string
validation_dataset_id: string
validation_score_summary: object
approved_by: "user" | "architect" | "policy_auto" | "experiment"
rollback_ref: StorageRef
post_promotion_monitor_id: string
created_at: string
schema_version: "1.0"
}
```
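As an illustration of the promotion-safety intent in §17.1, the sketch below checks that a ledger entry carries the fields needed for validation, rollback, and post-promotion monitoring. It is a sketch under stated assumptions: `StorageRef` is modeled as a plain string, the entry type is trimmed to the fields checked, and `checkPromotionEntry` is a hypothetical helper rather than a specified validator.

```ts
// Assumption: StorageRef is modeled as an opaque string for this sketch.
type StorageRef = string;

// Trimmed copy of PromotionLedgerEntry with only the fields checked below.
type PromotionLedgerEntryLite = {
  promotion_id: string;
  prior_hash: string;
  candidate_hash: string;
  validation_dataset_id: string;
  approved_by: "user" | "architect" | "policy_auto" | "experiment";
  rollback_ref: StorageRef;
  post_promotion_monitor_id: string;
};

// Returns a list of problems; an empty list means the entry looks complete.
function checkPromotionEntry(entry: PromotionLedgerEntryLite): string[] {
  const problems: string[] = [];
  if (entry.prior_hash === entry.candidate_hash) {
    problems.push("candidate_hash equals prior_hash; nothing to promote");
  }
  if (entry.validation_dataset_id.trim() === "") {
    problems.push("missing validation_dataset_id");
  }
  if (entry.rollback_ref.trim() === "") {
    problems.push("missing rollback_ref");
  }
  if (entry.post_promotion_monitor_id.trim() === "") {
    problems.push("missing post_promotion_monitor_id");
  }
  return problems;
}
```

A caller could refuse to write the ledger entry whenever the returned list is non-empty.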
### §17.5 R5/R6 alignment
This addendum should align with Addenda A's R5/R6 eval/optimization substrate rather than creating a parallel prompt optimization system. Task-specific datasets and promotion entries should use the same owner-doc promotion/rollback discipline where possible.
---
## §17A — Unified prompt improvement and evaluation
### §17A.1 Product rule: one prompt-help capability
There should be one user-facing prompt-help concept:
```text
Improve Prompt
```
The system may offer depth levels:
```ts
export type PromptImprovementMode =
| "quick_check"
| "rewrite"
| "generate_variants"
| "test_variants"
| "optimize_and_propose_promotion";
```
The user SHOULD NOT have to choose among Prompt Advisor, Prompt Lab, Promptolution, DSPy, Experiment, or Judge as product concepts.
### §17A.2 No separate primary Prompt Lab UI
A separate user-facing Prompt Lab is not required in the initial system. Prompt Lab, if retained from DOC17, is a backend/offline evaluation lane for reusable prompt artifacts, not the main user workflow.
Primary access points:
```text
Chat composer: Improve Prompt
Prompt recipe editor: Improve / Rewrite / Test
Task module prompt editor: Ask Task Agent -> Improve/Test Prompt
Run Inspector: Prompt quality findings
Task Assessment: Prompt findings and update proposals
Prompt artifact/library page: optional management/history surface only
```
### §17A.3 DOC17 Prompt Advisor as service
DOC17 Prompt Advisor remains useful as a lightweight service:
```ts
prompt_advisor.improve()
prompt_advisor.rewrite()
```
It provides:
```text
deterministic prompt-gap analysis;
optional AI rewrite through controlled rewrite template;
feedback events for viewed/requested/accepted/dismissed/edited rewrites;
suggestions for prompt recipes or overlays where relevant.
```
It is not a separate agent and not the task-aware prompt orchestrator.
User-facing quick-flow behavior:
```text
Improve Prompt:
- run deterministic gap analysis;
- show gaps/hints/suggested fixes;
- optionally offer Rewrite;
- if rewritten, show what changed;
- user may Use, Dismiss, Edit, or Save as Prompt Recipe.
```
Prompt Advisor feedback events (`viewed`, `rewrite_requested`, `rewrite_accepted`, `rewrite_dismissed`, `accepted_then_edited`, `saved_as_recipe`) feed the shared prompt-learning substrate and MUST NOT by themselves mutate active prompts.
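One way to honor the rule above is to keep feedback recording structurally separate from prompt storage. The sketch below assumes an append-only in-memory log as a stand-in for the shared prompt-learning substrate; `AdvisorFeedbackEvent`, `recordAdvisorFeedback`, and the sample prompt store are hypothetical names used only for illustration.

```ts
// Event kinds copied from the feedback-event list above.
type AdvisorFeedbackKind =
  | "viewed"
  | "rewrite_requested"
  | "rewrite_accepted"
  | "rewrite_dismissed"
  | "accepted_then_edited"
  | "saved_as_recipe";

// Hypothetical event record; field names are assumptions.
type AdvisorFeedbackEvent = {
  kind: AdvisorFeedbackKind;
  prompt_hash: string;
  at: string;
};

// Appends to the log and returns a new array. The function never receives
// the active-prompt store, so it has no way to rewrite an active prompt.
function recordAdvisorFeedback(
  log: readonly AdvisorFeedbackEvent[],
  event: AdvisorFeedbackEvent
): AdvisorFeedbackEvent[] {
  return log.concat([event]);
}

// Hypothetical active-prompt store, frozen to make accidental writes fail loudly.
const activePrompts: Readonly<Record<string, string>> = Object.freeze({
  "recipe-1": "Summarize the filing in five bullets.",
});

const feedbackLog = recordAdvisorFeedback([], {
  kind: "rewrite_accepted",
  prompt_hash: "hash-of-recipe-1",
  at: "2026-05-17T00:00:00Z",
});
```

Any change to an active prompt would then have to arrive through a separate, approval-gated proposal path rather than through the feedback log.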
### §17A.4 Task Agent as task-aware prompt orchestrator
Task Agent orchestrates prompt improvement when the prompt belongs to:
```text
DOC23 task module instruction
DOC23 judge rubric
DOC23 red-team prompt
DOC23 output contract
DOC23 task template
DOC23 module preset
prompt evaluation task
reusable prompt being converted into module preset/template component
```
Task Agent may call DOC17 Prompt Advisor service for quick gap analysis/rewrite, but Task Agent owns task-context interpretation, downstream compatibility analysis, graph-aware recommendations, test-job creation, and task/template/preset update proposals.
### §17A.5 Prompt surfaces
```ts
export type PromptSurface =
| "chat_composer"
| "prompt_recipe"
| "overlay"
| "room_role_prompt"
| "red_team_prompt"
| "doc23_module_instruction"
| "doc23_judge_rubric"
| "doc23_output_contract"
| "task_template_prompt"
| "module_preset_prompt";
```
### §17A.6 PromptImprovementRequest
```ts
export type PromptImprovementRequest = {
request_id: string;
source_surface: PromptSurface;
prompt_ref?: StorageRef;
prompt_text_inline?: string;
requested_mode: PromptImprovementMode;
context_packet?: {
task_id?: string;
task_template_id?: string;
module_id?: string;
module_type?: string;
blueprint_step_id?: string;
downstream_consumers?: string[];
expected_output_contract?: string;
judge_or_eval_refs?: string[];
};
orchestrator:
| "doc17_prompt_advisor_service"
| "task_agent"
| "doc23_prompt_evaluation_task";
schema_version: "1.0";
};
```
Routing rules:
```text
chat_composer + quick_check/rewrite:
DOC17 Prompt Advisor service or direct Elnor reasoning.
prompt_recipe + quick_check/rewrite:
DOC17 Prompt Advisor service.
doc23 module/judge/red-team/task-template prompt:
Task Agent orchestrates; may call DOC17 Prompt Advisor service.
generate_variants/test_variants/optimize:
Task Agent orchestrates tracked prompt-evaluation task/job.
```
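The routing rules above can be read as a small decision function. The following is a minimal sketch, not a specified router: `routePromptImprovement` is a hypothetical name, the surfaces the rules do not mention (`overlay`, `room_role_prompt`) are assumed to fall through to the Prompt Advisor service, and the "direct Elnor reasoning" option for chat is omitted because it is not a value of the `orchestrator` field.

```ts
type PromptImprovementMode =
  | "quick_check"
  | "rewrite"
  | "generate_variants"
  | "test_variants"
  | "optimize_and_propose_promotion";

type PromptSurface =
  | "chat_composer"
  | "prompt_recipe"
  | "overlay"
  | "room_role_prompt"
  | "red_team_prompt"
  | "doc23_module_instruction"
  | "doc23_judge_rubric"
  | "doc23_output_contract"
  | "task_template_prompt"
  | "module_preset_prompt";

type Orchestrator =
  | "doc17_prompt_advisor_service"
  | "task_agent"
  | "doc23_prompt_evaluation_task";

// Surfaces that §17A.4 assigns to Task Agent orchestration.
const TASK_AGENT_SURFACES: PromptSurface[] = [
  "red_team_prompt",
  "doc23_module_instruction",
  "doc23_judge_rubric",
  "doc23_output_contract",
  "task_template_prompt",
  "module_preset_prompt",
];

function routePromptImprovement(
  surface: PromptSurface,
  mode: PromptImprovementMode
): Orchestrator {
  // Deep modes always go through Task Agent, which creates the tracked job.
  if (mode !== "quick_check" && mode !== "rewrite") return "task_agent";
  // Task-owned prompts are orchestrated by Task Agent even for quick help.
  if (TASK_AGENT_SURFACES.indexOf(surface) !== -1) return "task_agent";
  // Assumption: every remaining surface uses the lightweight service.
  return "doc17_prompt_advisor_service";
}
```

In this reading, Task Agent remains free to call the DOC17 service internally; the function only decides who owns the request.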
### §17A.7 Prompt artifact taxonomy
```text
Prompt Recipe:
DOC17-owned reusable prompt text.
Module Prompt:
DOC23-owned instruction text attached to a DOC23 module.
Module Preset Prompt:
Prompt included in DOC23 module preset with config/model/tools/output policy.
Tested Prompt:
Not a separate prompt type; a prompt artifact with evaluation evidence.
Task Template Prompt:
Prompt embedded in a reusable DOC23 task template.
```
Lifecycle:
```text
free prompt
-> prompt recipe
-> tested prompt recipe
-> module prompt
-> module preset
-> task template component
```
### §17A.8 TaskPromptArtifact
```ts
export type TaskPromptArtifact = {
prompt_artifact_id: string;
owner_surface:
| "doc23_task_module"
| "doc23_task_template"
| "doc23_module_preset"
| "doc17_prompt_recipe"
| "doc17_overlay"
| "doc12_room_role"
| "doc14_red_team"
| "propa_extraction_prompt";
prompt_role:
| "instruction"
| "rubric"
| "output_contract"
| "tool_policy"
| "context_policy"
| "judge_dimension"
| "red_team_role"
| "candidate_variant";
prompt_text_ref: StorageRef;
prompt_text_hash: string;
prompt_recipe_id?: string;
module_id?: string;
task_id?: string;
task_template_id?: string;
module_preset_id?: string;
runtime_truth_ref?: StorageRef;
evaluation_refs: StorageRef[];
tested_status:
| "untested"
| "tested"
| "candidate_winner"
| "promoted"
| "rejected";
promotion_state:
| "draft"
| "active"
| "candidate"
| "promoted"
| "rejected"
| "rolled_back";
schema_version: "1.0";
};
```
### §17A.9 Prompt quality assessment
```ts
export type TaskPromptQualityAssessment = {
assessment_id: string;
task_id?: string;
module_id?: string;
prompt_surface:
| "module_instruction"
| "judge_rubric"
| "red_team_prompt"
| "output_contract"
| "task_template_instruction"
| "module_preset_prompt";
current_prompt_hash: string;
quality_findings: PromptQualityFinding[];
suggested_prompt_ref?: StorageRef;
evidence_refs: StorageRef[];
confidence: number;
recommended_next_step:
| "accept"
| "manual_review"
| "run_judge_eval"
| "run_experiment"
| "run_dspy_candidate_generation"
| "defer";
schema_version: "1.0";
};
export type PromptQualityFinding = {
finding_kind:
| "missing_goal"
| "missing_source_scope"
| "missing_output_contract"
| "missing_tool_policy"
| "missing_evidence_policy"
| "missing_uncertainty_policy"
| "ambiguous_success_criteria"
| "downstream_incompatibility"
| "overconstrained"
| "underconstrained"
| "unsafe_side_effect_instruction"
| "judge_unscorable";
severity: "low" | "medium" | "high";
explanation: string;
suggested_fix: string;
schema_version: "1.0";
};
```
### §17A.10 Prompt design guide cards in TKP
TKP SHOULD include `PromptDesignGuideCard` records for prompt surfaces.
```ts
export type PromptDesignGuideCard = {
card_id: string;
prompt_surface:
| "agent_task_instruction"
| "judge_rubric"
| "red_team_prompt"
| "claim_extractor_instruction"
| "experiment_variant_instruction"
| "task_agent_design_question"
| "output_contract"
| "tool_use_instruction"
| "document_review_instruction";
purpose: string;
when_to_use: string[];
when_not_to_use: string[];
prompt_quality_principles: string[];
common_failures: string[];
repair_patterns: string[];
example_before?: string;
example_after?: string;
related_modules: string[];
related_eval_methods: string[];
source_refs: SpecSourceRef[];
authority_level: "hard_rule" | "strong_default" | "soft_heuristic" | "example";
schema_version: "1.0";
};
```
Recommended card families:
```text
Agent Task instruction for legal/professional analysis
Judge rubric prompt
Red-team prompt
Output contract
Tool-use/source-grounding prompt
Document-review prompt
Second-pass deepening prompt
Experiment variant prompt
Prompt-evaluation task prompt
```
---
## §17B — Prompt candidate generation and Experiment integration
### §17B.1 No Promptolution-specific module
There shall be no Promptolution-specific module in the initial design. Promptolution is a candidate-generation method, not a graph module and not a user-facing surface.
### §17B.2 Candidate-generation methods
```ts
export type PromptCandidateGenerationMethod =
| "task_agent_structured_rewrite"
| "doc17_prompt_advisor_rewrite"
| "promptolution"
| "dspy_gepa"
| "manual_variant"
| "template_based_variant"
| "prior_winner";
export type PromptCandidateGenerationMethodDescriptor = {
method: PromptCandidateGenerationMethod;
requires_llm_call: boolean;
operative_context: "quick_help" | "tracked_evaluation_only" | "reserved_r5" | "manual_only";
availability_state: "operative" | "reserved" | "unavailable" | "degraded";
notes?: string;
schema_version: "1.0";
};
export type PromptCandidateMethodMetadata = {
method: PromptCandidateGenerationMethod;
execution_class:
| "deterministic"
| "llm_assisted"
| "optimizer"
| "manual";
requires_llm_call: boolean;
requires_eval_dataset: boolean;
requires_judge_or_scorer: boolean;
permitted_runtime:
| "quick_help"
| "tracked_evaluation_only"
| "r5_optimization_only";
expected_cost_class: "none" | "low" | "medium" | "high";
expected_latency_class: "instant" | "short" | "long" | "background";
operative_status:
| "operative"
| "reserved_r5"
| "unavailable"
| "degraded";
schema_version: "1.0";
};
```
`PromptCandidateGenerationMethodDescriptor` is the compact UI/availability descriptor. `PromptCandidateMethodMetadata` is the fuller operational descriptor used by Experiment, Task Agent, and evaluation reporting to distinguish deterministic candidate generation from LLM-assisted rewriting and optimizer-backed generation.
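To make the operational descriptor concrete, the sketch below encodes the default guidance given later in this section as a lookup function. It is illustrative only: `defaultMethodMetadata` and `MethodDefaults` are hypothetical names, fields the guidance does not state are left undefined, and treating an LLM-assisted Promptolution implementation as `requires_llm_call = true` is an inference. `doc17_prompt_advisor_rewrite` is omitted because its execution class differs between rewrite and gap analysis.

```ts
type ExecutionClass = "deterministic" | "llm_assisted" | "optimizer" | "manual";
type PermittedRuntime =
  | "quick_help"
  | "tracked_evaluation_only"
  | "r5_optimization_only";

// Partial view of PromptCandidateMethodMetadata; unstated fields stay undefined.
type MethodDefaults = {
  execution_class: ExecutionClass;
  requires_llm_call?: boolean;
  permitted_runtime?: PermittedRuntime;
};

type CoveredMethod =
  | "manual_variant"
  | "task_agent_structured_rewrite"
  | "promptolution"
  | "dspy_gepa";

function defaultMethodMetadata(
  method: CoveredMethod,
  promptolutionIsDeterministic: boolean = false
): MethodDefaults {
  switch (method) {
    case "manual_variant":
      return { execution_class: "manual", requires_llm_call: false };
    case "task_agent_structured_rewrite":
      return { execution_class: "llm_assisted", requires_llm_call: true };
    case "promptolution":
      // Tracked-evaluation-only unless the architect later approves live use.
      return promptolutionIsDeterministic
        ? {
            execution_class: "deterministic",
            requires_llm_call: false,
            permitted_runtime: "tracked_evaluation_only",
          }
        : {
            execution_class: "llm_assisted",
            requires_llm_call: true, // assumption: LLM-assisted implies a model call
            permitted_runtime: "tracked_evaluation_only",
          };
    case "dspy_gepa":
      // Reserved until Addenda A R5 lands.
      return { execution_class: "optimizer", permitted_runtime: "r5_optimization_only" };
  }
}
```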
Rules:
```text
Promptolution is treated as a candidate-generation method. If the implementation is deterministic and does not require a model call, mark requires_llm_call=false. It remains tracked-evaluation-only unless the architect later approves live use.
DSPy/GEPA is reserved for R5 optimization substrate.
Manual variants require no generation call and can feed Experiment directly.
```
Default method metadata guidance:
```text
manual_variant:
execution_class = manual; requires_llm_call = false.
task_agent_structured_rewrite:
execution_class = llm_assisted; requires_llm_call = true.
doc17_prompt_advisor_rewrite:
execution_class = llm_assisted for rewrite, deterministic for gap analysis.
promptolution:
execution_class = deterministic or llm_assisted depending on implementation;
if deterministic, it may be a cheap candidate generator;
still no separate module or UI.
dspy_gepa:
execution_class = optimizer;
permitted_runtime = r5_optimization_only until Addenda A R5 lands.
```
### §17B.3 PromptCandidateBundle
```ts
export type PromptCandidateBundle = {
bundle_id: string;
source_prompt_ref: StorageRef;
target_prompt_surface:
| "chat_prompt"
| "prompt_recipe"
| "module_instruction"
| "judge_rubric"
| "red_team_prompt"
| "output_contract";
candidates: PromptCandidate[];
generation_methods_used: PromptCandidateGenerationMethod[];
generation_context_ref?: StorageRef;
created_by:
| "user"
| "task_agent"
| "experiment_module"
| "prompt_advisor_service";
schema_version: "1.0";
};
export type PromptCandidate = {
candidate_id: string;
label: string;
prompt_text_ref: StorageRef;
prompt_hash: string;
generation_method: PromptCandidateGenerationMethod;
rationale?: string;
preserved_constraints: string[];
changed_elements_summary: string[];
schema_version: "1.0";
};
```
### §17B.4 Experiment module integration
The `system.experiment` module should be the initial home for prompt candidate testing. Candidate generation may be baked into Experiment via candidate sources rather than a separate graph module.
```ts
export type ExperimentVariantInstructionSource =
| "manual"
| "same_as_baseline"
| "task_agent_candidate"
| "doc17_prompt_advisor_candidate"
| "promptolution_candidate"
| "dspy_candidate"
| "prompt_recipe"
| "prior_winner";
```
```ts
export type PromptCandidateGenerationConfig = {
enabled: boolean;
methods: PromptCandidateGenerationMethod[];
candidate_count: number;
target_prompt_surface:
| "chat_prompt"
| "prompt_recipe"
| "module_instruction"
| "judge_rubric"
| "red_team_prompt"
| "output_contract";
preserve_constraints: boolean;
preserve_output_contract: boolean;
preserve_tool_policy: boolean;
generation_status_by_method: Record<
PromptCandidateGenerationMethod,
"operative" | "reserved_r5" | "unavailable" | "degraded"
>;
generated_candidate_bundle_ref?: StorageRef;
schema_version: "1.0";
};
```
Experiment may consume:
```ts
export type ExperimentPromptCandidateIntegration = {
prompt_candidate_bundle_ref?: StorageRef;
prompt_candidate_generation_config?: PromptCandidateGenerationConfig;
variants_instruction_source?: Record<string, ExperimentVariantInstructionSource>;
schema_version: "1.0";
};
export type ExperimentPromptCandidateMode =
| "consume_existing_candidate_bundle"
| "generate_candidates_then_test"
| "manual_variants_only";
export type ExperimentPromptCandidateSourcePlan = {
mode: ExperimentPromptCandidateMode;
source_prompt_ref?: StorageRef;
existing_candidate_bundle_ref?: StorageRef;
generation_config?: PromptCandidateGenerationConfig;
method_metadata: PromptCandidateMethodMetadata[];
schema_version: "1.0";
};
```
This supports three workflows: Task Agent generates candidates and Experiment tests them; Experiment generates candidates internally and tests them; or the user supplies manual variants and Experiment only compares them.
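The three workflows map naturally onto `ExperimentPromptCandidateMode`. A minimal sketch of that mapping follows; `PlanInputs` and `chooseCandidateMode` are hypothetical, and the precedence (an existing bundle wins over internal generation) is an assumption rather than a stated rule.

```ts
type ExperimentPromptCandidateMode =
  | "consume_existing_candidate_bundle"
  | "generate_candidates_then_test"
  | "manual_variants_only";

// Hypothetical, trimmed inputs; StorageRef is modeled as a string here.
type PlanInputs = {
  existing_candidate_bundle_ref?: string;
  generation_enabled: boolean;
};

function chooseCandidateMode(inputs: PlanInputs): ExperimentPromptCandidateMode {
  // Workflow 1: Task Agent already produced a bundle; Experiment only tests it.
  if (inputs.existing_candidate_bundle_ref) {
    return "consume_existing_candidate_bundle";
  }
  // Workflow 2: Experiment generates candidates internally, then tests them.
  if (inputs.generation_enabled) {
    return "generate_candidates_then_test";
  }
  // Workflow 3: the user supplies variants and Experiment only compares them.
  return "manual_variants_only";
}
```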
### §17B.5 DSPy / GEPA status
DSPy/GEPA may appear as a reserved candidate-generation or optimization method in prompt-evaluation schemas, but it is non-operative until the Addenda A R5 optimization substrate lands. R4.1 must not implement unsafe DSPy-lite behavior.
### §17B.6 Prompt Candidate Generator module reserved
A future generic module MAY be reserved:
```text
step.prompt_candidate_generator
```
Status:
```text
R5/R6 optional; not operative in initial Addenda B.
```
Purpose:
```text
Generate candidate prompt variants when manual graph composition proves valuable.
```
Initial preference:
```text
Bake candidate generation into Experiment first; reserve separate module only if manual graph construction requires it.
```
---
## §17C — Prompt Evaluation Task
### §17C.1 Deep prompt testing as a tracked task
Deep prompt testing should be a saved/tracked DOC23 task or task-native evaluation job. It should use the existing/forthcoming Experiment, Judge, Claim Extractor, DSPy/GEPA, and Task Assessment infrastructure.
### §17C.2 PromptEvaluationTask schema
```ts
export type PromptEvaluationTask = {
evaluation_task_id: string;
source_prompt_ref: StorageRef;
prompt_kind:
| "prompt_recipe"
| "module_prompt"
| "judge_rubric"
| "red_team_prompt"
| "output_contract";
evaluation_goal: string;
candidate_generation_methods: PromptCandidateGenerationMethod[];
candidate_prompt_refs: StorageRef[];
eval_dataset_ref?: StorageRef;
judge_rubric_refs: StorageRef[];
experiment_run_refs: StorageRef[];
dspy_job_refs: StorageRef[];
result_summary_ref?: StorageRef;
recommendation:
| "keep_original"
| "adopt_candidate"
| "save_as_recipe"
| "save_as_module_preset"
| "propose_template_update"
| "needs_more_examples"
| "manual_review";
promotion_proposal_ref?: StorageRef;
schema_version: "1.0";
};
```
### §17C.3 User-facing flow
```text
User: Refine my red-team prompt for effectiveness.
Elnor:
I can do a quick rewrite, or run a tracked prompt test with variants and judge scoring.
[Quick rewrite] [Run tracked prompt test]
```
If tracked:
```text
Task Agent creates Prompt Evaluation Task.
Experiment runs baseline + candidates.
Judge scores outputs.
Task Assessment summarizes winner, failure modes, and proposed save/update action.
```
### §17C.4 No silent mutation
Prompt updates MUST be proposals with prompt hashes, evidence, approval policy, and rollback where appropriate. No live task prompt, module preset prompt, prompt recipe, or template prompt may silently rewrite itself based on prompt-evaluation results.
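The sketch below illustrates one way to enforce this rule at the point of application: a prompt changes only when an approved proposal with evidence matches the currently active hash. The `ActivePrompt` and `PromptUpdateProposalLite` shapes, the reason strings, and `applyProposal` are hypothetical; the spec's own proposal type is not reproduced here.

```ts
type ActivePrompt = { prompt_id: string; text: string; hash: string };

// Hypothetical proposal shape for illustration only.
type PromptUpdateProposalLite = {
  target_prompt_id: string;
  prior_hash: string;
  candidate_hash: string;
  candidate_text: string;
  evidence_refs: string[];
  approval: "pending" | "approved" | "rejected";
};

type ApplyResult =
  | { applied: true; prompt: ActivePrompt }
  | { applied: false; prompt: ActivePrompt; reason: string };

// Returns a new prompt object on success; the input is never mutated.
function applyProposal(
  active: ActivePrompt,
  proposal: PromptUpdateProposalLite
): ApplyResult {
  if (proposal.target_prompt_id !== active.prompt_id) {
    return { applied: false, prompt: active, reason: "wrong_target" };
  }
  if (proposal.approval !== "approved") {
    return { applied: false, prompt: active, reason: "not_approved" };
  }
  if (proposal.prior_hash !== active.hash) {
    // The active prompt changed after the proposal was created.
    return { applied: false, prompt: active, reason: "stale_prior_hash" };
  }
  if (proposal.evidence_refs.length === 0) {
    return { applied: false, prompt: active, reason: "missing_evidence" };
  }
  return {
    applied: true,
    prompt: {
      prompt_id: active.prompt_id,
      text: proposal.candidate_text,
      hash: proposal.candidate_hash,
    },
  };
}
```

Keeping the prior prompt object intact also leaves the caller with what it needs to record a rollback reference.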
---
## §17D — Prompt learning signals
### §17D.1 TaskPromptEditLearningSignal
```ts
export type TaskPromptEditLearningSignal = {
signal_id: string;
design_session_id: string;
task_id?: string;
module_id?: string;
edit_kind:
| "added_output_contract"
| "added_source_scope"
| "added_deeper_analysis_instruction"
| "added_tool_policy"
| "added_red_team_instruction"
| "tightened_rubric"
| "changed_tone"
| "changed_format"
| "removed_overconstraint"
| "removed_unwanted_question";
before_prompt_hash: string;
after_prompt_hash: string;
interpreted_lesson: string;
evidence_ref: StorageRef;
recommended_update_kind:
| "prompt_design_guide_card_update"
| "template_prompt_update_proposal"
| "module_preset_prompt_update_proposal"
| "task_agent_instruction_candidate"
| "user_preference_overlay"
| "none";
schema_version: "1.0";
};
```
### §17D.2 Prompt learning inputs
Track:
```text
user accepted prompt rewrite
user rejected prompt rewrite
user edited accepted rewrite
user added output contract
user added “look deeper”
user added source-scope instruction
judge found prompt weak
experiment variant won
prompt caused downstream parser failure
prompt produced reusable artifact
prompt was promoted to recipe/preset/template
```
These signals feed:
```text
Task Agent prompt-design guide cards
Prompt recipe recommendations
Module preset update proposals
Task template update proposals
BDSM question/suggestion utility
DOC8 friction/quality-degradation signals
TaskSystemLearningReport
```
---
## §17E — Unified prompt help UX
Prompt help appears where prompts live: chat composer, prompt recipe editor, task module prompt editor, judge/rubric editor, red-team prompt editor, task template editor, module preset editor, Task Assessment prompt findings, and Run Inspector prompt evidence. Prompt help does not require the user to visit a separate Prompt Lab page. A future advanced history/admin surface may list prompt-evaluation jobs and promotion history, but that surface is not the ordinary workflow.
### §17E.1 Chat / composer quick help
```text
User: Is this a good prompt?
Elnor: It is missing a clear output format and success criteria. Here is a stronger version...
```
Backend:
```text
DOC17 Prompt Advisor service or direct Elnor reasoning.
No Task Agent unless user asks for testing, task context, saving, or deep improvement.
```
### §17E.2 Task module prompt editor
```text
Ask Task Agent
- Improve prompt
- Test prompt variants
- Explain prompt role
- Check downstream compatibility
```
Backend:
```text
Task Agent + DOC17 Prompt Advisor service + TKP + graph context.
```
### §17E.3 Prompt recipe surface
```text
Prompt Recipe actions:
Improve
Rewrite
Generate variants
Test variants
Use in task
Convert to module preset
```
Backend:
```text
DOC17 owns recipe.
Task Agent orchestrates deep testing or task conversion.
```
### §17E.4 Task Assessment prompt findings
Task Assessment SHOULD include prompt-quality findings when relevant:
```text
Prompt Quality
- weak module instructions
- missing output contracts
- unscorable judge rubrics
- downstream parsing failures
- wrong tool-use instructions
- omitted source-grounding
- shallow one-pass analysis where second-pass deepening is preferred
```
```ts
export type TaskAssessmentPromptSection = {
prompt_artifacts_reviewed: string[];
prompt_quality_assessments: TaskPromptQualityAssessment[];
prompt_eval_job_refs: string[];
suggested_prompt_updates: PromptImprovementProposal[];
schema_version: "1.0";
};
```
---
## §18 — Library / DOC73 integration
### §18.1 Purpose
Task outputs and downloaded/received documents may become library/corpus material when the task purpose or user instruction requires deep extraction into bounded knowledge.
### §18.2 Task output to library source class
DOC23 must support a task-output-to-library source class for DOC73 integration.
```ts
TaskOutputLibraryIngestionIntent {
intent_id: string
task_id: string
run_id: string
artifact_id: string
source_artifact_ref: StorageRef
target_library_ref: string
ingestion_reason:
| "user_requested"
| "task_template_policy"
| "blueprint_declared_output"
| "assessment_recommended"
| "recurring_gathering_task"
doc25_ingestion_required: boolean
doc25_ingestion_ref?: string
doc73_extraction_profile_ref?: string
policy_snapshot_ref: StorageRef
created_at: string
schema_version: "1.0"
}
```
### §18.3 Library user-facing language
Technical schemas may use corpus/library per DOC73 rules. User-facing Task Agent language should use “library” unless the user uses “corpus” or the UI/debug context requires technical terminology.
### §18.4 DOC73 endpoint rule
Adding to a library is a destination. DOC25 processing is a route before destination, not the destination itself.
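A small sketch of the route-versus-destination distinction, using the `doc25_ingestion_required` and `doc25_ingestion_ref` fields from §18.2: the library write is only the next step once any required DOC25 processing has produced a reference. `IngestionIntentLite`, `NextStep`, and `nextIngestionStep` are hypothetical names for illustration.

```ts
// Trimmed copy of TaskOutputLibraryIngestionIntent with only the fields used here.
type IngestionIntentLite = {
  target_library_ref: string;
  doc25_ingestion_required: boolean;
  doc25_ingestion_ref?: string;
};

type NextStep =
  | { kind: "run_doc25_ingestion" }
  | { kind: "add_to_library"; target_library_ref: string };

function nextIngestionStep(intent: IngestionIntentLite): NextStep {
  // DOC25 is the route: it must complete before the destination is reached.
  if (intent.doc25_ingestion_required && !intent.doc25_ingestion_ref) {
    return { kind: "run_doc25_ingestion" };
  }
  // The library is the destination.
  return { kind: "add_to_library", target_library_ref: intent.target_library_ref };
}
```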
---
## §19 — Privacy, incognito, and retention
### §19.1 Purpose
Tasks may handle privileged, personal, confidential, or sensitive material. Addenda B must preserve global EC/PropA/DOC1 privacy controls while adding task-specific memory and artifact behavior.
### §19.2 Effective control order
Task memory/artifact capture is subordinate to:
1. EC global memory/collection/application controls.
2. EC incognito and processing controls.
3. PropA source/sensitivity/visibility policy.
4. DOC1 write gate and memory governance.
5. TaskRunMemoryPolicy.
6. TaskArtifactIntakePolicy.
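The ordering above can be sketched as a first-deny-wins evaluation. This is an illustrative reading, not a specified algorithm: the layer identifiers, the `allow`/`deny` decision model, and `effectiveCaptureDecision` are assumptions, and a layer with no recorded decision is treated as raising no objection.

```ts
type ControlLayer =
  | "ec_global"
  | "ec_incognito"
  | "propa_policy"
  | "doc1_write_gate"
  | "task_run_memory_policy"
  | "task_artifact_intake_policy";

// Highest-authority layer first, mirroring the numbered list above.
const CONTROL_ORDER: ControlLayer[] = [
  "ec_global",
  "ec_incognito",
  "propa_policy",
  "doc1_write_gate",
  "task_run_memory_policy",
  "task_artifact_intake_policy",
];

type LayerDecision = "allow" | "deny";

function effectiveCaptureDecision(
  decisions: Partial<Record<ControlLayer, LayerDecision>>
): { allowed: boolean; denied_by?: ControlLayer } {
  for (const layer of CONTROL_ORDER) {
    // A deny at any layer blocks capture; lower layers cannot override it.
    if (decisions[layer] === "deny") {
      return { allowed: false, denied_by: layer };
    }
  }
  return { allowed: true };
}
```

Reporting the highest-ranked denying layer gives Run Inspector a single reason to display when capture is blocked.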
### §19.3 Incognito UI
```text
Private run options
───────────────────
Memory mode:
[Standard]
[Do not add to Elnor memory]
[Ephemeral test run]
Explanation:
Standard: preserves run audit and selected memories/artifacts.
Do not add to Elnor memory: no optional DOC72/DOC8 learning, but run audit remains.
Ephemeral test run: minimal records only; intended for testing.
```
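The three memory modes can be summarized as a lookup from mode to capture behavior. The sketch below is one reading of the explanation text: the mode identifiers, the `CaptureBehavior` fields, the `full`/`minimal` audit labels, and the assumption that Standard mode permits optional DOC72/DOC8 learning are all illustrative rather than specified.

```ts
// Hypothetical identifiers for the three UI options above.
type MemoryMode =
  | "standard"
  | "do_not_add_to_elnor_memory"
  | "ephemeral_test_run";

// Hypothetical behavior flags derived from the explanation text.
type CaptureBehavior = {
  run_audit: "full" | "minimal";
  optional_doc72_doc8_learning: boolean;
  selected_memories_and_artifacts: boolean;
};

function captureBehaviorFor(mode: MemoryMode): CaptureBehavior {
  switch (mode) {
    case "standard":
      // Preserves run audit and selected memories/artifacts.
      return {
        run_audit: "full",
        optional_doc72_doc8_learning: true, // assumption: not stated explicitly
        selected_memories_and_artifacts: true,
      };
    case "do_not_add_to_elnor_memory":
      // No optional DOC72/DOC8 learning, but run audit remains.
      return {
        run_audit: "full",
        optional_doc72_doc8_learning: false,
        selected_memories_and_artifacts: false,
      };
    case "ephemeral_test_run":
      // Minimal records only; intended for testing.
      return {
        run_audit: "minimal",
        optional_doc72_doc8_learning: false,
        selected_memories_and_artifacts: false,
      };
  }
}
```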
### §19.4 Retention classes
See §10.9. Retention decisions must be visible in Run Inspector and artifact details.
---
## §19A — Failure modes and degraded behavior
### §19A.1 Task Agent degraded matrix
| Failure | Behavior |
|---|---|
| Task Agent unavailable | Elnor remains direct-first; task canvas/manual editor still works. |
| TKP stale | Task Agent may inspect/retrieve; task design recommendations are marked degraded or blocked depending on drift severity. |
| DOC11 named-agent unavailable | Use deterministic tools/read models; no background Task Agent design session. |
| OpenClaw `sessions_spawn` unavailable | No async/background Task Agent; synchronous Gateway call only if available. |
| Fallback model used | Receipt records primary failure and fallback model; high-risk graph/prompt changes require validation. |
| Task Agent proposal invalid | Validator/preflight blocks save; Task Agent may revise up to configured max iterations; then reports unresolved issues. |
| Capability registry unavailable | No capability-binding recommendations beyond saved template metadata. |
| DOC72 memory unavailable | Design proceeds from templates/TKP only; no prior-run learning. |
| DOC8/BDSM unavailable | No learned suppression/preference; static policy only. |
| DOC17 Prompt Advisor unavailable | Task Agent may use TKP prompt guide cards and direct reasoning; no Prompt Advisor service receipt. |
| Addenda A Experiment/Judge unavailable | Deep prompt testing unavailable; quick prompt help remains available. |
| User rejects task suggestion repeatedly | BDSM/DOC8 suppress similar suggestions by context class. |
| Hidden graph would be created | Forbidden; must create visible receipt and graph link. |
### §19A.2 No-phantom controls
Every visible Task Agent, task suggestion, prompt-improvement, prompt-testing, task-assessment, or task-inspection control MUST map to a real route/read model/command or an explicit disabled/degraded/no-op row with reason code.
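A sketch of how this rule could be checked at build or test time: every visible control must resolve either to a live binding with a route or to an explicit non-live row with a reason code. `ControlBinding`, the `state` values, and `findPhantomControls` are hypothetical names chosen for illustration.

```ts
// Hypothetical binding model: live controls carry a route,
// non-live controls carry a reason code.
type ControlBinding =
  | { state: "live"; control_id: string; route: string }
  | {
      state: "disabled" | "degraded" | "no_op";
      control_id: string;
      reason_code: string;
    };

// Returns the ids of visible controls that have no valid binding.
function findPhantomControls(
  visibleControlIds: string[],
  bindings: ControlBinding[]
): string[] {
  const phantoms: string[] = [];
  for (const id of visibleControlIds) {
    const matches = bindings.filter((b) => b.control_id === id);
    const valid = matches.some((b) =>
      b.state === "live" ? b.route.trim() !== "" : b.reason_code.trim() !== ""
    );
    if (!valid) phantoms.push(id);
  }
  return phantoms;
}

const phantomExample = findPhantomControls(
  ["improve_prompt", "test_variants", "review_task"],
  [
    { state: "live", control_id: "improve_prompt", route: "/prompt/improve" },
    { state: "degraded", control_id: "test_variants", reason_code: "experiment_unavailable" },
  ]
);
```

Here `review_task` would be flagged because it is visible but has neither a route nor a reason-coded row; the route and reason-code strings are made up for the example.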
---
## §20 — UI/UX surfaces
### §20.1 Task Agent surfaces
Required user-facing Task Agent surfaces:
1. **Create from Description** — natural-language task creation.
2. **Task Proposal Panel** — blueprint, templates, graph, capabilities, outputs, memory, risks.
3. **Template Match Panel** — reuse/customize/fork/new decision.
4. **Blueprint Editor** — user-readable intent/process editor.
5. **Graph Diff Review** — proposed graph/config changes.
6. **Task Agent Run Lens** — conversational and visual task progress/status.
7. **Artifact Finder** — search outputs across runs/tasks/templates/matters.
8. **Task Assessment Report** — post-run and on-demand report.
9. **Knowledge Pack Inspector** — active pack/version/drift/eval state.
10. **Task Design Learning Review Queue** — proposed template/preset/pattern/preference updates.
### §20.2 Creation UI states
```text
Create Task
────────────────────────
[Describe what you want Elnor to do]
After submit:
- Intent summary
- Existing templates found
- Recommended setup
- Optional questions
- Blueprint preview
- Graph preview
- Capability/tool plan
- Output/memory plan
- Preflight result
```
### §20.3 Run progress conversational examples
```text
User: Where are we in the Marex draft task?
Task Agent: The task is 5 of 7 steps complete. It finished source classification, fact extraction, damages memo, and initial complaint draft. It is currently waiting on the red-team module. The next step is revision. The latest output is the initial complaint draft, and the damages memo is available here: [artifact].
```
```text
User: Send me the results of the draft task that ran for Marex.
Task Agent: I found the completed Marex Draft Complaint run from May 2. Final outputs: complaint draft, damages memo, red-team report, revision summary. The final complaint is saved as [work product link]. I can also show the intermediate damages memo.
```
### §20.4 Artifact intake UI terminology
Use user-friendly labels as in §11.2. Avoid exposing “candidate for memory review” as the primary label; use “Ask whether to make findable” or “Suggest saving for later.”
### §20.5 Task Assessment UI
Task Assessment UI should show:
- outcome summary;
- what went well;
- what failed or degraded;
- substantive findings;
- graph/config findings;
- capability/tool findings;
- artifact/memory findings;
- recommended changes;
- graph/blueprint diffs;
- apply selected changes controls;
- save as template update proposal;
- learning signals generated.
---
## §20A — UI additions
### §20A.1 Chat chips
A medium or high task opportunity may surface quiet chips:
```text
[Use saved task]
[Run deeper task]
[Ask Task Agent to set this up]
[View matching template]
```
These chips must be suppressible by BDSM/DOC8 learning.
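The chip rule above can be sketched as a small selection function. This is an illustration only: `TaskOpportunityLevel`, `selectChatChips`, and the `suppressed` list are hypothetical names, and the suppression list stands in for whatever BDSM/DOC8 learning actually produces.

```ts
// Illustrative sketch; none of these names are defined by this addendum.
type TaskOpportunityLevel = "none" | "low" | "medium" | "high";

type ChatChip =
  | "Use saved task"
  | "Run deeper task"
  | "Ask Task Agent to set this up"
  | "View matching template";

const ALL_CHIPS: ChatChip[] = [
  "Use saved task",
  "Run deeper task",
  "Ask Task Agent to set this up",
  "View matching template",
];

// Returns the chips to surface for one chat turn.
// `suppressed` is an assumed input: chips that learning has decided to hide.
function selectChatChips(
  level: TaskOpportunityLevel,
  suppressed: ChatChip[],
): ChatChip[] {
  // Only medium/high opportunities surface chips at all.
  if (level !== "medium" && level !== "high") {
    return [];
  }
  return ALL_CHIPS.filter((chip) => suppressed.indexOf(chip) === -1);
}
```

Keeping suppression as an input, rather than hard-coding it, leaves the learning decision with DOC8/BDSM and the rendering decision with the chat surface.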
### §20A.2 Task Agent panels
Task surfaces SHOULD include:
```text
Task editor side panel: Ask Task Agent
Run Inspector side panel: Ask Task Agent
Task list: Review my tasks
Template/preset library: Ask Task Agent to compare/improve
Task Assessment report: Apply selected proposals
Prompt editor: Improve/Test prompt
```
### §20A.3 Example — direct answer with optional task
```text
Here is a quick review.
For a deeper tracked review, I can also use your Complaint Evaluation task to produce an issue matrix, red-team report, and revision plan.
[Run deeper saved task]
```
### §20A.4 Example — high-fit task recommendation
```text
This looks like a tracked workflow rather than a quick chat answer. I found your Complaint Evaluation template. I recommend using it because it will preserve the source review, red-team pass, outputs, and task history.
[Use template] [Customize first] [Just answer here]
```
### §20A.5 Example — visible Elnor-created task
```text
I set this up as a saved task because it needs to run independently and preserve outputs. You can inspect or edit the graph before it runs.
[View task] [Edit steps] [Run now]
```
### §20A.6 Example — prompt testing
```text
I can do a quick rewrite, or run a tracked prompt test with variants and judge scoring.
[Quick rewrite] [Run tracked prompt test]
```
---
## §20B — DOC20-integrated modular task surface
### §20B.1 DOC20 shell integration rule
The task modular canvas is a DOC20 content surface. It SHALL render inside the DOC20 unified workspace main view area as a `task` tab, not as a separate application shell.
Required behavior:
1. Opening a task from the Browser, Tasks page, Q Home status card, Task Agent link, or run-history link opens or focuses a DOC20 task tab.
2. The right main-chat column defaults closed for task tabs unless the user previously pinned it open for that tab.
3. The left DOC20 nav rail remains available so the user can switch to Browser, main chat, settings, split view, or other DOC20 surfaces.
4. When the user opens the task page from the Pages navigation, Browser defaults to the Browser sub-tab with the Task filter selected.
5. Task load, save, duplicate, archive, and view actions are unified between the Browser and the Tasks page; Q MUST NOT create parallel task-picking surfaces with inconsistent state.
6. Browser items may be dragged into the modular graph, into a module config/drop zone, or into a Task Segment insertion target when permitted by type and policy.
7. Opening a module output, artifact, input document, delivery receipt, or related work product opens that object in the appropriate DOC20 tab/viewer and preserves the ability to switch back to the task tab using the DOC20 tab bar.
DOC20’s surface-independence principle remains controlling: the task tab is a viewport over EC-owned durable state, not a writer or separate task data owner.
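A minimal sketch of the open-or-focus behavior in rule 1 and the chat-column default in rule 2 might look like the following. `WorkspaceTab`, `WorkspaceState`, and the `task:` tab-id prefix are hypothetical; DOC20 owns the real tab model.

```ts
// Hypothetical tab model for illustration; DOC20 defines the real one.
interface WorkspaceTab {
  tab_id: string;
  kind: "task" | "browser" | "chat" | "viewer";
  task_id?: string;
  chat_column_pinned_open: boolean;
}

interface WorkspaceState {
  tabs: WorkspaceTab[];
  focused_tab_id?: string;
}

// Opens a task tab, or focuses the existing one, so that every entry point
// (Browser, Tasks page, Q Home, Task Agent link, run history) lands on the same tab.
function openOrFocusTaskTab(state: WorkspaceState, taskId: string): WorkspaceState {
  const existing = state.tabs.filter(
    (tab) => tab.kind === "task" && tab.task_id === taskId,
  )[0];
  if (existing !== undefined) {
    return { tabs: state.tabs, focused_tab_id: existing.tab_id };
  }
  const created: WorkspaceTab = {
    tab_id: "task:" + taskId,
    kind: "task",
    task_id: taskId,
    // Rule 2: the right chat column defaults closed for task tabs.
    chat_column_pinned_open: false,
  };
  return { tabs: state.tabs.concat([created]), focused_tab_id: created.tab_id };
}
```

The function returns a new state instead of mutating the old one, which matches the idea that the tab is a viewport and not an owner of task data.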
### §20B.2 Blank task-page state
When no task is loaded, the task page SHOULD show:
```text
Task Workspace
────────────────────────────────
[New Task] [Use Saved Task] [Ask Task Agent]
Active / Running
  Draft Complaint — Marex        running · 7/12 steps · $7.84
  Filing Monitor — Paramount     scheduled · next run 5:00 PM
Recently used
  Complaint Evaluation           18 runs · avg 42m · avg $14.80
  Weekly Docket Monitor          9 runs · avg 8m · avg $1.30
Blank graph area:
  Drag a saved task, module preset, document, or Task Segment here,
  or click New Task to start from a blank graph.
```
The blank state SHOULD NOT hide active/scheduled work. It SHOULD provide a larger running-task overview than the compact Browser rows.
### §20B.3 Graph Overview Preservation Rule
The broad modular graph/module-card overview remains the primary task-design/status surface. The visual direction of the existing V11 ALT A module blocks is preserved: compact module cards, ports/cables, light status badges, and minimal visual noise.
The graph overview SHOULD show only:
- module name/type/category;
- ports/cables;
- compact execution status;
- compact validation/error badge;
- compact output-available badge;
- compact Judge/Experiment score/winner badge where relevant;
- current/next module highlighting during a selected run.
The graph overview SHOULD NOT show full telemetry, raw prompts, full DOC24 packets, delivery receipts, all artifacts, full cost breakdowns, sub-agent traces, or raw audit data. Those belong in module detail panels, the Run Inspector, Artifacts & Deliveries, Context & Audit, and Task Agent side panel.
### §20B.4 Module detail/config panel run-data block
Every module detail/config panel MUST include a compact `Run Data` block for the selected run, without requiring the user to leave the modular page.
```ts
ModuleRunQuickAccess {
task_id: string
run_id: string
module_id: string
activation_seq?: number
status:
| "not_run"
| "pending"
| "running"
| "waiting"
| "success"
| "error"
| "skipped"
| "cancelled"
last_started_at?: string
last_completed_at?: string
duration_ms?: number
cost_usd?: number
primary_outputs: Array<{
artifact_id: string
display_name: string
artifact_kind: string
open_url: string
}>
inspector_url: string
live_inspector_available: boolean
schema_version: "1.0"
}
```
Illustrative UI:
```text
Run Data
────────
Status: Completed
Finished: May 4, 2026 11:04 AM
Duration: 6m 14s
Cost: $2.18
Outputs
Damages Memo.md [Open]
Analysis.json [Open]
[Open run inspector →]
```
`Open output` opens the artifact in the appropriate DOC20 viewer/tab. `Open run inspector` links to the full run inspector focused on the module activation, not to a separate module-inspector product.
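The `Run Data` block renders `duration_ms` and `cost_usd` as short human-readable strings. The helpers below are one possible sketch; `formatDuration` and `formatCostUsd` are hypothetical names, and the `n/a` placeholder for a missing cost is an assumption.

```ts
// Formats a millisecond duration the way the illustrative UI shows it ("6m 14s").
function formatDuration(durationMs: number): string {
  const totalSeconds = Math.floor(durationMs / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  if (hours > 0) {
    return hours + "h " + minutes + "m " + seconds + "s";
  }
  if (minutes > 0) {
    return minutes + "m " + seconds + "s";
  }
  return seconds + "s";
}

// cost_usd is optional in ModuleRunQuickAccess, so a missing value
// is rendered as a placeholder and not as "$0.00".
function formatCostUsd(costUsd?: number): string {
  return costUsd === undefined ? "n/a" : "$" + costUsd.toFixed(2);
}
```

For example, `formatDuration(374000)` yields `6m 14s` and `formatCostUsd(2.18)` yields `$2.18`, matching the mock-up above.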
### §20B.5 Graph ⇄ Inspector navigation
Graph view and Run Inspector are two synchronized views of the same task/run state.
```ts
TaskGraphInspectorNavigationState {
task_id: string
run_id?: string
view: "graph" | "inspector"
selected_module_id?: string
selected_activation_seq?: number
selected_artifact_id?: string
graph_viewport_state?: {
zoom: number
pan_x: number
pan_y: number
}
inspector_section?: "run_flow_steps" | "artifacts_deliveries" | "context_audit"
opened_from?: "graph" | "tasks_page" | "home" | "task_agent" | "artifact_link" | "shared_view"
schema_version: "1.0"
}
```
Rules:
1. Opening the inspector from a selected module focuses the corresponding activation row.
2. Returning from the inspector to graph focuses/highlights the same module and restores graph viewport state when available.
3. If the inspector is already open as a side/top panel within the graph page, the graph-back arrow is redundant and MAY be hidden; Close returns to the graph.
4. If the inspector is opened from a standalone Tasks/History/Home/Task Agent link without the graph visible, a graph-back/open-graph affordance opens the modular task tab and focuses the selected module/run.
5. Artifact links opened from the inspector retain a backlink to task/run/module context.
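Rules 1 and 2 amount to carrying the selection and viewport across a view switch. A minimal sketch, using a trimmed copy of `TaskGraphInspectorNavigationState` and two hypothetical transition functions:

```ts
interface GraphViewport {
  zoom: number;
  pan_x: number;
  pan_y: number;
}

// Trimmed copy of TaskGraphInspectorNavigationState for illustration.
interface NavState {
  task_id: string;
  run_id?: string;
  view: "graph" | "inspector";
  selected_module_id?: string;
  selected_activation_seq?: number;
  graph_viewport_state?: GraphViewport;
  inspector_section?: "run_flow_steps" | "artifacts_deliveries" | "context_audit";
}

// Rule 1: opening the inspector keeps the selected module/activation so the
// inspector can focus the matching row. The current viewport is stored for the return trip.
function openInspectorFromGraph(state: NavState, viewport: GraphViewport): NavState {
  return {
    ...state,
    view: "inspector",
    graph_viewport_state: viewport,
    inspector_section: "run_flow_steps",
  };
}

// Rule 2: returning to the graph keeps the same selection and the stored viewport.
function returnToGraph(state: NavState): NavState {
  return { ...state, view: "graph" };
}
```

Because both functions only change `view` (plus the stored viewport and section), the two surfaces stay synchronized over the same task/run state.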
---
## §20C — Run Inspector, Run Flow, Artifacts, Deliveries, and Context Audit
### §20C.1 Consolidated Run Inspector sections
Run Inspector is a focused detail panel/page, not an 11-tab observability product. It SHALL present:
```text
Header / Summary
1. Run Flow & Steps
2. Artifacts & Deliveries
3. Context & Audit
```
Judge/Experiment results appear first on the relevant graph modules and module detail panels. Run Inspector may summarize them inline when present but SHOULD NOT create a permanent top-level “Quality” tab for ordinary tasks.
### §20C.2 Header / Summary
Header shows the key run facts:
```text
Marex — Draft Complaint Run
Completed · May 4, 2026 · 49m 12s · $18.37
12 modules · 11 completed · 1 skipped · 0 failed
Final output: Revised Complaint Draft
Delivered: emailed to will@will.com · saved to /Marex/Drafts/
[Open final output] [Ask Task Agent] [Compare] [Export audit]
```
Header MUST include dates, run duration, cost when available, final output links, and delivery summary when any output was delivered externally or saved to a durable destination.
### §20C.3 Run Flow & Steps — not a simple timeline
The primary inspector section is `Run Flow & Steps`, not merely “Timeline.” DOC23 graphs may branch, loop, fan out, emit multiple outputs, and reactivate modules multiple times. The UI must therefore be activation-aware and flow-aware rather than assuming a straight linear timeline.
```ts
TaskRunStepRow {
task_id: string
run_id: string
module_id: string
activation_seq: number
display_order: number
module_label: string
module_type: string
branch_label?: string
loop_iteration?: number
output_port_id?: string
status:
| "pending"
| "running"
| "waiting"
| "success"
| "error"
| "cancelled"
| "skipped"
started_at?: string
completed_at?: string
duration_ms?: number
cost_usd?: number
agent_summary?: {
agent_id?: string
model_ref?: string
think_level?: string
fallback_used?: boolean
}
input_summary: string[]
output_summary: string[]
artifact_ids: string[]
delivery_ids: string[]
has_prompt_context_snapshot: boolean
has_doc24_context_packet: boolean
has_subagent_trace: boolean
has_error: boolean
has_evaluation: boolean
expandable_detail_ref: StorageRef
schema_version: "1.0"
}
```
Run Flow & Steps MUST be searchable and filterable. It SHOULD support sorting for power users, but the default order should be the execution/display order that preserves branch/loop structure. Rows must show date/time, duration, cost, status, and activation sequence when relevant.
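As a rough sketch of the search, filter, and default-order requirement, the function below filters a trimmed `TaskRunStepRow` by label text and status, then orders by `display_order` with `activation_seq` as a tiebreaker. `StepQuery` and `filterStepRows` are hypothetical, and the tiebreaker is an assumption.

```ts
type StepStatus =
  | "pending"
  | "running"
  | "waiting"
  | "success"
  | "error"
  | "cancelled"
  | "skipped";

// Trimmed copy of TaskRunStepRow for illustration.
interface StepRowLite {
  module_id: string;
  activation_seq: number;
  display_order: number;
  module_label: string;
  status: StepStatus;
}

// Hypothetical query shape: free-text match on the label plus a status filter.
interface StepQuery {
  text?: string;
  statuses?: StepStatus[];
}

function filterStepRows(rows: StepRowLite[], query: StepQuery): StepRowLite[] {
  const text = (query.text || "").toLowerCase();
  const statuses = query.statuses;
  return rows
    .filter((row) => text === "" || row.module_label.toLowerCase().indexOf(text) !== -1)
    .filter((row) => statuses === undefined || statuses.indexOf(row.status) !== -1)
    // Default order: display order first, then activation sequence for repeated modules.
    .sort((a, b) => a.display_order - b.display_order || a.activation_seq - b.activation_seq);
}
```

Sorting runs on the filtered copy, so the caller's row array is left untouched.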
### §20C.4 Current/next written status
A single read model powers written run status across Q Home, Tasks page, graph header, notifications, and Task Agent answers.
```ts
TaskRunNarrativeStatus {
task_id: string
run_id: string
one_line_status: string
current_work_summary: string
next_work_summary: string
recent_completion_summary: string
current_module_ids: string[]
next_module_ids: string[]
waiting_reasons: string[]
generated_from_state_version: string
generated_at: string
schema_version: "1.0"
}
```
Example:
```text
The Marex Draft Complaint task completed source intake and damages analysis. Red-team review is running now. Final revision and judge review will run after the red-team report finishes.
```
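Because one read model feeds several surfaces, a caller can use `generated_from_state_version` to decide whether a cached narrative is still current. The sketch below is an assumption about how that field might be used; `resolveNarrative` and the `regenerate` callback are hypothetical.

```ts
// Trimmed copy of TaskRunNarrativeStatus for illustration.
interface NarrativeLite {
  run_id: string;
  one_line_status: string;
  generated_from_state_version: string;
}

// Returns the cached narrative when it was generated from the current run state,
// otherwise asks the (hypothetical) generator for a fresh one.
function resolveNarrative(
  cached: NarrativeLite | undefined,
  currentStateVersion: string,
  regenerate: (stateVersion: string) => NarrativeLite,
): NarrativeLite {
  if (cached !== undefined && cached.generated_from_state_version === currentStateVersion) {
    return cached;
  }
  return regenerate(currentStateVersion);
}
```

With this shape, Q Home, the Tasks page, and Task Agent answers can all call the same resolver and never show a narrative older than the run state they display.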
### §20C.5 Artifacts & Deliveries
Artifacts get a dedicated section. It MUST show outputs, intermediate artifacts, consumed documents, receipts, and delivery destinations.
```ts
TaskDeliveryRecord {
delivery_id: string
task_id: string
run_id: string
module_id: string
activation_seq: number
artifact_id: string
artifact_ref: StorageRef
delivery_kind:
| "email"
| "imessage"
| "chat"
| "file_save"
| "webhook"
| "task_output"
| "calendar"
| "notification"
| "work_product_save"
| "doc73_library_add"
| "doc72_memory_reference"
status:
| "pending"
| "delivered"
| "failed"
| "blocked_by_policy"
| "cancelled"
| "draft_only"
destination_summary: string
delivered_to?: string[]
delivered_at?: string
external_message_id?: string
saved_path?: string
open_url?: string
receipt_ref?: StorageRef
policy_approval_ref?: StorageRef
schema_version: "1.0"
}
```
```ts
TaskDeliverablesReadModel {
task_id: string
run_id: string
final_outputs: TaskArtifactIndexEntry[]
intermediate_outputs: TaskArtifactIndexEntry[]
input_documents: TaskArtifactIndexEntry[]
receipts: TaskArtifactIndexEntry[]
deliveries: TaskDeliveryRecord[]
undelivered_outputs: TaskArtifactIndexEntry[]
failed_deliveries: TaskDeliveryRecord[]
schema_version: "1.0"
}
```
Illustrative UI:
```text
Final outputs
  Revised Complaint Draft.docx
    Produced by: Final Output module
    Saved to: /Marex/Drafts/Revised Complaint Draft.docx
    Delivered: emailed to will@will.com on May 4, 2026 at 11:31 AM
    [Open] [Preview] [Copy link] [Save as work product] [Add to library]
Intermediate outputs
  Damages Memo.md
    Produced by: Damages Memo module
    Used by: Complaint Drafter, Red-Team Reviewer
    Intake: kept with run; can make findable
    [Open] [Make findable] [Save as work product]
Inputs / consumed documents
  Complaint.pdf
    DOC25 processed
    Used by: Source Intake, Fact Extraction, Complaint Drafter, Judge
    [Open document] [View pages used]
Receipts
  Email delivery receipt
    To: will@will.com
    Sent: May 4, 2026 11:31 AM
    [Open receipt]
```
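The `undelivered_outputs` and `failed_deliveries` fields of `TaskDeliverablesReadModel` can be derived from the final outputs and delivery records. The sketch below assumes that an output counts as undelivered when no record for it has status `delivered`; that definition, and the names `deriveDeliveryViews`, `ArtifactLite`, and `DeliveryLite`, are illustrative assumptions.

```ts
// Trimmed copies of TaskArtifactIndexEntry and TaskDeliveryRecord for illustration.
interface ArtifactLite {
  artifact_id: string;
  display_name: string;
}

interface DeliveryLite {
  delivery_id: string;
  artifact_id: string;
  status: "pending" | "delivered" | "failed" | "blocked_by_policy" | "cancelled" | "draft_only";
}

interface DeliveryViews {
  undelivered_outputs: ArtifactLite[];
  failed_deliveries: DeliveryLite[];
}

function deriveDeliveryViews(
  finalOutputs: ArtifactLite[],
  deliveries: DeliveryLite[],
): DeliveryViews {
  // Collect the artifacts that have at least one successful delivery.
  const delivered: Record<string, boolean> = {};
  for (const record of deliveries) {
    if (record.status === "delivered") {
      delivered[record.artifact_id] = true;
    }
  }
  return {
    undelivered_outputs: finalOutputs.filter((artifact) => !delivered[artifact.artifact_id]),
    failed_deliveries: deliveries.filter((record) => record.status === "failed"),
  };
}
```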
### §20C.6 Artifact/document context menu
Right-clicking any document/artifact/output/receipt link SHOULD expose a consistent context menu when allowed by environment and permissions:
```text
Open in Q
Show in Browser
Show in Finder
Save As…
Copy Link
Open in Native App
Add to Library
Save as Work Product
Make Findable
Use in New Task
Ask Task Agent about this
```
Native actions must degrade honestly in non-Electron/web contexts or when OS permissions are unavailable.
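One reading of "degrade honestly" is that native actions stay visible but disabled, with a stated reason, when the host cannot perform them. The sketch below follows that reading. The choice of which actions are native, the `HostEnvironment` fields, and the reason strings are all assumptions.

```ts
type MenuAction =
  | "Open in Q"
  | "Show in Browser"
  | "Show in Finder"
  | "Save As…"
  | "Copy Link"
  | "Open in Native App"
  | "Add to Library"
  | "Save as Work Product"
  | "Make Findable"
  | "Use in New Task"
  | "Ask Task Agent about this";

// Hypothetical description of the host the menu is rendered in.
interface HostEnvironment {
  is_electron: boolean;
  os_permissions_granted: boolean;
}

interface MenuItemState {
  action: MenuAction;
  enabled: boolean;
  disabled_reason?: string;
}

// Assumption: only these two actions need native OS integration.
const NATIVE_ACTIONS: MenuAction[] = ["Show in Finder", "Open in Native App"];

function resolveMenu(actions: MenuAction[], env: HostEnvironment): MenuItemState[] {
  return actions.map((action): MenuItemState => {
    if (NATIVE_ACTIONS.indexOf(action) === -1) {
      return { action, enabled: true };
    }
    if (!env.is_electron) {
      return { action, enabled: false, disabled_reason: "Not available in the web app" };
    }
    if (!env.os_permissions_granted) {
      return { action, enabled: false, disabled_reason: "OS permission not granted" };
    }
    return { action, enabled: true };
  });
}
```

Returning a reason alongside the disabled flag lets the UI explain why an item is unavailable instead of silently failing when clicked.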
### §20C.7 Context & Audit inspector
Context & Audit is the power-user inspection section. It MUST expose what was injected into a module prompt and what was excluded.
```ts
TaskPromptContextInspectorReadModel {
task_id: string
run_id: string
module_id: string
activation_seq: number
effective_prompt_snapshot_ref: StorageRef
redacted_prompt_snapshot_ref?: StorageRef
prompt_layers: Array<{
layer_kind:
| "system"
| "task_blueprint"
| "environment_config"
| "module_instruction"
| "instruction_in"
| "doc24_context_packet"
| "data_in"
| "context_in"
| "chain_history"
| "attached_files"
display_name: string
storage_ref?: StorageRef
token_count?: number
}>
doc24_context_packet_ref?: StorageRef
injected_items: ContextInjectedItem[]
excluded_items: ContextExcludedItem[]
feedback_enabled: boolean
schema_version: "1.0"
}
```
The UI SHOULD show injected memories, entity cards, procedures, patterns, tools, connectors, MCPs, APIs, documents, source pages, DOC73 libraries, and excluded candidates with reason codes. It MUST support opening the source memory/document/procedure/tool record where policy permits.
### §20C.8 Context feedback into DOC8/BDSM
The context inspector SHALL allow thumbs/comment feedback on injected and excluded items.
```ts
TaskContextFeedbackEvent {
event_id: string
task_id: string
run_id: string
module_id: string
activation_seq: number
target_kind:
| "injected_memory"
| "injected_entity"
| "injected_procedure"
| "injected_tool"
| "injected_connector"
| "injected_document"
| "injected_library"
| "excluded_candidate"
| "prompt_layer"
| "context_packet"
target_ref: string
feedback:
| "helpful"
| "not_helpful"
| "wrong"
| "missing"
| "too_much"
| "should_have_been_included"
| "should_have_been_excluded"
user_comment?: string
created_at: string
schema_version: "1.0"
}
```
These events are learning inputs for DOC8/BDSM. They do not directly mutate DOC24 routing or DOC72 memory.
### §20C.9 Export audit
`Export audit` creates a redaction-aware audit bundle, not a raw log dump.
```ts
TaskAuditExportBundle {
export_id: string
task_id: string
run_id: string
included_sections: Array<
| "run_summary"
| "graph_snapshot"
| "blueprint_snapshot"
| "module_activation_list"
| "cost_duration_summary"
| "models_and_agents"
| "tool_connector_calls"
| "subagent_sessions"
| "prompt_snapshots"
| "doc24_context_packets"
| "input_output_artifact_manifest"
| "delivery_receipts"
| "policy_approvals"
| "errors_retries_fallbacks"
| "judge_experiment_results"
| "doc72_memory_summary_links"
| "raw_trace"
>
redaction_profile:
| "summary_only"
| "professional_internal"
| "external_redacted"
| "full_local_audit"
output_format: "folder" | "zip" | "markdown" | "pdf_packet"
export_ref?: StorageRef
policy_decision_ref?: PolicyDecisionRef
created_at: string
schema_version: "1.0"
}
```
Prompt snapshots, context packets, memory excerpts, local paths, privileged content, and raw traces must honor PropA/EC policy decisions and chosen redaction profile.
---
## §20D — Task Agent side panel and scoped Task Agent conversations
### §20D.1 One Task Agent, contextual side panel
Task Agent is the single canonical agent name. It appears as a side-panel chat on task-related surfaces, including the modular graph, module detail/config panel, Run Inspector, artifact detail, Tasks page, and prompt editor.
The side panel should look and behave like a Q chat panel, with attachments, send controls, message feedback/stars when those exist in the main chat design, and surface-scoped context. It must not become a separate app or second agent identity.
### §20D.2 Contextual actions pass refs, not separate agents
Buttons or context-menu actions such as `Ask about module`, `Ask about artifact`, or `Ask about this run` simply open/focus the Task Agent side panel with the relevant task/run/module/artifact refs attached. They do not invoke a separate agent or separate prompt system.
Predetermined suggestion chips inside the Task Agent panel SHOULD be sparse and action-oriented. They are appropriate when they launch a specific workflow such as:
```text
Deep review and suggestions
Run validation/test
Improve selected prompt
Generate Task Assessment
Start onboarding/intake for this task
```
Generic prefilled questions such as “Where are we?” are usually unnecessary because the user can type them.
```ts
TaskAgentPanelContext {
surface:
| "tasks_page"
| "task_graph"
| "run_inspector"
| "module_detail"
| "artifact_detail"
| "context_inspector"
| "prompt_editor"
task_id?: string
run_id?: string
module_id?: string
activation_seq?: number
artifact_id?: string
selected_entrypoint?: TaskAgentEntrypoint
attached_context_refs: StorageRef[]
schema_version: "1.0"
}
```
### §20D.3 Scoped Task Agent threads
Task Agent uses one visible side-panel UI but scoped conversation threads to avoid context bleed.
```ts
TaskAgentScopedThread {
thread_id: string
scope:
| "general_task_agent"
| "task_design"
| "task_run_inspection"
| "module_followup"
| "artifact_followup"
| "portfolio_review"
| "prompt_improvement"
task_id?: string
run_id?: string
module_id?: string
activation_seq?: number
artifact_id?: string
openclaw_session_key?: string
task_agent_entrypoint?: TaskAgentEntrypoint
created_at: string
updated_at: string
schema_version: "1.0"
}
```
The UI may feel like one Task Agent, but transcript storage and context packaging must be scoped to task/run/module/artifact where appropriate.
---
## §20E — Module continuation, follow-up, rerun, and fork-from-module
### §20E.1 Three distinct module-after-run actions
The system distinguishes:
1. **Ask Task Agent about the module** — always available; Task Agent answers from run truth, prompt/context snapshots, artifacts, tools, errors, and outputs.
2. **Start follow-up from module context** — usually available; starts a new session seeded with the module’s prompt, inputs, outputs, DOC24 packet, tool/capability snapshot, and run summary.
3. **Continue original module agent session** — available only when DOC11/OpenClaw runtime truth reports that the original module/session is resumable and policy permits.
The UI MUST NOT show a phantom “continue original session” control when the module was a one-shot call, the session was archived, or continuation is blocked by policy.
```ts
TaskModuleSessionRef {
task_id: string
run_id: string
module_id: string
activation_seq: number
agent_id?: string
openclaw_session_key?: string
openclaw_run_id?: string
child_session_key?: string
session_kind:
| "one_shot_gateway_call"
| "resumable_openclaw_session"
| "spawned_subagent_session"
| "forked_session"
| "archived_session"
| "unavailable"
continuation_status:
| "available"
| "archived"
| "expired"
| "not_supported"
| "blocked_by_policy"
| "unknown"
model_ref?: string
think_level?: string
agent_config_snapshot_ref?: StorageRef
effective_prompt_snapshot_ref?: StorageRef
doc24_context_packet_ref?: StorageRef
input_artifact_refs: StorageRef[]
output_artifact_refs: StorageRef[]
tool_call_trace_refs: StorageRef[]
subagent_trace_refs: StorageRef[]
created_at: string
completed_at?: string
archived_at?: string
schema_version: "1.0"
}
```
```ts
ModuleFollowupSession {
followup_session_id: string
source_task_id: string
source_run_id: string
source_module_id: string
source_activation_seq: number
mode:
| "continue_original_session"
| "new_session_from_module_context"
| "ask_task_agent_about_module"
context_seed_refs: StorageRef[]
user_question: string
output_ref?: StorageRef
linked_artifact_refs: StorageRef[]
durable_effect:
| "chat_only"
| "module_followup_artifact"
| "proposed_task_patch"
| "proposed_run_fork"
schema_version: "1.0"
}
```
Continuing/following up on a completed module MUST NOT mutate the completed parent run. New answers, revised memos, or revised artifacts become follow-up artifacts linked to the original module activation.
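The "no phantom control" rule can be expressed as a function from session truth to the follow-up modes the UI may offer. This is a simplified sketch: it treats "Start follow-up from module context" as always available, although the text says only "usually", and the list of non-resumable kinds is an assumption drawn from the examples in §20E.1.

```ts
type SessionKind =
  | "one_shot_gateway_call"
  | "resumable_openclaw_session"
  | "spawned_subagent_session"
  | "forked_session"
  | "archived_session"
  | "unavailable";

type ContinuationStatus =
  | "available"
  | "archived"
  | "expired"
  | "not_supported"
  | "blocked_by_policy"
  | "unknown";

type FollowupMode =
  | "continue_original_session"
  | "new_session_from_module_context"
  | "ask_task_agent_about_module";

// Trimmed copy of TaskModuleSessionRef for illustration.
interface SessionRefLite {
  session_kind: SessionKind;
  continuation_status: ContinuationStatus;
}

// Assumption: these kinds can never be continued, whatever the status says.
const NON_RESUMABLE_KINDS: SessionKind[] = [
  "one_shot_gateway_call",
  "archived_session",
  "unavailable",
];

function offeredFollowupModes(ref: SessionRefLite): FollowupMode[] {
  const modes: FollowupMode[] = [
    "ask_task_agent_about_module",
    "new_session_from_module_context",
  ];
  const kindAllows = NON_RESUMABLE_KINDS.indexOf(ref.session_kind) === -1;
  // Only runtime truth of "available" unlocks the continue control.
  if (kindAllows && ref.continuation_status === "available") {
    modes.push("continue_original_session");
  }
  return modes;
}
```

Treating `unknown` the same as unavailable keeps the control hidden until DOC11/OpenClaw runtime truth positively reports that the session is resumable.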
### §20E.2 Mid-run interaction limits
Mid-run inspection is allowed. Mid-run mutation is not presumed.
The safe default for a running module:
```text
Observe live trace: allowed.
Ask Task Agent about status: allowed.
Pause after current module: allowed where runtime supports.
Cancel module/run: allowed through existing run controls.
Inject new instruction into current LLM call: not allowed unless module/runtime explicitly supports intervention.
Add note for downstream module or future fork: allowed.
```
```ts
ModuleSessionIntervention {
intervention_id: string
task_id: string
run_id: string
module_id: string
activation_seq: number
intervention_kind:
| "user_note_for_downstream"
| "cancel_current_call"
| "pause_after_module"
| "inject_instruction_if_supported"
| "request_status_if_supported"
supported_by_runtime: boolean
applied: boolean
receipt_ref: StorageRef
schema_version: "1.0"
}
```
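The safe defaults listed above map onto `intervention_kind` roughly as follows. `isInterventionPermitted` is a hypothetical helper, and treating `request_status_if_supported` as runtime-gated is inferred from its name, not stated in the list.

```ts
type InterventionKind =
  | "user_note_for_downstream"
  | "cancel_current_call"
  | "pause_after_module"
  | "inject_instruction_if_supported"
  | "request_status_if_supported";

// Returns whether an intervention may be applied to a running module.
function isInterventionPermitted(kind: InterventionKind, supportedByRuntime: boolean): boolean {
  // Notes for downstream modules and cancellation through existing run controls
  // are allowed regardless of runtime intervention support.
  if (kind === "user_note_for_downstream" || kind === "cancel_current_call") {
    return true;
  }
  // Pause, instruction injection, and status requests depend on runtime support.
  return supportedByRuntime;
}
```

A caller would record the result in `ModuleSessionIntervention.applied` and set `supported_by_runtime` from the same runtime check.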
### §20E.3 Rerun/fork-from-module
Users must be able to work from a specific module/activation without rerunning a whole 10–20 module graph. Completed parent runs remain immutable. Reruns/forks create new child runs with lineage.
```ts
TaskRunForkRequest {
request_id: string
parent_task_id: string
parent_run_id: string
fork_from: {
module_id: string
activation_seq: number
output_port_id?: string
branch_scope?: "selected_branch" | "all_downstream" | "selected_modules"
}
rerun_scope:
| "selected_module_only"
| "selected_module_and_downstream"
| "downstream_only"
| "judge_only"
| "experiment_only"
| "branch_only"
| "full_run_from_checkpoint"
upstream_state_policy:
| "freeze_parent_outputs"
| "recompute_upstream"
| "use_current_task_inputs"
graph_policy:
| "use_parent_graph_snapshot"
| "use_current_task_definition"
| "use_user_edited_fork_graph"
context_policy:
| "reuse_prior_doc24_packet"
| "recompute_from_same_task_scope"
| "recompute_from_current_memory_and_capabilities"
source_document_policy:
| "use_parent_document_versions"
| "use_latest_document_versions"
| "ask_if_changed"
side_effect_replay_policy:
| "block_all_external_side_effects"
| "draft_only"
| "allow_with_confirmation"
| "allow_if_non_duplicate"
user_patch_refs: StorageRef[]
created_by: "user" | "task_agent" | "system_repair"
schema_version: "1.0"
}
```
Side-effect replay defaults to `block_all_external_side_effects` or `draft_only` for external sends, email, iMessage, calendar, webhooks, and file writes outside a safe run-scoped output folder unless the user explicitly authorizes replay.
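A sketch of the default described above: without explicit user authorization, any requested policy collapses to one of the two safe values. `effectiveReplayPolicy` and the `userAuthorizedReplay` flag are hypothetical, and the sketch ignores the exception for writes inside a safe run-scoped output folder.

```ts
type SideEffectReplayPolicy =
  | "block_all_external_side_effects"
  | "draft_only"
  | "allow_with_confirmation"
  | "allow_if_non_duplicate";

// Resolves the policy a fork actually runs under.
function effectiveReplayPolicy(
  requested: SideEffectReplayPolicy | undefined,
  userAuthorizedReplay: boolean,
): SideEffectReplayPolicy {
  // Explicit authorization lets the requested policy stand.
  if (userAuthorizedReplay && requested !== undefined) {
    return requested;
  }
  // Otherwise only the two safe defaults are reachable.
  return requested === "draft_only" ? "draft_only" : "block_all_external_side_effects";
}
```

For example, a Task Agent generated fork that requests `allow_with_confirmation` without user authorization would still run with `block_all_external_side_effects`.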
### §20E.4 ModuleActivationReplayRecord
```ts
ModuleActivationReplayRecord {
task_id: string
run_id: string
module_id: string
activation_seq: number
graph_snapshot_ref: StorageRef
module_config_snapshot_ref: StorageRef
input_bundle_ref: StorageRef
output_bundle_ref?: StorageRef
checkpoint_ref?: StorageRef
effective_prompt_snapshot_ref?: StorageRef
doc24_context_packet_ref?: StorageRef
capability_snapshot_ref?: StorageRef
policy_snapshot_ref?: StorageRef
model_runtime_snapshot_ref?: StorageRef
source_document_version_refs: StorageRef[]
artifact_refs: StorageRef[]
delivery_record_refs: StorageRef[]
error_refs: StorageRef[]
downstream_activation_refs: Array<{
module_id: string
activation_seq: number
}>
schema_version: "1.0"
}
```
`activation_seq` is required because loops, repeated activations, fan-out, and multi-output modules make `module_id` alone insufficient.
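To illustrate why `module_id` alone is not enough, the sketch below indexes activations by a composite key. `activationKey` and `indexActivations` are hypothetical helpers, and the `#` separator assumes module ids never contain that character.

```ts
// Trimmed copy of a per-activation record for illustration.
interface ActivationLite {
  module_id: string;
  activation_seq: number;
  output_summary: string;
}

// Composite key: one entry per activation, not per module.
function activationKey(moduleId: string, activationSeq: number): string {
  return moduleId + "#" + activationSeq;
}

function indexActivations(records: ActivationLite[]): Record<string, ActivationLite> {
  const index: Record<string, ActivationLite> = {};
  for (const record of records) {
    index[activationKey(record.module_id, record.activation_seq)] = record;
  }
  return index;
}
```

If the index were keyed by `module_id` only, the second loop iteration of a module would overwrite the first, and replay or fork requests could no longer address the earlier activation.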
### §20E.5 Judge and Experiment rerun UX
Judge and Experiment modules are first-class beneficiaries of rerun/fork mechanics.
Judge module detail actions SHOULD include:
```text
Open judge audit
Change rubric and rescore
Rerun judge only
Rerun with more source context
Compare judge runs
Generate prompt improvement proposal
```
Experiment module detail actions SHOULD include:
```text
Open comparison
Rerun variants
Add variant
Generate candidates
Test winner downstream
Promote winning prompt
Fork downstream from variant B
```
`Experiment variant output → downstream branch fork` is a required fork pattern. Example: if Variant B produced the best damages memo, the user may fork downstream from Variant B into the Complaint Drafter and Final Revision modules without rerunning all upstream intake modules.
---
## §20F — Task Segment reusable module groups
### §20F.1 Definition
A **Task Segment** is a saved reusable group of modules, cables, configs, prompts, optional local artifact/context policies, and optional local validation/evaluation expectations. It is not necessarily a full runnable task.
Task Segment replaces the placeholder term “Sub-Task” for reusable graph fragments, because “sub-task” implies an active child task/run.
Examples:
```text
Source classification + fact extraction segment
Complaint drafting first-pass segment
Red-team + revise segment
Experiment + Judge scoring segment
Email attachment intake + DOC25 processing segment
```
### §20F.2 Schema
```ts
TaskSegmentRecord {
segment_id: string
display_name: string
description?: string
source_task_id?: string
source_graph_snapshot_ref?: StorageRef
selected_module_ids: string[]
selected_cable_ids: string[]
module_records_ref: StorageRef
cable_records_ref: StorageRef
local_blueprint_step_refs?: string[]
required_input_ports: PortRef[]
exposed_output_ports: PortRef[]
module_preset_refs: string[]
capability_requirements: CapabilityNeed[]
artifact_policy_ref?: StorageRef
evaluation_expectations_ref?: StorageRef
semantic_card_ref?: StorageRef
usage_count: number
last_used_at?: string
avg_duration_ms?: number
avg_cost_usd?: number
lifecycle_state:
| "draft"
| "active"
| "deprecated"
| "archived"
schema_version: "1.0"
}
```
### §20F.3 UI behavior
Users may create a Task Segment by selecting multiple modules/cables and choosing:
```text
Save as Task Segment
```
from the toolbar or context menu. Browser and Tasks page should show Task Segments as reusable task assets distinct from full Saved Tasks and Module Presets. Dragging a Task Segment into a graph inserts the saved module group and prompts for required inputs/outputs and any missing capabilities.
Judge/Experiment modules may evaluate a Task Segment when it has sufficient input/output contracts for testing. Task Agent may recommend inserting a Task Segment during design, review, or repair.
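When a user saves a selection as a Task Segment, `selected_cable_ids`, `required_input_ports`, and `exposed_output_ports` can be derived from the graph. The sketch below assumes a simple cable shape with one source and one target port; `CableLite`, `PortRefLite`, and `deriveSegmentBoundary` are hypothetical, since `PortRef` and the cable record are defined elsewhere.

```ts
// Assumed cable shape: one source port feeding one target port.
interface CableLite {
  cable_id: string;
  from_module_id: string;
  from_port_id: string;
  to_module_id: string;
  to_port_id: string;
}

interface PortRefLite {
  module_id: string;
  port_id: string;
}

interface SegmentBoundary {
  selected_cable_ids: string[];
  required_input_ports: PortRefLite[];
  exposed_output_ports: PortRefLite[];
}

// Adds a port once, even when several cables touch it.
function pushUnique(ports: PortRefLite[], port: PortRefLite): void {
  const exists = ports.some(
    (p) => p.module_id === port.module_id && p.port_id === port.port_id,
  );
  if (!exists) {
    ports.push(port);
  }
}

function deriveSegmentBoundary(selectedModuleIds: string[], cables: CableLite[]): SegmentBoundary {
  const boundary: SegmentBoundary = {
    selected_cable_ids: [],
    required_input_ports: [],
    exposed_output_ports: [],
  };
  for (const cable of cables) {
    const fromInside = selectedModuleIds.indexOf(cable.from_module_id) !== -1;
    const toInside = selectedModuleIds.indexOf(cable.to_module_id) !== -1;
    if (fromInside && toInside) {
      // Cable fully inside the selection: saved with the segment.
      boundary.selected_cable_ids.push(cable.cable_id);
    } else if (toInside) {
      // Cable entering the selection: its target port becomes a required input.
      pushUnique(boundary.required_input_ports, { module_id: cable.to_module_id, port_id: cable.to_port_id });
    } else if (fromInside) {
      // Cable leaving the selection: its source port becomes an exposed output.
      pushUnique(boundary.exposed_output_ports, { module_id: cable.from_module_id, port_id: cable.from_port_id });
    }
  }
  return boundary;
}
```

The required inputs computed here are what the insertion flow would prompt the user to connect when the segment is dragged into another graph.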
---
## §20G — Tasks page command center and saved-task terminology
### §20G.1 Tasks page scope
The Tasks page is the command center for task assets and runs. It should support:
```text
[Active] [Scheduled] [Saved] [History] [Segments] [Presets]
```
Search and filters SHOULD include:
```text
Status
Matter/entity
Trigger type
Date range
Output type
Error status
Saved/reusable type
Template/source task
Capability/tool used
Judge/assessment score
```
### §20G.2 Operational stats for saved/reusable tasks and presets
Saved/reusable task rows SHOULD show run count, average time, average cost, success rate, last run, and common outputs.
```ts
SavedTaskOperationalStats {
task_id: string
display_name: string
run_count: number
success_count: number
failure_count: number
success_rate: number
avg_duration_ms: number
p50_duration_ms: number
p90_duration_ms: number
avg_cost_usd: number
p50_cost_usd: number
p90_cost_usd: number
last_run_at?: string
last_success_at?: string
last_failure_at?: string
common_output_kinds: string[]
common_failure_modes: string[]
latest_assessment_ref?: StorageRef
schema_version: "1.0"
}
```
```ts
ModulePresetOperationalStats {
preset_id: string
module_type: string
usage_count: number
tasks_used_in: number
templates_used_in: number
avg_module_duration_ms?: number
avg_module_cost_usd?: number
common_downstream_consumers: string[]
known_strengths: string[]
known_limitations: string[]
schema_version: "1.0"
}
```
Cost truth should reference DOC13 when DOC13 is available. DOC23 stores task/run/module cost snapshots and references, not a competing authoritative cost ledger.
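The duration and success fields of `SavedTaskOperationalStats` could be computed from per-run samples along these lines. The nearest-rank percentile method, the `RunSample` shape, and the zero values for an empty history are assumptions; the spec does not fix a percentile definition.

```ts
// Hypothetical per-run sample; real values would come from run snapshots (and DOC13 for cost).
interface RunSample {
  succeeded: boolean;
  duration_ms: number;
  cost_usd: number;
}

// Nearest-rank percentile over an unsorted list; returns 0 for an empty list.
function percentileNearestRank(values: number[], p: number): number {
  if (values.length === 0) {
    return 0;
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function mean(values: number[]): number {
  if (values.length === 0) {
    return 0;
  }
  let total = 0;
  for (const value of values) {
    total += value;
  }
  return total / values.length;
}

function summarizeRuns(runs: RunSample[]) {
  const durations = runs.map((run) => run.duration_ms);
  const successCount = runs.filter((run) => run.succeeded).length;
  return {
    run_count: runs.length,
    success_count: successCount,
    failure_count: runs.length - successCount,
    success_rate: runs.length === 0 ? 0 : successCount / runs.length,
    avg_duration_ms: mean(durations),
    p50_duration_ms: percentileNearestRank(durations, 50),
    p90_duration_ms: percentileNearestRank(durations, 90),
    avg_cost_usd: mean(runs.map((run) => run.cost_usd)),
  };
}
```

The cost percentiles (`p50_cost_usd`, `p90_cost_usd`) would follow the same pattern over the cost values.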
### §20G.3 Saved task vs task template
For user-facing language:
```text
Saved Task = any saved graph the user can run/edit/inspect.
Reusable Saved Task = a saved task intended to be reused like a template.
Scheduled Saved Task = a saved task with a schedule.
Triggered Saved Task = a saved task with an email/webhook/file/task trigger.
Module Preset = saved config for one module.
Task Segment = saved reusable group of modules/cables.
```
Internal `TaskTemplateRecord` storage may remain where useful, but the primary UI noun should be `Reusable Saved Task` unless the product deliberately exposes templates as an advanced subtype.
```ts
SavedTaskKind {
task_id: string
reusable: boolean
scheduled: boolean
trigger_enabled: boolean
archived: boolean
source_template_id?: string
user_facing_label:
| "Saved Task"
| "Reusable Saved Task"
| "Scheduled Saved Task"
| "Triggered Saved Task"
schema_version: "1.0"
}
```
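`user_facing_label` can be derived from the boolean flags. The precedence used below (scheduled, then triggered, then reusable) is an assumption for illustration; the spec lists the labels but does not say which wins when several flags are set.

```ts
type SavedTaskLabel =
  | "Saved Task"
  | "Reusable Saved Task"
  | "Scheduled Saved Task"
  | "Triggered Saved Task";

// Trimmed copy of SavedTaskKind for illustration.
interface SavedTaskFlags {
  reusable: boolean;
  scheduled: boolean;
  trigger_enabled: boolean;
}

// Assumed precedence: scheduled > triggered > reusable > plain.
function deriveUserFacingLabel(flags: SavedTaskFlags): SavedTaskLabel {
  if (flags.scheduled) {
    return "Scheduled Saved Task";
  }
  if (flags.trigger_enabled) {
    return "Triggered Saved Task";
  }
  if (flags.reusable) {
    return "Reusable Saved Task";
  }
  return "Saved Task";
}
```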
---
## §20H — Standalone shared task surface forward-compatibility
### §20H.1 Scope and deferral
Full shared task collaboration is not fully specified in this Addenda B revision. It is expected to belong to a future DOC50+ shared/networking spec family.
This addendum nevertheless imposes forward-compatible constraints so DOC23 task UI is not hard-wired to the full DOC20 shell.
### §20H.2 DOC50 owner split
Future shared/networking work SHOULD be owned by a DOC50+ family:
```text
DOC50 — Shared Surfaces, Links, and Collaboration Kernel
DOC51 — Team / Firm Identity, Roles, and Trust Domains
DOC52 — Collaborative Workspaces and Shared Task Execution
DOC53 — Sync, Conflict Resolution, and Multi-Client Networking
DOC54 — External Sharing, Audit, and Export Governance
```
DOC23 owns only task-specific shared manifests and task-run behavior. DOC50 should own the common sharing capsule, identity, permissions, revocation, access logging, and collaboration shell. PropA/EC policy governs exposure; DOC24 governs shared-session capability/context packets.
### §20H.3 Shell-independent task components
Task canvas, Run Inspector, Artifacts & Deliveries, module detail panels, and selected task summaries MUST be renderable without the full DOC20 Browser, right chat column, private workspace, or user’s ordinary Q shell.
Internal route and shared route should reuse task content components where possible:
```text
Internal: /tasks/:taskId?runId=:runId&view=graph
Shared: /share/task/:shareId
```
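Both route shapes can be produced by small builders so that internal and shared surfaces construct links the same way. `internalTaskRoute` and `sharedTaskRoute` are hypothetical helpers, and omitting `runId` when no run is selected is an assumption.

```ts
// Builds the internal route; runId is left out when no run is selected.
function internalTaskRoute(taskId: string, runId?: string): string {
  const base = "/tasks/" + encodeURIComponent(taskId);
  const query =
    runId === undefined
      ? "view=graph"
      : "runId=" + encodeURIComponent(runId) + "&view=graph";
  return base + "?" + query;
}

// Builds the shared route; the share id is the only identifier exposed.
function sharedTaskRoute(shareId: string): string {
  return "/share/task/" + encodeURIComponent(shareId);
}
```

Note that the shared route carries no task or run id, which is consistent with resolving everything through the share capsule.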
### §20H.4 TaskShareCapsule sketch
```ts
TaskShareCapsule {
share_id: string
task_id: string
task_version_ref: StorageRef
graph_snapshot_ref: StorageRef
optional_run_id?: string
shared_by_principal_id: string
recipient_principal_refs: PrincipalRef[]
permission:
| "view_only"
| "comment_only"
| "copy_task"
| "edit_proposal_only"
| "run_with_sender_approval"
| "run_within_limits"
| "edit_and_run_within_limits"
| "trusted_collaborator"
| "high_risk_warned_full_access"
share_mode:
| "task_design_only"
| "specific_run_summary"
| "specific_run_with_artifacts"
| "specific_artifacts_only"
| "collaborative_task_workspace"
| "audit_packet"
memory_policy: SharedTaskMemoryAccessPolicy
file_policy: SharedTaskFileAccessPolicy
output_policy: SharedTaskOutputPolicy
email_policy: SharedTaskEmailPolicy
capability_policy: SharedTaskCapabilityPolicy
agent_policy: SharedTaskAgentAccessPolicy
cost_policy: SharedTaskCostPolicy
approval_policy: SharedTaskApprovalPolicy
audit_policy: SharedTaskAuditPolicy
redaction_profile:
| "public_safe"
| "external_professional"
| "firm_collaborator"
| "trusted_internal"
| "full_access_warned"
expires_at?: string
revoked_at?: string
schema_version: "1.0"
}
```
### §20H.5 Access-control principles for shared task workspaces
A shared task workspace may eventually allow recipients to view, copy, propose edits, run with approval, run within limits, edit/run within limits, and use Task Agent within the granted scope. It must never grant implicit access to the sender’s full Q workspace.
Controls must be explicit for:
1. what memories are available;
2. what files/documents/folders/libraries are available;
3. where outputs can be saved;
4. whether outputs can be written to the sender’s DOC72 memory or DOC73 libraries;
5. whether email/iMessage/calendar/webhook delivery is allowed, draft-only, or sender-approved;
6. what APIs/connectors/MCPs/preconfigured agents are available;
7. what cost/run limits apply;
8. whether the sender must approve first runs, every run, graph edits, prompt edits, capability mounting, memory expansion, file expansion, external sends, and output saves.
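Item 8 can be read as a default-deny lookup: an action the sender has not explicitly decided on still requires approval. The sketch below illustrates that reading only. The shape of `SharedTaskApprovalPolicy` is not defined in this section, so `ApprovalGatedAction`, `ApprovalRequirements`, and `requiresSenderApproval` are hypothetical, and fail-closed behavior for unset entries is an assumption.

```ts
// Hypothetical action names derived from the approval list in item 8.
type ApprovalGatedAction =
  | "first_run"
  | "every_run"
  | "graph_edit"
  | "prompt_edit"
  | "capability_mount"
  | "memory_expansion"
  | "file_expansion"
  | "external_send"
  | "output_save";

// true = sender approval required, false = explicitly waived by the sender.
type ApprovalRequirements = Partial<Record<ApprovalGatedAction, boolean>>;

function requiresSenderApproval(
  requirements: ApprovalRequirements,
  action: ApprovalGatedAction
): boolean {
  // Assumption: an unset entry means "not explicitly decided", so fail closed.
  return requirements[action] ?? true;
}
```

Under this reading, a capsule that says nothing about `external_send` still gates it, which matches the rule that controls must be explicit.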
### §20H.6 Shared Task Agent access
Task Agent may be enabled in a shared task workspace only within the capsule’s policy.
```ts
SharedTaskAgentAccessPolicy {
  enabled: boolean
  allowed_entrypoints: Array<
    | "review_existing_task"
    | "design_task_patch"
    | "inspect_task_run"
    | "retrieve_task_output"
    | "explain_task_graph"
    | "assess_task"
    | "improve_task_prompt"
  >
  grounding_mode:
    | "shared_manifest_only"
    | "shared_manifest_plus_selected_memory"
    | "shared_manifest_plus_sender_approved_context"
  may_query_sender_private_memory: boolean
  may_query_unshared_task_history: boolean
  may_modify_task_directly: boolean
  may_propose_task_patch: boolean
  require_sender_approval_for_task_agent_generated_patch: boolean
  schema_version: "1.0"
}
```
The default shared-task Task Agent grounding mode is `shared_manifest_only`, with no private memory search, no unshared task-history access, no direct mutation, and proposal-only edits unless the sender grants stronger authority.
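The stated defaults could be captured in a single factory, sketched below under assumptions. The grounding mode and the four `may_*` values follow the default rule above. `enabled: false`, the empty `allowed_entrypoints`, and `require_sender_approval_for_task_agent_generated_patch: true` are conservative assumptions not stated in the source, and `defaultSharedTaskAgentAccessPolicy` is a hypothetical name.

```ts
type SharedTaskAgentEntrypoint =
  | "review_existing_task"
  | "design_task_patch"
  | "inspect_task_run"
  | "retrieve_task_output"
  | "explain_task_graph"
  | "assess_task"
  | "improve_task_prompt";

type SharedTaskAgentGroundingMode =
  | "shared_manifest_only"
  | "shared_manifest_plus_selected_memory"
  | "shared_manifest_plus_sender_approved_context";

interface SharedTaskAgentAccessPolicy {
  enabled: boolean;
  allowed_entrypoints: SharedTaskAgentEntrypoint[];
  grounding_mode: SharedTaskAgentGroundingMode;
  may_query_sender_private_memory: boolean;
  may_query_unshared_task_history: boolean;
  may_modify_task_directly: boolean;
  may_propose_task_patch: boolean;
  require_sender_approval_for_task_agent_generated_patch: boolean;
  schema_version: "1.0";
}

function defaultSharedTaskAgentAccessPolicy(): SharedTaskAgentAccessPolicy {
  return {
    enabled: false, // assumption: the source says only that Task Agent "may be enabled"
    allowed_entrypoints: [], // assumption: the sender opts in per entrypoint
    grounding_mode: "shared_manifest_only",
    may_query_sender_private_memory: false,
    may_query_unshared_task_history: false,
    may_modify_task_directly: false,
    may_propose_task_patch: true, // proposal-only edits
    require_sender_approval_for_task_agent_generated_patch: true, // assumption
    schema_version: "1.0",
  };
}
```

Starting every capsule from this factory and applying only explicit sender grants would keep stronger authority opt-in.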
### §20H.7 Shared task audit
All shared access, run, edit, Task Agent, capability, file, memory, and output actions must emit share-scoped audit events.
```ts
TaskShareAuditEvent {
  event_id: string
  share_id: string
  actor_principal_id: string
  event_kind:
    | "share_opened"
    | "task_copied"
    | "graph_viewed"
    | "graph_patch_proposed"
    | "graph_patch_applied"
    | "task_run_requested"
    | "task_run_approved"
    | "task_run_started"
    | "task_run_completed"
    | "artifact_opened"
    | "artifact_downloaded"
    | "file_access_requested"
    | "memory_access_requested"
    | "capability_mount_requested"
    | "email_draft_created"
    | "external_send_requested"
    | "external_send_approved"
    | "policy_blocked"
    | "share_revoked"
  target_ref?: string
  policy_decision_ref?: PolicyDecisionRef
  created_at: string
  schema_version: "1.0"
}
```
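A minimal sketch of building one such event follows. It assumes `created_at` is an ISO 8601 timestamp and treats `PolicyDecisionRef` as an opaque string placeholder, since its real definition lives outside this section. `ShareAuditInput` and `buildShareAuditEvent` are hypothetical names; ID generation is injected so the caller controls the scheme.

```ts
type PolicyDecisionRef = string; // placeholder only; the real type is defined elsewhere

type TaskShareAuditEventKind =
  | "share_opened" | "task_copied" | "graph_viewed"
  | "graph_patch_proposed" | "graph_patch_applied"
  | "task_run_requested" | "task_run_approved"
  | "task_run_started" | "task_run_completed"
  | "artifact_opened" | "artifact_downloaded"
  | "file_access_requested" | "memory_access_requested"
  | "capability_mount_requested" | "email_draft_created"
  | "external_send_requested" | "external_send_approved"
  | "policy_blocked" | "share_revoked";

interface TaskShareAuditEvent {
  event_id: string;
  share_id: string;
  actor_principal_id: string;
  event_kind: TaskShareAuditEventKind;
  target_ref?: string;
  policy_decision_ref?: PolicyDecisionRef;
  created_at: string;
  schema_version: "1.0";
}

interface ShareAuditInput {
  share_id: string;
  actor_principal_id: string;
  event_kind: TaskShareAuditEventKind;
  target_ref?: string;
  policy_decision_ref?: PolicyDecisionRef;
}

function buildShareAuditEvent(
  input: ShareAuditInput,
  newId: () => string, // injected ID generator (hypothetical)
  now: Date
): TaskShareAuditEvent {
  const auditEvent: TaskShareAuditEvent = {
    event_id: newId(),
    share_id: input.share_id, // every event stays scoped to one share
    actor_principal_id: input.actor_principal_id,
    event_kind: input.event_kind,
    created_at: now.toISOString(),
    schema_version: "1.0",
  };
  // Optional fields are omitted entirely rather than written as undefined.
  if (input.target_ref !== undefined) auditEvent.target_ref = input.target_ref;
  if (input.policy_decision_ref !== undefined) {
    auditEvent.policy_decision_ref = input.policy_decision_ref;
  }
  return auditEvent;
}
```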
Full shared task workspace behavior is deferred, but all DOC23 task UI components must remain permission-aware and shell-independent enough to support this later.
---
## §21 — Routes, commands, read models, and SSE
### §21.1 Route posture
Routes here are starting contracts for DOC23 R3.2+ incorporation. EC route registry remains canonical.
### §21.2 Proposed routes
| Route | Method | Purpose |
|---|---|---|
| `/api/tasks/agent/design-sessions` | POST | Start Task Agent design session. |
| `/api/tasks/agent/design-sessions/:id` | GET | Get design session state. |
| `/api/tasks/agent/design-sessions/:id/propose-blueprint` | POST | Generate/update blueprint proposal. |
| `/api/tasks/agent/design-sessions/:id/propose-graph` | POST | Generate graph proposal. |
| `/api/tasks/agent/design-sessions/:id/preflight` | POST | Run preflight. |
| `/api/tasks/:taskId/blueprint` | GET/PUT | Read/update blueprint via EC command. |
| `/api/tasks/:taskId/runs/:runId/lens` | GET | TaskAgentRunLens. |
| `/api/tasks/:taskId/runs/:runId/artifacts` | GET | Artifact index for run. |
| `/api/tasks/artifacts/search` | POST | Search artifacts across tasks/runs. |
| `/api/tasks/artifacts/:artifactId/promote` | POST | Request artifact intake/promotion. |
| `/api/tasks/:taskId/runs/:runId/assessment` | POST/GET | Request/read Task Assessment. |
| `/api/tasks/templates/search` | POST | Semantic template search. |
| `/api/tasks/presets/search` | POST | Semantic module preset search. |
| `/api/tasks/knowledge-pack/status` | GET | Active TaskSystemKnowledgePack status. |
| `/api/tasks/knowledge-pack/rebuild` | POST | Request rebuild through EC. |
| `/api/tasks/design-learning/proposals` | GET | Review improvement proposals. |
| `/api/tasks/design-learning/proposals/:id/apply` | POST | Apply approved proposal. |
### §21.3 Proposed commands
```ts
TaskAgentCommandType =
| "task.design_session.start"
| "task.design_session.update"
| "task.blueprint.create"
| "task.blueprint.update"
| "task.graph.proposal.create"
| "task.graph.diff.apply"
| "task.preflight.run"
| "task.assessment.request"
| "task.artifact.promote.request"
| "task.template.semantic_card.update"
| "task.preset.semantic_card.update"
| "task.template.update.propose"
| "task.preset.update.propose"
| "task.knowledge_pack.rebuild"
| "task.design_improvement.proposal.apply"
```
### §21.4 Proposed SSE events
```ts
TaskAgentSseEvent =
| "task.agent.design_session.started"
| "task.agent.template_matches.updated"
| "task.agent.blueprint.proposed"
| "task.agent.graph.proposed"
| "task.agent.preflight.completed"
| "task.agent.run_lens.updated"
| "task.artifact.index.updated"
| "task.artifact.promotion.requested"
| "task.assessment.started"
| "task.assessment.completed"
| "task.knowledge_pack.staged"
| "task.knowledge_pack.activated"
| "task.knowledge_pack.drift_detected"
| "task.design_learning.proposal.created"
| "task.design_learning.proposal.applied"
```
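A client consuming these events may want a runtime guard so that unknown event names are ignored, not mis-dispatched. The sketch below assumes the names travel in the standard SSE `event:` field, which this section does not specify. `TASK_AGENT_SSE_EVENT_NAMES`, `isTaskAgentSseEvent`, and `parseSseEventName` are hypothetical helpers.

```ts
const TASK_AGENT_SSE_EVENT_NAMES = [
  "task.agent.design_session.started",
  "task.agent.template_matches.updated",
  "task.agent.blueprint.proposed",
  "task.agent.graph.proposed",
  "task.agent.preflight.completed",
  "task.agent.run_lens.updated",
  "task.artifact.index.updated",
  "task.artifact.promotion.requested",
  "task.assessment.started",
  "task.assessment.completed",
  "task.knowledge_pack.staged",
  "task.knowledge_pack.activated",
  "task.knowledge_pack.drift_detected",
  "task.design_learning.proposal.created",
  "task.design_learning.proposal.applied",
] as const;

type TaskAgentSseEvent = (typeof TASK_AGENT_SSE_EVENT_NAMES)[number];

// Runtime guard that narrows an arbitrary string to a known event name.
function isTaskAgentSseEvent(name: string): name is TaskAgentSseEvent {
  return (TASK_AGENT_SSE_EVENT_NAMES as readonly string[]).indexOf(name) !== -1;
}

// Assumption: one frame carries the event name on an "event:" line.
function parseSseEventName(frame: string): TaskAgentSseEvent | null {
  const lines = frame.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.indexOf("event:") === 0) {
      const name = line.slice("event:".length).trim();
      return isTaskAgentSseEvent(name) ? name : null;
    }
  }
  return null; // frame had no event field
}
```

Deriving the union from the array keeps the runtime list and the compile-time type in one place.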
---
## §21A — Routes, commands, read models, and events to add
### §21A.1 Routes / commands
```text
POST /api/tasks/mode-decision
GET /api/tasks/opportunity-packet
POST /api/tasks/task-agent/consult
POST /api/tasks/task-agent/design
POST /api/tasks/task-agent/adapt-template
POST /api/tasks/task-agent/review
POST /api/tasks/task-agent/inspect-run
POST /api/tasks/task-agent/retrieve-output
POST /api/tasks/task-agent/assess
POST /api/tasks/task-agent/portfolio-assess
POST /api/tasks/task-agent/improve-prompt
POST /api/tasks/task-agent/test-prompt
POST /api/tasks/prompt-evaluation-jobs
GET /api/tasks/prompt-evaluation-jobs/:jobId
POST /api/tasks/invocation-feedback
POST /api/tasks/proposal-edit-trace
GET /api/tasks/knowledge-pack/status
POST /api/tasks/knowledge-pack/rebuild
```
Route names are illustrative; final paths must conform to DOC23 route conventions and EC command-envelope standards.
### §21A.2 SSE / events
```text
task.mode.decision_recorded
task.opportunity.packet_ready
task.agent.consult_started
task.agent.consult_completed
task.agent.design_started
task.agent.design_completed
task.agent.assessment_started
task.agent.assessment_completed
task.agent.degraded
task.agent.fallback_model_used
task.suggestion.feedback_recorded
task.invocation.learning_signal_recorded
task.proposal_edit.trace_recorded
task.prompt.quality_assessed
task.prompt.candidates_generated
task.prompt.evaluation_started
task.prompt.evaluation_completed
task.knowledge_pack.stale
task.knowledge_pack.activated
```
### §21A.3 Read models
```text
TaskAgentStatusReadModel
TaskModeDecisionReadModel
TaskOpportunityPacketReadModel
TaskInvocationDirectiveReadModel
TaskSuggestionFeedbackReadModel
TaskAgentDesignSessionReadModel
TaskAgentProposalEditTraceReadModel
TaskPortfolioAssessmentReadModel
TaskPromptQualityReadModel
PromptEvaluationJobReadModel
TaskKnowledgePackReadinessReadModel
TaskModuleContextPacketReadModel
TaskModuleContextExclusionReadModel
```
---
## §21B — Additional routes, commands, read models, and events for R0.6.4 UI/run-control additions
### §21B.1 Additional routes / commands
Route names are illustrative until incorporated into DOC23 route conventions and EC registry.
```text
GET /api/tasks/:taskId/runs/:runId/inspector
GET /api/tasks/:taskId/runs/:runId/flow
GET /api/tasks/:taskId/runs/:runId/deliverables
GET /api/tasks/:taskId/runs/:runId/context-audit/:moduleId/:activationSeq
POST /api/tasks/:taskId/runs/:runId/context-feedback
GET /api/tasks/:taskId/runs/:runId/modules/:moduleId/:activationSeq/quick-access
POST /api/tasks/:taskId/runs/:runId/export-audit
POST /api/tasks/:taskId/runs/:runId/fork
GET /api/tasks/:taskId/runs/:runId/forks
POST /api/tasks/:taskId/runs/:runId/modules/:moduleId/:activationSeq/followup
POST /api/tasks/:taskId/runs/:runId/modules/:moduleId/:activationSeq/continue-session
POST /api/tasks/:taskId/runs/:runId/modules/:moduleId/:activationSeq/intervene
POST /api/tasks/segments
GET /api/tasks/segments/:segmentId
POST /api/tasks/segments/:segmentId/insert
POST /api/tasks/:taskId/segments/save-selection
GET /api/tasks/stats/saved-tasks
GET /api/tasks/stats/presets
GET /api/tasks/:taskId/navigation-state
PUT /api/tasks/:taskId/navigation-state
POST /api/tasks/share-capsules // deferred / DOC50-owned when available
GET /api/tasks/share-capsules/:shareId/manifest
```
### §21B.2 Additional command types
```ts
TaskUiAndRunControlCommandType =
| "task.run_inspector.open"
| "task.navigation_state.update"
| "task.context_feedback.record"
| "task.audit_export.request"
| "task.run.fork.request"
| "task.module.followup.start"
| "task.module.session.continue"
| "task.module.intervention.request"
| "task.segment.create"
| "task.segment.insert"
| "task.segment.update"
| "task.segment.archive"
| "task.share_capsule.create" // future DOC50 integration
| "task.share_capsule.revoke" // future DOC50 integration
```
### §21B.3 Additional read models
```text
TaskGraphInspectorNavigationStateReadModel
ModuleRunQuickAccessReadModel
TaskRunInspectorReadModel
TaskRunStepRowReadModel
TaskRunNarrativeStatusReadModel
TaskDeliverablesReadModel
TaskDeliveryRecordReadModel
TaskPromptContextInspectorReadModel
TaskAuditExportBundleReadModel
TaskModuleSessionRefReadModel
ModuleFollowupSessionReadModel
TaskRunForkRequestReadModel
TaskRunForkHistoryReadModel
ModuleActivationReplayRecordReadModel
TaskSegmentReadModel
SavedTaskOperationalStatsReadModel
ModulePresetOperationalStatsReadModel
TaskShareCapsuleReadModel // future DOC50 integration
SharedTaskSurfaceManifestReadModel // future DOC50 integration
```
### §21B.4 Additional SSE/events
```text
task.ui.navigation_state_updated
task.run.inspector_updated
task.run.flow_updated
task.run.narrative_status_updated
task.module.quick_access_updated
task.delivery.recorded
task.deliverables.updated
task.context.feedback_recorded
task.audit_export.started
task.audit_export.completed
task.module.followup.started
task.module.session.continuation_available
task.module.session.continuation_unavailable
task.module.intervention.recorded
task.run.fork_requested
task.run.fork_created
task.run.fork_completed
task.segment.created
task.segment.inserted
task.segment.updated
task.saved_task.stats_updated
task.preset.stats_updated
task.share_capsule.created // future DOC50 integration
task.share_capsule.revoked // future DOC50 integration
task.share.audit_event_recorded // future DOC50 integration
```
---
## §22 — Storage paths
### §22.1 Proposed paths
```ts
TaskAddendaBPaths = {
taskBlueprint: "ELNOR_MEMORY/tasks/{task_id}/blueprint.json",
taskBlueprintVersions: "ELNOR_MEMORY/tasks/{task_id}/blueprints/{blueprint_version}.json",
taskDesignSessions: "ELNOR_MEMORY/system/task_design/sessions/{design_session_id}.json",
taskDesignSessionEvents: "ELNOR_MEMORY/system/task_design/sessions/{design_session_id}/events.jsonl",
taskTraceSpans: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/trace_spans.jsonl",
taskPromptSnapshots: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/prompt_snapshots/",
taskCapabilitySnapshots: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/capability_snapshots/",
taskModuleContextPackets: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/context_packets/{module_id}/{activation_seq}.json",
taskPolicySnapshots: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/policy_snapshots/",
taskArtifactIndex: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/artifact_index.json",
taskAssessments: "ELNOR_MEMORY/tasks/{task_id}/assessments/{assessment_id}.json",
templateSemanticCards: "ELNOR_MEMORY/system/task_system/template_semantic_cards/{template_id}.json",
modulePresetSemanticCards: "ELNOR_MEMORY/system/task_system/module_preset_semantic_cards/{preset_id}.json",
taskDesignCases: "ELNOR_MEMORY/system/task_design/cases/{case_id}.json",
taskDesignPatterns: "ELNOR_MEMORY/system/task_design/patterns/{pattern_id}.json",
taskDesignImprovementProposals: "ELNOR_MEMORY/system/task_design/improvement_proposals/{proposal_id}.json",
taskKnowledgePacks: "ELNOR_MEMORY/system/task_agent/knowledge_packs/{pack_id}/",
activeTaskKnowledgePackPointer: "ELNOR_MEMORY/system/task_agent/active_knowledge_pack.json",
taskKnowledgeCompilerRuns: "ELNOR_MEMORY/system/task_agent/compiler_runs/{compiler_run_id}.json",
taskAgentEvalSuites: "ELNOR_MEMORY/system/task_agent/evals/{suite_id}.json",
taskAgentEvalRuns: "ELNOR_MEMORY/system/task_agent/eval_runs/{eval_run_id}.json",
taskDeliveries: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/deliveries.jsonl",
taskRunNarrativeStatus: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/narrative_status.json",
taskRunInspectorReadModel: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/inspector_read_model.json",
taskModuleQuickAccess: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/module_quick_access/{module_id}/{activation_seq}.json",
taskPromptContextInspectors: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/prompt_context_inspectors/{module_id}/{activation_seq}.json",
taskContextFeedbackEvents: "ELNOR_MEMORY/system/task_feedback/context_feedback.jsonl",
taskAuditExports: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/audit_exports/{export_id}/",
taskModuleSessionRefs: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/module_sessions/{module_id}/{activation_seq}.json",
moduleFollowupSessions: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/module_followups/{followup_session_id}.json",
taskRunForkRequests: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/fork_requests/{request_id}.json",
taskRunForks: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/forks/{child_run_id}.json",
moduleActivationReplayRecords: "ELNOR_MEMORY/tasks/{task_id}/runs/{run_id}/replay_records/{module_id}/{activation_seq}.json",
taskSegments: "ELNOR_MEMORY/system/task_system/segments/{segment_id}.json",
taskSegmentSemanticCards: "ELNOR_MEMORY/system/task_system/segment_semantic_cards/{segment_id}.json",
savedTaskOperationalStats: "ELNOR_MEMORY/system/task_system/stats/saved_tasks/{task_id}.json",
modulePresetOperationalStats: "ELNOR_MEMORY/system/task_system/stats/module_presets/{preset_id}.json",
taskShareCapsules: "ELNOR_MEMORY/system/sharing/task_capsules/{share_id}.json",
taskShareManifests: "ELNOR_MEMORY/system/sharing/task_manifests/{share_id}.json",
taskShareAuditEvents: "ELNOR_MEMORY/system/sharing/task_audit_events.jsonl"
}
```
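The `{placeholder}` segments in these templates need to be filled before use. The following is a minimal sketch of one way to do that, assuming placeholders are lowercase snake_case names in braces as in the table above. Rejecting path separators and dot-segments in IDs is an added safety assumption, and `resolveStoragePath` is a hypothetical helper, not an existing EC API.

```ts
type PathParams = { [key: string]: string | undefined };

function resolveStoragePath(template: string, params: PathParams): string {
  return template.replace(/\{([a-z_]+)\}/g, (_match: string, key: string): string => {
    const value = params[key];
    if (value === undefined || value.length === 0) {
      throw new Error(`missing path parameter: ${key}`);
    }
    // Assumption: IDs must not contain separators or dot-segments, so a
    // caller-supplied ID cannot escape its intended directory.
    if (
      value.indexOf("/") !== -1 ||
      value.indexOf("\\") !== -1 ||
      value === "." ||
      value === ".."
    ) {
      throw new Error(`unsafe path parameter: ${key}`);
    }
    return value;
  });
}
```

For example, resolving the `taskArtifactIndex` template with `task_id` `t1` and `run_id` `r2` yields `ELNOR_MEMORY/tasks/t1/runs/r2/artifact_index.json`.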
### §22.2 Storage classification
| Artifact | Canonical/derived | Owner |
|---|---|---|
| Task Blueprint | canonical DOC23 task metadata | DOC23/EC |
| TaskTraceSpan | operational truth/log | DOC23/EC |
| TaskArtifactIndex | run read model | DOC23/EC |
| Task Assessment | derived assessment artifact | DOC23/EC |
| Semantic cards | projection/derived unless user-authored fields | DOC23 + DOC72 |
| Knowledge Pack | generated/derived active resource | DOC23/EC |
| Learning signals | append-only learning inputs | DOC8/BDSM + EC |
| Delivery records | append-only/run-indexed receipts | DOC23/EC + output owner docs |
| Context feedback events | append-only learning inputs | DOC8/BDSM + EC |
| Module session refs | runtime truth bridge/read model | DOC23 consumes DOC11/OpenClaw |
| Run fork requests/replay records | canonical child-run lineage inputs | DOC23/EC |
| Task Segments | canonical reusable task asset | DOC23/EC |
| Share capsules/manifests | future shared-surface permission artifact | DOC50/EC/PropA, DOC23 task-specific manifest |
---
## §23 — Validation codes
### §23.1 Proposed validation errors
```text
validation.task_agent_template_search_required
validation.blueprint_missing_required_output
validation.blueprint_graph_mismatch
validation.blueprint_side_effect_undeclared
validation.capability_required_unavailable
validation.capability_denied_but_requested
validation.runtime_expansion_outside_allowlist
validation.artifact_policy_endpoint_ambiguous
validation.doc25_required_but_unavailable
validation.doc73_library_missing
validation.task_memory_policy_conflict
validation.incognito_blocks_requested_memory
validation.prompt_snapshot_required_missing
validation.task_knowledge_pack_stale_blocking
validation.template_update_requires_approval
validation.preset_update_requires_approval
validation.graph_proposal_not_preflighted
validation.task_assessment_input_trace_missing
validation.task_module_context_packet_required
validation.task_module_context_active_context_unjustified
validation.task_module_context_policy_snapshot_missing
validation.task_module_context_source_visibility_violation
validation.task_module_context_profile_overrides_denied_capability
validation.module_run_quick_access_missing
validation.task_delivery_record_missing_for_delivered_output
validation.task_context_feedback_target_missing
validation.task_run_fork_requires_activation_seq
validation.task_run_fork_side_effect_replay_policy_required
validation.module_session_continuation_phantom_control
validation.task_segment_missing_exposed_io
validation.task_share_capsule_policy_missing
validation.task_shared_surface_shell_dependency
```
### §23.2 Proposed warnings
```text
warning.template_match_available_but_new_graph_requested
warning.intermediate_output_likely_high_value
warning.tool_pack_too_broad
warning.capability_historically_unused
warning.capability_historically_unreliable
warning.second_pass_deepening_recommended
warning.human_gate_recommended_before_side_effect
warning.final_output_not_saved_as_work_product
warning.blueprint_goal_unlinked
warning.task_design_case_candidate
warning.knowledge_pack_stale_nonblocking
warning.task_module_context_active_context_excluded
warning.task_module_context_broad_scope_declared
warning.task_module_context_relevant_memory_trimmed
warning.task_module_context_library_candidate_excluded
warning.task_module_context_prior_task_candidate_excluded
warning.run_inspector_timeline_linearization_loss
warning.module_output_link_missing
warning.task_agent_panel_unscoped_thread
warning.task_segment_possible_duplicate
warning.audit_export_redaction_required
warning.shared_task_feature_deferred_doc50
```
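The two code families suggest a simple preflight gate, sketched below under assumptions: `validation.*` codes block, `warning.*` codes are advisory, and any code with an unrecognized prefix is treated as blocking. This section does not state the blocking rule outright, although the paired codes `validation.task_knowledge_pack_stale_blocking` and `warning.knowledge_pack_stale_nonblocking` point in that direction. `PreflightFindings`, `classifyFindings`, and `mayProceed` are hypothetical names.

```ts
interface PreflightFindings {
  blocking: string[];
  advisory: string[];
}

function classifyFindings(codes: string[]): PreflightFindings {
  const findings: PreflightFindings = { blocking: [], advisory: [] };
  for (let i = 0; i < codes.length; i++) {
    const code = codes[i];
    if (code.indexOf("warning.") === 0) {
      findings.advisory.push(code);
    } else {
      // validation.* codes and, by assumption, anything unrecognized.
      findings.blocking.push(code);
    }
  }
  return findings;
}

function mayProceed(findings: PreflightFindings): boolean {
  return findings.blocking.length === 0;
}
```

Advisory codes would still be surfaced to the user; they simply do not stop the graph proposal from being applied.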
---
## §24 — Cross-doc obligations
### §24.1 DOC72 obligations
DOC72 must support or confirm:
1. `execution_trace` payloads for DOC23 TaskRun activity memory.
2. Task/matter/entity/artifact/work-product/followup-session relationship patterns.
3. Task-design `goal` payload/annotations and goal-link relations.
4. Semantic projection support for task templates, module presets, design cases, and design patterns.
5. Work-product links from task outputs.
6. Provenance discipline for any task-derived durable memory.
### §24.2 DOC24 obligations
DOC24 must support or confirm:
1. Task Agent as registered system agent/capability.
2. Capability registry lookup for task modules and Task Agent.
3. Runtime capability expansion receipts.
4. DOC24 packet snapshots for task module dispatch.
5. Rendering/injection of Task Design Intelligence cards.
6. MCP/connector/procedure/model availability surfaces.
7. Tool/procedure/capability utility bundle consumption from BDSM.
8. Task-module DOC24 context packet assembly that is task-scoped, module-scoped, profile-gated, policy-gated, and receipt-backed.
9. Active chat/work context exclusion unless explicitly inherited or high-confidence bound by task-owned evidence.
10. Context packet exclusion receipts for Run Inspector, Task Assessment, and Task Agent operational lens.
### §24.3 DOC25 obligations
DOC25 must support:
1. Task-produced document ingestion.
2. Task-downloaded document ingestion.
3. Task-received attachment ingestion.
4. Task final work-product ingestion.
5. DOC25_IngestionResult references back to TaskArtifactIndex.
6. Quality/degraded-state reporting visible in Run Inspector and Task Assessment.
### §24.4 DOC73 obligations
DOC73 must support:
1. Task-output-to-library/corpus source class.
2. Library/corpus binding from DOC23 artifact promotion.
3. Extraction profile selection from task/template/blueprint.
4. User-facing “library” terminology in task UI.
5. Receipts linking DOC23 artifact -> DOC25 ingestion -> DOC73 library member.
### §24.5 DOC8/BDSM obligations
DOC8/BDSM must support:
1. TaskCreationSessionTrace-derived learning signals.
2. Task Agent question utility.
3. Template/preset suggestion utility.
4. Capability utility in task context classes.
5. Artifact policy utility.
6. Task design pattern utility.
7. Task Assessment learning inputs.
8. Compiled bundles consumable by Task Agent and DOC24.
### §24.6 EC Core obligations
EC must support:
1. Task Agent command registration.
2. Knowledge Pack compiler jobs and activation pointer.
3. Route/read model closure.
4. Storage paths.
5. Atomic writes and rollback/proposal ledgers.
6. Nightly/ongoing extraction jobs.
7. Incognito/effective runtime enforcement.
8. Task run telemetry write paths.
9. Drift detection and eval gating.
### §24.7 DOC3 obligations
DOC3 must support:
1. Semantic procedure discovery by Task Agent through DOC24/DOC72.
2. Procedure use/outcome links to TaskRun memory.
3. Distinction between DOC3 procedures and DOC23 task templates/presets.
4. Optional task-design learning analogies without collapsing task templates into DOC3 procedures.
### §24.8 DOC20/DOC21/DOC22 obligations
UI/content docs must register:
- Task Agent Create surface;
- Blueprint editor;
- Template Match panel;
- Task Agent Run Lens;
- Artifact Finder;
- Task Assessment page/panel;
- Knowledge Pack inspector;
- Task Design Learning Review Queue;
- task modular page as DOC20 `task` tab;
- Browser task filter/default behavior;
- graph overview preservation rule;
- ModuleRunQuickAccess block in every module detail/config panel;
- Graph ⇄ Run Inspector navigation and state synchronization;
- Run Inspector with Header, Run Flow & Steps, Artifacts & Deliveries, and Context & Audit;
- artifact/document context menu and native-app/Finder actions where available;
- Context Inspector with DOC24 injected/excluded context and feedback controls;
- Task Agent side-panel chat with full chat treatment and scoped context;
- Tasks page command center with active/scheduled/saved/history/segments/presets views;
- Task Segment asset type in Browser/Tasks surfaces;
- saved/reusable task operational stats;
- shell-independent task content components for future shared-task rendering.
### §24.9 OP-A obligation tracker
All accepted cross-doc obligations from this addendum must be added to OP-A during the next OP-A maintenance pass.
---
## §24A — Cross-doc obligations to add to OP-A
### §24A.1 DOC24 obligations
1. Add Task Mode Resolver prefilter and `TaskModeDecision` generation.
2. Add `TaskOpportunityPacket` runtime packet lane.
3. Register Task Agent as an invokable agent capability.
4. Register Task Agent entrypoints in capability registry.
5. Support compact top-k task template / module preset / invocation directive injection.
6. Enforce no full TKP injection into ordinary chat.
7. Route TaskInvocationDirective cards through DOC24 delivery architecture.
8. Feed task suggestion feedback into BDSM/DOC8.
9. Expose model/capability/tool availability to Task Agent through live registry and packet assembly.
10. Add prompt-improvement routing so DOC17 Prompt Advisor service and Task Agent do not conflict.
11. Add TaskModuleContextPacket assembly and receipt support for task modules.
12. Enforce task-context isolation: active chat/work context is candidate evidence, not automatic task-module context.
13. Add library/document/source binding gates so DOC73/DOC25 context appears in task modules only when task-bound or explicitly selected.
### §24A.2 DOC11 / OpenClaw obligations
1. Expose Task Agent named-agent runtime truth if registered.
2. Expose model catalog, fallback chain, auth, and provider availability for Task Agent runtime profile.
3. Expose sub-agent settings/status/events for Task Agent background design sessions.
4. Support isolated/fork context mode truth for Task Agent context packaging.
5. Ensure visible Task Agent controls map to actual routes/read models or degraded states.
6. Distinguish OpenClaw heartbeat/session monitoring from saved-task process monitoring.
### §24A.3 DOC72 obligations
1. Store task runs as `execution_trace` hubs linked to entities, matters, artifacts, work products, templates, goals, directives, and follow-up sessions.
2. Support TaskInvocationDirective as memory_directive or standing_procedure payload depending on lifecycle/governance.
3. Support Task Design Goals as `goal` nodes and edge links from blueprints, templates, design decisions, patterns, outputs, and assessments.
4. Store semantic projections/cards for task templates and module presets without duplicating executable graph truth.
5. Support task-design casebook and prior-task rationale links.
### §24A.4 DOC3 obligations
1. Preserve boundary between DOC3 semantic skills/procedures and DOC23 saved task graphs.
2. Allow TaskInvocationDirective to reference DOC3 procedures without materializing task graphs as SKILL.md.
3. Ensure DOC24 delivery of graph-backed procedures remains direct injection from DOC72 contracts.
### §24A.5 DOC17 obligations
1. Reframe Prompt Advisor as lightweight prompt-analysis/rewrite service, not a separate agent.
2. Keep Prompt Lab as backend/offline bridge only if retained; no separate primary user-facing prompt lab required.
3. Allow Task Agent to call Prompt Advisor service in task-context prompt improvement.
4. Keep prompt recipes DOC17-owned while allowing Task Agent to test/use/convert them in task context.
### §24A.6 DOC23 Addenda A obligations
1. Extend Experiment to accept prompt-candidate sources or bundles.
2. Reserve DSPy/GEPA prompt optimization until R5 substrate is implemented.
3. Integrate PromptEvaluationTask with Experiment/Judge/Claim Extractor and Task Assessment.
4. Ensure prompt promotion uses promotion ledger, hashes, datasets, rollback, and post-promotion monitoring when operative.
### §24A.7 DOC8/BDSM obligations
1. Add task invocation utility signals.
2. Add task suggestion accepted/rejected/ignored events.
3. Add Task Agent design utility signals.
4. Add TaskAgentProposalEditTrace learning consumption.
5. Add prompt-edit and prompt-evaluation learning signals.
6. Compile task suggestion suppression/boost policies by context class.
### §24A.8 EC Core obligations
1. Add Task Agent system-agent identity registry entry.
2. Add Task Agent runtime profile storage and settings controls.
3. Add EC command routes for Task Agent proposals, TKP rebuilds, TaskModeDecision receipts, and prompt-evaluation tasks.
4. Enforce visible receipt for Elnor-created saved tasks.
5. Enforce no hidden graph runs.
6. Maintain TKP active/staged/rejected state and drift detection.
### §24A.9 DOC25 / DOC73 obligations
1. Route produced, downloaded, received, or task-generated documents through DOC25 ingestion when artifact policy requires.
2. DOC73 libraries/corpora remain destinations for deep bounded knowledge; TKP is not a normal user-facing DOC73 library.
3. Prompt-evaluation tasks producing documents/work products route outputs through DOC25/DOC73 per artifact policy.
### §24A.10 DOC20 / DOC21 / DOC22 obligations
1. Render Task Agent panels in task editor, Run Inspector, task list, templates/presets, and prompt editor.
2. Render Task Agent settings.
3. Render task opportunity chips and suppression states.
4. Render prompt-improvement levels without exposing duplicate Prompt Lab surface.
5. Render TKP readiness/drift diagnostics for advanced/admin users.
### §24A.11 DOC20 / DOC21 / DOC22 UI obligations from R0.6.4
1. Task modular canvas opens as a DOC20 workspace `task` tab with right chat column closed by default.
2. Browser defaults to Task filter when Task page is opened through Pages navigation.
3. Browser drag/drop supports saved tasks, modules, documents, artifacts, and Task Segments into allowed graph/drop targets.
4. Module detail/config panels include ModuleRunQuickAccess with Open Output and Open Run Inspector.
5. Output/artifact clicks open the appropriate DOC20 viewer/tab and preserve task-tab navigation.
6. Graph ⇄ Run Inspector navigation preserves selected task/run/module/artifact state.
7. Run Inspector is consolidated into Header, Run Flow & Steps, Artifacts & Deliveries, and Context & Audit.
8. Artifact context menus include Show in Finder, Show in Browser, Save As, Open in Q, Open in Native App, Copy Link, Add to Library, Save as Work Product, Make Findable, Use in New Task, and Ask Task Agent where supported.
9. Task Agent side panel receives full chat treatment and scoped context refs.
10. Task page supports active/scheduled/saved/history/segments/presets command-center views and operational stats.
11. DOC21/22 component registry must include Task Segment, Run Flow & Steps, Artifacts & Deliveries, Context Inspector, Task Agent Panel Context, and ModuleRunQuickAccess components.
### §24A.12 DOC11 / OpenClaw obligations from R0.6.4
1. Expose whether a task module activation used a one-shot Gateway call, resumable OpenClaw session, spawned sub-agent session, forked session, archived session, or unavailable session.
2. Expose continuation availability/status for module agent sessions.
3. Expose child session keys, run IDs, model refs, think levels, fallback usage, and session archive/expiration status needed for `TaskModuleSessionRef`.
4. Support no-phantom continuation controls: if continuation is unavailable, Q must receive a reason code.
5. Surface sub-agent/session traces for Run Inspector and Task Agent without making DOC23 the OpenClaw runtime owner.
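Items 1, 2, and 4 can be modeled so that an unavailable continuation cannot exist without a reason code. The sketch below uses a discriminated union for that purpose. The session kinds paraphrase item 1, while `ContinuationStatus`, `ContinueControl`, `continueControlFor`, and any concrete reason-code strings are hypothetical. Rendering the control as visible but disabled, as opposed to hiding it, is also an assumption.

```ts
type ModuleSessionKind =
  | "one_shot_gateway_call"
  | "resumable_session"
  | "spawned_sub_agent_session"
  | "forked_session"
  | "archived_session"
  | "unavailable_session";

// The type makes a reason code mandatory whenever continuation is unavailable.
type ContinuationStatus =
  | { available: true; session_kind: ModuleSessionKind; session_key: string }
  | { available: false; session_kind: ModuleSessionKind; reason_code: string };

interface ContinueControl {
  visible: boolean;
  enabled: boolean;
  reason_code?: string;
}

function continueControlFor(status: ContinuationStatus): ContinueControl {
  if (status.available) {
    return { visible: true, enabled: true };
  }
  // No phantom control: a disabled control always carries its reason.
  return { visible: true, enabled: false, reason_code: status.reason_code };
}
```

A UI built on this shape could never show an enabled Continue button for a session the runtime reports as unavailable.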
### §24A.13 DOC23 Addenda A obligations from R0.6.4
1. Judge modules must support rescore-same-output and compare-judge-runs flows when the underlying data exists.
2. Experiment modules must support rerun variants, add variant, compare variants, and fork downstream from a selected/winning variant.
3. Experiment prompt-candidate handling must integrate with run fork/downstream fork behavior when prompt candidates produce downstream artifacts.
4. Judge/Experiment detail views must expose evaluation evidence, prompts/rubrics, source context, and Task Assessment entrypoints.
### §24A.14 DOC50+ / sharing obligations to create
1. Create DOC50 Shared Surfaces, Links, and Collaboration Kernel as the common owner for share capsules, identity, permissions, revocation, audit, and shared rendering shells.
2. DOC23 owns only task-specific shared manifests and task-run behavior.
3. DOC73 owns library-specific shared manifests while consuming DOC50 permission capsules.
4. DOC20 renders shared surfaces as shell-independent viewports.
5. EC/PropA policy engine gates shared memory, file, tool, connector, email, cost, output, and Task Agent authority.
6. DOC24 assembles shared-session capability/context packets constrained by the share capsule.
7. Shared Task Agent access must be grounded only in the shared manifest/selected memory unless stronger trusted access is explicitly granted.
8. High-risk full-access sharing requires explicit warning, expiration/revocation, and audit receipts.
### §24A.15 DOC8/BDSM learning obligations from R0.6.4
1. Consume `TaskContextFeedbackEvent` for injected/excluded DOC24 context, tools, memories, documents, connectors, procedures, and libraries.
2. Consume module-output/open/save/promote actions as artifact-utility signals.
3. Consume Task Agent panel feedback, scoped-thread success/failure, and module-followup utility signals.
4. Consume run-fork/follow-up outcomes as task-design and repair utility signals.
5. Consume Task Segment reuse success/failure and duplicate/near-duplicate segment signals.
---
## §24B — Cross-doc obligations from Addenda A ↔ Addenda B coordination V3 FINAL (R0.7 absorption)
The following OP-A rows originate from the Addenda A ↔ Addenda B coordination V3 FINAL proposal. They are paste-ready for the OP-A ledger. Status reflects state as of R0.7 freeze (2026-05-17).
### §24B.1 OP-A rows
```
OBL-XDOC-EVAL-ENV-01
Owner: DOC23 Evaluation Common Contracts V1.1.1 (sibling doc; per coord V3 §3.2)
Consumer: Addenda A R4.1 V3, Addenda B Core R0.7.1, V3.3.1
Description: Shared EvaluationResultEnvelope schema with EvaluationArtifactEnvelope wrapper
Status: specified_in_owner
OBL-XDOC-MODULES-REGISTRY-01
Owner: DOC23 R3.2 (target)
Consumer: All addenda using these module types
Description: Register step.judge, step.evaluator, step.revisor, step.claim_extractor
in parent module registry
Status: pending_R3_2_compile
OBL-XDOC-SCOPE-PRIMITIVES-01
Owner: DOC23 Evaluation Common Contracts V1.1.1
Consumer: Addenda A, Addenda B Core R0.7.1, V3.3.1, PropA R6.3+
Description: ArtifactScopeRef, TextAnchor, StructuredAnchor as shared primitives
Status: specified_in_owner
OBL-XDOC-OUTCOME-COMPLIANCE-01
Owner: Addenda A R4.1 V3 (V4.1 Coordination Patch / V5 Mini-Card)
Consumer: Addenda B (V3.3 Evaluator) — via Pattern C wiring
Description: Judge gains outcome_compliance_scoring method consuming
EvaluationOutcomeDefinition.criteria[] directly. Pattern C wiring
allows Judge to attach downstream of any Evaluator output.
Status: in_review
OBL-XDOC-PROMPT-COMPARISON-SIGNAL-01
Owner: Addenda A R4.1 V3
Consumer: DOC8/BDSM
Description: Experiment emits PromptComparisonSignal wrapped in
EvaluationLearningSignalEnvelope including task_design_signature
when applicable
Status: in_review
OBL-XDOC-CLAIM-EXTRACTOR-PUBLIC-01
Owner: Addenda A R4.1 V3
Consumer: V3.3 Evaluator dispatches
Description: step.claim_extractor as public contract with claims_out port;
broadened output to ExtractedEvaluationUnit union (22 types);
section-anchored + privilege-tagged units; no virtual data_out alias
Status: in_review
OBL-XDOC-EVALUATOR-CLAIMS-IN-01
Owner: Addenda B V3.3.1 (specified in V3.3 §5.17)
Consumer: Addenda A Claim Extractor wiring patterns
Description: Evaluator adds claims_in port consuming ClaimSetBundle /
ExtractedEvaluationUnitBundle
Status: specified_in_owner
OBL-XDOC-EVAL-SIGNAL-OWNERSHIP-01
Owner: Addenda B Core R0.7.1 (this doc, §9.0)
Consumer: DOC8/BDSM
Description: Define and emit OutcomeEvaluationSignal, RepairCycleSignal (with
taint_evolution and qualitative_delta), TaskProcessGapSignal (runtime),
TaintClearanceSignal, HardCallResolutionSignal — all wrapped by
EvaluationLearningSignalEnvelope including task_design_signature
when applicable
Status: specified_in_owner
OBL-XDOC-LEARNING-MODE-01
Owner: Addenda B V3.3.1 (specified in V3.3 §6.16)
Consumer: EC Core (cost governance integration)
Description: RevisorConfig.learning_mode (production / signal_generation / calibration)
Status: specified_in_owner
OBL-XDOC-MODEL-CLASS-AXIS-01
Owner: DOC72
Consumer: Addenda B Core R0.7.1, V3.3.1, Addenda A R4.1
Description: Add model_class axis to PatternContextSignature;
add cross_model_applicability to Pattern
Status: pending_DOC72_update
OBL-XDOC-BDSM-CONSUME-SIGNALS-01
Owner: DOC8/BDSM
Consumer: Pattern primitive (DOC72), Task Agent (Addenda B)
Description: Consume governed signal stream (all eight Phase 1 signal types);
produce utility bundles; threshold-gate surfacing via
PatternSurfacingThreshold; emit TaskDesignCorrelationSignal
Status: pending_DOC8_update
OBL-XDOC-EC-POLICY-SIGNALS-01
Owner: EC Core
Consumer: All signal emitters
Description: Compiled policy engine gates every signal at envelope layer based on
data_class, matter_id, pattern_promotion_eligible
Status: pending_EC_Core_update
OBL-XDOC-PROPA-DSPY-TARGETS-01
Owner: PropA R6.3+
Consumer: Addenda A R4.1, Addenda B Core R0.7.1, V3.3.1
Description: Add DspyTargetIdSchemaV4 values: claim_extractor_main,
outcome_evaluator_main, revision_compiler_main, outcome_compiler_main
Status: pending_PropA_update
OBL-XDOC-DOC20-EVAL-UI-01
Owner: DOC20
Consumer: User UI
Description: UI surfaces for shared envelope, variant comparison (Patterns A and B),
Pattern C ad-hoc Judge attachment, learning_mode toggle, pattern display
with model_class context, graph-edit warning for implicit auto-revision
chains (Revisor downstream of Experiment with pass_through_winner +
autonomous Revisor), wiring validation error when non-comparison-aware
consumer wired downstream of Experiment with route_all_variants
Status: pending_DOC20_update
```
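The rows above follow a regular shape: an `OBL-` identifier line, then `Owner`, `Consumer`, `Description`, and `Status` fields, with long values wrapping onto continuation lines. A minimal sketch of loading them into records follows, assuming exactly that layout. The names `OpaRow` and `parse_opa_rows` are hypothetical and not part of any Elnor tooling.

```python
from dataclasses import dataclass, field

# Field labels as they appear in the paste-ready rows.
FIELD_NAMES = ("Owner", "Consumer", "Description", "Status")


@dataclass
class OpaRow:
    """Hypothetical in-memory form of one OP-A ledger row."""
    obligation_id: str
    fields: dict = field(default_factory=dict)


def parse_opa_rows(text: str) -> list:
    """Parse paste-ready OP-A rows; wrapped lines extend the previous field."""
    rows = []
    current_field = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("OBL-"):
            rows.append(OpaRow(obligation_id=line))
            current_field = None
            continue
        if not rows:
            continue  # ignore any text before the first row identifier
        name, sep, value = line.partition(":")
        if sep and name in FIELD_NAMES:
            current_field = name
            rows[-1].fields[name] = value.strip()
        elif current_field is not None:
            # Continuation of a wrapped value (typically Description).
            rows[-1].fields[current_field] += " " + line
    return rows


SAMPLE = """\
OBL-XDOC-MODULES-REGISTRY-01
Owner: DOC23 R3.2 (target)
Consumer: All addenda using these module types
Description: Register step.judge, step.evaluator, step.revisor, step.claim_extractor
in parent module registry
Status: pending_R3_2_compile
"""

rows = parse_opa_rows(SAMPLE)
```

Keeping the status as a plain string avoids guessing at the full set of ledger status values, which this section does not enumerate.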
### §24B.2 Consuming-doc inserts (cross-doc obligation mechanism)
The following `[XDOC-INSERT: target_doc]` blocks are reciprocal to the Addenda A R4.1 V3 / V4.1 Coordination Patch outputs. The architect's coding agent applies them when the named target doc is next revised.
**[XDOC-INSERT: DOC23 R3.2 (when compilation pass runs)]**
```
1. Absorb DOC23 Evaluation Common Contracts V1.1.1 into parent doc as new top-level section.
Common Contracts V1.1.1 retires per its §11 migration guide.
2. Extend module type registry with:
- step.judge (Addenda A)
- step.evaluator (Addenda B V3.3)
- step.revisor (Addenda B V3.3; not "reviser")
- step.claim_extractor (Addenda A)
- step.dspy_optimizer (PropA; consumed by all extractors and evaluators)
3. Document Patterns A, B, C wiring as first-class supported patterns.
4. Update references in Addenda A R4.1 V3 / Addenda B Core R0.7.1 / V3.3.1 / sub-addenda
to point to DOC23 R3.2 sections rather than Common Contracts V1.1.1.
```
**[XDOC-INSERT: DOC72 next revision]**
```
1. Extend PatternContextSignature with model_class axis (cheap_local / cheap_api /
medium / expensive_frontier). Mandatory for new Pattern records.
2. Add cross_model_applicability field to Pattern primitive:
- "model_class_specific"
- "cross_model_applicable"
- "requires_validation" (default for new patterns)
3. Enforce matter-scoped retrieval firewall per Addenda B V3.1/V3.3 §13.4.
Matter-scoped Patterns do not surface in cross-matter contexts.
Privileged-matter pattern promotion requires EC Core policy gate.
4. Cross-reference OBL-XDOC-MODEL-CLASS-AXIS-01 in DOC72's OP-A list.
```
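The DOC72 insert above can be pictured with a small sketch. It assumes Python dataclasses stand in for the real schemas; the `matter_id` field on the signature, the `may_surface` helper, and the choice to treat `requires_validation` like `model_class_specific` at retrieval time are illustrative assumptions, not DOC72 contracts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ModelClass(str, Enum):
    CHEAP_LOCAL = "cheap_local"
    CHEAP_API = "cheap_api"
    MEDIUM = "medium"
    EXPENSIVE_FRONTIER = "expensive_frontier"


class CrossModelApplicability(str, Enum):
    MODEL_CLASS_SPECIFIC = "model_class_specific"
    CROSS_MODEL_APPLICABLE = "cross_model_applicable"
    REQUIRES_VALIDATION = "requires_validation"


@dataclass(frozen=True)
class PatternContextSignature:
    model_class: ModelClass          # mandatory for new Pattern records
    matter_id: Optional[str] = None  # assumption: None means not matter-scoped


@dataclass(frozen=True)
class Pattern:
    pattern_id: str
    signature: PatternContextSignature
    # New patterns default to requires_validation, per the insert above.
    cross_model_applicability: CrossModelApplicability = (
        CrossModelApplicability.REQUIRES_VALIDATION
    )


def may_surface(pattern: Pattern, model_class: ModelClass,
                matter_id: Optional[str]) -> bool:
    """Hypothetical retrieval check for one pattern in one context."""
    sig = pattern.signature
    # Matter firewall: matter-scoped patterns never surface cross-matter.
    if sig.matter_id is not None and sig.matter_id != matter_id:
        return False
    if pattern.cross_model_applicability is CrossModelApplicability.CROSS_MODEL_APPLICABLE:
        return True
    # Assumption: unvalidated and model-specific patterns surface only
    # for the model class they were observed on.
    return sig.model_class is model_class
```

The EC Core policy gate for privileged-matter promotion is deliberately left out of this sketch, since its rules live in another document.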
**[XDOC-INSERT: DOC8/BDSM next revision]**
```
1. Consume unified governed signal stream wrapped in EvaluationLearningSignalEnvelope
per Common Contracts §5.1. Discriminate by signal_type (eight Phase 1 types).
2. Produce utility bundles consumed by DOC72 Pattern primitive store.
3. Threshold-gate pattern surfacing via PatternSurfacingThreshold:
min_runs (default 10), min_distinct_tasks (default 3),
min_success_confidence (default 0.7), max_regression_rate (default 0.15).
4. Emit TaskDesignCorrelationSignal (aggregate). Consumed by Task Agent for
suggestion surfacing. BDSM does NOT emit runtime TaskProcessGapSignal — that
is emitted by Revisor or Task Agent during execution.
5. Phase 2 correlation analytics: clustering, deficiency taxonomy emergence,
auto-fix candidate detection. Operates on Phase 1 captured data.
6. Cross-reference OBL-XDOC-BDSM-CONSUME-SIGNALS-01 in DOC8's OP-A list.
```
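Item 3 of the DOC8/BDSM insert can be sketched as a simple gate. The four threshold fields and their defaults come from the insert above; the `PatternStats` record, the `passes_threshold` helper, and the inclusive comparisons are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternSurfacingThreshold:
    min_runs: int = 10
    min_distinct_tasks: int = 3
    min_success_confidence: float = 0.7
    max_regression_rate: float = 0.15


@dataclass(frozen=True)
class PatternStats:
    """Hypothetical aggregate that BDSM might hold for one pattern."""
    runs: int
    distinct_tasks: int
    success_confidence: float
    regression_rate: float


def passes_threshold(
    stats: PatternStats,
    threshold: PatternSurfacingThreshold = PatternSurfacingThreshold(),
) -> bool:
    """Return True only when every threshold is met (bounds inclusive)."""
    return (
        stats.runs >= threshold.min_runs
        and stats.distinct_tasks >= threshold.min_distinct_tasks
        and stats.success_confidence >= threshold.min_success_confidence
        and stats.regression_rate <= threshold.max_regression_rate
    )
```

Under this reading, a pattern with many runs that all come from a single task stays below the surfacing gate, because `min_distinct_tasks` is checked independently of `min_runs`.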
**[XDOC-INSERT: EC Core Addendum A next revision]**
```
1. Extend compiled policy engine to gate signal envelope persistence and promotion:
- data_class enforcement (public / internal / privileged / local_only)
- matter_id firewall (matter-scoped signals do not auto-cross matter boundaries)
- pattern_promotion_eligible governs durable learning feed
2. Cost governance for learning_mode field (V3.3 §6.16):
- "signal_generation" mode draws from cheap-model budget pool
- "calibration" mode draws from mixed pool with explicit user authorization
- "production" mode draws from production pool
3. Retention policy for evaluation envelopes and signals per data_class and matter.
4. Cross-reference OBL-XDOC-EC-POLICY-SIGNALS-01 in EC Core's OP-A list.
```
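The two EC Core rules above, budget-pool selection by `learning_mode` and envelope gating by `matter_id` and `pattern_promotion_eligible`, could look roughly like the following. The pool identifiers, the `SignalEnvelope` shape, and both helper functions are hypothetical; the sketch validates `data_class` values but does not model per-class enforcement rules, which EC Core owns.

```python
from dataclasses import dataclass
from typing import Optional

DATA_CLASSES = ("public", "internal", "privileged", "local_only")

# Hypothetical pool identifiers; the spec names the pools only descriptively.
LEARNING_MODE_POOL = {
    "signal_generation": "cheap_model_pool",
    "calibration": "mixed_pool",
    "production": "production_pool",
}


def resolve_budget_pool(learning_mode: str, user_authorized: bool = False) -> str:
    """Map a RevisorConfig.learning_mode value to a budget pool."""
    if learning_mode not in LEARNING_MODE_POOL:
        raise ValueError(f"unknown learning_mode: {learning_mode!r}")
    # Calibration draws from the mixed pool only with explicit user authorization.
    if learning_mode == "calibration" and not user_authorized:
        raise PermissionError("calibration mode requires explicit user authorization")
    return LEARNING_MODE_POOL[learning_mode]


@dataclass(frozen=True)
class SignalEnvelope:
    """Hypothetical subset of the envelope fields the policy engine reads."""
    data_class: str
    matter_id: Optional[str]
    pattern_promotion_eligible: bool


def may_promote(envelope: SignalEnvelope, target_matter_id: Optional[str]) -> bool:
    """Decide whether a signal may feed durable learning in the target context."""
    if envelope.data_class not in DATA_CLASSES:
        raise ValueError(f"unknown data_class: {envelope.data_class!r}")
    if not envelope.pattern_promotion_eligible:
        return False
    # Matter firewall: matter-scoped signals do not auto-cross matter boundaries.
    if envelope.matter_id is not None and envelope.matter_id != target_matter_id:
        return False
    return True
```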
**[XDOC-INSERT: PropA R6.3 → R6.4 or later]**
```
1. Extend DspyTargetIdSchemaV4 with new target ids:
- "claim_extractor_main" (Addenda A step.claim_extractor prompts)
- "outcome_evaluator_main" (Addenda B V3.3.1 Evaluator prompts)
- "revision_compiler_main" (Addenda B V3.3.1 Revisor compile prompts)
- "outcome_compiler_main" (Addenda B V3.3.1 Outcome Compiler prompts)
Each new target requires DspyTargetEligibilitySchemaV4 discipline.
2. Add a new section documenting the extraction landscape coordination:
- Boundary between PropA P0_master_extraction (DOC72 graph candidates) and
Addenda A step.claim_extractor (Judge/Evaluator inputs)
- Affirm separation: different consumers, different lifecycles
- Affirm shared infrastructure: anchoring primitives via Common Contracts §7
- Affirm shared DSPy lane: PropA is single optimization lane
3. Cross-reference OBL-XDOC-PROPA-DSPY-TARGETS-01 in PropA's OP-A list.
```
**[XDOC-INSERT: DOC20 next revision]**
```
1. UI surfaces for shared EvaluationResultEnvelope:
- Producer-aware rendering
- Slice-aware rendering (quantitative, qualitative, comparison, assurance, safety)
- Hard Call surface integration
2. UI surfaces for variant comparison (Patterns A and B per coord V3 §2.9):
- Per-variant result cards
- Cross-variant comparison view
3. UI surface for Pattern C ad-hoc Judge attachment:
- "Attach Judge to this Evaluator output" action on result cards
- Cost estimate before Judge dispatch
- Score display once Judge completes
4. UI for learning_mode toggle in RevisorConfig (V3.3 §6.16):
- Three-state selector with cost guidance per mode
5. UI for pattern display with model_class context:
- Pattern card shows model_class context
- "from memory" vs "adapted from memory" badge (V3.3 §21.8.1)
- cross_model_applicability status
6. Graph-edit warning when Revisor wired downstream of Experiment with
experiment_winner_routing = "pass_through_winner" AND Revisor's
AutonomousModePolicy permits autonomous repair. Surface confirmation
dialog explaining implicit auto-revision chain.
7. Wiring validation error at graph-edit time when non-comparison-aware
consumer wired downstream of Experiment with route_all_variants.
8. Cross-reference OBL-XDOC-DOC20-EVAL-UI-01 in DOC20's OP-A list.
```
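Items 6 and 7 of the DOC20 insert describe two graph-edit checks, one warning and one error. A minimal sketch of both follows, assuming `route_all_variants` is a value of `experiment_winner_routing` alongside `pass_through_winner`. The `DownstreamWiring` record and `check_wiring` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownstreamWiring:
    """Hypothetical view of one edge from an Experiment to its consumer."""
    experiment_winner_routing: str            # "pass_through_winner" or "route_all_variants"
    consumer_type: str                        # e.g. "step.revisor"
    consumer_is_comparison_aware: bool = False
    revisor_autonomous_repair_permitted: bool = False


def check_wiring(w: DownstreamWiring) -> list:
    """Return (severity, message) findings for a proposed graph edit."""
    findings = []
    # Item 7: hard error when a consumer cannot handle multiple variants.
    if w.experiment_winner_routing == "route_all_variants" and not w.consumer_is_comparison_aware:
        findings.append(
            ("error", "non-comparison-aware consumer downstream of route_all_variants")
        )
    # Item 6: warning for an implicit auto-revision chain.
    if (
        w.consumer_type == "step.revisor"
        and w.experiment_winner_routing == "pass_through_winner"
        and w.revisor_autonomous_repair_permitted
    ):
        findings.append(
            ("warning", "implicit auto-revision chain: confirm before saving")
        )
    return findings
```

Returning findings rather than raising lets the UI show the confirmation dialog for warnings while still blocking the edit on errors.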
### §24B.3 Coding-agent instructions
[CODING-AGENT-NOTE: cross-doc obligations pass]
When implementing changes from Addenda B Core R0.7.1 (this doc), scan §24B.2 for blocks tagged `[XDOC-INSERT: target_doc]` and apply them to the named target docs when those docs are next revised. The architect (Will) does not separately coordinate with other-doc chats; the inserts are the coordination mechanism. Status updates for each OP-A row are queryable via the OP-A ledger.
---
## §25 — Rejected concepts and non-goals
| Concept | Disposition |
|---|---|
| Central runtime orchestrator | Rejected. Undermines visible graph and repeatable execution. |
| Task Agent as ordinary DOC23 module | Rejected for primary Task Agent. It is a system capability/read-control interface. |
| Hidden subagent spawning outside modules | Rejected. Sub-agents belong inside modules or explicit non-task advisory routes. |
| Full raw task content stored in DOC72 | Rejected. Raw content belongs in run storage, DOC25, DOC73, or owner stores. |
| Automatic ingestion of all intermediate outputs | Rejected. Use artifact policy. |
| Giant monolithic Task Agent `skill.md` as primary truth | Rejected. Use hybrid knowledge pack/tools/cards/evals. |
| Purely generated knowledge pack | Rejected. Generated truth must be supplemented by doctrine, rationale, examples, cases, goals, preferences, and learning. |
| Automatic material mutation of templates/presets | Rejected. Use proposal/versioning/approval. |
| Capability utility silently deleting tools | Rejected. Utility informs recommendations/ranking/warnings unless approved. |
| DOC25 as final endpoint label | Rejected. DOC25 is processor; endpoint must be specified. |
---
## §25A — Rejected or deferred concepts added by the post-R0.3 merge
### §25A.1 Rejected for initial Addenda B
```text
Separate user-facing Workflow Designer agent
Separate user-facing Task Inspector agent
Separate user-facing Prompt Agent
Separate primary Prompt Lab location
Promptolution-specific module
Task Agent as ordinary DOC23 module
Task Agent as default classifier for every chat turn
Hidden saved-task graphs
Full TKP injection into ordinary chat
DSPy-lite implementation before R5 substrate
Task graphs materialized as DOC3 SKILL.md files
```
### §25A.2 Reserved / possible future
```text
Generic step.prompt_candidate_generator module
Standalone advanced Prompt Evaluation Jobs page
Task Agent sub-agent decomposition for heavy portfolio audits
Formal prompt artifact library beyond DOC17 prompt recipes
Task mode policy simulator
Multi-user shared TaskInvocationDirective governance
Full collaborative shared task workspace (DOC50+/Phase 2)
Generic real-time multi-user graph editing
Prompt/Context Inspector feedback auto-mutating DOC24 routing without review
Live mid-call instruction injection unless module/runtime explicitly supports intervention
```
---
## §26 — Open questions for architect review
1. Should “Template Group” be the placeholder name for multi-template bundles, or should this remain unnamed until needed?
2. Which task-design patterns should be architect-authored first beyond second-pass deepening, draft-redteam-revise, email attachment ingestion, and recurring monitor?
3. What default artifact policy should apply to final legal work products: `doc72_summary_reference` or `doc25_work_product_ingestion`?
4. Should Task Agent have a visible identity in the System Agent Identity Registry immediately, or land through DOC24 capability registry first?
5. What level of automatic low-risk semantic-card update is acceptable without user review?
6. How aggressively should prior task cases be extracted into the Design Casebook?
7. Should knowledge pack evals be mandatory before any pack activation, or only before major pack changes?
8. What is the correct default for intermediate high-value analysis outputs: ask each time, suggest after run, or make findable with summary when module preset says high-value?
9. How should legal matter privilege/sensitivity policy alter task memory defaults?
10. Should Task Assessment run automatically after every completed run, only after selected templates, or only when enabled?
11. What exact UI label should replace internal “TaskTemplateRecord” where user-facing “Reusable Saved Task” is sufficient?
12. Should Task Segments become searchable/recommendable through the same semantic-card machinery as saved tasks and module presets immediately, or after usage proves value?
13. Which run-fork defaults should vary by task type: reuse prior DOC24 context, recompute from same scope, or recompute from current memory/capabilities?
14. What redaction profile should be default for external audit export?
15. Which shared-task permissions, if any, belong in Phase 1.5 before full DOC50 collaboration exists?
---
## §27 — Implementation dependency note — non-normative
This addendum is an end-state spec. Implementation may proceed in dependency-aware order, but the spec should not be rewritten as phases.
Practical dependency order:
```text
1. Telemetry spine + Artifact Index
2. DOC72 Task Activity Memory
3. Artifact intake policy
4. Template/preset semantic cards
5. Task Blueprint
6. Capability binding/runtime expansion
7. Task Agent Run Lens and output retrieval
8. Task Agent creation flow
9. Task Knowledge Pack compiler
10. Task Assessment
11. Task design learning and improvement proposals
12. Eval/promotion/drift substrate
```
Task Agent may be partially useful before all layers exist, but it must degrade honestly and must not pretend unavailable memory/learning/capability substrates exist.
---
## §27A — End-to-end examples
### §27A.1 Create a complaint evaluation task
```text
User:
Create a task to evaluate a complaint.
Task Agent:
I found an existing Complaint Evaluation template. It matches your request better than a new blank graph.
Recommended customization:
- use DOC25 retrieval for complaint and source documents;
- include issue matrix + revision plan output;
- add second-pass deepening because you prefer “look deeper” in complex analysis;
- add optional red-team review;
- keep raw intermediate drafts with the run;
- make final report findable with summary;
- save final revised complaint as work product if produced.
I will run preflight before saving.
[Use recommended setup] [Inspect template] [Edit blueprint] [Create new instead]
```
### §27A.2 Ask about a running task
```text
User:
Where are we in the Marex Draft Complaint task?
Task Agent reads TaskAgentRunLens and answers:
The task is on step 5 of 7. The damages memo and first complaint draft are complete.
The red-team module is running now. It has reviewed the complaint draft and 12 source documents.
No human gate is pending yet. The next expected step is revision based on red-team findings.
Latest available artifacts: damages memo, first draft complaint, source-doc classification table.
```
### §27A.3 Retrieve a task output
```text
User:
Send me the results of the draft task we ran for Marex.
Task Agent:
I found the Marex Draft Complaint run from [date]. Final outputs were:
- Final complaint draft
- Damages analysis memo
- Red-team report
- Revision summary
The final complaint was saved as a work product. The damages memo is still run-scoped but can be made findable.
[Open final complaint] [Open damages memo] [Make damages memo findable] [Open Run Inspector]
```
### §27A.4 Learn from a refined task
```text
User creates or edits a complaint drafting task and adds:
- second-pass deepening;
- red-team before revision;
- separate damages memo output;
- final work-product save.
After run and assessment, Task Agent proposes:
This design worked well and appears reusable. I recommend adding a design case:
“For complex complaint drafting, use damages memo -> draft -> second-pass deepening -> red-team -> revision.”
Candidate template update:
Add damages memo and second-pass deepening as optional defaults in Complaint Drafting Template.
[Approve design case] [Review template update] [Reject]
```
---
## §27B — Non-normative external design benchmarks
This section preserves external design patterns considered during drafting. These references are not normative and do not override DOC23-native contracts. They explain why Addenda B emphasizes graph-visible execution, telemetry, evals, connector registries, reusable skills/procedures, and durable execution.
| External pattern | Lesson imported into Addenda B |
|---|---|
| Visual agent/workflow builders with connector registries and versioning | DOC23 should keep the graph as the visible execution substrate while adding semantic template/preset reuse and capability binding. |
| Claude Cowork-style natural-language process setup | Task Agent should accept underspecified intent, propose a richer process, show progress, retrieve outputs, and ask for approval when needed. |
| LangGraph/Temporal-style durable execution | Long-running tasks need checkpoints, idempotent side effects, durable waits, and safe resume semantics. |
| Phoenix/Braintrust-style traces/evals/datasets | Telemetry must feed eval datasets, promotion ledgers, regression checks, and Task Assessment. |
| Anthropic Skills / progressive disclosure | Task Agent should receive compact routing cards first and deeper cards/spec sections only when needed. |
| MCP connector ecosystems | MCP should be exposed through DOC24 capability IDs, health, auth, policy, and least-privilege tool packs, not raw uncontrolled tool lists. |
| OpenTelemetry / span-based observability | DOC23 should define native TaskTraceSpan records and may later export compatible traces. |
| Typed DAG planning / validator-gated repair research | Task Agent should synthesize typed blueprints/graphs, validate/preflight them, and repair through explicit proposals rather than free-form runtime improvisation. |
The architectural conclusion: **structured plan + governed execution + trace/eval feedback** is preferable to opaque autonomous runtime improvisation for business-grade tasks.
## §27C — End-state examples
### §27C.1 Simple direct action
```text
User: File this in the Marex folder.
Elnor:
Done — I saved it under Marex > Drafts. I kept the original filename.
```
No saved task. No Task Agent. Possible DOC3 procedure/tool action.
### §27C.2 Ambiguous review
```text
User: Review this complaint.
Elnor:
Here is a quick review...
Optional chip:
[Run deeper Complaint Evaluation task]
```
Task chip appears only if DOC24 task opportunity and learned preferences justify it.
### §27C.3 Deep complaint review
```text
User: Do a deep review of this complaint against the source documents and give me a red-team report.
Elnor:
This is a good fit for your Complaint Evaluation task. It will review the complaint and source docs, produce an issue matrix, run red-team, and save the final report. I recommend using that saved task rather than doing this as an ordinary chat response.
[Use saved task] [Customize first] [Just answer here]
```
### §27C.4 Elnor-created recurring task
```text
User: Keep an eye on Marex filings and tell me when anything new appears.
Elnor:
I’ll set this up as a saved task so it can run independently and report back. You’ll be able to inspect the task, logs, and alerts from the Tasks page.
[View task] [Edit steps] [Enable]
```
### §27C.5 Existing task improvement
```text
User: Look at my complaint tasks and suggest improvements.
Elnor:
I’ll ask Task Agent to review the relevant task templates, recent runs, judge results, artifacts, and user edits, then give you a ranked improvement report.
```
### §27C.6 Quick prompt help
```text
User: Is this a good prompt?
Elnor:
It is missing source scope, a required output format, and a verification rule. Here is a stronger version...
```
No task. DOC17 Prompt Advisor service or direct reasoning.
### §27C.7 Deep prompt testing
```text
User: Refine my red-team prompt for effectiveness.
Elnor:
I can do a quick rewrite, or run a tracked prompt test with variants and judge scoring.
[Quick rewrite] [Run tracked prompt test]
```
If tracked, Task Agent creates a Prompt Evaluation Task using Experiment/Judge and candidate-generation methods.
---
## §27D — Coverage audit against post-R0.3 decisions
As of the R0.6.1 unified draft, this addendum incorporates the following post-R0.3 architectural decisions:
1. Runtime/user-facing wording should use **saved task** / **tracked workflow** / **Task Agent**, not “use DOC23.”
2. `Workflow Designer` is not a separate agent name. Task Agent is the canonical identity; design/review/inspect/retrieve/assess are modes.
3. “Consult Task Agent for task fit” was too broad. DOC24/EC prefilter first; Task Agent is invoked only through typed entrypoints.
4. Saved tasks are direct-first and formalization-value driven; complexity alone is not enough.
5. `monitored` was ambiguous. OpenClaw heartbeat/session monitoring, OpenClaw background sub-agents, EC scheduler jobs, and saved-task monitoring are separate modes.
6. Elnor may create saved tasks to handle long-running/scheduled/multi-step work, but all such tasks must be visible, editable, inspectable, cancelable, and receipt-backed.
7. Task Agent is a system agent and invokable capability, registered through EC Agent Identity Registry and DOC24 Capability Registry, with DOC11/OpenClaw runtime truth.
8. Task Agent needs model, think-level, fallback, cost, context, sub-agent, and degradation settings.
9. TKP is Task Agent's design/reference substrate, not the ambient router, not the core contract, and not a user-facing DOC73 library.
10. Task invocation resembles procedural routing, but TaskInvocationDirective should not collapse DOC23 tasks into DOC3 skills.
11. Task suggestions, acceptances, rejections, overuse, underuse, and user edits are high-value BDSM/DOC8 learning signals.
12. Prompt help should be one user-facing capability, **Improve Prompt**, with quick and deep modes.
13. DOC17 Prompt Advisor should become a lightweight prompt-analysis/rewrite service, not a separate agent competing with Task Agent.
14. No separate primary Prompt Lab UI is required.
15. Deep prompt testing should be a tracked DOC23 prompt-evaluation task orchestrated by Task Agent.
16. Promptolution should not be a dedicated module or UI; it is a prompt-candidate generation method.
17. DSPy/GEPA is reserved until Addenda A R5 optimization substrate lands.
18. Candidate generation should initially be baked into Experiment or invoked by Task Agent; a generic Prompt Candidate Generator module is only reserved for later manual graph use.
19. Task Agent must inspect Judge/Experiment/Claim/optimization evidence when assessing tasks and prompts.
20. No hidden graph runs, no silent prompt mutation, no silent template/preset mutation, no full TKP injection into ordinary chat.
---
## §27E — Merge audit cleanup record
R0.6.1 integrates the audit-complete clarifications from the former post-R0.3 review appendix into their operative sections. The following items are now placed in the body of the addendum rather than left as standalone merge notes:
1. `TaskInvocationDirectiveCandidate` creation from real task-mode usage and user acceptance/rejection/edit traces (§7B).
2. `ElnorCreatedTaskSemanticTrace` for every Elnor-created visible saved task (§5A).
3. `OpenClawTaskRuntimeTraceBridge` and the boundary between native OpenClaw progress and saved-task formalization (§4C).
4. Task Agent access to Monitor, Environment, telemetry, snapshots, step-output shelves, tool-result shelves, DOC72 task memory, and DOC8/BDSM learning signals (§4.5).
5. `PromptCandidateMethodMetadata`, `ExperimentPromptCandidateMode`, and `ExperimentPromptCandidateSourcePlan` (§17B).
6. Contextual prompt-help access without a separate primary Prompt Lab surface (§17E).
Audit verdict: coverage is complete against the R0.3 base and R0.5 merge pack at schema/section/idea level. The remaining work is architectural red-team review, not merge repair.
## §27F — Closing post-R0.3 merge principle
The final design should make saved tasks naturally available without contaminating ordinary Elnor interactions.
```text
Direct-first for ordinary work.
Task-aware when task value is clear.
Task Agent only through typed entrypoints.
TKP only in specialist modes.
Prompt help unified under Improve Prompt.
Deep testing becomes tracked tasks.
Learning tunes overuse, underuse, design quality, prompts, templates, and preferences.
```
The goal is not to use tasks more. The goal is to use saved tasks when they materially improve reliability, auditability, reuse, background execution, artifact management, evaluation, or user outcomes.
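The direct-first rule can be sketched as a small decision helper. It assumes the seven value drivers named in the sentence above are the only grounds for suggesting a saved task; the driver identifiers and the `should_suggest_saved_task` function are hypothetical, and real routing would also go through the DOC24/EC prefilter and learned preferences.

```python
# Value drivers taken from the closing sentence of this section.
FORMALIZATION_DRIVERS = frozenset({
    "reliability",
    "auditability",
    "reuse",
    "background_execution",
    "artifact_management",
    "evaluation",
    "user_outcomes",
})


def should_suggest_saved_task(drivers: set, is_complex: bool = False) -> bool:
    """Suggest a saved task only when at least one value driver applies.

    `is_complex` is accepted but deliberately ignored: complexity alone
    is not enough to justify formalizing work as a saved task.
    """
    unknown = set(drivers) - FORMALIZATION_DRIVERS
    if unknown:
        raise ValueError(f"unknown drivers: {sorted(unknown)}")
    return bool(drivers)
```

A request such as "File this in the Marex folder" carries no drivers and stays a direct action, while a recurring filings monitor carries `background_execution` and qualifies for a saved-task suggestion.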
---
## §27G — R0.6.2 context-injection patch record
R0.6.3 revises §13A to remove the singular-current-work-context assumption and to distinguish launch-surface context, scheduled/background task context, email/webhook/file/event trigger context, and task-run scope envelopes. R0.6.2 had closed the initial task-module DOC24 context-injection scope gap identified after R0.6.1; R0.6.3 sharpens that rule so scheduled, triggered, and concurrent-chat runs never inherit unrelated active context by default. The patch:
- makes explicit that DOC24 task-module packets are task-scoped and module-scoped, not ordinary active-chat-context dumps;
- preserves relevant global hard constraints, preferences, procedures, and prior task memory;
- requires packet receipts and exclusion receipts;
- clarifies DOC73/library and DOC25/document injection conditions;
- adds read-model, storage, validation, and OP-A obligations for the seam.
---
## §27H — R0.6.4 UI, continuation, and sharing-forward-compatibility patch record
R0.6.4 integrates the post-R0.6.3 UI/mockup and task-surface design decisions:
1. DOC20 tab/browser integration for the modular task page.
2. Graph overview preservation and non-obtrusive telemetry rule.
3. Module detail Run Data / Open Output / Open Run Inspector links.
4. Graph ⇄ Inspector state-synchronized navigation.
5. Run Inspector consolidation into Header, Run Flow & Steps, Artifacts & Deliveries, and Context & Audit.
6. TaskDeliveryRecord and deliverables read-model for where/when outputs were saved, emailed, messaged, posted, or otherwise delivered.
7. Context Inspector with DOC24 injection/exclusion visibility and feedback into DOC8/BDSM.
8. Tasks page command center with active/scheduled/saved/history/segments/presets views and operational stats.
9. User-facing saved-task/reusable-saved-task terminology.
10. Task Segment as reusable module group replacing the placeholder “sub-task.”
11. Task Agent contextual side-panel and scoped Task Agent threads.
12. Module session continuation/follow-up modes and TaskModuleSessionRef.
13. Rerun/fork-from-module and ModuleActivationReplayRecord.
14. Judge/Experiment rescore, rerun, variant, and downstream-fork affordances.
15. Export audit bundle semantics.
16. Standalone/shared task surface forward-compatibility and DOC50 owner split.
## §27I — R0.7 patch record (family-topology reorganization + V3 FINAL coordination absorption)
R0.7 carries the following deltas from R0.6.4:
### §27I.1 Family-topology reorganization
R0.6.4 was a singular Addenda B document. R0.7 reorganizes Addenda B into a family of focused specifications:
- Outcome Evaluator + Revisor content extracted to V3.3.1 sub-addendum (already produced)
- Source workspace + Source Research module content extracted to Source Workspace V1.0.1 sub-addendum (to be produced at the R0.7 freeze; since landed, see §27I.6)
- Task Forum + Run Board + Module Assistance Requests content extracted to Task Forum + Run Board V1.0.1 sub-addendum (to be produced at the R0.7 freeze; since landed, see §27I.6)
- In-run feedback delivery content extracted to Feedback Delivery V1.0.1 sub-addendum (to be produced at the R0.7 freeze; since landed, see §27I.6)
- Shared evaluation primitives moved to DOC23 Evaluation Common Contracts V1.1.1 sibling document (already produced)
R0.7 retains the Task Design domain core: Task Agent, Blueprint, Templates, Task Design Learning, Task Activity Memory, DOC24 binding, Telemetry, Task Assessment, Prompt improvement, baseline UI/routes/storage/validation.
### §27I.2 R0.6.5 base material absorption
R0.6.5 proposal sections absorbed into R0.7:
- §0 Executive Summary — merged into R0.7 front matter
- §1 Governing Principles (including Principle X "Autonomy") — absorbed into §2 as §2.12
- §2 What This Adds — absorbed into §2 as §2.14
- §3 Object Taxonomy — pointed to §3A
- §4 Complexity Profiles — absorbed into §2 as §2.13
Domain-specific R0.6.5 content (§5 onward) was distributed to the sub-addenda per §27I.1. The R0.6.5 proposal retires when the sub-addenda are produced.
### §27I.3 Addenda A ↔ Addenda B coordination V3 FINAL absorption
R0.7 integrates the locked V3 FINAL coordination architecture as follows:
- **§0A** R0.7 supersession statement and family topology — NEW
- **§0B** Shared evaluation infrastructure integration — NEW
- **§2.12** Autonomy principle (Principle X) with normative spec-anchor sentence — NEW
- **§9.0** V3 FINAL coordination signal envelope with five Core-owned signal payloads (OutcomeEvaluationSignal, RepairCycleSignal full Phase 1 form with taint_evolution, TaskProcessGapSignal, TaintClearanceSignal, HardCallResolutionSignal) — NEW
- **§24B** V3 FINAL coordination OP-A rows + six `[XDOC-INSERT: target_doc]` blocks for DOC23 R3.2, DOC72, DOC8/BDSM, EC Core, PropA, DOC20 + coding-agent instruction — NEW
### §27I.4 Resolved open questions from R0.6.4 §26
The following R0.6.4 open questions are now resolved:
- "Should the outcome evaluator be a separate addendum or part of the core?" → resolved by V3.3 family-topology decision
- "How do learning signals coordinate with Addenda A's Judge module?" → resolved by V3 FINAL coordination (signal envelope, Pattern C wiring)
- "How should the system support running on cheap models for learning purposes?" → resolved by V3.3 §6.16 learning_mode field
- "What's the boundary between PropA extraction and Addenda A claim extraction?" → resolved by V3 FINAL §2.12 (separate modules, shared anchoring infrastructure, shared DSPy lane)
### §27I.5 What R0.7 does NOT change
R0.6.4 content for §3 through §22 (modulo the §9.0 insertion and §2 additions) is preserved. The Task Agent, Task Blueprint, templates, telemetry spine, DOC24 binding, OpenClaw alignment, prompt improvement, UI surfaces, routes, storage paths, validation codes, and existing cross-doc obligations carry through unchanged. R0.7 is a surgical absorption of the V3 FINAL coordination outputs plus a topology reorganization — it does not rewrite the Task Design substrate that R0.6.4 established.
### §27I.6 Migration
- R0.6.4 is superseded and should be treated as provenance/source-only now that Source Workspace V1.0.1, Task Forum + Run Board V1.0.1, and Feedback Delivery V1.0.1 have landed
- Cross-references from other addenda to R0.6.4 sections that moved should update during their next revisions
- The OP-A ledger picks up R0.7's new obligation rows (§24B); existing R0.6.4 obligation rows (§24, §24A) remain active
## §28 — Closing note
This addendum's core claim is that DOC23 can become a business-grade and personal-grade agentic process system without sacrificing the core strength of explicit modular graphs.
The architecture is:
```text
Visible process graph for execution.
Task Agent for intelligence and interface.
Task Blueprint for intent.
Telemetry for truth.
DOC72 for task activity memory.
DOC25/DOC73 for document and library outputs.
DOC24 for capabilities/context/tools/models.
DOC8/BDSM for learning and utility.
EC for durable writes and control.
```
The value comes from compounding: every task created, run, inspected, assessed, edited, and reused makes the Task Agent better at turning future intent into reliable task processes.