DOC17_R4_2_OVERLAY_LIBRARY_PROMPT_ADVISOR_PROMPT_RECIPES_AND_PROMPT_LAB.md
Current Specs/DOC17/DOC17_R4_2_OVERLAY_LIBRARY_PROMPT_ADVISOR_PROMPT_RECIPES_AND_PROMPT_LAB.md
# DOC17 R4.2 — Overlay Library, Prompt Advisor, Prompt Recipes & Prompt Lab
**Status:** Draft for architectural review (R4.2 folded revisions)
**Owner Doc:** DOC17
**Primary Companions:** Core, DOC8, DOC10, DOC11, DOC14, DOC15, DOC6, DOC12, DOC9
**Supersedes:** DOC17 R4.1, DOC17 Prompt Approach Overlay Addendum R1, and the separate DOC17 Appendix A research addendum
**Purpose:** Define the complete ELNOR spec for reusable prompt overlays, prompt-advisor workflows, reusable prompt recipes, overlay- and prompt-artifact feedback capture, and the Prompt Lab bridge for offline prompt evaluation of reusable artifacts.
---
## 0) Intent in Plain Language
DOC17 exists to solve four product problems without violating ELNOR’s core architecture.
1. **Reusable prompting:** the user should be able to turn on a reusable task mode such as `guided-verification`, `legal-research`, or `secure-code` without pasting the same prompt text over and over.
2. **Prompt improvement during work:** while drafting a prompt in Q, the user should be able to get immediate help from a deterministic Prompt Advisor and, optionally, an AI rewrite.
3. **Reusable prompt artifacts:** if a prompt becomes valuable enough to save and reuse, the user should be able to save it as a **Prompt Recipe**.
4. **Offline evaluation of reusable prompt artifacts:** overlays, prompt recipes, and prompt-advisor rewrite templates may be tested in a governed offline lab, but they must not self-mutate in the live chat hot path.
DOC17 is not a second orchestration spine, not a second context planner, not a second optimizer owner, and not a permission slip to let random prompt text drift silently through the system. DOC17 owns the product surface and the prompt-artifact definitions for overlays and prompt recipes. DOC8 owns optimizer artifacts, replay, canaries, variant assignment, and promotion/retirement lifecycle. DOC11 owns runtime prompt truth and final annotation assembly. DOC15 owns learned recommendation nodes and retrieval. DOC10 owns route facts and orchestration seams.
### 0.1 Binding Non-Negotiables
The following rules are binding.
1. **EC is the sole durable writer.** Nothing in DOC17 may write durable state except EC through the command queue.
2. **Q never writes durable state.** Q submits commands and renders state.
3. **Free-text chat remains Gateway-first.** DOC17 must not reroute ordinary chat through EC.
4. **DOC17 does not own optimizer lifecycle.** It may request jobs and render results, but DOC8 owns replay, canaries, prompt variants, and promotion/retirement.
5. **DOC11 owns runtime prompt truth.** DOC17 may emit overlay facts and bounded packets, but final prompt assembly order and runtime truth surfaces are DOC11-owned.
6. **No live auto-mutation.** Overlay text and prompt recipes may not silently rewrite themselves based on live usage signals.
7. **Prompt Lab is offline only.** Promptolution and any future DSPy integration are allowed only in the offline evaluation lane for reusable artifacts.
8. **Truth surfaces are required.** No learned recommendation application or candidate promotion UI may become active unless DOC11 prompt-truth surfaces are available.
9. **Protected native context remains protected.** DOC17 may not mutate SOUL.md, OpenClaw MEMORY.md, or any file under `~/.openclaw/`.
10. **Shared contracts live in `packages/contracts/`.** No local drift.
### 0.2 Product Capability Summary
DOC17 R4.2 exposes six product capabilities.
1. **Overlay Library** — reusable overlays with scoped activation, pinning, conflict rules, refresh controls, re-answer flows, and visible runtime truth.
2. **Prompt Advisor** — deterministic prompt-gap analysis plus optional AI rewrite.
3. **Prompt Recipes** — saved reusable prompt artifacts with metadata and reuse hooks.
4. **Overlay Refresh & Re-Answer Controls** — one-click refresh on the next turn plus selected-turn re-answer flows without fake prompt injection.
5. **Overlay & Prompt-Artifact Feedback** — explicit and implicit signals captured for later evaluation.
6. **Prompt Lab Bridge** — governed offline evaluation for reusable prompt artifacts.
### 0.3 How Overlays Interact with LLM Statefulness
Overlays are additive bounded instructions that modify how a model should approach a task. They are not personas, not memories, and not hidden one-off developer notes. Overlays are applied only when active for the current surface and only through the DOC11 prompt-plan assembly path.
Overlays must respect the ELNOR precedence model:
1. Protected native context
2. Authority Memory
3. Workspace instructions
4. Room/participant role prompts when applicable
5. Active overlays
6. CIL knowledge nodes / heuristic memory
7. Recent summary chain
Overlays never outrank Authority Memory. Learned patterns never override user-authored standing orders.
Active live overlays are semantically applied on every turn while they remain active for the session. They are not one-shot messages embedded in chat history, and they must survive long threads, compaction, and model switching through the DOC11 prompt-plan assembly path.
### 0.4 Live vs. Offline Prompting
DOC17 distinguishes between **live prompting surfaces** and **offline evaluation**.
- **Live prompting surfaces**: free-text chat, rooms, panel tasks, and task execution. These use the active overlay set and Prompt Advisor. They must remain responsive and predictable.
- **Offline evaluation**: Prompt Lab jobs run on reusable artifacts only: overlays, prompt recipes, and prompt-advisor rewrite templates. They may propose candidates, but they do not silently alter active artifacts.
---
## 1) Storage (EC-Owned)
All DOC17 durable state is written by EC under `ELNOR_MEMORY/`.
### 1.1 Storage Layout
```text
ELNOR_MEMORY/
system/
overlays/
templates/
<overlay_id>.md
archived/
<overlay_id>.md
index.json
metadata.json
preset_manifest.json
install_state.json
sessions/
<session_key>.json
pins/
global.json
workspace/
<workspace_id>.json
prompt_recipes/
recipes/
<recipe_id>.md
archived/
<recipe_id>.md
index.json
metadata.json
prompt_advisor/
config.json
rewrite_templates/
default.md
legal.md
coding.md
prompt_lab/
bridge_config.json
feature_flags.json
worker/
queue/
pending/
running/
completed/
failed/
results/
candidates/
replay_summaries/
canary_summaries/
signals/
doc17/
overlay_feedback.jsonl
prompt_recipe_feedback.jsonl
prompt_advisor_feedback.jsonl
prompt_advisor_rewrites.jsonl
```
### 1.2 What DOC17 Owns vs. References
DOC17 owns the following storage families:
- overlay template markdown files
- overlay index and metadata views
- overlay session state
- overlay pin state
- prompt recipe markdown files
- prompt recipe index and metadata views
- prompt advisor config and rewrite template definitions
- DOC17 feedback signal append streams
DOC17 does **not** own durable optimizer artifacts such as prompt variants, replay assignments, canary assignments, or promotion records. Those belong to the shared prompt-learning substrate owned by DOC8 and referenced by DOC17.
### 1.3 Retention
- Overlay and prompt-recipe source files are durable until archived or deleted by explicit command.
- Session state files may be retained for up to 14 days after last activity and then pruned by EC maintenance.
- JSONL feedback signals follow Core retention policy for operational learning signals.
- Prompt Lab worker queue files are transient but durable until consumed or explicitly pruned.
### 1.4 Atomic Write Rules
All JSON writes must use write-temp-then-rename. All JSONL writes must append via EC queue serialization. Session and pin updates must take per-key locks to avoid collision.
---
## 2) Shared Contracts and Core Schemas (`packages/contracts/`)
The following schemas are normative. All shared definitions live under `packages/contracts/src/doc17/` or the designated shared package path.
### 2.1 Primitive Enums and IDs
```ts
import { z } from "zod";
export const OverlayIdSchema = z.string().regex(/^[a-z0-9-]{1,64}$/);
export const PromptRecipeIdSchema = z.string().uuid();
export const PromptArtifactKindSchema = z.enum([
"overlay_template",
"prompt_recipe",
"prompt_advisor_rewrite",
"room_role_prompt"
]);
export const OverlayArtifactClassSchema = z.enum([
"live_overlay",
"advisor_protocol",
"post_task_template"
]);
export const OverlayCategorySchema = z.enum([
"quality",
"analysis",
"legal",
"coding",
"creative",
"strategy",
"architecture",
"review",
"debrief"
]);
export const OverlayVisibilityTierSchema = z.enum([
"core",
"advanced",
"hidden"
]);
export const SupportedSurfaceSchema = z.enum([
"chat",
"room",
"panel",
"task",
"prompt_advisor",
"post_task"
]);
export const ModelClassSchema = z.enum([
"reasoning_native",
"general_chat",
"code_specialist",
"unknown"
]);
export const OverlayPinScopeSchema = z.enum([
"thread",
"workspace",
"until_off"
]);
export const OverlayActivationSourceSchema = z.enum([
"manual",
"recommended",
"workspace_pin",
"global_pin",
"room_default",
"moderator_default",
"task_default",
"candidate_canary"
]);
export const OverlayAdviceStateSchema = z.enum([
"available",
"degraded",
"unavailable"
]);
export const OverlayRefreshReasonSchema = z.enum([
"manual",
"model_switch",
"compaction",
"packet_trim",
"session_rehydrate",
"activation_change",
"deactivation_change"
]);
export const ReanswerContextModeSchema = z.enum([
"clean_context",
"selected_turns_only",
"thread_plus_selected"
]);
export const OverlayDispatchOverrideModeSchema = z.enum([
"none",
"use_active_overlays",
"one_time_overlay_only",
"use_active_plus_one_time_overlay"
]);
export const EvidenceLabelSchema = z.enum([
"no_local_evidence",
"promising",
"established",
"needs_tuning"
]);
```
### 2.2 Overlay Source Schema
```ts
export const OverlayTemplateFrontmatterSchema = z.object({
id: OverlayIdSchema,
title: z.string().min(1),
artifact_class: OverlayArtifactClassSchema,
category: OverlayCategorySchema,
description: z.string().min(1),
visibility_tier: OverlayVisibilityTierSchema.default("advanced"),
supported_surfaces: z.array(SupportedSurfaceSchema).min(1),
recommended_model_classes: z.array(ModelClassSchema).default(["unknown"]),
reasoning_model_caution: z.string().optional(),
conflicts_with: z.array(OverlayIdSchema).default([]),
warns_with: z.array(OverlayIdSchema).default([]),
compatible_categories: z.array(OverlayCategorySchema).default([]),
live_token_target: z.number().int().min(40).max(450),
estimated_tokens: z.number().int().min(1).optional(),
requires_explicit_invocation: z.boolean().default(false),
suggested_use_cases: z.array(z.string()).default([]),
avoid_when: z.array(z.string()).default([]),
structure_labels: z.array(z.string()).default([]),
research_tags: z.array(z.string()).default([]),
preset: z.boolean().default(false),
deprecated_aliases: z.array(z.string()).default([]),
version: z.string().default("1.0.0")
});
export const OverlayTemplateRecordSchema = z.object({
frontmatter: OverlayTemplateFrontmatterSchema,
body: z.string().min(1),
body_hash: z.string(),
created_at: z.string().datetime(),
updated_at: z.string().datetime(),
archived_at: z.string().datetime().optional(),
source: z.enum(["preset", "user"])
});
```
### 2.3 Overlay Derived Index and Metadata
```ts
export const OverlayIndexEntrySchema = z.object({
overlay_id: OverlayIdSchema,
title: z.string(),
artifact_class: OverlayArtifactClassSchema,
category: OverlayCategorySchema,
visibility_tier: OverlayVisibilityTierSchema,
supported_surfaces: z.array(SupportedSurfaceSchema),
estimated_tokens: z.number().int().min(1),
conflicts_with: z.array(OverlayIdSchema),
warns_with: z.array(OverlayIdSchema),
preset: z.boolean(),
archived: z.boolean().default(false),
updated_at: z.string().datetime()
});
export const OverlayMetadataSchema = z.object({
overlay_id: OverlayIdSchema,
structure_labels: z.array(z.string()).default([]),
classifier_version: z.string(),
risk_flags: z.array(z.string()).default([]),
evidence_label: EvidenceLabelSchema.default("no_local_evidence"),
last_feedback_at: z.string().datetime().optional(),
last_lab_tested_at: z.string().datetime().optional(),
recommendation_eligible: z.boolean().default(true),
prompt_lab_eligible: z.boolean().default(false),
degraded_reason: z.string().optional()
});
```
### 2.4 Session and Pin State
```ts
export const OverlaySessionKeySchema = z.object({
workspace_id: z.string(),
surface_kind: z.enum(["chat", "room", "panel", "task"]),
surface_id: z.string(),
thread_id: z.string().optional(),
room_id: z.string().optional(),
task_id: z.string().optional()
});
export const ActiveOverlaySchema = z.object({
activation_id: z.string().uuid(),
overlay_id: OverlayIdSchema,
source: OverlayActivationSourceSchema,
artifact_class: OverlayArtifactClassSchema.default("live_overlay"),
requested_scope: OverlayPinScopeSchema.optional(),
active_for_surface: z.boolean().default(true),
priority_rank: z.number().int().min(1).max(20).default(10),
activated_at: z.string().datetime(),
activated_by: z.string().default("user"),
overlay_body_hash: z.string(),
candidate_id: z.string().uuid().optional(),
note: z.string().optional()
});
export const OverlaySessionStateSchema = z.object({
session_key: OverlaySessionKeySchema,
revision: z.number().int().nonnegative(),
active_overlays: z.array(ActiveOverlaySchema).default([]),
hard_blocked_pairs: z.array(z.tuple([OverlayIdSchema, OverlayIdSchema])).default([]),
warnings: z.array(z.string()).default([]),
prompt_packet_hash: z.string().optional(),
force_refresh_next_turn: z.boolean().default(false),
pending_refresh_reason: OverlayRefreshReasonSchema.optional(),
last_applied_revision: z.number().int().nonnegative().optional(),
last_applied_packet_hash: z.string().optional(),
last_applied_at: z.string().datetime().optional(),
last_refresh_at: z.string().datetime().optional(),
last_refresh_reason: OverlayRefreshReasonSchema.optional(),
last_trimmed_overlay_ids: z.array(OverlayIdSchema).default([]),
updated_at: z.string().datetime(),
updated_by: z.string().default("system")
});
export const PinnedOverlayStoreSchema = z.object({
scope: OverlayPinScopeSchema,
workspace_id: z.string().optional(),
overlay_ids: z.array(OverlayIdSchema).default([]),
updated_at: z.string().datetime(),
updated_by: z.string().default("user")
});
```
### 2.5 Prompt Recipes
```ts
export const PromptRecipeFrontmatterSchema = z.object({
id: PromptRecipeIdSchema,
title: z.string().min(1),
description: z.string().min(1),
task_type: z.string().min(1),
compatible_overlay_ids: z.array(OverlayIdSchema).default([]),
recommended_model_classes: z.array(ModelClassSchema).default(["unknown"]),
labels: z.array(z.string()).default([]),
required_inputs: z.array(z.string()).default([]),
output_contract: z.string().optional(),
lab_eligible: z.boolean().default(false),
version: z.string().default("1.0.0")
});
export const PromptRecipeRecordSchema = z.object({
frontmatter: PromptRecipeFrontmatterSchema,
prompt_text: z.string().min(1),
body_hash: z.string(),
created_at: z.string().datetime(),
updated_at: z.string().datetime(),
archived_at: z.string().datetime().optional(),
saved_from_surface: SupportedSurfaceSchema.optional(),
source_prompt_ref: z.object({
thread_id: z.string().optional(),
message_id: z.string().optional()
}).optional()
});
```
### 2.6 Prompt Advisor Schemas
```ts
export const PromptGapSchema = z.object({
gap_id: z.string().uuid(),
gap_kind: z.enum([
"missing_goal",
"missing_scope",
"missing_constraints",
"missing_inputs",
"missing_output_contract",
"missing_evidence_policy",
"missing_format",
"missing_verification",
"missing_decision_rule",
"missing_context_binding"
]),
severity: z.enum(["low", "medium", "high"]),
message: z.string(),
suggested_fix: z.string()
});
export const PromptImprovementHintSchema = z.object({
hint_id: z.string().uuid(),
kind: z.enum(["structure", "constraint", "verification", "context", "format", "overlay", "recipe"]),
title: z.string(),
detail: z.string()
});
export const PromptAdvisorImproveRequestSchema = z.object({
draft_prompt_text: z.string().min(1),
workspace_id: z.string(),
surface_kind: SupportedSurfaceSchema,
session_key: OverlaySessionKeySchema.optional(),
active_overlay_ids: z.array(OverlayIdSchema).default([]),
prompt_recipe_id: PromptRecipeIdSchema.optional(),
task_type: z.string().optional(),
model_class_hint: ModelClassSchema.default("unknown")
});
export const PromptAdvisorImproveResponseSchema = z.object({
gaps: z.array(PromptGapSchema).default([]),
hints: z.array(PromptImprovementHintSchema).default([]),
suggested_overlay_ids: z.array(OverlayIdSchema).default([]),
suggested_prompt_recipe_ids: z.array(PromptRecipeIdSchema).default([]),
overlay_advice_state: OverlayAdviceStateSchema.default("available"),
prompt_recipe_suggestion_state: OverlayAdviceStateSchema.default("available"),
summary: z.string(),
can_rewrite: z.boolean().default(true)
});
export const PromptAdvisorRewriteRequestSchema = z.object({
draft_prompt_text: z.string().min(1),
workspace_id: z.string(),
surface_kind: SupportedSurfaceSchema,
session_key: OverlaySessionKeySchema.optional(),
active_overlay_ids: z.array(OverlayIdSchema).default([]),
prompt_recipe_id: PromptRecipeIdSchema.optional(),
requested_style: z.enum(["default", "legal", "coding", "creative", "concise"]).default("default")
});
export const PromptAdvisorRewriteResponseSchema = z.object({
rewritten_prompt_text: z.string(),
what_changed: z.array(z.string()).default([]),
suggested_overlay_ids: z.array(OverlayIdSchema).default([]),
suggested_prompt_recipe_ids: z.array(PromptRecipeIdSchema).default([]),
rewrite_template_id: z.string(),
rewrite_candidate_hash: z.string()
});
```
### 2.7 Overlay Feedback and Prompt-Artifact Feedback
```ts
export const OverlayFeedbackSchema = z.object({
feedback_id: z.string().uuid(),
overlay_id: OverlayIdSchema,
workspace_id: z.string(),
session_key: OverlaySessionKeySchema,
thread_id: z.string().optional(),
active_overlay_ids: z.array(OverlayIdSchema).default([]),
response_rating: z.enum(["positive", "negative", "neutral"]).optional(),
overlay_helpfulness_rating: z.enum(["helpful", "not_helpful"]).optional(),
failure_category: z.enum([
"ignored_instructions",
"missed_context",
"hallucination_or_format_error",
"too_verbose",
"wrong_mode",
"other"
]).optional(),
comment: z.string().optional(),
created_at: z.string().datetime()
});
export const PromptRecipeFeedbackSchema = z.object({
feedback_id: z.string().uuid(),
prompt_recipe_id: PromptRecipeIdSchema,
workspace_id: z.string(),
session_key: OverlaySessionKeySchema.optional(),
helpfulness_rating: z.enum(["helpful", "not_helpful"]).optional(),
comment: z.string().optional(),
created_at: z.string().datetime()
});
export const PromptImprovementProposalSchema = z.object({
proposal_id: z.string().uuid(),
artifact_kind: PromptArtifactKindSchema,
artifact_id: z.string(),
source_failure_modes: z.array(z.string()).default([]),
proposed_delta_summary: z.string(),
candidate_text: z.string(),
confidence_note: z.string(),
replay_required: z.boolean().default(true),
created_at: z.string().datetime()
});
```
### 2.8 Prompt Lab Bridge Schemas
```ts
export const PromptOptimizationBackendSchema = z.enum([
"promptolution",
"dspy"
]);
export const PromptLabJobStatusSchema = z.enum([
"queued",
"running",
"completed",
"failed",
"cancelled",
"stale"
]);
export const PromptLabJobSchema = z.object({
job_id: z.string().uuid(),
artifact_kind: PromptArtifactKindSchema,
artifact_id: z.string(),
backend: PromptOptimizationBackendSchema,
request_reason: z.enum([
"manual_test",
"nightly_proposal_eval",
"candidate_retest",
"canary_followup"
]),
requested_by: z.string().default("user"),
dataset_ref: z.string().optional(),
min_replay_examples: z.number().int().min(1).default(20),
status: PromptLabJobStatusSchema,
created_at: z.string().datetime(),
updated_at: z.string().datetime()
});
export const PromptCandidateSchema = z.object({
candidate_id: z.string().uuid(),
artifact_kind: PromptArtifactKindSchema,
artifact_id: z.string(),
candidate_text: z.string(),
backend: PromptOptimizationBackendSchema,
evidence_label: EvidenceLabelSchema.default("no_local_evidence"),
replay_summary_ref: z.string().optional(),
canary_summary_ref: z.string().optional(),
source_job_id: z.string().uuid(),
source_artifact_hash: z.string(),
created_at: z.string().datetime()
});
export const OverlayRuntimeTruthSummarySchema = z.object({
session_key: OverlaySessionKeySchema,
revision: z.number().int().nonnegative(),
requested_overlay_ids: z.array(OverlayIdSchema).default([]),
applied_overlay_ids: z.array(OverlayIdSchema).default([]),
dropped_overlay_ids: z.array(OverlayIdSchema).default([]),
drop_reasons_by_overlay: z.record(z.string()).default({}),
packet_hash: z.string().optional(),
refresh_reason: OverlayRefreshReasonSchema.optional(),
generated_at: z.string().datetime().optional(),
applied_at: z.string().datetime().optional()
});
export const OverlayDispatchOverrideSchema = z.object({
mode: OverlayDispatchOverrideModeSchema.default("none"),
overlay_ids: z.array(OverlayIdSchema).default([]),
clear_after_dispatch: z.boolean().default(true),
refresh_reason: OverlayRefreshReasonSchema.optional()
});
export const OverlayReanswerPlanSchema = z.object({
plan_id: z.string().uuid(),
session_key: OverlaySessionKeySchema,
selected_message_ids: z.array(z.string()).min(1).max(8),
context_mode: ReanswerContextModeSchema.default("selected_turns_only"),
overlay_override: OverlayDispatchOverrideSchema,
keep_overlay_active_after_dispatch: z.boolean().default(false),
prompt_seed: z.string().default("Re-answer the selected turns using the selected overlay constraints."),
created_at: z.string().datetime(),
expires_at: z.string().datetime()
});
export const OverlayListResponseSchema = z.object({
items: z.array(OverlayIndexEntrySchema),
total: z.number().int().min(0),
evidence_state: OverlayAdviceStateSchema.default("available")
});
export const OverlaySessionResponseSchema = z.object({
session_state: OverlaySessionStateSchema,
prompt_truth_available: z.boolean().default(false),
runtime_truth: OverlayRuntimeTruthSummarySchema.optional()
});
export const PromptRecipeListResponseSchema = z.object({
items: z.array(PromptRecipeRecordSchema),
total: z.number().int().min(0)
});
export const PromptLabCandidateReviewDecisionSchema = z.object({
candidate_id: z.string().uuid(),
decision: z.enum(["approve_for_canary", "reject", "save_as_new_version", "replace_current_version"]),
note: z.string().optional()
});
```
---
## 3) Runtime Integration and Cross-Document Wiring
### 3.1 Runtime Path for Free-Text Chat
Free-text chat remains `Q Frontend -> Q Backend -> Gateway`, but DOC17 overlays must still be available to the runtime without letting Q or Q Backend read ELNOR memory directly.
The runtime path is:
1. User activates, deactivates, pins, unpins, or refreshes overlays through Q.
2. Q sends a structured command to EC when the action mutates durable session state.
3. EC updates the relevant `OverlaySessionState` and returns the new `revision`.
4. For each free-text turn, Q Backend forwards the current `session_key`, `overlay_session_revision`, and any one-time overlay override or re-answer plan reference to the DOC11/Gateway request path.
5. Q Backend requests a bounded **OverlayPromptPacket** from EC when any of the following is true:
- it does not already hold the current revision for the session,
- `force_refresh_next_turn` is true,
- DOC11 reported packet trim/drop on the prior turn,
- DOC11 reported a compaction event for the same session,
- the user invoked `Refresh on next turn`,
- a one-time re-answer plan or overlay dispatch override is attached.
6. EC returns the packet as data, not as a final assembled prompt.
7. DOC11 uses the packet during prompt-plan assembly, applies ordering and budget rules, and records runtime truth.
8. Gateway dispatches through OpenClaw.
9. Runtime truth flows back to Q.
Q Backend must not concatenate overlay strings into a prompt on its own. It may ferry the EC-issued packet, but final assembly and truth ownership remain with DOC11.
### 3.1A Active Every Turn, Not One-Time Injection
Normative rule: active live overlays are semantically applied on **every turn while active**. They are not inserted once at the beginning of a chat and then trusted to survive through long history, compaction, or model changes.
This means:
- compaction affects conversation-history channels, not the active overlay channel,
- model switching does not clear active overlays,
- active overlays must remain visible in runtime truth on later turns until explicitly deactivated or dropped for a documented reason,
- manual refresh is an assurance and packet-regeneration tool, not the primary way overlays stay alive.
### 3.2 OverlayPromptPacket (EC-issued, DOC11-consumed)
```ts
export const OverlayPromptBlockSchema = z.object({
overlay_id: OverlayIdSchema,
title: z.string(),
body: z.string(),
body_hash: z.string(),
artifact_class: OverlayArtifactClassSchema,
priority_rank: z.number().int().min(1).max(20),
token_estimate: z.number().int().min(1),
position_hint: z.enum(["after_workspace_before_heuristic"]),
source: OverlayActivationSourceSchema
});
export const OverlayPromptPacketSchema = z.object({
session_key: OverlaySessionKeySchema,
revision: z.number().int().nonnegative(),
requested_overlay_ids: z.array(OverlayIdSchema),
overlay_blocks: z.array(OverlayPromptBlockSchema).default([]),
warnings: z.array(z.string()).default([]),
estimated_total_tokens: z.number().int().min(0).default(0),
soft_budget: z.number().int().min(1).default(800),
hard_budget: z.number().int().min(1).default(1200),
packet_hash: z.string(),
refresh_reason: OverlayRefreshReasonSchema.optional(),
generated_at: z.string().datetime()
});
```
DOC11 may reorder or trim blocks during final prompt-plan assembly, but it must preserve truth surfaces for:
- requested overlay IDs
- applied overlay IDs
- dropped overlay IDs
- trim reasons
- final overlay packet hash
- total prompt token estimate
### 3.3 Prompt Ordering Ownership
DOC17 supplies overlay artifacts and position hints. DOC11 owns final prompt-plan ordering. R4.2 explicitly replaces any DOC17 language that claims final ordering authority.
### 3.4 Caching Rules
Prompt caching matters, but DOC17 must not destroy provenance to chase cache hits.
Normative rules:
1. Activation notices are **UI-only**. They do not appear in system/developer/user prompt text.
2. Overlay activation/deactivation takes effect at the **next turn boundary**.
3. DOC11 should attempt to preserve stable packet shape when the overlay set is unchanged.
4. Overlay packets may be byte-identical across consecutive turns when nothing relevant changed; semantic overlay application still occurs on every turn while active.
5. Overlay changes may invalidate cache-relevant prefixes; this is allowed, but changes must be explicit and visible.
6. DOC17 must not inject overlays as fake user messages to preserve cache.
### 3.4A Refresh Semantics
DOC17 supports both automatic refresh triggers and manual refresh requests. Refresh does **not** create a second overlay system. It regenerates or explicitly reapplies the current overlay packet for the next turn under the same clean ownership chain.
Automatic refresh triggers:
- session rehydrate on page reload or carry-forward,
- DOC11 reported packet trim/drop on the prior turn,
- DOC11 reported compaction for the same session,
- model switch in the same session while active overlays exist,
- restore of an archived overlay that is still pinned.
Manual refresh trigger:
- the user clicks **Refresh on next turn** from the active overlay pill or issues `/overlay refresh`.
Refresh rules:
1. refresh requests set `force_refresh_next_turn = true` and store a `pending_refresh_reason`,
2. packet generation for the next turn clears the pending refresh flag and records `last_refresh_reason` and `last_refresh_at`,
3. refresh does not change the active overlay set unless a hard conflict, archive event, or trim/drop rule requires it,
4. DOC11 runtime truth must expose the refresh reason for the turn if one was present,
5. model switch does not deactivate overlays; it only permits a refresh marker and packet rebuild if needed.
### 3.4B One-Time Re-Answer and Dispatch Overrides
DOC17 supports a one-time **Re-answer with overlay…** flow for selected turns. This flow exists for long conversations, compaction-heavy threads, and targeted redo operations.
Normative rules:
1. one-time re-answer must not be implemented by stuffing overlay text into fake user or developer messages,
2. a re-answer operation creates an `OverlayReanswerPlan` with a bounded selected-turn set and an `OverlayDispatchOverride`,
3. if `keep_overlay_active_after_dispatch = false`, the re-answer operation must not mutate durable overlay session state,
4. if `keep_overlay_active_after_dispatch = true`, the route must first apply the requested overlay through the normal session mutation path and then return the re-answer plan,
5. re-answer plans expire and cannot be replayed indefinitely,
6. DOC11 runtime truth must show when a one-time override was used on a dispatch.
### 3.5 Composability and Live Budget Policy
DOC17 R4.2 recognizes that instruction interference is often a bigger risk than absolute token size.
Normative live budget policy:
- Recommended active live overlays: **1–2**
- Maximum active live overlays without explicit override: **3**
- Soft live budget: **800 tokens** across all live overlays
- Hard live budget: **1200 tokens** across all live overlays
- Advisor protocols and post-task templates do **not** count toward the live-overlay budget because they are not persistently injected
Conflict policy:
- **Hard block** if an activation request contains a directly contradictory pair.
- **Soft warn** if the pair is merely likely to overconstrain or duplicate behavior.
- Conflicts must be symmetric in metadata.
Hard-block pairs in R4.2:
- `creative-flow × guided-verification`
- `creative-flow × dont-make-mistakes`
- `creative-flow × legal-strict`
- `creative-flow × legal-research`
- `spec-architect × concise-executive`
Soft warnings in R4.2:
- `dont-make-mistakes × guided-verification`
- `legal-strict × legal-research`
- `creative-innovation × concise-executive`
- `self-consistency-check × verify-first`
### 3.5A Session Bootstrap and Rehydration
When a surface starts a new session, EC must materialize `OverlaySessionState` in this order:
1. room defaults if the surface is a room,
2. moderator defaults if adopted for the surface,
3. workspace pins,
4. global `until_off` pins,
5. explicit user carry-forward flags from the immediately prior session if the UI requested carry-forward.
Bootstrap rules:
- archived overlays are skipped and reported as warnings,
- post-task templates are never inserted into live session state,
- advisor protocols are not auto-activated,
- hard-conflict pairs are trimmed before the initial state is returned,
- the resulting initial session revision is `0` for a brand-new state and increments only on subsequent mutation.
Rehydration rules:
- page refresh on the same session key restores current active live overlays,
- thread pins stay bound to the thread-specific session key,
- workspace pins reapply on new thread creation in the same workspace,
- `until_off` pins reapply globally until explicitly removed.
### 3.5B Session Mutation Semantics
Mutation semantics are explicit.
- activating an overlay that is already active is idempotent unless the source or pin scope changes,
- deactivating a non-active overlay is a no-op,
- requesting refresh on an unchanged session is allowed and sets `force_refresh_next_turn = true`,
- archiving an active overlay removes it from subsequent packet builds and adds a warning to affected session states,
- restoring an overlay does not auto-reactivate it unless a workspace or global pin still applies.
### 3.6 Session Boundary Rules
Overlay scope is explicit.
- **Thread** pin: survives page refresh for the current thread/session key only.
- **Workspace** pin: applies to new sessions in the workspace until removed.
- **Until off** pin: applies to all new sessions until removed.
Surface-specific rules:
- `chat`: session key includes workspace + thread.
- `room`: room-level overlay state is separate from participant role prompts.
- `panel`: overlay state is tied to panel session.
- `task`: overlay state is tied to task execution context.
### 3.7 Room and Moderator Integration (DOC12 / DOC6)
Overlay precedence inside a room is:
1. room role prompt
2. participant role overlay
3. active room overlays
4. user/session overlays
DOC17 does not own moderator profile schemas. If DOC6 does not accept `default_overlay_ids`, DOC17 stores moderator overlay defaults in an extension store and DOC6 reads them by reference.
### 3.8 Running Brief / OCM / Handoff Integration
When a surface generates a Running Brief or OCM handoff, the brief must include:
- active overlay IDs
- prompt recipe ID if used
- candidate/canary IDs if applicable
- evidence label if the artifact came from Prompt Lab
### 3.9 Cross-Document Delta Matrix
| Companion | DOC17 R4.2 requirement |
|---|---|
| **Core** | EC-owned storage, command queue serialization, scheduler hooks, maintenance for session pruning |
| **DOC8** | shared prompt-learning substrate ownership for replay/canary/promotion; DOC17 emits observations and job requests only |
| **DOC10** | route facts may include overlay session revision, overlay availability state, and one-time re-answer override references; overlay registry events distinct from general capability events |
| **DOC11** | prompt-plan assembly, overlay packet consumption, runtime truth surfaces, trim/drop reasons, refresh reasons, one-time override truth, feature gates |
| **DOC14** | shared prompt-learning substrate; `PromptArtifactKind` expanded to support `prompt_recipe`; no duplicate optimizer logic |
| **DOC15** | recommendation retrieval, `ResolvedOperation` additive fields, context facts include active overlay IDs and prompt recipe references |
| **DOC6** | moderator/profile extension seam for overlay defaults; intervention events may reference overlay IDs |
| **DOC12** | room participant overlay state and room-level active overlay truth |
| **DOC9** | overlay failure and prompt-artifact failure observations available to repair/review flows |
---
## 4) Capability Awareness, Registry Hooks, and Recommendations
### 4.1 Overlay Registry Hooks
Overlays are not generic capabilities. They require their own event family.
```ts
export const OverlayRegistryEventSchema = z.object({
event_type: z.enum([
"overlay.installed",
"overlay.updated",
"overlay.archived",
"overlay.restored",
"prompt_recipe.saved",
"prompt_recipe.updated",
"prompt_recipe.archived"
]),
overlay_id: OverlayIdSchema.optional(),
prompt_recipe_id: PromptRecipeIdSchema.optional(),
body_hash: z.string().optional(),
created_at: z.string().datetime()
});
```
DOC10 may subscribe to these for local index refresh or intent-surface discovery, but overlay text edits are not treated as generic `capability.updated` events.
### 4.2 DOC15 Additive Fields
`ResolvedOperationSchema` must grow additively:
```ts
export const ResolvedOperationSchema = ExistingResolvedOperationSchema.extend({
suggested_overlay_ids: z.array(OverlayIdSchema).default([]),
suggested_prompt_recipe_ids: z.array(PromptRecipeIdSchema).default([]),
prompt_improvement_hints: z.array(PromptImprovementHintSchema).default([]),
overlay_advice_state: OverlayAdviceStateSchema.default("unavailable"),
prompt_recipe_suggestion_state: OverlayAdviceStateSchema.default("unavailable")
});
```
If retrieval or recommendation generation is unavailable, DOC15 must return explicit degraded state instead of silently returning empty arrays without explanation.
### 4.3 Configuration Tuple Extensions
`ConfigurationTupleRefSchema` must add:
```ts
active_overlay_ids: z.array(OverlayIdSchema).default([]),
prompt_recipe_id: PromptRecipeIdSchema.optional(),
prompt_artifact_ids: z.array(z.string()).default([])
```
### 4.4 Feature Gates
The following features are gated by companion-doc readiness.
- **Learned overlay recommendation application** requires DOC11 prompt truth and DOC15 recommendation retrieval.
- **Candidate apply/promotion UI** requires DOC8 replay/canary ownership and DOC11 prompt truth.
- **Room participant overlay recommendations** require DOC12 integration.
- **Moderator profile overlay defaults** require either DOC6 schema adoption or the documented extension-store fallback.
---
## 5) Overlay Structure Classifier and Save-Time Validation
### 5.1 Purpose
DOC17 uses a two-part analysis model for overlays.
1. **Source frontmatter** is authoritative for curated presets and user-authored overlays where the author supplies metadata.
2. **Classifier-derived metadata** augments frontmatter with structure labels, risk flags, and evidence labels.
For presets, frontmatter is authoritative. The classifier must not override curated preset metadata.
### 5.2 Structure Labels
R4.2 retains and updates the structure taxonomy.
Required structure labels:
- `verification_checklist`
- `decomposition`
- `step_back`
- `pre_mortem`
- `counterargument`
- `creative_divergence`
- `legal_framework`
- `code_security`
- `spec_execution`
- `elicitation`
- `architecture_review`
- `verification_first`
- `diverse_sampling`
`chain_of_thought_explicit` is reserved and not used by default preset overlays in R4.2.
### 5.3 Save-Time Validation Rules
EC must reject overlays or prompt recipes that violate any of the following.
1. Invalid slug/ID.
2. Unsupported or unknown top-level frontmatter keys unless explicitly allowlisted.
3. YAML aliases/custom tags.
4. Duplicate IDs.
5. Unescaped reserved frame markers used by the prompt-plan renderer.
6. Direct attempts to override native/safety/authority hierarchy such as:
- “ignore all previous instructions”
- “disregard safety rules”
- “override system prompt”
- “do not reveal provenance/truth”
7. Hard conflicts not declared symmetrically.
8. Live overlays exceeding the hard live body limit of **450 estimated tokens**.
9. Advisor protocols or post-task templates incorrectly marked as `live_overlay`.
### 5.4 Parser Hardening
Normative parser requirements:
- strip BOMs
- normalize CRLF to LF
- parse frontmatter with safe YAML settings only
- reject custom tags
- reject aliases with `maxAliasCount: 0`
- preserve literal body text exactly after frontmatter parsing
- treat fenced examples in the spec as publication wrappers unless explicitly marked installable
---
## 6) Prompt Advisor
### 6.1 Purpose
Prompt Advisor is the live “help me make this prompt better” tool in Q. It has two distinct operations.
1. **Improve** — deterministic analysis. No model call required.
2. **Rewrite** — optional model-assisted rewrite using a controlled prompt-advisor rewrite template.
### 6.2 What Prompt Advisor Must and Must Not Do
Prompt Advisor must:
- analyze prompt drafts for missing goal/constraints/output contract/verifications
- recommend overlays or prompt recipes when appropriate
- stay fast and predictable
- make rewrite changes visible before the user accepts them
Prompt Advisor must not:
- silently send ad hoc drafts to Prompt Lab
- mutate active overlays
- create saved prompt recipes without explicit user action
- claim recommendation certainty when the suggestion substrate is degraded
### 6.3 Improve Flow (Deterministic)
Flow:
1. User clicks **Improve Prompt** in the composer.
2. Q calls `POST /api/prompt-advisor/improve`.
3. EC runs deterministic gap analysis using the draft, current surface, active overlays, current workspace, and available prompt-artifact suggestions.
4. Response returns gaps, hints, suggested overlays, optional suggested prompt recipes, and a short summary.
5. Q displays the result in the Prompt Advisor panel.
### 6.4 Rewrite Flow (Model-Assisted)
Flow:
1. User reviews improve results.
2. User clicks **Rewrite**.
3. Q calls `POST /api/prompt-advisor/rewrite`.
4. EC assembles a rewrite request using the selected rewrite template (`default`, `legal`, `coding`, `creative`, `concise`) and current prompt context.
5. Gateway/OpenClaw runs the rewrite.
6. Q shows the rewritten prompt, `What changed`, and any overlay/recipe suggestions.
7. User may **Use**, **Dismiss**, or **Save as Prompt Recipe**.
### 6.5 Prompt Advisor Rewrite Template Rules
Rewrite templates are reusable prompt artifacts under DOC17 storage but represented to the shared substrate as `prompt_advisor_rewrite` artifacts.
Template rules:
- wrap user text in a fenced or XML-like container
- allowlist placeholders only
- fail on unknown placeholders
- define loop/render order explicitly if loops are supported
- record template body hash and version
### 6.6 Prompt Advisor Feedback
EC must record:
- improve viewed
- rewrite requested
- rewrite accepted
- rewrite dismissed
- rewrite accepted then edited
- save as recipe after rewrite
These are observational signals only. They are not automatic promotion signals.
---
## 7) Prompt Recipes
### 7.1 Purpose
A Prompt Recipe is a saved reusable prompt artifact that the user can insert into the composer or reuse later. Prompt Recipes are for repeated workflows, not for live overlay injection.
### 7.2 What a Prompt Recipe Is Not
A Prompt Recipe is not:
- an overlay
- a memory node
- a hidden system prompt
- a canary assignment
- a Prompt Lab job result by itself
### 7.3 Save-as-Recipe UX
From the composer or rewrite result, the user can click **Save as Prompt Recipe**.
The save flow must capture:
- title
- description
- task type
- compatible overlays
- required inputs
- output contract (optional)
- whether the recipe is lab-eligible
The saved recipe is inserted into the composer when invoked. It is not automatically injected into the system prompt.
### 7.4 Runtime Behavior
Prompt Recipes operate at the user-prompt layer.
- Invoke recipe -> fills or replaces composer text.
- Recipe may suggest compatible overlays.
- Recipe usage is carried as an observational context fact (`prompt_recipe_id`) but not injected as a system overlay.
### 7.5 Prompt Recipe Eligibility for Prompt Lab
A recipe is eligible for Prompt Lab only if:
- it is saved as a reusable artifact,
- it has a stable body hash,
- it has enough replay examples or the user explicitly requests testing,
- it passes save-time validation.
---
## 8) Preset Overlay Library (21 Canonical Presets)
R4.2 reorganizes the library into **Core live overlays**, **Advanced live overlays**, **Advisor protocols**, and **Post-task templates**. The preset count rises to 21 because two experimental overlays are added (`verify-first` and `diverse-sampling`) while `reflexion-learning` is retained as a compatibility ID but reclassified as a post-task template.
### 8.1 Canonical Classes
#### Core live overlays (default surfaced)
1. `guided-verification`
2. `concise-executive`
3. `step-back-principles`
4. `structured-decomposition`
5. `secure-code`
6. `legal-research`
7. `dont-make-mistakes`
8. `legal-strict`
#### Advanced live overlays
9. `pre-mortem-risk-analysis`
10. `red-team-aggressive`
11. `self-consistency-check`
12. `spec-implementation`
13. `tree-of-approaches`
14. `creative-innovation`
15. `counterargument-generator`
16. `creative-flow`
17. `verify-first`
18. `diverse-sampling`
#### Advisor protocols (not persistent live overlays)
19. `spec-architect`
20. `elicitation`
#### Post-task template
21. `reflexion-learning` (display title: **Task Debrief**)
### 8.2 Overlay Metadata and Text
All preset overlays below are canonical install content. Coding agents should extract these source blocks into markdown files with the frontmatter shown.
---
### 8.2.1 `dont-make-mistakes.md`
```md
---
id: dont-make-mistakes
title: Don’t Make Mistakes
artifact_class: live_overlay
category: quality
description: Light caution overlay for high-stakes outputs where omission and careless error matter.
visibility_tier: core
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, code_specialist, unknown]
conflicts_with: []
warns_with: [guided-verification]
live_token_target: 180
suggested_use_cases:
- High-stakes drafting
- Code changes touching contracts or security-sensitive paths
- Summaries that will be relied on
avoid_when:
- Pure brainstorming
- Very small acknowledgments or short clarifications
structure_labels: [verification_checklist]
research_tags: [refinebench, checklisting]
preset: true
version: 1.0.0
---
Apply this protocol to substantive outputs the user will rely on, cite, file, or execute. For trivial acknowledgments, ordinary confirmations, or one-line clarifications, respond naturally.
Before finalizing, do a short caution pass:
1. Check whether the response directly answers the user’s actual request.
2. Check whether any critical assumption is unsupported or unstated.
3. Check whether an omitted caveat, missing step, or wrong format would materially hurt the user.
4. Tighten any vague claim, invented detail, or careless overstatement.
Do not narrate the checklist unless the user asks. Use the caution pass to improve the output, not to inflate it.
```
### 8.2.2 `creative-flow.md`
```md
---
id: creative-flow
title: Creative Flow
artifact_class: live_overlay
category: creative
description: Ideation-only overlay for fast divergent generation while preserving explicit user constraints.
visibility_tier: advanced
supported_surfaces: [chat, room, panel]
recommended_model_classes: [general_chat, unknown]
conflicts_with: [guided-verification, dont-make-mistakes, legal-strict, legal-research]
warns_with: [creative-innovation]
live_token_target: 150
suggested_use_cases:
- Brainstorming names, angles, motifs, and rough concepts
- Early-stage ideation before evaluation
avoid_when:
- Final legal work
- Verified analysis
- Source-grounded research
structure_labels: [creative_divergence]
research_tags: [idea_divergence]
preset: true
version: 1.0.0
---
Use this mode only for brainstorming and early ideation.
Generate quickly and divergently. Relax internal filtering, but do not violate explicit user constraints, required format, safety rules, or deliverable boundaries.
Prefer:
- multiple distinct angles over one polished answer,
- novelty over premature optimization,
- rough but interesting options over a single safe option.
Unless the user asks for ranking or refinement, stop after generating the ideas. Do not silently switch into a verification or executive-summary mode.
```
### 8.2.3 `pre-mortem-risk-analysis.md`
```md
---
id: pre-mortem-risk-analysis
title: Pre-Mortem Risk Analysis
artifact_class: live_overlay
category: review
description: Assume the plan failed and identify the most likely reasons in advance.
visibility_tier: advanced
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, unknown]
conflicts_with: []
warns_with: []
live_token_target: 140
suggested_use_cases:
- Strategy review
- Plan review
- Release review
- Procedural risk review
avoid_when:
- Pure drafting where the user only wants prose
structure_labels: [pre_mortem]
research_tags: [premortem]
preset: true
version: 1.0.0
---
For this task, assume the plan or output fails in the real world. Identify the most likely failure modes before recommending action.
Focus on:
1. the top risks,
2. the trigger or condition that would make each risk material,
3. the concrete mitigation or monitoring step.
Keep the analysis practical. Prefer fewer real risks over a long list of generic cautions.
```
### 8.2.4 `red-team-aggressive.md`
```md
---
id: red-team-aggressive
title: Red-Team Aggressive
artifact_class: live_overlay
category: review
description: Adversarial critique overlay for stress-testing a document, plan, or argument.
visibility_tier: advanced
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, reasoning_native, unknown]
conflicts_with: []
warns_with: [counterargument-generator]
live_token_target: 220
suggested_use_cases:
- Spec review
- Attack memo review
- Security review
- Policy stress test
avoid_when:
- Friendly brainstorming
- Light editing
structure_labels: [counterargument]
research_tags: [adversarial_review]
preset: true
version: 1.0.0
---
Use this mode only when the user requests critique, adversarial review, or high-stakes stress-testing.
Attack the document, plan, or design as if you are trying to break it. Prefer evidence-bounded attacks over dramatic speculation.
For each major flaw you raise, try to show:
1. what exactly is weak,
2. why it fails under pressure,
3. what concrete evidence or scenario supports the criticism,
4. what fix would materially reduce the weakness.
Do not invent hidden facts just to make the critique harsher.
```
### 8.2.5 `concise-executive.md`
```md
---
id: concise-executive
title: Concise Executive
artifact_class: live_overlay
category: quality
description: Compress the response to decision-grade essentials.
visibility_tier: core
supported_surfaces: [chat, room, panel]
recommended_model_classes: [general_chat, unknown]
conflicts_with: []
warns_with: [creative-innovation, legal-research]
live_token_target: 100
suggested_use_cases:
- Briefing a principal
- Decision memo summary
- Email-ready summary
avoid_when:
- Detailed source-grounded legal analysis unless explicitly requested
structure_labels: [verification_checklist]
research_tags: [constraint_specification]
preset: true
version: 1.0.0
---
Deliver the response in executive form: brief, direct, and decision-oriented.
Prioritize:
- the answer,
- the main reasons,
- the next action,
- any deal-breaking caveat.
Do not bloat the response with process narration. If detail is necessary, keep it tightly structured and subordinate to the main answer.
```
### 8.2.6 `reflexion-learning.md` (Task Debrief)
```md
---
id: reflexion-learning
title: Task Debrief
artifact_class: post_task_template
category: debrief
description: Short end-of-task reflection template for completed or failed substantial work.
visibility_tier: hidden
supported_surfaces: [post_task]
recommended_model_classes: [general_chat, unknown]
conflicts_with: []
warns_with: []
live_token_target: 120
suggested_use_cases:
- End-of-task debrief
- Failed attempt review
- Postmortem on a completed complex task
avoid_when:
- Trivial exchanges
- Ordinary acknowledgments
structure_labels: [verification_checklist]
research_tags: [reflexion]
preset: true
deprecated_aliases: [task-debrief]
version: 1.0.0
---
Use this template only after a substantial completed task, a failed attempt, or an explicit request for reflection.
Provide a short debrief with three parts:
1. What worked.
2. What failed or remained weak.
3. One concrete improvement for the next attempt.
Do not claim that anything was logged, stored, or learned by the system unless the runtime explicitly asks for a structured artifact.
```
### 8.2.7 `step-back-principles.md`
```md
---
id: step-back-principles
title: Step-Back Principles
artifact_class: live_overlay
category: analysis
description: Abstract the governing principles before solving the case at hand.
visibility_tier: core
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, reasoning_native, unknown]
conflicts_with: []
warns_with: []
live_token_target: 150
suggested_use_cases:
- Ambiguous analysis
- Legal reasoning
- Architecture decisions
- Strategy framing
avoid_when:
- Purely mechanical formatting tasks
structure_labels: [step_back]
research_tags: [step_back_prompting]
preset: true
version: 1.0.0
---
Before committing to a specific answer, briefly identify the governing principles, framework, or decision criteria that should control the task.
Then apply those principles to the actual facts or request.
Keep the abstraction useful and short. The point is to improve the answer, not to perform philosophy for its own sake.
```
### 8.2.8 `self-consistency-check.md`
```md
---
id: self-consistency-check
title: Self-Consistency Check
artifact_class: live_overlay
category: analysis
description: Compare more than one reasoning path when the task benefits from cross-checking.
visibility_tier: advanced
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, unknown]
reasoning_model_caution: Often redundant on reasoning-native models; prefer use only when explicit path comparison is valuable.
conflicts_with: []
warns_with: [verify-first]
live_token_target: 200
suggested_use_cases:
- Ambiguous reasoning tasks
- Legal interpretation choices
- Decision tradeoff analysis
avoid_when:
- Straightforward factual lookups
- Trivial tasks
structure_labels: [decomposition]
research_tags: [self_consistency]
preset: true
version: 1.0.0
---
When the task benefits from cross-checking, generate at least two genuinely different reasoning paths before settling on the answer.
The paths should differ in approach, not merely wording. Compare the outcomes and resolve the disagreement explicitly.
If the task is simple or the second path would add noise instead of signal, respond normally.
```
### 8.2.9 `structured-decomposition.md`
```md
---
id: structured-decomposition
title: Structured Decomposition
artifact_class: live_overlay
category: analysis
description: Break the task into explicit subproblems before synthesis.
visibility_tier: core
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, reasoning_native, code_specialist, unknown]
conflicts_with: []
warns_with: []
live_token_target: 170
suggested_use_cases:
- Multi-part tasks
- Spec review
- Code review
- Long-form reasoning
avoid_when:
- Tiny single-step requests
structure_labels: [decomposition]
research_tags: [plan_and_solve, least_to_most]
preset: true
version: 1.0.0
---
Break the task into the smallest useful set of subproblems, solve them in a logical order, and then synthesize the result.
Use decomposition to reduce omissions and hidden leaps. Do not over-fragment the task into busywork.
```
### 8.2.10 `legal-strict.md`
```md
---
id: legal-strict
title: Legal Strict
artifact_class: live_overlay
category: legal
description: Formal drafting and argument-discipline overlay for legal prose.
visibility_tier: core
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, reasoning_native, unknown]
conflicts_with: [creative-flow]
warns_with: [legal-research]
live_token_target: 190
suggested_use_cases:
- Drafting briefs
- Motion sections
- Formal legal memoranda
avoid_when:
- Open-ended brainstorming
- Source gathering without drafting
structure_labels: [legal_framework]
research_tags: [legal_prompting]
preset: true
version: 1.0.0
---
Use a formal legal register and disciplined reasoning structure.
Prioritize:
1. issue framing,
2. rule/framework,
3. application,
4. conclusion,
5. disciplined qualifiers where the record or authority is incomplete.
This overlay is for drafting and formal legal analysis. It is not a substitute for source-grounded legal research.
```
### 8.2.11 `tree-of-approaches.md`
```md
---
id: tree-of-approaches
title: Tree of Approaches
artifact_class: live_overlay
category: strategy
description: Explore multiple candidate strategies before selecting one.
visibility_tier: advanced
supported_surfaces: [chat, room, panel]
recommended_model_classes: [general_chat, reasoning_native, unknown]
conflicts_with: []
warns_with: [concise-executive]
live_token_target: 210
suggested_use_cases:
- Strategy planning
- Product/design alternatives
- Litigation options analysis
avoid_when:
- Requests that need one direct answer only
structure_labels: [decomposition]
research_tags: [tree_of_thoughts]
preset: true
version: 1.0.0
---
Consider more than one viable approach before recommending a path.
For each serious approach, identify:
- its upside,
- its downside,
- the condition under which it becomes preferable.
Then recommend the best approach or explain why the choice remains contingent.
```
### 8.2.12 `guided-verification.md`
```md
---
id: guided-verification
title: Guided Verification
artifact_class: live_overlay
category: quality
description: Full verification checklist overlay for substantive outputs.
visibility_tier: core
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, code_specialist, reasoning_native, unknown]
conflicts_with: [creative-flow]
warns_with: [dont-make-mistakes]
live_token_target: 230
suggested_use_cases:
- Legal analysis
- Code review and implementation
- Specs and plans
- High-stakes summaries
avoid_when:
- Trivial acknowledgments
- Pure ideation
structure_labels: [verification_checklist]
research_tags: [refinebench, guided_refinement]
preset: true
version: 1.0.0
---
Apply the full checklist only to substantive outputs the user will use, rely on, cite, or act on. For acknowledgments, small clarifications, or simple housekeeping turns, respond naturally.
Before finalizing a substantive output, verify:
1. Does it answer the actual request?
2. Are the key assumptions explicit?
3. Is any important evidence, caveat, dependency, or edge case missing?
4. Is the structure appropriate to the task?
5. Is there any unsupported claim, fabricated detail, or misleading certainty?
6. Is the final answer tighter and clearer after the check?
Use the checklist to improve the result silently unless the user asks to see the verification notes.
```
### 8.2.13 `spec-implementation.md`
```md
---
id: spec-implementation
title: Spec Implementation
artifact_class: live_overlay
category: coding
description: Spec-first implementation overlay for code, schema, and wiring work.
visibility_tier: advanced
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [code_specialist, general_chat, unknown]
conflicts_with: []
warns_with: [guided-verification, secure-code]
live_token_target: 240
suggested_use_cases:
- Implementing from a spec
- Contract or schema changes
- Wiring tasks across modules
avoid_when:
- Pure brainstorming
structure_labels: [spec_execution, decomposition]
research_tags: [coding_best_practices]
preset: true
version: 1.0.0
---
Apply this overlay to implementation tasks that change code, schemas, commands, routes, or wiring.
Default protocol:
1. Read the relevant spec and current code context first.
2. Plan the change in dependency order.
3. Implement in bounded steps.
4. Verify modified contracts, touched paths, and obvious regressions.
5. If one blocking ambiguity remains, state the safest assumption briefly and continue.
Do not invent interfaces, file paths, route names, or schema ownership boundaries.
```
### 8.2.14 `secure-code.md`
```md
---
id: secure-code
title: Secure Code
artifact_class: live_overlay
category: coding
description: Secure implementation and review overlay.
visibility_tier: core
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [code_specialist, general_chat, unknown]
conflicts_with: []
warns_with: [spec-implementation]
live_token_target: 210
suggested_use_cases:
- Security-sensitive code
- Input handling
- Auth, file I/O, parser work
avoid_when:
- Non-code tasks
structure_labels: [code_security, verification_checklist]
research_tags: [forge_2025, coding_security]
preset: true
version: 1.0.0
---
Prioritize secure defaults, input validation, least privilege, explicit trust boundaries, and failure-safe handling.
Before finalizing code or code review guidance, check for:
1. input validation,
2. injection or traversal risk,
3. unsafe parsing or deserialization,
4. auth or authorization gaps,
5. secrets or sensitive-data handling,
6. dangerous defaults or silent failure paths.
```
### 8.2.15 `legal-research.md`
```md
---
id: legal-research
title: Legal Research
artifact_class: live_overlay
category: legal
description: Source-grounded legal research overlay with anti-fabrication discipline.
visibility_tier: core
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, reasoning_native, unknown]
conflicts_with: [creative-flow]
warns_with: [legal-strict]
live_token_target: 260
suggested_use_cases:
- Case-law or statute analysis
- Research memos
- Authority comparison
avoid_when:
- Pure prose drafting without source work
structure_labels: [legal_framework, verification_checklist]
research_tags: [legal_prompting, citation_hallucination]
preset: true
version: 1.0.0
---
For legal research, use a three-stage structure:
1. Define the legal question precisely.
2. Establish the governing framework or authority landscape.
3. Apply the framework to the facts or issue presented.
Do not fabricate quotations, pinpoint cites, or holdings. If authority is not available in provided context or verified retrieval, say so clearly and mark the statement as unverified rather than pretending certainty.
```
### 8.2.16 `creative-innovation.md`
```md
---
id: creative-innovation
title: Creative Innovation
artifact_class: live_overlay
category: creative
description: Structured novelty overlay for generating stronger and more varied ideas.
visibility_tier: advanced
supported_surfaces: [chat, room, panel]
recommended_model_classes: [general_chat, unknown]
conflicts_with: []
warns_with: [concise-executive, creative-flow]
live_token_target: 220
suggested_use_cases:
- Product ideas
- Strategy ideas
- Framing alternatives
avoid_when:
- Purely factual retrieval
structure_labels: [creative_divergence]
research_tags: [idea_divergence, creativity]
preset: true
version: 1.0.0
---
Generate ideas that are meaningfully different from one another rather than slight variations on the same default answer.
When helpful, vary the lens across ideas, for example by changing:
- assumptions,
- constraints,
- cost level,
- source domain of inspiration,
- time horizon.
After generating the ideas, identify which ones are novel, feasible, or high-risk.
```
### 8.2.17 `counterargument-generator.md`
```md
---
id: counterargument-generator
title: Counterargument Generator
artifact_class: live_overlay
category: review
description: Generate the strongest counterarguments to a position.
visibility_tier: advanced
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, reasoning_native, unknown]
conflicts_with: []
warns_with: [red-team-aggressive]
live_token_target: 170
suggested_use_cases:
- Brief preparation
- Litigation strategy
- Debate or opposition prep
avoid_when:
- Friendly brainstorming
structure_labels: [counterargument]
research_tags: [adversarial_review]
preset: true
version: 1.0.0
---
Assume a capable opponent is trying to defeat the current position. Generate the strongest counterarguments rather than easy straw men.
For each major counterargument, identify:
- the pressure point,
- why it is persuasive,
- what rebuttal or mitigation is available.
```
### 8.2.18 `spec-architect.md`
```md
---
id: spec-architect
title: Spec Architect
artifact_class: advisor_protocol
category: architecture
description: Full architecture/spec review protocol for major design work.
visibility_tier: hidden
supported_surfaces: [prompt_advisor, task]
recommended_model_classes: [general_chat, reasoning_native, unknown]
conflicts_with: [concise-executive]
warns_with: []
live_token_target: 380
requires_explicit_invocation: true
suggested_use_cases:
- Major spec drafting
- Architecture review
- Cross-document consistency review
avoid_when:
- Small edits
- One-off narrow questions
structure_labels: [architecture_review, decomposition]
research_tags: [architecture_review]
preset: true
version: 1.0.0
---
Use the full protocol only for major architecture drafts, subsystem specs, or formal spec reviews.
When active as an advisor protocol:
1. Enumerate the required sections, interfaces, and owner-doc seams.
2. Draft or review against that manifest.
3. Flag omissions, conflicts, schema drift, and ownership violations.
4. Mark any deferred item explicitly with rationale.
Do not claim completeness unless every manifest item is covered or explicitly deferred.
```
### 8.2.19 `elicitation.md`
```md
---
id: elicitation
title: Elicitation
artifact_class: advisor_protocol
category: analysis
description: Targeted clarifying-question protocol for materially ambiguous tasks.
visibility_tier: hidden
supported_surfaces: [prompt_advisor, task]
recommended_model_classes: [general_chat, reasoning_native, unknown]
conflicts_with: []
warns_with: []
live_token_target: 220
requires_explicit_invocation: true
suggested_use_cases:
- Ambiguous requirements
- Under-specified requests
- Intake and scoping work
avoid_when:
- Clear requests
- Speed-critical tasks where assumptions are acceptable
structure_labels: [elicitation]
research_tags: [gate, star-gate, ambiguity_questions]
preset: true
version: 1.0.0
---
Before asking questions, classify the request as clear, partially ambiguous, or blocked.
- If clear: proceed directly.
- If partially ambiguous: ask at most 3 targeted questions that would materially change the output.
- If blocked: ask only the minimum questions needed to proceed.
Do not ask about information the user already supplied. If the user signals speed, proceed with explicit assumptions instead of continuing to question.
```
### 8.2.20 `verify-first.md`
```md
---
id: verify-first
title: Verify First
artifact_class: live_overlay
category: quality
description: Verification-first overlay for tasks where checking a candidate answer before final commitment improves reliability.
visibility_tier: advanced
supported_surfaces: [chat, room, panel, task]
recommended_model_classes: [general_chat, unknown]
conflicts_with: []
warns_with: [self-consistency-check]
live_token_target: 170
suggested_use_cases:
- Error-prone reasoning tasks
- High-stakes summaries
- Tasks with obvious candidate failure modes
avoid_when:
- Tiny factual lookups
- Fast ideation
structure_labels: [verification_first]
research_tags: [verification_first]
preset: true
version: 1.0.0
---
Before finalizing the answer, briefly test the most likely candidate answer for failure.
Ask:
1. What would make this answer wrong?
2. What important check would falsify or weaken it?
3. After that check, does the answer need revision?
Use this to improve the final output, not to narrate internal process unless the user asks.
```
### 8.2.21 `diverse-sampling.md`
```md
---
id: diverse-sampling
title: Diverse Sampling
artifact_class: live_overlay
category: creative
description: Experimental creative overlay for generating materially different candidate ideas or framings.
visibility_tier: advanced
supported_surfaces: [chat, room, panel]
recommended_model_classes: [general_chat, unknown]
conflicts_with: []
warns_with: [concise-executive, creative-flow]
live_token_target: 230
suggested_use_cases:
- Creative ideation
- Scenario generation
- Option framing
avoid_when:
- Straight legal research
- Source-bound outputs
structure_labels: [diverse_sampling, creative_divergence]
research_tags: [verbalized_sampling]
preset: true
version: 1.0.0
---
Generate a small set of candidate ideas or framings that differ in a material way, not just in wording.
Force diversity by varying assumptions, goals, constraints, or style of attack. Then briefly compare the candidates so the user can choose a direction.
Use this mode when variety itself is the objective.
```
### 8.3 Overlay Routing Rules
The old generic phrase “apply this protocol to substantive responses” is not sufficient on its own. R4.2 replaces it with overlay-specific scoping rules plus the following general policy.
General live-overlay routing policy:
- **ordinary acknowledgments / housekeeping / one-line clarifications:** respond naturally; do not run heavyweight overlay protocols
- **substantive outputs:** apply the overlay if the current task matches the overlay’s use case and the overlay is active
- **advisor protocols:** activate only on explicit invocation or when the surface itself is the Prompt Advisor
- **post-task templates:** activate only after task completion, failure, or explicit debrief request
### 8.4 Overlay Migration from R3
- `spec-architect`: live overlay -> advisor protocol
- `elicitation`: live overlay -> advisor protocol
- `reflexion-learning`: live overlay -> post-task template (displayed as **Task Debrief**)
- `verify-first`: new preset
- `diverse-sampling`: new preset
When upgrading from R3:
- Any active `spec-architect` or `elicitation` overlay must be deactivated and replaced with a one-time UI notice explaining the new invocation model.
- Any active `reflexion-learning` overlay must be deactivated from live injection and retained only as a post-task template.
---
## 9) Prompt Lab and Promotion Bridge
### 9.1 Purpose
Prompt Lab is the governed bridge between DOC17 reusable prompt artifacts and the DOC8-owned evaluation/promotion substrate.
Prompt Lab exists to test:
- overlay templates
- prompt recipes
- prompt-advisor rewrite templates
Prompt Lab does not operate on unsaved ad hoc composer drafts.
### 9.2 Backend Policy
R4.2 supports one active backend class and one deferred backend class.
- **Promptolution** — default backend for prompt-string artifacts such as overlays, prompt recipes, and prompt-advisor rewrite templates
- **DSPy** — optional later backend for structured LM programs with examples and metrics; not required for DOC17 R4.2
Promptolution may be feature-flagged on. DSPy remains off by default and outside the critical path.
### 9.3 Promptolution Rule
Promptolution may be used only for reusable artifacts with stable body hashes. It must run offline through EC-owned job requests and may not create a second durable store. Candidate output becomes a `PromptCandidate` and must be reviewed before adoption.
### 9.4 Non-Goals
Prompt Lab must not:
- optimize one-off unsaved prompt drafts during live work
- silently rewrite active overlays
- bypass DOC8 replay/canary/promotion ownership
- pretend observational evidence is enough for promotion
### 9.5 Evidence Policy
R4.2 replaces the R3 weak-sample logic with the following.
- `n < 20 matched examples / arm`: observational only; never promote
- `20 <= n < 30`: replay candidate only
- promotion requires replay support or canary support plus human approval
- mean score alone is insufficient; use lower confidence bound and evidence grade
Matching strata must include at least:
- task_type
- model_id or model_class
- interaction_mode / surface_kind
- workspace_id when relevant
- active_overlay_ids
- baseline_present
### 9.6 Candidate Review and Apply Rules
Q may show a candidate card only if:
- prompt truth is available from DOC11,
- the candidate is tied to a valid source artifact,
- the artifact has not changed since the job started,
- replay or canary evidence is available at the required grade.
Candidate application rules:
- overlay candidates create a review-and-promote action, not immediate overwrite
- prompt recipe candidates create a save-new-version or replace-current-version action
- rewrite-template candidates create a template-version review action
### 9.7 Worker Contract (Normative)
The worker contract closes the implementation gap for Promptolution and any future backend.
#### 9.7.1 Execution Model
- Worker is stateless.
- Worker may run as a local isolated process or local service.
- EC remains sole durable writer.
- Worker receives job inputs, returns results, and writes nothing durable to ELNOR memory directly.
#### 9.7.2 Request Schemas
```ts
export const PromptLabJobRequestSchema = z.object({
job_id: z.string().uuid(),
backend: PromptOptimizationBackendSchema,
artifact_kind: PromptArtifactKindSchema,
artifact_id: z.string(),
artifact_title: z.string(),
source_text: z.string(),
source_hash: z.string(),
workspace_id: z.string(),
request_reason: z.enum(["manual_test", "nightly_proposal_eval", "candidate_retest", "canary_followup"]),
optimization_goal: z.string(),
replay_examples: z.array(z.object({
example_id: z.string(),
input_text: z.string(),
expected_signal: z.string().optional(),
rubric_ref: z.string().optional(),
metadata: z.record(z.any()).default({})
})).default([]),
constraints: z.object({
max_candidates: z.number().int().min(1).max(8).default(3),
per_job_budget_usd: z.number().min(0).default(0),
banned_phrases: z.array(z.string()).default([]),
preserve_required_sections: z.array(z.string()).default([])
}),
created_at: z.string().datetime()
});
```
#### 9.7.3 Result Schemas
```ts
export const PromptLabCandidatePayloadSchema = z.object({
candidate_id: z.string().uuid(),
candidate_text: z.string(),
delta_summary: z.array(z.string()).default([]),
source_hash: z.string(),
candidate_hash: z.string(),
backend_notes: z.array(z.string()).default([])
});
export const PromptLabReplaySummarySchema = z.object({
examples_run: z.number().int().min(0),
examples_succeeded: z.number().int().min(0),
evidence_label: EvidenceLabelSchema,
lower_confidence_bound: z.number().min(0).max(1).optional(),
judge_notes: z.array(z.string()).default([])
});
export const PromptLabJobResultSchema = z.object({
job_id: z.string().uuid(),
status: PromptLabJobStatusSchema,
source_hash: z.string(),
candidates: z.array(PromptLabCandidatePayloadSchema).default([]),
replay_summary: PromptLabReplaySummarySchema.optional(),
failure_code: z.string().optional(),
failure_message: z.string().optional(),
started_at: z.string().datetime().optional(),
completed_at: z.string().datetime().optional()
});
```
#### 9.7.4 Failure Codes
Retryable:
- `worker_unavailable`
- `provider_rate_limited`
- `temporary_backend_error`
- `dataset_lock_contention`
Non-retryable:
- `artifact_not_found`
- `unsupported_artifact_kind`
- `invalid_source_text`
- `safety_lint_failure`
- `stale_source_hash`
- `insufficient_replay_examples`
#### 9.7.5 Stale Result Handling
If `source_hash` in the result does not match the current artifact hash when EC ingests it:
- mark the result `stale`
- do not create reviewable candidates
- retain the stale result for audit only
#### 9.7.6 Safety/Linting Before Persist
Before EC persists candidates, it must re-run save-time validation for the relevant artifact family. Any candidate failing linting is rejected and stored as a failed job outcome.
---
## 10) Overlay and Prompt-Artifact Learning via Friction Signals
### 10.1 Principles
DOC17 uses friction and feedback as **signal capture**, not as a truth oracle.
Signals can support:
- candidate generation,
- replay prioritization,
- recommendation ranking,
- evidence labeling.
Signals cannot directly rewrite active overlays or prompt recipes.
### 10.2 Explicit Signals
- overlay pill thumbs up/down
- response thumbs up/down
- thumbs-down category:
- missed facts/context
- ignored instructions (overlay failure)
- hallucination/format error
- prompt recipe helpful / not helpful
- prompt-advisor rewrite accepted / dismissed
- save-as-recipe after rewrite
### 10.3 Implicit Signals
High-value implicit signals:
- correction-follow-up in the next turn
- manual abort
- apply-with-edits / significant edit distance if measurable
- repeated retry on the same task with different overlay mix
Low-confidence signals:
- model switch within task
- zero-copy, except in code/apply workflows where downstream file-apply metrics exist
### 10.4 Nightly Critic
The nightly critic may consume failed cases and produce `PromptImprovementProposal` artifacts.
Inputs:
- source prompt or overlay text
- active overlay IDs
- output text
- explicit feedback categories
- implicit friction markers
- model/runtime truth
Outputs:
- hypothesized failure mode
- candidate delta summary
- proposed artifact text
- replay_required flag
Nightly critic output is a proposal only. Replay/canary/human approval remain required for promotion.
---
## 11) Q Dashboard UX
### 11.1 Primary Entry Points
R4.2 keeps two primary user entry points.
1. **Chat header pills / command palette** — activate, deactivate, pin, and inspect overlays.
2. **Overlay Library + Prompt Recipes page** — browse, create, edit, archive, inspect conflicts, see evidence labels, and access Prompt Lab actions.
Learning pages display analytics but are not the primary activation surface.
### 11.2 Active Overlay Pills
Each active live overlay appears as a pill with:
- overlay title
- scope badge (`thread`, `workspace`, `until off`)
- conflict warning badge if applicable
- click to open micro-drawer
Micro-drawer actions:
- Helpful / Not helpful
- Pin to thread / workspace / until off
- Turn off
- Refresh on next turn
- View runtime truth
- View details
### 11.2A Context Inspector Overlay Section
The Context Inspector must render an **Active Overlays** section backed by DOC11 runtime truth.
Required fields:
- requested overlay IDs
- applied overlay IDs
- dropped overlay IDs
- drop or trim reason per overlay when applicable
- session scope and pin scope
- packet hash
- revision
- refresh reason for the turn when present
- total overlay token estimate
If prompt truth is unavailable, the section must render an explicit unavailable state rather than implying that overlays were applied.
### 11.2B Selected-Turn Toolbar and Re-Answer Flow
When the user selects one or more turns, the turn toolbar must include:
- Copy
- Save as Prompt Recipe
- Re-answer with overlay…
The **Re-answer with overlay…** dialog must support:
- overlay picker,
- mode: use active overlays / one-time overlay only / active plus one-time overlay,
- context mode: clean context / selected turns only / thread plus selected,
- keep active after dispatch toggle,
- one-click confirm to create a new dispatch plan.
If `keep active after dispatch` is false, the operation is one-time only and must not durably mutate the session’s active overlay set.
### 11.2C Human-Friendly Commands and Command Palette Examples
The command palette and natural-language command examples must include at least:
- `Activate legal-research`
- `Turn off guided-verification`
- `Remove all overlays`
- `/overlay refresh`
- `/overlay status`
- `Re-answer with legal-research`
### 11.3 Prompt Advisor Panel
Composer includes a primary **Improve Prompt** button.
Panel layout:
- summary sentence
- missing pieces
- suggested overlays
- suggested prompt recipes
- primary CTA: **Rewrite**
- secondary CTA: **Activate suggested overlay**
- tertiary CTA: **Save as Prompt Recipe**
For high-stakes surfaces, do not default to **Accept & Run**. If provided at all, it must be an explicit advanced action with clear undo semantics.
### 11.4 Overlay Library Page
Tabs:
- Core
- Advanced
- Advisor Protocols
- Prompt Recipes
- Prompt Lab
Overlay detail card includes:
- purpose
- when to use
- when not to use
- research basis / research tags
- description
- exact text body
- conflicts / warnings
- supported surfaces
- model cautions
- evidence label
- last tested timestamp if available
- activation actions
### 11.4A Moderator Profile and Defaults Surface
If DOC6 adopts moderator default overlays directly, Q must render a **General Overlays** section inside the Moderator Profile Editor. If DOC6 uses the extension-store fallback, Q must still render the same UI while reading/writing through the DOC17-backed extension seam.
Required controls:
- add/remove default overlays,
- show conflicts before save,
- show whether defaults apply to rooms, panels, or both,
- link to overlay detail cards.
### 11.5 Evidence UI
Do not show naked scores like `0.77 (8 uses)` as the main UI.
Primary evidence labels:
- No local evidence yet
- Promising
- Established
- Needs tuning
Hover/details may show sample count, backend, and last replay/canary summary.
### 11.6 Cold Start UX
For new or data-sparse systems:
- show research-backed presets as defaults
- label all local evidence as `No local evidence yet`
- hide promotion/apply-candidate actions unless feature gates are satisfied
- Prompt Lab may remain visible but disabled with explanatory text if not configured
---
## 12) Commands, Routes, and Service Interfaces
### 12.1 Command Boundary Rule
All mutating operations below are EC commands. Q and Q Backend never write DOC17 durable state directly.
### 12.2 Commands
```ts
export const Doc17CommandSchema = z.discriminatedUnion("command_type", [
z.object({ command_type: z.literal("overlay.install_presets") }),
z.object({ command_type: z.literal("overlay.create"), payload: OverlayTemplateRecordSchema }),
z.object({ command_type: z.literal("overlay.update"), overlay_id: OverlayIdSchema, payload: OverlayTemplateRecordSchema }),
z.object({ command_type: z.literal("overlay.archive"), overlay_id: OverlayIdSchema }),
z.object({ command_type: z.literal("overlay.restore"), overlay_id: OverlayIdSchema }),
z.object({ command_type: z.literal("overlay.activate"), session_key: OverlaySessionKeySchema, overlay_id: OverlayIdSchema, source: OverlayActivationSourceSchema, pin_scope: OverlayPinScopeSchema.optional() }),
z.object({ command_type: z.literal("overlay.deactivate"), session_key: OverlaySessionKeySchema, overlay_id: OverlayIdSchema }),
z.object({ command_type: z.literal("overlay.pin"), session_key: OverlaySessionKeySchema, overlay_id: OverlayIdSchema, pin_scope: OverlayPinScopeSchema }),
z.object({ command_type: z.literal("overlay.unpin"), session_key: OverlaySessionKeySchema, overlay_id: OverlayIdSchema, pin_scope: OverlayPinScopeSchema }),
z.object({ command_type: z.literal("overlay.request_refresh"), session_key: OverlaySessionKeySchema, reason: OverlayRefreshReasonSchema.default("manual") }),
z.object({ command_type: z.literal("overlay.record_runtime_truth"), session_key: OverlaySessionKeySchema, runtime_truth: OverlayRuntimeTruthSummarySchema }),
z.object({ command_type: z.literal("overlay.submit_feedback"), payload: OverlayFeedbackSchema }),
z.object({ command_type: z.literal("prompt_recipe.create"), payload: PromptRecipeRecordSchema }),
z.object({ command_type: z.literal("prompt_recipe.update"), prompt_recipe_id: PromptRecipeIdSchema, payload: PromptRecipeRecordSchema }),
z.object({ command_type: z.literal("prompt_recipe.archive"), prompt_recipe_id: PromptRecipeIdSchema }),
z.object({ command_type: z.literal("prompt_recipe.submit_feedback"), payload: PromptRecipeFeedbackSchema }),
z.object({ command_type: z.literal("prompt_advisor.rewrite_feedback"), payload: z.record(z.any()) }),
z.object({ command_type: z.literal("prompt_lab.create_job"), payload: PromptLabJobSchema }),
z.object({ command_type: z.literal("prompt_lab.cancel_job"), job_id: z.string().uuid() })
]);
```
### 12.3 HTTP Routes (Q Backend -> EC or Gateway/DOC11 seam)
Read routes may be cached briefly in Q Backend memory but must never be persisted outside EC. Write routes require authenticated local user context.
Routes exposed to Q:
```text
GET /api/overlays
GET /api/overlays/search
GET /api/overlays/:overlay_id
POST /api/overlays
PUT /api/overlays/:overlay_id
POST /api/overlays/:overlay_id/archive
POST /api/overlays/:overlay_id/restore
GET /api/overlay-sessions/:session_key
POST /api/overlay-sessions/:session_key/activate
POST /api/overlay-sessions/:session_key/deactivate
POST /api/overlay-sessions/:session_key/pin
POST /api/overlay-sessions/:session_key/unpin
POST /api/overlay-sessions/:session_key/refresh
GET /api/overlay-sessions/:session_key/runtime-truth
POST /api/overlay-sessions/:session_key/reanswer-plan
POST /api/overlay-feedback
GET /api/prompt-recipes
GET /api/prompt-recipes/search
GET /api/prompt-recipes/:prompt_recipe_id
POST /api/prompt-recipes
PUT /api/prompt-recipes/:prompt_recipe_id
POST /api/prompt-recipes/:prompt_recipe_id/archive
POST /api/prompt-recipes/:prompt_recipe_id/feedback
POST /api/prompt-advisor/improve
POST /api/prompt-advisor/rewrite
POST /api/prompt-lab/jobs
GET /api/prompt-lab/jobs/:job_id
POST /api/prompt-lab/jobs/:job_id/cancel
POST /api/prompt-lab/candidates/:candidate_id/review
```
Internal route used by Q Backend / DOC11 seam:
```text
GET /internal/doc17/overlay-packet?workspace_id=...&surface_kind=...&surface_id=...&revision=...&force_refresh=...
```
### 12.3A Representative Request / Response Contracts
Overlay activation:
```ts
export const OverlayActivateRequestSchema = z.object({
overlay_id: OverlayIdSchema,
pin_scope: OverlayPinScopeSchema.optional(),
source: OverlayActivationSourceSchema.default("manual")
});
export const OverlayActivateResponseSchema = OverlaySessionResponseSchema;
```
Overlay refresh:
```ts
export const OverlayRefreshRequestSchema = z.object({
reason: OverlayRefreshReasonSchema.default("manual")
});
export const OverlayRefreshResponseSchema = OverlaySessionResponseSchema;
```
Overlay runtime truth:
```ts
export const OverlayRuntimeTruthResponseSchema = OverlayRuntimeTruthSummarySchema;
```
Overlay re-answer plan:
```ts
export const OverlayReanswerPlanRequestSchema = z.object({
selected_message_ids: z.array(z.string()).min(1).max(8),
context_mode: ReanswerContextModeSchema.default("selected_turns_only"),
override_mode: OverlayDispatchOverrideModeSchema.default("use_active_overlays"),
overlay_ids: z.array(OverlayIdSchema).default([]),
keep_overlay_active_after_dispatch: z.boolean().default(false)
});
export const OverlayReanswerPlanResponseSchema = OverlayReanswerPlanSchema;
```
Overlay search:
```ts
export const OverlaySearchRequestSchema = z.object({
query: z.string().default(""),
category: OverlayCategorySchema.optional(),
visibility_tier: OverlayVisibilityTierSchema.optional(),
include_archived: z.boolean().default(false)
});
export const OverlaySearchResponseSchema = OverlayListResponseSchema;
```
Prompt recipe create/update:
```ts
export const PromptRecipeUpsertRequestSchema = z.object({
record: PromptRecipeRecordSchema
});
export const PromptRecipeUpsertResponseSchema = z.object({
prompt_recipe_id: PromptRecipeIdSchema,
body_hash: z.string(),
updated_at: z.string().datetime()
});
```
Prompt Lab candidate review:
```ts
export const PromptLabCandidateReviewRequestSchema = PromptLabCandidateReviewDecisionSchema;
export const PromptLabCandidateReviewResponseSchema = z.object({
candidate_id: z.string().uuid(),
decision_recorded: z.boolean().default(true),
next_state: z.string()
});
```
### 12.4 Route Ordering and Error Contract
- Register `/search` or other static routes before `/:overlay_id` dynamic routes.
- All routes return `ApiErrorSchema` on failure.
- Write routes require authenticated local user context.
- Q Backend may cache GET responses in memory for a single request burst, but durable state and authoritative revision checking remain EC-owned.
### 12.5 Standard Service Interfaces (EC)
```ts
export interface OverlayRepository {
list(): Promise<OverlayIndexEntry[]>;
get(id: string): Promise<OverlayTemplateRecord | null>;
create(record: OverlayTemplateRecord): Promise<void>;
update(id: string, record: OverlayTemplateRecord): Promise<void>;
archive(id: string): Promise<void>;
restore(id: string): Promise<void>;
}
export interface OverlaySessionStore {
get(key: OverlaySessionKey): Promise<OverlaySessionState>;
activate(input: { key: OverlaySessionKey; overlayId: string; source: OverlayActivationSource; pinScope?: OverlayPinScope }): Promise<OverlaySessionState>;
deactivate(input: { key: OverlaySessionKey; overlayId: string }): Promise<OverlaySessionState>;
pin(input: { key: OverlaySessionKey; overlayId: string; pinScope: OverlayPinScope }): Promise<OverlaySessionState>;
unpin(input: { key: OverlaySessionKey; overlayId: string; pinScope: OverlayPinScope }): Promise<OverlaySessionState>;
requestRefresh(input: { key: OverlaySessionKey; reason: OverlayRefreshReason }): Promise<OverlaySessionState>;
recordRuntimeTruth(input: { key: OverlaySessionKey; truth: OverlayRuntimeTruthSummary }): Promise<void>;
getRuntimeTruth(key: OverlaySessionKey): Promise<OverlayRuntimeTruthSummary | null>;
}
export interface OverlayPacketService {
buildPacket(input: { key: OverlaySessionKey; revision?: number; forceRefresh?: boolean }): Promise<OverlayPromptPacket>;
}
export interface OverlayReanswerService {
preparePlan(input: {
key: OverlaySessionKey;
selectedMessageIds: string[];
contextMode: ReanswerContextMode;
overrideMode: OverlayDispatchOverrideMode;
overlayIds: string[];
keepOverlayActiveAfterDispatch: boolean;
}): Promise<OverlayReanswerPlan>;
}
export interface PromptAdvisorService {
improve(req: PromptAdvisorImproveRequest): Promise<PromptAdvisorImproveResponse>;
rewrite(req: PromptAdvisorRewriteRequest): Promise<PromptAdvisorRewriteResponse>;
}
export interface PromptRecipeRepository {
list(): Promise<PromptRecipeRecord[]>;
get(id: string): Promise<PromptRecipeRecord | null>;
create(record: PromptRecipeRecord): Promise<void>;
update(id: string, record: PromptRecipeRecord): Promise<void>;
archive(id: string): Promise<void>;
}
```
---
## 13) Implementation Map
### 13.1 Contracts Package
Create or update:
```text
packages/contracts/src/doc17/
overlay.ts
overlay-session.ts
prompt-recipes.ts
prompt-advisor.ts
prompt-lab.ts
feedback.ts
```
Update shared contracts where needed:
```text
packages/contracts/src/cil/
packages/contracts/src/doc8/
packages/contracts/src/doc10/
packages/contracts/src/doc11/
packages/contracts/src/doc14/
packages/contracts/src/doc15/
```
### 13.2 EC Service
Create or update:
```text
apps/ec-service/src/doc17/
OverlayRepository.ts
OverlayInstallService.ts
OverlayValidation.ts
OverlayClassifier.ts
OverlaySessionStore.ts
OverlayPinStore.ts
OverlayPacketService.ts
OverlayRuntimeTruthStore.ts
OverlayReanswerService.ts
PromptAdvisorService.ts
PromptRecipeRepository.ts
PromptRecipeInvocationService.ts
PromptLabBridge.ts
Doc17SignalEmitter.ts
Doc17Maintenance.ts
```
Responsibilities:
- storage reads/writes
- save-time linting and parsing
- overlay session mutations
- packet generation for DOC11 consumption
- refresh flag handling and runtime-truth persistence
- one-time re-answer plan creation
- prompt-advisor improve and rewrite orchestration
- prompt recipe CRUD and invocation
- feedback append streams
- Prompt Lab job submission and result ingestion
### 13.3 Q Backend
Create or update:
```text
apps/q-backend/src/routes/doc17/
overlays.ts
overlaySessions.ts
promptAdvisor.ts
promptRecipes.ts
promptLab.ts
apps/q-backend/src/services/
Doc17ApiClient.ts
OverlayPacketRelay.ts
```
Responsibilities:
- route validation with shared schemas
- auth enforcement on write paths
- pass-through of overlay packet requests to EC/DOC11 seam
- refresh and runtime-truth route passthrough
- re-answer plan preparation endpoints
- no durable writes
- no prompt-string concatenation
### 13.4 Q Frontend
Create or update:
```text
apps/q-frontend/src/features/doc17/
OverlayLibraryPage.tsx
OverlayDetailDrawer.tsx
OverlayPillBar.tsx
OverlayPillPopover.tsx
ContextInspectorOverlayPanel.tsx
ReanswerWithOverlayDialog.tsx
PromptAdvisorPanel.tsx
PromptRecipePage.tsx
SavePromptRecipeDialog.tsx
PromptLabPage.tsx
CandidateReviewDrawer.tsx
```
Responsibilities:
- overlay library browse/edit/archive flows
- active overlay pill rendering
- runtime-truth context inspector rendering
- selected-turn re-answer dialog
- prompt advisor panel and rewrite acceptance
- prompt recipe save/invoke flows
- prompt lab job creation and candidate review surfaces
- evidence labels and cold-start messaging
---
## 14) Security, Validation, and Prompt Hygiene
### 14.1 Overlay and Recipe Linting
Reject overlays or prompt recipes that:
- attempt hierarchy reset,
- suppress provenance or truth,
- contain unsafe parser tricks,
- contain disallowed placeholders,
- contain reserved prompt-frame delimiters,
- exceed live-body limits when marked as live overlays.
### 14.2 Safe YAML and Markdown Handling
- safe YAML parsing only
- no executable/custom tags
- `maxAliasCount: 0`
- unknown-key rejection unless allowlisted
- exact body preservation after parse
### 14.3 Slug and Path Safety
Overlay IDs must be slug-only. File resolution must use a known base directory plus normalized resolved path checks. `basename()` alone is insufficient.
### 14.4 Template Rendering Safety
Prompt-advisor rewrite templates and any worker templates must:
- wrap user text in fenced or XML containers,
- allowlist placeholders,
- fail on unknown placeholders,
- define loop substitution order clearly,
- escape braces or reserved characters where required.
### 14.5 User Document Prompt-Injection Hygiene
When the runtime references user documents, DOC11/OpenClaw assembly must preserve clear boundaries between:
- user-provided document text,
- overlay instructions,
- system/native context.
User document text must never be allowed to masquerade as overlay or system instructions.
---
## 15) Tests and Acceptance Criteria
### 15.1 Required Unit Tests
1. frontmatter parser rejects aliases/custom tags/unknown keys
2. slug validation blocks traversal attempts
3. symmetric conflict metadata enforced
4. archived overlays cannot activate
5. session-state revision increments correctly under lock
6. pin scopes serialize and rehydrate correctly
7. manual refresh sets `force_refresh_next_turn` and records refresh reason
8. runtime-truth recording updates last applied packet fields safely
9. one-time re-answer plan generation does not mutate session state when `keep_overlay_active_after_dispatch = false`
10. prompt recipe save/invoke path preserves body hash
11. prompt-advisor improve returns deterministic output for the same input
12. prompt-advisor rewrite template placeholder validation fails safely
13. stale Prompt Lab results are rejected
### 15.2 Required Integration Tests
1. Q activates overlay -> EC updates session state -> DOC11 runtime truth shows overlay applied on next turn
2. Q Backend never reads ELNOR_MEMORY directly
3. activation notice never appears in prompt text
4. hard-conflict activation returns a blocked response with clear reason
5. workspace pins apply to new threads in the workspace
6. room overlay precedence works against participant role prompts
7. prompt recipe invocation inserts prompt text without system-level injection
8. Prompt Lab job on unsaved ad hoc prompt is rejected
9. candidate review is disabled when prompt truth is unavailable
10. moderator overlay defaults resolve via accepted DOC6 schema or extension store fallback
11. manual refresh on next turn regenerates the packet and DOC11 truth shows `refresh_reason = manual`
12. compaction does not clear active overlays and the next turn truth remains populated
13. model switch does not clear active overlays and may set a refresh marker without mutating the active set
14. re-answer plan with `keep_overlay_active_after_dispatch = false` leaves durable session state unchanged
15. re-answer plan with `keep_overlay_active_after_dispatch = true` activates the requested overlay before dispatch
16. Context Inspector renders requested/applied/dropped overlay truth for the latest turn
### 15.3 Required End-to-End Product Tests
1. user activates `legal-research`, runs chat, sees active pill and prompt truth
2. user clicks `Refresh on next turn`, sends the next message, and sees the refresh reason in Context Inspector
3. user selects prior turns, chooses `Re-answer with overlay…`, uses clean context, and receives a new answer without mutating persistent overlay state unless requested
4. user uses Improve Prompt, accepts rewrite, saves as Prompt Recipe
5. user tests a saved prompt recipe in Prompt Lab and receives reviewable candidates later
6. user sees cold-start evidence labels with no false precision
7. user thumbs down an overlay with `ignored instructions` and the signal appears in the nightly input stream
### 15.4 R4.2 Release Gates
R4.2 is not ready until all are true:
- shared schemas merged
- session-state subsystem implemented
- DOC11 prompt truth visible for overlays
- Context Inspector overlay truth section implemented
- refresh and re-answer routes implemented
- overlay conflicts enforced
- Prompt Advisor improve/rewrite split implemented
- prompt recipe save/invoke implemented
- Prompt Lab feature flags default safe/off unless configured
---
## 16) Phasing
### Phase 0 — Safe Product Core
- overlay library CRUD
- session state + pins
- overlay packet generation
- DOC11 prompt-truth handoff
- Prompt Advisor improve + rewrite
- core presets
- feedback capture
### Phase 1 — Prompt Recipes
- save as recipe
- invoke recipe
- recipe metadata and feedback
- recipe suggestion hooks
### Phase 2 — Prompt Lab with Promptolution
- feature-gated Prompt Lab jobs
- worker contract
- candidate review UI
- replay summary ingestion
### Phase 3 — Candidate Apply / Canary Polish
- shared substrate integration with DOC8
- evidence grade and lower-bound gating
- human approval flows
### Phase 4 — Optional DSPy and Future Extensions
- structured LM program support only where a real metric-bearing program exists
- not required for DOC17 R4.2 acceptance
---
## 17) Appendix A — Prompt Engineering Research Compendium (Integrated, Canonical)
**Status:** Canonical integrated appendix. There is no separate active research-addendum spec after R4.2.
**Runtime mirror:** EC MAY materialize a read-only extracted mirror at `ELNOR_MEMORY/knowledge/prompt_engineering_research.md` for agent/runtime lookup. That mirror is derivative only and must not diverge from this embedded appendix.
**Purpose:** Living research reference for DOC17 preset design, Prompt Advisor heuristics, Prompt Recipes, Prompt Lab experiments, overlay/prompt review, and future artifact design.
**Last updated:** 2026-03-10
---
### A.0 How this appendix should be interpreted
This appendix is a research-backed design reference, not a bag of magic spells.
Rules for use:
1. **Task-specific constraints beat generic role-play.** “Act as an expert lawyer” is weaker than “state the legal standard, identify elements, apply facts, and flag unsupported citations.”
2. **Prompt techniques are contingent.** What helps on one model family or task may hurt on another.
3. **More instructions are not always better.** New 2025–2026 work shows that extra constraints can degrade task solving.
4. **Research gives hypotheses, not local truth.** DOC17 uses research for priors and design guidance, then relies on local evidence, replay, and operator review.
5. **Prompting and retrieval are distinct.** A good prompt does not create missing facts. In legal and factual domains, prompt design must be coupled with uncertainty handling and source discipline.
---
### A.1 Self-Refinement and Iterative Improvement
#### A.1.1 Self-Refine (Madaan et al., 2023)
**Citation:** Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Welleck, S., Majumder, B.P., Gupta, S., Yazdanbakhsh, A., & Clark, P. (2023). *Self-Refine: Iterative Refinement with Self-Feedback*. NeurIPS 2023. arXiv:2303.17651.
**What it does:** A three-phase loop — (1) generate initial output, (2) the same LLM critiques its own output, (3) the LLM refines based on the critique. Repeat until quality threshold or iteration limit.
**Key results:**
- ~20% absolute improvement across 7 diverse tasks (code generation, math reasoning, sentiment, dialogue, acronym generation, constrained generation, code review)
- Works without additional training, fine-tuning, or external tools
- Improvements come from the critique step, not just additional computation
**Limitations and caveats:**
- Diminishing returns after 2–3 iterations on most tasks
- Self-diagnosis is the bottleneck — models are better at executing corrections than identifying what's wrong
- Effectiveness varies by model capability; weaker models produce worse self-critiques
- Does not solve hallucination. A model can confidently refine a false premise.
**Relevance to DOC17:** Supports critique-and-rewrite flows, but only when the critique is anchored to explicit criteria. This is why DOC17 favors guided verification and deterministic gap analysis over vague “please improve” prompting.
#### A.1.2 RefineBench (Lee et al., 2025)
**Citation:** Lee, H., et al. (2025). *RefineBench: Evaluating LLM Self-Refinement Capabilities*. Published November 2025.
**What it does:** Benchmarks LLM self-refinement across frontier models under vague, guided, and checklist-style refinement prompts.
**Key results:**
- Explicit checklist-based feedback dramatically outperforms vague “refine this” instructions
- Guided refinement outperforms unguided self-refinement across tested models
- Self-diagnosis remains the bottleneck
- The gap between guided and unguided grows on harder tasks
**Implications for DOC17:**
- Overlays should provide explicit verification criteria, not vague improvement instructions
- Prompt Advisor’s deterministic gap analysis is justified
- `guided-verification` is stronger than “self-critique” overlays that ask the model to free-associate about its own errors
#### A.1.3 ProActive Self-Refinement / Selective Refinement
**Finding:** Follow-up work to Self-Refine suggests models should not reflexively refine every answer. Over-refinement can introduce new errors and bloat.
**Relevance:** Supports DOC17’s distinction between:
- substantive outputs that merit a full protocol,
- lightweight follow-ups that should stay light,
- and advisor/post-task patterns that should not ride every turn.
---
### A.2 Multiple Reasoning Paths and Verification-First Reasoning
#### A.2.1 Self-Consistency (Wang et al., 2022)
**Citation:** Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). *Self-Consistency Improves Chain of Thought Reasoning in Language Models*. ICLR 2023. arXiv:2203.11171.
**What it does:** Instead of greedy decoding a single chain of thought, sample multiple diverse reasoning paths and take the most common answer.
**Key results:**
- GSM8K: +17.9%
- SVAMP: +11.0%
- AQuA: +12.2%
- StrategyQA: +6.4%
**Limitations:**
- Requires multiple calls
- Diminishing returns beyond 5–10 paths
- Less useful when tasks have no single objectively correct answer
- Majority vote can still converge on the same wrong pattern
**Relevance to DOC17:** Supports multi-path exploration, but the full multi-call method is too expensive for a generic live overlay. DOC17 adapts the principle into bounded alternative-path checks and, where needed, replay-time experiments.
#### A.2.2 Tree of Thoughts (Yao et al., 2023)
**Citation:** Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T., Cao, Y., & Narasimhan, K. (2023). *Tree of Thoughts: Deliberate Problem Solving with Large Language Models*. NeurIPS 2023. arXiv:2305.10601.
**Key results:**
- Game of 24: 4% → 74% success rate vs. standard prompting
- Improved creative writing coherence and novelty
- Better crossword completion
**Limitations:**
- Very expensive in the original form
- Best for tasks with clear evaluation criteria and discrete solution steps
- Requires external orchestration for full search
**Relevance to DOC17:** Justifies `tree-of-approaches` as a simplified “explore before committing” technique, not as a full search algorithm embedded in every overlay.
#### A.2.3 Verification-First (Wu et al., 2025)
**Citation:** Wu, S., et al. (2025). *Asking LLMs to Verify First is Almost Free Lunch*. arXiv:2511.21734.
**What it does:** Gives the model a candidate answer (possibly trivial or even random) and asks it to verify/evaluate that answer before generating its own final response. The premise is that checking a candidate can be cognitively easier than synthesizing one from scratch.
**Key findings from the paper:**
- Verification-First improves reasoning performance with comparatively small additional token cost
- The effect generalizes beyond a single model family in the paper’s experiments
- The method can be iterated (Iter-VF), but complexity and token use climb with repeated cycles
**DOC17 interpretation:**
- This supports a `verify-first` overlay or Prompt Advisor pattern for difficult reasoning tasks
- It is **not** a universal replacement for all reasoning prompts
- It is most promising where a plausible candidate answer or hypothesis already exists (e.g., legal issue framing, code patch plausibility, argument review)
**R4.2 design consequence:** `verify-first` remains an **Extended / experimental** overlay rather than a default Core overlay.
---
### A.3 Abstraction and Decomposition
#### A.3.1 Step-Back Prompting (Zheng et al., 2023)
**Citation:** Zheng, H., Mishra, S., Chen, X., Cheng, H-T., Chi, E., Le, Q., & Zhou, D. (2023). *Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models*. arXiv:2310.06117.
**Key results:**
- MMLU Physics: +7%
- MMLU Chemistry: +11%
- TimeQA: +27%
- Significant gains on multi-hop reasoning benchmarks
**Why it works:** Forces the model to retrieve a governing principle before applying it to specifics.
**Relevance to DOC17:** Directly supports `step-back-principles`, especially for legal analysis, architecture, and policy tasks.
#### A.3.2 Plan-and-Solve (Wang et al., 2023)
**Citation:** Wang, L., Xu, W., Lan, Y., Hu, Z., Lan, Y., Lee, R.K-W., & Lim, E-P. (2023). *Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models*. ACL 2023. arXiv:2305.04091.
**What it does:** Plan the sub-tasks first, then solve them.
**Key takeaway:** Decomposition before execution reduces omission errors.
**Relevance to DOC17:** Directly supports `structured-decomposition`, `spec-implementation`, and manifest-style spec review.
#### A.3.3 Least-to-Most Prompting (Zhou et al., 2022)
**Citation:** Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). *Least-to-Most Prompting Enables Complex Reasoning in Large Language Models*. ICLR 2023. arXiv:2205.10625.
**What it does:** Solve simpler sub-problems first, then roll upward.
**Relevance to DOC17:** Complements Plan-and-Solve and informs dependency-ordered implementation plans.
---
### A.4 Risk Analysis and Adversarial Thinking
#### A.4.1 Pre-Mortem Technique (Klein, 2007)
**Citation:** Klein, G. (2007). *Performing a Project Premortem*. Harvard Business Review.
**What it does:** Assume failure has already happened, then identify causes.
**Key result:** Prospective hindsight increases failure-cause identification vs. ordinary risk review.
**Relevance to DOC17:** Supports `pre-mortem-risk-analysis`, parts of `dont-make-mistakes`, `red-team-aggressive`, and `counterargument-generator`.
---
### A.5 Reflection and Learning
#### A.5.1 Reflexion (Shinn et al., 2023)
**Citation:** Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., & Yao, S. (2023). *Reflexion: Language Agents with Verbal Reinforcement Learning*. NeurIPS 2023. arXiv:2303.11366.
**What it does:** After an attempt, generate structured reflection and use it in subsequent attempts.
**Key results:**
- Strong gains in multi-attempt environments (AlfWorld, HotPotQA, HumanEval)
**Limitations:**
- Requires multiple attempts or memory reuse
- Not a magic one-shot overlay
- Reflection quality depends heavily on model capability
**DOC17 interpretation:**
- Good fit for **post-task debriefs** and friction analysis
- Bad fit as an always-on “reflect after every sentence” behavior
- Reflection should feed proposal generation and standing-order candidates, not silently rewrite active overlays
---
### A.6 Constraint and Structure
#### A.6.1 Constraint Specification Research
**General finding:** Specific output constraints (required sections, forbidden patterns, formatting rules, uncertainty handling) are more effective than open-ended generation instructions.
**Wharton Report 4 (December 2025):**
- Generic expert personas do not improve factual accuracy
- Task-specific instructions do
**Relevance to DOC17:** Justifies most overlays being concrete and behavior-specific rather than role-play heavy.
---
### A.7 Code Generation Prompting
#### A.7.1 FORGE 2025 Security Study
**Citation:** FORGE 2025 Conference. *Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models*. Published 2025.
**Key results:**
- Security-oriented prompt prefixes substantially reduced vulnerabilities
- Iterative review/fix prompting repaired many vulnerabilities
- Specific checklists beat generic “write secure code” requests
**Relevance to DOC17:** Direct support for `secure-code` and for checklist-style secure coding prompts.
#### A.7.2 Cursor Rules / Developer-Provided Context (Jiang & Nam, 2025/2026)
**Citation:** Jiang, S., & Nam, D. (2025). *An Empirical Study of Developer-Provided Context for AI Coding Assistants in Open-Source Projects*. arXiv:2512.18925; appearing as *Beyond the Prompt: An Empirical Study of Cursor Rules*, MSR 2026.
**What the paper actually shows:**
- Through qualitative analysis of 401 open-source repositories with Cursor rules, the paper develops a taxonomy of developer-provided context
- High-level themes include conventions, guidelines, project information, LLM directives, and examples
**What it does *not* prove:** It does **not** establish a universal causal ranking like “project-specific context is the strongest predictor of code quality in all settings.”
**Safe interpretation for DOC17:**
- Project-specific context appears highly important in repository-scale coding workflows
- Persistent, machine-readable constraints and project conventions matter
- Overlays and prompt recipes for code/spec work should favor project/context grounding over generic “be a great programmer” theater
#### A.7.3 OpenAI GPT-5 / Reasoning-Era Coding Guidance
**Primary source:** OpenAI official docs, including reasoning best practices and prompt-caching guidance (2025–2026).
**Useful operational guidance:**
- Plan before acting
- Break implementation into dependency-ordered steps
- Verify after changes
- Don’t over-prescribe chain-of-thought on reasoning-native models
**Relevance to DOC17:** Supports slim, concrete implementation overlays and model-class cautions.
#### A.7.4 Anthropic Claude Code / Prompting Best Practices
**Primary source:** Anthropic developer docs and Claude Code best-practice guidance (2025–2026).
**Useful operational guidance:**
- Read project context first
- Use explicit structure and examples where helpful
- Keep prompts clear and specific
- Use prompt caching / stable prefixes where possible
**Relevance to DOC17:** Supports prompt-truth visibility, stable-packet design, and project-context-first coding overlays.
---
### A.8 Legal Domain Prompting
#### A.8.1 Scientific Reports / Nature Portfolio Legal Framework Study (Zhang et al., 2026)
**Citation:** Zhang, M., Zhao, N., Qin, J., Xu, Q., Pan, K., & Luo, T. (2026). *A comprehensive framework for legal dispute analysis integrating prompt engineering and multi-dimensional knowledge graphs*. Scientific Reports, 16, Article 679. DOI: 10.1038/s41598-025-30306-9. Preprint: arXiv:2507.07893.
**What the study actually does:**
- Combines a three-stage hierarchical prompt structure (**task definition**, **knowledge background**, **reasoning guidance**) with a three-layer knowledge graph
- Evaluates on 500 test samples from six legal AI benchmark datasets
- Reports improvements in F1, BLEU-4, ROUGE-L F1, and expert-rated legal content quality for mainstream models
**Critical correction for DOC17:**
The original Appendix A summary over-compressed this into a generic “+16% legal accuracy” claim. That is too sloppy. The reported gains are for a combined framework (prompting + knowledge graph support), not prompt structure alone.
**DOC17 takeaway:**
- The staged legal reasoning structure is still highly relevant
- But DOC17 should only borrow the prompt-structure insight:
1. define the legal question,
2. establish the governing framework,
3. reason from framework to facts
- DOC17 must **not** imply that the full benchmark gains will appear from overlay prompting alone
#### A.8.2 Citation Hallucination and Legal Hallucinations
**Representative sources:**
- Dahl, M., Magesh, V., Suzgun, M., & Ho, D. (2024). *Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models*. arXiv:2401.01301.
- Blair-Stanek, A., & Van Durme, B. (2025). *LLMs Provide Unstable Answers to Legal Questions*. arXiv:2502.05196.
**What the literature shows:**
- Legal hallucinations remain severe
- Models may fabricate case names, holdings, article numbers, or confidently unstable legal judgments
- Deterministic settings do not eliminate instability
**Relevance to DOC17:**
- Legal overlays must harden uncertainty handling
- Legal prompts should separate verified authority from unsupported reasoning
- Citation and quote generation must be treated as high risk
- A good legal overlay is not merely “formal sounding”; it must actively suppress fabricated authority
---
### A.9 Creative and Ideation Prompting
#### A.9.1 LLM Idea Novelty (Si et al., 2025)
**Citation:** Si, C., et al. (2025). *Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers*. ICLR 2025.
**Key results:**
- LLM-generated ideas may score higher on novelty than many human-generated ideas
- Feasibility often lags novelty
- LLM outputs exhibit high pairwise similarity: the model tends to converge on a narrow band of “interesting” ideas
**Relevance to DOC17:** Supports forced divergence in creative overlays and cautions against assuming “generate 10 ideas” means 10 genuinely different ideas.
#### A.9.2 HBR / Collective Creativity (2025)
**Citation:** Harvard Business Review, December 2025. Research summary on LLMs and collective creativity.
**Key finding:** LLMs are strong at persistence and variation, but can reduce collective diversity if everyone uses the same model-generated ideation patterns.
**Relevance to DOC17:** Supports structural diversity prompts and creative-lens scaffolds.
#### A.9.3 Verbalized Sampling (Zhang et al., 2025)
**Citation:** Zhang, J., et al. (2025). *How to Mitigate Mode Collapse and Unlock LLM Diversity*. arXiv:2510.01171. Introduces **Verbalized Sampling (VS)**.
**What it does:** Prompts the model to verbalize a probability distribution over multiple responses rather than collapsing immediately to one “typical” answer.
**Key findings from the paper:**
- In tested settings, VS increases response diversity substantially
- The paper reports no obvious safety collapse relative to baseline generation in its experiments
- The technique is training-free and prompt-level
**DOC17 interpretation:**
- Good fit for creative ideation and exploratory strategy generation
- Not a universal default mode
- Best used as an Extended/experimental overlay or Prompt Advisor tactic, because it is inherently more verbose and branchy than everyday drafting
---
### A.10 Meta-Research on Prompting Effectiveness, Interference, and Hierarchy
#### A.10.1 Wharton Prompting Science Reports (2025)
**Citation:** Mollick, E., Mollick, L., Meincke, L., Shapiro, D., et al. (2025). *Prompting Science Reports 1–4*. Wharton Generative AI Labs / SSRN.
**Key takeaways:**
- Prompt effectiveness is highly model-specific and task-specific
- Chain-of-thought shows diminishing returns on reasoning-native models
- Emotional manipulation (“tip threats,” emotional coercion) does not reliably help
- Task-specific constraints beat generic personas for factual accuracy
**DOC17 relevance:** Confirms that overlay design should be concrete, measured, and model-aware.
#### A.10.2 The Prompt Report (Schulhoff et al., 2024/2025)
**Citation:** Schulhoff, S., et al. (2024). *The Prompt Report: A Systematic Survey of Prompt Engineering Techniques*. arXiv:2406.06608. Updated 2025.
**Relevance:** Good taxonomy reference. DOC17 uses a deliberately smaller subset of higher-value techniques rather than mirroring the full survey.
#### A.10.3 Instruction-Following Can Interfere with Task Solving (Qi et al., 2026)
**Citation:** Qi, Y., Peng, H., Shi, X., Xin, A., Wang, X., Xu, B., Hou, L., & Li, J. (2026). *On the Paradoxical Interference between Instruction-Following and Task Solving*. arXiv:2601.22047.
**What the paper shows:**
Adding even self-evident constraints can degrade task-solving performance. The paper introduces a benchmark/metric for this interference and shows measurable drops across math, multi-hop QA, and code generation.
**DOC17 consequence:**
This is one of the strongest research reasons for:
- fewer simultaneous overlays,
- shorter live overlays,
- hard conflict blocking,
- and avoiding gratuitous constraint stacking.
#### A.10.4 Entangled Multi-Turn Instructions (Han, 2025)
**Citation:** Han, C. (2025). *Can Language Models Follow Multiple Turns of Entangled Instructions?* arXiv:2503.13222. Findings of EMNLP 2025 version also available via ACL Anthology.
**What it shows:**
- Models struggle not only with remembering instructions across turns, but with integrating overlapping or conflicting instructions over time
- Strong memorization does not imply good conflict resolution
**DOC17 consequence:**
Supports explicit session-state overlays, turn-boundary refresh semantics, and visible runtime truth instead of assuming the model will just “remember” a past overlay instruction indefinitely.
#### A.10.5 Instruction Hierarchy Benchmarks (IHEval, 2025)
**Citation:** Zhang, Z., Li, S., Zhang, Z., Liu, X., Jiang, H., Tang, X., Gao, Y., Li, Z., Wang, H., Tan, Z., Li, Y., Yin, Q., Yin, B., & Jiang, M. (2025). *IHEval: Evaluating Language Models on Following the Instruction Hierarchy*. arXiv:2502.08745; NAACL 2025.
**What it shows:**
- Models struggle to resolve conflicting instructions across hierarchy levels
- Conflict cases sharply degrade performance relative to aligned cases
**DOC17 consequence:**
Supports:
- explicit prompt-plan truth,
- hierarchy-aware composition,
- conflict metadata,
- and strong ownership boundaries between system/overlay/user/tool layers.
#### A.10.6 Reasoning Up the Instruction Ladder (Zheng et al., 2025)
**Citation:** Zheng, Z., Balachandran, V., Park, C. Y., Brahman, F., & Kumar, S. (2025). *Reasoning Up the Instruction Ladder for Controllable Language Models*. arXiv:2511.04694.
**What it shows:**
Instruction hierarchy resolution can itself be treated as a reasoning task. Models can become more reliable at reconciling competing instructions when hierarchy handling is made explicit and trained/evaluated.
**DOC17 consequence:**
Justifies treating overlay composition as a hierarchy problem, not as naïve text concatenation.
---
### A.11 Interactive Elicitation
#### A.11.1 GATE — Generative Active Task Elicitation (Li et al., 2023/2025)
**Citation:** Li, B.Z., Tamkin, A., Goodman, N., & Andreas, J. (2023). *Eliciting Human Preferences with Language Models*. ICLR 2025. arXiv:2310.11589.
**Key insight:** Interactive elicitation can be lower effort and higher fidelity than forcing users to pre-specify all constraints in one shot.
**DOC17 consequence:**
Supports targeted clarifying-question protocols when ambiguity materially changes the output.
#### A.11.2 STaR-GATE (Andukuri et al., 2024)
**Citation:** Andukuri, C., Fränken, J.-P., Gerstenberg, T., & Goodman, N. (2024). *STaR-GATE: Teaching Language Models to Ask Clarifying Questions*. arXiv:2403.19154.
**Key insight:** RLHF often suppresses question-asking, pushing models to answer presumptively instead.
**DOC17 consequence:**
Explains why elicitation must be explicit when needed.
#### A.11.3 TO-GATE (2025)
**Citation:** *TO-GATE: Clarifying Questions and Summarizing Responses with Trajectory Optimization for Eliciting Human Preference*. arXiv:2506.02827.
**Key insight:** Asking the **right** questions in the **right order** matters more than asking more questions.
**DOC17 consequence:**
Supports short, high-value elicitation rather than mini-interrogations.
#### A.11.4 Clarifying Questions for Ambiguity (Zhang et al., 2024/2025)
**Citation:** Zhang, M.J.Q., Knox, W.B., & Choi, E. (2024). *Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions*. ICLR 2025. arXiv:2410.13788.
**Key insight:** Models can learn to decide *when* clarification is needed.
**DOC17 consequence:**
Supports advisor-protocol elicitation and scoped live elicitation, not a blanket “always ask 3–7 questions” rule.
---
### A.12 Automatic Prompt Optimization
#### A.12.1 Overview of the Field
Relevant systems and families include:
- **DSPy** — programmatic LM optimization over modules, examples, and metrics
- **OPRO** — Optimization by PROmpting
- **APE** — Automatic Prompt Engineer
- **PromptBreeder / EvoPrompt** — evolutionary prompt mutation/search
- **TextGrad** — textual gradients for iterative prompt/program improvement
- **BPO / black-box prompt optimization**
- **Promptolution** — modular framework for discrete prompt-string optimization
- **metaTextGrad** — optimizer-of-optimizers approach
#### A.12.2 DOC17 Positioning
DOC17 R4.2 does **not** support live hot-path prompt auto-mutation.
Instead:
- Prompt Advisor gives immediate deterministic and model-assisted help for live work
- Prompt Lab handles reusable artifacts offline
- replay/canary/human approval govern promotion
This is a deliberate design choice based on owner boundaries, statistical rigor, and runtime stability.
#### A.12.3 Promptolution (Zehle et al., 2025)
**Citation:** Zehle, T., Heiß, T., Schlager, M., Aßenmacher, M., & Feurer, M. (2025). *promptolution: A Unified, Modular Framework for Prompt Optimization*. arXiv:2512.02840.
**What it is:**
A modular framework focused on prompt optimization itself. It aims to produce framework-agnostic prompt strings and compare discrete prompt optimizers under a unified setup.
**Why it matters for DOC17:**
- Better fit for optimizing **prompt text artifacts** (overlay templates, prompt recipes, prompt-advisor rewrite prompts)
- Lower conceptual mismatch than DSPy for pure prompt-string work
- Appropriate for **offline Prompt Lab** use, not live chat-time mutation
#### A.12.4 DSPy
**Primary source:** DSPy documentation and papers (Stanford / open-source ecosystem; current docs at dspy.ai).
**What it is:**
A framework for **programming** LMs via modules, signatures, examples, and metrics. DSPy optimizes LM programs, not just raw prompt strings.
**Why it matters for DOC17:**
- Useful later for structured evaluators, rubric agents, and multi-step prompt-improvement programs
- Not the default optimizer for ordinary overlay text
#### A.12.5 metaTextGrad (Xu et al., 2025)
**Citation:** Xu, G., Yuksekgonul, M., Guestrin, C., & Zou, J. (2025). *metaTextGrad: Automatically Optimizing Language Model Optimizers*. arXiv:2505.18524.
**Why it matters:**
Shows that even prompt/program optimizers can themselves be optimized and task-specialized.
**DOC17 interpretation:**
Interesting future direction for Prompt Lab evolution, but not an R4.2 implementation target.
---
### A.13 Model-Specific and Operational Findings
#### A.13.1 Reasoning Models and Prompting
**Sources:** Wharton Report 2 (2025) and official vendor reasoning docs.
**Key finding:**
Explicit chain-of-thought prompting can show diminishing returns or redundancy on reasoning-native models (e.g., reasoning-specialized families). Verification, decomposition, and structure can still help, but “think step by step” is not universally beneficial.
**DOC17 consequence:**
Overlays should carry model-class cautions rather than assuming one-size-fits-all reasoning guidance.
#### A.13.2 Model-Family Instruction Following
**General finding:** Different model families respond differently to structure, XML-style framing, examples, and instruction density.
**DOC17 consequence:**
Prompt Lab and local evidence remain necessary even when the literature is directionally favorable.
#### A.13.3 Prompt Caching and Stable Prefixes
**Primary sources:**
- OpenAI prompt-caching guide (2025–2026)
- Anthropic prompt-caching guide (2025–2026)
**Operational relevance:**
- Stable repeated prefixes reduce latency/cost
- Frequent mutation of the prompt prefix degrades caching benefits
**DOC17 consequence:**
- activation notices should be UI-only
- overlay packets should remain stable when unchanged
- overlay refreshes should happen at turn boundaries
- long advisor protocols should not be always-on live overlays
#### A.13.4 Context Engineering for Agents
**Primary source:** Anthropic engineering guidance on context engineering for agents (2025).
**Operational relevance:**
Agent quality depends heavily on what context is included, excluded, refreshed, and hierarchically ordered.
**DOC17 consequence:**
Overlays should be explicit context channels with truth surfaces, not forgotten one-off messages buried in chat history.
---
### A.14 Software Engineering and Specification Writing
#### A.14.1 Interactive Requirement Elicitation
**Finding:** Interactive requirement elicitation helps reduce misinterpretation in software/system design.
**Relevance:** Supports advisor-protocol elicitation and architecture-review routing.
#### A.14.2 Project-Specific Context
**Finding:** Developer-provided project rules/context are important inputs to AI coding assistants.
**Relevance:** Supports prompt recipes and overlays that encode project-specific conventions.
---
### A.15 Research Gaps and Open Questions (Revised)
1. **Optimal live overlay length and density**
We now have interference research, but still do not know the best practical length/shape for stable high-value live overlays across model families.
2. **Overlay effectiveness on reasoning-native models**
CoT-specific guidance is weakening, but verification/decomposition effects remain under-characterized.
3. **Best evaluation design for single-user prompt libraries**
Replay/canary/human review are clearly safer than tiny live A/Bs, but exact sample thresholds and matching strategies remain partly empirical.
4. **Prompt-recipe vs. overlay boundaries**
More experience is needed to determine when a reusable artifact should live as an overlay versus a recipe or advisor protocol.
5. **Few-shot examples in persistent prompt artifacts**
Examples help in many settings, but their interaction with long-lived system/developer overlays remains underexplored.
6. **Long-conversation adherence decay**
Entangled-instruction research shows the problem is real, but overlay-specific refresh strategies still need local measurement.
---
### A.16 How to Use This Appendix in R4.2
**For overlay authors:**
- Cite the relevant subsection in frontmatter `research_basis`
- Prefer concrete constraints over persona theater
- Check model-class cautions before adding explicit reasoning instructions
**For Prompt Advisor:**
- Use this appendix as a hypothesis source for missing structures
- Suggest structures with plain-language rationale, not fake precision
**For Prompt Lab:**
- Treat this appendix as prior art and experiment guidance
- Do not treat research results as automatic permission to promote candidates
**For coding agents and human reviewers:**
- This appendix is the canonical evidence basis for why each major overlay family exists
- If new research materially changes a design premise, update this appendix and then revisit the affected overlays/prompts
**Canonical rule:**
If a separately materialized runtime mirror of this appendix exists under `ELNOR_MEMORY/knowledge/`, it is derivative only. This integrated Appendix A inside DOC17 R4.2 remains the source of truth.
---
## 18) Appendix B — Cross-Doc Change List
### B.1 DOC8
Add shared prompt-artifact substrate support for:
- `PromptArtifactKind = overlay_template | prompt_recipe | prompt_advisor_rewrite | room_role_prompt`
- replay jobs for overlay/prompt-recipe artifacts
- candidate storage and evidence grades
- no duplicate optimizer ownership inside DOC17
### B.2 DOC10
Add route-fact support for:
- overlay session revision awareness
- overlay availability / degraded-state facts
- distinct overlay registry events instead of generic capability churn
### B.3 DOC11
Add prompt-plan and truth support for:
- `OverlayPromptPacket`
- applied/dropped overlay truth
- trim reasons
- prompt recipe observational context ref if used
- feature gating for candidate apply/recommendation UI
### B.4 DOC14
Add or accept:
- `prompt_recipe` in shared prompt-artifact kinds
- DOC17 coexistence without duplicate optimizer logic
- overlay and recipe observations into the shared substrate
### B.5 DOC15
Additive `ResolvedOperation` fields:
- `suggested_overlay_ids`
- `suggested_prompt_recipe_ids`
- `prompt_improvement_hints`
- `overlay_advice_state`
- `prompt_recipe_suggestion_state`
Extend configuration tuples and observation payloads with:
- `active_overlay_ids`
- `prompt_recipe_id`
### B.6 DOC6
Either:
- formally adopt moderator/default overlay fields,
or:
- consume DOC17 extension-store defaults by reference.
### B.7 DOC12
Add room-level overlay state and participant precedence truth surfaces.
### B.8 DOC9
Consume overlay/prompt-artifact failure observations as repair inputs where relevant; DOC17 does not own repair logic.
---
## 19) Final R4.2 Verdict
DOC17 R4.2 is successful only if it becomes a **real spec** rather than a loose idea bundle.
The intended final shape is:
- overlays as reusable visible product surfaces,
- prompt advisor as the live prompt-help tool,
- prompt recipes as reusable user prompt artifacts,
- Prompt Lab as an offline governed evaluation bridge,
- DOC8-owned optimizer lifecycle,
- DOC11-owned prompt truth,
- DOC15-owned learned recommendations,
- no live auto-mutation,
- no duplicate prompt brains,
- and enough concrete schemas, session state, wiring, and tests that coding agents can build it without improvising architecture.
That is the R4.2 machine.