ELNOR REPO READER TEXT MIRROR
Original path: Current Specs/DOC6/DOC6_PANELS_FORUMS_SELF_IMPROVEMENT_v1_11_8_1.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z
---
# DOC6 — Panels & Forums Self‑Improvement Engine v1.11.8.1
Generated: 2026-02-25
## Spec pinning (non-negotiable)
This addendum MUST be implemented on top of the running repo. These are the authoritative inputs used to draft this DOC6:
- ELNOR_CORE_SPEC_v1_11_2_CANONICAL.md
`a1a23894dae7bbe93af868d02204896d887b2c2d1afd7b63a66a1c4468c81571`
- Q_DASHBOARD_SPEC_v1_11_2_CANONICAL.md
`f24788fbade9788a30df7e1d5546d14ac65e9464abd8cef84a9e4b4112405fa3`
- DOC1_MEMORY_RESILIENCE_v1_11_3_FINAL_REF_PACK_SEED_TOGGLE.md
`17cff5b7ba183dc1609589156bdefff0661ebb5607b241b66fde37fb96783372`
- DOC2_FRESHNESS_PERSONAL_STATE_v1_11_4.md
`240124c9175e7f23848b2f754b78696ff18e8a0b62e1ae94605b568c05a6876b`
- DOC4_OPENCLAW_BRIDGE_v1_11_6_1_REF_MAPPED.md
`8c2f05d91bc99de4a4e39668bebbbc617e19068c41cd042c7d7a8bc0c210baa1`
- DOC6_PANELS_FORUMS_SELF_IMPROVEMENT_v1_11_7_3.md
`42c1bfc3af04dceda495087c9e6e908c172602f911f74504eef689d80766724a`
- DOC6_ADDITION_CONTEXT_AND_REFERENCES.md
`e7534c6942517d482862c812473194d2e5b201030cfa4705cbdb64692b93c460`
---
## 0) Scope and intent
DOC6 adds a lightweight, measurable, self‑improving collaboration layer for **Panels** and the **Agent Forum**.
It is designed to:
- keep **Explore** freeform and creative,
- keep **Ship** structured, approval‑gated, and auditable,
- learn which **panel configurations** (roles, prompts, feedback presets, output profiles) produce better outcomes over time,
- avoid hidden latency/cost bloat (all heavy scoring is nightly/sampled; hot path is deterministic).
DOC6 is **additive**:
- It does **not** modify DOC1’s memory core invariants.
- It does **not** rename or replace flush artifacts.
- It does **not** introduce per‑turn reflection jobs by default.
- It extends existing Panel + Forum capabilities and Q’s existing Moderator Profile editor/UI.
### 0.1 Dependencies and compatibility
- **Canonical EC & Q specs (v1.11.2)** remain the source of truth for single‑writer, command queue, cost controls, and Q panels.
- **DOC1 (memory resilience)** is assumed implemented. DOC6 logs learning signals that can feed DOC1’s learning surfaces, but does not change DOC1’s memory rules.
- If OpenClaw integration (DOC4) exists, DOC6 must work whether Hands are online or offline; DOC6 itself does not require desktop tools.
### 0.2 Non‑negotiables (binding)
1) **EC is the sole durable writer.** All durable artifacts introduced by DOC6 are written by EC only.
2) **Q never writes durable state.** Q submits commands and renders state.
3) **Explore vs Ship:** Explore is allowed to be freeform. Ship changes must be explicit proposals → Inbox → approval.
4) **No silent auto-apply.** Learning outputs may produce recommendations and pending items only.
5) **No bloat defaults:** no heartbeat‑based flush generation, no flush/session seed renames, no per‑turn self‑reports by default.
6) **Auditable influence:** anything that influences a Ship proposal or learning promotion must be traceable to sources (message ids, transcript hash, evidence handles).
---
## 1) Core model: Explore vs Ship lanes
### 1.1 Explore lane
Explore is where panels/forums brainstorm and iterate ideas. Explore output may be fully freeform.
Explore generates:
- freeform transcript (channel-owned; Q/Forum view),
- optional sidecar feedback events (if Feedback Mode enabled),
- optional “proposal candidates” extracted at end-of-run.
Explore does **not** directly mutate durable knowledge beyond logging append‑only learning events.
### 1.2 Ship lane
Ship is where ideas become actionable changes (standing orders/corrections/policies/rules/tools/spec edits).
Ship requires:
- a structured **Proposal Candidate** artifact,
- explicit evidence handles (citations, transcript hash, message ids),
- routing to Unified Inbox (pending items),
- approval/rejection by user,
- post‑approval impact tracking via Impact Ledger.
---
## 2) Output Profiles and Output Blocks (optional + editable)
### 2.1 Definitions
- **Output Profile**: a reusable preset defining which “Output Blocks” the Synthesizer produces at end‑of‑run.
- **Output Blocks**: modular sections such as Summary, Proposals, Risks, Evidence Map, Experiments, Backlog, etc.
### 2.2 Required “sidecar envelope” (always; compact)
Even when visible output is **Freeform**, every panel run must produce a compact end‑of‑run sidecar envelope. EC writes the envelope; the Synthesizer may draft it and submit it to EC for writing. It contains:
- `goal`, `success_metric`
- `run_id`, `thread_id`, `channel`
- `moderator_profile_id`, `output_profile_id`, `intensity_mode`, `feedback_mode_id`
- roster + overlays used
- `top_proposals[]` (may be empty)
- `votes[]` (may be absent; represented as empty with a flag)
**Important:** For Freeform, this envelope is created **once at end‑of‑run**, not per turn. Visible text remains unstructured.
---
## 3) Role Overlays and Moderator Profiles (no agent proliferation)
### 3.1 Role Overlays
Role Overlays are short, editable templates (Driver/Skeptic/Auditor/Scout/Synthesizer/Prosecutor, etc.) applied per run to existing agents.
Role Overlays are not new durable “agents”. They are configuration templates attached to Moderator Profiles and selectable per panel run.
### 3.2 Moderator Profile integration (mandatory)
All of the following MUST be stored and edited under the existing Q “Moderator Profile Editor” (do not invent a parallel config universe):
- Output Profiles (incl Freeform/Minimal/Standard/Forensic/Innovation/Spec‑Edit)
- Intensity modes (Jam/Review/Ship/High‑Stakes)
- Feedback Mode presets (Off/Light/Standard/Strict) including editable overlay text
- Budgets (rounds/tokens/feedback event pool/prosecutor invocations)
- Risk/Failure taxonomy selection (see §5)
This aligns to Q’s existing Moderator Profile editor and schema (Q spec “Panels UI additions (Moderator Agent profiles)” and “Moderator Profile Editor”).
---
### 3.3 Intensity budgets (defaults; profile-editable) [ADDED]
These defaults MUST be enforced by EC and editable via Moderator Profiles.
| Intensity | Max rounds | Max total tokens | Feedback pool | Prosecutor | Sidecar envelope cap |
|----------:|-----------:|-----------------:|--------------:|:----------:|---------------------:|
| Jam Session | 3 | 8,000 | 20 | OFF | 800 tokens |
| Review | 5 | 20,000 | 20 | Optional | 800 tokens |
| Ship Review | 7 | 40,000 | 40 | Optional | 1,200 tokens |
| High Stakes | 10 | 80,000 | 40 | Conditional (risk/taxonomy) | 1,500 tokens |
Notes:
- Prosecutor is conditional for High Stakes unless profile forces it.
- “Convergence” preset is recommended for Ship Review and High Stakes.
- **Emergency synthesis reserve:** EC should reserve 150 tokens within max_total_tokens for an end-of-run Synthesizer summary when a run terminates due to budget/time (optional, but recommended for UX).
### 3.4 Overlay Intervention (optional; deterministic; logged) [ADDED]
**Intent:** Provide a rare, explicit “circuit breaker” that can salvage a panel/forum run when it is stuck (endless debate, budget exhaustion risk) or when it needs last‑mile rigor before Ship. This is **not** continuous adaptive steering.
**Rules (hard):**
- Interventions may occur **only at round boundaries** (never mid-message).
- Default maximum interventions per run: **1** (profile‑editable; recommended max = 1).
- Interventions are **deterministic**: triggered by simple counters/ratios computed from sidecars (no LLM judge).
- Every intervention MUST be explicitly logged and shown in Q (“Intervention applied…”). No invisible changes.
- Learning leaderboards MUST treat intervention runs separately by default (see §6.2).
**Allowed intervention actions (small menu):**
1) `enable_convergence` — switch Feedback Mode to `convergence` for the remainder of the run.
2) `add_synthesizer_emphasis` — apply a Synthesizer overlay emphasis for the next round (same agent; overlay template, not a new agent).
3) `add_evidence_auditor_emphasis` — apply Evidence Auditor emphasis for the next round.
4) `enable_prosecutor` — enable Prosecutor overlay for the next round (Ship/High‑Stakes only unless explicitly forced by profile).
**Deterministic triggers (defaults; profile‑editable):**
- **Debate loop:** `unresolved_object_count >= 6` AND `resolved_ratio < 0.25` by end of Round 1 → `enable_convergence`.
- **Budget risk:** feedback pool consumption ≥ 80% by end of Round 1 → `enable_convergence`.
- **Evidence meltdown (Ship only):** `request_evidence_count >= 3` AND `legal_distortion` enforced AND citations missing on proposal candidates → `add_evidence_auditor_emphasis` (next round).
- **Overconfidence risk (Ship/High‑Stakes):** high consensus (vote_consensus ≥ 0.8) AND low evidence handles (≤ 1) → `enable_prosecutor`.
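The default triggers above can be sketched as a pure function evaluated at a round boundary. This is a minimal sketch, not the EC implementation: the `RoundStats` container, its field names, the precedence order (list order), and the reading of "Ship only" as covering both `ship` and `high_stakes` intensity are assumptions introduced for illustration.

```typescript
type Intensity = "jam" | "review" | "ship" | "high_stakes";
type InterventionAction =
  | "enable_convergence"
  | "add_synthesizer_emphasis"
  | "add_evidence_auditor_emphasis"
  | "enable_prosecutor";

// Hypothetical container for the counters/ratios computed from sidecars.
interface RoundStats {
  unresolvedObjectCount: number;
  resolvedRatio: number;         // resolved objections / all objections, 0..1
  feedbackPoolUsedRatio: number; // consumed events / pool size, 0..1
  requestEvidenceCount: number;
  legalDistortionEnforced: boolean;
  citationsMissingOnCandidates: boolean;
  voteConsensus: number;         // 0..1
  evidenceHandleCount: number;
}

// Returns the first matching action, or null when nothing fires or the
// per-run intervention budget (default 1) is already spent.
function pickIntervention(
  stats: RoundStats,
  intensity: Intensity,
  interventionsApplied: number,
  maxPerRun: number = 1
): InterventionAction | null {
  if (interventionsApplied >= maxPerRun) return null;
  // Assumption: both Ship Review and High Stakes count as Ship-lane runs.
  const shipLike = intensity === "ship" || intensity === "high_stakes";

  // Debate loop
  if (stats.unresolvedObjectCount >= 6 && stats.resolvedRatio < 0.25) {
    return "enable_convergence";
  }
  // Budget risk
  if (stats.feedbackPoolUsedRatio >= 0.8) return "enable_convergence";
  // Evidence meltdown
  if (
    shipLike &&
    stats.requestEvidenceCount >= 3 &&
    stats.legalDistortionEnforced &&
    stats.citationsMissingOnCandidates
  ) {
    return "add_evidence_auditor_emphasis";
  }
  // Overconfidence risk
  if (shipLike && stats.voteConsensus >= 0.8 && stats.evidenceHandleCount <= 1) {
    return "enable_prosecutor";
  }
  return null;
}
```

Because the function reads only counters and ratios, the same inputs always yield the same action, which keeps interventions deterministic and replayable from the logged sidecars.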
**Logging (required):**
When an intervention occurs, EC appends an `overlay_intervention` record to `ELNOR_MEMORY/system/panels/overlay_interventions.jsonl` and includes it in the run’s sidecar envelope:
- `run_id`, `round_index`, `trigger`, `action`, `from_state`, `to_state`, `ts`.
**Outcome accounting (required):**
- Runs with interventions are tagged `intervention_applied=true` in the sidecar envelope.
- Baseline leaderboards exclude intervention runs by default and a separate `intervention_leaderboard.json` is produced for evaluation.
## 4) Feedback Mode (agent‑to‑agent feedback sidecars)
Feedback Mode is optional per run and editable. Its purpose is to capture structured, measurable feedback among agents without forcing structure into the visible transcript.
### 4.1 Feedback event types
Feedback sidecar events use `feedback_type`:
- `endorse`
- `refine`
- `object`
- `request_evidence`
- `propose_test`
- `meta_style` (compliments / style notes; low weight)
- `resolve` (marks an objection as resolved)
### 4.2 Required fields for usefulness
Each feedback event must include:
- `target_message_id`
- `feedback_type`
- `reason` (short; token capped)
- `confidence` (0–1)
For `object`, `request_evidence`, and `propose_test`, include:
- `severity` (blocker/major/minor)
- optional `evidence_handles[]`
For Innovation Scout feedback that responds to a specific objection:
- `addresses_event_id` (links innovation to the objection)
### 4.3 Revision linkage (did feedback change substance?)
If an agent revises a proposal after feedback, create a revision link:
- `revises_message_id`
- `revision_reason_event_ids[]`
- `substance_delta`: `none | wording | meaning`
This enables influence tracking and downstream “was feedback correct?” scoring.
**Maximum revision depth (required):** To prevent circular revision loops, EC must enforce a maximum revision depth of **1** per message. If a message has already been revised once (i.e., it appears as `revises_message_id` in any existing `revision_links` for the run), further revisions must be rejected and the Synthesizer prompted to summarize and move on.
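As a minimal sketch of the depth check described above, EC could test whether the target message already appears as `revises_message_id` in the run's existing links. The `RevisionLinkLike` type and `canRevise` helper are hypothetical names; only the two field names come from this spec.

```typescript
// Hypothetical minimal view of a revision_links record for one run.
interface RevisionLinkLike {
  message_id: string;         // the new (revising) message
  revises_message_id: string; // the message being revised
}

// True when the target has not yet been revised in this run.
function canRevise(
  targetMessageId: string,
  existingLinks: ReadonlyArray<RevisionLinkLike>
): boolean {
  return !existingLinks.some(
    (link) => link.revises_message_id === targetMessageId
  );
}
```

This follows the literal check in the rule. Whether a revision may itself be revised (a chain rather than a repeat) is not stated here, so the sketch does not reject that case.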
### 4.4 Anti‑sycophancy weighting
- `meta_style` is logged and shown for UI morale, but has low weight in learning.
- `endorse` is treated as a strong signal only if it includes at least one of:
  - evidence handle,
  - proposed test,
  - concrete refinement.
### 4.5 Budgeting (hard defaults)
To prevent feedback spam:
- Feedback Mode uses a **shared pool per panel run**:
  - Jam/Review: 20 events total
  - Ship/High‑Stakes: 40 events total
- Anti‑monopoly cap: no single agent may consume more than **60%** of the run’s feedback event pool. EC enforces this by counting feedback events per `actor_agent_id` within the run and rejecting events once an agent reaches the cap.
When budget is exhausted, agents must stop producing sidecar feedback events.
**Graceful exhaustion (required):** When the feedback pool is exhausted, EC must append a single `budget_exhausted` event record (system-generated) to `feedback_events.jsonl` for the run. After exhaustion, agents may emit at most **one** final `summary_feedback` event (low-weight; no objections) to capture end-of-run guidance; all other feedback events are rejected.
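The pool and anti‑monopoly checks can be sketched as a single admission function. This is an illustration under stated assumptions: the per-agent counts are passed in as a plain object, the 60% cap is rounded down to a whole event count, and the `Decision` labels are hypothetical. The post-exhaustion `budget_exhausted` and `summary_feedback` handling is not modeled.

```typescript
type Intensity = "jam" | "review" | "ship" | "high_stakes";
type Decision = "accept" | "reject_pool_exhausted" | "reject_agent_cap";

// Pool sizes from the defaults above.
const POOL_SIZE: Record<Intensity, number> = {
  jam: 20,
  review: 20,
  ship: 40,
  high_stakes: 40,
};
const ANTI_MONOPOLY_PERCENT = 60;

// countsByAgent: feedback events already accepted in this run, keyed by
// actor_agent_id.
function admitFeedbackEvent(
  intensity: Intensity,
  actorAgentId: string,
  countsByAgent: Readonly<Record<string, number>>
): Decision {
  const pool = POOL_SIZE[intensity];
  let total = 0;
  Object.keys(countsByAgent).forEach((id) => {
    total += countsByAgent[id];
  });
  if (total >= pool) return "reject_pool_exhausted";

  // Integer arithmetic avoids floating-point drift in the cap.
  const agentCap = Math.floor((pool * ANTI_MONOPOLY_PERCENT) / 100);
  const used = countsByAgent[actorAgentId] ?? 0;
  if (used >= agentCap) return "reject_agent_cap";
  return "accept";
}
```

With the default pools this gives a per-agent cap of 12 events for Jam/Review and 24 for Ship/High‑Stakes.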
### 4.6 Feedback Mode presets (editable) [ADDED]
Feedback Mode presets are stored in Moderator Profiles and are editable in Q. Default preset ids:
- `off`
- `light`
- `standard`
- `strict`
- `convergence`
Preset intent:
- **Light:** low‑friction feedback; minimal objections; short reasons.
- **Standard:** objections must include fix/test; endorsements must include reason.
- **Strict:** factual endorsements require evidence or explicit “speculative” flag; Prosecutor encouraged/conditional.
- **Convergence:** after Round 1, EC rejects new `object` and `request_evidence` events (unless explicitly allowed by profile override). Only `resolve`, `refine`, `endorse`, `propose_test`, and `meta_style` may be emitted. Purpose: prevent endless debate and force closure.
### 4.7 Resolve behavior (required) [ADDED]
A `resolve` feedback event MUST reference a prior objection or evidence request via `resolves_event_id`.
When EC receives a `resolve` event:
- EC marks the referenced event as **resolved** in derived views (Q badges, run summaries).
- The Synthesizer treats resolved objections as “addressed” and keeps them out of the “open blockers” list.
- Leaderboards penalize **unresolved** `object/request_evidence` events; resolved ones reduce penalty.
This is sidecar behavior only; it does not mutate original records (append‑only).
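Because the log is append‑only, "resolved" is a derived property rather than a stored one. The sketch below shows one way a derived view could compute the still-open objections and evidence requests for a run; the `FeedbackEventLike` type and `unresolvedEventIds` helper are hypothetical names used only for illustration.

```typescript
// Hypothetical minimal view of a feedback_events record.
interface FeedbackEventLike {
  id: string;
  feedback_type: string;
  resolves_event_id?: string;
}

// Returns ids of object/request_evidence events with no matching resolve.
// The input records are never modified.
function unresolvedEventIds(
  events: ReadonlyArray<FeedbackEventLike>
): string[] {
  const resolved: Record<string, boolean> = {};
  events.forEach((e) => {
    if (e.feedback_type === "resolve" && e.resolves_event_id) {
      resolved[e.resolves_event_id] = true;
    }
  });
  return events
    .filter(
      (e) =>
        (e.feedback_type === "object" ||
          e.feedback_type === "request_evidence") &&
        !resolved[e.id]
    )
    .map((e) => e.id);
}
```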
## 5) Risk/Failure Taxonomy (agnostic, editable; Legal Distortion included)
### 5.1 Purpose
A small editable taxonomy used to:
- classify failure modes,
- drive stricter gates in Ship lane,
- prioritize reviewer attention,
- remain domain‑agnostic while allowing domain‑specific categories.
### 5.1.1 Taxonomy model (required) [ADDED]
**Storage (EC‑owned JSON):** `ELNOR_MEMORY/system/panels/taxonomy.json`
`taxonomy.json` is a single JSON object with:
- `version`, `updated_at`
- `categories[]` (each category has: `key`, `display_name`, `description`, `enabled`, `gate_behavior`)
- `preset_overrides[]` (mapping: feedback_mode/intensity/output_profile → enforced categories)
Q edits taxonomy by submitting commands to EC; EC validates and writes `taxonomy.json` atomically.
### 5.2 Taxonomy default seeding (editable in Q)
On first run (taxonomy file missing), EC MUST create `taxonomy.json` seeded with:
- `legal_distortion` (enabled; default `gate_behavior = block_ship` for Ship lane)
- a small handful of general categories (enabled) to start (e.g., silent steering, log explosion, endless debate)
Users can add/delete categories and change enforcement in Q. This spec does not hardcode the full default list beyond requiring `legal_distortion` by default.
### 5.3 Legal Distortion gate (Ship lane)
Legal Distortion is a Ship‑lane gate that prevents uncited legal/procedural “rules” from becoming durable behavior.
If a Ship proposal asserts a legal rule/procedure/deadline, the proposal must include at least one citation object with:
- `source_type`: (doc/file/web/case)
- `path_or_url` (local path allowed only in trusted local context)
- `hash` (doc hash if local)
- `page_or_bates` (or equivalent pinpoint)
- `snippet_hash` (optional)
If missing, conversion to Ship must be blocked and routed to Inbox as “needs citation.”
This remains optional in non‑legal domains: taxonomy is configurable and gates apply only when selected.
**Scope guard (important):** The Legal Distortion gate applies **only** at **Ship conversion / Ship approval** boundaries. It does **not** block Explore discussion, feedback events, or non-legal Ship proposals. In other words: agents may freely discuss legal ideas; the gate only prevents a legal rule/procedure from becoming a durable Ship change without citations.
**Enablement:** Legal Distortion enforcement is controlled by Moderator Profile + intensity defaults:
- Jam/Review: Legal Distortion enforcement is **OFF** by default.
- Ship Review / High Stakes: Legal Distortion enforcement is **ON** by default.
Users can override per Moderator Profile in Q (toggle “Legal Distortion gate” and/or taxonomy enforced categories).
**Deep validation:** Structural citation presence is required at conversion; deeper “is this citation actually relevant” validation (if enabled) must occur at **approval time** (user-initiated), not in nightly jobs.
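The structural check at conversion time can be sketched as follows. Several points are assumptions made for illustration and are not fixed by this spec: a proposal is treated as asserting a legal rule when its `risk_tags` include `legal_distortion`, `doc` and `file` sources are treated as local (so `hash` is required for them), and the `GateResult` labels are hypothetical.

```typescript
type CitationSourceType = "doc" | "file" | "web" | "case";

interface Citation {
  source_type: CitationSourceType;
  path_or_url: string;
  hash?: string;
  page_or_bates?: string;
  snippet_hash?: string; // optional; not checked here
}

// Hypothetical minimal view of a proposal candidate.
interface ProposalLike {
  risk_tags: string[];
  evidence: Citation[];
}

type GateResult = "pass" | "needs_citation";

// Structural presence only; relevance is a separate approval-time concern.
function isStructurallyComplete(c: Citation): boolean {
  const hasPath = c.path_or_url.trim().length > 0;
  const hasPinpoint = (c.page_or_bates ?? "").trim().length > 0;
  // Assumption: doc/file citations are local and therefore need a hash.
  const isLocal = c.source_type === "doc" || c.source_type === "file";
  const hasHash = !isLocal || (c.hash ?? "").length > 0;
  return hasPath && hasPinpoint && hasHash;
}

function legalDistortionGate(
  proposal: ProposalLike,
  gateEnforced: boolean
): GateResult {
  if (!gateEnforced) return "pass";
  // Assumption: the taxonomy tag marks proposals asserting a legal rule.
  if (proposal.risk_tags.indexOf("legal_distortion") === -1) return "pass";
  return proposal.evidence.some(isStructurallyComplete)
    ? "pass"
    : "needs_citation";
}
```

A `needs_citation` result corresponds to blocking the conversion and routing the item to Inbox; Explore discussion never passes through this function.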
---
## 6) Impact Ledger (nightly, derived) + Rollback safety
### 6.1 Impact Events (append-only)
Impact is measured by append‑only events, not by model self‑grading.
Events include:
- `change_id` used
- user reaction (👍/👎/⭐ etc.)
- `inject_then_correct` flag (correction within 2 turns after a change was used)
- `cost` (if available; otherwise omitted)
- `adoption` (proposal approved/applied)
- rollback invoked
### 6.2 Nightly aggregation (bounded; 0 LLM calls by default)
Nightly job produces:
- `impact_ledger.jsonl` per change_id (7/14/30‑day windows)
- “harm candidates” list
- “best prompt templates / overlay configs” leaderboards
Hard bounds (defaults):
- max runtime: 300 seconds
- max change_ids processed: 1000
- max LLM calls: 0 (default). Optional “heavy evaluation” must be user‑initiated and separately budgeted.
**Overflow behavior (required):** If runtime or change_id limits are reached, EC writes a **partial** ledger and appends a `nightly_job_overflow` record containing: `coverage_pct`, `processed_change_ids`, `skipped_change_ids`, and the bound that triggered. Q displays “Impact Ledger covers X% of tracked changes” for that run/day.
**Leaderboards (deterministic scoring; required):**
EC produces `prompt_leaderboard.json` and `roster_profile_leaderboard.json` using the following composite score (per entry), computed over a rolling 30-day window with simple time decay (newer events weighted higher):
`score = 0.4 * star_rate + 0.3 * adoption_rate + 0.3 * (1 - inject_then_correct_rate)`
Where:
- `star_rate` = ⭐ reactions / eligible runs
- `adoption_rate` = Ship-approved proposals / proposal candidates
- `inject_then_correct_rate` = inject_then_correct events / uses
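As a worked sketch of the composite score, the function below applies the formula to raw counts. It deliberately omits the time decay, whose exact form is not specified here, and it assumes a rate of 0 when a denominator is 0. The `LeaderboardCounts` type and its field names are hypothetical.

```typescript
// Hypothetical raw counts for one leaderboard entry over the 30-day window.
interface LeaderboardCounts {
  starReactions: number;
  eligibleRuns: number;
  shipApprovedProposals: number;
  proposalCandidates: number;
  injectThenCorrectEvents: number;
  uses: number;
}

// Assumption: an empty denominator yields a rate of 0.
function rate(numerator: number, denominator: number): number {
  return denominator > 0 ? numerator / denominator : 0;
}

// score = 0.4 * star_rate + 0.3 * adoption_rate
//       + 0.3 * (1 - inject_then_correct_rate)
function compositeScore(c: LeaderboardCounts): number {
  const starRate = rate(c.starReactions, c.eligibleRuns);
  const adoptionRate = rate(c.shipApprovedProposals, c.proposalCandidates);
  const injectThenCorrectRate = rate(c.injectThenCorrectEvents, c.uses);
  return 0.4 * starRate + 0.3 * adoptionRate + 0.3 * (1 - injectThenCorrectRate);
}
```

For example, 5 stars over 10 eligible runs, 1 approval over 4 candidates, and 1 correction over 10 uses gives 0.4 × 0.5 + 0.3 × 0.25 + 0.3 × 0.9 = 0.545. Note that under the zero-denominator assumption an entry with no data scores 0.3, so an implementation may want a minimum-sample rule before ranking.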
**Failure mode rollup (required):**
Nightly aggregation MUST also produce `failure_mode_rollup.json` summarizing counts of taxonomy-tagged failure events per category and per Moderator Profile (30-day window).
**Intervention runs (required):** Nightly aggregation MUST compute baseline leaderboards using **non-intervention runs only** (`intervention_applied=false`). It MUST also produce `intervention_leaderboard.json` scoring interventions separately (same formula, but only among intervention runs) so you can evaluate whether interventions help or harm outcomes.
### 6.3 Harm candidate definition (deterministic)
A change becomes a harm candidate if:
- `inject_then_correct` occurs ≥ 3 times in 14 days,
- and there are no offsetting positive signals above threshold,
- and adoption count ≥ 1 (it was actually used).
Harm candidates produce a pending Inbox item proposing:
- disable / demote / rollback,
never an auto‑apply action.
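The three conditions can be sketched as a single predicate. The positive-signal threshold is not given a value in this section, so it is a required parameter here; the `ChangeWindowStats` type and its field names are hypothetical.

```typescript
// Hypothetical per-change counts over the trailing 14-day window.
interface ChangeWindowStats {
  injectThenCorrectCount: number;
  positiveSignalCount: number; // e.g. up/star reactions; definition assumed
  adoptionCount: number;
}

// True when all three harm-candidate conditions hold. The caller is
// expected to create a pending Inbox item; nothing is auto-applied.
function isHarmCandidate(
  stats: ChangeWindowStats,
  positiveSignalThreshold: number
): boolean {
  const repeatedCorrections = stats.injectThenCorrectCount >= 3;
  const noOffsettingPositives =
    stats.positiveSignalCount <= positiveSignalThreshold;
  const actuallyUsed = stats.adoptionCount >= 1;
  return repeatedCorrections && noOffsettingPositives && actuallyUsed;
}
```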
---
### 6.4 Self‑reports (structured; daily/end‑of‑task only) [ADDED]
**Purpose:** Capture agent/process observations as proposals without hot‑path cost spikes.
**Triggers (defaults):**
- End‑of‑task (panel run finalized) for runs in Ship/High‑Stakes intensity, OR
- Daily digest at 02:00 local time (configurable) per agent participating in at least one run.
**Default: OFF for per‑turn.** Per‑turn self‑reports are forbidden by default.
**Token cap:** target 250–400 tokens per self‑report record; 400 tokens is the hard cap.
Self‑reports are append‑only events (`self_reports.jsonl`) and may spawn pending Inbox items but never auto‑apply.
### 6.5 Micro‑feedback prompts (sampled; non‑blocking) [ADDED]
**Purpose:** Collect high‑signal user validation without nagging.
**Sampling defaults:**
- Base rate: 5% of eligible situations
- 100% when a change is a `harm_candidate` or when user explicitly requests evaluation
**Eligibility:** only when a promoted/approved `change_id` was used in the last run OR the run produced a Ship proposal candidate.
**Caps:** max 1 micro‑feedback prompt per conversation/thread per day; max 3 per user per day.
Micro‑feedback prompts appear in Q as a queue item and can be dismissed; dismissal is logged and reduces prompting frequency for that thread.
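The sampling rule can be sketched as below. The order of checks is an assumption: eligibility and the daily caps are applied first, then the 100% cases, then the 5% base rate. The random source is injected so the decision can be tested; `PromptContext` and `shouldPrompt` are hypothetical names, and the dismissal-based frequency reduction is not modeled.

```typescript
// Hypothetical inputs for one prompting decision.
interface PromptContext {
  eligible: boolean;               // per the eligibility rule above
  harmCandidate: boolean;
  userRequestedEvaluation: boolean;
  promptsTodayForThread: number;
  promptsTodayForUser: number;
}

const BASE_RATE = 0.05;
const MAX_PER_THREAD_PER_DAY = 1;
const MAX_PER_USER_PER_DAY = 3;

function shouldPrompt(
  ctx: PromptContext,
  random: () => number = Math.random
): boolean {
  if (!ctx.eligible) return false;
  if (ctx.promptsTodayForThread >= MAX_PER_THREAD_PER_DAY) return false;
  if (ctx.promptsTodayForUser >= MAX_PER_USER_PER_DAY) return false;
  // 100% sampling for harm candidates and explicit user requests.
  if (ctx.harmCandidate || ctx.userRequestedEvaluation) return true;
  return random() < BASE_RATE;
}
```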
## 7) Durable artifacts (EC-owned; append-only + derived)
All paths are EC-written only.
### 7.1 Append-only logs (JSONL)
Create under:
- `ELNOR_MEMORY/system/panels/`
  - `taxonomy.json`
  - `panel_runs.jsonl`
  - `overlay_interventions.jsonl`
  - `panel_turns.jsonl` (optional; may store hashes + limited excerpts)
  - `feedback_events.jsonl`
  - `revision_links.jsonl`
  - `proposal_candidates.jsonl`
  - `references_manifest_index.jsonl`
  - `compaction_events.jsonl`
- `ELNOR_MEMORY/system/learning/`
  - `impact_events.jsonl`
  - `impact_ledger.jsonl` (derived nightly; append-only per run/day)
  - `micro_feedback.jsonl`
  - `self_reports.jsonl`
  - `team_digest/YYYY-MM-DD.md` (derived)
### 7.2 Derived views (JSON)
- `ELNOR_MEMORY/system/panels/prompt_leaderboard.json`
- `ELNOR_MEMORY/system/panels/roster_profile_leaderboard.json`
- `ELNOR_MEMORY/system/panels/failure_mode_rollup.json`
### 7.3 Retention policy (no log explosion)
Defaults:
- `panel_turns.jsonl`, `feedback_events.jsonl`, `revision_links.jsonl`, and `overlay_interventions.jsonl`: retain 90 days then archive-compress.
- `impact_ledger.jsonl`: retain indefinitely (small summaries).
- `proposal_candidates.jsonl`: retain indefinitely (provenance for shipped changes).
- `micro_feedback.jsonl` and `self_reports.jsonl`: retain 180 days then archive.
- `references/<run_id>/access_log.jsonl`: retain 90 days then archive-compress.
- `compaction/<run_id>/*` derived views: retain 90 days then archive-compress.
- `reference_store/*` snapshots: retain until unreferenced by any manifest for 180 days, then archive.
Archival must follow archive‑not‑delete: move to `ELNOR_MEMORY/system/task_archive/…` with index entry.
---
## 8) Q UX requirements (wiring to existing panels)
### 8.1 Panel Builder (Moderator profile driven)
Panel Builder must allow:
- select Moderator Profile
- override per run: output profile (incl Freeform), intensity, feedback mode
- budgets display and edit (within allowed ranges)
- “Save as Profile”
### 8.2 Panel Run View
- Visible transcript remains freeform by default.
- Show badges derived from sidecars:
  - endorsed/objected/requested evidence/revised/resolved
- Per-message reactions:
  - 👍 good
  - ⭐ great prompt/style
  - 🎯 on-topic
  - 🧪 needs evidence
  - 🚫 off-topic/avoid pattern
### 8.3 Convert to Ship
For any idea chunk, provide “Convert to Proposal (Ship)”:
- creates a proposal candidate record
- routes to Inbox as pending item
- includes `source_transcript_hash` and message ids
- applies Legal Distortion gate if category selected
### 8.4 Learning Dashboard additions
- Impact Ledger view (what helped/hurt)
- Prompt/Overlay leaderboards
- Self-report digest view
- Micro-feedback queue (non-blocking; expires)
### 8.5 Taxonomy Editor (editable categories)
Add a simple editor allowing:
- add/delete categories
- enable/disable category gates for Ship lane
- map category → which presets enforce it (e.g., Strict)
**Placement:** The Taxonomy Editor must live inside the existing **Moderator Profile Editor** UI (Advanced/Risk Taxonomy section) to avoid new surfaces.
### 8.6 Panel Run View — Context Pressure Indicator [ADDED]
During an active panel run, show per-agent context pressure bars:
- Green 0–50%
- Amber 50–70%
- Red 70%+
Label: “[Agent] ([Model]): 62% context used”.
When compaction occurs, show a non-blocking notification: “Discussion compacted for [Agent]. [Y] tokens freed. Tap to view summary.”
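The bands and label can be sketched as two small helpers. The listed ranges share their endpoints, so this sketch assumes exactly 50% is Amber and exactly 70% is Red; the function names are hypothetical.

```typescript
type PressureColor = "green" | "amber" | "red";

// usedRatio is the fraction of the agent's context window in use (0..1).
// Assumption: band boundaries belong to the higher band.
function pressureColor(usedRatio: number): PressureColor {
  if (usedRatio >= 0.7) return "red";
  if (usedRatio >= 0.5) return "amber";
  return "green";
}

// Builds the bar label in the "[Agent] ([Model]): NN% context used" form.
function pressureLabel(agent: string, model: string, usedRatio: number): string {
  const pct = Math.round(usedRatio * 100);
  return `${agent} (${model}): ${pct}% context used`;
}
```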
### 8.7 Panel Builder — Reference Preview [ADDED]
When documents are attached in Panel Builder, show:
- token estimate per document
- predicted INLINE vs REPOSITORY under the selected roster
- per-document override: Auto / Force Inline / Force Repository
- inline budget bar for smallest model
## 9) EC API and commands (implementation contract)
### 9.1 New command types (append-only; no approvals unless explicitly Ship)
Commands (submitted via `/api/commands`; EC validates and writes):
- `panel_run_start` (creates run id; appends to panel_runs)
- `panel_turn_batch_append` (optional batching; appends to panel_turns)
- `panel_feedback_event_append` (appends to feedback_events; enforces budgets)
- `panel_revision_link_append` (appends to revision_links)
- `panel_overlay_intervention_append` (system-only; appends to overlay_interventions; used when EC applies an intervention)
- `panel_run_finalize` (writes sidecar envelope; appends proposal candidates if present)
- `panel_convert_to_proposal_candidate` (writes proposal_candidates; creates pending inbox item)
- `panel_reaction_event` (writes impact_events + feedback_signals hook)
- `panel_nightly_aggregate` (runs bounded aggregation; writes derived views)
- `taxonomy_set` (replace taxonomy.json after validation)
- `taxonomy_category_upsert` (add/update a category)
- `taxonomy_category_delete` (delete a category)
- `taxonomy_preset_override_set` (edit preset_overrides)
- `model_registry_upsert` (updates DOC2 model_registry.json; manual only) [ADDED]
- `model_registry_delete` [ADDED]
- `model_registry_auto_detect` (manual maintenance job only; allowlist; bounded) [ADDED]
- `panel_reference_add` (register reference mid-run; triggers re-materialization) [ADDED]
- `panel_ref_read` (agent tool: retrieve reference sections; logs access) [ADDED]
- `panel_turn_drillback` (agent tool: retrieve original turns from compacted view) [ADDED]
- `panel_run_compact` (system-only; triggers compaction for an agent) [ADDED]
- `panel_compaction_event_append` (system-only; appends compaction_events) [ADDED]
Ship‑lane proposal approval remains routed through existing pending item approval mechanisms.
### 9.2 Read endpoints (Q reads; EC serves)
Add read-only endpoints (or expose via existing Q backend proxy):
- `GET /api/panels/runs?limit=&since=`
- `GET /api/panels/run/:run_id`
- `GET /api/panels/feedback?run_id=`
- `GET /api/panels/leaderboards`
- `GET /api/learning/impact-ledger?since=`
- `GET /api/learning/team-digest?date=`
- `GET /api/learning/taxonomy`
- `GET /api/learning/taxonomy/rules`
### 9.3 Cost controls alignment
All new endpoints must integrate with existing global cost budgets and must not introduce new always-on expensive calls.
---
## 10) Repo implementation map (Codex must follow; no invented paths)
This addendum is written to match your known monorepo layout.
### 10.1 Contracts
- `packages/contracts/src/schemas.ts`: add Zod schemas for new records and commands.
- `packages/contracts/src/canonical.ts`: add canonical paths for new artifacts.
- `packages/contracts/src/index.ts`: export new schemas/types.
#### 10.1.1 Zod schemas (required; Codex must implement exactly) [ADDED]
Add these Zod schemas to `packages/contracts/src/schemas.ts`.
**Model registry extension (DOC2):** Extend the DOC2 model registry record schema (in DOC2 schema section) with the optional fields required by DOC6 §13.3. Do not introduce a second model registry schema/file.
**Enums:**
- `IntensityMode = z.enum(['jam','review','ship','high_stakes'])`
- `FeedbackMode = z.enum(['off','light','standard','strict','convergence'])`
- `FeedbackType = z.enum(['endorse','refine','object','request_evidence','propose_test','meta_style','resolve','summary_feedback','budget_exhausted'])`
- `Severity = z.enum(['blocker','major','minor'])`
- `SubstanceDelta = z.enum(['none','wording','meaning'])`
- `InterventionAction = z.enum(['enable_convergence','add_synthesizer_emphasis','add_evidence_auditor_emphasis','enable_prosecutor'])`
**feedback_event**
```ts
export const FeedbackEventSchema = z.object({
  id: z.string(),
  run_id: z.string(),
  thread_id: z.string().optional(),
  channel: z.string(),
  ts: z.string(),
  actor_agent_id: z.string(),
  target_message_id: z.string(),
  feedback_type: FeedbackType,
  reason: z.string().max(600),
  proposed_fix: z.string().max(600).optional(),
  test_description: z.string().max(600).optional(),
  confidence: z.number().min(0).max(1),
  severity: Severity.optional(),
  evidence_handles: z.array(z.string()).max(8).optional(),
  addresses_event_id: z.string().optional(),
  resolves_event_id: z.string().optional(),
  // `.optional()` before `.default()` so the default applies when the field is omitted.
  meta_style_weight: z.number().min(0).max(1).optional().default(0.1),
});
```
Validation rules:
- `severity` required if `feedback_type` in {`object`,`request_evidence`,`propose_test`}.
- `test_description` required if `feedback_type` == `propose_test`.
- `resolves_event_id` required if `feedback_type` == `resolve`.
- For `feedback_type == endorse`, require at least one of: `evidence_handles` present OR `addresses_event_id` present OR `reason.length > 80`.
- `budget_exhausted` events are system-generated by EC only (actor_agent_id = 'ec'); they must include `reason` summarizing counts.
- `summary_feedback` may be appended at most once per run after `budget_exhausted` and must not include `severity`.
Implementation note: enforce these conditional rules in Zod via `.superRefine(...)` (Codex must not omit this).
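The conditional rules above can be sketched as a dependency-free checker that returns one message per violated rule. Inside a Zod `.superRefine((value, ctx) => ...)` callback, each message could be forwarded to `ctx.addIssue`. The `FeedbackEventInput` type and `conditionalIssues` helper are hypothetical names; treating an empty `evidence_handles` array as "not present" is an assumption, and the once-per-run limit on `summary_feedback` is a cross-record rule that is not checked here.

```typescript
type FeedbackTypeName =
  | "endorse" | "refine" | "object" | "request_evidence" | "propose_test"
  | "meta_style" | "resolve" | "summary_feedback" | "budget_exhausted";

// Hypothetical subset of feedback_event fields used by the rules.
interface FeedbackEventInput {
  feedback_type: FeedbackTypeName;
  actor_agent_id: string;
  reason: string;
  severity?: "blocker" | "major" | "minor";
  test_description?: string;
  resolves_event_id?: string;
  evidence_handles?: string[];
  addresses_event_id?: string;
}

function conditionalIssues(e: FeedbackEventInput): string[] {
  const issues: string[] = [];
  const t = e.feedback_type;
  const needsSeverity =
    t === "object" || t === "request_evidence" || t === "propose_test";

  if (needsSeverity && !e.severity) issues.push("severity is required");
  if (t === "propose_test" && !e.test_description) {
    issues.push("test_description is required");
  }
  if (t === "resolve" && !e.resolves_event_id) {
    issues.push("resolves_event_id is required");
  }
  if (t === "endorse") {
    const hasEvidence = (e.evidence_handles ?? []).length > 0;
    const hasLink = Boolean(e.addresses_event_id);
    if (!hasEvidence && !hasLink && e.reason.length <= 80) {
      issues.push("endorse needs evidence, a linked event, or a longer reason");
    }
  }
  if (t === "budget_exhausted" && e.actor_agent_id !== "ec") {
    issues.push("budget_exhausted is EC-generated only");
  }
  if (t === "summary_feedback" && e.severity) {
    issues.push("summary_feedback must not include severity");
  }
  return issues;
}
```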
**revision_link**
```ts
export const RevisionLinkSchema = z.object({
  id: z.string(),
  run_id: z.string(),
  ts: z.string(),
  actor_agent_id: z.string(),
  message_id: z.string(),
  revises_message_id: z.string(),
  revision_reason_event_ids: z.array(z.string()).min(1).max(8),
  substance_delta: SubstanceDelta,
});
```
**overlay_intervention_event** (system-only)
```ts
export const OverlayInterventionEventSchema = z.object({
  id: z.string(),
  run_id: z.string(),
  ts: z.string(),
  round_index: z.number().int().nonnegative(),
  trigger: z.string().max(240),
  action: InterventionAction,
  from_state: z.record(z.string(), z.unknown()).optional(),
  to_state: z.record(z.string(), z.unknown()).optional(),
});
```
**sidecar_envelope** (end‑of‑run only)
```ts
export const SidecarEnvelopeSchema = z.object({
  id: z.string(),
  run_id: z.string(),
  thread_id: z.string().optional(),
  channel: z.string(),
  ts: z.string(),
  goal: z.string().max(400),
  success_metric: z.string().max(240).optional(),
  moderator_profile_id: z.string(),
  output_profile_id: z.string(),
  intensity_mode: IntensityMode,
  feedback_mode: FeedbackMode,
  roster: z.array(z.object({
    agent_id: z.string(),
    overlay_id: z.string().optional(),
    model: z.string().optional(),
  })).max(12),
  top_proposals: z.array(z.object({
    proposal_id: z.string(),
    title: z.string().max(160),
    summary: z.string().max(600),
    ship_recommended: z.boolean().optional().default(false),
  })).max(10).default([]),
  intervention_applied: z.boolean().optional().default(false),
  overlays_by_round: z.array(
    z.array(z.object({ agent_id: z.string(), overlay_id: z.string().optional() }))
  ).max(12).optional(),
  intervention_event_ids: z.array(z.string()).max(8).optional().default([]),
  votes: z.array(z.object({
    proposal_id: z.string(),
    voter_agent_id: z.string(),
    confidence: z.number().min(0).max(1),
    stance: z.enum(['support','oppose','abstain']),
  })).max(60).default([]),
});
```
**proposal_candidate**
```ts
export const ProposalCandidateSchema = z.object({
  id: z.string(),
  run_id: z.string(),
  thread_id: z.string().optional(),
  channel: z.string(),
  ts: z.string(),
  title: z.string().max(200),
  summary: z.string().max(1200),
  proposal_kind: z.enum(['standing_order','correction','policy','rule','spec_edit','code_change','other']),
  source_transcript_hash: z.string(),
  source_message_ids: z.array(z.string()).min(1).max(50),
  risk_tags: z.array(z.string()).max(12).default([]),
  evidence: z.array(z.object({
    source_type: z.enum(['doc','file','web','case','memory','log']),
    path_or_url: z.string().max(512),
    hash: z.string().optional(),
    page_or_bates: z.string().optional(),
    snippet_hash: z.string().optional(),
  })).max(12).default([]),
});
```
**impact_event**
```ts
export const ImpactEventSchema = z.object({
  id: z.string(),
  ts: z.string(),
  change_id: z.string(),
  run_id: z.string().optional(),
  thread_id: z.string().optional(),
  channel: z.string(),
  used: z.boolean().optional().default(true),
  inject_then_correct: z.boolean().optional().default(false),
  user_reaction: z.enum(['up','down','star','on_topic','needs_evidence','off_topic','none']).default('none'),
  cost_usd: z.number().nonnegative().optional(),
});
```
**micro_feedback_event**
```ts
export const MicroFeedbackEventSchema = z.object({
id: z.string(),
ts: z.string(),
run_id: z.string().optional(),
thread_id: z.string().optional(),
change_id: z.string().optional(),
prompt_id: z.string(),
response: z.enum(['yes','no','unsure','dismissed']),
note: z.string().max(600).optional(),
});
```
**self_report_event**
Enforcement: EC MUST enforce the self-report size cap (`token_count_cap`) by (a) schema structural limits (max items + max string lengths) and (b) truncating any serialized record that exceeds a fixed byte ceiling (recommended 6 KB) before writing. EC MUST NOT trust an agent-supplied token count.
```ts
export const SelfReportEventSchema = z.object({
id: z.string(),
ts: z.string(),
agent_id: z.string(),
scope: z.enum(['task','daily']),
wins: z.array(z.string().max(240)).max(6).default([]),
failures: z.array(z.string().max(240)).max(6).default([]),
proposed_improvements: z.array(z.object({
title: z.string().max(160),
rationale: z.string().max(400),
})).max(6).default([]),
token_count_cap: z.number().max(400).default(400),
});
```
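The byte-ceiling half of the enforcement rule could look roughly like the sketch below. It is illustrative only: `capSelfReport`, `serializedBytes`, and the strategy of dropping whole trailing list items (so the record stays valid JSON) are assumptions, not part of DOC6. Only the 6 KB ceiling comes from the note above.

```ts
// Illustrative sketch; helper names and the trimming strategy are assumptions.
const MAX_RECORD_BYTES = 6 * 1024; // recommended ceiling from the enforcement note

interface SelfReport {
  id: string;
  ts: string;
  agent_id: string;
  scope: 'task' | 'daily';
  wins: string[];
  failures: string[];
  proposed_improvements: { title: string; rationale: string }[];
}

// Size of the record in UTF-8 bytes, exactly as it would be written to JSONL.
function serializedBytes(record: unknown): number {
  return new TextEncoder().encode(JSON.stringify(record)).length;
}

// Drop trailing items from the currently longest list until the record fits.
// The input is copied first, so the caller's object is never mutated.
function capSelfReport(
  report: SelfReport,
  maxBytes: number = MAX_RECORD_BYTES,
): { record: SelfReport; truncated: boolean } {
  const record: SelfReport = {
    ...report,
    wins: [...report.wins],
    failures: [...report.failures],
    proposed_improvements: [...report.proposed_improvements],
  };
  const lists: unknown[][] = [record.proposed_improvements, record.failures, record.wins];
  let truncated = false;
  while (serializedBytes(record) > maxBytes) {
    const longest = lists.reduce((a, b) => (b.length > a.length ? b : a));
    if (longest.length === 0) break; // nothing left to trim
    longest.pop();
    truncated = true;
  }
  return { record, truncated };
}
```

The size is measured in bytes rather than characters because multi-byte text can push a record past the ceiling even when every string respects its schema length limit.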
**panel_reference**
```ts
export const PanelReferenceSchema = z.object({
ref_id: z.string().max(128),
ref_type: z.enum(['document','spec','code','prior_run','standing_orders','memory','other']),
title: z.string().max(200),
source_path: z.string().max(512),
content_hash: z.string(),
token_estimate: z.number().int().positive(),
snapshot: z.boolean().optional().default(false),
materialization: z.enum(['auto','force_inline','force_repository']).default('auto'),
manifest_detail: z.enum(['minimal','standard','detailed']).optional(),
structural_index: z.array(z.object({
section_id: z.string().max(32),
title: z.string().max(200),
depth: z.number().int().min(0).max(6),
token_estimate: z.number().int().nonnegative(),
start_offset: z.number().int().nonnegative(),
end_offset: z.number().int().nonnegative(),
})).max(400).default([]),
});
```
**ref_access_event**
```ts
export const RefAccessEventSchema = z.object({
id: z.string(),
run_id: z.string(),
ts: z.string(),
agent_id: z.string(),
ref_id: z.string(),
section_ids: z.array(z.string()).max(20).default([]),
full: z.boolean().optional().default(false),
tokens_returned: z.number().int().nonnegative(),
turn_number: z.number().int().nonnegative(),
});
```
**compaction_event**
```ts
export const CompactionEventSchema = z.object({
id: z.string(),
run_id: z.string(),
ts: z.string(),
agent_id: z.string(),
seq: z.number().int().nonnegative(),
trigger: z.string().max(240),
pressure_pct: z.number().min(0).max(100),
tokens_saved: z.number().int().nonnegative(),
compacted_turn_ids: z.array(z.string()).max(500).default([]),
status: z.enum(['ok','insufficient','skipped']),
});
```
**taxonomy**
```ts
export const TaxonomyCategorySchema = z.object({
key: z.string().max(64),
display_name: z.string().max(120),
description: z.string().max(400).optional(),
enabled: z.boolean().default(true),
gate_behavior: z.enum(['none','warn','block_ship']).default('none'),
});
export const TaxonomyConfigSchema = z.object({
version: z.string(),
updated_at: z.string(),
categories: z.array(TaxonomyCategorySchema).max(200),
preset_overrides: z.array(z.object({
feedback_mode: FeedbackMode.optional(),
intensity_mode: IntensityMode.optional(),
output_profile_id: z.string().optional(),
enforced_categories: z.array(z.string().max(64)).max(50),
})).max(200).default([]),
});
```
### 10.2 EC service
- `apps/ec-service/src/server.ts`: implement command handlers + read endpoints.
- `apps/ec-service/src/fs-utils.ts`: ensure helpers for append-jsonl, atomic write, bounded read, archive move.
- `apps/ec-service/src/pending-index.ts`: add new pending item kinds (proposal candidate, harm candidate, taxonomy change).
- `apps/ec-service/src/event-bus.ts`: emit compact events for realtime UI (panel run started/finalized; harm candidate created; ledger updated).
Recommended new EC modules (keep minimal; no dependencies):
- `apps/ec-service/src/panels-store.ts` (append-only writes + bounds + retention)
- `apps/ec-service/src/panels-aggregator.ts` (nightly deterministic rollups)
- `apps/ec-service/src/taxonomy-store.ts` (editable taxonomy config + validation)
### 10.3 Q backend
- `apps/q-backend/src/server.ts`: proxy new endpoints; enforce remote write restrictions; route writes to EC commands.
- `apps/q-backend/src/event-streamer.ts`: stream panel/ledger events if used.
### 10.4 Q frontend
Use existing panel framework. Add new UI components under existing folders only:
- `apps/q-frontend/src/pages/LearningPage.tsx` (impact ledger + leaderboards + self-reports)
- `apps/q-frontend/src/pages/InboxPage.tsx` / `components/InboxList.tsx` (new pending item types)
- `apps/q-frontend/src/pages/SettingsPage.tsx` (taxonomy editor entry point; profile settings)
- `apps/q-frontend/src/components/*` (PanelBuilder additions; ReactionBar; BadgeRow; FeedbackModeEditor; OutputProfileSelector; TaxonomyEditor)
---
### 10.5 EC‑down degraded mode (reliability requirement) [ADDED]
If EC is unavailable:
- Panels and Forum threads may continue in **Explore/Freeform** mode only.
- Feedback Mode is forcibly OFF (no sidecar writes).
- “Convert to Proposal (Ship)” is disabled and Q shows a persistent banner: “EC offline — Ship lane disabled; no durable logging.”
- When EC reconnects, the system must not backfill missing sidecars automatically. Users may optionally re-run a panel in Ship mode.
## 11) Acceptance tests (minimum)
Add tests under `tests/` (vitest) to validate:
1) Freeform run finalizes with sidecar envelope (end-of-run only), without per-turn structured requirements.
2) Feedback budget enforcement: after N events, further feedback events are rejected/ignored with explicit status.
3) Revision link records substance_delta and links to feedback_event_ids.
4) Legal Distortion gate blocks Ship conversion without citations, and creates “needs citation” pending item.
5) Nightly aggregation runs within runtime bounds and produces ledger + leaderboards deterministically (0 LLM calls).
6) Retention/archival moves old logs to archive path (no delete).
7) Moderator profile stores output profile, feedback mode, budgets, and applies them to runs.
8) Overlay intervention triggers only at round boundary, logs overlay_intervention record, and sets `intervention_applied=true` in sidecar envelope.
9) Intervention runs are excluded from baseline leaderboards and appear in intervention_leaderboard.json.
10) Reference materialization assigns INLINE/REPOSITORY deterministically; panel_ref_read returns sections with 2-turn persistence and guards.
11) Quota + dedupe: references are pointer-by-default; `snapshot=true` dedupes by content_hash and enforces the per-run 200 MB budget.
12) Compaction triggers only at required threshold in auto, max 1 per agent per run; drillback returns original turns with 2-turn persistence and guards.
13) Model registry is unified with DOC2 path and includes DOC6 §13.3 fields.
---
## 12) “Codex shoddy implementation” guardrails
Codex MUST:
- Run a **Repo Reality Check** (list files, find existing panel/forum code paths) and map DOC6 responsibilities onto existing modules without renaming.
- Provide a `PATCH_REPORT.md` containing:
- file inventory and exact changes,
- schema diffs,
- commands/endpoints added,
- screenshots/log excerpts of budget enforcement + legal gate behavior,
- test commands + outputs.
If Codex cannot find a referenced path or an existing implementation hook, it must STOP and report the mismatch rather than inventing structure.
---
## 13) Model registry and context window tracking (unified with DOC2) [ADDED]
### 13.1 Purpose
Panels/Forums must manage references and compaction based on real model context limits. EC must not guess.
### 13.2 Single source of truth (DOC2 registry)
DOC6 MUST reuse the DOC2 “Model Capability Registry” as the **only** registry file.
**Canonical path (EC-owned):**
- `ELNOR_MEMORY/system/freshness/model_registry.json`
DOC6 forbids creating any second registry file such as `ELNOR_MEMORY/system/config/model_registry.json`.
### 13.3 Required additive fields (DOC2 schema extension)
Extend the DOC2 per-model record schema by adding these **optional** fields (additive only; backwards compatible):
- `provider` (enum: anthropic/openai/google/xai/meta/local/other)
- `context_window_tokens` (int)
- `max_output_tokens` (int, optional)
- `approx_chars_per_token` (number, default 4)
- `cost_per_1k_input` / `cost_per_1k_output` (numbers, optional)
- `source` (manual/auto_detected/self_learned/seeded)
- `confidence` (verified/estimated/fallback)
- `last_verified_at` (ISO timestamp)
- `notes` (string, optional)
**Defaults:**
- Auto-detect is OFF by default. Registry updates run only via explicit user action or explicit command.
### 13.4 Update mechanisms (priority order)
1) Manual user edits via Q (highest authority).
2) Runtime self-learning from provider errors (preferred “truth”): when a provider returns “context length exceeded” or max output errors, EC records a `self_learned` update with `confidence=verified`.
3) Manual “maintenance job” auto-detect (optional): allowlist-only, bounded, manual button only.
4) Seeded defaults (lowest trust; `confidence=fallback`).
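One way to read the priority order above is as a precedence rule when two updates for the same model disagree. The sketch below is illustrative only: `RegistryUpdate`, `SOURCE_RANK`, and `pickAuthoritative` are hypothetical names, and breaking ties by the newer `last_verified_at` is an assumption that the spec does not state.

```ts
// Illustrative sketch; names and the tie-break rule are assumptions.
type Source = 'manual' | 'self_learned' | 'auto_detected' | 'seeded';

// Higher number = higher authority, following the order of the list above.
const SOURCE_RANK: Record<Source, number> = {
  manual: 3,
  self_learned: 2,
  auto_detected: 1,
  seeded: 0,
};

interface RegistryUpdate {
  source: Source;
  context_window_tokens: number;
  last_verified_at: string; // ISO timestamp
}

// Decide which of two conflicting values should be kept in the registry.
function pickAuthoritative(current: RegistryUpdate, incoming: RegistryUpdate): RegistryUpdate {
  const diff = SOURCE_RANK[incoming.source] - SOURCE_RANK[current.source];
  if (diff !== 0) return diff > 0 ? incoming : current;
  // Same source class: assume the more recently verified value wins.
  return Date.parse(incoming.last_verified_at) >= Date.parse(current.last_verified_at)
    ? incoming
    : current;
}
```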
### 13.5 Context pressure estimation (deterministic)
EC maintains a per-agent `AgentContextEstimate` per run:
- token estimates derived deterministically (bytes / approx_chars_per_token + known overheads)
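- `pressure_pct = used_tokens / context_window_tokens * 100` (a percentage in the range 0–100, matching the `>= 50%` and `>= 75%` thresholds in §14 and §15)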
Pressure is displayed in Q (see §8 additions) and drives §14 and §15.
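A minimal sketch of the estimate, assuming the record shape below. `estimateTokens` and `estimateContext` are hypothetical helper names, and clamping at 100 is an assumption made so the value stays within the 0–100 range used by `compaction_event.pressure_pct`.

```ts
// Illustrative sketch; helper names and the clamp are assumptions.
interface ModelRecord {
  context_window_tokens: number;
  approx_chars_per_token?: number; // registry default is 4
}

interface AgentContextEstimate {
  used_tokens: number;
  pressure_pct: number; // 0..100
}

// Deterministic token estimate: UTF-8 byte length divided by the per-model ratio.
function estimateTokens(text: string, approxCharsPerToken: number = 4): number {
  const bytes = new TextEncoder().encode(text).length;
  return Math.ceil(bytes / approxCharsPerToken);
}

// Sum all injected parts plus known overheads, then express as a percentage.
function estimateContext(
  model: ModelRecord,
  parts: string[],
  overheadTokens: number,
): AgentContextEstimate {
  const ratio = model.approx_chars_per_token ?? 4;
  const used = parts.reduce((sum, p) => sum + estimateTokens(p, ratio), overheadTokens);
  const pct = Math.min(100, (used * 100) / model.context_window_tokens);
  return { used_tokens: used, pressure_pct: pct };
}
```

Because the inputs are only byte lengths and registry values, the same run state always yields the same pressure figure, with no model call involved.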
---
## 14) Panel reference management (adaptive materialization; inline-first) [ADDED]
### 14.1 Purpose
Panel runs often involve documents/specs/code/briefs. DOC6 adds an adaptive system:
- Keep small artifacts INLINE for best reasoning quality.
- Move large artifacts to a run-scoped Reference Repository (manifest + section retrieval) when needed.
### 14.2 Inline-first, smallest-window governs
All agents share the same materialization strategy per artifact. The roster model with the smallest context window governs the decision (as in the proposal). INLINE is always preferred when the artifact fits.
### 14.3 Strategies
Each reference artifact is assigned:
- `INLINE`: full text injected each turn.
- `REPOSITORY`: manifest injected; sections retrieved on-demand.
### 14.4 Decision algorithm (deterministic)
Compute smallest inline budget across roster:
- `available = context_window_tokens - system_prompt_est - tool_overhead_est`
- `inline_budget = available * reference_inline_budget_pct / 100`
- `smallest_inline_budget = min(inline_budget across agents)`
Sort artifacts by token_estimate ascending; assign INLINE until budget filled, remainder REPOSITORY.
Defaults:
- `reference_inline_budget_pct = 40` (profile-editable, 10–80)
- Max references per run: 20
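The decision algorithm can be sketched as follows. This is an illustration under stated assumptions: the interface and function names are hypothetical, budgets are floored to whole tokens, and ties in `token_estimate` are broken by `ref_id` so the ordering stays deterministic.

```ts
// Illustrative sketch; names, flooring, and the tie-break are assumptions.
type Strategy = 'INLINE' | 'REPOSITORY';

interface RosterModel {
  context_window_tokens: number;
  system_prompt_est: number;
  tool_overhead_est: number;
}

interface RefArtifact {
  ref_id: string;
  token_estimate: number;
}

// Smallest inline budget across the roster (reference_inline_budget_pct defaults to 40).
function smallestInlineBudget(roster: RosterModel[], inlineBudgetPct: number = 40): number {
  const budgets = roster.map((m) => {
    const available = Math.max(0, m.context_window_tokens - m.system_prompt_est - m.tool_overhead_est);
    return Math.floor((available * inlineBudgetPct) / 100);
  });
  return budgets.length > 0 ? Math.min(...budgets) : 0;
}

// Smallest artifacts first; INLINE until the budget is filled, the rest REPOSITORY.
function assignMaterialization(refs: RefArtifact[], budget: number): Map<string, Strategy> {
  const sorted = [...refs].sort(
    (a, b) =>
      a.token_estimate - b.token_estimate ||
      (a.ref_id < b.ref_id ? -1 : a.ref_id > b.ref_id ? 1 : 0),
  );
  const out = new Map<string, Strategy>();
  let remaining = budget;
  for (const ref of sorted) {
    // Sizes are non-decreasing, so once one artifact fails to fit, all later ones do too.
    if (ref.token_estimate <= remaining) {
      out.set(ref.ref_id, 'INLINE');
      remaining -= ref.token_estimate;
    } else {
      out.set(ref.ref_id, 'REPOSITORY');
    }
  }
  return out;
}
```

For example, a roster with a 200,000-token model and an 8,000-token model (each with 2,000 tokens of prompt and tool overhead) yields a smallest inline budget of 2,400 tokens, so only artifacts whose combined estimate stays within 2,400 tokens are inlined for every agent.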
### 14.5 Registration and overrides
References may be:
- auto-registered from attachments at run start (default ON), or
- added mid-run via `panel_reference_add`.
Overrides:
- per artifact: `materialization = auto | force_inline | force_repository`
- per profile: `reference_mode_override = auto | force_inline | force_repository`
Per-artifact overrides win.
### 14.6 Repository storage model: quota + dedupe + pointer-by-default
To prevent disk bloat (especially legal panels), the default is **pointer + hash**, not copying.
- Per-run snapshot storage budget default: **200 MB** (profile-editable).
- Dedupe by `content_hash`: stored once globally under:
- `ELNOR_MEMORY/system/panels/reference_store/<content_hash>.content`
- `ELNOR_MEMORY/system/panels/reference_store/<content_hash>.index.json`
Run folder stores:
- `ELNOR_MEMORY/system/panels/references/<run_id>/manifest.json`
- `ELNOR_MEMORY/system/panels/references/<run_id>/access_log.jsonl`
Each reference has `snapshot: boolean` (default false). If true, snapshot into the global store (deduped). Only new bytes count toward the run budget.
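The pointer-by-default and dedupe accounting might be sketched like this. It is illustrative only: `createRunSnapshotBudget` is a hypothetical helper, the global store is modeled as an in-memory set of hashes, 200 MB is taken as 200 × 1024 × 1024 bytes, and returning `over_budget` (rather than some other fallback) is an assumption because this section does not define the over-budget behavior.

```ts
// Illustrative sketch; names, the byte interpretation of 200 MB,
// and the over-budget outcome are assumptions.
type SnapshotOutcome = 'pointer' | 'deduped' | 'stored' | 'over_budget';

interface RefToRegister {
  content_hash: string;
  size_bytes: number;
  snapshot: boolean; // default false in the schema
}

function createRunSnapshotBudget(
  globalStore: Set<string>, // hashes already present in reference_store/
  budgetBytes: number = 200 * 1024 * 1024,
) {
  let usedBytes = 0;
  return {
    register(ref: RefToRegister): SnapshotOutcome {
      if (!ref.snapshot) return 'pointer'; // pointer + hash only, nothing copied
      if (globalStore.has(ref.content_hash)) return 'deduped'; // no new bytes
      if (usedBytes + ref.size_bytes > budgetBytes) return 'over_budget';
      globalStore.add(ref.content_hash);
      usedBytes += ref.size_bytes; // only new bytes count toward the run budget
      return 'stored';
    },
    usedBytes: (): number => usedBytes,
  };
}
```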
### 14.7 Manifest and section retrieval
When REPOSITORY, inject a compact manifest (detail level profile-editable: minimal/standard/detailed) and provide tools:
- `panel_ref_read(ref_id, section_ids[], full=false)`
Rules:
- Retrieved sections persist for current + next turn (2-turn window).
- `full=true` is rejected if the full artifact would consume more than 50% of the agent's remaining context.
- EC logs each retrieval (tokens_returned, turn_number) to access_log.jsonl and as a system event.
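The two retrieval rules can be expressed as small pure checks. This is a sketch under assumptions: the function names are hypothetical, and "remaining context" is taken to mean the agent's context window minus its current estimated usage.

```ts
// Illustrative sketch; function names and the definition of
// "remaining context" are assumptions.
interface FullReadDecision {
  allowed: boolean;
  reason?: string;
}

// Guard for panel_ref_read(..., full=true): reject when the whole artifact
// would take more than half of what is left in the agent's window.
function checkFullRead(artifactTokens: number, remainingContextTokens: number): FullReadDecision {
  if (artifactTokens > remainingContextTokens * 0.5) {
    return { allowed: false, reason: 'full read exceeds 50% of remaining context; request sections instead' };
  }
  return { allowed: true };
}

// 2-turn window: a section retrieved on turn N is injected on turns N and N+1 only.
function isStillInjected(retrievedAtTurn: number, currentTurn: number): boolean {
  const age = currentTurn - retrievedAtTurn;
  return age >= 0 && age < 2;
}
```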
### 14.8 Structural index generation (lazy + proactive; 0 LLM)
- Lazy: generate index when an artifact transitions to REPOSITORY.
- Proactive: when smallest-agent `pressure_pct >= 50%`, EC may pre-generate indexes for likely-large inline artifacts to avoid a latency spike.
Indexing is heading/paragraph parsing only (no LLM).
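A heading-only indexer for Markdown artifacts might look like the sketch below, producing entries shaped like `structural_index` in `PanelReferenceSchema`. It is illustrative only: `buildStructuralIndex` and the `s1`, `s2`, ... id scheme are assumptions, offsets are character offsets, text before the first heading is not indexed, and `#` lines inside fenced code blocks are not excluded.

```ts
// Illustrative sketch; the function name, id scheme, and use of
// character offsets are assumptions. No LLM is involved.
interface IndexSection {
  section_id: string;
  title: string;
  depth: number;
  token_estimate: number;
  start_offset: number;
  end_offset: number;
}

function buildStructuralIndex(text: string, approxCharsPerToken: number = 4): IndexSection[] {
  // ATX headings: 1-6 '#' characters, whitespace, then the title.
  const headingRe = /^(#{1,6})[ \t]+(.+?)[ \t\r]*$/gm;
  const heads: { depth: number; title: string; start: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = headingRe.exec(text)) !== null) {
    heads.push({ depth: m[1].length, title: m[2], start: m.index });
  }
  return heads.slice(0, 400).map((h, i) => {
    // A section runs from its heading to the next heading (or end of text).
    const end = i + 1 < heads.length ? heads[i + 1].start : text.length;
    return {
      section_id: `s${i + 1}`,
      title: h.title.slice(0, 200),
      depth: h.depth,
      token_estimate: Math.ceil((end - h.start) / approxCharsPerToken),
      start_offset: h.start,
      end_offset: end,
    };
  });
}
```

The 400-section and 200-character limits mirror the schema bounds, so the output can be stored without further trimming.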
### 14.9 OpenClaw tool exposure (DOC4 integration)
When Hands/OpenClaw are active, `panel_ref_read` must be exposed as an OpenClaw tool/skill so the active agent loop can call it during a turn.
---
## 15) Discussion compaction (off by default; per-agent views) [ADDED]
### 15.1 Purpose
Long Ship/High-Stakes runs accumulate discussion turns that exhaust context. Compaction summarizes resolved discussion only. It never compacts references.
### 15.2 Mode and initial release trigger rule
Moderator Profile: `compaction_mode = off | auto | aggressive` (default OFF).
**Initial release rule (safety):** In `auto`, compaction triggers only at the required threshold (`pressure_pct >= 75%`). Eligible threshold prompts (e.g., 60% with user confirm) may be added in a later release.
Frequency cap:
- Default: max **1** compaction per agent per run.
- Profile-editable up to 3 for High Stakes only.
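The initial-release trigger and frequency cap can be sketched as two pure functions. This is a hedged illustration: the function names are hypothetical, only the `off` and `auto` modes are covered because this section does not define thresholds for `aggressive`, and treating 1 as the minimum cap is an assumption.

```ts
// Illustrative sketch; names are assumptions and `aggressive` is
// deliberately left out because its thresholds are not defined here.
const REQUIRED_THRESHOLD_PCT = 75;

// Default cap is 1; profiles may raise it to at most 3, for High Stakes runs only.
function effectiveCompactionCap(requestedCap: number, isHighStakes: boolean): number {
  if (!isHighStakes) return 1;
  return Math.min(3, Math.max(1, Math.floor(requestedCap)));
}

// Decide whether a compaction may run for one agent at this point in the run.
function shouldAutoCompact(
  mode: 'off' | 'auto',
  pressurePct: number,
  compactionsSoFar: number,
  cap: number,
): boolean {
  if (mode === 'off') return false;
  if (compactionsSoFar >= cap) return false;
  return pressurePct >= REQUIRED_THRESHOLD_PCT;
}
```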
### 15.3 Compaction view structure
Compaction produces a structured injected view:
- Decisions ledger (verbatim; never re-summarized)
- Open items (verbatim)
- Resolved discussion summary (lossy)
- Turn anchors for drillback
### 15.4 Drillback tool
- `panel_turn_drillback(run_id, turn_ids[])`
Rules:
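- Returned turns persist for the current + next turn (2-turn window).
- Guard: if the requested turns would exceed 30% of the agent's remaining context, return a partial result and instruct the agent to request fewer turns.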
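The drillback guard might be implemented along these lines. The sketch is illustrative: `drillback`, `StoredTurn`, and the message text are hypothetical, and returning turns in request order until the 30% budget is reached is one possible reading of "return partial".

```ts
// Illustrative sketch; names, message text, and the "fill in request
// order" policy are assumptions.
interface StoredTurn {
  turn_id: string;
  tokens: number;
}

interface DrillbackResult {
  turns: StoredTurn[];
  partial: boolean;
  message?: string;
}

function drillback(requested: StoredTurn[], remainingContextTokens: number): DrillbackResult {
  const budget = Math.floor(remainingContextTokens * 0.3);
  const turns: StoredTurn[] = [];
  let used = 0;
  for (const turn of requested) {
    if (used + turn.tokens > budget) {
      // Stop at the first turn that would break the guard and tell the agent why.
      return { turns, partial: true, message: 'Partial result: request fewer turns.' };
    }
    turns.push(turn);
    used += turn.tokens;
  }
  return { turns, partial: false };
}
```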
### 15.5 Compaction artifacts and events (system operation)
Compaction is NOT an Overlay Intervention. It is logged as `compaction_event` and stored as derived views under:
- `ELNOR_MEMORY/system/panels/compaction/<run_id>/<agent_id>_<seq>.json`
- meta: trigger, pressure_pct, tokens_saved, compacted_turn_ids, status (ok/insufficient/skipped)
### 15.6 Compaction drift detection (taxonomy)
Seed taxonomy category `compaction_drift` (enabled, gate_behavior none).
Nightly deterministic check (0 LLM):
- if compaction occurred AND a post-compaction Ship proposal contradicts a pre-compaction decision ledger entry (string/hash compare), flag `compaction_drift` for review.
### 15.7 Cost accounting line items
Add cost line items (derived from event logs; no hot-path LLM):
- reference retrieval overhead
- compaction overhead
Q must surface these in existing cost views.
**End of DOC6 v1.11.8.1.**