ELNOR REPO READER TEXT MIRROR
Original path: Current Specs/DOC8/DOC8_SELF_LEARNING_PREDICTION_REGRESSION_FRICTION_v1_11_4.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z

---

# DOC8: Self-Learning Engine (Prediction, Regressions, Friction) v1.11.4

**ELNOR Suite v1.11** | February 27, 2026
**Status:** Draft → ready for Claude Code implementation
**Audience:** Claude Code / Codex implementers (EC + Q)

## Spec pinning (do not skip)

The following specs were used as the reference set for this DOC8 revision. If any hash does **not** match your repo, **STOP** and reconcile before implementing.

- `ELNOR_CORE_SPEC_v1_11_2_CANONICAL.md`: `a1a23894dae7bbe93af868d02204896d887b2c2d1afd7b63a66a1c4468c81571`
- `Q_DASHBOARD_SPEC_v1_11_2_CANONICAL.md`: `f24788fbade9788a30df7e1d5546d14ac65e9464abd8cef84a9e4b4112405fa3`
- `DOC1_MEMORY_RESILIENCE_v1_11_3_FINAL.md`: `03f1c5ce75292e58019421763812dfd1f6fdd5942b202a3f07a8768063de2496`
- `DOC2_FRESHNESS_PERSONAL_STATE_v1_11_4.md`: `240124c9175e7f23848b2f754b78696ff18e8a0b62e1ae94605b568c05a6876b`
- `DOC6_PANELS_FORUMS_SELF_IMPROVEMENT_v1_11_8_0.md`: `1d62d1050ad7ba3f95399ef3ff2c9122a2f6144096921ffcfa6ffca4d0b07c7e`
- `DOC7_CONTEXT_BUCKETS_FILES_v1_11_8_DRAFT.md`: `4b254e6f674654b63baa51f04d14843b099b81b9b8ae273a249fccefede0d94f`
- `DOC8_SELF_LEARNING_PREDICTION_REGRESSION_FRICTION_v1_11_3.md`: `4cccdc0c0620a3073aab356f918ff72088877b17e28b27773dd7f81f8740a1ec`

**Supersedes:** `DOC8_SELF_LEARNING_PREDICTION_REGRESSION_FRICTION_v1_11_3.md` (pinned above). Implement **this** DOC8 as the single authoritative DOC8 moving forward.

---

## 0) Intent in plain language

DOC8 makes ELNOR *measurably* self-healing without turning it into an opaque “auto-pilot”. It does three things:

1. **Friction Engine:** Automatically records real-world “stuff went wrong / got annoying / got slow” events across Q, EC, OpenClaw tool calls, context pressure, compaction, approvals, offline modes, and nightly jobs.
2. **Closed-loop healing:** Repeated friction produces *actionable* prevention candidates, repair suggestions, and optional forum/panel escalation. When a fix works, DOC8 emits an explicit positive signal (`prevented_friction`) so the system learns “this mitigation helped”.
3. **Low-bloat observability:** A nightly derived index (`friction_state.json`) + rollups power Q dashboards **without** rescanning huge JSONL logs or inventing new LLM calls.

It is intentionally agnostic to domain (legal/coding/research/etc), while still allowing **editable categories** (including “legal_distortion”) through the taxonomy mechanisms already introduced in DOC6.

---

## 1) Non-negotiables (re-stated)

1. **Local-first:** Everything stored under `ELNOR_MEMORY/`.
2. **Single-writer:** **EC is the only durable writer.** Q (frontend/backend) must never directly write files under `ELNOR_MEMORY/`.
3. **Append-only logs:** Events + actions are JSONL append-only. Derived state is written atomically.
4. **No hot-path LLM calls:** Recording friction and maintaining state must not require LLM calls.
5. **No silent steering:** Any behavior-adapting context injection must be **visible** in the prompt and token-capped.
6. **Deterministic + bounded:** Nightly jobs have strict runtime/work limits and deterministic outputs.
7. **Degraded mode is honest:** If EC is down, Q shows “degraded” and does not pretend learning is happening.

---

## 2) Key changes from DOC8 v1.11.3

### 2.1 Adopted fixes (from red-team + Claude review)

- **Fingerprint redesign:** split into `fingerprint_structural` (grouping key) and `fingerprint_variant` (debug discriminator).
- **Manual merge escape hatch:** Q can merge fingerprints via append-only actions; merges are applied **before** aggregation.
- **Nightly scaling fix:** add `friction_state.json` as the fast-read derived index + cursor offsets, so nightly never rescans entire JSONL files.
- **`prevented_friction` idempotency:** new `fix_epoch_id` to emit once per mitigation/fix epoch.
- **Per-fingerprint forum threads:** escalation defaults to one thread per fingerprint; global thread is fallback only.
- **Prediction calibration deferred:** keep raw prediction logs, but **remove** calibration metrics from rollups for v1.
- **File topology simplification:** unify annotations/escalations/merges under a single `friction_actions.jsonl`.
- **Targeted friction-aware context injection:** only inject relevant, recent, high-severity cautions; cap at 3 notes.

### 2.2 Added “actually self-healing” power without bloat

- **Computed severity:** derived nightly from recurrence + escalation + emitted severities.
- **Root-cause clustering:** deterministic grouping by shared `channel/stage/tool_name`.
- **Canary mode for prevention rules:** rules graduate `canary → confirmed` or are marked `ineffective`.
- **System health metrics + anomaly detection:** simple math, no LLM, early warning signals.
- **Nightly job self-monitoring:** the monitor monitors itself.

---

## 3) Data model & durable artifacts (EC)

All paths are relative to repo root. All durable data lives under `ELNOR_MEMORY/`.

### 3.1 Canonical paths (add to `packages/contracts/src/canonical.ts`)

Add these to `CANONICAL_PATHS` (and export) if not already present:

- `learningFrictionEvents`: `ELNOR_MEMORY/system/learning/friction_events.jsonl`
- `learningFrictionActions`: `ELNOR_MEMORY/system/learning/friction_actions.jsonl`
- `learningFrictionState`: `ELNOR_MEMORY/system/learning/friction_state.json`
- `learningRollup`: `ELNOR_MEMORY/system/learning/learning_rollup.json`
- `learningRollupHistoryDir`: `ELNOR_MEMORY/system/learning/rollups/`
- `learningSystemHealth`: `ELNOR_MEMORY/system/learning/system_health.jsonl`
- `learningPredictions`: `ELNOR_MEMORY/system/learning/predictions.jsonl`
- `learningRegressions`: `ELNOR_MEMORY/system/learning/regressions.jsonl`
- `learningSignals`: `ELNOR_MEMORY/system/learning/learning_signals.jsonl`
- `learningControls`: `ELNOR_MEMORY/system/learning/learning_controls.json`

> **Note:** `learning_controls.json` is intentionally minimal: it stores only user-facing toggles that change behavior. All other thresholds remain code constants tagged `DOC8-TUNE`.

### 3.2 Retention & bloat control

- `friction_events.jsonl`: retain **180 days**, then archive (DOC1 retention style).
- `friction_actions.jsonl`: retain **180 days**, then archive.
- `friction_state.json`: always current (atomic overwrite).
- `learning_rollup.json`: always current (atomic overwrite).
- `rollups/`: keep last **7** daily rollups (delete older).
- `system_health.jsonl`: retain **365 days** (1 row/day).
- `predictions.jsonl`: retain **180 days** (small).
- `regressions.jsonl`: retain **365 days** (small).
- All pruning/archival must be EC-only and should reuse DOC1 “preview → archive” semantics.

---

## 4) Schemas & contracts (packages/contracts)

All schemas below are **normative**. Implementers must not invent fields or rename them.

### 4.1 Shared enums

- `LearningChannel` (string enum):
  - `ec_service`
  - `q_backend`
  - `q_frontend`
  - `openclaw`
  - `nightly`
  - `panels`
  - `forums`
- `FrictionType` (string enum; additive allowed, but do not rename):
  - `tool_failure`
  - `tool_timeout`
  - `offline_mode`
  - `permission_error`
  - `budget_exhausted`
  - `context_pressure`
  - `compaction_event`
  - `memory_read_failure`
  - `memory_search_failure`
  - `validation_error`
  - `slow_path`
  - `ux_annoyance` (manual/default)
  - `quality_degradation` (manual/default)
  - `rollup_error` (nightly self-monitoring)
- `FrictionSeverity` (string enum): `blocker | major | minor`
- `FrictionStatus` (string enum):
  - `open`
  - `mitigated`
  - `fixed`
  - `ignored`
  - `stale`
- `PreventionRuleState` (string enum):
  - `candidate`
  - `canary`
  - `confirmed`
  - `ineffective`

### 4.2 Fingerprint strategy (normative)

**Goal:** stable grouping + debuggability without relying on fragile full-message hashes.

- `fingerprint_structural` = `sha256(join("|", [ channel, friction_type, normalize_stage(stage), tool_name || "", error_code || "", http_status || "" ]))`
- `fingerprint_variant` = `sha256(join("|", [ fingerprint_structural, message_norm_prefix_60 || "" ]))`

Where:

- `normalize_stage()` removes volatile ids (`UUID`, long hex, timestamps, numeric ids ≥ 5 digits) and lowercases.
- `message_norm_prefix_60` is a normalized message string truncated to first 60 chars (after stripping volatile ids).
- The **structural** key is the default aggregation key.
- Variants are used for debugging, sampling, and display; they must not fragment aggregation.

### 4.3 FrictionEvent schema (append-only; immutable facts)

Create `FrictionEventSchema` in `packages/contracts/src/schemas.ts`:

Required:

- `event_id: string` (uuid)
- `created_at: string` (ISO)
- `channel: LearningChannel`
- `friction_type: FrictionType`
- `severity: FrictionSeverity`
- `stage: string` (short, stable; e.g. `fetchEc:/api/commands`, `context_assembly`, `openclaw:tool:browser_navigate`)
- `fingerprint_structural: string` (sha256 hex)
- `fingerprint_variant: string` (sha256 hex)

Optional (but strongly recommended for auto-detected events):

- `tool_name?: string`
- `error_code?: string`
- `http_status?: number`
- `message_raw?: string` (cap 2,000 chars)
- `message_norm_prefix_60?: string` (cap 60 chars)
- `run_id?: string` (task/panel run id if present)
- `task_id?: string`
- `panel_run_id?: string`
- `forum_thread_id?: string`
- `conversation_id?: string`
- `agent_id?: string`
- `model_id?: string`
- `context_pressure_pct?: number` (0–100; only for `context_pressure`)
- `meta?: Record` (small; cap serialized bytes)

### 4.4 FrictionAction schema (append-only; mutable layer via events)

All friction “edits” are append-only actions in `friction_actions.jsonl`.

Create `FrictionActionSchema` as a discriminated union on `action_type`.

Common required fields for all actions:

- `action_id: string` (uuid)
- `created_at: string` (ISO)
- `fingerprint_structural: string`
- `action_type: string` (see below)
- `actor: "user" | "system"`
- `actor_id?: string` (e.g. session id / user id; optional)

Action variants:

1) `annotate_status`
   - `status: FrictionStatus`
   - `note?: string` (cap 800 chars)
   - `fix_epoch_id?: string`
     - **Required** when status is set to `mitigated` or `fixed`.
     - EC generates if not provided.
2) `add_note`
   - `note: string` (cap 800 chars)
3) `merge_fingerprint`
   - `merge_from: string` (fingerprint_structural)
   - `merge_into: string` (fingerprint_structural)
   - `note?: string`

   > Merge actions are stored once per merge decision. Nightly aggregation loads merge map **first** and applies it before counting.
4) `burst_suppressed`
   - `fingerprint_variant: string`
   - `window_start_at: string` (ISO)
   - `window_end_at: string` (ISO)
   - `suppressed_count: number` (>=1)
5) `escalate_forum`
   - `thread_id?: string`
   - `post_id?: string`
   - `post_excerpt?: string` (cap 600 chars)
   - `note?: string`
6) `link_repair_work`
   - `work_type: "task" | "panel" | "external"`
   - `work_id: string` (task_id / panel_run_id / external id)
   - `note?: string`
7) `prevention_rule_update`
   - `rule_id: string` (uuid)
   - `rule_state: PreventionRuleState`
   - `rule_summary: string` (cap 240 chars)
   - `mitigation_steps: string[]` (max 6, each cap 160 chars)
   - `code_hint?: string` (cap 240 chars)
   - `canary_until?: string` (ISO; required when rule_state=`canary`)
   - `linked_regression_id?: string`
8) `auto_mark_stale`
   - `note?: string`

### 4.5 FrictionState derived schema (fast read; atomic)

`friction_state.json` is **derived** and written atomically by EC (nightly + incremental updates).

Create `FrictionStateFileSchema`:

Top-level required:

- `generated_at: string` (ISO)
- `cursor: { events_byte_offset: number, actions_byte_offset: number }`
- `window_days: number` (default 14)
- `entries: FrictionStateEntry[]`
- `clusters: FrictionCluster[]`
- `anomalies: SystemHealthAnomaly[]`

`FrictionStateEntry` required:

- `fingerprint_structural: string`
- `status: FrictionStatus`
- `computed_severity: FrictionSeverity`
- `channel: LearningChannel`
- `friction_type: FrictionType`
- `stage: string`
- `tool_name?: string`
- `error_code?: string`
- `first_seen_at: string`
- `last_seen_at: string`
- `count_total: number`
- `count_window: number` (window_days)
- `top_variants: Array<{ fingerprint_variant: string, count: number, message_prefix?: string }>` (max 5)
- `latest_note?: string`
- `merged_into?: string` (if this fingerprint_structural has been merged)

Optional, for loop closure:

- `last_escalation?: { thread_id?: string, post_id?: string, last_post_at?: string }`
- `linked_repairs?: Array<{ work_type: "task" | "panel" | "external", work_id: string, note?: string }>` (max 6)
- `prevention_rule?: { rule_id: string, rule_state: PreventionRuleState, rule_summary: string, canary_until?: string }`
- `fix_epoch_id_current?: string`
- `prevented_friction_emitted_epochs?: string[]` (max 12; for idempotency)

`FrictionCluster` required:

- `cluster_id: string` (sha256)
- `key: string` (e.g. `q_backend|fetchEc:/api/commands|tool=fetchEc`)
- `fingerprints: string[]` (fingerprint_structural ids; max 50)
- `summary: string` (cap 240 chars)

### 4.6 Learning signals, regressions, predictions, system health (minimal but real)

DOC8 relies on the existing “learning signals” pipeline (DOC6/DOC1). If those schemas already exist in `packages/contracts/src/schemas.ts`, extend them to match the required fields below. If they do not exist, create them exactly as specified.

#### 4.6.1 LearningSignal schema (`learning_signals.jsonl`)

Append-only. Used for positive/negative reinforcement and Impact Ledger ingestion.

Required:

- `signal_id: string` (uuid)
- `created_at: string` (ISO)
- `event_type: string` (enum; additive allowed, do not rename existing):
  - `friction_detected`
  - `prevented_friction`
  - `regression_triggered`
  - `canary_confirmed`
  - `canary_ineffective`
  - `health_anomaly`
- `severity?: FrictionSeverity` (when applicable)
- `fingerprint_structural?: string`
- `fix_epoch_id?: string`
- `rule_id?: string`
- `meta?: Record` (small)

**Idempotency requirement:** `prevented_friction` must be emitted at most once per `(fingerprint_structural, fix_epoch_id)`.

#### 4.6.2 Regression entry schema (`regressions.jsonl`)

Append-only. A regression is “this friction pattern is recurring enough that we need a prevention rule or repair work”.

Required:

- `regression_id: string` (uuid)
- `created_at: string` (ISO)
- `fingerprint_structural: string`
- `severity: FrictionSeverity`
- `summary: string` (cap 240 chars)
- `detection_source: "nightly_threshold" | "user_report" | "forum_feedback" | "panel_conclusion"`
- `status: "open" | "mitigated" | "fixed" | "ignored"`

Optional:

- `linked_rule_id?: string`
- `linked_work_id?: string`
- `category?: string` (optional taxonomy tag, e.g. `legal_distortion`)

#### 4.6.3 Prediction events (`predictions.jsonl`): raw logging only in v1

Prediction calibration dashboards are explicitly deferred, but raw logging is kept so you can build calibration later with real data.

Use a single append-only event schema with `event_type`:

- `prediction_create`
- `prediction_finalize`

Common required:

- `prediction_id: string` (uuid)
- `created_at: string` (ISO)
- `event_type: "prediction_create" | "prediction_finalize"`
- `run_id: string`
- `run_type: "task" | "panel" | "forum"`

Create-only fields (required on `prediction_create`):

- `predicted_duration_min: number`
- `predicted_success_prob: number` (0–1)

Optional:

- `predicted_cost_usd?: number`
- `key_risks?: string[]` (max 6, each cap 120 chars)
- `model_id?: string`

Finalize-only fields (required on `prediction_finalize`):

- `actual_duration_min: number`
- `actual_success: boolean`

Optional:

- `actual_cost_usd?: number`
- `notes?: string` (cap 240 chars)

#### 4.6.4 System health rows (`system_health.jsonl`)

Append-only daily summary.

Required:

- `row_id: string` (uuid)
- `created_at: string` (ISO)
- `date: string` (YYYY-MM-DD)
- `rollup_duration_ms: number`
- `new_events_processed: number`
- `open_major_blocker_count: number`
- `anomalies: Array<{ metric: string, value: number, mean7: number, std7: number, zscore: number, message: string }>` (max 12)

---

### 4.7 Learning rollup additions (Q-facing; minimal schema)

If `LearningRollupSchema` already exists, add (or ensure it includes) a `friction` object:

```
friction: {
  generated_at: ISOString,
  window_days: number,
  totals: {
    new_events_processed: number,
    open_total: number,
    open_major_blocker: number,
    stale_total: number
  },
  top: Array<{
    fingerprint_structural: string,
    status: FrictionStatus,
    computed_severity: FrictionSeverity,
    count_window: number,
    last_seen_at: ISOString,
    channel: LearningChannel,
    friction_type: FrictionType,
    stage: string,
    tool_name?: string
  }>, // cap 50
  clusters: FrictionCluster[], // cap 50
  anomalies: SystemHealthAnomaly[] // cap 12
}
```

`learning_rollup.json` is for quick dashboards; `friction_state.json` is for the full table/drawer.

---

## 5) EC behavior & algorithms

### 5.1 Controls (minimal settings)

`ELNOR_MEMORY/system/learning/learning_controls.json` schema:

- `auto_escalate_enabled: boolean` (default true)
- `context_cautions_enabled: boolean` (default true)

No other tunables are user-exposed in v1. All thresholds are code constants marked `DOC8-TUNE`.

### 5.2 Automatic friction detection is primary

Automatic detection must be on by default. User “Mark friction” is an optional override for:

- subtle UX annoyances
- “model got weird / misunderstood prompt”
- quality degradation without a hard error
- external services that don’t throw structured errors

### 5.3 Where to emit friction (required hook table)

Implementers must wire friction emission at these minimum points (search-based insertion; do not invent new architecture).

| Detection point | Channel | File (must exist) | What to hook (search hints) | friction_type |
|---|---|---|---|---|
| EC command processing errors | ec_service | `apps/ec-service/src/server.ts` | `app.post("/api/commands"` handler; catch blocks; validation failures | `validation_error` / `tool_failure` |
| EC file IO failures | ec_service | `apps/ec-service/src/fs-utils.ts` | errors thrown by `appendJsonl`, `writeJsonAtomic` | `memory_read_failure` / `tool_failure` |
| EC event-bus append failures | ec_service | `apps/ec-service/src/event-bus.ts` | failure of `appendJsonl(this.eventBusPath, ...)` | `tool_failure` |
| Q backend → EC fetch failures | q_backend | `apps/q-backend/src/server.ts` | search `fetchEc(` and `.catch` / non-2xx | `offline_mode` / `tool_timeout` / `tool_failure` |
| Q backend auth/permission denials | q_backend | `apps/q-backend/src/token-manager.ts` | invalid session/token / denied remote write | `permission_error` |
| Q frontend API errors | q_frontend | `apps/q-frontend/src/api.ts` | fetch wrapper error paths | `tool_failure` |
| Context pressure crossing threshold | ec_service | (create) `apps/ec-service/src/learning/context-pressure.ts` | called from Context Assembler (DOC1/DOC7 integration point) | `context_pressure` |
| Compaction invoked / failed | ec_service | (existing compaction module per DOC1) | compaction start/end + failures | `compaction_event` |
| Nightly job overflow/parse errors | nightly | (create) `apps/ec-service/src/learning/nightly-rollup.ts` | rollup runtime > warn threshold; parse error; schema validation fail | `rollup_error` |
| Budget exhausted events | panels | (DOC6 panel runner module) | when feedback budget exhausted; append event | `budget_exhausted` |

> If you cannot find the expected insertion point, you must record that in `PATCH_REPORT.md` and add a minimal grep output proving what exists.

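
Every hook in the table has to hand the engine the fields that feed the two fingerprints from 4.2, so a small sketch of that computation may help when wiring hooks. This is an illustration only, not the normative implementation: the `FingerprintInput` type name, the exact regexes for "volatile ids" (including the 16-character cutoff for "long hex"), and the lowercasing and whitespace collapsing of the message are assumptions that the spec does not pin down.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape; field names follow FrictionEventSchema (4.3).
interface FingerprintInput {
  channel: string;
  friction_type: string;
  stage: string;
  tool_name?: string;
  error_code?: string;
  http_status?: number;
  message_raw?: string;
}

const sha256 = (s: string): string =>
  createHash("sha256").update(s, "utf8").digest("hex");

// ASSUMPTION: one reading of "UUID, long hex, timestamps, numeric ids >= 5 digits".
const VOLATILE: RegExp[] = [
  /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, // UUID
  /\d{4}-\d{2}-\d{2}t\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:z|[+-]\d{2}:\d{2})?/gi, // ISO timestamp
  /\b[0-9a-f]{16,}\b/gi, // long hex (16+ chars is an assumed cutoff)
  /\b\d{5,}\b/g, // numeric ids with 5 or more digits
];

function stripVolatile(s: string): string {
  return VOLATILE.reduce((acc, re) => acc.replace(re, ""), s);
}

export function normalizeStage(stage: string): string {
  return stripVolatile(stage).toLowerCase();
}

export function computeFingerprints(input: FingerprintInput) {
  // ASSUMPTION: message normalization = strip volatile ids, lowercase,
  // collapse whitespace, then keep the first 60 characters.
  const message_norm_prefix_60 = stripVolatile(input.message_raw ?? "")
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim()
    .slice(0, 60);

  // Field order mirrors the structural formula in 4.2.
  const fingerprint_structural = sha256(
    [
      input.channel,
      input.friction_type,
      normalizeStage(input.stage),
      input.tool_name || "",
      input.error_code || "",
      input.http_status ? String(input.http_status) : "",
    ].join("|"),
  );

  const fingerprint_variant = sha256(
    [fingerprint_structural, message_norm_prefix_60].join("|"),
  );

  return { fingerprint_structural, fingerprint_variant, message_norm_prefix_60 };
}
```

Because the message only enters the variant hash, two timeouts on `fetchEc:/api/commands` with different error text land in the same structural group, which is the behavior the fingerprint unit test in 8.1 checks.
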
### 5.4 `emitFrictionEvent()` helper (single path)

Create `apps/ec-service/src/learning/friction-engine.ts` with:

- `computeFingerprints(input): { fingerprint_structural, fingerprint_variant, message_norm_prefix_60 }`
- `emitFrictionEvent(partialEvent): Promise`
  - fills ids/timestamps
  - computes fingerprints
  - validates against `FrictionEventSchema`
  - appends to `friction_events.jsonl`
  - updates in-memory burst cache (see below)
  - updates `friction_state.json` incrementally **or** marks “dirty” for nightly rebuild (implementation choice; must be documented)

### 5.5 Burst suppression (low bloat, keeps signal)

`DOC8-TUNE` constants:

- `BURST_WINDOW_MS = 10_000`
- `BURST_FLUSH_MS = 60_000`

Algorithm:

- First event: append to `friction_events.jsonl`.
- Repeats within `BURST_WINDOW_MS` for same `fingerprint_variant`: do **not** append new events; increment in-memory counter.
- Periodically (or on window close), write a single `burst_suppressed` action to `friction_actions.jsonl` recording `suppressed_count`.

This preserves frequency signal with far less disk churn.

### 5.6 Manual merge (escape hatch)

Q can merge two fingerprints via `merge_fingerprint` action.

- Nightly job must load merge map first and rewrite `fingerprint_structural` to its canonical target before aggregation.
- Cycle detection: if merges create a cycle, ignore the newest merge and emit `rollup_error` friction event.

### 5.7 Nightly rollup + derived `friction_state.json`

Create `apps/ec-service/src/learning/nightly-rollup.ts`.

**Hard bounds** (`DOC8-TUNE`):

- max runtime: **300s** (existing standard)
- warn at: **200s** (emit `rollup_error` friction event)
- max new events per run: **50k** (paranoid cap; if exceeded, stop and record overflow)
- max entries in state: **10k** (paranoid cap)

**No full rescans.** Use cursor byte offsets:

- Store offsets in `friction_state.json.cursor`.
- Nightly reads only new bytes since offsets for `friction_events.jsonl` and `friction_actions.jsonl`.

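
The cursor rule above can be sketched as a reader that starts at a stored byte offset and only advances past complete lines. This is a minimal illustration under stated assumptions, not the required implementation: the function name `readJsonlSince`, the `CursorRead` return shape, restarting from 0 when the file is shorter than the offset (for example after archival), and reading the whole delta into one buffer are all choices made for the example.

```typescript
import {
  appendFileSync, closeSync, fstatSync, mkdtempSync,
  openSync, readSync, writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

export interface CursorRead {
  rows: unknown[];
  parseErrors: number; // callers would turn these into rollup_error friction
  nextOffset: number;  // value to persist in friction_state.json.cursor
}

// Hypothetical helper: read only bytes appended since `byteOffset`.
export function readJsonlSince(
  path: string,
  byteOffset: number,
  maxRows = 50_000, // mirrors the "max new events per run" bound
): CursorRead {
  const fd = openSync(path, "r");
  try {
    const size = fstatSync(fd).size;
    // ASSUMPTION: a file shorter than the cursor was archived; start over.
    const start = byteOffset > size ? 0 : byteOffset;
    const buf = Buffer.alloc(size - start); // sketch: whole delta in memory
    let filled = 0;
    while (filled < buf.length) {
      const n = readSync(fd, buf, filled, buf.length - filled, start + filled);
      if (n === 0) break;
      filled += n;
    }
    const data = buf.subarray(0, filled);

    const rows: unknown[] = [];
    let parseErrors = 0;
    let pos = 0;
    while (rows.length < maxRows) {
      const nl = data.indexOf(0x0a, pos);
      if (nl === -1) break; // trailing partial line is left for the next run
      const line = data.toString("utf8", pos, nl).trim();
      pos = nl + 1;
      if (line === "") continue;
      try {
        rows.push(JSON.parse(line));
      } catch {
        parseErrors += 1;
      }
    }
    return { rows, parseErrors, nextOffset: start + pos };
  } finally {
    closeSync(fd);
  }
}

// Usage: a second and third run only see what was appended since the cursor.
const demoPath = join(mkdtempSync(join(tmpdir(), "doc8-")), "friction_events.jsonl");
writeFileSync(demoPath, '{"n":1}\n{"n":2}\n{"n":'); // last line still being written
const first = readJsonlSince(demoPath, 0);          // 2 rows, nextOffset 16
appendFileSync(demoPath, "3}\n");
const second = readJsonlSince(demoPath, first.nextOffset);  // 1 row
const third = readJsonlSince(demoPath, second.nextOffset);  // 0 rows
```

Stopping at the last newline means a half-written append is never parsed and never skipped; it is simply picked up on the next run, which is what lets the cursor test in 8.1 expect zero rows when nothing new was written.
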
Outputs:

1) Update `friction_state.json` (atomic)
2) Update `learning_rollup.json` (atomic)
3) Write history rollup to `rollups/YYYY-MM-DD_learning_rollup.json` and prune older than 7.
4) Append `system_health.jsonl` row (1/day).
5) Optionally create/append regressions and prevention candidates (see below).

Nightly self-monitoring:

- Any parse error, schema validation error, overflow, or write failure emits a friction event (`channel=nightly`, `friction_type=rollup_error`).

### 5.8 Computed severity (v1 formula; deterministic)

For each `fingerprint_structural`, compute:

- `base = mode(emitted_severity over window)` (fallback to `minor`)
- `freq = count_window` in last 14 days
- `escalated = 1 if has escalation action else 0`

Rules:

- If `base == blocker` → computed `blocker`
- Else if `freq >= 10` → computed `major`
- Else if `freq >= 3` and `escalated == 1` → computed `major`
- Else → computed `minor`

(Adjustable later via DOC8-TUNE, but do not add UI knobs in v1.)

### 5.9 Friction expiry / stale marking

If a fingerprint:

- has `count_window == 0` for **30 days** AND
- has never been escalated AND
- status is still `open`

then nightly adds an `auto_mark_stale` action, and the derived status becomes `stale`.

- `stale` is hidden in default Q filters but remains searchable.

### 5.10 Closed-loop healing: regression + prevention rule candidates

**Trigger (`DOC8-TUNE`):**

- If `computed_severity in {blocker, major}` AND
- `count_window >= 3` in 14 days AND
- status in `{open, mitigated}`

Then nightly:

1) Appends a `regressions.jsonl` entry referencing the fingerprint.
2) Appends a `prevention_rule_update` action with `rule_state="candidate"` and structured fields:
   - `rule_summary` (short)
   - `mitigation_steps[]` (concrete)
   - `code_hint` (derived convention; optional)

**Code hint convention (no static mapping table):**

- `q_backend` → prefix `apps/q-backend/src/`
- `q_frontend` → prefix `apps/q-frontend/src/`
- `ec_service` → prefix `apps/ec-service/src/`
- `openclaw` → prefix `~/.openclaw/workspace/` (do **not** assume; include only if you can resolve deterministically)

Stage-to-path transformation is best-effort. If uncertain, omit `code_hint`.

### 5.11 Canary mode (rules that prove themselves)

When the user approves a prevention candidate (via Inbox/approval flow from DOC1/DOC6):

- append `prevention_rule_update` with `rule_state="canary"` and `canary_until = now + 7 days`

Nightly checks:

- If `canary_until` has passed and the fingerprint had **zero** events in the canary window:
  - update to `confirmed` (action append)
  - emit `prevented_friction` learning signal once (see fix epoch below)
- If the fingerprint recurs during canary:
  - update to `ineffective`
  - (optional) auto-suggest repair panel

### 5.12 Fix epochs + prevented_friction idempotency

When a fingerprint is manually marked `mitigated` or `fixed`:

- EC assigns a new `fix_epoch_id` (uuid) and stores it in derived state.
- Nightly may emit `prevented_friction` **once** per `(fingerprint_structural, fix_epoch_id)` pair.
- The emitted epoch ids are stored in `prevented_friction_emitted_epochs[]` in `friction_state`.

### 5.13 Forum escalation (threading rules)

User action (Q) creates an `escalate_forum` action; EC performs:

1) If `friction_state.last_escalation.thread_id` exists → post reply there.
2) Else create a new thread titled `Friction: {channel} {stage} ({computed_severity})` and post the initial message.
3) Else fall back to `default_forum_thread_id` if configured elsewhere (DOC6 forum system); if still unavailable, create a pending item containing the post text for copy/paste.

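
The threading rules above can be read as a pure decision function. The sketch below takes one interpretation as an assumption: step 3 applies only when a new thread cannot be created, which is modeled here with a hypothetical `canCreateThread` flag. The `EscalationTarget` type and `resolveEscalationTarget` name are also illustrative and do not come from DOC6.

```typescript
type Severity = "blocker" | "major" | "minor";

// Subset of FrictionStateEntry (4.5) that the decision needs.
interface EntryForEscalation {
  channel: string;
  stage: string;
  computed_severity: Severity;
  last_escalation?: { thread_id?: string };
}

// Hypothetical result type: what EC should do with the forum post.
type EscalationTarget =
  | { kind: "reply"; thread_id: string }
  | { kind: "new_thread"; title: string }
  | { kind: "fallback_thread"; thread_id: string }
  | { kind: "pending_item" };

export function resolveEscalationTarget(
  entry: EntryForEscalation,
  // ASSUMPTION: the caller knows whether the forum system accepts new threads.
  opts: { canCreateThread: boolean; defaultForumThreadId?: string },
): EscalationTarget {
  // 1) Reuse the existing per-fingerprint thread when there is one.
  const existing = entry.last_escalation?.thread_id;
  if (existing) return { kind: "reply", thread_id: existing };

  // 2) Otherwise open a per-fingerprint thread with the title template from 5.13.
  if (opts.canCreateThread) {
    const title = `Friction: ${entry.channel} ${entry.stage} (${entry.computed_severity})`;
    return { kind: "new_thread", title };
  }

  // 3) Global thread is fallback only; last resort is a copy/paste pending item.
  if (opts.defaultForumThreadId) {
    return { kind: "fallback_thread", thread_id: opts.defaultForumThreadId };
  }
  return { kind: "pending_item" };
}
```

Keeping the decision free of I/O makes it easy to unit test and keeps the escalation path deterministic, in line with the no-LLM rule for friction handling.
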
The forum post is a deterministic template (no LLM):

- fingerprint_structural
- top variants + counts
- last_seen_at + count_window
- linked repairs / rules
- request: “Please propose fixes, mitigations, or spec/code changes.”

### 5.14 Repair work linkage + auto-mitigation

When the user creates a repair task/panel from a fingerprint:

- append `link_repair_work` action with `work_id`
- nightly watches the linked work outcome:
  - if work is marked complete AND no recurrence for 7 days → auto-annotate `mitigated` (not `fixed`)

### 5.15 Friction-aware context injection (visible, targeted)

If `learning_controls.context_cautions_enabled == true`, then during context assembly for an operation with `(channel, stage)`:

- select fingerprints where:
  - `entry.channel == channel`
  - status in `{open, mitigated}`
  - computed_severity in `{blocker, major}`
  - last_seen_at within **7 days**
- sort by (computed_severity desc, last_seen desc)
- inject up to **3** one-line notes

Example injection block (visible, system prompt):

```
[Friction Cautions — last 7 days]
- fetchEc:/api/commands has timed out intermittently (major; last seen 2026-02-26). Prefer retry-once + fallback.
- context_assembly hit pressure threshold recently (major; last seen 2026-02-25). Keep outputs concise.
```

Token cap: **150 tokens total**. If context pressure is already high, cap to 1 line.

This is **not** a standing order and must not override policy or memory; it is a caution overlay.

---

## 6) System health metrics (cheap, deterministic)

### 6.1 `system_health.jsonl` row (1/day)

Each nightly run appends a row with:

- date
- rollup duration
- number of new events processed
- number of open major/blocker fingerprints
- q_backend EC error count (if available)
- any anomalies flagged

### 6.2 Simple anomaly detection

Maintain a rolling 7-day baseline (mean + stddev) for numeric metrics. Flag an anomaly if the current value deviates by more than **2σ** and the absolute change exceeds a small floor.

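
The baseline check can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `detectAnomaly` name, the default floor of 5, the use of population (not sample) standard deviation, and the tiny standard-deviation floor that keeps `zscore` finite on a flat baseline are all choices made for the example, since the spec leaves them to `DOC8-TUNE` constants. The output fields follow the `anomalies` row shape from 4.6.4.

```typescript
export interface Anomaly {
  metric: string;
  value: number;
  mean7: number;
  std7: number;
  zscore: number;
  message: string;
}

// Hypothetical helper: compare today's value against the previous 7 daily values.
export function detectAnomaly(
  metric: string,
  value: number,
  last7: number[],
  minAbsChange = 5, // ASSUMPTION: the "small floor"; would be a DOC8-TUNE constant
): Anomaly | null {
  if (last7.length < 2) return null; // not enough history for a baseline

  const mean7 = last7.reduce((a, b) => a + b, 0) / last7.length;
  // ASSUMPTION: population variance over the baseline window.
  const variance =
    last7.reduce((a, b) => a + (b - mean7) ** 2, 0) / last7.length;
  const std7 = Math.sqrt(variance);

  const delta = value - mean7;
  if (Math.abs(delta) < minAbsChange) return null; // below the absolute floor

  // ASSUMPTION: tiny floor on std keeps zscore finite when the baseline is flat.
  const zscore = delta / Math.max(std7, 1e-9);
  if (Math.abs(zscore) <= 2) return null; // within 2 sigma

  return {
    metric,
    value,
    mean7,
    std7,
    zscore,
    message: `${metric}=${value} deviates from 7-day mean ${mean7.toFixed(2)}`,
  };
}

// Usage: a jump in open major/blocker fingerprints against a quiet week.
const quietWeek = [10, 12, 11, 9, 10, 11, 10];
const spike = detectAnomaly("open_major_blocker_count", 40, quietWeek);
const normalDay = detectAnomaly("open_major_blocker_count", 11, quietWeek);
```

Requiring both conditions keeps a very stable metric (tiny σ) from flagging on a change of one or two events, while the σ test keeps a naturally noisy metric from flagging on every swing.
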
Anomalies are included in `friction_state.anomalies[]` and `learning_rollup.json`.

No LLM. Pure math.

---

## 7) Q Dashboard UX (implementation-friendly, but informative)

### 7.1 V1 surfaces (ship these)

1) **Learning → Friction sub-tab** (inside existing Learning page)
   - Table driven from `friction_state.json.entries`
   - Default filter: status != stale AND computed_severity != minor (user can toggle)
   - Cluster grouping (optional UI grouping by cluster key)
2) **Row detail drawer** (lightweight)
   - Shows: top variants/messages, actions history (last 10), linked repairs, rule state, escalation link
   - Buttons:
     - Mark mitigated / fixed / ignored
     - Add note
     - Merge into…
     - Discuss in Forum
     - Create Repair Panel (DOC6 panel preset)
     - Create Repair Task (existing task system)
3) **Global “Mark friction” action**
   - Always available (top bar / quick actions)
   - Creates either:
     - a new friction event (manual) OR
     - a note/status action on an existing fingerprint if user selects one

### 7.2 Settings (minimal)

In Q Settings → Learning:

- Toggle: `Auto-escalate suggestions` (writes `learning_controls.auto_escalate_enabled`)
- Toggle: `Show friction cautions in prompt` (writes `learning_controls.context_cautions_enabled`)

No other settings in v1.

### 7.3 Optional (defer unless easy)

- Home card
- Trend charts
- Full-text search across raw message bodies

---

## 8) Tests & acceptance criteria

### 8.1 Required unit tests (EC)

- Fingerprint test: structural stable across message drift; variant differs when prefix differs.
- Merge test: merge action collapses counts before aggregation.
- Cursor test: second nightly run processes zero bytes when no new events.
- `prevented_friction` idempotency: emitted once per fix_epoch.
- Stale marking: 30-day inactivity marks stale.
- Context injection selection: caps at 3 and respects channel/stage + recency.

### 8.2 Required integration test (the “self-healing loop is real” test)

Simulate:

1) Emit the same friction event 3 times (major) within the window.
2) Run nightly → regression entry + prevention candidate created.
3) Append approval action setting rule to canary.
4) Run nightly without new events until canary expires.
5) Verify: rule becomes confirmed + prevented_friction signal emitted once.

This test must fail if implementation is “hollow”.

---

## 9) Implementation map (do not drift)

### 9.1 Contracts

- `packages/contracts/src/canonical.ts` (add paths)
- `packages/contracts/src/schemas.ts` (add schemas + export)
- `packages/contracts/src/index.ts` (export types)

### 9.2 EC service

Create:

- `apps/ec-service/src/learning/friction-engine.ts`
- `apps/ec-service/src/learning/nightly-rollup.ts`
- `apps/ec-service/src/learning/system-health.ts` (helpers; optional)
- `apps/ec-service/src/learning/learning-controls.ts`

Modify:

- `apps/ec-service/src/server.ts`
  - register read endpoints:
    - `GET /api/learning/friction/state` → returns friction_state.json
    - `GET /api/learning/rollup` → returns learning_rollup.json
  - handle new command types for friction actions and manual friction append
- `apps/ec-service/src/index.ts`
  - start nightly scheduler (if not already) to run rollup once per day (reuse existing scheduler per DOC1/DOC6)

### 9.3 Q backend

Modify:

- `apps/q-backend/src/server.ts`
  - proxy GET endpoints to EC for rollup/state
  - add helper route for `POST /api/learning/friction/action` that enqueues `learning_friction_action_append`
  - ensure remote write gating is enforced for status-changing actions

### 9.4 Q frontend

Modify/add:

- `apps/q-frontend/src/pages/LearningPage.tsx` (add Friction sub-tab)
- `apps/q-frontend/src/components/*`:
  - `FrictionTable.tsx`
  - `FrictionDetailDrawer.tsx`
  - `MarkFrictionModal.tsx`

Keep changes additive; do not refactor `QDashV11.jsx` except minimal wiring to open the modal (per prior guardrails).

---

## 10) Claude Code / Codex guardrails (implementation quality)

Implementer must produce:

- `PATCH_REPORT.md` with:
  - exact files changed/created
  - grep evidence for hook points used
  - summary of any deviations (and why)
- Run and report:
  - `npm test`
  - any relevant manual smoke steps (start servers, open Learning tab, mark friction)

---

## 11) Future directions (explicitly not in v1)

- Tier 3 autonomy (“act-and-report”): **do not implement plumbing yet**.
- Prediction calibration dashboards: revisit after 30 days of real data.