Current Specs/DOC8/DOC8_SELF_LEARNING_PREDICTION_REGRESSION_FRICTION_v1_11_4.md

Generated 2026-06-09T01:23:58.539Z from commit dbaa25962edc11ab30e8d4ca1715f9ae5bf77331. Worktree: clean.

# DOC8 — Self-Learning Engine (Prediction, Regressions, Friction) v1.11.4
**ELNOR Suite v1.11** | February 27, 2026  
**Status:** Draft → ready for Claude Code implementation  
**Audience:** Claude Code / Codex implementers (EC + Q)  

## Spec pinning (do not skip)
The following specs were used as the reference set for this DOC8 revision. If any hash does **not** match your repo, **STOP** and reconcile before implementing.

- `ELNOR_CORE_SPEC_v1_11_2_CANONICAL.md` — `a1a23894dae7bbe93af868d02204896d887b2c2d1afd7b63a66a1c4468c81571`
- `Q_DASHBOARD_SPEC_v1_11_2_CANONICAL.md` — `f24788fbade9788a30df7e1d5546d14ac65e9464abd8cef84a9e4b4112405fa3`
- `DOC1_MEMORY_RESILIENCE_v1_11_3_FINAL.md` — `03f1c5ce75292e58019421763812dfd1f6fdd5942b202a3f07a8768063de2496`
- `DOC2_FRESHNESS_PERSONAL_STATE_v1_11_4.md` — `240124c9175e7f23848b2f754b78696ff18e8a0b62e1ae94605b568c05a6876b`
- `DOC6_PANELS_FORUMS_SELF_IMPROVEMENT_v1_11_8_0.md` — `1d62d1050ad7ba3f95399ef3ff2c9122a2f6144096921ffcfa6ffca4d0b07c7e`
- `DOC7_CONTEXT_BUCKETS_FILES_v1_11_8_DRAFT.md` — `4b254e6f674654b63baa51f04d14843b099b81b9b8ae273a249fccefede0d94f`
- `DOC8_SELF_LEARNING_PREDICTION_REGRESSION_FRICTION_v1_11_3.md` — `4cccdc0c0620a3073aab356f918ff72088877b17e28b27773dd7f81f8740a1ec`

**Supersedes:** `DOC8_SELF_LEARNING_PREDICTION_REGRESSION_FRICTION_v1_11_3.md` (pinned above).  
Implement **this** DOC8 as the single authoritative DOC8 moving forward.

---

## 0) Intent in plain language
DOC8 makes ELNOR *measurably* self-healing without turning it into an opaque “auto-pilot”.

It does three things:
1. **Friction Engine:** Automatically records real-world “stuff went wrong / got annoying / got slow” events across Q, EC, OpenClaw tool calls, context pressure, compaction, approvals, offline modes, and nightly jobs.
2. **Closed-loop healing:** Repeated friction produces *actionable* prevention candidates, repair suggestions, and optional forum/panel escalation. When a fix works, DOC8 emits an explicit positive signal (`prevented_friction`) so the system learns “this mitigation helped”.
3. **Low-bloat observability:** A nightly derived index (`friction_state.json`) + rollups power Q dashboards **without** rescanning huge JSONL logs or inventing new LLM calls.

It is intentionally domain-agnostic (legal, coding, research, etc.) while still allowing **editable categories** (including `legal_distortion`) through the taxonomy mechanisms already introduced in DOC6.

---

## 1) Non-negotiables (re-stated)
1. **Local-first:** Everything stored under `ELNOR_MEMORY/`.
2. **Single-writer:** **EC is the only durable writer.** Q (frontend/backend) must never directly write files under `ELNOR_MEMORY/`.
3. **Append-only logs:** Events + actions are JSONL append-only. Derived state is written atomically.
4. **No hot-path LLM calls:** Recording friction and maintaining state must not require LLM calls.
5. **No silent steering:** Any behavior-adapting context injection must be **visible** in the prompt and token-capped.
6. **Deterministic + bounded:** Nightly jobs have strict runtime/work limits and deterministic outputs.
7. **Degraded mode is honest:** If EC is down, Q shows “degraded” and does not pretend learning is happening.

---

## 2) Key changes from DOC8 v1.11.3
### 2.1 Adopted fixes (from red-team + Claude review)
- **Fingerprint redesign:** split into `fingerprint_structural` (grouping key) and `fingerprint_variant` (debug discriminator).
- **Manual merge escape hatch:** Q can merge fingerprints via append-only actions; merges are applied **before** aggregation.
- **Nightly scaling fix:** add `friction_state.json` as the fast-read derived index + cursor offsets, so nightly never rescans entire JSONL files.
- **`prevented_friction` idempotency:** new `fix_epoch_id` to emit once per mitigation/fix epoch.
- **Per-fingerprint forum threads:** escalation defaults to one thread per fingerprint; global thread is fallback only.
- **Prediction calibration deferred:** keep raw prediction logs, but **remove** calibration metrics from rollups for v1.
- **File topology simplification:** unify annotations/escalations/merges under a single `friction_actions.jsonl`.
- **Targeted friction-aware context injection:** only inject relevant, recent, high-severity cautions; cap at 3 notes.

### 2.2 Added “actually self-healing” power without bloat
- **Computed severity:** derived nightly from recurrence + escalation + emitted severities.
- **Root-cause clustering:** deterministic grouping by shared `channel/stage/tool_name`.
- **Canary mode for prevention rules:** rules graduate `canary → confirmed` or are marked `ineffective`.
- **System health metrics + anomaly detection:** simple math, no LLM, early warning signals.
- **Nightly job self-monitoring:** the monitor monitors itself.

---

## 3) Data model & durable artifacts (EC)
All paths are relative to repo root. All durable data lives under `ELNOR_MEMORY/`.

### 3.1 Canonical paths (add to `packages/contracts/src/canonical.ts`)
Add these to `CANONICAL_PATHS` (and export) if not already present:

- `learningFrictionEvents`: `ELNOR_MEMORY/system/learning/friction_events.jsonl`
- `learningFrictionActions`: `ELNOR_MEMORY/system/learning/friction_actions.jsonl`
- `learningFrictionState`: `ELNOR_MEMORY/system/learning/friction_state.json`
- `learningRollup`: `ELNOR_MEMORY/system/learning/learning_rollup.json`
- `learningRollupHistoryDir`: `ELNOR_MEMORY/system/learning/rollups/`
- `learningSystemHealth`: `ELNOR_MEMORY/system/learning/system_health.jsonl`
- `learningPredictions`: `ELNOR_MEMORY/system/learning/predictions.jsonl`
- `learningRegressions`: `ELNOR_MEMORY/system/learning/regressions.jsonl`
- `learningSignals`: `ELNOR_MEMORY/system/learning/learning_signals.jsonl`
- `learningControls`: `ELNOR_MEMORY/system/learning/learning_controls.json`

> **Note:** `learning_controls.json` is intentionally minimal: it stores only user-facing toggles that change behavior. All other thresholds remain code constants tagged `DOC8-TUNE`.

### 3.2 Retention & bloat control
- `friction_events.jsonl`: retain **180 days**, then archive (DOC1 retention style).
- `friction_actions.jsonl`: retain **180 days**, then archive.
- `friction_state.json`: always current (atomic overwrite).
- `learning_rollup.json`: always current (atomic overwrite).
- `rollups/`: keep last **7** daily rollups (delete older).
- `system_health.jsonl`: retain **365 days** (1 row/day).
- `predictions.jsonl`: retain **180 days** (small).
- `regressions.jsonl`: retain **365 days** (small).
- All pruning/archival must be EC-only and should reuse DOC1 “preview → archive” semantics.

---

## 4) Schemas & contracts (packages/contracts)
All schemas below are **normative**. Implementers must not invent fields or rename them.

### 4.1 Shared enums
- `LearningChannel` (string enum):
  - `ec_service`
  - `q_backend`
  - `q_frontend`
  - `openclaw`
  - `nightly`
  - `panels`
  - `forums`
- `FrictionType` (string enum; additive allowed, but do not rename):
  - `tool_failure`
  - `tool_timeout`
  - `offline_mode`
  - `permission_error`
  - `budget_exhausted`
  - `context_pressure`
  - `compaction_event`
  - `memory_read_failure`
  - `memory_search_failure`
  - `validation_error`
  - `slow_path`
  - `ux_annoyance` (manual/default)
  - `quality_degradation` (manual/default)
  - `rollup_error` (nightly self-monitoring)
- `FrictionSeverity` (string enum): `blocker | major | minor`
- `FrictionStatus` (string enum):
  - `open`
  - `mitigated`
  - `fixed`
  - `ignored`
  - `stale`
- `PreventionRuleState` (string enum):
  - `candidate`
  - `canary`
  - `confirmed`
  - `ineffective`

### 4.2 Fingerprint strategy (normative)
**Goal:** stable grouping + debuggability without relying on fragile full-message hashes.

- `fingerprint_structural` = `sha256(join("|", [
  channel,
  friction_type,
  normalize_stage(stage),
  tool_name || "",
  error_code || "",
  http_status || ""
]))`

- `fingerprint_variant` = `sha256(join("|", [
  fingerprint_structural,
  message_norm_prefix_60 || ""
]))`

Where:
- `normalize_stage()` removes volatile ids (`UUID`, long hex, timestamps, numeric ids ≥ 5 digits) and lowercases.
- `message_norm_prefix_60` is a normalized message string truncated to first 60 chars (after stripping volatile ids).
- The **structural** key is the default aggregation key.
- Variants are used for debugging, sampling, and display; they must not fragment aggregation.
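A minimal sketch of the fingerprint computation, assuming Node's built-in `crypto`. The exact regexes for "volatile ids", the `<id>`/`<ts>`/`<hex>`/`<n>` placeholders, and the whitespace/lowercasing applied to the message prefix are assumptions; DOC8 only names the categories to strip. `FingerprintInput`, `stripVolatile`, and `normalizeMessagePrefix60` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape; field names mirror FrictionEvent (section 4.3).
interface FingerprintInput {
  channel: string;
  friction_type: string;
  stage: string;
  tool_name?: string;
  error_code?: string;
  http_status?: number;
  message_raw?: string;
}

const sha256 = (s: string): string => createHash("sha256").update(s, "utf8").digest("hex");

// Assumed patterns: UUIDs, ISO-like timestamps, long hex runs, numeric ids >= 5 digits.
function stripVolatile(s: string): string {
  return s
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<id>")
    .replace(/\d{4}-\d{2}-\d{2}[t ]\d{2}:\d{2}(:\d{2}(\.\d+)?)?(z|[+-]\d{2}:?\d{2})?/gi, "<ts>")
    .replace(/\b[0-9a-f]{16,}\b/gi, "<hex>")
    .replace(/\b\d{5,}\b/g, "<n>");
}

function normalizeStage(stage: string): string {
  return stripVolatile(stage).toLowerCase();
}

// Assumption: collapse whitespace and lowercase before truncating to 60 chars.
function normalizeMessagePrefix60(message: string): string {
  return stripVolatile(message).toLowerCase().replace(/\s+/g, " ").trim().slice(0, 60);
}

function computeFingerprints(input: FingerprintInput) {
  const message_norm_prefix_60 = normalizeMessagePrefix60(input.message_raw ?? "");
  const fingerprint_structural = sha256(
    [
      input.channel,
      input.friction_type,
      normalizeStage(input.stage),
      input.tool_name ?? "",
      input.error_code ?? "",
      input.http_status !== undefined ? String(input.http_status) : "",
    ].join("|"),
  );
  // The variant hashes the structural key plus the normalized message prefix.
  const fingerprint_variant = sha256([fingerprint_structural, message_norm_prefix_60].join("|"));
  return { fingerprint_structural, fingerprint_variant, message_norm_prefix_60 };
}
```

Because the message only feeds the variant hash, rewording an error message changes `fingerprint_variant` but leaves the aggregation key untouched, which is exactly what the 8.1 fingerprint test checks.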

### 4.3 FrictionEvent schema (append-only; immutable facts)
Create `FrictionEventSchema` in `packages/contracts/src/schemas.ts`:

Required:
- `event_id: string` (uuid)
- `created_at: string` (ISO)
- `channel: LearningChannel`
- `friction_type: FrictionType`
- `severity: FrictionSeverity`
- `stage: string` (short, stable; e.g. `fetchEc:/api/commands`, `context_assembly`, `openclaw:tool:browser_navigate`)
- `fingerprint_structural: string` (sha256 hex)
- `fingerprint_variant: string` (sha256 hex)

Optional (but strongly recommended for auto-detected events):
- `tool_name?: string`
- `error_code?: string`
- `http_status?: number`
- `message_raw?: string` (cap 2,000 chars)
- `message_norm_prefix_60?: string` (cap 60 chars)
- `run_id?: string` (task/panel run id if present)
- `task_id?: string`
- `panel_run_id?: string`
- `forum_thread_id?: string`
- `conversation_id?: string`
- `agent_id?: string`
- `model_id?: string`
- `context_pressure_pct?: number` (0–100; only for `context_pressure`)
- `meta?: Record<string, unknown>` (small; cap serialized bytes)

### 4.4 FrictionAction schema (append-only; mutable layer via events)
All friction “edits” are append-only actions in `friction_actions.jsonl`.

Create `FrictionActionSchema` as a discriminated union on `action_type`.

Common required fields for all actions:
- `action_id: string` (uuid)
- `created_at: string` (ISO)
- `fingerprint_structural: string`
- `action_type: string` (see below)
- `actor: "user" | "system"`
- `actor_id?: string` (e.g. session id / user id; optional)

Action variants:

1) `annotate_status`
- `status: FrictionStatus`
- `note?: string` (cap 800 chars)
- `fix_epoch_id?: string`
  - **Required** on the persisted action when status is set to `mitigated` or `fixed`.
  - If the caller omits it, EC generates one before appending.

2) `add_note`
- `note: string` (cap 800 chars)

3) `merge_fingerprint`
- `merge_from: string` (fingerprint_structural)
- `merge_into: string` (fingerprint_structural)
- `note?: string`

> Merge actions are stored once per merge decision. Nightly aggregation loads merge map **first** and applies it before counting.

4) `burst_suppressed`
- `fingerprint_variant: string`
- `window_start_at: string` (ISO)
- `window_end_at: string` (ISO)
- `suppressed_count: number` (>=1)

5) `escalate_forum`
- `thread_id?: string`
- `post_id?: string`
- `post_excerpt?: string` (cap 600 chars)
- `note?: string`

6) `link_repair_work`
- `work_type: "task" | "panel" | "external"`
- `work_id: string` (task_id / panel_run_id / external id)
- `note?: string`

7) `prevention_rule_update`
- `rule_id: string` (uuid)
- `rule_state: PreventionRuleState`
- `rule_summary: string` (cap 240 chars)
- `mitigation_steps: string[]` (max 6, each cap 160 chars)
- `code_hint?: string` (cap 240 chars)
- `canary_until?: string` (ISO; required when rule_state=`canary`)
- `linked_regression_id?: string`

8) `auto_mark_stale`
- `note?: string`
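For orientation, here is a plain TypeScript mirror of the eight variants (field names copied from the list above) plus a hypothetical `finalizeAction` step showing where EC might fill a missing `fix_epoch_id` and enforce `canary_until`. This is a sketch only; the normative contract remains `FrictionActionSchema` in `packages/contracts`, and no validation library is assumed here.

```typescript
import { randomUUID } from "node:crypto";

type FrictionStatus = "open" | "mitigated" | "fixed" | "ignored" | "stale";
type PreventionRuleState = "candidate" | "canary" | "confirmed" | "ineffective";

interface ActionBase {
  action_id: string;
  created_at: string;
  fingerprint_structural: string;
  actor: "user" | "system";
  actor_id?: string;
}

type FrictionAction =
  | (ActionBase & { action_type: "annotate_status"; status: FrictionStatus; note?: string; fix_epoch_id?: string })
  | (ActionBase & { action_type: "add_note"; note: string })
  | (ActionBase & { action_type: "merge_fingerprint"; merge_from: string; merge_into: string; note?: string })
  | (ActionBase & {
      action_type: "burst_suppressed";
      fingerprint_variant: string;
      window_start_at: string;
      window_end_at: string;
      suppressed_count: number;
    })
  | (ActionBase & { action_type: "escalate_forum"; thread_id?: string; post_id?: string; post_excerpt?: string; note?: string })
  | (ActionBase & { action_type: "link_repair_work"; work_type: "task" | "panel" | "external"; work_id: string; note?: string })
  | (ActionBase & {
      action_type: "prevention_rule_update";
      rule_id: string;
      rule_state: PreventionRuleState;
      rule_summary: string;
      mitigation_steps: string[];
      code_hint?: string;
      canary_until?: string;
      linked_regression_id?: string;
    })
  | (ActionBase & { action_type: "auto_mark_stale"; note?: string });

// Hypothetical EC-side normalization applied just before appending to friction_actions.jsonl.
function finalizeAction(action: FrictionAction, newId: () => string = randomUUID): FrictionAction {
  if (
    action.action_type === "annotate_status" &&
    (action.status === "mitigated" || action.status === "fixed") &&
    !action.fix_epoch_id
  ) {
    return { ...action, fix_epoch_id: newId() }; // EC generates the epoch when the caller omits it
  }
  if (action.action_type === "prevention_rule_update" && action.rule_state === "canary" && !action.canary_until) {
    throw new Error("canary_until is required when rule_state is canary");
  }
  return action;
}
```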

### 4.5 FrictionState derived schema (fast read; atomic)
`friction_state.json` is **derived** and written atomically by EC (nightly + incremental updates).

Create `FrictionStateFileSchema`:

Top-level required:
- `generated_at: string` (ISO)
- `cursor: { events_byte_offset: number, actions_byte_offset: number }`
- `window_days: number` (default 14)
- `entries: FrictionStateEntry[]`
- `clusters: FrictionCluster[]`
- `anomalies: SystemHealthAnomaly[]`

`FrictionStateEntry` required:
- `fingerprint_structural: string`
- `status: FrictionStatus`
- `computed_severity: FrictionSeverity`
- `channel: LearningChannel`
- `friction_type: FrictionType`
- `stage: string`
- `tool_name?: string`
- `error_code?: string`
- `first_seen_at: string`
- `last_seen_at: string`
- `count_total: number`
- `count_window: number` (window_days)
- `top_variants: Array<{ fingerprint_variant: string, count: number, message_prefix?: string }>` (max 5)
- `latest_note?: string`
- `merged_into?: string` (if this fingerprint_structural has been merged)

Optional, for loop closure:
- `last_escalation?: { thread_id?: string, post_id?: string, last_post_at?: string }`
- `linked_repairs?: Array<{ work_type: "task" | "panel" | "external", work_id: string, note?: string }>` (max 6)
- `prevention_rule?: { rule_id: string, rule_state: PreventionRuleState, rule_summary: string, canary_until?: string }`
- `fix_epoch_id_current?: string`
- `prevented_friction_emitted_epochs?: string[]` (max 12; for idempotency)

`FrictionCluster` required:
- `cluster_id: string` (sha256)
- `key: string` (e.g. `q_backend|fetchEc:/api/commands|tool=fetchEc`)
- `fingerprints: string[]` (fingerprint_structural ids; max 50)
- `summary: string` (cap 240 chars)

### 4.6 Learning signals, regressions, predictions, system health (minimal but real)
DOC8 relies on the existing “learning signals” pipeline (DOC6/DOC1). If those schemas already exist in `packages/contracts/src/schemas.ts`, extend them to match the required fields below. If they do not exist, create them exactly as specified.

#### 4.6.1 LearningSignal schema (`learning_signals.jsonl`)
Append-only. Used for positive/negative reinforcement and Impact Ledger ingestion.

Required:
- `signal_id: string` (uuid)
- `created_at: string` (ISO)
- `event_type: string` (enum; additive allowed, do not rename existing):
  - `friction_detected`
  - `prevented_friction`
  - `regression_triggered`
  - `canary_confirmed`
  - `canary_ineffective`
  - `health_anomaly`
- `severity?: FrictionSeverity` (when applicable)
- `fingerprint_structural?: string`
- `fix_epoch_id?: string`
- `rule_id?: string`
- `meta?: Record<string, unknown>` (small)

**Idempotency requirement:** `prevented_friction` must be emitted at most once per `(fingerprint_structural, fix_epoch_id)`.

#### 4.6.2 Regression entry schema (`regressions.jsonl`)
Append-only. A regression is “this friction pattern is recurring enough that we need a prevention rule or repair work”.

Required:
- `regression_id: string` (uuid)
- `created_at: string` (ISO)
- `fingerprint_structural: string`
- `severity: FrictionSeverity`
- `summary: string` (cap 240 chars)
- `detection_source: "nightly_threshold" | "user_report" | "forum_feedback" | "panel_conclusion"`
- `status: "open" | "mitigated" | "fixed" | "ignored"`

Optional:
- `linked_rule_id?: string`
- `linked_work_id?: string`
- `category?: string` (optional taxonomy tag, e.g. `legal_distortion`)

#### 4.6.3 Prediction events (`predictions.jsonl`) — raw logging only in v1
Prediction calibration dashboards are explicitly deferred, but raw logging is kept so you can build calibration later with real data.

Use a single append-only event schema with `event_type`:
- `prediction_create`
- `prediction_finalize`

Common required:
- `prediction_id: string` (uuid)
- `created_at: string` (ISO)
- `event_type: "prediction_create" | "prediction_finalize"`
- `run_id: string`
- `run_type: "task" | "panel" | "forum"`

Create-only fields (required on `prediction_create`):
- `predicted_duration_min: number`
- `predicted_success_prob: number` (0–1)
Optional:
- `predicted_cost_usd?: number`
- `key_risks?: string[]` (max 6, each cap 120 chars)
- `model_id?: string`

Finalize-only fields (required on `prediction_finalize`):
- `actual_duration_min: number`
- `actual_success: boolean`
Optional:
- `actual_cost_usd?: number`
- `notes?: string` (cap 240 chars)

#### 4.6.4 System health rows (`system_health.jsonl`)
Append-only daily summary.

Required:
- `row_id: string` (uuid)
- `created_at: string` (ISO)
- `date: string` (YYYY-MM-DD)
- `rollup_duration_ms: number`
- `new_events_processed: number`
- `open_major_blocker_count: number`
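- `anomalies: SystemHealthAnomaly[]` (max 12), where `SystemHealthAnomaly = { metric: string, value: number, mean7: number, std7: number, zscore: number, message: string }` (the element type referenced in sections 4.5 and 4.7)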

---

### 4.7 Learning rollup additions (Q-facing; minimal schema)
If `LearningRollupSchema` already exists, add (or ensure it includes) a `friction` object:

```
friction: {
  generated_at: ISOString,
  window_days: number,
  totals: {
    new_events_processed: number,
    open_total: number,
    open_major_blocker: number,
    stale_total: number
  },
  top: Array<{
    fingerprint_structural: string,
    status: FrictionStatus,
    computed_severity: FrictionSeverity,
    count_window: number,
    last_seen_at: ISOString,
    channel: LearningChannel,
    friction_type: FrictionType,
    stage: string,
    tool_name?: string
  }>, // cap 50
  clusters: FrictionCluster[], // cap 50
  anomalies: SystemHealthAnomaly[] // cap 12
}
```

`learning_rollup.json` is for quick dashboards; `friction_state.json` is for the full table/drawer.


---

## 5) EC behavior & algorithms
### 5.1 Controls (minimal settings)
`ELNOR_MEMORY/system/learning/learning_controls.json` schema:
- `auto_escalate_enabled: boolean` (default true)
- `context_cautions_enabled: boolean` (default true)

No other tunables are user-exposed in v1. All thresholds are code constants marked `DOC8-TUNE`.
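A minimal loader sketch, assuming a missing or unparsable file should fall back to the documented defaults and that unknown keys are ignored; DOC8 does not specify either behavior. `parseLearningControls` and `loadLearningControls` are hypothetical names.

```typescript
import { readFileSync } from "node:fs";

interface LearningControls {
  auto_escalate_enabled: boolean;
  context_cautions_enabled: boolean;
}

const DEFAULT_CONTROLS: LearningControls = { auto_escalate_enabled: true, context_cautions_enabled: true };

// Overlay on-disk values onto the defaults; non-boolean values and unknown keys are ignored.
function parseLearningControls(raw: unknown): LearningControls {
  const out = { ...DEFAULT_CONTROLS };
  if (raw && typeof raw === "object") {
    const obj = raw as Record<string, unknown>;
    if (typeof obj.auto_escalate_enabled === "boolean") out.auto_escalate_enabled = obj.auto_escalate_enabled;
    if (typeof obj.context_cautions_enabled === "boolean") out.context_cautions_enabled = obj.context_cautions_enabled;
  }
  return out;
}

function loadLearningControls(path: string): LearningControls {
  try {
    return parseLearningControls(JSON.parse(readFileSync(path, "utf8")));
  } catch {
    return { ...DEFAULT_CONTROLS }; // assumption: missing/corrupt file -> defaults
  }
}
```

Keeping only two booleans means there is nothing to migrate when `DOC8-TUNE` constants change in code.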

### 5.2 Automatic friction detection is primary
Automatic detection must be on by default. User “Mark friction” is an optional override for:
- subtle UX annoyances
- “model got weird / misunderstood prompt”
- quality degradation without a hard error
- external services that don’t throw structured errors

### 5.3 Where to emit friction (required hook table)
Implementers must wire friction emission at these minimum points (search-based insertion; do not invent new architecture).

| Detection point | Channel | File (must exist) | What to hook (search hints) | friction_type |
|---|---|---|---|---|
| EC command processing errors | ec_service | `apps/ec-service/src/server.ts` | `app.post("/api/commands"` handler; catch blocks; validation failures | `validation_error` / `tool_failure` |
| EC file IO failures | ec_service | `apps/ec-service/src/fs-utils.ts` | errors thrown by `appendJsonl`, `writeJsonAtomic` | `memory_read_failure` / `tool_failure` |
| EC event-bus append failures | ec_service | `apps/ec-service/src/event-bus.ts` | failure of `appendJsonl(this.eventBusPath, ...)` | `tool_failure` |
| Q backend → EC fetch failures | q_backend | `apps/q-backend/src/server.ts` | search `fetchEc(` and `.catch` / non-2xx | `offline_mode` / `tool_timeout` / `tool_failure` |
| Q backend auth/permission denials | q_backend | `apps/q-backend/src/token-manager.ts` | invalid session/token / denied remote write | `permission_error` |
| Q frontend API errors | q_frontend | `apps/q-frontend/src/api.ts` | fetch wrapper error paths | `tool_failure` |
| Context pressure crossing threshold | ec_service | (create) `apps/ec-service/src/learning/context-pressure.ts` | called from Context Assembler (DOC1/DOC7 integration point) | `context_pressure` |
| Compaction invoked / failed | ec_service | (existing compaction module per DOC1) | compaction start/end + failures | `compaction_event` |
| Nightly job overflow/parse errors | nightly | (create) `apps/ec-service/src/learning/nightly-rollup.ts` | rollup runtime > warn threshold; parse error; schema validation fail | `rollup_error` |
| Budget exhausted events | panels | (DOC6 panel runner module) | when feedback budget exhausted; append event | `budget_exhausted` |

> If you cannot find the expected insertion point, you must record that in `PATCH_REPORT.md` and add a minimal grep output proving what exists.

### 5.4 `emitFrictionEvent()` helper (single path)
Create `apps/ec-service/src/learning/friction-engine.ts` with:
- `computeFingerprints(input): { fingerprint_structural, fingerprint_variant, message_norm_prefix_60 }`
- `emitFrictionEvent(partialEvent): Promise<FrictionEvent>`
  - fills ids/timestamps
  - computes fingerprints
  - validates against `FrictionEventSchema`
  - appends to `friction_events.jsonl`
  - updates in-memory burst cache (see below)
  - updates `friction_state.json` incrementally **or** marks “dirty” for nightly rebuild (implementation choice; must be documented)

### 5.5 Burst suppression (low bloat, keeps signal)
`DOC8-TUNE` constants:
- `BURST_WINDOW_MS = 10_000`
- `BURST_FLUSH_MS = 60_000`

Algorithm:
- First event: append to `friction_events.jsonl`.
- Repeats within `BURST_WINDOW_MS` for same `fingerprint_variant`: do **not** append new events; increment in-memory counter.
- Periodically (or on window close), write a single `burst_suppressed` action to `friction_actions.jsonl` recording `suppressed_count`.

This preserves frequency signal with far less disk churn.
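The burst algorithm can be sketched as a small in-memory cache keyed by `fingerprint_variant`. This assumes the window is measured from the first appended event and that a new event arriving after the window closes first closes the old window, so its count is not lost. `BurstCache` and its methods are hypothetical; the caller is expected to append the returned payloads as `burst_suppressed` actions and to call `flush` on a `BURST_FLUSH_MS` timer (and on shutdown).

```typescript
const BURST_WINDOW_MS = 10_000; // DOC8-TUNE

interface BurstEntry {
  fingerprint_structural: string;
  windowStart: number;
  lastSeen: number;
  suppressed: number;
}

// Payload fields for a burst_suppressed action (ids/actor are added by the caller).
interface BurstSuppressedRecord {
  fingerprint_structural: string;
  fingerprint_variant: string;
  window_start_at: string;
  window_end_at: string;
  suppressed_count: number;
}

class BurstCache {
  private open = new Map<string, BurstEntry>();
  private closed: BurstSuppressedRecord[] = [];

  // Returns true if the event should be appended to friction_events.jsonl.
  shouldAppend(variant: string, structural: string, now: number): boolean {
    const e = this.open.get(variant);
    if (e && now - e.windowStart < BURST_WINDOW_MS) {
      e.suppressed += 1;
      e.lastSeen = now;
      return false;
    }
    if (e) this.close(variant, e); // window expired: keep its count before starting a new one
    this.open.set(variant, { fingerprint_structural: structural, windowStart: now, lastSeen: now, suppressed: 0 });
    return true;
  }

  // Closes expired windows and returns burst_suppressed payloads to append.
  flush(now: number): BurstSuppressedRecord[] {
    for (const [variant, e] of this.open) {
      if (now - e.windowStart >= BURST_WINDOW_MS) this.close(variant, e);
    }
    const out = this.closed;
    this.closed = [];
    return out;
  }

  private close(variant: string, e: BurstEntry): void {
    this.open.delete(variant);
    if (e.suppressed > 0) {
      this.closed.push({
        fingerprint_structural: e.fingerprint_structural,
        fingerprint_variant: variant,
        window_start_at: new Date(e.windowStart).toISOString(),
        window_end_at: new Date(e.lastSeen).toISOString(),
        suppressed_count: e.suppressed,
      });
    }
  }
}
```

Windows with no repeats produce no action at all, so a one-off error costs exactly one JSONL line.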

### 5.6 Manual merge (escape hatch)
Q can merge two fingerprints via `merge_fingerprint` action.
- Nightly job must load merge map first and rewrite `fingerprint_structural` to its canonical target before aggregation.
- Cycle detection: if merges create a cycle, ignore the newest merge and emit `rollup_error` friction event.
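One way to build the merge map, sketched under two assumptions DOC8 leaves open: merges are applied oldest-first by `created_at`, and a later merge of the same source overwrites its earlier target. A merge whose target already leads back to its source (including a self-merge) is the "newest merge" that closes a cycle, so it is skipped and returned for the caller to report as a `rollup_error`. `buildMergeMap` is a hypothetical name.

```typescript
interface MergeAction {
  created_at: string; // ISO
  merge_from: string; // fingerprint_structural
  merge_into: string; // fingerprint_structural
}

function buildMergeMap(merges: MergeAction[]): { canonical: (fp: string) => string; rejected: MergeAction[] } {
  const parent = new Map<string, string>();
  const rejected: MergeAction[] = [];

  // True if following merge links from `start` reaches `target`. The map is kept acyclic, so this terminates.
  const reaches = (start: string, target: string): boolean => {
    for (let cur: string | undefined = start; cur !== undefined; cur = parent.get(cur)) {
      if (cur === target) return true;
    }
    return false;
  };

  const ordered = [...merges].sort((a, b) => a.created_at.localeCompare(b.created_at));
  for (const m of ordered) {
    if (reaches(m.merge_into, m.merge_from)) {
      rejected.push(m); // would create a cycle: ignore this (newer) merge
      continue;
    }
    parent.set(m.merge_from, m.merge_into);
  }

  // Follow links to the final target; fingerprints that were never merged map to themselves.
  const canonical = (fp: string): string => {
    let cur = fp;
    for (let next = parent.get(cur); next !== undefined; next = parent.get(cur)) cur = next;
    return cur;
  };
  return { canonical, rejected };
}
```

Aggregation then rewrites every event's `fingerprint_structural` through `canonical()` before counting, which is what the 8.1 merge test exercises.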

### 5.7 Nightly rollup + derived `friction_state.json`
Create `apps/ec-service/src/learning/nightly-rollup.ts`.

**Hard bounds** (`DOC8-TUNE`):
- max runtime: **300s** (existing standard)
- warn at: **200s** (emit `rollup_error` friction event)
- max new events per run: **50k** (paranoid cap; if exceeded, stop and record overflow)
- max entries in state: **10k** (paranoid cap)

**No full rescans.** Use cursor byte offsets:
- Store offsets in `friction_state.json.cursor`.
- Nightly reads only new bytes since offsets for `friction_events.jsonl` and `friction_actions.jsonl`.
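A minimal sketch of the cursor read using Node's synchronous `fs` API. The key detail is that the new cursor stops just past the last newline, so a line that is still being appended is re-read next run rather than parsed early. The `maxBytes` cap and the "file shrank" error (for example after archival) are assumptions, and both function names are hypothetical.

```typescript
import { closeSync, fstatSync, openSync, readSync } from "node:fs";

// Splits bytes read at `offset` into complete JSONL lines and returns the next cursor.
function takeCompleteLines(chunk: Buffer, offset: number): { lines: string[]; nextOffset: number } {
  const lastNl = chunk.lastIndexOf(0x0a);
  if (lastNl < 0) return { lines: [], nextOffset: offset }; // no complete line yet
  const lines = chunk
    .subarray(0, lastNl)
    .toString("utf8")
    .split("\n")
    .filter((l) => l.trim() !== "");
  return { lines, nextOffset: offset + lastNl + 1 };
}

// Reads only bytes appended since `offset` (e.g. cursor.events_byte_offset).
function readNewJsonlLines(path: string, offset: number, maxBytes = 32 * 1024 * 1024) {
  const fd = openSync(path, "r");
  try {
    const size = fstatSync(fd).size;
    if (size < offset) throw new Error(`${path} shrank below cursor ${offset}; archival must reset the cursor`);
    const length = Math.min(size - offset, maxBytes);
    if (length === 0) return { lines: [] as string[], nextOffset: offset };
    const buf = Buffer.alloc(length);
    const read = readSync(fd, buf, 0, length, offset);
    return takeCompleteLines(buf.subarray(0, read), offset);
  } finally {
    closeSync(fd);
  }
}
```

Splitting on the newline byte (rather than on decoded text) also guarantees a multi-byte UTF-8 character is never cut in half. The caller `JSON.parse`s each line and emits `rollup_error` on failure; a second run with no new bytes returns an empty list, which is the 8.1 cursor test.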

Outputs:
1) Update `friction_state.json` (atomic)
2) Update `learning_rollup.json` (atomic)
3) Write history rollup to `rollups/YYYY-MM-DD_learning_rollup.json` and prune older than 7.
4) Append `system_health.jsonl` row (1/day).
5) Optionally create/append regressions and prevention candidates (see below).

Nightly self-monitoring:
- Any parse error, schema validation error, overflow, or write failure emits a friction event (`channel=nightly`, `friction_type=rollup_error`).

### 5.8 Computed severity (v1 formula; deterministic)
For each `fingerprint_structural`, compute:

- `base = mode(emitted_severity over window)` (fallback to `minor`)
- `freq = count_window` in last 14 days
- `escalated = 1 if has escalation action else 0`

Rules:
- If `base == blocker` → computed `blocker`
- Else if `freq >= 10` → computed `major`
- Else if `freq >= 3` and `escalated == 1` → computed `major`
- Else → computed `minor`

(Adjustable later via DOC8-TUNE, but do not add UI knobs in v1.)
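The rules translate directly into a pure function. The tie-break in `modeSeverity` (prefer the more severe value when two severities are equally frequent) is an assumption, since DOC8 does not define one. As written, a `base` of `major` alone does not promote; only frequency or escalation does.

```typescript
type Severity = "blocker" | "major" | "minor";

const RANK: Record<Severity, number> = { blocker: 2, major: 1, minor: 0 };

// Most frequent emitted severity in the window; ties go to the more severe value (assumption).
function modeSeverity(emitted: Severity[]): Severity {
  if (emitted.length === 0) return "minor";
  const counts = new Map<Severity, number>();
  for (const s of emitted) counts.set(s, (counts.get(s) ?? 0) + 1);
  let best: Severity = "minor";
  let bestCount = -1;
  for (const [s, c] of counts) {
    if (c > bestCount || (c === bestCount && RANK[s] > RANK[best])) {
      best = s;
      bestCount = c;
    }
  }
  return best;
}

function computeSeverity(emittedInWindow: Severity[], countWindow: number, escalated: boolean): Severity {
  const base = modeSeverity(emittedInWindow);
  if (base === "blocker") return "blocker";
  if (countWindow >= 10) return "major";
  if (countWindow >= 3 && escalated) return "major";
  return "minor";
}
```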

### 5.9 Friction expiry / stale marking
If a fingerprint:
- has `count_window == 0` for **30 days** AND
- has never been escalated AND
- status is still `open`

then nightly adds an `auto_mark_stale` action, and the derived status becomes `stale`.
- `stale` is hidden in default Q filters but remains searchable.
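A sketch of the stale check. It reads the 30-day condition as "no events for 30 days since `last_seen_at`", matching the "30-day inactivity" wording of the 8.1 test; that interpretation, and deriving `escalated` from any `escalate_forum` action, are assumptions. `shouldMarkStale` is a hypothetical name.

```typescript
const STALE_AFTER_DAYS = 30; // DOC8-TUNE
const DAY_MS = 86_400_000;

interface StaleCandidate {
  status: string;
  last_seen_at: string; // ISO
  escalated: boolean; // true if any escalate_forum action exists for the fingerprint
}

function shouldMarkStale(entry: StaleCandidate, now: Date): boolean {
  const idleMs = now.getTime() - Date.parse(entry.last_seen_at);
  return entry.status === "open" && !entry.escalated && idleMs >= STALE_AFTER_DAYS * DAY_MS;
}
```

When this returns true, nightly appends an `auto_mark_stale` action with `actor: "system"`; the derived status then becomes `stale` on the next state rebuild.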

### 5.10 Closed-loop healing: regression + prevention rule candidates
**Trigger (`DOC8-TUNE`):**
- If `computed_severity in {blocker, major}` AND
- `count_window >= 3` in 14 days AND
- status in `{open, mitigated}`

Then nightly:
1) Appends a `regressions.jsonl` entry referencing the fingerprint.
2) Appends a `prevention_rule_update` action with `rule_state="candidate"` and structured fields:
   - `rule_summary` (short)
   - `mitigation_steps[]` (concrete)
   - `code_hint` (derived convention; optional)

**Code hint convention (no static mapping table):**
- `q_backend` → prefix `apps/q-backend/src/`
- `q_frontend` → prefix `apps/q-frontend/src/`
- `ec_service` → prefix `apps/ec-service/src/`
- `openclaw` → prefix `~/.openclaw/workspace/` (do **not** assume; include only if you can resolve deterministically)

Stage-to-path transformation is best-effort. If uncertain, omit code_hint.
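The trigger and the code-hint convention, sketched as two small deterministic functions. Only the three resolvable channel prefixes are returned; `openclaw` and the remaining channels yield no hint, and no stage-to-file mapping is attempted. Function names are hypothetical.

```typescript
type Channel = "ec_service" | "q_backend" | "q_frontend" | "openclaw" | "nightly" | "panels" | "forums";

interface TriggerInput {
  computed_severity: "blocker" | "major" | "minor";
  count_window: number; // events in the last 14 days
  status: string;
}

function shouldOpenRegression(e: TriggerInput): boolean {
  return (
    (e.computed_severity === "blocker" || e.computed_severity === "major") &&
    e.count_window >= 3 &&
    (e.status === "open" || e.status === "mitigated")
  );
}

function codeHintPrefix(channel: Channel): string | undefined {
  switch (channel) {
    case "q_backend":
      return "apps/q-backend/src/";
    case "q_frontend":
      return "apps/q-frontend/src/";
    case "ec_service":
      return "apps/ec-service/src/";
    default:
      return undefined; // openclaw workspace path is not assumed; other channels have no convention
  }
}
```

As specified, the trigger would keep firing on every nightly run while its conditions hold; an implementation will likely need to skip fingerprints that already have an open regression entry, a choice this sketch leaves to the caller.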

### 5.11 Canary mode (rules that prove themselves)
When user approves a prevention candidate (via Inbox/approval flow from DOC1/DOC6):
- append `prevention_rule_update` with `rule_state="canary"` and `canary_until = now + 7 days`

Nightly checks:
- If `canary_until` passed and fingerprint had **zero** events in the canary window:
  - update to `confirmed` (action append)
  - emit `prevented_friction` learning signal once (see fix epoch below)
- If fingerprint recurs during canary:
  - update to `ineffective`
  - (optional) auto-suggest repair panel
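A sketch of the canary lifecycle. It assumes the canary window starts at the `created_at` of the `prevention_rule_update` that set `rule_state="canary"`, and that event timestamps are compared after merge resolution. Recurrence flips the rule to `ineffective` immediately; otherwise it stays `pending` until `canary_until`. Function names are hypothetical.

```typescript
const CANARY_DAYS = 7; // DOC8-TUNE
const DAY_MS = 86_400_000;

type CanaryOutcome = "pending" | "confirmed" | "ineffective";

// canary_until value written when the user approves a candidate.
function canaryUntilFrom(approvedAt: Date): string {
  return new Date(approvedAt.getTime() + CANARY_DAYS * DAY_MS).toISOString();
}

// eventTimes: created_at of friction events for this (canonical) fingerprint.
function evaluateCanary(canaryStartedAt: string, canaryUntil: string, eventTimes: string[], now: Date): CanaryOutcome {
  const start = Date.parse(canaryStartedAt);
  const until = Date.parse(canaryUntil);
  const recurred = eventTimes.some((t) => {
    const ms = Date.parse(t);
    return ms >= start && ms < until;
  });
  if (recurred) return "ineffective"; // fingerprint recurred during the canary window
  return now.getTime() >= until ? "confirmed" : "pending";
}
```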

### 5.12 Fix epochs + prevented_friction idempotency
When a fingerprint is manually marked `mitigated` or `fixed`:
- EC assigns a new `fix_epoch_id` (uuid) and stores it in derived state.
- Nightly may emit `prevented_friction` **once** per `(fingerprint_structural, fix_epoch_id)` pair.
- The emitted epoch ids are stored in `prevented_friction_emitted_epochs[]` in `friction_state`.
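The idempotency guard can be kept as a pure function that returns both the signal (if any) and the updated entry, leaving persistence order to the caller. Deciding *when* prevention is proven (canary confirmed, quiet period) stays with the caller; this only ensures one emission per `(fingerprint_structural, fix_epoch_id)`. `maybePreventedFriction` is a hypothetical name, and the cap of 12 mirrors the `prevented_friction_emitted_epochs` limit in 4.5.

```typescript
const MAX_EMITTED_EPOCHS = 12;

interface EpochState {
  fingerprint_structural: string;
  fix_epoch_id_current?: string;
  prevented_friction_emitted_epochs?: string[];
}

interface PreventedFrictionSignal {
  signal_id: string;
  created_at: string;
  event_type: "prevented_friction";
  fingerprint_structural: string;
  fix_epoch_id: string;
  rule_id?: string;
}

function maybePreventedFriction(
  entry: EpochState,
  now: Date,
  newId: () => string,
  rule_id?: string,
): { signal: PreventedFrictionSignal | null; next: EpochState } {
  const epoch = entry.fix_epoch_id_current;
  const emitted = entry.prevented_friction_emitted_epochs ?? [];
  if (!epoch || emitted.includes(epoch)) return { signal: null, next: entry }; // no epoch, or already emitted
  const signal: PreventedFrictionSignal = {
    signal_id: newId(),
    created_at: now.toISOString(),
    event_type: "prevented_friction",
    fingerprint_structural: entry.fingerprint_structural,
    fix_epoch_id: epoch,
    ...(rule_id ? { rule_id } : {}),
  };
  // Record the epoch, keeping only the most recent entries.
  const next = { ...entry, prevented_friction_emitted_epochs: [...emitted, epoch].slice(-MAX_EMITTED_EPOCHS) };
  return { signal, next };
}
```

If EC crashes after appending the signal but before rewriting `friction_state.json`, the next run could emit again; a recovery check against `learning_signals.jsonl` would close that gap.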

### 5.13 Forum escalation (threading rules)
User action (Q) creates an `escalate_forum` action; EC performs:
1) If the fingerprint's entry in `friction_state.json` already has `last_escalation.thread_id` → post a reply there.
2) Otherwise, create a new thread titled `Friction: {channel} {stage} ({computed_severity})` and post the initial message.
3) If a new thread cannot be created, fall back to `default_forum_thread_id` if configured elsewhere (DOC6 forum system); if that is also unavailable, create a pending item containing the post text for copy/paste.

Forum post is deterministic template (no LLM):
- fingerprint_structural
- top variants + counts
- last_seen_at + count_window
- linked repairs / rules
- request: “Please propose fixes, mitigations, or spec/code changes.”
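A deterministic rendering of the title and post body, assuming a plain-text line layout (the exact layout is not specified by DOC8). The same input always yields the same text, so no LLM is involved. `ForumPostInput`, `forumThreadTitle`, and `renderForumPost` are hypothetical names.

```typescript
interface ForumPostInput {
  fingerprint_structural: string;
  channel: string;
  stage: string;
  computed_severity: string;
  top_variants: Array<{ fingerprint_variant: string; count: number; message_prefix?: string }>;
  last_seen_at: string;
  count_window: number;
  window_days: number;
  linked_repairs?: Array<{ work_type: string; work_id: string }>;
  rule?: { rule_id: string; rule_state: string };
}

function forumThreadTitle(p: ForumPostInput): string {
  return `Friction: ${p.channel} ${p.stage} (${p.computed_severity})`;
}

function renderForumPost(p: ForumPostInput): string {
  const variants = p.top_variants.map(
    (v) => `- ${v.fingerprint_variant} x${v.count}${v.message_prefix ? ` "${v.message_prefix}"` : ""}`,
  );
  const repairs = p.linked_repairs?.length ? p.linked_repairs.map((r) => `- ${r.work_type}:${r.work_id}`) : ["- none"];
  return [
    `fingerprint_structural: ${p.fingerprint_structural}`,
    `last_seen_at: ${p.last_seen_at} | count_window (${p.window_days}d): ${p.count_window}`,
    "Top variants:",
    ...variants,
    "Linked repairs:",
    ...repairs,
    `Prevention rule: ${p.rule ? `${p.rule.rule_id} (${p.rule.rule_state})` : "none"}`,
    "",
    "Please propose fixes, mitigations, or spec/code changes.",
  ].join("\n");
}
```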

### 5.14 Repair work linkage + auto-mitigation
When user creates a repair task/panel from a fingerprint:
- append `link_repair_work` action with `work_id`
- nightly watches the linked work outcome:
  - if work is marked complete AND no recurrence for 7 days → auto-annotate `mitigated` (not `fixed`)
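A sketch of the auto-mitigation check. It assumes "no recurrence for 7 days" means no event since the linked work completed and at least 7 days elapsed since completion, and that only `open` fingerprints are eligible. How `workCompletedAt` is obtained from the task/panel system is outside DOC8 and treated as a hypothetical input here.

```typescript
const NO_RECURRENCE_DAYS = 7; // DOC8-TUNE
const DAY_MS = 86_400_000;

function shouldAutoMitigate(
  status: string,
  workCompletedAt: string | undefined, // hypothetical: completion time from the linked task/panel
  lastSeenAt: string,
  now: Date,
): boolean {
  if (!workCompletedAt || status !== "open") return false;
  const completed = Date.parse(workCompletedAt);
  const noEventSinceCompletion = Date.parse(lastSeenAt) <= completed;
  return noEventSinceCompletion && now.getTime() - completed >= NO_RECURRENCE_DAYS * DAY_MS;
}
```

A true result leads nightly to append `annotate_status` with `status: "mitigated"` and `actor: "system"`, which in turn starts a new fix epoch.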

### 5.15 Friction-aware context injection (visible, targeted)
If `learning_controls.context_cautions_enabled == true`, then during context assembly for an operation with `(channel, stage)`:
- select fingerprints where:
  - `entry.channel == channel`
  - status in `{open, mitigated}`
  - computed_severity in `{blocker, major}`
  - last_seen_at within **7 days**
- sort by (computed_severity desc, last_seen_at desc)
- inject up to **3** one-line notes:

Example injection block (visible, system prompt):
```
[Friction Cautions — last 7 days]
- fetchEc:/api/commands has timed out intermittently (major; last seen 2026-02-26). Prefer retry-once + fallback.
- context_assembly hit pressure threshold recently (major; last seen 2026-02-25). Keep outputs concise.
```

Token cap: **150 tokens total**. If context pressure is already high, cap to 1 line.

This is **not** a standing order and must not override policy or memory; it is a caution overlay.
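A sketch of selection and rendering. Selection filters on channel only, as the rules above state; `stage` appears in the note text. The ~4 characters per token estimate is a crude assumption standing in for the real tokenizer, and using `summary` (e.g. the rule summary or latest note) as the trailing advice text is also an assumption. Names are hypothetical.

```typescript
type Severity = "blocker" | "major" | "minor";

interface CautionEntry {
  channel: string;
  stage: string;
  status: string;
  computed_severity: Severity;
  last_seen_at: string; // ISO
  summary?: string; // assumption: rule_summary or latest_note
}

const MAX_CAUTIONS = 3;
const CAUTION_TOKEN_CAP = 150;
const RECENCY_MS = 7 * 86_400_000;
const HEADER = "[Friction Cautions — last 7 days]";
const estTokens = (s: string): number => Math.ceil(s.length / 4); // rough estimate, not a tokenizer
const rank = (s: Severity): number => (s === "blocker" ? 2 : s === "major" ? 1 : 0);

function selectCautions(entries: CautionEntry[], channel: string, now: Date): CautionEntry[] {
  return entries
    .filter(
      (e) =>
        e.channel === channel &&
        (e.status === "open" || e.status === "mitigated") &&
        rank(e.computed_severity) >= 1 &&
        now.getTime() - Date.parse(e.last_seen_at) <= RECENCY_MS,
    )
    .sort((a, b) => rank(b.computed_severity) - rank(a.computed_severity) || Date.parse(b.last_seen_at) - Date.parse(a.last_seen_at))
    .slice(0, MAX_CAUTIONS);
}

// Returns "" when nothing is selected so no empty block is injected.
function renderCautionBlock(selected: CautionEntry[], highPressure: boolean): string {
  const lines: string[] = [];
  let used = estTokens(HEADER);
  for (const e of selected.slice(0, highPressure ? 1 : MAX_CAUTIONS)) {
    const advice = e.summary ? ` ${e.summary}` : "";
    const line = `- ${e.stage} (${e.computed_severity}; last seen ${e.last_seen_at.slice(0, 10)}).${advice}`;
    if (used + estTokens(line) > CAUTION_TOKEN_CAP) break; // enforce the 150-token budget
    lines.push(line);
    used += estTokens(line);
  }
  return lines.length > 0 ? [HEADER, ...lines].join("\n") : "";
}
```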

---

## 6) System health metrics (cheap, deterministic)
### 6.1 `system_health.jsonl` row (1/day)
Each nightly run appends a row with:
- date
- rollup duration
- number of new events processed
- number of open major/blocker fingerprints
- q_backend EC error count (if available)
- any anomalies flagged

### 6.2 Simple anomaly detection
Maintain a rolling 7-day baseline (mean + stddev) for numeric metrics.
Flag anomaly if current value deviates > **2σ** and absolute change exceeds a small floor.
Anomalies are included in `friction_state.anomalies[]` and `learning_rollup.json`.

No LLM. Pure math.
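A sketch of the anomaly rule for one metric. Several details are assumptions DOC8 leaves open: population (not sample) standard deviation, a full 7-day history before flagging, an absolute floor of 3, and flooring σ at 1 so a perfectly flat baseline does not divide by zero. `detectAnomaly` and `ABS_FLOOR` are hypothetical names.

```typescript
interface SystemHealthAnomaly {
  metric: string;
  value: number;
  mean7: number;
  std7: number;
  zscore: number;
  message: string;
}

const Z_THRESHOLD = 2; // "> 2σ"
const ABS_FLOOR = 3; // hypothetical "small floor"

function detectAnomaly(metric: string, history7: number[], value: number): SystemHealthAnomaly | null {
  if (history7.length < 7) return null; // assumption: wait for a full baseline
  const mean7 = history7.reduce((s, x) => s + x, 0) / history7.length;
  const std7 = Math.sqrt(history7.reduce((s, x) => s + (x - mean7) ** 2, 0) / history7.length);
  const delta = value - mean7;
  const zscore = delta / Math.max(std7, 1); // σ floored at 1 (assumption)
  if (Math.abs(zscore) <= Z_THRESHOLD || Math.abs(delta) <= ABS_FLOOR) return null;
  const round = (x: number): number => Math.round(x * 100) / 100;
  return {
    metric,
    value,
    mean7: round(mean7),
    std7: round(std7),
    zscore: round(zscore),
    message: `${metric}=${value} vs 7-day mean ${round(mean7)} (z=${round(zscore)})`,
  };
}
```

Both conditions must hold, so a noisy metric with a tiny baseline does not raise alarms over a change of one or two units.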

---

## 7) Q Dashboard UX (implementation-friendly, but informative)
### 7.1 V1 surfaces (ship these)
1) **Learning → Friction sub-tab** (inside existing Learning page)
   - Table driven from `friction_state.json.entries`
   - Default filter: status != stale AND computed_severity != minor (user can toggle)
   - Cluster grouping (optional UI grouping by cluster key)
2) **Row detail drawer** (lightweight)
   - Shows: top variants/messages, actions history (last 10), linked repairs, rule state, escalation link
   - Buttons:
     - Mark mitigated / fixed / ignored
     - Add note
     - Merge into…
     - Discuss in Forum
     - Create Repair Panel (DOC6 panel preset)
     - Create Repair Task (existing task system)
3) **Global “Mark friction” action**
   - Always available (top bar / quick actions)
   - Creates either:
     - a new friction event (manual) OR
     - a note/status action on an existing fingerprint if user selects one

### 7.2 Settings (minimal)
In Q Settings → Learning:
- Toggle: `Auto-escalate suggestions` (writes `learning_controls.auto_escalate_enabled`)
- Toggle: `Show friction cautions in prompt` (writes `learning_controls.context_cautions_enabled`)

No other settings in v1.

### 7.3 Optional (defer unless easy)
- Home card
- Trend charts
- Full-text search across raw message bodies

---

## 8) Tests & acceptance criteria
### 8.1 Required unit tests (EC)
- Fingerprint test: structural stable across message drift; variant differs when prefix differs.
- Merge test: merge action collapses counts before aggregation.
- Cursor test: second nightly run processes zero bytes when no new events.
- `prevented_friction` idempotency: emitted once per fix_epoch.
- Stale marking: 30-day inactivity marks stale.
- Context injection selection: caps at 3 and respects channel/stage + recency.

### 8.2 Required integration test (the “self-healing loop is real” test)
Simulate:
1) Emit same friction event 3 times (major) within window.
2) Run nightly → regression entry + prevention candidate created.
3) Append approval action setting rule to canary.
4) Run nightly without new events until canary expires.
5) Verify: rule becomes confirmed + prevented_friction signal emitted once.

This test must fail if implementation is “hollow”.

---

## 9) Implementation map (do not drift)
### 9.1 Contracts
- `packages/contracts/src/canonical.ts` (add paths)
- `packages/contracts/src/schemas.ts` (add schemas + export)
- `packages/contracts/src/index.ts` (export types)

### 9.2 EC service
Create:
- `apps/ec-service/src/learning/friction-engine.ts`
- `apps/ec-service/src/learning/nightly-rollup.ts`
- `apps/ec-service/src/learning/system-health.ts` (helpers; optional)
- `apps/ec-service/src/learning/learning-controls.ts`

Modify:
- `apps/ec-service/src/server.ts`
  - register read endpoints:
    - `GET /api/learning/friction/state` → returns friction_state.json
    - `GET /api/learning/rollup` → returns learning_rollup.json
  - handle new command types for friction actions and manual friction append
- `apps/ec-service/src/index.ts`
  - start nightly scheduler (if not already) to run rollup once per day (reuse existing scheduler per DOC1/DOC6)

### 9.3 Q backend
Modify:
- `apps/q-backend/src/server.ts`
  - proxy GET endpoints to EC for rollup/state
  - add helper route for `POST /api/learning/friction/action` that enqueues `learning_friction_action_append`
  - ensure remote write gating is enforced for status-changing actions

### 9.4 Q frontend
Modify/add:
- `apps/q-frontend/src/pages/LearningPage.tsx` (add Friction sub-tab)
- `apps/q-frontend/src/components/*`:
  - `FrictionTable.tsx`
  - `FrictionDetailDrawer.tsx`
  - `MarkFrictionModal.tsx`

Keep changes additive; do not refactor `QDashV11.jsx` except minimal wiring to open the modal (per prior guardrails).

---

## 10) Claude Code / Codex guardrails (implementation quality)
Implementer must produce:
- `PATCH_REPORT.md` with:
  - exact files changed/created
  - grep evidence for hook points used
  - summary of any deviations (and why)
- Run and report:
  - `npm test`
  - any relevant manual smoke steps (start servers, open Learning tab, mark friction)

---

## 11) Future directions (explicitly not in v1)
- Tier 3 autonomy (“act-and-report”) — **do not implement plumbing yet**.
- Prediction calibration dashboards — revisit after 30 days of real data.