Elnor Repo Reader

DOC72_PROPOSAL_SURFACE_INTAKE_CONTRACTS_V2.md

Current Specs/DOC72/DOC72_PROPOSAL_SURFACE_INTAKE_CONTRACTS_V2.md

Short text page f41b474abe9c. Generated 2026-06-09T01:23:58.539Z from commit dbaa25962edc11ab30e8d4ca1715f9ae5bf77331. Worktree: clean.

ELNOR REPO READER TEXT MIRROR
Original path: Current Specs/DOC72/DOC72_PROPOSAL_SURFACE_INTAKE_CONTRACTS_V2.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z

---

# DOC72 Proposal — Surface Intake Contracts for To-Do, Calendar, Notes, and Browser

**Source:** DOC20 R4.1 §§6.21.7, 6.22.8, 6.22.9, 6.19.22
**Target:** DOC72 §20A (Surface-Specific Intake Contracts)
**Status:** Proposal — needs integration into DOC72 R5.6+
**Date:** 2026-04-07

---

## 1. Purpose

DOC72 §20A defines surface-specific intake contracts for each Q surface. This proposal specifies the full contracts for four surfaces whose intake was under-specified or missing: To-Do, Calendar, Notes, and Browser. Each contract defines: what signals the surface emits, when extraction triggers, what DOC72 nodes and edges result, and what entity resolution is needed.

**Architectural principle (from DOC20 §2.4):** Extraction is triggered by EC commands, not by surfaces. All surfaces — palette, main workspace tabs, note canvas modules, future mobile — converge at EC. EC processes the command, writes to Layer 1 (application tables), then triggers the intake pipeline for Layer 2 (DOC72 entity graph). The intake contract is surface-agnostic.

---

## 2. `intake.todo` — To-Do System Intake Contract

### 2.1 Trigger

On every EC command that creates, updates, or deletes to-do data:
- `TodoListCreateCommand` → create `work_product` node
- `TodoListUpdateCommand` → update `work_product` node (including rename)
- `TodoListDeleteCommand` → tombstone `work_product` node
- `TodoItemCreateCommand` → create `obligation` node
- `TodoItemUpdateCommand` → update `obligation` node (including done/undone, text edit, due date change)
- `TodoItemDeleteCommand` → tombstone `obligation` node

**No significance gate.** All to-do mutations are significant by definition — the user explicitly created them, satisfying the "user action demonstrates intent" invariant (DOC72 §20.10).
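
The trigger list above amounts to a fixed lookup from command to intake action. A minimal sketch, assuming hypothetical names (`TodoCommandName`, `TodoIntakeAction`, `planTodoIntake`) that do not appear in DOC20 or DOC72; only the command names, node kinds, and create/update/tombstone actions come from the list:

```ts
// Hypothetical type and function names; only the command names, node kinds,
// and create/update/tombstone actions come from the trigger list above.
type TodoCommandName =
  | "TodoListCreateCommand"
  | "TodoListUpdateCommand"
  | "TodoListDeleteCommand"
  | "TodoItemCreateCommand"
  | "TodoItemUpdateCommand"
  | "TodoItemDeleteCommand";

interface TodoIntakeAction {
  op: "create" | "update" | "tombstone";
  nodeKind: "work_product" | "obligation";
}

const TODO_INTAKE_ACTIONS: Record<TodoCommandName, TodoIntakeAction> = {
  TodoListCreateCommand: { op: "create", nodeKind: "work_product" },
  TodoListUpdateCommand: { op: "update", nodeKind: "work_product" },
  TodoListDeleteCommand: { op: "tombstone", nodeKind: "work_product" },
  TodoItemCreateCommand: { op: "create", nodeKind: "obligation" },
  TodoItemUpdateCommand: { op: "update", nodeKind: "obligation" },
  TodoItemDeleteCommand: { op: "tombstone", nodeKind: "obligation" },
};

// No significance gate: every to-do command yields exactly one intake action.
function planTodoIntake(command: TodoCommandName): TodoIntakeAction {
  return TODO_INTAKE_ACTIONS[command];
}
```

Because the table is typed as a `Record` over the full command union, adding a new command without an intake action would fail type checking.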

### 2.2 Extraction Mode

**Hybrid — mostly deterministic, with entity resolution:**

| Source field | Extraction | Cost |
|---|---|---|
| `task.text` | Entity resolution — **primary signal** | Cheap (text matching) |
| `list.name` | Entity resolution — **contextual signal** (strengthens task matches) | Cheap (text matching) |
| Subtask `text` | Entity resolution — inherits parent context | Cheap (text matching) |
| `task.due_date`, `due_time` | Deterministic copy | Zero |
| `task.done`, `done_at` | Deterministic copy | Zero |
| Subtask structure | Deterministic edge creation | Zero |
| `list.project_id` | Deterministic edge to project entity | Zero |
| `task.attachments` | Deterministic `references_document` edges | Zero |
| `list.tags`, `task.tags` | Deterministic metadata | Zero |

**Entity resolution hierarchy:** Task `text` is the primary entity resolution input — each task item ("Prepare expert report in Paramount") is resolved independently for matter/case, people, document types, and actions. List `name` provides contextual framing — "February 8" adds temporal context, "Henderson MTD Prep" adds matter context that boosts confidence on ambiguous task text. Subtask text inherits the parent task's resolved context for higher confidence matching.
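
One way to picture the primary/contextual split is a score built from task text, with the list name only adding a boost. This is a sketch under stated assumptions: token overlap stands in for whatever matcher DOC72 actually uses, `LIST_CONTEXT_BOOST` and the `0.5` default threshold are invented values, and in this sketch the list name never creates a task-level match on its own.

```ts
interface KnownEntity {
  id: string;
  canonicalName: string;
}

interface EntityMatch {
  entityId: string;
  confidence: number;
}

const LIST_CONTEXT_BOOST = 0.2; // assumed value, not from the spec

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Fraction of the entity's name tokens that also appear in the text (0..1).
function nameOverlap(text: string, entityName: string): number {
  const textTokens = tokenize(text);
  const nameTokens = tokenize(entityName);
  if (nameTokens.length === 0) return 0;
  const hits = nameTokens.filter((t) => textTokens.indexOf(t) !== -1).length;
  return hits / nameTokens.length;
}

function resolveTaskEntities(
  taskText: string,
  listName: string,
  known: KnownEntity[],
  minConfidence = 0.5, // assumed threshold
): EntityMatch[] {
  const matches: EntityMatch[] = [];
  for (const entity of known) {
    // Task text is the primary signal.
    const primary = nameOverlap(taskText, entity.canonicalName);
    if (primary === 0) continue;
    // List name is contextual: it can only strengthen an existing match.
    const contextual = nameOverlap(listName, entity.canonicalName);
    const confidence = Math.min(1, primary + LIST_CONTEXT_BOOST * contextual);
    if (confidence >= minConfidence) {
      matches.push({ entityId: entity.id, confidence });
    }
  }
  return matches.sort((a, b) => b.confidence - a.confidence);
}
```

With a list named "Henderson Litigation Prep", a task that mentions only "Henderson" scores higher against a "Henderson Litigation" entity than the same task in a list named "February 8".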

**Optional LLM-assisted (idle-time, low priority):**
- Goal inference: "these tasks are collectively preparing for trial" → edge to `goal` node
- Action classification: "prepare" → document creation, "review" → read action
- Category inference for ambiguous task text

### 2.3 Output Nodes

**`work_product` node for each to-do list:**

```ts
{
  node_kind: "work_product",
  entity_subtype: "todo_list",
  canonical_name: list.name,                 // "Henderson MTD Prep" or "February 8" — contextual signal for task entity resolution
  source_type: "todo_list",
  source_id: list.id,
  principal_id: "will",
  scope: "personal",                         // or "firm_shared" if project-linked
  fields: {
    task_count: list.tasks.length,
    completed_count: list.tasks.filter(t => t.done).length,
    project_id: list.project_id,
    note_id: list.noteId,
    tags: list.tags,
  },
  temporal: {
    created_at: list.created_at,
    updated_at: list.updated_at,
  },
}
```

**`obligation` node for each task:**

```ts
{
  node_kind: "obligation",
  canonical_name: task.text,                 // "Prepare expert report in Paramount" — PRIMARY entity resolution input
  source_type: "todo_item",
  source_id: task.id,
  principal_id: "will",
  scope: "personal",
  fields: {
    obligation_type: "task",
    status: task.done ? "completed" : "active",
    due_date: task.due_date,
    due_time: task.due_time,
    completed_at: task.done_at,
    reminder: task.reminder,
    has_subtasks: task.sub.length > 0,
    attachment_count: task.attachments.length,
  },
  temporal: {
    created_at: task.created_at,
    updated_at: task.updated_at,
    effective_from: task.created_at,
    effective_until: task.done_at || null,     // completed tasks have a bounded timeframe
  },
}
```

### 2.4 Output Edges

| Edge type | From | To | When |
|---|---|---|---|
| `belongs_to_list` | task `obligation` | list `work_product` | Always |
| `subtask_of` | subtask `obligation` | parent task `obligation` | When subtasks exist |
| `references_document` | task `obligation` | document `work_product` | When attachments exist |
| `associated_with_project` | list `work_product` or task `obligation` | project entity | When `project_id` is set |
| `related_to` | task `obligation` or list `work_product` | case/matter `world_entity` | Entity resolution on name/text |
| `related_to` | task `obligation` | person `world_entity` | Entity resolution on text mentioning known people |
| `related_to` | subtask `obligation` | case/matter `world_entity` | Entity resolution on subtask text, with parent task context as confidence boost |
| `deadline_for` | task `obligation` | goal `goal` | LLM-inferred (idle, optional) |

**Context inheritance for subtasks:** When entity resolution runs on a subtask ("Draft motion sections 1-3"), the parent task's resolved entities are available as context. If the parent task resolved to the Henderson case, the subtask inherits that association at boosted confidence even if "Henderson" doesn't appear in the subtask text.
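
A minimal sketch of that inheritance, assuming the parent's resolved links are passed in alongside the subtask's own text matches. `PARENT_CONTEXT_BOOST` and the function name are hypothetical; the source only says the association is inherited at boosted confidence and does not give a formula.

```ts
interface ResolvedLink {
  entityId: string;
  confidence: number;
}

const PARENT_CONTEXT_BOOST = 0.6; // assumed weight, not from the spec

// Adds the parent's resolved entities to the subtask's own matches.
// An entity absent from the subtask text starts from a score of 0,
// so it is linked purely on the strength of the parent context.
function applyParentContext(
  subtaskLinks: ResolvedLink[],
  parentLinks: ResolvedLink[],
): ResolvedLink[] {
  const merged = subtaskLinks.map((link) => ({ ...link }));
  for (const parent of parentLinks) {
    const boost = PARENT_CONTEXT_BOOST * parent.confidence;
    const own = merged.filter((link) => link.entityId === parent.entityId)[0];
    if (own) {
      own.confidence = Math.min(1, own.confidence + boost);
    } else {
      merged.push({ entityId: parent.entityId, confidence: boost });
    }
  }
  return merged;
}
```

For "Draft motion sections 1-3" under a parent resolved to the Henderson case, `subtaskLinks` would be empty and the result would contain a single Henderson link derived from the parent.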

### 2.5 Dedup and Merge

- **Key:** `source_type` + `source_id` uniquely identifies each node.
- **Update behavior:** On `TodoItemUpdateCommand` or `TodoListUpdateCommand`, EC updates the existing node in place; no new node is created. Confidence is not affected by updates (this is operational data, not heuristic).
- **Delete behavior:** On `TodoItemDeleteCommand` or `TodoListDeleteCommand`, the node is tombstoned (retained with `status: "deleted"` for historical queries, not actively surfaced).
- **Entity resolution edges:** Entity resolution is re-run whenever task text or list name changes; the old resolution edges are removed and new ones are created.
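
These rules can be sketched as a small in-memory store keyed by `source_type` + `source_id`. The class and method names are hypothetical and the real store is presumably a database table; the sketch only shows in-place updates, tombstoning, and the signal that entity resolution should be re-run.

```ts
interface StoredNode {
  source_type: string;
  source_id: string;
  canonical_name: string;
  status: "active" | "deleted";
}

interface UpsertResult {
  created: boolean;
  textChanged: boolean; // caller re-runs entity resolution when true
}

class IntakeNodeStore {
  private nodes: Record<string, StoredNode> = {};

  private keyFor(sourceType: string, sourceId: string): string {
    return `${sourceType}:${sourceId}`;
  }

  upsert(sourceType: string, sourceId: string, canonicalName: string): UpsertResult {
    const key = this.keyFor(sourceType, sourceId);
    const existing = this.nodes[key];
    if (!existing) {
      this.nodes[key] = {
        source_type: sourceType,
        source_id: sourceId,
        canonical_name: canonicalName,
        status: "active",
      };
      return { created: true, textChanged: true };
    }
    const textChanged = existing.canonical_name !== canonicalName;
    existing.canonical_name = canonicalName; // update in place, never a second node
    return { created: false, textChanged };
  }

  tombstone(sourceType: string, sourceId: string): boolean {
    const existing = this.nodes[this.keyFor(sourceType, sourceId)];
    if (!existing) return false;
    existing.status = "deleted"; // retained for historical queries
    return true;
  }

  get(sourceType: string, sourceId: string): StoredNode | undefined {
    return this.nodes[this.keyFor(sourceType, sourceId)];
  }

  size(): number {
    return Object.keys(this.nodes).length;
  }
}
```

Tombstoning leaves the node count unchanged, which is what lets historical queries still find deleted tasks.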

---

## 3. `intake.calendar` — Calendar Event Intake Contract

### 3.1 Trigger

On every EC command that creates, updates, or deletes calendar data:
- `CalendarEventCreateCommand` → create `obligation` node
- `CalendarEventUpdateCommand` → update `obligation` node
- `CalendarEventDeleteCommand` → tombstone `obligation` node
- Outlook sync events from DOC16 → create, update, or tombstone as above

**No significance gate.** Same rationale as to-do — calendar events are explicitly created actions.

### 3.2 Extraction Mode

**Hybrid — structured fields deterministic, text fields use entity resolution:**

| Source field | Extraction | Cost |
|---|---|---|
| `title` | Entity resolution — primary input | Cheap |
| `event_type` | Deterministic copy | Zero |
| `date`, `time`, `endTime` | Deterministic copy | Zero |
| `location` | Entity resolution against known locations | Cheap |
| `participants` | Entity resolution against known people | Cheap |
| `notes` | Entity resolution + optional LLM for complex notes | Cheap to moderate |
| `project_id` | Deterministic edge | Zero |
| `attachments` | Deterministic edges | Zero |
| `cal` (source) | Deterministic edge to calendar entity | Zero |
| `source`, `external_id` | Deterministic metadata | Zero |
| `tags` | Deterministic metadata | Zero |

**Optional LLM-assisted (idle-time):**
- Goal linkage: "this hearing is the culmination of trial preparation" → edge to `goal`
- Task group linkage: "the MTD hearing creates a deadline for 5 preparation tasks" → `deadline_for` edges

### 3.3 Output Nodes

**`obligation` node for each calendar event:**

```ts
{
  node_kind: "obligation",
  canonical_name: event.title,               // "Henderson MTD Hearing"
  source_type: "calendar_event",
  source_id: event.id,
  principal_id: "will",
  scope: "personal",                         // or "firm_shared" for shared calendar events
  fields: {
    obligation_type: "calendar_event",
    event_type: event.event_type,            // "hearing", "deposition", etc.
    event_date: event.date,
    event_time: event.time,
    event_end_time: event.endTime,
    location: event.location,
    participants: event.participants,
    calendar_source: event.cal,
    sync_source: event.source,               // "local", "outlook", "manual"
    external_id: event.external_id,
    has_attachments: event.attachments.length > 0,
    notes_summary: null,                     // populated by LLM during idle if notes are long
  },
  temporal: {
    created_at: event.created_at,
    updated_at: event.updated_at,
    effective_from: `${event.date}T${event.time || "00:00"}`,
    effective_until: `${event.date}T${event.endTime || "23:59"}`,
  },
}
```

**`world_entity` node for each calendar source:**

```ts
{
  node_kind: "world_entity",
  canonical_name: calendar.name,             // "Henderson Case Calendar"
  entity_subtype: "calendar",
  source_type: "calendar_source",
  source_id: calendar.id,
  fields: {
    is_shared: calendar.shared,
    outlook_id: calendar.outlookId || null,
    color: calendar.color,
  },
}
```

### 3.4 Output Edges

| Edge type | From | To | When |
|---|---|---|---|
| `synced_to_calendar` | event `obligation` | calendar `world_entity` | Always |
| `related_to` | event `obligation` | case/matter `world_entity` | Entity resolution on title |
| `related_to` | event `obligation` | person `world_entity` | Entity resolution on participants |
| `located_at` | event `obligation` | location `world_entity` | Entity resolution on location |
| `associated_with_project` | event `obligation` | project entity | When `project_id` is set |
| `references_document` | event `obligation` | document `work_product` | When attachments exist |
| `deadline_for` | event `obligation` | task `obligation` or `goal` | LLM-inferred (idle, optional) |

### 3.5 Calendar-Specific Reasoning Affordances

The `event_type` field enables differentiated reasoning:

| Event type | System reasoning |
|---|---|
| `hearing`, `deposition`, `filing_deadline` | Hard deadlines. Cannot be moved. Preparation tasks inherit urgency. Alert if insufficient preparation time. |
| `client_meeting`, `external_meeting` | Semi-hard. May need preparation materials. Participants are high-value entity linkages. |
| `internal_meeting` | Soft. Lower urgency for preparation. |
| `task_deadline` | Soft deadline. Links to specific to-do items or groups. |
| `vacation`, `personal` | Blocks availability. No preparation needed. Affects scheduling recommendations. |
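
The table can be read as a classification keyed on `event_type`. A minimal sketch, assuming hypothetical names (`DeadlineHardness`, `hardnessFor`, `preparationInheritsUrgency`) and assuming unknown event types are left unclassified, which the source does not specify:

```ts
type DeadlineHardness = "hard" | "semi_hard" | "soft" | "blocks_availability";

const HARDNESS_BY_EVENT_TYPE: Record<string, DeadlineHardness> = {
  hearing: "hard",
  deposition: "hard",
  filing_deadline: "hard",
  client_meeting: "semi_hard",
  external_meeting: "semi_hard",
  internal_meeting: "soft",
  task_deadline: "soft",
  vacation: "blocks_availability",
  personal: "blocks_availability",
};

// Returns undefined for event types the table does not cover (assumed behavior).
function hardnessFor(eventType: string): DeadlineHardness | undefined {
  return Object.prototype.hasOwnProperty.call(HARDNESS_BY_EVENT_TYPE, eventType)
    ? HARDNESS_BY_EVENT_TYPE[eventType]
    : undefined;
}

// Per the table, only hard deadlines pass urgency on to preparation tasks.
function preparationInheritsUrgency(eventType: string): boolean {
  return hardnessFor(eventType) === "hard";
}
```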

---

## 4. `intake.notes` — Notes Intake Contract (Gap — Needs Design)

Notes are free-form rich text. Unlike to-do and calendar data, they cannot be deterministically extracted — they require LLM-assisted extraction, similar to the browser's Tier 2 pipeline.

### 4.1 Trigger — Proposed

**On auto-save, after significance gate:**

A note emits an extraction signal when:
1. The note has been saved (auto-save or manual)
2. AND the note crosses a significance threshold:
   - Word count > 100 (excludes trivial scratchpads)
   - OR note is in a project (any project-linked note is worth extracting)
   - OR note has been edited for > 60 seconds total in this session (demonstrates sustained engagement)
   - OR note title matches a known entity (e.g., "Henderson Case Notes")

**Extraction does NOT run on every save.** It runs when a note that has not yet been extracted first crosses the significance threshold, or when a previously extracted note has material changes (more than 50 new words since the last extraction).
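
The proposed gate can be sketched as two predicates evaluated on save. The thresholds (100 words, 60 seconds, 50 new words) are the proposed values above; the `NoteSignals` shape and function names are hypothetical.

```ts
interface NoteSignals {
  wordCount: number;
  projectId: string | null;
  sessionEditSeconds: number;
  titleMatchesKnownEntity: boolean;
  extractedBefore: boolean;
  wordsAddedSinceLastExtraction: number;
}

// Any one of the four conditions makes a saved note significant.
function isSignificant(note: NoteSignals): boolean {
  return (
    note.wordCount > 100 ||
    note.projectId !== null ||
    note.sessionEditSeconds > 60 ||
    note.titleMatchesKnownEntity
  );
}

// Called on each save; returns true only when extraction should be queued.
function shouldExtractOnSave(note: NoteSignals): boolean {
  if (!note.extractedBefore) {
    return isSignificant(note);
  }
  return note.wordsAddedSinceLastExtraction > 50;
}
```

A 40-word scratchpad outside any project never queues extraction, while the same note moved into a project does on its next save.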

### 4.2 Extraction Mode

**LLM-assisted — similar to browser Tier 2 but with note-specific prompting:**

```
Extract structured knowledge from this note.

Note title: {title}
Note folder: {folder_name}
Note project: {project_name or "none"}
User's active matters: {active_matter_names}

Extract:
- Named entities (people, organizations, courts, case names, case numbers)
- Dates and deadlines mentioned
- Key facts, holdings, or decisions
- Document references (filings, motions, orders)
- To-do items or action items mentioned in prose (not in task list blocks — those are handled separately)
- Any information related to the user's active matters

Output as structured JSON candidates. Mark confidence based on extraction clarity.
Do NOT extract module block content (task lists, feeds, threads) — those have their own intake pipelines.
```

**Important:** The extraction prompt must exclude content from embedded module blocks (to-do modules, calendar modules, activity feeds, inline threads). Those blocks have their own intake contracts. Only the free-form text content of the note is extracted.
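
A minimal sketch of that exclusion, assuming the note is available as a flat list of typed blocks. The block type names (`todo_module`, `calendar_module`, `activity_feed`, `inline_thread`) are hypothetical stand-ins for whatever the note schema actually uses.

```ts
type NoteBlockType =
  | "paragraph"
  | "heading"
  | "todo_module"
  | "calendar_module"
  | "activity_feed"
  | "inline_thread";

interface NoteBlock {
  type: NoteBlockType;
  text: string;
}

// Blocks with their own intake contracts; hypothetical type names.
const MODULE_BLOCK_TYPES: NoteBlockType[] = [
  "todo_module",
  "calendar_module",
  "activity_feed",
  "inline_thread",
];

// Returns only the free-form text that should be sent to the extraction prompt.
function extractableText(blocks: NoteBlock[]): string {
  return blocks
    .filter((block) => MODULE_BLOCK_TYPES.indexOf(block.type) === -1)
    .map((block) => block.text)
    .join("\n");
}
```

Filtering before prompting, rather than relying only on the prompt's "Do NOT extract" instruction, keeps module content out of the model's input entirely.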

### 4.3 Output Nodes

**`work_product` node for the note itself:**

```ts
{
  node_kind: "work_product",
  canonical_name: note.title,
  entity_subtype: "note",
  source_type: "note",
  source_id: note.note_id,
  fields: {
    folder: note.folder_id,
    project_id: note.project_id,
    word_count: note.word_count,
    block_types: note.block_type_summary,
    tags: note.tags,
  },
  temporal: {
    created_at: note.created_at,
    updated_at: note.updated_at,
  },
}
```

In addition, the LLM emits entity candidates: `world_entity` nodes for people, organizations, and cases; `obligation` nodes for mentioned deadlines; and edges linking the note to resolved entities.

### 4.4 Processing

- Uses the configured cheap extraction model (same as browser Tier 2)
- Runs during idle processing (same infrastructure as browser extraction and Satisfaction Matrix)
- Results feed into DOC72's standard promotion pipeline — entity linking, dedup, confidence assignment
- Note `intake_state` tracks: `"not_extracted"` | `"queued"` | `"extracted"` | `"re_extraction_needed"`
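
The four `intake_state` values suggest a small state machine. The states below are from the list above; the event names and the allowed transitions are assumptions for illustration, since the proposal does not define them.

```ts
type NoteIntakeState = "not_extracted" | "queued" | "extracted" | "re_extraction_needed";

// Hypothetical event names; the proposal only names the states.
type NoteIntakeEvent =
  | "significance_reached"
  | "extraction_completed"
  | "material_change"
  | "requeued";

const NOTE_INTAKE_TRANSITIONS: Record<
  NoteIntakeState,
  Partial<Record<NoteIntakeEvent, NoteIntakeState>>
> = {
  not_extracted: { significance_reached: "queued" },
  queued: { extraction_completed: "extracted" },
  extracted: { material_change: "re_extraction_needed" },
  re_extraction_needed: { requeued: "queued" },
};

// Events that do not apply to the current state leave it unchanged.
function nextIntakeState(state: NoteIntakeState, event: NoteIntakeEvent): NoteIntakeState {
  const next = NOTE_INTAKE_TRANSITIONS[state][event];
  return next === undefined ? state : next;
}
```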

### 4.5 Status

This contract is a **proposal** — the significance gate thresholds, extraction prompt, and output schema need design review. The browser intake contract (§5 below) is more mature and can serve as a reference implementation.

---

## 5. `intake.browser` — Browser Intake Contract (Reference)

The browser intake contract is separately specified in `DOC20_Q_BROWSER_INTAKE_ARCHITECTURE.md`. Key points:

- **Tier 1 (immediate, deterministic):** `BrowserPageVisitSchema` metadata on every page load. Zero LLM cost.
- **Tier 2 (idle, LLM-assisted):** Entity extraction from significant pages only. Significance gate: dwell time >30s, user interaction, M365 surface, domain match, or return visit.
- **M365 "god mode":** Enhanced DOM extraction for Word Online/SharePoint documents. Creates `work_product` nodes with document metadata and content summary.
- **Processing:** Idle background, interruptible, incremental. Same infrastructure as Satisfaction Matrix and notes extraction.

This contract should be integrated into DOC72 §20A alongside the to-do, calendar, and notes contracts.
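
For comparison with the notes gate, the Tier 2 significance gate summarized above can be sketched as an any-of check. The `PageVisitSignals` field names are hypothetical and are not taken from `BrowserPageVisitSchema`, which is specified in the separate browser intake document.

```ts
interface PageVisitSignals {
  dwellSeconds: number;
  hadUserInteraction: boolean;
  isM365Surface: boolean;
  domainMatched: boolean;
  isReturnVisit: boolean;
}

// Lists which gate conditions a visit satisfies, for logging or prioritization.
function tier2Reasons(visit: PageVisitSignals): string[] {
  const reasons: string[] = [];
  if (visit.dwellSeconds > 30) reasons.push("dwell");
  if (visit.hadUserInteraction) reasons.push("interaction");
  if (visit.isM365Surface) reasons.push("m365");
  if (visit.domainMatched) reasons.push("domain");
  if (visit.isReturnVisit) reasons.push("return_visit");
  return reasons;
}

// A page is significant if at least one condition holds.
function passesTier2Gate(visit: PageVisitSignals): boolean {
  return tier2Reasons(visit).length > 0;
}
```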

---

## 6. Cross-Surface Intake Summary

| Surface | Trigger | Extraction mode | Significance gate | Output nodes | LLM cost |
|---|---|---|---|---|---|
| **To-Do** | Every CRUD command | Deterministic + entity resolution | None (all significant) | `obligation`, `work_product` | Zero by default (entity resolution only); low if optional idle-time inference runs |
| **Calendar** | Every CRUD + Outlook sync | Deterministic + entity resolution | None (all significant) | `obligation`, `world_entity` | Zero to low |
| **Notes** | Auto-save after significance gate | LLM-assisted | Yes (word count, project, engagement, title match) | `work_product` + extracted entities | Moderate (1 LLM call per note) |
| **Browser** | Page visit (Tier 1), idle (Tier 2) | Deterministic (T1), LLM (T2) | Yes (dwell, interaction, M365 surface, domain match, return visit) | Visit metadata + extracted entities | Zero (T1), moderate (T2) |
| **Chats** | Conversation mining (existing) | LLM-assisted | Existing DOC72 contract | Extracted entities | Per existing spec |
| **Rooms/Panels** | Same as chats | LLM-assisted | Existing DOC72 contract | Extracted entities | Per existing spec |

**Key invariant:** All intake is triggered at EC, not at the surface. A to-do item created in the palette, in a note module, in a standalone tab, or on mobile all trigger the same `TodoItemCreateCommand` at EC, which runs the same intake pipeline. The surface is irrelevant to extraction.

---

## 7. Integration Notes

### 7.1 Where this goes in DOC72

These contracts belong in DOC72 §20A as surface-specific intake contracts, alongside the existing contracts for Rooms, Panels, CANDOR, and Email.

### 7.2 Entity Resolution Requirements

The to-do and calendar contracts rely heavily on entity resolution — matching free-text names against known entities in the DOC72 graph. This requires:
- A fast entity matching function (text similarity against `canonical_name` fields)
- Configurable confidence thresholds for auto-linking vs candidate-for-review
- Ability to create new `world_entity` nodes when no match is found (e.g., a new case name appears in a to-do item)
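
The three requirements can be combined into a single routing decision per mention. A minimal sketch: the threshold values (`0.85`, `0.5`) are placeholders for the configurable settings, all names are hypothetical, and treating a weak best match the same as no match is an assumption.

```ts
type ResolutionOutcome =
  | { kind: "auto_link"; entityId: string }
  | { kind: "candidate_for_review"; entityId: string }
  | { kind: "create_world_entity"; canonicalName: string };

interface ResolutionThresholds {
  autoLink: number;
  review: number;
}

interface BestMatch {
  entityId: string;
  score: number;
}

// Placeholder values; the contract only requires that these be configurable.
const DEFAULT_THRESHOLDS: ResolutionThresholds = { autoLink: 0.85, review: 0.5 };

function routeResolution(
  mention: string,
  best: BestMatch | null,
  thresholds: ResolutionThresholds = DEFAULT_THRESHOLDS,
): ResolutionOutcome {
  if (best !== null && best.score >= thresholds.autoLink) {
    return { kind: "auto_link", entityId: best.entityId };
  }
  if (best !== null && best.score >= thresholds.review) {
    return { kind: "candidate_for_review", entityId: best.entityId };
  }
  // No usable match: propose a new world_entity named after the mention.
  return { kind: "create_world_entity", canonicalName: mention };
}
```

The discriminated union forces callers to handle all three outcomes explicitly instead of checking a bare score.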

### 7.3 Idle Processing Coordination

Notes extraction, browser Tier 2 extraction, and optional LLM-assisted inference for to-do/calendar all share the idle processing infrastructure. The priority queue should be:
1. Calendar events with upcoming dates (time-sensitive)
2. Browser pages with user interactions (high-value signals)
3. Notes with significant changes
4. Browser long-dwell pages
5. To-do/calendar LLM inference (goal linkage, etc.)
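
The ordering above can be sketched as a comparator over queued idle jobs. The job kind names and the first-in-first-out tie-break within a priority level are assumptions; the proposal only fixes the five-level order.

```ts
type IdleJobKind =
  | "calendar_upcoming"
  | "browser_interaction"
  | "note_changed"
  | "browser_long_dwell"
  | "llm_inference";

// Lower number runs first; mirrors the numbered list above.
const IDLE_PRIORITY: Record<IdleJobKind, number> = {
  calendar_upcoming: 1,
  browser_interaction: 2,
  note_changed: 3,
  browser_long_dwell: 4,
  llm_inference: 5,
};

interface IdleJob {
  id: string;
  kind: IdleJobKind;
  enqueuedAt: number; // epoch milliseconds
}

// Returns a new array sorted by priority, then by enqueue time (assumed tie-break).
function orderIdleJobs(jobs: IdleJob[]): IdleJob[] {
  return jobs
    .slice()
    .sort(
      (a, b) =>
        IDLE_PRIORITY[a.kind] - IDLE_PRIORITY[b.kind] || a.enqueuedAt - b.enqueuedAt,
    );
}
```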

### 7.4 Schema Gaps — Resolved

The DOC20 `TodoTask` schema was missing the `created_at` and `updated_at` fields. These were added in DOC20 R4.1 §6.21.2, along with `done_at`, `attachments`, and the `TodoAttachment` schema. No schema gaps remain for to-do or calendar intake.