# DOC7 Context Buckets / Files — R7 [Consolidated Current]
## Revision Lineage (must persist in all later versions)
Based on DOC7 Context Buckets / Files v1.11.8 R5 Final plus DOC7 v1.11.8 R5.1 (Graph-Aware Materialization and Support Packs) plus R7 bucket file intake integration (DOC72 §20B.14 trigger, EC Core Addendum A orchestrator reference). This consolidated current version fully subsumes those prior operative versions.
## Consolidation Rule
If an inherited baseline statement conflicts with a later merged revision block in this file, the later merged revision block governs.
## Included Source Chain
1. Inherited Baseline: DOC7 Context Buckets / Files v1.11.8 R5 Final (source file: `DOC7_CONTEXT_BUCKETS_FILES_v1_11_8_R5_FINAL.md`)
2. Merged Revision: DOC7 v1.11.8 R5.1 (Graph-Aware Materialization and Support Packs) (source file: `DOC7_CONTEXT_BUCKETS_FILES_v1_11_8_R5_1_Graph_Aware_Materialization_and_Support_Packs.md`)
---
# Part 1 — Inherited Baseline — DOC7 Context Buckets / Files v1.11.8 R5 Final
# DOC7 — Context Buckets & Files
**ELNOR Suite v1.11.8 (R5 FINAL)** — February 27, 2026
**Status:** Final spec for implementation by Claude Code / Codex
**Audience:** Claude Code / Codex implementers (EC + Q)
---
## Spec pinning (inputs used to draft this doc)
SHA256 hashes of the source specs/files used while drafting this addendum:
- `ELNOR_CORE_SPEC_v1_11_2_CANONICAL.md` — `a1a23894dae7bbe93af868d02204896d887b2c2d1afd7b63a66a1c4468c81571`
- `Q_DASHBOARD_SPEC_v1_11_2_CANONICAL.md` — `f24788fbade9788a30df7e1d5546d14ac65e9464abd8cef84a9e4b4112405fa3`
- `DOC1_MEMORY_RESILIENCE_v1_11_3_FINAL.md` — `03f1c5ce75292e58019421763812dfd1f6fdd5942b202a3f07a8768063de2496`
- `DOC6_PANELS_FORUMS_SELF_IMPROVEMENT_v1_11_7_3.md` — `42c1bfc3af04dceda495087c9e6e908c172602f911f74504eef689d80766724a`
- `DOC8_SELF_LEARNING_PREDICTION_REGRESSION_FRICTION_v1_11_4.md` — `f00862f9f2fa455cae06250cf31b8cec9dc1f246bb8d8c18856a5957d574d4c4`
- Prior drafts: `DOC7_CONTEXT_BUCKETS_FILES_v1_11_8_R4_FINAL.md`
If any of these hashes do not match your local copies, **STOP** and resolve drift before implementing.
> This document **supersedes** all prior DOC7 drafts (R1, R2, R3, R4).
> Apply **only this DOC7 v1.11.8 R5 FINAL**.
---
## Instruction
Append this entire addendum last to the canonical specs (v1.10 base) **plus all prior approved addenda**.
This doc is **additive** and must not rename/move canonical files, endpoints, or existing storage artifacts unless explicitly stated.
---
## 0) Preamble
### 0.1 Purpose
Context Buckets are a lightweight, user-managed (and optionally system-managed) layer for attaching **structured background + document references** to any operation (chat, task, panel run, forum thread/channel, agent, project, global).
The system injects *just enough* bucket context automatically:
- If context room is ample, inline bucket background + small documents.
- If context is tight, inject a **manifest** (what exists + summaries + retrieval instructions) and keep full content in a **repository** fetchable on demand.
**System-managed buckets** provide always-current operational context about EC state, Q state, runtime health, and codebase structure — enabling Elnor to reason about its own infrastructure.
### 0.2 Key goals
- **Easy**: create/edit buckets quickly; attach with checkboxes; see what's active at a glance.
- **Additive**: layers on top of Memory/CRS/Freshness systems — never overrides them.
- **Low latency**: no hot-path LLM calls; deterministic indexing + caching.
- **Context-saving**: large docs are not stuffed into every prompt; manifests + on-demand reads.
- **Transparent**: model sees bucket headers/manifests in context; Q shows attachments.
- **Self-aware**: system-managed buckets keep Elnor's understanding current without manual intervention.
- **Ask-integrated**: Ask buttons auto-attach relevant buckets for full system awareness.
- **Model-switch resilient**: lightweight context seed bridges model changes mid-conversation.
### 0.3 Changes vs. R4
This revision incorporates third-round external review. Key changes from R4:
- **File removal tombstones**: `ContextBucketFileEntrySchema` now includes `removed`, `removed_at`, `removed_by` fields. Rebuild correctly excludes removed files.
- **Two-tier cache eviction**: orphan cleanup first, then pressure eviction with re-derivability check. LRU computed from access_log, not filesystem atime.
- **Health status for background-only buckets**: buckets with background but zero files now show "healthy" (not "empty").
- **Budget math clarified**: compute `bucket_pool` once at start, decrement per bucket. No double-subtract.
- **Ask origin `page_type`**: access_log entries include `operation_origin_page_type` so Ask matrix tuning can identify which page mapping to change.
- **DOC7 independence**: DOC7 produces metrics only (access_stats, context_insufficiency, page_type counts). DOC9 nightly reads these and generates proposals. DOC7 does not depend on DOC9.
- **Parseable markers in ops_map and code_map**: structured `<!-- BEGIN/END -->` blocks for machine-readable content.
- **`include_archived` removed**: archived buckets are always excluded. Unarchive to re-enable.
- **"Pinned files" reference removed**: files do not have a pinned field. Injection order is MRU within each bucket.
- **File inline order clarified**: MRU (from access_log), then by title alphabetical.
### 0.4 Dependencies
- **DOC1** memory/CRS: Session Context Seed, referenced document handles, "single-writer" + taint. **Existing coded modules**: `apps/ec-service/src/memory/`, `packages/contracts/src/schemas.ts`, `packages/contracts/src/canonical.ts`.
- **DOC6**: "INLINE vs REPOSITORY" reference patterns and read tools. **Existing coded modules**: `apps/ec-service/src/context/assembler.ts`.
- **DOC8**: friction/regression signals may reference bucket usage (optional, no coupling). **Existing coded modules**: `apps/ec-service/src/learning/friction.ts`, `apps/ec-service/src/learning/nightly.ts`.
- **DOC9** (companion, NOT a dependency): uses system-managed buckets and `source_read`. DOC7 does NOT depend on DOC9. DOC9 depends on DOC7.
### 0.5 Non-negotiables
- **EC is sole durable writer.** Q submits commands; EC writes. Pattern: `apps/ec-service/src/server.ts` → `processCommand()`.
- **Local-first.** No required cloud service.
- **No hot-path LLM calls** for indexing/materialization.
- **Additive context**: buckets never delete/suppress other context systems.
- **Hard caps**: token caps for injected blocks; storage caps and dedup for caches.
- **No silent changes**: every downgrade (inline → repository) is visible + logged.
- **Append-only logs are never mutated in place.** JSONL is strictly append-only. Derived `*_current.json` files are atomic-write replacements.
### 0.6 Existing codebase integration points
| Integration point | Existing file | What DOC7 adds |
|---|---|---|
| Command router | `apps/ec-service/src/server.ts` → `processCommand()` | New `context_bucket_*` command cases |
| Context assembly | `apps/ec-service/src/context/assembler.ts` → `assembleContext()` | Bucket injection step (after memory overlays, before task/panel overlays) |
| Canonical paths | `packages/contracts/src/canonical.ts` | All bucket storage paths registered |
| Zod schemas | `packages/contracts/src/schemas.ts` | All bucket schemas exported |
| Schema re-exports | `packages/contracts/src/index.ts` | Re-export new schemas |
| Friction emission | `apps/ec-service/src/learning/friction.ts` → `emitFrictionEvent()` | Optional `context_bucket_ids` field on events |
| Server startup | `apps/ec-service/src/server.ts` → startup sequence | Init bucket storage + SystemBucketManager |
| Nightly job | `apps/ec-service/src/learning/nightly.ts` → `runNightly()` | Code map rebuild, access_stats rebuild |
| Q backend proxy | `apps/q-backend/src/server.ts` | New `/api/context/...` proxy routes + `/api/q/status` |
| Q frontend nav | `apps/q-frontend/src/App.tsx` or equivalent router | New Context page route |
| Atomic writes | `apps/ec-service/src/utils/fs.ts` or equivalent | Uses existing `writeJsonAtomic`, `appendJsonl` |
---
## 1) Bucket data model (contracts)
All schemas go in `packages/contracts/src/schemas.ts` and are re-exported from `packages/contracts/src/index.ts`.
### 1.1 ContextBucket (registry record)
Stored in `.../buckets/registry.json` (EC-owned, atomic writes).
```ts
export const BucketMaterializationSchema = z.enum(["auto", "inline_prefer", "repo_prefer"]);
export const SystemBucketTypeSchema = z.enum([
"ops_map", // static architecture + debugging playbook
"ec_state", // dynamic EC runtime state
"q_state", // dynamic Q runtime state
"runtime_health", // dynamic DOC8 health + friction summary
"code_map", // auto-generated codebase structure
]);
export const BucketHealthStatusSchema = z.enum(["healthy", "degraded", "empty"]);
// DERIVATION RULES (see §1.1.1):
// "healthy" = all non-removed files index_status=="ready", OR file_count==0 but background exists and is non-empty
// "degraded" = at least one non-removed file is NOT "ready" (pending OR error), OR any ready file has missing cache
// "empty" = file_count==0 AND (no background OR background is empty)
export const ContextBucketSchema = z.object({
bucket_id: z.string(), // uuid
title: z.string().max(80),
summary: z.string().max(240),
description: z.string().max(800).optional(),
background_path: z.string().optional(),
background_updated_at: z.string().optional(), // ISO
// Derived file stats (exclude removed files)
file_count: z.number().int().min(0),
approx_total_tokens: z.number().int().min(0),
files_ready: z.number().int().min(0).default(0),
files_pending: z.number().int().min(0).default(0),
files_error: z.number().int().min(0).default(0),
// System bucket fields
system_managed: z.boolean().default(false),
system_bucket_type: SystemBucketTypeSchema.optional(),
default_materialization: BucketMaterializationSchema.default("auto"),
inline_budget_tokens: z.number().int().min(0).max(10000).optional(),
repo_budget_tokens: z.number().int().min(0).max(5000).optional(),
health_status: BucketHealthStatusSchema.default("empty"),
created_at: z.string(),
updated_at: z.string(),
created_by: z.enum(["user", "system"]),
updated_by: z.enum(["user", "system"]),
archived: z.boolean().default(false),
deleted: z.boolean().default(false),
deleted_at: z.string().optional(),
pinned: z.boolean().default(false),
});
```
#### 1.1.1 Health status derivation
```ts
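import * as fs from "node:fs";

// `cacheExists` is an injected lookup (e.g. a check for cache/by_hash/<content_hash>.txt);
// its exact wiring is left to the implementer.
function deriveHealthStatus(
  bucket: ContextBucket,
  currentFiles: ContextBucketFileEntry[],
  cacheExists: (contentHash: string) => boolean,
): BucketHealthStatus {
  const activeFiles = currentFiles.filter(f => f.bucket_id === bucket.bucket_id && !f.removed);
  const hasBackground = !!bucket.background_path && fs.existsSync(bucket.background_path)
    && fs.statSync(bucket.background_path).size > 0;
  if (activeFiles.length === 0) {
    // Background-only buckets (system buckets) are healthy if background exists
    return hasBackground ? "healthy" : "empty";
  }
  // A ready file whose cache was evicted (§5.2) also degrades the bucket.
  const allUsable = activeFiles.every(
    f => f.index_status === "ready" && !!f.content_hash && cacheExists(f.content_hash),
  );
  return allUsable ? "healthy" : "degraded"; // degraded: pending, error, or cache-missing
}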
```
### 1.2 ContextBucketFileEntry (files index record)
Append-only `.../buckets/files_index.jsonl`, derived to `files_index_current.json`.
```ts
export const FileSourceTypeSchema = z.enum(["local_path", "web_url", "pasted_text"]);
export const FileIndexStatusSchema = z.enum(["pending", "ready", "error"]);
export const FileTaintStatusSchema = z.enum(["unknown", "clean", "tainted"]);
export const ContextBucketFileEntrySchema = z.object({
bucket_id: z.string(),
file_id: z.string(),
title: z.string().max(120),
source_type: FileSourceTypeSchema,
source_ref: z.string(),
// CONDITIONAL: required only when index_status === "ready"
content_hash: z.string().optional(),
size_bytes: z.number().int().min(0).optional(),
// Version tracking: version increments ONLY when content_hash changes.
version: z.number().int().min(1).default(1),
supersedes_hash: z.string().optional(),
index_status: FileIndexStatusSchema,
index_error: z.string().max(300).optional(),
// Section index: only when index_status === "ready"
section_index: z.array(z.object({
section_id: z.string().max(32),
title: z.string().max(200),
start_offset: z.number().int().min(0), // 0-based UTF-16 code unit index
end_offset: z.number().int().min(0),
})).optional(),
docmeta_summary: z.string().max(240).optional(), // only when ready
last_indexed_at: z.string().optional(),
snapshot_at: z.string().optional(), // web_url only, after fetch
taint_status: FileTaintStatusSchema.default("unknown"),
provenance: z.object({
added_by: z.enum(["user", "system"]),
added_at: z.string(),
notes: z.string().max(400).optional(),
}),
// === REMOVAL TOMBSTONE ===
// When a file is removed via context_bucket_file_remove:
// append a new JSONL entry with the same file_id, removed=true.
// Rebuild treats the latest entry as current. Removed files are excluded
// from all derived counters, health calculations, injection, and manifests.
removed: z.boolean().default(false),
removed_at: z.string().optional(),
removed_by: z.enum(["user", "system"]).optional(),
});
```
**Derived `files_index_current.json`:**
```ts
type FilesIndexCurrent = {
files: Record<string, ContextBucketFileEntry>; // keyed by file_id → latest JSONL entry
// Entries where removed===true ARE included in this map (so rebuild is correct).
// Consumers MUST filter on removed===false for display/injection/counting.
version_hash: string;
rebuilt_at: string;
};
```
**Rebuild rule**: replay `files_index.jsonl` in order. For each entry, overwrite `files[file_id]`. The final map contains the latest state per file_id, including removed files.
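The rebuild rule can be sketched as a last-write-wins replay. This is a minimal illustration, not the reference implementation: `FileEntry` is a trimmed stand-in for `ContextBucketFileEntry`, and the function names `rebuildFilesIndex` and `activeFilesFor` are hypothetical.

```typescript
// Trimmed stand-in for ContextBucketFileEntry; only the fields this sketch needs.
type FileEntry = {
  bucket_id: string;
  file_id: string;
  version: number;
  index_status: "pending" | "ready" | "error";
  removed: boolean;
};

// Replay JSONL lines in order; the last entry seen for a file_id wins.
// Tombstones (removed === true) stay in the map so the rebuild is lossless.
function rebuildFilesIndex(jsonlLines: string[]): Record<string, FileEntry> {
  const files: Record<string, FileEntry> = {};
  for (const line of jsonlLines) {
    if (line.trim() === "") continue; // tolerate a trailing blank line
    const entry = JSON.parse(line) as FileEntry;
    files[entry.file_id] = entry;
  }
  return files;
}

// Consumers filter tombstones before display, injection, or counting.
function activeFilesFor(files: Record<string, FileEntry>, bucketId: string): FileEntry[] {
  return Object.values(files).filter(f => f.bucket_id === bucketId && !f.removed);
}
```

Keeping tombstones in the derived map means a consumer that forgets the `removed` filter will over-count, so the filter is best centralized in one helper like the one above.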
### 1.3 ContextBucketAssignmentEvent
Append-only `.../buckets/assignments.jsonl`, derived to `assignments_current.json`.
```ts
export const AssignmentTargetTypeSchema = z.enum([
"global", "project", "chat", "task", "panel_run",
"forum_channel", "forum_thread", "agent", "moderator_profile",
]);
export const ContextBucketAssignmentEventSchema = z.object({
event_id: z.string(),
created_at: z.string(),
op: z.enum(["add", "remove"]),
bucket_id: z.string(),
target_type: AssignmentTargetTypeSchema,
target_id: z.string().optional(), // required unless target_type === "global"
label: z.string().max(120).optional(),
});
```
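The `target_id` rule in the schema comment is not enforced by the object schema itself. A plain TypeScript check, sketched under the assumption that a `global` event may omit `target_id` and every other target type must carry a non-empty one, could look like this (`checkAssignmentTarget` is a hypothetical helper name):

```typescript
type AssignmentTargetType =
  | "global" | "project" | "chat" | "task" | "panel_run"
  | "forum_channel" | "forum_thread" | "agent" | "moderator_profile";

type AssignmentEventInput = {
  op: "add" | "remove";
  bucket_id: string;
  target_type: AssignmentTargetType;
  target_id?: string;
};

// Returns an error message, or null when the event satisfies the target_id rule.
function checkAssignmentTarget(ev: AssignmentEventInput): string | null {
  const hasTargetId = typeof ev.target_id === "string" && ev.target_id.length > 0;
  if (ev.target_type !== "global" && !hasTargetId) {
    return `target_id is required for target_type "${ev.target_type}"`;
  }
  return null;
}
```

The same rule could equally be expressed as a schema-level refinement; the standalone function is shown only because it has no dependencies.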
Derived `assignments_current.json`:
```ts
type AssignmentsCurrent = {
by_bucket: Record<string, { global: boolean; targets: Record<string, string[]> }>;
by_target: Record<string, Record<string, string[]>>;
version_hash: string;
rebuilt_at: string;
};
```
### 1.4 AccessLogEntry
Append-only `.../buckets/access_log.jsonl`.
```ts
export const BucketAccessLogEntrySchema = z.object({
event_id: z.string(),
created_at: z.string(),
bucket_id: z.string().optional(),
file_id: z.string().optional(),
section_id: z.string().optional(),
action: z.enum([
"inject_inline", "inject_manifest", "read_section", "read_full",
"source_read", "model_switch_seed",
]),
selected_mode: z.enum(["inline", "repository"]).optional(), // only for inject events
reason: z.string().max(120).optional(),
model_id: z.string().max(80).optional(),
operation_type: z.string().max(40).optional(),
operation_id: z.string().optional(),
// Ask origin tracking — set when operation was created by an Ask button
operation_origin_page_type: z.string().max(60).optional(),
// e.g., "friction_detail", "task_detail", "settings", "error_state", etc.
// Matches the Ask matrix page types in §11.
// source_read metadata
source_read_meta: z.object({
source_path: z.string().max(300),
line_start: z.number().int().optional(),
line_end: z.number().int().optional(),
search: z.string().max(200).optional(),
}).optional(),
// Context insufficiency flag (set when model reads a bucket NOT in original context_bucket_ids)
context_insufficiency: z.boolean().optional(),
});
```
Derived `access_stats_current.json`:
```ts
type AccessStatsCurrent = {
by_bucket: Record<string, {
last_accessed_at: string;
last_selected_mode?: "inline" | "repository";
access_count_30d: number;
read_count_30d: number; // read_section + read_full + source_read actions (i.e. context_read and source_read tool calls)
}>;
// Per-hash LRU tracking for cache eviction
by_content_hash: Record<string, {
last_accessed_at: string;
access_count_30d: number;
}>;
// Context insufficiency counts for Ask matrix tuning
insufficiency_by_page_bucket: Record<string, number>;
// Key: "{page_type}:{system_bucket_type}" → count in last 14 days
version_hash: string;
rebuilt_at: string;
};
```
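A possible shape for the `by_bucket` part of the stats rebuild is sketched below. It assumes that "reads" means the `read_section`, `read_full`, and `source_read` actions, and that `last_accessed_at` considers every retained log entry while the counters use a 30-day window. `rebuildBucketStats` and `LogEntry` are illustrative names.

```typescript
type AccessAction =
  | "inject_inline" | "inject_manifest" | "read_section"
  | "read_full" | "source_read" | "model_switch_seed";

type LogEntry = { created_at: string; bucket_id?: string; action: AccessAction };

type BucketStats = { last_accessed_at: string; access_count_30d: number; read_count_30d: number };

// Assumption: these three actions are what the spec counts as reads.
const READ_ACTIONS: ReadonlySet<AccessAction> =
  new Set<AccessAction>(["read_section", "read_full", "source_read"]);
const WINDOW_MS = 30 * 24 * 60 * 60 * 1000;

function rebuildBucketStats(entries: LogEntry[], now: Date): Record<string, BucketStats> {
  const out: Record<string, BucketStats> = {};
  for (const e of entries) {
    if (!e.bucket_id) continue; // entries without a bucket do not feed by_bucket
    const stats: BucketStats = out[e.bucket_id]
      ?? { last_accessed_at: e.created_at, access_count_30d: 0, read_count_30d: 0 };
    if (Date.parse(e.created_at) > Date.parse(stats.last_accessed_at)) {
      stats.last_accessed_at = e.created_at;
    }
    if (now.getTime() - Date.parse(e.created_at) <= WINDOW_MS) {
      stats.access_count_30d += 1;
      if (READ_ACTIONS.has(e.action)) stats.read_count_30d += 1;
    }
    out[e.bucket_id] = stats;
  }
  return out;
}
```

Passing `now` explicitly keeps the rebuild deterministic, which matters if `version_hash` is derived from the output.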
### 1.5 Pasted text durable storage
- Canonical: `pasted/<file_id>.txt` — **never** LRU-evicted.
- `source_ref` = `pasted://<file_id>`
- Cache at `cache/by_hash/<content_hash>.txt` may be evicted; re-derived from pasted file.
### 1.6 Background storage
- Path: `background/<bucket_id>.md`
- Max: 64 KB on disk. Max injected: 800 tokens (DOC7-TUNE).
- System-managed buckets: written by SystemBucketManager.
- ops_map (system_managed=false): user-editable via Q.
### 1.7 FileSuggestion
Append-only `.../buckets/suggestions.jsonl`, derived to `suggestions_current.json`.
```ts
export const FileSuggestionSchema = z.object({
suggestion_id: z.string(),
created_at: z.string(),
bucket_id: z.string(),
source_type: FileSourceTypeSchema,
source_ref: z.string(),
title: z.string().max(120),
reason: z.string().max(600),
suggested_by_operation: z.string().optional(),
status: z.enum(["pending", "approved", "rejected"]),
reviewed_at: z.string().optional(),
});
```
---
## 2) Bucket storage (EC-owned)
### 2.1 Canonical paths
All under `ELNOR_MEMORY/system/context/buckets/`:
- `registry.json` — atomic, bucket records
- `files_index.jsonl` — append-only, file entry events (including removal tombstones)
- `files_index_current.json` — derived, latest entry per file_id
- `assignments.jsonl` — append-only, assignment edge events
- `assignments_current.json` — derived, current assignment lookup
- `access_log.jsonl` — append-only, access events
- `access_stats_current.json` — derived, per-bucket stats + per-hash LRU + insufficiency counts
- `suggestions.jsonl` — append-only, file suggestion events
- `suggestions_current.json` — derived, current pending/recent suggestions
- `background/` — one `.md` per bucket
- `cache/by_hash/` — extracted text caches keyed by content_hash
- `pasted/` — durable pasted text
All paths MUST be registered in `packages/contracts/src/canonical.ts`.
### 2.2 Initialization and integrity
On EC startup:
1. Ensure all files/dirs exist (create empty defaults).
2. Verify derived `*_current.json` files against JSONL sources. Rebuild if stale/missing.
3. Verify system-managed buckets exist (§13). Create if missing.
### 2.3 Retention
- `files_index.jsonl`: keep indefinitely (small, contains version history + removal tombstones).
- `assignments.jsonl`: **keep indefinitely**. If >10 MB: write `assignments_baseline.json`, start fresh JSONL. Rebuild = baseline + log.
- `access_log.jsonl`: 90 days. Roll old to `access_log_archive/YYYY-MM.jsonl`.
- `suggestions.jsonl`: 90 days.
- `cache/by_hash/*.txt`: LRU eviction per §5.2.
- `pasted/*.txt`: keep indefinitely.
- `background/*.md`: keep indefinitely.
---
## 3) QMD search integration (optional synergy)
If QMD is present: extracted text may be indexed with `source: "context_bucket"` tags. If absent: buckets function via structural indexes + raw caches.
---
## 4) Context injection
### 4.1 Non-interference ordering
Inside `assembleContext()`:
1. Baseline system + safety/desktop contract
2. Session Context Seed (DOC1)
3. Memory overlays (DOC1)
4. Freshness overlays (DOC2)
5. **Model-Switch Context Seed (§17)** — only on model change
6. **Context Buckets (DOC7)**
7. Task/Panel/Forum specific overlays
8. User message
### 4.2 Bucket selection
Given operation context:
**Step 1: Collect candidates.** Union of: global, matching project, matching agent, specific target IDs, explicit `context_bucket_ids` in payload.
**Step 2: Exclude.** Remove: archived, deleted, and `context_bucket_exclude_ids` (per-run). Archived buckets are always excluded; unarchive to re-enable.
**Step 3: Sort.** Pinned first → MRU (from access_stats) → title alphabetical.
**Step 4: Cap.** Inject at most `MAX_INJECTED_BUCKETS` (default 10, DOC7-TUNE). If exceeded: inject the top `MAX_INJECTED_BUCKETS` and append: `[{N} additional buckets available but omitted. Use context_read to access.]`
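The four steps above can be sketched as one pure function. This is an illustration under assumptions: candidates are already collected (Step 1), `lastAccessedAt` is a hypothetical lookup derived from the access stats, and buckets that were never accessed sort after accessed ones.

```typescript
type BucketLite = {
  bucket_id: string; title: string; pinned: boolean; archived: boolean; deleted: boolean;
};

const MAX_INJECTED_BUCKETS = 10;

function selectBuckets(
  candidates: BucketLite[],
  excludeIds: Set<string>,
  lastAccessedAt: Record<string, string>, // bucket_id -> ISO timestamp
  max: number = MAX_INJECTED_BUCKETS,
): { selected: BucketLite[]; omittedNote: string | null } {
  // Step 2: dedupe the union, then drop archived, deleted, and per-run excluded buckets.
  const seen = new Set<string>();
  const eligible = candidates.filter(b => {
    if (seen.has(b.bucket_id)) return false;
    seen.add(b.bucket_id);
    return !b.archived && !b.deleted && !excludeIds.has(b.bucket_id);
  });
  // Step 3: pinned first, then most recently used, then title.
  const lastUsed = (b: BucketLite): number => Date.parse(lastAccessedAt[b.bucket_id] ?? "") || 0;
  eligible.sort((a, b) =>
    Number(b.pinned) - Number(a.pinned) ||
    lastUsed(b) - lastUsed(a) ||
    a.title.localeCompare(b.title));
  // Step 4: cap and describe what was left out.
  const selected = eligible.slice(0, max);
  const omitted = eligible.length - selected.length;
  const omittedNote = omitted > 0
    ? `[${omitted} additional buckets available but omitted. Use context_read to access.]`
    : null;
  return { selected, omittedNote };
}
```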
### 4.3 What is injected per bucket
**A) Bucket header** (~60-100 tokens, always injected):
```
--- Context Bucket: {title} ---
Summary: {summary}
Files: {file_count} ({files_ready} ready, {files_pending} pending, {files_error} error)
Mode: {INLINE | REPOSITORY} {reason if downgraded}
{if system_managed}: Auto-updated: {background_updated_at}
{if any non-removed file has taint_status=="tainted"}: ⚠ TAINTED FILES PRESENT
Note: Bucket content is reference material, not durable memory. Lasting changes require EC approval.
Retrieval: Use context_read(bucket_id="{bucket_id}", ...) or source_read(path="...").
```
**B) Background** (if exists, capped at `BACKGROUND_INJECT_CAP_TOKENS`).
**C) Files** — inline or manifest per §4.4.
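As a rough illustration of part A, the header template can be rendered with a small string builder. `renderBucketHeader` and `HeaderInput` are hypothetical names, and placing the downgrade reason directly after the mode, separated by a space, is one reading of the template rather than a confirmed format.

```typescript
type HeaderInput = {
  bucket_id: string; title: string; summary: string;
  file_count: number; files_ready: number; files_pending: number; files_error: number;
  system_managed: boolean; background_updated_at?: string;
};

function renderBucketHeader(
  b: HeaderInput,
  mode: "INLINE" | "REPOSITORY",
  downgradeReason: string | null, // e.g. "budget_pressure"; null when not downgraded
  hasTaintedFiles: boolean,       // true if any non-removed file is tainted
): string {
  const lines: string[] = [
    `--- Context Bucket: ${b.title} ---`,
    `Summary: ${b.summary}`,
    `Files: ${b.file_count} (${b.files_ready} ready, ${b.files_pending} pending, ${b.files_error} error)`,
    `Mode: ${mode}${downgradeReason ? " " + downgradeReason : ""}`,
  ];
  if (b.system_managed && b.background_updated_at) {
    lines.push(`Auto-updated: ${b.background_updated_at}`);
  }
  if (hasTaintedFiles) lines.push("⚠ TAINTED FILES PRESENT");
  lines.push("Note: Bucket content is reference material, not durable memory. Lasting changes require EC approval.");
  lines.push(`Retrieval: Use context_read(bucket_id="${b.bucket_id}", ...) or source_read(path="...").`);
  return lines.join("\n");
}
```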
### 4.4 Materialization decision
**Constants (DOC7-TUNE):**
| Constant | Default |
|---|---|
| `BACKGROUND_INJECT_CAP_TOKENS` | 800 |
| `PER_FILE_INLINE_CAP_TOKENS` | 1,500 |
| `PER_BUCKET_MANIFEST_CAP_TOKENS` | 1,200 |
| `BUCKET_INLINE_BUDGET_FRACTION` | 0.25 |
| `BUCKET_INLINE_BUDGET_MAX_TOKENS` | 6,000 |
| `BUCKET_INLINE_THRESHOLD_TOKENS` | 2,000 |
| `MAX_INJECTED_BUCKETS` | 10 |
**Budget computation (compute once, decrement):**
```ts
// Compute ONCE at start of bucket injection (floored so the pool stays a whole token count):
const bucket_pool = Math.floor(Math.min(
  remaining_context_tokens * BUCKET_INLINE_BUDGET_FRACTION,
  BUCKET_INLINE_BUDGET_MAX_TOKENS
));
let pool_remaining = bucket_pool;
// Then for each bucket in priority order:
for (const bucket of selectedBuckets) {
if (bucket.default_materialization === "repo_prefer") {
injectManifest(bucket); // no pool deduction
} else if (pool_remaining < BUCKET_INLINE_THRESHOLD_TOKENS) {
injectManifest(bucket); // budget pressure
} else {
// INLINE mode: iterate non-removed ready files, ordered by MRU then title
for (const file of getActiveFiles(bucket).sort(byMRUThenTitle)) {
const tokens = estimateTokens(file);
if (tokens <= PER_FILE_INLINE_CAP_TOKENS && tokens <= pool_remaining) {
inlineFile(file); pool_remaining -= tokens;
} else if (tokens > PER_FILE_INLINE_CAP_TOKENS && PER_FILE_INLINE_CAP_TOKENS <= pool_remaining) {
inlinePartial(file, PER_FILE_INLINE_CAP_TOKENS);
addManifestEntry(file, "truncated");
pool_remaining -= PER_FILE_INLINE_CAP_TOKENS;
} else {
addManifestEntry(file);
}
}
}
}
```
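To make the pool arithmetic concrete, here is a self-contained simulation of the loop above with the inject calls replaced by recorded decisions. `planInjection`, `PlanBucket`, and `Decision` are hypothetical names, token estimates are supplied as inputs, and the pool is floored as a choice of this sketch.

```typescript
const PER_FILE_INLINE_CAP_TOKENS = 1500;
const BUCKET_INLINE_BUDGET_FRACTION = 0.25;
const BUCKET_INLINE_BUDGET_MAX_TOKENS = 6000;
const BUCKET_INLINE_THRESHOLD_TOKENS = 2000;

type PlanFile = { file_id: string; tokens: number };
// `files` is assumed to be pre-filtered (non-removed, ready) and pre-sorted (MRU, then title).
type PlanBucket = { bucket_id: string; repo_prefer: boolean; files: PlanFile[] };
type Decision = { id: string; how: "inline" | "partial" | "manifest" };

function planInjection(
  remainingContextTokens: number,
  buckets: PlanBucket[],
): { decisions: Decision[]; poolLeft: number } {
  // Computed once; only decremented afterwards.
  let pool = Math.floor(Math.min(
    remainingContextTokens * BUCKET_INLINE_BUDGET_FRACTION,
    BUCKET_INLINE_BUDGET_MAX_TOKENS,
  ));
  const decisions: Decision[] = [];
  for (const bucket of buckets) {
    if (bucket.repo_prefer || pool < BUCKET_INLINE_THRESHOLD_TOKENS) {
      decisions.push({ id: bucket.bucket_id, how: "manifest" }); // whole bucket as manifest
      continue;
    }
    for (const file of bucket.files) {
      if (file.tokens <= PER_FILE_INLINE_CAP_TOKENS && file.tokens <= pool) {
        decisions.push({ id: file.file_id, how: "inline" });
        pool -= file.tokens;
      } else if (file.tokens > PER_FILE_INLINE_CAP_TOKENS && PER_FILE_INLINE_CAP_TOKENS <= pool) {
        decisions.push({ id: file.file_id, how: "partial" }); // truncated to the per-file cap
        pool -= PER_FILE_INLINE_CAP_TOKENS;
      } else {
        decisions.push({ id: file.file_id, how: "manifest" });
      }
    }
  }
  return { decisions, poolLeft: pool };
}
```

Worked example: with 16,000 remaining context tokens the pool is min(4,000, 6,000) = 4,000. A first bucket with files of 1,200, 2,500, and 1,400 tokens inlines the first (pool 2,800), partially inlines the second at the 1,500 cap (pool 1,300), and sends the third to the manifest because 1,400 exceeds the remaining pool. A second bucket then sees 1,300 < 2,000 and is injected as a manifest.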
### 4.5 Visibility & logging
- Repository downgrade: logged with `action: "inject_manifest"`, `reason: "budget_pressure"` or `"repo_prefer"`.
- Partial inline: logged with `reason: "partial_truncated"`.
- All events include `operation_origin_page_type` if the operation was Ask-initiated.
### 4.6 Synergies
#### Session Context Seed (DOC1)
Add pointers to active bucket material in seed's `referenced_documents[]`.
#### CRS/backup friendliness
`registry.json`, `files_index_current.json`, `assignments_current.json` → protected artifacts, not auto-pruned.
#### Self-learning data production
DOC7 **produces** the following metrics. It does NOT generate improvement proposals (that's DOC9's job):
- `access_stats_current.json`: per-bucket access/read counts, per-hash LRU, context insufficiency counts by page_type+bucket_type.
- `access_log.jsonl`: raw events with `context_insufficiency` flags and `operation_origin_page_type`.
DOC9 nightly reads these artifacts and generates improvement proposals for bucket usefulness, Ask matrix tuning, etc. DOC7 does not depend on DOC9.
#### Memory extraction
Bucket files are scanned by DOC72's knowledge intake pipeline (§4.7 below) for entity extraction. Extracted entities enter the standard DOC72 promotion pipeline with DOC1 governance gating — no auto-promotion into durable memory without approval gates.
### 4.7 DOC72 knowledge extraction from bucket files
When a file reaches `index_status: "ready"` after a `context_bucket_file_add` command, EC SHALL emit an `intake.bucket.file_added` observation to DOC72's intake pipeline (DOC72 §20B.14). The file content is queued for entity extraction through the BackgroundJobOrchestrator (EC Core Addendum A §3) as a high-priority tier2_extractor task.
DOC7 is NOT responsible for the extraction logic — it only emits the trigger event. DOC72 owns the extraction pipeline, entity linking, and promotion rules. DOC7 provides the file content and metadata; DOC72 produces knowledge nodes.
File updates (content_hash change on re-add or version increment) re-trigger extraction. File removals do NOT trigger extraction or entity deletion — extracted knowledge persists independently of the bucket file's lifecycle.
This ensures that bucket files — which the user explicitly curated as relevant reference material — contribute to DOC72's knowledge graph and DOC24's entity resolution, not just DOC7's raw context injection.
**Provenance coordination with DOC24:** DOC72 tags all bucket-file-extracted entities with `provenance.source_ref = "{bucket_id}:{file_id}"`. DOC24 uses this provenance to detect overlap between knowledge cards and inlined bucket content, suppressing redundant cards when the full document is already in the LLM's context. See DOC24 Unified Context Budget Governance for the coordination mechanism.
---
## 5) File access & permission management
### 5.1 Read endpoints and tools
- `GET /api/context/buckets/:bucketId/files/:fileId/read?section_id=...&max_chars=...`
- Tool: `context_read(bucket_id, file_id, section_id?, max_tokens?)`
- `max_chars = max_tokens * 4` (cap 16,000)
- Stale section_id → `SECTION_NOT_FOUND`
- `query` parameter: not supported in v1.
### 5.2 Cache strategy
**Directory structure:** `cache/by_hash/<content_hash>.txt`
**Lookup chain:** file_id → `files_index_current.json` → content_hash → `cache/by_hash/<hash>.txt`
**On file add:**
| source_type | On add | On reindex |
|---|---|---|
| `local_path` (text-like ≤100KB) | Sync fast-path: extract immediately, set `ready`. | Re-read, new entry if hash changed. |
| `local_path` (large/binary) | `pending`. Background worker extracts. | Same. |
| `web_url` | `pending`. Do NOT fetch. | Fetch (SSRF-safe), extract, set `ready` + `snapshot_at`. |
| `pasted_text` | Write `pasted/<file_id>.txt`. Copy to cache. Set `ready`. | Re-derive cache from pasted file. |
**Text-like extensions (sync fast-path):** `.md`, `.txt`, `.json`, `.ts`, `.js`, `.tsx`, `.jsx`, `.css`, `.html`, `.xml`, `.yaml`, `.yml`, `.toml`, `.env`, `.sh`, `.py`, `.rs`, `.go`, `.java`, `.rb`, `.sql`, `.csv`.
**Dedup:** if `cache/by_hash/<hash>.txt` already exists, skip extraction.
**Two-tier cache eviction:**
Total cache cap: 200 MB (DOC7-TUNE).
**Tier 1 — Orphan cleanup (safe, always runs first):**
Evict cache files whose `content_hash` is referenced by ZERO non-removed entries in `files_index_current.json`. These are stale hashes from old versions or removed files.
**Tier 2 — Pressure eviction (only if still above cap after tier 1):**
Evict least-recently-used cache files even if referenced, subject to re-derivability:
| source_type | Re-derivable? | Eviction allowed? |
|---|---|---|
| `pasted_text` | Always (from `pasted/<file_id>.txt`) | Yes |
| `local_path` | If file exists at `source_ref` and path in allowlist | Yes (with warning) |
| `web_url` | Only via fetch (may be offline) | Yes (with warning: "cache evicted, reindex to restore") |
**LRU source:** `access_stats_current.json` → `by_content_hash[hash].last_accessed_at`. Nightly rebuilds this from `access_log.jsonl`. Do NOT rely on filesystem atime.
**When cache is evicted for a referenced file:**
- The file entry remains `index_status: "ready"` (the index is still valid).
- Reads detect missing cache file and return `CACHE_MISSING: reindex to restore`.
- Manifest shows `[cache evicted]` next to file title.
- Health derivation: if any ready file's cache is missing → "degraded".
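The two-tier eviction order can be sketched as a planning function that returns which hashes to delete. This is a simplified illustration: `planEviction` and `CacheFile` are hypothetical names, `last_accessed_at` is assumed to come from `by_content_hash`, and the per-source-type warnings from the re-derivability table are not modeled.

```typescript
type CacheFile = { content_hash: string; size_bytes: number; last_accessed_at: string };

function planEviction(
  cacheFiles: CacheFile[],
  referencedHashes: Set<string>, // hashes used by non-removed entries in files_index_current.json
  capBytes: number,
): { orphans: string[]; pressure: string[] } {
  // Tier 1: orphan cleanup always runs, regardless of the cap.
  const orphans = cacheFiles.filter(c => !referencedHashes.has(c.content_hash));
  const kept = cacheFiles.filter(c => referencedHashes.has(c.content_hash));
  let total = kept.reduce((sum, c) => sum + c.size_bytes, 0);

  // Tier 2: only while still above the cap, least recently used first.
  const pressure: string[] = [];
  const byLru = [...kept].sort(
    (a, b) => Date.parse(a.last_accessed_at) - Date.parse(b.last_accessed_at),
  );
  for (const c of byLru) {
    if (total <= capBytes) break;
    pressure.push(c.content_hash);
    total -= c.size_bytes;
  }
  return { orphans: orphans.map(c => c.content_hash), pressure };
}
```

Separating planning from deletion makes it straightforward to log each pressure eviction and mark the affected manifest entries before any file is removed.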
### 5.3 Section indexing
Offsets are 0-based UTF-16 code unit indices. `section_id = sha256(file_id + ":" + ordinal + ":" + normalized_heading).slice(0,16)`.
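A minimal sketch of the `section_id` formula and offset-based slicing follows. Two points are assumptions, not spec: the heading normalization (trim, lowercase, collapse whitespace) is invented for illustration because `normalized_heading` is not defined here, and the digest is taken as lowercase hex before slicing.

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalization; the spec does not define `normalized_heading`.
function normalizeHeading(heading: string): string {
  return heading.trim().toLowerCase().replace(/\s+/g, " ");
}

// sha256(file_id + ":" + ordinal + ":" + normalized_heading), first 16 hex characters.
function sectionId(fileId: string, ordinal: number, heading: string): string {
  const input = `${fileId}:${ordinal}:${normalizeHeading(heading)}`;
  return createHash("sha256").update(input, "utf8").digest("hex").slice(0, 16);
}

// JavaScript string indices are UTF-16 code units, so section offsets map
// directly onto String.prototype.slice.
function readSection(text: string, startOffset: number, endOffset: number): string {
  return text.slice(startOffset, endOffset);
}
```

Because offsets count UTF-16 code units, a character outside the Basic Multilingual Plane occupies two offset positions; extractors written in other languages need to convert accordingly.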
### 5.4 File versioning on reindex
New `content_hash` → append entry with `version + 1`, `supersedes_hash`. Same hash → append entry with same `version`, updated `last_indexed_at`. Always append; never mutate.
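The versioning rule can be sketched as a pure function that produces the next entry to append. `nextEntryOnReindex` and `VersionedEntry` are hypothetical names, and treating a first successful index (no previous hash) as a same-version update is an assumption of this sketch.

```typescript
type VersionedEntry = {
  file_id: string;
  content_hash?: string;
  version: number;
  supersedes_hash?: string;
  last_indexed_at?: string;
};

// Builds the entry to append to files_index.jsonl; the current entry is never mutated.
function nextEntryOnReindex(current: VersionedEntry, newHash: string, nowIso: string): VersionedEntry {
  const hashChanged = current.content_hash !== undefined && current.content_hash !== newHash;
  if (hashChanged) {
    return {
      ...current,
      content_hash: newHash,
      version: current.version + 1,
      supersedes_hash: current.content_hash,
      last_indexed_at: nowIso,
    };
  }
  // Same hash (or first index): keep the version, refresh the timestamp.
  return { ...current, content_hash: newHash, last_indexed_at: nowIso };
}
```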
### 5.5 File removal
On `context_bucket_file_remove`:
1. Append a new `ContextBucketFileEntry` to `files_index.jsonl` with same `file_id` and:
- `removed: true`
- `removed_at: now`
- `removed_by: "user"` (or `"system"`)
- All other fields copied from current entry.
2. Update `files_index_current.json` — entry now has `removed: true`.
3. All derived counters (`file_count`, `files_ready`, etc.) exclude removed files.
4. Health derivation excludes removed files.
5. Injection excludes removed files.
6. Cache cleanup: if this was the last reference to the content_hash, the cache becomes orphan-evictable.
### 5.6 Permissions and security
**Local path allowlist (from config, not hardcoded):**
```ts
function getAllowedLocalRoots(): string[] {
const roots: string[] = [];
if (process.env.CODEX_BUILD_ROOT) roots.push(path.resolve(process.env.CODEX_BUILD_ROOT));
if (process.env.ELNOR_MEMORY_ROOT) roots.push(path.resolve(process.env.ELNOR_MEMORY_ROOT));
if (process.env.OPENCLAW_SHARED_ROOT) roots.push(path.resolve(process.env.OPENCLAW_SHARED_ROOT));
return roots;
}
```
Reject with `LOCAL_PATH_BLOCKED` if resolved real path outside allowed roots.
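The containment test itself might look like the sketch below. It assumes the caller has already resolved symlinks (for example with `fs.realpathSync`) and passes absolute paths; `isInsideAllowedRoots` is a hypothetical name, and `path.posix` is used only so the example behaves identically on every platform.

```typescript
import * as path from "node:path";

// True when realPath equals an allowed root or sits beneath one.
// A relative path that climbs out ("..", "../x") means the target is outside the root.
function isInsideAllowedRoots(realPath: string, roots: string[]): boolean {
  return roots.some(root => {
    const rel = path.posix.relative(root, realPath);
    if (rel === "") return true; // the root itself
    const escapes = rel === ".." || rel.startsWith("../");
    return !escapes && !path.posix.isAbsolute(rel);
  });
}
```

Comparing via `relative` rather than a string prefix avoids accepting a sibling such as `/srv/build-evil` when `/srv/build` is the allowed root.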
**Web URL SSRF protections:**
- Only `http://`, `https://`.
- Block private IP ranges: `127.0.0.0/8`, `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`, `169.254.0.0/16`, `0.0.0.0/8`, `fc00::/7`, `::1`.
- After DNS resolution, verify resolved IP not in blocked range.
- Max response: 10 MB. Timeout: 30 seconds.
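The IPv4 portion of the blocklist check can be sketched without dependencies, as below. It covers only the IPv4 ranges listed above (the IPv6 entries `fc00::/7` and `::1` are left out of this sketch), fails closed on anything that does not parse as dotted-quad, and uses the hypothetical names `ipv4ToInt` and `isBlockedIpv4`. It is meant to run on the address obtained after DNS resolution.

```typescript
const BLOCKED_V4: Array<[string, number]> = [
  ["127.0.0.0", 8], ["10.0.0.0", 8], ["172.16.0.0", 12],
  ["192.168.0.0", 16], ["169.254.0.0", 16], ["0.0.0.0", 8],
];

// Parses dotted-quad IPv4 into an unsigned integer, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const n = Number(p);
    if (n > 255) return null;
    value = value * 256 + n; // plain arithmetic avoids signed 32-bit bitwise pitfalls
  }
  return value;
}

function isBlockedIpv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  if (addr === null) return true; // fail closed on unparseable input
  return BLOCKED_V4.some(([base, bits]) => {
    const baseInt = ipv4ToInt(base) as number;
    const blockSize = 2 ** (32 - bits);
    // Two addresses share a CIDR block when they fall in the same aligned block.
    return Math.floor(addr / blockSize) === Math.floor(baseInt / blockSize);
  });
}
```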
---
## 6) Q UI
### 6.1 Context page (`/context`)
List view: title, summary, health badge ("healthy ✓" / "degraded ⚠ (2 pending)" / "empty ○"), pinned/archived/system badges, file count, last used.
### 6.2 Bucket detail (`/context/:bucketId`)
Header + health with "(X ready, Y pending, Z error)" counters. Background preview. Assignments. Files table (title, source_type, index_status, version, last_indexed_at, snapshot_at, removed indicator). Pending suggestions.
### 6.3 Bucket edit (`/context/:bucketId/edit`)
User-editable buckets only. `ops_map` is editable (it carries `system_managed=false`, see §1.6); buckets with `system_managed===true` are not. `ops_map` cannot be deleted (archive only).
### 6.4 ContextBucketPicker component
Reusable chip picker for chat/task/panel/forum/agent headers. Per-run exclude: ⊘ icon toggles gray strikethrough (not persisted).
---
## 7) EC commands
All writes through `processCommand()`.
Commands: `context_bucket_create`, `context_bucket_update`, `context_bucket_delete`, `context_bucket_duplicate`, `context_bucket_background_set`, `context_bucket_file_add`, `context_bucket_file_add_batch`, `context_bucket_file_remove`, `context_bucket_file_reindex`, `context_bucket_suggest_file`, `context_bucket_suggest_file_approve`, `context_bucket_suggest_file_reject`, `context_bucket_assign`, `context_bucket_pin`, `context_bucket_unpin`, `context_bucket_archive`, `context_bucket_unarchive`.
Key rules:
- `context_bucket_delete`: rejects pinned, rejects `system_bucket_type==="ops_map"`, rejects `system_managed===true`. On delete: set deleted, append "remove" for all edges, schedule orphan cache cleanup.
- `context_bucket_background_set`: rejects `system_managed===true`. ops_map (system_managed=false) is allowed.
- `context_bucket_file_remove`: appends tombstone entry per §5.5.
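The delete guards above reduce to a short precondition check. In this sketch the function name `checkBucketDelete` and the rejection codes are hypothetical; only the three conditions come from the rule itself.

```typescript
type DeleteCandidate = {
  pinned: boolean;
  system_managed: boolean;
  system_bucket_type?: string;
};

// Returns a rejection code, or null when the delete may proceed.
function checkBucketDelete(b: DeleteCandidate): string | null {
  if (b.pinned) return "BUCKET_PINNED";                       // unpin first
  if (b.system_bucket_type === "ops_map") return "OPS_MAP_ARCHIVE_ONLY";
  if (b.system_managed) return "SYSTEM_MANAGED";
  return null;
}
```

The `ops_map` check is separate from the `system_managed` check because `ops_map` carries `system_managed=false` and would otherwise pass.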
---
## 8) EC read endpoints
- `GET /api/context/buckets` → list (excludes deleted)
- `GET /api/context/buckets/:bucketId` → details + files + assignments + suggestions
- `GET /api/context/buckets/:bucketId/background` → background markdown
- `GET /api/context/targets/:targetType/:targetId/buckets` → bucket IDs
- `GET /api/context/buckets/:bucketId/files/:fileId/read` → content
- `GET /api/context/buckets/:bucketId/manifest` → deterministic manifest
- `GET /api/context/system-buckets` → system bucket list
---
## 9) EC service modules
New files in `apps/ec-service/src/context-buckets/`:
- `registry.ts` — CRUD, health derivation with background-only logic, file counter derivation
- `files.ts` — file index append + current rebuild (with tombstone handling), version tracking
- `assignments.ts` — assignment events, current rebuild, baseline compaction
- `cache.ts` — extraction (sync fast-path + async), hash-keyed cache, two-tier LRU eviction with refcount + re-derivability checks, pasted durable storage, SSRF-safe URL fetch
- `read.ts` — read slices, CACHE_MISSING detection, access_log + stats
- `manifest.ts` — deterministic manifest generation, token capping
- `injector.ts` — selection, inline/repo decision, header generation, budget pool. Called from `assembleContext()`.
- `suggestions.ts` — suggest/approve/reject, JSONL management
- `system-buckets.ts` — SystemBucketManager (§13)
- `source-read.ts` — source_read tool (§14)
- `model-switch.ts` — model-switch context seed (§17)
---
## 10) Q backend + frontend wiring
**q-backend:** proxy routes for `/api/context/...`, `/api/q/status` health endpoint.
**q-frontend:** new pages: `ContextPage.tsx`, `ContextBucketPage.tsx`, `ContextBucketEditPage.tsx`, components: `ContextBucketPicker.tsx`, `ContextBucketChips.tsx`.
---
## 11) Ask button context fusion
### 11.1 Bucket attachment matrix
| Ask location | Auto-attached system buckets |
|---|---|
| Friction detail | `runtime_health` + `ec_state` + `ops_map` + `code_map` |
| Memory page | `ec_state` + `ops_map` |
| Task detail | `ec_state` + `ops_map` + `code_map` + task's buckets |
| Panel run | `ec_state` + `runtime_health` + `ops_map` + `code_map` + panel's buckets |
| Forum thread | `ops_map` + `code_map` + thread's buckets |
| Settings / Config | `ec_state` + `q_state` + `ops_map` |
| Error state | ALL 5 system buckets |
| Learning page | `runtime_health` + `ec_state` + `ops_map` |
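One way to encode the matrix above is a static lookup keyed by page type. In this sketch the page-type keys other than `friction_detail` are hypothetical names, the system bucket names stand in for real bucket IDs, and returning an empty list for an unknown page type is an assumption.

```ts
type SystemBucket = "ops_map" | "ec_state" | "q_state" | "runtime_health" | "code_map";

// Keys other than "friction_detail" are illustrative; the spec does not name them.
const ASK_MATRIX: Record<string, readonly SystemBucket[]> = {
  friction_detail: ["runtime_health", "ec_state", "ops_map", "code_map"],
  memory_page: ["ec_state", "ops_map"],
  task_detail: ["ec_state", "ops_map", "code_map"],
  panel_run: ["ec_state", "runtime_health", "ops_map", "code_map"],
  forum_thread: ["ops_map", "code_map"],
  settings: ["ec_state", "q_state", "ops_map"],
  error_state: ["ops_map", "ec_state", "q_state", "runtime_health", "code_map"],
  learning_page: ["runtime_health", "ec_state", "ops_map"],
};

// itemBucketIds carries the task's, panel's, or thread's own buckets where the matrix calls for them.
function resolveAskBuckets(pageType: string, itemBucketIds: string[] = []): string[] {
  const system = ASK_MATRIX[pageType] ?? [];
  const all: string[] = [...system, ...itemBucketIds];
  // Drop duplicates while keeping first-seen order.
  return all.filter((id, index) => all.indexOf(id) === index);
}
```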
### 11.2 Origin page_type tracking
When Q creates an Ask-initiated chat:
```json
{
  "context_bucket_ids": ["..."],
  "operation_origin": {
    "kind": "ask",
    "page_type": "friction_detail",
    "item_id": "friction-xyz"
  }
}
```
EC writes `operation_origin_page_type: "friction_detail"` on all access_log entries for this operation. This enables DOC9 nightly to measure Ask matrix effectiveness per page type.
### 11.3 Prompt templates
Per-page structured templates (≤200 tokens) with: context summary, how-to-help guidance, available tools.
---
## 12) Acceptance tests
### 12.1 EC unit tests
1. Create bucket → "empty".
2. Add small .md → sync "ready", health "healthy".
3. Add large PDF → "pending", health "degraded". Background complete → "ready", "healthy".
4. Add web_url → "pending", no fetch. Health "degraded".
5. Reindex web_url → "ready", snapshot_at set.
6. Same-hash reindex → new JSONL entry, same version, updated last_indexed_at.
7. Changed-hash reindex → version+1, supersedes_hash.
8. **File remove** → tombstone appended, removed=true. file_count decremented. Health recalculated.
9. **Removed file not injected** → context injection skips removed files.
10. **Removed file rebuild** → corrupt current.json → rebuild from JSONL → removed files present with removed=true.
11. Assign to task+agent+global → assignments_current correct.
12. Manifest → deterministic, token capped.
13. Read section → correct text, access_log updated.
14. Stale section_id → SECTION_NOT_FOUND.
15. Pasted text durability → evict cache → pasted file persists → re-derive.
16. Local path allowlist → outside roots → LOCAL_PATH_BLOCKED.
17. SSRF → `http://127.0.0.1/...` → rejected.
18. Delete bucket → marked deleted, remove events, orphan cache cleanup.
19. **Background-only health** → system bucket with background, zero files → "healthy" (not "empty").
20. File status counters match actual state.
21. Symlink escape → source_read rejects.
22. **Tier 1 eviction** → orphan hash (no current refs) → evicted.
23. **Tier 2 eviction** → above cap, LRU hash with pasted_text source → evicted, CACHE_MISSING on read.
24. **Tier 2 no-evict** → only hash referenced by web_url file, no pressure → not evicted.
25. Suggestion → pending → approve → file added; reject → dismissed.
26. Batch add → 3 files, 1 bad → 2 succeed, 1 error.
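Tests 6 and 7 above pin down the reindex versioning behavior. A minimal sketch of that rule follows; `version`, `last_indexed_at`, and `supersedes_hash` appear in the spec, while `file_id`, `content_hash`, and the function name are hypothetical. The returned entry is what would be appended to the file index JSONL.

```ts
type FileIndexEntry = {
  file_id: string;        // hypothetical field name
  content_hash: string;   // hypothetical field name
  version: number;
  last_indexed_at: string;
  supersedes_hash?: string;
};

// Returns the new entry to append; the previous entry is never mutated.
function nextIndexEntry(prev: FileIndexEntry, newHash: string, nowIso: string): FileIndexEntry {
  if (newHash === prev.content_hash) {
    // Same-hash reindex: new entry, same version, refreshed timestamp.
    return { ...prev, last_indexed_at: nowIso };
  }
  // Changed-hash reindex: bump the version and record the hash being superseded.
  return {
    file_id: prev.file_id,
    content_hash: newHash,
    version: prev.version + 1,
    last_indexed_at: nowIso,
    supersedes_hash: prev.content_hash,
  };
}
```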
### 12.2 Context injection tests
1. Low budget → manifests only.
2. High budget → inline + background.
3. repo_prefer → manifest.
4. Partial inline for large file.
5. Multi-bucket: pinned gets inline, others manifest.
6. Per-run exclude → not injected.
7. >10 buckets → top 10 + omission line.
### 12.3 System bucket tests
1. Startup creates 5 system buckets if missing.
2. Timer updates dynamic buckets.
3. Archive stops updates; unarchive resumes.
4. User cannot edit system-managed background (except ops_map).
5. ops_map deletion → rejected. Archive → allowed.
### 12.4 Ask + model-switch tests
1. Friction Ask → correct bucket_ids + `operation_origin_page_type="friction_detail"`.
2. Model switch → seed generated with previews + active buckets, ≤500 tokens.
3. No model switch → no seed.
---
## 13) System-managed buckets
### 13.1 Five bucket definitions
**ops_map**: `system_managed=false`, `created_by="system"`, `pinned=true`, `repo_prefer`. Assigned to `agent` by default. Non-deletable (archive only). User-editable. EC creates with default template on startup if missing.
**ec_state**: `system_managed=true`, `inline_prefer`. Updated every 15 min + on events. NOT assigned to agent by default.
**q_state**: `system_managed=true`, `inline_prefer`. Updated every 15 min. NOT assigned to agent by default.
**runtime_health**: `system_managed=true`, `inline_prefer`. Updated after nightly + on friction escalation. NOT assigned to agent by default.
**code_map**: `system_managed=true`, `repo_prefer`. Updated nightly + on code-applied + manual. NOT assigned to agent by default. Hard cap: 4,000 tokens. Excludes: `node_modules/`, `.git/`, `dist/`, `.repair/`, dotfiles.
### 13.2 SystemBucketManager
Timer: 15-min for ec_state, q_state, runtime_health only. code_map: nightly + onCodeApplied + manual.
ops_map: not auto-updated. Startup creates if missing. 30-day staleness warning.
### 13.3 Parseable markers in system bucket backgrounds
**ops_map** must include a machine-parseable modules index:
```markdown
<!-- OPS_KEY_MODULES:BEGIN -->
| Module | Path |
|---|---|
| Command router | apps/ec-service/src/server.ts |
| Context assembler | apps/ec-service/src/context/assembler.ts |
| Memory manager | apps/ec-service/src/memory/ |
<!-- OPS_KEY_MODULES:END -->
```
**code_map** must include:
```markdown
<!-- CODE_MAP_INDEX:BEGIN -->
apps/ec-service/src/
apps/ec-service/src/server.ts
apps/ec-service/src/context/assembler.ts
...
<!-- CODE_MAP_INDEX:END -->
```
DOC9 spec drift detection parses these markers. If the markers are missing, DOC9 nightly proposes "ops_map/code_map lacks parseable index block."
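As an illustration of what "machine-parseable" means here, the sketch below extracts a marker block from a background and reads the `OPS_KEY_MODULES` table rows. The function names and the returned shape are hypothetical; DOC9's actual parser is not specified in this document.

```ts
// Returns the non-empty lines between <!-- NAME:BEGIN --> and <!-- NAME:END -->, or null if either marker is missing.
function extractMarkerBlock(background: string, markerName: string): string[] | null {
  const begin = `<!-- ${markerName}:BEGIN -->`;
  const end = `<!-- ${markerName}:END -->`;
  const start = background.indexOf(begin);
  if (start === -1) return null;
  const stop = background.indexOf(end, start + begin.length);
  if (stop === -1) return null;
  return background
    .slice(start + begin.length, stop)
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Parses the two-column modules table, skipping the header and separator rows.
function parseOpsKeyModules(background: string): { name: string; path: string }[] | null {
  const lines = extractMarkerBlock(background, "OPS_KEY_MODULES");
  if (lines === null) return null;
  return lines
    .filter((line) => line.charAt(0) === "|")
    .map((line) => line.split("|").slice(1, -1).map((cell) => cell.trim()))
    .filter((cells) => cells.length === 2 && cells[0] !== "Module" && !/^-+$/.test(cells[0]))
    .map((cells) => ({ name: cells[0], path: cells[1] }));
}
```

A `null` result corresponds to the "lacks parseable index block" case; the same `extractMarkerBlock` helper would return the path lines for `CODE_MAP_INDEX`.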
### 13.4 ops_map default template
Created on first boot only. Includes `OPS_KEY_MODULES` markers, architecture overview, key rules, safe change workflow, debugging playbook (see full template in R4 §13.5, updated with markers above).
---
## 14) source_read tool
Read-only codebase access. Restricted to CODEX BUILD. Symlink defense via `realpathSync()`. Token cap: 4,000 tokens. Binary detection: reject. Search: line-by-line JS scan, ±3 context.
Path resolution:
```ts
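import * as fs from "node:fs";
import * as path from "node:path";
// Resolve symlinks on both sides so a link inside the repo cannot point outside it.
const realCandidate = fs.realpathSync(path.resolve(CODEX_BUILD_ROOT, relativePath));
const realRoot = fs.realpathSync(CODEX_BUILD_ROOT);
if (!realCandidate.startsWith(realRoot + path.sep) && realCandidate !== realRoot) {
  throw new Error("PATH_OUTSIDE_REPO");
}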
```
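The search behavior ("line-by-line JS scan, ±3 context") could look roughly like the sketch below. The `SearchHit` shape and function names are hypothetical, plain substring matching is an assumption, and the NUL-character check is one common binary heuristic rather than the method the spec mandates.

```ts
type SearchHit = { line: number; context: string[] };

// Heuristic (assumption): treat content containing a NUL character as binary.
function looksBinary(text: string): boolean {
  return text.indexOf("\u0000") !== -1;
}

// Scans line by line and returns each match with up to `contextLines` lines on either side.
function searchLines(text: string, needle: string, contextLines: number = 3): SearchHit[] {
  const lines = text.split("\n");
  const hits: SearchHit[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].indexOf(needle) === -1) continue;
    const from = Math.max(0, i - contextLines);
    const to = Math.min(lines.length, i + contextLines + 1);
    hits.push({ line: i + 1, context: lines.slice(from, to) }); // 1-based line number
  }
  return hits;
}
```

Overlapping context windows are not merged here; a real implementation would also need to enforce the 4,000-token cap on the combined output.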
---
## 15) Phasing
1. Core storage + commands + read endpoints (including tombstone handling)
2. Manifest + injection (with budget pool math)
3. Q UI
4. Web URL + reindex (SSRF-safe)
5. System-managed buckets (with parseable markers)
6. source_read
7. Ask button fusion (with page_type tracking)
8. Model-switch context seed
9. Advanced (suggestions, batch add, QMD, cache eviction, access_stats rebuild)
---
## 16) Self-learning data production
DOC7 is a **data producer** for self-learning. It does NOT generate proposals.
### 16.1 What DOC7 produces
- `access_stats_current.json` → per-bucket access/read counts, per-hash LRU, insufficiency counts
- `access_log.jsonl` → raw events with `context_insufficiency` and `operation_origin_page_type`
### 16.2 What DOC9 consumes (DOC9's responsibility, not DOC7's)
DOC9 nightly reads DOC7 artifacts and generates:
- **Bucket usefulness proposals**: bucket with `access_count_30d > 0, read_count_30d === 0` → "consider unassigning"
- **Ask matrix tuning proposals**: `insufficiency_by_page_bucket` shows which page types frequently need buckets not in default attachment → "suggest adding bucket X to page Y's Ask attachment"
This keeps DOC7 independent of DOC9.
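For orientation only, the bucket usefulness condition above reduces to a simple filter over the stats DOC7 produces. The `BucketStats` shape and function name are hypothetical, and this logic would live in DOC9, not DOC7.

```ts
// Hypothetical per-bucket slice of access_stats_current.json.
type BucketStats = { bucket_id: string; access_count_30d: number; read_count_30d: number };

// Buckets that were attached (accessed) in the last 30 days but never actually read.
function unassignCandidates(stats: BucketStats[]): string[] {
  return stats
    .filter((s) => s.access_count_30d > 0 && s.read_count_30d === 0)
    .map((s) => s.bucket_id);
}
```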
---
## 17) Model-switch context handoff
### 17.1 Detection
Model switch = current turn's `model_id` differs from previous turn's in same operation.
### 17.2 Seed generation (deterministic, no LLM)
```ts
type ModelSwitchContextSeed = {
  generated_at: string;
  previous_model_id: string;
  new_model_id: string;
  switch_reason: "user_initiated" | "fallback" | "agent_handoff" | "routing_policy";
  conversation_context: {
    turn_count: number;
    first_user_message_preview: string; // first 200 chars
    last_user_message_preview: string; // first 200 chars
    last_assistant_message_preview: string; // first 400 chars
    active_task_id?: string;
    active_task_title?: string;
    active_task_status?: string;
  };
  active_context: {
    bucket_ids: string[];
    bucket_titles: string[];
    files_read_this_session: string[];
    source_reads_this_session: string[];
  };
  operation_state?: {
    operation_type: string;
    operation_id: string;
    progress_summary: string;
  };
};
```
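Because the seed is deterministic, the detection check and the preview fields can be built with plain string slicing. The sketch below covers only the `conversation_context` previews; the `Turn` shape and function names are hypothetical, and treating a missing previous `model_id` as "no switch" is an assumption.

```ts
// Hypothetical turn record; the spec only states that each turn carries a model_id.
type Turn = { role: "user" | "assistant"; text: string; model_id: string };

// §17.1: a switch is a change of model_id between consecutive turns of one operation.
function isModelSwitch(previousModelId: string | undefined, currentModelId: string): boolean {
  return previousModelId !== undefined && previousModelId !== currentModelId;
}

function preview(turn: Turn | undefined, maxChars: number): string {
  return turn === undefined ? "" : turn.text.slice(0, maxChars);
}

function buildConversationContext(turns: Turn[]) {
  const users = turns.filter((t) => t.role === "user");
  const assistants = turns.filter((t) => t.role === "assistant");
  return {
    turn_count: turns.length,
    first_user_message_preview: preview(users[0], 200),
    last_user_message_preview: preview(users[users.length - 1], 200),
    last_assistant_message_preview: preview(assistants[assistants.length - 1], 400),
  };
}
```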
### 17.3 Injection
Max 500 tokens (DOC7-TUNE). Injected at position 5 in §4.1. Includes an orientation block and a "continue naturally" instruction. For automatic switches, adds a note that the user may not be aware of the switch.
### 17.4 History forwarding
- Chat: last 5-10 turns or 2,000 tokens.
- Task/Panel: spec + last output summary.
- Agent: standing orders + memory overlays + last action.
---
## 18) Non-goals
- Automatic web crawling
- LLM-based summarization required for correctness
- Replacing Session Context Seed, CRS, or Memory overlays
- Writing or modifying source code (DOC9's domain)
- Auto-promoting bucket content into durable memory
- Generating improvement proposals (DOC9's domain)
---
**End of DOC7 v1.11.8 R5 FINAL**
---
# Part 2 — Merged Revision — DOC7 v1.11.8 R5.1 (Graph-Aware Materialization and Support Packs)
# DOC7 — Context Buckets & Files
## ELNOR Suite v1.11.8 R5.1 — Graph-Aware Materialization and Support-Pack Alignment
**Date:** March 10, 2026
**Status:** targeted revision draft — Wave C consumer alignment
**Supersedes:** DOC7 v1.11.8 R5 FINAL only for the subjects covered here
**Companions:** DOC10 R10.1, DOC15 Contract v1.1.1, DOC16 R3.1 / R1.1
---
## Why this revision exists
R5 already gave DOC7 the right base model: deterministic bucket storage, manifests, materialization budgets, context-saving inline/repository behavior, and Ask integration. Wave A and Wave B then clarified that DOC7 must also be able to consume:
- graph-aware `document_priority_hints`,
- support-pack grouping hints,
- retrieval/provider truth where document recommendations originated from semantic lanes,
- active review-target neighbor logic,
- contradiction / supersession-aware document packaging.
R5.1 adds those consumer rules without changing DOC7’s core ownership. DOC7 still owns bucket storage and materialization. It does **not** become the owner of canonical graph truth, provider internals, or semantic retrieval policy.
---
## 0) Interpretation rules
### 0.1 DOC7 consumes hints; it does not infer graph truth from scratch
DOC7 may consume relation-aware reason codes and support-pack hints. It may not silently invent contradiction, supersession, or legal taxonomy truth beyond what upstream systems already exported.
### 0.2 Materialization still obeys hard budgets
Graph-aware hints can improve ranking and grouping; they may not justify bypassing token, file-count, or cache constraints.
### 0.3 Support packs are resolved views, not new bucket classes
A support pack is a bounded resolved grouping of document refs for a specific operation. It is not a new canonical bucket type and it is not persisted as a second truth store unless another owner doc explicitly stores it.
### 0.4 Active review targets get special handling
When upstream systems mark a document as the active review target, DOC7 should preserve it at full fidelity where budget allows and treat graph-neighbor documents as secondary supporting context.
---
## 1) Schema amendments
### 1.1 Graph-aware document hint extension
R5.1 extends the DOC15-consumed `DocumentPriorityHintSchema` with optional graph-aware metadata when DOC7 is the materialization consumer.
```ts
// packages/contracts/src/context-buckets/graph-hints.ts
import { z } from "zod";

export const DocumentPriorityReasonCodeSchema = z.enum([
  "explicit_user_selected",
  "workspace_default",
  "same_matter",
  "same_issue",
  "same_motion_type",
  "support_pack_member",
  "active_review_target_neighbor",
  "references_target",
  "supports_target",
  "contradicts_target",
  "supersedes_target",
  "fallback_no_topology_data",
]);

export const GraphAwareDocumentPriorityHintSchema = z.object({
  doc_id: z.string().max(200),
  priority: z.enum(["critical", "useful", "background"]),
  reason: z.string().max(240),
  source_node_ids: z.array(z.string().max(200)).max(12).default([]),
  reason_codes: z.array(DocumentPriorityReasonCodeSchema).max(12).default([]),
  relation_types: z.array(z.string().max(120)).max(8).default([]),
  relation_strength: z.number().min(0).max(1).optional(),
  support_pack_id: z.string().max(160).optional(),
  provider_receipt_refs: z.array(z.string().max(200)).max(12).default([]),
  active_review_target: z.boolean().default(false),
});
```
### 1.2 Support-pack candidate schema
```ts
export const SupportPackCandidateSchema = z.object({
  support_pack_id: z.string().uuid(),
  title: z.string().max(160),
  doc_ids: z.array(z.string().max(200)).min(2).max(8),
  primary_reason_codes: z.array(DocumentPriorityReasonCodeSchema).max(8).default([]),
  source_hint_refs: z.array(z.string().max(200)).max(20).default([]),
  provider_receipt_refs: z.array(z.string().max(200)).max(12).default([]),
  created_for_operation_id: z.string().max(160),
  created_at: z.string(),
});
```
### 1.3 Materialization preview extension
```ts
export const GraphAwareMaterializationPreviewSchema = z.object({
  included_doc_ids: z.array(z.string().max(200)).max(20).default([]),
  dropped_doc_ids: z.array(z.string().max(200)).max(20).default([]),
  support_packs: z.array(SupportPackCandidateSchema).max(6).default([]),
  hidden_due_to_supersession: z.array(z.string().max(200)).max(20).default([]),
  comparison_only_doc_ids: z.array(z.string().max(200)).max(20).default([]),
  degraded_reason: z.string().max(160).optional(),
  schema_version: z.literal(1),
});
```
---
## 2) Graph-aware materialization behavior
### 2.1 Ranking order
When graph-aware hints exist, DOC7 should rank candidate docs in this order:
1. explicit user-selected docs
2. active review target
3. `critical` hints
4. same support-pack docs already partially selected
5. same matter / same issue / same motion-type useful docs
6. background docs
7. stale or degraded docs only if no better alternative exists
Within a tier, rank by:
- reason-code importance,
- relation strength,
- freshness,
- provider confidence/receipt health,
- recent use.
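The tier order above can be sketched as a tier function plus a comparator. This is a simplified illustration under several assumptions: the `Hint` type is a trimmed stand-in for the graph-aware hint schema, `stale` is a hypothetical flag (the schema has no such field), every `useful` hint is placed in tier 5, and the within-tier ordering uses only relation strength with `doc_id` as a deterministic tie-break.

```ts
type Hint = {
  doc_id: string;
  priority: "critical" | "useful" | "background";
  reason_codes: string[];
  relation_strength?: number;
  support_pack_id?: string;
  active_review_target: boolean;
  stale?: boolean; // hypothetical: not part of the hint schema
};

// selectedPackIds: support packs that already have at least one doc selected.
function tierOf(h: Hint, selectedPackIds: string[]): number {
  if (h.reason_codes.indexOf("explicit_user_selected") !== -1) return 1;
  if (h.active_review_target) return 2;
  if (h.stale === true) return 7;
  if (h.priority === "critical") return 3;
  if (h.support_pack_id !== undefined && selectedPackIds.indexOf(h.support_pack_id) !== -1) return 4;
  if (h.priority === "useful") return 5;
  return 6;
}

function rankHints(hints: Hint[], selectedPackIds: string[] = []): Hint[] {
  return hints.slice().sort((a, b) => {
    const tierDiff = tierOf(a, selectedPackIds) - tierOf(b, selectedPackIds);
    if (tierDiff !== 0) return tierDiff;
    const strengthDiff = (b.relation_strength ?? 0) - (a.relation_strength ?? 0);
    if (strengthDiff !== 0) return strengthDiff;
    return a.doc_id < b.doc_id ? -1 : a.doc_id > b.doc_id ? 1 : 0;
  });
}
```

Freshness, provider receipt health, and recent use would slot into the comparator as additional tie-breaks once those signals are available.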
### 2.2 Contradiction and supersession handling
If reason codes indicate `supersedes_target`:
- newer doc may replace older candidate in the inline set,
- older doc may remain retrievable by manifest,
- the preview must explain the replacement.
If reason codes indicate `contradicts_target`:
- include only in comparison-capable contexts,
- otherwise relegate to manifest or comparison-only list,
- never silently inline contradiction documents as if they were ordinary support docs.
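The supersession and contradiction rules above amount to a partition of the candidate hints. The sketch below is illustrative only: `supersedes_doc_id` is a hypothetical field naming the older document (the hint schema does not say how the superseded doc is identified), and `comparisonCapable` is an assumed flag for whether the current context supports side-by-side comparison.

```ts
type Hint = {
  doc_id: string;
  reason_codes: string[];
  supersedes_doc_id?: string; // hypothetical: identifies the older doc being replaced
};

function partitionBySupersessionAndContradiction(hints: Hint[], comparisonCapable: boolean) {
  const candidateIds = hints.map((h) => h.doc_id);
  const hiddenDueToSupersession: string[] = [];
  for (const h of hints) {
    const older = h.supersedes_doc_id;
    if (h.reason_codes.indexOf("supersedes_target") === -1 || older === undefined) continue;
    // Only hide docs that are actually candidates, and only once.
    if (candidateIds.indexOf(older) !== -1 && hiddenDueToSupersession.indexOf(older) === -1) {
      hiddenDueToSupersession.push(older);
    }
  }
  const inlineHints: Hint[] = [];
  const comparisonOnlyDocIds: string[] = [];
  for (const h of hints) {
    if (hiddenDueToSupersession.indexOf(h.doc_id) !== -1) continue; // stays reachable via manifest
    const contradicts = h.reason_codes.indexOf("contradicts_target") !== -1;
    if (contradicts && !comparisonCapable) comparisonOnlyDocIds.push(h.doc_id);
    else inlineHints.push(h);
  }
  return { inlineHints, hiddenDueToSupersession, comparisonOnlyDocIds };
}
```

The two ID lists map directly onto `hidden_due_to_supersession` and `comparison_only_doc_ids` in the preview, which is where the replacement gets explained to the user.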
### 2.3 Active review-target neighbor behavior
If a doc is marked `active_review_target=true`:
- treat it as non-compressible where possible,
- allow at most **2** same-issue/same-matter neighbors into the inline set by default,
- move additional neighbors into manifest/support-pack suggestions.
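A minimal sketch of the neighbor cap follows, assuming the input is already ranked and that a "neighbor" is any non-target hint carrying `same_issue` or `same_matter`. The `Hint` type is a trimmed stand-in and the function name and return shape are hypothetical.

```ts
type Hint = { doc_id: string; reason_codes: string[]; active_review_target: boolean };

// Keeps the active target inline, admits at most `maxInlineNeighbors` neighbors,
// and defers the remaining neighbors to manifest/support-pack suggestions.
function selectTargetAndNeighbors(rankedHints: Hint[], maxInlineNeighbors: number = 2) {
  const inline: string[] = [];
  const deferredNeighbors: string[] = [];
  const others: string[] = [];
  let inlineNeighborCount = 0;
  for (const h of rankedHints) {
    if (h.active_review_target) {
      inline.push(h.doc_id);
      continue;
    }
    const isNeighbor =
      h.reason_codes.indexOf("same_issue") !== -1 || h.reason_codes.indexOf("same_matter") !== -1;
    if (!isNeighbor) {
      others.push(h.doc_id); // handled by the ordinary ranking rules
    } else if (inlineNeighborCount < maxInlineNeighbors) {
      inline.push(h.doc_id);
      inlineNeighborCount++;
    } else {
      deferredNeighbors.push(h.doc_id);
    }
  }
  return { inline, deferredNeighbors, others };
}
```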
### 2.4 Support-pack assembly rules
DOC7 may resolve support packs from incoming hints when:
- two or more docs share compatible reason codes,
- group size stays within configured caps,
- the materialization profile allows grouped recommendations.
Support packs should be surfaced as grouped refs/manifests before they are fully inlined.
Recommended defaults:
- support-pack min docs: **2**
- support-pack target docs: **3–5**
- hard max docs: **8**
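The caps above can be applied with a simple grouping pass. This sketch assumes hints already carry a `support_pack_id` and groups on it; resolving packs from compatible reason codes, assigning UUIDs, and checking the materialization profile are left out. Truncating oversized groups to the hard max is an assumption, since the spec does not say whether to truncate or split.

```ts
type Hint = { doc_id: string; support_pack_id?: string };
type PackGroup = { support_pack_id: string; doc_ids: string[] };

function groupSupportPacks(hints: Hint[], minDocs: number = 2, hardMaxDocs: number = 8): PackGroup[] {
  const byPack: { [packId: string]: string[] } = Object.create(null);
  const packOrder: string[] = [];
  for (const h of hints) {
    const packId = h.support_pack_id;
    if (packId === undefined) continue;
    if (byPack[packId] === undefined) {
      byPack[packId] = [];
      packOrder.push(packId); // preserve first-seen order for determinism
    }
    if (byPack[packId].indexOf(h.doc_id) === -1) byPack[packId].push(h.doc_id);
  }
  return packOrder
    .filter((packId) => byPack[packId].length >= minDocs)
    .map((packId) => ({ support_pack_id: packId, doc_ids: byPack[packId].slice(0, hardMaxDocs) }));
}
```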
### 2.5 Topology-unavailable fallback
If graph/topology inputs are unavailable, DOC7 should continue with ordinary priority-based materialization and set `degraded_reason = "no_topology_data_available"` in preview where applicable.
---
## 3) Budgeting and fetch-on-demand amendments
### 3.1 Graph-aware hints do not bypass caps
Existing inline/repository rules remain unchanged. Graph-aware hints only change which docs are favored under the cap.
### 3.2 Support-pack-first repository mode
When budget is tight, DOC7 should prefer:
- active target inline,
- compact support-pack manifest for grouped neighbors,
- fetch-on-demand refs for the rest.
### 3.3 Bounded neighbor expansion
If a graph-aware preview requests neighbor expansion through DOC10/Core seams, DOC7 should never request more than:
- 1 expansion step,
- 5 neighbors per source doc,
- 8 total extra doc refs.
DOC7 does not own graph walking; it only consumes bounded expansion results.
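The three limits above can be enforced both on the outgoing request and on whatever comes back. In this sketch the request shape, the constant names, and the function names are hypothetical; only the numeric limits come from the spec.

```ts
const EXPANSION_LIMITS = { maxSteps: 1, maxNeighborsPerSource: 5, maxExtraRefs: 8 };

// Hypothetical request shape for a DOC10/Core expansion seam.
type ExpansionRequest = { steps: number; neighborsPerSource: number; maxExtraRefs: number };

function clampExpansionRequest(req: ExpansionRequest): ExpansionRequest {
  const clamp = (value: number, max: number) => Math.max(0, Math.min(Math.floor(value), max));
  return {
    steps: clamp(req.steps, EXPANSION_LIMITS.maxSteps),
    neighborsPerSource: clamp(req.neighborsPerSource, EXPANSION_LIMITS.maxNeighborsPerSource),
    maxExtraRefs: clamp(req.maxExtraRefs, EXPANSION_LIMITS.maxExtraRefs),
  };
}

// Defensive cap on results: at most 5 neighbors per source doc and 8 unique extra refs overall.
function capExpansionResults(
  neighborsBySource: { [sourceDocId: string]: string[] },
  sourceOrder: string[],
): string[] {
  const extraRefs: string[] = [];
  for (const source of sourceOrder) {
    const neighbors = (neighborsBySource[source] || []).slice(0, EXPANSION_LIMITS.maxNeighborsPerSource);
    for (const neighbor of neighbors) {
      if (extraRefs.length >= EXPANSION_LIMITS.maxExtraRefs) return extraRefs;
      if (extraRefs.indexOf(neighbor) === -1) extraRefs.push(neighbor);
    }
  }
  return extraRefs;
}
```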
---
## 4) Endpoints and command amendments
### 4.1 Manifest endpoint extension
`GET /api/context/buckets/:bucketId/manifest` may include a `graph_preview` block with `GraphAwareMaterializationPreviewSchema` when graph-aware hints participated in the decision.
### 4.2 Support-pack preview endpoint
```text
GET /api/context/support-packs/:supportPackId/preview
```
#### Response
```ts
const SupportPackPreviewResponseSchema = z.object({
  support_pack: SupportPackCandidateSchema,
  docs: z.array(z.object({
    doc_id: z.string().max(200),
    title: z.string().max(240).optional(),
    inline_eligible: z.boolean(),
    relation_reasons: z.array(DocumentPriorityReasonCodeSchema).default([]),
  })).max(12),
  degraded_reason: z.string().max(160).optional(),
  schema_version: z.literal(1),
});
```
This is a read surface only. It does not make support packs durable.
---
## 5) UI amendments
### 5.1 Bucket manifest panel
When graph-aware materialization participated, Q should show:
- support-pack chips,
- relation reason pills,
- superseded/contradiction notices,
- active target badge,
- degraded note if topology data was missing.
### 5.2 Support-pack preview drawer
For grouped document suggestions, add a preview drawer showing:
- grouped docs,
- why they were grouped,
- which are inline vs repository,
- what was suppressed and why.
States:
- loading
- empty / unavailable
- degraded
- populated
### 5.3 Mobile behavior
On narrow widths, render support-pack cards as collapsible accordions rather than multi-column tables.
---
## 6) Code implementation plan
### 6.1 New or amended files
```text
packages/contracts/src/context-buckets/graph-hints.ts
apps/ec-service/src/context-buckets/injector.ts
apps/ec-service/src/context-buckets/manifest.ts
apps/ec-service/src/context-buckets/support-packs.ts
apps/ec-service/src/context-buckets/read.ts
apps/q-backend/src/server.ts # support-pack preview route
apps/q-frontend/src/components/context/SupportPackPreview.tsx
apps/q-frontend/src/components/context/GraphAwareManifestCard.tsx
```
### 6.2 Required functions
```ts
export function rankDocumentHintsForMaterialization(input: {
  hints: z.infer<typeof GraphAwareDocumentPriorityHintSchema>[];
  budgetTokens: number;
}): z.infer<typeof GraphAwareMaterializationPreviewSchema>;

export function buildSupportPackCandidates(input: {
  hints: z.infer<typeof GraphAwareDocumentPriorityHintSchema>[];
  operationId: string;
}): z.infer<typeof SupportPackCandidateSchema>[];

export function applySupersessionAndComparisonRules(input: {
  hints: z.infer<typeof GraphAwareDocumentPriorityHintSchema>[];
}): {
  inlineHints: z.infer<typeof GraphAwareDocumentPriorityHintSchema>[];
  hiddenDueToSupersession: string[];
  comparisonOnlyDocIds: string[];
};
```
### 6.3 Failure handling
- If support-pack generation fails validation, continue with ordinary per-doc hints.
- If graph-aware fields are malformed, ignore them and emit a degraded preview note.
- If a provider receipt ref cannot be resolved, retain the doc recommendation but mark receipt details unavailable.
---
## 7) Telemetry additions
Add at minimum:
- `context.support_pack.generated`
- `context.support_pack.previewed`
- `context.support_pack.loaded`
- `context.graph_hint.used`
- `context.graph_hint.skipped`
- `context.supersession.applied`
- `context.comparison_only.set`
---
## 8) Acceptance scenarios
1. **Active review target with neighbors**
The active target stays inline. Two same-issue neighbors are included. Extra neighbors are grouped into a manifest support pack.
2. **Superseding memo**
A newer memo tagged `supersedes_target` replaces the older memo in inline materialization while the older memo remains visible in the manifest.
3. **Contradiction note in non-comparison mode**
A doc tagged `contradicts_target` is not inlined but remains available as comparison-only.
4. **No topology data**
Materialization succeeds using ordinary priorities and marks the preview degraded rather than pretending graph-aware logic ran.
5. **Support-pack preview**
A grouped support pack preview shows 4 docs, relation reasons, and inline eligibility, all within hard caps.
---
## 9) Manifest reconciliation for this revision
R5.1 covers the Wave C DOC7 obligations:
- graph-aware document-priority hint consumption,
- support-pack grouping and preview,
- contradiction / supersession-aware materialization,
- active review-target preservation,
- bounded fallback behavior,
- implementation-ready schemas, routes, UI states, and code seams.
DOC7 still does not own graph truth, provider truth, or semantic retrieval policy.