Current Specs/DOC18/DOC18_LlamaIndex_Retrieval_Sidecar_R2.md
Generated 2026-06-09T01:23:58.539Z from commit dbaa25962edc11ab30e8d4ca1715f9ae5bf77331. Worktree: clean.
# DOC18 LlamaIndex Retrieval Sidecar — R2 [Consolidated Current]

## Revision Lineage (must persist in all later versions)

Based on DOC18 LlamaIndex Retrieval Sidecar R1 plus DOC18 LlamaIndex Retrieval Sidecar R1.1 (Retrieval Lane and Topology Alignment). This consolidated current version fully subsumes those prior operative versions.


## Consolidation Rule

If an inherited baseline statement conflicts with a later merged revision block in this file, the later merged revision block governs.


## Included Source Chain

- 1. Inherited Baseline — DOC18 LlamaIndex Retrieval Sidecar R1 — source file: `DOC18_LlamaIndex_Retrieval_Sidecar_R1.md`
- 2. Merged Revision — DOC18 LlamaIndex Retrieval Sidecar R1.1 (Retrieval Lane and Topology Alignment) — source file: `DOC18_LlamaIndex_Retrieval_Sidecar_R1_1_Retrieval_Lane_and_Topology_Alignment.md`



---

# Part 1 — Inherited Baseline — DOC18 LlamaIndex Retrieval Sidecar R1


# DOC18 — LlamaIndex Retrieval Sidecar (R1)

## Purpose

DOC18 defines a **dedicated LlamaIndex-based retrieval sidecar** for ELNOR.

It exists to provide:

- semantic retrieval over selected document corpora,
- folder- and matter-scoped retrieval over OneDrive / SharePoint / local mirrors,
- a controlled sidecar implementation that **enhances** ELNOR without replacing:
  - ELNOR Core canonical memory,
  - DOC7 Context Buckets,
  - QMD / internal search,
  - OpenClaw local file/runtime capabilities,
  - Microsoft live APIs / MCP connectors.

This document is intentionally **separate from DOC3** because LlamaIndex is a concrete subsystem/library integration, not a first-principles capability rule. DOC3 should describe **how ELNOR routes to a LlamaIndex provider**; DOC18 should describe **how to build and operate the provider**.

---

## Why this is a separate numbered spec (not only a DOC16 punch-list item)

DOC16 is the running punch list for deferred additions and future work. It is useful for preserving ideas that are not yet ready for a canonical build pass. In this case, LlamaIndex is no longer just a vague future idea; it is a real, buildable subsystem. That makes it better suited for a dedicated implementation spec than a short punch-list entry.

Recommended treatment:

- **DOC18** = buildable subsystem spec for the LlamaIndex sidecar
- **DOC3 addendum** = normative architectural hook telling ELNOR how to route to it
- optional later step: add a one-line reference to DOC18 in the next DOC16 revision if desired

---

## Non-normative external validation

This spec is based on the following upstream realities:

1. LlamaIndex is a framework for context-augmented LLM applications with:
   - data connectors,
   - indexes,
   - query engines,
   - and routing/retrieval composition.  
2. LlamaIndex provides documented Microsoft readers for:
   - **OneDrive**
   - **SharePoint**
3. LlamaIndex provides retriever/query routing patterns such as:
   - `RouterRetriever`
   - query engines
   - metadata-aware retrieval
4. LlamaCloud offers managed OneDrive / SharePoint data-source integrations, but this spec uses **local/OSS sidecar first** as the default because the ELNOR architecture is local-first / governance-first.

These external facts justify using LlamaIndex as an **optional semantic retrieval provider** rather than inventing an ad hoc semantic index from scratch.

---

## Problem statement

ELNOR needs a semantic retrieval option that can sit between:

- live Microsoft APIs / MCP-based retrieval,
- local/OpenClaw file access,
- and internal QMD/ELNOR search.

Reasons this matters:

1. Microsoft live retrieval may be unavailable, licensing-limited, slow, or too narrow for some corpora.
2. Internal QMD / ELNOR search should not be overloaded with massive external document corpora when those corpora have their own retrieval dynamics.
3. Legal/work corpora often benefit from:
   - semantic retrieval,
   - metadata filtering,
   - folder/matter scoping,
   - cross-document synthesis,
   - and curated sidecar indexing.
4. ELNOR should not be locked to one search method forever.

---

## Architectural decision

### Decision

Adopt **LlamaIndex as an optional sidecar retrieval provider**.

### Keep

- ELNOR Core remains the semantic identity / policy / orchestration layer.
- DOC7 remains the supporting-context and project/bucket layer.
- OpenClaw remains the runtime / tool / local automation substrate.
- Microsoft live APIs / MCP connectors remain preferred for some live cloud system-of-record tasks.
- QMD/internal search remains the owner of internal ELNOR search and memory retrieval.

### Add

- a **Python sidecar service** that indexes selected corpora using LlamaIndex,
- a new retrieval provider kind: `llamaindex_index`,
- a corpus registry and sync layer,
- a query endpoint and health endpoint,
- route scoring inside ELNOR’s search orchestrator,
- user controls and telemetry.

### Do not do

- Do **not** treat LlamaIndex as canonical memory.
- Do **not** store the LlamaIndex vector/docstore inside canonical ELNOR durable-memory structures.
- Do **not** let the sidecar infer project/matter identity independently of ELNOR Core.
- Do **not** make LlamaIndex the default path for exact file lookup when a better live/scoped API path exists.
- Do **not** silently index the user’s entire Microsoft corpus.
- Do **not** let the sidecar become a second source of truth for file permissions, project identity, or durable memory.

---

## Role of LlamaIndex in the overall search stack

LlamaIndex should be used as a **sidecar semantic retrieval engine**.

### Best-fit use cases

Use LlamaIndex for:

- semantic retrieval over a selected matter folder or case corpus,
- brief-bank / template-bank / pre-litigation-bank retrieval,
- cross-corpus semantic search over mixed corpora,
- fallback semantic retrieval when the Microsoft Retrieval API is unavailable or not a good fit,
- experimentation with routing/reranking/metadata-filtered retrieval.

### Not the best default for

Do not make it primary for:

- exact file lookup in a known matter folder,
- live Microsoft permissions truth,
- “who can access what right now?” checks,
- file metadata truth,
- source-of-truth project resolution.

### Routing doctrine

The ELNOR Search Orchestrator should generally prefer, in order:

**Exact known-file lookup**
1. Microsoft Drive / folder search
2. Microsoft Graph Search fallback
3. local/OpenClaw filesystem fallback
4. browser fallback

**Semantic argument / issue retrieval**
1. Microsoft Retrieval API when available and healthy
2. `llamaindex_index` sidecar for selected corpora / fallback / cross-corpus use
3. internal QMD / managed semantic search if appropriate
4. browser fallback last
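The doctrine above can be expressed as an ordered preference table per query class. The following is a minimal Python sketch for illustration only (EC itself is TypeScript). The query-class names `exact_known_file` and `semantic_issue`, the table, and `plan_routes` are hypothetical; the route identifiers reuse the provider kinds the search orchestrator registers.

```python
from typing import Callable

# Hypothetical preference table: query class -> provider kinds in doctrine order.
QUERY_CLASS_ROUTES: dict[str, list[str]] = {
    "exact_known_file": [
        "m365_drive_search",
        "m365_graph_search",
        "openclaw_local_fs",
        "browser_fallback",
    ],
    "semantic_issue": [
        "m365_retrieval_api",
        "llamaindex_index",
        "qmd_local_semantic",
        "browser_fallback",
    ],
}


def plan_routes(query_class: str, is_available: Callable[[str], bool]) -> list[str]:
    """Return available routes for a query class, preserving doctrine order.

    The first entry is tried first; the remaining entries are fallbacks.
    """
    routes = QUERY_CLASS_ROUTES.get(query_class)
    if routes is None:
        raise ValueError(f"unknown query class: {query_class}")
    return [route for route in routes if is_available(route)]
```

For example, with `m365_retrieval_api` reported unhealthy, `plan_routes("semantic_issue", ...)` returns `["llamaindex_index", "qmd_local_semantic", "browser_fallback"]`, so the sidecar becomes the first semantic route without any change to the doctrine table.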

---

## Component model

```text
Q / User
  -> EC Search Orchestrator
      -> Project / Matter Resolver
      -> Search Route Registry
      -> Route Scorer / Policy Filter
      -> Retrieval Provider Client
           -> Microsoft live route(s)
           -> LlamaIndex Sidecar
           -> QMD / internal search
           -> OpenClaw local file path
      -> Result Merger / Reranker
      -> Q result + telemetry
```

LlamaIndex sidecar itself:

```text
LlamaIndex Sidecar
  ├─ Corpus Registry
  ├─ Microsoft Reader adapters
  │    ├─ OneDrive reader
  │    └─ SharePoint reader
  ├─ Local file reader (optional)
  ├─ Ingestion / chunking pipeline
  ├─ Vector store / docstore
  ├─ Query engine(s)
  ├─ Health / sync status
  └─ HTTP API
```

---

## Corpus model

The sidecar should not be one giant undifferentiated index. It should manage **named corpora**.

Examples:
- `active_matters`
- `johnson_matter`
- `brief_bank`
- `prelit_bank`
- `selected_work_product`
- `mixed_litigation_authorities`

Each corpus should have explicit configuration.

### Canonical corpus-binding ownership

Canonical corpus-binding metadata should be owned by EC and stored in ELNOR-managed config.

Suggested path:

```text
ELNOR_MEMORY/system/retrieval/llamaindex/corpus_bindings_current.json
ELNOR_MEMORY/system/retrieval/llamaindex/provider_health_current.json
ELNOR_MEMORY/system/retrieval/llamaindex/query_stats_current.json
```

These are **metadata/config/read-models** owned by EC.

### Sidecar-owned index state

The sidecar’s actual index data should live outside canonical memory:

```text
ELNOR_STATE/llamaindex/
  corpora/
  vector_store/
  docstore/
  sync_jobs/
  cache/
```

This preserves single-writer discipline for canonical memory.
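A sidecar-side guard can enforce this split mechanically by refusing any write outside its own runtime root. This is a minimal sketch under stated assumptions: the sidecar root mirrors the `LLAMAINDEX_STORAGE_ROOT` default used elsewhere in this spec, the canonical-memory location `/srv/elnor/ELNOR_MEMORY` is a placeholder, and `assert_sidecar_may_write` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Placeholder roots (assumptions): the sidecar root mirrors the
# LLAMAINDEX_STORAGE_ROOT default; the canonical-memory root is hypothetical.
CANONICAL_MEMORY_ROOT = PurePosixPath("/srv/elnor/ELNOR_MEMORY")
SIDECAR_STATE_ROOT = PurePosixPath("/var/lib/elnor/llamaindex")
SIDECAR_SUBDIRS = {"corpora", "vector_store", "docstore", "sync_jobs", "cache", "logs"}


def assert_sidecar_may_write(target: str) -> PurePosixPath:
    """Raise PermissionError unless target lies in a sidecar-owned subdirectory."""
    path = PurePosixPath(target)
    # Pure paths do not collapse "..", so reject it explicitly.
    if ".." in path.parts:
        raise PermissionError(f"path traversal not allowed: {target}")
    if path.is_relative_to(CANONICAL_MEMORY_ROOT):
        raise PermissionError(f"canonical memory is EC-owned: {target}")
    if not path.is_relative_to(SIDECAR_STATE_ROOT):
        raise PermissionError(f"outside sidecar state root: {target}")
    relative = path.relative_to(SIDECAR_STATE_ROOT)
    # Writes must land inside one of the known runtime subdirectories.
    if not relative.parts or relative.parts[0] not in SIDECAR_SUBDIRS:
        raise PermissionError(f"unknown sidecar subdirectory: {target}")
    return path
```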

---

## Required schemas

### A. EC-facing shared contracts (TypeScript / Zod)

```ts
// packages/contracts/src/retrieval/llamaindex.ts
import { z } from "zod";

export const LlamaIndexSourceTypeSchema = z.enum([
  "onedrive",
  "sharepoint",
  "local",
  "mixed",
]);

export const LlamaIndexSyncModeSchema = z.enum([
  "manual",
  "scheduled",
  "on_demand",
]);

export const LlamaIndexCorpusBindingSchema = z.object({
  corpus_id: z.string().max(120),
  title: z.string().max(200),
  source_type: LlamaIndexSourceTypeSchema,
  project_id: z.string().max(120).optional(),
  matter_id: z.string().max(120).optional(),
  source_binding_ref: z.string().max(240),
  root_locator: z.object({
    drive_id: z.string().optional(),
    site_id: z.string().optional(),
    folder_id: z.string().optional(),
    folder_path: z.string().optional(),
    local_path: z.string().optional(),
  }),
  include_patterns: z.array(z.string().max(240)).default([]),
  exclude_patterns: z.array(z.string().max(240)).default([]),
  metadata_defaults: z.record(z.string(), z.unknown()).default({}),
  sync_mode: LlamaIndexSyncModeSchema.default("manual"),
  enabled: z.boolean().default(true),
  schema_version: z.literal(1),
});

export const LlamaIndexQueryRequestSchema = z.object({
  corpus_ids: z.array(z.string().max(120)).min(1),
  query: z.string().max(4000),
  top_k: z.number().int().min(1).max(50).default(8),
  filters: z.record(z.string(), z.unknown()).default({}),
  mode: z.enum([
    "semantic",
    "semantic_with_keyword_boost",
    "hybrid",
  ]).default("hybrid"),
  include_chunks: z.boolean().default(true),
  include_documents: z.boolean().default(true),
  include_debug: z.boolean().default(false),
  trace_id: z.string().max(120).optional(),
  schema_version: z.literal(1),
});

export const LlamaIndexResultChunkSchema = z.object({
  chunk_id: z.string().max(160),
  doc_id: z.string().max(160),
  text: z.string(),
  score: z.number(),
  metadata: z.record(z.string(), z.unknown()).default({}),
  start_offset: z.number().int().optional(),
  end_offset: z.number().int().optional(),
});

export const LlamaIndexResultDocumentSchema = z.object({
  doc_id: z.string().max(160),
  title: z.string().max(240).optional(),
  source_uri: z.string().max(500).optional(),
  path: z.string().max(500).optional(),
  last_modified_at: z.string().datetime().optional(),
  metadata: z.record(z.string(), z.unknown()).default({}),
});

export const LlamaIndexQueryResponseSchema = z.object({
  corpus_ids: z.array(z.string().max(120)),
  hits: z.array(z.object({
    document: LlamaIndexResultDocumentSchema,
    chunks: z.array(LlamaIndexResultChunkSchema).default([]),
    aggregate_score: z.number(),
  })),
  retrieval_mode: z.enum(["semantic", "semantic_with_keyword_boost", "hybrid"]),
  freshness: z.object({
    indexed_at: z.string().datetime().optional(),
    stale: z.boolean().default(false),
    stale_reason: z.string().max(240).optional(),
  }),
  trace_id: z.string().max(120).optional(),
  schema_version: z.literal(1),
});

export const LlamaIndexHealthSchema = z.object({
  service_status: z.enum(["healthy", "degraded", "disabled", "unknown"]),
  corpora: z.array(z.object({
    corpus_id: z.string(),
    status: z.enum(["healthy", "stale", "syncing", "error", "disabled", "unknown"]),
    last_indexed_at: z.string().datetime().optional(),
    document_count: z.number().int().nonnegative().default(0),
    chunk_count: z.number().int().nonnegative().default(0),
    last_error: z.string().max(240).optional(),
  })).default([]),
  schema_version: z.literal(1),
});
```
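For callers outside EC (for example, a test harness or a maintenance script), a minimal stdlib sketch that applies the same defaults and bounds as `LlamaIndexQueryRequestSchema` might look like this. `build_query_request` is a hypothetical helper; the Zod schema remains the contract.

```python
from typing import Any

QUERY_MODES = ("semantic", "semantic_with_keyword_boost", "hybrid")


def build_query_request(corpus_ids: list[str], query: str, **overrides: Any) -> dict[str, Any]:
    """Build a /query payload with schema defaults, then check schema bounds."""
    request: dict[str, Any] = {
        "corpus_ids": corpus_ids,
        "query": query,
        "top_k": 8,
        "filters": {},
        "mode": "hybrid",
        "include_chunks": True,
        "include_documents": True,
        "include_debug": False,
        "schema_version": 1,
    }
    request.update(overrides)  # trace_id stays absent unless supplied
    if not request["corpus_ids"]:
        raise ValueError("corpus_ids must contain at least one corpus")
    if any(len(cid) > 120 for cid in request["corpus_ids"]):
        raise ValueError("corpus_id longer than 120 characters")
    if len(request["query"]) > 4000:
        raise ValueError("query longer than 4000 characters")
    top_k = request["top_k"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(top_k, bool) or not isinstance(top_k, int) or not 1 <= top_k <= 50:
        raise ValueError("top_k must be an integer in [1, 50]")
    if request["mode"] not in QUERY_MODES:
        raise ValueError(f"unsupported mode: {request['mode']}")
    return request
```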

### B. Python service internal models (Pydantic)

```python
# services/llamaindex-sidecar/app/models.py
from pydantic import BaseModel, Field
from typing import Any, Literal

class CorpusBinding(BaseModel):
    corpus_id: str
    title: str
    source_type: Literal["onedrive", "sharepoint", "local", "mixed"]
    project_id: str | None = None
    matter_id: str | None = None
    source_binding_ref: str
    root_locator: dict[str, Any]
    include_patterns: list[str] = []
    exclude_patterns: list[str] = []
    metadata_defaults: dict[str, Any] = {}
    sync_mode: Literal["manual", "scheduled", "on_demand"] = "manual"
    enabled: bool = True
    schema_version: Literal[1] = 1

class QueryRequest(BaseModel):
    corpus_ids: list[str]
    query: str
    top_k: int = Field(default=8, ge=1, le=50)
    filters: dict[str, Any] = {}
    mode: Literal["semantic", "semantic_with_keyword_boost", "hybrid"] = "hybrid"
    include_chunks: bool = True
    include_documents: bool = True
    include_debug: bool = False
    trace_id: str | None = None
    schema_version: Literal[1] = 1
```

---

## Service layout

Recommended repo layout:

```text
services/llamaindex-sidecar/
  pyproject.toml
  requirements.in
  requirements.lock
  app/
    main.py
    config.py
    models.py
    auth.py
    logging.py
    corpus_registry.py
    health.py
    sync.py
    query.py
    storage.py
    routes.py
    connectors/
      onedrive_reader.py
      sharepoint_reader.py
      local_reader.py
    ingestion/
      chunking.py
      metadata.py
      pipeline.py
    vector/
      backend.py
      chroma_backend.py
```

### Why Python sidecar
Default to **Python** because upstream OneDrive and SharePoint reader support is mature there. The TypeScript side should talk to the sidecar over HTTP rather than reimplementing the ingestion/retrieval stack.

---

## Installation and dependency guidance

Recommended minimum stack:

- Python 3.11+
- FastAPI
- uvicorn
- pydantic
- httpx
- msal
- llama-index
- Microsoft OneDrive reader package
- Microsoft SharePoint reader package
- chosen vector backend

### Dependency policy

Because LlamaIndex packaging changes quickly, the coding agent should:

1. resolve current upstream package names for:
   - core `llama-index`
   - OneDrive reader
   - SharePoint reader
   - vector backend adapter
2. pin exact versions in `requirements.lock`
3. never float major versions implicitly
4. update only via explicit reviewed dependency bumps

### Default backend choice

Recommended MVP default:
- **local persistent Chroma** backend

Reason:
- simple local deployment
- no extra external service required
- sufficient for first sidecar implementation

Future optional backends:
- Qdrant
- LlamaCloud Index
- custom managed vector store

---

## Environment and secrets

```text
LLAMAINDEX_SIDECAR_HOST=127.0.0.1
LLAMAINDEX_SIDECAR_PORT=8094
LLAMAINDEX_STORAGE_ROOT=/var/lib/elnor/llamaindex
LLAMAINDEX_VECTOR_BACKEND=local_chroma
LLAMAINDEX_LOG_LEVEL=INFO

M365_TENANT_ID=...
M365_CLIENT_ID=...
M365_CLIENT_SECRET_REF=secret://m365/llamaindex/client-secret
M365_USER_PRINCIPAL_NAME=...

LLAMAINDEX_ENABLE_ONEDRIVE=true
LLAMAINDEX_ENABLE_SHAREPOINT=true
```

### Secret rule
Do not store raw secrets in ELNOR memory.
Use:
- OS keychain
- secrets manager
- or approved runtime secret store

EC stores only secret references, not secret values.
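A minimal sketch of loading the variables above while honoring the secret rule: the loader accepts only a `secret://` reference for the Microsoft client secret and never reads a secret value. `SidecarConfig` and `load_config` are hypothetical; the defaults mirror the example values above, except the connector flags, which this sketch assumes are off when unset. Resolving the reference against a keychain or secrets manager is out of scope.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SidecarConfig:
    host: str
    port: int
    storage_root: str
    vector_backend: str
    enable_onedrive: bool
    enable_sharepoint: bool
    client_secret_ref: str | None  # a reference such as secret://..., never a value


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Assumption: unset flags mean "off".
    return env.get(name, "false").strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Mapping[str, str]) -> SidecarConfig:
    enable_onedrive = _flag(env, "LLAMAINDEX_ENABLE_ONEDRIVE")
    enable_sharepoint = _flag(env, "LLAMAINDEX_ENABLE_SHAREPOINT")
    secret_ref = env.get("M365_CLIENT_SECRET_REF")
    if (enable_onedrive or enable_sharepoint) and not (secret_ref or "").startswith("secret://"):
        # Refuse a missing reference and anything that looks like a raw secret value.
        raise ValueError("M365_CLIENT_SECRET_REF must be a secret:// reference")
    return SidecarConfig(
        host=env.get("LLAMAINDEX_SIDECAR_HOST", "127.0.0.1"),
        port=int(env.get("LLAMAINDEX_SIDECAR_PORT", "8094")),
        storage_root=env.get("LLAMAINDEX_STORAGE_ROOT", "/var/lib/elnor/llamaindex"),
        vector_backend=env.get("LLAMAINDEX_VECTOR_BACKEND", "local_chroma"),
        enable_onedrive=enable_onedrive,
        enable_sharepoint=enable_sharepoint,
        client_secret_ref=secret_ref,
    )
```

In the service this would typically be called once at startup as `load_config(os.environ)`.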

---

## Connector strategy

### OneDrive
Use the official LlamaIndex OneDrive reader for corpus ingestion.

Required capabilities:
- scoped folder ingest by `folder_id` or `folder_path`
- recursive ingest for matter folders
- file metadata capture
- extension filtering
- manual and scheduled refresh

### SharePoint
Use the official LlamaIndex SharePoint reader for corpus ingestion.

Required capabilities:
- scoped site + drive + folder ingest
- folder or site-page support when needed
- optional permission metadata attachment
- folder / drive scoping
- manual and scheduled refresh

### Local fallback corpus
Optionally support `SimpleDirectoryReader` or a local reader for:
- synced OneDrive/SharePoint mirrors
- brief-bank exports
- local work-product archives

---

## Ingestion pipeline rules

1. Ingest only explicitly bound corpora.
2. Attach metadata to every document/chunk:
   - corpus_id
   - source_type
   - project_id / matter_id if available
   - source path
   - site/drive/folder identifiers
   - modified time
   - indexed time
   - document type / extension
3. Use deterministic chunking policy:
   - legal briefs and motions should preserve section headers if possible
   - chunk overlap required
   - chunk size and overlap configurable
4. Store provenance on every chunk.
5. Never drop corpus provenance.

Recommended chunk policy defaults:
- `chunk_size_tokens = 600`
- `chunk_overlap_tokens = 120`

---

## Query modes and when to use them

### `semantic`
Use for:
- concept search
- argument search
- issue search
- case-theory search

### `semantic_with_keyword_boost`
Use for:
- concept search where key terms also matter
- named doctrines or legal standards
- finding concepts inside one matter

### `hybrid`
Default for:
- most legal matter searches
- mixed exact + conceptual requests
- cross-document synthesis

---

## Query endpoint

```text
POST /query
```

### FastAPI implementation skeleton

```python
# services/llamaindex-sidecar/app/routes.py
from fastapi import APIRouter, HTTPException
from .models import QueryRequest
from .query import run_query

router = APIRouter()

@router.post("/query")
async def query_endpoint(request: QueryRequest):
    try:
        return await run_query(request)
    except ValueError as exc:
        raise HTTPException(status_code=400, detail=str(exc)) from exc
    except PermissionError as exc:
        raise HTTPException(status_code=403, detail=str(exc)) from exc
    except Exception as exc:
        # Chain the original exception so server-side logs keep the traceback.
        raise HTTPException(status_code=500, detail=f"llamaindex_query_failed: {exc}") from exc
```

---

## Health endpoint

```text
GET /health
```

```python
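# services/llamaindex-sidecar/app/routes.py (continued; `router` is the APIRouter defined above)
from .health import get_health_status


@router.get("/health")
async def health_endpoint():
    # Service-level and per-corpus status, shaped like LlamaIndexHealthSchema.
    return await get_health_status()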
```

---

## Sync endpoints

```text
POST /corpora/:corpusId/refresh
GET  /corpora
GET  /corpora/:corpusId
```

### Behavior
- manual refresh starts a sync job
- sync job updates index and freshness metadata
- response should include:
  - job id
  - corpus id
  - accepted status
  - expected mode (`manual`/`scheduled`)

---

## Storage and persistence

### EC-owned metadata/config
```text
ELNOR_MEMORY/system/retrieval/llamaindex/
```

### Sidecar-owned runtime storage
```text
/var/lib/elnor/llamaindex/
```

Subdirectories:
```text
corpora/
vector_store/
docstore/
sync_jobs/
cache/
logs/
```

### Retention
- keep sync job history for 30 days by default
- keep per-query request logs only in redacted form
- no raw document text in logs
- chunk/document content lives in vector/docstore only

---

## EC integration

### A. New retrieval provider client

```ts
// apps/ec-service/src/retrieval/providers/llamaindex-client.ts
import { LlamaIndexHealthSchema, LlamaIndexQueryRequestSchema, LlamaIndexQueryResponseSchema } from "@contracts/retrieval/llamaindex";
import { z } from "zod";

export interface LlamaIndexClientConfig {
  baseUrl: string;
  timeoutMs: number;
}

export class LlamaIndexClient {
  constructor(private readonly config: LlamaIndexClientConfig) {}

  // z.input lets callers omit fields that the schema fills with defaults.
  async query(input: z.input<typeof LlamaIndexQueryRequestSchema>) {
    const parsed = LlamaIndexQueryRequestSchema.parse(input);
    const res = await fetch(`${this.config.baseUrl}/query`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(parsed),
      signal: AbortSignal.timeout(this.config.timeoutMs),
    });

    if (!res.ok) {
      throw new Error(`llamaindex_query_failed:${res.status}`);
    }

    const json = await res.json();
    return LlamaIndexQueryResponseSchema.parse(json);
  }

  async health() {
    const res = await fetch(`${this.config.baseUrl}/health`, {
      method: "GET",
      signal: AbortSignal.timeout(this.config.timeoutMs),
    });

    if (!res.ok) {
      throw new Error(`llamaindex_health_failed:${res.status}`);
    }

    const json = await res.json();
    return LlamaIndexHealthSchema.parse(json);
  }
}
```

### B. Search orchestrator integration

Add route kind:
```ts
import { z } from "zod";

export const SearchProviderKindSchema = z.enum([
  "m365_drive_search",
  "m365_graph_search",
  "m365_retrieval_api",
  "qmd_local_semantic",
  "llamaindex_index",
  "openclaw_local_fs",
  "browser_fallback",
]);
```

Add evaluation logic:
- only eligible if corpus binding exists
- only eligible if health is healthy/degraded
- prefer for semantic or cross-corpus search
- not first choice for exact file lookup in a known Microsoft folder unless no better live route is available
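Sketched in Python for brevity (EC would implement this in TypeScript), the four eligibility rules might look like the following. The inputs are simplified, hypothetical stand-ins for EC's binding and health read-models, and `may_lead` models "not first choice" as a ranking constraint rather than outright exclusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eligibility:
    eligible: bool
    may_lead: bool  # False: usable only as a fallback behind another route
    reason: str


def llamaindex_eligibility(
    requested_corpora: list[str],
    bound_corpora: set[str],
    service_status: str,
    query_class: str,
    better_live_route_available: bool,
) -> Eligibility:
    # Every requested corpus needs an explicit binding.
    if not requested_corpora or any(c not in bound_corpora for c in requested_corpora):
        return Eligibility(False, False, "no_corpus_binding")
    # Only healthy or degraded services participate.
    if service_status not in {"healthy", "degraded"}:
        return Eligibility(False, False, f"service_{service_status}")
    # Exact lookups defer to a better live route when one exists.
    if query_class == "exact_known_file" and better_live_route_available:
        return Eligibility(True, False, "exact_route_preferred")
    # Semantic and cross-corpus queries (or exact lookups with no better route) may lead.
    return Eligibility(True, True, "eligible")
```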

---

## Search orchestrator policy

### Default route recommendations

**Exact document lookup in known matter folder**
- `m365_drive_search`
- `m365_graph_search`
- `openclaw_local_fs`
- `browser_fallback`

**Semantic issue/argument retrieval in Microsoft matter docs**
- `m365_retrieval_api`
- `llamaindex_index`
- `qmd_local_semantic`
- `browser_fallback`

**Cross-corpus work-product search**
- `llamaindex_index`
- `qmd_local_semantic`
- `openclaw_local_fs`

### Adaptive learning
ELNOR may adapt route scores over time based on:
- user opened result
- user accepted result
- zero-hit failure
- wrong-result correction
- stale corpus
- auth failure
- timeout

But learning is bounded:
- policy cannot be overridden
- disabled providers stay disabled
- score changes are bounded
- user may pin/prefer/ban routes

---

## User controls / UI

Recommended Q surfaces:

### Settings → Retrieval Providers
Show:
- provider name
- status
- corpora count
- last indexed
- stale flag
- enable/disable toggle
- provider notes

### Settings → Semantic Corpora
For each corpus:
- title
- source type
- project/matter binding
- source path / folder label
- sync mode
- last refreshed
- document count
- chunk count
- actions:
  - refresh
  - disable
  - inspect
  - remove

### Search results UI additions
Show:
- source chip (`Microsoft Live`, `LlamaIndex`, `QMD`, `Local`)
- corpus chip
- freshness chip
- “why this route” tooltip
- “compare another route” action in advanced mode

### Matter search advanced panel
Allow:
- force route once
- pin preferred route for this matter
- disable route for this matter
- debug route scoring

---

## Telemetry

Emit:

```text
retrieval.provider.selected
retrieval.provider.fallback
retrieval.provider.degraded
retrieval.provider.timeout
retrieval.corpus.refreshed
retrieval.corpus.stale
retrieval.result.accepted
retrieval.result.rejected
retrieval.route.learned
```

Each event should include:
- `trace_id`
- `provider_kind`
- `corpus_id[]`
- `project_id?`
- `matter_id?`
- `query_class`
- `latency_ms`
- `result_count`
- `decision_reason_codes[]`
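A hypothetical event builder that checks the event name against the list above and emits the required fields. The dict shape, the `event` key, and omitting (rather than nulling) the optional `project_id` and `matter_id` are assumptions.

```python
from __future__ import annotations

from typing import Any

EVENT_NAMES = frozenset({
    "retrieval.provider.selected",
    "retrieval.provider.fallback",
    "retrieval.provider.degraded",
    "retrieval.provider.timeout",
    "retrieval.corpus.refreshed",
    "retrieval.corpus.stale",
    "retrieval.result.accepted",
    "retrieval.result.rejected",
    "retrieval.route.learned",
})


def build_retrieval_event(
    name: str,
    *,
    trace_id: str,
    provider_kind: str,
    corpus_ids: list[str],
    query_class: str,
    latency_ms: int,
    result_count: int,
    decision_reason_codes: list[str],
    project_id: str | None = None,
    matter_id: str | None = None,
) -> dict[str, Any]:
    if name not in EVENT_NAMES:
        raise ValueError(f"unknown telemetry event: {name}")
    if latency_ms < 0 or result_count < 0:
        raise ValueError("latency_ms and result_count must be non-negative")
    event: dict[str, Any] = {
        "event": name,
        "trace_id": trace_id,
        "provider_kind": provider_kind,
        "corpus_id": list(corpus_ids),
        "query_class": query_class,
        "latency_ms": latency_ms,
        "result_count": result_count,
        "decision_reason_codes": list(decision_reason_codes),
    }
    # Optional identifiers are omitted rather than sent as null.
    if project_id is not None:
        event["project_id"] = project_id
    if matter_id is not None:
        event["matter_id"] = matter_id
    return event
```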

---

## Risks and mitigations

### Risk 1 — Duplicate search ecosystems
**Mitigation:** LlamaIndex remains one provider among several, not a replacement for QMD or Microsoft live retrieval.

### Risk 2 — Stale semantic index
**Mitigation:** freshness metadata, manual refresh, scheduled refresh, stale warnings, and route scoring penalties.

### Risk 3 — Permission drift
**Mitigation:** connector health checks; do not treat sidecar index as live access truth for permissions.

### Risk 4 — Massive over-indexing
**Mitigation:** explicit corpus bindings only; no whole-tenant indexing by default.

### Risk 5 — Single-writer violations
**Mitigation:** sidecar writes only its own runtime storage; EC owns canonical config and any proposals.

### Risk 6 — Legal/privacy concerns
**Mitigation:** local/OSS sidecar default; no LlamaCloud or remote managed service by default unless explicitly approved.

---

## Suggested implementation sequence

### Phase 1 — sidecar MVP
- Python service
- OneDrive / SharePoint readers
- local persistent vector backend
- manual corpus refresh
- `/query` + `/health`
- EC client integration
- no adaptive learning yet

### Phase 2 — routed semantic provider
- Search Orchestrator route kind
- corpus bindings
- telemetry
- Q provider/corpus UI
- route-aware result receipts

### Phase 3 — learning and refinement
- adaptive scoring
- query-class route learning
- project/matter route pinning
- compare-route debugging
- broader corpora

### Phase 4 — optional managed modes
- optional LlamaCloud support
- optional advanced reranking
- optional multi-index / fusion retrieval packs

---

## Acceptance criteria

DOC18 is complete when:

1. A coding agent can install and run the sidecar from the spec without guessing core architecture.
2. OneDrive and SharePoint corpora can be ingested into named corpora.
3. The sidecar can return semantic/hybrid query results with provenance.
4. EC can call the sidecar through a dedicated provider client.
5. ELNOR can route to the sidecar as one provider among several.
6. Q can display provider/corpus health and route usage.
7. The sidecar does not become canonical memory or project identity.
8. Disabling the sidecar degrades cleanly to other search providers.

---

## Recommended decision for now

Adopt DOC18 as a dedicated subsystem spec.

Do **not** bury LlamaIndex details inside DOC3 core text.
Do **not** leave it as a vague punch-list item only.
Use DOC18 for implementation, and DOC3 only for the architectural hook.


---

# Part 2 — Merged Revision — DOC18 LlamaIndex Retrieval Sidecar R1.1 (Retrieval Lane and Topology Alignment)



# DOC18 — LlamaIndex Retrieval Sidecar (R1.1)

**Date:** 2026-03-10  
**Status:** targeted owner-doc revision — retrieval-lane truth and graph/topology alignment  
**Supersedes:** DOC18 R1  
**Scope rule:** This is a focused R1.1 revision. Unchanged portions of R1 remain operative unless replaced below.

---

## What changed in R1.1

R1.1 keeps the original sidecar architecture and adds the missing control-plane details needed to keep retrieval coherent with the new graph/topology amendment wave:

- formal **retrieval-lane truth** for `llamaindex_index`,
- provider-level **retrieval receipt fields** that route traces and Q surfaces can show consistently,
- explicit **corpus health/current-view** exports owned by EC,
- stronger boundary text so LlamaIndex does **not** drift into canonical memory, project identity, permission truth, or graph truth,
- explicit metadata and reason-code support so broader graph/topology read-models can consume sidecar results without making the sidecar the owner of graph state,
- clarified UI and telemetry requirements for provider truth, corpus truth, stale/degraded truth, and “why this route?” explanation.

---

## 1. Purpose (unchanged, but restated)

DOC18 defines a **dedicated LlamaIndex-based retrieval sidecar** for ELNOR.

It exists to provide:

- semantic retrieval over selected document corpora,
- folder- and matter-scoped retrieval over OneDrive / SharePoint / local mirrors,
- metadata-aware and cross-corpus retrieval when a live/exact route is not the best fit,
- a controlled sidecar implementation that **enhances** ELNOR without replacing:
  - ELNOR Core canonical memory,
  - DOC7 Context Buckets,
  - QMD / internal search,
  - OpenClaw local file/runtime capabilities,
  - Microsoft live APIs / MCP connectors.

---

## 2. Architectural decision (clarified)

### 2.1 R1.1 owner split

**DOC18 owns:**
- the LlamaIndex sidecar itself,
- corpus ingestion / sync / query behavior,
- provider-level health and corpus-level health,
- provider-specific receipt fields,
- provider-specific route reason codes,
- sidecar API contracts,
- sidecar storage and retention.

**DOC18 does not own:**
- canonical memory,
- project/matter identity,
- permission truth,
- OpenClaw native memory/runtime search,
- the broader graph/topology read-model,
- route-trace persistence,
- suite-wide retrieval receipts,
- Q global “search mode” or route-debug surfaces.

Those other items remain owned by their respective docs.

### 2.2 Retrieval-lane doctrine

`llamaindex_index` belongs to the **semantic corpus lane**.

It is the right fit for:
- matter folder semantic retrieval,
- brief-bank / pre-litigation-bank search,
- selected work-product search,
- cross-corpus semantic search,
- controlled fallback semantic retrieval.

It is **not** the default first-choice path for:
- exact known-file lookup,
- live metadata truth,
- live permissions truth,
- “what exists right now in Microsoft?” truth,
- canonical memory retrieval.

### 2.3 Graph/topology boundary

The broader graph/topology layer may:
- consume sidecar results,
- use corpus metadata,
- use result reason codes,
- use result provenance and stable aliases,
- use sidecar freshness/degraded state in explanations.

The sidecar may **not**:
- invent canonical topology nodes/edges as truth,
- silently write graph truth into ELNOR canonical storage,
- act as the owner of contradiction/supersession state,
- become the system’s universal memory substrate.

---

## 3. Provider truth and retrieval receipts

### 3.1 Provider-level receipt fields

DOC18 must export provider-level receipt fields that can be lifted into route traces, Q receipts, and CIL/advisor explanations.

```ts
// packages/contracts/src/retrieval/provider-receipts.ts
import { z } from "zod";

export const RetrievalLaneSchema = z.enum([
  "exact_live_lookup",
  "semantic_corpus",
  "canonical_memory",
  "native_runtime_local",
  "browser_fallback",
]);

export const RetrievalProviderReceiptSchema = z.object({
  provider_kind: z.literal("llamaindex_index"),
  search_lane: z.literal("semantic_corpus"),
  corpus_ids: z.array(z.string().max(120)).default([]),
  route_reason_codes: z.array(z.string().max(120)).default([]),
  freshness_state: z.enum(["fresh", "stale", "unknown"]).default("unknown"),
  degraded_reason: z.string().max(240).optional(),
  route_note: z.string().max(240).optional(),
  topology_enrichment_status: z.enum([
    "not_requested",
    "eligible",
    "applied_by_consumer",
    "suppressed",
    "unknown",
  ]).default("unknown"),
  schema_version: z.literal(1),
});
```

### 3.2 Route reason codes required from DOC18

DOC18 should define provider-specific reason codes including at least:

- `llamaindex_corpus_match`
- `llamaindex_cross_corpus_query`
- `llamaindex_semantic_best_fit`
- `llamaindex_hybrid_best_fit`
- `llamaindex_exact_route_not_available`
- `llamaindex_stale_penalty`
- `llamaindex_health_degraded`
- `llamaindex_corpus_disabled`
- `llamaindex_topology_metadata_present`

These codes are provider truth, not UI prose. Downstream docs may render user-facing explanations from them.
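To show how routing context could map onto these codes, here is a hypothetical derivation. The code strings are the ones listed above; the `RouteContext` fields and the rule for when each code applies are assumptions, for example that any `mode` other than `hybrid` earns `llamaindex_semantic_best_fit`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteContext:
    corpus_ids: tuple[str, ...]
    mode: str  # "semantic", "semantic_with_keyword_boost" or "hybrid"
    corpus_stale: bool = False
    service_degraded: bool = False
    corpus_disabled: bool = False
    exact_route_available: bool = True
    topology_metadata_present: bool = False


def llamaindex_reason_codes(ctx: RouteContext) -> list[str]:
    if ctx.corpus_disabled:
        # A disabled corpus short-circuits routing; no other code applies.
        return ["llamaindex_corpus_disabled"]
    codes = ["llamaindex_corpus_match"]
    if len(ctx.corpus_ids) > 1:
        codes.append("llamaindex_cross_corpus_query")
    codes.append("llamaindex_hybrid_best_fit" if ctx.mode == "hybrid" else "llamaindex_semantic_best_fit")
    if not ctx.exact_route_available:
        codes.append("llamaindex_exact_route_not_available")
    if ctx.corpus_stale:
        codes.append("llamaindex_stale_penalty")
    if ctx.service_degraded:
        codes.append("llamaindex_health_degraded")
    if ctx.topology_metadata_present:
        codes.append("llamaindex_topology_metadata_present")
    return codes
```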

### 3.3 Receipt rule

Every successful or failed sidecar query that participates in ELNOR routing must expose enough provider truth that EC can write a coherent route trace and Q can render a coherent receipt.

That means the response must contain, directly or derivably:

- `provider_kind`
- `search_lane`
- `corpus_ids`
- `route_reason_codes`
- `freshness_state`
- `degraded_reason?`

---

## 4. Corpus model and health read-models

### 4.1 Corpus model (clarified)

The sidecar still manages **named corpora** rather than one giant index.

Each corpus remains:

- explicitly bound,
- explicitly enabled/disabled,
- explicitly scoped,
- freshness-tracked,
- separately inspectable,
- reversible/removable.

### 4.2 EC-owned current views

The sidecar continues to own its runtime storage, but EC must maintain current-view read models for routing and Q.

```text
ELNOR_MEMORY/system/retrieval/llamaindex/corpus_bindings_current.json
ELNOR_MEMORY/system/retrieval/llamaindex/provider_health_current.json
ELNOR_MEMORY/system/retrieval/llamaindex/corpus_health_current.json
ELNOR_MEMORY/system/retrieval/llamaindex/query_stats_current.json
ELNOR_MEMORY/system/retrieval/llamaindex/route_receipts.jsonl
```
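The file list does not say how EC writes these views. Below is a minimal Node.js sketch under two assumptions: the `*_current.json` files are replaced wholesale, and `route_receipts.jsonl` is an append-only log with one JSON object per line. The function names are hypothetical. The temp-file-then-rename pattern relies on the temp file and target sitting on the same filesystem, which holds here because they share a directory.

```typescript
import { appendFileSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Replace a *_current.json view atomically: write a temp sibling, then rename it
// over the target so a concurrent reader (routing, Q) never sees a half-written file.
function writeCurrentView(targetPath: string, value: unknown): void {
  mkdirSync(dirname(targetPath), { recursive: true });
  const tmpPath = `${targetPath}.${process.pid}.${Date.now()}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(value, null, 2) + "\n", "utf8");
  renameSync(tmpPath, targetPath);
}

// Read a current view back; callers should still validate it with the matching schema.
function readCurrentView(targetPath: string): unknown {
  return JSON.parse(readFileSync(targetPath, "utf8"));
}

// Append one route receipt as a single JSONL line.
function appendRouteReceipt(jsonlPath: string, receipt: unknown): void {
  mkdirSync(dirname(jsonlPath), { recursive: true });
  appendFileSync(jsonlPath, JSON.stringify(receipt) + "\n", "utf8");
}
```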

### 4.3 New corpus-health schema

```ts
// packages/contracts/src/retrieval/llamaindex.ts
import { z } from "zod";

export const LlamaIndexCorpusHealthSnapshotSchema = z.object({
  corpus_id: z.string().max(120),
  provider_kind: z.literal("llamaindex_index"),
  status: z.enum(["healthy", "stale", "syncing", "error", "disabled", "unknown"]),
  freshness_state: z.enum(["fresh", "stale", "unknown"]).default("unknown"),
  last_indexed_at: z.string().datetime().optional(),
  last_sync_started_at: z.string().datetime().optional(),
  last_sync_finished_at: z.string().datetime().optional(),
  document_count: z.number().int().nonnegative().default(0),
  chunk_count: z.number().int().nonnegative().default(0),
  degraded_reason: z.string().max(240).optional(),
  last_error: z.string().max(240).optional(),
  schema_version: z.literal(1),
});

export const LlamaIndexProviderHealthCurrentSchema = z.object({
  service_status: z.enum(["healthy", "degraded", "disabled", "unknown"]),
  corpora: z.array(LlamaIndexCorpusHealthSnapshotSchema).default([]),
  updated_at: z.string().datetime(),
  schema_version: z.literal(1),
});
```
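The schema does not define how `service_status` relates to the per-corpus `status` values. One possible rollup is sketched below. It is a hypothetical policy, not a DOC18 rule: an unreachable sidecar reports `unknown` because EC did not observe its health; a provider whose bound corpora are all disabled reports `disabled`; any enabled corpus that is `error`, `stale`, or `unknown` makes the provider `degraded`. A `syncing` corpus is treated as healthy because it still serves its previous index.

```typescript
type CorpusStatus = "healthy" | "stale" | "syncing" | "error" | "disabled" | "unknown";
type ServiceStatus = "healthy" | "degraded" | "disabled" | "unknown";

interface CorpusHealthSnapshot {
  corpus_id: string;
  status: CorpusStatus;
}

// Hypothetical rollup from per-corpus snapshots to provider-level service_status.
function rollupServiceStatus(
  sidecarReachable: boolean,
  corpora: CorpusHealthSnapshot[],
): ServiceStatus {
  if (!sidecarReachable) return "unknown";
  const enabled = corpora.filter((c) => c.status !== "disabled");
  if (corpora.length > 0 && enabled.length === 0) return "disabled";
  const unhealthy = enabled.some(
    (c) => c.status === "error" || c.status === "stale" || c.status === "unknown",
  );
  return unhealthy ? "degraded" : "healthy";
}
```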

### 4.4 Corpus binding metadata additions

To support better retrieval and future topology-aware consumption, R1.1 adds optional metadata fields to corpus bindings.

```ts
export const LlamaIndexCorpusBindingSchema = z.object({
  corpus_id: z.string().max(120),
  title: z.string().max(200),
  source_type: LlamaIndexSourceTypeSchema,
  project_id: z.string().max(120).optional(),
  matter_id: z.string().max(120).optional(),
  source_binding_ref: z.string().max(240),
  root_locator: z.object({
    drive_id: z.string().optional(),
    site_id: z.string().optional(),
    folder_id: z.string().optional(),
    folder_path: z.string().optional(),
    local_path: z.string().optional(),
  }),
  include_patterns: z.array(z.string().max(240)).default([]),
  exclude_patterns: z.array(z.string().max(240)).default([]),
  metadata_defaults: z.record(z.string(), z.unknown()).default({}),
  motion_type_defaults: z.array(z.string().max(120)).default([]),
  document_role_defaults: z.array(z.string().max(120)).default([]),
  workflow_stage_defaults: z.array(z.string().max(120)).default([]),
  topology_labels: z.array(z.string().max(120)).default([]),
  sync_mode: LlamaIndexSyncModeSchema.default("manual"),
  enabled: z.boolean().default(true),
  schema_version: z.literal(1),
});
```

These fields do **not** make the sidecar the owner of legal taxonomy or graph truth. They are hint/provenance fields that help routing, filtering, and later topology-aware explanation.
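To keep those fields recognizably "hint, not truth" once they reach indexed documents, a binding's defaults could be attached under a provenance-tagged key. This is a minimal sketch under two assumptions: document-level metadata overrides `metadata_defaults`, and the list-valued defaults travel in a separate `binding_hints` object. `BindingHints`, `effectiveDocumentMetadata`, and the `binding_hints` key are all hypothetical.

```typescript
// Subset of LlamaIndexCorpusBindingSchema fields used here (sketch only).
interface BindingHints {
  corpus_id: string;
  metadata_defaults: Record<string, unknown>;
  motion_type_defaults: string[];
  document_role_defaults: string[];
  workflow_stage_defaults: string[];
  topology_labels: string[];
}

// Document metadata wins over binding defaults. List-valued defaults are copied
// under "binding_hints" with provenance, so consumers cannot mistake them for
// canonical legal taxonomy or graph truth.
function effectiveDocumentMetadata(
  binding: BindingHints,
  docMetadata: Record<string, unknown>,
): Record<string, unknown> {
  return {
    ...binding.metadata_defaults,
    ...docMetadata,
    binding_hints: {
      source: "corpus_binding",
      corpus_id: binding.corpus_id,
      motion_types: [...binding.motion_type_defaults],
      document_roles: [...binding.document_role_defaults],
      workflow_stages: [...binding.workflow_stage_defaults],
      topology_labels: [...binding.topology_labels],
    },
  };
}
```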

---

## 5. Query/request/response changes

### 5.1 Query request additions

```ts
export const LlamaIndexQueryRequestSchema = z.object({
  corpus_ids: z.array(z.string().max(120)).min(1),
  query: z.string().max(4000),
  top_k: z.number().int().min(1).max(50).default(8),
  filters: z.record(z.string(), z.unknown()).default({}),
  mode: z.enum([
    "semantic",
    "semantic_with_keyword_boost",
    "hybrid",
  ]).default("hybrid"),
  include_chunks: z.boolean().default(true),
  include_documents: z.boolean().default(true),
  include_debug: z.boolean().default(false),
  include_receipt_fields: z.boolean().default(true),
  intended_query_class: z.enum([
    "exact_document_lookup",
    "semantic_issue_lookup",
    "cross_corpus_lookup",
    "support_pack_lookup",
    "mixed_unknown",
  ]).default("mixed_unknown"),
  trace_id: z.string().max(120).optional(),
  schema_version: z.literal(1),
});
```
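For tooling that runs without zod (for example, a quick pre-flight check in a script), the request bounds above can be mirrored in plain TypeScript. This is a sketch. `QueryRequestDraft` and `requestViolations` are hypothetical, and the schema remains the authority. The checks correspond to `.min(1)` on `corpus_ids`, `.max(120)` on IDs and `trace_id`, `.max(4000)` on `query`, and the integer range 1 to 50 on `top_k` (default 8).

```typescript
type QueryMode = "semantic" | "semantic_with_keyword_boost" | "hybrid";
type QueryClass =
  | "exact_document_lookup"
  | "semantic_issue_lookup"
  | "cross_corpus_lookup"
  | "support_pack_lookup"
  | "mixed_unknown";

interface QueryRequestDraft {
  corpus_ids: string[];
  query: string;
  top_k?: number;
  mode?: QueryMode;
  intended_query_class?: QueryClass;
  trace_id?: string;
}

// Plain mirror of the bounds LlamaIndexQueryRequestSchema enforces.
function requestViolations(req: QueryRequestDraft): string[] {
  const out: string[] = [];
  if (req.corpus_ids.length < 1) {
    out.push("corpus_ids: at least one corpus is required");
  }
  for (const id of req.corpus_ids) {
    if (id.length > 120) out.push(`corpus_ids: id of length ${id.length} exceeds 120`);
  }
  if (req.query.length > 4000) out.push("query: exceeds 4000 characters");
  const topK = req.top_k ?? 8; // schema default
  if (!Number.isInteger(topK) || topK < 1 || topK > 50) {
    out.push("top_k: must be an integer from 1 to 50");
  }
  if (req.trace_id !== undefined && req.trace_id.length > 120) {
    out.push("trace_id: exceeds 120 characters");
  }
  return out;
}
```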

### 5.2 Query response additions

```ts
export const LlamaIndexTopologyMetadataSchema = z.object({
  stable_doc_aliases: z.array(z.string().max(240)).default([]),
  relation_hints: z.array(z.string().max(120)).default([]),
  neighbor_expansion_eligible: z.boolean().default(false),
  schema_version: z.literal(1),
});

export const LlamaIndexQueryResponseSchema = z.object({
  corpus_ids: z.array(z.string().max(120)),
  hits: z.array(z.object({
    document: LlamaIndexResultDocumentSchema,
    chunks: z.array(LlamaIndexResultChunkSchema).default([]),
    aggregate_score: z.number(),
    topology_metadata: LlamaIndexTopologyMetadataSchema.optional(),
  })),
  retrieval_mode: z.enum(["semantic", "semantic_with_keyword_boost", "hybrid"]),
  freshness: z.object({
    indexed_at: z.string().datetime().optional(),
    stale: z.boolean().default(false),
    stale_reason: z.string().max(240).optional(),
  }),
  provider_receipt: RetrievalProviderReceiptSchema.optional(),
  trace_id: z.string().max(120).optional(),
  schema_version: z.literal(1),
});
```

### 5.3 Response rule

If `include_receipt_fields=true`, the sidecar must return `provider_receipt` unless the request fails before the sidecar can determine route/provider truth.

If the sidecar is degraded or stale, it must say so explicitly.
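Both rules can be checked mechanically on any response that actually parsed. A request that failed before the sidecar could determine provider truth never yields a response body, so the exception does not apply here. The sketch below is hypothetical: `ResponseView` trims the response to the fields checked. Besides a missing receipt, it flags one inconsistency, a receipt that claims `fresh` while `freshness.stale` is true.

```typescript
// Trimmed view of LlamaIndexQueryResponseSchema for conformance checks (sketch).
interface ResponseView {
  freshness: { stale: boolean; stale_reason?: string };
  provider_receipt?: {
    freshness_state: "fresh" | "stale" | "unknown";
    degraded_reason?: string;
  };
}

// Hypothetical conformance check for the 5.3 response rule.
function responseRuleViolations(includeReceiptFields: boolean, res: ResponseView): string[] {
  const out: string[] = [];
  if (includeReceiptFields && res.provider_receipt === undefined) {
    out.push("provider_receipt missing although include_receipt_fields=true");
  }
  if (res.provider_receipt && res.freshness.stale && res.provider_receipt.freshness_state === "fresh") {
    out.push("provider_receipt claims fresh while freshness.stale=true");
  }
  return out;
}
```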

---

## 6. Metadata and topology boundary rules

### 6.1 What topology-aware metadata is allowed

Allowed:
- stable aliases for documents/corpora,
- motion/document/workflow metadata attached as corpus or document metadata,
- result-level “relation hints” such as `same_issue_candidate`, `same_matter`, `support_pack_candidate`,
- provenance back to corpus/document/chunk.

### 6.2 What topology-aware behavior is not allowed here

Not allowed:
- the sidecar deciding canonical contradiction truth,
- the sidecar deciding supersession truth,
- the sidecar mutating ELNOR topology snapshots,
- the sidecar writing relation edges into canonical memory or DocIndex current views,
- the sidecar making unbounded graph walks on behalf of consumers.

### 6.3 Consumer rule

If a downstream system wants graph-neighbor expansion, contradiction filtering, support-pack grouping, or supersession-aware display, it must use the graph/topology read-model owned elsewhere.

DOC18 only provides enough metadata and provenance to support that.
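As an illustration of that split, a consumer might turn sidecar hits into a bounded list of seed aliases and hand that list to the graph/topology read-model, so the sidecar itself never walks the graph. This is a minimal consumer-side sketch. `Hit` is a trimmed, hypothetical view of a response hit, and `maxSeeds` is an assumed cap, not a DOC18 parameter.

```typescript
// Mirrors LlamaIndexTopologyMetadataSchema fields (schema_version omitted).
interface TopologyMetadata {
  stable_doc_aliases: string[];
  relation_hints: string[];
  neighbor_expansion_eligible: boolean;
}

interface Hit {
  aggregate_score: number;
  topology_metadata?: TopologyMetadata;
}

// Consumer-side: collect up to maxSeeds distinct aliases from eligible hits,
// highest score first. Expansion itself happens against the topology read-model.
function expansionSeeds(hits: Hit[], maxSeeds: number): string[] {
  const seeds: string[] = [];
  const ranked = [...hits].sort((a, b) => b.aggregate_score - a.aggregate_score);
  for (const hit of ranked) {
    const meta = hit.topology_metadata;
    if (!meta || !meta.neighbor_expansion_eligible) continue;
    for (const alias of meta.stable_doc_aliases) {
      if (seeds.length >= maxSeeds) return seeds;
      if (!seeds.includes(alias)) seeds.push(alias);
    }
  }
  return seeds;
}
```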

---

## 7. Sidecar API additions

### 7.1 Current endpoints remain

- `POST /query`
- `GET /health`
- `POST /corpora/:corpusId/refresh`
- `GET /corpora`
- `GET /corpora/:corpusId`

### 7.2 Add corpus-health snapshot endpoint

```text
GET /corpora-health
```

Returns:
- provider service health,
- corpus health list,
- freshness/degraded notes,
- update timestamp.

### 7.3 Add receipt-friendly debug endpoint

```text
POST /query/debug
```

Use only in advanced/admin mode.

Returns:
- the normal query response,
- provider receipt fields,
- route reason codes,
- corpus health snapshot used at query time,
- selected query mode,
- filter summary.

This endpoint is not required for hot-path operation, but it is strongly recommended for implementation-time debugging and Q advanced mode.

### 7.4 Auth / availability rule

If the sidecar or corpus is unavailable:
- return a structured degraded/error response,
- never imply silent success,
- never let the consumer mistake “no sidecar result” for “no matching documents exist.”
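One way to make that distinction hard to lose is a discriminated union in which only outcomes that actually reached the sidecar carry a response. The sketch below assumes this shape; `SidecarOutcome`, `runQuery`, `describeOutcome`, and the returned label strings are all hypothetical.

```typescript
// Only "ok" and "degraded" carry a response, so an empty hit list is read as
// "no matches" only when the sidecar actually answered.
type SidecarOutcome<T> =
  | { kind: "ok"; response: T }
  | { kind: "degraded"; reason: string; response: T }
  | { kind: "unavailable"; reason: string };

// Wrap a sidecar call so transport or HTTP errors become an explicit
// "unavailable" outcome instead of an empty result.
async function runQuery<T>(
  call: () => Promise<T>,
  degradedReason: (response: T) => string | undefined,
): Promise<SidecarOutcome<T>> {
  try {
    const response = await call();
    const reason = degradedReason(response);
    return reason !== undefined ? { kind: "degraded", reason, response } : { kind: "ok", response };
  } catch (err) {
    return { kind: "unavailable", reason: err instanceof Error ? err.message : String(err) };
  }
}

// Hypothetical labels a consumer might log or render.
function describeOutcome<T extends { hits: unknown[] }>(outcome: SidecarOutcome<T>): string {
  switch (outcome.kind) {
    case "ok":
      return outcome.response.hits.length === 0 ? "no_matching_documents" : "results";
    case "degraded":
      return outcome.response.hits.length === 0 ? "no_hits_from_degraded_index" : "degraded_results";
    case "unavailable":
      return "sidecar_unavailable_not_searched";
  }
}
```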

---

## 8. EC integration changes

### 8.1 Provider client behavior

The EC-side provider client must parse and preserve `provider_receipt` when present.

```ts
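// apps/ec-service/src/retrieval/providers/llamaindex-client.ts
import { z } from "zod";
// Import specifier is illustrative; use the workspace alias for packages/contracts if one exists.
import {
  LlamaIndexQueryRequestSchema,
  LlamaIndexQueryResponseSchema,
} from "../../../../../packages/contracts/src/retrieval/llamaindex";

export interface LlamaIndexClientConfig {
  baseUrl: string; // sidecar base URL, without trailing slash
  timeoutMs: number; // per-request timeout passed to AbortSignal.timeout
}

export class LlamaIndexClient {
  constructor(private readonly config: LlamaIndexClientConfig) {}

  // z.input (not z.infer) so callers may omit fields that carry schema defaults.
  async query(
    input: z.input<typeof LlamaIndexQueryRequestSchema>,
  ): Promise<z.infer<typeof LlamaIndexQueryResponseSchema>> {
    const parsed = LlamaIndexQueryRequestSchema.parse(input);
    const res = await fetch(`${this.config.baseUrl}/query`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(parsed),
      signal: AbortSignal.timeout(this.config.timeoutMs),
    });

    if (!res.ok) {
      // Surface the status so EC records a degraded route, not "no matches".
      throw new Error(`llamaindex_query_failed:${res.status}`);
    }

    // Parsing with the response schema keeps provider_receipt when the sidecar returns it.
    const json: unknown = await res.json();
    return LlamaIndexQueryResponseSchema.parse(json);
  }
}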
```

### 8.2 Route-trace integration

Whenever `llamaindex_index` is chosen, EC must write route-trace fields that preserve:
- provider kind,
- lane,
- corpus IDs,
- reason codes,
- freshness state,
- degraded reason.
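A sketch of how EC might build that trace entry follows. When a receipt is present, the helper copies provider truth verbatim. When the sidecar failed before producing one, it writes a degraded entry instead of dropping the trace. `RouteTraceEntry`, its `receipt_source` field, and `toRouteTrace` are hypothetical. Falling back to `llamaindex_health_degraded` with `freshness_state: "unknown"` is an assumed policy.

```typescript
interface ProviderReceipt {
  provider_kind: string;
  search_lane: string;
  corpus_ids: string[];
  route_reason_codes: string[];
  freshness_state: "fresh" | "stale" | "unknown";
  degraded_reason?: string;
}

interface RouteTraceEntry {
  trace_id: string;
  provider_kind: string;
  search_lane: string;
  corpus_ids: string[];
  reason_codes: string[];
  freshness_state: "fresh" | "stale" | "unknown";
  degraded_reason?: string;
  receipt_source: "provider" | "derived_by_ec"; // hypothetical provenance marker
}

function toRouteTrace(
  traceId: string,
  chosenLane: string, // the lane EC selected for this query
  requestCorpusIds: string[],
  receipt: ProviderReceipt | undefined,
  failure?: string,
): RouteTraceEntry {
  if (receipt) {
    return {
      trace_id: traceId,
      provider_kind: receipt.provider_kind,
      search_lane: receipt.search_lane,
      corpus_ids: [...receipt.corpus_ids],
      reason_codes: [...receipt.route_reason_codes],
      freshness_state: receipt.freshness_state,
      ...(receipt.degraded_reason ? { degraded_reason: receipt.degraded_reason } : {}),
      receipt_source: "provider",
    };
  }
  // No receipt: record what EC knows and mark the route as degraded.
  return {
    trace_id: traceId,
    provider_kind: "llamaindex_index",
    search_lane: chosenLane,
    corpus_ids: [...requestCorpusIds],
    reason_codes: ["llamaindex_health_degraded"],
    freshness_state: "unknown",
    degraded_reason: failure ?? "no_provider_receipt",
    receipt_source: "derived_by_ec",
  };
}
```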

### 8.3 Current-view integration

EC must refresh `provider_health_current.json` and `corpus_health_current.json` on:
- startup,
- scheduled health polls,
- manual refresh completion,
- sidecar state changes.

---

## 9. Q surfaces and user experience

### 9.1 Settings → Retrieval Providers

Add:
- provider status,
- degraded/stale warning,
- corpus count,
- last health update,
- link to corpus list.

### 9.2 Settings → Semantic Corpora

Each corpus row should show:
- title,
- project/matter binding,
- source type,
- freshness chip,
- degraded chip if any,
- doc/chunk counts,
- actions: refresh / disable / inspect / remove.

### 9.3 Search result receipt surface

Any result influenced by DOC18 should be able to show:
- provider chip (`LlamaIndex`),
- lane chip (`Semantic Corpus`),
- corpus chips,
- freshness chip,
- route note / reason tooltip,
- degraded warning if stale or degraded.

### 9.4 Advanced mode

Advanced mode should be able to show:
- compare-route option,
- route reason codes,
- corpus health snapshot,
- query mode (`semantic`, `hybrid`, etc.),
- “topology enrichment eligible” if provided by result metadata.

---

## 10. Telemetry additions

Retain existing telemetry and add/clarify:

```text
retrieval.provider.selected
retrieval.provider.fallback
retrieval.provider.degraded
retrieval.provider.timeout
retrieval.corpus.refreshed
retrieval.corpus.stale
retrieval.corpus.health_changed
retrieval.result.accepted
retrieval.result.rejected
retrieval.route.learned
retrieval.receipt.rendered
```

Each event should include:
- `trace_id`
- `provider_kind`
- `search_lane`
- `corpus_id[]`
- `project_id?`
- `matter_id?`
- `query_class`
- `latency_ms`
- `result_count`
- `decision_reason_codes[]`
- `freshness_state`
- `degraded_reason?`
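The event names and field list can be pinned down as a type plus a light runtime check for events that arrive as JSON. This is a sketch. `RetrievalEvent` and `validateEvent` are hypothetical, the `corpus_id` array models `corpus_id[]` from the list above, and the rule that a `retrieval.provider.degraded` event must carry `degraded_reason` is an assumption.

```typescript
const RETRIEVAL_EVENTS = [
  "retrieval.provider.selected",
  "retrieval.provider.fallback",
  "retrieval.provider.degraded",
  "retrieval.provider.timeout",
  "retrieval.corpus.refreshed",
  "retrieval.corpus.stale",
  "retrieval.corpus.health_changed",
  "retrieval.result.accepted",
  "retrieval.result.rejected",
  "retrieval.route.learned",
  "retrieval.receipt.rendered",
] as const;

type RetrievalEventName = (typeof RETRIEVAL_EVENTS)[number];

interface RetrievalEvent {
  event: RetrievalEventName;
  trace_id: string;
  provider_kind: string;
  search_lane: string;
  corpus_id: string[];
  project_id?: string;
  matter_id?: string;
  query_class: string;
  latency_ms: number;
  result_count: number;
  decision_reason_codes: string[];
  freshness_state: "fresh" | "stale" | "unknown";
  degraded_reason?: string;
}

// Runtime checks for events deserialized from JSON (types alone cannot guarantee these).
function validateEvent(e: RetrievalEvent): string[] {
  const problems: string[] = [];
  if (!(RETRIEVAL_EVENTS as readonly string[]).includes(e.event)) {
    problems.push(`unknown event name: ${e.event}`);
  }
  if (!Number.isFinite(e.latency_ms) || e.latency_ms < 0) {
    problems.push("latency_ms must be a non-negative number");
  }
  if (!Number.isInteger(e.result_count) || e.result_count < 0) {
    problems.push("result_count must be a non-negative integer");
  }
  if (e.event === "retrieval.provider.degraded" && !e.degraded_reason) {
    problems.push("degraded event without degraded_reason");
  }
  return problems;
}
```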

---

## 11. File and module additions

### Contracts
```text
packages/contracts/src/retrieval/provider-receipts.ts
packages/contracts/src/retrieval/llamaindex.ts
```

### EC
```text
apps/ec-service/src/retrieval/providers/llamaindex-client.ts
apps/ec-service/src/retrieval/health/llamaindex-health-poller.ts
apps/ec-service/src/retrieval/receipts/write-retrieval-receipt.ts
apps/ec-service/src/retrieval/current-views/write-corpus-health-current.ts
```

### Sidecar
```text
services/llamaindex-sidecar/app/routes.py
services/llamaindex-sidecar/app/query.py
services/llamaindex-sidecar/app/health.py
services/llamaindex-sidecar/app/corpora.py
```

---

## 12. Risks and mitigations (updated)

### Risk 1 — Duplicate search ecosystems
Mitigation: keep LlamaIndex as one provider among several and surface provider truth in receipts.

### Risk 2 — Stale semantic index
Mitigation: explicit freshness/current views, stale warnings, and route penalties.

### Risk 3 — Permission drift
Mitigation: do not treat sidecar as live permissions truth.

### Risk 4 — Graph drift
Mitigation: sidecar never owns contradiction/supersession or broader topology truth.

### Risk 5 — “LlamaIndex becomes the memory system”
Mitigation: keep canonical memory retrieval in the EC-owned memory lane; sidecar remains corpus-scoped and non-canonical.

### Risk 6 — Provider truth goes invisible
Mitigation: provider receipt fields are mandatory for any sidecar-assisted result that reaches route traces/Q/CIL.

---

## 13. Acceptance criteria (expanded)

DOC18 R1.1 is complete when:

1. A coding agent can run the sidecar without guessing the lane/ownership split.
2. `llamaindex_index` produces provider receipt fields that downstream route traces can preserve.
3. Corpus health and freshness can be shown in Q without custom sidecar spelunking.
4. Sidecar-assisted search results can say which corpora were used and why the route was selected.
5. The sidecar remains non-canonical and does not become project identity, permission truth, or graph truth.
6. A downstream graph/topology consumer can use sidecar metadata/provenance without requiring DOC18 to own graph storage.
7. Disabling the sidecar degrades cleanly to other providers.

---

## 14. Recommended decision for this wave

Keep the original DOC18 build direction.

R1.1 does **not** widen the sidecar into a giant new search empire. It just closes the missing control-plane seams so the rest of the suite can use LlamaIndex results honestly, consistently, and without drifting into a second hidden memory system.