# ELNOR Search Architecture Reference
**Status:** Reference document, not an operative spec. Synthesizes existing specs into one place for navigation. Non-normative: every claim points at the underlying owner spec, which remains authoritative.
**Date:** 2026-04-27
**Version:** R1
**Author context:** Will Brody, principal architect.
**Purpose:** Will's working flow involves heavy and varied corpus search — *grab me X documents*, *grab me documents showing X*, *what do these documents say about X and pull the relevant ones*, etc. The routing logic for these is layered across DOC24 §13.2, DOC16 16.7 §C, V4 (subagent precompute spec) §§1.7–1.9, DOC25 §16, DOC18 R2, DOC72, and DOC73. This reference walks every common search intent end-to-end, names the layers and agents that handle it, and provides decision trees so future-Will and future-Claude don't have to reconstruct the routing from scratch every time.
This is a **reference**, not an addendum. It modifies nothing. If review reveals routing decisions that are missing or wrong, the fix lands in the underlying owner spec (DOC24 / DOC73 / etc.), and this reference is updated to track. The reference's value is being a single navigable place; it carries no normative weight.
---
## 1. The three search layers
Three orthogonal layers exist in ELNOR, each answering a different kind of question. They are not redundant. Most user questions touch two or all three.
| Layer | What it stores | Owner spec | Tech | Latency | Answers |
|---|---|---|---|---|---|
| **A. Document text + chunks** | Raw extracted text per page, semantic chunks, vector embeddings | DOC25 §11–§14 (extraction); DOC18 R2 (indexing sidecar) | MarkItDown / Docling output → LlamaIndex vector index → Qwen3-Embedding-0.6B | 100–500ms across a multi-thousand-doc corpus | *"Find passages mentioning insider stock sales."* |
| **B. Extracted memories** | Structured outputs from corpus extraction profiles — legal theories, citations, holdings, fact patterns, scienter arguments, expert methodology, etc. | DOC73 §14 owns extraction; DOC72 owns the resulting memory nodes; DOC25 §17 owns the consumer contract | DOC72 entity graph (SQLite), corpus_id-scoped | 5–50ms for graph queries | *"What scienter theories did plaintiffs argue across MTD oppositions in 2024?"* |
| **C. Document and case metadata** | Filing entity nodes: court, judge, date filed, party, case number, docket entry number, motion type, page count, source instance | DOC72 §20A intake contracts; DOC25 §17 DocumentEntity | DOC72 entity graph (SQLite) — pure structured queries | 5–50ms | *"All 9th Circuit MTD oppositions from 2022 onward."* |
The layers share keys: every chunk in Layer A references a `document_id`; every memory in Layer B references the source document(s) it was extracted from; every metadata record in Layer C *is* a document. So a query touching all three flows naturally — Layer C narrows the universe, Layer B applies the structured filter, Layer A drills to passages.
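The shared-key funnel can be sketched in a few lines. This is a minimal illustration only: the record shapes (`DocumentMeta`, `Memory`, `Chunk`) and the `funnel` helper are hypothetical names, not taken from DOC72 or DOC25, and a plain substring match stands in for the Layer A vector search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentMeta:      # Layer C record: the document itself (hypothetical shape)
    document_id: str
    court: str


@dataclass(frozen=True)
class Memory:            # Layer B record: points back at its source documents
    memory_id: str
    theme: str
    source_document_ids: tuple


@dataclass(frozen=True)
class Chunk:             # Layer A record: a passage of exactly one document
    chunk_id: str
    document_id: str
    text: str


def funnel(docs, memories, chunks, *, court, theme, needle):
    # Layer C: narrow the universe by structured metadata.
    universe = {d.document_id for d in docs if d.court == court}
    # Layer B: keep documents that have an extracted memory with the theme.
    themed = {
        doc_id
        for m in memories
        if m.theme == theme
        for doc_id in m.source_document_ids
        if doc_id in universe
    }
    # Layer A: drill to passages (substring match is a stand-in for vector search).
    return [
        c for c in chunks
        if c.document_id in themed and needle in c.text.lower()
    ]
```

Because every layer carries `document_id`, each step is a set intersection on that key rather than a join across unrelated stores.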
---
## 2. The agents that drive search
Per V4 (subagent precompute spec) §§1.7–1.9 and the precompute-vs-runtime division, three actors drive search:
**Primary LLM** (the conversation's main agent — Elnor itself in chat, or a corpus-chat agent inside an open corpus). Keeps a fast toolkit for direct queries: `search_knowledge`, `search_memories`, `retrieve_memory_by_id`, basic `retrieve_document_pages`. **Per V4 line 88, internal graph and metadata queries (Layers B and C) NEVER spawn a sub-agent — graph queries are 5–50ms; the spawn cost would be slower than the work.** Primary handles them directly.
**MemoryAgent** (V4 §1.8). Specialist for multi-step retrieval through DOC72/DOC73 memories with judgment about scope, relevance, gaps, and when to escalate to documents. Tools: `search_knowledge`, `retrieve_memory_by_id`, `retrieve_related_memories`, `retrieve_memory_cluster`, `search_memories`, `retrieve_memories_by_entity`, plus basic `retrieve_document_pages` and `retrieve_full_document` for simple drill-downs within scope.
**DocumentIntelligenceAgent** (V4 §1.9). Specialist for document processing — Docling/MarkItDown conversion, OCR, chunked analysis of large documents, multi-document parallel analysis, format/layout reasoning. Tools: Docling, MarkItDown, OCR pipelines, `retrieve_document_pages`, `retrieve_full_document`, `ensure_materialized` (per `DOC25_FILE_MATERIALIZATION_AND_PROVIDER_PROFILES_PROPOSAL_V1.md`), plus basic `search_memories` and `search_knowledge` for simple lookups within scope.
Two cross-pollination rules:
- MemoryAgent escalates to DocumentIntelligenceAgent for OCR-heavy, large-document, or multi-document parallel work (V4 line 295). Direct spawn if `maxSpawnDepth` permits; return-with-suggestion otherwise.
- DocumentIntelligenceAgent stays in document territory and uses its memory tools only for simple lookups; it does not do graph thread-following — that's MemoryAgent's job.
JIT mounting per DOC24 R2.5 (V4 line 134): when a corpus is active in chat, both specialists mount; when no corpus is active, only DocumentIntelligenceAgent mounts (for ad-hoc document work). Without a corpus or active scope, there is nothing useful to delegate to MemoryAgent.
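The mounting rule and the MemoryAgent escalation rule can be written as two small functions. This is a sketch under assumptions: the function names and return labels are hypothetical, and the reading of `maxSpawnDepth` as "spawn is allowed while the current depth is below the maximum" is an assumption, not something V4 is quoted as saying here.

```python
def mounted_specialists(corpus_active):
    """Which specialists are JIT-mounted for the current chat."""
    if corpus_active:
        return {"MemoryAgent", "DocumentIntelligenceAgent"}
    # No corpus: only ad-hoc document work is possible.
    return {"DocumentIntelligenceAgent"}


def memory_agent_handoff(current_depth, max_spawn_depth):
    """How MemoryAgent hands off OCR-heavy or multi-document work.

    Assumption: spawning is permitted while current_depth < max_spawn_depth.
    """
    if current_depth < max_spawn_depth:
        return "spawn_document_intelligence_agent"
    return "return_with_suggestion"
```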
---
## 3. Routing the four canonical query intents
These are the four routing intents documented in DOC16 16.7 §C lines 294–297, generalized. Real-world questions usually map to one of them or compose them.
### 3.1 Intent: Known-document retrieval
*"Pull up the Henderson MTD."*
User has a specific document in mind, identified by case + document type or by docket entry.
```
Primary LLM
│
├─ Step 1: search_knowledge with (case=Henderson, doc_type=motion_to_dismiss)
│ → Layer C lookup, returns document_id(s)
│
├─ Step 2: ensure_materialized(document_id)
│ → if local, no-op; if cloud-only, materialization trigger sequence
│ per DOC25 V2.1 §4 (file-materialization addendum)
│
└─ Step 3: render to user — link, viewer, or chat preview per request shape
```
Latency: ~50ms for step 1; 0ms to roughly 10s for step 2, depending on materialization state (see §7); instant for step 3.
No specialist agent involvement. Primary handles end-to-end. If the user asks for *"the latest Henderson MTD"* and there are multiple, step 1 returns the candidates and the primary clarifies.
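The three steps plus the clarification branch can be sketched as follows. The tool callables are passed in as parameters because their real signatures live in the owner specs; the keyword-filter calling convention and the returned status labels are assumptions made for illustration.

```python
def retrieve_known_document(search_knowledge, ensure_materialized, **filters):
    """Sketch of the known-document flow; tool signatures are hypothetical."""
    # Step 1: Layer C lookup returns zero or more candidate document_ids.
    candidates = search_knowledge(**filters)
    if not candidates:
        return {"status": "not_found"}
    if len(candidates) > 1:
        # Several matches: hand the list back so the primary can ask the user.
        return {"status": "clarify", "candidates": list(candidates)}
    document_id = candidates[0]
    # Step 2: no-op when bytes are local, fetch when cloud-only.
    ensure_materialized(document_id)
    # Step 3: the caller renders a link, viewer, or preview.
    return {"status": "ready", "document_id": document_id}
```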
### 3.2 Intent: Semantic / conceptual document search
*"Find me a 9th Circuit MTD arguing loss causation."*
User has a concept or argument theme but no specific case. Returns documents (not just answers).
```
Primary LLM
│
├─ Step 1: Layer C narrowing via search_knowledge
│ (court=9th Circuit, doc_type=MTD)
│ → set of candidate document_ids
│
├─ Step 2: Layer A semantic search via DOC18 LlamaIndex
│ scoped to the candidates — embedding query
│ for "loss causation argument"
│ → top-K chunks ranked by similarity
│ → de-duplicated to top-K documents
│
├─ Step 3 (optional): If user wants surrounding context,
│ spawn DocumentIntelligenceAgent for each top-K
│ doc to retrieve the section containing the matching
│ chunks plus argumentation context.
│
└─ Step 4: present ranked documents with hit snippets
```
Step 1 is 5–50ms primary-direct. Step 2 is 100–500ms (DOC18 vector search). Step 3 spawns DocumentIntelligenceAgent only if the user wants more than snippets.
This is the dominant pattern for brief-bank exploration. The Layer-C-then-Layer-A funnel is much faster than running Layer A across the whole corpus.
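A minimal sketch of the funnel, assuming chunk vectors are already available in memory: candidates from Layer C restrict which chunks are scored, and the best chunk per document decides the document ranking. It uses plain cosine similarity in pure Python rather than the LlamaIndex API, and `top_documents` is a hypothetical helper name.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_documents(query_vec, chunks, candidate_ids, k):
    """chunks: iterable of (document_id, chunk_text, vector) tuples."""
    best = {}
    for document_id, text, vec in chunks:
        # Layer C narrowing: skip chunks outside the candidate set.
        if document_id not in candidate_ids:
            continue
        score = cosine(query_vec, vec)
        # De-duplicate to documents by keeping each document's best chunk.
        if document_id not in best or score > best[document_id][0]:
            best[document_id] = (score, text)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(doc_id, score, text) for doc_id, (score, text) in ranked[:k]]
```

The saving comes from the `candidate_ids` check: chunks outside the Layer C result are never scored.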
### 3.3 Intent: Hybrid case + concept
*"Find everything about loss causation in Henderson."*
User has both a case scope and a concept.
```
Primary LLM
│
├─ Step 1: Layer C: scope to all documents in Henderson corpus
│ or all filings tagged with case=Henderson
│
├─ Step 2 (parallel-spawn): three branches
│
│ ├─ Branch A: Layer A — DOC18 chunk search within Henderson scope
│ │ for "loss causation"
│ │
│ ├─ Branch B: Layer B — MemoryAgent for extracted memories
│ │ tagged with theme=loss_causation AND
│ │ case=Henderson
│ │
│ └─ Branch C (optional): DOC16 16.7 Graph Search across
│ the Henderson SharePoint folder for files not
│ in any corpus yet — "ambient" Henderson docs
│
└─ Step 3: merge and present
```
The fan-out follows V4 line 90 ("if Elnor needs to search transcripts AND documents AND the graph simultaneously for a complex question, three parallel sub-agents may be faster than three sequential tool calls").
This is when the agent fan-out is worth the spawn cost — three real searches with judgment-required result curation.
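The parallel step can be sketched with `asyncio.gather`. The three branch coroutines below are stubs standing in for the real searches, and treating a failed branch as an empty result (rather than failing the whole query) is an assumption, not a rule from V4 or DOC10.

```python
import asyncio


async def fan_out(branches):
    """branches maps a label to a zero-argument coroutine function."""
    labels = list(branches)
    # Run every branch concurrently; collect exceptions instead of raising.
    results = await asyncio.gather(
        *(branches[label]() for label in labels),
        return_exceptions=True,
    )
    # Assumption: a failed branch contributes no hits.
    return {
        label: ([] if isinstance(result, Exception) else result)
        for label, result in zip(labels, results)
    }


# Stub branches, purely illustrative.
async def chunk_search():
    await asyncio.sleep(0.01)
    return ["chunk:henderson-42"]


async def memory_search():
    await asyncio.sleep(0.01)
    return ["memory:loss-causation-7"]


async def ambient_search():
    raise TimeoutError("graph search unavailable")
```

Called as `asyncio.run(fan_out({"chunks": chunk_search, "memories": memory_search, "ambient": ambient_search}))`, total wall time is bounded by the slowest branch rather than the sum of all three.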
### 3.4 Intent: Broad firm-wide discovery
*"Has anyone at the firm written about Section 10(b) safe harbor?"*
No case scope, no specific document; broad discovery across all knowledge.
```
Primary LLM
│
├─ Step 1: spawn MemoryAgent with broad scope
│ "Find extracted memories tagged with section_10b
│ OR safe_harbor across all corpora and ambient nodes"
│
├─ Step 2 (in parallel): DOC16 16.7 Graph Search across
│ tenant — KQL query across firm SharePoint /
│ OneDrive for Section 10(b) safe harbor
│
├─ Step 3 (in parallel): DOC18 LlamaIndex across all indexed
│ corpora — vector search on the concept
│
└─ Step 4: MemoryAgent integrates findings and reports
```
Branch ordering matters for integration: the branches run in parallel, but MemoryAgent's results are weighed first (cheapest, structured), then Graph Search (broad ambient catch), then LlamaIndex (concept similarity in indexed text). Results merge by document or memory and are deduplicated by source.
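The merge-and-dedup step might look like the sketch below. The hit shape (`source_id`, `layer`, `score`) is hypothetical, and ranking sources that were found by more branches ahead of single-branch hits is an illustrative choice, not something the owner specs prescribe.

```python
def merge_hits(*branches):
    """Each branch is a list of dicts with 'source_id', 'layer', and 'score'."""
    merged = {}
    for hits in branches:
        for hit in hits:
            entry = merged.setdefault(
                hit["source_id"],
                {"source_id": hit["source_id"], "layers": [], "score": 0.0},
            )
            # Record which layers found this source, without duplicates.
            if hit["layer"] not in entry["layers"]:
                entry["layers"].append(hit["layer"])
            entry["score"] = max(entry["score"], hit["score"])
    # Assumption: corroboration across layers outranks a single high score.
    return sorted(
        merged.values(),
        key=lambda e: (-len(e["layers"]), -e["score"]),
    )
```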
---
## 4. Mapping Will's stated query patterns
Will's three example modes from the conversation map onto the canonical intents as follows.
### "Grab me X documents"
Pure metadata retrieval. Maps to **§3.1 known-document** (specific X) or **§3.2 semantic search** with the Layer C step alone (X = "all 9th Circuit MTDs"). Primary-direct, with no agent spawn for the search itself; DocumentIntelligenceAgent comes in only if many documents need `ensure_materialized`.
Examples:
- *"Grab me all the Brooge oppositions to MTDs."* → Layer C: filter on `case=Brooge AND filing_type=opposition_to_mtd`. Primary direct. ~10ms.
- *"Grab me every brief from the brief bank filed in 2024."* → Layer C: corpus members with `date_filed BETWEEN 2024-01-01 AND 2024-12-31`. Primary direct. ~50ms.
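Both examples reduce to parameterized SQLite filters. The sketch below runs against an in-memory database with a hypothetical `filings` table; the real DOC72 entity-graph schema is not shown in this reference, so the table and column names are assumptions. Dates are stored as ISO-8601 text, which makes `BETWEEN` compare correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE filings ("
    "document_id TEXT PRIMARY KEY, case_name TEXT, "
    "filing_type TEXT, date_filed TEXT)"
)
conn.executemany(
    "INSERT INTO filings VALUES (?, ?, ?, ?)",
    [
        ("d1", "Brooge", "opposition_to_mtd", "2024-03-15"),
        ("d2", "Brooge", "opposition_to_mtd", "2023-11-02"),
        ("d3", "Henderson", "motion_to_dismiss", "2024-12-31"),
    ],
)


def filings_for(conn, case_name, filing_type):
    rows = conn.execute(
        "SELECT document_id FROM filings "
        "WHERE case_name = ? AND filing_type = ? ORDER BY date_filed",
        (case_name, filing_type),
    )
    return [row[0] for row in rows]


def filings_between(conn, start, end):
    rows = conn.execute(
        "SELECT document_id FROM filings "
        "WHERE date_filed BETWEEN ? AND ? ORDER BY date_filed",
        (start, end),
    )
    return [row[0] for row in rows]
```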
### "Grab me documents showing this"
Concept-driven retrieval. Maps to **§3.2 semantic / conceptual search**. Primary direct + DOC18 vector search; optional DocumentIntelligenceAgent for context retrieval per result.
Examples:
- *"Grab me opposition briefs that argue insider trading establishes scienter under Tellabs."* → Layer A semantic search across `securities_mtd_oppositions` corpus, embedding query for the theme; rank by similarity; return top-N briefs with hit snippets. ~500ms.
- *"Show me MTDs where defendants invoked the safe harbor."* → Layer A vector search across MTD corpus; theme = safe-harbor invocation. ~500ms.
### "What do the documents say about X, and grab me the documents"
Synthesis + retrieval. Maps to **§3.3 hybrid** — extract findings (Layer B + Layer A) and return source docs.
Examples:
- *"What do our briefs say about the Affiliated Ute presumption, and grab me the relevant briefs."* → MemoryAgent for extracted memories tagged with theme=affiliated_ute_presumption (Layer B) + DOC18 vector search for actual passages (Layer A); return integrated findings + source documents.
- *"Across our brief bank, what arguments have been most successful against MTDs based on PSLRA scienter pleading? Pull me the briefs."* → MemoryAgent for memories tagged with theme=pslra_scienter_pleading + outcome=mtd_denied (cross-references to outcomes, which Layer C provides via case status); return synthesized argument summary + source briefs.
These are the heaviest queries: multi-agent fan-out, multi-layer integration, and sometimes minutes to deliver a polished synthesis. The wait is worth it for research-level questions, but the cost is real.
---
## 5. The decision tree, condensed
Use this as the cheat-sheet flowchart. For any user query, walk it top-down.
```
START: User asks a query.
Q1: Is it about a specific document the user can name (case + doc type or docket)?
YES → §3.1 Known-document retrieval. Primary direct. DONE.
NO → Q2
Q2: Is it about specific structured properties (court, date, party, type, judge)
AND the user wants a list of documents matching those properties?
YES → Layer C only. Primary direct via search_knowledge. DONE.
NO → Q3
Q3: Is it about a concept or theme, and the user wants documents matching the concept?
YES → §3.2 Semantic / conceptual search. Primary + DOC18 LlamaIndex.
Optional DocumentIntelligenceAgent for context per hit. DONE.
NO → Q4
Q4: Is it about a concept AND scoped to a case / corpus / time range?
YES → §3.3 Hybrid. Three-way parallel spawn (chunks, memories, ambient).
DONE.
NO → Q5
Q5: Is it a synthesis question — "what do the docs say about X" — where the
user wants both an answer AND the supporting docs?
YES → MemoryAgent for Layer B (extracted memories) +
DOC18 for Layer A (source passages) +
DocumentIntelligenceAgent if the answer requires reading
full docs that haven't been pre-extracted.
Primary integrates and presents. DONE.
NO → Q6
Q6: Is it broad firm-wide discovery — no case, no corpus, just "has anyone
written about X"?
YES → §3.4 Broad discovery. MemoryAgent + DOC16 16.7 Graph Search +
DOC18 across all indexed corpora. DONE.
NO → User probably wants conversational reasoning, not search.
Primary handles directly using whatever DOC24 already injected.
DONE.
```
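Walking the tree top-down amounts to "first yes wins". The sketch below encodes only that ordering; the question keys and route labels are hypothetical, and answering each question for a natural-language query (the actual classification) is deliberately left to the caller.

```python
# Ordered (question, route) pairs mirroring Q1 through Q6 above.
DECISION_TREE = [
    ("q1_names_specific_document", "3.1_known_document"),
    ("q2_structured_properties_list", "layer_c_only"),
    ("q3_concept_wants_documents", "3.2_semantic_search"),
    ("q4_concept_with_scope", "3.3_hybrid"),
    ("q5_synthesis_with_sources", "synthesis"),
    ("q6_firm_wide_discovery", "3.4_broad_discovery"),
]
FALLBACK = "conversational_reasoning"


def route(answers):
    """answers maps question keys to booleans; missing keys count as NO."""
    for question, destination in DECISION_TREE:
        if answers.get(question, False):
            return destination
    return FALLBACK
```

For example, `route({"q4_concept_with_scope": True})` yields the hybrid route, while an empty `answers` dict falls through to conversational reasoning.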
---
## 6. Worked routing examples beyond the canonical four
### 6.1 "Find me cases similar to Henderson."
Similarity search, not concept search. Primary direct via `retrieve_related_memories` on Henderson's case entity (DOC72 graph traversal — entity neighbors and shared themes). Optional DOC18 vector search if the user wants similarity by argument structure rather than entity overlap. Bridges to the corpus binding model: if a `similar_cases` corpus exists, similarity hits become candidate corpus members.
### 6.2 "Find every brief I've personally drafted."
Layer C with `created_by=will OR drafted_by=will` filter. The PACER plugin V1.1 §7.1 (Filing entity nodes) records authorship metadata; Will's drafts are similarly captured by DOC16 (file watcher) and content extraction. Primary direct.
### 6.3 "Read the latest Brooge MTD opposition and tell me what their best three arguments are."
Single-document deep read. Spawn DocumentIntelligenceAgent with `task: deep_read_and_summarize, document_id: <brief>, output: top_three_arguments`. DocumentIntelligenceAgent calls `ensure_materialized` first per the materialization addendum, runs the analysis, returns. Primary integrates. ~30s–2min depending on document size.
### 6.4 "Across all opposition briefs in our brief bank, count how many argued insider sales as a scienter theory and group by circuit."
Aggregation query against Layer B. MemoryAgent fetches all extracted memories from `securities_mtd_oppositions` corpus tagged with theme=insider_sales_scienter, joins to Layer C metadata for circuit, returns aggregate. ~200ms. This is the kind of cross-corpus structured analysis Layer B exists for.
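The aggregation is a filter on Layer B followed by a join to Layer C on `document_id`. A minimal in-memory sketch, assuming memories are dicts with `document_id` and `theme` keys and that circuit lookup is a plain mapping (both shapes are hypothetical):

```python
from collections import Counter


def count_briefs_by_circuit(memories, circuit_by_document, theme):
    # Layer B: distinct briefs with at least one memory for the theme,
    # so a brief with several matching memories is counted once.
    briefs = {m["document_id"] for m in memories if m["theme"] == theme}
    # Layer C join: look up each brief's circuit and tally.
    return Counter(
        circuit_by_document[doc_id]
        for doc_id in briefs
        if doc_id in circuit_by_document
    )
```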
### 6.5 "Pull every exhibit table from this brief and give me a structured CSV."
Document-structure extraction, not search. DocumentIntelligenceAgent handles end-to-end: routes the brief through Docling (per DOC25 §10.3 — table-heavy → Docling), extracts table structures, formats as CSV. Layer A is touched but the operation is structural, not search.
### 6.6 "Has the firm ever filed a brief in the District of Maryland on PSLRA grounds?"
Discovery + jurisdiction filter. Primary direct: `search_knowledge` with `court=D.Md. AND theme=PSLRA`. If the firm has a `firm_filed_briefs` corpus that tracks all filings, this is a Layer C + Layer B query. If no such corpus exists yet, the query escalates to DOC16 16.7 Graph Search over firm SharePoint.
---
## 7. Latency expectations and budget
These are the realistic expectations for each pattern. Sources: V4 line 88 (graph queries 5–50ms), DOC25 §15 (5,000-doc batch policy), DOC18 R2 (LlamaIndex vector search). Numbers assume the materialization addendum is in place — i.e., bytes are reachable.
| Pattern | Realistic latency | What dominates |
|---|---|---|
| Known-document retrieval (§3.1) | 10–50ms (already materialized) / +0–10s (cloud fetch via Graph) | Materialization |
| Pure Layer C metadata query | 5–50ms | SQLite |
| Semantic search Layer C-narrowed (§3.2) | 100–500ms | DOC18 vector search |
| Hybrid case + concept (§3.3) | 500ms – 5s | Parallel sub-agents, sub-agent spawn cost |
| Synthesis with sub-agent reading | 30s – 2min | DocumentIntelligenceAgent reading time |
| Broad discovery (§3.4) | 1–10s | Graph Search latency (network) |
| Cross-corpus aggregation | 200ms – 2s | Graph aggregation over Layer B |
The user-visible UI should set expectations: simple lookups are instant; conceptual searches show a brief spinner; synthesis questions show a progress indicator with intermediate results streaming in (per DOC10 engagement orchestration).
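One way to turn the table into a UI hint is to key off each pattern's upper-bound latency. The upper bounds below are copied from the table; the pattern keys, the hint labels, and the two cutoffs (100 ms and 10 s) are placeholder assumptions, not values from DOC10. The cloud-fetch variant of known-document retrieval is omitted because its latency depends on materialization state.

```python
# Upper-bound latency in milliseconds per pattern, from the table above.
LATENCY_UPPER_MS = {
    "known_document_materialized": 50,
    "layer_c_metadata": 50,
    "semantic_search": 500,
    "hybrid": 5_000,
    "synthesis": 120_000,
    "broad_discovery": 10_000,
    "cross_corpus_aggregation": 2_000,
}


def ui_hint(pattern):
    upper = LATENCY_UPPER_MS[pattern]
    if upper <= 100:        # placeholder cutoff: feels instant
        return "instant"
    if upper <= 10_000:     # placeholder cutoff: spinner is tolerable
        return "spinner"
    return "progress_with_streaming"
```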
---
## 8. UI surface implications
This reference doesn't specify UI, but it implies what the UI should support. Four surfaces matter:
**Q omnibox / palette search.** The default entry point. Should classify the query (intent Q1–Q6) on input, render an estimated-latency hint, and stream results as they come from each layer. Result groups can be toggled (chunks / memories / metadata).
**Corpus chat.** Operating inside an active corpus, with both MemoryAgent and DocumentIntelligenceAgent JIT-mounted. The corpus scope is implicit; queries typically skip Q6 (broad discovery) and start at Q3 (concept search) or Q5 (synthesis). Per V4 line 134: "corpus-scope active → both mounted."
**Documents tab (DOC25 §19).** Direct browse of documents, sortable / filterable on Layer C metadata. Click-through to full document. This IS Layer C made visible.
**"Ask the corpus" affordance.** A specific entry point that defaults to the synthesis route (§5, Q5). User asks a research question, gets an answer with citations. Spawned agents do their work; primary streams the answer.
---
## 9. What this reference depends on (cross-doc obligations)
This reference assumes the following are operative or absorbed:
| Spec | Provides |
|---|---|
| DOC72 R5.73 §42A | Entity graph, search_knowledge, novelty gate, hybrid retrieval (Layer B + C foundation) |
| DOC73 V1.4.1 + Corpus Source Bindings V1 (forthcoming) | Corpus model, extraction profiles, member nodes (Layer B substantive content) |
| DOC25 V2.0 + File Materialization Proposal V1 (forthcoming) | Document conversion (MarkItDown, Docling), chunk extraction, retrieval tools, materialization (Layer A foundation + byte access) |
| DOC18 R2 | LlamaIndex chunk indexing and vector search (Layer A retrieval) |
| DOC16 16.7 R2.1 | Microsoft Graph Search across tenant; OneDrive / SharePoint path resolution (broad discovery; ambient catch) |
| DOC24 R2.5 | Capability registry, JIT tool / agent mounting, routing decision §13.2 |
| V4 (subagent precompute V4) | MemoryAgent and DocumentIntelligenceAgent specs; primary-vs-specialist division |
| DOC11 | OpenClaw runtime — actual sub-agent spawn mechanics, named workspaces, forked context |
| DOC10 | Engagement orchestration — when to spawn, when to stream, how to integrate sub-agent results |
| EC Core Addendum A V3.3 | Durable queue for async agent work; sole-writer for any state changes |
Anything that *doesn't* hang together across these — e.g., a routing case the user hits that no spec covers — is a cross-doc gap. Flagging-on-encounter is the right discipline; this reference is updated when the gap is filled in the owner spec.
---
## 10. Known gaps and open questions
These are real ambiguities the underlying specs don't yet resolve. Each should be flagged to the relevant owner spec for closure.
1. **Cross-corpus search permissions.** When a corpus is `firewalled` (DOC25 §12 / Q2), should an ambient query for Section 10(b) include hits from inside it or not? DOC25 §12 implies different `source_instance_id` for policy contexts, but the search-side implication isn't spec'd. **Owner: DOC25 V2.1 + DOC73 §3.1 trust posture interaction.**
2. **Synthesis attribution and confidence.** For synthesis answers (§5, Q5), when MemoryAgent integrates findings from N memories with varying confidence, what's the surfaced confidence? V4 references "specialist judgment" but the actual confidence-rollup logic isn't spec'd. **Owner: V4 §1.8 + DOC73 §15.4 (IngestionQualityReport already covers per-doc quality, not aggregate-answer confidence).**
3. **Real-time freshness vs. cached extraction.** Layer B is pre-computed. If a brief was extracted yesterday and re-uploaded today with edits, when does Layer B reflect the new content? DOC25 V2.0 §13 covers cross-surface dedup; DOC2 covers freshness; the search-layer view of staleness is implicit. **Owner: DOC2 + DOC25 §15 (IngestionQualityReport materialization status from the file-materialization addendum).**
4. **Cross-corpus dedup of memories.** A brief that appears in two corpora (different policy contexts) produces two extracted memory sets via §12 source_instance_ids. When MemoryAgent searches across corpora, should the same brief appear twice in results? **Owner: DOC73 §3.1 + DOC25 §13.**
5. **Agent confidence in its own search judgment.** MemoryAgent decides "findings are incomplete; broaden the search." How does the primary know the search was thorough? V4 line 237 mentions "broaden rather than give up" but no surfacing mechanism. **Owner: V4 §1.8 + DOC10 (engagement orchestration's role in surfacing sub-agent state).**
6. **The unified search router.** No spec currently owns the *decision* of which layers / agents to invoke for a given user query. DOC24 §13.2 has the routing rule set; V4 has the agent decision; this reference describes the integrated tree. But is there an actual runtime component implementing the tree, or is it inlined into the primary's reasoning? Worth deciding explicitly. **Owner candidate: DOC24 R2.6 (forthcoming) or new DOC layer.**
These are not blockers for current development — most queries fall cleanly into the canonical four intents and route reliably. They're collected here so they don't get lost; each gets resolved in its owning spec when the time comes.
---
## 11. Versioning
R1 is the first cut. As underlying specs change (DOC25 V2.1 absorbs the materialization addendum, DOC73 V1.5 absorbs corpus source bindings, V4 evolves), this reference is bumped to R2 and so on; each bump is a purely additive description of the current architecture. The reference never owns truth; it always points at the current operative versions of the underlying specs.
If the routing decisions themselves change (e.g., a new layer is added, or a new specialist agent joins), R-next captures the change. The R1 file is preserved per the post-absorption versioning rule — never re-edited.