Short text page 4646d9f5ec51. Generated 2026-06-09T01:23:58.539Z from commit dbaa25962edc11ab30e8d4ca1715f9ae5bf77331. Worktree: clean.
ELNOR REPO READER TEXT MIRROR
Original path: Current Specs/Connector and Integration Specs/Q_INTEGRATION_SUITE_ADDITIONS_R1.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z

---

# Q Integration Suite — Additions R1: Firecrawl, Securities Data, MarkItDown

**Date:** 2026-04-19
**Status:** Proposal — ready for integration into Q Integration Suite spec
**Scope:** Three new integrations (§§8-10), DOC24 routing infrastructure for all new integrations, updates to §6 capability registry, §6.3-6.5 knowledge storage / system awareness / cross-integration procedures, and §7 implementation priority
**Target:** Q_INTEGRATION_SUITE_SPEC.md (primary), with cross-doc implications for DOC24 (routing), DOC11 (installation checklist), DOC72 (intake), DOC20 (document extraction)
**Related:** Document Intelligence & Extraction Pipeline Proposal (companion document; covers OCR, PDF text extraction, and the `retrieve_document_pages` tool)

---

## Note on OpenClaw Native Capabilities

Before installing any MCP server, check whether OpenClaw already ships the capability natively. As of early 2026, OpenClaw may include native integrations for web search (Exa, Brave), browser automation (Playwright), and other tools. If a capability is built in, the work is routing configuration (verb families, registry entries) — not installation.

**DOC11 obligation:** Add an "Integration Capabilities Checklist" to DOC11 that confirms which search, browser automation, and data integrations are active in the current OpenClaw installation. This checklist should be consulted before any MCP server is registered.
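
The check described above can be sketched as a small decision function. This is an illustration only: the capability names, the checklist shape, and the server names passed in are hypothetical, since DOC11 does not define the checklist format yet.

```typescript
// Hypothetical capability names; the real list belongs to the DOC11 checklist.
type Capability =
  | "web_search"
  | "browser_automation"
  | "web_extraction"
  | "document_extraction";

interface CapabilityChecklist {
  // Capabilities confirmed active in the current OpenClaw installation.
  native: Capability[];
}

type IntegrationPlan =
  | { kind: "configure_routing"; capability: Capability }
  | { kind: "install_mcp_server"; capability: Capability; server: string };

// Native capability: only routing configuration is needed.
// Missing capability: an MCP server has to be installed and registered.
function planIntegration(
  checklist: CapabilityChecklist,
  capability: Capability,
  mcpServer: string,
): IntegrationPlan {
  if (checklist.native.indexOf(capability) !== -1) {
    return { kind: "configure_routing", capability };
  }
  return { kind: "install_mcp_server", capability, server: mcpServer };
}

const checklist: CapabilityChecklist = { native: ["web_search", "browser_automation"] };
const searchPlan = planIntegration(checklist, "web_search", "example-search-mcp");
const extractionPlan = planIntegration(checklist, "web_extraction", "firecrawl-mcp");
```

With this checklist, web search needs only verb-family and registry work, while web extraction still requires registering `firecrawl-mcp`.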

---

## 8. Firecrawl — Web Content Extraction

### 8.1 What It Does

Firecrawl is a self-hosted web scraping and content extraction service. It converts any URL into clean, LLM-ready markdown — handling JavaScript-rendered pages, rotating proxies, rate limits, and web-hosted PDFs/DOCX files that browser DOM scraping can't reliably extract. It runs as a Docker container on the local machine, keeping all data local.

Firecrawl is **infrastructure, not a user-facing feature.** The user never interacts with Firecrawl directly. Other ELNOR subsystems call it when they need clean web content: the Q Browser intake pipeline uses it for Tier 2 extraction, the DOC72 knowledge intake uses it for URL-based entity extraction, and Elnor uses it through MCP tools when asked to read or research web content.

### 8.2 Use Cases

- **"Read this article and remember it"** → User pastes a URL → Elnor calls Firecrawl scrape → clean markdown → DOC72 entity extraction → knowledge nodes stored
- **"Summarize this court opinion"** → CourtListener URL → Firecrawl scrapes the opinion page → LLM summary → DOC72
- **"Research this expert witness"** → Elnor searches the web, then scrapes relevant profile pages, university pages, publication lists → builds dossier in DOC72
- **"Ingest this documentation site"** → Firecrawl crawls entire site → DOC18 LlamaIndex indexes the corpus
- **Q Browser Tier 2 extraction** → Significant page → Firecrawl provides clean content → DOC72 extraction LLM
- **SEC EDGAR HTML filings** → Firecrawl scrapes HTML-rendered filings directly
- **Legal blog research** → Scrape and extract from Law360, JD Supra, SCOTUS Blog

### 8.3 MCP Server

Use `firecrawl-mcp` (official). Register in OpenClaw's MCP config:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": { "FIRECRAWL_API_URL": "http://localhost:3002", "FIRECRAWL_API_KEY": "fc-local-dev-key" }
    }
  }
}
```

| Tool | Description |
|------|-------------|
| `firecrawl_scrape` | Convert any URL to clean markdown. Handles JS-rendered pages, web-hosted PDFs/DOCX. |
| `firecrawl_search` | Search the web and get full page content from results (not just snippets). |
| `firecrawl_crawl` | Scrape all pages of a website (async job). For corpus ingestion. |
| `firecrawl_map` | Discover all URLs on a website. Returns structured list. |
| `firecrawl_batch_scrape` | Scrape multiple URLs asynchronously. |

### 8.4 Setup

Self-hosted via Docker:

```bash
git clone https://github.com/firecrawl/firecrawl.git
cd firecrawl
docker-compose up -d
# Verify: curl http://localhost:3002/v2/scrape ...
# Register MCP server with OpenClaw
```

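**Privacy:** Extraction runs entirely on the local machine, with no intermediary cloud service; the only outbound traffic is Firecrawl's own request to the target site. This keeps privileged material off third-party infrastructure and meets the local-first requirement. Firecrawl does not affect Q Browser page load times because it runs only when explicitly called.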

### 8.5 Integration Points

**Q Browser Tier 2 extraction (DOC20 §6.19):** Replaces unreliable DOM scraping as the content source for idle-time LLM entity extraction. The significance filtering and extraction prompt are unchanged — Firecrawl only improves input quality.

**Fallback:** If Firecrawl is unavailable (Docker not running), the pipeline falls back to DOM text extraction from the Electron BrowserView.
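
A minimal sketch of that fallback decision, assuming the two Browser Intake checkboxes from §8.7 are exposed as booleans and that a health probe of the local container supplies `firecrawlHealthy`. The type names and the `"skip"` outcome are assumptions, not part of the spec.

```typescript
// Mirrors the two "Browser Intake Integration" checkboxes in the Firecrawl settings.
interface BrowserIntakeSettings {
  useFirecrawlForTier2: boolean;
  fallbackToDom: boolean;
}

type Tier2Source = "firecrawl" | "dom" | "skip";

// Pick the content source for Tier 2 extraction of one significant page.
function chooseTier2Source(
  settings: BrowserIntakeSettings,
  firecrawlHealthy: boolean,
): Tier2Source {
  // Assumption: with Firecrawl disabled, the pipeline keeps using DOM text.
  if (!settings.useFirecrawlForTier2) return "dom";
  if (firecrawlHealthy) return "firecrawl";
  // Firecrawl is wanted but unavailable (for example, Docker is not running).
  return settings.fallbackToDom ? "dom" : "skip";
}

const defaults: BrowserIntakeSettings = { useFirecrawlForTier2: true, fallbackToDom: true };
const whenDockerDown = chooseTier2Source(defaults, false);
```

Keeping the decision in one pure function means the significance filter and extraction prompt never need to know which source produced the text.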

**DOC72 intake — URL-based extraction:** New intake surface `intake.web_url`. When Elnor is asked to read/summarize/remember a URL, calls Firecrawl scrape → clean markdown → DOC72 extraction pipeline → knowledge nodes with `source_url` provenance.

**DOC18 LlamaIndex corpus ingestion:** Firecrawl crawl endpoint feeds bulk site ingestion into the LlamaIndex sidecar for semantic retrieval.

### 8.6 Scope Boundary — Web Content vs Local Files

| Content source | Handler |
|---|---|
| URL to a web page | Firecrawl `scrape` |
| URL to a web-hosted PDF/DOCX | Firecrawl `scrape` (media parsing) |
| Local .docx file | OnlyOffice (editing) / MarkItDown (extraction) |
| Local .pdf file | PDF.js (viewing) / MarkItDown (extraction) / DOC Intelligence tiered system |
| OneDrive-synced file | Local file pipeline (already on disk) |
| PACER document download | Local file pipeline (downloaded to disk first) |

**Rule:** Once a file is on disk, the local pipeline owns it. Firecrawl only processes URLs. If a web-hosted document is scraped by Firecrawl AND later downloaded locally, the local extraction is canonical. DOC72 dedup resolves via URL-to-file-path matching.
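
The ownership rule can be expressed in a few lines. The sketch below is illustrative: `ContentRef`, `Extraction`, and both function names are hypothetical, and the real DOC72 dedup logic may match URLs to file paths differently.

```typescript
type Handler = "firecrawl_scrape" | "local_file_pipeline";

interface ContentRef {
  url?: string;       // where the content was found on the web, if anywhere
  localPath?: string; // where the content lives on disk, if downloaded or synced
}

// Once a file is on disk the local pipeline owns it; Firecrawl only sees URLs.
function pickHandler(ref: ContentRef): Handler {
  if (ref.localPath) return "local_file_pipeline";
  if (ref.url && /^https?:\/\//i.test(ref.url)) return "firecrawl_scrape";
  throw new Error("content reference has neither a local path nor an http(s) URL");
}

interface Extraction {
  sourceUrl?: string;
  localPath?: string;
  markdown: string;
}

// When the same document was extracted twice, the local extraction is canonical.
function canonicalExtraction(a: Extraction, b: Extraction): Extraction {
  if (a.localPath) return a;
  if (b.localPath) return b;
  return a;
}

const scraped: Extraction = { sourceUrl: "https://example.com/opinion.pdf", markdown: "web copy" };
const downloaded: Extraction = {
  sourceUrl: "https://example.com/opinion.pdf",
  localPath: "/Users/OpenClaw1/Downloads/opinion.pdf",
  markdown: "local copy",
};
const canonical = canonicalExtraction(scraped, downloaded);
```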

**Cross-doc obligation:** DOC20 Addendum B and DOC Intelligence Spec should acknowledge that web-hosted documents may arrive through Firecrawl before or instead of local download.

### 8.7 Settings

```
Settings > Integrations > Firecrawl (Web Extraction)
├─ Status: ● Running (Docker, localhost:3002)
├─ [Restart Service] [Stop Service]
├─ Browser Intake Integration
│   ├─ ☑ Use Firecrawl for Q Browser Tier 2 extraction
│   └─ ☑ Fallback to DOM extraction if Firecrawl unavailable
├─ Knowledge Storage
│   ├─ ☑ Store extracted web content in knowledge graph
│   └─ ☑ Follow PDF/DOCX links found on pages (up to 5 per page)
├─ Crawl Limits
│   ├─ Max pages per crawl job: [200]
│   └─ Respect robots.txt: ☑
└─ Privacy
    └─ ☑ Self-hosted only (no cloud API)
```

### 8.8 Dependencies

- Docker Desktop on macOS
- MCP server (`firecrawl-mcp`) registered with OpenClaw
- No external API keys
- **Can build now — 1-2 hours**

---

## 9. Securities Litigation Data (FMP, EDGAR, Yahoo Finance, Quartr)

### 9.1 What It Does

Unified securities research capability: SEC filings, earnings call transcripts, stock price/volume data, insider trading records, institutional holdings, analyst estimates, press releases, and investor presentations. Designed for securities fraud litigation: class period analysis, loss causation, corrective disclosure identification, damages calculations, event studies, and lead plaintiff research.

| Source | Role | Cost |
|---|---|---|
| **Financial Modeling Prep (FMP) Ultimate** | Primary hub — filings, transcripts, prices, insiders, 13F, analysts, press releases | ~$50-80/month (firm expense) |
| **SEC EDGAR API** | Direct backup for SEC filings — free, authoritative | Free |
| **Yahoo Finance (yfinance)** | Free backup for stock prices | Free |
| **Quartr Pro** | Investor presentations, slide decks, conference materials — via Q Browser | ~$20/month |

### 9.2 Use Cases

**Case intake:** "Pull all 10-K, 10-Q, and 8-K filings for Brooge Energy from 2019 to 2024" → FMP. "Get the earnings call transcripts where the CEO discussed revenue guidance" → FMP transcripts. "Show me the stock price chart during the class period" → FMP historical prices.

**Loss causation and damages:** "Get daily closing prices and volumes for the class period" → FMP/Yahoo. "Pull S&P 500 daily returns for the event study" → FMP/Yahoo index data. "What happened to the stock price on the disclosure date?" → FMP intraday.

**Insider trading / scienter:** "Show all insider sales by officers during the class period" → FMP insider trading. "Did the CEO sell before the corrective disclosure?" → FMP insider search.

**Lead plaintiff:** "What institutions held shares as of Q4 2021?" → FMP 13F. "Track institutional ownership changes across the class period" → FMP 13F quarterly.

**Non-SEC materials:** "Pull the investor day presentation from 2021" → Quartr Pro (Q Browser skill). "Get the corrective disclosure press release" → FMP press releases.

**Cross-integration:** "Build a complete case file for White v. Brooge Energy" → FMP + PACER + Firecrawl + Quartr + DOC72.
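
For the loss-causation requests above, the daily price and index series feed an event study. The sketch below uses the simplest market-adjusted model (stock return minus index return on the same day); it is not the full market-model regression a damages expert would run, and the sample prices are made-up numbers, not real market data.

```typescript
interface DailyClose { date: string; close: number }
interface DailyReturn { date: string; value: number }

// Simple daily return: r_t = P_t / P_(t-1) - 1.
function dailyReturns(series: DailyClose[]): DailyReturn[] {
  const returns: DailyReturn[] = [];
  for (let i = 1; i < series.length; i++) {
    returns.push({ date: series[i].date, value: series[i].close / series[i - 1].close - 1 });
  }
  return returns;
}

// Market-adjusted abnormal return: AR_t = r_stock,t - r_index,t,
// computed only for dates present in both series.
function abnormalReturns(stock: DailyClose[], index: DailyClose[]): DailyReturn[] {
  const indexByDate = new Map<string, number>();
  for (const r of dailyReturns(index)) indexByDate.set(r.date, r.value);

  const result: DailyReturn[] = [];
  for (const r of dailyReturns(stock)) {
    const marketReturn = indexByDate.get(r.date);
    if (marketReturn !== undefined) {
      result.push({ date: r.date, value: r.value - marketReturn });
    }
  }
  return result;
}

// Illustrative inputs: the stock falls 20% while the index falls 1%.
const stockSeries: DailyClose[] = [
  { date: "2021-03-15", close: 10.0 },
  { date: "2021-03-16", close: 8.0 },
];
const indexSeries: DailyClose[] = [
  { date: "2021-03-15", close: 4000 },
  { date: "2021-03-16", close: 3960 },
];
const eventDayAbnormal = abnormalReturns(stockSeries, indexSeries);
```

Here the abnormal return on the second day is roughly -19%, the part of the drop not explained by the market move.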

### 9.3 Data Sources

#### 9.3.1 Financial Modeling Prep (FMP) Ultimate

Primary structured data hub. Use existing FMP MCP server.

| Tool | Description | Litigation Use |
|------|-------------|----------------|
| `fmp_sec_filings` | Search/retrieve SEC filings by ticker, date, type | Class period filings, alleged misstatements |
| `fmp_earnings_transcript` | Full earnings call transcript by ticker + quarter | Officer statements, corrective disclosures |
| `fmp_historical_prices` | Daily OHLCV + adjusted close, up to 30 years | Damages calculations, event studies |
| `fmp_intraday_chart` | 1-minute intraday price data | Event study around disclosure timestamps |
| `fmp_insider_trading` | Form 4 — insider name, role, transaction type, shares, price, date | Scienter evidence |
| `fmp_insider_search` | Search insider trades by company, name, or CIK | Officer trading patterns |
| `fmp_13f_holdings` | Institutional ownership by quarter | Lead plaintiff analysis |
| `fmp_analyst_estimates` | Consensus EPS/revenue estimates, price targets | Truth vs expectations analysis |
| `fmp_press_releases` | Company press releases with dates and full text | Corrective disclosure identification |
| `fmp_company_profile` | Company info, market cap, sector, officers | Case background |
| `fmp_financial_statements` | Income statement, balance sheet, cash flow | Restatement analysis |
| `fmp_stock_news` | News articles mentioning ticker | Media coverage |

**Coverage:** Transcripts for 8,000+ publicly traded U.S. companies. Transcripts and 13F holdings require the Ultimate tier. Setup: `export FMP_API_KEY="your-key"` + register MCP server.

#### 9.3.2 SEC EDGAR API (Direct)

Free, authoritative backup. No API key — requires User-Agent header only.

| Tool | Description |
|------|-------------|
| `edgar_submissions` | Full filing history by CIK |
| `edgar_company_facts` | XBRL financial data (structured) |
| `edgar_full_text_search` | Full-text search across EDGAR filings |

Rate limit: 10 requests/second. Setup: `export SEC_USER_AGENT="SchallFirm will@schallfirm.com"`.
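
A short sketch of how a client might respect these two constraints. The `data.sec.gov` submissions URL with a 10-digit zero-padded CIK is the public EDGAR format; the helper names and the fixed-interval pacing strategy are assumptions for illustration.

```typescript
const EDGAR_MAX_REQUESTS_PER_SECOND = 10;

// data.sec.gov expects the CIK zero-padded to 10 digits.
function edgarSubmissionsUrl(cik: number): string {
  const padded = ("0000000000" + String(cik)).slice(-10);
  return `https://data.sec.gov/submissions/CIK${padded}.json`;
}

// EDGAR needs no API key, but requests without a descriptive User-Agent are rejected.
function edgarHeaders(userAgent: string): { [name: string]: string } {
  if (userAgent.trim() === "") {
    throw new Error("SEC_USER_AGENT must identify the requester");
  }
  return { "User-Agent": userAgent, Accept: "application/json" };
}

// Earliest send time (in ms) for each of `count` requests, evenly spaced so the
// client never exceeds `perSecond` requests in any one-second window.
function scheduleSendTimes(
  count: number,
  startMs: number,
  perSecond: number = EDGAR_MAX_REQUESTS_PER_SECOND,
): number[] {
  const intervalMs = 1000 / perSecond;
  const times: number[] = [];
  for (let i = 0; i < count; i++) times.push(startMs + i * intervalMs);
  return times;
}

const exampleUrl = edgarSubmissionsUrl(1234);
const sendTimes = scheduleSendTimes(3, 0);
```

Even spacing is the most conservative choice; a token bucket would allow short bursts while keeping the same average rate.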

#### 9.3.3 Yahoo Finance (yfinance)

Free backup for stock prices. No API key.

| Tool | Description |
|------|-------------|
| `yfinance_history` | Daily OHLCV, dividends, splits — full history |
| `yfinance_info` | Company info, market cap, sector |

Setup: `pip install yfinance --break-system-packages`.

#### 9.3.4 Quartr Pro (Browser-Accessed)

Investor presentations and conference materials. Accessed via Q Browser as a DOC3 learned skill.

**How it works:** Log into Quartr Pro in a Q Browser tab. Demonstrate the workflow once ("watch me find this company's investor presentations"). ELNOR captures via CDP observation adapter, stores the procedure in DOC72. From then on, Elnor navigates Quartr's web interface automatically when asked for investor presentations.

**Fallback:** For companies Quartr doesn't cover, Firecrawl scrapes the company's IR page directly.

### 9.4 DOC72 Integration

| Data type | DOC72 Node Kind | Stored? |
|---|---|---|
| SEC filing metadata | `work_product` (sec_filing) | Yes — compounds knowledge |
| Earnings transcript | `work_product` (earnings_transcript) | Yes — compounds knowledge |
| Stock prices/volumes | Not stored | Queried on demand |
| Insider trading records | `execution_trace` (insider_trade) | Yes — scienter evidence |
| Institutional holdings | Not stored | Queried on demand |
| Company profile | `world_entity` (public_company) | Yes |
| Press releases | `work_product` (press_release) | Yes |
| Analyst estimates | Not stored | Queried on demand |

**Storage policy:** Only store data that compounds knowledge. Prices and estimates are ephemeral lookups. Filings, transcripts, insider trades, and press releases build durable understanding.
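
The table above maps directly onto a lookup. In this sketch the data-type keys are hypothetical identifiers chosen for illustration; the node kinds and subtypes are the ones listed in the table.

```typescript
type NodeKind = "work_product" | "execution_trace" | "world_entity";

type SecuritiesDataType =
  | "sec_filing_metadata"
  | "earnings_transcript"
  | "stock_prices"
  | "insider_trading"
  | "institutional_holdings"
  | "company_profile"
  | "press_release"
  | "analyst_estimates";

interface StorageRule {
  nodeKind: NodeKind;
  subtype: string;
}

// null means "query on demand, do not persist in DOC72".
const SECURITIES_STORAGE_POLICY: Record<SecuritiesDataType, StorageRule | null> = {
  sec_filing_metadata: { nodeKind: "work_product", subtype: "sec_filing" },
  earnings_transcript: { nodeKind: "work_product", subtype: "earnings_transcript" },
  stock_prices: null,
  insider_trading: { nodeKind: "execution_trace", subtype: "insider_trade" },
  institutional_holdings: null,
  company_profile: { nodeKind: "world_entity", subtype: "public_company" },
  press_release: { nodeKind: "work_product", subtype: "press_release" },
  analyst_estimates: null,
};

function shouldStore(dataType: SecuritiesDataType): boolean {
  return SECURITIES_STORAGE_POLICY[dataType] !== null;
}
```

Typing the policy as a `Record` over the full union means the compiler flags any new data type that lacks an explicit storage decision.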

### 9.5 Settings

```
Settings > Integrations > Securities Research
├─ Financial Modeling Prep
│   ├─ API Key: [****abcd            ] [Update]
│   ├─ Plan: Ultimate ● Active
│   ├─ ☑ SEC filings  ☑ Earnings transcripts  ☑ Historical prices
│   ├─ ☑ Insider trading  ☑ 13F holdings  ☑ Analyst estimates
│   ├─ ☑ Press releases
│   └─ ☑ Store filings and transcripts in knowledge graph
├─ SEC EDGAR (Direct)
│   ├─ Status: ● Active (no key required)
│   └─ ☑ Use as verification source
├─ Yahoo Finance
│   ├─ Status: ● Active (no key required)
│   └─ ☑ Use as backup price source
└─ Quartr Pro
    ├─ Status: ● Logged in via Q Browser
    └─ ☑ Allow Elnor to navigate Quartr automatically
```

### 9.6 Dependencies

| Component | Setup Time |
|---|---|
| FMP Ultimate API key + MCP server | 30 min |
| EDGAR API (User-Agent header) | 5 min |
| Yahoo Finance (pip install) | 5 min |
| Quartr Pro (subscribe + demonstrate) | 1 hour |

---

## 10. MarkItDown — Universal Document Extraction

### 10.1 What It Does

Microsoft's open-source document converter (91K+ GitHub stars). It converts 29+ file formats to clean, structured markdown optimized for LLMs, and replaces the patchwork of format-specific extraction tools (mammoth.js for DOCX, PDF.js `getTextContent()` for PDFs, Tesseract.js for OCR) with one primary converter. Tesseract.js remains only as a local OCR fallback (§10.2).

Like Firecrawl, MarkItDown is **infrastructure.** The user never interacts with it directly. Other subsystems call it when they need text extraction from any document type.

### 10.2 What It Replaces

| Current approach | MarkItDown replacement |
|---|---|
| mammoth.js for DOCX → HTML → text | MarkItDown converts DOCX → structured markdown |
| PDF.js `getTextContent()` for PDF text | MarkItDown converts PDF → structured markdown (preserves headings, tables, lists) |
| Tesseract.js for OCR (placeholder, not wired up) | MarkItDown OCR plugin (LLM vision-based, higher accuracy) + Tesseract.js as local fallback |
| No PPTX extraction | MarkItDown converts PPTX → markdown |
| No XLSX text extraction | MarkItDown converts XLSX → markdown tables |
| No audio transcription | MarkItDown converts audio → text (via Whisper integration) |

**Critical distinction:** MarkItDown does NOT replace PDF.js for **viewing** PDFs in Q. PDF.js is the visual renderer. MarkItDown is the text extraction backend. They serve different jobs:

| Job | Tool |
|---|---|
| Visual PDF rendering in Q viewer | PDF.js (keep) |
| Text extraction for LLMs and DOC72 | MarkItDown (primary) |
| OCR for scanned PDFs (server-side) | MarkItDown OCR plugin (primary) |
| OCR for scanned PDFs (browser fallback) | Tesseract.js (lightweight fallback) |
| DOCX/PPTX/XLSX/audio extraction | MarkItDown (universal) |
| DOCX editing | OnlyOffice / Word Online (unchanged) |

### 10.3 MCP Server

Microsoft maintains the official MarkItDown MCP server:

```json
{
  "mcpServers": {
    "markitdown": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-v", "/Users/OpenClaw1:/workdir", "markitdown-mcp:latest"]
    }
  }
}
```

| Tool | Description |
|------|-------------|
| `convert_to_markdown` | Convert any file or URL to structured markdown. Accepts `file:`, `http:`, `https:`, and `data:` URIs. |

One tool, universal coverage. The URI scheme determines what gets converted.
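
Because the Docker configuration above mounts `/Users/OpenClaw1` at `/workdir`, a `file:` URI has to use the path as the container sees it. The sketch below shows one way to translate a host path before calling the tool; the helper name is hypothetical, and the `{ name, arguments }` object follows the general MCP `tools/call` parameter shape.

```typescript
// Mount mapping taken from the MCP server config: -v /Users/OpenClaw1:/workdir
const HOST_ROOT = "/Users/OpenClaw1";
const CONTAINER_ROOT = "/workdir";

// Translate a host path into the file: URI the containerized server can read.
function toContainerFileUri(hostPath: string): string {
  const insideMount = hostPath === HOST_ROOT || hostPath.indexOf(HOST_ROOT + "/") === 0;
  if (!insideMount || hostPath.split("/").indexOf("..") !== -1) {
    throw new Error(`path is outside the mounted directory: ${hostPath}`);
  }
  const containerPath = CONTAINER_ROOT + hostPath.slice(HOST_ROOT.length);
  // encodeURI escapes spaces and other unsafe characters but keeps "/" intact.
  return "file://" + encodeURI(containerPath);
}

// Parameters for an MCP tools/call request to the single MarkItDown tool.
function convertCall(uri: string): { name: string; arguments: { uri: string } } {
  return { name: "convert_to_markdown", arguments: { uri } };
}

const filingUri = toContainerFileUri("/Users/OpenClaw1/Cases/Brooge 10-K.pdf");
const call = convertCall(filingUri);
```

Web documents skip the translation entirely: an `https:` URI is passed through unchanged.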

### 10.4 Integration Points

**DOC72 intake pipeline:** Every surface that ingests documents (Browser, Notes, Document Viewer, Email, Tasks) can call MarkItDown as the universal text extraction step before running the entity extraction LLM. The extraction prompt receives clean structured markdown instead of raw text blobs.

**DOC Intelligence Spec:** MarkItDown provides the Tier 2 (text-only) and Tier 3 (summary) extraction that feeds the tiered context system. Replaces the current PDF.js `getTextContent()` path with higher-quality structured output.

**DOC16 M365 integration:** The M365 spec already references MarkItDown for search indexing of DOCX files (DOC16 Entry 16.7, line 46). MarkItDown handles the "read and understand" path; OnlyOffice/Word Online handles the "edit and produce" path. No conflict.

**DOC3 R11.3 observation pipeline:** When Elnor observes a demonstration involving documents, MarkItDown can extract the document content that the interpretation LLM needs to understand what the user is doing.

### 10.5 OCR Architecture

See companion document **"Document Intelligence & Extraction Pipeline Proposal"** for full OCR pipeline specification. Summary:

- **Primary OCR:** MarkItDown OCR plugin (LLM vision-based, higher accuracy, server-side)
- **Browser fallback:** Tesseract.js (local, free, runs in Electron renderer, ~80-90% accuracy on clean scans)
- **Output:** Extracted text written back as PDF text layer (via pdf-lib) + stored in DOC72
- **Result:** PDF becomes searchable in the viewer. Any LLM can RAG against the text. No re-OCR needed.
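
The summary above implies two decisions per document: which pages lack a usable text layer, and which engine handles them. A minimal sketch follows; the character threshold and all names are hypothetical, and the companion document remains the authoritative specification.

```typescript
interface ExtractedPage {
  pageNumber: number;
  text: string; // text layer as returned by the normal extraction path
}

type OcrEngine = "markitdown_ocr_plugin" | "tesseract_js";

// Hypothetical threshold: pages with fewer non-whitespace characters than this
// are treated as scanned images that need OCR.
const MIN_TEXT_CHARS = 20;

// Pages that already have a text layer are skipped, so nothing is OCR'd twice.
function pagesNeedingOcr(pages: ExtractedPage[], minChars: number = MIN_TEXT_CHARS): number[] {
  return pages
    .filter((page) => page.text.replace(/\s/g, "").length < minChars)
    .map((page) => page.pageNumber);
}

// Primary engine when the server-side plugin is reachable, browser fallback otherwise.
function chooseOcrEngine(serverOcrAvailable: boolean): OcrEngine {
  return serverOcrAvailable ? "markitdown_ocr_plugin" : "tesseract_js";
}

const samplePages: ExtractedPage[] = [
  { pageNumber: 1, text: "UNITED STATES DISTRICT COURT, complaint caption and parties" },
  { pageNumber: 2, text: "  \n " },
];
const scannedPages = pagesNeedingOcr(samplePages);
```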

### 10.6 Setup

```bash
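# Option 1: Docker (recommended). Build the image locally from the MarkItDown repo;
# this assumes no prebuilt markitdown-mcp image is available from a public registry.
git clone https://github.com/microsoft/markitdown.git
cd markitdown/packages/markitdown-mcp
docker build -t markitdown-mcp:latest .

# Option 2: pip (markitdown-mcp provides the MCP server; markitdown[all] adds all converters)
pip install 'markitdown[all]' markitdown-mcp --break-system-packages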

# Register MCP server with OpenClaw
```

### 10.7 Settings

```
Settings > Integrations > MarkItDown (Document Extraction)
├─ Status: ● Running
├─ Extraction Sources
│   ├─ ☑ Use for DOC72 document intake (replaces mammoth.js/PDF text extraction)
│   ├─ ☑ Use for DOC Intelligence Spec Tier 2/3 extraction
│   └─ ☑ Use for Q Browser Tier 2 URL extraction (alongside Firecrawl)
├─ OCR
│   ├─ Primary: MarkItDown OCR plugin (LLM vision)
│   ├─ Fallback: Tesseract.js (local, browser-side)
│   └─ ☑ Write OCR text back into PDF text layer
└─ Supported Formats
    └─ PDF, DOCX, PPTX, XLSX, HTML, images, audio (29+ formats)
```

### 10.8 Dependencies

- Docker or Python 3.10+
- MCP server registered with OpenClaw
- No API keys for core conversion (self-hosted); the LLM-vision OCR path additionally needs access to a vision-capable model
- **Can build now — 30 minutes**

---

## 11. DOC24 Routing Infrastructure — How Elnor Knows When to Use These Tools

All three integrations share the same routing problem: Elnor needs to know when to use them without the user explicitly saying "use Firecrawl" or "use FMP." DOC24's routing cascade (§13) handles this through three layers.

### 11.1 Verb-Family Patterns (§13.2A additions)

Add to the `VERB_FAMILY_SEEDS` array:

```typescript
// Financial research
{
  family: "financial_research",
  patterns: [
    "stock price", "share price", "trading volume", "market cap",
    "SEC filing", "10-K", "10-Q", "8-K", "annual report", "quarterly report",
    "earnings call", "earnings transcript", "conference call",
    "insider trading", "insider sales", "Form 4",
    "institutional holdings", "13F", "institutional ownership",
    "analyst estimate", "price target", "consensus",
    "press release", "corrective disclosure",
    "investor presentation", "investor day", "capital markets day",
    "class period", "loss causation", "event study", "damages analysis",
    "EDGAR", "SEC", "filing history",
  ],
  implicit_action_id: "financial_research.query",
  schema_version: 1,
},

// Web research (broader than financial)
{
  family: "web_research",
  patterns: [
    "scrape", "extract from website", "get page content",
    "crawl site", "read this webpage", "read this URL",
    "what does this page say", "summarize this URL",
    "index this site", "research this topic online",
  ],
  implicit_action_id: "web_research.scrape",
  schema_version: 1,
},
```

These fire at Tier 1 (deterministic, <40ms) and tell the router which tool pack to mount.
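
A deterministic matcher over these seeds could look like the sketch below. It is an assumption about how Tier 1 matching might work, not DOC24's actual implementation. Matching on word boundaries rather than raw substrings matters here: a plain substring test would let the pattern "SEC" fire on the word "second".

```typescript
interface VerbFamilySeed {
  family: string;
  patterns: string[];
  implicit_action_id: string;
  schema_version: number;
}

// Escape regex metacharacters so patterns are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return every family with at least one pattern present as a whole word or phrase.
function matchVerbFamilies(message: string, seeds: VerbFamilySeed[]): string[] {
  return seeds
    .filter((seed) =>
      seed.patterns.some((pattern) =>
        new RegExp(`\\b${escapeRegExp(pattern)}\\b`, "i").test(message),
      ),
    )
    .map((seed) => seed.family);
}

// Abbreviated seeds; the full pattern lists are in the block above.
const sampleSeeds: VerbFamilySeed[] = [
  {
    family: "financial_research",
    patterns: ["10-K", "SEC", "insider sales", "class period"],
    implicit_action_id: "financial_research.query",
    schema_version: 1,
  },
  {
    family: "web_research",
    patterns: ["scrape", "read this URL", "crawl site"],
    implicit_action_id: "web_research.scrape",
    schema_version: 1,
  },
];

const filingMatch = matchVerbFamilies("Pull the 10-K for Brooge Energy", sampleSeeds);
const noMatch = matchVerbFamilies("Give me a second to think", sampleSeeds);
```

In a production router the regular expressions would be compiled once at startup rather than per message, which helps stay inside the Tier 1 latency budget.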

### 11.2 Tool Packs (§16 additions)

```typescript
// Securities research — JIT mount on financial_research verb family
const securitiesResearchPack: ToolPackDefinition = {
  pack_id: "securities_research_pack",
  display_name: "Securities Research",
  mount_policy: "jit",
  sticky_duration_turns: 10,
  tools: [
    "fmp_sec_filings", "fmp_earnings_transcript", "fmp_historical_prices",
    "fmp_intraday_chart", "fmp_insider_trading", "fmp_insider_search",
    "fmp_13f_holdings", "fmp_analyst_estimates", "fmp_press_releases",
    "fmp_company_profile", "fmp_financial_statements", "fmp_stock_news",
    "edgar_submissions", "edgar_company_facts", "edgar_full_text_search",
    "yfinance_history", "yfinance_info",
  ],
  mount_triggers: [
    { type: "verb_family", family: "financial_research" },
    { type: "entity_kind", kind: "public_company" },
    { type: "domain_tag", tag: "securities_litigation" },
  ],
  schema_version: 1,
};

// Web extraction — JIT mount on web_research verb family
const webExtractionPack: ToolPackDefinition = {
  pack_id: "web_extraction_pack",
  display_name: "Web Content Extraction",
  mount_policy: "jit",
  sticky_duration_turns: 5,
  tools: [
    "firecrawl_scrape", "firecrawl_search", "firecrawl_crawl",
    "firecrawl_map", "firecrawl_batch_scrape",
  ],
  mount_triggers: [
    { type: "verb_family", family: "web_research" },
    { type: "url_in_message", pattern: "https?://" },
  ],
  schema_version: 1,
};
```

The securities pack also mounts when the active entity is a `public_company` or the current project is tagged `securities_litigation` — so when you're working in a case workspace, the financial tools are already available before you ask.
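
How the mount triggers and `sticky_duration_turns` might interact is sketched below. The `TurnContext` shape and the countdown semantics (reset on any trigger, otherwise decrement once per turn) are assumptions; DOC24 §16 defines the real behavior.

```typescript
type MountTrigger =
  | { type: "verb_family"; family: string }
  | { type: "entity_kind"; kind: string }
  | { type: "domain_tag"; tag: string }
  | { type: "url_in_message"; pattern: string };

interface PackMountConfig {
  sticky_duration_turns: number;
  mount_triggers: MountTrigger[];
}

// Hypothetical per-turn routing context.
interface TurnContext {
  verbFamilies: string[];   // families matched at Tier 1 for this message
  activeEntityKind?: string;
  projectTags: string[];
  message: string;
}

function triggerFires(trigger: MountTrigger, ctx: TurnContext): boolean {
  switch (trigger.type) {
    case "verb_family":
      return ctx.verbFamilies.indexOf(trigger.family) !== -1;
    case "entity_kind":
      return ctx.activeEntityKind === trigger.kind;
    case "domain_tag":
      return ctx.projectTags.indexOf(trigger.tag) !== -1;
    case "url_in_message":
      return new RegExp(trigger.pattern).test(ctx.message);
  }
}

// Sticky turns left after this turn. The pack stays mounted while the result is > 0.
function nextStickyTurns(pack: PackMountConfig, ctx: TurnContext, remaining: number): number {
  const fired = pack.mount_triggers.some((trigger) => triggerFires(trigger, ctx));
  return fired ? pack.sticky_duration_turns : Math.max(0, remaining - 1);
}

const securitiesPack: PackMountConfig = {
  sticky_duration_turns: 10,
  mount_triggers: [
    { type: "verb_family", family: "financial_research" },
    { type: "entity_kind", kind: "public_company" },
    { type: "domain_tag", tag: "securities_litigation" },
  ],
};

const inCaseWorkspace: TurnContext = {
  verbFamilies: [],
  projectTags: ["securities_litigation"],
  message: "What should I look at next?",
};
const mountedTurns = nextStickyTurns(securitiesPack, inCaseWorkspace, 0);
```

In the example, the project tag alone mounts the pack even though the message contains no financial vocabulary.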

### 11.3 Capability Registry Entries (§14 additions)

```typescript
const newRegistryEntries: ActionRegistryEntry[] = [
  // === WEB EXTRACTION ===
  {
    action_id: "web_research.scrape",
    domain: "web_research",
    display_name: "Firecrawl — Web Content Extraction",
    user_goal: "Extract clean content from a web page or site",
    description: "Scrape any URL to clean markdown. Handles JS-rendered pages, web-hosted PDFs/DOCX.",
    stability_class: "stable_action",
    agent_invocable: true,
    invocation_bindings: [
      { transport: "ec_mcp_tool", binding_name: "firecrawl_scrape", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
    ],
    confirmation_policy: "none",
    safety_class: "read",
    aliases: ["scrape", "web scrape", "extract page", "crawl"],
    common_phrases: ["read this URL", "scrape this page", "get the content from this site"],
    codegen_source: "companion_registration",
    schema_version: 1,
  },

  // === SECURITIES RESEARCH (representative entries) ===
  {
    action_id: "financial_research.sec_filings",
    domain: "financial_research",
    display_name: "SEC Filings Lookup",
    user_goal: "Find and retrieve SEC filings for a company",
    description: "Search SEC filings by ticker, date range, and filing type.",
    stability_class: "stable_action",
    agent_invocable: true,
    invocation_bindings: [
      { transport: "ec_mcp_tool", binding_name: "fmp_sec_filings", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
      { transport: "ec_mcp_tool", binding_name: "edgar_submissions", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
    ],
    confirmation_policy: "none",
    safety_class: "read",
    aliases: ["SEC filings", "EDGAR search", "10-K", "10-Q", "8-K"],
    common_phrases: ["pull the 10-K", "find SEC filings", "get the annual report"],
    codegen_source: "companion_registration",
    schema_version: 1,
  },
  {
    action_id: "financial_research.earnings_transcript",
    domain: "financial_research",
    display_name: "Earnings Call Transcript",
    user_goal: "Get the full transcript of an earnings call",
    description: "Full earnings call transcript by ticker and quarter/year.",
    stability_class: "stable_action",
    agent_invocable: true,
    invocation_bindings: [
      { transport: "ec_mcp_tool", binding_name: "fmp_earnings_transcript", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
    ],
    confirmation_policy: "none",
    safety_class: "read",
    aliases: ["earnings transcript", "conference call", "earnings call"],
    common_phrases: ["get the earnings call", "what did the CEO say", "pull the transcript"],
    codegen_source: "companion_registration",
    schema_version: 1,
  },
  {
    action_id: "financial_research.stock_prices",
    domain: "financial_research",
    display_name: "Historical Stock Prices",
    user_goal: "Get historical stock price and volume data",
    description: "Daily OHLCV with dividend-adjusted prices and intraday data.",
    stability_class: "stable_action",
    agent_invocable: true,
    invocation_bindings: [
      { transport: "ec_mcp_tool", binding_name: "fmp_historical_prices", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
      { transport: "ec_mcp_tool", binding_name: "yfinance_history", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
    ],
    confirmation_policy: "none",
    safety_class: "read",
    aliases: ["stock price", "price history", "trading data", "OHLCV"],
    common_phrases: ["stock price on March 15", "daily volumes", "price during class period"],
    codegen_source: "companion_registration",
    schema_version: 1,
  },
  {
    action_id: "financial_research.insider_trading",
    domain: "financial_research",
    display_name: "Insider Trading Records",
    user_goal: "Find insider buying and selling activity",
    description: "Form 4 insider trading by company or insider name.",
    stability_class: "stable_action",
    agent_invocable: true,
    invocation_bindings: [
      { transport: "ec_mcp_tool", binding_name: "fmp_insider_trading", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
    ],
    confirmation_policy: "none",
    safety_class: "read",
    aliases: ["insider trading", "insider sales", "Form 4"],
    common_phrases: ["insider sales during class period", "did the CEO sell shares"],
    codegen_source: "companion_registration",
    schema_version: 1,
  },
  {
    action_id: "financial_research.institutional_holdings",
    domain: "financial_research",
    display_name: "Institutional Holdings (13F)",
    user_goal: "Find institutional ownership of company shares",
    description: "13F institutional holdings by quarter.",
    stability_class: "stable_action",
    agent_invocable: true,
    invocation_bindings: [
      { transport: "ec_mcp_tool", binding_name: "fmp_13f_holdings", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
    ],
    confirmation_policy: "none",
    safety_class: "read",
    aliases: ["13F", "institutional holdings", "institutional ownership"],
    common_phrases: ["who held shares", "institutional ownership in Q4", "largest holders"],
    codegen_source: "companion_registration",
    schema_version: 1,
  },
  {
    action_id: "financial_research.analyst_estimates",
    domain: "financial_research",
    display_name: "Analyst Estimates",
    user_goal: "Get analyst consensus estimates and price targets",
    description: "Consensus EPS/revenue estimates and price targets.",
    stability_class: "stable_action",
    agent_invocable: true,
    invocation_bindings: [
      { transport: "ec_mcp_tool", binding_name: "fmp_analyst_estimates", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
    ],
    confirmation_policy: "none",
    safety_class: "read",
    aliases: ["analyst estimates", "consensus", "price target"],
    common_phrases: ["analyst expectations", "consensus estimate before disclosure"],
    codegen_source: "companion_registration",
    schema_version: 1,
  },
  {
    action_id: "financial_research.press_releases",
    domain: "financial_research",
    display_name: "Company Press Releases",
    user_goal: "Find press releases issued by a company",
    description: "Press releases by ticker and date range.",
    stability_class: "stable_action",
    agent_invocable: true,
    invocation_bindings: [
      { transport: "ec_mcp_tool", binding_name: "fmp_press_releases", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
    ],
    confirmation_policy: "none",
    safety_class: "read",
    aliases: ["press release", "company announcement"],
    common_phrases: ["press release on March 15", "corrective disclosure"],
    codegen_source: "companion_registration",
    schema_version: 1,
  },

  // === DOCUMENT EXTRACTION ===
  {
    action_id: "document.convert_to_markdown",
    domain: "document_processing",
    display_name: "MarkItDown — Document Extraction",
    user_goal: "Extract text content from a document file",
    description: "Convert any document (PDF, DOCX, PPTX, XLSX, images, audio) to structured markdown.",
    stability_class: "stable_action",
    agent_invocable: true,
    invocation_bindings: [
      { transport: "ec_mcp_tool", binding_name: "convert_to_markdown", readiness: "ready",
        client_exposure: { elnor_native: true, mcp_external: false, q_ui: false }, schema_version: 1 },
    ],
    confirmation_policy: "none",
    safety_class: "read",
    aliases: ["extract text", "convert document", "read file", "OCR"],
    common_phrases: ["extract text from this PDF", "convert to markdown", "OCR this document"],
    codegen_source: "companion_registration",
    schema_version: 1,
  },
];
```
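
Entries with more than one binding (for example `fmp_sec_filings` followed by `edgar_submissions`) need a selection rule at invocation time. The sketch below assumes that array order expresses preference, which matches the primary/backup roles in §9.1 but is not stated by the registry schema; `resolveBinding` and the availability callback are hypothetical.

```typescript
interface InvocationBinding {
  transport: string;
  binding_name: string;
  readiness: string;
}

// Return the first binding that is marked ready and whose tool is currently
// reachable, or undefined if none qualifies.
function resolveBinding(
  bindings: InvocationBinding[],
  isToolAvailable: (bindingName: string) => boolean,
): InvocationBinding | undefined {
  for (const binding of bindings) {
    if (binding.readiness === "ready" && isToolAvailable(binding.binding_name)) {
      return binding;
    }
  }
  return undefined;
}

const secFilingBindings: InvocationBinding[] = [
  { transport: "ec_mcp_tool", binding_name: "fmp_sec_filings", readiness: "ready" },
  { transport: "ec_mcp_tool", binding_name: "edgar_submissions", readiness: "ready" },
];

// Simulate an FMP outage: only the EDGAR tool responds.
const duringFmpOutage = resolveBinding(
  secFilingBindings,
  (name) => name === "edgar_submissions",
);
```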

### 11.4 DOC72 Knowledge Storage (§6.3 additions)

| Integration | DOC72 Domain Tag | Node Types Used |
|---|---|---|
| Firecrawl | `web_content` | work_product (web_page, web_article) |
| FMP / EDGAR | `securities`, `litigation` | work_product (sec_filing, earnings_transcript, press_release), world_entity (public_company), execution_trace (insider_trade) |
| Yahoo Finance | — | Not stored (on-demand) |
| Quartr | `securities`, `litigation` | work_product (investor_presentation) |
| MarkItDown | — | No own nodes — feeds extraction pipeline for other surfaces |

### 11.5 Elnor System Awareness (§6.4 additions)

Add to SOUL.md / system context:

```
WEB EXTRACTION: Firecrawl for deep web content extraction. Scrape any URL to clean 
  markdown. Search the web with full page content. Crawl entire sites for corpus 
  ingestion. Handles JS-rendered pages, web-hosted PDFs/DOCX. All local.

SECURITIES RESEARCH: Full securities litigation data. SEC filings via FMP and EDGAR. 
  Earnings call transcripts. Historical stock prices and volumes (daily + intraday). 
  Insider trading records (Form 4). Institutional holdings (13F). Analyst estimates. 
  Press releases. Investor presentations via Quartr. Use proactively for active 
  securities cases — pull price data around disclosures, check insider trading during 
  class periods, cross-reference filings with price movements.

DOCUMENT EXTRACTION: MarkItDown converts any file (PDF, DOCX, PPTX, XLSX, images, 
  audio) to structured markdown. Use when you need to read, analyze, or extract 
  information from a document. Results are clean structured text optimized for analysis.
```

### 11.6 Cross-Integration Procedures (§6.5 additions)

| Procedure | Integrations Used |
|---|---|
| "Build a case file for [Company]" | FMP (filings + transcripts + prices + insiders + 13F) + PACER (court docs) + Firecrawl (IR page) + Quartr (presentations) + DOC72 |
| "Analyze loss causation for [date]" | FMP (daily prices + volumes + S&P 500) + DOC72 |
| "Check insider trading during class period" | FMP (insider trading by date range) + DOC72 |
| "Read this and remember it" + URL | Firecrawl (scrape) + DOC72 (entity extraction) |
| "Index this documentation site" | Firecrawl (crawl) + DOC18 (LlamaIndex) |
| "Extract text from this scanned filing" | MarkItDown (OCR) + DOC72 (entity extraction) |
| "Morning briefing" | PACER + Calendar + Email + Reminders + Firecrawl (flagged legal news) |

---

## 12. Updated Implementation Priority (§7 replacement)

| Integration | Build Now? | Effort | Dependencies |
|---|---|---|---|
| Apple Reminders | ✅ Yes | 30 min | macOS only |
| Philips Hue | ✅ Yes | 1-2 hrs | Hue Bridge |
| **MarkItDown** | **✅ Yes** | **30 min** | **Docker or Python** |
| **Firecrawl** | **✅ Yes** | **1-2 hrs** | **Docker** |
| Screenshots + OCR | ✅ Yes | 1-2 hrs | Electron native |
| YouTube MCP | ✅ MCP setup now | 1 hr | DOC72 for storage |
| **FMP Securities Research** | **✅ Yes** | **2-3 hrs** | **FMP Ultimate API key** |
| **EDGAR Direct** | **✅ Yes** | **30 min** | **None** |
| **Yahoo Finance** | **✅ Yes** | **15 min** | **Python** |
| PACER Phase 1 | ✅ Phase 1 now | 4-6 hrs | Full features need DOC72 |
| Zoom | ⚠️ OAuth now | 1 hr | Processing needs DOC3 |
| Teams | ⚠️ Already in M365 | Minimal | Uses existing M365 auth |
| **Quartr Pro** | **⚠️ After DOC3** | **1 hr** | **DOC3 demonstration capture** |
| DOC24 registry | ⚠️ Register now | 2 hrs | Full routing needs DOC24 |

**Recommended build order:**
1. MarkItDown (infrastructure — improves every document intake path immediately)
2. Firecrawl (infrastructure — improves every web intake path)
3. Apple Reminders (trivial, immediate value)
4. Hue (fun, immediate value)
5. FMP + EDGAR + Yahoo Finance (securities research core)
6. Screenshots + OCR
7. YouTube MCP
8. PACER Phase 1
9. Quartr Pro (after DOC3 R11.3 is stable — needs demonstration capture)
10. Zoom/Teams OAuth registration

---

## Cross-Document Obligations Generated

| Target Doc | Obligation |
|---|---|
| DOC11 | Add "Integration Capabilities Checklist" confirming which search/browser/data integrations are active in OpenClaw |
| DOC20 Addendum B | Note that web-hosted documents may arrive through Firecrawl; dedup rule for local vs web-sourced |
| DOC20 | MarkItDown replaces mammoth.js and PDF.js `getTextContent()` for DOC72 intake extraction |
| DOC Intelligence Spec | MarkItDown as Tier 2/3 extraction backend; add `retrieve_document_pages` tool (see companion document) |
| DOC24 §13.2A | Add `financial_research` and `web_research` verb families |
| DOC24 §14 | Add registry entries for all new integrations |
| DOC24 §16 | Add `securities_research_pack` and `web_extraction_pack` tool packs |
| DOC72 | Register new `work_product` subtypes (sec_filing, earnings_transcript, press_release, investor_presentation) |