ELNOR REPO READER TEXT MIRROR
Original path: OP-A and Operations and Trackers/CAPABILITY_NOTE_EXTERNAL_AGENT_CLI_ADAPTERS.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z

---

# Capability Note — External Agent CLI/SDK Adapters (Codex · Claude · Gemini)

**Date:** 2026-06-04 · **Status:** capability note for **workstream K (Capability Registry Sweep)** pickup — not a spec; owning home assigned at build planning (likely an EC/DOC11 runtime adapter with DOC24 delivery seams). **Architect:** approved for logging 2026-06-04.

## 1. The capability

ELNOR drives external coding/review agents **headlessly**, using Will's existing subscriptions — no human courier between agents:

| Engine | Headless interface | Auth source | Verified |
|---|---|---|---|
| OpenAI Codex | `codex exec "<prompt>"` (one-shot) · `codex exec resume` (multi-turn, context kept) · `@openai/codex-sdk` (TS, npm 0.137.0) | ChatGPT Pro login (`codex login` → `~/.codex/auth.json`, auto-refresh) | **2026-06-04 FULL LOOP** — headless run from the Cowork sandbox on the architect's ChatGPT-plan auth (`LOOP_CONFIRMED`, no API key); §3.1 auth copy done |
| Claude (Code engine) | `claude -p "<prompt>"` (print mode) · `@anthropic-ai/claude-agent-sdk` (TS, npm 0.3.162 — ELNOR is Node: native embed) | Claude subscription login (Claude Code already authenticated on the Mac; `claude setup-token` for long-lived headless) | In daily manual use; headless mode standard |
| Gemini (optional 3rd reviewer) | `gemini` CLI | Google account | Not yet wired |

**Communication pattern (proven in spec ops):** the file bus — commissions written into the repo, agent runs against the repo as working directory, reports written back into the repo (e.g. `Reviews/`). Multi-turn dialogue available via session resume.
Any agent CLI that can run shell commands can drive any other — the pattern is symmetric (Codex can invoke `claude -p`; Claude can invoke `codex exec`).

**What is NOT drivable (do not design against):** the consumer **apps** — ChatGPT desktop app, Claude desktop app chats, and **Cowork** sessions have no programmatic interface. ELNOR talks to the *engines* (CLIs/SDKs), which run on the same subscriptions and models; the apps remain the human cockpits. Everything Cowork-class work needs (agentic file access, tools, MCP) is reachable via the Claude Agent SDK / Claude Code engine.

## 2. Governance hooks — already specced (do not re-invent at build)

Delegating to an external agent **is egress**: E0 §22 destination classes (`cloud_api`, `agent_messaging`) + E0 §3.3 `DelegationMFC`; DOC81 R2 governs what memory content may ride along (typed destination + destination-specific policy decision + send-time gate; `local_file_export` rules apply to artifacts the agents write). The build-time adapter must register as an egress route under DOC5's controls. Phase-1 *spec-ops* use (red-team/audit orchestration from Cowork) runs outside the ELNOR runtime and is governed by the architect directly.

## 3. Architect setup/auth checklist (the "what Will must do" list)

1. **Codex / Mac:** `codex login` once (ChatGPT Pro). **Codex / Cowork sandbox:** copy `~/.codex/auth.json` → `/Users/OpenClaw1/Elnor/.codex-auth/auth.json` — inside the mount, **outside the repo** (never committed, never synced; refresh the copy if Codex auth is revoked/re-issued).
2. **Claude / Mac:** already logged in (Claude Code). If ELNOR embeds the Agent SDK headlessly and hits auth limits, run `claude setup-token` once and store alongside (same `.codex-auth/`-style non-repo folder, e.g. `.claude-auth/`).
3. **Gemini (when wanted):** install Gemini CLI + Google login.
4. **No API keys needed** for the subscription paths above. Raw API embedding (OpenAI/Anthropic Messages APIs) is a separate, per-token-billed decision — record it per adapter if ever chosen.
5. **Version pinning:** the adapter must record + check CLI/SDK versions (`codex-cli 0.137.0`, `claude-code 2.1.162`, `codex-sdk 0.137.0`, `agent-sdk 0.3.162` at note date) — headless flags drift across versions.
6. **Sandbox flags per task:** audits run `codex exec -s read-only` (or `workspace-write` scoped to the report folder); Claude headless runs with explicit permission modes. Default least-privilege.

   6a. **Reasoning effort:** audit/review runs use **`-c model_reasoning_effort=xhigh`** as the standing default (valid Codex levels, ascending: `none, minimal, low, medium, high, xhigh`). Architect directive 2026-06-08 — no reason to under-spend reasoning on a correctness audit. Lower levels only for trivial mechanical confirmations where latency matters.

   6b. **Detached-run hygiene (rig lessons, 2026-06):** poll the `.DONE` sentinel (not process-grep, which self-matches the poller); match the running process by binary path (`\.local/bin/codex`) not the bare string.

   6c. **Cowork-sandbox execution limit (DIAGNOSED 2026-06-08 — important).** Heavy unattended runs **cannot complete inside the Cowork bash sandbox**: each bash call is an ephemeral sandbox, so a `setsid`/`disown`-detached `codex` child is **torn down at the launching call's boundary** (verified: log grew 779B→228KB while the launch call was open, then froze at the exact byte count the instant the call returned, process gone); the foreground path is capped at ~44s. Net: in Cowork, the bridge is reliable only for runs that FIT IN ONE ~40s call (quick fidelity/mechanical checks), or with the architect present to keep the session warm — NOT for multi-minute `xhigh` design reviews.
   **This is a sandbox-lifecycle limit, NOT present in the build environment:** when ELNOR runs on the Mac, a spawned `codex exec` / Agent-SDK call is a normal long-lived child process of the ELNOR runtime with no call-boundary teardown and no time cap. **Therefore the DOC11 adapter MUST provide durable/resumable job execution** (spawn → persist job handle → poll/await → capture report), explicitly NOT fire-and-forget; orphaned-session recovery + `codex exec resume` are the failure-path contract. The bridge concept is validated (subscription auth, `xhigh`, real review output produced); only the Cowork host can't sustain it. For Cowork spec-ops, heavy external reviews route to the architect's ChatGPT/Claude apps; the orchestrator does direct verification.

7. **Limits & queueing:** subscription plans rate-limit; the adapter queues (concurrency 1–2) and surfaces stalls rather than spinning.
8. **Secrets hygiene:** auth material never enters the repo, logs, reports, or memory files; never echoed to output.

## 4. Current operational use (pre-build)

The Cowork orchestrator (this assistant) runs the Stage-6+ red-team/audit loop directly: writes the commission → `codex exec` against the repo mount → reads the report from `Reviews/` → adjudicates. First production use: the DOC81 R2 application-fidelity audit round (2026-06). This rig is the working prototype of the ELNOR adapter. **Loop verified end-to-end 2026-06-04** on the architect's subscription auth.

## 5. Engine tiers, billing, and auth precedence (architect-reviewed 2026-06-04)

| Tier | Engine path | Billing | Use |
|---|---|---|---|
| **T1 — default for all agentic work** | Claude **Agent SDK** / Claude Code engine on **Claude-plan OAuth**; Codex CLI/SDK on **ChatGPT-plan OAuth** | Flat subscription quota — **no per-token fees**; individual-use license (Phase-1 personal OS = the intended case) | Drafting, repo ops, audits, adjudication, anything Cowork/Code-shaped — most ELNOR agent roles |
| **T2 — explicit overflow/scale** | Raw APIs (Anthropic Messages / OpenAI) | Per-token, opt-in per task policy | Unattended bulk beyond plan rate windows; high-volume structured micro-calls; **all Phase-2 other-user traffic** |
| **T3 — local** | Local models | — | Per the existing local-first architecture |

**Hard rules:**

1. **Auth precedence gotcha:** an exported `ANTHROPIC_API_KEY` **silently overrides** subscription OAuth (verified vs. current docs 2026-06-04) — API keys stay OUT of the default agent environment; the adapter **asserts the expected billing path before every run** (build lint: `engine.billing_path_mismatch`).
2. **Plan limits:** queue (concurrency 1–2) and surface stalls; spill-to-API only via an explicit policy toggle — never silently.
3. **Phase-2 multi-principal:** other users authenticate themselves or ride T2 — never the architect's subscription (license bound + DOC81 R-5 alignment).
4. Per-engine auth artifacts live in non-repo folders (`/Users/OpenClaw1/Elnor/.codex-auth/` pattern; `.claude-auth/` when needed) — never committed, logged, or echoed.

## 6. Seen-in-ELNOR (settings + UX stub — input to the DOC11 proposal)

- **Engines settings panel** (DOC20 surface; DOC11 owns the contracts): per agent role — engine (T1/T2/T3), model, **think level** (thinking-token budget), permission mode + tool allowlist, plan-usage meter, and an **auth-source indicator** (subscription vs API — surfaced precisely because of hard rule 1).
- **Run provenance:** every agent run record stamps engine + model + think level + billing path; Inspector renders it (extends DOC11's usage-telemetry / runtime-truth conventions).
- **EC gating:** the SDK's **permission hooks** are the EC policy-evaluator socket — every agent action passes the DOC81 R2 gates; delegation egress per E0 §22 (`cloud_api` / `agent_messaging`) + `DelegationMFC`. Reference, never redefine.
- **Spec homes:** **DOC11** (adapter contracts, routing, model controls, telemetry — proposal prompt issued 2026-06-04, see `Active Working and Red Team/Instructions and Prompts/DOC11_External_Agent_Engine_Adapter_Proposal_Prompt.md`); **DOC13** (cost/billing-tier obligations); **DOC12** (inter-agent/ACP seam); **DOC20** (panel render). Workstream K registry entry = this note.
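
**Illustrative sketch (not a contract):** the pre-run billing-path assertion from §5 hard rule 1 and the run-provenance stamp from the list above could fit together roughly as follows. All type names, field names, and function names here are hypothetical; DOC11 owns the real contracts. `ANTHROPIC_API_KEY` comes from hard rule 1, while treating `OPENAI_API_KEY` as the Codex analogue is an unverified assumption. The environment is passed in as a plain object so the check stays a pure function and never reads or prints key values.

```typescript
// Sketch only: type names, field names, and the env-var table are hypothetical.
type Tier = "T1" | "T2" | "T3";
type BillingPath = "subscription" | "api" | "local";
type Env = Record<string, string | undefined>;

interface RunRequest {
  tier: Tier;
  engine: string; // e.g. "claude" or "codex"
  model: string;
  thinkLevel: string; // e.g. "xhigh"
}

// The provenance stamp: the request fields plus the billing path actually used.
interface RunProvenance extends RunRequest {
  billingPath: BillingPath;
}

// Variables assumed to switch an engine onto per-token billing when exported.
// ANTHROPIC_API_KEY is from hard rule 1; OPENAI_API_KEY is an unverified guess.
const API_KEY_VARS: Record<string, string[]> = {
  claude: ["ANTHROPIC_API_KEY"],
  codex: ["OPENAI_API_KEY"],
};

// Billing path each tier is supposed to use.
function expectedBillingPath(tier: Tier): BillingPath {
  if (tier === "T1") return "subscription";
  if (tier === "T2") return "api";
  return "local";
}

// Billing path the given environment would produce for this request.
// Only checks whether a key variable is set; the value is never inspected or logged.
function detectBillingPath(req: RunRequest, env: Env): BillingPath {
  if (req.tier === "T3") return "local";
  const vars = API_KEY_VARS[req.engine] ?? [];
  const keyExported = vars.some((name) => (env[name] ?? "") !== "");
  return keyExported ? "api" : "subscription";
}

// Asserts the billing path before the run and returns the provenance stamp.
function stampRun(req: RunRequest, env: Env): RunProvenance {
  const expected = expectedBillingPath(req.tier);
  const actual = detectBillingPath(req, env);
  if (expected !== actual) {
    throw new Error(
      `engine.billing_path_mismatch: expected ${expected}, got ${actual}`,
    );
  }
  return { ...req, billingPath: actual };
}
```

Under these assumptions, a T1 run with a clean environment is stamped `subscription`, while the same run with `ANTHROPIC_API_KEY` exported fails before spawn instead of silently billing per token. A T2 run fails the same way when no key is present, which matches the rule that spill-to-API is explicit rather than inferred.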