ELNOR REPO READER TEXT MIRROR
Original path: Current Specs/DOC13/DOC13_Costs_R1.md
Source repo: /Users/OpenClaw1/Elnor/Elnor Specs
Git branch: main
Git commit: dbaa25962edc11ab30e8d4ca1715f9ae5bf77331
Generated: 2026-06-09T01:23:58.539Z
---
# DOC13 — Unified Cost Tracking, Budget Enforcement, and Automatic Pricing Research
**Version:** R1
**Status:** Proposed companion spec
**Companion ledger additions:** Merge into DOC10 Orchestration Integration Ledger R8 (and future versions)
**Related docs:** DOC10 R9, DOC11 V2, DOC12 R1, DOC7 R5, DOC4 OpenClaw Bridge, DOC8 v1.11.4, OpenClaw native cron
**Prior version:** DOC13 R0 (v1.1 Draft)
**Red-team reviewers:** Gemini, Claude, Claude Code
**Revision summary:** Complete rewrite addressing R0's missing token ingestion path, pre-dispatch-only enforcement gap, CostResearchAgent brittleness concerns, latency optimization, and cross-doc integration requirements. Incorporates session binding wire shapes from DOC12 gateway protocol work.
---
## Why This Document Exists
Costs are currently decorative in Q and scattered across chats, rooms (DOC12), panels, forums, tasks, subtasks, projects, and global views. There is no single source of truth, no automatic updates, and no enforcement. LLM operations have unbounded runtime cost — an agent entering a tool-use loop or generating a massive response can blow past any budget that is only checked before dispatch.
This spec creates one canonical registry, one calculation engine, one enforcement point in DOC10, and one automatic research job powered by OpenClaw cron + a dedicated CostResearchAgent. All surfaces become consumers only. The system stays honest, budgets are enforced with zero added latency in normal operation, and pricing stays current without manual effort after initial setup.
### What Changed from R0
R0 had four critical gaps identified by the red team:
1. **No token ingestion mechanism.** R0 defined how to calculate projected costs and how to store events, but never specified where actual token counts come from after execution. R1 defines the full ingestion path from DOC11 reverse telemetry through to cost event recording.
2. **Pre-dispatch-only enforcement is insufficient.** R0's "hard enforcement at dispatch time" (§0.4) cannot prevent budget overruns from long generations, tool-use loops, or concurrent operations. R1 adds a three-layer enforcement model: in-memory budget check, dynamic max_tokens injection, and mid-stream circuit breakers.
3. **CostResearchAgent web scraping brittleness.** R0's design relied on an LLM agent navigating and parsing complex provider pricing pages as the sole pricing source. R1 restructures the research pipeline to use machine-readable JSON registries as the fast/cheap first pass, with the full CostResearchAgent as the autonomous fallback for models not in registries, new model launches, and registry failures. Both paths are autonomous — no human approval required.
4. **Synchronous enforcement latency.** R0's enforcement model would have added a synchronous pre-dispatch gate on every operation. R1's tiered enforcement runs the cost check in memory (sub-millisecond), adds zero latency in normal operation (below 80% budget utilization), and only introduces user-facing gates when approaching budget limits.
---
## 0) Guiding Principles
0.1 **One registry, many consumers.** `cost_registry.json` is the single source of truth for pricing. All surfaces read from it. Only the CostResearchAgent, deterministic JSON fetch, and manual overrides write to it.
0.2 **Cron is set-and-forget after one-time enablement in Q.**
0.3 **The CostResearchAgent is autonomous.** It updates pricing without user approval. Anomaly bounds catch catastrophic misreads (§2.6). Change notifications provide passive visibility (§2.7). Manual override is always available as a correction mechanism, never as a required approval step.
0.4 **Enforcement is tiered and zero-latency in normal operation.** The budget check is an in-memory comparison that completes in well under a millisecond. It runs on every dispatch but only escalates enforcement actions (warnings, blocks, modals) when budget utilization warrants it (§3.3).
0.5 **Costs are hierarchical and always in USD.** Global → per-mode → per-project → per-task → per-room. Any scope's cap can block dispatch independently.
0.6 **Research job never blocks user operations.** Pricing research runs on cron or manual trigger. It never gates or delays dispatch.
0.7 **Actual costs replace estimates.** Pre-dispatch estimates are planning tools. Post-execution token counts from Gateway reverse telemetry are the source of truth for budget tracking. Estimates are replaced by actuals as soon as reverse telemetry arrives.
0.8 **Provider-side limits are a valid backstop.** Most providers (Anthropic, OpenAI) enforce their own spend limits. DOC13's enforcement is defense-in-depth, not the sole safety mechanism. This means enforcement design should prioritize avoiding false blocks over catching every edge case.
---
## 1) Canonical Storage and Schemas
### 1.1 Central Files
All under `ELNOR_MEMORY/system/`:
| File | Purpose | Write authority |
|---|---|---|
| `cost_registry.json` | Current pricing snapshot (atomic read/write) | CostResearchAgent, JSON fetch, manual override |
| `cost_registry_events.jsonl` | Audit log of every registry change | Append-only by registry manager |
| `cost_overrides.json` | Manual overrides (highest priority) | User via Q |
| `cost_tracking_events.jsonl` | Every cost event (actuals, reservations, releases) | Cost tracker module |
| `cost_caps.json` | Budget configuration per scope | User via Q |
| `cost_running_totals.json` | Current period spend per scope (rebuilt from events on startup) | Cost tracker module |
### 1.2 Core Schemas
Add to `packages/contracts/src/schemas.ts`:
```ts
// ── Pricing Registry ──
const CostModelEntrySchema = z.object({
  provider: z.string(), // "anthropic", "openai", "google", etc.
  model_id: z.string(), // "claude-sonnet-4-5", "gpt-4o", etc.
  display_name: z.string().max(100), // Human-readable name for Q
  input_cost_per_1m: z.number().min(0), // USD per 1M input tokens
  output_cost_per_1m: z.number().min(0), // USD per 1M output tokens
  context_window_tokens: z.number().int(), // Max context window
  max_output_tokens: z.number().int(), // Max output tokens for this model
  image_cost_per_1m: z.number().min(0).optional(), // Vision models
  cached_input_cost_per_1m: z.number().min(0).optional(), // Cached/batched input pricing
  last_updated_at: z.string(), // ISO timestamp
  last_verified_at: z.string(), // Last time agent confirmed price unchanged
  source: z.enum(["json_registry", "agent_research", "manual_override"]),
  research_notes: z.string().max(1000).optional(),
  previous_input_cost_per_1m: z.number().min(0).optional(), // For change tracking
  previous_output_cost_per_1m: z.number().min(0).optional(),
  schema_version: z.literal(2),
});

// ── Cost Tracking Events ──
const CostTrackingEventSchema = z.object({
  event_id: z.string().max(160),
  cost_type: z.enum([
    "actual", // From real token counts via reverse telemetry
    "reservation", // Budget hold placed at dispatch
    "reservation_release", // Hold released when actual arrives
    "adjustment", // Manual correction
  ]),
  operation_id: z.string().max(160),
  scope_kind: z.enum(["chat", "room", "task", "panel", "forum", "project", "global"]),
  scope_id: z.string().max(160),
  room_id: z.string().max(160).optional(),
  room_turn_id: z.string().max(160).optional(),
  panel_run_id: z.string().max(160).optional(),
  forum_thread_id: z.string().max(160).optional(),
  task_id: z.string().max(160).optional(),
  project_id: z.string().max(160).optional(),
  participant_id: z.string().max(160).optional(),
  model_id: z.string(),
  provider: z.string(),
  input_tokens: z.number().int().min(0),
  output_tokens: z.number().int().min(0),
  cost_usd: z.number(), // Positive for spend, negative for releases
  route_trace_id: z.string().max(160).optional(),
  timestamp: z.string(),
  schema_version: z.literal(2),
});

// ── Budget Caps ──
const CostCapConfigSchema = z.object({
  global_daily_usd: z.number().min(0).default(50),
  warning_threshold_pct: z.number().min(0).max(100).default(80),
  enforcement_threshold_pct: z.number().min(0).max(100).default(95),
  per_mode: z.record(OrchestrationModeSchema, z.number().min(0)).optional(),
  per_project: z.record(z.string(), z.number().min(0)).optional(),
  per_task: z.record(z.string(), z.number().min(0)).optional(),
  per_room: z.record(z.string(), z.number().min(0)).optional(),
  schema_version: z.literal(2),
});

// ── Budget Override (Q → EC) ──
const CostOverrideRequestSchema = z.object({
  operation_id: z.string().max(160),
  override_type: z.enum(["once", "increase_cap"]),
  scope_kind: z.enum(["global", "project", "task", "room"]),
  scope_id: z.string().max(160),
  new_cap_usd: z.number().min(0).optional(), // Required for "increase_cap"
  reason: z.string().max(500).optional(),
  timestamp: z.string(),
});

// ── Registry Change Notification ──
const CostRegistryChangeNotificationSchema = z.object({
  notification_id: z.string().max(160),
  research_run_id: z.string().max(160),
  timestamp: z.string(),
  source: z.enum(["nightly_cron", "manual_refresh"]),
  changes: z.array(z.object({
    model_id: z.string(),
    provider: z.string(),
    change_type: z.enum(["updated", "added", "anomaly_held", "verified_unchanged", "fetch_failed"]),
    old_input_cost_per_1m: z.number().optional(),
    new_input_cost_per_1m: z.number().optional(),
    old_output_cost_per_1m: z.number().optional(),
    new_output_cost_per_1m: z.number().optional(),
    notes: z.string().max(500).optional(),
  })),
  summary: z.string().max(500), // Human-readable one-liner for Q header notification
});
```
---
## 2) Automatic Pricing Research
### 2.1 Architecture: Deterministic-First, Agent-Fallback
The pricing research pipeline has two autonomous layers, both running without user intervention:
**Layer 1 — Deterministic JSON fetch.** For each model in the tracking list, attempt to fetch pricing from machine-readable JSON registries (e.g., LiteLLM's `model_prices_and_context_window.json`, or similar community-maintained sources). This is fast, cheap (no token burn), and deterministic. Most mainstream models (Anthropic, OpenAI, Google) are covered.
**Layer 2 — CostResearchAgent (autonomous LLM fallback).** For any model not found in JSON registries, or when JSON registries are unreachable or stale, the CostResearchAgent browses provider pricing pages and extracts costs. This handles new model launches, niche providers, pricing tier changes, and registry failures. The agent operates autonomously — no user approval is required. Anomaly bounds (§2.6) catch catastrophic misreads.
**Why both layers:** The deterministic fetch reduces token burn and eliminates hallucination risk for the common case. The agent handles everything the deterministic path can't — and critically, handles it autonomously. If a JSON registry URL changes, the agent notices the failure and adapts. If a new provider launches that isn't in any registry, the agent can find and parse their pricing page. The agent is the resilient path; the JSON fetch is the optimization.
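As a minimal sketch of Layer 1, assuming LiteLLM's `model_prices_and_context_window.json` keeps its per-token field names (`input_cost_per_token`, `output_cost_per_token`, `cache_read_input_token_cost`, `max_input_tokens`, `max_output_tokens`, `litellm_provider`), one entry can be converted into the per-1M fields of `CostModelEntry`. The names `fromLiteLlm`, `LiteLlmEntry`, and `Layer1Pricing` are hypothetical, and treating `max_input_tokens` as the context window is an assumption to verify against the current file.

```typescript
// Subset of one LiteLLM registry entry read by this sketch (field names assumed from LiteLLM's JSON).
interface LiteLlmEntry {
  litellm_provider?: string;
  input_cost_per_token?: number;
  output_cost_per_token?: number;
  cache_read_input_token_cost?: number;
  max_input_tokens?: number;
  max_output_tokens?: number;
}

// The CostModelEntry pricing fields (§1.2) that Layer 1 can fill; hypothetical type.
interface Layer1Pricing {
  provider: string;
  model_id: string;
  input_cost_per_1m: number;
  output_cost_per_1m: number;
  cached_input_cost_per_1m?: number;
  context_window_tokens: number;
  max_output_tokens: number;
  source: "json_registry";
}

// Per-token USD to per-1M USD, rounded to 1e-6 to absorb floating-point noise.
const perMillion = (perToken: number): number => Math.round(perToken * 1e12) / 1e6;

function fromLiteLlm(model_id: string, registry: Record<string, LiteLlmEntry>): Layer1Pricing | undefined {
  const e = registry[model_id];
  // Missing model or missing core fields: return undefined so the caller falls through to Layer 2.
  if (
    !e ||
    e.input_cost_per_token === undefined ||
    e.output_cost_per_token === undefined ||
    e.max_input_tokens === undefined ||
    e.max_output_tokens === undefined
  ) {
    return undefined;
  }
  return {
    provider: e.litellm_provider ?? "unknown",
    model_id,
    input_cost_per_1m: perMillion(e.input_cost_per_token),
    output_cost_per_1m: perMillion(e.output_cost_per_token),
    cached_input_cost_per_1m:
      e.cache_read_input_token_cost === undefined ? undefined : perMillion(e.cache_read_input_token_cost),
    context_window_tokens: e.max_input_tokens,
    max_output_tokens: e.max_output_tokens,
    source: "json_registry",
  };
}
```

Returning `undefined` for incomplete entries, rather than guessing missing fields, keeps the deterministic layer honest and hands those models to the agent.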
### 2.2 One-Time Setup (User Does This Once in Q)
- Go to **Settings → Advanced → Cost Research**
- Toggle "Enable nightly cost research"
- Choose time (default 02:00 local)
- Click "Save & Enable Cron"
- OpenClaw creates the cron job automatically (no manual terminal work)
### 2.3 CostResearchAgent Configuration
The CostResearchAgent is an OpenClaw cron-triggered task — not a persistent DOC10 system agent. It runs, completes, and exits. It does not appear in the SystemAgentDirectory, does not have a heartbeat obligation, and is not subject to depth/cycle/timeout guards.
**Instruction storage:** The agent's configuration lives in its native OpenClaw workspace files:
| File | Purpose |
|---|---|
| Agent's `SOUL.md` | Behavioral instructions: how to research pricing, output format, anomaly handling, fallback strategies |
| Agent's `pricing_sources.md` | Model tracking list, JSON registry URLs, provider pricing page URLs, parsing hints |
These files are editable through Q's agent configuration page (the same UI used for DOC12 agent management). The user can add new models, change URLs, or adjust instructions at any time. The agent reads its own workspace files as context when it runs.
**Why this instead of a DOC7 bucket:** DOC7 buckets are designed for user-facing workspace context. System agent configuration belongs in the agent's own workspace. This keeps system configuration out of the user's context page, uses existing OpenClaw primitives (SOUL.md, workspace files) without new infrastructure, and is editable through the agent config UI that DOC12 already requires.
### 2.4 Research Pipeline (What Runs on Each Trigger)
The same pipeline runs for both nightly cron and manual "Refresh All Pricing Now" button:
1. **Read current state.** Load `cost_registry.json` and `cost_overrides.json`. Note which models are manually overridden (these are skipped unless the override is older than a configurable threshold, default 30 days, in which case the agent verifies the override is still accurate and flags if not).
2. **Deterministic JSON fetch (Layer 1).** For each model in `pricing_sources.md`, check if a machine-readable JSON registry URL is listed. If so, fetch the JSON, parse the model's entry, and compare it to the current registry. This covers most mainstream models with a single HTTP fetch and zero token burn.
3. **Agent research (Layer 2).** For any model not resolved by Layer 1 — either because no JSON source is listed, the JSON source failed, or the model wasn't found in the JSON — the CostResearchAgent browses the provider pricing page URLs listed in `pricing_sources.md`. The agent:
   - Visits the official provider pricing page first
   - If the page structure has changed or the model isn't found, adapts its parsing approach
   - If the official page fails entirely, falls back to community sources listed in `pricing_sources.md`
   - Extracts input/output cost per 1M tokens, context window, and max output tokens
   - Records detailed parsing notes in `research_notes`
4. **Anomaly check (§2.6).** Every new value — from JSON fetch or agent research — passes through anomaly bounds before being committed.
5. **Atomic registry update.** Validated new entries are written atomically to `cost_registry.json`. Every change is appended to `cost_registry_events.jsonl` with full provenance (source, old value, new value, timestamp, research notes).
6. **Emit telemetry.** `cost.research.completed` event with summary of all changes.
7. **Push change notification (§2.7).** Notification delivered to Q showing all changes.
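Steps 1 to 3 amount to an ordered fallback per model. A minimal sketch, assuming both layers are exposed as async callbacks that return `undefined` when they have no answer (or throw when a source is unreachable); `resolveModel`, `PriceLayer`, and the result shape are hypothetical, and anomaly checks and the atomic write (steps 4 and 5) happen afterwards.

```typescript
type PriceQuote = { input_cost_per_1m: number; output_cost_per_1m: number };

type Resolution =
  | { kind: "skipped_fresh_override" }
  | { kind: "resolved"; quote: PriceQuote; via: "json_registry" | "agent_research"; verifying_override: boolean }
  | { kind: "fetch_failed" };

// A layer resolves to undefined when it has no answer; it may also throw (e.g. registry unreachable).
type PriceLayer = (model_id: string) => Promise<PriceQuote | undefined>;

const DAY_MS = 24 * 60 * 60 * 1000;

async function tryLayer(layer: PriceLayer, model_id: string): Promise<PriceQuote | undefined> {
  try {
    return await layer(model_id);
  } catch {
    return undefined; // Treat a failing source like a missing entry so the next layer runs
  }
}

async function resolveModel(
  model_id: string,
  overrideSetAt: Date | undefined,
  now: Date,
  jsonFetch: PriceLayer,
  agentResearch: PriceLayer,
  overrideMaxAgeDays = 30,
): Promise<Resolution> {
  // Step 1: fresh manual overrides are skipped; stale ones are re-researched but only flagged, never replaced.
  const verifying_override = overrideSetAt !== undefined;
  if (overrideSetAt && now.getTime() - overrideSetAt.getTime() < overrideMaxAgeDays * DAY_MS) {
    return { kind: "skipped_fresh_override" };
  }
  // Step 2: deterministic JSON registry first.
  const fromJson = await tryLayer(jsonFetch, model_id);
  if (fromJson) return { kind: "resolved", quote: fromJson, via: "json_registry", verifying_override };
  // Step 3: autonomous agent fallback.
  const fromAgent = await tryLayer(agentResearch, model_id);
  if (fromAgent) return { kind: "resolved", quote: fromAgent, via: "agent_research", verifying_override };
  return { kind: "fetch_failed" };
}
```

A `fetch_failed` result corresponds to the "keeping previous value" line in the change notification.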
### 2.5 Manual Refresh Button
Always available in **Settings → Model Costs** ("Refresh All Pricing Now"). Triggers the identical pipeline from §2.4. The only difference is the trigger source (`manual_refresh` vs `nightly_cron`) recorded in telemetry and notification.
### 2.6 Anomaly Bounds
Every pricing update — from any source — passes through anomaly detection before committing to the registry:
**Rule:** If a new price differs from the previous price by more than 3× in either direction (e.g., old: $3.00, new: $9.01 or new: $0.99), the update is held and the agent automatically re-verifies from a second independent source.
- If the second source confirms the new price, the update is committed with a note: "Large change confirmed by [second source]"
- If the second source contradicts, the previous price is kept with a note: "Anomaly detected — [source A] reported $X, [source B] reported $Y — keeping previous value"
- The anomaly is surfaced in the change notification (§2.7) with ⚠ indicator so the user can see it passively
**This is not a user-approval gate.** The agent handles anomalies autonomously by re-verifying. The user sees the result in the notification but never needs to act unless they disagree.
**Absolute bounds:** No model price may be set above $500/1M tokens or below $0.001/1M tokens via automatic research. Values outside this range require manual override. (These bounds are configurable in the agent's SOUL.md.)
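The two rules above can be expressed as one pure check, applied separately to the input and output price of each model. A minimal sketch; `checkAnomaly` and the verdict names are hypothetical, and the constants are the §2.6 defaults.

```typescript
type AnomalyVerdict = "accept" | "hold_for_reverification" | "requires_manual_override";

// Defaults from §2.6; the document makes them configurable in the agent's SOUL.md.
const MAX_RATIO = 3;
const ABS_MAX_PER_1M = 500;
const ABS_MIN_PER_1M = 0.001;

function checkAnomaly(oldPrice: number | undefined, newPrice: number): AnomalyVerdict {
  // Absolute bounds apply to every automatic update, including newly added models.
  if (newPrice > ABS_MAX_PER_1M || newPrice < ABS_MIN_PER_1M) return "requires_manual_override";
  // Assumption: with no positive baseline there is no ratio to check.
  if (oldPrice === undefined || oldPrice <= 0) return "accept";
  const ratio = newPrice / oldPrice;
  // "More than 3x in either direction": strictly above 3 or strictly below 1/3.
  return ratio > MAX_RATIO || ratio < 1 / MAX_RATIO ? "hold_for_reverification" : "accept";
}
```

Because the comparison is strict, a change of exactly 3× (for example $1.25 to $3.75) is accepted without re-verification.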
### 2.7 Change Notifications
Every research run (nightly or manual) pushes a `CostRegistryChangeNotification` to Q. The notification is displayed as a dismissable card in Q's notification area:
```
Cost Registry Updated (Nightly Research — 2026-03-03 02:00)
✓ claude-sonnet-4-5: $3.00/$15.00 → $3.00/$15.00 (unchanged)
⚡ gpt-4o: $2.50/$10.00 → $2.75/$10.00 (input +10%)
✓ claude-haiku-4-5: $0.80/$4.00 → $0.80/$4.00 (unchanged)
⚠ gemini-2.5-pro: $1.25/$10.00 → $5.00/$10.00 (input +300%, held, then re-verified via LiteLLM JSON)
✚ claude-opus-4-5: NEW — $15.00/$75.00 (added from Anthropic pricing page)
✗ mistral-large: fetch failed — keeping previous value ($2.00/$6.00)
```
This is the passive human audit trail. The user scans it in seconds and moves on. If something looks wrong, they can use manual override. They never need to act on it.
### 2.8 Manual Override
Always available in **Settings → Model Costs**. The user can set any model's pricing directly. Manual overrides are stored in `cost_overrides.json` and take absolute precedence over both JSON fetch and agent research. The research pipeline skips manually overridden models (unless the override is very old, per §2.4 step 1).
---
## 3) Enforcement: Three Layers, Zero Normal-Path Latency
### 3.1 Design Principle: The Budget Check Is Free
The cost check itself is an in-memory comparison: read the in-memory running total for the applicable scope(s), compare it to the cap(s), and optionally compute a max_tokens value. That is a few map lookups and arithmetic comparisons per scope, so it completes in well under a millisecond. It runs on every dispatch but adds no perceptible latency.
The cost registry and running totals are loaded into memory at EC startup and maintained incrementally as cost events arrive. Disk persistence is asynchronous. The enforcement path never hits disk, never makes a network call, and never invokes an LLM.
**Provider-side limits are an independent backstop.** Anthropic, OpenAI, and most other providers enforce their own account-level spend limits. DOC13's enforcement is defense-in-depth, so the system should prioritize avoiding false blocks (which stop legitimate work) over catching every theoretical edge case (which providers already handle).
### 3.2 Token Ingestion: Where Actual Costs Come From
This is the critical path that R0 omitted entirely.
**Source:** DOC11 Gateway reverse telemetry. Every Gateway completion event — `gateway.chat.completed`, `room_turn_completed`, `gateway.task.completed`, etc. — must include a `usage` block:
```ts
usage: {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}
```
This is the actual token count from the provider's API response. It is the ground truth for cost calculation.
**Processing:** When EC receives a completion event with usage data:
1. Look up the model in the in-memory cost registry
2. Calculate actual cost: `(prompt_tokens × input_rate + completion_tokens × output_rate) / 1_000_000`
3. Write a `CostTrackingEvent` with `cost_type: "actual"`
4. Release any outstanding reservation for this operation (write a `cost_type: "reservation_release"` event with negative cost_usd)
5. Update in-memory running totals for all applicable scopes
6. Flush to disk asynchronously
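Steps 2 to 4 can be sketched as a pure function that returns the events to append; the caller does the registry lookup (step 1), applies the events to running totals (step 5), and flushes (step 6). `ingestCompletion` and `CostEventDraft` are hypothetical names, and the draft carries only a subset of `CostTrackingEvent` fields (event IDs and scope fields would be filled in by the caller).

```typescript
interface CompletionUsage { prompt_tokens: number; completion_tokens: number; total_tokens: number }
interface ModelRates { input_cost_per_1m: number; output_cost_per_1m: number }

interface CostEventDraft {
  cost_type: "actual" | "reservation_release";
  operation_id: string;
  model_id: string;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
  timestamp: string;
}

function ingestCompletion(
  operation_id: string,
  model_id: string,
  usage: CompletionUsage,
  rates: ModelRates,
  outstandingReservationUsd: number | undefined,
  now: Date,
): CostEventDraft[] {
  const timestamp = now.toISOString();
  // Step 2: actual cost from real token counts.
  const cost_usd =
    (usage.prompt_tokens * rates.input_cost_per_1m + usage.completion_tokens * rates.output_cost_per_1m) / 1_000_000;
  // Step 3: the "actual" event.
  const events: CostEventDraft[] = [
    { cost_type: "actual", operation_id, model_id, input_tokens: usage.prompt_tokens, output_tokens: usage.completion_tokens, cost_usd, timestamp },
  ];
  // Step 4: release any hold with a negative amount so reservation + release nets to zero.
  if (outstandingReservationUsd !== undefined && outstandingReservationUsd > 0) {
    events.push({ cost_type: "reservation_release", operation_id, model_id, input_tokens: 0, output_tokens: 0, cost_usd: -outstandingReservationUsd, timestamp });
  }
  return events;
}
```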
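**DOC11 Amendment Requirement:** DOC11 must be amended to include `usage` data on all completion events. This is additive to the DOC11 reverse telemetry amendment already identified as P0 in the DOC10 R9 audit. Every major provider reports token usage in its API response, but the field names differ (OpenAI `usage.prompt_tokens`/`completion_tokens`, Anthropic `usage.input_tokens`/`output_tokens`, Google `usageMetadata.promptTokenCount`/`candidatesTokenCount`), so DOC11 must normalize them into the shape above rather than pass them through verbatim.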
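A minimal normalization sketch, assuming DOC11 sees each provider's raw usage object. Only the basic count fields are read; thinking or tool-use breakdowns are ignored. Counting Anthropic cache reads and writes as prompt tokens is an assumption: it gets the token total right, but pricing them at the base input rate is only an approximation. `normalizeUsage` and `RawUsage` are hypothetical names.

```typescript
// Normalized shape from §3.2.
interface CompletionUsage { prompt_tokens: number; completion_tokens: number; total_tokens: number }

// Raw usage as reported by each provider (only the fields this sketch reads).
type RawUsage =
  | { provider: "openai"; usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number } }
  | {
      provider: "anthropic";
      usage: { input_tokens: number; output_tokens: number; cache_creation_input_tokens?: number; cache_read_input_tokens?: number };
    }
  | { provider: "google"; usageMetadata: { promptTokenCount: number; candidatesTokenCount: number; totalTokenCount: number } };

function normalizeUsage(raw: RawUsage): CompletionUsage {
  if (raw.provider === "openai") {
    // Already in the target shape.
    return { ...raw.usage };
  }
  if (raw.provider === "anthropic") {
    // Anthropic reports cache reads/writes separately from input_tokens.
    const u = raw.usage;
    const prompt = u.input_tokens + (u.cache_creation_input_tokens ?? 0) + (u.cache_read_input_tokens ?? 0);
    return { prompt_tokens: prompt, completion_tokens: u.output_tokens, total_tokens: prompt + u.output_tokens };
  }
  // Google Gemini
  const m = raw.usageMetadata;
  return { prompt_tokens: m.promptTokenCount, completion_tokens: m.candidatesTokenCount, total_tokens: m.totalTokenCount };
}
```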
### 3.3 Tiered Enforcement
Enforcement escalates based on budget utilization. The tiers are determined by the `warning_threshold_pct` (default 80%) and `enforcement_threshold_pct` (default 95%) in `CostCapConfigSchema`.
#### Tier 1: Normal Operation (Below Warning Threshold)
**Budget utilization < 80% (configurable)**
- The in-memory budget check runs (sub-millisecond) but takes no enforcement action
- `max_output_tokens` is NOT injected into the handoff payload — the model uses its default max output or whatever the calling context specifies
- Dispatch proceeds with zero added overhead
- Cost is tracked normally via reverse telemetry
- Global header in Q shows spend/cap in a normal color (green below 50%, blue from 50% up to the warning threshold; see §5.1)
This is expected to be the system's state for the large majority of operations. Zero latency impact.
#### Tier 2: Watchful (Warning to Enforcement Threshold)
**Budget utilization 80–95% (configurable)**
- The in-memory budget check runs (sub-millisecond)
- Q header turns yellow/amber with current spend vs cap
- `cost.budget.warning` telemetry event emitted (once per threshold crossing, not per operation)
- `max_output_tokens` is injected into the handoff payload, calculated from remaining budget as `floor(remaining_budget_usd / output_cost_per_1m × 1_000_000)` and capped at the model's `max_output_tokens`. If the result is below a minimum useful threshold (default 500 tokens), escalate to Tier 3 instead.
- Dispatch still proceeds without user interaction — the max_tokens injection is invisible to the user but prevents any single generation from exhausting the remaining budget
The max_tokens injection is the highest-leverage enforcement mechanism because it operates at the provider level: the provider stops generating once it reaches this many output tokens, regardless of what EC, OpenClaw, or any agent does.
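One possible implementation of the `computeMaxOutputTokens` helper that `checkBudget()` (§4.4) calls, plus the Tier 2 escalation rule. This is a sketch under stated assumptions: `OutputPricing` is a hypothetical subset of `CostModelEntry`, `watchfulMaxTokens` is a hypothetical name, and 500 is the §3.3 default minimum.

```typescript
// Only the registry fields this calculation needs (subset of CostModelEntry, §1.2).
interface OutputPricing { output_cost_per_1m: number; max_output_tokens: number }

const MIN_USEFUL_OUTPUT_TOKENS = 500;

// Largest output the remaining budget can pay for, never above the model's own limit.
function computeMaxOutputTokens(registry: OutputPricing, remainingBudgetUsd: number): number {
  if (remainingBudgetUsd <= 0) return 0;
  if (registry.output_cost_per_1m === 0) return registry.max_output_tokens; // Free output: only the model limit applies
  const affordable = Math.floor((remainingBudgetUsd / registry.output_cost_per_1m) * 1_000_000);
  return Math.min(affordable, registry.max_output_tokens);
}

// Tier 2: inject the cap, or escalate to Tier 3 when the cap is too small to be useful.
function watchfulMaxTokens(
  registry: OutputPricing,
  remainingBudgetUsd: number,
): { action: "inject"; max_output_tokens: number } | { action: "escalate_to_tier3" } {
  const maxTokens = computeMaxOutputTokens(registry, remainingBudgetUsd);
  return maxTokens < MIN_USEFUL_OUTPUT_TOKENS
    ? { action: "escalate_to_tier3" }
    : { action: "inject", max_output_tokens: maxTokens };
}
```

Capping at the model limit matters because a provider may reject a request whose output limit exceeds what the model supports.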
#### Tier 3: Guarded (Above Enforcement Threshold)
**Budget utilization > 95% (configurable)**
- The in-memory budget check calculates whether this operation's maximum possible cost would exceed the remaining budget
- Maximum possible cost = `(estimated_input_tokens × input_rate + model_max_output × output_rate) / 1_000_000`
- Estimated input tokens = context window utilization estimate (model max context × 0.3 as conservative default, or actual prompt token count if available from the DOC10 Decision Context Builder)
- If maximum possible cost > remaining budget:
  - Return `DispatchErrorSchema.code = "BUDGET_EXCEEDED"` with details
  - Q renders the budget modal (§3.4)
  - Dispatch is blocked until user resolves
- If maximum possible cost ≤ remaining budget:
  - Place an in-memory reservation equal to the maximum possible cost
  - Inject tight `max_output_tokens` (from remaining budget minus reservation for input)
  - Dispatch proceeds
  - Reservation is released when actual cost arrives via reverse telemetry
**Reservation concurrency:** Reservations are in-memory. If four parallel operations in a DOC12 room each check the budget, each sees the other operations' outstanding reservations reflected in the remaining budget. This prevents the race condition where parallel dispatches all pass individually but collectively exceed the cap, provided each check and its reservation are applied as one atomic step (for example, with no `await` between them on a single-threaded event loop).
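A minimal sketch of an in-memory reservation store, assuming EC runs on a single-threaded JavaScript runtime so that a synchronous method cannot be interleaved with another dispatch. `ReservationLedger`, `tryReserve`, and `release` are hypothetical; `sumForScope` mirrors the call in `checkBudget()` but takes a §4.3 scope key string here.

```typescript
type ScopeKey = string; // e.g. "global", "room:{room_id}" as in §4.3

interface Hold { usd: number; scopes: ScopeKey[] }

class ReservationLedger {
  private readonly holds = new Map<string, Hold>();

  // Total of outstanding holds that count against one scope.
  sumForScope(key: ScopeKey): number {
    let sum = 0;
    this.holds.forEach((hold) => {
      if (hold.scopes.includes(key)) sum += hold.usd;
    });
    return sum;
  }

  // Check and reserve in one synchronous call, so no other dispatch can slip in between.
  tryReserve(
    operationId: string,
    scopes: ScopeKey[],
    usd: number,
    capFor: (key: ScopeKey) => number | undefined,
    spentFor: (key: ScopeKey) => number,
  ): boolean {
    for (const key of scopes) {
      const cap = capFor(key);
      if (cap !== undefined && spentFor(key) + this.sumForScope(key) + usd > cap) return false;
    }
    this.holds.set(operationId, { usd, scopes });
    return true;
  }

  // Called when the actual cost arrives; returns the released amount (0 if there was no hold).
  release(operationId: string): number {
    const hold = this.holds.get(operationId);
    this.holds.delete(operationId);
    return hold ? hold.usd : 0;
  }
}
```

With a $10 room cap, $8.50 already spent, and four parallel $0.50 holds, the first three succeed and the fourth is refused until one of them is released.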
#### Tier 4: Mid-Stream Circuit Breaker (Phase 2)
**For agentic tool-use loops only (not single generations)**
When an operation involves multiple sequential API calls (tool-use loops, multi-step agent workflows), the running cost accumulator may exceed the reservation placed at dispatch. The mid-stream circuit breaker checks accumulated cost against the reservation after each tool call completes:
- If `accumulated_actual_cost > reservation_amount`, emit `cost.budget.midstream_exceeded` telemetry
- Send abort signal via DOC10 abort cascade (§4.10E in DOC10 R9)
- Return `[INCOMPLETE: BUDGET LIMIT REACHED]` system message
- Q displays the budget modal
This is Phase 2 because single-generation overruns are already handled by max_tokens injection (Tier 2/3), and multi-call overruns are expected to be less common. The circuit breaker is the safety net for those edge cases.
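The breaker itself is a per-operation accumulator compared against the dispatch-time reservation. A minimal sketch; `MidStreamBreaker` and `recordCallCost` are hypothetical names, and the abort cascade, telemetry, and system message listed above are left to the caller when it sees `"abort"`.

```typescript
type BreakerDecision = "continue" | "abort";

// One breaker per agentic operation, created with the reservation placed at dispatch.
class MidStreamBreaker {
  private accumulatedUsd = 0;
  private readonly reservationUsd: number;

  constructor(reservationUsd: number) {
    this.reservationUsd = reservationUsd;
  }

  // Called after each tool call's actual cost arrives via reverse telemetry.
  recordCallCost(costUsd: number): BreakerDecision {
    this.accumulatedUsd += costUsd;
    return this.accumulatedUsd > this.reservationUsd ? "abort" : "continue";
  }

  get accumulated(): number {
    return this.accumulatedUsd;
  }
}
```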
### 3.4 Budget Exceeded Flow in Q
When dispatch is blocked by `BUDGET_EXCEEDED`, Q renders a modal:
```
Budget Limit Reached
─────────────────────
Daily Budget: $50.00
Spent Today: $48.50
This Operation: ~$3.20 (estimated)
Remaining: $1.50
What would you like to do?
[ Override once for this operation ]
[ Increase daily budget → $_____ ]
[ Switch to cheaper model: claude-haiku-4-5 (~$0.40) ]
[ Stop ]
```
**Override once:** Adds `budget_override: true` to the `OperationEnvelope`, bypassing the pre-dispatch block for this single dispatch. Emits `cost.budget.override` telemetry. The operation still gets max_tokens injection based on remaining budget, and actual cost is still tracked; the override only bypasses the pre-dispatch block.
**Increase budget:** Updates `cost_caps.json` for the applicable scope. Takes effect immediately for this and all subsequent operations.
**Switch to cheaper model:** Only shown when a cheaper model is available for the current operation type. Re-dispatches with the selected model. The cost estimate updates to show the cheaper option's projected cost.
**Stop:** Cancels the operation. User returns to Q.
### 3.5 DOC8 Friction Exclusion
`BUDGET_EXCEEDED` dispatch errors are NOT friction events. Budget exhaustion is not a code defect, capability failure, or routing error. It must not trigger DOC8 learning proposals, DOC9 repair proposals, or any self-improvement workflow.
**Implementation:**
- DOC10 Event Intake must exclude `BUDGET_EXCEEDED` from the event stream routed to DOC8
- DOC8 must add `BUDGET_EXCEEDED` to its explicit evaluation exclusion list (defense-in-depth)
- DOC10's `DecisionFeedbackEvent` must not be emitted for budget-blocked operations
---
## 4) Unified Calculation
### 4.1 Cost Calculation Formula
```ts
const cost_usd = (input_tokens * input_cost_per_1m / 1_000_000) +
  (output_tokens * output_cost_per_1m / 1_000_000);
```
For cached input tokens (when `cached_input_cost_per_1m` is set and the provider reports cache hits):
```ts
const cost_usd = (cached_input_tokens * cached_input_cost_per_1m / 1_000_000) +
  (uncached_input_tokens * input_cost_per_1m / 1_000_000) +
  (output_tokens * output_cost_per_1m / 1_000_000);
```
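Both formulas fit in one function that falls back to the plain formula whenever cached pricing is not configured. A minimal sketch, assuming `input_tokens` counts all prompt tokens with the cache hits as a reported subset (the OpenAI convention); the `usage` block in §3.2 does not carry a cached count, so `cached_input_tokens` here is an assumed extra input. `costUsd`, `CallTokens`, and `CallPricing` are hypothetical names.

```typescript
interface CallTokens {
  input_tokens: number; // All prompt tokens, cached ones included (assumption)
  cached_input_tokens?: number; // Subset of input_tokens served from cache, if reported
  output_tokens: number;
}

interface CallPricing {
  input_cost_per_1m: number;
  output_cost_per_1m: number;
  cached_input_cost_per_1m?: number;
}

function costUsd(t: CallTokens, p: CallPricing): number {
  // Cached pricing applies only when the rate is set; otherwise every input token pays the full rate.
  const cached = p.cached_input_cost_per_1m !== undefined ? Math.min(t.cached_input_tokens ?? 0, t.input_tokens) : 0;
  const uncached = t.input_tokens - cached;
  return (
    (cached * (p.cached_input_cost_per_1m ?? 0)) / 1_000_000 +
    (uncached * p.input_cost_per_1m) / 1_000_000 +
    (t.output_tokens * p.output_cost_per_1m) / 1_000_000
  );
}
```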
### 4.2 Cost Attribution in Multi-Participant Rooms
Each API call's full cost is attributed to the participant who triggered it. Shared context (room chat history) is included in each participant's input tokens because it was part of that participant's prompt. The room's total cost is the sum of all participant costs.
This aligns with Claude Code's session binding wire shapes: `room_turn_cost` events carry `participant_id`, `room_id`, and `room_turn_id` correlation fields. The cost tracker attributes the full `total_cost_usd` to that participant within that room.
No cost-splitting across participants. No separate "shared context" cost category. The accounting is simple: every API call has exactly one participant, one model, one cost.
### 4.3 Running Totals
The cost tracker maintains in-memory running totals per scope:
| Scope | Key | Reset interval |
|---|---|---|
| Global daily | `global` | Midnight local time |
| Per-mode | `mode:{mode_name}` | Midnight local time |
| Per-project | `project:{project_id}` | Configurable (default: monthly) |
| Per-task | `task:{task_id}` | Task lifetime |
| Per-room | `room:{room_id}` | Room lifetime |
Running totals are rebuilt from `cost_tracking_events.jsonl` on EC startup (filtered by the current reset period). During operation, they are maintained incrementally as cost events arrive.
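A minimal sketch of the startup rebuild, assuming only settled spend (`actual` and `adjustment` events) is replayed, since reservations are in-memory holds that do not survive a restart. Per-mode keys are omitted because `CostTrackingEvent` does not carry the orchestration mode. The caller supplies `periodStart` per scope key (local midnight for daily scopes, the month start for projects, `undefined` for lifetime scopes). `rebuildRunningTotals`, `scopeKeysFor`, and `LoggedCostEvent` are hypothetical names.

```typescript
type CostType = "actual" | "reservation" | "reservation_release" | "adjustment";

// Subset of CostTrackingEvent (§1.2) read during the rebuild.
interface LoggedCostEvent {
  cost_type: CostType;
  cost_usd: number;
  timestamp: string;
  project_id?: string;
  task_id?: string;
  room_id?: string;
}

// §4.3 scope keys that an event counts toward.
function scopeKeysFor(e: LoggedCostEvent): string[] {
  const keys = ["global"];
  if (e.project_id) keys.push(`project:${e.project_id}`);
  if (e.task_id) keys.push(`task:${e.task_id}`);
  if (e.room_id) keys.push(`room:${e.room_id}`);
  return keys;
}

function rebuildRunningTotals(
  jsonl: string,
  periodStart: (key: string) => Date | undefined, // undefined: lifetime scope, never reset
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const e = JSON.parse(line) as LoggedCostEvent;
    // Only settled spend is replayed; reservation holds are not restored after a restart.
    if (e.cost_type !== "actual" && e.cost_type !== "adjustment") continue;
    const ts = Date.parse(e.timestamp);
    for (const key of scopeKeysFor(e)) {
      const start = periodStart(key);
      if (start !== undefined && ts < start.getTime()) continue; // Belongs to an earlier period
      totals.set(key, (totals.get(key) ?? 0) + e.cost_usd);
    }
  }
  return totals;
}
```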
### 4.4 Budget Check Function
```ts
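// EC internals assumed to exist elsewhere: inMemoryCostRegistry, inMemoryCostCaps,
// inMemoryRunningTotals, inMemoryReservations, getCap, scopeKey,
// computeMaxCost, computeMaxOutputTokens.
type BudgetScope = { kind: string; id: string };

type BudgetCheckResult =
  | { status: "no_pricing" | "normal"; proceed: true }
  | { status: "watchful"; proceed: true; max_output_tokens: number }
  | { status: "guarded"; proceed: true; max_output_tokens: number; reservation_usd: number }
  | {
      status: "exceeded";
      proceed: false;
      scope: BudgetScope;
      spent: number;
      reserved: number;
      cap: number;
      estimated_cost: number;
    };

function checkBudget(
  scopes: BudgetScope[],
  model_id: string,
  estimated_input_tokens?: number
): BudgetCheckResult {
  const registry = inMemoryCostRegistry.get(model_id);
  // Unpriced models are never blocked (§0.8: avoid false blocks).
  if (!registry) return { status: "no_pricing", proceed: true };

  const caps = inMemoryCostCaps;
  const totals = inMemoryRunningTotals;
  // Worst-case cost depends only on the model and input size, so compute it once.
  const maxCost = computeMaxCost(registry, estimated_input_tokens);

  // Check every scope: any one can block (§0.5); otherwise the most severe
  // tier and the tightest max_output_tokens across scopes win.
  let status: "normal" | "watchful" | "guarded" = "normal";
  let maxOutputTokens = Number.POSITIVE_INFINITY;

  for (const scope of scopes) {
    const cap = getCap(caps, scope);
    if (cap == null) continue; // No cap configured for this scope
    const spent = totals.get(scopeKey(scope)) ?? 0;
    const reserved = inMemoryReservations.sumForScope(scope);
    const remaining = cap - spent - reserved;
    // A $0 cap counts as fully utilized instead of dividing by zero.
    const utilization = cap > 0 ? (spent + reserved) / cap : Number.POSITIVE_INFINITY;

    if (utilization >= caps.enforcement_threshold_pct / 100) {
      if (maxCost > remaining) {
        return { status: "exceeded", proceed: false, scope, spent, reserved, cap, estimated_cost: maxCost };
      }
      // Tier 3: under cap but in the enforcement zone; reserve and cap output.
      status = "guarded";
      maxOutputTokens = Math.min(maxOutputTokens, computeMaxOutputTokens(registry, remaining));
    } else if (utilization >= caps.warning_threshold_pct / 100) {
      // Tier 2: cap output only.
      if (status === "normal") status = "watchful";
      maxOutputTokens = Math.min(maxOutputTokens, computeMaxOutputTokens(registry, remaining));
    }
  }

  if (status === "guarded") {
    return { status, proceed: true, max_output_tokens: maxOutputTokens, reservation_usd: maxCost };
  }
  if (status === "watchful") {
    return { status, proceed: true, max_output_tokens: maxOutputTokens };
  }
  return { status: "normal", proceed: true };
}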
```
This function runs in the Decision Broker before every dispatch. It is entirely in-memory. No I/O, no network, no LLM. Sub-millisecond.
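`checkBudget()` calls `computeMaxCost()` without defining it. One possible implementation of the Tier 3 worst-case formula from §3.3 follows, assuming the registry entry exposes the `CostModelEntry` fields named below; the 30%-of-context fallback mirrors the conservative default stated there, and `MaxCostPricing` is a hypothetical type.

```typescript
interface MaxCostPricing {
  input_cost_per_1m: number;
  output_cost_per_1m: number;
  context_window_tokens: number;
  max_output_tokens: number;
}

// §3.3 conservative default when no prompt token count is available.
const DEFAULT_CONTEXT_FRACTION = 0.3;

function computeMaxCost(registry: MaxCostPricing, estimated_input_tokens?: number): number {
  const inputTokens =
    estimated_input_tokens ?? Math.ceil(registry.context_window_tokens * DEFAULT_CONTEXT_FRACTION);
  // Worst case: the model uses its full output allowance.
  return (inputTokens * registry.input_cost_per_1m + registry.max_output_tokens * registry.output_cost_per_1m) / 1_000_000;
}
```

For a model with a 100K context, $2/$10 per 1M, and a 1,000-token output limit, the default estimate is 30,000 input tokens and the worst-case cost is $0.07.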
---
## 5) UI Surfaces (Q)
### 5.1 Global Header
Always visible. Shows daily spend / global cap. Color-coded:
| Utilization | Color | Display |
|---|---|---|
| < 50% | Green | `$12.40 / $50.00` |
| 50–80% | Blue | `$35.00 / $50.00` |
| 80–95% | Amber | `⚡ $42.00 / $50.00` |
| > 95% | Red | `🔴 $48.50 / $50.00` |
### 5.2 Settings → Model Costs
Full table of all tracked models:
| Model | Provider | Input $/1M | Output $/1M | Context | Source | Last Updated | Actions |
|---|---|---|---|---|---|---|---|
| claude-sonnet-4-5 | Anthropic | $3.00 | $15.00 | 200K | JSON Registry | 2h ago | Override / View History |
| gpt-4o | OpenAI | $2.75 | $10.00 | 128K | Agent Research | 14h ago | Override / View History |
Plus: "Refresh All Pricing Now" button, link to agent config for CostResearchAgent.
### 5.3 Settings → Budget Caps
| Scope | Cap | Warning At | Enforcement At | Spent | Status |
|---|---|---|---|---|---|
| Global Daily | $50.00 | 80% ($40) | 95% ($47.50) | $12.40 | ✓ Normal |
| Project: ELNOR | $200/month | 80% | 95% | $45.00 | ✓ Normal |
| Room: Arch Review | $10/session | 80% | 95% | $0.00 | ✓ Normal |
All editable inline.
### 5.4 Per-Surface Cost Display
Every surface (chat session, room, task, panel, project) shows:
- Cost so far for this surface
- Remaining budget (if a cap is set)
- Per-participant breakdown (for rooms)
### 5.5 Running Jobs Table
Add cost column showing: estimated cost (pre-dispatch) → actual cost (post-completion).
### 5.6 Engineering Panel
Phase 2: Cost charts by mode, project, room, time period. Trend lines. Top cost drivers. Model cost comparison.
### 5.7 Change Notification Display
Dismissable card in Q's notification area showing the latest research run results (§2.7).
---
## 6) Integration Points
### 6.1 DOC10 (Orchestration)
- **Decision Broker:** Add `checkBudget()` call. Inject `max_output_tokens` into `GatewayHandoffPayloadSchema` when budget utilization warrants (Tier 2+).
- **GatewayHandoffPayloadSchema:** Add `max_output_tokens: z.number().int().optional()` field. When set, DOC11 must honor this limit when constructing the provider API call.
- **DispatchErrorSchema:** Add `"BUDGET_EXCEEDED"` to error code enum. Include `BudgetExceededDetail` with scope, spent, cap, estimated cost.
- **Mode × Operation Authority Matrix:** Add cost check as a pre-dispatch step for all paths. Note: cost check is in-memory and does not add a matrix row — it is a universal pre-step, not a path-specific authority.
- **Event Intake:** Exclude `BUDGET_EXCEEDED` errors from DOC8 routing.
- **Telemetry Event List:** Add cost telemetry event family (§7) with phase tags.
- **Ledger R8:** Add DOC13 obligations section.
### 6.2 DOC11 (Gateway)
- **Reverse telemetry:** All completion events must include `usage: { prompt_tokens, completion_tokens, total_tokens }`. This is additive to the existing DOC11 amendment requirements.
- **max_output_tokens:** When `GatewayHandoffPayload.max_output_tokens` is set, DOC11 must pass this value as the provider's output-token limit parameter (`max_tokens` for Anthropic, `max_completion_tokens` for OpenAI Chat Completions, `maxOutputTokens` for Gemini), so the provider itself enforces the limit.
- **Room turn events:** `room_turn_completed` events (per Claude Code's wire shapes) must include `token_usage` and `total_cost_usd`. The `room_turn_cost` event provides the same data in a dedicated cost event for room-aware tracking.
### 6.3 DOC12 (Rooms)
- **Room state:** Add `room_cost_so_far_usd: number` to room state schema.
- **Per-participant tracking:** Cost events carry `participant_id` from Claude Code's session binding correlation fields. Room cost breakdowns in Q use these for per-participant display.
- **Room budget caps:** `CostCapConfigSchema.per_room` allows per-room caps. Room dispatch respects these caps through the standard `checkBudget()` path.
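Per-participant attribution is a group-by on `participant_id`, and the room total is the sum of those groups, which matches what AT-10 checks. This is a minimal sketch. `RoomCostEvent`, `summarizeRoom`, `wouldExceedRoomCap`, and the single numeric per-room cap are hypothetical stand-ins for the real schema and the `per_room` branch of `checkBudget()`.

```typescript
// Hypothetical cost event shape; participant_id comes from the session binding fields.
interface RoomCostEvent {
  room_id: string;
  participant_id: string;
  cost_usd: number;
}

interface RoomCostSummary {
  perParticipant: Map<string, number>;
  room_cost_so_far_usd: number;
}

// Groups one room's cost events by participant; the room total is the sum of the groups.
function summarizeRoom(events: RoomCostEvent[], roomId: string): RoomCostSummary {
  const perParticipant = new Map<string, number>();
  for (const e of events) {
    if (e.room_id !== roomId) continue;
    perParticipant.set(e.participant_id, (perParticipant.get(e.participant_id) ?? 0) + e.cost_usd);
  }
  let room_cost_so_far_usd = 0;
  perParticipant.forEach((usd) => {
    room_cost_so_far_usd += usd;
  });
  return { perParticipant, room_cost_so_far_usd };
}

// Hypothetical per_room check: block when room total + this turn's estimate passes the cap.
function wouldExceedRoomCap(summary: RoomCostSummary, estimateUsd: number, perRoomCapUsd?: number): boolean {
  return perRoomCapUsd !== undefined && summary.room_cost_so_far_usd + estimateUsd > perRoomCapUsd;
}
```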
### 6.4 DOC8 (Learning / Friction)
- **Friction exclusion:** Add `BUDGET_EXCEEDED` to the explicit exclusion list. Budget failures must not generate friction events, learning proposals, or repair proposals.
- **This is bilateral:** DOC10 does not route budget errors to DOC8, AND DOC8 independently excludes them. This is deliberate defense-in-depth: either check alone keeps budget failures out of DOC8.
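The bilateral exclusion amounts to two independent guards. This sketch writes them separately on purpose, rather than as one shared helper, so that a regression on one side does not silently disable the other. `DispatchFailure`, `DOC10_DOC8_EXCLUDED_CODES`, `shouldRouteToDoc8`, and `doc8Accepts` are hypothetical names. Only the `BUDGET_EXCEEDED` code comes from this spec.

```typescript
// Hypothetical failure record as seen by DOC10 event intake and by DOC8.
interface DispatchFailure {
  code: string;
  operation_id: string;
}

// DOC10 side: codes never forwarded to DOC8 (BUDGET_EXCEEDED is the one DOC13 mandates).
const DOC10_DOC8_EXCLUDED_CODES: ReadonlySet<string> = new Set(["BUDGET_EXCEEDED"]);

function shouldRouteToDoc8(f: DispatchFailure): boolean {
  return !DOC10_DOC8_EXCLUDED_CODES.has(f.code);
}

// DOC8 side: an independent check on receipt, so no friction event, learning
// proposal, or repair proposal is created even if DOC10 forwards one by mistake.
function doc8Accepts(f: DispatchFailure): boolean {
  return f.code !== "BUDGET_EXCEEDED";
}
```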
### 6.5 DOC4 (OpenClaw)
- **Cron runner:** OpenClaw owns the cron execution for the nightly research job.
- **CostResearchAgent:** Registered as an OpenClaw cron-triggered task, not a DOC10 system agent. Uses standard OpenClaw tool access (web browsing, file read/write) to perform research.
- **Agent workspace files:** SOUL.md and `pricing_sources.md` live in the agent's workspace directory, editable through Q's agent configuration UI.
### 6.6 Q (Dashboard)
- All proxy routes for cost controls (override, increase cap, manual refresh)
- All read routes for cost data (registry, running totals, event history, change notifications)
- Global header cost display
- Settings pages (Model Costs, Budget Caps)
- Per-surface cost displays
- Budget exceeded modal
- Change notification card
- Agent configuration page for CostResearchAgent
---
## 7) Telemetry Events
All events belong to the `cost.*` family and are phase-tagged per DOC10 conventions.
### Phase 0
| Event | Emitted when |
|---|---|
| `cost.tracking.recorded` | Actual cost recorded from reverse telemetry |
| `cost.reservation.placed` | Budget hold created at dispatch |
| `cost.reservation.released` | Hold settled against actual cost |
| `cost.budget.exceeded` | Dispatch blocked due to budget |
| `cost.budget.override` | User overrides a budget block |
| `cost.registry.updated` | Pricing data changes (any source) |
| `cost.research.completed` | Research run finished (nightly or manual) |
| `cost.research.failed` | Research run failed for one or more models |
### Phase 1
| Event | Emitted when |
|---|---|
| `cost.budget.warning` | Spend crosses warning threshold (once per crossing) |
| `cost.cap.changed` | User modifies a budget cap |
### Phase 2
| Event | Emitted when |
|---|---|
| `cost.budget.midstream_exceeded` | Mid-stream circuit breaker fires during tool-use loop |
| `cost.anomaly.detected` | Research finds a price change > 3× the previous value, requiring re-verification |
| `cost.anomaly.resolved` | Re-verification completes (confirmed or rejected) |
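The tables above can be carried into the telemetry event registry as a single phase-tagged map. This sketch assumes that a phase tag means "emitted from this phase onward", so phases are cumulative. `COST_TELEMETRY_PHASES` and `enabledCostEvents` are hypothetical names. The event names and phases are copied from the tables.

```typescript
// Phase in which each cost.* event starts being emitted (copied from the tables).
const COST_TELEMETRY_PHASES = {
  "cost.tracking.recorded": 0,
  "cost.reservation.placed": 0,
  "cost.reservation.released": 0,
  "cost.budget.exceeded": 0,
  "cost.budget.override": 0,
  "cost.registry.updated": 0,
  "cost.research.completed": 0,
  "cost.research.failed": 0,
  "cost.budget.warning": 1,
  "cost.cap.changed": 1,
  "cost.budget.midstream_exceeded": 2,
  "cost.anomaly.detected": 2,
  "cost.anomaly.resolved": 2,
} as const;

type CostEventName = keyof typeof COST_TELEMETRY_PHASES;

// Events enabled once the rollout has reached `phase` (phases are cumulative).
function enabledCostEvents(phase: 0 | 1 | 2): CostEventName[] {
  return (Object.keys(COST_TELEMETRY_PHASES) as CostEventName[]).filter(
    (name) => COST_TELEMETRY_PHASES[name] <= phase,
  );
}
```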
---
## 8) Phasing
### Phase 0 (Ship with DOC12)
**Registry and Research:**
- Cost registry (`cost_registry.json`) with manual override
- CostResearchAgent as OpenClaw cron job with SOUL.md + `pricing_sources.md`
- Deterministic JSON fetch as Layer 1 in research pipeline
- Nightly cron + manual refresh button (same pipeline, different triggers)
- Anomaly bounds on all pricing updates
- Change notification pushed to Q on every research run
- All core schemas (CostModelEntrySchema, CostTrackingEventSchema v2, CostCapConfigSchema v2)
- `cost_registry_events.jsonl` audit log
**Enforcement:**
- In-memory budget check in Decision Broker (sub-millisecond, every dispatch)
- Tiered enforcement: normal (no action) → watchful (max_tokens injection) → guarded (reservation + block)
- Dynamic `max_output_tokens` injection in GatewayHandoffPayload (Tier 2+)
- Budget reservation system for Tier 3 concurrent operations
- `BUDGET_EXCEEDED` dispatch error code
- DOC8 friction exclusion for budget errors
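Reservations matter in the guarded tier because several concurrent dispatches can each pass a spent-only check while together exceeding the cap. The sketch below holds each estimate until the actual cost arrives. It is a minimal illustration, assuming a single in-process ledger per budget scope. Because Node runs JavaScript on one event loop, each `place()` call checks and records its hold without interleaving. `ReservationLedger` and its methods are hypothetical. The comments map them to the `cost.reservation.*` events.

```typescript
// Hypothetical in-memory reservation ledger for Tier 3 (guarded) dispatches.
class ReservationLedger {
  private readonly capUsd: number;
  private spentUsd: number;
  private readonly holds = new Map<string, number>(); // operation_id -> held USD

  constructor(capUsd: number, spentUsd = 0) {
    this.capUsd = capUsd;
    this.spentUsd = spentUsd;
  }

  private heldUsd(): number {
    let total = 0;
    this.holds.forEach((usd) => {
      total += usd;
    });
    return total;
  }

  // spent + held: the figure a guarded budget check compares against the cap.
  committedUsd(): number {
    return this.spentUsd + this.heldUsd();
  }

  // cost.reservation.placed: hold the estimate, or refuse if it would pass the cap.
  place(operationId: string, estimateUsd: number): boolean {
    if (this.committedUsd() + estimateUsd > this.capUsd) return false;
    this.holds.set(operationId, estimateUsd);
    return true;
  }

  // cost.reservation.released: drop the hold and book the actual cost instead.
  settle(operationId: string, actualUsd: number): void {
    this.holds.delete(operationId);
    this.spentUsd += actualUsd;
  }
}
```

Settling against the actual cost rather than the estimate lets an over-estimate release its unused headroom back to later dispatches.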
**Token Ingestion:**
- Token usage extraction from DOC11 reverse telemetry completion events
- Actual cost calculation and recording
- Running total maintenance (in-memory with async disk flush)
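"In-memory with async disk flush" can be read as follows: the hot path only updates a map, and persistence happens in the background, with at most one write in flight. This sketch makes that concrete under stated assumptions. `RunningTotals`, `SnapshotWriter`, and the scope keys (`"global"`, `"project:p1"`) are hypothetical. The writer is injected, so the sketch assumes no file path or on-disk format. In Node it could wrap `fs.promises.writeFile`.

```typescript
// Writer is injected so the sketch makes no assumption about file paths or formats.
type SnapshotWriter = (json: string) => Promise<void>;

class RunningTotals {
  private readonly totals = new Map<string, number>(); // scope -> USD spent
  private readonly write: SnapshotWriter;
  private dirty = false;
  private flushing = false;

  constructor(write: SnapshotWriter) {
    this.write = write;
  }

  // Hot path: synchronous and memory-only; budget checks read these values directly.
  add(scopes: readonly string[], costUsd: number): void {
    for (const scope of scopes) {
      this.totals.set(scope, (this.totals.get(scope) ?? 0) + costUsd);
    }
    this.dirty = true;
    if (!this.flushing) {
      this.flushing = true;
      void this.drain();
    }
  }

  get(scope: string): number {
    return this.totals.get(scope) ?? 0;
  }

  // Writes snapshots until no unflushed changes remain; at most one write is in flight.
  private async drain(): Promise<void> {
    try {
      while (this.dirty) {
        this.dirty = false;
        const snapshot: Record<string, number> = {};
        this.totals.forEach((usd, scope) => {
          snapshot[scope] = usd;
        });
        await this.write(JSON.stringify(snapshot));
      }
    } catch (err) {
      this.dirty = true; // keep the change pending; the next add() retries the flush
      console.error("running-totals flush failed", err);
    } finally {
      this.flushing = false;
    }
  }
}
```

Updates that arrive during a write are coalesced into the next snapshot instead of queuing one write per cost event.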
**UI:**
- Global header cost display (spend / cap, color-coded)
- Settings → Model Costs (full table, manual overrides, refresh button)
- Settings → Budget Caps (global daily cap, warning/enforcement thresholds)
- Budget exceeded modal (override / increase / cheaper model / stop)
- Change notification card
### Phase 1
- Hierarchical caps (per-mode, per-project, per-task, per-room)
- Per-surface spend display in all Q views
- Per-participant cost breakdown in DOC12 rooms
- Cost column in Running Jobs table
- Warning threshold notifications in Q
- `cost_overrides.json` with provenance and override age tracking
### Phase 2
- Mid-stream circuit breaker for tool-use loops
- Cost charts in Engineering Panel (by mode, project, room, time period)
- Historical cost analytics and trend visualization
- Cost-aware route scoring weight in DOC10 (prefer cheaper models when budget utilization is high)
- Budget recommendation engine (suggest cap adjustments based on usage patterns)
---
## 9) Consolidated Checklist for Claude Code
### Contracts and Schemas
1. Add CostModelEntrySchema, CostTrackingEventSchema (v2), CostCapConfigSchema (v2), CostOverrideRequestSchema, CostRegistryChangeNotificationSchema to `packages/contracts/src/schemas.ts`
2. Add `max_output_tokens: z.number().int().optional()` to GatewayHandoffPayloadSchema
3. Add `"BUDGET_EXCEEDED"` to DispatchErrorSchema error code enum
4. Add `budget_override: z.boolean().optional()` to OperationEnvelope
### EC Cost Modules
5. Create `apps/ec-service/src/cost/cost-registry.ts` — in-memory registry manager with atomic file I/O, JSON fetch integration, anomaly detection
6. Create `apps/ec-service/src/cost/budget-enforcer.ts` — tiered budget check, reservation system, max_tokens calculation
7. Create `apps/ec-service/src/cost/cost-tracker.ts` — processes reverse telemetry usage data, writes tracking events, maintains in-memory running totals
8. Create `apps/ec-service/src/cost/cost-notifications.ts` — builds change notification from research run results, pushes to Q
### DOC10 Integration
9. Add `checkBudget()` call to `decision-broker.ts` — inject max_output_tokens into handoff payload when warranted
10. Add `BUDGET_EXCEEDED` to DOC8 friction exclusion in event intake
11. Add cost telemetry events to telemetry event registry with phase tags
### DOC11 Integration
12. Amend DOC11 to include `usage: { prompt_tokens, completion_tokens, total_tokens }` on all completion reverse telemetry events
13. Amend DOC11 to honor `max_output_tokens` from GatewayHandoffPayload as provider API `max_tokens` parameter
### Research Agent
14. Create CostResearchAgent with SOUL.md (research instructions, anomaly handling) and `pricing_sources.md` (model list, JSON registry URLs, provider pricing page URLs)
15. Wire OpenClaw cron job (`cost-research-nightly`) to launch the agent with its workspace context
16. Wire manual refresh button to same pipeline
### Q Proxy Routes and UI
17. Add proxy routes: `POST /api/orchestration/cost/override`, `POST /api/orchestration/cost/cap`, `POST /api/orchestration/cost/refresh`
18. Add read routes: `GET /api/orchestration/cost/registry`, `GET /api/orchestration/cost/totals`, `GET /api/orchestration/cost/events`, `GET /api/orchestration/cost/notifications`
19. Add global header cost display
20. Add Settings → Model Costs page
21. Add Settings → Budget Caps page
22. Add budget exceeded modal
23. Add change notification card
24. Add CostResearchAgent configuration to agent config UI
### Ledger
25. Update DOC10 Integration Ledger R7 → R8 with DOC13 obligations section
---
## 10) Acceptance Tests
### AT-1: Registry Populated and Current
- After nightly cron run, `cost_registry.json` contains entries for all models in `pricing_sources.md`
- Each entry has `last_updated_at` within 24 hours
- JSON-fetched entries have `source: "json_registry"`
- Agent-researched entries have `source: "agent_research"` with non-empty `research_notes`
### AT-2: Cost Tracking on Every Operation
- After any Gateway-dispatched operation completes, `cost_tracking_events.jsonl` contains an event with `cost_type: "actual"`, correct `model_id`, and non-zero `input_tokens` / `output_tokens`
- Running totals reflect the new actual cost
### AT-3: Tiered Enforcement — Normal
- With budget utilization < 80%, operations dispatch without max_tokens injection or user interaction
- Q header shows green spend indicator
### AT-4: Tiered Enforcement — Watchful
- With budget utilization 80–95%, operations dispatch with max_tokens injection
- Q header turns amber
- `cost.budget.warning` telemetry emitted
### AT-5: Tiered Enforcement — Guarded
- With budget utilization > 95%, expensive operations are blocked with `BUDGET_EXCEEDED`
- Budget modal renders in Q with correct spend/cap/estimate values
- "Override once" dispatches the operation with `budget_override: true`
- "Increase budget" updates cap and allows dispatch
- "Switch to cheaper model" re-dispatches with cheaper model
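AT-3 through AT-5 imply a simple classification from utilization to tier. AT-4 also expects a warning, and Phase 1 specifies that `cost.budget.warning` fires once per crossing. This sketch uses those thresholds as defaults, since Settings → Budget Caps makes them configurable. Treating exactly 95% as watchful is an assumption, because the tests leave that boundary open. `classifyTier` and `WarningLatch` are hypothetical names.

```typescript
type Tier = "normal" | "watchful" | "guarded";

// Defaults from AT-3..AT-5: < 80% normal, 80–95% watchful, > 95% guarded.
function classifyTier(spentUsd: number, capUsd: number, warnAt = 0.8, guardAbove = 0.95): Tier {
  const utilization = capUsd > 0 ? spentUsd / capUsd : Infinity; // a zero cap is always guarded
  if (utilization < warnAt) return "normal";
  if (utilization <= guardAbove) return "watchful";
  return "guarded";
}

// Fires once when spend crosses into watchful-or-above; re-arms only after
// utilization falls back to normal (e.g. the cap is raised or the period resets).
class WarningLatch {
  private armed = true;

  observe(tier: Tier): boolean {
    if (tier === "normal") {
      this.armed = true;
      return false;
    }
    if (!this.armed) return false;
    this.armed = false;
    return true; // caller emits cost.budget.warning
  }
}
```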
### AT-6: Anomaly Detection
- If CostResearchAgent returns a price > 3× previous, the agent re-verifies from a second source
- If re-verification confirms, registry updates with confirmation note
- If re-verification contradicts, previous value is kept with anomaly note
- Change notification shows ⚠ indicator for anomalies
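The anomaly rule in AT-6 can be expressed as a pure function over the previous price, the proposed price, and the second-source result. This is a sketch under assumptions. It flags only upward jumps above 3×, as AT-6 describes. It skips models with no previous price. It treats a second source that is missing as "not confirmed". The 5% agreement tolerance, the `usd_per_mtok` fields, and all names are hypothetical.

```typescript
interface PriceUpdate {
  model_id: string;
  previous_usd_per_mtok: number;
  proposed_usd_per_mtok: number;
}

type AnomalyOutcome =
  | { action: "accept"; value: number; note?: string }
  | { action: "keep_previous"; value: number; note: string };

const ANOMALY_RATIO = 3; // from AT-6: a price more than 3× the previous one

function isAnomalous(u: PriceUpdate): boolean {
  // New models (no previous price) are not flagged in this sketch.
  return u.previous_usd_per_mtok > 0 && u.proposed_usd_per_mtok > ANOMALY_RATIO * u.previous_usd_per_mtok;
}

// secondSourceUsd: price found during re-verification, or undefined if none was found.
function resolveUpdate(u: PriceUpdate, secondSourceUsd?: number, tolerance = 0.05): AnomalyOutcome {
  if (!isAnomalous(u)) return { action: "accept", value: u.proposed_usd_per_mtok };
  const confirmed =
    secondSourceUsd !== undefined &&
    Math.abs(secondSourceUsd - u.proposed_usd_per_mtok) <= tolerance * u.proposed_usd_per_mtok;
  return confirmed
    ? { action: "accept", value: u.proposed_usd_per_mtok, note: "anomaly confirmed by second source" }
    : { action: "keep_previous", value: u.previous_usd_per_mtok, note: "anomaly not confirmed; previous value kept" };
}
```

Either anomalous outcome carries a `note`, which is what the change notification would surface next to the ⚠ indicator.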
### AT-7: Change Notification
- After every research run, Q displays a dismissable notification card with all changes
- Notification includes model name, old/new prices, change type, and any anomaly flags
### AT-8: DOC8 Exclusion
- `BUDGET_EXCEEDED` dispatch errors do not appear in DOC8 friction event stream
- No learning proposals or repair proposals are generated for budget failures
### AT-9: Manual Override
- User-entered pricing in `cost_overrides.json` takes precedence over all automatic sources
- CostResearchAgent skips manually overridden models
- Override values persist across research runs
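AT-9's precedence rule needs two small pieces. A lookup lets an override shadow any automatic entry, and a filter removes overridden models from the research work list, so a run can never overwrite them. This is a minimal sketch. The `"json_registry"` and `"agent_research"` source values come from AT-1. The `"manual_override"` value, the single blended `usd_per_mtok` price, and all names are hypothetical simplifications.

```typescript
// "json_registry" / "agent_research" follow AT-1; "manual_override" is assumed.
type PriceSource = "json_registry" | "agent_research" | "manual_override";

interface PricedModel {
  model_id: string;
  usd_per_mtok: number; // single blended price: a simplification for this sketch
  source: PriceSource;
}

// Hypothetical row from cost_overrides.json.
interface OverrideRow {
  model_id: string;
  usd_per_mtok: number;
  set_at: string; // ISO timestamp, usable for override age tracking
}

// A manual override always wins; otherwise fall back to the automatic registry entry.
function effectivePrice(
  modelId: string,
  registry: ReadonlyMap<string, PricedModel>,
  overrides: ReadonlyMap<string, OverrideRow>,
): PricedModel | undefined {
  const o = overrides.get(modelId);
  if (o) return { model_id: modelId, usd_per_mtok: o.usd_per_mtok, source: "manual_override" };
  return registry.get(modelId);
}

// The research run skips overridden models, so overrides persist across runs.
function modelsToResearch(allModels: readonly string[], overrides: ReadonlyMap<string, OverrideRow>): string[] {
  return allModels.filter((id) => !overrides.has(id));
}
```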
### AT-10: Room Cost Attribution
- In a DOC12 room with multiple participants, each participant's costs are tracked separately via `participant_id`
- Room total equals sum of participant costs
- Room-level budget cap applies to room total
---
*This specification gives you one unified, automatically updated, zero-latency-in-normal-operation, enforceable cost system that is set-and-forget after the single cron enablement step. The CostResearchAgent operates autonomously with anomaly bounds and change notifications as passive audit trails. All surfaces reuse the same schemas and events. Provider-side limits serve as the first-line backstop, with DOC13's three-layer enforcement as defense-in-depth.*