Test_Set_Card_V2_Red_Team_Prompt.md
Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/Test-set Card V2 Red Team Responses/Test_Set_Card_V2_Red_Team_Prompt.md
# Red-Team Review Request -- DOC23 Addenda B Test-Set Adjudication Card V2 **Intended reviewers:** ChatGPT 5 Pro, Gemini 3.1, Codex. Run each in its own session. Be direct, no hedging; excellence and completeness over brevity. You are an expert systems architect and red-teamer reviewing an ADJUDICATION CARD for ELNOR, a local-first AI orchestration platform (legal + general knowledge work). This card (V2) adjudicates a prior multi-model red-team of the DOC23 Addenda B "Test prompt" harvest -- deciding what is net-new vs. already covered and what features to build -- and consolidates the accepted items into six named constructs (P-1..P-6) plus per-item dispositions. ## PRIMARY DIRECTIVE -- return paste-ready spec content This card is intentionally compact: in many places it names a construct, disposition, or obligation without the full schema. **Your most valuable output is the concrete, paste-ready spec material needed to implement each adopted item** -- TypeScript interfaces/enums, field definitions, lint names, fixture definitions, and algorithm pseudocode. For every package (P-1..P-6) and every adopted AB-T item: supply the schema/contract you would actually add, the lints that enforce it, and at least one executable fixture; where the card already gives a schema, complete/tighten it (missing fields, enums, edge cases). Do NOT stop at "this is underspecified" -- write the spec text, and name the operative spec + section each addition lands in. ## Where the documents are (repo `wbrody/Elnor-Specs`, branch `main`) PRIMARY (review this): - `Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/DOC23_ADDENDA_B_TEST_SET_ADJUDICATION_CARD_V2.md` Prior red-team of the card (context): - `Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/DOC23 Test Set Adj Card Reviews .md` Operative specs the card adjudicates / where new schema would land: - `Current Specs/DOC23/DOC23 Addenda B/DOC23_ADDB_OUTCOME_EVALUATOR_REVISOR_V3_3_1.md` - `Current Specs/DOC23/DOC23 Addenda B/DOC23_EVALUATION_COMMON_CONTRACTS_V1_1_1.md` - `Current Specs/DOC23/DOC23 Addenda B/DOC23_ADDB_SUBSYS_SOURCE_WORKSPACE_V1_0_1.md` - `Current Specs/DOC23/DOC23 Addenda B/DOC23_ADDB_SUBSYS_TASK_FORUM_RUN_BOARD_V1_0_1.md` - `Current Specs/DOC23/DOC23 Addenda B/DOC23_ADDB_SUBSYS_FEEDBACK_DELIVERY_V1_0_1.md` - `Current Specs/DOC23/DOC23 Addenda B/DOC23_ADDENDA_B_CORE_R0_7_1.md` (esp. Sec. 13A for AB-T-HYDR; Sec. 8A TKP readiness + stale_pack_behavior; Sec. 11.17 write-failure taxonomy) - Owner specs referenced by OP-A rows, under `Current Specs/`: DOC24 (context packet; the Sec. 13A consumer; Sec. 10.3A/27/30/38), DOC73 (library/corpus promotion), DOC20 (read-model surfaces). If your tool cannot access the repo, the operator will attach these. ## What the card decided (scrutinize here) - **Six unifying constructs (Sec. 2):** P-1 `CleanVerdictEligibility` (verdict-honesty object + precedence rule; unifies AB-T-01/02/03/05 + staleness), P-2 `ContextBoundaryRef` (matter-dominant isolation; unifies 09/10/11), P-3 `RecoveryPolicyRegistry` (13/14), P-4 gate-signal integrity, P-5 Evidentiary Quarantine -> DOC73, P-6 EnforcementBadge -> DOC20. - **Per-item dispositions (Sec. 3):** AB-T-03 raised to CRITICAL; eight DISCUSS->ADOPT promotions (07,10,11,13,14,15,16,17); AB-T-09 scoped-adopt (admission policy, NOT an orchestrator -- Sec. 2.2-safe); AB-T-12 lightweight; AB-T-08 stays DISCUSS (hard gate declined); re-scopes on 06/14/15/17. - **Interaction bugs (Sec. 4):** eight defects that surface only when the adopted items run together. - **Gemini systemic bugs (Sec. 5):** BUG-01..04 with triage (BUG-03 routed to an existing OP-A; BUG-04 carries a security caveat). - **AB-T-HYDR (Sec. 6):** task-path injection precedence; carries an OPEN placement question. - **Resolved divergences:** D1 (AB-T-02 field-not-state, over Gemini's discrete `satisfied_downgraded` state); AB-T-08 hard gate declined. ## What I want 1. **Paste-ready spec content for the barebones places (priority -- see Primary Directive).** Per P-1..P-6 and per adopted AB-T item: full schema/contract + lints + a fixture, with the target operative spec/section. 2. **Correctness of each adjudication.** Is each disposition right? Does each construct actually solve the finding? Scrutinize the divergences: is D1 (field-not-state) correct, or is Gemini's discrete state genuinely better for DAG routing? Is declining the AB-T-08 hard gate right? 3. **Completeness / missing wiring / phantom features / missing contracts.** Anything referenced-but-undefined, wired-but-not-connected, or asserted-without-a-contract -- in the card, or that the card's adopted items would create downstream. 4. **The six packages specifically.** Are the consolidations sound -- do AB-T-01/02/03/05 truly belong under one P-1 object; is the P-1 precedence rule correct and total (no unhandled combination); does `ContextBoundaryRef` correctly subordinate `context_class_key` to matter? Provide the complete schemas. 5. **Interaction bugs (Sec. 4) + Gemini bugs (Sec. 5).** Are the fixes correct and complete? Any interaction bug missed? For BUG-04 (syntactic-taint), is the P-4 provenance caveat sufficient to keep the "mechanical tool" path from becoming an injection-laundering vector? 6. **AB-T-HYDR placement.** Where should it live -- a DOC24 OP-A (extending `OBL-DOC24-CTXPKT-01`), a Core Sec. 13A.x subsection, or split? Is the proposed instruction-precedence order right? Provide the schema. 7. **OP-A obligations.** For the five OP-A candidate rows (Sec. 9), propose the obligation contract/schema each owner doc would implement. NOTE: DOC24 and the memory stack are flattening-scoped -- do NOT propose editing them directly; propose the OP-A obligation content (the schema/behavior the owner must add) instead. 8. **New/better ideas (BETTER_IDEA).** Anything that materially improves the design. ## How to report - Type every finding: BUG / GAP / SUGGESTION / CONFIRMED / BETTER_IDEA / ARCHITECT_STOP. - Cite the card section/item id, and the operative-spec section (+ line) where relevant, for every claim. - Prefer paste-ready spec content (TypeScript, lint names, fixtures) over prose -- this is the explicit priority for this review -- and name the target operative spec + section for each addition. - Close with a value-tiered summary: Critical / Substantive / Minor / Considered-and-declined. ## Where to save your output If your tool can write to the repo, save as: `Active Working and Red Team/DOC23 Working/DOC23 Red Teaming/Test-set Card V2 Red Team Responses/<YourModelName>_review.md` Otherwise return it in chat and the operator will save it there.