The Autonomous Layer (M4)

The opt-in background executor and the discipline that makes unattended agency acceptable: its design lineage, the five-phase walk, the single chokepoint, the brakes, the honest caveats, and what shipped.

R4 — Economic restraint

Definition. A per-session or per-execution cost cap that halts the run rather than overspend. An agent running in a loop can quietly burn a fortune in API credits; the brake checks projected cost against a remaining budget before every tool call and halts gracefully if a call would exceed the cap.

Current implementation: SHIPPED (M4) for the autonomous layer.

The autonomous layer ships a hard per-session and per-trigger max_cost_usd cap, checked before every tool call at the single guarded_tool_call chokepoint (api/app/autonomous/guard.py, the R4 stage of the R5 → R6 → R4 ordering). The pre-flight projects the call's cost (api/app/autonomous/cost.py::estimate_tool_cost), and if session.cost_total_usd + projected > session.max_cost_usd it raises CostCapReached, latches the session, and produces a receipt with terminal_reason=cost_cap_reached — a graceful halt, not an overrun. The per-trigger cap (a watch's or schedule's own max_cost_usd, inherited by the spawned session) landed in migration 0045.

The pre-M4 cost-tracking surfaces remain and feed the estimator:

Per-call cost tracking (M1). inference_routing_log table captures cost_estimate per provider call (PRD §5.5).
Per-purpose tagging (M2-E2). gateway/app/routing_log.py adds lq_ai_purpose so judge-call cost can be differentiated from chat-call cost.
Rolling-average cost estimator (M2-E2). api/app/citation/cost.py::estimate_judge_call_cost_usd queries per-model rolling averages from inference_routing_log with cold-start defaults; the autonomous cost wrapper reuses it for inference-bearing intents.
Per-message ensemble pre-flight budget (M2-D1). Pre-flight check in chats.py::_resolve_ensemble_config falls back from ensemble to single-judge Stage 3 when projected n_citations × n_judges × per-judge-cost exceeds the per-message cap.

Live acceptance evidence. The M4 fresh-install acceptance run produced a real halted session (4554cdd9) from a watch with max_cost_usd=$0.001: the analysis-phase run_skill call's R4 brake latched, the session halted, and its stored receipt carried terminal_reason=cost_cap_reached. R4 fired for real, end-to-end through the production-shape trigger → session → chokepoint chain.

Remaining gap. The non-autonomous Playbook executor still does not surface a per-execution cost cap to the operator at execution time; that retrofit is tracked by DE-292 (folds into the executor's pre-flight cost-check). The autonomous-layer cap (DE-293, R4 facet) is shipped.

Verification path.

rg -n "CostCapReached|max_cost_usd" api/app/autonomous/guard.py   # R4 stage at the chokepoint
less api/app/autonomous/cost.py                                   # Per-call pre-flight projection
less api/alembic/versions/0045_autonomous_per_trigger_max_cost.py # Per-trigger cap migration
cd api && pytest tests/autonomous/test_r4_per_trigger_cap.py      # asserts terminal_reason == cost_cap_reached
less api/app/citation/cost.py                                     # Rolling-average estimator (reused)