The Inference Gateway: the security boundary
The component that holds the keys and guards the egress: its pipeline, the threat model around it, what anonymization does to outbound prompts, how model routing stays inspectable, and where the inference-tier boundary sits.
Orthogonal boundary — the Inference Choice Spectrum
The Inference Choice Spectrum (PRD §1.5.2) is a seventh boundary that runs along a different axis from R1–R6. R1–R6 restrain what the model may decide, spend, run, or touch. The Inference Choice Spectrum restrains where the data goes during inference.
- The five tiers (PRD §1.5.2): local-only (Tier 1), customer-hosted cloud inference (Tier 2), enterprise managed inference with ZDR / no-training commitments (Tier 3), standard cloud API (Tier 4), consumer or free tier (Tier 5).
- Skills, Projects, and requests can require a minimum tier (R2-adapted, above); the gateway refuses routing decisions that violate the floor (`tier_below_minimum`).
- The audit log records every routing decision (`inference_routing_log` per PRD §5.5).
- Tier 3 is recommended for most pragmatic enterprise deployments; Tier 1 is recommended for the most sensitive privileged work.
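The floor check and the routing log described above can be sketched as follows. This is a minimal illustration, not the gateway's actual implementation: the `Tier`, `RoutingDecision`, and `Gateway` names, the `effective_floor` helper, and the in-memory list standing in for `inference_routing_log` are all hypothetical. It also assumes that "below the minimum" means a higher tier number (a weaker data-handling posture), and that when several floors apply the strictest one wins.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    """The five tiers of PRD §1.5.2. Lower number = data stays closer to home."""
    LOCAL_ONLY = 1
    CUSTOMER_HOSTED_CLOUD = 2
    ENTERPRISE_MANAGED_ZDR = 3
    STANDARD_CLOUD_API = 4
    CONSUMER_OR_FREE = 5


def effective_floor(*floors: Tier) -> Tier:
    # Assumption: when a Skill, a Project, and a request each set a floor,
    # the strictest (lowest-numbered) one applies.
    return min(floors)


@dataclass
class RoutingDecision:
    request_id: str
    target_tier: Tier
    minimum_tier: Tier
    allowed: bool
    reason: str


@dataclass
class Gateway:
    # Hypothetical in-memory stand-in for the audit log's inference_routing_log.
    inference_routing_log: list = field(default_factory=list)

    def route(self, request_id: str, target_tier: Tier, minimum_tier: Tier) -> RoutingDecision:
        # Assumption: a higher tier number than the floor counts as "below minimum".
        allowed = target_tier <= minimum_tier
        decision = RoutingDecision(
            request_id=request_id,
            target_tier=target_tier,
            minimum_tier=minimum_tier,
            allowed=allowed,
            reason="ok" if allowed else "tier_below_minimum",
        )
        # Every decision is recorded, whether it was allowed or refused.
        self.inference_routing_log.append(decision)
        return decision


gateway = Gateway()
floor = effective_floor(Tier.STANDARD_CLOUD_API, Tier.ENTERPRISE_MANAGED_ZDR)
accepted = gateway.route("req-1", Tier.CUSTOMER_HOSTED_CLOUD, floor)
refused = gateway.route("req-2", Tier.STANDARD_CLOUD_API, floor)
```

In this sketch the refused request still produces a log entry, which matches the statement that the audit log records every routing decision rather than only the successful ones.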
This boundary is documented separately in:
- PRD §1.5.2 (the spectrum's five-tier definition)
- PRD §3.13 (the Inference Tier badge in the UI)
- PRD §4.4 (gateway configuration of tier mapping)
- PRD §1.8 (security posture; calls the spectrum "the central security trade-off")
It is named here so that a reader doesn't conflate the two kinds of boundary: a deployment can ship full R1 + R2 + R3 + R4 + R5 + R6 and still expose customer data to a weaker tier through configuration choices, or vice versa.