Document review & citations, end to end

How a contract becomes reviewed, cited, and trustworthy: the verification cascade, where citations are stored, the playbook and tabular surfaces that ride on it, and the honest edges where verification stops.

Cascade

Each citation the model emits, a "<quote>" (Source: [N]) pair, becomes a CitationCandidate and runs through staged verification. The first stage that verifies it wins, and the persisted method records which stage that was. A miss falls through to the next stage. A candidate that misses every stage is not persisted; the missing row is itself the unverified signal the M2-C2 UI consumes.
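
The first-stage-wins control flow can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the fields on `CitationCandidate`, the `Stage` type, and `run_cascade` are hypothetical names chosen for the example, and each stage is reduced to a callable that returns `True` on a verify and `False` on a miss.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class CitationCandidate:
    # Hypothetical fields: the quoted text and the N in (Source: [N]).
    quote: str
    source_index: int


# A stage pairs the method name that would be persisted with a verifier.
Stage = Tuple[str, Callable[[CitationCandidate], bool]]


def run_cascade(candidate: CitationCandidate,
                stages: Sequence[Stage]) -> Optional[str]:
    """Return the method name of the first stage that verifies the candidate.

    Returns None when every stage misses; the caller then persists nothing,
    so the absence of a row is the unverified signal.
    """
    for method, verify in stages:
        if verify(candidate):
            return method  # first stage to verify wins; later stages never run
    return None
```

A caller would persist a row only when the return value is not `None`, storing the returned name as the method.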

| Stage | Method (DB) | Description | Cost |
| --- | --- | --- | --- |
| 1 | `exact_match` | Byte-for-byte equality of `source_text` against `documents.normalized_content[offset_start:offset_end]`. | Free (pure Python). |
| 2 | `tolerant_match` | Both sides are normalized (whitespace, smart quotes, and OCR confusions when `was_ocrd=true`), then must satisfy `rapidfuzz.fuzz.ratio ≥ 95`. | Free (pure Python). |
| 3 | `paraphrase_judge` | LLM judge call through the gateway. Returns `yes` / `partial` / `no` with `high` / `medium` / `low` confidence (mapped to 0.90 / 0.70 / 0.50 respectively). `partial=true` is persisted to flag that the source only partially supports the claim. | One judge call per citation. |
| 4 | `ensemble_strict` / `ensemble_majority` | The paraphrase judge runs in parallel across N models (configured in `gateway.yaml`). Replaces Stage 3 when activated. The aggregation rule decides whether disagreement counts as a miss or the majority wins. | N judge calls per citation (a pre-flight budget check enforces a per-message cap). |
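
Stages 1 and 2 are cheap enough to sketch directly. The snippet below is an illustration under stated assumptions: `normalize`, `exact_match`, `tolerant_match`, and `similarity` are hypothetical helper names, the normalization covers only whitespace and smart quotes (the OCR confusion rules are not listed in this document, so they are left out), and when `rapidfuzz` is not installed it falls back to `difflib.SequenceMatcher` from the standard library, which is a stand-in and not numerically identical to `rapidfuzz.fuzz.ratio`.

```python
import re
from difflib import SequenceMatcher

try:  # rapidfuzz is the scorer the table names; use it when installed
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        return fuzz.ratio(a, b)  # similarity score in the range 0..100
except ImportError:  # stdlib stand-in, scores can differ slightly
    def similarity(a: str, b: str) -> float:
        return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()

# Curly quotes mapped to their straight equivalents.
_SMART_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def normalize(text: str) -> str:
    """Straighten smart quotes and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.translate(_SMART_QUOTES)).strip()


def exact_match(source_text: str, normalized_content: str,
                offset_start: int, offset_end: int) -> bool:
    """Stage 1: the stored span must equal the quote exactly.

    Python compares str values code point by code point; encode both
    sides first if the check must literally be byte-level.
    """
    return normalized_content[offset_start:offset_end] == source_text


def tolerant_match(source_text: str, normalized_content: str,
                   offset_start: int, offset_end: int,
                   threshold: float = 95.0) -> bool:
    """Stage 2: normalize both sides, then require a ratio >= threshold."""
    span = normalized_content[offset_start:offset_end]
    return similarity(normalize(source_text), normalize(span)) >= threshold
```

A quote that differs from the stored span only in quote style or spacing misses `exact_match` but passes `tolerant_match`, because both sides normalize to the same string.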

Stages 3 and 4 are mutually exclusive: when the ensemble is activated, Stage 3 does not run first as a pre-flight check. The cascade goes 1 → 2 → 4 (per M2-D1 decision B: a single-judge stage would be redundant with N parallel judges already in flight).
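
The stage selection and the two aggregation rules can be sketched as follows. The function names `stage_plan` and `aggregate` are hypothetical, and the semantics are assumptions read off the method names: `ensemble_strict` is taken to require every judge to verify (any disagreement is a miss), while `ensemble_majority` is taken to require more than half, with a tie counting as a miss. Each judge's verdict is reduced to a boolean for simplicity; how `partial` verdicts are weighed is not covered here.

```python
from typing import List, Sequence

ENSEMBLE_RULES = ("ensemble_strict", "ensemble_majority")


def stage_plan(ensemble_enabled: bool,
               rule: str = "ensemble_majority") -> List[str]:
    """Return the ordered method names the cascade would try.

    With the ensemble on, the single judge is skipped: 1 -> 2 -> 4.
    """
    if ensemble_enabled and rule not in ENSEMBLE_RULES:
        raise ValueError(f"unknown aggregation rule: {rule!r}")
    plan = ["exact_match", "tolerant_match"]
    plan.append(rule if ensemble_enabled else "paraphrase_judge")
    return plan


def aggregate(votes: Sequence[bool], rule: str) -> bool:
    """Combine per-judge verify/miss votes under the given rule (assumed semantics)."""
    if rule == "ensemble_strict":
        # Any dissenting judge, or an empty panel, is a miss.
        return bool(votes) and all(votes)
    if rule == "ensemble_majority":
        # Strictly more than half must verify; a tie is a miss.
        return sum(votes) * 2 > len(votes)
    raise ValueError(f"unknown aggregation rule: {rule!r}")
```

For example, votes of `[True, True, False]` would verify under `ensemble_majority` and miss under `ensemble_strict`.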