Document review & citations, end to end
How a contract becomes reviewed, cited, and trustworthy: the verification cascade, where citations are stored, the playbook and tabular surfaces that ride on it, and the honest edges where verification stops.
Cascade
Each citation the model emits (a "<quote>" (Source: [N]) pair)
becomes a CitationCandidate and runs through staged verification.
The first stage to verify wins; the persisted method names the stage.
A miss falls through to the next stage. A candidate that misses every
stage is not persisted; the absence of a row is the unverified signal
the M2-C2 UI consumes.
| Stage | Method (DB) | Description | Cost |
|---|---|---|---|
| 1 | exact_match | Byte-for-byte equality of source_text against documents.normalized_content[offset_start:offset_end]. | Free (pure Python). |
| 2 | tolerant_match | After normalizing both sides (whitespace, smart quotes, OCR confusions when was_ocrd=true), rapidfuzz.fuzz.ratio ≥ 95. | Free (pure Python). |
| 3 | paraphrase_judge | LLM judge call through the gateway. Returns yes / partial / no with high / medium / low confidence (mapped to 0.90 / 0.70 / 0.50). partial=true persists to flag "source partially supports the claim." | One judge call per citation. |
| 4 | ensemble_strict / ensemble_majority | The paraphrase judge runs in parallel across N models (configured in gateway.yaml). Replaces Stage 3 when activated. Aggregation rule decides whether disagreement misses or majority wins. | N judge calls per citation (pre-flight budget check enforces a per-message cap). |
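The two free stages can be illustrated with a minimal sketch. It is not the project's implementation: the CitationCandidate fields shown, the normalize helper, and the verify_cheap function are hypothetical names, OCR-confusion handling is omitted, and difflib from the standard library stands in for rapidfuzz.fuzz.ratio so the example runs without dependencies (the two scores are similar but not identical).

```python
import difflib
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitationCandidate:
    # Hypothetical shape: only the fields the table mentions.
    source_text: str
    offset_start: int
    offset_end: int


# Assumed normalization: smart quotes to ASCII, whitespace runs collapsed.
_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def normalize(text: str) -> str:
    return re.sub(r"\s+", " ", text.translate(_QUOTES)).strip()


def similarity(a: str, b: str) -> float:
    # Stand-in for rapidfuzz.fuzz.ratio, scaled to 0..100.
    return 100.0 * difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def verify_cheap(
    candidate: CitationCandidate,
    normalized_content: str,
    threshold: float = 95.0,
) -> Optional[str]:
    """Return the method name of the first free stage that verifies, else None."""
    span = normalized_content[candidate.offset_start:candidate.offset_end]
    # Stage 1: the quote must equal the stored span exactly.
    if candidate.source_text == span:
        return "exact_match"
    # Stage 2: normalize both sides, then require a similarity score >= threshold.
    if similarity(normalize(candidate.source_text), normalize(span)) >= threshold:
        return "tolerant_match"
    # A miss here falls through to the judge stages.
    return None
```

A None result is the cue to hand the candidate to Stage 3 or Stage 4; anything else is the value that would be persisted as the method.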
Stages 3 and 4 are mutually exclusive: when ensemble is activated, Stage 3 does not run as a pre-flight. The cascade goes 1 → 2 → 4 (per M2-D1 decision B; the single-judge stage would be redundant with N parallel judges already in flight).
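The routing between the judge stages can be sketched as follows. This is an illustration under stated assumptions, not the gateway's code: the Verified record, the judge_stage function, and the judge callable signature are hypothetical; "yes" and "partial" are both assumed to count as support; "strict" is read as requiring every judge to support, and "majority" as requiring more than half. The judges are called sequentially here, whereas the source describes them running in parallel, and the sketch leaves ensemble confidence unset because the source does not say how it is aggregated.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Confidence mapping taken from the stage table.
CONFIDENCE = {"high": 0.90, "medium": 0.70, "low": 0.50}
SUPPORTS = {"yes", "partial"}  # assumption: both count as support

Verdict = Tuple[str, str]  # (answer, confidence level), e.g. ("yes", "high")
Judge = Callable[[str, str], Verdict]  # hypothetical signature: (claim, source)


@dataclass
class Verified:
    method: str
    confidence: Optional[float] = None
    partial: bool = False


def judge_stage(
    claim: str,
    source: str,
    judges: Sequence[Judge],
    ensemble_rule: Optional[str] = None,
) -> Optional[Verified]:
    """Run Stage 3 or Stage 4 (never both) for a candidate that missed stages 1 and 2."""
    if ensemble_rule is None:
        # Stage 3: a single judge call.
        answer, level = judges[0](claim, source)
        if answer not in SUPPORTS:
            return None
        return Verified("paraphrase_judge", CONFIDENCE[level], answer == "partial")

    # Stage 4 replaces Stage 3: every configured judge is consulted.
    verdicts = [judge(claim, source) for judge in judges]
    supporting = sum(1 for answer, _ in verdicts if answer in SUPPORTS)
    if ensemble_rule == "strict":
        ok = supporting == len(verdicts)  # any disagreement is a miss
    elif ensemble_rule == "majority":
        ok = supporting * 2 > len(verdicts)  # more than half must support
    else:
        raise ValueError(f"unknown ensemble rule: {ensemble_rule}")
    return Verified(f"ensemble_{ensemble_rule}") if ok else None
```

With three judges returning yes, yes, and no, the strict rule yields a miss while the majority rule yields ensemble_majority, which is the difference the aggregation rule is meant to capture.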