Document review & citations, end to end
How a contract becomes reviewed, cited, and trustworthy: the verification cascade, where citations are stored, the playbook and tabular surfaces that ride on it, and the honest edges where verification stops.
message_citations
Per M2-A2 (migration 0025_create_message_citations.py). One row per
model-emitted citation, written by the chat-send path after the
assistant message is persisted and the Citation Engine has run its
verification cascade.
CREATE TABLE message_citations (
    id                      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    message_id              UUID NOT NULL REFERENCES messages(id) ON DELETE CASCADE,
    source_file_id          UUID NOT NULL REFERENCES files(id) ON DELETE CASCADE,
    source_offset_start     INTEGER NOT NULL,
    source_offset_end       INTEGER NOT NULL,
    source_page             INTEGER,
    source_text             TEXT NOT NULL,
    verified                BOOLEAN NOT NULL DEFAULT FALSE,
    verification_method     TEXT,  -- enum below
    verification_confidence NUMERIC(3,2),
    created_at              TIMESTAMPTZ NOT NULL DEFAULT now(),
    CONSTRAINT chk_message_citations_offset_start_nonneg
        CHECK (source_offset_start >= 0),
    CONSTRAINT chk_message_citations_offset_end_gt_start
        CHECK (source_offset_end > source_offset_start),
    CONSTRAINT chk_message_citations_method_values
        CHECK (
            verification_method IS NULL
            OR verification_method IN (
                'exact_match', 'tolerant_match', 'llm_judge', 'ensemble', 'failed'
            )
        ),
    CONSTRAINT chk_message_citations_confidence_range
        CHECK (
            verification_confidence IS NULL
            OR (verification_confidence >= 0 AND verification_confidence <= 1)
        ),
    CONSTRAINT chk_message_citations_verified_has_method
        CHECK ((verified = false) OR (verification_method IS NOT NULL))
);
CREATE INDEX idx_message_citations_message ON message_citations(message_id);
CREATE INDEX idx_message_citations_file ON message_citations(source_file_id);
The verification_method enum records which stage produced the
verdict. Every stage writes into the same row shape, so neither the
persistence layer nor the UI needs to switch on stage:
| Value | Stage | Confidence | Lands in |
|---|---|---|---|
| 'exact_match' | Stage 1: byte-for-byte against documents.normalized_content[start:end] | always 1.0 | M2-A2 (here) |
| 'tolerant_match' | Stage 2: whitespace + OCR-artefact + smart-quote normalization | similarity-based | M2-B1 |
| 'llm_judge' | Stage 3: LLM paraphrase judge | judge-reported | M2-C1 |
| 'ensemble' | Stage 4: multi-model agreement for high-stakes ops | quorum-derived | M2-D1 |
| 'failed' | Every stage rejected; rendered as unverified | NULL | M2-C2 wiring |
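A minimal Python sketch of how a cascade could emit one verdict shape regardless of stage. It is illustrative only: `Verdict`, `run_cascade`, the `0.95` threshold, and the use of `difflib.SequenceMatcher` as the Stage 2 similarity measure are assumptions, not the Citation Engine's actual code, and OCR-artefact handling is left out.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Verdict:
    """Hypothetical in-memory mirror of the three verification columns."""
    verified: bool
    method: Optional[str]        # one of the verification_method values
    confidence: Optional[float]  # None maps to NULL, as for 'failed'


# A stage returns a confidence in [0, 1] when it accepts, or None when it rejects.
Stage = Callable[[str, str], Optional[float]]

_SMART_QUOTES = str.maketrans(
    {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
)


def exact_match(quote: str, source_span: str) -> Optional[float]:
    # Stage 1: strict equality, so the confidence is always 1.0.
    return 1.0 if quote == source_span else None


def _normalize(text: str) -> str:
    # Straighten smart quotes, then collapse runs of whitespace.
    return re.sub(r"\s+", " ", text.translate(_SMART_QUOTES)).strip()


def tolerant_match(quote: str, source_span: str,
                   threshold: float = 0.95) -> Optional[float]:
    # Stage 2 (assumed shape): similarity of the normalized strings.
    ratio = SequenceMatcher(None, _normalize(quote), _normalize(source_span)).ratio()
    return round(ratio, 2) if ratio >= threshold else None


def run_cascade(quote: str, source_span: str,
                stages: List[Tuple[str, Stage]]) -> Verdict:
    # The first stage that accepts decides the verdict; later stages are skipped.
    for method, stage in stages:
        confidence = stage(quote, source_span)
        if confidence is not None:
            return Verdict(True, method, confidence)
    return Verdict(False, "failed", None)
```

Because every stage reduces to the same `(verified, method, confidence)` triple, a caller that persists or renders the verdict does not need to know which stage ran.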
The verified=true ⇒ verification_method IS NOT NULL CHECK constraint
prevents a row from claiming verification without naming which stage
passed.
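The table's CHECK constraints can be mirrored in application code to catch a bad row before the insert is attempted. The sketch below is an assumption about how such a pre-check might look; `violated_constraints` is a hypothetical helper, and only the constraint names and allowed values come from the schema.

```python
from typing import List, Optional

ALLOWED_METHODS = {"exact_match", "tolerant_match", "llm_judge", "ensemble", "failed"}


def violated_constraints(offset_start: int, offset_end: int, verified: bool,
                         method: Optional[str],
                         confidence: Optional[float]) -> List[str]:
    """Return the names of the CHECK constraints this row would violate."""
    violations = []
    if offset_start < 0:
        violations.append("chk_message_citations_offset_start_nonneg")
    if offset_end <= offset_start:
        violations.append("chk_message_citations_offset_end_gt_start")
    if method is not None and method not in ALLOWED_METHODS:
        violations.append("chk_message_citations_method_values")
    if confidence is not None and not (0 <= confidence <= 1):
        violations.append("chk_message_citations_confidence_range")
    # verified = true requires a non-NULL method; the reverse is not required.
    if verified and method is None:
        violations.append("chk_message_citations_verified_has_method")
    return violations
```

As written, the last CHECK only requires a non-NULL method, so a row with verified = true and method 'failed' would still pass it. The sketch mirrors the schema rather than tightening it.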
M2-A2 ships Stage 1 only: extraction (app.citation.extraction) finds
"..." (Source: [N]) pairs in the assistant response, locates the
quote inside the cited retrieved chunk's content, and derives byte-
precise document offsets. The verifier (app.citation.verification)
confirms normalized_content[start:end] == source_text byte-for-byte.
Candidates that fail Stage 1 are dropped (not persisted) until later
stages ship; the M2-C2 UI work decides what to render for "model
emitted but we couldn't verify."
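The Stage 1 flow described above can be sketched roughly as follows. This is not the code in app.citation.extraction or app.citation.verification: the `Chunk` and `Candidate` types, the `doc_offset` field, the regular expression, and the function names are all hypothetical. Offsets here are Python string indices; whether the real offsets count bytes or characters is not stated in the source.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Chunk:
    content: str     # retrieved chunk text
    doc_offset: int  # assumed: where the chunk starts inside normalized_content


@dataclass(frozen=True)
class Candidate:
    source_text: str
    offset_start: int
    offset_end: int


# Matches "quoted text" (Source: [N]) pairs; straight double quotes only.
CITATION_RE = re.compile(r'"([^"]+)"\s*\(Source:\s*\[(\d+)\]\)')


def extract_candidates(response: str, chunks: Dict[int, Chunk]) -> List[Candidate]:
    """Find quote/source pairs and derive document-level offsets for each."""
    candidates = []
    for match in CITATION_RE.finditer(response):
        quote, ref = match.group(1), int(match.group(2))
        chunk = chunks.get(ref)
        if chunk is None:
            continue  # cites a chunk that was never retrieved
        pos = chunk.content.find(quote)
        if pos == -1:
            continue  # quote is not present verbatim in the cited chunk
        start = chunk.doc_offset + pos
        candidates.append(Candidate(quote, start, start + len(quote)))
    return candidates


def verify_exact(candidate: Candidate, normalized_content: str) -> bool:
    """Stage 1 check: the document slice must equal the quoted text exactly."""
    span = normalized_content[candidate.offset_start:candidate.offset_end]
    return span == candidate.source_text
```

In this sketch a quote that cannot be located in its cited chunk never becomes a candidate, and a candidate that fails `verify_exact` would simply not be persisted, matching the drop-on-failure behaviour described for M2-A2.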