Document review & citations, end to end

How a contract becomes reviewed, cited, and trustworthy: the verification cascade, where citations are stored, the playbook and tabular surfaces that ride on it, and the honest edges where verification stops.

The workflow

The executor (api/app/tabular/executor.py) is a three-node LangGraph graph run by the ARQ worker (api/app/workers/tabular_worker.py) on the shared playbook queue (arq:m3a6, per Phase C Decision C-3 — the same queue as the Easy Playbook wizard). The nodes live in api/app/tabular/nodes.py:

load_documents_node — resolves document_ids to Document rows in the operator's selection order (missing / soft-deleted documents are skipped silently; the row is preserved as audit but the result set is honest about which sources resolved). It joins the parent File so each row's display name is the operator-uploaded filename, not the document UUID.
extract_cells_node — for each (document, column) pair: runs a lexical full-text search (FTS) over that document's document_chunks using the column's query as keyword input (top-K = 4 chunks; falls back to the first N chunks when FTS yields nothing, so the LLM still sees document content), then dispatches one structured-output LLM call. The model returns {value, cited_chunk_indices, confidence, justification}; the node maps cited_chunk_indices → cited_chunk_ids against the retrieved chunks. Every cell is wrapped in its own try/except: any failure (no chunks retrieved, gateway error, malformed response, empty value) lands as confidence='failed' with a populated error (Decision C-10) rather than aborting the run. Dispatch is sequential in v0.3.0; per-cell parallelism is a follow-on if the 2,000-cell latency forces it.
aggregate_node — groups per-cell results by document into the results JSONB and flips status to completed (or failed if a prior node set state['error']). It writes cost_actual_usd as the sum of per-cell costs — which is currently always 0 (see Per-cell cost & tier).

Retrieval is lexical FTS, not vector search — the column query is treated as keyword input over the document's chunks. This is a deliberate v0.3.0 choice: it needs no embedding pre-pass and keeps each cell's context small. It also means a column query whose wording diverges from the contract's vocabulary may retrieve weaker chunks; the FTS-falls-back-to-first-N behavior keeps the cell from failing outright, but the operator should treat low-confidence cells accordingly.