docs/SESSION-HANDOFF-2026-05-17-m2-shipped-m3-kickoff.md

Session Handoff — 2026-05-17 — M2 shipped (v0.2.0) → M3 kickoff next

Purpose: Context transfer for the M3-kickoff session. M2 is complete and tagged v0.2.0 on main (squash-merged from m2-development via PR #42). The next session opens M3 — Playbooks, Word Add-In, Tabular Review, Slack/Teams bridge per PRD §8 M3. M3 is multi-track and benefits from upfront scope decisions before any code lands.

Read time: ~8 minutes. Decisions to surface to Kevin before any code: §4.

1. State at handoff

Repo state

| Branch / Tag | SHA | Meaning |
| --- | --- | --- |
| `main` | `8a1b3fc` | M2 squash-merge commit; v0.2.0 release point |
| `v0.2.0` (tag) | `8a1b3fc` | M2 release; GitHub Release published |
| `v0.1.0` (tag) | `3cb2b17` | M1 release; retro-tagged at M2 close |
| `m2-development` | `05b7da4` | Archive branch; kept per Kevin's call for git-blame archaeology |

Mirrors: origin (LegalQuants) and tucuxi (Tucuxi-Inc) both synced. Tags pushed to both. Confirm with git fetch --all && git log --oneline main origin/main tucuxi/main before branching for M3.
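The sync check above can also be scripted so that drift between the mirrors is flagged rather than eyeballed. This is a minimal sketch, assuming the remotes are named `origin` and `tucuxi` as stated above and that it runs from the repo root; the helper names (`resolve`, `out_of_sync`, `main`) are illustrative and not part of the repo.

```python
import subprocess

# Refs named in the handoff; remote names are taken from the text above.
REFS = ["main", "origin/main", "tucuxi/main"]


def resolve(ref: str) -> str:
    """Return the full commit SHA a ref points at, via `git rev-parse`."""
    out = subprocess.run(
        ["git", "rev-parse", "--verify", f"{ref}^{{commit}}"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def out_of_sync(shas: dict[str, str]) -> dict[str, str]:
    """Return the refs whose SHA differs from the first ref's SHA."""
    if not shas:
        return {}
    expected = next(iter(shas.values()))
    return {ref: sha for ref, sha in shas.items() if sha != expected}


def main() -> int:
    """Fetch all remotes, print each ref's short SHA, and return 1 on drift."""
    subprocess.run(["git", "fetch", "--all", "--tags"], check=True)
    shas = {ref: resolve(ref) for ref in REFS}
    for ref, sha in shas.items():
        print(f"{sha[:7]}  {ref}")
    drift = out_of_sync(shas)
    if drift:
        print("Out of sync:", ", ".join(drift))
        return 1
    return 0
```

Calling `main()` before branching for M3 should print the same short SHA for all three refs if the mirrors are in sync.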

GitHub Release: https://github.com/LegalQuants/lq-ai/releases/tag/v0.2.0 — contains the canonical M2 changelog including all scope-reframe reasoning.

What landed in M2 (v0.2.0)

Citation Engine — 4-stage cascade (exact match / tolerant match / paraphrase judge / ensemble). M2-A2, M2-B1, M2-C1, M2-D1. 4-state UI (M2-C2). Cost calibration from routing log (M2-E2).
Anonymization Layer — pre/post middleware in the Inference Gateway. Custom legal recognizers (M2-B2). Privileged-project handling (M2-D3). Retrieval-context skip (M2-D2). Streaming-aware rehydrator.
Azure OpenAI provider adapter — M2-E1 / DE-267. API-key auth this release; AD path at DE-278.
Documentation finalization (M2-F3) — two new Learn-tab playgrounds (Citation Engine cascade, Anonymization Layer); PRD §3.3 / §3.8 / §4.7 flipped from "deferred" to "shipped"; honest validation posture surfaced in docs/security/anonymization.md §"What's validated vs unvalidated" + README.
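For readers new to the Citation Engine, the shape of a four-stage cascade can be sketched as below. This is an illustration only, not the shipped implementation: the stage semantics, the 0.9 similarity threshold, the majority-vote ensemble, and every function name here are assumptions. The paraphrase judge and ensemble members are passed in as plain callables so the sketch stays self-contained.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence

# Hypothetical judge signature: (quote, source) -> is the quote supported?
Judge = Callable[[str, str], bool]


def _normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def exact_match(quote: str, source: str) -> bool:
    return quote in source


def tolerant_match(quote: str, source: str, threshold: float = 0.9) -> bool:
    """Accept a normalized substring hit, or a same-length word window that is close enough."""
    q, s = _normalize(quote), _normalize(source)
    if not q:
        return False
    if q in s:
        return True
    q_words, s_words = q.split(), s.split()
    n = len(q_words)
    for i in range(0, max(1, len(s_words) - n + 1)):
        window = " ".join(s_words[i : i + n])
        if SequenceMatcher(None, q, window).ratio() >= threshold:
            return True
    return False


def verify_citation(
    quote: str,
    source: str,
    judge: Optional[Judge] = None,
    ensemble: Sequence[Judge] = (),
) -> str:
    """Return the name of the first stage that accepts the quote, else 'unverified'."""
    if exact_match(quote, source):
        return "exact"
    if tolerant_match(quote, source):
        return "tolerant"
    if judge is not None and judge(quote, source):
        return "paraphrase"
    if ensemble:
        votes = sum(1 for member in ensemble if member(quote, source))
        if votes * 2 > len(ensemble):  # strict majority (assumed rule)
            return "ensemble"
    return "unverified"
```

The cheap stages run first, so the judge callables (presumably model calls in practice) are only reached when string matching fails.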

What was deferred via principled scope reframes (NOT busywork — read the reasoning if unsure why)

M2-F1 Citation Engine acceptance corpus: closed because existing unit + integration + Cypress + browser + round-trip + edge-case tests already pin the load-bearing behavior. Citation type 2 (case-citation validation, DE-279) and type 3 (case-content accuracy, DE-280) are architecturally distinct surfaces and are tracked separately.
M2-F2 Anonymization acceptance corpus: closed via transparency-first deferral. Recognizer recall/precision on a legal corpus is empirically unmeasured, but a maintainer-built partial corpus would understate the scope of the gap. The principled response: document the gap (docs/security/anonymization.md §"What's validated vs unvalidated"), invite community contribution via DE-282, and give operators actionable guidance (route to Tier 1 / disable anonymization / pre-redact / per-message review).
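To make the unmeasured quantity concrete, here is a minimal sketch of how recognizer precision and recall could be computed once a labeled corpus exists. It assumes exact-span matching (start, end, and label must all agree); the `Span` type and the entity labels are hypothetical and not taken from the codebase.

```python
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str  # e.g. "PERSON"; label names here are illustrative


def precision_recall(gold: Iterable[Span], predicted: Iterable[Span]) -> tuple[float, float]:
    """Exact-span precision and recall over one document's annotations."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    return precision, recall


gold = [Span(0, 8, "PERSON"), Span(20, 30, "CASE_NUMBER"), Span(40, 52, "ORG")]
predicted = [Span(0, 8, "PERSON"), Span(40, 52, "ORG"), Span(60, 65, "PERSON"), Span(70, 75, "ORG")]
precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Recall is measured only against entities that appear in the gold annotations, so a corpus that omits whole entity types reports nothing about them. That is one reason a partial corpus can look better than reality.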

Test deltas (M1 → v0.2.0)

| Suite | M1 baseline | v0.2.0 | Delta |
| --- | --- | --- | --- |
| `api/` | ~700 | 1013 | +313 |
| `gateway/` | ~300 | 515 | +215 |
| `web/` vitest | ~400 | 456 | +56 |
| Cypress E2E suites | 6 | 7 | +1 (`m2-c2-citation-states`) |

All gates green at v0.2.0: ruff format + ruff check + mypy (strict on gateway, standard on api) + pytest + svelte-check + Vitest.

Pre-tag fresh-install validation

Performed against m2-development@05b7da4 (now main@8a1b3fc): volumes destroyed, images removed, fresh clone, full docker compose up --build. All 7 services healthy; migrations applied through 0029; both new Learn-tab playgrounds serve cleanly. One UX paper-cut surfaced and filed as DE-283 — bootstrap admin password is in docker compose logs api but not surfaced at the login UI on 401. Community-friendly first contribution.

2. New DE entries filed during M2 — relevance for M3

10 DEs filed across M2 (DE-274 through DE-283). The ones likely to bear on M3 decisions:

| DE | Why M3 cares |
| --- | --- |
| DE-276: Ingest observability | If M3 Tabular Review or Playbooks rely on KB content being reliably embedded, the silent-fail surface from DE-276 hits them too. May be worth addressing as an M3 dependency. |
| DE-279: Case citation validation | If M3 wants citation-grounded research surfaces (litigation work), the Citation Engine's type 1 (KB-quote) doesn't cover Bluebook resolution. This is the natural next citation type to land. |
| DE-280: Case-content accuracy | Hardest of the three citation surfaces. Likely M4, not M3, given its complexity. |
| DE-282: Anonymization empirical validation | Community-friendly DE. M3 surfaces interact with this (Word Add-In sees the same anonymized content; Tabular Review does too). |

Full list at PRD §9.

3. M3 scope (per PRD §8)

M3 — Playbooks, Word Add-In, Tabular Review, and Slack/Teams (~8 weeks after M2)

Theme: Feature parity with commercial legal AI; surface coverage beyond the web.

The PRD's M3 deliverables:

Playbook engine + 4 built-in playbooks — Playbook schema + LangGraph executor + Easy Playbook auto-generation wizard. 4 built-ins: Generic SaaS MSA, NDA, DPA (GDPR-aligned), Commercial MSA. Playbook execution UI in web app.
Word Add-In (Office.js) — Chat against open document; apply skills to selection or whole doc; execute Playbooks against the doc; redlines as Word tracked changes; comments as Word comments; Inference Tier badge in the task pane; enterprise sideload distribution package.
Tabular / Multi-Document Review (PRD §3.14) — output_format: table skill mode; tabular UI surface; bulk operations; XLSX/CSV export; cost preview before execution.
Slack / Teams Light Intake Bridge (PRD §3.15) — OAuth install on Slack and Teams; /lq slash command (forward-as-chat) and /lq ask quick-skill flows; bot configuration in LQ.AI admin UI.

These are four largely-independent tracks, not a sequential cascade like M2 was. Some can run in parallel; some have ordering implications. Surface that question to Kevin first.

4. Decisions to surface to Kevin before any M3 code

These are the choice points I'd ask before opening a docs/M3-IMPLEMENTATION-PLAN.md analogous to M2's. Don't decide unilaterally.

A. Phase ordering for the 4 tracks

The four M3 tracks are largely independent. Options:

Sequential by complexity — Playbook engine first (the substrate; Word Add-In + Tabular both can call playbooks), then Word Add-In, then Tabular, then Slack/Teams. ~weeks 1-3, 4-5, 6, 7-8.
Sequential by visibility — Word Add-In first (most-requested surface for legal teams already on Word), Playbook engine second, Tabular third, Slack/Teams last (lowest priority).
Parallel where possible — Playbook engine + Word Add-In in parallel (different stacks: Python + Office.js); Tabular + Slack/Teams after.

Kevin's call. The PRD doesn't lock an order.
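One way to reason about the options above is to write the track dependencies down and let a topological sort show which tracks could start together. The sketch below is an aid for the discussion, not a recommendation: the dependency map is hypothetical. It treats "Word Add-In and Tabular can call playbooks" as a hard dependency, which is only one reading of the PRD, and the track keys are made-up identifiers.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each track lists the tracks it builds on.
# Only the Playbook-engine edges are suggested by the text; adjust as decided.
DEPENDS_ON = {
    "playbook_engine": set(),
    "word_add_in": {"playbook_engine"},
    "tabular_review": {"playbook_engine"},
    "slack_teams_bridge": set(),
}


def parallel_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group tracks into waves; each track's dependencies all sit in earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the map contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(parallel_waves(DEPENDS_ON), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Removing the `word_add_in` edge (treating playbook execution in Word as a later integration step) moves the add-in into the first wave, which corresponds to the parallel option.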

B. Out-of-scope-for-M3 candidates

The PRD lists everything in M3 as committed. Realistic question: is all of it shipping in M3, or is the maintainer-team budget likely to require a similar scope-reframe pass at M3 close? Worth being honest up-front about which of the 4 tracks are "must ship" vs "would be nice."

C. Plan format

M2 had a detailed docs/M2-IMPLEMENTATION-PLAN.md with 18 numbered tasks across 6 phases. Same structure for M3? Or a lighter-touch plan given M3's parallel-track nature?

D. Word Add-In — distribution + signing

The Office.js add-in needs a signed manifest for enterprise sideload. This is procurement-relevant (operators' IT will require signed builds). Worth deciding upfront whether v0.3 ships a signed manifest or a development-only manifest with a v0.3.1 follow-on for the signing.

E. Pre-M3 DE landings

Three DEs surfaced during M2 are worth picking up before M3 starts (or as M3-phase-0 tasks) rather than during M3:

DE-283 (fresh-install login UX) — small, community-friendly, easy win. Worth filing as the first community-contribution PR target.
DE-276 (ingest observability) — load-bearing for Tabular Review since silent embed failures break tabular outputs.
DE-277 (citation extractor chunk-boundary fallback) — small, isolated, would land cleanly between milestones.

Kevin's call on which of these become M3-phase-0 vs M3.1 vs deferred to M4.

5. Next-session entry point

After this handoff merges, the next session opens with this prompt:

Start M3. Read docs/SESSION-HANDOFF-2026-05-17-m2-shipped-m3-kickoff.md first. §4 has the open decisions. Don't write code; surface the decisions to Kevin, then propose a docs/M3-IMPLEMENTATION-PLAN.md.

The session opens with §4 decisions A through E pending. The new Claude should NOT default to building anything. The right move is to read the handoff, surface the five decisions, and wait for Kevin.

If Kevin gives explicit direction (e.g., "build the Playbook engine first") then the next session writes the M3 plan + the first PR's investigation map and branches off main.

6. Memory state at end-of-session

MEMORY.md updated — points at this handoff as the M2-close + M3-kickoff entry.
project_lq_ai_status.md updated — M2 SHIPPED at v0.2.0 / 8a1b3fc; M3 next.
Other memory files (user role, feedback, references) unchanged.

7. Open PRs at handoff time

None. PRs #22–#42 all merged. The release-cycle PRs (#41 DE-283 doc; #42 main merge) closed cleanly.

This branch (handoff/m2-shipped-m3-kickoff) is doc-only and adds this file. Once it merges, no PRs remain open.

8. Loose ends explicitly NOT being carried into M3

Per-skill prompt-injection detection rates — PRD §1.9 commits to publishing these; M2 didn't ship them. Tracked at PRD §9 Engineering Discipline subsection. Worth noting on the M3 plan since it's a continuing commitment.
OpenSSF Scorecard / Best Practices Badge — Silver tier targeted at M2 release per PRD §1.8. Not yet shipped. Decision: address in M3 or push to M3.x?
Mutation testing per release — PRD §1.9 commits to this; M2 release didn't include a mutation score. Tracked at PRD §9. Decision: M3 or later?

These are public commitments the project has made. The honest framing for M3 is to surface them explicitly + decide which to pick up vs leave on the deferred list.

End of handoff. The next session begins at §4 with the five decisions for Kevin.