skills/msa-review-saas/test-plan.md

Acceptance Test Plan — MSA Review — SaaS v1.0.0

Skill summary

Reviews SaaS Master Services Agreements from the customer or vendor perspective. Supports customer and vendor perspectives; supports comprehensive and quick_triage review modes; calibrated to common in-house counsel review of SaaS contracts.

Test corpus requirements

Source 8–12 anonymized SaaS MSAs covering:

At least 3 customer-perspective MSAs (the user's organization is the customer purchasing SaaS): ideally one mid-market SaaS template, one enterprise SaaS template, one developer-tools SaaS template. Different SaaS categories surface different concerns (data-platform SaaS has different sensitivities than collaboration SaaS).
At least 3 vendor-perspective MSAs (the user's organization is the SaaS vendor selling): ideally one customer-facing template, one channel-partner template, one enterprise-RFP-response template.
At least 1 MSA with notable data-handling provisions (PII processing, customer-data-as-training-data, sub-processor cascades, international transfers) — to test the cross-pointer to DPA Checklist Review.
At least 1 MSA with notable IP provisions (joint IP development, customer-data-derived IP, AI-training rights) — to test the IP-clause findings.
At least 1 MSA with usage-based or consumption-based pricing (rather than seat-based) — to test commercial-terms calibration.
At least 1 routine, market-standard SaaS MSA to confirm calibration on a baseline document.

For perspective-branching tests, run the same MSA twice with different perspective inputs to verify perspective-aware calibration. For mode-branching tests, run the same MSA in comprehensive mode and quick_triage mode and verify the output difference.

Test scenarios

Scenario 1: Routine SaaS MSA, customer perspective, comprehensive mode (baseline)

Inputs: Market-standard SaaS MSA. Perspective: customer. Mode: comprehensive.

Expected output structure:

Markdown report with sections: "Executive summary" (or "Bottom line"), "Critical issues", "Material issues", "Minor issues / observations", "Recommended position", "What this skill does not do".
Severity tags follow rubric.
Citations reference clause numbers in the source MSA.

Expected calibration:

0–1 critical findings.
3–8 material findings (typically: liability cap calibration, indemnification scope, service-level credits structure, termination-for-convenience, audit rights).
6–15 minor findings.
"Bottom line" leads with customer-side recommendation.

Edge cases to verify:

Liability cap discussion is calibrated to the contract's commercial value and risk profile (not flagging a 12-month-fee cap as critical on a $50K/year contract; flagging unlimited liability exposure as critical).
Indemnification scope differentiates IP-infringement (typically uncapped or super-capped) from general indemnification (typically capped).
Auto-renewal language is identified and severity-tagged.

Pass criteria:

Structural pass: All required sections present; severity tags formatted correctly.
Calibration pass: Reviewing attorney confirms severity distribution is consistent with how they would review a routine SaaS MSA from customer side.

Scenario 2: Routine SaaS MSA, vendor perspective, comprehensive mode

Inputs: Market-standard SaaS MSA. Perspective: vendor. Mode: comprehensive.

Expected output structure: Same as Scenario 1.

Expected calibration:

0–1 critical findings (typically only if a customer-favorable provision creates outsized vendor exposure — e.g., unlimited indemnification of customer's downstream losses).
3–8 material findings (typically: data ownership, suspension rights, payment terms, liability cap defense, IP defense scope).
6–15 minor findings.
"Bottom line" leads with vendor-side recommendation.

Edge cases to verify:

Skill flags customer-favorable provisions appropriately from the vendor's perspective.
Skill does not lapse into customer-favorable analysis mid-report.
Recommended language reflects vendor-protective positions (e.g., suggesting suspension rights for non-payment; suggesting carve-outs in customer-favorable indemnification language).

Pass criteria: Same as Scenario 1, with perspective-awareness verification.

Scenario 3: SaaS MSA with significant data-handling provisions

Inputs: SaaS MSA where the vendor processes customer PII or sensitive data. Perspective: customer. Mode: comprehensive.

Expected output structure: Same as Scenario 1, with an explicit cross-reference to DPA Checklist Review for the data-handling-specific analysis.

Expected calibration:

0–2 critical findings depending on data-handling provisions: explicit AI-training rights without opt-out is critical; missing data-export-on-termination is critical; lack of breach notification is critical.
3–6 material findings on data handling (sub-processor approval, audit rights, data residency, retention, return on termination).
6–12 minor findings.
"Bottom line" notes whether a separate DPA is in place or whether the MSA's data-handling provisions are stand-alone, and recommends DPA Checklist Review for the deeper analysis.

Edge cases to verify:

Skill identifies the cross-pointer to DPA Checklist Review without duplicating that skill's analysis.
Skill flags AI-training-on-customer-data provisions at appropriate severity (critical if no opt-out; material if opt-out is present but operationally complex).
Skill addresses sub-processor cascade visibility.

Pass criteria: Same as Scenario 1, with cross-skill-pointer verification.

Scenario 4: SaaS MSA with notable IP provisions

Inputs: SaaS MSA where IP ownership is non-standard — e.g., joint IP on custom development, customer-data-derived IP claimed by vendor, AI-output ownership ambiguity. Perspective: customer. Mode: comprehensive.

Expected output structure: Same as Scenario 1.

Expected calibration:

1–3 critical findings on IP provisions where the customer's position is materially compromised (e.g., vendor claiming derivative IP on customer data).
2–4 material findings on IP-related provisions (work-product ownership, customer-data-derived insights, license-back provisions).
4–10 minor findings.
Recommended language addresses IP allocation explicitly.

Edge cases to verify:

Skill differentiates licensed-IP language (vendor licenses to customer) from owned-IP language (customer retains ownership).
Skill flags work-product-derived IP claims as customer-perspective concerns.
Skill addresses AI-output ownership where present.

Pass criteria: Same as Scenario 1, with IP-aware calibration verification.

Scenario 5: SaaS MSA in quick triage mode

Inputs: Same MSA as Scenario 1. Perspective: customer. Mode: quick_triage.

Expected output structure:

Shorter markdown report focused on critical and material issues only.
"Bottom line" / "Executive summary" present.
Minor issues either omitted or summarized in a single brief section.
"What this skill does not do" section present.

Expected calibration:

0–1 critical findings (same as comprehensive mode).
3–6 material findings (subset of comprehensive mode — only the most operationally significant).
Minor findings either compressed to a one-line summary or omitted.

Edge cases to verify:

Mode change shifts the report's depth without changing the substance of the calibration.
Critical and material findings in quick triage mode are a strict subset of the comprehensive mode findings; nothing surfaces in triage that does not also surface in comprehensive.

Pass criteria:

Structural pass: Triage-mode output is meaningfully shorter than comprehensive-mode output.
Calibration pass: Triage findings are the same severity calibration as comprehensive — what changes is depth, not substance.

Scenario 6: SaaS MSA with usage-based pricing

Inputs: SaaS MSA with consumption-based or seat-flexible pricing. Perspective: customer. Mode: comprehensive.

Expected output structure: Same as Scenario 1.

Expected calibration:

Findings address usage-meter accuracy, dispute mechanism, audit rights on usage data, true-up provisions.
"Bottom line" notes whether the pricing structure has been reviewed for predictability and dispute mechanisms.

Edge cases to verify:

Skill recognizes usage-based pricing structure (rather than treating it as fixed-fee).
Skill addresses meter-dispute and overage protection at appropriate severity.

Pass criteria: As above with pricing-aware verification.

Refusal scenarios

Refusal 1: Document is not a SaaS MSA

Input: A commercial purchase MSA, professional services agreement, or other contract type misidentified as a SaaS MSA.

Expected behavior:

Skill identifies that the document is not (primarily) a SaaS MSA.
Skill recommends the appropriate alternative skill (MSA Review — Commercial Purchase for goods/services agreements; "no specific skill applies, escalate to expert" for less common contract types).
Skill does not apply SaaS-specific analysis to a non-SaaS document.

Pass criteria: Skill explicitly refuses with a clear explanation and cross-pointer.

Refusal 2: Document is a SaaS Order Form rather than the underlying MSA

Input: A SaaS Order Form, Statement of Work, or other ancillary document referencing an underlying MSA the user has not provided.

Expected behavior:

Skill identifies that the document is an ancillary, not the MSA itself.
Skill suggests providing the underlying MSA for proper analysis.
Skill optionally identifies provisions in the Order Form that warrant attention but defers full analysis pending the MSA.

Pass criteria: Skill explicitly identifies the document type and suggests next steps.

Cross-cutting verification

These checks apply across all scenarios:

No invented authorities. Every citation to law, regulation, or industry guidance is real and accurately characterized.
No enforceability opinions. Skills defer enforceability questions ("this provision is unusual; enforceability is jurisdiction-specific and warrants escalation").
No regulatory-compliance opinions outside scope. The skill does not assert HIPAA-compliance, GDPR-compliance, or other regime-specific compliance — it surfaces patterns and refers to DPA Checklist Review or Vendor Privacy Policy First Pass for regime-specific analysis.
Recommended language is operationally usable. Suggested replacement language is clean and customer- or vendor-perspective-appropriate as the skill's perspective input dictates.
"What this skill does not do" enumeration is present. Typical items: substantive product/services-specific analysis, jurisdiction-specific enforceability opinions, tax structuring, regulatory-compliance certifications.
Citations resolve. Every citation in the report points at a clause in the source MSA.
Cross-skill pointers are accurate. When the skill recommends DPA Checklist Review or Vendor Privacy Policy First Pass, the recommendation is appropriate for the document's content.

Pass / fail decision

MSA Review — SaaS v1.0.0 passes acceptance testing when:

All 6 test scenarios pass structural checks.
All 6 test scenarios pass calibration evaluation by a reviewing attorney with SaaS contracting experience.
Both refusal scenarios trigger the documented refusal behavior.
Cross-cutting verification passes on every scenario.

The reviewing attorney pays particular attention to:

Liability-cap calibration relative to contract value (not over-flagging market-standard caps; not under-flagging unusual caps).
AI-training-on-customer-data provisions (a recent and high-stakes calibration target).
Cross-pointer accuracy to DPA Checklist Review and Vendor Privacy Policy First Pass.

Reviewer notes

The reviewing attorney for MSA Review — SaaS acceptance testing should have direct experience negotiating SaaS contracts from at least one of: customer side, vendor side, or both. Specific competencies:

Distinguishing market-standard vs. unusual liability caps in SaaS context.
Recognizing AI-training-on-customer-data provisions and calibrating their materiality.
Calibrating sub-processor and data-handling provisions in a SaaS MSA vs. a separate DPA.
Identifying when an Order Form or SOW is masquerading as the underlying MSA.

Calibration assessment is documented in test-results/msa-review-saas-v1.0.0/calibration-assessment.md.