Acceptance Test Plan — MSA Review — Commercial Purchase v1.0.0
Skill summary
Reviews commercial purchase MSAs (goods, services, professional services) from the customer or vendor perspective. Sister skill to MSA Review — SaaS, calibrated to non-SaaS commercial agreements where the substantive concerns differ (delivery, acceptance, warranties on tangible goods or service performance, professional services scope, change-order management).
Test corpus requirements
Source 6–10 anonymized commercial purchase MSAs covering:
- At least 2 customer-perspective MSAs for goods purchase (the user's organization is buying physical goods or equipment from a vendor).
- At least 2 customer-perspective MSAs for professional services (the user's organization is engaging consultants, contractors, or service providers).
- At least 2 vendor-perspective MSAs (the user's organization is selling goods or services).
- At least 1 MSA with significant warranty / acceptance / inspection provisions (test mechanical-warranties calibration).
- At least 1 MSA with statement-of-work / change-order architecture (test SOW-MSA boundary calibration).
- At least 1 routine, market-standard MSA to confirm baseline calibration.
For perspective-branching tests, run the same MSA twice with different perspective inputs.
Test scenarios
Scenario 1: Routine goods-purchase MSA, customer perspective
Inputs: Standard commercial-purchase MSA for goods. Perspective: customer. Mode: comprehensive.
Expected output structure:
- Markdown report with sections: "Bottom line", "Findings", "Recommended position", "What this skill does not do".
- Severity tags follow rubric.
- Citations reference clauses in source.
Expected calibration:
- 0–1 critical findings.
- 3–7 material findings (typically: warranty scope and duration, acceptance criteria, delivery/risk-of-loss, indemnification scope, termination rights).
- 5–12 minor findings.
- "Bottom line" leads with customer-side recommendation.
Edge cases to verify:
- Skill addresses warranty duration and remedies appropriate to the type of goods (a 90-day warranty on durable equipment is unusually short; a 24-month warranty on consumables is unusually long).
- Skill addresses acceptance criteria and inspection rights at appropriate severity.
- Skill addresses risk-of-loss / title-transfer mechanics (FOB / FCA / DDP).
Pass criteria:
- Structural pass: All required sections present.
- Calibration pass: Reviewing attorney confirms calibration.
Scenario 2: Professional services MSA, customer perspective
Inputs: Professional services MSA (consulting, implementation, integration services). Perspective: customer. Mode: comprehensive.
Expected output structure: Same as Scenario 1.
Expected calibration:
- 0–1 critical findings.
- 3–7 material findings (typically: scope and change-order, deliverables acceptance, IP ownership of work product, key-personnel commitments, no-poach provisions, dispute mechanisms).
- 5–12 minor findings.
- "Bottom line" addresses scope-of-work clarity and IP allocation prominently.
Edge cases to verify:
- IP ownership of work product is addressed at appropriate severity (typically critical or material from the customer's perspective if vendor retains broad rights).
- Skill addresses key-personnel commitments and substitution rights.
- Skill addresses change-order architecture and its boundary with the underlying MSA.
Pass criteria: As above with services-specific calibration verification.
Scenario 3: Vendor perspective MSA
Inputs: Commercial purchase MSA. Perspective: vendor. Mode: comprehensive.
Expected output structure: Same as Scenario 1.
Expected calibration:
- 0–1 critical findings (typically only on customer-favorable provisions creating outsized vendor exposure: unlimited indemnification, broad most-favored-nation provisions, warranty obligations beyond product life).
- 3–6 material findings on payment terms, acceptance disputes, limitation of liability defense, IP defense scope.
- 5–10 minor findings.
- "Bottom line" leads with vendor-side recommendation.
Edge cases to verify:
- Payment-terms provisions (net-30 vs. net-60 vs. payment-on-acceptance) are addressed from vendor perspective.
- Skill flags overbroad customer-favorable IP indemnification scope.
Pass criteria: Calibration pass verifying perspective-aware findings.
Scenario 4: MSA with notable warranty provisions
Inputs: MSA with unusual warranty scope (extended duration, broad remedy, or notably narrow). Perspective: as appropriate.
Expected output structure: Same as Scenario 1.
Expected calibration:
- The unusual warranty provisions surface explicitly with severity calibrated to the deviation from market.
- Recommended language addresses the unusual provision.
Edge cases to verify:
- Skill differentiates express warranties from implied warranties.
- Skill addresses warranty disclaimers ("AS IS") at appropriate severity.
Pass criteria: As above.
Scenario 5: MSA with SOW / change-order architecture
Inputs: MSA that contemplates separate Statements of Work and a change-order process. Perspective: customer or as appropriate.
Expected output structure: Same as Scenario 1, with explicit attention to the MSA-SOW boundary.
Expected calibration:
- Findings address change-order approval mechanics.
- Findings address how SOWs interact with MSA-level provisions (does the SOW override the MSA or vice versa?).
- Findings address SOW dispute escalation.
Edge cases to verify:
- Skill identifies whether the MSA controls the SOW-level terms or vice versa.
- Skill addresses change-order pricing and timeline-impact provisions.
Pass criteria: As above with SOW-architecture-aware verification.
Scenario 6: Quick triage mode
Inputs: Same MSA as Scenario 1. Mode: quick_triage.
Expected output structure:
- Shorter report focused on critical and material issues.
- Minor issues compressed or omitted.
Expected calibration:
- Critical and material findings are a subset of comprehensive mode.
Pass criteria: Triage output is meaningfully shorter without changing severity calibration of surfaced findings.
Refusal scenarios
Refusal 1: Document is a SaaS MSA
Input: A SaaS MSA misidentified as a commercial purchase MSA.
Expected behavior:
- Skill identifies the SaaS context.
- Skill recommends MSA Review — SaaS as the appropriate skill.
- Skill does not apply commercial-purchase-specific analysis to a SaaS MSA.
Pass criteria: Explicit refusal with cross-pointer.
Refusal 2: Document is an Order Form / SOW only
Input: A standalone Order Form or SOW without the underlying MSA.
Expected behavior:
- Skill identifies that the document is ancillary.
- Skill suggests providing the underlying MSA for proper analysis.
Pass criteria: Explicit identification and recommendation.
Cross-cutting verification
- No invented authorities.
- No enforceability opinions.
- No regulatory-compliance opinions outside scope.
- Recommended language is operationally usable.
- "What this skill does not do" enumeration present.
- Citations resolve.
- Cross-skill pointers accurate.
Pass / fail decision
MSA Review — Commercial Purchase v1.0.0 passes acceptance testing when:
- All 6 test scenarios pass structural checks.
- All 6 test scenarios pass calibration evaluation by a reviewing attorney with commercial contracting experience.
- Both refusal scenarios trigger the documented refusal behavior.
- Cross-cutting verification passes on every scenario.
Reviewer notes
The reviewing attorney should have direct experience with commercial purchase MSAs (goods or services). Specific competencies:
- Distinguishing goods-side concerns (warranty, acceptance, risk-of-loss, delivery) from services-side concerns (scope, change orders, IP allocation, key personnel).
- Calibrating warranty duration and remedy across product types.
- Evaluating SOW-MSA boundary clarity.
- Addressing change-order architecture from customer and vendor perspectives.
Calibration assessment is documented in test-results/msa-review-commercial-purchase-v1.0.0/calibration-assessment.md.