skills/vendor-privacy-policy-first-pass/test-plan.md

Acceptance Test Plan — Vendor Privacy Policy First Pass v1.0.0

Skill summary

Triage assessment of a vendor's published privacy policy. Produces a structured summary of data handling practices and surfaces red flags relevant to in-house counsel evaluating whether to engage the vendor or escalate to deeper privacy review (DPA negotiation, full data-protection assessment).

Test corpus requirements

Source 5–8 anonymized vendor privacy policies covering:

At least 1 high-quality, market-standard privacy policy (e.g., a SaaS platform with a comprehensive, GDPR-aware policy) — to confirm baseline calibration.
At least 1 ambiguous-quality policy (sparse provisions; vague data-handling language; missing key sections) — to surface red flags.
At least 1 policy with notable data-handling features (AI-training rights, broad data-sharing, third-party advertiser ecosystem, cross-border data transfers without specified mechanism).
At least 1 policy with consumer-app focus (vs. enterprise focus) — different concerns surface.
At least 1 policy with regulated-industry context (financial services, healthcare, education) — additional regulatory lenses apply.

Documents can be sourced from the vendor's public website (privacy policies are typically published) and "anonymized" simply by replacing the vendor's name with a placeholder; the substantive content is what is being tested.

Test scenarios

Scenario 1: High-quality, market-standard privacy policy (baseline)

Inputs: A comprehensive SaaS-vendor privacy policy.

Expected output structure:

Markdown report with sections: "Bottom line" (summary recommendation: engage / escalate / decline), "Summary of practices" (factual summary of what the policy says), "Red flags / concerns" (issues warranting attention), "Open questions for the vendor" (gaps in the policy that require follow-up), "What this skill does not do".

Expected calibration:

0 critical red flags (this is a market-standard policy).
0–2 material red flags (often: international transfer mechanisms not explicit; sub-processor list not published).
2–6 minor observations (typically: cookie consent mechanism, retention period clarity, data-subject-rights mechanism specifics).
"Bottom line" leads with "engage with standard contractual safeguards" or similar.

Edge cases to verify:

Skill summarizes practices factually without inventing claims the policy does not make.
Skill differentiates clear practices from ambiguous-language practices.

Pass criteria:

Structural pass: All required sections present.
Calibration pass: Reviewing attorney confirms baseline calibration is correct (not over-flagging market-standard).

Scenario 2: Ambiguous / sparse privacy policy

Inputs: A privacy policy with vague language, missing sections, or unclear data-handling commitments.

Expected output structure: Same structure with greater emphasis on "Open questions for the vendor."

Expected calibration:

1–3 critical red flags if substantively missing key elements (no data-handling commitment, no breach notification, no data-subject rights).
2–5 material red flags on ambiguous-but-present provisions.
3–8 minor observations.
"Open questions for the vendor" is substantive — listing the specific questions that need answering before engagement.

Edge cases to verify:

Skill flags absences (missing sections) as findings, not as silent omissions.
Skill differentiates "policy says nothing about X" from "policy says X without specificity."

Pass criteria: As above with ambiguity-aware verification.

Scenario 3: Policy with notable data-handling features

Inputs: A privacy policy with notable features: AI-training-on-customer-data, broad data-sharing with affiliates / advertisers, cross-border transfers without specified mechanism.

Expected output structure: Same structure with explicit findings on the notable features.

Expected calibration:

AI-training rights without clear opt-out: critical red flag.
Broad data-sharing with advertisers if user data is implicated: critical or material depending on scope.
Cross-border transfers without specified mechanism: material red flag (specific to cross-border-data-flow contexts).
"Bottom line" reflects the notable-features posture.

Edge cases to verify:

Skill identifies AI-training-on-customer-data provisions specifically and addresses them at appropriate severity.
Skill addresses ad-tech / data-broker ecosystem if implicated.

Pass criteria: As above with notable-features verification.

Scenario 4: Consumer-app privacy policy

Inputs: A consumer-facing app's privacy policy (B2C rather than B2B).

Expected output structure: Same structure but with B2C-specific concerns: COPPA if minors are implicated, geolocation, advertising, in-app behavior tracking.

Expected calibration:

Findings address consumer-app-specific concerns appropriately.
"Bottom line" notes whether the in-house counsel context (B2B engagement evaluating a vendor) maps to the policy's posture.

Edge cases to verify:

Skill identifies the B2C focus and notes how it differs from typical B2B-vendor evaluation.
Skill addresses COPPA / minor-targeted concerns if present.

Pass criteria: As above with B2C-context verification.

Scenario 5: Regulated-industry vendor policy

Inputs: A privacy policy for a vendor operating in a regulated industry (financial services, healthcare, education).

Expected output structure: Same structure with attention to regulatory-specific concerns: GLBA / financial-privacy provisions; HIPAA / healthcare data handling; FERPA / education records.

Expected calibration:

Findings address the regulatory context specifically.
"Bottom line" recommends regulatory-specific deeper review (e.g., DPA Checklist Review with HIPAA BAA regime if healthcare data is in scope).

Edge cases to verify:

Skill identifies the regulatory context.
Skill recommends appropriate cross-skill follow-up (DPA Checklist Review for the regulatory layer).

Pass criteria: As above with regulatory-context verification.

Refusal scenarios

Refusal 1: Document is not a privacy policy

Input: Terms of service, EULA, or other policy type misidentified as a privacy policy.

Expected behavior:

Skill identifies that the document is not (primarily) a privacy policy.
Skill suggests appropriate next steps (read the actual privacy policy; or note that the document type is not within scope).

Pass criteria: Explicit refusal.

Refusal 2: Document is in non-English

Input: A non-English privacy policy.

Expected behavior:

Skill identifies the language and notes the skill is calibrated to English-language US/EU/UK contexts.
Skill seeks confirmation before proceeding.

Pass criteria: Explicit identification and confirmation request.

Cross-cutting verification

Triage scope discipline. This skill is a first-pass triage, not a comprehensive privacy assessment. Findings should not overclaim on the depth of analysis — the skill explicitly recommends DPA Checklist Review or full privacy assessment for deeper analysis where appropriate.
No invented privacy practices. The skill summarizes what the policy says, not what the vendor does in practice (the policy may not reflect actual practice; this is acknowledged in the "What this skill does not do" enumeration).
No regulatory-compliance opinions. The skill flags regulatory concerns but does not assert "this policy is GDPR-compliant" or "this policy violates CCPA."
Open questions are substantive. The "Open questions for the vendor" section produces real questions, not generic placeholders.
"What this skill does not do" enumeration present.
Citations to specific sections of the policy resolve.

Pass / fail decision

Vendor Privacy Policy First Pass v1.0.0 passes acceptance testing when:

All 5 test scenarios pass structural checks.
All 5 test scenarios pass calibration evaluation by a reviewing attorney with privacy / data-protection experience.
Both refusal scenarios trigger documented refusal behavior.
Cross-cutting verification passes on every scenario.

Reviewer notes

The reviewing attorney for Vendor Privacy Policy First Pass should have practical privacy / data-protection experience covering both consumer and enterprise contexts. Specific competencies:

Identifying AI-training-on-customer-data provisions and calibrating their materiality.
Evaluating cross-border transfer mechanisms in the post-Schrems II environment.
Distinguishing market-standard from sub-standard privacy policy practices.
Recognizing when a triage flag warrants escalation to full DPA negotiation vs. when standard contractual safeguards are sufficient.

Calibration assessment is documented in test-results/vendor-privacy-policy-first-pass-v1.0.0/calibration-assessment.md.