skills/action-items-from-client-alert/test-plan.md

Acceptance Test Plan — Action Items from Client Alert v1.0.0

Skill summary

Extracts time-sensitive action items, deadlines, and obligations from a regulatory bulletin, law firm memo, or client alert into a deadline-organized checklist. Supports applicable_jurisdictions and industry_filter inputs to narrow the extraction to items affecting the user's organization.

Test corpus requirements

Source 5–8 anonymized client alerts / regulatory bulletins covering:

At least 1 single-jurisdiction alert with explicit deadlines (e.g., a state-AG enforcement-deadline notice with a specific compliance date).
At least 1 multi-jurisdiction alert (e.g., a bulletin covering simultaneous federal and state regulatory changes with overlapping but distinct deadlines).
At least 1 alert with implicit deadlines (e.g., "registrants must file before the next annual renewal" — requires inference of when the next annual renewal is).
At least 1 alert with no deadlines (informational / advisory in nature) to test the no-deadlines handling.
At least 1 alert with notably narrow industry applicability (e.g., specific to financial services, healthcare, or another regulated industry).
At least 1 alert from a law-firm-memo source (vs. a regulator-issued bulletin) — different framing, often more analytical.

Sources can include public client alerts (most law firm websites publish them) and regulatory-agency bulletins (Federal Register, SEC press releases, state AG announcements).

Test scenarios

Scenario 1: Single-jurisdiction alert with explicit deadlines

Inputs: Client alert covering a specific regulatory development with explicit compliance deadlines. Inputs: applicable jurisdiction matching the alert.

Expected output structure:

Markdown report with sections: "Bottom line" (summary of relevance and urgency), "Action items" (organized by deadline timeframe — Immediate / Short-term / Medium-term / Long-term), "Open questions / context" (items needing clarification before action), "What this skill does not do".
Each action item includes: action description, deadline (date or relative timeframe), source citation in the alert, severity / urgency tag.
Citations point at the specific section of the alert.

Expected calibration:

Action items extracted are real (a reviewing attorney would identify the same items from the alert).
Deadlines are accurately captured (specific dates preserved; relative deadlines preserved as relative).
Actions are appropriately atomic (each action item is one thing the user can do, not a vague "comply with the new rule").
"Bottom line" addresses applicability to the user's stated jurisdiction.

Edge cases to verify:

If the alert has multiple dates (effective date vs. compliance date vs. enforcement-discretion date), the skill differentiates them.
If the alert has phased deadlines (initial requirement at date X, full requirement at date Y), each phase becomes a separate action item.

Pass criteria:

Structural pass: All required sections present; deadline tags consistent.
Calibration pass: Reviewing attorney confirms action items extracted are correct and complete.

Scenario 2: Multi-jurisdiction alert

Inputs: Alert covering federal and state-level changes with overlapping but distinct deadlines. Inputs: applicable_jurisdictions: [federal, california, new-york] (or similar).

Expected output structure: Same structure with explicit jurisdiction tagging on each action item.

Expected calibration:

Action items are tagged by jurisdiction.
If the user's jurisdictions filter excludes a jurisdiction in the alert, items applicable only to the excluded jurisdiction are either omitted or surfaced separately under a "filtered out" notation.
Deadlines that differ across jurisdictions (the federal effective date vs. the California compliance date) are clearly differentiated.

Edge cases to verify:

Items applicable to multiple jurisdictions are flagged once with multi-jurisdiction tagging, not duplicated.
If a jurisdiction filter is applied but the alert covers a related-but-different jurisdiction (e.g., user filter is California; alert covers federal preemption of California requirements), the skill notes the relationship.

Pass criteria: As above with jurisdiction-aware verification.

Scenario 3: Alert with implicit deadlines

Inputs: Alert where deadlines are implicit ("at the time of next annual renewal", "before the start of the next reporting period", "promptly after the effective date").

Expected output structure: Same structure with implicit deadlines surfaced as such.

Expected calibration:

Implicit deadlines are flagged as such (not inferred to specific dates the alert does not provide).
The skill suggests how the user can resolve the implicit deadline (e.g., "the next annual renewal date for our company is required to convert this to an actionable date").
"Open questions / context" includes the implicit-deadline resolution as a follow-up item.

Edge cases to verify:

The skill does not invent specific dates from implicit references.
The skill calibrates urgency based on the implicit timeframe (e.g., "next reporting period" suggests near-term; "by next annual renewal" suggests up to 12 months).

Pass criteria: As above with implicit-deadline verification.

Scenario 4: Alert with no deadlines (informational)

Inputs: Advisory alert (e.g., a regulatory commentary, FAQ release, or interpretive guidance) without explicit compliance dates.

Expected output structure: Modified structure — the "Action items" section is sparse or replaced with an "Awareness items" section; the "Bottom line" addresses why the alert is relevant despite the absence of immediate action items.

Expected calibration:

Skill does not invent deadlines that aren't in the alert.
Skill identifies forward-looking implications (e.g., "this guidance suggests a future enforcement direction; monitor for follow-on regulatory action").
Skill does not produce action items where none are warranted.

Edge cases to verify:

Skill clearly distinguishes "no deadlines because none stated" from "deadlines that I missed in extraction."

Pass criteria: As above with no-deadline verification.

Scenario 5: Industry-filtered alert

Inputs: Alert with notable industry applicability (e.g., financial services, healthcare). Input: industry_filter: <industry> matching the alert's scope.

Expected output structure: Same structure with industry-specific scoping called out.

Expected calibration:

Action items are filtered to those applicable to the specified industry.
Items not applicable to the industry are either omitted or surfaced with a clear "not applicable to your industry" flag.
"Bottom line" addresses industry-specific impact.

Edge cases to verify:

If the alert is industry-specific but the user's filter doesn't match, the skill notes the mismatch and recommends whether to proceed.
If the alert has cross-industry implications, the skill differentiates the industry-specific from the cross-cutting.

Pass criteria: As above with industry-filter verification.

Scenario 6: Law-firm-memo source

Inputs: Alert sourced from a law firm memo (often more analytical than a regulator-issued bulletin; may include strategic recommendations).

Expected output structure: Same structure but with awareness that the source is interpretive / strategic.

Expected calibration:

The skill differentiates the alert's factual content (what the regulation requires) from strategic content (the firm's recommendation on how to respond).
The skill does not import the firm's strategic framing as fact.
Action items are operationally usable for the user's organization, not just summaries of the firm's recommendations.

Edge cases to verify:

The skill calibrates against the firm-memo style (often longer, more analytical, with embedded commentary).

Pass criteria: As above with source-aware verification.

Refusal scenarios

Refusal 1: Document is not a client alert / regulatory bulletin

Input: A research paper, news article, or other document misidentified as a client alert.

Expected behavior:

Skill identifies the document type mismatch.
Skill suggests whether the document is amenable to action-item extraction or whether a different skill or approach is appropriate.

Pass criteria: Explicit identification.

Refusal 2: Alert is for a jurisdiction outside the user's filter

Input: Alert covers a jurisdiction (or industry) that the user has explicitly filtered out.

Expected behavior:

Skill identifies the mismatch.
Skill recommends either expanding the filter or noting that the alert may not be relevant.
Skill does not produce filtered-out action items as if they applied.

Pass criteria: Explicit mismatch handling.

Cross-cutting verification

No invented action items. Every action item is grounded in the alert.
No invented deadlines. Implicit deadlines are flagged as implicit, not inferred to specific dates the alert does not provide.
No regulatory-compliance opinions. The skill summarizes what the alert says is required; it does not assert that the user's compliance approach satisfies the requirement.
Citations resolve. Every action item points at the section of the alert where it is supported.
"What this skill does not do" enumeration present. Typical: substantive regulatory analysis, jurisdiction-specific compliance certifications, strategic legal advice on enforcement risk.

Pass / fail decision

Action Items from Client Alert v1.0.0 passes acceptance testing when:

All 6 test scenarios pass structural checks.
All 6 test scenarios pass calibration evaluation by a reviewing attorney with regulatory-update experience.
Both refusal scenarios trigger documented refusal behavior.
Cross-cutting verification passes on every scenario.

Reviewer notes

The reviewing attorney for Action Items from Client Alert acceptance testing should have practical experience with regulatory update intake and compliance-program management. Specific competencies:

Recognizing when an extracted action item is too vague to be operational ("comply with new rules") vs. appropriately atomic ("file the new disclosure form by March 31").
Calibrating implicit deadlines (when "promptly" warrants treatment as immediate vs. when it warrants ~30 days).
Differentiating an alert's factual content from interpretive framing in firm memos.

Calibration assessment is documented in test-results/action-items-from-client-alert-v1.0.0/calibration-assessment.md.