Technical design — MOD-018 Alert case management system¶
Module: MOD-018 — Alert case management system
System: SD03 — AML Transaction Monitoring Platform
Repo: bank-aml
FR scope: FR-113, FR-114, FR-115, FR-116
NFR scope: NFR-010, NFR-011, NFR-019, NFR-024
Policies satisfied: AML-005 (LOG), AML-006 (LOG), GOV-006 (LOG)
Author: AI agent (Claude Opus 4.7)
Date: 2026-05-01
Dependencies: MOD-016 (Built — bank-aml), MOD-104 (Built), MOD-103 (Built)
Stage covered: designed; deploy unblocked (no cross-bus IAM gap — MOD-018 stays on the bank-aml bus).
Objective¶
MOD-018 is the alert case management system for SD03. It consumes bank.aml.alert_raised from the bank-aml bus, deduplicates alerts on the same customer within a 24-hour window into a single case (FR-113), assigns cases to analysts via a round-robin balancer with a 4-hour wall-clock escalation timer (FR-114), records every action as an immutable event in aml.case_events (FR-115), and gates NO_ACTION closures on high-risk cases behind mandatory supervisor approval (FR-116). It updates aml.aml_alerts.case_id on attach, owns the aml.aml_cases lifecycle, and publishes bank.aml.case_opened/escalated/closed to the bank-aml bus. Closure also publishes a staff.action_taken event to the bank-platform bus per MOD-047's producer contract — the GOV-006 cross-cut audit trail.
Architecture¶
┌───────────────────────────┐
│ bank-aml EventBridge bus │ own bus — no IAM widening
└─────────────┬─────────────┘
│ bank.aml.alert_raised
▼
┌───────────────────────┐
│ MOD-018 alert-consumer │ ADOT layer; BankAmlRole;
│ src/handlers/ │ reserved concurrency 50;
│ alert-consumer.ts │ VPC-attached for Neon.
└──────────┬─────────────┘
│ withTransaction
▼
┌───────────────────────────────────────┐
│ findDedupCase(party_id, 24h, FOR UPD) │
│ ↳ hit → attachAlertToExistingCase │
│ ↳ miss → createCaseFromAlert │
│ + pickNextAnalyst() │
│ + writeCaseEvent×3 │
└────────────┬──────────────────────────┘
│ COMMIT
│ (publish-last)
▼
┌──────────────────────────┐
│ bank-aml EventBridge bus │
│ bank.aml.case_opened │ → MOD-064, MOD-074
└──────────────────────────┘
┌─────────────────────────────────────────────────┐
│ EventBridge schedule rate(15min) ENABLED │
│ → MOD-018 escalation-sweeper Lambda │
│ DLQ + CloudWatch alarm on Errors > 0 │ ← Minor 2 mandate
│ Wall-clock 4h check from aml_cases.created_at │
│ (no reset on reassignment / decline) │
└─────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────┐
│ HTTP API (10 routes) │
│ POST /cases/{id}/{assign,accept,decline,notes, │
│ escalate,close} │
│ GET /cases/{id}, /cases, /analysts │
│ PUT /analysts │
│ ↓ │
│ case-api Lambda │
│ ├─ FR-116 SAR-gate on close (NO_ACTION + risk≥threshold)│
│ ├─ aml.case_events (immutable) per action │
│ └─ on close: publishStaffAction → bank-platform bus │
│ (MOD-047 audit trail) │
└────────────────────────────────────────────────────────────┘
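The consumer branch in the diagram (dedup hit attaches, dedup miss creates, publish happens last) can be sketched in memory as follows. This is a minimal illustration, not the module's code: the arrays stand in for aml.aml_cases and aml.case_events, the three event types written on create are an assumption about what writeCaseEvent×3 covers, measuring the window from the case's created_at is an assumption, and the sketch publishes nothing on attach.

```typescript
// Sketch only: arrays stand in for aml.aml_cases and aml.case_events.
interface AlertRaised { alertId: string; partyId: string; raisedAt: Date; }
interface OpenCase { caseId: string; partyId: string; createdAt: Date; alertIds: string[]; }
interface CaseEvent { caseId: string; type: "CASE_OPENED" | "ALERT_ATTACHED" | "CASE_ASSIGNED"; }
type Publish = (detailType: string, caseId: string) => void;

const DEDUP_WINDOW_MS = 24 * 60 * 60 * 1000; // mirrors the CASE_DEDUP_WINDOW_HOURS default

// Assumption: the window is measured from the open case's created_at.
function findDedupCase(cases: OpenCase[], partyId: string, now: Date): OpenCase | undefined {
  return cases.find(
    (c) => c.partyId === partyId && now.getTime() - c.createdAt.getTime() <= DEDUP_WINDOW_MS,
  );
}

function consumeAlert(
  cases: OpenCase[],
  events: CaseEvent[],
  alert: AlertRaised,
  publish: Publish,
): string {
  const hit = findDedupCase(cases, alert.partyId, alert.raisedAt);
  if (hit) {
    // Dedup hit: attach the alert and record one event.
    hit.alertIds.push(alert.alertId);
    events.push({ caseId: hit.caseId, type: "ALERT_ATTACHED" });
    return hit.caseId;
  }
  // Dedup miss: create the case and write three events (assumed to be these three).
  const created: OpenCase = {
    caseId: `CASE-${cases.length + 1}`, // hypothetical ID scheme
    partyId: alert.partyId,
    createdAt: alert.raisedAt,
    alertIds: [alert.alertId],
  };
  cases.push(created);
  events.push(
    { caseId: created.caseId, type: "CASE_OPENED" },
    { caseId: created.caseId, type: "ALERT_ATTACHED" },
    { caseId: created.caseId, type: "CASE_ASSIGNED" },
  );
  // Publish-last: in a real handler this would run only after COMMIT succeeds.
  publish("bank.aml.case_opened", created.caseId);
  return created.caseId;
}
```

Keeping the publish call as the final step means a failed publish can never leave an event on the bus for a case that was rolled back.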
Path A confirmed (continued from MOD-016)¶
The orchestrator's MOD-016 Path A decision (local materialised cache for posting history) is reused here implicitly — MOD-018 reads aml.typology_matches, aml.aml_alerts, aml.rule_definitions directly from the same bank_aml Neon DB. No HTTP dependency on MOD-016. Per AP-010, same-domain Postgres reads are preferred over cross-module HTTP.
FR coverage¶
| FR | Where |
|---|---|
| FR-113 — alerts on same customer within 24h dedup into one case | alert-consumer.ts calls findDedupCase() (FOR UPDATE on the partial index idx_aml_cases_open_dedup) before deciding to create vs. attach. The dedup window is ENV.CASE_DEDUP_WINDOW_HOURS (default 24). |
| FR-114 — assign with workload balancing; escalate to supervisor if no acceptance within 4 business hours | pickNextAnalyst() round-robin via aml.analyst_pool.last_assigned_at with FOR UPDATE SKIP LOCKED. The 4h timer is wall-clock from aml_cases.created_at per the orchestrator's K2 decision (clock does NOT reset on reassignment). The escalation-sweeper Lambda runs every 15 min — sweeps unassigned cases past the timer, writes CASE_ESCALATED audit row, publishes bank.aml.case_escalated to MLRO. Sweeper has its own DLQ + CloudWatch alarm on Errors>0 (Minor 2). |
| FR-115 — every case action is an immutable event | writeCaseEvent() writes to aml.case_events (V001 BEFORE-row trigger rejects UPDATE/DELETE). One row per action type — CASE_OPENED / ALERT_ATTACHED / CASE_ASSIGNED / CASE_ACCEPTED / CASE_DECLINED / CASE_REASSIGNED / NOTE_ADDED / STATUS_CHANGED / CASE_ESCALATED / CASE_SUPERVISOR_APPROVED / CASE_CLOSED. Operator ID = actor_staff_id for staff actions; actor_kind = 'system' for the auto-create + auto-escalation paths. |
| FR-116 — mandatory supervisor approval before NO_ACTION close on high-risk case | enforceSarGate() on the close endpoint. Threshold from AppConfig (MOD-018-sar-threshold, default 70). Gate logic per K3: disposition == NO_ACTION AND case.max_alert_risk_score ≥ threshold requires approving_supervisor_id that is (a) different from case.assigned_to, and (b) present in aml.analyst_pool with is_supervisor = true AND active = true. Otherwise 403. |
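The FR-116 gate logic described in the table can be sketched as a pure function. This is an illustrative reading of the K3 rule, not the module's implementation: the type names, the two-value disposition set, and the result shape are assumptions made for the example.

```typescript
// Sketch of the FR-116 gate; types and result shape are hypothetical.
interface PoolEntry { staffId: string; isSupervisor: boolean; active: boolean; }
interface CaseForClose { assignedTo: string | null; maxAlertRiskScore: number; }
interface CloseRequest { disposition: "SAR" | "NO_ACTION"; approvingSupervisorId?: string; }
type GateResult = { ok: true } | { ok: false; status: 403; reason: string };

function deny(reason: string): GateResult {
  return { ok: false, status: 403, reason };
}

function enforceSarGate(
  c: CaseForClose,
  req: CloseRequest,
  pool: PoolEntry[],
  threshold = 70, // default from the MOD-018-sar-threshold profile
): GateResult {
  // The gate applies only to NO_ACTION closures at or above the threshold.
  if (req.disposition !== "NO_ACTION" || c.maxAlertRiskScore < threshold) {
    return { ok: true };
  }
  const approver = req.approvingSupervisorId;
  if (!approver) return deny("supervisor approval required");
  // (a) The approver must differ from the assigned analyst.
  if (approver === c.assignedTo) return deny("approver must differ from assignee");
  // (b) The approver must be an active supervisor in the analyst pool.
  const entry = pool.find((p) => p.staffId === approver);
  if (!entry || !entry.isSupervisor || !entry.active) {
    return deny("approver is not an active supervisor");
  }
  return { ok: true };
}
```

Because the function takes the threshold and pool as arguments, it can be unit-tested without AppConfig or a database.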
Policy coverage¶
AML-005 LOG — Transaction monitoring (alert disposition recorded)¶
Mechanism: Every alert that enters MOD-018 is attached to either a new or an existing case, with at least one aml.case_events row recording the linkage (CASE_OPENED + ALERT_ATTACHED on create, or just ALERT_ATTACHED on dedup). This makes silently discarding an alert structurally impossible: the consumer commits the case_events row in the same transaction as the case creation or attach.
LOG immutability test: tests/policy/pol-aml-005-log.test.ts — source-token scan + a structural check that every alert in the last 24h has case_id IS NOT NULL.
AML-006 LOG — SAR decision trail; tipping-off prohibition¶
Mechanism: Every SAR / NO_ACTION decision writes (a) STATUS_CHANGED, (b) CASE_CLOSED, and where applicable (c) CASE_SUPERVISOR_APPROVED rows to aml.case_events, plus a bank.aml.case_closed event for downstream consumers (MOD-019 etc.). The trail from triggering alert IDs (aml_alerts.case_id) → case → close is fully queryable.
Tipping-off: MOD-018 has no customer-facing API surface; it never publishes case state to channels customers receive. The source-scan tests grep for likely tipping-off strings as a regression guard.
LOG immutability test: tests/policy/pol-aml-006-log.test.ts — UPDATE/DELETE rejected on aml.case_events; tipping-off string scan.
GOV-006 LOG — Compliance function performance measurable¶
Mechanism: The closeCase handler emits staff.action_taken to the bank-platform bus per MOD-047's producer contract. MOD-047 lands an immutable row in audit.agent_actions with full operator identity, action type (WRITE_AML_CASE_CLOSED), party_id, and timestamps. Downstream reporting (volume / aging / disposition rate) is queryable via aml.aml_cases + aml.case_events.
LOG test: tests/policy/pol-gov-006-log.test.ts — source assertion that close calls publishStaffActionSafe with WRITE_AML_CASE_CLOSED; that the publisher targets the bank-platform bus / staff.action_taken detail-type.
Audit emission strategy¶
aml.case_events is the primary FR-115 audit ledger — every mutation writes one row in the same transaction. staff.action_taken to MOD-047 (bank-platform bus) is the cross-cut audit trail.
v1 scope: Close (the SAR / NO_ACTION decision path) emits staff.action_taken. Other case actions are persisted only in aml.case_events — not cross-emitted.
v2 follow-up: Cross-emit every case action. This requires (a) clean plumbing of correlation_id, jurisdiction, and party_id from the route context into each handler, and (b) resilience around the publish-last failure mode (the action is durable in case_events, but a missed staff.action_taken creates an audit gap that requires reconciliation). v1 keeps the implementation tight; v2 lands the cross-cut as a hardening pass.
Documented in the handoff as a known v1 limitation — does NOT affect FR-115 (case_events is the FR-115 evidence).
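One way to contain the publish-last failure mode on close is a wrapper that never throws and reports the miss for later reconciliation. The sketch below is a guess at what a "safe" publisher could look like; it borrows the name publishStaffActionSafe from the policy test description, but the signature, the StaffAction shape, and the onFailure hook are all hypothetical, and a real publisher would be asynchronous.

```typescript
// Hypothetical shapes; only the detail-type and action type come from the design.
interface StaffAction {
  actionType: "WRITE_AML_CASE_CLOSED";
  caseId: string;
  actorStaffId: string;
  correlationId: string;
}
type Publisher = (detailType: string, detail: StaffAction) => void;
type FailureHook = (action: StaffAction, error: unknown) => void;

// Returns true when the event was handed to the bus, false when the publish failed.
// The close itself is already durable in aml.case_events, so the caller does not roll back.
function publishStaffActionSafe(
  publish: Publisher,
  action: StaffAction,
  onFailure: FailureHook,
): boolean {
  try {
    publish("staff.action_taken", action);
    return true;
  } catch (error) {
    // Record the gap (for example, increment case_event_publish_failed_total).
    onFailure(action, error);
    return false;
  }
}
```

The boolean result lets the handler log the audit gap without turning a bus outage into a failed close.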
Database tables (aml schema additions — see handoff)¶
| Table | Owner | Read | Write | Mutability |
|---|---|---|---|---|
| aml.aml_cases | MOD-018 (creates) + MOD-019 (FK from regulatory_submissions) | yes | INSERT (open); UPDATE on lifecycle fields | mutable on lifecycle; created_at immutable |
| aml.case_events | MOD-018 | yes (case display) | INSERT-only | append-only trigger |
| aml.analyst_pool | MOD-018 | yes (balancer + SAR gate) | INSERT/UPDATE (admin endpoints) | mutable |
| aml.case_assignments | MOD-018 | yes (balancer history) | INSERT (assign); UPDATE (accept/decline/supersede markers) | mutable on those markers only |
aml.aml_alerts and aml.typology_matches (created by MOD-016) — read by MOD-018; UPDATE on aml_alerts.case_id is granted in MOD-016 V001.
aml.idempotency_keys (created by MOD-016) — used for sync HTTP idempotency.
The FK from aml.aml_alerts.case_id → aml.aml_cases.id is added in MOD-018 V001 as DEFERRABLE INITIALLY DEFERRED (MOD-016 had the column but no FK because aml_cases didn't yet exist).
DB-enforced invariants (ADR-048)¶
Authoritative register lives at SD03-aml-monitoring.md §DB-enforced invariants. MOD-018 owns the invariants listed below. The wiki SD03 register currently lists only MOD-016's invariants; the MOD-018-data-model-additions handoff requests that the entries below be added to it.
Immutability triggers (Category 1)¶
| Table | Trigger | Function |
|---|---|---|
| aml.case_events | trg_case_events_immutable | aml.fn_immutable_row() (SECURITY DEFINER, owned by bank_aml_migrate_user) |
V001 created the trigger as case_events_no_mutation calling MOD-016's legacy aml.reject_mutation(). V002 rebinds it to the canonical SECURITY DEFINER function and renames per the wiki convention.
CHECK constraints (Category 1)¶
| Table | Constraint | Definition |
|---|---|---|
| aml.aml_cases | chk_aml_cases_max_alert_risk_score | max_alert_risk_score >= 0 AND max_alert_risk_score <= 100 |
| aml.aml_cases | chk_aml_cases_closed_after_created | closed_at IS NULL OR closed_at >= created_at |
| aml.case_assignments | chk_case_assignments_accepted_after_assigned | accepted_at IS NULL OR accepted_at >= assigned_at |
| aml.case_assignments | chk_case_assignments_declined_after_assigned | declined_at IS NULL OR declined_at >= assigned_at |
| aml.case_assignments | chk_case_assignments_superseded_after_assigned | superseded_at IS NULL OR superseded_at >= assigned_at |
The V001 case_assignments_accept_xor_decline CHECK (a row cannot be both accepted and declined) is also Category 1, registered here for completeness.
Not DB-enforced (Category 3 — cross-service or config-driven)¶
| Rule | Reason | Owner |
|---|---|---|
| SAR-threshold gate (FR-116) | AppConfig-configurable threshold; supervisor-pool join needs runtime context | MOD-018 Lambda |
| Escalation timer (FR-114) | Wall-clock + AppConfig-configurable timer; depends on EventBridge schedule | MOD-018 Lambda |
| Case state-machine transitions | Codified in case-state-machine.ts; depends on disposition + supervisor approval state | MOD-018 Lambda |
| Workload balancer round-robin | Reads + writes analyst_pool.last_assigned_at with FOR UPDATE SKIP LOCKED; would be expensive as a trigger; semantically belongs in the assignment service | MOD-018 Lambda |
| Dedup window | AppConfig-tunable; FOR UPDATE on the partial dedup index gives the same race-safety as a trigger would | MOD-018 Lambda |
Negative tests¶
tests/integration/adr-048-invariants.test.ts per ADR-048 §5: every immutability trigger and CHECK constraint has a negative test that attempts the violation inside a transaction, asserts the expected exception, and rolls back.
SSM outputs¶
| Output | SSM path | Consumed by |
|---|---|---|
| API base URL | /bank/{env}/mod-018/api/base-url | MOD-074 (back-office), MOD-019 (case data for SAR submission) |
| alert-consumer Lambda ARN | /bank/{env}/mod-018/lambda/alert-consumer-arn | Operational tooling; replay |
| case-api Lambda ARN | /bank/{env}/mod-018/lambda/case-api-arn | MOD-074 direct invoke (deferred) |
| escalation-sweeper Lambda ARN | /bank/{env}/mod-018/lambda/escalation-sweeper-arn | Operational tooling; manual replay after Lambda outages |
| AppConfig application ID | /bank/{env}/mod-018/appconfig/application-id | Lambda env-var resolution; ops tooling |
| AppConfig environment ID | /bank/{env}/mod-018/appconfig/environment-id | Same |
| SAR threshold (mirror) | /bank/{env}/mod-018/sar-threshold | Ops visibility (per K3); AppConfig is authoritative at runtime |
SSM inputs¶
| SSM path | Source | Use |
|---|---|---|
| /bank/{env}/eventbridge/bank-aml/arn + /dlq-arn | MOD-104 | Subscribe to alert_raised; publish case_*; DLQ |
| /bank/{env}/eventbridge/bank-platform/arn | MOD-104 | Publish staff.action_taken to MOD-047 |
| /bank/{env}/iam/lambda/bank-aml/arn | MOD-104 | BankAmlRole (Lambda execution) |
| /bank/{env}/network/{vpc-id,private-subnet-ids} | MOD-104 | Lambda VPC for Neon |
| /bank/{env}/observability/adot-layer-arn | MOD-076 | OTel/X-Ray |
| /bank/{env}/sns/alerts/arn | MOD-104 | Alarm destinations |
| /bank/{env}/mod043/schema-registry/name | MOD-043 | Upload 3 case event JSON Schemas |
| /bank/{env}/kms/operational/arn | MOD-104 | Sweeper SQS DLQ encryption |
| /bank/{env}/neon/pooler-host + bank-neon/{env}/bank_aml/app_user | MOD-103 | Neon at runtime |
EventBridge events¶
Consumed¶
| Event | Source bus | Filter pattern |
|---|---|---|
| bank.aml.alert_raised | bank-aml | {"source":["bank.aml"],"detail-type":["alert_raised"]} |
Published¶
| Event | Bus | Schema |
|---|---|---|
| bank.aml.case_opened | bank-aml | schemas/bank.aml.case_opened.json |
| bank.aml.case_escalated | bank-aml | schemas/bank.aml.case_escalated.json |
| bank.aml.case_closed | bank-aml | schemas/bank.aml.case_closed.json |
| staff.action_taken (close only in v1) | bank-platform | per MOD-047 producer contract |
Module type¶
Application Lambda + IaC. Three Lambdas (alert-consumer, case-api, escalation-sweeper). AppConfig application + 1 profile. HTTP API Gateway with 10 routes. EventBridge consumer rule (own-bus). 15-min schedule (with DLQ). 5 CloudWatch alarms. 7 SSM outputs.
Key design decisions¶
Decision: single case-api Lambda for 10 routes (vs 10 Lambdas)¶
Choice: One Lambda, internal route dispatch via event.routeKey.
Reason: Cold-start budget; SSM/Secrets cache hit rates; lower IaC surface. The API surface is small (admin + analyst CRUD); a single Lambda is the simpler pattern. Per-route observability is preserved via the route dimension in EMF metrics.
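Internal dispatch on event.routeKey can be as small as a lookup table. The sketch below assumes the API Gateway HTTP API payload shape in which routeKey is a string such as "GET /cases/{id}"; the trimmed event type, the two sample routes, and the handler bodies are placeholders, not the module's actual routes or responses.

```typescript
// Trimmed, hypothetical view of the HTTP API event; only the fields used here.
interface HttpApiEvent {
  routeKey: string;
  pathParameters?: Record<string, string | undefined>;
  body?: string | null;
}
interface HttpResult { statusCode: number; body: string; }
type RouteHandler = (event: HttpApiEvent) => HttpResult;

// Placeholder handlers; the real ones would call into the case service.
const routes: Record<string, RouteHandler> = {
  "GET /cases/{id}": (e) => ({
    statusCode: 200,
    body: JSON.stringify({ caseId: e.pathParameters?.id ?? null }),
  }),
  "POST /cases/{id}/close": (e) => ({
    statusCode: 200,
    body: JSON.stringify({ closed: e.pathParameters?.id ?? null }),
  }),
};

function dispatch(event: HttpApiEvent): HttpResult {
  const handler = routes[event.routeKey];
  if (!handler) {
    // Unknown route key: fail closed rather than guessing a handler.
    return { statusCode: 404, body: JSON.stringify({ error: "route not found" }) };
  }
  return handler(event);
}
```

The route key doubles as the value for the route dimension in EMF metrics, which is how per-route observability survives the single-Lambda design.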
Decision: NOT cross-emit staff.action_taken for every action in v1¶
Choice: Only the close handler emits staff.action_taken to MOD-047. Other actions persist only in aml.case_events.
Reason: aml.case_events is FR-115's primary audit evidence. Cross-emission to MOD-047 is the GOV-006 cross-cut hardening. v1 keeps the cross-emit on the highest-stakes operation (SAR / NO_ACTION decisions) and defers the rest to a v2 hardening pass. Documented in the handoff.
Decision: Wall-clock 4h timer from aml_cases.created_at¶
Choice: Per K2. Single SQL predicate now() - created_at > timer_threshold. No business-hours math. AppConfig-tunable.
Reason: Operationally simpler. The wall-clock interpretation matches the orchestrator's resolution of FR-114's "business hours" wording.
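The same predicate can be expressed in application code, which is useful for unit-testing the sweeper's selection logic. This is a sketch under assumptions: the row shape and the acceptedAt field (standing in for "no acceptance yet") are hypothetical, and the function deliberately reads no assignment timestamps, so reassignment or decline cannot reset the clock.

```typescript
// Hypothetical projection of an aml.aml_cases row for the sweeper.
interface CaseTimerRow {
  caseId: string;
  createdAt: Date;
  acceptedAt: Date | null; // assumption: null means nobody has accepted the case yet
}

const HOUR_MS = 60 * 60 * 1000;

// Wall-clock check: now - created_at > threshold, with no business-hours math.
function isPastEscalationTimer(row: CaseTimerRow, now: Date, timerHours = 4): boolean {
  if (row.acceptedAt !== null) return false;
  return now.getTime() - row.createdAt.getTime() > timerHours * HOUR_MS;
}

function selectCasesToEscalate(rows: CaseTimerRow[], now: Date, timerHours = 4): string[] {
  return rows.filter((r) => isPastEscalationTimer(r, now, timerHours)).map((r) => r.caseId);
}
```

With a 15-minute schedule, a case can wait up to roughly 4h15m before the sweeper observes it as overdue.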
Decision: Round-robin via last_assigned_at not least-cases¶
Choice: Round-robin from aml.analyst_pool ordered by last_assigned_at NULLS FIRST.
Reason: Simpler than computing per-analyst open-case counts on every assignment; fairer in steady state. A "least-cases" balancer can land in v2 if observed distributions warrant it.
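The ordering rule (last_assigned_at NULLS FIRST, then stamp the chosen analyst) can be sketched in memory as follows. This is an illustration only: the Analyst shape is hypothetical, the tie-break on staffId is an assumption added for determinism, and the row locking that FOR UPDATE SKIP LOCKED provides in the database is not modelled.

```typescript
// Hypothetical projection of an aml.analyst_pool row.
interface Analyst {
  staffId: string;
  active: boolean;
  lastAssignedAt: Date | null;
}

// NULLS FIRST: analysts who have never been assigned sort before everyone else.
function sortKey(a: Analyst): number {
  return a.lastAssignedAt === null ? Number.NEGATIVE_INFINITY : a.lastAssignedAt.getTime();
}

function pickNextAnalyst(pool: Analyst[], now: Date): Analyst | undefined {
  const candidates = pool
    .filter((a) => a.active)
    .sort((a, b) => {
      const ka = sortKey(a);
      const kb = sortKey(b);
      if (ka < kb) return -1;
      if (ka > kb) return 1;
      // Assumed tie-break so the result is deterministic.
      return a.staffId < b.staffId ? -1 : a.staffId > b.staffId ? 1 : 0;
    });
  const picked = candidates[0];
  if (picked) {
    // Stamping the pick moves this analyst to the back of the rotation.
    picked.lastAssignedAt = now;
  }
  return picked;
}
```

Comparing keys with `<` and `>` rather than subtracting avoids a NaN comparator result when both analysts have never been assigned.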
Decision: aml_cases is mutable-only-on-status-fields, not fully event-sourced¶
Choice: Per Minor 1 / wiki data model: aml.aml_cases carries the current status as a mutable field; aml.case_events records every transition.
Reason: Matches the wiki schema; analyst UI reads current status from one row; investigators reconstruct history from the event log.
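The split between a mutable current-status field and an append-only history can be sketched as below. The status names are hypothetical (the design does not list them here), and the arrays stand in for aml.aml_cases and aml.case_events; in the module both writes would happen in one transaction.

```typescript
// Hypothetical status names, for illustration only.
type CaseStatus = "OPEN" | "ASSIGNED" | "ESCALATED" | "CLOSED";
interface CaseRow { caseId: string; status: CaseStatus; }
interface StatusChanged {
  caseId: string;
  type: "STATUS_CHANGED";
  from: CaseStatus;
  to: CaseStatus;
}

// Append the transition to the log, then update the single mutable field.
function changeStatus(row: CaseRow, log: StatusChanged[], to: CaseStatus): void {
  log.push({ caseId: row.caseId, type: "STATUS_CHANGED", from: row.status, to });
  row.status = to;
}

// Investigators can rebuild the status history from the log alone.
function replayStatus(log: StatusChanged[], caseId: string, initial: CaseStatus): CaseStatus {
  return log
    .filter((e) => e.caseId === caseId)
    .reduce<CaseStatus>((_current, e) => e.to, initial);
}
```

The UI reads row.status directly, while replayStatus shows that the event log carries enough information to reconstruct the same value.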
Test approach¶
| Tier | Files | Count | What it covers |
|---|---|---|---|
| Unit | tests/unit/ | 4 files | state machine (8) · SAR gate (6) · logger (2) · types (5) |
| Contract | tests/contract/ | 2 files | alert_raised consumer parity with MOD-016's JSON Schema · 3 published case events parity with their JSON Schemas |
| Integration | tests/integration/ | 5 files | one per FR + observability (FR-113 dedup, FR-114 escalation, FR-115 event log, FR-116 supervisor gate end-to-end) |
| Policy | tests/policy/ | 3 files | AML-005 LOG (source-token + alert→case parity) · AML-006 LOG (tipping-off scan + immutability) · GOV-006 LOG (close emits staff.action_taken) |
Run:
- pnpm typecheck — clean
- pnpm test:unit — 26 / 26 pass (unit + contract)
- pnpm vitest run tests/policy — 4 pass + 3 DB-gated skips
- RUN_INTEGRATION=1 STAGE=dev AWS_PROFILE=bank-dev NEON_APP_PASSWORD=… pnpm test:integration — needs deploy
Operational runbook¶
Deploy¶
No cross-bus IAM widening required — MOD-018 stays on the bank-aml bus. Deploy lands cleanly assuming MOD-104, MOD-103, MOD-016 are already deployed.
Then Flyway:
DIRECT_HOST=$(aws ssm get-parameter --name /bank/dev/neon/direct-host --query Parameter.Value --output text)
flyway -url="jdbc:postgresql://$DIRECT_HOST/bank_aml?sslmode=require" \
  -user=bank_aml_migrate_user -password="$NEON_MIGRATE_PASSWORD" \
  -locations=filesystem:db/migrations migrate
Add an analyst¶
curl -X PUT "$(aws ssm get-parameter --name /bank/dev/mod-018/api/base-url --query Parameter.Value --output text)/internal/v1/analysts" \
-H 'content-type: application/json' \
-d '{"staff_id":"STAFF-0042","display_name":"Alex Doe","email":"alex@example.test","is_supervisor":false,"active":true}'
Tune SAR threshold (FR-116)¶
AppConfig deployment via runbook (same pattern as MOD-016 rule config). Profile MOD-018-sar-threshold content shape: { "threshold": 70 }. Linear/5-min bake.
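A defensive parser for the profile content might look like the sketch below. It assumes the { "threshold": 70 } shape stated above and borrows the 0 to 100 bound from the chk_aml_cases_max_alert_risk_score constraint; falling back to the default on malformed input is an assumption for illustration, and a production handler might prefer to fail closed instead.

```typescript
const DEFAULT_SAR_THRESHOLD = 70;

// Parses the raw profile content; returns the default when the content is
// missing, malformed, or outside the 0..100 risk-score range.
function parseSarThreshold(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_SAR_THRESHOLD;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return DEFAULT_SAR_THRESHOLD;
  }
  if (typeof parsed !== "object" || parsed === null) return DEFAULT_SAR_THRESHOLD;
  const value = (parsed as { threshold?: unknown }).threshold;
  if (typeof value !== "number" || !Number.isFinite(value) || value < 0 || value > 100) {
    return DEFAULT_SAR_THRESHOLD;
  }
  return value;
}
```

Keeping the parse separate from the AppConfig fetch makes the validation rules testable without any AWS dependency.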
Replay a missed case event¶
aws lambda invoke \
  --function-name bank-aml-mod-018-escalation-sweeper-dev \
  --payload '{}' \
  --cli-binary-format raw-in-base64-out /tmp/out.json
Event types emitted in structured logs¶
Per src/lib/event-types.ts. 24 entries; full list in source. Mandatory fields per ADR-031: trace_id, correlation_id, module_id, jurisdiction, event_type, level, timestamp. Optional: case_id, staff_id, party_id.
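A guard for the mandatory fields can be sketched as a small check that a logger or a test could call. The field list comes from the sentence above; the function name and the choice to treat null and empty strings as missing are assumptions.

```typescript
// Mandatory structured-log fields as listed for ADR-031.
const MANDATORY_LOG_FIELDS = [
  "trace_id",
  "correlation_id",
  "module_id",
  "jurisdiction",
  "event_type",
  "level",
  "timestamp",
] as const;

// Returns the names of mandatory fields that are absent, null, or empty.
function missingMandatoryFields(entry: Record<string, unknown>): string[] {
  return MANDATORY_LOG_FIELDS.filter((field) => {
    const value = entry[field];
    return value === undefined || value === null || value === "";
  });
}
```

An empty result means the entry carries every mandatory field; optional fields such as case_id are not inspected.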
Custom metrics (EMF, namespace bank/modules)¶
| Metric | Unit | Dimensions |
|---|---|---|
| alert_consumer_duration_ms | Milliseconds | module_id, jurisdiction, environment, outcome |
| case_event_publish_failed_total | Count | module_id, jurisdiction, environment |
| auto_escalations_total | Count | module_id, jurisdiction, environment |
| escalation_sweep_duration_ms | Milliseconds | module_id, environment |
| escalation_sweep_completed_total | Count | module_id, environment |
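For reference, a single EMF record for one of these metrics could be built as in the sketch below. The envelope follows the CloudWatch Embedded Metric Format structure (an _aws block with Timestamp and CloudWatchMetrics, plus dimension and metric values as top-level members); the helper name and the sample dimension values are hypothetical.

```typescript
type EmfUnit = "Milliseconds" | "Count";

// Builds one EMF log line; writing it to stdout from a Lambda is what lets
// CloudWatch extract the metric.
function buildEmfRecord(
  name: string,
  unit: EmfUnit,
  value: number,
  dimensions: Record<string, string>,
  timestampMs: number,
): string {
  return JSON.stringify({
    _aws: {
      Timestamp: timestampMs,
      CloudWatchMetrics: [
        {
          Namespace: "bank/modules",
          // One dimension set containing every dimension key on the record.
          Dimensions: [Object.keys(dimensions)],
          Metrics: [{ Name: name, Unit: unit }],
        },
      ],
    },
    ...dimensions,
    [name]: value,
  });
}
```

For example, `buildEmfRecord("auto_escalations_total", "Count", 1, { module_id: "MOD-018", jurisdiction: "NZ", environment: "dev" }, Date.now())` yields a line carrying the three dimensions listed for that metric.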
Related artefacts¶
- Wiki spec: bank-wiki/source/entities/modules/MOD-018.{yaml,md}
- Handoffs: docs/handoffs/MOD-018-complete.handoff.md, docs/handoffs/MOD-018-data-model-additions.handoff.md
- ADRs in effect: ADR-001, ADR-019, ADR-025, ADR-029, ADR-030, ADR-031, ADR-033, ADR-042, ADR-043