
MOD-063 — Notification orchestration

Purpose

The decisioning layer between platform events and customer communications. Subscribes to EventBridge governance buses (MOD-043) and workflow transitions (MOD-062), applies preference + dedup + regulatory-timing rules, selects a delivery channel from {email, SMS, push}, dispatches via SES (V1) or SNS (deferred), and writes an immutable audit row for every state transition.

FR scope: FR-297, FR-298, FR-299, FR-300, NFR-005, NFR-019. Capabilities: CAP-103 (triggering), CAP-104 (channel selection), CAP-105 (audit trail).

Architecture

        bank-kyc bus               bank-platform bus
            │                            │
   kyc.verification_completed     workflow.transitioned
   kyc.verification_rejected      notification.test
            │                            │
            └────────────┬───────────────┘
                         ▼
                ┌─────────────────────┐
                │ EventBridge target  │  retry: 3 attempts, exp backoff
                │   → Lambda          │  on exhaust → DLQ
                └─────────┬───────────┘
                          ▼
            ┌────────────────────────────┐
            │ notification-handler λ     │
            │  1. ensureSchema (DDL)     │
            │  2. matchRule              │
            │  3. audit TRIGGERED        │
            │  4. dedup check            │
            │  5. read prefs             │
            │  6. selectChannels         │
            │  7. dispatch (SES email)   │
            │  8. audit DISPATCHED/FAILED│
            └─────────┬───────────┬──────┘
                      ▼           ▼
        notifications schema    SES (email)
        (audit_log,             SNS (sms/push — V1 stub)
         preferences,
         dedup_keys, templates)
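Steps 2 to 8 of the handler read as a straight-line pipeline. The sketch below is a minimal, in-memory illustration of that flow. Its assumptions:

- The event, rule, preference and dependency shapes (`PlatformEvent`, `Rule`, `Pref`, `Deps`) are hypothetical.
- The audit statuses `DEDUPED` and `SUPPRESSED` are invented for the sketch.
- Schema bootstrap (step 1) is omitted.

It is not the real handler's code.

```typescript
// Hypothetical shapes: the real handler's types are not documented on this page.
type Channel = "email" | "sms" | "push";

interface PlatformEvent {
  id: string;         // EventBridge event id
  detailType: string; // e.g. "kyc.verification_completed"
  customerId: string;
}

interface Rule {
  detailType: string;
  templateId: string;
}

interface Pref {
  channel: Channel;
  enabled: boolean;
  destination: string | null;
}

// DEDUPED and SUPPRESSED are hypothetical statuses added for this sketch.
type AuditStatus = "TRIGGERED" | "DEDUPED" | "SUPPRESSED" | "DISPATCHED" | "FAILED";

interface Deps {
  rules: Rule[];
  audit(eventId: string, status: AuditStatus, channel?: Channel): Promise<void>;
  claimDedupKey(key: string): Promise<boolean>; // false if the key was already claimed
  readPrefs(customerId: string): Promise<Pref[]>;
  send(channel: Channel, destination: string, templateId: string): Promise<void>;
}

async function handle(event: PlatformEvent, deps: Deps): Promise<void> {
  // Step 2: match a trigger rule; events with no rule are ignored.
  const rule = deps.rules.find((r) => r.detailType === event.detailType);
  if (!rule) return;

  // Step 3: record that the trigger fired.
  await deps.audit(event.id, "TRIGGERED");

  // Step 4: a redelivered event keeps its id, so the key suppresses double sends.
  if (!(await deps.claimDedupKey(`${event.id}:${rule.templateId}`))) {
    await deps.audit(event.id, "DEDUPED");
    return;
  }

  // Steps 5-6: read preferences and keep enabled channels that have a destination.
  const prefs = await deps.readPrefs(event.customerId);
  const targets = prefs.filter(
    (p): p is Pref & { destination: string } => p.enabled && p.destination !== null,
  );
  if (targets.length === 0) {
    await deps.audit(event.id, "SUPPRESSED");
    return;
  }

  // Steps 7-8: dispatch per channel and audit each outcome.
  for (const p of targets) {
    try {
      await deps.send(p.channel, p.destination, rule.templateId);
      await deps.audit(event.id, "DISPATCHED", p.channel);
    } catch {
      await deps.audit(event.id, "FAILED", p.channel);
    }
  }
}
```

In this sketch, failed sends are audited but not rethrown. A real handler might rethrow so that its retry policy fires. In that case the dedup key claimed in step 4 would need to be released, or scoped per attempt. Otherwise the retry would be deduplicated away.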

What MOD-063 owns

Resource | Purpose
notification-handler Lambda | Single Lambda that consumes events, dispatches, and writes audit rows
4× EventBridge rules (one per detail-type) | Each targets the Lambda with a 3-attempt retry policy
SQS DLQ + retry queue | The DLQ catches retry-exhausted events; the retry queue is a V1 standby (EventBridge performs retries directly today)
SES configuration set | bank-platform-notifications-{env}, with reputation tracking and sending enabled
notifications schema (Postgres) | templates, preferences, audit_log, dedup_keys
3× CloudWatch alarms | Error rate, DLQ depth, p99 latency
8× SSM downstream contract paths | Lambda ARN and name, audit table FQN, SES config set and sender address, DLQ ARN and URL, retry queue ARN
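Each of the four rules pairs an event pattern with a single Lambda target. The sketch below shows that per-rule wiring as plain objects. The field names follow the EventBridge `PutRule`/`PutTargets` request shapes (`EventPattern`, `RetryPolicy`, `DeadLetterConfig`). The following values are placeholders assumed for illustration:

- the `source` values and rule names;
- the bus names, taken from the diagram;
- the `MaximumEventAgeInSeconds` value.

The real stack may declare all of this through an IaC library instead.

```typescript
// Field names follow the EventBridge PutRule / PutTargets request shapes.
interface TargetSpec {
  Id: string;
  Arn: string;
  RetryPolicy: { MaximumRetryAttempts: number; MaximumEventAgeInSeconds: number };
  DeadLetterConfig: { Arn: string };
}

interface RuleSpec {
  Name: string;
  EventBusName: string;
  EventPattern: string; // JSON-encoded pattern
  target: TargetSpec;
}

// Hypothetical: bus names from the diagram; source values are placeholders.
const TRIGGERS = [
  { bus: "bank-kyc", source: "bank.kyc", detailType: "kyc.verification_completed" },
  { bus: "bank-kyc", source: "bank.kyc", detailType: "kyc.verification_rejected" },
  { bus: "bank-platform", source: "bank.workflow", detailType: "workflow.transitioned" },
  { bus: "bank-platform", source: "bank.notifications", detailType: "notification.test" },
];

function buildRules(env: string, lambdaArn: string, dlqArn: string): RuleSpec[] {
  return TRIGGERS.map(({ bus, source, detailType }) => ({
    // Hypothetical naming scheme, one rule per detail-type.
    Name: `mod063-${detailType.replace(/[._]/g, "-")}-${env}`,
    EventBusName: bus,
    EventPattern: JSON.stringify({ source: [source], "detail-type": [detailType] }),
    target: {
      Id: "notification-handler",
      Arn: lambdaArn,
      // 3 retries using EventBridge's built-in backoff; exhausted events go to the DLQ.
      RetryPolicy: { MaximumRetryAttempts: 3, MaximumEventAgeInSeconds: 3600 },
      DeadLetterConfig: { Arn: dlqArn },
    },
  }));
}
```

One point is worth confirming against the AWS documentation. EventBridge's retry policy covers failures to deliver the event to the target. A Lambda target is invoked asynchronously, so errors thrown inside the function fall under Lambda's own asynchronous-invocation retry settings.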

SSM contract

Read (upstream)

Path | Owner
/bank/{env}/network/vpc-id, /private-subnet-ids | MOD-104
/bank/{env}/kms/operational/arn | MOD-104
/bank/{env}/eventbridge/bank-kyc/arn | MOD-104
/bank/{env}/eventbridge/bank-platform/arn | MOD-104
/bank/{env}/sns/alerts/arn | MOD-104
/bank/{env}/neon/direct-host (etc.) | MOD-103
bank-neon/{env}/bank_platform/app_user (Secrets Manager) | MOD-103

Write

Path | Value
/bank/{env}/mod063/lambda-arn, /lambda-name | Function ARN and name
/bank/{env}/mod063/audit-log-table | notifications.audit_log (FQN for MOD-076 dashboards)
/bank/{env}/mod063/ses-config-set, /ses-sender-address | Config set and sender address, for downstream SES senders that want the same bounce handling
/bank/{env}/mod063/dlq-arn, /dlq-url | DLQ for ops drain tooling
/bank/{env}/mod063/retry-queue-arn | Standby retry queue
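The write side consists of eight parameters under one prefix, which matches the "8× SSM downstream contract paths" count above. The sketch below enumerates them for a given environment. The path suffixes come from the table; the `mod063Path` helper and its env-name check are hypothetical.

```typescript
// Path suffixes published by MOD-063, taken from the Write table.
const MOD063_WRITE_SUFFIXES = [
  "lambda-arn",
  "lambda-name",
  "audit-log-table",
  "ses-config-set",
  "ses-sender-address",
  "dlq-arn",
  "dlq-url",
  "retry-queue-arn",
] as const;

type Mod063Param = (typeof MOD063_WRITE_SUFFIXES)[number];

// Hypothetical helper: builds one contract path; the env format check is an assumption.
function mod063Path(env: string, param: Mod063Param): string {
  if (!/^[a-z0-9-]+$/.test(env)) throw new Error(`invalid env: ${env}`);
  return `/bank/${env}/mod063/${param}`;
}

function allMod063Paths(env: string): string[] {
  return MOD063_WRITE_SUFFIXES.map((p) => mod063Path(env, p));
}
```

A downstream consumer such as a MOD-076 dashboard would read `mod063Path(env, "audit-log-table")` with an SSM GetParameter call. Typing the parameter as a union means a misspelled suffix fails at compile time rather than at runtime.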

FR coverage

FR | Where
FR-297 (≤60s preferred channel) | EventBridge target retry policy + 30s Lambda timeout + SES send typically <1s; a first-attempt delivery fits comfortably within the 60s budget
FR-298 (preferences enforced; opt-out suppressed) | channel-selector.ts drops channels where enabled = false, opted_out_at IS NOT NULL, or the destination is missing
FR-299 (3× retry + fallback channel + log) | EventBridge retryPolicy.maximumRetryAttempts: 3 + DLQ + an audit FAILED row per retry
FR-300 (log every transition; 2yr retention) | notifications.audit_log is append-only (UPDATE/DELETE revoked); 2yr retention is operational, with the cleanup task deferred until audit volume makes it relevant
NFR-005 | Indirect: timely customer comms reduce human-agent load
NFR-019 | Lambda is stateless; events can be replayed from the EventBridge archive (MOD-043)
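The FR-298 filter and the CON-001 override (see Policy coverage) meet in the channel selector. The sketch below shows how such a selector could combine them. It is not the contents of channel-selector.ts. Its assumptions:

- `Preference` is a hypothetical shape mirroring the `enabled`, `opted_out_at` and `destination` columns.
- `overridePreferences` is a hypothetical flag for regulatory rules.
- `AVAILABLE_CHANNELS` contains only email, as in V1.

```typescript
type Channel = "email" | "sms" | "push";

// V1: only email is wired up; SMS and push are deferred.
const AVAILABLE_CHANNELS: readonly Channel[] = ["email"];

interface Preference {
  channel: Channel;
  enabled: boolean;
  optedOutAt: Date | null;   // mirrors opted_out_at
  destination: string | null;
}

interface RuleOptions {
  overridePreferences: boolean; // true for regulatory (CON-001) rules
}

function selectChannels(prefs: Preference[], rule: RuleOptions): Channel[] {
  const selected: Channel[] = [];
  for (const p of prefs) {
    if (!AVAILABLE_CHANNELS.includes(p.channel)) continue; // channel not deployed yet
    if (!p.destination) continue;                          // nowhere to send, even for regulatory rules
    // FR-298: honour opt-outs unless the rule overrides preferences (CON-001).
    if (!rule.overridePreferences && (!p.enabled || p.optedOutAt !== null)) continue;
    if (!selected.includes(p.channel)) selected.push(p.channel);
  }
  return selected;
}
```

The override skips only the preference checks. A regulatory notice still cannot go to a channel that is not deployed or has no destination on file.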

Policy coverage

Policy | Mode | How
CON-001 (regulatory disclosures sent at the correct time) | AUTO | override_preferences: true on regulatory rules → selectChannels ignores opt-outs and uses every available channel
GOV-003 (all comms logged with content, channel, timestamp, status) | LOG | One audit_log row per transition; UPDATE/DELETE revoked from app_user; __tests__/policy/gov-003-immutability.test.ts verifies the revocation
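The GOV-003 check comes down to one requirement: connected as app_user, a write against the audit table must fail with a privilege error. The sketch below is not the contents of gov-003-immutability.test.ts. Its assumptions:

- `QueryClient` and `isRejectedByPrivileges` are hypothetical names.
- Any client whose `query(text)` returns a promise is acceptable. node-postgres fits: it rejects with an error whose `code` is the SQLSTATE.
- `42501` is PostgreSQL's `insufficient_privilege` code.

```typescript
// The grant change that makes the table append-only for the application role.
const REVOKE_SQL = "REVOKE UPDATE, DELETE ON notifications.audit_log FROM app_user;";

// PostgreSQL SQLSTATE for insufficient_privilege.
const INSUFFICIENT_PRIVILEGE = "42501";

// Minimal client shape; a node-postgres Client can be passed in.
interface QueryClient {
  query(sql: string): Promise<unknown>;
}

// Resolves true only when the statement is rejected for lack of privilege.
async function isRejectedByPrivileges(client: QueryClient, sql: string): Promise<boolean> {
  try {
    await client.query(sql);
    return false; // the write was allowed: immutability is broken
  } catch (err) {
    const code = typeof err === "object" && err !== null ? (err as { code?: unknown }).code : undefined;
    return code === INSUFFICIENT_PRIVILEGE;
  }
}
```

A probe such as `DELETE FROM notifications.audit_log WHERE false` is enough. PostgreSQL checks table privileges before executing the statement, so the call fails even though it would touch no rows. This keeps the live test non-destructive if the revocation is ever missing.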

V1 deferrals

Feature | Why deferred | Path forward
SMS dispatch | Needs a verified origination identity per region (NZ and AU have different processes) | Add an SNS topic + SMS sending policy, implement sns-dispatcher.ts, and add "sms" to AVAILABLE_CHANNELS
Push dispatch | Needs FCM (Android) and APNs (iOS) credentials; the mobile app must be registered with the stores | SNS platform applications + secrets; implement sns-dispatcher.ts
EventBridge outbound events | No downstream consumer needs them yet | Publish notification.dispatched/failed/bounced events once MOD-076 dashboards or another consumer needs them
Audit cleanup task | Volume not yet relevant | Scheduled Lambda or pg_cron job running DELETE FROM notifications.audit_log WHERE occurred_at < now() - interval '2 years', under a role other than app_user (DELETE is revoked from it)
CAP-058 preference cutover (SD02) | SD02 not yet deployed | A single Flyway-style migration when SD02 lands: copy preferences from notifications.preferences to party.preferences, then switch getPreferences to read from party.preferences
EventBridge archive replay (NFR-019) | The archive is already provisioned by MOD-043; only the replay procedure is missing | Operator-driven replay procedure, to be scripted in a follow-up

Tests

  • 21 unit tests (channel selector, rules matcher, event-pattern invariants).
  • Policy: GOV-003 immutability; a live test that runs only when app_user exists.
  • Integration: 23/23 live checks pass in scripts/verify-deployment.mjs.
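The event-pattern invariants are the kind of check that keeps TRIGGER_RULES safe to append to. The sketch below shows two plausible invariants: each source/detail-type pair is unique, and every rule names a template. The `TriggerRule` shape and the invariant choice are assumptions; the real entries in src/config/event-patterns.ts may differ.

```typescript
// Hypothetical shape of a TRIGGER_RULES entry.
interface TriggerRule {
  source: string;
  detailType: string;
  templateId: string;
}

// Returns readable violations; an empty array means the table passes.
function checkTriggerRules(rules: readonly TriggerRule[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const r of rules) {
    const key = `${r.source}|${r.detailType}`;
    // Two rules on the same pattern would fan one event out to two templates.
    if (seen.has(key)) problems.push(`duplicate pattern ${key}`);
    seen.add(key);
    if (!r.templateId) problems.push(`missing templateId for ${key}`);
  }
  return problems;
}
```

Returning the full list of problems, instead of throwing on the first one, makes a failing unit test report every broken entry at once.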

Operational notes

  • Customer destinations. notifications.preferences.destination is per channel (email address, phone number, or push endpoint ARN). The operator UI / CAP-058 will own destination management; in V1, destinations are seeded when the schema is created on cold start and then updated manually.
  • SES sender domain. no-reply@bank-platform-{env}.test is a placeholder; .test is a reserved TLD, so a real sender address on a verified domain (DNS records) is required.
  • Adding a new trigger. Append an entry to TRIGGER_RULES in src/config/event-patterns.ts and redeploy.
  • DLQ drain. When the mod063/dlq-depth-{env} alarm fires, ops reads messages from the DLQ with aws sqs receive-message and decides per message whether to re-trigger via notification.test, log it as known-bad, or escalate. receive-message does not remove a message, so delete each one with aws sqs delete-message once it has been handled.
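For the re-trigger branch of a DLQ drain, the operator turns a dead-lettered message back into a notification.test event. The sketch below makes that conversion. Its assumptions:

- The SQS message body is the original EventBridge event JSON.
- The notification.test payload shape (`replayOf`, `originalDetailType`, `detail`) is hypothetical, because this page does not document what the handler expects for that detail-type.
- The bus name and `source` are supplied by the caller.

The resulting entry uses the field names of an EventBridge PutEvents entry. It could be sent with `aws events put-events` or the SDK.

```typescript
// Subset of an EventBridge event as it appears in the DLQ message body.
interface DeadLetteredEvent {
  id: string;
  source: string;
  "detail-type": string;
  detail: Record<string, unknown>;
}

// One entry of an EventBridge PutEvents request.
interface PutEventsEntry {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // JSON string
}

// Hypothetical helper: wraps the original event in a notification.test payload.
function buildRetriggerEntry(sqsBody: string, busName: string, source: string): PutEventsEntry {
  const original = JSON.parse(sqsBody) as Partial<DeadLetteredEvent>;
  if (typeof original["detail-type"] !== "string" || typeof original.detail !== "object" || original.detail === null) {
    throw new Error("DLQ body is not an EventBridge event");
  }
  return {
    EventBusName: busName,
    Source: source,
    DetailType: "notification.test",
    // Hypothetical payload: keep the original id and detail so the replay is traceable in audit_log.
    Detail: JSON.stringify({
      replayOf: original.id,
      originalDetailType: original["detail-type"],
      detail: original.detail,
    }),
  };
}
```

Delete the SQS message only after PutEvents reports success for the entry. A failed re-publish then leaves the message in the DLQ instead of losing it.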