Risk management platform

ID: MOD-150
System: SD06
Repo: bank-risk-platform
Build status: Not started
Deployed: No

Purpose

The Risk Management Platform is the automated risk intelligence and registration layer for the platform. It ingests events from every system domain, classifies them against the risk taxonomy (D01–D11), maintains the operational risk register without manual entry, monitors all critical third-party relationships, runs the Risk Appetite Framework (RAF) dashboard, maintains the model inventory, and triggers regulatory breach notifications automatically.

Every risk event that requires human judgement is elevated to the Risk Case Console (MOD-151) as a case. Everything else is logged and closed without human involvement. The operating principle is that a risk manager's time is for assessing exceptions, not recording events.

Operational risk register

All events ingested from MOD-076 (observability alerts), MOD-048 (system decision log), MOD-047 (agent action log), AWS CloudTrail (IAM and API events), and CI/CD pipeline webhooks are classified against the risk taxonomy and written as entries to risk.operational_risk_events. A rules engine performs classification: error type, source module, affected domain, and severity determine the risk category. Entries record: event_id, source_module, risk_domain (D01–D11), event_type, severity (P1/P2/P3), description, occurred_at, auto_resolved (bool), resolution_timestamp, and related_incident_id.
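The classification step can be sketched as a first-match-wins rule table. This is a minimal illustration, not the module's actual rules engine: the rule contents, the catch-all domain, and the helper names `classify` and `to_register_entry` are assumptions. Conditions may key on any of the inputs named above (error type, source module, affected domain, severity).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
import uuid

# Hypothetical rules: each pairs match conditions with a risk domain.
# The first rule whose conditions all match wins.
RULES = [
    ({"source_module": "MOD-076", "error_type": "latency_sla_breach"}, "D03"),
    ({"source_module": "CloudTrail", "severity": "P1"}, "D07"),
]
DEFAULT_DOMAIN = "D11"  # assumed catch-all domain


@dataclass(frozen=True)  # immutable, in keeping with an append-only register
class RiskEvent:
    event_id: str
    source_module: str
    risk_domain: str          # "D01" .. "D11"
    event_type: str
    severity: str             # "P1" / "P2" / "P3"
    description: str
    occurred_at: datetime
    auto_resolved: bool = False
    resolution_timestamp: Optional[datetime] = None
    related_incident_id: Optional[str] = None


def classify(raw: dict) -> str:
    """Return the risk domain for a raw event using first-match-wins rules."""
    for conditions, risk_domain in RULES:
        if all(raw.get(key) == value for key, value in conditions.items()):
            return risk_domain
    return DEFAULT_DOMAIN


def to_register_entry(raw: dict) -> RiskEvent:
    """Build an immutable register entry from a raw ingested event."""
    return RiskEvent(
        event_id=str(uuid.uuid4()),
        source_module=raw["source_module"],
        risk_domain=classify(raw),
        event_type=raw["error_type"],
        severity=raw["severity"],
        description=raw.get("description", ""),
        occurred_at=raw.get("occurred_at", datetime.now(timezone.utc)),
    )
```

Keeping the rules as data rather than branching code means a taxonomy change is a table edit that can be reviewed and versioned like any other configuration.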

The register is append-only. No event is ever deleted or modified.

RCSA (Risk and Control Self-Assessment) metrics are derived automatically from control test pass/fail rates in CI pipelines. A control that repeatedly fails testing auto-generates a risk register entry under OPS-004, creating a direct link between engineering quality signals and the formal risk register — without any manual intervention.
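One way to turn CI control test results into an RCSA signal is sketched below. The text does not quantify "repeatedly", so the threshold of three consecutive failures is an assumption, as are the function names.

```python
from typing import Optional, Sequence

# Assumed value: the source does not say how many failures count as "repeated".
FAILURE_STREAK_THRESHOLD = 3


def pass_rate(results: Sequence[bool]) -> Optional[float]:
    """Fraction of control test runs that passed, or None if there are no runs."""
    if not results:
        return None
    return sum(results) / len(results)


def trailing_failures(results: Sequence[bool]) -> int:
    """Count consecutive failures at the end of the run history (oldest first)."""
    streak = 0
    for passed in reversed(results):
        if passed:
            break
        streak += 1
    return streak


def needs_risk_entry(results: Sequence[bool],
                     threshold: int = FAILURE_STREAK_THRESHOLD) -> bool:
    """True when the control has failed `threshold` or more runs in a row."""
    return trailing_failures(results) >= threshold
```

For example, a history of `[True, False, False, False]` has a pass rate of 0.25 and a trailing streak of three, so under the assumed threshold it would raise an OPS-004 register entry.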

Risk Appetite Framework dashboard

The dashboard aggregates all quantitative RAF indicators from SD06 into a single, continuously updated view. Indicators include:

  • CET1 ratio (MOD-033)
  • LCR and NSFR (MOD-032)
  • IRRBB EVE sensitivity (MOD-035)
  • Stress test capital adequacy (MOD-034)
  • Related party exposure % (MOD-147)
  • High-risk customer concentration (MOD-039)
  • Operational loss trend (from the risk event register)

Each indicator has a configured RAF threshold. A breach triggers an automatic alert to the CRO, CFO, and Board Risk Committee chair. The board risk report is auto-generated on the board reporting calendar cadence, pulling live values and 90-day trends for all indicators. No spreadsheet is involved in its production.
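A threshold check of this kind might look like the sketch below. The indicator values and thresholds are invented for illustration, and the `breach_when` direction flag is an assumption: some indicators breach when they fall below a floor (CET1, LCR) and others when they rise above a cap (related party exposure).

```python
from dataclasses import dataclass
from typing import List

ALERT_RECIPIENTS = ("CRO", "CFO", "Board Risk Committee chair")


@dataclass(frozen=True)
class Indicator:
    name: str
    value: float
    threshold: float
    breach_when: str  # "below" or "above" (hypothetical configuration field)

    def breached(self) -> bool:
        if self.breach_when == "below":
            return self.value < self.threshold
        if self.breach_when == "above":
            return self.value > self.threshold
        raise ValueError(f"unknown breach direction: {self.breach_when!r}")


def breach_alerts(indicators: List[Indicator]) -> List[dict]:
    """Return one alert payload per breached indicator."""
    return [
        {
            "indicator": ind.name,
            "value": ind.value,
            "threshold": ind.threshold,
            "recipients": ALERT_RECIPIENTS,
        }
        for ind in indicators
        if ind.breached()
    ]


# Illustrative values only; real thresholds come from the configured RAF.
sample = [
    Indicator("CET1 ratio", 0.105, 0.110, "below"),
    Indicator("LCR", 1.35, 1.10, "below"),
    Indicator("Related party exposure %", 0.12, 0.10, "above"),
]
alerts = breach_alerts(sample)
```

With these sample values, the CET1 ratio and related party exposure indicators breach and the LCR does not.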

Model inventory and lifecycle management

Every model deployment event from the CI/CD pipeline — model ID, version, training data lineage hash, feature set version, and validation report reference — is written to risk.model_inventory. The inventory records: model_id, owner_module, model_type (ML / statistical / rules), deployment_status (development / awaiting_validation / production / retired), deployed_at, last_validated_at, next_review_at, performance_thresholds (JSONB), current_psi, current_accuracy, current_recall, champion_id (nullable), challenger_id (nullable).

Nightly jobs run PSI and accuracy computations for all production models against the latest population. A threshold breach auto-creates an incident and flags the model for priority review. A model that has not been validated within its review SLA is flagged and its owner notified.
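For reference, the Population Stability Index over pre-binned proportions is the sum over bins of (actual − expected) × ln(actual / expected). The sketch below computes it and compares the stored metrics with `performance_thresholds`. The keys `max_psi` and `min_accuracy` are hypothetical, since the JSONB schema is not given here, and the small `eps` floor for empty bins is a common convention rather than something the text specifies.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index over matching bins of proportions."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # floor empty bins so the logarithm is defined
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def threshold_breaches(model: dict) -> List[str]:
    """Names of the monitored metrics that breach the model's thresholds."""
    limits = model["performance_thresholds"]  # hypothetical keys below
    breaches = []
    if model["current_psi"] > limits["max_psi"]:
        breaches.append("psi")
    if model["current_accuracy"] < limits["min_accuracy"]:
        breaches.append("accuracy")
    return breaches


baseline = [0.25, 0.25, 0.25, 0.25]
latest = [0.10, 0.20, 0.30, 0.40]
drift = psi(baseline, latest)
```

An identical distribution yields a PSI of exactly zero, and the value grows as the latest population moves away from the baseline.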

The validation gate is hard: no model can be promoted to production status in the inventory without a validation report reference attached. The CI/CD hook checks this constraint before deployment proceeds. Model promotion is not a UI action; it is the outcome of a completed validation case in MOD-151.
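The gate can be sketched as a function that refuses promotion unless a validation report reference is present. The field name `validation_report_ref` and the exception type are assumptions. The inventory columns listed above do not name the column that holds the reference.

```python
class ValidationGateError(Exception):
    """Raised when a model fails the pre-production validation gate."""


def promote_to_production(record: dict) -> dict:
    """Return a copy of the inventory record promoted to production.

    Raises ValidationGateError if the gate conditions are not met.
    """
    # Hypothetical field name for the validation report reference.
    if not record.get("validation_report_ref"):
        raise ValidationGateError(
            f"model {record['model_id']} has no validation report reference"
        )
    if record["deployment_status"] != "awaiting_validation":
        raise ValidationGateError(
            f"model {record['model_id']} is not awaiting validation"
        )
    # Return a new dict so the caller's record is left untouched.
    return {**record, "deployment_status": "production"}
```

Raising an exception rather than returning a flag means a CI/CD hook that calls this check fails the pipeline step by default if the gate is not satisfied.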

Third-party health monitoring

All designated critical third-party services are registered in risk.critical_service_providers with: provider_name, service_type, dependency_modules (list), sla_uptime_pct, sla_latency_ms, health_check_endpoint, contractual_review_date, and tier (critical / important / standard).

Health checks run every 60 seconds. Neon database connection latency, Snowflake query latency, AWS service health, BPAY API availability, NPP connectivity, and eIDV provider response times are all monitored. When a check fails or an SLA metric is breached, an incident is auto-created with the provider name, metric, breach value, and a dependency graph showing which platform modules are affected. Contractual review dates approaching within 90 days auto-generate a reminder case in MOD-151.
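The evaluation of a single check result, and the 90-day review reminder, can be sketched as follows. This omits the network probe itself, and the dependency graph is simplified to the flat `dependency_modules` list. The function names and the incident payload shape are assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Provider:
    provider_name: str
    dependency_modules: List[str]
    sla_latency_ms: float
    contractual_review_date: date


def evaluate_check(provider: Provider, reachable: bool,
                   latency_ms: float) -> Optional[dict]:
    """Return an incident payload if the check failed or breached SLA, else None."""
    if not reachable:
        metric, value = "availability", 0.0
    elif latency_ms > provider.sla_latency_ms:
        metric, value = "latency_ms", latency_ms
    else:
        return None
    return {
        "provider_name": provider.provider_name,
        "metric": metric,
        "breach_value": value,
        "affected_modules": list(provider.dependency_modules),
    }


def review_reminder_due(provider: Provider, today: date,
                        window_days: int = 90) -> bool:
    """True when the contractual review date falls within the next `window_days`."""
    return today <= provider.contractual_review_date <= today + timedelta(days=window_days)


# Illustrative provider record; the SLA figure and review date are invented.
neon = Provider("Neon", ["MOD-042"], sla_latency_ms=50.0,
                contractual_review_date=date(2026, 8, 1))
```

Calling `evaluate_check(neon, reachable=True, latency_ms=80.0)` would yield a latency incident listing MOD-042 as affected, while a 20 ms response returns `None`.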

Intraday liquidity monitoring

Payment events from MOD-020 are aggregated in real time to produce a running intraday liquidity position. The position tracks: gross inflows (incoming payments received), gross outflows (outgoing payments settled), net intraday position, peak intraday exposure to each payment system (NPP, BPAY, NZ Faster Payments), and available intraday credit headroom. Positions are stored in Snowflake Dynamic Tables with a 5-minute refresh.

When intraday exposure exceeds the configured limit, an alert is sent to Treasury and the CRO. End-of-day positions feed into the MOD-032 LCR calculation as the final liquidity position for the day, completing the loop between intraday tracking and end-of-day regulatory reporting.
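The position arithmetic can be illustrated in plain Python, although the production calculation runs in Snowflake. Two definitions here are assumptions, because the text does not spell them out: exposure to a payment system is taken as the running net outflow on that system, and credit headroom is the credit limit minus any net outflow for the day. Amounts are integers in minor units to avoid floating-point error.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Event = Tuple[str, str, int]  # (payment_system, "in" | "out", amount in minor units)


def intraday_position(events: Iterable[Event], credit_limit: int) -> dict:
    """Aggregate time-ordered payment events into an intraday position."""
    inflows = 0
    outflows = 0
    running = defaultdict(int)  # per-system net outflow so far
    peak = defaultdict(int)     # per-system maximum of the running net outflow
    for system, direction, amount in events:
        if direction == "in":
            inflows += amount
            running[system] -= amount
        elif direction == "out":
            outflows += amount
            running[system] += amount
        else:
            raise ValueError(f"unknown direction: {direction!r}")
        peak[system] = max(peak[system], running[system])
    net = inflows - outflows
    return {
        "gross_inflows": inflows,
        "gross_outflows": outflows,
        "net_position": net,
        "peak_exposure": dict(peak),
        "credit_headroom": credit_limit - max(0, -net),
    }


def breached_systems(position: dict, exposure_limit: int) -> List[str]:
    """Payment systems whose peak exposure exceeds the configured limit."""
    return sorted(s for s, p in position["peak_exposure"].items() if p > exposure_limit)


events = [("NPP", "out", 500), ("NPP", "in", 200), ("BPAY", "out", 100), ("NPP", "out", 400)]
position = intraday_position(events, credit_limit=1000)
```

In this example the NPP running net outflow goes 500, 300, 700, so its peak exposure is 700. With an exposure limit of 600, only NPP would trigger the alert to Treasury and the CRO.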

Incident and breach auto-creation

When MOD-076 fires a P1 or P2 alert, this module auto-creates an incident record in risk.incidents: incident_id, severity (P1/P2/P3), alert_source, alert_code, description, created_at, sla_resolve_by, status, regulatory_notification_required (bool), regulatory_notification_sent_at (nullable).

P1 incidents with regulatory_notification_required = true auto-assemble a notification document citing the incident, the affected service, the estimated customer impact, and the current resolution status. Where a regulator API exists — RBNZ incident notification portal, APRA breach notification API — the notification is submitted automatically within the required window. Where no API exists, a draft is staged in MOD-151 for human review before submission.
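The routing decision described above reduces to a small function. The route labels and the payload field names `affected_service` and `estimated_customer_impact` are assumptions made for illustration. The actual regulator submission formats are not described in this document.

```python
def assemble_notification(incident: dict, affected_service: str,
                          estimated_customer_impact: str) -> dict:
    """Assemble the notification document from the incident record."""
    return {
        "incident_id": incident["incident_id"],
        "affected_service": affected_service,
        "estimated_customer_impact": estimated_customer_impact,
        "resolution_status": incident["status"],
    }


def notification_route(incident: dict, regulator_has_api: bool) -> str:
    """Decide how a notification is handled for a given incident.

    Returns "none", "auto_submit", or "stage_for_review" (hypothetical labels).
    """
    if incident["severity"] != "P1":
        return "none"
    if not incident["regulatory_notification_required"]:
        return "none"
    # An API submission is automatic; otherwise a draft goes to MOD-151.
    return "auto_submit" if regulator_has_api else "stage_for_review"


incident = {
    "incident_id": "INC-0001",  # illustrative identifier
    "severity": "P1",
    "status": "investigating",
    "regulatory_notification_required": True,
}
```

Here `notification_route(incident, regulator_has_api=False)` returns `"stage_for_review"`, which corresponds to the draft staged in MOD-151 for human review before submission.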

Change management feed

Every CI/CD deployment event — success, failure, rollback — creates a change record in risk.change_records: change_id, environment, module_id, artefact_hash, deployed_by (CI pipeline identity), deployed_at, outcome, rollback_of (nullable), post_impl_review_required (bool), post_impl_review_due_at (nullable). Post-implementation review is auto-required for any deployment that resulted in a rollback, or where a P1 incident occurred within 72 hours of deployment. Review cases are created in MOD-151.
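The review rule can be expressed directly. This sketch assumes that "within 72 hours of deployment" means the 72 hours following the deployment timestamp, and that `outcome` takes the values "success", "failure", or "rollback". The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable

REVIEW_WINDOW = timedelta(hours=72)


def post_impl_review_required(outcome: str, deployed_at: datetime,
                              p1_incident_times: Iterable[datetime]) -> bool:
    """True if the deployment rolled back or a P1 incident followed within 72 hours."""
    if outcome == "rollback":
        return True
    window_end = deployed_at + REVIEW_WINDOW
    return any(deployed_at <= t <= window_end for t in p1_incident_times)


deployed = datetime(2026, 5, 1, 12, 0)
```

A successful deployment followed by a P1 incident 71 hours later requires a review, while the same incident 73 hours later does not.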

Compliance rationale

RBNZ Operational Resilience Standard and APRA CPS 230 require that operational risk events are identified, classified, and managed within defined timeframes. OPS-003 through OPS-007 encode these obligations. Automation is not a convenience here — regulators assess the timeliness and completeness of risk event capture. A manual register populated by a human reviewing alert emails is inherently incomplete: it misses events that occur outside business hours, events that occur simultaneously, and events that are never reviewed because the reviewer is handling a higher-priority incident. This module makes completeness structurally guaranteed.

APRA CPS 220 and RBNZ technology risk guidance (DT-003, DT-008) require ongoing monitoring of technology risks and third-party providers. Continuous automated health checking against configured SLA thresholds is the only approach that satisfies the "ongoing" requirement at a digital bank's operational tempo.

Model risk under APRA CPS 220 (DT-005) requires a documented model inventory and validation process. Deriving the inventory automatically from CI/CD events means it is always current — a manually maintained spreadsheet will always lag deployments.

GOV-002 (RAF requirements) under both RBNZ and APRA frameworks requires the board to have visibility of risk appetite indicators on a regular basis. Auto-generating the board report from live data eliminates the lag and manual error inherent in compiled spreadsheet packs.

REP-009 encodes mandatory breach notification timelines: RBNZ requires notification within 24 hours of a material incident; APRA requires notification of a material information security incident within 72 hours under CPS 234. PRI-002 encodes the NZ Privacy Act 2020 mandatory breach reporting obligation and the AU Notifiable Data Breaches (NDB) scheme. The assembly and submission workflow means the clock stops ticking on notification compliance when the automated submission lands, not when a compliance officer finishes drafting an email.

Commercial rationale

The cost of a late or incomplete breach notification — regulatory censure, public disclosure, reputational damage — far exceeds the cost of automating the process. A risk manager spending their time manually entering events into a register is not doing risk management; they are doing data entry. This module eliminates data entry entirely. The risk function's attention is reserved for the cases that require it.

A continuously computed RAF dashboard also eliminates the quarterly board pack compilation cycle — a process that typically consumes two to three person-weeks of finance and risk staff time per quarter, and produces a snapshot that is already weeks old by the time it reaches the board.


Module dependencies

Depends on

Module | Title | Required? | Contract | Reason
MOD-076 | Observability platform | Required | | Source of all operational alerting that feeds incident auto-creation and the operational risk register.
MOD-048 | System decision log | Required | | All risk register entries, RAF alerts, and incident records are written as immutable system decision log entries.
MOD-047 | Agent action logger | Required | | Agent action log entries feed into the operational risk register for conduct risk monitoring.
MOD-032 | LCR / NSFR calculator | Required | | LCR/NSFR figures are a primary input to the RAF dashboard and the intraday liquidity calculation.
MOD-033 | RWA & capital ratio engine | Required | | The Tier 1 capital ratio is a core RAF metric and the denominator for related party exposure limits.
MOD-034 | Stress testing scenario engine | Required | | Stress test outputs are surfaced in the RAF dashboard and the board risk report.
MOD-035 | IRRBB / EVE / NII model | Required | | IRRBB/EVE metrics are a core RAF component and trigger ALCO escalation when limits are breached.
MOD-039 | Customer risk score model | Required | | Customer risk score distribution is a risk appetite indicator tracking concentration of high-risk customers.
MOD-147 | Related party exposure monitor | Required | | Related party exposure percentage is a regulated RAF metric and must flow into the dashboard continuously.
MOD-042 | CDC pipeline - Neon logical replication to S3 Iceberg | Required | | The CDC pipeline delivers balance sheet and operational data to Snowflake for RAF calculations.
MOD-020 | Pre-payment validation suite | Required | | The payment event stream is the real-time data source for the intraday liquidity position calculation.
MOD-102 | Snowflake account configuration & governance | Required | | Snowflake compute layer where all RAF calculations and risk register aggregation run.
MOD-104 | AWS shared infrastructure bootstrap | Required | | AWS shared infrastructure is required before this module can be deployed.

Required by

Module | Title | As | Contract
MOD-151 | Risk case console | Hard dependency |

Policies satisfied

Policy | Title | Mode | How
OPS-003 | Incident Management Policy | AUTO | Incidents are auto-created from observability alerts with P1/P2/P3 classification, SLA timers, and routing; no manual incident registration is required.
OPS-004 | Operational Risk Policy | AUTO | Risk events from all system domains are auto-classified against the risk taxonomy and written to the operational risk register continuously, with no manual entry.
OPS-005 | Third-Party & Critical Service Provider Policy | AUTO | All designated critical third parties (Neon, Snowflake, AWS, BPAY, NPP, eIDV providers, card bureau) are continuously health-monitored; SLA breach auto-creates an incident.
OPS-006 | Change Management Policy | LOG | CI/CD pipeline deployment events auto-create change records with timestamp, artefact hash, environment, and outcome; post-implementation review is auto-scheduled for P1 changes.
OPS-007 | Financial Processing Resilience & Idempotency Policy | LOG | Idempotency key collision rates, reprocessing events, and settlement reconciliation outcomes are continuously tracked and logged against this policy.
DT-003 | Technology Risk Management Policy | AUTO | Technology risk events (unpatched CVEs from SAST, latency SLA breaches, infrastructure anomalies) are auto-classified and written to the risk register.
DT-005 | Model Risk Management Policy | LOG | Model inventory is auto-maintained from CI/CD deployment events; scheduled PSI and accuracy monitoring runs nightly; model validation gate is enforced before production promotion.
DT-008 | Third-Party & Outsourcing Risk Policy | AUTO | All designated critical third-party services are continuously monitored for health and SLA compliance; contract expiry dates trigger review reminders.
GOV-002 | Risk Appetite Statement Policy | CALC | The RAF dashboard is continuously computed from SD06 outputs; RAF threshold breach auto-alerts the CRO and Board Risk Committee chair.
REP-009 | Regulatory incident & breach notification | AUTO | Material incidents and privacy breaches are auto-detected and routed through a notification assembly workflow; regulator API submission proceeds where an API is available.
PRI-002 | Data Breach Response Policy | AUTO | Security anomalies (CloudTrail access failures, Cognito brute-force patterns, Secrets Manager anomalies) are auto-classified as potential breaches and the notification timer starts automatically.
CLQ-002 | Liquidity Risk Management Policy | CALC | Intraday payment system exposure is computed in real time from the payment event stream, extending MOD-032's end-of-day LCR calculation to cover intraday exposure.

Capabilities satisfied

(No capabilities mapped)


Part of SD06: Snowflake Analytics & Risk Platform
Compiled 2026-05-22 from source/entities/modules/MOD-150.yaml