Risk management platform


ID	`MOD-150`
System	SD06
Repo	`bank-risk-platform`
Build status	Not started
Deployed	No

Purpose

The Risk Management Platform is the automated risk intelligence and registration layer for the platform. It ingests events from every system domain, classifies them against the risk taxonomy (D01–D11), maintains the operational risk register without manual entry, monitors all critical third-party relationships, runs the Risk Appetite Framework (RAF) dashboard, maintains the model inventory, and triggers regulatory breach notifications automatically.

Every risk event that requires human judgement is elevated to the Risk Case Console (MOD-151) as a case. Everything else is logged and closed without human involvement. The operating principle is that a risk manager's time is for assessing exceptions, not recording events.

Operational risk register

All events ingested from MOD-076 (observability alerts), MOD-048 (system decision log), MOD-047 (agent action log), AWS CloudTrail (IAM and API events), and CI/CD pipeline webhooks are classified against the risk taxonomy and written as entries to risk.operational_risk_events. A rules engine performs classification: error type, source module, affected domain, and severity determine the risk category. Entries record: event_id, source_module, risk_domain (D01–D11), event_type, severity (P1/P2/P3), description, occurred_at, auto_resolved (bool), resolution_timestamp, and related_incident_id.
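The classification step described above can be sketched as an ordered rule table where the first matching rule decides the risk domain. This is a minimal illustration only: the `Event` fields, the example values such as `"timeout"` and `"payments"`, and the domain assignments in `RULES` are hypothetical, since the source does not define the actual rules.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Event:
    source_module: str    # e.g. "MOD-076"
    error_type: str       # hypothetical value set, e.g. "timeout"
    affected_domain: str  # hypothetical value set, e.g. "payments"
    severity: str         # "P1", "P2" or "P3"


@dataclass(frozen=True)
class Rule:
    risk_domain: str                  # one of D01..D11
    matches: Callable[[Event], bool]  # predicate over the event attributes


# Hypothetical rules, evaluated in order; the first match wins.
RULES = [
    Rule("D03", lambda e: e.source_module == "MOD-047"),
    Rule("D07", lambda e: e.error_type == "timeout" and e.severity in ("P1", "P2")),
    Rule("D05", lambda e: e.affected_domain == "payments"),
]


def classify(event: Event) -> Optional[str]:
    """Return the risk domain of the first matching rule, or None if no rule matches."""
    for rule in RULES:
        if rule.matches(event):
            return rule.risk_domain
    return None
```

Keeping the rules as ordered data rather than nested conditionals makes the precedence explicit and lets the rule set be reviewed independently of the engine.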

The register is append-only. No event is ever deleted or modified.

RCSA (Risk and Control Self-Assessment) metrics are derived automatically from control test pass/fail rates in CI pipelines. A control that repeatedly fails testing auto-generates a risk register entry under OPS-004, creating a direct link between engineering quality signals and the formal risk register — without any manual intervention.
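One way to read "repeatedly fails" is as a streak of consecutive failed test runs. The sketch below assumes that interpretation; the threshold of three and the function names are hypothetical and not taken from the source.

```python
from typing import Sequence

# Hypothetical: the source says "repeatedly" without giving a number.
FAILURE_STREAK_THRESHOLD = 3


def pass_rate(results: Sequence[bool]) -> float:
    """Fraction of control test runs that passed (True = pass)."""
    if not results:
        raise ValueError("no test results supplied")
    return sum(results) / len(results)


def needs_risk_entry(results: Sequence[bool],
                     threshold: int = FAILURE_STREAK_THRESHOLD) -> bool:
    """True when the most recent `threshold` runs all failed."""
    recent = list(results)[-threshold:]
    return len(recent) == threshold and not any(recent)
```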

Risk Appetite Framework dashboard

The dashboard aggregates all quantitative RAF indicators from SD06 into a single, continuously updated view. Indicators include:

CET1 ratio (MOD-033)
LCR and NSFR (MOD-032)
IRRBB EVE sensitivity (MOD-035)
Stress test capital adequacy (MOD-034)
Related party exposure % (MOD-147)
High-risk customer concentration (MOD-039)
Operational loss trend (from the risk event register)

Each indicator has a configured RAF threshold. A breach triggers an automatic alert to the CRO, CFO, and Board Risk Committee chair. The board risk report is auto-generated on the board reporting calendar cadence, pulling live values and 90-day trends for all indicators. No spreadsheet is involved in its production.
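A threshold check of this kind has to know whether an indicator breaches by falling below a floor (capital and liquidity ratios) or by rising above a cap (exposure percentages). The following is a minimal sketch under that assumption; the indicator keys and limit values are hypothetical placeholders, not configured RAF limits.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Threshold:
    limit: float
    breach_when: str  # "below" for floors, "above" for caps


def is_breached(value: float, threshold: Threshold) -> bool:
    if threshold.breach_when == "below":
        return value < threshold.limit
    if threshold.breach_when == "above":
        return value > threshold.limit
    raise ValueError(f"unknown direction: {threshold.breach_when!r}")


# Hypothetical limits for illustration only.
THRESHOLDS = {
    "cet1_ratio": Threshold(0.105, "below"),
    "lcr": Threshold(1.10, "below"),
    "related_party_exposure_pct": Threshold(10.0, "above"),
}

# Recipients as stated in the text above.
ALERT_RECIPIENTS = ("CRO", "CFO", "Board Risk Committee chair")


def breaches(values: Mapping[str, float]) -> list:
    """Return the sorted names of indicators whose current value breaches its threshold."""
    return sorted(
        name for name, value in values.items()
        if name in THRESHOLDS and is_breached(value, THRESHOLDS[name])
    )
```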

Model inventory and lifecycle management

Every model deployment event from the CI/CD pipeline — model ID, version, training data lineage hash, feature set version, and validation report reference — is written to risk.model_inventory. The inventory records: model_id, owner_module, model_type (ML / statistical / rules), deployment_status (development / awaiting_validation / production / retired), deployed_at, last_validated_at, next_review_at, performance_thresholds (JSONB), current_psi, current_accuracy, current_recall, champion_id (nullable), challenger_id (nullable).

Nightly jobs run PSI and accuracy computations for all production models against the latest population. A threshold breach auto-creates an incident and flags the model for priority review. A model that has not been validated within its review SLA is flagged and its owner notified.
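For reference, the Population Stability Index compares the binned distribution of a baseline population with that of the latest population: PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the expected and actual proportions in bin i. The sketch below is illustrative; the binning, the `eps` floor for empty bins, and the `max_psi` key inside `performance_thresholds` are assumptions rather than details from the source.

```python
import math
from typing import Mapping, Sequence


def psi(expected_counts: Sequence[float],
        actual_counts: Sequence[float],
        eps: float = 1e-6) -> float:
    """Population Stability Index over pre-binned counts."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both populations must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    if e_total <= 0 or a_total <= 0:
        raise ValueError("both populations must be non-empty")
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor proportions at eps so empty bins do not produce log(0).
        e_p = max(e / e_total, eps)
        a_p = max(a / a_total, eps)
        total += (a_p - e_p) * math.log(a_p / e_p)
    return total


def breaches_threshold(current_psi: float,
                       performance_thresholds: Mapping[str, float]) -> bool:
    """Compare against a hypothetical `max_psi` entry in performance_thresholds."""
    return current_psi > performance_thresholds["max_psi"]
```

Identical distributions give a PSI of zero, and the value grows as the latest population drifts away from the baseline.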

The validation gate is hard: no model can be promoted to production status in the inventory without a validation report reference attached. This constraint is checked by the CI/CD hook before deployment proceeds — model promotion is not a UI action, it is an outcome of a completed validation case in MOD-151.
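The gate can be pictured as a guard that the CI/CD hook calls before changing the inventory status. This is a sketch only: the `validation_report_ref` field name and the `PromotionBlocked` exception are hypothetical, and the real hook may check more than is shown here.

```python
class PromotionBlocked(Exception):
    """Raised when a model cannot be promoted to production."""


def promote_to_production(record: dict) -> dict:
    """Return a copy of the inventory record with production status.

    Refuses promotion unless a validation report reference is attached.
    `validation_report_ref` is an assumed field name.
    """
    if not record.get("validation_report_ref"):
        raise PromotionBlocked(
            f"model {record.get('model_id')!r} has no validation report reference"
        )
    # Return a new dict rather than mutating the caller's record.
    return {**record, "deployment_status": "production"}
```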

Third-party health monitoring

All designated critical third-party services are registered in risk.critical_service_providers with: provider_name, service_type, dependency_modules (list), sla_uptime_pct, sla_latency_ms, health_check_endpoint, contractual_review_date, and tier (critical / important / standard).

Health checks run every 60 seconds. Neon database connection latency, Snowflake query latency, AWS service health, BPAY API availability, NPP connectivity, and eIDV provider response times are all monitored. When a check fails or an SLA metric is breached, an incident is auto-created with the provider name, metric, breach value, and a dependency graph showing which platform modules are affected. Contractual review dates approaching within 90 days auto-generate a reminder case in MOD-151.
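The evaluation of a single health-check result, and the 90-day contractual review reminder, might look like the sketch below. The `Provider` class mirrors a subset of the columns listed above; the incident payload shape, the metric labels, and the choice to ignore review dates already in the past are assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Provider:
    provider_name: str
    dependency_modules: tuple
    sla_latency_ms: float
    contractual_review_date: date


def evaluate_check(provider: Provider, reachable: bool,
                   latency_ms: Optional[float]) -> Optional[dict]:
    """Return an incident payload if the check failed or the latency SLA was breached."""
    if not reachable:
        metric, value = "health_check_failed", None
    elif latency_ms is not None and latency_ms > provider.sla_latency_ms:
        metric, value = "latency_ms", latency_ms
    else:
        return None
    return {
        "provider_name": provider.provider_name,
        "metric": metric,
        "breach_value": value,
        # Modules affected, taken from the provider's registered dependencies.
        "affected_modules": list(provider.dependency_modules),
    }


def review_due_soon(provider: Provider, today: date, window_days: int = 90) -> bool:
    """True when the contractual review date falls within the next `window_days`."""
    remaining = provider.contractual_review_date - today
    return timedelta(0) <= remaining <= timedelta(days=window_days)
```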

Intraday liquidity monitoring

Payment events from MOD-020 are aggregated in real time to produce a running intraday liquidity position. The position tracks: gross inflows (incoming payments received), gross outflows (outgoing payments settled), net intraday position, peak intraday exposure to each payment system (NPP, BPAY, NZ Faster Payments), and available intraday credit headroom. Positions are stored in Snowflake Dynamic Tables with a 5-minute refresh.

When intraday exposure exceeds the configured limit, an alert is sent to Treasury and the CRO. End-of-day positions feed into the MOD-032 LCR calculation as the final liquidity position for the day, completing the loop between intraday tracking and end-of-day regulatory reporting.
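The running position can be illustrated with a small in-memory accumulator. This is not the Snowflake implementation; it is a sketch that assumes exposure to a payment system is its cumulative net outflow (outflows minus inflows) and that peak exposure is the highest value that figure has reached during the day. The class and method names are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


class IntradayPosition:
    """Running intraday liquidity position built from payment events."""

    def __init__(self) -> None:
        self.gross_inflows = Decimal("0")
        self.gross_outflows = Decimal("0")
        self._net_outflow = defaultdict(Decimal)  # current exposure per system
        self.peak_exposure = defaultdict(Decimal)  # highest exposure seen per system

    def apply(self, system: str, direction: str, amount: Decimal) -> None:
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if direction == "in":
            self.gross_inflows += amount
            self._net_outflow[system] -= amount
        elif direction == "out":
            self.gross_outflows += amount
            self._net_outflow[system] += amount
        else:
            raise ValueError(f"unknown direction: {direction!r}")
        self.peak_exposure[system] = max(
            self.peak_exposure[system], self._net_outflow[system]
        )

    @property
    def net_position(self) -> Decimal:
        return self.gross_inflows - self.gross_outflows

    def exceeds_limit(self, system: str, limit: Decimal) -> bool:
        """True when current exposure to `system` is above the configured limit."""
        return self._net_outflow[system] > limit
```

`Decimal` is used instead of `float` so that monetary sums do not accumulate binary rounding error.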

Incident and breach auto-creation

When MOD-076 fires a P1 or P2 alert, this module auto-creates an incident record in risk.incidents: incident_id, severity (P1/P2/P3), alert_source, alert_code, description, created_at, sla_resolve_by, status, regulatory_notification_required (bool), regulatory_notification_sent_at (nullable).

P1 incidents with regulatory_notification_required = true auto-assemble a notification document citing the incident, the affected service, the estimated customer impact, and the current resolution status. Where a regulator API exists — RBNZ incident notification portal, APRA breach notification API — the notification is submitted automatically within the required window. Where no API exists, a draft is staged in MOD-151 for human review before submission.
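The routing decision in the paragraph above reduces to two checks: whether the incident qualifies, and whether the target regulator exposes an API. A minimal sketch follows; the function name, the returned labels, and the idea of passing the set of API-enabled regulators as an argument are illustrative assumptions.

```python
from typing import AbstractSet, Mapping


def route_notification(incident: Mapping[str, object],
                       regulator: str,
                       regulators_with_api: AbstractSet[str]) -> str:
    """Decide how a regulatory notification for this incident is handled."""
    qualifies = (
        incident["severity"] == "P1"
        and incident["regulatory_notification_required"] is True
    )
    if not qualifies:
        return "no_notification"
    if regulator in regulators_with_api:
        # Submitted automatically within the required window.
        return "submit_via_api"
    # No API available: a draft is staged in MOD-151 for human review.
    return "stage_draft_for_review"
```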

Change management feed

Every CI/CD deployment event — success, failure, rollback — creates a change record in risk.change_records: change_id, environment, module_id, artefact_hash, deployed_by (CI pipeline identity), deployed_at, outcome, rollback_of (nullable), post_impl_review_required (bool), post_impl_review_due_at (nullable). Post-implementation review is auto-required for any deployment that resulted in a rollback, or where a P1 incident occurred within 72 hours of deployment. Review cases are created in MOD-151.
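The rule that sets post_impl_review_required can be expressed as a small predicate. The sketch assumes the caller already knows whether the deployment was rolled back and has the timestamps of P1 incidents to hand; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable

REVIEW_WINDOW = timedelta(hours=72)  # window stated in the text above


def post_impl_review_required(deployed_at: datetime,
                              rolled_back: bool,
                              p1_incident_times: Iterable[datetime]) -> bool:
    """True if the deployment was rolled back or a P1 incident followed within 72 hours."""
    if rolled_back:
        return True
    return any(
        timedelta(0) <= occurred_at - deployed_at <= REVIEW_WINDOW
        for occurred_at in p1_incident_times
    )
```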

Compliance rationale

RBNZ Operational Resilience Standard and APRA CPS 230 require that operational risk events are identified, classified, and managed within defined timeframes. OPS-003 through OPS-007 encode these obligations. Automation is not a convenience here — regulators assess the timeliness and completeness of risk event capture. A manual register populated by a human reviewing alert emails is inherently incomplete: it misses events that occur outside business hours, events that occur simultaneously, and events that are never reviewed because the reviewer is handling a higher-priority incident. This module makes completeness structurally guaranteed.

APRA CPS 220 and RBNZ technology risk guidance (DT-003, DT-008) require ongoing monitoring of technology risks and third-party providers. Continuous automated health checking against configured SLA thresholds is the only approach that satisfies the "ongoing" requirement at a digital bank's operational tempo.

Model risk under APRA CPS 220 (DT-005) requires a documented model inventory and validation process. Deriving the inventory automatically from CI/CD events means it is always current — a manually maintained spreadsheet will always lag deployments.

GOV-002 (RAF requirements) under both RBNZ and APRA frameworks requires the board to have visibility of risk appetite indicators on a regular basis. Auto-generating the board report from live data eliminates the lag and manual error inherent in compiled spreadsheet packs.

REP-009 encodes mandatory breach notification timelines: RBNZ requires notification within 24 hours of a material incident; APRA requires notification within 24 hours under CPS 234. PRI-002 encodes the NZ Privacy Act 2020 mandatory breach reporting obligation and the AU Notifiable Data Breaches (NDB) scheme. The assembly and submission workflow means the clock stops ticking on notification compliance when the automated submission lands, not when a compliance officer finishes drafting an email.

Commercial rationale

The cost of a late or incomplete breach notification — regulatory censure, public disclosure, reputational damage — far exceeds the cost of automating the process. A risk manager spending their time manually entering events into a register is not doing risk management; they are doing data entry. This module eliminates data entry entirely. The risk function's attention is reserved for the cases that require it.

A continuously computed RAF dashboard also eliminates the quarterly board pack compilation cycle — a process that typically consumes two to three person-weeks of finance and risk staff time per quarter, and produces a snapshot that is already weeks old by the time it reaches the board.

Module dependencies

Depends on

Module	Title	Required?	Contract	Reason
MOD-076	Observability platform	Required	—	Source of all operational alerting that feeds incident auto-creation and the operational risk register.
MOD-048	System decision log	Required	—	All risk register entries, RAF alerts, and incident records are written as immutable system decision log entries.
MOD-047	Agent action logger	Required	—	Agent action log entries feed into the operational risk register for conduct risk monitoring.
MOD-032	LCR / NSFR calculator	Required	—	LCR/NSFR figures are a primary input to the RAF dashboard and the intraday liquidity calculation.
MOD-033	RWA & capital ratio engine	Required	—	The Tier 1 capital ratio is a core RAF metric and the denominator for related party exposure limits.
MOD-034	Stress testing scenario engine	Required	—	Stress test outputs are surfaced in the RAF dashboard and the board risk report.
MOD-035	IRRBB / EVE / NII model	Required	—	IRRBB/EVE metrics are a core RAF component and trigger ALCO escalation when limits are breached.
MOD-039	Customer risk score model	Required	—	Customer risk score distribution is a risk appetite indicator tracking concentration of high-risk customers.
MOD-147	Related party exposure monitor	Required	—	Related party exposure percentage is a regulated RAF metric and must flow into the dashboard continuously.
MOD-042	CDC pipeline — Neon logical replication to S3 Iceberg	Required	—	The CDC pipeline delivers balance sheet and operational data to Snowflake for RAF calculations.
MOD-020	Pre-payment validation suite	Required	—	The payment event stream is the real-time data source for the intraday liquidity position calculation.
MOD-102	Snowflake account configuration & governance	Required	—	Snowflake compute layer where all RAF calculations and risk register aggregation runs.
MOD-104	AWS shared infrastructure bootstrap	Required	—	AWS shared infrastructure is required before this module can be deployed.

Required by

Module	Title	As	Contract
MOD-151	Risk case console	Hard dependency	—

Policies satisfied

Policy	Title	Mode	How
OPS-003	Incident Management Policy	`AUTO`	Incidents are auto-created from observability alerts with P1/P2/P3 classification, SLA timers, and routing — no manual incident registration required.
OPS-004	Operational Risk Policy	`AUTO`	Risk events from all system domains are auto-classified against the risk taxonomy and written to the operational risk register continuously — no manual entry.
OPS-005	Third-Party & Critical Service Provider Policy	`AUTO`	All designated critical third parties (Neon, Snowflake, AWS, BPAY, NPP, eIDV providers, card bureau) are continuously health-monitored; SLA breach auto-creates an incident.
OPS-006	Change Management Policy	`LOG`	CI/CD pipeline deployment events auto-create change records with timestamp, artefact hash, environment, and outcome; post-implementation review is auto-required after a rollback or when a P1 incident occurs within 72 hours of deployment.
OPS-007	Financial Processing Resilience & Idempotency Policy	`LOG`	Idempotency key collision rates, reprocessing events, and settlement reconciliation outcomes are continuously tracked and logged against this policy.
DT-003	Technology Risk Management Policy	`AUTO`	Technology risk events (unpatched CVEs from SAST, latency SLA breaches, infrastructure anomalies) are auto-classified and written to the risk register.
DT-005	Model Risk Management Policy	`LOG`	Model inventory is auto-maintained from CI/CD deployment events; scheduled PSI and accuracy monitoring runs nightly; model validation gate is enforced before production promotion.
DT-008	Third-Party & Outsourcing Risk Policy	`AUTO`	All designated critical third-party services are continuously monitored for health and SLA compliance; contract expiry dates trigger review reminders.
GOV-002	Risk Appetite Statement Policy	`CALC`	The RAF dashboard is continuously computed from SD06 outputs; a RAF threshold breach auto-alerts the CRO, CFO, and Board Risk Committee chair.
REP-009	Regulatory incident & breach notification	`AUTO`	Material incidents and privacy breaches are auto-detected and routed through a notification assembly workflow; regulator API submission proceeds where an API is available.
PRI-002	Data Breach Response Policy	`AUTO`	Security anomalies (CloudTrail access failures, Cognito brute-force patterns, Secrets Manager anomalies) are auto-classified as potential breaches and the notification timer starts automatically.
CLQ-002	Liquidity Risk Management Policy	`CALC`	Intraday payment system exposure is computed in real time from the payment event stream, extending MOD-032's end-of-day LCR calculation to cover intraday exposure.

Capabilities satisfied

(No capabilities mapped)

Part of SD06: Snowflake Analytics & Risk Platform

Compiled 2026-05-22 from source/entities/modules/MOD-150.yaml