
Data quality & reconciliation monitor

ID: MOD-038
System: SD06
Repo: bank-risk-platform
Build status: Deployed
Deployed: Yes
Last commit: 54197c02fff0ca78a988e6140d31778e59f05b46

Automated data quality and reconciliation layer for SD06. Owns the governance_meta schema and acts as the first gate on whether downstream risk calculations can proceed.

What it does

MOD-038 runs as a Snowflake Task in the SD06 Task DAG, positioned immediately after the CDC refresh task and before all risk calculation modules. It executes in two stages:

Stage 1 — dbt test run (FR-225, FR-226). dbt test --select tag:mod-038 runs a battery of checks against every raw_cdc_* staging model: completeness (not-null), referential integrity (relationships), value range (accepted-values and custom generic tests), and format conformance. With store_results: true, one row per (run_id, dataset, check) is persisted to governance_meta.data_quality_log, which is append-only. Each dataset's quality score is computed as passing checks / total checks. If any gated dataset scores below the configured threshold (default 98%, stored in governance_meta.config), the Task exits non-zero and none of the downstream Tasks in the DAG (MOD-032, MOD-033, MOD-035, MOD-036, etc.) start. This is the FR-226 halt mechanism: a Task DAG dependency, not an EventBridge event.
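A minimal Python sketch of the Stage 1 scoring and gate logic, for illustration only. CheckResult, quality_scores and gate_exit_code are hypothetical names; the source specifies only the score formula (passing checks / total checks), the 98% default threshold and the non-zero exit. Treating a gated dataset with no recorded checks as failing is an added assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """One persisted row: (run_id, dataset, check) plus its outcome (hypothetical shape)."""
    run_id: str
    dataset: str
    check: str
    passed: bool


def quality_scores(results):
    """Return {dataset: passing_checks / total_checks} for one run."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        totals[r.dataset] += 1
        if r.passed:
            passes[r.dataset] += 1
    return {ds: passes[ds] / totals[ds] for ds in totals}


def gate_exit_code(results, gated_datasets, threshold=0.98):
    """Return 0 if every gated dataset meets the threshold, else 1 (halts the DAG)."""
    scores = quality_scores(results)
    for ds in gated_datasets:
        # Assumption: a gated dataset with no recorded checks counts as failing.
        if scores.get(ds, 0.0) < threshold:
            return 1
    return 0
```

Because the comparison is strictly "below", a dataset at exactly 98% (for example 49 of 50 checks passing) passes the gate. A wrapper around the dbt run could end with sys.exit(gate_exit_code(...)), with the threshold read from governance_meta.config rather than passed in by hand.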

Stage 2 — reconciliation check (FR-227). A dbt model compares row counts in raw_cdc_core.postings against the LSN-ack metadata that MOD-042 publishes into the Iceberg snapshot. Any discrepancy greater than 0.01% of the aggregate row count is written to governance_meta.reconciliation_breaks. This runs entirely as Snowflake SQL: no Lambda and no cross-VPC Neon read.
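The tolerance rule can be sketched in Python as follows. This assumes the 0.01% is measured relative to the expected count taken from the LSN-ack metadata; reconcile and ReconBreak are hypothetical names, and the real check is a dbt SQL model, not Python.

```python
from dataclasses import dataclass
from typing import Optional

BREAK_TOLERANCE = 0.0001  # 0.01% expressed as a fraction


@dataclass(frozen=True)
class ReconBreak:
    """Hypothetical shape of a row written to reconciliation_breaks."""
    table: str
    expected_rows: int
    actual_rows: int
    relative_diff: float


def reconcile(table: str, expected_rows: int, actual_rows: int,
              tolerance: float = BREAK_TOLERANCE) -> Optional[ReconBreak]:
    """Return a break if |actual - expected| / expected exceeds the tolerance."""
    if expected_rows == 0:
        # Assumption: any rows against an expected count of zero is a break.
        rel = 0.0 if actual_rows == 0 else float("inf")
    else:
        rel = abs(actual_rows - expected_rows) / expected_rows
    if rel > tolerance:
        return ReconBreak(table, expected_rows, actual_rows, rel)
    return None
```

With 1,000,000 expected rows, a difference of 50 rows (0.005%) is tolerated, while a difference of 200 rows (0.02%) is recorded as a break.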

Published views (FR-228). governance_meta.v_quality_scores and governance_meta.v_open_breaks are the published contract surfaces. Downstream modules reference these views via dbt source(). The CRO report is driven by governance_meta.daily_quality_summary (Dynamic Table, target_lag = 1 hour).
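To make the Dynamic Table configuration concrete, this sketch renders the Snowflake CREATE DYNAMIC TABLE statement with TARGET_LAG = '1 hour'. The summary query and its columns (run_ts, passed) are hypothetical, and so is the choice of PROD_RISK_WH as the refresh warehouse. If the table is built by dbt, we believe the dbt-snowflake dynamic_table materialization (with target_lag and snowflake_warehouse configs) would generate equivalent DDL.

```python
def dynamic_table_ddl(name: str, select_sql: str, warehouse: str,
                      target_lag: str = "1 hour") -> str:
    """Render a CREATE DYNAMIC TABLE statement with the given freshness target."""
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {name}\n"
        f"  TARGET_LAG = '{target_lag}'\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"AS\n{select_sql}"
    )


# Hypothetical summary query: the run_ts and passed columns are assumptions.
SUMMARY_SQL = """SELECT DATE_TRUNC('day', run_ts) AS run_date,
       dataset,
       COUNT_IF(passed) / COUNT(*) AS quality_score
FROM governance_meta.data_quality_log
GROUP BY 1, 2"""

ddl = dynamic_table_ddl("governance_meta.daily_quality_summary",
                        SUMMARY_SQL, "PROD_RISK_WH")
```

A target lag of 1 hour means Snowflake aims to keep the summary no more than an hour behind data_quality_log. Intermediate runs can still be read directly through the published views.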

External alert (FR-226 human notification). If the Task exits non-zero, a thin Lambda publishes bank.risk-platform.data_quality_run_failed to the bank-risk-platform EventBridge bus. Its sole consumer is MOD-076 (observability), which alerts the data engineering team. This event is a human alert, not a machine gate; the gate itself is enforced by the Task DAG.
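The source does not say how the Lambda learns of the failure. MOD-104 supplies an SNS topic for the Snowflake Alert notification path, so this sketch assumes the Lambda is invoked by SNS and forwards each message to EventBridge via boto3's events.put_events. Splitting the event name into Source "bank.risk-platform" and DetailType "data_quality_run_failed" is also an assumption. The Lambda never queries Snowflake, which is consistent with the module type.

```python
import json

EVENT_BUS = "bank-risk-platform"
SOURCE = "bank.risk-platform"            # assumed split of the event name
DETAIL_TYPE = "data_quality_run_failed"  # assumed split of the event name


def build_entry(alert_message: str) -> dict:
    """Wrap one alert message as an EventBridge PutEvents entry."""
    return {
        "Source": SOURCE,
        "DetailType": DETAIL_TYPE,
        "Detail": json.dumps({"message": alert_message}),
        "EventBusName": EVENT_BUS,
    }


def handler(event, context, events_client=None):
    """Lambda entry point, assuming an SNS-triggered invocation."""
    if events_client is None:
        import boto3  # imported lazily so the module loads without boto3
        events_client = boto3.client("events")
    entries = [build_entry(rec["Sns"]["Message"]) for rec in event.get("Records", [])]
    if not entries:
        return {"published": 0}
    resp = events_client.put_events(Entries=entries)
    if resp.get("FailedEntryCount", 0):
        # Surface partial failures so the invocation is retried or alarmed on.
        raise RuntimeError(f"EventBridge rejected {resp['FailedEntryCount']} entries")
    return {"published": len(entries)}
```

The injectable events_client keeps the handler testable without AWS credentials. In production it is left as None.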

Compliance rationale

REP-005 GATE is satisfied because the Task DAG dependency ensures that no downstream regulatory return can run on data that has not passed DQ checks. GOV-006 LOG is satisfied because data_quality_log and reconciliation_breaks are append-only with UPDATE and DELETE revoked, so a recorded break cannot be hidden and its history remains available to internal audit. DT-004 AUTO is satisfied because the quality threshold is read from governance_meta.config rather than hard-coded, has no override path, and is enforced at the pipeline level by dbt test failure.
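The append-only control amounts to a small set of Snowflake privilege revocations. This Python helper renders them for illustration; which roles receive the revokes is not stated in the source, so BANK_DBT_ROLE in the usage line is an assumption.

```python
# The two append-only tables named in the module description.
APPEND_ONLY_TABLES = (
    "governance_meta.data_quality_log",
    "governance_meta.reconciliation_breaks",
)


def append_only_revokes(roles, tables=APPEND_ONLY_TABLES):
    """Return one REVOKE UPDATE, DELETE statement per (role, table) pair."""
    return [
        f"REVOKE UPDATE, DELETE ON TABLE {table} FROM ROLE {role};"
        for role in roles
        for table in tables
    ]


# Hypothetical: apply the revokes to the dbt role that inserts log rows.
statements = append_only_revokes(["BANK_DBT_ROLE"])
```

Revokes only constrain roles that do not own the table. An owning role keeps full control of its objects, so these tables are presumably owned by a role that pipeline and user sessions do not assume. The source does not state this detail.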

Module type

Snowflake DDL + dbt + single Lambda (external alert only). No Lambda queries Snowflake. No EventBridge for intra-SD06 coordination.
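Intra-SD06 coordination relies on Task graph semantics: a child Task starts only when its predecessors succeed. The following pure-Python simulation illustrates how the FR-226 halt propagates. Task names are hypothetical, and modeling MOD-032 to MOD-036 as direct children of MOD-038 is an assumption. In Snowflake itself the edges would be declared with the AFTER clause of CREATE TASK.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each task maps to the set of tasks it runs AFTER.
DAG = {
    "cdc_refresh": set(),
    "mod_038_dq": {"cdc_refresh"},
    "mod_032": {"mod_038_dq"},
    "mod_033": {"mod_038_dq"},
    "mod_035": {"mod_038_dq"},
    "mod_036": {"mod_038_dq"},
}


def run_graph(dag, run_task):
    """Run tasks in dependency order; a task starts only if all predecessors succeeded."""
    status = {}
    for task in TopologicalSorter(dag).static_order():
        if all(status.get(p) == "succeeded" for p in dag[task]):
            status[task] = "succeeded" if run_task(task) else "failed"
        else:
            # A failed or skipped predecessor means this task never starts.
            status[task] = "skipped"
    return status
```

Simulating a failing DQ task, run_graph(DAG, lambda t: t != "mod_038_dq"), leaves every risk module marked "skipped" without any event being published. This is why no EventBridge hop is needed for the gate.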

Streamlit dashboard

MOD-038 ships a Streamlit page, GOVERNANCE_META.STREAMLIT_DQ_SCORECARD, providing:
- DQ break count and break rate heat map by system domain
- 30-day open-break trend per domain
- Break detail list per domain (rule ID, table, column, break count, first seen)
- Last-refreshed timestamp per domain

MOD-172 (Operations & Model Intelligence Dashboard) uses this page as its DQ scorecard landing page. OPERATIONS_ROLE requires cross-schema SELECT on the published GOVERNANCE_META.* views.
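The 30-day open-break trend could be computed along these lines. This is a Python sketch of the aggregation only, not the Streamlit page. The Break record is hypothetical: first_seen comes from the detail list above, but resolved_on (None while a break is still open) is an assumed field.

```python
from datetime import date, timedelta
from typing import NamedTuple, Optional


class Break(NamedTuple):
    domain: str
    rule_id: str
    first_seen: date
    resolved_on: Optional[date]  # assumption: None while the break is still open


def open_break_trend(breaks, as_of: date, days: int = 30):
    """Return {domain: [open-break count for each of the last `days` days, oldest first]}."""
    window = [as_of - timedelta(days=days - 1 - i) for i in range(days)]
    trend = {}
    for b in breaks:
        series = trend.setdefault(b.domain, [0] * days)
        for i, day in enumerate(window):
            # A break counts as open from first_seen up to (not including) resolved_on.
            if b.first_seen <= day and (b.resolved_on is None or day < b.resolved_on):
                series[i] += 1
    return trend
```

Each domain's list can feed one line of the trend chart directly. A domain whose breaks were all resolved before the window still appears with a row of zeros.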


Module dependencies

Depends on

Module | Title | Required? | Contract | Reason
MOD-042 | CDC pipeline — Neon logical replication to S3 Iceberg | Required | | Raw CDC schemas (raw_cdc_*) are the primary input to dbt staging models; LSN-ack metadata in the Iceberg snapshot is the authoritative source for FR-227 row-count reconciliation.
MOD-001 | Double-entry posting engine | Required | | Ledger posting totals in raw_cdc_core.postings (replicated via MOD-042) are the source of truth for FR-227 reconciliation checks. Cross-VPC direct Neon read deferred to v2.
MOD-104 | AWS shared infrastructure bootstrap | Required | | MOD-104 provisions the S3 Iceberg bucket (consumed by MOD-042 and read by Snowflake external tables), KMS keys, and the BANK_SNS_INTEGRATION SNS topic used for the Snowflake Alert notification path to MOD-076.
MOD-102 | Snowflake account configuration & governance | Required | | The Snowflake account, BANK_{ENV}_RISK database, BANK_DBT_ROLE, NONPROD_WH/PROD_RISK_WH, and the EXECUTE ALERT ON ACCOUNT and APPLY METRIC ON ACCOUNT grants (migrations 021–022) must exist before this module can create the governance_meta schema, attach DMFs, or create the DQ breach Alert.
MOD-172 | Operations & Model Intelligence Dashboard | Required | | Operations & Model Intelligence Dashboard uses MOD-038 DQ published views as its scorecard landing page; MOD-038 Streamlit is the primary DQ visibility surface.

Required by

Module | Title | As | Contract
MOD-036 | Prudential return builder (RBNZ / APRA) | Hard dependency |
MOD-172 | Operations & Model Intelligence Dashboard | Hard dependency |

Policies satisfied

Policy | Title | Mode | How
REP-005 | Data Quality & Assurance Policy | GATE | Source-to-report reconciliation automated; breaks cannot be hidden or ignored
DT-004 | Data Governance Policy | AUTO | Data quality rules enforced at pipeline level, not as a manual check
GOV-006 | Internal Audit Policy | LOG | Internal audit has access to reconciliation break history; data quality is auditable

Capabilities satisfied

(No capabilities mapped)


Part of SD06 — Snowflake Analytics & Risk Platform
Compiled 2026-05-22 from source/entities/modules/MOD-038.yaml