Data quality & reconciliation monitor
| ID | MOD-038 |
| System | SD06 |
| Repo | bank-risk-platform |
| Build status | Deployed |
| Deployed | Yes |
| Last commit | 54197c02fff0ca78a988e6140d31778e59f05b46 |
Automated data quality and reconciliation layer for SD06. Owns the governance_meta schema — the first point of truth for whether downstream risk calculations can proceed.
What it does
MOD-038 runs as a Snowflake Task in the SD06 Task DAG, positioned immediately after the CDC refresh task and before all risk calculation modules. It executes in two stages:
Stage 1 — dbt test run (FR-225, FR-226). dbt test --select tag:mod-038 evaluates a battery of checks against every raw_cdc_* staging model: completeness (not_null), referential integrity (relationships), value range (accepted_values, custom generic tests), and format conformance. store_results: true persists one row per (run_id, dataset, check) into governance_meta.data_quality_log (append-only). The quality score for each dataset is the number of passing checks divided by the total number of checks. If any gated dataset scores below the configured threshold (default 98%, stored in governance_meta.config), the Task exits non-zero and none of the downstream Tasks in the DAG (MOD-032, MOD-033, MOD-035, MOD-036, etc.) start. This is the FR-226 halt mechanism: a Task DAG dependency, not an EventBridge event.
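The scoring and gating rule in Stage 1 can be illustrated with a minimal Python sketch. It is not the module's implementation (which runs in dbt and Snowflake); the names CheckResult, quality_scores and failing_gated_datasets are hypothetical, and treating a gated dataset with no recorded checks as failing is an assumption made here to keep the gate fail-closed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class CheckResult(NamedTuple):
    """One row per (dataset, check), mirroring the shape described for data_quality_log."""
    dataset: str
    check: str
    passed: bool


def quality_scores(results: Iterable[CheckResult]) -> Dict[str, float]:
    """Score each dataset as passing checks divided by total checks."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.dataset] += 1
        if r.passed:
            passes[r.dataset] += 1
    return {ds: passes[ds] / totals[ds] for ds in totals}


def failing_gated_datasets(
    scores: Dict[str, float],
    gated: Iterable[str],
    threshold: float = 0.98,  # default from the text; the real value lives in governance_meta.config
) -> List[str]:
    """Return gated datasets scoring strictly below the threshold.

    Assumption: a gated dataset with no recorded checks scores 0.0 and fails.
    """
    return sorted(ds for ds in gated if scores.get(ds, 0.0) < threshold)


if __name__ == "__main__":
    # 49 of 50 checks pass for this dataset, so its score is exactly at the 98% threshold.
    demo = [CheckResult("raw_cdc_core.postings", f"check_{i}", i != 0) for i in range(50)]
    failing = failing_gated_datasets(quality_scores(demo), ["raw_cdc_core.postings"])
    # A non-empty list is the condition under which the Task would fail and halt the DAG.
    print("halt" if failing else "proceed")
```

Note that the comparison is strict: a dataset scoring exactly at the threshold passes, matching the "scores below the configured threshold" wording.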
Stage 2 — reconciliation check (FR-227). A dbt model compares row counts in raw_cdc_core.postings against the LSN-ack metadata published by MOD-042 into the Iceberg snapshot. Discrepancies exceeding 0.01% of the aggregate are written to governance_meta.reconciliation_breaks. This is a pure Snowflake SQL operation — no Lambda, no cross-VPC Neon read.
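As a rough illustration of the Stage 2 tolerance rule, the sketch below flags a break when the row-count discrepancy exceeds 0.01% of the expected count. The real check is a dbt model in Snowflake SQL; the function name, the record fields, and the reading of "the aggregate" as the expected row count from the LSN-ack metadata are all assumptions.

```python
from typing import Optional

# 0.01% expressed in parts per million, so the comparison stays in exact integer arithmetic.
DEFAULT_TOLERANCE_PPM = 100


def reconciliation_break(
    observed_rows: int,
    expected_rows: int,
    tolerance_ppm: int = DEFAULT_TOLERANCE_PPM,
) -> Optional[dict]:
    """Return a break record if the discrepancy exceeds the tolerance, else None.

    observed_rows: row count in raw_cdc_core.postings.
    expected_rows: row count implied by the LSN-ack metadata (assumed to be "the aggregate").
    """
    discrepancy = abs(observed_rows - expected_rows)
    # Equivalent to: discrepancy / expected_rows > tolerance_ppm / 1_000_000,
    # rearranged to avoid division (and a divide-by-zero when expected_rows == 0).
    if discrepancy * 1_000_000 > expected_rows * tolerance_ppm:
        return {
            "observed_rows": observed_rows,
            "expected_rows": expected_rows,
            "discrepancy": discrepancy,
        }
    return None
```

With an expected count of 1,000,000 rows, a discrepancy of exactly 100 rows sits on the 0.01% boundary and is not reported, while 101 rows produces a break record.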
Published views (FR-228). governance_meta.v_quality_scores and governance_meta.v_open_breaks are the published contract surfaces. Downstream modules reference these views via dbt source(). The CRO report is driven by governance_meta.daily_quality_summary (Dynamic Table, target_lag = 1 hour).
External alert (FR-226 human notification). If the Task exits non-zero, a thin Lambda publishes bank.risk-platform.data_quality_run_failed to the bank-risk-platform EventBridge bus. The sole consumer is MOD-076 (observability — alerts data engineering team). This event is a human alert, not a machine gate; the gate is enforced by the Task DAG.
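The alert entry such a Lambda might publish can be sketched as follows. The bus name and event name come from the text above; carrying the event name in DetailType, the Source value, and the fields inside Detail are hypothetical choices made for illustration, not the module's actual contract.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional

EVENT_BUS_NAME = "bank-risk-platform"
# Assumption: the event name is carried as the EventBridge detail-type.
DETAIL_TYPE = "bank.risk-platform.data_quality_run_failed"
# Hypothetical source identifier; the document does not specify one.
SOURCE = "bank.risk-platform.mod-038"


def build_failure_event(
    run_id: str,
    failing_datasets: List[str],
    threshold: float,
    now: Optional[datetime] = None,
) -> dict:
    """Build one EventBridge entry describing a failed data quality run."""
    return {
        "EventBusName": EVENT_BUS_NAME,
        "Source": SOURCE,
        "DetailType": DETAIL_TYPE,
        "Time": now or datetime.now(timezone.utc),
        # Detail must be a JSON string; these payload fields are illustrative only.
        "Detail": json.dumps(
            {
                "run_id": run_id,
                "failing_datasets": failing_datasets,
                "threshold": threshold,
            }
        ),
    }


# Inside the Lambda handler, the entry would be sent with boto3, for example:
#   boto3.client("events").put_events(Entries=[build_failure_event(...)])
```

Keeping the entry construction in a pure function makes the payload easy to unit-test without AWS credentials; only the final put_events call touches the network.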
Compliance rationale
REP-005 GATE is satisfied because the Task DAG dependency means no downstream regulatory return can run on data that has not passed DQ. The break cannot be hidden because data_quality_log and reconciliation_breaks are append-only with UPDATE/DELETE revoked (GOV-006 LOG). DT-004 AUTO is satisfied because the quality threshold is read from governance_meta.config — it is not hard-coded, has no override path, and is enforced at the pipeline level by dbt test failure.
Module type
Snowflake DDL + dbt + single Lambda (external alert only). No Lambda queries Snowflake. No EventBridge for intra-SD06 coordination.
Streamlit dashboard
MOD-038 ships a Streamlit page GOVERNANCE_META.STREAMLIT_DQ_SCORECARD providing:
- DQ break count and break rate heat map by system domain
- 30-day open-break trend per domain
- Break detail list per domain (rule ID, table, column, break count, first seen)
- Last-refreshed timestamp per domain
Consumed by MOD-172 (Operations & Model Intelligence Dashboard) as the DQ scorecard landing page. OPERATIONS_ROLE requires cross-schema SELECT on the GOVERNANCE_META.* published views.
Module dependencies
Depends on
| Module | Title | Required? | Contract | Reason |
|---|---|---|---|---|
| MOD-042 | CDC pipeline — Neon logical replication to S3 Iceberg | Required | — | Raw CDC schemas (raw_cdc_*) are the primary input to dbt staging models; LSN-ack metadata in the Iceberg snapshot is the authoritative source for FR-227 row-count reconciliation. |
| MOD-001 | Double-entry posting engine | Required | — | Ledger posting totals in raw_cdc_core.postings (replicated via MOD-042) are the source-of-truth for FR-227 reconciliation checks. Cross-VPC direct Neon read deferred to v2. |
| MOD-104 | AWS shared infrastructure bootstrap | Required | — | MOD-104 provisions the S3 Iceberg bucket (consumed by MOD-042 and read by Snowflake external tables), the KMS keys, and the BANK_SNS_INTEGRATION SNS topic used for the Snowflake Alert notification path to MOD-076. |
| MOD-102 | Snowflake account configuration & governance | Required | — | Snowflake account, BANK_{ENV}_RISK database, BANK_DBT_ROLE, NONPROD_WH/PROD_RISK_WH, EXECUTE ALERT ON ACCOUNT and APPLY METRIC ON ACCOUNT grants (migrations 021–022) must exist before this module can create governance_meta schema, attach DMFs, or create the DQ breach Alert. |
| MOD-172 | Operations & Model Intelligence Dashboard | Required | — | Operations & Model Intelligence Dashboard uses MOD-038 DQ published views as its scorecard landing page — MOD-038 Streamlit is the primary DQ visibility surface. |
Required by
| Module | Title | As | Contract |
|---|---|---|---|
| MOD-036 | Prudential return builder (RBNZ / APRA) | Hard dependency | — |
| MOD-172 | Operations & Model Intelligence Dashboard | Hard dependency | — |
Policies satisfied
| Policy | Title | Mode | How |
|---|---|---|---|
| REP-005 | Data Quality & Assurance Policy | GATE | Source-to-report reconciliation automated — breaks cannot be hidden or ignored |
| DT-004 | Data Governance Policy | AUTO | Data quality rules enforced at pipeline level — not a manual check |
| GOV-006 | Internal Audit Policy | LOG | Internal audit has access to reconciliation break history — data quality is auditable |
Capabilities satisfied
(No capabilities mapped)
Part of SD06 — Snowflake Analytics & Risk Platform
Compiled 2026-05-22 from source/entities/modules/MOD-038.yaml