ADR-046: SD06 data product architecture — Snowflake-native orchestration, dbt transformation layer, schema-as-product

Status: Accepted
Date: 2026-05-01
Deciders: CTO, Head of Data, Head of Risk Engineering
Affects repos: bank-risk-platform

Context

SD06 is a Snowflake-centric analytics and risk platform. Data flows from the CDC-replicated raw layer (MOD-042 Iceberg tables) through transformations to published data products consumed by risk calculation modules and regulatory reporting.

Early SD06 module proposals treated Snowflake as a passive store: Lambda functions opened Snowflake connections to execute SQL checks, EventBridge carried intra-platform signals between modules, and AWS SSM held platform-internal configuration values such as DQ thresholds. This pattern is wrong for three reasons. First, it bypasses Snowflake's native compute, scheduling, and orchestration capabilities — Snowflake Tasks and dbt can execute and sequence all of this work inside the platform. Second, routing SD06-to-SD06 control flow through the AWS event bus adds unnecessary operational complexity (Lambda cold starts, EventBridge payload limits, IAM surface) for work that has no external subscriber. Third, it creates the impression that SD06 modules are Lambda-primary, when the majority of the compute is and should be Snowflake SQL.

This ADR establishes the authoritative architecture for all SD06 modules. It refines ADR-035 (Snowflake account configuration) and ADR-003 (CDC pipeline) by defining exactly where the boundary between Snowflake-native execution and AWS services lies.

Decision

1. Snowflake Tasks for intra-platform orchestration

Scheduled and dependency-based execution within SD06 is handled by Snowflake Tasks. A Task DAG governs the full pipeline:

  • The root task detects new CDC data arriving in Iceberg external tables (via Snowflake Streams).
  • Interior tasks invoke dbt run --select tag:mod-NNN and dbt test --select tag:mod-NNN for each module in dependency order.
  • DAG edges enforce execution order: upstream data must be present and pass quality checks before downstream modules run. When an upstream Task fails, the downstream Tasks in that branch do not run for that DAG run.

Lambda schedulers that call Snowflake via JDBC or the Snowflake Python connector to trigger SQL work are not used within SD06.
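
As a rough illustration of the Task DAG described above, the sketch below renders CREATE TASK statements in dependency order. The dependency map, the warehouse name (SD06_WH), the stream name, and the run_dbt_module stored procedure are hypothetical placeholders, not names defined by this ADR; how a Task actually invokes dbt depends on the deployment.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: module -> set of upstream modules it waits for.
DEPENDENCIES = {
    "mod_085": set(),
    "mod_032": {"mod_085"},
    "mod_035": {"mod_085"},
}


def render_task_ddl(deps, warehouse="SD06_WH", stream="raw_cdc.cdc_arrivals_stream"):
    """Return one CREATE TASK statement per module, upstream modules first."""
    statements = []
    for module in TopologicalSorter(deps).static_order():
        parents = sorted(deps[module])
        lines = [f"CREATE OR REPLACE TASK task_{module}", f"  WAREHOUSE = {warehouse}"]
        if parents:
            # Child task: runs only after its parent tasks succeed.
            lines.append("  AFTER " + ", ".join(f"task_{p}" for p in parents))
        else:
            # Root task: polls on a schedule and skips the run when the stream is empty.
            lines.append("  SCHEDULE = '5 MINUTE'")
            lines.append(f"  WHEN SYSTEM$STREAM_HAS_DATA('{stream}')")
        # Placeholder body; assumed to wrap dbt run and dbt test for this module's tag.
        lines.append(f"AS CALL run_dbt_module('{module}');")
        statements.append("\n".join(lines))
    return statements


if __name__ == "__main__":
    print("\n\n".join(render_task_ddl(DEPENDENCIES)))
```

Generating the DDL from a single dependency map keeps the Task DAG edges and the dbt source() dependencies reviewable in one place; whether SD06 adopts generation or hand-maintained DDL is left open here.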

2. dbt as the transformation layer

All Snowflake transformations are dbt models. No module may hand-write Dynamic Table DDL or CTAS SQL when a dbt model achieves the same result.

  • Staging models (staging_*) — cast, clean, and deduplicate from raw_cdc_* source tables.
  • Intermediate models — business logic, joins, derivations.
  • Mart / product models — published data products exposed as views or Dynamic Tables.
  • dbt tests — implement data quality assertions (completeness, referential integrity, value range, format conformance). Setting store_failures: true persists each test's failing rows to a table in the owning module's governance schema.
  • dbt sources — the declared contract for consuming another module's output. Every cross-module data dependency is declared in sources.yml and referenced through source(), not expressed as a Snowflake DDL foreign key and not as an EventBridge subscription.
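
One way the source() contract could be checked in CI is a small lint that compares the source() calls in a model against what sources.yml declares. This is a minimal sketch under stated assumptions: the model SQL, the source names (market, dq), and the already-parsed DECLARED dictionary are hypothetical, and the regular expression only handles the plain two-argument form of source().

```python
import re

# Matches {{ source('source_name', 'table_name') }} in dbt model SQL.
SOURCE_CALL = re.compile(
    r"""\{\{\s*source\(\s*['"]([^'"]+)['"]\s*,\s*['"]([^'"]+)['"]\s*\)\s*\}\}"""
)


def undeclared_sources(model_sql, declared):
    """Return (source, table) pairs used in model_sql but missing from declared.

    declared maps a source name to the set of table or view names listed under
    it in sources.yml (assumed to be parsed into a dict by the caller).
    """
    used = SOURCE_CALL.findall(model_sql)
    return [(src, tbl) for src, tbl in used if tbl not in declared.get(src, set())]


# Hypothetical model and declarations, for illustration only.
MODEL_SQL = """
select r.*, q.score
from {{ source('market', 'v_rates') }} r
join {{ source('dq', 'v_quality_scores') }} q using (rate_id)
"""
DECLARED = {"market": {"v_rates"}}

print(undeclared_sources(MODEL_SQL, DECLARED))  # [('dq', 'v_quality_scores')]
```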

3. Schema-as-data-product

Each SD06 module owns one primary schema. The schema is the unit of the data product:

  • Tables are private implementation detail. Their names, column sets, and partitioning may change across module versions without notice.
  • Views are the published contract. Any other module or external consumer that needs this module's data must reference the view, not the underlying table. A breaking change to a view's column set or semantics requires a version suffix (v_quality_scores_v2) and a deprecation notice before the old view is retired.
  • Dynamic Tables are used for performance-sensitive aggregations. They are still versioned and treated as part of the published contract when referenced by other modules.
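
The versioning rule for published views can be sketched as follows. The ADR does not define which column changes count as breaking, so this example assumes that dropping or retyping a column is breaking and that adding a column is not; the column names and types are hypothetical.

```python
import re


def is_breaking_change(old_columns, new_columns):
    """True if a consumer of the old view could break against the new one.

    Columns are {name: type}. Assumption: dropping or retyping a column is
    breaking; adding a column is not.
    """
    return any(new_columns.get(name) != col_type for name, col_type in old_columns.items())


def next_view_name(current_name):
    """Bump the _vN suffix; an unsuffixed view is treated as version 1."""
    match = re.fullmatch(r"(.+)_v(\d+)", current_name)
    if match:
        return f"{match.group(1)}_v{int(match.group(2)) + 1}"
    return f"{current_name}_v2"


old = {"module_id": "VARCHAR", "score": "NUMBER(5,2)"}
new = {"module_id": "VARCHAR", "score_pct": "NUMBER(5,2)"}  # score was renamed

if is_breaking_change(old, new):
    print(next_view_name("v_quality_scores"))  # v_quality_scores_v2
```

A semantic change with an unchanged column set would not be caught by a check like this and still needs human review.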

4. Configuration lives in Snowflake

Platform-internal runtime configuration — DQ thresholds, calculation parameters, rate card values — belongs in Snowflake. The canonical form is a config table in the owning module's schema, with effective-from/to versioning and an audit trail. dbt variables (dbt_project.yml vars:) are acceptable for compile-time defaults.

AWS SSM is reserved for AWS-service configuration: Lambda ARNs, EventBridge bus ARNs, S3 bucket names, KMS key ARNs. SSM is not used for Snowflake-internal runtime parameters.
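
To make the effective-from/to versioning concrete, the sketch below models config rows in memory and resolves the value in force on a given date. The column names, the key dq.completeness_threshold, and the half-open interval convention (effective_from inclusive, effective_to exclusive) are assumptions for illustration; in SD06 the same lookup would be a query against the module's config table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ConfigRow:
    key: str
    value: str
    effective_from: date
    effective_to: Optional[date]  # None means "still in force"
    changed_by: str               # minimal audit column


def value_as_of(rows, key, as_of):
    """Return the value whose [effective_from, effective_to) window covers as_of."""
    matches = [
        r for r in rows
        if r.key == key
        and r.effective_from <= as_of
        and (r.effective_to is None or as_of < r.effective_to)
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one row for {key!r} on {as_of}, found {len(matches)}")
    return matches[0].value


ROWS = [
    ConfigRow("dq.completeness_threshold", "0.95", date(2026, 1, 1), date(2026, 4, 1), "alice"),
    ConfigRow("dq.completeness_threshold", "0.98", date(2026, 4, 1), None, "bob"),
]

print(value_as_of(ROWS, "dq.completeness_threshold", date(2026, 3, 31)))  # 0.95
print(value_as_of(ROWS, "dq.completeness_threshold", date(2026, 4, 1)))   # 0.98
```

Raising when zero or several rows match surfaces gaps and overlaps in the effective windows instead of silently picking a value.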

5. EventBridge only at external boundaries

An EventBridge event is justified when, and only when, at least one of the following is true:

(a) The subscriber is outside SD06. MOD-076 (observability alerting), MOD-048 (system decision log), MOD-063 (notification orchestration), and SD04/SD08 consumers are legitimate external subscribers. Publishing a data_quality_run_failed event to alert the data engineering team via MOD-076 is correct.

(b) An external system triggers SD06 work. The CDC arrival signal from MOD-042 is the canonical input boundary — Firehose → S3 Iceberg → Snowflake External Table → Stream → Task.

SD06-to-SD06 signals must not use EventBridge. The mechanism for inter-module coordination is the Task DAG (sequencing) and dbt source() references (data dependencies).
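
Rules (a) and (b) reduce to a simple membership check, sketched below. The set of SD06 modules is taken from the module IDs this ADR names as SD06; treating every other ID as external is an assumption of the example, as are the function names.

```python
# Modules this ADR places inside SD06; everything else is treated as external.
SD06_MODULES = {"MOD-032", "MOD-033", "MOD-035", "MOD-036", "MOD-085", "MOD-086", "MOD-098"}


def eventbridge_justified(publisher, subscribers):
    """Justified if any subscriber (rule a) or the publisher (rule b) is outside SD06."""
    external_subscriber = any(s not in SD06_MODULES for s in subscribers)  # rule (a)
    external_trigger = publisher not in SD06_MODULES                       # rule (b)
    return external_subscriber or external_trigger


def subscribers_to_migrate(subscribers):
    """SD06 subscribers that should use dbt source() and the Task DAG instead."""
    return sorted(s for s in subscribers if s in SD06_MODULES)


market_rates_subscribers = ["MOD-025", "MOD-071", "MOD-032", "MOD-035", "MOD-086"]
print(eventbridge_justified("MOD-085", market_rates_subscribers))  # True
print(subscribers_to_migrate(market_rates_subscribers))  # ['MOD-032', 'MOD-035', 'MOD-086']
print(eventbridge_justified("MOD-032", ["MOD-035"]))  # False
```

An event can be justified overall while some of its subscribers still need to migrate, which is why the two checks are kept separate.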

6. Lambda functions — justified uses only

Lambda is appropriate in SD06 for:

  • A thin Task runner that invokes dbt run or dbt test on a cron schedule or in response to an external trigger, bridging the AWS scheduling surface to Snowflake execution.
  • Publishing a single EventBridge event to an external subscriber when a Snowflake Task completes or fails with significance beyond SD06 (e.g., DQ run failed → alert data engineering team).
  • Reading external APIs that have no Snowflake-native connector (e.g., AWS Cost Explorer in MOD-098, Snowflake Marketplace polling in MOD-085).

Lambda is not appropriate for: executing SQL checks against Snowflake rows, mediating SD06-to-SD06 control flow, reading Snowflake query results and re-publishing them as EventBridge events, or storing operational parameters that live inside Snowflake.
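
The second justified use, publishing a single event to an external subscriber, might look like the sketch below. The bus name, the Source value, and the detail fields are assumptions rather than values defined by this ADR. The client is injected so the example runs without AWS access; in a real Lambda it would be boto3.client("events"), whose put_events call takes an Entries list and reports FailedEntryCount in its response.

```python
import json


def build_dq_failed_entry(module_id, run_id, bus_name="risk-platform-bus"):
    """Build one EventBridge entry for a failed DQ run (names are illustrative)."""
    return {
        "EventBusName": bus_name,
        "Source": "bank.risk-platform",
        "DetailType": "data_quality_run_failed",
        "Detail": json.dumps({"module_id": module_id, "run_id": run_id}),
    }


def publish_dq_failed(events_client, module_id, run_id):
    """Publish exactly one event; raise if EventBridge rejects it.

    events_client is expected to behave like boto3.client("events").
    """
    response = events_client.put_events(Entries=[build_dq_failed_entry(module_id, run_id)])
    if response.get("FailedEntryCount", 0) > 0:
        raise RuntimeError(f"EventBridge rejected the event: {response['Entries']}")
    return response


class FakeEventsClient:
    """Stand-in for the boto3 client so the sketch runs without AWS access."""

    def __init__(self):
        self.sent = []

    def put_events(self, Entries):
        self.sent.extend(Entries)
        return {"FailedEntryCount": 0, "Entries": [{"EventId": "fake"} for _ in Entries]}


client = FakeEventsClient()
publish_dq_failed(client, module_id="MOD-032", run_id="2026-05-01T00:00:00Z")
print(client.sent[0]["DetailType"])  # data_quality_run_failed
```

The function carries only identifiers in the event detail and never reads Snowflake query results, which keeps it on the permitted side of the list above.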

Consequences

Positive:

  • Module boundaries are clean and inspectable: each module owns a schema, publishes views, and declares its upstream dependencies in sources.yml. A new engineer can read the dbt DAG and understand the full data lineage.
  • Operational simplicity: the Snowflake Task DAG is inspectable in Snowsight; there are no Lambda timeout logs to diagnose for a missed DQ run.
  • dbt provides version-controlled, tested, documented transformations with column-level lineage.
  • Eliminates the Lambda-as-Snowflake-client anti-pattern and its associated failure modes (connection pool exhaustion, JDBC cold starts, Lambda timeout on large result sets).
  • The DQ halt mechanism is enforced by the infrastructure (Task dependency) rather than relying on consuming modules to subscribe to and honour an EventBridge event.

Negative:

  • Snowflake Tasks add a new operational surface. Task state monitoring and warehouse credit consumption for scheduled Tasks must be accounted for in the platform cost model.
  • The dbt project grows with each SD06 module. Shared macro libraries, test definitions, and packages.yml governance become a coordination concern.
  • Snowflake's task graph size limit (currently 1,000 tasks per graph, including the root task) must be managed for complex dependency chains in later phases.

Guidance for existing modules

MOD-085 (market rates ingestion): The bank.risk-platform.market_rates_updated EventBridge event is correct for SD04 consumers (MOD-025, MOD-071). SD06 consumers (MOD-032, MOD-035, MOD-086) must be refactored to reference the market.* schema via dbt source() — they must not subscribe to the EventBridge event as a trigger.

MOD-098 (cost attribution engine): Reads AWS Cost Explorer API (Lambda justified) and produces Dynamic Tables. The unattributed_cost_threshold_exceeded event has only external subscribers (MOD-076) — correctly scoped. No change required.

MOD-032, MOD-033, MOD-035, MOD-036 and other calculation modules: Where module specs describe consuming results from sibling SD06 modules via EventBridge, replace with dbt source() references and Task DAG sequencing. EventBridge subscriptions to sibling SD06 events must be removed from infra stacks.


Compiled 2026-05-22 from source/entities/adrs/ADR-046.yaml