Skip to content

Snowflake account configuration & governance

ID MOD-102
System SD07
Repo bank-platform
Build status Deployed
Deployed Yes
Last commit ce48ff1

Purpose

Provisions and governs the Snowflake account configuration. This is an IaC module — it does not contain Lambda application code. It runs via the bank-platform CI/CD pipeline when configuration changes are merged, and must be fully deployed before any module that reads from or writes to Snowflake can operate.

This module is the implementation of the decisions in ADR-002 and ADR-035.

Execution pattern

All Snowflake configuration is expressed as versioned SQL scripts — no Terraform, no provider state file. Snowflake's native IaC is SQL; every object (warehouse, database, role, integration, masking policy, tag, task) is created and managed via CREATE OR REPLACE / ALTER DDL. Scripts are version-controlled in bank-platform under snowflake/setup/ and applied in order by the CI/CD pipeline.

dbt core handles all data transformation layer DDL (views, dynamic tables, incremental models). The SQL setup scripts cover account-level objects only — warehouses, databases, schemas, roles, integrations, network policy, and tag/masking infrastructure.

Account structure

Single Snowflake account for all environments. Environments are namespaced by convention, not by separate accounts. This preserves the ability to use Snowflake's Dynamic Data Masking to control what lower-environment users see when accessing replicated production data.

Database namespace convention

BANK_{ENV}_{DOMAIN}
Environment Core KYC AML Payments Credit Risk Platform
prod BANK_PROD_CORE BANK_PROD_KYC BANK_PROD_AML BANK_PROD_PAYMENTS BANK_PROD_CREDIT BANK_PROD_RISK BANK_PROD_PLATFORM
uat BANK_UAT_CORE BANK_UAT_KYC BANK_UAT_AML
dev BANK_DEV_CORE BANK_DEV_KYC BANK_DEV_AML

Schemas within each database mirror the Neon schema names (e.g. BANK_PROD_CORE contains schemas ACCOUNTS, POSTINGS, TREASURY, CONTEXTS, ASSETS).

Lower environment data strategy

Two options, not mutually exclusive:

  • Synthetic data — dev and UAT databases populated from synthetic data generators; no prod data involved
  • Masked prod data — dev and UAT databases populated via Snowflake replication with Dynamic Data Masking policies active; PII columns are masked at query time for all non-privileged roles

Both options are available. The masking infrastructure is provisioned by this module regardless of which strategy is in use at any given time.

Warehouses

Production carries dedicated per-workload warehouses. Dev and UAT share a single minimal warehouse to minimise credit consumption.

Production warehouses

Warehouse Size Use Auto-suspend
PROD_ETL_WH Small CDC ingestion, Firehose landing, batch loads 60s
PROD_ANALYTICS_WH Small (scales to Medium) Interactive queries, BI, ad-hoc 60s
PROD_RISK_WH Medium Risk model computation, LCR/NSFR, stress tests 60s
PROD_DECISIONS_WH X-Small Decision result publication (MOD-079) 30s
PROD_DBT_WH Small dbt core model runs (transformation pipeline) 60s

Resource monitors cap daily credit consumption per warehouse with alerts at 75% and a hard stop at 100%.

Non-production warehouses

Warehouse Size Use Environments
NONPROD_WH X-Small All workloads (ETL, analytics, dbt runs) dev, uat

A single warehouse per non-prod environment prevents idle credit burn from multiple suspended warehouses.

RBAC architecture

Role hierarchy

ACCOUNTADMIN  (Snowflake built-in — break-glass only, credentials in PAM)
  └─ SYSADMIN
        └─ BANK_FUNCTIONAL_ROLE
              ├─ BANK_PROD_CORE_ROLE        (PROD — bank_prod_core DB, read/write)
              ├─ BANK_PROD_KYC_ROLE         (PROD — bank_prod_kyc DB)
              ├─ BANK_PROD_AML_ROLE
              ├─ BANK_PROD_PAYMENTS_ROLE
              ├─ BANK_PROD_CREDIT_ROLE
              ├─ BANK_PROD_RISK_ROLE        (PROD — read/write)
              ├─ BANK_PROD_PLATFORM_ROLE    (PROD — ETL write)
              ├─ BANK_NONPROD_CORE_ROLE     (dev + uat — all BANK_DEV_CORE + BANK_UAT_CORE)
              ├─ BANK_NONPROD_KYC_ROLE
              ├─ … (one per domain for non-prod)
              ├─ BANK_ANALYTICS_ROLE        (SELECT on approved prod views — masked)
              ├─ BANK_REPORTING_ROLE        (SELECT on regulatory reporting schemas — prod)
              ├─ BANK_DEVELOPER_ROLE        (SELECT on all dev + uat databases; no prod access)
              └─ BANK_DBT_ROLE              (CREATE/REPLACE on transformation schemas — dbt service account)

SECURITYADMIN
  └─ BANK_SECURITY_ROLE   (manages grants and masking policy assignments — not used for data access)

Developer access

BANK_DEVELOPER_ROLE grants: - SELECT on all BANK_DEV_* and BANK_UAT_* databases - No access to BANK_PROD_* databases - Usage on NONPROD_WH

Developer role assignment is managed via EntraID AD group sync (see Identity section). No manual Snowflake user creation or role grants for individual developers.

Service account roles

Each system domain's Lambda execution role authenticates to Snowflake via key-pair authentication and assumes the corresponding prod or non-prod domain role. Keys are stored in Secrets Manager (MOD-045), scoped per environment.

Identity and access — EntraID SSO

EntraID is external to the bank software boundary. This section documents the Snowflake-side configuration required and what must be configured in EntraID by the identity team.

Snowflake-side setup (provisioned by this module)

-- Enable SCIM for EntraID provisioning
CREATE OR REPLACE SECURITY INTEGRATION bank_entra_scim
  TYPE = SCIM
  SCIM_CLIENT = 'AZURE'
  RUN_AS_ROLE = BANK_SECURITY_ROLE;

-- SAML SSO integration for interactive login
CREATE OR REPLACE SECURITY INTEGRATION bank_entra_sso
  TYPE = SAML2
  SAML2_ISSUER = 'https://sts.windows.net/{tenant-id}/'
  SAML2_SSO_URL = 'https://login.microsoftonline.com/{tenant-id}/saml2'
  SAML2_PROVIDER = 'CUSTOM'
  SAML2_X509_CERT = '<certificate from EntraID app registration>';

EntraID configuration required (identity team — external)

Item Detail
Enterprise app Create "Snowflake" enterprise app in EntraID tenant
SCIM provisioning Enable SCIM provisioning to Snowflake SCIM endpoint; provision bearer token from the SCIM integration above
User lifecycle SCIM creates Snowflake users on EntraID assignment; disables/deletes on offboarding
AD group → role mapping bank-snowflake-developersBANK_DEVELOPER_ROLE; bank-snowflake-analystsBANK_ANALYTICS_ROLE; bank-snowflake-reportingBANK_REPORTING_ROLE; domain service accounts use key-pair auth, not SSO
MFA policy Conditional Access policy requiring MFA for all Snowflake SSO sessions

Service account connections (Lambda, dbt, CI/CD) use key-pair authentication and are not subject to SSO. Only human interactive logins go via EntraID SAML.

Integrations

Storage integration — S3

S3 access for CDC landing, Iceberg data lake, and file export stages:

CREATE OR REPLACE STORAGE INTEGRATION bank_s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = (
    's3://bank-snowflake-prod-landing/',
    's3://bank-iceberg-prod/',
    's3://bank-export-prod/'
  );

Notification integration — SQS

SQS for Snowpipe auto-ingest from CDC Firehose landing:

CREATE OR REPLACE NOTIFICATION INTEGRATION bank_sqs_integration
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AWS_SQS
  ENABLED = TRUE
  AWS_SQS_ARN = '<SQS ARN from MOD-104>';

Git integration — CI/CD and dbt

Snowflake Git integration connects directly to the bank-platform repository. This enables dbt models to be executed natively within Snowflake without an external orchestrator, using Snowflake's embedded dbt support (Snowflake Notebook + dbt Core via Snowpark Container Services):

CREATE OR REPLACE API INTEGRATION bank_github_integration
  API_PROVIDER = git_https_api
  API_ALLOWED_PREFIXES = ('https://github.com/totara-bank/')
  ENABLED = TRUE;

CREATE OR REPLACE GIT REPOSITORY bank_platform_repo
  API_INTEGRATION = bank_github_integration
  ORIGIN = 'https://github.com/totara-bank/bank-platform';

dbt models are fetched from bank_platform_repo and executed by Snowflake Tasks on the PROD_DBT_WH. This replaces any external dbt runner — transformation runs are entirely internal to Snowflake.

Network policy

PrivateLink is not configured at this stage (cost not justified for current scale). Network access is controlled by an IP allowlist network policy:

CREATE OR REPLACE NETWORK POLICY bank_network_policy
  ALLOWED_IP_LIST = (
    '<AWS NAT gateway EIP — NZ>',
    '<AWS NAT gateway EIP — AU>',
    '<CI/CD runner IP range>',
    '<Admin bastion IP — PAM-gated, MOD-046>'
  );

ALTER ACCOUNT SET NETWORK_POLICY = bank_network_policy;

No public internet access from application code. All Lambda-to-Snowflake traffic exits via the AWS NAT gateway with a fixed EIP, which is included in the allowlist. PrivateLink is documented as the natural upgrade path when data volume and cost profile warrant it.

dbt core — transformation pipeline

dbt core handles all data transformation: model definitions, incremental materialisation, data mart construction, Dynamic Table definitions, and regulatory view construction. Schemachange is not used.

dbt runs natively within Snowflake via the Git repository integration and Snowflake Tasks. The embedded dbt execution path means no external orchestration dependency for transformation runs.

Responsibilities: - Schema creation (CREATE SCHEMA IF NOT EXISTS) for transformation layers - Model materialisation (views, tables, incremental, Dynamic Tables) - Data lineage documentation (dbt docs) - Data tests (schema tests, custom tests on business rules)

Each system domain that writes to Snowflake has a corresponding dbt project directory under bank-platform/dbt/. The BANK_DBT_ROLE / BANK_DBT_WH are the execution context.

PII tagging and dynamic data masking

Object tags

A PII_CLASSIFICATION tag is applied to all columns containing personal or sensitive data:

CREATE OR REPLACE TAG pii_classification
  ALLOWED_VALUES 'PII_HIGH', 'PII_MEDIUM', 'FINANCIAL', 'INTERNAL', 'PUBLIC';

PII_HIGH covers: name, date of birth, tax identifiers (IRD/TFN), passport/driver licence numbers, biometric data. PII_MEDIUM covers: email, phone, address, account numbers. FINANCIAL covers: balances, transaction amounts, credit scores.

Tags are applied to columns as part of the dbt model definitions (meta: {pii_classification: PII_HIGH}) and enforced via the BANK_SECURITY_ROLE governance process.

Masking policies

Dynamic Data Masking policies are bound to the PII_CLASSIFICATION tag. In BANK_PROD_* databases: - Roles with BANK_PROD_{DOMAIN}_ROLE see unmasked data (service accounts only) - BANK_ANALYTICS_ROLE and BANK_REPORTING_ROLE see masked values (e.g. '***' for names, last-4-digits for account numbers) - BANK_DEVELOPER_ROLE has no access to prod databases at all

In BANK_DEV_* and BANK_UAT_* databases, masking policies are present but pass-through — lower environments hold either synthetic data or masked-at-source prod data. The same policy infrastructure applies in all environments for consistency.

Internal orchestration

Snowflake Tasks and Streams drive all internal pipeline triggers. No external orchestrator is required for Snowflake-internal workloads.

Key tasks provisioned by this module: - CDC ingestion trigger — Task on PROD_ETL_WH polling the Snowpipe ingest completion stream - dbt model refresh — Task on PROD_DBT_WH executing the dbt Git-based run on schedule - Dynamic Table refresh scheduling — managed by Snowflake natively once DTs are defined by dbt - Regulatory report generation — Task on PROD_RISK_WH triggering MOD-086 report queries on schedule

Task dependency graphs are defined in the SQL setup scripts. All tasks are BANK_FUNCTIONAL_ROLE-owned.

File export

COPY INTO external S3 stages is the mechanism for all outbound file generation from Snowflake. Use cases requiring file export:

Use case Stage Format Consumer
Regulatory submissions bank-export-prod/regulatory/ CSV / pipe-delimited RBNZ, FMA, AUSTRAC — via MOD-086
Audit data extracts bank-export-prod/audit/ Parquet External auditors, internal audit
Third-party data feeds bank-export-prod/partners/ CSV Configured per agreement
Credit bureau submissions bank-export-prod/credit-bureau/ Fixed-width / CSV MOD-059

The bank_s3_integration storage integration covers all export stages. Export tasks are triggered by Snowflake Tasks (see internal orchestration above).

Observability → MOD-076

Snowflake account metadata is exported to the observability platform (MOD-076) for monitoring, alerting, and cost governance. A scheduled Task runs on PROD_ANALYTICS_WH to export from ACCOUNT_USAGE views to a nominated S3 path, where MOD-076 ingests it.

Exported datasets:

Source view Content Frequency
ACCOUNT_USAGE.QUERY_HISTORY Query execution, latency, credit consumption per query Hourly
ACCOUNT_USAGE.METERING_HISTORY Credit consumption per warehouse Daily
ACCOUNT_USAGE.TASK_HISTORY Task execution status, errors, duration Hourly
ACCOUNT_USAGE.COPY_HISTORY Snowpipe and COPY INTO completions/failures Hourly
ACCOUNT_USAGE.LOGIN_HISTORY Authentication events (success/failure) Daily
ACCOUNT_USAGE.GRANT_PRIVILEGES_HISTORY Role grant changes Daily

Alerting on: warehouse credit overage (resource monitor threshold), task failures, login anomalies. Alerts route through MOD-076's alerting pipeline.


Module dependencies

Depends on

Module Title Required? Contract Reason
MOD-076 Observability platform Optional MOD-076 consumes Snowflake data and binds to BANK_SNS_INTEGRATION; it does not block MOD-102 deployment.
MOD-104 AWS shared infrastructure bootstrap Required AWS S3 storage integration, SQS notification integration, and export stage S3 buckets require the shared AWS resources provisioned by MOD-104 to exist first.

Required by

Module Title As Contract
MOD-017 ML behavioural scoring model Hard dependency
MOD-032 LCR / NSFR calculator Hard dependency
MOD-033 RWA & capital ratio engine Hard dependency
MOD-034 Stress testing scenario engine Hard dependency
MOD-035 IRRBB / EVE / NII model Hard dependency
MOD-036 Prudential return builder (RBNZ / APRA) Hard dependency
MOD-037 AUSTRAC / RBNZ AML reporting pipeline Hard dependency
MOD-038 Data quality & reconciliation monitor Hard dependency
MOD-039 Customer risk score model Hard dependency
MOD-040 Churn & health score engine Hard dependency
MOD-041 Categorisation & merchant enrichment model Hard dependency
MOD-042 CDC pipeline — Neon logical replication to S3 Iceberg Hard dependency
MOD-055 Onboarding fraud scoring engine Hard dependency
MOD-056 Compliance visibility engine Hard dependency
MOD-057 Statistical returns & survey engine Hard dependency
MOD-058 Regulatory incident & breach notification engine Hard dependency
MOD-060 FATCA/CRS/AEOI reporting engine Hard dependency
MOD-080 Statutory financial reporting & ERP integration Hard dependency
MOD-085 Market rates ingestion & normalisation Hard dependency
MOD-086 Funds transfer pricing engine Hard dependency
MOD-098 Cost attribution engine Hard dependency
MOD-099 Infrastructure cost reports Hard dependency
MOD-101 Wealth intelligence engine Hard dependency
MOD-105 Product eligibility engine Hard dependency
MOD-106 ROTE engine Hard dependency
MOD-147 Related party exposure monitor Hard dependency
MOD-150 Risk management platform Hard dependency
MOD-152 Climate risk assessment Hard dependency
MOD-170 Regulatory Submissions Portal Hard dependency
MOD-171 Risk Intelligence Dashboard Hard dependency
MOD-172 Operations & Model Intelligence Dashboard Hard dependency
MOD-173 Model risk register & inventory Hard dependency
MOD-174 Model performance monitoring & drift detection Hard dependency
MOD-176 Snowflake read API service Hard dependency

Policies satisfied

Policy Title Mode How
DT-001 Information Security Policy GATE Snowflake RBAC roles enforce schema-level access boundaries — no system domain can read another domain's raw tables without an explicit cross-domain read grant reviewed by the data governance board.
DT-002 Cybersecurity Policy AUTO All schema transformations are applied through the version-controlled dbt core pipeline — no ad-hoc schema modifications permitted in production.
DT-004 Data Governance Policy AUTO The single governed CDC pipeline (MOD-042) is the only path for operational data to enter Snowflake — enforced by storage integration policy and warehouse grants.
GOV-007 Conflicts of Interest Policy LOG All warehouse activity, login events, and grant changes are captured in Snowflake's account_usage schema and retained for seven years via the ACCOUNT_USAGE database.

Capabilities satisfied

(No capabilities mapped)


Part of SD07 — Data Platform & Governance Infrastructure Compiled 2026-05-22 from source/entities/modules/MOD-102.yaml