Snowflake account configuration & governance¶


ID	`MOD-102`
System	SD07
Repo	`bank-platform`
Build status	Deployed
Deployed	Yes
Last commit	`ce48ff1`

Purpose¶

Provisions and governs the Snowflake account configuration. This is an IaC module — it does not contain Lambda application code. It runs via the bank-platform CI/CD pipeline when configuration changes are merged, and must be fully deployed before any module that reads from or writes to Snowflake can operate.

This module is the implementation of the decisions in ADR-002 and ADR-035.

Execution pattern¶

All Snowflake configuration is expressed as versioned SQL scripts — no Terraform, no provider state file. Snowflake's native IaC is SQL; every object (warehouse, database, role, integration, masking policy, tag, task) is created and managed via CREATE OR REPLACE / ALTER DDL. Scripts are version-controlled in bank-platform under snowflake/setup/ and applied in order by the CI/CD pipeline.

dbt core handles all data transformation layer DDL (views, dynamic tables, incremental models). The SQL setup scripts cover account-level objects only — warehouses, databases, schemas, roles, integrations, network policy, and tag/masking infrastructure.

Account structure¶

Single Snowflake account for all environments. Environments are namespaced by convention, not by separate accounts. This preserves the ability to use Snowflake's Dynamic Data Masking to control what lower-environment users see when accessing replicated production data.

Database namespace convention¶

BANK_{ENV}_{DOMAIN}

Environment	Core	KYC	AML	Payments	Credit	Risk	Platform
prod	`BANK_PROD_CORE`	`BANK_PROD_KYC`	`BANK_PROD_AML`	`BANK_PROD_PAYMENTS`	`BANK_PROD_CREDIT`	`BANK_PROD_RISK`	`BANK_PROD_PLATFORM`
uat	`BANK_UAT_CORE`	`BANK_UAT_KYC`	`BANK_UAT_AML`	…	…	…	…
dev	`BANK_DEV_CORE`	`BANK_DEV_KYC`	`BANK_DEV_AML`	…	…	…	…

Schemas within each database mirror the Neon schema names (e.g. BANK_PROD_CORE contains schemas ACCOUNTS, POSTINGS, TREASURY, CONTEXTS, ASSETS).

Lower environment data strategy¶

Two options, not mutually exclusive:

Synthetic data — dev and UAT databases populated from synthetic data generators; no prod data involved
Masked prod data — dev and UAT databases populated via Snowflake replication with Dynamic Data Masking policies active; PII columns are masked at query time for all non-privileged roles

Both options are available. The masking infrastructure is provisioned by this module regardless of which strategy is in use at any given time.

Warehouses¶

Production carries dedicated per-workload warehouses. Dev and UAT share a single minimal warehouse to minimise credit consumption.

Production warehouses¶

Warehouse	Size	Use	Auto-suspend
`PROD_ETL_WH`	Small	CDC ingestion, Firehose landing, batch loads	60s
`PROD_ANALYTICS_WH`	Small (scales to Medium)	Interactive queries, BI, ad-hoc	60s
`PROD_RISK_WH`	Medium	Risk model computation, LCR/NSFR, stress tests	60s
`PROD_DECISIONS_WH`	X-Small	Decision result publication (MOD-079)	30s
`PROD_DBT_WH`	Small	dbt core model runs (transformation pipeline)	60s

Resource monitors cap daily credit consumption per warehouse with alerts at 75% and a hard stop at 100%.

Non-production warehouses¶

Warehouse	Size	Use	Environments
`NONPROD_WH`	X-Small	All workloads (ETL, analytics, dbt runs)	dev, uat

A single warehouse per non-prod environment prevents idle credit burn from multiple suspended warehouses.

RBAC architecture¶

Role hierarchy¶

ACCOUNTADMIN  (Snowflake built-in — break-glass only, credentials in PAM)
  └─ SYSADMIN
        └─ BANK_FUNCTIONAL_ROLE
              ├─ BANK_PROD_CORE_ROLE        (PROD — bank_prod_core DB, read/write)
              ├─ BANK_PROD_KYC_ROLE         (PROD — bank_prod_kyc DB)
              ├─ BANK_PROD_AML_ROLE
              ├─ BANK_PROD_PAYMENTS_ROLE
              ├─ BANK_PROD_CREDIT_ROLE
              ├─ BANK_PROD_RISK_ROLE        (PROD — read/write)
              ├─ BANK_PROD_PLATFORM_ROLE    (PROD — ETL write)
              ├─ BANK_NONPROD_CORE_ROLE     (dev + uat — all BANK_DEV_CORE + BANK_UAT_CORE)
              ├─ BANK_NONPROD_KYC_ROLE
              ├─ … (one per domain for non-prod)
              ├─ BANK_ANALYTICS_ROLE        (SELECT on approved prod views — masked)
              ├─ BANK_REPORTING_ROLE        (SELECT on regulatory reporting schemas — prod)
              ├─ BANK_DEVELOPER_ROLE        (SELECT on all dev + uat databases; no prod access)
              └─ BANK_DBT_ROLE              (CREATE/REPLACE on transformation schemas — dbt service account)

SECURITYADMIN
  └─ BANK_SECURITY_ROLE   (manages grants and masking policy assignments — not used for data access)

Developer access¶

BANK_DEVELOPER_ROLE grants: - SELECT on all BANK_DEV_* and BANK_UAT_* databases - No access to BANK_PROD_* databases - Usage on NONPROD_WH

Developer role assignment is managed via EntraID AD group sync (see Identity section). No manual Snowflake user creation or role grants for individual developers.

Service account roles¶

Each system domain's Lambda execution role authenticates to Snowflake via key-pair authentication and assumes the corresponding prod or non-prod domain role. Keys are stored in Secrets Manager (MOD-045), scoped per environment.

Identity and access — EntraID SSO¶

EntraID is external to the bank software boundary. This section documents the Snowflake-side configuration required and what must be configured in EntraID by the identity team.

Snowflake-side setup (provisioned by this module)¶

-- Enable SCIM for EntraID provisioning
CREATE OR REPLACE SECURITY INTEGRATION bank_entra_scim
  TYPE = SCIM
  SCIM_CLIENT = 'AZURE'
  RUN_AS_ROLE = BANK_SECURITY_ROLE;

-- SAML SSO integration for interactive login
CREATE OR REPLACE SECURITY INTEGRATION bank_entra_sso
  TYPE = SAML2
  SAML2_ISSUER = 'https://sts.windows.net/{tenant-id}/'
  SAML2_SSO_URL = 'https://login.microsoftonline.com/{tenant-id}/saml2'
  SAML2_PROVIDER = 'CUSTOM'
  SAML2_X509_CERT = '<certificate from EntraID app registration>';

EntraID configuration required (identity team — external)¶

Item	Detail
Enterprise app	Create "Snowflake" enterprise app in EntraID tenant
SCIM provisioning	Enable SCIM provisioning to Snowflake SCIM endpoint; provision bearer token from the SCIM integration above
User lifecycle	SCIM creates Snowflake users on EntraID assignment; disables/deletes on offboarding
AD group → role mapping	`bank-snowflake-developers` → `BANK_DEVELOPER_ROLE`; `bank-snowflake-analysts` → `BANK_ANALYTICS_ROLE`; `bank-snowflake-reporting` → `BANK_REPORTING_ROLE`; domain service accounts use key-pair auth, not SSO
MFA policy	Conditional Access policy requiring MFA for all Snowflake SSO sessions

Service account connections (Lambda, dbt, CI/CD) use key-pair authentication and are not subject to SSO. Only human interactive logins go via EntraID SAML.

Integrations¶

Storage integration — S3¶

S3 access for CDC landing, Iceberg data lake, and file export stages:

CREATE OR REPLACE STORAGE INTEGRATION bank_s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = (
    's3://bank-snowflake-prod-landing/',
    's3://bank-iceberg-prod/',
    's3://bank-export-prod/'
  );

Notification integration — SQS¶

SQS for Snowpipe auto-ingest from CDC Firehose landing:

CREATE OR REPLACE NOTIFICATION INTEGRATION bank_sqs_integration
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AWS_SQS
  ENABLED = TRUE
  AWS_SQS_ARN = '<SQS ARN from MOD-104>';

Git integration — CI/CD and dbt¶

Snowflake Git integration connects directly to the bank-platform repository. This enables dbt models to be executed natively within Snowflake without an external orchestrator, using Snowflake's embedded dbt support (Snowflake Notebook + dbt Core via Snowpark Container Services):

CREATE OR REPLACE API INTEGRATION bank_github_integration
  API_PROVIDER = git_https_api
  API_ALLOWED_PREFIXES = ('https://github.com/totara-bank/')
  ENABLED = TRUE;

CREATE OR REPLACE GIT REPOSITORY bank_platform_repo
  API_INTEGRATION = bank_github_integration
  ORIGIN = 'https://github.com/totara-bank/bank-platform';

dbt models are fetched from bank_platform_repo and executed by Snowflake Tasks on the PROD_DBT_WH. This replaces any external dbt runner — transformation runs are entirely internal to Snowflake.

Network policy¶

PrivateLink is not configured at this stage (cost not justified for current scale). Network access is controlled by an IP allowlist network policy:

CREATE OR REPLACE NETWORK POLICY bank_network_policy
  ALLOWED_IP_LIST = (
    '<AWS NAT gateway EIP — NZ>',
    '<AWS NAT gateway EIP — AU>',
    '<CI/CD runner IP range>',
    '<Admin bastion IP — PAM-gated, MOD-046>'
  );

ALTER ACCOUNT SET NETWORK_POLICY = bank_network_policy;

No public internet access from application code. All Lambda-to-Snowflake traffic exits via the AWS NAT gateway with a fixed EIP, which is included in the allowlist. PrivateLink is documented as the natural upgrade path when data volume and cost profile warrant it.

dbt core — transformation pipeline¶

dbt core handles all data transformation: model definitions, incremental materialisation, data mart construction, Dynamic Table definitions, and regulatory view construction. Schemachange is not used.

dbt runs natively within Snowflake via the Git repository integration and Snowflake Tasks. The embedded dbt execution path means no external orchestration dependency for transformation runs.

Responsibilities: - Schema creation (CREATE SCHEMA IF NOT EXISTS) for transformation layers - Model materialisation (views, tables, incremental, Dynamic Tables) - Data lineage documentation (dbt docs) - Data tests (schema tests, custom tests on business rules)

Each system domain that writes to Snowflake has a corresponding dbt project directory under bank-platform/dbt/. The BANK_DBT_ROLE / BANK_DBT_WH are the execution context.

PII tagging and dynamic data masking¶

Object tags¶

A PII_CLASSIFICATION tag is applied to all columns containing personal or sensitive data:

CREATE OR REPLACE TAG pii_classification
  ALLOWED_VALUES 'PII_HIGH', 'PII_MEDIUM', 'FINANCIAL', 'INTERNAL', 'PUBLIC';

PII_HIGH covers: name, date of birth, tax identifiers (IRD/TFN), passport/driver licence numbers, biometric data. PII_MEDIUM covers: email, phone, address, account numbers. FINANCIAL covers: balances, transaction amounts, credit scores.

Tags are applied to columns as part of the dbt model definitions (meta: {pii_classification: PII_HIGH}) and enforced via the BANK_SECURITY_ROLE governance process.

Masking policies¶

Dynamic Data Masking policies are bound to the PII_CLASSIFICATION tag. In BANK_PROD_* databases: - Roles with BANK_PROD_{DOMAIN}_ROLE see unmasked data (service accounts only) - BANK_ANALYTICS_ROLE and BANK_REPORTING_ROLE see masked values (e.g. '***' for names, last-4-digits for account numbers) - BANK_DEVELOPER_ROLE has no access to prod databases at all

In BANK_DEV_* and BANK_UAT_* databases, masking policies are present but pass-through — lower environments hold either synthetic data or masked-at-source prod data. The same policy infrastructure applies in all environments for consistency.

Internal orchestration¶

Snowflake Tasks and Streams drive all internal pipeline triggers. No external orchestrator is required for Snowflake-internal workloads.

Key tasks provisioned by this module: - CDC ingestion trigger — Task on PROD_ETL_WH polling the Snowpipe ingest completion stream - dbt model refresh — Task on PROD_DBT_WH executing the dbt Git-based run on schedule - Dynamic Table refresh scheduling — managed by Snowflake natively once DTs are defined by dbt - Regulatory report generation — Task on PROD_RISK_WH triggering MOD-086 report queries on schedule

Task dependency graphs are defined in the SQL setup scripts. All tasks are BANK_FUNCTIONAL_ROLE-owned.

File export¶

COPY INTO external S3 stages is the mechanism for all outbound file generation from Snowflake. Use cases requiring file export:

Use case	Stage	Format	Consumer
Regulatory submissions	`bank-export-prod/regulatory/`	CSV / pipe-delimited	RBNZ, FMA, AUSTRAC — via MOD-086
Audit data extracts	`bank-export-prod/audit/`	Parquet	External auditors, internal audit
Third-party data feeds	`bank-export-prod/partners/`	CSV	Configured per agreement
Credit bureau submissions	`bank-export-prod/credit-bureau/`	Fixed-width / CSV	MOD-059

The bank_s3_integration storage integration covers all export stages. Export tasks are triggered by Snowflake Tasks (see internal orchestration above).

Observability → MOD-076¶

Snowflake account metadata is exported to the observability platform (MOD-076) for monitoring, alerting, and cost governance. A scheduled Task runs on PROD_ANALYTICS_WH to export from ACCOUNT_USAGE views to a nominated S3 path, where MOD-076 ingests it.

Exported datasets:

Source view	Content	Frequency
`ACCOUNT_USAGE.QUERY_HISTORY`	Query execution, latency, credit consumption per query	Hourly
`ACCOUNT_USAGE.METERING_HISTORY`	Credit consumption per warehouse	Daily
`ACCOUNT_USAGE.TASK_HISTORY`	Task execution status, errors, duration	Hourly
`ACCOUNT_USAGE.COPY_HISTORY`	Snowpipe and COPY INTO completions/failures	Hourly
`ACCOUNT_USAGE.LOGIN_HISTORY`	Authentication events (success/failure)	Daily
`ACCOUNT_USAGE.GRANT_PRIVILEGES_HISTORY`	Role grant changes	Daily

Alerting on: warehouse credit overage (resource monitor threshold), task failures, login anomalies. Alerts route through MOD-076's alerting pipeline.

Module dependencies¶

Depends on¶

Module	Title	Required?	Contract	Reason
MOD-076	Observability platform	Optional	—	MOD-076 consumes Snowflake data and binds to BANK_SNS_INTEGRATION; it does not block MOD-102 deployment.
MOD-104	AWS shared infrastructure bootstrap	Required	—	AWS S3 storage integration, SQS notification integration, and export stage S3 buckets require the shared AWS resources provisioned by MOD-104 to exist first.

Required by¶

Module	Title	As	Contract
MOD-017	ML behavioural scoring model	Hard dependency	—
MOD-032	LCR / NSFR calculator	Hard dependency	—
MOD-033	RWA & capital ratio engine	Hard dependency	—
MOD-034	Stress testing scenario engine	Hard dependency	—
MOD-035	IRRBB / EVE / NII model	Hard dependency	—
MOD-036	Prudential return builder (RBNZ / APRA)	Hard dependency	—
MOD-037	AUSTRAC / RBNZ AML reporting pipeline	Hard dependency	—
MOD-038	Data quality & reconciliation monitor	Hard dependency	—
MOD-039	Customer risk score model	Hard dependency	—
MOD-040	Churn & health score engine	Hard dependency	—
MOD-041	Categorisation & merchant enrichment model	Hard dependency	—
MOD-042	CDC pipeline — Neon logical replication to S3 Iceberg	Hard dependency	—
MOD-055	Onboarding fraud scoring engine	Hard dependency	—
MOD-056	Compliance visibility engine	Hard dependency	—
MOD-057	Statistical returns & survey engine	Hard dependency	—
MOD-058	Regulatory incident & breach notification engine	Hard dependency	—
MOD-060	FATCA/CRS/AEOI reporting engine	Hard dependency	—
MOD-080	Statutory financial reporting & ERP integration	Hard dependency	—
MOD-085	Market rates ingestion & normalisation	Hard dependency	—
MOD-086	Funds transfer pricing engine	Hard dependency	—
MOD-098	Cost attribution engine	Hard dependency	—
MOD-099	Infrastructure cost reports	Hard dependency	—
MOD-101	Wealth intelligence engine	Hard dependency	—
MOD-105	Product eligibility engine	Hard dependency	—
MOD-106	ROTE engine	Hard dependency	—
MOD-147	Related party exposure monitor	Hard dependency	—
MOD-150	Risk management platform	Hard dependency	—
MOD-152	Climate risk assessment	Hard dependency	—
MOD-170	Regulatory Submissions Portal	Hard dependency	—
MOD-171	Risk Intelligence Dashboard	Hard dependency	—
MOD-172	Operations & Model Intelligence Dashboard	Hard dependency	—
MOD-173	Model risk register & inventory	Hard dependency	—
MOD-174	Model performance monitoring & drift detection	Hard dependency	—
MOD-176	Snowflake read API service	Hard dependency	—

Policies satisfied¶

Policy	Title	Mode	How
DT-001	Information Security Policy	`GATE`	Snowflake RBAC roles enforce schema-level access boundaries — no system domain can read another domain's raw tables without an explicit cross-domain read grant reviewed by the data governance board.
DT-002	Cybersecurity Policy	`AUTO`	All schema transformations are applied through the version-controlled dbt core pipeline — no ad-hoc schema modifications permitted in production.
DT-004	Data Governance Policy	`AUTO`	The single governed CDC pipeline (MOD-042) is the only path for operational data to enter Snowflake — enforced by storage integration policy and warehouse grants.
GOV-007	Conflicts of Interest Policy	`LOG`	All warehouse activity, login events, and grant changes are captured in Snowflake's account_usage schema and retained for seven years via the ACCOUNT_USAGE database.

Capabilities satisfied¶

(No capabilities mapped)

Part of SD07 — Data Platform & Governance Infrastructure Compiled 2026-05-22 from source/entities/modules/MOD-102.yaml