Snowflake account configuration & governance¶
| ID | MOD-102 |
| System | SD07 |
| Repo | bank-platform |
| Build status | Deployed |
| Deployed | Yes |
| Last commit | ce48ff1 |
Purpose¶
Provisions and governs the Snowflake account configuration. This is an IaC module — it does not contain Lambda application code. It runs via the bank-platform CI/CD pipeline when configuration changes are merged, and must be fully deployed before any module that reads from or writes to Snowflake can operate.
This module is the implementation of the decisions in ADR-002 and ADR-035.
Execution pattern¶
All Snowflake configuration is expressed as versioned SQL scripts — no Terraform, no provider state file. Snowflake's native IaC is SQL; every object (warehouse, database, role, integration, masking policy, tag, task) is created and managed via CREATE OR REPLACE / ALTER DDL. Scripts are version-controlled in bank-platform under snowflake/setup/ and applied in order by the CI/CD pipeline.
dbt core handles all data transformation layer DDL (views, dynamic tables, incremental models). The SQL setup scripts cover account-level objects only — warehouses, databases, schemas, roles, integrations, network policy, and tag/masking infrastructure.
Account structure¶
Single Snowflake account for all environments. Environments are namespaced by convention, not by separate accounts. This preserves the ability to use Snowflake's Dynamic Data Masking to control what lower-environment users see when accessing replicated production data.
Database namespace convention¶
| Environment | Core | KYC | AML | Payments | Credit | Risk | Platform |
|---|---|---|---|---|---|---|---|
| prod | BANK_PROD_CORE |
BANK_PROD_KYC |
BANK_PROD_AML |
BANK_PROD_PAYMENTS |
BANK_PROD_CREDIT |
BANK_PROD_RISK |
BANK_PROD_PLATFORM |
| uat | BANK_UAT_CORE |
BANK_UAT_KYC |
BANK_UAT_AML |
… | … | … | … |
| dev | BANK_DEV_CORE |
BANK_DEV_KYC |
BANK_DEV_AML |
… | … | … | … |
Schemas within each database mirror the Neon schema names (e.g. BANK_PROD_CORE contains schemas ACCOUNTS, POSTINGS, TREASURY, CONTEXTS, ASSETS).
Lower environment data strategy¶
Two options, not mutually exclusive:
- Synthetic data — dev and UAT databases populated from synthetic data generators; no prod data involved
- Masked prod data — dev and UAT databases populated via Snowflake replication with Dynamic Data Masking policies active; PII columns are masked at query time for all non-privileged roles
Both options are available. The masking infrastructure is provisioned by this module regardless of which strategy is in use at any given time.
Warehouses¶
Production carries dedicated per-workload warehouses. Dev and UAT share a single minimal warehouse to minimise credit consumption.
Production warehouses¶
| Warehouse | Size | Use | Auto-suspend |
|---|---|---|---|
PROD_ETL_WH |
Small | CDC ingestion, Firehose landing, batch loads | 60s |
PROD_ANALYTICS_WH |
Small (scales to Medium) | Interactive queries, BI, ad-hoc | 60s |
PROD_RISK_WH |
Medium | Risk model computation, LCR/NSFR, stress tests | 60s |
PROD_DECISIONS_WH |
X-Small | Decision result publication (MOD-079) | 30s |
PROD_DBT_WH |
Small | dbt core model runs (transformation pipeline) | 60s |
Resource monitors cap daily credit consumption per warehouse with alerts at 75% and a hard stop at 100%.
Non-production warehouses¶
| Warehouse | Size | Use | Environments |
|---|---|---|---|
NONPROD_WH |
X-Small | All workloads (ETL, analytics, dbt runs) | dev, uat |
A single warehouse per non-prod environment prevents idle credit burn from multiple suspended warehouses.
RBAC architecture¶
Role hierarchy¶
ACCOUNTADMIN (Snowflake built-in — break-glass only, credentials in PAM)
└─ SYSADMIN
└─ BANK_FUNCTIONAL_ROLE
├─ BANK_PROD_CORE_ROLE (PROD — bank_prod_core DB, read/write)
├─ BANK_PROD_KYC_ROLE (PROD — bank_prod_kyc DB)
├─ BANK_PROD_AML_ROLE
├─ BANK_PROD_PAYMENTS_ROLE
├─ BANK_PROD_CREDIT_ROLE
├─ BANK_PROD_RISK_ROLE (PROD — read/write)
├─ BANK_PROD_PLATFORM_ROLE (PROD — ETL write)
├─ BANK_NONPROD_CORE_ROLE (dev + uat — all BANK_DEV_CORE + BANK_UAT_CORE)
├─ BANK_NONPROD_KYC_ROLE
├─ … (one per domain for non-prod)
├─ BANK_ANALYTICS_ROLE (SELECT on approved prod views — masked)
├─ BANK_REPORTING_ROLE (SELECT on regulatory reporting schemas — prod)
├─ BANK_DEVELOPER_ROLE (SELECT on all dev + uat databases; no prod access)
└─ BANK_DBT_ROLE (CREATE/REPLACE on transformation schemas — dbt service account)
SECURITYADMIN
└─ BANK_SECURITY_ROLE (manages grants and masking policy assignments — not used for data access)
Developer access¶
BANK_DEVELOPER_ROLE grants:
- SELECT on all BANK_DEV_* and BANK_UAT_* databases
- No access to BANK_PROD_* databases
- Usage on NONPROD_WH
Developer role assignment is managed via EntraID AD group sync (see Identity section). No manual Snowflake user creation or role grants for individual developers.
Service account roles¶
Each system domain's Lambda execution role authenticates to Snowflake via key-pair authentication and assumes the corresponding prod or non-prod domain role. Keys are stored in Secrets Manager (MOD-045), scoped per environment.
Identity and access — EntraID SSO¶
EntraID is external to the bank software boundary. This section documents the Snowflake-side configuration required and what must be configured in EntraID by the identity team.
Snowflake-side setup (provisioned by this module)¶
-- Enable SCIM for EntraID provisioning
CREATE OR REPLACE SECURITY INTEGRATION bank_entra_scim
TYPE = SCIM
SCIM_CLIENT = 'AZURE'
RUN_AS_ROLE = BANK_SECURITY_ROLE;
-- SAML SSO integration for interactive login
CREATE OR REPLACE SECURITY INTEGRATION bank_entra_sso
TYPE = SAML2
SAML2_ISSUER = 'https://sts.windows.net/{tenant-id}/'
SAML2_SSO_URL = 'https://login.microsoftonline.com/{tenant-id}/saml2'
SAML2_PROVIDER = 'CUSTOM'
SAML2_X509_CERT = '<certificate from EntraID app registration>';
EntraID configuration required (identity team — external)¶
| Item | Detail |
|---|---|
| Enterprise app | Create "Snowflake" enterprise app in EntraID tenant |
| SCIM provisioning | Enable SCIM provisioning to Snowflake SCIM endpoint; provision bearer token from the SCIM integration above |
| User lifecycle | SCIM creates Snowflake users on EntraID assignment; disables/deletes on offboarding |
| AD group → role mapping | bank-snowflake-developers → BANK_DEVELOPER_ROLE; bank-snowflake-analysts → BANK_ANALYTICS_ROLE; bank-snowflake-reporting → BANK_REPORTING_ROLE; domain service accounts use key-pair auth, not SSO |
| MFA policy | Conditional Access policy requiring MFA for all Snowflake SSO sessions |
Service account connections (Lambda, dbt, CI/CD) use key-pair authentication and are not subject to SSO. Only human interactive logins go via EntraID SAML.
Integrations¶
Storage integration — S3¶
S3 access for CDC landing, Iceberg data lake, and file export stages:
CREATE OR REPLACE STORAGE INTEGRATION bank_s3_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = (
's3://bank-snowflake-prod-landing/',
's3://bank-iceberg-prod/',
's3://bank-export-prod/'
);
Notification integration — SQS¶
SQS for Snowpipe auto-ingest from CDC Firehose landing:
CREATE OR REPLACE NOTIFICATION INTEGRATION bank_sqs_integration
TYPE = QUEUE
NOTIFICATION_PROVIDER = AWS_SQS
ENABLED = TRUE
AWS_SQS_ARN = '<SQS ARN from MOD-104>';
Git integration — CI/CD and dbt¶
Snowflake Git integration connects directly to the bank-platform repository. This enables dbt models to be executed natively within Snowflake without an external orchestrator, using Snowflake's embedded dbt support (Snowflake Notebook + dbt Core via Snowpark Container Services):
CREATE OR REPLACE API INTEGRATION bank_github_integration
API_PROVIDER = git_https_api
API_ALLOWED_PREFIXES = ('https://github.com/totara-bank/')
ENABLED = TRUE;
CREATE OR REPLACE GIT REPOSITORY bank_platform_repo
API_INTEGRATION = bank_github_integration
ORIGIN = 'https://github.com/totara-bank/bank-platform';
dbt models are fetched from bank_platform_repo and executed by Snowflake Tasks on the PROD_DBT_WH. This replaces any external dbt runner — transformation runs are entirely internal to Snowflake.
Network policy¶
PrivateLink is not configured at this stage (cost not justified for current scale). Network access is controlled by an IP allowlist network policy:
CREATE OR REPLACE NETWORK POLICY bank_network_policy
ALLOWED_IP_LIST = (
'<AWS NAT gateway EIP — NZ>',
'<AWS NAT gateway EIP — AU>',
'<CI/CD runner IP range>',
'<Admin bastion IP — PAM-gated, MOD-046>'
);
ALTER ACCOUNT SET NETWORK_POLICY = bank_network_policy;
No public internet access from application code. All Lambda-to-Snowflake traffic exits via the AWS NAT gateway with a fixed EIP, which is included in the allowlist. PrivateLink is documented as the natural upgrade path when data volume and cost profile warrant it.
dbt core — transformation pipeline¶
dbt core handles all data transformation: model definitions, incremental materialisation, data mart construction, Dynamic Table definitions, and regulatory view construction. Schemachange is not used.
dbt runs natively within Snowflake via the Git repository integration and Snowflake Tasks. The embedded dbt execution path means no external orchestration dependency for transformation runs.
Responsibilities:
- Schema creation (CREATE SCHEMA IF NOT EXISTS) for transformation layers
- Model materialisation (views, tables, incremental, Dynamic Tables)
- Data lineage documentation (dbt docs)
- Data tests (schema tests, custom tests on business rules)
Each system domain that writes to Snowflake has a corresponding dbt project directory under bank-platform/dbt/. The BANK_DBT_ROLE / BANK_DBT_WH are the execution context.
PII tagging and dynamic data masking¶
Object tags¶
A PII_CLASSIFICATION tag is applied to all columns containing personal or sensitive data:
CREATE OR REPLACE TAG pii_classification
ALLOWED_VALUES 'PII_HIGH', 'PII_MEDIUM', 'FINANCIAL', 'INTERNAL', 'PUBLIC';
PII_HIGH covers: name, date of birth, tax identifiers (IRD/TFN), passport/driver licence numbers, biometric data.
PII_MEDIUM covers: email, phone, address, account numbers.
FINANCIAL covers: balances, transaction amounts, credit scores.
Tags are applied to columns as part of the dbt model definitions (meta: {pii_classification: PII_HIGH}) and enforced via the BANK_SECURITY_ROLE governance process.
Masking policies¶
Dynamic Data Masking policies are bound to the PII_CLASSIFICATION tag. In BANK_PROD_* databases:
- Roles with BANK_PROD_{DOMAIN}_ROLE see unmasked data (service accounts only)
- BANK_ANALYTICS_ROLE and BANK_REPORTING_ROLE see masked values (e.g. '***' for names, last-4-digits for account numbers)
- BANK_DEVELOPER_ROLE has no access to prod databases at all
In BANK_DEV_* and BANK_UAT_* databases, masking policies are present but pass-through — lower environments hold either synthetic data or masked-at-source prod data. The same policy infrastructure applies in all environments for consistency.
Internal orchestration¶
Snowflake Tasks and Streams drive all internal pipeline triggers. No external orchestrator is required for Snowflake-internal workloads.
Key tasks provisioned by this module:
- CDC ingestion trigger — Task on PROD_ETL_WH polling the Snowpipe ingest completion stream
- dbt model refresh — Task on PROD_DBT_WH executing the dbt Git-based run on schedule
- Dynamic Table refresh scheduling — managed by Snowflake natively once DTs are defined by dbt
- Regulatory report generation — Task on PROD_RISK_WH triggering MOD-086 report queries on schedule
Task dependency graphs are defined in the SQL setup scripts. All tasks are BANK_FUNCTIONAL_ROLE-owned.
File export¶
COPY INTO external S3 stages is the mechanism for all outbound file generation from Snowflake. Use cases requiring file export:
| Use case | Stage | Format | Consumer |
|---|---|---|---|
| Regulatory submissions | bank-export-prod/regulatory/ |
CSV / pipe-delimited | RBNZ, FMA, AUSTRAC — via MOD-086 |
| Audit data extracts | bank-export-prod/audit/ |
Parquet | External auditors, internal audit |
| Third-party data feeds | bank-export-prod/partners/ |
CSV | Configured per agreement |
| Credit bureau submissions | bank-export-prod/credit-bureau/ |
Fixed-width / CSV | MOD-059 |
The bank_s3_integration storage integration covers all export stages. Export tasks are triggered by Snowflake Tasks (see internal orchestration above).
Observability → MOD-076¶
Snowflake account metadata is exported to the observability platform (MOD-076) for monitoring, alerting, and cost governance. A scheduled Task runs on PROD_ANALYTICS_WH to export from ACCOUNT_USAGE views to a nominated S3 path, where MOD-076 ingests it.
Exported datasets:
| Source view | Content | Frequency |
|---|---|---|
ACCOUNT_USAGE.QUERY_HISTORY |
Query execution, latency, credit consumption per query | Hourly |
ACCOUNT_USAGE.METERING_HISTORY |
Credit consumption per warehouse | Daily |
ACCOUNT_USAGE.TASK_HISTORY |
Task execution status, errors, duration | Hourly |
ACCOUNT_USAGE.COPY_HISTORY |
Snowpipe and COPY INTO completions/failures | Hourly |
ACCOUNT_USAGE.LOGIN_HISTORY |
Authentication events (success/failure) | Daily |
ACCOUNT_USAGE.GRANT_PRIVILEGES_HISTORY |
Role grant changes | Daily |
Alerting on: warehouse credit overage (resource monitor threshold), task failures, login anomalies. Alerts route through MOD-076's alerting pipeline.
Module dependencies¶
Depends on¶
| Module | Title | Required? | Contract | Reason |
|---|---|---|---|---|
| MOD-076 | Observability platform | Optional | — | MOD-076 consumes Snowflake data and binds to BANK_SNS_INTEGRATION; it does not block MOD-102 deployment. |
| MOD-104 | AWS shared infrastructure bootstrap | Required | — | AWS S3 storage integration, SQS notification integration, and export stage S3 buckets require the shared AWS resources provisioned by MOD-104 to exist first. |
Required by¶
| Module | Title | As | Contract |
|---|---|---|---|
| MOD-017 | ML behavioural scoring model | Hard dependency | — |
| MOD-032 | LCR / NSFR calculator | Hard dependency | — |
| MOD-033 | RWA & capital ratio engine | Hard dependency | — |
| MOD-034 | Stress testing scenario engine | Hard dependency | — |
| MOD-035 | IRRBB / EVE / NII model | Hard dependency | — |
| MOD-036 | Prudential return builder (RBNZ / APRA) | Hard dependency | — |
| MOD-037 | AUSTRAC / RBNZ AML reporting pipeline | Hard dependency | — |
| MOD-038 | Data quality & reconciliation monitor | Hard dependency | — |
| MOD-039 | Customer risk score model | Hard dependency | — |
| MOD-040 | Churn & health score engine | Hard dependency | — |
| MOD-041 | Categorisation & merchant enrichment model | Hard dependency | — |
| MOD-042 | CDC pipeline — Neon logical replication to S3 Iceberg | Hard dependency | — |
| MOD-055 | Onboarding fraud scoring engine | Hard dependency | — |
| MOD-056 | Compliance visibility engine | Hard dependency | — |
| MOD-057 | Statistical returns & survey engine | Hard dependency | — |
| MOD-058 | Regulatory incident & breach notification engine | Hard dependency | — |
| MOD-060 | FATCA/CRS/AEOI reporting engine | Hard dependency | — |
| MOD-080 | Statutory financial reporting & ERP integration | Hard dependency | — |
| MOD-085 | Market rates ingestion & normalisation | Hard dependency | — |
| MOD-086 | Funds transfer pricing engine | Hard dependency | — |
| MOD-098 | Cost attribution engine | Hard dependency | — |
| MOD-099 | Infrastructure cost reports | Hard dependency | — |
| MOD-101 | Wealth intelligence engine | Hard dependency | — |
| MOD-105 | Product eligibility engine | Hard dependency | — |
| MOD-106 | ROTE engine | Hard dependency | — |
| MOD-147 | Related party exposure monitor | Hard dependency | — |
| MOD-150 | Risk management platform | Hard dependency | — |
| MOD-152 | Climate risk assessment | Hard dependency | — |
| MOD-170 | Regulatory Submissions Portal | Hard dependency | — |
| MOD-171 | Risk Intelligence Dashboard | Hard dependency | — |
| MOD-172 | Operations & Model Intelligence Dashboard | Hard dependency | — |
| MOD-173 | Model risk register & inventory | Hard dependency | — |
| MOD-174 | Model performance monitoring & drift detection | Hard dependency | — |
| MOD-176 | Snowflake read API service | Hard dependency | — |
Policies satisfied¶
| Policy | Title | Mode | How |
|---|---|---|---|
| DT-001 | Information Security Policy | GATE |
Snowflake RBAC roles enforce schema-level access boundaries — no system domain can read another domain's raw tables without an explicit cross-domain read grant reviewed by the data governance board. |
| DT-002 | Cybersecurity Policy | AUTO |
All schema transformations are applied through the version-controlled dbt core pipeline — no ad-hoc schema modifications permitted in production. |
| DT-004 | Data Governance Policy | AUTO |
The single governed CDC pipeline (MOD-042) is the only path for operational data to enter Snowflake — enforced by storage integration policy and warehouse grants. |
| GOV-007 | Conflicts of Interest Policy | LOG |
All warehouse activity, login events, and grant changes are captured in Snowflake's account_usage schema and retained for seven years via the ACCOUNT_USAGE database. |
Capabilities satisfied¶
(No capabilities mapped)
Part of SD07 — Data Platform & Governance Infrastructure
Compiled 2026-05-22 from source/entities/modules/MOD-102.yaml