Data architecture¶
Core principle¶
Postgres is the operational source of truth. Snowflake is the analytical brain. Kafka is the nerve system connecting them. Nothing flows backwards — data moves from Postgres → Kafka → Snowflake. Operational results from Snowflake write back to Postgres via a controlled API.
OLTP layer — Postgres¶
Purpose: Every transactional write. Source of truth for all financial positions.
Key schemas:
- ledger — ledger_entries (append-only), accounts, balances
- customers — customer records, kyc_status, risk_score (write-back)
- kyc — kyc_records, cdd_decisions, sanctions_matches
- payments — payment_instructions, payment_status, fx_conversions
- products — product_instances, interest_rates, account_features
Critical rules:
- ledger_entries is append-only — enforced by trigger. No UPDATE or DELETE ever.
- All balance reads for payment authorisation use this layer — never Snowflake
- Write latency target: p99 ≤ 10ms (NFR-012)
- Read latency target: p99 ≤ 5ms for balance queries (NFR-013)
CDC pipeline — Debezium¶
Debezium reads the Postgres WAL (write-ahead log) and publishes a change event to Kafka for every INSERT, UPDATE, and DELETE across all operational tables. Schema-versioned via Confluent Schema Registry.
Latency target: Postgres commit → Kafka delivery ≤ 1s p99 (NFR-015)
Event streaming — Kafka¶
Topic naming: bank.[domain].[event-type]
| Topic | Source | Consumers |
|---|---|---|
bank.transactions.created |
bank-core | bank-aml, bank-risk-platform, bank-app |
bank.accounts.updated |
bank-core | bank-risk-platform, bank-app |
bank.kyc.events |
bank-kyc | bank-aml, bank-risk-platform |
bank.payments.completed |
bank-payments | bank-core, bank-app |
bank.ledger.entries |
bank-core | bank-risk-platform |
OLAP layer — Snowflake¶
Purpose: All analytical computation. Risk models, regulatory reporting, ML training, customer intelligence.
Key objects: - Dynamic tables — continuously refreshed from Kafka via Snowpipe Streaming - Cortex models — fraud scoring, categorisation, credit scoring, AML typology - Regulatory marts — capital/liquidity, RBNZ/APRA returns, AML reporting
Latency from Postgres commit to Snowflake availability: typically 5–15 seconds via Snowpipe Streaming.
Write-back pattern¶
Snowflake computes values that need to be read on the transaction path. These are written back to Postgres via a controlled write-back API — never via direct DB connection.
| Value | Snowflake model | Write-back target | Latency target |
|---|---|---|---|
| Customer risk score | customer_risk_score | customers.risk_score |
≤ 60s (NFR-014) |
| Fraud score | fraud_transaction_model | payments.fraud_score |
≤ 60s (NFR-014) |
| Credit pre-approval | credit_preapproval_model | credit_offers table |
Nightly batch |
| AML typology score | aml_typology_model | aml_scores table |
≤ 60s |
Rule: If a value is needed on the transaction path, it must be in Postgres. Snowflake is never called inline on a payment or account operation.
Data residency¶
All Postgres data for NZ customers stored in AWS ap-southeast-2 (Sydney) or ap-southeast-4 (Melbourne). AU customer data in AU regions. No NZ customer PII in US or EU regions. Snowflake account configured for AU data residency with no cross-region replication of PII.
Related¶
- Architectural principles — AP-002 (data governance), AP-008 (real time) and AP-001 (KISS) are the primary principles governing this architecture
- ADR-001 — Postgres as the OLTP operational store
- ADR-002 — Snowflake as the analytics and risk compute platform
- ADR-003 — Kafka as the real-time event streaming backbone
- ADR-013 — External data sharing from Snowflake