Technical design: MOD-003 Real-time balance engine
- Module: MOD-003
- System: SD01 Core Banking
- Repo: bank-core
- FR scope: FR-053..056, FR-429..432
- NFR scope: NFR-012, NFR-013, NFR-019
- Policies satisfied: PAY-001 (GATE), CLQ-002 (CALC), CON-005 (AUTO), CLQ-004 (CALC)
- Author: AI coding agent (Claude)
- Date: 2026-04-30
Objective
MOD-003 is the read-side of SD01 account state. MOD-001 already keeps
accounts.accounts.balance and available_balance in sync inside the
posting transaction (FR-053's write path); MOD-003 owns the
synchronous balance read API, the hold lifecycle (FR-429), historical
balance reconstruction from MOD-002's immutable transaction log
(FR-056), the daily EOD balance snapshot for regulatory point-in-time
queries (FR-431), and the publication of bank.core.balance_updated
to downstream consumers. Every payment validator (MOD-020), fraud
scorer (MOD-023), liquidity engine (MOD-032), IRRBB engine (MOD-035),
customer dashboard (MOD-077), and transaction history view (MOD-070)
calls into this module.
Internal architecture
API Gateway HTTP API
GET /internal/v1/balance/{account_id} ─▶ Mod003BalanceQueryHandler
GET /internal/v1/balance/by-party/{party_id} ─▶ Mod003BalanceQueryHandler
GET /internal/v1/balance/{account_id}/at ─▶ Mod003ReconstructionHandler
POST /internal/v1/holds ─▶ Mod003HoldsHandler
POST /internal/v1/holds/{hold_id}/release ─▶ Mod003HoldsHandler
EventBridge bank-core ─▶ posting_completed rule ─▶ Mod003BalanceUpdatedPublisher
└─▶ bank.core.balance_updated
EventBridge schedules:
cron(55 11 * * ? *) — NZ EOD, 23:55 NZST ─▶ Mod003EodSnapshotJob
cron(55 13 * * ? *) — AU EOD, 23:55 AEST ─▶ Mod003EodSnapshotJob
rate(5 minutes) — hold expiry sweep ─▶ Mod003HoldExpirySweeper
Six Lambdas; one HTTP API with five routes; one EventBridge consumer rule plus three schedule rules; two CloudWatch alarms; one dashboard.
Key design decisions
Decision: MOD-001 keeps the writer path (orchestrator A1)
Context: FR-053 attributes balance maintenance to MOD-003. MOD-001
already updates accounts.accounts.balance / available_balance
atomically inside the posting transaction.
Choice: MOD-001 keeps the writer; MOD-003 is the read surface + hold management + reconstruction + EOD snapshot + event publisher.
Reason: Smallest scope; preserves NFR-012 posting latency; treats FR-053 as already-satisfied by MOD-001's atomic update. MOD-003 asserts the read reflects the live denormalised values.
Decision: optimistic locking for hold writes only (orchestrator A2)
Context: FR-430 specifies optimistic locking with a version
counter. MOD-001's posting flow uses pessimistic SELECT … FOR UPDATE.
Choice: Pessimistic on the posting hot path, optimistic on the
hold-write path. The version int NOT NULL DEFAULT 0 column on
accounts.accounts (added by V002) is incremented atomically inside
both writers — MOD-001's FOR UPDATE makes the increment serialisable;
MOD-003's hold writes do UPDATE … WHERE version = $expected and
retry up to 3 times on row-count = 0.
Reason: Avoids invasive changes to MOD-001's posting hot path
while satisfying FR-430's contract. The two strategies coexist
because FOR UPDATE blocks the optimistic UPDATE until the posting
commits, at which point the retry succeeds with the post-image
version. After 3 failed retries → CONCURRENT_MODIFICATION (HTTP 503
+ retryable per the FR).
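The retry contract on the hold-write path can be sketched as follows. This is a minimal illustration, not the bank-core implementation: `updateWithOptimisticLock`, `VersionedUpdate`, and `ConcurrentModificationError` are hypothetical names, the callbacks are synchronous so the control flow is easy to follow (a real handler would await a database call), and it assumes "retry up to 3 times" means one initial attempt plus three retries.

```typescript
// Sketch only: every name here is hypothetical, not taken from bank-core.

/** One UPDATE … WHERE version = $expected attempt; returns the affected row count (0 or 1). */
type VersionedUpdate = (expectedVersion: number) => number;

class ConcurrentModificationError extends Error {
  readonly code = "CONCURRENT_MODIFICATION";
  readonly httpStatus = 503;
  readonly retryable = true;
  constructor(attempts: number) {
    super(`version check failed on all ${attempts} attempts`);
    this.name = "ConcurrentModificationError";
  }
}

/**
 * Runs the initial attempt plus up to `maxRetries` retries (assumed reading of the FR).
 * Returns how many retries were needed; 0 means the first attempt succeeded.
 */
function updateWithOptimisticLock(
  readVersion: () => number,
  tryUpdate: VersionedUpdate,
  maxRetries = 3,
): number {
  for (let retry = 0; retry <= maxRetries; retry++) {
    // Re-read on every attempt so a retry sees the version left by the competing writer.
    const expected = readVersion();
    if (tryUpdate(expected) === 1) {
      return retry;
    }
  }
  throw new ConcurrentModificationError(maxRetries + 1);
}
```

A row count of 0 is the only signal the loop needs: it means another writer advanced the version between the read and the update, so the next attempt starts from a fresh read.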
Decision: reconstruction reads from MOD-002 (orchestrator A6)
Context: Both accounts.postings (MOD-001) and core.transaction_log
(MOD-002) contain the same data. FR-056 is a "match the stored balance"
check.
Choice: Replay signed amounts from core.transaction_log. The
hash chain verifies its own integrity (FR-427); using it as the
source decouples MOD-003 from MOD-001's storage layout.
Reason: Independent verification path. If the live balance ever drifts from the immutable log, the FR-056 endpoint surfaces it.
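The replay-and-compare idea can be illustrated with a small in-memory sketch. The `LogEntry` shape, the field names, the inclusive `asOf` boundary, and the use of integer minor units are all assumptions made for the example; the real core.transaction_log schema is not shown in this document.

```typescript
// Sketch only: LogEntry and both functions are hypothetical, not the MOD-002 schema.
interface LogEntry {
  accountId: string;
  postedAt: string;          // ISO-8601 timestamp
  signedAmountMinor: number; // credits positive, debits negative, in integer minor units
}

/** Sums the signed amounts for one account up to and including `asOf`. */
function reconstructBalance(
  log: readonly LogEntry[],
  accountId: string,
  asOf: Date,
): number {
  let balance = 0;
  for (const entry of log) {
    if (entry.accountId !== accountId) continue;
    if (new Date(entry.postedAt).getTime() > asOf.getTime()) continue;
    balance += entry.signedAmountMinor;
  }
  return balance;
}

/** Difference between the stored balance and the replayed one; 0 means they agree. */
function balanceDrift(
  storedBalanceMinor: number,
  log: readonly LogEntry[],
  accountId: string,
  asOf: Date,
): number {
  return storedBalanceMinor - reconstructBalance(log, accountId, asOf);
}
```

A non-zero drift is the condition the FR-056 endpoint is meant to surface; how it is reported is outside the scope of this sketch.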
Decision: EOD snapshots are append-only
Context: FR-431 requires daily EOD snapshots, retained 7 years, "to support regulatory point-in-time balance queries".
Choice: accounts.daily_balance_snapshots is append-only —
INSERT-only role grant + RLS policies that block UPDATE / DELETE
(same pattern as MOD-002 core.transaction_log). Re-running the
snapshot for the same date is a no-op via ON CONFLICT (snapshot_date,
account_id) DO NOTHING.
Reason: A regulatory snapshot must be tamper-evident. Anyone who needs to "correct" a snapshot inserts a one-off audit record at a synthetic date — never overwrites the original.
Decision: hold expiry computed at read time + scheduled cleanup
Context: FR-429 says active holds reduce available balance and expire after 24h (default).
Choice: The LATERAL aggregate in balance-reader.ts filters
by expires_at > now() AND released_at IS NULL, so an expired hold
ceases to reduce available balance the instant now() rolls past
expires_at. A scheduled sweeper Lambda (rate(5 minutes)) flips
released_at = now() on expired rows so the partial index stays small.
Reason: Read-time freshness does not depend on the sweeper's cadence; the sweeper is bookkeeping, not a correctness mechanism.
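The split between read-time filtering and the sweeper can be modelled in memory. The sketch below mirrors the `expires_at > now() AND released_at IS NULL` predicate; the `PendingHold` shape and the function names are hypothetical, and it assumes available balance is simply the balance minus the sum of active holds.

```typescript
// Sketch only: an in-memory model of the predicate, not the SQL in balance-reader.ts.
interface PendingHold {
  amountMinor: number;
  expiresAt: Date;
  releasedAt: Date | null;
}

/** Same condition as the read query: not released and not yet expired. */
function isActiveHold(hold: PendingHold, now: Date): boolean {
  return hold.releasedAt === null && hold.expiresAt.getTime() > now.getTime();
}

/** Balance minus active holds (assumed formula). */
function availableBalance(
  balanceMinor: number,
  holds: readonly PendingHold[],
  now: Date,
): number {
  const held = holds
    .filter((hold) => isActiveHold(hold, now))
    .reduce((sum, hold) => sum + hold.amountMinor, 0);
  return balanceMinor - held;
}

/** Sweeper step: stamps releasedAt on expired, unreleased holds; returns how many it touched. */
function sweepExpiredHolds(holds: PendingHold[], now: Date): number {
  let swept = 0;
  for (const hold of holds) {
    if (hold.releasedAt === null && hold.expiresAt.getTime() <= now.getTime()) {
      hold.releasedAt = now;
      swept++;
    }
  }
  return swept;
}
```

In this model, running `sweepExpiredHolds` never changes what `availableBalance` returns for the same `now`, which is the property the decision relies on.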
Decision: per-jurisdiction EOD cron
Context: NZ and AU have different "end of day" wall-clocks (NZST = UTC+12, AEST = UTC+10).
Choice: Two separate aws.cloudwatch.EventRules, each invoking
the EOD Lambda with { jurisdiction: "NZ" | "AU" } in the input.
The Lambda's SQL filters accounts.accounts WHERE jurisdiction = $1.
Reason: Each jurisdiction's snapshot reflects its own true EOD. Daylight-saving handling is deferred to a follow-up: today the cron runs on standard-time offsets and accepts the 1-hour slip across DST.
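The two schedule expressions follow from the offsets by simple modular arithmetic, and the same arithmetic shows what the DST slip looks like. The helper names below are hypothetical and the sketch assumes whole-hour UTC offsets.

```typescript
// Sketch only: hypothetical helpers, whole-hour offsets assumed.

/** EventBridge cron expression (UTC) for a daily local wall-clock time. */
function utcCronForLocalEod(
  localHour: number,
  minute: number,
  utcOffsetHours: number,
): string {
  const utcHour = (((localHour - utcOffsetHours) % 24) + 24) % 24;
  return `cron(${minute} ${utcHour} * * ? *)`;
}

/** Local hour at which a schedule pinned to `utcHour` fires, for a given offset. */
function localHourWhenFired(utcHour: number, utcOffsetHours: number): number {
  return (((utcHour + utcOffsetHours) % 24) + 24) % 24;
}

const nzCron = utcCronForLocalEod(23, 55, 12); // NZST = UTC+12
const auCron = utcCronForLocalEod(23, 55, 10); // AEST = UTC+10

// During daylight time the offsets are UTC+13 (NZDT) and UTC+11 (AEDT),
// so the standard-time schedules fire one local hour later.
const nzLocalHourInDst = localHourWhenFired(11, 13);
const auLocalHourInDst = localHourWhenFired(13, 11);
```

With these inputs `nzCron` and `auCron` reproduce the two expressions in the architecture section, and both DST hours come out as 0. That is, under daylight time the standard-time schedule fires at 00:55 local on the following calendar day, which is worth keeping in mind when the follow-up decides how the snapshot date is assigned.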
External dependencies
- Database: bank_core on Neon (provisioned by MOD-103)
    - READ: accounts.accounts, accounts.pending_holds, accounts.account_party_relationships, core.transaction_log
    - WRITE: accounts.pending_holds, accounts.daily_balance_snapshots, accounts.accounts.version
- EventBridge (bank-core bus)
    - Consumes: bank.core.posting_completed
    - Publishes: bank.core.balance_updated
- Secrets Manager: bank-neon/{stage}/bank_core/app_user
- SSM (read):
    - /bank/{stage}/eventbridge/bank-core/arn
    - /bank/{stage}/eventbridge/bank-core/dlq-arn
    - /bank/{stage}/iam/lambda/bank-core/arn
    - /bank/{stage}/observability/adot-nodejs-layer-arn
    - /bank/{stage}/sns/alerts/arn
    - /bank/{stage}/mod-002/transaction-log-table
SSM outputs table
| Output | SSM path | Consumers |
|---|---|---|
| Balance API base URL | /bank/{stage}/mod-003/api/base-url | MOD-020, MOD-023, MOD-070, MOD-077, MOD-032, MOD-035 |
| Single-account URL | /bank/{stage}/mod-003/balance/url | (alias) |
| Multi-account URL | /bank/{stage}/mod-003/balance/multi/url | MOD-077, MOD-074 |
| Holds URL | /bank/{stage}/mod-003/holds/url | MOD-020, MOD-023 |
| Reconstruction URL | /bank/{stage}/mod-003/reconstruct/url | MOD-018, MOD-074 |
| Daily snapshot table | /bank/{stage}/mod-003/daily-snapshot-table | MOD-036, MOD-042 |
| Lambda ARNs | /bank/{stage}/mod-003/{balance-query,holds}-lambda/arn | MOD-020 (direct invoke if selected) |
Security and data handling
- No customer PII flows through MOD-003; UUIDs and money amounts only.
- The EOD snapshot table is append-only at the DB layer (privilege revoke + RLS), defending the regulatory point-in-time query against tampering by the runtime app role.
- Holds carry payment_id and release_reason only; no document or free-text customer data.
Performance approach
- NFR-013 ≤ 5 ms p99 balance read: a single SELECT on accounts.accounts joined to a LATERAL aggregate of pending_holds. Both keys are indexed (accounts_pkey, idx_pending_holds_account_id_active) so the read is a primary-key lookup + sub-millisecond LATERAL. withConnection (no BEGIN/COMMIT) avoids the transaction overhead for read-only queries. Real verification is the staging in-region load test; the integration NFR-013 check here bounds dev single-read latency at 1 s as a regression gate (laptop ↔ Sydney RTT dominates).
- NFR-013 ≤ 20 ms p99 multi-account read (FR-432): keyset on idx_acct_party_rel_party_id filters down to ≤ 50 relationships before joining accounts.
- ADOT Node.js layer attached to all six Lambdas; X-Ray spans flow automatically with trace_id correlation.
Error handling
- Sync HTTP paths: standard error envelope per the error-handling standard (HTTP 422 / 503 / 500). CONCURRENT_MODIFICATION is HTTP 503 + retryable: true (FR-430 contract); the caller retries with the same idempotency_key.
- EventBridge consumer (posting_completed → balance_updated): re-raise on transient failures so EventBridge retries; the bank-core DLQ catches after retry exhaustion.
- Scheduled paths (EOD, sweeper): re-raise on transient failures so the next scheduled run retries; an alarm trips if errors ≥ 3 in 5 min.
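For the sync HTTP paths, the mapping from error code to response might look like the sketch below. The envelope's field layout is an assumption (the error-handling standard is not reproduced in this document), as are the `VALIDATION_FAILED` and `INTERNAL_ERROR` code names, the 422-for-validation pairing, and the choice to mark 500 as non-retryable. Only the 503 + `retryable: true` pairing for `CONCURRENT_MODIFICATION` comes from the text above.

```typescript
// Sketch only: the envelope shape and two of the three codes are assumptions.
type ErrorCode = "VALIDATION_FAILED" | "CONCURRENT_MODIFICATION" | "INTERNAL_ERROR";

interface ErrorEnvelope {
  statusCode: number;
  body: {
    error: { code: ErrorCode; message: string; retryable: boolean; trace_id: string };
  };
}

// Status and retryability per code. Only the CONCURRENT_MODIFICATION row is from the design.
const ERROR_POLICY: Record<ErrorCode, { statusCode: number; retryable: boolean }> = {
  VALIDATION_FAILED: { statusCode: 422, retryable: false },
  CONCURRENT_MODIFICATION: { statusCode: 503, retryable: true },
  INTERNAL_ERROR: { statusCode: 500, retryable: false },
};

/** Builds the HTTP response for a failed sync request. */
function toErrorEnvelope(code: ErrorCode, message: string, traceId: string): ErrorEnvelope {
  const policy = ERROR_POLICY[code];
  return {
    statusCode: policy.statusCode,
    body: {
      error: { code, message, retryable: policy.retryable, trace_id: traceId },
    },
  };
}
```

Keeping status and retryability in one table makes the FR-430 contract a single row that a contract test can assert against.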
Event types emitted in structured logs
Registered in src/lib/logger.ts (EVENT_TYPES):
- balance_query_served, balance_multi_query_served, balance_reconstruction_served
- hold_created, hold_released, hold_expired
- balance_updated_published, balance_updated_publish_failed
- eod_snapshot_completed
- concurrent_modification_retried
- trace_id_missing_from_upstream, validation_failed, internal_error
Test approach
| Tier | Files | Status |
|---|---|---|
| Unit | tests/unit/{amount,logger,trace,errors,emf,hold-math}.test.ts | 29 / 29 |
| Contract | tests/contract/{balance-updated-event,balance-response,holds-request-response}.test.ts | 6 / 6 |
| FR integration (one per FR) | tests/integration/fr-{053,054,055,056,429,430,431,432}.test.ts + observability-log-schema.test.ts | pending dev Neon |
| Policy satisfaction (one per row) | tests/policy/{pay-001,clq-002,con-005,clq-004}.test.ts | pending dev Neon |
The skipIfNoDb + transactionLogExists guards keep the integration
tier green-with-skips while dev Neon's compute is unreachable; once
dev is back the tests run unconditionally.