
ADR-028: Document storage — S3 and Postgres metadata

Status: Accepted
Date: 2026-04-10
Deciders: CTO, Chief Risk Officer
Affects repos: bank-kyc, bank-aml, bank-core, bank-platform, bank-app

Context

The bank handles three categories of documents with distinct access patterns, retention requirements, and compliance obligations:

  1. KYC uploads — customer-submitted identity documents, selfies, and proof of address collected during onboarding
  2. Customer-facing documents — bank-generated statements, product disclosure statements, loan contracts, and notices
  3. Audit and internal documents — system decision logs, regulatory submissions, case files, and immutable compliance records

Each category has different readers (customers vs staff vs regulators), different retention periods, and different mutability requirements. A single storage strategy must handle all three without leaking access across categories.

Decision

AWS S3 (ap-southeast-2) for all document storage. Postgres metadata table (per domain, within the domain's Neon database per ADR-024) for queryable document metadata. Pre-signed URLs for all document access.

Storage tiers

Tier | Contents | Encryption | Retention | Access | Mutability
KYC uploads | Customer ID docs, selfies, proof of address | SSE-KMS (customer-managed key) | 7 years (AML/CFT) | Internal staff only, via case tooling | Immutable after upload
Customer documents | Statements, contracts, PDS, notices | SSE-S3 | 7 years (financial records) | Customer via app (pre-signed URL); staff via back-office | Immutable once generated
Audit/internal | Decision logs, regulatory reports, case files | SSE-KMS | Per record type (7–10 years) | Staff and regulators only | S3 Object Lock (WORM) where NFR-024 applies
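The tier table can also be read as configuration. The sketch below (Python, standard library only) encodes each tier's policy and computes the earliest date a document may be deleted. The bucket names are hypothetical, and because audit retention varies by record type, the 10-year upper bound is assumed for that tier.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TierPolicy:
    bucket: str           # hypothetical bucket name; one bucket per tier
    sse: str              # S3 ServerSideEncryption value: "aws:kms" or "AES256"
    retention_years: int
    object_lock: bool     # WORM; the tier table specifies it for the audit tier only


TIERS = {
    "kyc": TierPolicy("bank-kyc-uploads-prod", "aws:kms", 7, False),
    "customer": TierPolicy("bank-customer-docs-prod", "AES256", 7, False),
    # Audit retention is per record type (7-10 years); the 10-year maximum is assumed.
    "audit": TierPolicy("bank-audit-records-prod", "aws:kms", 10, True),
}


def add_years(moment: datetime, years: int) -> datetime:
    """Add calendar years; 29 February maps to 28 February in a non-leap target year."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        return moment.replace(year=moment.year + years, day=28)


def retain_until(tier: str, created_at: datetime) -> datetime:
    """Earliest moment a document in `tier` may be deleted."""
    return add_years(created_at, TIERS[tier].retention_years)
```

For the audit tier, `retain_until` would be a natural value for the `ObjectLockRetainUntilDate` parameter that boto3's `put_object` accepts alongside `ObjectLockMode`, though ADR-028 does not specify how retention dates are set.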

Access pattern

No document has a permanent public URL. Every document access is an authenticated, time-limited event:

Customer requests statement
  → Lambda verifies JWT, checks ownership
  → Lambda generates pre-signed S3 URL (15-minute TTL)
  → URL returned to app; app fetches document directly from S3
  → Access logged (document_id, customer_id, timestamp, IP)
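A minimal sketch of the customer download flow, assuming JWT verification has already happened (for example in an API Gateway authorizer) and that the customer ID is the token's `sub` claim. The presigner is passed in so the sketch runs without AWS; in a deployed Lambda it would be boto3's S3 client method `generate_presigned_url`, whose `ClientMethod`, `Params` and `ExpiresIn` arguments the call mirrors. `Document`, `Forbidden` and `issue_download_url` are illustrative names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

URL_TTL_SECONDS = 15 * 60  # 15-minute starting TTL from ADR-028; tunable per tier

# Mirrors boto3: generate_presigned_url(ClientMethod, Params=..., ExpiresIn=...)
Presigner = Callable[..., str]


@dataclass
class Document:
    document_id: str
    owner_id: str
    s3_bucket: str
    s3_key: str
    access_log: list = field(default_factory=list)


class Forbidden(Exception):
    """Raised when the caller does not own the requested document."""


def issue_download_url(doc: Document, claims: dict, source_ip: str,
                       presign: Presigner) -> str:
    customer_id = claims["sub"]  # assumption: customer ID is the JWT subject
    if doc.owner_id != customer_id:  # ownership check before any signing
        raise Forbidden("caller does not own this document")
    url = presign(
        "get_object",
        Params={"Bucket": doc.s3_bucket, "Key": doc.s3_key},
        ExpiresIn=URL_TTL_SECONDS,
    )
    # Record the access event with the fields named in the flow above.
    doc.access_log.append({
        "document_id": doc.document_id,
        "customer_id": customer_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ip": source_ip,
    })
    return url
```

This records URL issuance, not the fetch itself; confirming that the object was actually downloaded would need S3 server access logs or CloudTrail data events.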

Staff access follows the same pattern via back-office tooling, with RBAC enforced at the Lambda level before pre-signed URL generation.
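Staff RBAC is checked before a URL is ever signed. The sketch below shows a deny-by-default check; the role names and the role-to-tier mapping are hypothetical, since ADR-028 names only the audiences for each tier.

```python
from enum import Enum


class Tier(Enum):
    KYC = "kyc"
    CUSTOMER = "customer"
    AUDIT = "audit"


# Hypothetical roles. Deny by default: a role not listed here can read nothing.
ROLE_TIERS: dict[str, frozenset[Tier]] = {
    "kyc_analyst": frozenset({Tier.KYC}),
    "back_office": frozenset({Tier.CUSTOMER}),
    "compliance_officer": frozenset({Tier.KYC, Tier.CUSTOMER, Tier.AUDIT}),
    "regulator": frozenset({Tier.AUDIT}),
}


def may_read(roles: set[str], tier: Tier) -> bool:
    """True if any of the caller's roles grants read access to the tier."""
    return any(tier in ROLE_TIERS.get(role, frozenset()) for role in roles)
```

The Lambda would call `may_read` first and only then generate the pre-signed URL, so a denied request never produces a signature.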

Metadata layer

Each domain that owns documents maintains a documents table in its Neon database (ADR-024):

Column | Type | Description
document_id | UUID | Primary key
type | enum | kyc_upload, statement, contract, pds, audit_record, etc.
owner_id | UUID | Customer ID, case ID, or system identifier
s3_bucket | text | Bucket name (environment-specific)
s3_key | text | Object key within bucket
tier | enum | kyc, customer, audit
created_at | timestamptz | Generation timestamp
generated_by | text | Lambda function or system that created the document
access_log | jsonb | Array of access events (reader_id, timestamp, method)
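The metadata row can be mirrored as a typed record in the Lambda code. This sketch includes only the `type` values the table lists (the real enum continues past "etc.") and assumes a type-to-tier mapping read off the storage-tier table, rejecting rows whose tier does not match their type.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class DocType(Enum):
    KYC_UPLOAD = "kyc_upload"
    STATEMENT = "statement"
    CONTRACT = "contract"
    PDS = "pds"
    AUDIT_RECORD = "audit_record"


class Tier(Enum):
    KYC = "kyc"
    CUSTOMER = "customer"
    AUDIT = "audit"


# Assumed mapping, derived from the storage-tier table.
EXPECTED_TIER = {
    DocType.KYC_UPLOAD: Tier.KYC,
    DocType.STATEMENT: Tier.CUSTOMER,
    DocType.CONTRACT: Tier.CUSTOMER,
    DocType.PDS: Tier.CUSTOMER,
    DocType.AUDIT_RECORD: Tier.AUDIT,
}


@dataclass
class DocumentRecord:
    document_id: uuid.UUID
    type: DocType
    owner_id: uuid.UUID          # customer ID, case ID, or system identifier
    s3_bucket: str
    s3_key: str
    tier: Tier
    created_at: datetime
    generated_by: str
    access_log: list[dict] = field(default_factory=list)  # jsonb array in Postgres

    def __post_init__(self) -> None:
        expected = EXPECTED_TIER[self.type]
        if expected is not self.tier:
            raise ValueError(f"{self.type.value} belongs in the {expected.value} tier")
        if self.created_at.tzinfo is None:
            raise ValueError("created_at must be timezone-aware (timestamptz)")
```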

This enables: listing all statements for a customer, finding all documents in a case, querying documents by date range or type — none of which are possible against S3 alone.
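The queries listed above could look like the following. The standard-library `sqlite3` module stands in for the domain's Neon Postgres database, so the column types are simplified (TEXT for UUID, timestamptz and jsonb) and placeholders use `?` where psycopg would use `%s`; the index name and helper functions are assumptions. Finding the documents in a case would be the same kind of owner_id lookup, keyed by case ID.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (
    document_id  TEXT PRIMARY KEY,
    type         TEXT NOT NULL,
    owner_id     TEXT NOT NULL,
    s3_bucket    TEXT NOT NULL,
    s3_key       TEXT NOT NULL,
    tier         TEXT NOT NULL,
    created_at   TEXT NOT NULL,   -- ISO 8601 UTC, timestamptz in Postgres
    generated_by TEXT NOT NULL,
    access_log   TEXT NOT NULL DEFAULT '[]'  -- jsonb in Postgres
);
CREATE INDEX documents_owner_type ON documents (owner_id, type, created_at);
"""


def statements_for_customer(conn: sqlite3.Connection, customer_id: str) -> list[str]:
    """All statements for one customer, newest first."""
    rows = conn.execute(
        "SELECT document_id FROM documents"
        " WHERE owner_id = ? AND type = 'statement'"
        " ORDER BY created_at DESC",
        (customer_id,),
    )
    return [r[0] for r in rows]


def documents_in_range(conn: sqlite3.Connection, doc_type: str,
                       start: str, end: str) -> list[str]:
    """Documents of one type created in the half-open range [start, end)."""
    rows = conn.execute(
        "SELECT document_id FROM documents"
        " WHERE type = ? AND created_at >= ? AND created_at < ?"
        " ORDER BY created_at",
        (doc_type, start, end),
    )
    return [r[0] for r in rows]
```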

Upload pattern (KYC documents)

Customer uploads never pass through Lambda. Lambda generates a pre-signed upload URL and the client uploads directly to S3. An S3 event trigger then fires a Lambda that:

  1. Runs a virus/malware scan
  2. Validates file type and size
  3. Routes the document to the eIDV provider (MOD-009), if applicable
  4. Creates the metadata record
  5. Archives the document to the KYC tier

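This keeps document bytes out of the request-path Lambda, which only issues the pre-signed URL, and minimises data transit; only the event-triggered processing Lambda reads the uploaded object.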
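The upload processing steps can be sketched as an S3-triggered handler. The event parsing follows the standard S3 notification shape (`Records[].s3.bucket.name`, `object.key`, `object.size`, with keys URL-encoded); the scanner, eIDV router, metadata store and archiver are injected stand-ins, and the allowed extensions, the 10 MB limit and the quarantine step are hypothetical. Checking the extension is a simplification: production validation would inspect the file content itself.

```python
import posixpath
from typing import Protocol
from urllib.parse import unquote_plus

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}  # hypothetical allow-list
MAX_BYTES = 10 * 1024 * 1024                            # hypothetical size limit


class Pipeline(Protocol):
    """Hypothetical collaborators; real ones would call AV, eIDV, Postgres, S3."""
    def scan(self, bucket: str, key: str) -> bool: ...   # True if clean
    def needs_eidv(self, key: str) -> bool: ...
    def send_to_eidv(self, bucket: str, key: str) -> None: ...
    def upsert_metadata(self, bucket: str, key: str, size: int) -> None: ...
    def archive_to_kyc_tier(self, bucket: str, key: str) -> None: ...
    def quarantine(self, bucket: str, key: str, reason: str) -> None: ...


def handle_s3_event(event: dict, p: Pipeline) -> list[str]:
    """Process each uploaded object in the order ADR-028 lists; return outcomes."""
    outcomes = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        size = record["s3"]["object"].get("size", 0)

        if not p.scan(bucket, key):                          # 1. malware scan
            p.quarantine(bucket, key, "malware")
            outcomes.append(f"{key}: rejected (malware)")
            continue
        ext = posixpath.splitext(key)[1].lower()
        if ext not in ALLOWED_EXTENSIONS or not 0 < size <= MAX_BYTES:  # 2. type, size
            p.quarantine(bucket, key, "invalid type or size")
            outcomes.append(f"{key}: rejected (invalid)")
            continue
        if p.needs_eidv(key):                                # 3. eIDV (MOD-009)
            p.send_to_eidv(bucket, key)
        p.upsert_metadata(bucket, key, size)                 # 4. metadata record
        p.archive_to_kyc_tier(bucket, key)                   # 5. archive to KYC tier
        outcomes.append(f"{key}: accepted")
    return outcomes
```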

Consequences

Positive:
- Pre-signed URLs mean no document bytes flow through the application Lambda: lower latency, reduced cost, and no memory pressure
- S3 Object Lock on the audit tier satisfies NFR-024 (audit log mutability = 0) without custom immutability logic
- Postgres metadata enables rich querying (customer statement lists, case documents, regulatory report history) that S3 cannot provide alone
- KMS encryption with customer-managed keys helps meet CPS 234 information security control requirements and allows key rotation without re-encrypting documents
- S3 lifecycle policies automate retention: documents are transitioned to Glacier after their active period and deleted at retention expiry without manual intervention
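The retention behaviour can be expressed as a lifecycle configuration in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. The active period, rule ID and prefix are hypothetical. Retention is counted in days, rounded up so that a span of whole years is covered whatever its leap-day count.

```python
def retention_days(years: int) -> int:
    """Days covering `years` calendar years: 365 per year plus at most ceil(years/4) leap days."""
    return years * 365 + (years + 3) // 4


def lifecycle_config(prefix: str, active_days: int, years: int, versioned: bool) -> dict:
    """One lifecycle rule: move to Glacier after the active period, expire at retention end."""
    expire_after = retention_days(years)
    if active_days >= expire_after:
        raise ValueError("the active period must end before retention expires")
    rule = {
        "ID": f"retain-{years}y-{prefix.strip('/') or 'all'}",  # hypothetical naming scheme
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": active_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_after},
    }
    if versioned:
        # Object Lock buckets are always versioned: Expiration only adds a delete
        # marker, so the now-noncurrent version needs its own expiry action.
        rule["NoncurrentVersionExpiration"] = {"NoncurrentDays": 1}
    return {"Rules": [rule]}
```

Applying it would be a call such as `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle_config(...))`. On Object Lock buckets, lifecycle actions cannot remove a locked version before its retain-until date, so the lock remains the governing control.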

Negative:
- Two components must be kept in sync (the S3 object and its Postgres metadata record). A failed metadata write after a successful S3 upload leaves an orphaned object, so the Lambda must handle this idempotently (metadata record upsert on the S3 event trigger)
- Pre-signed URL TTL must be tuned: too short causes UX friction; too long widens the exposure window. 15 minutes is the starting point, adjustable per document tier
- S3 Object Lock requires versioning and is best configured when the bucket is created. Although it can now be enabled on an existing versioned bucket, its default retention applies only to objects written after that point, so audit buckets must be created with Object Lock enabled from day one
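A minimal sketch of the idempotent metadata write. Because S3 event notifications are delivered at least once, the same upload can trigger the Lambda twice; deriving `document_id` deterministically from the upload location with `uuid5` makes a replay hit the same primary key, and `INSERT ... ON CONFLICT ... DO UPDATE` (valid in both Postgres and the SQLite used here as a stand-in) turns the write into an upsert. The namespace UUID and function names are hypothetical.

```python
import sqlite3
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
DOC_NAMESPACE = uuid.UUID("6f1c2a4e-0000-4000-8000-000000000028")

# Minimal demo table (Postgres would use uuid, timestamptz and jsonb types).
DDL = """
CREATE TABLE IF NOT EXISTS documents (
    document_id  TEXT PRIMARY KEY,
    type         TEXT NOT NULL,
    owner_id     TEXT NOT NULL,
    s3_bucket    TEXT NOT NULL,
    s3_key       TEXT NOT NULL,
    tier         TEXT NOT NULL,
    created_at   TEXT NOT NULL,
    generated_by TEXT NOT NULL,
    access_log   TEXT NOT NULL DEFAULT '[]'
)
"""

UPSERT = """
INSERT INTO documents (document_id, type, owner_id, s3_bucket, s3_key,
                       tier, created_at, generated_by)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (document_id) DO UPDATE SET
    type = excluded.type,
    owner_id = excluded.owner_id,
    tier = excluded.tier,
    generated_by = excluded.generated_by
"""


def document_id_for(bucket: str, key: str) -> str:
    """Same bucket and key always yield the same ID, so event replays collide."""
    return str(uuid.uuid5(DOC_NAMESPACE, f"s3://{bucket}/{key}"))


def upsert_metadata(conn: sqlite3.Connection, bucket: str, key: str, doc_type: str,
                    owner_id: str, tier: str, created_at: str, generated_by: str) -> str:
    """Insert the row, or refresh it if an earlier delivery already wrote it.
    created_at and access_log are deliberately left untouched on conflict."""
    doc_id = document_id_for(bucket, key)
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (doc_id, doc_type, owner_id, bucket, key,
                              tier, created_at, generated_by))
    return doc_id
```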

Alternatives considered

Streaming documents through Lambda: Rejected. Pushing documents of up to several MB through Lambda memory is wasteful and slow. Pre-signed URLs are the standard S3 pattern for exactly this reason.

Dedicated document management system (SharePoint, Confluence, DocuWare): Rejected. Adds a third-party vendor with its own data residency and access control concerns. S3 + metadata in Postgres achieves the same capability within the existing infrastructure.

DynamoDB for metadata: Rejected. Document metadata is relational — joins to customers, cases, and access logs are natural. Postgres (already the OLTP store) is the right fit.

Single flat S3 bucket structure: Rejected. Mixing KYC uploads, customer statements, and audit records in one bucket with prefix-based access control is an access control error waiting to happen. Separate buckets per tier, each with distinct IAM policies, are the correct pattern.
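Separate buckets let each tier have its own least-privilege policy. The sketch builds one read policy per tier; bucket names are hypothetical, and a real policy set would also need `kms:Decrypt` on the relevant key for the SSE-KMS tiers and explicit denies, both omitted here.

```python
import json

# Hypothetical bucket names, one per tier.
TIER_BUCKETS = {
    "kyc": "bank-kyc-uploads-prod",
    "customer": "bank-customer-docs-prod",
    "audit": "bank-audit-records-prod",
}


def read_policy(tier: str) -> dict:
    """IAM policy document letting a role read objects in exactly one tier's bucket."""
    bucket = TIER_BUCKETS[tier]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": f"Read{tier.capitalize()}Tier",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        }],
    }


if __name__ == "__main__":
    print(json.dumps(read_policy("kyc"), indent=2))
```

Because a pre-signed URL carries the permissions of the credentials that signed it, giving each tier's signing Lambda a role with only that tier's policy limits which objects a compromised or misconfigured signer could ever issue URLs for.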



Signoff record

Date | Name | Role | Status
2026-04-10 | Ross Millen | CTO | Approved
2026-04-10 | Ross Millen | Head of Architecture | Approved
2026-04-10 | Ross Millen | Head of Data | Approved

Capabilities

Capability | Description | Relationship
CAP-029 | Immutable audit log | enabled: S3 Object Lock (WORM) on audit tier; access log on every document access
CAP-030 | Regulator evidence portal | enabled: immutable audit tier with PAM-gated access for regulator and legal
CAP-045 | Digital KYC (document + selfie verification) | enabled: KYC upload tier stores customer identity documents
CAP-117 | Document upload & secure storage | enabled: pre-signed upload URL pattern; S3 storage tiers; virus scan on upload
CAP-118 | Statement generation & download | enabled: statements stored in S3 customer document tier; pre-signed download URL

ADR | Title | Relationship
ADR-023 | Cloud provider and region strategy | S3 in ap-southeast-2 satisfies data residency
ADR-024 | Database hosting — Neon serverless Postgres | Document metadata lives in domain Neon databases
ADR-026 | Customer authentication — Cognito, mobile-first, passwordless | JWT required for pre-signed URL generation

Compiled 2026-05-22 from source/entities/adrs/ADR-028.yaml