ADR-028: Document storage — S3 and Postgres metadata
| Status | Accepted |
| Date | 2026-04-10 |
| Deciders | CTO, Chief Risk Officer |
| Affects repos | bank-kyc, bank-aml, bank-core, bank-platform, bank-app |
Context
The bank handles three categories of documents with distinct access patterns, retention requirements, and compliance obligations:
- KYC uploads — customer-submitted identity documents, selfies, and proof of address collected during onboarding
- Customer-facing documents — bank-generated statements, product disclosure statements, loan contracts, and notices
- Audit and internal documents — system decision logs, regulatory submissions, case files, and immutable compliance records
Each category has different readers (customers vs staff vs regulators), different retention periods, and different mutability requirements. A single storage strategy must handle all three without leaking access across categories.
Decision
All documents are stored in AWS S3 (ap-southeast-2). Queryable document metadata lives in a Postgres table per domain, within that domain's Neon database (ADR-024). All document access uses pre-signed URLs.
Storage tiers
| Tier | Contents | Encryption | Retention | Access | Mutability |
|---|---|---|---|---|---|
| KYC uploads | Customer ID docs, selfies, proof of address | SSE-KMS (customer-managed key) | 7 years (AML/CFT) | Internal staff only, via case tooling | Immutable after upload |
| Customer documents | Statements, contracts, PDS, notices | SSE-S3 | 7 years (financial records) | Customer via app (pre-signed URL); staff via back-office | Immutable once generated |
| Audit/internal | Decision logs, regulatory reports, case files | SSE-KMS | Per record type (7–10 years) | Staff and regulators only | S3 Object Lock (WORM) where NFR-024 applies |
Access pattern
No document has a permanent public URL. Every document access is an authenticated, time-limited event:
Customer requests statement
→ Lambda verifies JWT, checks ownership
→ Lambda generates pre-signed S3 URL (15-minute TTL)
→ URL returned to app; app fetches document directly from S3
→ Access logged (document_id, customer_id, timestamp, IP)
Staff access follows the same pattern via back-office tooling, with RBAC enforced at the Lambda level before pre-signed URL generation.
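The customer flow above can be sketched as follows. This is a minimal illustration, not the bank's handler: the `Document` type, `AccessDenied`, and `issue_download_url` are hypothetical names, JWT verification is assumed to have happened upstream, and `s3_client` is assumed to behave like `boto3.client("s3")`, whose `generate_presigned_url` method takes the operation name, `Params`, and `ExpiresIn` in seconds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

PRESIGN_TTL_SECONDS = 15 * 60  # the 15-minute TTL from the flow above


@dataclass
class Document:
    # Hypothetical subset of the metadata record needed to sign a URL.
    document_id: str
    owner_id: str
    s3_bucket: str
    s3_key: str


class AccessDenied(Exception):
    """Raised when the caller does not own the requested document."""


def issue_download_url(s3_client, document, customer_id, source_ip, access_log):
    """Return a time-limited download URL after an ownership check.

    `customer_id` is assumed to come from an already-verified JWT.
    `s3_client` is assumed to expose boto3's generate_presigned_url.
    """
    if document.owner_id != customer_id:
        raise AccessDenied(document.document_id)

    url = s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": document.s3_bucket, "Key": document.s3_key},
        ExpiresIn=PRESIGN_TTL_SECONDS,
    )
    # Log the access event with the fields listed in the flow above.
    access_log.append({
        "document_id": document.document_id,
        "customer_id": customer_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ip": source_ip,
    })
    return url
```

Passing the client in as a parameter keeps the ownership check testable without AWS credentials. The staff path would swap the ownership check for an RBAC check before the same signing call.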
Metadata layer
Each domain that owns documents maintains a documents table in its Neon database (ADR-024):
| Column | Type | Description |
|---|---|---|
| document_id | UUID | Primary key |
| type | enum | kyc_upload, statement, contract, pds, audit_record, etc. |
| owner_id | UUID | Customer ID, case ID, or system identifier |
| s3_bucket | text | Bucket name (environment-specific) |
| s3_key | text | Object key within bucket |
| tier | enum | kyc, customer, audit |
| created_at | timestamptz | Generation timestamp |
| generated_by | text | Lambda function or system that created the document |
| access_log | jsonb | Array of access events (reader_id, timestamp, method) |
This enables: listing all statements for a customer, finding all documents in a case, querying documents by date range or type — none of which are possible against S3 alone.
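As a rough illustration of the queries this table makes possible, the sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs anywhere. The type mapping is an assumption made for the sketch (UUID, enum, and timestamptz all become TEXT, and the jsonb `access_log` column is omitted), and the bucket names, keys, and `list_documents` helper are hypothetical.

```python
import sqlite3

# SQLite stand-in for the Postgres table: UUID/enum/timestamptz become TEXT,
# and the jsonb access_log column is left out for brevity.
SCHEMA = """
CREATE TABLE documents (
    document_id  TEXT PRIMARY KEY,
    type         TEXT NOT NULL,
    owner_id     TEXT NOT NULL,
    s3_bucket    TEXT NOT NULL,
    s3_key       TEXT NOT NULL,
    tier         TEXT NOT NULL CHECK (tier IN ('kyc', 'customer', 'audit')),
    created_at   TEXT NOT NULL,
    generated_by TEXT NOT NULL
)
"""


def list_documents(conn, owner_id, doc_type, start, end):
    """List one owner's documents of a given type created in [start, end)."""
    rows = conn.execute(
        """
        SELECT document_id, s3_key, created_at
        FROM documents
        WHERE owner_id = ? AND type = ? AND created_at >= ? AND created_at < ?
        ORDER BY created_at
        """,
        (owner_id, doc_type, start, end),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("d1", "statement", "cust-1", "example-customer-docs", "cust-1/2026-01.pdf",
         "customer", "2026-02-01T00:00:00Z", "statement-generator"),
        ("d2", "statement", "cust-1", "example-customer-docs", "cust-1/2026-02.pdf",
         "customer", "2026-03-01T00:00:00Z", "statement-generator"),
        ("d3", "contract", "cust-1", "example-customer-docs", "cust-1/loan.pdf",
         "customer", "2026-03-05T00:00:00Z", "contract-generator"),
        ("d4", "statement", "cust-2", "example-customer-docs", "cust-2/2026-02.pdf",
         "customer", "2026-03-01T00:00:00Z", "statement-generator"),
    ],
)

# All statements for one customer within a date range.
statements = list_documents(conn, "cust-1", "statement", "2026-01-01", "2027-01-01")
```

The date filter relies on ISO 8601 strings sorting chronologically, which holds only in this stand-in; against Postgres the same predicate would compare `timestamptz` values directly.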
Upload pattern (KYC documents)
Customers never upload directly through Lambda. Lambda generates a pre-signed upload URL; the client uploads directly to S3. An S3 event trigger fires a Lambda that:
1. Runs virus/malware scan
2. Validates file type and size
3. Routes to eIDV provider (MOD-009) if applicable
4. Creates the metadata record
5. Archives to the KYC tier
This keeps document bytes out of the application Lambda's memory and minimises data transit.
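The post-upload steps could be wired together roughly as below. This is a sketch under stated assumptions: the allowed extensions and size cap are invented for illustration (the source does not give them), and `scan`, `route_to_eidv`, `create_metadata`, and `archive` are hypothetical callables standing in for the real integrations. The event shape follows the standard S3 notification format, in which object keys arrive URL-encoded.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical limits: the source does not state allowed types or a size cap.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
MAX_SIZE_BYTES = 10 * 1024 * 1024


def is_valid_upload(key, size):
    """Step 2: check extension and size (a real check would also inspect content)."""
    extension = os.path.splitext(key)[1].lower()
    return extension in ALLOWED_EXTENSIONS and 0 < size <= MAX_SIZE_BYTES


def handle_upload_event(event, scan, route_to_eidv, create_metadata, archive):
    """Run the five post-upload steps for each object in an S3 event."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys.
        key = unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"]["size"]

        if not scan(bucket, key):              # 1. virus/malware scan
            results.append((key, "rejected: malware"))
            continue
        if not is_valid_upload(key, size):     # 2. file type and size
            results.append((key, "rejected: type or size"))
            continue
        route_to_eidv(bucket, key)             # 3. callee decides applicability
        create_metadata(bucket, key)           # 4. metadata record
        archive(bucket, key)                   # 5. move to the KYC tier
        results.append((key, "accepted"))
    return results
```

Injecting the four integrations as parameters keeps the ordering of the steps visible and lets the pipeline be exercised with simple stubs.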
Consequences
Positive:
- Pre-signed URLs mean no document bytes flow through application Lambda: low latency, reduced cost, no memory pressure
- S3 Object Lock on the audit tier satisfies NFR-024 (audit log mutability = 0) without custom immutability logic
- Postgres metadata enables rich querying (list customer statements, case documents, regulatory report history) that S3 cannot provide alone
- KMS encryption with customer-managed keys satisfies CPS 234 encryption requirements and allows key rotation without re-encrypting documents
- S3 lifecycle policies automate retention: documents are transitioned to Glacier after active period and deleted at retention expiry without manual intervention
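To make the lifecycle point concrete, the sketch below builds a rule in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. The helper name, the rule ID, and the 365-day active period are assumptions for illustration; the source gives only the 7-year retention figure, approximated here as 7 × 365 days.

```python
def lifecycle_configuration(active_days, retention_days):
    """Build an S3 lifecycle configuration: Glacier after the active
    period, deletion at retention expiry. Applies to every object."""
    if not 0 < active_days < retention_days:
        raise ValueError("active_days must be positive and below retention_days")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",      # hypothetical rule name
                "Filter": {"Prefix": ""},         # empty prefix matches all keys
                "Status": "Enabled",
                "Transitions": [
                    {"Days": active_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": retention_days},
            }
        ]
    }


# Assumed values: one year active, seven years retained (ignoring leap days).
customer_tier_rules = lifecycle_configuration(
    active_days=365, retention_days=7 * 365
)
```

Each tier would get its own bucket and its own call with tier-specific periods; the audit tier's "per record type" retention would likely need one rule per prefix or tag rather than a single bucket-wide rule.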
Negative:
- Two components to keep in sync (S3 object + Postgres metadata record): a failed metadata write after a successful S3 upload creates an orphaned object. Lambda must handle this idempotently (metadata record upsert on S3 event trigger)
- Pre-signed URL TTL must be tuned: too short causes UX friction; too long increases exposure window. 15 minutes is the starting point, adjustable per document tier
- S3 Object Lock requires bucket-level configuration at creation and cannot be added retroactively. Audit buckets must be created with Object Lock enabled from day one
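One way to get the idempotent metadata write described in the first point is to key the row on the object's location, so a retried or duplicated S3 event cannot create a second record. The sketch below is an assumption about how that might look, not the ADR's implementation: it uses `sqlite3` as a runnable stand-in for Postgres, a reduced column set, a hypothetical unique constraint on `(s3_bucket, s3_key)`, and a UUIDv5 derived from the object location. The `ON CONFLICT ... DO NOTHING` clause is also valid Postgres syntax.

```python
import sqlite3
import uuid


def document_id_for(bucket, key):
    """Derive a stable ID from the object location so retries agree."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"s3://{bucket}/{key}"))


def record_upload(conn, bucket, key, owner_id):
    """Insert the metadata row once; repeated events leave it unchanged."""
    conn.execute(
        """
        INSERT INTO documents (document_id, owner_id, s3_bucket, s3_key)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (s3_bucket, s3_key) DO NOTHING
        """,
        (document_id_for(bucket, key), owner_id, bucket, key),
    )
    conn.commit()


# SQLite stand-in with a reduced column set; the unique constraint on the
# object location is an assumption, not part of the table defined above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        document_id TEXT PRIMARY KEY,
        owner_id    TEXT NOT NULL,
        s3_bucket   TEXT NOT NULL,
        s3_key      TEXT NOT NULL,
        UNIQUE (s3_bucket, s3_key)
    )
    """
)
record_upload(conn, "example-kyc-uploads", "cust-1/passport.jpg", "cust-1")
record_upload(conn, "example-kyc-uploads", "cust-1/passport.jpg", "cust-1")  # duplicate event
row_count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

This covers duplicate events only. An object whose event never produced a row would still need a separate reconciliation pass comparing bucket listings against the table.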
Alternatives considered
Streaming documents through Lambda: Rejected. Streaming documents of up to several MB through Lambda memory is wasteful and slow. Pre-signed URLs are the standard S3 pattern for this reason.
Dedicated document management system (SharePoint, Confluence, DocuWare): Rejected. Adds a third-party vendor with its own data residency and access control concerns. S3 + metadata in Postgres achieves the same capability within the existing infrastructure.
DynamoDB for metadata: Rejected. Document metadata is relational — joins to customers, cases, and access logs are natural. Postgres (already the OLTP store) is the right fit.
Single flat S3 bucket structure: Rejected. Mixing KYC uploads, customer statements, and audit records in one bucket with prefix-based access control is an access control error waiting to happen. Using separate buckets per tier, each with its own IAM policies, is the correct pattern.
Signoff record
| Date | Name | Role | Status |
|---|---|---|---|
| 2026-04-10 | Ross Millen | CTO | Approved |
| 2026-04-10 | Ross Millen | Head of Architecture | Approved |
| 2026-04-10 | Ross Millen | Head of Data | Approved |
Capabilities
| Capability | Description | Relationship |
|---|---|---|
| CAP-029 | Immutable audit log | enabled — S3 Object Lock (WORM) on audit tier; access log on every document access |
| CAP-030 | Regulator evidence portal | enabled — immutable audit tier with PAM-gated access for regulator and legal |
| CAP-045 | Digital KYC — document + selfie verification | enabled — KYC upload tier stores customer identity documents |
| CAP-117 | Document upload & secure storage | enabled — pre-signed upload URL pattern; S3 storage tiers; virus scan on upload |
| CAP-118 | Statement generation & download | enabled — statements stored in S3 customer document tier; pre-signed download URL |
Related decisions
| ADR | Title | Relationship |
|---|---|---|
| ADR-023 | Cloud provider and region strategy | S3 in ap-southeast-2 satisfies data residency |
| ADR-024 | Database hosting — Neon serverless Postgres | document metadata lives in domain Neon databases |
| ADR-026 | Customer authentication — Cognito, mobile-first, passwordless | JWT required for pre-signed URL generation |
Compiled 2026-05-22 from source/entities/adrs/ADR-028.yaml