Schema migrations¶
Resolves: GAP-D10 — No schema migration management specification.
The platform uses Flyway for schema migrations on all Neon Postgres databases. Each of the 8 system domain repositories contains a db/migrations/ directory. This document defines naming conventions, sequencing rules, the migration pipeline, rollback procedures, and schema ownership.
Related: deployment sequence · post-deployment checklist · secrets manifest
Naming convention¶
Forward migrations¶
Example: V001__create_accounts_table.sql
Undo migrations¶
Example: U001__create_accounts_table_rollback.sql
An undo migration must exist for every forward migration. The CI pipeline enforces this: a PR that adds a V file without a corresponding U file fails the lint check.
Rules¶
| Rule | Detail |
|---|---|
| Sequence number | Zero-padded 3 digits, starting at 001 per database. Independent per database — V001 in bank_core is unrelated to V001 in bank_kyc. |
| Description | lower_snake_case, ≤ 50 characters, imperative mood. |
| Imperative verbs | Use: create_, add_, drop_, alter_, backfill_, index_. Never use vague descriptions like misc_changes or fix. |
| Characters | No spaces. No special characters except underscore (_) and the double-underscore separator between number and description. |
| Extension | .sql only. No other migration formats. |
Example file listing¶
db/migrations/
V001__create_accounts_table.sql
U001__create_accounts_table_rollback.sql
V002__add_balance_snapshot_column.sql
U002__add_balance_snapshot_column_rollback.sql
V003__create_posting_entries_index.sql
U003__create_posting_entries_index_rollback.sql
Sequencing across domains¶
- Each database has its own independent migration sequence. There is no global sequence number.
- Cross-database migrations never exist. No single migration file touches two databases. If two databases need coordinated changes (for example, adding a foreign-key-equivalent relationship enforced at the application layer), they are written as separate migration files in separate repositories.
- When two separate migrations must be deployed in a specific order (for example, the
bank_kycschema change before thebank_coreconsuming service), this is documented as a deployment dependency in the release notes and enforced by the deployment sequence, not by Flyway.
Migration pipeline¶
How migrations run¶
Flyway runs as a CI pipeline step before the Lambda deployment step. The Lambda deployment does not proceed if the migration fails — this prevents a new Lambda version from starting against an incompatible schema.
Pull request merged → CI pipeline starts
↓
1. Flyway migrate (against target environment branch)
└─ Fail → pipeline aborts; Lambda does not deploy
└─ Pass → continue
↓
2. Lambda deploy (SAM / Terraform)
↓
3. Post-deploy health checks (see post-deployment checklist)
Connection during migration¶
Flyway connects using the direct (non-pooled) Neon connection string — not the PgBouncer pool. DDL statements require session mode and cannot run via PgBouncer transaction mode. The direct connection string is read from the environment variable FLYWAY_URL, which is injected from Secrets Manager in the CI pipeline.
Do not use the PgBouncer connection string for migration. The migration will appear to succeed for simple SQL, but DDL that requires session-level state (such as CREATE INDEX CONCURRENTLY, advisory locks, or SET LOCAL) will behave unexpectedly.
Transactional safety¶
Each migration runs inside a transaction. If the migration SQL raises an error, Postgres rolls back the entire migration automatically. Flyway marks the migration as FAILED in flyway_schema_history and the pipeline fails.
Exception — non-transactional DDL. Certain Postgres DDL statements cannot run inside a transaction:
CREATE INDEX CONCURRENTLYCREATE DATABASEVACUUM
If a migration contains non-transactional DDL and fails partway through, Postgres cannot roll it back. This leaves the schema in a partial state. These migrations must be designed so that partial application is recoverable — typically by wrapping in a conditional (CREATE INDEX CONCURRENTLY IF NOT EXISTS) so the migration can be re-run safely. Any migration containing non-transactional DDL must be reviewed by a DBA before merging (use the db-review label on the PR).
If a broken migration leaves the schema in a partial state, manual DBA intervention is required via MOD-046 (database access and DBA tooling) before the pipeline can proceed.
Flyway schema history¶
Flyway records each applied migration in the flyway_schema_history table in each database. This table must never be manually modified. Modifying it to force a migration state is prohibited outside of a formal incident response with written DBA sign-off.
Environment promotion¶
Migrations are applied independently to each environment branch as code moves through the pipeline.
Neon's branching model means each environment has its own database branch. Flyway applies migrations to the target branch when the CI pipeline runs for that environment.
Promotion rules¶
- A migration that passes dev and UAT is expected to pass production. Any difference indicates schema drift (the environment branches have diverged) or a data dependency that the migration was not designed to handle. Both are bugs.
- Never apply a migration directly to production. Do not connect to the prod Neon branch with the
migrate_userrole outside the CI pipeline. This bypasses review, logging, and the pre-migration validation check. - Never skip an environment. A migration that has not been applied to UAT must not be applied to production, even under incident pressure. If an urgent fix is needed, run it through the full pipeline from dev — Neon branching makes this fast.
- Hotfix branches follow the same rule: create a hotfix branch from the production branch, run migrations through the hotfix dev and UAT branches, then merge to production.
Backward compatibility requirement¶
All migrations must be backward-compatible for at least one release cycle. "Backward-compatible" means the current production Lambda version continues to function correctly against the migrated schema, so that a rollback of the Lambda (without rolling back the schema) is safe.
Mandatory rules¶
Never drop a column the current production Lambda reads. The new Lambda may not need it, but the old Lambda (the rollback target) does. Mark the column for removal in the next release, not the current one.
Never rename a column directly. A direct rename breaks the current Lambda immediately. Instead:
- Add a new column with the new name.
- Deploy the new Lambda that reads the new column name (and writes to both during the transition).
- In the next release, drop the old column.
Never change a column type in a breaking way. Widening a type (e.g. INT → BIGINT) is usually safe. Narrowing (e.g. BIGINT → INT) or changing semantics (e.g. VARCHAR to an enum) is not. Use the expand/contract pattern:
- Add the new column.
- Backfill data from the old column.
- Deploy code that uses the new column.
- In a subsequent migration, drop the old column.
The expand/contract pattern¶
Any migration that would break the current Lambda in production must be split into at least two migrations across two release cycles:
| Phase | Migration | Code change |
|---|---|---|
| Expand | Add new column, index, or table. Old column remains. | Deploy code that writes to both old and new. Reads from old. |
| Contract | Remove old column. | Deploy code that reads from new only. |
Never combine the expand and contract steps in a single migration. Never combine them in a single release.
Rollback procedure¶
Non-production environments (dev, UAT)¶
Run the corresponding undo migration directly against the environment branch:
This runs the U{NNN} file for the target migration version. Confirm the schema state:
Verify no Pending or Failed entries remain.
Production environment¶
Do not use flyway undo on production without explicit DBA sign-off. The preferred rollback method is Neon point-in-time recovery (PITR).
Production rollback procedure:
- Identify the Neon database branch and the timestamp immediately before the migration was applied. Confirm from the
flyway_schema_historytable:SELECT installed_on FROM flyway_schema_history WHERE version = '{NNN}'. - Create a Neon branch from the production branch at the pre-migration timestamp:
- Update the Secrets Manager connection string for the affected database to point to the new branch.
- Lambda picks up the new connection string on cold-start. Force a cold-start if needed by deploying a no-op Lambda update or clearing the Lambda container (Lambda console → Configuration → edit any non-breaking env var).
- Confirm the rollback branch is healthy using the post-deployment checklist Phase 2 checks.
- Rename the branches:
production → production-pre-rollback-{date}andprod-rollback-{date} → production. This keeps the original branch available for forensic inspection. - Raise a post-incident review ticket. Do not leave the production connection pointing at a PITR branch permanently — plan the re-migration for the next release cycle.
flyway undo on production (exceptional only)¶
If PITR is not available or the incident timeline requires a faster response, flyway undo may be used on production with:
- Written sign-off from the DBA (named individual, not a role)
- The Head of Technology notified and approving
- A record in the incident ticket before the command runs
The undo SQL must be reviewed before execution — flyway undo runs the U{NNN} file, which must have been written and reviewed when the forward migration was written.
Schema ownership¶
Each system domain owns its databases exclusively. No repository has DDL rights on another domain's schema.
| Domain | Repository | Databases | Migration user |
|---|---|---|---|
| SD01 Core Banking | bank-core | bank_core |
core_migrate_user |
| SD02 KYC Platform | bank-kyc | bank_kyc |
kyc_migrate_user |
| SD03 AML Monitoring | bank-aml | bank_aml |
aml_migrate_user |
| SD04 Payments | bank-payments | bank_payments |
payments_migrate_user |
| SD05 Credit | bank-credit | bank_credit |
credit_migrate_user |
| SD06 Risk Platform | bank-risk-platform | bank_risk |
risk_migrate_user |
| SD07 Data Platform | bank-platform | bank_platform |
platform_migrate_user |
| SD08 App | bank-app | bank_app |
app_migrate_user |
The {domain}_migrate_user Postgres role has DDL rights only within that domain's schema. It cannot read from or write to another domain's tables. Cross-domain data access is via the event bus or internal API, never via direct SQL.
Migrations are reviewed in the same PR as the Lambda code change that requires them. A migration that sits in the repository without a corresponding code change is not permitted — it creates schema drift that is difficult to audit.
Flyway configuration¶
Each repository contains a flyway.conf file in its db/ directory. The configuration is identical across domains except for the flyway.url source.
# flyway.conf — standard configuration for all bank-* repositories
# Connection — read from environment variable injected from Secrets Manager in CI
# Use the direct (non-pooled) connection string for DDL
flyway.url=${FLYWAY_URL}
# Migration location
flyway.locations=filesystem:db/migrations
# Behaviour
flyway.baselineOnMigrate=false
flyway.validateOnMigrate=true
flyway.outOfOrder=false
flyway.mixed=false
# Encoding
flyway.encoding=UTF-8
flyway.sqlMigrationPrefix=V
flyway.undoSqlMigrationPrefix=U
flyway.sqlMigrationSeparator=__
flyway.sqlMigrationSuffixes=.sql
# Schema history table
flyway.table=flyway_schema_history
flyway.baselineOnMigrate=false — migrations are applied from V001 on a fresh Neon branch. Never set this to true; it would skip all migrations up to the current version on a new branch, leaving the schema uninitialised.
flyway.validateOnMigrate=true — Flyway verifies the checksum of each previously-applied migration before applying new ones. If a previously-applied migration file has been modified (its checksum no longer matches), Flyway aborts. This prevents accidental or malicious modification of applied migrations. A checksum mismatch is always a bug or a security event — investigate before proceeding.
flyway.outOfOrder=false — migrations must be applied in strict sequential order. A migration with a lower version number than the highest already-applied version will be rejected. This prevents out-of-order application that could leave the schema in an inconsistent state.
Common failure modes¶
| Failure | Symptom | Resolution |
|---|---|---|
| Checksum mismatch | flyway validate or flyway migrate aborts with "checksum mismatch for migration V{NNN}" |
Someone modified an applied migration file. Restore the original file from git history. Never use flyway repair to update the checksum — investigate the modification first. |
| Out-of-order migration | Pipeline fails with "out-of-order migration detected" | A migration was merged out of sequence. Renumber the migration to use the next available sequence number. |
| Partial non-transactional DDL | Schema is in a partial state; flyway_schema_history shows FAILED |
DBA intervention required via MOD-046. Manually complete or undo the DDL, then run flyway repair to mark the migration as applied (only after confirming the schema is in the intended final state). |
| Environment drift | Migration passes dev/UAT but fails prod | The environments have diverged. Compare flyway_schema_history across branches to find the difference. Apply the missing migrations to the production branch via the pipeline (never directly). |
| Direct production connection | Someone connected to prod with migrate_user outside CI |
Rotate the migrate_user credentials immediately. Review CloudTrail for the session. Create an incident record. |