Skip to content

Schema migrations

Resolves: GAP-D10 — No schema migration management specification.

The platform uses Flyway for schema migrations on all Neon Postgres databases. Each of the 8 system domain repositories contains a db/migrations/ directory. This document defines naming conventions, sequencing rules, the migration pipeline, rollback procedures, and schema ownership.

Related: deployment sequence · post-deployment checklist · secrets manifest


Naming convention

Forward migrations

V{NNN}__{description}.sql

Example: V001__create_accounts_table.sql

Undo migrations

U{NNN}__{description}_rollback.sql

Example: U001__create_accounts_table_rollback.sql

An undo migration must exist for every forward migration. The CI pipeline enforces this: a PR that adds a V file without a corresponding U file fails the lint check.

Rules

Rule Detail
Sequence number Zero-padded 3 digits, starting at 001 per database. Independent per database — V001 in bank_core is unrelated to V001 in bank_kyc.
Description lower_snake_case, ≤ 50 characters, imperative mood.
Imperative verbs Use: create_, add_, drop_, alter_, backfill_, index_. Never use vague descriptions like misc_changes or fix.
Characters No spaces. No special characters except underscore (_) and the double-underscore separator between number and description.
Extension .sql only. No other migration formats.

Example file listing

db/migrations/
  V001__create_accounts_table.sql
  U001__create_accounts_table_rollback.sql
  V002__add_balance_snapshot_column.sql
  U002__add_balance_snapshot_column_rollback.sql
  V003__create_posting_entries_index.sql
  U003__create_posting_entries_index_rollback.sql

Sequencing across domains

  • Each database has its own independent migration sequence. There is no global sequence number.
  • Cross-database migrations never exist. No single migration file touches two databases. If two databases need coordinated changes (for example, adding a foreign-key-equivalent relationship enforced at the application layer), they are written as separate migration files in separate repositories.
  • When two separate migrations must be deployed in a specific order (for example, the bank_kyc schema change before the bank_core consuming service), this is documented as a deployment dependency in the release notes and enforced by the deployment sequence, not by Flyway.

Migration pipeline

How migrations run

Flyway runs as a CI pipeline step before the Lambda deployment step. The Lambda deployment does not proceed if the migration fails — this prevents a new Lambda version from starting against an incompatible schema.

Pull request merged → CI pipeline starts
1. Flyway migrate (against target environment branch)
   └─ Fail → pipeline aborts; Lambda does not deploy
   └─ Pass → continue
2. Lambda deploy (SAM / Terraform)
3. Post-deploy health checks (see post-deployment checklist)

Connection during migration

Flyway connects using the direct (non-pooled) Neon connection string — not the PgBouncer pool. DDL statements require session mode and cannot run via PgBouncer transaction mode. The direct connection string is read from the environment variable FLYWAY_URL, which is injected from Secrets Manager in the CI pipeline.

Do not use the PgBouncer connection string for migration. The migration will appear to succeed for simple SQL, but DDL that requires session-level state (such as CREATE INDEX CONCURRENTLY, advisory locks, or SET LOCAL) will behave unexpectedly.

Transactional safety

Each migration runs inside a transaction. If the migration SQL raises an error, Postgres rolls back the entire migration automatically. Flyway marks the migration as FAILED in flyway_schema_history and the pipeline fails.

Exception — non-transactional DDL. Certain Postgres DDL statements cannot run inside a transaction:

  • CREATE INDEX CONCURRENTLY
  • CREATE DATABASE
  • VACUUM

If a migration contains non-transactional DDL and fails partway through, Postgres cannot roll it back. This leaves the schema in a partial state. These migrations must be designed so that partial application is recoverable — typically by wrapping in a conditional (CREATE INDEX CONCURRENTLY IF NOT EXISTS) so the migration can be re-run safely. Any migration containing non-transactional DDL must be reviewed by a DBA before merging (use the db-review label on the PR).

If a broken migration leaves the schema in a partial state, manual DBA intervention is required via MOD-046 (database access and DBA tooling) before the pipeline can proceed.

Flyway schema history

Flyway records each applied migration in the flyway_schema_history table in each database. This table must never be manually modified. Modifying it to force a migration state is prohibited outside of a formal incident response with written DBA sign-off.


Environment promotion

Migrations are applied independently to each environment branch as code moves through the pipeline.

Feature branch → dev branch → uat branch → prod branch

Neon's branching model means each environment has its own database branch. Flyway applies migrations to the target branch when the CI pipeline runs for that environment.

Promotion rules

  • A migration that passes dev and UAT is expected to pass production. Any difference indicates schema drift (the environment branches have diverged) or a data dependency that the migration was not designed to handle. Both are bugs.
  • Never apply a migration directly to production. Do not connect to the prod Neon branch with the migrate_user role outside the CI pipeline. This bypasses review, logging, and the pre-migration validation check.
  • Never skip an environment. A migration that has not been applied to UAT must not be applied to production, even under incident pressure. If an urgent fix is needed, run it through the full pipeline from dev — Neon branching makes this fast.
  • Hotfix branches follow the same rule: create a hotfix branch from the production branch, run migrations through the hotfix dev and UAT branches, then merge to production.

Backward compatibility requirement

All migrations must be backward-compatible for at least one release cycle. "Backward-compatible" means the current production Lambda version continues to function correctly against the migrated schema, so that a rollback of the Lambda (without rolling back the schema) is safe.

Mandatory rules

Never drop a column the current production Lambda reads. The new Lambda may not need it, but the old Lambda (the rollback target) does. Mark the column for removal in the next release, not the current one.

Never rename a column directly. A direct rename breaks the current Lambda immediately. Instead:

  1. Add a new column with the new name.
  2. Deploy the new Lambda that reads the new column name (and writes to both during the transition).
  3. In the next release, drop the old column.

Never change a column type in a breaking way. Widening a type (e.g. INTBIGINT) is usually safe. Narrowing (e.g. BIGINTINT) or changing semantics (e.g. VARCHAR to an enum) is not. Use the expand/contract pattern:

  1. Add the new column.
  2. Backfill data from the old column.
  3. Deploy code that uses the new column.
  4. In a subsequent migration, drop the old column.

The expand/contract pattern

Any migration that would break the current Lambda in production must be split into at least two migrations across two release cycles:

Phase Migration Code change
Expand Add new column, index, or table. Old column remains. Deploy code that writes to both old and new. Reads from old.
Contract Remove old column. Deploy code that reads from new only.

Never combine the expand and contract steps in a single migration. Never combine them in a single release.


Rollback procedure

Non-production environments (dev, UAT)

Run the corresponding undo migration directly against the environment branch:

flyway -url="$FLYWAY_URL" -target="{NNN-1}" undo

This runs the U{NNN} file for the target migration version. Confirm the schema state:

flyway -url="$FLYWAY_URL" info

Verify no Pending or Failed entries remain.

Production environment

Do not use flyway undo on production without explicit DBA sign-off. The preferred rollback method is Neon point-in-time recovery (PITR).

Production rollback procedure:

  1. Identify the Neon database branch and the timestamp immediately before the migration was applied. Confirm from the flyway_schema_history table: SELECT installed_on FROM flyway_schema_history WHERE version = '{NNN}'.
  2. Create a Neon branch from the production branch at the pre-migration timestamp:
    neon branch create --name prod-rollback-{YYYY-MM-DD} \
      --parent production \
      --point-in-time {ISO8601_TIMESTAMP}
    
  3. Update the Secrets Manager connection string for the affected database to point to the new branch.
  4. Lambda picks up the new connection string on cold-start. Force a cold-start if needed by deploying a no-op Lambda update or clearing the Lambda container (Lambda console → Configuration → edit any non-breaking env var).
  5. Confirm the rollback branch is healthy using the post-deployment checklist Phase 2 checks.
  6. Rename the branches: production → production-pre-rollback-{date} and prod-rollback-{date} → production. This keeps the original branch available for forensic inspection.
  7. Raise a post-incident review ticket. Do not leave the production connection pointing at a PITR branch permanently — plan the re-migration for the next release cycle.

flyway undo on production (exceptional only)

If PITR is not available or the incident timeline requires a faster response, flyway undo may be used on production with:

  • Written sign-off from the DBA (named individual, not a role)
  • The Head of Technology notified and approving
  • A record in the incident ticket before the command runs

The undo SQL must be reviewed before execution — flyway undo runs the U{NNN} file, which must have been written and reviewed when the forward migration was written.


Schema ownership

Each system domain owns its databases exclusively. No repository has DDL rights on another domain's schema.

Domain Repository Databases Migration user
SD01 Core Banking bank-core bank_core core_migrate_user
SD02 KYC Platform bank-kyc bank_kyc kyc_migrate_user
SD03 AML Monitoring bank-aml bank_aml aml_migrate_user
SD04 Payments bank-payments bank_payments payments_migrate_user
SD05 Credit bank-credit bank_credit credit_migrate_user
SD06 Risk Platform bank-risk-platform bank_risk risk_migrate_user
SD07 Data Platform bank-platform bank_platform platform_migrate_user
SD08 App bank-app bank_app app_migrate_user

The {domain}_migrate_user Postgres role has DDL rights only within that domain's schema. It cannot read from or write to another domain's tables. Cross-domain data access is via the event bus or internal API, never via direct SQL.

Migrations are reviewed in the same PR as the Lambda code change that requires them. A migration that sits in the repository without a corresponding code change is not permitted — it creates schema drift that is difficult to audit.


Flyway configuration

Each repository contains a flyway.conf file in its db/ directory. The configuration is identical across domains except for the flyway.url source.

# flyway.conf — standard configuration for all bank-* repositories

# Connection — read from environment variable injected from Secrets Manager in CI
# Use the direct (non-pooled) connection string for DDL
flyway.url=${FLYWAY_URL}

# Migration location
flyway.locations=filesystem:db/migrations

# Behaviour
flyway.baselineOnMigrate=false
flyway.validateOnMigrate=true
flyway.outOfOrder=false
flyway.mixed=false

# Encoding
flyway.encoding=UTF-8
flyway.sqlMigrationPrefix=V
flyway.undoSqlMigrationPrefix=U
flyway.sqlMigrationSeparator=__
flyway.sqlMigrationSuffixes=.sql

# Schema history table
flyway.table=flyway_schema_history

flyway.baselineOnMigrate=false — migrations are applied from V001 on a fresh Neon branch. Never set this to true; it would skip all migrations up to the current version on a new branch, leaving the schema uninitialised.

flyway.validateOnMigrate=true — Flyway verifies the checksum of each previously-applied migration before applying new ones. If a previously-applied migration file has been modified (its checksum no longer matches), Flyway aborts. This prevents accidental or malicious modification of applied migrations. A checksum mismatch is always a bug or a security event — investigate before proceeding.

flyway.outOfOrder=false — migrations must be applied in strict sequential order. A migration with a lower version number than the highest already-applied version will be rejected. This prevents out-of-order application that could leave the schema in an inconsistent state.


Common failure modes

Failure Symptom Resolution
Checksum mismatch flyway validate or flyway migrate aborts with "checksum mismatch for migration V{NNN}" Someone modified an applied migration file. Restore the original file from git history. Never use flyway repair to update the checksum — investigate the modification first.
Out-of-order migration Pipeline fails with "out-of-order migration detected" A migration was merged out of sequence. Renumber the migration to use the next available sequence number.
Partial non-transactional DDL Schema is in a partial state; flyway_schema_history shows FAILED DBA intervention required via MOD-046. Manually complete or undo the DDL, then run flyway repair to mark the migration as applied (only after confirming the schema is in the intended final state).
Environment drift Migration passes dev/UAT but fails prod The environments have diverged. Compare flyway_schema_history across branches to find the difference. Apply the missing migrations to the production branch via the pipeline (never directly).
Direct production connection Someone connected to prod with migrate_user outside CI Rotate the migrate_user credentials immediately. Review CloudTrail for the session. Create an incident record.