Skip to content

MOD-017 — ML Behavioural Scoring (Path C: ADR-047 reference write-back)

Repo: bank-aml Module dir: MOD-017-ml-behavioural-scoring/ Bus: bank-aml (publishes nothing; consumed by MOD-016 via direct DB read)

What this module does

Pre-computes a 0–1000 behavioural risk score per party per model version in Snowflake Cortex, writes the score back to Postgres aml.behavioural_scores so MOD-016's rule path can read it on the real-time evaluation hot path without a Snowflake round-trip, and records every model lifecycle event (CHALLENGER_DEPLOYED, PROMOTED_TO_CHAMPION, ROLLED_BACK, RETIRED) in aml.model_deployment_history for FR-112 evidence and DT-005 governance.

Why this architecture

Snowflake Cortex inference is the right tool for behavioural scoring, but it cannot meet FR-112's real-time NFR (≤200ms on the rule path). The resolution is ADR-047 reference write-back: Cortex runs in batch (1- minute target lag), a Snowflake Task forwards new scores to a Lambda via an EXTERNAL FUNCTION, the Lambda writes the score to Postgres, and MOD-016's evaluator reads aml.behavioural_scores on the hot path with no cross-system call.

This is the same pattern MOD-085 uses for SD05 reference data.

Hot-path contract with MOD-016

MOD-016's evaluator queries:

SELECT score, risk_tier, model_version, model_role, scored_at, valid_until
  FROM aml.behavioural_scores
 WHERE party_id = $1
   AND model_role = 'CHAMPION'
   AND valid_until > now()
 ORDER BY scored_at DESC
 LIMIT 1;

Indexed by idx_behavioural_scores_party_champion (partial, WHERE model_role='CHAMPION'). Single-row read. No external call.

valid_until is NOT NULL (per wiki SD03 spec) — every row carries an explicit freshness boundary. The MOD-016 reader treats stale rows (valid_until <= now()) as misses and falls back to rule-only.

The contract test at tests/contract/mod-016-reader-shape.test.ts pins the shape — if MOD-017 ever drops or renames a column MOD-016 depends on, that test fails first.

Data model — owned tables

Both tables are append-only, defended by the canonical aml.fn_immutable_row() SECURITY DEFINER trigger function (per ADR-048). Created in V001__behavioural_scores_and_deployment_history.sql.

aml.behavioural_scores

Column Type Notes
id uuid PK DB default
party_id uuid indexed
model_version text non-empty, ≤64 chars
model_role text CHECK IN ('CHAMPION','CHALLENGER')
score numeric(5,0) CHECK 0..1000
risk_tier text CHECK IN (LOW,MEDIUM,HIGH,CRITICAL)
feature_vector_hash text SHA-256 hex of input vector — FR-111 traceability
score_reasons jsonb array of reason codes
scored_at timestamptz event time at the model
valid_until timestamptz NOT NULL CHECK >= scored_at. Derived by the Lambda as scored_at + validity_hours — see validity-window service below; not in the inbound payload.
triggered_by text CHECK IN (SCHEDULED,EVENT)
source_event_id text NULL for triggered_by=EVENT
created_at timestamptz DB default
  • UNIQUE (party_id, model_version, scored_at) — provides INSERT ... ON CONFLICT DO NOTHING idempotency for the write-back Lambda under at-least-once Snowflake-Task delivery.
  • Partial index on CHAMPION rows only — keeps the MOD-016 hot-path lookup tight.

aml.model_deployment_history

Column names follow bank-wiki SD03. Two additions (model_role, trace_id) are V001 extensions tracked for wiki update via docs/handoffs/MOD-017-wiki-corrections.handoff.md.

Column Type Notes
id uuid PK DB default
model_version text
model_role text NOT NULL CHECK IN (CHAMPION,CHALLENGER). V001 extension — orthogonal to event_type for DT-005 audit queries.
previous_model_version text NULL Required for PROMOTED_TO_CHAMPION + ROLLED_BACK (chk_mdh_previous_version_required)
event_type text CHECK IN (CHALLENGER_DEPLOYED, PROMOTED_TO_CHAMPION, ROLLED_BACK, RETIRED)
effective_at timestamptz event time
deployed_by text NOT NULL Staff ID or automation principal that triggered the change
change_reason text NOT NULL Required human-readable rationale (DT-005 LOG)
champion_metrics jsonb NULL Precision/recall/AUC. Populated only on PROMOTED_TO_CHAMPION; NULL on the other three event types (chk_mdh_promotion_carries_metrics)
trace_id uuid NOT NULL ADR-031 — every audit-class row carries the runbook-originating trace_id. V001 extension.
created_at timestamptz DB default

CHECK constraints (ADR-048):

  • chk_mdh_created_at_after_effective (Cat 1) — created_at must be ≥ effective_at − 1 minute (rows are recorded as the event takes effect).
  • chk_mdh_previous_version_required (Cat 1) — PROMOTED_TO_CHAMPION and ROLLED_BACK must name the version they replace.
  • chk_mdh_promotion_carries_metrics (Cat 2) — PROMOTED_TO_CHAMPION rows must carry champion_metrics; non-promotion rows must omit it. Both directions enforced (Zod refine + DB CHECK).

One row written per event — the orchestrator's quality gate is explicit: a promotion writes one row, the next ROLLED_BACK writes a second row. The integration test fr-112-rollback.test.ts verifies both rows persist and the rollback's effective_at is within the 30-minute window of the prior promotion.

Components

1. Score-writeback Lambda (src/handlers/score-writeback.ts)

  • Trigger: Snowflake Task → API Integration → EXTERNAL FUNCTION → API Gateway → Lambda.
  • Body: Snowflake batch envelope { data: [[idx, row], ...] }.
  • Per row: Zod parse → assertScoreSemantics() → INSERT with ON CONFLICT DO NOTHING → response.
  • Per-row failures don't fail the batch — Snowflake gets per-row error envelopes back.
  • Idempotency: UNIQUE (party_id, model_version, scored_at).
  • Coverage: idempotency integration test plants the same payload twice → exactly one row in the table.

2. Deployment-event Lambda (src/handlers/deployment-event.ts)

  • Trigger: model-deployment runbook → API Gateway POST.
  • Body: single DeploymentEventPayload (Zod).
  • Idempotent over ${model_version}:${event_type}:${effective_at} via aml.idempotency_keys.
  • For ROLLED_BACK: looks up the prior PROMOTED_TO_CHAMPION row and computes rollback elapsed time; flags > 30 minutes as out_of_rollback_window=true but never rejects (a delayed rollback in a major incident must still be recorded — the audit trail is what matters).
  • One row written per event, every time. CHALLENGER_DEPLOYED writes a row. RETIRED writes a row. PROMOTED_TO_CHAMPION writes a row. ROLLED_BACK writes a row.

3. Pure-logic services

  • services/score-writer.ts — INSERT ... ON CONFLICT DO NOTHING + replay re-read. Takes the Lambda-derived validUntil: Date as a third argument.
  • services/deployment-writer.ts — single INSERT into aml.model_deployment_history. Column names match the wiki spec exactly.
  • services/score-validator.ts — semantic checks (future-drift on scored_at, tier/score consistency). valid_until is not validated here — it's no longer in the payload (see validity-window).
  • services/deployment-validator.ts — required-field checks (previous_model_version for PROMOTED + ROLLED_BACK) + 30-minute rollback window flag.
  • services/validity-window.ts — resolves the model freshness window from /bank/{stage}/mod-017/model/validity-hours (default 24h on miss) and derives valid_until = scored_at + window. Single source of truth so the window can be retuned via SSM without redeploying Snowflake or the Lambda. Cached for the Lambda instance lifetime.

ADR-048 invariants applied from day one

V001 contains no follow-up V002 — every invariant is in the first migration (this is what feedback memory feedback_adr-048_invariants.md asks for in new modules):

Invariant Mechanism
Append-only aml.behavioural_scores trg_behavioural_scores_immutable → aml.fn_immutable_row() SECURITY DEFINER
Append-only aml.model_deployment_history trg_model_deployment_history_immutable → same
score in [0, 1000] CHECK on column (Cat 1)
valid_until ≥ scored_at CHECK chk_behavioural_scores_valid_until (Cat 1)
valid_until NOT NULL column constraint (Cat 1) — Lambda-derived, never NULL
event_type ∈ valid set CHECK on column
previous_model_version required for PROMOTED/ROLLED_BACK CHECK chk_mdh_previous_version_required (Cat 1)
created_at ≥ effective_at - 1 minute CHECK chk_mdh_created_at_after_effective (Cat 1)
champion_metrics required iff PROMOTED_TO_CHAMPION CHECK chk_mdh_promotion_carries_metrics (Cat 2) + Zod refine
INSERT-only grants bank_aml_app_user has SELECT+INSERT only (no UPDATE, no DELETE)

Negative test for every invariant: tests/integration/adr-048-invariants.test.ts.

FR/policy mapping

FR Mechanism
FR-109 — score 0..1000 Cortex output, validated by Zod + DB CHECK
FR-110 — alert if score ≥ threshold (default 750) MOD-016 reads score; threshold is in MOD-016's AppConfig
FR-111 — model_version + feature_vector_hash Stored on every score row
FR-112 — rollback within 30 minutes Deployment-event Lambda + ROLLED_BACK row + rollback-elapsed semantic + RollbackOutOfWindow metric + alarm
Policy Mode Mechanism
AML-001 LOG aml.behavioural_scores + aml.model_deployment_history immutability
AML-005 AUTO Source-scan: no skip_score, bypass_model, disable_model tokens
DT-005 LOG aml.model_deployment_history immutability + previous_version CHECK

Tests

Tier Files Count Coverage
Unit tests/unit/* 49 pure-logic services, payload schemas, lib (logger, errors, trace, emf)
Contract tests/contract/mod-016-reader-shape.test.ts 4 (in unit run) reader-shape pin between MOD-017 write side + MOD-016 read side; asserts valid_until is NOT in the input payload
Integration tests/integration/{adr-048,score-writeback-idempotency,fr-112-rollback}.test.ts 17 DB-gated; assert triggers fire, INSERT...ON CONFLICT collapses replays, FR-112 rollback path end-to-end, valid_until NOT NULL + chk_mdh_promotion_carries_metrics
Policy tests/policy/{pol-aml-001,pol-aml-005,pol-dt-005}.test.ts 8 (1 + 4 + 3) source-scan + DB-gated immutability/governance

Local result: 53 unit/contract pass, 1 AML-005 source-scan policy pass, 24 DB-gated tests skipped correctly. Coverage 93.15% on the unit-only scope (above 80% gate).

SSM outputs

Path Value Consumed by
/bank/{env}/mod017/api/base-url API Gateway base URL Snowflake API Integration; runbook
/bank/{env}/mod017/lambda/score-writeback/arn Lambda ARN (reference — not invoked outside Snowflake)
/bank/{env}/mod017/lambda/deployment-event/arn Lambda ARN (reference)

SSM inputs

Path Source
/bank/{env}/iam/lambda/bank-aml/arn MOD-104
/bank/{env}/observability/adot-layer-arn MOD-076
/bank/{env}/network/vpc-id MOD-104
/bank/{env}/network/private-subnet-ids MOD-104
/bank/{env}/neon/pooler-host MOD-103
/bank/{env}/sns/alerts/arn MOD-104
/bank/{env}/eventbridge/bank-aml/dlq-arn MOD-104
/bank/{env}/mod-017/model/validity-hours Operations — sets the freshness window for valid_until derivation. Default 24h on miss; the validity-window service degrades gracefully without it. Type: numeric string (hours).

Dependencies + cross-cutting

Module Role
MOD-104 BankAmlRole, VPC, SNS alerts topic
MOD-103 bank_aml database, bank_aml_app_user/migrate_user secrets
MOD-076 ADOT layer
MOD-016 Reads aml.behavioural_scores on the rule path
MOD-102 Snowflake API Integration + warehouse access (pre-req for the write-back chain)
MOD-150 Provisions the Snowflake-side DDL (infra/snowflake/cortex_dynamic_table.sql is the contract)

What was NOT delivered

  • Cognito sub-group policy on the deployment-event route — v1 relies on MOD-075's bank-admin/compliance JWT scope at the gateway edge. v2 may tighten to a dedicated aml-model-runbook group.
  • Per-row bank.aml.score_published event — v1 publishes nothing (per ADR-047 the data IS the contract). v2 may add an event for observability dashboards if MOD-076 needs it.
  • Snowflake-side IaC — the DDL in infra/snowflake/ is reference; MOD-150 owns provisioning.

Runbook references

  • Promotion: runbook POSTs to /internal/v1/deployment-events with event_type=PROMOTED_TO_CHAMPION, previous_model_version=<old-champion>, champion_metrics={...} (precision/recall/AUC — required), and change_reason (non-empty).
  • Rollback: same endpoint with event_type=ROLLED_BACK and previous_model_version=<the-version-we-roll-forward-to>. Must complete within 30 minutes of the prior promotion to satisfy FR-112's window — the RollbackOutOfWindow CloudWatch alarm fires if not. The Lambda still records the row (audit trail must capture the event).
  • Challenger deployed: event_type=CHALLENGER_DEPLOYED, no previous_model_version, no champion_metrics.
  • Retired: event_type=RETIRED, no previous_model_version required, no champion_metrics.