
Categorisation & merchant enrichment model

ID MOD-041
System SD06
Repo bank-risk-platform
Build status Deployed
Deployed Yes
Last commit 54197c02fff0ca78a988e6140d31778e59f05b46

XGBoost multi-class classifier. Retrained weekly on customer correction signals. Confidence-routed — ≥0.85 auto, 0.60–0.84 prompt, <0.60 show Other. See ADR-017.
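The confidence routing described above can be sketched as follows. This is a minimal illustration, not the deployed logic: the names `Route` and `route_prediction` are hypothetical, and the sketch assumes the 0.60–0.84 band means the half-open interval [0.60, 0.85), so a score such as 0.845 falls into the prompt band. ADR-017 remains the authority on the exact boundaries.

```python
from enum import Enum

# Thresholds taken from the module summary; boundary handling is an assumption.
AUTO_THRESHOLD = 0.85
PROMPT_THRESHOLD = 0.60


class Route(Enum):
    AUTO = "auto"      # category applied without customer input
    PROMPT = "prompt"  # customer is prompted about the predicted category
    OTHER = "other"    # transaction is shown as "Other"


def route_prediction(category: str, confidence: float) -> tuple[Route, str]:
    """Return the routing decision and the category to display (hypothetical helper)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= AUTO_THRESHOLD:
        return Route.AUTO, category
    if confidence >= PROMPT_THRESHOLD:
        return Route.PROMPT, category
    return Route.OTHER, "Other"
```

For example, `route_prediction("Groceries", 0.91)` yields the auto route with the predicted category, while a score of 0.42 yields the other route with the label `"Other"`.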

Build notes — 2026-05-14

The AWS SCP blocker on the bank-merchant-assets-{env} S3 bucket is resolved (bank-platform commit 911a11f7 provisions the bucket). The current deploy failure is unrelated to S3: the Python UDF NORMALISE_MERCHANT is not deployed in dev because the CI pipeline runs with the HAS_DCM=false + HAS_DBT_PROJECT=false workaround, which skips the Snowpark deploy step. The dbt model int_normalised_merchant then fails with Unknown user-defined function BANK_DEV_RISK.RISK_CUSTOMER.NORMALISE_MERCHANT.
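A pre-flight check for the missing UDF could look like the sketch below. It is illustrative only: `show_udf_sql` and `udf_exists` are hypothetical helpers, `cursor` is assumed to be any DB-API style cursor (for example one from the Snowflake Python connector), and the check relies on Snowflake's `SHOW USER FUNCTIONS ... LIKE ... IN SCHEMA` statement. Because `_` is a single-character wildcard in `LIKE` patterns, the sketch re-checks the returned rows for an exact name match.

```python
import re

# Conservative check for unquoted identifiers, to avoid building unsafe SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def show_udf_sql(database: str, schema: str, name: str) -> str:
    """Build the SHOW statement for one function name (hypothetical helper)."""
    for part in (database, schema, name):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"SHOW USER FUNCTIONS LIKE '{name}' IN SCHEMA {database}.{schema}"


def udf_exists(cursor, database: str, schema: str, name: str) -> bool:
    """Return True if a user-defined function with exactly this name is listed."""
    cursor.execute(show_udf_sql(database, schema, name))
    rows = cursor.fetchall()
    wanted = name.upper()
    # Compare against every cell so the check does not depend on column order.
    return any(str(cell).upper() == wanted for row in rows for cell in row)
```

Calling `udf_exists(cursor, "BANK_DEV_RISK", "RISK_CUSTOMER", "NORMALISE_MERCHANT")` before the dbt run would let the pipeline fail early with a clearer message than the unknown-function error.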

Resolution options (pick one):
- Deploy the UDF manually once (pnpm udf:deploy in the module directory); it persists in Snowflake and subsequent pipeline runs will find it.
- Make the UDF deploy step unconditional in bank-platform/.gitlab/ci/templates/risk-platform.gitlab-ci.yml so it runs regardless of HAS_DCM / HAS_DBT_PROJECT. This is the cleanest long-term fix.
- Resolve the DCM ownership privilege issue (see MOD-056 notes), flip HAS_DCM=true + HAS_DBT_PROJECT=true, and the UDF deploy runs automatically.

Streamlit dashboard

MOD-041 ships a Streamlit page RISK_CUSTOMER.STREAMLIT_CATEGORISATION_DASHBOARD providing:
- Transaction categorisation coverage rate (% of transactions with non-null category, by category tree level)
- Merchant-enrichment match rate and top unmatched merchant patterns
- Model accuracy on held-out validation set (macro F1, per-category breakdown)
- Category volume trends over 30 days
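Two of the dashboard metrics have simple definitions that a short sketch makes concrete. The functions below are illustrative and not taken from the module: `coverage_rate` and `macro_f1` are hypothetical names, and the sketch assumes that an uncategorised transaction is represented as `None` and that macro F1 is the unweighted mean of per-category F1 over every label seen in either the true or the predicted set.

```python
from typing import Optional, Sequence


def coverage_rate(categories: Sequence[Optional[str]]) -> float:
    """Share of transactions with a non-null category (0.0 for empty input)."""
    if not categories:
        return 0.0
    return sum(c is not None for c in categories) / len(categories)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> tuple[float, dict[str, float]]:
    """Return (macro F1, per-category F1) for paired true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    per_category: dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0.
        per_category[label] = 2 * tp / denom if denom else 0.0
    if not per_category:
        return 0.0, per_category
    return sum(per_category.values()) / len(per_category), per_category
```

Because macro F1 weights every category equally, a rare category with poor recall lowers the headline figure as much as a common one, which is why the per-category breakdown is reported alongside it.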

Consumed by MOD-172 (Operations & Model Intelligence Dashboard) in the model performance section. OPERATIONS_ROLE requires cross-schema SELECT on the published RISK_CUSTOMER.* views.


Module dependencies

Depends on

| Module | Title | Required? | Contract | Reason |
| --- | --- | --- | --- | --- |
| MOD-042 | CDC pipeline — Neon logical replication to S3 Iceberg | Required | | Transaction categorisation model is trained and scored in Snowflake on the transaction history from the CDC pipeline. |
| MOD-104 | AWS shared infrastructure bootstrap | Required | | MOD-104 provisions the S3 Iceberg bucket (Snowflake external tables), KMS key, and bank-risk-platform EventBridge bus ARN. Required before this module can be deployed. |
| MOD-102 | Snowflake account configuration & governance | Required | | Snowflake account and governance provisioned by MOD-102 must exist before this module can read or write Snowflake. |
| MOD-172 | Operations & Model Intelligence Dashboard | Required | | Operations & Model Intelligence Dashboard shows categorisation model accuracy and merchant-enrichment metrics in its model performance page. |

Required by

| Module | Title | As | Contract |
| --- | --- | --- | --- |
| MOD-070 | Transaction history & search | Optional enhancement | |
| MOD-077 | Account dashboard & insight feed | Optional enhancement | |
| MOD-166 | Transaction category corrections | Optional enhancement | |
| MOD-172 | Operations & Model Intelligence Dashboard | Hard dependency | |

Policies satisfied

| Policy | Title | Mode | How |
| --- | --- | --- | --- |
| CON-005 | Fee & Pricing Transparency Policy | AUTO | Transaction descriptions and categories are accurate and meaningful, not raw acquirer strings |
| DT-005 | Model Risk Management Policy | LOG | Categorisation model versioned, retrained on feedback, and performance-monitored |

Capabilities satisfied

(No capabilities mapped)


Part of SD06 — Snowflake Analytics & Risk Platform
Compiled 2026-05-22 from source/entities/modules/MOD-041.yaml