Categorisation & merchant enrichment model


ID	`MOD-041`
System	SD06
Repo	`bank-risk-platform`
Build status	Deployed
Deployed	Yes
Last commit	`54197c02fff0ca78a988e6140d31778e59f05b46`

XGBoost multi-class classifier, retrained weekly on customer correction signals. Predictions are routed by confidence: ≥0.85 is applied automatically, 0.60–0.84 triggers a prompt, and <0.60 is shown as Other. See ADR-017.
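The routing bands above can be sketched as a small pure function. This is a minimal illustration, not the deployed logic: the function name, the action labels, and the choice to treat scores between 0.84 and 0.85 as "prompt" are assumptions made for the example. ADR-017 remains the authoritative definition.

```python
from typing import Tuple

# Thresholds taken from the summary above; everything else is illustrative.
AUTO_THRESHOLD = 0.85
PROMPT_THRESHOLD = 0.60


def route_prediction(category: str, confidence: float) -> Tuple[str, str]:
    """Return (action, category_to_display) for one scored transaction.

    Hypothetical helper: the action names "auto", "prompt" and "other"
    are placeholders, not identifiers from the real codebase.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= AUTO_THRESHOLD:
        return ("auto", category)      # apply the predicted category directly
    if confidence >= PROMPT_THRESHOLD:
        return ("prompt", category)    # assumption: 0.84 < score < 0.85 also lands here
    return ("other", "Other")          # low confidence: fall back to Other
```

For example, `route_prediction("Groceries", 0.91)` yields `("auto", "Groceries")`, while a score of 0.42 yields `("other", "Other")` regardless of the predicted category.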

Build notes — 2026-05-14

The AWS SCP blocker on the `bank-merchant-assets-{env}` S3 bucket is resolved (`bank-platform` commit `911a11f7` provisions the bucket). The current deploy failure is unrelated to S3: the Python UDF `NORMALISE_MERCHANT` is not deployed in dev because the CI pipeline runs with the `HAS_DCM=false` + `HAS_DBT_PROJECT=false` workaround, which skips the Snowpark deploy step. The dbt model `int_normalised_merchant` then fails with `Unknown user-defined function BANK_DEV_RISK.RISK_CUSTOMER.NORMALISE_MERCHANT`.

Resolution options (pick one):
- Deploy the UDF manually once (`pnpm udf:deploy` in the module directory); it persists in Snowflake and subsequent pipeline runs will find it.
- Make the UDF deploy step unconditional in `bank-platform/.gitlab/ci/templates/risk-platform.gitlab-ci.yml` so it runs regardless of `HAS_DCM` / `HAS_DBT_PROJECT`. This is the cleanest long-term fix.
- Resolve the DCM ownership privilege issue (see MOD-056 notes), flip `HAS_DCM=true` + `HAS_DBT_PROJECT=true`, and the UDF deploy runs automatically.
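Whichever option is chosen, it can help to confirm that the UDF is visible in the target schema before the dbt run. The sketch below is one possible pre-flight check, assuming a DB-API style cursor (for example, one obtained from the Snowflake Python connector). The helper names are hypothetical and not part of the module.

```python
import re
from typing import Any, Protocol, Sequence

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


class Cursor(Protocol):
    """Minimal DB-API surface this sketch relies on."""

    def execute(self, sql: str) -> Any: ...
    def fetchall(self) -> Sequence[Any]: ...


def udf_check_sql(database: str, schema: str, name: str) -> str:
    """Build a SHOW USER FUNCTIONS statement scoped to one schema."""
    for part in (database, schema, name):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    # Note: LIKE treats "_" as a single-character wildcard, so this is a
    # slightly loose match; good enough for a pre-flight check.
    return f"SHOW USER FUNCTIONS LIKE '{name}' IN SCHEMA {database}.{schema}"


def udf_exists(cursor: Cursor, database: str, schema: str, name: str) -> bool:
    """Return True if at least one matching user-defined function is listed."""
    cursor.execute(udf_check_sql(database, schema, name))
    return len(cursor.fetchall()) > 0
```

Called as `udf_exists(cur, "BANK_DEV_RISK", "RISK_CUSTOMER", "NORMALISE_MERCHANT")`, a `False` result would indicate that the `int_normalised_merchant` model is likely to fail with the error quoted above.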

Streamlit dashboard

MOD-041 ships a Streamlit page `RISK_CUSTOMER.STREAMLIT_CATEGORISATION_DASHBOARD` providing:
- Transaction categorisation coverage rate (% of transactions with non-null category, by category tree level)
- Merchant-enrichment match rate and top unmatched merchant patterns
- Model accuracy on held-out validation set (macro F1, per-category breakdown)
- Category volume trends over 30 days
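Two of the dashboard metrics have simple definitions that can be illustrated in plain Python. The sketch below shows coverage rate as the share of non-null categories and macro F1 as the unweighted mean of per-category F1 scores. The function names and sample labels are made up for the example; the dashboard's own queries may compute these differently (for instance in SQL, or per category tree level).

```python
from typing import Dict, Optional, Sequence


def coverage_rate(categories: Sequence[Optional[str]]) -> float:
    """Percentage of transactions whose category is not null."""
    if not categories:
        return 0.0
    covered = sum(c is not None for c in categories)
    return 100.0 * covered / len(categories)


def per_category_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 score for every label seen in either the truth or the predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-category F1, so rare categories count equally."""
    scores = per_category_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

Because macro F1 weights every category equally, a drop in a low-volume category moves the headline number as much as a drop in a high-volume one, which is why the per-category breakdown sits alongside it.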

Consumed by MOD-172 (Operations & Model Intelligence Dashboard) in the model performance section. `OPERATIONS_ROLE` requires cross-schema SELECT on the published `RISK_CUSTOMER.*` views.

Module dependencies

Depends on

Module	Title	Required?	Contract	Reason
MOD-042	CDC pipeline — Neon logical replication to S3 Iceberg	Required	—	Transaction categorisation model is trained and scored in Snowflake on the transaction history from the CDC pipeline.
MOD-104	AWS shared infrastructure bootstrap	Required	—	MOD-104 provisions the S3 Iceberg bucket (Snowflake external tables), KMS key, and bank-risk-platform EventBridge bus ARN. Required before this module can be deployed.
MOD-102	Snowflake account configuration & governance	Required	—	Snowflake account and governance provisioned by MOD-102 must exist before this module can read or write Snowflake.
MOD-172	Operations & Model Intelligence Dashboard	Required	—	Operations & Model Intelligence Dashboard shows categorisation model accuracy and merchant-enrichment metrics in its model performance page.

Required by

Module	Title	As	Contract
MOD-070	Transaction history & search	Optional enhancement	—
MOD-077	Account dashboard & insight feed	Optional enhancement	—
MOD-166	Transaction category corrections	Optional enhancement	—
MOD-172	Operations & Model Intelligence Dashboard	Hard dependency	—

Policies satisfied

Policy	Title	Mode	How
CON-005	Fee & Pricing Transparency Policy	`AUTO`	Transaction descriptions and categories are accurate and meaningful — not raw acquirer strings
DT-005	Model Risk Management Policy	`LOG`	Categorisation model versioned, retrained on feedback, and performance-monitored

Capabilities satisfied

(No capabilities mapped)

Part of SD06 (Snowflake Analytics & Risk Platform).
Compiled 2026-05-22 from `source/entities/modules/MOD-041.yaml`.