Categorisation & merchant enrichment model
| Field | Value |
|---|---|
| ID | MOD-041 |
| System | SD06 |
| Repo | bank-risk-platform |
| Build status | Deployed |
| Deployed | Yes |
| Last commit | 54197c02fff0ca78a988e6140d31778e59f05b46 |
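XGBoost multi-class classifier, retrained weekly on customer correction signals. Predictions are routed by confidence: at ≥ 0.85 the category is applied automatically, at 0.60–0.84 the customer is prompted, and below 0.60 the transaction is shown as Other. See ADR-017.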
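A minimal sketch of the confidence routing, assuming the classifier exposes per-class probabilities (for example, one row of XGBoost's `predict_proba` output paired with the class labels). The function name, the `RoutingDecision` type and the action labels are hypothetical. The thresholds are the ones stated in the model description; treating everything from 0.60 up to, but not including, 0.85 as the prompt band is an assumption about how the 0.60–0.84 range handles a score such as 0.845.

```python
from dataclasses import dataclass
from typing import Mapping

# Thresholds as stated in the model description (see ADR-017).
AUTO_THRESHOLD = 0.85
PROMPT_THRESHOLD = 0.60
FALLBACK_CATEGORY = "Other"


@dataclass(frozen=True)
class RoutingDecision:
    """Hypothetical result type for one routed prediction."""
    action: str        # "auto", "prompt" or "fallback"
    category: str      # category applied, proposed, or shown to the customer
    confidence: float  # top-class probability


def route_prediction(class_probs: Mapping[str, float]) -> RoutingDecision:
    """Route one transaction's class probabilities into a confidence band.

    Assumes class_probs maps category name -> probability.
    """
    if not class_probs:
        raise ValueError("class_probs must not be empty")
    # Pick the most probable category (first one wins on a tie).
    category, confidence = max(class_probs.items(), key=lambda kv: kv[1])
    if confidence >= AUTO_THRESHOLD:
        return RoutingDecision("auto", category, confidence)
    if confidence >= PROMPT_THRESHOLD:
        # Assumption: scores between 0.84 and 0.85 also fall in the prompt band.
        return RoutingDecision("prompt", category, confidence)
    # Low confidence: do not surface the model's guess; show "Other" instead.
    return RoutingDecision("fallback", FALLBACK_CATEGORY, confidence)
```

For example, `route_prediction({"Groceries": 0.91, "Dining": 0.09})` returns an `auto` decision for Groceries, while `{"Groceries": 0.50, "Dining": 0.50}` falls back to Other.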
Build notes — 2026-05-14
The AWS SCP blocker on the bank-merchant-assets-{env} S3 bucket is resolved
(bank-platform commit 911a11f7 provisions the bucket). The current deploy
failure is unrelated to S3: the Python UDF NORMALISE_MERCHANT is not deployed
in dev because the CI pipeline runs with the HAS_DCM=false and
HAS_DBT_PROJECT=false workaround, which skips the Snowpark deploy step.
As a result, the dbt model int_normalised_merchant fails with the error
"Unknown user-defined function BANK_DEV_RISK.RISK_CUSTOMER.NORMALISE_MERCHANT".
Resolution options (pick one):
- Deploy the UDF manually once (pnpm udf:deploy in the module directory); it
persists in Snowflake and subsequent pipeline runs will find it.
- Make the UDF deploy step unconditional in
bank-platform/.gitlab/ci/templates/risk-platform.gitlab-ci.yml so it runs
regardless of HAS_DCM / HAS_DBT_PROJECT. This is the cleanest long-term fix.
- Resolve the DCM ownership privilege issue (see MOD-056 notes), flip
HAS_DCM=true + HAS_DBT_PROJECT=true, and the UDF deploy runs automatically.
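This page does not describe what NORMALISE_MERCHANT does beyond its name. Given the CON-005 requirement that customers see meaningful descriptions rather than raw acquirer strings, a Python function of roughly this shape is one plausible core for such a UDF. Everything in it is hypothetical: the processor prefixes, the trailing-reference rule and the upper-case output are illustrative assumptions, not the deployed logic.

```python
import re
from typing import Optional

# Hypothetical noise patterns; the real NORMALISE_MERCHANT rules are not
# documented on this page.
_PROCESSOR_PREFIX = re.compile(r"^(?:SQ|SP|PAYPAL|PP|ZTL|IZ)\s*\*\s*")
_TRAILING_REF = re.compile(r"[\s*#-]*\d[\d\s*#-]*$")
_NON_NAME_CHARS = re.compile(r"[^A-Z0-9&' ]+")
_WHITESPACE = re.compile(r"\s+")


def normalise_merchant(raw: Optional[str]) -> Optional[str]:
    """Turn a raw acquirer descriptor into a stable merchant key (sketch)."""
    if raw is None:
        return None
    text = raw.upper().strip()
    text = _PROCESSOR_PREFIX.sub("", text)  # drop payment-facilitator prefixes like "SQ *"
    text = _TRAILING_REF.sub("", text)      # drop trailing store/terminal numbers
    text = _NON_NAME_CHARS.sub(" ", text)   # replace punctuation with spaces
    text = _WHITESPACE.sub(" ", text).strip()
    return text or None                     # empty result -> NULL
```

For example, `"SQ *BLUE BOTTLE COFFEE 1234"` becomes `"BLUE BOTTLE COFFEE"` and `"Tesco Stores #3021"` becomes `"TESCO STORES"`. The trailing-number rule is deliberately crude and would also strip a brand name that genuinely ends in a digit, so a production version would likely need exceptions.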
Streamlit dashboard
MOD-041 ships a Streamlit page RISK_CUSTOMER.STREAMLIT_CATEGORISATION_DASHBOARD providing:
- Transaction categorisation coverage rate (% of transactions with a non-null category, by category tree level)
- Merchant-enrichment match rate and top unmatched merchant patterns
- Model accuracy on the held-out validation set (macro F1, per-category breakdown)
- Category volume trends over 30 days
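The first three dashboard metrics can be pinned down with a few small functions. This is a sketch under stated assumptions, not the dashboard's implementation: the `Txn` row shape and all function names are hypothetical. Coverage at level k is assumed to mean that a transaction has a category at least k levels deep, and the match rate is assumed to be computed over transactions that carry a merchant string. In the deployed page these figures would more plausibly be SQL over Snowflake views; plain Python is used here only to make the definitions concrete.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Txn:
    """Hypothetical row shape for one categorised transaction."""
    # Category path from root to leaf, e.g. ("Spending", "Food", "Groceries");
    # an empty tuple means the transaction is uncategorised.
    category_path: Tuple[str, ...]
    merchant_key: Optional[str]         # normalised merchant string, if any
    matched_merchant_id: Optional[str]  # enrichment match; None if unmatched


def coverage_by_level(txns: Sequence[Txn], max_level: int) -> Dict[int, float]:
    """Percentage of transactions with a non-null category at each tree level."""
    total = len(txns)
    return {
        level: (100.0 * sum(len(t.category_path) >= level for t in txns) / total
                if total else 0.0)
        for level in range(1, max_level + 1)
    }


def merchant_match_stats(
    txns: Sequence[Txn], top_n: int = 5
) -> Tuple[float, List[Tuple[str, int]]]:
    """Match rate (%) over transactions with a merchant key, plus the most
    frequent unmatched merchant keys."""
    with_merchant = [t for t in txns if t.merchant_key]
    if not with_merchant:
        return 0.0, []
    matched = sum(t.matched_merchant_id is not None for t in with_merchant)
    unmatched = Counter(
        t.merchant_key for t in with_merchant if t.matched_merchant_id is None
    )
    return 100.0 * matched / len(with_merchant), unmatched.most_common(top_n)


def per_category_and_macro_f1(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[Dict[str, float], float]:
    """Per-category F1 = 2TP / (2TP + FP + FN), and their unweighted mean."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this category, but it was wrong
            fn[truth] += 1  # missed the true category
    per_category = {}
    # Label set: every category seen in either the labels or the predictions.
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_category[label] = 2 * tp[label] / denom if denom else 0.0
    macro = sum(per_category.values()) / len(per_category) if per_category else 0.0
    return per_category, macro
```

Macro F1 weights every category equally, so a rare category the model handles badly pulls the score down as much as a common one; the per-category breakdown shows which categories are responsible. Using the union of true and predicted labels mirrors scikit-learn's default label set for `f1_score(average="macro")`.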
The dashboard is consumed by MOD-172 (Operations & Model Intelligence Dashboard) in its model performance section. OPERATIONS_ROLE requires cross-schema SELECT on the published RISK_CUSTOMER.* views.
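In Snowflake, reading a view requires USAGE on its database and schema as well as SELECT on the view itself, and SELECT can be granted on all existing views and on future views in a schema. The helper below is a hypothetical sketch that emits those statements for OPERATIONS_ROLE. Two assumptions are flagged in the code: the database name follows a BANK_<ENV>_RISK pattern (only BANK_DEV_RISK is confirmed, by the error message in the build notes), and every view in RISK_CUSTOMER counts as published.

```python
from typing import List


def operations_role_grants(
    env: str, role: str = "OPERATIONS_ROLE", schema: str = "RISK_CUSTOMER"
) -> List[str]:
    """Build the Snowflake GRANT statements for cross-schema read access (sketch)."""
    # Assumption: database naming follows BANK_<ENV>_RISK; only BANK_DEV_RISK is confirmed.
    database = f"BANK_{env.upper()}_RISK"
    fq_schema = f"{database}.{schema}"
    return [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO ROLE {role};",
        # Assumption: all views in the schema are "published"; narrow this to an
        # explicit list of views if some must stay private.
        f"GRANT SELECT ON ALL VIEWS IN SCHEMA {fq_schema} TO ROLE {role};",
        f"GRANT SELECT ON FUTURE VIEWS IN SCHEMA {fq_schema} TO ROLE {role};",
    ]
```

The FUTURE VIEWS grant covers views created by later deploys without rerunning the script; note that issuing future grants needs the MANAGE GRANTS privilege or ownership of the schema.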
Module dependencies
Depends on
| Module | Title | Required? | Contract | Reason |
|---|---|---|---|---|
| MOD-042 | CDC pipeline — Neon logical replication to S3 Iceberg | Required | — | The transaction categorisation model is trained and scored in Snowflake on the transaction history from the CDC pipeline. |
| MOD-104 | AWS shared infrastructure bootstrap | Required | — | MOD-104 provisions the S3 Iceberg bucket (Snowflake external tables), KMS key, and bank-risk-platform EventBridge bus ARN. Required before this module can be deployed. |
| MOD-102 | Snowflake account configuration & governance | Required | — | Snowflake account and governance provisioned by MOD-102 must exist before this module can read or write Snowflake. |
| MOD-172 | Operations & Model Intelligence Dashboard | Required | — | Operations & Model Intelligence Dashboard shows categorisation model accuracy and merchant-enrichment metrics in its model performance page. |
Required by
| Module | Title | As | Contract |
|---|---|---|---|
| MOD-070 | Transaction history & search | Optional enhancement | — |
| MOD-077 | Account dashboard & insight feed | Optional enhancement | — |
| MOD-166 | Transaction category corrections | Optional enhancement | — |
| MOD-172 | Operations & Model Intelligence Dashboard | Hard dependency | — |
Policies satisfied
| Policy | Title | Mode | How |
|---|---|---|---|
| CON-005 | Fee & Pricing Transparency Policy | AUTO | Transaction descriptions and categories are accurate and meaningful, not raw acquirer strings |
| DT-005 | Model Risk Management Policy | LOG | Categorisation model versioned, retrained on feedback, and performance-monitored |
Capabilities satisfied
(No capabilities mapped)
Part of SD06 — Snowflake Analytics & Risk Platform
Compiled 2026-05-22 from source/entities/modules/MOD-041.yaml