Expense classification engine
| | |
| --- | --- |
| ID | MOD-088 |
| System | SD06 |
| Repo | bank-risk-platform |
| Build status | Not started |
| Deployed | No |
What it does
MOD-088 is the expense classification engine. It takes enriched transaction events from MOD-087 and produces a multi-dimensional classification for each transaction:
| Dimension | Values |
| --- | --- |
| Ownership | Personal / Business / Mixed / Property |
| Purpose | Client meeting / Supplies / Commute / Travel / etc. |
| Tax treatment | Fully claimable / Partially claimable / Non-claimable |
| Accounting mapping | Chart of accounts code |
| Confidence | 0–100% |
| Classification basis | Merchant / Behavioural / Geo / Rule / User-confirmed |
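As a rough illustration of the output shape described in the table, the sketch below models one classification as a typed record. The class and field names (`Classification`, `transaction_id`, `account_code`) and the sample values are assumptions made for this example; the actual event contract lives under contract/events/ and may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Ownership(Enum):
    PERSONAL = "Personal"
    BUSINESS = "Business"
    MIXED = "Mixed"
    PROPERTY = "Property"


class TaxTreatment(Enum):
    FULLY_CLAIMABLE = "Fully claimable"
    PARTIALLY_CLAIMABLE = "Partially claimable"
    NON_CLAIMABLE = "Non-claimable"


class Basis(Enum):
    MERCHANT = "Merchant"
    BEHAVIOURAL = "Behavioural"
    GEO = "Geo"
    RULE = "Rule"
    USER_CONFIRMED = "User-confirmed"


@dataclass(frozen=True)
class Classification:
    """One multi-dimensional classification (hypothetical field names)."""
    transaction_id: str          # assumed identifier, not from the spec
    ownership: Ownership
    purpose: str                 # open-ended in the spec ("etc."), so kept as text
    tax_treatment: TaxTreatment
    account_code: str            # chart of accounts code
    confidence: float            # percentage, 0 to 100
    basis: Basis

    def __post_init__(self):
        # Reject confidence values outside the documented 0-100% range.
        if not 0.0 <= self.confidence <= 100.0:
            raise ValueError("confidence must be between 0 and 100")


# Illustrative values only.
example = Classification(
    transaction_id="txn-001",
    ownership=Ownership.BUSINESS,
    purpose="Client meeting",
    tax_treatment=TaxTreatment.FULLY_CLAIMABLE,
    account_code="420",
    confidence=87.5,
    basis=Basis.MERCHANT,
)
```

Purpose is left as free text here because the table ends its value list with "etc."; a closed enum would be the stricter choice once the full list is fixed.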
The engine combines multiple signal types:
- Merchant intelligence — normalised merchant name, MCC, prior classification of this merchant in the user's history, population-level signal (privacy-safe aggregates)
- Behavioural patterns — time of day, day of week, recurrence (weekly subscriptions = likely business software), spend amount vs baseline, spend sequences (flight + hotel + meals = business trip)
- Geo-spatial context — home cluster, work cluster, travel period signals from MOD-089
- User-confirmed rules — explicit and implicit overrides from MOD-090
- Xero/MYOB history — imported on onboarding to bootstrap the model
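The spend-sequence signal mentioned above (flight + hotel + meals = business trip) can be sketched as a simple windowed check. This is only one possible reading: the `Txn` type, the coarse category labels, and the five-day default window are all assumptions for illustration, not part of the MOD-088 design.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Txn:
    day: date
    category: str  # hypothetical coarse category derived from enrichment


# Categories that together suggest a trip (assumed labels).
TRIP_CATEGORIES = {"flight", "hotel", "meals"}


def looks_like_business_trip(txns: List[Txn], window_days: int = 5) -> bool:
    """Return True if flight, hotel and meals all occur within some
    window of `window_days` days starting at one of the transactions."""
    ordered = sorted(txns, key=lambda t: t.day)
    for i, start in enumerate(ordered):
        window_end = start.day + timedelta(days=window_days)
        seen = {t.category for t in ordered[i:] if t.day <= window_end}
        if TRIP_CATEGORIES <= seen:
            return True
    return False


trip = [
    Txn(date(2026, 5, 1), "flight"),
    Txn(date(2026, 5, 1), "hotel"),
    Txn(date(2026, 5, 2), "meals"),
]
scattered = [
    Txn(date(2026, 5, 1), "flight"),
    Txn(date(2026, 5, 20), "hotel"),
    Txn(date(2026, 5, 21), "meals"),
]
```

In a real model this would more likely be one feature among many than a hard decision, since the same pattern also fits a holiday.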
Model approach
The classification model is a supervised ML model trained on labelled transaction histories, with a rule overlay for high-confidence cases (e.g. transactions at known payroll providers are always employer-sourced income). The model is retrained periodically from user-confirmed classifications across the anonymised portfolio.
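A minimal sketch of the rule overlay, assuming rules are checked first and the model is used only when no rule fires. The function names, the `PAYROLL_PROVIDERS` set, the stub model, and the choice to report rule hits at 100% confidence are all hypothetical; the trained model itself is out of scope here.

```python
from typing import Callable, List, NamedTuple, Optional


class Result(NamedTuple):
    label: str
    confidence: float  # percentage, 0 to 100
    basis: str         # one of the classification basis values


# Hypothetical set of normalised merchant names for known payroll providers.
PAYROLL_PROVIDERS = {"examplepay"}


def payroll_rule(txn: dict) -> Optional[Result]:
    """High-confidence rule: known payroll provider means employer-sourced income."""
    if txn.get("merchant") in PAYROLL_PROVIDERS:
        # Assumption: rule hits are reported at full confidence.
        return Result("Employer-sourced income", 100.0, "Rule")
    return None


def stub_model(txn: dict) -> Result:
    """Stand-in for the supervised model; returns a fixed prediction."""
    return Result("Business", 72.0, "Merchant")


def classify(
    txn: dict,
    rules: List[Callable[[dict], Optional[Result]]],
    model: Callable[[dict], Result],
) -> Result:
    """Apply rules in order; fall back to the model if none matches."""
    for rule in rules:
        hit = rule(txn)
        if hit is not None:
            return hit
    return model(txn)


salary = classify({"merchant": "examplepay"}, [payroll_rule], stub_model)
coffee = classify({"merchant": "corner cafe"}, [payroll_rule], stub_model)
```

Checking rules before the model keeps the deterministic cases auditable and avoids a model call for them; whether MOD-088 orders the two stages this way is not stated in the source.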
Design phase
This module is in design. Build begins in Phase 2 of the Expense Intelligence Platform. See the Expense Intelligence Platform summary for the full implementation roadmap.
Module dependencies
Depends on
| Module | Title | Required? | Contract | Reason |
| --- | --- | --- | --- | --- |
| MOD-087 | Transaction enrichment engine | Required | contract/events/ | Enriched transaction events from MOD-087 are the primary classification input; raw transaction data is not consumed directly. |
| MOD-089 | Geo-spatial processor | Required | contract/events/ | Geo-spatial signals (home cluster, work cluster, travel periods) from MOD-089 are incorporated as classification inputs per the classification input specification in MOD-088.md. |
| MOD-090 | Auto rules engine | Optional | contract/events/ | User-confirmed and implicit rules from MOD-090 are consumed as classification overrides; the base model runs without rule input but personalisation accuracy degrades over time without the feedback loop. |
Required by
| Module | Title | As | Contract |
| --- | --- | --- | --- |
| MOD-090 | Auto rules engine | Hard dependency | contract/events/ |
| MOD-092 | Tax logic engine | Hard dependency | contract/events/ |
| MOD-093 | Accounting mapper | Hard dependency | contract/events/ |
| MOD-094 | Property attribution engine | Hard dependency | contract/events/ |
Policies satisfied
| Policy | Title | Mode | How |
| --- | --- | --- | --- |
| PRI-001 | Privacy Policy | LOG | Classification signals and model inputs are logged to support data minimisation review and individual access requests under PRI-001. |
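One way the LOG mode could look in practice is a structured record per classification that lists exactly which signals were used. The sketch below is an assumption throughout: the logger name, the record fields, and the sample signal names are invented for illustration, and PRI-001 may prescribe a different format.

```python
import json
import logging

# Hypothetical logger name; the real logging pipeline is not specified.
logger = logging.getLogger("mod088.audit")


def build_audit_record(user_id: str, transaction_id: str, signals: dict) -> dict:
    """Collect the inputs used for one classification in a flat, serialisable record."""
    return {
        "user_id": user_id,
        "transaction_id": transaction_id,
        # Listing signal names separately makes a data minimisation review a
        # matter of scanning one field.
        "signal_names": sorted(signals),
        "signals": dict(signals),
    }


def log_classification(user_id: str, transaction_id: str, signals: dict) -> dict:
    """Emit the audit record as one JSON line and return it."""
    record = build_audit_record(user_id, transaction_id, signals)
    logger.info(json.dumps(record, sort_keys=True))
    return record


record = log_classification("user-1", "txn-001", {"mcc": "5812", "hour_of_day": 13})
```

Keying each record by user and transaction is what would make an individual access request answerable by a simple filter over the log.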
Capabilities satisfied
| Capability | Title | Mode | How |
| --- | --- | --- | --- |
| CAP-135 | CAP-135 | AUTO | Applies ML model and rule overlay to produce a multi-dimensional classification (ownership, purpose, tax treatment, confidence) for each enriched transaction. |
Part of SD06 — Snowflake Analytics & Risk Platform
Compiled 2026-05-22 from source/entities/modules/MOD-088.yaml