Expense classification engine

ID: MOD-088
System: SD06
Repo: bank-risk-platform
Build status: Not started
Deployed: No

What it does

MOD-088 is the expense classification engine. It takes enriched transaction events from MOD-087 and produces a multi-dimensional classification for each transaction:

| Dimension | Values |
|---|---|
| Ownership | Personal / Business / Mixed / Property |
| Purpose | Client meeting / Supplies / Commute / Travel / etc. |
| Tax treatment | Fully claimable / Partially claimable / Non-claimable |
| Accounting mapping | Chart of accounts code |
| Confidence | 0–100% |
| Classification basis | Merchant / Behavioural / Geo / Rule / User-confirmed |
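
As an illustration only, the classification output could be modelled as a small typed record. This is a sketch and not the MOD-088 contract: the class and field names (`Classification`, `transaction_id`, `account_code`) and the example values are assumptions. Only the dimension names and their listed values come from the table above.

```python
from dataclasses import dataclass
from enum import Enum


class Ownership(Enum):
    PERSONAL = "Personal"
    BUSINESS = "Business"
    MIXED = "Mixed"
    PROPERTY = "Property"


class TaxTreatment(Enum):
    FULLY_CLAIMABLE = "Fully claimable"
    PARTIALLY_CLAIMABLE = "Partially claimable"
    NON_CLAIMABLE = "Non-claimable"


class Basis(Enum):
    MERCHANT = "Merchant"
    BEHAVIOURAL = "Behavioural"
    GEO = "Geo"
    RULE = "Rule"
    USER_CONFIRMED = "User-confirmed"


@dataclass(frozen=True)
class Classification:
    """One multi-dimensional classification (field names are hypothetical)."""
    transaction_id: str
    ownership: Ownership
    purpose: str            # kept as free text because the purpose list is open-ended
    tax_treatment: TaxTreatment
    account_code: str       # chart of accounts code
    confidence: float       # percentage, 0 to 100
    basis: Basis

    def __post_init__(self):
        # Reject confidence values outside the documented 0-100% range.
        if not 0.0 <= self.confidence <= 100.0:
            raise ValueError("confidence must be between 0 and 100")


# Illustrative values only; "txn-001" and "420" are made up.
example = Classification(
    transaction_id="txn-001",
    ownership=Ownership.BUSINESS,
    purpose="Client meeting",
    tax_treatment=TaxTreatment.FULLY_CLAIMABLE,
    account_code="420",
    confidence=87.5,
    basis=Basis.MERCHANT,
)
```

Using enums for the closed dimensions and a plain string for purpose is a design choice of this sketch; the real schema may constrain purpose values differently.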

Classification inputs

The engine combines multiple signal types:

  • Merchant intelligence — normalised merchant name, MCC, prior classification of this merchant in the user's history, population-level signal (privacy-safe aggregates)
  • Behavioural patterns — time of day, day of week, recurrence (weekly subscriptions = likely business software), spend amount vs baseline, spend sequences (flight + hotel + meals = business trip)
  • Geo-spatial context — home cluster, work cluster, travel period signals from MOD-089
  • User-confirmed rules — explicit and implicit overrides from MOD-090
  • Xero/MYOB history — imported on onboarding to bootstrap the model
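
The spend-sequence signal mentioned above (flight + hotel + meals = business trip) can be sketched as a simple windowed check. This is a hand-written heuristic for illustration, not the engine's actual feature logic: the category labels, the function name `looks_like_business_trip` and the 7-day window are all assumptions.

```python
from datetime import date, timedelta

# Hypothetical category labels; the real enriched events from MOD-087 may differ.
TRIP_CATEGORIES = {"flight", "hotel", "meal"}


def looks_like_business_trip(transactions, window_days=7):
    """Return True if a flight, a hotel and a meal all fall within one window.

    transactions: iterable of (date, category) tuples.
    window_days: assumed maximum span of a trip, in days.
    """
    relevant = sorted((d, c) for d, c in transactions if c in TRIP_CATEGORIES)
    for i, (start, _) in enumerate(relevant):
        seen = set()
        # Collect categories seen within window_days of this starting transaction.
        for d, c in relevant[i:]:
            if d - start > timedelta(days=window_days):
                break
            seen.add(c)
        if seen == TRIP_CATEGORIES:
            return True
    return False


trip = [
    (date(2026, 3, 3), "flight"),
    (date(2026, 3, 3), "hotel"),
    (date(2026, 3, 4), "meal"),
    (date(2026, 3, 4), "groceries"),
]
print(looks_like_business_trip(trip))
```

In a trained model a signal like this would more plausibly feed in as one feature among many than decide the classification on its own.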

Model approach

The classification model is a supervised ML model trained on labelled transaction histories, with a rule overlay for high-confidence cases (e.g. transactions from known payroll providers are always classified as employer-sourced income). The model is retrained periodically on user-confirmed classifications drawn from the anonymised portfolio.
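
A minimal sketch of how a rule overlay can sit in front of a model prediction, assuming rules are checked first and the model is consulted only when no rule fires. The names (`classify`, `payroll_rule`, `stub_model`, `KNOWN_PAYROLL_PROVIDERS`), the provider name and the choice to report rule hits at 100% confidence are hypothetical; the source does not specify how the overlay is implemented.

```python
from typing import Callable, Optional

# (label, confidence as a percentage, classification basis)
Prediction = tuple[str, float, str]

# Hypothetical provider list; "examplepay" is a made-up name.
KNOWN_PAYROLL_PROVIDERS = {"examplepay"}


def payroll_rule(txn: dict) -> Optional[str]:
    """High-confidence rule: known payroll providers map to employer-sourced income."""
    if txn.get("merchant") in KNOWN_PAYROLL_PROVIDERS:
        return "Employer-sourced income"
    return None


def stub_model(txn: dict) -> Prediction:
    """Stand-in for the trained model; returns a fixed prediction."""
    return ("Client meeting", 62.0, "Merchant")


def classify(txn: dict,
             model: Callable[[dict], Prediction],
             rules: list[Callable[[dict], Optional[str]]]) -> Prediction:
    """Apply rules in order; fall back to the model if none matches."""
    for rule in rules:
        label = rule(txn)
        if label is not None:
            # Assumption of this sketch: a rule hit is reported at 100% confidence.
            return (label, 100.0, "Rule")
    return model(txn)


print(classify({"merchant": "examplepay"}, stub_model, [payroll_rule]))
print(classify({"merchant": "corner cafe"}, stub_model, [payroll_rule]))
```

Checking rules before the model keeps deterministic cases out of the model's error budget, at the cost of having to maintain the rule list by hand.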

Design phase

This module is in design. Build begins in Phase 2 of the Expense Intelligence Platform. See the Expense Intelligence Platform summary for the full implementation roadmap.


Module dependencies

Depends on

| Module | Title | Required? | Contract | Reason |
|---|---|---|---|---|
| MOD-087 | Transaction enrichment engine | Required | contract/events/ | Enriched transaction events from MOD-087 are the primary classification input; raw transaction data is not consumed directly. |
| MOD-089 | Geo-spatial processor | Required | contract/events/ | Geo-spatial signals (home cluster, work cluster, travel periods) from MOD-089 are incorporated as classification inputs per the classification input specification in MOD-088.md. |
| MOD-090 | Auto rules engine | Optional | contract/events/ | User-confirmed and implicit rules from MOD-090 are consumed as classification overrides; the base model runs without rule input, but personalisation accuracy degrades over time without the feedback loop. |

Required by

| Module | Title | As | Contract |
|---|---|---|---|
| MOD-090 | Auto rules engine | Hard dependency | contract/events/ |
| MOD-092 | Tax logic engine | Hard dependency | contract/events/ |
| MOD-093 | Accounting mapper | Hard dependency | contract/events/ |
| MOD-094 | Property attribution engine | Hard dependency | contract/events/ |

Policies satisfied

| Policy | Title | Mode | How |
|---|---|---|---|
| PRI-001 | Privacy Policy | LOG | Classification signals and model inputs are logged to support data minimisation review and individual access requests under PRI-001. |

Capabilities satisfied

| Capability | Title | Mode | How |
|---|---|---|---|
| CAP-135 | CAP-135 | AUTO | Applies ML model and rule overlay to produce a multi-dimensional classification (ownership, purpose, tax treatment, confidence) for each enriched transaction. |

Part of SD06 — Snowflake Analytics & Risk Platform Compiled 2026-05-22 from source/entities/modules/MOD-088.yaml