| | |
|---|---|
| ID | MOD-076 |
| System | SD07 |
| Repo | bank-platform |
| Build status | Deployed |
| Deployed | Yes |
| Last commit | bbdfbac46a1b5cf6dc25b4c7cd428a8daa669d03 |

The observability platform provides the platform engineering team with full visibility into the runtime behaviour of every service: distributed traces for request-level debugging, time-series metrics for system health and SLO monitoring, structured logs for error investigation, and alerting rules that page the on-call engineer when something needs immediate attention.
All services emit traces using an OpenTelemetry SDK. The observability platform collects, correlates, and stores these traces, allowing any individual API call to be followed end-to-end across all services it touched — including the time spent, any errors encountered, and the database queries executed at each step. This is the primary tool for diagnosing latency regressions and cascading failures in production.
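As a rough illustration of what correlating traces means here, the sketch below groups spans by trace ID and walks the parent-to-child links to show one request end-to-end. It uses only the Python standard library and is not the OpenTelemetry SDK API; the `Span` record, the `group_by_trace` and `render_trace` helpers, and all service and operation names are hypothetical and chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span record; real OpenTelemetry spans carry more fields."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    service: str
    name: str
    duration_ms: float
    error: bool = False


def group_by_trace(spans):
    """Correlate spans by trace ID, roughly as a trace store might on ingest."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def render_trace(spans):
    """Return one indented line per span, following parent -> child links from the root."""
    children = defaultdict(list)
    for span in spans:
        children[span.parent_id].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children[parent_id]:
            flag = " ERROR" if span.error else ""
            lines.append(
                f"{'  ' * depth}{span.service}:{span.name} {span.duration_ms:.0f}ms{flag}"
            )
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Illustrative data only; spans arrive interleaved and out of order.
SPANS = [
    Span("t1", "a", None, "api-gateway", "POST /payments", 120.0),
    Span("t1", "b", "a", "payments", "authorise", 95.0),
    Span("t2", "x", None, "api-gateway", "GET /health", 2.0),
    Span("t1", "c", "b", "payments-db", "INSERT payment", 40.0, error=True),
]

traces = group_by_trace(SPANS)
for line in render_trace(traces["t1"]):
    print(line)
```

The indented output makes it visible which service contributed which share of the latency and where the error occurred, which is the kind of view the paragraph above describes.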
Metrics cover both system-level indicators (CPU, memory, network, queue depth) and business-level indicators (payment processing rate, KYC decision latency, fraud score distribution). SLO dashboards track the bank's reliability commitments over rolling windows. Alerting rules route pages to the on-call engineer through the configured channel (PagerDuty, Slack) according to severity. Logs are centralised with full-text search; they are retained for 90 days in hot storage and for 7 years in cold storage to meet regulatory requirements.
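The routing and retention rules above can be sketched as two small pure functions. This is a minimal illustration under stated assumptions, not the module's actual configuration: the severity names, the severity-to-channel mapping, the Slack fallback, and the choice to count the 7-year period from the date the log was written are all hypothetical. Only the 90-day and 7-year figures and the two channel names come from the text.

```python
from datetime import date, timedelta

HOT_RETENTION = timedelta(days=90)  # from the text: 90 days in hot storage
COLD_RETENTION_YEARS = 7            # from the text: 7 years in cold storage

# Hypothetical severity rules; the real mapping is configuration not shown here.
ROUTES = {
    "critical": ["pagerduty"],
    "warning": ["slack"],
}
DEFAULT_CHANNELS = ["slack"]  # assumption: unknown severities are not dropped


def route_alert(severity: str) -> list:
    """Pick notification channels for an alert based on its severity."""
    return ROUTES.get(severity.lower(), DEFAULT_CHANNELS)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def storage_tier(logged_on: date, today: date) -> str:
    """Classify a log record as 'hot', 'cold', or 'expired' by its age.

    Assumption: the 7-year period is measured from the day the log was written.
    """
    if today - logged_on <= HOT_RETENTION:
        return "hot"
    if today <= add_years(logged_on, COLD_RETENTION_YEARS):
        return "cold"
    return "expired"


today = date(2026, 5, 22)
print(route_alert("critical"))                  # channels for a critical alert
print(storage_tier(date(2026, 5, 1), today))    # recent log
print(storage_tier(date(2025, 1, 1), today))    # older than 90 days
print(storage_tier(date(2019, 5, 21), today))   # older than 7 years
```

Keeping both rules as data plus pure functions makes them easy to unit test and to review against the policy wording.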
Module dependencies
Depends on
| Module | Title | Required? | Contract | Reason |
|---|---|---|---|---|
| MOD-104 | AWS shared infrastructure bootstrap | Required | — | AWS shared infrastructure provisioned by MOD-104 (EventBridge buses, S3, KMS, Kinesis, Cognito) is required before this module can be deployed. |

Required by
| Module | Title | As | Contract |
|---|---|---|---|
| MOD-032 | LCR / NSFR calculator | Hard dependency | — |
| MOD-033 | RWA & capital ratio engine | Hard dependency | — |
| MOD-058 | Regulatory incident & breach notification engine | Hard dependency | — |
| MOD-087 | Transaction enrichment engine | Hard dependency | — |
| MOD-102 | Snowflake account configuration & governance | Optional enhancement | — |
| MOD-118 | Member equity and share registry | Hard dependency | — |
| MOD-150 | Risk management platform | Hard dependency | — |
| MOD-156 | CI/CD pipeline platform | Hard dependency | — |
| MOD-157 | External provider stub service | Hard dependency | — |

Policies satisfied
| Policy | Title | Mode | How |
|---|---|---|---|
| GOV-006 | Internal Audit Policy | LOG | Platform-level system events, errors, and performance anomalies are captured in the observability store, where they are available for internal audit review. |
| DT-004 | Data Governance Policy | ALERT | Data quality anomalies detected by pipeline monitors are surfaced as observability alerts; this is where the DT-004 obligation to detect and respond is operationalised. |

Capabilities satisfied
| Capability | Title | Mode | How |
|---|---|---|---|
| CAP-123 | Distributed tracing & APM | AUTO | Collects distributed traces across all services using OpenTelemetry, correlating spans by trace ID so any request can be followed end-to-end across the platform. |
| CAP-124 | Metrics, alerting & log aggregation | ALERT | Collects metrics from all services, evaluates alerting rules, routes notifications to the on-call team, and aggregates logs into a searchable store with a 90-day hot retention window. |

Part of SD07 — Data Platform & Governance Infrastructure
Compiled 2026-05-22 from source/entities/modules/MOD-076.yaml