AI Governance at Transaction Velocity: What Actually Works

The question of how to govern AI at transaction velocity is more tractable than the debate around it suggests, provided the question is asked precisely. The imprecise version, whether it is possible to explain and govern AI making thousands of decisions per second, produces an answer that sounds impossible. The precise version asks three things: what regulators actually require, what must be computed at inference time and what can be deferred, and which tools close the gap. That version produces an answer that is implementable today.

The explainability misconception

The assumption embedded in most AI governance discussions is that explainability requires computing and storing an explanation for every individual decision at the time it is made. At ten thousand decisions per second, that assumption makes governance feel intractable: the storage volume, the compute overhead, and the latency impact of generating a full explanation inside a millisecond authorisation window are all prohibitive.

The assumption is wrong. Explainability at scale is not a real-time computation problem. It is a logging and reconstruction problem.

LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) are the two most widely used techniques for explaining individual model decisions. LIME works by perturbing the inputs of a specific prediction and observing how the output changes, fitting a simpler interpretable model around that local behaviour to show which features drove the result. SHAP uses game theory, specifically Shapley values, to assign each input feature a contribution score for a given prediction. SHAP rests on stronger theoretical guarantees; LIME is typically cheaper to compute than model-agnostic SHAP, although its explanations can vary between runs because it relies on random sampling. Both are computationally expensive relative to the inference itself, and neither is designed to run synchronously inside a sub-millisecond authorisation window.
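To make the Shapley idea concrete, here is a minimal sketch that computes exact Shapley values by enumerating every feature coalition. It is an illustration of the underlying definition, not the SHAP library's implementation; the scoring function `toy_risk_score`, its three features, and the all-zero baseline are hypothetical. The enumeration grows exponentially with the number of features, which is one reason explanation is expensive relative to inference.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley contribution of each feature of x, relative to baseline."""
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Features in the coalition keep their real value; the rest use the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))
        contributions.append(phi)
    return contributions


def toy_risk_score(features: Sequence[float]) -> float:
    # Hypothetical model: two linear terms plus an amount x new_device interaction.
    amount, velocity, new_device = features
    return 0.5 * amount + 0.25 * velocity + 2.0 * amount * new_device


x = [2.0, 4.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk_score, x, baseline)
```

In this toy case the contributions come out at approximately 3.0, 1.0 and 2.0: the interaction term is split evenly between the two features involved, and the contributions sum to the difference between the score and the baseline score.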

What they are designed for is post-hoc explanation: given a specific past decision, reproduce the explanation by replaying the model with the original inputs. That requires two things to be logged at inference time: the input feature values that the model received, and the model version that produced the decision. The explanation itself does not need to be stored. It is generated on demand when a regulator, auditor, or customer requires it.

This is the architecture that makes explainability viable at transaction velocity. Log the inputs and the model version. Generate the explanation retrospectively. The storage and compute overhead is orders of magnitude lower than pre-computing explanations for every transaction, and the compliance output is equivalent.
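The log-then-reconstruct pattern can be sketched in a few lines. Everything named here is hypothetical: `MODEL_REGISTRY` stands in for a real model registry, `AUDIT_LOG` for a durable log, and the occlusion attribution (reset one feature to a baseline and measure the score change) is a deliberately simple stand-in for LIME or SHAP.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]

# Hypothetical registry: version identifier -> retained scoring function.
MODEL_REGISTRY: Dict[str, Model] = {
    "fraud-v1": lambda f: 0.5 * f[0] + 0.25 * f[1],
    "fraud-v2": lambda f: 0.5 * f[0] + 1.0 * f[1],
}


@dataclass(frozen=True)
class DecisionRecord:
    transaction_id: str
    model_version: str
    features: tuple
    score: float


AUDIT_LOG: List[DecisionRecord] = []  # stand-in for a durable, append-only log


def score_and_log(transaction_id: str, model_version: str,
                  features: Sequence[float]) -> float:
    """Inference path: score and append a compact record. No explanation is computed."""
    score = MODEL_REGISTRY[model_version](features)
    AUDIT_LOG.append(DecisionRecord(transaction_id, model_version, tuple(features), score))
    return score


def explain_on_demand(transaction_id: str, baseline: Sequence[float]) -> Dict[str, object]:
    """Audit path: replay the logged inputs through the logged model version."""
    record = next(r for r in AUDIT_LOG if r.transaction_id == transaction_id)
    model = MODEL_REGISTRY[record.model_version]
    replayed = model(record.features)
    attributions = []
    for i in range(len(record.features)):
        occluded = list(record.features)
        occluded[i] = baseline[i]  # reset one feature to its baseline value
        attributions.append(replayed - model(occluded))
    return {
        "replayed_score": replayed,
        "matches_log": replayed == record.score,
        "attributions": attributions,
    }


score_and_log("txn-001", "fraud-v1", [2.0, 4.0])
score_and_log("txn-002", "fraud-v2", [2.0, 4.0])
report = explain_on_demand("txn-001", baseline=[0.0, 0.0])
```

The `matches_log` check is the useful design detail: if the replayed score differs from the logged one, either the wrong model version was retrieved or the logged inputs are not what the model actually received, and the explanation should not be trusted.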

What you actually need to persist

At transaction scale on IBM Z, the practical persistence requirements for governance compliance are more specific than most governance discussions suggest.

At inference time, the minimum required log entry contains the transaction identifier, the timestamp, the model version identifier, the input feature vector, and the decision output. This is a compact record, typically a few hundred bytes per transaction, that provides the reconstruction foundation for any subsequent explanation requirement.
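As a rough check on the "few hundred bytes" figure, the sketch below packs the five fields into a fixed binary layout. The layout is an assumption made for illustration (16-byte transaction identifier, 64-bit microsecond timestamp, 16-bit model version identifier, 32-bit floats for features and decision); it is not a format defined by any IBM product.

```python
import struct
import uuid
from typing import Sequence, Tuple

# Assumed header layout: txn id, timestamp in microseconds, model version id, feature count.
HEADER = struct.Struct("<16sQHH")


def pack_record(txn_id: uuid.UUID, timestamp_us: int, model_version_id: int,
                features: Sequence[float], decision: float) -> bytes:
    """Serialise one governance log entry into a compact byte string."""
    header = HEADER.pack(txn_id.bytes, timestamp_us, model_version_id, len(features))
    body = struct.pack(f"<{len(features)}f", *features)
    return header + body + struct.pack("<f", decision)


def unpack_record(blob: bytes) -> Tuple[uuid.UUID, int, int, Tuple[float, ...], float]:
    """Inverse of pack_record, used when an explanation is requested later."""
    raw_id, timestamp_us, model_version_id, count = HEADER.unpack_from(blob, 0)
    features = struct.unpack_from(f"<{count}f", blob, HEADER.size)
    (decision,) = struct.unpack_from("<f", blob, HEADER.size + 4 * count)
    return uuid.UUID(bytes=raw_id), timestamp_us, model_version_id, features, decision


txn = uuid.UUID(int=12345)
features = [0.5 * i for i in range(40)]  # 40 hypothetical feature values
blob = pack_record(txn, 1_700_000_000_000_000, 7, features, 0.75)
record_size = len(blob)  # 28-byte header + 160 bytes of features + 4-byte decision
```

With 40 features the record is 192 bytes. One caution on the design choice: 32-bit floats are lossy, so if the model consumes 64-bit values the log should store them at that precision, otherwise a later replay may not reproduce the original score.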

The model itself must be versioned and retained for the period during which decisions made by that version may be subject to regulatory examination. In most jurisdictions that period is defined by the relevant model risk framework: three to five years is common in financial services. watsonx.governance handles model versioning automatically, maintaining the model registry that connects each historical decision to the model version that made it.

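What does not need to be persisted for every transaction is the LIME or SHAP explanation. The explanation can be regenerated at any point from the logged record, provided the explainer itself is reproducible: LIME and sampling-based SHAP draw random perturbations, so the explainer configuration (random seed, sample count, and any background dataset) should be versioned alongside the model, once per model version rather than once per transaction. With that in place, the compliance value is identical to a pre-stored explanation; the storage cost is a fraction of it.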
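One caveat worth illustrating: an explainer that relies on random sampling regenerates the same explanation only if its random seed is fixed. The sketch below uses a permutation-sampling estimate of Shapley values as a stand-in for a sampling-based explainer; the function name, the toy model, and the seed value are all hypothetical.

```python
import random
from typing import Callable, List, Sequence


def sampled_attributions(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    n_permutations: int,
    seed: int,
) -> List[float]:
    """Monte Carlo Shapley estimate: average marginal gain over random feature orders."""
    rng = random.Random(seed)  # a private, seeded generator makes the run repeatable
    n = len(x)
    totals = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to its real value
            score = model(current)
            totals[i] += score - previous
            previous = score
    return [t / n_permutations for t in totals]


def toy_model(f: Sequence[float]) -> float:
    # Hypothetical model with an interaction between the first two features.
    return f[0] * f[1] + f[2]


x = [2.0, 3.0, 1.0]
baseline = [0.0, 0.0, 0.0]
first = sampled_attributions(toy_model, x, baseline, n_permutations=50, seed=42)
second = sampled_attributions(toy_model, x, baseline, n_permutations=50, seed=42)
```

Here `first` and `second` are identical because the seed is the same. A different seed will generally shift the split between the two interacting features, which is why the seed belongs in the retained explainer configuration.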

This architecture scales. A payment network processing ten billion transactions per year generates a governance log that is manageable in size, supports on-demand explanation for any individual decision, and provides the input feature history required for aggregate drift and bias monitoring.
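A back-of-envelope calculation shows what "manageable" means here. The transaction volume and the five-year retention horizon come from the text; the 300 bytes per record is an assumption standing in for "a few hundred bytes", and compression and indexing overhead are ignored.

```python
TRANSACTIONS_PER_YEAR = 10_000_000_000  # ten billion, from the text
BYTES_PER_RECORD = 300                  # assumption: "a few hundred bytes"
RETENTION_YEARS = 5                     # upper end of the three-to-five-year range
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

bytes_per_year = TRANSACTIONS_PER_YEAR * BYTES_PER_RECORD
terabytes_per_year = bytes_per_year / 1e12
retained_terabytes = terabytes_per_year * RETENTION_YEARS
average_tps = TRANSACTIONS_PER_YEAR / SECONDS_PER_YEAR

print(f"{terabytes_per_year:.1f} TB per year, "
      f"{retained_terabytes:.1f} TB retained, "
      f"{average_tps:.0f} transactions per second on average")
```

Under these assumptions the log grows by about 3 TB per year and reaches about 15 TB over five years, at an average rate of roughly 317 transactions per second. Peak rates will be considerably higher than the average, but the storage total depends only on annual volume.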

How IBM Trustworthy AI and watsonx.governance work together

IBM Trustworthy AI is the explainability, fairness, and robustness toolkit that provides the LIME and SHAP computation capability, bias detection across protected attributes, and model robustness testing. It is the technical layer that generates governance evidence on demand.
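As an illustration of what bias detection across a protected attribute can mean in practice, the sketch below computes a disparate impact ratio from logged decisions. This is a generic fairness metric written from scratch, not the IBM toolkit's API; the group labels and decision data are hypothetical.

```python
from typing import Dict, List, Tuple


def approval_rates(decisions: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Approval rate per group from (group, approved) pairs."""
    totals: Dict[str, int] = {}
    approved: Dict[str, int] = {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {g: approved[g] / totals[g] for g in totals}


def disparate_impact(decisions: List[Tuple[str, bool]],
                     unprivileged: str, privileged: str) -> float:
    """Ratio of the unprivileged group's approval rate to the privileged group's."""
    rates = approval_rates(decisions)
    return rates[unprivileged] / rates[privileged]


# Hypothetical logged decisions: group A approved 8 of 10, group B approved 6 of 10.
decisions = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 6 + [("B", False)] * 4
)
ratio = disparate_impact(decisions, unprivileged="B", privileged="A")
flagged = ratio < 0.8  # commonly cited four-fifths rule of thumb, used here as an assumption
```

The ratio here is about 0.75, so the example is flagged. The 0.8 threshold is a widely used convention rather than a universal legal standard, and the appropriate metric and threshold depend on the use case and jurisdiction.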

watsonx.governance is the platform layer that connects model development, deployment, monitoring, and compliance documentation. It maintains the model registry, runs continuous performance monitoring against defined thresholds, tracks drift, manages the alert and escalation workflow, and produces the compliance reporting that satisfies model risk governance examinations.
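Drift tracking against a threshold can be sketched with the population stability index (PSI), a common drift statistic computed from the same logged feature values. This is a generic illustration, not the watsonx.governance API; the bin edges, sample data, and the 0.1 and 0.25 alert levels are assumptions based on common industry rules of thumb.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values falling in each bin; edges are the interior cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        index = sum(1 for e in edges if v >= e)  # number of cut points at or below v
        counts[index] += 1
    return [c / len(values) for c in counts]


def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               edges: Sequence[float], floor: float = 1e-6) -> float:
    """PSI between a training-time sample and a live sample of one feature."""
    psi = 0.0
    for pe, pa in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        pe, pa = max(pe, floor), max(pa, floor)  # avoid log(0) for empty bins
        psi += (pa - pe) * math.log(pa / pe)
    return psi


def drift_status(psi: float) -> str:
    # Assumed rule-of-thumb thresholds; real thresholds are set per model.
    if psi < 0.1:
        return "stable"
    if psi < 0.25:
        return "investigate"
    return "alert"


edges = [15.0, 25.0, 35.0]
training = [10.0, 20.0, 30.0, 40.0] * 25        # evenly spread across four bins
live = [10.0, 20.0, 30.0, 40.0, 40.0] * 20      # top bin now over-represented
psi = population_stability_index(training, live, edges)
status = drift_status(psi)
```

In this example the PSI is a little over 0.10, which lands in the "investigate" band. Because the statistic is computed from logged input features, the per-transaction record that supports on-demand explanation also supports aggregate monitoring.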

On IBM Z, this architecture has a specific advantage. The transaction logs, the feature records, and the model decision outputs are co-located with the transaction itself rather than held in a separate analytical environment that must be synchronised with the operational system. When a regulator or auditor requests an explanation for a specific transaction, the evidence base is in the same system as the transaction record. There is no cross-system reconciliation, no synchronisation lag, and no integration gap that could produce an incomplete response to a regulatory information request.

ML for z/OS, IBM’s machine learning runtime for the IBM Z platform, integrates with watsonx.governance so that models deployed directly on IBM Z are governed through the same platform as models deployed elsewhere in the enterprise. The governance coverage is consistent regardless of where the model runs, and the compliance evidence is unified across the model portfolio.

What regulators actually require by geography

No major jurisdiction currently mandates that a specific explainability technique such as LIME or SHAP be used, or that explanations be computed and stored in real time for every automated decision. What regulators mandate varies by geography and by the risk classification of the AI system, but the pattern is consistent: transparency, documentation, monitoring, and the operational capacity to explain on examination.

In the United States, SR 11-7, the supervisory guidance on model risk management issued by the Federal Reserve and the OCC, sets expectations for model documentation, independent validation, and ongoing monitoring for models used in material decision-making in financial institutions. It is principles-based and technique-agnostic. Adverse action notice requirements under the Equal Credit Opportunity Act, which the CFPB has confirmed apply to algorithmic credit decisions, oblige lenders to state the specific reasons credit was denied, but do not mandate a specific underlying explanation method.

In the European Union, the AI Act imposes the most prescriptive requirements for high-risk AI systems, which include AI used in credit scoring, life and health insurance pricing, and employment decisions. Requirements include automatic logging of events sufficient to enable post-hoc monitoring, technical documentation of system design and validation, and provisions for human oversight. The logging requirement is compatible with the input feature logging architecture described above. The AI Act also requires that high-risk AI system logs be retained for at least six months, unless other applicable Union or national law provides otherwise.

GDPR Article 22 restricts decisions based solely on automated processing that significantly affect individuals, and Articles 13 to 15 give those individuals a right to meaningful information about the logic involved. This requires the ability to explain on request, not real-time explanation generation at every inference.

In the United Kingdom, PRA SS1/23 establishes model risk management expectations for banks that are substantively similar to SR 11-7: documentation, validation, and ongoing monitoring. The FCA has published guidance on the use of AI in financial services that emphasises accountability and transparency without mandating specific techniques.

In Singapore, MAS guidelines under the FEAT framework (Fairness, Ethics, Accountability, Transparency) are principles-based. In Australia, APRA’s guidance on model risk similarly establishes expectations without prescribing methods.

The practical implication is that an organisation running watsonx.governance with feature-level logging, model versioning, continuous performance monitoring, and on-demand LIME or SHAP explanation capability is well positioned for regulatory examination in each of the jurisdictions surveyed above. The governance architecture that works at transaction velocity is not a compromise on compliance. It is what compliance actually requires.

The governance architecture that scales

Governing AI at transaction velocity requires three things working together: an inference architecture that logs the minimum required inputs at decision time without adding latency; a platform that maintains model versioning, continuous monitoring, and alert-to-intervention workflows; and an explainability capability that generates on-demand explanations from logged inputs when regulatory, audit, or customer requirements trigger them.

IBM Z with ML for z/OS, IBM Trustworthy AI, and watsonx.governance is that architecture. The governance is not bolted on after the fact. It is embedded in the operational environment where the decisions are made, which is the only place it can function when the decisions are being made at transaction velocity.
