The most important input to AI model quality is training data. Every practitioner agrees on this in principle. In practice, the decisions that determine training data quality receive less rigour than decisions about model architecture, feature selection, and hyperparameter tuning, and the cost of those decisions is rarely calculated before they are made.

The most consequential training data decision for enterprises with large operational workloads on IBM Z is not which features to include. It is what happens to the data between the operational system of record and the training environment, and what that journey costs in data quality, which in turn determines whether the resulting model performs at the level the organisation needs.

Why enterprises extract data

Enterprises do not extract data from IBM Z environments irrationally. Modern AI ecosystems, tooling, distributed training infrastructure, GPU compute, and cloud-native experimentation workflows evolved largely outside the mainframe environment. Python-based data science toolchains, open-source ML frameworks, and the talent that uses them are optimised for analytical environments, not z/OS transactional stores. Extraction architectures emerged as a practical way to connect operational systems to the analytical tooling where model development happens.

The critique of extraction is not that it exists. It is that the architectural trade-offs it introduces are frequently underestimated and rarely quantified in terms of their actual impact on the quality of the AI that results. The gap between the extraction decision and its consequence in model performance is wide enough that many organisations do not connect them.

A framework for operational AI data quality

Training data quality for operational AI is not a single dimension. It has three components that affect model performance differently and that extraction architectures degrade in different ways. Treating them as a unified framework makes the trade-off calculation more precise.

Completeness is the degree to which the training dataset includes all features that are relevant to the prediction task. IBM Z operational environments hold transaction-level detail, account state at the time of each transaction, network-level context from the surrounding transaction flow, and the history of prior model decisions and their outcomes. An extracted dataset contains the fields the extraction pipeline was designed to capture, shaped by decisions made when the pipeline was built about what was worth the extraction cost. Adding a feature that was left out of the original pipeline means rebuilding the pipeline. Features that do not survive schema translation are simply absent, and nothing in the dataset itself signals that they are missing.
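One way to make completeness measurable is to compare the features the modelling team considers relevant against the columns the extract actually delivers. The sketch below is a minimal illustration, not a prescribed method: the function name and every feature and column name in it are hypothetical, chosen only to echo the kinds of context described above.

```python
def completeness(relevant_features, extracted_columns):
    """Return (ratio, missing): the share of relevant features present
    in the extract, and the sorted names of those that are absent."""
    relevant = set(relevant_features)
    present = relevant & set(extracted_columns)
    missing = sorted(relevant - present)
    ratio = len(present) / len(relevant) if relevant else 1.0
    return ratio, missing


# Hypothetical feature list agreed with the fraud team.
relevant = [
    "amount",
    "merchant_id",
    "account_balance_at_txn",
    "prior_model_score",
    "network_hop_count",
]

# Hypothetical columns that the extraction pipeline delivers.
extracted = ["txn_id", "amount", "merchant_id", "timestamp"]

ratio, missing = completeness(relevant, extracted)
print(f"completeness: {ratio:.0%}")
print("missing:", missing)
```

The value of a check like this is less the ratio than the list of missing names, which makes an otherwise invisible absence explicit.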

Currency is the degree to which the training data reflects the state of the operational environment the model will face when it runs in production. IBM Z data is current because it is the system of record. An extracted dataset is as current as the last extraction run, minus preparation and processing time. In payment fraud, where attack patterns evolve on a timescale of weeks, a training dataset that is four to six weeks old at the point of training is already behind the adversarial environment the model will face. The model enters production with a currency deficit relative to the current threat, regardless of how well it performs on historical patterns.
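The currency deficit is simple date arithmetic, which makes it easy to calculate before an architecture is chosen. The following sketch assumes a hypothetical extraction cut-off, training start, and deployment date; none of these dates come from a real programme.

```python
from datetime import date


def data_age_days(extraction_cutoff, as_of):
    """Days between the newest record in the extract and a later date."""
    return (as_of - extraction_cutoff).days


# Hypothetical timeline for one training cycle.
extraction_cutoff = date(2024, 1, 1)   # newest record in the extract
training_start = date(2024, 2, 5)      # after preparation and processing
deployment = date(2024, 2, 19)         # model goes live

age_at_training = data_age_days(extraction_cutoff, training_start)
age_at_deployment = data_age_days(extraction_cutoff, deployment)

print(f"age at training:   {age_at_training} days")
print(f"age at deployment: {age_at_deployment} days")
```

In this illustrative timeline the data is five weeks old when training starts and seven weeks old when the model first scores a live transaction.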

Fidelity is the degree to which the training data preserves the relational context of the operational environment. Transaction data on IBM Z exists relationally: each transaction is connected to account state at the moment it occurred, the customer’s full cross-channel history, and the surrounding network transaction flow. Conventional extraction produces a flattened representation, a table of transactions with selected attributes, that loses the relational context. Models cannot learn patterns that exist only in that context if the context was not preserved in the training data. The patterns can be reconstructed through feature engineering in the analytical environment, but reconstruction at distance from the source introduces cost, complexity, and its own fidelity risks.
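To illustrate what reconstruction involves, the sketch below rebuilds two pieces of relational context for a flattened transaction table: the account balance as of each transaction, and the number of earlier transactions on the same account within a window. It is a toy example under stated assumptions. The data, the field names, the integer timestamps, and the 60-unit window are all hypothetical, and a real pipeline would also have to handle late-arriving records and scale.

```python
from bisect import bisect_right

# Hypothetical account-state history: (timestamp, balance), sorted by time.
balance_history = {
    "A1": [(100, 500.0), (200, 50.0), (300, 900.0)],
}

# Hypothetical flattened extract: one row per transaction, no context.
transactions = [
    {"id": 1, "account": "A1", "ts": 210, "amount": 40.0},
    {"id": 2, "account": "A1", "ts": 220, "amount": 45.0},
    {"id": 3, "account": "A1", "ts": 230, "amount": 48.0},
]


def balance_as_of(account_id, ts):
    """Most recent recorded balance at or before ts, or None if none exists."""
    history = balance_history.get(account_id, [])
    times = [t for t, _ in history]
    i = bisect_right(times, ts)
    return history[i - 1][1] if i else None


def prior_txn_count(account_id, ts, window):
    """Transactions on the same account in the interval [ts - window, ts)."""
    return sum(
        1
        for t in transactions
        if t["account"] == account_id and ts - window <= t["ts"] < ts
    )


def enrich(txn):
    """Attach point-in-time account state and recent activity to one row."""
    return {
        **txn,
        "balance_at_txn": balance_as_of(txn["account"], txn["ts"]),
        "txns_prev_60": prior_txn_count(txn["account"], txn["ts"], 60),
    }


enriched = [enrich(t) for t in transactions]
for row in enriched:
    print(row)
```

Each flattened row looks unremarkable on its own. Only the reconstructed fields show a run of similar payments against an account whose balance had just dropped, and they can be reconstructed only if the balance history was extracted in the first place.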

What extraction architecture actually costs on each dimension

The cost of completeness loss is the features the model does not have. This is not always visible in validation metrics, because the validation set is typically held out from the same extracted dataset the model was trained on and so shares its gaps. The missing features produce no anomaly in training or validation. They produce an anomaly in production, when the model encounters transaction contexts that would have been diagnostic had the relevant features been available, and classifies them on the basis of what it can see.

The cost of currency lag is most visible in adversarial environments. A fraud model trained on a dataset assembled six weeks before deployment has not seen attack patterns that emerged in that window. Established fraud patterns are well-represented and will continue to be detected. Emerging patterns that have been active for only a few weeks are underrepresented or absent. The model enters production with a structural detection gap for recent attack vectors, and that gap persists until the next retraining cycle partially closes it.
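The distinction between established, underrepresented, and absent patterns can be made concrete by asking how many days of each pattern's activity fall before the extraction cut-off. The sketch below is illustrative only: the pattern names, the dates, and the 28-day threshold for "underrepresented" are assumptions, not figures from any real fraud programme.

```python
from datetime import date


def exposure_days(first_seen, cutoff):
    """Days of a pattern's activity captured before the extraction cut-off."""
    return max((cutoff - first_seen).days, 0)


def classify(first_seen, cutoff, min_days=28):
    """Label a pattern by how much of it the training data can contain.
    min_days is a hypothetical threshold, not an established standard."""
    days = exposure_days(first_seen, cutoff)
    if days == 0:
        return "absent"
    return "underrepresented" if days < min_days else "represented"


cutoff = date(2024, 1, 1)  # hypothetical extraction cut-off

# Hypothetical attack patterns and the dates they were first observed.
patterns = {
    "established_card_testing": date(2023, 6, 1),
    "recent_mule_ring": date(2023, 12, 18),
    "new_token_replay": date(2024, 1, 20),
}

report = {name: classify(seen, cutoff) for name, seen in patterns.items()}
for name, label in report.items():
    print(f"{name}: {label}")
```

A pattern that emerged after the cut-off has zero exposure however the model is trained, which is the structural gap the paragraph above describes.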

The cost of fidelity loss depends on how much signal exists in the relational context that extraction flattened. For some prediction tasks, the flattened representation is sufficient. For high-volume transactional AI where behavioural sequences, network-level signals, and real-time account state are primary diagnostic inputs, relational context carries significant signal that flattening loses. The cost is a model operating on a reduced information set relative to what the operational environment contains.

The architectural spectrum

The choice is not binary between full extraction and training directly on z/OS transactional stores. There is a spectrum of architectural patterns with different cost profiles on the three dimensions.

Full batch extraction to a cloud data warehouse represents one end of the spectrum: maximum tooling flexibility, maximum currency lag, highest completeness risk, highest fidelity loss. Most large enterprise AI programmes operate somewhere in this range for historical reasons.

Near-real-time replication using change data capture pipelines reduces currency lag substantially, though not to zero, while preserving more of the tooling flexibility of the analytical environment. CDC architectures are increasingly common in sophisticated data engineering programmes but require significant operational investment to maintain at scale.

Streaming feature engineering platforms reduce lag further and can preserve more relational context by computing features close to the event stream rather than from periodic batch snapshots. The operational complexity and the requirement for feature design discipline are higher.

Co-located feature engineering and model serving on IBM Z, using platforms such as IBM Machine Learning for z/OS, represents the opposite end of the spectrum: maximum currency, maximum fidelity, access to the full operational context without a data movement event, at the cost of operating in a more constrained tooling environment. For high-volume transactional AI where operational context is the primary signal and inference latency is a hard constraint, the co-located architecture can create meaningful advantages in model quality that compound over the model lifecycle.

The right architecture depends on the prediction task, the operational context of the data, the tooling and talent available, and the cost tolerance for the quality trade-offs at each point on the spectrum. What is not defensible is treating extraction architecture as a default without quantifying what it costs on completeness, currency, and fidelity for the specific AI workload under consideration.
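One lightweight way to force that quantification is a weighted comparison of the four patterns against the priorities of a given workload. The sketch below is a thinking aid, not a measurement. The ranks are ordinal placeholders (0 weakest, 3 strongest): the two endpoints follow the qualitative descriptions above, the intermediate values for change data capture and streaming are interpolated assumptions, and the pattern names and weights are hypothetical. An organisation would replace all of them with its own measured figures.

```python
# Placeholder ordinal ranks (0 = weakest, 3 = strongest). Illustrative only.
PATTERNS = {
    "batch_extraction": {"completeness": 0, "currency": 0, "fidelity": 0, "tooling": 3},
    "cdc_replication":  {"completeness": 1, "currency": 1, "fidelity": 1, "tooling": 2},
    "streaming":        {"completeness": 2, "currency": 2, "fidelity": 2, "tooling": 1},
    "colocated_on_z":   {"completeness": 3, "currency": 3, "fidelity": 3, "tooling": 0},
}


def weighted_score(ranks, weights):
    """Sum of rank multiplied by weight over the dimensions in weights."""
    return sum(ranks[dim] * w for dim, w in weights.items())


def best_pattern(weights):
    """Name of the pattern with the highest weighted score."""
    return max(PATTERNS, key=lambda name: weighted_score(PATTERNS[name], weights))


# Hypothetical priorities for two different workloads.
fraud_weights = {"completeness": 0.2, "currency": 0.4, "fidelity": 0.3, "tooling": 0.1}
exploration_weights = {"completeness": 0.1, "currency": 0.1, "fidelity": 0.1, "tooling": 0.7}

print("fraud scoring:", best_pattern(fraud_weights))
print("exploratory modelling:", best_pattern(exploration_weights))
```

With these placeholder inputs the same table points to different architectures for different workloads. The weights, not the table, carry the decision, and they have to be stated explicitly.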

The proximity thesis

The underlying argument of this article extends beyond IBM Z. Operational AI quality depends on proximity to operational reality. The further the training data moves from the system of record, the harder it becomes to preserve completeness, currency, and fidelity at the level that high-quality operational AI requires.

For enterprises with large transactional workloads on IBM Z, that proximity argument has a specific application. The data that defines the operational reality of global payments, fraud patterns, customer behaviour, and network risk is on IBM Z. AI trained closer to that data, through architectures that minimise the completeness, currency, and fidelity degradation that distance introduces, has the potential to perform better, degrade more slowly, and require less frequent retraining than AI trained on data that has travelled further from its source.

The data advantage is already there. Whether it is realised depends on whether the extraction architecture decision is made with the trade-offs understood.