Pay for data quality now, profit later
By: Will Shiflett
January 20, 2026 | Automated QA, Cost Savings, Data
Insights
Data costs more than it makes, or does it?
Data is a cost center. There are exceptions, and you should consider telling your rapt colleagues all about them. But, exceptions aside, data alone doesn’t generate revenue equivalent to the costs of sourcing, storing, and transforming data.
Organizations bear those costs because, in addition to being a cost center, data is integral to any product or mission. The data isn’t the product itself, or the mission. But neither the product nor the mission is viable without data.
So the cost must be borne. And since the cost detracts from profits, or budgets, it should also be minimized.
However, the data’s cost isn’t limited to sourcing, storing, and transforming the data. That may be what your cloud bill says, and like anything a cloud bill says, it’s correct and predictable.
The hidden cost of overlooking validation
What if your product’s sourced, stored, and transformed data is unreliable, incorrect, or simply incomplete? In this case, too, the product will not achieve its full potential, and the mission will be muddled.
In scenarios like these, the cost of validating the data will be less than or equal to the cost of missed sales, mispriced products, or time wasted chasing objectives not relevant to your organization’s mission. These are what economists call opportunity costs. As in, “You had the opportunity to validate your data, but opted to focus on something else, thereby shifting the cost from the present (pay for data validation) to the future (lost sales, mission muddle).”
In summary, it’s not enough to bear the cost of sourcing, storing, and transforming the data. Any reasonable accounting needs to include the cost of validating the data, too.
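To make "incorrect" and "incomplete" a little more concrete, here is a minimal sketch of what a validation pass might check. Everything in it is hypothetical and invented for illustration: the ProductRow shape, the field names, and the rules (a price must be present and positive, a stock count must be present and non-negative). Real validation rules depend entirely on your own data and mission.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical record shape, invented for this illustration.
@dataclass
class ProductRow:
    sku: str
    price: Optional[float]
    units_in_stock: Optional[int]


def validate(rows: List[ProductRow]) -> List[Tuple[str, str]]:
    """Return (sku, problem) pairs for rows that are incomplete or incorrect."""
    problems = []
    for row in rows:
        # Incomplete: a required value is missing.
        if row.price is None or row.units_in_stock is None:
            problems.append((row.sku, "incomplete"))
        # Incorrect: the value is present but cannot be right.
        elif row.price <= 0 or row.units_in_stock < 0:
            problems.append((row.sku, "incorrect"))
    return problems


# Made-up sample rows: one clean, one incomplete, one incorrect.
rows = [
    ProductRow("A-1", 19.99, 12),
    ProductRow("A-2", None, 4),
    ProductRow("A-3", -5.00, 7),
]
issues = validate(rows)
print(issues)
```

Even a check this small catches a missing price before it becomes a mispriced product, which is the cost shift described above.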
The business case for data validation
Asking your organization to spend more money on what’s already a cost center is always a tough sell. Nobody wants to foot the bill to understand the deep-seated issues your data carried with it from data-childhood (when it was only a few megabytes) to data-adulthood (when it’s multiple gigabytes); everybody wants costs minimized and data validation maximized.
Luckily, there are ways to do this. In the next post, we’ll talk about some of them.