The Data Gravity Problem: How Data Mass Creates Organisational Inertia
As organisations accumulate data, it develops gravitational pull — attracting applications, processes, and decisions toward it. We examine how data gravity constrains AI strategy and creates structural inertia that no technology migration can overcome alone.
The physics metaphor
In astrophysics, gravity is a function of mass. The more massive an object, the stronger its gravitational pull, and the harder it is for other objects to escape its influence. Data behaves similarly in organisations.
As a dataset grows — in volume, in the number of applications that depend on it, in the number of processes that reference it, in the number of people whose decisions rely on it — it develops increasing gravitational pull. New applications are built near it because that’s where the data is. Processes are designed around it because moving the data is too expensive. Decisions default to what this dataset can answer because that’s what’s available.
Over time, the organisation’s strategic options are shaped not by what it needs to do, but by where its data mass sits. This is data gravity, and it’s one of the most underrecognised constraints on enterprise AI strategy.
How data gravity constrains AI
1. Location lock-in
AI models need data. When the data lives in a specific system — a legacy data warehouse, a vendor platform, a mainframe — the AI must either go to the data or the data must come to the AI. Both have costs. The gravitational pull of established data stores means AI deployments tend to cluster around existing data concentrations, regardless of whether those concentrations represent the right data for the problem being solved.
We saw a logistics company build its entire AI capability around the data available in its ERP system — not because ERP data was the best input for its use cases, but because the ERP was the largest accessible data mass. The resulting models were technically competent but strategically misaligned: they optimised for operational efficiency metrics the ERP could see, while the strategic priority was customer experience metrics that lived in three other systems with weaker data gravity.
2. Schema ossification
Large datasets develop rigid schemas that resist change. Every field, every table, every relationship has downstream dependencies — reports, integrations, processes, compliance requirements. Changing the schema means changing everything downstream.
For AI, this is particularly constraining. ML models often need data in forms that legacy schemas weren’t designed to support. New features need to be engineered. Historical data needs to be backfilled. Labels need to be created. Each of these requirements collides with the gravitational resistance of the existing schema.
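One way to make a schema's mass visible before attempting a change is to count what already leans on each column. Below is a minimal sketch, assuming a Postgres warehouse reachable via the psycopg driver; the connection string and the `orders` table are hypothetical placeholders, and the catalogue only sees in-database views:

```python
import psycopg

# Count in-database view dependencies per column of a legacy table,
# using Postgres's standard information_schema. The connection string
# and table name ('orders') are hypothetical placeholders.
QUERY = """
    SELECT column_name, count(DISTINCT view_name) AS dependent_views
    FROM information_schema.view_column_usage
    WHERE table_schema = 'public' AND table_name = %s
    GROUP BY column_name
    ORDER BY dependent_views DESC;
"""

with psycopg.connect("dbname=warehouse user=analyst") as conn:
    for column, views in conn.execute(QUERY, ("orders",)):
        print(f"{column}: {views} dependent view(s)")
```

Even this undercounts: the dashboards, spreadsheets, and external ETL jobs that read the table over the wire are invisible to the catalogue, which is exactly why schema changes are costlier than they look.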
3. Decision anchoring
When an organisation’s decision-making has been built around a particular dataset for years, the dataset shapes not just what decisions are made but how the organisation thinks about decisions. The available data defines the available questions.
A bank that has decades of transaction data thinks about customer value in transactional terms. Its AI strategy focuses on transaction-based predictions — fraud detection, spending patterns, credit risk. The strategic question — “how do we understand the customer’s financial life holistically?” — requires data in which the bank has no gravitational mass: life events, aspirations, cross-institutional behaviour. The AI strategy follows the data gravity, not the strategic need.
Your AI strategy shouldn’t be shaped by where your data mass is concentrated. But in practice, it almost always is.
Escaping the gravity well
We’ve observed three strategies that organisations use to overcome data gravity:
Federated architecture. Rather than moving data to a central location (fighting gravity), build the capability to query data where it lives. This requires investment in a data fabric or data mesh architecture, but it decouples AI capability from data location. The model goes to the data instead of the data coming to the model.
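As a minimal sketch of the pattern, here is a query that joins data in place using DuckDB's attach-and-scan capabilities rather than a full data fabric. The connection string, bucket path, and table and column names are all illustrative assumptions:

```python
import duckdb

con = duckdb.connect()
# Load extensions for querying remote sources in place.
con.execute("INSTALL postgres; LOAD postgres;")
con.execute("INSTALL httpfs; LOAD httpfs;")

# Attach an operational Postgres database without copying it.
con.execute("ATTACH 'dbname=crm host=crm.internal' AS crm (TYPE postgres);")

# Join CRM records against survey data sitting in object storage
# (assumes S3 credentials are already configured). Neither dataset
# is migrated; the query goes to where each one lives.
df = con.execute("""
    SELECT c.customer_id, c.segment,
           avg(s.satisfaction_score) AS avg_csat
    FROM crm.public.customers AS c
    JOIN read_parquet('s3://survey-lake/csat/*.parquet') AS s
      ON s.customer_id = c.customer_id
    GROUP BY c.customer_id, c.segment
""").fetch_df()
```

Engines such as Trino take the same shape at enterprise scale; the design point is that the join happens at query time rather than through a migration project.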
Synthetic bridging. When the data you need doesn’t exist in sufficient mass, create it. Not fabricated data, but derived data — features engineered from existing sources that approximate the signals you need. A bank that lacks holistic customer data can engineer features from transactional data that serve as proxies for life events. This doesn’t solve the underlying data gap, but it breaks the gravitational constraint on the AI strategy.
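A minimal sketch of what such proxy features might look like, in pandas. The transaction schema, the categories, and the 25% rent-shift threshold are all invented for illustration, not drawn from a real bank:

```python
import pandas as pd

# Toy transaction log. The columns (customer_id, month, category, amount)
# and all thresholds below are illustrative assumptions, not a real schema.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "month": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01",
                             "2024-04-01", "2024-04-01", "2024-01-01",
                             "2024-02-01", "2024-03-01", "2024-04-01"]),
    "category": ["rent", "rent", "rent", "rent", "childcare",
                 "rent", "rent", "rent", "rent"],
    "amount": [-900.0, -900.0, -1400.0, -1400.0, -350.0,
               -800.0, -800.0, -800.0, -800.0],
})

# Monthly spend per customer per category.
monthly = (
    tx.groupby(["customer_id", "month", "category"])["amount"]
      .sum()
      .unstack("category", fill_value=0.0)
)

features = pd.DataFrame(index=monthly.index)

# Proxy for moving house: rent changes by more than 25% month-on-month.
features["rent_shift"] = (
    monthly["rent"].groupby("customer_id").pct_change().abs() > 0.25
)

# Proxy for a new child: childcare spend appears where there was none.
has_childcare = monthly["childcare"] < 0
features["childcare_onset"] = (
    has_childcare & ~has_childcare.groupby("customer_id").shift(fill_value=False)
)

print(features)
```

These proxies are weaker signals than the data they stand in for, and should be validated as such, but they let models ask questions the existing data mass could not otherwise answer.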
Strategic data investment. Deliberately build data mass in strategically important areas, even before the AI use case is fully defined. If the strategy says “customer experience” matters, invest in collecting, structuring, and accumulating customer experience data — not because there’s a model to feed today, but because you need this gravitational centre to exist when the models are ready.
The organisations that break free from data gravity are the ones that recognise it as an organisational design problem, not a technical migration problem. Moving data is engineering. Changing what data the organisation orbits around is strategy.