The CIO of a massive financial services company once told me that if he could have a single wish from a magic genie, it would be the ability to deliver faster.
But delivering the right outcome, fast, is a hard problem, made even more complex by our current data-centric world.
On the one hand, data adds layers and layers of complexity to the already-uncertain, already-risky discipline of software delivery. On the other, data is the differentiating asset of any business and the value that lies therein should be relentlessly pursued!
We need an approach that democratises data while minimising the biggest blocker to rapid delivery: waste. The complexity of data-centric delivery breeds waste throughout the pipeline, so any approach worth adopting must bring data closer to the teams that need it while cutting that waste down.
In this post, I will explore the problem of waste, then compare the traditional data approaches with the modern data mesh to see how they stack up against each other on waste and delivery speed.
Software and product delivery have long been plagued by waste in the process that slows everything down.
I want to explore the idea of waste through one of the main principles of Lean: identify and eliminate common sources of waste to make the process more efficient.
Specifically, let’s consider three sources of waste that in my experience are highly prevalent in data-centric use cases:
1) Waiting/queuing:
This is the time wasted waiting for resources to become available, or for tasks essential to your stream of work to be completed. It is extremely common when you need access to a new data source, or need a new data view created for your requirements, and it has a knock-on effect on how early you can deliver any piece of work that depends on that new data source.
2) Unplanned interruptions and context switching:
If you are on the receiving end of a new data access request, it will potentially interfere with any planned work, especially since these new needs are hard to predict. If the data source you are responsible for is fairly popular, you can end up spending most of your time responding to ad hoc access requests as multiple teams want to access your data at the same time.
3) Extra work:
The data you have might not be suitable for others to consume and therefore you might need to do ad hoc work to adapt it to each new consumer every time a new use case arises, making each interruption even more time-consuming!
I will show next how these sources of waste are prevalent in some of the most familiar data architectures, and how a data mesh can help escape them!
Let's start with the monolithic data platform. I am referring here specifically to data platforms that are centralised, owned and operated by a restricted IT team acting as technical custodians, and disconnected from the data domain owners.
This approach is relatively independent of the underlying technology, but it often acts as a barrier to scaling data accessibility and utilisation across the organisation.
Let’s see how this approach fares when it comes to coping with the sources of waste we identified earlier:
- Waiting/queuing:
A monolithic data platform is especially susceptible to a high level of waiting and queuing because of its centralised nature, which turns it into a bottleneck and forces teams across the organisation to compete with each other for the prioritisation of their requirements.
- Unplanned interruptions and context switching:
Predicting and prioritising the demands on the data platform of the whole organisation can turn into a management and delivery nightmare very quickly. This can lead to a high amount of unplanned interruptions and task switching for the data engineers running the platform.
- Extra work:
Waste due to extra work can be mitigated in this approach if the team is allowed to learn from common requirements and turn those learnings into reusable data assets. Note that this requires close collaboration with the data domain owners.
At the same time, however, a monolithic data platform can be limited in the technology choices it offers. For example, one team might need a time series database that the platform does not provide, forcing them to do even more unplanned work to adapt the available tooling, inefficiently, to their requirements.
In conclusion, the monolithic data platform is highly susceptible to excessive waste from multiple perspectives. And the more it scales, the deeper the problems become.
Next, consider distributed data silos. This approach tends to emerge in organisations where data is already decentralised, for various historical reasons such as organic growth or pre-existing organisational structures, but remains scattered across organisational silos.
The level of federation in this approach is low and the data is often shaped in a way that mirrors the needs of the data producer almost exclusively for their internal consumption, rather than in a consumer-driven way.
How does it stack up against the main sources of waste?
- Waiting/queuing: this approach naturally mitigates the concentration of waiting and unplanned work by diluting them across the many silos.
- Unplanned interruptions and context switching: on the flip side, smaller silos can make things much worse for data producers of highly popular datasets as their smaller scale might not be sufficient to address the demands of the whole organisation.
- Extra work: siloed data tends to reflect the internal needs of the data owners almost exclusively, so ad hoc work to make it consumer-ready is needed almost every time a new need for the data emerges. Every additional consumer of a dataset therefore multiplies the interruptions and context switching for the data owners.
Distributed data silos are typically a step up from a monolithic data platform in terms of waste, but they still create massive issues that are difficult to mitigate within the existing framework.
Data mesh is a decentralised, yet federated, approach, supporting distributed, democratised, self-serve access to data organised by business domain, not by pipeline stage.
We have seen elsewhere how a data mesh model helps organisations get the right outcome when it comes to data by:
1) Ensuring that the data is owned and maintained by the people who understand it
2) Putting the right data in the hands of the people who need it
3) Conferring increased agility on the whole organisation, by virtue of the increased autonomy that every team gains in a federated, self-serve model
The fundamental point that enables faster delivery with data mesh is that it reverses the dynamics of traditional data access patterns: the emphasis is on empowering data consumers to find and consume what they need autonomously. The implication for data owners is that they must treat their data as a product and make it available to the rest of the organisation. This approach scales far better than the traditional ones because it is both proactive and distributed.
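To make this concrete, here is a minimal sketch, in Python, of what "data as a product" can look like at the interface level. All the names here (DataProduct, Catalog, publish, discover) are my own illustrative assumptions rather than any standard data mesh API; the point is simply that the owning domain publishes a self-describing, versioned product that consumers can discover without raising a ticket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """A self-describing, domain-owned data product (illustrative only)."""
    name: str          # e.g. "payments.transactions"
    owner_domain: str  # the team accountable for the data, end to end
    version: str       # a stable contract that consumers can pin against
    schema: dict       # column name -> type: the published interface
    description: str = ""


class Catalog:
    """A toy self-serve catalogue: owners publish, consumers discover."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # The owning domain registers its product proactively,
        # before any consumer has to ask for it.
        self._products[product.name] = product

    def discover(self, keyword: str) -> list[DataProduct]:
        # Consumers search autonomously instead of filing access requests.
        return [
            p for p in self._products.values()
            if keyword in p.name or keyword in p.description
        ]
```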
By federating the responsibility for making data available to the organisation, and keeping end-to-end accountability within each data domain, we can expect multiple improvements:
- Waiting/queuing: the federated nature of data mesh means there is no longer a single bottleneck throttling data access requirements across the organisation. Data is accessible self-serve, which, when executed well, removes any need to wait.
- Unplanned interruptions and context switching: unplanned interruptions are now confined to genuinely new requirements, and these can be turned into opportunities to evolve the data product and extend it to meet emerging demands.
- Extra work: the enhanced interoperability afforded by consumer-centric data discovery, together with shared norms and standards, reduces the amount of extra work, as consumers can now self-serve and reuse data assets for new use cases (see the short sketch after this list).
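Continuing the same hypothetical sketch from earlier, consumption then becomes a lookup rather than a queue: the owning domain publishes once, proactively, and any number of consumers discover and reuse the product without interrupting the owners.

```python
catalog = Catalog()

# The payments domain publishes its product once, proactively.
catalog.publish(DataProduct(
    name="payments.transactions",
    owner_domain="payments",
    version="2.1.0",
    schema={"txn_id": "string", "amount": "decimal", "ts": "timestamp"},
    description="Settled card transactions, refreshed hourly",
))

# A consumer in another domain finds and binds to the product
# without filing a request or joining a queue.
for product in catalog.discover("transactions"):
    print(product.name, product.version, product.owner_domain)
```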
There is no free lunch, however: moving the organisation into the data mesh space requires an initial investment of time and resources in each data domain. This work can, though, be done proactively and continuously, and shifted left to minimise interference with product delivery work.
A data mesh model brings many advantages to data-driven organisations. The question at the top of every business and IT leader's mind is whether the investment in moving towards a data mesh approach will yield clear benefits in terms of the ability to deliver at a faster pace.
A data mesh brings a federated way of working that gives organisations the ability to operate with agility at scale, and breaks down the traditional delivery pipelines that create waste through waiting, extra work and context switching.
As a result, waste in a data mesh system is far lower, resulting in massively accelerated software delivery potential.
Many organisations have seen similar gains adopting analogous approaches when it comes to building platforms and applications, and it’s time that the same mindset is adopted in the data space!