Data quality issues are as common and ancient as data itself, with businesses continually wrestling with poor data quality and the challenges of 'broken' data. In my professional journey, I've worked with several enterprises across the energy, utility, and financial services sectors, where data presents both significant challenges and opportunities.
From my direct experience, I've observed that business leaders frequently identify poor data quality as a major barrier to the adoption of AI and advanced data analytics. A research report by Mesh-AI identified data quality as the top barrier to AI adoption, receiving over a third of the votes across ten different factors. This fosters a “wait and see” approach, with organisations delaying initiatives that could deliver business value until an arbitrary data quality yardstick is met.
This problem is intensified by the vast scale, complexity, and diversity of data sources that modern businesses need to consume, process, and manage. Many of these data sources originate outside the business's boundaries and are not under its control.
Traditionally, businesses have tackled data quality issues in a static manner, such as through manual assessments and expensive one-off data cleansing exercises. This approach hasn't durably solved these issues, which isn't surprising since many businesses still view their data as a static asset and prioritise risk over value.
However, there's growing recognition that in today's world, data is neither static nor solely owned by the organisation. It is now widely acknowledged that businesses must consume diverse data sources and use data to generate deep, actionable insights that improve the quality and responsiveness of their services. This is where the traditional static view of data and its quality falls short.
To move forward, we must adopt a fundamentally different and pragmatic approach to data quality.
When it comes to data quality, existing views have been shaped by how data has traditionally been utilised in enterprises, particularly to support fairly accurate, albeit not always timely, operational reporting. From this standpoint, data has often been categorised as either “broken” or “good” based on its perceived level of accuracy, completeness and validity.
This approach is impractical: it evaluates data against a perceived level of perfection within narrow dimensions.
Instead, we need to adopt a more pragmatic view of data quality that does away with the assumptions of single ownership and perfect completeness.
We need to break down data quality into multiple measurable quality attributes that can be used to determine the suitability of a data set in meeting a real need.
In practical terms, data quality is about the key features that make data useful. Each feature is a specific, measurable trait (called a quality attribute) that helps us decide whether the data is right for a particular job. The quality we perceive in the data is a function of how these attributes measure up, and the people who use the data can articulate what they need by stating which attribute levels are acceptable to them.
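To make this concrete, here is a minimal sketch of what measuring quality attributes against a consumer's declared needs might look like. The attribute names, thresholds, and smart-meter records are illustrative assumptions, not a prescribed framework:

```python
# A minimal sketch, assuming hypothetical attribute names, thresholds, and records.
from datetime import datetime, timezone

records = [
    {"meter_id": "M-001", "reading_kwh": 12.4, "read_at": "2024-05-01T00:00:00+00:00"},
    {"meter_id": "M-002", "reading_kwh": None, "read_at": "2024-05-01T00:00:00+00:00"},
    {"meter_id": "M-003", "reading_kwh": 9.1, "read_at": "2024-04-28T00:00:00+00:00"},
]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def validity(rows, field, rule):
    """Share of rows whose value passes a business rule."""
    return sum(r[field] is not None and rule(r[field]) for r in rows) / len(rows)

def freshness_days(rows, field, now):
    """Age in days of the oldest record."""
    oldest = min(datetime.fromisoformat(r[field]) for r in rows)
    return (now - oldest).days

# Measure the attributes on the data set as it stands today.
measured = {
    "completeness": completeness(records, "reading_kwh"),
    "validity": validity(records, "reading_kwh", lambda v: v >= 0),
    "freshness_days": freshness_days(
        records, "read_at", datetime(2024, 5, 2, tzinfo=timezone.utc)
    ),
}

# The consumer states which attribute levels are acceptable for their use case,
# rather than insisting on "perfect" data.
acceptable = {"completeness": 0.6, "validity": 0.6, "freshness_days": 7}

fit_for_purpose = (
    measured["completeness"] >= acceptable["completeness"]
    and measured["validity"] >= acceptable["validity"]
    and measured["freshness_days"] <= acceptable["freshness_days"]
)
print(measured, "fit for purpose:", fit_for_purpose)
```

The point is not the specific numbers: the same data set could pass for one consumer and fail for another, depending on the attribute levels each of them needs.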
Is it really that simple? Once data quality is described in these simple but clear terms, several powerful implications emerge:
We’ve sidestepped the trap of labelling these attributes as non-functionals and thereby relegating them to an afterthought in how data is perceived. They are integral to the data itself and should be central to identifying usable data (for instance, as part of data contracts surfaced through data catalogues). A data set's suitability is determined not just by its content but also by its quality attributes.
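As an illustration, a data contract entry might publish quality attributes alongside the schema so that a catalogue can surface them to consumers. The structure, field names, and targets below are assumptions made for the sake of the example, not a specific data-contract standard:

```python
# A minimal sketch of quality attributes living inside a data contract,
# alongside the schema rather than as an afterthought. All names and
# targets are illustrative assumptions.
data_contract = {
    "dataset": "smart_meter_readings",
    "owner": "metering-platform-team",
    "schema": {
        "meter_id": "string",
        "reading_kwh": "float",
        "read_at": "timestamp",
    },
    # Quality attributes published with the data set, so a catalogue can show
    # consumers what level of quality they can rely on before they use it.
    "quality_attributes": {
        "completeness": {"field": "reading_kwh", "target": 0.95, "measured": 0.97},
        "validity": {"rule": "reading_kwh >= 0", "target": 0.99, "measured": 0.996},
        "freshness": {"max_age_hours": 24, "measured_age_hours": 6},
    },
}

def meets_targets(contract):
    """Check the published measurements against the contract's own targets."""
    qa = contract["quality_attributes"]
    return (
        qa["completeness"]["measured"] >= qa["completeness"]["target"]
        and qa["validity"]["measured"] >= qa["validity"]["target"]
        and qa["freshness"]["measured_age_hours"] <= qa["freshness"]["max_age_hours"]
    )

print(meets_targets(data_contract))  # True for the values above
```

Because the attributes travel with the data set, a consumer browsing the catalogue can judge suitability without first inspecting the content itself.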
Now that data quality is defined by a set of universal attributes, applying product thinking and techniques to identify and prioritise quality improvements becomes much more straightforward. This approach, grounded in product risks such as usability, feasibility, viability, and value, makes it easier to embed the mindset into long-term data management strategies. It also resolves the dilemma of data quality issues becoming persistent obstacles by clarifying where to begin and which product features or data quality optimisations to prioritise in order to enable new capabilities.
Optimising data quality certainly incurs costs and complexity, and sometimes these costs can be significant, especially if improvements require, for example, adopting a new architecture. This is where value-based discussions that balance needs against costs can take place, leading to changes that align with the realities and needs of the business.
As a business leader, you know the critical role data plays in creating value, and you have seen past efforts to improve its quality fall short. What's next? Move from viewing data quality as static and binary to using a measurable framework that captures quality through specific attributes. Focus on optimising data quality to meet business needs effectively, applying product thinking. This shift in managing data quality is crucial for realising data-driven ambitions, especially with AI.