As more organisations explore and adopt Generative AI, using the technology sustainably poses a serious challenge. Our recent blog delved into the risks of dark data, highlighting how storing data that yields no value can carry significant financial and climate-related costs.
Data that is not used or managed exposes organisations to cyber threats and, more obviously, can make any subsequent analysis inaccurate. The carbon footprint of storing unnecessary data is also hugely significant. Data centres are responsible for 2.5% of human-produced carbon emissions, and if, as is estimated, between 70% and 90% of the world’s data is ‘dark’, we can start to see the scale of dark data’s impact on climate change.
Why Does Dark Data Exist?
There can be a myriad of reasons why organisations collect, process, and store information or data points but then rarely use or analyse them. For enterprise businesses, this is usually due to one or more of the following:
- Regulatory requirements,
- Duplicated data across multiple platforms,
- Saving for use ‘just in case’ at a later date.
Clean Computing
Almost every industry is now looking at AI functionality that can help streamline processes and improve results. This has a significant impact on sustainability challenges. Here are the facts:
- The International Energy Agency states that data centres account for around 1% of global electricity demand. By 2030, data centres are expected to reach 35 gigawatts of power consumption, up from 17 gigawatts in 2022.
- The more advanced chips (GPUs) needed to keep pace with AI adoption use more energy and emit more heat, and therefore need more cooling, than a standard microchip. If Amazon or Microsoft replaced all their chips with GPUs, energy consumption would increase four to five times.
- According to Goldman Sachs, data centres currently account for 3% of US energy usage, a share expected to jump to 8% by 2030.
- The emissions of Google, Microsoft and Meta have all jumped in the last year, largely due to the energy demands of AI.
Clean computing refers to the environmentally responsible manufacture and disposal of computers, servers, and peripherals, as well as improving computing efficiency and reducing energy use.
Given the rise of AI demand and dark data, we will examine what large regulated enterprises can do to reduce dark data in their organisations and ensure the right data is acted on to shape their stories.
Reimagining Data Strategy for Clean Computing
One of the key components of a data strategy is first understanding how data can contribute to the wider business strategy and goals. Now more than ever, a data strategy helps organisations stay relevant, competitive and innovative amid constant change. With this in mind, having the right data is fundamental to informing analysis and decision-making. Working to understand and eliminate dark data has a direct impact on realising an organisation’s sustainability strategy.
Four key areas play a role in data strategy, and we will examine how each one helps to reduce dark data. The four key areas are:
- Data Audit
- Data Governance
- Data Sharing
- Data Management
Data Audit
Analysing what data is collected is an important step in reducing dark data. Adopting a structure of data product ownership across the organisation will greatly improve this, as individual teams will be able to analyse the data being ingested and assess its value to the business.
To audit dark data collection, the key areas organisations should focus on are:
- Formulate an Audit Plan: Define the scope and objectives of the dark data audit, identifying all sources of dark data and assessing their value and risks.
- Assemble a cross-functional team including IT, data governance, compliance and business owners to oversee the audit process.
- Identify Data Sources: Compile a list of potential data sources, including structured databases, unstructured files, logs, and cloud storage.
- Include active (e.g. CRM data) and passive (e.g. server logs) data sources, ensuring no data sources are overlooked.
- Assess Data Usage: Analyse data access logs and usage patterns to determine which data is actively used, rarely used, or never used.
- Interview key stakeholders to understand data usage from a business perspective and identify any critical data that might be overlooked.
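The "Assess Data Usage" step above could be approached along the following lines. This is a minimal sketch rather than a prescribed method: it assumes access logs can be reduced to (dataset name, access date) pairs, and the function name, the 365-day review window and the threshold of five reads are all hypothetical values chosen for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def classify_usage(datasets, access_log, today, window_days=365, rare_threshold=5):
    """Label each dataset 'active', 'rare' or 'never' from recent accesses.

    datasets:   iterable of dataset names taken from the audit inventory.
    access_log: iterable of (dataset_name, access_date) tuples.
    The window and threshold are illustrative assumptions, not standards.
    """
    cutoff = today - timedelta(days=window_days)
    # Count only the accesses that fall inside the review window.
    recent = Counter(name for name, accessed in access_log if accessed >= cutoff)
    labels = {}
    for name in datasets:
        count = recent[name]
        if count == 0:
            labels[name] = "never"   # no reads in the window: dark data candidate
        elif count < rare_threshold:
            labels[name] = "rare"
        else:
            labels[name] = "active"
    return labels


# Example with made-up dataset names and dates.
today = date(2024, 6, 1)
log = [("crm_contacts", date(2024, 5, day)) for day in range(1, 11)]
log.append(("server_logs_2019", date(2024, 1, 15)))
labels = classify_usage(["crm_contacts", "server_logs_2019", "legacy_exports"], log, today)
# labels == {'crm_contacts': 'active', 'server_logs_2019': 'rare', 'legacy_exports': 'never'}
```

Datasets labelled 'never' are only candidates. The stakeholder interviews described above remain necessary to confirm whether rarely read data is still business-critical or held for regulatory reasons.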
Fundamentally, all teams should be well-versed enough in the requirements and regulations to demonstrate autonomy in managing the data sets they’re responsible for.
Teams should also be encouraged to collaborate with data engineering teams to help ensure that the right data is being collected, stored, and processed. They can also work together to ensure that the organisation is adhering to compliance regulations, such as GDPR.
Data Governance
Data governance helps us to classify data being ingested and provide clear rules around where and how data should be stored. If some dark data is ingested, governance rules determine how it should be stored and whether certain data types have different security requirements. This helps to reduce risk in the event of a cyber attack.
Organisations need to start by assessing their current data landscape. Points to build on from that assessment could include:
- Ensuring it meets legal and regulatory requirements, such as GDPR.
- Establishing a data product ownership approach.
- Creating a data domain architecture and a data domain team function.
- Decentralising the data to put ownership in the hands of the experts on that domain, and implementing data governance.
- Creating a data catalogue to be kept with the data set after it has been transformed and curated, potentially ready to share with other data domains.
Key areas of reducing dark data with data governance are:
- Establish a Data Governance Framework: Develop a comprehensive framework that includes policies, standards, and procedures for managing dark data.
- Define the roles and responsibilities of data governance stakeholders, including data stewards, owners, and custodians.
- Define Data Retention and Deletion Policies: Develop data retention policies that specify how long data should be kept and when it should be archived or deleted.
- Ensure policies comply with regulatory requirements and business needs, and bake this in from the start in new applications.
- Monitor and Audit Data Usage: Develop and maintain a metadata management system to document dark data's origin, context, and usage.
- Use metadata to improve data discoverability and facilitate better data management.
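A retention and deletion policy of the kind described above can be expressed as a simple rule over a record's classification and age. The sketch below is illustrative only: the classifications, the retention periods and the rule of archiving at half the retention period are hypothetical assumptions, not legal or regulatory guidance, and real values would come from the organisation's compliance team.

```python
from datetime import date

# Hypothetical retention periods in days for each data classification.
RETENTION_DAYS = {"financial": 7 * 365, "marketing": 2 * 365, "logs": 90}
# Assumption: data moves to archive storage halfway through its retention period.
ARCHIVE_FRACTION = 0.5


def retention_action(classification, created, today):
    """Return 'keep', 'archive', 'delete' or 'review' for one data set."""
    limit = RETENTION_DAYS.get(classification)
    if limit is None:
        # Unclassified data has no policy, so it is escalated to a data steward.
        return "review"
    age_days = (today - created).days
    if age_days >= limit:
        return "delete"
    if age_days >= limit * ARCHIVE_FRACTION:
        return "archive"
    return "keep"


# Example with made-up dates.
today = date(2024, 6, 1)
print(retention_action("logs", date(2024, 5, 20), today))
print(retention_action("logs", date(2024, 4, 1), today))
print(retention_action("logs", date(2024, 1, 1), today))
print(retention_action("unknown", date(2024, 1, 1), today))
```

Returning 'review' for unclassified data, rather than deleting it by default, keeps the decision with the data owner, which fits the ownership model described earlier.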
Data Sharing
Sharing data across different business functions and data domains in a large regulated organisation is fundamental to how businesses make the most out of their data. A uniform data-sharing approach helps reduce data redundancy and duplication, which can form part of dark data collection.
As part of data strategy, organisations must develop a data-sharing capability and ensure that data domains follow the same approach. Key actions to take are:
- Establish Clear Data Governance on Data Sharing: Define roles, responsibilities, and processes to ensure data quality, security, and compliance.
- Regular data audits should be conducted to identify and catalogue dark data. Governance policies should be implemented to document and manage all collected data throughout its lifecycle.
- Define Data-Sharing Policies: Create and enforce policies outlining what data can be shared, with whom, under what conditions, and how it can be used.
- Include guidelines for identifying, documenting, and sharing useful data while disposing of obsolete or redundant data. Ensure policies mandate regular reviews to minimise dark data accumulation.
- Promote Data Interoperability: Ensure data formats and systems are compatible to facilitate seamless data sharing.
- Standardise data formats to reduce the creation of unusable dark data. Ensure that data systems are compatible to facilitate efficient data sharing and minimise data silos.
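One practical way to surface the duplication mentioned at the start of this section is to compare file contents by hash. The sketch below assumes the data sits in files under a single directory, which will not hold for every platform; the function names are hypothetical, and it only finds byte-for-byte identical copies, not the same data saved in different formats.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` that have identical content.

    Returns a list of lists; each inner list holds two or more paths
    whose bytes are exactly the same.
    """
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[file_digest(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Each group returned is a candidate for consolidation into one shared copy. Deciding which copy becomes the source of record is a governance question rather than a technical one.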
Data Management
A rigorous data management system assists with validating, storing, and protecting data to ensure its accessibility and reliability. Effective data management involves identifying, classifying, and either repurposing or removing collected data that is not actively used, minimising storage costs and reducing potential risks.
- Comprehensive Data Management Plan: Create a detailed plan outlining data management goals, processes, and responsibilities. Ensure alignment with the organisation’s overall business objectives.
- Conduct regular data inventories to identify and document all data assets. Implement a plan to review and classify dark data, deciding whether to repurpose or dispose of it.
- Data Quality and Integrity: Implement data cleansing, validation, and enrichment processes to maintain high data quality. Regularly audit and monitor data for accuracy, consistency, and completeness.
- Implement data quality checks that flag unused or obsolete data. Regularly cleanse and update data to ensure that only relevant and current data is retained, reducing the accumulation of dark data.
- Scalable Data Storage Solutions: Utilise scalable and flexible data storage solutions that accommodate growing data volumes. Consider cloud-based storage for cost-effective and secure data management.
- Implement policies that regularly review storage to identify and manage dark data.
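A regular storage review like the one described above could start from a simple inventory of what is stored and how long it has sat untouched. The following sketch assumes file-based storage and uses last-modified time as a rough proxy for staleness; the function name and the 365-day threshold are hypothetical, and object stores or databases would need their own equivalent.

```python
import time
from pathlib import Path


def storage_review(root, stale_after_days=365, now=None):
    """Summarise total and stale storage under `root`.

    A file counts as stale when its last-modified time is older than
    `stale_after_days`. This is only a proxy: a file can be read often
    without ever being modified.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400  # 86400 seconds per day
    report = {"total_bytes": 0, "stale_bytes": 0, "stale_files": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()
        report["total_bytes"] += info.st_size
        if info.st_mtime < cutoff:
            report["stale_bytes"] += info.st_size
            report["stale_files"].append(str(path))
    return report
```

The share of stale_bytes in total_bytes gives a rough indication of how much storage may be holding dark data, and the stale_files list can feed the classify, repurpose or dispose decision described above.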
Embrace Clean Computing
Having a robust data strategy is now more critical than ever to tackle the challenges posed by dark data, including cybersecurity risks and a significant carbon footprint.
To address these issues effectively, departments must adopt a proactive approach to data management. This means embracing clean computing practices, conducting comprehensive data audits, and implementing rigorous data governance and sharing protocols.
Focusing on these areas can not only reduce dark data and the financial cost of storing it, but also minimise your organisation’s carbon footprint and ensure that the right data drives your decision-making processes.