More than any other factor, the hyperabundance of accessible data has fueled the current surge in AI adoption and generative AI capability. Collecting, cleaning, organizing, and securing this data for AI and machine learning has become a project in itself, a governance effort in which AI tools themselves play an important role. The result can be a huge improvement in data governance that benefits the entire business.
The database remains the fundamental repository of data, but the ecosystem of AI-driven data governance tools is sprawling, and it includes products from startups that may lack staying power or deep foundational expertise in data. Over time, an increasing number of governance capabilities will likely be integrated into database software offerings and cloud database services.
Using AI to automate data governance shows immediate results. The better a company manages its data, the better its MLOps (machine learning operations) can use this data to create AI-based applications. More broadly, adding AI to data governance positively impacts any organization’s data analytics, regulatory compliance, and data quality efforts.
Here’s how AI is modernizing governance processes and how AI-enhanced tools can help ensure the success of AI/ML applications and data management in general.
Data cataloging
Do you know where your data is located? For governance to work, organizations need a complete inventory of all major data stores and an understanding of what they contain. The task of identifying, accessing and categorizing enterprise data is becoming increasingly daunting, due to the unruly proliferation of cloud data stores, not to mention the semi-structured logs used to identify trends and operational anomalies. Data cataloging software puts all of these repositories on the map.
AI can help with every phase of cataloging an organization’s data, starting with automated discovery of every business-relevant data store. The scope of cataloging tools varies, but some use AI to organize access control policies and/or enable natural language search within an organization’s data structure. AI-powered cataloging significantly reduces the manual work associated with classifying data assets and reveals data lineages showing where the data came from and how it has changed.
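As a rough illustration of the classification step described above, the sketch below labels a column by sampling its values. It is a rule-based stand-in, not how any particular cataloging product works: the `PATTERNS` table, the `classify_column` function, and the 80% threshold are all hypothetical, and a real AI-powered tool would rely on learned models rather than three regular expressions.

```python
import re

# Hypothetical pattern table; real tools use trained classifiers, not regexes.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_phone": re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def classify_column(samples, min_share=0.8):
    """Label a column by the pattern that most of its sampled values match."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return "unknown"  # nothing to inspect
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= min_share:
            return label
    return "unclassified"  # leave for a human or a smarter model


print(classify_column(["ana@example.com", "li@example.org", ""]))
print(classify_column(["2024-01-05", "2023-12-31"]))
```

Columns that come back as "unclassified" are the ones a data steward would still need to review by hand.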
Metadata management
Effective metadata management, meaning the management of the information that describes your business data, is fundamental to successful governance. AI cataloging tools can identify metadata to properly categorize data assets, but metadata management matters beyond cataloging: it is vital for a healthy data estate. As a result, a wide range of offerings from data integration software to data observability platforms now provide metadata management capabilities.
AI-powered metadata management tools alleviate the tedium of manual data classification and help reconcile differences in metadata descriptions. In the past, businesses behaved as if metadata was relatively static, but today AI tools can continuously monitor and collect dynamic metadata about data storage, usage, and flow. Among other benefits, deep metadata around data assets can be used for AI recommendations on optimal storage platforms, or even to suggest potential data integration pipelines.
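To make "reconciling differences in metadata descriptions" concrete, here is a minimal sketch that maps inconsistent column names onto a shared glossary using fuzzy string matching from Python's standard library. The glossary in `CANONICAL_TERMS` and the `reconcile` helper are hypothetical, and plain string similarity is a much cruder technique than the semantic matching an AI-powered tool would apply.

```python
from difflib import get_close_matches

# Hypothetical business glossary of agreed-upon column names.
CANONICAL_TERMS = ["customer_id", "email_address", "postal_code", "order_date"]


def normalize(name):
    """Lowercase a name and unify separators before comparing."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def reconcile(column_names, cutoff=0.6):
    """Map each source column name to its closest glossary term, or None."""
    mapping = {}
    for name in column_names:
        matches = get_close_matches(
            normalize(name), CANONICAL_TERMS, n=1, cutoff=cutoff
        )
        mapping[name] = matches[0] if matches else None
    return mapping


print(reconcile(["Customer ID", "E-mail Address", "PostalCode", "SKU"]))
```

Names that map to `None` have no close glossary entry, which is a signal that either the glossary or the source system needs attention.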
Data quality
The biggest impact of AI on data governance is on data quality, which has six dimensions: accuracy, completeness, consistency, uniqueness, timeliness and validity. Clearly, data lacking these qualities can be disastrous for operations, and data scientists and analysts regularly find themselves cleaning data before they can use it.
AI/ML tools can automatically infer missing values, normalize data formats, flag data anomalies, and more. Humans still need to use judgment (are two customers with identical names the same or different?), but the overall time savings can be enormous. As AI tools learn patterns from large amounts of data, their recommendations, correlations, and corrections steadily improve. The patterns they learn can then serve as a baseline for monitoring data quality in real time.
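The three tasks named above (inferring missing values, normalizing formats, flagging anomalies) can be sketched with simple statistics. This is only an illustration under stated assumptions: the function names are hypothetical, the median fill and the median-absolute-deviation outlier rule are basic statistical stand-ins for what an ML-based tool would learn from data, and the list of accepted date layouts is arbitrary.

```python
from datetime import datetime
from statistics import median


def impute_missing(values):
    """Fill None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def normalize_date(text):
    """Convert a few known date layouts to ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):  # assumed layouts, US-style second
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unrecognized layout: route to human review


def flag_anomalies(values, threshold=3.5):
    """Return indexes whose MAD-based modified z-score exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


print(impute_missing([10, None, 30]))
print(normalize_date("03/07/2024"))
print(flag_anomalies([10, 11, 9, 10, 12, 10, 95]))
```

Note that none of this settles the judgment calls mentioned above, such as deciding whether two identically named customers are the same person.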
Data modeling
Structuring a database (or an entire data architecture) begins with collecting and analyzing data requirements and developing the logical and physical models to meet them. Several product offerings use AI to enable data architects and engineers to easily generate visual representations of data models.
Today, in many companies, data modeling is being disrupted to serve AI/ML applications. A number of AI data tools offer automated feature engineering, where key data characteristics are derived from datasets in preparation for AI training. Together with AutoML (automated machine learning), this activity in turn supports another type of model selection: choosing the right ML model to power an application or drive predictive analytics. If there is too little data to properly train a model, AI-powered data simulation tools can probe existing data stores and generate synthetic data that closely resembles the real thing.
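The idea behind synthetic data generation can be shown with a deliberately naive sketch: estimate simple statistics from a small real dataset, then sample new rows from them. Everything here is an assumption made for illustration, including the `fit_columns` and `synthesize` names and the sample data. The approach treats each column as an independent normal distribution, so it ignores the correlations between columns that real data simulation tools work hard to preserve.

```python
import random
from statistics import mean, stdev


def fit_columns(rows):
    """Estimate (mean, standard deviation) for every numeric column."""
    columns = list(rows[0].keys())
    return {
        c: (mean([r[c] for r in rows]), stdev([r[c] for r in rows]))
        for c in columns
    }


def synthesize(rows, n, seed=0):
    """Draw n synthetic rows, sampling each column independently."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    params = fit_columns(rows)
    return [
        {c: rng.gauss(mu, sigma) for c, (mu, sigma) in params.items()}
        for _ in range(n)
    ]


# Hypothetical sample of real customer records.
real = [
    {"age": 34, "spend": 120.0},
    {"age": 45, "spend": 310.5},
    {"age": 29, "spend": 80.0},
    {"age": 52, "spend": 410.0},
    {"age": 41, "spend": 255.0},
]
fake = synthesize(real, 1000, seed=42)
print(len(fake), round(mean([row["age"] for row in fake]), 1))
```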
Data policy and lifecycle management
Each organization must establish policies regarding the processing of its data, based on federal, state, industry and international regulations as well as internal business rules. In larger companies, a data governance committee defines these policies and specifies how they should be followed in a living document that evolves as regulations and procedures change. The natural language capabilities of generative AI can produce early drafts of this documentation and make subsequent changes much less expensive.
By analyzing data usage patterns, regulatory requirements and internal workflows, AI can help organizations define and enforce data retention policies and automatically identify data that has reached the end of its useful life. AI can even initiate the archiving or deletion process. In addition to reducing risk and ensuring compliance, automated data archiving helps free up storage space and reduce storage costs.
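Once a retention policy exists, enforcing it is largely mechanical, as the sketch below suggests. The policy table, the `Record` type, the category names, and the day counts are all hypothetical. In the scenario described above, AI would help derive the policy from usage patterns and regulations; this sketch only shows the enforcement decision that follows.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical policy: category -> (archive after days, delete after days).
RETENTION_POLICY = {
    "web_log": (30, 90),
    "support_ticket": (365, 3 * 365),
    "invoice": (2 * 365, 7 * 365),
}


@dataclass
class Record:
    record_id: str
    category: str
    last_used: date
    legal_hold: bool = False


def retention_action(record, today):
    """Decide whether a record is retained, archived, deleted, or reviewed."""
    if record.legal_hold:
        return "retain"  # a legal hold overrides any age-based rule
    policy = RETENTION_POLICY.get(record.category)
    if policy is None:
        return "review"  # no rule for this category: ask a human
    archive_after, delete_after = policy
    age_days = (today - record.last_used).days
    if age_days > delete_after:
        return "delete"
    if age_days > archive_after:
        return "archive"
    return "retain"


today = date(2024, 6, 1)
old_log = Record("r1", "web_log", date(2024, 1, 1))
print(retention_action(old_log, today))
```

Returning "review" for unknown categories keeps the automation conservative: nothing is archived or deleted without an explicit rule.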
Data availability
AI-powered disaster recovery systems can help organizations develop robust recovery strategies by predicting potential failure scenarios and establishing preventative measures to minimize downtime and data loss. AI-enabled backup systems can ensure the integrity of backups and, in the event of a disaster, automatically initiate recovery procedures to restore lost or corrupted data.
AI-enabled storage management systems can replicate and distribute data across multiple storage locations to ensure high availability and low latency. At the same time, AI-based predictive analytics can ingest data from sensors, equipment logs and maintenance histories to predict potential failures or downtime. Nothing beats predictive maintenance for preventing loss of data availability.
Humans are still needed
Much of data governance is low-hanging fruit for AI. Many governance activities, from data discovery to data cleaning to policy management, are full of repetitive manual tasks that AI can handle easily and with greater precision than humans. This is a big win, especially as MLOps seeks clean, organized data stores on which AI applications can be built and trained.
Remember, however, that AI is not intelligent in the strict sense of the term. Resolving even minor discrepancies in data can require context born from extensive experience that only humans can acquire and digest. No one would delegate the creation of an enterprise data architecture to a machine. Yes, AI is already eliminating much of the manual work involved in data governance. But it won’t do the thinking for you.
Jozef de Vries is Director of Product Engineering at EDB.
—
Generative AI Insights provides a place for technology leaders, including vendors and other external contributors, to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is broad, ranging from in-depth reviews of technology to case studies to expert opinion, but it is also subjective, based on our judgment about which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing materials for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
Copyright © 2024 IDG Communications, Inc.