The end of “garbage in” and “garbage out”: how to profit from AI projects

As companies invest heavily in generative artificial intelligence (GenAI), many are disappointed by unmet expectations, and few successfully make the transition from prototype to production. According to a Gartner, Inc. survey from October 2023, 45% of organizations are currently testing GenAI, while only 10% have fully deployed it.

Growing disillusionment and delays often stem from a fundamental problem: poor data quality. The success of large language model (LLM) projects depends on accurate and reliable data. Yet many organizations attempt to build AI solutions on top of messy data warehouses without the necessary data engineering, including implementing a universal semantic layer. The result? Garbage in, garbage out.

Why AI needs a semantic layer

Many people in the data world have heard of semantic layers, but few AI professionals are familiar with them. A universal semantic layer is an abstraction layer that sits between data sources and data consumers. It provides a consistent, standardized, and trusted view of data. By acting as a single, unified source of data, the semantic layer supports analysis by humans and AI alike.

Just like humans, LLMs need context and consistency to provide accurate results. Proper data cleaning, curation, and modeling are essential to improving AI accuracy. A universal semantic layer establishes metrics and metadata to ensure LLM consistency and accuracy. Because it exposes a query interface, it can also restrict the model to governed data when answering a question, rather than letting it answer from everything it absorbed during training.
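To make the idea of a governed query interface concrete, here is a minimal sketch in Python. It is not any particular vendor's API: the metric names, the `orders` table, and the `compile_query` helper are all hypothetical, chosen only to show how a metric defined once can be compiled to SQL while anything undefined is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    sql: str          # aggregate expression, defined once
    description: str  # business meaning, useful as LLM context


# Hypothetical definitions for an e-commerce data model.
TABLE = "orders"
METRICS = {
    "average_order_value": Metric(
        name="average_order_value",
        sql="SUM(order_total) / COUNT(DISTINCT order_id)",
        description="Total revenue divided by the number of orders.",
    ),
    "order_count": Metric(
        name="order_count",
        sql="COUNT(DISTINCT order_id)",
        description="Number of distinct orders placed.",
    ),
}
DIMENSIONS = {"country", "order_month"}


def compile_query(metric: str, group_by: list[str]) -> str:
    """Compile a metric request to SQL, rejecting anything not defined above."""
    if metric not in METRICS:
        raise KeyError(f"Unknown metric: {metric}")
    unknown = [d for d in group_by if d not in DIMENSIONS]
    if unknown:
        raise KeyError(f"Unknown dimensions: {unknown}")
    select_cols = list(group_by) + [f"{METRICS[metric].sql} AS {metric}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {TABLE}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql


print(compile_query("average_order_value", ["country"]))
```

A human analyst, a dashboard, or an LLM would all go through the same `compile_query` call in this sketch, so each of them gets the same definition of the metric.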

Bringing order to the data

Implementing a universal semantic layer takes some work, but it’s worth it. Developers and data engineers can define a data model once, and it can be used anywhere, including with an AI application. The first step is to determine the problems to be solved, gather the appropriate information, and then code the connections between the two (business logic). Then, the metadata is used to develop an abstraction (semantic) layer based on the business logic.

A universal semantic layer helps overcome the garbage in/garbage out phenomenon, which has become common now that many enterprises have adopted large-scale cloud data platforms like BigQuery, Databricks, and Snowflake. While these platforms are very useful for storing logs, events, telemetry, customer behavior, and similar data, they also add another layer of complexity: an ever-growing web of permission definitions, caches, and metrics. Is “average_cart_size” or “average_order_value” the right column for an e-commerce dashboard, for example?

A universal semantic layer removes semantic complexity, helping individuals and LLMs navigate the inconsistent metrics, overlapping schemas, and conflicting permissions that arise in modern data architectures.

Other benefits for AI projects

Integrating GenAI with consistent enterprise data improves reliability, transparency, and security, as well as data quality and scalability. By sitting between data platforms and consumers, a universal semantic layer strengthens security through authentication and role-based access control.
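As an illustration of role-based access control at this layer, the sketch below checks a caller's role before any query reaches the data platform. The role names, the permission table, and the `run_metric_query` helper are assumptions made for the example, and authentication (establishing who the caller is) is taken as already done.

```python
from typing import Callable

# Hypothetical mapping from role to the metrics that role may query.
ROLE_PERMISSIONS = {
    "analyst": {"order_count", "average_order_value"},
    "support_bot": {"order_count"},
}


def authorize(role: str, metric: str) -> None:
    """Raise PermissionError unless the role is allowed to query the metric."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    if metric not in allowed:
        raise PermissionError(f"Role {role!r} may not query {metric!r}")


def run_metric_query(role: str, metric: str, execute: Callable[[str], str]) -> str:
    """Check permissions first, then hand the request to the data platform.

    `execute` stands in for whatever actually runs the query.
    """
    authorize(role, metric)
    return execute(metric)


def fake_execute(metric: str) -> str:
    # Placeholder for a real warehouse call.
    return f"result for {metric}"


print(run_metric_query("analyst", "average_order_value", fake_execute))
```

Because the check sits in one place, an AI agent assigned the `support_bot` role in this sketch is held to the same rules as a human user with that role.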

A universal semantic layer also helps AI applications in several other ways. GenAI gains deeper insights into an organization’s unique context using private, semantically labeled data. This integration ensures that the AI system is accessing data that is updated in real time, improving the overall quality of the answers generated. And, as production AI models generate new data (predictions, answers, features), they need to be exposed to users. A universal semantic layer can automatically publish model-generated insights based on existing analytics and results.

As data analytics and AI projects grow, reliance on a single platform for AI or data analytics becomes less practical. A semantic layer connects various data tools and platforms by decoupling data sources from consumption, making analytics and AI accessible to more users.

The semantic layer can also enable explainable AI by organizing and disseminating information about why an AI model provides a particular answer. Providing greater insight into the reasoning behind an AI model’s suggestions builds trust in a model’s results.

Put an end to “garbage in” and “garbage out”

Although revolutionary, LLMs have limitations, particularly in producing accurate results, because of the “garbage in, garbage out” problem. In short, LLMs hallucinate. Simply providing them with database schemas is not enough to generate correct SQL. The prerequisite for successful AI projects is organizing the data into meaningful business definitions and a query interface so that LLMs can understand the data contextually.
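One way to picture the difference between handing an LLM a raw schema and handing it business definitions is sketched below. The catalog contents, the JSON request shape, and both helper functions are hypothetical. The sketch assumes the model is asked to return a small structured request instead of free-form SQL, so that its answer can be validated before anything runs.

```python
import json

# Hypothetical catalog of business definitions exposed to the model.
CATALOG = {
    "metrics": {
        "average_order_value": "Total revenue divided by the number of orders.",
        "order_count": "Number of distinct orders placed.",
    },
    "dimensions": {
        "country": "Shipping country of the order.",
        "order_month": "Calendar month in which the order was placed.",
    },
}


def build_context(catalog: dict) -> str:
    """Render the catalog as plain text to include in the model's prompt."""
    lines = ["You may only use these metrics and dimensions."]
    for kind in ("metrics", "dimensions"):
        lines.append(f"{kind.capitalize()}:")
        for name, description in catalog[kind].items():
            lines.append(f"- {name}: {description}")
    return "\n".join(lines)


def parse_llm_request(raw: str, catalog: dict) -> dict:
    """Validate the model's JSON reply against the catalog before querying."""
    request = json.loads(raw)
    metric = request.get("metric")
    group_by = request.get("group_by", [])
    if metric not in catalog["metrics"]:
        raise ValueError(f"Model asked for an undefined metric: {metric!r}")
    bad = [d for d in group_by if d not in catalog["dimensions"]]
    if bad:
        raise ValueError(f"Model asked for undefined dimensions: {bad}")
    return {"metric": metric, "group_by": group_by}


print(build_context(CATALOG))
print(parse_llm_request('{"metric": "order_count", "group_by": ["country"]}', CATALOG))
```

If the model invents a metric that does not exist, the request fails validation and no incorrect number reaches the user.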

Once data engineering is complete, businesses can start taking full advantage of AI and pave the way for innovative AI applications. To address the complexity of modern enterprise data management and AI, the universal semantic layer has emerged as the foundation that promises to improve efficiency and enable more informed decision-making by humans or AI.

So we need to start with data engineering and a universal semantic layer. Only then can companies optimize their AI investments and generate significant value.

The opinions and views expressed in this article are those of the author and do not necessarily reflect those of CDOTrends.