2023 will forever be remembered as the year of generative AI. In this digital age, you’d be hard-pressed to find anyone with access to the Internet who hasn’t heard of ChatGPT. If you’ve been around long enough to witness a technology cycle or two, you know we’re entering a transformative one. According to a McKinsey survey, AI adoption has doubled since 2017. While recent developments will undoubtedly accelerate adoption further, I tend to believe the results we see in 2024 will remain relatively modest, as most companies are still trying to figure out how to align their data strategy with business objectives while dealing with increasing regulatory scrutiny. Within the data industry, AI adoption will drive broader data adoption by making data and data infrastructure accessible to more users across the organization, which will in turn fuel more AI projects. The secure democratization of data will be a major theme: we will see more practical implementations of data mesh and more investment in security, privacy, and observability.
The purpose of this article is not to make bold claims about how AI will change the data industry as we know it, but rather to highlight some areas where we are likely to see continued investment from businesses and growing enthusiasm around data and AI: a self-fulfilling prophecy of sorts.
AI will be put to WORK and disrupt the modern data stack as we know it.
Of course, we start here. There is no denying that LLMs have completely changed the way we think about and implement technology, and the field of data and analytics is no exception. When it comes to the modern data stack, here are a few areas where LLMs will be game changers:
Data analytics: Introducing AI into analytics workflows will increase automation, efficiency, and accessibility.
- Automation: AI can automate tedious tasks such as data collection, preparation, and cleaning, reducing the risk of manual errors.
- Efficiency: More sophisticated predictive models will allow businesses to anticipate future trends and improve the accuracy of their forecasts. AI algorithms can identify and study customer behavior, enabling highly personalized product recommendations and more targeted marketing campaigns.
- Accessibility: AI will make analytics itself more accessible. Natural language processing (NLP) can open AI-driven data analysis to less technical users by letting them interact with data conversationally.
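To make the automation bullet concrete, here is a minimal sketch of the kind of tedious work it describes: deduplication, type coercion, and null handling. The record layout and cleaning rules are hypothetical examples for illustration, not a prescribed standard; AI-assisted tools aim to generate and maintain logic like this so analysts don’t have to.

```python
def clean_records(records):
    """Deduplicate rows, coerce types, and drop incomplete entries."""
    seen, cleaned = set(), []
    for row in records:
        key = (row.get("id"), row.get("email"))
        if key in seen:
            continue  # drop exact duplicates on (id, email)
        seen.add(key)
        try:
            amount = float(row.get("amount", ""))  # coerce amount to float
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if not row.get("email"):
            continue  # drop rows missing a required field
        cleaned.append({"id": row["id"],
                        "email": row["email"].strip().lower(),
                        "amount": amount})
    return cleaned

raw = [
    {"id": 1, "email": "A@x.com ", "amount": "10.5"},
    {"id": 1, "email": "A@x.com ", "amount": "10.5"},   # duplicate
    {"id": 2, "email": "", "amount": "3"},              # missing email
    {"id": 3, "email": "b@x.com", "amount": "oops"},    # bad amount
]
print(clean_records(raw))  # only the first row survives, normalized
```

The point is less the rules themselves than that they are repetitive and error-prone when written by hand, which is exactly where automation pays off.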
Vector databases on the rise: LLMs require infrastructure that enables rapid querying and processing of vast volumes of data, both structured and unstructured (schema-free). This is where the mathematical concept of vectors, and vector search databases, comes into play. Instead of rows and columns (as in traditional relational databases), data is represented as points in a multidimensional space, as in a mathematical vector representation. In the context of a generative AI application, vector databases enable rapid processing and querying of vectorized data. More here and here.
“Think of a vector database as a large warehouse and artificial intelligence as a trained warehouse manager. In this warehouse, each item (data) is stored in a box (vector), carefully organized on shelves in multidimensional space,” as Mark Hinkle explains in The New Stack.
The “ML pipeline”
In traditional data engineering, a data pipeline is the process by which data is moved from source to destination, typically to make it accessible to the business through BI for reporting and analysis. The ML pipeline is similar in that it is also a data movement process; however, its main purpose is to enable the development and deployment of machine learning models, and in this sense, unlike the data pipeline, the ML pipeline is not a “straight line” — more on the differences between data and ML pipelines here and here.
Successful ML, AI, and data science projects will require robust infrastructure for building, testing, training, and optimizing models and for maintaining their accuracy over time. It starts with well-structured ML pipelines.
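The non-linearity mentioned above can be illustrated with a minimal pipeline of named stages, where evaluation can send you back to earlier stages rather than straight to a dashboard. The stage names and the toy model (a mean predictor) are illustrative, not a real framework.

```python
def ingest():
    """Pull raw (feature, label) pairs; hard-coded here for illustration."""
    return [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

def validate(rows):
    """Drop rows with missing labels before training."""
    return [(x, y) for x, y in rows if y is not None]

def train(rows):
    """Toy "model": always predict the mean label."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y

def evaluate(model, rows):
    """Mean absolute error; a bad score sends us back to earlier stages."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)

rows = validate(ingest())
model = train(rows)
mae = evaluate(model, rows)
print(round(mae, 2))  # → 1.42
```

In a real ML pipeline each stage would be orchestrated, versioned, and monitored, and a poor evaluation result loops back to feature engineering or retraining — the loop that makes it different from a one-way ELT flow.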
Privacy, please.
It is undeniable that the use of data, and therefore the need for businesses to democratize both data and platform, will continue to grow massively in 2024. That said, as data and AI come under increasing regulation, scrutiny of personal data handling and protection policies will intensify. An excellent summary of what to expect in AI regulation over the next 12 months is here.
BYODM: bring your own data mesh
Since its introduction by its creator Zhamak Dehghani in 2019, the data mesh has been the subject of much debate and a fair share of skepticism. Four years later, several implementations and variations have emerged as companies have adopted the principles of the concept and applied them to their architecture. Decentralization, domain-driven design, self-serve data infrastructure as a platform, data as a product, and federated governance are all key principles that organizations should embrace to create and foster a democratized, silo-free data environment. However, moving from a traditional monolithic structure to a comprehensive data mesh is not easy and requires significant cultural and organizational changes. That’s why incremental adoption, which lets you introduce the concept gradually and prove its value while aligning existing and future technology and business considerations, is the approach we’ve seen work best over the past couple of years.
Ultimately, it is essential to remember that Data Mesh is an architectural and organizational change, not a technological solution. I think the BYODM approach will prevail in 2024.
Data and AI Observability
I’m biased here. That said, it’s hard to argue against data and AI observability in a world where every organization is weighing the potential of LLMs.
“There is no AI strategy without a data strategy. The intelligence we all seek comes from data,” as Frank Slootman puts it.
Over the past few years, data observability has become a key part of any modern organization’s data strategy. If you are new to the concept, I recommend starting here or here. There is no denying that AI will also reshape the data observability space. The adoption of AI agents and the use of NLP will increase the level of automation and inclusiveness of platform solutions, which in turn will propel adoption. The concept of data observability as we know it will evolve to capture the observability potential of AI and cover more AI use cases.
Most solutions available on the market already cover some aspects of what will become data and AI observability. If you think of data science as a data consumption use case, monitoring the data that goes into training models is already covered by most frameworks. Data and AI observability will evolve to include insights into ML model behavior, outcomes, and performance. Similar to how data pipelines are covered today, data observability platforms will include actionable insights into ML pipelines to enable effective anomaly detection, root cause analysis, and incident management, bringing reliability and efficiency to the deployment of ML products.
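Two of the most common checks such platforms automate can be sketched in a few lines: a null-rate check on a column and a z-score anomaly test on daily row counts. The thresholds and data below are illustrative assumptions, not values any particular platform uses.

```python
import statistics

def null_rate(values):
    """Fraction of missing values in a column."""
    return sum(v is None for v in values) / len(values)

def volume_anomaly(daily_counts, z_threshold=3.0):
    """Flag the latest day if its row count deviates strongly from history."""
    *history, latest = daily_counts
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid division by zero
    return abs(latest - mean) / stdev > z_threshold

print(null_rate([1, None, 3, None]))              # → 0.5
print(volume_anomaly([1000, 1020, 990, 1010, 100]))  # → True (sudden drop)
```

A production platform layers alerting, lineage, and root-cause context on top of checks like these; the sketch only shows the detection primitive itself.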
Conclusion
2024 is a leap year, which means we have 366 opportunities to do more and innovate with data. Although 2023 will forever be remembered as the year of generative AI, it is in 2024 that we will begin to see organizations working toward data and AI maturity. But to do AI well, a well-thought-out data strategy is essential. The modern data stack is an ever-evolving space, and in 2024 we will see more innovations brought about and catalyzed by the growing adoption of AI. As businesses experiment more with AI in 2024, governance and observability will take center stage to ensure smooth and efficient deployments.