Rytis Ulys, Analytics Team Lead at Oxylabs, discusses the key trends shaping the future of data analytics and the role of generative AI and machine learning.
What key trends do you think will shape the future of business intelligence and data analytics?
In just over a decade, data analysis has undergone several major transformations. First, it was digitized. Second, we saw the emergence of “Big Data” analytics, driven partly by digitalization and partly by massive improvements in storage and processing capabilities. Finally, in recent years, analytics has once again been transformed by emerging generative AI models, capable of analyzing data at a scale and speed never seen before. Generative AI is becoming the data analyst’s personal assistant, taking over the less exciting tasks, from basic code generation to data visualization.
I think the key effect of generative AI – and the biggest future trend in data analytics – is the democratization of data. Recently, there has been a lot of activity around “text-to-SQL” products for running natural language queries, meaning that people without a data science major have the opportunity to dig deeper into data analysis.
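To illustrate the text-to-SQL idea, here is a minimal sketch of what such a flow might look like: a plain-language question paired with the kind of SQL a model could generate, run against a tiny demo table. The table, column names, and query are hypothetical illustrations, not the output of any specific product.

```python
# Minimal text-to-SQL sketch: a non-technical user asks a question in plain
# language, and a generative model translates it into SQL. The schema and
# query below are hypothetical.
import sqlite3

question = "Which three products brought in the most revenue in May 2024?"

# The kind of SQL a text-to-SQL model might produce for the question above.
generated_sql = """
    SELECT product_name, SUM(amount) AS revenue
    FROM sales
    WHERE sale_date BETWEEN '2024-05-01' AND '2024-05-31'
    GROUP BY product_name
    ORDER BY revenue DESC
    LIMIT 3;
"""

# Run the generated query against a small in-memory demo database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_name TEXT, amount REAL, sale_date TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("residential proxies", 1200.0, "2024-05-10"),
     ("datacenter proxies", 800.0, "2024-05-12"),
     ("web scraper API", 1500.0, "2024-05-20")],
)
for row in conn.execute(generated_sql):
    print(row)
```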
However, we must not get carried away by the media hype too quickly. These AI-based tools are neither 100% accurate nor error-free, and errors are harder for less experienced users to notice. The holy grail of analytics is accuracy combined with a nuanced understanding of the business landscape – skills that are impossible to automate unless you achieve some sort of “general” AI.
The second key trend for enterprise data professionals is the move toward an umbrella-style AI system that integrates sales, employee, financial, and product analytics into a single solution. This could bring immense business value through cost savings (moving away from separate software tools) and also contribute to data democratization efforts.
Can you explain the role of Machine Learning and AI in next-generation data analytics for businesses?
Generative AI has somehow drawn an arbitrary and artificial line between next-gen analytics (powered by generative AI) and “legacy” AI systems (everything that came before it). In the public discourse around AI, people often overlook the fact that “traditional” AI is not an outdated legacy, that generative AI is only smart on the surface, and that the two areas are actually complementary.
In my previous answer, I highlighted the main challenges of using generative AI models for enterprise data analysis. Generative AI is not, strictly speaking, intelligent: it is a stochastic technology operating on the basis of statistical probabilities, and that constitutes its ultimate limit.
Increased data availability and innovative data retrieval solutions have been the main drivers of the generative AI “revolution”; however, further progress cannot be made by simply injecting more data and computing power. As we move toward “general” AI, developers will need to reconsider what “intelligence” and “reasoning” mean. Until that happens, it is unlikely that generative models will bring anything more substantial to data analysis than they already have.
That said, I do not mean there are no methods to improve the accuracy of generative AI and adapt it to domain-specific tasks. A number of applications already do this. For example, guardrails are placed between an LLM and its users, ensuring that the model delivers results consistent with the organization’s rules, while retrieval-augmented generation (RAG) is increasingly used as an alternative to fine-tuning the LLM. RAG relies on a set of technologies, such as vector databases (think Pinecone, Weaviate, Qdrant, etc.), frameworks (LlamaIndex, LangChain, Chroma), and semantic analysis and similarity search tools.
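To make the RAG idea concrete, here is a minimal sketch of the retrieval step only, with a toy bag-of-words embed() function and an in-memory list standing in for a real embedding model and a vector database such as Pinecone, Weaviate, or Qdrant. The documents and query are hypothetical.

```python
# Minimal sketch of the retrieval step in retrieval-augmented generation (RAG).
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. A real system would call an
    # embedding model and store the resulting vectors in a vector database.
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Internal documents the organization wants the LLM to ground its answers in.
documents = [
    "Q3 revenue grew 12% driven by the scraper API product line.",
    "The churn rate for residential proxy customers fell to 4% in Q3.",
    "Headcount in the analytics team increased from 8 to 11 people.",
]

query = "How did revenue change in Q3?"

# Retrieve the most relevant document for the query...
ranked = sorted(documents, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
context = ranked[0]

# ...and prepend it to the prompt, so the LLM answers from company data
# instead of relying only on what it memorized during training.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

In production, the retrieved context would be passed to the LLM (with guardrails checking the output), but the principle is the same: relevance-ranked company data is injected into the prompt instead of retraining the model.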
How can businesses effectively leverage Big Data to gain actionable insights and make strategic decisions?
In today’s globalized digital economy, businesses have little choice but to make data-driven decisions, unless they operate in a very small local market and are limited in size. To boost competitiveness, a growing number of companies are collecting not only consumer data obtained from their own channels, but also publicly available information on the web for pricing intelligence, market research, competitor analysis, cybersecurity, and other purposes.
To some extent, companies could try to get by without data-driven decisions; however, when the pace of growth accelerates, companies that rely on intuition inevitably begin to fall behind. Unfortunately, there is no one-size-fits-all approach to leveraging data effectively that would work for all businesses. Every business must start from the basics: first, define the business problem; second, answer very specifically what type of data could help solve that problem. More than 75% of the data collected by businesses ends up as “dark data,” so deciding what data you don’t need is no less important than deciding what data you do need.
How do you see the evolution of data visualization in the context of business intelligence and analytics?
Today, most data visualization solutions have AI-powered features that provide users with a more dynamic view and increased accuracy. Additionally, AI-driven automation allows businesses to analyze patterns and generate insights from larger, more complex data sets, while freeing analysts from mundane visualization tasks.
I believe data visualization solutions will need to evolve into more democratic and user-friendly alternatives, bringing data insights beyond data teams and into sales, marketing, product, and customer support departments. Unfortunately, it’s difficult to say when we might expect such tools to arrive. So far, the industry has not converged on a single best visualization solution: there are many different tools on the market, and they all have their pros and cons.
Could you discuss the importance of data privacy and security in the age of advanced analytics, and how businesses can ensure compliance while leveraging data effectively?
Data privacy and security were no less important before the era of advanced analytics. However, the increased scale and complexity of data collection and processing have also increased the risks of data mismanagement and leaks of sensitive data. Today, the importance of good data governance cannot be overstated: mistakes can result in financial penalties, legal liability, reputational damage, and consumer distrust.
In some cases, companies deliberately “cut corners” to reduce costs or gain other business benefits, resulting in poor data management. In many cases, however, poor data practices are unintentional.
Take the example of generative AI developers, who need huge amounts of multifaceted data to train and test ML models. When collecting data at such a scale, it is easy for a company to overlook that parts of these datasets contain personal data or copyrighted material that the company was not authorized to collect and process. Worse, it is virtually impossible to obtain consent from the thousands of Internet users who could technically be considered the copyright owners.
So how can businesses ensure compliance? Again, it depends on the context, such as the company’s country of origin. The data regimes in the United States, the United Kingdom, and the European Union are quite different, with the European Union being the strictest. The new EU AI Act will certainly have an additional effect on data governance, as it affects both developers and deployers of AI systems within the EU. Although generative models fall into the low-risk category, in some cases they may be subject to transparency requirements, forcing developers to disclose the data sources on which the AI systems were trained as well as their data management procedures.
There are, however, basic principles that apply to any business. First, businesses must carefully evaluate the nature of the data they plan to gather. Second, more data does not mean better data: deciding which data adds value to the business and omitting excessive or unnecessary data is the first step toward better compliance and fewer data management risks.
How can businesses foster a culture of data-driven decision-making within their organization?
The first step, of course, is to establish the data foundation: creating a Customer Data Platform (CDP) that integrates structured and cleansed data from the various sources used by the business. To be successful, such a platform must include no-code access to data for non-technical stakeholders, and this is not an easy task to achieve.
No-code access means that your chosen platform (or “solution”) must offer both an SQL interface for experienced data users and some sort of “drag and drop” functionality for beginners. At Oxylabs, we chose Apache Superset to advance our self-service analytics. However, no single solution suits every business or comes with only advantages and no disadvantages. Additionally, these solutions require well-documented data modeling.
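As an illustration of what that modeled, documented layer can look like underneath, here is a small sketch: raw source tables combined into one clearly named view that both SQL users and drag-and-drop tools can query. The tables, columns, and business rules are hypothetical, not Oxylabs’ actual setup.

```python
# Minimal sketch of a cleansed, documented data layer behind self-service analytics.
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw data as it arrives from two different source systems.
conn.execute("CREATE TABLE crm_customers (customer_id INTEGER, country TEXT)")
conn.execute("CREATE TABLE billing_invoices (customer_id INTEGER, amount_usd REAL, paid INTEGER)")

# The modeled layer: one documented view with clear names and cleansing rules
# (only paid invoices, amounts coalesced to 0), so non-technical users do not
# need to know how the raw tables fit together.
conn.execute("""
    CREATE VIEW customer_revenue AS
    SELECT c.customer_id,
           c.country,
           COALESCE(SUM(i.amount_usd), 0) AS paid_revenue_usd
    FROM crm_customers c
    LEFT JOIN billing_invoices i
           ON i.customer_id = c.customer_id AND i.paid = 1
    GROUP BY c.customer_id, c.country
""")

# A BI tool such as Apache Superset would expose customer_revenue directly,
# whether the user writes SQL or builds a chart with drag and drop.
conn.execute("INSERT INTO crm_customers VALUES (1, 'LT'), (2, 'DE')")
conn.execute("INSERT INTO billing_invoices VALUES (1, 500.0, 1), (1, 300.0, 0), (2, 900.0, 1)")
print(conn.execute("SELECT * FROM customer_revenue ORDER BY customer_id").fetchall())
```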
Once you have the necessary applications in place, the second big challenge is building data literacy and confidence among non-technical users. This requires proper training to ensure employees manipulate data, interpret it, and derive insights from it correctly. Why is this a challenge? Because it is a slow process that demands considerable time from data teams.
Fostering a data-driven culture is not a one-time project: turning data into action requires a cultural shift within the organization, as well as constant monitoring and improvement to ensure non-technical employees feel confident using data in daily decisions. Management support and well-established cooperation between teams are essential to making self-service analytics (or data democratization, as it is often called) work for your business.