In just over a decade, data analytics has undergone several major transformations. First came digitalization. Then came the emergence of “big data” analytics, driven in part by digitalization and in part by vastly improved storage and processing capabilities.
Finally, in recent years, analytics has been transformed once again by the emergence of generative AI models capable of analyzing data at unprecedented scale and speed.
We spoke to Rytis Ulys, Head of the Analytics Team at Oxylabs, to learn more about these changes and what we can expect in the future.
BN: What do you think are the key trends that will shape the future of data analytics in business intelligence?
RU: Generative AI is becoming the data analyst’s personal assistant, taking over the less exciting tasks, from basic code generation to data visualization.
I think the key effect of generative AI – and the biggest future trend in data analytics – is the democratization of data. Recently, there has been a lot of activity around “text-to-SQL” products to run natural language queries, which means that people without a data science background have the opportunity to dive deeper into data analysis.
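To make the idea concrete, here is a minimal, hypothetical sketch of such a text-to-SQL flow in Python: a natural-language question is turned into a SQL statement by a language model and run against a local SQLite database. The schema, the prompt wording, and the call_llm placeholder are illustrative assumptions, not a description of any particular product.

```python
import sqlite3

# Hypothetical schema the model is told about; adapt to your own tables.
SCHEMA = "CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, created_at TEXT)"

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM call; assumed to return a single SQL statement."""
    raise NotImplementedError("plug in your model provider here")

def ask(question: str, conn: sqlite3.Connection) -> list:
    """Translate a natural-language question into SQL and run it read-only."""
    prompt = (
        f"Schema:\n{SCHEMA}\n\n"
        f"Write one SQLite SELECT statement that answers: {question}\n"
        "Return only the SQL, with no explanation."
    )
    sql = call_llm(prompt)
    # Guardrail: refuse anything that is not a read-only SELECT statement.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError(f"refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()
```

Even in a toy setup like this, the read-only guardrail and a human glance at the generated SQL matter, which leads directly to the caveat below.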
But don’t get carried away by the hype. AI-powered tools aren’t 100% accurate or error-free, and their mistakes are harder for less experienced users to spot. The holy grail of analytics is accuracy combined with a nuanced understanding of the business landscape, skills that can’t be automated short of some kind of “general” AI.
The second trend, which is critical for business data professionals, is the move toward a single AI system that integrates sales, employee, financial, and product analytics into one solution. This could bring immense business value through cost savings (moving away from separate software) and also contribute to data democratization efforts.
BN: Can you tell us more about the role of machine learning and AI in next-generation data analytics for businesses?
RU: The rise of generative AI has drawn an artificial and arbitrary line between next-generation analytics (powered by generative AI) and “traditional” AI systems (everything that came before it). In the public discourse around AI, people often forget that “traditional” AI is not outdated legacy technology, that generative AI is only intelligent in appearance, and that the two fields are in fact complementary.
In my previous answer, I highlighted the main challenges of using generative AI models for business data analysis. Generative AI is not, strictly speaking, intelligence: it is a stochastic technology operating on statistical probability, which is its ultimate limitation.
Increased data availability and innovative data mining solutions have been key drivers of the gen AI “revolution.” However, further progress won’t be achieved by simply injecting more data and computing power. To move toward “general” AI, developers will need to reconsider what “intelligence” and “reasoning” mean. Until that happens, generative models are unlikely to bring anything more substantial to data analysis than they already have.
That said, I don’t mean to say that there aren’t ways to improve the accuracy of generative AI and make it perform better on domain-specific tasks. A number of applications already do this. For example, guardrails can be placed between an LLM and its users to ensure the model delivers results that follow the organization’s rules, while retrieval-augmented generation (RAG) is increasingly used as an alternative to fine-tuning the LLM. RAG builds on a set of technologies such as vector databases (think Pinecone, Weaviate, Qdrant, etc.), frameworks (LlamaIndex, LangChain, Chroma), and semantic analysis and similarity search tools.
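As a rough illustration of the retrieval step RAG is built on, the sketch below ranks a handful of internal documents by semantic similarity to a question and injects the best matches into the prompt. The embed placeholder and the in-memory store are assumptions made for illustration; a production setup would use one of the vector databases and frameworks mentioned above.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for an embedding model; replace with a real provider."""
    raise NotImplementedError("plug in an embedding model here")

class InMemoryStore:
    """Toy vector store; a real deployment would use Pinecone, Weaviate, Qdrant, etc."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def top_k(self, query: str, k: int = 3) -> list[str]:
        # Rank stored documents by cosine similarity to the query embedding.
        q = embed(query)
        sims = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
                for v in self.vectors]
        order = np.argsort(sims)[::-1][:k]
        return [self.texts[i] for i in order]

def build_prompt(question: str, store: InMemoryStore) -> str:
    # Retrieved passages ground the answer without retraining or fine-tuning the model.
    context = "\n".join(store.top_k(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The point is that the organization’s own, current data is retrieved at query time and placed in the prompt, so the model stays grounded without being retrained.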
BN: How can businesses effectively leverage Big Data to gain actionable insights and make strategic decisions?
RU: In today’s globalized digital economy, companies have no choice but to make data-driven decisions unless they operate in a very small local market and are of limited size. To increase their competitiveness, a growing number of companies are collecting not only consumer data that they can obtain from their own channels, but also publicly available information on the web for price monitoring, market research, competitive analysis, cybersecurity and other purposes.
Up to a point, businesses can try to get by without using data to make decisions. However, as the pace of growth accelerates, businesses that rely solely on intuition inevitably start to fall behind. Unfortunately, there is no one-size-fits-all approach to leveraging data effectively that will work for every business. Every business must start with the basics: first, define the business problem; then specify, very concretely, what type of data could help solve it. More than 75% of the data that businesses collect ends up “in storage” as so-called dark data, so deciding what data you don’t need is no less important than deciding what data you do need.
BN: How do you see data visualization evolving in the context of business intelligence and analytics?
RU: Most of today’s data visualization solutions have AI-powered features that give users a more dynamic view and greater accuracy. AI-powered automation also allows businesses to analyze patterns and generate insights from larger, more complex data sets, while freeing analysts from tedious visualization tasks.
I think data visualization solutions will need to evolve into more democratic and beginner-friendly alternatives, bringing data insights beyond data teams and into sales, marketing, product, and customer support departments. Unfortunately, it’s hard to say when we can expect these tools to arrive. The industry’s focus so far hasn’t been on finding the best visualization solution. There are many different tools on the market, and they all have their pros and cons.
BN: Could you talk about the importance of data privacy and security in the era of advanced analytics, and how companies can ensure compliance while leveraging data effectively?
RU: Data privacy and security were no less important before the era of advanced analytics. However, the growing scale and complexity of data collection and processing activities have also increased the risks associated with data mismanagement and sensitive data leaks. Today, the importance of good data governance cannot be overstated: mistakes can lead to financial penalties, legal liability, reputational damage and consumer distrust.
In some cases, companies deliberately take shortcuts to reduce costs or gain other business advantages, resulting in poor data management. In many cases, however, data mismanagement is unintentional.
Consider gen AI developers who need massive amounts of multidimensional data to train and test machine learning models. When collecting data at such a scale, it’s easy for a company to overlook the fact that parts of those datasets contain personal data or copyrighted material that the company wasn’t authorized to collect and process. Worse yet, getting consent from the thousands of internet users who could technically be considered copyright owners is nearly impossible.
So how can companies ensure compliance? Again, this depends on the context, such as the company’s country of origin. The data regimes in the US, UK and EU are very different, with the EU having the strictest one. The new EU AI Act will certainly have an additional effect on data governance, as it targets both developers and deployers of AI systems within the EU. While generative models fall into the low-risk zone, in some cases they may still be subject to transparency obligations, requiring developers to disclose the data sources on which the AI systems were trained as well as their data management procedures.
There are, however, some basic principles that apply to all businesses. First, businesses should carefully assess the nature of the data they plan to extract. Second, more data does not necessarily mean better data. Determining which data adds value to the business and omitting excess or unnecessary data is the first step toward improving compliance and reducing data management risks.
BN: How can companies foster a culture of data-driven decision-making across their organizations?
RU: The first step is, of course, to establish the data foundation, namely the creation of a customer data platform (CDP) that integrates structured and cleansed data from the various sources the company uses. To be successful, such a platform must include no-code access to the data for non-technical stakeholders, and this is not an easy task to achieve.
No-code access means that the chosen platform (or “solution”) must contain both a SQL interface for experienced data users and some sort of “drag and drop” functionality for beginners. At Oxylabs, we chose Apache Superset to advance our self-service analytics. However, there is no one-size-fits-all solution with only advantages and no drawbacks. In addition, these solutions require well-documented data modeling.
Once the necessary applications are in place, the second big challenge is building data literacy and trust among non-technical users. Proper training is needed to ensure employees process, interpret, and learn from data correctly. Why is this a challenge? Because it’s a slow process that demands a lot of time from data teams.
Fostering a data-driven culture isn’t a one-time project: Turning data into action will require a culture shift across the organization, as well as ongoing monitoring and improvement efforts to ensure non-technical employees feel confident deploying data in everyday decisions. Leadership support and strong collaboration across teams are essential to making self-service analytics (or data democratization, as it’s often called) work for your business.
Image credit: Sergey Nivens/depositphotos.com