Dr Clare Walsh, Director of Education at the Institute of Analytics, discusses the state of data science, with comments on artificial intelligence (AI).
As 2024 draws to a close, business intelligence is recognized across all sectors, with 98% of employers now looking for digital and data skills among graduates (1), regardless of their course of study. As data science becomes mainstream, it is more important than ever to align data roles with overall business strategy, especially as the field evolves.
One of the biggest obstacles is the gap between advances in data science “syntax,” such as the growing range of algorithms, and “semantics,” or the interpretation and meaning of data. Even though algorithms have evolved rapidly, refining data to improve its accuracy and meaning remains complex.
Much web data is blocked or spoofed, whether because of weaponized fake-data campaigns or simply because there are more bots on the web. The lack of standardized practices, such as agreed-upon taxonomies for assessing the quality of data retrieved from the web, hinders clarity, and we have made little progress toward a shared language for data quality that would build trust and consistency across the field.
AI and Data Practices
Explainable AI (XAI) is another area where progress has been disappointing despite significant investment. Ironically, the rise of generative AI may have set XAI efforts back. Generative models lack standardized benchmarks, especially for non-textual applications, and validation methods such as red-teaming are still underdeveloped.
Some recent advances, such as those from DeepMind, have raised optimism for model validation, but there is still much to be done. We have seen the greatest adoption of data solutions in industries like healthcare, where data scientists can rely on the industry's own established guidelines (such as clinical trials) and its formalized testing and approvals, in the absence of clear guidelines in our own field.
Legislation governing AI and data practices is emerging in the European Union and Colorado, in addition to the GDPR provisions under Article 22, promising a more proactive approach than the one taken toward the unregulated growth of social media in the early 2000s. However, it is unclear how these laws will be interpreted in court. Without clear guidelines on acceptable practices, some data activities could easily be misrepresented.
For example, although “removing bias” from data sets is often presented as an achievable goal, certain segments of data will inevitably appear at a disadvantage due to natural statistical variance. Without nuanced guidelines, the general public could misinterpret these results, potentially seeing discrimination where none exists. The data science community must define what constitutes acceptable or unfair bias to address these concerns and build trust.
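The point about natural statistical variance can be illustrated with a small simulation. This is a minimal sketch rather than a method from the article: the function name `simulate_group_rates`, the group count, the group size, and the 50% underlying rate are all assumptions chosen for illustration. Every group is drawn from exactly the same process, yet the observed rates still differ.

```python
import random


def simulate_group_rates(n_groups, group_size, true_rate, seed):
    """Return the observed positive-outcome rate for each group.

    Hypothetical helper: every group shares the SAME underlying rate,
    so any difference between groups is due to sampling variance only.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    rates = []
    for _ in range(n_groups):
        positives = sum(rng.random() < true_rate for _ in range(group_size))
        rates.append(positives / group_size)
    return rates


# Illustrative numbers only: 20 segments of 50 people, identical 50% rate.
rates = simulate_group_rates(n_groups=20, group_size=50, true_rate=0.5, seed=1)
gap = max(rates) - min(rates)

print(f"lowest observed rate:  {min(rates):.2f}")
print(f"highest observed rate: {max(rates):.2f}")
print(f"gap between best and worst segment: {gap:.2f}")
```

A gap between the best and worst segment appears even though no segment was treated differently, which is why a definition of unfair bias arguably needs to account for the size of each segment and the variation expected by chance.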
Generative AI (Gen AI), introduced in 2017, has had limited influence on most practicing data scientists. Foundation models, designed to predict the next word or token in a sequence, have not yet proven themselves as reliable revenue generators. While search engines, successfully paired with online advertising and retail, have flourished, generative AI has yet to find a similar basis for generating revenue. Despite nearly $1 trillion invested in generative AI to date, the technology has only produced about $1 billion in returns. This gap highlights the need for more practical and monetizable applications of generative AI.
Retrieval-augmented generation (RAG) models could change that, and data scientists will likely be called upon to help companies integrate these tools. Although interfaces like ChatGPT simplify the use of generative AI, they do not guarantee efficient or secure application. As data science tools become more accessible, the need for skilled professionals becomes even more critical.
While platforms like Alteryx enable simple point-and-click machine learning, misuse by inexperienced users could lead to significant errors. Dimensionality reduction, imputation, and assumptions about data distribution are key areas where small missteps can have outsized impacts on results. For example, assuming that data is normally distributed can create serious distortions, especially since the “normal” distribution is relatively rare in real-world data sets. The development of easy-to-use tools has given rise to “amateur” data scientists and self-described AI consultants, highlighting the need for experienced professionals to advocate even more strongly for safe and accurate practices.
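To make the normality point concrete, here is a minimal sketch using only the Python standard library. It assumes a synthetic right-skewed (lognormal) sample standing in for real-world data such as incomes or transaction values. The helper name `summarize` and the sample parameters are illustrative and do not come from any particular tool.

```python
import random
import statistics


def summarize(values):
    """Compare a normal-theory 95% range with the empirical one.

    Hypothetical helper for illustration only.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    ordered = sorted(values)
    n = len(ordered)
    return {
        "mean": mean,
        "median": statistics.median(ordered),
        # Range implied by assuming the data is normally distributed
        "normal_lower": mean - 1.96 * sd,
        "normal_upper": mean + 1.96 * sd,
        # Approximate 2.5th and 97.5th percentiles of the actual sample
        "empirical_lower": ordered[int(0.025 * n)],
        "empirical_upper": ordered[int(0.975 * n)],
    }


# Synthetic, strictly positive, right-skewed data (an assumption of this sketch).
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
stats = summarize(sample)

for name, value in stats.items():
    print(f"{name:>16}: {value:.2f}")
```

Under the normality assumption the lower bound falls below zero, a value this strictly positive data can never take, and the mean sits well above the median. A point-and-click workflow that silently assumes normality would carry both distortions into downstream steps such as outlier removal or mean imputation.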
Has there been an increase in data science programs?
The educational landscape is responding well to growing demand, with a rise in data science programs. These programs cover fundamental skills, but many graduates now find themselves in positions that require specialization and may lack adequate preparation for such tasks. HR departments often struggle to evaluate data science resumes, making the journey even more difficult for early-career data scientists, especially those who find themselves the only data experts within their organization. Improved onboarding and mentoring would help these professionals thrive and contribute effectively in their roles.
For research-focused graduates, the road ahead can be just as difficult. The private sector lags behind academic institutions in research output in the UK, which has led to a migration of our best talent to countries with better research and career opportunities. Addressing this imbalance could foster a stronger data science ecosystem in the UK and ensure that top talent finds ways to contribute within the country. (2)
Will data science become more deeply integrated into work processes?
In the coming year, we expect data science to be even more deeply integrated into daily work processes. With a growing number of professionals, the field is well-positioned to address ongoing challenges, from improving data quality standards to establishing ethical guidelines for AI practices. Building stronger connections between data professionals, legal executives, and organizational leaders will be key to shaping a future where data science is not only valuable, but also responsibly managed in the workplace.
1. https://26055784.fs1.hubspotusercontent-eu1.net/hubfs/26055784/Third%20party%20events/Digital-GME-The%20Skills%20Gap.pdf?utm_campaign=IHEF&utm_medium=email&_hsenc=p2ANqtz-80lySVgjKuvXChYKmEpkRKY3DaTiuzSi4T9QdEhQwHqsmgjyGY4sUK72PGvs2kg9YAL7umDyRj6KYayQYPw0RLBktaK2qsgDPGSh06afI1CixlJac&_hsmi=85912103&utm_content=85912103&utm_source=hs_automation
2. https://www.ukonward.com/wp-content/uploads/2022/08/Rocket-Science.pdf