Joe makes a call from a payphone. It costs him 60 cents for each minute of the call. After 10 minutes, the price drops to 50 cents per minute. How much would a 30-minute call cost him?
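For reference, the intended calculation is simple two-rate arithmetic. The short Python sketch below (mine, not part of the test) spells it out:

```python
# Worked answer to the payphone problem: the first 10 minutes are billed at
# 60 cents each, the remaining 20 minutes at 50 cents each.
first_rate, later_rate = 0.60, 0.50   # dollars per minute
cutoff, total_minutes = 10, 30
cost = cutoff * first_rate + (total_minutes - cutoff) * later_rate
print(f"${cost:.2f}")  # $16.00
```

A 30-minute call therefore costs $16.00, a figure worth keeping in mind for the variations discussed below.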
Questions like these are part of a series of arithmetic tests used in U.S. elementary schools, typically aimed at children ages 10 to 11. Mathematical reasoning is central to problem solving, and it can therefore be used to measure the capabilities of an artificial intelligence (AI).
The Grade School Math 8K suite (GSM8K) has become a popular benchmark for AI large language models (LLMs), such as those behind ChatGPT. The suite contains 8,500 problems like the one above, split into a set used to train an LLM and a held-out set used to test it. OpenAI’s latest LLM behind ChatGPT, GPT-4o, scored 92.5% on GSM8K, while Google’s Gemini 1.5 Pro scored 91.7%. A smaller LLM with fewer parameters, Microsoft’s Phi-3-small, nevertheless achieved an impressive 88.5%.
However, a recent paper by six Apple researchers found significant weaknesses in the reasoning ability of 22 different cutting-edge LLMs, including those mentioned above. A simple name change (for example, from “Joe” to “Dave” in the problem above), leaving the rest of the test question completely unchanged, can lead an LLM to give a different answer. This is clearly surprising and would not be expected from a student with genuine mathematical understanding.
The fragility of the LLMs the researchers examined was even more pronounced when the numbers in the test problems were changed, rather than just the names.
For example, changing the base rate of the telephone call in the test above from 60 cents per minute to 70 cents per minute, along with similar numerical changes in the rest of the test problems, led to greater variance in the accuracy of responses. The researchers concluded that LLMs do not perform formal reasoning and hypothesized that they instead do their best to match patterns seen in their training problems.
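These perturbations amount to treating each test question as a template whose names and numbers can be varied while the ground-truth answer is recomputed. As a rough illustration only (this is not the researchers’ actual tooling), a variant generator for the payphone problem might look like this:

```python
import random

# Turn the payphone problem into a template: swap names and rates, and
# recompute the ground-truth answer for each generated variant.
TEMPLATE = ("{name} makes a call from a payphone. It costs him {rate1} cents for each "
            "minute of the call. After {cutoff} minutes, the price drops to {rate2} cents "
            "per minute. How much would a {total}-minute call cost him?")

def make_variant():
    name = random.choice(["Joe", "Dave", "Maria", "Chen"])
    rate1 = random.choice([60, 70, 80])   # base rate, in cents
    rate2 = random.choice([40, 50])       # reduced rate, in cents
    cutoff, total = 10, 30
    question = TEMPLATE.format(name=name, rate1=rate1, rate2=rate2, cutoff=cutoff, total=total)
    answer = (cutoff * rate1 + (total - cutoff) * rate2) / 100   # ground truth, in dollars
    return question, answer

question, answer = make_variant()
print(question)
print(f"Expected answer: ${answer:.2f}")
```

A student who actually understands the arithmetic should be indifferent to which variant is drawn; the Apple results suggest current LLMs are not.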
Even more intriguing, removing clauses or adding new ones had a significant impact on LLM performance. For example, removing the clause specifying a reduction in the call price after 10 minutes in the test problem above, or adding a new clause granting a 5% reduction for calls costing more than $10, often changed whether the models answered correctly.
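For the payphone problem, both clause changes have a well-defined effect on the correct answer; a quick check (again assuming the 30-minute call from the original problem):

```python
# Ground-truth answers for the two clause variants described above.
flat_cost = 30 * 0.60                     # discount-after-10-minutes clause removed: $18.00
base_cost = 10 * 0.60 + 20 * 0.50         # original two-rate cost: $16.00
with_rebate = base_cost * 0.95 if base_cost > 10 else base_cost   # 5% off calls over $10: $15.20
print(flat_cost, base_cost, with_rebate)  # 18.0 16.0 15.2
```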
The researchers noted that as test problems were made harder by adding more clauses, LLM performance deteriorated rapidly. They posited that pattern finding and matching becomes much more difficult for LLMs as problem difficulty increases, reinforcing their suggestion that genuine mathematical reasoning is not actually taking place.
In addition to changing the problems’ values and complexity, the researchers also tried adding clauses that seem relevant but are in fact completely inconsequential. For example, the phone call problem above might gain a clause stating that phone call prices were 10% cheaper last year, even though the question still asks only about the current cost of Joe’s call. LLMs, however, often apply the discount anyway. In these scenarios, the researchers observed catastrophic performance declines in all LLMs tested, which they tentatively attributed to the models’ overreliance on patterns from their training problems.
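Numerically, the trap is easy to see. Under the assumptions of the original problem, the irrelevant clause leaves the correct answer untouched, while naively applying the 10% discount produces a plausible-looking but wrong figure:

```python
# The "last year prices were 10% cheaper" clause changes nothing about today's call.
correct = 10 * 0.60 + 20 * 0.50   # still $16.00
trapped = correct * 0.90          # $14.40 -- what a model gets if it wrongly applies the 10%
print(correct, trapped)           # 16.0 14.4
```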
The researchers concluded: “Ultimately, our work highlights important limitations in the ability of LLMs to perform true mathematical reasoning. The high variance in LLM performance on different versions of the same question, their substantial decline in performance with a slight increase in difficulty, and their sensitivity to inconsequential information indicate that their reasoning is fragile. It may look more like sophisticated pattern matching than true logical reasoning.”
The text responses of ChatGPT and other LLMs captured the attention of the public and investors because they gave the impression of genuinely understanding the world. In practice, it appears that these models have grown so large that they absorb more information from their training data than any individual human could know or remember, and then recombine it in different ways. With enough input and training data, which requires considerable investment and energy, an LLM can give an illusion of intelligence, but it remains inherently limited in high-level reasoning and lacks a coherent conceptual model of the world.
One of the most influential figures in computing today is Linus Torvalds, the creator of the widely used Linux operating system. He recently said that while he finds AI genuinely interesting, he intends to ignore it for now. He observed that the entire tech industry around AI is 90% marketing and 10% reality, and that “in five years, things will change, and at that point we’ll see what AI is used for in real everyday workloads.”
I agree with him. Current LLMs are useful in text analysis and search, and can also produce stunning images and videos, but their true commercial impact has not yet been proven.