It’s wise to be brief when asking artificial intelligence tools to mine massive data sets for insights, according to Cornell researcher Immanuel Trummer.
That’s why Trummer, an associate professor of computer science in Cornell’s Ann S. Bowers College of Computing and Information Science, has developed a new computing system, called Schemonic, that reduces the cost of using large language models (LLMs) like ChatGPT and Google Bard. Schemonic combs through large data sets and generates what amounts to a “CliffsNotes” version of the data’s structure that the models can understand, cutting the cost of using LLMs by up to tenfold, Trummer said.
“The monetary costs associated with using large language models are not insignificant,” said Trummer, the author of “Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language Models,” which was presented at the 50th International Conference on Very Large Data Bases (VLDB), held August 26-30 in Guangzhou, China. “I think this is a problem that everyone who uses these models faces.”
LLMs are the powerful algorithms that underpin generative AI. They have advanced to the point where they can crunch large data sets and show, through the computer code they generate, where to find patterns and insights in the data. Even those without a technical background can leverage these tools, Trummer said.
But getting LLMs to understand and process large data sets is difficult and potentially expensive, because the companies behind these models charge processing fees based on the number of individual “tokens” — words and numbers — within a data set. A large data set can contain billions of tokens or more, and the fees accrue each time users query the LLM, Trummer said.
“If you have hundreds of thousands of users all asking lots of questions about your dataset, you pay the price of repeatedly reading the data description for each request,” said Trummer, whose research explores how to make data analysis more efficient and user-friendly. “The costs can quickly add up.”
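To see why the costs add up, consider how token-based pricing scales with both prompt length and request volume. The sketch below is a hypothetical back-of-the-envelope illustration; the per-token price is an assumed placeholder, not any provider’s actual rate.

```python
# Hypothetical illustration: with token-based pricing, the cost of a
# prompt scales with its length and with how often it is resent.
PRICE_PER_MILLION_TOKENS = 10.0  # assumed placeholder price, in dollars

def prompt_cost(num_tokens: int, num_requests: int) -> float:
    """Total cost of sending a num_tokens-long data description
    once per request, at the assumed rate above."""
    return num_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS * num_requests

# A verbose 50,000-token data description sent with 100,000 queries:
verbose_total = prompt_cost(50_000, 100_000)
# The same workload with a 5,000-token compressed description:
compact_total = prompt_cost(5_000, 100_000)
print(f"verbose: ${verbose_total:,.0f}, compact: ${compact_total:,.0f}")
```

Under these assumed numbers, shrinking the description tenfold shrinks the description-related bill tenfold, consistent with the savings Trummer reports.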
The key is to provide the LLM with concise instructions, in as few tokens as possible, about what the dataset contains and how it is organized, he said.
That’s where Schemonic comes in. Its abbreviated descriptions of database structure are enough for LLMs to do their magic at a fraction of the cost, he said.
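As a rough illustration of what “abbreviated description” means, the snippet below contrasts a verbose SQL definition of a table with a compact one-line summary of the same structure. The compact format shown is invented for this sketch and is not Schemonic’s actual output format; whitespace splitting stands in for real LLM tokenization.

```python
# Illustrative only: two ways to describe the same table to an LLM.
# The compact format is a made-up sketch, not Schemonic's actual output.
verbose_schema = """
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date DATE,
    ship_date DATE,
    total_price DECIMAL
);
"""

compact_schema = "orders(order_id*, customer_id, order_date, ship_date, total_price)"

# Crude token proxy: count whitespace-separated pieces.
print(len(verbose_schema.split()), "vs", len(compact_schema.split()))
```

Both versions tell the model which table and columns exist, but the compact one spends far fewer tokens doing it.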
“Schemonic detects a data structure pattern that can be summarized concisely,” he said. “This approach compresses the structured data in an optimal way to minimize the amount you would have to pay.”
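One simple pattern of this kind: when many columns share a data type, naming the type once for the whole group is shorter than repeating it per column. The toy sketch below shows only that grouping idea under assumed, simplified inputs; it is not Schemonic’s algorithm, which searches for a cost-minimizing encoding rather than applying one fixed rule.

```python
# Toy sketch (assumed, simplified): columns that share a type are
# grouped so each type name appears once in the description.
from itertools import groupby

columns = [
    ("order_id", "int"), ("customer_id", "int"), ("item_id", "int"),
    ("order_date", "date"), ("ship_date", "date"),
]

def compress(cols):
    """Group consecutive columns by type, emitting each type name once."""
    parts = []
    for col_type, group in groupby(cols, key=lambda c: c[1]):
        names = ",".join(name for name, _ in group)
        parts.append(f"{col_type}({names})")
    return " ".join(parts)

print(compress(columns))
# → int(order_id,customer_id,item_id) date(order_date,ship_date)
```

Five column-type pairs collapse into two groups, and the saving grows with the number of columns sharing each type.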
There is often a quality tradeoff when compressing information, but the descriptions generated by Schemonic are guaranteed to be semantically correct, Trummer said. Additionally, state-of-the-art LLMs like OpenAI’s GPT-4 model can understand Schemonic’s abbreviated descriptions without any negative impact on the quality of their output, he said.
“LLMs are used in many data analysis cases, from translating questions about data into formal queries, to extracting tabular data from text, to finding semantic relationships between different data sets,” Trummer said. “All of these cases require you to describe the structure of the data to the LLM, which is why Schemonic helps you save money in all of these use cases.”
Louis DiPietro is a writer at the Cornell Ann S. Bowers College of Computing and Information Science.