A team of technology experts issued a global call on Monday for the toughest questions to pose to artificial intelligence systems, which increasingly make popular benchmark tests look like child's play.
Dubbed "Humanity's Last Exam," the project aims to determine when expert-level AI has arrived and to stay relevant even as capabilities advance in the years ahead, according to its organizers, the nonprofit Center for AI Safety (CAIS) and the startup Scale AI.
The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which "destroyed the most popular reasoning benchmarks," said Dan Hendrycks, executive director of CAIS and an adviser to Elon Musk's startup xAI.
Hendrycks co-authored two 2021 papers proposing tests of AI systems that are now widely used: one quizzing models on undergraduate-level knowledge of topics like U.S. history, the other probing their ability to reason through competition-level math. The undergraduate-level test has been downloaded from the online AI hub Hugging Face more times than any other such dataset.
When those papers were published, AI systems answered the exam questions almost at random. "They are now being crushed," Hendrycks told Reuters.
For example, the AI lab Anthropic's Claude models went from scoring around 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.
As a result, these common benchmarks mean less than they once did.
AI still appears to perform poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University's AI Index Report from April. OpenAI o1 scored around 21% on one version of the ARC-AGI pattern-recognition test, for example, the ARC organizers said on Friday.
Some AI researchers argue that results like these show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. "Humanity's Last Exam" will require abstract reasoning, he said.
Answers from common benchmarks may also have ended up in the data used to train AI systems, industry observers said. Hendrycks said some questions on "Humanity's Last Exam" will remain private to ensure that AI systems' answers do not come from memorization.
The exam will include at least 1,000 crowdsourced questions, due by November 1, that are difficult for non-experts to answer. Submissions will undergo peer review, and winning questions will earn their authors co-authorship and prizes of up to $5,000 sponsored by Scale AI.
“We desperately need more rigorous testing of expert-level models to measure the rapid progress of AI,” said Alexandr Wang, CEO of Scale.
One restriction: Organizers don’t want questions about weapons, which some say would be too dangerous for AI to study.