New IBM Telum II processor and IBM Spyre accelerator deliver enterprise-scale AI capabilities including large language models and generative AI
Advanced IO technology enables and simplifies a scalable IO subsystem designed to reduce power consumption and data center footprint
August 26, 2024
PALO ALTO, California, August 26, 2024 /PRNewswire/ — IBM (NYSE: IBM) unveiled architectural details of the upcoming IBM Telum® II processor and IBM Spyre™ accelerator at Hot Chips 2024. The new technologies are designed to dramatically increase the processing power of next-generation IBM Z mainframe systems, helping to accelerate the use of traditional AI models and large-language AI models in tandem through a new ensemble approach to AI.
As many generative AI projects leveraging large language models (LLMs) move from proof-of-concept to production, the need for energy-efficient, secure, and scalable solutions has become a top priority. A Morgan Stanley study published in August predicts that generative AI’s energy requirements will grow 75% annually over the next few years, putting it on track to consume as much energy in 2026 as Spain did in 2022.1 Many IBM clients indicate that architectural decisions to support right-sized foundation models and hybrid-by-design approaches for AI workloads are increasingly important.
Key innovations unveiled today include:
- IBM Telum II processor: Designed to power next-generation IBM Z systems, the new IBM chip features increased frequency and memory capacity, a 40 percent increase in cache, and an integrated AI accelerator core and coherently connected data processing unit (DPU) compared to the first-generation Telum chip. The new processor is expected to support enterprise computing solutions for LLMs, addressing the industry’s complex transactional needs.
- IO Acceleration Unit: An entirely new data processing unit (DPU) on the Telum II processor chip is designed to accelerate complex I/O protocols for networking and storage on the mainframe. The DPU simplifies system operations and can improve the performance of key components.
- IBM Spyre Accelerator: Provides additional AI compute capacity to complement the Telum II processor. Working together, the Telum II and Spyre chips form a scalable architecture to support ensemble AI modeling methods, the practice of combining multiple machine learning or deep learning AI models with encoder LLMs. By leveraging the strengths of each model architecture, ensemble AI can deliver more accurate and robust results than individual models; an illustrative sketch of this ensemble approach follows this list. The IBM Spyre Accelerator chip, previewed at Hot Chips 2024, will ship as an add-on option. Each accelerator chip connects via a 75-watt PCIe adapter and is based on technology developed in collaboration with IBM Research. As with other PCIe cards, the Spyre Accelerator chip is scalable to meet customer needs.
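To make the ensemble idea concrete, the minimal Python sketch below scores a single insurance-claim transaction with two models and blends the results: a compact traditional classifier of the kind targeted at in-transaction inference, and an encoder-LLM-style text score of the kind that could be offloaded to accelerator cards. Every function, weight, and heuristic here is a hypothetical placeholder, not IBM software or the actual models run on Telum II or Spyre.

```python
# Illustrative ensemble-AI sketch: blend a traditional model score with an
# encoder-LLM-style score. All internals are stand-ins for real models.
import numpy as np

def traditional_model_score(features: np.ndarray) -> float:
    """Placeholder for a compact neural-network score computed in-transaction."""
    weights = np.array([0.4, 0.3, 0.2, 0.1])                    # stand-in parameters
    return float(1.0 / (1.0 + np.exp(-(features @ weights))))   # logistic score

def llm_encoder_score(claim_text: str) -> float:
    """Placeholder for an encoder-LLM scoring step; here a trivial keyword check."""
    suspicious = ("urgent", "cash only", "no receipt")
    hits = sum(phrase in claim_text.lower() for phrase in suspicious)
    return hits / len(suspicious)

def ensemble_score(features: np.ndarray, claim_text: str,
                   traditional_weight: float = 0.6) -> float:
    """Weighted blend of the two model scores (one simple ensembling strategy)."""
    return (traditional_weight * traditional_model_score(features)
            + (1.0 - traditional_weight) * llm_encoder_score(claim_text))

if __name__ == "__main__":
    features = np.array([0.9, 0.1, 0.7, 0.3])   # engineered transaction features
    text = "Urgent claim, cash only settlement requested, no receipt available."
    print(f"ensemble fraud score: {ensemble_score(features, text):.2f}")
```

In practice the blending step could be a learned meta-model rather than a fixed weighted average; the point is simply that each model contributes the kind of signal it is best at.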
“Our robust, multi-generational roadmap keeps us ahead of technology trends, including growing demands for AI,” said Tina Tarquinio, Vice President, Product Management, IBM Z and LinuxONE. “The Telum II processor and Spyre accelerator are designed to deliver high-performance, secure, and more energy-efficient enterprise computing solutions. After years of development, these innovations will be introduced in our next-generation IBM Z platform so clients can take advantage of LLM and generative AI at scale.”
The Telum II processor and IBM Spyre accelerator will be manufactured by IBM’s long-time manufacturing partner Samsung Foundry and built on its high-performance, low-power 5nm process node. Working together, they will support a range of advanced AI-driven use cases designed to unlock business value and create new competitive advantages. Using ensemble AI methods, customers can achieve faster and more accurate prediction results. The combined processing power announced today will provide an on-ramp to the application of generative AI use cases. Examples include:
- Insurance Claims Fraud Detection: Improved detection of home insurance claims fraud using ensemble AI, which combines LLMs with traditional neural networks to improve performance and accuracy.
- Advanced Anti-Money Laundering: Advanced detection of suspicious financial activities, promoting compliance with regulatory requirements and mitigating the risk of financial crimes.
- AI Assistants: Drive application lifecycle acceleration, knowledge and expertise transfer, code explanation and transformation, and more.
Specifications and performance measures:
Telum II Processor: Featuring eight high-performance cores running at 5.5 GHz, with 36 MB of L2 cache per core and a 40 percent increase in on-chip cache capacity for a total of 360 MB. The 2.88 GB virtual Level 4 cache per processor drawer provides a 40 percent increase over the previous generation. The integrated AI accelerator enables low-latency, high-throughput in-transaction AI inference, improving, for example, fraud detection during financial transactions, and offers a fourfold increase in computing capacity per chip compared to the previous generation.
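As a quick arithmetic check on the cache figures above (the comparison baseline of 8 × 32 MB of L2 is a commonly cited figure for the first-generation Telum, not a number stated in this release):

```latex
% Arithmetic implied by the cache figures quoted above.
% Assumed baseline: first-generation Telum with 8 x 32 MB = 256 MB of on-chip L2.
\begin{align*}
  8 \times 36\,\mathrm{MB} &= 288\,\mathrm{MB}
    && \text{(core L2 alone; the 360 MB total implies additional on-chip L2 instances)}\\
  \frac{360\,\mathrm{MB}}{256\,\mathrm{MB}} &\approx 1.41
    && \text{(consistent with the stated 40 percent increase over the previous generation)}
\end{align*}
```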
The new IO DPU is integrated into the Telum II chip and is designed to improve data management with 50% increased IO density. This advancement improves the overall efficiency and scalability of IBM Z, making it ideal for handling the large-scale AI workloads and data-intensive applications of today’s enterprises.
Spyre Accelerator: A purpose-built, enterprise-grade accelerator offering scalable capabilities for complex AI models and generative AI use cases. It features up to 1 TB of memory, designed to work in tandem across all eight cards in a standard I/O drawer to support AI model workloads across the entire mainframe, while being designed to consume no more than 75 W per card. Each chip will have 32 compute cores supporting int4, int8, fp8, and fp16 data types for low-latency, high-throughput AI applications.
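The drawer-level numbers above imply the following per-card figures, assuming memory is split evenly across the eight cards:

```latex
% Per-card figures implied by the Spyre drawer-level numbers above.
\begin{align*}
  \text{memory per card} &\approx \frac{1\,\mathrm{TB}}{8} = 128\,\mathrm{GB}\\
  \text{accelerator power per drawer} &\leq 8 \times 75\,\mathrm{W} = 600\,\mathrm{W}
\end{align*}
```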
Availability
The Telum II processor will be the core processor for the next-generation IBM Z and IBM LinuxONE platforms, and is expected to be available to IBM Z and LinuxONE customers in 2025. The IBM Spyre accelerator, currently in technical preview, is also expected to be available in 2025.
Statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.
About IBM
IBM is a leading global provider of hybrid cloud and artificial intelligence services and consulting expertise. We help clients in more than 175 countries unlock insights from their data, streamline business processes, reduce costs, and gain a competitive advantage in their industries. Thousands of governments and enterprises in critical infrastructure areas such as financial services, telecommunications, and healthcare rely on IBM’s hybrid cloud platform and Red Hat OpenShift to digitally transform quickly, efficiently, and securely. IBM’s breakthrough innovations in artificial intelligence, quantum computing, industry cloud solutions, and consulting provide open and flexible options for our clients. All of this is underpinned by IBM’s longstanding commitment to trust, transparency, accountability, inclusion, and service.
Additional sources
- Learn more about the IBM Telum II processor.
- Learn more about the IBM Spyre Accelerator.
- Learn more about the IO Accelerator.
Media contact:
Chase Skinner
IBM Communications
chase.skinner@ibm.com
Aishwerya Paul
IBM Communications
aish.paul@ibm.com
1 Source: Morgan Stanley Research, August 2024.
SOURCE IBM