IBM is expected to launch the replacement for its current z16 range of servers in 2025. Our first glimpse of its next mainframe line came with the announcement of the Telum II processor at the Hot Chips conference in Palo Alto in August this year (mirroring its announcement of the original Telum processor at the same conference in August 2021, eight months ahead of the introduction of the z16 itself in April 2022). In this post I’ll look at how the technical advancements of the new chip will stretch the lead IBM has over other server vendors even further.
The technical advancements
Like its predecessor, Telum II has been designed by IBM and will be fabricated by Samsung; the process node has shrunk from 7nm to 5nm and the operating frequency has been increased from 5.2 to 5.5GHz. The number of cores per chip and the number of chips that can be used together stay at 8 and 32 respectively (see my Figure above for a summary of the technical differences between the two).
Each core of the Telum II will have a ‘private’ L2 cache of 36MB. Again, the L2 cache of one core will be able to be used as ‘virtual’ cache by the other cores, creating maxima of 360MB for the virtual L3 and 2.88GB for the virtual L4; these virtual cache sizes are roughly 40% larger than those on the original chip.
These increases in cache capacity are made possible by the move to the smaller 5nm process.
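For readers who like to check the arithmetic, here is a minimal Python sketch of where the published maxima come from. The ten-L2-instances-per-chip and eight-chips-sharing-a-virtual-L4 figures are my assumptions based on IBM’s Hot Chips material, not something stated in this post.

```python
# Back-of-the-envelope arithmetic for the Telum II virtual-cache maxima.
# Assumptions (mine, not confirmed here): ten 36MB L2 cache instances per
# chip (the eight cores plus extra instances such as the DPU's), and eight
# chips pooling their caches into one virtual L4.

L2_SIZE_MB = 36           # 'private' L2 per core
L2_INSTANCES_PER_CHIP = 10  # assumption: 8 cores + 2 extra L2 instances
CHIPS_SHARING_L4 = 8        # assumption: chips pooled into one virtual L4

virtual_l3_mb = L2_SIZE_MB * L2_INSTANCES_PER_CHIP       # 360 MB
virtual_l4_gb = virtual_l3_mb * CHIPS_SHARING_L4 / 1000  # 2.88 GB

print(f"virtual L3: {virtual_l3_mb} MB")  # matches the 360MB figure
print(f"virtual L4: {virtual_l4_gb} GB")  # matches the 2.88GB figure
```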
New additions for Generative AI
However, IBM is also introducing two major enhancements to make the new processor even better suited to creating and running Large Language Models (LLMs) and Generative AI applications. These are:
- The addition of a Data Processing Unit (DPU) inside the new chip for accelerating complex I/O protocols for networking and storage on the mainframe. IBM claims that the DPU will help simplify system operations and improve the performance of key components.
- The IBM Spyre Accelerator, an add-on option attachable via a 75-watt PCIe adapter. The combination of the Spyre Accelerator and the Telum II chip can be used to expand the AI compute capability of the overall system. Used together they can form a scalable architecture to support ‘ensemble methods’ of AI acceleration, in which multiple machine learning or deep learning models are combined with encoder LLMs (a sketch of the idea follows this list).
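To make the ‘ensemble’ idea concrete, here is a hypothetical Python sketch of how a cheap in-line model (the kind suited to Telum II’s on-chip accelerator) might be combined with a heavier encoder-LLM classifier (the kind Spyre is aimed at) for something like fraud scoring. All names, logic and thresholds are illustrative; none of this reflects an actual IBM API.

```python
# Hypothetical ensemble: a lightweight model screens every transaction,
# and only uncertain cases are escalated to a heavier LLM-based scorer.
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    merchant_risk: float  # 0.0 (trusted) .. 1.0 (risky)

def fast_model_score(tx: Transaction) -> float:
    """Stand-in for a small in-line ML model: cheap, run on every transaction."""
    return min(1.0, 0.6 * tx.merchant_risk + 0.4 * min(tx.amount / 10_000, 1.0))

def llm_model_score(tx: Transaction) -> float:
    """Stand-in for an encoder-LLM classifier: expensive, used selectively."""
    return 0.9 * tx.merchant_risk  # placeholder logic

def ensemble_score(tx: Transaction, low: float = 0.2, high: float = 0.8) -> float:
    first = fast_model_score(tx)
    if first < low or first > high:
        return first                   # confident: no escalation needed
    second = llm_model_score(tx)       # uncertain: consult the big model
    return 0.5 * first + 0.5 * second  # simple averaging ensemble

print(ensemble_score(Transaction(amount=7_500, merchant_risk=0.5)))
```

The design point is that the expensive model only runs on the ambiguous middle band of scores, which is what makes this kind of ensemble viable at transaction-processing rates.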
IBM has been a pioneer in AI development and has maintained significant technical and energy-efficiency advantages over the x86/nVidia-GPU based systems that have been driving the strong growth of the server market over the last year. Its mainframes are already being used for real-time fraud detection by major financial institutions around the world, for instance.
I expect these enhancements and additions to its mainframe processor will allow IBM to maintain its technical lead in transaction processing and Generative AI. Its challenge, as always, will be to turn these advances to financial advantage – to grow its revenue and profit margin beyond those of the largest commodity x86 server vendors. I can’t wait for the new mainframe!