IBM launches the z17 mainframe – supreme AI, I/O and security


Today IBM announced the z17 – the latest in a long line of mainframes. It is based on the new Telum II – a 5nm chip fabricated by Samsung. In this post I summarize the key differentiators of the new system, based on pre-briefings by IBM, for which I’m very grateful.
The new system follows three years in which the z16 has been on course to become the most successful IBM Z server – an architecture (with ‘z’ standing for ‘zero downtime’) originally introduced with the eServer zSeries 900 (z900 for short) in 2000. The z16 has already massively overtaken each preceding generation in total MIPS (Millions of Instructions per Second) shipped, and will overtake them all in revenue once the last one is installed: see my Figure above, where I’ve included model numbers for LinuxONE (in black) and for systems running IBM’s own operating systems (in white), as well as the time interval between announcements. In fact the chart could be extended almost endlessly to the left, as IBM has been making mainframes for 61 years! It is reasonable to assume that the z17 will also overtake the z16 in revenue and MIPS shipped over its lifetime.

Telum II enables the z17’s enhancements in AI, I/O and security


Looking closer at the last four IBM Z generations, each has delivered key new features unlocked by the new CPU on which it is based. The z17 uses the Telum II CPU, which features:

  • A 5nm processor manufactured by Samsung;
  • An on-chip AI accelerator running at a higher clock speed and delivering ultra-low-latency, real-time inferencing on every transaction;
  • Eight processor cores with a ‘deep super-scalar out-of-order instruction pipeline’ running at 5.5GHz (faster than the 5.2GHz of its predecessor);
  • High scalability – a single z17 system can use up to 32 Telum II processor chips with up to 208 cores;
  • A more extensive virtual cache system – the L2 cache of one core can be used as virtual L3 and L4 caches for another core, delivering maximum cache sizes of 360MB (L3) and 2.88GB (L4) – see the quick check after this list.
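
As a quick sanity check, those two cache figures line up if one assumes eight chips per processor drawer – a grouping I’m inferring from the stated maxima, not something IBM confirmed in the briefing:

```python
# Back-of-envelope check of the virtual cache figures quoted above.
# The 8-chips-per-drawer grouping is my assumption, not an IBM statement.
chips_per_system = 32          # maximum Telum II chips in a z17
drawers = 4                    # assumed number of processor drawers
chips_per_drawer = chips_per_system // drawers        # -> 8

virtual_l3_mb = 360            # per chip, from the list above
virtual_l4_gb = chips_per_drawer * virtual_l3_mb / 1000

print(f"{chips_per_drawer} chips x {virtual_l3_mb}MB virtual L3 "
      f"= {virtual_l4_gb:.2f}GB virtual L4")   # 2.88GB, matching the spec
```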

Telum II maintains IBM’s proprietary approach to chip design in a world in which almost all other suppliers build systems using x86 processors from Intel or AMD. IBM isn’t trying to compete head-on with those adding Nvidia GPUs to their servers to address Generative AI workloads for public cloud suppliers and other large customers; with the z17 it is fully integrating high-speed, ultra-low-latency AI into transaction processing, the workload in which its mainframes truly excel.

Better for Advanced AI – on-board support for LLMs, in-drawer routing and Spyre cards

With the z17 IBM has gone a long way beyond the on-chip AI inferencing included for the first time on its predecessor. In particular it is introducing:

  • On-board support for Large Language Model (LLM) primitives, improved quantization and matrix operations, and faster processing;
  • New in-drawer intelligent routing, allowing remote AI processing and giving any core access to eight times as many on-chip AI accelerators as on the z16; and
  • The ability to attach up to 48 new Spyre PCIe Gen5 acceleration cards per system, each of which has 32 AI-ready cores and can perform at up to 300 Tera Operations per Second (TOPS[1]) – see the arithmetic after this list.
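
To put the Spyre numbers in context, here is the peak arithmetic, using only the figures quoted above (these are peak ratings, not measured throughput):

```python
# Peak AI throughput of a fully populated Spyre complement.
# Real workloads will deliver less than these quoted peaks.
cards = 48
tops_per_card = 300            # Tera Operations per Second, peak, per card
total_tops = cards * tops_per_card

print(f"{cards} cards x {tops_per_card} TOPS = {total_tops:,} TOPS "
      f"(~{total_tops / 1000:.1f} peta-operations per second)")   # 14,400 TOPS
```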

Running an equivalent AI workload without the new Spyre cards, a large z17 will use less electricity and need one fewer frame and two fewer power cords than its predecessor; with Spyre cards attached, it will deliver much more powerful AI using approximately the same amount of electricity, with the same advantages in frames and power cords.
The new system offers multi-model (Predictive AI and LLM) inferencing, which customers can use for hundreds of AI use cases – I’m particularly impressed with its expected use for advanced fraud and anomaly detection. Beyond AI processing, IBM has also included assistants for Generative AI and agents, which offer AI insights for problem solving, reasoning and decision making.
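
To make the fraud detection use case concrete, here is a minimal sketch of per-transaction scoring with a predictive model in ONNX form, using the open-source onnxruntime package. The model file and feature layout are hypothetical, and this shows only the general scoring pattern – on a z17 the model would be compiled for, and offloaded to, the on-chip accelerator rather than run like this:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical fraud model exported to ONNX; the file name and features
# are illustrative only -- this is the scoring pattern, not IBM's stack.
session = ort.InferenceSession("fraud_model.onnx")

# A single transaction's features (amount, merchant code, velocity,
# risk score); in production every transaction would be scored in-line.
features = np.array([[125.40, 5411.0, 3.0, 0.87]], dtype=np.float32)

input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: features})
print(f"fraud probability: {float(outputs[0].ravel()[0]):.3f}")
```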

Improved I/O handling – the DPU

Back in 2017 IBM introduced high-speed synchronization on the z14. It has now gone several stages further by including an on-chip Data Processing Unit (DPU) on the Telum II, which can be used to create a massive I/O subsystem on the z17. The DPU implements complex I/O protocols, reduces latency and allows I/O management to run at 70% of the electrical power required on the z16.

Even more security – new functions for the Crypto Express 8S card and the Crypto and Discovery Dashboard

The z16, introduced in 2022, was heralded as the first ‘quantum-safe’ server, using Crypto Express 8S cards to protect the system from quantum cyber attacks through a built-in dual signature scheme. These cards provide the same protection on the z17. Beyond that, they can now also be used to protect the sensitive data and assets of new applications through quantum-safe APIs based on the National Institute of Standards and Technology (NIST) Post-Quantum Cryptography (PQC) standards.
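
For a feel of what working against the NIST PQC standards looks like, here is a minimal sketch of a post-quantum signature using ML-DSA (the standardized form of CRYSTALS-Dilithium) via the open-source liboqs-python bindings. This illustrates the standard itself, not the Crypto Express 8S APIs, and the message is of course made up:

```python
import oqs  # liboqs-python; available mechanism names depend on the build

message = b"payment instruction 4711"
alg = "ML-DSA-65"  # FIPS 204 (CRYSTALS-Dilithium); "Dilithium3" on older builds

# The signer generates a keypair and signs; the secret key stays inside
# the Signature object (on a z17 it would stay inside the HSM card).
with oqs.Signature(alg) as signer:
    public_key = signer.generate_keypair()
    signature = signer.sign(message)

# Any party holding the public key can verify the quantum-safe signature.
with oqs.Signature(alg) as verifier:
    assert verifier.verify(message, signature, public_key)
    print("post-quantum signature verified")
```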
Security on the z17 will also be enhanced through the new Crypto and Discovery Dashboard, which creates a ‘crypto inventory’ showing where and what cryptography is used in applications – very helpful for application migration and modernization.
In addition the z17 offers better threat detection: using lightweight AI, the system can detect anomalous and potentially malicious data access to workloads running on z/OS systems.

If DeepSeek frightened many x86/Nvidia-based AI system builders with its cheapness, IBM should frighten them with how advanced a server built for AI can be. Despite the introduction of on-chip AI inferencing with the z16, IBM’s server market share has yet to recover to the heights of the early 2000s (see my Figure above). If it can persuade major companies of the need to fully integrate AI into their transaction processing systems, it can now deliver those systems with lower latency, faster processing, smaller footprints, easier management, lower electricity usage and a cheaper Total Cost of Ownership than ever before. In consequence it will build a customer base beyond the impressive list of large organizations it already serves. It will also see its quarterly server revenues soar beyond the large growth normally planned for a new mainframe announcement.
[1] A term used to measure Neural Processing Unit (NPU) performance in AI systems.
