Today, the 32 GT/s signaling rate of PCIe Gen 5 and the vast throughput of AI clusters dominate the technical conversation in the modern data center. But a revolution is under way “beyond the main bus.” The communication layer that monitors hardware health, manages power states and protects hardware security is moving from legacy I2C/SMBus architectures to the Improved Inter-Integrated Circuit (I3C) standard. The shift is driven in part by the demand for deterministic, secure and fast telemetry in Compute Express Link (CXL) and DDR5 systems.
The Legacy Wall: why SMBus cannot keep pace
The I2C-based System Management Bus (SMBus) has been adequate for years for simple tasks such as reading a voltage level or a temperature sensor. As systems have grown into AI-driven hyperscale deployments, though, its shortcomings have become significant system-level constraints.
- Bus Loading and MUX Complexity – SMBus, built on I2C, tops out at 1 Mbps (I2C Fast-mode Plus). With high-density components such as DDR5 modules that combine SPD hubs, PMICs and dual temperature sensors, the bus capacitance can force the use of physical multiplexers (MUXes). This architecture adds communication complexity and raises the risk of signal integrity failures.
- The Polling Tax – SMBus has no efficient in-band interrupt mechanism, so the Baseboard Management Controller (BMC) must constantly poll every device. This increases response latency and wastes power.
- Fixed Addressing – In legacy I2C, addresses are bound in hardware. That becomes a problem in large server racks with many identical devices, which must be remapped to prevent address collisions; otherwise the devices cannot be addressed individually.
The I3C Logic Stack (MCTP, SPDM and PLDM)
I3C addresses these problems with clock rates of up to 12.5 MHz to meet tight timing requirements, In-Band Interrupts (IBI) for near-instantaneous signaling, and Dynamic Address Assignment (DAA), which removes static mapping conflicts. Most significantly, it offers the robust transport services that contemporary management protocols need.
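The idea behind Dynamic Address Assignment can be illustrated with a small simulation. In I3C, targets that still need an address contend on the open-drain bus by driving their 48-bit Provisioned ID; a 0 overrides a 1, so the lowest ID wins each round and receives the next address from the controller. The sketch below models only that arbitration. The function names, the example IDs, and the starting address 0x08 are hypothetical, and the BCR/DCR bytes and address parity are omitted.

```python
def arbitrate(provisioned_ids: set[int], width: int = 48) -> int:
    """Return the ID that wins open-drain arbitration (MSB first)."""
    contenders = set(provisioned_ids)
    for bit in range(width - 1, -1, -1):
        # On an open-drain bus any device driving 0 pulls the line low.
        bus_level = min((pid >> bit) & 1 for pid in contenders)
        # Devices that tried to send 1 but see 0 back off for this round.
        contenders = {pid for pid in contenders if (pid >> bit) & 1 == bus_level}
    (winner,) = contenders  # unique IDs leave exactly one contender
    return winner


def assign_dynamic_addresses(provisioned_ids: list[int], first_address: int = 0x08) -> dict[int, int]:
    """Repeat arbitration rounds until every target has an address."""
    if len(set(provisioned_ids)) != len(provisioned_ids):
        raise ValueError("Provisioned IDs must be unique for arbitration to resolve")
    pending = set(provisioned_ids)
    table: dict[int, int] = {}
    address = first_address  # illustrative pool start, not mandated by this sketch
    while pending:
        winner = arbitrate(pending)
        table[winner] = address
        pending.remove(winner)
        address += 1
    return table


if __name__ == "__main__":
    # Two identical parts that differ only in an instance field, plus a third device.
    ids = [0x04CC_0000_0002, 0x04CC_0000_0001, 0x0123_0000_0009]
    for pid, addr in assign_dynamic_addresses(ids).items():
        print(f"PID {pid:012X} -> dynamic address 0x{addr:02X}")
```

Because the addresses are handed out at run time, two identical sensors no longer collide the way they would with fixed hardware addresses.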
- MCTP (Management Component Transport Protocol) – Provides standardized framing for data transport between management controllers, GPUs and CPUs over their shared I3C bus.
- SPDM (Security Protocol and Data Model) – Provides for device discovery, authentication and the retrieval of cryptographic measurement data. A single SPDM transaction can take up to 100 ms to complete, a critical timing window during which a bus hang or MUX-induced glitch can compromise a secure boot sequence or leave the device stuck (bricked).
- PLDM (Platform Level Data Model) – Handles platform monitoring, power state changes and firmware updates across the platform to keep the system in a known, healthy state.
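As a rough illustration of how these layers stack, the sketch below splits an SPDM or PLDM message into MCTP packets. It follows the generic 4-byte MCTP transport header (header version, destination EID, source EID, and a flags byte holding SOM, EOM, packet sequence, tag owner and message tag) and the 64-byte baseline payload size. The function names and endpoint IDs are hypothetical, and the I3C-specific binding details, such as any trailing integrity byte, are deliberately left out.

```python
MCTP_HEADER_VERSION = 0x1
MSG_TYPE_PLDM = 0x01       # MCTP message type code for PLDM
MSG_TYPE_SPDM = 0x05       # MCTP message type code for SPDM
BASELINE_PAYLOAD = 64      # baseline transmission unit, in payload bytes


def pack_header(dest_eid: int, src_eid: int, som: bool, eom: bool,
                seq: int, tag_owner: bool, tag: int) -> bytes:
    """Build the 4-byte MCTP transport header."""
    flags = (
        (int(som) << 7)            # start of message
        | (int(eom) << 6)          # end of message
        | ((seq & 0x3) << 4)       # 2-bit packet sequence number
        | (int(tag_owner) << 3)    # tag owner bit
        | (tag & 0x7)              # 3-bit message tag
    )
    return bytes([MCTP_HEADER_VERSION & 0x0F, dest_eid & 0xFF, src_eid & 0xFF, flags])


def fragment(msg_type: int, body: bytes, dest_eid: int, src_eid: int,
             tag: int, payload_size: int = BASELINE_PAYLOAD) -> list[bytes]:
    """Split one message into MCTP packets with SOM/EOM and sequence numbers."""
    message = bytes([msg_type & 0x7F]) + body  # type byte leads the first packet
    chunks = [message[i:i + payload_size] for i in range(0, len(message), payload_size)]
    packets = []
    for index, chunk in enumerate(chunks):
        header = pack_header(
            dest_eid, src_eid,
            som=(index == 0),
            eom=(index == len(chunks) - 1),
            seq=index % 4,
            tag_owner=True,
            tag=tag,
        )
        packets.append(header + chunk)
    return packets


if __name__ == "__main__":
    # Hypothetical 149-byte SPDM payload from a BMC (EID 0x08) to a device (EID 0x1D).
    for pkt in fragment(MSG_TYPE_SPDM, bytes(149), dest_eid=0x1D, src_eid=0x08, tag=3):
        print(len(pkt), pkt[:5].hex())
```

A receiver that loses any one of these packets, for example to a MUX glitch, cannot reassemble the message, which is why a long SPDM exchange is so sensitive to bus stability.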
The sideband challenges in CXL and memory pooling
Sideband health is no longer “optional”; in CXL architectures it is necessary to ensure memory coherency. If the sideband bus (I3C/MCTP) fails during a memory-pooling request or during the retrieval of security data, the coherency or even the stability of the entire high-speed AI cluster can be lost. Likewise, I3C is used in DDR5 and QSFP modules to address the initialization bottleneck created when hundreds of sensors wake up at the same time.
What does it mean to validate?
Tools for validating this sideband frontier need to do more than bit-level decoding: they must provide visibility and emulation at the protocol and system layers.
- PGY-I3C-EX-PD Protocol Exerciser and Analyzer – Engineers can emulate Controllers, Secondary Controllers and Targets on this platform. It can inject CRC or parity errors into SPDM request-response exchanges to test how firmware recovers from security-layer timeouts.
- Low-Cost Tooling – Solutions such as EX-PD-Lite are designed for manufacturing and Post-Silicon Validation (PPV) environments, while the I3C-USB Adaptor offers a lightweight, offline UI for field engineers who need to debug sensors on the move.
- The Validation Alliance – For deep root-cause analysis, protocol data needs to be correlated with physical-layer waveforms. Prodigy’s integration with high-end oscilloscopes from Keysight, Tektronix, and LeCroy enables the engineer to go directly from an MCTP packet error to the noise spike that appears at the same location 5 ns later.
Data center reliability increasingly depends on sideband integrity as well. Xorcom’s validation of the intent of sideband protocols, rather than just their packets, will remain a key differentiator in high-performance engineering as industries transition to UFS 4.1 for automotive storage and CXL for disaggregated memory. With a working knowledge of the I3C logic stack, validation teams can make sure that their designs are as fast and responsive as the high-speed buses they manage.
