Intel Gaudi 3 PCIe: A Game Changer for AI Workloads Beyond OAM

Intel’s Gaudi 3 AI accelerator has so far been available primarily in the OAM (Open Accelerator Module) form factor. The PCIe version changes that, making the accelerator easier to adopt and to integrate into existing enterprise infrastructure.
What Is Intel Gaudi 3 PCIe?
The Intel Gaudi 3 PCIe (HL-338) is a dual-slot, full-height accelerator card designed to fit into standard PCIe Gen5 x16 slots. With a thermal design power (TDP) of up to 600W, it targets a range of AI applications, including training and inference of large language models (LLMs) and multimodal models.
Key specifications include:
- Compute Units: 64 fully programmable Tensor Processor Cores (TPCs) and 8 Matrix Math Engines (MMEs)
- Memory: 128GB of HBM2e memory with 3.7 TB/s bandwidth, complemented by 96MB of on-die SRAM
- Data Types Supported: FP8, BF16, FP16, and FP32
- Host Interface: PCIe Gen5 x16, offering up to 128 GB/s of bidirectional bandwidth (about 64 GB/s in each direction)
- Form Factor: Full-height, dual-slot PCIe card with a length of 10.5 inches
- Thermal Design Power (TDP): 600W
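The 128 GB/s host-interface figure corresponds to the raw bidirectional rate of a Gen5 x16 link. The sketch below works through that arithmetic, assuming the standard PCIe 5.0 parameters (32 GT/s per lane, 128b/130b line encoding), which are not stated in the specification list itself. The function names are illustrative, and real transfers will be somewhat slower because of protocol overhead.

```python
# Back-of-the-envelope PCIe Gen5 x16 bandwidth estimate.
# Assumptions (standard PCIe 5.0 values, not taken from the card's spec list):
#   - 32 GT/s signalling rate per lane
#   - 128b/130b line encoding
GT_PER_S_PER_LANE = 32.0
ENCODING_EFFICIENCY = 128 / 130
LANES = 16


def pcie_bandwidth_gb_s(lanes=LANES, gt_per_s=GT_PER_S_PER_LANE,
                        efficiency=ENCODING_EFFICIENCY):
    """Return (per-direction, bidirectional) bandwidth in GB/s.

    Protocol overhead (packet headers, flow control) is ignored, so
    achievable throughput is lower than these figures.
    """
    per_direction = lanes * gt_per_s * efficiency / 8  # bits -> bytes
    return per_direction, 2 * per_direction


def transfer_time_s(size_gb, bandwidth_gb_s):
    """Idealized time to move size_gb gigabytes at bandwidth_gb_s."""
    return size_gb / bandwidth_gb_s


one_way, both_ways = pcie_bandwidth_gb_s()
print(f"Per direction: {one_way:.1f} GB/s")
print(f"Bidirectional: {both_ways:.1f} GB/s")

# Idealized time to fill the card's 128GB of HBM2e from the host.
print(f"Filling 128 GB: {transfer_time_s(128, one_way):.2f} s")
```

With encoding overhead included, the link delivers roughly 63 GB/s in each direction, so loading a full 128GB of model data from the host takes on the order of two seconds at best.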
Why It’s a Game Changer
1. Enterprise Integration
Unlike the OAM form factor, which requires purpose-built baseboards and chassis, the PCIe version of Gaudi 3 fits into standard server architectures. This compatibility lets enterprises build on their current hardware investments without significant modifications.
2. Cost-Effective Scalability
Intel positions Gaudi 3 as a more affordable alternative to competitors such as NVIDIA’s H100, claiming up to 1.8x better performance per dollar for inference tasks. This cost efficiency matters for organizations that want to scale their AI capabilities without a matching rise in spend.
3. Advanced Networking Capabilities
The RoCE v2 RDMA ports on the HL-338 are exposed through a gold-finger connector, which accepts a top bridge (HLTB-304A/HLTB-304B) that connects four HL-338 cards. Linking four Gaudi 3 PCIe cards with a top bridge in this way provides 1200 GB/s of bandwidth between them.
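One practical reason to bridge cards is that a model's weights may not fit in a single card's 128GB of HBM2e. The following is a rough sketch, not a sizing tool: it counts weights only and ignores activations, KV cache, and optimizer state, all of which add to the real footprint. The helper names and the 70-billion-parameter example are hypothetical, chosen only for illustration.

```python
import math

# Per-card memory, from the specification list above.
HBM_PER_CARD_GB = 128
# Cards joined by one top bridge, as described above.
CARDS_PER_BRIDGE = 4

# Bytes per parameter for each data type the card supports.
BYTES_PER_PARAM = {"FP8": 1, "BF16": 2, "FP16": 2, "FP32": 4}


def weights_gb(params_billion, dtype):
    """Weights-only footprint in GB (1e9 params * bytes / 1e9 bytes per GB)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def min_cards(params_billion, dtype, hbm_gb=HBM_PER_CARD_GB):
    """Smallest number of cards whose combined HBM holds the weights."""
    return math.ceil(weights_gb(params_billion, dtype) / hbm_gb)


# Hypothetical 70-billion-parameter model, used purely as an example.
for dtype in ("FP8", "BF16", "FP32"):
    size = weights_gb(70, dtype)
    cards = min_cards(70, dtype)
    fits = "fits" if cards <= CARDS_PER_BRIDGE else "does not fit"
    print(f"{dtype}: {size} GB of weights, {cards} card(s), "
          f"{fits} in one bridged group")
```

Four bridged cards provide 512GB of HBM2e in total, so under these assumptions the choice of data type largely determines whether a model needs one card or several.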

Enabling AI Workloads with Nutanix and vSAN Ready Nodes
Nutanix and VMware vSAN Ready Nodes already support other AI accelerators in the PCIe form factor, and the Intel Gaudi 3 PCIe is a flexible, complementary option for organizations that want more choice in AI hardware.
Nutanix
Nutanix’s existing support for PCIe-based AI accelerators makes it well-positioned to integrate Gaudi 3 into its hyper-converged infrastructure platforms. These platforms already combine Intel Xeon processors with support for Intel Advanced Matrix Extensions (AMX), so adding Gaudi 3 would give customers a broader range of options for AI workloads and more flexibility in choosing the best accelerator for their specific needs.
vSAN Ready Nodes
vSAN Ready Nodes are not yet officially integrated with Gaudi 3 PCIe accelerators, but the growing demand for AI workloads suggests that future versions may add support. The ability to handle model training and inference efficiently would position vSAN Ready Nodes as a capable platform for AI-centric workloads.
Conclusion
The release of Intel Gaudi 3 in the PCIe form factor widens access to high-performance AI acceleration. By aligning with industry standards and offering cost-effective scalability, it lets enterprises add advanced AI capabilities to their existing infrastructure. Platforms like Nutanix and vSAN Ready Nodes could adopt Gaudi 3 PCIe to strengthen their AI workload handling, paving the way for more efficient enterprise operations in the near future.