Deploying AI Workloads in Virtualised Environments with VMware VVF, Private AI on VCF, RHEL KVM, and Nutanix

Artificial Intelligence (AI) is transforming industries, but deploying AI workloads efficiently remains a challenge. Many organisations turn to virtualisation to maximise resource utilisation, improve security, and streamline AI infrastructure management. This blog explores how to deploy AI workloads in virtualised environments using VMware vSphere Foundation (VVF), Private AI on VMware Cloud Foundation (VCF), Red Hat Enterprise Linux (RHEL) KVM, and Nutanix. It also covers VM configuration, network settings, and acceleration technologies such as GPUs, Intel Advanced Matrix Extensions (AMX), and other hardware enhancements, and weighs the choice between Virtual Machines (VMs) and Containers for AI workloads.

Why Virtualise AI Workloads?

AI workloads often demand high-performance computing, acceleration technologies, and optimised resource allocation. Virtualisation provides several benefits:

  • Resource Optimisation: Efficiently allocate CPU, GPU, and memory to AI workloads.
  • Scalability: Easily scale AI infrastructure based on demand.
  • Security and Isolation: Protect workloads with strong isolation and security policies.
  • Simplified Management: Centralised management reduces complexity in AI deployment and maintenance.

Virtual Machines vs Containers for AI Workloads

When deploying AI workloads, organisations can choose between Virtual Machines (VMs) and Containers. Each has its own advantages:

Virtual Machines (VMs)

  • Stronger Isolation: Each VM runs its own operating system, providing better security and isolation.
  • Hardware Acceleration Support: Supports GPU passthrough and vGPU configurations for AI workloads.
  • Established Enterprise Support: VMs are widely adopted in enterprise environments, offering mature tools for management and monitoring.
  • Better for Mixed Workloads: Useful when running AI alongside other traditional applications in the same infrastructure.

Containers

  • Lightweight and Fast: Containers share the host OS kernel, making them more efficient in resource usage and quicker to deploy.
  • Portability: Containers can run consistently across different environments (on-premises, cloud, hybrid).
  • Ideal for Microservices and AI Pipelines: Containers work well with Kubernetes and container orchestration platforms for distributed AI workloads.
  • Scalability: Easier to scale AI workloads dynamically compared to VMs.

Which to Choose?

  • Use VMs when security, isolation, or GPU passthrough is a priority.
  • Use Containers when scalability, portability, and microservices-based AI workflows are needed.
  • A hybrid approach combining both VMs and Containers is often optimal, leveraging the strengths of each.

Key Considerations for AI Workloads in Virtualised Environments

VM and Container Configuration

Proper configuration is crucial for AI workloads to perform optimally. Consider the following:

  • CPU Allocation: Assign sufficient vCPUs based on AI model complexity.
  • Memory: Allocate ample RAM to handle data processing and model execution.
  • Storage: Use high-speed NVMe or SSD storage for faster I/O operations.
  • NUMA Awareness: Ensure VM placement considers NUMA node locality to reduce latency.
  • AMX Support: Enable Intel AMX on Xeon processors to accelerate matrix-heavy AI computations.
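Before sizing a VM for CPU-based acceleration, it helps to confirm the guest actually exposes the relevant instruction sets. As a minimal sketch (the function name and flag set are illustrative), the following parses `/proc/cpuinfo`-style text for the AMX and DL Boost feature flags a Linux guest would report:

```python
import re

def cpu_ai_flags(cpuinfo_text: str) -> dict:
    """Report which AI-relevant CPU feature flags appear in /proc/cpuinfo text.

    amx_tile/amx_bf16/amx_int8 indicate Intel AMX; avx512_vnni indicates
    DL Boost (Vector Neural Network Instructions).
    """
    match = re.search(r"^flags\s*:\s*(.*)$", cpuinfo_text, re.MULTILINE)
    flags = set(match.group(1).split()) if match else set()
    return {
        "amx": {"amx_tile", "amx_bf16", "amx_int8"} <= flags,
        "dl_boost": "avx512_vnni" in flags,
    }

# On a live Linux guest you would feed it the real file:
# with open("/proc/cpuinfo") as f:
#     print(cpu_ai_flags(f.read()))
```

If AMX does not show up inside the guest, check that the host CPU generation supports it (4th Gen Xeon Scalable or later) and that the hypervisor exposes the feature to the VM.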

Network Configuration

AI workloads may require high network bandwidth, especially in distributed training scenarios. Best practices include:

  • SR-IOV (Single Root I/O Virtualisation): Gives VMs direct access to NIC virtual functions, bypassing the virtual switch to reduce latency and CPU overhead.
  • vSwitch and NSX Integration: Optimise networking with VMware NSX for micro-segmentation and security.
  • RDMA (Remote Direct Memory Access): Useful for high-speed interconnects in AI training clusters.
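On RHEL KVM, for example, an SR-IOV virtual function can be handed to a guest with a libvirt `hostdev` interface. The fragment below is illustrative; the PCI address is an example value, and you would find the real one on your host with `lspci` after enabling virtual functions on the NIC:

```xml
<!-- Illustrative libvirt domain fragment: attach an SR-IOV virtual
     function directly to the guest. The PCI address is an example;
     list your VFs with `lspci | grep "Virtual Function"`. -->
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0000' bus='0x3b' slot='0x02' function='0x0'/>
  </source>
</interface>
```

Note that devices attached this way bypass the vSwitch, so hypervisor-level features such as micro-segmentation do not apply to that traffic.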

Acceleration Technologies

AI workloads often benefit from specialised acceleration hardware. Consider the following options:

  • vGPU (Virtual GPU): Allows multiple VMs or containers to share a single GPU, optimising resource utilisation.
  • GPU Passthrough: Assigns a dedicated GPU to a VM for maximum performance, at the cost of sharing and of features such as live migration (vMotion).
  • Intel AMX: Available on Intel Xeon Scalable processors, AMX accelerates deep learning inference and training workloads without requiring a GPU.
  • Bitfusion (VMware): Enabled remote GPU sharing across multiple AI workloads (note that VMware has since announced end of availability for Bitfusion, so new deployments should favour vGPU or passthrough).
  • Intel DL Boost: Enhances AI inference performance using dedicated CPU instructions.
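The choice between these options usually comes down to a few questions: do you need peak performance, do you need to share the accelerator, and is a GPU present at all? As a toy decision sketch (the function and its return labels are illustrative, not a product feature):

```python
def pick_accelerator(max_performance: bool, share_gpu: bool,
                     gpu_available: bool) -> str:
    """Toy helper mapping workload requirements to an acceleration approach."""
    if gpu_available:
        if max_performance and not share_gpu:
            return "gpu-passthrough"  # dedicated GPU, no sharing or vMotion
        return "vgpu"                 # fractional GPU shared across VMs
    return "cpu-amx"                  # fall back to AMX / DL Boost on Xeon

print(pick_accelerator(max_performance=True, share_gpu=False,
                       gpu_available=True))
```

In practice the decision also weighs licensing (NVIDIA vGPU requires its own licence) and whether the model fits in a fractional vGPU profile.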

Deploying AI Workloads with VMware VVF and Private AI on VCF

VMware vSphere Foundation (VVF)

VMware vSphere Foundation (VVF) packages vSphere with the capabilities needed to optimise AI workloads, providing enhanced GPU virtualisation and support for CPU-based AI acceleration. It enables enterprises to leverage virtual GPUs (vGPU) and GPU passthrough while ensuring efficient resource allocation across multiple AI workloads.

Key Deployment Steps:

  1. Enable GPU and CPU AI Acceleration:
    • Install NVIDIA vGPU software and configure GPU passthrough.
    • Enable Intel AMX and DL Boost for CPU-based AI acceleration.
  2. Configure vGPU in VMware vSphere:
    • Install NVIDIA vGPU Manager in the ESXi hypervisor.
    • Assign vGPU profiles within vCenter.
  3. Deploy AI Workloads in VMs or Containers:
    • Create optimised VMs for AI workloads.
    • Deploy AI frameworks like TensorFlow, PyTorch, or OpenVINO within containers or VMs.
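For the passthrough route in step 1, large-memory GPUs typically also need a couple of advanced settings in the VM's .vmx configuration so their memory-mapped I/O regions fit above 4 GB. A minimal, illustrative fragment (the size value is an example; size it to the GPUs you pass through):

```
# Illustrative .vmx advanced settings for a VM using GPU passthrough.
pciPassthru.use64bitMMIO = "TRUE"      # map large GPU BARs above 4 GB
pciPassthru.64bitMMIOSizeGB = "64"     # example value; size per attached GPUs
```

These can also be set in vCenter under the VM's Advanced Configuration Parameters rather than editing the file directly.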

Private AI on VMware Cloud Foundation (VCF)

VMware Cloud Foundation (VCF) extends AI capabilities across private cloud environments, integrating AI frameworks within a fully software-defined data centre (SDDC). Private AI on VCF enables enterprises to securely deploy AI workloads at scale, leveraging VMware Tanzu for containerised AI applications while maintaining compliance and governance.

Key Deployment Steps:

  1. Deploy VMware Cloud Foundation:
    • Set up a full-stack SDDC with vSphere, vSAN, and NSX.
  2. Integrate AI Frameworks and Accelerators:
    • Use NVIDIA AI Enterprise Suite for GPU-based AI applications.
    • Enable Intel AMX and DL Boost for optimised AI inference on Xeon processors.
  3. Deploy AI Workloads in Kubernetes or VMs:
    • Use VMware Tanzu Kubernetes Grid (TKG) for container-based AI workloads.
    • Deploy machine learning (ML) models and inference workloads in VMs when strict isolation is required.
  4. Security and Performance Optimisation:
    • Leverage vSAN for high-performance AI data storage.
    • Implement NSX micro-segmentation to isolate AI workloads securely.
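Once TKG is in place (step 3), a containerised inference workload requests GPU resources through a standard Kubernetes pod spec. The example below is illustrative: the pod name and image tag are placeholders, and the `nvidia.com/gpu` resource is only available once the NVIDIA device plugin or GPU Operator from NVIDIA AI Enterprise is installed in the cluster:

```yaml
# Illustrative GPU-backed inference pod for Tanzu Kubernetes Grid.
apiVersion: v1
kind: Pod
metadata:
  name: inference-demo            # example name
spec:
  containers:
  - name: inference
    image: nvcr.io/nvidia/pytorch:24.01-py3   # example image tag
    resources:
      limits:
        nvidia.com/gpu: 1        # schedule onto a node with a free (v)GPU
```

Because the GPU is requested as a schedulable resource, Kubernetes places the pod on a worker node with capacity, which is what makes container-based AI workloads easy to scale out.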

Deploying AI Workloads on RHEL KVM and Nutanix

Both RHEL KVM and Nutanix offer strong support for AI workloads in virtualised and containerised environments.

  • RHEL KVM:
    • Use qemu-kvm and libvirt for VM-based AI workloads.
    • Enable GPU passthrough with vfio-pci.
    • Deploy container-based AI workloads using OpenShift.
  • Nutanix:
    • Use Nutanix AHV for VM-based AI workloads.
    • Deploy AI models in containers with Nutanix Kubernetes Engine (NKE, formerly Karbon).
    • Optimise AI storage with Nutanix Files and Objects.
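For the RHEL KVM path, GPU passthrough with vfio-pci typically starts by binding the GPU (and its audio function) to the vfio-pci driver at boot. A minimal sketch of the modprobe configuration; the vendor:device IDs below are examples, so substitute the ones `lspci -nn` reports for your card:

```
# Illustrative /etc/modprobe.d/vfio.conf: bind the GPU and its audio
# function to vfio-pci for passthrough. IDs shown are examples only.
options vfio-pci ids=10de:2204,10de:1aef
```

You also need the IOMMU enabled on the kernel command line (e.g. `intel_iommu=on`) and a rebuilt initramfs before the binding takes effect; the device can then be attached to a guest via a libvirt `hostdev` entry.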

Conclusion

By differentiating between VMware VVF and Private AI on VCF, organisations can determine the best deployment strategy for their AI workloads. VVF provides optimised virtualised AI performance on vSphere, while Private AI on VCF integrates AI capabilities within a full-stack cloud environment. Selecting between VMs and Containers further refines AI infrastructure decisions, ensuring scalable, secure, and high-performance AI workloads.