The rise of large language models (LLMs) has driven significant demand for efficient inference and fine-tuning frameworks. One such framework, vLLM, is optimised for high-performance serving: its PagedAttention mechanism manages the KV cache in fixed-size blocks, enabling memory-efficient execution across diverse hardware architectures. With the introduction of new AI accelerators such as Gaudi3, H200, and MI300X, optimising fine-tuning parameters is essential to […]
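
As a rough point of reference for the serving workflow the excerpt refers to, below is a minimal offline-inference sketch using vLLM's Python API. The model name, prompt, and sampling settings are illustrative placeholders, not taken from the article; PagedAttention itself is applied internally by vLLM and needs no explicit configuration here.

```python
# Minimal vLLM offline-inference sketch (illustrative placeholders only).
from vllm import LLM, SamplingParams

# Loading a model; vLLM manages the KV cache with PagedAttention internally.
llm = LLM(model="facebook/opt-125m")

# Basic sampling configuration for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = ["Explain in one sentence why paging the KV cache helps LLM serving."]

# Generate completions for the prompts and print the first candidate of each.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```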


The AI hardware market is rapidly evolving, driven by the increasing complexity of AI workloads. DeepSeek, a new large-scale AI model from China, has entered the scene, but its impact on the broader AI landscape remains an open question. Is it simply a competitor to OpenAI’s ChatGPT, or does it have wider implications for inferencing, […]
