Apple and Nvidia team up to advance AI language models

Apple is exploring different ways to speed up AI models by using Nvidia’s platform
An undated image of Nvidia (left) and Apple (right). — Getty Images

Apple has partnered with Nvidia to enhance the performance of large language models (LLMs) by integrating a new text generation technique, offering substantial speed improvements for artificial intelligence (AI) applications.

Previously, the Cupertino-based tech giant published and open-sourced Recurrent Drafter (ReDrafter), an approach that combines beam search and dynamic tree attention to accelerate text generation.

Beam search explores multiple candidate text sequences simultaneously for better results, while dynamic tree attention removes redundant overlaps among those sequences so shared work is not repeated, improving efficiency.
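
To make the second idea concrete, the sketch below is a toy illustration in Python, not Apple's or Nvidia's code; the token lists and function names are invented for the example. It merges several draft beams into a prefix tree so that tokens shared between candidates only need to be checked once, which is the intuition behind pairing beam search with tree attention.

    # Toy sketch: fold overlapping draft beams into a prefix tree so each
    # shared token is verified only once rather than once per beam.

    def build_prefix_tree(beams):
        """Merge draft token sequences into a nested-dict prefix tree."""
        tree = {}
        for beam in beams:
            node = tree
            for token in beam:
                node = node.setdefault(token, {})
        return tree

    def count_nodes(tree):
        """Unique draft tokens left to verify after deduplication."""
        return sum(1 + count_nodes(child) for child in tree.values())

    beams = [
        ["the", "cat", "sat"],
        ["the", "cat", "slept"],
        ["the", "dog", "sat"],
    ]
    print(count_nodes(build_prefix_tree(beams)))  # 6 unique tokens instead of 9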

Apple has now integrated the technique into Nvidia's TensorRT-LLM framework, which optimises LLM inference on Nvidia GPUs. According to Apple, the combination achieved "state-of-the-art performance".

In testing with a production model containing tens of billions of parameters, the integration delivered a 2.7x increase in tokens generated per second.
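
For a rough sense of scale (the baseline throughput and response length below are hypothetical; the article reports only the 2.7x ratio), the speedup translates directly into lower wall-clock latency:

    # Hypothetical figures chosen purely for illustration.
    baseline_tokens_per_sec = 30.0
    speedup = 2.7
    response_tokens = 500

    before = response_tokens / baseline_tokens_per_sec             # ~16.7 s
    after = response_tokens / (baseline_tokens_per_sec * speedup)  # ~6.2 s
    print(f"{before:.1f}s -> {after:.1f}s per 500-token response")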

Apple said that the improved performance not only reduces user-perceived latency but also lowers GPU usage and power consumption.

From Apple's Machine Learning Research blog: "LLMs are increasingly being used to power production applications, and improving inference efficiency can both impact computational costs and reduce latency for users. With ReDrafter's novel approach to speculative decoding integrated into the Nvidia TensorRT-LLM framework, developers can now benefit from faster token generation on Nvidia GPUs for their production LLM applications."