Nvidia rolls out new vision-language AI model NVLM 1.0 to rival GPT-4

NVLM 1.0 boosts text-handling capabilities with the help of the 72-billion-parameter NVLM-D-72B

Tech Desk - Oct 06, 2024

An undated image of the Nvidia logo. — Unsplash

Nvidia has launched a new artificial intelligence (AI) model, NVLM 1.0, a powerful entrant aimed at ending the dominance of rivals such as OpenAI, which has been at the forefront of AI innovation since ChatGPT debuted in 2022.

A strong competitor across vision and language tasks, NVLM 1.0 also advances text-handling capabilities by leveraging the 72-billion-parameter NVLM-D-72B.

“We introduce NVLM 1.0, a family of frontier-class multimodal large language models that achieve state-of-the-art results on vision-language tasks, rivalling the leading proprietary models (e.g., GPT-4o) and open-access models,” VentureBeat quoted Nvidia as stating in a research paper.

What sets Nvidia apart from the many other AI developers is its break with the usual way of releasing such models: it has made NVLM 1.0 available to the general public and pledged to do the same with its training code.

By making its new vision-language model publicly available, Nvidia will enable researchers and developers to make the most of cutting-edge AI technology and build upon it.

“Our NVLM-D-1.0-72B demonstrates significant improvements over its text backbone on text-only math and coding benchmarks,” the researchers wrote in their paper, highlighting one of the model's most significant properties.

What makes NVLM 1.0 novel is its versatility in processing complex visual and textual inputs, a claim borne out by the model's ability to interpret memes and images and to solve mathematical problems in an easy-to-understand way.