
OpenAI, the leading artificial intelligence (AI) firm, unveiled its generative large language model (LLM) GPT-4o at its Spring Update event in May.
GPT-4o is an 'omni' model with capabilities such as speech-to-speech assistance and voice guidance. Because it is natively multimodal, it can understand speech directly without first converting it into text.
OpenAI is now set to begin a limited 'alpha' roll-out of its Advanced Voice feature.
What is the GPT-4o Advanced Voice feature?
GPT-4o Advanced Voice is comparable to Moshi, the voice model recently unveiled by the French AI lab Kyutai. It can create custom character voices, generate sound effects while telling a story, and even act as a live translator.
This feature makes GPT-4o both faster and more accurate as a voice assistant, allowing it to pick up on tone and vocal nuances during a conversation. It also acts as a patient language teacher, with the ability to correct pronunciation and help improve accents.
Release date
The feature will roll out to an initial group of users in the coming weeks, according to OpenAI CEO Sam Altman. While he acknowledged that users have been waiting for access for a long time, he said safety testing is necessary. The feature will therefore reach trusted users first, with wider availability expected in the autumn.