- OpenAI revealed a GPT-4o update for ChatGPT that can reason across audio, vision, and text.
- The enhanced chatbot has a human-like ability to mimic speech, adding humor and varying its voice inflection.
OpenAI CEO Sam Altman said the new ChatGPT update “feels like magic” – and he wasn’t wrong.
The AI company essentially planted a flag in the sand emblazoned with two words aimed at its Big Tech rivals: your move.
Mira Murati, CTO of OpenAI, presented the “Spring Update” for ChatGPT on Monday with a series of live demos. The latest version of the AI chatbot, powered by OpenAI’s new flagship model, GPT-4o, can reason across audio, vision, and text in real time.
And it’s surprisingly human.
We’re getting closer to the movie “Her”
For starters, ChatGPT’s voice and conversational capabilities have taken a huge step forward thanks to GPT-4o, appearing capable of expressing emotion and varying its tone.
The new AI has what sounds like an American woman’s voice in the demo – think Scarlett Johansson in the Spike Jonze film “Her” – although OpenAI researchers switched it to a robot voice at one point. An OpenAI spokesperson said audio output will be limited to a selection of preset voices at launch.
The voice didn’t just sound human. It also showed an uncanny ability to mimic human diction. The new ChatGPT laughs, adds humor, and modulates its voice inflection based on prompts.
It also appears able to pick up on human cues. When a researcher hyperventilated while practicing deep breathing, the chatbot told him: “Mark, you’re not a vacuum cleaner.”
You can also interrupt the chatbot, making conversations feel more natural. You don’t need to wait for the AI to finish its response before asking a clarifying question or changing the subject.
The response time was also lightning fast. An OpenAI spokesperson said the chatbot can respond to audio inputs about as quickly as a human, averaging 320 milliseconds.
Following the event, OpenAI CEO Sam Altman posted on X, formerly Twitter, the title of the film that was on many people’s minds after seeing the demos.
her
– Sam Altman (@sama) May 13, 2024
ChatGPT’s eyes have also been improved
The chatbot demonstrated an impressive ability to interpret graphs, assist with coding, read emotions, and even teach users math equations by viewing videos or images shown through a phone’s camera.
All the while, the voice assistant maintained a light and cheerful tone.
In a separate demo shared online, GPT-4o was even able to analyze video of the space around a user – noting that the person was wearing an OpenAI hoodie and was surrounded by recording equipment – to guess that they might be putting together an OpenAI announcement.
Say hello to GPT-4o, our new flagship model that can reason in real-time about audio, vision and text: https://t.co/MYHZB79UqN
Text and image input will roll out to the API today and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx
– OpenAI (@OpenAI) May 13, 2024
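The tweet notes that text and image input were rolling out to the API that day. For readers curious what that looks like in practice, here is a minimal sketch using OpenAI’s Python client; the prompt text and image URL are hypothetical placeholders, not taken from the demos:

```python
# Minimal sketch: sending text plus an image to GPT-4o via OpenAI's Python client.
# Assumes the `openai` package (v1+) is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What equation is shown in this image?"},
                # Hypothetical placeholder URL, for illustration only.
                {"type": "image_url", "image_url": {"url": "https://example.com/equation.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```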
Even though the chatbot had a few hiccups – such as misinterpreting an image prompt or starting to answer before a question was finished – these moments almost made it seem more human.
Everything feels more human – and ahead of what we’ve seen from its rivals
In one case, ChatGPT started responding to a prompt before the researcher showed the equation on camera, and the researcher stopped the chatbot in its tracks.
“Oops, I was so excited,” the chatbot responded. “I’m ready when you are.”
It also gave responses that seemed to mimic feelings of appreciation. When the researcher showed the chatbot an image of handwriting that said “I like ChatGPT,” it responded “aw” and said “that’s so nice of you.”
In another case, ChatGPT said the researcher was making it blush after he talked about how useful and amazing ChatGPT was.
OpenAI made the announcements the day before Google’s big developer conference, Google I/O, which is expected to showcase the company’s progress on its various AI products, such as Gemini.
But the timing of the OpenAI event – and its impressive demos – will leave AI watchers curious to see whether ChatGPT beats Google’s Gemini, or whether Google has something up its sleeve.
But for now, OpenAI’s Spring Update once again demonstrates how impressive ChatGPT can be, especially compared to the existing voice-assistant space.
Amazon’s Alexa, Apple’s Siri, and Google’s Assistant are all on notice. Those voice assistants are known for giving robotic, canned answers to questions – far from truly conversational. The new GPT-4o-powered ChatGPT blows them out of the water with its human-like responses.
Apple, for its part, seems aware of the gap between even older versions of ChatGPT and Siri: a recent report indicated the company decided to overhaul the iPhone’s voice assistant after Apple executives spent weeks playing with ChatGPT and realized how far behind Siri was. There are also rumors that the two companies have been in talks, and Apple may end up licensing OpenAI’s model for some as-yet-unannounced iPhone features.
Apple fans shouldn’t have to wait long for more information: the company is expected to unveil its AI updates at its annual Worldwide Developers Conference on June 10.
Meanwhile, Amazon has been weighing a paid version of Alexa powered by generative AI – reportedly dubbed “Alexa Plus” – as Business Insider’s Eugene Kim first reported. That assistant is supposed to offer more conversational and personalized responses, but it’s unclear when it will launch.
But, just as it did with the first version of ChatGPT, OpenAI has once again shown how impressive its technology can be – and is leaving the rest of the tech industry to prove it can catch up.