Google made an array of AI announcements during its annual developer conference, I/O 2024, which took place yesterday.
CEO Sundar Pichai said that Google is now “fully in [its] Gemini era”, announcing upgrades to the language model, including a live feature that lets users hold “in-depth” voice chats with the AI model on their phones, as well as upgrades to its search engine with AI Overviews.
Gemini is now available for users to interact with on both iOS and Android. The tech giant also introduced Gemini 1.5 Flash, a new multimodal model optimised for “narrow, high-frequency, low-latency tasks”, and made improvements to Gemini 1.5 Pro, including doubling its context window from one million to two million tokens.
As part of the upgrade, users can ask Gemini questions and interrupt it mid-answer to ask for clarification, while the chatbot familiarises itself with the user by adapting to their speech patterns over time and by seeing and responding to physical surroundings captured via photos or videos on the device.
Pichai said: “Gemini is more than a chatbot; it’s designed to be your personal, helpful assistant that can help you tackle complex tasks and take actions on your behalf. Interacting with Gemini should feel conversational and intuitive.”
Gemini will also help users filter through photos by prompting the AI model to surface photos based on context. This “Ask Photos” feature is set to start rolling out in the summer.
Gemini will also be integrated into apps such as Gmail, allowing users to search, summarise, and draft emails, and it will be able to interact with other apps such as YouTube so users can ask for specific information.
During the conference, Google also demonstrated Project Astra, billed as a virtual assistant that can watch and understand what is happening through a device’s camera, remember where things are, and carry out tasks for the user accordingly.
Creators will also benefit from new generative AI (gen AI) tools from Google, such as VideoFX, which can create 1080p videos from text prompts; an improved version of ImageFX, which reduces unwanted digital artefacts in pictures; and a DJ Mode feature in MusicFX, which enables musicians to generate song loops and samples from prompts.
The news follows a litany of recent moves on the gen AI front, with OpenAI announcing its new flagship model, GPT-4o, earlier this week, which includes features such as the ability to identify emotions from facial expressions, recall previous prompts, and discuss the content of images.
Apple is also reportedly “finalising terms” with OpenAI to integrate ChatGPT’s technology into the new iOS 18 upgrade, according to a report by Bloomberg, as the company gears up for new AI announcements at its upcoming Worldwide Developers Conference in June.