
The Google Research team has recently revealed that the Gemini-powered, AI-based Ask Photos feature will soon be integrated into Google Photos, and it has also shared how the feature will work.
The team further said that Ask Photos is a strong example of how Gemini models can act as agents through memory capabilities.
Sample queries Google has provided outside the on-stage announcement include 'Show me the best photo from each national park I’ve visited' and 'What themes have we had for Lena’s birthday parties?'. According to the team, a conversational query is "passed to an agent model that uses Gemini to determine the best retrieval augmented generation (RAG) tool for the task."
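To give a rough sense of how a routing step like that could work, here is a minimal Python sketch; the tool names, prompt wording and the generate() stub are illustrative assumptions rather than anything Google has published.

```python
# Illustrative sketch only: routing a conversational query to a retrieval tool.
# Tool names, prompt text and the generate() stub are hypothetical, not taken
# from Google's announcement.

RETRIEVAL_TOOLS = {
    "metadata_search": "Look up photos by date, location, album or label.",
    "vector_search": "Find photos whose content matches a natural-language description.",
}

def generate(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g. a Gemini API request).
    return "vector_search"

def pick_tool(query: str) -> str:
    """Ask the model which retrieval tool best fits the query."""
    prompt = (
        "Choose the single best tool for this photo query.\n"
        + "\n".join(f"- {name}: {desc}" for name, desc in RETRIEVAL_TOOLS.items())
        + f"\nQuery: {query}\nAnswer with the tool name only."
    )
    return generate(prompt).strip()

print(pick_tool("Show me the best photo from each national park I've visited"))
```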
Read more: Google I/O 2024 — AI takes centre stage with Gemini, Project Astra, and more
The agent model typically starts by understanding the user's main intent and composes a search over the photo library using an enhanced vector-based retrieval system, which expands on the strong metadata search already built into Google Photos. This system is far better at understanding natural-language concepts than keyword-based search.
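As a loose illustration of how a vector-based retrieval step could sit on top of a metadata filter, here is a small Python sketch; the Photo structure, the embed() stub and the search() helper are assumptions made for the example, not Google's implementation.

```python
# Illustrative sketch only: semantic (vector) search combined with the kind of
# metadata filtering Google Photos already supports. All types and stubs here
# are hypothetical.

from dataclasses import dataclass
import math

@dataclass
class Photo:
    path: str
    location: str
    embedding: list[float]  # produced by an image/text embedding model

def embed(text: str) -> list[float]:
    # Placeholder for a real embedding model; returns a fixed-size vector.
    return [float(len(text) % 5), float(text.count("a")), float(text.count("e")), 1.0]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query: str, photos: list[Photo], location: str | None = None, k: int = 5) -> list[Photo]:
    """Rank photos by semantic similarity, optionally filtered by metadata first."""
    candidates = [p for p in photos if location is None or p.location == location]
    q = embed(query)
    return sorted(candidates, key=lambda p: cosine(q, p.embedding), reverse=True)[:k]

photos = [
    Photo("yosemite/falls.jpg", "Yosemite", embed("waterfall in a national park")),
    Photo("home/cake.jpg", "Home", embed("birthday cake with candles")),
]
print([p.path for p in search("best photo from each national park", photos)])
```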
An answer model then considers the retrieved photos and videos, leveraging Gemini's long context window and multimodal abilities to surface the most relevant information, whether that is the visual content itself or associated text, dates, locations and other metadata.
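A very rough sketch of assembling that long, mixed context is shown below; the item fields and the answer() stub are illustrative assumptions, and a real pipeline would pass the images and videos themselves to the model rather than text placeholders.

```python
# Illustrative sketch only: packing retrieved items and their metadata into one
# long prompt for an answer model. Field names and the answer() stub are
# assumptions, not Google's actual pipeline.

def build_context(items: list[dict]) -> str:
    """Interleave each retrieved photo/video with its caption, date and location."""
    lines = []
    for item in items:
        lines.append(
            f"[{item['kind']}: {item['path']}] "
            f"date={item['date']} location={item['location']} caption={item['caption']}"
        )
    return "\n".join(lines)

def answer(query: str, items: list[dict]) -> str:
    context = build_context(items)
    # Placeholder for a long-context, multimodal model call.
    return f"(model response to '{query}' grounded in {len(items)} items)\n{context}"

print(answer(
    "What themes have we had for Lena's birthday parties?",
    [{"kind": "photo", "path": "2023/party.jpg", "date": "2023-06-10",
      "location": "Home", "caption": "dinosaur-themed birthday cake"}],
))
```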
Finally, the answer model crafts a helpful response grounded in the retrieved photos and videos. The soon-to-be-introduced Ask Photos will also remember information for future conversations, which makes it far more than a search feature and promises a genuinely conversational, user-friendly experience.
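As a final, purely illustrative sketch, assuming a simple keyword-overlap store rather than anything Google has described, a conversation memory of that kind might look like this:

```python
# Illustrative sketch only: a tiny conversation memory so facts learned in one
# exchange can inform later queries. The store layout is hypothetical.

class ConversationMemory:
    def __init__(self) -> None:
        self._facts: list[str] = []

    def remember(self, fact: str) -> None:
        """Persist a fact surfaced during a conversation."""
        self._facts.append(fact)

    def recall(self, query: str) -> list[str]:
        """Return previously stored facts that share words with the new query."""
        words = {w.lower().strip("?.,'") for w in query.split()}
        return [f for f in self._facts if words & {w.lower() for w in f.split()}]

memory = ConversationMemory()
memory.remember("Lena's birthday parties have had dinosaur and space themes")
print(memory.recall("What theme should we pick for Lena's next party?"))
```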