Apple researchers have created an AI model that can understand what’s happening on your phone’s screen, the latest addition to the company’s ever-growing lineup of AI models.
This multimodal large language model (MLLM), called Ferret-UI, can perform various tasks based on what is shown on your phone’s screen. Apple’s new model can, for example, identify the type of an icon, locate a specific piece of text, and give you precise instructions for accomplishing a particular task.
These capabilities are documented in a recently published paper, which details how this MLLM is designed to understand and interact with mobile user interface (UI) screens.
What we don’t know yet is whether this will be part of the rumored Siri 2.0, or whether it will remain an Apple AI research project that never makes it beyond a published paper.
How Ferret-UI works
We now use our phones for a variety of tasks, such as looking up information and making reservations. To do this, we look at the screen and tap the buttons that get us to our goal.
Apple believes that automating this process would make interacting with your phone even easier. The company also hopes that models such as Ferret-UI will be useful for accessibility, app testing, usability testing, and more.
For such a model to be useful, Apple needed to ensure that it could understand everything happening on the phone’s screen while still being able to focus on specific UI elements. The model also has to match instructions given in ordinary language with what it sees on the screen.
For example, Ferret-UI was shown a photo of AirPods in the Apple Store and asked how to purchase them. The model correctly responded that the user needed to tap the “Buy” button.
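To make that kind of interaction concrete, here is a minimal, purely illustrative Python sketch of grounded UI question answering: a screenshot plus a natural-language question goes in, and an answer tied to a region of the screen comes out. Ferret-UI’s API is not public, so answer_ui_question, GroundedAnswer, and the stubbed response below are hypothetical and only mimic the behavior described in the paper.

```python
# Illustrative sketch only: the model call below is a stub that mimics the
# kind of grounded answer the Ferret-UI paper describes; no real model is used.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GroundedAnswer:
    """An answer tied to regions of the screenshot (normalized x1, y1, x2, y2)."""
    text: str
    boxes: List[Tuple[float, float, float, float]]


def answer_ui_question(screenshot_path: str, question: str) -> GroundedAnswer:
    """Hypothetical wrapper around a UI-focused multimodal model.

    A real implementation would encode the screenshot, pass it to the model
    together with the question, and parse the grounded response.
    """
    # Stubbed response illustrating the AirPods example from the article.
    return GroundedAnswer(
        text="Tap the 'Buy' button to purchase the AirPods.",
        boxes=[(0.78, 0.12, 0.95, 0.17)],  # where the model located the button
    )


if __name__ == "__main__":
    result = answer_ui_question("airpods_listing.png", "How do I buy these AirPods?")
    print(result.text)
    for box in result.boxes:
        print("Referenced UI region:", box)
```

The key design point is that the answer is not just text: by returning screen coordinates alongside the response, an assistant could highlight or even tap the relevant UI element on the user’s behalf.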
Why is Ferret-UI important?
Most of us have a smartphone in our pocket, so it’s no surprise that companies are looking at ways to add tailored AI capabilities to these small devices.
Research scientists at Meta Reality Labs already expect that we will spend more than an hour each day talking directly to chatbots or running LLM processes in the background that power features such as recommendations.
Yann LeCun, chief AI scientist at Meta, even went so far as to say that in the future, AI assistants will mediate our entire digital diet.
So while Apple didn’t reveal exactly what its plans are for Ferret-UI, it’s not hard to imagine how such a model could be used to enhance Siri and make the iPhone experience more pleasant, perhaps even before the year is over.