Still, ChatGPT operates in a mostly siloed fashion. It can’t yet venture out “into the wild” to execute online tasks. For example, if you wanted to buy a milk frother on Amazon for under $100, ChatGPT might be able to recommend a product or two, and even provide links, but it can’t actually navigate Amazon and make the purchase.
Why? Besides obvious concerns, like letting a flawed AI model go on a shopping spree with your credit card, one challenge lies in training AI to successfully navigate graphical user interfaces (GUIs), like the ones on your laptop or smartphone screen.
But even the current version of GPT-4 seems to grasp the basic steps of online shopping. That’s the takeaway of a recent preprint paper in which AI researchers described how they built a GPT-4-based agent that could “buy” products on Amazon. The agent, dubbed MM-Navigator, did not actually purchase products, but it was able to analyze screenshots of an iOS smartphone screen and specify the appropriate action, including where on the screen it should tap, with impressive accuracy.