Stanford researchers have developed an innovative computer vision model that recognizes the real-world functions of objects, potentially allowing autonomous robots to select and use tools more effectively.
In the field of AI known as computer vision, researchers have successfully trained models that can identify objects in two-dimensional images, a skill critical to a future in which robots navigate the world autonomously. But object recognition is only a first step. An AI must also understand the function of an object's parts: to know a spout from a handle, or the blade of a bread knife from that of a butter knife.
Computer vision experts call such mappings between functionally analogous parts "functional correspondence." It is one of the most difficult challenges in computer vision. Now, in a paper to be presented at the International Conference on Computer Vision (ICCV 2025), Stanford scholars will debut a new AI model that can not only recognize the various parts of an object and discern their real-world purposes, but also map those parts between objects at pixel-level granularity.
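The article does not detail the Stanford model's architecture, but the core idea of dense correspondence can be sketched with a common baseline: embed every pixel of two images with a shared vision backbone, then match each source pixel to its most similar target pixel. The sketch below is an illustrative assumption, not the paper's method; the ResNet trunk and the function names (`dense_features`, `correspondence_map`) are hypothetical choices for demonstration.

```python
# A minimal sketch of pixel-level correspondence via feature matching.
# Assumption: this is a generic baseline, NOT the Stanford model itself.

import torch
import torch.nn.functional as F
import torchvision.models as models


def dense_features(trunk, image):
    """Return an (H', W', C) grid of unit-length per-pixel embeddings."""
    feats = trunk(image)                        # (1, C, H', W') feature map
    feats = F.normalize(feats, dim=1)           # normalize channel vectors
    return feats.squeeze(0).permute(1, 2, 0)    # (H', W', C)


def correspondence_map(src_feats, tgt_feats):
    """For each source pixel, find the best-matching target pixel."""
    hs, ws, c = src_feats.shape
    ht, wt, _ = tgt_feats.shape
    src = src_feats.reshape(-1, c)              # (Hs*Ws, C)
    tgt = tgt_feats.reshape(-1, c)              # (Ht*Wt, C)
    sim = src @ tgt.T                           # cosine similarity matrix
    best = sim.argmax(dim=1)                    # index of best target pixel
    ys = torch.div(best, wt, rounding_mode="floor")
    xs = best % wt
    return torch.stack([ys, xs], dim=1).reshape(hs, ws, 2)


if __name__ == "__main__":
    # Use a pretrained ResNet trunk (everything before global pooling)
    # as a stand-in dense feature encoder.
    resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    trunk = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()

    with torch.no_grad():
        img_a = torch.rand(1, 3, 224, 224)      # stand-ins for two tool photos
        img_b = torch.rand(1, 3, 224, 224)
        fa = dense_features(trunk, img_a)
        fb = dense_features(trunk, img_b)
        cmap = correspondence_map(fa, fb)       # (7, 7, 2) pixel-to-pixel map
    print(cmap.shape)
```

In this framing, a *functional* correspondence model would differ from the baseline above chiefly in what the embeddings encode: pixels on a teapot's spout should land near pixels on a watering can's nozzle because both pour, even though the two parts look quite different.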