Can machines ever see the world as we see it? Researchers have uncovered compelling evidence that vision transformers (ViTs), a type of deep-learning model that specializes in image analysis, can spontaneously develop human-like visual attention patterns when trained without labeled instructions.
Visual attention is the mechanism by which organisms, or artificial intelligence (AI), filter out “visual noise” to focus on the most relevant parts of an image or view. While this comes naturally to humans, learning it spontaneously has proven difficult for AI.
In a recent publication in Neural Networks, however, the researchers report that, with the right training experience, AI can acquire human-like visual attention without being explicitly taught to do so.