Giving AI a classic psychological test reveals an inherent weakness in the decision-making abilities of large language models (LLMs). Suketu Patel and colleagues explored how transformer-based machine attention differs from human attention by testing AI models on the “Stroop task,” in which words for colors are printed in colored ink that may or may not match the word, and participants are asked to name the ink color of each word while ignoring its meaning.
The findings are published in the journal PNAS Nexus.
The task is used clinically to assess executive control, especially a person’s ability to inhibit an automatic response. Although humans generally take longer to answer correctly when words and colors are mismatched than when they match, their performance remains stable and highly accurate even on long word lists.
