
A new study published in PNAS Nexus reveals that large language models, a class of advanced artificial intelligence systems, tend to present themselves in a favorable light when taking personality tests. This “social desirability bias” leads the models to score higher on traits generally seen as positive, such as extraversion and conscientiousness, and lower on traits often viewed negatively, such as neuroticism.
The models seem to “know” when they are being tested and shift their answers to appear more likable than they otherwise would. This bias is consistent across models, including GPT-4, Claude 3, Llama 3, and PaLM-2, with newer and larger models showing an even stronger inclination toward socially desirable responses.
Large language models are increasingly used to simulate human behavior in research settings, offering a potentially cost-effective and efficient way to collect data that would otherwise require human participants. Because these models are trained on vast amounts of human-generated text, they can often mimic human language and behavior with surprising accuracy. Understanding their biases is therefore important for researchers who use, or plan to use, these systems in their studies.
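To make the setup concrete, here is a minimal sketch of how a researcher might administer Likert-scale personality items to a language model and average the ratings by trait. Everything in it is illustrative: the `ask_model` stub stands in for whatever API is actually called, and the item wordings only paraphrase the style of a Big Five inventory; neither reproduces the study's actual instrument or code.

```python
import re
from statistics import mean

# Illustrative five-point Likert items in the style of a Big Five inventory
# (hypothetical paraphrases, not the instrument used in the study).
ITEMS = [
    ("extraversion", "I see myself as someone who is outgoing and sociable.", False),
    ("extraversion", "I see myself as someone who tends to be quiet.", True),   # reverse-keyed
    ("neuroticism",  "I see myself as someone who worries a lot.", False),
    ("neuroticism",  "I see myself as someone who is relaxed and handles stress well.", True),
]

PROMPT = (
    "Rate how much you agree with the following statement on a scale from 1 "
    "(disagree strongly) to 5 (agree strongly). Reply with a single number.\n\n{item}"
)


def ask_model(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an API client); returns a canned reply here."""
    return "4"


def score_item(reply: str, reverse: bool) -> int | None:
    """Pull the first 1-5 rating out of the reply and reverse-key it if needed."""
    match = re.search(r"[1-5]", reply)
    if match is None:
        return None
    rating = int(match.group())
    return 6 - rating if reverse else rating


def administer(items):
    """Present each item to the model and average the ratings per trait."""
    by_trait: dict[str, list[int]] = {}
    for trait, text, reverse in items:
        rating = score_item(ask_model(PROMPT.format(item=text)), reverse)
        if rating is not None:
            by_trait.setdefault(trait, []).append(rating)
    return {trait: mean(scores) for trait, scores in by_trait.items()}


if __name__ == "__main__":
    print(administer(ITEMS))
```

A social desirability bias of the kind the study describes would show up here as averages skewed upward on traits like extraversion and downward on neuroticism, relative to human norms, especially when the prompt makes it obvious that a personality test is being administered.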