Scientists studying Large Language Models (LLMs) have found that LLMs behave similarly to humans on cognitive tasks, often making judgments and decisions that deviate from rational norms, for example by displaying risk aversion and loss aversion. LLMs also exhibit human-like biases and errors, particularly in probability judgments and arithmetic tasks. These similarities suggest that LLMs could serve as models of human cognition. However, significant challenges remain, including the vast amount of data LLMs are trained on and the unclear origins of these behavioral similarities.
The suitability of LLMs as models of human cognition is debated for several reasons. LLMs are trained on far larger datasets than humans ever encounter, may have been exposed to the test questions during training, and may show artificially enhanced human-like behaviors as a result of value-alignment processes. Despite these challenges, fine-tuning LLMs such as LLaMA-1-65B on datasets of human choices has improved their accuracy in predicting human behavior. Prior research has also highlighted the value of synthetic datasets for enhancing LLM capabilities, particularly in problem-solving tasks such as arithmetic. Pretraining on such datasets can significantly improve performance in predicting human decisions.
Researchers from Princeton University and the University of Warwick propose making LLMs more useful as cognitive models by (i) using computationally equivalent tasks, which both an LLM and a rational agent must master to solve a cognitive problem, and (ii) examining which task distributions an LLM must be trained on to exhibit human-like behavior. Applying this approach to decision-making, specifically risky and intertemporal choice, they show that Arithmetic-GPT, an LLM pretrained on an ecologically valid arithmetic dataset, predicts human behavior better than many traditional cognitive models. According to the researchers, this pretraining alone suffices to align the model closely with human decision-making.
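In risky choice, a rational agent compares gambles by their expected values; in intertemporal choice, it discounts delayed rewards to their present value. Both reduce to arithmetic, which is one way to read the idea of a computationally equivalent task. The Python sketch below illustrates this under stated assumptions: exponential discounting with a per-period rate is one common normative model chosen here for illustration (the article does not say which form the researchers used), and the names `Gamble`, `expected_value`, `present_value`, and `ev_expression`, as well as the string format of the arithmetic example, are hypothetical and not taken from Arithmetic-GPT's actual training data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gamble:
    """A gamble as parallel tuples of outcomes and their probabilities (hypothetical type)."""
    outcomes: tuple
    probabilities: tuple


def expected_value(g: Gamble) -> float:
    """Normative value of a risky option: the probability-weighted sum of outcomes."""
    if abs(sum(g.probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for p, x in zip(g.probabilities, g.outcomes))


def present_value(amount: float, delay: float, rate: float) -> float:
    """Assumed exponential discounting: amount / (1 + rate) ** delay."""
    return amount / (1.0 + rate) ** delay


def rational_risky_choice(a: Gamble, b: Gamble) -> str:
    """Pick the gamble with the higher expected value (ties go to A)."""
    return "A" if expected_value(a) >= expected_value(b) else "B"


def rational_intertemporal_choice(
    sooner: float, sooner_delay: float, later: float, later_delay: float, rate: float
) -> str:
    """Pick the option with the higher discounted value (ties go to 'sooner')."""
    pv_sooner = present_value(sooner, sooner_delay, rate)
    pv_later = present_value(later, later_delay, rate)
    return "sooner" if pv_sooner >= pv_later else "later"


def ev_expression(g: Gamble) -> str:
    """Render the expected-value computation as a plain arithmetic string.

    Hypothetical format, meant only to suggest what an arithmetic training
    example could look like; it is not the researchers' dataset format.
    """
    terms = "+".join(f"{p:g}*{x:g}" for p, x in zip(g.probabilities, g.outcomes))
    return f"{terms}={expected_value(g):g}"


if __name__ == "__main__":
    risky = Gamble(outcomes=(100, 0), probabilities=(0.5, 0.5))
    safe = Gamble(outcomes=(45,), probabilities=(1.0,))
    print(rational_risky_choice(risky, safe))  # "A": EV 50 beats a sure 45
    print(ev_expression(risky))                # "0.5*100+0.5*0=50"
    print(rational_intertemporal_choice(100, 0, 110, 1, rate=0.05))  # "later"
```

A rational agent here picks the risky gamble, whereas many people prefer the sure 45, which is the kind of risk-averse deviation from rational norms that the article says both humans and LLMs display.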