Large language models surpass human experts in predicting neuroscience results, according to a study published in Nature Human Behaviour.
Scientific research is increasingly hard to keep pace with: the published literature has grown so large that integrating its noisy, voluminous findings to predict new outcomes often exceeds human capacity. The investigation was motivated by artificial intelligence's growing role in tasks such as protein folding and drug discovery, which raised the question of whether LLMs could similarly enhance fields like neuroscience.
Xiaoliang Luo and colleagues developed BrainBench, a benchmark designed to test whether LLMs could predict the results of neuroscience studies more accurately than human experts. BrainBench included 200 test cases based on neuroscience research abstracts. Each test case consisted of two versions of the same abstract: one was the original, and the other had a modified result that changed the study’s conclusion but kept the rest of the abstract coherent. Participants—both LLMs and human experts—were tasked with identifying which version was correct.
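The benchmark task described above is a two-alternative forced choice: given an original abstract and an altered one, pick the version with the real result. For an LLM, one natural way to make that choice is to score each version by the likelihood the model assigns to its text and pick the higher-scoring one. The sketch below illustrates that decision rule; `score_passage` and the toy token-score table are stand-ins for illustration, not the study's actual scoring code.

```python
# Sketch of a BrainBench-style two-alternative forced choice trial.
# Assumption: the model exposes a log-likelihood-style score for a
# passage; score_passage and toy_model below are hypothetical
# stand-ins, not the evaluation code used in the study.

def score_passage(text, model):
    # Placeholder scorer: sum per-token scores from a lookup table.
    # A real evaluation would sum an LLM's token log-probabilities.
    return sum(model.get(tok, -10.0) for tok in text.split())

def choose_version(original, altered, model):
    """Return 'original' if the model scores it at least as high
    (i.e., finds it at least as plausible), else 'altered'."""
    s_orig = score_passage(original, model)
    s_alt = score_passage(altered, model)
    return "original" if s_orig >= s_alt else "altered"

# Toy "model": token -> log-probability-like score. The altered
# result word ("decreased") is treated as less plausible.
toy_model = {"amygdala": -1.0, "activity": -1.0,
             "increased": -1.0, "decreased": -6.0}

orig = "amygdala activity increased"
alt = "amygdala activity decreased"
print(choose_version(orig, alt, toy_model))  # -> original
```

A model that has internalized patterns from the literature should, on average, assign higher likelihood to the genuine result, which is what BrainBench measures across its 200 test cases.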