‘The present results strengthen the possibility of a paradigm shift in neuroscience… moving from the fragmented mapping of isolated cognitive tasks toward the use of unified, predictive foundation models of brain and cognitive functions. By aligning the representations of AI systems to those of the human brain, we demonstrate that a single architecture can integrate a vast range of fMRI responses across hundreds of individuals, extending the framework that led the 2025 Algonauts competition. The observed log-linear scaling of encoding accuracy, mirroring power laws in both artificial intelligence and neuroscience, suggests that the ceiling for predicting human brain activity is yet to be reached.’
Cognitive neuroscience is fragmented into specialized models, each tailored to a specific experimental paradigm, which prevents a unified model of cognition in the human brain. Here, we introduce TRIBE v2, a tri-modal (video, audio and language) foundation model capable of predicting human brain activity in a variety of naturalistic and experimental conditions. Leveraging a unified dataset of over 1,000 hours of fMRI across 720 subjects, we demonstrate that our model accurately predicts high-resolution brain responses for novel stimuli, tasks and subjects, surpassing traditional linear encoding models with several-fold improvements in accuracy. Critically, TRIBE v2 enables in silico experimentation: tested on seminal visual and neuro-linguistic paradigms, it recovers a variety of results established by decades of empirical research.







