{"id":185395,"date":"2024-03-19T06:26:30","date_gmt":"2024-03-19T11:26:30","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2024\/03\/natural-language-instructions-induce-compositional-generalization-in-networks-of-neurons"},"modified":"2024-03-19T06:26:30","modified_gmt":"2024-03-19T11:26:30","slug":"natural-language-instructions-induce-compositional-generalization-in-networks-of-neurons","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2024\/03\/natural-language-instructions-induce-compositional-generalization-in-networks-of-neurons","title":{"rendered":"Natural language instructions induce compositional generalization in networks of neurons"},"content":{"rendered":"<p>In this study, we use the latest advances in natural language processing to build tractable models of the ability to interpret instructions to guide actions in novel settings and the ability to produce a description of a task once it has been learned. RNNs can learn to perform a set of psychophysical tasks simultaneously using a pretrained language transformer to embed a natural language instruction for the current task. Our best-performing models can leverage these embeddings to perform a brand-new task with an average performance of 83% correct. Instructed models that generalize do so by leveraging the shared compositional structure of instruction embeddings and task representations, such that inferring the relation between practiced and novel instructions leads to a good inference about the sensorimotor transformation required for the unseen task. 
Finally, we show a network can invert this information and provide a linguistic description for a task based only on the sensorimotor contingency it observes.<\/p>\n<p>Our models make several predictions for what neural representations to expect in brain areas that integrate linguistic information in order to exert control over sensorimotor areas. Firstly, the CCGP analysis of our model hierarchy suggests that when humans must generalize across (or switch between) a set of related tasks based on instructions, the neural geometry observed among sensorimotor mappings should also be present in semantic representations of instructions. This prediction is well grounded in the existing experimental literature, where multiple studies have observed that the type of abstract structure we find in our sensorimotor-RNNs also exists in sensorimotor areas of biological brains<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Bernardi, S. et al. The geometry of abstraction in the hippocampus and prefrontal cortex. Cell 183, 954&ndash;967 (2020).\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#ref-CR3\" id=\"ref-link-section-d114193094e1615\">3<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Panichello, M. F. & Buschman, T. J. Shared mechanisms underlie the control of working memory and attention. Nature 592, 601&ndash;605 (2021).\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#ref-CR36\" id=\"ref-link-section-d114193094e1618\">36<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Nieh, E. H. et al. Geometry of abstract learned knowledge in the hippocampus. 
Nature 595, 80&ndash;84 (2021).\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#ref-CR37\" id=\"ref-link-section-d114193094e1621\">37<\/a><\/sup>. Our models theorize that the emergence of an equivalent task-related structure in language areas is essential to instructed action in humans. One intriguing candidate for an area that may support such representations is the language-selective subregion of the left inferior frontal gyrus. This area is sensitive to both lexico-semantic and syntactic aspects of sentence comprehension, is implicated in tasks that require semantic control, and lies anatomically adjacent to another functional subregion of the left inferior frontal gyrus, which is implicated in flexible cognition<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Fedorenko, E. & Blank, I. A. Broca\u2019s area is not a natural kind. Trends Cogn. Sci. 24, 270&ndash;284 (2020).\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#ref-CR38\" id=\"ref-link-section-d114193094e1625\">38<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Fedorenko, E., Duncan, J. & Kanwisher, N. Language-selective and domain-general regions lie side by side within Broca\u2019s area. Curr. Biol. 22, 2059&ndash;2062 (2012).\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#ref-CR39\" id=\"ref-link-section-d114193094e1625_1\">39<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Gao, Z. et al. Distinct and common neural coding of semantic and non-semantic control demands. 
NeuroImage 236, 118230 (2021).\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#ref-CR40\" id=\"ref-link-section-d114193094e1625_2\">40<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\" title=\"Duncan, J. The multiple-demand (MD) system of the primate brain: mental programs for intelligent behaviour. Trends Cogn. Sci. 14, 172&ndash;179 (2010).\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#ref-CR41\" id=\"ref-link-section-d114193094e1628\">41<\/a><\/sup>. We also predict that individual units involved in implementing sensorimotor mappings should modulate their tuning properties on a trial-by-trial basis according to the semantics of the input instructions, and that failure to modulate tuning in the expected way should lead to poor generalization. This prediction may be especially useful for interpreting multiunit recordings in humans. Finally, given that grounding linguistic knowledge in the sensorimotor demands of the task set improved performance across models (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#Fig2\">2e<\/a>), we predict that during learning the highest level of the language-processing hierarchy should likewise be shaped by the embodied processes that accompany linguistic inputs, for example, motor planning or affordance evaluation<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Buccino, G., Colag\u00e9, I., Gobbi, N. & Bonaccorso, G. Grounding meaning in experience: a broad perspective on embodied language. Neurosci. Biobehav. Rev. 
69, 69&ndash;78 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#ref-CR42\" id=\"ref-link-section-d114193094e1635\">42<\/a><\/sup>.<\/p>\n<p>One notable negative result of our study is the relatively poor generalization performance of GPTNET (XL), which used at least an order of magnitude more parameters than the other models. This is particularly striking given that activity in these models is predictive of many behavioral and neural signatures of human language processing<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 10\" title=\"Schrimpf, M. et al. The neural architecture of language: integrative modeling converges on predictive processing. Proc. Natl Acad. Sci. USA https:\/\/doi.org\/10.1073\/pnas.2105646118 (2021).\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#ref-CR10\" id=\"ref-link-section-d114193094e1642\">10<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 11\" title=\"Goldstein, A. et al. Shared computational principles for language processing in humans and deep language models. Nat. Neurosci. 25, 369&ndash;380 (2022).\" href=\"https:\/\/www.nature.com\/articles\/s41593-024-01607-5#ref-CR11\" id=\"ref-link-section-d114193094e1645\">11<\/a><\/sup>. 
Given this, future imaging studies may be guided by the representations in both autoregressive models and our best-performing models to delineate a full gradient of brain areas involved in each stage of instruction following, from low-level next-word prediction to higher-level structured-sentence representations to the sensorimotor control that language informs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this study, we use the latest advances in natural language processing to build tractable models of the ability to interpret instructions to guide actions in novel settings and the ability to produce a description of a task once it has been learned. RNNs can learn to perform a set of psychophysical tasks simultaneously using [\u2026]<\/p>\n","protected":false},"author":661,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,6],"tags":[],"class_list":["post-185395","post","type-post","status-publish","format-standard","hentry","category-biological","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/185395","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/661"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=185395"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/185395\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=185395"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=185395"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.c
om\/blog\/wp-json\/wp\/v2\/tags?post=185395"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}