{"id":138091,"date":"2022-04-14T02:02:28","date_gmt":"2022-04-14T07:02:28","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2022\/04\/google-builds-language-models-with-socratic-dialogue-to-improve-zero-shot-multimodal-reasoning-capabilities"},"modified":"2022-04-14T02:02:28","modified_gmt":"2022-04-14T07:02:28","slug":"google-builds-language-models-with-socratic-dialogue-to-improve-zero-shot-multimodal-reasoning-capabilities","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2022\/04\/google-builds-language-models-with-socratic-dialogue-to-improve-zero-shot-multimodal-reasoning-capabilities","title":{"rendered":"Google Builds Language Models with Socratic Dialogue to Improve Zero-Shot Multimodal Reasoning Capabilities"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/google-builds-language-models-with-socratic-dialogue-to-improve-zero-shot-multimodal-reasoning-capabilities2.jpg\"><\/a><\/p>\n<p>Large-scale language-based foundation models such as BERT, GPT-3 and CLIP have exhibited impressive capabilities ranging from zero-shot image classification to high-level planning. In most cases, these large language models, visual-language models and audio-language models remain domain-specific and depend heavily on the distribution of their training data. The models thus acquire different but complementary common-sense knowledge within specific domains. 
But what if such models could effectively communicate with one another?<\/p>\n<p>In the new paper <em>Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language<\/em>, Google researchers argue that the diversity of foundation models is symbiotic, and that it is possible to build a framework that uses structured Socratic dialogue between pre-existing foundation models to formulate new multimodal tasks as a guided exchange between the models, without additional finetuning.<\/p>\n<p>This work aims to build general language-based systems that embrace the diversity of pre-existing foundation models by leveraging structured Socratic dialogue, and offers insights into the applicability of the proposed Socratic Models to challenging perceptual tasks.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Large-scale language-based foundation models such as BERT, GPT-3 and CLIP have exhibited impressive capabilities ranging from zero-shot image classification to high-level planning. In most cases, these large language models, visual-language models and audio-language models remain domain-specific and depend heavily on the distribution of their training data. 
The models thus acquire different but complementary common-sense knowledge [\u2026]<\/p>\n","protected":false},"author":556,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15],"tags":[],"class_list":["post-138091","post","type-post","status-publish","format-standard","hentry","category-habitats"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/138091","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/556"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=138091"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/138091\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=138091"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=138091"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=138091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}