{"id":174083,"date":"2023-10-13T03:23:13","date_gmt":"2023-10-13T08:23:13","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2023\/10\/multimodality-and-large-multimodal-models-lmms"},"modified":"2023-10-13T03:23:13","modified_gmt":"2023-10-13T08:23:13","slug":"multimodality-and-large-multimodal-models-lmms","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2023\/10\/multimodality-and-large-multimodal-models-lmms","title":{"rendered":"Multimodality and Large Multimodal Models (LMMs)"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/multimodality-and-large-multimodal-models-lmms3.jpg\"><\/a><\/p>\n<p>For a long time, each ML model operated in one data mode \u2013 text (translation, language modeling), image (object detection, image classification), or audio (speech recognition).<\/p>\n<p>However, natural intelligence is not limited to just a single modality. Humans can read and write text. We can see images and watch videos. We listen to music to relax and watch out for strange noises to detect danger. Being able to work with multimodal data is essential for us or any AI to operate in the real world.<\/p>\n<p>OpenAI noted in their <a href=\"https:\/\/cdn.openai.com\/papers\/GPTV_System_Card.pdf\">GPT-4V system card<\/a> that \u201c<em>incorporating additional modalities (such as image inputs) into LLMs is viewed by some as a key frontier in AI research and development<\/em>.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For a long time, each ML model operated in one data mode \u2013 text (translation, language modeling), image (object detection, image classification), or audio (speech recognition). However, natural intelligence is not limited to just a single modality. Humans can read and write text. We can see images and watch videos. We listen to music to [\u2026]<\/p>\n","protected":false},"author":556,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[42,6],"tags":[],"class_list":["post-174083","post","type-post","status-publish","format-standard","hentry","category-media-arts","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/174083","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/556"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=174083"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/174083\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=174083"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=174083"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=174083"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}