{"id":223018,"date":"2025-10-07T23:12:52","date_gmt":"2025-10-08T04:12:52","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2025\/10\/how-one-ai-model-creates-a-physical-intuition-of-its-environment"},"modified":"2025-10-07T23:12:52","modified_gmt":"2025-10-08T04:12:52","slug":"how-one-ai-model-creates-a-physical-intuition-of-its-environment","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2025\/10\/how-one-ai-model-creates-a-physical-intuition-of-its-environment","title":{"rendered":"How One AI Model Creates a Physical Intuition of Its Environment"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/how-one-ai-model-creates-a-physical-intuition-of-its-environment.jpg\"><\/a><\/p>\n<p>Once this pretraining stage is complete, the next step is to tailor V-JEPA to accomplish specific tasks such as classifying images or identifying actions depicted in videos. This adaptation phase requires some human-labeled data. For example, videos have to be tagged with information about the actions contained in them. The adaptation for the final tasks requires much less labeled data than if the whole system had been trained end to end for specific downstream tasks. In addition, the same encoder and predictor networks can be adapted for different tasks.<\/p>\n<p><strong>Intuition Mimic<\/strong><\/p>\n<p>In February, the V-JEPA team <a href=\"https:\/\/arxiv.org\/html\/2502.11831v1\">reported<\/a> how well their systems understood the intuitive physical properties of the real world \u2014 properties such as object permanence, the constancy of shape and color, and the effects of gravity and collisions. On a test called <a href=\"https:\/\/arxiv.org\/abs\/1803.07616\">IntPhys<\/a>, which requires AI models to identify whether the actions happening in a video are physically plausible or implausible, V-JEPA was nearly 98% accurate. A well-known model that predicts in pixel space was only a little better than chance.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Once this pretraining stage is complete, the next step is to tailor V-JEPA to accomplish specific tasks such as classifying images or identifying actions depicted in videos. This adaptation phase requires some human-labeled data. For example, videos have to be tagged with information about the actions contained in them. The adaptation for the final tasks [\u2026]<\/p>\n","protected":false},"author":661,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,8],"tags":[],"class_list":["post-223018","post","type-post","status-publish","format-standard","hentry","category-robotics-ai","category-space"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/223018","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/661"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=223018"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/223018\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=223018"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=223018"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=223018"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}