{"id":234254,"date":"2026-03-29T22:17:31","date_gmt":"2026-03-30T03:17:31","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2026\/03\/14-jepa-milestones-as-a-map-of-ai-progress"},"modified":"2026-03-29T22:17:31","modified_gmt":"2026-03-30T03:17:31","slug":"14-jepa-milestones-as-a-map-of-ai-progress","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2026\/03\/14-jepa-milestones-as-a-map-of-ai-progress","title":{"rendered":"14 JEPA Milestones as a Map of AI Progress"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/14-jepa-milestones-as-a-map-of-ai-progress.jpg\"><\/a><\/p>\n<p>Thanks, Yann LeCun.<\/p>\n<p>\u2022 JEPA \/ H-JEPA: avoids predicting every single pixel (too expensive) and instead predicts in latent space. H-JEPA adds hierarchy, separating short-term details from long-term planning, i.e. how humans actually learn.<\/p>\n<p>\u2022 I-JEPA: built for very efficient vision models. It masks image patches and predicts their semantics, and in doing so bypasses the heavy compute of traditional autoencoders.<\/p>\n<p>\u2022 MC-JEPA &amp; V-JEPA: both are built for video. MC-JEPA separates content (what an object is) from motion (how it moves). V-JEPA masks video features and uses no text labels, making it well suited to action tracking at scale.<\/p>\n<p>\u2022 Audio-JEPA: filters out background noise by treating sounds like visuals.<\/p>\n<p>\u2022 Point-JEPA &amp; 3D-JEPA: used primarily in autonomous vehicles (AVs). They work on LiDAR point clouds &amp; volumetric grids.<\/p>\n<p>\u2022 ACT-JEPA: filters out real-world noise to learn manipulation tasks efficiently via imitation learning.<\/p>\n<div class=\"more-link-wrapper\"> <a class=\"more-link\" href=\"https:\/\/lifeboat.com\/blog\/2026\/03\/14-jepa-milestones-as-a-map-of-ai-progress\">Continue reading \u201c14 JEPA Milestones as a Map of AI Progress\u201d | &gt;<\/a><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Thanks, Yann LeCun. 
\u2022 JEPA \/ H-JEPA: avoids predicting every single pixel (too expensive) and instead predicts in latent space. H-JEPA adds hierarchy, separating short-term details from long-term planning, i.e. how humans actually learn. \u2022 I-JEPA: built for very efficient vision models. It masks image patches and predicts their semantics, and in doing so [\u2026]<\/p>\n","protected":false},"author":709,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2229,219,6,8],"tags":[],"class_list":["post-234254","post","type-post","status-publish","format-standard","hentry","category-mathematics","category-physics","category-robotics-ai","category-space"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/234254","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/709"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=234254"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/234254\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=234254"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=234254"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=234254"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}