{"id":173584,"date":"2023-10-05T22:26:01","date_gmt":"2023-10-06T03:26:01","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2023\/10\/a-new-ai-lie-detector-can-reveal-its-inner-thoughts"},"modified":"2023-10-05T22:26:01","modified_gmt":"2023-10-06T03:26:01","slug":"a-new-ai-lie-detector-can-reveal-its-inner-thoughts","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2023\/10\/a-new-ai-lie-detector-can-reveal-its-inner-thoughts","title":{"rendered":"A new AI lie detector can reveal its \u201cinner thoughts\u201d"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/a-new-ai-lie-detector-can-reveal-its-inner-thoughts2.jpg\"><\/a><\/p>\n<p>\u201cWish I had this to cite,\u201d lamented Jacob Andreas, a professor at MIT, who had just published a paper exploring the extent to which language models mirror the internal motivations of human communicators.<\/p>\n<p>Jan Leike, the head of alignment at OpenAI, who is chiefly responsible for guiding new models like GPT-4 to help, rather than harm, human progress, responded to the paper by offering Burns a job, which Burns initially declined, before a personal appeal from Sam Altman, the cofounder and CEO of OpenAI, changed his mind.<\/p>\n<p>\u201cCollin\u2019s work on \u2018Discovering Latent Knowledge in Language Models Without Supervision\u2019 is a novel approach to determining what language models truly believe about the world,\u201d Leike says. \u201cWhat\u2019s exciting about his work is that it can work in situations where humans don\u2019t actually know what\u2019s true themselves, so it could apply to systems that are smarter than humans.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u201cWish I had this to cite,\u201d lamented Jacob Andreas, a professor at MIT, who had just published a paper exploring the extent to which language models mirror the internal motivations of human communicators. Jan Leike, the head of alignment at OpenAI, who is chiefly responsible for guiding new models like GPT-4 to help, rather than [\u2026]<\/p>\n","protected":false},"author":359,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-173584","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/173584","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/359"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=173584"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/173584\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=173584"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=173584"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=173584"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}