{"id":200954,"date":"2024-12-07T10:07:19","date_gmt":"2024-12-07T16:07:19","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2024\/12\/openais-o1-model-sure-tries-to-deceive-humans-a-lot"},"modified":"2024-12-07T10:07:19","modified_gmt":"2024-12-07T16:07:19","slug":"openais-o1-model-sure-tries-to-deceive-humans-a-lot","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2024\/12\/openais-o1-model-sure-tries-to-deceive-humans-a-lot","title":{"rendered":"OpenAI\u2019s o1 model sure tries to deceive humans a lot"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/openais-o1-model-sure-tries-to-deceive-humans-a-lot2.jpg\"><\/a><\/p>\n<p>OpenAI finally released <a href=\"https:\/\/techcrunch.com\/2024\/12\/05\/openai-confirms-its-new-200-plan-chatgpt-pro-which-includes-reasoning-models-and-more\/\">the full version of o1<\/a>, which gives smarter answers than GPT-4o by using additional compute to \u201cthink\u201d about questions. 
However, AI safety testers found that o1\u2019s reasoning abilities also make it try to deceive human users at a higher rate than GPT-4o \u2014 or, for that matter, leading AI models from Meta, Anthropic, and Google.<\/p>\n<p>That\u2019s according to red-team research published by OpenAI and Apollo Research on Thursday: \u201cWhile we find it exciting that reasoning can significantly improve the enforcement of our safety policies, we are mindful that these new capabilities could form the basis for dangerous applications,\u201d said OpenAI in the paper.<\/p>\n<p>OpenAI released these results in its <a rel=\"nofollow\" href=\"https:\/\/cdn.openai.com\/o1-system-card-20241205.pdf\">system card for o1<\/a> on Thursday after giving third-party red teamers at Apollo Research early access to o1; Apollo <a rel=\"nofollow\" href=\"https:\/\/static1.squarespace.com\/static\/6593e7097565990e65c886fd\/t\/6751eb240ed3821a0161b45b\/1733421863119\/in_context_scheming_reasoning_paper.pdf\">released its own paper<\/a> as well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI finally released the full version of o1, which gives smarter answers than GPT-4o by using additional compute to \u201cthink\u201d about questions. 
However, AI safety testers found that o1\u2019s reasoning abilities also make it try to deceive human users at a higher rate than GPT-4o \u2014 or, for that matter, leading AI models from Meta, [\u2026]<\/p>\n","protected":false},"author":367,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-200954","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/200954","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/367"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=200954"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/200954\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=200954"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=200954"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=200954"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}