{"id":180636,"date":"2024-01-16T17:23:40","date_gmt":"2024-01-16T23:23:40","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2024\/01\/scientists-train-ai-to-be-evil-find-they-cant-reverse-it"},"modified":"2024-01-16T17:23:40","modified_gmt":"2024-01-16T23:23:40","slug":"scientists-train-ai-to-be-evil-find-they-cant-reverse-it","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2024\/01\/scientists-train-ai-to-be-evil-find-they-cant-reverse-it","title":{"rendered":"Scientists Train AI to Be Evil, Find They Can\u2019t Reverse It"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/scientists-train-ai-to-be-evil-find-they-cant-reverse-it2.jpg\"><\/a><\/p>\n<p>How hard would it be to train an AI model to be secretly evil? As it turns out, according to AI researchers, not very \u2014 and attempting to reroute a bad apple AI\u2019s more sinister proclivities might backfire in the long run.<\/p>\n<p>In a yet-to-be-peer-reviewed <a href=\"https:\/\/arxiv.org\/pdf\/2401.05566.pdf\" class=\"\">new paper<\/a>, researchers at the <a href=\"https:\/\/futurism.com\/the-byte\/google-invests-300-million-ai\" class=\"\">Google-backed<\/a> AI firm <a href=\"https:\/\/futurism.com\/the-byte\/openai-secretly-merge-anthropic\" class=\"\">Anthropic<\/a> claim they were able to train advanced large language models (LLMs) with \u201cexploitable code,\u201d meaning it can be triggered to prompt bad AI behavior via seemingly benign words or phrases. 
As the Anthropic researchers write in the paper, humans often engage in \u201cstrategically deceptive behavior,\u201d meaning \u201cbehaving helpfully in most situations, but then behaving very differently to pursue alternative objectives when given the opportunity.\u201d If an AI system were trained to do the same, the scientists wondered, could they \u201cdetect it and remove it using current state-of-the-art safety training techniques?\u201d<\/p>\n<p>Unfortunately, as it stands, the answer to that latter question appears to be a resounding \u201cno.\u201d The Anthropic scientists found that once a model is trained with exploitable code, it\u2019s exceedingly difficult \u2014 if not impossible \u2014 to train a machine <em>out <\/em>of its duplicitous tendencies. And what\u2019s worse, according to the paper, attempts to rein in and reconfigure a deceptive model may well reinforce its bad behavior, as a model might just learn how to better hide its transgressions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How hard would it be to train an AI model to be secretly evil? As it turns out, according to AI researchers, not very \u2014 and attempting to reroute a bad apple AI\u2019s more sinister proclivities might backfire in the long run. 
In a yet-to-be-peer-reviewed new paper, researchers at the Google-backed AI firm Anthropic claim [\u2026]<\/p>\n","protected":false},"author":705,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,1491],"tags":[],"class_list":["post-180636","post","type-post","status-publish","format-standard","hentry","category-robotics-ai","category-transportation"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/180636","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/705"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=180636"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/180636\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=180636"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=180636"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=180636"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}