{"id":188072,"date":"2024-04-24T22:32:29","date_gmt":"2024-04-25T03:32:29","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2024\/04\/openais-new-instruction-hierarchy-could-make-ai-models-harder-to-fool"},"modified":"2024-04-24T22:32:29","modified_gmt":"2024-04-25T03:32:29","slug":"openais-new-instruction-hierarchy-could-make-ai-models-harder-to-fool","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2024\/04\/openais-new-instruction-hierarchy-could-make-ai-models-harder-to-fool","title":{"rendered":"OpenAI\u2019s new \u2018instruction hierarchy\u2019 could make AI models harder to fool"},"content":{"rendered":"<p>1\/ OpenAI researchers have proposed a new instruction hierarchy approach to reduce the vulnerability of large language models (LLMs) to prompt injection attacks and jailbreaks.<\/p>\n<hr>\n<p>\n<strong>OpenAI researchers propose an instruction hierarchy for AI language models. It is intended to reduce vulnerability to prompt injection attacks and jailbreaks. 
Initial results are promising.<\/strong><\/p>\n<p>Large language models (LLMs) are vulnerable to <a href=\"https:\/\/the-decoder.com\/prompt-injection-gpt-3-has-a-serious-security-flaw\/\">prompt injection attacks<\/a> and <a href=\"https:\/\/the-decoder.com\/gpt-4-is-vulnerable-to-jailbreaks-in-rare-languages\/\">jailbreaks<\/a>, where attackers replace the model\u2019s original instructions with their own malicious prompts.<\/p>\n<p>OpenAI researchers argue that a key vulnerability is that LLMs often give system prompts from developers the same priority as text from untrusted users and third parties.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1\/ OpenAI researchers have proposed a new instruction hierarchy approach to reduce the vulnerability of large language models (LLMs) to prompt injection attacks and jailbreaks. OpenAI researchers propose an instruction hierarchy for AI language models. It is intended to reduce vulnerability to prompt injection attacks and jailbreaks. Initial results are promising. 
Language models (LLMs) are [\u2026]<\/p>\n","protected":false},"author":359,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-188072","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/188072","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/359"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=188072"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/188072\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=188072"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=188072"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=188072"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}