{"id":180913,"date":"2024-01-19T14:25:57","date_gmt":"2024-01-19T20:25:57","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2024\/01\/a-simple-technique-to-defend-chatgpt-against-jailbreak-attacks"},"modified":"2024-01-19T14:25:57","modified_gmt":"2024-01-19T20:25:57","slug":"a-simple-technique-to-defend-chatgpt-against-jailbreak-attacks","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2024\/01\/a-simple-technique-to-defend-chatgpt-against-jailbreak-attacks","title":{"rendered":"A simple technique to defend ChatGPT against jailbreak attacks"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/a-simple-technique-to-defend-chatgpt-against-jailbreak-attacks3.jpg\"><\/a><\/p>\n<p>Large language models (LLMs), deep learning-based models trained to generate, summarize, translate and process written texts, have gained significant attention since the release of OpenAI\u2019s conversational platform ChatGPT. While ChatGPT and similar platforms are now used for a wide range of applications, they could be vulnerable to a specific type of cyberattack that produces biased, unreliable or even offensive responses.<\/p>\n<p>Researchers at Hong Kong University of Science and Technology, University of Science and Technology of China, Tsinghua University and Microsoft Research Asia recently carried out a study investigating the potential impact of these attacks and techniques that could protect models against them. 
Their <a href=\"https:\/\/www.nature.com\/articles\/s42256-023-00765-8\">paper<\/a>, published in <i>Nature Machine Intelligence<\/i>, introduces a new psychology-inspired technique that could help to protect ChatGPT and similar LLM-based conversational platforms from cyberattacks.<\/p>\n<p>\u201cChatGPT is a societally impactful artificial intelligence tool with millions of users and integration into products such as Bing,\u201d Yueqi Xie, Jingwei Yi and their colleagues write in their paper. \u201cHowever, the emergence of <a href=\"https:\/\/techxplore.com\/tags\/jailbreak\/\" rel=\"tag\" class=\"\">jailbreak<\/a> attacks notably threatens its responsible and secure use. Jailbreak attacks use adversarial prompts to bypass ChatGPT\u2019s ethics safeguards and engender harmful responses.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Large language models (LLMs), deep learning-based models trained to generate, summarize, translate and process written texts, have gained significant attention after the release of OpenAI\u2019s conversational platform ChatGPT. 
While ChatGPT and similar platforms are now used for a wide range of applications, they could be vulnerable to a specific type of cyberattack producing [\u2026]<\/p>\n","protected":false},"author":662,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[34,30,6],"tags":[],"class_list":["post-180913","post","type-post","status-publish","format-standard","hentry","category-cybercrime-malcode","category-ethics","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/180913","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/662"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=180913"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/180913\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=180913"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=180913"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=180913"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}