{"id":166861,"date":"2023-07-04T07:22:43","date_gmt":"2023-07-04T12:22:43","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2023\/07\/scientists-train-new-ai-exclusively-on-the-dark-web"},"modified":"2023-07-04T07:22:43","modified_gmt":"2023-07-04T12:22:43","slug":"scientists-train-new-ai-exclusively-on-the-dark-web","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2023\/07\/scientists-train-new-ai-exclusively-on-the-dark-web","title":{"rendered":"Scientists Train New AI Exclusively on the Dark Web"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/scientists-train-new-ai-exclusively-on-the-dark-web2.jpg\"><\/a><\/p>\n<p>OpenAI\u2019s large language models (LLMs) are trained on a vast array of datasets, pulling information from the internet\u2019s <a href=\"https:\/\/futurism.com\/chat-gpt-sex-omegaverse\" class=\"\">dustiest and cobweb-covered corners<\/a>.<\/p>\n<p>But what if such a model were to crawl through the dark web \u2014 the internet\u2019s seedy underbelly where you can host a site without your identity being public or even available to law enforcement \u2014 instead? 
A team of South Korean researchers did just that, creating an AI model dubbed <a href=\"https:\/\/arxiv.org\/pdf\/2305.08596.pdf\" class=\"\">DarkBERT<\/a> to index some of the sketchiest domains on the internet.<\/p>\n<p>It\u2019s a fascinating glimpse into some of the murkiest corners of the World Wide Web, which have become synonymous with illegal and malicious activities, from the <a href=\"https:\/\/futurism.com\/the-byte\/insurance-company-hackers-health-records\" class=\"\">sharing of leaked data<\/a> to the <a href=\"https:\/\/futurism.com\/the-byte\/dark-web-drugs-postage-feds\" class=\"\">sale of hard drugs<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI\u2019s large language models (LLMs) are trained on a vast array of datasets, pulling information from the internet\u2019s dustiest and cobweb-covered corners. But what if such a model were to crawl through the dark web \u2014 the internet\u2019s seedy underbelly where you can host a site without your identity being public or even available to 
[\u2026]<\/p>\n","protected":false},"author":367,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[418,1493,6],"tags":[],"class_list":["post-166861","post","type-post","status-publish","format-standard","hentry","category-internet","category-law-enforcement","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/166861","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/367"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=166861"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/166861\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=166861"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=166861"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=166861"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}