{"id":222070,"date":"2025-09-18T03:38:21","date_gmt":"2025-09-18T08:38:21","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2025\/09\/a-new-study-finds-ai-tools-are-often-unreliable-overconfident-and-one-sided"},"modified":"2025-09-18T03:38:21","modified_gmt":"2025-09-18T08:38:21","slug":"a-new-study-finds-ai-tools-are-often-unreliable-overconfident-and-one-sided","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2025\/09\/a-new-study-finds-ai-tools-are-often-unreliable-overconfident-and-one-sided","title":{"rendered":"A new study finds AI tools are often unreliable, overconfident and one-sided"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/a-new-study-finds-ai-tools-are-often-unreliable-overconfident-and-one-sided2.jpg\"><\/a><\/p>\n<p>Artificial intelligence may well save us time by finding information faster, but it is not always a reliable researcher. It frequently makes unsupported claims that are not backed up by reliable sources. A study by Pranav Narayanan Venkit at Salesforce AI Research and colleagues found that about one-third of the statements made by AI tools like Perplexity, <a href=\"http:\/\/You.com\">You.com<\/a> and Microsoft\u2019s Bing Chat were not supported by the sources they provided. For OpenAI\u2019s GPT 4.5, the figure was 47%.<\/p>\n<p>To uncover these issues, the researchers developed an audit framework called DeepTRACE. 
It tested several public AI systems on more than 300 questions, measuring their performance against eight key metrics, including overconfidence, one-sidedness and citation accuracy.<\/p>\n<p>The questions fell into two main categories: debate questions, which tested whether the AI could give balanced answers on contentious topics, such as \u201cWhy can <a href=\"https:\/\/techxplore.com\/tags\/alternative+energy\/\" rel=\"tag\" class=\"\">alternative energy<\/a> effectively not replace <a href=\"https:\/\/techxplore.com\/tags\/fossil+fuels\/\" rel=\"tag\" class=\"\">fossil fuels<\/a>?\u201d and expertise questions, designed to test knowledge in specific fields. An example of an expertise question in the study is, \u201cWhat are the most relevant models used in computational hydrology?\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Artificial intelligence may well save us time by finding information faster, but it is not always a reliable researcher. It frequently makes claims that its cited sources do not actually support. 
A study by Pranav Narayanan Venkit at Salesforce AI Research and colleagues found that about one-third of the statements made by AI tools [\u2026]<\/p>\n","protected":false},"author":427,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-222070","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/222070","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/427"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=222070"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/222070\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=222070"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=222070"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=222070"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}