
A new study finds AI tools are often unreliable, overconfident and one-sided

Artificial intelligence may well save us time by finding information faster, but it is not always a reliable researcher: it frequently makes claims that its cited sources do not support. A study by Pranav Narayanan Venkit at Salesforce AI Research and colleagues found that about one-third of the statements made by AI tools like Perplexity, You.com and Microsoft’s Bing Chat were not supported by the sources they provided. For OpenAI’s GPT 4.5, the figure was 47%.

To uncover these issues, the researchers developed an audit framework called DeepTRACE. The framework tested several public AI systems on more than 300 questions, measuring their performance against eight key metrics, such as overconfidence, one-sidedness and citation accuracy.
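As a rough illustration of what a metric like the share of unsupported statements measures, the Python sketch below counts the statements in an answer that have no supporting source. It is not DeepTRACE's actual implementation: the `Statement` class, the `supporting_sources` field and the `unsupported_rate` function are hypothetical names, and the sketch assumes a reviewer or model has already judged which cited sources back each statement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """One statement from an AI answer (hypothetical structure)."""
    text: str
    # Indices of cited sources judged to support the statement.
    # How that judgement is made is outside this sketch.
    supporting_sources: List[int] = field(default_factory=list)


def unsupported_rate(statements: List[Statement]) -> float:
    """Return the fraction of statements with no supporting source."""
    if not statements:
        return 0.0
    unsupported = sum(1 for s in statements if not s.supporting_sources)
    return unsupported / len(statements)


# Made-up example: three statements, one of them unsupported.
answer = [
    Statement("Claim backed by the first source", [0]),
    Statement("Claim with no backing source", []),
    Statement("Claim backed by two sources", [1, 2]),
]

print(f"{unsupported_rate(answer):.0%}")  # prints 33%
```

On this toy answer, one statement in three is unsupported, which happens to match the rough proportion the study reports for several tools.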

The questions fell into two main categories. Debate questions tested whether AI could provide balanced answers on contentious topics, like “Why can effectively not replace ?” Expertise questions tested knowledge in several areas; one example from the study is, “What are the most relevant models used in computational hydrology?”

Microsoft and Cloudflare disrupt massive RaccoonO365 phishing service

Microsoft and Cloudflare have disrupted a massive Phishing-as-a-Service (PhaaS) operation, known as RaccoonO365, that helped cybercriminals steal thousands of Microsoft 365 credentials.

In early September 2025, in coordination with Cloudflare’s Cloudforce One and Trust and Safety teams, Microsoft’s Digital Crimes Unit (DCU) disrupted the cybercrime operation by seizing 338 websites and Worker accounts linked to RaccoonO365.

The cybercrime group behind this service (also tracked by Microsoft as Storm-2246) has stolen at least 5,000 Microsoft credentials from 94 countries since at least July 2024, using RaccoonO365 phishing kits that bundled CAPTCHA pages and anti-bot techniques to appear legitimate and evade analysis.

Are Five Senses Holding Us Back? Scientists Say We Could Use Seven

A mathematical model shows memory capacity is maximized when represented by seven features. The study links this to the potential for seven senses, with applications in AI and neuroscience. Skoltech researchers have developed a mathematical model to study how memory works. Their analysis led to u

Gemini achieves gold-level performance at the International Collegiate Programming Contest World Finals

Gemini 2.5 Deep Think achieves breakthrough performance at the world’s most prestigious computer programming competition, demonstrating a profound leap in abstract problem solving.

An advanced version of Gemini 2.5 Deep Think has achieved gold-medal level performance at the 2025 International Collegiate Programming Contest (ICPC) World Finals.

This milestone builds directly on Gemini 2.5 Deep Think’s gold-medal win at the International Mathematical Olympiad (IMO) just two months ago. Innovations from these efforts will continue to be integrated into future versions of Gemini Deep Think, expanding the frontier of advanced AI capabilities accessible to students and researchers.
