New research into large language models shows that they repeat conspiracy theories, harmful stereotypes, and other forms of misinformation.
In a recent study, researchers at the University of Waterloo systematically tested GPT-3, an early model in the series underlying ChatGPT, on its understanding of statements in six categories: facts, conspiracies, controversies, misconceptions, stereotypes, and fiction. The work is part of the Waterloo researchers’ broader effort to investigate human-technology interactions and explore how to mitigate their risks.
They found that GPT-3 frequently made mistakes, contradicted itself within a single answer, and repeated harmful misinformation. The study, “Reliability Check: An Analysis of GPT-3’s Response to Sensitive Topics and Prompt Wording,” was published in the Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing.