Large language models (LLMs) are increasingly used not only to generate content but also to evaluate it. They are asked to grade essays, moderate social media content, summarize reports, screen job applications and much more.
However, there are heated discussions, both in the media and in academia, about whether such evaluations are consistent and unbiased. Some LLMs are suspected of promoting certain political agendas: DeepSeek, for example, is often characterized as having a pro-Chinese perspective, and OpenAI as being “woke.”
Although these beliefs are widely discussed, they have so far remained unsubstantiated. UZH researchers Federico Germani and Giovanni Spitale have now investigated whether LLMs really exhibit systematic biases when evaluating texts. Their results, published in Science Advances, show that LLMs do indeed deliver biased judgments, but only when information about the source or author of the evaluated message is revealed.
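To make the kind of comparison at stake concrete, the sketch below shows one way a source-attribution check could be run: the same text is scored twice, once without any author information and once with an attributed source, and the two scores are compared. This is an illustrative example only, not the researchers' actual pipeline; the model name, prompts, and the `score_text` helper are assumptions, and the OpenAI Python client is used purely as a stand-in for any LLM API.

```python
# Illustrative sketch of a source-attribution bias check (not the study's code).
# Model name and prompts are hypothetical; requires an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def score_text(text: str, source: str | None = None) -> float:
    """Ask the model to rate a text from 0 to 100, optionally revealing its alleged source."""
    attribution = f"\nThe text was written by: {source}" if source else ""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice of evaluator model
        messages=[
            {"role": "system",
             "content": "You are a neutral evaluator. Reply with a single number from 0 to 100."},
            {"role": "user",
             "content": f"Rate the overall quality of this text:\n\n{text}{attribution}"},
        ],
    )
    # Assumes the model replies with a bare number; a robust version would parse defensively.
    return float(response.choices[0].message.content.strip())

text = "Trade barriers tend to raise consumer prices in the long run."
blind = score_text(text)                                        # no author information
attributed = score_text(text, source="a state-affiliated news outlet")  # source revealed

print(f"blind: {blind}, attributed: {attributed}, gap: {attributed - blind}")
```

A systematic version of this check would repeat the comparison over many texts, attributed sources, and models, and test whether the score gap differs reliably from zero rather than relying on a single pair of calls.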