Feizi and his coauthors looked at how easy it is for bad actors to evade watermarking attempts. (He calls it “washing out” the watermark.) In addition to demonstrating how attackers might remove watermarks, the study shows how it’s possible to add watermarks to human-generated images, triggering false positives. Released online this week, the preprint paper has yet to be peer-reviewed, but Feizi has been a leading figure in AI detection, so it’s worth paying attention to, even at this early stage.
It’s timely research. Watermarking has emerged as one of the more promising strategies to identify AI-generated images and text. Just as physical watermarks are embedded on paper money and stamps to prove authenticity, digital watermarks are meant to trace the origins of images and text online, helping people spot deepfaked videos and bot-authored books. With the US presidential elections on the horizon in 2024, concerns over manipulated media are high—and some people are already getting fooled. Former US president Donald Trump, for instance, shared a fake video of Anderson Cooper on his Truth Social platform; Cooper’s voice had been AI-cloned.
This summer, OpenAI, Alphabet, Meta, Amazon, and several other major AI players pledged to develop watermarking technology to combat misinformation. In late August, Google’s DeepMind released a beta version of its new watermarking tool, SynthID. The hope is that these tools will flag AI content as it’s being generated, in the same way that physical watermarking authenticates dollars as they’re being printed.