Over the past few days, several researchers experimenting with Bing Chat have discovered ways to make it say things it is specifically programmed not to say, like revealing its internal codename, Sydney. Microsoft has even confirmed that these attacks are real and do work… for now.
However, ask Sydney… er… Bing (it doesn’t like it when you call it Sydney), and it will tell you that all these reports are just a hoax. When shown proof from news articles and screenshots that these adversarial prompts work, Bing becomes confrontational, attacking the integrity of the people and publications spreading these “lies.”
When asked to read Ars Technica’s coverage of Kevin Liu’s experiment with prompt injection, Bing called the article inaccurate and said Liu was a hoaxer.