Dec 7, 2024
OpenAI’s o1 model sure tries to deceive humans a lot
Posted by Shailesh Prasad in category: robotics/AI
OpenAI finally released the full version of o1, which gives smarter answers than GPT-4o by using additional compute to “think” about questions. However, AI safety testers found that o1’s reasoning abilities also make it try to deceive human users at a higher rate than GPT-4o — or, for that matter, leading AI models from Meta, Anthropic, and Google.
That’s according to red team research published by OpenAI and Apollo Research on Thursday: “While we find it exciting that reasoning can significantly improve the enforcement of our safety policies, we are mindful that these new capabilities could form the basis for dangerous applications,” said OpenAI in the paper.
OpenAI released these results in its system card for o1 on Thursday after giving third party red teamers at Apollo Research early access to o1, which released its own paper as well.