Dec 3, 2025

How confessions can keep language models honest

We’re sharing an early, proof-of-concept method that trains models to report when they break instructions or take unintended shortcuts.