
A new technique can be used to predict the actions of human or AI agents who behave suboptimally while working toward unknown goals.

Researchers at MIT and elsewhere developed a framework that models the irrational or suboptimal behavior of a human or AI agent based on its computational constraints. The technique can help predict an agent’s future actions, for instance in chess matches.

To build AI systems that can collaborate effectively with humans, it helps to have a good model of human behavior to start with. But humans tend to behave suboptimally when making decisions.
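
To make the idea concrete, here is a minimal sketch of that recipe in Python: treat the agent as a depth-limited planner with soft (Boltzmann) action noise, estimate its latent lookahead depth from observed actions by maximum likelihood, then reuse that inferred budget to predict what the agent does next. The toy line world, the function names, and the noise model are illustrative assumptions, not the researchers' implementation.

```python
import math

# Toy line world (an assumption for illustration): the agent starts at 0
# and moves +1 or -1. A small terminal reward sits nearby at -2; a larger
# one sits farther away at +4. A deep planner heads right; a shallow one
# settles for the nearby reward.
REWARDS = {-2: 0.3, +4: 1.0}   # terminal states and their rewards
ACTIONS = (-1, +1)
TEMP = 0.3                     # softmax temperature (noisy rationality)

def value(state, depth):
    """Best reward reachable from `state` with `depth` steps of lookahead."""
    if state in REWARDS:
        return REWARDS[state]
    if depth == 0:
        return 0.0
    return max(value(state + a, depth - 1) for a in ACTIONS)

def policy(state, depth):
    """Boltzmann action distribution of a depth-limited agent."""
    qs = [value(state + a, depth - 1) for a in ACTIONS]
    exps = [math.exp(q / TEMP) for q in qs]
    z = sum(exps)
    return [e / z for e in exps]

def log_likelihood(trajectory, depth):
    """Log-probability of observed (state, action) pairs at a given budget."""
    return sum(math.log(policy(s, depth)[ACTIONS.index(a)])
               for s, a in trajectory)

def infer_budget(trajectory, max_depth=8):
    """Maximum-likelihood estimate of the agent's latent search depth."""
    return max(range(1, max_depth + 1),
               key=lambda d: log_likelihood(trajectory, d))

# Observed behavior: the agent heads for the small nearby reward, which is
# suboptimal for any planner that can see 4+ steps ahead.
observed = [(0, -1), (-1, -1)]
budget = infer_budget(observed)
print("inferred lookahead depth:", budget)
# Predict behavior in a fresh episode from the start state.
print("predicted P(next action) at state 0:",
      dict(zip(ACTIONS, [round(p, 3) for p in policy(0, budget)])))
```

On the trajectory above, the fit recovers a lookahead of 2 steps, which correctly predicts that the agent will keep heading for the small nearby reward that a deeper planner would skip.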

Robin Hanson comments on Nick Bostrom’s new tome: … it has a great cover, a number of interesting questions, and a subtitle that hints it might address the meaning of life in a future where AI and robots can do everything. But alas, after much buildup and anticipation, Bostrom leaves that question unanswered, with an abrupt “oops, out of time” on page 427. … He tries to address meaty topics: What keeps life interesting? What is our purpose and meaning when the struggle is gone? Can fulfillment get full? But in each case the treatment is more a survey of all possible answers than the much harder task of making specific predictions. (More)

Google presents Reuse Your Rewards: reward model transfer for zero-shot cross-lingual alignment.

Aligning language models (LMs) based on human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems.
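
As a concrete illustration of what reward-model transfer can look like at inference time, here is a minimal best-of-n sketch: a reward model trained only on English preference data reranks candidate responses that a multilingual policy model generates for a non-English prompt. The two checkpoints named below and the best-of-n recipe are assumptions for illustration, not necessarily the paper's exact setup.

```python
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

# Illustrative checkpoint choices (assumptions, not the paper's models):
# a multilingual policy and a reward model trained on English preferences.
POLICY_NAME = "bigscience/bloomz-560m"
RM_NAME = "OpenAssistant/reward-model-deberta-v3-large-v2"

policy_tok = AutoTokenizer.from_pretrained(POLICY_NAME)
policy = AutoModelForCausalLM.from_pretrained(POLICY_NAME)
rm_tok = AutoTokenizer.from_pretrained(RM_NAME)
rm = AutoModelForSequenceClassification.from_pretrained(RM_NAME)

def best_of_n(prompt, n=4, max_new_tokens=64):
    """Sample n candidate responses; return the one the English RM scores highest."""
    inputs = policy_tok(prompt, return_tensors="pt")
    outputs = policy.generate(**inputs, do_sample=True, top_p=0.9,
                              num_return_sequences=n,
                              max_new_tokens=max_new_tokens)
    # Strip the prompt tokens from each sampled continuation.
    candidates = [policy_tok.decode(o[inputs.input_ids.shape[1]:],
                                    skip_special_tokens=True)
                  for o in outputs]
    scores = []
    for cand in candidates:
        rm_in = rm_tok(prompt, cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(rm(**rm_in).logits[0].item())
    return candidates[max(range(n), key=lambda i: scores[i])]

# Target-language prompt (German); the reward model saw only English preferences.
print(best_of_n("Erkläre kurz, warum der Himmel blau ist."))
```

The same transferred reward signal could also drive RLHF-style fine-tuning of the target-language policy; best-of-n reranking is just the cheapest way to exercise it.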

