{"id":186980,"date":"2024-04-08T08:23:27","date_gmt":"2024-04-08T13:23:27","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2024\/04\/paper-page-direct-nash-optimization-teaching-language-models-to-self-improve-with-general-preferences"},"modified":"2024-04-08T08:23:27","modified_gmt":"2024-04-08T13:23:27","slug":"paper-page-direct-nash-optimization-teaching-language-models-to-self-improve-with-general-preferences","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2024\/04\/paper-page-direct-nash-optimization-teaching-language-models-to-self-improve-with-general-preferences","title":{"rendered":"Paper page \u2014 Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/paper-page-direct-nash-optimization-teaching-language-models-to-self-improve-with-general-preferences2.jpg\"><\/a><\/p>\n<p>From Microsoft.<\/p>\n<p>Direct Nash Optimization.<\/p>\n<p>Teaching Language Models to Self-Improve with General Preferences.<\/p>\n<p>This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over\u2026<\/p>\n<hr>\n<p>Join the discussion on this paper page.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>From Microsoft. Direct Nash Optimization. Teaching Language Models to Self-Improve with General Preferences. 
This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over\u2026 Join the discussion on this paper page.<\/p>\n","protected":false},"author":709,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32],"tags":[],"class_list":["post-186980","post","type-post","status-publish","format-standard","hentry","category-education"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/186980","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/709"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=186980"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/186980\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=186980"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=186980"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=186980"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}