{"id":176611,"date":"2023-11-23T14:27:43","date_gmt":"2023-11-23T20:27:43","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2023\/11\/illustrating-reinforcement-learning-from-human-feedback-rlhf"},"modified":"2023-11-23T14:27:43","modified_gmt":"2023-11-23T20:27:43","slug":"illustrating-reinforcement-learning-from-human-feedback-rlhf","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2023\/11\/illustrating-reinforcement-learning-from-human-feedback-rlhf","title":{"rendered":"Illustrating Reinforcement Learning from Human Feedback (RLHF)"},"content":{"rendered":"<p><\/p>\n<p><iframe style=\"display: block; margin: 0 auto; width: 100%; aspect-ratio: 4\/3; object-fit: contain;\" src=\"https:\/\/www.youtube.com\/embed\/fGq4np3brbs?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope;\n   picture-in-picture\" allowfullscreen><\/iframe><\/p>\n<p>Q* appears to apply an RL technique that uses AI-generated data and teaches LLMs how to solve multi-step logic problems. Q* techniques can be applied to GPT-5, endowing it with excellent reasoning and retrieval skills. This may not be AGI, but it is an extremely powerful LLM.<\/p>\n<hr>\n<p>We\u2019re on a journey to advance and democratize artificial intelligence through open source and open science.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Q* appears to apply an RL technique that uses AI-generated data and teaches LLMs how to solve multi-step logic problems. Q* techniques can be applied to GPT-5, endowing it with excellent reasoning and retrieval skills. This may not be AGI, but it is an extremely powerful LLM. 
We\u2019re on a journey to advance [\u2026]<\/p>\n","protected":false},"author":709,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-176611","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/176611","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/709"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=176611"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/176611\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=176611"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=176611"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=176611"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}