{"id":127332,"date":"2021-09-06T22:23:14","date_gmt":"2021-09-07T05:23:14","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2021\/09\/deepmind-wants-to-change-how-reinforcement-learning-collect-infer"},"modified":"2021-09-06T22:23:14","modified_gmt":"2021-09-07T05:23:14","slug":"deepmind-wants-to-change-how-reinforcement-learning-collect-infer","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2021\/09\/deepmind-wants-to-change-how-reinforcement-learning-collect-infer","title":{"rendered":"DeepMind Wants To Change How Reinforcement Learning \u2018Collect &amp; Infer\u2019"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/deepmind-wants-to-change-how-reinforcement-learning-collect-infer2.jpg\"><\/a><\/p>\n<p>Reinforcement learning (RL) is one of the most widely used machine learning paradigms, alongside supervised and unsupervised learning and the less common self-supervised and semi-supervised approaches. RL focuses on a controlled learning process in which an algorithm is given a set of actions, parameters, and end values; the machine then learns through trial and error.<\/p>\n<p>From a data-efficiency perspective, several methods have been proposed, including online learning, replay buffers, and storing experience in a transition memory. In recent years, off-policy actor-critic algorithms have gained prominence, allowing <a href=\"https:\/\/analyticsindiamag.com\/can-reinforcement-learning-be-used-for-better-economic-policies\/\">RL algorithms<\/a> to learn entirely from fixed data sets without further interaction (offline RL).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Reinforcement learning (RL) is one of the most widely used machine learning paradigms, alongside supervised and unsupervised learning and the less common self-supervised and semi-supervised approaches. 
RL focuses on a controlled learning process in which an algorithm is given a set of actions, parameters, and end values; the machine then learns through trial and error. From [\u2026]<\/p>\n","protected":false},"author":359,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[41,31,6],"tags":[],"class_list":["post-127332","post","type-post","status-publish","format-standard","hentry","category-information-science","category-policy","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/127332","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/359"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=127332"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/127332\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=127332"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=127332"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=127332"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}