<h1>Evolution Strategies as a Scalable Alternative to Reinforcement Learning</h1>
<p><em>Lifeboat Foundation blog, 2017-03-25 (<a href="https://lifeboat.com/blog/2017/03/evolution-strategies-as-a-scalable-alternative-to-reinforcement-learning">permalink</a>)</em></p>
<p>We&rsquo;ve <a href="https://arxiv.org/abs/1703.03864">discovered</a> that <strong>evolution strategies (ES)</strong>, an optimization technique that has been known for decades, rivals the performance of standard <strong>reinforcement learning (RL)</strong> techniques on modern RL benchmarks (e.g. Atari and MuJoCo), while overcoming many of RL&rsquo;s inconveniences.</p>
<p>In particular, ES is simpler to implement (there is no need for <a href="http://neuralnetworksanddeeplearning.com/chap2.html">backpropagation</a>), it is easier to scale in a distributed setting, it does not suffer in settings with sparse rewards, and it has fewer <a href="https://www.quora.com/What-are-hyperparameters-in-machine-learning">hyperparameters</a>. This outcome is surprising because ES resembles simple hill climbing in a high-dimensional space, based only on <a href="https://en.wikipedia.org/wiki/Finite_difference">finite differences</a> along a few random directions at each step.</p>
<p>Source: <a href="https://blog.openai.com/evolution-strategies/">https://blog.openai.com/evolution-strategies/</a></p>
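<p>The "hill climbing on finite differences along random directions" description above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the toy objective, hyperparameter values, and function names are assumptions chosen for demonstration. The core idea is the ES update: perturb the parameter vector with Gaussian noise, score each perturbation, and step along the reward-weighted average of the noise directions.</p>

```python
import random

# Toy objective (an assumption for illustration): maximize the negative
# squared distance to a hidden target vector.
TARGET = [0.5, -0.3, 0.8]

def fitness(theta):
    return -sum((t, g) == () or (t - g) ** 2 for t, g in zip(theta, TARGET))

def fitness(theta):
    """Reward to maximize: negative squared distance to TARGET."""
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))

def evolution_strategy(steps=400, pop=50, sigma=0.1, alpha=0.02, seed=0):
    """Minimal ES sketch: sample Gaussian perturbations of theta, evaluate
    the fitness of each perturbed point, and move theta along the average
    noise direction weighted by (normalized) reward. No backpropagation;
    only black-box fitness evaluations are used."""
    rng = random.Random(seed)
    dim = len(TARGET)
    theta = [0.0] * dim
    for _ in range(steps):
        # One Gaussian noise vector per population member.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop)]
        rewards = [fitness([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noise]
        # Normalize rewards so the step size is insensitive to reward scale.
        mean = sum(rewards) / pop
        std = (sum((r - mean) ** 2 for r in rewards) / pop) ** 0.5 or 1.0
        adv = [(r - mean) / std for r in rewards]
        # Finite-difference-style gradient estimate: reward-weighted noise.
        theta = [t + alpha / (pop * sigma)
                 * sum(a * eps[i] for a, eps in zip(adv, noise))
                 for i, t in enumerate(theta)]
    return theta

print(evolution_strategy())
```

<p>Each iteration needs only forward evaluations of the fitness function, which is why the method parallelizes so easily: workers can evaluate perturbations independently and exchange only scalar rewards (plus shared random seeds).</p>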