{"id":137722,"date":"2022-04-05T03:22:20","date_gmt":"2022-04-05T08:22:20","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2022\/04\/google-nyu-maryland-us-token-dropping-approach-reduces-bert-pretraining-time-by-25"},"modified":"2022-04-05T03:22:20","modified_gmt":"2022-04-05T08:22:20","slug":"google-nyu-maryland-us-token-dropping-approach-reduces-bert-pretraining-time-by-25","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2022\/04\/google-nyu-maryland-us-token-dropping-approach-reduces-bert-pretraining-time-by-25","title":{"rendered":"Google, NYU &amp; Maryland U\u2019s Token-Dropping Approach Reduces BERT Pretraining Time by 25%"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/google-nyu-maryland-us-token-dropping-approach-reduces-bert-pretraining-time-by-252.jpg\"><\/a><\/p>\n<p>The pretraining of BERT-type large language models \u2014 which can scale up to billions of parameters \u2014 is crucial for obtaining state-of-the-art performance on many natural language processing (NLP) tasks. This pretraining process, however, is expensive and has become a bottleneck hindering the industrial application of such large language models.<\/p>\n<p>In the new paper <em>Token Dropping for Efficient BERT Pretraining<\/em>, a research team from Google, New York University, and the University of Maryland proposes a simple but effective \u201ctoken dropping\u201d technique that significantly reduces the pretraining cost of transformer models such as BERT, without degrading performance on downstream fine-tuning tasks.<\/p>\n<p>The team summarizes their main contributions as:<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The pretraining of BERT-type large language models \u2014 which can scale up to billions of parameters \u2014 is crucial for obtaining state-of-the-art performance on many natural language processing (NLP) tasks. 
This pretraining process, however, is expensive and has become a bottleneck hindering the industrial application of such large language models. In the new paper Token [\u2026]<\/p>\n","protected":false},"author":556,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-137722","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/137722","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/556"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=137722"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/137722\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=137722"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=137722"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=137722"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}