{"id":173176,"date":"2023-09-30T07:22:52","date_gmt":"2023-09-30T12:22:52","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2023\/09\/ai-language-models-can-exceed-png-and-flac-in-lossless-compression-says-study"},"modified":"2023-09-30T07:22:52","modified_gmt":"2023-09-30T12:22:52","slug":"ai-language-models-can-exceed-png-and-flac-in-lossless-compression-says-study","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2023\/09\/ai-language-models-can-exceed-png-and-flac-in-lossless-compression-says-study","title":{"rendered":"AI language models can exceed PNG and FLAC in lossless compression, says study"},"content":{"rendered":"<p style=\"padding-right: 20px\"><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/ai-language-models-can-exceed-png-and-flac-in-lossless-compression-says-study2.jpg\"><\/a><\/p>\n<p>Effective compression is about finding patterns to make data smaller without losing information. When an algorithm or model can accurately guess the next piece of data in a sequence, it shows it\u2019s good at spotting these patterns. 
This links the idea of making good guesses\u2014which is what large language models like GPT-4 <a href=\"https:\/\/arstechnica.com\/science\/2023\/07\/a-jargon-free-explanation-of-how-ai-large-language-models-work\/\">do very well<\/a>\u2014to achieving good compression.<\/p>\n<p>In an arXiv research paper titled \u201c<a href=\"https:\/\/arxiv.org\/abs\/2309.10668\">Language Modeling Is Compression<\/a>,\u201d researchers detail their discovery that the DeepMind large language model (LLM) called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Chinchilla_AI\">Chinchilla 70B<\/a> can perform <a href=\"https:\/\/en.wikipedia.org\/wiki\/Lossless_compression\">lossless compression<\/a> on image patches from the <a href=\"https:\/\/www.image-net.org\/\">ImageNet<\/a> image database to 43.4 percent of their original size, beating the <a href=\"https:\/\/en.wikipedia.org\/wiki\/PNG\">PNG<\/a> algorithm, which compressed the same data to 58.5 percent. For audio, Chinchilla compressed samples from the <a href=\"https:\/\/www.openslr.org\/12\">LibriSpeech<\/a> audio data set to just 16.4 percent of their raw size, outdoing <a href=\"https:\/\/en.wikipedia.org\/wiki\/FLAC\">FLAC<\/a> compression at 30.3 percent.<\/p>\n<p>In these results, a lower percentage means more compression is taking place. Lossless compression means that no data is lost during the compression process. It stands in contrast to a lossy compression technique like <a href=\"https:\/\/arstechnica.com\/information-technology\/2017\/03\/google-jpeg-guetzli-encoder-file-size\/\">JPEG<\/a>, which discards some data to significantly reduce file sizes and reconstructs an approximation of it during decoding.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Effective compression is about finding patterns to make data smaller without losing information. 
When an algorithm or model can accurately guess the next piece of data in a sequence, it shows it\u2019s good at spotting these patterns. This links the idea of making good guesses\u2014which is what large language models like GPT-4 do very well [\u2026]<\/p>\n","protected":false},"author":662,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[41,6],"tags":[],"class_list":["post-173176","post","type-post","status-publish","format-standard","hentry","category-information-science","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/173176","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/662"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=173176"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/173176\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=173176"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=173176"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=173176"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}