{"id":187828,"date":"2024-04-21T16:23:15","date_gmt":"2024-04-21T21:23:15","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2024\/04\/huggingfacefw-fineweb-%c2%b7-datasets-at-hugging-face"},"modified":"2024-04-21T16:23:15","modified_gmt":"2024-04-21T21:23:15","slug":"huggingfacefw-fineweb-%c2%b7-datasets-at-hugging-face","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2024\/04\/huggingfacefw-fineweb-%c2%b7-datasets-at-hugging-face","title":{"rendered":"HuggingFaceFW\/fineweb \u00b7 Datasets at Hugging Face"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/huggingfacefw-fineweb-c2b7-datasets-at-hugging-face2.jpg\"><\/a><\/p>\n<p>FineWeb: 15 trillion tokens of high-quality data from the web.<\/p>\n<p>The \ud83c\udf77 FineWeb dataset consists of more than 15T tokens of cleaned and deduplicated English web data from CommonCrawl.<\/p>\n<hr>\n<p>We\u2019re on a journey to advance and democratize artificial intelligence through open source and open science.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>FineWeb: 15 trillion tokens of high-quality data from the web. The \ud83c\udf77 FineWeb dataset consists of more than 15T tokens of cleaned and deduplicated English web data from CommonCrawl. 
We\u2019re on a journey to advance and democratize artificial intelligence through open source and open science.<\/p>\n","protected":false},"author":709,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[418,6],"tags":[],"class_list":["post-187828","post","type-post","status-publish","format-standard","hentry","category-internet","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/187828","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/709"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=187828"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/187828\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=187828"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=187828"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=187828"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}