{"id":218303,"date":"2025-07-19T14:14:36","date_gmt":"2025-07-19T19:14:36","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2025\/07\/how-distillation-makes-ai-models-smaller-and-cheaper"},"modified":"2025-07-19T14:14:36","modified_gmt":"2025-07-19T19:14:36","slug":"how-distillation-makes-ai-models-smaller-and-cheaper","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2025\/07\/how-distillation-makes-ai-models-smaller-and-cheaper","title":{"rendered":"How Distillation Makes AI Models Smaller and Cheaper"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/how-distillation-makes-ai-models-smaller-and-cheaper.jpg\"><\/a><\/p>\n<p>Considering that the distillation requires access to the innards of the teacher model, it\u2019s not possible for a third party to sneakily distill data from a closed-source model like OpenAI\u2019s o1, as DeepSeek was thought to have done. That said, a student model could still learn quite a bit from a teacher model just through prompting the teacher with certain questions and using the answers to train its own models \u2014 an almost Socratic approach to distillation.<\/p>\n<p>Meanwhile, other researchers continue to find new applications. In January, the NovaSky lab at the University of California, Berkeley, <a href=\"https:\/\/novasky-ai.github.io\/posts\/sky-t1\/\">showed that distillation works well for training chain-of-thought reasoning models<\/a>, which use multistep \u201cthinking\u201d to better answer complicated questions. The lab says its fully open-source Sky-T1 model cost less than $450 to train, and it achieved similar results to a much larger open-source model. \u201cWe were genuinely surprised by how well distillation worked in this setting,\u201d said <a href=\"https:\/\/dachengli1.github.io\/\">Dacheng Li,<\/a> a Berkeley doctoral student and co-student lead of the NovaSky team. \u201cDistillation is a fundamental technique in AI.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Considering that the distillation requires access to the innards of the teacher model, it\u2019s not possible for a third party to sneakily distill data from a closed-source model like OpenAI\u2019s o1, as DeepSeek was thought to have done. 
Meanwhile, other researchers continue to find new applications. In January, the NovaSky lab at the University of California, Berkeley, showed that distillation works well for training chain-of-thought reasoning models (https://novasky-ai.github.io/posts/sky-t1/), which use multistep “thinking” to better answer complicated questions. The lab says its fully open-source Sky-T1 model cost less than $450 to train, and it achieved similar results to a much larger open-source model. “We were genuinely surprised by how well distillation worked in this setting,” said Dacheng Li (https://dachengli1.github.io/), a Berkeley doctoral student and co-student lead of the NovaSky team. “Distillation is a fundamental technique in AI.”
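The prompt-based route the first paragraph calls Socratic needs none of that internal access: collect the teacher’s written answers (for a reasoning teacher, its step-by-step traces) and fine-tune the student on them. A rough, runnable sketch follows; every name in it (`build_distillation_set`, `query_teacher`, the toy stand-in teacher) is hypothetical and not a real API, and it is not a description of how Sky-T1 itself was built.

```python
# Black-box ("Socratic") distillation sketch: the student learns only from
# the teacher's visible answers, never from its logits or weights.
from typing import Callable, List, Tuple

def build_distillation_set(
    prompts: List[str],
    query_teacher: Callable[[str], str],
) -> List[Tuple[str, str]]:
    # Each training example is just (prompt, teacher's full written answer),
    # which for a reasoning teacher includes its chain-of-thought steps.
    return [(p, query_teacher(p)) for p in prompts]

if __name__ == "__main__":
    # Toy stand-in teacher so the sketch runs end to end.
    toy_teacher = lambda p: f"Step 1: restate '{p}'. Step 2: reason. Answer: 42."
    pairs = build_distillation_set(["What is 6 x 7?"], toy_teacher)
    print(pairs[0])
    # These pairs would then feed an ordinary supervised fine-tuning loop
    # for the smaller student model.
```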