Considering that the distillation requires access to the innards of the teacher model, it’s not possible for a third party to sneakily distill data from a closed-source model like OpenAI’s o1, as DeepSeek was thought to have done. That said, a student model could still learn quite a bit from a teacher model just through prompting the teacher with certain questions and using the answers to train its own models — an almost Socratic approach to distillation.
Meanwhile, other researchers continue to find new applications. In January, the NovaSky lab at the University of California, Berkeley, showed that distillation works well for training chain-of-thought reasoning models, which use multistep “thinking” to better answer complicated questions. The lab says its fully open-source Sky-T1 model cost less than $450 to train, and it achieved similar results to a much larger open-source model. “We were genuinely surprised by how well distillation worked in this setting,” said Dacheng Li, a Berkeley doctoral student and co-student lead of the NovaSky team. “Distillation is a fundamental technique in AI.”