{"id":221880,"date":"2025-09-14T20:02:27","date_gmt":"2025-09-15T01:02:27","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2025\/09\/defeating-nondeterminism-in-llm-inference"},"modified":"2025-09-14T20:02:27","modified_gmt":"2025-09-15T01:02:27","slug":"defeating-nondeterminism-in-llm-inference","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2025\/09\/defeating-nondeterminism-in-llm-inference","title":{"rendered":"Defeating Nondeterminism in LLM Inference"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/defeating-nondeterminism-in-llm-inference2.jpg\"><\/a><\/p>\n<p>Reproducibility is a bedrock of scientific progress. However, it\u2019s remarkably difficult to get reproducible results out of large language models.<\/p>\n<p>For example, you might observe that asking ChatGPT the same question multiple times provides different results. This by itself is not surprising, since getting a result from a language model involves \u201csampling\u201d, a process that converts the language model\u2019s output into a probability distribution and probabilistically selects a token.<\/p>\n<p>What might be more surprising is that even when we adjust the temperature down to 0 (thus making the sampling theoretically deterministic, since at temperature 0 the LLM always chooses the highest-probability token, a strategy called greedy sampling), LLM APIs are still not deterministic in practice (see past discussions here, here, or here). Even when running inference on your own hardware with an OSS inference library like vLLM or SGLang, sampling still isn\u2019t deterministic (see here or here).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Reproducibility is a bedrock of scientific progress. However, it\u2019s remarkably difficult to get reproducible results out of large language models. For example, you might observe that asking ChatGPT the same question multiple times provides different results. 
This by itself is not surprising, since getting a result from a language model involves \u201csampling\u201d, a process that [\u2026]<\/p>\n","protected":false},"author":709,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-221880","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/221880","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/709"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=221880"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/221880\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=221880"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=221880"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=221880"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}