{"id":224343,"date":"2025-10-31T22:02:28","date_gmt":"2025-11-01T03:02:28","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2025\/10\/benchmarking-large-language-models-for-personalized-biomarker-based-health-intervention-recommendations"},"modified":"2025-10-31T22:02:28","modified_gmt":"2025-11-01T03:02:28","slug":"benchmarking-large-language-models-for-personalized-biomarker-based-health-intervention-recommendations","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2025\/10\/benchmarking-large-language-models-for-personalized-biomarker-based-health-intervention-recommendations","title":{"rendered":"Benchmarking large language models for personalized, biomarker-based health intervention recommendations"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/benchmarking-large-language-models-for-personalized-biomarker-based-health-intervention-recommendations.jpg\"><\/a><\/p>\n<p>The use of large language models (LLMs) in clinical diagnostics and intervention planning is expanding, yet their utility for personalized recommendations for longevity interventions remains opaque. We extended the BioChatter framework to benchmark LLMs\u2019 ability to generate personalized longevity intervention recommendations based on biomarker profiles while adhering to key medical validation requirements. Using 25 individual profiles across three different age groups, we generated 1,000 diverse test cases covering interventions such as caloric restriction, fasting and supplements. Evaluating 56,000 model responses via an LLM-as-a-Judge system with clinician-validated ground truths, we found that proprietary models outperformed open-source models, especially in comprehensiveness. However, even with Retrieval-Augmented Generation (RAG), all models exhibited limitations in addressing key medical validation requirements, prompt stability, and handling age-related biases. 
Our findings highlight limited suitability of LLMs for unsupervised longevity intervention recommendations. Our open-source framework offers a foundation for advancing AI benchmarking in various medical contexts.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The use of large language models (LLMs) in clinical diagnostics and intervention planning is expanding, yet their utility for personalized recommendations for longevity interventions remains opaque. We extended the BioChatter framework to benchmark LLMs\u2019 ability to generate personalized longevity intervention recommendations based on biomarker profiles while adhering to key medical validation requirements. Using 25 individual [\u2026]<\/p>\n","protected":false},"author":685,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,269,6],"tags":[],"class_list":["post-224343","post","type-post","status-publish","format-standard","hentry","category-biotech-medical","category-life-extension","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/224343","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/685"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=224343"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/224343\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=224343"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/c
ategories?post=224343"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=224343"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}