{"id":206999,"date":"2025-02-22T18:05:55","date_gmt":"2025-02-23T00:05:55","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2025\/02\/ai-can-now-model-and-design-the-genetic-code-for-all-domains-of-life-with-evo-2"},"modified":"2025-02-23T12:40:25","modified_gmt":"2025-02-23T18:40:25","slug":"ai-can-now-model-and-design-the-genetic-code-for-all-domains-of-life-with-evo-2","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2025\/02\/ai-can-now-model-and-design-the-genetic-code-for-all-domains-of-life-with-evo-2","title":{"rendered":"AI can now model and design the genetic code for all domains of life with Evo 2"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/ai-can-now-model-and-design-the-genetic-code-for-all-domains-of-life-with-evo-22.jpg\"><\/a><\/p>\n<p>Very excellent.<\/p>\n<hr>\n<p>Arc Institute researchers have developed a machine learning model called Evo 2 that is trained on the DNA of over 100,000 species across the entire tree of life. Its deep understanding of biological code means that Evo 2 can identify patterns in gene sequences across disparate organisms that experimental researchers would need years to uncover. The model can accurately identify disease-causing mutations in human genes and is capable of designing new genomes that are as long as the genomes of simple bacteria.<\/p>\n<p>Evo 2\u2019s developers, a team of scientists from Arc Institute and NVIDIA working with collaborators at Stanford University, UC Berkeley, and UC San Francisco, posted details about the model in a <a href=\"https:\/\/arcinstitute.org\/manuscripts\/Evo2\">preprint<\/a> on February 19, 2025, along with a user-friendly interface called <a href=\"https:\/\/arcinstitute.org\/tools\/evo\/evo-designer\">Evo Designer<\/a>. 
The Evo 2 code is publicly available on <a href=\"https:\/\/github.com\/arcinstitute\/evo2\" rel=\"noopener noreferrer\" target=\"_blank\">Arc\u2019s GitHub<\/a> and is also integrated into the <a href=\"https:\/\/github.com\/NVIDIA\/bionemo-framework\" rel=\"noopener noreferrer\" target=\"_blank\">NVIDIA BioNeMo framework<\/a> as part of a <a href=\"https:\/\/arcinstitute.org\/news\/news\/arc-nvidia\">collaboration between Arc Institute and NVIDIA<\/a> to accelerate scientific research. Arc Institute also worked with AI research lab <a href=\"https:\/\/goodfire.notion.site\/DRAFT-Evo-2-Mech-Interp-Blog-post-18ff566bfbc18019ba9bfa6d472ebe1a\" rel=\"noopener noreferrer\" target=\"_blank\">Goodfire<\/a> to develop a <a href=\"https:\/\/arcinstitute.org\/tools\/evo\/evo-mech-interp\">mechanistic interpretability visualizer<\/a> that reveals the biological features and patterns the model learns to recognize in genomic sequences. By sharing its training data, training and inference code, and model weights, the Evo team has released what it describes as the largest-scale fully open-source AI model to date.<\/p>\n<p>Evo 2 builds on its predecessor <a href=\"https:\/\/www.science.org\/doi\/10.1126\/science.ado9336\" rel=\"noopener noreferrer\" target=\"_blank\">Evo 1<\/a>, which was trained only on prokaryotic and phage genomes. Evo 2 is the largest artificial intelligence model in biology to date, trained on over 9.3 trillion nucleotides (the building blocks of DNA and RNA) from over 128,000 whole genomes as well as metagenomic data. In addition to an expanded collection of bacterial, archaeal, and phage genomes, Evo 2\u2019s training data include genomes from humans, plants, and other single-celled and multicellular species in the eukaryotic domain of life.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Very excellent. Arc Institute researchers have developed a machine learning model called Evo 2 that is trained on the DNA of over 100,000 species across the entire tree of life. 
Its deep understanding of biological code means that Evo 2 can identify patterns in gene sequences across disparate organisms that experimental researchers would need years [\u2026]<\/p>\n","protected":false},"author":661,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,412,6],"tags":[],"class_list":["post-206999","post","type-post","status-publish","format-standard","hentry","category-biotech-medical","category-genetics","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/206999","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/661"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=206999"}],"version-history":[{"count":1,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/206999\/revisions"}],"predecessor-version":[{"id":207032,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/206999\/revisions\/207032"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=206999"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=206999"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=206999"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}