A new algorithm, Evo 2, was trained on roughly 128,000 genomes (9.3 trillion DNA letter pairs) spanning all of life’s domains, making it the largest generative AI model for biology to date. Built by scientists at the Arc Institute, Stanford University, and Nvidia, Evo 2 can write whole chromosomes and small genomes from scratch.
It also learned how DNA mutations affect proteins, RNA, and overall health, shedding light in particular on “non-coding” regions. These mysterious sections of DNA don’t make proteins but often control gene activity and are linked to diseases.
The team has released Evo 2’s software code and model parameters to the scientific community for further exploration. Researchers can also access the tool through a user-friendly web interface. With Evo 2 as a foundation, scientists may develop more specific AI models. These could predict how mutations affect a protein’s function or how genes operate differently across cell types, or even help researchers design new genomes for synthetic biology.