{"id":130706,"date":"2021-11-16T18:24:20","date_gmt":"2021-11-17T02:24:20","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2021\/11\/element-selection-for-crystalline-inorganic-solid-discovery-guided-by-unsupervised-machine-learning-of-experimentally-explored-chemistry"},"modified":"2021-11-16T18:24:20","modified_gmt":"2021-11-17T02:24:20","slug":"element-selection-for-crystalline-inorganic-solid-discovery-guided-by-unsupervised-machine-learning-of-experimentally-explored-chemistry","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2021\/11\/element-selection-for-crystalline-inorganic-solid-discovery-guided-by-unsupervised-machine-learning-of-experimentally-explored-chemistry","title":{"rendered":"Element selection for crystalline inorganic solid discovery guided by unsupervised machine learning of experimentally explored chemistry"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/element-selection-for-crystalline-inorganic-solid-discovery-guided-by-unsupervised-machine-learning-of-experimentally-explored-chemistry2.jpg\"><\/a><\/p>\n<p>Machine learning (ML) models are powerful tools to study multivariate correlations that exist within large datasets but are hard for humans to identify<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature. 571, 95&ndash;98 (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#ref-CR16\" id=\"ref-link-section-d270619959e721\">16<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 23\" title=\"Canabarro, A., Fanchini, F. F., Malvezzi, A. L., Pereira, R. & Chaves, R. Unveiling phase transitions with machine learning. Phys. Rev. B. 
100, 045129 (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#ref-CR23\" id=\"ref-link-section-d270619959e724\">23<\/a><\/sup>. Our aim is to build a model that captures the chemical interactions between the element combinations that afford reported crystalline inorganic materials, noting that the aim of such models is efficacy rather than interpretability, and that as such they can be complementary guides to human experts. The model should assist expert prioritization between the promising element combinations by ranking them quantitatively. Researchers have practically understood how to identify new chemistries based on element combinations for phase-field exploration, but not at significant scale. However, the prioritization of these attractive knowledge-based choices for experimental and computational investigation is critical as it determines substantial resource commitment. The collaborative ML workflow<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\" title=\"Tschandl, P. et al. Human&ndash;computer collaboration for skin cancer recognition. Nat. Med. 26, 1229&ndash;1234 (2020).\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#ref-CR24\" id=\"ref-link-section-d270619959e728\">24<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 25\" title=\"More Than Machines. Nat. Mach. Intell. 1, 1&ndash;1 (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#ref-CR25\" id=\"ref-link-section-d270619959e731\">25<\/a><\/sup> developed here includes an ML tool trained across all available data, at a scale beyond that which humans can assimilate simultaneously, to provide a numerical ranking of the likelihood of identifying new phases in the selected chemistries. 
We illustrate the predictive power of ML in this workflow in the discovery of a new solid-state Li-ion conductor from unexplored quaternary phase fields with two anions. To train a model to assist prioritization of these candidate phase fields, we extracted 2021 <i>M<\/i><sub><i>x<\/i><\/sub><i>M <\/i>\u2032<sub><i>y<\/i><\/sub><i>A<\/i><sub>z<\/sub><i>A <\/i>\u2032<sub><i>t<\/i><\/sub> phases reported in ICSD (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#Fig1\">1<\/a>, Step 1), and associated each phase with the phase fields <i>M<\/i>-<i>M <\/i>\u2032-<i>A<\/i>-<i>A<\/i>\u2032 where <i>M<\/i>, <i>M <\/i>\u2032 span all cations, <i>A<\/i>, <i>A <\/i>\u2032 are anions {N<sup>3\u2212<\/sup>, P<sup>3\u2212<\/sup>, As<sup>3\u2212<\/sup>, O<sup>2\u2212<\/sup>, S<sup>2\u2212<\/sup>, Se<sup>2\u2212<\/sup>, Te<sup>2\u2212<\/sup>, F<sup>\u2212<\/sup>, Cl<sup>\u2212<\/sup>, Br<sup>\u2212<\/sup>, and I<sup>\u2212<\/sup>} and <i>x, y, z, t<\/i> denote concentrations (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#Fig1\">1<\/a>, Step 2). Data were augmented by 24-fold elemental permutations to enhance learning and prevent overfitting (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#MOESM1\">2<\/a>).<\/p>\n<p>ML models rely on using appropriate features (often called descriptors)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Jha, D. et al. ElemNet: deep learning the chemistry of materials from only elemental composition. Sci. Rep. 
8, 1&ndash;13 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#ref-CR26\" id=\"ref-link-section-d270619959e946\">26<\/a><\/sup> to describe the data presented, so feature selection is critical to the quality of the model. The challenge of selecting the best set of features among the multitude available for the chemical elements (e.g., atomic weight, valence, ionic radius)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Jha, D. et al. ElemNet: deep learning the chemistry of materials from only elemental composition. Sci. Rep. 8, 1&ndash;13 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#ref-CR26\" id=\"ref-link-section-d270619959e950\">26<\/a><\/sup> lies in balancing competing considerations: a small number of features usually makes learning more robust while limiting the predictive power of the resulting models, whereas a large number of features tends to make models more descriptive and discriminating while increasing the risk of overfitting. We evaluated 40 individual features<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Jha, D. et al. ElemNet: deep learning the chemistry of materials from only elemental composition. Sci. Rep. 8, 1&ndash;13 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#ref-CR26\" id=\"ref-link-section-d270619959e954\">26<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Glawe, H., Sanna, A., Gross, E. K. U. & Marques, M. A. L. The optimal one dimensional periodic table: a modified pettifor chemical scale from data mining. N. J. Phys. 
18, 093011 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#ref-CR27\" id=\"ref-link-section-d270619959e957\">27<\/a><\/sup> (Supplementary Figs. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#MOESM1\">4<\/a>, <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"https:\/\/www.nature.com\/articles\/s41467-021-25343-7#MOESM1\">5<\/a>) that have reported values for all elements and identified a set of 37 elemental features that best balance these considerations. We thus describe each phase field of four elements as a vector in a 148-dimensional feature space (37 features \u00d7 4 elements = 148 dimensions).<\/p>\n<p>To infer relationships between entries in such a high-dimensional feature space, in which the training data are necessarily sparsely distributed<sup>28<\/sup>, we employ the variational autoencoder (VAE), an unsupervised neural network-based dimensionality reduction method (Fig. 1, Step 3), which quantifies nonlinear similarities in high-dimensional unlabelled data<sup>29<\/sup> and, going beyond the conventional autoencoder, pays close attention to the distribution of the data features in multidimensional space. A VAE is a two-part neural network, where one part is used to compress (encode) the input vectors into a lower-dimensional (latent) space, and the other to decode vectors in latent space back into the original high-dimensional space. Here we choose to encode the 148-dimensional input feature space into a four-dimensional latent feature space (Supplementary Methods).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Machine learning (ML) models are powerful tools to study multivariate correlations that exist within large datasets but are hard for humans to identify16,23. 
Our aim is to build a model that captures the chemical interactions between the element combinations that afford reported crystalline inorganic materials, noting that the aim of such models is efficacy rather [\u2026]<\/p>\n","protected":false},"author":427,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19,6,8],"tags":[],"class_list":["post-130706","post","type-post","status-publish","format-standard","hentry","category-chemistry","category-robotics-ai","category-space"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/130706","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/427"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=130706"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/130706\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=130706"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=130706"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=130706"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}