{"id":168552,"date":"2023-07-29T08:28:19","date_gmt":"2023-07-29T13:28:19","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2023\/07\/a-universal-null-distribution-for-topological-data-analysis"},"modified":"2023-07-29T08:28:19","modified_gmt":"2023-07-29T13:28:19","slug":"a-universal-null-distribution-for-topological-data-analysis","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2023\/07\/a-universal-null-distribution-for-topological-data-analysis","title":{"rendered":"A universal null-distribution for topological data analysis"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/a-universal-null-distribution-for-topological-data-analysis2.jpg\"><\/a><\/p>\n<p>One of the key challenges in TDA is to distinguish between \u201c<i>signal<\/i>\u201d\u2014meaningful structures underlying the data, and \u201cnoise\u201d\u2014features that arise from the local randomness and inaccuracies within the data<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Chazal, F., Cohen-Steiner, D. & Lieutier, A. A sampling theory for compact sets in Euclidean space. Discrete Comput. Geom. 41, 461&ndash;479 (2009).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR15\" id=\"ref-link-section-d102107804e386\">15<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Chazal, F., Guibas, L. J., Oudot, S. Y. & Skraba, P. Persistence-based clustering in Riemannian manifolds. J. ACM (JACM) 60, 41 (2013).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR16\" id=\"ref-link-section-d102107804e386_1\">16<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Niyogi, P., Smale, S. & Weinberger, S. 
Finding the homology of submanifolds with high confidence from random samples. Discrete Comput. Geom. 39(1&ndash;3), 419&ndash;441 (2008).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR17\" id=\"ref-link-section-d102107804e389\">17<\/a><\/sup>. The most prominent solution developed in TDA to address this issue is <i>persistent homology<\/i>. Briefly, it identifies structures such as holes and cavities (\u201cair pockets\u201d) formed by the data, and records the scales at which they are created and terminated (<i>birth<\/i> and <i>death<\/i>, respectively). The common practice in TDA has been to use this birth-death information to assess the statistical significance of topological features<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Blumberg, A. J., Gal, I., Mandell, M. A., Pancia, M. Robust statistics, hypothesis testing, and confidence intervals for persistent homology on metric measure spaces. Found. Comput. Math. 14, 745&ndash;789 (2013).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR18\" id=\"ref-link-section-d102107804e402\">18<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Fasy, B. T. et al. Confidence sets for persistence diagrams. Ann. Stat. 42, 2301&ndash;2339 (2014).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR19\" id=\"ref-link-section-d102107804e402_1\">19<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Reani, Y., & Bobrowski, O. Cycle registration in persistent homology with applications in topological bootstrap. IEEE Trans. Pattern Anal. Mach. Intell. 
45, 5579&ndash;5593 (2022).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR20\" id=\"ref-link-section-d102107804e402_2\">20<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Vejdemo-Johansson, M. & Mukherjee, S. Multiple hypothesis testing with persistent homology. Found. Data Sci. 4, 667&ndash;705 (2022).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR21\" id=\"ref-link-section-d102107804e405\">21<\/a><\/sup>. However, research so far has yet to provide an approach that is generic, robust, and theoretically justified. A parallel line of research has been the theoretical probabilistic analysis of <i>persistent homology<\/i> generated by random data, as a means to establish a null-distribution. While this direction has been fruitful<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Hiraoka, Y., Shirai, T. & Trinh, K. D. Limit theorems for persistence diagrams. Ann. Appl. Probab. 28, 2740&ndash;2780 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR22\" id=\"ref-link-section-d102107804e410\">22<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Owada, T. & Adler, R.J. Limit theorems for point processes under geometric constraints (and topological crackle). Ann. Probab. 45, 2004&ndash;2055 (2017).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR23\" id=\"ref-link-section-d102107804e410_1\">23<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Yogeshwaran, D. & Adler, R. J. On the topology of random complexes built over stationary point processes. Ann. Appl. Probab. 
25, 3338&ndash;3380 (2015).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR24\" id=\"ref-link-section-d102107804e410_2\">24<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 25\" title=\"Yogeshwaran, D., Subag, E., Adler, R. J. Random geometric complexes in the thermodynamic regime. Probab. Theory Relat. Fields 167, 107&ndash;142 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-37842-2#ref-CR25\" id=\"ref-link-section-d102107804e413\">25<\/a><\/sup>, its use in practice has been limited. The main gap between theory and practice is that these studies indicate that the distribution of noise in <i>persistent homology<\/i>: (a) does not have a simple closed-form description, and (b) strongly depends on the model generating the point-cloud.<\/p>\n<p>Our main goal in this paper is to refute the last premise, and to make the case that the distribution of noise in <i>persistent homology<\/i> of random point-clouds is in fact <i>universal<\/i>. Specifically, we claim that the limiting distribution of <i>persistence values<\/i> (measured using the death\/birth ratio) is independent of the model generating the point-cloud. This result is loosely analogous to the central limit theorem, where sums of many different types of random variables always converge to the normal distribution. The emergence of such <i>universality<\/i> for <i>persistence diagrams<\/i> is highly surprising.<\/p>\n<p>We support our <i>universality<\/i> statements by an extensive body of experiments, including point-clouds generated by different geometries, topologies, and probability distributions. These include simulated data as well as data from real-world applications (image processing, signal processing, and natural language processing). 
Our main goal here is to introduce the unexpected behavior of statistical <i>universality<\/i> in <i>persistence diagrams<\/i>, in order to initiate a paradigm shift in stochastic topology that will lead to the development of a new theory. Developing this new theory, and proving the conjectures made here, is anticipated to be an exciting yet challenging long journey, and is outside the scope of this paper. Based on our <i>universality<\/i> conjectures, we develop a powerful hypothesis testing framework for <i>persistence diagrams<\/i>, allowing us to compute numerical significance measures for individual features using very few assumptions on the underlying model.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the key challenges in TDA is to distinguish between \u201csignal\u201d\u2014meaningful structures underlying the data, and \u201cnoise\u201d\u2014features that arise from the local randomness and inaccuracies within the data. The most prominent solution developed in TDA to address this issue is persistent homology. 
Briefly, it identifies structures such as holes and cavities (\u201cair pockets\u201d) formed [\u2026]<\/p>\n","protected":false},"author":661,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-168552","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/168552","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/661"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=168552"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/168552\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=168552"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=168552"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=168552"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}