{"id":223008,"date":"2025-10-07T03:30:21","date_gmt":"2025-10-07T08:30:21","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2025\/10\/topsicle-a-method-for-estimating-telomere-length-from-whole-genome-long-read-sequencing-data"},"modified":"2025-10-07T03:30:21","modified_gmt":"2025-10-07T08:30:21","slug":"topsicle-a-method-for-estimating-telomere-length-from-whole-genome-long-read-sequencing-data","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2025\/10\/topsicle-a-method-for-estimating-telomere-length-from-whole-genome-long-read-sequencing-data","title":{"rendered":"Topsicle: a method for estimating telomere length from whole genome long-read sequencing data"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/topsicle-a-method-for-estimating-telomere-length-from-whole-genome-long-read-sequencing-data2.jpg\"><\/a><\/p>\n<p>Long read sequencing technology (advanced by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (Nanopore)) is revolutionizing the genomics field [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Marx V. Method of the year: long-read sequencing. Nat Methods. 2023;20:6&ndash;11.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR43\" id=\"ref-link-section-d75272624e517\">43<\/a>], and it has major potential as a powerful tool for investigating telomere length variation within populations and between species. Reads from long read sequencing platforms are orders of magnitude longer than those from short read platforms (tens of kilobase pairs versus 100\u2013300 bp). 
These long reads have greatly aided in resolving complex and highly repetitive regions of the genome [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 44\" title=\"Warburton PE, Sebra RP. Long-read DNA sequencing: recent advances and remaining challenges. Annu Rev Genomics Hum Genet. 2023;24:109&ndash;32.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR44\" id=\"ref-link-section-d75272624e520\">44<\/a>], and near-gapless genome assemblies (also known as telomere-to-telomere assemblies) have been generated for multiple organisms [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Garg V, Bohra A, Mascher M, Spannagl M, Xu X, Bevan MW, et al. Unlocking plant genetics with telomere-to-telomere genome assemblies. Nat Genet. 2024;56:1788&ndash;99.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR45\" id=\"ref-link-section-d75272624e523\">45<\/a>, <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Li H, Durbin R. Genome assembly in the telomere-to-telomere era. Nat Rev Genet. 2024;25:658&ndash;70.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR46\" id=\"ref-link-section-d75272624e526\">46<\/a>]. Long reads can also be used to estimate telomere length, because a whole genome sequencing library generated on a long read platform will contain reads that span the entire telomere and the adjacent subtelomere region. Computational methods can then locate the telomere\u2013subtelomere boundary on these reads and use it to estimate telomere length. 
For example, telomere-to-telomere assemblies have been used to estimate telomere length by analyzing the sequences at the start and end of each gapless chromosome assembly [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020;585:79&ndash;84.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR47\" id=\"ref-link-section-d75272624e529\">47<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Chen J, Wang Z, Tan K, Huang W, Shi J, Li T, et al. A complete telomere-to-telomere assembly of the maize genome. Nat Genet. 2023;55:1221&ndash;31.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR48\" id=\"ref-link-section-d75272624e529_1\">48<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"O\u2019Donnell S, Yue J-X, Saada OA, Agier N, Caradec C, Cokelaer T, et al. Telomere-to-telomere assemblies of 142 strains characterize the genome structural landscape in Saccharomyces cerevisiae. Nat Genet. 2023;55:1390&ndash;9.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR49\" id=\"ref-link-section-d75272624e529_2\">49<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"Zhou Y, Xiong J, Shu Z, Dong C, Gu T, Sun P, et al. The telomere-to-telomere genome of Fragaria vesca reveals the genomic evolution of Fragaria and the origin of cultivated octoploid strawberry. Hortic Res. 
2023;10:uhad027.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR50\" id=\"ref-link-section-d75272624e533\">50<\/a>]. However, generating gapless genome assemblies is resource intensive and impractical for estimating telomere length across many individuals. Alternatively, methods such as TLD [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Reed J, Kirkman LA, Kafsack BF, Mason CE, Deitsch KW. Telomere length dynamics in response to DNA damage in malaria parasites. iScience. 2021;24:102082.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR51\" id=\"ref-link-section-d75272624e536\">51<\/a>], Telogator [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Stephens Z, Ferrer A, Boardman L, Iyer RK, Kocher J-PA. Telogator: a method for reporting chromosome-specific telomere lengths from long reads. Bioinformatics. 2022;38:1788&ndash;93.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR52\" id=\"ref-link-section-d75272624e539\">52<\/a>], and TeloNum [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Colt K, Petrus S, Abramson BW, Mamerto A, Hartwick NT, Michael TP. Telomere Length in Plants Estimated with Long Read Sequencing. bioRxiv; 2024. p. 2024.03.27.586973. Available from: https:\/\/www.biorxiv.org\/content\/10.1101\/2024.03.27.586973v1. Cited 2025 Mar 2.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR53\" id=\"ref-link-section-d75272624e542\">53<\/a>] analyze raw long read sequences to estimate telomere lengths. 
These methods require a known telomere repeat sequence, but the repeat can be identified through <i>k<\/i>-mer-based analysis [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Brown MR, Manuel Gonzalez de La Rosa P, Blaxter M. tidk: a toolkit to rapidly identify telomeric repeats from genomic datasets. Bioinformatics. 2025;41:btaf049.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR54\" id=\"ref-link-section-d75272624e548\">54<\/a>]. Specialized methods have also been developed to enrich for long reads originating from chromosome ends. These methods attach sequencing adapters that are complementary to the single-stranded 3\u2032 G-overhang of the telomere, and the adapters are then used to selectively amplify the chromosome ends for long read sequencing [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Sholes SL, Karimian K, Gershman A, Kelly TJ, Timp W, Greider CW. Chromosome-specific telomere lengths and the minimal functional telomere revealed by nanopore sequencing. Genome Res. 2022;32:616&ndash;28.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR55\" id=\"ref-link-section-d75272624e552\">55<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Tham C-Y, Poon L, Yan T, Koh JYP, Ramlee MK, Teoh VSI, et al. High-throughput telomere length measurement at nucleotide resolution using the PacBio high fidelity sequencing platform. Nat Commun. 
2023;14:281.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR56\" id=\"ref-link-section-d75272624e552_1\">56<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Karimian K, Groot A, Huso V, Kahidi R, Tan K-T, Sholes S, et al. Human telomere length is chromosome end&ndash;specific and conserved across individuals. Science. 2024;384:533&ndash;9.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR57\" id=\"ref-link-section-d75272624e552_2\">57<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Schmidt TT, Tyer C, Rughani P, Haggblom C, Jones JR, Dai X, et al. High resolution long-read telomere sequencing reveals dynamic mechanisms in aging and cancer. Nat Commun. 2024;15:5149.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR58\" id=\"ref-link-section-d75272624e555\">58<\/a>]. While these methods can enrich for telomeric long reads, they require protocol optimization (e.g., designing the adapter sequence to target the G-overhang), and organisms with naturally blunt-ended telomeres [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Kazda A, Zellinger B, R\u00f6ssler M, Derboven E, Kusenda B, Riha K. Chromosome end protection by blunt-ended telomeres. Genes Dev. 2012;26:1703&ndash;13.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR59\" id=\"ref-link-section-d75272624e558\">59<\/a>, <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 60\" title=\"Nelson ADL, Shippen DE. 
Blunt-ended telomeres: an alternative ending to the replication and end protection stories. Genes Dev. 2012;26:1648&ndash;52.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR60\" id=\"ref-link-section-d75272624e561\">60<\/a>], which lack a G-overhang, would be difficult to study with these methods.<\/p>\n<p>A rapidly growing volume of long read sequencing data has been generated for many organisms across the animal and plant kingdoms [61, 62]. A computational method that can use these abundant data to estimate telomere length with minimal requirements would be a powerful tool for investigating the biology of telomere length variation. So far, however, no such method is available, and building one that can be used across many different organisms requires addressing two major algorithmic considerations. The first is the ability to handle the diversity of telomere sequences across the tree of life. All vertebrates share the telomere repeat motif TTAGGG [63], and most previous long read computational methods were designed for human genomic datasets, with algorithms optimized for the TTAGGG motif. However, the telomere repeat motif is highly diverse across the animal and plant kingdoms [64,65,66,67], and some fungal and plant species even use a mix of repeat motifs, resulting in a sequence-complex telomere structure [64, 68, 69]. A new computational method would need to accommodate these diverse telomere repeat motifs, particularly in inherently noisy and error-prone long read sequencing data [70]. With recent improvements in sequencing chemistry and technology (HiFi sequencing for PacBio and the Q20+ chemistry kit for Nanopore), error rates have been substantially reduced, to around 1% [71, 72]. 
But even at this low error rate, a read spanning a telomeric region several kilobase pairs long can harbor a substantial number of sequencing errors [73]; at 1%, a 5-kbp telomere would contain roughly 50 erroneous bases. These errors can hinder identification of the correct telomere\u2013subtelomere boundary. In addition, long read sequencers are especially error-prone in homopolymer runs and other repetitive sequences [74,75,76], so the GT-rich microsatellite sequences of telomeres are expected to be particularly error-prone regions for long read sequencing. The second algorithmic consideration is how to identify the telomere\u2013subtelomere boundary. Prior long read methods [51, 52] have computed summary statistics in sliding windows and applied a threshold to determine the boundary between the telomere and subtelomere. Sliding window and threshold based analyses are common in genome analysis, but they place the burden of choosing an appropriate cutoff on the user, and for telomere length estimation that cutoff may differ depending on the sequenced organism. In addition, if the cutoff is poorly chosen, threshold based sliding window scans can inflate both false positive and false negative results [77,78,79,80,81,82].<\/p>\n<p>Here, we introduce Topsicle, a computational method that uses a novel strategy to estimate telomere lengths from raw long reads across an entire whole genome sequencing library. Methodologically, Topsicle iterates through substrings of the telomere repeat sequence of different sizes (i.e., telomere <i>k<\/i>-mers) and uses different phases of each telomere <i>k<\/i>-mer to summarize the telomere repeat content of each sequencing read. These <i>k<\/i>-mer-based summary statistics are then used to select long reads originating from telomeric regions. 
Topsicle then uses these putative telomeric reads to estimate telomere length by locating the telomere\u2013subtelomere boundary with binary segmentation change point detection [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 83\" title=\"Bai J. Estimating multiple breaks one at a time. Econometr Theory. 1997;13:315&ndash;52.\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-025-03783-4#ref-CR83\" id=\"ref-link-section-d75272624e636\">83<\/a>]. We demonstrate the high accuracy of Topsicle through simulations and apply it to long read sequencing datasets from three evolutionarily diverse plant species (<i>A. thaliana<\/i>, maize, and <i>Mimulus<\/i>) and from human cancer cell lines. We expect Topsicle to enable high-resolution exploration of telomere length in many more species and to advance a broad understanding of the genetics and evolution underlying telomere length variation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Long read sequencing technology (advanced by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (Nanopore)) is revolutionizing the genomics field [43] and it has major potential to be a powerful computational tool for investigating the telomere length variation within populations and between species. 
Read length from long read sequencing platforms is orders of magnitude longer than [\u2026]<\/p>\n","protected":false},"author":662,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,19,1523,412,41,2028],"tags":[],"class_list":["post-223008","post","type-post","status-publish","format-standard","hentry","category-biotech-medical","category-chemistry","category-computing","category-genetics","category-information-science","category-satellites"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/223008","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/662"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=223008"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/223008\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=223008"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=223008"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=223008"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}