Researchers develop new, more accurate computational tool for long-read RNA sequencing

On the journey from gene to protein, a nascent RNA molecule can be cut and joined, or spliced, in different ways before being translated into a protein. This process, known as alternative splicing, allows a single gene to encode several different proteins. Alternative splicing occurs in many biological processes, like when stem cells mature into tissue-specific cells. In the context of disease, however, alternative splicing can be dysregulated. Therefore, it is important to examine the transcriptome—that is, all the RNA molecules that might stem from genes—to understand the root cause of a condition.

However, historically it has been difficult to “read” RNA molecules in their entirety because they are usually thousands of bases long. Instead, researchers have relied on so-called short-read RNA sequencing, which breaks RNA molecules up and sequences them in much shorter pieces, somewhere between 200 and 600 bases, depending on the platform and protocol. Computer programs are then used to reconstruct the full sequences of RNA molecules.

Short-read RNA sequencing can give highly accurate sequencing data, with a low per-base error rate of approximately 0.1% (meaning one base is incorrectly determined for every 1,000 bases sequenced). Nevertheless, it is limited in the information that it can provide due to the short length of the sequencing reads. In many ways, short-read RNA sequencing is like breaking a large picture into many jigsaw pieces that are all the same shape and size and then trying to piece the picture back together.
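To make the 0.1% figure concrete, the short sketch below works through the arithmetic for reads of a few lengths. It is only an illustration: the function names are hypothetical, and it assumes that errors occur independently at each base with the same probability, which real sequencing platforms only approximate.

```python
def expected_errors(read_length: int, error_rate: float) -> float:
    """Expected number of miscalled bases in one read.

    Assumes (for illustration) a uniform per-base error rate.
    """
    return read_length * error_rate


def prob_error_free(read_length: int, error_rate: float) -> float:
    """Probability that a read contains no miscalled bases.

    Assumes (for illustration) that errors at different bases are independent.
    """
    return (1.0 - error_rate) ** read_length


if __name__ == "__main__":
    rate = 0.001  # 0.1% per-base error rate, as quoted in the text
    for length in (200, 600, 1000):
        print(
            f"{length} bases: "
            f"{expected_errors(length, rate):.2f} expected errors, "
            f"{prob_error_free(length, rate):.1%} chance of an error-free read"
        )
```

Under these assumptions, a 1,000-base stretch carries one expected error on average, matching the "one in 1,000" description, while a typical short read of a few hundred bases is more likely than not to contain no errors at all.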
