“Identifying long non-coding RNA, misannotated and novel genes in the watermelon genome using PacBio Iso-Seq”
Watermelon (Citrullus lanatus) is an economically important and widely cultivated vegetable crop in the cucurbit family, which also includes cucumber, pumpkin, squash and muskmelon. An improved watermelon genome would be an important resource for research on watermelon and its close relatives. In this project, to improve the watermelon genome annotation and to identify long non-coding RNAs (lncRNAs), we generated large-scale transcriptome sequences from mixed watermelon tissues using PacBio Iso-Seq technology. Errors in the transcriptome sequences were corrected using Illumina RNA-Seq data, and full-length transcript isoforms were then extracted. A total of 96.5% of the isoforms could be aligned to the watermelon reference genome.
Based on the alignments, we identified a total of 1,326 lncRNAs in the watermelon genome, including 49 intronic, 845 intergenic and 432 antisense lncRNAs. We also found 350 novel genes that were not previously annotated in the reference genome, which could code for proteins such as a defensin-like protein and a Mads1 protein. In addition, we identified 851 potential errors in the previous annotation, where genes annotated as separate in the reference genome should be combined because multiple full-length reads spanned them. The improved gene predictions in the watermelon genome, as well as the newly identified lncRNAs, are valuable resources for research on watermelon and for a better overall understanding of the cucurbit family.
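The misannotation check described above can be illustrated with a minimal sketch. It is not the pipeline used in this project: the function name, the tuple layouts, and the threshold of two supporting reads (one reading of "multiple") are all assumptions made for illustration. The idea is simply to count, for each pair of neighboring annotated genes on the same strand, how many full-length reads overlap both.

```python
from collections import Counter

def find_merge_candidates(genes, reads, min_reads=2):
    """Flag neighboring annotated genes that are bridged by full-length reads.

    genes: list of (gene_id, chrom, strand, start, end)   # hypothetical layout
    reads: list of (chrom, strand, start, end)            # hypothetical layout
    min_reads: assumed threshold for "multiple" supporting reads
    Coordinates are treated as closed intervals.
    """
    support = Counter()
    for r_chrom, r_strand, r_start, r_end in reads:
        # Genes on the same chromosome and strand that the read overlaps,
        # ordered by start position.
        hit = sorted(
            (g for g in genes
             if g[1] == r_chrom and g[2] == r_strand
             and r_start <= g[4] and g[3] <= r_end),
            key=lambda g: g[3],
        )
        # Each consecutive pair of overlapped genes gains one supporting read.
        for left, right in zip(hit, hit[1:]):
            support[(left[0], right[0])] += 1
    # Keep only pairs supported by at least min_reads reads.
    return {pair: n for pair, n in support.items() if n >= min_reads}
```

For example, given two adjacent genes on the same strand and several reads that each overlap both, the function returns that gene pair together with its read count; a pair bridged by a single read is not reported under the assumed threshold.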
My internship at BTI has been a very valuable and memorable experience. Prior to BTI, I had taken both computer science and biology classes, but I had never combined the two for research. Through this experience, I have gained a better grasp of the command line, been exposed to a multitude of pipelines and software commonly used in bioinformatics, and experienced a research project in its beginning, middle and end stages. My mentor, Xin Wang, was very supportive and both guided and challenged me throughout the project. After listening to BTI researchers talk about their work and its real-world applications, my interest in and curiosity about plants and bioinformatics have increased significantly.