Wight - Boyce Thompson Institute

Haley Wight

Year: 2015

De Novo Assembly and Annotation of the Glomus versiforme Genome

Project Summary

Glomus versiforme is a species of arbuscular mycorrhizal (AM) fungus. This type of fungus forms a mutualistic relationship with most vascular plants: the fungus colonizes the plant's roots to acquire carbon and supplies the plant with phosphate in return. Because phosphorus is important to plant growth, it is routinely added to fertilizers, a practice that is expensive and inefficient. Genomic information would facilitate the exploration of the mechanisms underlying this ecologically important symbiosis. Although Glomus is the largest genus of AM fungi, no member of the genus has been sequenced. Our collaborators used Next-Generation Sequencing (NGS) technology to gather a large collection of short sequencing reads. The focus of my project was to assemble these data into a contiguous genome and annotate the protein-coding genes. To prepare the data for assembly, the adapters and barcodes were removed, duplicate reads were collapsed, and sequencing errors were corrected. By analyzing this large set of short reads, we estimated the genome size of G. versiforme to be around 311.6 Mb, making this the largest sequenced fungal genome to date. Sequence analysis also indicated that the genome is highly homozygous and highly repetitive. The high-quality cleaned short reads were then assembled de novo using SOAPdenovo2; the resulting assembly has a total size of 151.2 Mb and an N50 of ~15 Kb. Once the genome was assembled, a set of conserved eukaryotic genes, ESTs, and protein evidence was aligned to it to train ab initio gene predictors. The results from these predictors were consolidated into a genome annotation of 9,546 genes using the MAKER pipeline. This genome assembly and annotation will provide a basis for future research on the symbiosis mechanisms of G. versiforme.
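
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed from a list of contig lengths: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name `n50` and the example lengths are hypothetical and chosen only for illustration; they are not taken from the G. versiforme assembly or from the tools used in the project.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")

    # Walk from the longest contig downward, accumulating length.
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths (in bp), for illustration only.
    example = [40_000, 15_000, 15_000, 10_000, 5_000, 5_000]
    print(n50(example))  # prints 15000
```

In this toy example the total length is 90,000 bp, and the running sum first reaches the 45,000 bp halfway mark at the second contig, so the N50 is 15,000 bp. In practice, assembly statistics such as this are usually reported by the assembler or by a dedicated evaluation tool rather than computed by hand.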

My Experience

I am a Bioinformatics major at Ramapo College. However, most of my courses focus on either computer science or biology independently. Being involved in this research allowed me to fully integrate my interests in both fields. This internship offered a wealth of resources: I was able to work with Next-Generation Sequencing data for the first time, on about 30 cores. The most valuable resource was the guidance from my mentor, who taught me new ways to troubleshoot problems, advanced techniques within the UNIX environment, and many other skills necessary to succeed in a research environment. BTI also held professional events for interns outside the lab, which gave me a forum to learn about other research projects and to seek guidance from others in my field. Overall, this internship has confirmed my interest in bioinformatics and has given me the confidence to pursue a Ph.D. in the field.