Zhangjun Fei
Professor
Developing and curating powerful genomic resources and computational tools and applying integrative bioinformatics approaches to harness vast ‘omics’ datasets for a deeper understanding of crop origin, domestication, and key traits.

How can large-scale plant genomics datasets be efficiently integrated to advance biological discovery and crop improvement?
Email: zf25@cornell.edu
Adjunct Professor
Section of Plant Pathology & Plant-Microbe Biology
School of Integrative Plant Science
Cornell University
Graduate Fields: Plant Pathology & Plant-Microbe Biology; Plant Biology
Wu S#, Sun H#, Zhao X#, Hamilton JP, Mollinari M, Gesteira GS, Kitavi M, Yan M, Wang H, Yang J, Yencho GC, Buell CR, Fei Z* (2025) Phased chromosome-level assembly provides insight into the genome architecture of hexaploid sweetpotato. Nature Plants 11:1951-1959
Research Briefing: Decoding the complexity of the hexaploid sweet potato genome. Nature Plants 11:1712-1713
Zhang X, Tang C, Jiang B, Zhang R, Li M, Wu Y, Yao Z, Huang L, Luo Z, Zou H, Yang Y, Wu M, Chen A, Wu S, Hou X, Xu Liu X*, Fei Z*, Fu J*, Wang Z* (2025) Refining polyploid breeding in sweetpotato through allele dosage enhancement. Nature Plants 11:36-48
Research Briefing: Understanding the genomic basis to empower sweet potato breeding. Nature Plants 11:14-15
Chen W, Xie Q, Fu J, Li S, Shi Y, Lu J, Zhang Y, Zhao Y, Ma R, Li B, Zhang B, Grierson D, Yu M*, Fei Z*, Chen K* (2025) Graph pangenome reveals the regulation of malate accumulation in blood-fleshed peach by NAC transcription factors. Genome Biology 26:7
Hu X, Xu C, Li X, Li L, Bao Y, Gu M, Li X, Huo L, Gong J, Li X, Wang M, Xu K, Yin X, Fei Z*, Sun X* (2025) Subgenome dominance in allotetraploid Actinidia valvata regulates RNA m6A modification for waterlogging tolerance. Advanced Science 12:e03974
Lai E, Guo S, Wu P, Qu M, Yu X, Hao C, Li S, Peng H, Yi Y, Zhou M, Fu G, Li X, Liu H, Zheng Y*, Wang X*, Fei Z*, Gao L* (2025) Genome of root celery and population genomic analysis reveal the complex breeding history of celery. Plant Biotechnology Journal 23:946-959
Research Overview
The advance of high-throughput technologies has given rise to a wealth of genome-wide data encompassing environmental, genetic, and evolutionary diversity. This has revolutionized agricultural research and crop breeding, yielding abundant resources and insights that drive key innovations. However, it remains a major challenge to effectively digest these massive datasets to formulate hypotheses, explore genome evolution, and elucidate regulatory mechanisms underlying critical biological processes. To address this challenge, my group has focused on developing genomic tools, databases, resources, and novel algorithms to analyze and integrate large-scale ‘omics’ datasets, with the goal of uncovering and understanding important biological phenomena.
Research in my lab focuses on:
- Developing biological databases and computational tools for efficient storage, management, dissemination, and mining of diverse ‘omics’ datasets
- Building large-scale genomic resources to advance research and crop improvement
- Applying integrated bioinformatics and genomics approaches for trait discovery, crop improvement, and knowledge advancement
Databases
- Cucurbit Genomics Database
- Tomato Functional Genomics Database
- Tomato Epigenome Database
- Pepper Genomics Database
- Kiwifruit Genome Database
- SpinachBase
- Whitefly Genomics Database
- Trichoplusia ni Genome Database
- Chinese Tomato Virome
- Pan-African Sweet Potato Virome
- Plant Genome Editing Database
Bioinformatics
- iTAK – A package to identify and classify plant transcription factors and protein kinases.
- VirusDetect – An automated pipeline for efficient virus discovery using deep sequencing of small RNAs.
- Plant MetGenMAP – A web-based tool for comprehensive mining and integration of gene expression and metabolite changes in the context of biochemical pathways.
- iAssembler – A de novo assembly package for transcriptome sequences generated using 454 or Sanger platforms.
Lab Members
Honghe Sun
Xuebo Zhao
Xuanbo Zhang
In the News
Cucumber is an economically important crop worldwide, ranking as the third most-produced vegetable after tomatoes and onions. Yet breeding improved varieties—plants that are more resilient, produce better-shaped fruit, or are...
BTI, Meiogenix, and FFAR Announce $2 Million Breakthrough Tomato Genetics Collaboration
Research Lays the Foundation for Breakthroughs in Global Food Security
In a landmark $2 million initiative, the Boyce Thompson Institute (BTI) and biotechnology company Meiogenix have launched a collaboration to develop drought- and disease-resistant tomatoes by tapping...
The sweetpotato feeds millions worldwide, especially in sub-Saharan Africa, where its natural resilience to climate extremes makes it crucial for food security. But this humble root vegetable has guarded its...
Study Reveals Role of Allele Dosage in Improving Sweetpotato Traits
Sweetpotatoes are an agricultural powerhouse that feeds millions globally. However, their complex genetics make it challenging for breeders to understand and improve traits like yield, disease resistance, and nutritional content....
Study Finds Genetic Mechanisms Behind High-Yield Apple Trees
Apples rank among the world’s most valuable fruit crops, with production spanning more than 100 countries. Some apple trees naturally develop into what farmers call “spur-type” varieties—compact trees that are...
Unlocking the Genetic Mysteries of Modern Roses
Roses are one of the world’s most beloved and widely cultivated ornamental plants, captivating hearts and adorning gardens for centuries. Despite their popularity, the genetic origins and breeding history of...
Internships
BTI offers a summer research experience program for undergraduate and high school students.
Intern Projects in the Fei Lab
Genomics and bioinformatics have revolutionized plant research and crop breeding. Reference genomes have played a central role in advancing basic research, gene/QTL cloning, molecular marker discovery, marker-assisted breeding, and our understanding of genome evolution and crop domestication. However, reference genomes derived from only one or a few accessions cannot fully capture the genetic diversity within a crop species, leading to the loss of significant and valuable genetic information. To address this limitation, Dr. Fei’s group has focused on comprehensive investigations into the pangenome of horticultural crops to better understand the genetic basis of their origin, domestication and key agronomic traits.
Previous Interns
Anthony Corbett
Anthony is currently at Rochester Institute of Technology. For the duration of his internship, Anthony worked under the supervision of Dr. Zhangjun Fei on PathOmics, integrating omics datasets with pathway information.
Viktor Vasilev
Viktor worked on developing a pipeline for identifying SNPs from cDNA sequences generated by 454 sequencing technology and on creating an EST unigene build.
Aileen Tolentino
Aileen worked on developing a small RNA analysis pipeline, mainly to identify miRNAs. She also worked on integrating the analysis results into the Tomato Functional Genomics Database.
Catherine A. Peluso
Functional module identification with tomato gene and metabolite expression profiles.
David Selassie Opoku
Identification of Virus Genome sequences from RNA-seq data of a field-grown tomato plant
Viral diseases in crops have a detrimental agricultural and economic impact globally, especially in developing countries. However, efforts to mitigate the impact of crop viruses are hampered by the lack of low-cost and efficient tools that can detect and characterize crop viruses across geographic regions. With the recent advent of next-generation sequencing technologies, novel methods could be developed to efficiently identify plant virus genomes. In this project, we propose a novel method that uses deep sequencing of plant transcriptomes (RNA-seq) to detect virus genomes. By de novo assembly of an RNA-seq dataset generated from fruits of an Ithaca field-grown tomato plant (cultivar M82), we were able to identify three virus genomes, although the plant did not show any visible disease symptoms: potato virus Y, southern tomato virus and tomato mosaic virus. The identified potato virus Y and southern tomato virus are the same as previously reported genomes (GenBank Acc#: X12456 and EF442780, respectively), while the tomato mosaic virus is a new isolate, which shares 86% nucleotide sequence identity with the previously reported genome (GenBank Acc#: AF332868). With this approach, it will be highly efficient to identify and characterize virus genomes in major food crops across geographic regions, a key step towards the overall goal of reducing crop loss due to viral diseases.
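A figure such as "86% nucleotide sequence identity" is typically derived from a pairwise alignment between the assembled contig and a reference genome. The following is a minimal sketch of that calculation, assuming the two sequences have already been aligned to equal length with `-` marking gaps. The function name and the toy sequences are hypothetical, and alignment tools may define identity slightly differently (for example, over the aligned region only).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment, not real virus sequence data.
    print(percent_identity("ACGTAC-T", "ACGAACGT"))
```

In practice the alignment itself would come from a dedicated aligner; this sketch only covers the final counting step.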
My Experience
Working as a bioinformatics intern in the Fei lab at the Boyce Thompson Institute gave me the opportunity to finally combine my knowledge of biology and computer science in real-world research, an opportunity not available at my liberal arts college. I enriched my skills in programming while learning new biological concepts and solidifying old ones through work with next-generation sequencing tools. The most exciting part of this summer experience was the chance to work on the tomato virus genome project, a precursor to a bigger project on the identification of virus genomes in Pan-African sweet potato that the Fei lab will be working on. This opportunity not only strengthened my desire to study bioinformatics or computational biology at the graduate level but also, as a student from Ghana, opened up the possibility of focusing on plant and agricultural research in developing regions such as sub-Saharan Africa.
Robert Langan
Comparative Transcriptome Analysis of Watermelon Flesh and Rind During Fruit Development
The flesh and rind of watermelon fruit are fundamentally different, and with the advent of high throughput sequencing, it is possible to determine this difference at the level of gene expression. Through comparative transcriptome sequencing and analysis of watermelon flesh and rind at four critical stages of fruit development, immature white (10 days after pollination (DAP)), white-pink flesh (18 DAP), red flesh (26 DAP) and over-ripe (34 DAP), we were able to identify a total of 764, 1389, 4305, and 3358 differentially expressed genes between flesh and rind at 10, 18, 26, and 34 DAP, respectively. Further characterization of these differentially expressed genes indicated that functional categories such as responses to various biotic/abiotic stresses and photosynthesis were highly enriched in all four stages, whereas carotene/carotenoid metabolic process and sugar metabolism including hexose catabolic process, glucose catabolic process, and monosaccharide catabolic process were highly enriched in later stages of fruit development. These results supported the major physiological differences including sugar content and fruit color observed between flesh and rind.
My Experience
This internship opportunity has been a great opportunity to look into the field of bioinformatics. Before this summer, I did not have a good idea what it is like to work in this field. Now, after working with Dr. Fei and his lab group I have a new appreciation for what it is like to work with very large data sets. His lab group was very helpful and answered whatever questions I had and pointed me in the right direction when I was confused about what to do next. As a computational biologist, this was a worthwhile experience because it helps narrow down this broad field that I wish to enter as I start to look for graduate programs.
Samantha Klasfeld
Virus Identification in Sweet Potato Samples Collected from Mozambique and Ghana Through Analyzing Deep siRNA Sequences
Sweet potato, Ipomoea batatas (L.) Lam. (Family Convolvulaceae), is among the most important food crops in the world and an extremely important food crop for subsistence farmers in sub-Saharan Africa (SSA). In SSA, sweet potato production is very low, and viruses are regarded as a major limiting factor in the production of this crop. Recently, deep sequencing of virus siRNAs followed by de novo assembly has proven to be an efficient approach for identifying known and novel viruses in plants. The major aim of my summer intern project was to analyze siRNA sequences generated from 35 and 38 sweet potato samples collected from Mozambique and Ghana, respectively, to identify known and novel sweet potato viruses. First, the siRNA sequences were aligned to the virus sequence database to identify known viruses. Then the siRNA sequences were de novo assembled, and the resulting contig sequences were compared to the known virus sequence database at the amino acid level using BLAST to identify novel virus sequences.
My Experience
The BTI bioinformatics program has given me the opportunity to experience a new species of lab. Escaping the pipettes and glassware for the summer, I was thrilled when I realized how much quicker experiments on the computer move than those done in wet labs. Though we only had ten weeks, I felt I could be more creative while simultaneously maintaining a timeline and obtaining significant amounts of data. I built a better foundation in programming by practicing Linux, Perl, etc., and Professor Fei was always willing to answer all my questions. Being particularly interested in the field of genetics and genomics, I was especially interested in relating the programming I studied in class to the study of biology. This internship excelled in giving me that exposure and the confidence to build on what I learned.
Kevin Nguyen
Utilizing breakpoint detection algorithms to locate structural variations in wild and cultivated cucumbers
What makes each organism and species unique is its genetic makeup. The majority of these differences can be attributed to structural variations (SVs). Ranging in type from duplications to insertions and deletions, SVs influence which genotypes are present and thus which phenotypes are expressed. To document which genes confer strengths or weaknesses, it is important that these SVs are identified and annotated. With this kind of information, evolution and population structure can be inferred, and the SVs can be utilized in marker-assisted breeding. In this project, the focus is on detecting deletions, one type of SV, in a cultivated cucumber, Cucumis sativus L., and a wild cucumber. Several years ago, the genome of cucumber was successfully sequenced and annotated; recently the genome of the wild species was sequenced using next-generation methods. Using certain properties of the next-generation sequencing data, the two genomes can be aligned to each other. To detect the deletions, the previously published literature was surveyed to find an appropriate algorithm. Pindel, developed by The Genome Institute at Washington University, was picked for detecting breakpoints in the alignment to locate and measure the deletions. Later on, other algorithms were implemented to extract and filter key information, such as breakpoint location and deletion length, based on certain parameters. The result is a list of genes and phenotypes that were lost in the wild type.
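The post-processing step described above (filtering deletions by length and relating them to annotated genes) could look roughly like the sketch below. It does not parse Pindel output; it assumes deletions and genes have already been loaded as simple intervals. The `Interval` class, function names, thresholds, and all coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    name: str


def filter_deletions(deletions, min_length=50, max_length=100_000):
    """Keep deletions whose length falls within [min_length, max_length]."""
    return [
        d for d in deletions
        if min_length <= d.end - d.start + 1 <= max_length
    ]


def genes_fully_deleted(genes, deletions):
    """Return names of genes that lie entirely inside at least one deletion."""
    lost = []
    for gene in genes:
        for d in deletions:
            if d.chrom == gene.chrom and d.start <= gene.start and gene.end <= d.end:
                lost.append(gene.name)
                break  # one covering deletion is enough
    return lost


if __name__ == "__main__":
    # Toy coordinates, not real cucumber data.
    deletions = [
        Interval("chr1", 100, 120, "del1"),
        Interval("chr1", 1000, 5000, "del2"),
        Interval("chr2", 1, 10, "del3"),
    ]
    genes = [
        Interval("chr1", 1500, 2500, "geneA"),
        Interval("chr1", 4900, 5200, "geneB"),
        Interval("chr2", 2, 8, "geneC"),
    ]
    kept = filter_deletions(deletions)
    print(genes_fully_deleted(genes, kept))
```

Requiring a gene to be fully contained in a deletion is a deliberately strict choice; a partial-overlap rule would flag more genes and may be more appropriate depending on the question.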
My Experience
My experience at BTI was the first one that was more focused on the computational aspect of research. While I do have quite a bit of background in programming, it was the first time I used Perl and various tools such as SAMtools and Pindel. It took some time to pick them up, but once I understood them I could see how useful and powerful they are in research. This was also my first time collaborating with a full lab. Previously, I was used to one-on-one interactions with my mentor; this time I had a whole team to ask questions and work with. This served as a reminder that research is not a solo act but a group effort. Having the opportunity to work in Dr. Fei’s lab has strengthened my desire to commit to real-world research that combines biology and computer science.
Jessica Lovejoy
Development of an eQTL database depicting SNP and expressional data analysis tools for tomato
A database was constructed to present RNA-seq data from the introgression lines derived from the domesticated tomato Solanum lycopersicum and the wild tomato S. pennellii. The purpose of this work is to organize large data sets in a comprehensible, efficient and convenient way for scientists to research expression data and SNPs in tomato. Methods for the database construction included designing the database schema, using MySQL and phpMyAdmin, and manipulating data with Perl scripts for upload to the database. Upon completion, Perl CGI scripts were written to allow viewing access to the data via HTML. This work allows differentially expressed genes and SNPs on tomato chromosomes to be analyzed in a simpler and more efficient way. As a result, researchers should be more readily able to determine the importance of SNPs relative to desired phenotypic traits in tomato. In turn, this helps to stimulate successful tomato production.
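To illustrate the kind of schema such a database might use, here is a small sketch. It substitutes Python's built-in `sqlite3` module for the MySQL and Perl stack used in the project so that it runs without a server. All table names, column names, and rows are hypothetical and are not taken from the actual database.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a toy gene/SNP/expression schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (
            gene_id    TEXT PRIMARY KEY,
            chromosome INTEGER NOT NULL
        );
        CREATE TABLE snp (
            snp_id   INTEGER PRIMARY KEY,
            gene_id  TEXT NOT NULL REFERENCES gene(gene_id),
            position INTEGER NOT NULL,
            ref_base TEXT NOT NULL,
            alt_base TEXT NOT NULL
        );
        CREATE TABLE expression (
            gene_id   TEXT NOT NULL REFERENCES gene(gene_id),
            line_name TEXT NOT NULL,
            rpkm      REAL NOT NULL,
            PRIMARY KEY (gene_id, line_name)
        );
        """
    )
    # Made-up rows for demonstration only.
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [("gene_a", 1), ("gene_b", 2)])
    conn.executemany(
        "INSERT INTO snp (gene_id, position, ref_base, alt_base) VALUES (?, ?, ?, ?)",
        [("gene_a", 1200, "A", "G"), ("gene_b", 560, "C", "T")],
    )
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                     [("gene_a", "line_1", 12.5), ("gene_a", "line_2", 3.1)])
    conn.commit()
    return conn


def snps_on_chromosome(conn: sqlite3.Connection, chromosome: int):
    """List (gene_id, position, ref_base, alt_base) for SNPs on one chromosome."""
    return conn.execute(
        """
        SELECT s.gene_id, s.position, s.ref_base, s.alt_base
        FROM snp AS s
        JOIN gene AS g ON g.gene_id = s.gene_id
        WHERE g.chromosome = ?
        ORDER BY s.position
        """,
        (chromosome,),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(snps_on_chromosome(db, 1))
```

Keeping genes, SNPs, and expression values in separate tables linked by a gene identifier lets a web front end answer both SNP and expression queries with simple joins.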
My Experience
Through my internship at the Boyce Thompson Institute, I have achieved the personal, professional and academic goals I set for myself upon arrival. I have been exposed to a new topic of interest for me: database design. I learned the procedures for designing a database schema and then implementing it, which also gave me a greater understanding of the Perl coding language. To complete my work, I often had to do a lot of self-learning, which helped to boost confidence in my ability to conduct research and overcome obstacles. The knowledge and experience that my mentors provided helped me through times when I had little direction. I thoroughly enjoyed the atmosphere that comes from working with a research team due to the sense of a team effort in pursuit of an overall goal. Additionally, I have gained experience that will prove invaluable in my pursuit of graduate school studies.
Jensen Lo
Transcriptome profiling of susceptible and resistant watermelon cultivar response to Fusarium oxysporum f. sp. niveum
Watermelon is an important crop worldwide; by yield, it is the fifth most harvested vegetable and most harvested fruit. However, the fungal pathogen Fusarium oxysporum f. sp. niveum limits watermelon crop yields all over the world. Plant breeders have discovered several watermelon cultivars that appear to have enhanced resistance to Fusarium wilt. These cultivars are often unappetizing compared to commercial watermelon varieties, which are typically susceptible to the disease. Our research is focused on finding the genes responsible for resistance to the pathogen using the RNA-seq approach. We hypothesize that important resistance genes will be expressed differently between resistant and susceptible cultivars upon infection by the pathogen. The results of our research will allow the use of marker-assisted breeding to combine disease resistance and appetizing taste in a single watermelon cultivar. In addition, the analysis of the differentially expressed genes may provide clues as to the mechanisms behind resistance to Fusarium wilt; thus, our findings could potentially be applied to other plants.
My Experience
The PGRP program gave me the opportunity to experience scientific research first-hand, an opportunity that I would not otherwise have had in high school. I learned many skills about the application of computers to the study of the life sciences, but even more importantly, I experienced data analysis and critical reasoning in a scientific setting. In my time here, there was a lot of hard work, but the work was worthwhile and meaningful. I always felt like there was an important reason behind everything that we did. The PGRP program has shaped the path of my future career immensely and I have a better idea of what I want to pursue in college and my graduate studies. My only regret about the PGRP program is that I didn’t participate sooner.
Rachel Blakely
Identification of copy number variations (CNVs) in wild and Eurasian cucumber populations
Project Summary
The Eurasian cucumber population has been domesticated in order to select for traits that improve growth and consumption. In doing so, the population has undergone a number of genetic changes. Discovering which genes vary between populations and which remain the same can help scientists understand the observable differences between the populations. It is valuable to investigate such differences in common fruits and vegetables because, as food sources, they should be bred to be as nutritious as possible. In many cases, domestication has removed or altered genes with negative effects on crop traits and in the process may have also removed beneficial genes such as those that help with disease resistance or nutritional value. Discovering these inadvertently altered genes will allow breeders to put them back into the genome. In order to investigate the differences between the wild and Eurasian cucumber populations, the copy number variations (CNVs) were identified using cn.mops. The program distinguished 943 different regions in the cucumber genome that contain CNVs. Among these regions was one containing the F locus, which is implicated as the region that causes gynoecious cucumbers. Many of the identified regions contain known genes, but other regions do not have any known genes. Discovering the genes in these CNV regions will provide useful knowledge for future genomics-based breeding that could improve important aspects of the plant.
My Experience
I have worked in two other bioinformatics labs as an undergraduate; however, I have learned more from this experience at BTI than from either of the others. Working with a mentor for the first time, I had to adjust from working independently to collaborating with a group of researchers. While I was unsure of what to expect from the mentor-mentee relationship, this guidance enabled me to understand the work I was doing better than I had in any previous experience. As a result, I gained confidence in my ability to successfully conduct research. As my first experience working in the field of plant biology, this internship was a great success. I have learned a lot about the dynamic of working in a research lab and about the impact and importance of plant biology research.
Syed Ather
RNA-Seq analysis of lncRNAs and cis-natural antisense transcript (cis-NAT) in developing tomatoes
Project Summary
Since the sequencing of the human genome, it has been revealed that the vast majority of DNA does not code for proteins. However, upon transcription, some of these regions are not fruitless, but produce long noncoding RNAs (lncRNAs), which have recently been reported to play important roles in a number of biological processes in eukaryotes. When a certain RNA sequence is transcribed from DNA, it may be inhibited from translation by another transcribed RNA sequence in proximity. If this inhibiting RNA, known as antisense RNA, occurs on the opposite DNA strand, it is referred to as a cis-Natural Antisense Transcript (cis-NAT). RNA-seq, a revolutionary tool for transcriptome analysis, has been used to systematically discover and characterize lncRNAs and cis-NATs in plants and animals. In my study, strand-specific RNA-seq data from tomato fruits at two critical developmental stages (mature green and breaker) were used to identify lncRNAs and cis-NATs and to investigate their differential expression during tomato development. Through identification and expression analysis of cis-NATs and lincRNAs in tomato, we can obtain deeper insight into the regulatory mechanism of fruit ripening, which can help improve fruit quality to fight problems of malnutrition.
My Experience
The Boyce Thompson Institute internship provided a challenging, meaningful experience of every kind in scientific study. I was given a balanced mix of completing projects on schedule and the flexibility to learn what interested me. This freedom allowed me to be innovative and learn a lot in the process. I pursued my goals in both software engineering and biological research, from abstract theory to practical use; learning how to work in a group with my mentor and professor was what made this possible. Working with my mentor to correct my mistakes and to seek guidance helped me understand how to manage projects and solve problems effectively. Though I arrived with a strong computational background but little interest in plant studies, the BTI internship taught me to appreciate nature and see the importance of plant science.
Haley Wight
De Novo Assembly and Annotation of the Glomus versiforme Genome
Project Summary
Glomus versiforme is a species of arbuscular mycorrhizal (AM) fungi. This type of fungus forms a mutualistic relationship with most vascular plants: the fungus colonizes the roots of the plant to acquire carbon, and supplies the plant with phosphate in return. Because phosphorus is important to plant growth, it is added to fertilizers, a practice that is expensive and inefficient. Genomic information would facilitate the exploration of the underlying mechanisms involved in this ecologically important symbiosis. Although Glomus is the largest genus of AM fungi, no member of the Glomus genus has been sequenced. Our collaborators used Next Generation Sequencing (NGS) technology to gather a large collection of short sequencing reads. The focus of my project was to assemble these data into a continuous genome and annotate protein-coding genes. To prepare the data for assembly, the adapters and barcodes were removed, repeated reads were collapsed, and sequence errors were corrected. Through the analysis of the large-scale short sequences, we estimated the genome size of G. versiforme to be around 311.6 Mb, making this the largest sequenced fungal genome to date. Sequence analysis also indicated that the genome is highly homozygous and highly repetitive. The high-quality cleaned short reads were then assembled de novo using SOAPdenovo2, and the resulting assembly has a total size of 151.2 Mb and an N50 of ~15 Kb. Once the genome was assembled, a set of conserved eukaryotic genes, ESTs, and protein evidence were aligned to train ab initio protein predictors. The results from these predictors were consolidated into a genome annotation of 9,546 genes using the MAKER pipeline. This genome assembly and annotation will provide a basis for future research on symbiosis mechanisms of G. versiforme.
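For readers unfamiliar with the N50 statistic quoted above: it is the contig (or scaffold) length such that sequences of that length or longer together contain at least half of the total assembly. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative and unrelated to the actual G. versiforme assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and summed until the running
    total reaches at least half of the assembly size; the length at which
    that happens is the N50.
    """
    if not lengths:
        raise ValueError("at least one length is required")

    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length


if __name__ == "__main__":
    # Toy contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6]))
```

A higher N50 generally indicates a more contiguous assembly, although it says nothing about correctness or completeness on its own.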
My Experience
I am a Bioinformatics major at Ramapo College. However, most of the courses are focused on either computer science or biology independently. Being involved in this research allowed me to completely integrate my interests in both fields. This internship had a wealth of resources: I was able to work with Next-Generation Sequencing data for the first time on about 30 cores. The most valuable resource was the guidance from my mentor, who taught me new ways to troubleshoot problems, advanced techniques within the UNIX environment, and many other skills necessary to succeed in the research environment. BTI also held professional events outside the lab environment for interns, which gave me a forum to learn about other research projects as well as seek guidance from others in my field. Overall, this internship has confirmed my interest in bioinformatics and has given me the confidence to pursue a Ph.D. in bioinformatics.
Ronan Perry
Recombination Patterns of Watermelon Recombinant Inbred Lines (RILs)
Project Summary
Watermelons are a massive international crop accounting for nearly ten percent of international vegetable production. One of its many cousins is the citron melon which, while lacking the flavor of the watermelon, has an accession, PI-296341-FR, with notable resistance to Fusarium wilt, a prominent disease that troubles melon growers. Both of these melons have sequenced genomes and have been crossed in order to create an F8 population of 96 recombinant inbred lines (RILs) that have a mostly homozygous mix of their parents’ genomes. The genomes of these RILs have been sequenced as well. With this data, the RIL genomes can be analyzed in relation to the parents in order to generate a genetic map of parent-specific regions within the RIL genomes. Recombination events can then be derived, and hotspots and coldspots can be identified. With this information, plant breeders and researchers will be better able to understand patterns of genetic variation in watermelons and to accelerate breeding through knowledge of recombination rates. Additionally, the creation of linkage maps and identification of QTLs will aid in the development of watermelon genetic mapping.
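One simple way to derive recombination events from parent-specific regions is to walk along the markers on a chromosome and record every point where the parental origin switches. The sketch below assumes each marker in a RIL has already been called as parent "A", parent "B", or something else (missing or heterozygous, which is skipped). The function name, the call encoding, and the toy data are hypothetical and do not describe the lab's actual pipeline.

```python
def find_crossovers(positions, calls):
    """Return (left_pos, right_pos) intervals in which parental origin switches.

    positions: marker coordinates in ascending order along one chromosome.
    calls:     one parental call per marker, "A" or "B"; any other symbol is
               treated as uninformative and skipped.
    """
    if len(positions) != len(calls):
        raise ValueError("positions and calls must have the same length")

    crossovers = []
    last_pos = None
    last_call = None
    for pos, call in zip(positions, calls):
        if call not in ("A", "B"):
            continue  # missing or heterozygous marker
        if last_call is not None and call != last_call:
            # The switch happened somewhere between the two informative markers.
            crossovers.append((last_pos, pos))
        last_pos, last_call = pos, call
    return crossovers


if __name__ == "__main__":
    # Toy markers for one chromosome of one line.
    positions = [100, 200, 300, 400, 500, 600]
    calls = "AA-BBA"
    print(find_crossovers(positions, calls))
```

Counting how many such intervals from all lines fall into each genomic window would then give a rough view of recombination hotspots and coldspots. Real data would also need smoothing of isolated genotyping errors, which this sketch omits.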
My Experience
BTI has provided me an amazing and unique experience with this internship. I didn’t know what exactly to expect but knew that I would be able to explore my interest in computer science. Although I had a strong knowledge base to work from, I came to realize that I understood little, and so over the course of this internship I have spent as much time learning as I have spent working. I taught myself Perl, learned to work with the Linux command line, came to understand how bioinformatics and genomic analysis work, and gained experience working with a lab and mentor. Additionally, despite my initial focus on and interest in the computational side of this work, the internship has given me an appreciation for the importance of the biological research side. I would like to thank BTI for hosting me here, and Dr. Fei and my mentor, Honghe Sun, for their support and help along the way.
Noah Legall
SNP study of the Ma1 gene
Project Summary
Malus domestica, commonly known as the domesticated apple, is an economically important crop grown worldwide. Consumers tend to favor apples that generally taste better, and breeders invest time and energy in order to improve apple taste. One of the major factors determining apple taste is the malic acid content present within the apple flesh, and geneticists have tied malic acid content to the Ma1 gene, which creates a protein that pumps malic acid into the fruit. The gene itself is variable among different varieties and species in the Malus genus, with abundant single nucleotide polymorphisms (SNPs) found throughout the Ma1 open reading frame. The location of these SNPs could teach geneticists and breeders about genotypes that correspond to high and low acidity in apples, which could help to develop better-tasting apples for the consumer. The purpose of this project is to identify SNPs among different Malus accessions in the Ma1 gene region through bioinformatics analysis of a deep genome resequencing dataset and then to look for SNPs that are tightly associated with apple acidity by integrating the phenotypic acidity data. New genetic markers potentially linked to apple acidity were determined. These potential markers need to be studied further to confirm their relationship with acidity phenotypes.
My Experience
I thoroughly enjoyed my experience here at BTI this summer. I was always interested in a career in research, but the only thing that limited my idea of a future conducting research was being burdened with the heavy schoolwork I had during my first year at UNC Chapel Hill. I had conducted research before, but had trouble fully investing time in it due to my schedule. Working at BTI gave me the opportunity to focus solely on a research project this summer and I’m grateful I had the chance to give 100 percent to it. I grew up and was raised to deal with pressure and to find a way to make problems work, and I really came to appreciate that during the rigorous parts of the internship. I was grateful to have such supportive mentors as well; they taught me so much in the short time I was here!
Intern Info
Lisa Yoo
Identifying dynamically expressed genes during sweetpotato root development
Project Summary
The sweetpotato (Ipomoea batatas) is the seventh most important crop in the world. With its high nutritional quality and relatively low labor cost, it serves as a major food security crop for many sub-Saharan African nations. Though traditional breeding techniques have been implemented in order to boost the yield and quality of the sweetpotato, these have not been very effective. Therefore, it is necessary that we gain more genomic and genetic information about this crop to facilitate the development of next generation breeding tools. There are three types of sweetpotato roots: fibrous roots (FR), pencil roots, and storage roots (SR), of which the storage roots are mainly consumed. For this project, we generated transcriptome profiles of total roots at 10 and 20 DAT (days after transplant) and fibrous roots and storage roots at 30, 40, and 50 DAT, for Beauregard, one of the world’s most popular varieties of sweetpotato. The RNA-Seq reads were mapped to the reference genome of I. trifida. To examine differential gene expression, we made comparisons between samples of different developmental stages in SR, and between FR and SR samples. We further evaluated these results using gene ontology and enzyme pathway analysis. Our results allowed us to identify candidate genes in the carbohydrate metabolism pathway that are important in storage root development, which is potentially useful in the context of increasing the yield and quality of the sweetpotato.
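The FR-versus-SR comparison described above comes down to comparing normalized read counts per gene. The following is only a minimal sketch of that idea, not the pipeline used in the project (which is not specified here): it scales counts to counts per million (CPM) and computes a log2 fold change with a pseudocount. The gene names, counts, and the fold-change cutoff are invented for illustration.

```python
import math

def counts_per_million(counts):
    """Scale the raw read counts of one sample to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """Return log2(B / A) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }

# Hypothetical raw counts for fibrous roots (FR) and storage roots (SR).
fr = {"geneA": 100, "geneB": 400, "geneC": 500}
sr = {"geneA": 800, "geneB": 100, "geneC": 100}

lfc = log2_fold_change(counts_per_million(fr), counts_per_million(sr))

# Genes at least two-fold higher in SR (cutoff chosen only for illustration).
up_in_sr = sorted(g for g, v in lfc.items() if v >= 1.0)
```

A real analysis would also use biological replicates and a statistical test rather than a fold-change cutoff alone.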
My Experience
My summer at BTI was such an incredible learning experience. Coming into the program, I had almost no background in plant biology, research, or computer science. However, as the internship is drawing to a close, I can now say that I have developed knowledge and skills in all of these areas. Every day, I was faced with new obstacles, whether it be writing my research proposal, or figuring out how to run a script on the server, or presenting my research in front of my lab. Through dealing with these tasks, I learned responsibility and independence. Working in a real laboratory has given me a higher level of appreciation for the challenges and excitement of conducting research.
I’d like to thank my mentor, Shan Wu, for helping me through every step of my research process. I’d also like to thank Dr. Fei, the whole Fei lab, the other interns, and Tiffany Fleming for making my experience so valuable and memorable.
Intern Info
Angela Taylor
Exploring Tomato Virus Diversity in China using Deep Small RNA Sequencing
Project Summary
My Experience
Intern Info
Michael Morikone
Identification of Long Non-Coding RNA and Alternative Splicing Events During Tomato Fruit Development
Project Summary
Long non-coding RNAs (lncRNAs) are a class of regulatory RNA that are longer than 200 nucleotides and do not code for protein. Alternative splicing (AS) is a process in gene transcription where a multi-exon gene is spliced into two or more different mature transcripts. Solanum lycopersicum, a model plant for fleshy fruit development, was studied through the use of RNA-Seq analysis for lncRNAs and AS events in order to better understand the molecular mechanisms of fleshy fruit development. Five distinct stages of tomato fruit with three biological replicates were used to monitor the changes in cell expansion, maturation and ripening. My project was based around generating a comprehensive list of isoforms from the RNA-Seq data in order to identify lncRNAs, AS events, and new protein-coding genes. It was determined from this dataset that there were 565 lncRNAs with 162 of these having significant changes in expression during fruit development. It was also shown that 289 paired isoforms had significant differential expression between stages. Additionally, there were 214 new protein coding genes that were identified from the dataset. A new lncRNA was identified that is an example of a cis-natural antisense transcript, which showed an inverted expression pattern to a zeta-carotene isomerase gene, a key component of lycopene biosynthesis, suggesting a possible regulatory role of this lncRNA in carotenogenesis. Identification of these lncRNAs and AS events will provide valuable resources for future research on the molecular mechanisms of gene regulation in fruit development, leading to the betterment of fruiting crops.
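The two-part definition of an lncRNA given above (long, and not protein-coding) can be turned into a first-pass filter. The sketch below is an illustration under stated assumptions, not the method used in the project: it approximates "non-coding" by the absence of a long forward-strand open reading frame, and the 100-codon ORF cutoff is a hypothetical parameter.

```python
STOPS = {"TAA", "TAG", "TGA"}

def longest_orf_codons(seq):
    """Longest ATG-to-stop ORF on the forward strand, in codons (stop excluded).

    ORFs that run off the end of the sequence without a stop are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                best = max(best, (i - start) // 3)
                start = None
    return best

def is_candidate_lncrna(seq, min_len=200, max_orf_codons=100):
    """Longer than min_len and lacking a long ORF (both cutoffs are assumptions)."""
    return len(seq) > min_len and longest_orf_codons(seq) < max_orf_codons

# Toy transcripts: one long without an ORF, one long with a 121-codon ORF.
noncoding = "ACGT" * 60
coding = "ATG" + "GCT" * 120 + "TAA"
```

Dedicated tools additionally score coding potential and search protein databases, so a filter like this would only narrow the candidate list.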
My Experience
I am an undergraduate student at California State University San Bernardino where I am close to graduating with degrees in both bioinformatics and biology. The bioinformatics program at my home institution is a combination of discrete biology, computer science, and mathematics courses, so having this bioinformatics experience at BTI has been very valuable. While I have had other bioinformatics related research experiences, my time at BTI has proved to be the most helpful as a budding scientist. The relationship with my mentor was very informative and allowed me to learn many things that I have not had the opportunity to otherwise. Outside of my project, there were also seminars and lab meetings that were commonly held that provided a scientific forum which further fed my interests in bioinformatics. After this internship experience, I am much more confident in my pursuit of a Ph.D. in bioinformatics or computational biology.
Intern Info
Peter Kohler
“Ethylene mediated epigenome changes in ripening climacteric melon”
Project Summary:
Ripening is a complex process that dramatically alters fruit color, flavor, firmness, and nutritional content. Improving our understanding of this process promotes the ability to optimize crop yields, disease resistance, and fruit quality. Ethylene is a natural plant hormone and is best known as a crucial regulatory component for climacteric fruit ripening. However, how ethylene controls ripening at the epigenome level is poorly understood.
To investigate whether ethylene mediates gene expression through DNA methylation, a climacteric melon and an ACC oxidase RNAi mutant with inhibited ethylene production were selected for bisulfite sequencing (BS-Seq) analysis. Comparing the methylation levels between wild-type and mutant melon revealed an overall demethylation pattern in the mutant fruit. We identified 10,243 and 52,829 differentially methylated regions (DMRs) in the CG and CHH cytosine contexts, representing 0.32% and 1.79% of the melon genome, respectively.
Furthermore, 670 CG and 5,194 CHH DMRs intersect with 570 and 2,574 genes, respectively, of which 73% and 75% were differentially expressed based on a previous transcriptome study using the same samples. Hypo-methylation of promoters of ethylene signal transduction and biosynthesis pathway genes in the wild-type melon indicates the role of DNA methylation in the ethylene positive feedback loop. Hypo-methylated promoters and upregulated expression of some ripening-related genes in the wild-type melon, e.g. STAY-GREEN and alcohol acyl transferase, indicate that ethylene regulates these ripening genes through DNA methylation.
My Experience:
Coming from a heavy computational background, this internship has been a wonderful learning experience for me, as it provided a project whose core activities catered to my strengths, while also requiring me to obtain an understanding of its biological significance. I have gotten to learn Perl, deepen my knowledge of GNU make and R, and practice analyzing the interrelated facets of biological systems, which require a different sort of consideration than the systems I am used to dealing with. I have had to learn to manage large data sets, and consequently also obtained a rare opportunity to practice performance-conscious programming.
What I appreciate most, however, is how being in BTI’s environment—listening to presentations and getting into conversations—has increased my grasp of the broader biological picture, made me aware of exciting developments and discoveries, and answered some of my long-held questions and confusion. I feel that I have been given a solid introduction to the concerns, interests, tools and mindsets of this field, which is exactly what I was hoping to obtain this summer.
Intern Info
Sophia Hu
“Identifying long non-coding RNA, misannotated and novel genes in the watermelon genome using PacBio Iso-Seq”
Project Summary:
Watermelon (Citrullus lanatus) is an economically important and widely cultivated vegetable crop in the cucurbit family, which also includes cucumber, pumpkin, squash and muskmelon. An improved watermelon genome would be an important resource for watermelon research and its close relatives. In this project, to improve the watermelon genome annotation and to identify long non-coding RNAs (lncRNAs), we generated large-scale transcriptome sequences using PacBio Iso-Seq technology from mixed watermelon tissues. Errors in the transcriptome sequences were corrected using Illumina RNA-Seq data and then full-length transcript isoforms were extracted. A total of 96.5% of the isoforms could be aligned to the watermelon reference genome.
Based on the alignments, we identified a total of 1,326 lncRNAs in the watermelon genome, including 49 intronic, 845 intergenic and 432 antisense. We also found 350 novel genes that were previously not annotated in the reference genome, which could code for proteins such as a defensin-like protein and a Mads1 protein. We also identified 851 potential errors in the previous annotations, where genes annotated as separate in the reference genome should be combined because multiple full-length reads spanned those genes. The improved gene predictions in the watermelon genome as well as the newly identified lncRNAs are valuable resources for research on watermelon and an overall better understanding of the cucurbit family.
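The reasoning behind the annotation corrections above, that separately annotated genes spanned by multiple full-length reads probably form one gene, can be sketched with simple interval logic. This is an illustrative simplification, not the project's procedure: it ignores strand and exon structure, and the gene IDs, coordinates, and the minimum of two supporting reads are assumptions.

```python
def genes_spanned(read, genes):
    """Return IDs of genes fully contained in one read alignment (same chromosome)."""
    chrom, start, end = read
    return [
        gene_id
        for gene_id, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start <= g_start and g_end <= end
    ]

def merge_candidates(reads, genes, min_reads=2):
    """Count reads supporting each group of two or more genes; keep supported groups."""
    support = {}
    for read in reads:
        group = tuple(sorted(genes_spanned(read, genes)))
        if len(group) >= 2:
            support[group] = support.get(group, 0) + 1
    return {group: n for group, n in support.items() if n >= min_reads}

# Hypothetical annotation: gene ID -> (chromosome, start, end).
genes = {
    "g1": ("chr1", 100, 500),
    "g2": ("chr1", 600, 1000),
    "g3": ("chr1", 2000, 2500),
}
# Hypothetical full-length read alignments: (chromosome, start, end).
reads = [
    ("chr1", 90, 1010),    # spans g1 and g2
    ("chr1", 95, 1005),    # spans g1 and g2
    ("chr1", 1900, 2600),  # spans only g3
    ("chr1", 550, 2600),   # spans g2 and g3, but is the only such read
]

candidates = merge_candidates(reads, genes)
```

Requiring more than one supporting read mirrors the "multiple full-length reads" criterion and guards against a single chimeric read.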
My Experience:
My internship at BTI has been a very valuable and memorable experience. Prior to BTI, I had taken both computer science and biology classes; however, I had never combined both for research. Through this experience, I have gained a better grasp of utilizing the command line, received exposure to a multitude of pipelines and software commonly used in the field of bioinformatics, and experienced a research project in its beginning, middle and end stages. My mentor, Xin Wang, was very supportive and guided me as well as challenged me throughout the project. After listening to BTI researchers talk about their work and its real world applications, my interest and curiosity to learn more about plants and bioinformatics has increased significantly.
Intern Info
Christopher Neely
“Pan-genomic analysis of Solanum habrochaites, a wild tomato plant”
Project Summary:
Solanum habrochaites is a diploid, wild tomato species that grows on the slopes of the Andes Mountains. Its unique phenotype includes glandular trichomes on the fruit, and these trichomes have been shown to be related to sesquiterpenes and other chemicals that repel insects. Because of these and other specificities, it has been commonly used as an important source of novel genes for tomato breeding. Therefore, we are interested in better understanding the genomic differences between S. habrochaites and the cultivated tomato, S. lycopersicum. To accomplish this, we constructed a pan-genome based on recent sequencing data from seven available S. habrochaites accessions. For each accession, we de novo assembled quality-filtered reads, aligned the assembled contigs to the reference genome from S. lycopersicum, and then extracted unaligned sequences. A total of 354.4 Mbp of non-reference sequences were obtained and annotated, yielding 4,002 protein-coding genes, of which 3,736 were functionally annotated. Enrichment analysis showed that Gene Ontology (GO) terms related to protein binding, ligase activity, and DNA helicase activity were significantly overrepresented in the non-reference portion of the pan-genome. The presence/absence variation (PAV) analysis showed that the core genome is comprised of over 27,000 genes, and that most genes are shared by all the accessions, with few genes specific to 5 or fewer accessions. Further analysis of genes specific in S. habrochaites will facilitate interpretation of its specificities and provide instructive information for future breeding practices in tomatoes.
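The presence/absence variation (PAV) analysis described above classifies each gene by how many accessions carry it. A minimal sketch of that bookkeeping follows; the gene names and 0/1 calls are invented, and only the number of accessions (seven) comes from the summary.

```python
def classify_pav(pav):
    """Split genes into core (present in every accession) and dispensable."""
    core, dispensable = [], []
    for gene, calls in pav.items():
        (core if all(calls) else dispensable).append(gene)
    return core, dispensable

def presence_histogram(pav):
    """Map 'number of accessions carrying a gene' to the number of such genes."""
    hist = {}
    for calls in pav.values():
        n = sum(calls)
        hist[n] = hist.get(n, 0) + 1
    return hist

# Hypothetical PAV matrix: one row per gene, one 0/1 call per accession.
pav = {
    "gene1": [1, 1, 1, 1, 1, 1, 1],
    "gene2": [1, 1, 1, 1, 1, 0, 0],
    "gene3": [1, 1, 1, 1, 1, 1, 1],
    "gene4": [0, 1, 0, 0, 0, 0, 0],
}

core, dispensable = classify_pav(pav)
hist = presence_histogram(pav)
```

The histogram is the kind of summary behind statements such as "few genes specific to 5 or fewer accessions".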
My Experience:
Working at the Boyce Thompson Institute in the Fei Bioinformatics Laboratory has given me a wonderful glimpse of the life of a researcher. Completing this project required a lot of studying and planning. I learned that the answer for how to do something often comes from the community of researchers seeking to complete similar tasks or to answer similar questions. I became a lot more comfortable with finding information on my own and with relying on my mentor for larger picture issues. I also learned the importance of effective communication with my mentor during the project. Working at BTI has strengthened my resolve to be a professional researcher, and I know that I want to continue my career in the field of biology. Once I return to the West Coast, I plan to continue to expand my skill set in analyzing big data.
Intern Info
Keeley Collins
“SpinachBase: A new database for spinach research and development”
Project Summary:
SpinachBase (www.spinachbasez.feilab.net) is a new web-based database for the Spinacia oleracea genome providing centralized public access to genomic and transcriptomic data as well as analytical tools to assist further research in spinach. Through the database, the whole genome sequence for spinach is available to browse or download with a variety of annotations. Those annotations include genes, mRNA, and other features; gene homologs; association of InterPro protein domains; Gene Ontology (GO) terms; and genome pathway terms. The annotations are available in genome (feature) pages, and may also be queried using a search interface and viewed in JBrowse, a genome browser. Transcriptome (RNA-Seq) data for 120 different accessions of spinach (wild and cultivated) may also be viewed in JBrowse. Metabolic pathway information is available through the SpinachCyc database. SpinachBase also provides analysis tools such as NCBI BLAST, GO enrichment analysis, pathway enrichment analysis, and batch downloads of specific sequences and annotations.
My Experience:
I’ve done bioinformatics research in the past at my university; however, working at BTI was the first time that I’ve worked in a large lab. I really enjoyed spending time with the lab members. Many people in my lab were postdocs or graduate students, so I gained a lot by talking with them about their experiences and research. In my research I encountered a large variety of different problems to solve, and had to work with a few different programming languages that I hadn’t worked with previously. Due to these challenges, my troubleshooting and programming skills have really improved. Being an intern at BTI is nice because if you run into problems or get stuck with research your mentor and lab can help you, and since you are with a group of other interns you can all swap stories and experiences and really support each other.
Intern Info
Patrick Yuan
Genetic Characterization of Cucumber (Cucumis sativus) Using Whole Genome Re-sequencing
The cucumber (Cucumis sativus), a member of the Cucurbitaceae botanical family, is a widely grown creeping vine plant that produces fruits which are commonly used as vegetables. Important traits in crops, such as yield, resistance to diseases and insects, ease of harvest, and nutritional value, depend on genetic variation. The cucumber germplasm in the National Plant Germplasm System (NPGS), which consists of approximately 1300 accessions, has been previously characterized using Genotyping by Sequencing (GBS) technology, which covers only a small portion of the genome. A core collection of 395 cucumbers was inferred from this cucumber germplasm, among which genomes of a total of 149 cultivated cucumbers have been resequenced to date. Analysis of the genome resequencing data, together with that from 5 wild cucumbers (C. sativus var. hardwickii) representing the outgroup, resulted in a variation map of more than 1.6 million high-quality single-nucleotide polymorphisms (SNPs) distributed across the cucumber genome at approximately 1 SNP per 153 bp. Principal component analysis (PCA), population structure, and phylogenetic analyses using this variation map, as well as a heatmap of a Hamming distance matrix, supported three major clades of these cucumber accessions: one with origins in India/South Asia, a second with origins in East Asia, and a third with origins in Central/West Asia, Turkey, Europe, Africa, and North America. Additionally, the neighbor-joining phylogenetic tree identified cucumbers from India/South Asia as the closest relatives to the wild cucumbers, and the heatmap revealed a relatively high level of genetic diversity within the India/South Asia accessions; both of these findings are consistent with the current understanding of India as the center of origin for cultivated cucumbers. Population structure analysis further identified North American and African groups as subclades of the third clade.
The variation map produced in this project, combined with that from the rest of the core cucumber accessions, provides a valuable resource that can help identify variants significantly associated with important traits.
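The Hamming distance matrix mentioned in the summary counts, for each pair of accessions, the SNP sites at which their genotypes differ. A minimal sketch is shown below. The accession names and genotype strings are invented, and each accession is reduced to one allele per site for simplicity; handling of heterozygous or missing calls is not covered.

```python
def hamming(a, b):
    """Number of positions at which two equal-length genotype strings differ."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(genotypes):
    """Return sorted accession names and the pairwise Hamming distance matrix."""
    names = sorted(genotypes)
    matrix = [[hamming(genotypes[p], genotypes[q]) for q in names] for p in names]
    return names, matrix

# Hypothetical genotypes at five SNP sites for three accessions.
genotypes = {
    "acc1": "AACGT",
    "acc2": "AACGA",
    "acc3": "TTCGA",
}

names, matrix = distance_matrix(genotypes)
```

Plotting such a matrix as a heatmap makes groups of closely related accessions visible as blocks of small distances.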
My Experience
My experience in the Fei lab this summer has allowed me to explore how computer science is applied in bioinformatics. With the help of my mentor and fellow BTI interns, I learned about the steps and tools that are used when collecting, processing, and analyzing genetic data, as well as various biological concepts such as genetic diversity and population structure. This internship has also improved my programming skills and helped me learn new languages, such as Perl and R. Overall, my experience in the Fei lab was my first in academic research and has helped me understand biological applications of computer science, as well as potential careers that I could pursue in the future.
Intern Info
Charles Wang
Identifying Bottle Gourd Genes Responsive to the Infection of Papaya Ringspot Virus
Bottle gourd is an important crop that has helped both ancient and modern civilizations thrive. Besides its applications in medicine, musical instruments, and containers, bottle gourd fruit has relatively high nutritional value in its early stages of development, establishing it as a major food staple in developing countries. Another essential property of bottle gourd lies in its ability to be used as rootstock for grafting to other cucurbit crops. By exploiting this property, farmers can drastically increase their annual yield because grafting increases the scion’s tolerance to abiotic and biotic stresses. That being said, it is necessary that we develop novel methods of increasing pathogen resistance in bottle gourd, as doing so will not only improve the uses that we already have for this crop, but it will also magnify the benefit to other cucurbit crops. On the other hand, Papaya ringspot virus (PRSV) is a major limiting factor of cucurbit production, and it effectively thwarts the benefits of bottle gourd by inhibiting development in young plants and deforming the fruit of mature specimens. To gain insight into the molecular mechanisms underlying bottle gourd resistance to PRSV, we assessed global transcriptome changes in both resistant (USVL5) and susceptible (USVL10) bottle gourd accessions upon infection at 7, 14, and 21 DPI (days post infection). At the beginning of our study, we obtained RNA-Seq transcriptome data, which were subsequently cleaned through the removal of adaptors, low-quality sequences, virus contamination, and rRNA sequences. We then mapped each read to the reference bottle gourd genome, and differentially expressed genes (DEGs) were then identified at each time point for each accession based on the read counts obtained from alignments to the reference genome. Ultimately, we found more DEGs in the susceptible accession upon the PRSV infection than in the resistant accession, especially at 7 DPI. 
Furthermore, by analyzing the expression patterns of DEGs in the two genotypes and at different days post infection, we identified four clusters of genes with different expression characteristics. Gene Ontology (GO) term enrichment analyses were carried out on these clusters, and significantly overrepresented biological processes and molecular functions were identified. Our results indicated that while genes involved in DNA replication, protein refolding, and stress responses were upregulated in Clusters 1 and 2, genes related to photosynthesis and hormone production were downregulated in Clusters 3 and 4. Interestingly, two genes encoding Argonaute (AGO) proteins, which are essential to antiviral RNA silencing, were upregulated in Clusters 1 and 2, respectively. RNA silencing is one of the known mechanisms underlying plant resistance to virus, and further investigation into the genes involved in the antiviral RNA silencing pathway, such as those encoding RNA-dependent RNA polymerase (RDR), Dicer-like (DCL) proteins, and AGO proteins, showed that their expression levels all started out high in the USVL10 accession at 7 DPI. Overall, our study provides greater insight into the responses of bottle gourd accessions with different levels of resistance to PRSV, which could be applied to future crop improvement through breeding and genetic modification.
My Experience
At the beginning of the PGRP internship, I walked into BTI’s facility with the hope that I could successfully combine my interests in computer science and biology while also learning more about research, data analysis, and lab etiquette. Now, as my internship is drawing to a close, I can state – with a degree of certainty – that my experience at BTI has taken my understanding of all these areas to another level. Although I had conducted many research projects at school, my work at BTI exposed me to a professional lab setting unlike any that I had worked in before. Challenges that I faced, such as adapting to other programming languages that I had no prior experience with, were admittedly daunting, but overcoming them was truly the biggest step in contributing to my growth as a member of the Fei Lab. All in all, working at BTI has given me a higher level of appreciation for conducting lab research, and in the future, I hope to integrate the skills that I have learned this summer in a manner that benefits not only myself, but also the community as a whole.
As a final note, I would like to thank my mentor, Shan Wu, for guiding me through the research process, as well as Dr. Fei for giving me such an incredible opportunity to work at BTI. In addition, I would also like to thank Mr. Dempsey, Mrs. Pawlowska, and Mrs. McDonald, for they are the individuals who continually motivate me to work hard and follow my dreams.
Intern Info
Stefanos Stravoravdis
Identification of structural variations between genomes of cultivated tomato Solanum lycopersicum and its wild progenitor S. pimpinellifolium
Cultivated tomato, Solanum lycopersicum, is an abundantly consumed crop worldwide, acting as a substantial source of nutrients and nourishment. Through selective breeding, multiple traits, such as size and production, have improved within these cultivated varieties; however, such breeding has resulted in allele loss, thereby narrowing genetic diversity. S. pimpinellifolium, the wild progenitor species of S. lycopersicum, possesses several favorable traits missing from the cultivar, prompting breeders to use this species as a new source of alleles for the domesticated species. In order to aid these breeding efforts, we strived to identify the presence of large structural variations (SVs; > 10 bp) near genes involved in important biological or agronomic processes between genomes of S. lycopersicum cultivar ‘Heinz 1706’ and S. pimpinellifolium accession LA2093. Minimap2 was employed to align the two genomes. Assemblytics and in-house scripts were used to extract the SVs. We then used Python scripts to validate and output SVs which overlapped with or were very close to a gene sequence (in promoter, CDS, or non-CDS regions). Each protein sequence from both genomes was analyzed to identify protein functional domains. The genes underwent Gene Ontology mapping and annotation, and GOATOOLS was used to perform GO enrichment analysis of the resulting data. Through this analysis, breeders can begin to identify favorable alleles present in the progenitor yet absent within the cultivar, thus using said information to breed improved tomato variants expressing beneficial phenotypes.
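The step above that keeps SVs overlapping or lying close to a gene is an interval-overlap check. The sketch below is a simplified illustration, not the in-house scripts: it distinguishes only "gene body" and "promoter" (the summary also separates CDS from non-CDS regions), assumes 1-based inclusive coordinates on the same chromosome, and uses a hypothetical 2,000 bp upstream window as the promoter.

```python
def classify_sv(sv_start, sv_end, gene, promoter_len=2000):
    """Classify an SV relative to one gene as 'gene_body', 'promoter', or None.

    gene is (start, end, strand); promoter_len is an assumed window size.
    """
    g_start, g_end, strand = gene
    # Two closed intervals overlap when each starts before the other ends.
    if sv_start <= g_end and sv_end >= g_start:
        return "gene_body"
    # The promoter lies upstream of the gene, which depends on the strand.
    if strand == "+":
        p_start, p_end = g_start - promoter_len, g_start - 1
    else:
        p_start, p_end = g_end + 1, g_end + promoter_len
    if sv_start <= p_end and sv_end >= p_start:
        return "promoter"
    return None

# Hypothetical genes on the forward and reverse strands.
plus_gene = (10000, 12000, "+")
minus_gene = (10000, 12000, "-")
```

For example, `classify_sv(9000, 9050, plus_gene)` falls in the assumed promoter window of the forward-strand gene, while the same SV is downstream of the reverse-strand gene and is not reported.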
My Experience
As an intern in the NSF Plant Genome Research Program, I gained a wonderful opportunity to hone my bioinformatic skills in order to analyze a fascinating research topic. I am grateful for the opportunity given to me by the Fei lab and my mentor, Dr. Gao. Although I was already familiar with how to conduct research in the lab or on the computer (though not specifically with plant systems), I gained an important understanding of how a variety of different approaches can be taken to extract and analyze information from genomic data. I have confidence that I can use these new techniques and tools to further my own research endeavors, be such for the remainder of my undergraduate studies or for my future graduate research.
Intern Info
Charis Qi
“Construction of a graph-based watermelon pan-genome and investigation of genetic variation in cultivated watermelon and its wild relatives”
Project Summary:
Watermelon is one of the most popular and economically important fruit crops worldwide. The cultivated watermelon, Citrullus lanatus subsp. vulgaris, is the product of over 4000 years of domestication. However, the cultivated watermelon is vulnerable to various diseases, while wild watermelons display resistance to many of these diseases. Wild watermelons have been widely used in modern breeding to introduce disease resistance. However, this process is slow, as it is unclear which specific genes and variants confer disease resistance in wild watermelons. The goal of the project was to identify potentially beneficial genes, including resistance genes, to be introduced into breeding. First, nine genome assemblies from the cultivated watermelon and its direct wild progenitor (C. lanatus subsp. cordophanus) and close wild relative (C. mucosospermus) were aligned to the reference genome ‘97103’ to identify SVs, which were used to construct a graph-based pan-genome. Seventy-five representative accessions were genotyped for the SVs through mapping Illumina short reads to the constructed pan-genome. SVs with significantly higher frequencies in the two wild populations compared to the cultivated population were identified. Genes affected by these SVs and having disease resistance-related functions were further identified, including those encoding NBS-LRR resistance proteins. Selective sweep analysis was also performed to detect genes related to fruit quality that were under selection during domestication. One of the genes identified encoded a RING-type E3 ubiquitin transferase, and this gene is located in a sugar content QTL, QBRX2-1, suggesting its potential role in fruit flesh sugar accumulation during watermelon domestication. The genes and variants discovered here could be candidates for further functional characterization and for watermelon breeding.
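The summary does not name the test used to find SVs with significantly higher frequencies in wild populations. As one possible illustration, the sketch below applies a one-sided Fisher's exact test to carrier counts in two groups. The choice of test, the counts, and the group sizes are assumptions, and a real analysis over many SVs would also need multiple-testing correction.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment of carriers in group 1.

    a, b: carriers and non-carriers in group 1 (e.g. wild accessions)
    c, d: carriers and non-carriers in group 2 (e.g. cultivated accessions)
    """
    n1, n2, k = a + b, c + d, a + c
    total = comb(n1 + n2, k)
    # Sum hypergeometric probabilities of seeing a or more carriers in group 1.
    upper = min(n1, k)
    return sum(comb(n1, x) * comb(n2, k - x) for x in range(a, upper + 1)) / total

# Hypothetical SV: carried by 20 of 25 wild and 5 of 50 cultivated accessions.
p_value = fisher_greater(20, 5, 5, 45)
```

A small p-value here would flag the SV as enriched in the wild group, after which the genes it affects could be inspected.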
My Experience:
My time in the Fei lab was very insightful in many aspects and gave me my first immersive research experience in a field that I am interested in. Throughout my project, I was exposed to different programs and coding languages commonly used in bioinformatics, many of which were previously unfamiliar to me. I also have a better understanding of what it means to continue down a career path in this field. On top of all of this, working with highly experienced professors, graduate students, and postdocs in this field was a very eye-opening experience. I also loved meeting other interns who share my interests and career passions. Overall, I had a great time here. This summer at BTI taught me a lot and gave me a better sense of direction for my post-undergraduate years.
Intern Info
Maddie Shaklee
“Finding Genome Regions Associated with Pepper Fruit Shape Using Genome-Wide Association Study”
Project Summary:
Peppers are currently the fourth leading vegetable crop worldwide and are cultivated for an incredible amount of diversity in appearance. However, the genetic bases underlying their fruit shape variations are largely unknown. A genome-wide association study (GWAS) was performed using high-density single nucleotide polymorphisms (SNPs) to identify genome regions associated with fruit shape variations of Capsicum baccatum. A region on chromosome 10 was found to be significantly associated with fruit shape. This candidate region harbored eighteen protein-coding genes, including one (Ca10g05148) encoding an OVATE family protein. OVATE family proteins are known as key regulators of fruit shape in tomato and other fruit crops, making this gene a plausible candidate for fruit shape regulation in C. baccatum. This study provides novel genes for pepper fruit shape regulation and valuable information that could be used for fruit shape breeding for pepper and other crops.
My Experience:
This summer has been an amazing experience which has given me many new skills and opportunities. Most notable is the preparation I have received for future research and pursuing graduate studies. This summer has taught me about developing a project and seeing it to completion with the help of my mentor. I expanded my ability to troubleshoot and work through code issues on my own and carefully organize my work to improve future research. This has given me confidence to consider my future studies more seriously and understand what research I would like to participate in moving forward.
Intern Info
Adam Cason
“SpinachBasev2: An updated central portal for Spinach genomics tools and information”
Project Summary:
Spinach (Spinacia oleracea) is an important agricultural crop because of its nutritional value, popularity, and other potential applications to human health such as being a vector for edible vaccines. In the summer of 2018, a project was undertaken by the Fei Lab to build a central database for spinach genomic information and a site, SpinachBase, was created. This site houses data and resources related to the first published spinach genome of cultivar Sp75. In the following years, the genomes of four more spinach cultivars, Monoe-Viroflay, Viroflay, 03-009, and Cornell No.9, were sequenced, creating a need to update the public store of spinach genomic knowledge. The release of these additional genome sequences, along with updates made to the Drupal/Tripal web system upon which the original database is built, necessitated the creation of a new spinach genomic database. Thus, SpinachBasev2 has been constructed and is intended to update and replace the existing SpinachBase site, while continuing to provide helpful tools and analyses relating to spinach genetic and genomic data. These tools include a BLAST similarity search function, a keyword search tool, a synteny viewer, and a genome browser. Various bioinformatics programs, such as NCBI BLAST, Blast2Go, AHRD, InterProScan, and MCScanX, were used to format and annotate the genomic data for the database. This database will allow spinach researchers and breeders to more easily and efficiently find and investigate data related to spinach genetics and genomics.
My Experience:
Attending a small, liberal arts university has been great, but has not allowed me to work or research in a large laboratory setting during the regular school year. Here at BTI, I have been able to gain a whole new experience of interacting with a PI along with other undergraduates, graduate students, and postdocs. Being able to work on my own bioinformatics project with my mentor has taught me how to use a myriad of different data annotation programs and coding languages, and has helped me to see what a career in bioinformatics really looks like. I have also become a more well-rounded scientist in general through the weekly lectures from diverse plant science researchers and classes dedicated to science ethics and science communication. All in all, this experience has prepared me in many different areas to confidently pursue graduate school and a career in the plant sciences.
Intern Info
Benjamin Beer
Project Summary:
Watermelon (Citrullus lanatus) is a prevalent crop in many countries thanks to its high nutritional value and refreshing taste. Despite this popularity, it is still challenging to cultivate watermelons with negligible crop loss. This is due in part to watermelon’s narrow genetic diversity, caused by a domestication bottleneck, which leaves it vulnerable to various diseases. Wild species within the Citrullus genus have exhibited resistance to some of the most important diseases. Effectively hybridizing cultivated watermelons with the wild forms could offer opportunities to create a more resistant crop. To utilize the genetic diversity preserved in the wild watermelons and guide efficient selection of breeding materials that carry beneficial traits, gene presence/absence variations (PAVs) were analyzed in 480 watermelon accessions belonging to cultivated watermelon and three wild relatives, C. mucosospermus, C. amarus and C. colocynthis, using a super pan-genome that captures genes existing in different watermelon species. Characterizing gene PAVs in the watermelon super pan-genome demonstrated the divergence between the wild and cultivated watermelons. Through comparative analysis, functionally important genes whose occurrence frequencies differed significantly between the wild and cultivated watermelons were identified. These included disease resistance genes that were lost in the cultivated watermelon and could be brought back from the wild watermelons.
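The comparison of gene occurrence frequencies between wild and cultivated accessions can be illustrated with a small sketch. The summary does not state which statistical test was used, so the two-sided Fisher's exact test below is an assumption, and the gene name and accession IDs are hypothetical toy data rather than project results.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "present" calls in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


def compare_gene_pav(present_in, wild, cultivated):
    """Compare how often one gene is present in wild vs. cultivated accessions.

    present_in: set of accession IDs in which the gene is called present.
    wild, cultivated: sets of accession IDs in each group.
    """
    wild_present = len(present_in & wild)
    cult_present = len(present_in & cultivated)
    return {
        "wild_freq": wild_present / len(wild),
        "cultivated_freq": cult_present / len(cultivated),
        "p_value": fisher_exact_two_sided(
            wild_present, len(wild) - wild_present,
            cult_present, len(cultivated) - cult_present,
        ),
    }


# Hypothetical toy data: a gene present in every wild accession and
# absent from every cultivated one.
wild_accessions = {"w1", "w2", "w3", "w4"}
cultivated_accessions = {"c1", "c2", "c3", "c4"}
toy_gene_presence = {"w1", "w2", "w3", "w4"}

result = compare_gene_pav(toy_gene_presence, wild_accessions, cultivated_accessions)
print(result)
```

With hundreds of accessions and many thousands of genes, the p-values would also need a multiple-testing correction before any gene is called significantly changed.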
My Experience:
I learned a lot during my time working in the Fei lab. I acquired a variety of technical skills, such as working in Linux environments and writing scripts in R, while also expanding on previously developed skills, namely Python and Bash. Learning how to use bioinformatics tools like samtools and Blast2GO was interesting and gave me further insight into the nuances of genetics. This internship also served as an introduction to the scientific community. The symposiums, other lectures, and lab work all showed me common practices and skills used in research while also greatly broadening my knowledge of plants. Overall, this was a very positive and informative experience that served as a great introduction to professional research. Lastly, I would like to thank my mentor, Shan Wu, for teaching me many of the skills I’ve acquired while at BTI and Dr. Fei for giving me this opportunity.
Intern Info
Mukund Gaur
Identification of Differentially Expressed Genes in F1 from the Cross between S. lycopersicum M82 and S. pennellii LA0716
The modern cultivated tomato is characterized by a lack of genetic diversity due to extensive selective breeding. In contrast, wild tomatoes continue to display broad morphological and metabolic diversity, including in sugars, organics, and volatiles. Identifying genes with allele-specific expression (ASE) related to key agronomic traits in the F1 hybrid derived from the cross between the cultivated S. lycopersicum M82 and the wild S. pennellii LA0716 could deepen our understanding of the regulation of tomato nutritional quality and flavor. This project used RNA sequencing (RNA-Seq) data from different fruit tissues at different developmental stages of the F1 hybrid. RNA-Seq reads were processed and mapped separately to the genomes of the two parents. Based on its mapping quality against each parent genome, each read was assigned to one of the parents, and raw and normalized (FPKM) read counts for each of the two alleles were calculated for all genes. ASE genes were then identified, and weighted gene co-expression analysis was performed. Gene ontology analysis of genes within the module with highly differential allele expression identified enriched biological processes, and genes with functions relevant to fruit flavor and nutrition were identified. The identification of these candidate genes could allow for future functional analysis that will deepen our understanding of how cross breeding with wild species could improve flavor and nutritional quality in cultivated tomatoes.
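The read assignment and normalization steps can be sketched as follows. The summary only says that reads were assigned based on mapping quality, so the rule used here (assign to the parent with the strictly higher MAPQ, discard ties) is an assumption, as is the choice of total assigned reads as the FPKM denominator. The gene ID and read values are hypothetical.

```python
from collections import Counter


def assign_read(mapq_m82, mapq_la0716):
    """Assign a read to the parent genome where it maps with higher quality.

    Returns "M82", "LA0716", or None when the read is equally good on both
    genomes and therefore uninformative about allele of origin.
    """
    if mapq_m82 > mapq_la0716:
        return "M82"
    if mapq_la0716 > mapq_m82:
        return "LA0716"
    return None


def count_alleles(reads):
    """Count assigned reads per (gene, parent).

    reads: iterable of (gene_id, mapq_m82, mapq_la0716) tuples.
    """
    counts = Counter()
    for gene_id, mapq_m82, mapq_la0716 in reads:
        parent = assign_read(mapq_m82, mapq_la0716)
        if parent is not None:
            counts[(gene_id, parent)] += 1
    return counts


def fpkm(read_count, gene_length_bp, total_assigned_reads):
    """Fragments per kilobase of transcript per million assigned fragments."""
    return read_count * 1e9 / (gene_length_bp * total_assigned_reads)


# Hypothetical toy reads for one gene: two favor M82, one favors LA0716,
# and one maps equally well to both and is dropped.
toy_reads = [
    ("gene1", 60, 10),
    ("gene1", 60, 0),
    ("gene1", 5, 40),
    ("gene1", 30, 30),
]
allele_counts = count_alleles(toy_reads)
print(allele_counts)
```

Genes whose two allele counts differ strongly, after normalization and an appropriate statistical test, would be the ASE candidates carried into the co-expression analysis.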
Through this experience at BTI, I gained invaluable knowledge not only about the tomato genome and RNA-Seq technology, but also about data analysis techniques and strategies. Throughout this process, I learned how to use R, Python, and the Linux command line for data analysis, and also used tools such as RPubs and GitHub to publish and manage my code. It was amazing to experience working in a laboratory environment, have access to the tools offered, and conduct an experiment with real-world applications. I did not come into this program with a lot of background in biology or plant science, so I learned about both fields through my own research and the weekly seminars, which were a really great way for me to learn about current research in plant science.
Intern Info
Grace Coppinger
Identification of structural variants affecting fruit quality traits between wild and cultivated watermelons
Watermelon, Citrullus lanatus, is among the top five most consumed fruits globally. Its domestication has led to significant changes in fruit quality traits, including higher sugar content compared to its wild relatives such as Citrullus amarus. Genomic structural variants (SVs) have been reported to contribute to domestication traits. However, our knowledge about SVs between cultivated and wild watermelons and their phenotypic effects remains incomplete. To address this, a comparative study was conducted between a representative accession of C. lanatus subsp. vulgaris and a representative accession of C. amarus. Alignment of the high-quality genomes of these two accessions identified 111,738 SVs larger than 20 bp, which affected 19,482 and 19,096 genes in the cultivated and wild genomes, respectively. We found that 1,107 genes in the cultivated genome and 838 genes in the wild genome carried SVs affecting coding sequences, among which 25 had annotated functions related to disease resistance. RNA-seq data from cultivated and wild fruit flesh tissues were used for differential gene expression analysis. As a result, 2,732 genes in the cultivated genome and 4,884 genes in the wild genome were found to be differentially expressed during fruit development. By integrating SV and differential gene expression information, we identified candidate genes potentially crucial for fruit development and sugar content. This study has advanced our understanding of SVs and genes potentially affecting fruit quality traits and disease resistance. The SVs identified here serve as useful resources to facilitate future watermelon breeding efforts.
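One simple way to picture the integration of SV and expression information is an interval overlap followed by a set intersection. This is an illustrative sketch only: the summary does not describe how the two data types were combined, and the gene IDs, coordinates, and function names below are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(a, b):
    """True when two intervals lie on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_with_coding_svs(coding_regions, svs):
    """Return genes with at least one coding interval overlapped by an SV.

    coding_regions: dict mapping gene ID -> list of coding Intervals.
    svs: list of SV Intervals.
    """
    return {
        gene
        for gene, intervals in coding_regions.items()
        if any(overlaps(cds, sv) for cds in intervals for sv in svs)
    }


def candidate_genes(coding_regions, svs, de_genes):
    """Genes that both carry a coding SV and are differentially expressed."""
    return genes_with_coding_svs(coding_regions, svs) & set(de_genes)


# Hypothetical toy annotation, SV calls, and differential expression results.
toy_coding = {
    "geneA": [Interval("chr1", 100, 200)],
    "geneB": [Interval("chr1", 500, 600)],
    "geneC": [Interval("chr2", 100, 200)],
}
toy_svs = [Interval("chr1", 150, 180), Interval("chr2", 300, 400)]
toy_de_genes = {"geneA", "geneB"}

candidates = candidate_genes(toy_coding, toy_svs, toy_de_genes)
print(candidates)
```

The nested loop is fine for a toy example; at the scale of more than 100,000 SVs, a sorted sweep or an interval tree would be a more practical choice.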
My experience at Boyce Thompson Institute this summer revolved around the incredible people I met. From engaging in lab work throughout the day to having fun playing sand volleyball, this has been a memorable summer. I am so grateful for all the friendships I’ve made during this time, and I look forward to witnessing all the remarkable achievements my peers will accomplish in the future. During my time in the Fei lab, I improved my bioinformatics skills and expanded my scientific knowledge and experience. I have a newfound appreciation for those who practice computer science and analyze large data sets. This rewarding and challenging experience has amplified my desire to pursue a Ph.D. I am excited to continue my scientific career and put my new skills to use.
Intern Info
Aaron Alexander
Generating a Phased Genome Assembly of the Hexaploid Sweetpotato Cultivar ‘New Kawogo’
Sweet potato is among the most important staple crops. Sweet potato cultivars rich in vitamin A have been produced and promoted in areas where childhood vitamin A deficiency is common. Sweet potato improvement is challenged by a lack of knowledge of the genetic and molecular basis of key agronomic traits. This research project aims to generate a phased genome assembly of the hexaploid sweet potato cultivar ‘New Kawogo’. PacBio HiFi sequencing was used to produce highly accurate long reads, and the reads were assembled into phased contigs with hifiasm, a de novo assembler that produces haplotype-resolved genome assemblies by integrating chromatin conformation capture (Hi-C) sequencing data. Due to the complexity of the hexaploid sweet potato genome, chimeric contigs resulting from erroneously connected sequences from different haplotypes were present in the initial assembly. These misassemblies were corrected by taking advantage of phased genetic maps, Hi-C contact maps, and genome synteny. The corrected contigs have been used to produce a haplotype-resolved, chromosome-level genome assembly of ‘New Kawogo’, which provides a valuable resource for discovering the genetic controls of important traits and for genomics-assisted improvement of sweet potato. This assembly serves as a foundation for studying the genetics and biology of ‘New Kawogo’ and will accelerate sweet potato breeding.
I am grateful for the opportunity to conduct summer research in the Fei lab at BTI under Shan Wu’s mentorship. Being part of the team working on the phased genome assembly of the hexaploid sweet potato cultivar ‘New Kawogo’ has been immensely rewarding, as this research supports biofortification efforts to promote vitamin A-rich cultivars in regions of the world affected by childhood vitamin A deficiency. I expanded my knowledge of genomics, including genetic markers, Hi-C contact signals, and genome synteny. The guidance and collaborative environment have greatly enriched my learning and research skills. With the help of my mentor and the BTI BCBC bioinformatics course, I learned the basics of UNIX, coding in R, and bioinformatics tools like BLAST, SeqKit, and BUSCO. I am also grateful for the REU programs, such as the weekly seminars and the DGS Graduate School Panel, which helped me learn about graduate studies and decide to pursue them.