Yuan - Boyce Thompson Institute

Patrick Yuan

Year: 2019

Genetic Characterization of Cucumber (Cucumis sativus) Using Whole Genome Re-sequencing

The cucumber (Cucumis sativus), a member of the Cucurbitaceae botanical family, is a widely grown creeping vine whose fruits are commonly used as vegetables. Important traits in crops, such as yield, resistance to diseases and insects, ease of harvest, and nutritional value, depend on genetic variation. The cucumber germplasm in the National Plant Germplasm System (NPGS), which consists of approximately 1300 accessions, has previously been characterized using genotyping-by-sequencing (GBS) technology, which covers only a small portion of the genome. A core collection of 395 cucumbers was derived from this germplasm, and the genomes of 149 cultivated cucumbers from that collection have been resequenced to date. Analysis of the genome resequencing data, together with data from 5 wild cucumbers (C. sativus var. hardwickii) representing the outgroup, resulted in a variation map of more than 1.6 million high-quality single-nucleotide polymorphisms (SNPs) distributed across the cucumber genome at approximately 1 SNP per 153 bp. Principal component analysis (PCA), population structure analysis, and phylogenetic analysis using this variation map, as well as a heatmap of a Hamming distance matrix, supported three major clades among these cucumber accessions: one with origins in India/South Asia, a second with origins in East Asia, and a third with origins in Central/West Asia, Turkey, Europe, Africa, and North America. Additionally, the neighbor-joining phylogenetic tree identified cucumbers from India/South Asia as the closest relatives of the wild cucumbers, and the heatmap revealed a relatively high level of genetic diversity within the India/South Asia accessions; both findings are consistent with the current understanding of India as the center of origin for cultivated cucumbers. Population structure analysis further identified the North American and African groups as subclades of the third clade.
The variation map produced in this project, combined with that from the rest of the core cucumber accessions, provides a valuable resource that can help identify variants significantly associated with important traits.
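
To make the Hamming distance matrix mentioned above more concrete, here is a minimal Python sketch of how pairwise distances between accessions could be computed from SNP genotype calls. It is an illustration only, not the pipeline used in this project: the 0/1/2 genotype coding, the accession names, and the toy data are all assumptions, and missing genotype calls are not handled.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Count the SNP sites at which two accessions have different genotype calls."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same SNP sites")
    return sum(1 for x, y in zip(a, b) if x != y)


def hamming_matrix(genotypes):
    """Build a symmetric accession-by-accession distance matrix (dict of dicts)."""
    names = list(genotypes)
    matrix = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = hamming_distance(genotypes[a], genotypes[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Hypothetical toy data: one genotype call per SNP site, coded as
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
toy_genotypes = {
    "wild_1": [0, 0, 1, 2, 0, 1],
    "india_1": [0, 0, 1, 2, 1, 1],
    "east_asia_1": [2, 0, 1, 0, 1, 0],
}

matrix = hamming_matrix(toy_genotypes)
for name, row in matrix.items():
    print(name, row)
```

In a matrix like this, smaller values indicate accessions that share more genotype calls, so a heatmap of the matrix shows closely related groups as blocks of low distance.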

My Experience

My experience in the Fei lab this summer has allowed me to explore how computer science is applied in bioinformatics. With the help of my mentor and fellow BTI interns, I learned about the steps and tools used to collect, process, and analyze genetic data, as well as biological concepts such as genetic diversity and population structure. This internship has also improved my programming skills and helped me learn new languages, such as Perl and R. Overall, this was my first experience in academic research, and it has helped me understand biological applications of computer science, as well as potential careers that I could pursue in the future.