“Pedigree Verification in Cassava through Analysis of Single Nucleotide Polymorphisms”
Project Summary:
Manihot esculenta, more commonly known as cassava, is a tropical root crop that serves as a primary food source for 500 million people around the world, particularly in Sub-Saharan Africa. Cassava breeding efforts are tracked using CassavaBase (cassavabase.org), a public database developed by the Mueller Lab as part of the NEXTGEN Cassava Breeding project. CassavaBase allows breeders to store their data in a free and open format, including the pedigrees of thousands of breeding lines. In this project, a pedigree verification tool based on genetic similarity was developed in Perl and implemented in CassavaBase. Given a pedigree, the tool analyzes a selected set of single nucleotide polymorphisms (SNPs) from the genotypes of the child and its documented parents and determines whether the observed combination is genetically possible (e.g., if both parents carry two copies of an allele, a child lacking that allele is impossible). For lines that do not appear to be a genetic match with their documented parents, the true parents can then be sought by comparing the line against a larger population of potential parents.
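The consistency check described above can be illustrated with a short sketch. This is not the Perl code used in CassavaBase; it is a minimal Python illustration under stated assumptions. Genotypes are assumed to be coded as alternate-allele dosages (0, 1, 2, or None for a missing call). The function names `possible_child_dosages`, `count_conflicts`, and `rank_parent_pairs` and the toy genotypes are hypothetical, invented only for illustration. Each parent can transmit only the alleles it carries, so a child dosage outside the set of possible sums counts as a conflict. Ranking every candidate pair by conflict rate sketches the search for true parents.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Assumed encoding (hypothetical): alternate-allele dosage per SNP.
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, None = missing.
Genotype = Optional[int]

# Alleles (0 = reference, 1 = alternate) a parent with a given dosage can pass on.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def possible_child_dosages(mother: int, father: int) -> Set[int]:
    """All child dosages consistent with Mendelian inheritance at one SNP."""
    return {a + b for a in TRANSMITTABLE[mother] for b in TRANSMITTABLE[father]}


def count_conflicts(child: Sequence[Genotype], mother: Sequence[Genotype],
                    father: Sequence[Genotype]) -> Tuple[int, int]:
    """Return (conflicting SNPs, SNPs compared), skipping SNPs with any missing call."""
    conflicts = compared = 0
    for c, m, f in zip(child, mother, father):
        if c is None or m is None or f is None:
            continue
        compared += 1
        if c not in possible_child_dosages(m, f):
            conflicts += 1
    return conflicts, compared


def rank_parent_pairs(child: Sequence[Genotype],
                      candidates: Dict[str, Sequence[Genotype]]) -> List[Tuple[float, str, str]]:
    """Score every candidate pair (self-pollination included), lowest conflict rate first."""
    scored = []
    for p1, p2 in combinations_with_replacement(sorted(candidates), 2):
        conflicts, compared = count_conflicts(child, candidates[p1], candidates[p2])
        if compared:
            scored.append((conflicts / compared, p1, p2))
    return sorted(scored)


# Toy data, invented for illustration only.
child = [1, 0, 2, 1, None]
genotypes = {
    "parentA": [2, 0, 2, 0, 1],
    "parentB": [0, 1, 1, 2, 1],
    "parentC": [2, 2, 0, 2, 0],
}

# Suppose the documented pedigree is parentA x parentC.
print(count_conflicts(child, genotypes["parentA"], genotypes["parentC"]))  # (3, 4)
print(rank_parent_pairs(child, genotypes)[0])  # (0.0, 'parentA', 'parentB')
```

In this toy case, the documented pair conflicts at 3 of 4 comparable SNPs, while parentA x parentB is fully consistent. With real marker data, genotyping errors mean a small nonzero conflict rate is expected even for true parents, so a practical tool would need some tolerance threshold. The threshold is not specified here and would be an assumption.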
Breeding higher-yielding and more resilient crops will be critical as the land available for agriculture shrinks and populations grow. Studies have shown that cassava is one of the few staple crops that may resist or even benefit from climate change, an increasingly important concern as global temperatures rise. By ensuring the accuracy of pedigree records in CassavaBase, this tool can contribute to worldwide cassava breeding efforts, including the goals of the overarching NEXTGEN project: shortening the breeding cycle, improving yield, increasing genetic diversity, and increasing the exchange of cassava breeding information.
My Experience:
My internship at Boyce Thompson Institute has given me the opportunity to collaborate with extremely talented bioinformaticians and researchers from around the world. I have gained experience in many areas, including using the Linux command line, writing and running data analysis scripts in R, accessing and managing databases, building packages, controllers, and modules in Perl, designing websites in HTML and JavaScript, using virtual machines for development, and learning the importance of file management and backups. Much of my work required collaborating with lab members who possess unique skill sets, and learning to communicate with them clearly and effectively to obtain the information I needed was a valuable experience. This internship has shown me that I work very well in a research setting, and in the future I would consider returning to biological research as a career.