Lukas Mueller

Professor

Developing databases and software tools that organize plant genetics data, helping scientists and breeders accelerate crop improvement—especially for staple crops in food-insecure regions.

Research Focus

How can genomics contribute to improved crop breeding?

Email: lam87@cornell.edu

Office Phone: 607-255-6557

Office/Lab: Room 221

Adjunct Professor
Section of Plant Breeding and Genetics
School of Integrative Plant Science
Cornell University

The evolutionary dynamics of genetic mutational load throughout tomato domestication history
Razifard, H., Visa, S., Menda, N., Mueller, L., Tieman, D., van der Knaap, E. and Caicedo, A.L.

Sustaining public plant breeding programs across generations
Hale, I., Koebernick, J., Hershberger, J., Rife, T., Arbelaez, J.-D., Anderson, N., Bekkerman, A., Bohn, M., Bourland, F., Burke, T., Chee, P., Evans, K., Fumia, N., Feldmann, M., Gasic, K., Hague, S., Heilman-Morales, A. M., Kemp, A. H., Iglesias, C., Mueller, L., … Kantar, M.

BrAPI v2-A unified framework for data integration and collaboration for breeding and genetic resources
P Selby, R Abbeloos, AF Adam-Blondon, FJ Agosto-Pérez, M Alaux, I Alic, ...

Post-composing ontology terms for efficient phenotyping in plant breeding.

Menda N, Ellerbrock BJ, Simoes CC, Karaikal SK, Nyaga C, Flores-Gonzalez M, Tecle IY, Lyon D, Agbona A, Agre PA, Peteti P, Akech V, Asiimwe A, Fauvelle E, Meghar K, Tran T, Dufour D, Cooper L, Laporte MA, Arnaud E, Mueller LA.

Genomic Predicted cross performance: a tool for optimizing parental combinations in breeding programs.

Nyaga C, Labroo MR, Paterne A, Asfaw A, Wolfe MD, Tecle IY, Mueller LA.

Breedbase: a digital ecosystem for modern plant breeding.

Morales N, Ogbonna AC, Ellerbrock BJ, Bauchet GJ, Tantikanjana T, Tecle IY, Powell AF, Lyon D, Menda N, Simoes CC, Saha S, Hosmani P, Flores M, Panitz N, Preble RS, Agbona A, Rabbi I, Kulakow P, Peteti P, Kawuki R, Esuma W, Kanaabi M, Chelangat DM, Uba E, Olojede A, Onyeka J, Shah T, Karanja M, Egesi C, Tufan H, Paterne A, Asfaw A, Jannink JL, Wolfe M, Birkett CL, Waring DJ, Hershberger JM, Gore MA, Robbins KR, Rife T, Courtney C, Poland J, Arnaud E, Laporte MA, Kulembeka H, Salum K, Mrema E, Brown A, Bayo S, Uwimana B, Akech V, Yencho C, de Boeck B, Campos H, Swennen R, Edwards JD, Mueller LA

Research Overview

In recent years, technological advances in fields such as sequencing have transformed certain aspects of biology into an information-based discipline. To make this abundance of data—often called Big Data—useful to researchers and breeders, it needs to be organized and made accessible. Towards this goal, the Mueller lab designs and implements databases that assist scientists in their research and plant breeders in more efficient crop improvement.

Our databases and software make transcriptomic, genotypic and phenotypic data from thousands of experiments accessible to the public, often focusing on under-researched staple crops from food-insecure regions. A method called Genomic Selection that uses high-throughput genotyping technologies, such as genotyping-by-sequencing (GBS), and large phenotyping data sets allows for rapid prediction of desirable traits in new plant crosses.

Based on these tools, the Mueller laboratory collaborates on a variety of projects. With the Nextgen Cassava project, we have created Cassavabase, a database specifically designed for cassava breeders in Africa. We coordinate the Solanaceae Genomics Network, a compilation of all the genetic information known about solanaceous plants, such as tomato, petunia, and Nicotiana. We are also developing breeding databases for yam, sweet potato, and the cooking banana. Finally, the Mueller group is involved in multiple genome sequencing projects, including tomato, coffee, petunia, and Nicotiana benthamiana.

BreedBase
Breedbase is a comprehensive breeding management and analysis software. It can be used to design field layouts, collect phenotypic and other information using Phenoapps software, support the collection of genotyping samples in a field, store large amounts of high density genotypic information, and provide Genomic Selection related analyses and predictions.

Cassavabase
Access to data and tools for breeders and researchers, including genomic selection algorithms and analysis capacity, a cassava genome browser, cassava ontology tools, phenotyping tools, and social networking.

Citrus Greening Solutions
A systems-based pipeline approach for delivering commercial, grove-deployable solutions using a novel therapeutic delivery strategy and citrus transgenics.

Musabase
A breeding database designed for advanced breeding methods in banana breeding.

Rtbbase
A collection of Root Tubers & Banana Databases, which hold genomic and phenotypic information for next generation breeding applications.

Sol Genomics Network
A site for genome data of Solanaceae species such as tomato, potato, and pepper, with resources related to the tomato genome sequencing project.

Sweetpotatobase
Part of the Genomics Tools for Sweet Potato (GT4SP) Improvement Project, focused on developing a set of “next generation” breeder tools for sweetpotato breeders in Africa.

Yambase
A database of breeding data for yam (genus Dioscorea). Yam species used for breeding include Dioscorea rotundata and Dioscorea cayenensis (both native to Africa and the major cultivated species), Dioscorea alata (native to Southeast Asia), and Dioscorea praehensilis, as well as several other species.

Nicotiana benthamiana
A project to improve the assembly of the Nicotiana benthamiana genome sequence and link the sequence to the nineteen chromosomes. We are also working to improve the gene annotation.

In the News

October 7, 2022

Breedbase software to help speed crop improvement

To help plant breeders speed crop improvement around the world, Lukas Mueller of the Boyce Thompson Institute worked with an international team of 57 people to create Breedbase, a database software that was described...

June 24, 2022

Wild tomato genome will benefit domesticated cousins

Wild relatives of crops are becoming increasingly valuable to plant researchers and breeders. During the process of domestication, crops tend to lose many genes, but wild relatives often retain genes...

December 1, 2020

Tomato’s Wild Ancestor Is a Genomic Reservoir for Plant Breeders

Thousands of years ago, people in the region now known as South America began domesticating Solanum pimpinellifolium, a weedy plant with small, intensely flavored fruit. Over time, the plant evolved into S....

May 30, 2012

Tomato Genome Becomes Fully Sequenced – Paving the Way for Healthier Plants

ITHACA, N.Y. – For the first time, the genome of the tomato, Solanum lycopersicum, has been decoded. It becomes an important step toward improving yield, nutrition, disease resistance, taste and color...

Research Experience

Internships

BTI offers a summer research experience program for undergraduate and high school students.

Intern Projects in the Mueller Lab

Interns in the Mueller Lab work on a variety of bioinformatics and genomics projects and gain experience in the following areas: genome assembly, structural and functional annotation, biochemical pathways, comparative genomics, ontology development, and data presentation and visualization.

Previous Interns

2023

Nistar Steinerman

Auditing Database Tables in Breedbase

In an era when genetic sequencing and phenotyping are quickly producing vast amounts of data, database systems such as Breedbase are becoming increasingly necessary in genomics research. Websites that use the Breedbase ecosystem support a range of tasks related to crop breeding, which enact operations on the database in the form of insertions, updates, and deletions of data. Breedbase users requested that a record of these changes be made available on the website as a reference, for archival purposes and correction of errors. This feature was built by running a PostgreSQL database patch, which creates audit tables and inserts audit data every time a change is made to the database. These data, including the timestamp, type of operation, username of the account that committed the change, and state before and after the change, are displayed in dedicated audit pages and sections on the website. These webpages were designed using HTML and JavaScript/jQuery, and database connections are handled via a controller written in Perl. With the advent of audit tables in Breedbase, users will be able to handle their data more efficiently and securely. This is impactful as Breedbase programs mainly focus on staple crops such as cassava, which are integral to food security and economic prosperity.
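The trigger-based auditing described above can be sketched in a few lines. This is an illustration only, not the Breedbase patch: it uses Python's built-in sqlite3 module in place of PostgreSQL so that it runs without a database server, and the table names (stock, stock_audit, session) and the demo_user account are hypothetical. SQLite has no notion of a logged-in database user, so the sketch reads the username from a one-row session table.

```python
import sqlite3

# Hypothetical schema: one data table, one audit table, and three triggers
# that write an audit row for every insert, update, and delete.
SCHEMA = """
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE session (username TEXT NOT NULL);
CREATE TABLE stock_audit (
    audit_id    INTEGER PRIMARY KEY,
    audit_ts    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    operation   TEXT NOT NULL,
    username    TEXT,
    before_name TEXT,
    after_name  TEXT
);
CREATE TRIGGER stock_ins AFTER INSERT ON stock BEGIN
    INSERT INTO stock_audit (operation, username, before_name, after_name)
    VALUES ('INSERT', (SELECT username FROM session LIMIT 1), NULL, NEW.name);
END;
CREATE TRIGGER stock_upd AFTER UPDATE ON stock BEGIN
    INSERT INTO stock_audit (operation, username, before_name, after_name)
    VALUES ('UPDATE', (SELECT username FROM session LIMIT 1), OLD.name, NEW.name);
END;
CREATE TRIGGER stock_del AFTER DELETE ON stock BEGIN
    INSERT INTO stock_audit (operation, username, before_name, after_name)
    VALUES ('DELETE', (SELECT username FROM session LIMIT 1), OLD.name, NULL);
END;
"""


def demo_audit_log():
    """Run one insert, update, and delete, then return the audit rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO session (username) VALUES ('demo_user')")
    conn.execute("INSERT INTO stock (name) VALUES ('TMS-001')")
    conn.execute("UPDATE stock SET name = 'TMS-001A' WHERE name = 'TMS-001'")
    conn.execute("DELETE FROM stock WHERE name = 'TMS-001A'")
    rows = conn.execute(
        "SELECT operation, username, before_name, after_name "
        "FROM stock_audit ORDER BY audit_id"
    ).fetchall()
    conn.close()
    return rows
```

Each audit row carries the operation type, the acting user, and the state before and after the change, alongside an automatically assigned timestamp. These are the same fields the abstract lists for the audit pages.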

The BTI REU program has been such a wonderful, educational, and memorable experience. Coming from a university where opportunities for plant science research are limited, it has been very exciting to join a bustling, vibrant plant research community. I am still honing my academic focus, and working in the Mueller Lab, a bioinformatics research group, has been helpful with learning about how computer science and biology intersect. It was also fascinating to observe how an interdisciplinary lab operates and collaborates with other labs at BTI and with partner institutions. Other aspects of the REU program, such as the weekly seminars and science communication workshops, have been very impactful for me in the way that I think about research. I think this research experience has equipped me with skills and perspectives that will be invaluable as I continue on my academic journey. I am so grateful to my mentors, Adrian Powell and Lukas Mueller, as well as Megan Truesdail and Delanie Sickler, for their efforts in coordinating this internship.

Intern Info

School Brandeis University

Faculty Advisor Lukas Mueller

Mentor Adrian Powell

Year 2023

2019

Arianna Kazemi

The Sweet Potato Expression Atlas

The domesticated sweet potato (Ipomoea batatas) is a staple food, particularly in sub-Saharan Africa. It has many nutritional benefits, including high carbohydrate, fiber, and vitamin content. Ensuring a hearty and healthy sweet potato crop is therefore crucial to maintaining this important food source. Current threats to sweet potato yield include drought, certain fungal diseases including root rot, and sweet potato weevils, incredibly damaging pests. Creating a crop that is both resistant to these threats and has greater starch and protein yields could be possible by turning to the 14 crop wild relatives of sweet potato, each of which has unique traits that could improve sweet potato growth. The genomes of two of these relatives, I. trifida and I. triloba, have recently been sequenced. Using these genomes and existing transcriptomic data collected from these relatives under a variety of stress conditions, genes and networks of interest can be identified for further study.

Such study is made more accessible through the creation of a sweet potato gene expression atlas, a web-based tool that allows users to view levels of gene expression in these two species through a variety of graphical tools. With the atlas, the location and conditions under which genes are differentially expressed can be visualized while coregulated genes can be easily identified, suggesting potential relations and pathways. This tool was created with HTML and JavaScript making up the webpage and a Perl controller accessing a PostgreSQL database. The gene expression atlas will make it easier to develop hypotheses about gene function and differences between these two species of Ipomoea, information which could lead to discoveries that can increase I. batatas yields and maintain this staple crop’s productivity under conditions of biotic and abiotic stress.

My Experience

Through my work in the Mueller Lab this summer, I gained a great appreciation for open and accessible science. Being in an environment where our work aimed to make research available to all, which could in turn not only improve crop production but also benefit lives, gave this project a clear purpose. Seeing the collaborative nature of the lab was also motivating, especially since the Mueller Lab is a non-traditional computational group, and it demonstrated to me how bioinformatics requires just as much communication as traditional research. This sense of working toward a common goal will stay with me as I continue my career as a scientist.
At BTI, I learned new computing languages and became very familiar with the troubleshooting process. With the help of my mentor, I feel more confident in my abilities as a computational biologist and have strengthened my desire to work on similar projects that combine technology with biology. Through my experiences both in the lab and with my fellow students, I feel more assured about continuing with my educational journey in graduate school and am grateful to everyone who made this program possible.

Intern Info

School University of Massachusetts Amherst

Faculty Advisor Lukas Mueller

Year 2019

2019

Tosin Olayinka

Genomic Analysis of NLR and PR Genes in Coffea arabica and Its Ancestral Parents

Coffee is one of the world’s most widely consumed beverages and an economically important crop for both its growers and its processors. Commercially grown coffee is usually one of two species – Coffea arabica or Coffea canephora. These two species differ in terms of flavor, favored climates, and, most importantly, disease resistance. C. arabica is the more desired of the two species due to its sweeter, less bitter taste and its more delicate flavor. However, C. canephora is easier to take care of and has superior general disease resistance.

C. arabica is an allotetraploid – a species resulting from a hybridization event between two different diploid parental species – and is the result of an ancient cross between Coffea eugenioides and the aforementioned C. canephora. This hybridization is interesting because both the allotetraploid and one of the diploid parents are widely cultivated and, of the two, it is the diploid which has superior disease resistance. This is unusual because, in general, polyploid plants tend to be more robust than their diploid ancestors as a result of increased diversity in key gene families. Using tools from comparative genomics, we can determine the potential mechanisms behind this discrepancy.

To do this, we looked at two gene families heavily involved in plant immunity to pests and disease: nucleotide-binding leucine-rich repeat (NLR) and pathogenesis-related (PR) genes. Using high-quality functional gene annotations of the three species as a base, NLR genes were identified using specific DNA motifs as signals, and PR genes were identified using their associated InterPro IDs. In addition, genes in C. arabica were assigned to the subgenome derived from either C. eugenioides or C. canephora. With this information, the C. arabica subgenomes were compared with their respective ancestral species on genome-wide and gene-family scales in terms of gene content, orthology, and synteny. With this, we were able to show that the C. arabica genome contains fewer NLR genes, with less diversity, than its parents.

Finally, expression data from two varieties of C. arabica were compared: one from a variety that is resistant to coffee berry disease (Catimor) and the other from one that is not (Caturra). These were compared across three different timescales and conditions (infected and non-infected), and genes that were differentially expressed between susceptible and resistant varieties were identified. In addition, genes were clustered based on similar changes in expression over time and graphed together using the TCseq Bioconductor package. For both the time series and differential expression analyses, gene ontology enrichment was performed using the topGO package in R and used to better interpret the data.

My Experience

This summer has been an interesting experience both inside and outside of the lab. While I do have experience working with next-generation sequencing (NGS) data, I have not studied an organism with the same breadth and depth that I did this summer with the three Coffea species. Linking different experiments and statistics to a certain hypothesis was both the most frustrating and the most satisfying part of my time here. The confusion coming from an unexpected figure and the satisfaction coming from producing elegant results both helped me grow and become a better scientist.

I was also lucky enough to be surrounded by a cohort of students, professors, and researchers who were passionate about a wide range of different topics – both inside and outside the realm of science. People who, when asked, were eager to help me and others out. Hopefully, I am able to do the same for others in the near and far future.

Intern Info

School The University of North Carolina at Chapel Hill

Faculty Advisor Lukas Mueller

Mentor Mirella Flores Gonzalez

Year 2019

2018

Thomas Chan

“Marker & Pedigree Hybrid Visualization Tool”

Project Summary:

Cassava (Manihot esculenta) is a staple crop and key source of carbohydrates in many tropical regions of the world. Inbreeding depression has occurred in most cassava populations, resulting in mutations that have reduced yield. In turn, these deleterious mutations pose a threat to food security in developing regions. The creation of a haplotype-pedigree hybrid visualization tool will greatly aid cassava breeders in performing marker-assisted selection and eliminating deleterious alleles.

The tool was created using JavaScript and HTML for the web page, D3 for the figure, and Perl and PostgreSQL to access the database. The first phase of the project included designing the website and writing functions to collect user input. The second phase focused on the back end, such as using user input to retrieve the genetic marker information from the database and editing the pedigree tool to support marker visualization. When using the tool, users must first specify a list of accessions (or populations) and a genotyping protocol to yield a list of the shared genetic markers. Upon selecting up to seven markers from the list, users can then display the available pedigree of the accessions. These are displayed as parents and children with color-coded markers and dosage values displayed to the right of the respective node.
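The marker-selection step described above can be sketched as a set intersection followed by a bounded choice. The sketch below is in Python for illustration only (the tool itself was built with JavaScript, Perl, and PostgreSQL). The function names, the marker names, and the dictionary layout of the genotype data are all hypothetical.

```python
MAX_MARKERS = 7  # the tool lets users pick up to seven markers

# Hypothetical genotype calls: accession -> {marker name: allele dosage}
DEMO_GENOTYPES = {
    "child": {"S1_100": 1, "S1_200": 2, "S2_50": 0},
    "mother": {"S1_100": 2, "S1_200": 2},
    "father": {"S1_100": 0, "S1_200": 2, "S2_50": 1},
}


def shared_markers(genotypes):
    """Return the sorted names of markers genotyped in every accession."""
    marker_sets = [set(calls) for calls in genotypes.values()]
    if not marker_sets:
        return []
    return sorted(set.intersection(*marker_sets))


def select_markers(available, chosen):
    """Validate a user's marker choice against the shared-marker list."""
    if len(chosen) > MAX_MARKERS:
        raise ValueError(f"select at most {MAX_MARKERS} markers")
    unknown = [m for m in chosen if m not in available]
    if unknown:
        raise ValueError(f"markers not shared by all accessions: {unknown}")
    return list(chosen)
```

With the demo data, only S1_100 and S1_200 are genotyped in all three accessions, so those are the markers a user could go on to display alongside the pedigree.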

The visualizer will help breeders follow the flow of alleles through accession crosses and help inform future breeding decisions. Future additions include genetic linkage data of the chosen markers and display of allele pairs with the marker name. The tool can be employed with other plants to eliminate deleterious alleles or perform marker-assisted breeding.

My Experience:

The Plant Genome Research Program Internship allowed me to explore my interest in both biology and computer science in a way that can create meaningful change. During my time at the Boyce Thompson Institute, I worked on both the user interface and back end of the application; it was especially rewarding to see the completion of the tool from beginning to end. Furthermore, I was introduced to David Lyon and Guillaume Bauchet, my two mentors who provided wisdom and guidance throughout the development process and proved to me that three minds are better than one. This summer research experience has opened my eyes to the possibilities of biological and computational applications to the real world and has solidified my devotion to use technology for the advancement of human health and well-being. Needless to say, I am forever thankful for the opportunities I’ve been afforded by the program.

Intern Info

School Tufts University

Faculty Advisor Lukas Mueller

Year 2018

2018

Celine Manigbas

“Analysis of the Asclepias syriaca Genome and Gene Families”

Project Summary:

Asclepias syriaca, known as the common milkweed, is found throughout northeastern and southeastern parts of the United States. Cardenolides are a subclass of cardiac glycosides found in Asclepias, and they contain steroidal toxins poisonous to insects and animals when consumed. The larvae of the monarch butterfly, however, utilize Asclepias as their main food source and protection. There is a lack of high-quality genomic information regarding A. syriaca to explore the cardenolide biosynthetic pathway and to comparatively analyze against other species in the Apocynaceae family that do not produce cardiac glycosides. The genome of A. syriaca was recently sequenced by the Jander lab using PacBio with >300x coverage, generating longer reads than in the published genome of A. syriaca, and assembled using Falcon Assembler. This assembly could provide more genomic information for annotation and gene prediction, and this could contribute more information for further genomic research concerning milkweeds, their evolution, and similar plants.

The assembly was error-corrected using Arrow and was repeat-masked. RNA-seq data mapped to the genome and error-corrected using Mikado and Portcullis were used to train the ab initio gene predictors SNAP and Augustus. These predictors were used in the MAKER pipeline along with RNA and protein evidence to synthesize the data into structural gene annotations. Blast2GO was used for functional annotation. The gene families of the published Asclepias syriaca genome, Catharanthus roseus, Rhazya stricta, Coffea canephora, Theobroma cacao, and Solanum lycopersicum were then identified using OrthoFinder. KinFin was used to associate functions to the orthogroups. Gene family expansion was identified using CAFE.

My Experience:

This summer challenged me and taught me a lot about the field of bioinformatics research. This REU was my first exposure to working with Big Data of plant genomes. Prior to this internship, I had very limited experience in bioinformatics. Along the way, I learned a lot about the ever-changing and advancing field through the use of different programs, and it was exciting to work with cutting-edge programs towards research that is relevant to the real world. Before coming here, I was not so clear on the future path I wanted to take and if computational research was the route for me. But after this experience, I realized that I am truly interested in going into the field of bioinformatics.

Intern Info

School Massachusetts College of Liberal Arts

Faculty Advisor Lukas Mueller

Year 2018

2017

Asha Duhan

“Identification of a genomic region in Solanum lycopersicoides associated with resistance to Pseudomonas syringae pv. tomato”

Project Summary:

Tomato (Solanum lycopersicum) is an economically important crop and a nutritious staple used to enrich diets. Speck is a disease caused by Pseudomonas syringae pv. tomato that specifically targets tomatoes, resulting in dark spots on the tomato fruit and leaves. Speck can be lethal to tomatoes and results in a drastic decrease in fruit yield and marketability. An abundance of natural variation, including resistance to disease, exists in wild relatives of tomato, and many species in the tomato clade can be crossed with cultivated tomato.

In this project, S. lycopersicoides LA2951, a speck-resistant accession, was analyzed to determine differences between its genome and that of tomato, such as structural differences that may confer resistance. S. pennellii was also analyzed for comparison. Informative plots have been generated using the results of alignment programs run on these genomes, to visually illustrate structural differences such as inversions and deletions that occur between these three genomes. Furthermore, primers were designed and used in polymerase chain reaction (PCR) and gel electrophoresis to make progress on mapping the resistance locus in two S. lycopersicoides introgression lines. These introgression lines both contain large segments of the region of interest in S. lycopersicoides on chromosome 4.

The future goals of this project are to narrow down the large resistance locus to a few genes that can later be isolated and bred into the tomato line. This research could lead to a greater understanding of plant-pathogen interactions and serve as a model for plant resistance.

My Experience:

I really enjoyed my internship at BTI. Coming into the program, I had minimal knowledge in the field of bioinformatics. As the program progressed, I gained invaluable experience and exposure in both the field of bioinformatics and in plant laboratory research. I was fortunate to be able to experience both, and both aspects of my project were extremely helpful and further enriched my knowledge of how research is conducted. I would like to thank my mentors, Suzy Strickler, Adrian Powell and Sammy Mainiero, for helping me with my research project and also introducing me to the world of scientific research. This internship has furthered my interest and curiosity in discovery-driven science, and I am now considering pursuing biology or a related field in college.

Intern Info

School Ithaca High School

Faculty Advisor Lukas Mueller

Year 2017

2017

Alexander Ivanov

“BrAPI connecting the world’s plant breeding data through a mobile application”

Project Summary:

Breeding technology is advancing very rapidly, and it is important to stay connected between databases to share information faster. It is also important that researchers stay connected so that information can be revised and improved. My application does just that.

My application is built in Android Studio to allow the user to search through different databases at brapi.org. First, the app checks that the user has an internet connection; if not, the user is notified immediately. Next, I added intents, abstract descriptions of an operation to be performed, to transition from one page to the next whenever the user chooses a database or category before viewing the data text file. If the user does not see the information needed, the page size can be changed by clicking on the text box below to get more results, and the information needed can be specified on the specifications page.
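The page-size control described above maps onto the paging parameters that BrAPI calls accept. The sketch below is in Python for illustration only (the app itself was an Android application) and shows how such a request URL might be assembled. The function name build_brapi_url, the example.org base URL, and the germplasm call are hypothetical placeholders; the query parameters page and pageSize follow BrAPI's paging convention, and zero-based page numbering is assumed.

```python
from urllib.parse import urlencode


def build_brapi_url(base_url, call, page=0, page_size=20):
    """Assemble a paged BrAPI request URL (page is assumed zero-based)."""
    if page < 0:
        raise ValueError("page must be 0 or greater")
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    query = urlencode({"page": page, "pageSize": page_size})
    return f"{base_url.rstrip('/')}/{call.strip('/')}?{query}"


# Asking for a larger page returns more results per request.
EXAMPLE_URL = build_brapi_url(
    "https://example.org/brapi/v1/", "germplasm", page=0, page_size=50
)
```

Raising page_size is the programmatic equivalent of the user enlarging the page size in the app to see more results at once.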

After completion, I learned how to build a simple Android application, how two activities are connected, how to check for or establish an internet connection, and how to design stylish pages. My mentor and I created an application that allows any user who is logged in to brapi.org to connect to and query the available plant breeding databases much faster than with standard search methods. In the near future, I am planning on releasing the beta version of this free application on the Google Play store for all users.

My Experience:

I gained many experiences from this internship that I realize are necessary for me to start a computer programming career. This internship helped me understand that not everything can be solved right away, and that it often requires communicating with others to discuss problems that need to be resolved immediately.

Before entering this program, I never fully understood the purpose of programming, and often failed to seek help from others. Now that I have created an app that has practical applications, I understand that programming requires dedication, and there is a plethora of unknown surprises when creating an app, so asking for help is just a matter of time. I want to thank my mentor, Nicolas Morales, for always helping me find ways of solving complex problems with simple code, and for helping me understand that trying to resolve everything on my own is not always the best option.

Intern Info

School Thomas S. Wootton High School

Faculty Advisor Lukas Mueller

Year 2017

2017

Anna Yaschenko

“Characterizing long non-coding RNA in the Asian Citrus Psyllid through genome annotation and molecular biology”

Project Summary:

Huanglongbing (HLB), or citrus greening disease, is jeopardizing the citrus industry around the world. HLB is an incurable disease of citrus that results in yield loss, decline of tree health and ultimately death of the plant. It is associated with the pathogen Candidatus Liberibacter asiaticus (CLas), which is spread by Diaphorina citri, or the Asian citrus psyllid (ACP), when psyllids harboring CLas feed on healthy citrus. It is important to note that in order to be effective vectors of CLas, psyllids must acquire CLas as nymphs, the juvenile life stage of the psyllid. My project focuses on interactions between CLas and ACP nymphs, specifically investigating the role of long noncoding RNA (lncRNA). Recent work by the Heck lab identified 83 differentially expressed lncRNAs in the gut of adult ACPs, when exposed to CLas as nymphs, through multi-omics. By identifying and characterizing these lncRNA, we can better understand the relationship between the vector ACP and the pathogen CLas. I identified lncRNA in the ACP genome using multiple bioinformatics pipelines and validated the presence of lncRNA in nymphs with reverse transcription polymerase chain reaction (RT-PCR) and cloning methods. Specific lncRNA found in the adult ACP gut were confirmed to be present in nymphs. Interestingly, expression seemed to vary based on host plant. The end goal of this research is to identify the role of lncRNA in CLas transmission, which will hopefully lead to the development of methods that slow or even stop the spread of HLB.

My Experience:

My experience at the Boyce Thompson Institute has been phenomenal. From patient, understanding mentors to a friendly work environment, BTI has allowed me to grow as a scientist and gain skills I had never been exposed to before. Though this summer was challenging, I was able to overcome the difficulties I faced in my research. It would not have been possible without my wonderful mentors, Angela Kruse, Surya Saha, and Prashant Hosmani, and the endless support provided by both the Mueller and Heck Labs. Without them, I would have never considered plant genetics as a possible career track. The lab and technical skills I learned at BTI will continue to aid me as a researcher as I embark upon the adventure of my career. I encourage anyone who is considering a career in research to apply to this program and immerse themselves in the inspiring community that is BTI.

Intern Info

School University of Maryland, Baltimore County

Faculty Advisor Lukas Mueller

Year 2017

2017

Kyndra Zacherl

“Pedigree Verification in Cassava through Analysis of Single Nucleotide Polymorphisms”

Project Summary:

Manihot esculenta, more commonly known as cassava, is a tropical root crop that serves as the primary food source for 500 million people around the world, particularly in Sub-Saharan Africa. Efforts to improve cassava breeding are tracked using CassavaBase (cassavabase.org), a public database developed by the Mueller Lab as part of the NEXTGEN Cassava Breeding project. CassavaBase allows breeders to store their data in a free and open format. This data includes the pedigrees of thousands of breeding lines. In this project, a pedigree verification tool based on genetic similarity was developed in Perl and implemented in CassavaBase. This tool examines genetic data, analyzes a select set of single nucleotide polymorphisms (SNPs) from the genotypes of the parents and child identified in a pedigree, and determines whether the given combination is possible (e.g. both parents having two copies of an allele and it being absent in the child would be impossible). For lines that do not appear to be a genetic match with their documented parents, it is then possible to search for the true parents through genetic comparison against a larger population of potential parents.
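The compatibility check described above can be illustrated with a short sketch. This is in Python rather than the Perl used for the actual tool, and it makes several assumptions that are not stated in the abstract: diploid inheritance, biallelic SNPs coded as allele dosage (0, 1, or 2 copies), and missing calls represented as None. The function names are hypothetical.

```python
# Doses a parent can pass on in one gamete, keyed by the parent's dosage.
# Assumes diploid inheritance: one allele copy is transmitted per parent.
GAMETE_DOSES = {0: {0}, 1: {0, 1}, 2: {1}}


def child_is_possible(mother, father, child):
    """Return True if the child's dosage can arise from these two parents."""
    return any(
        m + f == child
        for m in GAMETE_DOSES[mother]
        for f in GAMETE_DOSES[father]
    )


def conflict_rate(mother, father, child):
    """Fraction of fully genotyped SNPs where the trio is impossible."""
    scored = 0
    conflicts = 0
    for m, f, c in zip(mother, father, child):
        if m is None or f is None or c is None:
            continue  # skip markers with a missing call
        scored += 1
        if not child_is_possible(m, f, c):
            conflicts += 1
    return conflicts / scored if scored else 0.0
```

The example from the text, where both parents carry two copies of an allele that is absent in the child, corresponds to `child_is_possible(2, 2, 0)`, which returns False. In practice a small nonzero conflict rate would be expected from genotyping error alone, so any threshold for flagging a pedigree is a separate design decision that this sketch does not address.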

Breeding higher-yielding and more resilient crops will be critical as land area available for agriculture shrinks and populations increase. Studies have shown that cassava is one of the only staple crops that may resist or even benefit from climate change, an increasingly important concern as global temperatures rise. By ensuring the accuracy of pedigree records in CassavaBase, this tool can contribute to worldwide cassava breeding efforts, including the goals of the overarching NEXTGEN project: shortening the breeding cycle, improving yield, increasing genetic diversity, and increasing the exchange of cassava breeding information.

My Experience:

My internship at Boyce Thompson Institute has given me the opportunity to collaborate with extremely talented bioinformaticians and researchers from around the world. I have gained experience in a considerable number of areas, including using the Linux command line, writing and running scripts for data analysis in R, accessing and managing databases, constructing packages, controllers, and modules in Perl, designing websites in HTML and JavaScript, using virtual machines for development, and understanding the importance of file management and backups. Much of my work required collaboration with lab members who possess unique skill sets, and having to communicate with them clearly and effectively to obtain the information that I needed was a valuable experience. This internship experience has shown me that I work very well in a research setting, and in the future I would consider returning to biological research as a potential career field.

Intern Info

School Gannon University

Faculty Advisor Lukas Mueller

Year 2017

2016

Danielle Dixon

2016

Elucidating genetic variation in ‘Candidatus Liberibacter asiaticus’ transmission between Asian citrus psyllid isofemale lines

Project Summary

Citrus greening disease, or Huanglongbing (HLB), is the most devastating citrus disease worldwide. HLB renders citrus inconsumable through a reduction in fruit quality, a decrease in overall fruit production, and the eventual death of the citrus tree. The disease is associated with the bacterium Candidatus Liberibacter asiaticus (CLas), which infects citrus hosts after being acquired and transmitted by the Asian citrus psyllid (ACP). To be transmitted successfully, CLas must infect the ACP by traveling through its saliva, gut wall, and blood and then returning to the saliva. It has been found that the acquisition and transmission rates of CLas are variable. Much of the work in citrus greening is focused on the development of improved early detection methods for the disease by understanding both of these factors. My project focuses on improving the current psyllid genome through manual curation, as well as helping characterize transmission rates among different ACP isofemale lines via quantitative polymerase chain reaction (qPCR). Gene families that are involved in psyllid immunity have been identified and are being curated with the objective of understanding how they contribute to CLas infection of the psyllid. Meanwhile, qPCR allows us to quantify the amount of CLas present in citrus leaves after ACP feeding, from which we can then analyze the transmission rates. The ability to connect transmission rates with different psyllid lines may allow for further experiments to reveal the connection between phenotypic and genotypic variation that contributes to the spread of CLas. Ultimately, our goal is to slow the spread of the disease and stop losses in the citrus industry.

My Experience

My summer at the Boyce Thompson Institute and Cornell University was wonderful. I had the opportunity to explore another scientific method by gaining computational skills in the field of bioinformatics. Before this summer I was not attuned to the connections between molecular biology techniques and bioinformatics, such as how protein expression analysis may contribute to the structural change of a gene model. I am immensely grateful for the caring and open community that I experienced while interning at BTI. Working individually with my mentors and collaboratively with members of the Cilia and Mueller labs, I felt supported and capable of success while working toward our goal. My overall experience was equally challenging and rewarding. Additionally, I was privy to various perspectives about the different paths I can pursue as I continue on in the field of scientific research and eventually toward my Ph.D.

Intern Info

School University of Puget Sound

Faculty Advisor Lukas Mueller

Year 2016

2015

Allison Izsak

2015

Gene discovery, annotation and orthology in the Asian citrus psyllid genome

Project summary

The Asian citrus psyllid, Diaphorina citri, is the vector host to the citrus industry’s most threatening bacterial pathogen, Candidatus Liberibacter asiaticus. This bacterium is the causative agent of citrus greening disease and has cost the citrus industry more than $4 billion in revenue loss. The focus of my project this summer was to help find a solution to this devastating disease by working with the psyllid genome and looking for genes that might be involved in pathogen transmission or pathogen survival.

Not much is known about the genetics and genomics of the psyllid, so a literature search of related vector systems was conducted and a candidate gene list highlighting genes that play a role in immunity and gut-microbe homeostasis was compiled. The candidate genes that were successfully identified in the psyllid via BLAST were then manually annotated based on predicted gene models in Web Apollo. Additionally, an OrthoMCL analysis was done using the proteomes of 8 related hemipterans, including the psyllid, to identify conserved hemipteran proteins as well as proteins that are common to all sequenced hemipterans, but missing in the psyllid. This helped to evaluate the completeness of the psyllid genome assembly.
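The completeness check at the end of this paragraph comes down to set comparisons over ortholog groups. The sketch below assumes the OrthoMCL output has already been reduced to a mapping from group identifier to the set of species with a member in that group; that data layout, the group identifiers, and the function names are hypothetical and chosen only for illustration.

```python
def conserved_groups(groups, species):
    """Ortholog groups that contain at least one protein from every species."""
    wanted = set(species)
    return sorted(g for g, present in groups.items() if wanted <= present)


def missing_in_target(groups, target, others):
    """Groups found in all other species but absent from the target species."""
    wanted = set(others)
    return sorted(
        g for g, present in groups.items()
        if target not in present and wanted <= present
    )
```

Groups returned by `missing_in_target` for the psyllid are candidates for genes that the assembly or annotation may have missed, which is how such a comparison can speak to genome completeness.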

The results from this project will enable other researchers working on this problem to focus their efforts on finding ways to inhibit proteins/genes that have been identified and manually annotated in the psyllid and, therefore, help find a solution to citrus greening faster.

My Experience

Working at BTI this summer was great. As someone who had never done research prior to this internship, I was excited and amazed to find out that the project I would be working on was of such high importance. To be given real responsibility and treated as an important member of the team was extremely rewarding. Everyone I worked with wanted me to succeed and gave me all of the support I needed to be able to do so. I really enjoyed being in an environment where learning about different aspects of plant biology, and science in general, was encouraged. Also, at the beginning of the summer I was totally new to bioinformatics research, but now I’m leaving with a skill set that I can use in many different fields. I feel lucky to have been exposed to so many areas of research and am eager to continue exploring plant science in future studies.

Intern Info

School Cornell University

Faculty Advisor Lukas Mueller

Year 2015

2015

Ivana Rodriguez

2015

Analysis and characterization of conserved non-coding sequences in species of the Legume Family (Fabaceae)

Project Summary

A character of considerable evolutionary and ecological significance is nodulation, the symbiotic fixation of atmospheric nitrogen by soil bacteria housed in specialized structures (nodules) of various angiosperms, particularly the diverse legume family. Conserved non-coding sequences (CNSs) are regions of DNA in close proximity to genes, and they serve regulatory purposes not yet fully understood in gene function. Previous work has demonstrated the mechanisms by which gene regulation provides crucial contributions and influences evolutionary change among species. Analysis of these highly conserved sequences may provide a novel way by which to track evolutionary changes and relationships among Leguminosae species on a genomics scale. More precisely, conserved non-coding sequences have great potential to supply novel evolutionary insights and may answer the question of whether these economically crucial nitrogen-fixing species have acquired their traits through a common ancestor, or through independent (convergent) means.

This project focused on assembling a pipeline by which to streamline identification of conserved sequences using whole genome data for the legume species. The location of these sequences was determined relative to coding regions to gain insight into the classes of genes the CNSs may be regulating. The identified sequences were further characterized by defining over-represented motifs, which were then queried against databases of transcription factor binding sites to obtain putative functions. Once we can deduce CNS sequences that are associated with nodulation-specific functions, additional hypotheses concerning the origins and evolution of nodulation can be deduced. Ultimately, this bioinformatics approach will serve to complement progress towards discovering evolutionary origins of nodulation.

My Experience

My internship at Boyce Thompson helped solidify my passion for discovery and research in plant genetics and genomics. My time spent at Cornell this summer helped me fulfill new goals I never thought I’d have the confidence to reach. Prior to the internship, I had almost no experience with computational biology, but with the help of my mentors, I’ve become well-oriented with bioinformatic data analysis. I was able to reach new heights in my understanding of what it means to be a scientist, and for me, that is an exhilarating and valuable feeling. I am immensely thankful to everyone in Dr. Mueller’s lab, as well as Dr. Suzy Strickler and Dr. Doyle for their endless support, faith and patience.

Intern Info

School New Mexico State University

Faculty Advisor Lukas Mueller

Year 2015

2015

Jonathan Gomes Selman

2015

Extending CRISPR design software features for tomato protein kinase silencing

Project Summary

The discovery of the CRISPR/Cas9 method in 2012 represents a revolutionary advance in genome editing. The system relies on two main functional components: a Cas9 protein that cuts the targeted DNA strand and an sgRNA (single guide RNA) that guides the protein to specific locations in the genome. One of the primary challenges in implementing CRISPR systems is designing optimal guide RNAs with minimal off-target matches. Several algorithms have been developed to score guide RNA effectiveness using a variety of experimentally determined factors. My research has extended the scope of currently published CRISPR tools, such as CCTop and CRISPR-P. With the added functionality, researchers will have greater specificity when selecting sgRNAs. Researchers will be able to analyze guide RNA positions within a gene, giving preference to RNAs near the 5’ end, and view whether they target specific gene domains for increased desirability. Additionally, researchers will have the option to design sequences intended to target multiple genes within a family. By extending guide RNAs to target several genes, scientists studying multiple related genes or genes from the same family will more easily be able to experiment with simultaneously silencing several genes. Utilizing these extended techniques, I have designed multiple guide sequences for the Receptor Like Cytoplasmic Kinase (RLCK) gene family, a family that plays an important role in plant immune responses. Overall, extending the functionality of current CRISPR tools allows more specialized guide sequences to be obtained, giving scientists a greater range of experiments when researching genes through genetic manipulation.
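Two of the features described above, preferring guides near the 5’ end and finding guides shared by several genes in a family, can be illustrated with a minimal sketch. It is not the scoring used by CCTop, CRISPR-P, or the extended tool; it assumes 20-nucleotide guides followed by an NGG PAM, scans the forward strand only, ignores off-target matches, and uses a made-up linear position score.

```python
import re


def find_guides(gene_seq, guide_len=20):
    """List (start, guide, pam) for each forward-strand site followed by NGG."""
    seq = gene_seq.upper()
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def rank_by_five_prime(guides, gene_len):
    """Order guides so those nearest the 5' end come first (score 1.0 at position 0)."""
    scored = [(1.0 - start / gene_len, start, guide, pam)
              for start, guide, pam in guides]
    return sorted(scored, reverse=True)


def shared_guides(gene_seqs):
    """Guide sequences that occur, with a PAM, in every gene of a family."""
    sets = [{guide for _, guide, _ in find_guides(s)} for s in gene_seqs]
    return set.intersection(*sets) if sets else set()
```

Exact matching in `shared_guides` is the strictest possible definition of a multi-gene guide; a real design tool would more likely tolerate a small number of mismatches.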

My Experience

Working as an intern at the Boyce Thompson Institute (BTI) has been a truly invaluable experience. Although I started the internship with a minimal background in bioinformatics and plant biology, through the amazing guidance of my mentor Dr. Noe Fernandez, I have gained an immense amount of knowledge about these areas. Every day presented an array of different challenges, teaching me the rigor and dedication needed to conduct research. My experience over the last six weeks has shown me the excitement of being in the pursuit of discovery, and has opened my eyes to the possibility of continuing to conduct research in the future. The opportunity to be imaginative, creative, and to think critically about a problem strongly appeals to my interest as an individual. I would like to extend my sincere gratitude to Tiffany Fleming, Nicole Waters Fisher, my mentor Dr. Noe Fernandez, Professor Lukas Mueller, and the entire Mueller lab for making my summer internship at BTI such an amazing and memorable experience.

Intern Info

School Ithaca High School

Faculty Advisor Lukas Mueller

Year 2015

2014

Angela Zhang

2014

Transcriptome characterization and evolution in cultivated and wild tomato species

Project Summary

Solanum lycopersicum, more commonly known as the tomato, is one of our most important agricultural crops. Not only is the bright red and fleshy fruit of the tomato a good source of vitamins and antioxidants, it also provides an important model system for fruit development. Despite its importance in agriculture, however, Solanum lycopersicum suffers from a severe lack of genetic variation. Due to a combination of bottleneck effects and centuries of inbreeding, individual tomato plants are nearly genetically identical to each other. As a consequence of the decreased variation, this cultivated plant becomes much more susceptible to disease and changes in climate. Wild tomato species, on the other hand, are rich in genetic variability and contain many adaptations to less-than-ideal conditions, such as arid climates and high altitude. Such species may be vital in breeding programs to create a more evolutionarily fit tomato.

This summer we assembled the transcriptomes of the wild tomato species Solanum peruvianum and Solanum pimpinellifolium and annotated the genomes of Solanum chilense, Solanum habrochaites, and Solanum pennellii with tools such as Trinity and Maker, respectively. We also completed de novo genome assembly of Solanum chilense, a species more heterozygous than tomato, with Platanus. With the data from these wild tomato species, we constructed a phylogeny using programs such as ClustalX, Dnaml, and PAML. While tomato phylogeny trees had already been constructed in previous efforts, they were based on a much smaller sample size than we have available to us now. Using a larger sample size can help resolve phylogenies for wild species and provide greater insight into the evolutionary divergence of these tomato species. Overall, the work we completed offered interesting insights into the wild tomato species and will aid further research in improving the tomato crop.

My Experience

My summer at Boyce Thompson Institute and in Lukas Mueller’s lab has allowed my bioinformatics skills to grow exponentially. Not only am I much more comfortable with UNIX systems and a multitude of bioinformatics tools, I have gained a better appreciation for the broad potential of big data in supporting the growth of plant science research. Outside of bioinformatics, I was able to witness the diversity of plant research conducted by some of the best scientists in their field. I leave the summer much more experienced and am ready to apply my newly learned skills to my continuing immersion in bioinformatics.

Intern Info

Faculty Advisor Lukas Mueller

Year 2014

2014

Javon Mullings

2014

Multi gene analysis tool for Virus Induced Gene Silencing

Project Summary

Virus Induced Gene Silencing (VIGS) is a very useful method of studying gene function in plants. For this method to be effectively employed, the target gene fragment introduced into the virus vector has to have a specific sequence. Identifying that specific sequence will allow for the silencing of target genes while influencing as few off-target genes as possible. To aid in this effort, Sol Genomics Network (SGN) created the SGN VIGS Tool to help researchers design VIGS constructs with a user-friendly and highly customizable web tool. The original version of the tool only worked with one sequence at a time, yet researchers could require the silencing of hundreds of genes to study particular gene functions in large screening experiments. With this in mind, a new algorithm was developed to improve the tool to accept multiple sequences at a time. Implementation of the algorithm was done on a standalone version of the SGN VIGS Tool, incorporating several programming languages and software packages. The result of this algorithm is a new feature on the SGN VIGS Tool, Bulk, that will accept a file of multiple FASTA sequences and return the appropriate construct sequences along with target and off-target gene information. The user also has the option to upload expression data that can accompany the results of their target and off-target gene information. Future improvements will involve the addition of Primer3 to further aid researchers in making multi-gene virus constructs more efficiently.
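The step from one sequence at a time to a whole file can be pictured as parsing a multi-FASTA input and applying the single-sequence design to each record. The sketch below is in Python rather than the tool's own Perl, and it is not the algorithm developed for the Bulk feature: `design_one` is a hypothetical stand-in for whatever single-sequence routine is used, and the function names are assumptions.

```python
def read_fasta(text):
    """Parse multi-FASTA text into a dict of {sequence id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first word of the header as the id; fall back if it is empty.
            name = header[0] if header else "unnamed_%d" % (len(records) + 1)
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def run_bulk(fasta_text, design_one):
    """Apply a single-sequence design function to every record in the file."""
    return {name: design_one(seq) for name, seq in read_fasta(fasta_text).items()}
```

Passing `len` as `design_one` is enough to confirm that each record is read and processed independently before a real design routine is plugged in.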

My Experience

In creating the VIGS Tool Bulk, I was exposed to new programming languages and software such as Perl, Catalyst, and the command-line interface. Often my code would return errors or I would encounter new processes and need to do in-depth research online to fix the issue. Consequently, my knowledge of and proficiency in those languages and software have increased significantly. I now also have extensive experience with web tool development. The members of the Mueller lab were great resources all summer, especially my mentor, Noe Fernandez. With his guidance, I was able to gain a greater understanding of the relationship between biology and computer science, and of how to code more efficiently, something that will be invaluable if I pursue a career in bioinformatics.

Intern Info

School Wheaton College

Faculty Advisor Lukas Mueller

Year 2014

2013

Amelia Lovelace

2013

Diversity analysis of wild tomato species in the Lycopersicum clade using transcriptomes

The genetic diversity of S. lycopersicum, the domesticated tomato, has been drastically reduced by bottlenecks during domestication, and as a result, useful allele diversity has been lost from the gene pool. Fortunately, wild tomato species have high genetic variation and thus have been utilized for restoration of the gene diversity in cultivated tomatoes. However, in order to fully understand the domesticated tomato’s genetic potential, the diversity of the wild tomato species must be further analyzed. RNA-seq data from the SRA and Next Generation Sequencing on various wild tomato tissue samples were utilized for analyzing genetic diversity. Cleaned paired-end Illumina reads from S. arcanum, S. peruvianum, S. pimpinellifolium, S. corneliomulleri, S. chilense, and S. pennellii were mapped to the Heinz genome using the reference-based assembly program TopHat2. The accepted hits files for each accession were then merged using Samtools. In order to analyze the coverage for the wild species data from various tissues in reference to the Heinz genome, Bedtools was utilized to get the number of reads for each Heinz gene that were found in a wild tomato species. Because the coverage of these samples was sufficient, a consensus sequence was generated based on the read mapping using Samtools. The consensus sequence for each species can then be compared using ClustalW to generate a phylogeny tree. A manual was written to facilitate the analysis of future data as more wild tomato samples are sent in for Next Generation Sequencing.

My Experience

This internship has helped me learn a great deal about my research interests as well as what it means to conduct bioinformatics research. I was initially overwhelmed by the amount of catching up I had to do in order to complete my project, but looking back, I am surprised by the amount that I have accomplished in this short ten-week internship. Not only am I more familiar with bioinformatics software and programming, but I now have a better idea of what I want to study in graduate school. Although this internship has helped me realize how much I miss working in the greenhouse and wet lab, the skills I have developed in this bioinformatics program will be applicable to my future research in plant genetics at graduate school. Most importantly, this internship has fueled my passion for plant research in that it has revealed many interesting research topics.

Intern Info

School Hood College

Faculty Advisor Lukas Mueller

Year 2013

2013

Matthew Crum

2013

Genome assembly and analysis of wild tomato for markers associated with Tomato Yellow Leaf Curl Virus

The tomato is one of the most important fruit crops, and humans have been domesticating it for hundreds of years in order to maximize the crop’s yield. Through the use of selective breeding, tomatoes have been able to produce much greater yields, but this process has also reduced the genetic diversity of domesticated tomatoes and limited their ability to adapt and fight diseases. Scientists and breeders have been trying to battle this problem through introgression into the domestic tomato of new and diverse genes and alleles from wild tomato species, which still maintain their genetic diversity. By taking advantage of the disease resistance of wild-type tomatoes, scientists have been able to transfer resistances to certain diseases into domestic tomatoes. In order to identify which genes or alleles the domestic tomato needs in order to resist a disease, the genetic basis for the wild tomatoes’ resistance must first be identified. Using Next-Generation-Sequencing combined with bioinformatics analysis tools, it has become possible to sequence genomes and compare them much more efficiently. Using Bowtie2 (a tool which aligns Next-Gen-Sequence reads to a reference genome), samtools (a set of tools used to manipulate Next-Gen-Sequence reads), and Gbrowse (a genome visualization tool), we have been able to map and analyze multiple wild tomato accessions, including both resistant and susceptible inbreds, and compare their genomes in order to find loci which may contribute to resistance to Tomato Yellow Leaf Curl Virus.

My Experience

This experience has helped me grow in areas which span the purely scientific and academic to the development of interpersonal skills necessary to work in team settings. I have been more thoroughly introduced to using the Linux environment, improved my Perl scripting skills, and been taught to use various bioinformatics tools such as Bowtie2 and samtools. This internship gave me a chance to observe firsthand the strength of applied bioinformatics in resolving biological problems that would have required far more resources otherwise. Perhaps most importantly, my experience with my mentor, Naama Menda, has taught me the skills necessary to work in a research setting. The freedom I was allowed in working out the kinks in my project helped me develop self-reliance, while Naama’s assistance when I became stuck demonstrated the value, and often necessity, of asking for help and collaborating with others to accomplish a common goal.

Intern Info

School Ramapo College

Faculty Advisor Lukas Mueller

Year 2013

2013

Saji Akhil

2013

Exploring the benefits of distributed parallel computing on large biological datasets

The scientific community is accumulating biological data at an exponential rate. Next generation sequencing technologies easily accumulate many terabytes of genomic data per week. Investigators require new methods to efficiently query and analyze data of this scale. Apache Hadoop is open source software designed to replicate the features of Google MapReduce and Google File System (GFS). The basic principle behind the MapReduce paradigm is to dissect a computationally demanding task into smaller sub-tasks that can be distributed amongst a cluster of nodes. The Hadoop Distributed File System (HDFS) is the second major function of the Hadoop package. HDFS is designed to be an extremely redundant and expansive storage medium that can serve a viable purpose in a scientific setting where data parity and large data quantities are common. Our preliminary investigation of utilizing Hadoop in a scientific setting involved two key steps. First, a thorough study of the benefits of distributed parallel computing vs. linear computing on relevant datasets was required. This encompassed both benchmarking the physical hardware and testing MapReduce applications and comparing the results to their linear counterparts. The results indicated that Hadoop offered significant improvements in computation time in specific cases such as filtering large datasets. Second, a MapReduce application was designed to query Genotyping-by-Sequencing (GBS) data. Normally, querying these datasets with linear processing takes extensive time and computational power; however, using parallel computing, query times were reduced by several orders of magnitude. This project will enable scientists to make novel discoveries using large biological datasets.
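The MapReduce principle mentioned above, splitting a job into a map step, a grouping step, and a reduce step, can be shown in a single process. The sketch below is plain Python and does not use the Hadoop API or reproduce the GBS query application; the record layout of (sample, sequence) pairs and the read-counting job are assumptions chosen to keep the example small.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper on every record and stream out its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into one result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def sample_mapper(record):
    """Emit one count per read, keyed by the sample it came from."""
    sample, _sequence = record
    yield sample, 1


def count_reducer(_key, values):
    return sum(values)


def count_reads_per_sample(records):
    pairs = map_phase(records, sample_mapper)
    return reduce_phase(shuffle(pairs), count_reducer)
```

On a cluster, the map calls run on different nodes over different blocks of the input and the shuffle moves data between nodes; here everything runs in order on one machine, so the sketch shows the structure of a job but none of the speedup.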

My Experience

The Bioinformatics Internship at the Boyce Thompson Institute (BTI) has been an eye-opening experience into science as a whole. Both the opportunity to attend weekly seminars and the immersive experience of working in a scientific environment every day have led me to appreciate and gain insight into the lifestyle of a scientist. I became interested in the program offered at BTI due to my innate interest in computing and biology. Throughout the past ten weeks I have thoroughly explored biology in a computational context and gained an understanding of the plethora of benefits that computation can offer to life scientists. The background and skills I have gained during this internship will be a valuable resource as I venture forward into future research endeavors during graduate school.

Intern Info

School Boston University

Faculty Advisor Lukas Mueller

Year 2013

2012

Kristin Blacklock

2012

De Novo Discovery and Comparison of Transposable Element Families in S. lycopersicum and S. pimpinellifolium

Transposable elements (TEs) are sequences of DNA capable of changing their relative position in the genome of an organism either by moving or copying themselves. Their discovery in the 1940s is credited to maize geneticist Barbara McClintock, whose suggestions of TE functionality were dismissed for decades thereafter. Recently, however, researchers have discovered several important aspects of TEs, including one unusual retrotransposon, Rider, whose activity in the SUN gene of the domesticated tomato (Solanum lycopersicum) has resulted in altered fruit morphology phenotypes. Thus, TEs may have played an important role in the speciation between the domesticated tomato and its wild ancestor, and so the identification of putative new active TE families in the S. lycopersicum genome that are absent or less abundant in the S. pimpinellifolium genome may be of particular interest for the advancement of tomato research.

This summer, I implemented a de novo transposable element discovery pipeline called the REPET Package on the tomato genome. Its two main components, TEdenovo and TEannot, are dedicated to the detection and analysis of repeats in genomic sequences, where TEdenovo returns a library of classified, non-redundant consensus sequences, and TEannot filters these results based on a similarity search with known TEs. Once obtained, the TE content of the domesticated and wild ancestor species was then compared to identify TE families with characteristics that suggest recent activity. For those TE families of interest, the presence or absence of individual elements was verified by aligning flanking sequences from the two species. The positions of TE polymorphism sites were compared to the locations of known genes to find instances of TEs that may be contributing to functional genetic variation.

My Experience

This summer internship has been an amazing experience in which I have grown both as a person and researcher. In these past weeks, I have gained not only great new friends, but also a deeper appreciation for bioinformatics and the answers we can find using computer science in conjunction with traditional biological research. I enjoyed my projects immensely, both the beginner project, which was to create a Catalyst-based web interface for Primer3 on the Sol Genomics website, and main project, which dealt with the de novo identification of transposable elements in the tomato genome. I have learned so much from my mentor and others, and now feel confident that my future career will involve a blend of computer science and biology.

Intern Info

School Chapman University

Faculty Advisor Lukas Mueller

Year 2012

2011

Dil Begum

2011

Filling of Reference Genome Gaps Using Next Generation Sequencing

From famous entrées such as spaghetti and pasta to mouthwatering salsa, tomato has a wide variety of uses in the world’s cuisine. Tomato is also very important in our diet. According to a magazine (Scott-Dixon, Krista), tomato is an antioxidant powerhouse. So where do tomatoes come from? What makes them so different in taste, color and appearance? Scientists are working day by day to improve tomato quality. Since the tomato genome has been sequenced, this information can be used to help us understand how genes work together to affect growth, development and functionality of an entire genome. However, when the tomato genome was sequenced, gaps were left as a result of the assembly process, creating areas of missing sequence. Therefore, the purpose of this study is to fill as many gaps as possible, and so reduce the number of gaps, using de novo contigs assembled from Illumina short reads. The results were produced using tools such as BWA, Novoalign, SamTools, Picard, BLAST, SOAPdenovo, and BedTools, as well as Perl scripts and some Linux regular expressions.

My Experience

The purpose of the project was to use the reference genome, together with information about where short reads map, to generate contigs that can be mapped to the reference genome, and to identify contigs that are located near gaps. The reads were obtained using Illumina next-generation sequencing technology. Different tools were used to generate contigs from the reads, such as SamTools, Picard, SOAPdenovo, BLAST, BedTools, and the Perl programming language, as well as some Linux regular expressions. After generating the contigs with SOAPdenovo, the contigs were run through BLAST, which gave information about contig ID, chromosome ID, e-values, percentage matches, etc. I was only interested in three fields: chromosome ID, contig start and contig end. With this information, I wrote a script that took the two best hits from BLAST and wrote them out in BedTools format. I also wrote a second script that generated the locations of the gaps from start to finish and also wrote them out in BedTools format. Then I used BedTools with the output from both scripts to show where each contig lined up on the chromosome, which chromosome it lined up to, and how far the contigs and gaps were from each other in terms of base pairs. For this project I chose regions less than 20 bp.
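The two scripts and the BedTools step described above can be approximated in a short sketch. This is Python rather than the original Perl, and it replaces the BedTools comparison with a direct distance calculation; the assumption that gaps appear as runs of N in the reference, the tuple layouts, and the function names are all illustrative.

```python
import re


def gaps_to_bed(chrom, sequence):
    """BED-style (chrom, start, end) for each run of N; 0-based, end-exclusive."""
    return [(chrom, m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def distance(a_start, a_end, b_start, b_end):
    """Base pairs separating two half-open intervals; 0 if they overlap or touch."""
    return max(0, b_start - a_end, a_start - b_end)


def contigs_near_gaps(hits, gaps, max_dist=20):
    """Pair each contig hit with gaps on the same chromosome closer than max_dist bp."""
    near = []
    for contig, chrom, start, end in hits:
        for g_chrom, g_start, g_end in gaps:
            if chrom == g_chrom and distance(start, end, g_start, g_end) < max_dist:
                near.append((contig, chrom, g_start, g_end))
    return near
```

The nested loop is fine for a handful of contigs but scales with the product of hits and gaps; for genome-sized inputs, sorted intervals or a dedicated interval tool are the more realistic choice.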

Intern Info

School George Mason University

Faculty Advisor Lukas Mueller

Year 2011

2011

Samuel Moijueh

2011

Implementing an RSS feed into the sol genomics website using the PERL programming language

The Sol Genomics Network (SGN) is an open-source plant genomics database where researchers and agriculturalists can exchange information. However, this database previously did not have an RSS feed available to dynamically keep SGN users aware of any updates made to the database. Thus, the objective of this project was to implement an RSS feed that would make it easier for users to collaborate and share the latest information whenever it is updated, and access aggregated content from the central repository even if they are not in the SGN database. The Perl modules used to automate the process of generating feeds were XML::Feed, Catalyst::Model::XML::Feed, and Date::Calc. Essentially, the script resides at a URL; when the user clicks the link (calls the URL), the script queries the database for loci and then automatically generates the feed. A webpage was also designed using HTML and JavaScript, among other technologies, to display all the available feeds on the Sol Genomics Network. An RSS feed will prove to be a great addition to the Sol Genomics Network.
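The feed generation step can be illustrated with a small sketch that turns a list of updated records into RSS 2.0 XML. It uses Python's standard library rather than the Perl modules named above, and the item fields, example URLs, and function name are assumptions; in the project itself the items came from a database query for loci.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Return an RSS 2.0 document as a string for the given channel and items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for entry in items:
        item = ET.SubElement(channel, "item")
        # Each entry is a dict with a title, a link, and an RFC 822 publication date.
        for tag in ("title", "link", "pubDate"):
            ET.SubElement(item, tag).text = entry[tag]
    return ET.tostring(rss, encoding="unicode")


example_feed = build_rss(
    "Locus updates",
    "https://example.org/feeds/loci",
    "Recently updated loci (hypothetical example data)",
    [{"title": "Locus A updated",
      "link": "https://example.org/locus/1",
      "pubDate": "Mon, 01 Aug 2011 12:00:00 GMT"}],
)
```

Serving the returned string from a URL with an XML content type is all a feed reader needs, which is why the script can live behind a single link.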

My Experience

This summer I worked on implementing an RSS feed on the Sol Genomics website, a plant genome database, using the Perl programming language. Today, many websites, news services, and blogs use RSS feeds as a competitive means of syndicating or distributing their content over the web. Similarly, professors in academia, scientists in research institutions, and researchers in industry and development want to stay up to date with the growing knowledge of plant genomes, and an RSS feed is a great tool for this. Thus, an RSS feed on the Sol Genomics database not only makes it easier for scientists to share information and collaborate but is also a highly marketable feature for generating revenue via RSS traffic advertising services such as Pheedo.

I am finishing my rotation at BTI with the bioinformatics interns. This experience has been helpful in getting me engaged in the field of bioinformatics. This summer I learned how to program in Perl. Specifically, I learned how to write a simple program, run it, write tests to ensure it is working properly, and debug it if necessary. I have also become more comfortable working on the Linux command line. Additionally, I learned quite a bit about the websites and tools a bioinformatician should know, such as BLAST, SQL, GitHub, RNA sequencing, and dot plots, among others. Aside from acquiring these technical skills, this summer I developed a better sense of what I would like to study in graduate school. I like how the weekly seminars exposed me to different kinds of research and to what is required to earn a PhD. I also liked working with a mentor. The cxgn work channel reminded me that there was always help available. I have enjoyed my work at BTI. I recommend that incoming interns and undergraduate scientists come with an open mind and be prepared to work.

Intern Info

School: Cornell University

Faculty Advisor: Lukas Mueller

Year: 2011


Chris Costa Simoes

Srikanth Karaikal

Benjamin Maza

Naama Menda

Christine Nyaga

Ryan Preble

Titima Tantikanjana

Breedbase software to help speed crop improvement

Wild tomato genome will benefit domesticated cousins

Tomato’s Wild Ancestor Is a Genomic Reservoir for Plant Breeders

Tomato Genome Becomes Fully Sequenced – Paving the Way for Healthier Plants

Internships

Ana Sofia Castellanos Mosquera

CRISPGET: The CRISPR Guide Evaluator Tool to support guide selection based on off-target scoring systems

Nistar Steinerman

Auditing Database Tables in Breedbase

Ariel Maroney

Arianna Kazemi

The Sweet Potato Expression Atlas

Tosin Olayinka

Genomic Analysis of NLR and PR Genes in Coffea arabica and Its Ancestral Parents

Thomas Chan

“Marker & Pedigree Hybrid Visualization Tool”

Nicolas Dufour

“Improving Genotypic Data Storage Using The Hadoop Cluster”

Alice Hu

“Analysis of Iochroma cyaneum Gene Families”

Matthew Larson

“Developing a Data Comparison Tool for Musabase and MGIS Using the BrAPI Interface”

Celine Manigbas

“Analysis of the Asclepias syriaca Genome and Gene Families”

Asha Duhan

“Identification of a genomic region in lycopersicoides associated with resistance to Pseudomonas syringae pv. tomato”

Alexander Ivanov

“BrAPI connecting the world’s plant breeding data through a mobile application”

Anna Yaschenko

“Characterizing long non-coding RNA in the Asian Citrus Psyllid through genome annotation and molecular biology”

Kyndra Zacherl

“Pedigree Verification in Cassava through Analysis of Single Nucleotide Polymorphisms”

Suzi Barboza-Pacheco

Comparing Database Management Systems in Order to Store Cassava Genetic Data

Project Summary