Genomic Analysis of NLR and PR Genes in Coffea arabica and Its Ancestral Parents
Coffee is one of the world’s most widely consumed beverages and an economically important crop for both its growers and its processors. Commercially grown coffee is usually one of two species – Coffea arabica or Coffea canephora. These two species differ in terms of flavor, favored climates, and, most importantly, disease resistance. C. arabica is the more desired of the two species due to its sweeter, less bitter taste and its more delicate flavor. However, C. canephora is easier to take care of and has superior general disease resistance.
C. arabica is an allotetraploid – a species resulting from a hybridization event between two different diploid parental species – and is the result of an ancient cross between Coffea eugenioides and the aforementioned C. canephora. This hybridization is interesting because both the allotetraploid and one of the diploid parents are widely cultivated and, of the two, it is the diploid which has superior disease resistance. This is unusual because, in general, polyploid plants tend to be more robust than their diploid ancestors as a result of increased diversity in key gene families. Using tools from comparative genomics, we can determine the potential mechanisms behind this discrepancy.
In order to do this, we looked at two gene families heavily involved in plant immunity to pests and disease – nucleotide-binding leucine-rich repeat (NLR) and pathogenesis-related (PR) genes. Using high-quality functional gene annotations of the three species as a base, NLR genes were identified using specific DNA motifs as signals and PR genes were identified using their associated InterPro IDs. In addition, each gene in C. arabica was assigned to one of two subgenomes, derived from either C. eugenioides or C. canephora. With this information, the C. arabica subgenomes were compared with their respective ancestral species on both a genome-wide and a gene-family scale, in terms of gene content, orthology, and synteny. With this, we were able to show that the C. arabica genome contains fewer NLR genes, with less diversity, than its parents.
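The InterPro-based selection and per-subgenome tally described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the gene IDs, subgenome labels, and InterPro accessions are made-up stand-ins, and the record layout is an assumption.

```python
from collections import Counter

# Hypothetical annotation records: (gene_id, subgenome, InterPro accessions).
# All values here are illustrative and are not taken from the study.
ANNOTATIONS = [
    ("gene0001", "canephora", {"IPR001938", "IPR002182"}),
    ("gene0002", "eugenioides", {"IPR014044"}),
    ("gene0003", "canephora", {"IPR000719"}),
    ("gene0004", "eugenioides", {"IPR001938"}),
]

# Assumed list of accessions treated as PR-associated (for illustration only).
PR_INTERPRO_IDS = {"IPR001938", "IPR014044"}


def count_pr_genes(annotations, pr_ids):
    """Count genes per subgenome that carry at least one PR-associated ID."""
    counts = Counter()
    for _gene_id, subgenome, interpro_ids in annotations:
        if interpro_ids & pr_ids:  # non-empty intersection means a PR hit
            counts[subgenome] += 1
    return counts


pr_counts = count_pr_genes(ANNOTATIONS, PR_INTERPRO_IDS)
print(dict(pr_counts))
```

The same tally run on each diploid parent would give the counts against which a subgenome is compared.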
Finally, expression data from two varieties of C. arabica were compared: one that is resistant to coffee berry disease (Catimor) and one that is not (Caturra). These were compared across three different time points and two conditions (infected and non-infected), and genes that were differentially expressed between the susceptible and resistant varieties were identified. In addition, genes were clustered based on similar changes in expression over time and graphed together using the TCseq Bioconductor package. For both the time series and differential expression analyses, gene ontology enrichment was performed using the topGO package in R and used to better interpret the data.
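To make the idea behind gene ontology enrichment concrete, the sketch below computes a classic one-sided hypergeometric (Fisher-style) p-value for a single GO term. It illustrates the general statistic only; it is not the topGO implementation, it may differ from the algorithm and settings used in this analysis, and the function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_term, n_selected, n_overlap):
    """Probability of seeing at least n_overlap term-annotated genes
    among n_selected genes drawn from a universe of n_universe genes,
    of which n_term carry the GO term (hypergeometric upper tail)."""
    upper = min(n_term, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 1000 annotated genes, 40 carry the term,
# 50 are differentially expressed, and 8 of those carry the term.
p = enrichment_p_value(1000, 40, 50, 8)
print(f"p = {p:.3g}")
```

In practice each GO term is tested in this way and the resulting p-values are corrected for multiple testing before interpretation.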
This summer has been an interesting experience both inside and outside of the lab. While I do have experience working with next-generation sequencing (NGS) data, I have not studied an organism with the same breadth and depth that I did this summer with the three Coffea species. Linking different experiments and statistics to a certain hypothesis was both the most frustrating and the most satisfying part of my time here. The confusion coming from an unexpected figure and the satisfaction coming from producing elegant results both helped me grow and become a better scientist.
I was also lucky enough to be surrounded by a cohort of students, professors, and researchers who were passionate about a wide range of different topics – both inside and outside the realm of science. These were people who, when asked, were eager to help me and others out. Hopefully, I will be able to do the same for others in the near and far future.