GenBank Update
Solgenomics has been working on sequencing the tomato genome for some time. The resulting reference genome was submitted to GenBank so that other researchers can access it. GenBank is an online public repository of genome sequences run by the National Institutes of Health (NIH) and used by labs worldwide. Because of new advances and changing data, the tomato genome in GenBank was no longer up to date. Updating it required three scripts. First, the file containing the information for GenBank submission was in the wrong format. A script already existed to reformat this data, but it did not keep orientation information for the contigs (pieces of sequenced DNA), so the existing script was edited to preserve it. The next step was to make the reference genome as accurate as possible by integrating other data types. The current reference genome was created with Next Generation Sequencing, a method that cuts DNA into small pieces and sequences only the ends of those pieces. Additional data was available from Bacterial Artificial Chromosomes (BACs), which are produced with an E. coli-based sequencing technique. A second, new script was written to select the most accurate data from Basic Local Alignment Search Tool (BLAST) output, and a third, also new, was written to integrate this information into the current reference genome.
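The selection step can be illustrated with a short sketch. The project scripts were written in Perl and are not reproduced here; the Python below is only an assumed approximation of the idea. It assumes the BLAST output is in the standard 12-column tabular format, that "most accurate" means the highest bit score among hits that pass identity and length cutoffs, and that the cutoff values, the function names, and the BAC and chromosome names are all hypothetical. Orientation is read from the subject coordinates, which BLAST reports in descending order for minus-strand nucleotide hits.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """One row of BLAST tabular output (only the columns used here)."""
    query: str
    subject: str
    identity: float
    length: int
    s_start: int
    s_end: int
    bitscore: float

    @property
    def strand(self) -> str:
        # Subject start greater than subject end marks a minus-strand hit.
        return "+" if self.s_start <= self.s_end else "-"


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLAST rows, skipping blank and comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[8]), int(f[9]), float(f[11]))


def best_hits(hits: Iterable[Hit],
              min_identity: float = 99.0,
              min_length: int = 1000) -> Dict[str, Hit]:
    """Keep the highest-scoring hit per query that passes both cutoffs.

    The cutoff values are illustrative assumptions, not the project's.
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.identity < min_identity or hit.length < min_length:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


# Made-up example rows: two hits for one BAC, one minus-strand hit for another.
example = [
    "BAC_001\tchr01\t99.80\t5000\t10\t0\t1\t5000\t120000\t124999\t0.0\t9100",
    "BAC_001\tchr05\t92.10\t800\t60\t3\t1\t800\t5000\t5799\t1e-150\t1000",
    "BAC_002\tchr03\t99.50\t3000\t15\t0\t1\t3000\t90000\t87001\t0.0\t5400",
]
selected = best_hits(parse_hits(example))
for name, hit in sorted(selected.items()):
    print(name, hit.subject, hit.strand, hit.bitscore)
```

Keeping the strand alongside each selected hit mirrors the concern with contig orientation above: a later integration step would need to know whether a BAC sequence has to be reverse-complemented before it is placed in the reference.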
My Experience
This summer I learned many skills related to bioinformatics. I began the summer by learning a new computer language, Perl. Because I had some programming experience, I was able to pick it up fairly quickly. My project was very exciting because I was creating scripts to do something that had never been done before: integrating BAC data into current genome information. I was lucky enough to have two mentors this summer, and both were very helpful. They checked in on me regularly and helped me solve the problems and errors I ran into. I learned that it is possible to do programming and still have a link to the lab. Many of the interns I met this summer were working in labs, creating data similar to what I was working with. I was able to improve my computer programming skills while learning about the most recent biological techniques. The most important thing I learned this summer is that I would love to pursue a career in bioinformatics.