Intern 2023 BCBC Bioinformatics Course
About the Course
We are living in an era of massive data, and science is no exception. New sequencing technologies are filling hard disks with terabytes of information: billions of sequences that need to be analyzed properly. Sequence data is not the only thing growing. Gene expression levels and metabolite concentrations are now measured by the hundreds or thousands, which makes it difficult, if not impossible, to use familiar tools such as Excel. In this new world, bioinformatics skills are needed not only by computational biologists but also by biologists and biochemists who find themselves analyzing many genes, proteins, or metabolites at the same time.
With this perspective, we aim to build bioinformatics skills within the postdoc community at Boyce Thompson Institute through this course. We try to keep it as simple as we can, with just one idea: “Show useful tools to resolve common problems found during *omics data analysis.” For example: if I have two lists of hundreds of genes, how can I combine them and find the genes they have in common? How can I analyze GO terms for my over-expressed genes? How can I download a chromosome region using JBrowse? There are dozens of such examples.
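The first example above (combining two gene lists and finding the genes they share) comes down to a set union and a set intersection. A minimal Python sketch, assuming each list holds one gene ID per line; the function names and the gene IDs are hypothetical placeholders, not real data:

```python
def read_gene_list(lines):
    """Return the set of gene IDs in `lines`, ignoring blank lines and stray whitespace."""
    return {line.strip() for line in lines if line.strip()}


def common_genes(list_a, list_b):
    """Genes present in both lists (intersection), sorted for stable output."""
    return sorted(read_gene_list(list_a) & read_gene_list(list_b))


def combined_genes(list_a, list_b):
    """All genes found in either list (union), without duplicates, sorted."""
    return sorted(read_gene_list(list_a) | read_gene_list(list_b))


if __name__ == "__main__":
    # Hypothetical placeholder IDs standing in for two one-gene-per-line text files.
    list_a = ["gene1\n", "gene2\n", "gene3\n"]
    list_b = ["gene2\n", "gene3\n", "gene4\n", "\n"]
    print("common:", common_genes(list_a, list_b))      # common: ['gene2', 'gene3']
    print("combined:", combined_genes(list_a, list_b))  # combined: ['gene1', 'gene2', 'gene3', 'gene4']
```

Because an open text file yields one line at a time, real files can be passed directly, e.g. `with open("list_a.txt") as a, open("list_b.txt") as b: print(common_genes(a, b))` (file names hypothetical). On the command line, `comm -12` applied to two sorted files gives the same intersection.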
Before the Course: Setting Up the BCBC Course Virtual Machine
The BTI Bioinformatics Course uses VirtualBox to virtualize a Linux operating system (OS) inside any computer. The virtual machine (VM) runs the Debian Linux distribution for 64-bit computers. To get it running, you'll need to download both the VirtualBox software and the virtual machine itself.
Steps:
1. Download and install the latest version of VirtualBox for your operating system, following the package instructions, as well as the VirtualBox Extension Pack. For more information, consult the user manual.
2. Check whether you have a 64-bit or 32-bit system (see these links for instructions: Windows or macOS) and download the BCBC_Debian11.ova file using the link below. If you have a 32-bit system, please send us an email. Some 64-bit Windows systems do not enable VT-x/AMD-V hardware acceleration, which prevents the virtual machine from running. In that case, you will need to enable VT-x/AMD-V in your computer's BIOS.
If you have a 32-bit computer, you can come to our office to get an alternative 64-bit laptop to use for the course.
3. Create a folder for the VM on your system and copy or move the .ova file into it.
4. Open the VirtualBox program.
5. Select File > Import Appliance.
6. Click “Open Appliance”. Select the .ova file and click “Continue”.
7. Enable “Generate new MAC addresses for all network adapters” and click “Import”.
8. If you have problems with the VM installation, you can attend the bioinformatics hour on Tuesdays or Thursdays from 1 to 2 pm.
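Steps 4 to 7 can also be carried out from a terminal with `VBoxManage`, the command-line tool installed with VirtualBox. The Python sketch below is only a thin wrapper around `VBoxManage import`; the function names are hypothetical, and it assumes `VBoxManage` is on your PATH. As far as we know, `VBoxManage import` generates new MAC addresses by default unless `--options=keepallmacs` or `--options=keepnatmacs` is given, which would make it the command-line counterpart of step 7. Please confirm this with `VBoxManage import --help` for your VirtualBox version.

```python
import shutil
import subprocess
from pathlib import Path


def build_import_cmd(ova_path, dry_run=False):
    """Build the VBoxManage command that imports an .ova appliance.

    Assumption: no --options flag is passed, so VBoxManage should apply its
    default MAC address handling (new addresses) during the import.
    """
    cmd = ["VBoxManage", "import", str(ova_path)]
    if dry_run:
        # --dry-run only describes what would be imported; no VM is created.
        cmd.append("--dry-run")
    return cmd


def import_appliance(ova_path, dry_run=False):
    """Run the import, failing early if VBoxManage or the .ova file is missing."""
    if shutil.which("VBoxManage") is None:
        raise FileNotFoundError("VBoxManage not found on PATH; is VirtualBox installed?")
    ova = Path(ova_path)
    if not ova.is_file():
        raise FileNotFoundError(f"{ova} does not exist")
    # check=True raises CalledProcessError if VBoxManage reports an error.
    return subprocess.run(build_import_cmd(ova, dry_run), check=True)
```

For example, `import_appliance("BCBC_Debian11.ova", dry_run=True)` should only describe the appliance, and dropping `dry_run=True` performs the import. On Windows, `VBoxManage.exe` sits in the VirtualBox installation folder (typically `C:\Program Files\Oracle\VirtualBox`), which is not always on PATH.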
Troubleshooting
- I have enabled virtualization but VirtualBox still gives an error asking to “Enable vt-X” or similar.
Make sure you have disabled Hyper-V. This can be done by following these instructions.
- I have downloaded both the VirtualBox software and the virtual machine file and enabled virtualization, but it still will not run.
Make sure you have a 64-bit machine and have followed the above steps precisely, especially enabling virtualization (VT-x/AMD-V). Come to our office or email us for further troubleshooting if you cannot find the problem.
- If you use a Mac and have problems importing the latest VM, please follow the troubleshooting steps at this link: https://medium.com/@DMeechan/fixing-the-installation-failed-virtualbox-error-on-mac-high-sierra-7c421362b5b5
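To check whether your machine is 64-bit (step 2 and the troubleshooting item above), Python's standard `platform.machine()` reports the machine type. The sketch below maps common values to a rough label; the mapping and the function name `classify_machine` are our own assumptions and cover only common cases. The VM is described above only as a Debian build for 64-bit computers, so if the check reports 64-bit ARM (for example, on an Apple Silicon Mac), please email us before the course.

```python
import platform

# Assumed mapping of common platform.machine() values (compared in lowercase).
X86_64_NAMES = {"x86_64", "amd64"}  # Linux/macOS report "x86_64", Windows "AMD64"
ARM64_NAMES = {"arm64", "aarch64"}  # Apple Silicon macOS, 64-bit ARM Linux
X86_32_NAMES = {"i386", "i486", "i586", "i686", "x86"}  # 32-bit x86 systems


def classify_machine(machine):
    """Return a rough architecture label for a platform.machine() string."""
    m = machine.lower()
    if m in X86_64_NAMES:
        return "64-bit x86"
    if m in ARM64_NAMES:
        return "64-bit ARM"
    if m in X86_32_NAMES:
        return "32-bit x86"
    return "unknown"


if __name__ == "__main__":
    machine = platform.machine()
    print(f"platform.machine() = {machine!r} -> {classify_machine(machine)}")
```

On a Mac, an Intel-only build of Python running under Rosetta 2 can report `x86_64` even on Apple Silicon, so treat the result as a first check rather than a definitive answer.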
6/15/23 — UNIX Command-Line Intro, Part 1
Presenter: Ryan
Topics covered
- Terminal file system navigation
- Wildcards, shortcuts and special characters
- File permissions
- UNIX compression commands
- UNIX networking commands
Estimated Time
- Lecture and exercises: 2:00 h
Materials
6/22/23 — UNIX Command-Line Intro, Part 2
Presenter: Ryan
Topics covered
- Basic NGS file formats
- Text file manipulation commands
- Command-line pipelines
- Introduction to Bash scripts
Estimated Time
- Lecture and exercises: 2:00 h
Materials
6/29/23 — Quality Control
Presenter: Adrian
Topics covered
- Different sequencing technologies
- Sequence file types and formats
- Genome Assembly
- Annotation
- Quality encoding
- Quality control tools
Estimated Time
- Lecture and examples: 2:00 h
Materials