Learning Goals

At the end of this course students will:

  • Understand the increasing necessity for computation in modern life sciences research.
  • Be able to use and evaluate online bioinformatics resources including major biomolecular and genomic databases, search and analysis tools, genome browsers, structure viewers, and select quality control and analysis tools to solve problems in the biological sciences.
  • Be able to use the UNIX command line and the R environment to analyze bioinformatics data at scale.
  • Understand the process by which genomes are currently sequenced and the bioinformatics processing and analysis required for their interpretation.
  • Be familiar with the research objectives of the bioinformatics-related subdisciplines of genomics, transcriptomics and structural bioinformatics.

In short, students will develop a solid foundational knowledge of bioinformatics and be able to evaluate new biomolecular and genomic information using existing bioinformatic tools and resources.

Specific Learning Goals

Teaching toward the specific learning goals below is expected to occupy 60%-70% of class time. The remaining course content is at the discretion of the instructor, with input from the student body. This includes student-selected topics for peer presentation, as well as one student-selected guest lecture from an industry-based genomic scientist.

All students who receive a passing grade should be able to:

  1. Appreciate and describe in general terms the role of computation in hypothesis-driven discovery processes within the life sciences. Lecture(s): 1, 2, 20
  2. Be able to query, search, compare and contrast the data contained in major bioinformatics databases and describe how these databases intersect (GenBank, Gene, UniProt, Pfam, OMIM, PDB, UCSC, Ensembl). Lecture(s): 2, 12, 13
  3. Describe how nucleotide and protein sequence and structure data are represented (FASTA, FASTQ, GenBank, UniProt, PDB). Lecture(s): 3, 10
  4. Be able to describe how dynamic programming works for pairwise sequence alignment and appreciate the differences between global and local alignment along with their major application areas. Lecture(s): 4, 5
  5. Calculate the alignment score between two nucleotide or protein sequences using a provided scoring matrix, and be able to perform BLAST, PSI-BLAST, HMMER and protein structure-based database searches and interpret the results in terms of the biological significance of an E-value. Lecture(s): 5, 10
  6. Use UNIX command-line tools for file system navigation and text file manipulation. Lecture(s): 6, 7, 10, 11, 24, 15
  7. Use existing programs at the UNIX command line to analyze bioinformatics data. Lecture(s): 7, 10, 11, 13, 14, 15, 16
  8. Use R to read and parse comma-separated (.csv) formatted files ready for subsequent analysis. Lecture(s): 8, 9, 10, 11, 13, 15, 16
  9. Perform elementary statistical analysis on biomolecular and “omics” datasets with R and produce informative graphical displays and data summaries. Lecture(s): 9, 10, 11, 13, 15, 16
  10. Be able to find, install and use R packages from CRAN, Bioconductor and GitHub. Lecture(s): 7, 11-17
  11. View and interpret the structural models in the PDB. Lecture(s): 10, 11
  12. Explain the outputs from structure prediction algorithms and small molecule docking approaches. Lecture(s): 11
  13. Appreciate and describe in general terms the rapid advances in sequencing technologies and the new areas of investigation that these advances have made accessible. Lecture(s): 13, 14, 15
  14. Understand the process by which genomes are currently sequenced and the bioinformatics processing and analysis required for their interpretation. Lecture(s): 13
  15. For a genomic region of interest (e.g. the neighborhood of a particular gene), use a genome browser to view nearby genes, transcription factor binding regions, epigenetic information, etc. Lecture(s): 14
  16. Given an RNA-Seq data file, find the set of significantly differentially expressed genes and use online tools to interpret gene lists and annotate potential gene functions. Lecture(s): 15, 16
  17. Perform a GO analysis to identify the pathways relevant to a set of genes (e.g. identified by a transcriptomic study or a proteomic experiment). Lecture(s): 16
  18. Use the KEGG pathway database to look up interaction pathways. Lecture(s): 17
  19. Use graph theory to represent biological data networks. Lecture(s): 17, 18
  20. Understand the challenges in integrating and interpreting large heterogeneous high-throughput data sets into their functional context. Lecture(s): 19
  21. Have an appreciation for the social impacts and ethical implications of how genomic sequence information is used in our society. Lecture(s): 20
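
As an illustration of the kind of calculation goal 5 refers to, the following is a minimal sketch of scoring two already-aligned nucleotide sequences. The scoring scheme (+1 match, -1 mismatch, -2 per gap position) and the function name alignment_score are assumptions chosen for this example only; the course does not specify them, and in practice the scoring matrix would be the one provided with the exercise.

```python
# Hypothetical scoring scheme for illustration only (not taken from the course):
# +1 for a match, -1 for a mismatch, -2 for each gap position.
MATCH, MISMATCH, GAP = 1, -1, -2


def alignment_score(a: str, b: str) -> int:
    """Score two already-aligned sequences of equal length ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            score += GAP        # a gap in either sequence
        elif x == y:
            score += MATCH      # identical residues
        else:
            score += MISMATCH   # differing residues
    return score


# Six matching columns and one gap column: 6 * 1 + (-2) = 4
print(alignment_score("GATTACA", "GAT-ACA"))
```

For protein sequences, the fixed match and mismatch values would typically be replaced by a lookup in a substitution matrix supplied with the exercise.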