# Lectures

All lectures are Tu/Th 9:00 am-12:00 pm in Warren Lecture Hall 2015 (WLH 2015). Clicking on the class topics below will take you to the corresponding lecture notes, homework assignments, pre-class video screencasts and required reading material.

| # | Date | Topics for Fall 2017 |
|---|---|---|
| 1 | Th, 09/28 | **Welcome to Foundations of Bioinformatics.** Course introduction, Learning goals & expectations, Biology is an information science, History of Bioinformatics, Types of data, Application areas and introduction to upcoming course segments, Student computer setup |
| 2 | Tu, 10/03 | **Bioinformatics databases and key online resources.** NCBI & EBI resources for the molecular domain of bioinformatics, Focus on GenBank, UniProt, Entrez and Gene Ontology. Hands on with BLAST, GenBank, OMIM, GENE, UniProt, Muscle, PFAM and PDB bioinformatics tools and databases |
| 3 | Th, 10/05 | **Sequence alignment fundamentals, algorithms and applications.** Homology, Sequence similarity, Local and global alignment, classic Needleman-Wunsch, Smith-Waterman and BLAST heuristic approaches |
| 4 | | Database searching beyond BLAST, PSI-BLAST, Profiles and HMMs, Protein structure comparisons |
| 5 | Th, 10/12 | **Introduction to UNIX for bioinformatics.** Why do we use UNIX for bioinformatics? UNIX philosophy, 21 Key commands, Understanding processes, File system structure, Connecting to remote servers |
| 6 | Tu, 10/17 | **Working with Unix.** Bioinformatics on the command line, Redirection, streams and pipes, Workflows for batch processing, Shell scripting, Organizing computational projects |
| 7 | Th, 10/19 | **Bioinformatics data analysis with R.** R language basics and the RStudio IDE, Major R data structures and functions, Using R scripts from the command line |
| 8 | Tu, 10/24 | **Data exploration and visualization in R.** Import data in various formats (both local and from online sources), The exploratory data analysis mindset, Data visualization best practices, Simple base graphics (scatterplots, histograms, bar graphs and boxplots), Building more complex charts with ggplot |
| 9 | Th, 10/26 | **Why, when and how of writing your own R functions.** Import data in various formats both local and from online sources, The basics of writing your own functions that promote code robustness, reduce duplication and facilitate code re-use |
| 10 | Tu, 10/31 | **Working with R packages for bioinformatics.** Extending functionality and utility with R packages, Obtaining R packages from CRAN and Bioconductor, Working with Bio3D for molecular data, Managing and analyzing genome-scale data with Bioconductor |
| 11 | Th, 11/02 | **Structural Bioinformatics.** Protein structure function relationships, Protein structure and visualization resources, Modeling energy as a function of structure, Homology modeling, Predicting functional dynamics, Inferring protein function from structure |
| 12 | Tu, 11/07 | **Bioinformatics in drug discovery and design.** Target identification, Lead identification, Small molecule docking methods, Protein motion and conformational variants, Molecular simulation and drug optimization |
| 13 | Th, 11/09 | **Project: Find a gene assignment.** Principles of database searching, sequence analysis, structure analysis and bioinformatic data analysis with the R environment |
| 14 | Tu, 11/14 | **Genome informatics and high throughput sequencing.** Searching genes and gene functions, Genome databases, Variation in the genome, Sequencing technologies past, present and future (Sanger, Shotgun, PacBio, Illumina, toward the $500 human genome), Biological applications of sequencing, Bioinformatics analysis methods |
| 15 | Th, 11/16 | **Major bioinformatics resources for genomics.** Databases, tools and visualization resources from NCBI, EBI & UCSC, The Galaxy platform for quality control and analysis; FASTQ, SAM and BAM file formats; Sample workflows with FastQC and Bowtie2 |
| 16 | Tu, 11/21 | **Immunoinformatics resources for the understanding of immunological information.** Guest lecture from Dr. Bjoern Peters (LIAI) with topics including: Epitope prediction, Reverse vaccinology, Immune system modeling, Disease diagnosis and therapy along with implications for the development of personalized medicine. |
| | Th, 11/23 | **Happy Thanksgiving! No class.** N.B. Find a gene assignment due on Monday 11/27! |
| 17 | Tu, 11/28 | **Transcriptomics and the analysis of RNA-Seq data.** RNA-Seq aligners, Differential expression tests, RNA-Seq statistics, Counts and FPKMs and avoiding P-value misuse, Hands-on analysis of RNA-Seq data with R |
| 18 | Th, 11/30 | **Genome annotation and the interpretation of gene lists.** Gene finding and functional annotation, Functional databases KEGG, InterPro, GO ontologies and functional enrichment |
| 19 | Tu, 12/05 | **Guest lecture.** Student selected guest presentation with possible topics including: Metagenomics / Pharmacogenomics / Epigenomics / Personal genomics / Genome evolution / Genome editing and synthetic genomics / Social impacts and ethical implications of continuing genomic advances |
| 20 | Th, 12/07 | **Course summary.** Summary of learning goals, Student course evaluation time and exam preparation |
| | TBD (Th, 12/12) | **Final exam!** |

# Class material

## 1: Welcome to Foundations of Bioinformatics

Topics: Course introduction, Learning goals & expectations, Biology is an information science, History of Bioinformatics, Types of data, Application areas and introduction to upcoming course segments, Student 30-second introductions, Student computer setup.

Goals:

• Understand course scope, expectations, logistics and ethics code.
• Understand the increasing necessity for computation in modern life sciences research.
• Get introduced to how bioinformatics is practiced.
• Complete the pre-course questionnaire.
• Set up your laptop computer for this course.

Material:

Homework:

Screen Casts:

1. Welcome to BGGN-213: Course introduction and logistics.
2. What is Bioinformatics? Bioinformatics can mean different things to different people. What will we actually learn in this class?
3. How do we do Bioinformatics? Some basic bioinformatics can be done online or with downloaded tools. However, most often we will need a specialized computational setup.

## 2: Bioinformatics databases and key online resources

Topics: NCBI & EBI resources for the molecular domain of bioinformatics, Focus on GenBank, UniProt, Entrez and Gene Ontology. Hands on with BLAST, GenBank, OMIM, GENE, UniProt, Muscle, PFAM and PDB bioinformatics tools and databases. There are many bioinformatics databases (see handout) and being able to judge their utility and quality is important.

Goals:

• Be able to query, search, compare and contrast the data contained in major bioinformatics databases (GenBank, GENE, UniProt, PFAM, OMIM, PDB) and describe how these databases intersect.
• Be able to describe how nucleotide and protein sequence and structure data are represented (FASTA, FASTQ, GenBank, UniProt, PDB).
• Be familiar with online tools at the EBI and NCBI including Muscle and BLAST.
• The goal of the hands-on session is to introduce a range of core bioinformatics databases and associated online services whilst actively investigating the molecular basis of several common human diseases.

Material:

Homework:

## 3: Alignment fundamentals, algorithms and applications

Topics: Sequence alignment and database searching, Homology, Sequence similarity, Local and global alignment, Heuristic approaches, Database searching with BLAST, E-values and evaluating alignment scores and statistics.

Goals:

• Be able to describe how dynamic programming works for pairwise sequence alignment.
• Appreciate the differences between global and local alignment along with their major application areas.
• Understand how aligning novel sequences with previously characterized genes or proteins provides important insights into their common attributes and evolutionary origins.
• The goal of the hands-on session is to explore the principles underlying the computational tools that can be used to compute and evaluate sequence alignments.

Material:

Homework:

## 4: Advanced Database Searching

Topics: Database searching beyond BLAST, Using PSI-BLAST, Profiles and HMMs, Protein structure comparisons, Beginning with command line based database searches.

Goals:

• Be able to calculate the alignment score between two nucleotide or protein sequences using a provided scoring matrix.
• Understand the limits of homology detection with tools such as BLAST.
• Be able to perform PSI-BLAST, HMMER and protein structure based database searches and interpret the results in terms of the biological significance of an E-value.
• Run our first bioinformatics tool from the command line.

Material:

Homework:

• Questions and alignment problem from Lecture 3 above are due before the next class.

## 5: Introduction to UNIX for bioinformatics

Topics: Why do we use UNIX for bioinformatics?
UNIX philosophy, 21 Key commands, Understanding processes, File system structure, Connecting to remote servers, Starting up and managing a Jetstream service virtual machine instance.

Goals:

• Understand why we use UNIX for bioinformatics.
• Use UNIX command-line tools for file system navigation and text file manipulation.
• Have a familiarity with 21 key UNIX commands that we will use ~90% of the time.
• Be able to connect to remote servers from the command line.

Material:

Homework:

## 6: Working with Unix

Topics: Bioinformatics on the command line, Redirection, streams and pipes, Workflows for batch processing, Shell scripting, Organizing computational projects.

Goals:

• Use existing programs at the UNIX command line to analyze bioinformatics data.
• Understand IO redirection, streams and pipes.
• Think in terms of modular workflows for batch processing.
• Understand best practices for organizing computational projects.

Material:

Homework:

• Questions:
• List an unexpected feature of a command of your choice, that is, a feature that you would not have expected when reading about the command.
• The SGD_features.tab file contains the annotations for genomic features of the Yeast genome. The feature type is stored in the second column.
• Create a file that counts how many times each type occurs.
• What command would show the top ten most common features?
• What command would show the least common features?

## 7: Bioinformatics data analysis with R

Topics: R language basics and the RStudio IDE, Major R data structures and functions, Using R for data exploration and visualization. R scripts and R Markdown.

Goals:

• Familiarity with R’s basic syntax.
• Be able to use R to read and parse comma-separated (.csv) formatted files ready for subsequent analysis.
• Familiarity with major R data structures (vectors, matrices and data.frames).
• Understand the basics of using functions (arguments, vectorization and recycling).

Material:

Homework:

## 8: Data exploration and visualization in R

Topics: The exploratory data analysis mindset, Data visualization best practices, Simple base graphics (including scatterplots, histograms, bar graphs, dot charts, boxplots and heatmaps), Building more complex charts with ggplot.

Goals:

• Appreciate the major elements of exploratory data analysis and why it is important to visualize data.
• Be conversant with data visualization best practices and understand how good visualizations optimize for the human visual system.
• Be able to generate informative graphical displays including scatterplots, histograms, bar graphs, boxplots, dendrograms and heatmaps and thereby gain exposure to the extensive graphical capabilities of R.
• Appreciate that you can build even more complex charts with ggplot and additional R packages such as rgl.

Material:

Homework:

## 9: Why, When and How of Writing Your Own R Functions

Topics: Import data in various formats both local and from online sources, The basics of writing your own functions that promote code robustness, reduce duplication and facilitate code re-use.

Goals:

• Be able to import data in various flat file formats from both local and online sources.
• Understand the structure and syntax of R functions and how to view the code of any R function.
• Understand when you should be writing functions.
• Be able to follow a step by step process of going from a working code snippet to a more robust function.

Material:

Homework:

## 10: Using CRAN and Bioconductor Packages for Bioinformatics

Topics: More on how to write R functions with worked examples. Further extending functionality and utility with R packages, Obtaining R packages from CRAN and Bioconductor, Working with Bio3D for molecular data, Managing genome-scale data with Bioconductor.

Goals:

• Be able to find and install R packages from CRAN and Bioconductor.
• Understand how to find and use package vignettes, demos, documentation, tutorials and source code repository where available.
• Be able to write and (re)use basic R scripts to aid with reproducibility.

Material:

Homework:

• Complete question 6 from the lecture 9 worksheet. This entails turning a supplied code snippet into a more robust and re-usable function that will take any of the three listed input proteins and plot the effect of drug binding. Note the assessment rubric within the document. (Submission deadline: 9am Th, 11/09).

## 11: Structural Bioinformatics

Topics: Protein structure function relationships, Protein structure and visualization resources, Modeling energy as a function of structure, Homology modeling, Predicting functional dynamics, Inferring protein function from structure.

Goals:

• View and interpret the structural models in the PDB.
• Understand the classic Sequence > Structure > Function via energetics and dynamics paradigm.
• Appreciate the role of bioinformatics in mapping the ENERGY LANDSCAPE of biomolecules.
• Be able to use the Bio3D package for exploratory analysis of protein sequence-structure-function-dynamics relationships.

Material:

## 12: Bioinformatics in drug discovery and design

Topics: The traditional path to drug discovery; High throughput screening approaches; Computational receptor/target-based bioinformatics approaches; Computational ligand/drug-based bioinformatics approaches; Small molecule docking methods; Prediction and analysis of biomolecular motion, conformational variants and functional dynamics; Molecular simulation and drug optimization.

Goals:

• Appreciate how bioinformatics can predict functional dynamics & aid drug discovery.
• Be able to use Bio3D and R for the analysis and prediction of protein flexibility.
• Be able to perform in silico docking and virtual screening strategies for drug discovery.
• Understand the increasing role of bioinformatics in the drug discovery process.

Material:

## 13: Project Assignment Introduction!

The find-a-gene project is a required assignment for BGGN-213. The objective of this assignment is for you to demonstrate your grasp of database searching, sequence analysis, structure analysis and the R environment that we have covered to date in class. You may wish to consult the scoring rubric at the end of the above linked project description and the example report for format and content guidance.

Your responses to questions Q1-Q4 are due at the beginning of class Thursday November 16th (11/16/17). The complete assignment, including responses to all questions, is due at the beginning of class Tuesday December 5th (12/5/17). Late responses will not be accepted under any circumstances.

### Hands-on with Git

Today’s lecture and hands-on sessions will introduce Git, currently the most popular version control system. We will learn how to perform common operations with Git that you’ll do every day. We will also cover the popular social code-hosting platforms GitHub and BitBucket.

## 14: Genome informatics and high throughput sequencing

Topics: Searching genes and gene functions, Genome databases, Variation in the genome, Sequencing technologies past, present and future (Sanger, Shotgun, PacBio, Illumina, toward the $500 human genome), Biological applications of sequencing, RNA-Sequencing for gene expression analysis, Bioinformatics analysis methods

Goals:

• Appreciate and describe in general terms the rapid advances in sequencing technologies and the new areas of investigation that these advances have made accessible.
• Understand the process by which genomes are currently sequenced and the bioinformatics processing and analysis required for their interpretation.
• Be able to launch your own cloud based Galaxy server for NGS analysis.
• Be able to navigate the Galaxy platform, input NGS sequence data and access common NGS tools for sequence analysis.

Material:

## 15: Major bioinformatics resources for genomics

Topics: Databases, tools and visualization resources from NCBI, EBI & UCSC, The Galaxy platform for quality control and analysis; FASTQ, SAM and BAM file formats; Sample Galaxy workflow with FastQC and Bowtie2

Goals:

• For a genomic region of interest (e.g. the neighborhood of a particular SNP), use a genome browser to view nearby genes, transcription factor binding regions, epigenetic information, etc.
• Understand the FASTQ file format and the information it holds.
• Understand the SAM/BAM file format and the information it holds.
• Be able to launch your own cloud based Galaxy server for NGS analysis.
• Be able to use the Galaxy platform for basic RNA-Seq analysis from raw reads to expression value determination.
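
As a rough illustration of the FASTQ goal above, the sketch below reads four-line FASTQ records and decodes their quality strings. It is written in Python purely for illustration (the course itself works in Galaxy, UNIX and R), it assumes the common Phred+33 quality encoding, and the function names `parse_fastq` and `error_probability` are hypothetical.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, phred_scores) from FASTQ text.

    Assumes simple four-line records: @header, sequence, +, quality.
    """
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33.
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:], seq, scores


def error_probability(phred_score):
    """Convert a Phred score Q to the base-call error probability 10^(-Q/10)."""
    return 10 ** (-phred_score / 10)


example = "@read1\nGATTACA\n+\nIIIIII#\n"
for read_id, seq, scores in parse_fastq(example):
    print(read_id, seq, scores)  # read1 GATTACA [40, 40, 40, 40, 40, 40, 2]
```

Each record therefore carries a read identifier, the base calls, and one quality score per base; a score of 40 corresponds to an error probability of 1 in 10,000.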

Material:

## 16: Immunoinformatics

Topics: Immunoinformatics resources for the understanding of immunological information. A case study in personalized cancer immunotherapy.
Guest lecture from Dr. Bjoern Peters (LIAI) with topics including: Epitope prediction, Reverse vaccinology, Immune system modeling, Disease diagnosis and therapy along with implications for the development of personalized medicine.

Material:

## 17: Transcriptomics and the analysis of RNA-Seq data

Topics: Analysis of RNA-Seq data with R, Differential expression tests, RNA-Seq statistics, Counts and FPKMs, Normalizing for sequencing depth, DESeq2 analysis.

Goals:

• Given an RNA-Seq dataset, find the set of significantly differentially expressed genes and their annotations.
• Gain competency with data import, processing and analysis with DESeq2 and other Bioconductor packages
• Understand the structure of count data and metadata required for running analysis
• Be able to extract, explore, visualize and export results
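
To make "Counts and FPKMs" and "Normalizing for sequencing depth" from the topics above concrete, here is a minimal sketch in Python (used only for illustration; the class analysis uses R and DESeq2). It assumes the usual definitions, counts per million = count × 10^6 / total reads and FPKM = count × 10^9 / (gene length in bp × total reads), and it simplifies by taking the total as the sum of the counts supplied. The gene names, counts and lengths are made up.

```python
def cpm(counts):
    """Counts per million: corrects for sequencing depth only."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def fpkm(counts, lengths_bp):
    """Fragments per kilobase per million: corrects for depth and gene length."""
    total = sum(counts.values())
    return {
        gene: c * 1e9 / (lengths_bp[gene] * total)
        for gene, c in counts.items()
    }


# Hypothetical toy data: raw read counts and gene lengths in base pairs.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

print(cpm(counts))
print(fpkm(counts, lengths_bp))
```

In this toy data geneB has three times the raw count of geneA but is also three times longer, so the two end up with the same FPKM. Note that DESeq2 uses its own size-factor normalization rather than FPKM.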

Material:

## 18: Genome annotation and the interpretation of gene lists

Topics: Gene finding and functional annotation, Functional databases KEGG, InterPro, GO ontologies and functional enrichment

Goals: Perform a GO analysis to identify the pathways relevant to a set of genes (e.g. identified by transcriptomic study or a proteomic experiment). Use both Bioconductor packages and online tools to interpret gene lists and annotate potential gene functions.
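
One common way to score functional enrichment of a gene list is a one-sided hypergeometric test, sketched below in Python for illustration. This is not necessarily the method used in class or by any particular Bioconductor package, and the function name `enrichment_p_value` and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric test for over-representation.

    n_universe:  all genes considered (the background)
    n_annotated: background genes carrying the GO term
    n_selected:  genes in the list of interest
    n_overlap:   genes in the list that carry the GO term

    Returns P(X >= n_overlap) when n_selected genes are drawn at random.
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 genes in total, 4 annotated, 3 selected, 2 of them annotated.
print(enrichment_p_value(10, 4, 3, 2))
```

Because many GO terms are usually tested at once, the resulting p-values would normally be corrected for multiple testing before interpretation.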

Material:

• Good review article: Trapnell C, Hendrickson DG, Sauvageau M, Goff L et al. “Differential analysis of gene regulation at transcript resolution with RNA-seq”. Nat Biotechnol 2013 Jan;31(1):46-53. PMID: 23222703.

## 19: Guest lecture

Topics: Student-selected presentation by an industry-based genomic scientist, with possible topics including: Metagenomics / Pharmacogenomics / Epigenomics / Personal genomics / Genome evolution / Genome editing and synthetic genomics / Social impacts and ethical implications of continuing genomic advances

Goals: Understand the challenges in integrating and interpreting large heterogeneous high throughput data sets in their functional context.

## 20: Foundational statistics for bioinformatics

Topics: Data summary statistics; Inferential statistics; Significance testing; Two sample T-test in R; Power analysis in R; Chi-square test in R; Multiple testing correction; and almost everything you wanted to know about Principal Component Analysis (PCA) but were afraid to ask!
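
As an illustration of the multiple testing correction topic, the following sketch implements the Benjamini-Hochberg adjustment, one widely used correction chosen here as an example. It is written in Python for illustration only; in R the comparable built-in is `p.adjust(p, method = "BH")`. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))  # approximately [0.02, 0.04, 0.04, 0.02]
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.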

Material:

© 2017 Barry J. Grant. All rights reserved. A UCSD Division of Biological Sciences Course