Module 2. Introduction to Statistics in Bioinformatics
Basic statistics as used in bioinformatics, especially standard statistical tests of significance and when they apply. Applications to genetics, experimental and observational medical data, as well as exploration of multiple testing issues that arise in bioinformatics and other experimental settings.
N.B. Please complete this pre-course questionnaire if you have not already done so.
Lecture (2-1): Framework for statistical analysis of biomedical data
- Time: Feb 9 (Tuesday), 2:30 - 4:00 PM
- Topics: Probability distributions, quantifying central values and variability, quantifying association, graphical displays of data
- Material:
Lecture slides (PDF)
We will be using R throughout this module to demonstrate data analysis concepts and best practices. In preparation for our first lab session, please complete the free online interactive tutorial “TryR” (http://tryr.codeschool.com). It offers a gentle introduction to R syntax and some of the major R data structures (vectors, matrices and data.frames).
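As a rough preview of the three data structures named above, here is a minimal R sketch. The variable names and values are invented for illustration and are not taken from the tutorial or from any course dataset.

```r
# Vector: an ordered collection of values that all share one type
systolic <- c(118, 132, 125, 140)   # made-up readings, illustration only

# Matrix: a two-dimensional arrangement of values of one type
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: a table whose columns may each have a different type
patients <- data.frame(
  id       = c("p1", "p2", "p3", "p4"),   # hypothetical identifiers
  systolic = systolic,
  treated  = c(TRUE, FALSE, TRUE, FALSE)
)

mean(systolic)                 # average of the vector
dim(m)                         # number of rows and columns of the matrix
str(patients)                  # compact summary of each column
patients[patients$treated, ]   # keep only the rows where treated is TRUE
```

Typing these lines one at a time at the R prompt, and inspecting each object after it is created, is a reasonable way to get comfortable before the first lab.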
Lab (2-1): Descriptive statistics and summarizing data
- Time: Feb 11 (Thursday), 2:30 - 4:00 PM, or Feb 12 (Friday), 10:30 AM - 12:00 PM
- Topics: Introduction to R, probability distributions, quantifying central values and variability, quantifying association, graphical displays of data.
- Material:
PDF slides: Introduction to R, Video
Lab worksheet
Lab worksheet with key
Dataset TROPHY.csv
Readings: Feasibility of Treating Prehypertension with an Angiotensin-Receptor Blocker (TROPHY; S. Julius, 2006); R Data Types
Muddy point assessment
- Homework:
Homework Assignment 1
Homework Assignment 1 with key
Lecture (2-2): Approaches to statistical estimation and testing
- Time: Feb 16 (Tuesday), 2:30 - 4:00 PM
- Topics: Estimation and standard errors, standard errors for means, correlations, and log odds ratios, formal hypothesis testing, tests involving means, correlations, and log odds ratios, power.
- Material:
Lecture slides (PDF)
Lab (2-2): Statistical estimation and hypothesis testing
- Time: Feb 18 (Thursday), 2:30 - 4:00 PM, or Feb 19 (Friday), 10:30 AM - 12:00 PM
- Topics: Estimation and standard errors, standard errors for means, correlations, and log odds ratios, formal hypothesis testing, one and two sample tests involving means, power.
- Material:
Lab worksheet
Lab worksheet with key
Dataset TROPHY.csv
Muddy point assessment
- Homework:
Homework Assignment 2
Homework Assignment 2 with key
Lecture (2-3): Analyses involving associations
- Time: Feb 23 (Tuesday), 2:30 - 4:00 PM
- Topics: Pearson correlation, t-test, odds ratios, discussion of a research article
- Material:
Lecture slides (PDF)
Lab (2-3): Pearson correlation, t-test, and log odds ratios
- Time: Feb 25 (Thursday), 2:30 - 4:00 PM, or Feb 26 (Friday), 10:30 AM - 12:00 PM
- Topics: Tests based on Pearson correlation, t-test, and log odds ratios
- Material:
Lab worksheet
Lab worksheet with key
Muddy point assessment
- Homework:
Homework Assignment 3
Homework Assignment 3 with key
Lecture (2-4): Linear regression
- Time: Mar 8 (Tuesday), 2:30 - 4:00 PM
- Topics: Single and multiple variable linear regression, Bonferroni correction, power for regression analysis
- Material:
Lecture slides (PDF)
Lab (2-4): Regression models
- Time: Mar 10 (Thursday), 2:30 - 4:00 PM, or Mar 11 (Friday), 10:30 AM - 12:00 PM
- Topics: Fitting regression models for prediction and effect estimation, inference for regression effects, R^2, diagnostics, comparing models
- Material:
Lab worksheet
Lab worksheet with key
Muddy point assessment
- Homework:
Homework Assignment 4
Homework Assignment 4 with key
Lecture (2-5): Introduction to graphical methods for multivariate data analysis
- Time: Mar 15 (Tuesday), 2:30 - 4:00 PM
- Topics: Clustering methods, multidimensional scaling, and principal component analysis
- Material:
Lecture slides (PDF)
Lab (2-5): Clustering and principal component analysis
- Time: Mar 17 (Thursday), 2:30 - 4:00 PM, or Mar 18 (Friday), 10:30 AM - 12:00 PM
- Topics: Multivariate data, heat maps and dendrograms, clustering methods, principal component analysis
- Material:
Lab worksheet with key
Muddy point assessment
Reference material
RStudio cheatsheet: a well-designed reference card for RStudio features.
Try R: an excellent interactive online R tutorial.