Module 2. Introduction to Statistics in Bioinformatics
Basic statistics as used in bioinformatics, especially standard statistical tests of significance and when they apply. Applications to genetics and to experimental and observational medical data, and exploration of the multiple-testing issues that arise in bioinformatics and other experimental settings.
N.B. Please complete this pre-course questionnaire if you have not already done so.
Lecture (2-1): Framework for statistical analysis of biomedical data
- Time: Feb 7 (Tuesday), 2:30 - 4:00 PM
- Topics: Probability distributions, quantifying central values and variability, quantifying association, graphical displays of data
- Material:
Lecture slides (PDF)
Pre-class screencast
We will use R throughout this module to demonstrate data analysis concepts and best practices. To prepare for the first lab session, please complete the free online interactive tutorial “TryR” (http://tryr.codeschool.com). It offers a gentle introduction to R syntax and to some of R's major data structures: vectors, matrices, and data.frames.
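To preview the three data structures mentioned above, here is a minimal R sketch. The object names and values are hypothetical and are not taken from the course materials or the TROPHY dataset.

```r
# A vector: an ordered collection of values that all share one type
bp <- c(128, 134, 131, 139)          # hypothetical systolic readings (mmHg)
mean(bp)                             # central value
sd(bp)                               # variability

# A matrix: a 2-D grid whose entries all share one type
m <- matrix(1:6, nrow = 2, ncol = 3) # filled column by column
dim(m)                               # number of rows and columns

# A data.frame: columns may differ in type, much like a spreadsheet
df <- data.frame(
  id    = 1:4,
  group = c("placebo", "treated", "placebo", "treated"),  # hypothetical labels
  sbp   = bp
)
str(df)                              # compact overview of each column's type
tapply(df$sbp, df$group, mean)       # mean reading within each group
```

Vectors and matrices force every element to a single type. This is why tabular data with mixed numeric and text columns is usually stored as a data.frame.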
Lab (2-1): Descriptive statistics and summarizing data
- Time: Feb 9 (Thursday), 2:30 - 4:00 PM
- Topics: Introduction to R, probability distributions, quantifying central values and variability, quantifying association, graphical displays of data.
- Material:
PDF slides: Introduction to R, Video
Lab worksheet
Dataset TROPHY.csv
Readings: Feasibility of Treating Prehypertension with an Angiotensin-Receptor Blocker (TROPHY; S. Julius et al., 2006); R Data Types
Muddy point assessment
- Homework:
Homework Assignment 1
Lecture (2-2): Approaches to statistical estimation and testing
- Time: Feb 14 (Tuesday), 2:30 - 4:00 PM
- Topics: Estimation and standard errors, standard errors for means, correlations, and log odds ratios, formal hypothesis testing, tests involving means, correlations, and log odds ratios, power.
- Material:
Lecture slides (PDF)
Lab (2-2): Statistical estimation and hypothesis testing
- Time: Feb 16 (Thursday), 2:30 - 4:00 PM
- Topics: Estimation and standard errors, standard errors for means, correlations, and log odds ratios, formal hypothesis testing, one and two sample tests involving means, power.
- Material:
Lab worksheet
Dataset TROPHY.csv
Muddy point assessment
- Homework:
Homework Assignment 2
Lecture (2-3): Analyses involving associations
- Time: Feb 21 (Tuesday), 2:30 - 4:00 PM
- Topics: Pearson correlation, t-test, odds ratios, discussion of a research article
- Material:
Lecture slides (PDF)
Lab (2-3): Pearson correlation, t-test, and log odds ratios
- Time: Feb 23 (Thursday), 2:30 - 4:00 PM
- Topics: Tests based on Pearson correlation, t-tests, and log odds ratios
- Material:
Lab worksheet
Muddy point assessment
- Homework (due Mar 14):
Homework Assignment 3
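As a rough preview of the three tests named in Lab 2-3, here is a sketch in base R. It uses simulated data and a hypothetical 2x2 table, not the lab worksheet or TROPHY.csv. For the log odds ratio, it uses the Woolf (large-sample) standard error, which is one common choice; the lab may present a different method.

```r
set.seed(1)                                 # reproducible simulated data
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Test of H0: Pearson correlation = 0
ct <- cor.test(x, y, method = "pearson")

# Two-sample t-test comparing y between two hypothetical groups
g  <- rep(c("A", "B"), each = 25)
tt <- t.test(y ~ g)                         # Welch test (unequal variances) by default

# Hypothetical 2x2 table: rows = exposure, columns = outcome
tab <- matrix(c(20, 10, 15, 25), nrow = 2,
              dimnames = list(exposure = c("yes", "no"),
                              outcome  = c("yes", "no")))
log_or    <- log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
se_log_or <- sqrt(sum(1 / tab))             # Woolf SE: sqrt(1/a + 1/b + 1/c + 1/d)
z         <- log_or / se_log_or             # Wald statistic for H0: log OR = 0
p_value   <- 2 * pnorm(-abs(z))             # two-sided p-value

ct$p.value
tt$p.value
c(log_or = log_or, se = se_log_or, p = p_value)
```

The odds ratio is tested on the log scale because the sampling distribution of the log odds ratio is closer to normal than that of the odds ratio itself.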
Lecture (2-4): Linear regression
- Time: Mar 7 (Tuesday), 2:30 - 4:00 PM
- Topics: Single and multiple variable linear regression, Bonferroni correction, power for regression analysis
- Material:
Lecture slides (PDF)
Lab (2-4): Regression models
- Time: Mar 9 (Thursday), 2:30 - 4:00 PM
- Topics: Fitting regression models for prediction and effect estimation, inference for regression effects, R², diagnostics, comparing models
- Material:
Lab worksheet
Muddy point assessment
- Homework:
Homework Assignment 4
Lecture (2-5): Introduction to graphical methods for multivariate data analysis
- Time: Mar 14 (Tuesday), 2:30 - 4:00 PM
- Topics: Clustering methods, multidimensional scaling, and principal component analysis
- Material:
Lecture slides (PDF)
Lab (2-5): Clustering and principal component analysis
- Time: Mar 16 (Thursday), 2:30 - 4:00 PM
- Topics: Multivariate data, heat maps and dendrograms, clustering methods, principal component analysis
- Material:
Muddy point assessment
Reference material
RStudio cheatsheet: a well-designed reference card for RStudio features.
Try R: an excellent interactive online R tutorial.