Module 2. Introduction to Statistics in Bioinformatics

Basic statistics as used in bioinformatics, with emphasis on standard statistical tests of significance and when they apply. Applications include genetics and experimental and observational medical data, along with the multiple testing issues that arise in bioinformatics and other experimental settings.

N.B. Please complete this pre-course questionnaire if you have not already done so.

     
2.1 Lecture: Framework for statistical analysis of biomedical data
    Lab: Descriptive statistics and summarizing data
2.2 Lecture: Approaches to statistical estimation and testing
    Lab: Statistical estimation and hypothesis testing
2.3 Lecture: Analyses involving associations
    Lab: Pearson correlation, t-test, and log odds ratios
2.4 Lecture: Linear regression
    Lab: Regression models
2.5 Lecture: Graphical methods for multivariate data analysis
    Lab: Clustering and principal component analysis




Lecture (2-1): Framework for statistical analysis of biomedical data

  • Time: Feb 9 (Tuesday), 2:30 - 4:00 PM
  • Topics: Probability distributions, quantifying central values and variability, quantifying association, graphical displays of data
  • Material:
    Lecture slides (PDF)
    We will be using R throughout this module to demonstrate data analysis concepts and best practices. In preparation for our first lab session, please complete the free online interactive tutorial “Try R” (http://tryr.codeschool.com). It gives a gentle introduction to R syntax and some of the major R data structures (vectors, matrices, and data frames).
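
As a rough preview of the three data structures named above, here is a minimal sketch in R. The variable names and values are hypothetical and chosen only for illustration; they are not taken from the course material or the tutorial.

```r
# A vector: an ordered set of values of a single type
# (hypothetical measurements, for illustration only)
expression_levels <- c(2.1, 3.4, 1.8, 4.0)

# A matrix: a two-dimensional arrangement of values of a single type
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: a table whose columns may have different types
samples <- data.frame(
  id = c("s1", "s2", "s3", "s4"),
  level = expression_levels,
  stringsAsFactors = FALSE  # keep the id column as character strings
)

# Simple summaries of each structure
mean(expression_levels)  # average of the vector
dim(m)                   # number of rows and columns of the matrix
str(samples)             # column names and types of the data frame
```

Typing each line at the R prompt and inspecting the result is a reasonable way to check your understanding before the first lab.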

Hands-on in-class exercise.

Lab (2-1): Descriptive statistics and summarizing data




Lecture (2-2): Approaches to statistical estimation and testing

  • Time: Feb 16 (Tuesday), 2:30 - 4:00 PM
  • Topics: Estimation and standard errors; standard errors for means, correlations, and log odds ratios; formal hypothesis testing; tests involving means, correlations, and log odds ratios; power
  • Material:
    Lecture slides (PDF)


Lab (2-2): Statistical estimation and hypothesis testing




Lecture (2-3): Analyses involving associations

  • Time: Feb 23 (Tuesday), 2:30 - 4:00 PM
  • Topics: Pearson correlation, t-test, odds ratios, discussion of a research article
  • Material:
    Lecture slides (PDF)


Lab (2-3): Pearson correlation, t-test, and log odds ratios




Lecture (2-4): Linear regression

  • Time: Mar 8 (Tuesday), 2:30 - 4:00 PM
  • Topics: Single and multiple variable linear regression, Bonferroni correction, power for regression analysis
  • Material:
    Lecture slides (PDF)


Lab (2-4): Regression models




Lecture (2-5): Introduction to graphical methods for multivariate data analysis

  • Time: Mar 15 (Tuesday), 2:30 - 4:00 PM
  • Topics: Clustering methods, multidimensional scaling, and principal component analysis
  • Material:
    Lecture slides (PDF)


Lab (2-5): Clustering and principal component analysis

  • Time: Mar 17 (Thursday), 2:30 - 4:00 PM, or Mar 18 (Friday), 10:30 AM - 12:00 PM
  • Topics: Multivariate data, heat maps and dendrograms, clustering methods, principal component analysis
  • Material:
    Lab worksheet with key
    Muddy point assessment


Reference material

RStudio cheatsheet: A well-designed reference card for RStudio features.
Try R: An excellent interactive online R tutorial.