# Module 2. Introduction to Statistics in Bioinformatics

This module covers basic statistics as used in bioinformatics, especially standard tests of statistical significance and when they apply. It includes applications to genetic data and to experimental and observational medical data. It also explores the multiple testing issues that arise in bioinformatics and other experimental settings.

N.B. Please complete this pre-course questionnaire if you have not already done so.

| Session | Lecture | Lab |
| --- | --- | --- |
| 2.1 | Framework for statistical analysis of biomedical data | Descriptive statistics and summarizing data |
| 2.2 | Approaches to statistical estimation and testing | Statistical estimation and hypothesis testing |
| 2.3 | Analyses involving associations | Pearson correlation, t-test, and log odds ratios |
| 2.4 | Linear regression | Regression models |
| 2.5 | Graphical methods for multivariate data analysis | Clustering and principal component analysis |

#### Lecture (2-1): Framework for statistical analysis of biomedical data

• Time: Feb 9 (Tuesday), 2:30 - 4:00 PM
• Topics: Probability distributions, quantifying central values and variability, quantifying association, graphical displays of data
• Material: Lecture slides (PDF)

We will use R throughout this module to demonstrate data analysis concepts and best practices. To prepare for our first lab session, please complete the free online interactive tutorial “TryR” (http://tryr.codeschool.com). It gives a gentle introduction to R syntax and to some of R's major data structures: vectors, matrices, and data.frames.
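This lecture lists several descriptive measures. For quick reference, the standard sample formulas for $n$ paired observations $(x_i, y_i)$ follow. The mean is a central value, and the sample standard deviation, with the usual $n - 1$ denominator, measures variability. The Pearson correlation coefficient $r$ measures linear association. These are textbook forms, and the lecture slides may use different notation. In R they correspond to `mean(x)`, `sd(x)`, and `cor(x, y)`.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$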

#### Lecture (2-2): Approaches to statistical estimation and testing

• Time: Feb 16 (Tuesday), 2:30 - 4:00 PM
• Topics: Estimation and standard errors; standard errors for means, correlations, and log odds ratios; formal hypothesis testing; tests involving means, correlations, and log odds ratios; power
• Material: Lecture slides (PDF)
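The standard errors named in this lecture's topics have the following common large-sample forms. This is a textbook sketch, and the slides may use different notation or refinements.

- **Sample mean:** $n$ observations with sample standard deviation $s$.
- **Log odds ratio:** a 2×2 table with cell counts $a, b$ in the first row and $c, d$ in the second (Woolf's formula).
- **Pearson correlation:** $r$ from $n$ pairs, via Fisher's $z$-transformation.

Each estimate $\hat{\theta}$ then gives an approximate test of a null value $\theta_0$ through the ratio $Z$ in the last line. For a mean with small $n$, that ratio is compared with a $t$ distribution on $n-1$ degrees of freedom rather than the standard normal.

$$
\begin{aligned}
\mathrm{SE}(\bar{x}) &= \frac{s}{\sqrt{n}} \\
\widehat{\ln \mathrm{OR}} = \ln\frac{ad}{bc}, \quad \mathrm{SE}\big(\widehat{\ln \mathrm{OR}}\big) &\approx \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \\
z_r = \tfrac{1}{2}\ln\frac{1+r}{1-r}, \quad \mathrm{SE}(z_r) &\approx \frac{1}{\sqrt{n-3}} \\
Z &= \frac{\hat{\theta} - \theta_0}{\mathrm{SE}(\hat{\theta})}
\end{aligned}
$$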

#### Lecture (2-3): Analyses involving associations

• Time: Feb 23 (Tuesday), 2:30 - 4:00 PM
• Topics: Pearson correlation, t-test, odds ratios, discussion of a research article
• Material: Lecture slides (PDF)

#### Lecture (2-4): Linear regression

• Time: Mar 8 (Tuesday), 2:30 - 4:00 PM
• Topics: Single- and multiple-variable linear regression, Bonferroni correction, power for regression analysis
• Material: Lecture slides (PDF)
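This lecture covers the Bonferroni correction. It applies when $m$ hypothesis tests are performed, for example one per regression coefficient. The correction tests each hypothesis at level $\alpha/m$, which keeps the family-wise error rate (the chance of at least one false rejection) at or below $\alpha$. Equivalently, each p-value is multiplied by $m$ and capped at 1.

The justification is the union bound. Let $\mathcal{T}$ be the set of $m_0 \le m$ true null hypotheses, and assume each null p-value satisfies $P(p_i \le t) \le t$. For example, with $\alpha = 0.05$ and $m = 20$ tests, each test uses the threshold $0.05/20 = 0.0025$.

$$
P\Big(\bigcup_{i \in \mathcal{T}} \{p_i \le \alpha/m\}\Big)
\le \sum_{i \in \mathcal{T}} P(p_i \le \alpha/m)
\le m_0 \cdot \frac{\alpha}{m}
\le \alpha
$$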

#### Lecture (2-5): Introduction to graphical methods for multivariate data analysis

• Time: Mar 15 (Tuesday), 2:30 - 4:00 PM
• Topics: Clustering methods, multidimensional scaling, and principal component analysis
• Material: Lecture slides (PDF)

#### Lab (2-5): Clustering and principal component analysis

• Time: Mar 17 (Thursday), 2:30 - 4:00 PM, or Mar 18 (Friday), 10:30 AM - 12:00 PM
• Topics: Multivariate data, heat maps and dendrograms, clustering methods, principal component analysis
• Material: Lab worksheet with key; muddy point assessment

### Reference material

• RStudio cheatsheet: a well-designed reference card for RStudio features.
• Try R: an excellent interactive online R tutorial.