Day 3. Data Analysis and Graphics with R
R is a powerful data programming language and environment for statistical computing, data analysis and graphics. R is typically used to explore and understand data in an open-ended, highly interactive, iterative way. Learning R will give you the freedom to experiment and solve problems during data analysis, which is exactly what we need as bioinformaticians and data scientists.
Before getting our hands dirty working with real data in R, we need to learn the basics of the R language. Even if you’ve poked around in R and seen these concepts before, I still recommend that you follow along and complete the free online interactive tutorial “Try R” (http://tryr.codeschool.com). It will take you through a gentle introduction to R syntax and some of the major R data structures (vectors, matrices, data.frames and lists) that we will cover in more detail in class.
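As a quick preview of the four data structures named above, here is a minimal sketch. The variable names and values are invented for illustration and are not taken from the course datasets.

```r
# A vector holds values of a single type.
weights <- c(61.5, 72.0, 58.3)

# A matrix is a two-dimensional arrangement of values of a single type.
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data.frame is a table whose columns may have different types.
samples <- data.frame(
  id = c("s1", "s2", "s3"),
  weight = weights,
  treated = c(TRUE, FALSE, TRUE)
)

# A list can hold elements of different types and sizes.
run <- list(name = "demo", values = weights, dims = dim(m))

mean(weights)                # average of the vector
m * 2                        # element-wise arithmetic on the matrix
samples[samples$treated, ]   # rows where treated is TRUE
run$dims                     # dimensions stored in the list
```

Typing each line at the R console and inspecting the result with `str()` is a good way to see how the structures differ.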
Schedule (Tentative):
| Session | Time | Topics |
|---|---|---|
| I | 9:00-10:15 AM | Introduction to R |
| | 10:15-10:30 AM | Coffee Break |
| II | 10:30 AM-12:00 PM | R Control Structures and Functions |
| | 12:00-1:00 PM | Lunch |
| III | 1:00-2:15 PM | Data Exploration and Visualization in R |
| | 2:15-2:30 PM | Coffee Break |
| IV | 2:30-4:00 PM | Working with R packages from CRAN & Bioconductor |
Instructors:
Armand Bankhead (AB), Jacob Kitzman (JOK), Ryan Mills (RM)
Topics (Tentative):
I) Introduction to R [1:15 hr] (Slides) AB
- Why R?
 - Ways to Use R
 - R as a Statistical Programming Language
 - Writing and Running R Scripts
 - Data Types
 - Data Structures
 - Vector and Matrix Operations
 - Optional Extra #1: R basics
 - Optional Extra #2: working with strings.
 
--- Coffee Break [15 mins] ---
II) R Control Structures and Functions [1:30 hr] (Slides)
- Working Directory
 - Reading and Writing Data in R
 - Factors
 - Using Indexes
 - Merging Data Frames
 - Functions
 - Program Control Structures
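The topics in this session fit together in a short workflow, sketched below. The sample table, the `label_count` function and the cutoff of 100 are hypothetical and chosen only for illustration. The sketch writes to a temporary file so that it does not depend on your working directory.

```r
# Hypothetical sample and expression tables, invented for illustration.
samples <- data.frame(
  id = c("s1", "s2", "s3"),
  tissue = c("liver", "brain", "liver"),
  stringsAsFactors = FALSE
)
expr <- data.frame(id = c("s1", "s2", "s3"), count = c(10, 250, 40))

# Reading and writing: save to a temporary CSV file and read it back.
path <- tempfile(fileext = ".csv")
write.csv(samples, path, row.names = FALSE)
samples2 <- read.csv(path, stringsAsFactors = FALSE)

# Factors: store a categorical column as a factor.
samples2$tissue <- factor(samples2$tissue)
levels(samples2$tissue)

# Indexes: keep only the rows where tissue is "liver".
liver <- samples2[samples2$tissue == "liver", ]

# Merging: join the two data frames on their shared id column.
merged <- merge(samples2, expr, by = "id")

# Functions: a small function with a default argument.
label_count <- function(x, cutoff = 100) {
  if (x > cutoff) "high" else "low"
}

# Control structures: loop over the rows and label each count.
labels <- character(nrow(merged))
for (i in seq_len(nrow(merged))) {
  labels[i] <- label_count(merged$count[i])
}
merged$level <- labels
merged
```

In practice the loop would often be replaced by a vectorized call such as `ifelse(merged$count > 100, "high", "low")`. The explicit loop is shown here only because control structures are one of the session topics.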
 
--- Lunch Break [1 hr] ---
III) Data Exploration and Visualization in R [1:15 hr] (Slides) JOK
- Getting your data into R.
    - Import data in various formats (both local and from online sources).
- The exploratory data analysis mindset.
- Data visualization best practices.
- R base graphics and the grammar of graphics.
    - Simple base graphics (scatterplots, histograms, bar graphs and boxplots).
    - Building more complex charts with ggplot.
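The four base graphics listed above can each be drawn with a single call. The sketch below uses `mtcars`, a dataset that ships with R, rather than the course datasets. The final part assumes the `ggplot2` package is installed and is skipped otherwise.

```r
# Counts of cars per cylinder class, used for the bar graph.
cyl_counts <- table(mtcars$cyl)

# Base graphics: one function call per chart type.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")   # scatterplot
hist(mtcars$mpg, main = "Fuel economy",
     xlab = "Miles per gallon")                               # histogram
barplot(cyl_counts, xlab = "Cylinders",
        ylab = "Number of cars")                              # bar graph
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")        # boxplot

# Grammar of graphics: build a chart from data, aesthetics and layers.
# Runs only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
    geom_point() +
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
         colour = "Cylinders")
  print(p)
}
```

The base calls draw immediately, while the ggplot object `p` is built up in layers and drawn only when printed.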
 
 
--- Coffee Break [15 mins] ---
IV) Working with packages from CRAN & Bioconductor [1:30 hr] (Slides) JOK
- CRAN - the Comprehensive R Archive Network.
 - Bioconductor bioinformatics package system.
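Installing from the two repositories looks slightly different. The helper below is a minimal sketch: `install_if_missing` is a hypothetical name, not part of R. It assumes a recent R version, where Bioconductor packages are installed through the `BiocManager` package. Older Bioconductor releases used a different installer.

```r
# Install a package only when it is not already available.
# Set bioc = TRUE for packages hosted on Bioconductor.
install_if_missing <- function(pkg, bioc = FALSE) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))   # already installed, nothing to do
  }
  if (bioc) {
    # BiocManager itself is distributed through CRAN.
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(pkg)
  } else {
    install.packages(pkg)
  }
  invisible(TRUE)
}

# Example calls (these need network access, so they are left commented out):
# install_if_missing("ggplot2")                  # from CRAN
# install_if_missing("Biostrings", bioc = TRUE)  # from Bioconductor
```

Once a package is installed, load it in each new R session with `library()`.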
 
--- End/Wrap-Up ---
Datasets
Reference material
RStudio cheatsheet: A well-designed reference card for RStudio features.
Try R: An excellent interactive online R tutorial for beginners.
R for Data Science: A brand new O’Reilly book, available free online, that will teach you how to do data science with R.
Class notes on R language basics.
Class notes on useful R functions for working with strings.