# Day 3. Data Analysis and Graphics with R

R is a powerful programming language and environment for statistical computing, data analysis and graphics. R is typically used to explore and understand data in an open-ended, highly interactive, iterative way. Learning R will give you the freedom to experiment and solve problems during data analysis, which is exactly what we need as bioinformaticians and data scientists.

Before getting our hands dirty working with real data in R, we need to learn the basics of the R language. Even if you’ve poked around in R and seen these concepts before, I still recommend that you follow along and complete the free online interactive tutorial “TryR” (http://tryr.codeschool.com). It gives a gentle introduction to R syntax and to some of the major R data structures (vectors, matrices, data.frames and lists) that we will cover in more detail in class.

### Schedule (Tentative):

| Session | Time | Topics |
|---|---|---|
| I | 9:00-10:15 AM | R Language Basics and the RStudio IDE |
| | 10:15-10:30 AM | Coffee Break |
| II | 10:30 AM-12:00 PM | R Data Structures and Functions |
| | 12:00-1:00 PM | Lunch |
| III | 1:00-2:15 PM | Data Exploration and Visualization in R |
| | 2:15-2:30 PM | Coffee Break |
| IV | 2:30-4:00 PM | Working with R packages from CRAN & Bioconductor |

### Instructors:

Barry Grant (BJG)

Jacob Kitzman (JOK)

Ryan Mills (RM)

### Topics (Tentative):

#### I) R Language Basics and the RStudio IDE [1:15 hr] (Slides) (Exercise-1) BJG

- What is R?
- Motivation: Why use R?
- Getting started with R and the RStudio IDE (integrated development environment).
- Using R.
- Getting help.
- Major data structures (vectors, matrices and data.frames).
- Using functions (arguments, vectorization and recycling).
- R scripts and reproducibility.
- Optional extra: working with strings.
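
The data structures, vectorization and recycling listed above can be previewed with a minimal sketch in base R. The variable names and values (`x`, `m`, `df`, the gene labels) are illustrative assumptions, not course data.

```r
# A numeric vector: the most basic R data structure
x <- c(2, 4, 6, 8)

# A matrix: a two-dimensional structure holding a single data type
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data.frame: a table whose columns can hold different types
# (gene names and counts are made up for illustration)
df <- data.frame(gene = c("A", "B", "C"),
                 count = c(10, 25, 3),
                 stringsAsFactors = FALSE)

# Vectorization: the operation is applied to every element at once
doubled <- x * 2           # 4 8 12 16

# Recycling: the shorter vector is reused to match the longer one
recycled <- x + c(1, 10)   # 3 14 7 18

# Function arguments can be matched by name
rounded <- round(3.14159, digits = 2)

# Getting help on a function (uncomment in an interactive session)
# ?round
# help("data.frame")
```

Note that recycling happens silently when the longer length is a multiple of the shorter one; otherwise R also emits a warning.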

--- Coffee Break [15 mins] ---

#### II) R Data Structures and Functions [1:30 hr] (Slides)

- Major data structures (vectors, matrices, lists and data.frames).
- Indexing and vectorization.
- Using and writing functions (arguments, more on vectorization and recycling).
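
As a rough illustration of the topics in this session, the sketch below builds a list, indexes vectors and a data.frame, and defines a small function. The names `sample_info` and `rescale` and all values are hypothetical, chosen only for this example.

```r
# A list can hold elements of different types and lengths
sample_info <- list(id = "s1", reads = c(120, 98, 143), paired = TRUE)

# List elements are extracted with $ or [[ ]]
reads <- sample_info$reads
id <- sample_info[["id"]]

# Indexing a vector by position, by logical condition and by name
first_two <- reads[1:2]           # 120 98
above_100 <- reads[reads > 100]   # 120 143
scores <- c(a = 1.5, b = 2.5)
score_b <- scores["b"]

# Indexing a data.frame by a row condition and a column name
df <- data.frame(gene = c("A", "B", "C"),
                 count = c(10, 25, 3),
                 stringsAsFactors = FALSE)
high <- df[df$count > 5, "gene"]  # "A" "B"

# Writing a function with a default argument;
# the arithmetic inside is vectorized over v
rescale <- function(v, na.rm = TRUE) {
  rng <- range(v, na.rm = na.rm)
  (v - rng[1]) / (rng[2] - rng[1])
}

rescale(c(1, 5, 9))               # 0.0 0.5 1.0
```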

--- Lunch Break [1 hr] ---

#### III) Data Exploration and Visualization in R [1:15 hr] (Slides) JOK

- Getting your data into R.
- Import data in various formats (both local and from online sources).

- The exploratory data analysis mindset.
- Data visualization best practices.
- R base graphics and the grammar of graphics.
- Simple base graphics (scatterplots, histograms, bar graphs and boxplots).
- Building more complex charts with ggplot2.
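
A minimal sketch of the import, explore and plot workflow outlined above. The inline table `csv_text` is a made-up stand-in for a real file; `read.csv()` also accepts a local path or a URL in place of `text =`. The ggplot2 step assumes that package is installed and is skipped otherwise.

```r
# Import: a small hypothetical table stands in for a file on disk
csv_text <- "sample,group,expr
s1,control,2.1
s2,control,2.5
s3,treated,4.8
s4,treated,5.2"
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Explore: check the structure and summarize before plotting
str(dat)
summary(dat$expr)
group_means <- tapply(dat$expr, dat$group, mean)

# Simple base graphics
hist(dat$expr, main = "Expression values", xlab = "expr")
boxplot(expr ~ group, data = dat, ylab = "expr")
plot(seq_along(dat$expr), dat$expr, xlab = "sample index", ylab = "expr")

# The same boxplot built with ggplot2, only if the package is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(dat, ggplot2::aes(x = group, y = expr)) +
    ggplot2::geom_boxplot()
  print(p)
}
```

Base graphics draw directly on the device with one function call per plot, whereas ggplot2 builds a plot object from layers that is drawn when printed.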

--- Coffee Break [15 mins] ---

#### IV) Working with packages from CRAN & Bioconductor [1:30 hr] (Slides) BJG

- CRAN - the Comprehensive R Archive Network.
- Bioconductor bioinformatics package system.
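
Installing from the two repositories typically looks like the sketch below. The helper `is_installed()` and the `run_install` switch are hypothetical conveniences for this example, and the packages named (`ggplot2`, `Biostrings`) are illustrative choices rather than the ones used in class. Installation needs a network connection, so it is switched off by default here.

```r
# Hypothetical helper: TRUE if a package is already installed
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

# Set to TRUE to actually download and install (requires network access)
run_install <- FALSE

if (run_install) {
  # CRAN packages are installed with install.packages()
  if (!is_installed("ggplot2")) install.packages("ggplot2")

  # Bioconductor packages are installed through BiocManager,
  # which is itself distributed on CRAN
  if (!is_installed("BiocManager")) install.packages("BiocManager")
  BiocManager::install("Biostrings")
}

# Once installed, a package is loaded into the session with library()
library(stats)
```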

--- End/Wrap-Up ---

### Datasets

### Reference material

RStudio cheatsheet: A well-designed reference card for RStudio features.

Try R: An excellent interactive online R tutorial for beginners.

R for Data Science: A brand new O’Reilly book, available free online, that will teach you how to do data science with R.

Class notes on R language basics.

Class notes on useful R functions for working with strings.