Visualization with R Lecture (Part 2)

BGGN-213 Lecture 8:
Barry Grant < http://thegrantlab.org >
Date: 2017-10-23 (14:04:51 PDT on Mon, Oct 23)

Background

This document contains all the code used to generate the images for lecture number 8 of BGGN-213 (Foundations of Bioinformatics) and follows on from Part 1.. This document is not meant to stand alone – it has very little explanatory text. It is meant to be a reference for students who want to see how the plots for the presentation were made.

Setup

First I set up a theme for presentations. It uses white on black (good for large, dimly lit rooms), and lines and text that are too small/fine when rendered in a document, but which work well when rendered at 300dpi for the presnetation. Note that I would usually not use white on black for documents, but I use this as the default theme throughout because there isn’t an easy way (that I know of) to set the default color of a geom to something other than black.

theme_preso <- theme_dark(base_size=11)
##theme_set(theme_preso)

preso_save<-function(filename, plot=last_plot(), theme=theme_preso, 
                     width=1280/300, height=720/300, dpi=300) {
  ggsave(filename, ##plot+theme, 
         width=width, height=height, dpi=dpi)
}

Part 1: Why visualize at all?

A circle

Showing the data in a scatterplot makes the relationship obivous pre-attentively.

circle<-read.delim(textConnection(
"x  y
1.972   1.236
1.112   1.994
0   1.009
0.665   1.942
0.235   0.356
0.247   1.658
1.275   1.961
0.702   0.045
1.76    0.35
1.691   0.277
1.628   1.778
1.957   1.29
0.111   0.542
0.902   0.005
0.598   0.085
1.613   1.79
1.298   1.955
0.651   1.937
1.949   1.316
0.099   0.567
0.862   0.01
0.027   0.768
0.706   1.956
1.042   1.999"))

ggplot(circle, aes(x,y))+geom_point(color="black", size=2)+coord_equal()

preso_save("img/circle.png")

But showing it as a time-series obscures the relationship. This is arguably worse than a simple table.

circle_w <- circle %>% 
  mutate(n=1:nrow(circle)) %>%
  gather(var, value, x, y)
ggplot(circle_w, aes(n, value, color=var))+geom_line()+xlab(NULL)

preso_save("img/circle_lines.png")

Anscombe’s Quartet

quartet<-read.delim(textConnection(
"group  x   y
A   10  8
A   8   7
A   13  7.6
A   9   8.8
A   11  8.3
A   14  10
A   6   7.2
A   4   4.3
A   12  10.8
A   7   4.8
A   5   5.7
B   10  9.1
B   8   8.1
B   13  8.7
B   9   8.8
B   11  9.3
B   14  8.1
B   6   6.1
B   4   3.1
B   12  9.1
B   7   7.3
B   5   4.7
C   10  7.5
C   8   6.8
C   13  12.7
C   9   7.1
C   11  7.8
C   14  8.8
C   6   6.1
C   4   5.4
C   12  8.2
C   7   6.4
C   5   5.7
D   8   6.6
D   8   5.8
D   8   7.7
D   8   8.8
D   8   8.5
D   8   7
D   8   5.3
D   19  12.5
D   8   5.6
D   8   7.9
D   8   6.9"))

All four are identical on common statistical measures.

quartet %>%
  group_by(group) %>%
  summarize(mean_x=mean(x), mean_y=mean(y))
## # A tibble: 4 x 3
##    group mean_x   mean_y
##   <fctr>  <dbl>    <dbl>
## 1      A      9 7.500000
## 2      B      9 7.490909
## 3      C      9 7.500000
## 4      D      9 7.509091
quartet %>%
  group_by(group) %>%
  do(tidy(lm(y~x, data=.))) %>%
  select(group, term, estimate) %>%
  spread(term, estimate)
## # A tibble: 4 x 3
## # Groups:   group [4]
##    group `(Intercept)`         x
## * <fctr>         <dbl>     <dbl>
## 1      A      3.008182 0.4990909
## 2      B      2.990909 0.5000000
## 3      C      3.016364 0.4981818
## 4      D      3.017273 0.4990909
quartet %>%
  group_by(group) %>%
  do(glance(lm(y~x, data=.))) %>%
  select(group, r.squared)
## # A tibble: 4 x 2
## # Groups:   group [4]
##    group r.squared
##   <fctr>     <dbl>
## 1      A 0.6709131
## 2      B 0.6641054
## 3      C 0.6694547
## 4      D 0.6740641

But obivously different visually.

ggplot(quartet, aes(x,y))+geom_point(color="black")+facet_wrap(~group)

preso_save("img/anscombe.png")

Part 2: Estimation

The hierarchy

The first rule of color is do not talk about color

Color has three separate channels.

values<-data.frame(val=1:5, y=1)

theme_extras <- theme(axis.title=element_blank(),
                      axis.text=element_blank(),
                      axis.ticks=element_blank(),
                      legend.position="none")

ggplot(values, aes(val,y,fill=val)) + geom_tile() + scale_fill_gradient(low="black", high="white", limits=c(0,6)) + theme_extras

preso_save("img/luminance.png", height=720/300/3, theme=theme_preso + theme_extras)
ggplot(values, aes(val,y,fill=val)) + geom_tile() + scale_fill_gradient(low="white", high="red", limits=c(0,6)) + theme_extras

preso_save("img/saturation.png", height=720/300/3, theme=theme_preso + theme_extras)
ggplot(values, aes(val,y,fill=factor(val))) + geom_tile() + scale_fill_discrete() + theme_extras

preso_save("img/hue.png", height=720/300/3, theme=theme_preso + theme_extras)

Lots of plots with the mtcars dataset. I had to drop the font size way down to get the plots the render well for the presentation. They are much too small for the document. I offer my apologies.

# Make model an actual column in the dataset.
mtcars <- mtcars %>%
  mutate(model=rownames(mtcars))

Hue

The viridis hue scale. I didn’t end up using this in the presentation. I probably should have, as it’s far superior to the crappy one I made up.

ggplot(mtcars, aes("a", model, fill=mpg))+geom_tile()+scale_fill_viridis()

ggplot(mtcars, aes("a", reorder(model, mpg), fill=mpg))+geom_tile()+scale_fill_viridis()

Ordered and unordered plots with hue encoding efficiency.

theme_extras <- theme(axis.title.x=element_blank(),
                      axis.text.x=element_blank(),
                      axis.ticks.x=element_blank(),
                      axis.text.y=element_text(size=rel(0.6)))

ggplot(mtcars, aes("a", model, fill=mpg)) + 
  geom_tile() + 
    scale_fill_gradient2(midpoint=median(mtcars$mpg), mid="#ffffbf", low="#91bfdb", high="#fc8d59") + 
  # Terrible for color blind people
  #    scale_fill_gradient2(midpoint=median(mtcars$mpg), mid=muted("green")) + 
  ylab(NULL) + theme_extras

preso_save("img/mtcars_hue.png", width=1280/300*(3/4), theme=theme_preso+theme_extras)

ggplot(mtcars, aes("a", reorder(model, mpg), fill=mpg)) + 
  geom_tile() + 
    scale_fill_gradient2(midpoint=median(mtcars$mpg), mid="#ffffbf", low="#91bfdb", high="#fc8d59") + 
  # Terrible for color blind people
  #    scale_fill_gradient2(midpoint=median(mtcars$mpg), mid=muted("green")) + 
  ylab(NULL) + theme_extras

preso_save("img/mtcars_hue_ordered.png", width=1280/300*(3/4), theme=theme_preso+theme_extras)

Saturation

Unordered and ordered plots with saturation.

ggplot(mtcars, aes("a", model, fill=mpg)) + 
  geom_tile() + 
  scale_fill_gradient(low="white", high=muted("blue"))+ 
  ylab(NULL) + theme_extras

preso_save("img/mtcars_saturation.png", width=1280/300*(3/4), theme=theme_preso+theme_extras)

ggplot(mtcars, aes("a", reorder(model, mpg), fill=mpg)) + 
  geom_tile() + 
  scale_fill_gradient(low="white", high=muted("blue"))+ 
  ylab(NULL) + theme_extras

preso_save("img/mtcars_saturation_ordered.png", width=1280/300*(3/4), theme=theme_preso+theme_extras)

For accurate ratioing with saturation, the scale has to extend to zero.

ggplot(mtcars, aes("a", reorder(model, mpg), fill=mpg)) + 
  geom_tile() + 
  scale_fill_gradient(low="white", high=muted("blue"))+ 
  ylab(NULL) + theme_extras + expand_limits(fill=0)

preso_save("img/mtcars_saturation_expanded.png", width=1280/300*(3/4), theme=theme_preso+theme_extras)

Area

mtcars <- mtcars %>%
  mutate(model=reorder(model, mpg)) %>%
  arrange(mpg, model)

theme_extras <- theme(axis.title=element_blank(),
                      axis.text=element_blank(),
                      axis.ticks=element_blank())

cols<-4
ggplot(mtcars, aes(rep(1:cols, 32/cols), rep(1:(32/cols), each=cols))) + 
  geom_point(aes(size=mpg), color="black") + 
  geom_text(aes(label=model), color="black", position=position_nudge(y=0.5), size=2) +
  expand_limits(x=c(0.5,4.5), y=c(1,8.5))+ ylab(NULL) + scale_size_area() + theme_extras 

preso_save("img/mtcars_area.png", theme=theme_preso+theme_extras)

Again, accurate ratioing depends on a scale that goes to zero.

ggplot(mtcars, aes(rep(1:cols, 32/cols), rep(1:(32/cols), each=cols))) + 
  geom_point(aes(size=mpg), color="black") + 
  geom_text(aes(label=model), color="black", position=position_nudge(y=0.5), size=2) +
  expand_limits(x=c(0.5,4.5), y=c(1,8.5), size=15)+ ylab(NULL) + theme_extras 

preso_save("img/mtcars_area_nonzero.png", theme=theme_preso+theme_extras)

“Bubble charts” use size to indicate the weight of a point. In these plots the outliers have low weight (small size).

to_plot<-data.frame(p=runif(100),
                    n=rpois(100, 5)^2) %>%
  mutate(y=rbinom(100,n,p))
ggplot(to_plot,aes(p,y/n,size=n)) +geom_point(color="black") +scale_size_area()

ggplot(to_plot,aes(p,y/n)) +geom_point(color="black")

to_plot<-data.frame(trial=1:100,
                    n=rpois(100, 5)^2) %>% 
  mutate(y=rbinom(100,n,0.5)/n)
ggplot(to_plot, aes(trial, y)) + geom_point(color="black")
## Warning: Removed 1 rows containing missing values (geom_point).

preso_save("img/flips_same.png")
## Warning: Removed 1 rows containing missing values (geom_point).
ggplot(to_plot, aes(trial, y, size=n)) + geom_point(color="black", alpha=2/3) + scale_size_area()
## Warning: Removed 1 rows containing missing values (geom_point).

preso_save("img/flips_area.png")
## Warning: Removed 1 rows containing missing values (geom_point).

Angle

Again, note that the scale goes to zero (a horizontal line).

make_segment <- function(df, scale_low, scale_high) {
  scale_range <- scale_high-scale_low
  fraction <- (df$mpg-scale_low) / scale_range
  theta <- fraction * pi/2
  return(data.frame(x=c(0, cos(theta)),
                    y=c(0, sin(theta))))
}
mtcars_angle <- mtcars %>%
  group_by(model) %>%
  do(make_segment(., 0, max(mtcars$mpg)))
#  do(make_segment(., min(mtcars$mpg), max(mtcars$mpg)))

theme_extras <- theme(axis.title=element_blank(),
                      axis.text=element_blank(),
                      axis.ticks=element_blank(),
                      strip.text.x = element_text(size=rel(0.5), hjust=0))

ggplot(mtcars_angle, aes(x,y,group=model))+
  geom_path(color="black")+
  facet_wrap(~model,ncol=8)+coord_equal() +
  theme_extras

preso_save("img/mtcars_slope.png", theme=theme_preso+theme_extras)

Length

In the presentation, I use the jittered bars, so that the audience can’t rely on a common scale (which is next in the hierarchy).

theme_extras <- theme(axis.text.y=element_text(size=rel(0.7)),
                      axis.title.x=element_blank(),
                      axis.text.x=element_blank(),
                      axis.ticks.x=element_blank())

mtcars$random <- runif(32,0,5)
ggplot(mtcars, aes(x=random, xend=mpg+random, y=model, yend=model)) +
  geom_segment(size=2, color="black") + xlab(NULL) + ylab(NULL) + theme_extras

preso_save("img/mtcars_length.png", theme=theme_preso+theme_extras)


theme_extras <- theme(axis.text.y=element_text(size=rel(0.7)))

ggplot(mtcars, aes(x=0, xend=mpg, y=model, yend=model)) +
  geom_segment(size=2, color="black") + xlab(NULL) + ylab(NULL) + theme_extras

preso_save("img/mtcars_bars.png", theme=theme_preso+theme_extras)

Error bars encode information using length not on a common scale.

trials<-10
flips<-data.frame(trial=1:trials) %>%
  mutate(n=round(runif(trials, 20, 500)),
         y=rbinom(trials, n, 0.5),
         p=y/n,
         se=sqrt(p*(1-p)/n),
         lower=p-1.96*se,
         upper=p+1.96*se)
ggplot(flips, aes(trial, p, ymin=lower, ymax=upper)) + geom_pointrange(color="black")

preso_save("img/flips.png")

Position on non-aligned scales

Doing this with facets is the easiest way in ggplot, but uses too much space.

theme_extras <- theme(axis.title.y=element_blank(),
                      axis.text.y=element_blank(),
                      axis.ticks.y=element_blank(),
                      strip.text.x = element_text(size=rel(0.5), hjust=0),
                      panel.grid.major =   element_line(colour = "grey50", size = 0.2),
                      panel.grid.minor =   element_line(colour = "grey50", size = 0.2, linetype="dotted")
                      )

ggplot(mtcars, aes(x=mpg, y="a")) + geom_point(color="black", size=1) + 
  scale_x_continuous(breaks=c(10,20,30)) +
  facet_wrap(~model, ncol=8) + theme_extras

preso_save("img/mtcars_nonaligned.png", theme=theme_preso+theme_extras)

Position on a common scale

Ahhhh…

theme_extras <- theme(axis.text.y=element_text(size=rel(0.7)),
                      axis.title.y=element_text(size=rel(0.7)),
                      panel.grid.major =   element_line(colour = "grey20", size = 0.2))

ggplot(mtcars, aes(x=mpg, y=model)) + 
  geom_point(color="black") + 
  theme_extras + ylab(NULL)

preso_save("img/mtcars_aligned.png", theme=theme_preso+theme_extras)

Ordering still crucial.

mtcars$model2<-factor(as.character(mtcars$model))
ggplot(mtcars, aes(x=mpg, y=model2)) + 
  geom_point(color="black") + 
  theme_extras + ylab(NULL)

preso_save("img/mtcars_aligned_alpha.png", theme=theme_preso+theme_extras)

Dropping the scale to zero is a mistake here, lowers resolution and isn’t needed for accurate ratioing.

ggplot(mtcars, aes(x=mpg, y=model)) + 
  geom_point(color="black") + 
  theme_extras + ylab(NULL) + expand_limits(x=0)

preso_save("img/mtcars_aligned_zero.png", theme=theme_preso+theme_extras)

Observations

Stacked anything is nearly always a mistake

A stacked bar chart does two things, neither of the well.

ggplot(diamonds, aes(clarity, fill=cut, group=cut)) + 
  geom_bar(stat="count", position="stack") +
  scale_y_continuous("Count", labels=comma)

preso_save("img/diamonds_stack.png")

The first is the total in each group.

diamonds_summary<-diamonds %>%
  group_by(clarity) %>%
  summarize(n=length(clarity))

ggplot(diamonds_summary, aes(clarity, n)) + 
  geom_line(group=1, color="black") + 
  geom_point(color="black") + 
  expand_limits(y=0) +
  scale_y_continuous("Count", labels=comma)

preso_save("img/diamonds_overall.png")

The second in the total in each combination, which is better shown with a parallel coordinates plot.

diamonds_summary2<-diamonds %>%
  group_by(clarity, cut) %>%
  summarize(n=length(clarity))

ggplot(diamonds_summary2, aes(clarity, n, color=cut, group=cut)) + 
  geom_line() + 
  geom_point() + 
  expand_limits(y=0) +
  scale_y_continuous("Count", labels=comma)

preso_save("img/diamonds_freqpoly.png")

Some phone data (from wikipedia).

phone_data<-
"Quarter    Windows Mobile  RIM Symbian iOS Android Bada    Windows Phone   Other
2016 Q2[39] 0   400.4   0   44395   296912  0   1971    680.6
2016 Q1[40] 0   660 0   51630   293771  0   2400    791
2015 Q4[41] 0   906.9   0   71526   325394  0   4395    887.3
2015 Q3[42] 0   977 0   46062   298797  0   5874    1133.6
2015 Q2[43] 0   1153    0   48086   271010  0   8198    1229
2015 Q1[44] 0   1325    0   60177   265012  0   8271    1268
2014 Q4[45] 0   1734    0   74832   279058  0   10425   1286.9
2014 Q3[46] 0   2420    0   38187   254354  0   9033    1310.2
2014 Q2[47] 0   2044    0   35345   243484  0   8095    2044
2014 Q1[44] 0   1714    0   43062   227549  0   7580    1371
2013 Q4[48] 0   1807    0   50224   219613  0   8534    1994
2013 Q3[49] 0   4401    458 30330   205023  633 8912    475
2013 Q2[50] 0   6180    631 31900   177898  838 7408    472
2013 Q1[51] 0   6219    1349    38332   156186  1371    5989    600
2012 Q4[52] 0   7333    2569    43457   144720  2684    6186    713
2012 Q3[53] 0   8947    4405    23550   122480  5055    4058    684
2012 Q2[54] 0   7991    9072    28935   98529   4209    4087    863
2012 Q1[55] 0   9939    12467   33121   81067   3842    2713    1243
2011 Q4[56] 0   13185   17458   35456   75906   3111    2759    1167
2011 Q3[57] 0   12701   19500   17295   60490   2479    1702    1018
2011 Q2[58] 0   12652   23853   19629   46776   2056    1724    1051
2011 Q1 982 13004   27599   16883   36350   1862    1600    1495
2010 Q4[56] 3419    14762   32642   16011   30801   2027    0   1488
2010 Q3[57] 2204    12508   29480   13484   20544   921 0   1991
2010 Q2[58] 3059    11629   25387   8743    10653   577 0   2011
2010 Q1[59] 3696    10753   24068   8360    5227    0   0   2403
2009 Q4[60] 4203    10508   23857   8676    4043    0   0   2517
2009 Q3[61] 3260    8523    18315   7040    1425    0   0   2531
2009 Q2[62] 3830    7782    20881   5325    756 0   0   2398
2009 Q1[63] 3739    7534    17825   3848    575 0   0   2986
2008 Q4[64] 4714    7443    17949   4079    639 0   0   3319
2008 Q3[65] 4053    5800    18179   4720    0   0   0   3763
2008 Q2[66] 3874    5594    18405   893 0   0   0   3456
2008 Q1[64] 3858    4312    18400   1726    0   0   0   4113
2007 Q4[64] 4374    4025    22903   1928    0   0   0   3536
2007 Q3[65] 4180    3192    20664   1104    0   0   0   3612
2007 Q2[66] 3212    2471    18273   270 0   0   0   3628
2007 Q1[64] 2931    2080    15844   0   0   0   0   4087"
phones<-read.delim(textConnection(phone_data))
phones <- phones %>%
  mutate(Other = Other + Bada + Windows.Phone,
         Bada = NULL,
         Windows.Phone = NULL,
         Quarter = str_replace(Quarter, "\\[\\d+\\]", ""),
         Year = as.integer(str_split_fixed(Quarter, " ", 2)[,1]),
         Quarter = str_split_fixed(Quarter, " ", 2)[,2],
         qtr = as.integer(str_replace(Quarter, "Q", "")),
         year = Year+0.25*(qtr-1)) %>%
  gather(os, ct, -Year, -Quarter, -qtr, -year) %>%
  mutate(ct = as.numeric(ct)) %>%
  group_by(year) %>%
  mutate(share = ct/sum(ct))
phones
## # A tibble: 228 x 7
## # Groups:   year [38]
##    Quarter  Year   qtr    year             os    ct share
##      <chr> <int> <int>   <dbl>          <chr> <dbl> <dbl>
##  1      Q2  2016     2 2016.25 Windows.Mobile     0     0
##  2      Q1  2016     1 2016.00 Windows.Mobile     0     0
##  3      Q4  2015     4 2015.75 Windows.Mobile     0     0
##  4      Q3  2015     3 2015.50 Windows.Mobile     0     0
##  5      Q2  2015     2 2015.25 Windows.Mobile     0     0
##  6      Q1  2015     1 2015.00 Windows.Mobile     0     0
##  7      Q4  2014     4 2014.75 Windows.Mobile     0     0
##  8      Q3  2014     3 2014.50 Windows.Mobile     0     0
##  9      Q2  2014     2 2014.25 Windows.Mobile     0     0
## 10      Q1  2014     1 2014.00 Windows.Mobile     0     0
## # ... with 218 more rows

All rules have exceptions. A stacked area chart with phone data is ok because it makes the point more effectively.

ggplot(phones, aes(year, share, group=os, fill=os)) + geom_area()

preso_save("img/phones_area.png")

The line chart is weaker.

ggplot(phones, aes(year, share, group=os, color=os)) + geom_line()

Growth charts usually aren’t

This stacked area chart sucks.

library(gapminder)
continent_pop <- gapminder %>%
  group_by(continent, year) %>%
  summarize(pop=sum(as.numeric(pop))) %>%
  ungroup()
ggplot(continent_pop, aes(year, pop/1e9, group=continent, fill=continent)) + 
  geom_area() + 
  scale_y_continuous("Population (Billions)", labels=comma) +
  scale_color_brewer(type="qual", palette=1) + xlab(NULL)

preso_save("img/population_area.png")

This line chart isn’t much better if you care about growth rates, becuse growth is encoded with angle (slope).

ggplot(continent_pop, aes(year, pop/1e9, group=continent, color=continent)) + 
  geom_line() + scale_y_log10("Population (Billions)", labels=comma) +
  scale_color_brewer(type="qual", palette=1) + xlab(NULL)

preso_save("img/population.png")

If growth is important, take derivatives and plot growth directly on the y-axis.

continent_pop <- continent_pop %>%
  group_by(continent) %>%
  arrange(year) %>%
  mutate(growth=c(NA,diff(pop)))
ggplot(continent_pop, aes(year, growth/pop, group=continent, color=continent)) + 
  geom_line() + scale_color_brewer(type="qual", palette=1) + 
  xlab(NULL) + scale_y_continuous("Pct. Growth", labels=percent)
## Warning: Removed 5 rows containing missing values (geom_path).

preso_save("img/population_growth.png")
## Warning: Removed 5 rows containing missing values (geom_path).

Putting the data on a common scale helps tremendously

host_dat = data.frame(t=seq.POSIXt(as.POSIXct("2016-06-01 00:00:00"),
                                   as.POSIXct("2016-06-01 01:00:00"),
                                   by=30)) %>%
  mutate(cpu=rnorm(length(t), 40, 5),
         latency=rnorm(length(t), 4, 0.1))

host_dat$cpu[50] = 100
host_dat$latency[60] = host_dat$latency[60]*3

host_dat_w <- host_dat %>%
  gather(var, value, -t) %>%
  group_by(var) %>%
  mutate(std_value = (value-mean(value))/sd(value))

The user has to work to see if the peaks are aligned.

ggplot(host_dat_w, aes(t, value)) + geom_line(color="black") + 
  facet_wrap(~var, scales="free") + xlab(NULL)

preso_save("img/cpu_latency_nonaligned.png")

You want to put them on the same scale, but the magnitude of the series is different.

ggplot(host_dat_w, aes(t, value, color=var, group=var)) + 
  geom_line() + xlab(NULL) + ylab("Value") 

preso_save("img/cpu_latency_aligned.png")

So “standardize” them by suntracting the mean and dividing by the standard deviation.

ggplot(host_dat_w, aes(t, std_value, color=var, group=var)) + 
  geom_line() + xlab(NULL) + ylab("Standardized Value")

preso_save("img/cpu_latency_std_aligned.png")

TODO: Show deseasonalizing too (usually easy and automatic if many seasonal cycles are available).

Scatter plots show relationships directly

dat <- data.frame(x=seq(1,6*pi,length.out = 100))
# Two series, noe of which varies with the square, the other linearly
dat <- dat %>% 
  mutate(Dependency=1.4+sin(x), 
         Service=(1.1+sin(x))^2)
# Jitter the points
dat <- dat %>%
  mutate(Service=rnorm(length(x), mean = Service, sd=Service/10),
         Dependency=rnorm(length(x), mean = Dependency, sd=Dependency^(2/3)/10))
# Put in long format
dat_l <- dat %>%
  gather(var, value, -x)

They move together, but what, preciscely is the nature of the relationship?

ggplot(dat_l, aes(x, value))+geom_line(color="black")+facet_wrap(~var, scales="free")

preso_save("img/ts_vs_scatter_facet.png")
ggplot(dat_l, aes(x, value, color=var))+geom_line()

preso_save("img/ts_vs_scatter_ts.png")

A scatterplot is more revealing.

ggplot(dat, aes(Dependency, Service)) + geom_point(color="black") + 
  ylab("Some service's latency") + xlab("Its dependency")

preso_save("img/ts_vs_scatter_scatter.png")

It’s clear that the relationship isn’t linear.

ggplot(dat, aes(Dependency, Service))+geom_point(color="black")+geom_smooth(method="lm") + 
  ylab("Some service's latency") + xlab("Its dependency")

preso_save("img/ts_vs_scatter_lm.png")

A loess fit looks nice.

ggplot(dat, aes(Dependency, Service))+geom_point(color="black")+geom_smooth(method="loess") + 
  ylab("Some service's latency") + xlab("Its dependency")

preso_save("img/ts_vs_scatter_loess.png")

Part 3: Assembly

The Law of Continuity

The circle from the start is a good example, as is a simple scatterplot. Note that the continuity just has to be suggestive, not perfect.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(color="black") + 
  xlab("weight (1,000 lbs)")

preso_save("img/wt_mpg_scatter.png")

The Law of Similarity

Color is pretty good.

# Make number of cylinders a factor, so ggplot will choose a discrete scale.
mtcars$cyl<-factor(mtcars$cyl)

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(color=cyl)) + 
  xlab("weight (1,000 lbs)") +
  scale_color_brewer("Cylinders", type="qual", palette=1)

preso_save("img/wt_mpg_scatter_color.png")

These filled shapes aren’t great.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(shape=cyl), color="black") + 
  xlab("weight (1,000 lbs)") +
  scale_shape("Cylinders")

preso_save("img/wt_mpg_scatter_shape1.png")

But these open ones are great.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(shape=cyl), color="black") + 
  xlab("weight (1,000 lbs)") +
  scale_shape_manual("Cylinders", values = c(1,2,3))

preso_save("img/wt_mpg_scatter_shape2.png")

It’s perfectly ok to redundantly encode using shape and color.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(shape=cyl, color=cyl)) + 
  xlab("weight (1,000 lbs)") +
  scale_shape_manual("Cylinders", values = c(1,2,3)) +
  scale_color_brewer("Cylinders", type="qual", palette=1)

preso_save("img/wt_mpg_scatter_both.png")

You’d think this would be a great encoding, but it’s terrible because of the similar curvature of the sixes and eights.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(shape=cyl), color="black", size=2) + 
  xlab("weight (1,000 lbs)") +
  scale_shape_manual(values = c(52,54,56))

preso_save("img/wt_mpg_scatter_letters.png")

Overlaying linear fits encloses space and makes the grouped objects very strong.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(shape=cyl, color=cyl)) + 
  geom_smooth(aes(group=cyl, color=cyl), method="lm") + 
  xlab("weight (1,000 lbs)") +
  scale_shape_manual("Cylinders", values = c(1,2,3)) +
  scale_color_brewer("Cylinders", type="qual", palette=1)

preso_save("img/wt_mpg_scatter_lm.png")

Law of Proximity

A dodged bar chart. Try to compare the blue vs. green bars. I find it very difficult.

ggplot(diamonds, aes(clarity, fill=cut, group=cut)) + 
  geom_bar(stat="count", position="dodge") +
  scale_y_continuous("Count", labels=comma)

preso_save("img/diamonds_dodge.png")

Assembly with iris data

Another dataset that’s good for demonstrating assembly. I didnt use these in the presentation.

ggplot(iris, aes(Sepal.Length, Sepal.Width, color=Species))+geom_jitter()

ggplot(iris, aes(Sepal.Length, Petal.Width, color=Species))+geom_jitter()

ggplot(iris, aes(Sepal.Length, Petal.Length, color=Species))+geom_jitter()

Part 4: Detection

People frequenty hinder detection by lowering the contrast in luminance.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(color="gray30") + 
  xlab("weight (1,000 lbs)")

preso_save("img/wt_mpg_scatter_grey.png")

Or by hiding the data with non-data clutter, like gridlines.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(color="black", size=0.2) + 
  xlab("weight (1,000 lbs)")

preso_save("img/wt_mpg_scatter_small.png")

theme_extras <- theme(panel.grid.major =   element_line(colour = "grey20", size = 0.2))
p<-ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(color="black", size=1.5) + 
  xlab("weight (1,000 lbs)")

theme_extras <- theme(panel.border =       element_rect(fill = NA, colour = "white"),
                      panel.grid.major =   element_line(colour = "white", size = 0.2))
p+theme_extras

preso_save("img/wt_mpg_scatter_grid1.png", theme=theme_preso + theme_extras)

theme_extras <- theme(panel.border =       element_rect(fill = NA, colour = "white"),
                      panel.grid.minor =   element_line(colour = "white", size = 0.2),
                      panel.grid.major =   element_line(colour = "white", size = 0.5))
p+theme_extras

preso_save("img/wt_mpg_scatter_grid2.png", theme=theme_preso + theme_extras)

theme_extras <- theme(panel.border =       element_rect(fill = NA, colour = "white"),
                      panel.grid.minor =   element_line(colour = "white", size = 0.5),
                      panel.grid.major =   element_line(colour = "white", size = 1))
p+theme_extras

preso_save("img/wt_mpg_scatter_grid4.png", theme=theme_preso + theme_extras)

The Hermann Grid illusion. Do you see the faint grey dots?

theme_extras <- theme(panel.border =       element_rect(fill = NA, colour = "white"),
                      panel.grid.major =   element_line(colour = "white", size = 3.5))
ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(color="black", size=0, alpha=0) + 
  xlab("weight (1,000 lbs)") + theme_extras

preso_save("img/wt_mpg_scatter_grid3.png", theme=theme_preso + theme_extras)

Decent grid and borders

theme_extras <- theme(panel.border =       element_rect(fill = NA, colour = "grey70"),
                      panel.grid.major =   element_line(colour = "grey70", size = 0.2))
p+theme_extras

preso_save("img/wt_mpg_scatter_grid5.png", theme=theme_preso + theme_extras)

Part 5: Other Useful results

Weber’s Law

weber <- data.frame(bar = c("c", "a", "b", "bb"),
                    zpad = c(3, 0, 0, 0),
                    ylen = c(10, 10.5, 0, 0),
                    xtop = c(2, 1.5, 0, 0))

weber1 <- weber %>% 
  select(-xtop) %>%
  gather(var, val, -bar) 

theme_extras <- theme(axis.title=element_blank(),
                      axis.text=element_blank(),
                      axis.ticks=element_blank(),
                      legend.position="none")

It’s hard to tell if the bars are the same legnth or different.

ggplot(weber1, aes(bar, val, fill=var)) + geom_bar(stat="identity") + 
  scale_fill_manual(values=c("white", "black")) +
  expand_limits(y=c(0,15)) + theme_extras

preso_save("img/weber1.png", theme=theme_preso+theme_extras)

But with the little blue ones, it’s easy, even though the absolute difference in length is exactly the same as with the white bars (0.5 units).

weber2 <- weber %>% 
  gather(var, val, -bar) 

ggplot(weber2, aes(bar, val, fill=var)) + geom_bar(stat="identity") + 
  scale_fill_manual(values=c(muted("blue",l=60), "white", "black")) +
  expand_limits(y=c(0,15)) + theme_extras

preso_save("img/weber2.png", theme=theme_preso+theme_extras)

Define eight different curves.

to_plot<-data.frame(x=seq(1,100,length.out=500)) %>%
  mutate(y=log(x),
         y2 = -(((x-100))^2),
         y03 = 100-1/x*y,
         y04 = 100-2/x*y,
         y05 = 100-2/(x+10)*y,
         y06 = 100-2/(x+5)*y,
         y07 = 99.8-1.5/(x+5)*y,
         y08 = 100-2.5/(x+5)*y,
         y09 = 100-1.5/(x+52)*y,
         y10 = 100-1.5/(x+25)*y
         )

Without gridlines it’s hard to compare them precisely.

theme_extras <- theme(axis.title=element_blank(),
                      axis.text=element_blank(),
                      axis.ticks=element_blank(),
                      panel.border = element_rect(fill = NA, colour = "grey70"),
                      strip.text=element_blank())

ggplot(to_plot %>% select(-y2, -y) %>% gather(var, val, -x),
       aes(x,val)) + geom_line(color="black") + facet_wrap(~var, ncol=2) + theme_extras

preso_save("img/weber_grid1.png", width=1280/300/2, theme=theme_extras)

It’s much easier with gridlines, because of Weber’s law.

theme_extras <- theme(axis.title=element_blank(),
                      axis.text=element_blank(),
                      axis.ticks=element_blank(),
                      panel.border = element_rect(fill = NA, colour = "grey70"),
                      panel.grid.major = element_line(colour = "grey60", size = 0.2),
                      strip.text=element_blank())

ggplot(to_plot %>% select(-y2, -y) %>% gather(var, val, -x),
       aes(x,val)) + geom_line(color="black") + facet_wrap(~var, ncol=2) + theme_extras

preso_save("img/weber_grid2.png", width=1280/300/2, theme=theme_extras)

You are most sensitive to changes in angle close to 45 degrees

The same data ploteed three ways. It’s easiest to detect the change in slope near 45 degrees.

dat<-data.frame(ct=c(1e6-10000, 1e6, 1e6+15000),
                dt=as.Date(c("2016-06-01","2016-06-02", "2016-06-03")))
p<-ggplot(dat, aes(dt, ct)) + 
  geom_line(group=1, color="black") + 
  geom_point(color="black") + ylab(NULL) + scale_x_date(NULL, labels=date_format("%b-%d"))
p

preso_save("img/45deg.png")
p + expand_limits(y=c(800000,1200000))

preso_save("img/45deg_tall.png")
p + expand_limits(x=as.Date(c("2016-05-01","2016-07-01")))

preso_save("img/45deg_wide.png")

Difficult to see with bars (where the y-axis must extend to zero).

ggplot(dat, aes(dt, ct)) + 
  geom_bar(stat="identity", fill="white") + 
  ylab(NULL) + scale_x_date(NULL, labels=date_format("%b-%d"))

preso_save("img/45deg_bars.png", width=1280/300/2)

For banking to 45, see the example with the sunspots data in part 6.

The eye measures perdendicular distance

Where is the distance between the lines the greatest?

dat<-data.frame(x=seq(-10, 8, length.out = 100)) %>%
  mutate(y=-2*(x^2)+300,
         y2=y-seq(30,5,length.out = 100)) 

dat_w <- dat %>%
  gather(var, val, y, y2)

ggplot(dat_w, aes(x, val, group=var))+
  geom_line(color="black")

preso_save("img/gap_lines.png", width=1280/300*(2/3))

Here’s the distance plotted directly. Suprisingly, the distance is the largest at the left edge.

ggplot(dat, aes(x, y-y2)) + geom_line(color="black") + ylab("gap")

preso_save("img/gap_direct.png")

Y-axis to zero?

See the saturation, area and length examples from the estimation section with mtcars.

With bars the y-axis must go to zero. This obscures the variation in this dataset.

to_plot<-data.frame(x=1:20) %>%
  mutate(y=1e6 + 100*x*(-1)^x)
ggplot(to_plot, aes(x,y)) + geom_bar(fill="white",stat="identity") + xlab(NULL) + ylab(NULL)

preso_save("img/yaxis_bars.png")

If we switch to points (and position on a common scale), we are free to clip the y-axis to just the relevant interval.

ggplot(to_plot, aes(x,y)) + geom_line(color="black") + geom_point(color="black") + xlab(NULL) + ylab(NULL)

preso_save("img/yaxis_lines.png")

Hue (used carefully) is good for revealing fine structure

ramps<-data.frame(x=seq(-5,5,length.out=200)) %>%
  mutate(linear=x/10+0.5,
         sigmoid=1/(1+exp(-x)))

theme_extras <- theme(axis.title.y=element_blank(),
                      axis.text.y=element_blank(),
                      axis.ticks.y=element_blank(),
                      legend.position="none")

ggplot(ramps, aes(x,1,fill=sigmoid)) + geom_tile() + theme_extras

preso_save("img/ramp_sigmoid.png", theme=theme_preso+theme_extras)

ggplot(ramps, aes(x,1,fill=linear)) + geom_tile() + theme_extras

preso_save("img/ramp_linear.png", theme=theme_preso+theme_extras)

theme_extras <- theme(axis.title.y=element_blank(),
                      axis.text.y=element_blank(),
                      axis.ticks.y=element_blank())

ggplot(ramps %>% gather(series,value,-x), aes(x,value,color=series)) + geom_line() + theme_extras

preso_save("img/ramp_both_lines.png", theme=theme_preso+theme_extras)

theme_extras <- theme(axis.title.y=element_blank(),
                      legend.position="none")

ggplot(ramps %>% gather(series,value,-x), aes(x,series,fill=value)) + geom_tile() + theme_extras

preso_save("img/ramp_both.png", theme=theme_preso+theme_extras)

jet.colors <- colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan", "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"))
ggplot(ramps %>% gather(series,value,-x), aes(x,series,fill=value)) + geom_tile() + theme_extras + scale_fill_gradientn(colors = jet.colors(7))

preso_save("img/ramp_both_hue.png", theme=theme_preso+theme_extras)

Viridis not a good as colorjet. Presumably because colorjet traverses more of the space of colors.

ggplot(ramps %>% gather(series,value,-x), aes(x,series,fill=value)) + geom_tile() + theme_extras + scale_fill_viridis()

Part 6: Tools/Workflow

Sunspot data

The data is from: http://www.sidc.be/silso/DATA/SN_d_tot_V2.0.txt.

First we load it in and tidy it up.

# Format:
#Line format [character position]:
#- [1-4] Year
#- [6-7] Month
#- [9-10] Day
#- [12-19] Decimal date
#- [22-24] Daily sunspot number
#- [26-30] Standard deviation
#- [33-35] Number of observations
#- [37] Definitive/provisional indicator

positions <- fwf_positions(start=c(1,6,9,12,22,26,33,37), 
                           end=c(4,7,10,19,24,30,35,37),
                           col_names=c("year", "month", "day", "decimal_date", 
                                       "sunspot_number", "std_dev", "n", "provisional"))
sunspots_raw<-read_fwf("http://www.sidc.be/silso/DATA/SN_d_tot_V2.0.txt", col_positions=positions)
## Parsed with column specification:
## cols(
##   year = col_integer(),
##   month = col_integer(),
##   day = col_character(),
##   decimal_date = col_double(),
##   sunspot_number = col_integer(),
##   std_dev = col_double(),
##   n = col_integer(),
##   provisional = col_character()
## )
sunspots <- sunspots_raw %>%
  mutate(day = as.integer(day),
         dt = as.Date(sprintf("%4d-%02d-%02d", year, month, day)),
         sunspot_number = ifelse(sunspot_number==-1, NA, as.numeric(sunspot_number)),
         std_dev = ifelse(std_dev==-1, NA, std_dev),
         provisional = !is.na(provisional)) %>%
  # There are many missing values before 1850 which complicate analysis
  filter(dt >= "1850-01-01") %>%
  select(-year, -month, -day, -decimal_date)
summary(sunspots)
##  sunspot_number      std_dev             n          provisional    
##  Min.   :  0.00   Min.   : 0.000   Min.   : 1.000   Mode :logical  
##  1st Qu.: 22.00   1st Qu.: 3.600   1st Qu.: 1.000   FALSE:61177    
##  Median : 65.00   Median : 6.700   Median : 1.000   TRUE :92       
##  Mean   : 84.89   Mean   : 7.075   Mean   : 4.765                  
##  3rd Qu.:130.00   3rd Qu.: 9.900   3rd Qu.: 1.000                  
##  Max.   :528.00   Max.   :77.700   Max.   :58.000                  
##        dt            
##  Min.   :1850-01-01  
##  1st Qu.:1891-12-09  
##  Median :1933-11-16  
##  Mean   :1933-11-16  
##  3rd Qu.:1975-10-24  
##  Max.   :2017-09-30

A first attempt at plotting… yuck.

ggplot(sunspots, aes(dt, sunspot_number)) + 
  geom_line(color="black") + 
  xlab(NULL) + ylab("Sunspot Number")

preso_save("img/sunspots_line.png")

Plot as dots, use alpha transparency to reduce overplotting. Notice the gaps between zero and non-zero. You wouldn’t have noticed those without visualization.

ggplot(sunspots, aes(dt, sunspot_number)) + 
  geom_point(size=0.5, alpha=1/40, color="black") + 
  xlab(NULL) + ylab("Sunspot Number")

preso_save("img/sunspots_point.png")

There are lots of things we could try to explain about this data, but the most obivous one is the major cycle that’s about 11 years long.

minima <- as.Date("1954-01-01") + seq(-9, 5) * 11 * 365.25
ggplot(sunspots, aes(dt, sunspot_number)) + 
  geom_point(size=0.5, alpha=1/40, color="black") + 
  geom_vline(xintercept=as.numeric(minima), color="lightblue") +
  xlab(NULL) + ylab("Sunspot Number")

preso_save("img/sunspots_vlines.png")

To make it easier to work with, let’s take monthly averages of the data.

# This works, but it's silly to work on this data at a daily level
#sunspots_ts<- ts(sunspots$sunspot_number, frequency=11*365.25)
#z <- fourier(sunspots_ts, K=3)
#fit <- auto.arima(sunspots_ts, d=1, xreg=z, allowdrift = F, seasonal=FALSE, trace = T)

sunspots_month <- sunspots %>% 
  mutate(month = as.Date(format(dt, "%Y-%m-01"))) %>%
  group_by(month) %>%
  summarize(mean_sn = mean(sunspot_number, na.rm=T))
summary(sunspots_month)
##      month               mean_sn      
##  Min.   :1850-01-01   Min.   :  0.00  
##  1st Qu.:1891-12-01   1st Qu.: 25.90  
##  Median :1933-11-01   Median : 72.55  
##  Mean   :1933-10-31   Mean   : 84.89  
##  3rd Qu.:1975-10-01   3rd Qu.:127.79  
##  Max.   :2017-09-01   Max.   :359.39
ggplot(sunspots_month, aes(month, mean_sn)) + geom_line(color="black")

preso_save("img/sunspots_month.png")
#ggplot(sunspots_month, aes(month, mean_sn)) + geom_point(color="black")

Now we can put a loess smoother over the monthly data, and we finally have a nice visualization of the series.

ggplot(sunspots_month, aes(month, mean_sn)) + geom_point(color="black", alpha=1/3) + 
  geom_smooth(method="loess", span=0.05, n=400) + 
  xlab(NULL) + ylab("Mean Sunspot Number")

preso_save("img/sunspots_month_smooth.png")

Now let’s impose a model on the data.

library(forecast)
## Warning: package 'forecast' was built under R version 3.4.2
tb<-tbats(sunspots_month$mean_sn, 
          use.trend=F, use.box.cox=F, 
          seasonal.periods = c(12*11))
tb_components_mat<-tbats.components(tb)
tb_components<-data.frame(month=sunspots_month$month,
                          observed = tb_components_mat[,"observed"],
                          level = tb_components_mat[,"level"],
                          season = tb_components_mat[,"season"],
                          residuals = residuals(tb),
                          fitted = fitted(tb))

tb_components_w <- tb_components %>%
  gather(variable, value, -month) %>%
  mutate(variable = factor(variable, levels=c("observed", "level", "season", "fitted", "residuals")))

… and visualize the results of the model.

ggplot(tb_components_w %>% filter(variable %in% c("observed","level","season")), 
       aes(month, value)) + 
  geom_line(color="black") + 
  facet_wrap(~variable, ncol=1, scales="free_y")

preso_save("img/sunspots_tbats.png")

But is the model good? A common model diagnostic is to look at the “residuals” the difference between the observed data and the model. According to our model, this should be normally distributed with a mean of zero.

binw=5
ggplot(tb_components, aes(residuals)) + geom_histogram(binwidth=binw, fill="lightgrey")

preso_save("img/sunspots_residuals.png")

If you don’t have great intuition about what a normal curve looks like, it can help to overlay one. It look like our model ok, but not great.

curve = data.frame(x=seq(min(tb_components$residuals), max(tb_components$residuals), length.out = 200)) %>%
  mutate(y=(binw*length(tb_components$residuals)) * dnorm(x, mean(tb_components$residuals), sd(tb_components$residuals)))
ggplot(tb_components, aes(residuals)) + geom_histogram(binwidth=binw, fill="lightgrey") +
  geom_line(data=curve, aes(x,y), color="red")

# Other diagnostic plots
qqnorm(residuals(tb))
plot(acf(residuals(tb), lag.max=150))
# 11 year moving average
plot(ma(sunspots$sunspot_number, round(365.25*11)))

Demonstrates banking to 45. There seems to be a pattern of a sharper attack and slower decay.

sunspots_month <- sunspots_month %>%
#  mutate(piece = ifelse(month < '1900-01-01', "1850-1900", ifelse(month < '1950-01-01', "1900-1950", "1950-2000"))) %>% 
  mutate(piece = ifelse(month < '1905-01-01', "1850-1905", ifelse(month < '1960-01-01', "1906-1960", "1960-2015"))) %>% 
  group_by(piece) %>%
  mutate(obs=1:length(month)) %>%
  ungroup()

theme_extras <- theme(axis.title.x=element_blank(),
                      axis.text.x=element_blank(),
                      axis.ticks.x=element_blank())

ggplot(sunspots_month %>% filter(month < '2015-01-01'), 
       aes(obs, mean_sn)) + geom_point(color="black", alpha=1/3) + 
  geom_smooth(method="loess", span=0.05*2, n=150) + facet_wrap(~piece, ncol=1) + 
  theme_extras + 
  ylab("Mean Sunspot Number")

preso_save("img/sunspots_month_wide.png", theme=theme_preso + theme_extras)

Appendix

Barley dataset

library(lattice) # For the barley dataset
# If you want to encourage comparing between years, then put them on a common scale
ggplot(barley, aes(yield, variety, color=year, shape=year))+
  geom_point()+facet_wrap(~site)

# If you want to encourage comparing between sites, then put them on a common scale
ggplot(barley, aes(variety, yield, group=site, color=site, shape=site))+
  geom_point()+geom_line()+facet_wrap(~year)+coord_flip()

# Can you spot the anomaly?
barley_differences <- barley %>%
  group_by(variety, site) %>%
  arrange(year) %>%
  summarize(difference = yield[2]-yield[1])

ggplot(barley_differences, 
       aes(difference, variety))+
  geom_point(color="black") + facet_wrap(~site)

ggplot(barley_differences, 
       aes(difference, variety))+
  geom_point(color="black") + geom_vline(xintercept=0) + facet_wrap(~site)

ggplot(barley_differences, 
       aes(difference, variety, color=factor(sign(difference))))+
  geom_point()+facet_wrap(~site)

Session Information

sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] lattice_0.20-35   bindrcpp_0.2      gapminder_0.2.0  
##  [4] viridis_0.4.0     viridisLite_0.2.0 broom_0.4.2      
##  [7] scales_0.5.0      stringr_1.2.0     tidyr_0.7.1      
## [10] dplyr_0.7.4       readr_1.1.1       ggplot2_2.2.1    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.13       RColorBrewer_1.1-2 compiler_3.4.1    
##  [4] plyr_1.8.4         bindr_0.1          tools_3.4.1       
##  [7] digest_0.6.12      evaluate_0.10.1    tibble_1.3.4      
## [10] gtable_0.2.0       nlme_3.1-131       pkgconfig_2.0.1   
## [13] rlang_0.1.2        psych_1.7.8        curl_2.8.1        
## [16] yaml_2.1.14        parallel_3.4.1     gridExtra_2.3     
## [19] knitr_1.17         hms_0.3            tidyselect_0.2.0  
## [22] rprojroot_1.2      grid_3.4.1         glue_1.1.1        
## [25] R6_2.2.2           foreign_0.8-69     rmarkdown_1.6     
## [28] reshape2_1.4.2     purrr_0.2.3        magrittr_1.5      
## [31] backports_1.1.1    htmltools_0.3.6    assertthat_0.2.0  
## [34] mnormt_1.5-5       colorspace_1.3-2   labeling_0.3      
## [37] stringi_1.1.5      lazyeval_0.2.0     munsell_0.4.3