01_conservation.r

Date Written:

2015-08-22

Descrption:

FInd conserved residue clusters in GalphaI PFAM alignment.
First:
- Read alignment
- Calculate conservation
- Calculate consensus sequence
- Plot conservation score along with conserved consnsus position labels

Usage and versions:

Developed in interactive R mode then rendered with rmarkdown (i.e.) library(rmarkdown); render("01_conservation.r")

See sessionInfo() output at end.

library(bio3d)

# Read PFAM alignment
aln <- read.fasta("../data/PF00503_seed.txt")

# Calculate and plot conservation score per alignment position
sim <- conserv(aln)
plot(sim, typ="h", ylab="Conservation", xlab="Alignment position")

# Zero-out low scoring positions for plot below
sim[sim < 0.6] = 0


## Calculate consensus sequence
con <- consensus(aln, cutoff = 0.8)
paste(con$seq, collapse="")
## [1] "--------------------------------------------------------------------------------------------------------------------------------------------------------------L-L-G---SG-----K-T--------------------------------------------------------------------------N-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------W-----------------------------------------------------------------------------------------------Y-P---------------------D------------------------------------------------------------------------------------------------------------------------------R------T-----------G---------------------------------------------------------------------D-G---G------------R--W--------F---------------F----------------------------------------E------------------------------------NR---------------------------------------------F-------------------------------------L--NK-D----K----------------------------------------------------------------------------------------------------F--------------------------------------------------------------------------"
# Determine which positions score more 0.75
conserved.inds <- which(sim > 0.75)

# Plot setup development
plot(sim, typ="h", ylab="Conservation", xlab="Alignment position", col="gray20")
text(conserved.inds, sim[conserved.inds], labels=con$seq[conserved.inds], col="blue")
rug(conserved.inds, col="blue")

# Plot results to PDF
pdf("../results/conservation.pdf", width=10)
plot(sim, typ="h", ylab="Conservation", xlab="Alignment position", col="gray20")
text(conserved.inds, sim[conserved.inds], labels=con$seq[conserved.inds], col="blue")
rug(conserved.inds, col="blue")
dev.off()
## quartz 
##      2

Report on software versions

sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rmarkdown_0.5.3.1 bio3d_2.2-2.9000 
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.8    evaluate_0.7    grid_3.1.2      htmltools_0.2.6
##  [5] knitr_1.10.5    magrittr_1.5    parallel_3.1.2  Rcpp_0.11.6    
##  [9] stringi_0.4-1   stringr_1.0.0   tools_3.1.2