01_conservation.r
2015-08-22
FInd conserved residue clusters in GalphaI PFAM alignment.
First:
- Read alignment
- Calculate conservation
- Calculate consensus sequence
- Plot conservation score along with conserved consnsus position labels
Developed in interactive R mode then rendered with rmarkdown (i.e.) library(rmarkdown); render("01_conservation.r")
See sessionInfo() output at end.
library(bio3d)
# Read PFAM alignment
aln <- read.fasta("../data/PF00503_seed.txt")
# Calculate and plot conservation score per alignment position
sim <- conserv(aln)
plot(sim, typ="h", ylab="Conservation", xlab="Alignment position")
# Zero-out low scoring positions for plot below
sim[sim < 0.6] = 0
## Calculate consensus sequence
con <- consensus(aln, cutoff = 0.8)
paste(con$seq, collapse="")
## [1] "--------------------------------------------------------------------------------------------------------------------------------------------------------------L-L-G---SG-----K-T--------------------------------------------------------------------------N-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------W-----------------------------------------------------------------------------------------------Y-P---------------------D------------------------------------------------------------------------------------------------------------------------------R------T-----------G---------------------------------------------------------------------D-G---G------------R--W--------F---------------F----------------------------------------E------------------------------------NR---------------------------------------------F-------------------------------------L--NK-D----K----------------------------------------------------------------------------------------------------F--------------------------------------------------------------------------"
# Determine which positions score more 0.75
conserved.inds <- which(sim > 0.75)
# Plot setup development
plot(sim, typ="h", ylab="Conservation", xlab="Alignment position", col="gray20")
text(conserved.inds, sim[conserved.inds], labels=con$seq[conserved.inds], col="blue")
rug(conserved.inds, col="blue")
# Plot results to PDF
pdf("../results/conservation.pdf", width=10)
plot(sim, typ="h", ylab="Conservation", xlab="Alignment position", col="gray20")
text(conserved.inds, sim[conserved.inds], labels=con$seq[conserved.inds], col="blue")
rug(conserved.inds, col="blue")
dev.off()
## quartz
## 2
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rmarkdown_0.5.3.1 bio3d_2.2-2.9000
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.8 evaluate_0.7 grid_3.1.2 htmltools_0.2.6
## [5] knitr_1.10.5 magrittr_1.5 parallel_3.1.2 Rcpp_0.11.6
## [9] stringi_0.4-1 stringr_1.0.0 tools_3.1.2