Bio3D_introduction.Rmd
Bio3D¹ is a group of R packages containing utilities for the analysis of biomolecular structure, sequence and trajectory data (Grant et al. 2006). Features include the ability to read and write biomolecular structure, sequence and dynamic trajectory data, perform atom selection, re-orientation, superposition, rigid core identification, clustering, distance matrix analysis, conservation analysis, normal mode analysis and principal component analysis. Bio3D takes advantage of the extensive graphical and statistical capabilities of the R environment and thus represents a useful framework for exploratory analysis of structural data.
The functionality for various complex analysis tasks is packed in separate Bio3D packages, including:
bio3d (the core package) for data processing and basic analysis, including alignment, sequence and structure comparisons, and inter-conformer analysis with PCA.
bio3d.nma for ensemble normal mode analysis aimed at predicting and contrasting functional dynamics across protein families.
bio3d.cna for protein structure and correlation network analysis to characterize correlated protein motions underlying allosteric regulation.
bio3d.web for enabling user-friendly online interactive analysis of protein structures and their dynamics.
bio3d.eddm for ensemble difference distance matrix analysis, an approach to characterizing functionally significant conformational changes.
bio3d.view (in development) for interactive 3D visualization.
The aim of this document, termed a vignette² in R parlance, is to provide a brief overview of Bio3D. A number of other Bio3D package vignettes are available³ from the Bio3D website.
Before you attempt to install Bio3D packages you should have a relatively recent version of R installed and working on your system. Detailed instructions for obtaining and installing R on various platforms can be found on the R home page.
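As a rough sketch of that first step, the snippet below checks the running R version and installs the core package only if it is missing. It assumes the core bio3d package is being installed from CRAN; the mirror URL is one common choice rather than a requirement, and the other Bio3D packages may need the installation instructions on the Bio3D website instead.

```r
# Assumption: the core bio3d package is installed from CRAN.
# The mirror URL is illustrative; any CRAN mirror should work.
if (!requireNamespace("bio3d", quietly = TRUE)) {
  install.packages("bio3d", repos = "https://cloud.r-project.org")
}

# Report the running R version and the installed bio3d version
print(R.version.string)
print(packageVersion("bio3d"))
```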
To get the most out of Bio3D you should be quite familiar with basic R usage. There are several online resources that can help you get started using R. These can also be found via the R home page.
Some users find the learning curve steep, and your experience may be similar. However, once you have mastered basic vectors and matrices you should feel confident about getting started with the Bio3D package.
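The following base R sketch illustrates the kind of vector and matrix handling that is useful here. It needs no Bio3D functions; the six numbers are the coordinates of the first two atoms of 1hel as printed later in this document, and the variable names are purely illustrative.

```r
# Coordinates of the first two atoms (N and CA of Lys 1) as one flat
# vector laid out as x1, y1, z1, x2, y2, z2
xyz <- c(3.294, 10.164, 10.266,
         2.388, 10.533,  9.168)

# Reshape into one row per atom and one column per axis
coords <- matrix(xyz, ncol = 3, byrow = TRUE,
                 dimnames = list(c("N", "CA"), c("x", "y", "z")))

n_atoms  <- nrow(coords)                # number of atoms
ca_y     <- coords["CA", "y"]           # index by row and column name
bond     <- sqrt(sum((coords["N", ] - coords["CA", ])^2))  # N-CA distance
centroid <- colMeans(coords)            # mean position per axis

print(coords)
print(bond)
```

The same reshape-and-index pattern applies to the much longer coordinate vectors stored in Bio3D structure objects.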
There are a number of additional packages and programs that will either interface with Bio3D or that we consider generally invaluable for working with biomolecular structure (e.g. VMD or PyMOL) and sequence (e.g. Seaview) data. A brief description of how to obtain these additional packages is given below.
We are always interested in adding additional functionality to Bio3D. If you have ideas, suggestions or code that you would like to distribute as part of this package, please contact us. You are also encouraged to contribute your code or issues directly to our Bitbucket repository for incorporation into the development version of the package. Please do get in touch – we would like to hear from you!
Start R (type R at the command prompt or, on Windows, double click on the R icon) and load a Bio3D package, e.g. the core bio3d package, by typing `library(bio3d)` at the R console prompt.
Then use the command `lbio3d()` or `help(package = "bio3d")` to list the functions within the package.
To get help on a particular function try `?function` or `help(function)`. For example:
?pca.xyz
To search the help system for documentation matching a particular word or topic use the command `help.search("topic")`. For example:
help.search("pdb")
Typing `help.start()` will start a local HTML interface. After initiating `help.start()` in a session the `?function` commands will open as HTML pages. To execute examples for a particular function use the command `example(function)`. To run examples for the `read.dcd` function try `example(read.dcd)`.
Run the command `demo(bio3d)` to obtain a quick overview.
demo(bio3d)
The bio3d package consists of input/output functions, conversion and manipulation functions, analysis functions, and graphics functions, all of which are fully documented. Remember that you can get help on any particular function by using the command `?function` or `help(function)` from within R.
help(pca.xyz)
To better understand how a particular function operates it is often helpful to view and execute an example. Every function within the Bio3D packages is documented with example code that you can view by issuing the `help` command.
Running the command `example(function)` will directly execute the example for a given function. In addition, a number of worked examples are available as short Tutorials on the Bio3D website.
example(plot.bio3d)
pdb <- read.pdb("1hel")
## Note: Accessing on-line PDB file
print(pdb)
##
## Call: read.pdb(file = "1hel")
##
## Total Models#: 1
## Total Atoms#: 1186, XYZs#: 3558 Chains#: 1 (values: A)
##
## Protein Atoms#: 1001 (residues/Calpha atoms#: 129)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 185 (residues: 185)
## Non-protein/nucleic resid values: [ HOH (185) ]
##
## Protein sequence:
## KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINS
## RWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDV
## QAWIRGCRL
##
## + attr: atom, xyz, seqres, helix, sheet,
## calpha, remark, call
head(pdb$atom)
##   type eleno elety  alt resid chain resno insert      x      y      z o     b
## 1 ATOM     1     N <NA>   LYS     A     1   <NA>  3.294 10.164 10.266 1 11.18
## 2 ATOM     2    CA <NA>   LYS     A     1   <NA>  2.388 10.533  9.168 1  9.68
## 3 ATOM     3     C <NA>   LYS     A     1   <NA>  2.438 12.049  8.889 1 14.00
## 4 ATOM     4     O <NA>   LYS     A     1   <NA>  2.406 12.898  9.815 1 14.00
## 5 ATOM     5    CB <NA>   LYS     A     1   <NA>  0.949 10.101  9.559 1 13.29
## 6 ATOM     6    CG <NA>   LYS     A     1   <NA> -0.050 10.621  8.573 1 13.52
##   segid elesy charge
## 1  <NA>     N   <NA>
## 2  <NA>     C   <NA>
## 3  <NA>     C   <NA>
## 4  <NA>     O   <NA>
## 5  <NA>     C   <NA>
## 6  <NA>     C   <NA>
print(pdb$xyz)
##
## Total Frames#: 1
## Total XYZs#: 3558, (Atoms#: 1186)
##
## [1] 3.294 10.164 10.266 <...> 7.795 26.278 15.645 [3558]
##
## + attr: Matrix DIM = 1 x 3558
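The printed output above suggests how the `atom` and `xyz` components relate: one row per atom in `pdb$atom` and three coordinates per atom in `pdb$xyz`. The sketch below checks that relationship and pulls out the C-alpha atoms with `atom.select()`. It assumes network access for `read.pdb("1hel")`, and the variable names `ca.inds`, `ca_b` and `ca_xyz` are illustrative only.

```r
library(bio3d)

# Read the same structure as above (downloads the entry from the PDB)
pdb <- read.pdb("1hel")

# One row per atom in pdb$atom, three coordinates per atom in pdb$xyz
n_atoms <- nrow(pdb$atom)
stopifnot(ncol(pdb$xyz) == 3 * n_atoms)

# Select C-alpha atoms; the result holds atom and xyz indices
ca.inds <- atom.select(pdb, "calpha")

# B-factors of the C-alpha atoms, taken from the atom data frame
ca_b <- pdb$atom$b[ca.inds$atom]

# C-alpha coordinates reshaped to one row per atom (columns x, y, z)
ca_xyz <- matrix(as.numeric(pdb$xyz[1, ca.inds$xyz]), ncol = 3, byrow = TRUE)

print(summary(ca_b))
print(head(ca_xyz))
```

Keeping the atom indices and the xyz indices together in one selection object means the same selection can be applied to both the atom table and the coordinate matrix.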
This document is available from the Bio3D website in R markdown, HTML, and PDF formats. All code can be extracted and automatically executed to generate Figures and/or the PDF with the following commands:
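A minimal sketch of such commands, assuming the rmarkdown and knitr packages are installed and the R markdown source has been saved locally as Bio3D_introduction.Rmd (the local path is an assumption):

```r
# Assumption: the R markdown source of this document has been saved
# in the working directory under the file name below.
rmd_file <- "Bio3D_introduction.Rmd"

if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # Extract the R code chunks into a plain .R script
  knitr::purl(rmd_file)
  # Execute all chunks and build every output format declared in the header
  rmarkdown::render(rmd_file, output_format = "all")
} else {
  message("Source file or rmarkdown package not found; nothing rendered.")
}
```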
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS 10.15.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bio3d_2.4-1.9000
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.5 digest_0.6.25 crayon_1.3.4 rprojroot_1.3-2
## [5] assertthat_0.2.1 grid_3.6.0 R6_2.4.1 backports_1.1.8
## [9] magrittr_1.5 evaluate_0.14 stringi_1.4.6 rlang_0.4.6.9000
## [13] fs_1.4.2 rmarkdown_2.3.2 pkgdown_1.5.1.9000 desc_1.2.0
## [17] tools_3.6.0 stringr_1.4.0 parallel_3.6.0 yaml_2.2.1
## [21] xfun_0.15 compiler_3.6.0 memoise_1.1.0 htmltools_0.5.0
## [25] knitr_1.29
Grant, B. J., A. P. C. Rodrigues, K. M. ElSawy, J. A. McCammon, and L. S. D. Caves. 2006. “Bio3d: An R Package for the Comparative Analysis of Protein Structures.” Bioinformatics 22: 2695–6. https://doi.org/10.1093/bioinformatics/btl461.
¹ The latest version of the package, full documentation and further vignettes (including detailed installation instructions) can be obtained from the main Bio3D website: thegrantlab.org/bio3d/.
² This vignette contains executable examples; see `help(vignette)` for further details.
³ See also dedicated vignettes for ensemble NMA provided with the Bio3D package.