Atom Selection from PDB and PRMTOP Structure Objects

Return the ‘atom’ and ‘xyz’ coordinate indices of ‘pdb’ or ‘prmtop’ structure objects corresponding to the intersection of a hierarchical selection.

atom.select(...)

# S3 method for pdb
atom.select(pdb, string = NULL,
                          type  = NULL, eleno = NULL, elety = NULL,
                          resid = NULL, chain = NULL, resno = NULL,
                          insert = NULL, segid = NULL, 
                          operator = "AND", inverse = FALSE,
                          value = FALSE, verbose=FALSE, ...)
# S3 method for pdbs
atom.select(pdbs, string = NULL, 
                           resno = NULL, chain = NULL, resid = NULL,
                           operator="AND", inverse = FALSE,
                           value = FALSE, verbose=FALSE, ...)
# S3 method for mol2
atom.select(mol, string=NULL,
                           eleno = NULL, elena = NULL, elety = NULL,
                           resid = NULL, chain = NULL, resno = NULL,
                           statbit = NULL,
                           operator = "AND", inverse = FALSE,
                           value = FALSE, verbose=FALSE,  ...)

# S3 method for prmtop
atom.select(prmtop, ...)

# S3 method for select
print(x, ...)

Arguments

...	arguments passed to `atom.select.pdb`, `atom.select.prmtop`, or `print`.
pdb	a structure object of class `"pdb"`, obtained from `read.pdb`.
pdbs	a numeric matrix of aligned C-alpha xyz Cartesian coordinates as obtained with `read.fasta.pdb` or `pdbaln`.
string	a single selection keyword from `calpha` `cbeta` `backbone` `sidechain` `protein` `nucleic` `ligand` `water` `h` or `noh`.
type	a single element character vector for selecting ‘ATOM’ or ‘HETATM’ record types.
eleno	a numeric vector of element numbers.
elena	a character vector of atom names.
elety	a character vector of atom names.
resid	a character vector of residue name identifiers.
chain	a character vector of chain identifiers.
resno	a numeric vector of residue numbers.
insert	a character vector of insert identifiers. Non-insert residues can be selected with `NA` or ‘’ values. The default value of `NULL` will select both insert and non-insert residues.
segid	a character vector of segment identifiers. Empty segid values can be selected with `NA` or ‘’ values. The default value of `NULL` will select both empty and non-empty segment identifiers.
operator	a single element character specifying either the AND or OR operator by which individual selection components should be combined. Allowed values are ‘"AND"’ and ‘"OR"’.
verbose	logical, if TRUE details of the selection are printed.
inverse	logical, if TRUE the inverse selection is returned (i.e. all atoms NOT in the selection).
value	logical, if FALSE, vectors containing the (integer) indices of the matches determined by `atom.select` are returned, and if TRUE, a `pdb` object containing the matching atoms themselves is returned.
mol	a structure object of class `"mol2"`, obtained from `read.mol2`.
statbit	a character vector of statbit identifiers.
prmtop	a structure object of class `"prmtop"`, obtained from `read.prmtop`.
x	an atom.select object as obtained from `atom.select`.

Details

This function allows for the selection of atom and coordinate data corresponding to the intersection of various input criteria.

Input selection criteria include selection string keywords (such as "calpha", "backbone", "sidechain", "protein", "nucleic", "ligand", etc.) and individual named selection components (including ‘chain’, ‘resno’, ‘resid’, ‘elety’ etc.).

For example, atom.select(pdb, "calpha") will return indices for all C-alpha (CA) atoms found in protein residues in the pdb object, atom.select(pdb, "backbone") will return indices for all protein N,CA,C,O atoms, and atom.select(pdb, "cbeta") for all protein N,CA,C,O,CB atoms.

Note that keyword string shortcuts can be combined with individual selection components, e.g. atom.select(pdb, "protein", chain="A") will select all protein atoms found in chain A.

Selection criteria are combined according to the provided operator argument. The default operator AND (or &) will combine by intersection while OR (or |) will take the union.

For example, atom.select(pdb, "protein", elety=c("N", "CA", "C"), resno=65:103) will select the N, CA, C atoms in the protein residues 65 through 103, while atom.select(pdb, "protein", resid="ATP", operator="OR") will select all protein atoms as well as any ATP residue(s).
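
The difference between the two operators can be pictured with plain index sets. The sketch below uses base R only and made-up index vectors (`from_string`, `from_resno`, and `all_atoms` are hypothetical names, not bio3d internals). It illustrates the set semantics described above and is not a description of how atom.select is implemented.

```r
# Hypothetical atom indices matched by two selection components
from_string <- c(1L, 2L, 3L, 4L, 5L, 6L)   # e.g. a keyword such as "protein"
from_resno  <- c(4L, 5L, 6L, 7L, 8L)       # e.g. a residue-number range

# operator = "AND": keep atoms matched by every component (intersection)
sel_and <- intersect(from_string, from_resno)

# operator = "OR": keep atoms matched by any component (union)
sel_or <- sort(union(from_string, from_resno))

# inverse = TRUE: all atoms NOT in the selection (assuming 10 atoms in total)
all_atoms <- seq_len(10L)
sel_inv <- setdiff(all_atoms, sel_and)

print(sel_and)
print(sel_or)
print(sel_inv)
```

With "AND" each extra component can only narrow the selection, while with "OR" each extra component can only widen it.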

Other string shortcuts include: "calpha", "back", "backbone", "cbeta", "protein", "notprotein", "ligand", "water", "notwater", "h", "noh", "nucleic", and "notnucleic".

In addition, the combine.select function can further combine atom selections using ‘AND’, ‘OR’, or ‘NOT’ logical operations.

Note

Protein atoms are defined as any atom in a residue whose name matches one of the residue names in the attached aa.table data frame. See aa.table$aa3 for a complete list of residue names.

Nucleic atoms are defined as all atoms found in residues with names A, U, G, C, T, I, DA, DU, DG, DC, DT, or DI.

Water atoms/residues are defined as those with residue names H2O, OH2, HOH, HHO, OHH, SOL, WAT, TIP, TIP3, or TIP4.

Value

Returns a list of class "select" with the following components:

atom

a numeric vector of atom indices.

xyz

a numeric vector of xyz coordinate indices.

call

the matched call.
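
The xyz component holds three coordinate indices per selected atom, which is why the examples report three times as many XYZ indices as atom indices (for instance 1001 and 3003). A minimal base R sketch of that mapping follows, assuming the usual layout in which atom i occupies positions 3i-2, 3i-1 and 3i of the coordinate vector. `atom_to_xyz` is a hypothetical helper written for illustration only; bio3d provides its own `atom2xyz` function for this conversion.

```r
# Hypothetical helper: map atom indices to x, y, z coordinate indices,
# assuming coordinates are stored as x1, y1, z1, x2, y2, z2, ...
atom_to_xyz <- function(atom) {
  # rbind() stacks the x, y, z positions of each atom in one column;
  # as.vector() then reads the matrix column by column, i.e. atom by atom
  as.vector(rbind(3 * atom - 2, 3 * atom - 1, 3 * atom))
}

atom_inds <- c(2, 5)
xyz_inds  <- atom_to_xyz(atom_inds)
print(xyz_inds)
```

Here atoms 2 and 5 map to coordinate positions 4, 5, 6 and 13, 14, 15 respectively.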

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695-2696.

Author

Barry Grant, Lars Skjaerven

Examples


##- PDB example
# Read a PDB file
pdb <- read.pdb( system.file("examples/1hel.pdb", package="bio3d") )

# Select protein atoms of chain A
atom.select(pdb, "protein", chain="A")
#> 
#>  Call:  atom.select.pdb(pdb = pdb, string = "protein", chain = "A")
#> 
#>    Atom Indices#: 1001  ($atom)
#>    XYZ  Indices#: 3003  ($xyz)
#> 
#> + attr: atom, xyz, call

# Select all atoms except from the protein
atom.select(pdb, "protein", inverse=TRUE, verbose=TRUE)
#> 
#>  .. 00001001 atom(s) from 'string' selection 
#>  .. 00001001 atom(s) in final combined selection 
#>  .. 00000000 atom(s) in inversed selection 
#> 
#> 
#>  Call:  atom.select.pdb(pdb = pdb, string = "protein", inverse = TRUE, 
#>     verbose = TRUE)
#> 
#>    Atom Indices#: 0  ($atom)
#>    XYZ  Indices#: 0  ($xyz)
#> 
#> + attr: atom, xyz, call

# Select all C-alpha atoms with residue numbers between 43 and 54
sele <- atom.select(pdb, "calpha", resno=43:54, verbose=TRUE)
#> 
#>  .. 00000129 atom(s) from 'string' selection 
#>  .. 00000090 atom(s) from 'resno' selection 
#>  .. 00000012 atom(s) in final combined selection 
#> 

# Access the PDB data with the selection indices
print( pdb$atom[ sele$atom, "resid" ] )
#>  [1] "THR" "ASN" "ARG" "ASN" "THR" "ASP" "GLY" "SER" "THR" "ASP" "TYR" "GLY"
print( pdb$xyz[ sele$xyz ] )
#>  [1] 10.705 15.992 17.773 12.828 18.909 18.863 15.061 19.112 21.827 15.247
#> [11] 22.135 24.062 18.108 23.768 25.931 16.626 22.832 29.261 16.950 19.205
#> [21] 28.182 13.311 18.716 27.712 11.958 17.610 24.179  8.767 18.608 22.181
#> [31]  6.899 15.745 20.548  4.390 15.347 17.710

# Trim PDB to selection
ca.pdb <- trim.pdb(pdb, sele)

if (FALSE) {

##- PRMTOP example
prmtop <- read.prmtop(system.file("examples/crambin.prmtop", package="bio3d"))

## Atom selection
ca.inds <- atom.select(prmtop, "calpha")

}