Read a PQR coordinate file.

read.pqr(file, maxlines = -1, multi = FALSE, rm.insert = FALSE,
         rm.alt = TRUE, verbose = TRUE)

Arguments

file

the name of the PQR file to be read.

maxlines

the maximum number of lines to read before giving up with large files. By default it will read up to the end of input on the connection.

multi

logical, if TRUE multiple ATOM records are read for all models in multi-model files.

rm.insert

logical, if TRUE PDB insert records are ignored.

rm.alt

logical, if TRUE PDB alternate records are ignored.

verbose

logical, if TRUE details of the reading process are printed.

Details

The PQR file format is largely the same as the PDB format, except for the o and b fields. In PDB, these two fields hold ‘Occupancy’ and ‘B-factor’ values, respectively, and each field is 6 columns wide. In PQR, they hold atomic ‘partial charge’ and ‘radius’ values, respectively, and each field is 8 columns wide.
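The field layout described above can be sketched in base R. This is only an illustration, not the parser used by read.pqr: the record is synthetic, the charge and radius values are invented, and the column positions (55-62 and 63-70) are an assumption obtained by placing two 8-column fields directly after the standard PDB coordinate columns (31-54).

```r
# Build one synthetic ATOM record. Columns 1-54 follow the usual PDB layout;
# the two trailing 8-column fields (charge, radius) are the assumed PQR layout.
rec <- sprintf("%-6s%5d %-4s%1s%-3s %1s%4d%1s   %8.3f%8.3f%8.3f%8.4f%8.4f",
               "ATOM", 1L, " N", "", "MET", "A", 1L, "",
               64.080, 50.529, 32.509,   # x, y, z
               -0.3000, 1.8500)          # charge, radius (invented values)

# Fixed-width extraction by column position
x      <- as.numeric(substring(rec, 31, 38))
charge <- as.numeric(substring(rec, 55, 62))  # lands in the "o" column
radius <- as.numeric(substring(rec, 63, 70))  # lands in the "b" column
```

In a real workflow the parsing is done by read.pqr itself; the sketch is only meant to show why the o and b columns of the returned atom data.frame carry charge and radius rather than occupancy and B-factor.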

maxlines may require increasing for some large multi-model files. The preferred means of reading such data is via binary DCD format trajectory files (see the read.dcd function).

Value

Returns a list of class "pdb" with the following components:

atom

a data.frame containing all atomic coordinate ATOM and HETATM data, with a row per ATOM/HETATM and a column per record type. See below for details of the record type naming convention (useful for accessing columns).

helix

‘start’, ‘end’ and ‘length’ of H type sse, where start and end are residue numbers “resno”.

sheet

‘start’, ‘end’ and ‘length’ of E type sse, where start and end are residue numbers “resno”.

seqres

sequence from SEQRES field.

xyz

a numeric matrix of class "xyz" containing the ATOM and HETATM coordinate data.

calpha

logical vector with length equal to nrow(atom) with TRUE values indicating a C-alpha “elety”.

call

the matched call.
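How the atom, xyz and calpha components relate to each other can be sketched in base R. The data frame below is a small hand-made stand-in for the atom component (three rows copied from the example output further down), not a real "pdb" object, and the index arithmetic is an illustration of the layout rather than the package's own code.

```r
# Toy stand-in for the atom component: one row per atom
atom <- data.frame(
  elety = c("N", "CA", "C"),
  x = c(64.080, 64.044, 63.722),
  y = c(50.529, 51.615, 52.849),
  z = c(32.509, 33.423, 32.671)
)

# xyz layout: a single-row matrix with x, y, z interleaved per atom,
# so it has 3 * nrow(atom) columns
xyz <- matrix(as.vector(t(as.matrix(atom[, c("x", "y", "z")]))), nrow = 1)

# calpha layout: one logical per atom row
calpha <- atom$elety == "CA"

# Atom i occupies xyz columns 3i-2, 3i-1 and 3i
atom_inds <- which(calpha)
xyz_inds  <- as.vector(rbind(3 * atom_inds - 2, 3 * atom_inds - 1, 3 * atom_inds))

ca_xyz <- xyz[1, xyz_inds]
```

This matches the summary printed in the Examples, where 1447 atoms correspond to 4341 XYZ values. With a real object, atom.select() returns matching atom and xyz indices directly.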

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695-2696.

For a description of PDB format (version 3.3) see:
http://www.wwpdb.org/documentation/format33/v3.3.html.

Author

Barry Grant

Note

For the atom list component, the column names can be used as a convenient means of data access, namely: Atom serial number “eleno”, Atom type “elety”, Alternate location indicator “alt”, Residue name “resid”, Chain identifier “chain”, Residue sequence number “resno”, Code for insertion of residues “insert”, Orthogonal coordinates “x”, Orthogonal coordinates “y”, Orthogonal coordinates “z”, Occupancy “o”, and Temperature factor “b”. For data read from a PQR file, “o” and “b” hold the partial charge and radius values instead (see Details). See examples for further details.
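As a minimal sketch of column-name access, the base R snippet below works on a toy data frame standing in for the atom component. The charge and radius values are invented, and it assumes, per the Details section, that “o” carries partial charges and “b” carries radii.

```r
# Toy stand-in for the atom component of an object read from a PQR file.
# The o (charge) and b (radius) values are invented for illustration.
atom <- data.frame(
  resno = c(1, 1, 1, 2, 2),
  resid = c("MET", "MET", "MET", "THR", "THR"),
  elety = c("N", "CA", "C", "N", "CA"),
  o = c(-0.30, 0.10, 0.50, -0.40, 0.20),
  b = c(1.85, 2.00, 2.00, 1.85, 2.00)
)

# Net charge over all atoms
total_charge <- sum(atom$o)

# Net charge per residue, keyed by residue number
charge_by_res <- tapply(atom$o, atom$resno, sum)

# Largest atomic radius
max_radius <- max(atom$b)
```

The same expressions apply to pqr$atom from a real read.pqr call, with the caveat that a file written from a plain PDB object (as in the Examples) still holds occupancy and B-factor values in these columns.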

See also

atom.select, write.pqr, read.pdb, write.pdb, read.dcd, read.fasta.pdb, read.fasta

Examples

# \donttest{
# PDB server connection required - testing excluded

# Read a PDB file and write it as a PQR file
pdb <- read.pdb( "4q21" )
#> Note: Accessing on-line PDB file
#> Warning: /var/folders/xf/qznxnpf91vb1wm4xwgnbt0xr0000gn/T//Rtmp4WslmZ/4q21.pdb exists. Skipping download
outfile <- file.path(tempdir(), "eg.pqr")
write.pqr(pdb=pdb, file = outfile)

# Read the PQR file
pqr <- read.pqr(outfile)

## Print a brief composition summary
pqr
#>
#>  Call:  read.pqr(file = outfile)
#>
#>    Total Models#: 1
#>      Total Atoms#: 1447,  XYZs#: 4341  Chains#: 1  (values: A)
#>
#>      Protein Atoms#: 1340  (residues/Calpha atoms#: 168)
#>      Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
#>
#>      Non-protein/nucleic Atoms#: 107  (residues: 80)
#>      Non-protein/nucleic resid values: [ GDP (1), HOH (78), MG (1) ]
#>
#>    Protein sequence:
#>       MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG
#>       QEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDL
#>       AARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKL
#>
#> + attr: atom, helix, sheet, seqres, xyz,
#>         calpha, call
## Examine the storage format (or internal *str*ucture)
str(pqr)
#> List of 7
#>  $ atom  :'data.frame': 1447 obs. of  16 variables:
#>   ..$ type  : chr [1:1447] "ATOM" "ATOM" "ATOM" "ATOM" ...
#>   ..$ eleno : num [1:1447] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ elety : chr [1:1447] "N" "CA" "C" "O" ...
#>   ..$ alt   : chr [1:1447] NA NA NA NA ...
#>   ..$ resid : chr [1:1447] "MET" "MET" "MET" "MET" ...
#>   ..$ chain : chr [1:1447] "A" "A" "A" "A" ...
#>   ..$ resno : num [1:1447] 1 1 1 1 1 1 1 1 2 2 ...
#>   ..$ insert: chr [1:1447] NA NA NA NA ...
#>   ..$ x     : num [1:1447] 64.1 64 63.7 64.4 65.4 ...
#>   ..$ y     : num [1:1447] 50.5 51.6 52.8 53.1 51.8 ...
#>   ..$ z     : num [1:1447] 32.5 33.4 32.7 31.7 34.2 ...
#>   ..$ o     : num [1:1447] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ b     : num [1:1447] 28.7 29.2 30.3 34.9 28.5 ...
#>   ..$ segid : chr [1:1447] NA NA NA NA ...
#>   ..$ elesy : chr [1:1447] NA NA NA NA ...
#>   ..$ charge: chr [1:1447] NA NA NA NA ...
#>  $ helix :List of 4
#>   ..$ start: num(0)
#>   ..$ end  : num(0)
#>   ..$ chain: chr(0)
#>   ..$ type : chr(0)
#>  $ sheet :List of 4
#>   ..$ start: num(0)
#>   ..$ end  : num(0)
#>   ..$ chain: chr(0)
#>   ..$ sense: chr(0)
#>  $ seqres: NULL
#>  $ xyz   : 'xyz' num [1, 1:4341] 64.1 50.5 32.5 64 51.6 ...
#>  $ calpha: logi [1:1447] FALSE TRUE FALSE FALSE FALSE FALSE ...
#>  $ call  : language read.pqr(file = outfile)
#>  - attr(*, "class")= chr [1:2] "pdb" "sse"
## Print data for the first four atoms
pqr$atom[1:4,]
#>   type eleno elety  alt resid chain resno insert      x      y      z o     b
#> 1 ATOM     1     N <NA>   MET     A     1   <NA> 64.080 50.529 32.509 1 28.66
#> 2 ATOM     2    CA <NA>   MET     A     1   <NA> 64.044 51.615 33.423 1 29.19
#> 3 ATOM     3     C <NA>   MET     A     1   <NA> 63.722 52.849 32.671 1 30.27
#> 4 ATOM     4     O <NA>   MET     A     1   <NA> 64.359 53.119 31.662 1 34.93
#>   segid elesy charge
#> 1  <NA>  <NA>   <NA>
#> 2  <NA>  <NA>   <NA>
#> 3  <NA>  <NA>   <NA>
#> 4  <NA>  <NA>   <NA>
## Print some coordinate data
head(pqr$atom[, c("x","y","z")])
#>        x      y      z
#> 1 64.080 50.529 32.509
#> 2 64.044 51.615 33.423
#> 3 63.722 52.849 32.671
#> 4 64.359 53.119 31.662
#> 5 65.373 51.805 34.158
#> 6 65.122 52.780 35.269
## Print C-alpha coordinates (can also use 'atom.select' function)
head(pqr$atom[pqr$calpha, c("resid","elety","x","y","z")])
#>    resid elety      x      y      z
#> 2    MET    CA 64.044 51.615 33.423
#> 10   THR    CA 62.439 54.794 32.359
#> 17   GLU    CA 63.968 58.232 32.801
#> 26   TYR    CA 61.817 61.333 33.161
#> 38   LYS    CA 63.343 64.814 33.163
#> 47   LEU    CA 61.321 67.068 35.557
inds <- atom.select(pqr, elety="CA")
head( pqr$atom[inds$atom, ] )
#>    type eleno elety  alt resid chain resno insert      x      y      z o     b
#> 2  ATOM     2    CA <NA>   MET     A     1   <NA> 64.044 51.615 33.423 1 29.19
#> 10 ATOM    10    CA <NA>   THR     A     2   <NA> 62.439 54.794 32.359 1 28.10
#> 17 ATOM    17    CA <NA>   GLU     A     3   <NA> 63.968 58.232 32.801 1 30.95
#> 26 ATOM    26    CA <NA>   TYR     A     4   <NA> 61.817 61.333 33.161 1 23.42
#> 38 ATOM    38    CA <NA>   LYS     A     5   <NA> 63.343 64.814 33.163 1 21.34
#> 47 ATOM    47    CA <NA>   LEU     A     6   <NA> 61.321 67.068 35.557 1 18.99
#>    segid elesy charge
#> 2   <NA>  <NA>   <NA>
#> 10  <NA>  <NA>   <NA>
#> 17  <NA>  <NA>   <NA>
#> 26  <NA>  <NA>   <NA>
#> 38  <NA>  <NA>   <NA>
#> 47  <NA>  <NA>   <NA>
## The atom.select() function returns 'indices' (row numbers)
## that can be used for accessing subsets of PDB objects, e.g.
inds <- atom.select(pqr,"ligand")
pqr$atom[inds$atom,]
#>        type eleno elety  alt resid chain resno insert      x      y      z o
#> 1341 HETATM  1342    MG <NA>    MG     A   273   <NA> 65.614 76.977 46.715 1
#> 1342 HETATM  1343    PB <NA>   GDP     A   274   <NA> 62.667 77.781 47.505 1
#> 1343 HETATM  1344   O1B <NA>   GDP     A   274   <NA> 61.587 77.413 46.626 1
#> 1344 HETATM  1345   O2B <NA>   GDP     A   274   <NA> 63.294 79.098 47.336 1
#> 1345 HETATM  1346   O3B <NA>   GDP     A   274   <NA> 63.804 76.731 47.410 1
#> 1346 HETATM  1347   O3A <NA>   GDP     A   274   <NA> 62.281 77.644 49.012 1
#> 1347 HETATM  1348    PA <NA>   GDP     A   274   <NA> 62.781 76.563 50.116 1
#> 1348 HETATM  1349   O1A <NA>   GDP     A   274   <NA> 64.200 76.858 50.463 1
#> 1349 HETATM  1350   O2A <NA>   GDP     A   274   <NA> 62.459 75.187 49.671 1
#> 1350 HETATM  1351   O5' <NA>   GDP     A   274   <NA> 61.927 76.929 51.222 1
#> 1351 HETATM  1352   C5' <NA>   GDP     A   274   <NA> 61.690 78.290 51.572 1
#> 1352 HETATM  1353   C4' <NA>   GDP     A   274   <NA> 61.260 78.393 53.002 1
#> 1353 HETATM  1354   O4' <NA>   GDP     A   274   <NA> 59.989 77.748 53.185 1
#> 1354 HETATM  1355   C3' <NA>   GDP     A   274   <NA> 62.181 77.747 54.015 1
#> 1355 HETATM  1356   O3' <NA>   GDP     A   274   <NA> 62.291 78.499 55.179 1
#> 1356 HETATM  1357   C2' <NA>   GDP     A   274   <NA> 61.548 76.420 54.295 1
#> 1357 HETATM  1358   O2' <NA>   GDP     A   274   <NA> 61.846 76.085 55.643 1
#> 1358 HETATM  1359   C1' <NA>   GDP     A   274   <NA> 60.078 76.792 54.224 1
#> 1359 HETATM  1360    N9 <NA>   GDP     A   274   <NA> 59.258 75.630 53.844 1
#> 1360 HETATM  1361    C8 <NA>   GDP     A   274   <NA> 59.255 75.041 52.612 1
#> 1361 HETATM  1362    N7 <NA>   GDP     A   274   <NA> 58.334 74.158 52.460 1
#> 1362 HETATM  1363    C5 <NA>   GDP     A   274   <NA> 57.550 74.278 53.590 1
#> 1363 HETATM  1364    C6 <NA>   GDP     A   274   <NA> 56.499 73.638 53.877 1
#> 1364 HETATM  1365    O6 <NA>   GDP     A   274   <NA> 56.005 72.734 53.233 1
#> 1365 HETATM  1366    N1 <NA>   GDP     A   274   <NA> 55.907 74.049 55.053 1
#> 1366 HETATM  1367    C2 <NA>   GDP     A   274   <NA> 56.436 74.939 55.901 1
#> 1367 HETATM  1368    N2 <NA>   GDP     A   274   <NA> 55.747 75.129 57.025 1
#> 1368 HETATM  1369    N3 <NA>   GDP     A   274   <NA> 57.574 75.569 55.662 1
#> 1369 HETATM  1370    C4 <NA>   GDP     A   274   <NA> 58.110 75.138 54.493 1
#>          b segid elesy charge
#> 1341 20.79  <NA>  <NA>   <NA>
#> 1342 31.08  <NA>  <NA>   <NA>
#> 1343 30.69  <NA>  <NA>   <NA>
#> 1344 24.13  <NA>  <NA>   <NA>
#> 1345 29.87  <NA>  <NA>   <NA>
#> 1346 29.39  <NA>  <NA>   <NA>
#> 1347 32.94  <NA>  <NA>   <NA>
#> 1348 38.15  <NA>  <NA>   <NA>
#> 1349 39.73  <NA>  <NA>   <NA>
#> 1350 37.11  <NA>  <NA>   <NA>
#> 1351 37.93  <NA>  <NA>   <NA>
#> 1352 36.58  <NA>  <NA>   <NA>
#> 1353 40.35  <NA>  <NA>   <NA>
#> 1354 36.25  <NA>  <NA>   <NA>
#> 1355 38.09  <NA>  <NA>   <NA>
#> 1356 39.40  <NA>  <NA>   <NA>
#> 1357 43.75  <NA>  <NA>   <NA>
#> 1358 40.06  <NA>  <NA>   <NA>
#> 1359 39.43  <NA>  <NA>   <NA>
#> 1360 38.59  <NA>  <NA>   <NA>
#> 1361 38.35  <NA>  <NA>   <NA>
#> 1362 35.85  <NA>  <NA>   <NA>
#> 1363 37.59  <NA>  <NA>   <NA>
#> 1364 39.03  <NA>  <NA>   <NA>
#> 1365 38.56  <NA>  <NA>   <NA>
#> 1366 36.65  <NA>  <NA>   <NA>
#> 1367 34.76  <NA>  <NA>   <NA>
#> 1368 37.24  <NA>  <NA>   <NA>
#> 1369 37.60  <NA>  <NA>   <NA>
pqr$xyz[inds$xyz]
#>  [1] 65.614 76.977 46.715 62.667 77.781 47.505 61.587 77.413 46.626 63.294
#> [11] 79.098 47.336 63.804 76.731 47.410 62.281 77.644 49.012 62.781 76.563
#> [21] 50.116 64.200 76.858 50.463 62.459 75.187 49.671 61.927 76.929 51.222
#> [31] 61.690 78.290 51.572 61.260 78.393 53.002 59.989 77.748 53.185 62.181
#> [41] 77.747 54.015 62.291 78.499 55.179 61.548 76.420 54.295 61.846 76.085
#> [51] 55.643 60.078 76.792 54.224 59.258 75.630 53.844 59.255 75.041 52.612
#> [61] 58.334 74.158 52.460 57.550 74.278 53.590 56.499 73.638 53.877 56.005
#> [71] 72.734 53.233 55.907 74.049 55.053 56.436 74.939 55.901 55.747 75.129
#> [81] 57.025 57.574 75.569 55.662 58.110 75.138 54.493
## See the help page for atom.select() function for more details.
# }