Quantifies residue conservation in a given protein sequence alignment by calculating the degree of amino acid variability in each column of the alignment.

conserv(x, method = c("similarity","identity","entropy22","entropy10"),
        sub.matrix = c("bio3d", "blosum62", "pam30", "other"),
        matrix.file = NULL, normalize.matrix = TRUE)

Arguments

x

an alignment list object with id and ali components, similar to that generated by read.fasta.

method

the conservation assessment method.

sub.matrix

the substitution matrix used to score conservation.

matrix.file

a file name of an arbitrary user matrix.

normalize.matrix

logical, if TRUE the matrix is normalized prior to assessing conservation.

Details

To assess the level of sequence conservation at each position in an alignment, the “similarity”, “identity”, and “entropy” per position can be calculated.

The “similarity” is defined as the average of the similarity scores of all pairwise residue comparisons for that position in the alignment, where the similarity score between any two residues is the score value between those residues in the chosen substitution matrix “sub.matrix”.
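The pairwise averaging described above can be sketched in base R. This is a minimal illustration, not the bio3d implementation: the helper `column_similarity` and the three-letter `toy_matrix` are hypothetical, and the matrix values are invented for the example rather than taken from any real substitution matrix.

```r
# Toy substitution matrix over a three-letter alphabet (hypothetical values,
# not taken from any real matrix such as BLOSUM62).
aa <- c("A", "V", "D")
toy_matrix <- matrix(c(1.0, 0.6, 0.0,
                       0.6, 1.0, 0.1,
                       0.0, 0.1, 1.0),
                     nrow = 3, byrow = TRUE, dimnames = list(aa, aa))

# Average the substitution scores over all distinct residue pairs in a column.
column_similarity <- function(column, sub_matrix) {
  pairs <- combn(column, 2)  # 2 x choose(n, 2) matrix of residue pairs
  scores <- sub_matrix[cbind(pairs[1, ], pairs[2, ])]  # look up each pair by name
  mean(scores)
}

conserved_score <- column_similarity(c("A", "A", "A", "A"), toy_matrix)
mixed_score <- column_similarity(c("A", "V", "A", "D"), toy_matrix)
print(c(conserved = conserved_score, mixed = mixed_score))
```

With a matrix whose diagonal is 1, a fully conserved column averages to 1, while the mixed column is pulled down by its low-scoring pairs.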

The “identity”, i.e. the preference for a specific amino acid to be found at a certain position, is assessed by averaging the identity scores resulting from all possible pairwise comparisons at that position in the alignment, where all identical residue comparisons are given a score of 1 and all other comparisons are given a score of 0.
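As a rough sketch of this definition, the fraction of identical pairs in each column can be computed in base R. The helper `column_identity` and the alignment `toy_ali` are hypothetical names used only for illustration, and the sketch does not treat gap characters specially, which the package itself may do.

```r
# Fraction of residue pairs in a column that are identical
# (identical pair = 1, any other pair = 0, then averaged).
column_identity <- function(column) {
  pairs <- combn(column, 2)  # all distinct residue pairs in the column
  mean(pairs[1, ] == pairs[2, ])
}

# Toy alignment: three sequences (rows) by three positions (columns).
toy_ali <- rbind(c("P", "Q", "I"),
                 c("P", "Q", "V"),
                 c("P", "E", "L"))

identity_scores <- apply(toy_ali, 2, column_identity)
print(identity_scores)
```

The first column is fully conserved, the second has one matching pair out of three, and the third has none.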

“Entropy” is based on Shannon's information entropy. See the entropy function for further details.
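For orientation, a Shannon entropy score for a single column might be sketched as follows. The helper `column_entropy` is hypothetical, and the normalization `1 - H / log2(22)` is an assumption suggested by the method name "entropy22"; the entropy function's own documentation is the authority on the alphabet and scaling actually used.

```r
# Shannon entropy of one alignment column, rescaled so that a fully
# conserved column scores 1. The alphabet size of 22 is an assumption.
column_entropy <- function(column, alphabet_size = 22) {
  freqs <- table(column) / length(column)  # observed residue frequencies
  h <- -sum(freqs * log2(freqs))           # Shannon entropy in bits
  1 - h / log2(alphabet_size)
}

conserved_entropy <- column_entropy(c("G", "G", "G", "G"))
split_entropy <- column_entropy(c("A", "A", "V", "V"))
print(c(conserved = conserved_entropy, split = split_entropy))
```

Because only residue frequencies enter the calculation, a column split between two chemically similar residues scores the same as one split between two dissimilar residues.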

Note that the returned scores are normalized so that conserved columns score 1 and diverse columns score 0.

Value

Returns a numeric vector of scores, one per alignment column.

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.

Grant, B.J. et al. (2007) J. Mol. Biol. 368, 1231–1248.

Author

Barry Grant

Note

Each of these conservation scores has particular strengths and weaknesses. For example, entropy elegantly captures amino acid diversity but fails to account for stereochemical similarities. By employing a combination of scores and taking the union of their respective conservation signals we expect to achieve a more comprehensive analysis of sequence conservation (Grant, 2007).
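One way to take such a union, sketched under stated assumptions: the score vectors below are invented placeholders standing in for the output of two conserv calls, and the cutoff of 0.9 is a hypothetical threshold, not a value recommended by the package or the cited paper.

```r
# Placeholder per-column scores (invented for illustration only).
similarity_scores <- c(0.98, 0.62, 1.00, 0.41, 0.95)
entropy_scores    <- c(0.97, 0.93, 1.00, 0.35, 0.70)

cutoff <- 0.9  # hypothetical conservation threshold

# Columns flagged as conserved by each score on its own.
conserved_by_similarity <- which(similarity_scores >= cutoff)
conserved_by_entropy    <- which(entropy_scores >= cutoff)

# Union: columns flagged by at least one of the two scores.
conserved_union <- sort(union(conserved_by_similarity, conserved_by_entropy))
print(conserved_union)
```

Here column 2 is picked up only by the entropy score and column 5 only by the similarity score, so the union recovers positions that either score alone would miss.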

See also

read.fasta, read.fasta.pdb

Examples

## Read an example alignment
aln <- read.fasta(system.file("examples/hivp_xray.fa",package="bio3d"))

## Score conservation
conserv(x=aln$ali, method="similarity", sub.matrix="bio3d")
#>   [1] 9.765816e-01 9.952941e-01 8.513097e-01 9.733885e-01 9.952941e-01
#>   [6] 9.952941e-01 6.199423e-01 9.850477e-01 1.000000e+00 9.379756e-01
#>  [11] 1.000000e+00 1.000000e+00 9.766926e-01 7.709889e-01 1.000000e+00
#>  [16] 9.790233e-01 1.000000e+00 1.000000e+00 9.766926e-01 9.258624e-01
#>  [21] 9.905993e-01 9.790233e-01 1.000000e+00 9.730255e-01 8.756582e-01
#>  [26] 1.000000e+00 1.000000e+00 9.915494e-01 1.000000e+00 9.776781e-01
#>  [31] 9.545105e-01 9.590455e-01 6.639556e-01 9.443130e-01 9.271010e-01
#>  [36] 9.153596e-01 5.746038e-01 1.000000e+00 9.743618e-01 9.761776e-01
#>  [41] 7.836815e-01 9.766926e-01 9.577003e-01 1.000000e+00 9.518668e-01
#>  [46] 9.404950e-01 9.750677e-01 9.831099e-01 9.534406e-01 9.534406e-01
#>  [51] 1.109878e-05 1.109878e-05 1.000000e+00 1.000000e+00 1.000000e+00
#>  [56] 9.564218e-01 9.790233e-01 9.766926e-01 9.860155e-01 9.813541e-01
#>  [61] 1.000000e+00 9.766926e-01 9.674029e-01 9.696737e-01 4.100422e-01
#>  [66] 8.215583e-01 1.000000e+00 9.860155e-01 5.633552e-01 9.813097e-01
#>  [71] 9.790233e-01 9.971765e-01 8.923885e-01 9.696737e-01 9.831787e-01
#>  [76] 1.000000e+00 9.860155e-01 9.836848e-01 9.766926e-01 1.000000e+00
#>  [81] 9.720311e-01 1.000000e+00 1.000000e+00 6.371454e-01 1.000000e+00
#>  [86] 8.794939e-01 9.674029e-01 1.000000e+00 1.000000e+00 9.776781e-01
#>  [91] 9.850477e-01 9.340866e-01 1.000000e+00 9.743618e-01 9.813541e-01
#>  [96] 1.000000e+00 5.092952e-01 9.790233e-01 1.000000e+00 1.000000e+00
#> [101] 9.850477e-01

##conserv(x=aln$ali,method="entropy22", sub.matrix="other")