hmmer.Rd
Perform a HMMER search against the PDB, NR, swissprot or other sequence and structure databases.
hmmer(seq, type="phmmer", db = NULL, verbose = TRUE, timeout = 90)
seq | a multi-element character vector containing the query
sequence. Alternatively a ‘fasta’ object as obtained
from functions |
---|---|
type | character string specifying the ‘HMMER’ job type. Current options are ‘phmmer’, ‘hmmscan’, ‘hmmsearch’, and ‘jackhmmer’. |
db | character string specifying the database to search. Current options are ‘pdb’, ‘nr’, ‘swissprot’, ‘pfam’, etc. See ‘details’ for a complete list. |
verbose | logical, if TRUE details of the download process is printed. |
timeout | integer specifying the number of seconds to wait for the blast reply before a time out occurs. |
This function employs direct HTTP-encoded requests to the HMMER web server. HMMER can be used to search sequence databases for homologous protein sequences. The HMMER server implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
There are currently four types of HMMER search to perform:
- ‘phmmer’: protein sequence vs protein sequence database.
(input argument seq
must be a sequence).
Allowed options for type
includes:
‘env_nr’, ‘nr’, ‘refseq’, ‘pdb’,
‘rp15’, ‘rp35’, ‘rp55’, ‘rp75’,
‘swissprot’, ‘unimes’, ‘uniprotkb’,
‘uniprotrefprot’, ‘pfamseq’.
- ‘hmmscan’: protein sequence vs profile-HMM database.
(input argument seq
must be a sequence).
Allowed options for type
includes:
‘pfam’, ‘gene3d’, ‘superfamily’, ‘tigrfam’.
- ‘hmmsearch’: protein alignment/profile-HMM vs protein sequence
database.
(input argument seq
must be an alignment).
Allowed options for type
includes:
‘pdb’, ‘swissprot’.
- ‘jackhmmer’: iterative search vs protein sequence database.
(input argument seq
must be an alignment).
‘jackhmmer’ functionality incomplete!!
Allowed options for type
includes:
‘env_nr’, ‘nr’, ‘refseq’, ‘pdb’,
‘rp15’, ‘rp35’, ‘rp55’, ‘rp75’,
‘swissprot’, ‘unimes’, ‘uniprotkb’,
‘uniprotrefprot’, ‘pfamseq’.
More information can be found at the HMMER website:
http://hmmer.org
A list object with components ‘hit.tbl’ and ‘url’. ‘hit.tbl’ is a data frame with multiple components depending on the selected job ‘type’. Frequently reported fields include:
a character vector containing the name of the target.
a character vector containing the accession identifier of the target.
a character vector containing secondary accession of the target.
same as ‘acc’.
a character vector containing Identifier of the target
a character vector containing entry description.
a numeric vector containing bit score of the sequence (all domains, without correction).
same as ‘score’.
a numeric vector containing the P-value of the score.
a numeric vector containing the E-value of the score.
a numeric vector containing minus the natural log of the E-value.
a numeric vector containing Number of regions evaluated.
a numeric vector containing the number of envelopes handed over for domain definition, null2, alignment, and scoring.
a numeric vector containing the total number of domains identified in this sequence.
a numeric vector containing the number of domains satisfying reporting thresholding.
a numeric vector containing the number of domains satisfying inclusion thresholding.
a character vector containing The NCBI taxonomy identifier of the target (if applicable).
a character vector containing the species name.
a character vector containing the kingdom of life that the target belongs to - based on placing in the NCBI taxonomy tree.
Note that the chained ‘pdbs’ HMMER field (used for redundant
PDBs) is included directly into the result list (applies only when
db='pdb'
). In this case, the ‘name’ component of the
target contains the parent (non redundant) entry, and the ‘acc’
component the chained PDB identifiers. The search results will therefore
provide duplicated PDB identifiers for component $name
, while
$acc
should be unique.
Grant, B.J. et al. (2006) Bioinformatics 22, 2695--2696.
Finn, R.D. et al. (2011) Nucl. Acids Res. 39, 29--37. Eddy, S.R. (2011) PLoS Comput Biol 7(10): e1002195.
See also the ‘HMMER’ website:
http://hmmer.org
Lars Skjaerven
Online access is required to query HMMER services.
blast.pdb
, plot.blast
,
seqaln
, get.seq
,
pfam
, uniprot
if (FALSE) { # HMMER server connection required - testing excluded ##- PHMMER seq <- get.seq("2abl_A", outfile=tempfile()) res <- hmmer(seq, db="pdb") ##- HMMSCAN fam <- hmmer(seq, type="hmmscan", db="pfam") pfam.aln <- pfam(fam$hit.tbl$acc[1]) ##- HMMSEARCH hmm <- hmmer(pfam.aln, type="hmmsearch", db="pdb") unique(hmm$hit.tbl$species) hmm$hit.tbl$acc }