The Domains & Families tab
The Domains and Families tab shows hits from the gene to domains and gene families in various databases. For each hit, an id, description, range, and coordinates are shown. Mousing over the id shows the member database's name, and clicking the id opens a new browser window to that database's page for the hit, if applicable. Where an e-value is reported by HMMER (or by rpsblast for COG, or by BLASTp for PDB), mousing over it shows the score as reported by HMMER, rpsblast, BLASTp, or InterProScan. (For BlastProDom and ProfileScan, which do not use HMMER, e-values are not currently shown.)

Many models that are part of our analysis pipeline have been classified by InterPro. When a FastHMM or HMMER hit has an InterPro entry, the InterPro description is shown, and it links to the page describing that particular domain or family. Mousing over the description shows the InterPro id of the hit, if available. The range is color-coded so that hits with the same InterPro id have the same color; unclassified hits are gray. A legend is at the bottom of the page. For PDB hits, we also show the percent identity returned by BLASTp. If available, mousing over the range shows the coverage of the hit to the target, as well as the length of the entire domain or family.

We have built gene trees for most domains. Clicking the red T takes you to the gene tree for that gene and domain.

You can sort the domain hits by start, by IPR, by domain database, or by category. Sorting by start lists each hit from the beginning of the gene to the end; if multiple hits have the same start, longer hits are shown first. Sorting by IPR groups hits with the same IPR id together; unclassified hits are shown after all classified hits. Sorting by domain database sorts by start within each database.
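The first three sort orders described above can be illustrated with a minimal Python sketch. This is not the site's actual code: the `Hit` record, its field names, and the sample ids are hypothetical. The order of IPR groups relative to each other, the order of hits inside an IPR group, and the order of the databases are not stated in this help page, so the sketch assumes alphabetical order with ties broken by start.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical record for one domain/family hit (field names are assumptions)."""
    hit_id: str
    database: str
    start: int
    end: int
    ipr: Optional[str] = None  # None means the hit has no InterPro classification


def _length(hit: Hit) -> int:
    return hit.end - hit.start


def sort_by_start(hits: List[Hit]) -> List[Hit]:
    # Ascending start; when starts tie, the longer hit comes first.
    return sorted(hits, key=lambda h: (h.start, -_length(h)))


def sort_by_ipr(hits: List[Hit]) -> List[Hit]:
    # Classified hits first, grouped by IPR id; unclassified hits last.
    # Ordering groups alphabetically and by start within a group is an assumption.
    return sorted(hits, key=lambda h: (h.ipr is None, h.ipr or "", h.start, -_length(h)))


def sort_by_database(hits: List[Hit]) -> List[Hit]:
    # By start within each database; alphabetical database order is an assumption.
    return sorted(hits, key=lambda h: (h.database, h.start, -_length(h)))


# Made-up sample data, for illustration only.
example_hits = [
    Hit("tigr_1", "TIGRFam", 130, 228, "IPR_B"),
    Hit("smart_1", "SMART", 5, 118, "IPR_A"),
    Hit("cog_1", "COG", 1, 230),
    Hit("pfam_2", "Pfam", 140, 200),
    Hit("pfam_1", "Pfam", 5, 120, "IPR_A"),
]

for name, fn in [("start", sort_by_start), ("IPR", sort_by_ipr), ("database", sort_by_database)]:
    print(name, [h.hit_id for h in fn(example_hits)])
```

Each order is expressed as a single tuple key, so the tie-breaking rule (longer hits first) is shared by all three.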
Sorting by domain/family/pdb/site groups each hit based on whether InterPro classifies it as a domain or a gene family; unclassified or other hits are shown under Sites, and PDB hits are grouped separately as well.

We are currently transitioning our HMM-based analyses to HMMER version 3. For the March 2010 release, HMMER 3 is still officially in beta testing, so we are using version 3.0b3. (3.0rc1 was released too late to be used for this release.) Because HMMER 3 does not convert HMMER 2 model parameters (see this blog entry), we have opted to use FastHMM for our TIGRFam (release 9.0) analysis. (When TIGRFams are released with HMMER 3 models, which should be for TIGRFam 10.0, we will switch to using HMMER 3 for TIGRFam.) We use HMMER 3 for Pfam (version 24), Superfamily (version 1.73), and the other InterPro member databases (Gene3D, PIRSF, SMART, PANTHER), using the InterPro versions of those databases.

InterPro is a database that classifies the models of various member databases, so that similar models all have the same InterPro id. InterProScan is a tool that allows users to analyze genes against all InterPro member databases. We use an internally modified version of InterProScan for all non-HMM-based analyses. For HMM-based analyses we instead use HMMER 3 or FastHMM.

For previous releases, the FastHMM pipeline replaced hmmpfam for searching many HMMs against many query sequences. We use or generate alignments for each family of HMMs, then use PSIBLAST with a high cutoff to analyze each gene sequence against each alignment. The PSIBLAST output is filtered to find candidate sequences for each HMM, and only those sequences are analyzed using hmmsearch against the matching HMM. The result is that, instead of analyzing over one million genes against each HMM, we analyze on the order of a few hundred, thus reducing processor time by about 20-fold. We have found no false positives and very few false negatives compared to running hmmsearch.
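The two-stage idea behind this pipeline can be sketched in Python as follows. This is an illustration only, not FastHMM's actual code: the function names, the tuple layout of the prefilter output, and the sample data are hypothetical, and the sketch assumes the "high cutoff" is a permissive e-value threshold. It does not run PSIBLAST or hmmsearch; it only shows how the prefilter output narrows the set of sequences passed to each HMM.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def candidates_per_family(
    prefilter_hits: Iterable[Tuple[str, str, float]],
    evalue_cutoff: float,
) -> Dict[str, Set[str]]:
    """Group gene ids by family, keeping prefilter hits at or below the cutoff.

    Each prefilter hit is a (family, gene, e-value) tuple (hypothetical layout).
    """
    candidates: Dict[str, Set[str]] = defaultdict(set)
    for family, gene, evalue in prefilter_hits:
        if evalue <= evalue_cutoff:
            candidates[family].add(gene)
    return dict(candidates)


def second_stage_jobs(
    candidates: Dict[str, Set[str]],
    family_to_hmms: Dict[str, List[str]],
) -> List[Tuple[str, List[str]]]:
    """Pair each HMM with only the candidate genes found for its family."""
    jobs = []
    for family, genes in candidates.items():
        for hmm in family_to_hmms.get(family, []):
            jobs.append((hmm, sorted(genes)))
    return jobs


# Made-up prefilter output and family membership, for illustration only.
prefilter_hits = [
    ("famA", "gene1", 1e-5),
    ("famA", "gene2", 5.0),
    ("famA", "gene3", 50.0),  # above the cutoff, so it is dropped
    ("famB", "gene2", 0.1),
]
family_to_hmms = {"famA": ["hmmA1", "hmmA2"], "famB": ["hmmB1"]}

candidates = candidates_per_family(prefilter_hits, evalue_cutoff=10.0)
jobs = second_stage_jobs(candidates, family_to_hmms)
for hmm, genes in jobs:
    print(hmm, genes)
```

In a real run, each job would correspond to one hmmsearch of a single HMM against a small candidate file, rather than against every gene in the database.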
You can learn more about FastHMM, or download it, here. The June 2009 release of MicrobesOnline, the last to use FastHMM for all HMM databases, used these versions of external HMM databases:
Last updated February 2010