The Signatures field lists the protein signature matches. For each protein signature, the member database, signature ID, signature name and number of proteins it matches are given. The member database names are linked to their respective home pages, and the signature IDs are linked to the corresponding entry information pages.
Functional classification of the entry is given by listing associated GO terms. The Gene Ontology (GO) project (http://www.geneontology.org/) provides a dynamic controlled vocabulary organised into three ontologies: molecular function, biological process and cellular component.
For each associated term, the name of the term and the GO accession number are given. GO terms were assigned to InterPro entries manually, by reading the abstract of each entry and the annotation of the proteins in its protein match table. An appropriate GO term for an entry is one that applies to the whole protein. The GO terms associated with an InterPro entry apply to all proteins with true hits to the signatures in that entry. The assignments are incomplete and ongoing, owing to the dynamic nature of the GO project. Some entries could be mapped to very low level (specific) GO terms, while entries describing wider families or common domains were mapped to higher level terms or could not be mapped at all. The GO terms and mappings can be found using the EBI QuickGO browser.
It is important to remember that these mappings provide useful predictions of GO assignments for the corresponding proteins; however, biological exceptions, such as inactivated enzymes, may occur.
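The rule that an entry's GO terms apply to every protein with a true hit can be illustrated with a short sketch. The table layout and the names `ENTRY_GO`, `MATCHES` and `predict_go` are assumptions made for illustration, not an InterPro interface; the accession X00001 is made up, while the other identifiers are taken from the IPR001413 example on this page.

```python
from collections import defaultdict

# Hypothetical in-memory tables; InterPro does not necessarily store data this way.
ENTRY_GO = {
    "IPR001413": ["GO:0007186", "GO:0004952", "GO:0016021"],
}

# (protein accession, InterPro entry, match status)
MATCHES = [
    ("P21728", "IPR001413", "TRUE"),
    ("X00001", "IPR001413", "FALSE_POSITIVE"),  # made-up accession
]


def predict_go(matches, entry_go):
    """Give each protein the GO terms of every entry it matches with status TRUE."""
    predicted = defaultdict(set)
    for protein, entry, status in matches:
        if status == "TRUE":
            predicted[protein].update(entry_go.get(entry, []))
    return dict(predicted)


predictions = predict_go(MATCHES, ENTRY_GO)
```

Only P21728 receives predicted terms here. The results remain predictions, so exceptions such as inactivated enzymes would still need manual review.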
The Abstract describes the signatures in the entry, the protein matches, the taxonomic range and provides references. Where possible a functional inference is made.
Database links include cross-references to:
The Taxonomy Display aims to provide an 'at a glance' view of the taxonomic range of the sequences associated with each InterPro entry and the number of sequences associated with each lineage. The numbers associated with each taxonomic lineage are clickable and link to the protein overview matches for the selected taxonomy, with the species sorted and displayed alphabetically. Full taxonomic information for a species can be retrieved from the Newt taxonomy browser by clicking on the taxonomic ID number next to the species name on the display. Both the protein accession number and the protein overview match are clickable and return the detailed matches view for the protein. For proteins with a known structure, a link to the MSD is provided in the InterPro name column.
The lineages were carefully selected to provide a view of the major groups of organisms. The circular display has the taxonomy-tree root at its centre. The selected model organisms populate the outermost circle. Nodes of the taxonomy tree are placed on the inner circles. Radial lines lead to the description for each node. No significance is attached to the position of a node on a particular inner circle, other than convenience, though some attempt has been made to group nodes. The nodes are either true taxonomy nodes, which have an NCBI taxonomy number, or artificial nodes created for this display. There are three artificial nodes: 'Unclassified', 'Other Eukaryota (Non-Metazoa)' and the 'Plastid Group'.
Artificial Taxon: 'Unclassified' contains the following NCBI taxon groups:
The Eukaryota (TAXONOMY:2759) comprises 29 taxons, which have been grouped into two artificial taxons and one existing taxon:
Fungi/Metazoa (TAXONOMY:33154); Node 'Metazoa'
Artificial Taxon: 'Plastid Group' contains the following NCBI taxon groups:
Each taxonomic group within this artificial taxon contains organisms that have a plastid.
Artificial Taxon: 'Other Eukaryotes (Non-Metazoa)' comprises the following NCBI taxon groups:
The taxonomic groups within this artificial taxon are the remaining groups of NCBI taxon 2759 that are in neither the Plastid Group nor Fungi/Metazoa (TAXONOMY:33154).
Note that many UniProt proteins do not have a database cross-reference to InterPro (DR line); therefore, not all sequence records associated with an InterPro entry can be recovered by using the InterPro accession number and the taxonomy group as search terms in SRS. In addition, some PROSITE signatures give false positives, which could result in a misleading taxonomy display. Some protein records may have more than one taxonomy, for example where a mouse and a human sequence have been merged; this results in multiple taxonomy counts for a protein.
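The effect of merged records on the counts can be seen in a small sketch. It assumes each protein record carries a list of taxonomy groups; the record names, group labels and the function `lineage_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical records: protein accession -> taxonomy groups on the record.
PROTEIN_TAXA = {
    "PROT_A": ["Human"],
    "PROT_B": ["Human", "Mouse"],  # merged mouse/human record
    "PROT_C": ["Mouse"],
}


def lineage_counts(protein_taxa):
    """Count each protein once for every distinct taxonomy group it carries."""
    counts = Counter()
    for taxa in protein_taxa.values():
        counts.update(set(taxa))
    return counts


counts = lineage_counts(PROTEIN_TAXA)
total_counted = sum(counts.values())
```

Three proteins produce four counts in this example, because the merged record is counted under both lineages.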
This section displays entries that share more than 70% of their proteins. Such overlaps define Parent/Child and Contains/Found In relationships between InterPro entries.
IPR000001 | Numbers of overlapping proteins | Average numbers of overlapping amino acids |
In the above example, InterPro entry IPR008293 contains proteins which are also found in IPR000001 as a result of the protein signatures of the two entries overlapping.
The two entries have been compared firstly by counting the number of proteins which are common to both, the results of which are displayed in the Venn diagram on the left, and secondly by calculating the average overlap of the protein signatures, in amino acids, with the results displayed in the bar diagram on the right.
Venn diagram display of the overlap of proteins common to both entries:
Bar diagram display of the average amino acid overlap between the protein signatures:
The average number of amino acids overlapping in the sequences of the 10 proteins common to both entries is then calculated, with the results displayed in the bar diagram on the right. The bar diagram display is only shown for 'Domain - Domain' relationships.
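The two comparisons can be sketched as follows. This is one plausible reading of the description, not the exact InterPro calculation: it assumes one match per entry per protein, with inclusive (start, end) residue positions. The function names and all coordinates are made up for illustration.

```python
def residue_overlap(a, b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compare_entries(entry_a, entry_b):
    """Each argument maps protein accession -> (start, end) of that entry's match.

    Returns the number of proteins common to both entries and the average
    number of overlapping amino acids on those common proteins.
    """
    common = sorted(set(entry_a) & set(entry_b))
    if not common:
        return 0, 0.0
    total = sum(residue_overlap(entry_a[p], entry_b[p]) for p in common)
    return len(common), total / len(common)


# Made-up coordinates for illustration only.
entry_a = {"P1": (10, 109), "P2": (5, 54), "P3": (1, 30)}
entry_b = {"P1": (60, 159), "P2": (20, 39), "P4": (1, 50)}

n_common, avg_overlap = compare_entries(entry_a, entry_b)
```

Here two proteins are common to both entries, overlapping by 50 and 20 residues, so the average overlap is 35 amino acids.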
The protein entries in the examples have a match status of TRUE and illustrate, as far as possible, the diversity in structure and function of the proteins in the InterPro entry. For each example protein, the accession number, UniProt name and a compact view of the matches are given.
The reference field provides a list of publications associated with each InterPro entry. The list is often derived from the reference lists of the member databases.
InterPro protein matches are now calculated for all UniProt proteins, which are a combination of UniProt/Swiss-Prot, UniProt/TrEMBL and PIR proteins. For more information go to the UniProt home page.
Match lists give a number of different views of the signature matches on the sequences in each InterPro entry. Match information includes the protein sequence accession number, the accession number of the signature (PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER), the position of the signature on the protein sequence and the status of the match (true, false positive, false negative or unknown).
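The match information listed above could be held in a record like the one below. This is a sketch only: the class and field names are hypothetical, the signature accession SIG0001 is a placeholder, and 1-based inclusive positions are an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class MatchStatus(Enum):
    TRUE = "true"
    FALSE_POSITIVE = "false positive"
    FALSE_NEGATIVE = "false negative"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class SignatureMatch:
    protein_accession: str
    signature_accession: str  # member database signature ID
    start: int  # first matched residue (assumed 1-based)
    end: int  # last matched residue (assumed inclusive)
    status: MatchStatus

    def length(self) -> int:
        """Number of residues covered by the match."""
        return self.end - self.start + 1


# Placeholder signature accession; the position is illustrative.
match = SignatureMatch("P21728", "SIG0001", 9, 25, MatchStatus.TRUE)
```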
Accession numbers provide a stable way of identifying InterPro entries from release to release. When the signatures in an InterPro entry are split or merged to give new or modified entries, then the accession number of the original InterPro entry becomes the secondary accession number in the new or modified InterPro entry.
Following a recent change, accession numbers are linked to methods, so any accession number that has been associated with a method becomes a secondary accession number in the entry in which the method currently appears. This makes it possible to trace the movement of methods through the splitting and merging of entries.
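Tracing a retired accession number to the entries that now carry it as a secondary accession can be sketched as below. It assumes each current entry is available with its list of secondary accession numbers; the function names and every accession number in the sample data are made up.

```python
from collections import defaultdict


def build_secondary_index(entries):
    """entries maps primary accession -> list of its secondary accessions."""
    index = defaultdict(set)
    for primary, secondaries in entries.items():
        for secondary in secondaries:
            index[secondary].add(primary)
    return index


def current_entries(accession, entries, index):
    """Return the current entries that an accession number refers to."""
    if accession in entries:
        return {accession}
    return set(index.get(accession, set()))


# Made-up accessions: one merge (two entries into one) and one split.
ENTRIES = {
    "IPR900003": ["IPR900001", "IPR900002"],  # merged entry
    "IPR900005": ["IPR900004"],  # first product of a split
    "IPR900006": ["IPR900004"],  # second product of a split
}

index = build_secondary_index(ENTRIES)
merged = current_entries("IPR900001", ENTRIES, index)
split = current_entries("IPR900004", ENTRIES, index)
```

A merged accession resolves to a single current entry, while a split accession resolves to every entry that inherited it.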
Every InterPro entry has an accession number of the form IPRXXXXXX, where X is a digit. Accession numbers are stable, providing a consistent way of identifying InterPro entries and allowing unambiguous citation of database entries.
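The accession format can be checked with a regular expression. A minimal sketch, assuming the six X characters in IPRXXXXXX mean exactly six digits (as in IPR001413) and that the prefix is upper case; the function name is hypothetical.

```python
import re

# "IPR" followed by exactly six digits (assumed from the IPRXXXXXX pattern).
INTERPRO_ACCESSION = re.compile(r"IPR\d{6}")


def is_interpro_accession(text):
    """Return True if text has the form of an InterPro accession number."""
    return INTERPRO_ACCESSION.fullmatch(text) is not None
```

For example, `is_interpro_accession("IPR001413")` returns True, while `is_interpro_accession("IPR1413")` returns False.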
The InterPro entry Name describes the InterPro entry and should give an idea of the type of protein matches for that entry.
Type defines the entry as a Family, Domain, Repeat or Site. Sites are further classified as PTM (post-translational modification), AS (active site) or BS (binding site).
An InterPro family is a group of evolutionarily related proteins that share similar domain (or repeat) architecture. One or more signatures may define an InterPro Family and a single signature may not necessarily cover the whole protein. A signature may also define a group of proteins with more than one function - a superfamily. A list of the current Families in InterPro is available: Family List.
An InterPro domain is an independent structural unit, which can be found alone or in conjunction with other domains or repeats. Domains are evolutionarily related. An InterPro entry of Type=Domain is diagnostic for a domain but does not necessarily define the domain boundaries exactly. A list of the current Domains in InterPro is available: Domain List.
An InterPro repeat is a region that is not expected to fold into a globular domain on its own. For example, 6-8 copies of the WD40 repeat are needed to form a single globular domain. There are also many other short repeat motifs of Type=Repeat that probably do not form a globular fold. A list of the current Repeats in InterPro is available: Repeat List.
A post-translational modification modifies the primary protein structure. The modification may be permanent or reversible, and may be required for activation or deactivation of function. Examples include glycosylation, phosphorylation, sulphation and splicing. To be recognised in InterPro, the sequence signature must be described. Many PTM sites have low specificity, so the number of proteins recognised by their sequence signatures cannot be displayed. Such signatures also group together many functionally unrelated proteins. A list of the current PTMs in InterPro is available: PTM List.
An InterPro binding site binds chemical compounds that are not themselves substrates for a reaction. The bound compound may be a required co-factor for a chemical reaction, be involved in electron transport or be involved in protein structure modification. The binding is reversible, and the amino acids involved in the binding must be described for a site to be defined. A list of the current Binding Sites in InterPro is available: Binding Site List.
Active sites are best known as the catalytic pockets of enzymes where a substrate is bound and converted to a product, which is then released. Distant parts of a protein's primary structure may be involved in the formation of the catalytic pocket. Therefore, to describe an active site, different signatures will be needed to cover the active site residues. A list of the current Active Sites in InterPro is available: Active Site List.
In some cases, no matches are shown for an InterPro entry because of the low specificity of its signature(s): the number of hits is excessive and includes many false positives. In the case of some PTMs, the signatures are either general rules or weak patterns, resulting in a large number of matches. The InterPro entries affected are:
The short name is a short, concise name unique to each InterPro entry.
The number of proteins, with match status TRUE, matching one or more of the entry signatures is displayed next to the short name.
Matches
Accession | IPR001413 Dopa1A_receptor Matches: 18 proteins
Type | Family
Signatures
Tree | IPR000929 Dopamine receptor
Process | GO:0007186 G-protein coupled receptor protein signaling pathway
Function | GO:0004952 dopamine receptor activity
Component | GO:0016021 integral to membrane
Abstract
Dopamine neurons in the vertebrate central nervous system are involved in the initiation and execution of movement, the maintenance of emotional stability, and the regulation of pituitary function [5]. Various human neurological diseases (e.g., Parkinson disease and schizophrenia), are believed to be manifestations of dopamine and dopamine receptor imbalance. The receptors have been divided into several different subtypes, distinguished by their G-protein coupling, ligand specificity, anatomical distribution and physiological effects. D1 receptors are found in greatest abundance in the caudate-putamen, nucleus accumbens and olfactory tubercle, with lower levels in the frontal cortex, habenula, amygdala, hypothalamus and thalamus. In the periphery, binding sites are found in the kidney, heart, liver and parathyroid gland. The receptors stimulate adenylyl cyclase through G proteins; they may also be able to stimulate phosphoinositide metabolism [6].
G-protein-coupled receptors, GPCRs, constitute a vast protein family that encompasses a wide range of functions (including various autocrine, paracrine and endocrine processes). They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. We use the term clan to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence [1]. The currently known clan members include the rhodopsin-like GPCRs, the secretin-like GPCRs, the cAMP receptors, the fungal mating pheromone receptors, and the metabotropic glutamate receptor family. There is a specialized database for GPCRs: http://www.gpcr.org/7tm/.
The rhodopsin-like GPCRs themselves represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [2, 3, 4].
Database links | IUPHAR: 2.1:DA:1:D1A:
Taxonomic coverage
Overlapping InterPro entries | Rhodopsin-like GPCR superfamily
Example proteins | P21728 D(1A) dopamine receptor
IPR001413 Dopamine 1A receptor 9-25
IPR000276 Rhodopsin-like GPCR superfamily 25-339
IPR000929 Dopamine receptor 45-55
IPR000929 Dopamine receptor 80-89
IPR000929 Dopamine receptor 124-132
IPR001413 Dopamine 1A receptor 233-244
IPR001413 Dopamine 1A receptor 245-261
IPR000929 Dopamine receptor 308-319
IPR000929 Dopamine receptor 332-346
IPR001413 Dopamine 1A receptor 351-369
IPR001413 Dopamine 1A receptor 371-392
IPR001413 Dopamine 1A receptor 410-443
Publications
1. Attwood T.K., Findlay J.B.C. Fingerprinting G-protein-coupled receptors. Protein Eng. 7: 195-203 (1994) [PubMed: 8170923]
2. Birnbaumer L. G-proteins in signal transduction. Annu. Rev. Pharmacol. Toxicol. 30: 675-705 (1990) [PubMed: 2111655]
3. Casey P.J., Gilman A.G. G-protein involvement in receptor-effector coupling. J. Biol. Chem. 263: 2577-2580 (1988) [PubMed: 2830256]
4. Attwood T.K., Findlay J.B.C. Design of a discriminating fingerprint for G-protein-coupled receptors. Protein Eng. 6: 167-176 (1993) [PubMed: 8386361]
5. Grandy D.K., Marchionni M.A., Makam H., Stofko R.E., Alfano M., Frothingham L., Fischer J.B., Burke-Howie K.J., Bunzow J.R., Server A.C., Civelli O. Cloning of the cDNA and gene for a human D2 dopamine receptor. Proc. Natl. Acad. Sci. U.S.A. 86: 9762-9766 (1989) [PubMed: 2532362]
6. Watson S., Arkinstall S. Dopamine. The G-protein Linked Receptor Factsbook: 96-110 (1994)