Hot Spots of PRotein INTerfaces Database


About

Welcome to the Computational Hot Spots of Protein Interfaces (HotSprint) database. HotSprint provides information about the evolutionary history of interface residues and shows which of them are highly conserved. In this way, functionally and structurally important residues on the interface can be distinguished.

Definitions

Interface Residue:
Conventionally, a protein-protein interface is the set of residues that are connected to each other through non-covalent interactions. In HotSprint, two residues on opposite chains of a protein complex are considered interface residues if the distance between any atom of the first residue and any atom of the other residue is below the sum of the van der Waals radii of the two atoms plus 0.5 Å.
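
The distance criterion above can be sketched in Python as follows. This is only an illustration: the van der Waals radii listed here are assumed values (this page does not state which radii set HotSprint uses), and the atom representation and function names are hypothetical.

```python
import math

# Assumed van der Waals radii in angstroms (illustrative values only; the
# radii set actually used by HotSprint is not given on this page).
VDW_RADII = {"C": 1.7, "N": 1.55, "O": 1.52, "S": 1.8}
TOLERANCE = 0.5  # angstroms, taken from the definition above


def atoms_in_contact(atom_a, atom_b):
    """Each atom is a tuple (element, x, y, z) with coordinates in angstroms."""
    elem_a, *xyz_a = atom_a
    elem_b, *xyz_b = atom_b
    cutoff = VDW_RADII[elem_a] + VDW_RADII[elem_b] + TOLERANCE
    return math.dist(xyz_a, xyz_b) < cutoff


def residues_in_contact(residue_a, residue_b):
    """Residues are lists of atoms taken from opposite chains of a complex.

    Two residues count as interface residues if any atom pair is in contact.
    """
    return any(atoms_in_contact(a, b) for a in residue_a for b in residue_b)
```

For example, a carbon and an oxygen atom 3.5 Å apart satisfy the criterion under these assumed radii (cutoff 1.7 + 1.52 + 0.5 = 3.72 Å), while the same pair 4.0 Å apart does not.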

Conserved Residue:
Some amino acids mutate less frequently than others. Such amino acids are called conserved residues. In this work, a residue is considered evolutionarily conserved (in sequence) if its conservation score (calculated by Rate4Site and rescaled to the 1-9 range) is greater than or equal to 7.

Average Conservation Score:
The average conservation score of an interface is the sum of the conservation scores of its interface residues divided by the number of interface residues.
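
A minimal Python sketch of the two definitions above (conserved residue and average conservation score). The function names are hypothetical, and the scores are assumed to be the rescaled 1-9 conservation scores.

```python
CONSERVED_CUTOFF = 7  # a residue is conserved if its score is >= 7


def is_conserved(score):
    """Return True if a rescaled conservation score marks a conserved residue."""
    return score >= CONSERVED_CUTOFF


def average_conservation_score(interface_scores):
    """Sum of the interface residues' scores divided by their number."""
    if not interface_scores:
        raise ValueError("an interface must contain at least one residue")
    return sum(interface_scores) / len(interface_scores)
```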

Hot Spot Residue:
Certain residues on protein surfaces are critical for binding and structure. Experimentally, these residues are found by alanine scanning mutagenesis. In the context of this database, hot spot refers to a computationally predicted hot spot residue. How the prediction is made is explained below in the Hot Spot Prediction Models section.

Buried Accessible Surface Area (ASA):
Buried ASA is the surface area of a protein complex that becomes inaccessible to solvent upon complexation. It is found by subtracting the ASA of the complex from the sum of the ASAs of the individual chains. NACCESS is employed to calculate the ASA of the individual chains and of the protein complexes.
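
Read as a subtraction, the definition above amounts to the following Python sketch. It assumes the ASA values (in Å²) have already been obtained, for instance from NACCESS output; parsing that output is not shown, and the function name is hypothetical.

```python
def buried_asa(chain_asas, complex_asa):
    """Buried ASA = sum of the isolated chains' ASAs minus the complex's ASA.

    chain_asas: ASA of each chain computed in isolation (angstroms squared).
    complex_asa: ASA of the whole complex (angstroms squared).
    """
    return sum(chain_asas) - complex_asa
```

For two chains with ASAs of 5000 Å² and 4200 Å² forming a complex with an ASA of 8100 Å², the buried ASA would be 1100 Å².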

Conserved Residue Propensity:
Conserved residue propensity is the enrichment of a given residue type on the interface in terms of conservation. For a residue of type i, the propensity is calculated by multiplying the ratio of the number of conserved interface residues of type i to the number of residues of type i in the chains by the ratio of the number of residues in the chains to the number of conserved residues in the chains.
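
The propensity formula above can be written out as a short Python sketch. The function and parameter names are hypothetical; the four counts are assumed to have been tallied over the dataset beforehand.

```python
def conserved_residue_propensity(conserved_interface_i, total_i,
                                 total_residues, total_conserved):
    """Propensity of residue type i, following the verbal definition above.

    conserved_interface_i: conserved interface residues of type i
    total_i:               residues of type i in the chains
    total_residues:        all residues in the chains
    total_conserved:       all conserved residues in the chains
    """
    return (conserved_interface_i / total_i) * (total_residues / total_conserved)
```

For instance, with 10 conserved interface residues of type i out of 100 residues of that type, and 200 conserved residues among 1000 residues overall, the propensity is (10/100) × (1000/200) = 0.5.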

Database Contents

HotSprint contains overall properties of each interface, such as the number of computational hot spots, the number of conserved residues, the average conservation score of the interface residues and the buried ASA of the interface. Additionally, for each residue of the interface, its position, name, conservation score, ASA in monomer, ASA in complex, type (contacting interface residue, neighboring interface residue or none) and whether or not it is a computational hot spot are presented.

Hot Spot Prediction Models

There are three different hot spot prediction models in HotSprint.
  1. Prediction based only on sequence conservation (score):
    The first model predicts hot spots based on the residues' evolutionary conservation in sequence. A residue is tagged as a hot spot if its conservation score (score) is higher than the specified threshold (the default threshold is 6).

  2. Prediction based on propensity scaled sequence conservation (pScore):
    In the second model, a residue is tagged as a hot spot if its propensity scaled conservation score (pScore) is higher than the specified threshold (the default threshold is 6.2).

  3. Prediction based on propensity scaled sequence conservation and solvent accessible surface area (pScore+ASA):
    The third model flags a residue as a hot spot if its propensity scaled conservation score is higher than 6.2 and either its ASA change upon complexation is higher than 49 Å² or its ASA in complex is lower than 12 Å².
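
The three models can be summarized in a small Python sketch using the default thresholds quoted above. The function names are hypothetical, pScore is taken as an already computed input, and the ASA change upon complexation is assumed here to be the ASA in monomer minus the ASA in complex.

```python
def hot_spot_by_score(score, threshold=6):
    """Model 1: conservation score above the threshold."""
    return score > threshold


def hot_spot_by_pscore(p_score, threshold=6.2):
    """Model 2: propensity scaled conservation score above the threshold."""
    return p_score > threshold


def hot_spot_by_pscore_asa(p_score, asa_monomer, asa_complex,
                           threshold=6.2, delta_asa_cutoff=49.0,
                           complex_asa_cutoff=12.0):
    """Model 3: pScore above the threshold AND
    (ASA change above 49 A^2 OR ASA in complex below 12 A^2)."""
    # Assumption: ASA change = ASA in monomer - ASA in complex.
    delta_asa = asa_monomer - asa_complex
    return p_score > threshold and (
        delta_asa > delta_asa_cutoff or asa_complex < complex_asa_cutoff
    )
```

A residue with pScore 7.0, ASA in monomer 80 Å² and ASA in complex 20 Å² is flagged by the third model because its ASA change (60 Å²) exceeds 49 Å², even though its ASA in complex is not below 12 Å².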

Web Interface Usage

Use the query boxes provided below to start querying the database:
  • In the first query box on the main page, you can search the interface dataset to fetch the interfaces associated with a given protein using its 4-letter PDB identifier.
  • The second query box, below the first one, allows you to search for structures that match given criteria.
  • The query box at the bottom provides access to residue information for the individual chains that the interfaces come from.

Retrieving HotSprint Database

The data in the HotSprint database is provided as a single database dump file in SQL format. Download the compressed dump file hotsprint_dumped.sql.gz (697M). First, create a database named HotSprint in your SQL server, then decompress the file, and finally create and fill the tables by executing the SQL statements in the decompressed file (e.g. for MySQL users: mysql -u userName -p databaseName < hotsprint_dumped.sql).
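
If a command-line decompression tool is not at hand, the decompression step can also be done with Python's standard library, as in the sketch below. The function name and the output path are hypothetical; loading the resulting file into the SQL server is still done with the mysql command given above.

```python
import gzip
import shutil


def decompress_dump(gz_path, sql_path):
    """Decompress a .sql.gz dump to a plain .sql file, streaming in chunks
    so that the whole dump never has to fit in memory."""
    with gzip.open(gz_path, "rb") as src, open(sql_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Example call (the file is assumed to be in the current directory):
# decompress_dump("hotsprint_dumped.sql.gz", "hotsprint_dumped.sql")
```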

The HotSprint MySQL database contains the following tables (see the Entity-Relationship diagram):
  • chain_residues: Stores the conservation score (as calculated by Rate4Site, following a normal distribution with mean 0 and sigma 1) and the ASA in monomer for the residues of each chain in PDB monomers/multimers.
  • partner_chain_residues: Stores the ASA in complex for the residues of each chain in PDB complexes, along with the information of which residues are contained in the interface of that particular complex.
  • propensities: Stores the conserved residue propensities of the 20 amino acids.
  • interfaces_cumulative_information: Stores the total buried ASA, average conserved residue propensities, number of total residues, number of conserved residues (with a rescaled conservation score cutoff of 7) and number of hot spots (with respect to the 3 criteria explained above) for each interface. It is used to provide fast access for range queries.
  • interfaces: Stores size and total ASA information of interfaces.
Note that the individual mapping of which residues are hot spots is not stored; rather, it is calculated on the fly on the web with the formulas explained above (so that any hot spot criterion can be used and information is not replicated). For any hot spot prediction method, the chain_residues table is used to fetch the conservation score and chain ASA of a residue, the partner_chain_residues table is used to get the complex ASA of residues, and the conserved residue propensity of a given residue type is taken from the propensities table. Furthermore, the conservation scores stored in the database correspond to the variability of residues; thus, the smaller the score, the more conserved a residue is. We adopted the approach the Rate4Site (ConSurf) authors used to rescale the scores so that residues have integer scores between 1 and 9 (where 1: least conserved, 9: most conserved). You can find a Python script here achieving this task.
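
To illustrate the direction of the rescaling only (a small raw variability score maps to a high conservation grade), here is a deliberately simplified Python sketch. It uses plain equal-width binning, which is an assumption made for illustration: it is not the ConSurf scheme and not the script linked above, and the function name is hypothetical.

```python
def rescale_scores(raw_scores, n_grades=9):
    """Map raw variability scores to integer grades 1..n_grades.

    Simplified equal-width binning (NOT the actual ConSurf procedure):
    the lowest raw score (least variable) gets the highest grade.
    """
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        # All scores identical: assign the middle grade to every residue.
        return [(n_grades + 1) // 2] * len(raw_scores)
    width = (hi - lo) / n_grades
    grades = []
    for s in raw_scores:
        # Bin index 0 .. n_grades-1; the maximum score is clamped into the last bin.
        bin_index = min(int((s - lo) / width), n_grades - 1)
        grades.append(n_grades - bin_index)
    return grades
```

With raw scores of -1.0, 0.0 and 1.0, this sketch assigns grades 9, 5 and 1 respectively.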

Citing HotSprint

Guney, E., Tuncbag, N., Keskin, O. and Gursoy, A., 2007, HotSprint: Database of Computational HotSpots at Protein Interfaces, Nucleic Acids Research DB issue, gkm813.

!! Corrections !!

In the NAR DB issue manuscript, the second "and" in the hot spot prediction formula should be "or". Therefore, the correct formulation is:
pScore_i > t AND ( ΔASA > t_ASA OR ASA_complex < t_ASAx )
Moreover, the ASA cutoff used on the web page to calculate hot spots for the 3rd model was corrected to 49 Å² (from the obsolete value of 72 Å²).

We are grateful to Julie Baussand from the University of Paris 7 for pointing out these two issues, and we apologize for the inconvenience.




Contact the webmaster for any requests. Cosbi, 2007. Last modified on Aug 10, 2007.