HotspotEL: Hot Spot prediction in protein-protein interactions by ensemble learning



Abstract:

Background
  Hot spot residues are functional sites in protein interaction interfaces. Identifying hot spot residues experimentally is time-consuming and laborious. To address this, computational methods have been developed to predict them. Most prediction methods rely on structural features, sequence characteristics, or other protein features.


Results
  This article proposes an ensemble learning method that predicts hot spot residues using only sequence features and the relative accessible surface area of amino acid sequences. The work develops a novel feature selection technique, applies an auto-correlation function combined with a sliding window to extract the characteristics of amino acid residues in the sequence, and builds an ensemble classifier from SVM and KNN classifiers to obtain the best classification performance.


Conclusion
  The experimental results show that our model achieves its highest F1 score of 0.92 and an MCC of 0.87 on the ASEdb dataset. Compared with other machine learning methods, our model substantially improves hot spot prediction.
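
  For readers unfamiliar with the two metrics quoted above, the following minimal Python sketch shows how the F1 score and MCC are conventionally computed from a binary confusion matrix. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the HotspotEL experiments.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts (hot spot = positive class), for illustration only.
perfect_f1 = f1_score(tp=5, fp=0, fn=0)
perfect_mcc = mcc(tp=5, tn=5, fp=0, fn=0)
chance_f1 = f1_score(tp=2, fp=2, fn=2)
chance_mcc = mcc(tp=2, tn=2, fp=2, fn=2)
```

  Unlike F1, MCC also takes true negatives into account, which makes it informative when hot spots are much rarer than non-hot spots.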

Encoding schema for protein residues.
  The protein sequence is first converted to a numerical sequence using the 46 attributes of AAindex1. Then, each residue is encoded using the autocorrelation function combined with the sliding window. Here, R1 represents the 1st residue in the protein sequence, R2 represents the 2nd residue, ..., and RL represents the L-th residue; each belongs to one of the 20 common amino acid types.
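
  The encoding idea described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' Matlab implementation. The autocorrelation form (mean product of values a fixed lag apart), the window half-width, the maximum lag, and the function names are all assumptions. The Kyte-Doolittle hydropathy scale stands in for a single AAindex1 attribute; the 46 attributes actually used are listed in Table S1.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values, used only as a stand-in for one
# AAindex1 attribute (assumption; see Table S1 for the real attribute list).
HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(values: List[float], lag: int) -> float:
    """Mean product of values `lag` positions apart (assumed form)."""
    n = len(values) - lag
    if n <= 0:
        return 0.0
    return sum(values[i] * values[i + lag] for i in range(n)) / n


def encode_residues(sequence: str, scale: Dict[str, float],
                    half_window: int = 3, max_lag: int = 2) -> List[List[float]]:
    """Return one feature vector (lags 1..max_lag) per residue."""
    numeric = [scale[aa] for aa in sequence]  # residues -> property values
    features = []
    for i in range(len(numeric)):
        # Window centred on residue i, truncated at the sequence ends.
        start = max(0, i - half_window)
        end = min(len(numeric), i + half_window + 1)
        window = numeric[start:end]
        features.append([autocorrelation(window, lag)
                         for lag in range(1, max_lag + 1)])
    return features


# Hypothetical peptide, for illustration only.
features = encode_residues("AAAA", HYDROPATHY, half_window=1, max_lag=1)
```

  With several attributes, the same routine would be run once per attribute and the resulting per-residue vectors concatenated.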

 

Software available:

 A simple Matlab implementation of our predictor is available here: HotspotEL.

 

Supplementary Materials:
1. Table S1. The 46 relatively independent properties of AAindex1
2. Table S2. Training dataset on ASEdb
3. Table S3. Test dataset derived from BID
4. Table S4. Test dataset derived from SKEMPI
5. Table S5. Test dataset derived from dbMPIKT
6. Table S6. Test dataset derived from the mixed dataset: supplementary materials

 

Citation:
Quanya Liu, Bing Wang, Hot Spot prediction in protein-protein interactions by ensemble learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018. Submitted.

Copyright @ 2004-2016 by Peng Chen

All Rights Reserved